Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

Quantum Multi-Agent Reinforcement Learning for Cooperative Mobile Access in Space-Air-Ground Integrated Networks

Gyu Seon Kim, Yeryeong Cho, Jaehyun Chung, Soohyun Park, , Soyi Jung, , Zhu Han, , and Joongheon Kim G.S. Kim, Y. Cho, J. Chung, and J. Kim are with the School of Electrical Engineering, Korea University, Seoul 02841, Korea (e-mails: {kingdom0545,joyena0909,rupang1234,joongheon}@korea.ac.kr). S. Park is with the Division of Computer Science, Sookmyung Women’s University, Seoul 04310, Korea (e-mail: soohyun.park@sookmyung.ac.kr). S. Jung is with the Department of Electrical and Computer Engineering, Ajou University, Suwon 16499, Korea (e-mail: sjung@ajou.ac.kr). Z. Han is with the Department of Electrical and Computer Engineering, University of Houston, Texas, USA (e-mail: zhan2@uh.edu).
Abstract

Achieving global space-air-ground integrated network (SAGIN) access only with CubeSats presents significant challenges such as the access sustainability limitations in specific regions (e.g., polar regions) and the energy efficiency limitations in CubeSats. To tackle these problems, high-altitude long-endurance unmanned aerial vehicles (HALE-UAVs) can complement these CubeSat shortcomings for providing cooperatively global access sustainability and energy efficiency. However, as the number of CubeSats and HALE-UAVs, increases, the scheduling dimension of each ground station (GS) increases. As a result, each GS can fall into the curse of dimensionality, and this challenge becomes one major hurdle for efficient global access. Therefore, this paper provides a quantum multi-agent reinforcement Learning (QMARL)-based method for scheduling between GSs and CubeSats/HALE-UAVs in order to improve global access availability and energy efficiency. The main reason why the QMARL-based scheduler can be beneficial is that the algorithm facilitates a logarithmic-scale reduction in scheduling action dimensions, which is one critical feature as the number of CubeSats and HALE-UAVs expands. Additionally, individual GSs have different traffic demands depending on their locations and characteristics, thus it is essential to provide differentiated access services. The superiority of the proposed scheduler is validated through data-intensive experiments in realistic CubeSat/HALE-UAV settings.

Index Terms:
Quantum Multi-Agent Reinforcement Learning (QMARL), Quantum Neural Network (QNN), Cube Satellite (CubeSat), High-Altitude Long-Endurance Unmanned Aerial Vehicle (HALE-UAV), Space-Air-Ground Integrated Network (SAGIN).
Refer to caption
Figure 1: Reference network model.

1 Introduction

Ultra-small-scale and low-cost cube satellites (CubeSats) have recently emerged as novel electrical aerospace devices in non-terrestrial networks (NTN) as one major component of global space-air-ground integrated network (SAGIN) systems in order to realize seamless global access services [1]. In the past, geostationary (GEO) satellites at the altitude of approximately 36,0003600036,00036 , 000 km were employed for the global access services, yet their considerable distances from the Earth introduced extremely long propagation delays, which hindered the global access services [2]. Given that CubeSats operate as low Earth orbit (LEO) satellites at the altitude of approximately 500500500500 km, they are more adept at facilitating global access services, offering reduced delays compared to GEO-based services [3, 4]. However, the lower altitude of CubeSats, results in considerably smaller coverage compared to GEO-based services. Consequently, in order to achieve seamless global access, a significantly larger fleet of CubeSats is essentially required [5]. To take care of large-scale CubeSats, it is essentially required to design efficient scheduling algorithms for global access availability and energy efficiency. For more details, employing CubeSats to deliver global SAGIN mobile access necessitates determinations regarding which CubeSats should engage in the global access amidst a scenario where a multitude of CubeSats are present. This scenario culminates in a scheduling problem, which can be conceptualized within the framework of multi-agent reinforcement learning (MARL) [6]. The essence of this approach stems from the necessity for multiple ground stations (GSs) to collaboratively orchestrate the scheduling and servicing of their CubeSats to facilitate global SAGIN mobile access, as depicted in Fig. 1. In the environment where multiple CubeSats exist, each GS cooperatively schedules CubeSats to participate in global SAGIN mobile access, and the corresponding efficient scheduling algorithms are needed. Due to CubeSat’s limited resources such as limited energy and bandwidth, without an efficient scheduling algorithm, it is impossible to optimally utilize these resources, maintain high quality of service (QoS), and provide optimal global access services [7]. Additionally, in the dynamic environments where the coverages of specific areas are constantly changing due to the CubeSat’s high orbital speed, it is important to schedule each GS to connect to the CubeSat in order to improve access availability and energy efficiency. Furthermore, according to the fact that the mobile access demands and requirements of individual GSs are all different depending on their locations, differentiated scheduling algorithms that can take of the characteristics, demands, and requirements of individual GSs are essentially required.

Even though CubeSats can be widely used for next-generation global SAGIN mobile access, CubeSats encounter constraints in delivering global access autonomously, owing to their restricted scales and energy capacities [8]. Hence, despite the capacity of multiple CubeSats to collectively cover extensive areas, there might persist coverage gaps in remote areas, polar regions, or the areas experiencing significant communication burdens. Moreover, the rapid orbital velocity of CubeSats, approximately 7.57.57.57.5 km/s, results in frequent handovers [9]. To maintain uninterrupted global access, it becomes necessary to integrate new aerial networks that focus on specific local regions and CubeSats must be considered together [10]. Finally, despite CubeSats experiencing reduced delay time compared to GEO satellites, their delay time is still significant challenge when contrasted with terrestrial networks (TNs). Consequently, the deployment of innovative NTN devices to support CubeSats is essential for ensuring seamless global access.

To address these challenges, this paper proposes cooperative and differentiated global SAGIN mobile access involving both CubeSats and aerial networks. The aerial networks, possessing enhanced mobility compared to CubeSats that follow predetermined orbits, are capable of more adaptable responses to changing environmental conditions. Consequently, unmanned aerial vehicles (UAVs) are particularly beneficial for establishing networks across diverse regions characterized by uncertainty [11]. Despite their utility, rotorcrafts consume a significant amount of energy, posing challenges to the seamless global SAGIN mobile access. Therefore, the system discussed in this paper employs high-altitude long-endurance (HALE)-UAVs, which are fixed-wing aircraft, to overcome these limitations. The HALE-UAVs are distinguished by their capacity for long-distance flights, attributed to their substantial endurance and energy levels. Furthermore, the attributes of the HALE-UAV, one of fixed-wing aircrafts, enable them to sustain flight longer than rotary-wing aircrafts even in the scenarios where its control systems can be damaged [12]. Ultimately, HALE-UAVs can supplement CubeSats in providing flexible and extensible coverages for particular regions, such as polar areas lacking signal availability, or the regions burdened with communication overheads [13, 14]. Based on these issues and architecture characteristics, we need to design a new global SAGIN scheduling algorithm.

Moreover, the need for effective scheduling becomes paramount in the scenarios populated by numerous CubeSats and HALE-UAVs. In order to realize effective scheduling for CubeSats and HALE-UAVs in terms of access availability and energy efficiency, cooperative and differentiated global SAGIN mobile access should be proposed. In this scheduling problem, the goal is to simultaneously improve access availability in terms of QoS and capacity as well as energy efficiency in NTN devices, i.e., CubeSats and HALE-UAVs. To achieve this, we have to consider the hardware restrictions of CubeSats and HALE-UAVs at the same time. For CubeSats, their geographical coordinates in terms of latitude and longitude as well as the direction vector toward the sun for solar charging undergo real-time alterations due to their orbital movement. Furthermore, CubeSats frequently sustain damage from cosmic rays and solar winds. Similarly, the flight environment for HALE-UAVs is characterized by dynamic and uncertain conditions, including the presence of vortices and gusts. Moreover, due to the limited energy levels and capacities of NTN devices, collaboration among these NTN devices is crucial for the simultaneous optimization of energy efficiency and channel capacity.

Distinct from conventional scheduling algorithms, reinforcement learning (RL) exhibits robust performance in dynamic and uncertain environments [15, 16, 17]. MARL proves particularly effective in situations that require cooperation among multiple NTN devices [18]. Consequently, within global SAGIN mobile access that utilizes CubeSats and HALE-UAVs, MARL-based algorithms based on MARL may be employed, with multiple GSs acting as agents. Nevertheless, conventional MARL-based schedulers are unable to ensure reward convergence as the number of agents and action dimensions of GS expands. To tackle these issues, this paper proposes a novel cooperative and differentiated scheduling algorithm for access availability and energy efficiency in global SAGIN mobile access, leading to the development of quantum MARL (QMARL) [19]. This innovation utilizes the basis measurements, known as projection-valued measure (PVM), allowing the proposed QMARL-based scheduler to diminish the action dimension to a logarithmic scale [20]. Furthermore, realistic experimental setting is constructed to demonstrate the superiority and real-world relevance of our proposed QMARL-based scheduler. This includes the use of actual CubeSat orbital data, aerodynamic information about real HALE-UAVs environments with significant vortices, and the considerations for photovoltaic (PV) charging based on the CubeSats’ relative positions to the sun, i.e., the sun side and dark side. Additionally, each GS, which is an agent, has its own differentiated maximum required channel capacity depending on the region where each GS is located, the population of that region, and the degree of communication overload. Without these settings, excessive global SAGIN mobile access may be provided to GSs that do not require communication services beyond a certain requirement, and GSs with severe communication overload may not be provided with the desired level of global access. Eventually, this can result in the energy of NTN devices (i.e., CubeSats and HALE-UAVs) being wasted, uselessly. In conclusion, the efficacy of our proposed QMARL-based scheduler is validated within realistic environments, evidencing that the algorithm fulfills its objectives by simultaneously optimizing the access availability in SAGIN and the energy efficiency in NTN devices amidst scenarios characterized by high action dimensions. Ultimately, in this paper, our considering SAGIN mobile access network is implemented using multiple GSs, CubeSats, and HALE-UAVs through our proposed QMARL-based scheduler at high action dimensions, and the proposed algorithm is tested in realistic environments to increase real-world applicability.

The main contributions are as follows.

  • First of all, this paper is the first attempt to employ a QMARL-based global SAGIN mobile access scheduler for the coordination of CubeSats and HALE-UAVs. The uniqueness of this scheduler stems from its emphasis on reducing the action dimensions through the PVM. Furthermore, a new reward function is designed and implemented to encourage cooperative global SAGIN mobile access, and efficient and equitable energy usage of NTN devices in multi-CubeSats and multi-HALE-UAVs environments.

  • Moreover, the proposed QMARL-based scheduler is designed for the coordinated and differentiated global SAGIN mobile access with multiple GSs, CubeSats, and HALE-UAVs. Furthermore, our proposed scheduling also works for energy efficiency in CubeSats and HALE-UAVs. In order to realize this, the reward function of our proposed QMARL-based scheduler is formulated, and thus, it addresses the energy utilization efficiency of CubeSats, taking into account their exposure to the sun side or dark side, which is crucial given their limited energy capacities due to their compact sizes.

  • Lastly, the efficacy of the proposed algorithm is assessed under realistic experimental environments involving CubeSat that orbits in real space areas as well as HALE-UAV that flies in the real sky. The orbital elements for CubeSats are derived from the two line element (TLE), which provide the foundational data related orbit for these CubeSats. The experiment incorporates a range of realistic aerodynamic characteristics of HALE-UAVs to enhance the algorithm’s real-world applicability. In addition, specific considerations on the differentiated maximum channel capacity in individual GSs show realistic experimental environments depending on the regions where individual GSs are located, the populations of the regions, and the degrees of communication overloads.

The rest of this paper is organized as follows. Sec. 2 presents preliminary knowledge including related work and QMARL. Sec. 3 describes the fundamental modeling and Sec. 4 presents the details of our proposed QMARL-based scheduler. Sec. 5 evaluates the performance in realistic environments, and lastly, Sec. 6 concludes this paper.

2 Preliminaries

2.1 Related Work

Numerous projects focus on establishing wireless connections to create aerial NTN devices, including UAVs or satellites [21]. Given that these rely on battery-based energy management, minimizing energy consumption is crucial to stable operation in unknown environments for the efficient operation of multiple UAVs and satellites [22]. In the literature, the efficient operation of multiple UAVs has garnered significant attention [23]. Minimizing energy consumption is important to stable operation in unfamiliar environments, necessitating efficient communications [24]. At the same time, efficient scheduling among satellites is imperative to ensure swift responses to diverse sightings and unforeseen events [25]. UAVs, characterized by remarkable acquisition flexibility and very high spatial resolution (VHSR), and LEO satellites, capable of providing time-series data across extensive areas, have traditionally been employed independently. However, the proposed algorithm in [26] can minimize total energy costs and reduce time complexity which is crucial for optimizing their effective operation for both UAVs and satellites. Therefore UAVs and satellites must be controlled cooperatively to improve performance [27]. To efficiently manage both UAVs and satellites, numerous studies have demonstrated different methodologies for applying RL algorithms [28]. The proposed algorithm in [29] proves the superiority of RL, particularly beneficial in the management of multiple agents. However, to build global SAGIN mobile access, more agents need to be controlled [30]. Notably, quantum algorithms have advantages in managing large-scale scenarios, such as those encountered in aerial networks [31]. This paper demonstrates the superiority of using QRL over RL in multi-agent scheduling.

2.2 Quantum Neural Network

Refer to caption
(a) Qubit on the Bloch sphere.
Refer to caption
(b) QNN architecture.
Figure 2: Qubit and QNN architecture.

In QNN architectures, a significant deviation from classical neural networks is the utilization of qubits as the unit for basic learning computations [32]. Within quantum systems, qubits stand as the fundamental units of information, and their representation is grounded in the base states of |0:=[1,0]Tassignket0superscript10𝑇\left|0\right\rangle:=[1,0]^{T}| 0 ⟩ := [ 1 , 0 ] start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT and |1:=[0,1]Tassignket1superscript01𝑇\left|1\right\rangle:=[0,1]^{T}| 1 ⟩ := [ 0 , 1 ] start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT. The representation of a single qubit state can be realized through a normalized 2D complex vector as |ψ=α|0+β|1ket𝜓𝛼ket0𝛽ket1\left|\psi\right>=\alpha\left|0\right>+\beta\left|1\right>| italic_ψ ⟩ = italic_α | 0 ⟩ + italic_β | 1 ⟩ and α2+β2=1superscriptnorm𝛼2superscriptnorm𝛽21\left\|\alpha\right\|^{2}+\left\|\beta\right\|^{2}=1∥ italic_α ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + ∥ italic_β ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = 1 holds, where α2superscriptnorm𝛼2\left\|\alpha\right\|^{2}∥ italic_α ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT and β2superscriptnorm𝛽2\left\|\beta\right\|^{2}∥ italic_β ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT denote the probabilities of observing |0ket0\left|0\right>| 0 ⟩ and |1ket1\left|1\right>| 1 ⟩, respectively. The QNN computation is carried out over the 3D Bloch sphere, defined as the Hilbert space which represents the quantum domain. Expressing this within the Bloch sphere, which serves as a representation of the quantum domain, it can be geometrically denoted as, |ψ=cos(θ)|0+eiϕsin(θ)|1ket𝜓𝜃ket0superscript𝑒𝑖italic-ϕ𝜃ket1\left|\psi\right>=\cos(\theta)\left|0\right>+e^{i\phi}\sin(\theta)\left|1\right>| italic_ψ ⟩ = roman_cos ( italic_θ ) | 0 ⟩ + italic_e start_POSTSUPERSCRIPT italic_i italic_ϕ end_POSTSUPERSCRIPT roman_sin ( italic_θ ) | 1 ⟩, where θ𝜃\thetaitalic_θ denotes a parameter that determines the probabilities of measuring |0ket0\left|0\right>| 0 ⟩ and |1ket1\left|1\right>| 1 ⟩, and ϕitalic-ϕ\phiitalic_ϕ represents the relative phase, respectively, where 0θπ0𝜃𝜋0\leq\theta\leq\pi0 ≤ italic_θ ≤ italic_π and 0ϕ<2π0italic-ϕ2𝜋0\leq\phi<2\pi0 ≤ italic_ϕ < 2 italic_π [32]. Fig. 2(a) shows a qubit represented over the Bloch sphere. When considering a q𝑞qitalic_q qubit system, the representation of quantum states within the system’s Hilbert space is as |ψ=l=02q1ωl|lket𝜓subscriptsuperscriptsuperscript2𝑞1𝑙0subscript𝜔𝑙ket𝑙\left|\psi\right\rangle=\sum^{2^{q}-1}_{l=0}\omega_{l}\left|l\right\rangle| italic_ψ ⟩ = ∑ start_POSTSUPERSCRIPT 2 start_POSTSUPERSCRIPT italic_q end_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l = 0 end_POSTSUBSCRIPT italic_ω start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT | italic_l ⟩, where |ψket𝜓\left|\psi\right\rangle| italic_ψ ⟩ denotes the quantum state, |lket𝑙\left|l\right\rangle| italic_l ⟩ represents l𝑙litalic_l-th basis, and ωlsubscript𝜔𝑙\omega_{l}italic_ω start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT stands for the probability amplitude of q𝑞qitalic_q qubit system, respectively. Then, the probability amplitude fulfills l=02q1|ωl|2=1subscriptsuperscriptsuperscript2𝑞1𝑙0superscriptsubscript𝜔𝑙21\sum^{2^{q}-1}_{l=0}|\omega_{l}|^{2}=1∑ start_POSTSUPERSCRIPT 2 start_POSTSUPERSCRIPT italic_q end_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l = 0 end_POSTSUBSCRIPT | italic_ω start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = 1. A significant component in classical neural networks is a hidden layer, capable of representing linear and nonlinear transformations to achieve accurate function approximation within the neural network. Hence, the primary design consideration factors in QNN involve designing and implementing linear and nonlinear transformations over the 3D sphere. This QNN design facilitates the fundamental enablement of QRL-based control, achieved by incorporating the states and actions of RL-based control as inputs and outputs within QNN architectures.

In QNN architecture, there are three primary components: (i) state encoding, (ii) parameterized quantum circuit (PQC), and (iii) measurement, as illustrated in Fig. 2(b).

  • State Encoding. The encoder performs the function of converting the classical data, represented as ζtsubscript𝜁𝑡\zeta_{t}italic_ζ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT at a specific time t𝑡titalic_t, to the initialized quantum state |0ket0\left|0\right\rangle| 0 ⟩. The encoder carries out this function due to the inability of quantum circuits to directly accept classical bits. Through the application of multiple unitary matrices, denoted as U()𝑈U(\cdot)italic_U ( ⋅ ), this encoding transformation is achieved mathematically. An important point to highlight is that the encoder does not include any trainable parameters. Thus, the encoded quantum state of the QNN at a specific time t𝑡titalic_t is defined as |ψ0;t=UENC(ζt)|0qketsubscript𝜓0𝑡subscript𝑈ENCsubscript𝜁𝑡superscriptket0tensor-productabsent𝑞\left|\psi_{0;t}\right\rangle=U_{\text{ENC}}(\zeta_{t})\left|0\right\rangle^{% \otimes q}| italic_ψ start_POSTSUBSCRIPT 0 ; italic_t end_POSTSUBSCRIPT ⟩ = italic_U start_POSTSUBSCRIPT ENC end_POSTSUBSCRIPT ( italic_ζ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) | 0 ⟩ start_POSTSUPERSCRIPT ⊗ italic_q end_POSTSUPERSCRIPT, where the classical data ζtsubscript𝜁𝑡\zeta_{t}italic_ζ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT serves as rotation angles within the set of encoding gates U𝑈Uitalic_U.

  • PQC. The operations performed by PQC are analogous to the multiplications seen in the accumulated hidden layers of classical neural networks. Quantum gates can transform the state of qubits through the operations they perform [32]. Within this paper, the following three gates will be introduced: Pauli, Controlled, and rotation gates [32]. Outlined below are the definitions for Pauli-ΓΓ\Gammaroman_Γ gates and Controlled-ΓΓ\Gammaroman_Γ gates, i.e., X=[0110]𝑋matrix0110X\!\!=\!\!\begin{bmatrix}0&1\\ 1&0\end{bmatrix}\!italic_X = [ start_ARG start_ROW start_CELL 0 end_CELL start_CELL 1 end_CELL end_ROW start_ROW start_CELL 1 end_CELL start_CELL 0 end_CELL end_ROW end_ARG ], Y=[0ii0]𝑌matrix0𝑖𝑖0\!Y\!\!=\!\!\begin{bmatrix}0&\!-\!i\\ i&0\end{bmatrix}\!italic_Y = [ start_ARG start_ROW start_CELL 0 end_CELL start_CELL - italic_i end_CELL end_ROW start_ROW start_CELL italic_i end_CELL start_CELL 0 end_CELL end_ROW end_ARG ], Z=[1001]𝑍matrix1001\!Z\!\!=\!\!\begin{bmatrix}1&0\\ 0&\!-\!1\end{bmatrix}\!italic_Z = [ start_ARG start_ROW start_CELL 1 end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL - 1 end_CELL end_ROW end_ARG ], and CΓ=[I00Γ]𝐶ΓmatrixI00ΓC\Gamma\!\!=\!\!\begin{bmatrix}{\textbf{{I}}}&0\\ 0&\Gamma\end{bmatrix}italic_C roman_Γ = [ start_ARG start_ROW start_CELL I end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL roman_Γ end_CELL end_ROW end_ARG ], where i=(1)𝑖1i=\sqrt{(-1)}italic_i = square-root start_ARG ( - 1 ) end_ARG, Γ{X,Y,Z}for-allΓ𝑋𝑌𝑍\forall\Gamma\in\left\{X,Y,Z\right\}∀ roman_Γ ∈ { italic_X , italic_Y , italic_Z }, and I stands for the identity matrix, respectively. The Pauli-ΓΓ\Gammaroman_Γ gates perform 180superscript180180\,^{\circ}180 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT rotations of the quantum state in the x, y, and z axes of the Bloch sphere. Between two qubits, the Controlled-ΓΓ\Gammaroman_Γ gates produce entanglement. Within QNN, rotation gates RΓsubscript𝑅ΓR_{\Gamma}italic_R start_POSTSUBSCRIPT roman_Γ end_POSTSUBSCRIPT featuring the trainable parameters θksubscript𝜃𝑘\theta_{k}italic_θ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, defined within the range [0,2π]02𝜋[0,2\pi][ 0 , 2 italic_π ], find widespread utilization. This can be represented as follows: RΓ(θk)=eiθk2Γsubscript𝑅Γsubscript𝜃𝑘superscript𝑒𝑖subscript𝜃𝑘2ΓR_{\Gamma}(\theta_{k})=e^{-i\frac{\theta_{k}}{2}\Gamma}italic_R start_POSTSUBSCRIPT roman_Γ end_POSTSUBSCRIPT ( italic_θ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) = italic_e start_POSTSUPERSCRIPT - italic_i divide start_ARG italic_θ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_ARG start_ARG 2 end_ARG roman_Γ end_POSTSUPERSCRIPT. Achieving rotations and entanglement of all qubits involves utilizing Pauli-ΓΓ\Gammaroman_Γ, Controlled-ΓΓ\Gammaroman_Γ, and rotation gates. At this moment, Pauli-ΓΓ\Gammaroman_Γ gates and RΓsubscript𝑅ΓR_{\Gamma}italic_R start_POSTSUBSCRIPT roman_Γ end_POSTSUBSCRIPT are employed for implementing linear transformations, while the Controlled-ΓΓ\Gammaroman_Γ gates are utilized for nonlinear transformations. Therefore, PQC achieves two transformations on the 3D sphere. Consequently, in PQC, it can vary depending on the configuration of the RΓsubscript𝑅ΓR_{\Gamma}italic_R start_POSTSUBSCRIPT roman_Γ end_POSTSUBSCRIPT and Controlled-ΓΓ\Gammaroman_Γ gates, and is an important factor in building a QNN. To thoroughly explore trainable rotation parameters and entanglement, we implement multiple quantum layers in this paper, each consisting of RΓsubscript𝑅ΓR_{\Gamma}italic_R start_POSTSUBSCRIPT roman_Γ end_POSTSUBSCRIPT gates within PQC of each QNN. At a specific time t𝑡titalic_t, the quantum state of the QNN, denoted as |ψtketsubscript𝜓𝑡\left|\psi_{t}\right\rangle| italic_ψ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ⟩, can be represented as |ψt=l=1L𝑼l(θt)UENC(ζt)|0qketsubscript𝜓𝑡superscriptsubscriptproduct𝑙1𝐿subscript𝑼𝑙subscript𝜃𝑡subscript𝑈ENCsubscript𝜁𝑡superscriptket0tensor-productabsent𝑞\left|\psi_{t}\right\rangle=\prod_{l=1}^{L}\nolimits\boldsymbol{U}_{l}(\theta_% {t})U_{\text{ENC}}(\zeta_{t})\left|0\right\rangle^{\otimes q}| italic_ψ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ⟩ = ∏ start_POSTSUBSCRIPT italic_l = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT bold_italic_U start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_θ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) italic_U start_POSTSUBSCRIPT ENC end_POSTSUBSCRIPT ( italic_ζ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) | 0 ⟩ start_POSTSUPERSCRIPT ⊗ italic_q end_POSTSUPERSCRIPT, where 𝑼l(θt)subscript𝑼𝑙subscript𝜃𝑡\boldsymbol{U}_{l}(\theta_{t})bold_italic_U start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_θ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) stands for the l𝑙litalic_l-th quantum layer at the specific time t𝑡titalic_t with its corresponding set of trainable parameters. Observe that 𝑼l(θt)subscript𝑼𝑙subscript𝜃𝑡\boldsymbol{U}_{l}(\theta_{t})bold_italic_U start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_θ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) takes the trainable parameters as inputs, therefore it works differently from the encoder’s gates.

  • Measurement. The quantum state that is acquired by PQC is utilized as the input for measurement. In this process, quantum data is decoded back to the original format before performing measurements on the input. The z-axis is commonly used for measurements, but axes in other directions can also be used if they are appropriately defined. The quantum state collapses and its properties become observable after the quantum state is measured. Upon completion of the decoding procedure, the observable property is employed to minimize the loss function. Achieving the expected decoded value of the quantum state |ψtketsubscript𝜓𝑡\left|\psi_{t}\right\rangle| italic_ψ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ⟩ can be accomplished through ψt|O|ψtquantum-operator-productsubscript𝜓𝑡𝑂subscript𝜓𝑡\left\langle\psi_{t}\right|O\left|\psi_{t}\right\rangle⟨ italic_ψ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT | italic_O | italic_ψ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ⟩, where |ψt=l=1L𝑼l(θt)UENC(ζt)|0qketsubscript𝜓𝑡superscriptsubscriptproduct𝑙1𝐿subscript𝑼𝑙subscript𝜃𝑡subscript𝑈ENCsubscript𝜁𝑡superscriptket0tensor-productabsent𝑞\left|\psi_{t}\right\rangle=\prod_{l=1}^{L}\boldsymbol{U}_{l}(\theta_{t})U_{% \text{ENC}}(\zeta_{t})\left|0\right\rangle^{\otimes q}| italic_ψ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ⟩ = ∏ start_POSTSUBSCRIPT italic_l = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT bold_italic_U start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_θ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) italic_U start_POSTSUBSCRIPT ENC end_POSTSUBSCRIPT ( italic_ζ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) | 0 ⟩ start_POSTSUPERSCRIPT ⊗ italic_q end_POSTSUPERSCRIPT, ψt|brasubscript𝜓𝑡\left\langle\psi_{t}\right|⟨ italic_ψ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT | denotes the conjugate transpose of |ψtketsubscript𝜓𝑡\left|\psi_{t}\right\rangle| italic_ψ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ⟩, and O𝑂Oitalic_O represents the observable, respectively.

2.3 QMARL for Scheduling

This section investigates the use of QMARL for scheduling CubeSats and HALE-UAVs, presenting a strong argument for its preference over conventional MARL approaches. Conventional MARL has been effective for optimizing decisions in scenarios with relatively small action dimensions. Nonetheless, within intricate systems like integrated networks using CubeSats/HALE-UAVs, characterized by exponentially vast action dimensions, the efficacy of conventional MARL diminishes due to computational burden and the inefficacy in managing extensive action spaces. The expansion of the action dimension introduces the challenge of the curse of dimensionality [33], a significant impediment in conventional MARL frameworks. QMARL, empowered by quantum computing features such as superposition and entanglement, offers a significant computational edge [34]. This quantum advantage allows QMARL to efficiently process large-scale data and complex decision matrices [35], presenting a superior solution for the extensive action dimensions encountered in integrated networks using CubeSats/HALE-UAVs. Moreover, the multi-agent dynamics of these integrated networks involving many communicating devices such as multiple GSs, CubeSats, and HALE-UAVs make the scheduling decision-making problem more complex. QMARL signifies a crucial advancement in overcoming the challenges of high-dimensional and complex scheduling tasks for integrated networks using CubeSats/HALE-UAVs. Its enhanced computational strength and ability to effectively manage multi-agent scenarios establish it as a powerful and efficient approach, facilitating the development of more sophisticated, effective, and dependable SAGIN.

3 Modeling

3.1 Global SAGIN Access Scheduling Modeling

The considered global SAGIN is illustrated in Fig. 1 and structured around three principal elements, N𝑁Nitalic_N GSs, a fleet of M𝑀Mitalic_M CubeSats, and a group of L𝐿Litalic_L HALE-UAVs. Each GS is denoted as Gisubscript𝐺𝑖G_{i}italic_G start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, i𝒩𝑖𝒩i\in\mathcal{N}italic_i ∈ caligraphic_N, and note that |𝒩|N𝒩𝑁|\mathcal{N}|\triangleq N| caligraphic_N | ≜ italic_N. In addition, CubeSats and HALE-UAVs are denoted as Sjsubscript𝑆𝑗S_{j}italic_S start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT and Alsubscript𝐴𝑙A_{l}italic_A start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT, respectively, where Sj,jsubscript𝑆𝑗𝑗S_{j},j\in\mathcal{M}italic_S start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_j ∈ caligraphic_M and Al,lsubscript𝐴𝑙𝑙A_{l},l\in\mathcal{L}italic_A start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT , italic_l ∈ caligraphic_L, and also note that ||M𝑀|\mathcal{M}|\triangleq M| caligraphic_M | ≜ italic_M and ||L𝐿|\mathcal{L}|\triangleq L| caligraphic_L | ≜ italic_L. Our proposed scheduling works by each GS Gisubscript𝐺𝑖G_{i}italic_G start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT to establish the communications with CubeSats Sjisubscriptsuperscript𝑆𝑖𝑗S^{i}_{j}italic_S start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT or HALE-UAVs Alisubscriptsuperscript𝐴𝑖𝑙A^{i}_{l}italic_A start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT that are located within the coverage of Gisubscript𝐺𝑖G_{i}italic_G start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, for network access services. The main purpose of this scheduling is for maximizing (i) the residual energy amounts of NTN devices, (ii) the fair energy consumption among NTN devices, and (iii) the global access performance in terms of capacity and QoS, in SAGIN systems.

3.2 HALE-UAV

Refer to caption
(a) z𝑧zitalic_z-axis transformation (yawing).
Refer to caption
(b) y𝑦yitalic_y-axis transformation (pitching).
Refer to caption
(c) x𝑥xitalic_x-axis transformation (rolling).
Figure 3: Flight aerodynamics of HALE-UAV.

In order to ensure the maneuvers of HALE-UAVs while maintaining the equilibrium among the energy levels of HALE-UAVs, energy expenditure modeling for HALE-UAV is essential. The required energy is the minimum energy amount to overcome aerodynamic drag and advance in each HALE-UAV. The energy is equivalent to the work per unit over time under the force applied to the dynamic system, and it is defined as the dot product of force and velocity. Therefore, the required energy of the l𝑙litalic_l-th HALE-UAV at time t𝑡titalic_t, denoted as ElA(t)subscriptsuperscript𝐸𝐴𝑙𝑡E^{A}_{l}(t)italic_E start_POSTSUPERSCRIPT italic_A end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ), is defined as ElA(t)=DVsubscriptsuperscript𝐸𝐴𝑙𝑡𝐷𝑉E^{A}_{l}(t)=DVitalic_E start_POSTSUPERSCRIPT italic_A end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) = italic_D italic_V, where D𝐷Ditalic_D and V𝑉Vitalic_V denote its drag and velocity at time t𝑡titalic_t, respectively. Here, drag D𝐷Ditalic_D can be obtained as D=12ρV2SCD=qSCD𝐷12𝜌superscript𝑉2𝑆subscript𝐶𝐷𝑞𝑆subscript𝐶𝐷D=\frac{1}{2}\rho V^{2}SC_{D}=qSC_{D}italic_D = divide start_ARG 1 end_ARG start_ARG 2 end_ARG italic_ρ italic_V start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_S italic_C start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT = italic_q italic_S italic_C start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT, where CDsubscript𝐶𝐷C_{D}italic_C start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT is drag coefficient. Because CDsubscript𝐶𝐷C_{D}italic_C start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT is expressed as CD=CD0+kCL2subscript𝐶𝐷subscript𝐶subscript𝐷0𝑘superscriptsubscript𝐶𝐿2C_{D}=C_{D_{0}}+kC_{L}^{2}italic_C start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT = italic_C start_POSTSUBSCRIPT italic_D start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_POSTSUBSCRIPT + italic_k italic_C start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT and CLsubscript𝐶𝐿C_{L}italic_C start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT is expressed as CL=W12ρV2S=WqSsubscript𝐶𝐿𝑊12𝜌superscript𝑉2𝑆𝑊𝑞𝑆C_{L}=\frac{W}{\frac{1}{2}\rho V^{2}S}=\frac{W}{qS}italic_C start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT = divide start_ARG italic_W end_ARG start_ARG divide start_ARG 1 end_ARG start_ARG 2 end_ARG italic_ρ italic_V start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_S end_ARG = divide start_ARG italic_W end_ARG start_ARG italic_q italic_S end_ARG, the required energy of the l𝑙litalic_l-th HALE-UAV at time t𝑡titalic_t, i.e., ElA(t)subscriptsuperscript𝐸𝐴𝑙𝑡E^{A}_{l}(t)italic_E start_POSTSUPERSCRIPT italic_A end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ), is,

ElA(t)=12CD0ρV3Sparasite energy, Pp+kW212ρSVinduced energy, Pi=qSCD0Vparasite energy, Pp+W2kVqSinduced energy, Pi,subscriptsuperscript𝐸𝐴𝑙𝑡subscript12subscript𝐶subscript𝐷0𝜌superscript𝑉3𝑆parasite energy, Ppsubscript𝑘superscript𝑊212𝜌𝑆𝑉induced energy, Pisubscript𝑞𝑆subscript𝐶subscript𝐷0𝑉parasite energy, Ppsubscriptsuperscript𝑊2𝑘𝑉𝑞𝑆induced energy, PiE^{A}_{l}(t)=\underbrace{\frac{1}{2}C_{D_{0}}\rho V^{3}S}_{\textrm{parasite % energy, $P_{p}$}}+\underbrace{\frac{kW^{2}}{\frac{1}{2}\rho SV}}_{\textrm{% induced energy, $P_{i}$}}\\ =\underbrace{qSC_{D_{0}}V}_{\textrm{parasite energy, $P_{p}$}}+\underbrace{% \frac{W^{2}kV}{qS}}_{\textrm{induced energy, $P_{i}$}},start_ROW start_CELL italic_E start_POSTSUPERSCRIPT italic_A end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) = under⏟ start_ARG divide start_ARG 1 end_ARG start_ARG 2 end_ARG italic_C start_POSTSUBSCRIPT italic_D start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_ρ italic_V start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT italic_S end_ARG start_POSTSUBSCRIPT parasite energy, italic_P start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT end_POSTSUBSCRIPT + under⏟ start_ARG divide start_ARG italic_k italic_W start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG divide start_ARG 1 end_ARG start_ARG 2 end_ARG italic_ρ italic_S italic_V end_ARG end_ARG start_POSTSUBSCRIPT induced energy, italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL = under⏟ start_ARG italic_q italic_S italic_C start_POSTSUBSCRIPT italic_D start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_V end_ARG start_POSTSUBSCRIPT parasite energy, italic_P start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT end_POSTSUBSCRIPT + under⏟ start_ARG divide start_ARG italic_W start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_k italic_V end_ARG start_ARG italic_q italic_S end_ARG end_ARG start_POSTSUBSCRIPT induced energy, italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT , end_CELL end_ROW (1)

where CD0subscript𝐶subscript𝐷0C_{D_{0}}italic_C start_POSTSUBSCRIPT italic_D start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_POSTSUBSCRIPT, ρ𝜌\rhoitalic_ρ, V𝑉Vitalic_V, S𝑆Sitalic_S, k𝑘kitalic_k, W𝑊Witalic_W, and q𝑞qitalic_q are the parasite drag coefficient at zero lift, density of the air, velocity, wing surface area, induced drag coefficient, HALE-UAV weight, and dynamic pressure (q=12ρV2𝑞12𝜌superscript𝑉2q=\frac{1}{2}\rho V^{2}italic_q = divide start_ARG 1 end_ARG start_ARG 2 end_ARG italic_ρ italic_V start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT[36], respectively. As expressed in (1), the required energy is composed of the parasite energy and induced energy [37]. Here, the parasite energy arises from parasite drag, encompassing skin friction drag (drag that varies with the UAV’s surface texture), form drag (drag that depends on the HALE-UAV’s size, structure, and shape), and interference drag (drag generated from the interaction between skin friction and form drag) [38]. In addition, the induced energy originates from the drag produced by generating lift. This type of drag is caused by wingtip vortices, resulting from the differential pressure on the wing’s upper and lower surfaces, which in turn creates downwash at the wing’s rear. Accordingly, Ppsubscript𝑃𝑝P_{p}italic_P start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT increases with the cube of velocity, whereas Pisubscript𝑃𝑖P_{i}italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT is inversely related to velocity, demonstrating the dynamics of aerodynamic drag in relation to the UAV’s velocity [39].

On the other hand, velocity V𝑉Vitalic_V is computed as the aggregate of velocities along each axis, formulated as V=u2+v2+w2𝑉superscript𝑢2superscript𝑣2superscript𝑤2V=\sqrt{u^{2}+v^{2}+w^{2}}italic_V = square-root start_ARG italic_u start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_v start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_w start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG, where u𝑢uitalic_u, v𝑣vitalic_v, and w𝑤witalic_w represent the velocities over the x𝑥xitalic_x-, y𝑦yitalic_y-, and z𝑧zitalic_z-axes of body axis coordinate system, respectively. Here, velocity V𝑉Vitalic_V in (1) is the velocity based on the body axis coordinate system of aircraft. Nevertheless, due to the fact that the velocities of HALE-UAVs for each axis are determined with the relation to the ground coordinate system, it is imperative to utilize coordinate transformation matrices. Therefore, velocities u1subscript𝑢1u_{1}italic_u start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, v1subscript𝑣1v_{1}italic_v start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, and w1subscript𝑤1w_{1}italic_w start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT in the ground coordinate system are transformed into the velocities u𝑢uitalic_u, v𝑣vitalic_v, and w𝑤witalic_w within the body axis coordinate system through multiplication by the coordinate transformation matrices L1subscript𝐿1L_{1}italic_L start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, L2subscript𝐿2L_{2}italic_L start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, and L3subscript𝐿3L_{3}italic_L start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT, which is expressed as,

[uvw]=L1×L2×L3×[u1v1w1],matrix𝑢𝑣𝑤subscript𝐿1subscript𝐿2subscript𝐿3matrixsubscript𝑢1subscript𝑣1subscript𝑤1\begin{bmatrix}u\\ v\\ w\end{bmatrix}=L_{1}\times L_{2}\times L_{3}\times\begin{bmatrix}u_{1}\\ v_{1}\\ w_{1}\end{bmatrix},[ start_ARG start_ROW start_CELL italic_u end_CELL end_ROW start_ROW start_CELL italic_v end_CELL end_ROW start_ROW start_CELL italic_w end_CELL end_ROW end_ARG ] = italic_L start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT × italic_L start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT × italic_L start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT × [ start_ARG start_ROW start_CELL italic_u start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_v start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_w start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] , (2)

where L1subscript𝐿1L_{1}italic_L start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, L2subscript𝐿2L_{2}italic_L start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, and L3subscript𝐿3L_{3}italic_L start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT are the transformation matrices over the z𝑧zitalic_z-axis, y𝑦yitalic_y-axis, and x𝑥xitalic_x-axes, sequentially. The geometric relationships among these transformations are illustrated in Fig. 3, and the transformation of coordinates for each axis can be articulated via,

[u2v2w2]=[cosψsinψ0sinψcosψ0001]L1[u1v1w1],matrixsubscript𝑢2subscript𝑣2subscript𝑤2subscriptmatrix𝜓𝜓0𝜓𝜓0001subscript𝐿1matrixsubscript𝑢1subscript𝑣1subscript𝑤1\begin{bmatrix}u_{2}\\ v_{2}\\ w_{2}\end{bmatrix}=\underbrace{\begin{bmatrix}\cos\psi&\sin\psi&0\\ -\sin\psi&\cos\psi&0\\ 0&0&1\\ \end{bmatrix}}_{L_{1}}\begin{bmatrix}u_{1}\\ v_{1}\\ w_{1}\end{bmatrix},[ start_ARG start_ROW start_CELL italic_u start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_v start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_w start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] = under⏟ start_ARG [ start_ARG start_ROW start_CELL roman_cos italic_ψ end_CELL start_CELL roman_sin italic_ψ end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL - roman_sin italic_ψ end_CELL start_CELL roman_cos italic_ψ end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 1 end_CELL end_ROW end_ARG ] end_ARG start_POSTSUBSCRIPT italic_L start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ start_ARG start_ROW start_CELL italic_u start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_v start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_w start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] , (3)
[u3v3w3]=[cosθ0sinθ010sinθ0cosθ]L2[u2v2w2],matrixsubscript𝑢3subscript𝑣3subscript𝑤3subscriptmatrix𝜃0𝜃010𝜃0𝜃subscript𝐿2matrixsubscript𝑢2subscript𝑣2subscript𝑤2\begin{bmatrix}u_{3}\\ v_{3}\\ w_{3}\end{bmatrix}=\underbrace{\begin{bmatrix}\cos\theta&0&-\sin\theta\\ 0&1&0\\ \sin\theta&0&\cos\theta\\ \end{bmatrix}}_{L_{2}}\begin{bmatrix}u_{2}\\ v_{2}\\ w_{2}\end{bmatrix},[ start_ARG start_ROW start_CELL italic_u start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_v start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_w start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] = under⏟ start_ARG [ start_ARG start_ROW start_CELL roman_cos italic_θ end_CELL start_CELL 0 end_CELL start_CELL - roman_sin italic_θ end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 1 end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL roman_sin italic_θ end_CELL start_CELL 0 end_CELL start_CELL roman_cos italic_θ end_CELL end_ROW end_ARG ] end_ARG start_POSTSUBSCRIPT italic_L start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ start_ARG start_ROW start_CELL italic_u start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_v start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_w start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] , (4)
[uvw]=[1000cosϕsinϕ0sinϕcosϕ]L3[u3v3w3],matrix𝑢𝑣𝑤subscriptmatrix1000italic-ϕitalic-ϕ0italic-ϕitalic-ϕsubscript𝐿3matrixsubscript𝑢3subscript𝑣3subscript𝑤3\begin{bmatrix}u\\ v\\ w\end{bmatrix}=\underbrace{\begin{bmatrix}1&0&0\\ 0&\cos\phi&\sin\phi\\ 0&-\sin\phi&\cos\phi\\ \end{bmatrix}}_{L_{3}}\begin{bmatrix}u_{3}\\ v_{3}\\ w_{3}\end{bmatrix},[ start_ARG start_ROW start_CELL italic_u end_CELL end_ROW start_ROW start_CELL italic_v end_CELL end_ROW start_ROW start_CELL italic_w end_CELL end_ROW end_ARG ] = under⏟ start_ARG [ start_ARG start_ROW start_CELL 1 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL roman_cos italic_ϕ end_CELL start_CELL roman_sin italic_ϕ end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL - roman_sin italic_ϕ end_CELL start_CELL roman_cos italic_ϕ end_CELL end_ROW end_ARG ] end_ARG start_POSTSUBSCRIPT italic_L start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ start_ARG start_ROW start_CELL italic_u start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_v start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_w start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] , (5)

where ψ𝜓\psiitalic_ψ, θ𝜃\thetaitalic_θ, and ϕitalic-ϕ\phiitalic_ϕ represent the rotations over the z𝑧zitalic_z-, y𝑦yitalic_y-, and z𝑧zitalic_z-axes, respectively. Within the real flight environment of HALE-UAVs, such disturbances are attributable to turbulence and wind gusts, which have the potential to alter the UAV’s rotational orientation. Amidst conditions where turbulence and gusts are prevalent across all axes, the goal of HALE-UAV is to simultaneously optimize the global access performance of the integrated network and the energy use of HALE-UAV. Details pertaining to the HALE-UAV deployed in this paper are compiled in Table I.

TABLE I: Specifications of HALE-UAV.
Notation Value
Mass of HALE-UAV, m𝑚mitalic_m 1,815 [kgkg\mathrm{kg}roman_kg]
Acceleration of gravity, g 9.81 [m/s2msuperscripts2\mathrm{m/s^{2}}roman_m / roman_s start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT]
Weight of HALE-UAV, W=mg𝑊𝑚𝑔{W}={mg}italic_W = italic_m italic_g 17,799 [NN\mathrm{N}roman_N]
Wing surface area, S 6.61 [m2superscriptm2\mathrm{m^{2}}roman_m start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT]
Density of the air, ρ𝜌\rhoitalic_ρ 0.089 [kg/m3kgsuperscriptm3\mathrm{kg/m^{3}}roman_kg / roman_m start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT]
Parasite drag coefficient at zero lift, CD0subscript𝐶subscript𝐷0C_{D_{0}}italic_C start_POSTSUBSCRIPT italic_D start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_POSTSUBSCRIPT 0.045
Induced drag coefficient, k𝑘kitalic_k 0.052

3.3 CubeSat

3.3.1 Two Line Element (TLE)

Refer to caption
Figure 4: TLE configuration of the satellite used in the experiment..

In order to observe the orbital mechanics of CubeSats, TLE is essentially required. Originating from the North American Aerospace Defense Command (NORAD), TLE contains the vital details concerning the trajectories of objects orbiting the Earth, especially for CubeSats. NORAD, tasked with the surveillance and cataloging of space debris, introduced the TLE format to effectively disseminate orbital information. The structure of TLE consists of two lines as illustrated in Fig. 4, detailing specific orbital parameters and CubeSat characteristics. Fig. 4 displays the TLE for OPS-3811, a CubeSat utilized in the experiment, encompassing orbital elements such as inclination (i𝑖iitalic_i), ascending node (ΩΩ\Omegaroman_Ω), eccentricity (e𝑒eitalic_e), argument of perigee (ω𝜔\omegaitalic_ω), and mean anomaly (M𝑀Mitalic_M). The inclination (i𝑖iitalic_i) signifies the CubeSat’s orbital plane angle relative to the equatorial plane of the Earth. The ascending node (ΩΩ\Omegaroman_Ω) specifies the location where the CubeSat’s orbit crosses the equatorial plane from south to north, also known as the right ascension of the line of nodes. The eccentricity (e𝑒eitalic_e) is a measure of how far a CubeSat’s elliptical orbit deviates from a circle. The argument of perigee (ω𝜔\omegaitalic_ω) is the angle from the line of nodes to the perigee of the orbit. The mean anomaly (M𝑀Mitalic_M) indicates the CubeSat’s current position within its orbit, assuming a circular path with the same semi-major axis (a𝑎aitalic_a). In other words, the mean anomaly is the angle between the current position of the CubeSat and the perigee of the orbit, assuming that the CubeSat moves at an average speed when moving along an elliptical orbit. These TLE data, such as e𝑒eitalic_e and ΩΩ\Omegaroman_Ω, are instrumental in calculating the CubeSat’s latitude, longitude, facilitating the determination of xji(t)subscriptsuperscript𝑥𝑖𝑗𝑡x^{i}_{j}(t)italic_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) between Gisubscript𝐺𝑖G_{i}italic_G start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and Sjisubscriptsuperscript𝑆𝑖𝑗S^{i}_{j}italic_S start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT, by (15).

3.3.2 Orbital Elements of CubeSats

Refer to caption
(a) Geometry of orbital elements.
Refer to caption
(b) Coordinate of the Earth.
Figure 5: Orbital elements of CubeSat and the geometric relationship of great circle distance between two CubeSats.

As mentioned, the orbital elements expressed in TLE include eccentricity (e𝑒eitalic_e), inclination (i𝑖iitalic_i), right ascension of the ascending node (ΩΩ\Omegaroman_Ω), argument of perigee (ω𝜔\omegaitalic_ω), and mean anomaly (M𝑀Mitalic_M). The orbital elements that are not in TLE, such as semi-major axis (a𝑎aitalic_a), eccentric anomaly (E𝐸Eitalic_E), and true anomaly (ν𝜈\nuitalic_ν), are obtained using the orbital elements in TLE. Fig. 5(a) presents the geometric representation of orbital elements. The semi-major axis (a𝑎aitalic_a), illustrated with a green line, denotes the CubeSat’s orbit’s longest radius, crucial for calculating its eccentricity (e𝑒eitalic_e). The eccentricity itself measures how much the orbit deviates from a perfect circle, with values close to 00 indicating near circularity and values near 1111 highlighting an elliptical shape. The eccentricity vector (e𝑒\overrightarrow{e}over→ start_ARG italic_e end_ARG) is a vector that goes from the center of the CubeSat’s orbit to the perigee of the orbit. Additionally, the orbital inclination (i𝑖iitalic_i) is assessed as the angle between the orbit’s normal axis (k𝑘\overrightarrow{k}over→ start_ARG italic_k end_ARG) and its angular momentum vector (H𝐻\overrightarrow{H}over→ start_ARG italic_H end_ARG), with the latter perpendicular to the plane of the orbit, thereby quantifying the orbit’s tilt with respect to the equatorial plane of the Earth. The ascending node (ΩΩ\Omegaroman_Ω) signifies the line of nodes’s longitude, which is the point where the CubeSat’s orbital plane intersects the Earth’s equatorial plane. The argument of perigee (ω𝜔\omegaitalic_ω) is defined by the angle from the ascending node vector (n𝑛\overrightarrow{n}over→ start_ARG italic_n end_ARG) to the eccentricity vector (e𝑒\overrightarrow{e}over→ start_ARG italic_e end_ARG), with n𝑛\overrightarrow{n}over→ start_ARG italic_n end_ARG directing towards the line of nodes, depicted as a sky blue line in Fig. 5(a). This angle delineates the orbit’s orientation relative to the equator, marking the perigee’s location. The mean anomaly (M𝑀Mitalic_M) is a parameter for predicting the position of a CubeSat moving along an elliptical orbit over time, and is expressed as an angle representing the average position of the object within the orbital period, aiding in the calculation of the eccentric anomaly (E𝐸Eitalic_E). In an elliptical orbit, the CubeSat’s velocity changes as it passes through periapsis (the closest point) and apogee (the farthest point), but mean anomaly does not take these velocity changes into account and assumes that it moves at a uniform velocity. Therefore, a difference may occur between the actual position of the CubeSat and the position calculated by mean anomaly, and eccentric anomaly and true anomaly are used to correct this difference. The mean anomaly does not directly correspond to the actual CubeSat position, but is used as an initial value to calculate more accurate positions, such as the eccentric anomaly and true anomaly, using the eccentricity of the orbit and other orbital elements. Therefore, the mean anomaly plays an important role when modeling trajectories as a function of time. Finally, the true anomaly (ν𝜈\nuitalic_ν) is the angle from the perigee to the CubeSat’s actual position, represented by the angle between vectors r𝑟\overrightarrow{r}over→ start_ARG italic_r end_ARG and e𝑒\overrightarrow{e}over→ start_ARG italic_e end_ARG, where r𝑟\overrightarrow{r}over→ start_ARG italic_r end_ARG points from the origin of the coordinate system to the CubeSat, and the coordinate axis i𝑖\overrightarrow{i}over→ start_ARG italic_i end_ARG aims towards the vernal equinox.

3.3.3 Latitude and Longitude of CubeSat

To ascertain the locations of CubeSats change over time, their positions are represented through coordinates of latitude (pjϕ(t)subscriptsuperscript𝑝italic-ϕ𝑗𝑡p^{\phi}_{j}(t)italic_p start_POSTSUPERSCRIPT italic_ϕ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t )) and longitude (pjλ(t)subscriptsuperscript𝑝𝜆𝑗𝑡p^{\lambda}_{j}(t)italic_p start_POSTSUPERSCRIPT italic_λ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t )) within the orbital coordinate systems. Given that the CubeSat’s unprocessed data in TLE consist of the coordinates in the celestial coordinate systems, the transformation to the orbital coordinate systems is required for the derivation of latitude and longitude. The latitude and longitude that change over time for each CubeSat are calculated through TLE, which is raw CubeSat data. Consequently, the latitude (pjϕ(t)subscriptsuperscript𝑝italic-ϕ𝑗𝑡p^{\phi}_{j}(t)italic_p start_POSTSUPERSCRIPT italic_ϕ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t )) and longitude (pjλ(t)subscriptsuperscript𝑝𝜆𝑗𝑡p^{\lambda}_{j}(t)italic_p start_POSTSUPERSCRIPT italic_λ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t )) pertaining to the current position of CubeSat Sjisubscriptsuperscript𝑆𝑖𝑗S^{i}_{j}italic_S start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT, i.e., the j𝑗jitalic_j-th CubeSat located within the coverage of the i𝑖iitalic_i-th GS, are articulated as, pjϕ(t)=sin1(Rf[3]Rf)subscriptsuperscript𝑝italic-ϕ𝑗𝑡superscript1subscript𝑅𝑓delimited-[]3delimited-∥∥subscript𝑅𝑓p^{\phi}_{j}(t)=\sin^{-1}\left(\frac{R_{f}[3]}{\lVert R_{f}\rVert}\right)italic_p start_POSTSUPERSCRIPT italic_ϕ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) = roman_sin start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( divide start_ARG italic_R start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT [ 3 ] end_ARG start_ARG ∥ italic_R start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT ∥ end_ARG ) and pjλ(t)=cos1(Rf[1]Rfcosϕ)subscriptsuperscript𝑝𝜆𝑗𝑡superscript1subscript𝑅𝑓delimited-[]1delimited-∥∥subscript𝑅𝑓italic-ϕp^{\lambda}_{j}(t)=\cos^{-1}\left(\frac{R_{f}[1]}{\lVert R_{f}\rVert\cos\phi}\right)italic_p start_POSTSUPERSCRIPT italic_λ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) = roman_cos start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( divide start_ARG italic_R start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT [ 1 ] end_ARG start_ARG ∥ italic_R start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT ∥ roman_cos italic_ϕ end_ARG ), where Rf[1]subscript𝑅𝑓delimited-[]1R_{f}[1]italic_R start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT [ 1 ] and Rf[3]subscript𝑅𝑓delimited-[]3R_{f}[3]italic_R start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT [ 3 ] refer to Rfsubscript𝑅𝑓R_{f}italic_R start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT’s first and third elements, and this Rfsubscript𝑅𝑓R_{f}italic_R start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT is defined as,

Rf[C1×C2×C3×C4]×V4.subscript𝑅𝑓delimited-[]subscript𝐶1subscript𝐶2subscript𝐶3subscript𝐶4subscript𝑉4R_{f}\triangleq[C_{1}\times C_{2}\times C_{3}\times C_{4}]\times V_{4}.italic_R start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT ≜ [ italic_C start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT × italic_C start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT × italic_C start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT × italic_C start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT ] × italic_V start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT . (6)

In (6), the coordinate transformation matrices, C1subscript𝐶1C_{1}italic_C start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, C2subscript𝐶2C_{2}italic_C start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, C3subscript𝐶3C_{3}italic_C start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT, and C4subscript𝐶4C_{4}italic_C start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT, are

C1=[cos(Ω)sin(Ω)0sin(Ω)cos(Ω)0001],C2=[1000cos(i)sin(i)0sin(i)cos(i)],formulae-sequencesubscript𝐶1matrixΩΩ0ΩΩ0001subscript𝐶2matrix1000𝑖𝑖0𝑖𝑖\displaystyle C_{1}=\begin{bmatrix}\cos(\Omega)&\sin(\Omega)&0\\ -\sin(\Omega)&\cos(\Omega)&0\\ 0&0&1\end{bmatrix},C_{2}=\begin{bmatrix}1&0&0\\ 0&\cos(i)&\sin(i)\\ 0&-\sin(i)&\cos(i)\end{bmatrix},italic_C start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = [ start_ARG start_ROW start_CELL roman_cos ( roman_Ω ) end_CELL start_CELL roman_sin ( roman_Ω ) end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL - roman_sin ( roman_Ω ) end_CELL start_CELL roman_cos ( roman_Ω ) end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 1 end_CELL end_ROW end_ARG ] , italic_C start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = [ start_ARG start_ROW start_CELL 1 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL roman_cos ( italic_i ) end_CELL start_CELL roman_sin ( italic_i ) end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL - roman_sin ( italic_i ) end_CELL start_CELL roman_cos ( italic_i ) end_CELL end_ROW end_ARG ] ,
C3=[cos(ω)sin(ω)0sin(ω)cos(ω)0001],C4=[cos(θ)sin(θ)0sin(θ)cos(θ)0001],formulae-sequencesubscript𝐶3matrix𝜔𝜔0𝜔𝜔0001subscript𝐶4matrix𝜃𝜃0𝜃𝜃0001\displaystyle C_{3}=\begin{bmatrix}\cos({\omega})&\sin({\omega})&0\\ -\sin({\omega})&\cos({\omega})&0\\ 0&0&1\end{bmatrix},C_{4}=\begin{bmatrix}\cos({\theta})&\sin({\theta})&0\\ -\sin({\theta})&\cos({\theta})&0\\ 0&0&1\end{bmatrix},italic_C start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT = [ start_ARG start_ROW start_CELL roman_cos ( italic_ω ) end_CELL start_CELL roman_sin ( italic_ω ) end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL - roman_sin ( italic_ω ) end_CELL start_CELL roman_cos ( italic_ω ) end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 1 end_CELL end_ROW end_ARG ] , italic_C start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT = [ start_ARG start_ROW start_CELL roman_cos ( italic_θ ) end_CELL start_CELL roman_sin ( italic_θ ) end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL - roman_sin ( italic_θ ) end_CELL start_CELL roman_cos ( italic_θ ) end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 1 end_CELL end_ROW end_ARG ] ,

where θ𝜃\thetaitalic_θ is the angle by which the Earth has rotated in t𝑡titalic_t. Therefore, θ𝜃\thetaitalic_θ represents the product of the Earth’s rotational angular velocity and the time interval t𝑡titalic_t. Lastly, V4subscript𝑉4V_{4}italic_V start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT in (6) is,

V4=[rcos(ν)rsin(ν)0]T,subscript𝑉4superscriptmatrix𝑟𝜈𝑟𝜈0𝑇V_{4}=\begin{bmatrix}r\cos({\nu})~{}~{}r\sin(\nu)~{}~{}0\end{bmatrix}^{T},italic_V start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT = [ start_ARG start_ROW start_CELL italic_r roman_cos ( italic_ν ) italic_r roman_sin ( italic_ν ) 0 end_CELL end_ROW end_ARG ] start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT , (7)

where r𝑟ritalic_r denotes the conic section, and this r𝑟ritalic_r is a clue to compute the distance between the center of the elliptical orbit and CubeSat. Additionally, r𝑟\overrightarrow{r}over→ start_ARG italic_r end_ARG is the vector pointing from the center of the elliptical orbit to the current position of CubeSat. Therefore, the current coordinates of CubeSat measured in the celestial coordinate system are expressed as (7). However, in order to calculate the CubeSat’s latitude and longitude that change over time, V4subscript𝑉4V_{4}italic_V start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT in the celestial coordinate system must be converted to the orbital coordinate system, and the previously defined coordinate transformation matrices are utilized. The corresponding coordinate transformation matrices, denoted as C1subscript𝐶1C_{1}italic_C start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, C2subscript𝐶2C_{2}italic_C start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, C3subscript𝐶3C_{3}italic_C start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT, and C4subscript𝐶4C_{4}italic_C start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT, facilitate the conversion of celestial coordinate systems into orbital coordinate systems. Finally, r𝑟ritalic_r in (7) is determined by

r=H2/μ1+ecos(ν),𝑟superscript𝐻2𝜇1𝑒𝜈r=\frac{H^{2}/\mu}{1+e\cos(\nu)},italic_r = divide start_ARG italic_H start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT / italic_μ end_ARG start_ARG 1 + italic_e roman_cos ( italic_ν ) end_ARG , (8)

where μ𝜇\muitalic_μ and H𝐻Hitalic_H represents the standard gravitational parameter and angular momentum, respectively, where Hμa(1e2)𝐻𝜇𝑎1superscript𝑒2H\triangleq\sqrt{\mu a(1-e^{2})}italic_H ≜ square-root start_ARG italic_μ italic_a ( 1 - italic_e start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) end_ARG and ν=2tan1(1+e1etan(E2))𝜈2superscript11𝑒1𝑒𝐸2\nu=2\tan^{-1}\left(\sqrt{\frac{1+e}{1-e}}\tan\left(\frac{E}{2}\right)\right)italic_ν = 2 roman_tan start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( square-root start_ARG divide start_ARG 1 + italic_e end_ARG start_ARG 1 - italic_e end_ARG end_ARG roman_tan ( divide start_ARG italic_E end_ARG start_ARG 2 end_ARG ) ), where E=M+esinM𝐸𝑀𝑒𝑀E=M+e\sin{M}italic_E = italic_M + italic_e roman_sin italic_M. Here, the data from TLE are transformed into geographical coordinates, i.e., latitude and longitude, over time. The constants needed to calculate the latitude and longitude of a CubeSat that change over time through TLE are summarized in Table II.

TABLE II: Parameter Settings for CubeSat Position Calculations
Constant Value
Gravitational Constant, G𝐺Gitalic_G 6.673 e𝑒eitalic_e-20
Mass of the Earth, Mesubscript𝑀𝑒M_{e}italic_M start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT 5.974 e𝑒eitalic_e+24 kg
Radius of the Earth, Resubscript𝑅𝑒R_{e}italic_R start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT 6.378 e+6 m
Standard Gravitational Parameter, μ𝜇\muitalic_μ = GMe𝐺subscript𝑀𝑒GM_{e}italic_G italic_M start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT 3.986 e+14 m3superscript𝑚3m^{3}italic_m start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT s2superscript𝑠2s^{-2}italic_s start_POSTSUPERSCRIPT - 2 end_POSTSUPERSCRIPT

3.3.4 Distance between GS and CubeSat

The distance between GSs and NTN devices (i.e., CubeSats and HALE-UAVs) can be formulated as follows.

Lemma 1.

The distance between Gisubscript𝐺𝑖G_{i}italic_G start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and Sjisubscriptsuperscript𝑆𝑖𝑗S^{i}_{j}italic_S start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT, varies over time due to the updated latitude and longitude of the CubeSat. It can be formulated as,

dji(t)=Hji(t)2+Vji(t)2,subscriptsuperscript𝑑𝑖𝑗𝑡subscriptsuperscript𝐻𝑖𝑗superscript𝑡2subscriptsuperscript𝑉𝑖𝑗superscript𝑡2d^{i}_{j}(t)=\sqrt{H^{i}_{j}(t)^{2}+V^{i}_{j}(t)^{2}},italic_d start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) = square-root start_ARG italic_H start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_V start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG , (9)

where Hji(t)subscriptsuperscript𝐻𝑖𝑗𝑡H^{i}_{j}(t)italic_H start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) and Vji(t)subscriptsuperscript𝑉𝑖𝑗𝑡V^{i}_{j}(t)italic_V start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) represent the respective horizontal and vertical distances between Gisubscript𝐺𝑖G_{i}italic_G start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and Sjisuperscriptsubscript𝑆𝑗𝑖S_{j}^{i}italic_S start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT, and note that Vji(t)subscriptsuperscript𝑉𝑖𝑗𝑡V^{i}_{j}(t)italic_V start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) indicates the altitude of Sjisubscriptsuperscript𝑆𝑖𝑗S^{i}_{j}italic_S start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT relative to Gisubscript𝐺𝑖G_{i}italic_G start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT. Then,

Hji(t)=Recos1(cospiϕ(t)cospjϕ(t)cos(piλ(t)pjλ(t))+sinpiϕ(t)sinpjϕ(t)),subscriptsuperscript𝐻𝑖𝑗𝑡subscript𝑅𝑒superscript1subscriptsuperscript𝑝italic-ϕ𝑖𝑡subscriptsuperscript𝑝italic-ϕ𝑗𝑡subscriptsuperscript𝑝𝜆𝑖𝑡subscriptsuperscript𝑝𝜆𝑗𝑡subscriptsuperscript𝑝italic-ϕ𝑖𝑡subscriptsuperscript𝑝italic-ϕ𝑗𝑡H^{i}_{j}(t)=R_{e}\cos^{-1}(\cos p^{\phi}_{i}(t)\cos p^{\phi}_{j}(t)\cos(p^{% \lambda}_{i}(t)-p^{\lambda}_{j}(t))\\ +\sin p^{\phi}_{i}(t)\sin p^{\phi}_{j}(t)),start_ROW start_CELL italic_H start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) = italic_R start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT roman_cos start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( roman_cos italic_p start_POSTSUPERSCRIPT italic_ϕ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) roman_cos italic_p start_POSTSUPERSCRIPT italic_ϕ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) roman_cos ( italic_p start_POSTSUPERSCRIPT italic_λ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) - italic_p start_POSTSUPERSCRIPT italic_λ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) ) end_CELL end_ROW start_ROW start_CELL + roman_sin italic_p start_POSTSUPERSCRIPT italic_ϕ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) roman_sin italic_p start_POSTSUPERSCRIPT italic_ϕ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) ) , end_CELL end_ROW (10)

where piϕ(t)subscriptsuperscript𝑝italic-ϕ𝑖𝑡p^{\phi}_{i}(t)italic_p start_POSTSUPERSCRIPT italic_ϕ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) and piλ(t)subscriptsuperscript𝑝𝜆𝑖𝑡p^{\lambda}_{i}(t)italic_p start_POSTSUPERSCRIPT italic_λ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) denote the latitude and longitude of Gisubscript𝐺𝑖G_{i}italic_G start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT; and Resubscript𝑅𝑒R_{e}italic_R start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT is the radius of the Earth.

Proof.

As illustrated in Fig. 5(b), PGSisubscript𝑃𝐺subscript𝑆𝑖\vec{P}_{GS_{i}}over→ start_ARG italic_P end_ARG start_POSTSUBSCRIPT italic_G italic_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT and PCSjsubscript𝑃𝐶subscript𝑆𝑗\vec{P}_{CS_{j}}over→ start_ARG italic_P end_ARG start_POSTSUBSCRIPT italic_C italic_S start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT are positioned on the surface of the Earth. These vectors are denoted as PGSi=(xi,yi,zi)subscript𝑃𝐺subscript𝑆𝑖subscript𝑥𝑖subscript𝑦𝑖subscript𝑧𝑖\vec{P}_{GS_{i}}=(x_{i},y_{i},z_{i})over→ start_ARG italic_P end_ARG start_POSTSUBSCRIPT italic_G italic_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT = ( italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) and PCSj=(xj,yj,zj)subscript𝑃𝐶subscript𝑆𝑗subscript𝑥𝑗subscript𝑦𝑗subscript𝑧𝑗\vec{P}_{CS_{j}}=(x_{j},y_{j},z_{j})over→ start_ARG italic_P end_ARG start_POSTSUBSCRIPT italic_C italic_S start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT = ( italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_y start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_z start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ), correspondingly, where PGSisubscript𝑃𝐺subscript𝑆𝑖\vec{P}_{GS_{i}}over→ start_ARG italic_P end_ARG start_POSTSUBSCRIPT italic_G italic_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT and PCSjsubscript𝑃𝐶subscript𝑆𝑗\vec{P}_{CS_{j}}over→ start_ARG italic_P end_ARG start_POSTSUBSCRIPT italic_C italic_S start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT are identified as coordinate vectors along with x𝑥xitalic_x-, y𝑦yitalic_y-, and z𝑧zitalic_z-axes, respectively. In addition, the angular difference between PGSisubscript𝑃𝐺subscript𝑆𝑖\vec{P}_{GS_{i}}over→ start_ARG italic_P end_ARG start_POSTSUBSCRIPT italic_G italic_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT and PCSjsubscript𝑃𝐶subscript𝑆𝑗\vec{P}_{CS_{j}}over→ start_ARG italic_P end_ARG start_POSTSUBSCRIPT italic_C italic_S start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT, i.e., θ𝜃\thetaitalic_θ, can be obtained as,

θ=cos1PGSiPCSjPGSiPCSj=cos1xixj+yiyj+zizjxi2+yi2+zi2xj2+yj2+zj2,𝜃superscript1subscript𝑃𝐺subscript𝑆𝑖subscript𝑃𝐶subscript𝑆𝑗normsubscript𝑃𝐺subscript𝑆𝑖normsubscript𝑃𝐶subscript𝑆𝑗superscript1subscript𝑥𝑖subscript𝑥𝑗subscript𝑦𝑖subscript𝑦𝑗subscript𝑧𝑖subscript𝑧𝑗superscriptsubscript𝑥𝑖2superscriptsubscript𝑦𝑖2superscriptsubscript𝑧𝑖2superscriptsubscript𝑥𝑗2superscriptsubscript𝑦𝑗2superscriptsubscript𝑧𝑗2\theta=\cos^{-1}\frac{\vec{P}_{GS_{i}}\cdot\vec{P}_{CS_{j}}}{\left\|\vec{P}_{% GS_{i}}\right\|\left\|\vec{P}_{CS_{j}}\right\|}\\ =\cos^{-1}\frac{x_{i}x_{j}+y_{i}y_{j}+z_{i}z_{j}}{\sqrt{x_{i}^{2}+y_{i}^{2}+z_% {i}^{2}}\sqrt{x_{j}^{2}+y_{j}^{2}+z_{j}^{2}}},start_ROW start_CELL italic_θ = roman_cos start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT divide start_ARG over→ start_ARG italic_P end_ARG start_POSTSUBSCRIPT italic_G italic_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT ⋅ over→ start_ARG italic_P end_ARG start_POSTSUBSCRIPT italic_C italic_S start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT end_ARG start_ARG ∥ over→ start_ARG italic_P end_ARG start_POSTSUBSCRIPT italic_G italic_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∥ ∥ over→ start_ARG italic_P end_ARG start_POSTSUBSCRIPT italic_C italic_S start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∥ end_ARG end_CELL end_ROW start_ROW start_CELL = roman_cos start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT divide start_ARG italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT + italic_y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_y start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT + italic_z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_z start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_ARG start_ARG square-root start_ARG italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG square-root start_ARG italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_y start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_z start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG end_ARG , end_CELL end_ROW (11)

where xisubscript𝑥𝑖x_{i}italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, yisubscript𝑦𝑖y_{i}italic_y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, zisubscript𝑧𝑖z_{i}italic_z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, xjsubscript𝑥𝑗x_{j}italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT, yjsubscript𝑦𝑗y_{j}italic_y start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT, and zjsubscript𝑧𝑗z_{j}italic_z start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT can be represented as,

[xiyizi]=[Recospiϕ(t)cospiλ(t)Recospiϕ(t)sinpiλ(t)Resinpiϕ(t)],matrixsubscript𝑥𝑖subscript𝑦𝑖subscript𝑧𝑖matrixsubscript𝑅𝑒superscriptsubscript𝑝𝑖italic-ϕ𝑡superscriptsubscript𝑝𝑖𝜆𝑡subscript𝑅𝑒superscriptsubscript𝑝𝑖italic-ϕ𝑡superscriptsubscript𝑝𝑖𝜆𝑡subscript𝑅𝑒superscriptsubscript𝑝𝑖italic-ϕ𝑡\displaystyle\begin{bmatrix}x_{i}\\ y_{i}\\ z_{i}\end{bmatrix}=\begin{bmatrix}R_{e}\cos p_{i}^{\phi}(t)\cos p_{i}^{\lambda% }(t)\\ R_{e}\cos p_{i}^{\phi}(t)\sin p_{i}^{\lambda}(t)\\ R_{e}\sin p_{i}^{\phi}(t)\end{bmatrix},[ start_ARG start_ROW start_CELL italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] = [ start_ARG start_ROW start_CELL italic_R start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT roman_cos italic_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_ϕ end_POSTSUPERSCRIPT ( italic_t ) roman_cos italic_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_λ end_POSTSUPERSCRIPT ( italic_t ) end_CELL end_ROW start_ROW start_CELL italic_R start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT roman_cos italic_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_ϕ end_POSTSUPERSCRIPT ( italic_t ) roman_sin italic_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_λ end_POSTSUPERSCRIPT ( italic_t ) end_CELL end_ROW start_ROW start_CELL italic_R start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT roman_sin italic_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_ϕ end_POSTSUPERSCRIPT ( italic_t ) end_CELL end_ROW end_ARG ] , (12)
[xjyjzj]=[Recospjϕ(t)cospjλ(t)Recospjϕ(t)sinpjλ(t)Resinpjϕ(t)],matrixsubscript𝑥𝑗subscript𝑦𝑗subscript𝑧𝑗matrixsubscript𝑅𝑒superscriptsubscript𝑝𝑗italic-ϕ𝑡superscriptsubscript𝑝𝑗𝜆𝑡subscript𝑅𝑒superscriptsubscript𝑝𝑗italic-ϕ𝑡superscriptsubscript𝑝𝑗𝜆𝑡subscript𝑅𝑒superscriptsubscript𝑝𝑗italic-ϕ𝑡\displaystyle\begin{bmatrix}x_{j}\\ y_{j}\\ z_{j}\end{bmatrix}=\begin{bmatrix}R_{e}\cos p_{j}^{\phi}(t)\cos p_{j}^{\lambda% }(t)\\ R_{e}\cos p_{j}^{\phi}(t)\sin p_{j}^{\lambda}(t)\\ R_{e}\sin p_{j}^{\phi}(t)\end{bmatrix},[ start_ARG start_ROW start_CELL italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_y start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_z start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] = [ start_ARG start_ROW start_CELL italic_R start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT roman_cos italic_p start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_ϕ end_POSTSUPERSCRIPT ( italic_t ) roman_cos italic_p start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_λ end_POSTSUPERSCRIPT ( italic_t ) end_CELL end_ROW start_ROW start_CELL italic_R start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT roman_cos italic_p start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_ϕ end_POSTSUPERSCRIPT ( italic_t ) roman_sin italic_p start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_λ end_POSTSUPERSCRIPT ( italic_t ) end_CELL end_ROW start_ROW start_CELL italic_R start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT roman_sin italic_p start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_ϕ end_POSTSUPERSCRIPT ( italic_t ) end_CELL end_ROW end_ARG ] , (13)

where piϕ(t)superscriptsubscript𝑝𝑖italic-ϕ𝑡p_{i}^{\phi}(t)italic_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_ϕ end_POSTSUPERSCRIPT ( italic_t ), piλ(t)superscriptsubscript𝑝𝑖𝜆𝑡p_{i}^{\lambda}(t)italic_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_λ end_POSTSUPERSCRIPT ( italic_t ), pjϕ(t)superscriptsubscript𝑝𝑗italic-ϕ𝑡p_{j}^{\phi}(t)italic_p start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_ϕ end_POSTSUPERSCRIPT ( italic_t ), and pjλ(t)superscriptsubscript𝑝𝑗𝜆𝑡p_{j}^{\lambda}(t)italic_p start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_λ end_POSTSUPERSCRIPT ( italic_t ) are the latitude of PGSisubscript𝑃𝐺subscript𝑆𝑖\vec{P}_{GS_{i}}over→ start_ARG italic_P end_ARG start_POSTSUBSCRIPT italic_G italic_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT, the longitude of PGSisubscript𝑃𝐺subscript𝑆𝑖\vec{P}_{GS_{i}}over→ start_ARG italic_P end_ARG start_POSTSUBSCRIPT italic_G italic_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT, the latitude of PCSjsubscript𝑃𝐶subscript𝑆𝑗\vec{P}_{CS_{j}}over→ start_ARG italic_P end_ARG start_POSTSUBSCRIPT italic_C italic_S start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT, and the longitude of PCSjsubscript𝑃𝐶subscript𝑆𝑗\vec{P}_{CS_{j}}over→ start_ARG italic_P end_ARG start_POSTSUBSCRIPT italic_C italic_S start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT, at t𝑡titalic_t, respectively. Given that the magnitudes of these vectors are equivalent, xi2+yi2+zi2=xj2+yj2+zj2=Resuperscriptsubscript𝑥𝑖2superscriptsubscript𝑦𝑖2superscriptsubscript𝑧𝑖2superscriptsubscript𝑥𝑗2superscriptsubscript𝑦𝑗2superscriptsubscript𝑧𝑗2subscript𝑅𝑒\sqrt{x_{i}^{2}+y_{i}^{2}+z_{i}^{2}}=\sqrt{x_{j}^{2}+y_{j}^{2}+z_{j}^{2}}=R_{e}square-root start_ARG italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG = square-root start_ARG italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_y start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_z start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG = italic_R start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT, and thus, xixj+yiyj+zizj=Re2cos1(cospiϕ(t)cospjϕ(t)cos(piλ(t)pjλ(t))+sinpiϕ(t)sinpjϕ(t))subscript𝑥𝑖subscript𝑥𝑗subscript𝑦𝑖subscript𝑦𝑗subscript𝑧𝑖subscript𝑧𝑗superscriptsubscript𝑅𝑒2superscript1superscriptsubscript𝑝𝑖italic-ϕ𝑡superscriptsubscript𝑝𝑗italic-ϕ𝑡superscriptsubscript𝑝𝑖𝜆𝑡superscriptsubscript𝑝𝑗𝜆𝑡superscriptsubscript𝑝𝑖italic-ϕ𝑡superscriptsubscript𝑝𝑗italic-ϕ𝑡x_{i}x_{j}+y_{i}y_{j}+z_{i}z_{j}=R_{e}^{2}\cos^{-1}(\cos p_{i}^{\phi}(t)\cos p% _{j}^{\phi}(t)\cos(p_{i}^{\lambda}(t)-p_{j}^{\lambda}(t))+\sin p_{i}^{\phi}(t)% \sin p_{j}^{\phi}(t))italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT + italic_y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_y start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT + italic_z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_z start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = italic_R start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT roman_cos start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( roman_cos italic_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_ϕ end_POSTSUPERSCRIPT ( italic_t ) roman_cos italic_p start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_ϕ end_POSTSUPERSCRIPT ( italic_t ) roman_cos ( italic_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_λ end_POSTSUPERSCRIPT ( italic_t ) - italic_p start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_λ end_POSTSUPERSCRIPT ( italic_t ) ) + roman_sin italic_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_ϕ end_POSTSUPERSCRIPT ( italic_t ) roman_sin italic_p start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_ϕ end_POSTSUPERSCRIPT ( italic_t ) ) by (13). Therefore, according to the fact that Hji(t)subscriptsuperscript𝐻𝑖𝑗𝑡H^{i}_{j}(t)italic_H start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) is derived from Reθsubscript𝑅𝑒𝜃R_{e}\thetaitalic_R start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT italic_θ, which is depicted as the red line in Fig. 5(b), Hji(t)=Recos1(cospiϕ(t)cospjϕ(t)cos(piλ(t)pjλ(t))+sinpiϕ(t)sinpjϕ(t))subscriptsuperscript𝐻𝑖𝑗𝑡subscript𝑅𝑒superscript1superscriptsubscript𝑝𝑖italic-ϕ𝑡superscriptsubscript𝑝𝑗italic-ϕ𝑡superscriptsubscript𝑝𝑖𝜆𝑡superscriptsubscript𝑝𝑗𝜆𝑡superscriptsubscript𝑝𝑖italic-ϕ𝑡superscriptsubscript𝑝𝑗italic-ϕ𝑡H^{i}_{j}(t)=R_{e}\cos^{-1}(\cos p_{i}^{\phi}(t)\cos p_{j}^{\phi}(t)\cos(p_{i}% ^{\lambda}(t)-p_{j}^{\lambda}(t))+\sin p_{i}^{\phi}(t)\sin p_{j}^{\phi}(t))italic_H start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) = italic_R start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT roman_cos start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( roman_cos italic_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_ϕ end_POSTSUPERSCRIPT ( italic_t ) roman_cos italic_p start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_ϕ end_POSTSUPERSCRIPT ( italic_t ) roman_cos ( italic_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_λ end_POSTSUPERSCRIPT ( italic_t ) - italic_p start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_λ end_POSTSUPERSCRIPT ( italic_t ) ) + roman_sin italic_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_ϕ end_POSTSUPERSCRIPT ( italic_t ) roman_sin italic_p start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_ϕ end_POSTSUPERSCRIPT ( italic_t ) ). ∎

Similarly, the distance between Gisubscript𝐺𝑖G_{i}italic_G start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and the l𝑙litalic_l-th HALE-UAV within the coverage of Gisubscript𝐺𝑖G_{i}italic_G start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, i.e., denoted as Alisubscriptsuperscript𝐴𝑖𝑙A^{i}_{l}italic_A start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT, is determined based on the latitude (plϕ(t)subscriptsuperscript𝑝italic-ϕ𝑙𝑡p^{\phi}_{l}(t)italic_p start_POSTSUPERSCRIPT italic_ϕ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t )) and longitude (plλ(t)subscriptsuperscript𝑝𝜆𝑙𝑡p^{\lambda}_{l}(t)italic_p start_POSTSUPERSCRIPT italic_λ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t )) of Alisubscriptsuperscript𝐴𝑖𝑙A^{i}_{l}italic_A start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT, calculated as dli(t)=Hli(t)2+Vli(t)2subscriptsuperscript𝑑𝑖𝑙𝑡subscriptsuperscript𝐻𝑖𝑙superscript𝑡2subscriptsuperscript𝑉𝑖𝑙superscript𝑡2d^{i}_{l}(t)=\sqrt{H^{i}_{l}(t)^{2}+V^{i}_{l}(t)^{2}}italic_d start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) = square-root start_ARG italic_H start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_V start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG, where Hli(t)subscriptsuperscript𝐻𝑖𝑙𝑡H^{i}_{l}(t)italic_H start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) and Vli(t)subscriptsuperscript𝑉𝑖𝑙𝑡V^{i}_{l}(t)italic_V start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) are the horizontal and vertical distances, and note that Vli(t)subscriptsuperscript𝑉𝑖𝑙𝑡V^{i}_{l}(t)italic_V start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) indicates the altitude of Alisuperscriptsubscript𝐴𝑙𝑖A_{l}^{i}italic_A start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT relative to Gisubscript𝐺𝑖G_{i}italic_G start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, due to (9). Furthermore, according to (10), Hli(t)=Recos1(cospiϕ(t)cosplϕ(t)cos(piλ(t)plλ(t))+sinpiϕ(t)sinplϕ(t))subscriptsuperscript𝐻𝑖𝑙𝑡subscript𝑅𝑒superscript1subscriptsuperscript𝑝italic-ϕ𝑖𝑡subscriptsuperscript𝑝italic-ϕ𝑙𝑡subscriptsuperscript𝑝𝜆𝑖𝑡subscriptsuperscript𝑝𝜆𝑙𝑡subscriptsuperscript𝑝italic-ϕ𝑖𝑡subscriptsuperscript𝑝italic-ϕ𝑙𝑡H^{i}_{l}(t)=R_{e}\cos^{-1}(\cos p^{\phi}_{i}(t)\cos p^{\phi}_{l}(t)\cos(p^{% \lambda}_{i}(t)-p^{\lambda}_{l}(t))+\sin p^{\phi}_{i}(t)\sin p^{\phi}_{l}(t))italic_H start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) = italic_R start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT roman_cos start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( roman_cos italic_p start_POSTSUPERSCRIPT italic_ϕ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) roman_cos italic_p start_POSTSUPERSCRIPT italic_ϕ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) roman_cos ( italic_p start_POSTSUPERSCRIPT italic_λ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) - italic_p start_POSTSUPERSCRIPT italic_λ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) ) + roman_sin italic_p start_POSTSUPERSCRIPT italic_ϕ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) roman_sin italic_p start_POSTSUPERSCRIPT italic_ϕ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) ), where plϕ(t)subscriptsuperscript𝑝italic-ϕ𝑙𝑡p^{\phi}_{l}(t)italic_p start_POSTSUPERSCRIPT italic_ϕ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ), and plλ(t)subscriptsuperscript𝑝𝜆𝑙𝑡p^{\lambda}_{l}(t)italic_p start_POSTSUPERSCRIPT italic_λ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) denote the latitude and longitude of the l𝑙litalic_l-th HALE-UAV at time t𝑡titalic_t, respectively.

4 Problem Formulation and Algorithm Design

4.1 Main Objective for Global SAGIN Mobile Access

The purpose of our proposed QMARL-based scheduler in SAGIN is to preserve the residual energy of NTN devices as much as possible while each GS improves the global access performance in terms of access availability and energy efficiency. Therefore, when each GS schedules CubeSats and HALE-UAVs for global access, it is important to simultaneously optimize the global access performance and the residual energy of NTN devices. To achieve this goal, corresponding reward function should designed for MARL based algorithm design. The main objective of global SAGIN mobile access for each i𝑖iitalic_i-th GS can be formulated as,

maxxj,li(t){0,1}:lim𝒯1𝒯t=0𝒯1jMi,lLiRi(dj,li(t),xj,li(t)),:subscriptsubscriptsuperscript𝑥𝑖𝑗𝑙𝑡01subscript𝒯1𝒯superscriptsubscript𝑡0𝒯1subscriptformulae-sequencefor-all𝑗superscript𝑀𝑖for-all𝑙superscript𝐿𝑖subscript𝑅𝑖subscriptsuperscript𝑑𝑖𝑗𝑙𝑡subscriptsuperscript𝑥𝑖𝑗𝑙𝑡\max_{x^{i}_{j,l}(t)\in\{0,1\}}:\lim_{\mathcal{T}\rightarrow\infty}\frac{1}{% \mathcal{T}}\sum_{t=0}^{\mathcal{T}-1}\nolimits\sum_{\forall j\in M^{i},% \forall l\in L^{i}}\nolimits R_{i}(d^{i}_{j,l}(t),x^{i}_{j,l}(t)),roman_max start_POSTSUBSCRIPT italic_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT ( italic_t ) ∈ { 0 , 1 } end_POSTSUBSCRIPT : roman_lim start_POSTSUBSCRIPT caligraphic_T → ∞ end_POSTSUBSCRIPT divide start_ARG 1 end_ARG start_ARG caligraphic_T end_ARG ∑ start_POSTSUBSCRIPT italic_t = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT caligraphic_T - 1 end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT ∀ italic_j ∈ italic_M start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT , ∀ italic_l ∈ italic_L start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_R start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_d start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT ( italic_t ) , italic_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT ( italic_t ) ) , (14)

where dj,li(t)subscriptsuperscript𝑑𝑖𝑗𝑙𝑡d^{i}_{j,l}(t)italic_d start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT ( italic_t ) and xj,li(t)subscriptsuperscript𝑥𝑖𝑗𝑙𝑡x^{i}_{j,l}(t)italic_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT ( italic_t ) represent the distance and the scheduling vector between Gisubscript𝐺𝑖G_{i}italic_G start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and the NTN device within the coverage of Gisubscript𝐺𝑖G_{i}italic_G start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT (i.e., Sjisubscriptsuperscript𝑆𝑖𝑗S^{i}_{j}italic_S start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT or Alisubscriptsuperscript𝐴𝑖𝑙A^{i}_{l}italic_A start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT) at t𝑡titalic_t, respectively. In addition, Misuperscript𝑀𝑖M^{i}italic_M start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT and Lisuperscript𝐿𝑖L^{i}italic_L start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT in (14) stand for the sets of CubeSats and HALE-UAVs within the coverage of Gisubscript𝐺𝑖G_{i}italic_G start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT. Furthermore, jMi,lLixj,li(t)H¯i,xj,li(t){0,1},jMi,lLiformulae-sequencesubscriptformulae-sequencefor-all𝑗superscript𝑀𝑖for-all𝑙superscript𝐿𝑖subscriptsuperscript𝑥𝑖𝑗𝑙𝑡subscript¯𝐻𝑖formulae-sequencefor-allsubscriptsuperscript𝑥𝑖𝑗𝑙𝑡01formulae-sequencefor-all𝑗superscript𝑀𝑖for-all𝑙superscript𝐿𝑖\sum_{\forall j\in M^{i},\forall l\in L^{i}}x^{i}_{j,l}(t)\leq\bar{H}_{i},% \forall x^{i}_{j,l}(t)\in\{0,1\},\forall j\in M^{i},\forall l\in L^{i}∑ start_POSTSUBSCRIPT ∀ italic_j ∈ italic_M start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT , ∀ italic_l ∈ italic_L start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT ( italic_t ) ≤ over¯ start_ARG italic_H end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , ∀ italic_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT ( italic_t ) ∈ { 0 , 1 } , ∀ italic_j ∈ italic_M start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT , ∀ italic_l ∈ italic_L start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT holds where H¯isubscript¯𝐻𝑖\bar{H}_{i}over¯ start_ARG italic_H end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT means the maximal number of acceptable NTN devices (Sjisubscriptsuperscript𝑆𝑖𝑗S^{i}_{j}italic_S start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT or Alisubscriptsuperscript𝐴𝑖𝑙A^{i}_{l}italic_A start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT) that Gisubscript𝐺𝑖G_{i}italic_G start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT can monitor. Lastly, Ri(dj,li(t),xj,li(t))subscript𝑅𝑖subscriptsuperscript𝑑𝑖𝑗𝑙𝑡subscriptsuperscript𝑥𝑖𝑗𝑙𝑡R_{i}(d^{i}_{j,l}(t),x^{i}_{j,l}(t))italic_R start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_d start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT ( italic_t ) , italic_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT ( italic_t ) ) is our utility function for seamless global access, and it can be formulated as,

Ri(dj,li(t),xj,li(t))=Ui(dj,li(t),xj,li(t))Ci(dj,li(t),xj,li(t)),subscript𝑅𝑖subscriptsuperscript𝑑𝑖𝑗𝑙𝑡subscriptsuperscript𝑥𝑖𝑗𝑙𝑡subscript𝑈𝑖subscriptsuperscript𝑑𝑖𝑗𝑙𝑡subscriptsuperscript𝑥𝑖𝑗𝑙𝑡subscript𝐶𝑖subscriptsuperscript𝑑𝑖𝑗𝑙𝑡subscriptsuperscript𝑥𝑖𝑗𝑙𝑡R_{i}(d^{i}_{j,l}(t),x^{i}_{j,l}(t))=U_{i}(d^{i}_{j,l}(t),x^{i}_{j,l}(t))-C_{i% }(d^{i}_{j,l}(t),x^{i}_{j,l}(t)),italic_R start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_d start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT ( italic_t ) , italic_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT ( italic_t ) ) = italic_U start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_d start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT ( italic_t ) , italic_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT ( italic_t ) ) - italic_C start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_d start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT ( italic_t ) , italic_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT ( italic_t ) ) , (15)

where Ui(dj,li(t),xj,li(t))subscript𝑈𝑖subscriptsuperscript𝑑𝑖𝑗𝑙𝑡subscriptsuperscript𝑥𝑖𝑗𝑙𝑡U_{i}(d^{i}_{j,l}(t),x^{i}_{j,l}(t))italic_U start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_d start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT ( italic_t ) , italic_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT ( italic_t ) ) and Ci(dj,li(t),xj,li(t))subscript𝐶𝑖subscriptsuperscript𝑑𝑖𝑗𝑙𝑡subscriptsuperscript𝑥𝑖𝑗𝑙𝑡C_{i}(d^{i}_{j,l}(t),x^{i}_{j,l}(t))italic_C start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_d start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT ( italic_t ) , italic_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT ( italic_t ) ) stand for the utility and cost functions. In (15),

Ui(dj,li(t),xj,li(t))=jMi,lLiq(dj,li(t))ξj,lSA(t)xj,li(t),subscript𝑈𝑖subscriptsuperscript𝑑𝑖𝑗𝑙𝑡subscriptsuperscript𝑥𝑖𝑗𝑙𝑡subscriptformulae-sequencefor-all𝑗superscript𝑀𝑖for-all𝑙superscript𝐿𝑖qsubscriptsuperscript𝑑𝑖𝑗𝑙𝑡superscriptsubscript𝜉𝑗𝑙𝑆𝐴𝑡subscriptsuperscript𝑥𝑖𝑗𝑙𝑡U_{i}(d^{i}_{j,l}(t),x^{i}_{j,l}(t))=\sum_{\forall j\in M^{i},\forall l\in L^{% i}}\nolimits\textbf{q}(d^{i}_{j,l}(t))\cdot\xi_{j,l}^{SA}(t)\cdot x^{i}_{j,l}(% t),italic_U start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_d start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT ( italic_t ) , italic_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT ( italic_t ) ) = ∑ start_POSTSUBSCRIPT ∀ italic_j ∈ italic_M start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT , ∀ italic_l ∈ italic_L start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT end_POSTSUBSCRIPT q ( italic_d start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT ( italic_t ) ) ⋅ italic_ξ start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_S italic_A end_POSTSUPERSCRIPT ( italic_t ) ⋅ italic_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT ( italic_t ) , (16)

where q(dj,li(t))qsubscriptsuperscript𝑑𝑖𝑗𝑙𝑡\textbf{q}(d^{i}_{j,l}(t))q ( italic_d start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT ( italic_t ) ) and ξj,lSA(t)superscriptsubscript𝜉𝑗𝑙𝑆𝐴𝑡\xi_{j,l}^{SA}(t)italic_ξ start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_S italic_A end_POSTSUPERSCRIPT ( italic_t ) denote the quality function and capacity of the link between Gisubscript𝐺𝑖G_{i}italic_G start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and its associated NTN device (Sjisubscriptsuperscript𝑆𝑖𝑗S^{i}_{j}italic_S start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT or Alisubscriptsuperscript𝐴𝑖𝑙A^{i}_{l}italic_A start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT). In (16), the quality function can be generalized as [40],

q(dj,li(t))(1+expξ1(Λj,li(dj,li(t))ξ2))1,qsubscriptsuperscript𝑑𝑖𝑗𝑙𝑡superscript1superscriptsubscript𝜉1subscriptsuperscriptΛ𝑖𝑗𝑙subscriptsuperscript𝑑𝑖𝑗𝑙𝑡subscript𝜉21\textbf{q}(d^{i}_{j,l}(t))\triangleq\left(1+\exp^{-\xi_{1}\left(\Lambda^{i}_{j% ,l}(d^{i}_{j,l}(t))-\xi_{2}\right)}\right)^{-1},q ( italic_d start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT ( italic_t ) ) ≜ ( 1 + roman_exp start_POSTSUPERSCRIPT - italic_ξ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( roman_Λ start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT ( italic_d start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT ( italic_t ) ) - italic_ξ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT , (17)

where the data rate Λj,li(dj,li(t))subscriptsuperscriptΛ𝑖𝑗𝑙subscriptsuperscript𝑑𝑖𝑗𝑙𝑡\Lambda^{i}_{j,l}(d^{i}_{j,l}(t))roman_Λ start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT ( italic_d start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT ( italic_t ) ) depends on bandwidth (WW\mathrm{W}roman_W) and signal-to-noise ratio (SNR), which is denoted as ΓΓ\Gammaroman_Γ, thus,

Λj,li(dj,li(t))=Wlog2(1+Γ(dj,li(t))).subscriptsuperscriptΛ𝑖𝑗𝑙subscriptsuperscript𝑑𝑖𝑗𝑙𝑡Wsubscript21Γsubscriptsuperscript𝑑𝑖𝑗𝑙𝑡\Lambda^{i}_{j,l}(d^{i}_{j,l}(t))=\mathrm{W}\cdot\log_{2}\left(1+\Gamma(d^{i}_% {j,l}(t))\right).roman_Λ start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT ( italic_d start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT ( italic_t ) ) = roman_W ⋅ roman_log start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( 1 + roman_Γ ( italic_d start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT ( italic_t ) ) ) . (18)

Additionally, the cost function in (15) is expressed as,

Ci(dj,li(t),xj,li(t))=jMiEjS(dji(t),xji(t))σiS(t)(cooperation)+lLiElA(dli(t),xli(t))σiA(t)(cooperation),subscript𝐶𝑖subscriptsuperscript𝑑𝑖𝑗𝑙𝑡subscriptsuperscript𝑥𝑖𝑗𝑙𝑡subscriptfor-all𝑗superscript𝑀𝑖subscriptsuperscript𝐸𝑆𝑗subscriptsuperscript𝑑𝑖𝑗𝑡subscriptsuperscript𝑥𝑖𝑗𝑡subscriptsuperscriptsubscript𝜎𝑖𝑆𝑡(cooperation)subscriptfor-all𝑙superscript𝐿𝑖subscriptsuperscript𝐸𝐴𝑙subscriptsuperscript𝑑𝑖𝑙𝑡subscriptsuperscript𝑥𝑖𝑙𝑡subscriptsuperscriptsubscript𝜎𝑖𝐴𝑡(cooperation)C_{i}(d^{i}_{j,l}(t),x^{i}_{j,l}(t))=\sum_{\forall j\in M^{i}}\limits E^{S}_{j% }(d^{i}_{j}(t),x^{i}_{j}(t))\cdot\underbrace{\sigma_{i}^{S}(t)}_{\textrm{(% cooperation)}}\\ +\sum_{\forall l\in L^{i}}\limits E^{A}_{l}(d^{i}_{l}(t),x^{i}_{l}(t))\cdot% \underbrace{\sigma_{i}^{A}(t)}_{\textrm{(cooperation)}},start_ROW start_CELL italic_C start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_d start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT ( italic_t ) , italic_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT ( italic_t ) ) = ∑ start_POSTSUBSCRIPT ∀ italic_j ∈ italic_M start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_E start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_d start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) , italic_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) ) ⋅ under⏟ start_ARG italic_σ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT ( italic_t ) end_ARG start_POSTSUBSCRIPT (cooperation) end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL + ∑ start_POSTSUBSCRIPT ∀ italic_l ∈ italic_L start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_E start_POSTSUPERSCRIPT italic_A end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_d start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) , italic_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) ) ⋅ under⏟ start_ARG italic_σ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_A end_POSTSUPERSCRIPT ( italic_t ) end_ARG start_POSTSUBSCRIPT (cooperation) end_POSTSUBSCRIPT , end_CELL end_ROW (19)

where EjS(dji(t),xji(t))subscriptsuperscript𝐸𝑆𝑗subscriptsuperscript𝑑𝑖𝑗𝑡subscriptsuperscript𝑥𝑖𝑗𝑡E^{S}_{j}(d^{i}_{j}(t),x^{i}_{j}(t))italic_E start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_d start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) , italic_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) ) and ElA(dli(t),xli(t))subscriptsuperscript𝐸𝐴𝑙subscriptsuperscript𝑑𝑖𝑙𝑡subscriptsuperscript𝑥𝑖𝑙𝑡E^{A}_{l}(d^{i}_{l}(t),x^{i}_{l}(t))italic_E start_POSTSUPERSCRIPT italic_A end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_d start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) , italic_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) ) represent the normalized energy expenditure of Sjisubscriptsuperscript𝑆𝑖𝑗S^{i}_{j}italic_S start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT and Alisubscriptsuperscript𝐴𝑖𝑙A^{i}_{l}italic_A start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT, respectively. In (19), σiS(t)superscriptsubscript𝜎𝑖𝑆𝑡\sigma_{i}^{S}(t)italic_σ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT ( italic_t ), and σiA(t)superscriptsubscript𝜎𝑖𝐴𝑡\sigma_{i}^{A}(t)italic_σ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_A end_POSTSUPERSCRIPT ( italic_t ) quantify the standard deviation of the residual energy levels for Sjisubscriptsuperscript𝑆𝑖𝑗S^{i}_{j}italic_S start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT and Alisubscriptsuperscript𝐴𝑖𝑙A^{i}_{l}italic_A start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT. The cooperation highlighted in (19) is essential for reducing the variance of each NTN device (CubeSat or HALE-UAV)’s energy status, thereby it can avert the disproportionate energy usage of any specific CubeSat or HALE-UAV as well as promote collaborative operations for minimizing total energy expenditure.

Furthermore, the total energy expenditure, i.e., EjS(dji(t),xji(t))subscriptsuperscript𝐸𝑆𝑗subscriptsuperscript𝑑𝑖𝑗𝑡subscriptsuperscript𝑥𝑖𝑗𝑡E^{S}_{j}(d^{i}_{j}(t),x^{i}_{j}(t))italic_E start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_d start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) , italic_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) ) and ElA(dli(t),xli(t))subscriptsuperscript𝐸𝐴𝑙subscriptsuperscript𝑑𝑖𝑙𝑡subscriptsuperscript𝑥𝑖𝑙𝑡E^{A}_{l}(d^{i}_{l}(t),x^{i}_{l}(t))italic_E start_POSTSUPERSCRIPT italic_A end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_d start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) , italic_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) ), corresponds to the amount of energy utilized during communications between Gisubscript𝐺𝑖G_{i}italic_G start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and its associated NTN device (Sjisubscriptsuperscript𝑆𝑖𝑗S^{i}_{j}italic_S start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT or Alisubscriptsuperscript𝐴𝑖𝑙A^{i}_{l}italic_A start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT). The energy consumed in Sjisubscriptsuperscript𝑆𝑖𝑗S^{i}_{j}italic_S start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT, i.e., EjS(dji(t),xji(t))subscriptsuperscript𝐸𝑆𝑗subscriptsuperscript𝑑𝑖𝑗𝑡subscriptsuperscript𝑥𝑖𝑗𝑡E^{S}_{j}(d^{i}_{j}(t),x^{i}_{j}(t))italic_E start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_d start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) , italic_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) ), and also in Alisubscriptsuperscript𝐴𝑖𝑙A^{i}_{l}italic_A start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT, i.e., ElA(dli(t),xli(t))subscriptsuperscript𝐸𝐴𝑙subscriptsuperscript𝑑𝑖𝑙𝑡subscriptsuperscript𝑥𝑖𝑙𝑡E^{A}_{l}(d^{i}_{l}(t),x^{i}_{l}(t))italic_E start_POSTSUPERSCRIPT italic_A end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_d start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) , italic_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) ), are limited by their specific maximum capacities, e¯jsubscript¯𝑒𝑗\bar{e}_{j}over¯ start_ARG italic_e end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT for Sjisubscriptsuperscript𝑆𝑖𝑗S^{i}_{j}italic_S start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT and e¯lsubscript¯𝑒𝑙\bar{e}_{l}over¯ start_ARG italic_e end_ARG start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT for Alisubscriptsuperscript𝐴𝑖𝑙A^{i}_{l}italic_A start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT, which can be expressed as EjS(dji(t),xji(t))e¯j,jMiformulae-sequencesubscriptsuperscript𝐸𝑆𝑗subscriptsuperscript𝑑𝑖𝑗𝑡subscriptsuperscript𝑥𝑖𝑗𝑡subscript¯𝑒𝑗for-all𝑗superscript𝑀𝑖E^{S}_{j}(d^{i}_{j}(t),x^{i}_{j}(t))\leq\bar{e}_{j},\forall j\in M^{i}italic_E start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_d start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) , italic_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) ) ≤ over¯ start_ARG italic_e end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , ∀ italic_j ∈ italic_M start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT and ElA(dli(t),xli(t))e¯l,lLiformulae-sequencesubscriptsuperscript𝐸𝐴𝑙subscriptsuperscript𝑑𝑖𝑙𝑡subscriptsuperscript𝑥𝑖𝑙𝑡subscript¯𝑒𝑙for-all𝑙superscript𝐿𝑖E^{A}_{l}(d^{i}_{l}(t),x^{i}_{l}(t))\leq\bar{e}_{l},\forall l\in L^{i}italic_E start_POSTSUPERSCRIPT italic_A end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_d start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) , italic_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) ) ≤ over¯ start_ARG italic_e end_ARG start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT , ∀ italic_l ∈ italic_L start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT, respectively. Furthermore, the maximum capacity of Gisubscript𝐺𝑖G_{i}italic_G start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT is also taken into account, i.e.,

ξiGS(t)+jMiξjS(t)xji(t)+lLiξlA(t)xli(t)ξ¯i=ϱ1+eζ(tτ),subscriptsuperscript𝜉𝐺𝑆𝑖𝑡subscriptfor-all𝑗superscript𝑀𝑖subscriptsuperscript𝜉𝑆𝑗𝑡subscriptsuperscript𝑥𝑖𝑗𝑡subscriptfor-all𝑙superscript𝐿𝑖subscriptsuperscript𝜉𝐴𝑙𝑡subscriptsuperscript𝑥𝑖𝑙𝑡subscript¯𝜉𝑖italic-ϱ1superscript𝑒𝜁𝑡𝜏\xi^{GS}_{i}(t)+\sum_{\forall j\in M^{i}}\nolimits\xi^{S}_{j}(t)\cdot x^{i}_{j% }(t)\\ +\sum_{\forall l\in L^{i}}\nolimits\xi^{A}_{l}(t)\cdot x^{i}_{l}(t)\leq\bar{% \xi}_{i}=\frac{\varrho}{1+e^{-\zeta(t-\tau)}},start_ROW start_CELL italic_ξ start_POSTSUPERSCRIPT italic_G italic_S end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) + ∑ start_POSTSUBSCRIPT ∀ italic_j ∈ italic_M start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_ξ start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) ⋅ italic_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) end_CELL end_ROW start_ROW start_CELL + ∑ start_POSTSUBSCRIPT ∀ italic_l ∈ italic_L start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_ξ start_POSTSUPERSCRIPT italic_A end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) ⋅ italic_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) ≤ over¯ start_ARG italic_ξ end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = divide start_ARG italic_ϱ end_ARG start_ARG 1 + italic_e start_POSTSUPERSCRIPT - italic_ζ ( italic_t - italic_τ ) end_POSTSUPERSCRIPT end_ARG , end_CELL end_ROW (20)

where ξiGS(t)subscriptsuperscript𝜉𝐺𝑆𝑖𝑡\xi^{GS}_{i}(t)italic_ξ start_POSTSUPERSCRIPT italic_G italic_S end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ), ξjS(t)subscriptsuperscript𝜉𝑆𝑗𝑡\xi^{S}_{j}(t)italic_ξ start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ), ξlA(t)subscriptsuperscript𝜉𝐴𝑙𝑡\xi^{A}_{l}(t)italic_ξ start_POSTSUPERSCRIPT italic_A end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ), and ξ¯isubscript¯𝜉𝑖\bar{\xi}_{i}over¯ start_ARG italic_ξ end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, are the capacity of Gisubscript𝐺𝑖G_{i}italic_G start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, the capacity of Sjisubscriptsuperscript𝑆𝑖𝑗S^{i}_{j}italic_S start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT, the capacity of Alisubscriptsuperscript𝐴𝑖𝑙A^{i}_{l}italic_A start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT, and the maximum capacity of the Gisubscript𝐺𝑖G_{i}italic_G start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, respectively, and the ξ¯isubscript¯𝜉𝑖\bar{\xi}_{i}over¯ start_ARG italic_ξ end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT varies depending on the region where each GS is located, the population of that region, and the degree of communication overloads. Additionally, ϱitalic-ϱ\varrhoitalic_ϱ, ζ𝜁\zetaitalic_ζ, t𝑡titalic_t, and τ𝜏\tauitalic_τ are the maximum of logarithmic quality function curve, control factor the steepness of the curve, time, and midpoint of the curve, respectively.

4.2 Reinforcement Learning Modeling

According to the dynamics of CubeSats and HALE-UAVs under uncertain environments, the rapid and unexpected state changes occur over time. These dynamics and uncertain environments are obviously obstacles for large-scale global SAGIN mobile access scheduling, which can be modelled with combinatorics optimization. For more details, these scheduling problems are generally formulated as integer programming (IP), which are known for their non-deterministic polynomial (NP)-hard complexity, making them particularly difficult to solve using conventional methods. Therefore, it is highly advantageous to re-formulate the original optimization framework into RL-based sequential discrete-time decision-making for time-average scheduling utility maximization. Additionally, in the environment formalized through RL, GS constantly interacts with the environment and learns the optimal policy in the process, therefore RL can be a good solution in such a very dynamic and uncertain environment. However, to implement realistic global access in SAGIN, many GSs, CubeSats, and HALE-UAVs are needed. Because multiple GSs are required, this changes the form of the problem from RL to MARL scheduling, and because multiple CubeSats and HALE-UAVs must be used, the action dimension of the GS increases exponentially as the number of these NTN devices increases. The conventional MARL has a fatal problem that as the number of GS increases, or as the number of actions that GS can select, that is, the number of CubeSats and HALE-UAVs increases, GS suffers from the curse of dimensionality and its learning performance deteriorates. This paper undertakes such a re-formulation using QMARL, proposing a novel approach for tackling the complexities of scheduling in time-varying dynamic environments. QMARL utilizes QNN and is free from the curse of dimensionality, which is the big problem in conventional MARL. If QMARL is used to implement realistic global access in SAGIN, seamless global access can be achieved by simultaneously optimizing global access performance and the residual energy of NTN devices even when using numerous GS, CubeSat, and HALE-UAV.

State. In our considering aerial network with CubeSats and HALE-UAVs, the state is defined by the observational data collected by Gisubscript𝐺𝑖G_{i}italic_G start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, denoted as 𝒮i(t)subscript𝒮𝑖𝑡\mathcal{S}_{i}(t)caligraphic_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ), and it can be as follows,

𝒮i(t){Pi(t),ξi(t),jMi{PjS(t),EjS(t),ξjS(t)},lLi{PlA(t),ElA(t),ξlA(t)}},subscript𝒮𝑖𝑡subscript𝑃𝑖𝑡subscript𝜉𝑖𝑡subscript𝑗superscript𝑀𝑖subscriptsuperscript𝑃𝑆𝑗𝑡subscriptsuperscript𝐸𝑆𝑗𝑡subscriptsuperscript𝜉𝑆𝑗𝑡subscript𝑙superscript𝐿𝑖subscriptsuperscript𝑃𝐴𝑙𝑡subscriptsuperscript𝐸𝐴𝑙𝑡subscriptsuperscript𝜉𝐴𝑙𝑡\mathcal{S}_{i}(t)\triangleq\{P_{i}(t),\xi_{i}(t),\bigcup_{j\in M^{i}}\{P^{S}_% {j}(t),E^{S}_{j}(t),\xi^{S}_{j}(t)\},\\ \bigcup_{l\in L^{i}}\{P^{A}_{l}(t),E^{A}_{l}(t),\xi^{A}_{l}(t)\}\},start_ROW start_CELL caligraphic_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) ≜ { italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) , italic_ξ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) , ⋃ start_POSTSUBSCRIPT italic_j ∈ italic_M start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT end_POSTSUBSCRIPT { italic_P start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) , italic_E start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) , italic_ξ start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) } , end_CELL end_ROW start_ROW start_CELL ⋃ start_POSTSUBSCRIPT italic_l ∈ italic_L start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT end_POSTSUBSCRIPT { italic_P start_POSTSUPERSCRIPT italic_A end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) , italic_E start_POSTSUPERSCRIPT italic_A end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) , italic_ξ start_POSTSUPERSCRIPT italic_A end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) } } , end_CELL end_ROW (21)

where Pi(t)subscript𝑃𝑖𝑡P_{i}(t)italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ), ξi(t)subscript𝜉𝑖𝑡\xi_{i}(t)italic_ξ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ), PjS(t)subscriptsuperscript𝑃𝑆𝑗𝑡P^{S}_{j}(t)italic_P start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ), EjS(t)subscriptsuperscript𝐸𝑆𝑗𝑡E^{S}_{j}(t)italic_E start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ), ξjS(t)subscriptsuperscript𝜉𝑆𝑗𝑡\xi^{S}_{j}(t)italic_ξ start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ), PlA(t)subscriptsuperscript𝑃𝐴𝑙𝑡P^{A}_{l}(t)italic_P start_POSTSUPERSCRIPT italic_A end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ), ElA(t)subscriptsuperscript𝐸𝐴𝑙𝑡E^{A}_{l}(t)italic_E start_POSTSUPERSCRIPT italic_A end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ), and ξlA(t)subscriptsuperscript𝜉𝐴𝑙𝑡\xi^{A}_{l}(t)italic_ξ start_POSTSUPERSCRIPT italic_A end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) stand for the position of Gisubscript𝐺𝑖G_{i}italic_G start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, the capacity of Gisubscript𝐺𝑖G_{i}italic_G start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, the position of Sji(t)superscriptsubscript𝑆𝑗𝑖𝑡S_{j}^{i}(t)italic_S start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ( italic_t ), the energy state of Sji(t)superscriptsubscript𝑆𝑗𝑖𝑡S_{j}^{i}(t)italic_S start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ( italic_t ), the capacity of Sji(t)superscriptsubscript𝑆𝑗𝑖𝑡S_{j}^{i}(t)italic_S start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ( italic_t ), the position of Ali(t)superscriptsubscript𝐴𝑙𝑖𝑡A_{l}^{i}(t)italic_A start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ( italic_t ), the energy state of Ali(t)superscriptsubscript𝐴𝑙𝑖𝑡A_{l}^{i}(t)italic_A start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ( italic_t ), and the capacity of Ali(t)superscriptsubscript𝐴𝑙𝑖𝑡A_{l}^{i}(t)italic_A start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ( italic_t ). Here, the positions of Gisubscript𝐺𝑖G_{i}italic_G start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, Sjisuperscriptsubscript𝑆𝑗𝑖S_{j}^{i}italic_S start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT, and Alisuperscriptsubscript𝐴𝑙𝑖A_{l}^{i}italic_A start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT are specified as Pi(t)={piϕ(t),piλ(t),piH(t)}subscript𝑃𝑖𝑡subscriptsuperscript𝑝italic-ϕ𝑖𝑡subscriptsuperscript𝑝𝜆𝑖𝑡subscriptsuperscript𝑝𝐻𝑖𝑡P_{i}(t)=\{p^{\phi}_{i}(t),p^{\lambda}_{i}(t),p^{H}_{i}(t)\}italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) = { italic_p start_POSTSUPERSCRIPT italic_ϕ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) , italic_p start_POSTSUPERSCRIPT italic_λ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) , italic_p start_POSTSUPERSCRIPT italic_H end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) }, PjS(t)={pjϕ(t),pjλ(t),pjH(t),vjS(t)}subscriptsuperscript𝑃𝑆𝑗𝑡subscriptsuperscript𝑝italic-ϕ𝑗𝑡subscriptsuperscript𝑝𝜆𝑗𝑡subscriptsuperscript𝑝𝐻𝑗𝑡subscriptsuperscript𝑣𝑆𝑗𝑡P^{S}_{j}(t)=\{p^{\phi}_{j}(t),p^{\lambda}_{j}(t),p^{H}_{j}(t),v^{S}_{j}(t)\}italic_P start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) = { italic_p start_POSTSUPERSCRIPT italic_ϕ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) , italic_p start_POSTSUPERSCRIPT italic_λ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) , italic_p start_POSTSUPERSCRIPT italic_H end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) , italic_v start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) }, and PlA(t)={plϕ(t),plλ(t),plH(t),vlA(t)}subscriptsuperscript𝑃𝐴𝑙𝑡subscriptsuperscript𝑝italic-ϕ𝑙𝑡subscriptsuperscript𝑝𝜆𝑙𝑡subscriptsuperscript𝑝𝐻𝑙𝑡subscriptsuperscript𝑣𝐴𝑙𝑡P^{A}_{l}(t)=\{p^{\phi}_{l}(t),p^{\lambda}_{l}(t),p^{H}_{l}(t),v^{A}_{l}(t)\}italic_P start_POSTSUPERSCRIPT italic_A end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) = { italic_p start_POSTSUPERSCRIPT italic_ϕ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) , italic_p start_POSTSUPERSCRIPT italic_λ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) , italic_p start_POSTSUPERSCRIPT italic_H end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) , italic_v start_POSTSUPERSCRIPT italic_A end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) }, where piϕ(t)subscriptsuperscript𝑝italic-ϕ𝑖𝑡p^{\phi}_{i}(t)italic_p start_POSTSUPERSCRIPT italic_ϕ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ), piλ(t)subscriptsuperscript𝑝𝜆𝑖𝑡p^{\lambda}_{i}(t)italic_p start_POSTSUPERSCRIPT italic_λ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ), and piH(t)subscriptsuperscript𝑝𝐻𝑖𝑡p^{H}_{i}(t)italic_p start_POSTSUPERSCRIPT italic_H end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) denote the latitude, longitude, and altitude of Gisubscript𝐺𝑖G_{i}italic_G start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT. Similarly, pjϕ(t)subscriptsuperscript𝑝italic-ϕ𝑗𝑡p^{\phi}_{j}(t)italic_p start_POSTSUPERSCRIPT italic_ϕ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ), pjλ(t)subscriptsuperscript𝑝𝜆𝑗𝑡p^{\lambda}_{j}(t)italic_p start_POSTSUPERSCRIPT italic_λ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ), pjH(t)subscriptsuperscript𝑝𝐻𝑗𝑡p^{H}_{j}(t)italic_p start_POSTSUPERSCRIPT italic_H end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ), vjS(t)subscriptsuperscript𝑣𝑆𝑗𝑡v^{S}_{j}(t)italic_v start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ), plϕ(t)subscriptsuperscript𝑝italic-ϕ𝑙𝑡p^{\phi}_{l}(t)italic_p start_POSTSUPERSCRIPT italic_ϕ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ), plλ(t)subscriptsuperscript𝑝𝜆𝑙𝑡p^{\lambda}_{l}(t)italic_p start_POSTSUPERSCRIPT italic_λ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ), plH(t)subscriptsuperscript𝑝𝐻𝑙𝑡p^{H}_{l}(t)italic_p start_POSTSUPERSCRIPT italic_H end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ), and vlA(t)subscriptsuperscript𝑣𝐴𝑙𝑡v^{A}_{l}(t)italic_v start_POSTSUPERSCRIPT italic_A end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) represent the latitude of Sjisuperscriptsubscript𝑆𝑗𝑖S_{j}^{i}italic_S start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT, the longitude of Sjisuperscriptsubscript𝑆𝑗𝑖S_{j}^{i}italic_S start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT, the altitude of Sjisuperscriptsubscript𝑆𝑗𝑖S_{j}^{i}italic_S start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT, the velocity vector of Sjisuperscriptsubscript𝑆𝑗𝑖S_{j}^{i}italic_S start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT, the latitude of Alisuperscriptsubscript𝐴𝑙𝑖A_{l}^{i}italic_A start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT, the longitude of Alisuperscriptsubscript𝐴𝑙𝑖A_{l}^{i}italic_A start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT, the altitude of Alisuperscriptsubscript𝐴𝑙𝑖A_{l}^{i}italic_A start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT, and the velocity vector of Alisuperscriptsubscript𝐴𝑙𝑖A_{l}^{i}italic_A start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT.

Action. The action at t𝑡titalic_t is represented as 𝒜(t)=[xj,li(t)]𝒜𝑡delimited-[]subscriptsuperscript𝑥𝑖𝑗𝑙𝑡\mathcal{A}(t)={[x^{i}_{j,l}(t)]}caligraphic_A ( italic_t ) = [ italic_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT ( italic_t ) ], where xj,li(t){0,1}subscriptsuperscript𝑥𝑖𝑗𝑙𝑡01x^{i}_{j,l}(t)\in\{0,1\}italic_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT ( italic_t ) ∈ { 0 , 1 }. This indicates whether Gisubscript𝐺𝑖G_{i}italic_G start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT is available for Sjisuperscriptsubscript𝑆𝑗𝑖S_{j}^{i}italic_S start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT or Alisuperscriptsubscript𝐴𝑙𝑖A_{l}^{i}italic_A start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT at t𝑡titalic_t or not, and note that the network access service between Gisubscript𝐺𝑖G_{i}italic_G start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and NTN device (Sjisuperscriptsubscript𝑆𝑗𝑖S_{j}^{i}italic_S start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT or Alisuperscriptsubscript𝐴𝑙𝑖A_{l}^{i}italic_A start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT) is available when xji(t)=1subscriptsuperscript𝑥𝑖𝑗𝑡1x^{i}_{j}(t)=1italic_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) = 1 or xli(t)=1subscriptsuperscript𝑥𝑖𝑙𝑡1x^{i}_{l}(t)=1italic_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) = 1 (vice versa).

Reward. The reward function is outlined in (15), with its maximization reliant on the action scheduling xj,li(t)subscriptsuperscript𝑥𝑖𝑗𝑙𝑡x^{i}_{j,l}(t)italic_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT ( italic_t ) made by Gisubscript𝐺𝑖G_{i}italic_G start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT. This reward encompasses both utility and cost functions. Fundamentally, the goal is for each GS to orchestrate the scheduling of NTN devices (CubeSats or HALE-UAVs) to enhance the access performance in global SAGIN systems. Simultaneously, our reward function aims at the reduction of (i) the overall energy usage and (ii) the standard deviation of individual energy levels of CubeSats and HALE-UAVs. This reward function facilitates the autonomous and cooperative energy management in CubeSat and HALE-UAV.

4.3 QMARL-based Scheduler Design

Refer to caption
Figure 6: Global SAGIN mobile access using our proposed QMARL-based scheduler.

In the depicted scenario, each GS agent, identified as the i𝑖iitalic_i-th GS, is responsible for executing a combinatorial scheduling decision across M𝑀Mitalic_M CubeSats and L𝐿Litalic_L HALE-UAVs, as illustrated in Fig. 6. As the number of CubeSats M𝑀Mitalic_M and HALE-UAVs L𝐿Litalic_L increment linearly, the total number of feasible scheduling decisions experiences an exponential rise, quantified as 2M+Lsuperscript2𝑀𝐿2^{M+L}2 start_POSTSUPERSCRIPT italic_M + italic_L end_POSTSUPERSCRIPT. This significant increase highlights the imperative for conventional RL policies to expand their output dimensionality, i.e., action dimensions, thereby accommodating the 2M+Lsuperscript2𝑀𝐿2^{M+L}2 start_POSTSUPERSCRIPT italic_M + italic_L end_POSTSUPERSCRIPT potential combinations of these scheduling actions. However, such an increase in output dimensionality introduces difficulties in learning efficacy, a situation often described as the curse of dimensionality [41]. To tackle the mentioned challenge, this paper proposes an innovative strategy utilizing QMARL. This approach leverages quantum measurement techniques, facilitating effective navigation through high-dimensional action decision spaces by GSs. It’s noteworthy that training MARL with a substantial number of agents typically encounters reward convergence issues. Furthermore, as the number of action dimensions required by agents rises, achieving reward convergence grows more challenging. The quantum-based proposed measurement introduced here stands out as a singular solution capable of surmounting these challenges.

The QMARL-based scheduler outlined in this scenario is organized into three separate stages. The first two stages include encoding, which involves converting classical bits into quantum states referred to as qubits, and PQC, which involves the process of applying rotation gates to manipulate these quantum states in accordance with conventional QNN-based RL policies. The third and most important stage is measurement. During the concluding measurement stage, quantum states are transformed into an observable. This observable serves as the output obtained through the measurement of quantum states. The process of quantum measurement acts as a decoding mechanism, translating the outcomes of quantum computing into a format that classical computing systems can interpret and use. To facilitate global access performance of integrated networks through QMARL, the quantum system is established with a total of M+L𝑀𝐿M+Litalic_M + italic_L qubits. This total directly reflects the combined amount of CubeSats (M𝑀Mitalic_M) and HALE-UAVs (L𝐿Litalic_L), leading to the equation: |ψ=k=12M+Lαk|𝐞kket𝜓superscriptsubscript𝑘1superscript2𝑀𝐿subscript𝛼𝑘ketsubscript𝐞𝑘|\psi\rangle=\sum_{k=1}^{2^{M+L}}\alpha_{k}|\mathbf{e}_{k}\rangle| italic_ψ ⟩ = ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 start_POSTSUPERSCRIPT italic_M + italic_L end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT italic_α start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT | bold_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ⟩. In this context, αksubscript𝛼𝑘\alpha_{k}italic_α start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT is defined as the probability amplitude, and 𝐞ksubscript𝐞𝑘\mathbf{e}_{k}bold_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT represents the k𝑘kitalic_k-th basis within the Hilbert space.

In the domain of QNN, the Pauli-Z measurement is a prevalent method for transforming quantum states into observables. This conversion process does not depend on the number of qubits in use. In the Pauli-Z operator, each column denotes the computational basis of |0^ket^0|\hat{0}\rangle| over^ start_ARG 0 end_ARG ⟩ and |1^ket^1|\hat{1}\rangle| over^ start_ARG 1 end_ARG ⟩. For the purpose of deriving the expectation value of each qubit’s state, a matrix that projects the quantum state onto the z𝑧zitalic_z-axis is employed, which is expressed as, PZkIk1ZIQksubscriptsuperscriptP𝑘𝑍tensor-productsuperscriptI𝑘1ZsuperscriptI𝑄𝑘{\textbf{{P}}}^{k}_{Z}\triangleq{\textbf{{I}}}^{k-1}\otimes{\textbf{{Z}}}% \otimes{\textbf{{I}}}^{Q-k}P start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_Z end_POSTSUBSCRIPT ≜ I start_POSTSUPERSCRIPT italic_k - 1 end_POSTSUPERSCRIPT ⊗ Z ⊗ I start_POSTSUPERSCRIPT italic_Q - italic_k end_POSTSUPERSCRIPT, where I is the identity matrix. The equation to compute an observable associated with a single basis is formulated as, 𝒪k=ψ|PZk|ψdelimited-⟨⟩subscript𝒪𝑘quantum-operator-product𝜓subscriptsuperscriptP𝑘𝑍𝜓\langle\mathcal{O}_{k}\rangle=\langle\psi|{\textbf{{P}}}^{k}_{Z}|\psi\rangle⟨ caligraphic_O start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ⟩ = ⟨ italic_ψ | P start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_Z end_POSTSUBSCRIPT | italic_ψ ⟩, where k[1,Q]for-all𝑘1𝑄\forall k\in\mathbbm{N}[1,Q]∀ italic_k ∈ blackboard_N [ 1 , italic_Q ], 𝒪k[1,1]delimited-⟨⟩subscript𝒪𝑘11\langle\mathcal{O}_{k}\rangle\in\mathbbm{R}[-1,1]⟨ caligraphic_O start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ⟩ ∈ blackboard_R [ - 1 , 1 ]. To manage the combinatorial scheduling of M𝑀Mitalic_M CubeSats and L𝐿Litalic_L HALE-UAVs, a requisite output dimensionality of 2M+Lsuperscript2𝑀𝐿2^{M+L}2 start_POSTSUPERSCRIPT italic_M + italic_L end_POSTSUPERSCRIPT necessitates the use of 2M+Lsuperscript2𝑀𝐿2^{M+L}2 start_POSTSUPERSCRIPT italic_M + italic_L end_POSTSUPERSCRIPT qubits. This methodology, however, does not address the issue identified as the curse of dimensionality. In contrast, the QMARL-based scheduler proposed in this paper effectively minimizes the requisite number of qubits to a logarithmic scale, transitioning from 2M+Lsuperscript2𝑀𝐿2^{M+L}2 start_POSTSUPERSCRIPT italic_M + italic_L end_POSTSUPERSCRIPT down to M+L𝑀𝐿M+Litalic_M + italic_L. Consequently, this innovative approach significantly reduces the qubit requirement, ensuring its operational feasibility even amidst the constraints of the noisy intermediate-scale quantum (NISQ) era, where qubit availability is limited. By implementing the basis measurement, particularly through PVM, the approach outlined in this paper facilitates the determination of probabilities for every possible 2M+Lsuperscript2𝑀𝐿2^{M+L}2 start_POSTSUPERSCRIPT italic_M + italic_L end_POSTSUPERSCRIPT combinations with merely M+L𝑀𝐿M+Litalic_M + italic_L qubits. Thus, the likelihood of each conceivable 2M+Lsuperscript2𝑀𝐿2^{M+L}2 start_POSTSUPERSCRIPT italic_M + italic_L end_POSTSUPERSCRIPT action can be ascertained using only M+L𝑀𝐿M+Litalic_M + italic_L qubits, expressed as, {Pr(𝒜k)}k=12M+L{k=1M+L|xj,li\{\text{Pr}(\mathcal{A}_{k})\}_{k=1}^{2^{M+L}}\triangleq\{\operatorname*{% \raisebox{-1.07639pt}{\scalebox{1.2}{$\bigotimes$}}}_{k=1}^{M+L}\nolimits|x^{i% }_{j,l}\rangle{ Pr ( caligraphic_A start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) } start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 start_POSTSUPERSCRIPT italic_M + italic_L end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT ≜ { ⨂ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_M + italic_L end_POSTSUPERSCRIPT | italic_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT ⟩}, where tensor-product\operatorname*{\raisebox{-1.07639pt}{\scalebox{1.2}{$\bigotimes$}}} symbolizes the Kronecker product, xj,li{0,1}for-allsubscriptsuperscript𝑥𝑖𝑗𝑙01\forall x^{i}_{j,l}\in\{0,1\}∀ italic_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT ∈ { 0 , 1 }, j[1,M]for-all𝑗1𝑀\forall j\in[1,M]∀ italic_j ∈ [ 1 , italic_M ], l[1,L]for-all𝑙1𝐿\forall l\in[1,L]∀ italic_l ∈ [ 1 , italic_L ]. Finally, the process to determine the probability that the i𝑖iitalic_i-th GS will choose for the k𝑘kitalic_k-th action from 2M+Lsuperscript2𝑀𝐿2^{M+L}2 start_POSTSUPERSCRIPT italic_M + italic_L end_POSTSUPERSCRIPT possibilities at t𝑡titalic_t, according to its strategy, is represented as,

π(𝒜k(t)|𝒮i(t);𝜽i)=ψ|𝐞k𝐞k|ψ=|ψ|𝐞k|2=|αk|2,𝜋conditionalsubscript𝒜𝑘𝑡subscript𝒮𝑖𝑡subscript𝜽𝑖inner-product𝜓subscript𝐞𝑘inner-productsubscript𝐞𝑘𝜓superscriptinner-product𝜓subscript𝐞𝑘2superscriptsubscript𝛼𝑘2\pi(\mathcal{A}_{k}(t)|\mathcal{S}_{i}(t);\boldsymbol{\theta}_{i})\!=\!\langle% \psi|\mathbf{e}_{k}\rangle\langle\mathbf{e}_{k}|\psi\rangle\!=\!|\langle\psi|% \mathbf{e}_{k}\rangle|^{2}\!=\!|\alpha_{k}|^{2},italic_π ( caligraphic_A start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_t ) | caligraphic_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) ; bold_italic_θ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) = ⟨ italic_ψ | bold_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ⟩ ⟨ bold_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT | italic_ψ ⟩ = | ⟨ italic_ψ | bold_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ⟩ | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = | italic_α start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , (22)

where |𝐞k𝐞k|ketsubscript𝐞𝑘brasubscript𝐞𝑘|\mathbf{e}_{k}\rangle\langle\mathbf{e}_{k}|| bold_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ⟩ ⟨ bold_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT | denotes the projector for the k𝑘kitalic_k-th basis, with the collection of all such projectors for every basis being {|𝐞k𝐞k|}k=12M+Lsubscriptsuperscriptketsubscript𝐞𝑘brasubscript𝐞𝑘superscript2𝑀𝐿𝑘1\{|\mathbf{e}_{k}\rangle\langle\mathbf{e}_{k}|\}^{2^{M+L}}_{k=1}{ | bold_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ⟩ ⟨ bold_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT | } start_POSTSUPERSCRIPT 2 start_POSTSUPERSCRIPT italic_M + italic_L end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT. This is because the probabilities for each action corresponds to an individual outputs as, k=12M+Lπ(𝒜k(t)|𝒮i(t);𝜽i)=1subscriptsuperscriptsuperscript2𝑀𝐿𝑘1𝜋conditionalsubscript𝒜𝑘𝑡subscript𝒮𝑖𝑡subscript𝜽𝑖1\sum^{2^{M+L}}_{k=1}\nolimits\pi(\mathcal{A}_{k}(t)|\mathcal{S}_{i}(t);% \boldsymbol{\theta}_{i})=1∑ start_POSTSUPERSCRIPT 2 start_POSTSUPERSCRIPT italic_M + italic_L end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT italic_π ( caligraphic_A start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_t ) | caligraphic_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) ; bold_italic_θ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) = 1. This paper adopts activation functions as basis measurement, thereby allowing each GS to undertake action decision-making on the logarithmically reduced action dimension.

4.4 QMARL-based Scheduler Training

The network under consideration is conceptualized as a multi-agent system, where each i𝑖iitalic_i-th GS acts as the i𝑖iitalic_i-th agent equipped with its own QNN-based RL policy, π(𝒜(t)|𝒮i(t);𝜽i)𝜋conditional𝒜𝑡subscript𝒮𝑖𝑡subscript𝜽𝑖\pi(\mathcal{A}(t)|\mathcal{S}_{i}(t);\boldsymbol{\theta}_{i})italic_π ( caligraphic_A ( italic_t ) | caligraphic_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) ; bold_italic_θ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ), parameterized by 𝜽isubscript𝜽𝑖\boldsymbol{\theta}_{i}bold_italic_θ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT. In the training phase, a unified centralized critic, parameterized by ϕitalic-ϕ\phiitalic_ϕ, assesses the policy effectiveness of multiple agents by estimating the state-value function Vϕ(𝒮(t))subscript𝑉bold-italic-ϕ𝒮𝑡V_{\boldsymbol{\phi}}(\mathcal{S}(t))italic_V start_POSTSUBSCRIPT bold_italic_ϕ end_POSTSUBSCRIPT ( caligraphic_S ( italic_t ) ), with 𝒮(t)𝒮𝑡\mathcal{S}(t)caligraphic_S ( italic_t ) representing the ground truth, encapsulating all accessible environmental data [42]. Conversely, each GS engages in sequential decision-making based on its individual partial state (i.e., observation), 𝒮i(t)subscript𝒮𝑖𝑡\mathcal{S}_{i}(t)caligraphic_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ). This training framework enables all GSs to refine their policies towards collective decision-making, notwithstanding their limited observation of the environment. Furthermore, during inference, due to the distributed approach to cooperation, it is possible to achieve effective scalability and efficient use of computing resources.

After completing this procedure, TD error is utilized to implement multi-agent PG methods for the training of quantum multi-actor centralized-critic networks. The objective function for the i𝑖iitalic_i-th actor (Gisubscript𝐺𝑖G_{i}italic_G start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT), denoted as J(𝜽i)𝐽subscript𝜽𝑖J(\boldsymbol{\theta}_{i})italic_J ( bold_italic_θ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ), is expressed as,

θiJ(θi)=𝔼𝒮[t=1Ti=1Nδϕ(t)θilogπ(𝒜(t)|𝒮i(t);θi)],subscriptsubscript𝜃𝑖𝐽subscript𝜃𝑖subscript𝔼𝒮delimited-[]subscriptsuperscript𝑇𝑡1subscriptsuperscript𝑁𝑖1subscript𝛿italic-ϕ𝑡subscriptsubscript𝜃𝑖𝜋conditional𝒜𝑡subscript𝒮𝑖𝑡subscript𝜃𝑖\nabla_{\theta_{i}}J(\theta_{i})=\mathbbm{E}_{\mathcal{S}}\Big{[}\!\sum^{T}_{t% =1}\nolimits\sum^{N}_{i=1}\nolimits\delta_{\phi}(t)\nabla_{\theta_{i}}\!\log% \pi(\mathcal{A}(t)|\mathcal{S}_{i}(t);\theta_{i})\Big{]},∇ start_POSTSUBSCRIPT italic_θ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_J ( italic_θ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) = blackboard_E start_POSTSUBSCRIPT caligraphic_S end_POSTSUBSCRIPT [ ∑ start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t = 1 end_POSTSUBSCRIPT ∑ start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT italic_δ start_POSTSUBSCRIPT italic_ϕ end_POSTSUBSCRIPT ( italic_t ) ∇ start_POSTSUBSCRIPT italic_θ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT roman_log italic_π ( caligraphic_A ( italic_t ) | caligraphic_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) ; italic_θ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) ] , (23)

where δϕ(t)subscript𝛿italic-ϕ𝑡\delta_{\phi}(t)italic_δ start_POSTSUBSCRIPT italic_ϕ end_POSTSUBSCRIPT ( italic_t ), π𝜋\piitalic_π, 𝒜(t)𝒜𝑡\mathcal{A}(t)caligraphic_A ( italic_t ), 𝒮i(t)subscript𝒮𝑖𝑡\mathcal{S}_{i}(t)caligraphic_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ), and θisubscript𝜃𝑖\theta_{i}italic_θ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT are the TD error based on Bellman optimality equation in time step t𝑡titalic_t, policy, action at time t𝑡titalic_t, state at time t𝑡titalic_t, and neural network parameters, respectively. The loss function pertaining to the critic, denoted by (ϕ)italic-ϕ\mathcal{L}(\phi)caligraphic_L ( italic_ϕ ), is specified as,

ϕ(ϕ)=t=1Tϕδϕ(t)2,subscriptitalic-ϕitalic-ϕsubscriptsuperscript𝑇𝑡1subscriptitalic-ϕsuperscriptnormsubscript𝛿italic-ϕ𝑡2\nabla_{\phi}\mathcal{L}(\phi)=\sum^{T}_{t=1}\nolimits\nabla_{\phi}\left\|% \delta_{\phi}(t)\right\|^{2},∇ start_POSTSUBSCRIPT italic_ϕ end_POSTSUBSCRIPT caligraphic_L ( italic_ϕ ) = ∑ start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t = 1 end_POSTSUBSCRIPT ∇ start_POSTSUBSCRIPT italic_ϕ end_POSTSUBSCRIPT ∥ italic_δ start_POSTSUBSCRIPT italic_ϕ end_POSTSUBSCRIPT ( italic_t ) ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , (24)

To optimize the objective function for multiple GSs and reduce the loss function of the centralized critic, the derivatives of the k𝑘kitalic_k-th parameters are expressed as,

J(θi)θk=J(θi)πθiπθi𝒪k,𝜽i(Classical Backpropagation)𝒪k,𝜽iθk(Parameter-Shift Rule),𝐽subscript𝜃𝑖subscript𝜃𝑘subscript𝐽subscript𝜃𝑖subscript𝜋subscript𝜃𝑖subscript𝜋subscript𝜃𝑖delimited-⟨⟩subscript𝒪𝑘subscript𝜽𝑖(Classical Backpropagation)subscriptdelimited-⟨⟩subscript𝒪𝑘subscript𝜽𝑖subscript𝜃𝑘(Parameter-Shift Rule)\displaystyle\frac{\partial J(\theta_{i})}{\partial\theta_{k}}=\underbrace{% \frac{\partial J(\theta_{i})}{\partial\pi_{\theta_{i}}}\cdot\frac{\partial\pi_% {\theta_{i}}}{\partial\langle\mathcal{O}_{k,\boldsymbol{\theta}_{i}}\rangle}}_% {\textrm{(Classical Backpropagation)}}\cdot\underbrace{\frac{{\partial\langle% \mathcal{O}_{k,\boldsymbol{\theta}_{i}}\rangle}}{\partial\theta_{k}}}_{\textrm% {(Parameter-Shift Rule)}},divide start_ARG ∂ italic_J ( italic_θ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) end_ARG start_ARG ∂ italic_θ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_ARG = under⏟ start_ARG divide start_ARG ∂ italic_J ( italic_θ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) end_ARG start_ARG ∂ italic_π start_POSTSUBSCRIPT italic_θ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT end_ARG ⋅ divide start_ARG ∂ italic_π start_POSTSUBSCRIPT italic_θ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT end_ARG start_ARG ∂ ⟨ caligraphic_O start_POSTSUBSCRIPT italic_k , bold_italic_θ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT ⟩ end_ARG end_ARG start_POSTSUBSCRIPT (Classical Backpropagation) end_POSTSUBSCRIPT ⋅ under⏟ start_ARG divide start_ARG ∂ ⟨ caligraphic_O start_POSTSUBSCRIPT italic_k , bold_italic_θ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT ⟩ end_ARG start_ARG ∂ italic_θ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_ARG end_ARG start_POSTSUBSCRIPT (Parameter-Shift Rule) end_POSTSUBSCRIPT , (25)
(ϕ)ϕk=(ϕ)VϕVϕ𝒪k,ϕ(Classical Backpropagation)𝒪k,ϕϕk(Parameter-Shift Rule),italic-ϕsubscriptitalic-ϕ𝑘subscriptitalic-ϕsubscript𝑉italic-ϕsubscript𝑉italic-ϕdelimited-⟨⟩subscript𝒪𝑘italic-ϕ(Classical Backpropagation)subscriptdelimited-⟨⟩subscript𝒪𝑘italic-ϕsubscriptitalic-ϕ𝑘(Parameter-Shift Rule)\displaystyle\frac{\partial\mathcal{L}(\phi)}{\partial\phi_{k}}=\underbrace{% \frac{\partial\mathcal{L}(\phi)}{\partial V_{\phi}}\cdot\frac{\partial V_{\phi% }}{\partial\langle\mathcal{O}_{k,\phi}\rangle}}_{\textrm{(Classical % Backpropagation)}}\cdot\underbrace{\frac{{\partial\langle\mathcal{O}_{k,\phi}% \rangle}}{\partial\phi_{k}}}_{\textrm{(Parameter-Shift Rule)}},divide start_ARG ∂ caligraphic_L ( italic_ϕ ) end_ARG start_ARG ∂ italic_ϕ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_ARG = under⏟ start_ARG divide start_ARG ∂ caligraphic_L ( italic_ϕ ) end_ARG start_ARG ∂ italic_V start_POSTSUBSCRIPT italic_ϕ end_POSTSUBSCRIPT end_ARG ⋅ divide start_ARG ∂ italic_V start_POSTSUBSCRIPT italic_ϕ end_POSTSUBSCRIPT end_ARG start_ARG ∂ ⟨ caligraphic_O start_POSTSUBSCRIPT italic_k , italic_ϕ end_POSTSUBSCRIPT ⟩ end_ARG end_ARG start_POSTSUBSCRIPT (Classical Backpropagation) end_POSTSUBSCRIPT ⋅ under⏟ start_ARG divide start_ARG ∂ ⟨ caligraphic_O start_POSTSUBSCRIPT italic_k , italic_ϕ end_POSTSUBSCRIPT ⟩ end_ARG start_ARG ∂ italic_ϕ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_ARG end_ARG start_POSTSUBSCRIPT (Parameter-Shift Rule) end_POSTSUBSCRIPT , (26)

and the first and second terms of the right-hand side in (25) and (26) are computed using classical partial derivatives. Nonetheless, the third term presents a challenge for classical computation methods, as the quantum state’s specifics remain indeterminate before collapsing its state by measurement. To overcome this problem in parameter optimization throughout the training phase, the parameter shift rule comes into play. The rule applied for computing the derivative of the i𝑖iitalic_i-th GS’s k𝑘kitalic_k-th parameter, focusing on the 00-th order derivative, is specified as,

𝒪k,𝜽iθk=𝒪k,𝜽i+π2𝐞k𝒪k,𝜽iπ2𝐞k,delimited-⟨⟩subscript𝒪𝑘subscript𝜽𝑖subscript𝜃𝑘delimited-⟨⟩subscript𝒪𝑘subscript𝜽𝑖𝜋2subscript𝐞𝑘delimited-⟨⟩subscript𝒪𝑘subscript𝜽𝑖𝜋2subscript𝐞𝑘\frac{{\partial\langle\mathcal{O}_{k,\boldsymbol{\theta}_{i}}\rangle}}{% \partial\theta_{k}}=\langle\mathcal{O}_{k,\boldsymbol{\theta}_{i}+\frac{\pi}{2% }\mathbf{e}_{k}}\rangle-\langle\mathcal{O}_{k,\boldsymbol{\theta}_{i}-\frac{% \pi}{2}\mathbf{e}_{k}}\rangle,divide start_ARG ∂ ⟨ caligraphic_O start_POSTSUBSCRIPT italic_k , bold_italic_θ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT ⟩ end_ARG start_ARG ∂ italic_θ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_ARG = ⟨ caligraphic_O start_POSTSUBSCRIPT italic_k , bold_italic_θ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT + divide start_ARG italic_π end_ARG start_ARG 2 end_ARG bold_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT ⟩ - ⟨ caligraphic_O start_POSTSUBSCRIPT italic_k , bold_italic_θ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - divide start_ARG italic_π end_ARG start_ARG 2 end_ARG bold_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT ⟩ , (27)

where 𝐞ksubscript𝐞𝑘\mathbf{e}_{k}bold_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT denotes the k𝑘kitalic_k-th basis. Unlike classical backpropagation, the parameter shift rule provides a more straightforward and intuitive methodology. As a result, this approach can significantly expedite the training process for QNNs.

5 Performance Evaluation

5.1 Benchmarks and Simulation Setup

To evaluate the performance of the dimension-reduced QMARL-based scheduler, various benchmarks are utilized, i.e., MARL, Independent Q-Learning (IQL), Deep Q-Network (DQN), and Random (i.e., Monte Carlo) schedulers. In the (17) for the quality function, ξ1subscript𝜉1\xi_{1}italic_ξ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and ξ2subscript𝜉2\xi_{2}italic_ξ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT are ξ1=0.01subscript𝜉10.01\xi_{1}=0.01italic_ξ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = 0.01 and ξ2=1,024subscript𝜉21024\xi_{2}=1,024italic_ξ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = 1 , 024, respectively, and the parameters used for this performance evaluation are presented in Table III.

TABLE III: System Parameters for Performance Evaluation
Notation Value
No. of GSs/CubeSats/HALE-UAVs (N𝑁Nitalic_N, M𝑀Mitalic_M, L𝐿Litalic_L) 4444, 8888, 8888
Action dimension (|𝒜|𝒜|\mathcal{A}|| caligraphic_A |) {21,24,216}superscript21superscript24superscript216\{2^{1},2^{4},2^{16}\}{ 2 start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT , 2 start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT , 2 start_POSTSUPERSCRIPT 16 end_POSTSUPERSCRIPT }
Discount factor (γ𝛾\gammaitalic_γ) 0.980.980.980.98
Batch size 64646464
Initial/Min of epsilon (ϵinitsubscriptitalic-ϵinit\epsilon_{\mathrm{init}}italic_ϵ start_POSTSUBSCRIPT roman_init end_POSTSUBSCRIPT, ϵminsubscriptitalic-ϵ\epsilon_{\min}italic_ϵ start_POSTSUBSCRIPT roman_min end_POSTSUBSCRIPT) 0.2750.2750.2750.275, 102superscript10210^{-2}10 start_POSTSUPERSCRIPT - 2 end_POSTSUPERSCRIPT
Annealing epsilon 5×1055superscript1055\times 10^{-5}5 × 10 start_POSTSUPERSCRIPT - 5 end_POSTSUPERSCRIPT
LR of actor (αactorsubscript𝛼actor\alpha_{\mathrm{actor}}italic_α start_POSTSUBSCRIPT roman_actor end_POSTSUBSCRIPT) 103superscript10310^{-3}10 start_POSTSUPERSCRIPT - 3 end_POSTSUPERSCRIPT
LR of central critic (αcriticsubscript𝛼critic\alpha_{\mathrm{critic}}italic_α start_POSTSUBSCRIPT roman_critic end_POSTSUBSCRIPT) 2.5×1042.5superscript1042.5\times 10^{-4}2.5 × 10 start_POSTSUPERSCRIPT - 4 end_POSTSUPERSCRIPT
Training epochs 10,0001000010,00010 , 000
Activation ReLU, Optimizer: Adam

5.2 Policy Training

Refer to caption
Refer to caption
(a) Reward.
Refer to caption
(b) QoS.
Refer to caption
(c) Capacity.
Refer to caption
(d) Residual energy of CubeSat.
Refer to caption
(e) Residual energy of HALE-UAV.
Figure 7: SAGIN access performance, i.e., access availability (QoS, capacity) and energy efficiency (residual energy).

Fig. 7(a) illustrate that the QMARL-based scheduling approach introduced in this paper outperforms comparative benchmarks, achieving a maximal reward of 1.01.01.01.0. In comparison, the MARL-based scheduler provides less reward than the QMARL-based scheduler, and the reward value fluctuates and eventually does not converge. Furthermore, the performance of IQL and DQN based schedulers closely mirrors that of the Random based scheduler in terms of reward. Figs. 7(b)-(e) reveal that the scheduler based on QMARL attains superior QoS, capacity, and remaining energy for CubeSats/HALE-UAVs. Conversely, MARL-based scheduling approaches fail to concurrently optimize multiple metrics related to communication and the energy efficiency of NTN devices. Within the MARL based-scheduler, an increase in QoS and capacity correlates with a decrease in residual energy, indicating an inability to simultaneously optimize global access performance of integrated networks (QoS, capacity) and the residual energy of CubeSats/HALE-UAVs. In contrast, the QMARL-based scheduler successfully optimizes both global access performance and energy efficiency in parallel.

TABLE IV: Performance Evaluation Results when |𝒜|=216𝒜superscript216|\mathcal{A}|=2^{16}| caligraphic_A | = 2 start_POSTSUPERSCRIPT 16 end_POSTSUPERSCRIPT
Algorithm QoS Capacity Residual Energy
QMARL 0.9060.906\mathbf{0.906}bold_0.906 0.8940.894\mathbf{0.894}bold_0.894 0.9120.912\mathbf{0.912}bold_0.912
MARL 0.4840.4840.4840.484 0.3210.3210.3210.321 0.4570.4570.4570.457
IQL 0.1480.1480.1480.148 0.1880.1880.1880.188 0.4190.4190.4190.419
DQN 0.1940.1940.1940.194 0.2580.2580.2580.258 0.4420.4420.4420.442
Random 0.1510.1510.1510.151 0.1970.1970.1970.197 0.4370.4370.4370.437

Table IV illustrates that the QMARL based scheduler significantly surpasses its MARL-based scheduler, recording an 87.2%percent\%% enhancement in QoS, a 178%percent\%% increase in capacity, and an 99.5%percent\%% augmentation in remaining energy. Additionally, the performance of IQL, DQN, and Random based scheduler are notably inferior in all evaluated aspects, with QoS not exceeding 0.20.20.20.2, capacity remaining below 0.260.260.260.26, and the residual energy of CubeSats/HALE-UAVs falling short of 0.450.450.450.45, as explicated in Table IV.

Refer to caption
Refer to caption
(a) QoS vs. Residual Energy.
Refer to caption
(b) Capacity vs. Residual Energy.
Refer to caption
(c) Residual energy for each CubeSat.
Refer to caption
(d) Residual energy for each HALE-UAV.
Figure 8: Relationship between access availability and energy efficiency.

Figs. 8(a)–(b) delineate the correlation between the global access performance of integrated networks and the normalized residual energy of NTNs, contingent upon the employed algorithm. The epoch on the x𝑥xitalic_x-axis is segmented into three phases: 00 to 4k4𝑘4k4 italic_k (initial phase), 4k4𝑘4k4 italic_k to 7k7𝑘7k7 italic_k (intermediate phase), and 7k7𝑘7k7 italic_k to 10k10𝑘10k10 italic_k (final phase). Throughout the progression from the initial to the intermediate phase in MARL, an increment is observed in the energy of NTN devices, albeit with a reduction in QoS and capacity. This limitation is not exclusive to MARL but also extends to schedulers based on IQL, DQN, and Random schedulers, which are unable to concurrently optimize the performance of global access performance of integrated networks and the residual energy of NTN devices. In stark contrast, QMARL-based scheduler consistently maintains elevated levels of QoS, capacity, and residual energy. Figs. 8(c)–(d) display the remaining energy of the Sjisubscriptsuperscript𝑆𝑖𝑗S^{i}_{j}italic_S start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT and Alisubscriptsuperscript𝐴𝑖𝑙A^{i}_{l}italic_A start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT. The occurrence of non-operational NTN devices is attributed to the inefficiency in energy utilization by the benchmarks, including those based on MARL, IQL, DQN, and Random based schedulers. In contrast, the QMARL based scheduler consistently exhibits superior residual energy performance, ensuring the avoidance of any non-functional NTN devices. Additionally, the QMARL-based scheduler has higher residual energy of NTN devices compared to other benchmarks.

TABLE V: Total Normalized Converged Rewards
|𝒜|𝒜|\mathcal{A}|| caligraphic_A | QMARL MARL IQL DQN Random
21superscript212^{1}2 start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT 0.9971 1.0000 0.9411 0.9527 0.2755
24superscript242^{4}2 start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT 0.9813 1.0000 0.8267 0.9215 0.5452
𝟐𝟏𝟔superscript216\mathbf{2^{16}}bold_2 start_POSTSUPERSCRIPT bold_16 end_POSTSUPERSCRIPT 1.0000 0.4103 0.1730 0.2235 0.1390
Refer to caption
Refer to caption
(a) Distribution of reward values according to action dimensions |𝒜|𝒜|\mathcal{A}|| caligraphic_A |.
Refer to caption
(b) Converged rewards according to action dimensions |𝒜|𝒜|\mathcal{A}|| caligraphic_A |.
Refer to caption
(c) Normalized average residual energy of NTN devices with and without GS-specific capacity requirements.
Figure 9: Rewards due to action dimension and residual energy with or without the capacity requirements in each GS.

Figs. 9(a)-(b) and Table V provide a comparative analysis of the rewards obtained by GSs utilizing both the proposed algorithms and benchmarks across varying sizes of the action dimension, specifically for |𝒜|{21|\mathcal{A}|\in\{2^{1}| caligraphic_A | ∈ { 2 start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT, 24superscript242^{4}2 start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT, 216}2^{16}\}2 start_POSTSUPERSCRIPT 16 end_POSTSUPERSCRIPT }. The MARL-based scheduler exhibits superior reward outcomes at smaller action dimensions (|𝒜|{21,24}𝒜superscript21superscript24|\mathcal{A}|\in\{2^{1},2^{4}\}| caligraphic_A | ∈ { 2 start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT , 2 start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT }); however, it encounters significant difficulties at larger action dimension (|𝒜|=216𝒜superscript216|\mathcal{A}|=2^{16}| caligraphic_A | = 2 start_POSTSUPERSCRIPT 16 end_POSTSUPERSCRIPT), where its performance falls behind that of the QMARL based scheduler by 41.03%percent\%%, due to the curse of dimensionality. In a similar vein, IQL, DQN based schedulers yield outcomes that are analogous to those of a Random based scheduler at the largest action dimension (|𝒜|=216𝒜superscript216|\mathcal{A}|=2^{16}| caligraphic_A | = 2 start_POSTSUPERSCRIPT 16 end_POSTSUPERSCRIPT). Fig. 9(a) depicts a box plot summarizing the reward distribution across all action dimensions throughout the training process. The median reward is represented by the red line at the center of each box, with the lower and upper boundaries of the box indicating the 25%percent\%% and 75%percent\%%, respectively. Outliers are marked with a red ’+’ symbol. Notably, at the exceedingly large action dimension (|𝒜|=216𝒜superscript216|\mathcal{A}|=2^{16}| caligraphic_A | = 2 start_POSTSUPERSCRIPT 16 end_POSTSUPERSCRIPT), the QMARL-based scheduler achieves the highest reward, while the performance of other benchmarks deteriorates. Fig. 9(b) illustrates the converged normalized reward values according to the action dimensions. The utilization of larger action dimensions is deemed more realistic due to the inclusion of a greater number of CubeSats and HALE-UAVs, hence enhancing real-world applicability. In global access of integrated networks involving extensive deployment of CubeSats and HALE-UAVs, solely the QMARL-based scheduler achieves successful training outcomes, thereby evidencing a significant performance disparity in comparison to other benchmarks. These training results distinctly emphasize the exceptional capability of the QMARL based scheduler in addressing and mitigating the challenges posed by the curse of dimensionality.

Additionally, Fig. 9(c) shows the normalized average residual energy of NTN devices with and without GS-specific capacity requirements. The pink bar graph represents the average residual energy of CubeSats, and the beige bar graph represents the average residual energy of HALE-UAVs. In addition, the two bar graphs on the left are when there are no capacity requirements for each GS, and the two bar graphs on the right are when there are capacity requirements for each GS. If there are capacity requirements for each GS, unnecessary energy waste in NTN devices can be prevented. If the maximum capacity requirements are set differently for each GS depending on the region where the GS is located, the population of the region, and the degree of communication overload, the residual energy for CubeSat is 46.2%percent\%% and HALE-UAV is 38.7%percent\%% higher.

6 Concluding Remarks

This paper introduces a novel QMARL-based global SAGIN mobile access scheduler for CubeSats and HALE-UAVs, which aims at the maximization of access availability and energy efficiency. The CubeSats, characterized by their limited energy resources, employ energy efficiency strategies that differentiate between sun side and dark side orbital segments to conserve power. The reason why the quantum-based approach is utilized is that it can realize scheduling action dimension reduction. This attribute is particularly advantageous for ensuring the robust convergence of rewards in scenarios entailing extensive-scale actions, such as global access with considerable numbers of CubeSats and HALE-UAVs. The study’s experimental setup reflects real-world conditions by incorporating the orbital dynamics of CubeSats and the aerodynamic characteristics of HALE-UAVs, thereby underscoring the practical applicability of our proposed QMARL-based scheduler. Our performance evaluations with various aspects and benchmarks verify that our proposed scheduler can achieve desired performance improvements.

References

  • [1] J. Tang, J. Li, L. Zhang, X. Chen, K. Xue, Q. Sun, and J. Lu, “Opportunistic content-aware routing in satellite-terrestrial integrated networks,” IEEE Trans. Mobile Computing, pp. 1-15, 2024 (Early Access).
  • [2] Z. Luo, C. Wu, Z. Li, and W. Zhou, “Scaling GEO-Distributed Network Function Chains: A Prediction and Learning Framework,” IEEE J. Sel. Areas Commun., vol. 37, no. 8, pp. 1838–1850, Aug. 2019.
  • [3] S. Jung, M.-S. Lee, J. Kim, M.-Y. Yun, J. Kim, and J.-H. Kim, “Trustworthy handover in LEO satellite mobile networks,” ICT Express, vol. 8, no. 3, pp. 432–437, Sept. 2022.
  • [4] F. Tang, H. Zhang, and L. T. Yang, “Multipath Cooperative Routing with Efficient Acknowledgement for LEO Satellite Networks,” IEEE Trans. Mobile Computing, vol. 18, no. 1, pp. 179–192, Jan. 2019.
  • [5] S. S. Hassan, Y. M. Park, Y. K. Tun, W. Saad, Z. Han, and C. S. Hong, “Satellite-Based ITS Data Offloading & Computation in 6G Networks: A Cooperative Multi-Agent Proximal Policy Optimization DRL With Attention Approach,” IEEE Trans. Mobile Computing, vol. 23, no. 5, pp. 4956–4974, May 2024.
  • [6] Z. Ji, S. Wu, and C. Jiang, “Cooperative Multi-Agent Deep Reinforcement Learning for Computation Offloading in Digital Twin Satellite Edge Networks,” IEEE J. Sel. Areas Commun., vol. 41, no. 11, pp. 3414–3429, Nov. 2023.
  • [7] G. Pan, J. Ye, J. An, and M.-S. Alouini, “Latency Versus Reliability in LEO Mega-Constellations: Terrestrial, Aerial, or Space Relay?,” IEEE Trans. Mobile Computing, vol. 22, no. 9, pp. 5330–5345, Sept. 2023.
  • [8] Y. K. Tun, K. T. Kim, L. Zou, Z. Han, G. D ̵́an, and C. S. Hong, “Collaborative Computing Services at Ground, Air, and Space: An Optimization Approach,” IEEE Trans. Veh. Technol., vol. 73, no. 1, pp. 1491–1496, Jan. 2024.
  • [9] X. Feng, Y. Sun, and M. Peng, “Distributed Satellite-Terrestrial Cooperative Routing Strategy Based on Minimum Hop-Count Analysis in Mega LEO Satellite Constellation,” IEEE Trans. Mobile Computing, pp. 1–16, 2024 (Early Access).
  • [10] C. Dai, K. Zhu, and E. Hossain, “Multi-Agent Deep Reinforcement Learning for Joint Decoupled User Association and Trajectory Design in Full-Duplex Multi-UAV Networks,” IEEE Trans. Mobile Computing, vol. 22, no. 10, pp. 6056–6070, Oct. 2023.
  • [11] N. Qi, Z. Huang, F. Zhou, Q. Shi, Q. Wu, and M. Xiao, “Multi-Agent Deep Reinforcement Learning for Joint Decoupled User Association and Trajectory Design in Full-Duplex Multi-UAV Networks,” IEEE Trans. Mobile Computing, vol. 22, no. 10, pp. 6056–6070, Oct. 2023.
  • [12] P. Qi, X. Zhao, Y. Wang, R. Palacios, and A. Wynn, “Aeroelastic and Trajectory Control of High Altitude Long Endurance Aircraft,” IEEE Trans. Aerosp. Electron. Syst., vol. 54, no. 6, pp. 2992–3003, Dec. 2018.
  • [13] X. Dai, Z. Xiao, H. Jiang, and J. C. S. Lui, “UAV-Assisted Task Offloading in Vehicular Edge Computing Networks,” IEEE Trans. Mobile Computing, vol. 23, no. 4, pp. 2520–2534, Apr. 2024.
  • [14] X. Li, F. Tang, L. Fu, J. Yu, L. Chen, J. Liu, Y. Zhu, and L. T. Yang, “Optimized Controller Provisioning in Software-Defined LEO Satellite Networks,” IEEE Trans. Mobile Computing, vol. 22, no. 8, pp. 4850–4864, Aug. 2023.
  • [15] L. Huang, S. Bi, and Y.-J. A. Zhang, “Deep Reinforcement Learning for Online Computation Offloading in Wireless Powered Mobile-Edge Computing Networks,” IEEE Trans. Mobile Computing, vol. 19, no. 11, pp. 2581–2593, Nov. 2020.
  • [16] M. Tang and V. W. Wong, “Deep Reinforcement Learning for Task Offloading in Mobile Edge Computing Systems,” IEEE Trans. Mobile Computing, vol. 21, no. 6, pp. 1985–1997, Jun. 2022.
  • [17] G. S. Kim, J. Chung, and S. Park, “Realizing Stabilized Landing for Computation-Limited Reusable Rockets: A Quantum Reinforcement Learning Approach,” IEEE Trans. Veh. Technol., pp. 1–6, 2024 (Early Access).
  • [18] J. Cui, Y. Liu, and A. Nallanathan, “Multi-Agent Reinforcement Learning-Based Resource Allocation for UAV Networks,” IEEE Trans. Wirel. Commun., vol. 19, no. 2, pp. 729–743, Feb. 2020.
  • [19] S. Park, J. Chung, C. Park, S. Jung, M. Choi, S. Cho, and J. Kim, “Joint Quantum Reinforcement Learning and Stabilized Control for Spatio-Temporal Coordination in Metaverse,” IEEE Trans. Mobile Computing, pp. 1–18, 2024 (Early Access).
  • [20] H. Baek, S. Park, and J. Kim, “Logarithmic Dimension Reduction for Quantum Neural Networks,” in Proc. ACM Conf. Int. Knowl. Manage. (CIKM), Birmingham, UK, Oct. 2023, pp. 3738–3742.
  • [21] W. K. New, C. Y. Leow, K. Navaie, and Z. Ding, “Aerial-Terrestrial Network NOMA for Cellular-Connected UAVs,” IEEE Trans. Veh. Technol., vol. 71, no. 6, pp. 6559–6573, Jun. 2022.
  • [22] J.-H. Lee, J. Park, M. Bennis, and Y.-C. Ko, “Integrating LEO Satellites and Multi-UAV Reinforcement Learning for Hybrid FSO/RF Non-Terrestrial Networks,” IEEE Trans. Veh. Technol., vol. 72, no. 3, pp. 3647–3662, Mar. 2023.
  • [23] H. Hu, Z. Chen, F. Zhou, Z. Han, and H. Zhu, “Joint Resource and Trajectory Optimization for Heterogeneous-UAVs Enabled Aerial-Ground Cooperative Computing Networks,” IEEE Trans. Veh. Technol., vol. 72, no. 7, pp. 8812–8826, Jul. 2023.
  • [24] N. Babu, M. Virgili, C. B. Papadias, P. Popovski, and A. J. Forsyth, “Cost- and Energy-Efficient Aerial Communication Networks With Interleaved Hovering and Flying,” IEEE Trans. Veh. Technol., vol. 70, no. 9, pp. 9077–9087, Sept. 2021.
  • [25] Y. Wang, M. Sheng, W. Zhuang, S. Zhang, N. Zhang, R. Liu, and J. Li, “Multi-Resource Coordinate Scheduling for Earth Observation in Space Information Networks,” IEEE J. Sel. Areas Commun., vol. 36, no. 2, pp. 268–279, Feb. 2018.
  • [26] Z. Jia, M. Sheng, J. Li, D. Niyato, and Z. Han, “LEO-Satellite-Assisted UAV: Joint Trajectory and Data Collection for Internet of Remote Things in 6G Aerial Access Networks,” IEEE Internet Things J., vol. 8, no. 12, pp. 9814–9826, Jun. 2021.
  • [27] T. Ma, H. Zhou, B. Qian, N. Cheng, X. Shen, X. Chen, and B. Bai, “UAV-LEO Integrated Backbone: A Ubiquitous Data Collection Approach for B5G Internet of Remote Things Networks,” IEEE J. Sel. Areas Commun., vol. 39, no. 11, pp. 3491–3505, Nov. 2021.
  • [28] J. Li, G. Wu, T. Liao, M. Fan, X. Mao, and W. Pedrycz, “Task Scheduling Under a Novel Framework for Data Relay Satellite Network via Deep Reinforcement Learning,” IEEE Trans. Veh. Technol., vol. 72, no. 5, pp. 6654–-6668, May 2023.
  • [29] C. Park, G. S. Kim, S. Park, S. Jung, and J. Kim, “Multi-Agent Reinforcement Learning for Cooperative Air Transportation Services in City-Wide Autonomous Urban Air Mobility,” IEEE Trans. Intell. Veh., vol. 8, no. 8, pp. 4016–4030, Aug. 2023.
  • [30] R. Chen, J. Chen, H. Wang, X. Tong, Y. Xu, N. Qi, and Y. Xu, “Joint Channel Access and Power Control Optimization in Large-Scale UAV Networks: A Hierarchical Mean Field Game Approach,” IEEE Trans. Veh. Technol., vol. 72, no. 2, pp. 1982–1996, Feb. 2023.
  • [31] C. Park, W. J. Yun, J. P. Kim, T. K. Rodrigues, S. Park, S. Jung, and J. Kim, “Quantum Multi-Agent Actor-Critic Networks for Cooperative Mobile Access in Multi-UAV Systems,” IEEE Internet Things J., vol. 10, no. 22, pp. 20033–20048, Nov. 2023.
  • [32] O. Simeone, “An Introduction to Quantum Machine Learning for Engineers,” Found. Trends Signal Process., vol. 16, no. 1-2, pp. 1–223, Aug. 2022.
  • [33] S. Wojtowytsch and W. E, “Can Shallow Neural Networks Beat the Curse of Dimensionality? A Mean Field Training Perspective,” IEEE Trans. Artif. Intell., vol. 1, no. 2, pp. 121–129, Oct. 2020.
  • [34] S. Park, J. P. Kim, C. Park, S. Jung, and J. Kim, “Quantum Multi-Agent Reinforcement Learning for Autonomous Mobility Cooperation,” IEEE Commun. Mag., 2023 (Early Access).
  • [35] S. Park and J. Kim, C. Park, S. Jung, and J. Kim, “Quantum Reinforcement Learning for Large-Scale Multi-Agent Decision-Making in Autonomous Aerial Networks,” in Proc. IEEE VTS Asia Pac. Wirel. Commun. Symp. (APWCS), Taiwan, China, Aug. 2023, pp. 1–4.
  • [36] C. D. Perkins and R. E. Hage, Airplane Performance, Stability and Control, Wiley, Jan. 1991.
  • [37] S. Jung, W. J. Yun, M. Shin, J. Kim, and J.-H. Kim, “Orchestrated Scheduling and Multi-Agent Deep Reinforcement Learning for Cloud-Assisted Multi-UAV Charging Systems,” IEEE Trans. Veh. Technol., vol. 70, no. 6, pp. 5362–5377, Jun. 2021.
  • [38] A. R. S. Bramwell, D. Balmford, and G. Done, Bramwell’s Helicopter Dynamics, Elsevier, Apr. 2001.
  • [39] Y. Zeng, J. Xu, and R. Zhang, “Energy Minimization for Wireless Communication With Rotary-Wing UAV,” IEEE Trans. Wirel. Commun., vol. 18, no. 4, pp. 2329–2345, Apr. 2019.
  • [40] J. Lee, R. R. Mazumdar, and N. B. Shroff, “Non-convex optimization and rate control for multi-class services in the Internet,” IEEE/ACM Trans. Netw., vol. 13, no. 4, pp. 827–840, Aug. 2005.
  • [41] W. Du and S. Ding, “A survey on multi-agent deep reinforcement learning: From the perspective of challenges and applications,” Artif. Intell. Rev., vol. 54, no. 5, pp. 3215–3238, Nov. 2020.
  • [42] R. Lowe, Y. Wu, A. Tamar et al., “Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments,” in Adv. Neural Inf. Process. Syst. (NeurIPS), Long Beach, CA, Dec. 2017, pp. 6379–6390.
[Uncaptioned image] Gyu Seon Kim is currently a Ph.D. student at the Department of Electrical and Computer Engineering, Korea University, Seoul, Republic of Korea. He received a B.S. degree in aerospace engineering from Inha University, Incheon, Republic of Korea. His research focuses include deep reinforcement learning algorithms and their applications to autonomous mobility systems. He received the IEEE Seoul Section Student Paper Contest Award (2023).
[Uncaptioned image] Yeryeong Cho is currently an M.S. student at the Department of Electrical and Computer Engineering, Korea University, Seoul, Republic of Korea. She received a B.S. degree in Robotics & Convergence from Hanyang University, Ansan, Republic of Korea. She was with the Eco-friendly Smart System Technical Research Center, Incheon, Republic of Korea, from 2020 to 2022. Her research focuses include deep reinforcement learning algorithms and their applications to autonomous mobility systems.
[Uncaptioned image] Jaehyun Chung is currently an M.S. student at the Department of Electrical and Computer Engineering, Korea University, Seoul, Korea, where he received his B.S. in electrical engineering, in August 2023.
[Uncaptioned image] Soohyun Park (Member, IEEE) has been an assistant professor at Sookmyung Women’s University, Seoul, Korea, since March 2024. She was a postdoctoral scholar at the Department of Electrical and Computer Engineering, Korea University, Seoul, Korea, from September 2023 to February 2024, where she received her Ph.D. in electrical and computer engineering, in August 2023. She also received her B.S. in computer science and engineering from Chung-Ang University, Seoul, Korea, in February 2019. She was a recipient of ICT Express Best Reviewer Award (2021), IEEE Seoul Section Student Paper Contest Awards, and IEEE Vehicular Technology Society (VTS) Seoul Chapter Awards.
[Uncaptioned image] Soyi Jung (Member, IEEE) has been an assistant professor at Ajou University, Suwon, Korea, since September 2022. Before joining Ajou University, she was an assistant professor at Hallym University, Chuncheon, Korea, from 2021 to 2022; a visiting scholar at Donald Bren School of Information and Computer Sciences, University of California, Irvine, CA, USA, from 2021 to 2022; a research professor at Korea University, Seoul, Korea, in 2021; and a researcher at Korea Testing and Research (KTR) Institute, Gwacheon, Korea, from 2015 to 2016. She received her B.S., M.S., and Ph.D. degrees in electrical and computer engineering from Ajou University, Suwon, Korea, in 2013, 2015, and 2021. She was a recipient of IEEE Seoul Section Student Paper Contest Award (2018) and IEEE ICOIN Best Paper Award (2021).
[Uncaptioned image] Zhu Han (Fellow, IEEE) received the B.S. degree in electronic engineering from Tsinghua University, Beijing, China, in 1997, and the M.S. and Ph.D. degrees in electrical and computer engineering from the University of Maryland at College Park, College Park, MD, USA, in 1999 and 2003, respectively. From 2000 to 2002, he was a Research and Development Engineer with JDSU, Germantown, MD, USA. From 2003 to 2006, he was a Research Associate with the University of Maryland at College Park. From 2006 to 2008, he was an Assistant Professor with Boise State University, Boise, ID, USA. He is currently a John and Rebecca Moores Professor with the Electrical and Computer Engineering Department as well as the Computer Science Department, University of Houston, Houston, TX, USA. He also works with the Department of Computer Science and Engineering, Kyung Hee University, Seoul, South Korea. His main research targets on the novel game-theory-related concepts critical to enabling efficient and distributive use of wireless networks with limited resources. His other research interests include wireless resource allocation and management, wireless communications and networking, quantum computing, data science, smart grid, carbon neutralization, security, and privacy. Dr. Han received the NSF Career Award in 2010, the Fred W. Ellersick Prize of the IEEE Communication Society in 2011, the EURASIP Best Paper Award for the Journal on Advances in Signal Processing in 2015, the IEEE Leonard G. Abraham Prize in the field of Communications Systems (Best Paper Award in IEEE JSAC) in 2016, and several best paper awards in IEEE conferences. He was an IEEE Communications Society Distinguished Lecturer from 2015 to 2018 and has been an AAAS Fellow since 2019 and an ACM Distinguished Member since 2019. He has been a 1% Highly Cited Researcher since 2017 according to Web of Science. He is also the winner of the 2021 IEEE Kiyo Tomiyasu Award (an IEEE Field Award), for outstanding early to mid-career contributions to technologies holding the promise of innovative applications, with the following citation: “for contributions to game theory and distributed management of autonomous communication networks.”
[Uncaptioned image] Joongheon Kim (M’06–SM’18) has been with Korea University, Seoul, Korea, since 2019, where he is currently an associate professor at the School of Electrical Engineering. He received the B.S. and M.S. degrees in computer science and engineering from Korea University, Seoul, Korea, in 2004 and 2006; and the Ph.D. degree in computer science from the University of Southern California (USC), Los Angeles, CA, USA, in 2014. Before joining Korea University, he was a research engineer with LG Electronics (Seoul, Korea, 2006–2009), a systems engineer with Intel Corporation (Santa Clara, CA, USA, 2013–2016), and an assistant professor with Chung-Ang University (Seoul, Korea, 2016–2019). He serves as an editor for IEEE Transactions on Vehicular Technology and IEEE Internet of Things Journal. He was a recipient of Annenberg Graduate Fellowship from USC (2009), Intel Corporation Next Generation and Standards (NGS) Division Recognition Award (2015), IEEE Systems Journal Best Paper Award (2020), IEEE ComSoc Multimedia Communications Technical Committee (MMTC) Outstanding Young Researcher Award (2020), and IEEE ComSoc MMTC Best Journal Paper Award (2021). He also received IEEE ICOIN Best Paper Award (2021), IEEE ICTC Best Paper Award (2022), and IEEE Vehicular Technology Society (VTS) Seoul Chapter Awards.