
License: CC BY 4.0
arXiv:2402.18062v1 [cs.RO] 28 Feb 2024

Generative AI for Unmanned Vehicle Swarms: Challenges, Applications and Opportunities

Guangyuan Liu, Nguyen Van Huynh, Hongyang Du, Dinh Thai Hoang, Dusit Niyato, Kun Zhu, Jiawen Kang, Zehui Xiong, Abbas Jamalipour, and Dong In Kim

G. Liu and H. Du are with the School of Computer Science and Engineering, the Energy Research Institute @ NTU, Interdisciplinary Graduate Program, Nanyang Technological University, Singapore (e-mail: liug0022@e.ntu.edu.sg, hongyang001@e.ntu.edu.sg).
N. V. Huynh is with the Department of Electrical Engineering and Electronics, University of Liverpool, Liverpool, L69 3GJ, United Kingdom (e-mail: huynh.nguyen@liverpool.ac.uk).
D. T. Hoang is with the School of Electrical and Data Engineering, University of Technology Sydney, Australia (e-mail: Hoang.Dinh@uts.edu.au).
D. Niyato is with the School of Computer Science and Engineering, Nanyang Technological University, Singapore (e-mail: dniyato@ntu.edu.sg).
K. Zhu is with the College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China (e-mail: zhukun@nuaa.edu.cn).
J. Kang is with the School of Automation, Guangdong University of Technology, China (e-mail: kavinkang@gdut.edu.cn).
Z. Xiong is with the Pillar of Information Systems Technology and Design, Singapore University of Technology and Design, Singapore (e-mail: zehui_xiong@sutd.edu.sg).
A. Jamalipour is with the School of Electrical and Information Engineering, University of Sydney, Australia (e-mail: a.jamalipour@ieee.org).
D. I. Kim is with the College of Information and Communication Engineering, Sungkyunkwan University, South Korea (e-mail: dikim@skku.ac.kr).
Abstract

With recent advances in artificial intelligence (AI) and robotics, unmanned vehicle swarms have received great attention from both academia and industry due to their potential to provide services that are difficult or dangerous for humans to perform. However, learning and coordinating movements and actions for a large number of unmanned vehicles in complex and dynamic environments introduce significant challenges to conventional AI methods. Generative AI (GAI), with its capabilities in complex data feature extraction, transformation, and enhancement, offers great potential for addressing these challenges in unmanned vehicle swarms. To that end, this paper aims to provide a comprehensive survey of the applications, challenges, and opportunities of GAI in unmanned vehicle swarms. Specifically, we first present an overview of unmanned vehicles and unmanned vehicle swarms as well as their use cases and existing issues. Then, an in-depth background on various GAI techniques, together with their capabilities for enhancing unmanned vehicle swarms, is provided. After that, we present a comprehensive review of the applications and challenges of GAI in unmanned vehicle swarms with various insights and discussions. Finally, we highlight open issues of GAI in unmanned vehicle swarms and discuss potential research directions.

Index Terms:
Unmanned vehicle, unmanned vehicle swarm, generative AI, generative adversarial networks, variational autoencoder, IoT, and unmanned aerial vehicles.

I Introduction

In recent years, Unmanned Vehicles (UVs) have emerged as a disruptive technology, revolutionizing various sectors of daily life with applications spanning from package delivery and civilian Internet of Things (IoT) to military uses [1, 2]. Specifically, UVs refer to vehicles, devices, or machines that can operate with limited or without human intervention, e.g., without a human pilot or crew on board. Thanks to this special property, UVs can be used to perform tasks in challenging or hazardous environments. In general, UVs can be classified into Unmanned Aerial Vehicles (UAVs), Unmanned Ground Vehicles (UGVs), Unmanned Surface Vehicles (USVs), and Unmanned Underwater Vehicles (UUVs). As suggested by their names, each type of UV is designed for particular tasks and environments. For instance, UAVs are widely used for aerial photography and filming, environmental and wildlife monitoring, and surveillance [3, 4], while UGVs can be used for tasks such as transportation and bomb detection. In contrast, USVs and UUVs are used for surface and underwater operations, respectively, including oceanographic data collection, underwater exploration, and submarine surveillance [5, 6].

With recent advances in artificial intelligence (AI) and robotics, the concept of UVs has evolved to a whole new level, namely unmanned vehicle swarms. Essentially, an unmanned vehicle swarm is designed by coordinating a group of UVs, e.g., robots, drones, and other autonomous vehicles, to achieve a common objective [7, 8]. Practically, each vehicle in a swarm can be equipped with its own sensor, processor, and communication capability. To make them collaborate efficiently, advanced technologies in AI and robotics have been adopted to coordinate their behaviors and perform complex tasks such as autonomous navigation, self-organization, and failure management [7, 9]. As a result, unmanned vehicle swarms possess various advantages compared to conventional UVs. In particular, they offer scalability and flexibility in operations by dynamically adjusting the number of vehicles depending on specific missions and requirements. Moreover, if several UVs fail to operate in a swarm, the remaining UVs can still work together to ensure the success of their mission. This is particularly useful in missions requiring high levels of resilience and robustness. Finally, by allowing UVs to learn from and collaborate with each other, unmanned vehicle swarms can enable swarm intelligence, also known as collective intelligence [10, 11], which greatly improves operational efficiency and reliability.

Although playing an important role in unmanned vehicle swarms, conventional AI techniques still face a number of challenges. Particularly, these techniques require a large volume of labeled training data and can only obtain good performance under specific settings. As such, they are extremely vulnerable to the dynamics and uncertainty of the environment, which are characteristic of unmanned vehicle swarms, e.g., dynamic connections between unmanned vehicles, effects of winds and ocean currents, and sensor uncertainty and diversity in IoT applications. In addition, traditional AI methods may perform poorly in complex scenarios with a large number of UVs and in challenging environments such as underwater, remote regions, and disaster-affected areas. To overcome these challenges of conventional AI techniques, generative AI (GAI) has recently been widely adopted in the literature due to its groundbreaking abilities in understanding, capturing, and generating the distribution of complex and high-dimensional data. Given the potential of GAI in UV swarms, this paper aims to provide a comprehensive survey on the challenges, applications, and opportunities of GAI in enabling swarm intelligence from various perspectives.

There are a few surveys in the literature focusing on the applications of AI for UVs [12, 13, 14, 15]. For example, the authors in [12] study the applications of conventional AI techniques such as deep learning, deep reinforcement learning, and federated learning in UAV-based networks while the authors in [13] provide a more comprehensive survey on applications of machine learning (ML) in operations and communications of UAVs. Differently, in [15], the authors provide a review on AI-enabled UAV optimization methods in IoT networks, focusing on AI for UAV communications, swarm routing and networking, and collision avoidance. Similarly, applications of AI/ML for UAV swarm intelligence are also discussed in [7]. It is worth noting that the aforementioned surveys and others in the literature mainly focus on UAVs and traditional AI methods. To the best of our knowledge, there is no survey in the literature comprehensively covering the development of GAI for UV swarms. The main contributions of our paper can be summarized as follows.

  • We provide the fundamentals of UV swarms, including their designs and operations across the aerial, ground, surface, and underwater domains as well as practical use cases.

  • We provide an in-depth overview of common GAI techniques, including generative adversarial networks (GANs), variational autoencoders (VAEs), generative diffusion models, transformers, and normalizing flows. The key advantages and challenges of each technique in the context of UV swarms are also highlighted in detail.

  • We comprehensively review applications of GAI to various problems in UV swarms such as state estimation, environmental perception, task/resource allocation, network coverage and peer-to-peer communications, and security and privacy. Through reviewing these GAI applications, we provide insights into how GAI can be applied to address emerging problems in UV swarms.

  • We present essential open issues and future research directions of GAI in UV swarms, including scalability, adaptive GAI, explainable swarm intelligence, security/privacy, and heterogeneous swarm intelligence.

The overall structure of this paper is illustrated in Fig. 1. Section II provides the fundamentals of UV swarms. An in-depth overview of different GAI techniques and their advantages is presented in Section III. Then, Section IV delves into the applications of GAI for emerging problems in UV swarms. Open issues and future research directions of GAI in UV swarms are highlighted in Section V. Section VI concludes the paper. Additionally, Table I lists all the abbreviations used in the paper.

Figure 1: The overall structure of this paper.

TABLE I: List of Abbreviations

UAV:  Unmanned Aerial Vehicle             UGV: Unmanned Ground Vehicle
USV:  Unmanned Surface Vehicle            UUV: Unmanned Underwater Vehicle
VTOL: Vertical Take-Off and Landing       AI:  Artificial Intelligence
GAI:  Generative Artificial Intelligence  GAN: Generative Adversarial Network
VAE:  Variational Autoencoder             GDM: Generative Diffusion Model
NF:   Normalizing Flow                    DRL: Deep Reinforcement Learning
UV:   Unmanned Vehicle                    LLM: Large Language Model

II Unmanned Vehicle Swarms

UVs operate across aerial, ground, surface, and underwater domains [16], as illustrated in Fig. 2. They harness advanced technologies like AI and robotics to perform diverse tasks, often in challenging or hazardous environments. By forming coordinated groups, or swarms, these systems achieve common objectives with enhanced efficiency [17]. UV swarms are categorized primarily by their degree of autonomy, which ranges from fully autonomous systems, capable of independent operation based on pre-programmed protocols and real-time data, to semi-autonomous systems that require some level of human intervention. UV swarms can also be classified by their hierarchical structure, i.e., single-layered or multi-layered configurations, which is integral to the operational dynamics of these swarms. In a multi-layered swarm, designated leader vehicles orchestrate the activities of the collective. These leaders communicate with a central server station, which constitutes the peak of the operational hierarchy [18]. Each UV within the swarm, equipped with substantial computational capability, is assigned specific data collection and processing roles. The centralization of mass data processing, facilitated either through advanced server stations or cloud-based computing solutions, significantly improves the efficiency of data management and task execution.

The diverse applications of UV swarms in various sectors underscore their transformative impact. These applications range from enhancing military operations to revolutionizing agricultural practices, showcasing the blend of agility, precision, and operational efficiency that these systems bring to modern society:

  • Surveillance and Monitoring: UVs are widely used for security, surveying, monitoring, and surveillance purposes. They offer unparalleled efficiency in covering wide areas, reducing manpower requirements, and providing real-time response capabilities, especially in detecting and alerting on movements and changes in the environment [19, 20, 21, 22].

  • Environmental Conservation and Management: UVs are actively engaged in environmental monitoring and mapping. They contribute significantly to environmental conservation efforts, including the study of ocean biological phenomena and disaster prediction and management [23, 24, 25, 26, 27]. In events like tsunamis and hurricanes, these systems prove critical, especially in areas rendered inaccessible. They play a vital role in damage assessment, emergency response, and the planning and execution of effective disaster management strategies [28, 29].

  • Entertainment and recreational events: UVs have revolutionized entertainment and recreational activities by introducing innovative applications such as sky painting and writing [30, 31]. In addition to this, in the realm of filmmaking and video production, the agility and precision of these UVs enable filmmakers to capture complex scenes in challenging environments. They offer unique perspectives and camera angles, previously unattainable or prohibitively expensive with traditional filming methods, adding a new dimension to storytelling and cinematography [32, 33].

  • Healthcare: In healthcare, UAVs have become a game changer. Beyond delivering blood and medicine to remote or disaster-affected areas [34, 35, 36], specially designed medical drones, such as Automated External Defibrillator (AED)-equipped drones, are also deployed to save lives outside hospitals [37, 38]. Their ability to quickly and efficiently transport essential medical supplies and equipment has significantly improved access to healthcare services.

  • Industrial Automation: Automated vehicles and robots in warehouses significantly reduce manpower requirements and improve logistical processes, demonstrating a smarter approach to internal transport and package sorting [39]. Concurrently, UV swarms are transforming package delivery, providing dynamic and efficient direct-to-customer delivery solutions [40, 41, 42, 43].


Figure 2: Infrastructure of UV systems and their applications.

II-A Unmanned Aerial Vehicles

UAVs, also known as drones, constitute a class of aerial robots designed to operate collaboratively to achieve a diverse array of objectives. These objectives span the spectrum from military applications to an expanding suite of civilian and commercial uses. UAVs are typically equipped with multiple rotors capable of VTOL [44], enabling them to hover, ascend, and descend vertically. The operational control of these vehicles may be manual, via remote piloting, or autonomous, through the integration of sophisticated onboard processing units. Beyond the common applications that UV swarms offer, the deployment of UAV swarms in recreational activities has been noteworthy. One interesting application leveraging the flexible movement of UAVs is drone light shows, which involve a fleet of drones, often equipped with lights, flying in a coordinated manner to create shapes, patterns, or text in the sky [45]. Another key application of UAV swarms is in post-disaster communication network reconstruction. Utilizing UAVs' mobility, affected areas can quickly reconnect with the outside world by employing UAVs as movable base stations [46, 47, 48, 49].

II-B Unmanned Ground Vehicles

UGVs are pivotal components in UV systems, especially for transportation and logistics. Their application in platoon formation is crucial for enhancing the efficiency and safety of transport systems. Besides being classified by degree of autonomy, UGVs can also be categorized by their mobility mechanisms into wheeled, tracked, and legged variants [50]. Tracked UGVs excel in rugged terrains, while legged UGVs are effective in obstacle-rich environments, which is beneficial for tasks like search and rescue. UGVs have been increasingly incorporated into various applications, leveraging their autonomous capabilities for critical and practical tasks. Beyond the aforementioned common applications of UV swarms, collaboration between UAVs and UGVs leverages the strengths of both vehicle types to achieve common goals. For instance, while UAVs can explore areas at high speed using their cameras, UGVs, capable of traversing a wide variety of terrains, complement UAVs by providing mobile recharging stations and transportation between mission objectives, thus enhancing operational efficiency and extending mission capabilities [51].

II-C Unmanned Surface Vehicles

USVs, also known as autonomous boats, are specialized robotic platforms that operate on water surfaces to perform a variety of tasks. Their operational domain spans from disaster management to environmental monitoring and defense applications. Compared to UAVs and underwater vehicles, USVs have an advantage in terms of reliable access to Global Positioning System (GPS) data and superior communication capabilities, making them ideal for real-time operations and long-term missions [52]. This advantage has made USVs instrumental in conducting water depth surveys and examining migration patterns and changes within major ecosystems [53]. USVs also facilitate research activities that require heterogeneous UV cooperation, including integration with other UVs [25]. With the ability to generate power from solar, wind, and waves, USVs can serve as durable platforms for collecting extensive data across various locations and act as mobile communication and refueling relays for other vehicles [54, 26].

II-D Unmanned Underwater Vehicles

UUVs, also known as submersible drones, are specialized aquatic robots designed for a wide range of underwater tasks. Initially focused on military and scientific uses, their applications have expanded into civilian and recreational areas [55, 6]. UUVs are categorized into Remotely Operated Vehicles (ROVs), connected to operators via cables and used for detailed tasks like observation and maintenance, and Autonomous Underwater Vehicles (AUVs), which operate independently for various measurements, including submarine volcano monitoring and seabed surveys [56, 6]. A notable application of UUVs is in inspecting and maintaining marine structures, providing essential data for the construction industry and ensuring the integrity of underwater infrastructure, such as oil pipelines and undersea cables [57]. Additionally, UUVs are uniquely suited for extreme environments, such as ice-covered regions, nuclear facilities, and other areas where human access is dangerous or impossible, showcasing their versatility and critical role in challenging underwater operations [57, 55].

UVs are revolutionizing various industries by simplifying traditional operations and enabling new activities in the air, on land, at sea, and underwater. The growing number of UVs lays a foundation for generative AI to build upon, promising to significantly improve autonomy and coordination among swarms.

III Generative AI

III-A An Overview


Figure 3: Generative AI vs Discriminative AI

GAI represents a paradigm shift in AI technology, characterized by its ability to produce novel and meaningful content, such as text, images, audio, and 3D models. As illustrated in Fig. 3, unlike discriminative models that focus on classification or prediction, GAI models are adept at interpreting instructions and generating tangible outputs, a distinction that marks a significant leap in AI capabilities [58]. These advanced models capture a deep understanding of data patterns and structures, enabling them to not only replicate but also innovate within the learned frameworks. This innovation is evident in GAI's diverse applications, ranging from realistic image [59] and text creation [60] to complex 3D model generation [61].

GAI is revolutionizing traditional methods, offering transformative impacts in domains such as medical and engineering education through personalized learning support and intelligent tutoring systems [62]. Similarly, in visual content generation, GAI's potential for causal reasoning is being explored; the ability to reason about causality is crucial for many applications, such as robotics, autonomous driving, and medical diagnosis [63, 64].

Moreover, GAI's influence extends to business model innovation, where its applications in various industries, including software engineering, healthcare, and financial services, are reshaping traditional business models [65]. This versatility of GAI not only underscores its role as a tool for creativity and innovation but also highlights its potential to drive significant advancements across multiple sectors. In summary, the primary distinction between GAI and traditional discriminative AI lies in their capabilities, i.e., GAI is designed to create and innovate, generating new content, while discriminative AI is more focused on data classification and extraction [66].

III-B Typical GAI Models


Figure 4: Overviews of Generative Models: Common Abilities and Structural Overviews of GANs, VAEs, GDMs, Normalizing Flows, and Transformers

III-B1 GAN

GANs represent a significant advancement in both semi-supervised and unsupervised learning. Conceptualized by Goodfellow et al. [67] in 2014, GANs involve training two networks simultaneously: a generator and a discriminator. The generator's role is to produce data that mimics real data, while the discriminator acts as a classifier, distinguishing between real and generated data. This dynamic between the two networks forms the core of the GAN model, where the generator aims to create data realistic enough to confuse the discriminator, and the discriminator improves its ability to identify fake data. Ideally, this process converges to a Nash equilibrium, where the generator produces increasingly realistic data and the discriminator becomes more adept at identifying forgeries. This effectively utilizes a supervised learning approach to achieve unsupervised learning outcomes by generating synthetic data that appears authentic [68, 69]. Despite these achievements, GAN training remains challenging, primarily due to training instability: the generator and discriminator must be optimized through alternating or simultaneous gradient descent. The GAN architecture faces several issues, such as difficulty in reaching a Nash equilibrium [70], mode collapse, and vanishing gradients [71]. To address these challenges, numerous solutions have been proposed, such as unrolled GANs [72], mini-batch discrimination, historical averaging, feature matching [73], the two time-scale update rule [74], and self-attention GANs [75]. These developments have been crucial in stabilizing GAN training over the years.

As shown in Fig. 4, GANs excel in generating high-quality samples and achieving fast sampling primarily due to their unique adversarial training mechanism. In a GAN, the generator and discriminator networks engage in a continuous competitive process, where the generator learns to produce increasingly realistic samples to deceive the discriminator. In UV swarm applications, this competitive process not only ensures the generation of realistic samples but also aids in the creation of varied and complex environmental simulations crucial for training UVs. Moreover, the efficiency of GANs in sample generation becomes significant when considering the computational constraints and the need for rapid decision-making in UV swarms; a trained generator is able to produce new samples through simple forward inference. Lastly, GANs' capacity to learn a rich and diverse latent space is essential for UV swarms, as it allows for the generation of varied scenarios and conditions that are crucial for the robust training of these systems. In summary, the unique features of GANs, i.e., high-quality sample generation, efficiency in producing new samples, and the ability to learn a rich, diverse latent space, are particularly beneficial for UV swarms, enhancing their adaptability, efficiency, and reliability in dynamic and challenging environments.
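
To make the adversarial dynamic concrete, the following is a minimal GAN training-loop sketch in PyTorch. The two-dimensional "real" data source, network sizes, and hyperparameters are illustrative assumptions, not taken from any of the surveyed works.

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 2

generator = nn.Sequential(
    nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, data_dim))
discriminator = nn.Sequential(
    nn.Linear(data_dim, 64), nn.LeakyReLU(0.2), nn.Linear(64, 1))

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def sample_real(batch):
    # Placeholder for real swarm data (e.g., measured trajectories)
    return torch.randn(batch, data_dim) * 0.5 + 2.0

for step in range(1000):
    real = sample_real(128)
    fake = generator(torch.randn(128, latent_dim))

    # Discriminator step: push real -> 1, generated -> 0
    loss_d = bce(discriminator(real), torch.ones(128, 1)) + \
             bce(discriminator(fake.detach()), torch.zeros(128, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator step: fool the discriminator (generated -> 1)
    loss_g = bce(discriminator(fake), torch.ones(128, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```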

III-B2 VAE

VAEs are deep latent space generative models that fundamentally learn the distribution of data to generate new, meaningful data with more intra-class variations. Similar to GANs, VAEs consist of two interlinked yet independently parameterized components: the encoder and the decoder. The encoder provides the decoder with an estimation of its posterior over latent variables. This estimation is crucial for the decoder to update its parameters during iterations of “expectation maximization” learning. Conversely, the decoder forms a framework that assists the encoder in learning meaningful data representations, which may include class-labels. The encoder essentially serves as an approximate inverse to the generative model, in line with Bayes’ rule [76]. The training of VAEs involves optimizing the Evidence Lower Bound (ELBO), which balances reconstruction accuracy and the similarity of the latent space distribution with the target distribution [77].
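
To make the ELBO objective concrete, the sketch below pairs a reconstruction term with the KL divergence between the approximate posterior q(z|x) and a standard normal prior, using the reparameterization trick. The architecture and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, x_dim=32, z_dim=4):
        super().__init__()
        self.enc = nn.Linear(x_dim, 64)
        self.mu = nn.Linear(64, z_dim)
        self.logvar = nn.Linear(64, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(),
                                 nn.Linear(64, x_dim))

    def forward(self, x):
        h = torch.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: z = mu + sigma * eps
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec(z), mu, logvar

def negative_elbo(x, x_hat, mu, logvar):
    recon = F.mse_loss(x_hat, x, reduction="sum")  # reconstruction term
    # KL divergence between q(z|x) = N(mu, sigma^2) and the prior N(0, I)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl  # minimized during training
```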

In the context of data augmentation, VAEs are valuable for their ability to increase the variance of a dataset, particularly in domains with limited ideal training samples [76]. Although VAEs can avoid issues like non-convergence and mode collapse, which are common in other generative models like GANs, the samples generated by VAEs tend to be of lower quality than those of GANs. Representation learning is another significant application of VAEs. This task traditionally involves transforming raw data into more advanced representations, often requiring significant human expertise and effort. VAEs automate this process by learning mappings from a high-dimensional space to a meaningful low-dimensional embedding [78].

In UV swarm applications, VAEs stand out for their stability and reliability. Compared to GANs, VAEs can mitigate issues like mode collapse, making them a more stable choice for generating training simulations [79]. This stability is crucial in UV swarms, where consistent and varied environmental modeling is essential for thorough system training. The robustness of VAEs in generating a wide range of scenarios without the risk of model collapse ensures that UV systems are exposed to a comprehensive set of conditions, enhancing their adaptability and preparedness for real-world operations.

III-B3 GDM

Different from GANs and VAEs, GDMs utilize a two-stage process involving both forward and reverse diffusion [80, 81]. A diffusion model is a parameterized Markov chain trained using variational inference to produce samples matching the data after finite time [82]. In the forward diffusion stage, these models gradually add Gaussian noise to input data over multiple steps, progressively degrading the data's structure. During the reverse diffusion stage, the model learns to methodically reverse this process, sequentially predicting and removing the noise, often under the guidance of a prompt, and thereby reconstructing a new data sample. The noise removed at each step is estimated via a neural network, such as a U-Net architecture, to ensure dimension preservation [83].

Research on diffusion models has shown promising results in various computer vision tasks, with three primary subcategories being Denoising Diffusion Probabilistic Models (DDPM), Noise-Conditioned Score Networks (NCSN), and Stochastic Differential Equations (SDE) [84]. DDPMs excel in creating diverse and high-quality images and serve as the foundational structure for well-known models like Stable Diffusion [85] and the DALL-E series [86, 87, 88]. NCSNs are the core technology behind Deepfake [89], which is deployed to produce realistic altered images and videos through score matching and noise-level training. SDE-based approaches utilize forward and reverse SDEs for robust and theoretically sound generation strategies and are often applied in models like DiffFlow [90]. These models have achieved remarkable results in image generation, surpassing GANs in the diversity of generated samples, but the need for multiple noise-addition and denoising steps during inference makes them slower than GANs and less efficient than VAEs in image production. To address this efficiency challenge, a key research focus is on enhancing sampling efficiency. An example is the development of Denoising Diffusion Implicit Models (DDIM) [91], which improve sampling speed by allowing fewer steps in generating samples without significantly compromising the quality of the generated images.
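
The forward-noising and noise-prediction training described above can be sketched compactly. The snippet below uses the closed-form DDPM corruption q(x_t | x_0) with a linear noise schedule and trains a small network to predict the injected noise; the schedule, toy data dimensionality, and network are illustrative assumptions.

```python
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative product of (1 - beta_t)

# Toy noise predictor: takes (x_t, normalized t) and predicts the noise
eps_model = nn.Sequential(nn.Linear(2 + 1, 128), nn.ReLU(), nn.Linear(128, 2))
opt = torch.optim.Adam(eps_model.parameters(), lr=1e-3)

def training_step(x0):                           # x0: (batch, 2) clean samples
    t = torch.randint(0, T, (x0.shape[0],))
    eps = torch.randn_like(x0)
    a_bar = alphas_bar[t].unsqueeze(1)
    # Forward diffusion in closed form: x_t = sqrt(a_bar)*x0 + sqrt(1-a_bar)*eps
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps
    t_in = (t.float() / T).unsqueeze(1)          # crude timestep embedding
    loss = ((eps_model(torch.cat([x_t, t_in], dim=1)) - eps) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```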

In the context of UV swarms, GDMs can be particularly useful in generating highly detailed and diverse environmental simulations for training UV systems [92]. Unlike GANs that often suffer from mode collapse, or VAEs that sometimes generate low quality images, GDMs can produce samples with a high level of detail and variation. This is crucial for UV swarms to train in realistic and varied conditions [93]. On the other hand, the iterative process of GDMs that involves multiple steps of noise addition and removal, results in a slower and less efficient sample generation compared to GANs and VAEs. However, this trade-off is justifiable in scenarios such as search and rescue operations, military reconnaissance, and environmental monitoring, where the utmost realism and detail in training simulations are paramount for the effective functioning of UV swarms.

III-B4 Transformer

Transformer models [94] have become fundamental to numerous state-of-the-art generative models. Especially in natural language processing, Transformer-based Large Language Models (LLMs) such as the GPT series [95, 96, 97], Bidirectional Encoder Representations from Transformers (BERT) [98], and Bard (https://ai.google/static/documents/google-about-bard.pdf, accessed Jan 3, 2024) have demonstrated their ability to capture large corpora of information [99]. Although both VAE and Transformer architectures feature an encoder-decoder design, their functionalities diverge significantly. In transformers, the encoder processes the input sequence, capturing complex dependencies, and the decoder generates the output sequence, often leveraging self-attention mechanisms to focus on different parts of the input data. Both the encoder and decoder consist of multiple layers of attention mechanisms and feed-forward neural networks, enabling them to handle complex sequence data effectively [100].
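
The self-attention mechanism at the heart of these encoder and decoder layers can be written in a few lines. The following sketch implements scaled dot-product self-attention over a feature sequence; the model width and input shapes are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, d_model=64):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)

    def forward(self, x):                          # x: (batch, seq_len, d_model)
        Q, K, V = self.q(x), self.k(x), self.v(x)
        scores = Q @ K.transpose(-2, -1) / math.sqrt(x.shape[-1])
        weights = torch.softmax(scores, dim=-1)    # attention over the sequence
        return weights @ V

attn = SelfAttention()
out = attn(torch.randn(2, 10, 64))  # e.g., 10 timesteps of fused sensor features
```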

Transformers can offer more than just standalone models. When integrated into other generative models, they introduce essential mechanisms and techniques such as attention, self-attention, multi-head attention, and positional encoding [101]. This integration has unlocked practical applications of transformers in various domains, including text generation for creative writing [102], chatbots [103], code generation [104], and programming assistance [105]. Notably, incorporating transformers into other GAI models has led to significant advancements in image synthesis. Pure transformer-based architectures in GANs, such as ViTGAN [106] and STransGAN [107], have successfully synthesized high-resolution images without the need for convolutional neural networks. These developments demonstrate the versatility and expanding capabilities of transformers in generative tasks, extending their utility from text-based applications to sophisticated image synthesis.


Figure 5: Exploring the Spectrum of Innovation: This illustration presents 12 groundbreaking model structures, featuring two distinct approaches per aspect, to demonstrate the diverse applications of GAI in enhancing the performance and addressing challenges in UV swarms. Each model encapsulates unique strategies and solutions, offering a comprehensive overview of the technological advancements in this field.

In the realm of UV swarms, the transformer architecture has become a key asset due to its proficiency in processing sequential data and managing long-range dependencies, greatly enhancing tasks that require intricate decision-making based on comprehensive data streams [108]. In UV swarm operations, each unit may need to process and respond to a vast array of sensor inputs, communications from other units, and environmental factors, making the transformer’s ability to analyze this sequence data and identify crucial dependencies a critical resource in aiding the rapid decision-making process [109]. Furthermore, the generative capabilities of transformers have shown great potential to create highly detailed and context-aware simulations for training or mission planning in UV swarms. Models such as GPT-4 could prove instrumental in generating realistic, complex scenarios for training UVs and enhancing their preparedness for real-world operations with their enormous parameter count and the ability to handle both text and image prompts [110].

III-B5 Normalizing Flow

Normalizing Flows (NF) are a class of generative models that stand out due to their capability to produce tractable distributions where both sampling and density evaluation can be efficient and exact [111]. They are characterized by transforming a simple probability distribution, such as a standard normal distribution, into a more complex one through a sequence of invertible and differentiable mappings. This transformation allows for the evaluation of the density of a sample by reverting it back to the original simple distribution and calculating the product of the density of the inverse-transformed sample and the associated change in volume induced by the sequence of inverse transformations [112].

A key advantage of NFs over other generative models is their inherent invertibility. This feature enables exact reverse mapping, which is crucial for efficiently and accurately evaluating the density of generated samples – a challenging task in other generative models like GANs or VAEs [113]. The invertibility of NFs also enables both efficient sampling and exact density estimation, making them highly versatile. Unlike GANs, which may suffer from training instabilities such as mode collapse, NFs offer a more stable training process [114]. Additionally, NFs can reconstruct data with higher fidelity compared to the approximate reconstruction in VAEs [111].
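
The exact density evaluation that distinguishes NFs follows directly from the change-of-variables formula. The sketch below implements a single invertible affine layer with a closed-form log-determinant; real flows stack many such layers (e.g., coupling layers), and the parameters here are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn

class AffineFlow(nn.Module):
    """One invertible layer: x = z * exp(s) + b, with base density N(0, I)."""
    def __init__(self, dim=2):
        super().__init__()
        self.s = nn.Parameter(torch.zeros(dim))  # log-scale
        self.b = nn.Parameter(torch.zeros(dim))  # shift

    def log_prob(self, x):
        z = (x - self.b) * torch.exp(-self.s)    # exact inverse transform
        log_base = -0.5 * (z ** 2).sum(-1) - 0.5 * z.shape[-1] * math.log(2 * math.pi)
        return log_base - self.s.sum()           # subtract log|det Jacobian|

    def sample(self, n):                         # exact, efficient sampling
        z = torch.randn(n, self.s.shape[0])
        return z * torch.exp(self.s) + self.b

flow = AffineFlow()
print(flow.log_prob(flow.sample(5)))             # exact likelihoods of samples
```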

NFs have been deployed in various applications, including image generation [115], noise modelling [116], video generation [117], audio generation [118], and graph generation [119]. In the context of UV swarms, Graphical Normalizing Flow (GNF) models can be utilized for anomaly detection [120]. GNF leverages Bayesian networks to identify relationships among time series components and performs density estimation in the low-density regions of a distribution where anomalies typically occur. Compared to other GAI models, the key advantage of NF models lies in their capability to provide exact likelihood estimation, which offers a precise, stable, and efficient solution for identifying and responding to various anomalies in UV systems [93].

IV Solutions to Unmanned Vehicle Swarms Utilizing GAI

IV-A State Estimation

TABLE II: Generative AI in State Estimation

  • GAN [121]: GAN is deployed to estimate traffic states by generating realistic and diverse samples from sparse and incomplete traffic data. Pros: accurate spatial-temporal correlation capture; adapts to dynamic changes. Cons: high memory and computing needs; limited generalization.

  • GAN [122]: GAN is used to construct a harmonic state estimation model, where the mechanism equation is integrated into the objective function to include power grid information. Pros: accurate source localization. Cons: potential GAN-related issues like mode collapse.

  • GAN [123]: Conditional GAN (cGAN) is employed to refine state estimates by learning the mapping between raw estimates and true states. Pros: improved estimation accuracy. Cons: sensitive to hyperparameter changes; large training data required.

  • GAN [124]: cGANs are employed to fuse individual and global motion for enhanced multiple object tracking. Pros: enhanced tracking accuracy. Cons: computational overhead; training data dependency.

  • VAE [125]: VAE is utilized to capture the temporal correlations in the wireless channels of UAVs for improved channel state estimation. Pros: reduces signal transmission needs. Cons: increased computational resources; model hyperparameter sensitivity.

  • Diffusion [126]: Diffusion-based score models are employed to exploit the natural correlations in MIMO channels for effective state estimation. Pros: enhanced channel estimation. Cons: computationally intensive; noise level impacts accuracy.

  • Others [127]: Deep normalizing flows are employed to capture the structure and intricacies of the state distribution for state estimation. Pros: flexible complex distribution modeling. Cons: high computational demand; large data requirement.

State estimation is critical to UV swarm applications, especially in fields like autonomous driving and traffic estimation. State variables such as position, velocity, and orientation play a crucial role in a vehicle's decision-making during navigation and trajectory planning [128]. However, the stochastic nature of system measurements and robot dynamics can lead to uncertainty about the actual state. Therefore, the primary objective of state estimation is to deduce the distribution over state variables based on the observations available over time [127].

As shown in Table II, the integration of GAI in state estimation for UVs offers a broad range of innovative methodologies, each tailored to specific challenges and operational contexts. For instance, in addressing the challenge of data insufficiency in traffic state estimation for UGVs, the authors in [121] utilize graph-embedding GANs to generate realistic traffic data for underrepresented road segments by capturing the spatial interconnections within road networks. In this framework, the generator uses embedded vectors from similar road segments to simulate real traffic data, while the discriminator distinguishes between this synthesized and actual data; the two components are trained iteratively until the generated data is statistically indistinguishable from real data. This methodology not only fills data gaps but also significantly enhances estimation accuracy, as evidenced by a reduction in mean absolute error compared to traditional models like Deeptrend2.0 [129]. Such an advancement in traffic state estimation underscores GAI's potential in improving UGV navigation and decision-making in complex traffic scenarios [121].

In addition to the standard GAN, cGANs can be used to generate the corresponding estimated system state variables given the raw measurements [123]. The challenge of accurately estimating the motion of multiple UAVs in dynamic environments is addressed using a cGAN framework that employs raw sensor measurements as conditional constraints. The authors in [124] combine individual motion predictions from a Social LSTM network [130] with global motion insights from a Siamese network [131] to achieve a comprehensive motion state prediction. This method excels in accurately forecasting UAV trajectories, which is crucial for effective swarm navigation. By effectively disentangling and fusing individual and global motions, the cGAN-based framework improves multiple object tracking performance compared to the original Social LSTM.
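
In the spirit of the cGAN-based refinement in [123], the hedged sketch below conditions both the generator and the discriminator on raw measurements, so the discriminator judges (measurement, state) pairs rather than states alone. The networks, dimensions, and training data are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

meas_dim, state_dim = 6, 4
G = nn.Sequential(nn.Linear(meas_dim, 64), nn.ReLU(), nn.Linear(64, state_dim))
D = nn.Sequential(nn.Linear(meas_dim + state_dim, 64), nn.LeakyReLU(0.2),
                  nn.Linear(64, 1))
bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)

def train_step(meas, true_state):
    """One batch of paired (raw measurement, ground-truth state) data."""
    n = meas.shape[0]
    refined = G(meas)  # generator maps raw measurements to refined states
    # Discriminator: accept true (meas, state) pairs, reject generated ones
    loss_d = bce(D(torch.cat([meas, true_state], dim=1)), torch.ones(n, 1)) + \
             bce(D(torch.cat([meas, refined.detach()], dim=1)), torch.zeros(n, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Generator: make refined states indistinguishable from true states
    loss_g = bce(D(torch.cat([meas, refined], dim=1)), torch.ones(n, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```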

TABLE III: Generative AI in Environmental Perception

  • GAN [132]: Latent Encoder Coupled GAN (LE-GAN) is introduced to achieve efficient hyper-spectral image super-resolution. Pros: effective for image super-resolution. Cons: risk of mode collapse and spectral-spatial distortions.

  • GAN [133]: Trajectory GAN (TraGAN) is used to create realistic and intuitive lane change trajectories from recorded highway traffic data. Pros: learns trajectory parameters automatically without labels. Cons: needs large, high-quality datasets and is sensitive to hyperparameters.

  • GAN [134]: The DeepRoad framework, which deploys GANs for metamorphic testing and input validation in autonomous driving systems, is proposed. Pros: improves the reliability of testing by addressing scene diversity. Cons: synthetic images might be insufficient due to generation limitations.

  • GAN [135]: A cGAN-based framework is deployed for automatic change detection in UAV and remote sensing images. Pros: shows promising results in automatic change detection. Cons: potential need for large training data sets.

  • VAE [133]: Trajectory VAE (TraVAE) is used to generate synthetic lane change trajectory data from real traffic records. Pros: learns intuitive latent parameters for lane changes. Cons: higher reconstruction error than TraGAN.

  • VAE [136]: VAE is deployed to generate synthetic crash data from real traffic data to augment the dataset. Pros: produces realistic and diverse crash data. Cons: may require strong assumptions, leading to suboptimal models.

  • VAE [137]: An image translation framework based on VAEs and GANs is used to translate simulated images into realistic synthetic images for training and testing change detection models. Pros: generates labeled data with various imaging challenges. Cons: real images still needed as references; may not cover all domain gaps.

  • VAE [138]: The VAE encodes the environment image from pixel space into a smaller-dimensional latent space before the diffusion process, and afterwards generates the final image by converting the representation back into pixel space. Pros: compresses high-dimensional data into a lower-dimensional latent space; reduces the computational resources required. Cons: struggles to accurately reconstruct finer details.

  • Diffusion [138]: Stable Diffusion, developed based on GDMs, is deployed to generate synthetic images of UAV models. Pros: captures high-level UAV model features and fine details. Cons: computationally expensive and slow to train.

  • Diffusion [139]: Conditional GDMs are deployed to generate photo-realistic images and corresponding ground-truth bounding boxes for UAV detection according to conditional inputs, such as binary masks that specify the details and background of the UAVs, and text prompts that describe the scenes. Pros: boosts UAV detector performance with high-fidelity images. Cons: potential interference in object detection if not well filtered.

  • Others [140]: To generate captions for images captured by UAVs, CLIP Prefix for Image Captioning, a transformer-based architecture that uses the CLIP and GPT-2 models, is used. Pros: translates visual content into coherent text descriptions. Cons: may produce erroneous or inconsistent text descriptions.

  • Others [141]: A generative knowledge-supported transformer (GKST) that leverages mutual learning across different views is deployed to improve feature representation ability and retrieval performance. Pros: bridges the appearance gap between ground and aerial views. Cons: computationally expensive; memory-intensive.

Additionally, an application of VAEs in capturing temporal correlations in UAV wireless channels underscores the importance of GAI in communication systems, improving channel state estimation and signal clarity by generating realistic and diverse channel samples [125]. The exploration extends to diffusion-based score models and deep normalizing flows, employed for generating complex state variable distributions, showcasing the capability of GAI to model and estimate states in more flexible manners, ranging from state variables (i.e., position, velocity, and orientation) to the intricate high-dimensional gradients of these distributions [126, 127].

The versatility of GAI in state estimation for UV swarms is evident in two aspects: its ability to generate missing information through adversarial mechanisms and its ability to fuse varied data sources for comprehensive state analysis. These abilities enable more accurate state estimation in complex operational scenarios.

IV-B Environmental Perception

Environmental perception in the context of UVs typically refers to the ability of the vehicle to perceive and understand its surrounding environment in real-time [142]. This is a key technology for achieving autonomous navigation and completing tasks in UV swarms. Such technology often involves the use of sensors such as LiDAR, cameras, and millimeter wave radar to interact with the external environment [143]. The realm of environmental perception in UVs is markedly advanced by the varied and innovative applications of GAI, as detailed in Table III. For example, due to intrinsic constraints, such as motion blur from movement, adverse weather conditions, and varying flying altitudes, UAVs often capture low-resolution images. To address this problem, the authors in [132] introduce a framework called Latent Encoder Coupled Generative Adversarial Network (LE-GAN), designed for efficient hyper-spectral image (HSI) super-resolution. The generator in LE-GAN uses a short-term spectral-spatial relationship window mechanism to exploit local-global features and enhance informative band features, while the discriminator adopts a Wasserstein distance-based loss between the probability distributions of real and generated images. Such a framework not only improves super-resolution quality and robustness but also alleviates the spectral-spatial distortions caused by the mode collapse problem by learning the feature distributions of high-resolution HSIs in the latent space [132].
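
For reference, the Wasserstein-style objective mentioned above can be sketched as follows. This follows the original WGAN formulation, with weight clipping as a crude Lipschitz constraint; LE-GAN's exact loss and constraint handling may differ.

```python
import torch

def critic_loss(critic, real_imgs, fake_imgs):
    # Approximates the negative Wasserstein distance: widen the score gap
    return -(critic(real_imgs).mean() - critic(fake_imgs).mean())

def generator_loss(critic, fake_imgs):
    # Push generated images toward higher critic scores
    return -critic(fake_imgs).mean()

def clip_critic_weights(critic, c=0.01):
    # Weight clipping approximately enforces the 1-Lipschitz condition
    for p in critic.parameters():
        p.data.clamp_(-c, c)
```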

Besides improving UVs' accuracy by enhancing remote sensing resolution, a more common application of GAI is to generate synthetic datasets, which addresses the challenge of reduced model accuracy caused by insufficient data [138]. For instance, a framework named Trajectory GAN (TraGAN) is deployed for generating realistic lane change trajectories from highway traffic data [133]. Another GAN-based framework, named DeepRoad, is utilized for testing and input validation in autonomous driving systems [134], enhancing the reliability of testing by generating driving scenes under different weather conditions. VAEs are also deployed for generating more realistic and diverse crash data, addressing the limitations of traditional data augmentation methods [136]. Additionally, image translation frameworks that combine VAEs and GANs are utilized for transforming simulated images into realistic synthetic ones for training and testing change detection models [135, 137], though they still require real images for reference. Moreover, the authors in [139] introduce a method leveraging a text-to-image diffusion model to generate realistic and diverse images of UAVs set against various backgrounds and poses. With more than 20,000 synthetic images generated by merging background descriptions and binary masks based on ground-truth bounding boxes, the detector's average precision on real-world data increased by 12%.

TABLE IV: Generative AI in Level of Autonomy

  • GAN [144, 145]: Generative Adversarial Imitation Learning (GAIL) is integrated with multi-agent DRL to enhance cooperative search strategies for UAVs, allowing UAVs to learn efficient searching strategies by imitating expert behaviors. Pros: simplifies the learning process without explicit rewards; potentially leads to more natural and efficient behaviors. Cons: requires significant expert trajectory data for training.

  • GAN [146]: GAIL is utilized to train UAVs for navigation tasks within a virtual environment. The UAVs learn to navigate by imitating expert trajectories, enabling them to understand and adapt to complex and dynamic scenes. Pros: improves generalization in various virtual scenarios. Cons: time-consuming collection of expert demonstrations.

  • VAE [147]: BézierVAE is utilized for modeling vehicle trajectories, especially for safety validation in highly automated driving scenarios. The method encodes trajectories into a latent space using VAEs and then decodes them using Bézier curves to reconstruct and generate new trajectories. Pros: captures and generates diverse driving behaviors. Cons: requires diverse trajectory data for training.

  • VAE [148]: VAEs within the constrained optimization in learned latent space (COIL) approach are used to optimize autonomous robot timings for deliveries, ensuring that an appropriate number of robots run simultaneously to maintain safety. Pros: achieves fast optimization and adjusts to delivery demands. Cons: initial learning phase can be time- and resource-intensive.

  • VAE [149]: The Generative Relation and Intention Network (GRIN), a conditional generative model trained using variational inference, is deployed for multi-agent trajectory prediction. Pros: models uncertainties of intentions and relations comprehensively. Cons: may require a large amount of training data.

  • Others [150]: A transformer architecture combined with DRL is deployed to optimize routing for multiple cooperative UAVs. Pros: offers better performance and efficient parallel processing compared to traditional algorithms like Compass [151]. Cons: may face computational speed challenges in certain scenarios.

Another field in which GAI is utilized is scene understanding and captioning. One such method uses CLIP Prefix for image captioning, translating the visual content of UAV-captured images into accurate text descriptions for decision-making in UVs [140]. Another method deploys the Generative Knowledge-Supported Transformer (GKST), which enhances feature representation and retrieval performance by fusing image information from different views of vehicles [141]. An interesting aspect of these technologies is their ability to process and interpret complex visual inputs, providing a level of contextual understanding that closely resembles human perception. This capability is particularly beneficial in dynamic environments, where rapid and accurate interpretation of visual data is crucial for effective decision-making.

In summary, the generative capabilities of GAI prove to be invaluable in the field of environmental perception for UVs. From enhancing image resolution to generating synthetic datasets, creating diverse testing environments, and advancing scene understanding, GAI stands as a cornerstone technology driving the evolution and efficiency of UVs in comprehending and interacting with their surroundings.

IV-C Level of Autonomy

Autonomy refers to the capability of a system to perform a task or make decisions without human intervention [152]. The level of autonomy represents the extent to which a UV can operate independently while relying solely on its onboard sensors, algorithms, and computational resources. In UV swarms, the level of autonomy depends on various factors, such as the type and complexity of the task and the ability to plan and execute routes [153]. Table IV illustrates how the integration of GAI is pivotal in advancing these autonomous capabilities.

In the realm of UV swarm cooperative strategies, applications of GAI are exemplified by the integration of Generative Adversarial Imitation Learning (GAIL) with multi-agent DRL. For instance, the authors in [144] introduce a Multi-Agent PPO-based Generative Adversarial Imitation Learning (MAPPO-GAIL) algorithm, which employs multi-agent proximal policy optimization to sample trajectories concurrently, refining the policy and value models. This algorithm incorporates grid probability for environmental target representation, increasing the average target discovery probability by 73.33% while compromising the average damage probability by only 1.11% compared to traditional DRL search algorithms. Additionally, GAIL is utilized for training UAVs in virtual environments for navigation tasks, enabling adaptation to complex and dynamic scenes [146].
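
The core GAIL mechanic behind these works is a discriminator over state-action pairs that doubles as a learned reward for the RL learner. The sketch below illustrates this; the networks, dimensions, and the -log(1 - D) reward form are illustrative assumptions rather than the MAPPO-GAIL implementation.

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 8, 2
D = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.Tanh(),
                  nn.Linear(64, 1), nn.Sigmoid())
bce = nn.BCELoss()
opt = torch.optim.Adam(D.parameters(), lr=3e-4)

def discriminator_step(expert_sa, policy_sa):
    # Classify expert (state, action) pairs as 1 and policy pairs as 0
    loss = bce(D(expert_sa), torch.ones(expert_sa.shape[0], 1)) + \
           bce(D(policy_sa), torch.zeros(policy_sa.shape[0], 1))
    opt.zero_grad(); loss.backward(); opt.step()

def imitation_reward(sa):
    # Surrogate reward fed to the policy optimizer (e.g., PPO) in place of a
    # hand-crafted reward: large when the pair looks expert-like
    with torch.no_grad():
        return -torch.log(1.0 - D(sa) + 1e-8)
```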

Furthermore, a VAE-based model named BézierVAE is proposed for modeling vehicle trajectories, particularly for safety validation. BézierVAE encodes trajectories into a latent space and decodes them using Bézier curves, thereby generating diverse trajectories. It demonstrates a remarkable 91.3% reduction in reconstruction error and an 83.4% reduction in unsmoothness compared to the traditional TrajVAE model [133], significantly enhancing the safety validation of automated vehicles [147]. For autonomous robot scheduling, COIL employs VAEs to generate optimized timing schedules, significantly improving operational efficiency [148]. Lastly, in multi-agent trajectory prediction, the GRIN model, inspired by the conditional VAE, is adopted to forecast agent trajectories while considering the complexities of intentions and social relations. Although such systems face challenges in adhering to contextual rules such as physical laws, these challenges could be addressed by using a specific decoder or a surrogate model to approximate the constraints [149].
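
The Bézier decoding step itself is ordinary curve evaluation over the Bernstein basis. The snippet below evaluates a trajectory from decoder-produced control points; the cubic control points shown are hypothetical stand-ins, not actual BézierVAE outputs.

```python
import numpy as np
from math import comb

def bezier_curve(control_points, n_samples=50):
    """Evaluate a Bézier curve from an (n+1, d) array of control points."""
    n = len(control_points) - 1
    t = np.linspace(0.0, 1.0, n_samples)[:, None]         # (n_samples, 1)
    curve = np.zeros((n_samples, control_points.shape[1]))
    for i, p in enumerate(control_points):
        bernstein = comb(n, i) * t**i * (1 - t)**(n - i)  # Bernstein basis
        curve += bernstein * p
    return curve

# Hypothetical cubic lane-change trajectory from four control points (x, y in m)
ctrl = np.array([[0.0, 0.0], [15.0, 0.0], [25.0, 3.5], [40.0, 3.5]])
trajectory = bezier_curve(ctrl)  # smooth (50, 2) path for safety validation
```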

In route planning for UVs, transformer architecture combined with DRL is deployed to optimize routing for multiple cooperative UAVs. This method offers superior performance and efficient parallel processing, consistently achieving high rewards compared to traditional algorithms [150].

Enhancing autonomy in UVs is crucial for their independent and cooperative swarm operation. GAI's generative capabilities are applied in multiple aspects, from generating new trajectories to refining routing strategies and imitating expert routing behaviors in diverse scenarios. These diverse applications provide the dynamic and adaptable solutions UVs need to navigate and operate efficiently and independently in complex and changing environments.

IV-D Task/Resource Allocation

TABLE V: Generative AI in Task/Resource Allocation

  • GAN [154]: A GAIL-based algorithm is proposed to reconstruct a virtual environment for DRL, where the generator learns to produce expert trajectories and the discriminator distinguishes the expert trajectories from the generated trajectories. Pros: mimics real-world environments for effective DRL; faster learning and better performance. Cons: real-world training may affect service quality.

  • VAE [155]: An autoencoder is applied to mitigate the issue of information ambiguity in the Hungarian algorithm. Specifically, the autoencoder reconstructs the data rate matrix when cellular user (CU) and device-to-device user (D2DU) pairs have the same weight in the Hungarian algorithm, providing an optimal reconstructed cost matrix using the latent space as a hyperparameter. Pros: provides an optimal reconstructed cost matrix; efficient resource allocation. Cons: relies on simulated training data.

  • Diffusion [156]: A diffusion model-based AI-generated optimal decision (AGOD) algorithm is proposed to address the optimal AIGC service provider (ASP) selection decision problem for the AIGC-as-a-Service architecture. Pros: generates optimal selection decisions; applicable to various optimization problems. Cons: requires large-scale architectures; high resource consumption.

  • Diffusion [157]: A diffusion-based model is deployed to perform multi-step denoising on Gaussian noise and generate an optimal allocation scheme that maximizes the transmission quality of semantic information. Pros: achieves higher transmission quality scores; manages complex decision spaces. Cons: requires extensive training.

  • Others [158]: LLMs are deployed for autonomously generating, executing, and prioritizing tasks in real time, especially when deployed on-device, allowing for collaborative planning and problem-solving in wireless networks. Pros: adapts to tasks with on-device intelligence. Cons: may produce errors due to network changes.

In the field of task and resource allocation for multi-agent UV swarms, GAI introduces effective approaches that enhance the efficiency and adaptability of these systems. Traditional methods often rely on fixed algorithms and heuristic approaches, which may not always be sufficient for dynamic and complex environments [159]. As illustrated in Table V, GAI offers the flexibility necessary for these challenging scenarios.

A GAIL-based algorithm is proposed to reconstruct virtual environments for DRL, where the generator produces expert trajectories and the discriminator distinguishes them from generated trajectories [154]. This approach constructs a virtual edge computing environment that closely mimics real-world conditions, providing a sandbox in which a multi-agent DRL method for computing resource allocation can explore and infer the reward function without the degraded user experience that arbitrary exploration in the real system would cause. Furthermore, an autoencoder-based method is applied to the Hungarian algorithm to mitigate the information ambiguity caused by identical weights in the data rate matrix, particularly in the allocation of bandwidth and power resources between cellular users (CU) and device-to-device users (D2DU) [155]. This method provides an optimal reconstructed cost matrix, with the latent space acting as a hyperparameter, to assist in resource allocation decisions.
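
The autoencoder-assisted Hungarian step can be sketched as follows: a small autoencoder reconstructs a data-rate matrix containing ambiguous duplicate weights, and the reconstructed matrix is fed to SciPy's Hungarian solver. The toy matrix, network sizes, and latent dimension are assumptions for illustration; the latent size plays the role of the hyperparameter discussed in [155].

```python
# A minimal sketch of the idea in [155]: reconstruct a data-rate matrix with
# duplicate weights before running the Hungarian algorithm. The tiny
# autoencoder below is illustrative, not the architecture of the paper.
import numpy as np
import torch
import torch.nn as nn
from scipy.optimize import linear_sum_assignment

class MatrixAE(nn.Module):
    def __init__(self, n, latent=4):          # latent size acts as a hyperparameter
        super().__init__()
        self.enc = nn.Linear(n * n, latent)
        self.dec = nn.Linear(latent, n * n)

    def forward(self, m):
        return self.dec(torch.relu(self.enc(m.flatten()))).view_as(m)

n = 4
rate = np.random.rand(n, n)                   # CU/D2DU data-rate matrix
rate[0, 1] = rate[2, 3] = 0.5                 # ambiguous duplicate weights

ae = MatrixAE(n)
# ... train `ae` on historical rate matrices before use ...
with torch.no_grad():
    recon = ae(torch.tensor(rate, dtype=torch.float32)).numpy()

# Hungarian assignment on the (disambiguated) reconstructed matrix;
# negated because linear_sum_assignment minimizes cost.
rows, cols = linear_sum_assignment(-recon)
print(list(zip(rows, cols)))
```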

Additionally, the authors in [156] proposed a diffusion model-based AI-generated optimal decision (AGOD) algorithm, which enables adaptive and responsive task allocation based on real-time environmental changes and user demands. The algorithm’s efficacy is further enhanced by the integration of DRL, as demonstrated in the Deep Diffusion Soft Actor-Critic (D2SAC) algorithm. Compared to the traditional SAC method, D2SAC improves the task completion rate by approximately 2.3% and the utility gained by 5.15% [156]. Unlike traditional task allocation methods, which assume that all tasks and their corresponding utility values are known in advance, D2SAC can select the most appropriate service provider when tasks arrive dynamically and in real time.
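
Conceptually, AGOD treats decision-making as conditional denoising: starting from Gaussian noise, a network conditioned on the environment state iteratively removes noise until logits over candidate service providers emerge. The sketch below illustrates this pattern only; the dimensions, step count, and simplified update rule are assumptions rather than the exact D2SAC formulation.

```python
# A conceptual sketch of AGOD-style decision generation [156]: a small
# network denoises Gaussian noise, conditioned on the state, into logits
# over candidate service providers. Sizes are illustrative assumptions.
import torch
import torch.nn as nn

N_PROVIDERS, STATE_DIM, STEPS = 8, 16, 10

class Denoiser(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_PROVIDERS + STATE_DIM + 1, 64), nn.ReLU(),
            nn.Linear(64, N_PROVIDERS))

    def forward(self, x, state, t):
        # Predict the noise component at diffusion step t.
        return self.net(torch.cat([x, state, t], dim=-1))

def generate_decision(denoiser, state):
    x = torch.randn(1, N_PROVIDERS)              # start from pure noise
    for step in reversed(range(STEPS)):
        t = torch.full((1, 1), step / STEPS)
        x = x - denoiser(x, state, t)            # one denoising update
    return torch.softmax(x, dim=-1).argmax(-1)   # chosen provider index

denoiser = Denoiser()   # in D2SAC this network is trained as an actor via RL
print(generate_decision(denoiser, torch.randn(1, STATE_DIM)))
```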

In the realm of joint computing and communication resource allocation, effective management is especially important for UVs due to their standalone nature and battery constraints. A diffusion-based model presented in [157] offers an advanced method to design optimal energy allocation strategies for the transmission of semantic information. A key strength of this model is its ability to iteratively refine power allocation, ensuring that transmission quality is optimized under the varying conditions caused by the dynamic environment of UV swarms. At a transmission distance of 20 m and a transmission power of 4 kW, this diffusion model-based AI-generated scheme surpasses traditional transmission power allocation methods such as average allocation (Avg-SemCom) and confidence-based semantic communication (Conf-SemCom) [157] after approximately 500 iterations, with a 0.25 increase in transmission quality.

On the other hand, the authors in [158] proposed incorporating LLMs to elevate the capabilities of GAI in task and resource allocation within multi-agent UV swarms. Leveraging the advanced decision-making and analysis abilities of LLMs, an independent LLM instance is created for each user to break down the original intent “reducing network energy consumption by Δp = 0.85 W” into a series of detailed tasks such as tuning transmit power and performing channel measurement. The results are then prompted back to the LLM, which adds subsequent tasks and instructs the related executors to take action. With this LLM integration, the UAV agents achieved the power-saving target in two rounds, although further simulation results show that the current GPT-4 has difficulty maintaining multiple goals as the number of agents increases. This integration nonetheless signifies a significant advancement in the autonomy and functionality of UV swarms.
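
The decomposition loop can be sketched as a simple prompt-parse cycle. The prompt wording and JSON schema below are illustrative assumptions, and call_llm is a stand-in for any chat-completion endpoint rather than the prompts used in [158].

```python
# A hedged sketch of LLM-driven task decomposition in the spirit of [158].
import json

def call_llm(prompt: str) -> str:
    # Stand-in for a real chat-completion endpoint (e.g., GPT-4); a canned
    # response is returned here so the sketch runs end-to-end.
    return ('[{"task": "measure per-link channel quality", "executor": "UAV-1"}, '
            '{"task": "lower transmit power on the weakest link", "executor": "UAV-3"}]')

def decompose_intent(intent: str) -> list:
    prompt = (
        "You manage a UAV network. Break the operator intent below into an "
        "ordered JSON list of subtasks, each with 'task' and 'executor' keys.\n"
        f"Intent: {intent}"
    )
    return json.loads(call_llm(prompt))

tasks = decompose_intent("reduce network energy consumption by 0.85 W")
for t in tasks:
    print(t["executor"], "->", t["task"])
# Completed results would be fed back to the LLM, which appends follow-up
# tasks until the power-saving target is met.
```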

In conclusion, GAI substantially advances the field of task and resource allocation in multi-agent UV swarms. From creating realistic simulation environments for allocation algorithms to explore, to iteratively refining allocation strategies and decomposing a rough intent into detailed tasks, GAI demonstrates a strong ability to handle dynamic environments and diverse challenges.

IV-E Network Coverage and Peer-to-Peer Communication

TABLE VI: Generative AI in Network Coverage and Peer-to-Peer Communication
GAI Type Reference Description Pros Cons
GAN [160]

A cGAN is deployed to optimize network coverage by employing three pivotal elements: a generator that models and predicts optimal network configurations, a discriminator that evaluates the authenticity and efficiency of these configurations against real-world scenarios, and a unique encoding mechanism that maps these configurations into a latent space, ensuring adaptability and scalability.

  • Ensures optimal UAV positioning.

  • Reduces computational complexity.

  • Requires a larger amount of training data.

 [161]

A network-assisted balanced decoder network and an autoencoder-based generative adversarial network (SCMA-TPGAN), integrating a transformer as the generator and PatchGAN as the discriminator, are utilized to optimize sparse code multiple access (SCMA) encoding and decoding, improving the bit error rate in uplink Rayleigh fading channels.

  • Lower bit error rate.

  • Enhanced robustness against noise.

  • Relies on the base station’s knowledge of the channel matrix and the codebook of each user.

Diffusion [162]

GDMs are employed for image restoration and network optimization in vehicle-to-vehicle communication, allowing the restoration of transmitted images corrupted due to transmission disruptions and environmental noise.

  • Reduces data transfer and communication delay.

  • Adds iterative computational load.

Others [163]

Self-attention-based transformer models are utilized to predict users’ spatial distribution, guiding UAV Base Stations to adjust their positions for optimal network coverage.

  • Outperforms LSTM in long sequences.

  • May oversimplify real-world network dynamics.

As mentioned in Section II, a key application of UVs is their role as mobile base stations for reconstructing communication networks [46, 47, 48, 49, 164]. An effective positioning strategy is crucial in this context to ensure seamless access by achieving maximum user coverage with a limited number of UVs. Additionally, when UV swarms are deployed in hierarchical structures, with lead UVs acting as command centers, ensuring effective communication coverage among the sub-UVs is critical for task distribution and collaboration. As illustrated in Table VI, the need for efficient network coverage and vehicle-to-vehicle (V2V) communication is addressed by a variety of GAI techniques.

While utilizing UAVs as mobile stations to provide temporary network links in dynamic wireless communications is becoming increasingly popular, optimizing the network can be complex due to factors such as varying UAV altitudes, mobility patterns, spatial-domain interference distribution, and external environmental conditions. In addressing the optimization of network coverage with limited UAVs, the authors in [160] propose the use of a cGAN. This framework comprises a generator for modeling and predicting optimal network configurations, a discriminator for evaluating the efficiency of these configurations against real-world scenarios, and an encoding mechanism for adaptability and scalability. The cGAN-based method not only ensures optimal positioning of UAVs but also reduces computational complexity, achieving O(k^2). In contrast, traditional methods like the core-set algorithm [165] and the spiral algorithm [166] have time complexities of O(pk) and O(k^3), respectively, where p represents the number of user equipment in the area and k denotes the number of UAVs in the fleet [160]. Another solution, proposed by the authors in [163], utilizes a self-attention-based transformer to predict user mobility and enhance aerial base station placement. The transformer model captures spatio-temporal dependencies and handles long input and output sequences, improving the coverage rate by more than 31% over the regular deployment scheme [167] and by more than 9% over an LSTM-based scheme.
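
As a rough sketch of the prediction-then-placement pattern in [163], a transformer encoder can consume a short history of coarse user-density grids and forecast the next grid, after which a UAV base station is greedily placed over the densest predicted cell. The grid resolution, model width, and greedy placement rule are illustrative assumptions, not the configuration of the paper.

```python
# An illustrative transformer forecaster for user density, guiding UAV
# base-station placement; all sizes below are assumptions.
import torch
import torch.nn as nn

GRID, HIST, D = 64, 12, 128        # flattened 8x8 density grid, 12 past steps

class DensityForecaster(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(GRID, D)
        layer = nn.TransformerEncoderLayer(d_model=D, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D, GRID)

    def forward(self, hist):        # hist: (batch, HIST, GRID)
        h = self.encoder(self.proj(hist))
        return self.head(h[:, -1])  # predicted next density grid

model = DensityForecaster()
pred = model(torch.rand(1, HIST, GRID))
# Greedy placement: put the UAV over the predicted densest cell.
cell = pred.argmax(-1).item()
print(f"place UAV over grid cell ({cell // 8}, {cell % 8})")
```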

In the domain of V2V communication, which is essential for secure navigation in UV swarms, vehicles often relay images to communicate environmental data. However, these images can be corrupted by transmission disruptions, environmental noise, and noise caused by vehicle movement. To address this, the authors in [162] integrate GDMs for image restoration and network optimization. GDMs enable vehicles to restore transmitted images to their original quality, reducing data transfer and communication delay. The iterative nature of GDMs, based on stochastic differential equations, is also adept at refining internet-of-vehicles network solutions, notably in areas like path planning. For example, GDMs initiate optimization with a preliminary path and progressively enhance it based on key performance indicators, capitalizing on the gradients of these metrics to guide path modifications towards an optimal solution. Compared to traditional DQN methods [168], the proposed GDM-based method achieved a 100% increase in average cumulative rewards at 300 epochs [162].
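
The gradient-guided refinement loop described above can be caricatured in a few lines: starting from a noisy path, each iteration follows the gradient of a performance objective while injecting annealed noise, mimicking the reverse-diffusion structure. The quadratic smoothness-plus-goal objective below is a stand-in for the key performance indicators used in [162].

```python
# A conceptual sketch only: gradient-guided path refinement with annealed
# noise, loosely mirroring reverse diffusion. Objective and scales are
# illustrative assumptions.
import torch

goal = torch.tensor([10.0, 10.0])
path = torch.randn(20, 2, requires_grad=True)      # noisy initial waypoints

opt = torch.optim.Adam([path], lr=0.1)
for step in range(300):
    smoothness = ((path[1:] - path[:-1]) ** 2).sum()   # penalize jagged hops
    terminal = ((path[-1] - goal) ** 2).sum()          # end near the goal
    loss = smoothness + terminal
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():                    # annealed, diffusion-like noise
        path += torch.randn_like(path) * 0.1 * (1 - step / 300)

print(path[-1])   # the final waypoint should approach the goal
```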

In summary, for network coverage and accessibility, GAI can either generate positioning strategies directly or act as an encoder that enhances traditional algorithms by capturing spatial information. For efficiency, GAI serves as a framework that uses semantic information to reduce data transmission while maintaining communication through guided generation. However, while these developments represent a leap forward in managing UV swarms, there remain areas for further exploration. For example, the authors in [162] raise the question of integrating other modalities for more efficient communication, suggesting an opportunity for future research to investigate multimodal data processing in UV networks. Such explorations could significantly enhance the adaptability of these technologies to diverse network topologies and environmental conditions. Additionally, the potential of GAI to facilitate autonomous decision-making within UV swarm deployments presents a promising avenue for advancing the field. By expanding the scope of GAI applications, researchers can further optimize UVs for a variety of complex real-world scenarios.

IV-F Security/Privacy

Refer to caption

Figure 6: This framework highlights LLMs generating honeypot data, using a bait UV that poses as a security hole. This tactic distracts attackers and logs their activities, enabling other UVs to bolster their defenses against cyber threats: a strategic, collective deception that enhances overall swarm security.
TABLE VII: Generative AI in Security/Privacy
GAI Type Reference Description Pros Cons
GAN [169]

A GAN-based image-to-image translation method named Auto-Driving GAN (ADGAN) is proposed to protect the location privacy of vehicular camera data by removing or modifying background buildings in images, while preserving the recognition utility of other objects such as traffic signs and pedestrians.

  • Protects privacy while maintaining recognition accuracy.

  • Enhances synthesis with a multi-discriminator design.

  • Potential privacy leaks via indirect data.

  • May impair auto-driving quality and safety.

 [170]

TrajGANs are proposed to protect the privacy of trajectory data by generating synthetic trajectories that follow the same distribution as the real data, while obscuring the individual locations and identities of users.

  • Maintains data utility and pattern accuracy.

  • Flexible in various data scenarios.

  • Struggles with dense trajectory representation.

  • Risk of overfitting and missing rare events.

 [171]

LSTM-TrajGAN generates synthetic trajectories that follow the same distribution as the real trajectories but obscure the individual locations and identities of users. The synthetic trajectories prevent users from being re-identified by the Trajectory-User Linking (TUL) algorithm, which links trajectories to users based on their spatial, temporal, and thematic characteristics and thereby exposes vehicle privacy.

  • Retains mobility patterns and data utility.

  • Outperforms geomasking methods.

  • Validation needed to prevent overfitting.

  • Must weigh privacy against data utility.

 [172]

A GAN-based framework is proposed to automatically segment moving objects (such as pedestrians and vehicles) in street-view images and replace them with a realistic background.

  • Segments and anonymizes moving objects.

  • Risk of removing relevant static objects.

VAE [173]

A VAE is deployed to generate synthetic trajectories that randomize the released vehicle locations based on differential privacy, which adds noise to the data to prevent the disclosure of individual information.

  • Creates varied and realistic synthetic data.

  • Adjustable privacy level.

  • Introduces noise that may reduce data accuracy.

Others [174]

Federated vehicular transformers combine federated learning with transformers for privacy-preserving computing and cooperation in autonomous driving. Transformers can achieve unified representation and fusion of multi-modal data, such as trajectories, images, and point clouds, from different vehicles.

  • Privacy-centric performance balance.

  • More complex system requirements.

 [175]

A multi-class intrusion detection system (IDS) based on a transformer-based attention network for an in-vehicle controller area network (CAN) bus is proposed. The IDS can learn the features and correlations of CAN messages and classify them into different attack types using self-attention mechanisms.

  • Accurate detection of diverse attacks.

  • Efficient without detailed message labels.

  • Longer training period.

  • Vulnerable to new, unknown attack types.

Security and privacy are critical aspects in UV swarms, especially in military and surveillance applications. The integration of GAI in these domains offers innovative solutions for enhancing system security and ensuring privacy. As illustrated in Fig. 6, an interesting potential application is to utilize GAI’s ability to generate fake data or simulate communication activities to act as a honeypot to mislead potential attackers and reinforce system security [176]. The LLM-generated honeypots serve as an additional protective layer, disseminating false information to confuse and trap attackers, thereby enhancing the collective security of the swarm. This innovative use of language processing technology within the swarm network exemplifies a new frontier in safeguarding autonomous vehicles from sophisticated cyber threats. The use of GAI in UV swarm security and privacy protection is elaborated in Table VII.

One notable application of GAI in the realm of privacy protection is the Auto-Driving GAN (ADGAN) [169]. ADGAN is a GAN-based image-to-image translation method designed to protect the privacy of vehicle camera location data. It achieves this by removing or modifying background buildings in images while retaining the utility of recognizing other objects like traffic signs and pedestrians. This semantic treatment acts as an effective means of enhancing the security of UV swarms, as it removes background content that is irrelevant to the task. Additionally, ADGAN introduces a multi-discriminator setting that enhances image synthesis performance and offers a stronger privacy protection guarantee against more powerful attackers [169]. A similar application is a GAN-based framework that protects identity privacy in street-view images by altering recognizable features, such as replacing moving objects with a realistic background [172].

In terms of trajectory data privacy, TrajGANs are employed to protect trajectory data by generating synthetic trajectories [170]. These trajectories follow the same distribution as the real data while obscuring individual locations and user identities; they preserve the statistical properties of the real data and capture human mobility patterns. However, TrajGANs may face challenges in creating dense representations of trajectories, particularly for timestamps and road segments, and may fail to identify some rare or exceptional events in the data. To further enhance this protection, the authors in [171] present the LSTM-TrajGAN framework. The framework consists of three parts: a generator that produces realistic trajectory configurations, a discriminator that compares these configurations with real data to validate their authenticity and utility, and a specialized encoding mechanism that utilizes an LSTM [177] recurrent neural network to perform space-time embedding of trajectory data and its respective timestamps. Its privacy protection efficacy is evaluated using a Trajectory-User Linking (TUL) algorithm as the attacker [178]. Evaluated on a real-world semantic trajectory dataset, the proposed approach achieves better privacy protection by reducing the attacker’s accuracy from 99.8% to 45.9%, compared to traditional geomasking methods such as random perturbation at 66.8% and Gaussian geomasking at 48.6% [179]. These results show that LSTM-TrajGAN can better prevent users from being re-identified while preserving essential spatial and temporal characteristics of the real trajectory data.
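
A compact sketch of the generator side of this design is shown below: noise concatenated with a space-time embedding of the real trajectory is unrolled by an LSTM into a synthetic location sequence. The feature layout and layer sizes are assumptions, not the exact LSTM-TrajGAN configuration.

```python
# An illustrative LSTM-TrajGAN-style generator in the spirit of [171];
# sizes and the (lat, lon, hour) feature layout are assumptions.
import torch
import torch.nn as nn

class TrajGenerator(nn.Module):
    def __init__(self, noise_dim=16, hidden=64):
        super().__init__()
        self.embed = nn.Linear(3, 16)          # space-time: (lat, lon, hour)
        self.lstm = nn.LSTM(noise_dim + 16, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 2)        # synthetic (lat, lon)

    def forward(self, real_traj, z):
        # real_traj: (batch, T, 3); z: (batch, T, noise_dim)
        h, _ = self.lstm(torch.cat([self.embed(real_traj), z], dim=-1))
        return self.out(h)                     # privacy-preserving trajectory

gen = TrajGenerator()
fake = gen(torch.rand(4, 30, 3), torch.randn(4, 30, 16))
# A discriminator (not shown) is trained jointly to tell fake from real,
# while a TUL attacker measures how often users can still be re-identified.
```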

VAEs are also deployed to protect UV trajectory privacy. The authors in [173] utilize a VAE to create synthetic vehicle trajectories, ensuring differential privacy by adding noise to the data. This approach effectively obscures vehicle locations, although the added noise may introduce some data distortion. Transformers in federated learning, as discussed in [174], enhance privacy in autonomous driving by sharing only essential data features across networks; this method improves privacy but faces challenges with communication link stability and external interference.

To protect vehicle network security, the authors in [175] proposed a transformer-based intrusion detection system for vehicle networks. This system employs self-attention mechanisms to analyze Controller Area Network (CAN) messages, accurately classifying them into various in-vehicle attacks such as denial-of-service, spoofing, and replay attacks. Complementing this, the transformer-based federated learning setup of [174] shares key data features rather than raw data across a network of autonomous vehicles, significantly boosting privacy by minimizing the exposure of sensitive data while still enabling collaborative decision-making and computing.
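
In the spirit of [175], a minimal transformer-based CAN classifier might look as follows, where a window of frames (CAN ID plus eight payload bytes, assumed normalized to [0, 1]) is encoded with self-attention and pooled into attack-type logits. The window length, attack classes listed, and model sizes are illustrative assumptions.

```python
# An illustrative self-attention IDS over CAN frame windows; all sizes,
# features, and class names are assumptions for the sketch.
import torch
import torch.nn as nn

CLASSES = ["normal", "dos", "spoofing", "replay"]

class CANTransformerIDS(nn.Module):
    def __init__(self, d=64, window=32):
        super().__init__()
        self.proj = nn.Linear(9, d)            # CAN ID + 8 payload bytes
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
        self.enc = nn.TransformerEncoder(layer, num_layers=2)
        self.cls = nn.Linear(d, len(CLASSES))

    def forward(self, frames):                 # frames: (batch, window, 9)
        h = self.enc(self.proj(frames))
        return self.cls(h.mean(dim=1))         # sequence-level attack logits

ids = CANTransformerIDS()
logits = ids(torch.rand(1, 32, 9))
print(CLASSES[logits.argmax(-1).item()])
```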

In summary, the application of GAI in UV swarms has revolutionized security and privacy measures, particularly in sensitive sectors such as military and surveillance. Techniques like honeypots and GAN-based frameworks demonstrate GAI’s capability in data manipulation for enhanced security. Additionally, the implementation of VAEs and transformers in federated learning for trajectory privacy, and advanced intrusion detection systems underscore GAI’s adaptability and effectiveness in safeguarding against sophisticated cyber threats.

IV-G Vehicle Safety and Fault Detection

TABLE VIII: Generative AI in Vehicle Safety and Fault Detection
GAI Type Reference Description Pros Cons
GAN + VAE [185]

VAE-CGAN is deployed to complete missing time-series data in cases of sensor faults, where the VAE helps initialize and correct the noise in a traditional CGAN, and the CGAN’s discriminator improves the quality of samples generated by the VAE’s decoder.

  • High-quality samples even with different missing rates.

  • Accurate predictions.

  • Require longer training period.

VAE  [180]

A VAE is deployed to learn to encode normal UAV data into a lower-dimensional space and reconstruct it, identifying anomalies by comparing the reconstruction error against a deviation threshold.

  • Able to uncover novel faults.

  • Accurate predictions.

  • Offline operation.

  • Lack of generalizability.

Others [181, 182]

An LSTM network is utilized to detect faults in UVs by analyzing sensor data to identify patterns indicative of anomalies; it processes data sequentially to capture the temporal dependencies that are crucial for recognizing irregularities in sensor readings.

  • Accurate predictions.

  • Require a large amount of data for training.

  • Computationally intensive.

 [183]

A BERT-based model is deployed to extract features across multiple spatio-temporal scales and identify five common anomaly types for battery fault diagnosis.

  • Reliable predictions of battery faults using onboard sensor data.

  • Offers significant lead time for preventive measures.

  • Computationally intensive.

  • Risk of misidentifying battery conditions.

Vehicle safety is another critical concern that encompasses the detection, isolation, and resolution of system faults. Unlike other safety concerns such as collision avoidance or the development of safe path planning strategies for UV swarms, which are more closely related to the level of autonomy of these systems [184], research on UV safety highlights the unique challenges brought forth by internal vulnerabilities of UV systems, including algorithmic and hardware failures. Research in this field aims to enhance the overall reliability and safety of UV operations by developing methodologies and technologies that enable these systems to effectively identify and rectify potential faults before they impact vehicle performance or safety.

Monitoring operational parameters for fault detection in UV systems is essential to ensure their safety and efficiency. A novel framework has been proposed that uses LSTM networks combined with autoencoders, enabling continuous learning from vehicle performance data [181]. This framework enhances the system’s ability to pinpoint and address faults progressively. The strength of LSTMs in handling time-series data makes this approach particularly effective in dynamic environments where various factors can influence vehicle performance. The LSTM autoencoder can also generate synthetic data points that represent potential fault scenarios, enriching the training dataset so that the model learns from a wider range of conditions; on simulation data it achieves 90% accuracy in detecting and 99% accuracy in classifying different types of drone misoperations, significantly improving the safety and operational efficiency of UV systems. Subsequent developments [182] advanced UAV fault detection and classification further, achieving a four-fold increase in speed through FPGA-based hardware acceleration while halving energy consumption. This research shows that model computation can be optimized for real-time operation, a key consideration for GAI, and its successful deployment in UAV swarms suggests that similar strategies could enhance GAI performance in dynamic environments and complex task coordination.
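
The reconstruction-error principle behind this framework can be sketched with a small LSTM autoencoder: windows of telemetry from normal operation are reconstructed, and any window whose error exceeds a threshold calibrated on normal flights is flagged. The sensor count, window length, and threshold value are assumptions for illustration.

```python
# A minimal LSTM-autoencoder sketch of the fault-detection pattern in [181];
# sizes and the threshold are illustrative assumptions.
import torch
import torch.nn as nn

class LSTMAutoencoder(nn.Module):
    def __init__(self, n_sensors=6, hidden=32):
        super().__init__()
        self.enc = nn.LSTM(n_sensors, hidden, batch_first=True)
        self.dec = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_sensors)

    def forward(self, x):                        # x: (batch, T, n_sensors)
        _, (h, _) = self.enc(x)
        # Repeat the final encoder state as input to the decoder.
        z = h[-1].unsqueeze(1).repeat(1, x.size(1), 1)
        y, _ = self.dec(z)
        return self.out(y)

model = LSTMAutoencoder()                        # trained on normal flights
window = torch.rand(1, 50, 6)                    # 50 timesteps of telemetry
err = ((model(window) - window) ** 2).mean().item()
THRESHOLD = 0.05                                 # calibrated on normal data
print("fault suspected" if err > THRESHOLD else "nominal")
```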

On the other hand, VAEs present a sophisticated approach to fault and anomaly detection in UV swarms. The authors in [180] proposed a method that trains a VAE on data representing the UVs’ normal operations, enabling the VAE to learn what constitutes standard performance. The learning process involves the reconstruction of input data, where the model’s ability to accurately replicate the original data serves as the basis for identifying operational consistency; a significant deviation in the reconstruction error from the norm signals a potential fault or anomaly. By generating reconstructions of the input data and calculating the resulting errors, the VAE-based method achieved an average accuracy of 95.6% in detecting faults and anomalies [180]. A key advantage of VAEs is their proficiency in uncovering novel faults or issues that were not present or accounted for in the training dataset, which is invaluable in the context of UV operations, where a broad spectrum of environmental conditions and operational challenges is routinely encountered. Nonetheless, it is critical to acknowledge that the performance of VAEs can be affected by various factors, including the complexity of the VAE model itself, the quality and diversity of the training data, and the specific threshold set for flagging reconstruction errors as potential faults.

Furthermore, the authors in [183] utilize a spatio-temporal Transformer network for battery fault diagnosis and failure prognosis in electric vehicles; its specialized architecture excels at extracting key features and identifying early warning signals across multiple spatial and temporal scales. Its ability to analyze and predict the evolution of battery faults using onboard sensor data aligns well with the needs of UVs, which rely heavily on battery integrity for their operation. By integrating such a model, predictive maintenance strategies are greatly enhanced, allowing early detection of anomalies and prediction of battery failures within a window ranging from 24 hours to a week. This approach not only enhances operational efficiency by optimizing vehicle schedules to reduce downtime but also plays a crucial role in safeguarding against potential battery failures that could compromise vehicle safety.

In UV operations, ensuring safety and reliability not only involves detecting faults, but also isolating affected components to prevent further issues, and implementing targeted solutions for resolution. For instance, in the case of relatively minor issues such as the loss of information due to sensor malfunctions, the utilization of VAE and GAN illustrates the innovative application of GAI in fault management [185]. Through optimization of the VAE-CGAN structure, these models can regenerate missing time series data, demonstrating their effectiveness in scenarios where operational faults compromise data integrity. This function is especially beneficial for applications like UAV-based agricultural surveillance, where the continuity of data collection is paramount.
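
Only the imputation step of this VAE-CGAN pipeline is sketched below: a small VAE reconstructs a masked sensor series, and reconstructed values fill only the missing entries while observed ones are kept. The series length, masking rate, and network sizes are assumptions; in [185] the decoder would additionally be trained against a conditional discriminator.

```python
# An illustrative VAE-based imputation step in the spirit of the VAE-CGAN
# pipeline of [185]; sizes and masking are assumptions.
import torch
import torch.nn as nn

class SeriesVAE(nn.Module):
    def __init__(self, T=48, latent=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(T, 64), nn.ReLU())
        self.mu, self.logvar = nn.Linear(64, latent), nn.Linear(64, latent)
        self.dec = nn.Sequential(nn.Linear(latent, 64), nn.ReLU(),
                                 nn.Linear(64, T))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return self.dec(z)

T = 48
series = torch.rand(1, T)                      # one sensor's time series
mask = torch.rand(1, T) > 0.3                  # True where values observed
vae = SeriesVAE(T)                             # jointly trained with a CGAN critic
recon = vae(series * mask)                     # reconstruct from partial input
imputed = torch.where(mask, series, recon)     # keep observed, fill missing
```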

In tackling severe issues that compromise UV swarm operations, the development of “where to crash” decision protocols stands out as an intriguing aspect of current research [186]. This concept addresses the need for predefined protocols on how and where a UV should terminate its operation in the event of a critical failure, so as to minimize secondary hazards. These protocols range from emergency landing zones for UAVs to specific sinking points for USVs and UUVs, and controlled stop measures for UGVs. However, predefined protocols may not accommodate all possible scenarios. Thus, integrating GAI into UV swarm fault management strategies offers an advanced method to enhance safety. For example, by analyzing real-time sensor data and understanding the intricacies of swarm dynamics, Transformers are able to make context-aware decisions that accurately identify the safest termination points for compromised UVs [187]. Incorporating such GAI may not only improve the management of critical failures but also lower the risk of secondary incidents.

V Open Issues and Future Research Directions

As discussed above, GAI has great potential to further enhance UV swarms in various aspects. However, due to the complex and dynamic nature of UV swarms, several issues still need to be tackled. As such, this section highlights open issues of GAI in UV swarms and their potential solutions.

V-A Scalability

In the future, a swarm may contain a large number of UVs performing complex tasks in challenging environments such as precision agriculture, environmental monitoring, military operations, and delivery services. This introduces several issues to the development of GAI in UV swarms. Specifically, as the number of UVs increases, coordinating movements and communications between them becomes much more complex due to factors like congestion, latency, signal interference, and limited communication range. This demands novel GAI approaches that can quickly determine optimal movements and actions for each UV in such complex situations. One potential direction, sketched below, is to design distributed GAI architectures based on federated learning for collaboration between different groups of unmanned swarms, instead of relying on a single server or swarm leader. By doing this, the computing load can be distributed among swarm leaders, resulting in more efficient learning and collaboration between different groups of unmanned swarms, especially in large-scale swarm settings.
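
A minimal sketch of this direction, assuming each swarm leader holds a local copy of a shared PyTorch model, is FedAvg-style parameter averaging in which only weights, never raw swarm data, are exchanged:

```python
# An illustrative federated-averaging round across swarm leaders; the model
# and round structure are assumptions for the sketch.
import copy
import torch

def federated_average(leader_models):
    """Average the parameters of the swarm leaders' local models."""
    global_state = copy.deepcopy(leader_models[0].state_dict())
    for key in global_state:
        global_state[key] = torch.stack(
            [m.state_dict()[key].float() for m in leader_models]
        ).mean(dim=0)
    return global_state

# Each round: leaders train locally on their own swarm's data, then sync.
leaders = [torch.nn.Linear(8, 2) for _ in range(3)]   # one model per leader
global_state = federated_average(leaders)
for m in leaders:
    m.load_state_dict(global_state)                   # broadcast the average
```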

V-B Adaptive GAI

In UV swarms, system conditions are highly dynamic and uncertain due to the mobility of UVs as well as the complexity of surrounding environments. Although GAI has the ability to deal with these uncertainties, it is essential to develop adaptive GAI approaches to further reduce system latency when retraining GAI models under new environmental conditions, especially in large-scale unmanned swarms. Integrating GAI with recent advances in AI such as transfer learning and meta-learning is a potential solution. In particular, transfer learning aims to transfer knowledge learned in a source environment to facilitate the learning of similar tasks in new environments, while meta-learning aims to learn how to learn, speeding up the learning process under new conditions. These techniques can help AI models achieve good training accuracy with only a few training samples from the new environments [188, 189]. In addition, advanced deep reinforcement learning can be integrated into GAI models for real-time learning and feedback on the behaviors of UV swarms.

V-C AI-native UV Networks

In large-scale and heterogeneous UV networks, energy efficiency, latency, and security are the most critical issues for local and global coordination. In this operational circumstance, semantic communication (SemCom) with a knowledge base (KB) shared among UVs is a viable solution to minimize transmission overhead: it eliminates task-irrelevant information while performing a given task reliably, greatly improving energy efficiency, latency, and security. However, UVs are limited in both resources and hardware, making it challenging to realize AI-native UV networks that enable SemCom with a shared KB. It should be noted that the connections among such Internet of Things (IoT) devices, namely UVs, are inherently task-oriented, dynamic, and short-term due to their mobility, and achieving the above goal via SemCom under such circumstances is of paramount importance. GAI approaches are considered effective in dealing with the dynamic and complex environments of such large-scale and heterogeneous UV networks and are worthy of further research towards AI-native UV networks.

V-D 3-D Interference Control

In large-scale and heterogeneous UV networks with high mobility, resource-constrained multi-UV connections face dynamic interference patterns distributed across 3-D space and the temporal domain. Sophisticated interference control is therefore required to coordinate and maintain communication among diverse UVs with limited power usage. High mobility makes this interference control especially challenging in such complex environments, where GAI approaches have proven effective in regulating dynamic interference fluctuations within 3-D coverage.

V-E Security and Privacy

As reviewed above, GAI can help to improve the security and privacy of UVs by generating synthetic data to deceive attackers. Nevertheless, GAI, and AI in general, are vulnerable to adversarial attacks in which attackers try to disrupt the training process of AI models by poisoning training data, evading trained models, and extracting model information. More dangerously, attackers can also leverage GAI to generate “fake” data that is hard to distinguish. In addition, in UV swarms with resource-constrained vehicles, e.g., UAVs, individual vehicles are extremely vulnerable to adversarial attacks, and cyber attacks in general, due to the lack of computing and energy resources to perform sophisticated and efficient countermeasures. Dealing with adversarial attacks in GAI is still a new research problem and there are limited efforts in the literature. As such, there is an urgent demand for lightweight and effective solutions to improve the security of UV swarms. One potential solution is to leverage GAI to recover poisoned training data. Integrating deep reinforcement learning with human feedback can also be useful in defeating adversarial attacks. Finally, federated learning can also be adopted to enhance the privacy of unmanned swarms since data of vehicles (e.g., their sensing data and locations) will not be shared with others.

VI Conclusion

In this paper, we have provided a comprehensive survey on the applications of GAI for UV swarms. We first provided an in-depth overview of UVs and UV swarms as well as their applications and existing challenges. Then, we presented the fundamentals of various GAI techniques and their capabilities in addressing the challenges of UV swarms. After that, a comprehensive review of the applications of GAI for emerging problems in UV swarms was presented with insights and discussions. Finally, we discussed the open issues of GAI in UV swarms and provided several potential research directions.

References

  • [1] Y. Tan, J. Wang, J. Liu, and Y. Zhang, “Unmanned systems security: Models, challenges, and future directions,” IEEE Network, vol. 34, no. 4, pp. 291–297, 2020.
  • [2] P. McEnroe, S. Wang, and M. Liyanage, “A survey on the convergence of edge computing and AI for UAVs: Opportunities and challenges,” IEEE Internet of Things Journal, vol. 9, no. 17, pp. 15 435–15 459, 2022.
  • [3] N. H. Motlagh, T. Taleb, and O. Arouk, “Low-altitude unmanned aerial vehicles-based internet of things services: Comprehensive survey and future perspectives,” IEEE Internet of Things Journal, vol. 3, no. 6, pp. 899–922, 2016.
  • [4] B. Li, Z. Fei, and Y. Zhang, “UAV communications for 5G and beyond: Recent advances and future trends,” IEEE Internet of Things Journal, vol. 6, no. 2, pp. 2241–2263, 2018.
  • [5] Z. Liu, Y. Zhang, X. Yu, and C. Yuan, “Unmanned surface vehicles: An overview of developments and challenges,” Annual Reviews in Control, vol. 41, pp. 71–93, 2016.
  • [6] J. Neira, C. Sequeiros, R. Huamani, E. Machaca, P. Fonseca, and W. Nina, “Review on unmanned underwater robotics, structure designs, materials, sensors, actuators, and navigation control,” Journal of Robotics, vol. 2021, pp. 1–26, 2021.
  • [7] Y. Zhou, B. Rao, and W. Wang, “UAV swarm intelligence: Recent advances and future trends,” IEEE Access, vol. 8, pp. 183 856–183 878, 2020.
  • [8] G. Venayagamoorthy et al., “Unmanned vehicle navigation using swarm intelligence,” in International Conference on Intelligent Sensing and Information Processing, 2004. Proceedings of.   IEEE, 2004, pp. 249–253.
  • [9] A. Puente-Castro, D. Rivero, A. Pazos, and E. Fernandez-Blanco, “A review of artificial intelligence applied to path planning in UAV swarms,” Neural Computing and Applications, pp. 1–18, 2022.
  • [10] J. Kennedy, “Swarm intelligence,” in Handbook of nature-inspired and innovative computing: integrating classical models with emerging technologies.   Springer, 2006, pp. 187–219.
  • [11] L. Giacomossi, F. Souza, R. G. Cortes, H. M. M. Cortez, C. Ferreira, C. A. Marcondes, D. S. Loubach, E. F. Sbruzzi, F. A. Verri, J. C. Marques et al., “Autonomous and collective intelligence for UAV swarm in target search scenario,” in 2021 Latin American Robotics Symposium (LARS), 2021 Brazilian Symposium on Robotics (SBR), and 2021 Workshop on Robotics in Education (WRE).   IEEE, 2021, pp. 72–77.
  • [12] M.-A. Lahmeri, M. A. Kishk, and M.-S. Alouini, “Artificial intelligence for UAV-enabled wireless networks: A survey,” IEEE Open Journal of the Communications Society, vol. 2, pp. 1015–1040, 2021.
  • [13] H. Kurunathan, H. Huang, K. Li, W. Ni, and E. Hossain, “Machine learning-aided operations and communications of unmanned aerial vehicles: A contemporary survey,” IEEE Communications Surveys & Tutorials, 2023.
  • [14] S. Sai, A. Garg, K. Jhawar, V. Chamola, and B. Sikdar, “A comprehensive survey on artificial intelligence for unmanned aerial vehicles,” IEEE Open Journal of Vehicular Technology, 2023.
  • [15] N. Cheng, S. Wu, X. Wang, Z. Yin, C. Li, W. Chen, and F. Chen, “AI for UAV-assisted iot applications: A comprehensive review,” IEEE Internet of Things Journal, 2023.
  • [16] J. Li, G. Zhang, C. Jiang, and W. Zhang, “A survey of maritime unmanned search system: theory, applications and future directions,” Ocean Engineering, vol. 285, p. 115359, 2023.
  • [17] H. Shakhatreh, A. H. Sawalmeh, A. Al-Fuqaha, Z. Dou, E. Almaita, I. Khalil, N. S. Othman, A. Khreishah, and M. Guizani, “Unmanned aerial vehicles (UAVs): A survey on civil applications and key research challenges,” IEEE Access, vol. 7, pp. 48 572–48 634, 2019.
  • [18] M. Abdelkader, S. Güler, H. Jaleel, and J. S. Shamma, “Aerial swarms: Recent applications and challenges,” Current robotics reports, vol. 2, pp. 309–320, 2021.
  • [19] A. Ahmadzadeh, A. Jadbabaie, V. Kumar, and G. J. Pappas, “Multi-UAV cooperative surveillance with spatio-temporal specifications,” in Proceedings of the 45th IEEE Conference on Decision and Control, 2006, pp. 5293–5298.
  • [20] N. Nigam, S. Bieniawski, I. Kroo, and J. Vian, “Control of multiple UAVs for persistent surveillance: Algorithm and flight test results,” IEEE Transactions on Control Systems Technology, vol. 20, no. 5, pp. 1236–1251, 2012.
  • [21] J. Scherer and B. Rinner, “Multi-UAV surveillance with minimum information idleness and latency constraints,” IEEE Robotics and Automation Letters, vol. 5, no. 3, pp. 4812–4819, 2020.
  • [22] R.-j. Yan, S. Pang, H.-b. Sun, and Y.-j. Pang, “Development and missions of unmanned surface vehicle,” Journal of Marine Science and Application, vol. 9, pp. 451–457, 2010.
  • [23] M. Abdelkader, M. Shaqura, M. Ghommem, N. Collier, V. Calo, and C. Claudel, “Optimal multi-agent path planning for fast inverse modeling in UAV-based flood sensing applications,” in 2014 international conference on unmanned aircraft systems (ICUAS).   IEEE, 2014, pp. 64–71.
  • [24] M. Carpentiero, L. Gugliermetti, M. Sabatini, and G. B. Palmerini, “A swarm of wheeled and aerial robots for environmental monitoring,” in 2017 IEEE 14th international conference on networking, sensing and control (ICNSC).   IEEE, 2017, pp. 90–95.
  • [25] I. Bae and J. Hong, “Survey on the developments of unmanned marine vehicles: Intelligence and cooperation,” Sensors, vol. 23, no. 10, 2023. [Online]. Available: https://www.mdpi.com/1424-8220/23/10/4643
  • [26] Z. Liu, Y. Zhang, X. Yu, and C. Yuan, “Unmanned surface vehicles: An overview of developments and challenges,” Annual Reviews in Control, vol. 41, pp. 71–93, 2016. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1367578816300219
  • [27] A. Vamråk Solheim, M. Lesage, B. E. Asbjørnslett, and S. O. Erikstad, “Deep sea mining: Towards conceptual design for underwater transportation,” in International Conference on Offshore Mechanics and Arctic Engineering, vol. 84386.   American Society of Mechanical Engineers, 2020, p. V06BT06A050.
  • [28] H. Qin, J. Q. Cui, J. Li, Y. Bi, M. Lan, M. Shan, W. Liu, K. Wang, F. Lin, Y. Zhang et al., “Design and implementation of an unmanned aerial vehicle for autonomous firefighting missions,” in 2016 12th IEEE International Conference on Control and Automation (ICCA).   IEEE, 2016, pp. 62–67.
  • [29] A. M. Hayajneh, S. A. R. Zaidi, D. C. McLernon, and M. Ghogho, “Drone empowered small cellular disaster recovery networks for resilient smart cities,” in 2016 IEEE international conference on sensing, communication and networking (SECON Workshops).   IEEE, 2016, pp. 1–6.
  • [30] D. Kim and P. Y. Oh, “Skywriting unmanned aerial vehicle proof-of-concept design,” in 2017 International Conference on Unmanned Aircraft Systems (ICUAS).   IEEE, 2017, pp. 1398–1403.
  • [31] V. Serpiva, E. Karmanova, A. Fedoseev, S. Perminov, and D. Tsetserukou, “Dronepaint: Swarm light painting with dnn-based gesture recognition,” in ACM SIGGRAPH 2021 Emerging Technologies, 2021, pp. 1–4.
  • [32] J. Fleureau, Q. Galvane, F.-L. Tariolle, and P. Guillotel, “Generic drone control platform for autonomous capture of cinema scenes,” in Proceedings of the 2nd workshop on micro aerial vehicle networks, systems, and applications for civilian use, 2016, pp. 35–40.
  • [33] T. Nägeli, “Intelligent drone cinematography,” Ph.D. dissertation, ETH Zurich, 2018.
  • [34] C. A. Thiels, J. M. Aho, S. P. Zietlow, and D. H. Jenkins, “Use of unmanned aerial vehicles for medical product transport,” Air medical journal, vol. 34, no. 2, pp. 104–108, 2015.
  • [35] T. K. Amukele, J. Hernandez, C. L. Snozek, R. G. Wyatt, M. Douglas, R. Amini, and J. Street, “Drone transport of chemistry and hematology samples over long distances,” American journal of clinical pathology, vol. 148, no. 5, pp. 427–435, 2017.
  • [36] A. Claesson, D. Fredman, L. Svensson, M. Ringh, J. Hollenberg, P. Nordberg, M. Rosenqvist, T. Djarv, S. Österberg, J. Lennartsson et al., “Unmanned aerial vehicles (drones) in out-of-hospital-cardiac-arrest,” Scandinavian journal of trauma, resuscitation and emergency medicine, vol. 24, pp. 1–9, 2016.
  • [37] A. Pulver, R. Wei, and C. Mann, “Locating aed enabled medical drones to enhance cardiac arrest response times,” Prehospital Emergency Care, vol. 20, no. 3, pp. 378–389, 2016.
  • [38] A. Claesson, L. Svensson, P. Nordberg, M. Ringh, M. Rosenqvist, T. Djarv, J. Samuelsson, O. Hernborg, P. Dahlbom, A. Jansson et al., “Drones may be used to save lives in out of hospital cardiac arrest due to drowning,” Resuscitation, vol. 114, pp. 152–156, 2017.
  • [39] N. Ogorelysheva, A. Vasileva, and N. Gramse, “On troubleshooting in agv-based autonomous systems,” in 2023 International Conference on Control, Automation and Diagnosis (ICCAD).   IEEE, 2023, pp. 1–6.
  • [40] K. Kuru, D. Ansell, W. Khan, and H. Yetgin, “Analysis and optimization of unmanned aerial vehicle swarms in logistics: An intelligent delivery platform,” IEEE Access, vol. 7, pp. 15 804–15 831, 2019.
  • [41] Z. Xin, J. Li, J. Li, and C. Liu, “Collaborative search and package delivery strategy for UAV swarms under area restrictions,” Journal of Advanced Computational Intelligence and Intelligent Informatics, vol. 27, no. 5, pp. 932–941, 2023.
  • [42] Y. Wu, Y. Ding, S. Ding, Y. Savaria, and M. Li, “Autonomous last-mile delivery based on the cooperation of multiple heterogeneous unmanned ground vehicles,” Mathematical Problems in Engineering, vol. 2021, pp. 1–15, 2021.
  • [43] B. Alkouz, A. Bouguettaya, and S. Mistry, “Swarm-based drone-as-a-service (sdaas) for delivery,” in 2020 IEEE International Conference on Web Services (ICWS).   IEEE, 2020, pp. 441–448.
  • [44] A. Tahir, J. Böling, M.-H. Haghbayan, H. T. Toivonen, and J. Plosila, “Swarms of unmanned aerial vehicles — A survey,” Journal of Industrial Information Integration, vol. 16, p. 100106, 2019. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2452414X18300086
  • [45] K. Z. Ang, X. Dong, W. Liu, G. Qin, S. Lai, K. Wang, D. Wei, S. Zhang, S. K. Phang, X. Chen et al., “High-precision multi-UAV teaming for the first outdoor night show in singapore,” Unmanned Systems, vol. 6, no. 01, pp. 39–65, 2018.
  • [46] T. Akram, M. Awais, R. Naqvi, A. Ahmed, and M. Naeem, “Multicriteria UAV base stations placement for disaster management,” IEEE Systems Journal, vol. 14, no. 3, pp. 3475–3482, 2020.
  • [47] J. Li, D. Lu, G. Zhang, J. Tian, and Y. Pang, “Post-disaster unmanned aerial vehicle base station deployment method based on artificial bee colony algorithm,” IEEE Access, vol. 7, pp. 168 327–168 336, 2019.
  • [48] A. Merwaday, A. Tuncer, A. Kumbhar, and I. Guvenc, “Improved throughput coverage in natural disasters: Unmanned aerial base stations for public-safety communications,” IEEE Vehicular Technology Magazine, vol. 11, no. 4, pp. 53–60, 2016.
  • [49] F. Malandrino, C. Rottondi, C.-F. Chiasserini, A. Bianco, and I. Stavrakakis, “Multiservice UAVs for emergency tasks in post-disaster scenarios,” in Proceedings of the ACM MobiHoc workshop on innovative aerial communication solutions for FIrst REsponders network in emergency scenarios, 2019, pp. 18–23.
  • [50] S. G. Fernandez, K. Vijayakumar, R. Palanisamy, K. Selvakumar, D. Karthikeyan, D. Selvabharathi, S. Vidyasagar, and V. Kalyanasundhram, “Unmanned and autonomous ground vehicle,” International Journal of Electrical and Computer Engineering, vol. 9, no. 5, p. 4466, 2019.
  • [51] D. H. Stolfi, M. R. Brust, G. Danoy, and P. Bouvry, “UAV-ugv-umv multi-swarms for cooperative surveillance,” Frontiers in Robotics and AI, vol. 8, p. 616950, 2021.
  • [52] W. Liu, “Robust multi-sensor data fusion for practical unmanned surface vehicles (usvs) navigation,” Ph.D. dissertation, UCL (University College London), 2020.
  • [53] P. H. Heins, B. L. Jones, and D. J. Taunton, “Design and validation of an unmanned surface vehicle simulation model,” Applied Mathematical Modelling, vol. 48, pp. 749–774, 2017. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0307904X17301245
  • [54] V. A. M. Jorge, R. Granada, R. G. Maidana, D. A. Jurak, G. Heck, A. P. F. Negreiros, D. H. dos Santos, L. M. G. Gonçalves, and A. M. Amory, “A survey on unmanned surface vehicles for disaster robotics: Main challenges and directions,” Sensors, vol. 19, no. 3, 2019. [Online]. Available: https://www.mdpi.com/1424-8220/19/3/702
  • [55] A. Wibisono, M. J. Piran, H.-K. Song, and B. M. Lee, “A survey on unmanned underwater vehicles: Challenges, enabling technologies, and future research directions,” Sensors, vol. 23, no. 17, 2023. [Online]. Available: https://www.mdpi.com/1424-8220/23/17/7321
  • [56] G. Liu, “Remotely operated vehicle (rov) in subsea engineering,” in Encyclopedia of Ocean Engineering.   Springer, 2022, pp. 1466–1478.
  • [57] S. Watson, D. A. Duecker, and K. Groves, “Localisation of unmanned underwater vehicles (uuvs) in complex and confined environments: A review,” Sensors, vol. 20, no. 21, 2020. [Online]. Available: https://www.mdpi.com/1424-8220/20/21/6203
  • [58] M. Byrne, “The disruptive impacts of next generation generative artificial intelligence,” CIN: Computers, Informatics, Nursing, vol. 41, no. 7, pp. 479–481, 2023.
  • [59] C. Zhang, C. Zhang, M. Zhang, and I. S. Kweon, “Text-to-image diffusion model in generative AI: A survey,” arXiv preprint arXiv:2303.07909, 2023.
  • [60] T. Iqbal and S. Qureshi, “The survey: Text generation models in deep learning,” Journal of King Saud University-Computer and Information Sciences, vol. 34, no. 6, pp. 2515–2528, 2022.
  • [61] L. Regenwetter, A. H. Nobari, and F. Ahmed, “Deep generative models in engineering design: A review,” Journal of Mechanical Design, vol. 144, no. 7, p. 071704, 2022.
  • [62] Z. Bahroun, C. Anane, V. Ahmed, and A. Zacca, “Transforming education: A comprehensive review of generative artificial intelligence in educational settings through bibliometric and content analysis,” Sustainability, vol. 15, no. 17, p. 12983, 2023.
  • [63] Y. Goyal, T. Khot, D. Summers-Stay, D. Batra, and D. Parikh, “Making the v in vqa matter: Elevating the role of image understanding in visual question answering,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 6904–6913.
  • [64] X. Li, B. Fan, R. Zhang, L. Jin, D. Wang, Z. Guo, Y. Zhao, and R. Li, “Image content generation with causal reasoning,” arXiv preprint arXiv:2312.07132, 2023.
  • [65] D. K. Kanbach, L. Heiduk, G. Blueher, M. Schreiter, and A. Lahmann, “The genai is out of the bottle: generative artificial intelligence from a business model innovation perspective,” Review of Managerial Science, pp. 1–32, 2023.
  • [66] H. Du, D. Niyato, J. Kang, Z. Xiong, K.-Y. Lam, Y. Fang, and Y. Li, “Spear or shield: Leveraging generative AI to tackle security threats of intelligent network services,” arXiv preprint arXiv:2306.02384, 2023.
  • [67] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” Advances in neural information processing systems, vol. 27, 2014.
  • [68] ——, “Generative adversarial networks,” Communications of the ACM, vol. 63, no. 11, pp. 139–144, 2020.
  • [69] A. Jabbar, X. Li, and B. Omar, “A survey on generative adversarial networks: Variants, applications, and training,” ACM Computing Surveys (CSUR), vol. 54, no. 8, pp. 1–49, 2021.
  • [70] L. J. Ratliff, S. A. Burden, and S. S. Sastry, “Characterization and computation of local nash equilibria in continuous games,” in 2013 51st Annual Allerton Conference on Communication, Control, and Computing (Allerton).   IEEE, 2013, pp. 917–924.
  • [71] I. Goodfellow, “Nips 2016 tutorial: Generative adversarial networks,” arXiv preprint arXiv:1701.00160, 2016.
  • [72] L. Metz, B. Poole, D. Pfau, and J. Sohl-Dickstein, “Unrolled generative adversarial networks,” arXiv preprint arXiv:1611.02163, 2016.
  • [73] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen, “Improved techniques for training gans,” Advances in neural information processing systems, vol. 29, 2016.
  • [74] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, “Gans trained by a two time-scale update rule converge to a local nash equilibrium,” Advances in neural information processing systems, vol. 30, 2017.
  • [75] H. Zhang, I. Goodfellow, D. Metaxas, and A. Odena, “Self-attention generative adversarial networks,” in International conference on machine learning.   PMLR, 2019, pp. 7354–7363.
  • [76] D. P. Kingma, M. Welling et al., “An introduction to variational autoencoders,” Foundations and Trends® in Machine Learning, vol. 12, no. 4, pp. 307–392, 2019.
  • [77] X. Yang, “Understanding the variational lower bound,” variational lower bound, ELBO, hard attention, vol. 22, pp. 1–4, 2017.
  • [78] N. Leelarathna, A. Margeloiu, M. Jamnik, and N. Simidjievski, “Enhancing representation learning on high-dimensional, small-size tabular data: A divide and conquer method with ensembled VAEs,” arXiv preprint arXiv:2306.15661, 2023.
  • [79] A. Srivastava, L. Valkov, C. Russell, M. U. Gutmann, and C. Sutton, “Veegan: Reducing mode collapse in gans using implicit variational learning,” Advances in neural information processing systems, vol. 30, 2017.
  • [80] H. Cao, C. Tan, Z. Gao, Y. Xu, G. Chen, P.-A. Heng, and S. Z. Li, “A survey on generative diffusion model,” arXiv preprint arXiv:2209.02646, 2022.
  • [81] L. Yang, Z. Zhang, Y. Song, S. Hong, R. Xu, Y. Zhao, W. Zhang, B. Cui, and M.-H. Yang, “Diffusion models: A comprehensive survey of methods and applications,” ACM Computing Surveys, vol. 56, no. 4, pp. 1–39, 2023.
  • [82] J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” Advances in neural information processing systems, vol. 33, pp. 6840–6851, 2020.
  • [83] S. Li, T. Hu, F. S. Khan, L. Li, S. Yang, Y. Wang, M.-M. Cheng, and J. Yang, “Faster diffusion: Rethinking the role of unet encoder in diffusion models,” arXiv preprint arXiv:2312.09608, 2023.
  • [84] F.-A. Croitoru, V. Hondru, R. T. Ionescu, and M. Shah, “Diffusion models in vision: A survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023.
  • [85] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 10 684–10 695.
  • [86] A. Ramesh, M. Pavlov, G. Goh, S. Gray, C. Voss, A. Radford, M. Chen, and I. Sutskever, “Zero-shot text-to-image generation,” in International Conference on Machine Learning.   PMLR, 2021, pp. 8821–8831.
  • [87] A. Ramesh, P. Dhariwal, A. Nichol, C. Chu, and M. Chen, “Hierarchical text-conditional image generation with clip latents,” arXiv preprint arXiv:2204.06125, vol. 1, no. 2, p. 3, 2022.
  • [88] J. Betker, G. Goh, L. Jing, T. Brooks, J. Wang, L. Li, L. Ouyang, J. Zhuang, J. Lee, Y. Guo et al., “Improving image generation with better captions,” Computer Science. https://cdn.openai.com/papers/dall-e-3.pdf, 2023.
  • [89] M. Westerlund, “The emergence of deepfake technology: A review,” Technology innovation management review, vol. 9, no. 11, 2019.
  • [90] J. Zhang, H. Shi, J. Yu, E. Xie, and Z. Li, “Diffflow: A unified sde framework for score-based diffusion models and generative adversarial networks,” arXiv preprint arXiv:2307.02159, 2023.
  • [91] J. Song, C. Meng, and S. Ermon, “Denoising diffusion implicit models,” arXiv preprint arXiv:2010.02502, 2020.
  • [92] P. Katara, Z. Xian, and K. Fragkiadaki, “Gen2sim: Scaling up robot learning in simulation with generative models,” arXiv preprint arXiv:2310.18308, 2023.
  • [93] R. Zhang, K. Xiong, H. Du, D. Niyato, J. Kang, X. Shen, and H. V. Poor, “Generative AI-enabled vehicular networks: Fundamentals, framework, and case study,” 2023.
  • [94] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in neural information processing systems, vol. 30, 2017.
  • [95] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al., “Language models are few-shot learners,” Advances in neural information processing systems, vol. 33, pp. 1877–1901, 2020.
  • [96] A. Radford, K. Narasimhan, T. Salimans, I. Sutskever et al., “Improving language understanding by generative pre-training,” 2018.
  • [97] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever et al., “Language models are unsupervised multitask learners,” OpenAI blog, vol. 1, no. 8, p. 9, 2019.
  • [98] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018.
  • [99] A. Karapantelakis, P. Alizadeh, A. Alabassi, K. Dey, and A. Nikou, “Generative AI in mobile networks: a survey,” Annals of Telecommunications, pp. 1–19, 2023.
  • [100] Z. Niu, G. Zhong, and H. Yu, “A review on the attention mechanism of deep learning,” Neurocomputing, vol. 452, pp. 48–62, 2021.
  • [101] S. Khan, M. Naseer, M. Hayat, S. W. Zamir, F. S. Khan, and M. Shah, “Transformers in vision: A survey,” ACM computing surveys (CSUR), vol. 54, no. 10s, pp. 1–41, 2022.
  • [102] G. Marco, J. Gonzalo, and L. Rello, “A systematic evaluation of the creative writing skills of transformer deep neural networks,” Available at SSRN 4042578, 2022.
  • [103] A. K. M. Masum, S. Abujar, S. Akter, N. J. Ria, and S. A. Hossain, “Transformer based bengali chatbot using general knowledge dataset,” in 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA).   IEEE, 2021, pp. 1235–1238.
  • [104] A. Svyatkovskiy, S. K. Deng, S. Fu, and N. Sundaresan, “Intellicode compose: Code generation using transformer,” in Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2020, pp. 1433–1443.
  • [105] S. Gao, C. Gao, Y. He, J. Zeng, L. Nie, X. Xia, and M. Lyu, “Code structure–guided transformer for source code summarization,” ACM Transactions on Software Engineering and Methodology, vol. 32, no. 1, pp. 1–32, 2023.
  • [106] K. Lee, H. Chang, L. Jiang, H. Zhang, Z. Tu, and C. Liu, “Vitgan: Training gans with vision transformers,” arXiv preprint arXiv:2107.04589, 2021.
  • [107] R. Xu, X. Xu, K. Chen, B. Zhou, and C. C. Loy, “Stransgan: An empirical study on transformer in gans,” ArXiv, vol. abs/2110.13107, 2021. [Online]. Available: https://api.semanticscholar.org/CorpusID:239768864
  • [108] T. Singh, J. Prakash, T. Bharti, and A. K. Mandpura, “Time series approach for visual servoing using transformers,” in 2023 12th Mediterranean Conference on Embedded Computing (MECO).   IEEE, 2023, pp. 1–6.
  • [109] M. Volger, “Human detection and recognition in visual data from a swarm of unmanned aerial and ground vehicles through dynamic navigation,” Ph.D. dissertation, Faculty of Science and Engineering, 2015.
  • [110] H. Liao, H. Shen, Z. Li, C. Wang, G. Li, Y. Bie, and C. Xu, “Gpt-4 enhanced multimodal grounding for autonomous driving: Leveraging cross-modal attention with large language models,” arXiv preprint arXiv:2312.03543, 2023.
  • [111] I. Kobyzev, S. J. Prince, and M. A. Brubaker, “Normalizing flows: An introduction and review of current methods,” IEEE transactions on pattern analysis and machine intelligence, vol. 43, no. 11, pp. 3964–3979, 2020.
  • [112] F. Coeurdoux, N. Dobigeon, and P. Chainais, “Normalizing flow sampling with langevin dynamics in the latent space,” arXiv preprint arXiv:2305.12149, 2023.
  • [113] H. Reyes-González and R. Torre, “Testing the boundaries: Normalizing flows for higher dimensional data sets,” in Journal of Physics: Conference Series, vol. 2438, no. 1.   IOP Publishing, 2023, p. 012155.
  • [114] T. Liu and J. Regier, “An empirical comparison of gans and normalizing flows for density estimation,” arXiv preprint arXiv:2006.10175, 2020.
  • [115] J. Ho, X. Chen, A. Srinivas, Y. Duan, and P. Abbeel, “Flow++: Improving flow-based generative models with variational dequantization and architecture design,” in International Conference on Machine Learning.   PMLR, 2019, pp. 2722–2730.
  • [116] A. Abdelhamed, M. A. Brubaker, and M. S. Brown, “Noise flow: Noise modeling with conditional normalizing flows,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 3165–3173.
  • [117] M. Kumar, M. Babaeizadeh, D. Erhan, C. Finn, S. Levine, L. Dinh, and D. Kingma, “Videoflow: A flow-based generative model for video,” arXiv preprint arXiv:1903.01434, vol. 2, no. 5, p. 3, 2019.
  • [118] P. Esling, N. Masuda, A. Bardet, R. Despres et al., “Universal audio synthesizer control with normalizing flows,” arXiv preprint arXiv:1907.00971, 2019.
  • [119] K. Madhawa, K. Ishiguro, K. Nakago, and M. Abe, “Graphnvp: An invertible flow model for generating molecular graphs,” arXiv preprint arXiv:1905.11600, 2019.
  • [120] Y. Ma, M. N. Al Islam, J. Cleland-Huang, and N. V. Chawla, “Detecting anomalies in small unmanned aerial systems via graphical normalizing flows,” IEEE Intelligent Systems, 2023.
  • [121] D. Xu, C. Wei, P. Peng, Q. Xuan, and H. Guo, “Ge-GAN: A novel deep learning framework for road traffic state estimation,” Transportation Research Part C: Emerging Technologies, vol. 117, p. 102635, 2020. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0968090X19312409
  • [122] Y. Xie, Z. Shao, F. Chen, and H. Lin, “Harmonic state estimation based on network equivalence and closed loop GAN,” in 2023 8th Asia Conference on Power and Electrical Engineering (ACPEE), 2023, pp. 1688–1693.
  • [123] Y. He, S. Chai, and Z. Xu, “A novel approach for state estimation using generative adversarial network,” in 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC), 2019, pp. 2248–2253.
  • [124] H. Yu, G. Li, L. Su, B. Zhong, H. Yao, and Q. Huang, “Conditional GAN based individual and global motion fusion for multiple object tracking in UAV videos,” Pattern Recognition Letters, vol. 131, pp. 219–226, 2020.
  • [125] N. K. Jha and V. K. N. Lau, “Temporally correlated compressed sensing using generative models for channel estimation in unmanned aerial vehicles,” IEEE Transactions on Wireless Communications, pp. 1–1, 2023.
  • [126] M. Arvinte and J. I. Tamir, “Mimo channel estimation using score-based generative models,” IEEE Transactions on Wireless Communications, vol. 22, no. 6, pp. 3698–3713, 2023.
  • [127] H. Delecki, L. A. Kruse, M. R. Schlichting, and M. J. Kochenderfer, “Deep normalizing flows for state estimation,” 2023.
  • [128] T. D. Barfoot, State estimation for robotics.   Cambridge University Press, 2017.
  • [129] X. Dai, R. Fu, E. Zhao, Z. Zhang, Y. Lin, F.-Y. Wang, and L. Li, “Deeptrend 2.0: A light-weighted multi-scale traffic prediction model using detrending,” Transportation Research Part C: Emerging Technologies, vol. 103, pp. 142–157, 2019. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0968090X1830648X
  • [130] A. Alahi, K. Goel, V. Ramanathan, A. Robicquet, L. Fei-Fei, and S. Savarese, “Social lstm: Human trajectory prediction in crowded spaces,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 961–971.
  • [131] A. He, C. Luo, X. Tian, and W. Zeng, “A twofold siamese network for real-time object tracking,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 4834–4843.
  • [132] Y. Shi, L. Han, L. Han, S. Chang, T. Hu, and D. Dancey, “A latent encoder coupled generative adversarial network (LE-GAN) for efficient hyperspectral image super-resolution,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–19, 2022.
  • [133] R. Krajewski, T. Moers, D. Nerger, and L. Eckstein, “Data-driven maneuver modeling using generative adversarial networks and variational autoencoders for safety validation of highly automated vehicles,” in 2018 21st International Conference on Intelligent Transportation Systems (ITSC), 2018, pp. 2383–2390.
  • [134] M. Zhang, Y. Zhang, L. Zhang, C. Liu, and S. Khurshid, “Deeproad: GAN-based metamorphic testing and input validation framework for autonomous driving systems,” in Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, 2018, pp. 132–142.
  • [135] Y. Zharkovsky and O. Menadeva, “End-to-end change detection for high resolution drone images with GAN architecture,” arXiv preprint arXiv:2006.00467, 2020.
  • [136] Z. Islam, M. Abdel-Aty, Q. Cai, and J. Yuan, “Crash data augmentation using variational autoencoder,” Accident Analysis & Prevention, vol. 151, p. 105950, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S000145752031770X
  • [137] X. Li, H. Duan, Y. Tian, and F.-Y. Wang, “Exploring image generation for UAV change detection,” IEEE/CAA Journal of Automatica Sinica, vol. 9, no. 6, pp. 1061–1072, 2022.
  • [138] D. Duplevska, V. Medvedevs, D. Surmacs, and A. Aboltins, “The synthetic data application in the UAV recognition systems development,” in 2023 IEEE 10th Jubilee Workshop on Advances in Information, Electronic and Electrical Engineering (AIEEE), 2023, pp. 1–6.
  • [139] D. Xing and A. Tzes, “Synthetic aerial dataset for UAV detection via text-to-image diffusion models,” in 2023 IEEE Conference on Artificial Intelligence (CAI), 2023, pp. 51–52.
  • [140] J. de Curtò, I. de Zarzà, and C. T. Calafate, “Semantic scene understanding with large language models on unmanned aerial vehicles,” Drones, vol. 7, no. 2, 2023. [Online]. Available: https://www.mdpi.com/2504-446X/7/2/114
  • [141] J. Zhao, Q. Zhai, P. Zhao, R. Huang, and H. Cheng, “Co-visual pattern-augmented generative transformer learning for automobile geo-localization,” Remote Sensing, vol. 15, no. 9, 2023. [Online]. Available: https://www.mdpi.com/2072-4292/15/9/2221
  • [142] W. Zhang, F. Jiang, C.-F. Yang, Z.-P. Wang, and T.-J. Zhao, “Research on unmanned surface vehicles environment perception based on the fusion of vision and lidar,” IEEE Access, vol. 9, pp. 63 107–63 121, 2021.
  • [143] Z. Zhang and M. Fu, “Research on unmanned system environment perception system methodology,” in International Workshop on Advances in Civil Aviation Systems Development.   Springer, 2023, pp. 219–233.
  • [144] X. Yu, W. Liao, C. Qu, Q. Bao, and Z. Xu, “UAV cooperative search based on multi-agent generative adversarial imitation learning,” in 2022 International Conference on Machine Learning, Cloud Computing and Intelligent Mining (MLCCIM), 2022, pp. 441–446.
  • [145] J. Song, H. Ren, D. Sadigh, and S. Ermon, “Multi-agent generative adversarial imitation learning,” Advances in neural information processing systems, vol. 31, 2018.
  • [146] S. Bandela and Y. Cao, “Drone navigation in unreal engine using generative adversarial imitation learning,” in AIAA SCITECH 2023 Forum, 2023, p. 0506.
  • [147] R. Krajewski, T. Moers, A. Meister, and L. Eckstein, “Béziervae: Improved trajectory modeling using variational autoencoders for the safety validation of highly automated vehicles,” in 2019 IEEE Intelligent Transportation Systems Conference (ITSC), 2019, pp. 3788–3795.
  • [148] P. Bentley, S. L. Lim, P. Arcaini, and F. Ishikawa, “Using a variational autoencoder to learn valid search spaces of safely monitored autonomous robots for last-mile delivery,” in Proceedings of the Genetic and Evolutionary Computation Conference, ser. GECCO ’23.   New York, NY, USA: Association for Computing Machinery, 2023, p. 1303–1311. [Online]. Available: https://doi.org/10.1145/3583131.3590459
  • [149] L. Li, J. Yao, L. Wenliang, T. He, T. Xiao, J. Yan, D. Wipf, and Z. Zhang, “Grin: Generative relation and intention network for multi-agent trajectory prediction,” Advances in Neural Information Processing Systems, vol. 34, pp. 27 107–27 118, 2021.
  • [150] D. Fuertes, C. R. del Blanco, F. Jaureguizar, J. J. Navarro, and N. García, “Solving routing problems for multiple cooperative unmanned aerial vehicles using transformer networks,” Engineering Applications of Artificial Intelligence, vol. 122, p. 106085, 2023. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0952197623002695
  • [151] G. Kobeaga, M. Merino, and J. A. Lozano, “An efficient evolutionary algorithm for the orienteering problem,” Computers & Operations Research, vol. 90, pp. 42–59, 2018. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0305054817302241
  • [152] X. Bi, “Overview of autonomous unmanned systems,” Environmental Perception Technology for Unmanned Systems, pp. 1–15, 2021.
  • [153] L. Mejias, J.-P. Diguet, C. Dezan, D. Campbell, J. Kok, and G. Coppin, “Embedded computation architectures for autonomy in unmanned aircraft systems (uas),” Sensors, vol. 21, no. 4, p. 1115, 2021.
  • [154] W. Miao, Z. Zeng, M. Zhang, S. Quan, Z. Zhang, S. Li, L. Zhang, and Q. Sun, “Multi-agent reinforcement learning for edge resource management with reconstructed environment,” in 2021 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom), 2021, pp. 1729–1736.
  • [155] T. Rathod and S. Tanwar, “Autoencoder-based efficient resource allocation in device-to-device communication,” Physical Communication, vol. 60, p. 102133, 2023. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1874490723001362
  • [156] H. Du, Z. Li, D. Niyato, J. Kang, Z. Xiong, H. Huang, and S. Mao, “Generative AI-aided optimization for AI-generated content (aigc) services in edge networks,” arXiv preprint arXiv:2303.13052, 2023.
  • [157] B. Du, H. Du, H. Liu, D. Niyato, P. Xin, J. Yu, M. Qi, and Y. Tang, “Yolo-based semantic communication with generative AI-aided resource allocation for digital twins construction,” 2023.
  • [158] H. Zou, Q. Zhao, L. Bariah, M. Bennis, and M. Debbah, “Wireless multi-agent generative AI: From connected intelligence to collective intelligence,” arXiv preprint arXiv:2307.02757, 2023.
  • [159] G. M. Skaltsis, H.-S. Shin, and A. Tsourdos, “A survey of task allocation techniques in mas,” in 2021 International Conference on Unmanned Aircraft Systems (ICUAS).   IEEE, 2021, pp. 488–497.
  • [160] M. Ružička, M. Vološin, J. Gazda, T. Maksymyuk, L. Han, and M. Dohler, “Fast and computationally efficient generative adversarial network algorithm for unmanned aerial vehicle–based network coverage optimization,” International Journal of Distributed Sensor Networks, vol. 18, no. 3, p. 15501477221075544, 2022. [Online]. Available: https://doi.org/10.1177/15501477221075544
  • [161] C. Duan, S. Zhang, P. Yin, X. Li, and J. Luo, “Scma-tpgan: A new perspective on sparse codebook multiple access for UAV system,” Computer Communications, vol. 200, pp. 161–170, 2023.
  • [162] H. Du, R. Zhang, Y. Liu, J. Wang, Y. Lin, Z. Li, D. Niyato, J. Kang, Z. Xiong, S. Cui et al., “Beyond deep reinforcement learning: A tutorial on generative diffusion models in network optimization,” arXiv preprint arXiv:2308.05384, 2023.
  • [163] E. Chaalal, L. Reynaud, and S. M. Senouci, “Mobility prediction for aerial base stations for a coverage extension in 5G networks,” in 2021 International Wireless Communications and Mobile Computing (IWCMC), 2021, pp. 2163–2168.
  • [164] R. Zhang, H. Du, D. Niyato, J. Kang, Z. Xiong, A. Jamalipour, P. Zhang, and D. I. Kim, “Generative AI for space-air-ground integrated networks (sagin),” arXiv preprint arXiv:2311.06523, 2023.
  • [165] M. Bādoiu, S. Har-Peled, and P. Indyk, “Approximate clustering via core-sets,” in Proceedings of the thiry-fourth annual ACM symposium on Theory of computing, 2002, pp. 250–257.
  • [166] J. Lyu, Y. Zeng, R. Zhang, and T. J. Lim, “Placement optimization of UAV-mounted mobile base stations,” IEEE Communications Letters, vol. 21, no. 3, pp. 604–607, 2016.
  • [167] E. Chaalal, L. Reynaud, and S. M. Senouci, “A social spider optimisation algorithm for 3d unmanned aerial base stations placement,” in 2020 IFIP Networking Conference (Networking).   IEEE, 2020, pp. 544–548.
  • [168] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, “Playing atari with deep reinforcement learning,” arXiv preprint arXiv:1312.5602, 2013.
  • [169] Z. Xiong, Z. Cai, Q. Han, A. Alrawais, and W. Li, “Adgan: Protect your location privacy in camera data of auto-driving vehicles,” IEEE Transactions on Industrial Informatics, vol. 17, no. 9, pp. 6200–6210, 2021.
  • [170] X. Liu, H. Chen, and C. Andris, “trajgans: Using generative adversarial networks for geo-privacy protection of trajectory data (vision paper),” in Location privacy and security workshop, 2018, pp. 1–7.
  • [171] J. Rao, S. Gao, Y. Kang, and Q. Huang, “LSTM-TrajGAN: A Deep Learning Approach to Trajectory Privacy Protection,” in 11th International Conference on Geographic Information Science (GIScience 2021) - Part I, ser. Leibniz International Proceedings in Informatics (LIPIcs), K. Janowicz and J. A. Verstegen, Eds., vol. 177.   Dagstuhl, Germany: Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 2020, pp. 12:1–12:17. [Online]. Available: https://drops.dagstuhl.de/opus/volltexte/2020/13047
  • [172] R. Uittenbogaard, C. Sebastian, J. Vijverberg, B. Boom, D. M. Gavrila et al., “Privacy protection in street-view panoramas using depth and multi-view imagery,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 10 581–10 590.
  • [173] O. Adeboye, T. Dargahi, M. Babaie, M. Saraee, and C.-M. Yu, “Deepclean: a robust deep learning technique for autonomous vehicle camera data privacy,” IEEE Access, vol. 10, pp. 124 534–124 544, 2022.
  • [174] Y. Tian, J. Wang, Y. Wang, C. Zhao, F. Yao, and X. Wang, “Federated vehicular transformers and their federations: Privacy-preserving computing and cooperation for autonomous driving,” IEEE Transactions on Intelligent Vehicles, vol. 7, no. 3, pp. 456–465, 2022.
  • [175] T. P. Nguyen, H. Nam, and D. Kim, “Transformer-based attention network for in-vehicle intrusion detection,” IEEE Access, vol. 11, pp. 55 389–55 403, 2023.
  • [176] N. Provos et al., “A virtual honeypot framework.” in USENIX Security Symposium, vol. 173, no. 2004, 2004, pp. 1–14.
  • [177] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.
  • [178] Q. Gao, F. Zhou, K. Zhang, G. Trajcevski, X. Luo, and F. Zhang, “Identifying human mobility via trajectory embeddings.” in Proceedings of International Joint Conference on Artificial Intelligence, vol. 17, 2017, pp. 1689–1695.
  • [179] S. Gao, J. Rao, X. Liu, Y. Kang, Q. Huang, and J. App, “Exploring the effectiveness of geomasking techniques for protecting the geoprivacy of twitter users,” Journal of Spatial Information Science, no. 19, pp. 105–129, 2019.
  • [180] R. Dhakal, C. Bosma, P. Chaudhary, and L. N. Kandel, “UAV fault and anomaly detection using autoencoders,” in Proceedding of IEEE/AIAA 42nd Digital Avionics Systems Conference.   IEEE, 2023, pp. 1–8.
  • [181] V. Sadhu, S. Zonouz, and D. Pompili, “On-board deep-learning-based unmanned aerial vehicle fault cause detection and identification,” in Proceedding of IEEE international conference on robotics and automation (icra).   IEEE, 2020, pp. 5255–5261.
  • [182] V. Sadhu, K. Anjum, and D. Pompili, “On-board deep-learning-based unmanned aerial vehicle fault cause detection and classification via fpgas,” IEEE Transactions on Robotics, 2023.
  • [183] J. Zhao, X. Feng, J. Wang, Y. Lian, M. Ouyang, and A. F. Burke, “Battery fault diagnosis and failure prognosis for electric vehicles using spatio-temporal transformer networks,” Applied Energy, vol. 352, p. 121949, 2023.
  • [184] H. Huang, G. Zhu, Z. Fan, H. Zhai, Y. Cai, Z. Shi, Z. Dong, and Z. Hao, “Vision-based distributed multi-UAV collision avoidance via deep reinforcement learning for navigation,” in Proceedding of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022, pp. 13 745–13 752.
  • [185] S. LING, N. WANG, J. LI, and L. DING, “Optimization of VAE-CGAN structure for missing time-series data complementation of uav jujube garden aerial surveys,” Turkish Journal of Agriculture and Forestry, vol. 47, no. 5, pp. 746–760, May 2023.
  • [186] L. Tong, X. Gan, L. Yu, and H. Zhang, “Evaluation of safety target level of unmanned aerial vehicle system in fusion airspace,” in Proceedding of IEEE International Conference on Artificial Intelligence and Computer Applications.   IEEE, 2022, pp. 375–379.
  • [187] M. A. Mehmood, M. N. A. Khan, and W. Afzal, “Transforming context-aware application development model into a testing model,” in Proceedding of 2017 8th IEEE International Conference on Software Engineering and Service Science, 2017, pp. 177–182.
  • [188] S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Transactions on knowledge and data engineering, vol. 22, no. 10, pp. 1345–1359, 2009.
  • [189] T. Hospedales, A. Antoniou, P. Micaelli, and A. Storkey, “Meta-learning in neural networks: A survey,” IEEE transactions on pattern analysis and machine intelligence, vol. 44, no. 9, pp. 5149–5169, 2021.
Guangyuan Liu is currently pursuing the Ph.D. degree with the School of Computer Science and Engineering, Energy Research Institute @ NTU, Nanyang Technological University, Singapore, under the Interdisciplinary Graduate Program. He received the B.Sc. degree from Nanyang Technological University, Singapore, in 2022. He won the Honorary Mention award in the ComSoc Student Competition from the IEEE Communications Society in 2023 and the First Prize in the 2024 ComSoc Social Network Technical Committee (SNTC) Student Competition. His research interests include generative AI, computer vision, and resource allocation.
Nguyen Van Huynh received the B.E. degree in electronics and telecommunications engineering from the Hanoi University of Science and Technology, Vietnam, in 2016, and the Ph.D. degree from the University of Technology Sydney (UTS), Australia, in 2021. He is currently a Lecturer with the School of Computing, Engineering and the Built Environment, Edinburgh Napier University (ENU), U.K. Before joining ENU, he was a Postdoctoral Research Associate with the Department of Electrical and Electronic Engineering, Imperial College London, U.K. His research interests include cybersecurity, 5G/6G, the IoT, and machine learning.
Hongyang Du is currently pursuing the Ph.D. degree with the School of Computer Science and Engineering, Energy Research Institute @ NTU, Nanyang Technological University, Singapore, under the Interdisciplinary Graduate Program. He received the B.Sc. degree from Beijing Jiaotong University, Beijing, China, in 2021. He is the assistant to the Editor-in-Chief of IEEE Communications Surveys & Tutorials (2022–2024). He was recognized as an exemplary reviewer of the IEEE Transactions on Communications and IEEE Communications Letters in 2021. He received the IEEE Daniel E. Noble Fellowship Award from the IEEE Vehicular Technology Society in 2022, the IEEE Signal Processing Society Scholarship in 2023, the Chinese Government Award for Outstanding Students Abroad in 2023, and the Singapore Data Science Consortium (SDSC) Dissertation Research Fellowship in 2023. He won the Honorary Mention award in the ComSoc Student Competition from the IEEE Communications Society in 2023, and the First and Second Prizes in the 2024 ComSoc Social Network Technical Committee (SNTC) Student Competition. His research interests include semantic communications, generative AI, and resource allocation.
Dinh Thai Hoang is currently a faculty member at the School of Electrical and Data Engineering, University of Technology Sydney, Australia. He received his Ph.D. in Computer Science and Engineering from Nanyang Technological University, Singapore, in 2016. His research interests include emerging topics in wireless communications and networking, such as machine learning, ambient backscatter communications, IRS, edge intelligence, cybersecurity, IoT, and 5G/6G networks. He has received several awards, including awards from the Australian Research Council and the IEEE TCSC Award for Excellence in Scalable Computing (Early Career Researcher). He is currently an Editor of IEEE Transactions on Wireless Communications and IEEE Transactions on Cognitive Communications and Networking, and an Associate Editor of IEEE Communications Surveys & Tutorials.
Dusit Niyato (Fellow, IEEE) is currently a professor in the School of Computer Science and Engineering and, by courtesy, the School of Physical and Mathematical Sciences, at Nanyang Technological University, Singapore. He received the B.E. degree from King Mongkut's Institute of Technology Ladkrabang (KMITL), Thailand, in 1999 and the Ph.D. degree in Electrical and Computer Engineering from the University of Manitoba, Canada, in 2008. He has authored four books, including "Game Theory in Wireless and Communication Networks: Theory, Models, and Applications" with Cambridge University Press. He won the Best Young Researcher Award of the IEEE Communications Society (ComSoc) Asia Pacific (AP) and the 2011 IEEE Communications Society Fred W. Ellersick Prize Paper Award. Currently, he is serving as a senior editor of IEEE Wireless Communications Letters, an area editor of IEEE Transactions on Wireless Communications (Radio Management and Multiple Access), an area editor of IEEE Communications Surveys & Tutorials (Network and Service Management and Green Communication), an editor of IEEE Transactions on Communications, and an associate editor of IEEE Transactions on Mobile Computing, IEEE Transactions on Vehicular Technology, and IEEE Transactions on Cognitive Communications and Networking.
Kun Zhu received the Ph.D. degree from the School of Computer Engineering, Nanyang Technological University, Singapore, in 2012. He was a Research Fellow with the Wireless Communications, Networks, and Services Research Group, University of Manitoba, Canada. He is currently a Professor with the College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, China. His research interests include resource allocation in 5G, wireless virtualization, and self-organizing networks. He has served as a TPC member for several conferences and as a reviewer for a number of journals.
Jiawen Kang received the Ph.D. degree from the Guangdong University of Technology, China, in 2018. He was a postdoctoral researcher at Nanyang Technological University, Singapore, from 2018 to 2021. He is currently a full professor at Guangdong University of Technology, China. His research interests mainly focus on blockchain, security, and privacy protection in wireless communications and networking. He has published more than 160 research papers in leading journals and flagship conferences, including 12 ESI highly cited papers and 3 ESI hot papers. He has won the IEEE VTS Best Paper Award, the IEEE Communications Society CSIM Technical Committee Best Journal Paper Award, the IEEE Best Land Transportation Paper Award, the IEEE HITC Award for Excellence in Hyper-Intelligence Systems (Early Career Researcher award), the IEEE Computer Society Smart Computing Special Technical Community Early-Career Award, and 13 best paper awards at international conferences.
Zehui Xiong is an Assistant Professor at the Singapore University of Technology and Design (SUTD) and an Honorary Adjunct Senior Research Scientist with the Alibaba-NTU Singapore Joint Research Institute, Singapore. He received the B.Eng. degree with the highest honors in Telecommunications Engineering from Huazhong University of Science and Technology (HUST), Wuhan, China, and the Ph.D. degree in Computer Science and Engineering from Nanyang Technological University (NTU), Singapore. He was a visiting scholar with the Department of Electrical Engineering at Princeton University and with the Broadband Communications Research (BBCR) Lab in the Department of Electrical and Computer Engineering at the University of Waterloo. His research interests include wireless communications, the Internet of Things, blockchain, edge intelligence, and the Metaverse. Recognized as a Highly Cited Researcher, he has published more than 200 peer-reviewed research papers in leading journals and flagship conferences.
Abbas Jamalipour (Fellow, IEEE) received the Ph.D. degree in electrical engineering from Nagoya University, Nagoya, Japan, in 1996. He is a Professor of Ubiquitous Mobile Networking with The University of Sydney. He has authored nine technical books, 11 book chapters, over 550 technical articles, and five patents, all in the area of wireless communications and networking. He is a Fellow of the Institute of Electronics, Information and Communication Engineers (IEICE) and the Institution of Engineers Australia, an ACM Professional Member, and an IEEE Distinguished Speaker. He has received several prestigious awards, such as the 2019 IEEE ComSoc Distinguished Technical Achievement Award in Green Communications, the 2016 IEEE ComSoc Distinguished Technical Achievement Award in Communications Switching and Routing, the 2010 IEEE ComSoc Harold Sobol Award, the 2006 IEEE ComSoc Best Tutorial Paper Award, and over 15 best paper awards. He has served as the General Chair or Technical Program Chair for several prestigious conferences, including IEEE ICC, GLOBECOM, WCNC, and PIMRC. He was the President of the IEEE Vehicular Technology Society from 2020 to 2021.
Dong In Kim (Fellow, IEEE) received the Ph.D. degree in electrical engineering from the University of Southern California, Los Angeles, CA, USA, in 1990. He was a tenured Professor with the School of Engineering Science, Simon Fraser University, Burnaby, BC, Canada. He is currently a Distinguished Professor with the College of Information and Communication Engineering, Sungkyunkwan University, Suwon, South Korea. He is a Fellow of the Korean Academy of Science and Technology and a Member of the National Academy of Engineering of Korea. He was the first recipient of the National Research Foundation (NRF) of Korea's Engineering Research Center grant in Wireless Communications for RF Energy Harvesting, from 2014 to 2021. He has received several research awards, including the 2023 IEEE ComSoc Best Survey Paper Award and the 2022 IEEE Best Land Transportation Paper Award. He was selected as the 2019 recipient of the IEEE ComSoc Joseph LoCicero Award for Exemplary Service to Publications. He was the General Chair of IEEE ICC 2022, Seoul. Since 2001, he has been serving as an Editor, an Editor at Large, and an Area Editor of Wireless Communications I for the IEEE Transactions on Communications.