Article

Unmanned Aerial Vehicle Landing on Rugged Terrain by On-Board LIDAR–Camera Positioning System

by
Cheng Zou
*,
Yezhen Sun
and
Linghua Kong
Fujian Key Laboratory of Intelligent Machining Technology and Equipment, Fujian University of Technology, Fuzhou 350100, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(14), 6079; https://doi.org/10.3390/app14146079
Submission received: 22 April 2024 / Revised: 2 July 2024 / Accepted: 8 July 2024 / Published: 12 July 2024

Abstract

Safely landing unmanned aerial vehicles (UAVs) in unknown, GPS-denied environments is challenging but crucial. In most cases, traditional landing methods are not suitable, especially under complex terrain conditions with insufficient map information. This report proposes an innovative multi-stage UAV landing framework involving (i) point cloud and image fusion positioning, (ii) terrain analysis, and (iii) neural network semantic recognition to optimize landing site selection. In the first step, 3D point cloud and image data are fused to attain a comprehensive perception of the environment. In the second step, an energy cost function considering texture and flatness is employed to identify potential landing sites based on energy scores. To navigate the complexities of classification for precise landings, the results are stratified by the difficulty of various UAV landing scenarios. In the third step, a network model is applied to analyze UAV landing site options by integrating the ResNet50 network with a convolutional block attention module. Experimental results indicate a reduction in computational load and improved landing site identification accuracy. The developed framework fuses multi-modal data to enhance the safety and feasibility of UAV landings in complex environments.


1. Introduction

Unmanned aerial vehicles (UAVs) are highly mobile and lightweight devices that have garnered significant interest for outdoor missions, such as terrain surface rendering [1,2], delivery of rescue materials [3], and inspection of large wild regions [4]. The terrains in these areas are rugged, and only some of the flat land is suitable for secure UAV landings. Therefore, to ensure successful landing without damage, the UAV should be able to land autonomously by detecting a suitable landing site after accomplishing its mission. In typical human-operated scenarios, autonomous landing capabilities can help the human operator determine whether the proposed location is suitable for landing. In autonomous decision scenarios, UAVs are more dependent on their autonomous landing capabilities, e.g., if the UAV requires an emergency landing because of low power or loss of connection with the ground station when it reaches an unknown location.
Most UAV positioning or landing systems are based on positioning equipment, such as GPS or Inertial Navigation System (INS) devices [5,6]. Despite wide use, GPS signals can be noisy, imprecise, and easily disrupted by obstacles in the field. Although INS systems can estimate five degrees of freedom, their accuracy can degrade over time due to error accumulation if there is no external correction. Additionally, INS systems that incorporate magnetometers are subject to local magnetic interference from buildings and other structures. These limitations make both GPS and INS systems unreliable for certain outdoor missions. Real-time kinematic (RTK) stations can significantly improve GPS accuracy and mitigate these problems; however, they can be ineffective if the GPS signal is lost or if there are interfering obstacles.
Light detection and ranging (LIDAR) represents an increasingly popular type of vision system, which sends and receives laser beams to obtain 3D geometric information about the surroundings. Although LIDAR can be affected by direct sunlight, artificial light contamination, and adverse weather conditions, such as rain or fog, it remains relatively reliable under various lighting conditions. Therefore, LIDAR is widely applicable in automated vehicles, particularly when combined with cameras. Airborne LIDAR has large spatial coverage but is limited by costs and payloads, and it can only be equipped on large planes [7]. Similarly, mechanical LIDAR that obtains a point cloud in several lines by spinning itself is not financially practical. Nevertheless, LIDAR-based odometry and the Simultaneous Localization and Mapping (SLAM) technique have attracted attention in autonomous driving and ground mobile robots. UAVs equipped with light mechanical LIDAR were investigated in recent studies, e.g., a mechanical LIDAR “Puck” with a high-resolution camera was applied to reconstruct 3D colorful maps of field environments (e.g., coastlines [8] and forests [9]). In theory, a UAV’s odometer can be estimated based on the 3D point cloud obtained by LIDAR.
However, owing to the low density provided by the light mechanical LIDAR, the corresponding information in the point cloud sequence is difficult to detect. Eventually, the LIDAR plays a secondary or negligible role in these UAV positioning applications. For example, the laser-camera-based 3D mapping methods proposed by Suzuki et al. [10] required six Global Navigation Satellite System (GNSS) receivers to estimate the position of a UAV.
Visual positioning systems have demonstrated applicability for autonomous landing management, although most such landings were aided by specifically designed pads. For example, UAV landing progress was controlled based on a fixed landing device with a QR code image [11]. Vidal et al. [12] designed a landmark positioning system consisting of a dark rectangle with a white inner circle. Rabah et al. [13] proposed an image-based target detection approach for safe UAV landings. These methods combined feature tracking, image mosaics, and motion estimation, which allowed UAVs to identify landing sites.
Few studies have focused on UAV autonomous landing in complex terrains whose geometry can be described by a carriable vision system. For example, a camera [14] can search for wide grassy areas for UAV landings, and a stereo camera [15] can be used for modeling to obtain a 3D map to limit the altitude of a UAV; however, these methods can only be applied in structured or simulated environments. The difficulty associated with autonomous landings is the generation of a 3D terrain map because this relies on the position data of the UAV, which are then transformed into the coordinates of the relative sensor data. Vision-based positioning and the SLAM technique have been used in the context of ground mobile platforms; however, UAVs fly at high altitude, which renders these methods less applicable for constructing ground maps.
Most landings without fixed target points depend on GPS. However, in practical applications, phenomena due to instability (e.g., signal drift and low accuracy) are likely to occur, especially if there are poor weather conditions. In addition to visual-based approaches, optical flow systems can be used separately to determine the altitude and position of the UAV [16] and to compensate for the drift of the GPS system, as described by Cheng et al. [17], thereby ensuring that the UAV maintains a relatively stable altitude during the landing process. In general, further research and development are necessary to establish more reliable alternative technologies.
The autonomous navigation capabilities of UAVs rely on their effective handling of emergencies and awareness of the surrounding environment. The present study proposes an efficient framework for guiding UAV landings (Figure 1), which includes three key modules aimed at rapidly and reliably selecting a landing site when GPS is insufficient. The UAV landing site selection method is defined, and the effectiveness of the proposed method is validated based on experiments. The main contributions of this study are summarized as follows:
1. A LIDAR–camera system is developed to determine the position of a UAV and create geometric and color terrain maps of its environment, thereby solving the positioning problem in GPS-denied environments.
2. The proposed method based on LIDAR–camera mapping techniques can identify potential landing sites by fusing weighted energy cost functions of texture and geometric flatness, thereby ensuring terrain safety.
3. A dataset classified according to scene landing difficulty is established to facilitate UAV landing scene recognition. Additionally, an attention mechanism is added to the network to improve the accuracy and reliability of the scene analysis.
The remainder of this report is organized as follows:
  • Section 2 discusses relevant previous work.
  • Section 3 introduces the UAV positioning methods.
  • Section 4 describes the UAV landing site selection system.
  • Section 5 analyzes experimental results to evaluate the feasibility and robustness of the proposed method.
  • Section 6 presents the conclusions from this study and discusses future research directions.

2. Related Work

2.1. UAV Landing on Rugged Terrain

The task of autonomous landing on rugged terrain requires methods for autonomously choosing a safe landing site in an unknown environment. Most techniques employ image segmentation or 3D mapping to identify level terrain that is suitable for landing. Geometric information typically enables intuitive inference of the smoothness of a landing terrain, thereby aiding in the decision process regarding landing suitability. For example, Zheng et al. [18] proposed a method for UAV landing based on point cloud data; they used a fine-grained grid elevation map and a comprehensive risk model to minimize risks during touchdown. To improve the real-time capabilities of the system, Yang et al. [19] used coarse-to-fine point clouds in different dimensional spaces within the landing framework. A principal component analysis is often used to detect suitable UAV landing sites in real time by evaluating geometric features, slopes, distances, obstacles, roughness, and ground conditions, thus demonstrating applicability in diverse scenarios [20]. However, considering the geometric factors of the environment without assessing semantic information may be dangerous, even for flat landing areas.
Edge detection and semantic recognition technologies applied to images can help UAVs identify large areas within the same category, thereby guiding the UAVs for landing [20]. Singh et al. [21] extracted image edges and employed a diameter area estimation algorithm to cluster contours to calculate the shortest distance between polygons to determine potential landing zones. Kaljahi et al. [22] introduced Markov chain codes to perform histogram operations on Gabor response images. They grouped neighboring pixels sharing the same value based on probability as a region segmentation criterion. Guérin et al. [23] guided UAV landing by identifying specific areas in urban scenes, such as roads, vehicles, and buildings. It is important to note that in practical applications, these technologies are susceptible to variations in lighting conditions, sensor performance, and environmental changes, which diminish the robustness and reliability of the system.
However, relying only on individual information sources to determine the safety of a landing site may lead to UAV accidents. In recent years, novel approaches [24] have emerged that integrate two types of information to promote precise landing site selection. These methods typically involve frame-by-frame recognition of input images or point cloud data, which make them unsuitable for real-time execution on computationally constrained UAVs or for selecting landing sites over expansive geographical areas. The UAV landing framework proposed herein integrates 3D mapping to identify suitable landing sites within its range and combines these data with semantic information from images to pinpoint the optimal landing site.

2.2. UAV SLAM during the Landing Process

A UAV can employ the SLAM technique to estimate its location by generating a map based on essential data acquired from the surrounding environment. SLAM generates a 3D map of the environment and determines the UAV’s position, thereby facilitating its identification of suitable landing locations. This approach [25,26] primarily uses image or LIDAR data as the main information source, although it is complemented by GPS data for positioning and mapping. In these methods [27,28,29], overlapping sections of images or LIDAR data from different time stamps are compared to estimate the trajectory of the UAV and construct maps. Particularly in outdoor conditions, it is likely that some characteristic information or sensor data are missing between the two images/datasets with different time stamps, which increases the risk of the UAV losing its position within a short time. A more stable position estimate can be generated by integrating image features with laser characteristics, which enables the UAV to execute a landing with greater precision along the provided path.

2.3. Deep Learning for Landing Site Recognition

Deep learning is effective for template matching, and prior training allows autonomous UAVs to better meet user needs by reducing the randomness associated with landing in complex environments. Dynamic landing, where the specific UAV landing location is not predetermined, requires the UAV to analyze the suitability of various landing sites based on the surroundings. For example, Du et al. [30] used a modified ResNeXt50 architecture and trained it on the Places365 dataset to develop a seven-class method for assessing landing safety. Although this classification method was not deployed in real scenarios, it provided a useful reference. González-Trejo et al. [31] employed an architecture to isolate pedestrians in crowded urban areas and determine the most suitable location for UAV landing based on the maximum circumscribed circle. This method ensured the safety of the UAV landing site but did not account for the impact of the actual terrain on the landing. Liu et al. [25] used point clouds and neural networks to assist in landing site determination. They identified platform features using the ResNet50 network and evaluated the 3D attributes of the platform with 3D point clouds to ensure a safe landing. Compared with a dynamic analysis using a single sensor, the strategy proposed herein also includes a terrain analysis to enhance UAV landing safety. However, owing to the lack of an optimized terrain judgment algorithm, it cannot attain the same advantages as multiple sensors.
There are also approaches that help the UAVs detect deployed landing sites. For example, Shen et al. [32] implemented target point recognition within the visual range using the MobileNetV3 framework and reported long-distance detection of target points; however, that approach required significant prior deployment across a wide range of jobs. Kim et al. [33] developed YOLO-V4 for dynamic unmanned vehicle recognition. They incorporated Ultra-Wide Band (UWB) for assisted positioning during landing and an RGB-D camera for landing area segmentation and pose estimation. Although short-range RGB-D is less affected, the early deployment of outdoor UWB was heavy and difficult to maintain. Gao et al. [34] evaluated tracking problems in dynamic scenarios and developed a precise and robust strategy for mobile platform target tracking. However, their method relied on the GPS information of the target mobile platform under basic conditions, which is not always available.

3. Proposed Positioning Framework

3.1. Unmanned Aerial Vehicle

The LIDAR system (Horizon, Livox, Shenzhen, Guangdong, China) and camera (RealSense, Intel, Santa Clara, CA, USA) were suspended on the UAV to obtain the LIDAR point clouds and an image of the earth's surface. The LIDAR and camera were installed close to one another and pointed in the same direction to maximize the overlap of their field of view (FOV). The camera is a stereo camera with two lenses, but only one lens was used in combination with the LIDAR; the stereo capability was retained mainly to enable comparative experiments with several stereo camera-based systems. The GPS on the UAV in the proposed system was not involved in positioning, although it synchronized the initialization time of the LIDAR–camera systems based on GPS-supplied time pulses.

3.2. Sensor Fusion

Because the frame rate of the camera was higher than that of the LIDAR, the LIDAR was phase-locked to the pulses from the GPS, and the image closest in time to each LIDAR frame was selected. A point in the LIDAR frame at time $t$ was denoted as $p_i^{t,l} = [x_i^{t,l}, y_i^{t,l}, z_i^{t,l}]^T$, and it was transformed to the camera frame using the following:

$$\begin{bmatrix} p_i^{t,c} \\ 1 \end{bmatrix} = \begin{bmatrix} R_l^c & t_l^c \\ 0 & 1 \end{bmatrix} \begin{bmatrix} p_i^{t,l} \\ 1 \end{bmatrix},$$

where $p_i^{t,c} = [x_i^{t,c}, y_i^{t,c}, z_i^{t,c}]^T$ is the point in the camera frame at time $t$, and the extrinsic transformation matrix from the LIDAR frame to the camera frame consisted of a rotation matrix $R_l^c \in SO(3)$ and a translation vector $t_l^c \in \mathbb{R}^3$.
Because the FOV of the LIDAR was smaller than that of the camera in the horizontal and vertical directions, all LIDAR points that could be projected into the image were considered according to the expression,
$$uv_i^t = K_c\, p_i^{t,c},$$

where $uv_i^t = [u_i^t, v_i^t, 1]^T$ is the image coordinate, and $K_c = \begin{bmatrix} k_x & \beta & u_0 \\ 0 & k_y & v_0 \\ 0 & 0 & 1 \end{bmatrix}$, which is determined by the focal lengths $[k_x, k_y]$, the skew $\beta$, and the central point coordinate $[u_0, v_0]$ of the image, is the intrinsic parameter matrix of the camera.
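For illustration, the following minimal Python/NumPy sketch shows how Equations (1) and (2) can be chained to project LIDAR points into the image. The calibrated $R_l^c$, $t_l^c$, and $K_c$ are assumed inputs, and the depth normalization and image-bounds check are additions of this sketch rather than details stated above.

```python
# Illustrative sketch (not the authors' code): project LIDAR points into the
# image plane by chaining Equations (1) and (2).
import numpy as np

def project_lidar_to_image(points_l, R_lc, t_lc, K_c, img_w, img_h):
    """points_l: (N, 3) LIDAR points; returns pixel coordinates and a validity mask."""
    # Equation (1): transform points from the LIDAR frame to the camera frame.
    points_c = points_l @ R_lc.T + t_lc              # (N, 3)

    # Keep only points in front of the camera.
    in_front = points_c[:, 2] > 0.0

    # Equation (2): apply the intrinsic matrix, then normalize by depth (assumption).
    uvw = points_c @ K_c.T                           # (N, 3) homogeneous pixels
    uv = uvw[:, :2] / uvw[:, 2:3]

    # Discard projections that fall outside the camera FOV.
    in_image = (uv[:, 0] >= 0) & (uv[:, 0] < img_w) & \
               (uv[:, 1] >= 0) & (uv[:, 1] < img_h)
    return uv, in_front & in_image
```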
The intrinsic parameter matrix and the extrinsic transformation matrix were calibrated based on a fixed chessboard plane, which shared the same FOV as the LIDAR and camera. The calibration process is shown in Figure 2. The difference of normals (DoN) [35] feature description was applied to detect chessboard planes in the LIDAR frame. Meanwhile, the internal parameter matrix $K_c$ was estimated using a flexible camera calibration toolbox [36], which reconstructed the position and plane of the chessboard corners. The chessboard plane in the LIDAR and camera frames was indicated by its normal vector $\theta$ and the distance $d$ to the sensor optical center, respectively, as shown in the following equation,

$$\alpha_t^l: \theta^{t,l} \cdot p_i^{t,l} + d^{t,l} = 0, \qquad \alpha_t^c: \theta^{t,c} \cdot p_i^{t,c} + d^{t,c} = 0.$$
The collection of the chessboard in n FOVs was obtained as follows:
$$A^l = \left\{ \alpha_1^l, \alpha_2^l, \ldots, \alpha_n^l \right\}, \qquad A^c = \left\{ \alpha_1^c, \alpha_2^c, \ldots, \alpha_n^c \right\}.$$
Each point in the LIDAR frame can be transformed via an extrinsic transformation matrix to correspond to a location on the plane in the camera frame. Thus, the constraint equation can be constructed:
$$\arg\min_{R_l^c,\, t_l^c} \; \sum_{t=1}^{n} \frac{1}{l(i)} \sum_{i=1}^{l(i)} \left\| \theta^{t,c\,T} \left( R_l^c\, p_i^{t,l} + t_l^c \right) + d^{t,c} \right\|^2.$$
Equation (5) is a nonlinear optimization problem, which can be solved using the Levenberg–Marquardt algorithm [37]. The last image in Figure 2 shows the color point cloud fused from the LIDAR and camera calibration results in one FOV, where the correspondence between the fitted points and the real scene indicates accurate alignment.
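As a hedged illustration of how the plane constraint in Equation (5) can be minimized with a Levenberg–Marquardt solver, the sketch below uses SciPy. The inputs (per-view chessboard plane parameters in the camera frame and the LIDAR points lying on the board) and the rotation-vector parameterization are assumptions of this example, not the authors' implementation.

```python
# Sketch of the plane-constraint extrinsic calibration of Equation (5),
# solved with SciPy's Levenberg-Marquardt optimizer.
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def calibrate_extrinsics(board_planes_cam, board_points_lidar, x0=np.zeros(6)):
    """board_planes_cam: list of (normal_c, d_c); board_points_lidar: list of (N_t, 3) arrays.
    Returns (R_lc, t_lc)."""
    def residuals(x):
        R_lc = Rotation.from_rotvec(x[:3]).as_matrix()
        t_lc = x[3:]
        res = []
        for (normal_c, d_c), pts_l in zip(board_planes_cam, board_points_lidar):
            # Point-to-plane error of Equation (5), with the 1/l(i) per-view weighting.
            pts_c = pts_l @ R_lc.T + t_lc
            res.append((pts_c @ normal_c + d_c) / np.sqrt(len(pts_l)))
        return np.concatenate(res)

    sol = least_squares(residuals, x0, method="lm")   # Levenberg-Marquardt [37]
    return Rotation.from_rotvec(sol.x[:3]).as_matrix(), sol.x[3:]
```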

3.3. Positioning

The framework of the LIDAR–camera positioning system is based on the RTAB-Map building method [38]. RTAB-Map is mainly used for ground scenarios with varying distances; however, from an aerial perspective (i.e., at high altitude), the ground appears uniform, which reduces the effectiveness of some RTAB-Map modules for UAV positioning. As shown in Figure 3, the positioning system framework proposed herein is divided into LIDAR positioning and camera positioning. For LIDAR positioning, an edge and planar feature detection method was implemented, which replaced an Iterative Closest Point (ICP) algorithm to estimate the UAV's motion based on LIDAR cloud sequences. Moreover, because the LIDAR point cloud features are vulnerable to smooth terrain, detecting natural landmarks from images is essential. The stereo camera that RTAB-Map uses to assist the LIDAR is not applicable for long-distance ranges. Therefore, a depth estimation step is included for monocular association of the color point cloud map reconstruction to account for motion. The details of this proposed method are provided in the following subsections.

3.3.1. LIDAR Position

Plane and edge features were extracted using an algorithm to calculate the smoothness of the local surface [39]. Supposing that $S$ is an area centered on $p_i^{t,l}$ and $p_j^{t,l}$ is a neighbor of $p_i^{t,l}$, where $p_j^{t,l} \in S$, the smoothness score of the local surface is defined as:

$$f_l = \frac{\left\| \sum_{p_j^{t,l} \in S} \left( p_i^{t,l} - p_j^{t,l} \right) \right\|}{|S| \cdot \left\| p_i^{t,l} \right\|}.$$

The scores of all points were sorted, and the maximum and minimum groups were denoted as edge and planar points, respectively. Here, $f_l$ is robust for distinguishing edge and planar points, benefiting from the hierarchical structure of the ground horizontal space. However, the value of $\left\| p_i^{t,l} \right\|$ increases with increasing UAV altitude, which reduces the ability of $f_l$ to detect feature points.
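A minimal sketch of the smoothness score in Equation (6) is given below; it assumes the sweep is ordered along the scan line so that a point's neighborhood $S$ consists of its adjacent indices, which is an assumption of this illustration.

```python
# Minimal sketch of the local-surface smoothness score in Equation (6).
import numpy as np

def smoothness_scores(scan, half_window=5):
    """scan: (N, 3) points of one ordered LIDAR sweep; returns f_l for interior points."""
    n = len(scan)
    scores = np.zeros(n)
    for i in range(half_window, n - half_window):
        # Neighborhood S: the points immediately before and after point i.
        neighbors = np.r_[scan[i - half_window:i], scan[i + 1:i + 1 + half_window]]
        diff = (scan[i] - neighbors).sum(axis=0)          # sum of (p_i - p_j)
        scores[i] = np.linalg.norm(diff) / (len(neighbors) * np.linalg.norm(scan[i]))
    return scores

# Points with the largest scores are treated as edge features and those with the
# smallest scores as planar features.
```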
To address this issue, the smoothness of the local surface was evaluated by fusing image information, as shown in the following equation,
$$f_s = f_l + \lambda f_c,$$

where $f_c$ uses color to evaluate the local smoothness by weighing the similarity between a LIDAR point and its neighbors, both of which are projected onto the image plane. Supposing that the $k$-nearest points of a LIDAR point $uv_i^t$ on the image plane are $uv_{(1 \ldots k)}^t$, the smoothness of $uv_i^t$ is defined as,

$$f_c = \sum_{j=1}^{k} L_G\!\left( uv_j^t, uv_i^t \right), \qquad L_G\!\left( uv_j^t, uv_i^t \right) = \inf_{\zeta \in Z\left( uv_i^t, uv_j^t \right)} \int_{\zeta} \varepsilon\!\left( uv_\zeta \right) duv_\zeta,$$

where $L_G$ is the shortest geodesic distance based on the weighted similarity between the two points. Here, $Z(uv_i^t, uv_j^t)$ denotes the set of all possible routes from $uv_i^t$ to $uv_j^t$; $\int_{\zeta} \varepsilon(uv_\zeta)\, duv_\zeta$ denotes the energy along one route $\zeta$; and the infimum selects the route with minimum energy. A point $uv_i^t$ belonging to a plane will be visually close to its neighbors $uv_{(1 \ldots k)}^t$ but far from boundaries. According to this principle, the structured edge detector (SED), which estimates the probability of a pixel being part of a boundary, was applied to evaluate the energy in Equation (8). Figure 4 shows an example of how the local smoothness is estimated according to $f_c$. To meet the requirements for real-time UAV processes, a parallel algorithm was employed to approximate the infimum over the SED images.
After detecting the feature points, the next task was to determine the corresponding relationship in the sequence data. In contrast to RTAB-Map, the feature matching and motion estimation were conducted simultaneously. Here, $\mathcal{E}^t$ and $\mathcal{H}^t$ represent the sets of edge and planar points, respectively, at time $t$, and $\mathcal{E}^m$ and $\mathcal{H}^m$ are the corresponding feature points in the global map. The transformation of each point from the LIDAR frame to the global map was achieved using the following equation,

$$p_i^{t,g} = R^t p_i^{t,l} + T^t,$$

where $R^t$ and $T^t$ are the LIDAR motion parameters that must be optimized, as shown in Algorithm 1. In practice, the transformation of the last frame is selected as the initialization point, and the distance from a feature point to its corresponding feature in the map is set as the optimization criterion. Whether a point corresponds to an edge or a plane can be easily distinguished by computing the covariance of its $k$-nearest points $p_{(1 \ldots k)}^g$: a neighborhood whose largest/smallest eigenvalue is three times larger/smaller than the second was considered a line/plane in the global map [40]. The LIDAR motion was recovered by minimizing the overall distances from $p_i^{t,g}$ to its corresponding line/plane, as shown in the following equation:

$$d_{\mathcal{E}} = \frac{\left| \left( p_i^{t,g} - p_1^g \right) \times \left( p_i^{t,g} - p_k^g \right) \right|}{\left| p_1^g - p_k^g \right|}, \qquad d_{\mathcal{H}} = \frac{\left| \left( p_i^{t,g} - p_1^g \right)^T \left[ \left( p_{k/2}^g - p_k^g \right) \times \left( p_{k/2}^g - p_1^g \right) \right] \right|}{\left| \left( p_{k/2}^g - p_k^g \right) \times \left( p_{k/2}^g - p_1^g \right) \right|}.$$
Algorithm 1 Optimized evaluation of $R^t$ and $T^t$

Input: $\mathcal{E}^t$, $\mathcal{H}^t$, $\mathcal{E}^m$, $\mathcal{H}^m$, $R^{t-1}$, $T^{t-1}$
Output: $R^t$, $T^t$
1: $R^t \leftarrow R^{t-1}$, $T^t \leftarrow T^{t-1}$
2: while the iterative pose optimization has not converged do
3:   for each $p_i^{t,l} \in \mathcal{E}^t$ or $\mathcal{H}^t$ do
4:     Compute $p_i^{t,g}$ via Equation (9)
5:     Find the 5 nearest points in $\mathcal{E}^m$ or $\mathcal{H}^m$ and compute their covariance
6:     if the largest/smallest eigenvalue is three times larger/smaller than the second then
7:       Add an edge-to-edge or plane-to-plane residual
8:     end if
9:   end for
10:  Perform pose optimization with 2 iterations; recompute the edge-to-edge and plane-to-plane residuals, then remove the 20 largest residuals
11:  for a maximal number of iterations do
12:    if the nonlinear optimization converges then
13:      break
14:    end if
15:  end for
16: end while
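To make the residuals used in Algorithm 1 concrete, the following sketch implements the point-to-line and point-to-plane distances of Equation (10) with NumPy. The choice of the three map points (first, middle, and last of the $k$ neighbors) follows the equation, while everything else (correspondence search, pose optimization) is omitted here.

```python
# Sketch of the point-to-line and point-to-plane residuals of Equation (10),
# evaluated inside the Algorithm 1 optimization loop.
import numpy as np

def edge_residual(p_g, p1, pk):
    """Distance from the transformed point p_g to the map line through p1 and pk."""
    cross = np.cross(p_g - p1, p_g - pk)
    return np.linalg.norm(cross) / np.linalg.norm(p1 - pk)

def plane_residual(p_g, p1, pmid, pk):
    """Distance from p_g to the map plane spanned by p1, pmid (p_{k/2}), and pk."""
    normal = np.cross(pmid - pk, pmid - p1)
    normal /= np.linalg.norm(normal)
    return abs(np.dot(p_g - p1, normal))
```

In Algorithm 1, these residuals would be accumulated over all matched feature points and minimized over $(R^t, T^t)$ with a nonlinear least-squares step.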

3.3.2. Camera Position

Camera positioning, which builds a 2D visual map, is a common method [41], whereby Scale-Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF), and Binary Robust Invariant Scalable Keypoints (BRISK) features are extracted from captured images to calculate the position relative to the UAV. These feature data are stored in a dictionary map and used to match incoming images.
Supposing that $p_i^{t,c}$ and $p_i^{g,c}$ are corresponding feature points at time $t$ and in the global map, respectively, the transformation is formulated as:

$$p_i^{g,c} = R_t^c\, p_i^{t,c} + T_t^c.$$

The solution of $R_t^c$ and $T_t^c$ is a Perspective-n-Point (PnP) problem. If the depth of the feature points is known (associated with Equation (2)), then Equation (11) can be deduced via bundle adjustment.
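As a hedged sketch of the PnP step, the snippet below uses OpenCV's RANSAC-based solver to recover $R_t^c$ and $T_t^c$ from 2D–3D matches; this is only one possible way to solve Equation (11) and is not necessarily the solver used by the authors.

```python
# Illustrative PnP solution of Equation (11) with OpenCV.
import cv2
import numpy as np

def estimate_camera_pose(map_points_3d, image_points_2d, K_c):
    """map_points_3d: (N, 3) float32, image_points_2d: (N, 2) float32, K_c: (3, 3)."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        map_points_3d, image_points_2d, K_c, None)
    if not ok:
        return None
    R_tc, _ = cv2.Rodrigues(rvec)        # rotation R_t^c of Equation (11)
    return R_tc, tvec.ravel()            # translation T_t^c
```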
The depth map is updated with the registered LIDAR clouds, triggered by the reception of a new feature point. A key assumption is that depth value similarity correlates positively with color value similarity. The depth of each feature point is estimated using the projected laser points on the current image:
$$d(uv_i) = \frac{\sum_{j=1}^{k} L_G\!\left( uv_j^L, uv_i \right) d_{uv_j^L}}{\sum_{j=1}^{k} L_G\!\left( uv_j^L, uv_i \right)}.$$
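The depth interpolation of Equation (12) reduces to a weighted average, as in the following sketch; the $L_G$ values are assumed to be precomputed for the $k$ projected laser points nearest to the feature.

```python
# Sketch of the depth interpolation in Equation (12): a weighted average of the
# depths of the k nearest projected laser points, weighted by their L_G values.
import numpy as np

def interpolate_feature_depth(laser_depths, lg_weights):
    """laser_depths, lg_weights: (k,) arrays for the k nearest laser pixels."""
    w = np.asarray(lg_weights, dtype=float)
    return float(np.dot(w, laser_depths) / w.sum())
```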

4. Landing Site Detection

During landing site detection, the UAV selects a portion of the flat area as its landing location based on the established color point cloud map. Qualifying the landing site as suitable involves assessing an acceptable slope range and confirming the absence of obstacles, a sufficient size, and proximity to the UAV. These factors were quantified by generating a cost energy map from the geometric and texture information of the area map to identify multiple potential landing sites. Areas with shallow water or heavy road traffic pose potential hazards for UAV landings; thus, landing site semantics determined the safest option for the UAV.

4.1. Construction of the Cost Energy Map

Considering the color point cloud map and the estimated UAV position, the cost energy map was established to weigh the landing possibilities at every point, as described in the following equation,
$$C_{sum} = \lambda_1 c_s + \lambda_2 c_f + \lambda_3 c_d,$$

where $c_s$, $c_f$, and $c_d$ represent the smoothness, flatness, and distance costs, respectively, as shown in the alternative landing site detection module in Figure 1. Additionally, $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the weight parameters for each costmap, with the constraint that $\lambda_1 + \lambda_2 + \lambda_3 = 1$. The details of the costmap and weight parameter definitions for alternative landing site detection are described below.
Smoothness costmap $c_s$: Normally, flat sites that are suitable for a UAV landing comprise visually similar colors in natural scenes. The edge contours of the image may indicate that the area is filled with various irregularly shaped objects. Because the color map is obtained during the UAV's flight, it is possible to directly calculate the map edges over the global area instead of in each image frame. A Canny edge detection method was adopted to generate a binary edge map, using a Gaussian filter to reduce image noise. Suitable landing sites could be identified by finding a large circle centered on a point in the edge map, with no edge point inside it. The radius of the circle could be determined using the distance transformation method, which calculates the distance between each point and the closest edge point in the map. For each point $x_i$ in the edge map, the smoothness cost was defined by the circle radius expressed as

$$c_s(x_i) = \min_{x_k} \sqrt{\left( x_i - x_k \right)^T \left( x_i - x_k \right)} \quad \text{s.t.} \quad Canny(x_k) = 1,$$

where $Canny(x_k) = 1$ indicates that $x_k$ is an edge point.
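A minimal OpenCV sketch of this smoothness costmap is shown below; the Gaussian-blur kernel and Canny thresholds are illustrative values, not the parameters used in the paper.

```python
# Sketch of the smoothness costmap c_s: Canny edges on the fused color map,
# followed by a distance transform that gives, for each pixel, the radius of
# the largest edge-free circle centered there (Equation (14)).
import cv2
import numpy as np

def smoothness_costmap(color_map_bgr):
    gray = cv2.cvtColor(color_map_bgr, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)            # suppress image noise
    edges = cv2.Canny(blurred, 50, 150)                    # binary edge map
    # Distance transform expects non-zero "free" pixels and zeros at edges.
    free_space = (edges == 0).astype(np.uint8)
    c_s = cv2.distanceTransform(free_space, cv2.DIST_L2, 5)
    return c_s                                              # larger value = smoother area
```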
Flatness costmap $c_f$: After obtaining the 3D point cloud map, the geometric shape of the terrain under the UAV can be analyzed. A site that is suitable for the UAV to land on is expected to be a large and flat plane. The flatness of each point in a point cloud map can be measured according to the angle between its normal $n_i$ and the vertical vector $z$:

$$\theta_i = \cos^{-1} \frac{n_i \cdot z}{\left\| n_i \right\| \left\| z \right\|}.$$

The cost energy of each point was evaluated by a Gaussian kernel,

$$f_{l_i} = \exp\!\left( -\frac{\left( \theta_i - \bar{\theta} \right)^2}{2 \sigma_\theta^2} \right),$$

where $\bar{\theta}$ and $\sigma_\theta$ represent the mean and standard deviation of the angles of all points, respectively. Similar to the smoothness cost, the landing site was determined by finding a circle centered on a point in the normal map, over which the $f_{l_i}$ values of all points were accumulated. To increase the computational speed, a KD-tree in the Point Cloud Library (PCL) was constructed to search for the nearest neighbors of the query point within the radius and to estimate the surface normal via principal component analysis. The flatness cost was represented by the following equation,

$$c_f(p_i) = \sum_{j \in \bigcirc_i} f_{l_j},$$

where $\bigcirc_i$ denotes the circle centered on the point $p_i$.
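The following sketch approximates the flatness costmap of Equations (15)–(17) with a SciPy KD-tree and PCA-based normals, mirroring the PCL pipeline described above; the search radii and the absolute value taken on the normal–vertical dot product (to remove the normal's sign ambiguity) are assumptions of this example.

```python
# Sketch of the flatness costmap c_f (Equations (15)-(17)).
import numpy as np
from scipy.spatial import cKDTree

def flatness_costmap(points, normal_radius=0.5, accum_radius=1.5):
    tree = cKDTree(points)
    z = np.array([0.0, 0.0, 1.0])

    # Normal estimation via PCA: the direction of smallest variance in each neighborhood.
    normals = np.zeros_like(points)
    for i, p in enumerate(points):
        idx = tree.query_ball_point(p, normal_radius)
        nbrs = points[idx] - points[idx].mean(axis=0)
        _, _, vt = np.linalg.svd(nbrs, full_matrices=False)
        normals[i] = vt[-1]

    # Equation (15): angle between each normal and the vertical axis.
    cos_t = np.abs(normals @ z) / np.linalg.norm(normals, axis=1)
    theta = np.arccos(np.clip(cos_t, -1.0, 1.0))

    # Equation (16): Gaussian kernel around the mean angle.
    fl = np.exp(-((theta - theta.mean()) ** 2) / (2 * theta.std() ** 2 + 1e-9))

    # Equation (17): accumulate fl over a circle centered on each point.
    return np.array([fl[tree.query_ball_point(p, accum_radius)].sum()
                     for p in points])
```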
Energy consumption costmap $c_d$: The last factor influencing the landing site selection was related to the energy consumed by the UAV to reach the landing site. It is time consuming to accurately estimate the energy consumed by the UAV from the current position along the optimal trajectory to each point on the map, and therefore, the Euclidean distance was computed to approximate this costmap:

$$c_d = \left\| p_i - p_{uav} \right\|,$$

where $p_{uav}$ represents the current position of the UAV.
Weighting parameters: Each costmap was normalized to scale its values to $[0, 1]$. The flatness costmap can be used to find large flat areas most directly, so $\lambda_1$ was assigned a larger value than $\lambda_2$ and $\lambda_3$. Because the LIDAR point cloud density and sampling frequency were lower than those of the camera, the density of the LIDAR point cloud map was limited by the flying speed and altitude of the UAV. The smoothness costmap served as a supplement to detect areas composed of the same material. Thus, $\lambda_3$ was assigned the smallest value, and a landing site associated with an energy consumption cost below a certain threshold ensured that the UAV had sufficient energy to reach the site. The parameters for the sum of all costmaps shown in Figure 1 were set to 0.5, 0.3, and 0.2, respectively. Points reflecting "hot" colors indicate suitable landing sites.
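A small sketch of the weighted fusion in Equation (13) is given below; the normalization to $[0, 1]$ and the weights (0.5, 0.3, 0.2) follow the text, while inverting the distance term so that higher values still mark better sites is an assumption of this sketch rather than a detail stated in the paper.

```python
# Sketch of the weighted cost fusion in Equation (13).
import numpy as np

def normalize(c):
    c = np.asarray(c, dtype=float)
    return (c - c.min()) / (c.max() - c.min() + 1e-12)

def total_cost(c_s, c_f, c_d, weights=(0.5, 0.3, 0.2)):
    l1, l2, l3 = weights                  # lambda_1 + lambda_2 + lambda_3 = 1
    # The distance cost is inverted here (assumption) so that "hot" values mark good sites.
    return l1 * normalize(c_s) + l2 * normalize(c_f) + l3 * (1.0 - normalize(c_d))
```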

4.2. Place Recognition

The most appropriate landing site can be determined according to the smooth area in the LIDAR point cloud. However, some areas that are smooth on the surface may not actually be suitable for landing. For example, a shallow water area may appear as a fairly regular plane in the LIDAR system, but landing there is a potential risk. Similarly, a road may be suitable for UAV landing, but it could cause a traffic accident. Therefore, a visual analysis of the semantic information of the scene is necessary to further ensure a safe UAV landing.

4.2.1. Cluster of Alternative Landing Sites

The scene map was constructed from data streams; however, the vast area covered by the UAV made performing a semantic analysis of all locations unrealistic. Therefore, the local maxima of the costmap were computed to identify alternative landing sites and reduce the UAV's reaction time for making landing decisions. The costmap was projected onto the horizontal plane, which was divided into scaling grids. Each grid stored the average cost value of all projected points. The present study applied a Quick Shift algorithm to segment a set of clusters suitable for landing, whose centers were the local maxima of the costmap corresponding to each cluster. The Quick Shift algorithm finds the modes of the cost $C_{sum}$ by iteratively shifting each grid $g_{x_i}$ to the nearest neighboring grid $g_{y_i}$ with a lower cost value as follows:

$$g_{y_i} = \arg\min_{x_j : D_{x_j} < D_{x_i}} d\!\left( g_{x_i}, g_{x_j} \right), \qquad P_{x_i} = \frac{1}{N} \sum_{x_j = 1}^{N} \theta\!\left( d\!\left( g_{x_i}, g_{x_j} \right) \right),$$

where $d(g_{x_i}, g_{x_j})$ is the distance between $g_{x_i}$ and $g_{x_j}$, and $P_{x_i}$ is the density of grid $g_{x_i}$ with the kernel function $\theta$. The iterative procedure yields numerous clusters, which can be regarded as alternative landing sites.
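The grid-based clustering can be sketched as follows; this follows the standard Quick Shift idea of linking each cell to its nearest neighbor with a higher score so that cells with no such neighbor become the modes (alternative landing sites). The 8-connected neighborhood and the omission of the kernel density estimate are simplifications of this illustration.

```python
# Sketch of grid-based Quick Shift mode finding on the projected costmap.
import numpy as np

def quick_shift_modes(cost_grid):
    """cost_grid: 2D array of averaged C_sum values; returns mode coordinates."""
    h, w = cost_grid.shape
    parent = {}
    for y in range(h):
        for x in range(w):
            best, best_d = None, np.inf
            # 8-connected neighborhood used as the shift candidates.
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if (dy or dx) and 0 <= ny < h and 0 <= nx < w:
                        if cost_grid[ny, nx] > cost_grid[y, x]:
                            d = np.hypot(dy, dx)
                            if d < best_d:
                                best, best_d = (ny, nx), d
            parent[(y, x)] = best          # None -> local maximum (a mode)
    return [cell for cell, p in parent.items() if p is None]
```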

4.2.2. Datasets

In this study, a concise yet effective image classification strategy was implemented, where images were organized into a 7-category framework based on landing difficulty, designated as $D_{Land}$. These images were categorized (Figure 5) to cover the most common scenarios encountered in UAV operations, thereby facilitating a comprehensive environmental analysis. To assess the effectiveness of this focused method relative to a broader classification scheme, images were also sorted into 30 classes based on the AID dataset's UAV classification [42], thus creating the $D_{Scene}$ dataset for comparison. $D_{Scene}$ was used to indicate that the UAV recognized the attributes of the landing scene and determined whether these attributes were suitable for landing. The goal of focusing on a limited number of pertinent categories was to reduce the computational load of model training and to facilitate the incorporation of future datasets. Given the significant resource constraints in UAV systems, the development of an efficient and lightweight model was paramount.

The core image set was assembled by merging three principal data sources: UC Merced Land Use [43], AID [42], and Places365 [30,44]. Although the usage scenarios for these three datasets differed, ensuring consistent target recognition allowed for the diversity of the combined datasets, thereby enhancing the model's ability to learn and identify a wider array of environmental features. This fusion yielded a comprehensive library of 117,100 images. The dataset was divided into a training set with 81,970 images and a validation set containing 35,130 images. Furthermore, to evaluate the model's practical resilience, 5000 low-altitude aerial photographs taken by UAVs served as the test dataset. This expansive image collection, featuring a broad range of geographic, natural, and artificial landscapes, significantly enhanced the dataset's diversity and scope.

4.2.3. Neural Network Architecture

This study proposes a high-altitude scene recognition method that combines a deep residual network with an attention mechanism, as illustrated in Figure 6. The aim of this approach is to optimize the UAV landing site selection process.
The convolutional block attention module (CBAM), developed by Woo et al. [45], enhances the model’s handling of complex aerial views [46], making it suitable for UAV aerial scene classification. This module comprises two key submodules, namely, the Channel Attention Module (CAM) and the Spatial Attention Module (SAM), which focus on extracting different aspects of features.
The CAM, shown in Figure 6a, changes the features of different channels by applying globally pooled weights learned from the input features. Subsequently, the SAM, depicted in Figure 6b, takes these weighted channel features as input and further magnifies the model’s spatial recognition by assessing the importance of features at each location. This dual attention mechanism of adaptive filtering enables CBAM to capture key scene information with greater precision.
ResNet50 [47] served as the neural network backbone, as illustrated in the lower half of Figure 6. It was selected as the core architecture for this research, which aimed to construct an efficient model for scene recognition. The design philosophy of ResNet50 revolves around using residual learning to address the degradation problem encountered during the training of deep neural networks [48]. By incorporating skip connections, ResNet50 helps the network learn residual mappings for additional layers, rather than directly fitting the desired map. This feature enhances training efficiency and reduces the risk of over-fitting.
In UAV aerial scene recognition tasks, ResNet50 has demonstrated powerful feature learning capabilities owing to its deep structure and the stability gained through its residual learning mechanism [49]. Moreover, the efficiency of ResNet50 makes it particularly suitable for use in UAV systems with limited computational resources. Despite being a deep network, its residual connections enable it to efficiently perform forward and backward propagation without significantly increasing the computational costs. This means that UAVs can process and analyze aerial images quickly without reducing their accuracy, thereby supporting real-time decision-making.
The feature identification capabilities of ResNet50 are further enhanced when combined with the attention mechanism of CBAM. This enables the network to learn complex feature representations, thereby improving the model’s robustness and generalizability in classifying and recognizing aerial scenes.
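A hedged PyTorch sketch of combining a ResNet50 backbone with a CBAM block for the 7-class $D_{Land}$ classifier is shown below; where exactly the attention module is inserted in the backbone is not specified here, so the sketch simply applies one CBAM block after the final convolutional stage, with layer sizes following the standard CBAM formulation [45].

```python
# Illustrative ResNet50+CBAM sketch (PyTorch, torchvision >= 0.13), not the authors' code.
import torch
import torch.nn as nn
from torchvision.models import resnet50

class CBAM(nn.Module):
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        # Channel Attention Module: shared MLP over global avg- and max-pooled features.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False))
        # Spatial Attention Module: 7x7 conv over channel-wise avg and max maps.
        self.spatial = nn.Conv2d(2, 1, spatial_kernel,
                                 padding=spatial_kernel // 2, bias=False)

    def forward(self, x):
        ca = torch.sigmoid(self.mlp(x.mean((2, 3), keepdim=True)) +
                           self.mlp(x.amax((2, 3), keepdim=True)))
        x = x * ca                                              # channel attention
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * sa                                           # spatial attention

class ResNet50CBAM(nn.Module):
    def __init__(self, num_classes=7):
        super().__init__()
        backbone = resnet50(weights=None)
        self.features = nn.Sequential(*list(backbone.children())[:-2])  # up to layer4
        self.cbam = CBAM(2048)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(2048, num_classes)

    def forward(self, x):
        x = self.cbam(self.features(x))
        return self.fc(self.pool(x).flatten(1))
```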

5. Experimental Results

The proposed system was evaluated in both simulated and real scenarios. Four simulated environments including roads, vehicles, lakes, houses, forests, and various other common elements were created, and the algorithm was tested using the Gazebo 9.0 simulation software. Additionally, a five-day experiment was performed under five different real-world scenarios. The system’s performance metrics, including modeling accuracy and landing accuracy, were assessed via data logging and analysis at each experimental site and under varying conditions. The applicability of the proposed method was evaluated with respect to site selection, the timing of landing, proximity to obstacles, and UAV safety indicators.
In a real-world deployment scenario, the algorithm was deployed on a computer (NX, NVIDIA, Santa Clara, CA, USA) equipped with a six-core 64-bit CPU and a 384-core Volta architecture GPU, running a 64-bit Ubuntu 18.04 operating system. It functioned as the motion algorithm and vision processing unit for the UAV. A flight control computer (Pixhawk, 3DR Solutions, Berkeley, CA, USA) was responsible for motor control and speed feedback as the UAV motion controller. Additionally, the NVIDIA NX was tasked with computing the proposed localization method, map construction, and landing site selection, and communicating the motion requirements of the UAV to the Pixhawk flight control computer, as illustrated in Figure 7.

5.1. Simulation Experiments

In the simulation, the Robot Operating System (ROS) served as middleware to facilitate real-time execution of various processes, including state estimation, trajectory planning, landing site detection, LIDAR, and image processing. As described in Section 1, the UAV was equipped with a LIDAR system for point cloud collection, a binocular camera for image acquisition, and an IMU for UAV motion data feedback. The sensor setup was based on its true parameters, as shown in Figure 8a. Figure 8b–e show the four simulation environments constructed for this study. To more accurately reflect real-world conditions in the simulation, a disturbance factor was introduced: the local average wind speed was set to 12 km/h, and no tools external to the UAV (e.g., RTK) were used. An iconic crossroad scene was selected for a detailed analysis.
In terms of trajectory selection, a circuitous path was adopted and evaluated within the simulation environment, as depicted in Figure 9a. Along the chosen trajectory (Table 1), the point cloud density per square meter ($\rho_{map}$) accumulated over the total scanning time ($T_{total}$) was sufficient for the LIDAR-equipped UAV to perform high-precision scene modeling. The UAV successfully completed the task, maintaining a levelness deviation ($\Delta\alpha_{scene}$) ≤ 1.5° at 10 randomly selected locations across each scene. Figure 9b shows the UAV's flight trajectory to the designated landing sites.
In terms of UAV control, the Proportional–Integral–Derivative (PID) method [50] used for trajectory tracking managed the UAV’s motor speed to follow the planned trajectory. It adjusted the motors based on the current position error, past errors, and the predicted future errors. This simple yet effective approach allowed the UAV to accurately maintain its course and respond dynamically to any environmental changes, thus ensuring stable flight and precise trajectory tracking.
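For reference, a minimal single-axis PID update is sketched below; the gains and time step are illustrative, and in practice one such controller per axis (x, y, z, yaw) would convert trajectory errors into commands for the flight controller.

```python
# Minimal single-axis PID sketch of the trajectory-tracking idea described above.
class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt                      # accumulated past errors
        derivative = (error - self.prev_error) / self.dt      # predicted error trend
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```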
During the simulation phase, a comparative analysis was conducted for the UAV positioning methods: GPS, Optical Flow [16], Hybrid Optical Flow [17], and this work. Due to the limitations of image recognition in the simulation environment, a specific location from the initial site selection was used for landing. Figure 10a shows the UAV's trajectory tracked using different methods. Figure 10b–e present the average absolute errors in the x, y, z, and yaw dimensions relative to the expected values for each method in the simulation. Analyzing positional errors revealed that Optical Flow [16] had notable deviations during long-distance tracking. The position adjustments using GPS and Hybrid Optical Flow [17] showed more fluctuations. In contrast, the method developed in this study leveraged the perception of environmental distances, resulting in superior positioning accuracy.

The trajectories from each method were compared with the expected trajectory by calculating the average overall trajectory deviation $E_{tra}$ and the variance of the overall trajectory deviation $\sigma_{tra}^2$. Table 2 shows that the proposed method performed well, with a 29.31% improvement in $E_{tra}$ and a stable $\sigma_{tra}^2$. Hybrid Optical Flow integrates Optical Flow and GPS to enhance location accuracy and has become a prevalent approach for outdoor positioning. The proposed method provided UAVs with positioning accuracy comparable to traditional GPS-based methods in the event of GPS component failure, demonstrating that it could serve as a viable alternative for UAV positioning in GPS-denied conditions.

5.2. Real-World Outdoor Experiments

Five common real scenes (including turf, roof parking lot, and cottage area) were selected, as shown in Figure 11. The UAV was set up to fly into the experimental area at a height of 30 m, and the point cloud and image were collected and processed. At the beginning of the experiment, multiple locations were calculated and identified via a scoring selection of the point cloud data by the algorithm. In addition, the UAV was tested regarding the regional identification of multiple landing sites via autonomous location. The algorithm first evaluated whether the landing area where the target landing site was located met the landing conditions. The suitable accurate landing site was calculated and compared with the physical environment information to assess the rationality of the selection.

5.2.1. Costmap Selection of Landing Sites

The experimental process (Figure 11) began with a predefined UAV trajectory, and sensors were used to capture point clouds and images. The algorithm integrated area information and autonomously identified potential landing sites from the point cloud data by calculating the costmaps. The experimental results demonstrated that the terrain analysis algorithm developed in this study could successfully identify several potential safe landing sites. However, in certain cases, despite meeting the terrain requirements of the initial experiment, the selected landing site was near areas unsuitable for landing, such as the swimming pool area shown in Figure 11e(iii). This limitation primarily stemmed from using point cloud data alone to evaluate the characteristics of a specific area. Consequently, the region recognition landing approach proposed in this study became necessary.

5.2.2. Place Recognition

To verify the accuracy of the network architecture used in this study, the performance of several well-known models was compared using $D_{Scene}$ and $D_{Land}$. These models were deployed on the NVIDIA NX to test the efficiency of the algorithm on an edge computing platform. The models were trained and tested with a validation set of 81,970 images (same images but different classifications), and the results are summarized in Table 3. The performances of various models were evaluated using the validation dataset, thus highlighting their applicability and effectiveness in real-world scenarios.

Analyzing the accuracy of the validation data (right-most column of Table 3) revealed that the ResNet series of models demonstrated outstanding performance compared with other basic network architecture frameworks. In particular, ResNet50 and ResNet101 excelled in terms of accuracy. When the dataset was subdivided into seven main categories, as shown in Table 3, each network's performance on the $D_{Land}$ dataset exhibited stronger generalizability. Almost all networks showed significant improvements in recognition accuracy during inference tasks. This confirmed that $D_{Land}$, which directly determined whether the scene was suitable for landing, was a better scheme than $D_{Scene}$, which determined the properties of the scene before deciding whether the scene was suitable for landing. In addition, this method verified that reducing the dataset in this way could improve the practicability of the UAV's ability to perform complex scene recognition tasks.
In contrast to approaches like that described by Dhami et al. [24], which start with semantic segmentation before identifying safe landing zones [18] with the same kind of dataset, this work leverages a point cloud costmap to obviate the segmentation process when selecting landing areas. By relying on a CNN for semantic scene recognition, this approach streamlines the algorithm and significantly enhances its stability.
A performance evaluation was carried out on the test dataset comprising 5000 images of multiple scenes collected by aerial photography to ensure the efficient operation and stability of airborne computing systems in practical environments and to effectively reduce potential overload risks. Performance metrics, such as storage requirements (i.e., ROM), runtime, and memory usage (i.e., RAM) were examined for image classification tasks in airborne computers. Table 4 shows the performance of various network models for these computational performance metrics. ResNeXt50 and ResNet50 showed excellent resource utilization. In contrast to the deeper ResNet101 or the shallower ResNet18 and ResNet34, ResNet50 provided a better balance, maintaining higher accuracy while requiring fewer computational resources. This justified the selection of ResNet50 as the primary model for the UAV landing area recognition algorithm proposed herein.
In the evaluation, the CBAM-enhanced ResNet50 model demonstrated superior performance, achieving higher accuracy than the traditional ResNet101 model with the test dataset (right-most column of Table 4). These results confirmed the significant impact of CBAM on the decision-making process of the ResNet50 model and highlighted its exceptional ability for extracting important features and capturing key information. Notably, despite the addition of the CBAM module, the improved model required a minimal increase in ROM, and it even exhibited better performance in terms of memory usage. Particularly in terms of memory efficiency, the ResNet50+CBAM model performed better than other advanced models, such as ResNeXt50 and MobileNetV3, which is important for memory-constrained airborne environments.
Furthermore, considering the running efficiency requirements for practical applications, the ResNet50+CBAM model’s ability to process each image with a short running time further validated its potential applicability for real-time image processing tasks (fourth column of Table 4). These advantages make the CBAM-enhanced ResNet50 model an ideal choice for airborne image processing systems because it exhibits higher performance, while maintaining an efficient use of computational resources.

5.2.3. Landing Analysis

The evaluation index for landing stability at the final moment of landing $t_L$ relied on the average levelness $\alpha_L$ of the selected site, the offset $d_L$ from the expected position, the speed in the vertical direction $v_z$, and the acceleration in the vertical direction $a_z$. The UAV position and posture at $t_L$ were also considered, as shown in Table 5. The reference object was detected using the methods evaluated in this study, i.e., GPS, Optical Flow [16], Hybrid Optical Flow [17], and our method, and the UAV landed at the designated position.
The latitude was set as the y-axis, and the longitude was set as the x-axis. A coordinate system was established to assess how the different landing solutions deviated from the target point during the actual landing. Figure 12 shows the deviation from the expected target point of the UAV landing area when five experiments were conducted using each of the four methods. The GPS landing approach had larger errors due to the noise introduced by the device itself. Over multiple experiments, the Hybrid Optical Flow significantly improved its performance [17]. The method developed in this study could achieve high landing accuracy and meet the needs of emergency situations as an alternative to traditional methods (green arrow in Figure 12) because it used a map generated from collected point clouds for more accurate positioning. This resulted in closer proximity to the landing center point. The algorithm leveraged the precision of LIDAR to identify the distance to obstacles during landing, which significantly enhanced the accuracy and robustness of detecting the UAV’s altitude relative to the ground. This advantage is detailed in Table 5, which illustrates the effectiveness of this approach compared with other methods. As a result, the UAV could maintain a safe flight speed during landing, thus enabling it to smoothly land at the target point.
The real-world UAV landing performance of the proposed method, demonstrated across five types of scenarios, is presented in Table 6. As expected, with the algorithm, the UAV autonomously landed in safe and relatively flat areas without the need for GPS, thus highlighting its effectiveness, especially in emergency situations. The algorithm developed in this study improved the landing search process by assessing both the site type and the terrain, which significantly enhanced the robustness and reliability of UAV landings.

5.2.4. Runtime Analysis

Five tests were conducted in each real-world scenario to verify the overall reliability of the proposed algorithm. As shown in Table 7, a significant proportion of time was allocated to scene recognition due to the high computational complexity of some algorithms; however, this was a necessity for ensuring a safe landing. Moreover, new modules and factors can be seamlessly integrated or substituted within the landing framework proposed in this study, thereby further enhancing the landing performance of UAVs.

6. Conclusions

This report described a comprehensive framework for UAV landing in various environments. To solve the localization problem facing UAVs in GPS-denied environments, a LIDAR–camera system was developed to generate geometric and color terrain maps of the environment. Then, based on the LIDAR–camera mapping technology, a cost function combining the weighted energy texture and geometric plane was proposed to identify and evaluate potential landing sites. The cost function combined 3D terrain and 2D scene recognition processes, which increased the rationality and safety of UAV landing site selection. Finally, a dataset classification method suitable for UAV landings based on scene landing difficulty was proposed, and a CBAM+ResNet50 network structure was designed for dataset classification to ensure a safe final landing. Extensive experiments were carried out in simulated and real-world scenarios. In the simulated scenarios, the selection of a UAV operating trajectory was confirmed, and the developed approach was compared with various positioning methods to verify the stability of the proposed positioning scheme in long-distance trajectories by measuring the error relative to the ideal trajectory. In the real scenarios, a variety of network structures were compared with the classification method proposed in this study using a traditional dataset that judged the landing rationality based on certain attributes; all were deployed to the real machine for testing. The results showed that the proposed dataset and network structure improved UAVs’ scene classification and were suitable for UAV scene recognition for landing. In addition, the entire system was tested extensively in different types of scenarios to verify the robustness of the overall LIDAR–camera method. Future research efforts will aim to enhance the landing trajectory selection process, UAV motion control algorithm, and network structure utilization.

Author Contributions

Conceptualization, C.Z. and L.K.; methodology, C.Z.; software, C.Z.; validation, C.Z. and Y.S.; formal analysis, C.Z. and Y.S.; investigation, C.Z. and Y.S.; resources, C.Z. and L.K.; data curation, C.Z. and Y.S.; writing—original draft preparation, C.Z. and Y.S.; writing—review and editing, C.Z. and Y.S.; visualization, C.Z. and Y.S.; supervision, L.K.; project administration, C.Z. and L.K.; funding acquisition, C.Z. and L.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Fujian Natural Science Foundation Project (project no. 2021J05213) and Scientific Research Start Foundation of Fujian University of Technology (project no. GY-Z20167).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data and code that support the findings of this study are available from the corresponding authors upon reasonable request.

Acknowledgments

We would like to express our sincere gratitude to all editors, reviewers, and staff who participated in the review of this article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Navia, J.; Mondragon, I.; Patino, D.; Colorado, J. Multispectral mapping in agriculture: Terrain mosaic using an autonomous quadcopter UAV. In Proceedings of the 2016 International Conference on Unmanned Aircraft Systems (ICUAS), Arlington, VA, USA, 7–10 June 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1351–1358.
2. Stefanik, K.V.; Gassaway, J.C.; Kochersberger, K.; Abbott, A.L. UAV-based stereo vision for rapid aerial terrain mapping. GIScience Remote Sens. 2011, 48, 24–49.
3. Tomic, T.; Schmid, K.; Lutz, P.; Domel, A.; Kassecker, M.; Mair, E.; Grixa, I.L.; Ruess, F.; Suppa, M.; Burschka, D. Toward a fully autonomous UAV: Research platform for indoor and outdoor urban search and rescue. IEEE Robot. Autom. Mag. 2012, 19, 46–56.
4. Nikolic, J.; Burri, M.; Rehder, J.; Leutenegger, S.; Huerzeler, C.; Siegwart, R. A UAV system for inspection of industrial facilities. In Proceedings of the 2013 IEEE Aerospace Conference, Big Sky, MT, USA, 2–9 March 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 1–8.
5. Guo, Y.; Wu, M.; Tang, K.; Tie, J.; Li, X. Covert spoofing algorithm of UAV based on GPS/INS-integrated navigation. IEEE Trans. Veh. Technol. 2019, 68, 6557–6564.
6. Zhang, X.; Zhu, F.; Tao, X.; Duan, R. New optimal smoothing scheme for improving relative and absolute accuracy of tightly coupled GNSS/SINS integration. GPS Solut. 2017, 21, 861–872.
7. Obu, J.; Lantuit, H.; Grosse, G.; Günther, F.; Sachs, T.; Helm, V.; Fritz, M. Coastal erosion and mass wasting along the Canadian Beaufort Sea based on annual airborne LiDAR elevation data. Geomorphology 2017, 293, 331–346.
8. Lin, Y.C.; Cheng, Y.T.; Zhou, T.; Ravi, R.; Hasheminasab, S.M.; Flatt, J.E.; Troy, C.; Habib, A. Evaluation of UAV LiDAR for mapping coastal environments. Remote Sens. 2019, 11, 2893.
9. Guo, Q.; Su, Y.; Hu, T.; Zhao, X.; Wu, F.; Li, Y.; Liu, J.; Chen, L.; Xu, G.; Lin, G.; et al. An integrated UAV-borne lidar system for 3D habitat mapping in three forest ecosystems across China. Int. J. Remote Sens. 2017, 38, 2954–2972.
10. Suzuki, T.; Inoue, D.; Amano, Y. Robust UAV Position and Attitude Estimation using Multiple GNSS Receivers for Laser-based 3D Mapping. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3–8 November 2019.
11. Guo, Y.; Guo, J.; Liu, C.; Xiong, H.; Chai, L.; He, D. Precision Landing Test and Simulation of the Agricultural UAV on Apron. Sensors 2020, 20, 3369.
12. Vidal, V.; Honório, L.; Santos, M.; Silva, M.; Cerqueira, A.; Oliveira, E. UAV vision aided positioning system for location and landing. In Proceedings of the 2017 18th International Carpathian Control Conference (ICCC), Sinaia, Romania, 28–31 May 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 228–233.
13. Rabah, M.; Rohan, A.; Talha, M.; Nam, K.H.; Kim, S.H. Autonomous vision-based target detection and safe landing for UAV. Int. J. Control Autom. Syst. 2018, 16, 3013–3025.
14. Hinzmann, T.; Stastny, T.; Cadena, C.; Siegwart, R.; Gilitschenski, I. Free LSD: Prior-free visual landing site detection for autonomous planes. IEEE Robot. Autom. Lett. 2018, 3, 2545–2552.
15. Forster, C.; Faessler, M.; Fontana, F.; Werlberger, M.; Scaramuzza, D. Continuous on-board monocular-vision-based elevation mapping applied to autonomous landing of micro aerial vehicles. In Proceedings of the 2015 IEEE International Conference on Robotics and Automation (ICRA), Seattle, WA, USA, 26–30 May 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 111–118.
16. Miller, A.; Miller, B.; Popov, A.; Stepanyan, K. UAV landing based on the optical flow videonavigation. Sensors 2019, 19, 1351.
17. Cheng, H.W.; Chen, T.L.; Tien, C.H. Motion estimation by hybrid optical flow technology for UAV landing in an unvisited area. Sensors 2019, 19, 1380.
18. Zheng, W.; Yi, J.; Xiang, H.; Zhou, B.; Wang, D.; Zhao, C. A Study for UAV Autonomous Safe Landing-Site Selection on Rough Terrain. In Proceedings of the 2nd International Conference on Computing and Data Science, Stanford, CA, USA, 28–30 January 2021; pp. 1–7.
19. Yang, L.; Wang, C.; Wang, L. Autonomous UAVs landing site selection from point cloud in unknown environments. ISA Trans. 2022, 130, 610–628.
20. Loureiro, G.; Dias, A.; Martins, A.; Almeida, J. Emergency landing spot detection algorithm for unmanned aerial vehicles. Remote Sens. 2021, 13, 1930.
21. Singh, J.; Adwani, N.; Kandath, H.; Krishna, K.M. RHFSafeUAV: Real-Time Heuristic Framework for Safe Landing of UAVs in Dynamic Scenarios. In Proceedings of the 2023 International Conference on Unmanned Aircraft Systems (ICUAS), Warsaw, Poland, 6–9 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 863–870.
22. Kaljahi, M.A.; Shivakumara, P.; Idris, M.Y.I.; Anisi, M.H.; Lu, T.; Blumenstein, M.; Noor, N.M. An automatic zone detection system for safe landing of UAVs. Expert Syst. Appl. 2019, 122, 319–333.
23. Guérin, J.; Delmas, K.; Guiochet, J. Certifying emergency landing for safe urban UAV. In Proceedings of the 2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W), Taipei, Taiwan, 21–24 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 55–62.
  24. Dhami, H.S.; Ignatyev, D.; Tsourdos, A. Semantic segmentation based mapping systems for the safe and precise landing of flying vehicles. IFAC Pap. Online 2022, 55, 310–315. [Google Scholar] [CrossRef]
  25. Liu, F.; Shan, J.; Xiong, B.; Fang, Z. A real-time and multi-sensor-based landing area recognition system for uavs. Drones 2022, 6, 118. [Google Scholar] [CrossRef]
  26. Alam, M.S.; Oluoch, J. A survey of safe landing zone detection techniques for autonomous unmanned aerial vehicles (UAVs). Expert Syst. Appl. 2021, 179, 115091. [Google Scholar] [CrossRef]
  27. Yan, L.; Qi, J.; Wang, M.; Wu, C.; Xin, J. A safe landing site selection method of UAVs based on LiDAR point clouds. In Proceedings of the 2020 39th Chinese Control Conference (CCC), Shenyang, China, 27–29 July 2020; IEEE: Piscataway, NY, USA, 2020; pp. 6497–6502. [Google Scholar]
  28. Lee, H.; Cho, S.; Jung, H. Real-time collision-free landing path planning for drone deliveries in urban environments. ETRI J. 2023, 45, 746–757. [Google Scholar] [CrossRef]
  29. Chen, C.; Chen, S.; Hu, G.; Chen, B.; Chen, P.; Su, K. An auto-landing strategy based on pan-tilt based visual servoing for unmanned aerial vehicle in GNSS-denied environments. Aerosp. Sci. Technol. 2021, 116, 106891. [Google Scholar] [CrossRef]
  30. Du, H.; Wang, W.; Wang, X.; Wang, Y. Autonomous landing scene recognition based on transfer learning for drones. J. Syst. Eng. Electron. 2023, 34, 28–35. [Google Scholar] [CrossRef]
  31. González-Trejo, J.; Mercado-Ravell, D.; Becerra, I.; Murrieta-Cid, R. On the visual-based safe landing of UAVs in populated areas: A crucial aspect for urban deployment. IEEE Robot. Autom. Lett. 2021, 6, 7901–7908. [Google Scholar] [CrossRef]
  32. Shen, K.; Zhuang, Y.; Zhu, Y. Incremental learning-based land mark recognition for mirco-UAV autonomous landing. In Proceedings of the 2020 39th Chinese Control Conference (CCC), Shenyang, China, 27–29 July 2020; IEEE: Piscataway, NY, USA, 2020; pp. 6786–6791. [Google Scholar]
  33. Kim, C.; Lee, E.M.; Choi, J.; Jeon, J.; Kim, S.; Myung, H. ROLAND: Robust landing of UAV on moving platform using object detection and UWB based extended kalman filter. In Proceedings of the 2021 21st International Conference on Control, Automation and Systems (ICCAS), Jeju, Republic of Korea, 12–15 October 2021; IEEE: Piscataway, NY, USA, 2021; pp. 249–254. [Google Scholar]
  34. Gao, Y.; Ji, J.; Wang, Q.; Jin, R.; Lin, Y.; Shang, Z.; Cao, Y.; Shen, S.; Xu, C.; Gao, F. Adaptive Tracking and Perching for Quadrotor in Dynamic Scenarios. IEEE Trans. Robot. 2023, 40, 499–519. [Google Scholar] [CrossRef]
  35. Ioannou, Y.; Taati, B.; Harrap, R.; Greenspan, M. Difference of normals as a multi-scale operator in unorganized point clouds. In Proceedings of the 2012 Second International Conference on 3D Imaging, Modeling, Processing, Visualization & Transmission, Zurich, Switzerland, 13–15 October 2012; IEEE: Piscataway, NY, USA, 2012; pp. 501–508. [Google Scholar]
  36. Zhang, Z. A flexible new technique for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 1330–1334. [Google Scholar] [CrossRef]
  37. Moré, J.J. The Levenberg-Marquardt algorithm: Implementation and theory. In Numerical Analysis; Springer: Berlin/Heidelberg, Germany, 1978; pp. 105–116. [Google Scholar] [CrossRef]
  38. Labbé, M.; Michaud, F. RTAB-Map as an open-source lidar and visual simultaneous localization and mapping library for large-scale and long-term online operation. J. Field Robot. 2019, 36, 416–446. [Google Scholar] [CrossRef]
  39. Zhang, J.; Singh, S. LOAM: Lidar Odometry and Mapping in Real-time. In Proceedings of the Robotics: Science and Systems, Berkeley, CA, USA, 12–16 July 2014; Volume 2. [Google Scholar]
  40. Lin, J.; Zhang, F. Loam livox: A fast, robust, high-precision LiDAR odometry and mapping package for LiDARs of small FoV. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; pp. 3126–3131. [Google Scholar] [CrossRef]
  41. Tsai, C.H.; Lin, Y.C. An accelerated image matching technique for UAV orthoimage registration. ISPRS J. Photogramm. Remote Sens. 2017, 128, 130–145. [Google Scholar] [CrossRef]
  42. Xia, G.S.; Hu, J.; Hu, F.; Shi, B.; Bai, X.; Zhong, Y.; Zhang, L.; Lu, X. AID: A benchmark data set for performance evaluation of aerial scene classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 3965–3981. [Google Scholar] [CrossRef]
  43. Yang, Y.; Newsam, S. Bag-of-visual-words and spatial extensions for land-use classification. In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Jose, CA, USA, 2–5 November 2010; pp. 270–279. [Google Scholar]
  44. Zhou, B.; Lapedriza, A.; Khosla, A.; Oliva, A.; Torralba, A. Places: A 10 million image database for scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 1452–1464. [Google Scholar] [CrossRef] [PubMed]
  45. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  46. Gao, J.; Zhang, B.; Wu, Y.; Guo, C. Building Extraction from High Resolution Remote Sensing Images Based on Improved Mask R-CNN. In Proceedings of the 2022 4th International Conference on Robotics and Computer Vision (ICRCV), Wuhan, China, 25–27 September 2022; IEEE: Piscataway, NY, USA, 2022; pp. 1–6. [Google Scholar]
  47. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  48. Zheng, J.; Zhao, Y.; Wu, W.; Chen, M.; Li, W.; Fu, H. Partial domain adaptation for scene classification from remote sensing imagery. IEEE Trans. Geosci. Remote. Sens. 2022, 61, 1–17. [Google Scholar] [CrossRef]
  49. Liu, X.; Zhou, Y.; Zhao, J.; Yao, R.; Liu, B.; Ma, D.; Zheng, Y. Multiobjective ResNet pruning by means of EMOAs for remote sensing scene classification. Neurocomputing 2020, 381, 298–305. [Google Scholar] [CrossRef]
  50. Katigbak, C.N.R.; Garcia, J.R.B.; Gutang, J.E.D.; De Villa, J.B.; Alcid, A.D.C.; Vicerra, R.R.P.; Cruz, A.R.D.; Roxas, E.A.; Serrano, K.K.D. Autonomous trajectory tracking of a quadrotor UAV using PID controller. In Proceedings of the 2015 International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment and Management (HNICEM), Cebu, Philippines, 9–12 December 2015; IEEE: Piscataway, NY, USA, 2015; pp. 1–5. [Google Scholar]
Figure 1. Landing framework involving a LIDAR–camera positioning system.
Figure 2. Fusion of LIDAR point cloud and image.
Figure 3. LIDAR and camera positioning framework.
Figure 4. Example showing how the local smoothness of LIDAR points is estimated on the image plane. Left: the original image is captured from the UAV, and the SED image is applied to evaluate the edge energy; two LIDAR points (A and B) are selected to show the computation of f_c. Right: plane and edge points are evaluated in terms of their local smoothness by summing the SED energy along the minimum route to their neighbors.
Figure 5. The D_Scene and D_Land datasets were derived from the UC Merced Land Use, AID, and Places365 datasets.
Figure 6. ResNet50 residual network architecture with the convolutional block attention module: (a) internal structure of the CAM module; (b) internal structure of the SAM module.
Figure 7. Schematic diagram of the UAV experimental platform.
Figure 8. Simulation of UAVs and constructed scenes: (a) system configuration of the UAV in the simulation, including the Livox Horizon and RealSense sensors with scanning ranges parallel to the ground; (b–e) virtual environments based on four common scenarios.
Figure 9. Running trajectories of the UAV: (a) terrain scanning trajectory of the UAV; (b) designated landing trajectory of the UAV.
Figure 10. Comparative analysis of the absolute error of the UAV trajectory using four positioning methods during algorithm execution: (a) comparison of each positioning technology's trajectory with the expected value; (b–e) mean absolute error of the UAV trajectory for each method over 10 simulations.
Figure 11. Image and point cloud information acquired during the experiment and confirmation of the landing site: (i–iii) steps of the experimental process; (a–e) types of experimental sites selected for the outdoor tests (part of the point cloud map is rotated). The blue points in the lower column mark the approximate landing site selection, and the red points mark the final landing site.
Figure 12. Comparison of UAV landing positions with different methods.
Table 1. Comparison of point cloud densities collected by a UAV in the selected trajectory and the levelness deviation among four scenes.

T_total (s)   ρ_map (points/m²)   Δα_scene1 (°)   Δα_scene2 (°)   Δα_scene3 (°)   Δα_scene4 (°)
36.1          1111                ±1.19           ±0.89           ±0.96           ±1.19
Table 2. The overall average and variance generated by different methods relative to the expected trajectory.

Method                     E_tra (m)   σ²_tra (m²)
GPS                        1.0112      0.5083
Optical Flow [16]          2.0538      1.5819
Hybrid Optical Flow [17]   0.5193      0.1828
Ours                       0.3671      0.1568
The bold values indicate the best performance among the compared methods.
Table 3. Results of different neural network models trained on D_Scene and D_Land.

Network          Dataset   Loss    F1      Recall   TOP1 Accuracy
ResNeXt50        D_Scene   0.417   0.736   0.732    0.762
                 D_Land    0.362   0.791   0.782    0.839
Vgg19            D_Scene   0.389   0.768   0.753    0.776
                 D_Land    0.332   0.811   0.792    0.832
GoogleNetV4      D_Scene   0.468   0.681   0.668    0.756
                 D_Land    0.402   0.769   0.706    0.844
MobileNetV3      D_Scene   0.437   0.703   0.671    0.754
                 D_Land    0.384   0.775   0.742    0.821
ResNet18         D_Scene   0.521   0.621   0.548    0.703
                 D_Land    0.469   0.714   0.625    0.769
ResNet34         D_Scene   0.491   0.695   0.683    0.714
                 D_Land    0.412   0.743   0.714    0.801
ResNet50         D_Scene   0.441   0.686   0.674    0.737
                 D_Land    0.356   0.813   0.808    0.829
ResNet101        D_Scene   0.398   0.765   0.761    0.793
                 D_Land    0.305   0.838   0.832    0.851
ResNet50+CBAM    D_Scene   0.402   0.767   0.758    0.786
                 D_Land    0.322   0.845   0.842    0.857
The bold values indicate the best performance among the compared methods.
Table 4. Performance results of different network architectures on the test dataset.

Network          ROM (MB)   RAM (MB)   RunTime (s)   Loss    F1      Recall   TOP1 Accuracy
ResNeXt50        88         901.493    0.95          0.342   0.726   0.712    0.794
Vgg19            532        896.237    0.84          0.331   0.713   0.703    0.761
GoogleNetV4      157        887.847    1.09          0.385   0.748   0.735    0.792
MobileNetV3      154        914.367    1.18          0.379   0.734   0.721    0.777
ResNet18         72         966.265    0.92          0.396   0.682   0.657    0.761
ResNet34         83         923.461    0.96          0.329   0.701   0.673    0.773
ResNet50         90         874.207    0.96          0.336   0.716   0.698    0.799
ResNet101        162        863.052    1.03          0.314   0.765   0.753    0.817
ResNet50+CBAM    92         855.925    0.98          0.328   0.817   0.823    0.841
The bold values indicate the best performance among the compared methods.
Table 5. Evaluation of site selection results obtained by different methods.

Method                     α_L (°)   d_L (m)   v_z (m/s)   a_z (m/s²)
GPS                        3.11      1.41      0.32        0.21
Optical Flow [16]          2.36      1.31      0.29        0.18
Hybrid Optical Flow [17]   1.83      0.54      0.17        0.15
Ours                       1.99      0.41      0.10        0.04
The bold values indicate the best performance among the compared methods.
Table 6. Autonomous UAV landing results for multiple experimental scenarios.

Metric                                (a) Turf      (b) Rooftop Parking Lots   (c) Cottage Area   (d) Footbridge   (e) Swimming Pools
Obstacle distance (m)                 4.52          2.12                       1.58               1.86             4.33
Average level of site selection (°)   3.52          2.32                       4.11               2.45             3.18
Speed at landing (m/s)                0.13          0.12                       0.15               0.13             0.10
Type of landing site                  Lawn (safe)   Path (safe)                Road (safe)        Road (risky)     Field (safe)
Table 7. The computational runtime of the proposed landing site identification method.

Algorithm               Time Cost (ms)
Down simple             101.75 ± 5.49
Flatness                103.35 ± 8.27
Smooth canny            8.64 ± 2.51
Euclidean distance      7.17 ± 2.93
Places classification   3920.21 ± 54.04
Total                   4141.12 ± 73.24
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
