1. Introduction
With the development of 3D scanning and imaging technologies, point cloud data have become increasingly easy to acquire, and their applications have expanded from remote sensing to a wide range of fields, such as robotics, virtual reality, autonomous driving and smart cities. Three-dimensional point clouds have shown great application potential, facilitating the interaction between machines and the real world [1]. Point cloud semantic segmentation is an important part of 3D processing technology. By giving machines the ability to recognize and classify elements of the surrounding environment, semantic segmentation plays an important role in enhancing our perception of the world [2,3]. The segmentation results directly affect downstream applications such as autonomous driving and robot navigation; accurate and efficient point cloud segmentation is therefore essential.
Point clouds are unordered, unstructured, high-dimensional and non-uniform in density. Currently, deep learning is the mainstream approach to point cloud semantic segmentation, and two commonly used families of methods are point-based and graph convolution-based. Point-based methods such as PointNet [4] and PointNet++ [5] process points independently on a local scale to maintain permutation invariance. However, this independence ignores the geometric relationships between points, which limits their ability to capture local features [6,7]. Graph convolution-based methods, such as GACNet [8] and HDGCN [9], model point clouds as graphs and then learn point cloud features using graph neural networks. These methods improve segmentation performance considerably. However, they consider only pairwise relationships among data and ignore higher-order relationships, i.e., relationships that involve more than two objects [10]. A growing number of studies have shown that modeling higher-order relationships helps uncover latent connections between data samples, thus improving the capability of the model. A hypergraph is a generalized graph structure that extends the traditional notion of a graph. It is composed of a vertex set and a hyperedge set. The hyperedge set is a collection of subsets of the vertex set, and each hyperedge can connect one or more vertices [11]. This structure allows hypergraphs to represent and handle complex relationships more flexibly.
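The vertex-set and hyperedge-set definition above can be made concrete with a small sketch. The toy hypergraph below (five vertices, three hyperedges) and the helper name `incidence_matrix` are illustrative assumptions, not taken from any of the cited works; the sketch only shows how hyperedges of different sizes are stored in an incidence matrix.

```python
# Toy hypergraph: 5 vertices and 3 hyperedges (illustrative values only).
vertices = [0, 1, 2, 3, 4]
hyperedges = [{0, 1, 2}, {2, 3}, {1, 3, 4}]  # each hyperedge is a subset of the vertex set


def incidence_matrix(vertices, hyperedges):
    """Return H with H[v][e] = 1 if vertex v belongs to hyperedge e, else 0."""
    return [[1 if v in e else 0 for e in hyperedges] for v in vertices]


H = incidence_matrix(vertices, hyperedges)

# Vertex degree: number of hyperedges that contain the vertex (row sums of H).
vertex_degree = [sum(row) for row in H]

# Hyperedge degree: number of vertices in the hyperedge (column sums of H).
hyperedge_degree = [sum(col) for col in zip(*H)]

print(H)
print(vertex_degree)
print(hyperedge_degree)
```

An ordinary graph is the special case in which every column of `H` sums to exactly two; here two of the hyperedges join three vertices at once, which is the kind of higher-order relationship a pairwise edge cannot express directly.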
Owing to the advantages of hypergraphs in modeling data correlations, some scholars have in recent years adopted hypergraph tools to analyze and process point cloud data, although related studies remain sparse. Zhang et al. [12] introduced a tensor-based approach to estimate the hypergraph spectral components and frequency coefficients of point clouds in both ideal and noisy environments, established an analytical relationship between hypergraph frequencies and structural features, and evaluated the effectiveness of hypergraph spectra in point cloud sampling and denoising. Subsequently, they investigated the capability of hypergraph spectral analysis for unsupervised segmentation of 3D point clouds [13]. In addition, exploiting the robustness of hypergraphs to noise, Jiang et al. [14] proposed a 3D object detection method for noisy point clouds based on hypergraph construction–compression–conversion: hypergraphs are constructed with multi-scale voxelized structures and clustering methods, transformed into graphs, and the features are then learned using graph neural networks. Deng et al. [15] investigated point cloud resampling based on hypergraph signal processing (HGSP) and designed hypergraph spectral filters to capture multilateral interactions between nodes. Among these methods, those relying on hypergraph spectra and hypergraph signal processing incur excessive computational complexity, while converting hypergraphs to graphs not only adds a conversion cost but also loses higher-order information. Therefore, it is necessary to apply deep learning techniques directly to hypergraphs, for example by constructing hypergraph neural networks, to fully exploit their capacity for representation learning with higher-order data correlations, and thereby explore the latent information in the data and obtain better point cloud semantic segmentation performance.
A hypergraph neural network is a neural network structure that utilizes higher-order data correlations for representation learning. Compared to graph neural networks, it is better able to capture both global and local information in data. Hypergraph neural networks have shown excellent performance in a variety of tasks, such as object retrieval and classification [16], action recognition [17], sentiment prediction [18] and recommender systems [19]. In [20,21,22,23], the K-nearest neighbor strategy is used to construct hypergraphs, and a hyperedge convolution operator is proposed that obtains the output features of each vertex by aggregating the features of the hyperedges to which the vertex belongs. The works in [19,21,24] consider attention at both the node level and the hyperedge level, introducing attention mechanisms into vertex convolution and hyperedge convolution so that the weights of vertices and hyperedges are learned automatically during feature transformation and propagation. Although hypergraph neural networks have shown significant advantages in a variety of tasks, challenges remain in applying them to point cloud semantic segmentation. First, the widely adopted hyperedge convolution operator [20,25] can effectively aggregate the local information of nodes in the hypergraph, but on discrete and unordered point cloud data it struggles to capture the correlations between local and global features, which leads to incomplete and distorted information. Second, some current hypergraph attention operators [26,27] adopt a vertex–hyperedge–vertex feature transformation mode with hyperedges as the intermediate layer, which increases the number of parameters and thus the model complexity. These limitations make it difficult for current hypergraph neural networks to process large-scale point cloud data. Therefore, hypergraph neural network structures better suited to point cloud data are needed to improve the semantic segmentation performance of point clouds.
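The K-nearest-neighbor hypergraph construction and the vertex–hyperedge–vertex aggregation mentioned above can be illustrated with a minimal sketch. It is not the operator of any cited work: it uses plain means with no learnable weights or attention, and the function names (`knn_hyperedges`, `hyperedge_conv`) and toy data are assumptions made for illustration.

```python
import math


def knn_hyperedges(points, k):
    """Build one hyperedge per point: the point plus its k nearest neighbours.

    Assumes the points are distinct, so each point is its own nearest match.
    """
    edges = []
    for p in points:
        order = sorted(range(len(points)), key=lambda j: math.dist(p, points[j]))
        edges.append(order[:k + 1])
    return edges


def mean(rows):
    """Column-wise mean of a non-empty list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def hyperedge_conv(features, edges):
    """Two-stage mean aggregation: vertices -> hyperedges -> vertices."""
    # Stage 1: each hyperedge feature is the mean of its member vertices.
    edge_feats = [mean([features[v] for v in e]) for e in edges]
    out = []
    for v in range(len(features)):
        # Stage 2: each vertex averages the hyperedges that contain it.
        incident = [edge_feats[j] for j, e in enumerate(edges) if v in e]
        out.append(mean(incident) if incident else list(features[v]))
    return out


points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (5, 5, 5), (6, 5, 5)]
features = [[1.0], [2.0], [3.0], [10.0], [20.0]]
edges = knn_hyperedges(points, k=1)
out = hyperedge_conv(features, edges)
print(edges)
print(out)
```

In this toy example the two clusters of points never share a hyperedge, so their features are smoothed only within each cluster. That locality is the behavior the paragraph describes: the aggregation captures neighborhood information but has no direct path between distant regions.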
To this end, we propose an end-to-end hypergraph deep learning framework, namely the hypergraph position attention convolution network, for semantic segmentation of point clouds. Specifically, to efficiently organize unordered, unstructured and high-dimensional point clouds, we construct a hypergraph that captures correlations between points by combining farthest point sampling and ball query. Then, we propose a hyperedge position attention convolution operator for extracting high-level semantic features of point clouds. This operator adopts the hyperedge–hyperedge feature propagation mode, which not only effectively utilizes the spatial position information and higher-order information of the point cloud but also avoids the vertex-to-hyperedge propagation process, reducing the number of network parameters. Finally, we design a ResNet-like module for feature learning, which further improves the efficiency of the network by introducing deep convolution into the network. The main contributions of our work are summarized as follows:
We propose a new hyperedge position attention convolution module for feature extraction, which makes the network focus more on task-related feature information by combining the position information of the points with hyperedges generated from other features.
We design a hypergraph position attention convolution network framework for the semantic segmentation of point clouds. In particular, we introduce a ResNet-like deep convolution module to lighten the network and improve its efficiency.
We perform segmentation experiments and a series of ablation experiments on the S3DIS and ShapeNet Part datasets to validate the performance of the proposed method.
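The hypergraph construction step named above, farthest point sampling followed by ball query, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the construction used in the proposed network: the function names and the parameters `num_edges`, `radius` and `max_points` are hypothetical, and each sampled center is assumed to define one hyperedge containing the points inside its ball.

```python
import math


def farthest_point_sampling(points, m, start=0):
    """Greedily pick m point indices, each farthest from those already chosen."""
    chosen = [start]
    # min_dist[i] is the distance from point i to its nearest chosen center.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < m:
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen


def ball_query(points, center_idx, radius, max_points):
    """Return up to max_points indices lying within radius of the center."""
    center = points[center_idx]
    inside = [i for i, p in enumerate(points) if math.dist(p, center) <= radius]
    return inside[:max_points]


def build_hypergraph(points, num_edges, radius, max_points):
    """One hyperedge per sampled center (an assumption of this sketch)."""
    centers = farthest_point_sampling(points, num_edges)
    return [ball_query(points, c, radius, max_points) for c in centers]


points = [(0, 0, 0), (0.1, 0, 0), (0, 0.1, 0), (1, 1, 1), (1.1, 1, 1), (3, 0, 0)]
hyperedges = build_hypergraph(points, num_edges=3, radius=0.3, max_points=4)
print(hyperedges)
```

Farthest point sampling spreads the centers over the whole cloud, while the ball query groups a variable number of nearby points into each hyperedge, so dense and sparse regions yield hyperedges of different sizes.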