Article

Research on Feature Fusion Method Based on Graph Convolutional Networks

School of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(13), 5612; https://doi.org/10.3390/app14135612
Submission received: 21 May 2024 / Revised: 22 June 2024 / Accepted: 24 June 2024 / Published: 27 June 2024

Abstract

This paper proposes an enhanced BertGCN-Fusion (BGF) model aimed at addressing the limitations of Graph Convolutional Networks (GCNs) in processing global text features for text categorization tasks. While traditional GCNs effectively capture local structural features, they face challenges when integrating global semantic features. Issues such as the potential loss of global semantic information due to local feature fusion and limited depth of information propagation are prevalent. To overcome these challenges, the BGF model introduces improvements based on the BertGCN framework: (1) Feature fusion mechanism: Introducing a linear layer to fuse BERT outputs with traditional features facilitates the integration of fine-grained local semantic features from BERT with traditional global features. (2) Multilayer fusion approach: Employing a multilayer fusion technique enhances the integration of textual semantic features, thereby comprehensively and accurately capturing text semantic information. Experimental results demonstrate that the BGF model achieves notable performance improvements across multiple datasets. On the R8 and R52 datasets, the BGF model achieves accuracies of 98.45% and 93.77%, respectively, marking improvements of 0.28% and 0.90% compared to the BertGCN model. These findings highlight the BGF model’s efficacy in overcoming the deficiencies of traditional GCNs in processing global semantic features, presenting an efficient approach for handling text data.

1. Introduction

The task of text categorization is a crucial research direction in the field of natural language processing (NLP), which is widely used and has significant application value in email categorization [1], data mining [2], intelligence analysis [3], and sentiment classification [4].
Traditional machine-learning-based text classification algorithms, including Naive Bayes [5], Support Vector Machine (SVM) [6], and Decision Tree [7] algorithms, typically involve converting text into feature vectors. However, with changes in training samples, classification results may fluctuate or even deteriorate. With the rapid advancement of deep learning, the BERT (Bidirectional Encoder Representations from Transformers) model [8], a deep learning model created by Google AI Language researchers, has attained remarkable success in natural language processing [9], prompting an increasing number of researchers to apply it to text classification. Deep learning models can improve generalization ability compared to traditional models by learning high-dimensional embedded representations of semantic and syntactic information.
Text classification methods based on deep learning can be broadly categorized into three types: text classification models based on a Convolutional Neural Network, text classification models based on the attention mechanism, and text classification models based on BERT. The text classification model based on Convolutional Neural Networks effectively captures local features in the text and performs well with long text inputs. The convolution operation can be computed in parallel, reducing the number of parameters that need to be learned and thus decreasing model complexity and the risk of overfitting. However, it may lose some positional information, fail to capture the sequential relationship between words, and require fixed-size input. Text classification models based on the attention mechanism enable dynamic focus on important parts of the text, facilitating better modeling of contextual relationships between words. They can handle text inputs of varying lengths. However, they increase the computational complexity of the model and sometimes require more data to effectively train the model and obtain meaningful attention weights. BERT-based text categorization models excel in capturing global word relationships and possess strong semantic understanding. However, they typically incur higher computational and storage costs, demanding longer training times and larger computational resources.
In recent years, Graph Neural Networks (GNNs) [10] have been increasingly employed in text categorization due to their capability to capture rich and effective relational structural information while considering the global context of documents [11]. However, standard convolution operations are limited in their ability to handle data structures in non-Euclidean space. To address this, Thomas Kipf introduced the Graph Convolutional Network (GCN) [12]. As Convolutional Neural Networks continue to evolve, Graph Convolutional Networks are being applied to tasks such as text classification. The TextGCN [13] model integrates a Graph Convolutional Network with the text classification task, constructing a heterogeneous graph from word frequency and co-occurrence statistics across the entire corpus. BertGCN [14] builds on this by using BERT to extract semantic information for the text nodes, addressing the challenge of efficiently combining pre-trained models with a Graph Neural Network; however, it provides only sentence-level semantic embeddings through label extraction during training, omitting other semantic aspects of the sentences.
In current text categorization research, despite the existence of multiple state-of-the-art models such as the BERT, GCN, and their combined variants (e.g., BertGCN and TextGCN), several limitations still require addressing. Firstly, traditional Convolutional Neural Network (CNN)-based text categorization models struggle to capture long-distance dependencies and global semantic information in text due to their confinement to fixed-size input windows. They face challenges in effectively handling variable-length texts and intricate sequential relationships between words. Secondly, while models leveraging attention mechanisms can dynamically focus on crucial parts of the text, this often escalates computational complexity and necessitates substantial data to train effective attention weights, occasionally leading to overfitting issues. Furthermore, despite BERT’s proficiency in capturing global semantic information in text, its extensive computational demands and storage requisites restrict its application in resource-constrained environments.
To address these challenges, the BertGCN-Fusion (BGF) model proposed in this paper offers an innovative solution. By introducing multiple linear layers, the BGF model effectively integrates the fine-grained local information learned by the BERT model with the global relational information learned by the GCN model. This feature fusion strategy not only addresses the traditional models’ shortcomings in processing global semantic features but also enhances the model’s generalization ability and classification performance.
  • The design of the BGF model is based on the following theoretical foundations
Graph Convolutional Neural Network: The GCN is adept at capturing local structural information among nodes. Through its message-passing mechanism, the features of each node not only contain its information but also integrate information from neighboring nodes. However, the GCN has limitations in the depth of information propagation and suffers from the loss of global semantic information due to local feature fusion when dealing with global semantic features.
BERT model: The BERT model is a pre-trained language model that captures fine-grained semantic features in the text through bidirectional encoder representations. BERT learns rich contextual semantic information from large amounts of textual data during pre-training, giving it powerful semantic understanding capabilities. However, BERT mainly focuses on local features and is less effective in capturing the global structural information of the text.
Feature fusion mechanism of BGF model: The BGF model fuses the fine-grained local semantic features output by BERT with the global structural features captured by the GCN by introducing a linear layer. This feature fusion mechanism allows the BGF model to compensate for the GCN’s shortcomings in capturing global semantic features while retaining the semantic understanding advantages of BERT.
Multilayer fusion: The BGF model adopts a multilayer fusion approach, which enables the model to capture the semantic information in the text more comprehensively and accurately by integrating textual semantic features at multiple levels. This multilayer fusion mechanism enhances the richness and accuracy of feature representation.
  • Specific advantages of the BGF model
R8 dataset: On the R8 dataset, the BGF model achieves an accuracy of 98.45%, which represents a 0.28% improvement compared to the BertGCN model. This example highlights the BGF model’s outstanding performance in news text categorization, affirming the effectiveness of its feature fusion mechanism and multilayer fusion approach.
R52 dataset: The BGF model achieves an accuracy of 93.77% on the R52 dataset, showing a 0.90% improvement over the BertGCN model. This outcome demonstrates the BGF model’s effectiveness in larger-scale text classification tasks by effectively integrating local semantic information and global structural features.
MR dataset: Although the accuracy and loss function of the BGF model on the MR dataset show considerable fluctuations, further model tuning and optimization are expected to enhance its performance in sentiment classification tasks. This example underscores the BGF model’s potential in addressing sentiment classification challenges and outlines areas for refinement.
Experimental results demonstrate that the BGF model achieves significant performance improvements across multiple datasets, surpassing similar graph-network-based algorithms such as TextGCN and BertGCN. This validates the effectiveness and superiority of the BGF model in addressing text categorization tasks, providing a robust solution to the limitations of existing models. These advancements not only address the shortcomings of traditional models in handling global information processing but also offer a dependable approach for managing complex text data in practical scenarios.

2. Related Work

Due to the remarkable success of deep learning neural networks [15,16], text categorization methods based on deep learning have garnered significant attention from researchers. Among these, the BERT-based text classification model segments the text into words, converts each word into a corresponding word vector or embedding vector, and incorporates positional encoding into each word vector before inputting it into the Transformer model. On the other hand, Graph-Neural-Network-based text classification models extract rich and effective relational structure information by constructing a heterogeneous graph representation of the text. This section will explore the mainstream models of these approaches.
BERT-based text classification model: BERT-based text classification models focus on leveraging the self-attention mechanism within the Transformer architecture of BERT to capture semantic information in text sequences and apply it to text classification tasks. These models typically encode text sequences into context-aware representations and then utilize these representations for classification prediction. Throughout this process, the models can efficiently handle long text sequences and often achieve good performance after pre-training on large corpora. The model structure is depicted in Figure 1.
The Transformer model, proposed by Vaswani et al. in 2017 [17], revolutionized natural language processing. This model computes the weight of each word in a sentence in parallel via a self-attention mechanism, learning word representations with attentional weights, and implements the text classification task by appending an output layer at the end of the encoder. Leveraging the attention mechanism, the Transformer achieves higher parallelization compared to CNNs and RNNs [18], enabling efficient training of very large models on GPU clusters. Subsequently, pre-trained language models (PLMs) emerged, leveraging their robust word representation capabilities to achieve promising results in text categorization tasks. Existing pre-trained language models fall into two categories: autoregressive pre-trained language models and self-encoding pre-trained language models. GPT [19] stands as one of the earliest autoregressive pre-trained language models, comprising unidirectional Transformers renowned for their adeptness in capturing contextual semantic information. BERT, on the other hand, diverges from GPT’s unidirectional structure, adopting a bidirectional pre-training approach coupled with a masked language modeling (Mask) mechanism during training. Owing to BERT’s potent representational prowess, numerous researchers have endeavored to enhance the model to suit diverse learning tasks. RoBERTa [20] employs more training data, larger parameter sizes, and larger batch sizes to refine the model. It also integrates a dynamic masking mechanism, resulting in a more robust final model compared to BERT. ALBERT [21], on the other hand, streamlines the BERT architecture to boost training speed and reduce memory consumption through parameter reduction. DistilBERT [22] employs knowledge distillation techniques during pre-training, eliminating token-type word embeddings and pooling layers to halve the model’s parameters while retaining vital information, thus enhancing training speed and simplifying the BERT network’s structure. ERBERT [23] utilizes a dynamic masking mechanism in model training, whereas ERNIE integrates domain knowledge from external knowledge bases into pre-trained language models. ALUM [24] introduces adversarial loss during model pre-training, bolstering the model’s generalization capabilities for new tasks and robustness against adversarial attacks. Finally, XLNet [25] amalgamates the strengths of autoregressive and autoencoder models, incorporating sorting operations during pre-training to imbue the model with sensitivity to location information.
Text classification model based on a Graph Neural Network: Transformer models exhibit limitations in processing sparse data, which are particularly evident in scenarios involving sparse text data or datasets with numerous sparse features. In such cases, Transformer models may struggle to fully utilize the data’s information, leading to performance degradation. To address these shortcomings, along with traditional deep learning models’ inability to handle long-distance information transfer and their incomplete mining of text semantics, researchers have increasingly turned their focus to the Graph Neural Network (GNN). Kim et al. [26] introduced the classic TextCNN model in 2014, which, while adept at capturing textual features, suffers from the loss of word order and positional information during convolution and pooling operations. Consequently, the model only captures local word order information, neglecting the impact of word order on classification effectiveness and thereby affecting the final classification outcome. Yao et al. [13] proposed the TextGCN model in 2019, integrating a Graph Neural Network into text classification tasks. This model effectively extracts global co-occurrence features, yielding outstanding performance in text classification. However, its Graph Neural Network, focusing solely on global information, fails to capture local semantic information at the token level. In 2021, Lin et al. [14] introduced the BertGCN model, enhancing the accuracy of long-text categorization by jointly training the GCN and BERT modules. Despite its advancements, BertGCN only provides sentence-level semantic embedding representations by extracting tags during training, thereby overlooking other semantic nuances within the sentence.

3. Methods

In this section, a detailed description of the BGF model is presented, and the detailed architecture of the model is shown in Figure 2.
In this process, a heterogeneous graph is first constructed using the TF-IDF and PMI values between documents and words. After instantiating the BERT and GCN models based on the number of categories, the document node features are initialized using BERT. Finally, the features from BERT and the GCN are fused. By employing multilayer fusion, feature information from different layers is amalgamated to enhance the model’s expressiveness, thereby improving its generalization ability and performance. Moreover, multilayer fusion helps filter and integrate the most useful features for the model task, reducing sensitivity to noise and irrelevant information and thus enhancing the stability and robustness of the model. Additionally, it enhances the model’s nonlinear representation, further improving its generalization ability and performance.

3.1. Graph Convolutional Networks Module

To enable the training of the GNN, the initial step of the model involves constructing the text graph, which entails transforming the text into a graph representation. Assuming the current text comprises $n$ words, we define the text graph as $G = \{E, N\}$, where $E = \{e_{ij} \mid i \in [1, n],\ j \in [1, n]\}$ is the set of edges in the text graph, and $N = \{v_1, \dots, v_n\}$ is the set of nodes in the text graph. The set of node vectors is denoted as $X = \{x_1, \dots, x_i, \dots, x_n\} \in \mathbb{R}^{n \times d}$, where $d$ represents the embedding dimension of words and $x_i$ represents the vector representation of $v_i$. $A \in \mathbb{R}^{n \times n}$ represents the adjacency matrix, and $A_{ij}$ represents the weight value between nodes $v_i$ and $v_j$, which is calculated as in Equation (1); the word–word weights are calculated using PMI values as described in Equation (2).
$$A_{ij} = \begin{cases} \mathrm{PMI}(v_i, v_j), & i, j \text{ are words and } \mathrm{PMI}(v_i, v_j) > 0 \\ \mathrm{TFIDF}_{ij}, & i \text{ is a document},\ j \text{ is a word} \\ 1, & i = j \\ 0, & \text{otherwise} \end{cases} \quad (1)$$
$$\mathrm{PMI}(v_i, v_j) = \log \frac{p(v_i, v_j)}{p(v_i)\,p(v_j)} \quad (2)$$
Each word or document is used as input with a one-hot coded vector, and the schematic diagram for the construction of the text graph is shown in Figure 3.
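For concreteness, the following is a minimal sketch of how the adjacency weights in Equations (1) and (2) could be assembled with NumPy and scikit-learn. It is not the authors' released implementation; the sliding-window size of 20, the use of TfidfVectorizer's analyzer for tokenization, and the node ordering (documents first, then words) are illustrative assumptions.

```python
# Minimal sketch of text-graph construction following Equations (1) and (2).
# Assumptions: `docs` is a list of raw document strings, a sliding window of 20
# tokens is used for the PMI statistics, and document nodes precede word nodes.
import math
from collections import Counter

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer


def build_adjacency(docs, window=20):
    tfidf = TfidfVectorizer()
    doc_word = tfidf.fit_transform(docs)            # (n_doc, n_word) TF-IDF weights
    analyzer = tfidf.build_analyzer()               # same tokenization as the vectorizer
    vocab = {w: i for i, w in enumerate(tfidf.get_feature_names_out())}
    n_doc, n_word = doc_word.shape
    n = n_doc + n_word

    # Word and word-pair counts over sliding windows, used for PMI.
    word_count, pair_count, n_windows = Counter(), Counter(), 0
    for doc in docs:
        tokens = analyzer(doc)
        for start in range(max(1, len(tokens) - window + 1)):
            win = set(tokens[start:start + window])
            n_windows += 1
            word_count.update(win)
            pair_count.update((a, b) for a in win for b in win if a < b)

    A = np.eye(n)                                   # A_ii = 1 (self-loops)
    A[:n_doc, n_doc:] = doc_word.toarray()          # document-word edges: TF-IDF
    A[n_doc:, :n_doc] = A[:n_doc, n_doc:].T
    for (a, b), c_ab in pair_count.items():
        pmi = math.log(c_ab * n_windows / (word_count[a] * word_count[b]))
        if pmi > 0:                                 # keep only positive PMI, Equation (1)
            i, j = n_doc + vocab[a], n_doc + vocab[b]
            A[i, j] = A[j, i] = pmi
    return A
```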
After building the text graph, it is fed into a simple two-layer GCN. As affirmed by Kipf and Welling in their 2017 article “Semi-supervised Classification with Graph Convolutional Networks”, a two-layer GCN performs optimally. The second-layer embeddings of nodes (representing words and documents) are sized equivalently to the label set and are input into a softmax classifier:
$$Z = \mathrm{softmax}\!\left(\tilde{A}\,\mathrm{ReLU}\!\left(\tilde{A} X W_0\right) W_1\right) \quad (3)$$
where $\tilde{A} = D^{-1/2} A D^{-1/2}$ is the normalized symmetric adjacency matrix, and $\mathrm{softmax}(x_i) = \frac{1}{Z}\exp(x_i)$ with $Z = \sum_i \exp(x_i)$.
The single-layer convolution formula for the GCN is shown in Equation (4):
$$L = \sigma\!\left(D^{-1/2} A D^{-1/2} X W\right) = \sigma\!\left(\tilde{A} X W\right) \quad (4)$$
where $\tilde{A} = D^{-1/2} A D^{-1/2} \in \mathbb{R}^{V \times V}$ is the normalized form of $A$. Here, $A$ is the adjacency matrix with self-loops (its diagonal elements are all 1), which reflects the interconnections of the nodes in the graph and can be decomposed into the original adjacency matrix plus $I_N$, where $I_N$ is the identity matrix; $\sigma$ is the activation function; $W \in \mathbb{R}^{m \times k}$ is the parameter weight matrix; and $X \in \mathbb{R}^{n \times m}$ is the matrix of input node features, containing $n$ nodes with $m$-dimensional feature vectors, where each row $x_v \in \mathbb{R}^m$ is the feature vector of node $v$. $L \in \mathbb{R}^{n \times k}$ is the resulting $k$-dimensional node feature matrix.
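As an illustration of Equations (3) and (4), a two-layer GCN forward pass could be written as the following PyTorch sketch; the layer dimensions and the dense-matrix normalization helper are assumptions, not the paper's released code.

```python
# Sketch of the two-layer GCN in Equation (3), built from the single-layer
# rule of Equation (4); dense tensors are assumed for simplicity.
import torch
import torch.nn as nn
import torch.nn.functional as F


def normalize_adjacency(a):
    # Ã = D^{-1/2} A D^{-1/2}; the self-loops are already on the diagonal of A.
    d_inv_sqrt = a.sum(dim=1).clamp(min=1e-12).pow(-0.5)
    return d_inv_sqrt.unsqueeze(1) * a * d_inv_sqrt.unsqueeze(0)


class TwoLayerGCN(nn.Module):
    def __init__(self, in_dim, hidden_dim, num_classes):
        super().__init__()
        self.w0 = nn.Linear(in_dim, hidden_dim, bias=False)       # W0
        self.w1 = nn.Linear(hidden_dim, num_classes, bias=False)  # W1

    def forward(self, a_norm, x):
        h = F.relu(a_norm @ self.w0(x))                # first graph convolution
        return F.softmax(a_norm @ self.w1(h), dim=-1)  # second layer + softmax
```

Here, a_norm would be the normalized adjacency built from the text graph of Section 3.1 and x the node feature matrix.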

3.2. BERT Model

BERT is a pre-trained language model that can be fine-tuned for different natural language processing tasks to better understand the context and semantic relationships of language. In this paper, we use the BERT model to obtain the character, word, and context information of the input text and generate document embeddings, which are used as the input representations of the document nodes. Specifically, this process uses BERT to initialize the document nodes, whose embeddings are denoted by $X_{doc} \in \mathbb{R}^{n_{doc} \times d}$, where $d$ is the dimension of the feature vector. All word nodes $X_{word}$ are initialized to 0, giving the text input shown in Equation (5):
$$X = \begin{pmatrix} X_{doc} \\ 0 \end{pmatrix} \in \mathbb{R}^{(n_{doc} + n_{word}) \times d} \quad (5)$$
Next, we input X into the GCN model, which propagates the information in the training and test examples. Specifically, the output characteristic matrix L(i) of the ith GCN layer is computed as follows:
$$L^{(i)} = \rho\!\left(\tilde{A}\, L^{(i-1)} W^{(i)}\right) \quad (6)$$
where $\rho$ is the activation function, $\tilde{A}$ is the normalized adjacency matrix, and $W^{(i)} \in \mathbb{R}^{d_{i-1} \times d_i}$ is the weight matrix of the $i$-th layer. $L^{(0)} = X$ is the input feature matrix of the model.
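The initialization in Equation (5) and the propagation in Equation (6) might be sketched as follows, assuming the Hugging Face transformers library and the bert-base-uncased checkpoint; encoding all documents in one batch and using the [CLS] vector as the document embedding are illustrative simplifications rather than the authors' exact pipeline.

```python
# Sketch of document-node initialization (Equation (5)) and one GCN
# propagation step (Equation (6)); checkpoint and batching are assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")


@torch.no_grad()
def init_node_features(docs, n_word, max_len=128):
    enc = tokenizer(docs, padding=True, truncation=True,
                    max_length=max_len, return_tensors="pt")
    cls = bert(**enc).last_hidden_state[:, 0]     # (n_doc, d) document embeddings
    zeros = torch.zeros(n_word, cls.size(-1))     # word nodes initialized to 0
    return torch.cat([cls, zeros], dim=0)         # X of Equation (5)


def gcn_layer(a_norm, l_prev, weight):
    # One propagation step L(i) = ρ(Ã L(i-1) W(i)), with ReLU as ρ.
    return torch.relu(a_norm @ l_prev @ weight)
```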

3.3. Linear Fusion

The linear layer is introduced here for two main purposes: (1) Feature transformation and dimensionality adjustment: The linear layer performs a linear transformation on the output from the BERT or GCN part to map the input features to the appropriate output space. This step ensures that features from different sources can be efficiently fused in the same space. (2) Nonlinear transformations: Although the linear layer itself is linear, in practice, it is usually used in combination with nonlinear activation functions (e.g., ReLU, sigmoid, etc.). This combination introduces a nonlinear transformation that enhances the expressive and fitting ability of the model, which, in turn, improves the performance of the model. The process of linear fusion is shown in Figure 4.
The operation of introducing a linear layer in the process of fusing the BERT output with the conventional features can be expressed as follows:
$$h_{fusion} = \mathrm{ReLU}\!\left(W_{BERT} \cdot h_{BERT} + W_{traditional} \cdot h_{traditional} + b\right)$$
where $h_{fusion}$ denotes the fused feature after linear-layer fusion, ReLU denotes the activation function, $h_{traditional}$ is the traditional feature, $h_{BERT}$ is the BERT output, $W_{BERT}$ and $W_{traditional}$ denote the weight matrices of the linear layer, and $b$ denotes its bias term. The above formulation describes the process of linearly fusing the BERT output and the traditional features; by learning the weight matrices $W_{BERT}$ and $W_{traditional}$, the model can automatically determine how to combine the two features in a weighted manner to generate the fused features.
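A minimal sketch of this linear fusion is given below. Writing the two weight matrices W_BERT and W_traditional as a single linear layer over the concatenated inputs is mathematically equivalent to the formula above; the dimension names are assumptions.

```python
# Sketch of the linear fusion above: one linear layer over the concatenated
# BERT output and traditional features equals W_BERT·h_BERT +
# W_traditional·h_traditional + b, followed by ReLU.
import torch
import torch.nn as nn


class LinearFusion(nn.Module):
    def __init__(self, bert_dim, trad_dim, out_dim):
        super().__init__()
        # The weight of this layer splits column-wise into [W_BERT | W_traditional].
        self.fuse = nn.Linear(bert_dim + trad_dim, out_dim)

    def forward(self, h_bert, h_traditional):
        return torch.relu(self.fuse(torch.cat([h_bert, h_traditional], dim=-1)))
```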

3.4. Multilayer Integration

The BGF model employs multilayer fusion in addition to using single-layer linear fusion. The process of multilayer fusion is shown in Figure 5.
In the process of multilayer fusion, the first layer of fusion combines the output of BERT and the output of the GCN in the feature dimension to obtain the fused features. Specifically, the output vectors of BERT and the GCN are element-wise summed, and then the fused features are projected to the shared feature space using a linear layer. Finally, a nonlinear activation function (ReLU) is applied to transform the mapped features nonlinearly. The formula for the first layer fusion is as follows:
$$\text{fused\_features}_1 = \mathrm{ReLU}\!\left(W_1 \cdot \text{fused\_features}_0 + b_1\right)$$
Here, $\text{fused\_features}_0$ denotes the combined outputs of the BERT model and the GCN model, and $W_1$ and $b_1$ are the parameters of Linear_fusion_1, the first linear fusion layer, which maps this combination into the shared feature space. The result then undergoes a nonlinear transformation via the ReLU activation function, enhancing the model’s nonlinear capability.
In the second stage of multilayer fusion, the features from the first fusion undergo further refinement. Specifically, these features serve as inputs and are once again mapped into a shared feature space through a linear layer. A nonlinear transformation is then applied using the ReLU activation function. The formula for the second layer fusion is as follows:
$$\text{fused\_features}_2 = \mathrm{ReLU}\!\left(\text{Linear\_fusion}_2\!\left(\text{fused\_features}_1\right)\right)$$
where Linear_fusion_2 denotes the second linear fusion layer, which maps the first fused features into a higher-dimensional feature space. Subsequently, the result undergoes another nonlinear transformation via the ReLU activation function, further enhancing the model’s nonlinear capability and feature representation complexity.
The operation of multilayer fusion is used to transform and adjust the outputs of different levels appropriately to integrate information from different sources and generate the final representation. This multiple fusion approach helps the model maintain flexibility and expressiveness when dealing with multiple data types and complex relationships. It gradually enhances the representation of textual features and increases the nonlinear representation and fitting ability of the model, thereby improving the performance and robustness of the model in complex tasks.

3.5. Interactive Output Layer

The interactive output layer is realized by two linear layers which are used to map the fused features to the category space for classification.
Specifically, the linear layer in the linear fusion stage maps the fused features to a hidden layer, which is followed by a nonlinear activation function (ReLU). The output of this hidden layer is passed through the first linear layer and the ReLU activation function to obtain the first fused features, which then go through an operation identical to the first linear layer (the second fusion) to obtain the final fused features.
First fusion: The output features of the BERT and the output features of the GCN are spliced according to specific dimensions and then mapped through a linear layer and passed through a nonlinear activation function (ReLU). The formula is as follows:
$$\text{fused\_features}_1 = \mathrm{ReLU}\!\left(\mathrm{Linear}\!\left(H_{BERT} \oplus H_{GCN}\right)\right)$$
Second fusion: The results of the first fusion are again mapped through a linear layer and again passed through a nonlinear activation function (ReLU) to obtain the final fused features. The formula is as follows:
$$\text{fused\_features}_2 = \mathrm{ReLU}\!\left(\mathrm{Linear}\!\left(\text{fused\_features}_1\right)\right)$$
where $H_{BERT}$ is the output feature of BERT, $H_{GCN}$ is the output feature of the GCN, $\oplus$ denotes the splicing (concatenation) operation, Linear denotes the linear layer, and ReLU denotes the ReLU activation function.
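The two fusion stages of Sections 3.4 and 3.5 and the final mapping to the category space might be sketched as a single PyTorch module as follows; the shared hidden size and keeping both fusion layers at the same width are assumptions rather than the authors' configuration.

```python
# Sketch of the two-stage fusion and the mapping to the category space.
import torch
import torch.nn as nn


class InteractiveFusionHead(nn.Module):
    def __init__(self, bert_dim, gcn_dim, hidden_dim, num_classes):
        super().__init__()
        self.linear_fusion_1 = nn.Linear(bert_dim + gcn_dim, hidden_dim)
        self.linear_fusion_2 = nn.Linear(hidden_dim, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, h_bert, h_gcn):
        # First fusion: splice along the feature dimension, then Linear + ReLU.
        fused_1 = torch.relu(self.linear_fusion_1(torch.cat([h_bert, h_gcn], dim=-1)))
        fused_2 = torch.relu(self.linear_fusion_2(fused_1))  # second fusion
        return self.classifier(fused_2)                      # scores over categories
```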

3.6. Loss Function

The model combines the outputs of the BGF and GCN modules to obtain the final predicted probability ŷ with the following formula:
$$\hat{y} = y_{BGF} + y_{GCN}$$
The model is trained using the cross-entropy loss function, which is calculated as follows:
$$\mathrm{Loss} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{C} y_{ij}\,\log\!\left(\hat{y}_{ij}\right)$$
where $\hat{y}$ is the prediction of the model, $y$ is the true label, $N$ is the number of samples, and $C$ is the number of categories. $y_{ij}$ is the value of the true label for category $j$ of sample $i$ (1 if the sample belongs to category $j$, 0 otherwise), and $\hat{y}_{ij}$ is the probability that the model predicts category $j$ for sample $i$.
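A minimal sketch of combining the two prediction branches and computing the cross-entropy loss is shown below; the renormalization of the summed probabilities before taking the logarithm is an added numerical assumption, not stated in the paper.

```python
# Sketch of combining the two branches and computing the cross-entropy loss.
# `logits_bgf` and `logits_gcn` are assumed per-class scores from the fusion
# head and the GCN branch, respectively.
import torch
import torch.nn.functional as F


def combined_loss(logits_bgf, logits_gcn, labels):
    y_hat = F.softmax(logits_bgf, dim=-1) + F.softmax(logits_gcn, dim=-1)  # ŷ = y_BGF + y_GCN
    y_hat = y_hat / y_hat.sum(dim=-1, keepdim=True)      # renormalize (added assumption)
    return F.nll_loss(torch.log(y_hat + 1e-12), labels)  # cross-entropy over true classes
```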

4. Results

4.1. Experimental Data

To test the advantages and disadvantages of the method proposed in this paper, the R8, R52 [28], and MR [29] datasets were selected for comparative experiments, and all three datasets were used to train the models in the same environment. In this paper, experiments are conducted on these three classical text categorization datasets to verify the effectiveness of the BGF model.
The statistics related to the datasets are shown in Table 1.
  • R8 dataset: The R8 dataset is a subset of the Reuters-21578 dataset, widely used for the performance evaluation of text categorization algorithms;
    Source: Reuters-21578;
    Number of categories: Eight categories;
    Data composition: Consists of news articles, each labeled as belonging to one or more predefined categories;
    Purpose: Mainly used to test the performance of text categorization models in multi-category categorization tasks;
    Characteristics: There may be some similarity between the categories, but the number of categories is small, which is suitable for quickly verifying the model performance.
  • R52 dataset: The R52 dataset is a subset of the Reuters-21578 dataset, which is widely used for the performance evaluation of text categorization algorithms;
    Source: Reuters-21578;
    Number of categories: 52 categories;
    Data composition: Consists of news articles, each labeled as belonging to one or more predefined categories;
    Purpose: Used to evaluate the performance of a text categorization model in a finer-grained, multi-category categorization task;
    Characteristics: Larger number of categories, potentially greater discrimination between categories, more challenging.
  • MR dataset: The MR (Movie Reviews) dataset is used for binary sentiment categorization tasks;
    Source: Movie Reviews Collection;
    Number of categories: Two categories (positive and negative);
    Data composition: Each document contains a positive or negative sentiment label, each document in the dataset has a unique identifier and a relevance score indicating how relevant the document is to the given query term, and the entire dataset contains 5331 positive and 5331 negative reviews;
    Purpose: It is mainly used to test the performance of the sentiment analysis model in a binary classification task;
    Characteristics: The dataset is balanced with an equal number of positive and negative comments, which is suitable for testing the performance of sentiment classification models.

4.2. Experimental Setting

To ensure the effectiveness and superiority of the BGF model in text categorization tasks, we rigorously adhere to a specific setup and parameter configuration in our experiments. The following provides a detailed description of the experimental setup, aimed at ensuring the reliability and reproducibility of our results.

4.2.1. Experimental Environment

The experiments in this paper are based on the PyTorch 1.8.1 framework, programmed using Python 3.6.13 in the Windows 11 runtime environment. The model is trained on an NVIDIA GeForce RTX 4080 16 GB GPU, manufactured by NVIDIA Corporation, headquartered in Santa Clara, CA, USA. The experimental environment is detailed in Table 2.

4.2.2. Parameter Setting

The parameter settings for model training are shown in Table 3.

4.2.3. Data Preprocessing

The experiments follow the TextGCN method for data preprocessing, which consists of the following specific steps (a minimal code sketch follows the list):
  • Segmentation and deduplication: The text undergoes segmentation, and common stopwords are removed;
  • Construction of vocabulary list: A vocabulary list is created based on the corpus, mapping each word to a unique index;
  • Text graph construction: A heterogeneous graph is constructed for the entire corpus, where nodes represent words and documents, and edges signify co-occurrence relationships between words, and between words and documents;
  • Feature representation: Text is encoded using BERT to generate semantic embedding representations of text nodes. These embeddings are then integrated with global structural features captured by the GCN.
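The following is a minimal sketch of the segmentation, stopword-removal, and vocabulary-construction steps above; the stopword set shown is only an illustrative subset, and the whitespace tokenizer is an assumption.

```python
# Sketch of segmentation, stopword removal, and vocabulary construction.
# The stopword set is an illustrative subset, not the list used in the paper.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in"}


def preprocess(docs):
    tokenized = [[w for w in doc.lower().split() if w not in STOPWORDS]
                 for doc in docs]
    vocab = {}                                  # word -> unique index
    for tokens in tokenized:
        for w in tokens:
            vocab.setdefault(w, len(vocab))
    return tokenized, vocab
```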

4.2.4. Training Procedure

The model training process involves the following steps:
  • Dataset division: Randomly select 10% of the training set as the validation set and use the remaining 90% for training;
  • Training strategy: Train the model on the three datasets for up to 50 epochs. If the validation loss does not decrease for five consecutive epochs, training is halted;
  • Model structure: The BGF model comprises 2 layers of GCN and 12 layers of BERT. The word embedding dimension is set to 200.
With these methods and settings, we ensure the reliability and reproducibility of the accuracy rate as an evaluation metric, verifying the effectiveness and superiority of the BGF model in text categorization tasks. These measures guarantee the credibility of our research conclusions through dependable and replicable experimental results.
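For reference, the training strategy described above (up to 50 epochs, stopping after five epochs without a decrease in validation loss) can be sketched as follows; the train_step and eval_step callables are hypothetical placeholders supplied by the caller, not functions from the authors' code.

```python
# Sketch of the early-stopping training loop described above.
def train_with_early_stopping(model, train_step, eval_step,
                              max_epochs=50, patience=5):
    # train_step() runs one training epoch; eval_step() returns the validation
    # loss. Both are assumed to be provided by the caller.
    best_val, stale = float("inf"), 0
    for epoch in range(max_epochs):
        train_step()
        val_loss = eval_step()
        if val_loss < best_val:
            best_val, stale = val_loss, 0
        else:
            stale += 1
            if stale >= patience:
                break                  # early stopping
    return model
```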

4.3. Evaluation Indicators

In this paper, we use accuracy (Accuracy) as the main evaluation metric to verify the merits of the newly proposed BGF model against the benchmark model. Accuracy is the proportion of correctly predicted classifications by the model in the whole dataset, which is a simple and intuitive metric that reflects the overall performance of the model on the test set. By training and testing the model on different datasets, we can verify the effectiveness of the BGF model in text categorization tasks.
Accuracy is calculated as the number of samples that the model predicts to be categorized correctly divided by the number of all samples in the corpus. In text categorization tasks, high accuracy of the model is usually pursued by improving the model so that it can accurately assign different texts to the correct categories. The results of the text categorization model trained on the text corpus datasets and the post-prediction statistics on the datasets are shown in Table 4.
The formula for the accuracy of the dataset samples in category C after training the classification model is shown below.
$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}$$
The specific computational steps are as follows:
  • Data preparation: Initially, separate the training set and test set from the dataset. The training set is utilized to train the model, while the test set evaluates the model’s performance;
  • Model training: Train the model using the training set to enable it to learn the features necessary for text classification;
  • Model prediction: Apply the trained model to predict categories for the test set, generating predicted categories for each sample;
  • Result analysis: Calculate the number of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN) from the prediction results;
  • Calculate accuracy: Based on the statistical results, compute the accuracy of the model on the test set using the following formula.
In a multi-category classification task, accuracy is calculated similarly to binary classification. For instance, if there are three categories (A, B, and C), and TP, TN, FP, and FN are determined for each category, the overall accuracy for multi-category categorization is the average of the accuracies of all the categories as shown in the following equation:
$$\text{Multicategory Accuracy} = \frac{1}{N}\sum_{i=1}^{N} \frac{TP_i + TN_i}{TP_i + TN_i + FP_i + FN_i}$$
where N is the total number of categories.
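A minimal sketch of the two accuracy computations above is given below; the per-class count tuples are assumed to come from the result-analysis step.

```python
# Per-class accuracy and its average over categories, following the formulas above.
def class_accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def multicategory_accuracy(per_class_counts):
    # per_class_counts: list of (TP, TN, FP, FN) tuples, one per category.
    return sum(class_accuracy(*c) for c in per_class_counts) / len(per_class_counts)
```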

4.4. Experimental Comparison

In this paper, TextGCN and BertGCN are chosen as the benchmark models for the comparison experiments. To ensure that the experiments are conducted under the same parameter settings, random seeds are fixed when comparing the model proposed in this paper with the BertGCN model, ensuring the reliability of the experiments; the results are shown in Table 5.
Below is a comparison plot of the BertGCN and BGF models for the three datasets.
The comparison of the BGF model with the BertGCN model on the R8 dataset is shown in Figure 6.
The comparison of the BGF model with the BertGCN model on the R52 dataset is shown in Figure 7.
The comparison of the BGF model with the BertGCN model on the MR dataset is shown in Figure 8.

4.5. Experimental Analysis

4.5.1. Statistical Significance Analysis

To assess the statistical significance of the experimental results, we conducted t-tests on the accuracy of the BertGCN model and the BGF model on each dataset. T-tests are a statistical method used to compare the means of two samples to determine if they are significantly different. We employed an independent-samples t-test to compare the accuracy of the BGF model with that of the BertGCN model. T-values were calculated using the following formula:
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
where $X_1$ and $X_2$ denote two independent samples, $\bar{X}_1$ and $\bar{X}_2$ are their means, $s_1$ and $s_2$ are their standard deviations, and $n_1$ and $n_2$ denote the numbers of samples. The t-value is used to find the p-value in the t-distribution table to determine the significance of the results.
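For reference, such an independent-samples t-test can be computed with SciPy as sketched below, assuming per-run accuracies from repeated experiments are available; the listed run values are hypothetical placeholders, and equal_var=False matches the unpooled-variance formula above.

```python
# Sketch of an independent-samples (Welch) t-test over per-run accuracies.
from scipy import stats

bgf_runs = [98.41, 98.47, 98.45, 98.50, 98.43]       # hypothetical per-run accuracies (%)
bertgcn_runs = [98.15, 98.20, 98.18, 98.12, 98.22]   # hypothetical per-run accuracies (%)

t_value, p_value = stats.ttest_ind(bgf_runs, bertgcn_runs, equal_var=False)
print(f"t = {t_value:.3f}, p = {p_value:.4f}")
```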
With the t-test, we obtain the following p-values:
  • On the R8 dataset, the BGF model improves accuracy by 0.28% over BertGCN, yielding a p-value of 0.03, indicating statistical significance.
  • On the R52 dataset, the BGF model improves accuracy by 0.9% over BertGCN, yielding a p-value of 0.045, indicating statistical significance.
  • On the MR dataset, the BGF model improves accuracy by 0.03% over BertGCN, with a corresponding p-value of 0.87, indicating an improvement but not reaching statistical significance.
These results demonstrate that the improvement of the BGF model is statistically significant on the R8 and R52 datasets. However, on the MR dataset, the improvement does not reach statistical significance.

4.5.2. Confidence Interval Analysis

Confidence intervals are used to estimate the range of a parameter with a high probability that the true parameter value falls within that range. A 95% confidence interval indicates a 95% probability that the true mean value lies within that interval. For a sample mean and standard error (SE), the confidence interval (CI) is calculated as follows:
$$CI = \bar{X} \pm z \cdot SE$$
where $z$ is the z-value corresponding to the confidence level; for the 95% confidence level, $z \approx 1.96$.
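A minimal sketch of this confidence-interval computation is shown below; the per-run accuracies are hypothetical placeholders.

```python
# 95% confidence interval for the mean accuracy, following CI = X̄ ± z·SE.
import math
import statistics

runs = [98.41, 98.47, 98.45, 98.50, 98.43]           # hypothetical accuracies (%)
mean = statistics.mean(runs)
se = statistics.stdev(runs) / math.sqrt(len(runs))   # standard error of the mean
ci = (mean - 1.96 * se, mean + 1.96 * se)            # z ≈ 1.96 for 95% confidence
print(f"95% CI: [{ci[0]:.2f}, {ci[1]:.2f}]")
```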
  • On the R8 dataset, the 95% confidence interval for the accuracy of the BGF model is [98.10, 98.80], and, for the BertGCN model, it is [97.85, 98.49].
  • On the R52 dataset, the 95% confidence intervals for the accuracy of the BGF model are [93.45, 94.09], while, for the BertGCN model, they are [92.50, 93.24].
  • On the MR dataset, the 95% confidence intervals for accuracy are [85.89, 86.65] for the BGF model and [85.93, 86.55] for the BertGCN model.
These confidence interval results illustrate the reliability and stability of the model performance. On the R8 and R52 datasets, the confidence intervals of the BGF model lie above those of the BertGCN model, indicating better performance on these datasets. In contrast, on the MR dataset, the confidence intervals almost overlap, suggesting little difference in performance between the two models.

4.5.3. Error Analysis

We further analyzed the misclassified samples to identify the differences between the BGF model and other models in error classification. Error type: We analyzed the models’ performance in handling different types of texts, particularly those containing complex semantic relations.
On the R8 dataset, the BGF model demonstrates higher accuracy when dealing with texts that have complex semantic relations, whereas the TextGCN model often misclassifies such texts. This is because the BGF model effectively captures both local and global semantic information through multilayer fusion, leading to better performance in handling complex semantic relations.
By combining the output features of the BERT model and the GCN model, the BGF model can capture both local semantic information (via BERT) and global semantic information (via GCN). This comprehensive understanding of textual content results in improved classification accuracy.
Overall, the t-test results indicate that the performance improvement of the BGF model is statistically significant on the R8 and R52 datasets but not significant on the MR dataset. The confidence interval results demonstrate the stability of the model’s performance. Through misclassification analysis, we found that the BGF model excels in processing texts with complex semantic relations, further validating its improvement.
The experimental comparison and analysis show that the BGF model effectively combines the output features of the BERT and GCN models through multilayer fusion. This enables the model to capture both local and global semantic information, leading to a more comprehensive understanding of text content and improved classification accuracy. In this paper, the output features of the BERT model and the GCN model are fused to enhance the model’s ability to express the semantic information of the text and to improve performance on the text classification task.

From the experimental results, compared with the TextGCN and BertGCN models, the accuracy of the BGF model on the R8 dataset is improved by 1.38% and 0.28%, respectively; on the R52 dataset, it is improved by 0.21% and 0.90%, respectively; and on the MR dataset, the BGF model improves by 9.53% over TextGCN but only 0.03% over BertGCN. Comparing the BertGCN and BGF models on these datasets further shows that, on R8 and R52, the loss function of the BGF model is smaller and more stable, indicating that the BGF model learns more effective information.

The increase in accuracy and the decrease in the cross-entropy loss are key metrics for evaluating the performance of the classification model. The increase in accuracy indicates that the BGF model’s ability to correctly categorize samples in the text categorization task is enhanced, reflecting improved decision-making accuracy when processing data. The reduction in the cross-entropy loss indicates that the model narrows the difference between its output probability distribution and the true labels, suggesting that the BGF model adjusts its parameters more efficiently during learning and therefore abstracts and understands the data features more accurately. The comparison experiments show that the BGF model improves its accuracy and prediction precision in the classification task by learning a more optimized feature representation during training.

5. Conclusions

The existing BertGCN text classification model combines BERT and a GCN to capture both local semantic features and global structural features of the text. However, there are still some limitations in fusing these two types of features. To address this issue, we propose a Graph-Neural-Network-based feature fusion model (BGF), which introduces a linear layer to fuse the output of BERT with traditional features using a multilayer fusion approach. By enhancing the expression of text semantic features, the BGF model effectively improves text classification performance.

On the R8, R52, and MR datasets, the BGF model outperformed both TextGCN and BertGCN. Although the performance improvement of the BGF model over BertGCN is relatively small, such incremental gains are often significant in tasks like text classification. The BGF model enhances accuracy and generalization by leveraging both local semantic information and global structural features, making it suitable for text datasets of various scales and domains and demonstrating good generalizability and robustness. The performance improvement is mainly attributed to the powerful semantic understanding capabilities of the BERT model and the effective capture of global structural features by the GCN model, along with the multilayer feature fusion mechanism, which comprehensively combines local and global information. Therefore, even with small performance improvements, the BGF model’s innovation and practical effectiveness are significant, providing a strong direction for progress in text data processing.

The BGF text classification method has a wide range of applications, from academic research to practical implementations, and shows broad prospects in natural language processing tasks. Despite the excellent performance of the BGF model, it showed considerable fluctuations in accuracy and loss function on the MR dataset. This necessitates further validation of its applicability, especially for sentiment classification tasks where text content is more colloquial and metaphorical, and requires model adjustments and optimizations to enhance the capture and understanding of sentiment features. Future research directions include reducing the computational and storage costs of the BGF model to improve its applicability in resource-constrained environments, exploring optimization strategies for multilingual datasets, especially Chinese datasets, and further optimizing the feature fusion mechanism to enhance performance in various natural language processing tasks such as sentiment classification.

Author Contributions

Conceptualization, D.W. and X.C.; methodology, D.W. and X.C.; software, D.W. and X.C.; validation, D.W. and X.C.; formal analysis, D.W. and X.C.; investigation, D.W. and X.C.; resources, D.W. and X.C.; data curation, D.W. and X.C.; Writing—original draft, D.W. and X.C.; Writing—review and editing, D.W. and X.C.; visualization, X.C.; supervision, D.W. and X.C.; project administration, D.W. and X.C.; funding acquisition, D.W. and X.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original data presented in the study are openly available at the following website: R8 datasets at http://github.com/cxl0220/BGF/blob/main/R8.txt, accessed on 21 September 2023; R52 datasets at http://github.com/cxl0220/BGF/blob/main/R52.txt, accessed on 21 September 2023; and MR datasets at http://www.cs.cornell.edu/people/pabo/movie-review-data/, accessed on 21 September 2023.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zhang, J.; Yan, K.; Ma, X. Analysis of complex spam filtering algorithm based on neural network. Comput. Appl. 2022, 42, 770. [Google Scholar]
  2. Li, Z.; Fan, Y.; Jiang, B.; Lei, T.; Liu, W. A Survey on Sentiment Analysis and Opinion Mining for Social Multimedia. Multimed. Tools Appl. 2019, 78, 6939–6967. [Google Scholar] [CrossRef]
  3. Fan, H.; Li, S.; Aihaiti, Z. The application and impact of machine learning algorithms in China’s intelligence research—A perspective based on CSSCI journal articles. Libr. Intell. Knowl. 2022, 39, 96–108. [Google Scholar]
  4. Yang, X.; Guo, M.; Hou, H.; Yuan, J.; Li, X.; Li, K.; Wang, W.; He, S.; Luo, Z. Improved BiLSTM-CNN+ Attention sentiment classification algorithm incorporating sentiment dictionary. Sci. Technol. Eng. 2022, 22, 8761–8770. [Google Scholar]
  5. Peng, F.; Schuurmans, D. Combining naive Bayes and n-gram language models for text classification. In Advances in Information Retrieval, Proceedings of the European Conference on IR Research, ECIR 2003, Pisa, Italy, 14–16 April 2003; Springer: Berlin, Germany, 2003; pp. 335–350. [Google Scholar]
  6. Joachims, T. Text categorization with support vector machines: Learning with manyrelevant features. In Machine Learning: ECML-98, Proceedings of the European Conference on Machine Learning, Chemnitz, Germany, 21–23 April 1998; Springer: Berlin, Germany, 1998; pp. 137–142. [Google Scholar]
  7. Quinlan, J.R. C4.5: Programs for Machine Learning; Morgan Kaufmann: San Francisco, CA, USA, 1993. [Google Scholar]
  8. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the NAACL-HLT 2019, Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186. [Google Scholar]
  9. Tenney, I.; Das, D.; Pavlick, E. BERT rediscovers the classical NLP pipeline. arXiv 2019, arXiv:1905.05950. [Google Scholar]
  10. Huang, L.; Ma, D.; Li, S.; Zhang, X.; Wang, H. Text Level Graph Neural Network for Text Classification. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 3444–3450. [Google Scholar] [CrossRef]
  11. Deng, C.; Zhong, G.; Wang, D. Text categorization based on attention gated graph neural network. Comput. Sci. 2022, 49, 326–334. [Google Scholar]
  12. Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. arXiv 2016, arXiv:1609.02907. [Google Scholar]
  13. Yao, L.; Mao, C.S.; Luo, Y. Graph Convolutional Networks for Text Classification. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 29–31 January 2019; pp. 7370–7377. [Google Scholar] [CrossRef]
  14. Lin, Y.; Meng, Y.; Sun, X.; Han, Q.; Kuang, K.; Li, J.; Wu, F. BertGCN: Transductive Text Classification by Combining GCN and BERT. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021; Association for Computational Linguistics (ACL): Stroudsburg, PA, USA, 2021; pp. 1456–1462. [Google Scholar] [CrossRef]
  15. Yang, P.; Sun, X.; Li, W.; Ma, S.; Wu, W.; Wang, H. SGM: Sequence Generation Model for Multi-label Classification. In Proceedings of the International Conference on Computational Linguistics, Santa Fe, NM, USA, 20–26 August 2018; pp. 3915–3926. [Google Scholar]
  16. Zhang, M.L.; Zhou, Z.H. Multilabel neural networks with applications to functional genomics and text categorization. IEEE Trans. Knowl. Data Eng. 2006, 18, 1338–1351. [Google Scholar] [CrossRef]
  17. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 2235–2249. [Google Scholar]
  18. Yin, W.; Kann, K.; Yu, M.; Schütze, H. Comparative study of CNN and RNN for natural language processing. arXiv 2017, arXiv:1702.01923. [Google Scholar]
  19. Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language models are unsupervised multitask learners. OpenAI Blog 2019, 1, 1–9. [Google Scholar]
  20. Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A robustly optimized BERT pre-training approach. arXiv 2019, arXiv:1907.11692. [Google Scholar]
  21. Lan, Z.; Chen, M.; Goodman, S.; Gimpel, K.; Sharma, P.; Soricut, R. ALBERT: A lite BERT for self-supervised learning of language representations. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019; pp. 1–8. [Google Scholar]
  22. Sanh, V.; Debut, L.; Chaumond, J.; Wolf, T. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv 2019, arXiv:1910.01108. [Google Scholar]
  23. Sun, Y.; Wang, S.; Li, Y.; Feng, S.; Tian, H.; Wu, H.; Wang, H. Ernie 2.0: A continual pre-training framework for language understanding. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 8968–8975. [Google Scholar]
  24. Liu, X.; Cheng, H.; He, P.; Chen, W.; Wang, Y.; Poon, H.; Gao, J. Adversarial training for large neural language models. arXiv 2020, arXiv:2004.08994. [Google Scholar]
  25. Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.R.; Le, Q.V. Xlnet: Generalized autoregressive pretraining for language understanding. Adv. Neural Inf. Process. Syst. 2019, 32, 3678–3693. [Google Scholar]
  26. Kim, Y. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Doha, Qatar, 25–29 October 2014; ACL Press: Doha, Qatar, 2014; pp. 1746–1751. [Google Scholar]
  27. Ramos, J. Using tf-idf to determine word relevance in document queries. In Proceedings of the First Instructional Conference on Machine Learning, Washington, DC, USA, December 2003; Volume 242, pp. 29–48. [Google Scholar]
  28. Jiang, H.; Zhang, R.; Guo, J.; Fan, Y.; Cheng, X. Comparative analysis of graph convolutional networks and self-attention mechanism on text categorization task. J. Chin. Inf. 2021, 35, 84–93. [Google Scholar]
  29. Pang, B.; Lee, L. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics (ACL’05), Ann Arbor, MI, USA, 25–30 June 2005; pp. 115–124. [Google Scholar]
Figure 1. The structure of the BERT model.
Figure 2. BGF model architecture. The model begins by constructing a heterogeneous text graph over the corpus, where document nodes are initialized with BERT embeddings. The edge weights between nodes are determined using term frequency–inverse document frequency (TF-IDF) and pointwise mutual information (PMI) [27], and the graph serves as input to the GCN. Through convolution operations, the GCN obtains a textual feature representation fGCN that conforms to the structure of the text graph, which is then fused with the feature representation fBert obtained from BERT pre-training to enrich the model’s comprehension of text semantics. Subsequently, the fused output serves as the final representation of the text nodes and is passed through linear layers and classifiers for label classification prediction, yielding the final probabilistic representations of the input text, yBGF and yGCN, respectively; these predictions from both parts are then combined.
Figure 3. Schematic diagram of text graph construction (using the R8 dataset as an example). Word nodes are represented individually, document nodes are represented by Di, and different colors indicate different document categories. An edge between a word node and a document node is constructed based on the word’s occurrence in the document, represented by a black solid line, with the word’s TF-IDF value in that document as its weight. An edge between two word nodes is constructed based on their co-occurrence across the entire corpus, represented by a black dashed line, with the weight computed using pointwise mutual information (PMI).
Figure 3. Schematic diagram of text map construction (using the R8 dataset as an example). Word nodes are represented individually, document nodes are represented by Di, and different colors indicate different document categories. The edge between a word node and a document node is constructed based on the word’s occurrence in a document, represented by a black solid line, and its weight is the word’s TF-IDF value in a document. The edge between words is constructed based on the word’s co-occurrence across the entire corpus, represented by a black dashed line in the graph, and the weight is computed using the point-to-point mutual information (PMI) to compute the weights between two-word nodes.
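For illustration, a simplified sketch of how the TF-IDF and PMI edge weights described above can be computed from a tokenized corpus (the window size, the TF normalization, and the positive-PMI filtering are assumed defaults commonly used for TextGCN-style graphs, not settings confirmed by this excerpt):

```python
import math
from collections import Counter

def build_edge_weights(docs, window_size=20):
    """Compute word-document (TF-IDF) and word-word (PMI) edge weights.

    docs: list of tokenized documents, e.g. [["oil", "price", ...], ...].
    """
    n_docs = len(docs)

    # TF-IDF weights for word-document edges.
    df = Counter()                      # number of documents containing each word
    for doc in docs:
        df.update(set(doc))
    tfidf = {}
    for i, doc in enumerate(docs):
        tf = Counter(doc)
        for word, count in tf.items():
            idf = math.log(n_docs / df[word])
            tfidf[(word, i)] = (count / len(doc)) * idf

    # PMI weights for word-word edges, from sliding-window co-occurrence.
    windows = []
    for doc in docs:
        if len(doc) <= window_size:
            windows.append(doc)
        else:
            windows.extend(doc[j:j + window_size]
                           for j in range(len(doc) - window_size + 1))
    n_win = len(windows)
    word_win = Counter()                # windows containing the word
    pair_win = Counter()                # windows containing the word pair
    for win in windows:
        uniq = sorted(set(win))
        word_win.update(uniq)
        for a in range(len(uniq)):
            for b in range(a + 1, len(uniq)):
                pair_win[(uniq[a], uniq[b])] += 1

    pmi = {}
    for (wa, wb), count in pair_win.items():
        score = math.log((count / n_win) /
                         ((word_win[wa] / n_win) * (word_win[wb] / n_win)))
        if score > 0:                   # keep only positive-PMI edges
            pmi[(wa, wb)] = score
    return tfidf, pmi
```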
Figure 4. Linear fusion process. The output of BERT is first reduced in dimension through a linear layer. The traditional features are then concatenated with the projected BERT output to obtain the fused features. Finally, the fused features are mapped to the category space through another linear layer for subsequent processing.
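For illustration, a minimal PyTorch sketch of the linear fusion step; the layer names and default dimensions are hypothetical, and only the order of operations (project the BERT output, concatenate with the traditional features, map to classes) follows the figure:

```python
import torch
import torch.nn as nn

class LinearFusion(nn.Module):
    """Sketch of the linear fusion step in Figure 4 (dimensions assumed)."""

    def __init__(self, bert_dim=768, feat_dim=200, proj_dim=200, n_classes=8):
        super().__init__()
        self.proj = nn.Linear(bert_dim, proj_dim)           # reduce BERT output dimension
        self.classifier = nn.Linear(proj_dim + feat_dim, n_classes)

    def forward(self, bert_out, trad_feat):
        projected = self.proj(bert_out)                     # projected BERT output
        fused = torch.cat([projected, trad_feat], dim=-1)   # concatenate with traditional features
        return self.classifier(fused)                       # map fused features to category space
```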
Figure 5. Multilayer fusion process. The output of BERT is concatenated with the output of the GCN along the feature dimension, and a first fusion is performed through a linear layer. The first-stage fused features are then fused again to obtain the final fused features. Finally, the fused features are mapped to the category space through a linear layer for classification.
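For illustration, a minimal PyTorch sketch of the multilayer fusion step. The exact form of the second fusion stage is not specified in the figure, so it is modeled here as a further linear transform of the first-stage fused features; all names and dimensions are assumptions:

```python
import torch
import torch.nn as nn

class MultilayerFusion(nn.Module):
    """Sketch of the multilayer fusion in Figure 5 (second stage assumed)."""

    def __init__(self, bert_dim=768, gcn_dim=200, hidden_dim=256, n_classes=8):
        super().__init__()
        self.fuse1 = nn.Linear(bert_dim + gcn_dim, hidden_dim)  # first fusion
        self.fuse2 = nn.Linear(hidden_dim, hidden_dim)          # second fusion (assumed form)
        self.classifier = nn.Linear(hidden_dim, n_classes)

    def forward(self, f_bert, f_gcn):
        x = torch.cat([f_bert, f_gcn], dim=-1)   # concatenate along the feature dimension
        x = torch.relu(self.fuse1(x))            # first fusion through a linear layer
        x = torch.relu(self.fuse2(x))            # fuse again to obtain the final features
        return self.classifier(x)                # map to the category space
```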
Figure 6. Comparison on the R8 dataset. (a) Accuracy of the BertGCN and BGF models, where the blue line indicates BertGCN and the red line indicates the BGF model. (b) Loss of the BertGCN and BGF models, where the blue line indicates BertGCN and the red line indicates the BGF model.
Figure 7. Comparison on the R52 dataset. (a) Accuracy of the BertGCN and BGF models, where the blue line indicates BertGCN and the red line indicates the BGF model. (b) Loss of the BertGCN and BGF models, where the blue line indicates BertGCN and the red line indicates the BGF model.
Figure 8. Comparison on the MR dataset. (a) Accuracy of the BertGCN and BGF models, where the blue line indicates BertGCN and the red line indicates the BGF model. (b) Loss of the BertGCN and BGF models, where the blue line indicates BertGCN and the red line indicates the BGF model.
Table 1. Statistical information related to datasets.
Dataset   Number of Documents   Number of Words   Number of Nodes   Number of Categories   Average Length
R8        7674                  7688              15,362            8                      65.72
R52       9100                  8892              17,992            52                     69.82
MR        10,662                18,764            29,426            2                      20.39
Table 2. Experimental environment information.
Experimental Environment     Configuration Information
Operating System             Windows 11
GPU                          NVIDIA GeForce RTX 4080
GPU Memory                   16 GB
RAM                          32 GB
Programming Language         Python 3.6.13
Deep Learning Framework      PyTorch 1.8.1
Table 3. Parameter settings.
Parameter                      Numeric Value
Initial Learning Rate          1 × 10⁻³
Learning Rate of BERT Module   1 × 10⁻⁵
Dropout Rate                   0.5
Number of GCN Layers           2
Number of BERT Layers          12
Word Embedding Dimension       200
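For reference, the settings in Table 3 could be collected into a configuration object such as the following (the variable names and dictionary layout are illustrative, not taken from the paper):

```python
# Hypothetical training configuration mirroring Table 3.
config = {
    "lr": 1e-3,          # initial learning rate (GCN and fusion layers)
    "bert_lr": 1e-5,     # learning rate of the BERT module
    "dropout": 0.5,
    "gcn_layers": 2,
    "bert_layers": 12,
    "embed_dim": 200,    # word embedding dimension
}
```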
Table 4. Confusion matrix of classification results.
                           Predicted Value
                           Positive P     Negative N
Real Value   Positive P    TP             FN
             Negative N    FP             TN
TP (True Positive) denotes the number of positive samples that are predicted correctly. TN (True Negative) denotes the number of negative samples that are predicted correctly. FP (False Positive) denotes the number of negative samples that are incorrectly predicted as positive. FN (False Negative) denotes the number of positive samples that are incorrectly predicted as negative.
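From these counts, the accuracy values reported in Table 5 follow the standard definition; a minimal helper is given below for completeness (this is the standard formula, not code from the paper):

```python
def accuracy(tp, tn, fp, fn):
    """Standard accuracy computed from the confusion-matrix counts in Table 4."""
    return (tp + tn) / (tp + tn + fp + fn)
```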
Table 5. Experimental comparisons.
Models     R8        R52       MR
TextGCN    97.07%    93.56%    76.74%
BertGCN    98.17%    92.87%    86.24%
BGF        98.45%    93.77%    86.27%
