Review of Image Classification Algorithms Based on Graph Convolutional Networks
1 School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, Henan 454000, PR China
Abstract
In recent years, graph convolutional networks (GCNs) have gained widespread attention and application in image
classification tasks. While traditional convolutional neural networks (CNNs) usually represent an image as a two-dimensional
grid of pixels, GCNs, the classical model of graph neural networks (GNNs), can effectively handle data with a graph
structure, such as social networks, recommender systems, and molecular structures. This paper summarizes the classical
convolutional neural network models, highlighting their innovative approaches. It then introduces problems that graph
convolutional networks face, such as over-smoothing, together with methods for solving them, and suggests possible future
research directions.
Keywords: Graph Convolutional Networks, Convolutional Neural Networks, Graph Neural Networks, over-smoothing.
Copyright © 2023 Tang et al., licensed to EAI. This is an open access article distributed under the terms of the Creative Commons
Attribution license, which permits unlimited use, distribution and reproduction in any medium so long as the original work is properly
cited.
doi: 10.4108/airo.2%y.3462
The last section of the paper concludes and shows the future trends of graph convolutional networks.

1.2. Application of Graph Convolutional Networks in Image Classification

VGG [19] replaces large convolutional kernels with stacks of smaller ones, which reduces the number of parameters and increases computational efficiency. The authors' experiments further confirm the conjecture that the deeper the network, the more accurate the prediction: the top-1 validation error dropped to 25.5% and the top-5 error to 8.0% at 19 layers.
The architecture of the Inception network [20] (also known as GoogLeNet) departs from the earlier preference for hierarchical stacking in favor of a more flexible network model. The main innovation of Inception is the Inception module, a structure that stacks convolutional kernels of different scales in parallel. Unlike the plain stacking of residual or VGG networks, GoogLeNet addresses the problem of model degradation.
ResNet [23] uses residual (shortcut) connections to achieve effective hierarchical accumulation without the degradation problem encountered when training deep convolutional networks in a single pass. In the concrete implementation, the authors added a batch normalization (BN) layer after each convolutional layer and before the activation layer, instead of using dropout, to reduce overfitting. Owing to the excellent stackability of ResNet, the authors incrementally increased the network depth from 18 to 152 layers and obtained better results on the ImageNet test set than with 18 layers.
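As a concrete illustration of the BN placement described above, the following is a minimal sketch of a basic residual block in a PyTorch-style formulation; the class name and the single-channel-count handling are our own simplifications rather than the exact architecture of [23].

```python
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """Residual block with BN placed after each convolution and before the
    activation, and an identity shortcut instead of dropout."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                              # shortcut branch
        out = self.relu(self.bn1(self.conv1(x)))  # conv -> BN -> activation
        out = self.bn2(self.conv2(out))           # BN before the final activation
        return self.relu(out + identity)          # add shortcut, then activate
```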
• For images with globally related nodes, CNNs may not be suitable for processing.
• In CNNs, convolution operations are usually used to aggregate spatial information. However, the convolution operation has high computational complexity; as the depth of the network increases, the computation and the parameters of the convolutions become difficult to control [29].
• Although CNNs excel at two-dimensional grid structures such as image data, GCNs can better capture the relationships between nodes and local features for more general graph-structured data [30], especially when the node relationships and attribute features are more complex, providing stronger modeling and representation capabilities.
2.2. Graph Neural Networks formula for the Attention mechanism is defined as
For datasets containing images, traditional machine learning methods first preprocess the graph-structured data [31], mapping the graph structure information to a simple representation such as a high- or low-dimensional feature vector. This preprocessing stage may introduce noise because the topological dependency information of the graph nodes is lost [32].
Graph Neural Networks (GNNs) [32] are based on an information diffusion mechanism. Their appearance extends existing neural network methods to data represented in graph domains. GNN preprocessing differs from that of traditional deep learning models in that it requires converting images into node and edge representations. For large images or images with deep structure, certain GNN models face three problems: neighbor explosion, node dependency, and over-smoothing [33]. Besides the graph itself, these problems are attributed to the design of the multilayer GNN framework [33]. To address them, [33] proposed the Ripple Walk Training (RWT) method for deep and large GNNs. RWT is a general subgraph-based training framework that does not train directly on the whole graph; instead, it samples subgraphs from the whole graph and constructs mini-batches for training. The complete GNN they propose is trained with mini-batch gradient updates: by computing gradients within subgraphs of acceptable size, neighbor explosion is avoided altogether, and because the gradients do not depend on nodes outside the subgraph, node dependency is resolved at the subgraph level. Aggregation between subgraphs occurs only occasionally, whereas propagation and aggregation happen within the subgraphs, so the over-smoothing problem can be alleviated.
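The following is a minimal sketch of the subgraph-based mini-batch idea behind RWT; the expansion rule and the function names are simplified illustrations, not the exact Ripple Walk sampler of [33].

```python
import random

def sample_subgraph(adj, size):
    """Grow one subgraph by repeatedly pulling in neighbors of already selected
    nodes -- a simplified stand-in for the Ripple Walk expansion rule."""
    size = min(size, len(adj))
    nodes = {random.randrange(len(adj))}
    while len(nodes) < size:
        frontier = {v for u in nodes for v in adj[u]} - nodes
        if frontier:
            nodes.add(random.choice(sorted(frontier)))
        else:                                  # disconnected graph: restart from a new node
            nodes.add(random.randrange(len(adj)))
    return sorted(nodes)

def train_on_subgraphs(adj, num_batches, subgraph_size, train_step):
    """Mini-batch training on sampled subgraphs: each gradient step only sees
    the nodes of one subgraph, so it never depends on nodes outside it."""
    for _ in range(num_batches):
        batch_nodes = sample_subgraph(adj, subgraph_size)
        train_step(batch_nodes)                # user-supplied GNN update on this subgraph
```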
The classical models of GNN are GCN, GraphSAGE, and GAT.
GraphSAGE [34] is a graph neural network algorithm proposed in 2017 that addresses the limitations of GCN: GCN training requires the adjacency matrix of the entire graph, depends on the specific graph structure, and can generally only be used in transductive learning. GraphSAGE uses a multi-layer aggregation function, where each layer aggregates the information of nodes and their neighbors to obtain the feature vectors of the next layer. GraphSAGE relies on the neighborhood information of nodes and does not depend on the global graph structure. An innovative improvement of GraphSAGE is to use node features to learn an embedding function that enables unseen nodes to generate embeddings. GraphSAGE performs well in both unsupervised and supervised learning.
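A minimal sketch of a GraphSAGE-style layer with a mean aggregator is shown below; the exact aggregators and normalization of [34] differ, so the weight names and shapes here are assumptions for illustration.

```python
import numpy as np

def graphsage_mean_layer(H, neighbors, W_self, W_neigh):
    """One GraphSAGE-style layer with a mean aggregator: each node combines
    its own features with the mean of its neighbors' features, then applies
    a nonlinearity.  Unseen nodes can be embedded because only the local
    neighborhood and the learned weights are needed."""
    out = np.zeros((H.shape[0], W_self.shape[1]))
    for v, neigh in enumerate(neighbors):
        h_neigh = H[neigh].mean(axis=0) if neigh else np.zeros(H.shape[1])
        out[v] = H[v] @ W_self + h_neigh @ W_neigh
    return np.maximum(out, 0.0)                  # ReLU
```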
GAT [35] introduces what is essentially a single-layer feed-forward neural network with an attention mechanism that the model learns by itself. The mechanism adds a learnable coefficient to each edge and fuses node features with an attention coefficient $\alpha$, which allows the convolutional feature fusion to adjust the model parameters according to the task and become adaptive for better results. The attention mechanism is defined as

$$\alpha_{ij} = \frac{\exp\left(\mathrm{LeakyReLU}\left(\vec{a}^{\,T}\left[W\vec{h}_i \,\|\, W\vec{h}_j\right]\right)\right)}{\sum_{k \in N_i} \exp\left(\mathrm{LeakyReLU}\left(\vec{a}^{\,T}\left[W\vec{h}_i \,\|\, W\vec{h}_k\right]\right)\right)}, \qquad (1)$$

where $\vec{a}$ is the weight vector of the attention mechanism, $\alpha_{ij}$ is the attention coefficient between the $i$-th node and the $j$-th node, $(\cdot)^{T}$ denotes transposition, $\|$ is the concatenation operation, $W$ is the weight matrix, and $N_i$ is the set of neighbors of the $i$-th node. After obtaining $\alpha_{ij}$ for each edge, the feature of the $i$-th node after attention fusion can be expressed by the following formula, which is essentially a weighted feature summation in which the weight coefficients are learned during model training, followed by a nonlinear activation function:

$$\vec{h}'_i = \sigma\left(\sum_{j \in N_i} \alpha_{ij} W \vec{h}_j\right). \qquad (2)$$

On this basis, to make the attention mechanism more scalable, a multi-head attention mechanism is defined: several heads compute attention weights independently and their outputs are concatenated to form the node features. At the last layer of the network, the dimensionality of the concatenated node features would be too large, so the head outputs are instead summed and averaged before being used for the specific task. Experiments on the PPI dataset show that GAT performs better than GraphSAGE.
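The following NumPy sketch computes the single-head attention of Eqs. (1) and (2); it assumes each node's neighbor list already contains the node itself, and it omits the multi-head concatenation, with tanh standing in for the generic activation $\sigma$.

```python
import numpy as np

def gat_layer(H, neighbors, W, a):
    """Single-head sketch of Eqs. (1)-(2): attention logits from
    LeakyReLU(a^T [Wh_i || Wh_j]), softmax over the neighborhood N_i,
    then a weighted sum of neighbor features and a nonlinearity."""
    Wh = H @ W                                   # transformed features, shape (N, F')
    out = np.zeros_like(Wh)
    for i, neigh in enumerate(neighbors):        # neighbors[i] is assumed to include i itself
        if not neigh:
            continue
        z = np.array([a @ np.concatenate([Wh[i], Wh[j]]) for j in neigh])
        z = np.where(z > 0, z, 0.2 * z)          # LeakyReLU with slope 0.2
        alpha = np.exp(z - z.max())
        alpha /= alpha.sum()                     # Eq. (1): softmax over N_i
        out[i] = alpha @ Wh[neigh]               # Eq. (2): weighted sum of W h_j
    return np.tanh(out)                          # sigma(.), chosen for illustration
```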
GNNs do not perform well on training sets with class imbalance. In class-imbalanced node classification, Song et al. (2022) found that compensating for sub-nodes that deviate from the class connectivity pattern can easily lead to sub-node false positives. Juan et al. (2023) point out that increasing the participation of minority-class nodes in the propagation process is an effective solution. Their INS-GNN contains three components: self-supervised pre-training, self-training, and self-supervised edge augmentation. Self-supervised pre-training focuses on the contribution of minority nodes so that model learning does not favor the majority nodes; this is done by randomly sampling edges of the graph and masking minority nodes. Self-training aims to reduce the cost of semi-supervised learning: it gives unlabeled nodes labels by creating pseudo-labels, which benefits the performance on both minority and majority nodes. Self-supervised edge augmentation aims to involve minority nodes more in information transfer so that the majority nodes have less influence on the model. However, self-training may introduce noise while expanding the dataset. Nevertheless, self-supervised learning may help with the unstable performance of GNNs in settings with too few labeled nodes [38-40].

2.3. Graph Convolutional Networks

In a GCN, node features are extracted after feature aggregation [32]. Before the improvement, the propagation operator was

$$\tilde{A} = \hat{D}^{-\frac{1}{2}} \hat{A} \hat{D}^{-\frac{1}{2}} + I_N, \qquad (4)$$

where $\hat{A} = A$ and $\hat{D}$ is the degree matrix of $\hat{A}$. The eigenvalues of this operator lie in the range [0, 2]; therefore, when it is used in a deep neural network model, repeated application can lead to numerical instability and exploding or vanishing gradients [41]. To alleviate this problem, Kipf and Welling (2016) let $\hat{A} = A + I_N$ and use

$$\tilde{A} = \hat{D}^{-\frac{1}{2}} \hat{A} \hat{D}^{-\frac{1}{2}}. \qquad (5)$$
However, there are still some challenges and limitations of the original graph convolutional networks in image classification. One of them is computational efficiency: GCNs need to perform aggregation operations over the whole graph, resulting in high computational complexity [45]. In addition, GCNs may encounter memory limitations when dealing with large-scale image data, because a large number of nodes and edges must be stored and processed [46]. Further, the original GCN can capture information only about immediate neighbors with one layer of convolution; when multiple GCN layers are stacked, information from larger neighborhoods is integrated [47]. This is because the graph convolution layers of GCNs can be considered low-pass filters [48]. This property causes the signal to become smoother, which is an inherent advantage of GCNs; however, if the number of GCN layers is large, performing this smoothing operation many times makes the signals converge, and the diversity of the node features is lost. This is the over-smoothing problem [49].
Too many graph convolution layers may thus cause over-smoothing; however, there are image classification studies that solve this problem to some extent [50-52], allowing the model to extract deep-level features.
[50-52], allowing the model to extract deep-level features. et al. (2022) also contains the channel common convolution
SelfSAGCN [50] overcomes the overfitting problem module, which solves the problem of extracting the nodes
and the over-smoothing problem by combining feature in a particular channel embedding and the common
aggregation and semantic alignment. SelfSAGCN first information shared between channels. As shown in Figure
applies feature aggregation to extract semantic information 3, the output of the channel common convolution module
from the labeled nodes layer by layer, which does not suffer is represented by the equation
from the over-smoothing phenomenon. The core idea of
selfSAGCN is that the node features obtained from the 𝐻𝐻𝑐𝑐 = 𝛼𝛼𝐻𝐻𝑐𝑐 + 𝛽𝛽𝐻𝐻𝑐𝑐𝑐𝑐𝐻𝐻0 + 𝛾𝛾𝐻𝐻𝑐𝑐𝑐𝑐 , �6.�
semantic and graph structure should be consistent when the
categories are the same. This is not affected by the over- where, 𝛼𝛼, 𝛽𝛽, 𝛾𝛾 are hyperparameters that measure the
smoothing phenomenon. The unlabeled node features importance of the common convolution, respectively. The
obtained from graph aggregation are aligned with semantic multi-channel attention they introduce can fuse the outputs
features by semantic alignment techniques to find of different channels and the features of each channel to
additional supervisory information. This improves the integrate the learned embeddings for prediction. The
performance of the model and enhances the identifiability ablation experiments demonstrate that their proposed
of node features. The semantic alignment of selfSAGCN is MAMF-GCN has strong robustness and high accuracy.
based on the central similarity of homogeneous class
information, which enables the model to transfer relevant
features to unlabeled data after learning semantics from the
labeled data. Yang, et al. (2021) additionally utilized
central similarity optimization to align node features
obtained from semantic and graph structure aspects,
respectively features are aligned, which has a significant
effect on mitigating over-smoothing. Also, the central
similarity of labeled and unlabeled nodes can provide
additional supervised information, which further improves
the classification accuracy of unlabeled nodes. Moreover,
they use the pseudo-labeling trick for unlabeled data and
also suppress the noise using the practice of updating the
centers. It is experimentally confirmed that selfSAGCN
has better performance on different datasets even when the
labeled nodes are severely limited. This indicates that the
overfitting problem does not affect it too much. Even if the
number of layers is increased to 16, selfSAGCN can
mitigate the over-smoothing problem.
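The central-similarity idea can be sketched as follows; this is an illustrative alignment term inspired by the description of SelfSAGCN [50], not the exact objective used in the paper.

```python
import numpy as np

def center_alignment_loss(feat_semantic, feat_graph, labels):
    """Illustrative central-similarity alignment: for every class, the center of
    the semantic features and the center of the graph-aggregated features are
    pulled together (mean squared distance between class centers)."""
    labels = np.asarray(labels)
    loss, classes = 0.0, np.unique(labels)
    for c in classes:
        mask = labels == c
        center_sem = feat_semantic[mask].mean(axis=0)
        center_graph = feat_graph[mask].mean(axis=0)
        loss += np.sum((center_sem - center_graph) ** 2)
    return loss / len(classes)
```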
Pan et al. (2022) observed that no deep GCN model had been used for medical diagnosis because of the over-smoothing problem. To overcome it, they used a snowball GCN module to build a multiscale convolutional module. The snowball GCN [53] is a densely connected graph network that can connect multiscale features: it superimposes all the learned features as input to subsequent hidden layers. This network also overcomes the gradient vanishing problem and reduces the number of parameters. The key to solving the over-smoothing problem in Luan et al. (2019) is that they define the graph convolution of a spectral filter as the product of a block Krylov matrix and a learnable parameter matrix of a specific form. The block Krylov matrix formula $K_m(A, B) := [B, AB, \ldots, A^{m-1}B]$ comes from the transformation of the $\mathbb{S}$-span of $\{X_k\}_{k=1}^{m}$ and $K_m^{\mathbb{S}}(A, B)$. They stated that it is difficult to apply the block Krylov method directly to the GCN, so they developed the snowball and the truncated Krylov networks. For $output = \operatorname{softmax}(L^{p} C W_C)$, if $p = 1$ and $L^{p} = L$, the snowball implementation maps back to the Fourier basis of the graph, thus achieving a "snowball" effect.
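A minimal sketch of the snowball stacking described above, in which every layer receives the concatenation of all previously learned features; the weight shapes are assumed to match and the activation is illustrative.

```python
import numpy as np

def snowball_forward(A_tilde, X, weights):
    """Snowball-style stacking: each hidden layer receives the concatenation of
    all previously learned features, so multiscale features accumulate layer by
    layer.  Each W in `weights` must accept the grown input width."""
    features = [X]                                  # start from the raw node features
    for W in weights:
        H_in = np.concatenate(features, axis=1)     # everything learned so far
        features.append(np.tanh(A_tilde @ H_in @ W))
    return np.concatenate(features, axis=1)         # final multiscale representation
```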
The adaptive multi-channel fusion GCN implemented by Pan et al. (2022) also contains a channel-common convolution module, which addresses the problem of extracting both the node embeddings of a particular channel and the common information shared between channels. As shown in Figure 3, the output of the channel-common convolution module is represented by the equation

$$H_c = \alpha H_c + \beta H_{cc} H_0 + \gamma H_{cc}, \qquad (6)$$

where $\alpha$, $\beta$, $\gamma$ are hyperparameters that measure the importance of the common convolution. The multi-channel attention they introduce can fuse the outputs of the different channels with the features of each channel, integrating the learned embeddings for prediction. Ablation experiments demonstrate that their proposed MAMF-GCN has strong robustness and high accuracy.

Figure 3. The structure of the adaptive multi-channel fusion GCN. (Figure omitted; its panels show functional and phenotypic KNN graphs, multi-scale deep convolution, snowball GCN channels, a channel-common convolution with parameter sharing, multi-channel attention, and prediction.)
The prominent innovation of NSCGCN [52] is to overcome the over-smoothing problem using feature fusion based on a node-self-convolution algorithm, and to preserve the spatial structure information of the original feature graph using a feature reconstruction algorithm. The innovative point of the node-self-convolution algorithm is that the input undirected graph $G^{l}$ retains only the node-self-connection degree matrix $I$. The result of convolving the input graph node features $X$ is given by

$$Z^{(l+1)} = D^{-\frac{1}{2}} I D^{-\frac{1}{2}} X^{(l)} W^{(l)}, \qquad (7)$$

where $W^{(l)}$ is a learnable parameter.
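Reading Eq. (7) literally, the node-self-convolution can be sketched as follows; the choice of degree matrix and the absence of an activation here are our assumptions for illustration.

```python
import numpy as np

def node_self_convolution(A, X, W):
    """Node-self-convolution of Eq. (7): only the self-connection degree matrix I
    is kept, so D^{-1/2} I D^{-1/2} collapses to a per-node scaling of 1/d_i and
    no neighbor information is mixed in at this step."""
    d = A.sum(axis=1) + 1e-12          # degrees of the input graph (eps avoids division by zero)
    return ((1.0 / d)[:, None] * X) @ W
```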
A new graph structure $G^{(l+1)}$ is then obtained by combining the original adjacency matrix of the undirected graph $G^{l}$ with $Z^{(l+1)}$. The feature reconstruction algorithm is based on the image neighborhood structure; it converts the image into graph-structured data and achieves better classification performance than the down-sampling approach. Although NSCGCN has outstanding performance, it also suffers from overfitting, and its feature reconstruction stage is not adaptive, so it does not reach the best structure. For the overfitting caused by an insufficient amount of data, semantic alignment techniques can be considered, allowing labeled nodes to guide unlabeled nodes [50].
For the problem of the large memory consumption of graph convolution operations, Bi-GCN [54] provides a solution. Bi-GCN contains two innovations: the first is to binarize the network parameters and the input node features in the feature extraction phase, which can theoretically reduce memory consumption by up to 30 times and improve inference speed by up to 47 times; the second is a new backpropagation method designed to accommodate the binarized weights. Wang et al. (2021) showed that it is not feasible to down-sample the original image to the right size, or to compress the input graph data and the GNN model, in order to reduce the memory consumption of the model. They state that this is because of two problems with down-sampling: first, neighborhood sampling results in "neighborhood explosion" when the number of graph convolution layers increases, i.e., too many neighborhoods become involved, making training difficult; second, although graph sampling can avoid neighborhood explosion, it does not guarantee that each node will be sampled at least once during training or inference. Compressed input images are difficult to manipulate because the resulting graph is small, and for compressed GNN models the relationship between the compression rate and the accuracy of the model must be carefully designed so that both nodes in the high-dimensional semantic space and nodes in the low-dimensional feature space can be characterized; this approach is therefore more difficult. The core idea of Bi-GCN is to binarize the network parameters (e.g., weights) and the input node features in the feature extraction phase, while not doing so in the feature aggregation phase; in addition, the original matrix multiplication of the graph convolution operation is modified to a binary addition operation. Another core idea is to binarize the node features by splitting them and assigning attention weights to each node; by deploying these additional parameters, the model remains able to learn useful information. In theory, Bi-GCN can reduce the memory consumption of network parameters and input data by an average of about 30 times and increase inference speed by an average of about 47 times. However, Bi-GCN was not evaluated with respect to the over-smoothing problem, i.e., it does not perform well at deeper layers.
Similar to CNNs, GCNs have a multilayer structure [13], where each layer extracts higher-level features by aggregating features from neighboring nodes and applying a nonlinear activation function. This allows GCNs to take full advantage of the topology of the graph and thus better capture the relationships between nodes. However, GCNs may not be able to adaptively extract the most relevant information from topologies or node features. Xu et al. (2021) proposed a multiscale skeleton adaptive weighted graph convolution network (MS-AWGCN) for skeleton-based human action recognition in IoT. MS-AWGCN learns the latent graph topology in an adaptive extraction style, using an adaptive information aggregation strategy to weight information from different sampling strategies more efficiently. The adaptive weighted graph convolution block of MS-AWGCN is formulated as

$$f_{out} = \sum_{k}^{K} \alpha\, W_k f_{in} (A_k + L_k), \qquad (8)$$

where $W_k$ denotes the weight matrix, $L_k$ is learnable, and $f_{in}$ is the input node feature.
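A row-wise NumPy sketch of the adaptive weighted block in Eq. (8); the per-branch scaling $\alpha$ is treated as a single scalar and the multiplication order is rewritten for row-oriented feature matrices, both of which are assumptions for illustration.

```python
import numpy as np

def adaptive_weighted_block(f_in, A_list, L_list, W_list, alpha):
    """Adaptive weighted graph convolution in the spirit of Eq. (8): every
    sampling strategy k has a fixed topology A_k, a learnable offset L_k and a
    weight matrix W_k; the K branches are scaled by alpha and summed."""
    out = np.zeros((f_in.shape[0], W_list[0].shape[1]))
    for A_k, L_k, W_k in zip(A_list, L_list, W_list):
        out += alpha * ((A_k + L_k) @ f_in @ W_k)
    return out
```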
Relative to the formulation of the graph self-learning module in [56], ...
... generalizable truss structural optimization," Applied Soft Computing, vol. 134, p. 110015, 2023.
[17] X. Zhu, C. Li, J. Guo, and S. Dietze, "CNIM-GCN: Consensus Neighbor Interaction-based Multi-channel Graph Convolutional Networks," Expert Systems with Applications, vol. 226, p. 120178, 2023.
[18] A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," Advances in Neural Information Processing Systems, vol. 25, no. 2, 2012.
[19] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," Computer Science, 2014.
[20] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, and A. Rabinovich, "Going Deeper with Convolutions," IEEE Computer Society, 2014.
[21] S. Ioffe and C. Szegedy, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift," JMLR.org, 2015.
[22] F. Chollet, "Xception: Deep Learning with Depthwise Separable Convolutions," in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[23] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," IEEE, 2016.
[24] G. Huang, Z. Liu, L. Maaten, and K. Q. Weinberger, "Densely Connected Convolutional Networks," in IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[25] M. Tan and Q. V. Le, "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks," 2019.
[26] X. Liu, Z. You, Y. He, S. Bi, and J. Wang, "Symmetry-Driven hyper feature GCN for skeleton-based gait recognition," Pattern Recognition, vol. 125, p. 108520, 2022.
[27] H. Jiang, P. Cao, M. Xu, J. Yang, and O. Zaiane, "Hi-GCN: A hierarchical graph convolution network for graph embedding learning of brain network and brain disorders prediction," Computers in Biology and Medicine, vol. 127, p. 104096, 2020.
[28] Y. Zhang, S. Satapathy, D. Guttery, J. Gorriz, and S. Wang, "Improved Breast Cancer Classification Through Combining Graph Convolutional Network and Convolutional Neural Network," Information Processing and Management, vol. 58, p. 102439, 2021.
[29] Y. Zang, D. Yang, T. Liu, H. Li, S. Zhao, and Q. Liu, "SparseShift-GCN: High precision skeleton-based action recognition," Pattern Recognition Letters, vol. 153, pp. 136-143, 2022.
[30] S. Wang, V. Govindaraj, J. Gorriz, X. Zhang, and Y. Zhang, "Explainable diagnosis of secondary pulmonary tuberculosis by graph rank-based average pooling neural network," Journal of Ambient Intelligence and Humanized Computing, 2021.
[31] S. Wang, "Advances in data preprocessing for biomedical data fusion: an overview of the methods, challenges, and prospects," Information Fusion, vol. 76, pp. 376-421, 2021.
[32] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini, "The Graph Neural Network Model," IEEE Transactions on Neural Networks, vol. 20, no. 1, p. 61, 2009.
[33] Y. Liu, M. Zhang, C. Ma, B. Bai, and G. Yan, "Graph neural network," 2020.
[34] W. L. Hamilton, R. Ying, and J. Leskovec, "Inductive Representation Learning on Large Graphs," 2017.
[35] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio, "Graph Attention Networks," 2017.
[36] J. Song, J. Park, and E. Yang, "TAM: Topology-Aware Margin Loss for Class-Imbalanced Node Classification," in Proceedings of the 39th International Conference on Machine Learning, Proceedings of Machine Learning Research, 2022.
[37] X. Juan, F. Zhou, W. Wang, W. Jin, J. Tang, and X. Wang, "INS-GNN: Improving graph imbalance learning with self-supervision," Information Sciences, vol. 637, p. 118935, 2023.
[38] Q. Q. Zhao, H. F. Ma, L. J. Guo, and Z. X. Li, "Hierarchical attention network for attributed community detection of joint representation," Neural Computing & Applications, vol. 34, no. 7, pp. 5587-5601, 2022.
[39] Z. Yang, Y. Yan, H. Gan, J. Zhao, and Z. Ye, "A safe semi-supervised graph convolution network," Mathematical Biosciences and Engineering, vol. 19, no. 12, pp. 12677-12692, 2022.
[40] Z. Zhou, J. Shi, S. Zhang, Z. Huang, and Q. Li, "Effective stabilized self-training on few-labeled graph data," Information Sciences, vol. 631, pp. 369-384, 2023.
[41] T. N. Kipf and M. Welling, "Semi-Supervised Classification with Graph Convolutional Networks," 2016.
[42] M. Niepert, M. Ahmed, and K. Kutzkov, "Learning Convolutional Neural Networks for Graphs," JMLR.org, 2016.
[43] B. Jiang, Z. Zhang, D. Lin, J. Tang, and B. Luo, "Semi-Supervised Learning With Graph Learning-Convolutional Networks," in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[44] P. Sen et al., "Collective Classification in Network Data," AI Magazine, 2008.
[45] T. Altaf, X. Wang, W. Ni, R. P. Liu, and R. Braun, "NE-GConv: A lightweight node edge graph convolutional network for intrusion detection," Computers & Security, vol. 130, Art. no. 103285, 2023.
[46] R. Corrias, M. Gjoreski, and M. Langheinrich, "Exploring Transformer and Graph Convolutional Networks for Human Mobility Modeling," Sensors, vol. 23, no. 10, Art. no. 4803, 2023.
[47] L. Yao, C. Mao, and Y. Luo, "Graph Convolutional Networks for Text Classification," 2019, pp. 7370-7377.
[48] N. T. Hoang and T. Maehara, "Revisiting Graph Neural Networks: All We Have is Low-Pass Filters," 2019.
[49] Q. Li, Z. Han, and X. M. Wu, "Deeper Insights into Graph Convolutional Networks for Semi-Supervised Learning," 2018.
[50] X. Yang, C. Deng, Z. Dang, K. Wei, and J. Yan, "SelfSAGCN: Self-Supervised Semantic Alignment for Graph Convolution Network," in Computer Vision and Pattern Recognition, 2021.
[51] J. Pan, H. Lin, Y. Dong, Y. Wang, and Y. Ji, "MAMF-GCN: Multi-scale adaptive multi-channel fusion deep graph convolutional network for predicting mental disorder," Computers in Biology and Medicine, vol. 148, p. 105823, 2022.
[52] C. Tang, C. Hu, J. Sun, S.-H. Wang, and Y.-D. Zhang, "NSCGCN: A novel deep GCN model to diagnosis