License: arXiv.org perpetual non-exclusive license
arXiv:2403.04962v1 [eess.IV] 08 Mar 2024

C2P-GCN: Cell-to-Patch Graph Convolutional Network for Colorectal Cancer Grading

Sudipta Paul
Department of Electrical Engineering
Rensselaer Polytechnic Institute
Troy, New York, USA
pauls5@rpi.edu

Bülent Yener
Department of Computer Science
Rensselaer Polytechnic Institute
Troy, New York, USA
yener@cs.rpi.edu

Amanda W. Lund
Department of Pathology
NYU Grossman School of Medicine
New York, NY, USA
amanda.lund@nyulangone.org
Abstract

Graph-based learning approaches, owing to their ability to encode tissue and organ structure, are increasingly favored for grading colorectal cancer histology images. Recent graph-based techniques divide whole slide images (WSIs) into small or medium-sized patches and build a graph on each patch for direct use in training. This approach, however, fails to capture the tissue structure present across an entire WSI and requires training on a very large dataset of image patches. In this paper, we propose a novel cell-to-patch graph convolutional network (C2P-GCN), a two-stage graph formation approach. In the first stage, it forms a patch-level graph based on the cell organization within each patch of a WSI. In the second stage, it forms an image-level graph based on a similarity measure between the patches of a WSI, treating each patch as a node. This graph representation is then fed into a multi-layer GCN-based classification network. Through this dual-phase graph construction, our approach gathers local structural details from individual patches and establishes meaningful connections among all patches across a WSI. Because C2P-GCN integrates the structural data of an entire WSI into a single graph, our model works with significantly less training data than the latest models for colorectal cancer. Experimental validation of C2P-GCN on two distinct colorectal cancer datasets demonstrates the effectiveness of our method.

Index Terms:
Graph convolutional network, Patch-level graph, Image-level graph, Cell graph, Colorectal cancer grading

I Introduction

Recent advances in deep learning have shown higher efficacy in diagnosing colorectal cancer than traditional machine learning techniques based on hand-crafted features. Numerous convolutional neural network (CNN)-based architectures [1, 2] have been introduced in recent years to automatically grade colorectal histology images. Most of these methods break large images into smaller patches, which are then used for both training and prediction. Unfortunately, these methods fail to capture tissue structure information, because features obtained from small patches may lack a clear interpretive connection with the glandular architecture present in colorectal cancer images.

To alleviate these issues, graph convolutional network (GCN)-based architectures [3, 4] have recently been introduced that exploit tissue structure information through the cell graph technique [5]. For the colorectal cancer grading task, GCN-based architectures such as CGC-Net [3] and HAT-Net [4] demonstrated notably better performance than other methods. However, like CNN-based methods, both CGC-Net and HAT-Net divide the large colorectal cancer (CRC) histology images into medium-sized patches and construct cell graphs on them. The graphs constructed on each patch are then used for training, which requires a substantially large amount of training data. While these methods capture the tissue structure information within each patch, they do not establish a discernible interpretive connection across the whole image.

In this paper, we propose a novel Cell-to-Patch Graph Convolutional Network (C2P-GCN) that features a two-stage graph construction process: a patch-level graph and an image-level graph. The process begins by breaking a WSI into small or medium-sized patches. Then, at the patch level, a cell graph alongside several well-known global graphs (Voronoi graphs of cells) [6] is created for each patch. Next, an image-level graph is constructed over the entirety of a large image or WSI, treating each patch as a node. The edges (connections) between these nodes (patches) are formed by a similarity measure between patches, thereby mapping out interpretive connections among similar patches throughout the whole image. Having encoded the entire image into a unified graph, C2P-GCN processes it with a multi-layer GCN-based classification network.

The key contributions of this paper are: (a) We introduce a novel C2P-GCN architecture, a dual-stage graph construction approach that harnesses tissue structure information from the local cellular organization via patch-level graphs, and then connects patches within a WSI that exhibit similar characteristics via an image-level graph. (b) We perform comprehensive experiments on two distinct colorectal cancer datasets, the Extended CRC dataset [1] and the Colon Cancer Dataset from Zhejiang University in China [2], showing that C2P-GCN performs on par with or better than recently proposed CNN- and GCN-based methods. (c) Because C2P-GCN incorporates the structural details of an entire WSI into a single graph, our model requires considerably less training data. On our tested datasets, C2P-GCN uses over two orders of magnitude less training data than the state-of-the-art methods while yielding comparable or better results.

Figure 1: (a) Overall C2P-GCN pipeline. C2P-GCN first breaks a WSI into multiple patches and constructs a patch-level graph on each to capture the structural features within the patch. It then forms an image-level graph over all patches based on a similarity measure. The image-level graph, which contains the structural information of the whole image, is fed into a multi-layer GCN for classification. (b) Nuclei detection with the gLoG filter. (c) Cell-graph construction; the magnified section shows how a sample node (nucleus) is connected to its neighborhood. (d) Voronoi diagram. (e) Delaunay triangulation. (f) Minimum spanning tree. (g) A randomly chosen node of interest (red patch) and its 15 most similar patches (blue) by similarity score.

II Methodology

The complete pipeline of our proposed method, shown in Fig. 1(a), divides a WSI into small or medium-sized patches, constructs patch-level and image-level graphs, and finally feeds these graphs into a multi-layer GCN for classification.

II-A Cell identification

Once a WSI is divided into patches, the next task is to extract cells from each patch so that patch-level graphs can be constructed in the following step. Note that our method does not require precise cell segmentation: the cell locations alone suffice to form a graph and extract the necessary features. In this work, we adopt the generalized Laplacian of Gaussian (gLoG) filter-based automatic nuclei detection technique developed in [7]. Fig. 1(b) shows an example of nuclei detection with a gLoG filter on a colorectal cancer image patch. Table I lists the gLoG kernel parameters and the values we used to detect nuclei in the colorectal cancer images.
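To make the kernel parameters in Table I concrete, the NumPy sketch below builds a bank of generalized LoG kernels, i.e., Laplacians of a rotated anisotropic Gaussian with scales $\sigma_x$ and $\sigma_y$. It rests on assumptions not stated in the text: we read the orientation parameter $k = 9$ as the number of evenly spaced angles in $[0, \pi)$, we truncate each kernel at three times the larger scale, we make each kernel zero-sum, and we do not model the bandwidth $w$. The helpers `glog_kernel` and `glog_bank` are hypothetical and are not the implementation of [7].

```python
import numpy as np


def glog_kernel(sigma_x, sigma_y, theta, half_size=None):
    """Laplacian of a rotated anisotropic Gaussian (a generalized LoG kernel)."""
    if half_size is None:
        # Assumption: truncate the kernel at three times the larger scale.
        half_size = int(np.ceil(3 * max(sigma_x, sigma_y)))
    ax = np.arange(-half_size, half_size + 1, dtype=float)
    x, y = np.meshgrid(ax, ax)  # x varies along columns, y along rows

    # Coefficients of the rotated Gaussian exp(-(a x^2 + 2 b x y + c y^2)).
    a = np.cos(theta) ** 2 / (2 * sigma_x**2) + np.sin(theta) ** 2 / (2 * sigma_y**2)
    b = -np.sin(2 * theta) / (4 * sigma_x**2) + np.sin(2 * theta) / (4 * sigma_y**2)
    c = np.sin(theta) ** 2 / (2 * sigma_x**2) + np.cos(theta) ** 2 / (2 * sigma_y**2)
    g = np.exp(-(a * x**2 + 2 * b * x * y + c * y**2))

    # Analytic Laplacian: d2G/dx2 + d2G/dy2.
    log = ((2 * a * x + 2 * b * y) ** 2 + (2 * b * x + 2 * c * y) ** 2 - 2 * a - 2 * c) * g
    # Assumption: make the kernel zero-sum so uniform regions give no response.
    return log - log.mean()


def glog_bank(sigma_x=8.0, sigma_y=4.0, n_orientations=9):
    """One kernel per orientation, evenly spaced in [0, pi) (our reading of k = 9)."""
    thetas = np.arange(n_orientations) * np.pi / n_orientations
    return [glog_kernel(sigma_x, sigma_y, t) for t in thetas]


bank = glog_bank()
print(len(bank), bank[0].shape)  # 9 (49, 49)
```

In a full detector, these kernels would be convolved with the image and local extrema of the combined responses taken as nucleus centers; this sketch stops at the kernel bank.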

TABLE I: gLoG kernel parameters
Parameter | Value
x-axis scale, $\sigma_x$ | 8
y-axis scale, $\sigma_y$ | 4
Orientation, $k$ | 9
Bandwidth, $w$ | 7

II-B Patch graph formation

II-B1 Cell graph

Let $G_p = (V_p, E_p)$ be the cell graph constructed on a patch, where $V_p$ and $E_p$ denote the sets of nodes and edges of the graph. The cells extracted in the previous stage form the node set $V_p$ of our cell graph. Whether an edge $(u, v)$ joins two nodes (cells) $u$ and $v$ is determined by biological understanding and the known interactions between cells in a particular tissue type. We assume that cells closer together in Euclidean distance are more likely to interact. Therefore, an edge is assigned between two nuclei if they lie within a predefined distance $d_p$ of each other, which gives the adjacency matrix

$$A_p(i,j) = \begin{cases} 1 & \text{if } D_p(i,j) < d_p \\ 0 & \text{otherwise} \end{cases}$$

where $D_p(i,j)$ is the Euclidean distance between nuclei $i$ and $j$. Here, $d_p$ is set to 64. An example of this graph built within a patch is illustrated in Fig. 1(c). The cell graph constructed on each patch makes it possible to assess and quantify the interactions occurring within localized neighborhoods. A total of 18 distinct cell graph features are extracted to characterize this local cell organization within each patch, as shown in Table II [5].

TABLE II: Cell graph features
Feature type | No. | Description
Connectedness and cliquishness measures | 4 | Average degree; clustering coefficient; giant connected component ratio; number of connected components
Distance-based measures | 8 | Average eccentricity; diameter; radius; average path length; number of central points; percent of central points; number of vertices; number of edges
Spectral measures | 6 | Largest eigenvalue of adjacency; trace of adjacency; energy of adjacency; lower slope; upper slope; trace of Laplacian
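A minimal NumPy sketch of the thresholded adjacency $A_p$ and a few of the Table II features (number of vertices and edges, average degree, number of connected components, giant connected component ratio) follows. Assumptions: nuclei are given as $(x, y)$ centroids, and self-loops are dropped, since the equation as written would otherwise set $A_p(i,i) = 1$ because $D_p(i,i) = 0$. The helper names are hypothetical; a graph library such as NetworkX provides the remaining features (clustering coefficient, eccentricity, diameter, spectra).

```python
import numpy as np


def cell_graph_adjacency(centroids, d_p=64.0):
    """A_p(i, j) = 1 if the Euclidean distance D_p(i, j) < d_p, else 0."""
    pts = np.asarray(centroids, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=-1))  # D_p: pairwise centroid distances
    adj = (dist < d_p).astype(int)
    np.fill_diagonal(adj, 0)  # assumption: a nucleus is not linked to itself
    return adj


def connected_component_labels(adj):
    """Label each node with the index of its connected component (iterative DFS)."""
    n = adj.shape[0]
    labels = np.full(n, -1)
    n_comp = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = n_comp
        stack = [start]
        while stack:
            u = stack.pop()
            for v in np.flatnonzero(adj[u]):
                if labels[v] < 0:
                    labels[v] = n_comp
                    stack.append(v)
        n_comp += 1
    return labels, n_comp


def basic_cell_graph_features(adj):
    """A subset of the Table II features for one patch."""
    n = adj.shape[0]
    if n == 0:
        return {}
    labels, n_comp = connected_component_labels(adj)
    return {
        "number_of_vertices": n,
        "number_of_edges": int(adj.sum()) // 2,  # each undirected edge counted twice
        "average_degree": float(adj.sum(axis=1).mean()),
        "number_of_connected_components": n_comp,
        "giant_connected_component_ratio": float(np.bincount(labels).max() / n),
    }


adj = cell_graph_adjacency([(0, 0), (30, 0), (60, 0), (500, 500)])
print(basic_cell_graph_features(adj))
```

In this toy patch, the first three nuclei are pairwise closer than 64 and form a triangle, while the fourth is isolated.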

II-B2 Global graphs of cells

We apply three distinct methods, the Voronoi diagram (VD), Delaunay triangulation (DT), and minimum spanning tree (MST), to construct global graphs on each patch; Figures 1(d), 1(e), and 1(f) illustrate each of these global graphs on an individual patch. To quantify the global graph tessellation of the cells within an entire patch, 24 distinct features (12 Voronoi, 8 Delaunay, and 4 MST) are extracted, as listed in Table III [6]. In addition, 27 nuclear density attributes are integrated to characterize the clustering of nuclei within the patch.
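As an illustration of the global graph features, the sketch below computes the four MST features of Table III (mean, SD, min/max ratio, and disorder of edge lengths) using Prim's algorithm on the complete Euclidean graph of nucleus centroids. It assumes the disorder definition commonly used in this line of work, $1 - 1/(1 + \mathrm{SD}/\mathrm{mean})$, and the population SD; both are assumptions, and `mst_edge_lengths` and `summary_features` are hypothetical names. Voronoi and Delaunay features would follow the same summary pattern over polygon and triangle measurements (for example, obtained from `scipy.spatial.Voronoi` and `scipy.spatial.Delaunay`).

```python
import numpy as np


def mst_edge_lengths(centroids):
    """Edge lengths of the Euclidean minimum spanning tree (Prim's algorithm, O(n^2))."""
    pts = np.asarray(centroids, dtype=float)
    n = len(pts)
    dist = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1))
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()  # cheapest known link from each node to the growing tree
    lengths = []
    for _ in range(n - 1):
        candidates = np.where(in_tree, np.inf, best)
        v = int(np.argmin(candidates))
        lengths.append(candidates[v])
        in_tree[v] = True
        best = np.minimum(best, dist[v])
    return np.array(lengths)


def summary_features(values):
    """Mean, SD, min/max ratio and disorder; assumes strictly positive values."""
    values = np.asarray(values, dtype=float)
    mean, sd = values.mean(), values.std()
    return {
        "mean": mean,
        "sd": sd,
        "min_max_ratio": values.min() / values.max(),
        "disorder": 1.0 - 1.0 / (1.0 + sd / mean),  # assumed definition
    }


lengths = mst_edge_lengths([(0, 0), (10, 0), (30, 0)])
print(lengths, summary_features(lengths))
```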

TABLE III: Global graph features
Feature type | No. | Description
Voronoi diagram | 12 | Polygon area, chord length, perimeter: mean, SD, min/max ratio, disorder
Delaunay triangulation | 8 | Triangle side length, area: mean, SD, min/max ratio, disorder
Minimum spanning tree | 4 | Edge length: mean, SD, min/max ratio, disorder
Nuclei nearest neighbor (NN) features | 27 | Area of polygons; number of nuclei; density of nuclei; mean, SD, and disorder of distance to $k$-NN ($k = 3, 5, 7$); mean, SD, and disorder of NN in an $n$-pixel radius ($n = 10, 20, 30, 40, 50$)
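The nearest-neighbor rows of Table III can be sketched the same way. Here we assume that the "distance to $k$-NN" of a nucleus is the sum of its distances to its $k$ nearest neighbors, and that "NN in an $n$-pixel radius" is the number of other nuclei strictly within $n$ pixels; each per-nucleus quantity is then summarized by mean, SD, and disorder (same assumed definition as above). This yields $3 \times 3 + 5 \times 3 = 24$ of the 27 features; the remaining three are polygon area, nucleus count, and density. The helper names are hypothetical.

```python
import numpy as np


def _stats(values, prefix):
    values = np.asarray(values, dtype=float)
    mean, sd = values.mean(), values.std()
    disorder = 1.0 - 1.0 / (1.0 + sd / mean) if mean > 0 else 0.0  # assumed definition
    return {f"{prefix}_mean": mean, f"{prefix}_sd": sd, f"{prefix}_disorder": disorder}


def nn_features(centroids, ks=(3, 5, 7), radii=(10, 20, 30, 40, 50)):
    """Per-nucleus k-NN distance sums and radius counts, summarized over the patch.

    Needs more than max(ks) nuclei; otherwise infinite distances enter the sums.
    """
    pts = np.asarray(centroids, dtype=float)
    dist = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a nucleus is not its own neighbor
    nearest = np.sort(dist, axis=1)  # row i: distances from nucleus i, ascending
    feats = {}
    for k in ks:
        feats.update(_stats(nearest[:, :k].sum(axis=1), f"knn{k}"))
    for r in radii:
        feats.update(_stats((dist < r).sum(axis=1), f"radius{r}"))
    return feats


grid = [(10 * i, 10 * j) for i in range(4) for j in range(4)]
feats = nn_features(grid)
print(len(feats))  # 24
```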

II-C Image-level graph

For the image-level graph $G_I = (V_I, E_I)$, we treat each patch of a WSI as a node in $V_I$ with a $d$-dimensional feature vector $x_i \in \mathbb{R}^d$ for $i \in V_I$. The node attributes are the same feature set obtained from each patch in the patch-level stage, i.e., $d = 69$ (18 cell graph, 24 global graph, and 27 nuclear density features). $E_I$ is the edge set, and $e_{i,j} \in E_I$ denotes an edge between two patches. To form an edge between a pair of nodes (patches), we assess the similarity of their attributes, so that nodes with similar traits become connected. Specifically, we use cosine similarity as the similarity measure, form an edge between a pair of nodes if the similarity exceeds a predefined threshold $\theta$, and set the edge weight to the similarity value. Mathematically,

$$w_{ij} = \cos(i,j) = \frac{x_i \cdot x_j}{\lVert x_i \rVert \, \lVert x_j \rVert}$$

where $w_{ij}$ is the weight of the edge between nodes $i$ and $j$. The adjacency matrix of the image-level graph is then

$$A_I(i,j) = \begin{cases} w_{ij} & \text{if } w_{ij} > \theta \\ 0 & \text{otherwise} \end{cases}$$

Edges between patches with similar characteristics thus establish meaningful links across the whole image. Here, $\theta = 0.8$, chosen by manual tuning. Fig. 1(g) shows a specific patch and its similarity-based connections to other patches.
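The image-level graph construction can be sketched in a few lines of NumPy: row-normalize the patch feature vectors, take pairwise dot products to obtain $w_{ij}$, and keep only entries above $\theta = 0.8$. We additionally zero the diagonal (every patch has $w_{ii} = 1$), on the assumption that the GCN layer adds its own self-loops, and we export the result in the COO `edge_index` / `edge_weight` layout (shape `[2, E]` and `[E]`) that PyTorch Geometric layers accept. These choices and the function name `image_graph` are assumptions, not details from the text.

```python
import numpy as np


def image_graph(features, theta=0.8):
    """Thresholded cosine-similarity graph over patches; returns dense and COO forms."""
    x = np.asarray(features, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.clip(norms, 1e-12, None)  # guard against all-zero feature vectors
    w = unit @ unit.T                       # w_ij = cosine similarity of patches i, j
    a = np.where(w > theta, w, 0.0)         # A_I(i, j)
    np.fill_diagonal(a, 0.0)                # assumption: drop self-similarity w_ii = 1
    src, dst = np.nonzero(a)
    edge_index = np.stack([src, dst])       # COO layout, shape [2, E]
    edge_weight = a[src, dst]               # shape [E]
    return a, edge_index, edge_weight


a, edge_index, edge_weight = image_graph([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]])
print(edge_index.tolist(), edge_weight.round(3))
```

In a PyG pipeline, `torch.as_tensor(edge_index, dtype=torch.long)` and the edge weights would then be stored on a `torch_geometric.data.Data` object together with the node features.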

II-D Network architecture

For the network architecture, we use a multi-layer graph convolutional network (GCN) followed by a series of linear layers and a softmax classification layer. Each graph convolutional layer $\mathrm{GCN}_l$, with ReLU activation and dropout, can be written as

$$H^{(l+1)} = \mathrm{Dropout}\left(\mathrm{ReLU}\left(\mathrm{GCN}_l\left(X^{(l)}, A'_I; W^{(l)}\right)\right)\right) \tag{1}$$

Here, $A'_I$ is the weighted adjacency matrix with entries $(A'_I)_{ij} = (A_I)_{ij} \cdot w_{ij}$, $W^{(l)}$ is the learnable weight matrix of layer $l$, $X^{(l)}$ is the input to GCN layer $l$, and $H^{(l+1)}$ is the node feature matrix at layer $l+1$. We then apply global mean pooling to the output of each GCN layer and concatenate the pooled features from all $L$ GCN layers:

$$P_{\text{concat}} = \mathrm{Concatenate}\left(P^{(1)}, P^{(2)}, \ldots, P^{(L)}\right), \quad \text{where} \quad P^{(l)} = \mathrm{GlobalMeanPooling}\left(\mathrm{GCN}_l\left(X^{(l)}, A'_I; W^{(l)}\right)\right)$$

$P_{\text{concat}}$ is then passed through a series of linear layers with ReLU activations and dropout:

$$Z^{(l+1)} = \mathrm{Dropout}\left(\mathrm{ReLU}\left(W^{(l)}_{\text{linear}} Z^{(l)} + b^{(l)}_{\text{linear}}\right)\right) \tag{2}$$

where $Z^{(1)} = P_{\text{concat}}$, $W^{(l)}_{\text{linear}}$ and $b^{(l)}_{\text{linear}}$ are the weights and biases of the linear layers, and $l$ ranges from 1 to the number of linear layers $N$. Finally, a softmax layer is used for classification:

$$Y = \mathrm{Softmax}\left(W^{(N+1)}_{\text{linear}} Z^{(N+1)} + b^{(N+1)}_{\text{linear}}\right) \tag{3}$$

The multi-layer GCN architecture used in C2P-GCN, along with the dimension of each layer, is shown in Fig. 1(a).
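To trace Eqs. (1) to (3) end to end, the NumPy sketch below runs one forward pass over a single image-level graph, with `adj_w` standing in for $A'_I$. It assumes the Kipf and Welling propagation rule for $\mathrm{GCN}_l$ (symmetric normalization of the weighted adjacency with added self-loops, which is what PyG's `GCNConv` computes by default), treats dropout as the identity (inference mode), uses the row-vector convention `z @ W` instead of $W z$, and uses hypothetical layer sizes and random weights; it is not the authors' code.

```python
import numpy as np


def normalized_adjacency(adj_w):
    """D^-1/2 (A + I) D^-1/2 for a weighted adjacency without self-loops."""
    a_hat = adj_w + np.eye(adj_w.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def relu(z):
    return np.maximum(z, 0.0)


def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()


def c2p_forward(x, adj_w, gcn_weights, lin_weights, lin_biases, out_w, out_b):
    """One inference pass over a single image-level graph (dropout = identity)."""
    a_norm = normalized_adjacency(adj_w)
    h, pooled = x, []
    for w in gcn_weights:
        conv = a_norm @ h @ w             # GCN_l(X^(l), A'_I; W^(l))
        pooled.append(conv.mean(axis=0))  # P^(l): global mean pooling over nodes
        h = relu(conv)                    # Eq. (1); X^(l+1) = H^(l+1)
    z = np.concatenate(pooled)            # Z^(1) = P_concat
    for w, b in zip(lin_weights, lin_biases):
        z = relu(z @ w + b)               # Eq. (2)
    return softmax(z @ out_w + out_b)     # Eq. (3): class probabilities


rng = np.random.default_rng(0)
n_nodes, dims = 12, [69, 64, 32]          # hypothetical: 69 input features, two GCN layers
x = rng.normal(size=(n_nodes, dims[0]))
s = rng.random((n_nodes, n_nodes))
s = (s + s.T) / 2
adj_w = np.where(s > 0.8, s, 0.0)
np.fill_diagonal(adj_w, 0.0)
gcn_w = [rng.normal(scale=0.1, size=(i, o)) for i, o in zip(dims[:-1], dims[1:])]
lin_w, lin_b = [rng.normal(scale=0.1, size=(sum(dims[1:]), 16))], [np.zeros(16)]
out_w, out_b = rng.normal(scale=0.1, size=(16, 3)), np.zeros(3)  # 3 classes, as in Dataset I
probs = c2p_forward(x, adj_w, gcn_w, lin_w, lin_b, out_w, out_b)
print(probs.round(3))
```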

III Experiments

III-A Dataset

Our proposed method is evaluated on two colorectal cancer datasets. Dataset I [1], the Extended CRC dataset, consists of 300 images at 20x magnification (120 normal, 120 low-grade, and 60 high-grade images) with sizes of 4548 × 7548 and 5000 × 7300 pixels. We divide each large image into 768 × 768 patches with a stride of 128. For a fair comparison with other methods [4, 3, 1], we split the dataset into three equal folds for cross-validation, using the same assignment of images to folds. We train our model on two folds at a time and validate it on the remaining fold.

Dataset II [2], from the Department of Pathology of Zhejiang University in China, consists of 717 large images of various sizes at 40x magnification: 355 cancer and 362 normal images. For a fair comparison with [2], we randomly split the images in half, one half forming the training set and the other the test set, while preserving the class proportions. For this dataset, we divide each image into 768 × 768 patches with a stride of 256 for cell-graph construction.
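Because each patch becomes one node of the image-level graph, the patch grid also fixes the size of each graph. A minimal sketch, assuming patches are taken only where the full 768 × 768 window fits inside the image (no padding); `patch_origins` is a hypothetical helper.

```python
def patch_origins(height, width, patch=768, stride=128):
    """Top-left corners of all patch x patch windows that fit inside the image."""
    rows = range(0, height - patch + 1, stride)
    cols = range(0, width - patch + 1, stride)
    return [(r, c) for r in rows for c in cols]


print(len(patch_origins(4548, 7548)))                # 1590
print(len(patch_origins(5000, 7300)))                # 1768
print(len(patch_origins(10000, 10000, stride=256)))  # 1369
```

Under this assumption, a 4548 × 7548 Dataset I image with stride 128 yields 30 × 53 = 1590 nodes, a 5000 × 7300 image yields 34 × 52 = 1768, and an average-size Dataset II image (about 10000 × 10000 pixels [2]) with stride 256 yields 37 × 37 = 1369.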

III-B Implementation

C2P-GCN is implemented in PyTorch using the PyTorch Geometric (PyG) library. Node features are standardized using their respective mean and standard deviation. We use the Adam optimizer with a learning rate of 0.0002, selected by grid search. The batch size is 20, and the model is trained for 600 epochs. To avoid overfitting, we use dropout regularization with $p = 0.3$.
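The text does not say whether the standardization statistics come from the training graphs only. A minimal sketch, assuming they do (which avoids leaking test-set statistics into training), with the reported hyperparameters collected in a hypothetical `TRAIN_CONFIG` dict:

```python
import numpy as np

# Training settings reported in Section III-B, collected in one place for reference.
TRAIN_CONFIG = {"optimizer": "Adam", "lr": 2e-4, "batch_size": 20, "epochs": 600, "dropout": 0.3}


def standardize(train_feats, *other_feats, eps=1e-8):
    """Z-score every feature column with the mean and SD of the training nodes."""
    mu = train_feats.mean(axis=0)
    sd = train_feats.std(axis=0)
    sd = np.where(sd < eps, 1.0, sd)  # leave (near-)constant columns unscaled
    return [(f - mu) / sd for f in (train_feats, *other_feats)]


rng = np.random.default_rng(0)
train = rng.normal(5.0, 3.0, size=(100, 69))  # synthetic 69-dim patch features
test = rng.normal(5.0, 3.0, size=(40, 69))
train_z, test_z = standardize(train, test)
```

In PyTorch, these settings would correspond to `torch.optim.Adam(model.parameters(), lr=TRAIN_CONFIG["lr"])` and `torch.nn.Dropout(p=TRAIN_CONFIG["dropout"])`.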

III-C Experimental Results

To validate the efficacy of our approach, we compare it with several state-of-the-art methods. As shown in Table IV, C2P-GCN outperforms the CNN-based methods (MobileNet, ResNet50, Inception, Xception, and CA-CNN) and the Vision Transformer (ViT) by more than 8 percentage points. It also outperforms the GCN-based CGC-Net (95.00% vs. 93.33%). A small gap remains between our method and HAT-Net, which we believe could be closed with additional training data. Note that both CGC-Net and HAT-Net are trained on image patches, and their image-level predictions are obtained by majority voting over the patch-level predictions. For the Extended CRC dataset, [4] extracted a total of 114,243 patches from the large images and used approximately two-thirds of them to train HAT-Net. C2P-GCN, in contrast, is trained on image-level graphs, with only 200 training graphs (one per training image) used for each fold. C2P-GCN thus uses more than two orders of magnitude less training data than HAT-Net, at the cost of only 0.33 percentage points in accuracy (95.00% vs. 95.33%).
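Using the counts reported above, the training-set comparison with HAT-Net on Dataset I works out as follows (the two-thirds share is approximate):

```latex
\begin{aligned}
N_{\text{HAT-Net}} &\approx \tfrac{2}{3} \times 114{,}243 \approx 76{,}162 \ \text{training patches},\\
N_{\text{C2P-GCN}} &= 200 \ \text{training graphs per fold},\\
N_{\text{HAT-Net}} / N_{\text{C2P-GCN}} &\approx 76{,}162 / 200 \approx 381 \approx 10^{2.58}.
\end{aligned}
```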

On Dataset II, we consider only binary classification: the dataset is too imbalanced for multiclass classification with our method, because our approach trains on full images rather than on individual image patches. For binary classification, as Table V shows, our method achieves the highest accuracy among the methods evaluated on this dataset, exceeding SVM-CNN by 0.4 percentage points. Our model was trained with only 359 image-level graphs. The SVM-CNN work does not state its training set size directly; however, the average image in Dataset II covers 5.10 mm² (10000 × 10000 pixels) at 40x magnification, i.e., 226 nm/pixel [2]. Since the authors broke each large image into 672 × 672 patches and trained on patches extracted from half of the images, we estimate that C2P-GCN uses over two orders of magnitude less training data than SVM-CNN.
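The Dataset II estimate can be made explicit. The arithmetic below assumes the average 10000 × 10000 image is tiled into non-overlapping 672 × 672 patches (partial tiles at the border discarded) and that SVM-CNN trained on half of the 717 images, rounded down to 358; the tiling and rounding are our assumptions, since [2] does not report the exact patch count.

```python
import math

side_px, patch_px = 10_000, 672
per_side = side_px // patch_px               # 14 full tiles per side (14 * 672 = 9408)
patches_per_image = per_side**2              # 196
svm_cnn_train_images = 717 // 2              # 358, half of Dataset II
svm_cnn_patches = svm_cnn_train_images * patches_per_image  # 70168
c2p_gcn_graphs = 359                         # image-level graphs used to train C2P-GCN
ratio = svm_cnn_patches / c2p_gcn_graphs
print(patches_per_image, svm_cnn_patches, f"{ratio:.0f}", f"{math.log10(ratio):.2f}")
```

The ratio of roughly 195 (about $10^{2.29}$) is driven almost entirely by the number of patches per image, so it changes little with the exact rounding of the training split.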

IV Conclusion

In this paper, we introduced a novel GCN-based architecture, the cell-to-patch graph convolutional network (C2P-GCN), which integrates the structural information of a whole slide image (WSI) into one comprehensive graph using a dual-phase graph formation strategy. Applied to two colorectal cancer image datasets, our method not only yields performance comparable to or better than most recently proposed CNN- and GCN-based architectures but also does so with substantially less training data. This strong performance with far less training data supports the value of encoding whole-image structure in a single graph.

TABLE IV: Results on Dataset I
Method | Accuracy (%)
MobileNet [1] | 84.33 ± 3.30
ResNet50 [1] | 84.33 ± 0.94
Inception [3] | 84.67 ± 1.70
Xception [4] | 86.67 ± 0.94
CA-CNN [1] | 86.67 ± 1.70
ViT [4] | 86.67 ± 4.04
CGC-Net [3] | 93.33 ± 0.93
HAT-Net [4] | 95.33 ± 0.58
Ours (C2P-GCN) | 95.00 ± 1.70
TABLE V: Results on Dataset II
Method | Accuracy (%)
MCIL [8] | 95.5
TRANS [9] | 92.3
SVM-IMG [2] | 94.3
SVM-MF [2] | 90.1
SVM-CNN [2] | 98.0
Ours (C2P-GCN) | 98.4

References

  • [1] Muhammad Shaban et al. “Context-Aware Convolutional Neural Network for Grading of Colorectal Cancer Histology Images” In IEEE Transactions on Medical Imaging 39.7, 2020, pp. 2395–2405
  • [2] Y. Xu, Z. Jia and L.-B. Wang “Large scale tissue histopathology image classification, segmentation, and visualization via deep convolutional activation features” In BMC Bioinformatics 18.1, 2017, p. 281
  • [3] Yanning Zhou et al. “CGC-Net: Cell Graph Convolutional Network for Grading of Colorectal Cancer Histology Images” In 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), 2019, pp. 388–398
  • [4] Yihan Su et al. “HAT-Net: A Hierarchical Transformer Graph Neural Network for Grading of Colorectal Cancer Histology Images” In British Machine Vision Conference, 2021
  • [5] Bülent Yener “Cell-graphs: image-driven modeling of structure-function relationship” In Commun. ACM 60.1 New York, NY, USA: Association for Computing Machinery, 2016, pp. 74–84
  • [6] G. Lee et al. “Nuclear Shape and Architecture in Benign Fields Predict Biochemical Recurrence in Prostate Cancer Patients Following Radical Prostatectomy: Preliminary Findings” In European Urology Focus, 2017, pp. 457–466
  • [7] Hongming Xu et al. “Automatic Nuclei Detection Based on Generalized Laplacian of Gaussian Filters” In IEEE Journal of Biomedical and Health Informatics 21.3, 2017, pp. 826–837 DOI: 10.1109/JBHI.2016.2544245
  • [8] Y. Xu et al. “Weakly supervised histopathology cancer image segmentation and classification” In Medical Image Analysis, 2014
  • [9] Y. Song et al. “Discriminative data transform for image feature extraction and classification” In MICCAI, 2013