C2P-GCN: Cell-to-Patch Graph Convolutional Network for Colorectal Cancer Grading

Sudipta Paul
Department of Electrical Engineering
Rensselaer Polytechnic Institute
Troy, New York, USA
pauls5@rpi.edu

Bülent Yener
Department of Computer Science
Rensselaer Polytechnic Institute
Troy, New York, USA
yener@cs.rpi.edu

Amanda W. Lund
Department of Pathology
NYU Grossman School of Medicine
New York, NY, USA
amanda.lund@nyulangone.org

Abstract

Graph-based learning approaches, owing to their ability to encode tissue and organ structure, are increasingly favored for grading colorectal cancer histology images. Recent graph-based techniques divide whole slide images (WSIs) into small or medium-sized patches and build a graph on each patch, which is then used directly for training. This strategy, however, fails to capture the tissue structure present across an entire WSI and requires a very large training set of image patches. In this paper, we propose a novel cell-to-patch graph convolutional network (C2P-GCN), a two-stage graph construction approach. In the first stage, it forms a patch-level graph from the cell organization within each patch of a WSI. In the second stage, it forms an image-level graph in which each patch is a node and edges are defined by a similarity measure between patches. This graph representation is then fed into a multi-layer GCN-based classification network. Through this two-stage construction, our approach captures local structural details within individual patches and establishes meaningful connections among patches across a WSI. Because C2P-GCN integrates the structural data of an entire WSI into a single graph, it requires significantly less training data than recent colorectal cancer grading models. Experiments on two distinct colorectal cancer datasets demonstrate the effectiveness of our method.

Index Terms:

Graph convolutional network, Patch-level graph, Image-level graph, Cell graph, Colorectal cancer grading

I Introduction

Recent advances in deep learning have improved the diagnosis of colorectal cancer, outperforming traditional machine learning techniques based on hand-crafted features. Numerous convolutional neural network (CNN)-based architectures [1, 2] have been introduced in recent years to automatically grade colorectal histology images. Most of these methods split large images into smaller patches, which are then used for both training and prediction. Unfortunately, they fail to capture tissue structure information, because features extracted from small patches may not have a clear interpretive connection with the glandular architecture present in colorectal cancer images.

To alleviate these issues, several graph convolutional network (GCN)-based architectures [3, 4] have recently been introduced that exploit tissue structure information through the cell-graph technique [5]. For colorectal cancer grading, GCN-based architectures such as CGC-Net [3] and HAT-Net [4] have demonstrated notably better performance than other methods. However, like CNN-based methods, both CGC-Net and HAT-Net divide the large colorectal cancer (CRC) histology images into medium-sized patches and construct cell graphs on them. The graphs constructed on each patch are then used for training, which requires a substantially large amount of training data. While these methods capture tissue structure information within each patch, they do not establish a discernible interpretive connection across the whole image.

In this paper, we propose a novel Cell-to-Patch Graph Convolutional Network (C2P-GCN) that features a two-stage graph construction process: a patch-level graph and an image-level graph. The process begins by dividing a WSI into small or medium-sized patches. Then, at the patch level, a cell graph is constructed for each patch, along with several well-known global graphs (Voronoi graphs of cells) [6]. Next, an image-level graph is built over the entire large image or WSI, treating each patch as a node. The edges between these nodes (patches) are formed using a similarity measure between patches, which maps out an interpretive connection among similar patches throughout the whole image. Having encoded the entire image into a single graph, C2P-GCN classifies it with a multi-layer GCN-based classification network.

The key contributions of this paper are: (a) We introduce C2P-GCN, a novel dual-stage graph construction approach. It harnesses tissue structure information from local cellular organization via patch-level graphs. It then uses an image-level graph to form meaningful connections among patches of a WSI that exhibit similar characteristics. (b) We perform comprehensive experiments on two distinct colorectal cancer datasets: the Extended CRC dataset [1] and the Colon Cancer Dataset from Zhejiang University in China [2]. These experiments show that C2P-GCN performs on par with, or better than, recently proposed CNN- and GCN-based methods. (c) Since C2P-GCN incorporates the structural details of an entire WSI into a single graph, our model requires considerably less training data. On the tested datasets, C2P-GCN uses over two orders of magnitude less training data than state-of-the-art methods while yielding comparable or better results.

Figure 1: (a) Overall C2P-GCN pipeline. C2P-GCN first divides a WSI into multiple patches and then constructs a patch-level graph to capture structural features within each patch. Next, it forms an image-level graph over all patches based on a similarity measure. The image-level graph, which contains whole-image structural information, is fed into a multi-layer GCN for classification. (b) Nuclei detection with a gLoG filter. (c) Cell-graph construction; the magnified section shows how a sample node (nucleus) is connected to its neighborhood. (d) Voronoi diagram. (e) Delaunay triangulation. (f) Minimum spanning tree. (g) A randomly chosen node of interest (red patch) and its 15 most similar patches (blue) by similarity score.

II Methodology

The complete pipeline of the proposed method is depicted in Fig. 1(a). A WSI is divided into small or medium-sized patches, and patch-level and image-level graphs are constructed. Finally, the image-level graph is fed into a multi-layer GCN for classification.

II-A Cell identification

Once a WSI is divided into patches, the next task is to detect the cells in each patch so that patch-level graphs can be constructed in the following step. The proposed method does not require precise cell segmentation: cell locations alone are sufficient to form the graphs and extract the necessary features. In this work, we adopt the automatic nuclei detection technique based on generalized Laplacian of Gaussian (gLoG) filters developed in [7]. An example of nuclei detection with a gLoG filter on a colorectal cancer image patch is shown in Fig. 1(b). Table I lists the gLoG kernel parameters and the values we used to detect nuclei in the colorectal cancer images.

TABLE I: gLoG kernel parameters

x-axis scale, $\sigma_{x}$	y-axis scale, $\sigma_{y}$	Orientation, $k$	Bandwidth, $w$
8	4	9	7
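To make the filter in Table I concrete, the following is a minimal NumPy sketch of an oriented gLoG kernel bank. It is not the implementation of [7], and several of its details are assumptions:

- It uses the standard rotated-Gaussian parametrization.
- It reads the orientation parameter $k$ in Table I as the number of orientations sampled uniformly in $[0,\pi)$.
- The function names `glog_kernel` and `glog_bank`, the $3\sigma$ kernel support, and the lack of kernel normalization are illustrative choices only.
- The bandwidth $w$ is not used.
- The detection steps that follow filtering in [7] are omitted.

```python
import numpy as np


def glog_kernel(sigma_x: float, sigma_y: float, theta: float) -> np.ndarray:
    """Oriented generalized LoG kernel (hypothetical helper).

    G(x, y) = exp(-(a x^2 + 2 b x y + c y^2)) is a Gaussian rotated by theta;
    the returned kernel is its analytic Laplacian on a square grid whose
    half-width is ceil(3 * max(sigma_x, sigma_y)). No normalization is applied.
    """
    a = np.cos(theta) ** 2 / (2 * sigma_x**2) + np.sin(theta) ** 2 / (2 * sigma_y**2)
    b = -np.sin(2 * theta) / (4 * sigma_x**2) + np.sin(2 * theta) / (4 * sigma_y**2)
    c = np.sin(theta) ** 2 / (2 * sigma_x**2) + np.cos(theta) ** 2 / (2 * sigma_y**2)
    half = int(np.ceil(3 * max(sigma_x, sigma_y)))
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    g = np.exp(-(a * x**2 + 2 * b * x * y + c * y**2))
    # For G = exp(-f): d2G/dx2 + d2G/dy2 = (f_x^2 + f_y^2 - f_xx - f_yy) * G
    f_x = 2 * a * x + 2 * b * y
    f_y = 2 * b * x + 2 * c * y
    return (f_x**2 + f_y**2 - 2 * a - 2 * c) * g


def glog_bank(sigma_x: float, sigma_y: float, n_orientations: int) -> list:
    """Kernels at n_orientations angles spaced uniformly in [0, pi) (assumption)."""
    return [glog_kernel(sigma_x, sigma_y, i * np.pi / n_orientations)
            for i in range(n_orientations)]
```

With the Table I values, `glog_bank(8, 4, 9)` yields nine 49 × 49 kernels. A detector built on this bank would typically convolve a patch with each kernel and aggregate the responses before locating nuclei. Those steps, and any normalization used in [7], are left out of this sketch.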

II-B Patch graph formation

II-B1 Cell graph

Let $G_{p}=(V_{p},E_{p})$ be the cell graph constructed on a patch, where $V_{p}$ and $E_{p}$ denote the sets of nodes and edges of the graph. The cells detected in the previous stage form the node set $V_{p}$. Whether an edge $(u,v)$ connects two nodes (cells) $u$ and $v$ is determined by biological understanding of the known interactions between cells in a particular tissue type. We hypothesize that cells closer together in Euclidean distance are more likely to interact. Therefore, an edge is assigned between two nuclei if they lie within a predefined distance $d_{p}$ of each other, which gives the adjacency matrix:

$$A_{p}(i,j)=\begin{cases}1 & \text{if } D_{p}(i,j)<d_{p}\\ 0 & \text{otherwise}\end{cases}$$

Here, $D_{p}(i,j)$ denotes the Euclidean distance between nuclei $i$ and $j$, and $d_{p}$ is set to $64$. An example of such a graph built within a patch is illustrated in Fig. 1(c). The cell graph constructed on each patch makes it possible to assess and quantify the interactions occurring within localized neighborhoods. A total of $18$ distinct cell-graph features, listed in Table II [5], are extracted to characterize this local cell organization within each patch.

TABLE II: Cell graph features

Feature type	No.	Description
Connectedness and cliquishness measures	4	Average degree; Clustering coefficient; Giant connected component ratio; Number of connected components
Distance-based measures	8	Average eccentricity; Diameter; Radius; Average path length; Number of central points; Percent of central points; Number of vertices; Number of edges
Spectral measures	6	Largest eigenvalue of adjacency; Trace of adjacency; Energy of adjacency; Lower slope; Upper slope; Trace of Laplacian
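The thresholded cell graph and a few of the Table II features can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' code. The helper names are hypothetical, self-loops are excluded by assumption, and only a subset of the 18 features is computed:

- Connectedness: average degree, number of connected components, and giant connected component ratio.
- Distance-based: number of vertices and number of edges.
- Spectral: largest eigenvalue of adjacency and energy of adjacency (the sum of absolute eigenvalues).

```python
import numpy as np


def cell_graph(centroids, d_p: float = 64.0) -> np.ndarray:
    """Binary adjacency A_p: connect nuclei whose Euclidean distance is below d_p."""
    pts = np.asarray(centroids, dtype=float)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    adj = (dist < d_p).astype(float)
    np.fill_diagonal(adj, 0.0)  # exclude self-loops (assumption)
    return adj


def connected_components(adj: np.ndarray) -> list:
    """Node-index lists, one per connected component (depth-first traversal)."""
    n = adj.shape[0]
    seen = np.zeros(n, dtype=bool)
    comps = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, comp = [start], []
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in np.flatnonzero(adj[u]):
                if not seen[v]:
                    seen[v] = True
                    stack.append(int(v))
        comps.append(comp)
    return comps


def some_cell_graph_features(adj: np.ndarray) -> dict:
    """A subset of the Table II features (hypothetical key names)."""
    n = adj.shape[0]
    comps = connected_components(adj)
    eig = np.linalg.eigvalsh(adj)  # adjacency is symmetric; eigenvalues ascending
    return {
        "num_vertices": n,
        "num_edges": int(adj.sum() // 2),
        "average_degree": float(adj.sum(axis=1).mean()),
        "num_connected_components": len(comps),
        "giant_component_ratio": max(len(c) for c in comps) / n,
        "largest_eigenvalue_adjacency": float(eig[-1]),
        "energy_adjacency": float(np.abs(eig).sum()),
    }
```

For nuclei at $(0,0)$, $(10,0)$ and $(200,200)$ with $d_p=64$, only the first two are connected. This gives one edge, two connected components, and a giant component ratio of $2/3$.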

II-B2 Voronoi graphs of cells for global graph

We apply three methods to construct global graphs in each patch: the Voronoi diagram (VD), Delaunay triangulation (DT), and the minimum spanning tree (MST). Illustrations of each applied to an individual patch are shown in Fig. 1(d), 1(e), and 1(f), respectively. To quantify the global graph tessellations of cells within an entire patch, $24$ distinct features ($12$ Voronoi, $8$ Delaunay, and $4$ MST) are extracted, as presented in Table III [6]. In addition, $27$ nuclear density attributes are included to characterize the clustering of nuclei within the patch.

TABLE III: Global graph features

Feature type	No.	Description
Voronoi diagram	12	Polygon area, chord length, perimeter: mean, SD, min/max ratio, disorder
Delaunay triangulation	8	Triangle side length, area: mean, SD, min/max ratio, disorder
Minimum spanning tree	4	Edge length: mean, SD, min/max ratio, disorder
Nuclei nearest neighbor (NN) features	27	Area of polygons; number of nuclei; density of nuclei; mean, SD, and disorder of distance to $k$-NN ($k=3,5,7$); mean, SD, and disorder of NN in an $n$-pixel radius ($n=10,20,30,40,50$)
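Several Table III features reduce to summary statistics of a set of lengths. The sketch below is pure NumPy and covers three of them:

- The four MST features, using Prim's algorithm on the complete Euclidean graph of nuclei.
- Per-nucleus $k$-NN distances.
- Neighbor counts within a radius.

It rests on stated assumptions and hypothetical helper names. "Disorder" is taken to be $1 - 1/(1 + \mathrm{SD}/\mathrm{mean})$, and the $k$-NN feature uses each nucleus's mean distance to its $k$ nearest neighbors. Both are our reading rather than definitions given here. Voronoi and Delaunay tessellations are not reimplemented; `scipy.spatial.Voronoi` and `scipy.spatial.Delaunay` are standard ways to obtain them.

```python
import numpy as np


def summary_stats(values) -> dict:
    """Mean, SD, min/max ratio and disorder of a set of lengths (assumed definitions)."""
    v = np.asarray(values, dtype=float)
    mean, sd = v.mean(), v.std()
    return {
        "mean": mean,
        "sd": sd,
        "min_max_ratio": v.min() / v.max(),
        "disorder": 1.0 - 1.0 / (1.0 + sd / mean),  # assumed definition
    }


def pairwise_distances(points) -> np.ndarray:
    pts = np.asarray(points, dtype=float)
    return np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)


def mst_edge_lengths(points) -> np.ndarray:
    """Edge lengths of the Euclidean minimum spanning tree (Prim's algorithm)."""
    dist = pairwise_distances(points)
    n = dist.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()  # cheapest known connection of each node to the tree
    lengths = []
    for _ in range(n - 1):
        cand = np.where(in_tree, np.inf, best)
        j = int(np.argmin(cand))
        lengths.append(cand[j])
        in_tree[j] = True
        best = np.minimum(best, dist[j])
    return np.array(lengths)


def knn_mean_distances(points, k: int) -> np.ndarray:
    """Per-nucleus mean distance to its k nearest neighbours (self excluded)."""
    dist = pairwise_distances(points)
    return np.sort(dist, axis=1)[:, 1:k + 1].mean(axis=1)


def neighbours_within(points, radius: float) -> np.ndarray:
    """Per-nucleus count of other nuclei within `radius` pixels."""
    dist = pairwise_distances(points)
    return (dist <= radius).sum(axis=1) - 1  # subtract the nucleus itself
```

For instance, `summary_stats(mst_edge_lengths(centroids))` would give the four MST features. Applying `summary_stats` to `knn_mean_distances(centroids, k)` for $k=3,5,7$, or to `neighbours_within(centroids, n)` for $n=10,\ldots,50$, would give the mean, SD and disorder entries of the NN row. The min/max ratio from those calls is simply not used.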

II-C Image-level graph

For the image-level graph $G_{I}=(V_{I},E_{I})$, we treat each patch of a WSI as a node in $V_{I}$ with a $d$-dimensional feature vector $x_{i}\in\mathbb{R}^{d}$ for $i\in V_{I}$. The node attributes are the same feature set extracted from each patch at the patch level, i.e., $d=69$ ($18$ cell-graph, $24$ global-graph, and $27$ nuclear density features). $E_{I}$ denotes the edge set, and $e_{i,j}\in E_{I}$ denotes an edge between two patches. To decide whether to connect a pair of nodes (patches), we assess the similarity of their attributes, so that nodes with similar characteristics are connected. Specifically, we use cosine similarity and form an edge between a pair of nodes if their similarity exceeds a predefined threshold $\theta$. The edge weight is set to the similarity value:

$$w_{ij}=\cos(x_{i},x_{j})=\frac{x_{i}\cdot x_{j}}{\lVert x_{i}\rVert\,\lVert x_{j}\rVert}$$

where $w_{ij}$ is the weight of the edge between nodes $i$ and $j$. The adjacency matrix of the image-level graph is then

$$A_{I}(i,j)=\begin{cases}w_{ij} & \text{if } w_{ij}>\theta\\ 0 & \text{otherwise}\end{cases}$$

Edges between patches with similar characteristics establish meaningful links across the whole image. We set $\theta=0.8$, chosen by manual tuning. Fig. 1(g) shows a specific patch and its similarity-based connections to other patches.
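The cosine-similarity graph over patches can be sketched in a few lines of NumPy. The function name is hypothetical. Self-similarity is dropped by assumption, since the text does not say how the diagonal is treated. The text also does not say whether features are standardized before the similarity is computed, and centering the features would change the cosine values. Besides the dense $A_I$, the sketch returns a COO `edge_index`/`edge_weight` pair, the layout that PyG layers such as `GCNConv` accept after conversion with `torch.from_numpy`.

```python
import numpy as np


def image_level_graph(features, theta: float = 0.8):
    """Patch graph with edge weight w_ij = cos(x_i, x_j), kept only if w_ij > theta.

    Returns (A_I, edge_index, edge_weight); edge_index has shape (2, num_edges).
    """
    x = np.asarray(features, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.clip(norms, 1e-12, None)  # guard against all-zero feature vectors
    sim = unit @ unit.T                      # pairwise cosine similarities
    adj = np.where(sim > theta, sim, 0.0)    # threshold, keeping similarity as weight
    np.fill_diagonal(adj, 0.0)               # drop self-similarity (assumption)
    src, dst = np.nonzero(adj)
    edge_index = np.stack([src, dst])        # both directions are present (symmetric)
    edge_weight = adj[src, dst]
    return adj, edge_index, edge_weight
```

For three patches with features $(1,0)$, $(1,0.1)$ and $(0,1)$, only the first two exceed $\theta=0.8$ (similarity $1/\sqrt{1.01}\approx 0.995$). They are therefore linked in both directions, and the third patch is isolated.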

II-D Network architecture

For the network architecture, we use a multi-layer graph convolutional network (GCN) followed by a series of linear layers and a softmax classification layer. Each graph convolutional layer $\text{GCN}_{l}$, with ReLU activation and dropout, can be written as

$$H^{(l+1)}=\text{Dropout}\left(\text{ReLU}\left(\text{GCN}_{l}\left(X^{(l)},A^{\prime}_{I};W^{(l)}\right)\right)\right)\tag{1}$$

Here, $A^{\prime}_{I}$ is the weighted adjacency matrix with $(A^{\prime}_{I})_{ij}=(A_{I})_{ij}\cdot w_{ij}$. $W^{(l)}$ is the learnable weight matrix of layer $l$, $X^{(l)}$ is the input to GCN layer $l$, and $H^{(l+1)}$ is the node feature matrix at layer $l+1$. We then apply global mean pooling to the feature representation of each layer and concatenate the pooled features from all $L$ GCN layers:

$$P_{\text{concat}}=\text{Concatenate}\left(P^{(1)},P^{(2)},\ldots,P^{(L)}\right),\quad\text{where}\quad P^{(l)}=\text{GlobalMeanPooling}\left(\text{GCN}_{l}\left(X^{(l)},A^{\prime}_{I};W^{(l)}\right)\right)$$

$P_{\text{concat}}$ is now passed through a series of linear layers with ReLU activations and dropout:

$$Z^{(l+1)}=\text{Dropout}\left(\text{ReLU}\left(W^{(l)}_{\text{linear}}Z^{(l)}+b^{(l)}_{\text{linear}}\right)\right)\tag{2}$$

where $Z^{(1)}=P_{\text{concat}}$, $W^{(l)}_{\text{linear}}$ and $b^{(l)}_{\text{linear}}$ are the weights and biases of the linear layers, and $l$ ranges from $1$ to the number of linear layers $N$. Finally, a softmax layer is applied to the output of the last linear layer for classification:

$$Y=\text{Softmax}\left(W^{(N+1)}_{\text{linear}}Z^{(N+1)}+b^{(N+1)}_{\text{linear}}\right)\tag{3}$$

The multi-layer GCN architecture used in C2P-GCN, along with the dimension of each layer, is shown in Fig. 1(a).
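The following NumPy sketch traces an inference-time forward pass through Eqs. (1)–(3); dropout is the identity at inference, so it does not appear. It is illustrative only:

- For $\text{GCN}_l$ it assumes the symmetric normalization $D^{-1/2}(A+I)D^{-1/2}$ of Kipf and Welling, which is also the default of PyG's `GCNConv` when edge weights are supplied. Layer biases are omitted.
- The layer widths used in the example call are hypothetical, since the actual dimensions appear only in Fig. 1(a).
- The adjacency passed in is taken to be the weighted matrix $A'_I$, however it is formed.
- Following the equation for $P^{(l)}$, pooling is applied to each GCN output before the ReLU.

```python
import numpy as np


def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """D^-1/2 (A + I) D^-1/2 with weighted degrees (assumed GCN propagation rule)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def relu(z: np.ndarray) -> np.ndarray:
    return np.maximum(z, 0.0)


def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()


def c2p_gcn_forward(x, adj, gcn_weights, linear_layers, out_layer):
    """Class probabilities Y for one image-level graph (hypothetical helper).

    x: (num_nodes, d) node features; adj: weighted adjacency A'_I;
    gcn_weights: [W^(1), ..., W^(L)]; linear_layers: [(W, b), ...] for Eq. (2);
    out_layer: (W^(N+1), b^(N+1)) for Eq. (3).
    """
    a_norm = normalize_adjacency(adj)
    h, pooled = np.asarray(x, dtype=float), []
    for w in gcn_weights:                 # Eq. (1)
        z = a_norm @ h @ w                # GCN_l(X^(l), A'_I; W^(l))
        pooled.append(z.mean(axis=0))     # P^(l): global mean pooling over nodes
        h = relu(z)                       # input to the next GCN layer
    zvec = np.concatenate(pooled)         # Z^(1) = P_concat
    for w, b in linear_layers:            # Eq. (2)
        zvec = relu(w @ zvec + b)
    w_out, b_out = out_layer
    return softmax(w_out @ zvec + b_out)  # Eq. (3)
```

Consider an example call with the hypothetical widths 69 → 32 → 32, two GCN layers, and concatenated pooling of size 64. One hidden linear layer of width 16 and three output classes (normal, low grade, high grade for Dataset I) then return a probability vector of length 3. During training, PyG's `GCNConv` and `global_mean_pool` would play the roles of the matrix products and `mean` above.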

III Experiments

III-A Dataset

Our proposed method is evaluated on two colorectal cancer datasets. Dataset I [1], the Extended CRC dataset, consists of $300$ images at $20$x magnification ($120$ normal, $120$ low grade, and $60$ high grade) of size $4548\times 7548$ or $5000\times 7300$ pixels. We divide each large image into $768\times 768$ patches with a stride of $128$. To ensure a fair comparison with other methods [4, 3, 1], we divide the dataset into three equal folds for cross-validation, keeping the same assignment of images to folds. We train our model on two folds at a time and validate it on the remaining fold.
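As a rough indication of graph size, the number of nodes in a Dataset I image-level graph can be worked out from the patch size $p=768$ and stride $s=128$. This assumes patches are taken only at positions that fit entirely inside an $H\times W$ image, with no padding; the text does not specify border handling.

```latex
n_{\text{patches}} = \left(\left\lfloor \frac{H-p}{s} \right\rfloor + 1\right)
                     \left(\left\lfloor \frac{W-p}{s} \right\rfloor + 1\right)

4548 \times 7548:\quad (\lfloor 3780/128 \rfloor + 1)(\lfloor 6780/128 \rfloor + 1) = 30 \cdot 53 = 1590

5000 \times 7300:\quad (\lfloor 4232/128 \rfloor + 1)(\lfloor 6532/128 \rfloor + 1) = 34 \cdot 52 = 1768
```

Under this assumption, each image-level graph has on the order of $1.6$–$1.8$ thousand nodes.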

Dataset II [2], from the Department of Pathology of Zhejiang University in China, consists of $717$ large images of various sizes at $40$x magnification: $355$ cancer and $362$ normal images. For a fair comparison with [2], we randomly split the images so that half form the training set and the other half form the test set, preserving the class proportions. For this dataset, we divide each image into $768\times 768$ patches with a stride of $256$ for cell-graph construction.

III-B Implementation

C2P-GCN is implemented in PyTorch using the PyTorch Geometric (PyG) library. Node features are standardized using their mean and standard deviation. We use the Adam optimizer with a learning rate of $0.0002$, selected through grid search. The batch size is $20$, and the model is trained for $600$ epochs. To avoid overfitting, we use dropout with $p=0.3$.
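The standardization step and the reported hyperparameters can be summarized in a short sketch. The text does not say which data the mean and standard deviation are computed over. Here they are assumed to be fitted on all nodes of the training graphs only and then reused for the test graphs. The helper names and the `HPARAMS` dictionary are hypothetical. In PyTorch, the optimizer setting corresponds to `torch.optim.Adam(model.parameters(), lr=2e-4)`, and batching 20 graphs can be done with `torch_geometric.loader.DataLoader(..., batch_size=20)`.

```python
import numpy as np

# Training settings reported in the text (dictionary layout is illustrative).
HPARAMS = {"optimizer": "Adam", "lr": 2e-4, "batch_size": 20, "epochs": 600, "dropout": 0.3}


def fit_standardizer(train_features, eps: float = 1e-8):
    """Per-feature mean and SD over all nodes of all training graphs (assumption)."""
    stacked = np.concatenate([np.asarray(f, dtype=float) for f in train_features], axis=0)
    return stacked.mean(axis=0), stacked.std(axis=0) + eps  # eps avoids division by zero


def standardize(features, mean, std):
    """Z-score node features with statistics from fit_standardizer."""
    return (np.asarray(features, dtype=float) - mean) / std
```

Fitting on the training graphs only keeps test-set statistics out of training. The 69 node features mix counts, lengths and spectral quantities on very different scales, so standardization matters here.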

III-C Experimental Results

To validate the efficacy of our approach, we compare it with several state-of-the-art methods. As shown in Table IV, C2P-GCN outperforms all the CNN-based methods by a large margin (at least $8.33$ percentage points): MobileNet, ResNet50, Inception, Xception, CA-CNN, and ViT. It also comfortably outperforms the GCN-based CGC-Net, by $1.67$ percentage points. There is a small gap between our method and HAT-Net, which we believe could be narrowed by incorporating additional training data. Note that both CGC-Net and HAT-Net were trained on image patches, and majority voting over patch-level predictions was used to produce the image-level prediction. For the Extended CRC dataset, [4] extracted a total of $114243$ patches from the large images and used approximately two-thirds of them to train HAT-Net. In contrast, we trained C2P-GCN on image-level graphs, using only $200$ training graphs for each fold (one per image in the two training folds). C2P-GCN thus uses more than two orders of magnitude less training data than HAT-Net, while its accuracy is only $0.33$ percentage points lower.
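The training-data comparison with HAT-Net follows from the figures above:

```latex
\tfrac{2}{3} \times 114243 = 76162 \ \text{training patches (HAT-Net)}, \qquad
\frac{76162}{200} \approx 381 > 10^{2}

95.33\% - 95.00\% = 0.33 \ \text{percentage points}
```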

On Dataset II, we perform only binary classification. Because our approach trains on full images rather than individual image patches, the dataset is too imbalanced for multiclass classification with our method. As Table V shows, our method achieves the highest accuracy among the methods evaluated on this dataset, exceeding SVM-CNN by $0.4$ percentage points. Our model was trained with only $359$ image-level graphs. The SVM-CNN work does not report its training-set size directly. However, images in Dataset II have an average area of $5.10\ \text{mm}^{2}$ (about $10000\times 10000$ pixels) at 40x magnification, i.e., $226$ nm/pixel [2]. The authors divided each large image into $672\times 672$ patches and trained on patches extracted from half of the images. We therefore estimate that C2P-GCN uses over two orders of magnitude less training data than SVM-CNN.
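The estimate above can be made explicit under two assumptions not stated in the text. First, SVM-CNN patches are non-overlapping. Second, SVM-CNN trained on the same number of images ($359$, half of the dataset) as C2P-GCN.

```latex
\left\lfloor \frac{10000}{672} \right\rfloor^{2} = 14^{2} = 196 \ \text{patches per image}

196 \times 359 = 70364 \ \text{training patches}, \qquad
\frac{70364}{359} = 196 > 10^{2}
```

Any overlap between patches would only increase the ratio.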

IV Conclusion

In this paper, we introduced a novel GCN-based architecture, the cell-to-patch graph convolutional network (C2P-GCN). It integrates the structural information of a whole slide image (WSI) into one comprehensive graph using a dual-phase graph construction strategy. On two different colorectal cancer image datasets, our method yields performance comparable to or better than most recently proposed CNN- and GCN-based architectures, and it does so with substantially less training data. The strong performance of C2P-GCN supports the value of this whole-image graph representation for colorectal cancer grading.

TABLE IV: Results on Dataset I

Methods	Accuracy (%)	Methods	Accuracy (%)
MobileNet [1]	84.33 $\pm$ 3.30	ResNet50 [1]	84.33 $\pm$ 0.94
Inception [3]	84.67 $\pm$ 1.70	Xception [4]	86.67 $\pm$ 0.94
CA-CNN [1]	86.67 $\pm$ 1.70	ViT [4]	86.67 $\pm$ 4.04
CGC-Net [3]	93.33 $\pm$ 0.93	HAT-Net [4]	95.33 $\pm$ 0.58
Ours (C2P-GCN)	95.00 $\pm$ 1.70

TABLE V: Results on Dataset II

Methods	Accuracy	Methods	Accuracy
MCIL [8]	$95.5\%$	TRANS [9]	$92.3\%$
SVM-IMG [2]	$94.3\%$	SVM-MF [2]	$90.1\%$
SVM-CNN [2]	$98.0\%$	Ours (C2P-GCN)	$98.4\%$

References

[1] Muhammad Shaban et al. “Context-Aware Convolutional Neural Network for Grading of Colorectal Cancer Histology Images” In IEEE Transactions on Medical Imaging 39.7, 2020, pp. 2395–2405
[2] Yan Xu et al. “Large scale tissue histopathology image classification, segmentation, and visualization via deep convolutional activation features” In BMC Bioinformatics 18.1, 2017, p. 281
[3] Yanning Zhou et al. “CGC-Net: Cell Graph Convolutional Network for Grading of Colorectal Cancer Histology Images” In 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), 2019, pp. 388–398
[4] Yihan Su et al. “HAT-Net: A Hierarchical Transformer Graph Neural Network for Grading of Colorectal Cancer Histology Images” In British Machine Vision Conference, 2021
[5] Bülent Yener “Cell-graphs: image-driven modeling of structure-function relationship” In Commun. ACM 60.1 New York, NY, USA: Association for Computing Machinery, 2016, pp. 74–84
[6] G. Lee et al. “Nuclear Shape and Architecture in Benign Fields Predict Biochemical Recurrence in Prostate Cancer Patients Following Radical Prostatectomy: Preliminary Findings” In European Urology Focus, 2017, pp. 457–466
[7] Hongming Xu et al. “Automatic Nuclei Detection Based on Generalized Laplacian of Gaussian Filters” In IEEE Journal of Biomedical and Health Informatics 21.3, 2017, pp. 826–837 DOI: 10.1109/JBHI.2016.2544245
[8] Y. Xu et al. “Weakly supervised histopathology cancer image segmentation and classification” In Medical Image Analysis, 2014
[9] Y. Song et al. “Discriminative data transform for image feature extraction and classification” In Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2013