Graph Neural Networks in Histopathology: Emerging Trends and Future Directions

Siemen Brussee Leiden University Medical Center, The Netherlands Giorgio Buzzanca Leiden University Medical Center, The Netherlands Anne M.R. Schrader M.D Leiden University Medical Center, The Netherlands Jesper Kers M.D Leiden University Medical Center, The Netherlands Amsterdam University Medical Center, The Netherlands
(27-03-2024)
Abstract

Histopathological analysis of Whole Slide Images (WSIs) has seen a surge in the utilization of deep learning methods, particularly Convolutional Neural Networks (CNNs). However, CNNs often fall short in capturing the intricate spatial dependencies inherent in WSIs. Graph Neural Networks (GNNs) present a promising alternative, adept at directly modeling pairwise interactions and effectively discerning the topological tissue and cellular structures within WSIs. Recognizing the pressing need for deep learning techniques that harness the topological structure of WSIs, the application of GNNs in histopathology has experienced rapid growth. In this comprehensive review, we survey GNNs in histopathology, discuss their applications, and explore emerging trends that pave the way for future advancements in the field. We begin by elucidating the fundamentals of GNNs and their potential applications in histopathology. Leveraging quantitative literature analysis, we identify four emerging trends: Hierarchical GNNs, Adaptive Graph Structure Learning, Multimodal GNNs, and Higher-order GNNs. Through an in-depth exploration of these trends, we offer insights into the evolving landscape of GNNs in histopathological analysis. Based on our findings, we propose future directions to propel the field forward. Our analysis serves to guide researchers and practitioners towards innovative approaches and methodologies, fostering advancements in histopathological analysis through the lens of graph neural networks.

Keywords— Graph Neural Networks, Computational Pathology, Graph Representation Learning, Hierarchical Graph Representation Learning, Adaptive Graph Structure Learning, Multimodal Graph Representation Learning, Higher-order Graph Representation Learning

1 Introduction

Histopathology analysis is an important diagnostic and examination tool that can be used for disease diagnosis, estimating disease prognosis, and selecting or monitoring therapeutic strategies. Since the digitization of whole slide images (WSIs) in the early 2000s, the computational analysis of histopathology images has become an increasingly important part of histopathology. Starting with image analysis algorithms, the field transitioned to a deep learning approach following the rise of convolutional neural networks in the 2010s, largely due to the availability of large datasets (e.g., ImageNet [1]) and deeper convolutional architectures (e.g., AlexNet [2]). In the last 5 years, paradigms in the field have become more heterogeneous, with the advent of attention-based multiple instance learning [3] [4], vision transformers [5] [6], self-supervised learning [7] [8], and graph neural network [9] [10] approaches.

The emergence of Graph Neural Networks (GNNs) [9] has allowed effective modeling of naturally graph-structured data, such as social networks, (bio)chemical molecules [11] [12], and geospatial data [13] [14], as well as tabular data that can be effectively modeled as a graph, such as in recommendation systems [15] and drug interactions [16]. GNNs can be effectively used for problems involving pairwise interactions between entities in data. In addition, the topological inductive bias that can be encoded in the graph structure allows GNN models to learn based on the topology of the problem. We can define the graph neural network as an optimizable transformation on all graph attributes that preserves graph symmetries by being permutation invariant [17]. Fundamental to the graph neural network is the notion of message-passing, in which a learned transformation exchanges feature information between entities in the graph, leading to topology-aware feature vectors. How the message-passing function is defined depends on the type of GNN used, of which many varieties exist (e.g., GCN [18], GAT [19], GIN [20]). In 2018, GNNs were also introduced to histopathology [10] and have gained tremendous popularity in the field since then.

While review papers on the application of GNNs in histopathology exist, they give a general overview [21] or focus on the clinical applications of GNNs in histopathology [22]. Instead, we focus on identifying and quantifying emerging trends in the application of GNNs in histopathology and use these to provide future directions in the field.

Our review is organized into three main sections: First, we introduce GNNs, and their applications in histopathology. Secondly, we identify emerging trends in the application of GNNs in histopathology, from which we select some emerging paradigms which we discuss in more depth (Figure 1). Thirdly, based on our findings, we provide future directions for the field.

Figure 1: Overview of the four emerging subtopics of GNNs in Histopathology, covered in this review: Hierarchical GNNs, Multimodal GNNs, Higher-order Graphs, and Adaptive Graph Structure Learning. [23] [24] [25] [26] [27]

2 Graph Neural Networks in Histopathology

2.1 Graph Neural Networks

A graph $G$ is defined as a set of nodes $V$ connected by edges $E$: $G=(V,E)$. Each edge is a tuple of nodes, so the edge set satisfies $E \subseteq \{(x,y) \mid x,y \in V\}$. The connectivity of the nodes in a graph is captured in the adjacency matrix $A \in \mathbb{R}^{n \times n}$, where $n$ is the number of nodes in $G$. Each entry $a_{ij} \in A$ denotes the existence of an edge $e_{ij} \in E$ as follows:

$$a_{ij} = \begin{cases} 1, & \text{if } e_{ij} \in E \\ 0, & \text{if } e_{ij} \notin E \end{cases} \tag{1}$$

Alternatively, the values of $a_{ij}$ can denote edge weights ranging from 0 to 1, which represent the connectivity strength between nodes $i$ and $j$. Given an undirected graph $G=(V,E)$, we can define the $k$-neighborhood of any node $v \in V$, denoted $N_k(v)$, recursively as follows:

$$N_0(v) = \{v\}, \tag{2}$$
$$N_1(v) = \{u \mid (v,u) \in E \text{ or } (u,v) \in E\}, \tag{3}$$
$$N_k(v) = \{u \mid \exists w \in N_{k-1}(v) \text{ such that } (w,u) \in E \text{ or } (u,w) \in E\}. \tag{4}$$
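
The recursive definition in Equations 2 to 4 can be traced with a short sketch. The function name `k_neighborhood` and the toy edge list are illustrative assumptions, not part of the reviewed methods; the code only follows the recursion literally.

```python
def k_neighborhood(edges, v, k):
    """Return N_k(v) for an undirected graph given as a list of (x, y) pairs."""
    # Build an undirected adjacency map: every edge can be walked both ways.
    neighbors = {}
    for x, y in edges:
        neighbors.setdefault(x, set()).add(y)
        neighbors.setdefault(y, set()).add(x)

    current = {v}  # N_0(v) = {v}
    for _ in range(k):
        # N_k(v): every node adjacent to at least one node of N_{k-1}(v)
        current = {u for w in current for u in neighbors.get(w, set())}
    return current


# Toy path graph 0 - 1 - 2 - 3 (hypothetical example data)
edges = [(0, 1), (1, 2), (2, 3)]
n1 = k_neighborhood(edges, 0, 1)
n2 = k_neighborhood(edges, 0, 2)
```

Read literally, the recursion lets $N_2(v)$ contain $v$ itself, because a walk of length two can return to its starting node.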

GNNs aggregate feature information from the $k$-neighborhood of each node, where $k$ directly corresponds to the number of GNN layers used. This aggregated information is used to update the node feature representation, $h$, in each GNN layer. Mathematically, the node representation update is defined as follows:

$$\begin{split} h_u^{(k+1)} &= \text{UPDATE}^{(k)}\left(h_u^{(k)}, \text{AGGREGATE}^{(k)}\left(\{h_v^{(k)}, \forall v \in N_k(u)\}\right)\right) \\ &= \text{UPDATE}^{(k)}\left(h_u^{(k)}, m^{(k)}_{N(u)}\right) \end{split} \tag{5}$$

where UPDATE and AGGREGATE denote the functions that update the node representation $h_u$ and aggregate the hidden representations from $u$'s neighborhood $N_k(u)$, respectively. How the UPDATE and AGGREGATE functions are defined depends on the message-passing scheme used; they are usually parameterized by two learnable weight matrices. However, all message-passing schemes employ a permutation-invariant AGGREGATE function (e.g., sum, mean). We can generally distinguish two types of message-passing schemes: spectral message-passing, based on the spectral graph properties (e.g., eigenvalues) calculated using the graph Fourier transform, and spatial message-passing, which is applied directly to the connectivity structure present in the input graphs. In this review, we mainly focus on spatial message-passing methods, as these are applied in the vast majority of histopathology applications using GNNs. We first denote a tuple $(G, A, X)$, where $G$ denotes the input graph, $A$ the associated adjacency matrix, and $X$ the input node feature matrix. To make the graph representation less sensitive to node degrees, we can normalize the adjacency matrix into a normalized adjacency matrix $\tilde{A}$, as follows:

$$\tilde{A} = D^{-1/2} A D^{-1/2} \tag{6}$$

where $D$ denotes the degree matrix of the graph (a diagonal matrix where $D_{ii}$ is the degree of node $i$). To utilize spectral information in the graph structure, we can use the Laplacian matrix of the graph, defined as $L = D - A$. During message passing, we transform the feature matrix $X$ into the hidden feature representation matrix $H$, typically using a learned weight matrix $W$ and a nonlinear activation function $\sigma$.
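
As a minimal sketch of Equation 6 and the Laplacian $L = D - A$, the following NumPy code computes both for a small toy graph. The helper names and the example matrix are assumptions for illustration; isolated nodes (degree 0) are given a zero scaling factor to avoid division by zero, which is a choice made here and not prescribed by the text.

```python
import numpy as np


def normalize_adjacency(A):
    """Symmetric normalization D^{-1/2} A D^{-1/2} (Equation 6)."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = deg[nonzero] ** -0.5  # isolated nodes keep factor 0
    # Scaling rows and columns is equivalent to multiplying by D^{-1/2} on both sides.
    return inv_sqrt[:, None] * A * inv_sqrt[None, :]


def laplacian(A):
    """Combinatorial graph Laplacian L = D - A."""
    A = np.asarray(A, dtype=float)
    return np.diag(A.sum(axis=1)) - A


# Toy star graph: node 0 connected to nodes 1 and 2 (hypothetical data)
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
A_norm = normalize_adjacency(A)
L = laplacian(A)
```

Each entry of `A_norm` equals $a_{ij} / \sqrt{d_i d_j}$, so edges attached to high-degree nodes contribute less to the aggregated message.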

One of the most widely adopted and earliest spatial GNN schemes is the Graph Convolutional Network (GCN). The message passing function uses a normalized adjacency matrix to update the hidden representations of nodes based on the node neighborhood. To acquire the hidden representation matrix $H$, the message passing function in GCN layer $l$ is defined as follows:

$$H^{l+1} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{l} W^{l}\right) \tag{7}$$

in which $\tilde{A} = A + I$ represents the adjacency matrix with added self-loops for each node and $\tilde{D}$ denotes the degree matrix of $\tilde{A}$ [18]. Note that $\tilde{A}$ here is the self-loop-augmented adjacency matrix, not the normalized adjacency matrix of Equation 6.
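
A single GCN layer following Equation 7 can be sketched in a few lines of NumPy. The function name `gcn_layer`, the ReLU activation, and the toy inputs are assumptions for illustration; in practice the weights $W^l$ are learned by gradient descent in a deep learning framework.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def gcn_layer(A, H, W, activation=relu):
    """One GCN propagation step: sigma(D~^{-1/2} A~ D~^{-1/2} H W)."""
    A_tilde = A + np.eye(A.shape[0])   # add a self-loop to every node
    d = A_tilde.sum(axis=1)            # degrees of A~ (always >= 1)
    D_inv_sqrt = np.diag(d ** -0.5)
    return activation(D_inv_sqrt @ A_tilde @ D_inv_sqrt @ H @ W)


# Toy path graph 0 - 1 - 2 with 2-dimensional node features (hypothetical data)
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
W = np.array([[1.0],
              [1.0]])                  # maps 2 input features to 1 output feature
H_next = gcn_layer(A, H, W)
```

Relabeling the nodes only reorders the rows of the output, which is the permutation symmetry that message passing is meant to preserve.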

The spatial Graph Attention Network (GAT) [19] extends the GCN scheme by adding attention weights to each edge of the graph. This essentially allows models to learn the importance of nodes during message passing. For each edge $e_{vu}$ connecting nodes $v$ and $u$, we first calculate an attention score:

$$e_{vu} = \sigma\left(\vec{a}^{T}\left[W h_v^{(l)} \,\|\, W h_u^{(l)}\right]\right) \tag{8}$$

Here, $\|$ denotes concatenation and $\vec{a}$ is a trainable shared parameter vector. Using this score, we can calculate the corresponding edge attention weight as follows:

$$\alpha_{vu} = \frac{\exp(e_{vu})}{\sum_{u' \in N(v)} \exp(e_{vu'})} \tag{9}$$

We then update the hidden node representation $h_v^{(l)} \in H^{l}$ as follows:

$$h_v^{(l+1)} = \sigma\left(\sum_{u \in N(v)} \alpha_{vu} \cdot W^{l} h_u^{(l)}\right) \tag{10}$$
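
Equations 8 to 10 can be combined into a single-head attention layer, sketched below in NumPy. Several choices are assumptions made for illustration: node features are stored as rows (so $W h$ becomes `H @ W`), the score nonlinearity is a leaky ReLU with slope 0.2, the output nonlinearity is `tanh`, and every node is assumed to have at least one neighbor (here ensured by self-loops in the toy adjacency matrix).

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


def gat_layer(A, H, W, a, activation=np.tanh):
    """Single-head graph attention layer; returns new features and attention weights."""
    Z = H @ W                                  # transformed features, shape (n, d_out)
    d_out = Z.shape[1]
    # a^T [z_v || z_u] splits into a[:d_out] . z_v + a[d_out:] . z_u
    src = Z @ a[:d_out]
    dst = Z @ a[d_out:]
    e = leaky_relu(src[:, None] + dst[None, :])   # e[v, u], Equation 8
    e = np.where(A > 0, e, -np.inf)               # only neighbors take part in the softmax
    e = e - e.max(axis=1, keepdims=True)          # numerical stability
    w = np.exp(e)                                 # exp(-inf) = 0 for non-neighbors
    alpha = w / w.sum(axis=1, keepdims=True)      # Equation 9
    return activation(alpha @ Z), alpha           # Equation 10


# Toy graph with self-loops so that every node has a neighbor (hypothetical data)
A = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]], dtype=float)
rng = np.random.default_rng(0)
H = rng.normal(size=(3, 4))
W = rng.normal(size=(4, 2))
a = rng.normal(size=4)                            # length 2 * d_out
H_next, alpha = gat_layer(A, H, W, a)
```

The returned `alpha` matrix holds one weight per edge, which is the quantity typically inspected when attention is used to judge the importance of neighboring nodes.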

The spatial GraphSAGE method [28] provides a scalable and flexible framework to decide how neighboring nodes should be aggregated. It differs from other message-passing schemes in that it samples $S$ neighbors in the neighborhood of each node, instead of using all neighbors. Given hidden node representation $h_v^{(l)} \in H^{l}$, we can define its message-passing scheme as follows:

$$h_v^{(l+1)} = \sigma\left(\mathbf{W}^{(l)} \cdot \text{AGG}^{(l)}\left(\{h_u^{(l)} : u \in \mathcal{S}_v\}\right)\right) \tag{11}$$

where AGG denotes an aggregation function at layer $l$, which can be any permutation-invariant function (e.g., sum, mean).
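
The sampling step that distinguishes GraphSAGE can be sketched as follows, implementing Equation 11 as written with a mean aggregator. The function name `sage_layer`, the adjacency-list input, sampling without replacement, and the handling of nodes with fewer than `num_samples` neighbors (all of them are used) are assumptions for illustration.

```python
import numpy as np


def sage_layer(neighbors, H, W, num_samples, rng, activation=np.tanh):
    """Mean-aggregate up to `num_samples` sampled neighbors per node, then transform."""
    agg = np.zeros_like(H, dtype=float)
    for v in range(H.shape[0]):
        nbrs = neighbors[v]
        if len(nbrs) == 0:
            continue                               # isolated node: aggregated message stays zero
        size = min(num_samples, len(nbrs))
        sampled = rng.choice(nbrs, size=size, replace=False)   # the sampled set S_v
        agg[v] = H[sampled].mean(axis=0)           # AGG = mean over sampled neighbors
    return activation(agg @ W)                     # row-vector form of W . AGG(...)


# Toy star graph as adjacency lists (hypothetical data)
neighbors = [[1, 2], [0], [0]]
H = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [4.0, 0.0]])
W = np.eye(2)
rng = np.random.default_rng(0)
identity = lambda x: x
full = sage_layer(neighbors, H, W, num_samples=5, rng=rng, activation=identity)
sampled = sage_layer(neighbors, H, W, num_samples=1, rng=rng, activation=identity)
```

With `num_samples` at least as large as the maximum degree, the layer reduces to ordinary mean aggregation. Smaller values cap the cost per node, which is what makes the scheme attractive for large graphs.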

Xu et al. introduced the Graph Isomorphism Network (GIN) [20], which has an expressive spatial message-passing scheme aimed at distinguishing non-isomorphic graph structures (two graphs $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ are isomorphic if and only if there exists a bijection $f: V_1 \to V_2$ such that $\forall u,v \in V_1$, $\{u,v\} \in E_1 \iff \{f(u), f(v)\} \in E_2$). For any hidden node representation $h_v^{(l)} \in H^{l}$, the message-passing is defined as follows:

$$h_v^{(l+1)} = \text{MLP}^{(l)}\left((1 + \epsilon^{(l)}) \cdot h_v^{(l)} + \sum_{u \in \mathcal{N}(v)} h_u^{(l)}\right) \tag{12}$$

Here, the MLP denotes a multilayer perceptron that processes each node's aggregated feature vector, and $\epsilon^{(l)}$ is a learnable parameter that scales the node's own feature vector.
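
A minimal sketch of the GIN update in Equation 12 is given below. The two-layer, bias-free MLP with a ReLU in between, the fixed `eps`, and the toy inputs are assumptions for illustration; in practice $\epsilon^{(l)}$ and the MLP weights are learned.

```python
import numpy as np


def gin_layer(A, H, W1, W2, eps=0.0):
    """GIN update: MLP((1 + eps) * h_v + sum of neighbor features)."""
    summed = (1.0 + eps) * H + A @ H        # A @ H sums features over neighbors
    hidden = np.maximum(summed @ W1, 0.0)   # first MLP layer with ReLU
    return hidden @ W2                      # second MLP layer (linear)


# Toy star graph: node 0 has two neighbors, nodes 1 and 2 have one (hypothetical data)
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
H = np.ones((3, 1))
W1 = np.array([[1.0]])
W2 = np.array([[1.0]])
H_next = gin_layer(A, H, W1, W2, eps=0.0)
```

With identical input features, the sum aggregator still separates the center node from the leaves because it is sensitive to the number of neighbors, whereas a mean aggregator would assign all three nodes the same value.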

The spectral ChebNet [29] method uses Chebyshev polynomials to approximate spectral graph convolution. First, we rescale the graph's Laplacian matrix $L$ using the largest eigenvalue of $L$, $\lambda_{max}$: $\hat{L} = (2L / \lambda_{max}) - I$. Given the approximation parameter $k$, we can compute the approximated Chebyshev polynomial $Z^{(k)}$ as follows:

$$\begin{aligned} \mathbf{Z}^{(1)} &= \mathbf{X} \\ \mathbf{Z}^{(2)} &= \mathbf{\hat{L}} \cdot \mathbf{X} \\ \mathbf{Z}^{(k)} &= 2 \cdot \mathbf{\hat{L}} \cdot \mathbf{Z}^{(k-1)} - \mathbf{Z}^{(k-2)} \end{aligned} \tag{13}$$

Finally, the message-passing function to update the hidden representation matrix in layer $l$, $H^{l}$, is defined as follows:

$$\mathbf{H}^{l+1} = \sum_{k=1}^{K} \mathbf{Z}^{(k)} \cdot \mathbf{W}^{(l)} \tag{14}$$
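
The recurrence in Equation 13 and the sum in Equation 14 can be sketched as follows. The helper names are illustrative assumptions, and the graph is assumed to be undirected with at least one edge so that $\lambda_{max} > 0$. The sketch accepts one weight matrix per polynomial order, which is a generalization chosen here; passing the same matrix $K$ times recovers the shared-weight form written in Equation 14.

```python
import numpy as np


def scaled_laplacian(A):
    """L_hat = 2 L / lambda_max - I, with L = D - A."""
    L = np.diag(A.sum(axis=1)) - A
    lam_max = np.linalg.eigvalsh(L).max()   # largest eigenvalue of the symmetric Laplacian
    return 2.0 * L / lam_max - np.eye(A.shape[0])


def cheb_layer(A, X, weights):
    """Sum over k of Z^(k) @ weights[k-1], with Z^(k) from the Chebyshev recurrence."""
    L_hat = scaled_laplacian(A)
    Z_prev, Z_curr = None, X                # Z^(1) = X
    out = Z_curr @ weights[0]
    for k in range(1, len(weights)):
        if k == 1:
            Z_next = L_hat @ X              # Z^(2) = L_hat X
        else:
            Z_next = 2.0 * L_hat @ Z_curr - Z_prev
        out = out + Z_next @ weights[k]
        Z_prev, Z_curr = Z_curr, Z_next
    return out


# Toy path graph 0 - 1 - 2 with random features and weights (hypothetical data)
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
weights = [rng.normal(size=(4, 2)) for _ in range(3)]   # K = 3
H_next = cheb_layer(A, X, weights)
```

The rescaling maps the eigenvalues of $L$ from $[0, \lambda_{max}]$ into $[-1, 1]$, the interval on which Chebyshev polynomials are defined.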

Prediction tasks using GNNs can be categorized into node-level, edge-level, and graph-level prediction tasks. Node-level tasks, such as node classification, predict labels of target nodes based on the transformed representations after message-passing. Edge-level tasks include edge classification, where labels are predicted for edges in the graph, and link prediction. In link prediction, the aim is to predict whether links between nodes should exist based on the node features after message passing. Lastly, graph-level tasks need a global pooling step, which aggregates information from the node and/or edge level into a global representation that can be used to predict graph-level labels. Let us define a graph $G=(V,E)$ with an associated node feature matrix $X$. We can then use any permutation-invariant function to pool the node features into a global representation:

$$pool(G) = \bigoplus_{v \in V} X(v) \tag{15}$$

where $\bigoplus$ is any permutation-invariant function (e.g., sum).
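
Global pooling as in Equation 15 amounts to reducing the node feature matrix along the node axis. The sketch below, with the assumed helper name `global_pool` and toy data, shows three common permutation-invariant choices.

```python
import numpy as np


def global_pool(X, mode="sum"):
    """Pool an (n_nodes, n_features) matrix into a single graph-level vector."""
    if mode == "sum":
        return X.sum(axis=0)
    if mode == "mean":
        return X.mean(axis=0)
    if mode == "max":
        return X.max(axis=0)
    raise ValueError(f"unknown pooling mode: {mode}")


# Three nodes with two features each (hypothetical data)
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 0.0]])
perm = [2, 0, 1]                      # an arbitrary relabeling of the nodes
pooled = global_pool(X, "sum")
pooled_permuted = global_pool(X[perm], "sum")
```

Because the reduction ignores node order, `pooled` and `pooled_permuted` are identical, so the graph-level prediction does not depend on how the nodes happen to be numbered.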

2.2 GNNs in Histopathology

Graphs have been used in digital pathology since the 1990s [30] and have later been combined with classical machine learning algorithms for diagnosis tasks [31]. Since then, GNNs have been gaining popularity throughout the 2010s to become the primary method for graph-based machine learning tasks. Since the first application of GNNs in histopathology, in 2018 [10], the use of GNNs in histopathology has grown rapidly, with more than 150 publications in 2024. GNNs offer several important advantages for modeling of histopathological images:

  1. GNNs acquire relationship-aware representations. By exchanging information between nodes in the input graph, GNNs learn context-aware representations. This is important in pathology, where meaningful biological structures often depend on the cellular or regional context [32]. Vision transformer models also learn relationship-aware representations, but they compute these relationships between arbitrary patches, whereas GNNs compute them between predefined, biologically relevant entities (e.g., cells).

  2. GNNs can learn from the topological information in the WSI. Graphs are a natural way to capture topology. In histopathology, factors like cellular density can be important in diagnosis, and these can be captured in the topological information of the graph structure [33] [34].

  3. GNNs model the entire WSI at once. Due to the sheer size of whole slide images, traditional deep learning methods usually split the WSI into image patches and use these as model input. This approach introduces patching bias, as the optimal resolution, size, and stride depend on the problem at hand [35]. GNNs can model the WSI as a graph, which is much smaller than the original image. This allows it to be loaded into memory, effectively capturing the global structure of the WSI [36].

  4. GNNs allow for hierarchical modeling. In histopathological image analysis, diagnosis often relies on information acquired from multiple spatial scales of the WSI (e.g., global patterns combined with specific cellular features). GNNs allow for modeling both of these scales in a single model, either by connecting graphs on different scales or by learning the global structure through pooling operations [24] [37].

  5. GNNs allow for entity-wise interpretability. Whereas CNN-based methods usually rely on pixel-level explainability, GNNs allow for entity-wise explainability. This allows pathologists to investigate the dependence of the model prediction on certain biological entities, such as cells or substructures in the WSI [38].

  6. GNNs allow for injecting task-specific inductive biases. The input graph structure can be modified based on prior information about the task at hand. This in turn allows for more specific explainability and efficient modeling of the problem [39].

  7. GNNs allow for straightforward multimodal integration. Multimodal integration often requires modeling separate modules whose information is fused together to arrive at a final prediction. In GNNs, information can simply be added to the feature vectors associated with the node, edge, or graph, which can then be jointly updated using message passing. This approach is efficient, as no additional model modules are required, and it allows quick injection or removal of information from different modalities [40] [41].

Applying GNNs to histopathology requires some decision making and algorithmic steps (Figure 4). First, we preprocess the WSI (e.g., quality control, stain normalization). Now either a cell segmentation algorithm can be applied from which a cell graph can be constructed, or one extracts patches from which a patch graph can be constructed. Using the extracted image entities, a graph can be defined using a chosen graph construction algorithm. When a graph has been defined, it can be used as input for a GNN-model. The predictions given by the GNN-model can be explained using various GNN explainability methods. We will further explore this typical workflow of GNNs in histopathology in the following sections.

2.2.1 Defining the input graph

For GNNs to be applied to histopathology images, one first needs to define which entities the nodes in the input graph will represent. The majority of GNNs applied to histopathology use one of three types of input graphs, as shown in Figure 2: Cell Graphs, where nodes represent cells or nuclei, segmented using a segmentation algorithm or model (e.g., HoVerNet [42]); Patch Graphs, where nodes represent patches of the image; and Tissue Graphs, where nodes represent larger-scale semantic entities in the image. These tissue graph entities can be acquired from a semantic segmentation map, superpixels (usually generated using the SLIC algorithm [43]), or clustered superpixels, which represent similar regions in the input image. Some alternate approaches also exist; notably, approaches that treat image pixels as nodes and approaches that construct a patch-based hypergraph (a graph where edges can connect any number of nodes instead of the pairwise edges seen in regular graphs).

Figure 2: Most widely used graph types in GNNs for histopathology. A) Cell Graph, B) Patch Graph, C) Tissue graph (based on superpixels, clustered superpixels, or a semantic segmentation mask. The superpixel image was acquired from [44].)

Once the entities for the nodes have been established, one needs to decide how the nodes should be connected. For this, histopathology GNNs usually use one of four graph construction strategies, or combinations of these strategies. First, we can use a simple distance threshold, where we connect all nodes having a pairwise distance (e.g., Euclidean) less than a set threshold $t$. Second, we can use the k-Nearest Neighbor (k-NN) algorithm. Here, we set a parameter $k$, which denotes how many neighbors each node will have. Then, we connect the $k$ closest neighbors of each node to the target node. Note that for both of these approaches, we can base our notion of distance on spatial distance or on the distance between the node-associated feature vectors. Third, we can construct a Region Adjacency Graph (RAG), where we connect all entities that share a border (in patch graphs, this is equivalent to using a k-NN with $k=4$ without diagonal neighbors and $k=8$ with diagonal neighbors). Typically, this approach is used for patch or tissue graphs, which have a clear border between entities. Lastly, we can use Delaunay triangulation. Here, we form triangles between the nodes such that the circumcircle of each triangle contains no nodes other than the three nodes of the triangle itself.

Figure 3: Most widely used graph construction techniques in GNNs for histopathology. A) Delaunay triangulation, B) k-NN with $k=3$, C) Distance threshold with threshold $t$, D) RAG with diagonal neighbors ($k=8$).
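
The first two construction strategies, distance thresholding and k-NN, can be sketched from node coordinates alone. The function names and the toy centroid coordinates below are assumptions for illustration; the same functions work on feature vectors instead of spatial positions, and real pipelines usually rely on spatial index structures for efficiency.

```python
import numpy as np


def pairwise_distances(P):
    """Euclidean distance between every pair of rows in P."""
    diff = P[:, None, :] - P[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def threshold_graph(P, t):
    """Connect all node pairs whose distance is below t (no self-loops)."""
    A = (pairwise_distances(P) < t).astype(int)
    np.fill_diagonal(A, 0)
    return A


def knn_graph(P, k):
    """Connect every node to its k nearest neighbors (directed adjacency matrix)."""
    D = pairwise_distances(P)
    np.fill_diagonal(D, np.inf)                 # a node is never its own neighbor
    idx = np.argsort(D, axis=1)[:, :k]          # indices of the k closest nodes per row
    A = np.zeros(D.shape, dtype=int)
    rows = np.repeat(np.arange(len(P)), k)
    A[rows, idx.ravel()] = 1
    return A


# Hypothetical cell centroids (x, y); the last cell lies far from the others
points = np.array([[0.0, 0.0],
                   [1.0, 0.0],
                   [0.0, 2.0],
                   [5.0, 5.0]])
A_thr = threshold_graph(points, t=1.5)
A_knn = knn_graph(points, k=1)
A_knn_sym = np.maximum(A_knn, A_knn.T)          # one way to obtain an undirected graph
```

The two strategies behave differently for outliers: with a threshold, the distant cell stays isolated, while k-NN always gives it an outgoing edge. The k-NN relation is also not symmetric in general, which is why a symmetrization step is often applied.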

2.2.2 Extracting features

To allow a GNN to use the image-based features present in whole slide images, one usually extracts features associated with the entity of the node and attaches these to the node as node features. Similarly, one can also add features to graph edges, which the GNN can use in the message-passing function. Backbones of CNN (e.g., ResNet [45]) or Vision Transformer [5] models, usually pretrained on the ImageNet dataset [1], are primarily used for node feature extraction: an image patch corresponding to the node entity is processed by the feature extraction model, and the feature vectors from the intermediate layers of the model are used as node features. Sometimes, the feature extraction model is pretrained in a supervised manner on the histopathology images for the problem itself, or fine-tuned for the prediction task at hand, which allows for more problem-specific features. More recently, self-supervised training has been applied for feature extraction, allowing for learning features that generalize better across prediction tasks [46]. Handcrafted features, based on morphology, texture, or intensity measurements, can also be used as node features. Furthermore, (spatial) graph features (e.g., node degree) can be calculated on a node, edge, or graph level to more directly incorporate topological information in the model prediction.

2.2.3 Graph Neural Network architectures

Most message-passing schemes used in histopathology GNNs are not specific to histopathology. Popular schemes include Graph Convolutional Networks (GCNs), Graph Attention Networks (GATs), GraphSAGE, and Graph Isomorphism Networks (GINs). Some approaches invented schemes specific to their problem [47] [48] [49] [39] [50] [51] and lately, Graph Transformer models have gained traction as a popular alternative or addition to regular message-passing. In the overall model architectures, many approaches combine message-passing layers with other neural network modules, like transformers, LSTMs, MLPs, and MIL aggregation layers. For graph-level prediction problems, global pooling layers are applied, sometimes combined with sequentially applied local pooling layers which hierarchically coarsen the graph.

2.2.4 Applications

GNNs in histopathology have been applied to a wide variety of tasks, mainly supervised prediction tasks such as survival prediction, region-of-interest (ROI) classification, cancer grading, cancer subtyping, cell classification, and the prediction of treatment response. Some applications aim to predict data in other modalities, such as genetic mutations or (spatial) gene expression. Although most use cases are classification problems, some research has used GNNs for semantic segmentation [52] [53] [54] or nuclei detection [55] [56]. Another interesting application is Content-Based Histopathological Image Retrieval (CBHIR). Here, we first use GNNs to extract and save a graph representation for a ROI in a WSI. When pathologists grade new cases, we can use these embeddings to retrieve similar ROIs, helping in the diagnostic process. Most GNN applications focus on cancer as a disease, with a few exceptions [57] [58] [39] [59] [60] [61] [62].

2.3 Explainability

One major advantage GNNs have over other model types in histopathology is interpretability. The model output can be explained on an entity level and visualized using a graph overlay. For example, one can pool nodes in a cell graph using an attention mechanism, calculate the attention scores for each node, assign a color based on the attention score per node, and then visualize the attention scores on a cellular level when overlaying the graph over the WSI. Many methods for explainability in GNNs have emerged since the inception of the GNN (e.g., GNNExplainer [63], GCExplainer [64]). There have also been efforts to develop explainability methods specific for histopathology GNNs [65] [66] [67] or to use combinations of existing GNN explainability techniques to extract a clinically interpretable model output [68].

Figure 4: Overview of a typical workflow of applying GNNs to histopathology whole slide images. A) First, preprocessing steps, such as slide quality thresholds and tissue segmentation (e.g., using Otsu thresholding) are applied. B) Then, if a patch graph approach is chosen, the WSI is divided into smaller image patches. C) When a cell graph approach is used, nuclei-segmentation algorithms are applied to acquire a mask of the nuclei in the WSI. D) For each acquired entity (patch, nucleus) features are extracted, typically using a pretrained CNN-model (e.g., ResNet) to acquire a feature matrix $X$. E) Using a graph construction strategy (e.g., k-NN), entities are connected to other entities to form a cell/patch graph, $G$. F) Now, this graph, along with its associated feature matrix, can be used as input for a GNN model which applies message passing operations to learn a representation and then produces an output depending on the prediction task. G) (Graph) explainability methods can be applied to the GNN model to acquire interpretable information on the model behaviour and its predictions.

3 Methodology

Using Google Scholar, we identified 156 papers applying GNNs to histopathology. These papers span the period from September 2018, when the first paper applying GNNs to histopathology was published, up to March 2024. We included all papers that used H&E-stained whole slide images or tissue microarrays (TMAs) and in which GNNs (i.e., message-passing) were part of the methodology. The papers were categorized based on the following properties:

  • Message-passing scheme

  • Type(s) of input graph

  • Graph construction method

  • Feature extraction method

  • Application(s)

  • Tissue type(s)

  • Hierarchy

  • Multimodality

We quantified the frequencies in each of these properties to identify emerging trends in the literature (Figure 5).

Figure 5: Cumulative frequency of publications on GNNs applied on histopathology, with different properties (e.g., Application, Graph type). For the types of message passing, graph types, graph constructors, and applications, only properties occurring in more than 4 papers were retained in the plot.

From our quantification, we identified 4 upcoming trends to explore further:

  1. Hierarchical GNNs

  2. Adaptive Graph Structure Learning

  3. Multimodal GNNs

  4. Higher-order graphs

4 Emerging Trends

4.1 Hierarchical GNNs

Diagnostic- and prognostic information present on WSIs often exists on multiple levels of coarsity. For example, the cellular microenvironment can be an important diagnostic factor but can depend on where this microenvironment is globally located in the tissue. Cellular graphs are suitable for capturing the microenvironment, but can miss the global tissue information present in the WSI. Similarly, patch- or tissue-based graphs can capture global information in the WSI, but miss the topological information of the cellular structures [69]. To connect the information on different levels of coarsity, we can either apply local pooling layers which learn a hierarchical representation of the input graph in an end-to-end manner, denoted as Learned Hierarchy, or we can define the hierarchy between graphs prior to model training, denoted as Pre-established hierarchy. Both are illustrated in Figure 6.

In a learned hierarchy, we apply local pooling layers that can iteratively coarsen the graph structure hierarchically. Let us define our input graph with associated node features as $G_0 = (V_0, E_0, X_0)$. Assuming that we have $k$ local pooling layers in our GNN architecture, we sequentially coarsen our input graph to $G_1, G_2, \ldots, G_k$, where $G_k$ is the final pooled graph representation. Mathematically, we define a local pooling to coarsen the graph $G_i$ to $G_{i+1}$ as follows:

$$G_{i+1} = \text{pool}_i(G_i), \quad \forall i \in \{0, 1, 2, \ldots, k-1\} \tag{16}$$

where $\text{pool}_i$ is defined by any permutation-invariant pooling function. Prominently used examples include DiffPool [70], SAGPool [71], and MinCutPool [72]. Apart from pure local pooling, we also classify methods that learn the hierarchy using a cross-hierarchical transformer [73] [74] [75] layer as learned hierarchy methods.

Learned hierarchy methods learn a node assignment matrix $S^{(l)}$ which denotes the changes in the graph structure after applying the pooling operation. Often, multiple local pooling layers are applied in sequence to coarsen the graph. One sets a pooling ratio hyperparameter, denoted as $k$, which determines how many nodes should be present after the pooling operation. For any one of these layers, $l$, the pooling operation updates the adjacency matrix of the input graph, $A$, and its corresponding node attributes $X$. The hidden representations are denoted $H$, where $X = H^{0}$. We denote the pooling operation as:

$$(A^{l+1}, H^{l+1}) = \text{POOL}(A^{l}, H^{l}) \tag{17}$$

The pooling operation is dependent on the pooling function used. DiffPool [70] applies a graph neural network to learn a differentiable cluster assignment matrix which maps nodes to clusters, which are used as individual nodes after the pooling operation. DiffPool uses two GNNs: one for obtaining node embeddings, $\text{GNN}_{l,\text{embed}}$, and one for assigning the nodes to cluster nodes, $\text{GNN}_{l,\text{pool}}$. In each DiffPool layer $l$, we use the embedding GNN for extracting a feature matrix $Z$:

$$Z^{l} = \text{GNN}_{l,\text{embed}}(A^{l}, H^{l}) \tag{18}$$

Then, we calculate the assignment matrix using the pooling GNN:

$$S^{l} = \text{softmax}(\text{GNN}_{l,\text{pool}}(A^{l}, H^{l})) \tag{19}$$

Now, we compute both the new hidden node representations and the new adjacency matrix:

$$H^{l+1} = (S^{l})^{T} Z^{l} \tag{20}$$
$$A^{l+1} = (S^{l})^{T} A^{l} S^{l} \tag{21}$$
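The DiffPool update in Eqs. 18 to 21 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: the single mean-aggregation layer `gnn_layer` stands in for both the embedding and the pooling GNN, the weights are random rather than trained, and all function names are hypothetical.

```python
import numpy as np

def gnn_layer(A, H, W):
    """Simplified message passing: mean over neighbors (plus self), then a linear map."""
    A_hat = A + np.eye(A.shape[0])                    # add self-loops
    A_hat = A_hat / A_hat.sum(axis=1, keepdims=True)  # row-normalize
    return A_hat @ H @ W

def softmax_rows(x):
    """Row-wise softmax, so each node's cluster assignments sum to one."""
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def diffpool_layer(A, H, W_embed, W_pool):
    Z = np.maximum(gnn_layer(A, H, W_embed), 0.0)  # Eq. 18: node embeddings (ReLU assumed)
    S = softmax_rows(gnn_layer(A, H, W_pool))      # Eq. 19: soft cluster assignments
    H_next = S.T @ Z                               # Eq. 20: cluster features
    A_next = S.T @ A @ S                           # Eq. 21: cluster connectivity
    return A_next, H_next, S

rng = np.random.default_rng(0)
n, d, d_out, c = 6, 4, 3, 2                        # nodes, input dim, output dim, clusters
A = np.triu(rng.integers(0, 2, size=(n, n)), k=1)
A = (A + A.T).astype(float)                        # random symmetric toy adjacency
H = rng.normal(size=(n, d))
A1, H1, S = diffpool_layer(A, H, rng.normal(size=(d, d_out)), rng.normal(size=(d, c)))
print(A1.shape, H1.shape)
```

Because every row of $S^{l}$ sums to one, the total edge weight is preserved by Eq. 21; the coarsened adjacency matrix is dense and generally contains self-loops that summarize within-cluster edges.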

Self-Attention Graph Pooling (SAGPool) [71] uses the self-attention mechanism to learn which nodes are important and to discard unimportant ones. First, we calculate the self-attention score using a graph convolution operation:

$$H^{l+1} = \sigma(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{l} W^{l}) \tag{22}$$

Here, $W^{l}$ is a learned weight matrix which we use to calculate the attention score. For each node $v_i \in V$, we calculate:

$$\alpha^{l}_{i} = \text{softmax}(W^{l} \cdot h^{l}_{i}) \tag{23}$$

where $h_i$ is the feature embedding of $v_i$. SAGPool then ranks the nodes on their attention scores and selects the top-$k$ nodes to retain. Based on the retained nodes, a mask $H_{mask}$ is constructed and multiplied element-wise with the original adjacency matrix to coarsen the graph: $A^{l+1} = A \odot H_{mask}$.
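A minimal NumPy sketch of this score-and-select procedure is given below. It assumes a tanh nonlinearity for the scores and gates the retained features by their scores, which are assumptions on our part; slicing out the rows and columns of the retained nodes has the same effect as masking the adjacency matrix and dropping the zeroed entries. Function names and the toy graph are hypothetical.

```python
import math
import numpy as np

def sym_norm_adj(A):
    """Symmetrically normalized adjacency with self-loops, as used in Eq. 22."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]

def sagpool(A, H, w_att, ratio):
    """Keep the ceil(ratio * n) highest-scoring nodes and the edges among them."""
    scores = np.tanh(sym_norm_adj(A) @ H @ w_att)  # one attention score per node
    n_keep = math.ceil(ratio * A.shape[0])
    keep = np.sort(np.argsort(-scores)[:n_keep])   # indices of retained nodes
    H_next = H[keep] * scores[keep, None]          # gate retained features by score
    A_next = A[np.ix_(keep, keep)]                 # edges among retained nodes only
    return A_next, H_next, keep

# Toy path graph 0-1-2-3 with one scalar feature per node.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = np.array([[1.0], [2.0], [3.0], [4.0]])
A_next, H_next, keep = sagpool(A, H, w_att=np.array([1.0]), ratio=0.5)
print(keep, A_next)
```

In contrast to DiffPool, no dense assignment matrix is formed: nodes are dropped rather than merged, so the coarsened graph stays as sparse as the input.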

Lastly, MinCutPool [72] uses the mincut partition objective function to decide the assignment matrix $S$. Similarly to the DiffPool method, we first generate a GNN-based node feature matrix $H^{l+1}$:

$$H^{l+1} = \text{GNN}(H^{l}, A^{l}, W^{l}_{GNN}) \tag{24}$$

where $W^{l}_{GNN}$ is the learned weight matrix of the GNN. Using the updated representation, we can use a multilayer perceptron (MLP) to calculate the node assignment matrix $S$:

$$S = \text{MLP}(H^{l+1}, W^{l}_{MLP}) \tag{25}$$

where $W^{l}_{MLP}$ are the learned weights of the MLP. Both $W^{l}_{GNN}$ and $W^{l}_{MLP}$ are trained by minimizing two loss terms: $L_c$, denoting the cut loss term, and $L_o$, denoting the orthogonality loss term. The cut loss term approximates the Mincut objective, by aiming to minimize the number of edges between clusters while maximizing the edges within clusters. The orthogonality loss term encourages orthogonal cluster assignments and similarly sized clusters. Together, these loss functions form the objective loss $L_u$:

$$L_{u} = L_{c} + L_{o} = -\frac{\text{Tr}(S^{\top} \tilde{A} S)}{\text{Tr}(S^{\top} D \tilde{S})} + \frac{\text{Tr}(S^{\top} S - IK)}{\sqrt{K}} \tag{26}$$

where $D$ is the degree matrix of the normalized adjacency matrix $\tilde{A}$, $I$ is the identity matrix and $K$ is the number of desired clusters.

The pooling operation is performed as follows:

$$A^{l+1} = S^{T} \tilde{A} S \tag{27}$$
$$H^{l+1} = S^{T} H$$
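To make the two loss terms and the pooling step concrete, the following NumPy sketch evaluates them for a fixed assignment matrix. It is an illustration under stated assumptions: the orthogonality term uses the Frobenius-norm form of the original MinCutPool formulation [72] as we understand it, not a literal transcription of the trace expression above; the denominator of the cut term uses the degree matrix $D$ of $\tilde{A}$; and the assignment matrix is given by hand instead of being produced by an MLP. All function names are hypothetical.

```python
import numpy as np

def mincut_losses(A_norm, S):
    """Cut loss and orthogonality loss for an (n, K) assignment matrix S."""
    K = S.shape[1]
    D = np.diag(A_norm.sum(axis=1))  # degree matrix of the (normalized) adjacency
    cut = -np.trace(S.T @ A_norm @ S) / np.trace(S.T @ D @ S)
    StS = S.T @ S
    # Frobenius-norm form of the orthogonality term (assumed, see lead-in).
    ortho = np.linalg.norm(StS / np.linalg.norm(StS) - np.eye(K) / np.sqrt(K))
    return cut, ortho

def mincut_pool(A_norm, H, S):
    """Coarsened adjacency and cluster features for assignment matrix S."""
    return S.T @ A_norm @ S, S.T @ H

# Two disconnected node pairs: {0, 1} and {2, 3}.
A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
S_good = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)  # follows the pairs
S_bad = np.array([[1, 0], [0, 1], [1, 0], [0, 1]], dtype=float)   # splits every pair

cut_good, ortho_good = mincut_losses(A, S_good)
cut_bad, _ = mincut_losses(A, S_bad)
A_pool, H_pool = mincut_pool(A, np.eye(4), S_good)
print(cut_good, ortho_good, cut_bad)
```

The assignment that follows the graph structure reaches the minimum cut loss of -1 with zero orthogonality loss, whereas the assignment that cuts every edge has a cut loss of 0.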

4.1.1 Learned hierarchy

As Table 1 shows, the vast majority of GNN applications in histopathology use existing local pooling functions such as in the examples above. In this section, we give some examples of newly designed learned hierarchy methods, specifically for problems in histopathology.

Local Pooling: Hou et al. [49] proposed Iterative Hierarchical Pooling (IHPool), which they combined with a pre-established hierarchy. As input, the authors used a pyramidal heterogeneous patch graph, with one graph existing on 10x resolution, one on 5x resolution, and one on thumbnail resolution. Features were generated using KimiaNet. IHPool was designed to filter redundant information for the downstream prediction task while retaining this pyramidal structure when applying the pooling operation. The method achieves this by conditioning the set of nodes to be pooled on each resolution level on the pooling outcome of the lower-resolution nodes. Let $X$ be a matrix of node features, $A$ be the adjacency matrix of the input graph, $k$ be the ratio of nodes to retain after pooling and $P$ be a learnable projection layer. Now, let us denote the input graph $G = (V, E, R)$ where $R$ represents the set of different resolutions in the graph. For each resolution $r \in R$, patches on resolution $r$ are represented as nodes. The nodes are pooled hierarchically, such that nodes in higher magnification levels are subordinate to nodes in lower levels. For all nodes, a fitness score is calculated and nodes are assigned to clusters based on spatial distance and fitness difference between nodes. Specifically, for each node $n \in N$ on resolution $r$, we use a learnable projection matrix $P$ to calculate the fitness scores as follows:

$$\phi_{n}^{r} = \tanh\left(\frac{V_{n}^{r} \cdot P}{\|P\|}\right) \tag{28}$$

where $V_{n}^{r}$ is the set of nodes to be pooled, based on the hierarchical edges between resolutions. Based on the calculated node assignments, we create a new node feature matrix $X'$. The adjacency matrix $A'$ is updated to maintain graph connectivity based on the node assignments.
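The fitness score of Eq. 28 amounts to a normalized projection followed by a tanh. The sketch below computes it for a small feature matrix; treating $P$ as a single projection vector is an assumption, and the subsequent cluster assignment of IHPool is not reproduced here.

```python
import numpy as np

def fitness_scores(X, p):
    """Fitness per node: tanh of the feature projection onto p, scaled by the norm of p."""
    return np.tanh(X @ p / np.linalg.norm(p))

# Hypothetical node features (one row per node) and projection vector.
X = np.array([[3.0, 4.0], [0.0, 0.0], [-3.0, -4.0]])
p = np.array([3.0, 4.0])
scores = fitness_scores(X, p)
print(scores)
```

Dividing by $\|P\|$ makes the score depend only on the direction of $P$, and the tanh bounds it to $(-1, 1)$, so fitness differences between nodes are comparable across resolutions.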

Wang et al. [76] proposed a new module for pooling information from cell graphs to use as embeddings for clusters of cells, called cell community forests. The authors first applied DBSCAN clustering to cell embeddings, clustering the cells based on their density. The hierarchical relationships between the cellular structures are captured by organizing the clusters into nested relationships based on their density (i.e., each dense cellular cluster is nested within a sparser, larger cluster). Cellular features are pooled hierarchically up to the sparsest cluster level and then processed by an LSTM module to construct the graph embedding for downstream predictions.

Zhao et al. [77] proposed an extension of the popular MinCutPool by adding an additional message-passing layer in the pooling equation. For acquiring the cluster assignment matrix $S$, where each node $s \in S$ will be a single node in the coarsened graph, the authors used the following equation:

$$S = H(\sigma(\hat{A} H W_{pool})) \tag{29}$$

where $\hat{A}$ is the Laplacian-normalized adjacency matrix, $H$ denotes the hidden representation matrix of the nodes, $W_{pool}$ denotes a learnable pooling weight matrix and $\sigma$ denotes a nonlinear activation function (e.g., ReLU).

Attention-based Interaction Modeling: Azadi et al. [75] proposed two attention-based methods for exchanging information between different levels of graph coarsity. The authors used a local graph, where nodes represent patches in the WSI, and a global graph, where nodes represent MinCutPool-based clusters of nodes in the local graph. Attention scores are then calculated for each node in the local and global graph. The first method the authors proposed for exchanging information between the local and global graph was Mixed Co-Attention (MCA), in which the information is not mixed directly, but weight sharing is applied between parallel processing of the local and global nodes. The second method was Mixed Guided Attention, where the idea of MCA was expanded on by directly infusing the calculated local node feature representation into the attention score calculation of the global nodes. The authors found that the mixed co-attention strategy worked optimally for their use case.

Alternative Approaches: Ding et al. [78] did not learn a hierarchical representation using pooling layers, instead using a FractalNet architecture. Here, the input graph is given to separate processing paths which consist of different numbers of GNN layers, thereby representing different semantic levels in the tissue. The hierarchy between the paths is encoded using a combination of a gated bimodal unit and an MLP mixer architecture. The former calculates a weighted combination of representations, while the latter enhances communication between the path representations and strengthens connections among different path features.

Li et al. [79] proposed a hierarchical Graph V-Net to encode hierarchy in a patch graph input. First, attention-based message-passing is used to exchange information between adjacent patches. Then, the authors used a graph coarsening operation where the node features are arranged as a 2D grid based on the spatial location of the patches. This grid is then evenly divided into submatrices and each submatrix is projected to a single feature vector using a learnable layer, which will act as a node after the coarsening operation. Notably, the Graph V-Net also uses graph upsampling layers, which add nodes until the size of the input graph has been restored, similar to what is done in U-Net architectures.
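The grid-based coarsening step can be illustrated with a short NumPy sketch. We assume here that each $s \times s$ submatrix is flattened and passed through a single linear projection, and that the grid dimensions are divisible by $s$; the actual learnable layer used by the authors may differ, and the function name is hypothetical.

```python
import numpy as np

def coarsen_grid(X, s, W):
    """Project each s-by-s block of a (rows, cols, d) feature grid to one feature vector.

    W has shape (s * s * d, d_out) and plays the role of the learnable layer.
    """
    h, w, d = X.shape
    # Split rows and columns into (block index, offset within block).
    blocks = X.reshape(h // s, s, w // s, s, d).transpose(0, 2, 1, 3, 4)
    flat = blocks.reshape(h // s, w // s, s * s * d)
    return flat @ W

# Toy 4x4 patch grid with one feature per patch.
X = np.arange(16, dtype=float).reshape(4, 4, 1)
W_mean = np.full((4, 1), 0.25)  # fixed weights that average each 2x2 block
coarse = coarsen_grid(X, s=2, W=W_mean)
print(coarse[..., 0])
```

With the fixed averaging weights used here the operation reduces to 2x2 mean pooling; with trained weights, each coarse node becomes a learned combination of the patches it covers.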

Publication | Date | Application | Learned hierarchy method
Zheng et al. [37] | 2019/10 | CBHIR | DiffPool
Zhou et al. [80] | 2019/10 | Cancer grading | DiffPool
Sureka et al. [38] | 2020/10 | Binary classification | DiffPool
Zheng et al. [81] | 2020/12 | CBHIR | DiffPool
Chen et al. [25] | 2020/09 | Survival prediction, Cancer grading | SAGPool
Jiang et al. [82] | 2021/01 | Cancer grading | DiffPool
Zheng et al. [83] | 2021/04 | CBHIR | DiffPool
Wang et al. [84] | 2021/09 | Survival prediction | SAGPool
Xiang et al. [85] | 2021/10 | Binary classification | DiffPool
Xie et al. [86] | 2022/01 | Treatment response prediction | TopKPooling
Dwivedi et al. [87] | 2022/04 | Cancer grading | SAGPool
Hou et al. [49] | 2022/06 | Binary classification | IHPool
Bai et al. [88] | 2022/08 | Cancer subtyping | MinCutPool
Zuo et al. [89] | 2022/09 | Survival prediction | SAGPool
Hou et al. [73] | 2022/09 | Cancer subtyping | Hierarchical attention mechanism
Lim et al. [90] | 2022/10 | Survival prediction | SAGPool
Wang et al. [76] | 2023/02 | Cancer subtyping | Scattering Cell Pooling
Zhao et al. [77] | 2023/02 | Cancer subtyping, Cancer grading | GCMinCut
Ding et al. [78] | 2023/02 | Cancer subtyping, Cancer grading | Fractal paths
Ding et al. [91] | 2023/04 | Survival prediction | SAGPool
Li et al. [79] | 2023/09 | Node classification | Graph V-Net
Syed et al. [59] | 2023/09 | Rheuma subtyping | SAGPool
Shi et al. [74] | 2023/09 | Cancer subtyping, mutation prediction | Hierarchical attention mechanism
Wu et al. [92] | 2023/10 | Survival prediction | SAGPool
Nakhli et al. [50] | 2023/10 | Survival prediction | SAGPool
Azadi et al. [75] | 2023/10 | Survival prediction | MinCutPool, Hierarchical attention mechanism
Hou et al. [93] | 2023/10 | Survival prediction | Matrix multiplication
Abbas et al. [94] | 2023/12 | Cancer grading | DiffPool
Xu et al. [95] | 2023/12 | Cancer subtyping | DiffPool
Azher et al. [96] | 2024/01 | Cancer grading, Survival prediction | SAGPool
Yang et al. [97] | 2024/03 | Binary classification, Survival prediction | MinCutPool
Table 1: Publications applying GNNs to histopathology which used learned hierarchies

4.1.2 Pre-established hierarchy

In pre-established hierarchy, we encode the hierarchy prior to model training. For example, we can construct multiple graphs at different levels of coarsity in the WSI, and connect them using assignment matrices, which denote how the nodes are connected between the hierarchical levels. During message-passing, the learned representations of the lower hierarchy level are aggregated and used as input for the corresponding nodes at the higher hierarchy level. We differentiate between approaches connecting graphs on different semantic levels (e.g., cells and tissues), and approaches connecting different magnifications of the WSI (e.g., 40x, 20x). An overview of publications using this approach is given in Table 2.

Semantic Hierarchies: Pati et al. [24] were the first to introduce a pre-established hierarchy in the graph to use as input for a GNN model. They constructed a cell graph, $CG$, using a nuclei segmentation map and a tissue graph, $TG$, constructed by clustering superpixels into larger tissue areas based on similarity. To model the hierarchy, they introduced an assignment matrix $S_{CG \to TG}$, such that $S_{CG \to TG}(i, j) = 1$ if a cellular node $i$ from the cell graph belongs to tissue node $j$ in the tissue graph.
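Such a binary assignment matrix is straightforward to build once each cell has been mapped to a tissue region. The NumPy sketch below constructs it from a hypothetical cell-to-tissue index vector and uses it to sum cell representations per tissue node; the sum aggregation and all names are illustrative assumptions, not a reproduction of the authors' implementation.

```python
import numpy as np

def assignment_matrix(cell_to_tissue, n_tissue):
    """Binary matrix S with S[i, j] = 1 if cell i lies in tissue region j."""
    cell_to_tissue = np.asarray(cell_to_tissue)
    S = np.zeros((len(cell_to_tissue), n_tissue))
    S[np.arange(len(cell_to_tissue)), cell_to_tissue] = 1.0
    return S

# Hypothetical example: three cells, the first two inside tissue region 0.
S = assignment_matrix([0, 0, 1], n_tissue=2)
H_cell = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # learned cell representations
H_tissue_in = S.T @ H_cell  # sum of cell representations per tissue node
print(S)
print(H_tissue_in)
```

In contrast to the learned assignment matrices of DiffPool or MinCutPool, this matrix is fixed before training and each cell belongs to exactly one tissue node.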

Wang et al. [84] introduced hierarchy by applying separate message passing operations on both a cell graph and a patch graph. As patch-level features, the authors used the cellular node representations pooled over the cells located in each patch. The authors combined hierarchy learning with pre-established hierarchy by also applying self-attention graph pooling on both the cell graph and the patch graph.

Sims et al. [98] connected a cell graph with a level-1 and level-2 patch graph, which represent patches of increasing size (400 $\mu$m, 800 $\mu$m). They define their message passing for any cellular node $i$ as $CG_i \longrightarrow L1_i \longrightarrow L2_i \longrightarrow L1_i \longrightarrow CG_i$, where each $\longrightarrow$ defines a message-passing function, $CG_i$ represents the node in the cell graph and $L1_i$, $L2_i$ represent the nodes corresponding to the level-1 patch and the level-2 patch on which this cell exists. By applying message-passing in this way, the model can exchange information between distant cells without using many message-passing layers, as the cellular nodes belonging to the same level-2 node can be 800 $\mu$m away.
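The effect of this up-and-down message-passing path can be sketched with simple mean aggregation. The snippet below is a hedged illustration: the mean-and-broadcast operators stand in for the learned message-passing functions, and the parent index arrays are hypothetical.

```python
import numpy as np

def mean_by_parent(H_child, parent, n_parent):
    """Upward step: average the child representations belonging to each parent node."""
    sums = np.zeros((n_parent, H_child.shape[1]))
    np.add.at(sums, parent, H_child)  # accumulate child rows per parent
    counts = np.bincount(parent, minlength=n_parent)[:, None]
    return sums / np.maximum(counts, 1)

def broadcast_to_children(H_parent, parent):
    """Downward step: copy each parent representation to its children."""
    return H_parent[parent]

# Three cells, each in its own level-1 patch; patches 0 and 1 share a level-2 patch.
cell_to_l1 = np.array([0, 1, 2])
l1_to_l2 = np.array([0, 0, 1])
H_cell = np.array([[1.0], [3.0], [10.0]])

H_l1 = mean_by_parent(H_cell, cell_to_l1, n_parent=3)  # CG -> L1
H_l2 = mean_by_parent(H_l1, l1_to_l2, n_parent=2)      # L1 -> L2
H_l1_ctx = broadcast_to_children(H_l2, l1_to_l2)       # L2 -> L1
context = broadcast_to_children(H_l1_ctx, cell_to_l1)  # L1 -> CG
print(context)
```

Cells 0 and 1 lie in different level-1 patches, yet after one round trip both receive a context vector that mixes their features, which shows how distant cells can exchange information without stacking many cell-level layers.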

Guan et al. [99] proposed a node-aligned hierarchical graph-to-local clustering approach, inspired by the Bag-Of-Visual-Words (BOVW) methodology in Computer Vision. Starting with a set of H&E stained WSIs, the authors first clustered the patches of each WSI into visual word bags, where each bag is denoted $B$. A local clustering approach is used that samples global clusters from each bag $B$ into local subclusters using K-means. These subclusters represent a codebook of 'visual words' representing tissues with different properties. We can use this codebook to categorize patches in input WSIs into subclusters, from which we can construct a graph. This is achieved by connecting the patches in each subcluster using inner-sub-bag edges, and the subclusters themselves using outer-sub-bag edges. This graph structure allows hierarchically modeling WSIs by applying message-passing between patches in each subcluster to retrieve representations which are pooled on a subgraph-level. Subsequently, message-passing is performed between the pooled subcluster representations themselves.

Hou et al. [73] proposed constructing a cell graph along with superpixel-based tissue graphs at two levels ($CG$, $TG_{l1}$, $TG_{l2}$). They generated features for the cell graph by using a pretrained ResNet on a patch around the nucleus centroid, while generating tissue graph representations by averaging ResNet-embeddings from all crops belonging to a superpixel. The hierarchical information flow is modeled using a Transformer block that calculates the cross-attention between the graphs at different levels.

Shi et al. [100] used graphs at four different levels of hierarchy: a tissue graph on 5x resolution, consisting of superpixels constructed using the SLIC algorithm [43], and 3 patch graphs at 5x, 10x, and 20x resolution, respectively. The 5x resolution patch graph is used to generate features for the tissue graph. Then, after applying message-passing to the 10x- and 20x patch graphs, the interaction between the different hierarchical levels is modeled using a hierarchical attention module. This module produces a tissue graph where the interactions are captured in the node features. Message-passing layers, global attention layers and a fully connected layer are applied subsequently to the tissue graph to come to a final prediction.

Gupta et al. [101] modeled a tissue graph and a cell graph together as a heterogeneous graph with cellular nodes, tissue nodes, cell-cell edges, tissue-tissue edges, and cell-tissue edges: $H = \{C, T, E_{cell \to cell}, E_{tissue \to tissue}, E_{cell \to tissue}\}$. After applying message-passing layers, they calculated the cross-attention between the cellular and tissue nodes using the transformer architecture to model the hierarchical relationships.

Abbas et al. [94] established four separate hierarchical levels, where one level is a global image analyzed using a CNN model and the other levels are cell graphs constructed at different semantic hierarchy levels: global, spanning the entire WSI ($G^{(0)}$), using patches of size $512 \times 512$ px ($G^{(1)}$), or using patches of size $256 \times 256$ px ($G^{(2)}$). For each level, a subset of the segmented cells is randomly selected to build a cell graph. After applying message-passing layers on each level separately, the outputs are combined and processed using a fully connected layer. The combined representation and the representations gathered at each cell graph level separately are then merged using an entropy weighting strategy, which weights the different representations based on the uncertainty of the model prediction given that representation.

Multiresolution Hierarchies: Xing et al. [102] constructed hierarchical patch graphs at several levels of image resolution, thus aggregating information from multiple resolution levels. Starting with a single patch, they subsampled the same patch at increasingly lower resolutions and connected each lower-resolution patch to the corresponding higher-resolution patch it was sampled from. This input graph was then used for a GNN model.

Bazargani et al. [103] introduced hierarchy into their approach by constructing separate patch graphs on 5x, 10x and 20x resolution and then performing message-passing operations both on each graph separately as well as between graphs with different resolutions.

Bontempo et al. [104] used a knowledge distillation approach combined with two patch graphs at different resolutions (high, low). They performed message-passing both hierarchically between high and low resolution and in each resolution graph itself. They treated the high-resolution graph as a ’teacher’ and the low-resolution graph as a ’student’ network, between which they optimize the KL-divergence for the bag-level predictions at each resolution.
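The distillation term described above can be sketched as a KL divergence between two bag-level class distributions. This is a minimal illustration, not the training code of [104]: the logits are made-up values, and the direction KL(teacher || student) is an assumption.

```python
import math


def softmax(logits):
    """Turn raw bag-level logits into class probabilities."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


teacher_probs = softmax([2.0, 0.5])   # hypothetical high-resolution bag logits
student_probs = softmax([1.0, 1.0])   # hypothetical low-resolution bag logits
distill_loss = kl_divergence(teacher_probs, student_probs)
```

The loss is zero when both resolutions produce the same bag-level prediction and grows as the low-resolution prediction drifts away from the high-resolution one.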

Mirabadi et al. [23] proposed modeling the pyramidal multi-magnification structure in whole slide images as a multiresolution graph, where information on both the inter-magnification and the intra-magnification levels could be modeled. They extracted patches from three magnification levels (20x, 10x and 5x), such that the patches on the higher resolutions are spatially equivalent to center crops of the patches at the lower resolutions. A RAG was constructed such that nodes on each level were connected to both their adjacent neighbors on the same resolution and the spatially corresponding lower- and higher-level patch nodes. This allowed information to be exchanged between resolutions during message passing. After message passing, a mean pooling operation was applied on each resolution level, resulting in a three-node graph. This three-node graph embedding is then used for the downstream classification task.

Figure 6: A) Pre-established hierarchy, where different graphs are constructed at different levels of coarsening, which are connected hierarchically (e.g., using an assignment matrix) [24] [102]. B) Learned hierarchy, where trainable local pooling operations sequentially coarsen the graph structure [15].
| Publication | Date | Application | Hierarchy |
|---|---|---|---|
| Pati et al. [24] | 2020/07 | ROI classification | $CG \to TG$ |
| Xing et al. [102] | 2021/08 | Cancer subtyping | $PG_{40x} \to PG_{10x} \to PG_{5x}$ |
| Wang et al. [84] | 2021/09 | Survival prediction | $CG \to PG$ |
| Sims et al. [98] | 2022/01 | ROI classification | $CG \to PG_{1} \to PG_{2}$ |
| Hou et al. [49] | 2022/06 | Binary classification | $PG_{10x} \to PG_{5x} \to PG_{thumbnail}$ |
| Guan et al. [99] | 2022/06 | Cancer subtyping | $S_{k} \to K_{G} \to B$ |
| Hou et al. [73] | 2022/09 | Cancer subtyping | $CG \to TG_{l1} \to TG_{l2}$ |
| Shi et al. [100] | 2023/01 | Cancer grading | $PG_{20x} \to PG_{10x} \to TG_{5x}$ |
| Wang et al. [76] | 2023/02 | Cancer subtyping | $CG \to CCFG$ |
| Gupta et al. [101] | 2023/07 | Cancer subtyping, binary classification | $CG \to TG$ |
| Bazargani et al. [103] | 2023/08 | Cancer subtyping | $PG_{20x} \to PG_{10x} \to PG_{5x}$ |
| Bontempo et al. [104] | 2023/10 | Binary classification | $PG_{high} \to PG_{low}$ |
| Abbas et al. [94] | 2023/12 | Cancer grading | $CG_{256px} \to CG_{512px} \to CG_{global} \to WSI_{thumbnail}$ |
| Mirabadi et al. [23] | 2024/02 | Cancer subtyping | $PG_{20x} \to PG_{10x} \to PG_{5x}$ |

Table 2: Publications applying GNNs to histopathology which used a pre-established hierarchy. All hierarchies are shown small to large, such that when $X \to Y$, entities in $X$ are subordinate to the entities in $Y$. CG: Cell Graph, PG: Patch Graph, TG: Tissue Graph.

4.2 Adaptive Graph Structure Learning

Most GNN applications in histopathology use a fixed input graph with fixed edge connectivity. While successful results have been achieved using this approach, we argue that it is suboptimal. Whether connections between nodes should exist is not clearly defined in the histopathology image, leading to the wide range of different approaches for constructing the input graphs, as previously discussed. These approaches are usually not based on biological or medical information, and thus introduce inductive bias which might not reflect the biology in the tissue. To counteract this problem, one can either adjust the message-passing equation such that some edges are given more representative power than others (e.g., using GAT [19]), or one can make the graph construction a learnable transformation. The second approach, Adaptive Graph Structure Learning (AGSL), has gained more popularity recently (Table 3). In GNNs for histopathology, the AGSL strategy employs either a learned transformation that updates the adjacency matrix or learned convolutional filters that dynamically construct the graph.

Learned Transformation: In 2020, Adnan et al. [36] introduced adaptive graph learning for the classification of lung cancer subtypes. The authors modeled the whole slide image as a fully connected graph of representative patches, using a pre-trained DenseNet for feature extraction. The graph connectivity is learned end-to-end using both global WSI context and local pairwise context between patches. Let us denote a WSI $W$ with patches $w_1, \ldots, w_n$, where for each patch $w_i$ we have a feature vector $x_i$. The authors first pooled the patch representations into a global context vector $c$ using a pooling function $\phi$ (e.g., sum):

$$c = \phi(x_1, x_2, \ldots, x_n) \tag{30}$$

The global vector $c$ is concatenated to each patch feature vector $x_i$ and jointly processed by MLP layers, which gives a feature vector $x_i^*$ that contains both local and global context information. Finally, the matrix $X^*$, which holds all feature vectors $x_i^*$, is processed using a cross-correlation layer that determines the connectivity of the output graph in $A$, where each element $a_{ij} \in A$ represents the correlation between patches $w_i$ and $w_j$ and is used as an edge weight in the learned graph structure. The learned graph can be used for any downstream task and has shown better performance than other (graph-based) MIL methods available at the time.
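The three steps above (global pooling, concatenation with the local features, pairwise correlation) can be sketched as follows. This is an illustration under stated assumptions rather than the implementation of [36]: the pooling function is a sum, the MLP layers are replaced by a caller-supplied `transform` (hypothetical), and plain inner products stand in for the cross-correlation layer.

```python
def adaptive_adjacency(X, transform):
    """Build dense edge weights from patch features X (list of equal-length lists)."""
    dim = len(X[0])
    # Eq. 30 with phi = sum: pool all patch vectors into one context vector c.
    c = [sum(x[d] for x in X) for d in range(dim)]
    # Concatenate c to every patch vector, then apply the (here fixed) transform.
    X_star = [transform(list(x) + c) for x in X]
    # Pairwise inner products stand in for the cross-correlation layer.
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X_star] for xi in X_star]


def identity(v):
    """Placeholder for the learned MLP; returns its input unchanged."""
    return v


A = adaptive_adjacency([[1.0, 0.0], [0.0, 1.0]], identity)
```

Because every $x_i^*$ contains the shared context vector, two patches with orthogonal local features still receive a non-zero edge weight in this toy setting.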

Hou et al. [73] described a spatial-hierarchical GNN framework that could dynamically learn the graph structure during model training. Their Dynamic Structure Learning module first embeds both the node features $V$ and the nuclear centroid coordinates $P$ into a single representation $J$, using the following equation:

$$J = \mathrm{Concat}[\sigma(P^T W_1), \sigma(V^T W_2)] \tag{31}$$

where $W_1$ and $W_2$ are learned weight matrices and $\sigma$ denotes a non-linear activation function. Next, the authors applied a distance-thresholded k-NN algorithm on the acquired embedding $J$ to determine the edge connectivity. Given a set of nodes $V$, a set of edges $E$, a distance threshold $d_{\text{min}}$, and the number of neighbors $k$, the edges in $E$ are determined as follows:

$$e_{uv} \in E \iff \{u, v \in V \mid \|v - u\|_2 \leq \min(d_k, d_{\text{min}})\} \tag{32}$$

Here, $d_k$ denotes the distance between node $u$ and its $k$-th closest neighbor.
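The edge rule in Eq. 32 can be sketched directly. The function below is a minimal illustration that takes already-computed embedding vectors as input (the learned embedding $J$ is not reproduced); the name `thresholded_knn_edges` and the toy coordinates are hypothetical.

```python
import math


def thresholded_knn_edges(embeddings, k, d_min):
    """Directed edges (u, v) with ||v - u||_2 <= min(d_k, d_min), as in Eq. 32."""
    edges = set()
    for u, eu in enumerate(embeddings):
        dists = sorted(
            (math.dist(eu, ev), v) for v, ev in enumerate(embeddings) if v != u
        )
        if not dists:
            continue
        # d_k: distance from u to its k-th closest neighbour.
        d_k = dists[min(k, len(dists)) - 1][0]
        radius = min(d_k, d_min)
        edges.update((u, v) for d, v in dists if d <= radius)
    return edges


points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
edges = thresholded_knn_edges(points, k=2, d_min=1.5)
```

In this toy example the isolated point at `(10.0, 0.0)` receives no edges, because its nearest neighbours all lie beyond the distance threshold.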

Liu et al. [105] proposed learning the graph structure based on the cosine similarity between the transformed patch feature vectors. Given an input feature matrix $X$ and a transformation matrix $T$, a projected matrix $P = XT$ is created. The cosine similarity between each pair of patches in $P$ is then calculated and stored in a symmetric adjacency matrix $A_L$, which holds the ’edge strength’ between any two patches in $P$. The edge strength is then thresholded using a set threshold $\epsilon$:

$$e_{uv} \in E \iff \{u, v \in V \mid \frac{P[u] \cdot P[v]}{\|P[u]\| \cdot \|P[v]\|} \geq \epsilon\} \tag{33}$$

where $P[u]$ and $P[v]$ denote the projected feature vectors of nodes $u$ and $v$, respectively. Note that the transformation matrices are learned, which allows the graph structure to be adapted during model training.
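A minimal sketch of this similarity-based construction is given below. It assumes a fixed transformation matrix in place of the learned one and keeps a pair of patches when their cosine similarity is at least $\epsilon$, which is how the thresholding step is read here; the function names are hypothetical and zero-norm feature vectors are not handled.

```python
import math


def matmul(X, T):
    """Plain matrix product P = X T for lists of lists."""
    return [[sum(x * t for x, t in zip(row, col)) for col in zip(*T)] for row in X]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm


def learned_similarity_graph(X, T, eps):
    """Return the similarity matrix A_L and the undirected edges kept by the threshold."""
    P = matmul(X, T)
    A_L = [[cosine(pu, pv) for pv in P] for pu in P]
    n = len(P)
    edges = {(u, v) for u in range(n) for v in range(u + 1, n) if A_L[u][v] >= eps}
    return A_L, edges


X = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
T = [[1.0, 0.0], [0.0, 1.0]]  # identity stands in for the learned transformation
A_L, edges = learned_similarity_graph(X, T, eps=0.9)
```

Only the first two patches, whose projected features point in nearly the same direction, end up connected. During training, updating `T` changes `P` and therefore which pairs pass the threshold.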

CNN-filter Based: Gao et al. [106] and Ding et al. [78] both use a very different approach, where the learned feature maps generated by a CNN are used as the basis for the graph construction. More specifically, they treat the units in each feature map as nodes, with the features at each spatial location concatenated across channels into a node feature vector. After this concatenation, the k-NN algorithm is used to connect the nodes. By basing the graph structure on learned CNN feature maps, the graph structure is learned by training the CNN and, since each unit in the feature maps corresponds to a spatial region in the input image, the constructed graph can capture spatial dependencies between regions in the WSI. Given the acquired node embedding matrix $X \in \mathbf{R}^{N \times C}$, where $N$ is the number of nodes and $C$ the number of channels, the existence of edges is determined as follows:

$$e_{uv} \in E \iff \{u, v \in V \mid \|u_f - v_f\|_2 \leq d_k\} \tag{34}$$

where $u_f$ and $v_f$ are the feature vectors of nodes $u$ and $v$, and $d_k$ is the distance between node $u$ and the $k$-th closest neighbor of $u$.
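The conversion from a feature map to a k-NN graph can be sketched as follows. This is a simplified illustration, not the code of [106] or [78]: the feature map is a small hand-written nested list indexed as `fmap[channel][row][column]`, and the helper names are hypothetical.

```python
import math


def feature_map_to_nodes(fmap):
    """Turn a C x H x W feature map into H*W node vectors of length C."""
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[fmap[c][i][j] for c in range(C)] for i in range(H) for j in range(W)]


def knn_edges(nodes, k):
    """Directed edges (u, v) with ||u_f - v_f||_2 <= d_k, as in Eq. 34."""
    edges = set()
    for u, fu in enumerate(nodes):
        dists = sorted((math.dist(fu, fv), v) for v, fv in enumerate(nodes) if v != u)
        if not dists:
            continue
        # d_k: distance from u to its k-th closest neighbour in feature space.
        d_k = dists[min(k, len(dists)) - 1][0]
        edges.update((u, v) for d, v in dists if d <= d_k)
    return edges


# Toy feature map with 2 channels, 1 row and 3 columns.
fmap = [[[0.0, 1.0, 5.0]], [[0.0, 0.0, 0.0]]]
nodes = feature_map_to_nodes(fmap)
edges = knn_edges(nodes, k=1)
```

Since the node vectors come straight from the CNN activations, any update to the CNN weights changes the distances and therefore the resulting graph.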

| Publication | Date | Application | Adaptive learning mechanism |
|---|---|---|---|
| Adnan et al. [36] | 2020/05 | Binary classification | Learned transformation |
| Gao et al. [106] | 2022/02 | Cancer subtyping | CNN-filter based |
| Hou et al. [73] | 2022/09 | Cancer subtyping | Learned transformation |
| Ding et al. [78] | 2023/02 | Cancer subtyping, Cancer grading | CNN-filter based |
| Liu et al. [105] | 2023/04 | Survival prediction | Learned transformation |

Table 3: Publications applying GNNs in histopathology and using adaptive graph structure learning strategies.

4.3 Multimodal GNNs

In histopathology diagnostics, different modalities are often combined to assist in clinical decision-making and prognostic predictions. While most applications of GNNs in histopathology focus solely on H&E image data, approaches considering multiple modalities have gained popularity recently. Combining data from multiple modalities can help increase model accuracy and generalization. Graph Neural Networks are especially suitable for multimodal integration, as data from different modalities can be easily combined in the node and edge feature vectors [107]. In the last few years, multiple approaches have combined IHC-stained biopsy images with H&E-stained biopsy images, while other approaches incorporated spatial transcriptomics or genetic data in the model input. We differentiate between Stain multimodality, where the same whole slide images with different stainings (e.g., IHC) are combined, and Full multimodality, where the modalities are not based on WSIs (e.g., CT-scans, gene expression data). An overview of the multimodal GNNs in histopathology is given in Table 4.

An important challenge in multimodal integration in Deep Learning models is how and where in the model architecture data from different modalities should be combined, which we call fusion. In a GNN context, we broadly differentiate between early fusion, where data from different modalities is combined prior to message passing, and late fusion, where data is combined after the message-passing steps (Figure 7).
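The difference between the two fusion points can be made concrete with a small sketch. It assumes two modalities defined on the same nodes, concatenation as the fusion operator, and a single parameter-free mean-aggregation step in place of learned message-passing layers; all function names are hypothetical.

```python
def message_pass(feats, neighbors):
    """One mean-aggregation step: each node averages itself with its neighbours."""
    out = []
    for u, fu in enumerate(feats):
        group = [fu] + [feats[v] for v in neighbors[u]]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out


def mean_pool(feats):
    """Average all node vectors into one graph-level vector."""
    return [sum(col) / len(feats) for col in zip(*feats)]


def early_fusion(mod_a, mod_b, neighbors):
    """Concatenate modalities per node, then run message passing once."""
    fused = [a + b for a, b in zip(mod_a, mod_b)]
    return mean_pool(message_pass(fused, neighbors))


def late_fusion(mod_a, mod_b, neighbors):
    """Run message passing per modality, then concatenate the pooled outputs."""
    return mean_pool(message_pass(mod_a, neighbors)) + mean_pool(message_pass(mod_b, neighbors))


mod_a = [[1.0], [3.0]]      # e.g. image-derived node features (toy values)
mod_b = [[10.0], [20.0]]    # e.g. a second modality on the same nodes (toy values)
neighbors = [[1], [0]]
early = early_fusion(mod_a, mod_b, neighbors)
late = late_fusion(mod_a, mod_b, neighbors)
```

With this purely linear aggregation the two outputs coincide. The strategies only start to differ once learned, non-linear layers are used, because early fusion then lets those layers mix modalities during message passing.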

We broadly categorize the multimodal GNNs into four groups: Pathomic fusion based, which uses the pathomic fusion strategy, popularized by Chen et al. [25], Early fusion, Late fusion and Modality prediction, encompassing models that predict one modality using another. Models that do not directly fuse modalities but use predictions from one modality to drive how the other modalities are processed are considered Late fusion models.

4.3.1 Full multimodality

Pathomic Fusion: Chen et al. [25] integrated whole slide image information together with RNA-Seq counts and copy number variant (CNV) information. They used this combined information for cancer subtyping and survival analysis on clear cell renal cell carcinoma and glioma TCGA datasets. Their multimodal model fused information from three different modules: a CNN-based image module, a GNN-based cell graph module, and a genomic module, which took CNV and RNA-seq information as input. In the image module, a set of WSI patches was used as input for an ImageNet-pretrained VGG19 CNN model optimized for cancer grading and survival prediction. The cell graph module first segmented the nuclei in the image, constructed a graph using these nuclei, and used message-passing layers to learn a graph representation. Lastly, in the genomic module, a self-normalizing neural network was trained on a vector of CNV and RNA-seq information to learn a genomic representation. Their approach for multimodal fusion, which they call Pathomic fusion, models interactions between modalities via the Kronecker product of attention-gated representations. The attention gating is applied to the hidden representation $h_m$ of modality $m$ by learning a transformation $W_{ign \to m}$ that assigns an importance score to each modality, which we denote as $z_m$:

$$\begin{aligned} h_{m,\text{gated}} &= z_m \ast h_m, \quad \forall m \in \{i, g, n\} \\ \text{where,}\quad h_m &= \text{ReLU}(W_m \cdot h_m) \\ z_m &= \sigma(W_{\text{ign} \rightarrow m} \cdot [h_i, h_g, h_n]) \end{aligned} \tag{35}$$

where $h_i$, $h_g$, and $h_n$ are the gated representation vectors of the image module, graph module, and genomic module, respectively. The authors calculated the Kronecker product of these vectors to get a combined representation $h_{fusion}$:

$$h_{fusion} = \begin{pmatrix} h_i \\ 1 \end{pmatrix} \otimes \begin{pmatrix} h_g \\ 1 \end{pmatrix} \otimes \begin{pmatrix} h_n \\ 1 \end{pmatrix} \tag{36}$$

where $\otimes$ denotes the outer product. The result, $h_{fusion}$, is a three-dimensional tensor that can then be passed to a fully connected layer for classification tasks or survival prediction.
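The fusion step in Eq. 36 can be illustrated on its own. The sketch below covers only the outer product of the three vectors, each extended with a constant 1; the attention gating of Eq. 35 is omitted, and the input values and the helper names `append_one` and `kronecker_fusion` are hypothetical.

```python
def append_one(h):
    """Extend a representation vector with a constant 1, as in Eq. 36."""
    return list(h) + [1.0]


def kronecker_fusion(h_i, h_g, h_n):
    """Three-way outer product of the 1-extended modality vectors."""
    a, b, c = append_one(h_i), append_one(h_g), append_one(h_n)
    return [[[x * y * z for z in c] for y in b] for x in a]


# Toy one-dimensional representations for image, graph and genomic modules.
h_fusion = kronecker_fusion([2.0], [3.0], [5.0])
```

The appended 1 is what keeps lower-order terms in the result: `h_fusion[0][1][1]` is the image value alone, `h_fusion[0][0][1]` the image-graph interaction, and `h_fusion[0][0][0]` the interaction of all three modalities.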

Jiang et al. [108] predicted EGFR gene mutations in lung cancer by augmenting the approach used by Chen et al. [25]. The authors' approach differs from Chen et al. by not using genomic data but instead using clinical information (e.g., gender, age) as the third modality, next to a spatial cell graph and whole slide image. Compared with a previous model from the same group [109], which used a cell graph and image module but no clinical features, the multimodal model showed considerable performance increases.

Early Fusion: Azher et al. [96] integrated spatial transcriptomics data with accompanying H&E WSI data to predict survival and grade cancer in colorectal cancer. The authors first constructed an embedding model that used an ImageNet-pretrained CNN to encode H&E patches and fully connected layers to encode spatial gene expression data at the same location. They then optimized a projection layer to merge the data from these modalities into a single vector using a combination of unimodal and cross-modal SimCLR loss functions. This effectively trained the model to encode a cross-modal embedding vector. The acquired embeddings were used as node vectors in a GNN model for downstream tasks. The authors showed that using expression-aware embeddings improved model performance on all tasks, indicating that pretraining using coupled H&E WSIs and spatial transcriptomics datasets can help retrieve more discriminative embeddings for downstream tasks.

Late Fusion: Zuo et al. [89] integrated H&E stained WSIs with genomic biomarker information. Specifically, they constructed a graph of patches containing Tumor Infiltrating Lymphocytes (TILs) and analyzed this graph using a GNN. Genomic data consisted of mRNA gene counts, which were transformed to a gene co-expression module matrix using the lmQCM algorithm. They then applied a concrete autoencoder model to the co-expression matrix to identify survival-associated features. The GNN and autoencoder outputs were then fused using a self-attention layer.

De et al. [110] combined MRI- and H&E stained WSIs of brain tumors to predict the type of brain cancer. The modalities were not directly fused; instead, the authors first used a 3D-CNN model to detect whether the cancer was one of the possible cancer types (Glioblastoma). If this was the case, the model simply outputs glioblastoma as its prediction. When this was not the case, a patch graph was constructed from the H&E image which was used as input for a GNN model. Finally, this GNN model could predict one of the remaining subtypes (Normal, Astrocytoma, or Oligodendroglioma).

Xie et al. [111] combined gene expression with H&E whole slide image data for survival prediction in gastric cancer. Here, the authors first processed ResNet-based WSI tile features and a gene expression matrix separately using MLP layers. Then the interaction between each WSI patch and each gene feature vector was calculated using a cross-modal attention layer. After this processing, the data from both modalities was aggregated using a MIL-aggregation module and finally fused using concatenation. The fused embeddings were used to construct a patient graph, based on the similarity of the fused embeddings between the patients. A GNN was used to process this graph, which produced a survival prediction.

Zheng et al. [112] fused gene-expression signatures with a WSI-patch graph using their Genomic Attention Module approach. After message passing on the patch graph, the pairwise interactions between each patch and each individual gene signature were modeled using a self-attention mechanism. This allowed the model to learn the interactions between spatial tissue regions and gene signatures, and allowed the authors to visualize which gene signatures were associated with certain regions in the WSI.
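Attention between patches and gene signatures, as used in the two approaches above, can be sketched with scaled dot-product attention. This is a generic illustration and not the Genomic Attention Module of [112] or the cross-modal layer of [111]: the learned query, key and value projections are omitted, both inputs are assumed to share one embedding dimension, and all values are made up.

```python
import math


def patch_gene_attention(patches, genes):
    """For each patch, attend over gene signature vectors (scaled dot product)."""
    d = len(genes[0])
    outputs, weights = [], []
    for q in patches:
        scores = [sum(a * b for a, b in zip(q, g)) / math.sqrt(d) for g in genes]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        w = [e / total for e in exps]          # one weight per gene signature
        weights.append(w)
        # Weighted sum of gene vectors gives the gene-aware patch representation.
        outputs.append([sum(wi * g[j] for wi, g in zip(w, genes)) for j in range(d)])
    return outputs, weights


patches = [[1.0, 0.0]]                 # one toy patch embedding
genes = [[10.0, 0.0], [0.0, 10.0]]     # two toy gene signature embeddings
outputs, weights = patch_gene_attention(patches, genes)
```

The per-patch weight vectors are what makes the region-to-signature association inspectable: each row of `weights` indicates which gene signature a given patch attends to most.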

Modality Prediction: Fatemi et al. [113] integrated spatial transcriptomic data with co-localized H&E WSI data to characterize spatial tumor heterogeneity in colorectal cancer. They achieved this by training a model to predict the spatial gene expression from the H&E WSI. The authors tried to predict the spatial gene expression using both a CNN- and GNN-based network and showed that for this task, the CNN-based methods worked better.

Gao et al. [114] predicted spatial transcriptomic data using H&E images by integrating image and cell graph data using CNN- and GNN-based models. The authors showed that integrating the graph- and image-based information significantly improved performance over using either one alone.

4.3.2 Stain multimodality

Early fusion: Li et al. [41] fused information from Second-Harmonic Generation (SHG) microscopy images and H&E WSIs to differentiate between pancreatic ductal adenocarcinoma and chronic pancreatitis in pancreatic cancer. The images from both modalities were registered and for each modality a separate graph was constructed. The features from each modality were combined into node features for the input graph, where nodes represented registered patches in both modalities. An ImageNet-pretrained ResNet model was used to retrieve features from the H&E patches, while collagen fiber-specific handcrafted features were extracted for each SHG patch. An H&E-SHG graph was constructed where the node vectors contained the concatenation of the patch features from both modalities. This graph was used in a GNN model which discriminated between the two classes.

Gallagher-Syed et al. [59] integrated data from IHC- (CD138, CD68, CD20) and H&E stained synovial biopsy samples to predict a Rheumatoid Arthritis subtype using a GNN model. Information between the staining modalities was exchanged by modeling each patch, from each staining, as a node and connecting the nodes based on their feature similarity to get a single multistain graph. The authors showed that the features across stains were similar enough to cause nodes from different stainings to mix in the graph and, thus, enable information exchange between the modalities in the message-passing layers of the GNN. The authors used the multimodal graph as input for a GNN model whose output was used to predict the Rheumatoid Arthritis subtype.

Late fusion: Dwivedi et al. [87] combined trichrome- (TC) and H&E stainings of liver biopsies to predict liver fibrosis. The authors experimented with different modality fusion techniques. Their experiments showed that their late concatenation or addition and the pathomic fusion strategy proposed by Chen et al. [25] performed the best for fibrosis prediction. In the late and pathomic fusion strategies, they separately processed both the H&E and TC tissues as graphs using a GNN and then fused the features from both modalities together.

Qiu et al. [115] combined information from H&E stainings, multiphoton microscopy (MP), and two-photon excited fluorescence (TPEF) applied to the same breast cancer biopsies. Instead of fusing the modalities in the model itself, the authors determined tumor-associated collagen signatures from the three modalities in different regions to calculate an 8-bit binary vector for each region. The sampled regions were treated as graph nodes with the binary vector as node attributes. Using these nodes, a fully connected graph was constructed and used as input for a GNN model. The model's output could be used for survival prediction.

Modality prediction: Pati et al. [116] used a generative approach to virtually predict IHC stained tissue images from H&E WSIs, and then used a multimodal GNN Transformer model to perform survival prediction and cancer grading tasks in prostate cancer, breast cancer, and colorectal cancer. The authors used three strategies for fusion (no fusion, early fusion, late fusion) and found that early fusion works optimally for both tasks. In early fusion, the authors combined ImageNet-pretrained ResNet features from the same patch in all modalities to form the node features in the input graph. In late fusion, meanwhile, all modalities were assigned a separate input graph, which was processed separately using the GNN Transformer model. Subsequently, the output features were combined. The authors hypothesized that early fusion allowed the model to learn multimodal spatial interactions during message passing, causing a performance gain compared to the other fusion strategy.
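The difference between the two fusion strategies can be made concrete with a minimal NumPy sketch. It assumes a single GCN-style message-passing layer and mean pooling as readout; these choices, the function names, and the feature sizes are illustrative and are not taken from any of the cited works.

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetrically normalize A + I (GCN-style); A is a dense 0/1 matrix."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gnn_layer(A_norm, X, W):
    """One message-passing layer: aggregate neighbours, project, apply ReLU."""
    return np.maximum(A_norm @ X @ W, 0.0)

def early_fusion(A, X_a, X_b, W):
    """Concatenate modality features per node, then run one shared GNN."""
    X = np.concatenate([X_a, X_b], axis=1)
    return gnn_layer(normalized_adjacency(A), X, W).mean(axis=0)

def late_fusion(A_a, X_a, W_a, A_b, X_b, W_b):
    """Run one GNN per modality, then concatenate the graph-level readouts."""
    h_a = gnn_layer(normalized_adjacency(A_a), X_a, W_a).mean(axis=0)
    h_b = gnn_layer(normalized_adjacency(A_b), X_b, W_b).mean(axis=0)
    return np.concatenate([h_a, h_b])

rng = np.random.default_rng(0)
n = 5
A = np.zeros((n, n))
for i in range(n - 1):                        # simple path graph over 5 patches
    A[i, i + 1] = A[i + 1, i] = 1.0
X_he = rng.normal(size=(n, 8))                # illustrative features, modality 1
X_ihc = rng.normal(size=(n, 4))               # illustrative features, modality 2

z_early = early_fusion(A, X_he, X_ihc, rng.normal(size=(12, 6)))
z_late = late_fusion(A, X_he, rng.normal(size=(8, 3)),
                     A, X_ihc, rng.normal(size=(4, 3)))
```

In the early variant, every message already carries features from both modalities, which is the property the authors hypothesize to be beneficial; in the late variant, the modalities only meet after pooling.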

| Publication | Date | Application | Fusion | Modalities |
| --- | --- | --- | --- | --- |
| Chen et al. [25] | 2020/09 | Survival prediction, Cancer subtyping | Late (Pathomic fusion) | H&E WSI, Gene expression, CNV |
| Dwivedi et al. [87] | 2022/04 | Cancer grading | Late | H&E WSI, TC WSI |
| Qiu et al. [115] | 2022/07 | Survival prediction | Early | H&E WSI, MP, TPEF |
| Zuo et al. [89] | 2022/09 | Survival prediction | Late (Self-attention) | H&E WSI, Gene expression |
| De et al. [110] | 2022/10 | Cancer subtyping | None | H&E WSI, MRI |
| Li et al. [41] | 2022/11 | Cancer subtyping | Early | H&E WSI, SHG |
| Xie et al. [111] | 2022/12 | Survival prediction | Late | H&E WSI, Gene expression |
| Fatemi et al. [113] | 2023/03 | ST-prediction | None | H&E WSI, ST |
| Jiang et al. [108] | 2023/03 | Mutation prediction | Late (Pathomic fusion) | H&E WSI, clinical data |
| Gao et al. [114] | 2023/07 | ST-prediction, Survival prediction | None | H&E WSI, ST |
| Gallagher-Syed et al. [59] | 2023/09 | Rheumatoid Arthritis subtyping | Early | H&E WSI, IHC WSI |
| Pati et al. [116] | 2023/12 | Survival prediction, Cancer grading | Early, Late | H&E, virtual IHC |
| Azher et al. [96] | 2024/01 | Survival prediction, Cancer grading | Early | H&E WSI, ST |
| Zheng et al. [112] | 2024/01 | Survival prediction | Late | WSI, Gene expression |
Table 4: Applications of Multimodal GNNs in histopathology. CNV: Copy Number Variation, TC: Trichrome, MP: MultiPhoton microscopy, TPEF: two-photon excited fluorescence microscopy, MRI: Magnetic Resonance Imaging, SHG: Second-Harmonic Generation microscopy, ST: Spatial Transcriptomics, IHC: Immunohistochemistry
Figure 7: Early fusion (A) versus late fusion (B). In early fusion, information from different modalities is typically integrated in the node features before message passing, enabling the modeling of multimodal interactions. In late fusion, modalities are processed separately and combined before the final model layers, which calculate the model prediction. FCN: Fully Connected Layer.

4.4 Higher-order graphs

While graphs have been shown to be an adequate format for representing histopathology slides, they are limited by the fact that only pairwise relations can be modeled. Furthermore, the entities in a graph can solely be modeled as nodes and edges. These limitations have inspired extensions to the graph modeling framework, which are collectively known as higher-order graphs. Examples of higher-order graphs are hypergraphs, cellular complexes, and combinatorial complexes. To allow learning from these higher-order graph structures, message-passing frameworks called topological neural networks (TNNs) have been developed [117].

In histopathology, TNNs have not yet been widely adopted, but there has been a steadily increasing number of publications that model WSIs as hypergraphs. Hypergraphs extend the graph modeling framework with hyperedges, which can connect sets containing an arbitrary number of nodes in the graph. This allows hypergraphs to model relations that involve more than two entities at once. Deep learning on hypergraphs can be done using hypergraph neural network architectures, such as HGNN [118] and HyperGAT [119]. We provide an overview of publications using higher-order graphs in histopathology in Table 5.

Let us denote a hypergraph as $G=(V,E_{\text{hyp}})$, which consists of a set of nodes $V$ and a set of hyperedges $E_{\text{hyp}}$. Each hyperedge in $E_{\text{hyp}}$ is a non-empty subset of $V$, allowing connections between any number of vertices. For example, a hypergraph with vertices $V=\{v_1,v_2,v_3,v_4\}$ can have hyperedges $E_{\text{hyp}}=\{\{v_1,v_2\},\{v_2,v_3,v_4\},\{v_1,v_3,v_4\}\}$, each expressing a relationship between multiple nodes simultaneously. We denote the connectivity of a hypergraph using an incidence matrix $H\in\{0,1\}^{|V|\times|E_{\text{hyp}}|}$ whose entries are defined as:

$$h(v,e)=\begin{cases}1, & \text{if } v\in e\\ 0, & \text{if } v\notin e\end{cases} \tag{37}$$

for nodes $v\in V$ and edges $e\in E_{\text{hyp}}$. For any node $v$, its degree is defined as $d(v)=\sum_{e\in E_{\text{hyp}}}h(v,e)$. Similarly, for any edge $e\in E_{\text{hyp}}$, its degree is defined as $d(e)=\sum_{v\in V}h(v,e)$. These degrees are stored in diagonal matrices $D_e$ and $D_v$, which contain the edge degrees and node degrees, respectively. Lastly, we denote the matrix of node features as $X$. The decision on which nodes to connect with a hyperedge is usually based on feature similarity or spatial distance (i.e., closely related nodes are connected by a single hyperedge). Feng et al. [118] introduced the hypergraph neural network (visualized in Figure 8), which defines a message passing operation on hypergraphs as follows:

$$\mathbf{X}^{(l+1)}=\sigma\left(\mathbf{D}_v^{-1/2}\,\mathbf{H}\,\mathbf{W}\,\mathbf{D}_e^{-1}\,\mathbf{H}^{\top}\,\mathbf{D}_v^{-1/2}\,\mathbf{X}^{(l)}\,\mathbf{\Theta}^{(l)}\right) \tag{38}$$

where $W$ is a diagonal matrix of (learnable) hyperedge weights, $\sigma$ denotes a nonlinear activation function, and $\Theta$ is a learnable filter matrix used for feature extraction. After applying message passing, we have an updated feature matrix $X$. This can then be used to obtain features on the hyperedge level using the equation $X^{(l+1)}_{he}=H^{\top}X$. Finally, the updated node-level embeddings are acquired by multiplying the incidence matrix with the hyperedge features: $X^{(l+1)\prime}=H\,X^{(l+1)}_{he}$, where the order of the product follows from $H$ having one row per node and one column per hyperedge.
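As a worked example of the definitions above, the following NumPy sketch builds the incidence matrix of the four-node example hypergraph, computes the degrees, and applies one HGNN propagation step in the form of Eq. 38. It assumes ReLU for $\sigma$, hyperedge weights initialized to one, and unweighted degrees as defined in the text; the function names are illustrative and not part of any library.

```python
import numpy as np

def incidence_matrix(num_nodes, hyperedges):
    """Build H (|V| x |E|) with H[v, e] = 1 if node v belongs to hyperedge e."""
    H = np.zeros((num_nodes, len(hyperedges)))
    for e, members in enumerate(hyperedges):
        H[list(members), e] = 1.0
    return H

def hgnn_operator(H, w=None):
    """Dv^-1/2 H W De^-1 H^T Dv^-1/2 with diagonal hyperedge weights w."""
    w = np.ones(H.shape[1]) if w is None else np.asarray(w, dtype=float)
    d_v = H.sum(axis=1)                       # node degrees d(v)
    d_e = H.sum(axis=0)                       # hyperedge degrees d(e)
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(d_v))
    # W and De^-1 are both diagonal, so their product is diag(w / d_e).
    return Dv_inv_sqrt @ H @ np.diag(w / d_e) @ H.T @ Dv_inv_sqrt

def hgnn_layer(X, H, Theta, w=None):
    """One propagation step of Eq. 38 with ReLU as the nonlinearity."""
    return np.maximum(hgnn_operator(H, w) @ X @ Theta, 0.0)

# Example from the text, zero-indexed: {v1,v2}, {v2,v3,v4}, {v1,v3,v4}.
hyperedges = [(0, 1), (1, 2, 3), (0, 2, 3)]
H = incidence_matrix(4, hyperedges)
G = hgnn_operator(H)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 5))                   # illustrative node features
Theta = rng.normal(size=(5, 2))               # illustrative filter matrix
X_next = hgnn_layer(X, H, Theta)
```

In this example every node belongs to exactly two hyperedges, so each row of the propagation operator `G` sums to one and the step acts as a weighted average over nodes that share a hyperedge.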

Figure 8: Graphical overview of the hypergraph neural network framework [118]. First, message passing is applied between all nodes connected to the same hyperedge. Then, the learned features are calculated at the hyperedge level. Finally, the hyperedge-level features are used to calculate the new node features.

Di et al. [120] were the first to model WSIs as hypergraphs. They used their hypergraph approach for survival prediction in lung and brain cancer datasets. The authors started by constructing sets of $K$ similar patches based on the Euclidean distance between the feature vectors, which were retrieved using an ImageNet-pretrained ResNet model. $N$ hyperedges were then used to connect the patches in each of the sets. The authors then combined the node feature matrix $X$ with the hypergraph structure, captured in $H$, and updated the features using a series of hypergraph convolutional layers (HGNN). The representations acquired after message passing were then used for the downstream survival prediction task. The authors showed that their hypergraph-based method outperformed other CNN- and GNN-based frameworks for survival prediction.
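A feature-based hypergraph of this kind can be sketched as follows. The sketch assumes one hyperedge per patch, containing that patch and its $k$ nearest patches in feature space; whether the centroid patch is counted among the $K$ members is not specified above, so this is an assumption, and the function name `knn_hypergraph` is illustrative.

```python
import numpy as np

def knn_hypergraph(X, k):
    """One hyperedge per patch: the patch plus its k nearest patches (Euclidean)."""
    n = X.shape[0]
    sq = (X ** 2).sum(axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared Euclidean distances
    np.fill_diagonal(D2, np.inf)                     # a patch is not its own neighbour
    nbrs = np.argsort(D2, axis=1)[:, :k]
    H = np.zeros((n, n))                             # rows: nodes, columns: hyperedges
    for e in range(n):
        H[e, e] = 1.0                                # centroid patch of hyperedge e
        H[nbrs[e], e] = 1.0                          # its k nearest neighbours
    return H

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 32))                        # illustrative patch features
H = knn_hypergraph(X, k=3)
```

The resulting incidence matrix can be passed directly to an HGNN-style layer together with the node feature matrix.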

Bakht et al. [121] followed with the construction of a patch-based hypergraph for the classification of patches in colorectal cancer. They used an ImageNet-pretrained VGG-19 model to extract features for each WSI patch. Given a fixed neighbor parameter $k$, their hypergraph construction strategy starts by defining the distance between any two patches $i$, $j$ as:

$$d_k(i,j)=\exp\left(-\frac{\|x_i-x_j\|_2^2}{2\sigma^2}\right) \tag{39}$$

where $x_i$ and $x_j$ represent the feature vectors of patches $i$ and $j$, respectively, and $\sigma$ is a bandwidth parameter. Then, the authors calculated the vertex-edge probabilistic incidence matrix, which determines the probability of a node $v$ being connected by hyperedge $e$:

$$h(v,e)=\begin{cases}\exp\left(-\dfrac{d}{p_{\text{max}}\,d_{\text{avg}}}\right), & \text{if } v\in e\\ 0, & \text{if } v\notin e\end{cases} \tag{40}$$

Here, $d$ denotes the distance between the current node $v$ and the neighboring node, $p_{\text{max}}$ denotes the maximum probability, and $d_{\text{avg}}$ is the average distance between all $k$ nearest neighbors. Finally, they use this incidence matrix to calculate the node and edge degrees:

$$d(v)=\sum_{e\in E}h(v,e),\quad d(e)=\sum_{v\in V}h(v,e) \tag{41}$$

The degrees are combined into matrices $D_v$ and $D_e$, which are used, together with the incidence matrix $H$ and node feature matrix $X$, in three HGNN message passing layers. The output of these layers was used to predict the label of patches in the WSI.
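The probabilistic incidence matrix and the resulting degrees can be sketched in NumPy. How the quantity in Eq. 39 enters the distance $d$ of Eq. 40 is not fully specified above, so the sketch accepts any pairwise distance matrix `D` and, as an assumption, uses plain Euclidean distances in the usage example. It further assumes one hyperedge per centroid patch and $p_{\text{max}}=1$; the function names are illustrative.

```python
import numpy as np

def probabilistic_incidence(D, k, p_max=1.0):
    """H[v, e] = exp(-D[e, v] / (p_max * d_avg)) for the k nearest neighbours v of centroid e."""
    n = D.shape[0]
    H = np.zeros((n, n))
    for e in range(n):
        d = D[e].copy()
        d[e] = np.inf                          # select neighbours other than the centroid
        nbrs = np.argsort(d)[:k]
        d_avg = D[e, nbrs].mean()              # average distance to the k neighbours
        H[nbrs, e] = np.exp(-D[e, nbrs] / (p_max * d_avg))
        H[e, e] = 1.0                          # centroid: distance 0 gives exp(0) = 1
    return H

def degrees(H):
    """Node degrees sum over hyperedges (rows); edge degrees sum over nodes (columns)."""
    return H.sum(axis=1), H.sum(axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))                    # illustrative patch features
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
H = probabilistic_incidence(D, k=3)
d_v, d_e = degrees(H)
```

Compared with a binary incidence matrix, nodes that lie far from the centroid of a hyperedge contribute with a smaller weight during message passing.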

Di et al. [122] then expanded on their previous work by using multiple hypergraphs that are fused and used as input for the message passing layers. Specifically, they construct a topological hypergraph and a phenotype (feature-based) hypergraph. The authors sampled patches sequentially from the tissue boundary to the tissue center and grouped the patches in the same sequence step into the same topological area. The topological hypergraph is constructed by connecting neighboring patches with a hyperedge if they belong to the same topological area. The phenotype hypergraph, meanwhile, is constructed using K-NN based on the vector similarity between the patch features. The two hypergraphs are then concatenated to form a total incidence matrix $H$. For processing the constructed hypergraph, the authors use max-mask convolutional layers, which are defined in four iterative steps:

  1. Hyperedge Feature Gathering: First, hyperedge-level features are formed by multiplying the hypergraph incidence matrix ($H$) and the node feature matrix ($F_v^{(l)}$). This step aggregates the information from nodes connected by each hyperedge, resulting in hyperedge-level features ($F_e^{(l)}$).

  2. Max-Mask Operation: After gathering hyperedge-level features, a max-mask operation is performed on each dimension of $F_e^{(l)}$. This operation aims to avoid overfitting by disregarding the contribution of dominant hyperedges that take the largest values.

  3. Node Feature Aggregating: By multiplying the hyperedge features with the transposed incidence matrix ($H^{T}\times F_e^{(l)}$), we can calculate the node features ($F_v^{(l+1)}$).

  4. Node Feature Reweighting: Finally, the output node features are further weighted using learnable parameters ($\iota^{(l)}$), which are represented as a diagonal matrix. This reweighting is followed by a non-linear activation function ($\sigma$). The reweighting step allows the model to learn the importance of different node features and adaptively adjust them.

Mathematically, the max-mask convolutional layer is defined as follows:

$$\begin{aligned} X^{(l+1)} &= \sigma\left((I-L)X^{(l)}+H^{-1}(I-L)X^{(\lambda)}\cdot\iota^{(l)}\right)\\ F^{(l+1)}_{e} &= H^{-1}(I-L)X^{(l)}+X^{(\lambda)} \end{aligned} \tag{42}$$

Here, $L$ is the multigraph Laplacian matrix, and $I$ denotes the identity matrix. The term $H^{-1}(I-L)X^{(\lambda)}$ functionally ensures that the top $\lambda$ attribute feature dimensions are ignored during gradient calculation.
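The four steps of the max-mask convolutional layer can be sketched procedurally. This is a loose illustration under several assumptions rather than the authors' implementation: $H$ is taken to have one row per node and one column per hyperedge (as in Eq. 37), so gathering uses $H^{\top}F_v$ and aggregation uses $H F_e$; the max-mask zeroes, per feature dimension, the `lam` hyperedges with the largest values; $\sigma$ is ReLU; and the diagonal reweighting is applied over feature dimensions. The names `max_mask_conv`, `iota`, and `lam` are illustrative.

```python
import numpy as np

def max_mask_conv(F_v, H, iota, lam=1):
    """Sketch of the four max-mask convolution steps on a |V| x |E| incidence matrix H."""
    F_e = H.T @ F_v                                   # 1. hyperedge feature gathering (|E| x d)
    top = np.argsort(-F_e, axis=0)[:lam]              # lam largest hyperedges per feature dim
    mask = np.ones_like(F_e)
    np.put_along_axis(mask, top, 0.0, axis=0)         # 2. max-mask: drop dominant hyperedges
    F_v_new = H @ (F_e * mask)                        # 3. node feature aggregating (|V| x d)
    return np.maximum(F_v_new * iota, 0.0), mask      # 4. diagonal reweighting, then ReLU

H = np.array([[1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 1.0, 1.0]])                       # 4 nodes, 3 hyperedges
rng = np.random.default_rng(0)
F_v = rng.normal(size=(4, 5))                         # illustrative node features
iota = np.ones(5)                                     # diagonal of the reweighting matrix
F_out, mask = max_mask_conv(F_v, H, iota, lam=1)
```

With `lam=1`, exactly one hyperedge is masked in every feature dimension, so no single hyperedge can dominate the aggregated node features.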

Benkirane et al. [123] used adaptive agglomerative clustering to construct a patch hypergraph, which was then processed using a combination of HGNN and HGAT layers. The authors used self-supervised learning to learn patch-level representations. For the clustering, a similarity kernel was used that took into account both spatial locality and feature similarity between patches. This kernel calculated similarity scores between all pairs of patches. If the similarity score was higher than a fixed threshold $\delta$, the patches were assigned to the same cluster $C_k$. For each cluster, the representations of the patches in the cluster were averaged to obtain a cluster-level representation. Each cluster of patches is treated as a node of a hypergraph. The hyperedges connected all nodes with a feature similarity higher than a fixed threshold $\delta_h$. We denote the neighborhood of a clustered node $c_i$ as $\gamma(c_i)=\{c_j\in C;\ \kappa_h(c_i,c_j)\geq\delta_h\}$. Here, $C$ denotes the set of all clusters and $\kappa_h(c_i,c_j)$ denotes the output of the feature similarity kernel $\kappa_h$. Having determined the neighborhood, we can calculate the incidence matrix $H$ where:

$$h_{k,j}=\begin{cases}1, & \text{if } c_k\in\gamma(c_j)\\ 0, & \text{else}\end{cases} \tag{43}$$

The authors then used the incidence matrix $H$ and the node feature matrix $X$ as input for a series of HGNN-HGAT layers, whose output was pooled into a hypergraph-level representation. This representation was finally used as input for an MLP layer, which predicted the hazard score for survival prediction.
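The threshold-based hyperedge construction can be illustrated with a short sketch. The kernel $\kappa_h$ is not specified above, so a Gaussian kernel on cluster-level features is used as a hypothetical stand-in; the function names, the bandwidth, and the toy cluster features are likewise illustrative.

```python
import numpy as np

def gaussian_similarity(C, sigma=1.0):
    """Hypothetical stand-in for the feature similarity kernel kappa_h."""
    D2 = ((C[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-D2 / (2.0 * sigma ** 2))

def threshold_incidence(K, delta_h):
    """H[k, j] = 1 if cluster node k lies in the neighbourhood of cluster node j."""
    return (K >= delta_h).astype(float)

# Four cluster-level representations forming two well-separated groups.
C = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 3.0], [3.1, 3.0]])
K = gaussian_similarity(C)
H = threshold_incidence(K, delta_h=0.5)
```

Unlike a K-NN construction, the number of nodes per hyperedge is not fixed here: dense regions of feature space yield large hyperedges, while isolated clusters yield hyperedges that contain only themselves.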

Most recently, Liang et al. [124] introduced the adaptive HGNN to histopathology for the classification of sentinel node metastases and the differentiation between lung squamous cell carcinoma and lung adenocarcinoma. Here, the authors used the K-NN algorithm on patch-level ImageNet-pretrained ResNet features to construct a hypergraph of patches, where the $k$ most similar patches were connected using a hyperedge. Their main innovation is the adaptive HGNN, which can adjust the correlation strength between nodes and hyperedges of the hypergraph during model training. They first denote a matrix of edge strengths in layer $l$ as $T^{(l)}$. Each element $t_i^{(l)}\in T^{(l)}$, which denotes the attention score of node $i$ and its associated hyperedge $e_{i,i^{\prime}}$ in the $l$-th layer, is defined as:

$$t_i^{(l)}=\frac{\exp\left(\sigma\left(sim\left(f_iM^{(l)},\,e_{i,i^{\prime}}M^{(l)}\right)\right)\right)}{\sum_{k\in N_j}\exp\left(\sigma\left(sim\left(f_iM^{(l)},\,e_{i,k}M^{(l)}\right)\right)\right)} \tag{44}$$

Here, $M^{(l)}$ denotes a feature transformation matrix, and $e_{i,i^{\prime}}$ denotes the hyperedge connecting nodes $i$ and $i^{\prime}$. By calculating these edge strength scores, the incidence matrix can be updated as follows:

$$\tilde{H}^{i{\prime\prime(l)}}=D_V^{-1/2}\left(T_i^{(l)}\odot H^{i}\right)W D_e^{-1}\left(T_i^{(l)}\odot H^{i}\right)^{\text{T}}D_V^{-1/2} \tag{45}$$

where $D_V$ and $D_e$ denote the node degree and edge degree matrices, $T^{(l)}$ denotes the edge strength matrix, and $W$ is a learnable weight matrix. This function essentially adapts the node interconnections in $H$ using the calculated edge strengths in $T^{(l)}$. Note that the edge strength changes depending on the layer $l$, as the feature similarities also change between layer embeddings. The feature matrix is updated as follows:

$$\tilde{F}_i^{(l+1)}=\{\tilde{f}_{i,j}\}_{j=1}^{P}=\sigma\left(\tilde{H}^{i{\prime\prime(l)}}F_i^{(l)}P_i^{(l)}\right) \tag{46}$$

where $\sigma$ is a nonlinear activation function and $P_i^{(l)}$ denotes a learned projection matrix.
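The adaptive reweighting of Eqs. 44 to 46 can be sketched in NumPy under explicit assumptions that are not confirmed by the text: hyperedge features are taken as the mean of their member node features, $sim$ is cosine similarity, the inner nonlinearity is a LeakyReLU, the softmax runs over the hyperedges incident to each node, the degree matrices are computed from the original binary $H$, and the outer $\sigma$ is ReLU. All function and variable names are illustrative.

```python
import numpy as np

def adaptive_hgnn_layer(F, H, M, P, w=None):
    """Sketch of one adaptive HGNN layer on a binary |V| x |E| incidence matrix H."""
    w = np.ones(H.shape[1]) if w is None else np.asarray(w, dtype=float)
    E = (H.T @ F) / H.sum(axis=0)[:, None]           # hyperedge features: member mean (assumption)
    Fp, Ep = F @ M, E @ M                            # shared feature transformation M
    Fn = Fp / np.linalg.norm(Fp, axis=1, keepdims=True)
    En = Ep / np.linalg.norm(Ep, axis=1, keepdims=True)
    S = Fn @ En.T                                    # cosine similarity (assumption), |V| x |E|
    S = np.where(S > 0, S, 0.2 * S)                  # LeakyReLU (assumption)
    logits = np.where(H > 0, S, -np.inf)             # only hyperedges incident to each node
    logits = logits - logits.max(axis=1, keepdims=True)
    T = np.exp(logits)
    T = T / T.sum(axis=1, keepdims=True)             # softmax, as in Eq. 44
    TH = T * H                                       # Hadamard product of T and H
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(H.sum(axis=1)))
    H_tilde = Dv_inv_sqrt @ TH @ np.diag(w / H.sum(axis=0)) @ TH.T @ Dv_inv_sqrt  # Eq. 45
    return np.maximum(H_tilde @ F @ P, 0.0), T       # Eq. 46 with ReLU

H = np.array([[1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 1.0, 1.0]])                      # every node lies in at least one hyperedge
rng = np.random.default_rng(0)
F = rng.normal(size=(4, 6))                          # illustrative node features
M = rng.normal(size=(6, 6))
P = rng.normal(size=(6, 3))
F_next, T = adaptive_hgnn_layer(F, H, M, P)
```

Because `T` depends on the current node features, the effective incidence matrix differs per layer, which is the adaptive behaviour described above.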

| Publication | Date | Application | Hypergraph type | Message-Passing |
| --- | --- | --- | --- | --- |
| Di et al. [120] | 2020/09 | Survival prediction | Patch Hypergraph | HGNN |
| Bakht et al. [121] | 2021/05 | Patch classification | Patch Hypergraph | HGNN |
| Di et al. [122] | 2022/09 | Survival prediction | Patch Hypergraph | HGMConv |
| Benkirane et al. [123] | 2022/11 | Survival prediction | Patch Hypergraph | HGCN, HGAT |
| Liang et al. [124] | 2024/02 | Binary classification | Patch Hypergraph | Adaptive HGNN |
Table 5: Publications which utilized hypergraph neural networks for histopathology WSI analysis.

5 Future Prospects and Directions

5.1 Topological Deep Learning

In our review, we highlighted the application of deep learning on hypergraphs in histopathology. Interestingly, this approach has only been applied on a patch level, whereas we argue that hypergraph-based modeling might be very well suited for cell-level modeling. For example, cells can be organized in clusters that can have an important diagnostic context [125]. Such cell clusters could be modeled using hypergraphs, where homogeneous clusters are connected using a single hyperedge. Furthermore, there exist many other higher-order graph types such as cellular complexes and combinatorial complexes. We anticipate that these approaches will also be tested in a histopathological context. For example, using cellular complexes, different semantic tissue structures (e.g., tertiary lymphoid structures) can be modeled jointly with cells, but as separate graph entities.

5.2 Graph transformers

In the last few years, GNNs have been combined with transformer architectures, giving rise to the Graph Transformer modeling paradigm. Graph transformers either use a positional embedding of the graph in the input to the transformer module, use the graph structure as a prior to build an attention mask for each input, or directly combine message passing layers with transformer blocks in the model architecture [126]. Graph transformers are especially suited for modeling long-distance relations in graphs, as they do not suffer from oversmoothing, where node representations become almost identical across the graph as GNN layer depth increases, or from oversquashing, where information from a receptive field that grows exponentially with the number of GNN layers has to be compressed into fixed-size node representations [127]. In histopathology, these graph transformers have also been used. One major challenge in the application of graph transformers is their scalability, as the time and memory complexity of the attention mechanism in Transformers grows quadratically with the number of nodes ($O(|V|^{2})$, where $|V|$ is the number of nodes). This is especially a problem for cell graphs in histopathology, as these graphs often exceed 10,000 nodes in size. Recently, efforts have been made to mitigate this scalability challenge [128] [129] [130], which leads us to believe the popularity of graph transformers in histopathology will continue to grow.
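To give a sense of scale, the following back-of-the-envelope sketch estimates the memory needed to store one dense attention matrix per head. It assumes 32-bit floating point values and ignores all other activations; the function name is illustrative.

```python
def dense_attention_bytes(num_nodes, num_heads=1, bytes_per_value=4):
    """Memory for one dense |V| x |V| attention matrix per head."""
    return num_heads * num_nodes * num_nodes * bytes_per_value

small = dense_attention_bytes(1_000)     # graph with 1,000 nodes
large = dense_attention_bytes(10_000)    # graph with 10,000 nodes
ratio = large // small                   # 10x the nodes costs 100x the memory
```

Under these assumptions, a single attention head on a 10,000-node cell graph already requires 400 MB for the attention matrix alone, before multiple heads and layers are taken into account.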

5.3 Graph-based multimodality

In our review, we highlighted the use of graph-based modeling in multimodal approaches, but we argue that graphs themselves should be utilized more for the multimodal integration itself. For example, several researchers have used the concept of a patient graph, where nodes represent (aggregated) datapoints from different medical modalities corresponding to the same patient [131] or multiple patients [132] [133]. Some approaches use graphs to model time series data, where, for example, medical information on the same patient gathered at different timepoints can be effectively utilized [134] [135]. Zheng et al. proposed a framework in which adaptive graph structure learning and GNNs are combined to integrate data from different medical modalities for disease prediction [136]. One major problem in the application of multimodal approaches in histopathology is that, often, not every modality is available for each patient, which creates a missing modality problem. Ma et al. proposed a Bayesian meta-learning framework that mitigates this problem, allowing effective multimodal learning and prediction even when a large number of modalities are missing from the data [137]. We argue that these approaches should be combined to effectively model the relationships between modalities, based on the task at hand, even in settings where modalities are missing from the patient data.

5.4 SSL using GNNs

Due to the high cost of annotation in histopathology, the adoption of self-supervised learning (SSL) has been steadily growing in histopathology applications, particularly for feature extraction. In GNN approaches, SSL has likewise been used primarily for the feature extraction step. Although a handful of approaches have used graph-based SSL [138] [139], we argue that more can still be gained from this approach. For example, only contrastive approaches have been tried, which leaves room for other schemes (e.g., autoregressive, generative). We propose using an approach similar to that of Deep Graph Infomax [140], where the aim is to maximize the mutual information between local node-level representations and a global summary of the graph. This effectively makes the node features mindful of the global graph structure. This idea can be extended to hierarchical histopathology graphs, where the agreement between intermediate graphs (e.g., cell graphs and tissue graphs) can be maximized to obtain more context-aware embeddings, similar to work by Yan et al. [141].
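The objective behind a Deep Graph Infomax-style approach can be sketched as follows. This is a simplified illustration, not the reference implementation of [140]: it assumes the node embeddings have already been produced by some encoder, uses a sigmoid of the mean embedding as the graph summary, and replaces the learned bilinear discriminator with a plain dot product. The function name `dgi_loss` and the toy embeddings are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dgi_loss(real_emb, corrupt_emb):
    """Binary cross-entropy between (node, summary) pairs.

    real_emb:    node embeddings from the true graph (list of vectors)
    corrupt_emb: node embeddings from a corrupted graph, e.g. one whose
                 node features were shuffled before encoding
    Assumption: the discriminator is a dot product followed by a sigmoid.
    """
    n, d = len(real_emb), len(real_emb[0])
    # Global summary: squashed mean of the real node embeddings.
    summary = [sigmoid(sum(h[k] for h in real_emb) / n) for k in range(d)]

    def score(h):
        return sigmoid(sum(a * b for a, b in zip(h, summary)))

    eps = 1e-12  # guards against log(0)
    # Real nodes should be scored as belonging to the summary ...
    pos = sum(math.log(score(h) + eps) for h in real_emb) / n
    # ... and corrupted nodes as not belonging to it.
    neg = sum(math.log(1.0 - score(h) + eps) for h in corrupt_emb) / len(corrupt_emb)
    return -(pos + neg)


real = [[2.0, 2.0], [2.0, 1.0]]
corrupt = [[-2.0, -2.0], [-1.0, -2.0]]
print(dgi_loss(real, corrupt))  # well-separated case
print(dgi_loss(real, real))     # corruption indistinguishable from real nodes
```

Minimizing this loss pushes real node embeddings to agree with the global summary while corrupted ones disagree. The hierarchical extension discussed above would, under the same assumptions, pair embeddings from one level (e.g., the cell graph) with a summary of another (e.g., the tissue graph).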

5.5 Hierarchical modeling in GNNs

As explained in our review, hierarchical GNNs are an increasingly popular modeling technique for histopathology WSIs, because the information in WSIs exists at different levels of coarseness. We believe that this trend will continue and extend to hypergraphs and other higher-order graph structures, for which hierarchical pooling frameworks are currently being established [142] [143]. Another future approach will be to learn end-to-end the level of coarsening needed for an effective hierarchical structure, which is currently controlled by a pooling ratio hyperparameter. We argue that different levels of graph coarseness might be optimal for different problems, as some problems in histopathology rely more on cell-level information, while others rely more on larger tissue structures. Lastly, in most current approaches, message passing occurs on each level of the hierarchy separately, not directly between levels. We argue that the field could move to message-passing schemes that take the hierarchical graph structure into account more effectively [144].
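To show what the pooling ratio hyperparameter controls, here is a minimal sketch of score-based top-k graph pooling, in the spirit of methods such as self-attention graph pooling [71]. In such methods the node scores are learned; here they are fixed inputs for illustration. The function name `topk_pool` and the toy scores are hypothetical.

```python
import math


def topk_pool(scores, adjacency, ratio):
    """Keep the ceil(ratio * n) highest-scoring nodes and their induced subgraph.

    scores:    one importance score per node (learned in practice, fixed here)
    adjacency: n x n list of 0/1 entries
    ratio:     pooling ratio in (0, 1], the hyperparameter discussed above
    """
    n = len(scores)
    k = max(1, math.ceil(ratio * n))
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:k])  # restore the original node order
    pooled_adjacency = [[adjacency[i][j] for j in keep] for i in keep]
    return keep, pooled_adjacency


# Toy path graph 0 - 1 - 2 - 3 with fixed node scores.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = [0.9, 0.1, 0.5, 0.7]

coarse_nodes, coarse_adj = topk_pool(scores, adjacency, ratio=0.5)
finer_nodes, finer_adj = topk_pool(scores, adjacency, ratio=0.75)
print(coarse_nodes, finer_nodes)
```

The same graph yields a two-node or a three-node coarsened graph depending only on `ratio`. This is the choice that, as argued above, could instead be learned per task.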

5.6 Foundation models in computational histopathology

The rise of self-supervised learning, together with the increased availability of histopathology datasets, has allowed the construction of very large deep neural networks, termed foundation models, trained on huge amounts of (unlabeled) histopathology images [145]. These models can be used for effective feature extraction across a wide variety of tissue types. Recent approaches have introduced medical texts in addition to image data [146] [147], which allows image data to be associated with medical texts and is thus well suited to CBHIR applications. In both natural language processing and computer vision, there has been a move to foundation models that incorporate an even broader spectrum of modalities (video [148], audio [149], knowledge graphs [150]). We argue that in histopathology, and in medical imaging in general, there will also be a move towards broader multimodality, especially given the number of different modalities available in the medical domain (WSI, IHC, MRI, CT, EHR, etc.). Graph models of WSIs can also be used as input to these models, to encode the topological information present in WSIs and correlate it with the image data.

5.7 Adaptive graph structure learning

We have seen that adaptive graph structure learning is currently based either on learned projections or on CNN filters. Outside of histopathology, most adaptive graph structure learning methods assume graph homophily [151], where similar nodes are likely to be connected. In histopathology, this is not always the case, as some structures may be composed of different cell types that vary widely in morphology. Furthermore, most applications focus on homogeneous graphs, in which only a single type of node and edge exists. Work by Zhao et al. [152] showed that a heterogeneous graph optimized for downstream tasks can be learned, which is suitable for graphs showing heterophily, as can be the case in histopathology. Therefore, we argue that heterogeneous graph learning will be a useful approach for histopathology, provided that WSIs are modeled as heterogeneous graphs.
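Whether the homophily assumption holds for a given cell graph can be checked with the edge homophily ratio, a standard measure defined as the fraction of edges that connect nodes with the same label. The sketch below is illustrative only; the function name `edge_homophily`, the cell-type labels, and the toy edges are hypothetical and not taken from any cited dataset.

```python
def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints share the same label.

    edges:  list of (u, v) node index pairs
    labels: one label per node, e.g. a cell type
    Values near 1 indicate homophily, values near 0 indicate heterophily.
    """
    if not edges:
        raise ValueError("edge_homophily needs at least one edge")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


# Hypothetical cell graph: a structure built from several cell types.
labels = ["tumor", "tumor", "lymphocyte", "stroma"]
edges = [(0, 1), (1, 2), (2, 3), (0, 2)]
ratio = edge_homophily(edges, labels)
print(ratio)  # 0.25: most edges join different cell types
```

A low ratio such as this one suggests that structure learning methods which assume homophily may be a poor fit for that graph.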

6 Conclusion

In this review, we provided a comprehensive overview of recent developments in the application of GNNs in histopathology, which can be used to guide new research in the field. We quantified the growth of different modeling paradigms in the use of GNNs in histopathology. Based on this quantification, we discussed several emerging subfields in depth, including hierarchical graph models, adaptive graph structure learning, multimodality using GNNs, and higher-order graph models. We then outlined future directions for the field, including the use of topological deep learning, graph transformer models, self-supervised learning using GNNs, the use of foundation models, and the extension of adaptive graph structure learning to heterogeneous graphs.

References

  • [1] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255. Ieee, 2009.
  • [2] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems, 25, 2012.
  • [3] Maximilian Ilse, Jakub Tomczak, and Max Welling. Attention-based deep multiple instance learning. In International conference on machine learning, pages 2127–2136. PMLR, 2018.
  • [4] PJ Sudharshan, Caroline Petitjean, Fabio Spanhol, Luiz Eduardo Oliveira, Laurent Heutte, and Paul Honeine. Multiple instance learning for histopathological breast cancer image classification. Expert Systems with Applications, 117:103–111, 2019.
  • [5] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
  • [6] Xiyue Wang, Sen Yang, Jun Zhang, Minghui Wang, Jing Zhang, Junzhou Huang, Wei Yang, and Xiao Han. Transpath: Transformer-based self-supervised learning for histopathological image classification. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part VIII 24, pages 186–195. Springer, 2021.
  • [7] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. In International conference on machine learning, pages 1597–1607. PMLR, 2020.
  • [8] Ozan Ciga, Tony Xu, and Anne Louise Martel. Self supervised contrastive learning for digital histopathology. Machine Learning with Applications, 7:100198, 2022.
  • [9] Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The graph neural network model. IEEE transactions on neural networks, 20(1):61–80, 2008.
  • [10] Ruoyu Li, Jiawen Yao, Xinliang Zhu, Yeqing Li, and Junzhou Huang. Graph cnn for survival analysis on whole slide pathological images. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 174–182. Springer, 2018.
  • [11] Kristof T Schütt, Huziel E Sauceda, P-J Kindermans, Alexandre Tkatchenko, and K-R Müller. Schnet–a deep learning architecture for molecules and materials. The Journal of Chemical Physics, 148(24), 2018.
  • [12] Shuangli Li, Jingbo Zhou, Tong Xu, Liang Huang, Fan Wang, Haoyi Xiong, Weili Huang, Dejing Dou, and Hui Xiong. Structure-aware interactive graph neural networks for the prediction of protein-ligand binding affinity. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pages 975–985, 2021.
  • [13] Zhiyong Cui, Kristian Henrickson, Ruimin Ke, and Yinhai Wang. Traffic graph convolutional recurrent neural network: A deep learning framework for network-scale traffic learning and forecasting. IEEE Transactions on Intelligent Transportation Systems, 21(11):4883–4894, 2019.
  • [14] Di Zhu, Fan Zhang, Shengyin Wang, Yaoli Wang, Ximeng Cheng, Zhou Huang, and Yu Liu. Understanding place characteristics in geographic contexts through graph convolutional neural networks. Annals of the American Association of Geographers, 110(2):408–420, 2020.
  • [15] Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L Hamilton, and Jure Leskovec. Graph convolutional neural networks for web-scale recommender systems. In Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining, pages 974–983, 2018.
  • [16] Marinka Zitnik, Monica Agrawal, and Jure Leskovec. Modeling polypharmacy side effects with graph convolutional networks. Bioinformatics, 34(13):i457–i466, 2018.
  • [17] Benjamin Sanchez-Lengeling, Emily Reif, Adam Pearce, and Alexander B Wiltschko. A gentle introduction to graph neural networks. Distill, 6(9):e33, 2021.
  • [18] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
  • [19] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. Graph attention networks. arXiv preprint arXiv:1710.10903, 2017.
  • [20] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? arXiv preprint arXiv:1810.00826, 2018.
  • [21] David Ahmedt-Aristizabal, Mohammad Ali Armin, Simon Denman, Clinton Fookes, and Lars Petersson. A survey on graph-based deep learning for computational histopathology. Computerized Medical Imaging and Graphics, 95:102027, 2022.
  • [22] Xiangyan Meng and Tonghui Zou. Clinical applications of graph neural networks in computational histopathology: A review. Computers in Biology and Medicine, 164:107201, 2023.
  • [23] Ali Khajegili Mirabadi, Graham Archibald, Amirali Darbandsari, Alberto Contreras-Sanz, Ramin Ebrahim Nakhli, Maryam Asadi, Allen Zhang, C Blake Gilks, Peter Black, Gang Wang, et al. Grasp: Graph-structured pyramidal whole slide image representation. arXiv preprint arXiv:2402.03592, 2024.
  • [24] Pushpak Pati, Guillaume Jaume, Lauren Alisha Fernandes, Antonio Foncubierta-Rodríguez, Florinda Feroce, Anna Maria Anniciello, Giosue Scognamiglio, Nadia Brancati, Daniel Riccio, Maurizio Di Bonito, et al. Hact-net: A hierarchical cell-to-tissue graph neural network for histopathological image classification. In Uncertainty for Safe Utilization of Machine Learning in Medical Imaging, and Graphs in Biomedical Image Analysis: Second International Workshop, UNSURE 2020, and Third International Workshop, GRAIL 2020, Held in Conjunction with MICCAI 2020, Lima, Peru, October 8, 2020, Proceedings 2, pages 208–219. Springer, 2020.
  • [25] Richard J Chen, Ming Y Lu, Jingwen Wang, Drew FK Williamson, Scott J Rodig, Neal I Lindeman, and Faisal Mahmood. Pathomic fusion: an integrated framework for fusing histopathology and genomic features for cancer diagnosis and prognosis. IEEE Transactions on Medical Imaging, 41(4):757–770, 2020.
  • [26] Donglin Di, Jun Zhang, Fuqiang Lei, Qi Tian, and Yue Gao. Big-hypergraph factorization neural network for survival prediction from whole slide image. IEEE Transactions on Image Processing, 31:1149–1160, 2022.
  • [27] Yanqiao Zhu, Weizhi Xu, Jinghao Zhang, Qiang Liu, Shu Wu, and Liang Wang. Deep graph structure learning for robust representations: A survey. arXiv preprint arXiv:2103.03036, 14, 2021.
  • [28] Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. Advances in neural information processing systems, 30, 2017.
  • [29] Shanshan Tang, Bo Li, and Haijun Yu. Chebnet: Efficient and stable constructions of deep neural networks with rectified power units using chebyshev approximations. arXiv preprint arXiv:1911.05467, 2019.
  • [30] Harshita Sharma, Norman Zerbe, Sebastian Lohmann, Klaus Kayser, Olaf Hellwich, and Peter Hufnagl. A review of graph-based methods for image analysis in digital histopathology. Diagnostic pathology, 1(1), 2015.
  • [31] Cagatay Bilgin, Cigdem Demir, Chandandeep Nagi, and Bulent Yener. Cell-graph mining for breast tissue modeling and classification. In 2007 29th Annual international conference of the IEEE Engineering in Medicine and Biology Society, pages 5311–5314. IEEE, 2007.
  • [32] Phillip P Santoiemma and Daniel J Powell Jr. Tumor infiltrating lymphocytes in ovarian cancer. Cancer biology & therapy, 16(6):807–820, 2015.
  • [33] Sahirzeeshan Ali, Robert Veltri, Jonathan A Epstein, Christhunesa Christudass, and Anant Madabhushi. Cell cluster graph for prediction of biochemical recurrence in prostate cancer patients from tissue microarrays. In Medical Imaging 2013: Digital Pathology, volume 8676, pages 164–174. SPIE, 2013.
  • [34] Hayley M Reynolds, Scott Williams, Alan M Zhang, Cheng Soon Ong, David Rawlinson, Rajib Chakravorty, Catherine Mitchell, and Annette Haworth. Cell density in prostate histopathology images as a measure of tumor distribution. In Medical Imaging 2014: Digital Pathology, volume 9041, pages 180–187. SPIE, 2014.
  • [35] Le Hou, Dimitris Samaras, Tahsin M Kurc, Yi Gao, James E Davis, and Joel H Saltz. Patch-based convolutional neural network for whole slide tissue image classification. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2424–2433, 2016.
  • [36] Mohammed Adnan, Shivam Kalra, and Hamid R Tizhoosh. Representation learning of histopathology images using graph neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 988–989, 2020.
  • [37] Yushan Zheng, Bonan Jiang, Jun Shi, Haopeng Zhang, and Fengying Xie. Encoding histopathological wsis using gnn for scalable diagnostically relevant regions retrieval. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2019: 22nd International Conference, Shenzhen, China, October 13–17, 2019, Proceedings, Part I 22, pages 550–558. Springer, 2019.
  • [38] Mookund Sureka, Abhijeet Patil, Deepak Anand, and Amit Sethi. Visualization for histopathology images using graph convolutional neural networks. In 2020 IEEE 20th international conference on bioinformatics and bioengineering (BIBE), pages 331–335. IEEE, 2020.
  • [39] Tai Hasegawa, Helena Arvidsson, Nikolce Tudzarovski, Karl Meinke, Rachael V Sugars, and Aravind Ashok Nair. Edge-based graph neural networks for cell-graph modeling and prediction. In International Conference on Information Processing in Medical Imaging, pages 265–277. Springer, 2023.
  • [40] Yasha Ektefaie, George Dasoulas, Ayush Noori, Maha Farhat, and Marinka Zitnik. Multimodal learning with graphs. Nature Machine Intelligence, 5(4):340–350, 2023.
  • [41] Bin Li, Michael S Nelson, Omid Savari, Agnes G Loeffler, and Kevin W Eliceiri. Differentiation of pancreatic ductal adenocarcinoma and chronic pancreatitis using graph neural networks on histopathology and collagen fiber features. Journal of Pathology Informatics, 13:100158, 2022.
  • [42] Simon Graham, Quoc Dang Vu, Shan E Ahmed Raza, Ayesha Azam, Yee Wah Tsang, Jin Tae Kwak, and Nasir Rajpoot. Hover-net: Simultaneous segmentation and classification of nuclei in multi-tissue histology images. Medical image analysis, 58:101563, 2019.
  • [43] Radhakrishna Achanta, Appu Shaji, Kevin Smith, Aurelien Lucchi, Pascal Fua, and Sabine Süsstrunk. Slic superpixels compared to state-of-the-art superpixel methods. IEEE transactions on pattern analysis and machine intelligence, 34(11):2274–2282, 2012.
  • [44] Babak Ehteshami Bejnordi, Geert Litjens, Meyke Hermsen, Nico Karssemeijer, and Jeroen AWM van der Laak. A multi-scale superpixel classification approach to the detection of regions of interest in whole slide histopathology images. In Medical Imaging 2015: Digital Pathology, volume 9420, pages 99–104. SPIE, 2015.
  • [45] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
  • [46] Atharva Tendle and Mohammad Rashedul Hasan. A study of the generalizability of self-supervised representations. Machine Learning with Applications, 6:100124, 2021.
  • [47] Zhiyang Gao, Jun Shi, and Jun Wang. Gq-gcn: Group quadratic graph convolutional network for classification of histopathological images. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part VIII 24, pages 121–131. Springer, 2021.
  • [48] Mo Zhang, Bin Dong, and Quanzheng Li. Ms-gwnn: multi-scale graph wavelet neural network for breast cancer diagnosis. In 2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI), pages 1–5. IEEE, 2022.
  • [49] Wentai Hou, Lequan Yu, Chengxuan Lin, Helong Huang, Rongshan Yu, Jing Qin, and Liansheng Wang. H^2-mil: Exploring hierarchical representation with heterogeneous multiple instance learning for whole slide image analysis. In Proceedings of the AAAI conference on artificial intelligence, volume 36, pages 933–941, 2022.
  • [50] Ramin Nakhli, Allen Zhang, Ali Mirabadi, Katherine Rich, Maryam Asadi, Blake Gilks, Hossein Farahani, and Ali Bashashati. Co-pilot: Dynamic top-down point cloud with conditional neighborhood aggregation for multi-gigapixel histopathology image representation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 21063–21073, 2023.
  • [51] Shidan Wang, Ruichen Rong, Qin Zhou, Donghan M Yang, Xinyi Zhang, Xiaowei Zhan, Justin Bishop, Zhikai Chi, Clare J Wilhelm, Siyuan Zhang, et al. Deep learning of cell spatial organizations identifies clinically relevant insights in tissue images. Nature communications, 14(1):7872, 2023.
  • [52] Valentin Anklin, Pushpak Pati, Guillaume Jaume, Behzad Bozorgtabar, Antonio Foncubierta-Rodríguez, Jean-Philippe Thiran, Mathilde Sibony, Maria Gabrani, and Orcun Goksel. Learning whole-slide segmentation from inexact and incomplete labels using tissue graphs. arXiv preprint arXiv:2103.03129, 2021.
  • [53] Jun Zhang, Zhiyuan Hua, Kezhou Yan, Kuan Tian, Jianhua Yao, Eryun Liu, Mingxia Liu, and Xiao Han. Joint fully convolutional and graph convolutional networks for weakly-supervised segmentation of pathology images. Medical image analysis, 73:102183, 2021.
  • [54] PengHui He, AiPing Qu, ShuoMin Xiao, and MeiDan Ding. A gnn-based network for tissue semantic segmentation in histopathology image. In Journal of Physics: Conference Series, volume 2504, page 012047. IOP Publishing, 2023.
  • [55] Sachin S Bahade, Michael Edwards, and Xianghua Xie. Cascaded graph convolution approach for nuclei detection in histopathology images. Journal of Image and Graphics, 11(1), 2023.
  • [56] Zhi Wang, Kai Fan, Xiaoya Zhu, Honglei Liu, Gang Meng, Minghui Wang, and Ao Li. Cross-domain nuclei detection in histopathology images using graph-based nuclei feature alignment. IEEE Journal of Biomedical and Health Informatics, 2023.
  • [57] Marta Wojciechowska, Stefano Malacrino, Natalia Garcia Martin, Hamid Fehri, and Jens Rittscher. Early detection of liver fibrosis using graph convolutional networks. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 217–226. Springer, 2021.
  • [58] Aravind Nair, Helena Arvidsson, Jorge E Gatica V, Nikolce Tudzarovski, Karl Meinke, and Rachael V Sugars. A graph neural network framework for mapping histological topology in oral mucosal tissue. BMC bioinformatics, 23(1):506, 2022.
  • [59] Amaya Gallagher-Syed, Luca Rossi, Felice Rivellese, Costantino Pitzalis, Myles Lewis, Michael Barnes, and Gregory Slabaugh. Multi-stain self-attention graph multiple instance learning pipeline for histopathology whole slide images. arXiv preprint arXiv:2309.10650, 2023.
  • [60] Joonsang Lee, Elisa Warner, Salma Shaikhouni, Markus Bitzer, Matthias Kretzler, Debbie Gipson, Subramaniam Pennathur, Keith Bellovich, Zeenat Bhat, Crystal Gadegbeku, et al. Clustering-based spatial analysis (clusa) framework through graph neural network for chronic kidney disease prediction using histopathology images. Scientific Reports, 13(1):12701, 2023.
  • [61] Ran Su, Hao He, Changming Sun, Xiaomin Wang, and Xiaofeng Liu. Prediction of drug-induced hepatotoxicity based on histopathological whole slide images. Methods, 212:31–38, 2023.
  • [62] Vasundhara Acharya, Diana Choi, Bülent Yener, and Gillian Beamer. Prediction of tuberculosis from lung tissue images of diversity outbred mice using jump knowledge based cell graph neural network. IEEE Access, 2024.
  • [63] Zhitao Ying, Dylan Bourgeois, Jiaxuan You, Marinka Zitnik, and Jure Leskovec. Gnnexplainer: Generating explanations for graph neural networks. Advances in neural information processing systems, 32, 2019.
  • [64] Lucie Charlotte Magister, Dmitry Kazhdan, Vikash Singh, and Pietro Liò. Gcexplainer: Human-in-the-loop concept-based explanations for graph neural networks. arXiv preprint arXiv:2107.11889, 2021.
  • [65] Guillaume Jaume, Pushpak Pati, Antonio Foncubierta-Rodriguez, Florinda Feroce, Giosue Scognamiglio, Anna Maria Anniciello, Jean-Philippe Thiran, Orcun Goksel, and Maria Gabrani. Towards explainable graph representations in digital pathology. arXiv preprint arXiv:2007.00311, 2020.
  • [66] Junchi Yu, Tingyang Xu, and Ran He. Towards the explanation of graph neural networks in digital pathology with information flows. arXiv preprint arXiv:2112.09895, 2021.
  • [67] Sina Abdous, Reza Abdollahzadeh, and Mohammad Hossein Rohban. Ks-gnnexplainer: Global model interpretation through instance explanations on histopathology images. arXiv preprint arXiv:2304.08240, 2023.
  • [68] Alessandro Farace di Villaforesta, Lucie Charlotte Magister, Pietro Barbiero, and Pietro Liò. Digital histopathology with graph neural networks: Concepts and explanations for clinicians. arXiv preprint arXiv:2312.02225, 2023.
  • [69] Pushpak Pati, Guillaume Jaume, Antonio Foncubierta-Rodriguez, Florinda Feroce, Anna Maria Anniciello, Giosue Scognamiglio, Nadia Brancati, Maryse Fiche, Estelle Dubruc, Daniel Riccio, et al. Hierarchical graph representations in digital pathology. Medical image analysis, 75:102264, 2022.
  • [70] Zhitao Ying, Jiaxuan You, Christopher Morris, Xiang Ren, Will Hamilton, and Jure Leskovec. Hierarchical graph representation learning with differentiable pooling. Advances in neural information processing systems, 31, 2018.
  • [71] Junhyun Lee, Inyeop Lee, and Jaewoo Kang. Self-attention graph pooling. In International conference on machine learning, pages 3734–3743. PMLR, 2019.
  • [72] Filippo Maria Bianchi, Daniele Grattarola, and Cesare Alippi. Spectral clustering with graph neural networks for graph pooling. In International conference on machine learning, pages 874–883. PMLR, 2020.
  • [73] Wentai Hou, Helong Huang, Qiong Peng, Rongshan Yu, Lequan Yu, and Liansheng Wang. Spatial-hierarchical graph neural network with dynamic structure learning for histological image classification. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 181–191. Springer, 2022.
  • [74] Jiangbo Shi, Lufei Tang, Zeyu Gao, Yang Li, Chunbao Wang, Tieliang Gong, Chen Li, and Huazhu Fu. Mg-trans: Multi-scale graph transformer with information bottleneck for whole slide image classification. IEEE Transactions on Medical Imaging, 2023.
  • [75] Puria Azadi, Jonathan Suderman, Ramin Nakhli, Katherine Rich, Maryam Asadi, Sonia Kung, Htoo Oo, Mira Keyes, Hossein Farahani, Calum MacAulay, et al. All-in: A local global graph-based distillation model for representation learning of gigapixel histopathology images with application in cancer risk assessment. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 765–775. Springer, 2023.
  • [76] Hongxiao Wang, Gang Huang, Zhuo Zhao, Liang Cheng, Anna Juncker-Jensen, Máté Levente Nagy, Xin Lu, Xiangliang Zhang, and Danny Z Chen. Ccf-gnn: A unified model aggregating appearance, microenvironment, and topology for pathology image classification. IEEE Transactions on Medical Imaging, 2023.
  • [77] Weiqin Zhao, Shujun Wang, Maximus Yeung, Tianye Niu, and Lequan Yu. Mulgt: Multi-task graph-transformer with task-aware knowledge injection and domain knowledge-driven pooling for whole slide image analysis. arXiv preprint arXiv:2302.10574, 2023.
  • [78] Saisai Ding, Zhiyang Gao, Jun Wang, Minhua Lu, and Jun Shi. Fractal graph convolutional network with mlp-mixer based multi-path feature fusion for classification of histopathological images. Expert Systems with Applications, 212:118793, 2023.
  • [79] Yonghao Li, Yiqing Shen, Jiadong Zhang, Shujie Song, Zhenhui Li, Jing Ke, and Dinggang Shen. A hierarchical graph v-net with semi-supervised pre-training for histological image based breast cancer classification. IEEE Transactions on Medical Imaging, 2023.
  • [80] Yanning Zhou, Simon Graham, Navid Alemi Koohbanani, Muhammad Shaban, Pheng-Ann Heng, and Nasir Rajpoot. Cgc-net: Cell graph convolutional network for grading of colorectal cancer histology images. In Proceedings of the IEEE/CVF international conference on computer vision workshops, pages 0–0, 2019.
  • [81] Yushan Zheng, Zhiguo Jiang, Fengying Xie, Jun Shi, Haopeng Zhang, Jianguo Huai, Ming Cao, and Xiaomiao Yang. Diagnostic regions attention network (dra-net) for histopathology wsi recommendation and retrieval. IEEE transactions on medical imaging, 40(3):1090–1103, 2020.
  • [82] Nan Jiang, Yaqing Hou, Dongsheng Zhou, Pengfei Wang, Jianxin Zhang, and Qiang Zhang. Weakly supervised gleason grading of prostate cancer slides using graph neural network. In ICPRAM, pages 426–434, 2021.
  • [83] Yushan Zheng, Zhiguo Jiang, Haopeng Zhang, Fengying Xie, Jun Shi, and Chenghai Xue. Histopathology wsi encoding based on gcns for scalable and efficient retrieval of diagnostically relevant regions. arXiv preprint arXiv:2104.07878, 2021.
  • [84] Zichen Wang, Jiayun Li, Zhufeng Pan, Wenyuan Li, Anthony Sisk, Huihui Ye, William Speier, and Corey W Arnold. Hierarchical graph pathomic network for progression free survival prediction. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part VIII 24, pages 227–237. Springer, 2021.
  • [85] Xu Xiang and Xiaofeng Wu. Multiple instance classification for gastric cancer pathological images based on implicit spatial topological structure representation. Applied Sciences, 11(21):10368, 2021.
  • [86] Chensu Xie, Chad Vanderbilt, Chao Feng, David Ho, Gabrielle Campanella, Jacklynn Egger, Andrew Plodkowski, Jeffrey Girshman, Peter Sawan, Kathryn Arbour, et al. Computational biomarker predicts lung ici response via deep learning-driven hierarchical spatial modelling from h&e. 2022.
  • [87] Chaitanya Dwivedi, Shima Nofallah, Maryam Pouryahya, Janani Iyer, Kenneth Leidal, Chuhan Chung, Timothy Watkins, Andrew Billin, Robert Myers, John Abel, et al. Multi stain graph fusion for multimodal integration in pathology. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1835–1845, 2022.
  • [88] Yu Bai, Yue Mi, Yihan Su, Bo Zhang, Zheng Zhang, Jingyun Wu, Haiwen Huang, Yongping Xiong, Xiangyang Gong, and Wendong Wang. A scalable graph-based framework for multi-organ histology image classification. IEEE Journal of Biomedical and Health Informatics, 26(11):5506–5517, 2022.
  • [89] Yingli Zuo, Yawen Wu, Zixiao Lu, Qi Zhu, Kun Huang, Daoqiang Zhang, and Wei Shao. Identify consistent imaging genomic biomarkers for characterizing the survival-associated interactions between tumor-infiltrating lymphocytes and tumors. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 222–231. Springer, 2022.
  • [90] Seohoon Lim and Seung-Won Jung. A comparative study on graph construction methods for survival prediction using histopathology images. In 2022 IEEE International Conference on Consumer Electronics-Asia (ICCE-Asia), pages 1–4. IEEE, 2022.
  • [91] Ruiwen Ding, Erika Rodriguez, Ana Cristina Araujo Lemos Da Silva, and William Hsu. Using graph neural networks to capture tumor spatial relationships for lung adenocarcinoma recurrence prediction. In 2023 IEEE 20th International Symposium on Biomedical Imaging (ISBI), pages 1–5. IEEE, 2023.
  • [92] Yawen Wu, Yingli Zuo, Qi Zhu, Jianpeng Sheng, Daoqiang Zhang, and Wei Shao. Transfer learning-assisted survival analysis of breast cancer relying on the spatial interaction between tumor-infiltrating lymphocytes and tumors. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 612–621. Springer, 2023.
  • [93] Wentai Hou, Yan He, Bingjian Yao, Lequan Yu, Rongshan Yu, Feng Gao, and Liansheng Wang. Multi-scope analysis driven hierarchical graph transformer for whole slide image based cancer survival prediction. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 745–754. Springer, 2023.
  • [94] Syed Farhan Abbas, Trinh Thi Le Vuong, Kyungeun Kim, Boram Song, and Jin Tae Kwak. Multi-cell type and multi-level graph aggregation network for cancer grading in pathology images. Medical Image Analysis, 90:102936, 2023.
  • [95] Jichen Xu, Jingmin Xin, Peiwen Shi, Jiayi Wu, Zheng Cao, Xiaoli Feng, and Nanning Zheng. Lymphoma recognition in histology image of gastric mucosal biopsy with prototype learning. In 2023 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), pages 1–4. IEEE, 2023.
  • [96] Zarif L Azher, Michael Fatemi, Yunrui Lu, Gokul Srinivasan, Alos B Diallo, Brock C Christensen, Lucas A Salas, Fred W Kolling IV, Laurent Perreard, Scott M Palisoul, et al. Spatial omics driven crossmodal pretraining applied to graph-based deep learning for cancer pathology analysis. In PACIFIC SYMPOSIUM ON BIOCOMPUTING 2024, pages 464–476. World Scientific, 2023.
  • [97] Zijian Yang, Yibo Zhang, Lili Zhuo, Kaidi Sun, Fanling Meng, Meng Zhou, and Jie Sun. Prediction of prognosis and treatment response in ovarian cancer patients from histopathology images using graph deep learning: a multicenter retrospective study. European Journal of Cancer, 199:113532, 2024.
  • [98] Joe Sims, Heike I Grabsch, and Derek Magee. Using hierarchically connected nodes and multiple gnn message passing steps to increase the contextual information in cell-graph classification. In MICCAI Workshop on Imaging Systems for GI Endoscopy, pages 99–107. Springer, 2022.
  • [99] Yonghang Guan, Jun Zhang, Kuan Tian, Sen Yang, Pei Dong, Jinxi Xiang, Wei Yang, Junzhou Huang, Yuyao Zhang, and Xiao Han. Node-aligned graph convolutional network for whole-slide image representation and classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18813–18823, 2022.
  • [100] Jiangbo Shi, Lufei Tang, Yang Li, Xianli Zhang, Zeyu Gao, Yefeng Zheng, Chunbao Wang, Tieliang Gong, and Chen Li. A structure-aware hierarchical graph-based multiple instance learning framework for pt staging in histopathological image. IEEE Transactions on Medical Imaging, 2023.
  • [101] Ravi Kant Gupta, Nikhil Cherian Kurian, Pranav Jeevan, Amit Sethi, et al. Heterogeneous graphs model spatial relationships between biological entities for breast cancer diagnosis. arXiv preprint arXiv:2307.08132, 2023.
  • [102] Xiaodan Xing, Yixin Ma, Lei Jin, Tianyang Sun, Zhong Xue, Feng Shi, Jinsong Wu, and Dinggang Shen. A multi-scale graph network with multi-head attention for histopathology image diagnosis. In COMPAY 2021: The third MICCAI workshop on Computational Pathology, 2021.
  • [103] Roozbeh Bazargani, Ladan Fazli, Larry Goldenberg, Martin Gleave, Ali Bashashati, and Septimiu Salcudean. Multi-scale relational graph convolutional network for multiple instance learning in histopathology images. arXiv preprint arXiv:2212.08781, 2022.
  • [104] Gianpaolo Bontempo, Angelo Porrello, Federico Bolelli, Simone Calderara, and Elisa Ficarra. DAS-MIL: Distilling across scales for MIL classification of histological WSIs. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 248–258. Springer, 2023.
  • [105] Pei Liu, Luping Ji, Feng Ye, and Bo Fu. GraphLSurv: A scalable survival prediction network with adaptive and sparse structure learning for histopathological whole-slide images. Computer Methods and Programs in Biomedicine, 231:107433, 2023.
  • [106] Zhiyang Gao, Zhiyang Lu, Jun Wang, Shihui Ying, and Jun Shi. A convolutional neural network and graph convolutional network based framework for classification of breast histopathological images. IEEE Journal of Biomedical and Health Informatics, 26(7):3163–3173, 2022.
  • [107] Kexin Ding, Mu Zhou, Zichen Wang, Qiao Liu, Corey W Arnold, Shaoting Zhang, and Dimitri N Metaxas. Graph convolutional networks for multi-modality medical imaging: Methods, architectures, and clinical applications. arXiv preprint arXiv:2202.08916, 2022.
  • [108] Yanyun Jiang, Shuai Ma, Wei Xiao, Jing Wang, Yanhui Ding, Yuanjie Zheng, and Xiaodan Sui. Predicting EGFR gene mutation status in lung adenocarcinoma based on multifeature fusion. Biomedical Signal Processing and Control, 84:104786, 2023.
  • [109] Wei Xiao, Yanyun Jiang, Zhigang Yao, Xiaoming Zhou, Xiaodan Sui, and Yuanjie Zheng. LAD-GCN: Automatic diagnostic framework for quantitative estimation of growth patterns during clinical evaluation of lung adenocarcinoma. Frontiers in Physiology, 13:946099, 2022.
  • [110] Arijit De, Radhika Mhatre, Mona Tiwari, and Ananda S Chowdhury. Brain tumor classification from radiology and histopathology using deep features and graph convolutional network. In 2022 26th International Conference on Pattern Recognition (ICPR), pages 4420–4426. IEEE, 2022.
  • [111] Yuzhang Xie, Guoshuai Niu, Qian Da, Wentao Dai, and Yang Yang. Survival prediction for gastric cancer via multimodal learning of whole slide images and gene expression. In 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pages 1311–1316. IEEE, 2022.
  • [112] Yi Zheng, Regan D Conrad, Emily J Green, Eric J Burks, Margrit Betke, Jennifer E Beane, and Vijaya B Kolachalama. Graph attention-based fusion of pathology images and gene expression for prediction of cancer survival. bioRxiv, 2023.
  • [113] Michael Fatemi, Eric Feng, Cyril Sharma, Zarif Azher, Tarushii Goel, Ojas Ramwala, Scott M Palisoul, Rachael E Barney, Laurent Perreard, Fred W Kolling, et al. Inferring spatial transcriptomics markers from whole slide images to characterize metastasis-related spatial heterogeneity of colorectal tumors: A pilot study. Journal of Pathology Informatics, 14:100308, 2023.
  • [114] Ruitian Gao, Xin Yuan, Yanran Ma, Ting Wei, Luke Johnston, Yanfei Shao, Wenwen Lv, Tengteng Zhu, Yue Zhang, Junke Zheng, et al. Predicting gene spatial expression and cancer prognosis: An integrated graph and image deep learning approach based on HE slides. bioRxiv, 2023.
  • [115] Lida Qiu, Deyong Kang, Chuan Wang, Wenhui Guo, Fangmeng Fu, Qingxiang Wu, Gangqin Xi, Jiajia He, Liqin Zheng, Qingyuan Zhang, et al. Intratumor graph neural network recovers hidden prognostic value of multi-biomarker spatial heterogeneity. Nature Communications, 13(1):4250, 2022.
  • [116] Pushpak Pati, Sofia Karkampouna, Francesco Bonollo, Eva Comperat, Martina Radic, Martin Spahn, Adriano Martinelli, Martin Wartenberg, Marianna Kruithof-de Julio, and Maria Anna Rapsomaniki. Multiplexed tumor profiling with generative AI accelerates histopathology workflows and improves clinical predictions. bioRxiv, 2023.
  • [117] Mathilde Papillon, Sophia Sanborn, Mustafa Hajij, and Nina Miolane. Architectures of topological deep learning: A survey on topological neural networks. arXiv preprint arXiv:2304.10031, 2023.
  • [118] Yifan Feng, Haoxuan You, Zizhao Zhang, Rongrong Ji, and Yue Gao. Hypergraph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 3558–3565, 2019.
  • [119] Kaize Ding, Jianling Wang, Jundong Li, Dingcheng Li, and Huan Liu. Be more with less: Hypergraph attention networks for inductive text classification. arXiv preprint arXiv:2011.00387, 2020.
  • [120] Donglin Di, Shengrui Li, Jun Zhang, and Yue Gao. Ranking-based survival prediction on histopathological whole-slide images. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 428–438. Springer, 2020.
  • [121] Ahsan Baidar Bakht, Sajid Javed, Hasan AlMarzouqi, Ahsan Khandoker, and Naoufel Werghi. Colorectal cancer tissue classification using semi-supervised hypergraph convolutional network. In 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI), pages 1306–1309. IEEE, 2021.
  • [122] Donglin Di, Changqing Zou, Yifan Feng, Haiyan Zhou, Rongrong Ji, Qionghai Dai, and Yue Gao. Generating hypergraph-based high-order representations of whole-slide histopathological images for survival prediction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(5):5800–5815, 2022.
  • [123] Hakim Benkirane, Maria Vakalopoulou, Stergios Christodoulidis, Ingrid-Judith Garberis, Stefan Michiels, and Paul-Henry Cournède. Hyper-AdaC: Adaptive clustering-based hypergraph representation of whole slide images for survival analysis. In Machine Learning for Health, pages 405–418. PMLR, 2022.
  • [124] Meiyan Liang, Xing Jiang, Jie Cao, Bo Li, Lin Wang, Qinghui Chen, Cunlin Zhang, and Yuejin Zhao. CAF-AHGCN: context-aware attention fusion adaptive hypergraph convolutional network for human-interpretable prediction of gigapixel whole-slide image. The Visual Computer, pages 1–19, 2024.
  • [125] P. S. Chandran, N. B. Byju, R. Deepak, R. Rajesh Kumar, S. Sudhamony, P. Malm, and E. Bengtsson. Cluster detection in cytology images using the cellgraph method. 2012 International Symposium on Information Technologies in Medicine and Education, 2:923–927, 2012.
  • [126] Erxue Min, Runfa Chen, Yatao Bian, Tingyang Xu, Kangfei Zhao, Wenbing Huang, Peilin Zhao, Junzhou Huang, Sophia Ananiadou, and Yu Rong. Transformer for graphs: An overview from architecture perspective. arXiv preprint arXiv:2202.08455, 2022.
  • [127] Devin Kreuzer, Dominique Beaini, Will Hamilton, Vincent Létourneau, and Prudencio Tossou. Rethinking graph transformers with spectral attention. Advances in Neural Information Processing Systems, 34:21618–21629, 2021.
  • [128] Ladislav Rampášek, Michael Galkin, Vijay Prakash Dwivedi, Anh Tuan Luu, Guy Wolf, and Dominique Beaini. Recipe for a general, powerful, scalable graph transformer. Advances in Neural Information Processing Systems, 35:14501–14515, 2022.
  • [129] Hamed Shirzad, Ameya Velingker, Balaji Venkatachalam, Danica J Sutherland, and Ali Kemal Sinop. Exphormer: Sparse transformers for graphs. arXiv preprint arXiv:2303.06147, 2023.
  • [130] Qitian Wu, Wentao Zhao, Chenxiao Yang, Hengrui Zhang, Fan Nie, Haitian Jiang, Yatao Bian, and Junchi Yan. Simplifying and empowering transformers for large-graph representations. Advances in Neural Information Processing Systems, 36, 2024.
  • [131] So Yeon Kim. GNN-surv: Discrete-time survival prediction using graph neural networks. Bioengineering, 10(9):1046, 2023.
  • [132] Jianliang Gao, Tengfei Lyu, Fan Xiong, Jianxin Wang, Weimao Ke, and Zhao Li. MGNN: A multimodal graph neural network for predicting the survival of cancer patients. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 1697–1700, 2020.
  • [133] Juan G Diaz Ochoa and Faizan E Mustafa. Graph neural network modelling as a potentially effective method for predicting and analyzing procedures based on patients’ diagnoses. Artificial Intelligence in Medicine, 131:102359, 2022.
  • [134] Emma Rocheteau, Catherine Tong, Petar Veličković, Nicholas Lane, and Pietro Liò. Predicting patient outcomes with graph representation learning. arXiv preprint arXiv:2101.03940, 2021.
  • [135] Hirad Daneshvar and Reza Samavi. Heterogeneous patient graph embedding in readmission prediction. In AI, 2022.
  • [136] Shuai Zheng, Zhenfeng Zhu, Zhizhe Liu, Zhenyu Guo, Yang Liu, Yuchen Yang, and Yao Zhao. Multi-modal graph learning for disease prediction. IEEE Transactions on Medical Imaging, 41(9):2207–2216, 2022.
  • [137] Mengmeng Ma, Jian Ren, Long Zhao, Sergey Tulyakov, Cathy Wu, and Xi Peng. SMIL: Multimodal learning with severely missing modality. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 2302–2310, 2021.
  • [138] Yigit Ozen, Selim Aksoy, Kemal Kösemehmetoğlu, Sevgen Önder, and Ayşegül Üner. Self-supervised learning with graph neural networks for region of interest retrieval in histopathology. In 2020 25th International Conference on Pattern Recognition (ICPR), pages 6329–6334. IEEE, 2021.
  • [139] Oscar Pina and Verónica Vilaplana. Self-supervised graph representations of WSIs. In Geometric Deep Learning in Medical Image Analysis, pages 107–117. PMLR, 2022.
  • [140] Petar Veličković, William Fedus, William L Hamilton, Pietro Liò, Yoshua Bengio, and R Devon Hjelm. Deep graph infomax. arXiv preprint arXiv:1809.10341, 2018.
  • [141] Hao Yan, Senzhang Wang, Jun Yin, Chaozhuo Li, Junxing Zhu, and Jianxin Wang. Hierarchical graph contrastive learning. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 700–715. Springer, 2023.
  • [142] Yingfu Zhao, Fusheng Jin, Ronghua Li, Hongchao Qin, Peng Cui, and Guoren Wang. Self-attention hypergraph pooling network. International Journal of Software & Informatics, 13(4), 2023.
  • [143] Domenico Mattia Cinque, Claudio Battiloro, and Paolo Di Lorenzo. Pooling strategies for simplicial convolutional networks. In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5. IEEE, 2023.
  • [144] Zhiqiang Zhong, Cheng-Te Li, and Jun Pang. Hierarchical message-passing graph neural networks. Data Mining and Knowledge Discovery, 37(1):381–408, 2023.
  • [145] Eugene Vorontsov, Alican Bozkurt, Adam Casson, George Shaikovski, Michal Zelechowski, Siqi Liu, Philippe Mathieu, Alexander van Eck, Donghun Lee, Julian Viret, et al. Virchow: A million-slide digital pathology foundation model. arXiv preprint arXiv:2309.07778, 2023.
  • [146] Ming Y Lu, Bowen Chen, Drew FK Williamson, Richard J Chen, Ivy Liang, Tong Ding, Guillaume Jaume, Igor Odintsov, Andrew Zhang, Long Phi Le, et al. Towards a visual-language foundation model for computational pathology. arXiv preprint arXiv:2307.12914, 2023.
  • [147] Zhi Huang, Federico Bianchi, Mert Yuksekgonul, Thomas J Montine, and James Zou. A visual–language foundation model for pathology image analysis using medical Twitter. Nature Medicine, 29(9):2307–2316, 2023.
  • [148] Matthew Christensen, Milos Vukadinovic, Neal Yuan, and David Ouyang. Multimodal foundation models for echocardiogram interpretation. arXiv preprint arXiv:2308.15670, 2023.
  • [149] Josh Gardner, Simon Durand, Daniel Stoller, and Rachel M Bittner. LLark: A multimodal foundation model for music. arXiv preprint arXiv:2310.07160, 2023.
  • [150] Yizhen Luo, Kai Yang, Massimo Hong, Xingyi Liu, and Zaiqing Nie. MolFM: A multimodal molecular foundation model. arXiv preprint arXiv:2307.09484, 2023.
  • [151] Yanqiao Zhu, Weizhi Xu, Jinghao Zhang, Yuanqi Du, Jieyu Zhang, Qiang Liu, Carl Yang, and Shu Wu. A survey on graph structure learning: Progress and opportunities. arXiv preprint arXiv:2103.03036, 2021.
  • [152] Jianan Zhao, Xiao Wang, Chuan Shi, Binbin Hu, Guojie Song, and Yanfang Ye. Heterogeneous graph structure learning for graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 4697–4705, 2021.