Learning Cortical Parcellations Using Graph Neural Networks

METHODS
published: 24 December 2021

doi: 10.3389/fnins.2021.797500
Kristian M. Eschenburg 1,2, Thomas J. Grabowski 2,3,4 and David R. Haynor 1,2,3*
1 Department of Bioengineering, University of Washington, Seattle, WA, United States
2 Integrated Brain Imaging Center, University of Washington Medical Center, Seattle, WA, United States
3 Department of Radiology, University of Washington Medical Center, Seattle, WA, United States
4 Department of Neurology, University of Washington Medical Center, Seattle, WA, United States

Deep learning has been applied to magnetic resonance imaging (MRI) for a variety of purposes, ranging from the acceleration of image acquisition and image denoising to tissue segmentation and disease diagnosis. Convolutional neural networks have been particularly useful for analyzing MRI data due to the regularly sampled spatial and temporal nature of the data. However, advances in the field of brain imaging have led to network- and surface-based analyses that are often better represented in the graph domain. In this analysis, we propose a general purpose cortical segmentation method that, given resting-state connectivity features readily computed during conventional MRI pre-processing and a set of corresponding training labels, can generate cortical parcellations for new MRI data. We applied recent advances in the field of graph neural networks to the problem of cortical surface segmentation, using resting-state connectivity to learn discrete maps of the human neocortex. We found that graph neural networks accurately learn low-dimensional representations of functional brain connectivity that can be naturally extended to map the cortices of new datasets. After optimizing over algorithm type, network architecture, and training features, our approach yielded mean classification accuracies of 79.91% relative to a previously published parcellation. We describe how some hyperparameter choices including training and testing data duration, network architecture, and algorithm choice affect model performance.

Keywords: graph neural network, parcellation, functional connectivity, representation learning, segmentation, brain, human

Edited by: Jordi Solé-Casals, Universitat de Vic - Universitat Central de Catalunya, Spain
Reviewed by: Sun Zhe, RIKEN, Japan; Shijie Zhao, Northwestern Polytechnical University, China
*Correspondence: David R. Haynor, haynor@uw.edu
Specialty section: This article was submitted to Brain Imaging Methods, a section of the journal Frontiers in Neuroscience
Received: 18 October 2021; Accepted: 03 December 2021; Published: 24 December 2021
Citation: Eschenburg KM, Grabowski TJ and Haynor DR (2021) Learning Cortical Parcellations Using Graph Neural Networks. Front. Neurosci. 15:797500. doi: 10.3389/fnins.2021.797500

1. INTRODUCTION
Neural network approaches such as multi-layer feed-forward networks have been applied to a wide variety of tasks in medical imaging, ranging from disease classification to tissue segmentation. However, these networks do not always take into account the true spatial relationships between data points. Convolutional neural network approaches, such as those applied to static images or dynamic video streams, learn translationally-invariant, multidimensional kernel filters over the data domain. Both these methods assume that the data is sampled regularly in space, allowing convolution and pooling of information from fixed neighborhood topologies. However, real-world data, such as graph-structured data, is often sampled on irregular domains. Data sampled from graph domains often contains non-uniform topology: individual data points can vary in their neighborhood structure, and notions of direction (e.g., up, down, left, right) do not generalize
Frontiers in Neuroscience | www.frontiersin.org 1 December 2021 | Volume 15 | Article 797500

Eschenburg et al. Learning Cortical Parcellations Using GNNs
well to graphs. This makes learning filters to process graph-structured data very difficult with conventional neural network approaches.

Graph neural networks are a class of neural network models that operate on data distributed over a graph domain. Data are sampled from a graph with an explicit structure defined by a set of nodes and edges. These models have been shown to be useful for graph and node classification tasks, along with learning generative models of data distributed over graphs (Kipf and Welling, 2016b; Hamilton et al., 2017; Zhao et al., 2019; Zeng et al., 2020). Graph convolution networks (GCN), proposed in Defferrard et al. (2016), generalized the idea of convolutional networks on grid-like data to data distributed over irregular domains by applying Chebyshev polynomial approximations of spectral filters to graph data. Graph attention networks (GAT) are based on the idea of an attention function, a learned global function that selectively aggregates information across node neighborhoods. The attention function maps a query and set of key-value pairs to an output (Vaswani et al., 2017). The output is defined as a weighted sum of the values, where weights are computed using some similarity or kernel function of the key-value pairs.

It is believed that biological signals distributed over the cortical manifold are locally stationary. Given a small cortical patch, voxels sampled from the patch will display similar functional and structural connectivity patterns, cortical thickness and myelin density measures, and gene expression profiles, among various other signals (Glasser and van Essen, 2011; Amunts et al., 2020; Wagstyl et al., 2020). Prior studies have attempted to delineate and map the cortex by identifying contiguous cortical subregions that are characterized by relative uniformity of these signals (Blumensath et al., 2013; Arslan et al., 2015; Baldassano et al., 2015; Gordon et al., 2016). This work is based on the fundamental idea that contiguous regions of the cortex with similar connectivity and histological properties will tend to function as coherent units. Biological signals distributed over the cortex exhibit local but not global stationarity, so any attempt to parcellate the cortex must take both properties into account.

Most brain imaging studies utilize cortical atlases (template maps of the cortex that can be deformed and mapped to individual subjects' brains) to discretize the cortical manifold and simplify downstream analyses (Fischl et al., 2004; Bullmore and Sporns, 2012). However, it remains an open question how to "apply" existing cortical maps to unmapped data. A recent study identified considerable variability in the size, topological organization, and existence of cortical areas defined by functional connectivity across individuals, raising the question of how best to utilize the biological properties of any given unmapped dataset to drive the application of a cortical atlas to this new data (Glasser et al., 2016).

Here, we developed an approach to perform cortical segmentation (a node classification problem) using graph neural networks. The cerebral cortex is often represented as a folded sheet, and a usable parcellation approach must be applicable to this sort of data. Neural networks can be extended to account for non-stationarity in MRI volumes by incorporating 3D-volumetric convolution kernels. However, these approaches are not easily applied to data distributed over 2-D manifolds like the cortical surface. Additionally, more recent large-scale studies interpolate neurological signals, like cortical activation patterns or various histological scalar measures, onto the cortical manifold to mitigate the potential for mixing signals from anatomically close yet geodesically distant cortical regions, e.g., across sulci (Yeo et al., 2011; Glasser et al., 2013). These studies could also benefit from methods that operate directly on graphs.

With the growth of large-scale open-source brain imaging databases [ADNI (Petersen et al., 2010), ABCD (Hagler et al., 2019), HCP (Glasser et al., 2013)], neuroscientists now have access to high-quality data that can be used for training models that can then be applied to new datasets. We leveraged the statistical properties of these high-quality datasets to inform the segmentation of new data using multiple variants of graph neural networks. We considered graph convolution networks and two variants of graph attention networks: standard attention networks (Velickovic et al., 2018), and attention networks with adaptive network depth weighting (a.k.a. jumping-knowledge networks, Xu et al., 2018). We examined how algorithm choice and network parameterization affect cortical segmentation performance. We trained our classification models on high-quality open-source imaging data, and tested them on two datasets with unique spatial and temporal resolutions and different pre-processing pipelines. Other methods have been proposed for delineating the cortex using various registration (Fischl et al., 2004; Robinson et al., 2018), neural network (Hacker et al., 2013; Glasser et al., 2016), label fusion (Asman and Landman, 2012, 2014; Liu et al., 2016), and even graph neural network approaches (Cucurull et al., 2018; Gopinath et al., 2019). To the best of our knowledge, this is the first attempt to examine the performance of common variants of graph neural networks in a whole-brain cortical classification setting and explore their ability to generalize to new datasets using functional magnetic resonance imaging (fMRI). While other studies have proposed the use of graph neural networks to delineate cortical areas, these studies did not perform in-depth analyses on how network architecture, algorithm parameter choices, feature type, and training and testing data parameters impact the predicted cortical maps (Cucurull et al., 2018; Gopinath et al., 2019). To this end, we studied how each of these different variables impacts model performance and prediction reliability.

2. BACKGROUND

2.1. Graph Convolution Networks
Convolution filters over graphs using spectral graph theory were introduced by Defferrard et al. (2016). For a graph $G = (V, E)$ with $N$ nodes and symmetric normalized graph Laplacian, $L$, define the eigendecomposition of $L = U \Lambda U^T$, where the columns of $U$ are the spectral eigenfunctions of $G$. Given a graph signal $x \in \mathbb{R}^N$ distributed over $G$, the graph Fourier transform of $x$ is defined as $\tilde{x} = U^T x$, and its inverse graph Fourier transform as $x = U \tilde{x}$. Graph filtering of $x$ is then defined as $g_\theta(L)x = U g_\theta(\Lambda) U^T x$, where $g_\theta$ is an arbitrary function of the eigenvalues.
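The spectral filtering definition above can be illustrated with a small NumPy sketch. This is not the authors' code: the function names, the toy 4-node ring graph, and the example filter `exp(-2 * lambda)` are assumptions chosen only to show the transform, filter, and inverse transform steps.

```python
import numpy as np

def normalized_laplacian(adj):
    """Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)  # assumes every node has at least one edge
    return np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]

def spectral_filter(adj, x, g):
    """Compute U g(Lambda) U^T x for a function g of the Laplacian eigenvalues."""
    lap = normalized_laplacian(adj)
    eigvals, U = np.linalg.eigh(lap)   # L = U diag(eigvals) U^T
    x_hat = U.T @ x                    # graph Fourier transform of x
    return U @ (g(eigvals) * x_hat)    # scale each frequency, then invert

# Hypothetical toy data: a 4-node ring graph with an impulse at node 0.
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
x = np.array([1.0, 0.0, 0.0, 0.0])

unchanged = spectral_filter(adj, x, lambda lam: np.ones_like(lam))
smoothed = spectral_filter(adj, x, lambda lam: np.exp(-2.0 * lam))
```

Choosing `g(lam) = lam` recovers plain multiplication by the Laplacian, which is a convenient sanity check. Note that the full eigendecomposition couples every node to every other node, so a filter defined this way is not localized on the graph.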

Because these filters are not localized in space, Defferrard et al. (2016) proposed to use a Chebyshev polynomial approximation to learn spatially localized filters directly from the Laplacian, reducing the filtering operation of $x$ to

$$g_\theta(L)x = \sum_{k=0}^{K-1} \theta_k T_k(L) x \qquad (1)$$

where $T_k(L)$ is the $k$-th Chebyshev polynomial and $\theta_k$ the $k$-th learnable Chebyshev coefficient. The polynomial order, $K$, determines the local spatial extent of the filter. If two nodes $i$ and $j$ are more than $K$ hops apart, the filter value $g_\theta(L)_{i,j} = 0$.

In Kipf and Welling (2016a), the polynomial order is set to $K = 1$ so that the spatial extent of the filter is limited to directly adjacent nodes and only one coefficient weight is learned per feature component in each layer of the network. Given $H^l \in \mathbb{R}^{N \times k_l}$, the input feature matrix for layer $l$, the model learns $k_l$ Chebyshev coefficients, in addition to any mixing weights. The model incorporates signals from the $l$-ring neighborhood into the update of a node: each layer implicitly aggregates over a larger neighborhood than the previous layer (Figure 1).

FIGURE 1 | Each layer, l, implicitly aggregates more distant neighborhood signals into a node update. The first layer aggregates information over immediately adjacent neighbors, while the second, third, etc. layers incorporate signals from increasingly larger neighborhoods.

2.2. Graph Attention Networks
Whereas graph convolution networks uniformly aggregate local neighborhood signals, attention networks learn optimized weights for each node neighbor using an attention mechanism. Assume we have data distributed over a graph with $N$ nodes. Inputs to the network are characterized by matrix $X \in \mathbb{R}^{N \times F}$, where $F$ is the number of features. Assume that at any given layer, the inputs to layer $l$ are represented as $H^l \in \mathbb{R}^{N \times k_l}$, where $H^0 = X$. We define the immediate neighborhood of node $i$ as $N_i$. For two vectors $\vec{n}, \vec{p} \in \mathbb{R}^k$, we define their feature-wise concatenation as $\vec{n} \| \vec{p} \in \mathbb{R}^{2k}$. In Velickovic et al. (2018), the attention paid by node $i$ to node $j \in N_i$ at layer $l$ is computed using a single-layer perceptron as

$$\alpha_{i,j} = \sigma\left(\vec{a}^{T}\left(W^l \vec{h}_i^l \,\|\, W^l \vec{h}_j^l\right)\right) \qquad (2)$$

where $\sigma$ is a fixed non-linearity, $W^l \in \mathbb{R}^{k_{l+1} \times k_l}$ is a learned layer-specific global linear projection matrix and $\vec{a}$, the attention function, is also learned. The attention weights for $j \in N_i$ are then normalized by a softmax operation. To update the features of node $i$ at the $(l + 1)$-st layer, we compute the weighted sum over the neighborhood $N_i$ with weights defined by the normalized attentions.

Velickovic et al. (2018) propose an ensemble ("multi-head") attention mechanism, such that, for each layer, $M$ different attention functions are learned, each with their own weight vector $\vec{a}^l_m$. The outputs of each attention head are concatenated feature-wise. In the last layer, the number of hidden channels is the number of output classes, $C$; rather than concatenating across attention heads, the outputs of all attention heads are averaged to generate the final network output.

2.3. Jumping-Knowledge Networks
While graph neural networks have been instrumental in applying principles of deep learning to graph-structured domains, they are not without pitfalls (Kipf and Welling, 2016a; Velickovic et al., 2018; Xu et al., 2018; Wang et al., 2019). Graph neural networks are prone to over-fitting of model parameters and over-smoothing of learned embeddings as network depth increases (Wang et al., 2019). One approach to alleviate this over-smoothing is to adaptively learn optimized network depths for each node in the graph, a method that Xu et al. (2018) describe as "jumping-knowledge networks."

Suppose we have a network with $L$ layers, such that the $l$-th layer embedding $h_i^l$ for node $i$ is learned by incorporating signals from up to $l$ hops away from node $i$. The layer aggregation function described by Xu et al. (2018) learns a unique output embedding by optimally combining the embeddings of each hidden layer as

$$y_i = \sigma\left(g\left(h_i^1, h_i^2, \ldots, h_i^L\right)\right) \qquad (3)$$

Xu et al. (2018) propose three permutation-invariant aggregation functions for $g(x)$: concatenation, max-pooling, and long-short term memory (LSTM) (Hochreiter and Schmidhuber, 1997). The output, $y$, is then passed through a linear feed-forward layer to generate the network probabilities. Concatenation is a global aggregator (i.e., the same function is applied to all graph nodes) whereas max-pooling and LSTM both learn node-specific aggregations. Further, by utilizing a bi-directional LSTM layer, jumping-knowledge networks learn layer-specific attention weights for each node which can then be interrogated post-hoc (Figure 2). In this analysis, we incorporated the jumping-knowledge mechanism into an attention network framework and examined cortical segmentation performance using both the LSTM and the concatenation functions.

Given a sequence of samples $x_1, x_2, \ldots, x_t$, an LSTM layer maintains a memory of previously observed samples in the sequence in order to learn dependencies between elements. Here, the "sequence" consists of the embeddings learned at each consecutive hidden layer, $h^1, h^2, \ldots, h^L$, representing increasingly abstract representations of functional connectivity. We hypothesized that, because the jumping-knowledge networks learn optimized node-specific network depths, these networks would be able to more accurately segment the cortex of new data.
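The layer aggregation step of Equation 3 can be sketched in NumPy as below. The function names are hypothetical, and `jk_weighted` is an assumption rather than the method itself: it takes per-node, per-layer scores as an input, standing in for the scores a bi-directional LSTM would produce, and shows only how such scores yield a convex combination of layer embeddings.

```python
import numpy as np

def jk_concat(layers):
    """Concatenation aggregator: (N, d) per layer -> (N, L * d)."""
    return np.concatenate(layers, axis=1)

def jk_max(layers):
    """Max-pooling aggregator: element-wise maximum across layers -> (N, d)."""
    return np.max(np.stack(layers, axis=0), axis=0)

def jk_weighted(layers, scores):
    """Convex combination of layer embeddings for each node.

    scores: (N, L) unnormalized per-node, per-layer scores. Here they are
    supplied by the caller (an assumption made for illustration).
    """
    stacked = np.stack(layers, axis=1)                      # (N, L, d)
    shifted = scores - scores.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(shifted)
    weights /= weights.sum(axis=1, keepdims=True)           # rows sum to 1
    combined = (weights[:, :, None] * stacked).sum(axis=1)  # (N, d)
    return combined, weights

# Hypothetical embeddings from a 3-layer network on 3 nodes, 2 features each.
layers = [np.full((3, 2), 1.0), np.full((3, 2), 2.0), np.full((3, 2), 3.0)]
combined, weights = jk_weighted(layers, np.zeros((3, 3)))
```

With equal scores the weighted aggregator reduces to the layer-wise mean. Inspecting `weights` row by row is the kind of post-hoc interrogation of node-specific depth that the text describes.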

FIGURE 2 | Graph attention network employing a jumping-knowledge mechanism. The network takes as input the graph adjacency structure and the nodewise
feature matrix, and outputs a node-by-label logit matrix. Each GATConv block is composed of multiple attention heads. Arrows indicate the direction of processing.
Aggregation function, g(x), which takes as input the embeddings from each GATConv block, learns a convex combination of the layer-wise embeddings.
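Each GATConv block in Figure 2 is built from attention heads of the kind defined in Equation 2. A single head might be sketched in NumPy as follows. This is an illustrative sketch, not the DGL implementation used in the paper: the function names and toy graph are assumptions, and the fixed non-linearity is taken to be a leaky ReLU with slope 0.2, as in Velickovic et al. (2018).

```python
import numpy as np

def leaky_relu(z, slope=0.2):
    """Fixed non-linearity; the slope value is an assumption."""
    return np.where(z > 0, z, slope * z)

def attention_head(H, W, a, neighbors):
    """One attention head.

    H: (N, k_in) input features, W: (k_out, k_in) projection matrix,
    a: (2 * k_out,) attention vector, neighbors: list of index lists N_i.
    """
    Z = H @ W.T                                   # projected features, (N, k_out)
    out = np.zeros_like(Z)
    for i, nbrs in enumerate(neighbors):
        # Concatenate W h_i with W h_j for every neighbor j of node i.
        pairs = np.hstack([np.tile(Z[i], (len(nbrs), 1)), Z[nbrs]])
        scores = leaky_relu(pairs @ a)            # Equation 2, one score per neighbor
        scores = scores - scores.max()            # numerical stability
        alpha = np.exp(scores) / np.exp(scores).sum()  # softmax over N_i
        out[i] = alpha @ Z[nbrs]                  # attention-weighted neighbor sum
    return out

# Hypothetical toy graph: 4 nodes, 3 input features, 2 output features.
rng = np.random.default_rng(0)
H = rng.standard_normal((4, 3))
W = rng.standard_normal((2, 3))
a = rng.standard_normal(4)
neighbors = [[1, 2], [0, 2], [0, 1, 3], [2]]
updated = attention_head(H, W, a, neighbors)
```

Setting `a` to zero makes every neighbor equally weighted, so the head reduces to a plain neighborhood average. A multi-head layer would run several such heads with separate `a` vectors and concatenate (or, in the last layer, average) their outputs.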
3. DATA

The data used in this study come from the Human Connectome Project (HCP) (Glasser et al., 2013, 2016) and from the Midnight Scan Club (MSC) (Gordon et al., 2017). We were specifically interested in examining how models trained on one dataset would perform on another dataset. Specifically, we trained models on data from the HCP (Glasser et al., 2013), one of the highest quality MRI datasets to date in terms of spatial and temporal sampling of brain signals. We then tested our models on images from both the HCP and MSC datasets.

3.1. HCP Dataset
The HCP consortium collected data on a set of 1,200 young adult subjects 21–35 years of age. We utilized a subset of 268 of these datasets (22–35 years; 153 female) from the S500 data release. The HCP acquired high-resolution 0.7 mm isotropic T1w (TI = 1,000 ms, TR = 2,400 ms, TE = 2.14 ms, FA = 8°, FOV = 224 mm, matrix = 320, 256 sagittal slices) and T2w images (TR = 3,200 ms, TE = 565 ms, FOV = 224 mm, matrix = 320). T1w and T2w data were pre-processed using a custom pipeline developed by the HCP (Glasser et al., 2013) using FreeSurfer (Fischl et al., 2004) to generate highly refined cortical surface meshes at the white/gray and pial/CSF interfaces. The surface meshes were spatially normalized to Montreal Neurological Institute (MNI) space and resampled to have 32k vertices. The pipeline also generated four surface-based scalar maps: cortical thickness, Gaussian curvature along the cortical manifold, sulcal depth of the cortical gyri and sulci, and a myelin density map characterizing the spatially-varying myelin content of the gray matter (Glasser and van Essen, 2011).

For each subject, the HCP acquired four resting-state functional MRI (rs-fMRI) images: TR = 0.720 s, TE = 33 ms, multi-band factor = 8, FA = 52°, FOV = 208 × 180 mm, matrix = 104 × 90 × 72, voxel size = 2 × 2 × 2 mm. The authors refer to these four acquisitions as: REST1_LR, REST1_RL, REST2_LR, REST2_RL. The images were acquired over two separate days, such that REST1_LR / REST1_RL were acquired on one day, and REST2_LR / REST2_RL were acquired on another. Each session acquired 1,200 time-points, such that each BOLD session was roughly 15-min in length. These images were pre-processed using a custom pipeline developed by the HCP (Glasser et al., 2013). BOLD images were denoised using subject-ICA (Beckmann et al., 2005) and FIX (Salimi-Khorshidi et al., 2014) to automatically identify and remove spurious noise components, and motion parameters were regressed out. No additional global signal regression, tissue regression, temporal filtering, or motion scrubbing were performed. Denoised voxel time series were interpolated onto the fsaverage_LR32k surface mesh using a barycentric averaging algorithm, and then smoothed at FWHM = 2 mm to avoid the mixing of signals across gyri. Surface-mapped BOLD signals were brought into register across subjects using a multi-modal surface matching algorithm (Robinson et al., 2014) to the fsaverage_LR32 space and vectorized to CIFTI format, mapping each surface vertex to an index in a vector (toward the end of this work, we learned that different HCP data releases were processed using different versions of this surface registration algorithm; we discuss this in more depth in section 5.5). CIFTI vector indices, referred to as "grayordinates" by the HCP, are in spatial correspondence across subjects (i.e., index i in subjects s and t corresponds to roughly the same anatomical location), such that each subject shares the same mesh topology and adjacency structure. Time-series for each session were demeaned and temporally concatenated.

The HCP consortium developed a pipeline to generate high-resolution multi-modal cortical parcellations (MMP) with 180 cortical areas using a spatial derivative based algorithm (Glasser et al., 2016) computed from resting and task-based fMRI signals, cortical thickness, myelin content, and cortical curvature. Manual editing was performed on the group-average gradient-based parcellation to ensure that boundaries conformed across feature types. Using a set of 210 independent subjects as training data, the authors trained a 3-layer neural network model to learn these boundary-based regions. The authors trained 180 classifiers, one for each cortical area, to distinguish a single cortical area from its immediately adjacent neighborhood (using a 30 mm radius neighborhood size) in a binary classification setting. At test time, the authors compared the probabilities of the predicted areal class across all classifiers in a single find-the-biggest operation. Label predictions were regularized to minimize

spurious predictions and "holes" in the final parcellation. Apart from the 30 mm radius around each group-level area, the classifiers did not incorporate any spatial information at training or test time. Predictions generated from subjects in the training set were used to compute a group-average multi-modal parcellation which can be freely downloaded here: https://balsa.wustl.edu/DLabel/show/nn6K. The individual parcellations and the classifier itself have not yet been publicly released.

We utilized the subject-level cortical parcellations generated by the HCP as the training set for our models. Subject-level parcellations for a subset of 449 subjects were made available by an HCP investigator (see Acknowledgements).

3.2. Midnight Scan Club Dataset
The Midnight Scan Club dataset consists of MRI data acquired on ten individual subjects (5 female) ranging in age from 24 to 34 years of age: https://openneuro.org/datasets/ds000224/versions/1.0.3 (Gordon et al., 2017). The MSC study acquired 5 h of resting-state data on each participant in ten 30-min acquisitions, with the goal being to develop high-precision, individual-specific functional connectomes to yield deeper insight into the reproducibility and inter-subject differences in functional connectivity.

The MSC dataset preprocessing followed a roughly similar pipeline to that of the HCP dataset. Four 0.8 mm isotropic T1w images (TI = 1,000 ms, TR = 2,400 ms, TE = 3.74 ms, FA = 8°, matrix = 224, sagittal) and four 0.8 mm isotropic T2w images (TR = 3,200 ms, TE = 479 ms, matrix = 224 slices, sagittal) were acquired. T1w images were processed using FreeSurfer to generate refined cortical mesh representations of the white/gray and pial/CSF tissue interfaces, which were subsequently warped to the fsaverage_LR brain surface using the FreeSurfer shape-based spherical registration method, and resampled to 164k and 32k vertex resolutions. The authors performed myelin mapping by computing the volumetric T1/T2 ratio and interpolating the voxel-wise myelin densities onto the 32k surface mesh.

MSC resting-state data were acquired using gradient-echo EPI sequences with the following parameters: TR = 2.2 s, TE = 27 ms, FA = 90°, voxel size = 4 × 4 × 4 mm. The MSC applied slice timing correction, and distortion correction using subject-specific mean field maps. Images were demeaned and detrended, and global, ventricular, and white matter signals were regressed out. Images were interpolated using least squares spectral estimation and band-pass filtered (0.009 Hz < f < 0.08 Hz), and then scrubbed of high-motion volumes. Denoised volumetric resting-state data were then interpolated onto the midthickness 32k vertex mesh. The MSC study did not perform subject-ICA and FIX to remove spurious noise components from the temporal signals.

4. METHODS

Here, we describe processing steps applied to the HCP and MSC fMRI datasets for this analysis. We begin with the minimally pre-processed BOLD and scalar data interpolated onto the 32k surface mesh.

4.1. Regional Functional Connectivity
As mentioned above in sections 3.1 and 3.2, the MSC and HCP studies aligned cortical surfaces to the fsaverage_LR surface space. The result is such that, given two meshes S and T, the anatomical location of grayordinate i in mesh S corresponds to generally the same anatomical location as grayordinate i in mesh T, allowing for direct comparisons between the same grayordinates across individual surfaces.

In cases where spatial normalization of surfaces has not been performed, it would be incorrect to assume that two grayordinate indices correspond to the same anatomical locations across subjects. In order to alleviate the requirement of explicit vertex-wise correspondence across training, validation, and testing datasets, we assume that most imaging studies will first run FreeSurfer to generate subject-specific folding-based cortical parcellations (Desikan et al., 2006; Destrieux et al., 2010). We can then aggregate the high-dimensional vertex-wise connectivity features over one of these cortical atlases, as in Eschenburg et al. (2018), and simultaneously reduce the feature vector dimension. This guarantees that column indices of feature vectors represent anatomically comparable variables across individuals corresponding to connectivity to whole cortical areas rather than explicit vertex-vertex connections. These low-dimensional vectors are agnostic to the original mesh resolution and degree of spatial normalization. As long as resting-state data are collected for a given study, and good spatial correspondence between the T1w and BOLD image can be achieved, we can apply our processing steps to this data.

Given a BOLD time series matrix $T \in \mathbb{R}^{32k \times t}$ and cortical atlas with $K$ regions, we consider the set of vertices assigned to region $k$ and compute the mean time-series of region $k$ as:

$$\hat{T}_{k,t} = \frac{1}{|k|} \sum_{i \in k} T_{i,t} \qquad (4)$$

where $\hat{T} \in \mathbb{R}^{K \times t}$ is the matrix of mean regional time-series. We compute $R \in \mathbb{R}^{32k \times K}$, the Pearson cross-correlation between $T$ and $\hat{T}$, where $R_{i,k}$ represents the temporal correlation between a vertex $i$ and cortical region $k$. These cross-correlation vectors are used as features to train our models.

In this analysis, we generated connectivity features using the Destrieux atlas (Destrieux et al., 2010) with 75 regions per hemisphere, as it is computed by FreeSurfer and represents a reasonably high-resolution partition of the cortical surface that we hypothesize captures vertex-to-vertex functional variability well. In section 5.5, we show how classification performance depends on which cortical atlas we regionalize over, and on which representation of functional connectivity models are trained on.

We also examined segmentation performance when models were trained on continuous representations of functional connectivity, computed by group-ICA and dual regression. As part of their preprocessing, the HCP applied group-ICA to a set of 1,003 subjects using MELODIC's Incremental Group PCA (MIGP) algorithm to compute group-ICA components of dimensions 15, 25, 50, and 100 (Smith et al., 2014). We dual-regressed these group-level components onto each subject's resting-state data to generate subject-level ICA components.
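The regionalized connectivity features of Section 4.1 (the regional mean of Equation 4 followed by the vertex-to-region Pearson correlation) can be sketched in NumPy as follows. The function name, the toy array sizes, and the random data are assumptions for illustration; the sketch also assumes no vertex or regional mean time series is constant, since that would give a zero norm.

```python
import numpy as np

def regional_connectivity(T, labels):
    """Vertex-to-region Pearson correlation features.

    T: (V, t) vertex-wise BOLD time series.
    labels: (V,) integer atlas label for each vertex.
    Returns R: (V, K), where K is the number of distinct labels.
    """
    regions = np.unique(labels)
    # Equation 4: mean time series over the vertices of each region, (K, t).
    T_hat = np.stack([T[labels == k].mean(axis=0) for k in regions])

    # Center each time series and scale to unit norm, so that the dot
    # product of two rows equals their Pearson correlation.
    Tz = T - T.mean(axis=1, keepdims=True)
    Tz /= np.linalg.norm(Tz, axis=1, keepdims=True)
    Rz = T_hat - T_hat.mean(axis=1, keepdims=True)
    Rz /= np.linalg.norm(Rz, axis=1, keepdims=True)
    return Tz @ Rz.T

# Hypothetical toy data: 6 vertices, 50 time points, 3 atlas regions.
rng = np.random.default_rng(0)
T = rng.standard_normal((6, 50))
labels = np.array([0, 0, 1, 1, 2, 2])
R = regional_connectivity(T, labels)
```

Because the columns of `R` are indexed by atlas region rather than by vertex, feature vectors built this way are comparable across subjects even when their meshes are not in vertex-wise correspondence.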

These subject-level regression coefficients were fed into our cortex. We then applied the singular value decomposition as
models as alternative representations of functional connectivity. R = USV T , where S is the diagonal matrix of singular values
σ1 , σ2 . . . σN Gordon et al. (2016) defined homogeneity as ρl =
.. P
4.2. Markers of Global Spatial Position 100 ∗ (σ12 k 2
i=1 σi ), the percent of variance explained by
We also included measures of position in grayordinate space
the first principal component. The variance captured by the
(global spatial position) as model features (Cucurull et al., 2018;
first component describes how well a single vector explains
Gopinath et al., 2019). Surface mesh Laplacian eigenvectors
the functional connectivity profiles of a given cortical parcel—
represent a spatial variance decomposition of the cortical
the larger the variance explained, the more homogeneous the
mesh into orthogonal bases along the cortical manifold. We
parcel connectivity. We computed an estimate of functional
retained the first three eigenvectors corresponding to eigenvalues
homogeneity for each parcel and averaged the estimates across
λ1 , λ2 , λ3 . The eigenfunctions represent an intrinsic coordinate
all parcels.
system of the surface that is invariant to rotations and
For scalar features (e.g., myelin density), we estimated
translations of the surface mesh.
homogeneity as the ratio of within-parcel variance to between-
The eigendecomposition computes eigenvectors up to a sign
parcel variance. For each parcel l ∈ L and feature F ∈
flip (that is, the positive/negative direction of an eigenvector
R32k , we computed the mean, µl , and variance, σl2 of the
is arbitrary), and eigenvector ordering is not guaranteed to be
parcel-wise features. Homogeneity is estimated as Li=1 (σl2 −
P
equivalent across individuals. We chose a template subject and .P
flipped (multiplied by −1) and reordered the eigenvectors of all σ¯2 ) i=1
L
l(µ − µ̄)2 , where σ¯2 and µ̄ are the average variance
remaining subjects with respect to this template subject via the and average mean estimates across all parcels. A smaller
Hungarian algorithm, to identify the lowest cost vector matching value represents more homogeneous parcels. This measure of
for every template-test pair (here, we minimized the Pearson homogeneity is a dimensionless quantity that allows for the
correlation distance). comparison of estimates across datasets and features.
4.3. Incorporating a Spatial Prior

The models trained in this analysis represent multi-class 4.5. Model Training and Parameter
classifiers. By default, each vertex considers every label (out of a
total of 180 possible labels) as a viable candidate. This approach,
however, does not take advantage of the fact that training and
testing data are in spatial correspondence with one another. For
example, if we know a vertex is likely to be assigned a label in
the occipital lobe, we can restrict the set of candidate labels for
this vertex to a subset of the possible 180 areas, e.g., only those
areas in the primary and higher-order visual areas. We restricted
the label search space of a test vertex to only those labels with
non-zero probabilities in the training set. If a given vertex i is
never assigned label k in the training data, we set the estimated
network probability of label k for vertex i to 0, such that it is never
assigned label k in the test set. We implemented the application of
the spatial prior by multiplying the network logits with a binary
masking matrix at test time (i.e., the prior is not included in the
model training phase).
Applying the spatial prior is only feasible if the test image
surface mesh has been spatially normalized to the fsaverage_LR32
space. Given that many studies will be interested in performing
multi-subject inference over surface-based maps, we believe this
is a reasonable assumption to make. We examine classification
performance when excluding and including a spatial prior.

4.4. Regional Homogeneity
We examined whether our models learned parcellations in
which the features of each parcel were homogeneous. We defined
homogeneity for a given parcellation as in Gordon et al. (2016).
Assume we are given a resting-state fMRI BOLD time series
matrix $T \in \mathbb{R}^{32k \times t}$ and a precomputed cortical parcellation with
$L$ cortical areas. For each parcel $l \in L$ with $n_l$ vertices, we
computed the Pearson correlation matrix, $R_l \in \mathbb{R}^{n_l \times 32k}$, between
the parcel BOLD signals and the BOLD signals of the entire

Selection
We implemented each graph neural network model using
the Python package Deep Graph Library (DGL) and PyTorch
(Wang et al., 2020). Code developed for this analysis for
training these models can be found here: https://github.com/
kristianeschenburg/parcellearning/.
We split the 268 HCP subjects into 100 training samples,
20 validation samples, and 148 test samples. For parameter
optimization, we trained models on three types of datasets:
(1) 100 15-min images (REST1_LR session for each subject),
(2) 100 60-min images (temporal concatenation of all four
rfMRI sessions), and (3) 400 15-min images (four independent
rfMRI sessions per subject). We used a validation dataset of 20
subjects of the same scanning duration as the training data to
determine when to stop training. We examined the performance
of each model on a held-out test set of different scanning
durations: 15-min (four independent rfMRI sessions), 30-min
(concatenation of two 15-min rfMRI sessions acquired on the
same day), and 60-min (temporal concatenation of all four
15-min rfMRI sessions). The outcome variable to be predicted
was the subject-level parcellation provided to us by MG. We
performed similar temporal concatenation of the MSC data,
concatenating the original ten 30-min sessions into five 60-min
sessions, two 150-min sessions, and one 300-min session.
The features used for parameter optimization were the
regionalized functional correlations between each cortical vertex
and all regions in the Destrieux atlas, the first three Laplacian
eigenvector embeddings capturing global location information,
and four scalar maps corresponding to sulcal depth, Gaussian
curvature, myelin density, and cortical thickness for a total of 81
features at each vertex. We concatenated these features
column-wise into a matrix for each subject.
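
The test-time spatial prior described in this section can be sketched as follows. This is a minimal illustration in NumPy rather than the DGL/PyTorch code used in the analysis, and the function names (`build_prior_mask`, `apply_spatial_prior`) are hypothetical. One assumption is made explicit: the mask is applied to softmax probabilities rather than to raw logits, so that a masked label receives probability exactly 0, as the text describes.

```python
import numpy as np


def build_prior_mask(train_labels, n_labels):
    """Binary prior from training labels (hypothetical helper).

    train_labels: (n_subjects, n_vertices) integer labels in [0, n_labels).
    Returns an (n_vertices, n_labels) mask that is 1 where label k was
    assigned to vertex i in at least one training subject, else 0.
    """
    n_subjects, n_vertices = train_labels.shape
    mask = np.zeros((n_vertices, n_labels), dtype=np.float64)
    # Row-major ravel lists subject 0's vertices, then subject 1's, and so on.
    vertex_idx = np.tile(np.arange(n_vertices), n_subjects)
    mask[vertex_idx, train_labels.ravel()] = 1.0
    return mask


def apply_spatial_prior(logits, mask):
    """Mask network outputs at test time and return predicted labels.

    Assumption: masking is done on softmax probabilities, not raw logits.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    return (probs * mask).argmax(axis=1)


# Toy example: 2 training subjects, 3 vertices, 3 labels.
train_labels = np.array([[0, 1, 2],
                         [0, 1, 1]])
mask = build_prior_mask(train_labels, n_labels=3)
logits = np.array([[0.0, 0.0, 5.0],   # network prefers label 2, never seen at vertex 0
                   [1.0, 2.0, 3.0],   # only label 1 was seen at vertex 1
                   [9.0, 1.0, 2.0]])  # label 0 never seen at vertex 2
predicted = apply_spatial_prior(logits, mask)
```

Because the mask is built only from training labels and applied only at prediction time, the trained network weights are unaffected by the prior.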

We refer to the graph convolution network, graph attention
network, and jumping knowledge network as “GCN,” “GAT,” and
“JKGAT,” respectively. We compared the performance of these
algorithm variants to a simple linear feed-forward neural network
(“baseline”) where only the features at each vertex were used to
classify cortical nodes (no adjacency information is incorporated
into the learning process). We optimized model performance
over network depth, number of hidden channels per layer, feature
dropout rate, number of attention heads (GAT and JKGAT
only), and aggregation function (JKGAT only). The “default”
parameters are 3 layers, a dropout rate of 0.1, 32 hidden channels,
4 attention heads per layer, and an LSTM aggregation function.
We varied one parameter at a time: for example, when comparing
networks with 3 layers vs. 6 layers, all other parameters are fixed
to the default values.
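
The one-parameter-at-a-time search described above can be sketched as a small configuration generator. This is an illustrative sketch only: the dictionary keys and the function name `one_at_a_time` are hypothetical, the default values are those stated in the text, and the candidate values are the ones reported in Table 1.

```python
# Default parameters as stated in the text (key names are hypothetical).
DEFAULTS = {
    "n_layers": 3,
    "dropout": 0.1,
    "hidden_channels": 32,
    "attention_heads": 4,   # GAT and JKGAT only
    "aggregation": "lstm",  # JKGAT only
}

# Candidate values as reported in Table 1.
GRID = {
    "n_layers": [3, 6, 9],
    "hidden_channels": [16, 32, 64],
    "dropout": [0.1, 0.3, 0.5, 0.7],
    "attention_heads": [4, 8, 12],
    "aggregation": ["concat", "lstm"],
}


def one_at_a_time(defaults, grid):
    """Return the default config plus every config that differs from the
    defaults in exactly one parameter."""
    configs = [dict(defaults)]
    for name, values in grid.items():
        for value in values:
            if value == defaults[name]:
                continue  # already covered by the default configuration
            cfg = dict(defaults)
            cfg[name] = value
            configs.append(cfg)
    return configs


configs = one_at_a_time(DEFAULTS, GRID)
```

Varying one parameter at a time keeps the number of trained models linear in the number of candidate values, at the cost of not exploring interactions between parameters.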
For training, we used the cross entropy loss implemented
in PyTorch, a LeakyReLU activation function with a negative
slope of 0.2, and Adam optimization with a weight decay rate of
0.0005 and L2 weight regularization of 0.005. We trained in mini-
batches of size s = 10 graphs and accumulated the gradients for
each batch before computing the gradient update. We trained for
1,000 epochs using an early stopping criterion evaluated on the
validation loss. At each iteration, we retained the model if the
current validation loss was lower than the previous validation
loss. If validation loss did not decrease for 150 epochs, training
was terminated and the best performing model was saved. In
practice, we found that few of the models trained for more than
1,000 epochs.

FIGURE 3 | Subject-level (A) and group-level (B) predictions generated by the
optimal model in the MSC (left) and HCP (middle) datasets.

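
The early-stopping rule described in the training paragraph above can be sketched as a framework-agnostic loop. This is one interpretation, not the released training code: it assumes the comparison is made against the best validation loss seen so far, and the callables `train_epoch` and `validate` are hypothetical placeholders for one epoch of mini-batch training and a validation pass.

```python
def train_with_early_stopping(train_epoch, validate, max_epochs=1000, patience=150):
    """Train until the validation loss has not improved for `patience` epochs.

    train_epoch(epoch) -> model state after that epoch (hypothetical callable)
    validate(state)    -> validation loss for that state (hypothetical callable)
    Returns the best state, its validation loss, and the epoch it came from.
    """
    best_loss = float("inf")
    best_epoch = -1
    best_state = None
    for epoch in range(max_epochs):
        state = train_epoch(epoch)
        val_loss = validate(state)
        if val_loss < best_loss:
            # Retain the model whenever the validation loss improves.
            best_loss, best_epoch, best_state = val_loss, epoch, state
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop training.
            break
    return best_state, best_loss, best_epoch


# Toy run: the loss improves until epoch 5 and then plateaus.
losses = [5.0, 4.0, 3.0, 2.0, 1.0, 0.5] + [1.0] * 50
calls = []


def fake_train(epoch):
    calls.append(epoch)
    return epoch  # the "state" is simply the epoch index here


state, loss, epoch = train_with_early_stopping(
    fake_train, lambda s: losses[s], max_epochs=50, patience=3
)
```

In the analysis itself the corresponding settings are 1,000 epochs and a patience of 150 epochs.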
5. RESULTS
We first examine the best performing model of those we
considered in our analysis, and discuss the classification accuracy
and reproducibility of parcellations predicted by this model in
relation to parcellations computed by Glasser et al. (2016), which
we call “ground truth” in what follows. We define classification
accuracy as the percentage of correctly predicted vertex labels
relative to the ground truth maps. We then show broadly how
algorithm choice, network architecture, and training and testing
image scan duration affect overall model performance. Finally,
we illustrate how classification performance is related to the
features used during model training and testing.

5.1. Prediction Accuracy in the Best Performing Model
Network optimization was performed using labels provided by
Matthew Glasser (see section Acknowledgments) using subject
data from the S500 HCP release. As mentioned in section 3.1, the
S1200 data release uses a different surface registration algorithm,
producing subject-level resting-state data that is better aligned
with the labels provided by Glasser. Final model evaluation was
performed using this S1200 data. The best performing model
was the 6-layer graph attention network (GAT), with 4 attention
heads per layer, 32 hidden channels per layer, and a dropout
rate of 0.1, and incorporated a spatial prior at test time. When
trained on features computed using ICA, this model achieved
a mean classification accuracy of 79.91% on the S1200 subjects.
We henceforth refer to this model as the “optimal” model, and
discuss results associated with this model below.
In Figure 3A, we show predicted parcellations computed
using this model for exemplar HCP and MSC test subjects.
Predicted subject-level parcellations closely resemble the
“ground truth” maps generated by Glasser et al. (2016) (see
Supplementary Material for additional examples of predictions
generated by each model). No specific contiguity constraint was
imposed on the parcellations; it is inherent in the graph neural
network models. Subjects from the MSC dataset do not have
corresponding ground truth maps against which to compare
their predictions. In Figure 3B, we show consensus predictions
for each dataset, compared against the publicly released
HCP-MMP atlas. Consensus predictions were computed by assigning
a vertex to the label most frequently assigned to that vertex
across the individual test subject predictions. We see that
both consensus predictions closely resemble the HCP-MMP
atlas; however, the consensus map derived from the MSC
subjects shows noisy parcel boundaries and disconnected areal
components (lateral and medial prefrontal areas).
Figure 4 shows the spatial distribution of classification
accuracy rates averaged across all subjects in the HCP test
set. Average accuracy is shown as a map distributed over the
cortex, with values ranging between 0 (blue; vertex incorrectly

classified in all subjects) and 1 (red; vertex classified correctly
in all subjects). Vertices near the centers of cortical regions
were classified correctly more frequently, while prediction errors
tended to be distributed near the boundaries of cortical regions.
To some degree, this effect can be attributed to the idea that
boundaries between putative cortical areas represent segments of
the cortex with changing biological properties. In developing a
statistical model to assign a vertex to one cortical area or another,
vertices at region boundaries will have more ambiguous label
assignments simply due to the fact that their feature vectors
are sampled from a space with greater distributional overlap
across various cortical areas. However, another explanation is
that MRI resolution is low with respect to cortical functional
features like cell columns. Consequently, this means that
voxel-wise measurements reflect mixtures of connectivity patterns
due to partial volume effects, thereby reducing the ability of
a statistical model to distinguish between two cortical areas at
parcel boundaries.

FIGURE 4 | Average accuracy maps for the HCP test set using the optimal
model, computed by averaging the classification error maps across all HCP
dataset test subjects. (A) Blue (0.0) = vertex incorrectly classified in all test
subjects; Red (1.0) = vertex correctly classified in all test subjects. Areas in the
lateral prefrontal and ventral/dorsal occipital areas showed the highest error
rates. (B) Errors occur most frequently at the boundaries of cortical regions.
Black lines represent areal boundaries of the consensus prediction parcellation.

While errors globally tended to be concentrated at region
boundaries, some cortical areas showed higher error rates than
others. Of note are higher error rates for cortical areas in the
superior temporal areas in the fundus and medial superior
temporal regions (FST, MST, MT, and V4t), and lateral
higher-order visual areas (LO1, LO2, LO3). In the lateral prefrontal
area, we found that the premotor eye field (PEF) shows higher
error rates relative to adjacent regions (55b and frontal eye field,
FEF). Glasser et al. (2016) identified three unique topologies
(typical, shifted, and split) for area 55b that varied across subjects,
which might to some degree explain the higher error rates in
area PEF.
We quantified the relationship between the spatial
distribution of errors and their distance to cortical areal
boundaries. We computed the fraction of misclassified
vertices that occurred at a geodesic distance of k edges
(geodesic hops) from any cortical areal boundary. Using
the default model parameters and regionalized features, we
examined this distribution of errors as a function of distance
(Supplementary Material). Over 50% of misclassified vertices
occurred at the region boundaries, i.e., those vertices in the
ground-truth parcellations that are directly adjacent to different
regions, and roughly 30 and 12% of misclassified vertices were 1
and 2 edges away from areal boundaries, respectively. The simple
feed-forward network misclassified vertices further away from
region boundaries, while the three graph neural networks tended
to misclassify only vertices close to the boundary.
Although the MSC subjects do not have corresponding
ground truth maps, the data is in spatial correspondence with
the fsaverage_LR32 map. We computed the correspondence of
maps predicted on the MSC subjects with the HCP-MMP atlas
in order to gain insight into the accuracy of these predictions.
Mean correspondence of predictions computed on the MSC and
HCP datasets with the HCP-MMP atlas was 70.04 and 84.35%,
respectively (Supplementary Material).
Mean model probabilities computed by the optimal model for
a set of cortical areas are illustrated in Figure 5, showing that
areal probabilities are local in nature and restricted to precise
anatomical locations. Individual areal probabilities computed by
Glasser et al. (2016) and Coalson et al. (2018) using their binary
classifier are shown in the bottom row. Probability estimates in
the HCP dataset mirror those estimated by the original HCP
classifier (Glasser et al., 2016), indicating that our model faithfully
learns the proper spatial extent of each cortical area. Estimates in
the MSC dataset were slightly more diffuse and less confident (see
areas V1 and 46), such that probability mass was assigned to more
disparate areas of the cortex, relative to probabilities estimated in
the HCP dataset.

5.2. Model Predictions Are Reproducible Across Scanning Sessions
The HCP acquired four 15-min resting-state acquisitions per
subject, while the MSC acquired ten 30-min resting-state
acquisitions per subject. We examined how reliable predictions
generated from each resting-state session were within subjects,
and how this reliability related to the scanning duration. For a
given subject, we estimated session-specific reproducibility using
datasets of the same scan duration. We defined reproducibility
using the Dice coefficient, which measures the similarity of two

FIGURE 5 | Mean model probabilities for a subset of cortical areas for the HCP (top) and MSC (middle) datasets computed using the optimal model, and the MMP
binary class probabilities from Glasser et al. (2016) and Coalson et al. (2018) (bottom). Probabilistic maps are illustrated for areas V1, 46, TE1a, LIPv, MT, RSC, and
10r. These maps are thresholded at a minimum probability value of 0.005, the probability of randomly assigning a vertex to one of the 180 cortical areas.
images. The Dice coefficient between sets J and K is defined as

$$\mathrm{Dice}(J, K) = \frac{2\,|J \cap K|}{|J| + |K|} \tag{5}$$

Figure 6 shows the mean areal Dice coefficients for each dataset
from predictions computed using the optimal model. Predictions
made on the HCP dataset were more reproducible across
the entire cortex than predictions on the MSC dataset. In
both datasets, sensory/motor and areas near the angular and
supramarginal gyri were most reproducible. The visual cortex
showed high reproducibility in area V1, while areas V2-V4 were
less reproducible.
Figure 7A shows mean reproducibility estimates computed
on the HCP and MSC datasets. Predictions for both datasets
were highly reproducible across repeated scanning sessions, and
reproducibility increased with increasing scan duration. Mean
Dice coefficient estimates in the HCP dataset were 0.81 and
0.86 for the 15- and 30-min durations. In the MSC dataset, the
mean Dice coefficients were 0.69, 0.76, and 0.82 for the 30-, 60-,
and 150-min durations. When fixing scan duration (e.g., 30-min
durations), HCP data were more reproducible than the MSC
data. One feature that we could not evaluate directly was the
reproducibility of the ground truth maps. Glasser et al. (2016)
reported maximum and median Dice coefficient estimates of 0.75
and 0.72 for repeated scans on HCP participants, indicating that
our classifier learned parcellations that were more reproducible
than those generated by the binary classifier.
Figure 7B illustrates subject-level reproducibility estimates
in the MSC dataset. Predictions for subject MSC08 were
significantly less reliable, relative to the other subjects. Gordon
et al. (2017) also identified MSC08 as having low reproducibility
with respect to various graph theoretical metrics computed
from the functional connectivity matrices. They noted that
subject MSC08 reported restlessness, displayed considerable head
motion, and repeatedly fell asleep during the scanning sessions.
Area-level topologies were also reproducible across scanning
sessions (Supplementary Material). Glasser et al. (2016)
identified three unique topologies of area 55b, corresponding to
a “typical,” “shifted,” or “split” organization pattern, relative to
the group-average cortical map. We were able to identify these
same unique topologies in individual subjects, indicating that
graph neural networks are identifying the unique connectivity
fingerprints of each cortical area, and not simply learning where
the parcel is. When we examined the predictions generated by
the optimal model on the four independent 15-min scanning
sessions, we found that, within a given subject, the topological
organization of area 55b was reproducible. Allowing for
some variability in prediction boundaries and location due to
resampling of the connectivity data and partial volume effects,
this indicates that the graph neural networks are learning
subject-specific topological layouts that incorporate their unique
connectivity and histology patterns.

5.3. Parcellations Learned by GNNs Are Homogeneous in Their Scalar and Connectivity Measures
If a model is in fact learning unique, discrete areas, the
distribution of biological features in these areas should
be relatively homogeneous. Unsupervised learning clustering
algorithms designed to parcellate the cortex often incorporate
objective functions that attempt to maximize within-parcel
similarity and minimize between-parcel similarity. On the other
hand, gradient-based approaches, like those proposed in Gordon
et al. (2016), Wig et al. (2014), and Schaefer et al. (2018), do not
directly maximize an objective function in this manner, but rather
identify putative areal boundaries by identifying where biological

FIGURE 6 | Mean areal Dice coefficient estimates, computed using the optimal model on 15-min HCP data (4 repeated sessions) and 30-min MSC data (10 repeated
sessions), normalized with the same color map. Estimates are computed for each area, and averaged across all subjects.
FIGURE 7 | Reproducibility of predicted maps generated by the optimal model, as measured using the Dice coefficient. We show mean reproducibility estimates for
each dataset (A), and subject-level estimates in the Midnight Scan Club (B). Estimates for 60 min (HCP) and 300 min (MSC) durations are not shown in (A) because
there is only one image per subject for these durations. Similarly, estimates for 150 min durations are not shown in (B) because there is only a single scalar estimate
per subject.
properties change dramatically in a small local neighborhood. It
is assumed that this biological gradient captures differences in
homogeneity between adjacent cortical areas. In order to group
cortical voxels together, these voxels must inherently share some
physical or biological traits.
We computed homogeneity estimates as described in
section 4.4. In order to compare the homogeneity and variance
estimates between predicted parcellations, we fixed the features
used to compute these estimates. For a given subject, we
computed functional homogeneity using that subject’s 60-min
BOLD signal (HCP), or the 300-min BOLD signal (MSC). In
this way, the only variable that changed with respect to the
homogeneity estimate is the cortical map itself. We could then
make meaningful quantitative comparisons between estimates
for different maps, with respect to a given dataset.
Cortical maps predicted in the HCP dataset explained,
on average, 67.03% of the functional variation while MSC
predictions explained 72.90% (t: −3.137, p: 0.007) (Figure 8).
We hypothesized that parcellations predicted in the HCP dataset
would be more homogeneous, relative to those learned in the
MSC dataset, due to the fact that the MSC imaging data were
acquired with lower spatial resolution than that acquired by
the HCP and therefore subject to greater partial volume effects.
Homogeneity of myelin (t: −0.910, p: 0.377) and sulcal depth
(t: 1.043, p: 0.320) was not statistically different between the two
datasets, while curvature was less variable in the HCP dataset (t:
−2.423, p: 0.029). Contrary to our hypothesis, cortical thickness
was less variable in the MSC dataset (t: 11.562, p: 0.000). This
is likely a consequence of using a dimensionless representation
of homogeneity, which is internally normalized for each dataset

FIGURE 8 | Homogeneity of predicted parcellations in the HCP and MSC datasets using the optimal model. (A) Predicted parcels in the HCP test set explained as
much variability in the functional connectivity as the ground truth parcels. (B–E) Predictions in the MSC had more variable myelin content and less variable cortical
thickness estimates, relative to the HCP predictions.
as a ratio of the within-to-between parcel variances. This metric
allows for the direct comparison of homogeneity estimates across
datasets, instead of representing the raw variance estimates.
We compared homogeneity estimates in the predicted HCP
parcellations to estimates computed for the ground truth maps
using paired t-tests. Predicted and ground truth maps both
explained roughly 67% of the functional variation (t: −0.305, p:
0.761). Myelin (t: 0.176, p: 0.860) and curvature (t: −1.746, p:
0.083) variation were not statistically different between the two
groups. However, predictions were more homogeneous than the
ground truth maps with respect to sulcal depth (t: −4.442, p:
0.000) and cortical thickness (t: −2.553, p: 0.012).

5.4. Network Architecture Impacts Model Performance
As noted in section 5, we first optimized over network algorithms
and architectures using the S500 dataset, and then utilized
the S1200 dataset for model evaluation. We fixed the features
used for network optimization to the regionalized connectivity
features. We examined how varying each network parameter
impacted model classification accuracy (Table 1). As mentioned
in section 5.1, the best performing model was the GAT network
with 6 layers, with a classification accuracy of 67.60% on the S500
dataset (significantly inferior to the performance of the same
network on S1200 data, with an accuracy of 79.91%). We found
that optimal performance for the GAT and GCN networks was
achieved with 6 layers, 9 layers for the JKGAT, and 3 layers for the
baseline model. In general, classification accuracy increased with
the number of attention heads and the number of hidden channels,
while classification accuracy decreased with increasing feature
dropout rates. Using an LSTM aggregation function rather
than a simple concatenation marginally decreased classification
accuracy for the jumping-knowledge networks. In contrast to
our predictions, we found that the GAT networks slightly
outperformed the more flexible JKGAT networks for most
parameterizations.
We used a fixed validation dataset of 20 subjects to determine
when to stop model training and evaluated the performance of
our models using a fixed test dataset of 148 subjects. In order to
determine the reliability of our accuracy estimates, we computed
the standard error of classification accuracy for each model
using a bootstrapped approach (Supplementary Material). We
randomly sampled 100 test subjects, with replacement, out of
the 148, and computed the mean accuracy for each sample, for
each model. We repeated this process 1,000 times, and computed
the variability of these bootstrapped estimates. Standard error
estimates were less than 0.5%, indicating that test set accuracy
estimates are robust with respect to resampling of the test dataset.
We examined how classification accuracy in the HCP
dataset was related to the scanning duration of training and
testing datasets using the default model parameters (as defined
in section 5). When fixing test scan duration, classification
accuracy improved as the training dataset size increased for
all model types, with maximum accuracy achieved by graph
attention network models trained on 400 15-min duration
datasets (Supplementary Material). When training dataset size
and training scan duration were fixed, longer test image
duration yielded more accurate predictions across the board.
Predictions on 60-min test data were more accurate than
those computed on 30-min images, which in turn were
more accurate than those generated from 15-min images
(Supplementary Material). However, models trained on 15-min

TABLE 1 | Model classification accuracy as a function of network architecture and parameterization.

| Parameter | Value | Baseline (%) | GCN (%) | GAT (%) | JKGAT (%) |
|---|---|---|---|---|---|
| Network depth | 3 (default) | 62.64 | 64.93 | 67.02 | 66.71 |
| | 6 | 61.13 | 65.14 | 67.60* | 67.33 |
| | 9 | 57.72 | 64.76 | 67.36 | 67.42 |
| Hidden channels | 16 | 60.54 | 62.60 | 66.37 | 66.12 |
| | 32 (default) | 62.64 | 64.93 | 67.02 | 66.71 |
| | 64 | 63.84 | 66.24 | 67.15 | 67.15 |
| Dropout rate | 0.1 (default) | 62.64 | 64.93 | 67.02 | 66.71 |
| | 0.3 | 60.74 | 63.94 | 66.72 | 66.58 |
| | 0.5 | 58.34 | 63.10 | 65.45 | 65.39 |
| | 0.7 | 55.63 | 61.18 | 62.70 | 62.60 |
| Attention heads | 4 (default) | | | 67.02 | 66.71 |
| | 8 | | | 67.39 | 67.30 |
| | 12 | | | 67.56 | 67.29 |
| Aggregation function | concat | | | | 66.85 |
| | lstm (default) | | | | 66.71 |

Models were trained on 400 15-min datasets, and tested on 60-min test data using the S500 dataset. Values marked “(default)” indicate the default parameter values. The best performing model
was the GAT network with 6 layers, achieving a mean classification accuracy of 67.60%. The value marked with an asterisk is the mean classification accuracy of the best model, trained on resting-state
connectivity features computed by regionalizing time-series over the Destrieux cortical atlas (see Section 4.1).

data performed best when tested on 15-min data, and models
trained on 60-min data performed best when tested on
60-min data (Supplementary Material), indicating an interaction
between training and testing scan duration. Similarly, when
fixing training and testing scan duration, we found that including
the spatial prior significantly improved classification accuracy in
all architectures.

5.5. Incorporating Functional Connectivity Improves Model Performance Beyond Spatial Location and Scalar Metrics
After identifying the optimal network architecture, we examined
how model performance varied as a function of which
features the model was trained on. Briefly, we delineated
three broad feature types: (1) scalar features corresponding to
myelin, cortical thickness, sulcal depth, and cortical curvature;
(2) global location features corresponding to the spectral
coordinates computed from the graph Laplacian; and (3)
connectivity features computed from the resting-state signal.
In our primary analysis, we utilized connectivity features
computed by regionalizing over the Destrieux atlas (75
folding-based cortical areas). We compared these features against
those computed using the Desikan-Killiany atlas (35
folding-based cortical areas) and the Yeo-17 resting-state network
atlas (Yeo et al., 2011). The Yeo-17 atlas is a functional
atlas of discretized resting-state networks, computed via
independent component analysis. We identified the connected
components of each of the 17 resting-state networks and
excluded component regions with sizes smaller than 10
vertices, resulting in a map of 55 discrete functionally-derived
subregions of the cortex. We also examined the performance of
models trained on continuous, overlapping connectivity features
representing resting-state networks computed using group-ICA
and dual regression.
Computing connectivity features over the Destrieux atlas
yielded increased classification accuracy over the
Desikan-Killiany atlas (72.01 vs. 70.08%; paired t: 25.197, p: 0.000;
see models “Full-DX” and “Full-DK”). We hypothesized that
computing connectivity features over a functionally-aware
parcellation (Yeo-17) would yield a significant improvement
in classification accuracy, relative to the Destrieux atlas,
but this was not the case (see “Full-DX” vs. “Full-YEO”
in Figure 9). Models trained on the Yeo-17 features had a
mean classification accuracy of 71.58% (paired t: 1.916, p:
0.057). Training on spatial location or histological features
alone yielded mean classification accuracies of 44.10 and
54.45%, respectively (Figure 9A). However, training on
features defined by resting-state ICA components had clear
performance benefits. Models trained on ICA dimensions
of 15, 25, 50, and 100 generated mean classification
accuracies of 75.34, 77.79, 79.68, and 79.91%, respectively
(Figure 9C). Similarly, incorporating the prior mask also
improved model performance. However, the mask added

FIGURE 9 | Classification accuracy as a function of model features, using the optimal model architecture for (A) single feature types, (B) regionalization over different
cortical atlases, and (C) independent component analysis features. Refer to Table 2 for a description of each feature set.
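TABLE 2 | Feature combinations tested by our optimal model.

Feature sets are grouped as full feature sets (F), connectivity only, scalar only, and location only.

| Feature | Full: DK (F) | Full: DX (F) | Full: YEO (F) | Full: ICA (F) | Connectivity: DX | Scalar: Hist. | Location: Spect. |
|---|---|---|---|---|---|---|---|
| Thickness | + | + | + | + | | + | |
| Curvature | + | + | + | + | | + | |
| Myelin | + | + | + | + | | + | |
| Sulcal depth | + | + | + | + | | + | |
| Laplacian | + | + | + | + | | | + |
| Desikan (DK) | + | | | | | | |
| Destrieux (DX) | | + | | | + | | |
| Yeo-17 (YEO) | | | + | | | | |
| ICA-RSN | | | | + | | | |

Features included in a model are marked by a “+.” “Full” models include histological features, global position information, and functional connectivity signals.
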
diminishing returns, with the better-performing models
benefiting less from its inclusion. Models trained on
higher-dimensional ICA resting-state networks (50 and 100 networks)
performed almost as well without the spatial prior as they
did with it.
Late into our analysis, we learned of differences in
the preprocessing steps used to generate the
minimally-preprocessed HCP resting-state data, and to generate the
subject- and group-level HCP-MMP parcellations. Specifically,
the S500 and S1200 data releases were preprocessed using
different surface registration algorithms: MSMSulc and
MSMAll (Robinson et al., 2014, 2018). A consequence
of these preprocessing differences is that data from the
S1200 release is better aligned with the subject-level
labels provided by Glasser. After performing network
optimization using the S500 data, we evaluated final model
performance on the S1200 dataset. Figure 10 illustrates model
performance after training on each independent dataset. We

FIGURE 10 | Classification accuracy as a function of HCP data release and corresponding multi-modal surface matching algorithm. S500: MSMSulc (Robinson et al.,
2014), S1200: MSMAll (Robinson et al., 2018).
found that utilizing the S1200 dataset showed significant
improvements in mean classification accuracy by upwards of 5%,
relative to the S500 dataset. This indicates that the surface
registration algorithm choice plays a critical role in cortical
segmentation quality.

6. DISCUSSION

In this analysis, we presented a general cortical segmentation
approach that, given functional connectivity information and
a set of corresponding training labels, can generate cortical
parcellations for individual participants. This approach to
segmenting the cortex requires accessible MRI acquisition
sequences and standard morphological parcellations as inputs.
We compared three different graph neural network variants
to a baseline fully-connected network. We found that, in
all cases, graph neural networks consistently and significantly
outperformed a baseline neural network that excluded adjacency
information. We identified the best performing model and
explored its performance with respect to various metrics
like segmentation accuracy, prediction reliability, and areal
homogeneity in two independent datasets.
Predictions generated for both the HCP and MSC datasets
were highly reproducible. However, we found that nearly twice
as much resting-state data was required in MSC subjects to
achieve the same reproducibility estimates as in the HCP
data. Predictions generated on the HCP dataset were more
reproducible than the ground truth maps themselves (Glasser
et al., 2016), while predictions in the MSC data were roughly as
reproducible as the ground-truth parcellations. This may in part
be due to the way we trained our models. Models were trained
on repeated samples of BOLD images, such that for a given
training subject, models were shown four BOLD datasets. This
likely enabled the models to better learn the mapping between
a given subject’s unique BOLD signature and its cortical map.
Another possible explanation is that the ground truth maps
were generated using a linear perceptron model, which does not
take into account any spatial relationships between data points,
while graph neural networks do take this spatial structure into
account. It is likely the case that the perceptron model could not
adapt to utilize spatial dependencies in the BOLD signal in local
neighborhoods and thereby failed to fully learn unique
subject-specific connectivity fingerprints, and consequently learned more
variable parcellations.
The optimal model predicted parcellations that were as
homogeneous as the ground truth maps when considering
multidimensional connectivity features and univariate scalar
features. Though the models considered in this analysis
are capable of learning parcels that capture inter-areal
variation of functional brain connectivity and other cortical
features, it is worth noting that homogeneity as a measure
of parcellation quality is an imperfect metric and should be
used judiciously. For example, the primary sensory areas
can be further divided into five somatotopic subregions
corresponding to the upper and lower limbs, trunk, ocular,
and face areas (Glasser et al., 2013). These subdivisions
correspond well with task-based fMRI activity and gradients in
myelin content, indicating that the parcels learned by GNNs
in our analysis still incorporate significant variability due
to the aggregation of signals from different somatosensory
areas. While learning homogeneous regions is important
in order to effectively capture spatial biological variation,

maximizing homogeneity was not the training criterion for this analysis.

As noted in Section 3, the MSC study applied different preprocessing steps than the HCP. Specifically, the MSC did not perform FIX-ICA to remove noise components from the BOLD images and utilized the FreeSurfer spherical surface registration to bring surfaces into spatial correspondence with one another instead of the multi-modal surface matching algorithm (Robinson et al., 2014, 2018). Given that the MSC dataset did not have “ground truth” labels against which we could compare predictions made on the MSC data, we compared predictions against the HCP-MMP atlas (Glasser et al., 2016). As expected, predictions generated on the HCP dataset more closely resembled the HCP-MMP atlas than predictions made from the MSC dataset (the HCP-MMP atlas was derived as a group-average of individual ground truth parcellations). Nevertheless, we found that correspondence of MSC predictions with the atlas followed similar trends with respect to testing image duration. We believe some discrepancy in results between the HCP and MSC datasets can be attributed to the differences in dataset-specific preprocessing choices noted above, although the relationship between methodological choices and parcellation outcome requires future analyses. Performance differences across the two datasets are also possibly a result of the models learning characteristics inherent to the training (HCP) dataset, and thereby performing better on held-out subjects from that same dataset.

Our optimal model was the 6-layer graph attention network, trained and tested on resting-state network components computed using a 50-dimensional ICA. This model performed as well with the spatial prior as it did without. However, models trained on regionalized connectivity features benefited from including the spatial prior. We believe it would be prudent for future studies to include a spatial prior of some form in their classification frameworks. Interestingly, predictions on HCP test subjects resembled the HCP-MMP atlas more closely than they resembled their ground truth counterparts, which might in part be driven by the specific form of the prior. We made the assumption that cortical map topology is relatively conserved across individuals. This assumption may be too conservative and may reduce model sensitivity to atypical cortical connectivity patterns. Nevertheless, there is evidence our GNN models learn subject-specific topologies of cortical areas, rather than simply learning where a cortical parcel usually is. Importantly, we found that the optimal GAT model could identify three unique topologies for area 55b (typical, shifted, and split) and that the predictions generated by our model replicated, with high fidelity, the same spatial organization patterns as identified in Glasser et al. (2016). This indicates that the model is capable of learning unique connectivity fingerprints of each cortical area on a subject-by-subject basis, rather than simply learning the group average fingerprint. As such, we do not believe that including the spatial prior in its current form inhibits the ability of the graph neural network models used in this analysis to identify atypical cortical topologies.

We compared three different graph neural networks: graph convolution networks, standard attention networks, and jumping-knowledge networks. We hypothesized that JKGAT networks would significantly outperform GAT networks due to the increased flexibility to learn optimized node-specific network depths. In their original formulation of the jumping-knowledge network architecture, Xu et al. (2018) found that including the jumping-knowledge mechanism improved model performance relative to the GAT in almost all of their comparisons. However, we found this not to be the case. This may be a consequence of the increased number of estimated parameters in the JKGAT networks, relative to the GAT: the jumping-knowledge aggregation layer learns the parameters for the aggregation function cells in addition to the attention head and projection matrix weights learned in the GAT networks. The lower classification accuracy at test time is possibly the result of model over-fitting, necessitating a larger training dataset. It is possible that the jumping-knowledge mechanism is generally more useful in the case where graph topologies vary considerably across a network, as opposed to more regular graphs such as cortical surface data.

As expected, network performance was dependent on both the size and duration of the training set, and duration of the testing data. Classification accuracy increased when models were trained on larger datasets consisting of shorter-duration images. Conversely, accuracy increased when models were deployed on longer-duration test data. It is important to note that we examined performance of our models on images of long scanning durations by concatenating multiple sessions together (30/60-min in the HCP, and 60/150/300-min in the MSC). It is unrealistic to expect study participants to be able to lie in an MRI scanner for single sessions of these lengths. However, it is useful to examine how model performance is impacted by tunable parameters like scan duration in order to best guide image acquisition in future studies. We found that utilizing repeated scans on individual subjects as independent training examples, rather than concatenating repeated scans together into single datasets, significantly improved our classification frameworks. This likely speaks to the ability of neural network models to generalize better to noise in the datasets. Training models on multiple samples of shorter-duration images more accurately captures the individual variability in the resting-state signal than fewer longer-duration images, thereby allowing the networks to more accurately learn a mapping between functional connectivity and cortical areal assignments.

Our methodology could be improved in a variety of ways. We chose not to perform intensive hyperparameter optimization, and instead focused our efforts on overall performance of various network architectures as a function of network parameters and data parameters, and the applicability of trained models to new datasets. However, in the case where a classification model is meant to be distributed to the research community for open-source use, it would be prudent to perform a more extensive search over the best possible parameter choices.

The utility of functional connectivity has been shown in a variety of studies for delineating cortices (Blumensath et al., 2013; Arslan et al., 2015; Baldassano et al., 2015; Gordon et al., 2016). However, in recent years, using diffusion tractography for learning whole-brain cortical maps has been underutilized, relative to functional connectivity (Gorbach et al., 2011; Parisot et al., 2015; Bajada et al., 2017). Given cortical maps defined
independently by tractography and functional connectivity, it is difficult to “match” cortical areas across maps to compare biological properties, so heuristics are often applied. Few studies have simultaneously combined functional connectivity and tractography to better inform the prediction of cortical maps. Recent work has extended the idea of variational auto-encoders to the case of multi-modal data by training coupled auto-encoders to jointly learn embeddings of multiple data types. In Gala et al. (2021), the authors apply this approach to jointly learn embeddings defined by transcriptomics and electrophysiology that allow them to identify cell clusters with both similar transcriptomic and electrophysiology properties. Future work could apply similar ideas to aggregate functional and diffusion-based connectivity signals.

The majority of recent studies have approached the cortical mapping problem from the perspective of generating new parcellations from underlying neurobiological data using unsupervised clustering or spatial gradient methods. These approaches attempt to delineate areal boundaries by grouping cortical voxels together on the basis of similarity between their features. Spatial gradient-based methods explicitly define areal boundaries, while clustering methods define these boundaries implicitly. However, both approaches are distinct from methods that utilize pre-existing or pre-computed parcellations as templates for mapping new data. In the current analysis, we were concerned with the latter problem.

Clustering and spatial gradient approaches are often interested in relating newly-generated cortical maps to underlying in vitro measures, such as transcriptomics or cytoarchitectural results. Clearly, it is impossible to acquire this data in human subjects simultaneously with in vivo data. Various projects have attempted to build cytoarchitectural datasets from post-mortem subjects to use as a basis of comparisons for maps generated in vivo (Amunts et al., 2020). While some cortical areas have been recapitulated using both in vitro and in vivo features, this is not a general rule across the cortex. As such, cross-modal verification is often difficult, and leaves room for methods and datasets that can improve upon the validation of cortical mapping studies.

One limitation of our analysis concerns the use of different versions of the multi-modal surface matching for cortical surface alignment for the S500 HCP data release (Glasser et al., 2013; Robinson et al., 2014), the S1200 release (Robinson et al., 2018), and for the subject-level HCP-MMP parcellations (Glasser et al., 2016), which used a different regularization term. These differences between the three registration methods result in a slight spatial misalignment between the training labels and the cortical features. While the S500 data release utilized MSMSulc, a spherical surface registration driven by cortical folding patterns, the S1200 release utilized MSMAll, and incorporated functional connectivity into the spatial resampling step. Glasser et al. (2016) used a prototypical version of MSMAll in addition to MSMSulc, and thereby incorporated additional features derived from resting-state networks to drive the surface matching process. Importantly, this discrepancy between the training labels and training features is not a flaw in our methodology itself, and correcting for this difference in the registration approach would only improve the results of our analysis. As we showed in Figure 10, incorporating MSMAll-processed data from the S1200 dataset, instead of MSMSulc-processed data from the S500 dataset, improved model classification accuracy by nearly 5%. We hypothesize that this improvement would only increase if we had access to the data processed with the prototypical version of MSMAll. Based on the comparisons of subject-level predictions with the subject-level ground truth MMP maps, our models performed well in spite of these registration discrepancies. Our results lend evidence to the robustness of graph neural networks for learning cortical maps from functional connectivity.

Finally, participants in both the HCP and MSC studies were healthy young adults, and the datasets had been extensively quality controlled. Little to no work has been done on extending connectivity-based classifiers to atypical populations, such as individuals with neurodegeneration. It is unknown how a model trained on connectivity properties from healthy individuals would perform in populations where connectivity is known to degrade. While our model (and that developed by Glasser et al., 2016) predicts maps based on healthy individuals, it is possible that some studies would need to train population-specific models.

DATA AVAILABILITY STATEMENT

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author/s.

AUTHOR CONTRIBUTIONS

KE conceptualized this study, developed the code, performed the analyses, and wrote the bulk of the document. TG provided comments and neuroscientific insight into the analysis, and contributed to the editing and organizational structure of the manuscript. DH provided extensive neuroscientific and technical guidance for this work, and contributed to the editing and organizational structure of the manuscript. All authors contributed to the article and approved the submitted version.

FUNDING

This project was supported by grant NSF BCS 1734430, titled Collaborative Research: Relationship of Cortical Field Anatomy to Network Vulnerability and Behavior (TG, PI).

ACKNOWLEDGMENTS

We thank Matthew F. Glasser for making subject-level Human Connectome Project multi-modal parcellations available for this analysis, and for his helpful and extensive comments on a draft version of the manuscript.

SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnins.2021.797500/full#supplementary-material
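
The caution raised in the Discussion about homogeneity as a quality metric can be illustrated with a small sketch. It assumes that homogeneity is measured as the mean pairwise Pearson correlation between the feature vectors of vertices assigned to the same parcel. That definition, the function names `pearson` and `parcel_homogeneity`, and the toy data are illustrative assumptions, not necessarily the metric or code used in this analysis.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def parcel_homogeneity(features, labels):
    """Mean pairwise correlation of vertex features within each parcel.

    Assumed definition, for illustration only. Parcels with fewer than
    two vertices are skipped because no vertex pair exists.
    """
    scores = {}
    for parcel in sorted(set(labels)):
        members = [f for f, lab in zip(features, labels) if lab == parcel]
        pairs = list(combinations(members, 2))
        if pairs:
            scores[parcel] = sum(pearson(x, y) for x, y in pairs) / len(pairs)
    return scores


# Hypothetical toy data: four vertices with three features each.
toy_features = [[1, 2, 3], [2, 4, 6], [3, 2, 1], [1, 2, 4]]
toy_labels = [1, 1, 2, 2]
print(parcel_homogeneity(toy_features, toy_labels))
```

In this toy case parcel 1 groups two vertices with proportional profiles and scores high, while parcel 2 merges two vertices with opposing profiles and scores low. The second case mirrors the point made about somatotopic subregions: a parcel that is anatomically valid can still aggregate dissimilar signals and therefore appear inhomogeneous.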

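
The reproducibility and atlas-correspondence comparisons discussed above both reduce to scoring the agreement between two label maps defined over the same cortical vertices. The sketch below assumes the Dice overlap, averaged over parcels, as the agreement measure. The choice of measure, the function names `dice` and `mean_dice`, and the toy label vectors are assumptions made for illustration; this section does not name the measure used.

```python
def dice(labels_a, labels_b, parcel):
    """Dice overlap of one parcel between two label maps on the same vertices."""
    in_a = {i for i, lab in enumerate(labels_a) if lab == parcel}
    in_b = {i for i, lab in enumerate(labels_b) if lab == parcel}
    if not in_a and not in_b:
        raise ValueError("parcel is absent from both maps")
    return 2.0 * len(in_a & in_b) / (len(in_a) + len(in_b))


def mean_dice(labels_a, labels_b):
    """Average Dice over every parcel that appears in either map."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label maps must cover the same vertices")
    parcels = sorted(set(labels_a) | set(labels_b))
    return sum(dice(labels_a, labels_b, p) for p in parcels) / len(parcels)


# Hypothetical predictions for the same 8 vertices from two repeated scans.
scan_1 = [1, 1, 1, 2, 2, 3, 3, 3]
scan_2 = [1, 1, 2, 2, 2, 3, 3, 1]
print(mean_dice(scan_1, scan_2))
```

Under this assumption, the same function could score a prediction against a repeated prediction (reproducibility), against a subject-level ground truth map, or against a group atlas, which is the set of comparisons described in the Discussion.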
REFERENCES

Amunts, K., Mohlberg, H., Bludau, S., and Zilles, K. (2020). Julich-Brain: a 3D probabilistic atlas of the human brain’s cytoarchitecture. Science 369, 988–992. doi: 10.1126/science.abb4588
Arslan, S., Parisot, S., and Rueckert, D. (2015). Joint spectral decomposition for the parcellation of the human cerebral cortex using resting-state fMRI. Inf. Process. Med. Imaging 24, 85–97. doi: 10.1007/978-3-319-19992-4_7
Asman, A. J., and Landman, B. A. (2012). Non-local statistical label fusion for multi-atlas segmentation. Med. Image Anal. 17, 194–208. doi: 10.1016/j.media.2012.10.002
Asman, A. J., and Landman, B. A. (2014). Hierarchical performance estimation in the statistical label fusion framework. Med. Image Anal. 18, 1070–1081. doi: 10.1016/j.media.2014.06.005
Bajada, C. J., Jackson, R. L., Haroon, H. A., Azadbakht, H., Parker, G. J., Lambon Ralph, M. A., et al. (2017). A graded tractographic parcellation of the temporal lobe. Neuroimage 155, 503–512. doi: 10.1016/j.neuroimage.2017.04.016
Baldassano, C., Beck, D. M., and Fei-Fei, L. (2015). Parcellating connectivity in spatial maps. PeerJ 3:e784. doi: 10.7717/peerj.784
Beckmann, C. F., DeLuca, M., Devlin, J. T., and Smith, S. M. (2005). Investigations into resting-state connectivity using independent component analysis. Philos. Trans. R. Soc. Lond. B Biol. Sci. 360, 1001–1013. doi: 10.1098/rstb.2005.1634
Blumensath, T., Jbabdi, S., Glasser, M. F., Van Essen, D. C., Ugurbil, K., Behrens, T. E., et al. (2013). Spatially constrained hierarchical parcellation of the brain with resting-state fMRI. Neuroimage 76, 313–324. doi: 10.1016/j.neuroimage.2013.03.024
Bullmore, E., and Sporns, O. (2012). The economy of brain network organization. Nat. Rev. Neurosci. 13, 336–349. doi: 10.1038/nrn3214
Coalson, T. S., Van Essen, D. C., and Glasser, M. F. (2018). The impact of traditional neuroimaging methods on the spatial localization of cortical areas. Proc. Natl. Acad. Sci. U.S.A. 115, E6356–E6365. doi: 10.1073/pnas.1801582115
Cucurull, G., Wagstyl, K., Casanova, A., Velickovic, P., Jakobsen, E., Drozdzal, M., et al. (2018). “Convolutional neural networks for mesh-based parcellation of the cerebral cortex,” in Med. Imaging with Deep Learn. Available online at: https://openreview.net/pdf?id=rkKvBAiiz.
Defferrard, M., Bresson, X., and Vandergheynst, P. (2016). “Convolutional neural networks on graphs with fast localized spectral filtering,” in NIPS’16: Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona: ACM, 187–98.
Desikan, R. S., Segonne, F., Fischl, B., Quinn, B. T., Dickerson, B. C., Blacker, D., et al. (2006). An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. Neuroimage 31, 968–980. doi: 10.1016/j.neuroimage.2006.01.021
Destrieux, C., Fischl, B., Dale, A., and Halgren, E. (2010). Automatic parcellation of human cortical gyri and sulci using standard anatomical nomenclature. Neuroimage 53, 1–15. doi: 10.1016/j.neuroimage.2010.06.010
Eschenburg, K., Haynor, D., and Grabowski, T. (2018). “Automated connectivity-based cortical mapping using registration-constrained classification,” in Medical Imaging 2018: Biomedical Applications in Molecular, Structural, and Functional Imaging (Houston, TX). doi: 10.1117/12.2293968
Fischl, B., van der Kouwe, A., Destrieux, C., Halgren, E., Segonne, F., Salat, D. H., et al. (2004). Automatically parcellating the human cerebral cortex. Cereb. Cortex 14, 11–22. doi: 10.1093/cercor/bhg087
Gala, R., Budzillo, A., Baftizadeh, F., Miller, J., Gouwens, N., Arkhipov, A., et al. (2021). Consistent cross-modal identification of cortical neurons with coupled autoencoders. Nat. Comput. Sci. 1, 120–127. doi: 10.1038/s43588-021-00030-1
Glasser, M. F., Coalson, T. S., Robinson, E. C., Hacker, C. D., Harwell, J., Yacoub, E., et al. (2016). A multi-modal parcellation of human cerebral cortex. Nature 536, 171–178. doi: 10.1038/nature18933
Glasser, M. F., Sotiropoulos, S. N., Wilson, J. A., Coalson, T. S., Fischl, B., Andersson, J., et al. (2013). The minimal preprocessing pipelines for the Human Connectome Project. Neuroimage 80, 105–124. doi: 10.1016/j.neuroimage.2013.04.127
Glasser, M. F., and van Essen, D. C. (2011). Mapping human cortical areas in vivo based on myelin content as revealed by T1- and T2-weighted MRI. J. Neurosci. 31, 11597–11616. doi: 10.1523/JNEUROSCI.2180-11.2011
Gopinath, K., Desrosiers, C., and Lombaert, H. (2019). Graph convolutions on spectral embeddings for cortical surface parcellation. Med. Image Anal. 54, 297–305. doi: 10.1016/j.media.2019.03.012
Gorbach, N. S., Schutte, C., Melzer, C., Goldau, M., Sujazow, O., Jitsev, J., et al. (2011). Hierarchical information-based clustering for connectivity-based cortex parcellation. Front. Neuroinform. 5:18. doi: 10.3389/fninf.2011.00018
Gordon, E. M., Laumann, T. O., Adeyemo, B., Huckins, J. F., Kelley, W. M., and Petersen, S. E. (2016). Generation and evaluation of a cortical area parcellation from resting-state correlations. Cereb. Cortex 26, 288–303. doi: 10.1093/cercor/bhu239
Gordon, E. M., Laumann, T. O., Gilmore, A. W., Newbold, D. J., Greene, D. J., Berg, J. J., et al. (2017). Precision functional mapping of individual human brains. Neuron 95, 791.e7–807.e7. doi: 10.1016/j.neuron.2017.07.011
Hacker, C. D., Laumann, T. O., Szrama, N. P., Baldassarre, A., Snyder, A. Z., Leuthardt, E. C., et al. (2013). Resting state network estimation in individual subjects. Neuroimage 15, 616–633. doi: 10.1016/j.neuroimage.2013.05.108
Hagler, D. J., Hatton, S., Cornejo, M. D., Makowski, C., Fair, D. A., Dick, A. S., et al. (2019). Image processing and analysis methods for the Adolescent Brain Cognitive Development Study. Neuroimage 202:116091. doi: 10.1016/j.neuroimage.2019.116091
Hamilton, W. L., Ying, R., and Leskovec, J. (2017). “Representation learning on graphs: methods and applications,” in IEEE Data Engineering Bulletin (California, CF: IEEE) arXiv:1709.05584.
Hochreiter, S., and Schmidhuber, J. (1997). Long short-term memory. Neural Comput. 9, 1735–1780. doi: 10.1162/neco.1997.9.8.1735
Kipf, T. N., and Welling, M. (2016a). Semi-Supervised Classification With Graph Convolutional Networks. Technical report, University of Amsterdam.
Kipf, T. N., and Welling, M. (2016b). Variational Graph Auto-Encoders. Technical report, University of Amsterdam.
Liu, M., Kitsch, A., Miller, S., Chau, V., Poskitt, K., Rousseau, F., et al. (2016). Patch-based augmentation of expectation-maximization for brain MRI tissue segmentation at arbitrary age after premature birth. Neuroimage 127, 387–408. doi: 10.1016/j.neuroimage.2015.12.009
Parisot, S., Arslan, S., Passerat-Palmbach, J., Wells, W. M. III, and Rueckert, D. (2015). Tractography-driven groupwise multi-scale parcellation of the cortex. Inf. Process. Med. Imaging 24, 600–612. doi: 10.1007/978-3-319-19992-4_47
Petersen, R. C., Aisen, P. S., Beckett, L. A., Donohue, M. C., Gamst, A. C., Harvey, D. J., et al. (2010). Alzheimer’s Disease Neuroimaging Initiative (ADNI) clinical characterization. Neurology 74, 201–209. doi: 10.1212/WNL.0b013e3181cb3e25
Robinson, E. C., Garcia, K., Glasser, M. F., Chen, Z., Coalson, T. S., Makropoulos, A., et al. (2018). Multimodal surface matching with higher-order smoothness constraints. Neuroimage 167, 453–465. doi: 10.1016/j.neuroimage.2017.10.037
Robinson, E. C., Jbabdi, S., Glasser, M. F., Andersson, J., Burgess, G. C., Harms, M. P., et al. (2014). MSM: a new flexible framework for multimodal surface matching. Neuroimage 100, 414–426. doi: 10.1016/j.neuroimage.2014.05.069
Salimi-Khorshidi, G., Douaud, G., Beckmann, C. F., Glasser, M. F., Griffanti, L., and Smith, S. M. (2014). Automatic denoising of functional MRI data: combining independent component analysis and hierarchical fusion of classifiers. Neuroimage 90, 449–468. doi: 10.1016/j.neuroimage.2013.11.046
Schaefer, A., Kong, R., Gordon, E. M., Laumann, T. O., Zuo, X.-N., Holmes, A. J., et al. (2018). Local-global parcellation of the human cerebral cortex from intrinsic functional connectivity MRI. Cereb. Cortex 28, 3095–3114. doi: 10.1093/cercor/bhx179
Smith, S. M., Hyvärinen, A., Varoquaux, G., Miller, K. L., and Beckmann, C. F. (2014). Group-PCA for very large fMRI datasets. Neuroimage 101:738. doi: 10.1016/j.neuroimage.2014.07.051
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., et al. (2017). Attention Is All You Need. Technical report, Google.
Velickovic, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., and Bengio, Y. (2018). “Graph attention networks,” in International Conference on Learning Representations (Vancouver, BC: ICLR) arXiv:1710.10903.
Wagstyl, K., Larocque, S., Cucurull, G., Lepage, C., Cohen, J. P., Bludau, S., et al. (2020). BigBrain 3D atlas of cortical layers: cortical and laminar thickness gradients diverge in sensory and motor cortices. PLoS Biol. 18:e3000678. doi: 10.1371/journal.pbio.3000678
Wang, G., Ying, R., Huang, J., and Leskovec, J. (2019). Improving Graph Attention Networks with Large Margin-based Constraints. Technical report, Stanford University, Mountain View.
Wang, M., Zheng, D., Ye, Z., Gan, Q., Li, M., Song, X., et al. (2020). Deep Graph Library: A Graph-Centric, Highly-Performant Package for Graph Neural Networks. Technical report, New York University.
Wig, G. S., Laumann, T. O., and Petersen, S. E. (2014). An approach for parcellating human cortical areas using resting-state correlations. Neuroimage 93(Pt 2), 276–291. doi: 10.1016/j.neuroimage.2013.07.035
Xu, K., Li, C., Tian, Y., Sonobe, T., Kawarabayashi, K.-I., and Jegelka, S. (2018). Representation Learning on Graphs with Jumping Knowledge Networks. Technical report, MIT.
Yeo, B. T. T., Krienen, F. M., Sepulcre, J., Sabuncu, M. R., Lashkari, D., Hollinshead, M., et al. (2011). The organization of the human cerebral cortex estimated by intrinsic functional connectivity. J. Neurophysiol. 106, 1125–1165. doi: 10.1152/jn.00338.2011
Zeng, H., Zhou, H., Srivastava, A., Kannan, R., and Prasanna, V. (2020). GraphSAINT: Graph Sampling Based Inductive Learning Method. Technical report, University of Southern California.
Zhao, Q., Adeli, E., Honnorat, N., Leng, T., and Pohl, K. M. (2019). Variational autoencoder for regression: application to brain aging analysis. Technical report, Stanford. doi: 10.1007/978-3-030-32245-8_91

Conflict of Interest: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Copyright © 2021 Eschenburg, Grabowski and Haynor. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Frontiers in Neuroscience | www.frontiersin.org 1 December 2021 | Volume 15 | Article 797500
