Adolescent Fluid Intelligence Prediction from Regional Brain Volumes and Cortical Curvatures Using BlockPC-XGBoost

Li, Tengfei; Wang, Xifeng; Luo, Tianyou; Yang, Yue; Zhao, Bingxin; Yang, Liuqing; Zhu, Ziliang; Zhu, Hongtu

doi:10.1007/978-3-030-31901-4_20

Tengfei Li ORCID: orcid.org/0000-0001-6142-3865¹²,
Xifeng Wang¹³,
Tianyou Luo¹³,
Yue Yang¹³,
Bingxin Zhao¹³,
Liuqing Yang¹⁴,
Ziliang Zhu¹³ &
Hongtu Zhu¹³

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11791)

Included in the following conference series:

Challenge in Adolescent Brain Cognitive Development Neurocognitive Prediction

Abstract

Using the ABCD dataset, we find that, besides the gray matter volume of cortical regions, other measures such as mean cortical curvature, white matter volume and subcortical volume provide additional power for predicting the pre-residualized fluid intelligence scores of adolescents. Compared with using mostly the gray matter volumes provided by the challenge organizer, the MSE and R-squared on the validation dataset improve from 70.65 and 0.0175 to 69.39 and 0.0350, respectively. Specifically, by employing a BlockPC-XGBoost framework, we identify the following predictors as contributing to the reduction of the MSE on the validation set: the gray matter volume of the right posterior cingulate gyrus and left caudate nucleus, the entorhinal white matter volume of the left hemisphere, the number of detected surface holes, the globus pallidus volume, and the mean curvatures of the precentral gyrus, postcentral gyrus and Banks of Superior Temporal Sulcus.

1 Introduction

Fluid intelligence is a crucial component of general human intelligence; it involves the capacity to think logically and solve problems in novel situations, independent of acquired knowledge [14]. It is widely accepted that fluid intelligence peaks in late adolescence and then declines [13]. Its quantification and accurate prediction are important for teenagers, because it foreshadows creative achievement, scholastic performance, employment prospects, socioeconomic status, etc., in later years [8]. Structural magnetic resonance (MR) imaging is one of the most powerful tools for predicting fluid intelligence. The ABCD challenge dataset provides structural MR images for a large number of adolescent participants, with the aim of precisely predicting the pre-residualized fluid intelligence scores (RFIS), which have been adjusted for data collection site, demographic variables and whole brain volume. It includes 3,739 training subjects, 415 validation subjects and 4,402 test subjects. For each subject, the cortical volumes of 122 regions of interest were extracted from T1 images by the challenge organizers according to the NCANDA pipeline [20].

Previous studies of Alzheimer's disease found that regional brain volumes are closely related to the cognitive status of individuals [2]. While most of the previous literature focuses on gray matter cortical metrics [3, 23], recent studies have revealed associations of white matter and subcortical volumes with cognitive functions [11, 25]. Cortical thickness has also been related to cognition [7, 15]. Based on these findings, we extracted white matter volumes, subcortical regional volumes, etc., by pre-processing the raw T1 MR images with the FreeSurfer software [21], obtaining richer explanatory features in different regions of interest (ROIs) to improve prediction performance. Our challenge results show that, besides the cortical volumes, structural metrics such as white matter volume, subcortical volume and mean curvature are also useful.

Information extraction and information manipulation are two important components of prediction. For information extraction, principal component analysis (PCA) is a standard technique in multivariate statistics that uses a set of linear approximations for dimension reduction. However, classical PCA has the side effect that the principal components mainly focus on subdomains with large variance. When small groups of highly correlated covariates exist, important features may be hidden [16]. The human brain encompasses complex network structures, and different brain regions can be highly correlated. Cluster analysis is a typical method for grouping a set of objects into subsets according to their “similarities”. Previous literature has shown that hierarchical clustering results can correspond to anatomical configurations of the brain [17]. By dividing similar covariates into groups and extracting principal components (PCs) within each cluster, we can preserve important features. For information manipulation, statistical modeling and machine learning (ML) methods such as linear and ridge regression [12], random forests [5], support vector machines [9], etc., have remained popular for decades. Among these, Extreme Gradient Boosting (XGBoost) [6] is used extensively by ML practitioners to create state-of-the-art data science solutions and has recently gained much attention as the choice of many winning teams in ML competitions [24].

We therefore propose to use hierarchical clustering with block PCA to extract important features, which are then fed into the XGBoost machine to predict fluid intelligence. Our results show that incorporating block PCA into the XGBoost framework leads to better prediction performance than using XGBoost with either the original covariates or conventionally calculated principal components.

2 Method

2.1 Dataset

The ABCD challenge uses the NIH Toolbox® [19], the Rey Auditory Verbal Learning Test (RAVLT) [10], the Little Man Task [1], etc., to quantify the fluid intelligence scores [22]. The whole dataset we obtained includes 4,459 males and 4,085 females aged from 8 to 12 years (107–133 months), plus 12 additional individuals with missing demographic information. The detailed distribution of age and gender for each of the training, validation and test sets can be found in Fig. 1. Raw T1-weighted MR images were acquired under multiple protocols with Siemens, Philips and GE scanners and were further processed according to [20]. The cortical ROI volumes were calculated for each subject. With this dataset, we describe our workflow in the following subsections and illustrate it in Fig. 4(a).

2.2 The Preprocessing

To predict the RFIS of subjects in the validation and test sets, we first pre-processed the raw T1 MR images of all participants in the whole dataset to extract their brain white matter and subcortical ROI volumes, mean curvatures, etc., using FreeSurfer’s standard recon-all pipeline (v.6.0.0) [21], which includes motion correction, intensity normalization, skull stripping, removal of non-brain tissue, brain mask generation, cortical reconstruction, white matter (WM) and subcortical segmentation, and cortical parcellation. The white matter volume, subcortical volume and mean curvature of each ROI were extracted for every individual as supplemental information and combined with the cortical volumes, age and gender from Subsect. 2.1 to make predictions. Pearson’s correlations between each structural metric and the RFIS, together with their p-values, were calculated from all subjects in the training dataset and are displayed in Fig. 2.

We found that age and gender were not significant (p-value > 0.80). There were 22 features with False Discovery Rate (FDR) adjusted p-values [4] smaller than 0.05, including: white matter volume of the pons and left entorhinal region; gray matter volume of the left and right parahippocampal gyrus; subcortical volume of the right and left globus pallidus and right ventral diencephalon; mean cortical curvature of the right and left precentral gyrus, right postcentral gyrus, right paracentral lobule, right and left superior parietal lobule, right Banks of Superior Temporal Sulcus, right superior temporal gyrus, right medial orbital gyrus and right inferior temporal gyrus; and the number of defect holes in the right and total surface. The top 3 features were the white matter volume of the pons, the gray matter of the right precentral gyrus and the white matter volume of the left entorhinal region, whose FDR-adjusted p-values were less than 0.01.
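
The screening described above can be sketched in a few lines. This is a minimal illustration on synthetic data, not the authors' code: the helper names `pearson_pvalues` and `benjamini_hochberg` and the simulated feature matrix are assumptions introduced here, and the adjustment is the standard Benjamini-Hochberg step-up procedure of [4].

```python
import numpy as np
from scipy import stats


def pearson_pvalues(X, y):
    """Pearson r and two-sided p-value of every column of X against y."""
    r = np.empty(X.shape[1])
    p = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        r[j], p[j] = stats.pearsonr(X[:, j], y)
    return r, p


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg (FDR) adjusted p-values."""
    pvals = np.asarray(pvals, dtype=float)
    m = pvals.size
    order = np.argsort(pvals)
    # p_(i) * m / i for the i-th smallest p-value.
    scaled = pvals[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted


# Synthetic stand-in for the training data (hypothetical, for illustration):
# 500 subjects, 40 features, of which only the first two carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 40))
y = 0.5 * X[:, 0] - 0.4 * X[:, 1] + rng.normal(size=500)

r, p = pearson_pvalues(X, y)
p_adj = benjamini_hochberg(p)
significant = np.flatnonzero(p_adj < 0.05)  # indices of features passing FDR 0.05
```

Adjusted p-values are never smaller than the raw ones, so a feature that passes the 0.05 threshold after adjustment would also have passed it before.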

2.3 Clustering and Block PCA

To efficiently extract information from the original datasets, we first performed hierarchical clustering using the Ward.D2 algorithm [18] on each of the four structural metrics separately: cortical volumes, WM volumes, subcortical volumes and mean curvatures. Examining the clustering results, we found that cutting the tree at 12 clusters led to the best performance. For each cluster we extracted the first 5 PCs. Hence up to \(12 \times 5 \times 4=240\) additional features could be extracted if 5 PCs existed for every cluster. In the real data, however, only 206 PCs were generated, because very small clusters have fewer than 5 principal components in total. These additional features were then combined with the original ROI measures from the previous steps to fit the XGBoost model for prediction. Figure 3 provides one example, the correlation heatmap of the clustering structure for cortical volumes, which indicates a latent correlation structure among the regions. The figure also shows that, for almost all clusters, the first 5 principal components explain more than 60% of the variance. After combining the 206 PCs with the original features generated in the previous steps, Pearson’s correlations with the RFIS and their p-values were calculated as in Subsect. 2.2. After the FDR correction, 30 features were found to be significantly correlated with the RFIS, with adjusted p-values less than 0.05.
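
To make the clustering and block PCA step concrete, the following is a minimal sketch under stated assumptions rather than the authors' implementation. It assumes that the features (columns) of one structural metric are standardized and clustered with SciPy's `ward` linkage on Euclidean distances, used here as a stand-in for the Ward.D2 algorithm; the function name `block_pca` and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def block_pca(X, n_clusters=12, n_pcs=5):
    """Cluster the columns of X, then keep up to n_pcs PC scores per cluster."""
    # Standardize each feature (assumes no constant columns).
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Each feature (column) is treated as one object to be clustered.
    tree = linkage(Z.T, method="ward")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    blocks = []
    for k in np.unique(labels):
        block = Z[:, labels == k]
        # PCA through the SVD of the centered block; the scores are U * S.
        U, S, _ = np.linalg.svd(block, full_matrices=False)
        keep = min(n_pcs, block.shape[1])  # small clusters yield fewer PCs
        blocks.append(U[:, :keep] * S[:keep])
    return np.hstack(blocks), labels


# Synthetic example: 30 features driven by 3 latent factors (10 features each).
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 3))
X = np.repeat(latent, 10, axis=1) + 0.5 * rng.normal(size=(200, 30))

scores, labels = block_pca(X, n_clusters=3, n_pcs=5)      # 3 clusters x 5 PCs
scores_small, _ = block_pca(X, n_clusters=12, n_pcs=5)    # many small clusters
```

The second call illustrates why fewer than \(12 \times 5\) components can come out: a cluster with fewer than 5 features cannot supply 5 PCs. In a real train/validation split, the cluster assignments and PC loadings would be estimated on the training set and then applied to the other sets, which this sketch omits for brevity.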

2.4 XGBoost Statistical Models

All features generated in the previous three steps on the training set were combined as explanatory variables to fit a prediction model. First, the features were ranked by the p-values of their Pearson’s correlations with the RFIS, computed on the training dataset only; the \(p_0\) features with the smallest p-values were retained and fed into the XGBoost machine. The “gblinear” booster was used with the default “reg:linear” objective function, and the initial prediction score (“base_score”) was set to zero; we set the learning rate to 0.05 and used 10-fold cross-validation on the training dataset, with the mean absolute error (MAE) as the evaluation metric, to select the number of boosting rounds. The \(L_2\) tuning parameter was fixed at the default value of 1. We trained XGBoost models with different tuning parameters and then made predictions on the validation set. The optimal \(p_0\) was selected to minimize the mean squared error (MSE) on the validation set.

We compared the BlockPC-XGBoost with the XGBoost without block PCs, and the BlockPC-XGBoost using all features with the BlockPC-XGBoost using the cortical volumes provided by the challenge organizer. The results for both training and validation are shown in the next section.

3 Results

The MSE and R-squared for the BlockPC-XGBoost using all features, the BlockPC-XGBoost using mostly the gray matter cortical volumes provided by the challenge organizer, and the XGBoost without block PCs, for both training and validation, are shown in Table 1. The comparisons indicate the advantage of the proposed BlockPC-XGBoost. Importance scores of the top features for the BlockPC-XGBoost are shown in Fig. 4(b). We found that the important predictors did not exactly match the significant features in Fig. 2. These important predictors were: the gray matter volume of the right posterior cingulate gyrus and left caudate nucleus, the entorhinal white matter volume of the left hemisphere, the number of detected surface holes, the globus pallidus volume, several regional lateral ventricle and cerebellum volumes, and the mean curvatures of the precentral gyrus, postcentral gyrus and Banks of Superior Temporal Sulcus. Of the top 50 features, 15 (30%) are PCs derived from the clustering step.
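
As a quick sanity check on how the two reported metrics relate, the sketch below assumes the usual definition \(R^2 = 1 - \mathrm{MSE}/\mathrm{MSE}_0\), where \(\mathrm{MSE}_0\) is the error of predicting the mean of the validation scores. That definition is an assumption, since the text does not state which one was used; the function names are hypothetical. Under it, both validation results quoted in the abstract imply nearly the same baseline error of about 71.9, so the two (MSE, R-squared) pairs are mutually consistent.

```python
import numpy as np


def mse_and_r2(y_true, y_pred):
    """MSE and R-squared relative to predicting the mean of y_true."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = np.mean((y_true - y_pred) ** 2)
    baseline = np.mean((y_true - y_true.mean()) ** 2)
    return mse, 1.0 - mse / baseline


def implied_baseline_mse(mse, r2):
    """Baseline MSE implied by a reported (MSE, R-squared) pair."""
    return mse / (1.0 - r2)


# Validation figures quoted in the abstract.
baseline_cortical = implied_baseline_mse(70.65, 0.0175)  # mostly cortical volumes
baseline_all = implied_baseline_mse(69.39, 0.0350)       # all features
```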

Table 1. Prediction results using 3 methods

4 Discussion

From the analysis above, we found several brain measures (white matter volume of the pons, gray matter of the right precentral gyrus, white matter volume of the left entorhinal region, etc.) that were significantly correlated with the RFIS. Based on the features given by the challenge organizer and the features we generated with the FreeSurfer software, we used a block PCA design to learn a representation of all these features, which shows a good learning ability for correlated features. We then used the XGBoost machine to train a prediction model on the learned features, obtaining an MSE of 69.39 on the validation set. In addition, we found several features that exhibit strong predictive power.

However, the proposed approach is based on the segmentation and parcellation of ROIs, and therefore relies on the precision of the image processing. Furthermore, the approach does not consider the spatial location of the ROIs. Compared with modern deep learning techniques, e.g., the U-Net or graphical model-based deep neural networks, it loses local information.

References

1. Acker, W., Acker, C.: Bexley Maudsley automated processing screening and Bexley Maudsley category sorting test manual. NFER-Nelson, Windsor, England (1982)
2. Adak, S., et al.: Predicting the rate of cognitive decline in aging and early Alzheimer disease. Neurology 63(1), 108–114 (2004)
3. Bajaj, S., et al.: The relationship between general intelligence and cortical structure in healthy individuals. Neuroscience 388, 36–44 (2018)
4. Benjamini, Y., Hochberg, Y.: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B 57, 289–300 (1995)
5. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
6. Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794 (2016)
7. Cheng, C., Cheng, S., Tam, C., Chan, W., Chu, W., Lam, L.: Relationship between cortical thickness and neuropsychological performance in normal older adults and those with mild cognitive impairment. Aging Dis. 9(6), 1020–1030 (2018)
8. Colom, R., Escorial, S., Shih, P., Privado, J.: Fluid intelligence, memory span, and temperament difficulties predict academic performance of young adolescents. Pers. Individ. Differ. 42(8), 1503–1514 (2007)
9. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
10. Daniel, M., Wahlstrom, D., Zhang, O.: Equivalence of Q-interactive™ and paper administrations of cognitive tasks: WISC-V. Q-Interactive Technical Report 8 (2014)
11. Davies, G., et al.: Study of 300,486 individuals identifies 148 independent genetic loci influencing general cognitive function. Nat. Commun. 9(1), 2098 (2018)
12. Hoerl, A.E., Kennard, R.W.: Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12(1), 55–67 (1970)
13. Horn, J.L.: Human Ability Systems. Academic Press, New York (1978)
14. Jaeggi, S.M., Buschkuehl, M., Jonides, J., Perrig, W.J.: Improving fluid intelligence with training on working memory. In: Proceedings of the National Academy of Sciences of the United States of America, vol. 105, pp. 6829–6833 (2008)
15. Jiang, L., et al.: Cortical thickness changes correlate with cognition changes after cognitive training: evidence from a Chinese community study. Front. Aging Neurosci. 8, 118–118 (2016)
16. Lin, Z., Zhu, H.: MFPCA: multiscale functional principal component analysis. In: The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI 2019) (2019)
17. Menon, D., Bullmore, E., Suckling, J., Pickard, J.D., Coleman, M.R., Salvador, R.: Neurophysiological architecture of functional magnetic resonance images of human brain. Cereb. Cortex 15(9), 1332–1342 (2005)
18. Murtagh, F., Legendre, P.: Ward’s hierarchical agglomerative clustering method: which algorithms implement Ward’s criterion? J. Classif. 31(3), 274–295 (2014)
19. NIH Toolbox. http://www.nihtoolbox.org
20. Pfefferbaum, A., et al.: Altered brain developmental trajectories in adolescents after initiating drinking. Am. J. Psychiatry 175(4), 370–380 (2017)
21. Reuter, M., Schmansky, N.J., Rosas, H.D., Fischl, B.: Within-subject template estimation for unbiased longitudinal image analysis. NeuroImage 61(4), 1402–1418 (2012)
22. Thompson, W., et al.: The structure of cognition in 9 and 10 year-old children and associations with problem behaviors: findings from the ABCD study’s baseline neurocognitive battery. Dev. Cogn. Neurosci. 36, 100606 (2018)
23. Walhovd, K., et al.: Cortical volume and speed-of-processing are complementary in prediction of performance intelligence. Neuropsychologia 43(5), 704–713 (2005)
24. XGBoost - ML winning solutions. https://github.com/dmlc/xgboost/tree/master/demo#machine-learning-challenge-winning-solutions
25. Zhao, B., et al.: GWAS of 19,629 individuals identifies novel genetic variants for regional brain volumes and refines their genetic co-architecture with cognitive and mental health traits. bioRxiv (2019)

Author information

Authors and Affiliations

Department of Radiology and the Biomedical Research Imaging Center, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Tengfei Li
Department of Biostatistics, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Xifeng Wang, Tianyou Luo, Yue Yang, Bingxin Zhao, Ziliang Zhu & Hongtu Zhu
Department of Statistics and Operations Research, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Liuqing Yang

Corresponding author

Correspondence to Hongtu Zhu.

Editor information

Editors and Affiliations

Stanford University, Stanford, CA, USA
Kilian M. Pohl
University of California, San Diego, La Jolla, CA, USA
Wesley K. Thompson
Stanford University, Stanford, CA, USA
Ehsan Adeli
Children’s National Health System, Washington, DC, USA
Marius George Linguraru

About this paper

Cite this paper

Li, T. et al. (2019). Adolescent Fluid Intelligence Prediction from Regional Brain Volumes and Cortical Curvatures Using BlockPC-XGBoost. In: Pohl, K., Thompson, W., Adeli, E., Linguraru, M. (eds) Adolescent Brain Cognitive Development Neurocognitive Prediction. ABCD-NP 2019. Lecture Notes in Computer Science, vol 11791. Springer, Cham. https://doi.org/10.1007/978-3-030-31901-4_20

DOI: https://doi.org/10.1007/978-3-030-31901-4_20
Published: 10 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-31900-7
Online ISBN: 978-3-030-31901-4
eBook Packages: Computer Science, Computer Science (R0)
