Supplementary Material
1 Systematic Search Methodology
We identified relevant articles by querying PubMed, Web of Science, Google Scholar and arXiv using specific search terms, which are listed in Fig. LABEL:search_terms. All database fields were queried, except for Google Scholar, where full texts were searched instead. For PubMed and Google Scholar, only articles published from 2015 onwards were included; for Web of Science and arXiv, all years were included because these databases returned few articles.
Next, we screened articles by title and abstract, accepting or rejecting each one against a set of inclusion and exclusion criteria. The screening criteria are listed in Table 1. Only the first 500 results from Google Scholar were screened because later results were largely irrelevant.
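The database-specific limits described above (a 2015 cutoff for PubMed and Google Scholar, no year limit for Web of Science and arXiv, and only the top 500 Google Scholar results) can be expressed as a small filter. This is a minimal sketch under stated assumptions: the `SearchHit` record and `eligible_for_screening` function are hypothetical names, and the year limit is applied here after retrieval, whereas in the review it was applied when querying.

```python
from dataclasses import dataclass

# Hypothetical record for one search hit; field names are illustrative only.
@dataclass
class SearchHit:
    database: str  # "PubMed", "Web of Science", "Google Scholar" or "arXiv"
    year: int      # publication year
    rank: int      # 1-based position in the database's result list

# Earliest publication year kept per database (None means all years).
MIN_YEAR = {"PubMed": 2015, "Google Scholar": 2015,
            "Web of Science": None, "arXiv": None}

# Only the first 500 Google Scholar results were screened.
MAX_RANK = {"Google Scholar": 500}

def eligible_for_screening(hit: SearchHit) -> bool:
    """Return True if a hit passes its database's year and rank limits."""
    min_year = MIN_YEAR[hit.database]
    if min_year is not None and hit.year < min_year:
        return False
    max_rank = MAX_RANK.get(hit.database)
    if max_rank is not None and hit.rank > max_rank:
        return False
    return True

hits = [
    SearchHit("PubMed", 2014, 3),
    SearchHit("arXiv", 2012, 10),
    SearchHit("Google Scholar", 2019, 501),
    SearchHit("Web of Science", 2010, 1),
]
print([eligible_for_screening(h) for h in hits])  # [False, True, False, True]
```

Keeping the limits in lookup tables rather than in branching code makes the per-database rules easy to audit against the methodology text.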
Table 1: Inclusion/exclusion criteria for article screening |
---|
Include…both in-vivo and ex-vivo imaging. |
Exclude…non-human subjects. |
Include…the following imaging modalities: structural and functional MRI, CT, PET and DWI/tractography. |
Exclude…EEG and MEG data. |
Include…both peer-reviewed and non-peer-reviewed articles. |
Exclude…non-English-language articles. |
Exclude…PhD and Masters theses. |
Exclude…reviews, surveys, opinion articles and books. Articles must implement at least one interpretable deep learning method. |
Exclude…interpretable methods applied to machine learning models other than neural networks, for example, decision trees, random forests, SVMs and Gaussian processes. |
Exclude…for quality control, for example, articles whose methods were claimed to be interpretable but were not. |
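The criteria in Table 1 can be read as an ordered set of rejection rules. The sketch below encodes them over a hypothetical `Article` record whose fields stand in for what a reviewer would note while reading a title and abstract; the field names, the `screen` function and the exact rule order are assumptions for illustration, not the review's actual screening tool. It also assumes that a study is eligible if it uses at least one listed modality, so EEG- or MEG-only studies are rejected.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Article:
    """Hypothetical attributes noted while reading a title and abstract."""
    human_subjects: bool
    modalities: set[str]
    language: str
    article_type: str                 # e.g. "research", "review", "thesis"
    implements_interpretable_dl: bool
    model_family: str                 # e.g. "neural network", "random forest"
    passes_quality_control: bool

ALLOWED_MODALITIES = {"structural MRI", "functional MRI", "CT", "PET", "DWI/tractography"}
# "thesis" covers both PhD and Masters theses in this sketch.
EXCLUDED_TYPES = {"review", "survey", "opinion", "book", "thesis"}

def screen(a: Article) -> tuple[bool, str]:
    """Apply the Table 1 criteria in order; return (accepted, reason)."""
    # In-vivo vs ex-vivo imaging and peer-review status are not checked,
    # because both options of each are included.
    if not a.human_subjects:
        return False, "non-human subjects"
    if not a.modalities & ALLOWED_MODALITIES:
        return False, "no eligible imaging modality"
    if a.language != "English":
        return False, "non-English"
    if a.article_type in EXCLUDED_TYPES:
        return False, f"excluded article type: {a.article_type}"
    if not a.implements_interpretable_dl:
        return False, "no interpretable deep learning method implemented"
    if a.model_family != "neural network":
        return False, "interpretability applied to a non-neural-network model"
    if not a.passes_quality_control:
        return False, "failed quality control"
    return True, "accepted"

eeg_study = Article(True, {"EEG"}, "English", "research", True, "neural network", True)
print(screen(eeg_study))  # (False, 'no eligible imaging modality')
```

Returning a reason alongside the decision mirrors common systematic-review practice of recording why each article was excluded.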
After screening, we extracted data relevant to our review questions from all accepted articles into a table. We extracted 27 data points covering six topics: article, imaging, modelling, interpretability method, interpretability method evaluation and study limitations (see Table LABEL:tab:article_data_collection in the appendix).
The number of neuroimaging studies applying interpretable deep learning methods has approximately doubled annually (Fig. 1(a); note that the cutoff date of this review was partway through 2021). Most studies (76%) used existing public medical image datasets, the most popular being the Alzheimer’s Disease Neuroimaging Initiative (ADNI, 37% of studies), followed by the Human Connectome Project (HCP, 17% of studies) (Fig. 1(b)). The majority of studies (90%) used either structural or functional magnetic resonance imaging (MRI).
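A claim such as "approximately doubled annually" corresponds to exponential growth, $N(t) \approx N_0 \cdot 2^{\,t - t_0}$, so a straight-line fit of $\log_2 N$ against year should have a slope close to 1. The sketch below estimates the annual growth factor this way. The function name and the yearly counts in the example are hypothetical and are not the review's data. Because the review's final year was only partially covered, that year would need to be dropped or scaled to a full year before fitting; otherwise the growth rate is underestimated.

```python
import math

def annual_growth_factor(years: list[int], counts: list[int]) -> float:
    """Least-squares fit of log2(count) = a + b * year; return 2 ** b.

    A result near 2.0 means the count roughly doubles each year.
    All counts must be positive because their logarithms are taken.
    """
    n = len(years)
    logs = [math.log2(c) for c in counts]
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    sxx = sum((x - mean_x) ** 2 for x in years)
    slope = sxy / sxx  # estimated doublings per year
    return 2 ** slope

# Hypothetical counts for complete years only (illustrative, not review data).
print(round(annual_growth_factor([2017, 2018, 2019, 2020], [3, 6, 12, 24]), 2))  # 2.0
```

A log-linear fit uses every year rather than only the first and last, so a single unusual year has less influence on the estimate.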