
Review
Focus on Neural Computation and Theory

Computational psychiatry as a bridge from neuroscience to clinical applications

Quentin J M Huys1,2,5, Tiago V Maia3,5 & Michael J Frank4

Translating advances in neuroscience into benefits for patients with mental illness presents enormous challenges because it
involves both the most complex organ, the brain, and its interaction with a similarly complex environment. Dealing with such
complexities demands powerful techniques. Computational psychiatry combines multiple levels and types of computation
with multiple types of data in an effort to improve understanding, prediction and treatment of mental illness. Computational
psychiatry, broadly defined, encompasses two complementary approaches: data driven and theory driven. Data-driven approaches
apply machine-learning methods to high-dimensional data to improve classification of disease, predict treatment outcomes
or improve treatment selection. These approaches are generally agnostic as to the underlying mechanisms. Theory-driven
approaches, in contrast, use models that instantiate prior knowledge of, or explicit hypotheses about, such mechanisms, possibly
at multiple levels of analysis and abstraction. We review recent advances in both approaches, with an emphasis on clinical
applications, and highlight the utility of combining them.

The translation of advances in neuroscience into concrete improvements for patients suffering from mental illness has been slow. Part of the problem is the complexity of disease classification and outcome measurement in psychiatry1. A broader reason, however, is the complexity of the problem: mental health depends not only on the function of the brain, the most complex of organs, but also on how that function relates to, influences, and is influenced by the individual's environmental and experiential challenges. Understanding mental health, and its disruption, therefore relies on linking multiple interacting levels, from molecules to cells, circuits, cognition, behavior, and the physical and social environment.

One of the difficulties is that the mapping between these levels is not one-to-one. The same biological disturbance can affect several seemingly unrelated psychological functions and, conversely, different biological dysfunctions can produce similar psychological and even neural-circuit disturbances2-4. Disturbances can arise independently in some levels without dysfunction in other levels. Low mood, for example, may affect social function independently of its particular biological cause. Mappings between health and biology also vary with external circumstances5. For example, neurobiologically determined emotion-regulation abilities may suffice for some environments, but produce mood disorders in others. The current age of big data, with the ability to acquire and manipulate extremely high-dimensional, multimodal data sets, including clinical, genetic, epigenetic, cognitive, neuroimaging and other data types6,7, holds great promise to uncover these complex relations, but poses formidable data-analytic challenges. Here we argue that these theoretical and data-analytic challenges are insurmountable without powerful computational tools and the conceptual frameworks they provide. Computational psychiatry, conceived broadly, is therefore critical to the future of psychiatry and will likely have a central role in the rational development of treatments, nosologies and preventive strategies.

Computational psychiatry encompasses two approaches8: data-driven, theoretically agnostic data-analysis methods from machine learning (ML) broadly construed (including, but extending, standard statistical methods), and theory-driven models that mathematically specify mechanistically interpretable relations between variables (often including both observable variables and postulated, theoretically meaningful hidden variables). We review advances in both approaches, with an emphasis on clinical applications, and discuss how they can be combined. Further aspects of computational psychiatry have been reviewed in other general9-12 and more specific reviews8,13-16.

The blessing and curse of dimensionality
Very few individual signs, let alone symptoms, are sufficiently specific to identify underlying diseases. Depressed mood, for example, is insufficient for the diagnosis of major depressive disorder. The intuition behind classification schemes such as DSM17 and ICD18 is that the presence of additional features, such as anhedonia, fatigue, overeating and suicidal thoughts, increases specificity by identifying a group of people with a relatively worse outcome that requires intervention, and thereby sanctions the labels "disorder" or "disease". This description proceeds in the absence of any understanding of the underlying biological (or environmental) pathology, and without any guarantee about an identifiable relationship between symptom clusters and biology. The hope is that biomarkers might provide additional information and either augment (stratify1) or even (partially) replace19 symptoms.

Improving classification through the addition of features is in fact an important concept in ML, with blessings and curses.

1Translational Neuromodeling Unit, Institute for Biomedical Engineering, University of Zürich and Swiss Federal Institute of Technology (ETH) Zürich, Zürich, Switzerland. 2Centre for Addictive Disorders, Department of Psychiatry, Psychotherapy and Psychosomatics, Hospital of Psychiatry, University of Zürich, Zürich, Switzerland. 3School of Medicine and Institute for Molecular Medicine, University of Lisbon, Lisbon, Portugal. 4Computation in Brain and Mind, Brown Institute for Brain Science, Psychiatry and Human Behavior, Brown University, Providence, USA. 5These authors contributed equally to this work. Correspondence should be addressed to Q.J.M.H. (qhuys@cantab.net).

Received 24 September 2015; accepted 4 January 2016; published online 23 February 2016; doi:10.1038/nn.4238


Figure 1 The blessing and curse of dimensionality. In rich data sets in psychiatry, the number of measured variables d per subject can substantially exceed the number of subjects. (a) When this occurs, subjects can always be separated linearly: up to d + 1 subjects can always be separated linearly into two classes if the data span a d-dimensional space. Three subjects can always be separated into two groups using a combination of two features. (b) For d + 2 (or more) subjects, linear separation is not always possible. (The subjects indicated by black points are not linearly separable from those indicated by red points.) (c) Such data can, however, be separated linearly if projected into a higher-dimensional space. Here, a third dimension was added to the two-dimensional data in b by calculating the absolute distance from the line through the black points, thereby making the two classes linearly separable, as shown by the gray two-dimensional plane. (d) A similar fact can be illustrated in regression: a d-order polynomial can always fit d + 1 points perfectly (red line), but it makes extreme predictions outside the range of observations and is extremely sensitive to noise, overfitting the training data. (e) Even when the features and classes are just random noise, performing regression in a high-dimensional space leads to misleadingly high performance142. The panel shows the receiver operating characteristic (ROC; the false positive rate plotted against the true positive rate) for logistic regression applied to such random data. The red curve shows that logistic regression performs misleadingly well on the training data, with a high area under the ROC curve (AUC; regression training data in g). Obviously, however, this is overfitting, as the data are random. Indeed, when the resulting regression coefficients are applied to unseen validation data not included in the training set, the predictions are random, as they should be (blue line; regression validation data in g). (f) Using LASSO, a form of cross-validated regularized regression (Box 1), partially prevents overfitting (red line; LASSO training data in g). However, because the regularization parameter is fitted to the training data, even LASSO does not fully prevent overfitting: it is only when the LASSO parameters are tested on the validation data set that performance is correctly revealed to be at chance level (blue line; LASSO validation data in g). (g) AUC values for logistic regression and for LASSO with tenfold cross-validation on training and validation data.

The kernel trick consists of implicitly adding a large or infinite number of features20. The blessing of dimensionality is that in this infinite-dimensional space any finite-sized data set can always be classified perfectly using a simple linear classifier (Fig. 1a-c). The resulting classification in the original space can be complex and nonlinear, particularly if the included (implicit) features are nonlinear or involve interactions or correlations between original data dimensions or features. Practically, this blessing can also be a curse, as it is always possible to perfectly distinguish n patients from m controls by using n + m - 1 features (Fig. 1d-g). As this is true for any outcome of interest and any features, it will perform well even on random noise (Fig. 1e-g) and overfit (Box 1), meaning that the results will generalize poorly to new data (for example, future subjects). The danger of overfitting decreases as the number of subjects, not number of measurements, increases, motivating larger studies and consortia pooling their efforts6,7,21.
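To make this concern concrete, here is a minimal, purely illustrative sketch (not from the original article) using scikit-learn: with many more random features than subjects, an off-the-shelf logistic regression separates two arbitrarily labeled groups almost perfectly on the training data, while its performance on held-out subjects stays at chance. The simulated data and all settings are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Purely random "data": 40 subjects, 500 features, arbitrary group labels.
n_subjects, n_features = 40, 500
X = rng.normal(size=(n_subjects, n_features))
y = rng.integers(0, 2, size=n_subjects)

# Hold out half of the subjects as a validation set.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.5, random_state=0)

clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)

# Near-perfect separation of the training data despite the labels being pure noise...
print("training AUC:  ", roc_auc_score(y_train, clf.decision_function(X_train)))
# ...but chance-level performance on subjects the classifier has never seen.
print("validation AUC:", roc_auc_score(y_val, clf.decision_function(X_val)))
```

Regularization (for example, LASSO) and proper cross-validation, as discussed in Box 1 and illustrated in Figure 1f,g, mitigate but do not remove the problem; only evaluation on genuinely held-out data reveals the chance-level generalization.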
Three broad approaches exist to cope with the curse of dimensionality. First, unsupervised methods can be used to perform dimensionality reduction before classification or regression (Fig. 2 and Box 1). Second, techniques such as regularization, Bayesian model selection and cross-validation can be used to select the most informative features for classification or regression20, thereby integrating dimensionality reduction with the predictive task of interest (Fig. 2 and Box 1). Both of these approaches are entirely data driven (although Bayesian approaches allow the incorporation of prior knowledge). A third, radically different approach uses theory-driven models to extract theoretically meaningful parameters based on models of the underlying processes. These parameters can then be used as efficient, low-dimensional representations of the very high-dimensional data to which ML techniques for classification or regression can subsequently be applied (Fig. 2). For example, models of a variety of time-varying processes, such as learning22, multi-neuron recordings23 and BOLD time series24, can collapse long and seemingly complex time series into a few parameters characterizing the underlying dynamics. To the extent that the theory-driven models accurately portray or summarize the processes generating the data, they may improve the performance of ML algorithms beyond approaches that do not consider such generative mechanisms13,24,25.

Data-driven approaches
ML approaches have been applied to several clinically relevant problems, including automatic diagnosis, prediction of treatment outcomes and longitudinal disease course, and treatment selection. We provide an overview of the central methodological features of these approaches and highlight some illustrative examples. Other recent reviews provide complementary information beyond the scope of this review26,27.

Diagnostic classification. Most symptoms in the compendia of psychiatric classification are shared between two or more disorders28. Current classification schemes attempt to improve diagnosis by requiring the presence of multiple symptoms17,18. Unfortunately, individuals often still satisfy criteria for multiple disorders (co-morbidity29) or do not fit any category clearly30, the categorical thresholds do not separate clusters differing in illness burden31, and diagnostic reliability for some disorders is still problematic32.

A now substantial body of work has applied ML to automatically classify patients versus controls26,27. The state of the art for using MRI data to distinguish schizophrenia from healthy controls was recently examined in a competition33. The best entry reached an area under the curve (AUC) for classification of validation data of 0.89 (ref. 34), and a combination of the top approaches reached 0.93. The first three entries achieved similar performance despite using different techniques35. There is nevertheless scope for further improvement through the integration of more modalities36, or from algorithmic advances, for example, with deep belief networks37 or other methods38 that outperform more standard approaches on a variety of ML benchmarks35,39,40.


Box 1 Dealing with overfitting


Unsupervised dimensionality reduction. Overfitting tends to occur when the dimensionality of the data set (which is usually proportional to the number
of variables) is excessively high relative to the size of the training set. A first approach to overfitting therefore focuses on reducing the number of
variables: dimensionality reduction. This reduction can be done as a preprocessing step: variables that are highly related provide little independent
information, and such redundancies can be identified and removed using general-purpose unsupervised methods, such as principal or independent
component analysis, factor analysis or k-nearest neighbor143,144, or approaches specific to the data at hand145. Other ML techniques can subsequently
be applied to the reduced data (Fig. 2).
Regularization. Performing dimensionality reduction as a preprocessing step has an important limitation: it is not tailored to the specific problem being
solved (for example, prediction of a given outcome). Another approach is therefore to limit the number of variables selected by using regularization (for instance, by including a penalization term for too many predictors) in the prediction and classification algorithms themselves (Fig. 2). This approach is
inherent in support vector machines, LASSO, elastic nets, stochastic discrimination approaches such as random forests146 and other variable-selection
methods in multiple regression.
Bayesian model evidence. Bayesian approaches automatically penalize excessively complex models and are therefore an alternative to regularization.
These approaches evaluate how well a model fits the data by using the model evidence, which averages the likelihood over all possible parameter set-
tings, instead of using just the maximum-likelihood parameter set. The model evidence inherently penalizes excessively complex models; the intuition
is that even though for the maximum-likelihood parameter set these models may have a very high likelihood (as a result of overfitting), they also allow a
very wide range of other parameter settings that would produce very low likelihood. Appropriately complex models fare better, as they predict the data
with higher probability across parameter settings147. For example, in Figure 1d, the data were generated from a straight line with some noise added. Even
though the model including higher order polynomials fits the data perfectly with a specific setting of parameters, the data would have very low likeli-
hood under other parameter settings, making the model evidence low. A linear model will have somewhat lower likelihood for the maximum-likelihood
parameter set (as it cannot overfit), but its model evidence will be larger because the likelihood of the data integrated across all parameter settings will
be higher. The model evidence will therefore select the model with the appropriate complexity, preventing overfitting. The downsides of the Bayesian approach are that it does not provide absolute, but only relative, measures of model quality and that it is computationally demanding.
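Written out (this is a standard identity rather than an equation from the original box), the model evidence for a model M with parameters θ given data D is the likelihood averaged under the prior:

```latex
p(D \mid M) = \int p(D \mid \theta, M)\, p(\theta \mid M)\, \mathrm{d}\theta
```

An overly flexible model spreads its prior over many parameter settings that predict the data poorly, so this average is low even when its best-fitting likelihood is high.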
Cross-validation. The fundamental problem with overfitting is that it leads to poor predictions on new data. Cross-validation is a technique that esti-
mates and minimizes this problem by splitting the data set into two subsets: a training data set, which is used to estimate the prediction parameters,
and a validation data set, which is used to test how well those parameters predict new data (Fig. 1). This procedure can provide an unbiased estimate
of the expected error on new data148, but the variance of the estimator depends on the size of the data set. Splitting the data set into two subsets
decreases the size of the training data set, which leads to loss of valuable examples that could be used to improve prediction. Note that cross-validation
(for example, leave-one-out cross-validation) is often used in the training data to optimize aspects of the algorithm, and then the final held-out part of
the data is referred to as the validation set. It is critical to keep these apart.

Similar accuracies have been reported for other disorders27, and these results have been extended in several ways; for example, probabilistic classification approaches yield an estimate of how certain the classification is34,35, and multi-class techniques deal with the clinically more relevant problem of distinguishing between diagnostic groups27,41.

The fact that ML analyses of neuroimaging data can distinguish cases and controls suggests that, at least in these cases, the symptom clusters do map onto specific neurobiological substrates, despite diagnostic caveats and likely heterogeneity in disorders32. However, one cannot always identify the relevant neural substrates simply by inspecting the features used by the classifiers: these features are typically complex, counterintuitive and not meaningful in isolation, and cannot usually be collated across different ML techniques42.

These approaches also have several limitations that would need to be overcome to make them practically useful. First, the comparison of cases to healthy controls might treat severity in a flawed manner34,43: although severity exacerbates comorbidity and hence blurs diagnostic boundaries, it is also often used as a quantitative marker for degree of caseness, making an understanding of how to deal with severity and comorbidity critical44. Second, existing binary or multi-class classification approaches usually treat comorbidity incorrectly by assuming that different diagnoses are mutually exclusive; addressing this limitation may require statistical schemes that allow for multiple labels for each individual (for example, see ref. 45), which are much more demanding computationally. Third, the extent to which these algorithms, which are typically trained on unambiguous cases, yield useful information for ambiguous cases, which are clinically more relevant, remains to be explored. Finally, these approaches may be fundamentally limited because they reify46 symptom-based classification, although they could feasibly be used to refine it by subdividing existing classes1.

Prediction of treatment response. The current limitations in nosology have led to a shift toward predicting inherently more valid and immediately useful variables such as relapse in alcoholism, suicide, longitudinal conversion in at-risk groups47-52 and treatment response. The latter addresses a pressing need in psychiatry.

In depression, for example, although up to three quarters of patients will eventually respond to an antidepressant, two thirds require multiple treatment trials before responding53. Several quantitative electroencephalography (qEEG) markers have each been found to predict pharmacological response in depression54,55. However, a recent large-scale study, the International Study to Predict Optimized Treatment in Depression (iSPOT-D6), tempered hopes about several of these individual qEEG predictors56-58. Some combinations of qEEG variables, such as cordance59 or the antidepressant treatment response index60, outperform individual predictors. The combination of qEEG features in a fully data-driven way promises better results: combined features yielded better prediction of treatment response (81% specificity, 95% sensitivity) than relying on any individual predictor (60% specificity, 86% sensitivity)61, although this sample size was too small to include a proper separate validation sample (Fig. 1).

Results from other modalities similarly highlight the usefulness of applying ML techniques for prediction using multiple features. For example, a reanalysis of data from STAR*D and COMED, two large trials in depression, suggested that a combination of supervised dimensionality reduction and multivariate classifiers yielded a cross-validated prediction of remission with an area under the curve (AUC; Fig. 1) of 0.66. The number needed to treat (NNT) was 14, meaning that applying the algorithm to 14 patients should result in one additional remission62. Similarly, although univariate cognitive markers acquired in iSPOT-D did not distinguish remitters from non-remitters63,64, the multivariate pattern of task performance did predict response to the selective serotonin reuptake inhibitor (SSRI) escitalopram in a subgroup of patients64. Multivariate structural MRI analyses also improved the identification of patients unlikely to respond, beyond the level achieved using individual markers65.


Figure 2 Exploiting and coping with high dimensionality in psychiatric data sets. Purely data-driven approaches (left and middle branches) and combinations of theory- and data-driven approaches (right branch) can be used to analyze large data sets to arrive at clinically useful applications. Dimensionality reduction is a key step to avoid overfitting. It can be performed as a preprocessing step using unsupervised methods before application of other ML techniques, with or without further dimensionality reduction (left branch; Box 1); using ML techniques that automatically limit the number of variables for prediction, using regularization or Bayesian model selection (middle branch; Box 1); or using theory-driven models that in essence project the original high-dimensional data into a low-dimensional space of theoretically meaningful parameters, which can then be fed into ML algorithms that may or may not further reduce dimensionality (right branch). (Schematic, top to bottom: a very high-dimensional data set; dimensionality reduction via unsupervised methods, ML techniques incorporating automatic dimensionality reduction, or estimation of theoretically meaningful parameters; then prediction and classification.)

As these examples illustrate, ML techniques can lead to improvements in treatment-response prediction. In addition to these combinations of features within modalities, it seems likely that a combination of features across multiple modalities would lead to even further performance improvements.

Treatment selection. The most relevant question for practitioners is not necessarily whether a given treatment will work, but rather which of several possible treatments (or treatment combinations in the age of polypharmacy) will work best for a given patient. Theoretically, multiclass classifications can be cast in terms of multiple binary classifications66. Practically, however, treatment selection presents additional challenges: it may not be feasible to perform different tests (for example, neuroimaging, genetics, etc.) for each treatment option, so, ideally, the same set of tests should be used to distinguish responses to multiple treatments. Furthermore, if different tests, or even different ML algorithms for the same tests, are used for different treatments, the resulting predictions may not be directly comparable and hence may not facilitate choice between treatments.

Nevertheless, studies have started to address this question using data from trials in which subjects were randomized to multiple treatment arms, by looking for interactions between treatments and relevant variables in multiple regression. This has shown that being married and employed and having had more life events and more failed antidepressant trials predicted relatively better response to cognitive-behavioral therapy (CBT) over antidepressants, whereas comorbid personality disorders favored response to antidepressants over CBT67. The improvement that could be expected through allocating each patient to the ideal treatment was a further reduction of 3.6 points on the Hamilton Rating Scale for Depression beyond the reduction obtainable using standard treatment selection, a clinically significant effect68. Similar approaches to the iSPOT-D data yielded predictions for remission with escitalopram in individuals with poor cognitive function with an NNT of 3.8, meaning that assigning patients in this group to escitalopram on the basis of their cognitive performance pattern led to remission in one additional patient for every four evaluated64. One study63 was able to make individual response predictions that were strong enough to guide treatment choice in the majority of patients, resulting in NNTs of 2-5.

Steps toward using ML applied to neuroimaging data for treatment selection are being made. One group69 used a univariate marker, amygdala responses to subliminal facial emotion stimuli, to predict overall responses to SSRIs and serotonin-noradrenaline reuptake inhibitors (SNRIs) and differential responses to SNRIs versus SSRIs. Another group showed that increased insula activity related to better response to CBT, but worse response to escitalopram. The effect size was large, although predictive power was not examined70. As in the case of treatment-response prediction, it seems likely that treatment-selection approaches will also benefit from including multiple variables from various modalities.

To the best of our knowledge, only one study has thus far attempted to validate the clinical utility of an automatic treatment-selection algorithm in a randomized clinical trial71, with tantalizingly promising results. This study used a proprietary algorithm constructed from a reference database of EEGs from over 1,800 subjects with within-subject information about response to multiple treatment attempts (about 17,000 treatment attempts in total). The algorithm extracts 74 features from the EEG of each patient to predict the most likely successful medication for depression. Notably, the automatic algorithm significantly outperformed clinical selection (Fig. 3). One caveat is that the medications prescribed in the two arms differed substantially, and the improvement in the automatic-selection arm might not have arisen purely through better targeting of the medications, but rather through using more monoamine oxidase inhibitors and stimulants (although stimulants have generally fared poorly in the treatment of depression72).

Understanding relations between symptoms. Limitations of current diagnostic schemes have been mentioned above and are discussed elsewhere1,32,73. An alternative framework that provides insight into patterns of co-occurrence and sequential expression of symptoms comes from descriptions of symptoms as networks, where, rather than being considered as expressions of an underlying latent variable (a given disorder), symptoms are viewed as entities in their own right with direct relationships to other symptoms. Sleep disturbances, for example, typically cause fatigue; their co-occurrence might therefore be a result of their direct causal interaction rather than, say, underlying depression74. Indeed, computational modeling of the symptoms that appear earliest before, and remain longest after, depressive episodes (hopelessness and poor self-esteem75) suggests that they might drive features such as anhedonia and lack of motivation76.

Network analyses of the descriptions in the DSM itself have shown that the symptom overlap across DSM diagnoses by itself recapitulates many key features of empirically observed comorbidity patterns and reveals one dominant cluster with a small-world topology28 (Fig. 4): a few symptoms strongly mediate between other symptoms (having high betweenness and centrality), with short paths from one symptom to another.


Figure 3 Using EEG measures for treatment selection in depression improves treatment response. Left, reference EEG (rEEG) procedure: after withdrawing all medications, a resting EEG was recorded and submitted for online automated analysis involving 74 biomarkers and a comparison to a large reference database of EEG measures linked to longitudinal treatment outcomes; finally, a medication ranking was returned. Right, in a 12-site trial, patients were randomized to treatment selection via an optimized clinical protocol (based on STAR*D) or rEEG. The rEEG-based selection led to improved treatment response (lower QIDS scores) relative to the optimized clinical protocol after 2 weeks (red dots), and this effect grew stronger over 12 weeks. These results suggest that biological measures can improve treatment selection in depression. Adapted with permission from ref. 71.

Strong coherence between many symptoms has been argued to reflect a general psychopathology factor p, capturing concurrent and sequential comorbidity patterns in a manner akin to how the factor g of general intelligence captures covariance between multiple cognitive abilities44.

Dynamic network analyses that examined the temporal occurrence patterns of symptoms over days (assessed, for example, using experience-sampling methods77) revealed frequent loops of mutually reinforcing symptoms that could potentially stabilize each other78-80. Indeed, before transitions between non-depressed and depressed states, or vice versa, symptoms show increased variance and increased autocorrelations81. These are signs of so-called critical slowing down, which are indicative of a transition from one stable state to another stable state in dynamical systems. Indeed, residual subthreshold symptom variation is known to be a risk factor for relapse and may relate to the variance identified here82,83.

Dispensing entirely with latent variables is questionable in the long run, as symptoms do reflect multiple underlying variables. Network analyses could be integrated with other levels of analyses (for example, genetics, neural-circuit function, etc.) using graphical models20. These provide a probabilistic generalization of network descriptions that can include hidden as well as observed variables and can incorporate complex relationships between variables at different levels, potentially forming a bridge to more mechanistic models.

Theory-driven approaches
We now turn to theory-driven models. Unlike data-driven approaches, these models encapsulate a theoretical, often mechanistic, understanding of the phenomena at hand. Their descriptions at theoretically independent84, but practically linked, levels provide powerful tools for integration. Models can be classified in many different ways; here, we will distinguish between synthetic, algorithmic and optimal models.

Synthetic models, exemplified by biophysically detailed models, may be the most intuitive model-building exercises. They are informed by data from multiple sources relevant to the particular system(s) of interest (for example, a neural system, its modulation by specific neurotransmitters, etc.) and explore the interaction between these factors through simulations and mathematical analysis. These models often bridge different levels of analysis and can be used deductively to infer the likely consequences of known or suspected causes (for example, what effect a change in the concentration of a given neurotransmitter will have on neural-circuit dynamics or behavior) or abductively to try to infer the likely causes for a known consequence (for example, what type of disturbance in the concentration of certain neurotransmitters could give rise to observed neural-circuit or behavioral disturbances)9. These models can have many different parameters that are constrained by a broad scientific literature. They are validated by qualitatively examining their predictions, which may include multiple levels of analysis (for example, neural activity and behavior).

Algorithmic models, exemplified here by reinforcement learning (RL) models, are usually simpler. Validation typically occurs through quantitative statistical means (for example, model-comparison and model-selection techniques) that assess whether the data warrant the features and complexities embodied in each model (for example, see ref. 85). They contain a comparatively small number of parameters, whose values can be estimated for individual subjects by fitting the models to the data. These parameters, which represent theoretically meaningful constructs, can then be compared across groups, correlated with symptom severity, etc.9. These models are particularly useful as tools for measuring hidden variables and processes that are difficult or impossible to measure directly.

Optimal (Bayesian) models attempt to link observed behavior to the Bayes-optimal solution of a problem. This is particularly revealing when that optimum is unique, as it can be used to show whether subjects can solve a task and whether they have done so in a particular experimental instance. Bayesian decision theory broadly provides three routes to psychopathology86: solving the wrong problem correctly (for example, consistently prioritizing alcohol intake over health), solving the correct problem wrongly (for example, using alcohol to treat emotional problems), and solving the correct problem correctly, but in an unfortunate environment or after unfortunate prior experiences (for example, having persecutory worries after persecutory experiences).

The distinction between these model types can be blurry. For example, a biophysically realistic model of the basal ganglia may have an algorithmic-like RL component to calculate prediction errors. Furthermore, the different model types can sometimes profitably be used in concerted fashion.

Figure 4 Networks of symptoms. (a) Network of symptoms in DSM-IV. Two symptoms have a link if they belong to a common diagnostic category. There is a large, strongly connected cluster containing 48% of the symptoms. Overall, the network has small-world characteristics, with the average path length between two symptoms being only 2.6. Adapted with permission from ref. 28. (b) Autocorrelations and variance, two signs of critical slowing down, increase before a phase transition in dynamic networks. Prior to a transition from a healthy state to depression, negative emotions such as sadness show increasing variance and temporal autocorrelation. Prior to a transition from depression to a remitted state, this is observed in positive emotions, such as contentedness. Adapted with permission from ref. 81.
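The early-warning indicators in Figure 4b are straightforward to compute from experience-sampling data. The sketch below is purely illustrative (simulated mood ratings rather than data from the cited studies): it tracks variance and lag-1 autocorrelation in a sliding window, the two quantities that rise as a dynamical system approaches a transition.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical daily negative-mood ratings: an autoregressive process whose
# persistence slowly increases, mimicking critical slowing down before a
# transition into depression.
n_days = 300
phi = np.linspace(0.2, 0.95, n_days)   # lag-1 coefficient increases over time
mood = np.zeros(n_days)
for t in range(1, n_days):
    mood[t] = phi[t] * mood[t - 1] + rng.normal(scale=1.0)

def sliding_indicators(x, window=60):
    """Windowed variance and lag-1 autocorrelation, the two warning signs."""
    var, ac1 = [], []
    for start in range(len(x) - window):
        seg = x[start:start + window]
        var.append(seg.var())
        ac1.append(np.corrcoef(seg[:-1], seg[1:])[0, 1])
    return np.array(var), np.array(ac1)

variance, autocorr = sliding_indicators(mood)
# Both indicators drift upward as the simulated transition approaches.
print("early window: var=%.2f, lag-1 r=%.2f" % (variance[0], autocorr[0]))
print("late  window: var=%.2f, lag-1 r=%.2f" % (variance[-1], autocorr[-1]))
```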


Figure 5 Theory-driven biophysical and RL approaches. (a) Insights into working-memory disturbances in schizophrenia. Reducing NMDA currents on inhibitory interneurons leads to overall disinhibition and broadens the bump representation of a stimulus in working memory (compare top versus bottom), making it more susceptible to distractors, especially those that activate neighboring neurons. Adapted with permission from ref. 90. (b) Insights into obsessive-compulsive disorder. Both lowering serotonin levels and increasing glutamatergic levels render activity patterns excessively stable, such that when a new cluster of neurons is stimulated, activity does not shift to the new location, as would be expected (top, normal response: thought flexibly switches to the new stimulus), but rather remains stuck in the previous location (bottom: obsessive thought resistant to switching). Adapted with permission from ref. 2. (c) Negative symptoms in schizophrenia are related to a failure to represent expected values. In an instrumental-learning task, healthy controls and patients with schizophrenia with low levels of negative symptoms learned according to a reinforcement-learning algorithm that explicitly represents the expected value of each state-action pair (Q-learning), whereas patients with schizophrenia with high levels of negative symptoms learned according to an algorithm that learns preferences without such explicit representations (actor-critic). Adapted with permission from ref. 101. (d) Examining the processes that guide goal-directed evaluations. Shown is a decision tree corresponding to a sequence of three binary choices, where each choice leads to a gain or loss indicated by the numbers. An RL model was fitted to choices and contained two key parameters, representing the probability of continuing thinking when encountering a large salient loss (red arrow, X) or when encountering other outcomes (blue arrows). (e) Subjects were far less likely to continue evaluating a branch after encountering a salient loss (red bars) than after other outcomes, for a variety of salient loss sizes (X = 70, 100 and 140). Adapted with permission from ref. 132.

For example, by approximating a detailed neural model with a more abstract algorithmic model to allow quantitative estimation of parameters from subject data87. This approach also allows one to refine the details of one level of description constrained by the other. For example, detailed basal-ganglia models distinguish between opponent direct and indirect pathways that differentially process dopaminergic reinforcement signals. Incorporating this feature in more abstract models allows one to formally analyze its consequences for a variety of behaviors across a wide range of parameters. It also facilitates the quantitative fitting of behavioral data, and formulating normative accounts for how adding this opponency is helpful beyond classical algorithmic models88. Finally, it should also be noted that Bayesian techniques can be applied to all three types of models for fitting, validation and other purposes; that is, non-Bayesian models can also be fit using Bayesian techniques.

Biophysically realistic neural-network models. Synthetic, biophysically realistic neural-network models have been used to link biological abnormalities in psychiatric disorders to their neurodynamical and behavioral consequences. One class of models that has led to important insights in psychiatry includes cortical pyramidal neurons, connected recurrently, and GABAergic interneurons; these models can form stable bumps of activity that maintain information online. Reducing NMDA receptor density on inhibitory interneurons, as found in schizophrenia89, led to weaker and broader attractor states (Fig. 5a) that were more sensitive to disruption by inputs close to the bump, suggesting that working memory in schizophrenia should be particularly sensitive to distractors similar to the items held in working memory90. A different use of this model to integrate across levels has been to relate NMDA receptor density to BOLD signals. Ketamine induces symptoms of psychosis91 and abolishes the negative relationship between the resting-state default mode and task-related modes92. A model that incorporated two populations of neurons representing the default-mode and task-positive networks was only able to capture this disruption when NMDA receptor function on GABAergic interneurons (and not on pyramidal neurons) was reduced92.

This class of attractor models has also been used to explore the effects of glutamatergic and serotonergic disturbances in obsessive-compulsive disorder (OCD)2. Both decreased serotonin and increased glutamate, two suspected abnormalities in OCD, led to the development of strong and persistent activity patterns toward which the network tended and from which it had difficulty escaping, a possible neurodynamic substrate for obsessions (Fig. 5b). Of note, the model suggested that these neurodynamic disturbances can be alleviated by increasing serotonin levels independently of whether the underlying cause is low levels of serotonin or high levels of glutamate. The model also included specific serotonin receptor types: 5HT2A blockade ameliorated the neurodynamical abnormalities, suggesting one explanation for why treatment augmentation with atypical antipsychotics can be beneficial.

A similar integration from synaptic properties to high-level function was achieved with biologically detailed models of the cortico-striato-thalamic loops93,94. As reviewed previously, these models explain various aspects of Parkinson's disease, Tourette's syndrome, schizophrenia and addiction9,87.

In short, where detailed knowledge of the structure and function of relevant circuits exists, synthetic models often allow an understanding of causally complex and even distant relations between levels of analysis (for example, from synaptic alterations to behavior). Such models represent a critical tool to link biological details to symptoms. Biophysically detailed models can also be reduced to extract the core nonlinear dynamical components95, making them amenable to detailed mathematical analysis using stability or perturbation analyses. It should be noted, however, that even detailed biophysical models typically involve substantial simplification, and conclusions are restricted to the levels of analysis included in the model. What is, for instance, captured by an alteration in the parameter supposed to reflect NMDA receptor density could be a result of other biological and emergent factors of the system.


Figure 6 Mechanistic models yield parameters that can be used as features to improve ML performance. A classifier trained on estimated parameters of a model fitted to simulated behavioral data (light blue curve, AUC 0.87) performed better than when trained on the raw data directly (purple curve, AUC 0.74); the ROC curves (true positive rate against false positive rate) are shown for performance on the validation set. Data for 200 subjects with Gaussian-distributed parameters were simulated from a simple MF RL model with time-varying action reinforcements. Subjects were separated into two groups based on only one parameter (the learning rate). The data set was split into two, with half of the subjects used for training a classifier and the other half for validation. Two classifiers were trained, one on the raw behavioral data and the other on the parameters estimated by fitting an RL model.
Biophysical models have also been successfully applied to neurological conditions95, such as epilepsy, with strong, identifiable neurophysiological correlates that can be modeled in their own right. The absence of known strong correlates in psychiatry makes it difficult to model them in their own right and instead requires them to be related to symptoms either theoretically, as in the examples discussed here, or empirically, as in data-driven approaches.

Algorithmic reinforcement learning models. RL encompasses a set of algorithms to infer policies that optimize long-run returns96 and thus has been applied extensively to issues of affect, motivation and emotional decision-making. Practically, RL models typically consist of two components: an RL algorithm putatively capturing the internal learning and evaluation processes, and a link function relating the results of the internal evaluations to choice3,97. This allows them to assign a probability to each individual participant's choice in an experiment and give statistically detailed accounts of learning and behavior. Although they do not tend to be biophysically detailed, they have characterized multiple aspects of neural activity and behavior98. The most prominent example is so-called model-free (MF) temporal prediction errors that compare expected to obtained reinforcement. These appear to be reported by phasic dopaminergic activity99. Here, we describe several uses of these models in psychiatry.
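To illustrate the two components just described, the following sketch (a generic example written for this exposition, not code from the cited studies) pairs a minimal prediction-error learning rule with a softmax link function and evaluates the log-likelihood the model assigns to one simulated participant's choices; the toy task and parameter values are hypothetical.

```python
import numpy as np

def rl_log_likelihood(choices, rewards, alpha, beta, n_actions=2):
    """Log-likelihood of a choice sequence under Q-learning with a softmax link.

    choices: chosen action index on each trial
    rewards: obtained reinforcement on each trial
    alpha:   learning rate (how strongly prediction errors update values)
    beta:    inverse temperature of the softmax link function
    """
    q = np.zeros(n_actions)            # learned action values
    log_lik = 0.0
    for choice, reward in zip(choices, rewards):
        # Link function: probability of each action given current values.
        p = np.exp(beta * q - np.max(beta * q))
        p /= p.sum()
        log_lik += np.log(p[choice])
        # Learning rule: a model-free prediction error updates the chosen value.
        delta = reward - q[choice]
        q[choice] += alpha * delta
    return log_lik

# Toy usage: simulate one participant, then score candidate learning rates.
rng = np.random.default_rng(2)
true_alpha, true_beta, n_trials = 0.3, 3.0, 200
reward_probs = np.array([0.8, 0.2])
q = np.zeros(2); choices, rewards = [], []
for _ in range(n_trials):
    p = np.exp(true_beta * q); p /= p.sum()
    a = rng.choice(2, p=p)
    r = float(rng.random() < reward_probs[a])
    q[a] += true_alpha * (r - q[a])
    choices.append(a); rewards.append(r)

for alpha in (0.05, 0.3, 0.9):
    print(alpha, rl_log_likelihood(np.array(choices), np.array(rewards), alpha, true_beta))
```

Maximizing such a likelihood per subject (or embedding it in a hierarchical Bayesian scheme) yields the parameter estimates that are then compared across groups or related to symptoms.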
Reward sensitivity is altered in many psychiatric conditions. However, when analyzing behavior, variations in reward sensitivity are often difficult to distinguish from variations in other processes, particularly those of MF learning. When RL models were fitted to data to disentangle them, anhedonia in depression related specifically to a loss of reward sensitivity in a manner distinct from that of dopaminergic manipulations affecting learning22. Similar approaches have facilitated more precise measurements of the sensitivity to irrelevant valued stimuli, which predicts relapse in alcoholism51 and the naturalistic course of depression100, and have been used to relate negative symptoms to a shift in learning strategy away from representing expected values (Fig. 5c)101. In schizophrenia, RL has been used to examine aberrant learning102 and to show that ventral striatal hypofunction persists even when quantitatively controlling for differences in reward sensitivity and learning strategy103.

A second important direction has been the examination of two algorithms for choice valuation that were initially thought to act in parallel and to compete for behavioral expression104,105. Resource-costly prospective model-based (MB) systems simulate the future on the basis of an internal model of the world, are thought to capture goal-directed actions, and rely on cognitive and limbic cortico-striato-thalamo-cortical (CSTC) loops. Resource-light MF systems conversely learn values by iteratively updating them with prediction errors through experience, and are thought to capture habits and rely on sensorimotor CSTC loops98,105-108. As most109 addictive substances release dopamine, they may boost dopaminergic prediction-error learning110 (but see ref. 111) and speed up the establishment of drug-related habits112. Indeed, animals that rely more on prediction-error learning are more prone to addiction113-115, with parallel findings of a shift from MB to MF choices emerging in humans116-118. A similar argument has been made for a shift toward MF actions in OCD based on the idea that compulsions in OCD and compulsive drug use share some features117,119,120. However, tonic dopamine promotes MB rather than MF decisions121,122, questioning its role in shifting competition from MB to MF valuation in addiction. An alternative to a competitive account between MB and MF is a more integrated one where the goals driving MB evaluations are provided by MF processes123, for instance by more abstract plans in anterior CSTC circuits being reinforced by dopaminergic signals124. This would account for the prominent goal-seeking features of drug addiction125. Finally, the shifts from MB to MF across disorders have often been a result of reductions in the MB component, rather than more prominent MF components, both neurally117,120 and behaviorally116,117 (but see ref. 116), raising the possibility that the MB-to-MF shift is a result of nonspecific impairments in executive function126,127 or stress128 affecting resources for MB computations.

Indeed, a re-emerging RL direction explicitly addresses the effect of resource constraints and bounded rationality15,129. These may provide paths toward normative accounts of how MB and MF systems interact, with the MB system only being engaged when the resource costs are outweighed by the potential additional gains130. Furthermore, given that full MB evaluation is prohibitively costly, evaluations have to be partial, with profound consequences for the resulting valuations: if important potential outcomes are not included in the evaluation, the results can differ vastly and the glass will go from half full to half empty. The regulation of internal valuation strategies may be related to cognitive aspects of emotion regulation15,131. RL modeling has started to identify specific aspects of these processes, such as a role for aversive outcomes in guiding the resource allocation process132,133 (Fig. 5d,e).

Bayesian models. Bayes-optimal modeling approaches can be used to better understand the nature of problems and their solutions. For example, conditioning models that use gradual acquisition of associations fail to capture standard extinction phenomena, which result from the fact that extinction generally involves new learning rather than unlearning. The correct statistical description of extinction procedures is that there is a latent variable, the experiment phase, that causes sudden switches in the association between stimulus and outcome. Using models that allow for the learning of such latent variables provides a better account of standard extinction phenomena134 and predicts that stable unlearning can in fact occur as long as there is no obvious sudden switch, which was verified experimentally135. One important aspect of Bayesian models more generally is their emphasis on the representation and use of uncertainty. These have been used to show that the statistics of aversive experiences have important, but sometimes neglected, roles in several other processes, from familiarity in fear conditioning136 to learned helplessness and depression76.

Optimal models can also be used to ask whether a given symptom relates to suboptimal inference. For example, individuals with high
trait anxiety are unable to optimally update how volatile an aversive environment is, whereas low-anxiety controls exhibit close to Bayes-optimum behavior137. Finally, Bayesian models can also be used for applied purposes. For example, a Bayesian model of stop-signal task performance138 differentiated occasional stimulant users with good and poor long-term outcomes and provided regressors for fMRI analyses that allowed longitudinal prediction25; classical analyses failed to achieve either.

Combining theory- and data-driven approaches
Studies aimed at developing clinically useful applications have tended to use theoretically agnostic ML approaches, whereas studies aimed at increasing understanding of disorders have tended to use theory-driven mechanistic approaches. Theory-driven approaches depend, of course, on the extent to which prior knowledge, mechanistic understanding, and appropriate assessments of such mechanisms (for example, via suitable tasks or physiological measurements) are available. When such enabling factors are present, however, some preliminary studies suggest that the combined use of theory-driven and ML approaches can be advantageous even from an applied viewpoint. If the mechanistic theory is sufficiently accurate, theory-driven approaches allow the estimation of features specifically relevant to the disorder. In other words, theory-driven approaches use prior knowledge to massively reduce the dimensionality of the data set by projecting it to the space of a few relevant parameters. ML approaches can then work on this lower-dimensional data set with increased efficiency and reliability (Fig. 2). Figure 6 shows a simulation of this intuitive effect: applying a classifier to data produced by a generative model performs worse than applying it to the model parameters recovered from that data.
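The following sketch re-creates the spirit of that simulation (it is an illustrative stand-in, not the authors' code, and every setting is arbitrary): two simulated groups differ only in the learning rate of a simple MF RL model, and a classifier given the per-subject fitted learning rate typically generalizes better than one given the raw choice data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
N_SUBJECTS, N_TRIALS, BETA = 200, 120, 3.0
REWARD_PROBS = np.array([0.7, 0.3])
ALPHA_GRID = np.linspace(0.02, 0.98, 49)

def simulate_subject(alpha):
    """One subject's choices and rewards under Q-learning with a softmax link."""
    q = np.zeros(2)
    choices, rewards = np.zeros(N_TRIALS, int), np.zeros(N_TRIALS)
    for t in range(N_TRIALS):
        p = np.exp(BETA * q); p /= p.sum()
        a = rng.choice(2, p=p)
        r = float(rng.random() < REWARD_PROBS[a])
        q[a] += alpha * (r - q[a])
        choices[t], rewards[t] = a, r
    return choices, rewards

def neg_log_lik(alpha, choices, rewards):
    q, ll = np.zeros(2), 0.0
    for a, r in zip(choices, rewards):
        p = np.exp(BETA * q); p /= p.sum()
        ll += np.log(p[a])
        q[a] += alpha * (r - q[a])
    return -ll

def fit_alpha(choices, rewards):
    """Crude maximum-likelihood estimate of the learning rate by grid search."""
    losses = [neg_log_lik(a, choices, rewards) for a in ALPHA_GRID]
    return ALPHA_GRID[int(np.argmin(losses))]

# Two groups differing only in learning rate (the theoretically meaningful parameter).
labels = rng.integers(0, 2, N_SUBJECTS)
true_alphas = np.clip(np.where(labels == 0, 0.15, 0.55)
                      + rng.normal(0, 0.05, N_SUBJECTS), 0.02, 0.98)

raw_features, fitted_alphas = [], []
for alpha in true_alphas:
    c, r = simulate_subject(alpha)
    raw_features.append(np.concatenate([c, r]))   # high-dimensional raw data
    fitted_alphas.append(fit_alpha(c, r))          # one theory-driven feature
raw_features = np.array(raw_features)
fitted_alphas = np.array(fitted_alphas).reshape(-1, 1)

idx_train, idx_val = train_test_split(np.arange(N_SUBJECTS), test_size=0.5, random_state=0)
for name, X in [("raw behavioral data", raw_features), ("fitted RL parameter", fitted_alphas)]:
    clf = LogisticRegression(max_iter=5000).fit(X[idx_train], labels[idx_train])
    auc = roc_auc_score(labels[idx_val], clf.decision_function(X[idx_val]))
    print(f"{name}: validation AUC = {auc:.2f}")
```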
A proof-of-concept study illustrating this approach built on prior work showing that the drift-diffusion model's (DDM's) decision threshold (the amount of evidence required in favor of one option over another before committing to a choice) is partly controlled by communication between frontal cortex and the subthalamic nucleus (STN)139. Impulsive behaviors that result from reduced decision thresholds are observed in patients with Parkinson's disease treated with STN deep brain stimulation (DBS) and are linked to disruption of normal communication between frontal cortex and STN140. One study used ML methods applied to EEG and behavioral data to classify patients into those on versus off DBS12. Classification was better when using fitted DDM parameters than when using the raw data; moreover, as suggested by the prior mechanistic work, the most informative parameters for classification were the decision threshold and its modulation by frontal cortical activity. Similar improvements were found using model parameters for classifying presymptomatic Huntington's patients versus controls and for separating patients who were closer versus further from exhibiting symptoms141. Using model-based assessments has also enhanced classification and subtyping of schizophrenia patients24 and the aforementioned prospective prediction of stimulant abuse25.
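A toy version of this logic can be sketched as follows (again Python with numpy and scikit-learn). It is not a reimplementation of the study described above: the two simulated groups differ only in their DDM decision threshold, loosely mimicking on- versus off-DBS states, parameters are recovered with the simple closed-form EZ-diffusion equations rather than a full hierarchical fit, and all parameter values and group sizes are illustrative assumptions.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
S, DT, N_TRIALS = 0.1, 0.002, 200   # diffusion scale, Euler time step, trials per subject

def simulate_ddm(v, a, ter):
    """Simulate first-passage times of a drift-diffusion process between bounds 0 and a."""
    rts, correct = [], []
    for _ in range(N_TRIALS):
        x, t = a / 2.0, 0.0                                   # unbiased starting point
        while 0.0 < x < a:
            x += v * DT + S * np.sqrt(DT) * rng.standard_normal()
            t += DT
        rts.append(t + ter)
        correct.append(x >= a)                                # upper bound = correct response
    return np.array(rts), np.array(correct)

def ez_fit(rts, correct):
    """Recover (v, a, Ter) from accuracy and correct-RT mean/variance (EZ-diffusion)."""
    pc = float(np.clip(correct.mean(), 0.501, 0.999))
    vrt, mrt = rts[correct].var(), rts[correct].mean()
    L = np.log(pc / (1.0 - pc))
    v = np.sign(pc - 0.5) * S * ((L * (pc**2 * L - pc * L + pc - 0.5)) / vrt) ** 0.25
    a = S**2 * L / v
    y = np.exp(-v * a / S**2)
    mdt = (a / (2.0 * v)) * (1.0 - y) / (1.0 + y)
    return v, a, mrt - mdt

labels, raw, params = [], [], []
for group, a_true in [(0, 0.08), (1, 0.14)]:   # lower decision threshold in the 'on-DBS' group
    for _ in range(25):
        rts, correct = simulate_ddm(v=0.2, a=a_true, ter=0.3)
        labels.append(group)
        raw.append(rts)                        # 200 raw trial-wise reaction times
        params.append(ez_fit(rts, correct))    # 3 model-derived features

clf = LogisticRegression(max_iter=2000)
for name, X in [("raw RTs", np.array(raw)), ("recovered DDM parameters", np.array(params))]:
    acc = cross_val_score(clf, X, np.array(labels), cv=5).mean()
    print(f"{name}: mean cross-validated accuracy = {acc:.2f}")

As in the simpler example above, the few recovered parameters, dominated here by the decision threshold, usually carry the group difference more compactly than the raw trial-wise reaction times.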
Conclusion
We have outlined multiple fronts on which computational psychiatry is likely to substantially advance psychiatry. Data-driven approaches have started to bear some fruit for clinically relevant problems, such as improving classification, predicting treatment response and aiding treatment selection. These approaches, however, are limited in their ability to capture the complexities of interacting variables in and across multiple levels. Theory-driven modeling efforts, on the other hand, have yielded key insights at many levels of analysis concerning […] popularity of open-source code and databases will facilitate such replications and the establishment and extension of (clinically) robust methods. Overall, the interaction between theorists and clinicians promises many opportunities and ultimately better outcomes for patients.

Acknowledgments
Q.J.M.H. was supported by a project grant from the Swiss National Science Foundation (320030L_153449/1) and M.F. by NSF grant 1460604 and NIMH R01 MH080066-01.

COMPETING FINANCIAL INTERESTS
The authors declare competing financial interests: details are available in the online version of the paper.

Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.

1. Kapur, S., Phillips, A.G. & Insel, T.R. Why has it taken so long for biological psychiatry to develop clinical tests and what to do about it? Mol. Psychiatry 17, 1174–1179 (2012).
2. Maia, T.V. & Cano-Colino, M. The role of serotonin in orbitofrontal function and obsessive-compulsive disorder. Clin. Psychol. Sci. 3, 460–482 (2015).
3. Huys, Q.J.M., Moutoussis, M. & Williams, J. Are computational models of any use to psychiatry? Neural Netw. 24, 544–551 (2011).
4. Stephan, K.E. et al. Charting the landscape of priority problems in psychiatry, part 1: classification and diagnosis. Lancet Psychiatry 3, 77–83 (2015).
5. Caspi, A. & Moffitt, T.E. Gene-environment interactions in psychiatry: joining forces with neuroscience. Nat. Rev. Neurosci. 7, 583–590 (2006).
6. Williams, L.M. et al. International Study to Predict Optimized Treatment for Depression (iSPOT-D), a randomized clinical trial: rationale and protocol. Trials 12, 4 (2011).
7. Mennes, M., Biswal, B.B., Castellanos, F.X. & Milham, M.P. Making data sharing work: the FCP/INDI experience. Neuroimage 82, 683–691 (2013).
8. Maia, T.V. Introduction to the series on computational psychiatry. Clin. Psychol. Sci. 3, 374–377 (2015).
9. Maia, T.V. & Frank, M.J. From reinforcement learning models to psychiatric and neurological disorders. Nat. Neurosci. 14, 154–162 (2011).
10. Montague, P.R., Dolan, R.J., Friston, K.J. & Dayan, P. Computational psychiatry. Trends Cogn. Sci. 16, 72–80 (2012).
11. Wang, X.J. & Krystal, J.H. Computational psychiatry. Neuron 84, 638–654 (2014).
12. Wiecki, T.V., Poland, J. & Frank, M.J. Model-based cognitive neuroscience approaches to computational psychiatry: clustering and classification. Clin. Psychol. Sci. 3, 378–399 (2015).
13. Maia, T.V. & McClelland, J.L. A neurocomputational approach to obsessive-compulsive disorder. Trends Cogn. Sci. 16, 14–15 (2012).
14. Stephan, K.E. & Mathys, C. Computational approaches to psychiatry. Curr. Opin. Neurobiol. 25, 85–92 (2014).
15. Huys, Q.J.M., Daw, N.D. & Dayan, P. Depression: a decision-theoretic analysis. Annu. Rev. Neurosci. 38, 1–23 (2015).
16. Stephan, K.E., Iglesias, S., Heinzle, J. & Diaconescu, A.O. Translational perspectives for computational neuroimaging. Neuron 87, 716–732 (2015).
17. American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders (DSM-5) (American Psychiatric Publishing, 2013).
18. World Health Organization. International Classification of Diseases (World Health Organization Press, 1990).
19. Insel, T. et al. Research domain criteria (RDoC): toward a new classification framework for research on mental disorders. Am. J. Psychiatry 167, 748–751 (2010).
20. MacKay, D.J. Information Theory, Inference and Learning Algorithms (CUP, Cambridge, 2003).
21. Lee, S.H. et al.; Cross-Disorder Group of the Psychiatric Genomics Consortium; International Inflammatory Bowel Disease Genetics Consortium (IIBDGC). Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs. Nat. Genet. 45, 984–994 (2013).
22. Huys, Q.J.M., Pizzagalli, D.A., Bogdan, R. & Dayan, P. Mapping anhedonia onto reinforcement learning: a behavioural meta-analysis. Biol. Mood Anxiety Disord. 3, 12 (2013).
23. Cunningham, J.P. & Yu, B.M. Dimensionality reduction for large-scale neural recordings. Nat. Neurosci. 17, 1500–1509 (2014).
24. Brodersen, K.H. et al. Dissecting psychiatric spectrum disorders by generative embedding. Neuroimage Clin. 4, 98–111 (2014).
25. Harlé, K.M. et al. Bayesian neural adjustment of inhibitory control predicts emergence of problem stimulant use. Brain 138, 3413–3426 (2015).
26. Orrù, G., Pettersson-Yeo, W., Marquand, A.F., Sartori, G. & Mechelli, A. Using Support Vector Machine to identify imaging biomarkers of neurological and psychiatric disease: a critical review. Neurosci. Biobehav. Rev. 36, 1140–1152 (2012).
27. Wolfers, T., Buitelaar, J.K., Beckmann, C.F., Franke, B. & Marquand, A.F. From estimating activation locality to predicting disorder: A review of pattern recognition for neuroimaging-based psychiatric diagnostics. Neurosci. Biobehav. Rev. 57, 328–349 (2015).
28. Borsboom, D., Cramer, A.O.J., Schmittmann, V.D., Epskamp, S. & Waldorp, L.J. The small world of psychopathology. PLoS One 6, e27407 (2011).
29. Kessler, R.C. et al. Comorbidity of DSM-III-R major depressive disorder in the general population: results from the US National Comorbidity Survey. Br. J. Psychiatry Suppl. 30, 17–30 (1996).
30. Fairburn, C.G. & Bohn, K. Eating disorder NOS (EDNOS): an example of the troublesome "not otherwise specified" (NOS) category in DSM-IV. Behav. Res. Ther. 43, 691–701 (2005).
31. Kessler, R.C., Zhao, S., Blazer, D.G. & Swartz, M. Prevalence, correlates, and course of minor depression and major depression in the National Comorbidity Survey. J. Affect. Disord. 45, 19–30 (1997).
32. Freedman, R. et al. The initial field trials of DSM-5: new blooms and old thorns. Am. J. Psychiatry 170, 1–5 (2013).
33. Silva, R.F. et al. The tenth annual MLSP competition: schizophrenia classification challenge. IEEE Int. Workshop Mach. Learn. Signal Process. 1–6 (2014).
34. Solin, A. & Särkkä, S. The tenth annual MLSP competition: first place. in IEEE Int. Workshop Mach. Learn. Signal Process. 1–6 (2014).
35. Sabuncu, M.R. & Konukoglu, E.; Alzheimer's Disease Neuroimaging Initiative. Clinical prediction from structural brain MRI scans: a large-scale empirical study. Neuroinformatics 13, 31–46 (2015).
36. Hahn, T. et al. Integrating neurobiological markers of depression. Arch. Gen. Psychiatry 68, 361–368 (2011).
37. Hinton, G.E., Osindero, S. & Teh, Y.W. A fast learning algorithm for deep belief nets. Neural Comput. 18, 1527–1554 (2006).
38. Peng, X., Lin, P., Zhang, T. & Wang, J. Extreme learning machine-based classification of ADHD using brain structural MRI data. PLoS One 8, e79476 (2013).
39. Kim, J., Calhoun, V.D., Shim, E. & Lee, J.H. Deep neural network with weight sparsity control and pre-training extracts hierarchical features and enhances classification performance: Evidence from whole-brain resting-state functional connectivity patterns of schizophrenia. Neuroimage 124 Pt A, 127–146 (2016).
40. Watanabe, T., Kessler, D., Scott, C., Angstadt, M. & Sripada, C. Disease prediction based on functional connectomes using a scalable and spatially-informed support vector machine. Neuroimage 96, 183–202 (2014).
41. Costafreda, S.G. et al. Pattern of neural responses to verbal fluency shows diagnostic specificity for schizophrenia and bipolar disorder. BMC Psychiatry 11, 18 (2011).
42. Pereira, F., Mitchell, T. & Botvinick, M. Machine learning classifiers and fMRI: a tutorial overview. Neuroimage 45 (suppl.), S199–S209 (2009).
43. Lubke, G.H. et al. Subtypes versus severity differences in attention-deficit/hyperactivity disorder in the Northern Finnish Birth Cohort. J. Am. Acad. Child Adolesc. Psychiatry 46, 1584–1593 (2007).
44. Caspi, A. et al. The p factor: One general psychopathology factor in the structure of psychiatric disorders? Clin. Psychol. Sci. 2, 119–137 (2014).
45. Ruiz, F.J.R., Valera, I., Blanco, C. & Perez-Cruz, F. Bayesian nonparametric comorbidity analysis of psychiatric disorders. J. Mach. Learn. Res. 15, 1215–1247 (2014).
46. Hyman, S.E. The diagnosis of mental disorders: the problem of reification. Annu. Rev. Clin. Psychol. 6, 155–179 (2010).
47. Koutsouleris, N. et al. Use of neuroanatomical pattern classification to identify subjects in at-risk mental states of psychosis and predict disease transition. Arch. Gen. Psychiatry 66, 700–712 (2009).
48. Schmaal, L. et al. Predicting the naturalistic course of major depressive disorder using clinical and multimodal neuroimaging information: A multivariate pattern recognition study. Biol. Psychiatry 78, 278–286 (2015).
49. Stringaris, A. et al.; IMAGEN Consortium. The brain's response to reward anticipation and depression in adolescence: dimensionality, specificity, and longitudinal predictions in a community-based sample. Am. J. Psychiatry 172, 1215–1223 (2015).
50. Whelan, R. et al.; IMAGEN Consortium. Neuropsychosocial profiles of current and future adolescent alcohol misusers. Nature 512, 185–189 (2014).
51. Garbusow, M. et al. Pavlovian-to-instrumental transfer effects in the nucleus accumbens relate to relapse in alcohol dependence. Addict. Biol. published online, doi:10.1111/adb.12243 (1 April 2015).
52. Niculescu, A.B. et al. Understanding and predicting suicidality using a combined genomic and clinical risk assessment approach. Mol. Psychiatry 20, 1266–1285 (2015).
53. Rush, A.J. et al. Acute and longer-term outcomes in depressed outpatients requiring one or several treatment steps: a STAR*D report. Am. J. Psychiatry 163, 1905–1917 (2006).
54. Olbrich, S. & Arns, M. EEG biomarkers in major depressive disorder: discriminative power and prediction of treatment response. Int. Rev. Psychiatry 25, 604–618 (2013).
55. Iosifescu, D.V. Electroencephalography-derived biomarkers of antidepressant response. Harv. Rev. Psychiatry 19, 144–154 (2011).
56. Arns, M. et al. Frontal and rostral anterior cingulate (rACC) theta EEG in depression: implications for treatment outcome? Eur. Neuropsychopharmacol. 25, 1190–1200 (2015).
57. Arns, M. et al. EEG alpha asymmetry as a gender-specific predictor of outcome to acute treatment with different antidepressant medications in the randomized iSPOT-D study. Clin. Neurophysiol. 127, 509–519 (2015).
58. van Dinteren, R. et al. Utility of event-related potentials in predicting antidepressant treatment response: an iSPOT-D report. Eur. Neuropsychopharmacol. 25, 1981–1990 (2015).
59. Leuchter, A.F. et al. Cordance: a new method for assessment of cerebral perfusion and metabolism using quantitative electroencephalography. Neuroimage 1, 208–219 (1994).
60. Iosifescu, D.V. et al. Frontal EEG predictors of treatment outcome in major depressive disorder. Eur. Neuropsychopharmacol. 19, 772–777 (2009).
61. Khodayari-Rostamabad, A., Reilly, J.P., Hasey, G.M., de Bruin, H. & Maccrimmon, D.J. A machine learning approach using EEG data to predict response to SSRI treatment for major depressive disorder. Clin. Neurophysiol. 124, 1975–1985 (2013).
62. Chekroud, A. et al. Cross-trial prediction of treatment outcome in depression. Lancet Psychiatry published online, doi:10.1016/S2215-0366(15)00471-X (20 January 2016).
63. Gordon, E., Rush, A.J., Palmer, D.M., Braund, T.A. & Rekshan, W. Toward an online cognitive and emotional battery to predict treatment remission in depression. Neuropsychiatr. Dis. Treat. 11, 517–531 (2015).
64. Etkin, A. et al. A cognitive-emotional biomarker for predicting remission with antidepressant medications: a report from the iSPOT-D trial. Neuropsychopharmacology 40, 1332–1342 (2015).
65. Korgaonkar, M.S. et al. Magnetic resonance imaging measures of brain structure to predict antidepressant treatment outcome in major depressive disorder. EBioMedicine 2, 37–45 (2015).
66. Rifkin, R. & Klautau, A. In defense of one-vs-all classification. J. Mach. Learn. Res. 5, 101–141 (2004).
67. DeRubeis, R.J. et al. The Personalized Advantage Index: translating research on prediction into individualized treatment recommendations. A demonstration. PLoS One 9, e83875 (2014).
68. Anderson, I. & Pilling, S. Depression: the Treatment and Management of Depression in Adults (Updated Edition) (The British Psychological Society and The Royal College of Psychiatrists, 2010).
69. Williams, L.M. et al. Amygdala reactivity to emotional faces in the prediction of general and medication-specific responses to antidepressant treatment in the randomized iSPOT-D trial. Neuropsychopharmacology 40, 2398–2408 (2015).
70. McGrath, C.L. et al. Toward a neuroimaging treatment selection biomarker for major depressive disorder. JAMA Psychiatry 70, 821–829 (2013).
71. DeBattista, C. et al. The use of referenced-EEG (rEEG) in assisting medication selection for the treatment of depression. J. Psychiatr. Res. 45, 64–75 (2011).
72. Candy, M., Jones, L., Williams, R., Tookman, A. & King, M. Psychostimulants for depression. Cochrane Database Syst. Rev. (2): CD006722 (2008).
73. Cuthbert, B.N. & Insel, T.R. Toward the future of psychiatric diagnosis: the seven pillars of RDoC. BMC Med. 11, 126 (2013).
74. Cramer, A.O.J., Waldorp, L.J., van der Maas, H.L.J. & Borsboom, D. Comorbidity: a network perspective. Behav. Brain Sci. 33, 137–150, discussion 150–193 (2010).
75. Iacoviello, B.M., Alloy, L.B., Abramson, L.Y. & Choi, J.Y. The early course of depression: a longitudinal investigation of prodromal symptoms and their relation to the symptomatic course of depressive episodes. J. Abnorm. Psychol. 119, 459–467 (2010).
76. Huys, Q.J.M. & Dayan, P. A Bayesian formulation of behavioral control. Cognition 113, 314–328 (2009).
77. Telford, C., McCarthy-Jones, S., Corcoran, R. & Rowse, G. Experience sampling methodology studies of depression: the state of the art. Psychol. Med. 42, 1119–1129 (2012).
78. Bringmann, L.F. et al. A network approach to psychopathology: new insights into clinical longitudinal data. PLoS One 8, e60188 (2013).
79. Bringmann, L.F., Lemmens, L.H.J.M., Huibers, M.J.H., Borsboom, D. & Tuerlinckx, F. Revealing the dynamic network structure of the Beck Depression Inventory-II. Psychol. Med. 45, 747–757 (2015).
80. Wigman, J.T.W. et al.; MERGE. Exploring the underlying structure of mental disorders: cross-diagnostic differences and similarities from a network perspective using both a top-down and a bottom-up approach. Psychol. Med. 45, 2375–2387 (2015).
81. van de Leemput, I.A. et al. Critical slowing down as early warning for the onset and termination of depression. Proc. Natl. Acad. Sci. USA 111, 87–92 (2014).
82. Segal, Z.V. et al. Antidepressant monotherapy vs sequential pharmacotherapy and mindfulness-based cognitive therapy, or placebo, for relapse prophylaxis in recurrent depression. Arch. Gen. Psychiatry 67, 1256–1264 (2010).
83. Dunlop, B.W., Holland, P., Bao, W., Ninan, P.T. & Keller, M.B. Recovery and subsequent recurrence in patients with recurrent major depressive disorder. J. Psychiatr. Res. 46, 708–715 (2012).
84. Marr, D. Vision (Freeman, New York, 1982).
85. Guitart-Masip, M. et al. Go and no-go learning in reward and punishment: interactions between affect and effect. Neuroimage 62, 154–166 (2012).
86. Huys, Q.J.M., Guitart-Masip, M., Dolan, R.J. & Dayan, P. Decision-theoretic psychiatry. Clin. Psychol. Sci. 3, 400–421 (2015).
87. Frank, M.J. Linking across levels of computation in model-based cognitive neuroscience. in An Introduction to Model-Based Cognitive Neuroscience (eds. B. Forstmann & E. Wagenmakers) 163–181 (Springer, New York, 2015).
88. Collins, A.G.E. & Frank, M.J. Opponent actor learning (OpAL): modeling interactive effects of striatal dopamine on reinforcement learning and choice incentive. Psychol. Rev. 121, 337–366 (2014).
89. Lisman, J.E. et al. Circuit-based framework for understanding neurotransmitter and risk gene interactions in schizophrenia. Trends Neurosci. 31, 234–242 (2008).
90. Murray, J.D. et al. Linking microcircuit dysfunction to cognitive impairment: effects of disinhibition associated with schizophrenia in a cortical working memory model. Cereb. Cortex 24, 859–872 (2014).
91. Krystal, J.H. et al. Subanesthetic effects of the noncompetitive NMDA antagonist, ketamine, in humans. Psychotomimetic, perceptual, cognitive, and neuroendocrine responses. Arch. Gen. Psychiatry 51, 199–214 (1994).
92. Anticevic, A. et al. NMDA receptor function in large-scale anticorrelated neural systems with implications for cognition and schizophrenia. Proc. Natl. Acad. Sci. USA 109, 16720–16725 (2012).
93. Frank, M.J. Dynamic dopamine modulation in the basal ganglia: a neurocomputational account of cognitive deficits in medicated and nonmedicated Parkinsonism. J. Cogn. Neurosci. 17, 51–72 (2005).
94. Gurney, K.N., Humphries, M.D. & Redgrave, P. A new framework for cortico-striatal plasticity: behavioural theory meets in vitro data at the reinforcement-action interface. PLoS Biol. 13, e1002034 (2015).
95. Deco, G., Jirsa, V.K., Robinson, P.A., Breakspear, M. & Friston, K. The dynamic brain: from spiking neurons to neural masses and cortical fields. PLoS Comput. Biol. 4, e1000092 (2008).
96. Sutton, R.S. & Barto, A.G. Reinforcement Learning: an Introduction (MIT Press, 1998).
97. Daw, N. Trial-by-trial data analysis using computational models. in Decision Making, Affect, and Learning: Attention and Performance XXIII (eds. M.R. Delgado, E.A. Phelps & T.W. Robbins) 1–23 (OUP, 2009).
98. Maia, T.V. Reinforcement learning, conditioning and the brain: successes and challenges. Cogn. Affect. Behav. Neurosci. 9, 343–364 (2009).
99. Eshel, N. et al. Arithmetic and local circuitry underlying dopamine prediction errors. Nature 525, 243–246 (2015).
100. Huys, Q.J.M. et al. The specificity of pavlovian regulation is associated with recovery from depression. Psychol. Med. (in the press).
101. Gold, J.M. et al. Negative symptoms and the failure to represent the expected reward value of actions: behavioral and computational modeling evidence. Arch. Gen. Psychiatry 69, 129–138 (2012).
102. Roiser, J.P. et al. Do patients with schizophrenia exhibit aberrant salience? Psychol. Med. 39, 199–209 (2009).
103. Schlagenhauf, F. et al. Striatal dysfunction during reversal learning in unmedicated schizophrenia patients. Neuroimage 89, 171–180 (2014).
104. Killcross, S. & Coutureau, E. Coordination of actions and habits in the medial prefrontal cortex of rats. Cereb. Cortex 13, 400–408 (2003).
105. Daw, N.D., Niv, Y. & Dayan, P. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat. Neurosci. 8, 1704–1711 (2005).
106. Dolan, R.J. & Dayan, P. Goals and habits in the brain. Neuron 80, 312–325 (2013).
107. Friedel, E. et al. Devaluation and sequential decisions: linking goal-directed and model-based behavior. Front. Hum. Neurosci. 8, 587 (2014).
108. Horga, G. et al. Changes in corticostriatal connectivity during reinforcement learning in humans. Hum. Brain Mapp. 36, 793–803 (2015).
109. Nutt, D.J., Lingford-Hughes, A., Erritzoe, D. & Stokes, P.R. The dopamine theory of addiction: 40 years of highs and lows. Nat. Rev. Neurosci. 16, 305–312 (2015).
110. Redish, A.D. Addiction as a computational process gone awry. Science 306, 1944–1947 (2004).
111. Panlilio, L.V., Thorndike, E.B. & Schindler, C.W. Blocking of conditioning to a cocaine-paired stimulus: testing the hypothesis that cocaine perpetually produces a signal of larger-than-expected reward. Pharmacol. Biochem. Behav. 86, 774–777 (2007).
112. Nelson, A. & Killcross, S. Amphetamine exposure enhances habit formation. J. Neurosci. 26, 3805–3812 (2006).
113. Flagel, S.B. et al. A selective role for dopamine in stimulus-reward learning. Nature 469, 53–57 (2011).
114. Lesaint, F., Sigaud, O., Flagel, S.B., Robinson, T.E. & Khamassi, M. Modelling individual differences in the form of Pavlovian conditioned approach responses: a dual learning systems approach with factored representations. PLoS Comput. Biol. 10, e1003466 (2014).
115. Huys, Q.J.M., Tobler, P.N., Hasler, G. & Flagel, S.B. The role of learning-related dopamine signals in addiction vulnerability. Prog. Brain Res. 211, 31–77 (2014).
116. Sjoerds, Z. et al. Behavioral and neuroimaging evidence for overreliance on habit learning in alcohol-dependent patients. Transl. Psychiatry 3, e337 (2013).
117. Voon, V. et al. Disorders of compulsivity: a common bias towards learning habits. Mol. Psychiatry 20, 345–352 (2014).
118. Sebold, M. et al. Model-based and model-free decisions in alcohol dependence. Neuropsychobiology 70, 122–131 (2014).
119. Robbins, T.W., Gillan, C.M., Smith, D.G., de Wit, S. & Ersche, K.D. Neurocognitive endophenotypes of impulsivity and compulsivity: towards dimensional psychiatry. Trends Cogn. Sci. 16, 81–91 (2012).
120. Gillan, C.M. et al. Functional neuroimaging of avoidance habits in obsessive-compulsive disorder. Am. J. Psychiatry 172, 284–293 (2015).
121. Wunderlich, K., Smittenaar, P. & Dolan, R.J. Dopamine enhances model-based over model-free choice behavior. Neuron 75, 418–424 (2012).
122. Deserno, L. et al. Ventral striatal dopamine reflects behavioral and neural signatures of model-based control during sequential decision making. Proc. Natl. Acad. Sci. USA 112, 1595–1600 (2015).
123. Cushman, F. & Morris, A. Habitual control of goal selection in humans. Proc. Natl. Acad. Sci. USA 112, 13817–13822 (2015).
124. Collins, A.G.E. & Frank, M.J. Cognitive control over learning: creating, clustering, and generalizing task-set structure. Psychol. Rev. 120, 190–229 (2013).
125. Everitt, B.J. & Robbins, T.W. Neural systems of reinforcement for drug addiction: from actions to habits to compulsion. Nat. Neurosci. 8, 1481–1489 (2005).
126. Otto, A.R., Gershman, S.J., Markman, A.B. & Daw, N.D. The curse of planning: dissecting multiple reinforcement-learning systems by taxing the central executive. Psychol. Sci. 24, 751–761 (2013).
127. Schad, D.J. et al. Processing speed enhances model-based over model-free reinforcement learning in the presence of high working memory functioning. Front. Psychol. 5, 1450 (2014).
128. Otto, A.R., Raio, C.M., Chiang, A., Phelps, E.A. & Daw, N.D. Working-memory capacity protects model-based learning from stress. Proc. Natl. Acad. Sci. USA 110, 20941–20946 (2013).
129. Boureau, Y.L., Sokol-Hessner, P. & Daw, N.D. Deciding how to decide: self-control and meta-decision making. Trends Cogn. Sci. 19, 700–710 (2015).
130. Keramati, M., Dezfouli, A. & Piray, P. Speed/accuracy trade-off between the habitual and the goal-directed processes. PLoS Comput. Biol. 7, e1002055 (2011).
131. Etkin, A., Büchel, C. & Gross, J.J. The neural bases of emotion regulation. Nat. Rev. Neurosci. 16, 693–700 (2015).
132. Huys, Q.J.M. et al. Bonsai trees in your head: how the pavlovian system sculpts goal-directed choices by pruning decision trees. PLoS Comput. Biol. 8, e1002410 (2012).
133. Huys, Q.J.M. et al. Interplay of approximate planning strategies. Proc. Natl. Acad. Sci. USA 112, 3098–3103 (2015).
134. Gershman, S.J., Blei, D.M. & Niv, Y. Context, learning and extinction. Psychol. Rev. 117, 197–209 (2010).
135. Gershman, S.J., Jones, C.E., Norman, K.A., Monfils, M.H. & Niv, Y. Gradual extinction prevents the return of fear: implications for the discovery of state. Front. Behav. Neurosci. 7, 164 (2013).
136. Maia, T.V. Fear conditioning and social groups: statistics, not genetics. Cogn. Sci. 33, 1232–1251 (2009).
137. Browning, M., Behrens, T.E., Jocham, G., O'Reilly, J.X. & Bishop, S.J. Anxious individuals have difficulty learning the causal statistics of aversive environments. Nat. Neurosci. 18, 590–596 (2015).
138. Shenoy, P. & Yu, A.J. Rational decision-making in inhibitory control. Front. Hum. Neurosci. 5, 48 (2011).
139. Frank, M.J. et al. fMRI and EEG predictors of dynamic decision parameters during human reinforcement learning. J. Neurosci. 35, 485–494 (2015).
140. Cavanagh, J.F. et al. Subthalamic nucleus stimulation reverses mediofrontal influence over decision threshold. Nat. Neurosci. 14, 1462–1467 (2011).
141. Wiecki, T.V., Antoniades, C.A., Stevenson, A., Kennard, C. & Borowsky, B. A computational cognitive biomarker for early-stage Huntington's disease. PLoS One (in the press).
142. Whelan, R. & Garavan, H. When optimism hurts: inflated predictions in psychiatric neuroimaging. Biol. Psychiatry 75, 746–748 (2014).
143. Lemm, S., Blankertz, B., Dickhaus, T. & Müller, K.R. Introduction to machine learning for brain imaging. Neuroimage 56, 387–399 (2011).
144. Mwangi, B., Tian, T.S. & Soares, J.C. A review of feature reduction techniques in neuroimaging. Neuroinformatics 12, 229–244 (2014).
145. Wig, G.S. et al. Parcellating an individual subject's cortical and subcortical brain structures using snowball sampling of resting-state correlations. Cereb. Cortex 24, 2036–2054 (2014).
146. Maroco, J. et al. Data mining methods in the prediction of Dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests. BMC Res. Notes 4, 299 (2011).
147. MacKay, D.J. Bayesian interpolation. Neural Comput. 4, 415–447 (1992).
148. Kohavi, R. A study of cross-validation and bootstrap for accuracy estimation and model selection. Neuroimage 14, 1137–1145 (1995).