Psychological Medicine (cambridge.org/psm) — Original Article

Contextual influence of reinforcement learning performance of depression: evidence for a negativity bias?

Henri Vandendriessche1,2,*, Amel Demmou4,*, Sophie Bavard1,2,3, Julien Yadak4, Cédric Lemogne5,6, Thomas Mauras7 and Stefano Palminteri1,2

*Co-first author

1 Laboratoire de Neurosciences Cognitives Computationnelles, INSERM U960, Paris, France; 2 Département d'Etudes Cognitives, Ecole Normale Supérieure, PSL Research University, Paris, France; 3 Department of Psychology, University of Hamburg, Hamburg, Germany; 4 Unité Psychiatrie Adultes, Hôpital Cochin Port Royal, Paris, France; 5 Université Paris Cité, INSERM U1266, Institut de Psychiatrie et Neurosciences de Paris, Paris, France; 6 Service de Psychiatrie de l'adulte, AP-HP, Hôpital Hôtel-Dieu, Paris, France; 7 Groupe Hospitalier Universitaire, GHU Paris psychiatrie neurosciences, Paris, France

Cite this article: Vandendriessche H, Demmou A, Bavard S, Yadak J, Lemogne C, Mauras T, Palminteri S (2022). Contextual influence of reinforcement learning performance of depression: evidence for a negativity bias? Psychological Medicine 1–11. https://doi.org/10.1017/S0033291722001593

Received: 28 May 2021; Revised: 3 May 2022; Accepted: 12 May 2022

Key words: Context dependency; depression; negativity bias; reinforcement learning; reward processing

Authors for correspondence: Stefano Palminteri, E-mail: stefano.palminteri@ens.fr; Henri Vandendriessche, E-mail: henri.vandendriessche@ens.fr

© The Author(s), 2022. Published by Cambridge University Press

Abstract

Background. Value-based decision-making impairment in depression is a complex phenomenon: while some studies did find evidence of blunted reward learning and reward-related signals in the brain, others indicate no effect. Here we test whether such reward sensitivity deficits are dependent on the overall value of the decision problem.

Methods. We used a two-armed bandit task with two different contexts: one 'rich' and one 'poor', in which both options were associated with an overall positive and negative expected value, respectively. We tested patients (N = 30) undergoing a major depressive episode and age-, gender- and socio-economically matched controls (N = 26). Learning performance, followed by a transfer phase without feedback, was analyzed to disentangle between a decision and a value-update mechanism. Finally, we used computational model simulation and fitting to link behavioral patterns to learning biases.

Results. Control subjects showed similar learning performance in the 'rich' and the 'poor' contexts, while patients displayed reduced learning in the 'poor' context. Analysis of the transfer phase showed that the context-dependent impairment in patients generalized, suggesting that the effect of depression has to be traced to the outcome encoding. Computational model-based results showed that patients displayed a higher learning rate for negative compared to positive outcomes (the opposite was true in controls).

Conclusions. Our results illustrate that reinforcement learning performance in depression depends on the value of the context. We show that depressive patients have a specific deficit in contexts with an overall negative state value, which in our task is consistent with a negativity bias at the level of the learning rates.

Introduction

Depression is a common debilitating disease that is a worldwide leading cause of morbidity and mortality.
According to the latest estimates from the World Health Organization, in 2015 more than 300 million people were living with depression (World Health Organization, 2017). Low mood and anhedonia are core symptoms of major depressive disorder; these two symptoms are key criteria for the diagnosis of Major Depressive Disorder (MDD) in the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) (American Psychiatric Association, 2013). Anhedonia is broadly defined as a decreased ability to experience pleasure from positive stimuli. Specifically, it is described as a reduced motivation to engage in daily life activities (motivational anhedonia) and reduced enjoyment of usually enjoyable activities (consummatory anhedonia). Depression is a complex and heterogeneous disorder implying instinctual, emotional and cognitive dysfunctions. Although its underlying mechanisms remain unclear, it has been proposed – based on the importance of anhedonia and low mood in depression – that reduced reward processing, both in terms of incentive motivation and reinforcement learning, plays a key role in the clinical manifestation of depression (Admon & Pizzagalli, 2015; Chen, Takahashi, Nakagawa, Inoue, & Kusumi, 2015; Eshel & Roiser, 2010; Huys, Pizzagalli, Bogdan, & Dayan, 2013; Safra, Chevallier, & Palminteri, 2019; Whitton et al., 2016). This hypothesis implies that subjects with depression should display reduced reward sensitivity both at the behavioral and neural levels in value-based learning. In the long term, a better understanding of these processes could help the prevention and management of depression.

Following up on this assumption, numerous studies have tried to identify and characterize such reinforcement learning deficits; however, the results have been mixed so far. Indeed, while some studies did find evidence of blunted reward learning and reward-related signals in the brain, others indicate limited or no effect (Brolsma et al., 2022; Chung et al., 2017; Hägele et al., 2015; Rothkirch, Tonn, Köhler, & Sterzer, 2017; Rutledge et al., 2017; Shah, O'carroll, Rogers, Moffoot, & Ebmeier, 1999). Outside the learning domain, other recent studies showed no disrupted valuation during decision-making under risk (Chung et al., 2017; Moutoussis et al., 2018).
It is also worth noting that many previous studies identifying value-related deficits in depression included only one valence domain (i.e., only rewards or only punishments) and did not directly contrast rewards and punishments, nor separate the two valence domains in different experimental sessions (Admon & Pizzagalli, 2015; Elliott et al., 1996; Elliott, Sahakian, Herrod, Robbins, & Paykel, 1997; Forbes & Dahl, 2012; Gradin et al., 2011; Kumar et al., 2008; Pizzagalli, 2014; Vrieze et al., 2013; Zhang, Chang, Guo, Zhang, & Wang, 2013). A recent study (Pike & Robinson, 2022), in which reward and punishment sensitivity was computationally quantified by assuming different learning rate parameters for positive and negative outcomes, showed that, contrary to what is generally found in healthy subjects (Chambon et al., 2020; Palminteri, Lefebvre, Kilford, & Blakemore, 2017), patients' behaviour is generally better explained by assuming reduced sensitivity to negative outcomes.

Here we speculate that the lack of concordant results may be partly explained by the fact that reinforcement learning impairment in depression depends on the overall value of the learning context. In fact, computational studies clearly illustrate that the behavioral consequences of blunted reward and punishment sensitivity depend on the underlying distribution of outcomes. More specifically, Cazé and van der Meer (2013) showed that greater sensitivity to reward compared to punishment (a positivity bias, as proxied by different learning rates; Pike & Robinson, 2022) advantages learning in contexts with poor overall reward expectation (i.e., 'poor' contexts) compared to those with high overall reward expectation ('rich' contexts). Conversely, greater sensitivity to punishment compared to reward (a negativity bias) should advantage learning in 'rich' contexts. As a consequence, if depressive patients present blunted reward compared to punishment sensitivity (i.e., a negativity bias), this should induce a difference in performance specifically in 'poor' contexts, where displaying a positivity bias is optimal (a worked example of this logic is sketched at the end of this section).

To test this hypothesis, we adapted a standard protocol composed of a learning phase and a post-learning transfer phase. The learning phase included two different contexts: one defined as 'rich' (in which the two options have an overall positive expected value) and the other as 'poor' (two options with an overall negative expected value). In contrast with the learning phase, there was no feedback in the transfer phase, in order to probe the subjective values of the options without modifying them (Bavard, Lebreton, Khamassi, Coricelli, & Palminteri, 2018; Frank, Seeberger, & O'Reilly, 2004; Palminteri, Khamassi, Joffily, & Coricelli, 2015). In similar tasks, healthy subjects are generally reported to learn equally well from rewards and punishments (Palminteri et al., 2015; Pessiglione, Seymour, Flandin, Dolan, & Frith, 2006). However, based on the idea that depression blunts reward sensitivity and that a positivity bias is advantageous in 'poor' contexts, we expected a learning asymmetry in MDD patients. More precisely, learning rate differences should induce lower performance in the 'poor' context in MDD patients. In addition to choice data, we also analyzed reaction times and outcome observation times as ancillary measures of attention and performance. Previous findings suggest that negative value contexts are associated with overall slower responses (Fontanesi, Gluth, Spektor, & Rieskamp, 2019a; Fontanesi, Palminteri, & Lebreton, 2019b). However, previous studies did not find any specific reaction time signatures in patients (Brolsma et al., 2021; Chase et al., 2010; Douglas, Porter, Frampton, Gallagher, & Young, 2009; Knutson, Bhanji, Cooney, Atlas, & Gotlib, 2008).
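To make this intuition concrete, consider the fixed point of a valence-biased update rule for a single option that pays +1 with probability p and −1 otherwise. This derivation is an illustrative sketch added here, following the logic of Cazé and van der Meer (2013), not an analysis from the study itself. Setting the expected update to zero:

$$\mathbb{E}[\Delta Q] = p\,\alpha^{+}(1 - Q) - (1 - p)\,\alpha^{-}(1 + Q) = 0 \quad\Rightarrow\quad Q^{*} = \frac{p\,\alpha^{+} - (1 - p)\,\alpha^{-}}{p\,\alpha^{+} + (1 - p)\,\alpha^{-}}$$

With a positivity bias (say α+ = 0.6, α− = 0.3), options rewarded with probability p = 0.1 and p = 0.4 converge to Q* ≈ −0.64 and +0.14, a gap of about 0.78; with the bias reversed (α+ = 0.3, α− = 0.6), the same options converge to Q* ≈ −0.90 and −0.50, a gap of about 0.39 that a noisy (softmax) decision rule discriminates far less reliably. This is the sense in which a negativity bias should selectively penalize learning in 'poor' contexts.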
Methods

Participants and inclusion criteria

Fifty-six subjects were recruited in a clinical center (the Ginette Amado psychiatric crisis center) in Paris between May 2016 and July 2017. Inclusion criteria were a diagnosis of major unipolar depression established by a psychiatrist and an age between 18 and 65 years (see Table 1). A clear, oral and written explanation was also delivered to all participants. All procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008.

Table 1. Descriptive statistics for age, gender, education, usual optimism (LOT-R: Life Orientation Test – Revised), current optimism, depression scores (BDI: Beck Depression Inventory) and number of major depressive episodes (MDE)

                     Patients          Controls          Significance
Gender (% female)    30 (53.33)        26 (61.53)        p = 0.54
Age (mean ± sem)     36.5 ± 2.80       40.35 ± 2.09      p = 0.28
Education            1.97 ± 0.24       2.42 ± 0.21       p = 0.12
Usual optimism       5.98 ± 0.42       7.16 ± 0.30       p = 0.03
Current optimism     2.38 ± 0.40       7.46 ± 0.29       p = 4.19 × 10−14
LOT-R                9.1 ± 0.79        16 ± 0.49         p = 1.76 × 10−9
BDI                  29.37 ± 0.22      –                 –
Previous MDE         1.8 ± 0.38        –                 –

Education: years after graduation. For each sample, the mean of each variable is presented with its standard error of the mean.

In total, we tested N = 30 patients undergoing a Major Depressive Episode (MDE) and N = 26 age-, gender- and socio-economically matched controls. For patients, exclusion criteria were the presence of psychotic symptoms or a diagnosis of chronic psychosis, severe personality disorder, neurological or any somatic disease that might cause cognitive alterations, neuroleptic treatment, electro-convulsive therapy in the past 12 months and current substance use. Psychiatric co-morbidities were established by a clinician in a semi-structured interview based on the Mini International Neuropsychiatric Interview (MINI) (Sheehan et al., 1998). In our final sample, some patients (n = 13) presented anxiety-related disorders. Among them, some (n = 6) presented isolated anxiety-related disorders (social anxiety n = 2; panic disorder n = 2; agoraphobia n = 1; claustrophobia n = 1) and the rest of the group (n = 7) presented several associated anxiety-related disorders (agoraphobia n = 4; panic disorder n = 4; social anxiety n = 3; generalized anxiety n = 3; OCD n = 1; PTSD n = 1). Others (n = 8) presented substance abuse disorder (cannabis n = 3; alcohol n = 4; cocaine n = 2). All patients were taking medication (see Table 2 for details).

Table 2. Patients' treatments

Medication                    Number of patients
SSRI                          22
Benzodiazepine                21
Tricyclic antidepressant      2
Tetracyclic antidepressant    1
Phenothiazine                 2
Corticosteroids               1
Others                        2

'SSRI': selective serotonin reuptake inhibitor; 'others': anti-arrhythmic agent or vitamins.

Participants included in the healthy volunteer group had no past or present psychiatric diagnosis and were not taking any psychoactive treatment.

Behavioral testing

Patients volunteering to take part in the experiment were welcomed in a calm office away from the center's activity, where they were given information about the aim and the procedure of the study. The study was verbally described as an evaluation of cognitive functions through a computer 'game'. The diagnosis of MDE and the presence of psychiatric co-morbidities were assessed with the MINI, completed in a semi-structured interview with a psychiatrist.
The subjects were then asked to complete several questionnaires assessing their level of optimism [Life Orientation Test – Revised (LOT-R)], an optimism analog scale (created for this study to contrast usual and current levels of optimism) and the severity of depression (Beck Depression Inventory – II) (Beck, Steer, Ball, & Ranieri, 1996). The participants were told they were going to play a simple computer game, whose goal was to earn as many points as possible. Written instructions were provided and verbally reformulated if necessary. There was no monetary compensation, as patients did the task alongside a psychiatric assessment. To match patients' conditions, controls did not receive any compensation either.

As in previous studies of reinforcement learning, the behavioral protocol was divided into a learning phase and a transfer phase (Chase et al., 2010; Frank et al., 2004; Palminteri & Pessiglione, 2017) (Fig. 1a). Options were materialized by abstract symbols (agathodaimon font), displayed in pairs on a black screen. During the learning phase, options were presented in fixed pairs, while during the transfer phase they were presented in all possible combinations (Fig. 1b). Beforehand, subjects were told that one of the two options was more advantageous than the other and were encouraged to identify it to maximize their (fictive) reward. Each symbol was associated with a fixed reward probability. The reward probability attached to each symbol was never explicitly given, and the subjects had to learn it through trial and error. Reward probabilities were inspired by previous empirical and theoretical studies (Cazé & van der Meer, 2013; Chambon et al., 2020; Palminteri & Pessiglione, 2017) and distributed across symbols as follows: 10%/40% ('poor' context) and 60%/90% ('rich' context). The reward probabilities were chosen so as to have the same choice difficulty (as indexed by the difference in expected value between the two options) across choice contexts. The learning phase was divided into two sessions of 100 trials each (each involving both the 'rich' and the 'poor' context, repeated for 50 trials). In the transfer phase, the eight different symbols were presented in all binary combinations four times (including pairings that had never been displayed together in the previous phase; 112 trials). The subjects had to choose which symbol was deemed the more rewarding; however, in the transfer phase, no feedback was provided in order not to interfere with subjects' final estimates of option values (Chase et al., 2010; Frank et al., 2004; Palminteri & Pessiglione, 2017). The subjects were told to use instinct when in doubt. The aim of the transfer phase was to assess the participants' learning on a longer time scale than the learning phase, which is supposed to mainly rely on working memory (Collins & Frank, 2012). The transfer phase also assessed the capacity to remember and extrapolate the symbols' subjective values out of their initial context (generalization). When the symbols appeared on the screen, subjects had to choose between the two symbols by pushing a right or a left key on a keyboard. In rewarded/punished trials, respectively, a green smiley/red sad face and '+1pts'/'−1pts' appeared on screen. In order to make sure that the subjects paid attention to the feedback, they had to push the up key after a win and the down key after a loss to move to the next trial (Fig. 1c; top). Trials in the transfer phase differed in that the feedback was not displayed (Fig. 1c; bottom).
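As a quick check of the difficulty-matching claim above (assuming, as described, outcomes of +1 and −1 point), the expected value of an option rewarded with probability p is

$$EV = p \times (+1) + (1 - p) \times (-1) = 2p - 1,$$

so the 'rich' options are worth +0.8 and +0.2 (both positive) and the 'poor' options −0.2 and −0.8 (both negative), with an identical expected-value difference of 0.6 in each context.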
Fig. 1. Experimental methods. (a) Time course of the experiment: after written instructions, the experiment started with a short training (20 trials) using different stimuli (letters). The training was followed by two learning sessions, each with 4 different stimuli arranged in fixed pairs. Each pair was presented 50 times, leading to 200 trials in total. After the last session, participants were administered a transfer phase where all stimuli from the learning sessions were presented in all possible combinations. All pair-wise combinations (28) were presented 4 times, leading to 112 trials in total. (b) Option pairs. Each learning session featured two fixed pairs of options (contexts), characterized by different outcome values: a 'rich' one with an overall positive expected value (the optimal option with a 0.9 probability of reward) and a 'poor' context (the optimal option with a 0.4 probability of reward). The two contexts were presented in an interleaved manner during the learning phase. In the transfer phase all 8 symbols from the learning phase (2 symbols × 2 contexts × 2 learning sessions) were presented in every possible combination. Gray boxes indicate the comparisons between options with the same value (e.g., A v. A'), which were not included in the statistical analysis of the transfer phase (because there is no correct response). (c) Successive screens in the learning phase (top) and the transfer phase (bottom). Durations are given in milliseconds.

Dependent variables

The main behavioral variables of our study are the correct choice rates, as measured in the learning and the transfer phase. A choice is defined as 'correct' (coded as '1') if the participant picks the reward-maximizing option, and incorrect (coded as '0') otherwise. In the learning phase, the correct choice is therefore picking 'A' in the 'rich' context and 'B' in the 'poor' context (Fig. 1b). For display purposes, the learning curves were smoothed (five-trial sliding average) (Fig. 2a). In the transfer phase, the correct choice was defined on a trial-by-trial basis and depended on the particular combination presented (note that in some trials a correct choice could not be defined, as the comparison involved two symbols with the same value, originally presented in different sessions) (Fig. 1b). For display purposes, concerning the transfer phase, we also considered the choice rate, defined as the number of times a given option was chosen divided by the number of times it was presented (calculated across all possible combinations except those between same-valued options) (Fig. 2b). As ancillary exploratory dependent variables, we also looked at two different measures of response times. More precisely, we extracted the reaction times (i.e., the time spent between symbol onset and choice; Fig. 4a) and the outcome observation time (i.e., the time spent between outcome onset and the key press to move to the next trial; Fig. 4b). For display purposes, response time curves were also smoothed (five-trial sliding average).
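A minimal sketch of how these display variables could be computed in R follows (the data frames `learning` and `transfer` and their column names are assumptions for illustration; the paper's own analysis scripts are in the linked repository):

```r
# Sketch: 5-trial sliding average for display, and transfer-phase choice rates.
# Assumed columns: learning has subject, context, trial, correct (0/1);
# transfer has subject, option, chosen (0/1) and a same-value-pair flag.
library(dplyr)

# centered 5-trial moving average, as used to smooth the curves for display
slide5 <- function(x) as.numeric(stats::filter(x, rep(1 / 5, 5), sides = 2))

learning_curves <- learning |>
  group_by(context, trial) |>
  summarise(acc = mean(correct), .groups = "drop") |>
  group_by(context) |>
  mutate(acc_smooth = slide5(acc))

# choice rate = times chosen / times presented, excluding A v. A'-type pairs
choice_rates <- transfer |>
  filter(!same_value_pair) |>
  group_by(subject, option) |>
  summarise(choice_rate = mean(chosen), .groups = "drop")
```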
Statistical analyses

The dependent variables were analyzed using Generalized Linear Mixed Models (GLMM) as implemented by the function glmer of the software R [R version 3.6.3 (2020-02-29); R Core Team (2022)] and the package lme4 [lme4 version 1.1-27.1; Bates, Mächler, Bolker, & Walker (2015)]. The GLMMs of correct choice rates (both in the learning and the transfer phase) used a binomial linking function, while those of response times (both reaction times and outcome observation times) used a gamma linking function (Yu et al., 2022). All GLMMs were similarly constructed and included 'subject' number as a random effect and 'group' (between-subject variable: controls v. patients), 'context' (within-subject variable) and the interaction between the two as fixed effects. For dependent variables extracted from the learning phase, the 'context' within-subject variable corresponded to whether the measure was taken from the 'rich' or the 'poor' context. In the GLMM of the correct choice rate in the transfer phase, the 'context' variable took three levels that corresponded to whether the choice under consideration involved the best possible option of the 'rich' condition ('A present'), whether it involved the worst possible option of the 'poor' condition ('D present'), or neither ('other') (see Fig. 1b). Post hoc comparisons were assessed by comparing the marginal means of the contrast of interest to zero. All p values are reported after Tukey's correction for multiple comparisons.
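A sketch of these models in lme4 syntax is given below (the data frame and column names are assumed, as are the gamma link and the use of car/emmeans for the chi-square tests and Tukey-corrected marginal means; the paper names the glmer models but not these implementation details):

```r
library(lme4)
library(car)      # assumed here for Wald chi-square tests of fixed effects
library(emmeans)  # assumed here for marginal-means post hoc contrasts

# Correct choices: binomial GLMM, random intercept per subject
m_acc <- glmer(correct ~ group * context + (1 | subject),
               data = learning, family = binomial)

# Response times: gamma GLMM (the link is our assumption)
m_rt <- glmer(rt ~ group * context + (1 | subject),
              data = learning, family = Gamma(link = "log"))

Anova(m_acc)                                            # chi-square tests
emmeans(m_acc, pairwise ~ context | group, adjust = "tukey")  # post hocs
```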
Model fitting and model simulations

To link the behavioral performance in our task to computational processes, we performed simulations. More specifically, to assess the behavioral consequences of learning rate biases, we simulated a variant of a standard cognitive model of reinforcement learning. The model assumes that subjective option values (Q values) are learnt from reward prediction errors (RPE) that quantify the difference between expected and obtained outcomes (Sutton & Barto, 2018). In this model, Q values are calculated for each combination of states (s; in our task, the four contexts; Fig. 1b) and actions (a; in our task, the symbols). Most of these models assume that subjective option values are updated following a Rescorla-Wagner rule (Rescorla & Wagner, 1972). However, to assess the behavioral consequences of a positivity and a negativity bias, based on previous studies (Chambon et al., 2020; Frank, Moustafa, Haughey, Curran, & Hutchison, 2007; Niv, Edlund, Dayan, & O'Doherty, 2012), we modified the standard model by including different learning rates for positive and negative prediction errors (which in our design correspond to positive and negative outcomes):

$$Q(s,a) \leftarrow Q(s,a) + \begin{cases} \alpha^{+} \times (r - Q(s,a)) & \text{if } r > 0 \\ \alpha^{-} \times (r - Q(s,a)) & \text{if } r < 0 \end{cases}$$

The model decision rule was implemented as a softmax function that calculates the probability of choosing a given option as a function of the difference between the Q values of the two options, as follows:

$$P_t(s,a) = \frac{1}{1 + e^{\left(Q_t(s,b) - Q_t(s,a)\right)/\beta}}$$

To assess the effect of the positivity and negativity bias on learning performance in our task, we ran extensive model simulations where artificial agents played our learning task (i.e., a 'rich' and a 'poor' context, for 50 trials each). More specifically, we simulated two different sets of learning rates (1000 virtual agents each). One set represented agents with a positivity bias (i.e., α+ > α−), and the other set agents with a negativity bias (α+ < α−) (Cazé & van der Meer, 2013). The values of the parameters (learning rates and temperature) were randomly drawn from uniform distributions: the temperature was drawn from β ∈ U(0, 1) and the learning rates (for example, in the positivity bias case) were drawn from α+ ∈ U(0, 1) and α− ∈ U(0, α+) (the opposite was true for the negativity bias case).

After running the simulations, we also fitted the empirical data. More specifically, we focused on fitting the transfer phase choices, because this allows the estimation of learning rates involved in long-term learning, whose estimation is not contaminated by working memory or choice perseveration biases (Collins & Frank, 2012; Frank et al., 2007; Katahira, Yuki, & Okanoya, 2017). The model free parameters (temperature and learning rates) were fitted at the individual level using the fmincon function (Optimization Toolbox, MATLAB R2021b, 9.11.0.1809720; The MathWorks, Inc., Natick, Massachusetts) via log model evidence maximization, as previously described (Daw, Gershman, Seymour, Dayan, & Dolan, 2011; Wilson & Collins, 2019).
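The following R sketch reproduces the gist of these simulations under the stated assumptions (an illustrative re-implementation, not the authors' MATLAB code; all function and variable names are invented):

```r
# One agent plays one context (two options with reward probabilities p)
# for n_trials; outcomes are +1/-1; option 1 is the optimal one.
simulate_agent <- function(a_pos, a_neg, beta, p = c(0.9, 0.6), n_trials = 50) {
  Q <- c(0, 0)
  correct <- numeric(n_trials)
  for (t in seq_len(n_trials)) {
    p1 <- 1 / (1 + exp((Q[2] - Q[1]) / beta))   # softmax on the Q-value difference
    ch <- if (runif(1) < p1) 1 else 2
    r  <- if (runif(1) < p[ch]) 1 else -1       # probabilistic +1/-1 outcome
    lr <- if (r > 0) a_pos else a_neg           # valence-dependent learning rate
    Q[ch] <- Q[ch] + lr * (r - Q[ch])           # Rescorla-Wagner update
    correct[t] <- as.numeric(ch == 1)
  }
  mean(correct)
}

# Positivity-bias population: alpha+ ~ U(0,1), alpha- ~ U(0, alpha+), beta ~ U(0,1)
set.seed(1)
pos_bias <- replicate(1000, {
  a_pos <- runif(1); a_neg <- runif(1, 0, a_pos); beta <- runif(1)
  c(rich = simulate_agent(a_pos, a_neg, beta, p = c(0.9, 0.6)),
    poor = simulate_agent(a_pos, a_neg, beta, p = c(0.4, 0.1)))
})
rowMeans(pos_bias)  # mean accuracy per context; swap the two draws for a negativity bias
```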
Results

Demographics

Patients and controls were matched in age (t(51) = −1.1, p = 0.28), gender (t(53) = 1.15, p = 0.29) and years of education (t(54) = −1.59, p = 0.12). Concerning the optimism measures, patients with depression were found to be less optimistic on all scales (LOT-R: t(47) = −7.42, p = 1.76 × 10−9; usual optimism: t(51) = −2.29, p = 0.03; current optimism: t(50) = −10.34, p = 4.19 × 10−14). Furthermore, the comparison between usual and current optimism in patients and controls revealed that only patients were significantly less optimistic than usual at the moment of the test (patients: t(29) = 8.26, p = 4.21 × 10−9; controls: t(25) = −1.53, p = 0.14), consistent with the fact that they were undergoing an MDE. All patients were taking at least one psychotropic medication at the moment of the test. Their average BDI was 29.37 and they had, on average, 1.8 previous MDEs.

Learning phase results

Fig. 2. Choice data. (a) 'Correct choice rate' is the probability of picking the most rewarding option. Thick lines represent the smoothed running average (5-trial sliding average) and shaded areas the standard error of the mean. The violet dots correspond to trials displaying a significant difference among contexts (p < 0.05; calculated on the raw, unsmoothed data points). (b) 'Choice rate' is the probability of picking a given symbol in any given choice pair. The choice rates are averaged across symbols belonging to the first and second session (in Fig. 1, denoted A and A', respectively). Areas represent probability density functions. Boxes represent confidence intervals (95%) and dots represent individual subjects.

Global inspection of the learning curves (Fig. 2a) suggests that, overall, participants were able to learn to respond correctly. Indeed, all the learning curves are above chance, whatever the group or the context. A more detailed inspection reveals that controls' learning curves were unaffected by the choice context ('rich' v. 'poor'), while patients' learning curves differed depending on the choice context (with a lower correct response rate in the 'poor' context). The correct response rate (as proxied by the intercept of our GLMM) in the learning phase (Fig. 2a) indicated that overall performance was significantly above chance (χ2(1, 56) = 16.17, p < 0.001), reflecting the fact that accuracy was, on average, well above chance level (0.5). There was no significant effect of context (χ2(1, 56) = 0.046, p = 0.83) and no main effect of group (χ2(1, 56) = 2.86, p = 0.091), meaning that there were no overall significant differences between patients and controls or between the 'rich' and 'poor' contexts. However, there was a significant interaction between context and group (χ2(1, 56) = 5.88, p = 0.015). Post hoc tests indicated that this interaction was driven by an effect of context present in patients (slope = −0.72, S.E. = 0.24, p = 0.0027), but not in controls (slope = −0.063, S.E. = 0.29, p = 0.83). These results therefore show a specific impact of the context on the two groups: patients displayed higher accuracy in the 'rich' compared to the 'poor' context, while controls were not affected by this factor, as expected from previous articles in the literature (Palminteri et al., 2015; Pessiglione et al., 2006).

Critically, the learning phase results cannot establish whether the performance asymmetry observed in patients stems from the learning process (i.e., how values are updated) or from a decision effect (i.e., how options are selected). To tease apart these interpretations, we turned to the analysis of the transfer phase performance.

Transfer phase analysis

Visual inspection of the option-by-option choice rate in the transfer phase showed that subjects were able to retrieve the values of the options and express meaningful preferences among them (Fig. 2b).
In fact, in both groups, the options 'A' (overall highest value) were chosen much more frequently than the options 'D' (overall lowest value). Intermediate value options ('B' and 'C') scored in between the extreme ones (a pattern reminiscent of relative value encoding; Klein, Ullsperger, & Jocham, 2017; Palminteri & Lebreton, 2021).

Before assessing whether the learning asymmetry observed in patients in the learning phase replicated in the transfer phase, one has to keep in mind that there were no longer fixed choice contexts in the transfer phase: options were presented in all possible combinations. Accordingly, the context factor used for the transfer phase contained three levels, defined by the presence of particular options: (1) trials involving the 'A' options (and not 'D'); (2) trials involving the 'D' options (and not 'A'); (3) other trials.

In the transfer phase as well, the average correct response rate (as proxied by the intercept of our GLMM) shows that overall performance was significantly above chance (χ2(1, 56) = 15.9, p < 0.001). We also found a significant effect of group (χ2(1, 56) = 6.83, p = 0.009), no effect of context (χ2(1, 56) = 2.23, p = 0.327) and a very strong and significant group-by-context interaction (χ2(1, 56) = 53.21, p < 0.001). Post hoc tests revealed that controls were equally able to make the correct decision in contexts involving seeking 'A' and those involving avoiding 'D' (slope = −0.004, S.E. = 0.1, p = 0.999), whereas patients were strikingly better at seeking 'A' than avoiding 'D' (slope = 1.06, S.E. = 0.1, p < 0.001).

These results are consistent with the learning phase results. The context-specific asymmetry in patients that we found in the learning phase was also present in the transfer phase, where all the different options were extracted from their initial context and paired with other options. This allows us to conclude that the performance asymmetry can be traced back to a learning asymmetry, whereby negative outcomes (more frequent following the worst possible option 'D') seem to exert a smaller effect on patients' learning performance than positive ones (more frequent following the best possible option 'A') (Frank et al., 2004).

Modelling results

Fig. 3. Model-based results. (a) The panels depict the results of model simulations where agents are represented by a two-learning-rates model, featuring either a positivity or a negativity bias (N = 1000 virtual subjects per group; see Methods for more details about the simulations). The leftmost panel (green) shows the simulations of agents displaying a positivity bias, while the rightmost panel (orange) displays the simulations of agents displaying a negativity bias. Thick lines represent the smoothed running average (5-trial sliding average) and shaded areas the standard error of the mean. (b) The panels represent learning rates for positive (green) and negative (red) prediction errors, separately for healthy controls (leftmost panel) and patients (rightmost panel). Areas represent probability density functions. Boxes represent confidence intervals (95%) and dots represent individual subjects.

Model simulations indicate that learning biases affect performance in a context-dependent manner (Fig. 3a). More specifically, in our task, a positivity bias (α+ > α−) is associated with similar accuracy in the 'rich' and 'poor' contexts, while a negativity bias (α+ < α−) is associated with much higher accuracy in the 'rich' compared to the 'poor' context. The reason for this result can be traced back to the idea that it is rational to preferentially learn from rare outcomes (Cazé & van der Meer, 2013). The 'positivity bias' behavioral pattern closely resembles what we observed in healthy participants, while the 'negativity bias' pattern closely resembles the one observed in patients, thus suggesting that patients' behavior is better explained by an exacerbated sensitivity to negative outcomes. To formally substantiate this intuition, we submitted the learning rates fitted from transfer phase choices to a 2 × 2 ANOVA, with group (patients v. controls) and valence (positive or negative learning rate) as between- and within-subject variables, respectively (Fig. 3b).
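In R, such an analysis could be sketched as follows (a long-format data frame `lr` with columns subject, group, valence and alpha, and a data frame `temps` with the fitted temperatures, are assumptions; the paper does not specify its ANOVA implementation):

```r
# 2 x 2 mixed ANOVA on the fitted learning rates:
# group is between-subject, valence (alpha+ v. alpha-) is within-subject
lr$subject <- factor(lr$subject)
summary(aov(alpha ~ group * valence + Error(subject / valence), data = lr))

# Between-group comparison of the fitted choice temperatures
t.test(beta ~ group, data = temps)
```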
The results showed a main effect of group [F(1, 107) = 5.26, p = 0.024; η2 (partial) = 0.05, 95% CI (3.37 × 10−3, 1.00)], no main effect of valence [F(1, 107) = 3.27 × 10−3, p = 0.954; η2 (partial) = 3.06 × 10−5, 95% CI (0.00, 1.00)], and, crucially, a significant valence-by-group interaction [F(1, 107) = 7.58, p = 0.007; η2 (partial) = 0.07, 95% CI (0.01, 1.00)]. Finally, we detected no significant difference in the choice temperature (t(48) = 1.64, p = 0.11).

Response time analysis

Fig. 4. Response times. (a) 'Reaction time' is the time separating the option onset from the moment the participant selects one of the two options. Trials are grouped by condition and averaged across sessions. Durations are given in milliseconds. Thick lines represent the smoothed running average (5-trial sliding average) and shaded areas the standard error of the mean. The violet dots correspond to trials displaying a significant difference among conditions (p < 0.05; calculated on the raw, unsmoothed data points). (b) 'Outcome observation time' is the time separating the outcome onset from the moment the participant confirms the outcome to move to the subsequent trial. Legend as in (a).

As an exploratory analysis, to assess how learning performance was reflected in response times (at both the decision and the learning stage), we looked at reaction and outcome observation times during the learning phase. Reaction times (defined as the difference between stimulus onset and the button press making a decision) showed a main effect of the context (χ2(1, 56) = 9.83, p = 0.002), with reaction times being higher in the 'poor' compared to the 'rich' condition, which is consistent with previous studies showing valence-induced slowing in reinforcement learning (Fontanesi et al., 2019b; Fig. 4a). Reaction times showed no significant main effect of the group (χ2(1, 56) = 0.03, p = 0.86) nor an interaction between context and group (χ2(1, 56) = 0.12, p = 0.73). Post hoc tests showed that the effect of context was significant in both controls (slope = 0.047, S.E. = 0.016, p < 0.003) and patients (slope = −0.043, S.E. = 0.0067, p < 0.001). Outcome observation time (defined as the difference between the outcome onset and the button press moving to the next trial) displayed no significant effect of the context (χ2(1, 56) = 10.39, p = 0.123), no effect of the group (χ2(1, 56) = 2.17, p = 0.14) and no interaction (χ2(1, 56) = 0.39, p = 0.53) (Fig. 4b).

Taken together, the reaction and outcome observation time analyses suggest that the learning performance asymmetry in patients cannot be accounted for by reduced engagement or outcome processing during the learning task.

Discussion

In the present study, we assessed reinforcement learning with a behavioral paradigm involving two different reward contexts – one 'rich' with an overall positive expected value and one 'poor' with an overall negative expected value – in patients undergoing a major depressive episode and age-, gender- and education-matched healthy volunteers. Consistent with previous studies, healthy subjects learned equally well in both contexts (Palminteri & Pessiglione, 2017). Patients with depression, on the other hand, displayed a reduced correct response rate in the 'poor' context. This context-dependent learning asymmetry found in the learning phase was confirmed in the analysis of the transfer phase, where subjects were asked to retrieve and generalize the values learned during the learning sessions.

In standard reinforcement learning tasks, a participant has to learn the value of the options and select among them. A deficit in reinforcement learning can therefore arise from two possible causes. On one hand, it can be caused by a learning impairment, i.e., failing to accurately update the value of the stimulus. On the other hand, it can be the result of a decision impairment.
In this scenario, a participant could still end up selecting the wrong stimulus even though the learning process itself is intact. Our design, coupling a learning phase with feedback and a transfer phase where we shuffled all options without any feedback, allows us to separate these two possible sources of error. Indeed, a decision-related problem would lead to a specific impairment during the learning phase, but in the transfer phase there should be no impairment, or only an unspecific one. Conversely, a valence-specific update-related deficit would originate in the learning phase (when feedback is provided) and would therefore propagate to the transfer phase, remaining associated with the specific options concerned (Frank et al., 2007). Our results are consistent with this second scenario, as we showed that patients were less able to identify the correct response of the 'poor' context in both the learning and the transfer phase. Hence, this suggests that the asymmetrical performance observed in patients stems from the learning process per se and not from the decision process. We therefore suppose that this asymmetric learning pattern is the consequence of a more complex mechanism, embedded in the learning process and triggered by affectively negative situations or less frequent affectively positive situations (the 'poor' context).

Our results suggest that learning performance in depression depends on the valence of the context. More specifically, patients undergoing a major depressive episode seem to perform worse at learning in a negative value context compared to a positive one. This was true despite the fact that the two contexts were matched in difficulty. Control participants, on the contrary, showed no difference in performance between the two contexts.

Prima facie, this observation challenges some formulations of the negativity bias hypothesis described in the literature. Some studies describe negative affective biases in several cognitive processes, such as emotion, memory and perception, as an increased and aberrant saliency of negative affective stimuli (for review, see Gotlib and Joormann, 2010; Joormann and Quinn, 2014). From this view, one could extrapolate that, contrary to what we observed in our data, MDD patients should display, if anything, higher performance in the 'poor' contexts. This prediction contrasts with a computational definition of negativity bias as a difference between learning rates for positive and negative outcomes (or reward prediction errors). In fact, model simulation studies clearly show that positivity or negativity learning biases affect performance in a context-dependent manner, which in our case is consistent with the idea of a negativity bias in depression (Bavard & Théro, 2018; Cazé & van der Meer, 2013). These results were confirmed by model simulations and by the analysis of learning rates fitted from transfer phase choices; even if it is hard to find a systematic pattern in the literature, this is consistent with the recent computational meta-analysis by Pike and colleagues (Beck, 1987; Brolsma et al., 2022; Chase et al., 2010; Eshel & Roiser, 2010; Gradin et al., 2011; Henriques et al., 1994; Huys et al., 2013; Knutson et al., 2008; Kumar et al., 2008; Murphy, Michael, Robbins, & Sahakian, 2003; Pike & Robinson, 2022; Pizzagalli, Jahn, & O'Shea, 2005; Steele, Kumar, & Ebmeier, 2007; Ubl et al., 2015; Whitton et al., 2016).
Crucially, and consistent with our simulations, the overall good performance of patients, more specifically in the 'rich' context, indicates that patients displayed no generic impairment. Overall good performance of patients in some control conditions is actually not uncommon and can be explained by the fact that patients are in general more focused and more involved than controls in this type of study (the so-called Hawthorne effect), because the result of the experiment is much more 'meaningful' for them than it is for controls (Frank et al., 2004).

In addition to choice data, in our study we collected two different response time measures. The first one, reaction time, was classically defined as the time between stimulus onset and the choice button press. Reaction times were not different between our groups of participants, indicating that our experiment provided no support for the idea of a generalized sensorimotor slowing in patients (Byrne, 1976). On the other hand, reaction times were strongly affected by the experimental condition, being significantly slower in the 'poor' context in both groups. This finding is at apparent odds with the fact that objective difficulty (as quantified by the difference in value between the two options) was matched across contexts (note that this effect was also present in healthy controls, who displayed equal performance in both conditions). However, slower reaction times in the 'poor' context are consistent with recent findings (Fontanesi et al., 2019b). Indeed, previous studies coupling diffusion decision model analyses with reinforcement learning paradigms indicate that reaction times tend to be slower in negative valence contexts compared to positive valence ones. This effect is well captured by a combination of increased non-decision time (a possible manifestation of Pavlovian-to-instrumental transfer; Guitart-Masip et al., 2012) and increased cautiousness (a possible manifestation of loss attention; Yechiam & Hochman, 2014). We also recorded the outcome observation times, which quantify the time separating the onset of the outcome from the button press necessary to move to the subsequent trial. Overall, outcome observation times were not significantly modulated by our factors, indicating that the learning asymmetry observed in patients cannot be explained by a failure to process outcome information.

Our study, of course, suffers from a few important limitations. One limitation is the relatively small sample size, due to the fact that our study was monocentric and ran for a relatively short time period. We note, however, that several meaningful insights concerning the impairment of reinforcement learning in psychiatric diseases have been obtained until very recently from studies with sample sizes comparable to ours (Chase et al., 2010; Frank et al., 2004; Henriques & Davidson, 2000; Huys et al., 2016; Moutoussis et al., 2018; Murphy et al., 2003; Rothkirch et al., 2017; Rupprechter, Stankevicius, Huys, Steele, & Seriès, 2018). Future multi-centric studies will be required to overcome this issue and probe the replicability and generalizability of our findings. Furthermore, by openly sharing our data, our study may contribute to (computational) meta-analyses (Pike & Robinson, 2022). Another limitation of our study is that patients were medicated at the time of the experiment.
Even though studies have found effects on performance in both medicated and unmedicated patients (Douglas et al., 2009; Steele et al., 2007), it is always difficult to control for this effect, especially when certain patients take medications for other comorbidities. Additionally, the role of serotonin in reward and punishment learning is far from being understood (Palminteri & Pessiglione, 2017). In some tasks, it has been shown to improve performance in a valence-independent manner, making it unlikely that the observed effect was a consequence of medication (Palminteri, Clair, Mallet, & Pessiglione, 2012). Indeed, under the theory that serotonin drives punishment avoidance learning, we would expect the opposite effect. Finally, as MDD is a polysemic condition, and even though we tried to monitor and control the inclusion of patients to avoid interference with other mental conditions, some patients had other symptoms, especially addictive disorders, that should be considered in future studies.

In the literature, it has been repeatedly shown that controls perform equally well whether they have to choose a reward or avoid a punishment. It is also frequent that patients with mental or neurological disorders other than MDD show imbalanced behavior in tasks involving reward selection and punishment avoidance (Frank et al., 2004). Studying several aspects of reward processing that correspond to different neurobiological circuits, and exploring their dysregulation across different psychiatric disorders, could be a very efficient way to unfold abnormalities in reward-related decision making. It could be interesting to apply our task to other psychiatric disorders in order to identify neurobiological signatures and develop more targeted and promising treatments (Brolsma et al., 2022; Insel et al., 2010; Whitton, Treadway, & Pizzagalli, 2015).

Data. Data collected for this paper, an R script reproducing the main figures of the paper, as well as some Matlab simulation files, are available here: https://github.com/hrl-team/Data_depression.

Acknowledgements. We thank Magdalena Soukupova for her bright insights on statistical analysis. HV is supported by the Institut de Recherche en Santé Publique (IRESP, grant number: 20II171-00). SP is supported by the Institut de Recherche en Santé Publique (IRESP, grant number: 20II138-00) and the Agence Nationale de la Recherche (CogFinAgent: ANR-21-CE23-0002-02; RELATIVE: ANR-21-CE37-0008-01; RANGE: ANR-21-CE28-0024-01). The Département d'études cognitives is funded by the Agence Nationale de la Recherche (FrontCog ANR-17-EURE-0017). The funding agencies did not influence the content of the manuscript.

Conflict of interest. Dr Lemogne reports personal fees and non-financial support from Boehringer Ingelheim, Janssen-Cilag, Lundbeck and Otsuka Pharmaceutical, outside the submitted work. The other authors declare no competing conflict of interest concerning the related work.

References

Admon, R., & Pizzagalli, D. A. (2015). Dysfunctional reward processing in depression. Current Opinion in Psychology, 4, 114–118. https://doi.org/10.1016/j.copsyc.2014.12.011

American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (DSM-5®). Washington, DC: American Psychiatric Pub.

Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67, 1–48. https://doi.org/10.18637/jss.v067.i01
Bavard, S., Lebreton, M., Khamassi, M., Coricelli, G., & Palminteri, S. (2018). Reference-point centering and range-adaptation enhance human reinforcement learning at the cost of irrational preferences. Nature Communications, 9(1), 4503. https://doi.org/10.1038/s41467-018-06781-2

Bavard, S., & Théro, H. (2018). [Re] Adaptive properties of differential learning rates for positive and negative outcomes. ReScience, 4(1), 5. https://doi.org/10.5281/ZENODO.1289889

Beck, A. T. (1987). Cognitive models of depression. Journal of Cognitive Psychotherapy, 1(1), 5–37.

Beck, A. T., Steer, R. A., Ball, R., & Ranieri, W. F. (1996). Comparison of Beck Depression Inventories-IA and -II in psychiatric outpatients. Journal of Personality Assessment, 67(3), 588–597. https://doi.org/10.1207/s15327752jpa6703_13

Brolsma, S. C. A., Vrijsen, J. N., Vassena, E., Kandroodi, M. R., Bergman, M. A., van Eijndhoven, P. F., … Cools, R. (2022). Challenging the negative learning bias hypothesis of depression: Reversal learning in a naturalistic psychiatric sample. Psychological Medicine, 52(2), 303–313. https://doi.org/10.1017/S0033291720001956

Brolsma, S. C. A., Vassena, E., Vrijsen, J. N., Sescousse, G., Collard, R. M., van Eijndhoven, P. F., … Cools, R. (2021). Negative learning bias in depression revisited: Enhanced neural response to surprising reward across psychiatric disorders. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 6(3), 280–289. https://doi.org/10.1016/j.bpsc.2020.08.011

Byrne, D. G. (1976). Choice reaction times in depressive states. British Journal of Social and Clinical Psychology, 15(2), 149–156. https://doi.org/10.1111/j.2044-8260.1976.tb00020.x

Cazé, R. D., & van der Meer, M. A. A. (2013). Adaptive properties of differential learning rates for positive and negative outcomes. Biological Cybernetics, 107(6), 711–719. https://doi.org/10.1007/s00422-013-0571-5

Chambon, V., Théro, H., Vidal, M., Vandendriessche, H., Haggard, P., & Palminteri, S. (2020). Information about action outcomes differentially affects learning from self-determined versus imposed choices. Nature Human Behaviour, 4(10), 1067–1079. https://doi.org/10.1038/s41562-020-0919-5

Chase, H. W., Frank, M. J., Michael, A., Bullmore, E. T., Sahakian, B. J., & Robbins, T. W. (2010). Approach and avoidance learning in patients with major depression and healthy controls: Relation to anhedonia. Psychological Medicine, 40(3), 433–440. https://doi.org/10.1017/S0033291709990468

Chen, C., Takahashi, T., Nakagawa, S., Inoue, T., & Kusumi, I. (2015). Reinforcement learning in depression: A review of computational research. Neuroscience & Biobehavioral Reviews, 55, 247–267. https://doi.org/10.1016/j.neubiorev.2015.05.005

Chung, D., Kadlec, K., Aimone, J. A., McCurry, K., King-Casas, B., & Chiu, P. H. (2017). Valuation in major depression is intact and stable in a non-learning environment. Scientific Reports, 7, 44374. https://doi.org/10.1038/srep44374

Collins, A. G. E., & Frank, M. J. (2012). How much of reinforcement learning is working memory, not reinforcement learning? A behavioral, computational, and neurogenetic analysis. European Journal of Neuroscience, 35(7), 1024–1035. https://doi.org/10.1111/j.1460-9568.2011.07980.x

Daw, N. D., Gershman, S. J., Seymour, B., Dayan, P., & Dolan, R. J. (2011). Model-based influences on humans' choices and striatal prediction errors. Neuron, 69(6), 1204–1215. https://doi.org/10.1016/j.neuron.2011.02.027
Douglas, K. M., Porter, R. J., Frampton, C. M., Gallagher, P., & Young, A. H. (2009). Abnormal response to failure in unmedicated major depression. Journal of Affective Disorders, 119(1), 92–99. https://doi.org/10.1016/j.jad.2009.02.018

Elliott, R., Sahakian, B. J., Herrod, J. J., Robbins, T. W., & Paykel, E. S. (1997). Abnormal response to negative feedback in unipolar depression: Evidence for a diagnosis specific impairment. Journal of Neurology, Neurosurgery & Psychiatry, 63(1), 74–82. https://doi.org/10.1136/jnnp.63.1.74

Elliott, R., Sahakian, B. J., McKay, A. P., Herrod, J. J., Robbins, T. W., & Paykel, E. S. (1996). Neuropsychological impairments in unipolar depression: The influence of perceived failure on subsequent performance. Psychological Medicine, 26(5), 975–989. https://doi.org/10.1017/S0033291700035303

Eshel, N., & Roiser, J. P. (2010). Reward and punishment processing in depression. Biological Psychiatry, 68(2), 118–124. https://doi.org/10.1016/j.biopsych.2010.01.027

Fontanesi, L., Gluth, S., Spektor, M. S., & Rieskamp, J. (2019a). A reinforcement learning diffusion decision model for value-based decisions. Psychonomic Bulletin & Review, 26(4), 1099–1121. https://doi.org/10.3758/s13423-018-1554-2

Fontanesi, L., Palminteri, S., & Lebreton, M. (2019b). Decomposing the effects of context valence and feedback information on speed and accuracy during reinforcement learning: A meta-analytical approach using diffusion decision modeling. Cognitive, Affective, & Behavioral Neuroscience, 19(3), 490–502. https://doi.org/10.3758/s13415-019-00723-1

Forbes, E. E., & Dahl, R. E. (2012). Research review: Altered reward function in adolescent depression: What, when and how? Journal of Child Psychology and Psychiatry, 53(1), 3–15. https://doi.org/10.1111/j.1469-7610.2011.02477.x

Frank, M. J., Moustafa, A. A., Haughey, H. M., Curran, T., & Hutchison, K. E. (2007). Genetic triple dissociation reveals multiple roles for dopamine in reinforcement learning. Proceedings of the National Academy of Sciences, 104(41), 16311–16316. https://doi.org/10.1073/pnas.0706111104

Frank, M. J., Seeberger, L. C., & O'Reilly, R. C. (2004). By carrot or by stick: Cognitive reinforcement learning in parkinsonism. Science (New York, N.Y.), 306(5703), 1940–1943. https://doi.org/10.1126/science.1102941

Gotlib, I. H., & Joormann, J. (2010). Cognition and depression: Current status and future directions. Annual Review of Clinical Psychology, 6(1), 285–312. https://doi.org/10.1146/annurev.clinpsy.121208.131305

Gradin, V. B., Kumar, P., Waiter, G., Ahearn, T., Stickle, C., Milders, M., … Steele, J. D. (2011). Expected value and prediction error abnormalities in depression and schizophrenia. Brain: A Journal of Neurology, 134(Pt 6), 1751–1764. https://doi.org/10.1093/brain/awr059

Guitart-Masip, M., Huys, Q. J. M., Fuentemilla, L., Dayan, P., Duzel, E., & Dolan, R. J. (2012). Go and no-go learning in reward and punishment: Interactions between affect and effect. NeuroImage, 62(1), 154–166. https://doi.org/10.1016/j.neuroimage.2012.04.024

Hägele, C., Schlagenhauf, F., Rapp, M., Sterzer, P., Beck, A., Bermpohl, F., … Heinz, A. (2015). Dimensional psychiatry: Reward dysfunction and depressive mood across psychiatric disorders. Psychopharmacology, 232(2), 331–341. https://doi.org/10.1007/s00213-014-3662-7

Henriques, J. B., Glowacki, J. M., & Davidson, R. J. (1994). Reward fails to alter response bias in depression. Journal of Abnormal Psychology, 103(3), 460. https://psycnet.apa.org/buy/1994-45308-001
Henriques, J. B., & Davidson, R. J. (2000). Decreased responsiveness to reward in depression. Cognition and Emotion, 14(5), 711–724. https://doi.org/10.1080/02699930050117684

Huys, Q. J., Pizzagalli, D. A., Bogdan, R., & Dayan, P. (2013). Mapping anhedonia onto reinforcement learning: A behavioural meta-analysis. Biology of Mood & Anxiety Disorders, 3(1), 12. https://doi.org/10.1186/2045-5380-3-12

Huys, Q. J. M., Gölzer, M., Friedel, E., Heinz, A., Cools, R., Dayan, P., & Dolan, R. J. (2016). The specificity of Pavlovian regulation is associated with recovery from depression. Psychological Medicine, 46(5), 1027–1035. https://doi.org/10.1017/S0033291715002597

Insel, T., Cuthbert, B., Garvey, M., Heinssen, R., Pine, D. S., Quinn, K., … Wang, P. (2010). Research domain criteria (RDoC): Toward a new classification framework for research on mental disorders. American Journal of Psychiatry, 167(7), 748–751. https://doi.org/10.1176/appi.ajp.2010.09091379

Joormann, J., & Quinn, M. E. (2014). Cognitive processes and emotion regulation in depression. Depression and Anxiety, 31(4), 308–315. https://doi.org/10.1002/da.22264

Katahira, K., Yuki, S., & Okanoya, K. (2017). Model-based estimation of subjective values using choice tasks with probabilistic feedback. Journal of Mathematical Psychology, 79, 29–43. https://doi.org/10.1016/j.jmp.2017.05.005

Klein, T. A., Ullsperger, M., & Jocham, G. (2017). Learning relative values in the striatum induces violations of normative decision making. Nature Communications, 8(1), 16033. https://doi.org/10.1038/ncomms16033

Knutson, B., Bhanji, J. P., Cooney, R. E., Atlas, L. Y., & Gotlib, I. H. (2008). Neural responses to monetary incentives in major depression. Biological Psychiatry, 63(7), 686–692. https://doi.org/10.1016/j.biopsych.2007.07.023

Kumar, P., Waiter, G., Ahearn, T., Milders, M., Reid, I., & Steele, J. D. (2008). Abnormal temporal difference reward-learning signals in major depression. Brain, 131(8), 2084–2093. https://doi.org/10.1093/brain/awn136

Moutoussis, M., Rutledge, R. B., Prabhu, G., Hrynkiewicz, L., Lam, J., Ousdal, O.-T., … Dolan, R. J. (2018). Neural activity and fundamental learning, motivated by monetary loss and reward, are intact in mild to moderate major depressive disorder. PLoS One, 13(8), e0201451. https://doi.org/10.1371/journal.pone.0201451

Murphy, F. C., Michael, A., Robbins, T. W., & Sahakian, B. J. (2003). Neuropsychological impairment in patients with major depressive disorder: The effects of feedback on task performance. Psychological Medicine, 33(3), 455–467. https://doi.org/10.1017/S0033291702007018

Niv, Y., Edlund, J. A., Dayan, P., & O'Doherty, J. P. (2012). Neural prediction errors reveal a risk-sensitive reinforcement-learning process in the human brain. Journal of Neuroscience, 32(2), 551–562. https://doi.org/10.1523/JNEUROSCI.5498-10.2012

Palminteri, S., Clair, A.-H., Mallet, L., & Pessiglione, M. (2012). Similar improvement of reward and punishment learning by serotonin reuptake inhibitors in obsessive-compulsive disorder. Biological Psychiatry, 72(3), 244–250. https://doi.org/10.1016/j.biopsych.2011.12.028

Palminteri, S., Khamassi, M., Joffily, M., & Coricelli, G. (2015). Contextual modulation of value signals in reward and punishment learning. Nature Communications, 6(1), 8096. https://doi.org/10.1038/ncomms9096
Palminteri, S., Lefebvre, G., Kilford, E. J., & Blakemore, S.-J. (2017). Confirmation bias in human reinforcement learning: Evidence from counterfactual feedback processing. PLOS Computational Biology, 13(8), e1005684. https://doi.org/10.1371/journal.pcbi.1005684.
Palminteri, S., & Pessiglione, M. (2017). Chapter 23 – Opponent brain systems for reward and punishment learning: Causal evidence from drug and lesion studies in humans. In J.-C. Dreher & L. Tremblay (Eds.), Decision neuroscience (pp. 291–303). San Diego: Academic Press. https://doi.org/10.1016/B978-0-12-805308-9.00023-3.
Pessiglione, M., Seymour, B., Flandin, G., Dolan, R., & Frith, C. (2006). Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans. Nature, 442(7106), 1042–1045. https://doi.org/10.1038/nature05051.
Pike, A. C., & Robinson, O. J. (2022). Reinforcement learning in patients with mood and anxiety disorders vs control individuals: A systematic review and meta-analysis. JAMA Psychiatry, 79(4), 313–322. https://doi.org/10.1001/jamapsychiatry.2022.0051.
Pizzagalli, D. A. (2014). Depression, stress, and anhedonia: Toward a synthesis and integrated model. Annual Review of Clinical Psychology, 10, 393–423. https://doi.org/10.1146/annurev-clinpsy-050212-185606.
Pizzagalli, D. A., Jahn, A. L., & O'Shea, J. P. (2005). Toward an objective characterization of an anhedonic phenotype: A signal-detection approach. Biological Psychiatry, 57(4), 319–327. https://doi.org/10.1016/j.biopsych.2004.11.026.
R Core Team. (2022). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 64–99). New York: Appleton-Century-Crofts.
Rothkirch, M., Tonn, J., Köhler, S., & Sterzer, P. (2017). Neural mechanisms of reinforcement learning in unmedicated patients with major depressive disorder. Brain, 140(4), 1147–1157. https://doi.org/10.1093/brain/awx025.
Rupprechter, S., Stankevicius, A., Huys, Q. J. M., Steele, J. D., & Seriès, P. (2018). Major depression impairs the use of reward values for decision-making. Scientific Reports, 8(1), 13798. https://doi.org/10.1038/s41598-018-31730-w.
Rutledge, R. B., Moutoussis, M., Smittenaar, P., Zeidman, P., Taylor, T., Hrynkiewicz, L., … Dolan, R. J. (2017). Association of neural and emotional impacts of reward prediction errors with major depression. JAMA Psychiatry, 74(8), 790–797. https://doi.org/10.1001/jamapsychiatry.2017.1713.
Safra, L., Chevallier, C., & Palminteri, S. (2019). Depressive symptoms are associated with blunted reward learning in social contexts. PLOS Computational Biology, 15(7), e1007224. https://doi.org/10.1371/journal.pcbi.1007224.
Shah, P. J., O'carroll, R. E., Rogers, A., Moffoot, A. P. R., & Ebmeier, K. P. (1999). Abnormal response to negative feedback in depression. Psychological Medicine, 29(1), 63–72. https://doi.org/10.1017/S0033291798007880.
Sheehan, D. V., Lecrubier, Y., Sheehan, K. H., Amorim, P., Janavs, J., Weiller, E., … Dunbar, G. C. (1998). The mini-international neuropsychiatric interview (M.I.N.I.): The development and validation of a structured diagnostic psychiatric interview for DSM-IV and ICD-10. The Journal of Clinical Psychiatry, 59(Suppl. 20), 22–33; quiz 34–57.
Steele, J. D., Kumar, P., & Ebmeier, K. P. (2007). Blunted response to feedback information in depressive illness. Brain, 130(9), 2367–2374. https://doi.org/10.1093/brain/awm150.
Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction (2nd ed.). Cambridge, MA: The MIT Press.
Ubl, B., Kuehner, C., Kirsch, P., Ruttorf, M., Diener, C., & Flor, H. (2015). Altered neural reward and loss processing and prediction error signalling in depression. Social Cognitive and Affective Neuroscience, 10(8), 1102–1112. https://doi.org/10.1093/scan/nsu158.
Vrieze, E., Pizzagalli, D. A., Demyttenaere, K., Hompes, T., Sienaert, P., de Boer, P., … Claes, S. (2013). Reduced reward learning predicts outcome in major depressive disorder. Biological Psychiatry, 73(7), 639–645. https://doi.org/10.1016/j.biopsych.2012.10.014.
Whitton, A. E., Kakani, P., Foti, D., Van't Veer, A., Haile, A., Crowley, D. J., & Pizzagalli, D. A. (2016). Blunted neural responses to reward in remitted major depression: A high-density event-related potential study. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 1(1), 87–95. https://doi.org/10.1016/j.bpsc.2015.09.007.
Whitton, A. E., Treadway, M. T., & Pizzagalli, D. A. (2015). Reward processing dysfunction in major depression, bipolar disorder and schizophrenia. Current Opinion in Psychiatry, 28(1), 7–12. https://doi.org/10.1097/YCO.0000000000000122.
Wilson, R. C., & Collins, A. G. (2019). Ten simple rules for the computational modeling of behavioral data. eLife, 8, e49547. https://doi.org/10.7554/eLife.49547.
World Health Organization. (2017). Depression and other common mental disorders: Global health estimates (No. WHO/MSD/MER/2017.2). Retrieved from World Health Organization website: https://apps.who.int/iris/handle/10665/254610.
Yechiam, E., & Hochman, G. (2014). Loss attention in a dual-task setting. Psychological Science, 25(2), 494–502. https://doi.org/10.1177/0956797613510725.
Yu, Z., Guindani, M., Grieco, S. F., Chen, L., Holmes, T. C., & Xu, X. (2022). Beyond t test and ANOVA: Applications of mixed-effects models for more rigorous statistical analysis in neuroscience research. Neuron, 110(1), 21–35. https://doi.org/10.1016/j.neuron.2021.10.030.
Zhang, W.-N., Chang, S.-H., Guo, L.-Y., Zhang, K.-L., & Wang, J. (2013). The neural correlates of reward-related processing in major depressive disorder: A meta-analysis of functional magnetic resonance imaging studies. Journal of Affective Disorders, 151(2), 531–539. https://doi.org/10.1016/j.jad.2013.06.039.