Recent evidence indicates that reward value encoding in humans is highly context dependent, leading to suboptimal decisions in some cases, but whether this computational constraint on valuation is a shared feature of human cognition remains unknown. Here we studied the behaviour of n = 561 individuals from 11 countries of markedly different socioeconomic and cultural makeup. Our findings show that context sensitivity was present in all 11 countries. Suboptimal decisions generated by context manipulation were not explained by risk aversion, as estimated through a separate description-based choice task (that is, lotteries) consisting of matched decision offers. Conversely, risk aversion significantly differed across countries. Overall, our findings suggest that context-dependent reward value encoding is a feature of human cognition that remains consistently present across different countries, as opposed to description-based decision-making, which is more permeable to cultural factors.
In the present study, we investigate and compare reasoning in large language models (LLMs) and humans, using a selection of cognitive psychology tools traditionally dedicated to the study of (bounded) rationality. We presented human participants and an array of pretrained LLMs with new variants of classical cognitive experiments, and cross-compared their performances. Our results showed that most of the included models presented reasoning errors akin to those frequently ascribed to error-prone, heuristic-based human reasoning. Notwithstanding this superficial similarity, an in-depth comparison between humans and LLMs indicated important differences from human-like reasoning, with models' limitations disappearing almost entirely in more recent LLM releases. Moreover, we show that while it is possible to devise strategies to induce better performance, humans and machines are not equally responsive to the same prompting schemes. We conclude by discussing the epistemological implications and challenges of comparing human and machine behavior for both artificial intelligence and cognitive psychology.
Do we preferentially learn from outcomes that confirm our choices? In recent years, we investigated this question in a series of studies implementing increasingly complex behavioral protocols. The learning rates fitted in experiments featuring partial or complete feedback, as well as free and forced choices, were systematically found to be consistent with a choice-confirmation bias. One of the prominent behavioral consequences of this confirmatory learning-rate pattern is choice hysteresis: the tendency to repeat previous choices despite contradictory evidence. However, a choice-confirmatory pattern of learning rates may spuriously arise when the model does not include an explicit (gradual) choice-perseveration term. In the present study, we reanalyzed data from four published papers (nine experiments; 363 subjects; 126,192 trials), originally included in studies demonstrating or criticizing the choice-confirmation bias in human participants. We fitted two models: one featuring valence-specific updates (i.e., different learning rates for confirmatory and disconfirmatory outcomes) and one additionally including a gradual-perseveration term. Our analysis confirms that including the gradual-perseveration process in the model significantly reduces the estimated choice-confirmation bias. However, the choice-confirmation bias remains present at the meta-analytical level across the considered experiments and is significantly different from zero in most individual experiments. Our results demonstrate that the choice-confirmation bias resists the inclusion of a gradual-perseveration term, thus proving to be a robust feature of human reinforcement learning. We conclude by pointing to additional computational processes that may play an important role in estimating and interpreting the computational biases under scrutiny.
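The two fitted models described above can be sketched in a few lines. Below is a minimal, illustrative simulation (not the authors' code) of a two-armed bandit learner combining valence-specific learning rates, the signature of the choice-confirmation bias, with an explicit gradual-perseveration term; parameter names and values are assumptions made for the example.

```python
# Illustrative sketch: Q-learning with confirmatory/disconfirmatory learning rates
# plus a gradual-perseveration (choice-trace) term. Parameter names are assumed.
import numpy as np

def softmax(x, beta):
    z = beta * (x - np.max(x))
    p = np.exp(z)
    return p / p.sum()

def simulate(n_trials=200, alpha_conf=0.3, alpha_disc=0.1, alpha_pers=0.2,
             phi=1.0, beta=5.0, reward_probs=(0.7, 0.3), seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros(2)          # action values
    H = np.zeros(2)          # gradual-perseveration trace
    choices = np.empty(n_trials, dtype=int)
    for t in range(n_trials):
        p = softmax(Q + phi * H, beta)   # perseveration biases the policy, not Q
        c = rng.choice(2, p=p)
        r = float(rng.random() < reward_probs[c])
        pe = r - Q[c]                    # prediction error for the chosen option
        # confirmatory update: outcomes confirming the choice are weighted more
        alpha = alpha_conf if pe > 0 else alpha_disc
        Q[c] += alpha * pe
        # choice trace decays toward the last action, producing choice hysteresis
        H += alpha_pers * (np.eye(2)[c] - H)
        choices[t] = c
    return choices

print(simulate()[:20])
```

In such a sketch, both alpha_conf > alpha_disc and phi > 0 push the agent toward repeating its previous choices, which is why omitting the perseveration term can inflate the estimated learning-rate asymmetry.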
Understanding how learning changes during human development has been one of the long-standing objectives of developmental science. Recently, advances in computational biology have demonstrated that humans display a bias when learning to navigate novel environments through rewards and punishments: they learn more from out
Background. Value-based decision-making impairment in depression is a complex phenomenon: while some studies found evidence of blunted reward learning and reward-related signals in the brain, others indicate no effect. Here we test whether such reward-sensitivity deficits depend on the overall value of the decision problem. Methods. We used a two-armed bandit task with two different contexts: a 'rich' context, in which both options were associated with an overall positive expected value, and a 'poor' context, in which both options were associated with an overall negative expected value. We tested patients (N = 30) undergoing a major depressive episode and age-, gender- and socioeconomically matched controls (N = 26). Learning performance and a subsequent transfer phase without feedback were analyzed to disentangle a decision-stage from a value-update mechanism. Finally, we used computational model simulation and fitting to link behavioral patterns to learning biases. Results. Control subjects showed similar learning performance in the 'rich' and the 'poor' contexts, while patients displayed reduced learning in the 'poor' context. Analysis of the transfer phase showed that the context-dependent impairment in patients generalized, suggesting that the effect of depression has to be traced to outcome encoding. Computational model-based results showed that patients displayed a higher learning rate for negative compared to positive outcomes (the opposite was true in controls). Conclusions. Our results illustrate that reinforcement-learning performance in depression depends on the value of the context. We show that depressed patients have a specific difficulty in contexts with an overall negative state value, which in our task is consistent with a negativity bias at the level of learning rates.
Humans do not integrate new information objectively: outcomes carrying a positive affective value and evidence confirming one's own prior beliefs are overweighted. Until recently, theoretical and empirical accounts of the positivity and confirmation biases assumed them to be specific to 'high-level' belief updates. We present evidence against this account. Learning rates in reinforcement learning (RL) tasks, estimated across different contexts and species, generally present the same characteristic asymmetry, suggesting that belief- and value-updating processes share key computational principles and distortions. This bias generates over-optimistic expectations about the probability of making the right choices and, consequently, over-optimistic reward expectations. We discuss the normative and neurobiological roots of these RL biases and their position within the broader picture of behavioral decision-making theories.
A wealth of evidence in perceptual and economic decision-making research suggests that the subjective assessment of one option is influenced by the context. A series of studies provides evidence that the same coding principles apply to situations where decisions are shaped by past outcomes, that is, in reinforcement-learning situations. In bandit tasks, human behavior is explained by models assuming that individuals do not learn the objective value of an outcome, but rather its subjective, context-dependent representation. We argue that, while such outcome context-dependence may be informationally or ecologically optimal, it concomitantly undermines the capacity to generalize value-based knowledge to new contexts, sometimes creating apparent decision paradoxes.
Evidence suggests that economic values are rescaled as a function of the range of the available options. Although locally adaptive, range adaptation has been shown to lead to suboptimal choices, particularly notable in reinforcement learning (RL) situations when options are extrapolated from their original context to a new one. Range adaptation can be seen as the result of an adaptive coding process aiming at increasing the signal-to-noise ratio. However, this hypothesis leads to a counterintuitive prediction: Decreasing task difficulty should increase range adaptation and, consequently, extrapolation errors. Here, we tested the paradoxical relation between range adaptation and performance in a large sample of participants performing variants of an RL task, where we manipulated task difficulty. Results confirmed that range adaptation induces systematic extrapolation errors and is stronger when decreasing task difficulty. Last, we propose a range-adapting model and show that it is able to parsimoniously capture all the behavioral results.
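As a rough illustration of the range-adaptation idea discussed above, the sketch below rescales an outcome by the range of outcomes encountered in its local context; the mixing parameter and function names are assumptions, not the paper's exact model.

```python
# Minimal sketch of range-adapted outcome encoding: outcomes are normalized by the
# range of the options available in the local context. Illustrative only.
def range_adapt(outcome, r_min, r_max, omega=1.0):
    """Mix absolute and range-normalized coding.

    omega = 0 -> absolute coding; omega = 1 -> fully range-adapted coding.
    """
    if r_max == r_min:
        return outcome
    relative = (outcome - r_min) / (r_max - r_min)
    return (1.0 - omega) * outcome + omega * relative

# With full adaptation, a 10-point outcome in a 0-10 context is encoded the same as
# a 1-point outcome in a 0-1 context, which predicts extrapolation errors when the
# two options are later compared outside their original contexts.
print(range_adapt(10, 0, 10), range_adapt(1, 0, 1))
```

The counterintuitive prediction in the abstract follows directly from this form: the easier the task, the more cleanly the context range is learned and applied, and the larger the distortion when options are transferred to a new context.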
While there is no doubt that social signals affect human reinforcement learning, there is still no consensus about how this process is computationally implemented. To address this issue, we compared three psychologically plausible hypotheses about the algorithmic implementation of imitation in reinforcement learning. The first hypothesis, decision biasing (DB), postulates that imitation consists in transiently biasing the learner's action selection without affecting their value function. According to the second hypothesis, model-based imitation (MB), the learner infers the demonstrator's value function through inverse reinforcement learning and uses it to bias action selection. Finally, according to the third hypothesis, value shaping (VS), the demonstrator's actions directly affect the learner's value function. We tested these three hypotheses in two experiments (N = 24 and N = 44) featuring a new variant of a social reinforcement-learning task. We show through model comparison and model simulation that VS provides the best explanation of learners' behavior. These results replicated in a third independent experiment featuring a larger cohort and a different design (N = 302). In our experiments, we also manipulated the quality of the demonstrators' choices and found that learners were able to adapt their imitation rate, so that only skilled demonstrators were imitated. We proposed and tested an efficient meta-learning process to account for this effect, whereby imitation is regulated by the agreement between the learner and the demonstrator. In sum, our findings provide new insights and perspectives on the computational mechanisms underlying adaptive imitation in human reinforcement learning.
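The contrast between the decision-biasing and value-shaping hypotheses can be illustrated with a single-trial update. The sketch below is an assumed, simplified parameterization (names such as kappa are illustrative), not the fitted model from the study.

```python
# Sketch contrasting two of the hypotheses above for one observation of a
# demonstrator's action: only value shaping (VS) writes the observation into the
# learner's value function; decision biasing (DB) only shifts the policy.
import numpy as np

def value_shaping(Q, demo_action, kappa=0.2):
    """VS: the observed action acts like a pseudo-reward on the learner's values."""
    Q = Q.copy()
    Q[demo_action] += kappa * (1.0 - Q[demo_action])   # nudge value toward 1
    return Q

def decision_bias(Q, demo_action, bias=1.0, beta=5.0):
    """DB: the observation transiently biases action selection, Q is untouched."""
    logits = beta * Q.copy()
    logits[demo_action] += bias
    p = np.exp(logits - logits.max())
    return p / p.sum()

Q = np.array([0.5, 0.5])
print(value_shaping(Q, demo_action=1))   # values change and persist
print(decision_bias(Q, demo_action=1))   # only this trial's choice probabilities change
```

The key behavioral difference is persistence: under VS the demonstrator's influence survives into later trials through the value function, whereas under DB it vanishes once the observation is no longer present.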
Keywords: language; Huntington's disease; basal ganglia; brain mapping.

Though accumulating evidence indicates that the striatum is recruited during language processing, the specific function of this subcortical structure in language remains to be elucidated. To answer this question, we used Huntington's disease as a model of striatal lesion. We investigated the morphological deficit of 30 early Huntington's disease patients with a novel linguistic task that can be modeled within an explicit theory of linguistic computation. Behavioral results reflected an impairment of HD patients on the linguistic task. Computational model-based analysis compared the behavioral data to simulated data from two distinct lesion models, a selection-deficit model and a grammatical-deficit model. This analysis revealed that the impairment derives from increased randomness in the process of selecting between grammatical alternatives, rather than from a disruption of grammatical knowledge per se. Voxel-based morphometry allowed us to correlate this impairment with dorsal striatal degeneration. We thus show that the striatum plays a role in the selection of linguistic alternatives, just as it does in the selection of motor and cognitive programs.
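The two lesion models compared in this analysis can be caricatured with a softmax choice rule: a selection deficit corresponds to a higher choice temperature over intact grammatical preferences, whereas a grammatical deficit corresponds to degraded preferences with intact selection. The numbers below are illustrative assumptions, not the paper's fitted parameters.

```python
# Illustrative contrast between a selection-deficit lesion (noisier selection among
# intact grammatical alternatives) and a grammatical-deficit lesion (flattened
# knowledge, precise selection). Values are assumptions for illustration.
import numpy as np

def choose(preferences, temperature, rng):
    z = np.asarray(preferences) / temperature
    p = np.exp(z - z.max())
    p /= p.sum()
    return rng.choice(len(preferences), p=p)

rng = np.random.default_rng(0)
grammar = [2.0, 0.5, 0.2]            # preference for the correct alternative (index 0)

# Selection deficit: knowledge intact, selection more random (high temperature)
selection_lesion = [choose(grammar, temperature=5.0, rng=rng) for _ in range(1000)]
# Grammatical deficit: knowledge itself flattened, selection precise (low temperature)
flat_grammar = [1.0, 0.9, 0.8]
grammar_lesion = [choose(flat_grammar, temperature=0.5, rng=rng) for _ in range(1000)]

print("correct rate, selection deficit:", np.mean(np.array(selection_lesion) == 0))
print("correct rate, grammatical deficit:", np.mean(np.array(grammar_lesion) == 0))
```

Both lesions lower accuracy, but they predict different error distributions across alternatives, which is the kind of signature a model-based comparison of simulated and observed data can exploit.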
Investigating the bases of inter-individual differences in risk-taking is necessary to refine our cognitive and neural models of decision-making and, ultimately, to counter risky behaviors in real-life policy settings. However, recent evidence suggests that behavioral tasks fare poorly compared to standard questionnaires for measuring individual differences in risk-taking. Crucially, using model-based measures of risk-taking does not seem to improve reliability. Here, we put forward two possible, not mutually exclusive, explanations for these results and suggest future avenues of research to improve the assessment of inter-individual differences in risk-taking by combining repeated online testing and mechanistic computational models.
Money is a fundamental and ubiquitous institution in modern economies. However, the question of its emergence remains a central one for economists. The monetary search-theoretic approach studies the conditions under which commodity money emerges as a solution to override frictions inherent to interindividual exchanges in a decentralized economy. Although, among these conditions, agents' rationality is classically essential and a prerequisite to any theoretical monetary equilibrium, human subjects often fail to adopt optimal strategies in tasks implementing a search-theoretic paradigm when these strategies are speculative, i.e., involve the use of a costly medium of exchange to increase the probability of subsequent and successful trades. In the present work, we hypothesize that implementing such speculative behaviors relies on reinforcement learning instead of lifetime utility calculations, as supposed by classical economic theory. To test this hypothesis, we operationalized the Kiyotaki and Wright paradigm of money emergence in a multistep exchange task and fitted behavioral data from human subjects performing this task with two reinforcement learning models. Each of them implements a distinct cognitive hypothesis regarding the weight of future or counterfactual rewards in current decisions. We found that both models outperformed theoretical predictions about subjects' behaviors regarding the implementation of speculative strategies, and that the latter relies on the degree to which opportunity costs are considered in the learning process. Speculating about the marketability advantage of money thus seems to depend on mental simulations of counterfactual events that agents perform in exchange situations.

Keywords: search-theoretic model | reinforcement learning | speculative behavior | opportunity cost

Money is both a very complex social phenomenon and easy to manipulate in everyday basic transactions. It is an institutional solution to common frictions in an exchange economy, such as the absence of double coincidence of wants between traders (1). It is of widespread use despite its being dominated in terms of rate of return by all other assets (2). However, it can be speculatively used in a fundamental sense: its economically dominated holding can be justified by the anticipation of future trading opportunities that are not available at the present moment but will necessitate this particular holding. In this study, we concentrate on a paradigm of commodity-money emergence in which one of the goods exchanged in the economy becomes the selected medium of exchange despite its storage being costlier than that of any other good. This is typical monetary speculation, in contrast to other types of speculation, which consist in expecting an increased market price of a good in the future. The price of money does not vary: only the opportunity that it can afford in the future does. This seems to us to be an important feature of speculative economic behavior relative to the otherwise apparently irrational holding of such a good. We study whether individuals endowed with some information about future exchange opportunities will tend to consider a financially dominated good as a medium for exchange. Modern behaviorally founded theories of the emergence of money and monetary equilibrium (3, 4) are jointly based on the idea of minimizing a trading search process and on individual choices of accepting, declining, or postponing immediate exchanges at different costs incurred.
We focus on an influential paradigm by Kiyotaki and Wright (4) (KW hereafter) in which the individual choice of accepting temporarily costly exchanges, owing to the anticipation of later, better trading opportunities, is precisely stylized as a speculative behavior and yields a corresponding monetary equilibrium. The environment of this paradigm consists of N agents specialized in terms of both consumption and production in such a manner that there is initially no double coincidence of wants. Frictions in the exchange process create a necessity for at least some of the agents to trade for goods that they neither produce nor consume, which are then used as media of exchange. The ultimate goal of agents, that is, to consume, may then require multiple steps to be achieved. The most interesting part is that in some configurations, the optimal medium of exchange (i.e., the good that maximizes expected utility because of its relatively ...)

Significance. In the present study, we applied reinforcement learning models that are not classically used in experimental economics to a multistep exchange task derived from a classic search-theoretic paradigm for the emergence of money. This method allowed us to highlight the importance of counterfactual feedback processing of opportunity costs in the learning process underlying the speculative use of money, and the predictive power of reinforcement learning models for multistep economic tasks. Those results constitute a step toward understanding the learning processes at work in multistep economic decision-making and the cognitive microfoundations of the use of money.
The extent to which subjective awareness influences reward processing, and thereby affects future decisions, is currently largely unknown. In the present report, we investigated this question in a reinforcement learning framework, combining perceptual masking, computational modeling, and electroencephalographic recordings (human male and female participants). Our results indicate that degrading the visibility of the reward decreased, without completely obliterating, the ability of participants to learn from outcomes, but concurrently increased their tendency to repeat previous choices. We dissociated electrophysiological signatures evoked by the reward-based learning processes from those elicited by the reward-independent repetition of previous choices and showed that these neural activities were significantly modulated by reward visibility. Overall, this report sheds new light on the neural computations underlying reward-based learning and decision-making and highlights that awareness is beneficial for the trial-by-trial adjustment of decision-making strategies.
In economics and perceptual decision-making, contextual effects are well documented: decision weights are adjusted as a function of the distribution of stimuli. Yet, in the reinforcement-learning literature, whether and how contextual information pertaining to decision states is integrated in learning algorithms has received comparably little attention. Here, we investigate reinforcement-learning behavior and its computational substrates in a task where we orthogonally manipulate outcome valence and magnitude, resulting in systematic variations in state values. Model comparison indicates that subjects' behavior is best accounted for by an algorithm that includes both reference-point dependence and range adaptation, two crucial features of state-dependent valuation. In addition, we find that state-dependent outcome valuation progressively emerges, is favored by increasing outcome information, and is correlated with explicit understanding of the task structure. Finally, our data clearly show that, while locally adaptive (for instance in negative-valence and small-magnitude contexts), state-dependent valuation comes at the cost of seemingly irrational choices when options are extrapolated out of their original contexts.
Adaptive coding of stimuli is well documented in perception, where it supports efficient encoding over a broad range of possible percepts. Recently, a similar neural mechanism has also been reported in value-based decision-making, where it allows optimal encoding of vast ranges of values in PFC: the neuronal response to value depends on the choice context (relative coding), rather than being invariant across contexts (absolute coding). Additionally, value learning is sensitive to the amount of feedback information: providing complete feedback (both obtained and forgone outcomes) instead of partial feedback (only the obtained outcome) improves learning. However, it is unclear whether relative coding occurs in all PFC regions and how it is affected by feedback information. We systematically investigated univariate and multivariate feedback encoding in various mPFC regions and compared three modes of neural coding: absolute, partially adaptive and fully adaptive. Twenty-eight human participants (both sexes) performed a learning task while undergoing fMRI scanning. On each trial, they chose between two symbols associated with a certain outcome. Then, the decision outcome was revealed. Notably, in one half of the trials participants received partial feedback, whereas in the other half they got complete feedback. We used univariate and multivariate analyses to explore value encoding in different feedback conditions. We found that both obtained and forgone outcomes were encoded in mPFC, but with opposite signs in its ventral and dorsal subdivisions. Moreover, we showed that increasing feedback information induced a switch from absolute to relative coding. Our results suggest that complete feedback information enhances context-dependent outcome encoding. This study offers a systematic investigation of the effect of the amount of feedback information (partial vs. complete) on univariate and multivariate outcome value encoding, within multiple regions in mPFC and cingulate cortex that are critical for value-based decisions and behavioral adaptation. Moreover, we provide the first comparison of three possible models of neural coding (i.e., absolute, partially adaptive, and fully adaptive coding) of value signals in these regions, using commensurable measures of prediction accuracy. Taken together, our results help build a more comprehensive picture of how the human brain encodes and processes outcome value. In particular, our results suggest that simultaneous presentation of obtained and forgone outcomes promotes relative value representation.
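As a schematic of the three coding modes compared here, the sketch below writes them as simple transforms of the obtained outcome given the outcomes available in the current context; the exact parameterizations tested in the paper may differ.

```python
# Sketch of absolute, partially adaptive, and fully adaptive outcome coding,
# expressed as transforms of the obtained outcome given the context's outcomes.
# Illustrative only; not the regression models used in the fMRI analysis.
def absolute_code(outcome, context_outcomes):
    return outcome

def partially_adaptive_code(outcome, context_outcomes):
    # centered on the context average, but not rescaled by its range
    return outcome - sum(context_outcomes) / len(context_outcomes)

def fully_adaptive_code(outcome, context_outcomes):
    lo, hi = min(context_outcomes), max(context_outcomes)
    return 0.0 if hi == lo else (outcome - lo) / (hi - lo)

ctx = [0.0, 10.0]   # e.g., complete feedback reveals both obtained and forgone outcomes
for f in (absolute_code, partially_adaptive_code, fully_adaptive_code):
    print(f.__name__, f(10.0, ctx))
```

Comparing how well each transform predicts the measured neural signal, with commensurable accuracy measures, is what allows the study to arbitrate between the three coding schemes.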
In simple instrumental-learning tasks, humans learn to seek gains and to avoid losses equally well. Yet, two effects of valence are observed. First, decisions in loss contexts are slower. Second, loss contexts decrease individuals' confidence in their choices. Whether these two effects are two manifestations of a single mechanism or whether they can be partially dissociated is unknown. Across six experiments, we attempted to disrupt the valence-induced motor bias effects by manipulating the mapping between decisions and actions and imposing constraints on response times (RTs). Our goal was to assess the presence of the valence-induced confidence bias in the absence of the RT bias. We observed both motor and confidence biases despite our disruption attempts, establishing that the effects of valence on motor and metacognitive responses are very robust and replicable. Nonetheless, within- and between-individual inferences reveal that the confidence bias resists the disruption of the RT bias. Therefore, although concomitant in most cases, valence-induced motor and confidence biases seem to be partly dissociable. These results highlight new important mechanistic constraints that should be incorporated in learning models to jointly explain choice, reaction times, and confidence.
Depending on environmental demands, humans can learn and exploit multiple concurrent sets of stimulus-response associations. Mechanisms underlying the learning of such task-sets remain unknown. Here we investigate the hypothesis that task-set learning relies on unsupervised chunking of stimulus-response associations that occur in temporal proximity. We examine behavioral and neural data from a task-set learning experiment using a network model. We first show that task-set learning can be achieved provided the timescale of chunking is slower than the timescale of stimulus-response learning. Fitting the model to behavioral data on a subject-by-subject basis confirmed this expectation and led to specific predictions linking chunking and task-set retrieval that were borne out by behavioral performance and reaction times. Comparing the model activity with BOLD signal allowed us to identify neural correlates of task-set retrieval in a functional network involving ventral and dorsal prefrontal cortex, with the dorsal system preferentially engaged when retrievals are used to improve performance.
Determining whether similar valence-induced biases exist in reinforcement learning and probabilistic reasoning may be crucial to help refine our understanding of adaptive and maladaptive decision-making through the lens of a unified computational approach. Standard reinforcement learning models conceive agents as impartial learners: they learn equally well from positive and negative outcomes alike (1). However, empirical studies have recently come to challenge this view by demonstrating that human learners, rather than processing information impartially, consistently display a valence-induced bias: when faced with uncertain choice options, they tend to disregard bad news by integrating worse-than-expected outcomes (negative prediction errors) at a lower rate relative to better-than-expected ones (positive prediction errors) (2-4). This positivity bias would echo the asymmetric processing of self-relevant information in probabilistic reasoning, whereby good news on average receives more weight than bad news (5,6). A bias for learning preferentially from better-than-expected outcomes would reflect a preference for positive events in general. However, this prediction is at odds with recent findings. In a two-armed bandit task featuring complete feedback information, we previously found that participants would learn preferentially from better-than-expected obtained outcomes while preferentially learning from worse-than-expected forgone outcomes, that is, from the outcome associated with the option they had not chosen (7). This learning asymmetry suggests that what has been previously characterized as a positivity bias may, in fact, be the upshot of a more general, and perhaps ubiquitous, choice-confirmation bias, whereby human agents preferentially integrate information that confirms their previous decision (8). Building on these previous findings, we reasoned that if human reinforcement learning is indeed biased in a choice-confirmatory manner, learning from action-outcome couplings that were not voluntarily chosen by the subject (forced choice) should present no bias. To test this hypothesis, we conducted three experiments involving instrumental learning and computational model-based analyses. Participants were administered new variants of a probabilistic learning task in which they could freely choose between two options, or were 'forced' to implement the choice made by a computer. In the first experiment, participants were only shown the obtained outcome corresponding to their choice (factual learning). In the second experiment, participants were shown both the obtained and the forgone outcome (counterfactual learning). Finally, to address a concern raised during the review process, a third experiment was included in which both free- and forced-choice trials featured a condition with a random reward schedule (50/50). The rationale for implementing this reward schedule was to test whether or not the confirmation bias was due to potential sampling differences between types of trials. Indeed, in the free-choice condition, the most rewarding symbol should be increasingly selected as the subject learns the structure of the task. Having a random reward schedule eliminates the possibility of such unbalanced sampling between free- and forced-choice conditions. We had two key predictions. With regard to factual learning, participants should learn better from positive prediction errors, but they should only do so when free to choose (free-choice trials), while showing no effect when forced to match a computer's choice (forced-choice trials). With regard to counterfactual learning from forgone outcomes, we expected the opposite pattern: in free-choice trials, negative prediction errors should be more likely to be taken ...

The valence of new information influences learning rates in humans: good news tends to receive more weight than bad news. We investigated this learning bias in four experiments, by systematically manipulating the source of required action (free versus forced choices), outcome contingencies (low versus high reward) and motor requirements (go versus no-go choices). Analysis of model-estimated learning rates showed that the confirmation bias in learning rates was specific to free choices, but was independent of outcome contingencies. The bias was also unaffected by the motor requirements, thus suggesting that it operates in the representational space of decisions, rather than motoric actions. Finally, model simulations revealed that learning rates estimated from the choice-confirmation model had the effect of maximizing performance across low- and high-reward environments. We therefore suggest that choice-confirmation bias may be adaptive for efficient learning of action-outcome contingencies, above and beyond fostering person-level dispositions such as self-esteem.
Computational psychiatry is a theoretical approach that uses mathematical models to shed light on the links between symptoms and the neurobiological abnormalities observed in mental disorders. This introduction reviews three main fields of application: models derived from reinforcement learning, models derived from economic decision theory, and Bayesian models. The first have mainly been used to study addiction, the second in the context of motivation and impulsivity disorders, and the third constitute an important contribution to the understanding of psychotic symptoms. The perspectives opened by the computational approach are broad, ranging from elucidating the pathophysiological mechanisms of mental disorders at the population level to personalizing care at the individual level.
Value-based decision-making involves trading off the cost associated with an action against its expected reward. Research has shown that both physical and mental effort constitute such subjective costs, biasing choices away from effortful actions and discounting the value of obtained rewards. Facing conflicts between competing action alternatives is considered aversive, as recruiting cognitive control to overcome conflict is effortful. Moreover, engaging control to proactively suppress irrelevant information that could conflict with task-relevant information would presumably also be cognitively costly. Yet, it remains unclear whether the cognitive control demands involved in preventing and resolving conflict also constitute costs in value-based decisions. The present study investigated this question by embedding irrelevant distractors (flanker arrows) within a reversal-learning task, with intermixed free and instructed trials. Results showed that participants learned to adapt their free choices to maximize rewards, but were nevertheless biased to follow the suggestions of irrelevant distractors. Thus, the perceived cost of investing cognitive control to suppress an external suggestion could sometimes trump internal value representations. By adapting computational models of reinforcement learning, we assessed the influence of conflict at both the decision and learning stages. Modelling the decision showed that free choices were more biased when participants were less sure about which action was more rewarding. This supports the hypothesis that the costs linked to conflict management were traded off against expected rewards. During the learning phase, we found that learning rates were reduced in instructed, relative to free, choices. Learning rates were further reduced by conflict between an instruction and subjective action values, whereas learning was not robustly influenced by conflict between one's actions and external distractors. Our results show that the subjective cognitive control costs linked to conflict factor into value-based decision-making, and highlight that different types of conflict may have different effects on learning about action outcomes.
Depression is characterized by a marked decrease in social interactions and blunted sensitivity to rewards. Surprisingly, despite the importance of social deficits in depression, non-social aspects have been disproportionately investigated. As a consequence, the cognitive mechanisms underlying atypical decision-making in social contexts in depression are poorly understood. In the present study, we investigate whether deficits in reward processing interact with the social context and how this interaction is affected by self-reported depression and anxiety symptoms in the general population. Two cohorts of subjects (discovery and replication sample: N = 50 each) took part in an experiment involving reward learning in contexts with different levels of social information (absent, partial and complete). Behavioral analyses revealed a specific detrimental effect of depressive symptoms, but not anxiety, on behavioral performance in the presence of social information, i.e. when participants were informed about the choices of another player. Model-based analyses further characterized the computational nature of this deficit as a negative audience effect, rather than a deficit in the way others' choices and rewards are integrated in decision-making. To conclude, our results shed light on the cognitive and computational mechanisms underlying the interaction between social cognition, reward learning and decision-making in depressive disorders.

Author summary. Blunted sensitivity to rewards is at the core of depression. However, studies that investigated the influence of depression on decision-making have often done so in asocial contexts, thereby providing only partial insights into the way depressive disorders impact the underlying cognitive processes. Indeed, atypical social functioning is also a central characteristic of depression. Here, we aimed at integrating the social component of depressive disorders into the study of decision-making in depression. To do so, we measured the influence of self-reported depressive symptoms on social learning in participants performing an online experiment. Our study shows that depressive symptoms are associated with decreased performance only when participants are informed about the actions of another player. Computational characterizations of this effect reveal that participants with more ...
The ability to correctly estimate the probability of one's choices being correct is fundamental to optimally re-evaluate previous choices or to arbitrate between different decision strategies. Experimental evidence nonetheless suggests that this metacognitive process, confidence judgment, is susceptible to numerous biases. Here, we investigate the effect of outcome valence (gains or losses) on confidence while participants learned stimulus-outcome associations by trial and error. In two experiments, participants were more confident in their choices when learning to seek gains compared to avoiding losses, despite equal difficulty and performance between those two contexts. Computational modelling revealed that this bias is driven by the context value, a dynamically updated estimate of the average expected value of choice options, necessary to explain equal performance in the gain and loss domains. The biasing effect of context value on confidence, revealed here for the first time in a reinforcement-learning context, is therefore domain-general, with likely important functional consequences. We show that one such consequence emerges in volatile environments, where the (in)flexibility of individuals' learning strategies differs when outcomes are framed as gains or losses. Despite apparently similar behavior, profound asymmetries might therefore exist between learning to avoid losses and learning to seek gains.

Author summary. In order to arbitrate between different decision strategies, as well as to inform future choices, a decision maker needs to estimate the probability of her choices being correct as precisely as possible. Surprisingly, this metacognitive operation, known as confidence judgment, has not been systematically investigated in the context of simple instrumental-learning tasks. Here, we assessed how confident individuals are in their choices when learning stimulus-outcome associations by trial and error to maximize gains or to ...
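One way to picture the reported bias is a confidence read-out that tracks decision evidence but is additionally shifted by the context value; the functional form and parameters below are illustrative assumptions, not the fitted model.

```python
# Minimal sketch: confidence follows the evidence for the chosen option but is
# pushed up or down by the context value, so equal accuracy in gain and loss
# contexts can still yield unequal confidence. Parameter names are assumed.
import math

def confidence(q_chosen, q_unchosen, context_value, slope=3.0, bias=0.5):
    evidence = q_chosen - q_unchosen            # decision evidence
    biased = slope * evidence + bias * context_value
    return 1.0 / (1.0 + math.exp(-biased))      # map to [0, 1]

# Same evidence, opposite context values (gain vs. loss framing)
print(confidence(0.7, 0.3, context_value=+0.5))    # gain context -> higher confidence
print(confidence(-0.3, -0.7, context_value=-0.5))  # loss context -> lower confidence
```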
Researchers in psychology have long acknowledged the importance of building and testing theories that account for both the typical behaviour observed in a representative sample of the population and the observed differences between people (1-3). Since the mid-twentieth century, scientific psychology has benefited from important complementary insights from experimental psychology, which studies variance among treatments, and from correlational psychology, which studies variance among participants. Similarly, nowadays, understanding the average typical brain and understanding the differences between individuals constitute the two complementary goals of cognitive neuroscience (4,5). Inter-individual differences in neural activities can be a source of statistical noise when considering the typical brain, but may also represent the very object of interest (6-9) and can help provide an accurate and representative picture of brain function (10). Across the whole spectrum of neuroscience subfields, understanding how differences in neural activity across individuals produce differences in behavioural responses appears necessary, not only to test key predictions of neurobiological theories, but also to realize the potential of neuroimaging applications. For instance, developmental neuroscience and the neuroscience of ageing rely, by nature, on the comparison of different individuals characterized by different ages or life histories (11). Likewise, some neurobiological concepts, like cognitive reserve, are entirely designed to explain differences in symptoms between individuals faced with the same neural pathology (12,13). Inter-individual differences are also important in neuroscience subfields investigating cognitive processes such as learning (14) or executive control (6), where the neural data could shed light on why some individuals perform better than others. Regarding applications, clinical diagnostics in psychiatry are expected to greatly benefit from the joint analysis of individual behaviour and brain activity, as such complementary techniques will allow doctors to better dissociate between neurotypical and affected cases (15-17). The most promising socioeconomic applications of neuroimaging, such as the characterization of individual preferences and cognitive abilities, also critically depend on our ability to understand how inter-individual differences in brain functions relate to inter-individual differences in behaviour (18-22). One appealing strategy to investigate how inter-individual differences in brain functions relate to inter-individual differences in behaviour involves task-related functional MRI (fMRI). Task-related fMRI is claimed to be able to target the mechanisms underpinning cognitive processes, because, unlike other biomarkers such as genetics, neuroanatomy, or measures estimated from resting-state functional imaging (8,23), it allows measuring the neural activity directly elicited by the cognitive processes of interest (16,24). This is particularly true when fMRI is combined with computational modelling, an approach called model-based fMRI, as mechanistic measures of cognitive function are explicitly incorporated in the analysis framework in the form of computational variables (16,25-28). In the following section, we develop a concrete example of inter-individual difference analyses in task-related fMRI. This example is inspired by the human reinforcement-learning literature, as it is one of the most typical examples of model-based fMRI (25,29,30). We then use this example to expose and discuss important assumptions and requirements underlying the standard inter-individual brain-behaviour differences (IBBD) analytical strategy.

An IBBD analysis example from human reinforcement learning. Reinforcement learning, i.e., learning by trial and error, is thought to be a fundamental cognitive building block and is used to achieve behavioural goals ranging from tuning motor actions to making decisions in social contexts (31,32). Reinforcement learning is one of the ...

Explaining and predicting individual behavioural differences induced by clinical and social factors constitutes one of the most promising applications of neuroimaging. In this Perspective, we discuss the theoretical and statistical foundations of the analyses of inter-individual differences in task-related functional neuroimaging. Leveraging a five-year literature review (July 2013-2018), we show that researchers often assess how activations elicited by a variable of interest differ between individuals. We argue that the rationale for such analyses, typically grounded in resource theory, offers an over-large analytical and interpretational flexibility that undermines their validity. We also recall how, in the established framework of the general linear model, inter-individual differences in behaviour can act as hidden moderators and spuriously induce differences in activations. We conclude with a set of recommendations and directions, which we hope will contribute to improving the statistical validity and the neurobiological interpretability of inter-individual difference analyses in task-related functional neuroimaging.
In uncertain environments, decision-makers can learn rewarding actions by trial and error to maximize their expected payoff (Fig. 1a). An important challenge is that reward contingencies typically change over time, and thus a less-rewarded action at a given point in time can become more rewarding later (Fig. 1b). Versatile machine learning algorithms, known collectively as reinforcement learning (RL), describe the changing values of possible actions and the policy used to choose among them (1). One biologically plausible class of RL models updates the expected values associated with possible actions sequentially based on the prediction error between obtained and expected reward, a learning scheme known as the Rescorla-Wagner rule (2). At any given time point, the decision-maker chooses on the basis of the difference in expected value between possible actions, by selecting the action associated with the largest expected reward. However, in volatile environments in which reward contingencies change rapidly over time, human decision-makers make a substantial number of 'non-greedy' decisions that do not maximize the expected value predicted by reinforcement learning (3) (in contrast to value-maximizing, 'greedy' decisions). Prominent theories describe these non-greedy decisions as the result of a compromise between exploiting a currently well-valued action and exploring other, possibly better-valued actions, known as the exploration-exploitation trade-off. In this view, information seeking motivates non-greedy decisions. Indeed, for a value-maximizing agent, lower-valued actions are chosen less often, and thus their expected values are more uncertain than those of higher-valued actions. Non-greedy decisions in favor of recently unchosen actions thus reduce uncertainty about their current value and increase long-term payoff (4). An important, implicit corollary of this view is that the underlying RL process updates action values without any internal variability after each obtained reward. However, it has recently been shown that the accuracy of human perceptual decisions based on multiple sensory cues is bounded not by variability in the choice process, but rather by inference noise arising during the accumulation of evidence (5,6). An intriguing possibility is that the learning process at the center of reward-guided decision-making might be subject to the same kind of computational noise, in this case random variability in the update of action values (Fig. 1c). Critically, the existence of intrinsic noise in RL would trigger non-greedy decisions owing to random deviations between exact applications of the learning rule and its noisy realizations following each obtained reward. In this view, an unknown fraction of non-greedy decisions would not result from overt information seeking during choice, as assumed by existing theories and computational models, but from the limited precision of the underlying learning process. To determine whether, and to what extent, learning noise drives non-greedy decisions during reward-guided decision-making, we first derived a theoretical formulation of RL that allows for random noise in its core computations. In a series of behavioral and neuroimaging experiments, tested on a total of 90 human participants, we then quantified the fraction of non-greedy decisions that could be attributed to learning noise, and identified its neurophysiological substrates using functional magnetic resonance imaging (fMRI) and pupillometric recordings.

Results. Experimental protocol and computational model. We designed a restless, two-armed bandit game. In three experiments, human participants were asked to maximize their monetary payoff by sampling repeatedly from one of two reward sources depicted by colored shapes (Fig. 1a; see Methods). The payoffs that could be obtained from either shape were sampled from probability ...

When learning the value of actions in volatile environments, humans often make seemingly irrational decisions that fail to maximize expected value. We reasoned that these 'non-greedy' decisions, instead of reflecting information seeking during choice, may be caused by computational noise in the learning of action values. Here, using reinforcement learning models of behavior and multimodal neurophysiological data, we show that the majority of non-greedy decisions stem from this learning noise. The trial-to-trial variability of sequential learning steps and their impact on behavior could be predicted both by blood-oxygen-level-dependent responses to obtained rewards in the dorsal anterior cingulate cortex and by phasic pupillary dilation, suggestive of neuromodulatory fluctuations driven by the locus coeruleus-norepinephrine system. Together, these findings indicate that most behavioral variability, rather than reflecting human exploration, is due to the limited computational precision of reward-guided learning.
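The core idea of the computational-noise account can be sketched as a Rescorla-Wagner update whose realization is corrupted by random noise; the Gaussian noise model and parameter values below are assumptions for illustration, not the paper's exact formulation.

```python
# Sketch of a noisy Rescorla-Wagner update: each value update is a noisy realization
# of the exact learning rule, so some 'non-greedy' choices can follow from learning
# noise rather than deliberate exploration. Noise form and values are assumed.
import numpy as np

def noisy_rw_step(Q, action, reward, alpha=0.5, noise_sd=0.2, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    exact_update = alpha * (reward - Q[action])
    Q = Q.copy()
    Q[action] += exact_update + rng.normal(0.0, noise_sd)   # noisy realization
    return Q

rng = np.random.default_rng(1)
Q = np.array([0.6, 0.4])
Q = noisy_rw_step(Q, action=0, reward=0.0, rng=rng)
# A 'non-greedy' choice occurs whenever the noisy values reverse the true ordering.
print(Q, "greedy action:", int(np.argmax(Q)))
```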
Reinforcement learning (RL) models describe how humans and animals learn by trial-and-error to select actions that maximize rewards and minimize punishments. Traditional RL models focus exclusively on choices, thereby ignoring the interactions between choice preference and response time (RT), or how these interactions are influenced by contextual factors. However, in the field of perceptual decision-making, such interactions have proven to be important to dissociate between different underlying cognitive processes. Here, we investigated such interactions to shed new light on overlooked differences between learning to seek rewards and learning to avoid losses. We leveraged behavioral data from four RL experiments, which feature manipulations of two factors: outcome valence (gains vs. losses) and feedback information (partial vs. complete feedback). A Bayesian meta-analysis revealed that these contextual factors differently affect RTs and accuracy: While valence only affects RTs, feedback information affects both RTs and accuracy. To dissociate between the latent cognitive processes, we jointly fitted choices and RTs across all experiments with a Bayesian, hierarchical diffusion decision model (DDM). We found that the feedback manipulation affected drift rate, threshold, and non-decision time, suggesting that it was not a mere difficulty effect. Moreover, valence affected non-decision time and threshold, suggesting a motor inhibition in punishing contexts. To better understand the learning dynamics, we finally fitted a combination of RL and DDM (RLDDM). We found that while the threshold was modulated by trial-specific decision conflict, the non-decision time was modulated by the learned context valence. Overall, our results illustrate the benefits of jointly modeling RTs and choice data during RL, to reveal subtle mechanistic differences underlying decisions in different learning contexts.
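A toy version of the RLDDM idea described above: the trial-wise drift rate scales with the learned value difference, the decision threshold varies with conflict, and valence could enter through the non-decision time. The mapping functions and numbers are illustrative assumptions, not the fitted hierarchical model.

```python
# Illustrative single-trial diffusion simulation driven by learned values:
# value difference -> drift rate; decision conflict -> threshold; non-decision time
# added at the end. Parameter values and mappings are assumptions.
import numpy as np

def ddm_trial(q_left, q_right, v_scale=2.0, base_threshold=1.0,
              conflict_gain=0.5, non_decision=0.3, dt=0.001, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    drift = v_scale * (q_left - q_right)              # value difference drives drift
    conflict = 1.0 - abs(q_left - q_right)            # high when values are similar
    a = base_threshold + conflict_gain * conflict     # wider bounds under conflict
    x, t = 0.0, 0.0
    while abs(x) < a:
        x += drift * dt + rng.normal(0.0, np.sqrt(dt))
        t += dt
    choice = "left" if x > 0 else "right"
    return choice, t + non_decision

print(ddm_trial(0.7, 0.4, rng=np.random.default_rng(2)))
```

Fitting such a joint model to choices and RTs is what lets threshold, drift and non-decision time absorb different experimental effects, rather than attributing everything to a single difficulty parameter.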
In economics and perceptual decision-making contextual effects are well documented, where decision weights are adjusted as a function of the distribution of stimuli. Yet, in reinforcement learning literature whether and how contextual... more
In economics and perceptual decision-making contextual effects are well documented, where decision weights are adjusted as a function of the distribution of stimuli. Yet, in reinforcement learning literature whether and how contextual information pertaining to decision states is integrated in learning algorithms has received comparably little attention. Here, we investigate reinforcement learning behavior and its computational substrates in a task where we orthogonally manipulate outcome valence and magnitude, resulting in systematic variations in state-values. Model comparison indicates that subjects' behavior is best accounted for by an algorithm which includes both reference point-dependence and range-adaptation-two crucial features of state-dependent valuation. In addition, we find that state-dependent outcome valuation progressively emerges, is favored by increasing outcome information and correlated with explicit understanding of the task structure. Finally, our data clearly show that, while being locally adaptive (for instance in negative valence and small magnitude contexts), state-dependent valuation comes at the cost of seemingly irrational choices, when options are extrapolated out from their original contexts.
Money is a fundamental and ubiquitous institution in modern economies. However, the question of its emergence remains a central one for economists. The monetary search-theoretic approach studies the conditions under which commodity money emerges as a solution to override frictions inherent to interindividual exchanges in a decentralized economy. Although, among these conditions, agents' rationality is classically essential and a prerequisite to any theoretical monetary equilibrium, human subjects often fail to adopt optimal strategies in tasks implementing a search-theoretic paradigm when these strategies are speculative, i.e., when they involve the use of a costly medium of exchange to increase the probability of subsequent and successful trades. In the present work, we hypothesize that implementing such speculative behaviors relies on reinforcement learning instead of lifetime utility calculations, as supposed by classical economic theory. To test this hypothesis, we operationalized the Kiyotaki and Wright paradigm of money emergence in a multistep exchange task and fitted behavioral data from human subjects performing this task with two reinforcement learning models, each implementing a distinct cognitive hypothesis regarding the weight of future or counterfactual rewards in current decisions. We found that both models outperformed theoretical predictions about subjects' behaviors regarding the implementation of speculative strategies, and that the latter relies on the degree to which opportunity costs are considered in the learning process. Speculating about the marketability advantage of money thus seems to depend on mental simulations of counterfactual events that agents perform in exchange situations.

Keywords: search-theoretic model | reinforcement learning | speculative behavior | opportunity cost

Money is both a very complex social phenomenon and easy to manipulate in everyday basic transactions. It is an institutional solution to common frictions in an exchange economy, such as the absence of double coincidence of wants between traders (1). It is of widespread use despite its being dominated in terms of rate of return by all other assets (2). However, it can be speculatively used in a fundamental sense: its economically dominated holding can be justified by the anticipation of future trading opportunities that are not available at the present moment but will necessitate this particular holding. In this study, we concentrate on a paradigm of commodity-money emergence in which one of the goods exchanged in the economy becomes the selected medium of exchange despite its storage being costlier than that of any other good. This is typical monetary speculation, in contrast to other types of speculation, which consist in expecting an increased market price of a good in the future. The price of money does not vary: only the opportunity that it can afford in the future does. This seems to us to be an important feature of speculative economic behavior, relative to the otherwise apparently irrational holding of such a good. We study whether individuals endowed with some information about future exchange opportunities will tend to consider a financially dominated good as a medium for exchange. Modern behaviorally founded theories of the emergence of money and monetary equilibrium (3, 4) are jointly based on the idea of minimizing a trading search process and on individual choices of accepting, declining, or postponing immediate exchanges at different costs incurred.
We focus on an influential paradigm by Kiyotaki and Wright (4) (KW hereafter), in which the individual choice of accepting temporarily costly exchanges, due to the anticipation of later better trading opportunities, is precisely stylized as a speculative behavior and yields a corresponding monetary equilibrium. The environment of this paradigm consists of N agents specialized in terms of both consumption and production in such a manner that there is initially no double coincidence of wants. Frictions in the exchange process create a necessity for at least some of the agents to trade for goods that they neither produce nor consume, which are then used as media of exchange. The ultimate goal of agents, that is, to consume, may then require multiple steps to be achieved. The most interesting part is that in some configurations, the optimal medium of exchange (i.e., the good that maximizes expected utility because of its relatively

Significance
In the present study, we applied reinforcement learning models that are not classically used in experimental economics to a multistep exchange task derived from a classic search-theoretic paradigm for the emergence of money. This method allowed us to highlight the importance of counterfactual feedback processing of opportunity costs in the learning process underlying the speculative use of money, and the predictive power of reinforcement learning models for multistep economic tasks. Those results constitute a step toward understanding the learning processes at work in multistep economic decision-making and the cognitive microfoundations of the use of money.
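A minimal sketch of the general idea, assuming a simple two-action framing (accept or decline an exchange): alongside the standard update for the obtained outcome, the forgone outcome (the opportunity cost) also updates the value of the non-chosen action, with a weight omega controlling how much counterfactual information enters learning. This is an illustrative approximation, not the exact models fitted in the paper.

```python
def update_with_opportunity_cost(q, chosen, obtained, forgone, alpha=0.2, omega=0.5):
    """Delta-rule update of the chosen action, plus a counterfactual update of
    the forgone action weighted by omega (the degree to which opportunity
    costs are taken into account)."""
    other = 1 - chosen
    q[chosen] += alpha * (obtained - q[chosen])
    q[other] += omega * alpha * (forgone - q[other])
    return q

# accepting the costly medium of exchange paid off less now (0.2) than the
# counterfactual payoff of holding one's production good would have (0.5)
q = update_with_opportunity_cost([0.0, 0.0], chosen=0, obtained=0.2, forgone=0.5)
```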
Investigating the bases of inter-individual differences in risk-taking is necessary to refine our cognitive and neural models of decision-making and to ultimately counter risky behaviors in real-life policy settings. However, recent evidence suggests that behavioral tasks fare poorly compared to standard questionnaires when it comes to measuring individual differences in risk-taking. Crucially, using model-based measures of risk-taking does not seem to improve reliability. Here, we put forward two possible (not mutually exclusive) explanations for these results and suggest future avenues of research to improve the assessment of inter-individual differences in risk-taking by combining repeated online testing with mechanistic computational models.
Keywords: Language | Huntington's disease | Basal ganglia | Brain mapping

Though accumulating evidence indicates that the striatum is recruited during language processing, the specific function of this subcortical structure in language remains to be elucidated. To answer this question, we used Huntington's disease (HD) as a model of striatal lesion. We investigated the morphological deficit of 30 early HD patients with a novel linguistic task that can be modeled within an explicit theory of linguistic computation. Behavioral results reflected an impairment of HD patients on the linguistic task. A computational model-based analysis compared the behavioral data to simulated data from two distinct lesion models, a selection deficit model and a grammatical deficit model. This analysis revealed that the impairment derives from increased randomness in the process of selecting between grammatical alternatives, rather than from a disruption of grammatical knowledge per se. Voxel-based morphometry allowed us to relate this impairment to dorsal striatal degeneration. We thus show that the striatum plays a role in the selection of linguistic alternatives, just as it does in the selection of motor and cognitive programs.
The extent to which subjective awareness influences reward processing, and thereby affects future decisions, is currently largely unknown. In the present report, we investigated this question in a reinforcement learning framework, combining perceptual masking, computational modeling, and electroencephalographic recordings (human male and female participants). Our results indicate that degrading the visibility of the reward decreased, without completely obliterating, the ability of participants to learn from outcomes, but concurrently increased their tendency to repeat previous choices. We dissociated electrophysiological signatures evoked by the reward-based learning processes from those elicited by the reward-independent repetition of previous choices and showed that these neural activities were significantly modulated by reward visibility. Overall, this report sheds new light on the neural computations underlying reward-based learning and decision-making and highlights that awareness is beneficial for the trial-by-trial adjustment of decision-making strategies.
Value-based decision-making involves trading off the cost associated with an action against its expected reward. Research has shown that both physical and mental effort constitute such subjective costs, biasing choices away from effortful actions, and discounting the value of obtained rewards. Facing conflicts between competing action alternatives is considered aversive, as recruiting cognitive control to overcome conflict is effortful. Yet, it remains unclear whether conflict is also perceived as a cost in value-based decisions. The present study investigated this question by embedding irrelevant distractors (flanker arrows) within a reversal-learning task, with intermixed free and instructed trials. Results showed that participants learned to adapt their choices to maximize rewards, but were nevertheless biased to follow the suggestions of irrelevant distractors. Thus, the perceived cost of being in conflict with an external suggestion could sometimes trump internal value representations. By adapting computational models of reinforcement learning, we assessed the influence of conflict at both the decision and learning stages. Modelling the decision showed that conflict was avoided when evidence for either action alternative was weak, demonstrating that the cost of conflict was traded off against expected rewards. During the learning phase, we found that learning rates were reduced in instructed, relative to free, choices. Learning rates were further reduced by conflict between an instruction and subjective action values, whereas learning was not robustly influenced by conflict between one's actions and external distractors. Our results show that the subjective cost of conflict factors into value-based decision-making, and highlight that different types of conflict may have different effects on learning about action outcomes.
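One way to picture the decision-stage result, as a hedged sketch rather than the exact model used in the study: add a small bias toward the distractor-congruent action inside a softmax over learned values, so that the external suggestion wins only when the internal value difference is weak. The parameter kappa and the function name are assumptions made for illustration.

```python
import numpy as np

def choice_probs_with_conflict(q, distractor, beta=5.0, kappa=0.5):
    """Softmax over action values plus a bias toward the action suggested by an
    irrelevant distractor: when |q[0] - q[1]| is small, the kappa-weighted bias
    dominates, so conflict with the external suggestion is avoided precisely
    when internal evidence is weak."""
    bias = np.zeros(2)
    bias[distractor] = kappa          # the distractor 'suggests' one action
    u = beta * np.asarray(q) + bias
    e = np.exp(u - u.max())
    return e / e.sum()

# weak evidence: the distractor sways the choice
print(choice_probs_with_conflict([0.52, 0.48], distractor=1))
# strong evidence: internal values dominate
print(choice_probs_with_conflict([0.90, 0.10], distractor=1))
```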
Research on social influence has focused mainly on the target of influence (e.g., consumers and voters); thus, the cognitive and neurobiological underpinnings of the source of the influence (e.g., politicians and salesmen) remain unknown. Here, in a three-sided advice-giving game, two advisers competed to influence a client by modulating their own confidence in their advice about which lottery the client should choose. We report that the advisers' strategy depends on their level of influence on the client and on their merit relative to one another. Moreover, the blood-oxygenation-level-dependent (BOLD) signal in the temporo-parietal junction is modulated by the adviser's current level of influence on the client, and relative-merit prediction errors affect activity in the medial prefrontal cortex. Both types of social information modulate the ventral striatum response. By demonstrating what happens in our mind and brain when we try to influence others, these results begin to explain the biological mechanisms that shape inter-individual differences in social conduct.
Previous studies suggest that factual learning, that is, learning from obtained outcomes, is biased, such that participants preferentially take into account positive, as compared to negative, prediction errors. However, whether or not the prediction error valence also affects counterfactual learning, that is, learning from forgone outcomes, is unknown. To address this question, we analysed the performance of two groups of participants on reinforcement learning tasks using a computational model that was adapted to test if prediction error valence influences learning. We carried out two experiments: in the factual learning experiment, participants learned from partial feedback (i.e., the outcome of the chosen option only); in the counterfactual learning experiment, participants learned from complete feedback information (i.e., the outcomes of both the chosen and unchosen options were displayed). In the factual learning experiment, we replicated previous findings of a valence-induced bias, whereby participants learned preferentially from positive, relative to negative, prediction errors. In contrast, for counterfactual learning, we found the opposite valence-induced bias: negative prediction errors were preferentially taken into account, relative to positive ones. When considering valence-induced bias in the context of both factual and counterfactual learning, it appears that people tend to preferentially take into account information that confirms their current choice. While the investigation of decision-making biases has a long history in economics and psychology, learning biases have been much less systematically investigated. This is surprising, as most of the choices we deal with in everyday life are recurrent, thus allowing learning to occur and thereby influencing future decision-making. Combining behavioural testing and computational modeling, here we show that the valence of an outcome biases both factual and counterfactual learning. When considering factual and
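The valence-induced biases described above amount to a choice-confirmation pattern of learning rates, which can be sketched as follows: positive prediction errors on the chosen option and negative prediction errors on the unchosen option are weighted more heavily. The parameter names (alpha_conf, alpha_disc) are illustrative, not the exact parameterization fitted in these experiments.

```python
def confirmation_update(q_chosen, q_unchosen, r_chosen, r_unchosen,
                        alpha_conf=0.4, alpha_disc=0.1):
    """Valence-specific delta-rule updates. For the chosen option, positive
    (confirmatory) prediction errors are weighted more than negative ones; for
    the unchosen option the asymmetry is reversed, so outcomes confirming the
    choice dominate in both factual and counterfactual learning."""
    pe_c = r_chosen - q_chosen
    pe_u = r_unchosen - q_unchosen
    q_chosen += (alpha_conf if pe_c > 0 else alpha_disc) * pe_c
    q_unchosen += (alpha_disc if pe_u > 0 else alpha_conf) * pe_u
    return q_chosen, q_unchosen
```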
The dopamine partial agonist aripiprazole is increasingly used to treat pathologies for which other antipsychotics are indicated because it displays fewer side effects, such as sedation and depression-like symptoms, than other dopamine receptor antagonists. Previously, we showed that aripiprazole may protect motivational function by preserving reinforcement-related signals used to sustain reward maximization. However, the effect of aripiprazole on more cognitive facets of human reinforcement learning, such as learning from the forgone outcomes of alternative courses of action (i.e., counterfactual learning), is unknown. To test the influence of aripiprazole on counterfactual learning, we administered a reinforcement learning task that involves both direct learning from obtained outcomes and indirect learning from forgone outcomes to two groups of Gilles de la Tourette syndrome (GTS) patients, one consisting of patients who were completely unmedicated and the other consisting of patients who were receiving aripiprazole monotherapy, and to healthy subjects. We found that whereas learning performance improved in the presence of counterfactual feedback in both healthy controls and unmedicated GTS patients, this was not the case in aripiprazole-medicated GTS patients. Our results suggest that whereas aripiprazole preserves direct learning of action-outcome associations, it may impair more complex inferential processes, such as counterfactual learning from forgone outcomes, in GTS patients treated with this medication. Aripiprazole is a recently introduced antipsychotic medication with dopamine receptor partial agonist mechanisms 1. In schizophrenia patients, aripiprazole exhibited efficacy comparable to that of typical and atypical antipsychotics in the treatment of positive and negative symptoms as well as in the prevention of relapse 2, 3. Its efficacy has also been demonstrated in other neurological disorders for which antipsychotics are indicated, such as Gilles de la Tourette syndrome 4, 5. Its tolerability is often considered superior to that of typical antipsychotics, and it is associated with fewer adverse side effects, such as extrapyramidal and metabolic syndromes 6, 7. In GTS, aripiprazole is effective for suppressing tics while displaying a less severe side effect profile than dopamine receptor antagonists with regard to motivational deficits such as sedation and depressive reactions 8. Due to this advantageous cost-benefit trade-off, aripiprazole has become a widely prescribed treatment for schizophrenia and GTS. Pharmacological studies in humans suggest that dopamine receptor antagonist-induced sedation and depressive states may be the consequence of blunting of reward-related signals 9, 10. For instance, reward-seeking behaviour and reward prediction errors encoded in the ventral striatum were reduced by haloperidol administration in healthy volunteers 11. In contrast, both functions were preserved in GTS patients medicated with aripiprazole 12.
In the past decade, the field of cognitive science has seen exponential growth in the number of computational modeling studies. Previous work has indicated why and how candidate models of cognition should be compared by trading off their ability to predict the observed data against their complexity. However, the importance of falsifying candidate models in light of the observed data has been largely underestimated, leading to important drawbacks and unjustified conclusions. We argue here that the simulation of candidate models is necessary to falsify models and therefore to support the specific claims about cognitive function made by the vast majority of model-based studies. We propose practical guidelines for future research that combine model comparison and falsification.

Complementary Roles of Comparison and Falsification in Model Selection
Computational modeling has grown considerably in cognitive sciences in the past decade (Figure 1A). Computational models of cognition are also becoming increasingly central in neuroimaging and psychiatry as powerful tools for understanding normal and pathological brain function [1-5]. The importance of computational models in cognitive sciences and neurosciences is not surprising; because the core function of the brain is to process information to guide adaptive behavior, it is particularly useful to formulate cognitive theories in computational terms [6,7] (Box 1). Similarly to cognitive theories, computational models should be submitted to a selection process. We argue here that the current practice for model selection often omits a crucial step: model falsification (see Glossary). One universally recognized heuristic for theory selection is Occam's law of parsimony: 'pluralitas non est ponenda sine necessitate' (plurality is never to be posited without necessity). This principle dictates that among 'equally good' explanations of data, the less complex explanation should be held as true. More formally, a trade-off exists between the complexity of a given model (which specifically grows with its number of 'free' and adjustable parameters) and its goodness-of-fit (the likelihood of the observed data given the model). Different quantitative criteria (e.g., the Bayesian information criterion, the Bayes factor, and other approximations of the model evidence) have been proposed to take model parsimony into account when comparing different models. These criteria are based on the predictive performance of a model, in other words its ability to predict the observed data [8-11]. We refer to them as relative comparison criteria because they imply no absolute criterion for model selection or rejection. Following these criteria, the 'winning' (or 'best') model is the model with the strongest evidence (i.e., trading off goodness-of-fit against complexity) compared to rival models [8,12]. Various statistical methods can then be used to test whether there is significantly stronger evidence in favor of the winning model than for rival models.

Trends
Computational modeling has grown exponentially in cognitive sciences in the past decade. Model selection most often relies on evaluating the ability of candidate models to predict the observed data. The ability of a candidate model to generate a behavioral effect of interest is rarely assessed, but can be used as an absolute falsification criterion. Recommended guidelines for model selection should combine the evaluation of both the predictive and generative performance of candidate models.
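In code, the distinction between relative comparison criteria and an absolute falsification check might look like the sketch below: a BIC-style penalized fit on one hand, and a simulation-based test asking whether the fitted model reproduces the qualitative effect of interest on the other. This is a generic illustration under assumed function names, not the specific guidelines proposed in the paper.

```python
import numpy as np

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: penalizes goodness-of-fit by model complexity."""
    return -2.0 * log_likelihood + n_params * np.log(n_obs)

def falsification_check(simulate_model, effect_statistic, observed_effect,
                        n_simulations=1000, rng=None):
    """Generative (falsification) test: simulate the fitted model many times and
    ask how often it reproduces the sign of the behavioral effect of interest."""
    rng = rng or np.random.default_rng(0)
    sims = [effect_statistic(simulate_model(rng)) for _ in range(n_simulations)]
    return np.mean([np.sign(s) == np.sign(observed_effect) for s in sims])
```

A model that wins the BIC comparison but almost never reproduces the observed effect in simulation would, on this logic, still be falsified as an account of the behavior.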
When forming and updating beliefs about future life outcomes, people tend to consider good news and to disregard bad news. This tendency is assumed to support the optimism bias. Whether this learning bias is specific to 'high-level' abstract belief update or a particular expression of a more general 'low-level' reinforcement learning process is unknown. Here we report evidence in favour of the second hypothesis. In a simple instrumental learning task, participants incorporated better-than-expected outcomes at a higher rate than worse-than-expected ones. In addition, functional imaging indicated that inter-individual difference in the expression of optimistic update corresponds to enhanced prediction error signalling in the reward circuitry. Our results constitute a step towards the understanding of the genesis of optimism bias at the neurocomputational level.
Adolescence is a period of life characterised by changes in learning and decision-making. Learning and decision-making do not rely on a unitary system, but instead require the coordination of different cognitive processes that can be mathematically formalised as dissociable computational modules. Here, we aimed to trace the developmental time-course of the computational modules responsible for learning from reward or punishment, and learning from counterfactual feedback. Adolescents and adults carried out a novel reinforcement learning paradigm in which participants learned the association between cues and probabilistic outcomes, where the outcomes differed in valence (reward versus punishment) and feedback was either partial or complete (either the outcome of the chosen option only, or the outcomes of both the chosen and unchosen option, were displayed). Computational strategies changed during development: whereas adolescents' behaviour was better explained by a basic reinforcement learning algorithm, adults' behaviour integrated increasingly complex computational features, namely a counterfactual learning module (enabling enhanced performance in the presence of complete feedback) and a value contextualisation module (enabling symmetrical reward and punishment learning). Unlike adults, adolescent performance did not benefit from counterfactual (complete) feedback. In addition, while adults learned symmetrically from both reward and punishment, adolescents learned from reward but were less likely to learn from punishment. This tendency to rely on rewards and not to consider alternative consequences of actions might contribute to our understanding of decision-making in adolescence. We employed a novel learning task to investigate how adolescents and adults learn from reward versus punishment, and from counterfactual feedback about decisions. Computational analyses revealed that adults and adolescents did not implement the same algorithm to solve the learning task. In contrast to adults, adolescents' performance did not take into account counterfactual information; adolescents also learned preferentially to seek rewards.
Tics are sometimes described as voluntary movements performed in an automatic or habitual way. Here, we addressed the question of balance between goal-directed and habitual behavioural control in Gilles de la Tourette syndrome and formally tested the hypothesis of enhanced habit formation in these patients. To this aim, we administered a three-stage instrumental learning paradigm to 17 unmedicated and 17 antipsychotic-medicated patients with Gilles de la Tourette syndrome and matched controls. In the first stage of the task, participants learned stimulus-response-outcome associations. The subsequent outcome devaluation and ‘slip-of-action’ tests allowed evaluation of the participants’ capacity to flexibly adjust their behaviour to changes in action outcome value. In this task, unmedicated patients relied predominantly on habitual, outcome-insensitive behavioural control. Moreover, in these patients, the engagement in habitual responses correlated with more severe tics. Medicated patients performed at an intermediate level between unmedicated patients and controls. Using diffusion tensor imaging on a subset of patients, we also addressed whether the engagement in habitual responding was related to structural connectivity within cortico-striatal networks. We showed that engagement in habitual behaviour in patients with Gilles de la Tourette syndrome correlated with greater structural connectivity within the right motor cortico-striatal network. In unmedicated patients, stronger structural connectivity of the supplementary motor cortex with the sensorimotor putamen predicted more severe tics. Overall, our results indicate enhanced habit formation in unmedicated patients with Gilles de la Tourette syndrome. Aberrant reinforcement signals to the sensorimotor striatum may be fundamental for the formation of stimulus-response associations and may contribute to the habitual behaviour and tics of this syndrome.
Compared with reward seeking, punishment avoidance learning is less clearly understood at both the computational and neurobiological levels. Here we demonstrate, using computational modelling and fMRI in humans, that learning option values in a relative—context-dependent—scale offers a simple computational solution for avoidance learning. The context (or state) value sets the reference point to which an outcome should be compared before updating the option value. Consequently, in contexts with an overall negative expected value, successful punishment avoidance acquires a positive value, thus reinforcing the response. As revealed by post-learning assessment of options values, contextual influences are enhanced when subjects are informed about the result of the forgone alternative (counterfactual information). This is mirrored at the neural level by a shift in negative outcome encoding from the anterior insula to the ventral striatum, suggesting that value contextualization also limits the need to mobilize an opponent punishment learning system.
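A minimal sketch of the relative (context-dependent) value idea, with illustrative names and update order: a context value is learned alongside option values and serves as the reference point, so that in a loss context successfully avoiding a punishment yields a positive relative prediction error. The exact parameterization in the paper may differ.

```python
def relative_update(q_option, v_context, outcome, alpha_q=0.3, alpha_v=0.3):
    """Learn a context (state) value alongside option values and use it as the
    reference point: in a punishing context the state value becomes negative, so
    successfully avoiding a loss (outcome 0) produces a positive relative
    prediction error that reinforces the avoidance response."""
    v_context += alpha_v * (outcome - v_context)       # track the context value
    pe = (outcome - v_context) - q_option               # reference-dependent prediction error
    q_option += alpha_q * pe
    return q_option, v_context

# in a loss context (average outcome around -0.5), avoiding the loss (outcome 0)
# acquires a positive value
q, v = 0.0, -0.5
q, v = relative_update(q, v, outcome=0.0)
```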
A state of pathological uncertainty about environmental regularities might represent a key step in the pathway to psychotic illness. Early psychosis can be investigated in healthy volunteers under ketamine, an NMDA receptor antagonist. Here, we explored the effects of ketamine on contingency learning using a placebo-controlled, double-blind, crossover design. During functional magnetic resonance imaging, participants performed an instrumental learning task, in which cue-outcome contingencies were probabilistic and reversed between blocks. Bayesian model comparison indicated that in such an unstable environment, reinforcement learning parameters are downregulated depending on confidence level, an adaptive mechanism that was specifically disrupted by ketamine administration. Drug effects were underpinned by altered neural activity in a fronto-parietal network, which reflected the confidence-based shift to exploitation of learned contingencies. Our findings suggest that an early characteristic of psychosis lies in a persistent doubt that undermines the stabilization of behavioral policy resulting in a failure to exploit regularities in the environment.
Human decision-making arises from both reflective and reflexive mechanisms, which underpin goal-directed and habitual behavioural control. Computationally, these two systems of behavioural control have been described by different learning algorithms, model-based and model-free learning, respectively. Here, we investigated the effect of diminished serotonin (5-hydroxytryptamine) neurotransmission using dietary tryptophan depletion (TD) in healthy volunteers on the performance of a two-stage decision-making task, which allows discrimination between model-free and model-based behavioural strategies. A novel version of the task was used, which not only examined choice balance for monetary reward but also for punishment (monetary loss). TD impaired goal-directed (model-based) behaviour in the reward condition, but promoted it under punishment. This effect on appetitive and aversive goal-directed behaviour is likely mediated by alteration of the average reward representation produced by TD, which is consistent with previous studies. Overall, the major implication of this study is that serotonin differentially affects goal-directed learning as a function of affective valence. These findings are relevant for a further understanding of psychiatric disorders associated with breakdown of goal-directed behavioural control such as obsessive-compulsive disorders or addictions.
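The discrimination between model-free and model-based control in such two-stage tasks is typically formalized as a weighted mixture of the two controllers at the first stage. The sketch below shows that mixture in schematic form; the weight w, the array shapes, and the function name are assumptions made for illustration, not the exact model fitted in this study.

```python
import numpy as np

def hybrid_first_stage_values(q_mf, q_stage2, transition_probs, w=0.5):
    """Mix model-free and model-based values for the two first-stage actions.
    q_mf: (2,) model-free values; q_stage2: (2 states, 2 actions) second-stage
    values; transition_probs: (2 actions, 2 states) transition model;
    w: weight on the model-based controller (w=1 is fully model-based)."""
    q_mb = transition_probs @ q_stage2.max(axis=1)   # expected best 2nd-stage value
    return w * q_mb + (1 - w) * q_mf

q_net = hybrid_first_stage_values(
    q_mf=np.array([0.4, 0.6]),
    q_stage2=np.array([[0.7, 0.2], [0.3, 0.5]]),
    transition_probs=np.array([[0.7, 0.3], [0.3, 0.7]]),
    w=0.6,
)
```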
The mechanisms of reward maximization have been extensively studied at both the computational and neural levels. By contrast, little is known about how the brain learns to choose the options that minimize action cost. In principle, the brain could have evolved a general mechanism that applies the same learning rule to the different dimensions of choice options. To test this hypothesis, we scanned healthy human volunteers while they performed a probabilistic instrumental learning task that varied in both the physical effort and the monetary outcome associated with choice options. Behavioral data showed that the same computational rule, using prediction errors to update expectations, could account for both reward maximization and effort minimization. However, these learning-related variables were encoded in partially dissociable brain areas. In line with previous findings, the ventromedial prefrontal cortex was found to positively represent expected and actual rewards, regardless of effort. A separate network, encompassing the anterior insula, the dorsal anterior cingulate, and the posterior parietal cortex, correlated positively with expected and actual efforts. These findings suggest that the same computational rule is applied by distinct brain systems, depending on the choice dimension (cost or benefit) that has to be learned.
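A compact way to express the behavioral finding, with hypothetical names and the cost-benefit combination taken as a simple subtraction: the same prediction-error rule updates both the expected reward and the expected effort of an option.

```python
def update_reward_and_effort(q_reward, q_effort, reward, effort, alpha=0.3):
    """The same delta rule updates expectations on both choice dimensions; the
    net value entering the choice process is expected reward minus expected
    effort (the subtraction is an illustrative assumption)."""
    q_reward += alpha * (reward - q_reward)
    q_effort += alpha * (effort - q_effort)
    net_value = q_reward - q_effort
    return q_reward, q_effort, net_value
```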
Rewards have various effects on human behavior and multiple representations in the human brain. Behaviorally, rewards notably enhance response vigor in incentive motivation paradigms and bias subsequent choices in instrumental learning paradigms. Neurally, rewards affect activity in different fronto-striatal regions attached to different motor effectors, for instance in left and right hemispheres for the two hands. Here we address the question of whether manipulating reward-related brain activity has local or general effects, with respect to behavioral paradigms and motor effectors. Neuronal activity was manipulated in a single hemisphere using unilateral deep brain stimulation (DBS) in patients with Parkinson’s disease. Results suggest that DBS amplifies the representation of reward magnitude within the targeted hemisphere, so as to affect the behavior of the contralateral hand specifically. These unilateral DBS effects on behavior include both boosting incentive motivation and biasing instrumental choices. Furthermore, using computational modeling we show that DBS effects on incentive motivation can predict DBS effects on instrumental learning (or vice versa). Thus, we demonstrate the feasibility of causally manipulating reward-related neuronal activity in humans, in a manner that is specific to a class of motor effectors but that generalizes to different computational processes. As these findings proved independent from therapeutic effects on parkinsonian motor symptoms, they might provide insight into DBS impact on non-motor disorders, such as apathy or hypomania.
Approaching rewards and avoiding punishments are core principles that govern the adaptation of behavior to the environment. The machine learning literature has proposed formal algorithms to account for how agents adapt their decisions to optimize outcomes. In principle, these reinforcement learning models could be equally applied to positive and negative outcomes, i.e., rewards and punishments. Yet many neuroscience studies have suggested that reward and punishment learning might be underpinned by distinct brain systems. Reward learning has been shown to recruit midbrain dopaminergic nuclei and ventral prefrontostriatal circuits. The picture is less clear regarding the existence and anatomy of an opponent system: several hypotheses have been formulated for the neural implementation of punishment learning. In this chapter, we review the evidence for and against each hypothesis, focusing on human studies that compare the effects of neural perturbation, following drug administration and/or pathological conditions, on reward and punishment learning. 'Good and evil, reward and punishment, are the only motives to a rational creature: these are the spur and reins whereby all mankind are set on work, and guided.' These famous words by John Locke suggest that rewards and punishments are not on a continuum from positive to negative: they pertain to distinct categories of events that we can imagine or experience. Indeed, rewards and punishments trigger different kinds of subjective feelings (such as pleasure versus pain or desire versus dread) and elicit different types of behaviors (approach versus avoidance or invigoration versus inhibition). These considerations might suggest the idea that rewards and punishments are processed by different parts of the brain. In this chapter we examine this idea in the context of reinforcement learning, a computational process that could in principle apply equally to rewards and punishments. We start by summarizing the computational principles underlying reinforcement learning (Box 23.1 and Fig. 23.1) and by describing typical tasks that implement a comparison between reward and punishment learning (Box 23.2 and Fig. 23.2). Then we present the current hypotheses about the possible implementation of reward and punishment learning systems in the brain (Fig. 23.3). Last,

Box 23.1
The first reinforcement learning (RL) models come from the behaviorist tradition, in the form of mathematical laws describing learning curves [82] or formal descriptions of associative conditioning [2]. Subsequently, in the 1980s, the computational investigation of RL received a significant boost when it grabbed the attention of machine learning scholars, who were aiming at developing algorithms for goal-oriented artificial agents [1].
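For readers unfamiliar with the formalism summarized in Box 23.1, a minimal Q-learning agent with softmax action selection looks like the sketch below; in the symmetric case the same rule handles reward (+1) and punishment (-1) outcomes, which is precisely the assumption questioned in this chapter. Names and parameter values are illustrative.

```python
import numpy as np

def softmax_choice(q, beta, rng):
    """Softmax action selection over option values with inverse temperature beta."""
    u = beta * np.asarray(q)
    p = np.exp(u - u.max())
    p /= p.sum()
    return rng.choice(len(q), p=p)

def delta_rule_update(q, choice, outcome, alpha):
    """Rescorla-Wagner / delta-rule update of the chosen option's value."""
    q[choice] += alpha * (outcome - q[choice])
    return q

rng = np.random.default_rng(0)
q = np.zeros(2)
for t in range(200):
    a = softmax_choice(q, beta=3.0, rng=rng)
    outcome = 1.0 if rng.random() < (0.8 if a == 0 else 0.2) else -1.0
    q = delta_rule_update(q, a, outcome, alpha=0.1)
print("learned values:", q)
```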