Trends in
Cognitive Sciences
Review
The computational roots of positivity and
confirmation biases in reinforcement learning
Stefano Palminteri1,2,3,* and Maël Lebreton4,5,6,*
Humans do not integrate new information objectively: outcomes carrying a positive affective value and evidence confirming one's own prior beliefs are overweighed. Until recently, theoretical and empirical accounts of the positivity and confirmation biases assumed them to be specific to 'high-level' belief updates. We present evidence against this account. Learning rates in reinforcement learning (RL) tasks, estimated across different contexts and species, generally present the same characteristic asymmetry, suggesting that belief and value updating processes share key computational principles and distortions. This bias generates over-optimistic expectations about the probability of making the right choices and, consequently, over-optimistic reward expectations. We discuss the normative and neurobiological roots of these RL biases and their position within the greater picture of behavioral decision-making theories.

Highlights

Human belief updating is pervaded by distortions, such as positivity and confirmation bias.

Experimental evidence from a variety of tasks, collected in different mammal species, suggests that these biases also exist in simple reinforcement learning (RL) contexts.

Confirmatory RL generates over-optimistic reward expectations and aberrant preferred response rates.

Counter-intuitively, confirmatory RL exhibits statistical advantages over unbiased RL in a variety of learning contexts.

Confirmatory RL may contribute to diverse and apparently unrelated behavioral phenomena, such as stickiness to the status quo, overconfidence, and the persistence of (pathological) gambling.

From belief updating to reinforcement learning
Our decisions critically depend on the beliefs we have about the options available to us: their probability of occurrence, conditional on the actions that we undertake, and their value – that is, how good they are. It is therefore not surprising that an ever-growing literature in cognitive psychology and behavioral economics focuses on how humans form and update their beliefs. While Bayesian inference principles provide a normative solution for how beliefs can be optimally updated when we receive new information, in humans, belief-updating behaviors often deviate from this normative benchmark. Among the most prominent systematic deviations, the positivity and the confirmation biases (see Glossary) stand out for their pervasiveness and ecological relevance [1].
The positivity bias characterizes the fact that decision-makers tend to update their beliefs more
when new evidence conveys a positive valence [1,2]. This bias has notoriously been revealed in
situations where subjects learn something about themselves and preferentially integrate information that conveys a positive signal [e.g., a higher intelligence quotient (IQ) or a lower risk of disease]
[3–5]. The confirmation bias characterizes the fact that decision-makers tend to update their beliefs more when new evidence confirms their prior beliefs and past decisions compared with when
it disconfirms or contradicts them [1,6]. This bias can take many forms – extending to positive test
strategies and selective information sampling – and it has been robustly reported in a variety of
natural or laboratory experimental setups [6,7]. Of note, in most ecological settings, positivity
and confirmatory biases co-occur [8,9]. Indeed, unless a cogent experimental design carefully orthogonalizes them, we typically hold opinions and select actions that we believe have a positive
subjective value (e.g., a higher payoff in economic settings). Therefore, after a better than expected outcome, such actions result in a positive and confirmatory update [3,10,11].
To date, the dominant framework used to explain the existence and persistence of asymmetric
belief updating posits that such asymmetries stem from a 'rational' cost-benefit trade-off. The cost of holding
Trends in Cognitive Sciences, Month 2022, Vol. xx, No. xx

1 Laboratoire de Neurosciences Cognitives et Computationnelles, Institut National de la Santé et de la Recherche Médicale, Paris, France
2 Département d'Études Cognitives, Ecole Normale Supérieure, Paris, France
3 Université de Recherche Paris Sciences et Lettres, Paris, France
4 Paris School of Economics, Paris, France
5 LabNIC, Department of Fundamental Neurosciences, University of Geneva, Geneva, Switzerland
6 Swiss Center for Affective Science, Geneva, Switzerland

*Correspondence: stefano.palminteri@ens.fr (S. Palminteri) and mael.lebreton@pse.fr (M. Lebreton).

https://doi.org/10.1016/j.tics.2022.04.005
© 2022 Elsevier Ltd. All rights reserved.
objectively wrong (i.e., overly optimistic) beliefs is traded against the psychological benefits of
them being self-serving: believing in a world that is pleasant and reassuring per se (consumption
value) [2,12–15]. While originally designed to account for the positivity bias, this logic arguably extends to the confirmation bias, when one considers that being right (as signaled by confirmatory
information) is also valuable and self-serving (‘the ego utility consequences of being right’ [3]). Importantly, both original and more recent versions of this theoretical account of asymmetric belief
updating explicitly suggest that this class of learning biases is specific to high-level and ego-relevant beliefs [2,12,14,15], a position that seems supported by the fact that the positivity bias (as
opposed to the confirmatory bias) does not clearly extend to belief updates that are not ego-relevant, such as in purely financial contexts [10,16–19].
In the present article, we review recent empirical and modeling studies that challenge the standard account, and suggest that the asymmetries that affect high-level belief updates are shared
with more elementary forms of updates. This set of empirical findings cannot be purely explained
by the dominant, self-serving bias account of asymmetric updates and shows that some forms of
positivity and confirmatory biases occur across a wide variety of species and contexts.
Testing asymmetric updating in the reinforcement-learning framework
Arguably, the RL framework provides an ideal testbed for the most elementary forms of motivated belief updating. RL
characterizes the behavioral processes that consist of selecting among alternative courses of action, based on inferred economic (or affective) values that are learned by interacting with the environment [20]. In addition to being computationally simple, elegant, and tractable, the most
popular RL algorithms can solve (or be a core component of the solution to) higher-level cognitive
tasks, such as spatial navigation, games involving strategic interactions, and even complex video
games, thereby constituting an ideal basic building block for higher-level cognitive processes
[21,22].
The basic experimental framework of a two-armed bandit task (often referred to as a two-alternative forced-choice task) provides all the key elements necessary to assess the pervasiveness of
positivity and confirmation bias. In this simplified set-up, the decision-maker faces two neutral
cues, associated with different reward distributions (Figure 1A). In the most popular RL formalism,
the decision-maker learns, through an error-correction mechanism, to attach subjective values [Q(·)] to each option, which they use to make later choices (Figure 1B).
Concretely, once an option is chosen (‘c’), the decision-maker receives an outcome R. The outcome is compared with its subjective value, generating a prediction error
PE(c) = R(c) − Q(c)    [1]
The prediction error is then used to update the subjective value of the chosen option via an error
correction mechanism involving a weighting parameter, the learning rate:
Q(c) ← Q(c) + α · PE(c)    [2]
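As a minimal sketch (function and parameter names here are illustrative, not code from any cited study), Equations 1 and 2 amount to:

```python
def q_update(q_c, reward, alpha):
    """One Q-learning step for the chosen option.

    q_c    : current subjective value Q(c) of the chosen option
    reward : obtained outcome R(c)
    alpha  : learning rate, 0 < alpha <= 1
    """
    pe = reward - q_c          # prediction error (Equation 1)
    return q_c + alpha * pe    # error-correction update (Equation 2)

# Example: Q(c) = 0.5, R(c) = 1.0, alpha = 0.3
# PE = +0.5, so the value moves 30% of the way toward the outcome
new_q = q_update(0.5, 1.0, 0.3)
```

Higher values of alpha make the value estimate track recent outcomes more closely.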
Reframed in terms of belief updating, the magnitude of the prediction error quantifies how surprising the experienced outcome is, while its sign (positive or negative) specifies the valence of the
information carried by the experienced outcome. In other words, positive prediction errors follow
outcomes that are better than expected (i.e., they signal relative gains or good news), while negative prediction errors follow outcomes that are worse than expected (i.e., they signal relative
2
Trends in Cognitive Sciences, Month 2022, Vol. xx, No. xx
Glossary
Belief-confirmation bias: the tendency to overweight or selectively sample information that confirms our own
beliefs (‘what I believe is true’). Also
referred to as prior-biased updating,
belief perseverance, or conservatism,
among other nomenclatures.
Bias: a feature of a cognitive process that introduces systematic deviations between the state of the world and an internal representation.
Choice-confirmation bias: the tendency to overweight information that
confirms our own choice (‘what I did was
right’).
Learning rate: a model parameter that
traditionally indexes the extent to which
prediction errors affect future expectations.
Model comparison: a collection of methods aimed at determining the best model for a given dataset by combining model fitting and model simulations, to assess, respectively, the falsifiability of the rejected models and the parsimony of the accepted one.
Model fitting: a statistical method
aimed at estimating the values of model
parameters that maximize the likelihood
of observing the empirical data. Model
fitting is not to be confounded with model comparison (see earlier).
Positivity bias: the tendency to overweight events with a positive affective valence. In the specific context of RL, it would consist in overweighting positive prediction errors (regardless of whether they are associated with the chosen or the forgone option). The positivity bias is also sometimes referred to as the good news-bad news effect or preference-biased updating.
Prediction error: the discrepancy
between an expectation and the reality.
In the context of RL, prediction errors are
defined as the difference between an
expected and an obtained outcome and
they therefore have a valence: they are
positive when the outcome is better than
expected, and they are negative when
the outcome is worse than expected.
Figure 1. Typical behavioral task and computational reinforcement learning framework. (A) A typical trial of a two-armed bandit task. Both the partial and
complete feedback condition are presented. Labels in black indicate the objective steps of the trial, while labels in gray indicate the corresponding hidden cognitive
processes. (B) Box-and-arrow representation of a reinforcement learning model of a two-armed bandit task. The figure presents a complete feedback task, where both
the obtained [i.e., following the chosen option: R(c)] and forgone [i.e., following the unchosen option: R(u)] outcomes are displayed. The figure also presents a ‘full’
model with a learning rate specific to each combination of prediction error (PE) valence (positive ‘+’ or negative ‘–’) and relation to choice (chosen ‘c’ or unchosen ‘u’)
[45,46]. (C) A schematic of how the learning rates of the full model (i.e., a model with a different learning rate for each possible combination of outcome type and prediction error valence) relate to those of the confirmation bias model, which bundles together the learning rates for positive obtained and negative forgone (i.e., confirmatory, 'CON') prediction errors and the learning rates for negative obtained and positive forgone (i.e., disconfirmatory, 'DIS') prediction errors.
losses or bad news). In addition, a positive prediction error following the chosen option confirms
that the decision-maker was right to pick the current course of action (and the converse is true for
a negative prediction error). In theory, it is possible to define two different learning rates, following
these two types of prediction errors:
Q(c) ← Q(c) + α+ · PE(c), if PE(c) > 0
Q(c) ← Q(c) + α– · PE(c), if PE(c) < 0    [3]
As a consequence, in this simplified experimental and computational framework, an elementary
counterpart of both the positivity and confirmation bias should be reflected in a learning rate asymmetry – that is, in the fact that positive learning rates (α+) are higher than negative ones (α–). In the following
sections, we review evidence in favor of (or challenging) the hypothesis that updating biases analogous to the positivity bias and confirmation bias occur in simple RL tasks.
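Equation 3 can be sketched in code as follows (names and parameter values are illustrative; a positivity bias corresponds to alpha_pos > alpha_neg):

```python
def q_update_asymmetric(q_c, reward, alpha_pos, alpha_neg):
    """Update the chosen option's value with valence-dependent
    learning rates (Equation 3)."""
    pe = reward - q_c                            # prediction error
    alpha = alpha_pos if pe > 0 else alpha_neg   # rate depends on PE valence
    return q_c + alpha * pe

# Equal-sized good and bad surprises move the value by different amounts:
after_gain = q_update_asymmetric(0.5, 1.0, 0.6, 0.2)  # PE = +0.5
after_loss = q_update_asymmetric(0.5, 0.0, 0.6, 0.2)  # PE = -0.5
```

Over many trials, this asymmetry inflates the value estimates of repeatedly chosen options.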
Value update biases in reinforcement learning
Positivity bias in reinforcement learning
About 15 years ago, a few studies incidentally started fitting variants of the Q-learning model to
human data collected in simple RL tasks [23–27]. Notably, they fitted Q-learning models with separate learning rates depending on prediction error valence [Q(α±)]. Comparisons between the two
learning rates generally revealed a positivity bias (α+ > α–), although sometimes results were mixed
across groups or learning phases.
Arguably, a strong demonstration of a positivity bias requires three steps, which were usually absent in these incidental observations: first, the Q(α±) model should outperform the standard
model with one learning rate Q(α) in a stringent model comparison [24]; second, although
allowed to vary across individuals, the comparison of the two learning rates estimated from
model fitting should reveal a significant asymmetry on average, such as α+ > α–; third, behavioral
data should exhibit at least one qualitative pattern which falsifies the standard model, while being
explained by the Q(α±) model (see [28] and Box 1 for a survey of the behavioral signatures of the positivity bias). These three levels of demonstration were unambiguously achieved in a recent study
investigating asymmetric updating in a simple two-armed bandit task in humans [29]. The fact
that individuals update the option values more following positive rather than negative prediction
errors leads to an optimistic overestimation of reward expectations and a heightened probability of
selecting what the decision-maker believes is the best option. Importantly, the key aspects of
such optimistic RL were later replicated in fully incentivized experiments, which included various
types of outcome ranges, such as gain (+0.5€/0.0€), loss (0.0€/–0.5€), and mixed contexts (+0.5€/–0.5€) [29,30]. These results confirm that negative prediction errors are downweighted relative to positive prediction errors even when they are associated with actual monetary losses. Moreover, the positivity bias cannot be neutralized or reversed by increasing the saliency of negative outcomes (0.0€ ➔ –0.5€) or by decreasing the saliency of positive outcomes (+0.5€ ➔ 0.0€). Finally, this also tells us that the bias depends on the valence (or sign) of the prediction error and not on that of the outcome.
Generalizing the results
Since then, several other studies featuring different experimental designs also fitted the Q(α±) model, thus putting learning asymmetry to the test. In tasks featuring different regimes of outcome uncertainty, learning rates are typically adaptively modulated as a function of environmental volatility: learning rates in a volatile condition are higher than those in a stable condition
[31]. In addition to this adaptive modulation, a positivity bias can be observed in human participants in both the low- and high-volatility conditions [32]. When the same volatility task features
an ‘appetitive’ treatment (winning money vs. nothing) and an ‘aversive’ treatment (getting a mild
electric shock vs. nothing), a positivity bias is reported in human participants in all treatments (rewarding and aversive) and conditions (stable and volatile) (Figure 2A) [33].
The positivity bias in learning rates has been found beyond two-armed bandit task contexts, such
as in foraging situations [34], in multi-attribute RL (e.g., instantiated by Wisconsin Card Sorting
Test [35]), in strategic interactions and multistep decisions with delayed rewards [36], and in
learning transitivity relations [37].
These results suggest that the positivity bias is robust to major variations of experimental
protocols, from uncertainty about the outcomes (stable vs. volatile) to differences in the nature
of the outcomes themselves (e.g., primary, like electric shocks, or secondary, like money) and
the extension of the state-transition structure of the task beyond two-armed bandits. It is worth
noting, however, that on some occasions, studies failed to find a positivity bias or even reported
a negativity bias (α+ < α–) [38–43]. We argue that sources of such inconsistencies could
sometimes be found in specific choices concerning model specification that can hinder the
identification of a positivity bias (Box 2). Other features of the design, such as mixing instrumental
(or ‘free’) choices and Pavlovian (or ‘forced’) trials, may also have blurred the result (see the
following section).
Box 1. Behavioral signatures of positivity-biased update
Here, we illustrate some behavioral signatures that have been associated with the positivity bias in standard reinforcement
learning paradigms, focusing on two-armed bandit tasks with partial feedback (see Figure 1A in main text). A first signature
associated with positivity bias is reported in ‘stable’ bandits (i.e., situations where the option probabilities and values do not
change), specifically in situations where there is no correct option [23]. In such situations, the positivity bias predicts the
development of a preferred response rate to a much greater extent compared with the other learning rate patterns
(Figure IA). Another signature has been uncovered in ‘reversal’ bandits, that is, tasks where after some time the best option
becomes the worst and vice versa. In these situations, the positivity bias first generates a high correct response rate before
reversal, then induces a reluctance to switch toward the alternative option in the second phase (post reversal) [62–64]
(Figure IB). Both the development of a higher preferred response rate and the reluctance to reverse can be broadly understood as manifestations of the fact that the positivity bias induces choice inertia: here, the feedback that is supposed to make us change our policy is not taken into account [112]. A third signature of positivity bias, independent from the choice inertia
phenomenon, comes from bandits designed to assess risk preferences, by contrasting a risky option (i.e., the option with
variable outcome) to a safe one with similar expected value (Figure IC). Crucially, in these kinds of bandits, the alternative
patterns of learning rates (unbiased, α+ = α–; and negativity bias, α+ < α–) predict subjects to behave in a risk-avoidant
manner. Although prima facie counterintuitive, this result can be understood by considering that outcome sampling can
locally generate a negative expectation for the risky option, which may never be corrected (with partial feedback). The
positivity bias predicts a certain degree of risk-seeking behavior: a pattern that has often been observed in humans
[55,113] (albeit sometimes in interaction with the valence of the decision frame) and frequently in non-human primates
[90]. Finally, by inducing an overestimation of reward expectations, both positivity and choice-confirmation biases mechanistically inflate the subjective probability of making a correct choice. Not only is this prediction weakly confirmed
by the observation of widespread patterns of overconfidence in reinforcement learning tasks, but recent results also
suggest that individual levels of overconfidence and confirmatory learning are correlated [47,48].
Figure I. Behavioral signatures of biased updates. The panels display the two-armed bandit task contingencies (top)
and simulated choice rates (bottom) as a function of the trial number with three different models (unbiased α+ = α–,
positivity bias α+ > α–, and negativity bias α+ < α–). (A) Two-armed bandit task with stable contingencies and no correct
response (top) and preferred choice rate (bottom). The preferred choice rate is defined as the choice rate of the option
most frequently selected by the simulated subject – by definition, in more than 50% of trials [29,46]. (B) Reversal
learning task (top) and correct choice rate (bottom). (C) Risk preference task (top) and risky choice rate (bottom). The
curves are obtained by simulating the corresponding models using a very broad range of parameter values. For each
task (‘stable’, ‘reversal’, and ‘risk’) and model (‘unbiased’, ‘positivity’, and ‘negativity’), we simulated 10 000 agents;
decisions were implemented using a softmax decision rule. The parameters were drawn from uniform distributions
covering all possible values of learning rates to ensure the generality of the results. See github.com/spalminteri/valence_
bias_simulations for full details.
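The preferred-response-rate signature in panel A can be reproduced with a minimal simulation along these lines (a simplified sketch, not the authors' simulation code: the agent count, trial count, parameter values, and softmax temperature below are illustrative choices):

```python
import math
import random

def mean_preferred_rate(alpha_pos, alpha_neg, n_agents=500, n_trials=100, beta=5.0):
    """Mean preferred-choice rate in a stable two-armed bandit with no
    correct option: both options pay 1 with probability 0.5, else 0."""
    rates = []
    for _ in range(n_agents):
        q = [0.5, 0.5]          # option values, initialized at the true mean
        picks = [0, 0]
        for _ in range(n_trials):
            # softmax decision rule over the two option values
            p0 = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
            c = 0 if random.random() < p0 else 1
            picks[c] += 1
            r = 1.0 if random.random() < 0.5 else 0.0
            pe = r - q[c]
            q[c] += (alpha_pos if pe > 0 else alpha_neg) * pe
        rates.append(max(picks) / n_trials)   # preferred option: >= 50% by definition
    return sum(rates) / n_agents

random.seed(0)
unbiased = mean_preferred_rate(0.3, 0.3)     # alpha+ = alpha-
positivity = mean_preferred_rate(0.6, 0.1)   # alpha+ > alpha-
```

Comparing the two means illustrates the choice-inertia effect: the positivity-biased agents concentrate their choices on one of the two equivalent options much more than the unbiased agents do.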
From positivity to confirmatory bias
The studies surveyed so far all feature what is often referred to as partial feedback conditions, that
is, the standard situation where the subject is informed only about the outcome of the chosen option (Figure 1A and [44]). Critically, under this standard set-up, it is not possible to assess whether
Figure 2. Reinforcement learning biases across tasks, species, and outcome types. (A) The panel displays
learning rates from Gagne et al. [33] plotted as a function of the nature of the outcomes used in the task (appetitive/money
vs. aversive/electric shocks), the volatility of option-outcome contingencies (stable vs. volatile, as in [31]), and the
prediction error valence (positive ‘+’ vs. negative ‘–’). (B and C) The panel displays learning rates from Farashahi et al. [32]
(B) and Ohta et al. [54] (C) plotted as a function of the species (monkeys vs. rats), the volatility of option-outcome
contingencies, and the prediction error valence (positive '+' vs. negative '–'). (D and E) The panels display the choice-confirmation bias. The figure displays the learning rates from Chambon et al. [45] (experiment 2 in the paper) of a full
model (i.e., a model with a different learning rate for any possible combination of choices, outcomes, and prediction error
types) as a function of whether the outcome followed a free (or instrumental) or a forced (or observational) trial; whether
the outcome was associated with the obtained or forgone option and, finally, the valence of the prediction error (positive
‘+’ or negative ‘–’). The overall pattern is consistent with a choice-confirmation bias because positive obtained and
negative forgone prediction errors are overweighed only if they follow a free choice (D), but not after a forced choice
[observation trial; (E)]. Data visualization is as in [111]: horizontal lines represent the mean; the error bars represent the standard error of the mean; the box, the 95% confidence interval. Finally, the colored area is the distribution of the individual points.
the reported positivity bias actually reflects a saliency bias (‘all positive prediction errors are
overweighed’) or a choice-confirmation bias (‘only positive prediction errors following obtained
outcomes are overweighed’). To tease apart these interpretations, we conducted a series of
Box 2. Misidentifying asymmetric update
Assessing reinforcement learning update biases relies on estimating learning rates from choice data. Although the logic of
such inference is intuitive, fitting and interpreting parameters remain some of the trickiest analytical steps in computational
cognitive modeling [28,100,114]. Here, we discuss how the estimation of learning rate asymmetries can be affected by (or
mistaken for) apparently neutral choices of model specification and the omission of alternative computational processes.
For instance, although counterintuitive, Q value initialization markedly affects learning rate and learning bias estimates. The
reason is that the first prediction error plays a very important role in shaping all subsequent responses, especially in designs
involving a small number of trials and stable contingencies. For instance, pessimistic initializations (i.e., setting initial Q
values lower than the true default expectation) can counter, or reverse, genuine positivity biases, by artificially amplifying
the size of the first positive prediction error. Consequently, it is not surprising that many of the papers reporting a negativity
bias used pessimistic initialization [38–40], although not all (see [59]). Since the effect of priors vanishes after a few trials
and in volatile environments, tasks featuring long learning phases and variable contingencies are particularly well suited to
tease apart pessimistic initializations from positivity and confirmation biases [33].
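The initialization effect can be illustrated with a toy calculation (the outcome coding and 'true default expectation' value here are illustrative):

```python
# Outcomes are coded 1 (win) and 0 (loss), each occurring half the time,
# so the true default expectation of a naive agent is 0.5.
neutral_q0 = 0.5       # initialization at the true expectation
pessimistic_q0 = 0.0   # initialization below the true expectation

# First prediction error after an initial win:
first_pe_neutral = 1.0 - neutral_q0          # +0.5
first_pe_pessimistic = 1.0 - pessimistic_q0  # +1.0, twice as large

# When a fitted model is forced to use the pessimistic initialization, the
# inflated early positive updates it produces can be compensated by a lower
# estimated alpha+, masking or even reversing a genuine positivity bias.
```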
It has also recently been proposed that positivity and confirmation biases may spuriously arise when fitting separate learning rates in models that lack an explicit choice-autocorrelation term [104,112]. The choice-autocorrelation term is usually
modeled as a (fixed or graded) bias in the choice function toward the option that was previously chosen and is thought
to account for the development of a habitual process [115]. Intuitively, both processes naturally lead to a similar escalation of choice repetition, as successful learning increasingly identifies the best option (Box 1). Yet a crucial conceptual
difference is that the autocorrelation is independent of the outcome (i.e., of the prediction error). A recent meta-analysis
showed that in nine datasets the choice-confirmation bias is still detectable despite the inclusion of a choice-autocorrelation term [103]. It can be further argued that in the context of typically short learning tasks (less than 1 h), developing a strong outcome-independent habit is unlikely. As a consequence, it is possible that studies fitting an explicit choice-autocorrelation term actually missed occurrences of positivity and confirmation biases [27,116–118]. Tasks contrasting riskier and safer options can tease apart these competing accounts, because only the positivity and confirmation biases predict a preference for the riskier (high variance) options (Box 1) [55,90].
studies leveraging complete feedback conditions, which consist of also displaying the forgone (or counterfactual) outcome, that is, the outcome associated with the unchosen option in a two-armed bandit task [45,46]. Under the saliency bias hypothesis, one expects larger learning
rates for positive prediction errors, independent of them being associated with the chosen or
unchosen option. Under the confirmation bias hypothesis, one expects an interaction between
the valence of the prediction error and its association with the chosen or the unchosen option
(Figure 1B). The rationale is that a better-than-expected forgone outcome can be interpreted
as a relative loss, as it indicates that the alternative course of action could have been beneficial
(a disconfirmatory signal). Symmetrically, a worse-than-expected forgone outcome can be
interpreted as a relative gain as it indicates that the current course of action is advantageous (a
confirmatory signal).
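Under complete feedback, a confirmation-bias model of this kind updates both options; a minimal sketch (function and parameter names are illustrative):

```python
def confirmation_update(q_c, q_u, r_c, r_u, alpha_con, alpha_dis):
    """One complete-feedback update with confirmatory and disconfirmatory
    learning rates; a choice-confirmation bias means alpha_con > alpha_dis.

    q_c, q_u : values of the chosen and unchosen options
    r_c, r_u : obtained and forgone outcomes
    """
    pe_c = r_c - q_c   # obtained prediction error
    pe_u = r_u - q_u   # forgone (counterfactual) prediction error
    # Confirmatory signals: a positive obtained PE or a negative forgone PE.
    a_c = alpha_con if pe_c > 0 else alpha_dis
    a_u = alpha_con if pe_u < 0 else alpha_dis
    return q_c + a_c * pe_c, q_u + a_u * pe_u

# A win on the chosen option plus a loss on the forgone one is fully
# confirmatory, so both values are updated with the larger rate:
q_c, q_u = confirmation_update(0.5, 0.5, 1.0, 0.0, alpha_con=0.4, alpha_dis=0.1)
```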
In a recent study that explicitly and systematically exploited this rationale, we observed the interaction characterizing the confirmation bias hypothesis: positive and negative learning rates associated with the unchosen option mirrored the learning rates associated with the chosen option (Figure 2D). Additional model comparison analyses showed that the four-learning-rate model could be reduced to a two-learning-rate model, featuring a single parameter for all confirmatory and all disconfirmatory feedback, respectively (Figure 1C). The symmetrical pattern of learning rates, as well as the superiority of this implementation of the choice-confirmation bias against
other models, has been replicated several times in RL tasks that include both partial and complete feedback information [47–49].
In a follow-up study that further investigated the choice-related aspects of the positivity bias, standard instrumental trials were interleaved with observational trials, where participants observed the
computer making a choice for them and the resulting outcome [45]. Results from model fitting
and model comparison indicated that the update bias was specific to freely chosen outcomes,
further corroborating the presence of a proper choice-confirmation bias (Figure 2E).
Importantly, the fact that agency seems mandatory to observe the choice-confirmation bias
[45,50] is reminiscent of the ego-relevance aspect of belief-updating biases.
Finally, several studies have experimentally manipulated participants’ beliefs about the option
values through task instructions (e.g., by explicitly indicating option values to the participants before the beginning of the experiment) [51,52]. Behavioral results in this task are consistent with a
model that assumes that the usual learning asymmetry is further exacerbated by the (instructed)
prior about the option value, such that positive prediction errors following options with a positive
prior are overweighted (and the reverse is true for options with a negative prior). Therefore, the
available evidence is consistent with the idea that belief-confirmation bias can be induced in
Figure 3. Optimality of the learning rate biases. The figure displays the simulation results recently reported in Lefebvre et al. [62]. Performance of the model is expressed as the average reward per trial obtained by the artificial agents and is indexed by a color gradient in which yellow represents the highest values.
Artificial agents are simulated playing a two-armed bandit task, using an exhaustive range of model parameters (learning rates) and across different task conditions.
‘Partial feedback’ refers to simulations where only the feedback of the chosen outcome is disclosed to the agent, while ‘complete feedback’ refers to simulations
where both the obtained and forgone outcomes are disclosed to the agents. ‘Rich task’ refers to simulations in which both options have an overall positive expected
value, while ‘poor task’ indicates the opposite configuration. ‘Stable task’ refers to simulations featuring a good option (positive expected value) and a bad option
(negative expected value), whose values do not change across time. On the contrary, ‘volatile task’ refers to simulations in which the options switched from good to
bad (and vice versa) three times during the learning period. Performance is plotted as a function of the learning rates. Cells above the diagonal correspond to a positivity bias ('partial feedback') or a confirmation bias ('complete feedback'). The cell with a black circle indicates the best possible unbiased (or symmetric) combination of
learning rates (in terms of average reward per trial). Cells surrounded by black lines indicate the biased (or asymmetric) combinations of learning rates that obtain a
higher reward rate compared with the best unbiased combination (see the original paper for more details; adapted with permission from [62]).
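The grid-search logic behind these simulations can be sketched for the partial-feedback, stable case as follows (a simplified illustration of the approach, not the original code of [62]; the task statistics, grid values, and softmax temperature are illustrative):

```python
import itertools
import math
import random

def avg_reward(alpha_pos, alpha_neg, n_agents=200, n_trials=100, beta=3.0):
    """Average reward per trial in a stable two-armed bandit: a good option
    (p(win) = 0.75) vs a bad option (p(win) = 0.25); outcomes are +1 / -1."""
    p_win = (0.75, 0.25)
    total = 0.0
    for _ in range(n_agents):
        q = [0.0, 0.0]
        for _ in range(n_trials):
            p0 = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))  # softmax
            c = 0 if random.random() < p0 else 1
            r = 1.0 if random.random() < p_win[c] else -1.0
            pe = r - q[c]
            q[c] += (alpha_pos if pe > 0 else alpha_neg) * pe
            total += r
    return total / (n_agents * n_trials)

random.seed(1)
grid = [0.1, 0.3, 0.5, 0.7, 0.9]
perf = {(ap, an): avg_reward(ap, an) for ap, an in itertools.product(grid, grid)}
best_unbiased = max(perf[(a, a)] for a in grid)                   # diagonal cells
best_biased = max(v for (ap, an), v in perf.items() if ap != an)  # off-diagonal
```

Comparing best_biased with best_unbiased shows whether, in this setting, some asymmetric combination outperforms the best symmetric one, which is the comparison summarized by the black-circled and black-outlined cells in Figure 3.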
the context of RL via semantic instructions, thus suggesting a permeability between cognitive
representations and instrumental associations.
Positivity and confirmation biases across evolution and development
A valuable aspect of RL tasks in general (and n-armed bandits in particular) is that they are routinely used in non-human research, opening up the possibility of testing the comparative validity
of the positivity bias results. To our knowledge, few studies to date have tested the dual-learning model in other species. Among those few, one study featuring stable and volatile phases tested
both humans and rhesus monkeys (Macaca mulatta) with the same task [32]. Like humans, monkeys displayed a positivity bias, whose size was, if anything, larger than that observed in humans
(Figure 2B and Box 1 for possible behavioral consequences). A couple of recent studies in rodents (Rattus norvegicus) also provide support for the positivity bias [53,54]. In addition, they suggest that the bias could be modulated by factors such as the stage of learning (the bias being
larger in the exploratory phase) and the overall value of the decision problem (the bias being larger
in ‘poor’ environments) (Figure 2C).
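When comparing the size of the bias across studies and species, a common summary statistic is a normalized asymmetry index computed from the two fitted learning rates; a minimal sketch (the function name and the example values are illustrative, not taken from the cited studies):

```python
def asymmetry_index(alpha_pos: float, alpha_neg: float) -> float:
    """Normalized learning-rate asymmetry, bounded in [-1, 1].

    Positive values indicate a positivity bias (alpha+ > alpha-),
    zero indicates unbiased (symmetric) updating.
    """
    return (alpha_pos - alpha_neg) / (alpha_pos + alpha_neg)

# Illustrative (made-up) learning-rate estimates:
print(asymmetry_index(0.30, 0.10))  # ~0.5 -> positivity bias
print(asymmetry_index(0.20, 0.20))  # 0.0 -> unbiased
```

Because the index is normalized, it can be compared across agents or species whose overall learning rates differ in magnitude.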
Regarding the developmental aspects of positivity and confirmation bias, a series of recent studies investigated learning behavior in a simple two-armed bandit task in cohorts including children
and young adults. While most of these studies actually report a positivity bias in all age groups
[55–58] (but see [59]), they draw conflicting conclusions regarding the developmental trajectories
of the bias. Further studies are therefore required to better assess the trajectory of these biases
during development and aging, as well as identify the individual traits and tendencies that promote or counteract them.
Is confirmatory updating a flaw or a desirable feature of reinforcement learning?
The presence of update biases (such as the positivity and confirmation biases) in basic RL across
species and contexts naturally raises the question of why evolution has selected and maintained
what can be perceived, prima facie, as error-introducing processes that generate apparently irrational behavioral tendencies (Box 1).
Statistical normativity of choice-confirmation bias
Early simulations restricted to specific task contingencies and partial feedback regimens demonstrated that a positivity bias is optimal in learning contexts with a low overall reward rate (‘poor’
environments) but detrimental in learning contexts with a high overall reward rate (‘rich’ environments) [60]. This result can be intuitively understood as a consequence of the fact that, in partial
feedback situations, it is rational to preferentially take into account the prediction errors that are
rare (i.e., positive prediction errors in ‘poor’ environments and negative prediction errors in
‘rich’ environments) (Figure 3A). However, to date, experimental data have not provided convincing evidence in favor of an inversion of the learning bias as a function of task demands [39,45] (but
see [55] for a partial adaptation). Accordingly, a positivity bias following partial feedback is maintained in tasks involving contingency reversals and volatility [33,46], even though these reduce the
learner’s capacity to quickly adapt their responses in these conditions (Box 1). However, the fact
that the positivity bias appears maladaptive in some (laboratory based) conditions does not rule
out the possibility that it has been selected and maintained by evolution because it could still
be adaptive in most ecologically relevant scenarios [61]. Indeed, the fact that the bias is documented in several species suggests that its statistical advantages should apply across a broad
range of ecological contexts.
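The intuition above can be made concrete with the core update rule of the dual-learning-rate model discussed throughout this review. The sketch below (all parameter values and the outcome sequence are illustrative) shows how, on the same outcomes from a ‘poor’ option where positive prediction errors are rare, a positivity-biased learner retains a higher value estimate than an unbiased one:

```python
def update(q: float, reward: float, alpha_pos: float, alpha_neg: float) -> float:
    """One asymmetric Rescorla-Wagner update: Q <- Q + alpha * PE,
    with a different learning rate for positive vs negative PEs."""
    pe = reward - q  # prediction error
    alpha = alpha_pos if pe > 0 else alpha_neg
    return q + alpha * pe

# A 'poor' option: one win followed by three losses (positive PEs are rare).
outcomes = [1.0, -1.0, -1.0, -1.0]

q_biased, q_unbiased = 0.0, 0.0
for r in outcomes:
    q_biased = update(q_biased, r, alpha_pos=0.4, alpha_neg=0.1)      # positivity bias
    q_unbiased = update(q_unbiased, r, alpha_pos=0.25, alpha_neg=0.25)

# The biased learner weighs the rare positive PE more heavily and ends
# up with a higher (more optimistic) value estimate.
print(q_biased > q_unbiased)  # True
```

With these illustrative values the biased estimate even stays positive while the unbiased one turns negative, which is the sense in which preferentially weighting rare prediction errors can be advantageous in ‘poor’ environments.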
A recent study systematically analyzed the performance of the choice-confirmation bias in complete-feedback contexts to clarify its statistical properties. Specifically, the study assessed its optimality in a larger space of learning problems, including ‘rich’ and ‘poor’, ‘stable’, and ‘volatile’ environments, as well as more demanding decision problems [62]. The authors reported that confirmatory-biased RL algorithms generally outperform their unbiased counterparts (Figure 3B). This
counterintuitive result, replicated by other simulation studies, arises from the fact that confirmatory RL algorithms mechanistically neglect uninformative – stochastic – negative prediction errors
associated with the best response. Thereby they accumulate resources (i.e., collect rewards and
avoid losses) more efficiently than their unbiased counterparts [62–64]. Thus, confirmatory
updating appears to facilitate and optimize learning and performance in a broad range of learning
situations [61,65].
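In complete-feedback settings, the confirmatory scheme applies a higher learning rate to confirmatory prediction errors (a positive PE on the obtained outcome, a negative PE on the forgone outcome) and a lower one to disconfirmatory PEs. A minimal sketch of one such trial update (function name and parameter values are illustrative):

```python
def confirmatory_update(q_chosen, q_forgone, r_chosen, r_forgone,
                        alpha_conf=0.4, alpha_disc=0.1):
    """One choice-confirmation update for a complete-feedback trial.

    Confirmatory PEs (positive on the chosen option, negative on the
    forgone option) get the high learning rate; disconfirmatory PEs
    get the low one.
    """
    pe_c = r_chosen - q_chosen
    pe_f = r_forgone - q_forgone
    a_c = alpha_conf if pe_c > 0 else alpha_disc
    a_f = alpha_conf if pe_f < 0 else alpha_disc
    return q_chosen + a_c * pe_c, q_forgone + a_f * pe_f

# A stochastic 'bad luck' trial on a currently preferred option: both the
# negative PE on the chosen option and the positive PE on the forgone
# option are disconfirmatory, hence down-weighted...
q_c, q_f = confirmatory_update(q_chosen=0.5, q_forgone=-0.5,
                               r_chosen=-1.0, r_forgone=1.0)
# ...so the preference for the chosen option erodes only slowly.
print(q_c, q_f)
```

This mechanistic down-weighting of stochastic disconfirmatory feedback on the best response is what allows confirmatory agents to exploit good options more consistently.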
Metacognitive efficiency potentiates the positivity bias
Finally, positivity and confirmation biases may be normative or advantageous in combination with
other features of cognition. Supporting this idea, recent work proposes that learning biases are
normative when coupled with efficient metacognition [66]. This is because when one can efficiently tease apart one’s own correct decisions from one’s mistakes, the probabilistic negative
feedback (that sometimes inevitably follows correct choices) can be neglected. This creates a
normative ground for positivity and confirmation biases. Note that this mechanism might not be
restricted to humans, as efficient metacognition has been reported in animals, from non-human
primates to rodents [67,68].
A challenge to this idea lies in the fact that learning biases and metacognitive (in)efficiencies might
not be independent. Indeed, an as-yet-unpublished study shows that in a two-armed bandit task
where confidence in choice is elicited, the confirmation bias can cause overconfidence, which
is a metacognitive bias [48]. While these findings challenge the idea that metacognition ensures
that updating biases are normative, they might connect the asymmetric updating observed in
RL to the original theoretical accounts of asymmetric belief updating, if overconfidence (i.e., the
metacognitive illusion of accuracy) is considered self-serving per se, that is, carries an ego-relevant utility [15,69].
In conclusion, although this section reviewed evidence that learning asymmetry may be normative in some contexts – and as such may provide a justification for its selection – its persistence in contexts where it is unfavorable, along with its lack of modulation in many circumstances, reinforces the idea that learning asymmetry constitutes a hardcoded learning bias
[39,45,54,55]. A complementary perspective on the normativity of this bias could emerge from
different modeling perspectives. For example, a recent unpublished study suggests that asymmetric updating can be derived from Bayesian-optimal principles [70].
Neuronal bases
Neural circuits for biased updating
An important question concerns the neurobiological bases of positivity and confirmatory bias in
RL [71]. A prerequisite to answering this question is a consensus concerning the neural bases
of RL per se. The dominant hypothesis, stemming from repeated and robust electrophysiological and pharmacological observations, postulates that reinforcement is instantiated by dopaminergic modulation of corticostriatal synapses [72–75]. A neural model of biased (or asymmetric)
updates then further requires that the neural channels for positive and negative prediction errors
are dissociable. In line with this assumption, anatomically plausible neural network models of
corticostriatal circuits suggest that positive and negative reinforcements are mediated by specific
subpopulations of striatal neurons, which exhibit different receptors with excitatory (D1) or inhibitory (D2) properties [76]. These models (as well as their more recent developments [77,78]), can
therefore support, in principle, asymmetric updating, by implementing the processing of positive
and negative reinforcements in different neurobiological pathways. Crucially, recent extensions of these models also account for the absence of biases following observational trials and for their exacerbation by instruction priors [50,51].
A conceptually similar but structurally different neural network model put forward an alternative
theory, which suggests a key computational role for metaplasticity in the generation of update
biases [79]. While the metaplasticity framework does not necessitate the emergence of a positivity bias, the bias naturally emerges under most outcome contingencies, confirming its advantageous properties [62–64,80].
Neural signatures in human studies
Several lines of evidence suggest that the neurotransmitter dopamine and a basal ganglia structure, the striatum, govern the relative sensitivity to positive and negative prediction errors. First, in
both healthy participants and neurological patients, dopaminergic modulation affects the learning rate bias,
such that higher dopamine is associated with a higher positivity bias [81–84]. Second, in healthy
subjects, interindividual differences in positivity bias are associated with higher striatal activation in
response to rewards [29]. Interindividual differences in the positivity bias have also been associated with pupil dilation, another physiological proxy of neuromodulatory activity during outcome presentation, in classic two-armed bandit tasks [85]. Finally, the choice-confirmation bias
model supposes that positive and negative prediction errors associated, respectively, with obtained and forgone outcomes are treated with the same learning rate, as confirmatory signals. fMRI studies of two-armed bandit tasks with complete feedback (Figure 1A) confirm that obtained and forgone outcome signals are both encoded in the dopaminergic striatum, with opposite signs,
thereby suggesting that the neurocomputational role currently attributed to this structure can
be extended to accommodate the choice-confirmation bias without major structural changes
[86,87].
Loss aversion versus loss neglect
Overall, the studies reviewed here suggest that in RL, outcomes are processed in a choice-confirmatory manner. This bias takes the form of a selective neglect of losses (i.e., obtained punishments and forgone rewards) relative to gains (i.e., obtained rewards and forgone punishments)
when updating outcome expectations. Superficially, this pattern seems in stark contrast with a
vast literature in behavioral economics revolving around the notion of loss aversion [88]. According to loss aversion, prospective losses loom larger than corresponding gains in determining individuals’ economic choices [89]. In the RL framework, this valuation asymmetry would directly translate into negative prediction errors having a larger relative influence on value expectations.
Consequently, the choice-confirmation bias observed in RL does not align, at least prima facie,
with dominant behavioral economics theories, potentially representing an additional instance of
the experience-description gap [44,90] (Figure 4). However, a more in-depth consideration of
the processes at stake may help reconcile these apparently contradictory findings. First, loss
aversion pertains to the calculation of subjective decision values, while loss neglect, in the context
of RL, applies to the retrospective subjective assessment of experienced outcomes. It is well
known that different heuristics and biases apply to expected and experienced utilities [91,92]. Second, most of the findings reviewed here, although properly incentivized, use relatively small outcomes (primary or secondary). Evidence in behavioral science and economics suggests that
the utility function may display specific features in the range of small amounts usually involved
in RL studies, making them unsuited to test – and to challenge – the general structure of loss aversion [93,94] (though note that some recent studies claim that loss aversion also extends to small outcomes [95]). Finally, it is worth noting that prospective loss aversion and retrospective loss
neglect, although superficially antithetic, provide complementary explanations for the status quo bias. While loss aversion would explain the bias by the fear of losing current assets [94,96,97], loss neglect rather posits that we disregard the feedback suggesting that we made a wrong decision (Box 1). Retrospective loss neglect (or choice-confirmation bias), however, provides a putative new computational explanation for the puzzling phenomenon of (pathological) gambling, which is difficult to accommodate with loss aversion (see Figure IC in Box 1) [98,99].

Outstanding questions
What are the boundaries and limits of positivity and choice-confirmation bias? Important questions remain, pertaining to, for example, the persistence of those biases in learning contexts where outcome distributions are not binomial (such as continuous outcomes – drifting bandits), or when outcome distributions include (very) high monetary stakes. Likewise, the existence and potential signatures of those biases in complex (multistep, multi-attribute) tasks are still to be investigated.

Figure 4. Loss aversion versus loss neglect. This figure exemplifies the crucial computational differences between ‘loss aversion’ and ‘loss neglect’. The former applies to decisions between explicit (or described) options, often also referred to as ‘prospects’ and experimentally instantiated by lotteries. The latter applies to decisions between options (often experimentally instantiated by bandits) whose values have been learned by trial and error (or experience). In the former case, the slope in the loss domain, which determines the relation between subjective and objective values, corresponds to the loss aversion parameter. In the latter case, the slopes in the positive and negative domains determine the extent to which an option estimate (Q value) is updated as a function of the prediction error; these slopes correspond to the learning rates for positive and negative prediction errors, respectively.
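The two asymmetries contrasted in Figure 4 can be written compactly as follows (a notational sketch using a simplified piecewise-linear value function; λ is the loss-aversion parameter, α⁺ and α⁻ the learning rates for positive and negative prediction errors, and δ the prediction error; the `cases` environment assumes amsmath):

```latex
% Loss aversion: asymmetric *valuation* of described outcomes
\[
v(x) =
  \begin{cases}
    x,         & x \ge 0 \\
    \lambda x, & x < 0
  \end{cases}
  \qquad \lambda > 1
\]

% Loss neglect: asymmetric *updating* of learned values, with a larger
% learning rate for positive than for negative prediction errors
\[
Q_{t+1} = Q_t +
  \begin{cases}
    \alpha^{+} \delta_t, & \delta_t > 0 \\
    \alpha^{-} \delta_t, & \delta_t < 0
  \end{cases}
  \qquad \alpha^{+} > \alpha^{-}, \quad \delta_t = r_t - Q_t
\]
```

The first asymmetry distorts the mapping from objective to subjective value at decision time; the second distorts how experienced outcomes are integrated into value estimates, which is why the two can coexist without contradiction.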
Concluding remarks
The evidence reviewed here suggests that, contrary to what was previously thought [2,69], positivity and confirmation biases permeate RL, leading to an over-optimistic estimation of outcome
expectations. This results in characteristic behavioral consequences (Box 1) that may explain
phenomena such as choice inertia (or status quo bias) and risky decision-making (gambling).
Empirical investigations of the choice-confirmation bias in RL have mostly relied on inferring
model parameters from choice data. Therefore, no matter how carefully this inferential process
is carried out [100], it is still conceivable that a surrogate, spurious computational process is responsible for the observed patterns of behavioral and neurobiological results. While we believe
the current competing interpretations are not supported by available experimental evidence
(Box 2 and [101–106]), future research should carefully combine model fitting and clever designs to provide unambiguous evidence for the neurocomputational mechanisms of positivity and confirmation biases [28].
Recently, a stream of studies from cognitive (neuro)science has described behavioral patterns consistent with this emerging account of positivity and confirmation bias. Indeed, confirmation bias was recently described in a simple perceptual task [107,108], within the time-evolving dynamic of the decision [109]. Crucially, in this latter case, the act of choosing was critical to the expression of the bias [110]. These findings suggest that confirmation bias is not purely a reflection of a high-level reasoning bias, nor restricted to the domain of abstract, semantic beliefs.

Outstanding questions (continued)
What are the precise computational mechanisms underlying positivity and choice-confirmation bias? Further research should elucidate whether those biases are generated by an absolute overweighting of positive/confirmatory feedback, an absolute underweighting of negative/disconfirmatory feedback, or simply a relative imbalance between the two.

Which cognitive processes contribute to or affect the choice-confirmation bias? Links with selective attention and selective memory might be of special interest, as both are suspected to have a key role in biasing high-order belief updating.

What could be the macroscopic consequences of these reinforcement biases? We anticipate that two directions might be particularly fruitful: the clinical direction, where the biases could have a role in several types of addiction and in pathological gambling; and the social sciences direction, where the elementary RL biases could be connected with general social phenomena like opinion polarization.

What could be the benefit of including these RL biases in artificially intelligent agents? Simulation studies show that the confirmation bias is statistically optimal in simple two-armed bandit tasks, but what about more complex learning problems? What about more ecological situations?
In sum, a growing body of empirical studies in humans and animals reveals that the asymmetries that affect high-level belief updates are shared with more elementary forms of updates, notably in the form of the choice-confirmation bias observed in RL. Whether those update asymmetries are caused by shared neurocomputational mechanisms, or whether they emerged independently in two separate pathways, remains an open question (see Outstanding questions). Finally, at the conceptual level, it seems that important links between the concepts of agency, metacognition, and ego-relevance could help reconcile fundamental aspects of belief and value update asymmetries.
Acknowledgments
S.P. and M.L. thank Germain Lefebvre, Nahuel Salem-Garcia, Valerian Chambon, and Héloise Théro for stimulating discussions and for leading most of the experimental work that nurtured these ideas over the last years. S.P. and M.L. thank Zoe
Koopmans for proofreading the manuscript. S.P. and M.L. thank Alireza Soltani, Hiroyuki Ohta, Sonia Bishop, Christopher
Gagne, and Germain Lefebvre for providing material for the figures. S.P. is supported by the Institut de Recherche en Santé
Publique (IRESP, grant number: 20II138-00) and the Agence Nationale de la Recherche (CogFinAgent: ANR-21-CE23-0002-02; RELATIVE: ANR-21-CE37-0008-01; RANGE: ANR-21-CE28-0024-01). The Département d’Études Cognitives is supported by the Agence Nationale de la Recherche (ANR; FrontCog ANR-17-EURE-0017). M.L. is supported by a Swiss National Science Foundation (SNSF) Ambizione grant (PZ00P3_174127) and a European Research Council (ERC) Starting Grant (INFORL-948671).
Declaration of interests
No interests are declared.
References
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
Benjamin, D.J. (2019) Errors in probabilistic reasoning and
judgment biases. In Handbook of Behavioral Economics: Applications and Foundations (Bernheim, B.D. et al., eds), pp.
69–186, North-Holland
Sharot, T. and Garrett, N. (2016) Forming beliefs: why valence
matters. Trends Cogn. Sci. 20, 25–33
Eil, D. and Rao, J.M. (2011) The good news-bad news effect:
asymmetric processing of objective information about yourself.
Am. Econ. J. Microecon. 3, 114–138
Kuzmanovic, B. et al. (2018) Influence of vmPFC on dmPFC
predicts valence-guided belief formation. J. Neurosci. 38,
7996–8010
Sharot, T. et al. (2011) How unrealistic optimism is maintained
in the face of reality. Nat. Neurosci. 14, 1475–1479
Klayman, J. (1995) Varieties of confirmation bias. In The Psychology of Learning and Motivation (Busemeyer, J. et al.,
eds), pp. 385–418, Academic Press
Nickerson, R.S. (1998) Confirmation bias: a ubiquitous phenomenon in many guises. Rev. Gen. Psychol. 2, 175–220
Eskreis-Winkler, L. and Fishbach, A. (2019) Not learning from
failure—the greatest failure of all. Psychol. Sci. 30, 1733–1744
Staats, B.R. et al. (2018) Maintaining beliefs in the face of negative news: the moderating role of experience. Manag. Sci. 64,
804–824
Coutts, A. (2019) Good news and bad news are still news: experimental evidence on belief updating. Exp. Econ. 22,
369–395
Tappin, B.M. et al. (2017) The heart trumps the head: desirability bias in political belief revision. J. Exp. Psychol. Gen. 146,
1143
Bénabou, R. and Tirole, J. (2016) Mindful economics: the production, consumption, and value of beliefs. J. Econ. Perspect.
30, 141–164
Loewenstein, G. and Molnar, A. (2018) The renaissance of belief-based utility in economics. Nat. Hum. Behav. 2, 166–167
14.
15.
16.
17.
18.
19.
20.
21.
22.
23.
24.
25.
26.
Sharot, T. et al. (2021) Why and when beliefs change: a multiattribute value-based decision problem. PsyArXiv Published
online November 4 2021. https://doi.org/10.31234/osf.io/
q75ej
Bénabou, R. and Tirole, J. (2002) Self-confidence and personal
motivation. Q. J. Econ. 117, 871–915
Kuhnen, C.M. and Knutson, B. (2011) The influence of affect
on beliefs, preferences, and financial decisions. J. Financ.
Quant. Anal. 46, 605–626
Barron, K. (2021) Belief updating: does the ‘good-news, badnews’ asymmetry extend to purely financial domains? Exp.
Econ. 24, 31–58
Kuhnen, C.M. (2015) Asymmetric learning from financial information. J. Finan. 70, 2029–2062
Buser, T. et al. (2018) Responsiveness to feedback as a personal trait. J. Risk Uncertain. 56, 165–192
Sutton, R.S. and Barto, A.G. (1998) Reinforcement Learning:
An Introduction, Cambridge University Press
Botvinick, M. et al. (2019) Reinforcement learning, fast and
slow. Trends Cogn. Sci. 23, 408–422
Hassabis, D. et al. (2017) Neuroscience-inspired artificial intelligence. Neuron 95, 245–258
Aberg, K.C. et al. (2016) Linking individual learning styles to
approach-avoidance motivational traits and computational
aspects of reinforcement learning. PLoS One 11,
e0166675
Chase, H.W. et al. (2010) Approach and avoidance learning in
patients with major depression and healthy controls: relation to
anhedonia. Psychol. Med. 40, 433–440
Frank, M.J. et al. (2007) Genetic triple dissociation reveals multiple roles for dopamine in reinforcement learning. Proc. Natl.
Acad. Sci. U. S. A. 104, 16311–16316
Kahnt, T. et al. (2009) Dorsal striatal–midbrain connectivity in
humans predicts how reinforcements are used to guide decisions. J. Cogn. Neurosci. 21, 1332–1345
Trends in Cognitive Sciences, Month 2022, Vol. xx, No. xx
13
Trends in Cognitive Sciences
27.
28.
29.
30.
31.
32.
33.
34.
35.
36.
37.
38.
39.
40.
41.
42.
43.
44.
45.
46.
47.
48.
49.
50.
51.
52.
14
den Ouden, H.E.M. et al. (2013) Dissociable effects of dopamine and serotonin on reversal learning. Neuron 80,
1090–1100
Palminteri, S. et al. (2017) The importance of falsification in
computational cognitive modeling. Trends Cogn. Sci. 21,
425–433
Lefebvre, G. et al. (2017) Behavioural and neural characterization of optimistic reinforcement learning. Nat. Hum. Behav. 1,
1–9
Ting, C.-C. et al. (2021) The elusive effects of incidental anxiety
on reinforcement-learning. J. Exp. Psychol. Learn. Mem. Cogn.
Published online September 13, 2021. https://doi.apa.org/doi/
10.1037/xlm0001033
Behrens, T.E.J. et al. (2007) Learning the value of information in
an uncertain world. Nat. Neurosci. 10, 1214–1221
Farashahi, S. et al. (2019) Flexible combination of reward information across primates. Nat. Hum. Behav. 3, 1215–1224
Gagne, C. et al. (2020) Impaired adaptation of learning to contingency volatility in internalizing psychopathology. eLife 9,
e61387
Garrett, N. and Daw, N.D. (2020) Biased belief updating and
suboptimal choice in foraging decisions. Nat. Commun. 11,
3417
Steinke, A. et al. (2020) Parallel model-based and model-free
reinforcement learning for card sorting performance. Sci.
Rep. 10, 15464
Nioche, A. et al. (2019) Coordination over a unique medium of
exchange under information scarcity. Palgrave Commun. 5,
1–11
Ciranka, S. et al. (2022) Asymmetric reinforcement learning facilitates human inference of transitive relations. Nat. Hum.
Behav. 6, 555–564
Christakou, A. et al. (2013) Neural and psychological maturation of decision-making in adolescence and young adulthood.
J. Cogn. Neurosci. 25, 1807–1823
Gershman, S.J. (2015) Do learning rates adapt to the distribution of rewards? Psychon. Bull. Rev. 22, 1320–1327
Niv, Y. et al. (2012) Neural prediction errors reveal a risk-sensitive reinforcement-learning process in the human brain.
J. Neurosci. 32, 551–562
Pulcu, E. and Browning, M. (2017) Affective bias as a rational
response to the statistics of rewards and punishments. eLife
6, e27879
Wise, T. and Dolan, R.J. (2020) Associations between aversive learning processes and transdiagnostic psychiatric
symptoms in a general population sample. Nat. Commun.
11, 4179
Wise, T. et al. (2019) A computational account of threat-related
attentional bias. PLoS Comput. Biol. 15, e1007341
Hertwig, R. and Erev, I. (2009) The description–experience gap
in risky choice. Trends Cogn. Sci. 13, 517–523
Chambon, V. et al. (2020) Information about action outcomes
differentially affects learning from self-determined versus imposed choices. Nat. Hum. Behav. 4, 1067–1079
Palminteri, S. et al. (2017) Confirmation bias in human reinforcement learning: evidence from counterfactual feedback
processing. PLoS Comput. Biol. 13, e1005684
Lebreton, M. et al. (2019) Contextual influence on confidence
judgments in human reinforcement learning. PLoS Comput.
Biol. 15, e1006973
Salem-Garcia, N.A. et al. (2021) The computational origins of
confidence biases in reinforcement learning. PsyArXiv Published online July 6, 2021. https://doi.org/10.31234/osf.io/
dpqj6
Schüller, T. et al. (2020) Decreased transfer of value to action in
Tourette syndrome. Cortex 126, 39–48
Cockburn, J. et al. (2014) A reinforcement learning mechanism
responsible for the valuation of free choice. Neuron 83,
551–557
Doll, B.B. et al. (2009) Instructional control of reinforcement
learning: a behavioral and neurocomputational investigation.
Brain Res. 1299, 74–94
Doll, B.B. et al. (2011) Dopaminergic genes predict individual
differences in susceptibility to confirmation bias. J. Neurosci.
31, 6188–6198
Trends in Cognitive Sciences, Month 2022, Vol. xx, No. xx
53.
54.
55.
56.
57.
58.
59.
60.
61.
62.
63.
64.
65.
66.
67.
68.
69.
70.
71.
72.
73.
74.
75.
76.
Harris, C. et al. (2020) Unique features of stimulus-based probabilistic reversal learning. bioRxiv Published online September
25, 2022. https://doi.org/10.1101/2020.09.24.310771
Ohta, H. et al. (2021) The asymmetric learning rates of murine
exploratory behavior in sparse reward environments. Neural
Netw. 143, 218–229
Nussenbaum, K. et al. (2021) Flexibility in valenced reinforcement learning computations across development. PsyArXiv
Published online November 16, 2021. https://doi.org/10.
31234/osf.io/5f9uc
Chierchia, G. et al. (2021) Choice-confirmation bias in reinforcement learning changes with age during adolescence.
PsyArXiv Published online October 6, 2021. https://doi.org/
10.31234/osf.io/xvzwb
Habicht, J. et al. (2021) Children are full of optimism, but those
rose-tinted glasses are fading—Reduced learning from negative outcomes drives hyperoptimism in children. J. Exp.
Psychol. Gen. Published online December 30, 2021. https://
doi.apa.org/doi/10.1037/xge0001138
Xia, L. et al. (2021) Modeling changes in probabilistic reinforcement learning during adolescence. PLoS Comput. Biol. 17,
e1008524
Rosenbaum, G.M. et al. (2022) Valence biases in reinforcement
learning shift across adolescence and modulate subsequent
memory. eLife 11, e64620
Cazé, R.D. and van der Meer, M.A.A. (2013) Adaptive properties of differential learning rates for positive and negative outcomes. Biol. Cybern. 107, 711–719
Gigerenzer, G. and Selten, R. (2002) Bounded Rationality: The
Adaptive Toolbox, MIT Press
Lefebvre, G. et al. (2022) A normative account of confirmation
bias during reinforcement learning. Neural Comput. 34,
307–337
Kandroodi, M.R. et al. (2021) Optimal reinforcement learning
with asymmetric updating in volatile environments: a simulation
study. bioRxiv Published online February 16, 2021. https://doi.
org/10.1101/2021.02.15.431283
Tarantola, T. et al. (2021) Confirmation bias optimizes reward
learning. bioRxiv Published online March 11, 2021. https://
doi.org/10.1101/2021.02.27.433214
Summerfield, C. and Tsetsos, K. (2020) Rationality and efficiency in human decision-making. In The Cognitive Neurosciences VII (Gazzaniga, M., ed.), pp. 427–438, MIT Press
Rollwage, M. and Fleming, S.M. (2021) Confirmation bias is
adaptive when coupled with efficient metacognition. Philos.
Trans. R. Soc. B Biol. Sci. 376, 20200131
Joo, H.R. et al. (2021) Rats use memory confidence to guide
decisions. Curr. Biol. 31, 4571–4583.e4
Kepecs, A. and Mainen, Z.F. (2012) A computational framework for the study of confidence in humans and animals.
Philos. Trans. R. Soc. B Biol. Sci. 367, 1322–1337
Sharot, T. et al. (2021) Why and when beliefs change: a multiattribute value-based decision problem. PsyArXiv Published
online November 4, 2021. https://doi.org/10.31234/osf.io/
q75ej
Kobayashi, T. (2021) Optimistic reinforcement learning by forward Kullback-Leibler divergence optimization. ArXiv Published
online May 27, 2021. https://doi.org/10.48550/arXiv.2105.
12991
Palminteri, S. and Pessiglione, M. (2017) Opponent brain systems
for reward and punishment learning: causal evidence from drug
and lesion studies in humans. In Decision Neuroscience (Dreher,
J.-C. and Tremblay, L., eds), pp. 291–303, Academic Press
Bayer, H.M. and Glimcher, P.W. (2005) Midbrain dopamine
neurons encode a quantitative reward prediction error signal.
Neuron 47, 129–141
Dayan, P. (2012) Twenty-five lessons from computational
neuromodulation. Neuron 76, 240–256
Di Chiara, G. (1999) Drug addiction as dopamine-dependent
associative learning disorder. Eur. J. Pharmacol. 375, 13–30
Schultz, W. et al. (1997) A neural substrate of prediction and
reward. Science 275, 1593–1599
Frank, M.J. (2006) Hold your horses: a dynamic computational
role for the subthalamic nucleus in decision making. Neural
Netw. 19, 1120–1136
Trends in Cognitive Sciences
77. Collins, A.G.E. and Frank, M.J. (2014) Opponent actor learning (OpAL): modeling interactive effects of striatal dopamine on reinforcement learning and choice incentive. Psychol. Rev. 121, 337–366
78. van Swieten, M.M.H. and Bogacz, R. (2020) Modeling the effects of motivation on choice and learning in the basal ganglia. PLoS Comput. Biol. 16, e1007465
79. Soltani, A. et al. (2006) Neural mechanism for stochastic behaviour during a competitive game. Neural Netw. 19, 1075–1090
80. Farashahi, S. et al. (2017) Metaplasticity as a neural substrate for adaptive learning and choice under uncertainty. Neuron 94, 401–414.e6
81. Frank, M.J. et al. (2004) By carrot or by stick: cognitive reinforcement learning in Parkinsonism. Science 306, 1940–1943
82. McCoy, B. et al. (2019) Dopaminergic medication reduces striatal sensitivity to negative outcomes in Parkinson’s disease. Brain 142, 3605–3620
83. Palminteri, S. et al. (2009) Pharmacological modulation of subliminal learning in Parkinson’s and Tourette’s syndromes. Proc. Natl. Acad. Sci. U. S. A. 106, 19179–19184
84. Pessiglione, M. et al. (2006) Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans. Nature 442, 1042–1045
85. Slooten, J.C.V. et al. (2018) How pupil responses track value-based decision-making during and after reinforcement learning. PLoS Comput. Biol. 14, e1006632
86. Li, J. and Daw, N.D. (2011) Signals in human striatum are appropriate for policy update rather than value prediction. J. Neurosci. 31, 5504–5511
87. Klein, T.A. et al. (2017) Learning relative values in the striatum induces violations of normative decision making. Nat. Commun. 8, 16033
88. Ruggeri, K. et al. (2020) Replicating patterns of prospect theory for decision under risk. Nat. Hum. Behav. 4, 622–633
89. Kahneman, D. and Tversky, A. (1979) Prospect theory: an analysis of decision under risk. Econometrica 47, 263
90. Garcia, B. et al. (2021) The description–experience gap: a challenge for the neuroeconomics of decision-making under uncertainty. Philos. Trans. R. Soc. B Biol. Sci. 376, 20190665
91. Kahneman, D. and Tversky, A. (2000) Choices, Values, and Frames, Cambridge University Press
92. Kahneman, D. et al. (1997) Back to Bentham? Explorations of experienced utility. Q. J. Econ. 112, 375–406
93. Yechiam, E. (2019) Acceptable losses: the debatable origins of loss aversion. Psychol. Res. 83, 1327–1339
94. Anderson, C.J. (2003) The psychology of doing nothing: forms of decision avoidance result from reason and emotion. Psychol. Bull. 129, 139–167
95. Sokol-Hessner, P. and Rutledge, R.B. (2019) The psychological and neural basis of loss aversion. Curr. Dir. Psychol. Sci. 28, 20–27
96. Jachimowicz, J.M. et al. (2019) When and why defaults influence decisions: a meta-analysis of default effects. Behav. Public Policy 3, 159–186
97. Kahneman, D. et al. (1991) Anomalies: the endowment effect, loss aversion, and status quo bias. J. Econ. Perspect. 5, 193–206
98. Fauth-Bühler, M. et al. (2017) Pathological gambling: a review of the neurobiological evidence relevant for its classification as an addictive disorder. Addict. Biol. 22, 885–897
99. Clark, L. et al. (2019) Neuroimaging of reward mechanisms in gambling disorder: an integrative review. Mol. Psychiatry 24, 674–693
100. Wilson, R.C. and Collins, A.G. (2019) Ten simple rules for the computational modeling of behavioral data. eLife 8, e49547
101. Agrawal, V. and Shenoy, P. (2021) Tracking what matters: a decision-variable account of human behavior in bandit tasks. Proceedings of the 43rd Annual Meeting of the Cognitive Science Society, virtual meeting
102. Harada, T. (2020) Learning from success or failure? – Positivity biases revisited. Front. Psychol. 11, 1627
103. Palminteri, S. (2021) Choice-confirmation bias and gradual perseveration in human reinforcement learning. PsyArXiv Published online July 6, 2021. https://doi.org/10.31234/osf.io/dpqj6
104. Sugawara, M. and Katahira, K. (2021) Dissociation between asymmetric value updating and perseverance in human reinforcement learning. Sci. Rep. 11, 3574
105. Tano, P. et al. (2017) Variability in prior expectations explains biases in confidence reports. bioRxiv Published online April 13, 2017. https://doi.org/10.1101/127399
106. Zhou, C.Y. et al. (2020) Devaluation of unchosen options: a Bayesian account of the provenance and maintenance of overly optimistic expectations. CogSci. 42, 1682–1688
107. Rajsic, J. et al. (2015) Confirmation bias in visual search. J. Exp. Psychol. Hum. Percept. Perform. 41, 1353–1364
108. Rollwage, M. et al. (2020) Confidence drives a neural confirmation bias. Nat. Commun. 11, 2634
109. Talluri, B.C. et al. (2018) Confirmation bias through selective overweighting of choice-consistent evidence. Curr. Biol. 28, 3128–3135.e8
110. Talluri, B.C. et al. (2021) Choices change the temporal weighting of decision evidence. J. Neurophysiol. 125, 1468–1481
111. Bavard, S. et al. (2021) Two sides of the same coin: beneficial and detrimental consequences of range adaptation in human reinforcement learning. Sci. Adv. 7, eabe0340
112. Katahira, K. (2018) The statistical structures of reinforcement learning with asymmetric value updates. J. Math. Psychol. 87, 31–45
113. Madan, C.R. et al. (2019) Comparative inspiration: from puzzles with pigeons to novel discoveries with humans in risky choice. Behav. Process. 160, 10–19
114. Eckstein, M.K. et al. (2021) What do reinforcement learning models measure? Interpreting model parameters in cognition and neuroscience. Curr. Opin. Behav. Sci. 41, 128–137
115. Miller, K.J. et al. (2019) Habits without values. Psychol. Rev. 126, 292
116. Correa, C.M.C. et al. (2018) How the level of reward awareness changes the computational and electrophysiological signatures of reinforcement learning. J. Neurosci. 38, 10338–10348
117. Gueguen, M.C.M. et al. (2021) Anatomical dissociation of intracerebral signals for reward and punishment prediction errors in humans. Nat. Commun. 12, 3344
118. Voon, V. et al. (2015) Disorders of compulsivity: a common bias towards learning habits. Mol. Psychiatry 20, 345–352
Trends in Cognitive Sciences, Month 2022, Vol. xx, No. xx