MINI REVIEW a section of the journal Frontiers in Psychology Can We Infer Inter-Individual Differences in Risk-Taking From Behavioral Tasks

Stefano  Palminteri

MINI REVIEW
published: 21 November 2018
doi: 10.3389/fpsyg.2018.02307

Can We Infer Inter-Individual Differences in Risk-Taking From Behavioral Tasks?

Stefano Palminteri 1,2,3* and Coralie Chevallier 1,2,3

1 Laboratoire de Neurosciences Cognitives, Institut National de la Santé et de la Recherche Médicale, Paris, France
2 Département d’Etudes Cognitives, Ecole Normale Supérieure, Paris, France
3 Institut d’Etudes de la Cognition, Université de Paris Sciences et Lettres, Paris, France

Investigating the bases of inter-individual differences in risk-taking is necessary to refine our cognitive and neural models of decision-making and to ultimately counter risky behaviors in real-life policy settings. However, recent evidence suggests that behavioral tasks fare poorly compared to standard questionnaires to measure individual differences in risk-taking. Crucially, using model-based measures of risk taking does not seem to improve reliability. Here, we put forward two possible – not mutually exclusive – explanations for these results and suggest future avenues of research to improve the assessment of inter-individual differences in risk-taking by combining repeated online testing and mechanistic computational models.

Keywords: risk-taking, inter-individual variability, behavioral phenotype, behavioral economics, correlational psychology

Edited by: Michael Banissy, Goldsmiths, University of London, United Kingdom
Reviewed by: Ariel Telpaz, General Motors, United States; Joshua Weller, Tilburg University, Netherlands
*Correspondence: Stefano Palminteri, stefano.palminteri@ens.fr
Specialty section: This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology
Received: 28 May 2018
Accepted: 05 November 2018
Published: 21 November 2018
Citation: Palminteri S and Chevallier C (2018) Can We Infer Inter-Individual Differences in Risk-Taking From Behavioral Tasks? Front. Psychol. 9:2307.
doi: 10.3389/fpsyg.2018.02307

INTRODUCTION

In a recent series of studies, Frey et al. (2017) investigated the relationship between different measures of risk-sensitivity in a laboratory-based experiment involving over a thousand participants (N∼1500) (Frey et al., 2017; Pedroni et al., 2017). By comparing standard behavioral tasks, personality questionnaires, and reports of actual frequency of risky behaviors, the authors were able to demonstrate that behavioral tasks are consistently less reliable than questionnaires. First, performance in risk-taking tasks was less correlated with the actual frequency of risky behaviors than personality scores were, which suggests that external validity is low. Second, behavioral measures were less correlated among themselves than personality scores and frequency measures, which suggests that they tap constructs that are less consistent (low between-task reliability). Beyond raw behavioral measures, a computational modeling approach using cumulative prospect theory (CPT) parameters (decreasing marginal utility, loss aversion and subjective weighting of probabilities) failed to improve between-task reliability. Finally, test-retest reliability was lower for behavioral tasks than for personality scores. Strikingly, preliminary evidence suggests that these issues generalize to many behavioral tasks besides decision-making under risk, such as reinforcement learning (Enkavi et al., 2018). These findings are not isolated: other studies from other laboratories, involving smaller numbers of subjects and behavioral tasks, reached very similar conclusions (Corsetto and Filippin, 2013; Attanasi et al., 2018).
Low external validity and reliability are extremely worrying for the development of behavioral economics applications and (by extension) for the neuroeconomics research framework, where risk preferences are commonly assessed and elicited using behavioral tasks. In addition, the unreliability of behavioral measures is also problematic for the computational psychiatry research framework, which has recently placed great emphasis on the use of cognitive and behavioral phenotyping tools. The idea behind these frameworks is that behavioral measures can be used to phenotype patients at the individual level and ultimately work as tools to perfect diagnosis, personalize care, and assess the efficacy of new treatments or drugs in clinical trials (Huys et al., 2016). In this context, it is therefore vital that behavioral tasks generate results that are stable and predictive of real-life outcomes.

In addition to questioning approaches based on behavioral phenotyping tools, these findings also raise a profound epistemological challenge. Given that the real-life frequency of risky behaviors is the reflection of past choices, why do personality measures, which are based on questionnaires, explain real-life behaviors better than behavioral measures, which are based on choices? And why would the same subjects produce different choices when presented with the very same task twice?

TWO POSSIBLE EXPLANATIONS

We put forward two possible answers for these puzzling results and fundamental questions (low external validity and consistency of behavioral measures): the first possibility is that these findings reflect a problem with the instrument; the other possibility is that they reflect a problem with the construct.

The “problem with the instrument” argument has been explicitly put forward by the authors of the studies (Frey et al., 2017; Pedroni et al., 2017). According to this hypothesis, the low external validity and reliability of the behavioral tasks derive from intrinsic limitations of the tasks. For instance, it has been argued that low between-task consistency derives from the fact that each task involves both central (risk sensitivity) and peripheral processes (responses, stimuli), whose variability may affect the results. Low test-retest reliability should also be expected given that behavioral and cognitive tasks are traditionally designed to reduce between-subjects variance and to maximize between-conditions variance, such that the very features that make a behavioral task “successful” (high reproducibility of the “average” results) make it unsuited to assess inter-individual differences (Hedge et al., 2017). As nicely summarized by Hedge et al. (2017): “Experimental effects become well established – and thus those tasks become popular – when between-subject variability is low. However, low between-subject variability causes low reliability for individual differences, destroying replicable correlations with other factors and potentially undermining published conclusions drawn from correlational relationships.” Propensity measures, on the other hand, are designed to maximize inter-individual differences. In addition, good test-retest reliability is a condicio sine qua non for the publication of personality questionnaires, hence their good temporal consistency. Finally, in the context of the specific set of studies at hand, it is also worth noting that the frequency measures were assessed using self-report questionnaires, which involve the same response modality as the personality measures. Furthermore, risk propensity and risk frequency assessments shared similar content, and it should come as no surprise that subjects provide similar responses to similar questions, e.g., in order to present a coherent image of themselves (a good “narrative”). In statistical terms, this would result in an artificially increased correlation between frequency and personality measures. Taken together, these features may inflate the consistency and validity of the personality measures. Finally, self-reported questionnaires are well known for eliciting edulcorated representations that are influenced by a range of social norms (Edwards, 1953). To overcome the issues raised by self-reported frequency of risk behaviors, personality and behavioral measures should be tested against objective assessments of risky behaviors (e.g., expired CO2 for smoking, medical records, etc.).

The argument that there is a “problem with the instrument” also applies to the mathematical model used to quantify risk propensity parameters. The authors indeed focused on CPT, which is a widely used descriptive model originally designed to explain one-shot decisions (Tversky and Kahneman, 1992). But three features of CPT may undermine the internal consistency of model-based measures of risk sensitivity. First, different tasks engage different peripheral processes, but the same CPT model is applied to various behavioral tasks with no task-specific adjustment of the functional form. Second, and more importantly, CPT parameters are assumed to be static and not affected by the individual’s history of choice, by relevant contextual factors or by feedback. In that respect, CPT is a purely descriptive model rather than a mechanistic model. Third, CPT parameters are often correlated, and it is often hard to disentangle their respective contributions to risky behavior using standard fitting procedures. This is in part because different parameter values can produce the same behavioral phenotype (e.g., loss aversion) (Nilsson et al., 2011), which may undermine the power of the model to unambiguously predict particular behavioral profiles.

The “problem with the construct” argument implies that behavioral tasks provide a genuine estimate of the subject’s momentary risk attitude at the time of testing, but that risk attitude itself changes over time. This is plausible if we assume that momentary risk attitude is influenced by multiple factors. To illustrate this idea, we now consider a simplified case involving two possible phenotypes, a risk-seeking phenotype (red) and a risk-averse phenotype (blue), and we propose a multi-layer model in which momentary risk attitude corresponds to the weighted sum of different sources of influence that change with different time constants (Figure 1). In this toy example, the first layer corresponds to the subject’s “trait,” which is determined by her genotype and which remains stable over her lifespan. The last layer corresponds to random (or unpredictable) factors, such as unexpected external stimuli and contextual factors. In between these two extremes, we hypothesize that additional sources of influence are at play, such as very slow age-related changes and very fast circadian rhythms. According to this model, a subject tested twice with the same behavioral task at different time points will not necessarily display the same phenotype. Within this framework, the fact that propensity measures produce more stable results can be explained by the fact that filling out questionnaires relies on cognitive processes that do not involve risk attitude per se, such as robust averaging of previous experiences stored in episodic memory or introspection.

[Figure 1. Layers of influence, ordered from slow to fast change: trait, ageing, state, circadian, random error. Each layer can increase risk taking, be risk neutral, or decrease risk taking; their weighted sum gives the momentary attitude (risk-averse or risk-seeker), which is sampled over time by tasks T1 and T2 in sessions ES1 and ES2. Model: $\text{Risk attitude}_t = \omega_1 T_t + \omega_2 A_t + \omega_3 S_t + \omega_4 C_t + \varepsilon_t$. Legend: ES1, Experimental Session 1; ES2, Experimental Session 2; T1, Experimental Task 1; T2, Experimental Task 2.]

FIGURE 1 | The figure schematizes how low consistency of behavioral measures of risk may arise from the multi-layer model. At the top, we represent the different factors that influence the probability to express a given behavioral phenotype at a given time point in addition to random error. We consider a simplified case in which only two phenotypes are possible: red (risk seeking) and blue (risk aversion). The different layers change at different time constants (as exemplified by the gray triangle on the right). At a given time point (t) the momentary risk attitude is the weighted sum of the different layers of influence plus random error. A given subject is tested in two experimental sessions (ES1 and ES2) with two behavioral tasks supposed to measure the same behavioral phenotype (T1 and T2). The multi-layer model may explain why behavioral measures are not consistent between tasks and between sessions.

Crucially, there is evidence demonstrating that these various layers are indeed relevant to understanding decision-making under risk: genetic factors influence risk-related behaviors (Linner et al., 2018), and behavioral measures of risk sensitivity evolve across the life-span (Weller et al., 2011) and are affected by hormonal and circadian factors (Lazzaro et al., 2016; Glimcher and Tymula, 2017), mood states (Stanton et al., 2014), as well as momentary arousal (FeldmanHall et al., 2016). Importantly, the same factors are involved in other decision-making processes such as cooperation in economic dilemmas, a field where behavioral tasks also predict real-life behaviors poorly (Gurven and Winking, 2008; Winking and Mizer, 2013).
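The logic of the multi-layer model can be made concrete with a small simulation. The sketch below is only an illustration under loudly stated assumptions: the equal weights, the unit variances, the drift size of the ageing layer, and the function names `pearson` and `simulate_attitudes` are all hypothetical choices of ours, not estimates from the studies discussed. It draws a momentary risk attitude per subject and session as the weighted sum of a fixed trait, a slowly drifting ageing term, session-specific state and circadian terms, and random error, and then compares a single session with the average over sessions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def simulate_attitudes(n_subjects=1000, n_sessions=20,
                       weights=(1.0, 1.0, 1.0, 1.0), seed=0):
    """Return (traits, attitudes); attitudes[i][t] is the momentary risk
    attitude of subject i in session t under the toy multi-layer model."""
    rng = random.Random(seed)
    w1, w2, w3, w4 = weights  # assumed layer weights (omega_1..omega_4)
    traits, attitudes = [], []
    for _ in range(n_subjects):
        trait = rng.gauss(0.0, 1.0)    # T: constant across all sessions
        ageing = rng.gauss(0.0, 1.0)   # A: starting level, drifts slowly
        row = []
        for _ in range(n_sessions):
            ageing += rng.gauss(0.0, 0.05)    # assumed slow drift
            state = rng.gauss(0.0, 1.0)       # S: redrawn every session
            circadian = rng.gauss(0.0, 1.0)   # C: redrawn every session
            noise = rng.gauss(0.0, 1.0)       # epsilon: random error
            row.append(w1 * trait + w2 * ageing + w3 * state
                       + w4 * circadian + noise)
        traits.append(trait)
        attitudes.append(row)
    return traits, attitudes


traits, attitudes = simulate_attitudes()
session1 = [row[0] for row in attitudes]
session2 = [row[1] for row in attitudes]
averaged = [sum(row) / len(row) for row in attitudes]

retest_r = pearson(session1, session2)   # test-retest correlation
single_r = pearson(traits, session1)     # trait vs. one session
averaged_r = pearson(traits, averaged)   # trait vs. average over sessions

print(f"test-retest r: {retest_r:.2f}")
print(f"trait vs. single session r: {single_r:.2f}")
print(f"trait vs. session average r: {averaged_r:.2f}")
```

Under these assumptions, fast-changing layers lower the correlation between two sessions even though each session measures the momentary attitude without bias, and averaging over sessions brings the measure closer to the stable trait. How large these effects are in real data depends entirely on the true weights, which remain to be estimated.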
By contrast, propensity measures, as implemented by questionnaires, are designed to assess participants’ prototypical behavior averaged across long periods of time, thus canceling out momentary trends. In other words, questionnaires are designed to assess stable “traits.” In many cases, participants are explicitly instructed to extract their prototypical behavior with formulations such as “describe yourself as you generally are” and to ignore the variability induced by circadian or age-related changes.

CONCLUSION AND PERSPECTIVES

Recent evidence based on large-scale behavioral testing shows that behavioral measures in cognitive tasks are outperformed by propensity measures from personality questionnaires, in terms of external validity (i.e., correlation with frequency measures) and reliability (between-tasks consistency and test-retest reliability). We delineate two possible, not mutually exclusive, interpretations of these results. The pessimistic “problem with the instrument” argument states that behavioral tasks are not suited to investigate inter-individual differences. The optimistic “problem with the construct” argument states that variability in behavioral tasks reflects true changes in momentary risk attitude. According to this view, behavioral tasks reflect true momentary risk attitude and will the quantification of the relative weights of the different layers.

At the moment, personality questionnaires appear to be the best psychological tools to predict the frequency of real-life risky behavior. Should we, then, abandon the quest for behavioral measures of individual variability? Probably not.
Questionnaires are hugely informative when it comes to providing an accurate description of the variability with which personality traits manifest, but they cannot be used to trace back the cognitive and neural mechanisms that together produce such variability. The paucity of robust behavioral tools to characterize inter-individual differences therefore constitutes an important obstacle to building proper models of cognitive variability. Developing behavioral biomarkers, however, requires rethinking the way cognitive scientists design tasks, so that they maximize between-subjects variance. One promising possibility is to shift from fixed and passive designs to active and adaptive ones. Adjusting task parameters online could indeed correct for momentary changes in baseline performance that may affect the assessment of risk preferences. These results also highlight the importance of repeated testing, which has now become considerably easier with the development of smartphone-based behavioral experiments. Repeated testing should also allow us to test the multi-layer hypothesis, to attribute precise coefficients to the different layers, and, by averaging performance over experiments, to infer participants’ trait-level phenotype. The issue related to the ambiguous relationship between CPT parameters and behavioral profiles, and to the correlation among these parameters, may be solved by implementing principal component analyses instead of working with the raw parameters and by implementing hierarchical model fitting (Nilsson et al., 2011). This approach would of course require external validation to assess which component reflects risk sensitivity, but we believe it is a valuable alternative to current methods. Ultimately, developing and refining mechanistic and dynamic models of risk preferences that integrate learning processes and contextual factors might also allow for a better quantification of risk preferences at the individual level.
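Returning to the principal component suggestion made above, a minimal sketch may help fix ideas. It assumes that each participant has already been fitted with three CPT parameters; the variable names `alpha`, `lambda_`, and `gamma` (utility curvature, loss aversion, probability weighting), the synthetic values, and the shared latent factor that makes them correlated are all illustrative assumptions, not data or results from the studies discussed. The estimates are standardized and the first principal component of their correlation matrix is extracted by power iteration, a simple technique we substitute here for a full eigendecomposition.

```python
import math
import random


def standardize(columns):
    """Z-score each column (a list of floats) using the population SD."""
    z_columns = []
    for col in columns:
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        z_columns.append([(v - mean) / sd for v in col])
    return z_columns


def correlation_matrix(z_columns):
    """Correlation matrix of already standardized columns."""
    n = len(z_columns[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in z_columns]
            for ci in z_columns]


def first_component(matrix, n_iter=200):
    """Leading eigenvector and eigenvalue of a symmetric matrix,
    obtained by power iteration."""
    k = len(matrix)
    v = [1.0 / (i + 1) for i in range(k)]  # arbitrary non-zero start
    for _ in range(n_iter):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
    eigenvalue = sum(a * b for a, b in zip(v, mv))
    return v, eigenvalue


# Synthetic stand-in for per-participant CPT estimates (illustrative only).
rng = random.Random(1)
latent = [rng.gauss(0.0, 1.0) for _ in range(500)]
alpha = [0.80 + 0.10 * z + rng.gauss(0.0, 0.05) for z in latent]
lambda_ = [2.00 + 0.40 * z + rng.gauss(0.0, 0.20) for z in latent]
gamma = [0.70 + 0.08 * z + rng.gauss(0.0, 0.04) for z in latent]

z_cols = standardize([alpha, lambda_, gamma])
loadings, eigenvalue = first_component(correlation_matrix(z_cols))
explained = eigenvalue / len(z_cols)  # share of variance on component 1

# One score per participant: projection of the z-scores onto the loadings.
scores = [sum(w * col[i] for w, col in zip(loadings, z_cols))
          for i in range(len(latent))]

print("loadings:", [round(w, 2) for w in loadings])
print(f"variance explained by component 1: {explained:.2f}")
```

The component scores could then replace the raw, mutually correlated parameters in correlational analyses. As noted above, which component (if any) reflects risk sensitivity would still have to be established by external validation.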
A promising way to design these models could be the development of choice prediction competitions, a method that is already commonly used in the machine learning literature (Erev et al., 2017). Even more ambitiously, these prediction competitions would include data collection at multiple time points as well as external validation by real-life outcomes.

AUTHOR CONTRIBUTIONS

SP designed the review. SP and CC wrote the review.

FUNDING

SP was supported by an ATIP-Avenir grant (R16069JS), a Collaborative Research in Computational Neuroscience ANR-NSF grant (ANR-16-NEUC-0004), the Programme Emergence(s) de la Ville de Paris, and the Fondation Fyssen. The Institut d’Etudes de la Cognition was supported financially by the LabEx IEC (ANR-10-LABX-0087 IEC) and the IDEX PSL* (ANR-10-IDEX-0001-02 PSL*).

ACKNOWLEDGMENTS

We thank Nathaniel Daw for useful comments.

REFERENCES

Attanasi, A., Georgantzís, N., Rotondi, V., and Vigani, D. (2018). Lottery- and survey-based risk attitudes linked through a multichoice elicitation task. Theory Decis. 84, 341–372. doi: 10.1007/s11238-017-9613-0

Corsetto, P., and Filippin, A. (2013). A theoretical and experimental appraisal of five risk elicitation methods. SOEPpapers on Multidisciplinary Panel Data Research 547, Berlin. doi: 10.2139/ssrn.2253819

Edwards, A. L. (1953). The relationship between the judged desirability of a trait and the probability that the trait will be endorsed. J. Appl. Psychol. 37, 90–93. doi: 10.1037/h0058073

Enkavi, A., Eisenberg, L., Bissett, P., Mazza, G. L., Mackinnon, D. P., Marsch, L. A., et al. (2018). A large-scale analysis of test-retest reliabilities of self-regulation measures. PsyArXiv [Preprint]. doi: 10.31234/osf.io/x5pm4

Erev, I., Ert, D., Plonsky, O., Cohen, D., and Cohen, O. (2017). From anomalies to forecasts: toward a descriptive model of decisions under risk, under ambiguity, and from experience. Psychol. Rev. 124, 369–409. doi: 10.1037/rev0000062

FeldmanHall, O., Glimcher, P., Baker, A. L., and Phelps, E. A. (2016). Emotion and decision-making under uncertainty: physiological arousal predicts increased gambling during ambiguity but not risk. J. Exp. Psychol. Gen. 145, 1255–1262. doi: 10.1037/xge0000205

Frey, R., Pedroni, A., Mata, R., Rieskamp, J., and Hertwig, R. (2017). Risk preference shares the psychometric structure of major psychological traits. Sci. Adv. 3:e1701381. doi: 10.1126/sciadv.1701381

Glimcher, P. W., and Tymula, A. (2017). Let the sunshine in: the effects of luminance on economic preferences, choice consistency and dominance violations. PLoS One 12:e0181112. doi: 10.1371/journal.pone.0181112

Gurven, M., and Winking, J. (2008). Collective action in action: prosocial behavior in and out of the laboratory. Am. Anthropol. 110, 179–190. doi: 10.1111/j.1548-1433.2008.00024.x

Hedge, C., Powell, G., and Sumner, P. (2017). The reliability paradox: why robust cognitive tasks do not produce reliable individual differences. Behav. Res. Methods 50, 1166–1186. doi: 10.3758/s13428-017-0935-1

Huys, Q. J., Maia, T. V., and Frank, M. J. (2016). Computational psychiatry as a bridge from neuroscience to clinical applications. Nat. Neurosci. 19, 404–413. doi: 10.1038/nn.4238

Lazzaro, S. C., Rutledge, R. B., Burghart, D. R., and Glimcher, P. W. (2016). The impact of menstrual cycle phase on economic choice and rationality. PLoS One 11:e0144080. doi: 10.1371/journal.pone.0144080

Linner, R. K., Biroli, P., Kong, E., Meddens, S. F. W., Wedow, R., Fontana, M. A., et al. (2018). Genome-wide study identifies 611 loci associated with risk tolerance and risky behaviors. bioRxiv [Preprint]. doi: 10.1101/261081

Nilsson, H., Rieskamp, J., and Wagenmakers, E. (2011). Hierarchical Bayesian parameter estimation for cumulative prospect theory. J. Math. Psychol. 55, 84–93. doi: 10.1016/j.jmp.2010.08.006

Pedroni, A., Frey, R., Bruhin, A., Dutilh, G., Hertwig, R., and Rieskamp, J. (2017). The risk elicitation puzzle. Nat. Hum. Behav. 1, 803–809. doi: 10.1038/s41562-017-0219-x

Stanton, S. J., Reeck, C., Huettel, S. A., and LaBar, K. S. (2014). Effects of induced moods on economic choices. Judgm. Decis. Mak. 9, 167–175.

Tversky, A., and Kahneman, D. (1992). Advances in prospect theory: cumulative representation of uncertainty. J. Risk Uncertain. 5, 297–323. doi: 10.1007/BF00122574

Weller, J. A., Levin, I. P., and Denburg, N. L. (2011). Trajectory of risky decision making for potential gains and losses from ages 5 to 85. J. Behav. Decis. Mak. 24, 331–344. doi: 10.1002/bdm.690

Winking, J., and Mizer, N. (2013). Natural-field dictator game shows no altruistic giving. Evol. Hum. Behav. 34, 288–293. doi: 10.1016/j.evolhumbehav.2013.04.002

Conflict of Interest Statement: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Palminteri and Chevallier. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
