Visual Cognition
ISSN: 1350-6285 (Print) 1464-0716 (Online) Journal homepage: http://www.tandfonline.com/loi/pvis20
Reward boosts working memory encoding over a
brief temporal window
George Wallis, Mark G. Stokes, Craig Arnold & Anna C. Nobre
To cite this article: George Wallis, Mark G. Stokes, Craig Arnold & Anna C. Nobre (2015) Reward
boosts working memory encoding over a brief temporal window, Visual Cognition, 23:1-2,
291-312, DOI: 10.1080/13506285.2015.1013168
To link to this article: http://dx.doi.org/10.1080/13506285.2015.1013168
Published online: 06 Mar 2015.
Full Terms & Conditions of access and use can be found at
http://www.tandfonline.com/action/journalInformation?journalCode=pvis20
Download by: [Radcliffe Infirmary]
Date: 05 October 2015, At: 03:48
Visual Cognition, 2015
Vol. 23, Nos. 1–2, 291–312, http://dx.doi.org/10.1080/13506285.2015.1013168
Reward boosts working memory encoding over a brief
temporal window
George Wallis, Mark G. Stokes, Craig Arnold and
Anna C. Nobre
Oxford Centre for Human Brain Activity, University Department of
Psychiatry, Warneford Hospital, University of Oxford, Oxford, UK
(Received 1 June 2014; accepted 26 January 2015)
Selection mechanisms for working memory (WM) are ordinarily studied by explicitly cueing a subset of
memory items. However, we might also expect the reward associations of stimuli we
encounter to modulate their probability of being represented in WM.
Theoretical and computational models explicitly predict that reward value should
determine which items will be gated into WM. For example, a model by Braver and
colleagues in which phasic dopamine signalling gates WM updating predicts a
temporally-specific but not item-specific reward-driven boost to encoding. In contrast,
Hazy and colleagues invoke reinforcement learning in cortico-striatal loops and predict
an item-wise reward-driven encoding bias. Furthermore, a body of prior work has
demonstrated that reward-associated items can capture attention, and it has been shown
that attentional capture biases WM encoding. We directly investigated the relationship
between reward history and WM encoding. In our first experiment, we found an
encoding benefit associated with reward-associated items, but the benefit generalized to
all items in the memory array. In a second experiment this effect was shown to be highly
temporally specific. We speculate that in real-world contexts in which the environment
is sampled sequentially with saccades/shifts in attention, this mechanism could
effectively mediate an item-wise encoding bias, because encoding boosts would occur
when rewarded items were fixated.
Keywords: Working memory; Reward; Attention.
Please address all correspondence to George Wallis, Oxford Centre for Human Brain Activity,
University Department of Psychiatry, Warneford Hospital, University of Oxford, Oxford, OX3 7JX,
UK. E-mail: wallisgj@gmail.com
We thank two anonymous reviewers for their constructive comments.
No potential conflict of interest was reported by the authors.
This research was supported by a Wellcome Trust DPhil Studentship to GW.
© 2015 Taylor & Francis
A key function of working memory (WM)—the ability to temporarily retain
information over short intervals—is to enable goal-relevant items to guide
ongoing behaviour over temporal gaps, though they may no longer be
perceptually available (Fuster, 1990). WM is a capacity limited resource: in the
visuospatial domain, a maximum of around four separate items can be retained
(Alvarez & Cavanagh, 2004; Luck & Vogel, 1997). Given this capacity
limitation, for working memory to be useful, selection mechanisms must act to
ensure that only the most behaviourally relevant information is encoded. Task
requirements can lead to strategic encoding biases for WM—for example, people
can use explicit cues as to which items will be task relevant to bias encoding
(Murray, Nobre, & Stokes, 2011; Sperling, 1960). However, besides task-oriented
gating biases, it might be adaptive for the reward value of items encountered in
the environment to influence the probability that they are encoded into WM.
Interactions between the dopamine system, the striatum and prefrontal cortex
have motivated theoretical models explicitly predicting that reward value should
play a role in determining which items are gated into working memory. One such
theory was proposed by Braver and Cohen (2000). They suggested that updating
of WM representations in prefrontal cortex (Curtis & D’Esposito, 2003; Fuster &
Alexander, 1971; Jacobsen & Nissen, 1937) might be gated by phasic dopamine
release. In macaque, phasic dopamine signals in the ventral tegmental area (VTA)
and substantia nigra pars compacta (SNpc) code reward prediction errors
(Schultz, 1998, 2013), ramping up within 100 ms of encountering a reward-associated stimulus, and lasting a further 200 ms. These dopaminergic neurons
send widespread afferents to the prefrontal cortex. During learning, appetitive
stimuli (primary reinforcers) initially induce bursts of firing in dopamine neurons,
but over time the burst firing transfers to stimuli (secondary reinforcers) that are
predictive of the primary reinforcers. If this dopaminergic signal gated WM
encoding, then reward-predictive stimuli would be more likely to be encoded
into WM.
This theory makes a particular behavioural prediction, given the properties of
the dopaminergic innervation. Dopamine neurons have diffuse terminal projections within PFC. Besides their lack of target specificity, dopaminergic neurons
also seem to respond rather homogeneously: a large percentage of recorded
dopamine neurons will respond to reward-associated stimuli, or pause if an
expected reward is omitted (Schultz, 2013). This phasic activation is temporally
stereotyped. A homogeneous, phasic signal would seem well suited for temporal
selection, briefly opening the gate to WM. However, it is not clear that a phasic
dopamine signal could selectively gate a subset of multiple simultaneously
presented items. If (as is typical of laboratory WM tasks) a number of items are
presented briefly and simultaneously on the screen, and participants are required
to maintain central fixation, we might expect any item in an array to benefit from
such a gating event even if only a subset of items in the array carry a reward
association.
A related model that does permit item-wise selectivity incorporates the
striatum. Hazy, Frank, and O’Reilly (2007) have proposed a model in which
dopamine is thought of as a teaching signal for striatal plasticity, as opposed to a
gating signal per se. The striatum is suggested to control selective updating of
WM. Long-standing models propose a role for basal ganglia in action selection
(Mink, 1996; Redgrave, Prescott, & Gurney, 1999), and the striatum is
reciprocally connected with motor and premotor cortex. However, the striatum
is also extensively connected with “cognitive” areas in PFC, including regions
associated with WM control (Alexander, DeLong, & Strick, 1986; Middleton &
Strick, 2000). In the model by Frank, Hazy, O’Reilly and colleagues, action
selection is generalized to the cognitive domain: parallel cortico-striatal loops are
responsible for selective updating of representations in WM, and striatal circuitry
adapts to gate only goal-relevant representations into memory, using dopaminergic firing as the teaching signal (Frank, Loughry, & O’Reilly, 2001; Hazy
et al., 2007). This theory predicts that item-wise reward associations should be
able to bias WM encoding in favour of items associated with high reward value.
fMRI has provided evidence that both the dopamine system and basal ganglia
are involved in WM updating. D’Ardenne et al. (2012) imaged activity in both
dlPFC and SNpc/VTA. They contrasted a “context independent” task condition,
in which participants responded simply on the basis of a probe stimulus, with a
“context dependent” condition in which participants had to encode a rule into
memory, cued by a preceding stimulus, in order to know how to respond to the
probe stimulus. This latter condition gave rise to a phasic increase in activity in
both the dlPFC and the VTA/SNpc following the context cue (i.e., when WM
updating was required). There was less activity in SNpc/VTA on context-dependent error trials than correct trials, suggesting that activity in the
dopaminergic nuclei is important for successful updating of WM. Similarly, the
basal ganglia have been linked to selective gating of information into WM.
McNab and Klingberg (2007) gave participants a task in which they were cued
on each trial to remember the location of three red items, remember the location
of three red and two yellow items, or remember the location of three red items
but ignore a further two yellow distractor items presented in the same array.
The first two conditions manipulated memory load, whilst the latter condition
added the requirement to filter distractors. The BOLD signal following the
presentation of the cue revealed activation in basal ganglia when participants
were preparing to filter out distractors. The magnitude of this activation
correlated with WM capacity across participants, and negatively correlated
with parietal BOLD in the retention interval (where higher activity may reflect
unnecessary storage of irrelevant items).
Despite the imaging work, there is little direct behavioural evidence for a link
between reward history and WM encoding, notwithstanding a growing body of
work showing that reward associations can capture attention. For example,
Anderson, Laurent, and Yantis (2011) developed a two-part design to quantify
the effect of learnt value associations upon the allocation of visuospatial
attention. First, a reward-training task was used to associate different stimulus
colours with different levels of reward. In a subsequent visual search task, there
was no reward feedback given, and reward was task-irrelevant. Nevertheless,
when items associated with a high-reward colour were present as a distractor,
reaction times were slower, suggesting that previously learned reward automatically captured attention. Given that attentional capture can bias working
memory encoding (Schmidt, Vogel, Woodman, & Luck, 2002) we might expect
that items in a working memory task carrying a higher reward value would be
better encoded into memory. However, as discussed in detail above, there are
also compelling theoretical models linking the reward system and working
memory updating. There are therefore several putative mechanisms by which
reward associations could influence working memory encoding. In this study, we
adapted the two-part design to investigate this relationship.
EXPERIMENT 1
The experimental session had two parts. First, participants performed a binary-choice reward-learning task, in order to associate different novel shapes with
different reward values (nil, low, or high). After a short break, participants then
performed a WM task in which the previously encountered high- and low-reward
shapes served as the memoranda (nil-valued items were not used), but reward
value was now completely task-irrelevant. The WM task is schematized in
Figure 1.
Figure 1. Task schematic, experiment 1. Participants were briefly presented with two or four shape
memoranda (panel A; when only two items were presented, they were always in opposite corners). After a 2 s
retention interval a single item appeared in the centre of the screen. This item was present in the memory array
with probability 0.5 (i.e., 50% match trials, 50% non-match trials). Participants judged “present” or “absent”.
Panel B describes the condition bins in experiment 1 (and the condition labelling scheme for results figures).
The two hypotheses outlined above make different predictions about the
effect of reward value on WM performance (as indexed using D′, a measure of
available information that is robust to response biases). According to the striatal-gating account, item-wise reward value should bias encoding. High-reward items
should have a competitive advantage compared with low-reward items.
If capacity is fixed and reward value biases competition between the memoranda
for representation in memory, the most substantial effect of reward value should
be observed in the mixed-value arrays, between trials in which high or low value
items are probed. Alternatively, according to the gating-by-dopamine account,
WM updating should be generally facilitated whenever high value items are
encountered. Performance should therefore scale with the reward value of the
array. These hypotheses are not mutually exclusive, and mixed patterns, with an
item-wise bias superimposed on a general facilitation with higher net reward value, are also plausible.
Methods
Participants
Twenty-three volunteers (15 female) took part in the experiment, recruited from
the Oxford community using a mailing list. They were aged between 20 and 39
years old (mean age 23.5). Participants were paid between £11 and £14,
depending on their performance in the reward-training task. All participants had
normal or corrected-to-normal visual acuity. Left-handedness did not preclude
participation. Ethics approval for the study was granted by the Central University
Research Ethics Committee of the University of Oxford.
Stimuli
The experiment was conducted in a quiet, dimly lit booth. Stimuli were
presented on an LCD monitor at a viewing distance of 80 cm. During the WM
task gaze direction was monitored by the experimenter using an Eyelink 1000
infra-red video eyetracker (binocular). A chinrest was used to stabilize the head.
Participants were asked to maintain fixation whilst the memory array was
presented (but were permitted to saccade to the probe item at the end of each
trial). The eye-movement trace was monitored from outside of the experimental
booth by the experimenter in conjunction with a duplicate of the stimulus
display. We found that all participants were able to maintain fixation whilst the
memory array was presented. Stimuli subtended 2 degrees visual angle. The
shape stimuli are illustrated in Figure 2. One group of 12 stimuli always carried
either high or low value (6 stimuli each)—value allocation within this group was
randomized for each participant. A separate group of 18 stimuli made up the nil-value stimuli. These were never used in the WM task, but were important for the
reward training, as described below.
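As a worked illustration of the viewing geometry described above, the following sketch converts between visual angle and physical size at the stated 80 cm viewing distance. The function names are hypothetical, and it is an assumption that the 2 degree figure refers to the full extent of a stimulus along one dimension.

```python
import math

def size_from_visual_angle(angle_deg, distance_cm):
    """Physical extent (cm) that subtends angle_deg at distance_cm."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)

def visual_angle_from_size(size_cm, distance_cm):
    """Visual angle (degrees) subtended by an extent of size_cm at distance_cm."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

# Stimuli of 2 degrees viewed from 80 cm (values taken from the text).
stimulus_cm = size_from_visual_angle(2.0, 80.0)
print(f"{stimulus_cm:.2f} cm")  # about 2.79 cm
```

Under these assumptions, each shape would have been roughly 2.8 cm across on the screen.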
Figure 2. Shape stimuli. The upper two rows contain the 12 stimuli used in both reward training and WM
task, which were assigned high or low value (allocated randomly across participants). The lower three rows
contain the nil-value stimuli, which were never used in the WM task.
Procedure: reward training
The reward-training task (Figure 3) consisted of a sequence of binary choices
between pairs of the shape stimuli. On a given trial, a participant was presented
with two outline shapes, one on the left and one on the right side of the screen,
and selected one or other by pressing the left or right key on the computer
keyboard. A red square appeared to indicate the chosen item, and reward
feedback was given immediately afterwards. An on-screen reward bar incremented, and an auditory signal was given (a cash register “kerr-ching!” sound for
high reward, a “coin drop” sound for low reward, and a low beep for nil reward).
The reward bar incremented from the left towards a target bar on the right-hand
side of the screen. Each time this bar was reached, participants “banked” £1 and
the bar was reset. The magnitude of the reward feedback depended on the shape
chosen. There were three possible values for the shape stimuli: nil value, low
value (1 pence), and high value (10 pence). Participants were instructed to win as
much money as they could over the course of the experiment, and that they
would be given the value they won in cash to take with them at the end of the
session. Overall winnings ranged from £11 to £14.
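The reward-bar feedback described above can be sketched as a simple accumulator. This is a minimal illustration rather than the task code: the names are hypothetical, and it is an assumption that any amount beyond the £1 target carries over when the bar is reset.

```python
# Reward magnitudes in pence, as stated in the text.
REWARD_PENCE = {"nil": 0, "low": 1, "high": 10}
BAR_TARGET_PENCE = 100  # the bar "banks" one pound each time it is filled

def run_reward_bar(chosen_values):
    """Return (pounds_banked, pence_left_on_bar) after a sequence of choices.

    chosen_values: iterable of "nil", "low" or "high", one per trial.
    Assumption: overflow beyond the target is carried over on reset.
    """
    bar = 0
    banked = 0
    for value in chosen_values:
        bar += REWARD_PENCE[value]
        if bar >= BAR_TARGET_PENCE:
            banked += 1
            bar -= BAR_TARGET_PENCE
    return banked, bar

print(run_reward_bar(["high"] * 10 + ["low"] * 3))  # (1, 3)
```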
Figure 3. Reward task. On each trial, participants were presented with a pair of items. They chose one
item by pressing the left or right key on the keyboard. Reward feedback was then given about the value of
the chosen item. A “coin stack” on the right hand side of the screen kept track of net winnings. The red
reward bar incremented towards the gold bar on the right hand side of the screen, and when it reached this
bar it was reset (and £1 was added to the coin stack).
Over the course of reward training, there were twice as many pairings of a
low value item with a nil value item (144 pairings) as of either a high with a
low value item (72 pairings), or a high with a nil value item (72 pairings). This
was in order to equate the number of times participants chose high value items
and low value items (but note that in order to equate choices, the low value items
were presented more often than the high value items).
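The arithmetic behind this balancing can be checked with a short sketch. It assumes an idealized participant who always chooses the more valuable item of each pair; the variable names are illustrative only.

```python
# Pairing counts taken from the text.
pairings = {("low", "nil"): 144, ("high", "low"): 72, ("high", "nil"): 72}
rank = {"nil": 0, "low": 1, "high": 2}

presented = {"nil": 0, "low": 0, "high": 0}
chosen = {"nil": 0, "low": 0, "high": 0}

for (item_a, item_b), count in pairings.items():
    presented[item_a] += count
    presented[item_b] += count
    # Idealized chooser: always picks the higher-valued item of the pair.
    better = max((item_a, item_b), key=rank.get)
    chosen[better] += count

print(chosen)     # {'nil': 0, 'low': 144, 'high': 144}
print(presented)  # {'nil': 216, 'low': 216, 'high': 144}
```

Under optimal choice, high and low value items are each chosen 144 times, whereas low value items appear on screen 216 times and high value items 144 times.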
Procedure: WM task
The WM task was a “present/absent” forced-choice task (panels A, B; Figure 1).
The WM array consisted of either four or two items. Following the presentation
of the fixation cue, the WM array was presented for 200 ms, followed by a 2 s
retention interval. The probe item was then presented for 200 ms at fixation, and
participants indicated whether the probe item had been a member of the memory
array by pressing either “p” (present) or “a” (absent) on the keyboard. Only the
set of shapes associated with either a low or high reward value in the reward-training task (six low value shapes, six high value shapes) were used as memory
items in the WM task. The different combinations of memory item and probe
item values are summarized in Figure 1, panel B.
There were 384 trials in total, split evenly between two-item and four-item
trials. There were therefore 48 trials for each intersection of set size (x2) and
reward condition (x4), consisting of 24 probe-present (termed “match”) trials and
24 probe-absent (termed “non-match”) trials. For set size two, the memory items
were presented on the diagonals. In conditions for which the memory items in
the array were all of the same value, the probe stimulus in non-match trials was
always of the same value as the stimulus array. This was to avoid participants
using the change in reward value relative to the all-high/all-low array to inform
their match/non-match decision (as opposed to remembered item identity).
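The factorial structure of the trial list can be sketched as follows. The condition labels follow the labelling used in the results (array values, with the probed value in parentheses); the function name, the dictionary keys, and the random ordering of trials are assumptions made for illustration.

```python
from itertools import product
import random

SET_SIZES = (2, 4)
REWARD_CONDITIONS = ("LL(L)", "HL(L)", "HL(H)", "HH(H)")
TRIAL_TYPES = ("match", "non-match")
REPEATS_PER_CELL = 24  # 24 match and 24 non-match trials per condition

def build_trial_list(seed=0):
    """Return a shuffled list of trial descriptors (shuffling is an assumption)."""
    trials = [
        {"set_size": size, "reward_condition": cond, "trial_type": kind}
        for size, cond, kind in product(SET_SIZES, REWARD_CONDITIONS, TRIAL_TYPES)
        for _ in range(REPEATS_PER_CELL)
    ]
    random.Random(seed).shuffle(trials)
    return trials

trials = build_trial_list()
print(len(trials))  # 384
```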
Results
Reward training
The effectiveness of the reward training was assessed by dividing trials into the
three possible stimulus pair types (High Reward with Low Reward, High with Nil,
Low with Nil) and calculating the probability that participants correctly chose the
more valuable item as the task progressed, over a 21-trial moving window. Any
participant who failed to learn to choose higher value items in preference to lower
valued items by the end of the reward training was excluded from the analysis of
the WM data. An exact binomial test was used to find the probability of the
proportion of choices each participant made over the last 40 trials of the reward
training, under the null hypothesis of random choice. A criterion p-value of p ≤
.01 (two-tailed) was used for all three pair types separately to exclude participants
who did not show evidence of learning. Six participants were excluded from
further analyses because they did not learn to choose the low-value item in
preference to the nil-value item. A subset of these participants (n = 4) actively
chose the nil-valued items in preference to the low-value items, perhaps because
of an exploratory incentive to discover the value of the nil-value item. The
remaining participants were choosing the low valued items in preference to the nil
valued items by the end of training (all p < .0011, over the last 40 trials). Grand-averaged learning curves for these participants are shown in Figure 4.
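The learning criterion described above can be sketched with the standard library alone. This is an illustrative reconstruction, not the authors' analysis code: the function names are hypothetical, the two-tailed p-value uses the common "sum of outcomes no more likely than the observed one" definition, the moving window is assumed to be centred, and the explicit check on the direction of the effect is an assumption added so that participants who reliably chose the less valuable item are not counted as having learned.

```python
from math import comb

def binomial_two_tailed_p(successes, trials, p_null=0.5):
    """Exact two-tailed binomial p-value under the null of random choice."""
    probs = [comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
             for k in range(trials + 1)]
    observed = probs[successes]
    # Sum every outcome that is no more likely than the observed one
    # (small tolerance guards against floating-point ties).
    return min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-9)))

def shows_learning(choices, alpha=0.01, window=40):
    """choices: 1 if the more valuable item was chosen, else 0, in trial order."""
    last = choices[-window:]
    n_correct = sum(last)
    p_value = binomial_two_tailed_p(n_correct, len(last))
    return p_value <= alpha and n_correct > len(last) / 2

def moving_accuracy(choices, window=21):
    """Proportion correct in a centred moving window (centring is an assumption)."""
    half = window // 2
    return [sum(choices[i - half:i + half + 1]) / window
            for i in range(half, len(choices) - half)]
```

In this sketch a participant would be retained only if `shows_learning` returned `True` for all three pair types.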
Figure 4. Reward-learning, experiment 1. Following the exclusion of six participants who failed to demonstrate significant learning in the low-value / nil-value pairs, the
above grand-averaged learning curves were obtained. By the end of the training session, all participants included in the WM analysis were able to successfully discriminate
nil, low, and high value items at near-ceiling performance. Error bounds are ±1SEM.
WM task
Trials in which reaction time was longer than 5 seconds or shorter than 0.3
seconds were excluded from the analysis, as they likely reflected lapses in task
engagement or premature responding, respectively (on average 2.5/384 trials
were excluded). We analyzed the results using signal-detection theory measures,
to distinguish changes in response bias from changes in sensitivity. D-prime and
criterion values were analyzed using separate repeated-measures ANOVAs with
factors Load (2 levels) and Reward Condition (4 levels).
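For readers unfamiliar with these measures, the sketch below computes D-prime and criterion from response counts using the standard equal-variance formulas, together with the reaction-time exclusion rule stated above. The function names are hypothetical, and the log-linear correction (adding 0.5 to each count) is an assumption introduced to avoid infinite z-scores; the text does not state how extreme rates were handled.

```python
from statistics import NormalDist

def valid_rt(rt_seconds):
    """Keep trials with reaction times between 0.3 s and 5 s inclusive."""
    return 0.3 <= rt_seconds <= 5.0

def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion) for a present/absent task.

    Assumption: a log-linear correction keeps the rates away from 0 and 1.
    """
    z = NormalDist().inv_cdf
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    d_prime = z(hit_rate) - z(fa_rate)            # sensitivity
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # lower = more liberal
    return d_prime, criterion

# One condition cell: 24 match and 24 non-match trials (illustrative counts).
print(sdt_measures(hits=20, misses=4, false_alarms=4, correct_rejections=20))
```

A lower (more negative) criterion corresponds to a greater willingness to respond "present", independent of sensitivity.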
In the D-prime ANOVA, there was a main effect of working memory load (F
(1,16) = 132.4, p < .0005) and a main effect of reward condition (F(3,48) = 3.28, p
= .029). There was no evidence for an interaction between working memory load
and reward value (F(3,48) = 0.311, p = .817). In the criterion ANOVA, there was a
main effect of reward condition (F(3,48) = 4.56, p = .007) but no evidence for a
main effect of memory load (F(1,16) = 2.85, p = .11) or for an interaction between
memory load and reward condition (F(3,48) = 0.627, p = .601).
The results are plotted in Figure 5: the raw d-prime and criterion values are
shown in panels A and B, respectively, for both working memory loads. The
effect of memory load dominates the scale in panel A. In order to more clearly
show the effect of reward condition, we re-plotted these data in panels C and D
after normalizing the data within load condition and within participant (by
subtracting the mean across all reward conditions from each data point). This
also factors out between-subject variance.
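The normalization step can be illustrated with a short sketch, applied separately to each participant and each memory load. The function name and the example values are hypothetical.

```python
def baseline_within(values_by_condition):
    """Subtract the mean across reward conditions from each condition's value.

    values_by_condition: dict mapping reward condition -> D-prime (or criterion)
    for ONE participant at ONE memory load.
    """
    mean = sum(values_by_condition.values()) / len(values_by_condition)
    return {cond: value - mean for cond, value in values_by_condition.items()}

# Illustrative (made-up) D-prime values for one participant at one load.
example = {"LL(L)": 1.0, "HL(L)": 1.5, "HL(H)": 1.5, "HH(H)": 2.0}
print(baseline_within(example))
# {'LL(L)': -0.5, 'HL(L)': 0.0, 'HL(H)': 0.0, 'HH(H)': 0.5}
```

Because every participant's values are centred on zero within each load, differences in overall performance between participants and between loads no longer contribute to the plotted variance.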
The value of the memoranda and probe item affected sensitivity (D-prime). D′
was elevated for conditions in which the array items had higher net reward (Figure
5, panels A and C). If these results had been driven by biasing of competition for
encoding, we would expect the biggest difference in sensitivity to
be between high and low-reward probed items in the mixed-value arrays (HL(H)
versus HL(L)). However, the difference in sensitivity was in fact greatest between
the all-high reward (HH(H)) and all-low reward (LL(L)) arrays, with little
difference in sensitivity between the HL(H) and HL(L) conditions. In general the
sensitivity scaled with the net reward value of the memory array.
Reward value also induced a response bias (criterion change). Participants
were more liberal in their responding (criterion is lower) when the probe value
was high (Figure 5, panels B and D). This effect appeared to be driven by the
value of the probe item only (participants were more liberal in indicating a
memory match when the probe item was of high value), and not by the value of
the items in the memory array.
Interim discussion
Experiment 1 showed that reward value of memoranda in a WM task affects
performance. However, rather than driving an item-wise encoding bias towards
rewarded items, sensitivity scales with the net reward value of the memory array,
suggesting that encountering reward-associated items boosts WM encoding for
all presented items (in line with the proposal that encountering reward-associated
stimuli briefly boosts WM encoding). The reward value of the probed item also
induced a response bias, with participants’ responding becoming more liberal
when the probe item was of high value.
Figure 5. Signal detection measures, WM task, experiment 1. In panels A and B, raw D-prime and
criterion data are shown for each reward condition and memory load. There is a large effect of memory load
on d-prime: d-prime is much higher in the load 2 than load 4 condition. In order to factor out the effect of
memory load on D-prime (to better show the more subtle effect of reward condition), and also to factor out
between-subject variance, these data are re-plotted in panels C and D after base-lining within-subject and
within each memory load. D-prime is proportional to the net reward value of the memory array, but there is
no evidence for an item-wise reward-driven bias in encoding (panels A and C). Criterion (higher criterion
indicates more conservative responding) was decreased for high reward probe items, indicating a more
liberal response rule (panels B and D). Error bars are ± 1SEM.
These data were consistent with the hypothesis that encountering a reward-associated item briefly boosts encoding for WM. However, there are a number of
alternative explanations for the effects we observed. Firstly, our design did not
allow us to assess the temporal scope of this encoding boost, as we presented all
items simultaneously. Arguably, the reward benefit we observed could have been
driven by heightened engagement throughout trials in which the array was of higher
net value (i.e., an “alerting” or “effort” effect). Secondly, it is possible that the
effects we observed were mediated by changes in the dynamics of working memory
retention, as opposed to encoding per se. For example, the higher value items may
have acquired a better long-term memory representation during reward training,
and therefore be more efficiently represented in memory. This could have released
working memory resources for the remaining items in the working memory array,
explaining the co-variation of performance with net reward value of the array.
In order to circumvent these limitations, we ran a second task in which we
presented memoranda in rapid sequence. This allowed us to assess whether the
reward-associated benefit was due to a short-lived encoding boost (of the order
of hundreds of milliseconds, as predicted by the phasic dopamine gating
account) or was attributable to a less temporally focused alerting or effort effect.
As the reward associations in experiment 1 also triggered response biases, we
switched from a present/absent forced choice task to a precision/capacity task, in
which participants remember several oriented items, and attempt to reproduce the
orientation of one of the remembered items at the end of each trial (Zhang &
Luck, 2008). This design eliminates the scope for response bias, and has the
additional advantage of affording dissociable measures of guess rate, and the
precision with which items are represented (on non-guess trials).
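The dissociation between guess rate and precision rests on a mixture model of response errors, in the spirit of Zhang and Luck (2008): on some proportion of trials the response is a uniform guess, and otherwise it is drawn from a von Mises distribution centred on the true orientation. The sketch below writes out that density. It is illustrative only: the function names are hypothetical, errors are assumed to be expressed in radians on a full circle, and the Bessel function is approximated by a truncated series that is adequate for moderate concentration values.

```python
import math

def bessel_i0(x, terms=60):
    """Modified Bessel function of the first kind, order 0 (power series)."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / (2 * k)) ** 2
        total += term
    return total

def mixture_pdf(error, guess_rate, kappa):
    """Density of a response error (radians, in [-pi, pi)).

    guess_rate: probability that the response is a uniform random guess.
    kappa: von Mises concentration; higher values mean more precise memory.
    """
    von_mises = math.exp(kappa * math.cos(error)) / (2 * math.pi * bessel_i0(kappa))
    uniform = 1 / (2 * math.pi)
    return (1 - guess_rate) * von_mises + guess_rate * uniform
```

Fitting `guess_rate` and `kappa` to each participant's error distribution (for example by maximum likelihood) yields the two dissociable measures referred to above.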
Following Anderson et al. (2011) we associated different colours with
different reward values in the reward training phase. In the subsequent WM
task each oriented item was identified by colour, of which a subset were reward-associated. As the memory dimension (orientation) was orthogonal to the reward
feature (colour), potential differences in long-term memory strength for the high-reward items were not a confound in this second experiment. Each participant
performed three sessions on three separate days, and in each session the reward
task was (re)administered before participants performed the WM task. This
allowed us to quantify any training effects, and boosted statistical power.
EXPERIMENT 2
Methods
Participants
Twenty-two volunteers (14 female) took part in the experiment, recruited from the
Oxford community using a mailing list. They were aged between 19 and 28 years
old (mean age 21). Participants were paid between £15 and £17 per session,
depending on their performance in the reward-training task, and each participant
completed three sessions on separate days. All participants had normal or
corrected-to-normal visual acuity. Ethics approval for the study was granted by
the Central University Research Ethics Committee of the University of Oxford.
Procedure
The experiment was conducted in a quiet, dimly lit booth. Stimuli were
presented on an LCD monitor at a viewing distance of 80 cm. During the WM
task, gaze direction was monitored using an Eyelink 1000 infra-red video
eyetracker (binocular). A chinrest was used to stabilize the head.
Figure 6. Reward training and WM tasks, experiment 2. Panel A shows the reward training task, which
was framed as a game. On each trial, three coloured circles appeared on the screen, moving with
unpredictable trajectories. Participants moved a cross-hair towards the red or green circle and “caught” it by
pressing the mouse button. Panel B shows the sequential WM task. Three coloured “UFO” stimuli with
orientations assigned independently and at random were presented at fixation in rapid succession. A white
stimulus was presented at the end of the stimulus sequence and never tested; this was to reduce the potential
for a final-item advantage resulting from an after-image or fragile VSTM. After a 2 s retention interval, a
probe stimulus was presented that matched one of the memoranda in colour. Participants rotated the
stimulus until it matched the remembered orientation for that colour, at which point they pressed the mouse
button to register their response.
Schematics of the reward training and WM tasks are shown in Figure 6.
In each of three separate sessions, participants performed both the reward
training task and then, after a short (~10 minute) break, the WM task. Each
session was run on a separate day, and the full set of sessions for a given
participant did not span more than seven days.
Reward training
The aim of the reward training task was to associate different colours with
different levels of reward. There were six colours in total: blue, purple, orange,
yellow, red and green. Following Anderson et al. (2011), red and green were
always rewarded, one being associated with high reward and the other low
reward, which was counterbalanced across participants. The remaining four
colours were never targets in reward training, and were not rewarded. On each
trial of the reward training, either a red or green circle was present, along with
two nil-value colours.
The reward-training task was framed as a game. Participants were presented
with a square “arena” (subtending 12 degrees visual angle) in which three
coloured circles (subtending 0.6 degrees visual angle) moved unpredictably, but
with smooth trajectories. The participants controlled the location of a cross-hair
with a trackball mouse (Logitech Marble mouse). Participants were informed that
the goal was to “catch” the red or green patches. They “chased” the reward-associated patch on a given trial (red or green) with the cross-hair, and clicked
the mouse button to “catch” it. At the beginning of the first session, participants
were given 120 trials of training (attempting to catch the reward-associated
colours, but without any reward feedback). This was to familiarize them with the
mouse. There was no time limit on responses at this point. Following this
training session, participants performed the task again, but this time with reward
feedback. An initial time limit of 3 seconds was imposed. If participants failed to
respond before this time limit, the trial would time-out and the next trial would
be presented. This time-out both kept participants engaged and permitted
staircasing of task difficulty. A staircasing procedure ran continuously with the
aim of equating the money won by different participants, despite differences in
the ease with which different participants performed the task. Every 20 trials, the
time available to catch a patch before the task timed out was changed in 100 ms
steps, decreasing if the participant was more than 90% accurate on the previous
20 trials, and increasing if they were less than 90% accurate. There were 360
trials in total, divided into blocks of 40 trials interspersed with self-timed rest
breaks.
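The staircase rule described above can be sketched as a small update function. This is a minimal illustration, not the authors' implementation: the function name, the lower bound `floor_ms`, and the choice to leave the limit unchanged at exactly 90% accuracy are assumptions, since the text does not specify them.

```python
def update_time_limit(limit_ms, outcomes, target_pct=90, step_ms=100, floor_ms=100):
    """Return the time limit (ms) to use for the next 20-trial window.

    outcomes: booleans for the previous window (True = patch caught in time).
    floor_ms: assumed lower bound on the limit; the text does not mention one.
    """
    hits = sum(outcomes)
    n = len(outcomes)
    # Integer comparison avoids floating-point issues at exactly 90%.
    if hits * 100 > target_pct * n:
        limit_ms -= step_ms   # above target accuracy: less time to catch the patch
    elif hits * 100 < target_pct * n:
        limit_ms += step_ms   # below target accuracy: more time
    # At exactly the target accuracy the limit is left unchanged (assumption).
    return max(limit_ms, floor_ms)


limit = 3000  # initial time limit from the text, in ms
limit = update_time_limit(limit, [True] * 19 + [False])  # 19/20 caught
```

Because the rule steers every participant towards the same catch rate, it also tends to equate the number of rewarded catches, and hence the winnings, across participants.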
Reward feedback was given visually in the form of a reward bar at the bottom
of the screen that grew incrementally from the left towards the right of the screen
(Figure 6, panel A). When this bar hit a target bar at the right hand side of the
screen, £1 was added to the participant’s winnings. The overall winnings were
represented by a coin stack on the right-hand side of the screen. Auditory
feedback was also given, in the form of a “kerr-ching” cash register sound when
participants caught a high reward item, the sound of a single coin dropping into a
can when participants caught a low reward item, and a low tone when
participants missed the target item, or timed out. Participants were informed
before the beginning of the reward training that the money they won was real
and would be given to them at the end of the experiment. The high-reward
colour earned participants 9 pence if caught, the low reward colour 1 penny.
Participants earned between £15 and £17 per session.
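As a rough consistency check on these figures, the expected winnings per session can be computed from the trial count and reward values given above. The sketch assumes that high- and low-reward targets were equally frequent in training and that the staircase held the catch rate near 90%; neither is stated explicitly for the training task, so both are labelled as assumptions in the code.

```python
def expected_session_earnings_pence(n_trials=360, catch_rate=0.90, p_high=0.5,
                                    high_pence=9, low_pence=1):
    """Expected winnings for one training session, in pence.

    catch_rate: assumed to sit near the 90% staircase target.
    p_high: assumed proportion of trials with the high-reward colour.
    """
    mean_value = p_high * high_pence + (1 - p_high) * low_pence
    return n_trials * catch_rate * mean_value


pounds = expected_session_earnings_pence() / 100
```

Under these assumptions the expected total is about £16.20, which falls inside the reported range of £15 to £17 per session.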
WM task
We employed a precision/capacity orientation memory task (Zhang & Luck,
2008), schematized in Figure 6, panel B. Participants were presented with a
sequence of symmetrical “UFO” stimuli in rapid succession. Each memory
stimulus was presented for 200 ms, with a 100 ms blank ISI between stimuli.
There were three coloured memory stimuli, followed by a white distractor
stimulus that participants were informed they would never be asked to report.
This stimulus was included to ensure the last item was not encoded in a
qualitatively different manner to the preceding items (e.g., as an after-image).
After a delay, one of the coloured stimuli reappeared with a random orientation.
Participants moved the trackball of the mouse horizontally to rotate the probe
stimulus until it matched the remembered orientation of the memory
stimulus of the same colour, at which point they pressed the mouse button.
Participants were asked to make an attempt to match the orientation even if they
felt they were completely guessing.
The task was divided into 12 blocks of 30 trials each (360 trials), with rest
breaks between blocks. On each trial, one of the three stimuli was a previously
rewarded colour, with 50% of trials harbouring the high reward associated
colour, and 50% the low reward associated colour. The remainder of the stimuli
were rendered in the unrewarded non-target colours from the reward-training
task. The previously rewarded colour could appear in any of the three sequence
locations with equal probability. Each of the three sequence locations had an
equal probability of being probed. As there were two rewarded colours and four
nil-value colours, each colour appeared on 50% of trials and colours were
equally likely to be probed.
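The balance of the design described above can be checked by enumerating every equally likely trial type. This is a sketch under one assumption not stated in the text: the two nil-value colours on each trial are drawn uniformly, without replacement, from the four available. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product

REWARDED = ["red", "green"]
NIL = ["blue", "purple", "orange", "yellow"]


def colour_probabilities():
    """Return (P(colour shown), P(colour probed)) over all equally likely designs."""
    shown = {c: Fraction(0) for c in REWARDED + NIL}
    probed = {c: Fraction(0) for c in REWARDED + NIL}
    # Rewarded colour x its sequence position x ordered pair of nil colours x probed position.
    designs = list(product(REWARDED, range(3), permutations(NIL, 2), range(3)))
    weight = Fraction(1, len(designs))
    for rewarded, reward_pos, nil_pair, probe_pos in designs:
        sequence = list(nil_pair)
        sequence.insert(reward_pos, rewarded)  # three-item colour sequence
        for colour in sequence:
            shown[colour] += weight
        probed[sequence[probe_pos]] += weight
    return shown, probed


shown, probed = colour_probabilities()
```

With these assumptions every colour is shown on half of the trials and probed on one sixth of them, matching the description in the text.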
Results
The data from the WM task were analyzed using a mixture model (Zhang & Luck,
2008). The response from each trial is expressed as
the difference in response orientation from the orientation of the memory item
(response error, bounded [−π/2, π/2] in radians, as stimuli were symmetrical).
Provided participants are doing better than chance, this distribution has a peak
around zero error (Figure 7, panel A). The mixture model fits the distribution of
response errors as a mixture of a von Mises distribution (the circular equivalent
of a Gaussian distribution), representing the precision with which items are
stored in memory provided they have been encoded, and a uniform guess
distribution modelling trials in which the probed item was not encoded. The fit
(illustrated in Figure 7, panel B) is captured in two parameters: the probability of
guessing on any given trial (pGuess) and the precision of the von Mises
distribution (kappa). The model was fitted using maximum likelihood estimation in MATLAB.
Trials were binned by reward value of the probed item (nil, low, high), and the
experimental session from which they came (#1, #2, #3). Precision values for
two participants were zero for the majority of condition bins, and inspecting the
error histograms confirmed that these participants were performing at chance
(flat error distributions). These participants were excluded from further analysis.
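The mixture model described above can be illustrated with a short, self-contained sketch. It is not the authors' MATLAB code: a coarse grid search stands in for their maximum likelihood routine, all function names are hypothetical, and kappa is defined here on the doubled error angle (so that the π-periodic errors cover a full circle), a parametrization the text does not specify.

```python
import math
import random


def response_error(response, target):
    """Signed error wrapped into [-pi/2, pi/2) for stimuli with 180-degree symmetry."""
    return (response - target + math.pi / 2) % math.pi - math.pi / 2


def bessel_i0(x):
    """Modified Bessel function of the first kind, order 0, by power series."""
    term, total, k = 1.0, 1.0, 0
    while term > 1e-12 * total:
        k += 1
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total


def fit_mixture(errors):
    """Grid-search maximum-likelihood estimate of (p_guess, kappa)."""
    p_grid = [i / 50 for i in range(50)]          # 0.00 .. 0.98
    kappa_grid = [0.5 * i for i in range(1, 41)]  # 0.5 .. 20.0
    cosines = [math.cos(2.0 * e) for e in errors]  # double the angle: period pi -> 2*pi
    uniform = 1.0 / math.pi                        # guess density on an interval of width pi
    best = (-math.inf, None, None)
    for kappa in kappa_grid:
        norm = math.pi * bessel_i0(kappa)  # von Mises normaliser after the change of variable
        vm = [math.exp(kappa * c) / norm for c in cosines]
        for p in p_grid:
            ll = sum(math.log(p * uniform + (1.0 - p) * v) for v in vm)
            if ll > best[0]:
                best = (ll, p, kappa)
    return best[1], best[2]


def simulate_errors(n, p_guess, kappa, seed=1):
    """Simulated response errors: uniform guesses mixed with von Mises responses."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n):
        if rng.random() < p_guess:
            errors.append(rng.uniform(-math.pi / 2, math.pi / 2))
        else:
            theta = rng.vonmisesvariate(0.0, kappa)  # angle in [0, 2*pi)
            if theta > math.pi:
                theta -= 2.0 * math.pi
            errors.append(theta / 2.0)  # back to the orientation scale
    return errors


p_hat, kappa_hat = fit_mixture(simulate_errors(1000, p_guess=0.3, kappa=4.0))
```

On simulated data the fit should recover values close to the generating parameters. A flat error histogram, as seen for the two excluded participants, corresponds to a guess rate near 1 or a kappa near 0, in which case the two components are hard to tell apart.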
Figure 7. WM data, experiment 2. Response data for high-reward items for one representative participant are
shown in panel A. Panel B shows how the mixture model decomposes this error distribution into the sum of a
uniform distribution of errors, representing trials on which participants retained no information about the
probed item and were guessing, and a von Mises distribution of errors representing trials on which participants
had retained information about the probed item. The model fit is captured in the two parameters pGuess
(probability of a guess response) and kappa (the precision of the non-guess error distribution). Panels C and D
plot the fitted pGuess and kappa parameters for nil value (N), low value (L) and high value (H) items on the
LHS of each panel, and the difference between high and low reward items on the RHS. The data are collapsed
over experimental session for plotting, as there was no interaction between the effect of reward and session. The
guess rate for high reward items was lower than for low reward items (panel C), but memory precision did not
vary between high and low reward items (panel D). Panels E and F plot the pGuess and kappa parameters for
the nil-reward item immediately following a high (H+1) or low reward item (L+1) on the LHS of each panel, and
the difference on the RHS. There are no significant differences in pGuess or kappa between these two cases.
Panels G and H plot the pGuess and kappa parameters for the nil-reward item immediately preceding a high
(H−1) or low (L−1) reward item on the LHS of the panel, and the difference on the RHS. There are no significant
differences in pGuess or kappa between these two cases. Error bars are ±1SEM.
We analyzed precision and guess rate parameters for trials in which a
previously-rewarded item was probed. A repeated-measures ANOVA with
factors of Reward (two levels) and Session (three levels) indicated a main effect
of reward upon guess rate (F(1,19) = 7.95, p = .011). There was no main effect
of Session (F(1.65,38) = 0.758, p = .454), or Session × Reward interaction
(F(2,38) = 0.382, p = .685). A separate ANOVA was run for precision, and there
was no evidence for a main effect of either factor, or an interaction between the
factors. These data are plotted in Figure 7, panels C and D. As there was no main
effect of session, or interaction between session and reward condition, the data
are averaged over sessions for plotting. The guess rate was lower when a high
reward item was probed than when a low reward item was probed, but the
precision of recall was unaffected. These data suggest that high-reward items
were more likely to be represented in memory than low reward items, but that
they were not encoded with greater precision.
The aim of the rapid sequential task was to establish whether the effects in
experiments 1 and 2 were driven, as hypothesized, by a reward-mediated temporal
encoding boost associated with high-reward items, or whether the effects were
less temporally specific—for example, driven by greater task engagement or
effort on high-reward trials. We tested this by comparing performance for nil-reward items immediately following high or low reward items. If the effect of
reward were temporally extended, then we might expect this subsequent item to
“inherit” the performance boost associated with a preceding high-reward item. If,
on the other hand, the boost in performance was short-lived (≤ 300 ms), then
performance for the subsequent item should not differ according to the reward
value of the preceding item. In order to perform this analysis, we first analyzed
only trials in which the second or third sequence location was probed, and the
immediately preceding item (in the first or second sequence location, respectively) was rewarded. There were two groups of trials (per session): trials in
which the previous item was of high reward value, and trials in which it was of
low reward value. A repeated-measures ANOVA (factors Session, 3 levels;
Reward, 2 levels) did not find evidence for a main effect of Session or Reward,
or an interaction between these factors, for either guess rate or precision. These
data are plotted in Figure 7, panels E and F. Analogously, we compared
performance for nil-value items which were immediately followed by either a
high or low value item. These data are plotted in Figure 7, panels G and H.
Similarly there was no evidence of an effect of Session or Reward, or an
interaction between these factors.
This pattern of data suggests that when items are separated in time, reward
value affects encoding of the rewarded item but does not generalize to other
items in the trial as it did in experiment 1, when items were presented
simultaneously. However, there was a numerical trend for the guess rate to be
slightly lower when an item appeared immediately before or after a high
reward item than a low reward item (Figure 7), though these effects were not
statistically significant. We directly contrasted the guess rate effects for the
rewarded item itself and for the subsequent or prior item, i.e., a paired-sample t-test of the “difference of differences” [pguess(H-L) – pguess(H+1-L+1)] and
[pguess(H-L) – pguess(H−1-L−1)]. Neither test reached significance (p = .22 and
p = .27, respectively).
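The “difference of differences” contrast can be sketched as a paired t statistic computed on one value per participant. This is an illustrative outline only: the function name and argument layout are hypothetical, and converting t to a p-value would additionally need the t distribution from a statistics library.

```python
import math
import statistics


def difference_of_differences_t(high, low, high_adj, low_adj):
    """Paired t statistic for (high - low) versus (high_adj - low_adj).

    Each argument is a list of per-participant guess rates in the same
    participant order: the rewarded item itself (high, low) and the adjacent
    nil-value item (high_adj, low_adj). Returns (t, degrees_of_freedom).
    Raises ZeroDivisionError if every participant has the same difference.
    """
    d = [(h - l) - (ha - la)
         for h, l, ha, la in zip(high, low, high_adj, low_adj)]
    n = len(d)
    se = statistics.stdev(d) / math.sqrt(n)  # standard error of the mean difference
    return statistics.mean(d) / se, n - 1
```

A t value near zero means that the reward effect on the adjacent item is statistically indistinguishable from the effect on the rewarded item itself.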
It is conceivable that with more statistical power we would have seen a
significant effect of reward value on the previous/subsequent items in the
stimulus sequence. This would not be inconsistent with our hypothesis that
rewarded items briefly boost WM encoding: future studies systematically
manipulating the ISI between WM items might be able to map out the
timecourse of this effect.
GENERAL DISCUSSION
In this study, we adapted a two-part paradigm previously used to study the
effects of reward associations on attentional processing (Anderson et al., 2011;
Raymond & O’Brien, 2009). By first training participants to
associate certain stimuli with reward, and then using these stimuli in a
subsequent WM task in which reward was task-irrelevant, we were able to
assess the effect of stimulus value on WM encoding independently of strategic
reward-driven biases that might have been associated with on-line reward
feedback during the WM task.
In experiment 1, we found that there was surprisingly little evidence for an
item-wise bias to encode high reward items in preference to low reward items.
However, participants’ sensitivity to the information in the memory array was
increased with the net reward value of the memory array on a given trial
(regardless of whether a high or low value item was probed, in mixed-value
arrays). This is consistent with the hypothesis that reward associations facilitate
WM encoding, but in a temporally-specific, not item-specific manner. The
reward value of the probe stimulus also induced a response bias, with
participants responding more liberally (i.e., making more false alarms) when
the reward value of the probe stimulus was high.
In order to investigate the temporal specificity of this effect, we ran a second
experiment in which memoranda were presented sequentially. We adopted a
precision/capacity orientation memory design in this case (Zhang & Luck, 2008),
reducing the scope for response biases and allowing us to differentiate between
the precision with which items are represented in memory and the probability
that they can be recalled. We found that guess rate was lower for reward
associated items, suggesting that they were more likely to be encoded, but that
precision was unaffected, suggesting that the encoding effect is discrete and does
not additionally bias maintenance resources towards high-reward items. When
we examined the effect of reward value when the subsequent or prior item was
probed, we found no effect on either guess rate or precision. This suggests that
the encoding benefit is short-lived—on the order of ~300 ms or less—and affects
the rewarded item most strongly. However, there was a numerical trend for items
adjacent to a high reward item to be slightly better encoded. Future studies in
which the interval between items in the WM test phase is systematically
manipulated may be able to better characterize the temporal extent of the
proposed encoding boost.
Our results are consistent with an account in which phasic dopamine signals
driven by reward associated stimuli open a gating window for WM encoding
(Braver & Cohen, 2000), though our behavioural data
cannot directly speak to the proposed neural substrates of this effect. A second
account (Hazy et al., 2007) predicts that reinforcement history should modify
cortico-striatal circuitry to render reward associated items more likely to be
encoded. Regardless of the role of striatum, a further possibility is that reward-associated items capture attention (Anderson et al., 2011) and that this attentional
capture biases working memory encoding (Schmidt et al., 2002). However, we
found no evidence for an item-wise bias at encoding when the memory array
contained both high and low value items.
The “temporal window” and “striatal gating” or attentional capture accounts
are not mutually exclusive, and our data do not rule out an item-wise bias, which
might be revealed with a larger sample size. Given attentional capture has been
shown to bias memory encoding (Schmidt et al., 2002) and item-wise reward
associations drive attentional capture (Anderson et al., 2011; Theeuwes &
Belopolsky, 2012), it would be surprising if there was not
at least a subtle item-wise bias. However, our data suggest that WM encoding
may also be influenced by temporal boosts to encoding, as well as item-wise
biases.
Our WM task was conducted under typical laboratory conditions: items were
presented simultaneously and briefly, and eye movements were not permitted.
Whilst a temporal gating mechanism for WM does not lead to an item-wise bias
in encoding under such conditions, we speculate that in more natural settings
such a mechanism might in fact drive item-wise encoding biases, if the encoding
boost was time-locked to sequential sampling of the environment. In the visual
modality, sequential sampling is revealed by the dynamics of eye movements.
People saccade 3–4 times per second when going about a typical task, like
making tea (Land, Mennie, & Rusted, 1999). A gating window lasting ~300 ms
or less could be time-locked to changes in fixation to encode only reward-associated items into WM.
Previous evidence from the attentional-blink (AB) task suggests that reward
associations can interact with temporal attention. In the standard AB task, two
targets (T1 and T2) are embedded one after the other in a stream of distractor
items. If the interval between T1 and T2 is short, then the second target is often
not available for report (Shapiro, Raymond, & Arnell, 1997). Raymond and
O’Brien (2009) used a two-part paradigm in which different faces were first
associated with either positive or negative expected value. These face stimuli
were subsequently embedded in an attentional blink task, as T2. The first target
(T1) was a texture patch on which participants had to make a simple perceptual
discrimination. Participants were asked at the end of the trial to judge whether
the T2 face they had seen was familiar or novel. As expected, when the T1–T2
interval was short, performance (as indexed by d′) for this discrimination was
lower, indicating that T2 had not been perceived. However, when the T2 face
was associated with high positive value, the attentional blink effect was
abolished. This is consistent with a temporal “boost-bounce” model of the
attentional blink (Olivers & Meeter, 2008), if reward associated stimuli were able
to “re-boost” temporal attention. Whilst these results were framed in terms of
temporal attention, we note that the “temporal boost” hypothesis for WM gating
proposed here is similar. Intriguingly, Slagter et al. (2012) reported that D2-receptor binding in the striatum co-varied with attentional blink magnitude
across participants, suggesting a link between dopaminergic function and gating
in the AB task.
We wish to emphasize that whilst a dopaminergic boosting effect is an
intriguing explanation of the results we observed, there are other potential
explanations in terms of temporal attention which do not necessarily invoke a
subcortical mechanism. Attention can be boosted at specific points in time
(Nobre et al., 2007; Rohenkohl & Nobre, 2011) and the modulation of
attentional resources over time may be influenced by modulation of slow
oscillations in the cortex, for example (Lakatos, Karmos, Mehta, Ulbert, &
Schroeder, 2008; Rohenkohl & Nobre, 2011).
In summary, we provide preliminary behavioural evidence for a temporally-specific but not item-specific boost in WM encoding when reward-associated
stimuli are encountered. Our data cannot address the neural basis of this effect,
but it is consistent with the proposal that phasic dopamine release gates WM
encoding (Braver & Cohen, 2000), for example due to direct dopaminergic
innervation of PFC or via the striatum. We speculate that in more natural
conditions, such a gating function might be time-locked to changes in attentional
fixation, mediating item-wise encoding biases.
ORCID
George Wallis http://orcid.org/0000-0003-4990-5460
REFERENCES
Alexander, G. E., DeLong, M. R., & Strick, P. L. (1986). Parallel organization of functionally
segregated circuits linking basal ganglia and cortex. Annual Review of Neuroscience, 9(1), 357–381.
doi:10.1146/annurev.ne.09.030186.002041
Alvarez, G. A., & Cavanagh, P. (2004). The capacity of visual short-term memory is set both by
visual information load and by number of objects. Psychological Science, 15(2), 106–111.
doi:10.1111/j.0963-7214.2004.01502006.x
Anderson, B. A., Laurent, P. A., & Yantis, S. (2011). Value-driven attentional capture. Proceedings of
the National Academy of Sciences, 108, 10367–10371. doi:10.1073/pnas.1104047108
Braver, T. S., & Cohen, J. D. (2000). On the control of control: The role of dopamine in regulating
prefrontal function and working memory. Control of Cognitive Processes: Attention and
Performance XVIII, 713–737.
Curtis, C. E., & D’Esposito, M. (2003). Persistent activity in the prefrontal cortex during working
memory. Trends in Cognitive Sciences, 7, 415–423. doi:10.1016/S1364-6613(03)00197-9
D’Ardenne, K., Eshel, N., Luka, J., Lenartowicz, A., Nystrom, L. E., & Cohen, J. D. (2012). Role of
prefrontal cortex and the midbrain dopamine system in working memory updating. Proceedings of
the National Academy of Sciences, 109, 19900–19909.
Frank, M. J., Loughry, B., & O’Reilly, R. C. (2001). Interactions between frontal cortex and basal
ganglia in working memory: A computational model. Cognitive, Affective, & Behavioral
Neuroscience, 1, 137–160.
Fuster, J. M. (1990). Prefrontal cortex and the bridging of temporal gaps in the perception‐action
cycle. Annals of the New York Academy of Sciences, 608, 318–336. doi:10.1111/j.1749-6632.1990.tb48901.x
Fuster, J. M., & Alexander, G. E. (1971). Neuron activity related to short-term memory. Science, New
Series, 173, 652–654.
Hazy, T. E., Frank, M. J., & O’Reilly, R. C. (2007). Towards an executive without a homunculus:
Computational models of the prefrontal cortex/basal ganglia system. Philosophical Transactions of
the Royal Society of London: B, Biological Sciences, 362, 1601–1613. doi:10.1098/rstb.2007.2055
Jacobsen, C. F., & Nissen, H. W. (1937). Studies of cerebral function in primates. IV. The effects of
frontal lobe lesions on the delayed alternation habit in monkeys. Journal of Comparative
Psychology, 23(1), 101–112. doi:10.1037/h0056632
Lakatos, P., Karmos, G., Mehta, A. D., Ulbert, I., & Schroeder, C. E. (2008). Entrainment of neuronal
oscillations as a mechanism of attentional selection. Science, 320(5872), 110–113. doi:10.1126/science.1154735
Land, M., Mennie, N., & Rusted, J. (1999). The roles of vision and eye movements in the control of
activities of daily living. Perception, 28(11), 1311–1328. doi:10.1068/p2935
Luck, S. J., & Vogel, E. K. (1997). The capacity of visual working memory for features and
conjunctions. Nature, 390, 279–281. doi:10.1038/36846
McNab, F., & Klingberg, T. (2007). Prefrontal cortex and basal ganglia control access to working
memory. Nature Neuroscience, 11(1), 103–107. doi:10.1038/nn2024
Middleton, F. A., & Strick, P. L. (2000). Basal ganglia output and cognition: Evidence from
anatomical, behavioral, and clinical studies. Brain and Cognition, 42, 183–200. doi:10.1006/brcg.1999.1099
Mink, J. W. (1996). The basal ganglia: Focused selection and inhibition of competing motor
programs. Progress in Neurobiology, 50, 381–425. doi:10.1016/S0301-0082(96)00042-1
Murray, A. M., Nobre, A. C., & Stokes, M. G. (2011). Markers of preparatory attention predict
visual short-term memory performance. Neuropsychologia, 49, 1458–1465. doi:10.1016/j.neuropsychologia.2011.02.016
Nobre, A., Correa, A., & Coull, J. (2007). The hazards of time. Current Opinion in Neurobiology, 17,
465–470. doi:10.1016/j.conb.2007.07.006
Olivers, C. N. L., & Meeter, M. (2008). A boost and bounce theory of temporal attention.
Psychological Review, 115, 836–863. doi:10.1037/a0013395
Raymond, J. E., & O’Brien, J. L. (2009). Selective visual attention and motivation: The consequences
of value learning in an attentional blink task. Psychological Science, 20, 981–988. doi:10.1111/j.1467-9280.2009.02391.x
Redgrave, P., Prescott, T. J., & Gurney, K. (1999). The basal ganglia: A vertebrate solution to the
selection problem? Neuroscience, 89, 1009–1023. doi:10.1016/S0306-4522(98)00319-4
Rohenkohl, G., & Nobre, A. C. (2011). Alpha oscillations related to anticipatory attention follow
temporal expectations. Journal of Neuroscience, 31, 14076–14084. doi:10.1523/JNEUROSCI.3387-11.2011
Schmidt, B. K., Vogel, E. K., Woodman, G. F., & Luck, S. J. (2002). Voluntary and automatic
attentional control of visual working memory. Perception & Psychophysics, 64, 754–763.
doi:10.3758/BF03194742
Schultz, W. (1998). Predictive reward signal of dopamine neurons. Journal of Neurophysiology, 80(1), 1–27.
Schultz, W. (2013). Updating dopamine reward signals. Current Opinion in Neurobiology, 23, 229–
238. doi:10.1016/j.conb.2012.11.012
Shapiro, K. L., Raymond, J. E., & Arnell, K. M. (1997). The attentional blink. Trends in Cognitive
Sciences, 1, 291–296. doi:10.1016/S1364-6613(97)01094-2
Slagter, H. A., Tomer, R., Christian, B. T., Fox, A. S., Colzato, L. S., King, C. R., … Davidson, R. J.
(2012). PET evidence for a role for striatal dopamine in the attentional blink: Functional
implications. Journal of Cognitive Neuroscience, 24, 1932–1940. doi:10.1109/42.906424
Sperling, G. (1960). The information available in brief visual presentations. Psychological
Monographs: General and Applied, 74(11), 1–29. doi:10.1037/h0093759
Theeuwes, J., & Belopolsky, A. V. (2012). Reward grabs the eye: Oculomotor capture by rewarding
stimuli. Vision Research, 74, 80–85. doi:10.1016/j.visres.2012.07.024
Zhang, W., & Luck, S. J. (2008). Discrete fixed-resolution representations in visual working memory.
Nature, 453, 233–235. doi:10.1038/nature06860