The Oxford Handbook of Cognitive Neuroscience, Vol. 1
Print Publication Date: Dec 2013 Subject: Psychology Online Publication Date: Dec 2013
Editor-in-Chief
Peter E. Nathan

Area Editors:

Clinical Psychology: David H. Barlow
Cognitive Neuroscience
Cognitive Psychology: Daniel Reisberg
Counseling Psychology
Developmental Psychology
Health Psychology: Howard S. Friedman
History of Psychology: David B. Baker
Todd D. Little
Neuropsychology: Kenneth M. Adams
Organizational Psychology: Steve W. J. Kozlowski
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn
Oxford New York
Auckland Cape Town Dar es Salaam Hong Kong Karachi
Kuala Lumpur Madrid Melbourne Mexico City Nairobi
New Delhi Shanghai Taipei Toronto
With offices in
Argentina Austria Brazil Chile Czech Republic France Greece
Guatemala Hungary Italy Japan Poland Portugal Singapore
South Korea Switzerland Thailand Turkey Ukraine Vietnam
9 8 7 6 5 4 3 2 1
Printed in the United States of America
on acid-free paper
Oxford Library of Psychology
The Library surveys psychology’s principal subfields with a set of handbooks that capture the current status and future prospects of those major subdisciplines. This initial set includes handbooks of social and personality psychology, clinical psychology, counseling psychology, school psychology, educational psychology, industrial and organizational psychology, cognitive psychology, cognitive neuroscience, methods and measurements, history, neuropsychology, personality assessment, developmental psychology, and more. Each handbook undertakes to review one of psychology’s major subdisciplines with breadth, comprehensiveness, and exemplary scholarship. In addition to these broadly conceived volumes, the Library also includes a large number of handbooks designed to explore in depth more specialized areas of scholarship and research, such as stress, health and coping, anxiety and related disorders, cognitive development, and child and adolescent assessment. In contrast to the broad coverage of the subfield handbooks, each of these latter volumes focuses on an especially productive, more highly focused line of scholarship and research. Whether at the broadest or most specific level, however, all of the Library handbooks offer synthetic coverage that reviews and evaluates the relevant past and present research and anticipates research in the future. Each handbook in the Library includes introductory and concluding chapters written by its editor or editors to provide a roadmap to the handbook’s table of contents and to offer informed anticipations of significant future developments in that field.
An undertaking of this scope calls for handbook editors and chapter authors who are established scholars in the areas about which they write. Many of the nation’s and world’s most productive and respected psychologists have agreed to edit Library handbooks or write authoritative chapters in their areas of expertise.
For whom has the Oxford Library of Psychology been written? Because of its breadth, depth, and accessibility, the Library serves a diverse audience, including graduate students in psychology and their faculty mentors, scholars, researchers, and practitioners in psychology and related fields. All will find in the Library the information they seek on the subfield or focal area of psychology in which they work or are interested.
In summary, the Oxford Library of Psychology will grow organically to provide a thoroughly informed perspective on the field of psychology, one that reflects both psychology’s dynamism and its increasing interdisciplinarity. Once published electronically, the Library is also destined to become a uniquely valuable interactive tool, with extended search and browsing capabilities. As you begin to consult this handbook, we sincerely hope you will share our enthusiasm for the more than 500-year tradition of Oxford University Press for excellence, innovation, and quality, as exemplified by the Oxford Library of Psychology.
Peter E. Nathan
Editor-in-Chief
About the Editors
Stephen M. Kosslyn
Stephen M. Kosslyn is the Founding Dean of the university at the Minerva Project, based in San Francisco. Before that, he served as Director of the Center for Advanced Study in the Behavioral Sciences and Professor of Psychology at Stanford University, and was previously chair of the Department of Psychology, Dean of Social Science, and the John Lindsley Professor of Psychology in Memory of William James at Harvard University. He received a B.A. from UCLA and a Ph.D. from Stanford University, both in psychology. His original graduate training was in cognitive science, which focused on the intersection of cognitive psychology and artificial intelligence; faced with limitations in those approaches, he eventually turned to study the brain. Kosslyn’s research has focused primarily on the nature of visual cognition, visual communication, and individual differences; he has authored or coauthored 14 books and over 300 papers on these topics. Kosslyn has received the American Psychological Association’s Boyd R. McCandless Young Scientist Award, the National Academy of Sciences Initiatives in Research Award, a Cattell Award, a Guggenheim Fellowship, the J-L. Signoret Prize (France), an honorary Doctorate from the University of Caen, an honorary Doctorate from the University of Paris Descartes, an honorary Doctorate from Bern University, and election to Academia Rodinensis pro Remediatione (Switzerland), the Society of Experimental Psychologists, and the American Academy of Arts and Sciences.
Contributors
Claude Alain
Baycrest Centre
Agnès Alsius
Department of Psychology
Queen’s University
George A. Alvarez
Department of Psychology
Harvard University
Cambridge, MA
Stephen R. Arnott
Baycrest Centre
Moshe Bar
Charlestown, MA
Bryan L. Benson
Department of Psychology
School of Kinesiology
University of Michigan
Ann Arbor, MI
Damien Biotti
Bron, France
Annabelle Blangero
Bron, France
Sheila E. Blumstein
Brown University
Providence, RI
Grégoire Borst
Laboratory for the Psychology of Child Development and Education (CNRS Unit 3521)
Paris, France
Department of Psychology
Harvard University
Cambridge, MA
Nathaniel B. Boyden
Department of Psychology
University of Michigan
Ann Arbor, MI
Andreja Bubic
Charlestown, MA
Bradley R. Buchsbaum
Baycrest Centre
Roberto Cabeza
Duke University
Durham, NC
Denise J. Cai
Department of Psychology
La Jolla, CA
Alfonso Caramazza
Department of Psychology
Harvard University
Cambridge, MA
University of Trento
Rovereto, Italy
Department of Psychology
University of Kansas
Lawrence, KS
Jared Danker
Department of Psychology
New York, NY
Sander Daselaar
Radboud University
Nijmegen, Netherlands
Duke University
Durham, NC
Lila Davachi
Department of Psychology
New York, NY
Mark D’Esposito
Department of Psychology
University of California
Berkeley, CA
Benjamin J. Dyson
Department of Psychology
Ryerson University
Jessica Fish
Cambridge, UK
Angela D. Friederici
Department of Neuropsychology
Leipzig, Germany
Melvyn A. Goodale
Kalanit Grill-Spector
Stanford University
Stanford, CA
Argye E. Hillis
Baltimore, MD
Ray Jackendoff
Tufts University
Medford, MA
Petr Janata
Department of Psychology
Davis, CA
Roni Kahana
Department of Neurobiology
Rehovot, Israel
Stephen M. Kosslyn
Minerva Project
San Francisco, CA
Youngbin Kwak
Neuroscience Program
University of Michigan
Ann Arbor, MI
Bruno Laeng
Department of Psychology
University of Oslo
Oslo, Norway
Ewen MacDonald
Department of Psychology
Queen’s University
Ontario, Canada
Lyngby, Denmark
Bradford Z. Mahon
University of Rochester
Rochester, NY
Claudia Männel
Department of Neuropsychology
Leipzig, Germany
University of Queensland
Josh H. McDermott
Cambridge, MA
Kevin Munhall
Department of Psychology
Queen’s University
Emily B. Myers
Department of Psychology
University of Connecticut
Storrs, CT
Jeffrey Nicol
Department of Psychology
Nipissing University
Kevin N. Ochsner
Department of Psychology
Columbia University
New York, NY
Laure Pisella
Bron, France
Gilles Rode
University Lyon
Yves Rossetti
University Lyon
Mouvement et Handicap
Plateforme IFNL-HCL
Lyon, France
M. Rosario Rueda
Universidad de Granada
Granada, Spain
Rachael D. Seidler
Department of Psychology
School of Kinesiology
Neuroscience Program
University of Michigan
Ann Arbor, MI
Noam Sobel
Department of Neurobiology
Rehovot, Israel
Sharon L. Thompson-Schill
Department of Psychology
University of Pennsylvania
Philadelphia, PA
Caroline Tilikete
University Lyon
Hôpital Neurologique
Lyon, France
Kyrana Tsapkini
Baltimore, MD
Alain Vighetto
University Lyon
Hôpital Neurologique
Lyon, France
Barbara A. Wilson
Department of Psychology
Institute of Psychiatry
King’s College
London, UK
John T. Wixted
Department of Psychology
La Jolla, CA
Eiling Yee
Josef Zihl
Neuropsychology Unit
Department of Psychology
University of Munich
Munich, Germany
Introduction to The Oxford Handbook of Cognitive Neuroscience: Cognitive
Neuroscience—Where Are We Now?
This two-volume set reviews the current state of the art in cognitive neuroscience. The introductory chapter outlines central elements of the cognitive neuroscience approach and provides a brief overview of the eight sections of the book’s two volumes. Volume 1 is divided into four sections comprising chapters that examine core processes, ways in which they develop across the lifespan, and ways they may break down in special populations. The first section deals with perception and addresses topics such as the abilities to represent and recognize objects and spatial relations and the use of top-down processes in visual perception. The second section focuses on attention and how it relates to action and visual motor control. The third section, on memory, covers topics such as working memory, semantic memory, and episodic memory. Finally, the fourth section, on language, includes chapters on abilities such as speech perception and production, semantics, the capacity for written language, and the distinction between linguistic competence and performance.

Keywords: cognitive neuroscience, perception, attention, language, memory, spatial relations, visual perception, visual motor control, semantics, linguistic competence
On a night in the late 1970s, something important happened in a New York City taxicab:
A new scientific field was named. En route to a dinner at the famed Algonquin Hotel, the
neuroscientist Michael Gazzaniga and the cognitive psychologist George Miller coined
the term “cognitive neuroscience.” This field would go on to change the way we think
about the relationship between behavior, mind, and brain.
This is not to say that the field was born on that day. Indeed, as Hermann Ebbinghaus (1910) noted, “Psychology has a long past, but a short history,” and cognitive neuroscience clearly has a rich and complex set of ancestors. Although it is difficult to say exactly when a new scientific discipline came into being, the groundwork for the field had begun to be laid decades before the term was coined. As has been chronicled in detail elsewhere (Gardner, 1985; Posner & DiGirolamo, 2000), as behaviorism gave way to the cognitive revolution, and as computational and neuroscientific approaches to understanding the mind became increasingly popular, researchers in numerous allied fields came to believe that understanding the relationships between behavior and the mind required understanding their relationship to the brain.
This two-volume set reviews the current state of the art in cognitive neuroscience, some 35 years after the field was named. In these intervening years, the field has grown tremendously—so much so, in fact, that cognitive neuroscience is now less a bounded discipline focused on specific topics and more an approach that permeates psychological and neuroscientific inquiry. As such, no collection of chapters could possibly encompass the entire breadth and depth of cognitive neuroscience. That said, this two-volume set attempts to survey systematically eight core areas of inquiry in cognitive neuroscience, four per volume, in a total of 55 chapters.
In this introductory chapter, we provide a sketch of some central elements of the cognitive neuroscience approach and a brief overview of the eight sections of the Handbook’s two volumes.
The first crucial influence on cognitive neuroscience was a set of insights presented in a book by the late British vision scientist David Marr. Published in 1982, the book Vision took an old idea—levels of analysis—and made a strong case that we can only understand visual perception if we integrate descriptions cast at three distinct, but fundamentally interrelated (Kosslyn & Maljkovic, 1990), levels. At the topmost computational level, one describes the problem at hand, such as how one can see edges, derive the three-dimensional structure of shapes, and so on; this level characterizes “what” the system does. At the middle algorithm level, one describes how a specific computational problem is solved by a system that includes specific processes that operate on specific representations; this level characterizes “how” the system operates. And at the lowest implementation level, one describes how the representations and processes that constitute the algorithm are instantiated in the brain. All three levels are crucial, and characteristics of the description at each level affect the way we must describe characteristics at the other levels.
This approach proved enormously influential in vision research, and researchers in other domains quickly realized that it could be applied more broadly. This multilevel approach is now the foundation for cognitive neuroscience inquiry more generally, although we often use different terminology to refer to these levels of analysis. For instance, many researchers now talk about the levels of behavior and experience, psychological processes (or information-processing mechanisms), and neural systems (Mitchell, 2006; Ochsner, 2007; Ochsner & Lieberman, 2001). But the core idea is still the same as that articulated by Marr: A complete understanding of the ways in which vision, memory, emotion, or any other cognitive or emotional faculty operates necessarily involves connecting descriptions of phenomena across levels of analysis.
The resulting multilevel descriptions have many advantages over the one- or two-level accounts that are typical of traditional approaches in allied disciplines such as cognitive psychology. These advantages include the ability to use both behavioral and brain data in combination—rather than just one or the other taken alone—to draw inferences about psychological processes. In so doing, one constructs theories that are constrained by, must connect to, and must make sense in the context of more types of data than theories that are couched solely at the behavioral level or at the behavioral and psychological levels. We return to some of these advantages below.
If we are to study human abilities and capacities at multiple levels of analysis, we must necessarily use multiple types of methods to do so. In fact, many methods exist to measure phenomena at each of the levels of analysis, and new measures are continually being invented (Churchland & Sejnowski, 1988).
Today, this observation is taken as a given by many graduate students who study cognitive neuroscience. They take it for granted that we should use studies of patient populations, electrophysiological methods, functional imaging methods, transcranial magnetic stimulation (TMS, which uses magnetic fields to temporarily impair or enhance neural functioning in a specific brain area), and other new techniques as they are developed. But this view wasn’t always the norm. This fact is illustrated nicely by a debate that took place in the early 1990s about whether and how neuroscience data should inform psychological models of cognitive processes. On one side was the view from cognitive neuropsychology, which centered on the idea that studies of patient populations may be sufficient to understand the structure of cognitive processing (Caramazza, 1992). The claim was that by studying the ways in which behavior changes as a result of the unhappy accidents of nature (e.g., strokes, traumatic brain injuries) that caused lesions of language areas, memory areas, and so on, we can discover the processing modules that constitute the mind. The key assumption here is that researchers can identify direct relationships between behavioral deficits and specific areas of the brain that were damaged. On the other side of the debate was the view from cognitive neuroscience, which centered on the idea that the more methods used, the better (Kosslyn & Intriligator, 1992). Because every method has its limitations, the more methods researchers could bring to bear, the more likely they are to have a correct picture of how behavior is related to neural functioning. In the case of patient populations, for example, the deficits in behavior might not simply reflect the normal functions of the damaged regions; rather, they could reflect reorganization of function after brain damage, or diffuse damage to multiple regions that affects multiple separate functions. If so, then observing patterns of dissociations and associations of abilities following brain damage would not necessarily allow researchers to delineate the structure of cognitive processing. Other methods (such as neuroimaging) would be required to complement studies of brain-damaged patients.
The field quickly adopted the second perspective, drawing on multiple methods when constructing and testing theories of cognitive processing. Researchers realized that they could use multiple methods together in complementary ways: They could use functional imaging methods to describe the network of processes active in the healthy brain when engaged in a particular behavior; they could use lesion methods or TMS to assess the causal relationships between activity in specific brain areas and particular forms of information processing (which in turn give rise to particular types of behavior); they could use electrophysiological methods to study the temporal dynamics of cortical systems as they interactively relate to the behavior of interest. And so on. The cognitive neuroscience approach adopted the idea that no single technique provides all the answers.
That said, there is no denying that some techniques have proved more powerful and generative than others during the past 35 years. In particular, it is difficult to overstate the impact of functional imaging of the healthy intact human brain, first ushered in by positron emission tomography studies in the late 1980s (Petersen et al., 1988) and given a tremendous boost by the advent of, and subsequent boom of, functional magnetic resonance imaging in the early 1990s (Belliveau et al., 1992). The advent of functional imaging is in many ways the single most important contributor to the rise of cognitive neuroscience. Without the ability to study cortical and subcortical brain systems in action in healthy adults, it’s not clear whether cognitive neuroscience would have become the central paradigm that it is today.
We must, however, offer a cautionary note: Functional imaging is by no means the be-all and end-all of cognitive neuroscience techniques. Like any other method, it has its own strengths and weaknesses (which have been described in detail elsewhere, e.g., Poldrack, 2006, 2008, 2011; Van Horn & Poldrack, 2009; Yarkoni et al., 2010). Researchers trained in cognitive neuroscience understand many, if not all, of these limitations, but unfortunately, many outside the field do not. This can cause two problems. The first is that newcomers to the field may improperly use functional imaging in the service of overly simplistic “brain mapping” (e.g., seeking to identify “love spots” in the brain; Fisher et al., 2002) and may commit other inferential errors (Poldrack, 2006). The second, less appreciated problem is that when nonspecialists read about studies of such overly simplistic hypotheses, they may assume that all cognitive neuroscientists traffic in this kind of experimentation and theorizing. As the chapters in these volumes make clear, most cognitive neuroscientists appreciate the strengths and limits of the various techniques they use, and understand that functional imaging is simply one of a number of techniques that allow neuroscience data to constrain theories of psychological processes. In the next section, we turn to exactly this point.
Constraints and Convergence
One implication of using multiple methods to study phenomena at multiple levels of analysis is that we have numerous types of data. These data provide converging evidence for, and constrain the nature of, theories of human cognition, emotion, and behavior. That is, the data must fit together, painting different facets of the same picture (this is what we mean by convergence). And even though each type of data alone does not dictate a particular interpretation, each type helps to narrow the range of possible interpretations (this is what we mean by constraining the nature of theories). Researchers in cognitive neuroscience acknowledge that data always can be interpreted in various ways, but they also rely on the fact that data limit the range of viable interpretations—and the more types of data, the more strongly they will narrow down the range of possible theories. In this sense, constraints and convergence are the very core of the cognitive neuroscience approach (Ochsner & Kosslyn, 1999).
We note that the principled use of constraining and converging evidence does not privilege evidence couched at any one level of analysis. Brain data are not more important, more real, or more intrinsically valuable than behavioral data, and vice versa. Rather, both kinds of data constrain the range of possible theories of psychological processes, and as such, both are valuable.
In addition, both behavioral and brain data can spark changes in theories of psychological processes. This claim stands in contrast to claims made by those who have argued that brain data can never change, or in any way constrain, a psychological theory. According to this view, brain data are ambiguous without a psychological theory to interpret them (Kihlstrom, 2012). Such arguments fail to appreciate the fact that the goal of cognitive neuroscience is to construct theories couched at all three levels of analysis. Moreover, behavioral and brain data often are dependent variables collected in the same experiments. This is not arbitrary; we have ample evidence that behavior and brain function are intimately related: When the brain is damaged in a particular location, specific behaviors are disrupted—and when a person engages in specific behaviors, specific brain areas are activated. Dependent measures are always what science uses to constrain theorizing, and thus it follows that both behavioral and brain data must constrain our theories of the intervening psychological processes.
This point is so important that we want to illustrate it with two examples. The first begins with classic studies of the amnesic patient known for decades only by his initials, H.M. (Corkin, 2002). After he died, his brain was famously donated to science and dissected live on the Internet in 2009 (see http://thebrainobservatory.ucsd.edu/hm_live.php). We now know that his name was Henry. In the 1960s, Henry suffered from severe epilepsy that could not be treated with medication, which arose because of abnormal neural tissue in his temporal lobes. At the time, he suffered horribly from seizures, and the last remaining course of potential treatment was a neurosurgical operation that removed the tips of Henry’s temporal lobes (and with them, the neural origins of his epileptic seizures).
When Henry awoke after his operation, the epilepsy was gone, but so was his ability to
form new memories of events he experienced. Henry was stuck in the eternal present,
forevermore awakening each day with his sense of time frozen at the age at which he had
the operation. The time horizon for his experience was about two minutes, or the amount
of time information could be retained in short-term memory before it required transfer to
a longer-term episodic memory store.
To say that the behavioral sequelae of H.M.’s operation were surprising to the scientific
community at that time is an understatement. Many psychologists and neuroscientists
spent the better part of the next 20 to 30 years reconfiguring their theories of memory in
order to accommodate these and subsequent findings. It wasn’t until the early 1990s that
the long-reaching theoretical implications of Henry’s amnesia finally became clear
(Schacter & Tulving, 1994), when a combination of behavioral, functional imaging, and
patient lesion data converged to implicate a multiple-systems account of human memory.
This understanding of H.M.’s deficits was hard won, and emerged only after an extended “memory systems debate” in psychology and neuroscience (Schacter & Tulving, 1994). This debate was between, on the one hand, behavioral and psychological theorists who argued that we have a single memory system (which has multiple processes) and, on the other hand, neuroscience-inspired theorists who argued that we have multiple memory systems (each of which instantiates a particular kind of process or processes). The initial observation of H.M.’s amnesia, combined with decades of subsequent careful experimentation using multiple behavioral and neuroscience techniques, decisively came down on the side of the multiple memory systems theorists. Cognitive processing relies on multiple types of memory, and each uses a distinct set of representations and processes. This was a clear victory for the cognitive neuroscience approach over purely behavioral approaches.
The second example concerns the debate over the nature of visual mental imagery: whether mental images represent space in the world much as visual percepts do, or are instead purely abstract, language-like descriptions. This debate went back and forth for many years without resolution, and at one point a mathematical proof was offered that behavioral data alone could never resolve it (Anderson, 1978). The advent of neuroimaging helped bring this debate largely to a close (Kosslyn, Thompson, & Ganis, 2006). A key finding was that the first cortical areas that process visual input during perception are each topographically mapped, such that adjacent locations in the visual world are represented in adjacent locations in the visual cortex. That is, these areas use space on the cortex to represent space in the world. In the early 1990s, researchers showed that visualizing objects typically activates these areas, and increasing the size of a visual mental image activates portions of this cortex that register increasingly larger sizes in perception. Moreover, in the late 1990s researchers showed that temporarily impairing these areas using TMS hampers imagery and perception to the same degree. Hence, these brain-based findings provided clear evidence that visual mental images are, indeed, analogous to visual percepts in that both represent space in the world by using space in a representation.
We have written as if both debates—about memory systems and mental imagery representation—are now definitively closed. But this is a simplification; not everyone is convinced of one or another view. Our crucial point is that the advent of neuroscientific data has shifted the terms of the debate. When only behavioral data were available, in both cases the two alternative positions seemed equally plausible—but after the relevant neuroscientific data were introduced, the burden of proof shifted dramatically to one side—and a clear consensus emerged in the field (e.g., see Reisberg, Pearson, & Kosslyn, 2003). In the years since these debates, evidence from cognitive neuroscience has constrained theories of a wide range of phenomena. Many such examples are chronicled in this Handbook.
Volume 1
The first volume surveys classic areas of interest in cognitive neuroscience: perception, attention, memory, and language. Twenty years ago, when Kevin Ochsner was a graduate student and Stephen Kosslyn was one of his professors, research on these topics formed the backbone of cognitive neuroscience research. And this is still true today, for two reasons.

First, when cognitive neuroscience took off, these were the areas of research within psychology that had the most highly developed behavioral, psychological, and neuropsychological (i.e., based on studies of brain-damaged patients) models in place. And in the case of research on perception, attention, and memory, these were topics for which fairly detailed models of the underlying neural circuitry already had been developed on the basis of rodent and nonhuman primate studies. As such, these areas were poised to benefit from the use of brain-based techniques in humans.
Second, work in these areas produced early successes that were important both in terms of the findings themselves and in terms of the evidence such findings provided that the cognitive neuroscience approach could be successful.
With this in mind, each of the four sections in Volume 1 includes a selection of chapters
that cover core processes and the ways in which they develop across the lifespan and may
break down in special populations.
The first section, on perception, includes chapters on the abilities to represent and recognize objects and spatial relations. In addition, this section contains chapters on the use of top-down processes in visual perception and on the ways in which such processes enable us to construct and use mental images. We also include chapters on perceptual abilities that have seen tremendous research growth in the past 5 to 10 years, such as olfaction, audition, and music perception. Finally, there is a chapter on disorders of perception.
The second section, on attention, includes chapters on the abilities to attend to auditory and spatial information as well as on the relationships between attention, action, and visual motor control. These are followed by chapters on the development of attention and its breakdown in various disorders.
The third section, on memory, includes chapters on the abilities to maintain information
in working memory as well as semantic memory, episodic memory, and the consolidation
process that governs the transfer of information from working to semantic and episodic
memory. There is also a chapter on the ability to acquire skills, which depends on differ
ent systems than those used in other forms of memory, as well as chapters on changes in
memory function with older age and the ways in which memorial processes break down in
various disorders.
Finally, the fourth section, on language, includes chapters on abilities such as speech per
ception and production, the distinction between linguistic competence and performance, semantics, the capacity for written language, and multimodal and developmental
aspects of speech perception.
Volume 2
The first section, on emotion, begins with processes involved in interactions between
emotion, perception, and attention, as well as the generation and regulation of emotion.
This is followed by chapters that provide models for understanding broadly how emotion
affects cognition as well as the contribution that bodily sensation and control make to affective and other processes. This section concludes with chapters on genetic and developmental approaches to emotion.
The second section, on self and social cognition, begins with a chapter on the processes
that give rise to the fundamental ability to know and understand oneself. This is followed
by chapters on increasingly complex abilities involved in perceiving others, starting with
the perception of nonverbal cues and perception–action links, and from there ranging to
face recognition, impression formation, drawing inferences about others’ mental states,
empathy, and social interaction. This section concludes with a chapter on the develop
ment of social cognitive abilities.
The third section, on higher cognitive functions, surveys abilities that largely depend on
processes in the frontal lobes of the brain, which interact with the kinds of core perceptu
al, attentional, and memorial processes described in Volume 1. Here, we include chapters
on conflict monitoring and cognitive control, the hierarchical control of action, thinking,
decision making, categorization, expectancies, numerical cognition, and neuromodulatory
influences on higher cognitive abilities.
Finally, in the fourth section, four chapters illustrate how disruptions of the mechanisms
of cognition and emotion produce abnormal functioning in clinical populations. This sec
tion begins with a chapter on attention deficit-hyperactivity disorder and from there
moves to chapters on anxiety, post-traumatic stress disorder, and obsessive-compulsive
disorder.
Summary
Before moving from the appetizer to the main course, we offer two last thoughts.
First, we edited this Handbook with the goal of providing a broad-reaching compendium
of research on cognitive neuroscience that will be accessible to a broad audience.
Toward this end, the chapters included in this Handbook are available online to be down
loaded individually. This is the first time that chapters of a Handbook of this sort have
been made available in this way, and we hope this facilitates access to and dissemination
of some of cognitive neuroscience’s greatest hits.
Second, we hope that, whether you are a student, an advanced researcher, or an interest
ed layperson, this Handbook whets your appetite for learning more about this exciting
and growing field. Although reading survey chapters of the sort provided here is an excel
lent way to become oriented in the field and to start building your knowledge of the top
ics that interest you most, we encourage you to take your interests to the next level:
Delve into the primary research articles cited in these chapters—and perhaps even get in
volved in doing this sort of research!
Kevin N. Ochsner
Stephen M. Kosslyn, Center for Advanced Study in the Behavioral Sciences, Stanford University, Stanford, CA
Representation of Objects
Kalanit Grill-Spector
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn
Functional magnetic resonance imaging (fMRI) has enabled neuroscientists and psycholo
gists to understand the neural bases of object recognition in humans. This chapter re
views fMRI research that yielded important insights about the nature of object represen
tations in the human brain. Combining fMRI with psychophysics may offer clues about
what kind of visual processing is implemented in distinct cortical regions. This chapter
explores how fMRI has influenced current understanding of object representations by fo
cusing on two aspects of object representation: how the underlying representations pro
vide for invariant object recognition and how category information is represented in the
ventral stream. It first provides a brief introduction of the functional organization of the
human ventral stream and a definition of object-selective cortex before describing cue-in
variant responses in the lateral occipital complex (LOC), neural bases of invariant object
recognition, object and position information in the LOC, and viewpoint sensitivity across
the LOC. The chapter concludes by commenting on debates about the nature of functional
organization in the human ventral stream.
Keywords: functional magnetic resonance imaging, object recognition, psychophysics, brain, object
representation, category information, ventral stream, object-selective cortex, lateral occipital complex, functional
organization
Introduction
Humans can effortlessly recognize objects in a fraction of a second despite large variabili
ty in the appearance of objects (Thorpe et al., 1996). What are the underlying representa
tions and computations that enable this remarkable human ability? One way to answer
these questions is to investigate the neural mechanisms of object recognition in the hu
man brain. With the advent of functional magnetic resonance imaging (fMRI) about 20
years ago, neuroscientists and psychologists began to examine the neural bases of object
recognition in humans. fMRI is an attractive method because it is a noninvasive tech
nique that allows multiple measurements of brain activation in the same awake behaving
human. Among noninvasive techniques, it provides the best spatial resolution currently available.
Before the advent of fMRI, knowledge about the function of the ventral stream was based
on single-unit electrophysiology measurements in monkeys and on lesion studies. These
studies showed that neurons in the monkey inferotemporal (IT) cortex respond to shapes
(Fujita et al., 1992) and complex objects such as faces (Desimone et al., 1984), and that
lesions to the ventral stream can produce specific deficits in object recognition such as
agnosia (inability to recognize objects) and prosopagnosia (inability to recognize faces;
see Farah, 1995). However, interpreting lesion data is complicated because lesions are
typically diffuse (usually more than one region is damaged), may disrupt both a cortical
region and its connectivity, and are not replicable across patients. Therefore, the
primary knowledge gained from fMRI research was which cortical sites in the normal hu
man brain are involved in object recognition. The first set of fMRI studies of object and
face recognition in humans identified the regions in the human brain that respond selectively to objects and faces (Malach et al., 1995; Kanwisher et al., 1997; Grill-Spector et al.,
1998b). Next, a series of studies demonstrated that activation in object- and face-selec
tive regions correlates with success at recognizing object and faces, respectively, provid
ing striking evidence for the involvement of these regions in recognition (Bar et al., 2001;
Grill-Spector et al., 2000, 2004). After researchers determined which regions in cortex
are involved in object recognition, their focus shifted to examining the nature of repre
sentations and computations that are implemented in these regions to understand how
they enable efficient object recognition in humans.
In this chapter, I review fMRI research that provided important knowledge about the na
ture of object representations in the human brain. For example, one of the fundamental
problems in recognition is how to recognize an object across variations in its appearance
(invariant object recognition). Understanding how a biological system has solved this
problem may give hints for how to build a robust artificial recognition system. Further,
fMRI is better suited to measuring object representations than to tracking the temporal sequence of computations en route to object recognition, because the time scale of fMRI measurements is longer than the time scale of the recognition process (the temporal resolution of fMRI is on the order of seconds, whereas object recognition takes about 100 to 250 ms).
Nevertheless, combining psychophysics with fMRI may give us some clues to the kinds of
visual processing implemented in distinct cortical regions. For example, finding regions
whose activation is correlated with success at some tasks, but not others, may suggest
the involvement of particular cortical regions in one computation, but not another.
In discussing how fMRI has affected our current understanding of object representations,
I focus on results pertaining to two aspects of object representation:
(1) how the underlying representations provide for invariant object recognition, and (2) how category information is represented in the ventral stream.

I have chosen these topics for three reasons: (1) they are central topics in the field of object
recognition for which fMRI has substantially advanced our understanding, (2) some find
ings related to these topics stirred considerable debate (see the later section, Debates
about the Nature of Functional Organization in the Human Ventral Stream), and (3) some
of the fMRI findings in humans are surprising given prior knowledge from single-unit
electrophysiology in monkeys. In terms of the chapter’s organization, I begin with a brief
introduction of the functional organization of the human ventral stream and a definition
of object-selective cortex, and then describe research that elucidated the properties of
these regions with respect to basic coding principles. I continue with findings related to
invariant object recognition, and then end with research and theories regarding category
representation and specialization in the human ventral stream.
The LOC responds robustly to many kinds of objects and object categories (including nov
el objects) and is thought to be in the intermediate or high-level stages of the visual hier
archy. Importantly, LOC activations are correlated with subjects’ object recognition per
formance. High LOC responses correlate with successful object recognition (hits), and
low LOC responses correlate with trials in which objects are present, but are not recog
nized (misses) (see Figure 2.1b). There are also object-selective regions in the dorsal
stream (Grill-Spector, 2003; Grill-Spector & Malach, 2004), but these regions do not cor
relate with object recognition performance (Fang & He, 2005) and may be involved in
computations related to visually guided actions toward objects (Culham et al., 2003).
However, a comprehensive discussion of the dorsal stream’s role in object perception is
beyond the scope of this chapter.
In addition to the LOC, researchers found several ventral regions that show preferential
responses to specific object categories. Searching for regions with categorical preference
was motivated by reports that suggested that lesions to the ventral stream can produce
very specific deficits, such as the inability to recognize faces or the inability to read
words, whereas other visual (and recognition) faculties are preserved. By contrasting ac
tivations to different kinds of objects, researchers found ventral regions that show higher
responses to specific object categories, such as lateral fusiform regions that respond
more to animals than tools and medial fusiform regions that respond to tools more than
animals (Chao et al., 1999; Martin et al., 1996); a region in the left OTS that responds
more strongly to letters than textures (the visual word form area [VWFA]; Cohen et al.,
2000); several foci that respond more strongly to faces than other objects (Grill-Spector
et al., 2004; Haxby et al., 2000; Hoffman & Haxby, 2000; Kanwisher et al., 1997; Weiner &
Grill-Spector, 2012), including the fusiform face areas (FFA-1, FFA-2; Kanwisher et al.,
1997; Weiner & Grill-Spector, 2010); regions that respond more strongly to houses and
places than faces and objects, including a region in the parahippocampal gyrus, the
parahippocampal place area (PPA; Epstein & Kanwisher, 1998); and regions that respond
more strongly to body parts than faces and objects, including a region near the MT called
the extrastriate body area (EBA; Downing et al., 2001); and a region in the fusiform gyrus,
the fusiform body area (FBA; Schwarzlose et al., 2005, or OTS-limbs, Weiner and Grill-
Spector, 2011). Nevertheless, many of these object-selective and category-selective re
gions respond to more than one object category and also respond strongly to object frag
ments (Grill-Spector et al., 1998b; Lerner et al., 2001, 2008). This suggests that caution is
needed when interpreting the nature of the selective responses. It is possible that the un
derlying representation is perhaps of object parts, features, and/or fragments and not of
whole objects or object categories.
Findings of category-selective regions in the human brain initiated a fierce debate about
the principles of functional organization in the ventral stream. Should one consider only the maximal responses to the preferred category, or do the nonmaximal responses
also carry information? How abstract is the information represented in these regions? For
example, is category information represented in these regions, or are low-level visual fea
tures that are associated with categories represented? I address these questions in detail
in the later section, Debates about the Nature of Functional Organization in the Human
Ventral Stream.
Converging evidence from several studies revealed an important aspect of coding in the
LOC: it responds to object shape, not low-level visual features. These studies showed that
all LOC subregions (LO and pFus/OTS) respond more strongly when subjects view objects
independently of the type of visual information that defines the object form (Gilaie-Dotan
et al., 2002; Grill-Spector et al., 1998a; Kastner et al., 2000; Kourtzi & Kanwisher, 2000,
2001; Vinberg & Grill-Spector, 2008) (Figure 2.2). The LOC responds more strongly to (1)
objects defined by luminance compared with luminance textures, (2) objects generated
from random dot stereograms compared with structureless random dot stereograms, (3)
objects generated from structure from motion relative to random (structureless) motion,
and (4) objects generated from textures compared with texture patterns. LOC response to
objects is also similar across object format (gray-scale, line drawings, silhouettes), and it
responds to objects delineated by both real and illusory contours (Mendola et al., 1999;
Stanley & Rubin, 2003). Kourtzi and Kanwisher (2001) also showed that when objects
have the same shape but different contours, there is fMRI adaptation (fMRI-A, indicating
a common neural substrate), but there is no fMRI-A when the shared contours are identical but the perceived shape is different, suggesting that the LOC responds to global
shape rather than local contours (see also Kourtzi et al., 2003; Lerner et al., 2002). Over
all, these studies provided fundamental knowledge showing that LOC activation is driven
by shape rather than low-level visual information.
More recently, we examined whether LOC response to objects is driven by their global
shape or their surfaces and whether LOC subregions are sensitive to border ownership.
One open question in object recognition is whether the region in the image that belongs
to the object is first segmented from the rest of the image (figure–ground segmentation)
and then recognized, or whether knowing the shape of an object aids its segmentation
(Nakayama et al., 1995; Peterson & Gibson, 1994a, 1994b). To address these questions,
we scanned subjects when they viewed stimuli that were matched for their low-level in
formation but generated different percepts. Conditions included: (1) a flat object in
front of a flat background object, (2) a flat surface with a shaped hole (same shape as the
object) in front of a flat background, (3) two flat surfaces without shapes, (4) local edges
(created by scrambling the object contour) in front of a background, or (5) random dot
stimuli with no structure (Vinberg & Grill-Spector, 2008) (Figure 2.3a). Note that condi
tions 1 and 2 both contain a shape, but only condition 1 contains an object. We repeated
the experiment twice, once with random dots that were presented stereoscopically and
once with random dots that moved, to determine whether the pattern of result varied
across stereo and motion cues. We found that LOC responses (both LO and pFus/OTS)
were higher for objects and shaped holes than for surfaces, local edges, or random stim
uli (see Figure 2.3b). We observed these results for both motion and stereo cues. In con
trast, LOC responses were not higher for surfaces than for random stimuli and were not
higher for local edges than for random stimuli. Thus, adding either local edge information
or global surface information does not increase LOC response. However, adding a global
shape produces a significant increase in LOC response. These results provide clear evi
dence that cue-invariant responses in the LOC are driven by object shape, rather than by
global surface information or local edge information.
Additional studies revealed that the LOC is also sensitive to border ownership (Appel
baum et al., 2006; Vinberg & Grill-Spector, 2008). Specifically, LO and pFus/OTS respons
es were higher for objects (shapes presented in the foreground) than for the same shapes
when they defined holes in the foreground. Since objects and holes had the same shape,
the only difference between the objects and the holes was the border ownership of the
contour defining the shape. In the former case, the border belongs to the object, and in
the latter case, it belongs to the flat surface in which the hole is punched in. Interestingly,
this higher response to objects than holes was a unique characteristic of LOC subregions
and did not occur in other visual regions (see Figure 2.3). This result suggests that LOC
prefers shapes (and contours) when they define the figure region. One implication of this result is that in perception the same cortical machinery determines both what the object in the visual input is and which region of the visual input is the figure region corresponding to the object.
The literature reviewed so far provides accumulating evidence that LOC is involved in
processing object form. The next question that one may ask, given the role of the LOC in
object perception, is: How does it deal with the variability in objects’ appearance?
There are many factors that can affect the appearance of objects. Changes in object ap
pearance can occur as a result of the object being at different locations relative to the ob
server, which will affect the retinal projection of the object in terms of its size and posi
tion. Also, the two-dimensional (2D) projection of a three-dimensional (3D) object on the
retina varies considerably owing to changes in its rotation and viewpoint relative to the
observer. Other changes in appearance occur because of differential illumination condi
tions, which affect the object’s color, contrast, and shadowing. Nevertheless, humans are
able to recognize objects across large changes in their appearance, which is referred to
as invariant object recognition.
A central topic of research in the study of object recognition is understanding how invari
ant recognition is accomplished. One view suggests that invariant object recognition is
accomplished because the underlying neural representations are invariant to the appear
ance of objects. Thus, there will be similar neural responses even when the appearance of
an object changes considerably. One means by which this can be achieved is by extracting
from the visual input features or fundamental elements (such as geons; Biederman, 1987)
that are relatively insensitive to changes in objects’ appearance. According to one influen
tial model (the recognition by components [RBC] model; Biederman, 1987), objects are
represented by a library of geons (that are easy to detect in many viewing conditions) and
their spatial relations. Other theories suggest that invariance may be generated through
a sequence of computations across a hierarchically organized processing stream in which
the level of sensitivity to object transformation decreases from one level of processing to
the next. For example, at the lowest level, neurons code local features, and in higher lev
els of the processing stream, neurons respond to more complex shapes and are less sensi
tive to changes in position and size (Riesenhuber & Poggio, 1999).
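The pooling idea behind such hierarchical models can be sketched with a toy example (this is an illustrative sketch in the spirit of these models, not the published implementation; the feature template and image sizes are arbitrary): a layer of local feature detectors is followed by a MAX-pooling stage whose output discards position while preserving feature identity.

```python
import numpy as np

def local_features(image, kernel):
    # "S-layer": correlate a small feature template at every image position
    h, w = kernel.shape
    H, W = image.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + h, j:j + w] * kernel)
    return out

def pooled_response(image, kernel):
    # "C-layer": MAX over positions yields a position-tolerant feature response
    return local_features(image, kernel).max()

# A vertical-bar template and two images containing the bar at different positions
bar = np.array([[1.0], [1.0], [1.0]])
img1 = np.zeros((8, 8)); img1[2:5, 2] = 1.0   # bar at one location
img2 = np.zeros((8, 8)); img2[4:7, 6] = 1.0   # same bar, shifted

r1 = pooled_response(img1, bar)
r2 = pooled_response(img2, bar)
# r1 and r2 are equal: the pooled response is invariant to the bar's position
```

Stacking such match-then-pool stages, with increasingly complex templates at each level, is one way sensitivity to position and size can decrease along the hierarchy.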
Single-unit electrophysiology studies have reported varying estimates of the degree of position sensitivity of IT neurons (DiCarlo & Maunsell, 2003; Op De Beeck & Vo
gels, 2000; Rolls, 2000). A related issue is whether position sensitivity of neurons in high
er visual areas manifests as an orderly, topographic representation of the visual field.
Several studies documented sensitivity to both eccentricity and polar angle in distinct
ventral stream regions. Both object-selective and category-selective regions in the ventral
stream respond to objects presented at multiple positions and sizes. However, the ampli
tude of response to an object varies across different retinal positions. The LO, pFus/OTS,
and category-selective regions (e.g. FFA, PPA) respond more strongly to objects present
ed in the contralateral compared with ipsilateral visual field (Grill-Spector et al., 1998b;
Hemond et al., 2007; McKyton & Zohary, 2007). Some regions (LO and EBA) also respond
more strongly to objects presented in the lower visual field (Sayres & Grill-Spector, 2008;
Schwarzlose et al., 2008). Responses also vary with eccentricity: the FFA and the VWFA
respond more strongly to centrally presented stimuli, and the PPA responds more strong
ly to peripherally presented stimuli (Hasson et al., 2002, 2003; Levy et al., 2001; Sayres &
Grill-Spector, 2008). Further, Arcaro and colleagues discovered that the PPA contains two visual field map representations (Arcaro et al., 2009).
Using fMRI-A, my colleagues and I have shown that the pFus/OTS, but not the LO, ex
hibits some degree of insensitivity to objects’ size and position (Grill-Spector et al., 1999).
fMRI-A is a method that allows characterization of the sensitivity of neural representa
tions to stimulus transformations at a subvoxel resolution. fMRI-A is based on findings
from single-unit electrophysiology showing that when objects repeat, there is a stimulus-
specific decrease in IT cells’ response to the repeated image, but not to other object im
ages (Miller et al., 1991; Sawamura et al., 2006). Similarly, fMRI signals in higher visual
regions show a stimulus-specific reduction (fMRI-A) in response to repetition of identical
object images (Grill-Spector et al., 1999, 2006a; Grill-Spector & Malach, 2001). We
showed that fMRI-A can be used to test the sensitivity of neural responses to object trans
formation by adapting cortex with a repeated presentation of an identical stimulus and
examining adaptation effects when the stimulus is changed along an object transforma
tion (e.g., changing its position). If the response remains adapted, it indicates that neu
rons are insensitive to the change. However, if the response returns to the initial level
(i.e., recovers from adaptation), it indicates sensitivity to the change (Grill-Spector &
Malach, 2001).
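The inferential logic of fMRI-A can be illustrated with a toy simulation (the Gaussian tuning, positions, and 60% adaptation strength here are my own illustrative assumptions, not parameters from any study discussed): a voxel sums many position-tuned neurons, and adaptation reduces each neuron's gain in proportion to how strongly the adapter drove it. If tuning is narrow, the response to a displaced test stimulus recovers; if tuning is broad (position-tolerant neurons), the response remains adapted.

```python
import numpy as np

def voxel_response(stim_pos, prefs, sigma, gain):
    # Voxel signal modeled as the summed activity of position-tuned neurons
    drive = np.exp(-(stim_pos - prefs) ** 2 / (2 * sigma ** 2))
    return float(np.sum(gain * drive))

def adapted_gain(adapter_pos, prefs, sigma, strength=0.6):
    # Adaptation scales each neuron's gain by its drive to the adapting stimulus
    drive = np.exp(-(adapter_pos - prefs) ** 2 / (2 * sigma ** 2))
    return 1.0 - strength * drive

prefs = np.linspace(-10.0, 10.0, 201)   # preferred positions of neurons in the voxel

recovery = {}
for sigma, label in [(1.0, "narrow"), (8.0, "broad")]:
    gain = adapted_gain(0.0, prefs, sigma)                 # adapt at position 0
    adapted = voxel_response(5.0, prefs, sigma, gain)      # test at position 5
    baseline = voxel_response(5.0, prefs, sigma, np.ones_like(prefs))
    recovery[label] = adapted / baseline                   # 1.0 = full recovery

# Narrowly tuned (position-sensitive) populations recover from fMRI-A when the
# stimulus moves; broadly tuned (position-tolerant) populations stay adapted.
```

This is why recovery from adaptation is taken as evidence of neural sensitivity to the transformation, even though the tuned populations are intermixed below the voxel scale.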
Using fMRI-A, we found that repeated presentation of the same face or object at the same
position and size produces reduced fMRI activation. This is thought to reflect stimulus-
specific neural adaptation. Presenting the same face or object in different positions in the
visual field or at different sizes also produces fMRI-A in pFus/OTS and FFA, indicating in
sensitivity to object size and position in the range we tested (Grill-Spector et al., 1999;
see also Vuilleumier et al., 2002). This result is consistent with electrophysiology findings
showing that IT neurons that respond similarly to stimuli at different positions in the visu
al field also show adaptation when the same object is shown in different positions
(Lueschow et al., 1994). In contrast, the LO recovered from fMRI-A to images of the same
face or object when presented at different sizes or positions. This indicates that the LO is
sensitive to object position and size.
Recently, several groups examined the sensitivity of the distributed response across the
visual stream to object category and object position (Sayres & Grill-Spector, 2008; Sch
warzlose et al., 2008) and also object identity and object position (Eger et al., 2008).
These studies used multivoxel pattern analyses (MVPA) and classifier methods developed
in machine learning to examine what information is present in the distributed responses
across voxels in a cortical region. The distributed response can carry different informa
tion from the mean response of a region of interest (ROI) when there is variation across
voxel responses.
When exemplars from the same object category are shown in the same position in the visual field, LO responses are reliable (or positively correlated). Surprisingly, showing ob
jects from the same category, but at a different position, significantly reduced the correla
tion between activation patterns (Figure 2.4, first vs. third bars) and this reduction was
larger than changing the object category in the same position (see Figure 2.4, first vs.
second bar). Importantly, position and category effects were independent because there
were no significant interactions between position and category (all F values < 1.02, all p
values > 0.31). Thus, changing both object category and position produced maximal
decorrelation between distributed responses (see Figure 2.4, fourth bar).
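The logic of these pattern-correlation analyses can be sketched with synthetic data (all quantities below are arbitrary; the position component is simply built in with a larger weight than the category component, mimicking the LO results described above):

```python
import numpy as np

rng = np.random.default_rng(0)
n_vox = 2000   # hypothetical number of voxels in the ROI

# Fixed "signal" patterns for two categories and two positions; position is
# weighted more heavily than category, as observed in LO.
cat_a, cat_b = rng.standard_normal(n_vox), rng.standard_normal(n_vox)
pos_1, pos_2 = 1.5 * rng.standard_normal(n_vox), 1.5 * rng.standard_normal(n_vox)

def pattern(category, position, noise=0.5):
    # One measured distributed response = category + position signal + noise
    return category + position + noise * rng.standard_normal(n_vox)

def corr(x, y):
    return np.corrcoef(x, y)[0, 1]

same_cat_same_pos = corr(pattern(cat_a, pos_1), pattern(cat_a, pos_1))
diff_cat_same_pos = corr(pattern(cat_a, pos_1), pattern(cat_b, pos_1))
same_cat_diff_pos = corr(pattern(cat_a, pos_1), pattern(cat_a, pos_2))
diff_cat_diff_pos = corr(pattern(cat_a, pos_1), pattern(cat_b, pos_2))
# Changing position lowers the pattern correlation more than changing category,
# and changing both produces the lowest correlation.
```

Comparing correlations between distributed responses in this way is what allows the analysis to quantify how much information about category versus position the voxel pattern carries.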
A related recent study examined position sensitivity using pattern analysis more broadly
across the ventral stream, providing additional evidence for a hierarchical organization
across the ventral stream (Schwarzlose et al., 2008). Schwarzlose and colleagues found
that distributed responses to a particular object category (faces, body parts, or scenes)
were similar across positions in ventral temporal regions (e.g., pFus/OTS and FBA) but
changed across positions in occipital regions (e.g., EBA and LO). Thus, accumulating evi
dence from both fMRI-A and pattern analysis studies suggests a hierarchy of representa
tions in the human ventral stream through which representations become less sensitive to
object position as one ascends the visual hierarchy.
In humans, results vary considerably. Short-lagged fMRI-A experiments, in which the test
stimulus is presented immediately after the adapting stimulus (Grill-Spector et al.,
2006a), suggest that object representations in the lateral occipital complex are view de
pendent (Fang et al., 2007; Gauthier et al., 2002; Grill-Spector et al., 1999; but see Va
lyear et al., 2006). However, long-lagged fMRI-A experiments, in which many intervening
stimuli occur between the test and adapting stimulus (Grill-Spector et al., 2006a), have
provided some evidence for view-invariant representations in the ventral LOC, especially
in the left hemisphere (James et al., 2002; Vuilleumier et al., 2002) and the PPA (Epstein
et al., 2008). Also, a recent study showed that the distributed LOC responses to objects
remained stable across 60-degree rotations (Eger et al., 2008). Presently, there is no consensus across experimental findings about the degree to which ventral stream representa
tions are view dependent or view invariant. These variable results may reflect differences
in the neural representations depending on object category and cortical region, or
methodological differences across studies (e.g., level of object rotation and fMRI-A para
digm used).
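The recovery-from-adaptation logic these experiments rely on can be summarized with a standard adaptation index (a common convention in the fMRI-A literature, not a formula quoted from this chapter): the fractional reduction of the response to a repeated relative to a nonrepeated stimulus. Recovery from adaptation across a transformation (here, rotation) implies sensitivity to that transformation.

```python
# AI = (nonrepeated - repeated) / nonrepeated.
# Positive AI indicates adaptation; AI near zero after a transformation
# indicates recovery, i.e., sensitivity to that transformation.

def adaptation_index(nonrepeated, repeated):
    """Fractional signal reduction for repeated vs. nonrepeated stimuli."""
    return (nonrepeated - repeated) / nonrepeated

# Hypothetical percent-signal-change values, for illustration only.
print(adaptation_index(1.0, 0.6))   # identical view repeated: strong adaptation
print(adaptation_index(1.0, 0.95))  # rotated view: recovery -> view dependence
```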
when adapting with front views than back views of animals. However, subjects’ behav
ioral performance in a discrimination task across object rotations showed that they are
equally sensitive to rotations (performance decreases with rotation level) whether rota
tions are relative to the front or back of an animal (Andresen et al., 2009), suggesting
that this interpretation is unlikely. Alternatively, the apparent rotation cross-adaptation
may be due to lower responses for back views of animals. That is, the apparent adapta
tion across rotation from the front view to the back view is driven by lower responses to
the back view rather than adaptation across 180-degree rotations.
tuning width of the neural population and (2) the proportion of neurons in a voxel that
prefer a specific object view.
Figure 2.6a shows the response characteristics of a model of a putative voxel containing a
mixture of view-dependent neural populations tuned to different object views, in which
the distribution of neurons tuned to different object views is uniform. In this model, nar
rower neural tuning to object view (left) results in recovery from fMRI-A for smaller rota
tions than wider view tuning (right). Responses to front and back views are identical
when there is no adaptation (see Figure 2.6a, diamonds), and the pattern of adaptation as
a function of rotation is similar when adapting with the front or back views (see Figure
2.6a). Such a model provides an account of responses to vehicles across object-selective
cortex (as measured with fMRI), and for animals in the LO. Thus, this model suggests that
the difference between the representation of animals and vehicles in the LO is likely due
to a smaller population view tuning for vehicles than animals (a tuning width of σ < 40°
produces complete recovery from adaptation for rotations larger than 60 degrees, as ob
served for vehicles).
Figure 2.6b shows simulation results when there is a prevalence of neurons to the front
view of objects. This simulation shows higher BOLD responses to frontal views without
adaptation (gray vs. black diamonds) and a flatter profile of fMRI-A across rotations when
adapting with the front view. These simulation results are consistent with our observa
tions in pFus/OTS and indicate that the differential recovery from adaptation as a func
tion of the adapting animal view may be a consequence of a larger neural population
tuned to front views of animals.
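The voxel model described above can be sketched computationally. This is a simplified reconstruction of the Figure 2.6 simulations: the Gaussian tuning function, the set of 12 preferred views, and the proportional-adaptation factor k are illustrative assumptions, not the chapter's exact parameters.

```python
import numpy as np

def voxel_response(test_view, adapter=None, sigma=40.0, weights=None, k=0.8):
    """BOLD-like response of a voxel modeled as a mixture of view-tuned
    neural populations with Gaussian tuning of width sigma (degrees).
    Adaptation scales each population by how strongly it responded to the
    adapting view (factor k). Parameter values are illustrative only."""
    prefs = np.arange(0.0, 360.0, 30.0)      # preferred views of 12 populations
    if weights is None:
        weights = np.ones_like(prefs)        # uniform distribution over views
    def tuning(view):
        d = np.abs(prefs - view)
        d = np.minimum(d, 360.0 - d)         # circular view difference
        return np.exp(-d ** 2 / (2.0 * sigma ** 2))
    r = tuning(test_view)
    if adapter is not None:
        r = r * (1.0 - k * tuning(adapter))  # population-specific adaptation
    return float(np.sum(weights * r))

# Uniform model (cf. Figure 2.6a): front and back views respond identically.
front, back = voxel_response(0.0), voxel_response(180.0)

# Narrower tuning recovers from adaptation at smaller rotations than wider tuning.
recovery_narrow = voxel_response(60, adapter=0, sigma=20) / voxel_response(60, sigma=20)
recovery_wide = voxel_response(60, adapter=0, sigma=60) / voxel_response(60, sigma=60)

# Front-view prevalence (cf. Figure 2.6b): over-weighting front-tuned neurons
# raises the unadapted response to front views relative to back views.
w = np.ones(12)
w[0] = 3.0                                   # extra neurons preferring 0 degrees
front_biased = voxel_response(0.0, weights=w)
back_biased = voxel_response(180.0, weights=w)
print(front, back, recovery_narrow, recovery_wide, front_biased, back_biased)
```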
Overall, recent results provide empirical evidence for view-dependent object representa
tion across human object-selective cortex that is evident both with standard fMRI and fM
RI-A measurements. These data provide important empirical constraints for theories of
object recognition and highlight the importance of parametric manipulations for captur
ing neural selectivity to any type of stimulus transformation. These findings also generate
new questions. For example, if there is no view-invariant neural representation in the human ventral temporal cortex, how is view-invariant object recognition accomplished? One
As illustrated in Figure 2.1, several regions in the ventral stream exhibit higher responses
to particular object categories such as places, faces, and body parts compared with other
object categories. Findings of category-selective regions initiated a fierce debate about
the principles of functional organization in the ventral stream. Are there regions in the
cortex that are specialized for (p. 22) any object category? Is there something special
about computations relevant to specific categories that generate specialized cortical re
gions for these computations? That is, perhaps some general processing is applied to all
objects, but some computations may be specific to certain domains and may require addi
tional brain resources.
In explaining the pattern of functional selectivity in the ventral stream, four prominent
views have emerged. The main debate centers on the question of whether regions that
elicit maximal response for a category should be treated as a module for the representa
tion of that category, or whether they are part of a more general object recognition sys
tem.
classes of stimuli beyond what would be required from a general object recognition sys
tem. For example, faces, like other objects, need to be recognized across variations in
their appearance (a domain-general process). However, given the importance of face pro
cessing for social interactions, there are aspects of face processing that are unique. Spe
cialized face processing may include identifying faces at the individual level (e.g., John vs.
Harry), extracting gender information, gaze, expression, and so forth. These unique face-
related computations may be implemented in face-selective regions.
Process Maps
Tarr and Gauthier (2000) proposed that object representations are clustered according to
the type of processing that is required, rather than according to their visual attributes. It
is possible that different levels of processing may require dedicated computations that
are performed in localized cortical regions. For example, faces are usually recognized at
the individual level (e.g., “Bob Jacobs”), but many objects are typically recognized at the category level (e.g., “a horse”). Following this reasoning, and evidence that objects of ex
pertise activate the FFA more than other objects (Gauthier et al., 1999, 2000), Gauthier,
Tarr, and their colleagues have suggested that the FFA is a region for subordinate identi
fication of any object category that is automated by expertise (Gauthier et al., 1999, 2000;
Tarr & Gauthier, 2000).
One of the reasons that this view is appealing is that a distributed code is a combinatorial
code that allows representation of a large number of object categories. Given
Biederman’s rough estimate that humans can recognize about 30,000 categories (Bieder
man, 1987), this provides a neural substrate that has a capacity to represent such a large
number of categories. Second, this model posited the provocative view that, when considering information in the ventral stream, one needs to weigh the weak signals as much as the strong signals, because both convey useful information.
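The capacity argument is simple arithmetic: a strictly localist code needs one dedicated unit per category, whereas a combinatorial (distributed) binary code over n units can in principle distinguish 2^n patterns, so Biederman's rough estimate is reachable with very few combinatorial units.

```python
import math

# Capacity of a localist vs. a binary combinatorial code, illustrating why
# a distributed code easily covers Biederman's (1987) rough estimate of
# ~30,000 recognizable categories.
categories = 30_000
localist_units = categories                       # one unit per category
combinatorial_units = math.ceil(math.log2(categories))
print(localist_units, combinatorial_units)        # 30000 15
```

Real neural codes are graded rather than binary, but the exponential scaling is the point of the argument.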
2010, 2011, 2013). Specifically, our data illustrate that there is not just one distinct re
gion selective for each category (i.e., a single FFA or FBA) in the ventral temporal cortex,
but rather a series of face- and limb-selective clusters that minimally overlap, with a con
sistent organization relative to one another on a posterior-to-anterior axis on the OTS and
fusiform gyrus (FG). Our data also show an interaction between localized cortical clusters
and distributed responses across voxels outside these clusters. Our results further illus
trate that even in weakly selective voxels outside of these clusters, the distributed re
sponses for faces and limbs are distinct from one another. Nevertheless, there is signifi
cantly more face information in the distributed responses of weakly and highly selective voxels than in nonselective voxels, indicating that these subsets of voxels carry differential amounts of information.
These data suggest a fourth model—a sparsely distributed organization in the ventral
temporal cortex—mediating the debate between modular and distributed theories of ob
ject representation. Sparsely refers to the presence of several face- and limb-selective
clusters with a distinct, minimally overlapping organization, and distributed refers to the
presence of information in weakly and nonselective voxels outside of these clusters. This
sparsely distributed organization is supported by recent cortical connectivity studies indi
cating a hybrid modular and distributed organization (Borra et al., 2010; Zangenehpour &
Chaudhuri, 2005), as well as theoretical work on sparse distributed memory networks (Kanerva, 1988).
Presently, there is no consensus in the field about which account best explains ventral
stream functional organization. Much of the debate centers on the degree to which object
processing is constrained to discrete modules or involves distributed computations across
large stretches of the ventral stream (Op de Beeck et al., 2008). The debate is both about
the spatial scale on which computations for object recognition occur and about the funda
mental principles that underlie specialization in the ventral stream. On the one hand, do
main-specific theories need to address findings of multiple foci that show selectivity. For
example, there are multiple foci in the ventral stream that respond more strongly to faces
versus objects. Thus, a strong modular account of a single “face module” for face recogni
tion is unlikely. Second, the spatial extent of these putative modules is undetermined, and
it is unclear whether each of these category-selective regions corresponds to a visual
area. On the other hand, a very distributed and overlapping account of object representa
tion in the ventral stream suffers from the potential problem that in order to resolve cate
gory information, the brain may need to read out information present across the entire
ventral stream (which is inefficient). Further, the fact that there is information in the dis
tributed response does not mean that the brain uses the information in the same way that
an independent classifier does. It is possible that activation in localized regions is more
informative for perceptual decisions than the information available across the entire ven
tral stream (Grill-Spector et al., 2004; Williams et al., 2007). For example, FFA responses
predict when subjects recognize faces and birds, but do not predict when subjects recog
nize houses, guitars, or flowers (Grill-Spector et al., 2004). The recent sparsely distrib
uted model we proposed attempts to bridge between the extreme modular views and
highly distributed and overlapping views of organization of the ventral temporal cortex.
One particular appeal of this view is that it is closely tied to the measurements and allows
for additional clusters to be incorporated into the model. As scanning resolutions improve
for human fMRI studies, the number of clusters is likely to increase, but the alternating
nature of face and limb representations is likely to remain in adjacent activations, as also
suggested by monkey fMRI (Pinsk et al., 2009).
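The readout question raised above (whether the brain actually uses information that an independent classifier can extract) is typically probed with pattern classifiers. The toy simulation below, using entirely synthetic data and a simple nearest-centroid decoder as a stand-in for such a classifier, shows how strongly selective cluster voxels can support much better decoding than weakly selective voxels, even when both carry some category information.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_trials(n_trials, signal, noise, n_voxels=50):
    """Synthetic distributed response patterns for two object categories.
    `signal` scales the category-specific pattern; `noise` is trial noise."""
    mu_a = rng.normal(0.0, signal, n_voxels)   # e.g., face pattern
    mu_b = rng.normal(0.0, signal, n_voxels)   # e.g., house pattern
    X = np.vstack([mu + rng.normal(0.0, noise, (n_trials, n_voxels))
                   for mu in (mu_a, mu_b)])
    y = np.repeat([0, 1], n_trials)
    return X, y

def split_half_accuracy(X, y):
    """Nearest-centroid decoding: fit centroids on even trials, test on odd."""
    Xtr, ytr, Xte, yte = X[::2], y[::2], X[1::2], y[1::2]
    c0, c1 = Xtr[ytr == 0].mean(0), Xtr[ytr == 1].mean(0)
    pred = (np.linalg.norm(Xte - c1, axis=1)
            < np.linalg.norm(Xte - c0, axis=1)).astype(int)
    return float((pred == yte).mean())

# Strongly selective cluster voxels vs. weakly selective voxels outside them.
X_cluster, y = make_trials(40, signal=0.5, noise=0.5)
X_weak, _ = make_trials(40, signal=0.05, noise=0.5)

acc_cluster = split_half_accuracy(X_cluster, y)
acc_weak = split_half_accuracy(X_weak, y)
print(acc_cluster, acc_weak)
```

That a classifier decodes above chance from weak voxels does not, of course, establish that downstream neurons read them out; that is exactly the caveat raised in the text.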
However, many questions remain. First, what is the relationship between neural sensitivi
ty to object transformations and behavioral sensitivity to object transformations? Do bias
es in neural representations produce biases in performance? For example, empirical evi
dence shows over-representation of the lower visual field in LO. Does this lead to better
recognition in the lower than upper visual field? Second, what information does the visual
system use to build invariant object representations? Third, (p. 24) what computations are
implemented in distinct cortical regions involved in object recognition? Does the “aha”
moment in recognition involve a specific response in a particular brain region, or does it
involve a distributed response across a large cortical expanse? Combining experimental
methods such as fMRI and EEG will provide high spatial and temporal resolution, which
is critical to addressing this question. Fourth, why do representations of a few categories
such as faces or body parts yield local clustered activations while many other categories
(e.g., manmade objects) produce more diffuse and less intense responses across the ven
tral temporal cortex? Fifth, what is the pattern of connectivity between ventral stream vi
sual regions in the human brain? Although the connectivity in monkey visual cortex has
been extensively explored (Moeller et al., 2008; Van Essen et al., 1990), there is little
knowledge about connectivity between cortical visual areas in the human ventral stream.
This knowledge is necessary for building a model of hierarchical processing in humans
and any neural network model of object recognition. Future directions that combine
methodologies, such as psychophysics with fMRI, EEG with fMRI, or diffusion tensor
imaging with fMRI, will be instrumental in addressing these fundamental questions.
Acknowledgements
I thank David Andresen, Rory Sayres, Joakim Vinberg, and Kevin Weiner for their contri
butions to the research summarized in this chapter. This work was supported by an NSF grant and an NEI grant.
References
Andresen, D. R., Vinberg, J., & Grill-Spector, K. (2009). The representation of object view
point in the human visual cortex. NeuroImage, 45, 522–536.
Appelbaum, L. G., Wade, A. R., Vildavski, V. Y., Pettet, M. W., & Norcia, A. M. (2006). Cue-
invariant networks for figure and background processing in human visual cortex. Journal
of Neuroscience, 26, 11695–11708.
Bar, M., Tootell, R. B., Schacter, D. L., Greve, D. N., Fischl, B., Mendola, J. D., Rosen, B. R.,
& Dale, A. M. (2001). Cortical mechanisms specific to explicit visual object recognition.
Neuron, 29, 529–535.
Biederman, I., & Cooper, E. E. (1991). Evidence for complete translational and reflection
al invariance in visual object priming. Perception, 20, 585–593.
Borra, E., Ichinohe, N., Sato, T., Tanifuji, M., & Rockland, K. S. (2010). Cortical connections to area TE in monkey: Hybrid modular and distributed organization. Cerebral Cortex, 20 (2), 257–270.
Bulthoff, H. H., & Edelman, S. (1992). Psychophysical support for a two-dimensional view interpolation theory of object recognition. Proceedings of the National Academy of Sciences U S A, 89, 60–64.
Bulthoff, H. H., Edelman, S. Y., & Tarr, M. J. (1995). How are three-dimensional objects represented in the brain? Cerebral Cortex, 5, 247–260.
Chao, L. L., Haxby, J. V., & Martin, A. (1999). Attribute-based neural substrates in tempo
ral cortex for perceiving and knowing about objects. Nature Neuroscience, 2, 913–919.
Cohen, L., Dehaene, S., Naccache, L., Lehericy, S., Dehaene-Lambertz, G., Henaff, M. A.,
& Michel, F. (2000). The visual word form area: Spatial and temporal characterization of
an initial stage of reading in normal subjects and posterior split-brain patients. Brain, 123
(2), 291–307.
Culham, J. C., Danckert, S. L., DeSouza, J. F., Gati, J. S., Menon, R. S., & Goodale, M. A.
(2003). Visually guided grasping produces fMRI activation in dorsal but not ventral
stream brain areas. Experimental Brain Research, 153, 180–189.
Desimone, R., Albright, T. D., Gross, C. G., & Bruce, C. (1984). Stimulus-selective proper
ties of inferior temporal neurons in the macaque. Journal of Neuroscience, 4, 2051–2062.
DiCarlo, J. J., & Cox, D. D. (2007). Untangling invariant object recognition. Trends in Cog
nitive Science, 11, 333–341.
DiCarlo, J. J., & Maunsell, J. H. (2003). Anterior inferotemporal neurons of monkeys en
gaged in object recognition can be highly sensitive to object retinal position. Journal of
Neurophysiology, 89, 3264–3278.
Dill, M., & Edelman, S. (2001). Imperfect invariance to object translation in the discrimi
nation of complex shapes. Perception, 30, 707–724.
Downing, P. E., Jiang, Y., Shuman, M., & Kanwisher, N. (2001). A cortical area selective
for visual processing of the human body. Science, 293, 2470–2473.
Edelman, S., & Bulthoff, H. H. (1992) Orientation dependence in the recognition of famil
iar and novel views of three-dimensional objects. Vision Research, 32, 2385–2400.
Edelman, S., & Intrator, N. (2000). (Coarse coding of shape fragments) + (retinotopy) approximately = representation of structure. Spatial Vision, 13, 255–264.
Eger, E., Ashburner, J., Haynes, J. D., Dolan, R. J., & Rees, G. (2008). fMRI activity pat
terns in human LOC carry information about object exemplars within category. Journal of
Cognitive Neuroscience, 20, 356–370.
Epstein, R., & Kanwisher, N. (1998). A cortical representation of the local visual environ
ment. Nature, 392, 598–601.
Epstein, R. A., Parker, W. E., & Feiler, A. M. (2008). Two kinds of fMRI repetition suppres
sion? Evidence for dissociable neural mechanisms. Journal of Neurophysiology, 99 (6),
2877–2886.
Fang, F., & He, S. (2005). Cortical responses to invisible objects in the human dorsal and
ventral pathways. Nature Neuroscience, 8, 1380–1385.
Freedman, D. J., Riesenhuber, M., Poggio, T., & Miller, E. K. (2001). Categorical (p. 25) representation of visual stimuli in the primate prefrontal cortex. Science, 291, 312–316.
Freedman, D. J., Riesenhuber, M., Poggio, T., & Miller, E. K. (2003). A comparison of pri
mate prefrontal and inferior temporal cortices during visual categorization. Journal of
Neuroscience, 23, 5235–5246.
Fujita, I., Tanaka, K., Ito, M., & Cheng, K. (1992). Columns for visual features of objects in
monkey inferotemporal cortex. Nature, 360, 343–346.
Gauthier, I., Skudlarski, P., Gore, J. C., & Anderson, A. W. (2000). Expertise for cars and
birds recruits brain areas involved in face recognition. Nature Neuroscience, 3, 191–197.
Gauthier, I., Tarr, M. J., Anderson, A. W., Skudlarski, P., & Gore, J. C. (1999). Activation of
the middle fusiform “face area” increases with expertise in recognizing novel objects. Na
ture Neuroscience, 2, 568–573.
Gilaie-Dotan, S., Ullman, S., Kushnir, T., & Malach, R. (2002). Shape-selective stereo pro
cessing in human object-related visual areas. Human Brain Mapping, 15, 67–79.
Grill-Spector, K. (2003). The neural basis of object perception. Current Opinion in Neuro
biology, 13, 159–166.
Grill-Spector, K., Golarai, G., & Gabrieli, J. (2008). Developmental neuroimaging of the hu
man ventral visual cortex. Trends in Cognitive Science, 12, 152–162.
Grill-Spector, K., Henson, R., & Martin, A. (2006a). Repetition and the brain: Neural mod
els of stimulus-specific effects. Trends in Cognitive Science, 10, 14–23.
Grill-Spector, K., Knouf, N., & Kanwisher, N. (2004). The fusiform face area subserves
face perception, not generic within-category identification. Nature Neuroscience, 7, 555–
562.
Grill-Spector, K., Kushnir, T., Edelman, S., Avidan, G., Itzchak, Y., & Malach, R. (1999). Dif
ferential processing of objects under various viewing conditions in the human lateral oc
cipital complex. Neuron, 24, 187–203.
Grill-Spector, K., Kushnir, T., Edelman, S., Itzchak, Y., & Malach, R. (1998a). Cue-invariant
activation in object-related areas of the human occipital lobe. Neuron, 21, 191–202.
Grill-Spector, K., Kushnir, T., Hendler, T., Edelman, S., Itzchak, Y., & Malach, R. (1998b). A
sequence of object-processing stages revealed by fMRI in the human occipital lobe. Hu
man Brain Mapping, 6, 316–328.
Grill-Spector, K., Kushnir, T., Hendler, T., & Malach, R. (2000). The dynamics of object-se
lective activation correlate with recognition performance in humans. Nature Neuro
science, 3, 837–843.
Grill-Spector, K., & Malach, R. (2001). fMR-adaptation: A tool for studying the functional
properties of human cortical neurons. Acta Psychologica (Amst), 107, 293–321.
Grill-Spector, K., & Malach, R. (2004). The human visual cortex. Annual Review of Neuro
science, 27, 649–677.
Hasson, U., Harel, M., Levy, I., & Malach, R. (2003). Large-scale mirror-symmetry organi
zation of human occipito-temporal object areas. Neuron, 37, 1027–1041.
Hasson, U., Levy, I., Behrmann, M., Hendler, T., & Malach, R. (2002). Eccentricity bias as
an organizing principle for human high-order object areas. Neuron, 34, 479–490.
Haxby, J. V., Gobbini, M. I., Furey, M. L., Ishai, A., Schouten, J. L., & Pietrini, P. (2001).
Distributed and overlapping representations of faces and objects in ventral temporal cor
tex. Science, 293, 2425–2430.
Haxby, J. V., Hoffman, E. A., & Gobbini, M. I. (2000). The distributed human neural system
for face perception. Trends in Cognitive Science, 4, 223–233.
Hemond, C. C., Kanwisher, N. G., & Op de Beeck, H. P. (2007). A preference for contralat
eral stimuli in human object- and face-selective cortex. PLoS ONE, 2, e574.
James, T. W., Humphrey, G. K., Gati, J. S., Menon, R. S., & Goodale, M. A. (2002). Differen
tial effects of viewpoint on object-driven activation in dorsal and ventral streams. Neuron,
35, 793–801.
Kanwisher, N., McDermott, J., & Chun, M. M. (1997). The fusiform face area: A module in
human extrastriate cortex specialized for face perception. Journal of Neuroscience, 17,
4302–4311.
Kastner, S., De Weerd, P., & Ungerleider, L. G. (2000). Texture segregation in the human
visual cortex: A functional MRI study. Journal of Neurophysiology, 83, 2453–2457.
Kobatake, E., & Tanaka, K. (1994). Neuronal selectivities to complex object features in
the ventral visual pathway of the macaque cerebral cortex. Journal of Neurophysiology,
71, 856–867.
Kourtzi, Z., & Kanwisher, N. (2000). Cortical regions involved in perceiving object shape.
Journal of Neuroscience, 20, 3310–3318.
Kourtzi, Z., & Kanwisher, N. (2001). Representation of perceived object shape by the hu
man lateral occipital complex. Science, 293, 1506–1509.
Kourtzi, Z., Tolias, A. S., Altmann, C. F., Augath, M., & Logothetis, N. K. (2003). Integra
tion of local features into global shapes: monkey and human fMRI studies. Neuron, 37,
333–346.
Larsson, J., & Heeger, D. J. (2006). Two retinotopic visual areas in human lateral occipital
cortex. Journal of Neuroscience, 26, 13128–13142.
Lerner, Y., Epshtein, B., Ullman, S., & Malach, R. (2008). Class information predicts acti
vation by object fragments in human object areas. Journal of Cognitive Neuroscience, 20,
1189–1206.
Lerner, Y., Hendler, T., Ben-Bashat, D., Harel, M., & Malach, R. (2001). A hierarchical axis
of object processing stages in the human visual cortex. Cerebral Cortex, 11, 287–297.
Lerner, Y., Hendler, T., & Malach, R. (2002). Object-completion effects in the human later
al occipital complex. Cerebral Cortex, 12, 163–177.
Levy, I., Hasson, U., Avidan, G., Hendler, T., & Malach, R. (2001). Center-periphery organi
zation of human object areas. Nature Neuroscience, 4, 533–539.
Logothetis, N. K., Pauls, J., & Poggio, T. (1995). Shape representation in the inferior tem
poral cortex of monkeys. Current Biology, 5, 552–563.
Malach, R., Levy, I., & Hasson, U. (2002) The topography of high-order human object ar
eas. Trends in Cognitive Science, 6, 176–184.
Malach, R., Reppas, J. B., Benson, R. R., Kwong, K. K., Jiang, H., Kennedy, W. A., Ledden,
P. J., Brady, T. J., Rosen, B. R., & Tootell, R. B. (1995). Object-related activity revealed by
functional magnetic resonance imaging in human occipital cortex. Proceedings of the Na
tional Academy of Sciences U S A, 92, 8135–8139.
Marr, D. (1980). Visual information processing: The structure and creation of visual (p. 26) representations. Philosophical Transactions of the Royal Society of London B, 290, 199–218.
Martin, A., Wiggs, C. L., Ungerleider, L. G., & Haxby, J. V. (1996). Neural correlates of cat
egory-specific knowledge. Nature, 379, 649–652.
McKyton, A., & Zohary, E. (2007). Beyond retinotopic mapping: The spatial representa
tion of objects in the human lateral occipital complex. Cerebral Cortex, 17, 1164–1172.
Mendola, J. D., Dale, A. M., Fischl, B., Liu, A. K., & Tootell, R. B. (1999). The representa
tion of illusory and real contours in human cortical visual areas revealed by functional
magnetic resonance imaging. Journal of Neuroscience, 19, 8560–8572.
Miller, E. K., Li, L., & Desimone, R. (1991). A neural mechanism for working and recogni
tion memory in inferior temporal cortex. Science, 254, 1377–1379.
Moeller, S., Freiwald, W. A., & Tsao, D. Y. (2008). Patches with links: A unified system for
processing faces in the macaque temporal lobe. Science, 320, 1355–1359.
Nakayama, K., He, Z. J., & Shimojo, S. (1995). Visual surface representation: A critical
link between low-level and high-level vision. In S. M. Kosslyn & D. N. Osherson (Eds.), An
invitation to cognitive sciences: Visual cognition. Cambridge, MA: MIT Press.
Op de Beeck, H. P., Haushofer, J., & Kanwisher, N. G. (2008). Interpreting fMRI data:
Maps, modules and dimensions. Nature Reviews, Neuroscience, 9, 123–135.
Op De Beeck, H., & Vogels, R. (2000). Spatial sensitivity of macaque inferior temporal
neurons. Journal of Comparative Neurology, 426, 505–518.
Perrett, D. I. (1996). View-dependent coding in the ventral stream and its consequence
for recognition. In R. Camaniti, K. P. Hoffmann, & A. J. Lacquaniti (Eds.), Vision and move
ment mechanisms in the cerebral cortex (pp. 142–151). Strasbourg: HFSP.
Perrett, D. I., Oram, M. W., & Ashbridge, E. (1998). Evidence accumulation in cell popula
tions responsive to faces: An account of generalisation of recognition without mental
transformations. Cognition, 67, 111–145.
Peterson, M. A., & Gibson, B. S. (1994a). Must shape recognition follow figure-ground or
ganization? An assumption in peril. Psychological Science, 5, 253–259.
Pinsk, M. A., Arcaro, M., Weiner, K. S., Kalkus, J. F., Inati, S. J., Gross, C. G., & Kastner, S. (2009). Neural representations of faces and body parts in macaque and human cortex: A comparative fMRI study. Journal of Neurophysiology, 101 (5), 2581–2600.
Poggio, T., & Edelman, S. (1990). A network that learns to recognize three-dimensional
objects. Nature, 343, 263–266.
Quiroga, R. Q., Mukamel, R., Isham, E. A., Malach, R., & Fried, I. (2008). Human single-
neuron responses at the threshold of conscious recognition. Proceedings of the National
Academy of Sciences U S A, 105, 3599–3604.
Quiroga, R. Q., Reddy, L., Kreiman, G., Koch, C., & Fried, I. (2005). Invariant visual repre
sentation by single neurons in the human brain. Nature, 435, 1102–1107.
Riesenhuber, M., & Poggio, T. (1999). Hierarchical models of object recognition in cortex.
Nature Neuroscience, 2, 1019–1025.
Rolls, E. T. (2000). Functions of the primate temporal lobe cortical visual areas in invari
ant visual object and face recognition. Neuron, 27, 205–218.
Rolls, E. T., & Milward, T. (2000). A model of invariant object recognition in the visual sys
tem: Learning rules, activation functions, lateral inhibition, and information-based perfor
mance measures. Neural Computation, 12, 2547–2572.
Sawamura, H., Orban, G. A., & Vogels, R. (2006). Selectivity of neuronal adaptation does
not match response selectivity: A single-cell study of the FMRI adaptation paradigm. Neu
ron, 49, 307–318.
Sayres, R., & Grill-Spector, K. (2008). Relating retinotopic and object-selective responses
in human lateral occipital cortex. Journal of Neurophysiology, 100 (1), 249–267.
Schwarzlose, R. F., Baker, C. I., & Kanwisher, N. K. (2005). Separate face and body selec
tivity on the fusiform gyrus. Journal of Neuroscience, 25, 11055–11059.
Schwarzlose, R. F., Swisher, J. D., Dang, S., & Kanwisher, N. (2008). The distribution of
category and location information across object-selective regions in human visual cortex.
Proceedings of the National Academy of Sciences U S A, 105, 4447–4452.
Stanley, D. A., & Rubin, N. (2003). fMRI activation in response to illusory contours and
salient regions in the human lateral occipital complex. Neuron, 37, 323–331.
Tarr, M. J., & Bulthoff, H. H. (1995). Is human object recognition better described by geon
structural descriptions or by multiple views? Comment on Biederman and Gerhardstein
(1993). Journal of Experimental Psychology: Human Perception and Performance, 21,
1494–1505.
Tarr, M. J., & Gauthier, I. (2000). FFA: A flexible fusiform area for subordinate-level visual
processing automatized by expertise. Nature Neuroscience, 3, 764–769.
Thorpe, S., Fize, D., & Marlot, C. (1996). Speed of processing in the human visual system.
Nature, 381, 520–522.
Ungerleider, L. G., Mishkin, M., & Macko, K. A. (1983). Object vision and spatial vision:
Two cortical pathways. Trends in Neuroscience, 6, 414–417.
Van Essen, D. C., Felleman, D. J., DeYoe, E. A., Olavarria, J., & Knierim, J. (1990). Modular
and hierarchical organization of extrastriate visual cortex in the macaque monkey. Cold
Spring Harbor Symposia on Quantum Biology, 55, 679–696.
Vinberg, J., & Grill-Spector, K. (2008). Representation of shapes, edges, and surfaces
across multiple cues in the human visual cortex. Journal of Neurophysiology, 99, 1380–
1393.
Vogels, R., & Biederman, I. (2002). Effects of illumination intensity and direction on ob
ject coding in macaque inferior temporal cortex. Cerebral Cortex, 12, 756–766.
Vuilleumier, P., Henson, R. N., Driver, J., & Dolan, R. J. (2002). Multiple levels of visual ob
ject constancy revealed by event-related fMRI of repetition priming. Nature
Neuroscience, 5, 491–499.
Wang, G., Tanaka, K., & Tanifuji, M. (1996). Optical imaging of functional organization in
the monkey inferotemporal cortex. Science, 272, 1665–1668.
(p. 27) Weiner, K. S., & Grill-Spector, K. (2010). Sparsely-distributed organization of face
and limb activations in human ventral temporal cortex. NeuroImage, 52, 1559–1573.
Weiner, K. S., & Grill-Spector, K. (2011). Not one extrastriate body area: Using anatomical landmarks, hMT+, and visual field maps to parcellate limb-selective activations in human lateral occipitotemporal cortex. NeuroImage, 56 (4), 2183–2199.
Weiner, K. S., & Grill-Spector, K. (2013). Neural representations of faces and limbs neighbor in human high-level visual cortex: Evidence for a new organization principle. Psychological Research, 77 (1), 74–97.
Williams, M. A., Dang, S., & Kanwisher, N. G. (2007). Only some spatial patterns of fMRI
response are read out in task performance. Nature Neuroscience, 10, 685–686.
Zangenehpour, S., & Chaudhuri, A. (2005). Patchy organization and asymmetric distribution of the neural correlates of face processing in monkey inferotemporal cortex. Current Biology, 15 (11), 993–1005.
Kalanit Grill-Spector
Representation of Spatial Relations
Keywords: spatial representations, frames of reference, spatial neglect, lateralization, cognitive maps
Representing the space around our bodies seems to serve three main functions or
goals: to “act,” to “know,” and to “talk.” First of all, humans need to navigate in their en
vironment. Some navigational skills require a very precise adaptation of the movement of
the entire body to the presence of other external bodies and obstacles (whether these are
static or also mobile). Without an adequate representation of physical distances between
external bodies (and between these and oneself), actions like running in a crowd or dri
ving in traffic would be impossible. Humans also possess hands and need to manipulate
objects; their highly developed control of fine finger movements makes possible the con
struction and use of tools. Engaging in complex sequences of manual actions requires the
positioning and direction of movements over precise and narrow areas of space (e.g.,
when typing on the keyboard, opening a lock with a key, making a knot, playing a musical instrument).
However, humans do not solely “act” in space. Humans also “cognize” about space. For
example, we can think about an object being present in a place although neither the ob
ject nor the place is any longer visible (i.e., Piagetian object permanence). We can think
about simple spatial schemata or complex mathematical spaces and geometries (Aflalo &
Graziano, 2006); the ability to represent space as a continuum may lie at the basis of our
understanding of objects’ permanence in space and, therefore, (p. 29) of numerosity (De
haene, 1997). We can also engage in endless construction of meaning and parse the phys
ical world into classes and concepts; in fact, abstraction and recognition of equivalences
between events (i.e., categorization) has obvious advantages, as the cultural evolution of
humans demonstrates. Categories can be expressed in symbols, which can be exchanged.
All this applies to spatial representations as well. For example, the category “to the left”
identifies as equivalent a whole class of positions and can be expressed either in verbal
language or with pictorial symbols (e.g., an arrow: ←). Clearly, humans not only “act” in
space and “know” space, but they also “talk” about space. The role played by spatial cog
nition in linguistic function should not be underestimated (e.g., “thinking for speaking”; Slobin, 1996). Thus, a categorical, nonmetric representation of space constitutes an additional spatial ability, one that could lay the groundwork for spatial reference in lan
guage.
The present discussion focuses on vision, in part because evolution has devoted to it a
large amount of the primate brain’s processing capacity (e.g., in humans, about 4 to 6 bil
lion neurons and 20 percent of the entire cortical area; Maunsell & Newsome, 1987; Wan
dell et al., 2009); in part because vision is the most accurate spatial sense in primates
and is central to the human representation of space (“to see is to know what is where by
looking”; Marr, 1982). Nevertheless, it is important to acknowledge that a representation
of space can be obtained in humans through several sensory modalities (e.g., kinesthesia)
and that blind people do possess a detailed knowledge of space and can move and act in it
as well as talk about it. Some animals possess a representation of navigational space that
can be obtained through sensory information that is not available to humans (e.g., a
“magnetic” sense; Johnsen & Lohmann, 2005; Ritz, 2009). Honeybees’ “dances” can com
municate spatial information that refers to locations where food can be found (Kirchner &
Braun, 1994; Menzel et al., 2000).
We begin by discussing (1) the existence of topographic maps in the brain. From these
maps, the brain extracts higher order representations of the external world that are dedi
cated to the basic distinction between (2) “what” is out there versus “where” it is. These
two types of information need to be integrated for the control of action, and this yields in
turn the representation of (3) “how” an object can be an object of action and “which” spe
cific object among multiple ones is located in a specific place at a given time. However, lo
calization of objects can take place according to multiple and parallel (4) spatial frames of
reference; the deployment of spatial attention can also occur along these multiple reference frames, as the pathology of spatial attention, or (5) neglect, after brain damage, has
clearly revealed. Regarding the brain’s specialization for spatial cognition, humans show
a strong degree of (6) cerebral lateralization that may be unique in the animal world. The
current evidence indicates a right hemisphere’s specialization for analog (coordinate)
spatial information versus a left hemisphere’s specialization for digital (categorical) spa
tial information. The representation of categorical spatial relations is also relevant for (7)
object recognition because an object’s structural description is made in terms of parts
and categorical spatial relations between these (i.e., the “where of what”). Finally, we dis
cuss the brain’s representation of large-scale space, as it is used in navigation, or the (8)
“cognitive map” of the environment.
The retina provides the initial topographic map for humans, where nearby scene points
are represented in the responses of nearby photoreceptors and, in turn, in a matrix of
neurons to which they provide their input. Areas of the brain receiving retinal input are
also topographically organized in retinotopic “visual field maps” (Tootell, Hadjikhani, et
al., 1998; Wandell et al., 2005). These preserve, to some extent, the geometric structure
of the retina, which in turn, by the laws of optic refraction, reflects the geometric struc
ture of the external visual world as a planar projection onto a two-dimensional sur
face (Figure 3.1).
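The planar projection just described can be made concrete with a toy pinhole model (a computational sketch, not a claim about the eye's actual optics; the function name and focal length are illustrative):

```python
def project(point3d, focal=1.0):
    """Pinhole (planar) projection of a 3-D scene point onto a 2-D image
    plane at distance `focal`. Nearby scene points land at nearby image
    points, which is the neighborhood-preserving property that retinotopic
    maps inherit from the retinal image."""
    x, y, z = point3d
    return (focal * x / z, focal * y / z)

# Two nearby scene points project to nearby image points:
a = project((1.0, 1.0, 2.0))   # (0.5, 0.5)
b = project((1.1, 1.0, 2.0))   # slightly to the right of a
```

The sketch only captures the geometric regularity; the retina's curved surface and refractive elements complicate, but do not abolish, this neighborhood preservation.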
There is now consensus that the topographic features of cortical and subcortical maps
are not incidental, but instead are essential to brain function (Kaas, 1997). A topographi
cally organized structure can depict visual information as “points” organized by their rel
ative locations in space and varying in size, brightness, and color. Points near each other
in the represented space are represented by points near each other in the representing
substrate; this (internal) space can be used to represent (external) space (Markman,
1999). Thus, one fundamental property of the brain’s representation of space is that the
brain uses space on the cortex to represent space in the world (Kosslyn, Thompson, & Ga
nis, 2006). Specifically, topographic maps can evaluate how input from one set of recep
tors can be different from that of adjoining sets of receptors. Local connections among
neurons that are topographically organized can easily set up center-surround receptive
fields and compare adjacent features. Other types of brain organization between units
sensitive to adjacent points require more complex arrays and longer connections (Cher
niak, 1990), which are metabolically (and evolutionarily) costly and result in increases in
neural transmission time. Cortical maps appear to be further arranged in spatial clusters
at a coarser scale (Wandell et al., 2005, 2009). This organization allows neural mosaics in
different maps that serve similar common computational goals to share resources (e.g.,
coordinating the timing of neural signals or temporarily storing memories; Wandell et al.,
2005). Thus, cortical maps may organize themselves to optimize nearest neighbor rela
tionships (Kohonen, 2001) so that neurons that process similar information are located
near each other, minimizing wiring length.
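The wiring-length argument can be illustrated with a toy calculation (the layouts below are hypothetical, chosen purely for illustration): units coding adjacent visual-field points are wired together, and a topographic arrangement is compared with a scrambled one.

```python
n = 8
# Each unit is wired to the unit coding the adjacent visual-field point.
connections = [(i, i + 1) for i in range(n - 1)]

topographic = list(range(n))          # unit i sits at cortical position i
scrambled = [3, 7, 0, 5, 1, 6, 2, 4]  # same units, arbitrary positions

def wiring_length(layout):
    """Total cortical distance spanned by the adjacency connections."""
    return sum(abs(layout[i] - layout[j]) for i, j in connections)

print(wiring_length(topographic))  # 7: every link spans one neighbor
print(wiring_length(scrambled))    # 31: far costlier wiring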
A topographical neural design is clearly revealed by the effects of localized cortical dam
age, resulting in a general loss of visual function restricted to a corresponding region
within the visual field (Horton & Hoyt, 1991). Several areas of the human visual cortex,
but also brainstem nuclei (e.g., the superior colliculus) and thalamus (e.g., the lateral
geniculate nucleus and the pulvinar), are organized into retinotopic maps, which
preserve both left–right and top–bottom ordering. Consequently, cells that are close to
gether on the sending surface (the retina) project to regions that are close together on
the target surface (Thivierge & Marcus, 2007). Remarkably, the dorsal surface of the hu
man brain, extending from the posterior portion of the intraparietal sulcus forward, con
tains several maps (400–700 mm²) that are much smaller than the V1 map (4000 mm²;
Wandell et al., 2005). The visual field is represented continuously as in V1, but the visual
field is split along the vertical meridian so that input to each hemisphere originates from
the contralateral visual hemifield. The two halves are thus “seamed” together by the long
connections of the corpus callosum.
If early vision’s topography has high resolution, later maps in the hierarchy are progres
sively less organized topographically (Malach, Levy, & Hasson, 2002). As a result, the im
age is represented at successive stages with decreasing spatial precision and resolution.
In addition, beyond the initial V1 cortical map, retinotopic maps are complex and patchy.
Consequently, adjacent points in the visual field are not represented in adjacent regions
of the same area in every case (Sereno et al., 1995). However, the topographic organiza
tion of external space in the visual cortex is extraordinarily veridical compared with other
modalities. For example, the representation of bodily space in primary somatosensory
cortex (or the so-called homunculus) clearly violates a smooth and continuous spatial rep
resentation of body parts. The face representation is inverted (Servos et al., 1999; Yang et
al., 1994), and the facial skin areas are located between those representing the thumb
and the lower lip of the mouth (Nguyen et al., 2004). Also, similarly to the extensive rep
resentation of the thumb in somatosensory cortex (in fact larger than that of the skin of
the whole face), cortical visual maps magnify or compress distances in some portions of
the visual field. The central 15 degrees of vision take up about 70 percent of cortical
area; and the central 24 degrees cover 80 percent (Fishman, 1997; Zeki, 1969). In V2,
parts of the retina that correspond to the upper half of the visual field are represented
separately from parts that respond to the lower half of the visual field. Area MT repre
sents only the binocular field, and V4 only the central 30 to 40 degrees, whereas the pari
etal areas represent more of the periphery (Gattass et al., 2005; Sereno et al., 2001). Cal
losal connections in humans allow areas of the inferior parietal cortex and the fusiform
gyrus in the temporal lobe to deal with stimuli presented in the ipsilateral visual field
(Tootell, Mendola, et al., 1998). These topographic organizations have been revealed by a
variety of methods, including clinical studies of patients (Fishman, 1997); animal re
search (Felleman & Van Essen, 1991; Tootell et al., 1982); and more recently, neuroimag
ing in healthy humans (Engel et al., 1994, 1997; Sereno et al., 1995; DeYoe et al., 1996;
Wandell, 1999) and brain stimulation (Kastner et al., 1998). In some of the retinotopic
mapping studies with functional magnetic resonance imaging (fMRI), participants per
formed working memory tasks or planned eye movements (Silver & Kastner, 2009). These studies revealed the existence of previously unknown topographic maps of visual
space in the human parietal (e.g., Sereno & Huang, 2006) and frontal (e.g., Kastner et al.,
2007) lobes.
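The magnification figures quoted above (central 15 degrees occupying roughly 70 percent of cortex) can be loosely reproduced with the standard logarithmic model of cortical magnification, M(E) ∝ 1/(E + E₂); the value of E₂ and the 90-degree field limit below are assumptions for illustration, not fitted parameters.

```python
import math

def central_fraction(ecc_deg, max_ecc=90.0, e2=0.75):
    """Fraction of the (linear) cortical extent devoted to the central
    `ecc_deg` degrees of the visual field, assuming magnification
    M(E) ~ 1/(E + e2), whose integral is logarithmic in eccentricity."""
    return math.log(1 + ecc_deg / e2) / math.log(1 + max_ecc / e2)

print(round(central_fraction(15.0), 2))   # roughly 0.6-0.7 of cortical extent
print(round(central_fraction(24.0), 2))   # a bit more for 24 degrees
```

Even this crude one-parameter model captures the qualitative point: a small central wedge of the field claims a disproportionate share of the map, just as the thumb does in somatosensory cortex.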
joined. Disjoining attributes of the same object exacerbates the problem of integrating
them (i.e., the so-called binding problem; Treisman, 1996; Revonsuo & Newman, 1999;
Seth et al., 2004). However, computer simulations with artificial neural networks have
demonstrated that two subsystems can be more efficient than one in computing different
mappings of the same input at the same time (Otto et al., 1992; Rueckl, Cave, & Kosslyn,
1989).
Thus, cognitive neuroscience reveals that the brains of humans and, perhaps, of all other
vertebrates process space and forms (bodies) as independent aspects of reality. Hence, the
visual brain can be divided between “two cortical visual systems” (Ingle, 1967; Mishkin,
Ungerleider & Macko, 1983; Schneider, 1967; Ungerleider & Mishkin, 1982). A ventral
(mostly temporal cortex in humans) visual stream mediates object identification (“what is
it?”), and a dorsal (mostly parietal cortex) visual stream mediates localization of objects
in space (“where is it?”). This partition is most clearly documented for the visual modality
in our species (and in other primates), although it appears to be equally valid for other
sensory processes (e.g., for the auditory system, Alain et al., 2001; Lomber & Malhotra,
2008; Romanski et al., 1999; for touch, Reed et al., 2005). Information processed in the
two streams appears to be integrated at a successive stage in the superior temporal lobe
(Morel & Bullier, 1990), where integration of the two pathways could recruit ocular move
ments to the location of a shape and provide some positional information to the temporal
areas (Otto et al., 1992). Further integration of shape and position information occurs in
Similarly to the classic Newtonian distinction between matter and absolute space, the dis
tinction between what and where processing assumes that (1) objects can be represented
by the brain independently from their locations, and (2) locations can be perceived as im
material points in an immaterial medium, empty of objects. Indeed, at a phenomenologi
cal level, space primarily appears as an all-pervading stuff, an incorporeal or “ethereal”
receptacle that can be filled by material bodies, or an openness in which matter spreads
out. In everyday activities, a region of space is typically identified with reference to
“what” is or could be located there (e.g., “the book is on the table”; Casati & Varzi, 1999).
However, we also need to represent empty or “negative” space because trajectories and
paths toward other objects or places (Collett, 1982) occur in a spatial medium that is free
of material obstacles. The visual modality also estimates distance (i.e., the empty portion
of space between objects), and vision can represent the future position of a moving object
into some unoccupied point in space (Gregory, 2009).
Humans exemplify the division of labor between the two visual systems. Bálint (1909) was
probably the first to report a human case in which the perception of visual location was
impaired while the visual recognition of an object was relatively spared. Bálint’s patient,
who had bilateral inferior parietal lesions, was unable to reach for objects or estimate dis
tances between objects. Holmes and Horax (1919) described similar patients and showed
that they had difficulty in judging differences in the lengths of two lines, but could easily
judge whether a unitary shape made by connecting the same two lines was a trapezoid
and not a rectangle (i.e., when the lines were part of a unitary shape). In general, pa
tients with damage to the parietal lobes lack the ability to judge objects’ positions in
space, as shown by their difficulties in reaching, grasping, pointing to, or verbally de
scribing their position and size (De Renzi, 1982). In contrast, “blindsight” patients, who
are unaware of the presence of an object, can nevertheless locate an object in the blind field of vision by pointing to it (Perenin & Jeannerod, 1978; Weiskrantz, 1986). Children with
Williams’ (developmental) syndrome show remarkable sparing of object recognition with
severe breakdown of spatial abilities (Landau et al., 2006). In contrast, patients with infe
rior temporal-occipital lesions, who have difficulty in recognizing the identity of objects
and reading (i.e., visual agnosia), typically show unimpaired spatial perception; they can
reach and manipulate objects and navigate without bumping into objects (Kinsbourne &
Warrington, 1962). Patients with visual agnosia after bilateral damage to the ventral sys
tem (Goodale et al., 1991) can also guide a movement in a normal and natural manner to
ward a vertical opening by inserting a piece of cardboard into it (Figure 3.2), but perform
at chance when asked to report either verbally or by adjusting the cardboard to match
the orientation of the opening (Milner & Goodale, 2008). It would thus seem that the spa
tial representations of the dorsal system can effectively guide action but cannot make
even simple pattern discriminations.
In addition, electrical recordings from individual neurons in monkeys’ parietal lobes re
veal cells that encode the shape of objects (Sereno & Maunsell, 1998; Taira et al., 1990; Sakata et al., 1997). However, these pattern (or “what”) representations in the dorsal
cortex are clearly action related; their representation of the geometrical properties of
shapes (e.g., orientation, size, depth, and motion) is used exclusively when reaching and
grasping objects. That is, they represent space in a “pragmatic” sense and without a conceptual content that is reportable (Faillenot et al., 1997; Jeannerod & Jacob, 2005; Va
lyear et al., 2005).
It is also clear that the dorsal system’s shape sensitivity does not derive from information
relayed by the ventral system because monkeys with large temporal lesions and profound
object recognition deficits are able to normally grasp small objects (Glickstein et al.,
1998) and catch flying insects. Similarly, patient D.F. (Goodale et al., 1991; Milner et al.,
1991; Milner & Goodale, 2006) showed preserved visuomotor abilities and could catch a
ball in flight (Carey et al., 1996) but could not recognize a drawing of an apple. When
asked to make a copy of it, she arranged straight lines into a spatially incoherent square-
like configuration (Servos et al., 1999). D.F.’s drawing deficit indicates that her spared
dorsal shape representations cannot be accessed or expressed symbolically, de
spite being usable as the targets of directed action. The fact that D.F. performed at
chance, when reporting the orientation of the opening in the previously described “post
ing” experiment, does not necessarily indicate that the conscious representation of space
is an exclusive function of the ventral system (Milner & Goodale, 2006, 2008). Nor does it
indicate that the dorsal system’s representation of space should be best described as a
“zombie agent” (Koch, 2004) or as a form of “blindsight without blindness” (Weiskrantz,
1997). In fact, patient D.F. made accurate metric judgments of which elements in an array
were nearest and which were farthest; when asked to reproduce an array of irregularly
positioned colored dots on a page, her rendition of their relative positions (e.g., what ele
ment was left of or below another element) was accurate, although their absolute posi
tioning was deficient (Carey et al., 2006). It seems likely that a perfect reproduction of an
array of elements requires comparing the copy to the model array as an overall “gestalt”
or perceptual template, a strategy that may depend on the shape perception mechanisms
of D.F.’s (damaged) ventral system.
Finally, it would seem that the ventral system in the absence of normal parietal lobes can
not support an entirely normal perception of shapes. Patients with extensive and bilateral
parietal lesions (i.e., with Bálint’s syndrome) do not show completely normal object or
shape perception; their object recognition is typically limited to one object or just a part
of it (Luria, 1963). Remarkably, these patients need an extraordinarily long time to ac
complish recognition of even a single object, thus revealing a severe reduction in object
processing rate (Duncan et al., 2003). Kahneman, Treisman, and Gibbs (1992) made a dis
tinction between object identification and object perception. They proposed that identifi
cation (i.e., the conscious experience of seeing an instance of an object) depends on form
ing a continuously updated, integrated representation of the shapes and their space–time
coordinates. The ventral and dorsal system may each support a form of “phenomenal”
consciousness (cf. Baars, 2002; Block, 1996), but they necessitate the functions of the oth
er system in order to generate a conscious representation that is reportable and accessi
ble to other parts of the brain (e.g., the executive areas of the frontal lobes; Lamme, 2003,
2006).
Spatial information (in particular fine-grained spatial information about distances, direc
tion, and size) is clearly essential to action. In this respect, much of the spatial informa
tion of the “where” system is actually in the service of movement planning and guidance
or of “praxis” (i.e., how to accomplish an action, especially early-movement planning; An
dersen & Buneo, 2002). In particular, the posterior parietal cortex of primates performs
the function of transforming visual information into a motor plan (Snyder et al., 1997).
For example, grasping an object with one hand is a common primate behavior that, ac
cording to studies of both monkeys and humans, depends on the spatial functions of the
parietal lobe (Castiello, 2005). Indeed, patients with damage to the superior parietal lobe
show striking deficits in visually guided grasping (i.e., optic ataxia; Perenin & Vighetto,
1988). Damage to this area may result in difficulties generating visual-motor transforma
tions that are necessary to mold the hand’s action to the shape and size of the object
(Jeannerod et al., 1994; Khan et al., 2005), as well as to take into account the position of
potential obstacles (Schindler et al., 2004).
The parietal lobes clearly support spatial representations detailed enough to provide the
coordinates for precise actions like reaching, grasping, pointing, touching, looking, and
avoiding a projectile. Given the spatial precision of both manual action and oculomotor
behavior, one would expect that the neural networks of the parietal cortex would include
units with the smallest possible spatial tuning (i.e., very small receptive fields; Gross &
Mishkin, 1977). By the same reasoning, the temporal cortex may preferentially include
units with large spatial tuning because the goal of such a neural network is to represent
the presence of a particular object, regardless of its spatial attributes (i.e., show dimen
sional and translational invariance). However, it turns out that both the parietal and tem
poral cortices contain units with large receptive fields (from 25 to 100 degrees; O’Reilly
et al., 1990) that exclude the fovea and can even represent large bilateral regions of the
visual field (Motter & Mountcastle, 1981).
Therefore, some property of these neural populations other than receptive field size must
underlie the ability of the parietal lobes to code positions precisely. A hint is given by
computational models (Ballard, 1986; Eurich & Schwegler, 1997; Fahle & Poggio, 1981;
Hinton, McClelland, & Rumelhart, 1986; O’Reilly et al., 1990) showing that a population
of neurons with large receptive fields, if these are appropriately overlapping, can be su
perior to a population of neurons with smaller receptive fields in its ability to pinpoint
something. Crucially, receptive fields of parietal neurons are staggered toward the pe
riphery of the visual field, whereas receptive fields of temporal neurons tend to crowd to
ward the central, foveal position of the visual field. Consequently, parietal neurons can ex
ploit coarse coding to pinpoint locations, whereas the temporal neurons, which provide
less variety in output (i.e., they all respond to stimuli in the fovea), trade off the ability to
locate an object with the ability to show invariance to spatial transformations. Spatial at
tention evokes increased activity in individual parietal cells (Constantinidis & Steinmetz,
2001) and within whole regions of the parietal lobes (Corbetta et al., 1993; 2000). There
fore, focusing attention onto a region of space can also facilitate computations of the rela
tive activation of overlapping receptive fields of cells and thus enhance the ability to pre
cisely localize objects (Tsal & Bareket, 2005; Tsal, Meiran, & Lamy, 1995).
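The coarse-coding result can be sketched numerically (the Gaussian tuning curves and centre-of-mass decoder below are modeling conveniences, not claims about the actual neural code): a handful of broadly tuned, heavily overlapping units recovers a stimulus position far more finely than the width of any single unit's receptive field.

```python
import math

centers = [i * 10.0 for i in range(11)]   # preferred positions, 0-100 degrees

def responses(stim, width):
    """Broad Gaussian tuning: every unit responds, graded by distance
    between its preferred position and the stimulus."""
    return [math.exp(-((stim - c) / width) ** 2) for c in centers]

def decode(resp):
    """Centre-of-mass (population) estimate of the stimulus position,
    computed from the relative activation of the overlapping fields."""
    return sum(r * c for r, c in zip(resp, centers)) / sum(resp)

# Receptive fields ~25 degrees wide, yet the population pinpoints 37 degrees:
estimate = decode(responses(37.0, width=25.0))
print(estimate)   # within about 1 degree of the true position
```

The precision comes from comparing graded responses across overlapping fields, exactly the computation the text suggests attention can facilitate.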
Following the common parlance among neuroscientists, who refer to major divisions between neural streams with interrogative pronouns, the “where” system is to a large extent also the “how” system of the brain (Goodale & Milner, 1992), or the “vision-for-action” system (whereas the “what” system has been labeled the “vision-for-perception” sys
tem by Milner & Goodale, 1995, 2008). However, spatial representations do much more
than guide action; we also “know” space and can perceive it without having an intention
al plan or having to perform any action. Although neuroimaging studies show that areas
within the human posterior parietal cortex are active when the individual is preparing to
act, the same areas are also active during the mere observation of others’ actions (Bucci
no et al., 2004; Rizzolatti & Craighero, 2004). One possibility is that these areas may be
automatically registering the “affordances” of objects that could be relevant to action, if
an action were to be performed (Culham & Valyear, 2006).
The superior and intraparietal regions of the parietal lobes are particularly engaged with
eye movements (Corbetta et al., 1998; Luna et al., 1998), but neuroimaging studies also
show that the parietal areas are active when gaze is fixed on a point on the screen and no
action is required, while the observer simply attends to small objects moving randomly on
the screen (Culham et al., 1998). The larger the number of moving objects that have to be
attentively monitored on the screen, the greater is the activation in the parietal lobe (Cul
ham et al., 2001). Moreover, covert attention to spatial positions that are empty of objects
(i.e., before they appear in the expected locations) strongly engages mappings in the pari
etal lobes (Corbetta et al., 2000; Kastner et al., 1999). Monkeys also show neural popula
tion activity within the parietal cortex when they solve visual maze tasks by mentally following a path, without moving their eyes or performing any action (Crowe et al.,
2005). In human neuroimaging studies, a stimulus position judgment (left vs. right) in re
lation to the body midline mainly activates the superior parietal lobe (Neggers et al.,
2006), although pointing to or reaching is not required.
In addition, our cognition of space is also qualitative or “categorical” (Hayward & Tarr,
1995). This type of spatial information is too abstract to be useful in fine motor guidance. Yet, neuropsychological evidence clearly indicates that this type of spatial
perception and knowledge is also dependent on parietal lobe function (Laeng, 1994).
Thus, the superior and inferior regions of the parietal lobes may specialize, respectively,
in two visual-spatial functions: vision-for-action versus vision-for-knowledge, or a “prag
matic” versus a “semantic” function that encodes the behavioral relevance or meaning of
stimuli (Freedman & Assad, 2006; Jeannerod & Jacob, 2005). The superior parietal lobe
(i.e., the dorsal part of the dorsal system) may have a functional role close to the idea of
an “agent” directly operating in space, whereas the inferior parietal lobe plays a role clos
er to that of an “observer” that understands space and registers others’ actions as they
evolve in space (Rizzolatti & Matelli, 2003).
According to Milner and Goodale (2006), the polysensory areas of the inferior parietal
lobe and superior temporal cortex may have developed in humans as new functional ar
eas and be absent in monkeys. Thus, they can be considered a “third stream” of visual
processing. More generally, they may function as a supramodal convergent system between the dorsal and ventral systems that supports forms of spatial cognition that are
unique to our species (e.g., use of pictorial maps; Farrell & Robertson, 2000; Semmes et
al., 1955). These high-level representational systems in the human parietal lobe may also
provide the substrate for the construction and spatial manipulation of mental images
(e.g., the three-dimensional representation of shapes and the ability to “mentally rotate”).
Indeed, mental rotation of single shapes is vulnerable to lesions of the posterior parietal
lobe (in the right hemisphere; Butters et al., 1970) or to its temporary and reversible de
activation after transcranial magnetic stimulation (Harris & Miniussi, 2003) or stimula
tion with cortically implanted electrodes in epileptic patients (Zacks et al., 2003). Neu
roimaging confirms that imagining spatial transformations of shapes (i.e., mental rota
tion) produces activity in the parietal lobes (e.g., Alivisatos & Petrides, 1997; Carpenter et
al., 1999a; Harris et al., 2000; Jordan et al., 2002; Just et al., 2001; Kosslyn, DiGirolamo,
et al., 1998). Specifically, it is the right superior parietal cortex that seems most involved
in the mental rotation of objects (Parsons, 2003). Note that space in the visual cortex is
represented as two-dimensional space (i.e., as a planar projection of space in the world),
but disparity information from each retina can be used to reconstruct the objects’ three-
dimensional (3D) shape and the depth and global 3D layout of a scene. Neuroimaging in
humans and electrical recordings in monkeys both indicate that the posterior parietal
lobe is crucial to cortical 3D processing (Naganuma et al., 2005; Tsao et al., 2003). The
parietal lobe of monkeys also contains neurons that are selectively sensitive to 3D infor
mation from monocular information like texture gradients (Tsutsui et al., 2002).
Thus, the human inferior parietal lobe or Brodmann area 39 (also known as the angular
gyrus) would then be a key brain structure for our subjective experience of space or for
“space awareness.” This area may contribute to forming individual representations of
multiple objects by representing the spatial distribution of their contours and boundaries
(Robertson et al., 1997). Such a combination of “what” with “where” information would
result in selecting “which” objects will be consciously perceived. In sum, the posterior
parietal cortex in humans and primates appears to be the control center for visual-spatial
functions and the hub of a widely distributed brain system for the processing of spatial in
formation (Graziano & Gross, 1995; Mountcastle, 1995). This distributed spatial system
would include the premotor cortex, putamen, frontal eye fields, superior colliculus, and
hippocampus. Parietal lesions, however, would disrupt critical input to this distributed
system of spatial representations.
the eyes or the head’s vertical axis). Such computations are in principle possible with the
use of various frames of reference.
The initial visual frame of reference is retinotopic. However, this frame represents a view
of the world that changes with each eye movement and therefore is of limited use for con
trolling action. In fact, the primate brain uses multiple frames of reference, which are ob
tained by integrating information from the other sense modalities with the retinal infor
mation. These subsequent frames of reference provide a more stable representation of
the visual world (Feldman, 1985). In the parietal lobe, neurons combine information to a
stimulus in a particular location with information about the position of the eyes
(Andersen & Buneo, 2002; Andersen, Essick, & Siegel, 1985), which is updated across
saccades (Heide et al., 1995). Neurons with head-centered receptive fields are also found
in regions of the monkey’s parietal lobe (Duhamel et al., 1997). Comparing location on
the retina to one internal to the observer’s body is an effective way to compute position
within a spatiotopic frame of reference, as also shown by computer simulations (Zipser &
Andersen, 1988). In the monkey, parietal neurons can also code spatial relationships as
referenced to an object and not necessarily to an absolute position relative to the viewer
(Chafee et al., 2005, 2007).
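The computation modeled by Zipser and Andersen can be caricatured as simple vector addition (actual gain-field models are nonlinear networks; this sketch only shows why combining the two signals yields a gaze-stable, head-centered location):

```python
def head_centered(retinal, eye_in_head):
    """Combine a retinal stimulus location with the current eye-in-head
    position to obtain a head-centered (spatiotopic) location."""
    return tuple(r + e for r, e in zip(retinal, eye_in_head))

# A world-fixed target seen before and after a 10-degree rightward saccade:
before = head_centered((5.0, 0.0), (0.0, 0.0))    # eyes straight ahead
after = head_centered((-5.0, 0.0), (10.0, 0.0))   # retinal image shifted, eyes rotated
print(before == after)   # True: the head-centered location is saccade-invariant
```

The retinal coordinate changes with every saccade, but the sum stays fixed, which is what makes such a frame useful for controlling action.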
A reference frame can be defined by an origin and its axes. These can be conceived as
rigidly attached or fixed onto something (an object or an element of the environment) or
someone (e.g., the viewer’s head or the hand). For example, the premotor cortex of mon
keys contains neurons that respond to touch, and their receptive fields form a crude map
of the body surface (Avillac et al., 2005). These neurons are bimodal in that they also re
spond to visual stimuli that are adjacent in space to the area of skin they represent (e.g.,
the face or an arm). However, these cells’ receptive fields are not retinotopic; instead,
when the eyes move, their visual receptive fields remain in register with their respective
tactile fields (Gross & Graziano, 1995; Kitada et al., 2006). For example, a bimodal neu
ron with a facial tactile field responds as if its visual field is an inflated balloon glued to
the side of the face. About 20 percent of these bimodal neurons continue their activity af
ter lights are turned off, so as to also code the memory of an object’s location (Graziano,
Hu, & Gross, 1997). Apparently, some of these neurons are specialized for withdrawing
from an object rather than for reaching it (Graziano & Cooke, 2006). Neurons that inte
grate several modalities at once have also been found within the premotor cortex of the
monkey; trimodal neurons (visual, tactile, and auditory; Figure 3.3) have receptive fields
that respond to a sound stimulus located in the space surrounding the head, within
roughly 30 cm (Graziano, Reiss, & Gross, 1999). Neuroimaging reveals maximal activity
in the human dorsal parieto-occipital sulcus when viewing objects looming near the face
(i.e., in a range of 13 to 17 cm), and this neural response decreases proportionally to dis
tance from the face (Quinlan & Culham, 2007).
The superior parietal lobe of monkeys might be the substrate for body-centered positional
codes for limb movements, where coordinates define the azimuth, elevation, and distance
of the hand (Lacquaniti et al.).
Representation of Spatial Relations
5. Neglecting Space
Localizing objects according to multiple and parallel spatial frames of reference is also
relevant to the manner in which spatial attention is deployed. After brain damage, an atten
tional deficit, the “neglect” of space, clearly reveals how attention can be allocated
within different frames of reference. Neglect is a clinical disorder that is characterized by
a failure to notice objects to one side (typically, the left). However, “left” and “right” must
be defined with respect to some frame of reference (Beschin et al., 1997; Humphreys &
Riddoch, 1994; Pouget & Snyder, 2000), and several aspects of the neglect syndrome are
best understood in terms of different and specific frames of reference. That is, an object
can be on the left side with respect to the eyes, head, or body, or with respect to some ax
is placed on the object (e.g., the left side of the person facing the patient). In the latter
case, one can consider the left side of an object (e.g., of a ball) as (1) based on a vector
originating from the viewer or (2) based on the intrinsic geometry of the object (e.g., the
left paw of the cat). These frames of reference can be dissociated by positioning the parts
of the body or of the object differently. For example, a patient’s head may turn to the
right, but gaze can be positioned far to the left. Thus, another person directly in front of
the patient would lie to the left with respect to the patient’s head and to the right with re
spect to the patient’s eyes. Moreover, the person in front of the patient would have her
right hand to the left of the patient’s body, but if she turned 180 degrees, her right hand
would then lie to the right of the patient’s body.
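The head/eye/body example above can be made concrete with a small sketch. All angles and the `side_of` helper are hypothetical conveniences, not quantities taken from the clinical literature:

```python
# A toy illustration (hypothetical numbers) of how "left" and "right" depend on
# the reference frame: the same target can be left of the head axis, right of
# the gaze axis, and straight ahead of the trunk, as in the example above.

def side_of(target_azimuth: float, frame_azimuth: float) -> str:
    """Side of the target relative to a frame's forward axis (positive = right)."""
    delta = target_azimuth - frame_azimuth
    if delta > 0:
        return "right"
    if delta < 0:
        return "left"
    return "straight ahead"

trunk, head, gaze = 0.0, 30.0, -40.0   # head turned right, gaze far left (deg)
target = 0.0                           # another person directly in front

print("relative to trunk:", side_of(target, trunk))  # straight ahead
print("relative to head:", side_of(target, head))    # left
print("relative to gaze:", side_of(target, gaze))    # right
```

The point of the sketch is that "left" is not a property of the target at all: it is a relation between the target and whichever axis the computation takes as its origin.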
According to a notion of multiple coordinate systems, different forms of neglect will mani
fest depending on the specific portions of parietal or frontal cortex that are damaged.
These will reflect a complex mixture of various coordinate frames. Thus, if a lesion of the
parietal lobe causes a deficit in a distributed code of locations that can be read out in a
variety of reference frames (Andersen & Buneo, 2002), neglect behavior will emerge in
the successive visual transformations (Driver & Pouget, 2000). It may also be manifested
within an object-centered reference frame (Behrmann & Moscovitch, 1994). Indeed, neglect
patients can show various mixtures and dissociations between the reference frames;
thus, some patients show both object-centered and viewer-centered neglect (Behrmann &
Tipper, 1999), but other patients show neglect in just one of these frames (Hillis & Cara
mazza, 1991; Tipper & Behrmann, 1996). For example, Laeng and colleagues (2002)
asked a neglect patient to report the colors of two objects (cubes) that could either lie on
a table positioned near or far from the patient or be held in the left and right hands of the
experimenter. In the latter case, the experimenter either faced the patient or turned
backward so that the cubes held in her hands could lie in either the left or right hemi
space (Figure 3.4). Thus, the cubes’ position in space was also determined by the
experimenter’s (p. 39) body position (i.e., they could be described according to an exter
nal body’s object-centered frame). Moreover, by use of a mirror, the cubes could be seen
in the mirror far away, although they were “near” the patient’s body, so that the patient
actually looked at a “far” location (i.e., the surface of the mirror) to see the physically
near object. The experiment confirmed the presence of all forms of neglect. Not only did
the patient name the color of a cube seen in his left hemispace more slowly than in his
right hemispace, but also latencies increased for a cube held by the experimenter in her
left hand and in the patient’s left hemispace (whether the left hand was seen directly or
as a mirror reflection). Finally, the patient’s performance was worse for “far” than “near”
locations. He neglected cubes located near his body (i.e., within “grasping” space) but
seen in the mirror, thus dissociating the direction of gaze (toward extrapersonal space)
from the physical location of the object (in peripersonal space).
In most accounts of spatial attention, shifting occurs within coordinate frames that can be
defined by a portion (or side) of a spatial plane that is orthogonally transected by some
egocentric axis (Bisiach et al., 1985). However, together with the classic form of neglect
for stimuli or features to the left of the body (or an object’s) midline, neglect can also oc
cur below or above the horizontal plane or in the lower (Butter et al., 1989) versus the
upper visual field (Shelton et al., 1990). In addition, several neglect behaviors would seem
to occur in spatial frames of reference that are best defined by vectors (p. 40) (Kins
bourne, 1993) or polar coordinates (Halligan & Marshall, 1995), so that either there is no
abrupt boundary for the deficit to occur or the neglected areas are best described by an
nular regions of space around the patient’s body (e.g., grasping or near, peripersonal,
space). Neurological studies have identified patients with more severe neglect for stimuli
within near or reaching space than for stimuli confined beyond the peripersonal region in
far, extrapersonal, space (Halligan & Marshall, 1991; Laeng et al., 2002) as well as pa
tients with the opposite pattern of deficit (Cowey et al., 1994; Mennemeier et al., 1992).
These findings appear consistent with the evidence from neurophysiology studies in mon
keys (e.g., Graziano et al., 1994), where spatial position can be defined within a bounded
region of space anchored to the head or arm. Moreover, a dysfunction within patients’ inferior pari
etal regions is most likely to result in neglect occurring in an “egocentric” spatial frame
of reference (i.e., closely related to action control within personal space), whereas dys
function within the superior temporal region is most likely to result in “allocentric” ne
glect occurring in a spatial frame of reference centered on the visible objects in extraper
sonal space (Committeri et al., 2004, 2007; Hillis, 2006).
Patients with right parietal lesions also have difficulties exploring “virtual space” (i.e., lo
cating objects within their own mental images). For example, patients with left-sided ne
glect are unable to describe from their visual memory left-sided buildings in a city scene
(“left” being based on their imagined position within a city’s square; Beschin et al., 2000;
Bisiach & Luzzatti, 1978). Such patients may also be unable to spell the beginning of
words (i.e., unable to read the left side of the word from an imaginary display; Baxter &
Warrington, 1983). However, patients with neglect specific to words (or “neglect dyslex
ia”) after a left-hemisphere lesion can show a spelling deficit for the ends of words (Cara
mazza & Hillis, 1990).
Neuropsychological findings also have shown that not only lesions of the inferior parietal
lobe but also those of the frontal lobe and the temporal-parietal-occipital junction lead to
unilateral neglect. Remarkably, damage to the rostral part of the superior temporal cortex
of the right hemisphere results in profound spatial neglect in both humans and monkeys
(Karnath et al., 2001), characterized by a marked lack of awareness for objects
in the left hemispace. Because the homologous area of the left hemisphere is specialized
for language in humans, this may have preempted the spatial function of the left superior
temporal cortex, causing a right-sided dominance for space-related information (Wein
traub & Mesulam, 1987). One possibility is that the right-sided superior temporal cortex
plays an integrative role with regard to the ventral and dorsal streams (Karnath, 2001)
because the superior temporal gyrus is adjacent to the inferior areas of the dorsal system,
receives input from both streams, and is therefore a site for multimodal sensory con
vergence (Seltzer & Pandya, 1978). However, none of these individual areas should be in
terpreted as the “seat” of the conscious perception of spatially situated objects. In fact,
no cortical area alone may be sufficient for visual awareness (Koch, 2004; Lamme et al.,
2000). Most likely, a conscious percept is the expression of a distributed neural network
and not of any neural bottleneck. That is, a conscious percept is the gradual product of
recurrent and interacting neural activity from several reciprocally interconnected regions
and streams (Lamme, 2003, 2006). Nevertheless, the selective injury of a convergence
zone, like the superior temporal lobe, could disrupt representations that are necessary
(but not sufficient) to spatial awareness.
Interestingly, patients with subcortical lesions and without detectable damage of either
temporal or parietal cortex also show neglect symptoms. However, blood perfusion mea
surements in these patients reveal that the inferior parietal lobe is hypoperfused and
therefore dysfunctional (Hillis et al., 2005). Similarly, damage to the temporoparietal
junction, an area neighboring both the ventral and dorsal systems, produces abnormal
correlation of the resting state signal between left and right inferior parietal lobes, which
are not directly damaged; this abnormality correlates with the severity of neglect (Corbet
ta et al., 2008; He et al., 2007). Therefore, the “functional lesion” underlying neglect may
include a more extensive area than what is revealed by structural magnetic resonance imaging, by
disrupting underlying association or recurrent circuits (e.g., parietal-frontal pathways;
Thiebaut de Schotten et al., 2005).
There is a clear analogy between this evolutionarily adaptive division of labor between
the vertebrate cerebral hemispheres and the performance of artificial neural networks
that segregate processing into multiple, smaller subsystems (Otto et al., 1992; Rueckl,
Cave, & Kosslyn, 1989). Most relevant, this principle of division of labor has also been ap
plied to the modularization of function for types of spatial representations. Specifically,
Kosslyn (1987) proposed the existence of two neural subnetworks within the dorsal
stream that process qualitatively different types of spatial information. One spatial repre
sentation is based on a quantitative parsing of space and therefore closely related to that
of spatial information in the service of action. This type of representation is called coordi
nate (Kosslyn, 1987) because it is derived from representations that provide coordinates
for navigating through the environment as well as for performing targeted actions such as
reaching, grasping, hitting, throwing, and pointing to something. In contrast, the other
hypothesized type of spatial representation, labeled categorical spatial relation, parses
space in a qualitative manner. For example, two configurations can be described as “one
to the left of the other.” Thus, qualitative spatial relations are based on the perception of
spatial categories, where an object (but also an empty place) is assigned to a broad equiv
alence class of spatial positions (e.g., a briefcase is “on the floor,” and being “on the
floor” is satisfied by placement on any of the particular tiles that make up the whole
floor).
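As a minimal illustration of this distinction, the same pair of object positions yields both a coordinate descriptor and a categorical one. The function names, thresholds, and example layout below are invented for the sketch, not taken from the chapter:

```python
# A minimal sketch of the categorical/coordinate distinction: from one spatial
# layout we can derive a quantitative (coordinate) descriptor and a qualitative
# (categorical) one. Names and example coordinates are illustrative.
import math

def coordinate_relation(a, b):
    """Metric descriptor: the exact distance between two objects."""
    return math.dist(a, b)

def categorical_relation(a, b):
    """Qualitative descriptor: a broad equivalence class, ignoring exact metrics."""
    horizontal = "left of" if a[0] < b[0] else "right of"
    vertical = "above" if a[1] > b[1] else "below"
    return horizontal, vertical

cup, plate = (1.0, 4.0), (3.0, 1.0)
print(coordinate_relation(cup, plate))    # changes with any small shift
print(categorical_relation(cup, plate))   # stable over a whole region of shifts
```

Note the asymmetry the chapter emphasizes: the coordinate value changes continuously with every displacement, whereas the categorical value is invariant over an entire equivalence class of positions.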
Each of the two proposed separate networks would be complementarily lateralized. Thus,
the brain can represent in parallel the same spatial layout in at least two separate man
ners (Laeng et al., 2003): a right-hemisphere mode that assesses “analog” spatial
relations (e.g., the distance between two objects) and a left-hemisphere mode that assess
es “digital” spatial relations (e.g., whether two objects are attached to one another or
one above or below the other). The underlying assumption in the above account is that
computing separately the two spatial relations (instead of, e.g., taking the quantitative repre
sentation and making it coarser by grouping the finer locations) could result in a more ef
ficient representation of space, where both properties can be attended simultaneously.
Artificial neural network simulations of these spatial judgments provide support for more
efficient processing in “split” networks than unitary networks (Jacobs & Kosslyn, 1994;
Kosslyn, Chabris, et al., 1992; Kosslyn & Jacobs, 1994). These studies have shown that,
when trained to make either digital or analog spatial judgments, the networks encode
each relation more effectively if their input is based, respectively, on units with relatively
small, nonoverlapping receptive fields or on units with relatively large, overlap
ping receptive fields (Jacobs & Kosslyn, 1994). Overlap of location detectors would then
promote the representation of distance, based on a “coarse coding” strategy (Ballard,
1986; Eurich & Schwegler, 1997; Fahle & Poggio, 1981; Hinton, McClelland, & Rumel
hart, 1986). In contrast, the absence of overlap between location detectors benefits the
representation of digital or categorical spatial relations, by effectively parsing space.
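The receptive-field account can be sketched as a toy simulation. This is a simplification of the Jacobs–Kosslyn networks, not a reimplementation; the Gaussian tuning, detector centers, and widths below are arbitrary choices:

```python
# Sketch of the receptive-field account (assumptions: 1-D space, Gaussian
# tuning, invented parameters). Large overlapping detectors support a graded
# "coarse code" for fine position; small nonoverlapping detectors parse space
# into discrete bins suited to categorical judgments.
import math

CENTERS = [-2.0, -1.0, 0.0, 1.0, 2.0]   # preferred locations of the detectors

def population_response(x, width):
    """Gaussian activity of each location detector to a stimulus at x."""
    return [math.exp(-((x - c) ** 2) / (2 * width ** 2)) for c in CENTERS]

def coarse_decode(x, width=0.8):
    """Broad, overlapping fields: the activity-weighted mean of the detector
    centers recovers a graded position estimate (coarse coding)."""
    r = population_response(x, width)
    return sum(c * a for c, a in zip(CENTERS, r)) / sum(r)

def categorical_decode(x, width=0.2):
    """Narrow, nonoverlapping fields: essentially one detector fires,
    assigning the stimulus to a discrete spatial bin."""
    r = population_response(x, width)
    return CENTERS[r.index(max(r))]

print(round(coarse_decode(0.4), 2))   # -> 0.39, a graded estimate near 0.4
print(categorical_decode(0.4))        # -> 0.0, snapped to the nearest bin
```

The same stimulus thus yields either a near-metric estimate or a discrete bin assignment, depending only on the width and overlap of the input fields, which is the computational contrast the simulations exploit.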
Consistent with the above computational account, Laeng, Okubo, Saneyoshi, and Michi
mata (2011) observed that spreading the attention window to encompass an area that in
cludes two objects or narrowing it to encompass an area that includes only one of the ob
jects can modulate the ability to represent each type of spatial relation. In this study, the
spatial attention window was manipulated to select regions of differing areas by use of
cues of differing sizes that preceded the presentation of pairs of stimuli. The main as
sumption was that larger cues would encourage a more diffused attention allocation,
whereas the small cues would encourage a more focused mode of attention. When the at
tention window was large (by cueing an area that included both objects as well as the
empty space between them), spatial transformations of distance between two objects
were noticed faster than when (p. 42) the attention window was relatively smaller (i.e.,
when cueing an area that included no more than one of the objects in the pair). Laeng
and colleagues concluded that a relatively larger attention window would facilitate the
processing of an increased number of overlapping spatial detectors so as to include (and
thus “measure”) the empty space in between or the spatial extent of each form (when
judging, e.g., size or volume). In contrast, smaller nonoverlapping spatial detectors would
facilitate parsing space into discrete bins or regions and, therefore, the processing of cat
egorical spatial transformations; indeed, left–right and above–below changes were noticed
faster in the relatively smaller cueing condition than in the larger one (see also Okubo et al., 2010).
The theoretical distinction between analog and digital spatial functions is relatively re
cent, but early neurological investigations had already noticed that some spatial functions
(e.g., distinguishing left from right) are commonly impaired after damage to the posterior
parietal cortex of the left hemisphere, whereas impairment of other spatial functions, like
judging an object’s orientation or exact position, is typical after damage to the same area
in the opposite, right, hemisphere (Luria, 1973). The fact that different forms of spatial
dysfunctions can occur independently for each hemisphere has been repeatedly con
firmed by studies of patients with unilateral lesions (Laeng, 1994, 2006; Palermo et al.,
2008) as well as by neuroimaging studies of normal individuals (Baciu et al., 1999; Koss
lyn et al., 1998; Slotnick & Moo, 2006; Trojano et al., 2002). Complementary results have
been obtained with behavioral methods that use the lateralized (and tachistoscopic) pre
sentation of visual stimuli to normal participants (e.g., Banich & Federmeier, 1999; Bruy
er et al., 1997; Bullens & Postma, 2008; Hellige & Michimata, 1989; Kosslyn, 1987; Koss
lyn et al., 1989, 1995; Laeng et al., 1997; Laeng & Peters, 1995; Roth & Hellige, 1998; Ry
bash & Hoyer, 1992).
Laeng (1994, 2006) showed a double dissociation between failures to notice changes in
categorical spatial relations and changes in coordinate spatial relations. A group of pa
tients with unilateral damage to the right hemisphere had difficulty noticing a change in
distance or angle between two figures of animals presented successively. The same pa
tients had less difficulty noticing a change of relative orientation (e.g., left vs. right or
above vs. below) between the same animals. In contrast, the patients with left-hemi
sphere damage had considerably less difficulty noticing that the distance between the
two animals had either increased or decreased. In another study (Laeng, 2006), similar
groups of patients with unilateral lesions made corresponding errors in spatial construc
tion tasks from memory (e.g., building patterns made of matchsticks; relocating figures of
animals on a cardboard). Distortions in reproducing the angle between two elements and
inaccuracies in relocating the objects to their original positions were more common after
damage to the right hemisphere (see also Kessels et al., 2002), whereas mirror reversals
of elements of a pattern were more common after damage to the left hemisphere. A study
by Palermo and colleagues (2008) showed that patients with damage confined to the left
hemisphere had difficulty visually imagining whether a dot shown in a specific position
would fall inside or outside of a previously seen circle. These patients were relatively bet
ter in visually imagining whether a dot shown in a specific position would be nearer to or
farther from the circle’s circumference than another dot previously seen together with
the same circle. The opposite pattern of deficit was observed in the patients with right-
hemisphere damage. Another study with patients by Amorapanth, Widick, and Chatterjee
(2010) showed that lesions to a network of areas in the left hemisphere resulted in more
severe impairment in judging categorical spatial relations (i.e., the above–below relations
between pairs of objects) than lesions to homologous areas of the right hemisphere. Also
in this study, the reverse pattern of impairment was observed for coordinate spatial pro
cessing, where right-brain damage produced more severe deficit than left-hemisphere
damage.
The above evidence with patients is generally consistent with that from studies with
healthy participants, in particular studies using the lateralized tachistoscopic method. In
these studies, the relative advantages in speed of response to stimuli presented either to
the left or right of fixation indicated superiority of the right hemisphere (i.e., left visual
field) for analog judgments and of the left hemisphere (i.e., right visual field) for digital
judgments. However, in studies with healthy subjects, the lateral differences appear to be
small (i.e., on the order of a few tens of milliseconds according to a meta-analysis; Laeng
et al., 2003). Nevertheless, small effect sizes identified with such a noninvasive method
are actually greater than effect sizes in percent blood oxygenation observed with fMRI.
Most important, both behavioral effects can predict very dramatic outcomes after dam
age to the same region or side of the brain. Another method, whereby the same cortical
sites can be temporarily and reversibly deactivated (i.e., transcranial magnetic stimula
tion [TMS]), (p. 43) provides converging evidence. Left-sided stimulation can effectively
mimic the deficit in categorical perception after left-hemisphere damage, whereas right-
sided stimulation mimics the deficit in coordinate space perception after right-hemi
sphere damage (Slotnick et al., 2001; Trojano et al., 2006).
A common finding from studies using methods with localizing power (e.g., neuroimaging,
TMS, and selected patients) is that both parietal lobes play a key role in supporting the
perception of spatial relations (e.g., Amorapanth et al., 2010; Baciu et al., 1999; Kosslyn
et al., 1998; Laeng et al., 2002; Trojano et al., 2006). Moreover, areas of the left and right
prefrontal cortex that receive direct input from ipsilateral parietal areas also show activi
ty when categorical or coordinate spatial information, respectively, is held in memory
(Kosslyn, Thompson, et al., 1998; Trojano et al., 2002). In an fMRI study (Slotnick & Moo,
2006), participants viewed in each trial a configuration consisting of a shape and a dot
placed at a variable distance from the shape (either “on” or “off” the shape and, in the
latter case, either “near” or “far” from the shape). In the subsequent retrieval task, the
shape was presented without the dot, and participants responded to queries about the
previously seen spatial layout (e.g., either about a categorical spatial relation property:
“was the dot ‘on’ or ‘off’ the shape?”; or about a coordinate spatial relation property:
“was the dot ‘near’ to or ‘far’ from the shape?”). Spatial memories for coordinate rela
tions were accompanied by increased activity in the right hemisphere’s prefrontal cortex,
whereas memories for categorical relations were accompanied by activity in the left
hemisphere’s prefrontal cortex (see Figure 3.5).
One should note that the above studies on the perception of categorical and coordinate
relations do not typically involve any specific action in space, but instead involve only ob
servational judgments (e.g., noticing or remembering the position of objects in a display).
Indeed, a child’s initial cognition of space and of objects’ numerical identity may be en
tirely based on a purely observational representation of space whereby the child notices
that entities preserve their identities and trajectories when they disappear behind other
objects and reappear within gaps of empty space (Dehaene & Changeux, 1993; Xu &
Carey, 1996). The above findings from neuroscience studies clearly point to a role of the
dorsal system in representing spatial information beyond the mere service of action (cf.
Milner & Goodale, 2008). Thus, the evidence from categorical and coordinate spatial pro
cessing, together with the literature on other spatial transformations or operations (e.g.,
mental rotations of shapes, visual maze solving) clearly indicates that a parietal-frontal
system supports not merely the “act” function but also two other central func
tions of visual-spatial representations: to “know” and “talk.” The latter, symbolic function
would seem of particular relevance to our species and the only one that we do not share
with other living beings (except, perhaps, honeybees; Kirchner & Braun, 1994; Menzel et
al., 2000).
That is, humans can put into words or verbal propositions (as well as into gestures) any
type of (p. 44) spatial relations, whether quantitative (by use of numerical systems and
geometric systems specifying angles and eccentricities) or qualitative (by use of preposi
tions and locutions). However, quantitative propositions may require measurement with
tools, whereas establishing qualitative spatial relations between objects would seem to
require merely looking at them (Ullman, 1984). If abstract spatial relations between ob
jects in a visual scene can be effortlessly perceived, these representations are particular
ly apt to be efficiently coded in a propositional manner (e.g., “on top of”). The latter lin
guistic property seems pervasive in all languages of the world and also pervades dai
ly conversations. Some “localist” linguists have proposed that the deep semantic struc
ture of language is intrinsically spatial (Cook, 1989). Some cognitive neuroscientists
have also suggested that language in our species may have originated precisely from the
need to transmit information about the spatial layout of an area from one person to anoth
er (O’Keefe, 2003; O’Keefe & Nadel, 1978).
Locative prepositions are often used to refer to different spatial relations in a quick and
frugal manner (e.g., “above,” “alongside,” “around,” “behind,” “between,” “inside,” “left,”
“on top of,” “opposite,” “south,” “toward,” “underneath”); their grammatical class may
exist in all languages (Jackendoff & Landau, 1992; Johnson-Laird, 2005; Kemmerer, 2006;
Miller & Johnson-Laird, 1976; Pinker, 2007). Clearly, spatial prepositions embedded in
sentences (e.g., the man is “in” the house) can express spatial relations only in a rather
abstract manner (compared, for example, with how GPS coordinates can pinpoint space)
and can guide actions and navigation only in a very coarse sense (e.g., by narrowing
down an area of search). Locative prepositions resemble categorical spatial relations in
that they express spatial relationships in terms of sketchy or schematic structural proper
ties of the objects, often ignoring details of spatial metrics (e.g., size, orientation, dis
tance; Talmy, 2000). Nevertheless, the abstract relations of locative prepositions seem ex
tremely useful to our species because they can become the referents of vital communica
tion. Moreover, categorical spatial representations and their verbal expression counter
parts may underlie the conceptual structure of several other useful representations
(Miller & Johnson-Laird, 1976), like the representations of time and of numerical entities
(Hubbard et al., 2005). Indeed, categorical spatial representations could provide the ba
sic mental scaffolding for semantics (Cook, 1989; Jeannerod & Jacob, 2005), metaphors
(Lakoff & Johnson, 1999), and reasoning in general (Goel et al., 1998; Johnson-Laird,
2005; Pinker, 1990).
O’Keefe (1996, 2003) has proposed that the primary function of locative prepositions is to
identify a set of spatial vectors between places. The neural substrate supporting such
function would consist of a specific class of neurons or “place cells” within the right hip
pocampus and of cerebral structures interconnected with the hippocampus. Specifically, a
combination of the receptive fields of several place cells would define boundaries of re
gions in space that effectively constitute the referential meaning of a preposition. For ex
ample, the preposition “below” would identify a “place field” with its center on the verti
cal direction vector from a reference object. The width of such a place field would typical
ly be larger than the width of the reference object but would taper with distance so as to
form a teardrop-shaped region attached to the bottom surface of the reference object (see al
so Carlson et al., 2003; Hayward & Tarr, 1995).
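The vector proposal can be caricatured with a graded "below" field. The falloff parameters and function name below are invented; this is only a sketch of the tapering-region idea, not the published model:

```python
# Illustrative sketch (invented parameters) of the place-field idea: the
# acceptability of "below" peaks on the downward vertical from the reference
# object and falls off with angular deviation and with distance, producing the
# tapering region described above.
import math

def below_acceptability(point, ref, angle_sd=30.0, dist_scale=5.0):
    """Graded goodness, in [0, 1], of the claim 'point is below ref'."""
    dx, dy = point[0] - ref[0], point[1] - ref[1]
    if dy >= 0:
        return 0.0                                       # not below at all
    deviation = math.degrees(math.atan2(abs(dx), -dy))   # angle off straight-down
    distance = math.hypot(dx, dy)
    angular = math.exp(-(deviation ** 2) / (2 * angle_sd ** 2))
    taper = math.exp(-distance / dist_scale)             # field weakens with distance
    return angular * taper

ref = (0.0, 0.0)
# Directly underneath beats off-axis at the same depth, and near beats far:
print(below_acceptability((0.0, -1.0), ref) > below_acceptability((2.0, -1.0), ref))
print(below_acceptability((0.0, -1.0), ref) > below_acceptability((0.0, -6.0), ref))
```

A graded field of this kind captures why "below" has no sharp boundary: positions are better or worse instances of the relation, rather than simply in or out.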
The conceptual representation of locative prepositions also appears to be separated from other lin
guistic representations (e.g., action verbs, despite several of these verbs sharing with
prepositions the conceptual domain of space) because brain damage can dissociate the
meanings of these terms (Kemmerer & Tranel, 2003).
Although the evidence for the existence of a division of labor for analog versus digital
spatial relations in humans is now clearly established, an analogous lateralization of brain
function in nonhuman species remains unclear (Vauclair et al., 2006). Importantly, lateral
ization is under the influence of “opportunistic” processes of brain development that opti
mize the interaction of different subsystems within the cerebral architecture (Jacobs,
1997, 1999). Thus, in our species, local interactions with linguistic and semantic net
works may play a key role in the manner in which the spatial system is organized. That is,
biasing categorical spatial representations within a left hemisphere’s substrate by “yok
ing” them with linguistic processes may facilitate a joint operation between perception,
language, and thought (Jacobs & Kosslyn, 1994; Kosslyn, 1987).
Research in neuroscience shows that topological judgments are accomplished by the pari
etal lobe (also in rats; Goodrich-Hunsaker et al., 2008); in humans, these judgments have
a robust left-hemisphere advantage (Wang et al., 2007). As originally reasoned by Franco
and Sperry (1977), given that we can represent multiple geometries (e.g., Euclidean,
affine, projective, topological) and the right hemisphere’s spatial abilities are superior to
those of the left (the prevalent view in the 1970s), the right hemisphere should match
shapes by their geometrical properties better than the left hemisphere. They tested this
idea with a group of (commissurotomized) “split-brain” patients in an intermodal (vision
and touch) task. Five geometrical forms of the same type were examined visually, while
one hand searched behind a curtain for one shape among three with the matching geome
try. As expected, the left hand’s performance of the split-brain patients was clearly superi
or to that of their right hand. This geometrical discrimination task required the percep
tion of fine spatial properties of shapes (e.g., differences in slant and gradient of surfaces,
angular values, presence of concavities or holes). Thus, the superior performance of the
left hand, which is controlled by the right hemisphere, reflects the use of that
hemisphere’s coordinate spatial relations system in solving a
shape discrimination task that crucially depends on the fine metrics of the forms. Later
investigations on split-brain patients showed that the left hand outperforms the right
hand also when copying drawings from memory or in rearranging blocks of the WAIS-R
Block Design Test (LeDoux, Wilson, & Gazzaniga, 1977). Specifically, LeDoux and Gaz
zaniga (1978) proposed that the right hemisphere possesses a perceptual capacity that is
specifically dedicated to the analysis of space in the service of the organization of action
or movement planning, which they called a manipulospatial subsystem. Again, a left-hand
(right-hemisphere) superiority in these patients’ constructions is consistent with a view
that rearranging multiple items that are identical in shape (and share colors) may require
a coordinate representation of the matrix of the design or array (Laeng, 2006).
Based on the above account, one would expect that lesions of the dorsal system (in partic
ular, of the left hemisphere) would result in object recognition problems. However, as al
ready discussed, patients with parietal lesions do not present the dramatic object recogni
tion deficits of patients with temporal lesions. Patients with unilateral lesions localized in
the parietal lobe often appear to lack knowledge of the spatial orientation of objects, yet
they appear to achieve normal object recognition (Turnbull et al., 1995, 1997). For example, they may fail to recognize objects’ correct orientation or, when drawing from memory, rotate shapes by 90 or 180 degrees. Most remarkably, patients with bilateral parietal
lesions (i.e., with Bálint’s syndrome and simultanagnosia), despite being catastrophically
impaired in their perception of spatial relations between separate objects, can recognize
an individual object (albeit very slowly; Duncan et al., 2003) on the basis of its shape.
Hence, we face something of a paradox: object identity depends on spatial representations among parts (i.e., within-object relations), but damage to the dorsal spatial systems
does not seem to affect object recognition (Farah, 1990). It may appear that representing
spatial relations “within” and “without” shapes depends on different perceptual mecha
nisms.
However, there exists evidence that lesions in the dorsal system can cause specific types
of object recognition deficits (e.g., Warrington, 1982; Warrington & Taylor, 1973). First of
all, patients with Bálint’s syndrome do not have entirely normal object perception (Duncan et al., 2003; Friedman-Hill et al., 1995; Robertson et al., 1997). Specifically, patient R.M., with bilateral parieto-occipital lesions, is unable to judge either relative or absolute visual locations. Concomitantly, he makes mistakes in combining the colors and shapes of
separate objects or the shape of an object with the size of another (i.e., the patient com
mits several “illusory conjunctions”). Thus, an inadequate spatial representation or loss of
spatial awareness of the features of forms, due to damage to parietal areas, appears to
underlie both the deficit in spatial judgment and that of binding shape features. Accord
ing to feature integration theory (Treisman, 1988), perceptual representations of two sep
arate objects currently in view require integration of information in the dorsal and ven
tral system, so that each object’s specific combination of features in their proper loca
tions can be obtained.
Additional evidence that spatial information plays a role in shape recognition derives
from a study with lateralized stimuli (Laeng, Shah & Kosslyn, 1999). This study revealed a
short-lived advantage for the left hemisphere (i.e., for stimuli presented tachistoscopical
ly to the right visual field) in the recognition of pictures of contorted poses of animals. It
was reasoned that nonrigid multipart objects (typically animal bodies but also some arti
facts, e.g., a bicycle) can take a number of contortions that, combined with an unusual
perspective, are likely to be novel or rarely experienced by the observer. In such cases,
the visual system may opt to operate in a different mode from the usual matching of
stored representations (i.e., bottom-up matching of global templates) and initiate a hy
pothesis-testing procedure (i.e., a top-down search for connected parts and a serial
matching of these to stored structural descriptions). In the latter case, the retrieval of
categorical spatial information (i.e., a hypothesized dorsal and left hemisphere’s function)
seems to be crucial for recognition. Abstract spatial information about the connectivity of the object’s parts would facilitate forming a perceptual hypothesis and verifying it by matching visible parts to the object’s memorized spatial configuration. In other
words, an “object map” in the dorsal system specifies the spatial relations among the parts of the complex pattern represented by the ventral system (Kosslyn, Ganis, & Thompson, 2006).
A subsequent study (Laeng et al., 2000) of patients with unilateral posterior (p. 47) damage (mainly affecting the parietal lobe) confirmed that patients with left-hemisphere damage had greater difficulties in recognizing the contorted bodies of animals (i.e., the same images used in Laeng et al.’s 1999 study) than those with right-hemisphere damage
(Figure 3.6). However, left-hemisphere damage resulted in less difficulty than right-hemi
sphere damage when recognizing pictures of the same animals seen in conventional pos
es but from noncanonical (unusual) views as well as when recognizing rigid objects (arti
facts) from noncanonical views. As originally shown in studies by Warrington (1982; War
rington & Taylor, 1973), patients with right parietal lesions showed difficulties in the
recognition or matching of objects when viewed at unconventional perspectives or in the
presence of strong shadows. In Marr’s (1982) framework, these findings suggest that the patients’ difficulties reflect an inability to transform or align an internal spatial frame of reference centered on the object’s intrinsic coordinates (i.e., its axes of elongation) to match the perceived image.
To conclude, the dorsal system plays a role in object recognition but as an optional re
source (Warrington & James, 1988) by cooperating with the ventral system during chal
lenging visual situations (e.g., novel contortions of flexible objects or very unconventional
views or difficult shape-from-shadows discriminations; Warrington & James, 1986) or
when making fine judgments about the shapes of objects that differ by subtle variations
in size or orientation (Aguirre & D’Esposito, 1997; Faillenot et al., 1997). In ordinary cir
cumstances, different from these “visual problem solving” (p. 48) situations (Farah, 1990),
a spatial analysis provided by the dorsal system seems neither necessary nor sufficient to
achieve object recognition.
8. Cognitive Maps
As thinking agents, we accumulate in our lifetime a spatial understanding of our surrounding physical world. We can also remember and think about spatial relations either in the immediate, visible, physical environment or in invisible environments at a large geographic scale, and we can manipulate virtual objects in a virtual space with imagined geometry (Aflalo & Graziano, 2008). As a communicative species, we can transfer knowledge about physical space to others through symbolic systems like language and geographical maps (Liben, 2009). Finally, as a social species, we tend to organize space into
territories and safety zones, and to develop a sense of personal place.
A great deal of our daily behavior must be based on spatial decisions and choices be
tween routes, paths, and trajectories. A type of spatial representation, called the cogni
tive map, appears to be concerned with the knowledge of large-scale space (Cheng, 1986;
Kosslyn et al., 1974; Kuipers, 1978; Tolman, 1948; Wolbers & Hegarty, 2010). A distinc
tion can be made between (1) a map-like representation, consisting of a spatial frame ex
ternal to the navigating organism (this representation is made by the overall geometric
shape of the environment [survey knowledge] and/or a set of spatial relationships be
tween locales [landmarks and place]); and (2) an internal spatial frame that is based on
egocentric cues generated by self-motion (route knowledge) and vestibular information
(Shelton & McNamara, 2001). Cognitive maps may be based on categorical spatial infor
mation (often referred to as topological; e.g., Poucet, 1993), which affords a coarse repre
sentation of the connectivity of space and its overall arrangement, combined with coordi
nate (metric) information (e.g., information about angles and distances) of the large-scale
environment.
Navigation (via path integration or dead reckoning or via the more flexible map-like rep
resentation) and environmental knowledge can be disrupted by damage to a variety of
brain regions. Parietal lesions result in difficulties when navigating in immediate space
(DiMattia & Kesner, 1988; Stark et al., 1996) and can degrade patients’ topographic knowledge of the environment (Newcombe & Russell, 1969; Takahashi et al., 1997). However, areas
supporting cognitive map representations in several vertebrate species appear to involve
portions of the hippocampus and surrounding areas (Wilson & McNaughton, 1993). As
originally revealed by single-cell recording studies in rats (O’Keefe, 1976; O’Keefe &
Nadel, 1978) and later in primates (O’Keefe et al., 1998; Ludvig et al., 2004) and also hu
mans (Ekstrom et al., 2003), some hippocampal cells can provide a spatial map-like repre
sentation within a reference frame fixed onto the external environment. For example,
some of these cells have visual receptive fields that do not move with the position of the
animal or with changes in viewpoint but instead fire whenever the animal (e.g., the monkey; Rolls et al., 1989; Fyhn et al., 2004) is in a certain place in the local environment.
Thus, these cells can play an important functional role as part of a navigational system
(Lenck-Santini et al., 2001). Reactivation of place cells has also been observed during
sleep episodes in rats (Wilson & McNaughton, 1994), which can be interpreted as an of
fline consolidation process of spatial memories. Reactivation of whole past sequences of
place cell activity has been recorded in rats during maze navigation whenever they stop
at a turning point (Foster & Wilson, 2006); in some cases, place cell discharges can indi
cate future locations along the path (Johnson & Redish, 2007) before the animals choose
between alternative trajectories. Another type of cell (Figure 3.7) has been found in the
rat entorhinal cortex (adjacent to the hippocampus). These cells present tessellating fir
ing fields or “grids” (Hafting et al., 2005; Solstad et al., 2008) that could provide the ele
ments of a spatial map based on path integration (Kjelstrup et al., 2008; Moser et al.,
2008) and thus complement the function of the place cells.
Ekstrom and colleagues (2003) recorded directly from hippocampal and parahippocampal
cells of epileptic patients undergoing neurosurgery. The patients played a videogame (a
taxi-driving game in which a player navigates within a virtual city) while neural activity
was recorded simultaneously from multiple cells. A significant proportion of the recorded
cells showed spiking properties identical to those of place cells already described for the
rat’s hippocampus. Other cells were instead view-responsive: they responded to the view
of a specific landmark (e.g., the picture of a particular building) and were relatively more
common in the patients’ parahippocampal region. Thus, these findings support an ac
count of the human hippocampus as computing a flexible map-like representation of
space by combining visual and spatial elements with a coarser representation of salient
scenes, views, and landmarks formed in the parahippocampal region. In addition, neu
roimaging studies in humans (p. 49) revealed activity in the hippocampal region during
navigational memory tasks (e.g., in taxi drivers recalling routes; Grön et al., 2000;
Maguire et al., 1997; Wolbers et al., 2007). Lesion studies of animals and neurological
cases have demonstrated deficits after temporal lesions that include the hippocampus
(Barrash et al., 2000; Kessels et al., 2001; Maguire et al., 1996). However, the hippocam
pus and the entorhinal cortex may not constitute necessary spatial structures for humans
and for all types of navigational abilities; patients with lesions in these areas can maintain a path in mind and point to (estimate) the distance from a starting point by keeping track of a reference location while moving (Shrager et al., 2008).
In rats, the parietal cortex is also clearly involved in the processing of spatial information
(Save & Poucet, 2000) and constitutes another important structure for navigation (Nitz,
2006; Rogers & Kesner, 2006). One hypothesis is that the parietal cortex is involved in
combining visual-spatial information and self-motion information so that egocentrically
acquired information can be relayed to the hippocampus to generate and update an allo
centric representation of space. Based on the role of the human dorsal system in the com
putation of both categorical and coordinate types of spatial representations (Laeng et al.,
2003), one would expect a strong interaction between processing in the hippocampal for
mation and in the posterior parietal cortex. The human parietal cortex could provide both
coordinate (distance and angle) and categorical information (boundary conditions, con
nectivity, and topological information; Poucet, 1993) to the hippocampus. In turn, the hip
pocampus could combine the above spatial information with spatial scenes encoded by
the parahippocampal area and ventral areas specialized for landscape object recognition
(e.g., recognition of a specific building; Aguirre et al., 1998). In addition, language-based
spatial information (Hermer & Spelke, 1996; Hermer-Vazquez et al., 2001) could play an
active role for this navigational system.
Neuroimaging studies with humans revealed activity in the parahippocampal cortex when
healthy participants passively viewed an environment or large-scale scenes (Epstein &
Kanwisher, 1998), including an empty room, as well as during navigational tasks (e.g., in
virtual environments; Aguirre et al., 1996; Maguire et al., 1998, 1999). Patients with dam
age in this area show problems in scene recognition and route learning (Aguirre &
D’Esposito, 1999; Epstein et al., 2001). Subsequent research with patients and monkeys
has clarified the involvement of the parahippocampal cortex in memorizing objects’ loca
tions within a large-scale scene or room’s geometry (Bohbot et al., 1998; Malkova &
Mishkin, 2003), more than in supporting navigation or place knowledge (Burgess &
O’Keefe, 2003). One proposal is that, when remembering a place or scene, the parietal
cortex, based on reciprocal connections, can also translate an allocentric (North, South,
East, West) parahippocampal representation into an egocentric (left, right, ahead, be
hind) representation (Burgess, 2008). By this account, neglect in scene imagery (e.g., the
Milan square’s neglect experiment of Bisiach & Luzzatti, 1978) after parietal lesions
would result from an intact ventral allocentric representation of space (i.e., the whole
square) along with damage to the parietal egocentric representation.
Conclusion
As humans, we “act” in space, “know” space, and “talk” about space: three (p. 50) functions that together would seem to require the whole human brain to be accomplished. Indeed, research on the human brain’s representation of spatial relations includes rather different traditions and theoretical backgrounds, which taken together provide us with a complex and rich picture of our cognition of space. Neuroscience has revealed (1) the existence of topographic maps in the brain or, in other words, the brain’s representation of space by the spatial organization of the brain itself. The visual world is
then represented by two higher order, representational streams of the brain, anatomically
located ventrally and dorsally, that make the basic distinction between (2) “what” is in the
world and “where” it is. However, these two forms of information need to be integrated in
other representations that specify (3) “how” an object can be acted on and “which” object
is the current target. Additionally, the brain localizes objects according to multiple and
parallel (4) spatial frames of reference that are also relevant to the manner in which spatial attention is deployed. After brain damage, (5) attentional deficits or neglect clearly reveal the relevance of allocating attention along different frames of reference. Although
many of the reviewed functions are shared among humans and other animals, humans
show a strong degree of (6) cerebral lateralization for spatial cognition, and the current
evidence indicates complementary hemispheric specializations for digital (categorical)
and for analog (coordinate) spatial information. The representation of categorical spatial
relations is also relevant for (7) object recognition by specifying the spatial arrangement
of parts within an object (i.e., the “where of what”). Humans, like other animals, can also represent space at the very large scale, as a (8) cognitive map of the external environment,
which is useful for navigation.
The most striking finding of cognitive neuroscience is the considerable degree of functional specialization of the brain’s areas. Interestingly, the discovery that the visual brain
separates visual information into two streams of processing (“what” versus “where”) does
particular justice to Kant’s classic concept of space as a separate mode of knowledge
(Moser et al., 2008). In the Critique of Pure Reason, space was defined as what is left
when one ignores all the attributes of a shape: “If we remove from our empirical concept
of a body, one by one, every feature in it which is empirical, the color, the hardness or
softness, the weight, even the impenetrability, there still remains the space which the
body (now entirely vanished) occupied, and this cannot be removed” (Kant, 1787/2008, p. 377).
Author Note
I am grateful for comments and suggestions on drafts of the chapter to Charlie Butter,
Michael Peters, and Peter Svenonius.
References
Aflalo, T. N., & Graziano, M. S. A. (2006). Possible origins of the complex topographic or
ganization of motor cortex: Reduction of a multidimensional space onto a two-dimension
al array. Journal of Neuroscience, 26, 6288–6297.
Aglioti, S., Goodale, M. A., & DeSouza, J. F. X. (1995). Size-contrast illusions deceive the eye but not the hand. Current Biology, 5, 679–685.
Aguirre, G. K., Zarahn, E., & D’Esposito, M. (1998). An area within human ventral cortex
sensitive to “building” stimuli: Evidence and implications. Neuron, 17, 373–383.
Alain, C., Arnott, S. R., Hevenor, S., Graham, S., & Grady, C. L. (2001). “What” and
“where” in the human auditory system. Proceedings of the National Academy of Sciences
U S A, 98, 12301–12306.
Alivisatos, B., & Petrides, M. (1997). Functional activation of the human brain during
mental rotation. Neuropsychologia, 35, 111–118.
Amorapanth, P. X., Widick, P., & Chatterjee, A. (2010). The neural basis for spatial rela
tions. Journal of Cognitive Neuroscience, 22, 1739–1753.
Andersen, R. A., & Buneo, C. A. (2002). Intentional maps in posterior parietal cortex. An
nual Review of Neuroscience, 25, 189–220.
Andersen, R. A., Essick, G. K., & Siegel, R. M. (1985). The encoding of spatial location by
posterior parietal neurons. Science, 230, 456–458.
Avillac, M., Deneve, S., Olivier, E., Pouget, A., & Duhamel, J. R. (2005). Reference frames
for representing visual and tactile locations in parietal cortex. Nature Neuroscience, 8,
941–949.
Baars, B. J. (2002). The conscious access hypothesis: Origins and recent evidence. Trends
in Cognitive Sciences, 6, 47–52.
Baciu, M., Koenig, O., Vernier, M.-P., Bedoin, N., Rubin, C., & Segebarth, C. (1999). Cate
gorical and coordinate spatial relations: fMRI evidence for hemispheric specialization.
Neuroreport, 10, 1373–1378.
(p. 51) Bálint, R. (1909). Seelenlähmung des “Schauens”, optische Ataxie, räumliche
Störung der Aufmerksamkeit. European Neurology, 25 (1), 51–66, 67–81.
Ballard, D. H. (1986). Cortical connections and parallel processing: Structure and func
tion. Behavioral and Brain Sciences, 9, 67–120.
Banich, M. T., & Federmeier, K. D. (1999). Categorical and metric spatial processes distin
guished by task demands and practice. Journal of Cognitive Neuroscience, 11 (2), 153–
166.
Barlow, H. (1981). Critical limiting factors in the design of the eye and visual cortex. Proceedings of the Royal Society of London, Biological Sciences, 212, 1–34.
Barrash, J., Damasio, H., Adolphs, R., & Tranel, D. (2000). The neuroanatomical corre
lates of route learning impairment. Neuropsychologia, 38, 820–836.
Battista, C., & Peters, M. (2010). Ecological aspects of mental rotation around the vertical
and horizontal axis. Learning and Individual Differences, 31 (2), 110–113.
Baxter, D. M., & Warrington, E. K. (1983). Neglect dysgraphia. Journal of Neurology, Neu
rosurgery, and Psychiatry, 46, 1073–1078.
Behrmann, M., & Moscovitch, M. (1994). Object-centered neglect in patients with unilat
eral neglect: Effects of left-right coordinates of objects. Journal of Cognitive
Neuroscience, 6, 1–16.
Behrmann, M., & Tipper, S. P. (1999). Attention accesses multiple reference frames: Evi
dence from visual neglect. Journal of Experimental Psychology: Human Perception and
Performance, 25, 83–101.
Beschin, N., Basso, A., & Della Sala, S. (2000). Perceiving left and imagining right: Disso
ciation in neglect. Cortex, 36, 401–414.
Beschin, N., Cubelli, R., Della Sala, S., & Spinazzola, L. (1997). Left of what? The role of
egocentric coordinates in neglect. Journal of Neurosurgery and Psychiatry, 63, 483–489.
Bisiach, E., Capitani, E., & Porta, E. (1985). Two basic properties of space representation
in the brain: Evidence from unilateral neglect. Journal of Neurology, Neurosurgery, and
Psychiatry, 48, 141–144.
Bisiach, E., & Luzzatti, C. (1978). Unilateral neglect of representational space. Cortex, 14,
129–133.
Block, N. (1996). How can we find the neural correlate of consciousness? Trends in Neurosciences, 19, 456–459.
Bohbot, V. D., Kalina, M., Stepankova, K., Spackova, N., Petrides, M., & Nadel, L. (1998).
Spatial memory deficits in patients with lesions to the right hippocampal and the right
parahippocampal cortex. Neuropsychologia, 36, 1217–1238.
Brandt, T., & Dietrich, M. (1999). The vestibular cortex: Its locations, functions and disor
ders. Annals of the New York Academy of Sciences, 871, 293–312.
Bruyer, R., Scailquin, J. C., & Coibon, P. (1997). Dissociation between categorical and co
ordinate spatial computations: Modulation by cerebral hemispheres, task properties,
mode of response, and age. Brain and Cognition, 33, 245–277.
Buccino, G., Lui, F., Canessa, N., Patteri, I., Lagravinese, G., Benuzzi, F., Porro, C.A., &
Rizzolatti, G. (2004). Neural circuits involved in the recognition of actions performed by
nonconspecifics: An fMRI study. Journal of Cognitive Neuroscience, 16, 114–126.
Bullens, J., & Postma, A. (2008). The development of categorical and coordinate spatial
relations. Cognitive Development, 23, 38–47.
Burgess, N. (2008). Spatial cognition and the brain. Annals of the New York Academy of
Sciences, 1124, 77–97.
Burgess, N., & O’Keefe, J. (2003). Neural representations in human spatial memory.
Trends in Cognitive Sciences, 7, 517–519.
Burnod, Y., Baraduc, P., Battaglia-Mayer, A., Guigon, E., Koechlin, E., Ferraina, S., Lac
quaniti, F., & Caminiti, R. (1999). Parieto-frontal coding of reaching: An integrated frame
work. Experimental Brain Research, 129, 325–346.
Butter, C. M., Evans, J., Kirsh, N., & Kewman, D. (1989). Altitudinal neglect following trau
matic brain injury: A case report. Cortex, 25, 135–146.
Butters, N., Barton, M., & Brody, B. A. (1970). Role of the right parietal lobe in the media
tion of cross-modal associations and reversible operations in space. Cortex, 6, 174–190.
Caramazza, A., & Hillis, A. E. (1990). Spatial representation of words in the brain implied
by the studies of a unilateral neglect patient. Nature, 346, 267–269.
Carey, D. P., Dijkerman, H. C., Murphy, K. J., Goodale, M. A., & Milner, A. D. (2006). Point
ing to places and spaces in a patient with visual form agnosia. Neuropsychologia, 44,
1584–1594.
Carey, D. P., Harvey, M., & Milner, A. D. (1996). Visuomotor sensitivity for shape and ori
entation in a patient with visual form agnosia. Neuropsychologia, 3, 329–337.
Carlson, L., Regier, T., & Covey, E. (2003). Defining spatial relations: Reconciling axis and
vector representations. In E. van der Zee & J. Slack (Eds.), Representing direction in lan
guage and space (pp. 111–131). Oxford, UK: Oxford University Press.
Carpenter, P. A., Just, M. A., Keller, T. A., Eddy, W., & Thulborn, K. (1999a). Graded func
tional activation in the visuospatial system with the amount of task demand. Journal of
Cognitive Neuroscience, 11, 9–24.
Carpenter, P. A., Just, M. A., Keller, T. A., Eddy, W., & Thulborn, K. (1999b). Time-course of
fMRI activation in language and spatial networks during sentence comprehension. Neu
roImage, 10, 216–224.
Casati, R., & Varzi, A. (1999). Parts and places: The structures of spatial representation.
Boston: MIT Press.
Cavanagh, P. (1998). Attention: Exporting vision to the mind. In S. Saida & P. Cavanagh
(Eds.), Selection and integration of visual information, pp. 3–11. Tsukuba, Japan: STA &
NIBH-T.
Chafee, M. V., Averbeck, B. B., & Crowe, D. A. (2007). Representing spatial relationships
in posterior parietal cortex: Single neurons code object-referenced position. Cerebral Cor
tex, 17, 2914–2932.
Chafee, M. V., Crowe, D. A., Averbeck, B. B., & Georgopoulos, A. P. (2005). Neural corre
lates of spatial judgement during object construction in parietal cortex. Cerebral Cortex,
15, 1393–1413.
Chao, L. L., & Martin, A. (2000). Representation of manipulable man-made objects in the
dorsal stream. NeuroImage, 12, 478–484.
Cheng, K. (1986). A purely geometric module in the rat’s spatial representation. Cognition, 23, 149–178.
(p. 52) Clément, G., & Reschke, M. F. (2008). Neuroscience in space. New York: Springer.
Colby, C. L., & Goldberg, M. E. (1999). Space and attention in parietal cortex. Annual Review of Neuroscience, 23, 319–349.
Collett, T. (1982). Do toads plan routes? A study of the detour behaviour of Bufo viridis. Journal of Comparative Physiology, 146, 261–271.
Committeri, G., Galati, G., Paradis, A. L., Pizzamiglio, L., Berthoz, A., & LeBihan, D.
(2004). Reference frames for spatial cognition: Different brain areas are involved in view
er-, object-, and landmark-centered judgments about object location. Journal of Cognitive
Neuroscience, 16, 1517–1535.
Committeri, G., Pitzalis, S., Galati, G., Patria, F., Pelle, G., Sabatini, U., Castriota-Scander
beg, A., Piccardi, L., Guariglia, C., & Pizzamiglio L. (2007). Neural bases of personal and
extrapersonal neglect in humans. Brain, 130, 431–441.
Cook, W. A. (1989). Case grammar theory. Washington, DC: Georgetown University Press.
Corbetta, M., Akbudak, E., Conturo, T. E., Snyder, A. Z., Ollinger, J. M., Drury, H. A.,
Linenweber, M. R., Petersen, S. E., Raichle, M. E., Van Essen, D. C., & Shulman, G. L.
(1998). A common network of functional areas for attention and eye movements. Neuron,
21, 761–773.
Corbetta, M., Kincade, J. M., Ollinger, J. M., McAvoy, M. P., & Shulman, G. L. (2000). Vol
untary orienting is dissociated from target detection in human posterior parietal cortex.
Nature, 3, 292–297.
Corbetta, M., Miezin, F. M., Shulman, G. L., & Petersen, S. E. (1993). A PET study of visu
ospatial attention. Journal of Neuroscience, 13, 1202–1226.
Corbetta, M., Patel, G., & Shulman, G. L. (2008). The reorienting system of the human
brain: From environment to theory of mind. Neuron, 58, 306–324.
Courtney, S. M., Ungerleider, L. G., Keil, K., & Haxby, J. V. (1997). Transient and sustained
activity in a distributed neural system for human working memory. Nature, 386, 608–611.
Cowey, A., Small, M., & Ellis, S. (1994). Left visuo-spatial neglect can be worse in far than
in near space. Neuropsychologia, 32, 1059–1066.
Crowe, D. A., Averbeck, B. B., Chafee, M. V., & Georgopoulos, A. P. (2005). Dynamics of
parietal neural activity during spatial cognitive processing. Neuron, 47, 885–891.
Culham, J. C., Brandt, S. A., Cavanagh, P., Kanwisher, N. G., Dale, A. M., & Tootell, R. B.
H. (1998). Cortical fMRI activation produced by attentive tracking of moving targets.
Journal of Neurophysiology, 80, 2657–2670.
Culham, J. C., Cavanagh, P., & Kanwisher, N. G. (2001). Attention response functions: Characterizing brain areas using fMRI activation during parametric variations of attentional load. Neuron, 32, 737–745.
Culham, J. C., Danckert, S. L., DeSouza, J. F., Gati, J. S., Menon, R. S., & Goodale, M. A.
(2003). Visually guided grasping produces fMRI activation in dorsal but not ventral
stream brain areas. Experimental Brain Research, 153 (2), 180–189.
Culham, J. C., & Valyear, K. F. (2006). Human parietal cortex in action. Current Opinion in
Neurobiology, 16 (2), 205–212.
Damasio, H., Grabowski, T. J., Tranel, D., Ponto, L. L. B., Hichwa, R. D., & Damasio, A. R.
(2001). Neural correlates of naming actions and of naming spatial relations. NeuroImage,
13, 1053–1064.
Dehaene, S. (1997). The number sense. Oxford, UK: Oxford University Press.
Dehaene, S., & Changeux, J.-P. (1993). Development of elementary numerical abilities: A
neuronal model. Journal of Cognitive Neuroscience, 5, 390–407.
Denys, K., Vanduffel, W., Fize, D., Nelissen, K., Peuskens, H., Van Essen, D., & Orban, G.
A. (2004). The processing of visual shape in the cerebral cortex of human and nonhuman
primates: A functional magnetic resonance imaging study. Journal of Neuroscience, 24,
2551–2565.
De Renzi, E. (1982). Disorders of space exploration and cognition. New York: John Wiley
& Sons.
De Renzi, E., & Vignolo, L. (1962). The Token Test: A sensitive test to detect receptive dis
turbances in aphasics. Brain, 85, 665–678.
DeYoe, E. A., Carman, G. J., Bandettini, P., Glickman, S., Wieser, J., Cox, R., Miller, D., &
Neitz, J. (1996). Mapping striate and extrastriate visual areas in human cerebral cortex.
Proceedings of the National Academy of Sciences U S A, 93, 2382–2386.
DiMattia, B. V., & Kesner, R. P. (1988). Role of the posterior parietal association cortex in
the processing of spatial event information. Behavioral Neuroscience, 102, 397–403.
Driver, J., & Pouget, A. (2000). Object-centered visual neglect, or relative egocentric neglect? Journal of Cognitive Neuroscience, 12 (3), 542–545.
Duhamel, J.-R., Bremmer, F., BenHamed, S., & Graf, W. (1997). Spatial invariance of visual receptive fields in parietal cortex neurons. Nature, 389, 845–848.
Duncan, J., Bundesen, C., Olson, A., Humphreys, G., Ward, R., Kyllingsbæk, S., van Raams
donk, M., Rorden, R., & Chavda, S. (2003). Attentional functions in dorsal and ventral si
multanagnosia. Cognitive Neuropsychology, 20, 675–701.
Ekstrom, A. D., Kahana, M. J., Caplan, J. B., Fields, T. A., Isham, E. A., Newman, E. L., &
Fried, I. (2003). Cellular networks underlying human spatial navigation. Nature, 425,
184–187.
Emmorey, K., Damasio, H., McCullough, S., Grabowski, T., Ponto, L., Hichwa, R., & Bellu
gi, U. (2002). Neural systems underlying spatial language in American Sign Language.
Neuroimage, 17, 812–824.
Engel, S. A., Glover, G. H., & Wandell, B. A. (1997). Retinotopic organization in human visual cortex and the spatial precision of functional MRI. Cerebral Cortex, 7, 181–192.
Engel, S. A., Rumelhart, D. E., Wandell, B. A., Lee, A. T., Glover, G. H., Chichilnisky, E. J.,
& Shadlen, M. N. (1994). fMRI of human visual cortex. Nature, 369, 525.
Epstein, R., DeYoe, E. A., Press, D. Z., Rosen, A. C., & Kanwisher, N. (2001). Neuropsycho
logical evidence for a topographical learning mechanism in parahippocampal cortex. Cog
nitive Neuropsychology, 18, 481–508.
Epstein, R., & Kanwisher, N. (1998). A cortical representation of the local visual environ
ment. Nature, 392 (6676), 598–601.
Eurich, C. W., & Schwegler, H. (1997). Coarse coding: Calculation of the resolution
achieved by a population of large receptive field neurons. Biological Cybernetics, 76, 357–
363.
human vision. Philosophical Transactions of the Royal Society of London: Series B, 213,
451–477.
Faillenot, I., Toni, I., Decety, J., Grégoire, M.-C., & Jeannerod, M. (1997). Visual pathways
for object-oriented action and object recognition: Functional anatomy with PET. Cerebral
Cortex, 7, 77–85.
Fang, F., & He, S. (2005). Cortical responses to invisible objects in the human dorsal and
ventral pathways. Nature Neuroscience, 8, 1380–1385.
Farah, M. J. (1990). Visual agnosia: Disorders of object recognition and what they tell us
about normal vision. Cambridge, MA: MIT Press.
Farrell, M. J., & Robertson, I. H. (2000). The automatic updating of egocentric spatial re
lationships and its impairment due to right posterior cortical lesions. Neuropsychologia,
38, 585–595.
Feldman, J. (1985). Four frames suffice: A provisional model of vision and space. Behav
ioral and Brain Sciences, 8, 265–289.
Felleman, D. J., & Van Essen, D. C. (1991). Distributed hierarchical processing in primate cerebral cortex. Cerebral Cortex, 1, 1–47.
Fishman, R. S. (1997). Gordon Holmes, the cortical retina, and the wounds of war. Docu
menta Ophthalmologica, 93, 9–28.
Foster, D.J., & Wilson, M. A. (2006). Reverse replay of behavioural sequences in hip
pocampal place cells during the awake state. Nature, 440, 680–683.
Franco, L., & Sperry, R. W. (1977). Hemisphere lateralization for cognitive processing of geometry. Neuropsychologia, 15, 107–114.
Friederici, A. D. (1982). Syntactic and semantic processes in aphasic deficits: The avail
ability of prepositions. Brain and Language, 15, 249–258.
Friedman-Hill, S. R., Robertson, L. C., & Treisman, A. (1995). Parietal contributions to vi
sual feature binding: Evidence from a patient with bilateral lesions. Science, 269, 853–
855.
Fyhn, M., Molden, S., Witter, M. P., Moser, E. I., & Moser, M.-B. (2004). Spatial represen
tation in the entorhinal cortex. Science, 305, 1258–1264.
Gallivan, J. P., Cavina-Pratesi, C., & Culham, J. C. (2009). Is that within reach? fMRI re
veals that the human superior parieto-occipital cortex (SPOC) encodes objects reachable
by the hand. Journal of Neuroscience, 29, 4381–4391.
Gattass, R., Nascimento-Silva, S., Soares, J. G. M., Lima, B., Jansen, A. K., Diogo, A. C. M.,
Farias, M. F., Marcondes, M., Botelho, E. P., Mariani, O. S., Azzi, J., & Fiorani, M. (2005).
Cortical visual areas in monkeys: Location, topography, connections, columns, plasticity
and cortical dynamics. Philosophical Transactions of the Royal Society, B, 360, 709–731.
Glickstein, M., Buchbinder, S., & May, J. L. (1998). Visual control of the arm, the wrist and
the fingers: Pathways through the brain. Neuropsychologia, 36, 981–1001.
Goel, V., Gold, B., Kapur, S., & Houle, S. (1998). Neuroanatomical correlates of human
reasoning. Journal of Cognitive Neuroscience, 10, 293–302.
Goodale, M.A., & Milner, A. D. (1992). Separate visual pathways for perception and ac
tion. Trends in Neurosciences, 15, 20–25.
Goodale, M. A., Milner, A. D., Jakobson, L. S., & Carey, D. P. (1991). A neurological dissoci
ation between perceiving objects and grasping them. Nature, 349, 154–156.
Goodrich-Hunsaker, N. J., Howard, B. P., Hunsaker, M. R., & Kesner, R. P. (2008). Human
topological task adapted for rats: Spatial information processes of the parietal cortex.
Neurobiology of Learning and Memory, 90, 389–394.
Graziano, M. S. A., & Cooke, D. F. (2006). Parieto-frontal interactions, personal space, and
defensive behavior. Neuropsychologia, 44, 845–859.
Graziano, M. S. A., & Gross, C. G. (1995). Multiple representations of space in the brain.
The Neuroscientist, 1, 43–50.
Graziano, M. S. A., & Gross, C. G. (1998). Spatial maps for the control of movement. Cur
rent Opinion in Neurobiology, 8, 195–201.
Graziano, M. S. A., Hu, X. T., & Gross, C. G. (1997). Coding the locations of objects in the
dark. Science, 277, 239–241.
Graziano, M. S. A., Reiss, L. A. J., & Gross, C. G. (1999). A neuronal representation of the
location of nearby sounds. Nature, 397, 428–430.
Graziano, M. S. A., Yap, G. S., & Gross, C. G. (1994). Coding of visual space by premotor neurons. Science, 266, 1054–1057.
Gregory, R. (2009). Seeing through illusions. Oxford, UK: Oxford University Press.
Grön, G., Wunderlich, A. P., Spitzer, M., Tomczak, R., & Riepe, M. W. (2000). Brain activa
tion during human navigation: Gender-different neural networks as a substrate of perfor
mance. Nature Neuroscience, 3, 404–408.
Gross, C.G., & Graziano, M. S. (1995). Multiple representations of space in the brain.
Neuroscientist, 1, 43–50.
Gross, C. G., & Mishkin, M. (1977). The neural basis of stimulus equivalence across retinal translation. In S. Harnad, R. Doty, J. Jaynes, L. Goldstein, & G. Krauthamer (Eds.), Lateralization in the nervous system (pp. 109–122). New York: Academic Press.
Hafting, T., Fyhn, M., Molden, S., Moser, M.-B., & Moser, E. I. (2005). Microstructure of a
spatial map in the entorhinal cortex. Nature, 436, 801–806.
Halligan, P. W., & Marshall, J. C. (1991). Left neglect for near but not far space in man.
Nature, 350, 498–500.
Halligan, P. W., & Marshall, J. C. (1995). Lateral and radial neglect as a function of spatial
position: A case study. Neuropsychologia, 33, 1697–1702.
Harris, I. M., Egan, G. F., Sonkkila, C., Tochon-Danguy, H. J., Paxinos, G., & Watson, J. D.
(2000). Selective right parietal lobe activation during mental rotation: A parametric PET
study. Brain, 123, 65–73.
Harris, I. M., & Miniussi, C. (2003). Parietal lobe contribution to mental rotation demon
strated with rTMS. Journal of Cognitive Neuroscience, 15, 315–323.
Haxby, J. V., Grady, C. L., Horwitz, B., Ungerleider, L. G., Mishkin, M., Carson, R. E., et al.
(1991). Dissociation of object and spatial visual processing pathways in human extrastri
ate cortex. Proceedings of the National Academy of Sciences U S A, 88, 1621–1625.
Hayward, W. G., & Tarr, M. J. (1995). Spatial language and spatial representation. Cogni
tion, 55, 39–84.
He, B. J., Snyder, A. Z., Vincent, J. L., Epstein, A., Shulman, G. L., & Corbetta, M.
Heide, W., Blankenburg, M., Zimmermann, E., & Kompf, D. (1995). Cortical control of double-step saccades: Implications for spatial orientation. Annals of Neurology, 38, 739–748.
Heilman, K. M., Bowers, D., Coslett, H. B., Whelan, H., & Watson, R. T. (1985). Directional
hypokinesia: prolonged reaction times for leftward movements in patients with right
hemisphere lesions and neglect. Neurology, Cleveland, 35, 855–859.
Heilman, K. M., & Van Den Abell, T. (1980). Right hemisphere dominance for attention:
The mechanism underlying hemispheric asymmetries of inattention (neglect). Neurology,
30, 327–330.
Hellige, J. B., & Michimata, C. (1989). Categorization versus distance: Hemispheric differ
ences for processing spatial information. Memory & Cognition, 17, 770–776.
Hermer, L., & Spelke, E. (1996). Modularity and development: The case of spatial reorien
tation. Cognition, 61, 195–232.
Hermer-Vazquez, L., Moffet, A., & Munkholm, P. (2001). Language, space, and the devel
opment of cognitive flexibility in humans: The case of two spatial memory tasks. Cogni
tion, 79, 263–299.
Hillis, A. E., Newhart, M., Heidler, J., Barker, P. B., & Degaonkar, M. (2005). Anatomy of
spatial attention: Insights from perfusion imaging and hemispatial neglect in acute
stroke. Journal of Neuroscience, 25, 3161–3167.
Holmes, G., & Horax, G. (1919). Disturbances of spatial orientation and visual attention,
with loss of stereoscopic vision. Archives of Neurology and Psychiatry, 1, 385–407.
Horton, J. C., & Hoyt, W. F. (1991). The representation of the visual field in human striate
cortex: A revision of the classic Holmes map. Archives of Ophthalmology, 109, 816–824.
Hubbard, E. M., Piazza, M., Pinel, P., & Dehaene, S. (2005). Interactions between number
and space in parietal cortex. Nature Reviews, Neuroscience, 6, 435–448.
Hummel, J. E., & Biederman, I. (1992). Dynamic binding in a neural network for shape
recognition. Psychological Review, 99, 480–517.
Ingle, D. J. (1967). Two visual mechanisms underlying the behaviour of fish. Psychologis
che Forschung, 31, 44–51.
Ings, S. (2007). A natural history of seeing. New York: Norton & Company.
Jackendoff, R., & Landau, B. (1992). Spatial language and spatial cognition. In R. Jackend
off (Ed.), Languages of the mind: Essays on mental representation (pp. 99–124). Cam
bridge, MA: MIT Press.
Jacobs, R. A., & Kosslyn, S. M. (1994). Encoding shape and spatial relations: The role of
receptive field size in coordinating complementary representations. Cognitive Science,
18, 361–386.
James, T. W., Culham, J. C., Humphrey, G. K., Milner, A. D., & Goodale, M. A. (2003). Ventral occipital lesions impair object recognition but not object-directed grasping: An fMRI study. Brain, 126, 2463–2475.
Jeannerod, M., Decety, J., & Michel, F. (1994). Impairment of grasping movements follow
ing a bilateral posterior parietal lesion. Neuropsychologia, 32, 369–380.
Jeannerod, M., & Jacob, P. (2005). Visual cognition: A new look at the two-visual systems
model. Neuropsychologia, 43, 301–312.
Johnsen, S., & Lohmann, K. J. (2005). The physics and neurobiology of magnetoreception.
Nature Review Neuroscience, 6, 703–712.
Johnson, H., & Haggard, P. (2005). Motor awareness without perceptual awareness. Neu
ropsychologia, 43, 227–237.
Johnson, A., & Redish, A. D. (2007). Neural ensembles in CA3 transiently encode paths
forward of the animal at a decision point. Journal of Neuroscience, 27, 12176–12189.
Jordan, K., Wustenberg, T., Heinze, H. J., Peters, M., & Jancke, L. (2002). Women and men
exhibit different cortical activation patterns during mental rotation tasks. Neuropsycholo
gia, 40, 2397–2408.
Just, M. A., Carpenter, P. A., Maguire, M., Diwadkar, V., & McMains, S. (2001). Mental ro
tation of objects retrieved from memory: A functional MRI study of spatial processing.
Journal of Experimental Psychology: General, 130, 493–504.
Kaas, J. H. (1997). Topographic maps are fundamental to sensory processing. Brain Re
search Bulletin, 44, 107–112.
Kahane, P., Hoffman, D., Minotti, L., & Berthoz, A. (2003). Reappraisal of the human vestibular cortex by cortical electrical stimulation study. Annals of Neurology, 54, 615–624.
Kahneman, D., Treisman, A., & Gibbs, B. (1992). The reviewing of object files: Object-spe
cific integration of information. Cognitive Psychology, 24, 175–219.
Kant, I. (1781). Kritik der reinen Vernunft (trans. Critique of Pure Reason, 2008, Penguin Classics).
Karnath, H. O. (2001). New insights into the functions of the superior temporal cortex. Nature Reviews, Neuroscience, 2, 568–576.
Karnath, H. O., Ferber, S., & Himmelbach, M. (2001). Spatial awareness is a function of
the temporal not the posterior parietal lobe. Nature, 411, 950–953.
Khan, A. Z., Pisella, L., Vighetto, A., Cotton, F., Luauté, J., Boisson, D., Salemme, R., Craw
ford, J. D., & Rossetti, Y. (2005). Optic ataxia errors depend on remapped, not viewed, tar
get location. Nature Neuroscience, 8, 418–420.
Kastner, S., Demner, I., & Ziemann, U. (1998). Transient visual field defects induced by
transcranial magnetic stimulation. Experimental Brain Research, 118, 19–26.
Kastner, S., DeSimone, K., Konen, C. S., Szczepanski, S. M., Weiner, K. S., & Sch
Kastner, S., Pinsk, M. A., De Weerd, P., Desimone, R., & Ungerleider, L. G. (1999). In
creased activity in human visual cortex during directed attention in the absence of visual
stimulation. Neuron, 22, 751–761.
Kemmerer, D. (2006). The semantics of space: Integrating linguistic typology and cogni
tive neuroscience. Neuropsychologia, 44, 1607–1621.
Kemmerer, D., & Tranel, D. (2000). A double dissociation between linguistic and percep
tual representations of spatial relationships. Cognitive Neuropsychology, 17, 393–414.
Kemmerer, D., & Tranel, D. (2003). A double dissociation between the meanings of action
verbs and locative prepositions. Neurocase, 9, 421–435.
Kessels, R. P. C., de Haan, E. H. F., Kappelle, L. J., & Postma, A. (2001). Varieties of human
spatial memory: A meta-analysis on the effects of hippocampal lesions. Brain Research
Reviews, 35, 295–303.
Kessels, R. P. C., Kappelle, L. J., de Haan, E. H. F., & Postma, A. (2002). Lateralization of
spatial-memory processes: evidence on spatial span, maze learning, and memory for ob
ject locations. Neuropsychologia, 40, 1465–1473.
Kinsbourne, M. (1993). Orientational bias model of unilateral neglect: Evidence from attentional gradients within hemispace. In I. H. Robertson & J. C. Marshall (Eds.), Unilateral neglect: Clinical and experimental studies (pp. 63–86). Hillsdale, NJ: Erlbaum.
Kirchner, W. H., & Braun, U. (1994). Dancing honey bees indicate the location of food sources using path integration rather than cognitive maps. Animal Behaviour, 48, 1437–1441.
Kitada, R., Kito, T., Saito, D. N., Kochiyama, T., Matsamura, M., Sadato, N., & Lederman,
S. J. (2006). Multisensory activation of the intraparietal area when classifying grating ori
entation: A functional magnetic resonance imaging study. Journal of Neuroscience, 26,
7491–7501.
Kjelstrup, K. B., Solstad, T., Brun, V. H., Hafting, T., Leutgeb, S., Witter, M. P., Moser, E. I.,
& Moser, M.-B. (2008). Finite scales of spatial representation in the hippocampus.
Science, 321, 140–143.
Kosslyn, S. M. (1980). Image and mind. Cambridge, MA: Harvard University Press.
Kosslyn, S. M. (1994). Image and brain: The resolution of the imagery debate. Cambridge,
MA: MIT Press.
Kosslyn, S. M., Chabris, C. F., Marsolek, C. J., & Koenig, O. (1992). Categorical versus co
ordinate spatial relations: Computational analyses and computer simulations. Journal of
Experimental Psychology: Human Perception and Performance, 18 (2), 562–577.
Kosslyn, S. M., DiGirolamo, G. J., Thompson, W. L., & Alpert, N. M. (1998). Mental rotation
of objects versus hands: neural mechanisms revealed by positron emission tomography.
Psychophysiology, 35, 151–161.
Kosslyn, S. M., & Jacobs, R. A. (1994). Encoding shape and spatial relations: A simple
mechanism for coordinating complementary representations. In V. Honavar & L. M. Uhr
(Eds.), Artificial intelligence and neural networks: Steps toward principled integration
(pp. 373–385). Boston: Academic Press.
Kosslyn, S. M., & Koenig, O. (1992). Wet mind: The new cognitive neuroscience. New York: Free Press.
Kosslyn, S. M., Koenig, O., Barrett, A., Cave, C. B., Tang, J., & Gabrieli, J. D. E. (1989). Evi
dence for two types of spatial representations: Hemispheric specialization for categorical
and coordinate relations. Journal of Experimental Psychology: Human Perception and Per
formance, 15 (4), 723–735.
Kosslyn, S. M., Maljkovic, V., Hamilton, S. E., Horwitz, G., & Thompson, W. L. (1995). Two
types of image generation: Evidence for left and right hemisphere processes. Neuropsy
chologia, 33 (11), 1485–1510.
Kosslyn, S. M., Pick, H. L., & Fariello, G. R. (1974). Cognitive maps in children and men.
Child Development, 45, 707–716.
Kosslyn, S. M., Thompson, W. L., & Ganis, G. (2006). The case for mental imagery. New York: Oxford University Press.
Kosslyn, S. M., Thompson, W. L., Gitelman, D. R., & Alpert, N. M. (1998). Neural systems that encode categorical versus coordinate spatial relations: PET investigations. Psychobiology, 26 (4), 333–347.
Króliczak, G., Heard, P., Goodale, M. A., & Gregory, R. L. (2006). Dissociation of percep
tion and action unmasked by the hollow-face illusion. Brain Research, 1080, 9–16.
Lacquaniti, F., Guigon, E., Bianchi, L., Ferraina, S., & Caminiti, R. (1995). Representing
spatial information for limb movement: The role of area 5 in the monkey. Cerebral Cortex,
5, 391–409.
Laeng, B. (2006). Constructional apraxia after left or right unilateral stroke. Neuropsy
chologia, 44, 1519–1523.
Laeng, B., Brennen, T., Johannessen, K., Holmen, K., & Elvestad, R. (2002). Multiple refer
ence frames in neglect? An investigation of the object-centred frame and the dissociation
between “near” and “far” from the body by use of a mirror. Cortex, 38, 511–528.
Laeng, B., Carlesimo, G. A., Caltagirone, C., Capasso, R., & Miceli, G. (2000). Rigid and
non-rigid objects in canonical and non-canonical views: Effects of unilateral stroke on ob
ject identification. Cognitive Neuropsychology, 19, 697–720.
Laeng, B., Chabris, C. F., & Kosslyn, S. M. (2003). Asymmetries in encoding spatial rela
tions. In K. Hugdahl and R. Davidson (Eds.), The asymmetrical brain (pp. 303–339). Cam
bridge, MA: MIT Press.
Laeng, B., Okubo, M., Saneyoshi, A., & Michimata, C. (2011). Processing spatial relations
with different apertures of attention. Cognitive Science, 35, 297–329.
Laeng, B., & Peters, M. (1995). Cerebral lateralization for the processing of spatial coor
dinates and categories in left- and right-handers. Neuropsychologia, 33, 421–439.
Laeng, B., Peters, M., & McCabe, B. (1997). Memory for locations within regions. Spatial
biases and visual hemifield differences. Memory and Cognition, 26, 97–107.
Laeng, B., Shah, J., & Kosslyn, S. M. (1999). Identifying objects in conventional and con
torted poses: Contributions of hemisphere-specific mechanisms. Cognition, 70 (1), 53–85.
Laeng, B., & Teodorescu, D.-S. (2002). Eye scanpaths during visual imagery reenact those of perception of the same visual scene. Cognitive Science, 26, 207–231.
Lakoff, G., & Johnson, M. (1999). Philosophy in the flesh. Chicago: University of Chicago
Press.
Lamb, T. D., Collin, S. P., & Pugh, E. N. (2007). Evolution of the vertebrate eye: Opsins,
photoreceptors, retina and eye cup. Nature Reviews, Neuroscience, 8, 960–975.
Lamme, V. A. F. (2003). Why visual attention and awareness are different. Trends in Cog
nitive Sciences, 7, 12–18.
Lamme, V. A. F., & Roelfsema, P. R. (2000). The distinct modes of vision offered by feed
forward and recurrent processing. Trends in Neuroscience, 23, 571–579.
Lamme, V. A. F., Super, H., Landman, R., Roelfsema, P. R., & Spekreijse, H. (2000). The
role of primary visual cortex (V1) in visual awareness. Vision Research, 40, 1507–1521.
Landau, B., Hoffman, J. E., & Kurz, N. (2006). Object recognition with severe spatial
deficits in Williams syndrome: Sparing and breakdown. Cognition, 100, 483–510.
LeDoux, J. E., & Gazzaniga, M. S. (1978). The integrated mind. New York: Plenum.
LeDoux, J. E., Wilson, D. H., & Gazzaniga, M. S. (1977). Manipulospatial aspects of cere
bral lateralization: Clues to origin of lateralization. Neuropsychologia, 15, 743–750.
Lenck-Santini, P.-P., Save, E., & Poucet, B. (2001). Evidence for a relationship between place-cell spatial firing and spatial memory performance. Hippocampus, 11, 377–390.
Livingstone, M. S., & Hubel, D. H. (1988). Segregation of form, color, movement, and
depth: Anatomy, physiology, and perception. Science, 240, 740–749.
Lomber, S. G., & Malhotra, S. (2008). Double dissociation of “what” and “where” process
ing in auditory cortex. Nature Neuroscience, 11, 609–616.
Ludvig, N., Tang, H. M., Gohil, B. C., & Botero, J. M. (2004). Detecting location-specific
neuronal firing rate increases in the hippocampus of freely moving monkeys. Brain Re
search, 1014, 97–109.
Luna, B., Thulborn, K. R., Strojwas, M. H., McCurtain, B. J., Berman, R. A., Genovese, C. R., et al. (1998). Dorsal cortical regions subserving visually guided saccades in humans: An fMRI study. Cerebral Cortex, 8 (1), 40–47.
Maguire, E. A., Burgess, N., & O’Keefe, J. (1999). Human spatial navigation: cognitive
maps, sexual dimorphism. Current Opinion in Neurobiology, 9, 171–177.
Maguire, E. A., Burke, T., Phillips, J., & Staunton, H. (1996). Topographical disorientation
following unilateral temporal lobe lesions in humans. Neuropsychologia, 34, 993–1001.
Maguire, E. A., Frackowiak, R. S., & Frith, C. D. (1997). Recalling routes around London:
Activation of the right hippocampus in taxi drivers. Journal of Neuroscience, 17, 7103–
7110.
Maguire, E. A., Frith, C. D., Burgess, N., Donnett, J. G., & O’Keefe, J. (1998). Knowing where things are: Parahippocampal involvement in encoding object locations in virtual large-scale space. Journal of Cognitive Neuroscience, 10, 61–76.
Mahon, B. Z., Milleville, S. C., Negri, G. A. L., Rumiati, R. I., Caramazza, A., & Martin, A.
(2007). Action-related properties shape object representations in the ventral stream. Neu
ron, 55, 507–520.
Malach, R., Levy, I., & Hasson, U. (2002). The topography of high-order human object ar
eas. Trends in Cognitive Science, 6, 176–184.
Malkova, L., & Mishkin, M. (2003). One-trial memory for object-place associations after separate lesions of hippocampus and posterior parahippocampal region in the monkey. Journal of Neuroscience, 23 (5), 1956–1965.
Maunsell, J. H. R., & Newsome, W. T. (1987). Visual processing in the monkey extrastriate
cortex. Annual Review of Neuroscience, 10, 363–401.
Medendorp, W. P., Goltz, H. C., Vilis, T., & Crawford, J. D. (2003). Gaze-centered updating
of visual space in human parietal cortex. Journal of Neuroscience, 23, 6209–6214.
Mennemeier, M., Wertman, E., & Heilman, K. M. (1992). Neglect of near peripersonal
space. Brain, 115, 37–50.
Menzel, R., Brandt, R., Gumbert, A., Komischke, B., & Kunze, J. (2000). Two spatial memories for honeybee navigation. Proceedings of the Royal Society of London, Biological Sciences, 267, 961–968.
Miller, G. A., & Johnson-Laird, P. N. (1976). Language and perception. Cambridge, MA:
Harvard University Press.
Milner, A. D., & Goodale, M. A. (1995). The visual brain in action. Oxford: Oxford Univer
sity Press.
Milner, A. D., & Goodale, M. A. (2006). The visual brain in action (2nd ed.). New York: Oxford University Press.
Milner, A. D., & Goodale, M. A. (2008). Two visual systems re-viewed. Neuropsychologia, 46, 774–785.
Milner, A. D., Perrett, D. I., Johnston, R. S., Benson, P. J., Jordan, T. R., Heeley, D. W., et al. (1991). Perception and action in “visual form agnosia.” Brain, 114, 405–428.
Mishkin, M., Ungerleider, L. G., & Macko, K. A. (1983). Object vision and spatial vision:
Two cortical pathways. Trends in Neurosciences, 6, 414–417.
Morel, A., & Bullier, J. (1990). Anatomical segregation of two cortical visual pathways in
the macaque monkey. Visual Neuroscience, 4, 555–578.
Moser, E. I., Kropff, E., & Moser, M.-B. (2008). Place cells, grid cells and the brain’s spa
tial representation system. Annual Review of Neuroscience, 31, 69–89.
Motter, B. C., & Mountcastle, V. B. (1981). The functional properties of light-sensitive neu
rons of the posterior parietal cortex studied in waking monkeys: Foveal sparing and oppo
nent vector organization. Journal of Neuroscience, 1, 3–26.
Mountcastle, V. B. (1995). The parietal system and some higher brain functions. Cerebral
Cortex, 5, 377–390.
Naganuma, T., Nose, I., Inoue, K., Takemoto, A., Katsuyama, N., & Taira, M. (2005). Infor
mation processing of geometrical features of a surface based on binocular disparity cues:
An fMRI study. Neuroscience Research, 51, 147–155.
Neggers, S. F. W., Van der Lubbe, R. H. J., Ramsey, N. F., & Postma, A. (2006). Interactions
between ego- and allocentric neuronal representations of space. NeuroImage, 31, 320–
331.
Newcombe, F., & Russell, W. R. (1969). Dissociated visual perceptual and spatial deficits
in focal lesions of the right hemisphere. Journal of Neurology, Neurosurgery, and Psychia
try, 32, 73–81.
Nguyen, B. T., Tran, T. D., Hoshiyama, M., Inui, K., & Kakigi, R. (2004). Face rep
Nitz, D. A. (2006). Tracking route progression in the posterior parietal cortex. Neuron, 49,
747–756.
O’Keefe, J. (1976). Place units in the hippocampus of the freely moving rat. Experimental
Neurology, 51, 78–109.
O’Keefe, J. (1996). The spatial prepositions in English, vector grammar, and the cognitive
map theory. In P. Bloom, M. A. Peterson, L. Nadel, & M. F. Garrett (Eds.), Language and
space (pp. 277–316). Cambridge, MA: The MIT Press.
O’Keefe, J. (2003). Vector grammar, places, and the functional role of spatial prepositions in English. In E. van der Zee & J. Slack (Eds.), Representing direction in language and space (pp. 69–85). Oxford, UK: Oxford University Press.
O’Keefe, J., Burgess, N., Donnett, J. G., Jeffery, K. J., & Maguire, E. A. (1998). Place cells, navigational accuracy, and the human hippocampus. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 353, 1333–1340.
O’Keefe, J., & Nadel, L. (1978). The hippocampus as a cognitive map. Oxford, UK: Clarendon Press.
Okubo, M., Laeng, B., Saneyoshi, A., & Michimata, C. (2010). Exogenous attention differ
entially modulates the processing of categorical and coordinate spatial relations. Acta
Psychologica, 135, 1–11.
Olson, C. R., & Gettner, S. N. (1995). Object-centered direction selectivity in the macaque
supplementary eye field. Science, 269, 985–988.
O’Regan, J. K., & Noë, A. (2001). A sensorimotor account of vision and visual consciousness. Behavioral and Brain Sciences, 24, 939–1011.
O’Reilly, R. C., Kosslyn, S. M., Marsolek, C. J., & Chabris, C. F. (1990). Receptive field
characteristics that allow parietal lobe neurons to encode spatial properties of visual in
put: A computational analysis. Journal of Cognitive Neuroscience, 2, 141–155.
Otto, I., Grandguillaume, P., Boutkhil, L., & Guigon, E. (1992). Direct and indirect cooper
ation between temporal and parietal networks for invariant visual recognition. Journal of
Cognitive Neuroscience, 4, 35–57.
Paillard, J. (1991). Motor and representational framing of space. In J. Paillard (Ed.), Brain
and space (pp. 163–182). Oxford, UK: Oxford University Press.
Palermo, L., Bureca, I., Matano, A., & Guariglia, C. (2008). Hemispheric contribution to
categorical and coordinate representational processes: A study on brain-damaged pa
tients. Neuropsychologia, 46, 2802–2807.
Parsons, L. M. (2003). Superior parietal cortices and varieties of mental rotation. Trends
in Cognitive Sciences, 7, 515–517.
Perenin, M. T., & Jeannerod, M. (1978). Visual function within the hemianopic field following early cerebral hemidecortication in man. I. Spatial localization. Neuropsychologia, 16, 1–13.
Perenin, M.-T., & Vighetto, A. (1988). Optic ataxia: A specific disruption in visuomotor mechanisms. Brain, 111, 643–674.
Piaget, J., & Inhelder, B. (1956). The child’s conception of space. London: Routledge & Kegan Paul.
Pinker, S. (2007). The stuff of thought: Language as a window into human nature. New
York: Penguin Books.
Poucet, B. (1993). Spatial cognitive maps in animals: New hypotheses on their structure
and neural mechanisms. Psychological Review, 100, 163–182.
Pouget, A., & Sejnowski, T. J. (1997). A new view of hemineglect based on the response
properties of parietal neurones. Philosophical Transactions of the Royal Society, Series B,
Biological Sciences, 352, 1449–1459.
Quinlan, D. J., & Culham, J. C. (2007). fMRI reveals a preference for near viewing in the
human parietal-occipital cortex. NeuroImage, 36, 167–187.
Rao, S. C., Rainer, G., & Miller, E. K. (1997). Integration of what and where in the primate
prefrontal cortex. Science, 276, 821–824.
Reed, C. L., Klatzky, R. L., & Halgren, E. (2005). What vs. where in touch: An fMRI study.
NeuroImage, 25, 718–726.
Revonsuo, A., & Newman, J. (1999). Binding and consciousness. Consciousness and Cog
nition, 8, 123–127.
Rizzolatti, G., & Craighero, L. (2004). The mirror-neuron system. Annual Review of Neu
roscience, 27, 169–192.
Rizzolatti, G., & Matelli, M. (2003). Two different streams form the dorsal visual system:
Anatomy and functions. Experimental Brain Research, 153, 146–157.
Robertson, L. C. (2003). Binding, spatial attention and perceptual awareness. Nature Re
views: Neuroscience, 4, 93–102.
Robertson, L. C., Treisman, A., Friedman-Hill, S., & Grabowecky, M. (1997). The interaction of spatial and object pathways: Evidence from Bálint’s syndrome. Journal of Cognitive Neuroscience, 9, 295–317.
Rogers, J. L., & Kesner, R. P. (2006). Lesions of the dorsal hippocampus or parietal cortex
differentially affect spatial information processing. Behavioral Neuroscience, 120, 852–
860.
Rogers, L. J., Zucca, P., & Vallortigara, G. (2004). Advantage of having a lateralized brain.
Proceedings of the Royal Society of London B (Suppl.): Biology Letters, 271, 420–422.
Rolls, E. T., Miyashita, Y., Cahusac, P. M. B., Kesner, R. P., Niki, H., Feigenbaum, J., et al.
(1989). Hippocampal neurons in the monkey with activity related to the place in which a
stimulus is shown. Journal of Neuroscience, 9, 1835–1845.
Romanski, L. M., Tian, B., Fritz, J., Mishkin, M., Goldman-Rakic, P. S., & Rauschecker, J. P.
(1999). Dual streams of auditory afferents target multiple domains in the primate pre
frontal cortex. Nature Neuroscience, 2, 1131–1136.
Roth, E. C., & Hellige, J. B. (1998). Spatial processing and hemispheric asymmetry: Con
tributions of the transient/magnocellular visual system. Journal of Cognitive
Neuroscience, 10, 472–484.
Rueckl, J. G., Cave, K. R., & Kosslyn, S. M. (1989). Why are ‘what’ and ‘where’ processed
by separate visual systems? A computational investigation. Journal of Cognitive Neuro
science, 1 (2), 171–186.
Rybash, J. M., & Hoyer, W. J. (1992). Hemispheric specialization for categorical and
Sakata, H., Taira, M., Kusunoki, M., Murata, A., & Tanaka, Y. (1997). The parietal associa
tion cortex in depth perception and visual control of hand action. Trends in Neuroscience,
20, 350–357.
Save, E., & Poucet, B. (2000). Hippocampal-parietal cortical interactions in spatial cogni
tion. Hippocampus, 10, 491–499.
Schindler, I., Rice, N. J., McIntosh, R. D., Rossetti, Y., Vighetto, A., & Milner, A. D. (2004).
Automatic avoidance of obstacles is a dorsal stream function: Evidence from optic ataxia.
Nature Neuroscience, 7, 779–784.
Seltzer, B., & Pandya, D. N. (1978). Afferent cortical connections and architectonics of the
superior temporal sulcus and surrounding cortex in the rhesus monkey. Brain Research,
149, 1–24.
Semmes, J., Weinstein, S., Ghent, L., & Teuber, H. L. (1955). Spatial orientation in man af
ter cerebral injury: I. Analyses by locus of lesion. Journal of Psychology, 39, 227–244.
Sereno, A. B., & Maunsell, J. H. R. (1998). Shape selectivities in primate lateral intrapari
etal cortex. Nature, 395, 500–503.
Sereno, M. I., Dale, A. M., Reppas, J. B., Kwong, K. K., Belliveau, J. W., et al. (1995). Bor
ders of multiple visual areas in humans revealed by functional magnetic resonance imag
ing. Science, 268, 889–893.
Sereno, M. I., & Huang, R.-S. (2006). A human parietal face area contains head-centered
visual and tactile maps. Nature Neuroscience, 9, 1337–1343.
Sereno, M. I., Pitzalis, S., & Martinez, A. (2001). Mapping of contralateral space in retino
topic coordinates by a parietal cortical area in humans. Science, 294, 1350–1354.
Servos, P., Engel, S. A., Gati, J., & Menon, R. (1999). fMRI evidence for an inverted face representation in human somatosensory cortex. Neuroreport, 10, 1393–1395.
Seth, A. K., McKinstry, J. L., Edelman, G. M., & Krichmar, J. L. (2004). Visual binding
through reentrant connectivity and dynamic synchronization in a brain-based device.
Cerebral Cortex, 14, 1185–1199.
Shelton, A. L., & McNamara, T. P. (2001). Systems of spatial reference in human memory. Cognitive Psychology, 43, 274–310.
Shelton, P. A., Bowers, D., & Heilman, K. M. (1990). Peripersonal and vertical neglect. Brain, 113, 191–205.
Shrager, Y., Kirwan, C. B., & Squire, L. R. (2008). Neural basis of the cognitive map: Path
integration does not require hippocampus or entorhinal cortex. Proceedings of the Na
tional Academy of Sciences U S A, 105, 12034–12038.
Silver, M., & Kastner, S. (2009). Topographic maps in human frontal and parietal cortex.
Trends in Cognitive Sciences, 11, 488–495.
Page 54 of 59
Representation of Spatial Relations
Slotnick, S. D., & Moo, L. R. (2006). Prefrontal cortex hemispheric specialization for cate
gorical and coordinate visual spatial memory. Neuropsychologia, 44, 1560–1568.
Slotnick, S. D., Moo, L., Tesoro, M. A., & Hart, J. (2001). Hemispheric asymmetry in cate
gorical versus coordinate visuospatial processing revealed by temporary cortical deacti
vation. Journal of Cognitive Neuroscience, 13, 1088–1096.
Smallman, H. S., MacLeod, D. I. A., He, S., & Kentridge, R. W. (1996). Fine grain of the
neural representation of human spatial vision. Journal of Neuroscience, 76 (5), 1852–
1859.
Smith, E. E., Jonides, J., Koeppe, R. A., Awh, E., Schumacher, E., & Minoshima, S. (1995).
Spatial vs. object working memory: PET investigations. Journal of Cognitive
Neuroscience, 7, 337–358.
Snyder, L. H., Batista, A. P., & Andersen, R. A. (1997). Coding of intention in the posterior
parietal cortex. Nature, 386, 167–170.
Solstad, T., Boccara, C. N., Kropff, E., Moser, M.-B., & Moser, E. I. (2008). Representation
of geometric borders in the entorhinal cortex. Science, 322, 1865–1868.
Sovrano, V. A., Dadda, M., Bisazza, A. (2005). Lateralized fish perform better than nonlat
eralized fish in spatial reorientation tasks. Behavioural Brain Research, 163, 122–127.
Stark, M., Coslett, H. B., & Saffran, E. M. (1996). Impairment of an egocentric map of lo
cations: Implications for perception and action. Cognitive Neuropsychology, 13, 481–523.
Sutherland, R. J., & Rudy, J. W. (1988) Configural association theory: The role of the hip
pocampal formation in learning, memory and amnesia. Psychobiology, 16, 157–163.
Tanaka, K., Saito, H., Fukada, Y., & Moriya, M. (1991) Coding visual images of objects in
the inferotemporal cortex of the macaque monkey. Journal of Neurophysiology, 66, 170–
189.
Taira, M., Mine, S., Georgopoulos, A. P., Murata, A., & Sakata, H. (1990). Parietal cortex
neurons of the monkey related to the visual guidance of hand movement. Experimental
Brain Research, 83, 29–36.
Takahashi, N., Kawamura, M., Shiota, J., Kasahata, N., & Hirayama, K. (1997). Pure topo
graphic disorientation due to right retrosplenial lesion. Neurology, 49, 464–469.
Page 55 of 59
Representation of Spatial Relations
Thiebaut de Schotten, M., Urbanski, M., Duffau, H., Volle, E., Levy, R., Dubois, B., & Bar
tolomeo, P. (2005). Direct evidence for a parietal-frontal pathway subserving spatial
awareness in humans. Science, 309, 2226–2228.
Thivierge, J.-P., & Marcus, G. (2007). The topographic brain: From neural connectivity to
cognition. Trends in Neuroscience, 30, 251–259.
Tipper, S.P., & Behrmann, M. (1996). Object-centered not scene based visual neglect.
Journal of Experimental Psychology: Human Perception and Performance, 22, 1261–1278.
Tolman, E. C. (1948) Cognitive maps in rats and men. Psychological Review, 55, 189–208.
Tootell, R. B., Hadjikhani, N., Hall, E. K., Marrett, S., Vanduffel, W., Vaughan, J.T., & Dale,
A. M. (1998). The retinotopy of visual spatial attention. Neuron, 21, 1409–1422.
Tootell, R. B., Hadjikhani, N., Vanduffel, W., Liu, A. K., Mendola, J. D., Sereno, M. I., &
Dale, A. M. (1998). Functional analysis of primary visual cortex (V1) in humans. Proceed
ings of the National Academy of Sciences U S A, 95, 811–817.
Tootell, R. B., Mendola, J. D., Hadjikhani, N., Liu, A. K., & Dale, A. M. (1998). The repre
sentation of the ipsilateral visual field in human cerebral cortex. Proceedings of the Na
tional Academy of Sciences U S A, 95, 818–824.
Tootell, R. B., Silverman, M. S., Switkes, E., & DeValois, R. L. (1982). Deoxyglucose analy
sis of retinotopic organization in primate striate cortex. Science, 218, 902–904.
Trojano, L., Conson, M., Maffei, R., & Grossi, D. (2006). Categorical and coordinate spa
tial processing in the imagery domain investigated by rTMS. Neuropsychologia, 44, 1569–
1574.
Trojano, L., Grossi, D., Linden, D. E. J., Formisano, E., Goebel, R., & Cirillo, S. (2002). Co
ordinate and categorical judgements in spatial imagery: An fMRI study. Neuropsychologia,
40, 1666–1674.
Tsal, Y., & Bareket, T. (2005). Localization judgments under various levels of attention.
Psychonomic Bulletin & Review, 12 (3), 559–566.
Page 56 of 59
Representation of Spatial Relations
Tsal, Y., Meiran, N., & Lamy, D. (1995). Toward a resolution theory of visual attention. Vi
sual Cognition, 2, 313–330.
Tsao, D. Y., Vanduffel, W., Sasaki, Y., Fize, D., Knutsen, T. A., Mandeville, J. B., Wald, L. L.,
Dale, A. M., Rosen, B. R., Van Essen, D. C., Livingstone, M. S., Orban, G. A., & Tootell, R.
B. H. (2003). Stereopsis activates V3A and caudal intraparietal areas in macaques and
humans. Neuron, 31, 555–568.
Tsutsui, K.-I., Sakata, H., Naganuma, T., & Taira, M. (2002). Neural correlates for percep
tion of 3D surface orientation from texture gradients. Science, 298, 409–412.
Turnbull, O. H., Beschin, N., & Della Sala, S. (1997). Agnosia for object orientation: Impli
cations for theories of object recognition. Neuropsychologia, 35, 153–163.
Turnbull, O. H., Laws, K. R., & McCarthy, R. A. (1995). Object recognition without knowl
edge of object orientation. Cortex, 31, 387–395.
Ungerleider, L. G., & Haxby, J. V. (1994). “What” and “where” in the human brain. Current
Opinion in Neurobiology, 4, 157–165.
Ungerleider, L. G., & Mishkin, M. (1982). Two cortical visual systems. In D. J. Ingle, M. A.
Goodale, & R. J. Mansfield (Eds.), Analysis of visual behavior (pp. 549–586). Cambridge,
MA: MIT Press.
Vallar, G., Bottini, G., & Paulesu, E. (2003). Neglect syndromes: the role of the parietal
cortex. Advances in Neurology, 93, 293–319.
Vallortigara, G., Cozzutti, C., Tommasi, L., & Rogers, L. J. (2001) How birds use their eyes:
Opposite left-right specialisation for the lateral and frontal visual hemifield in the domes
tic chick. Current Biology, 11, 29–33.
Vallortigara, G., & Rogers, L. J. (2005). Survival with an asymmetrical brain: advantages
and disadvantages of cerebral lateralization. Behavioral and Brain Sciences, 28, 575–589.
Valyear, K. F., Culham, J. C., Sharif, N., Westwood, D., & Goodale, M. A. (2005). A double
dissociation between sensitivity to changes in object identity and object orientation in the
ventral and dorsal visual streams: A human fMRI study. Neuropsychologia, 44, 218–228.
Vauclair, J., Yamazaki, Y., & Güntürkün, O. (2006). The study of hemispheric specialization
for categorical and coordinate spatial relations in animals. Neuropsychologia, 44, 1524–
1534.
Wandell, B.A., Brewer, A.A., & Dougherty, R. F. (2005). Visual field map clusters in human
cortex. Philosophical Transactions of the Royal Society, B, 360, 693–707.
Page 57 of 59
Representation of Spatial Relations
Wandell, B. A., Dumoulin, S. O., & Brewer, A. A. (2009). Visual cortex in humans. In L.
Squire (Ed.), Encyclopedia of neuroscience (pp. 251–257). Elsevier: Network Version.
Wang, B., Zhou, T. G., Zhuo, Y., & Chen, L. (2007). Global topological dominance in the
left hemisphere. Proceedings of the National Academy of Sciences U S A, 104, 21014–
21019.
Warrington, E. K., & James, A. M. (1986). Visual object recognition in patients with right-
hemisphere lesions: Axes or features. Perception, 15, 355–366.
Warrington, E. K., & Taylor, A. M. (1973). The contribution of the right parietal lobe to ob
ject recognition. Cortex, 9, 152–164.
Waszak, F., Drewing, K., & Mausfeld, R. (2005). Viewerexternal frames of reference in the
mental transformation of 3-D objects. Perception & Psychophysics, 67, 1269–1279.
Weintraub, S., & Mesulam, M.-M. (1987). Right cerebral dominance in spatial attention:
Further evidence based on ipsilateral neglect. Archives of Neurology, 44, 621–625.
Weiskrantz, L. (1986). Blindsight: A Case Study and Implications. Oxford: Oxford Univer
sity Press.
Wilson, F. A., Scalaidhe, S. P., & Goldman-Rakic, P. S. (1993). Dissociation of object and
spatial processing domains in primate prefrontal cortex. Science, 260, 1955–1958.
Wilson, M. A., & McNaughton, B. L. (1993). Dynamics of the hippocampal ensemble code
for space. Science, 261, 1055–1058.
Wolbers, T., & Hegarty, M. (2010). What determines our navigational abilities? Trends in
Cognitive Sciences, 14 (3), 138–146.
Wolbers, T., Wiener, J. M., Mallot, H. A., & Bűchel, C. (2007). Differential recruitment of
the hippocampus, medial prefrontal vortex, and the human motion complex during path
integration in humans. Journal of Neuroscience, 27, 9408–9416.
Xu, F., & Carey, S. (1996). Infants’ metaphysics: The case of numerical identity. Cognitive
Psychology, 30, 111–153.
Page 58 of 59
Representation of Spatial Relations
Yang, T. T., Gallen, C. C., Schwartz, B. J., & Bloom, F. E. (1994). Noninvasive somatosenso
ry homunculus mapping in humans by using a large-array biomagnetometer. Proceedings
of the National Academy of Sciences U S A, 90, 3098–3102.
Zacks, J. M., Gilliam, F., & Ojemann, J. G. (2003). Selective disturbance of mental rotation
by cortical stimulation. Neuropsychologia, 41, 1659–1667.
Zeki, S. (2001). Localization and globalization in conscious vision. Annual Review of Neu
roscience, 24, 57–86.
Zeki, S., Watson, J. D. G., Luexk, C. J., Friston, K. J., Kennard, C., & Frackowiak, R. S. J.
(1991). A direct demonstration of functional specialization in human visual cortex. Journal
of Neuroscience, 11, 641–649.
Zipser, D., & Andersen, R. A. (1988). A back-propagation programmed network that simu
lates response properties of a subset of posterior parietal neurons. Nature, 331, 679–684.
Bruno Laeng
Page 59 of 59
Top-Down Effects in Visual Perception
The traditional belief that our perception is determined by sensory input is an illusion.
Empirical findings and theoretical ideas from recent years indicate that our experience,
memory, expectations, goals, and desires can substantially affect the appearance of the
visual information that is in front of us. Indeed, our perception is shaped as much by the
incoming bottom-up information that captures the objective appearance of the world
around us as by the previous knowledge and personal characteristics that constitute
different sources of top-down perceptual influence. The present chapter focuses on such
feedback effects in visual perception, and describes how the interplay between prefrontal
and posterior visual cortex underlies the mechanisms through which top-down predic
tions guide visual processing. One major conclusion is that perception and cognition are
more interactive than typically thought, and any artificial boundary between them may be
misleading.
Keywords: top-down, feedback, predictions, expectations, perception, visual cortex, prefrontal cortex
Introduction
The importance of top-down effects in visual perception is now widely acknowledged.
There is nothing speculative in recognizing the fact that our perception is influenced by
our previous experiences, expectations, emotions, or motivation. Research in cognitive
neuroscience has revealed how such factors systematically influence the processing of
signals that originate from our retina or other sensory organs. Nevertheless, most
textbooks still view percepts as objective reflections of the external world. According to
this view, when placed in a particular environment, we all principally see the same things,
but may mutually differ with respect to postperceptual, or higher level, cognitive process
es. Thus, perception is seen as exclusively determined by the external sensory, so-called
bottom-up, inputs. And although the importance of the constant exchange between
incoming sensory data and existing knowledge used to postulate hypotheses about
sensations was occasionally recognized in the early days of cognitive psychology
(MacKay, 1956), top-down effects have only rarely been considered as important as
bottom-up factors (Gregory, 1980; Koffka, 1935).
Although increased emphasis has been placed on top-down effects in perception within
modern cognitive neuroscience theories and approaches, it is important to note that these
are mostly treated as external, modulatory effects. This view incorporates an implicit and
rather strong assumption according to which top-down effects represent secondary phe
nomena whose occurrence is not mandatory, but rather is contingent on environmental
conditions such as the level of visibility or ambiguity. As such, they are even discussed as
exclusively attentional or higher-level executive effects that may influence and bias
visual perception, without necessarily constituting one of its inherent, core elements.
This chapter reviews a more comprehensive outlook on visual perception. In this account,
an interpretation of the visual world cannot be achieved by relying solely on bottom-up
information; rather, it emerges from the integration of external information with preexist
ing knowledge. Several sources of such knowledge that trigger different types of top-
down biases and may modulate and guide visual processing are described here. First, we
discuss how our brain, when presented with a visual object, employs a proactive strategy
of testing multiple hypotheses regarding its identity. Another source of early top-down fa
cilitatory bias derived from the presented object concerns its potential emotional salience
that is also extracted during the early phases of visual processing. At the same time, a
comparable associative and proactive strategy is used for formulating and testing hy
potheses based on the context in which the object is presented and on other items likely
to accompany it. Finally, factors outside the currently presented input such as task de
mands, previously encountered events, or the behavioral context may be used as addition
al sources of top-down influences in visual recognition. In conclusion, although triggered
and constrained by the incoming bottom-up signals, visual perception is strongly guided
and shaped by top-down mechanisms whose origin, temporal dynamics, and neural basis
are reviewed and synthesized here.
In our everyday life, we are rarely presented with clearly and fully perceptible visual in
formation, and we rarely approach our environment without certain predictions. Even
highly familiar objects are constantly encountered in different circumstances that greatly
vary in terms of lighting, occlusions, and other parameters. Naturally, all of these varia
tions limit our ability to structure and interpret the presented scene in a meaningful man
ner without relying on previous experience. Therefore, outside the laboratory, successful
recognition has to rely not only on the immediately available visual input but also on dif
Before reviewing specific instances in which top-down mechanisms are of prominent rele
vance, it is important to mention that the phrase “top-down modulation” has typically
been utilized in several different ways in the literature. Engel et al. (2001) summarize
four variants of its use: anatomical (equating top-down influences with the feedback activ
ity in a processing hierarchy), cognitivist (equating top-down influences with hypothesis-
or expectation-driven processing), perceptual or gestaltist (equating top-down influences
with contextual modulation of perceptual items), and dynamicist (equating top-down in
fluences with the enslavement of local processing elements by large-scale neuronal dy
namics). Although separable, these variants are in some cases partly overlapping
(e.g., it may be hard to clearly separate cognitivist and perceptual definitions) or mutually
complementary (e.g., cognitivist influences may be mediated by anatomical or dynamicist
processing mechanisms). A more general definition might specify top-down influences as
instances in which complex types of information represented at higher processing stages
influence simpler processing occurring at earlier stages (Gilbert & Sigman, 2007). How
ever, even this definition might be problematic in some instances because the term “com
plexity of information” can prove to be hard to specify, or even to apply in some instances
of top-down modulations. Nevertheless, bringing different definitions and processing lev
els together, it is possible to suggest, as a general principle, that bottom-up information flow is sup
ported by feedforward connections that transfer information from lower to higher level
regions, in contrast to feedback connections that transmit the signals originating from
the higher level areas downstream within the processing hierarchy. Feedforward connec
tions are typically considered to be those that originate from supragranular layers and
terminate in and around the fourth cortical layer, in contrast to feedback connections that
originate in infragranular layers and terminate outside the fourth layer (Felleman & Van Essen, 1991; Fris
ton, 2005; Maunsell & Van Essen, 1983; Mumford, 1992; Shipp, 2005). Further
more, feedforward connections are relatively focused and tend to project to a smaller
number of mostly neighboring regions in the processing hierarchy. In contrast, feedback
connections are more diffuse because they innervate many regions and form wider con
nection patterns within them (Shipp & Zeki, 1989; Ungerleider & Desimone, 1986). Con
sequently, feedforward connections carry the main excitatory input to cortical neurons
and are considered driving connections, unlike the feedback ones, which typically have
modulatory effects (Buchel & Friston, 1997; Girard & Bullier, 1989; Salin & Bullier, 1995;
Sherman & Guillery, 2004). For example, feedback connections potentiate responses of
low-order areas by nonlinear mechanisms such as gain control (Bullier et al., 2001), in
crease the synchronization in lower order areas (Munk et al., 1995), and contribute to at
tentional processing (Gilbert et al., 2000). Mumford (1992) has argued that feedback and
feedforward connections have to be considered of equal importance, emphasizing that
most cognitive processes reflect a balanced exchange of information between pairs of
brain regions that are often hierarchically asymmetrical. However, the equal importance
does not imply equal functionality: as suggested by Lamme et al. (1998), feedforward con
nections may be fast, but they are not necessarily linked with perceptual experience. In
stead, attentive vision and visual awareness, which may in everyday life introspectively be
linked to a conscious perceptual experience, arise from recurrent processing within the
hierarchy (Lamme & Roelfsema, 2000).
er object features (Bar, 2003, 2007; Kveraga et al., 2007b). One proposal is that percep
tion, even when single objects are presented in isolation, progresses through several
phases that constitute an activate-predict-confirm perception cycle (Enns & Lleras, 2008).
Explicitly or implicitly, this view is in accordance with predictive theories of cortical pro
cessing (Bar, 2003; Friston, 2005; Grossberg, 1980; Mumford, 1992; Rao & Ballard, 1999;
Ullman, 1995) that were developed in attempts to elucidate the mechanisms of iterative
recurrent cortical processing underlying successful cognition. Their advancement was
strongly motivated by the increase in knowledge regarding the structural and functional
properties of feedback and feedforward connections described earlier, as well as the
posited distinctions between forward and inverse models in computer vision (Bal
lard et al., 1983; Friston, 2005; Kawato et al., 1993). Based on these developments, pre
dictive theories suggested that different cortical systems share a common mechanism of
constant formulation and communication of expectations and other top-down biases from
higher to lower level cortical areas, which thereby become primed for the anticipated
events. This allows the input that arrives at these presensitized areas to be compared and
integrated with the postulated expectations, perhaps through specific synchrony patterns
visible across different levels of the hierarchy (Engel et al., 2001; Ghuman et al., 2008;
Kveraga et al., 2007b; von Stein et al., 2000; von Stein & Sarnthein, 2000). Such compari
son and integration of top-down and bottom-up information has been posited to rely on it
erative error-minimization mechanisms (Friston, 2005; Grossberg, 1980; Kveraga et al.,
2007b; Mumford, 1992; Ullman, 1995) that support successful cognitive functioning. With
respect to visual processing, this means that an object can be recognized once a match
between the prepostulated hypothesis and sensory representation is reached, such that
no significant difference exists between the two. As mentioned before, this implies that
feedforward pathways carry error signals, or information regarding the residual discrep
ancy between predicted and actual events (Friston, 2005; Mumford, 1992; Rao & Ballard,
1999). This suggestion is highly important because it posits a privileged status for errors
of prediction in cortical processing. These events require more pronounced and pro
longed analysis because they typically signal inadequacy of the preexisting knowledge for
efficient functioning. Consequently, in addition to providing a powerful signal for novelty
detection, discrepancy signals often trigger a reevaluation of current knowledge, new
learning, or a change in behavior (Corbetta et al., 2002; Escera et al., 2000; Schultz &
Dickinson, 2000; Winkler & Czigler, 1998). In contrast, events that are predicted correct
ly typically carry little informational value (because of successful learning, we expected
these to occur all along) and are therefore processed in a more efficient manner (i.e.,
faster and more accurately) than the unexpected or wrongly predicted ones. Although the
described conceptualization of the general dynamics of iterative recurrent processing
across different levels of cortical hierarchies is shared by most predictive theories of
neural processing, these theories differ with respect to their conceptualizations of more
specific features of such processing (e.g., the level of abstraction of the top-down mediat
ed templates or the existence of information exchange outside neighboring levels of corti
cal hierarchies; cf. Bar, 2003; Kveraga et al., 2007b; Mumford, 1992).
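The computational core these predictive theories share, iteratively revising a top-down hypothesis until the feedforward error signal vanishes, can be sketched in a few lines. The following is a deliberately minimal toy simulation in the spirit of Rao and Ballard (1999); the linear generative model, the dimensions, and the learning rate are illustrative assumptions, not details taken from any of the cited accounts.

```python
import numpy as np

rng = np.random.default_rng(0)

# A "higher-level" representation r predicts the sensory input via the
# generative (feedback) weights W; the feedforward pathway carries only
# the residual prediction error e = x - W @ r.
n_input, n_latent = 16, 4
W = rng.normal(size=(n_input, n_latent))   # feedback weights (assumed fixed)
x = W @ rng.normal(size=n_latent)          # a sensory input W can explain

r = np.zeros(n_latent)                     # initial top-down hypothesis
errors = []
for _ in range(300):
    e = x - W @ r                          # prediction error (feedforward signal)
    r += 0.02 * W.T @ e                    # revise the hypothesis to shrink the error
    errors.append(float(np.linalg.norm(e)))

# Recognition as successful prediction: the hypothesis comes to match the
# input, and the error signal carried forward approaches zero.
print(f"initial error {errors[0]:.3f}, final error {errors[-1]:.6f}")
```

Correctly predicted inputs leave almost no residual to process, whereas a surprising input would reinstate a large error signal, consistent with the privileged status the text assigns to prediction errors.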
One of the most influential models that highlight the relevance of recurrent connections
and top-down feedback in visual processing is the “global-to-local” integrated model of vi
sual processing of Bullier (2001). This model builds on our knowledge of the anatomy and
functionality of the visual system, especially the differences between magnocellular and
parvocellular visual pathways that carry the visual information from the retina to the
brain. Specifically, it takes into account findings showing that the magnocellular and par
vocellular pathways are separated over the first few processing stages and that, following
stimulus presentation, area V1 receives activation from lateral geniculate nucleus magno
cellular neurons around 20 ms earlier than from parvocellular neurons. This faster activa
tion of the M-channel (characterized by high contrast sensitivity but poor chromatic sensitivity, larg
er receptive fields, and lower spatial sampling rate), together with the high degree of
myelination, could account for the early activation of the dorsal visual processing stream
after visual stimulation, which enables the generation of fast feedback connections from
higher to lower areas (V1 and V2) at exactly the time when feedforward activation from
the P-channel arrives (Bullier, 2001; Kaplan, 2004; Kaplan & Shapley, 1986; Merigan &
Maunsell, 1993; Schiller & Malpeli, 1978; Goodale & Milner, 1992). This view is very dif
ferent not only from the classic theories that emphasize the importance of feedforward
connections but also from the more “traditional” account of feedback connections stating
that, regardless of the overall importance of recurrent connections, the first sweep of ac
tivity through the hierarchy of (both dorsal and ventral) visual areas is still primarily de
termined by the pattern of feedforward connections (Thorpe & Imbert, 1989). The inte
grated model of visual processing treats areas V1 and V2 as general-purpose
representational stages that integrate computations from higher levels, allowing global
information to influence the processing of finer detail. V1 could be a place for perceptual integration
that reunites information returned from different higher level areas by feedback connec
tions after being divided during the first activity sweep for a very simple reason: This cor
tical area still has a high-resolution map of almost all relevant feature information. This
view resonates well with recent theories of visual processing and awareness, such as
Zeki’s theory of visual consciousness (Zeki & Bartels, 1999), Hochstein and
Ahissar’s theory of perceptual processing and learning (Hochstein & Ahissar, 2002), or
Lamme’s (2006) views on consciousness.
Another model that suggests a similar division of labor and offers a predictive view of vi
sual processing in object recognition was put forth by Bar and colleagues (Bar, 2003,
2007; Bar et al., 2006; Kveraga et al., 2007a). This model identifies the information con
tent that triggers top-down hypotheses and characterizes the exact cortical regions that
bias visual processing by formulating and communicating those top-down predictions to
lower-level cortical areas. It also explains how top-down information modulates bottom-up
processing in situations in which single objects are presented in isolation and in which
prediction is driven by a subset of an object’s own properties that facilitate the recogni
tion of the object itself as well as other objects it typically appears with. Specifically, this
top-down facilitation model posits that visual recognition progresses from an initial stage
aimed at determining what an object resembles to a later stage geared toward specify
ing its fine details. For this to occur, the top-down facilitation model critically assumes
that different features of the presented stimulus are processed at different processing
stages. First, the coarse, global information regarding the object shape is rapidly extract
ed and used for activating in memory existing representations that most resemble the
global properties of the given object to be recognized. These include all objects that share
the rudimentary properties of the presented object and look highly similar if viewed in
blurred or decontextualized circumstances (e.g., a mushroom, desk lamp, and umbrella).
Although an object cannot be identified with full certainty based on coarse stimulus out
line, such rapidly extracted information is still highly useful for basic-level recognition of
resemblance, creating so-called analogies (Bar, 2007). Generally, analogies allow the
formulation of links between the presented input and relevant preexisting representa
tions in memory, which may be based on different types of similarity between the two
(e.g., perceptual, semantic, functional, or conceptual). In the context of visual recogni
tion, analogies are based on global perceptual similarity between the input object and ob
jects in memory, which allows the brain to generate multiple hypotheses or guesses re
garding the object’s most likely identity.
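How a coarse, low-spatial-frequency version of an input can keep several candidate identities active at once can be illustrated with a toy computation. The 8×8 "objects", the block-averaging low-pass filter, and the similarity score below are all invented stand-ins for the M-pathway analysis the text describes, not actual stimuli or a published model.

```python
import numpy as np

def lowpass(img, block=4):
    """Crude low-pass filter: average over block x block tiles."""
    h, w = img.shape
    return img.reshape(h // block, block, w // block, block).mean(axis=(1, 3))

def score(a, b):
    return -np.abs(a - b).sum()            # higher = more similar

def spread(scores):
    return max(scores.values()) - min(scores.values())

# Three stored templates: a filled blob, a hollow square, and a cross.
blob   = np.zeros((8, 8)); blob[2:6, 2:6] = 1
square = np.zeros((8, 8)); square[1:7, 1:7] = 1; square[2:6, 2:6] = 0
cross  = np.zeros((8, 8)); cross[3:5, :] = 1; cross[:, 3:5] = 1
memory = {"blob": blob, "square": square, "cross": cross}

view = blob + 0.1                           # a degraded view of the blob

coarse = {n: score(lowpass(view), lowpass(t)) for n, t in memory.items()}
fine   = {n: score(view, t) for n, t in memory.items()}

# At the coarse level all templates score about equally well, so several
# hypotheses remain active; the fine level clearly singles one out.
print(f"coarse spread {spread(coarse):.2f} vs fine spread {spread(fine):.2f}")
print("identity confirmed at the fine level:", max(fine, key=fine.get))
```

The narrow range of coarse-level scores is the point: blurred, the candidates are nearly interchangeable, so multiple hypotheses must be carried forward until finer information arrives.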
For these analogies and initial guesses to be useful, they need to be triggered early dur
ing visual processing, while they can still bias the slower incoming bottom-up input. Thus,
information regarding the coarse stimulus properties has to be processed first, before the
finer object features. Indeed, it has been suggested that such information is rapidly ex
tracted and conveyed using low spatial frequencies of the visual input through the mag
nocellular (M) pathway, which is ideal for transmitting coarse information regarding the
general object outlines at higher velocity compared with other pathways (Merigan &
Maunsell, 1993; Schiller & Malpeli, 1978). This information is transmitted to the or
bitofrontal cortex (OFC) (Bar, 2003; Bar et al., 2006), a polymodal region implicated pri
marily in the processing of rewards and affect (Barbas, 2000; Carmichael & Price, 1995;
Cavada et al., 2000; Kringelbach & Rolls, 2004), as well as supporting some aspects of vi
sual processing (Bar et al., 2006; Freedman et al., 2003; Frey & Petrides, 2000; Meunier
et al., 1997; Rolls et al., 2005; Thorpe et al., 1983). In the present framework, the OFC is
suggested to generate predictions regarding the object’s identity by activating all repre
sentations that share the global appearance with the presented image by relying on
rapidly analyzed low spatial frequencies (Bar, 2003, 2007). Once fed back into the ventral
processing stream, these predictions interact with the slower incoming bottom-up infor
mation and facilitate the recognition of the presented object. Experimental findings cor
roborate the hypothesis regarding the relevance of the M-pathway for transmitting
coarse visual information (Bar, 2003; Bar et al., 2006; Kveraga et al., 2007b), as well as
the importance of OFC activity, and the interaction between the OFC and inferior tempo
ral cortex, for recognizing visual objects (Bar et al., 2001, 2006). It has also been suggest
ed that low spatial frequency information is used for generating predictions regarding
other objects and events that are likely to be encountered in the same context (Bar, 2004;
Oliva & Torralba, 2007). As also suggested by Ganis and Kosslyn (2007), the associative
nature of our long-term memory plays a crucial role in matching the presented input with
preexisting representations and all associated information relevant for object identifica
tion. Finally, as suggested by Barrett and Bar (2009), different portions of the OFC are al
Although the number of different association types shared by the two objects may be of
some relevance, it is mainly the frequency and consistency of their mutual co-occurrence
that determines the strength of their associative connections (Bar, 2004; Biederman,
1981; Hutchison, 2003; Spence & Owens, 1990). Such associative strength is important in
that it provides information that our brain can use for accurate predictions. Our nervous
system is extremely efficient in capturing the statistics in natural visual scenes (Fiser &
Aslin, 2001; Torralba & Oliva, 2003), as well as learning the repeated spatial contingen
cies of even arbitrarily distributed abstract stimuli (Chun & Jiang, 1999). In addition to
being sensitive and successful in learning contextual relations, our brain is also highly ef
ficient in utilizing this knowledge for facilitating its own processing. In accordance with
the general notion of a proactively predictive brain (Bar, 2009), it has repeatedly been
demonstrated that the learned contextual information is constantly used for increasing
the efficiency of visual search and recognition (Bar, 2004; Biederman et al., 1982; Daven
port & Potter, 2004; Torralba et al., 2006). Specifically, objects presented in familiar back
grounds, especially if appearing in expected spatial configuration, are processed faster
and more accurately than those presented in noncongruent settings. Furthermore, the
contextual and semantic redundancy provided by the presentation of several related ob
jects encountered in such settings allows the observer to resolve the uncertainties and am
biguities of individual objects more efficiently. Such context-based predictions are ex
tremely useful because they save processing resources and reduce the need for exerting
mental effort while dealing with predictable aspects of our surroundings. They allow us to
allocate attention toward relevant environmental aspects more efficiently, and they are
very useful for guiding our actions and behavior. However, to understand the mechanisms
underlying contextual top-down facilitation effects, it is important to illustrate how the
overall meaning, or the gist, of a complex image can be processed fast enough to become
useful for guiding the processing of individual objects presented within it (Bar, 2004; Oli
va, 2005; Oliva & Torralba, 2001).
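How the frequency and consistency of co-occurrence might be turned into a measure of associative strength can be sketched with a toy tally over hypothetical scenes (the object names, scene inventories, and the simple conditional-frequency measure below are invented for illustration and are not taken from the cited studies):

```python
from collections import Counter
from itertools import combinations

# Toy "scenes" as sets of objects; the frequency and consistency of
# co-occurrence determine associative strength (all names illustrative).
scenes = [
    {"stove", "refrigerator", "sink"},
    {"stove", "refrigerator", "table"},
    {"stove", "refrigerator", "sink"},
    {"desk", "lamp", "computer"},
    {"desk", "lamp", "book"},
]

pair_counts = Counter()
obj_counts = Counter()
for scene in scenes:
    obj_counts.update(scene)
    pair_counts.update(frozenset(p) for p in combinations(sorted(scene), 2))

def associative_strength(a, b):
    """Conditional co-occurrence: how reliably b appears given a."""
    return pair_counts[frozenset((a, b))] / obj_counts[a]

print(associative_strength("stove", "refrigerator"))  # 1.0: always together
print(associative_strength("stove", "sink"))          # together in 2 of 3 stove scenes
```

On these toy data, objects that always co-occur receive the maximal strength, whereas less consistent pairings receive proportionally weaker values, mirroring the frequency-and-consistency intuition in the text.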
Studies addressing this issue have recently demonstrated that extracting the gist of a scene relies mainly on the low spatial frequencies present in the image. This is not surprising because, as demonstrated earlier, such frequencies can be rapidly analyzed within the M-pathway, allowing them, in this case, to aid the rapid classification of the presented context (Bar, 2004; Oliva & Torralba, 2001; Schyns & Oliva, 1994). Specifi
cally, information regarding the global scene features can proactively be used for activat
ing context frames, namely the representations of objects and relations that are common
to that specific context (Bar, 2004; Bar & Ullman, 1996). Similar to the ideas of schemata (Hock et al., 1978), scripts (Schank, 1975), and frames (Minsky, 1975) from the 1970s and 1980s, such context frames are suggested to aid the processing of individual
objects presented in the scene. At this point, it is important to mention that context
frames should not be understood as static images that are activated in an all-or-none
manner, but rather as dynamic entities that are processed gradually. In this view, a proto
typical spatial template of the global structure of a familiar context is activated first, and
is then filled with more instance-specific details until it develops into an episodic context
frame that includes specific information regarding an individual instantiation of the given
context. Overall, there is ample evidence that demonstrates our brain’s efficiency in ex
tracting the gist of even highly complex visual images, allowing it to identify individual
objects typically encountered in such settings.
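The claim that a coarse, low-spatial-frequency summary suffices for rapid scene classification can be illustrated with a minimal sketch (the block-averaging "filter," the two synthetic scene categories, and their names are invented stand-ins, not a model of the M-pathway itself):

```python
import random

random.seed(0)

def low_pass(img, k):
    """Crude low-pass filter: average over k x k blocks, keeping only the
    coarse, low-spatial-frequency structure of the image."""
    h, w = len(img), len(img[0])
    return [[sum(img[y][x] for y in range(by * k, by * k + k)
                           for x in range(bx * k, bx * k + k)) / (k * k)
             for bx in range(w // k)] for by in range(h // k)]

def make_scene(bright_top, size=32):
    """Synthetic 'scene': a bright half over a dark half (or the reverse),
    corrupted with high-frequency pixel noise."""
    def pixel(y):
        base = 0.8 if (y < size // 2) == bright_top else 0.2
        return base + random.gauss(0, 0.3)
    return [[pixel(y) for _ in range(size)] for y in range(size)]

# Low-passed templates for two hypothetical scene categories.
templates = {"sky-above": low_pass(make_scene(True), 8),
             "sky-below": low_pass(make_scene(False), 8)}

def classify(img):
    gist = low_pass(img, 8)  # only a 4 x 4 coarse summary survives
    def dist(t):
        return sum((a - b) ** 2 for ra, rb in zip(t, gist) for a, b in zip(ra, rb))
    return min(templates, key=lambda name: dist(templates[name]))

print(classify(make_scene(True)))  # recognized from coarse structure alone
```

Even after the image is reduced to a 4 x 4 block average, the two toy scene layouts remain separable, which is the sense in which low spatial frequencies can carry the gist.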
This, however, represents only one level of predictive, contextual top-down modulations.
On another level, it is possible that individual objects that are typically encountered to
gether in a certain context are also able to mutually facilitate each other’s processing. In
other words, although it has long been known that a kitchen environment helps in recog
nizing a refrigerator that it typically contains, it was less clear whether seeing an image
of that refrigerator in isolation could automatically invoke the image of the contextual set
ting in which it is typically embedded. If so, it would be plausible to expect that present
ing objects typically associated with a certain context in isolation could facilitate subse
quent recognition of their typical contextual associates, even when these are presented
outside the shared setting. This hypothesis has been addressed and substantiated in a se
ries of studies conducted by Bar and colleagues (Aminoff et al., 2007; Bar et al., 2008a,
2008b; Bar & Aminoff, 2003; Fenske et al., 2006) that indicated the relevance of the
parahippocampal cortex (PHC), the retrosplenial complex (RSC), and the medial pre
frontal cortex (MPFC) for contextual processing. In addition to identifying the regions rel
evant for contextual relations, they also revealed a greater sensitivity of the PHC to asso
ciations with greater physical specificity, in contrast to RSC areas that seem to represent
contextual associations in a more abstract manner (Bar et al., 2008b). Similar to the OFC’s role in generating predictions related to object identity, the involvement of the neighboring MPFC was suggested to reflect the formulation of predictions based on familiar contextual associations, both visual-spatial and more abstract (Bar & Aminoff, 2003; Bar et al., 2008a, 2008b). These findings are of high
relevance because they illustrate how the organization of the brain as a whole may auto
matically and parsimoniously support the strongly associative nature of our cognitive
mind (Figure 4.1).
1972, 1981; Biederman et al., 1973; Driver & Baylis, 1998; Scholl, 2001). Typically, know
ing what to expect allows one to attend to the location or other features of the expected
stimulus. Not only spatial locations but also object features, objects or categories, and temporal context or other perceptual groups could be considered targets of different subtypes of top-down influences (Gilbert & Sigman, 2007).
In addition, prior presentation of events that provide clues regarding the identity of the
incoming stimulation may also influence visual processing. Within this context, one spe
cific form of the potential influence of previous experience on current visual processing
involves priming (Schacter et al., 2004; Tulving & Schacter, 1990; Wiggs & Martin, 1998).
Events that occur before target presentation and that influence its processing may in
clude those that are semantically or contextually related to the target event over the long term (Kveraga et al., 2007b), as well as stimuli that had become associated with the tar
get stimulus through short-term learning within the same (Schubotz & von Cramon,
2001; 2002) or a different (Widmann et al., 2007) modality. It has been suggested that, in
some cases, and especially in the auditory domain, prediction regarding the forthcoming
stimuli can be realized within the sensory system itself (Näätänen et al., 2007). In other
cases, however, these predictive sensory phenomena may be related to the computations
of the motor domain. Specifically, expectations regarding the incoming stimulation may
be formulated based on an “efference copy” elicited by the motor system in situations in
which a person’s own actions trigger such stimulation (Sperry, 1950). Indeed, von Holst, Mittelstaedt, and Sperry in the 1950s provided the first experimental evidence
demonstrating the importance of motor-to-sensory feedback in controlling behavior (Bays
& Wolpert, 2008; Wolpert & Flanagan, 2001). This motivated an increased interest in the
so-called internal model framework that can now be considered a prevailing, widely ac
cepted view of action generation (Miall & Wolpert, 1996; Wolpert & Kawato, 1998). Ac
cording to this framework, not only is there a prediction for sensory events that may be
considered consequences of one’s own behavior (Blakemore et al., 1998), but the same
system may be utilized for anticipating sensory events that are strongly associated with
other sensory events co-occurring on a short time scale (Schubotz, 2007). Before moving
away from the motor system, it is important to mention one more, somewhat different
view that also emphasizes a strong link between perceptual and motor behavior. Specifi
cally, according to Gross and colleagues (1999), when discussing perceptual processing, a
strong focus should be placed on sensorimotor anticipation because it allows one to di
rectly characterize a visual scene in categories of behavior. Thus, perception is not simply
predictive but also is “predictive with purpose” and may thus be described as behavior
based (Gross et al., 1999) or, as suggested by Arbib (1972), action oriented.
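The efference-copy logic sketched above can be caricatured in a few lines (the linear forward model and its gain are hypothetical placeholders, not the internal models of Miall and Wolpert or Wolpert and Kawato):

```python
# Minimal forward-model sketch (illustrative only): an efference copy of
# the motor command is used to predict the sensory consequence of one's
# own action, and the prediction is subtracted from the actual input,
# attenuating self-generated stimulation.

def forward_model(motor_command, gain=2.0):
    """Predict the sensory consequence of an action
    (the gain is a hypothetical, hand-picked mapping)."""
    return gain * motor_command

def perceived(sensory_input, motor_command=0.0):
    prediction = forward_model(motor_command)  # from the efference copy
    return sensory_input - prediction          # only the unpredicted residue

# Externally caused stimulation is perceived in full...
print(perceived(4.0))                     # 4.0
# ...whereas the same input, when self-generated, is cancelled.
print(perceived(4.0, motor_command=2.0))  # 0.0
```

The same externally caused input is perceived in full, while the self-generated version is cancelled by the prediction, which is the core of the motor-to-sensory feedback logic described above.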
For example, contextual information is crucial in recovering visual scene properties lost because of blurs or superimpositions in the visual image (Albright & Stoner, 2002). In situations in which an object is ambiguous and may be interpreted in more than one fashion, the importance of a prior template that may guide recognition is even greater than when viewing a clear percept (Cavanagh, 1991). Examples of such stimuli include
two faces or a vase versus one face behind a vase (Costall, 1980), or as shown in Figure
4.2, a face of a young versus an old woman or a man playing a saxophone seen in silhou
ette versus a woman’s face in shadow (Shepard, 1990).
In the previous sections, a wide array of top-down phenomena has been mentioned, and some were described in more detail. From this description, it became clear
how difficult it is to categorize these effects in relation to other cognitive functions, the
most important of which is attention. Specifically, not only is it hard to clearly distinguish
between attentional and other forms of top-down phenomena, but also, given that the
same effect is sometimes discussed as an attentional, and sometimes as a perceptual,
phenomenon, a clear separation may not be possible. This might not even be necessary in
all situations because, for most practical purposes, the mechanisms underlying such phe
nomena and the effects they introduce may be considered the same. When discussing top-
down influences of selective attention, one typically refers to the processing guided by
hypotheses or expectations and the influence of prior knowledge or other personal features on stimulus selection (Engel et al., 2001), which is quite concordant with the way top-down influences have been addressed in this chapter. Even aspects of visual percep
tion as mentioned here might, at least in part, be categorized under anticipatory atten
tion that involves a preparation for the upcoming stimulus and improves the speed and
precision of stimulus processing (Posner & Dehaene, 1994; Brunia, 1999). Not surprising
ly, then, instead of always categorizing a certain effect into one functional “box,” Gazzaley (2010) uses the general term “top-down modulation,” which includes changes in sensory
cortical activity associated with relevant and irrelevant information that stand at the
crossroads of perception, memory, and attention. Similarly, a general conceptualization of
top-down influences as those that reflect an interaction of goals, action plans, working
memory, and selective attention is suggested by Engel et al. (2001). In an attempt to bet
ter differentiate basic types of top-down influences in perception, Ganis and Kosslyn
(2007) suggested a crucial distinction between strategic top-down processes, which are under voluntary control, and involuntary, reflexive types of top-down modulations. In a somewhat different view, Summerfield and
Egner (2009) argued that biases in visual processing include attentional mechanisms that
prioritize processing based on motivational relevance and expectations that bias process
ing based on prior experience. Although these represent valuable initial attempts to create a clear taxonomy of top-down influences in perceptual processing, considerable disagreement remains in this area and will need to be settled in future work.
Concluding Remarks
Visual recognition was depicted here as a process that reflects the integration of two equally relevant streams of information. The first, the bottom-up stream, captures the input present in the visual image itself and conveyed through the senses. The second, the top-down stream, contributes prior experiences, current personal and behavioral sets, and future expectations, and thereby modifies the processing of the presented stimulus. In this conceptualization, perception may be considered the process of integrating the incoming input and our preexisting knowledge, an integration that occurs in all contexts, even in very controlled and simple viewing circumstances.
In this chapter, it has been argued that top-down effects in visual recognition may be of
different complexity and may be triggered by the stimulus itself or information external to
the presented stimulus such as the task instruction, copies of motor commands, or prior
presentation of events informative for the current stimulation. These various types of top-
down influences originate from different cortical sources, reflecting the fact that they are
triggered by different types of information. Regardless of the differences in their respec
tive origins, different types of top-down biases may nevertheless result in similar local ef
fects, or rely on similar mechanisms for the communication between lower level visual ar
eas and higher level regions to enable the described effects. In that sense, understanding
the principles of one type of top-down facilitation effect may provide important clues re
garding the general mechanisms of triggering and integrating top-down and bottom-up
information that underlie successful visual recognition.
In conclusion, the body of research synthesized here demonstrates the richness, complex
ity, and importance of top-down effects in visual perception and illustrates some of the
fundamental mechanisms underlying them. Critically, it is suggested that top-down ef
fects should not be considered a secondary phenomenon that only occurs in special, ex
treme settings. Instead, the wide, heterogeneous set of such biases suggests an ever-
present and complex information-processing stream that, together with the bottom-up
stream, constitutes the core of visual processing. As suggested by Gregory (1980), per
ception can then be defined as a dynamic search for the best interpretation of the sensory
data, a claim that highlights both the active and the constructive nature of visual percep
tion. Consequently, visual recognition itself should be considered a proactive, predictive,
and dynamic process of integrating different sources of information, the success of which
is determined by their mutual resonance and correspondence and by our ability to learn
from the past in order to predict the future.
Author Note
Work on this chapter was supported by NIH grant R01EY019477-01, NSF grant 0842947,
and DARPA grant N10AP20036.
References
Albright, T. D., & Stoner, G. R. (2002). Contextual influences on visual processing. Annual
Review of Neuroscience, 25, 339–379.
Aminoff, E., Gronau, N., & Bar, M. (2007). The parahippocampal cortex mediates spatial
and nonspatial associations. Cerebral Cortex, 17, 1493–1503.
Ballard, D. H., Hinton, G. E., & Sejnowski, T. J. (1983). Parallel visual computation. Nature,
306, 21–26.
Bar, M. (2009). The proactive brain: Memory for predictions. Theme issue: Predictions in
the brain: Using our past to generate a future (M. Bar, Ed.), Philosophical Transactions of
the Royal Society, Series B, Biological Sciences, 364, 1235–1243.
Bar, M. (2007). The proactive brain: Using analogies and associations to generate predic
tions. Trends in Cognitive Sciences, 11 (7), 280–289.
Bar, M. (2004). Visual objects in context. Nature Reviews, Neuroscience, 5 (8), 617–629.
Bar, M. (2003). A cortical mechanism for triggering top-down facilitation in visual object
recognition. Journal of Cognitive Neuroscience, 15, 600–609.
Bar, M., & Aminoff, E. (2003). Cortical analysis of visual context. Neuron, 38 (2), 347–358.
Bar, M., Aminoff, E., & Ishai, A. (2008a). Famous faces activate contextual associations in
the parahippocampal cortex. Cerebral Cortex, 18 (6), 1233–1238.
Bar, M., Aminoff, E., & Schacter, D. L. (2008b). Scenes unseen: The parahippocampal cor
tex intrinsically subserves contextual associations, not scenes or places per se. Journal of
Neuroscience, 28, 8539–8544.
Bar, M., Kassam, K. S., Ghuman, A. S., Boshyan, J., Schmid, A. M., Dale, A. M., Hämäläi
nen, M. S., Marinkovic, K., Schacter, D. L., Rosen, B. R., & Halgren, E. (2006). Top-down
facilitation of visual recognition. Proceedings of the National Academy of Sciences U S A,
103 (2), 449–454.
Bar, M., Tootell, R., Schacter, D., Greve, D., Fischl, B., Mendola, J., Rosen, B. R., & Dale, A.
M. (2001). Cortical mechanisms of explicit visual object recognition. Neuron, 29 (2), 529–
535.
Bar, M., & Ullman, S. (1996). Spatial context in recognition. Perception, 25 (3), 343–352.
Barbas, H. (2000). Connections underlying the synthesis of cognition, memory, and emo
tion in primate prefrontal cortices. Brain Research Bulletin, 52 (5), 319–330.
Barrett, L. F., & Bar, M. (2009). See it with feeling: Affective predictions during object
perception. Theme issue: Predictions in the brain: Using our past to generate a future (M.
Bar. Ed.), Philosophical Transactions of the Royal Society, Series B, Biological Sciences,
364, 1325–1334.
Bays, P. M., & Wolpert, D. M. (2008). Predictive attenuation in the perception of touch. In
P. Haggard, Y. Rossetti, & M. Kawato (Eds.), Sensorimotor foundations of higher cogni
tion: Attention and performance XXII (pp. 339–359). New York: Oxford University Press.
Beck, D. M., & Kastner, S. (2009). Top-down and bottom-up mechanisms in biasing com
petition in the human brain. Vision Research, 49 (10), 1154–1165.
Biederman, I., Glass, A. L., & Stacy, W. (1973). Searching for objects in real-world scenes.
Journal of Experimental Psychology, 97, 22–27.
Biederman, I., Mezzanotte, R. J., & Rabinowitz, J. C. (1982). Scene perception: Detecting and judging objects undergoing relational violations. Cognitive Psychology, 14 (2), 143–177.
Blakemore, S. J., Rees, G., & Frith, C. D. (1998). How do we predict the consequences of
our actions? A functional imaging study. Neuropsychologia, 36 (6), 521–529.
Bruno, N., & Franz, V. H. (2009). When is grasping affected by the Müller-Lyer illusion? A
quantitative review. Neuropsychologia, 47 (6), 1421–1433.
Büchel, C., & Friston, K. J. (1997). Modulation of connectivity in visual pathways by atten
tion: Cortical interactions evaluated with structural equation modeling and fMRI. Cere
bral Cortex, 7 (8), 768–778.
Bullier, J. (2001). Integrated model of visual processing. Brain Research Reviews, 36, 96–
107.
Carlsson, K., Petrovic, P., Skare, S., Petersson, K. M., & Ingvar, M. (2000). Tickling antici
pations: Neural processing in anticipation of a sensory stimulus. Journal of Cognitive Neu
roscience, 12, 691–703.
Carmichael, S. T., & Price, J. L. (1995). Limbic connections of the orbital and medial pre
frontal cortex in macaque monkeys. Journal of Comparative Neurology, 363 (4), 615–641.
Cavada, C., Company, T., Tejedor, J., Cruz-Rizzolo, R. J., & Reinoso-Suarez, F. (2000). The
anatomical connections of the macaque monkey orbitofrontal cortex: A review. Cerebral
Cortex, 10 (3), 220–242.
Chun, M. M., & Jiang, Y. (1999). Top-down attentional guidance based on implicit learning
of visual covariation. Psychological Science, 10, 360–365.
Corbetta, M., Kincade, J. M., & Shulman, G. L. (2002). Neural systems for visual orienting
and their relationships to spatial working memory. Journal of Cognitive Neuroscience, 14,
508–523.
Davenport, J. L., & Potter, M. C. (2004). Scene consistency in object and background per
ception. Psychological Science, 15 (8), 559–564.
Dehaene, S., Kerszberg, M., & Changeux, J. P. (1998). A neuronal model of a global work
space in effortful cognitive tasks. Proceedings of the National Academy of Sciences U S A,
95, 14529–14534.
Desimone, R., & Duncan, J. (1995). Neural mechanisms of selective visual attention. An
nual Review of Neuroscience, 18, 193–222.
Driver, J., & Baylis, G. C. (1998). Attention and visual object segmentation. In R. Parasura
man (Ed.), The attentive brain (pp. 299–325). Cambridge, MA: MIT Press.
Engel, A. K., Fries, P., & Singer, W. (2001). Dynamic predictions: Oscillations and syn
chrony in top-down processing. Nature Reviews, Neuroscience, 2 (10), 704–716.
Enns, J. T., & Lleras, A. (2008). What’s next? New evidence for prediction in human vi
sion. Trends in Cognitive Sciences, 12, 327–333.
Escera, C., Alho, K., Schröger, E., & Winkler, I. (2000). Involuntary attention and dis
tractibility as evaluated with event-related brain potentials. Audiology and Neurootology,
5, 151–166.
Felleman, D. J., & Van Essen, D. C. (1991). Distributed hierarchical processing in primate
visual cortex. Cerebral Cortex, 1, 1–47.
Fenske, M. J., Aminoff, E., Gronau, N., & Bar, M. (2006). Top-down facilitation of visual ob
ject recognition: Object-based and context-based contributions. Progress in Brain Re
search, 155, 3–21.
Fiser, J., & Aslin, R. N. (2001). Unsupervised statistical learning of higher-order spatial
structures from visual scenes. Psychological Science, 12, 499–504.
Freedman, D. J., Riesenhuber, M., Poggio, T., & Miller, E. K. (2003). A comparison of pri
mate prefrontal and inferior temporal cortices during visual categorization. Journal of
Neuroscience, 23 (12), 5235–5246.
Frey, S., & Petrides, M. (2000). Orbitofrontal cortex: A key prefrontal region for encoding
information. Proceedings of the National Academy of Sciences U S A, 97, 8723–8727.
Frith, C., & Dolan, R. J. (1997). Brain mechanisms associated with top-down processes in
perception. Philosophical Transactions of the Royal Society, Series B, Biological Sciences,
352 (1358), 1221–1230.
Ganis, G., & Kosslyn, S. M. (2007). Multiple mechanisms of top-down processing in vision.
In S. Funahashi (Ed.), Representation and brain (pp. 21–45). Tokyo: Springer-Verlag.
Ghuman, A., Bar, M., Dobbins, I. G., & Schnyer, D. (2008). The effects of priming on
frontal-temporal communication. Proceedings of the National Academy of Sciences U S A,
105 (24), 8405–8409.
Gross, H., Heinze, A., Seiler, T., & Stephan, V. (1999). Generative character of perception:
A neural architecture for sensorimotor anticipation. Neural Networks, 12 (7–8), 1101–
1129.
Gilbert, C., Ito, M., Kapadia, M., & Westheimer, G. (2000). Interactions between attention,
context and learning in primary visual cortex. Vision Research, 40 (10–12), 1217–1226.
Gilbert, C. D., & Sigman, M. (2007). Brain states: Top-down influences in sensory process
ing. Neuron, 54, 677–696.
Girard, P., & Bullier, J. (1989). Visual activity in area V2 during reversible inactivation of
area 17 in the macaque monkey. Journal of Neurophysiology, 62 (6), 1287–1302.
Goodale, M. A., & Milner, A. D. (1992). Separate visual pathways for perception and ac
tion. Trends in Neurosciences, 15 (1), 20–25.
Grossberg, S. (1980). How does a brain build a cognitive code? Psychological Review, 87
(1), 1–51.
Hochstein, S., & Ahissar, M. (2002). View from the top: Hierarchies and reverse hierarchies in the visual system. Neuron, 36, 791–804.
Hock, H. S., Romanski, L., Galie, A., & Williams, C. S. (1978). Real-world schemata and
scene recognition in adults and children. Memory and Cognition, 6, 423–431.
Hopfinger, J. B., Buonocore, M. H., & Mangun, G. R. (2000). The neural mechanisms of
top-down attentional control. Nature Neuroscience, 3 (3), 284–291.
Kaplan, E. (2004). The M, P, and K pathways of the primate visual system. In L. M. Chalu
pa & J. S. Werner (Eds.), The visual neurosciences (pp. 481–494). Cambridge, MA: MIT
Press.
Kaplan, E., & Shapley, R. M. (1986). The primate retina contains two types of ganglion
cells, with high and low contrast sensitivity. Proceedings of the National Academy of
Sciences U S A, 83 (8), 2755–2757.
Kastner, S., & Ungerleider, L.G. (2000). Mechanisms of visual attention in the human cor
tex. Annual Review of Neuroscience, 23, 315–341.
Kawato, M., Hayakawa, H., & Inui, T. (1993). A forward-inverse optics model of reciprocal
connections between visual cortical areas. Network, 4, 415–422.
Koffka, K. (1935). The principles of gestalt psychology. New York: Harcourt, Brace, &
World.
Kringelbach, M. L., & Rolls, E. T. (2004). The functional neuroanatomy of the human or
bitofrontal cortex: Evidence from neuroimaging and neuropsychology. Progress in Neuro
biology, 72 (5), 341–372.
Kveraga, K., Boshyan, J., & Bar, M. (2007a). Magnocellular projections as the trigger of
top-down facilitation in recognition. Journal of Neuroscience, 27 (48), 13232–13240.
Kveraga, K., Ghuman, A. S., & Bar, M. (2007b). Top-down predictions in the cognitive
brain. Brain and Cognition, 65, 145–168.
Lamme, V. A. F., & Roelfsema, P. R. (2000). The distinct modes of vision offered by feed
forward and recurrent processing. Trends in Neurosciences, 23, 571–579.
Lamme, V. A. F., Super, H., & Spekreijse, H. (1998). Feedforward, horizontal, and feed
back processing in the visual cortex. Current Opinion in Neurobiology, 8, 529–535.
Li, W., Piech, V., & Gilbert, C. D. (2006). Contour saliency in primary visual cortex. Neuron,
50, 951–962.
Maunsell, J. H. R., & Van Essen, D. C. (1983) Functional properties of neurons in the mid
dle temporal visual area of the macaque monkey. II. Binocular interactions and the sensi
tivity to binocular disparity. Journal of Neurophysiology, 49, 1148–1167.
Merigan, W. H., & Maunsell, J. H. (1993). How parallel are the primate visual pathways?
Annual Review of Neuroscience, 16, 369–402.
Meunier, M., Bachevalier, J., & Mishkin, M. (1997). Effects of orbital frontal and anterior
cingulate lesions on object and spatial memory in rhesus monkeys. Neuropsychologia, 35,
999–1015.
Miall, R. C., & Wolpert, D. M. (1996). Forward models for physiological motor control.
Neural Networks, 9 (8), 1265–1279.
Miller, E. K., & Cohen, J. D. (2001). An integrative theory of prefrontal cortex function. An
nual Review of Neuroscience, 24, 167–202.
Munk, M. H., Nowak, L. G., Nelson, J. I., & Bullier, J. (1995). Structural basis of cortical
synchronization. II. Effects of cortical lesions. Journal of Neurophysiology, 74 (6), 2401–
2414.
Näätänen, R., Paavilainen, P., Rinne, T., & Alho, K. (2007). The mismatch negativity
(MMN) in basic research of central auditory processing: A review. Clinical Neurophysiolo
gy, 118, 2544–2590.
Oliva, A. (2005). Gist of the scene. In L. Itti, G. Rees, & J. K. Tsotsos (Eds.), Neurobiology of attention (pp. 251–256). San Diego, CA: Elsevier.
Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representa
tion of the spatial envelope. International Journal of Computer Vision, 42, 145–175.
Oliva, A., & Torralba, A. (2007). The role of context in object recognition. Trends in Cogni
tive Sciences, 11 (12), 520–527.
Pessoa, L., Kastner, S., & Ungerleider, L. G. (2003). Neuroimaging studies of attention:
From modulation of sensory processing to top-down control. Journal of Neuroscience, 23
(10), 3990–3998.
Posner, M. I., & Dehaene, S. (1994). Attentional networks. Trends in Neurosciences, 17,
75–79.
Rao, R. P., & Ballard, D. H. (1999). Predictive coding in the visual cortex: A functional in
terpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2, 79–
87.
Reynolds, J. H., & Chelazzi, L. (2004). Attentional modulation of visual processing. Annual
Review of Neuroscience, 27, 611–647.
Rolls, E. T., Browning, A. S., Inoue, K., & Hernadi, I. (2005). Novel visual stimuli activate
a population of neurons in the primate orbitofrontal cortex. Neurobiology of Learning and
Memory, 84 (2), 111–123.
Salin, P. A., & Bullier, J. (1995). Corticocortical connections in the visual system: Struc
ture and function. Physiological Reviews, 75 (1), 107–154.
Schacter, D. L., Dobbins, I. G., & Schnyer, D. M. (2004). Specificity of priming: A cognitive
neuroscience perspective. Nature Reviews Neuroscience, 5 (11), 853–862.
Schank, R. C. (1975). Conceptual information processing. New York: Elsevier Science Ltd.
Schiller, P. H., & Malpeli, J. G. (1978). Functional specificity of lateral geniculate nucleus
laminae of the rhesus monkey. Journal of Neurophysiology, 41, 788–797.
Schyns, P. G., & Oliva, A. (1994). From blobs to boundary edges: Evidence for time- and spatial-scale-dependent scene recognition. Psychological Science, 5 (4), 195–200.
Scholl, B. J. (2001). Objects and attention: The state of the art. Cognition, 80 (1–2), 1–46.
Schubotz, R. I. (2007). Prediction of external events with our motor system: towards a
new framework. Trends in Cognitive Sciences, 11, 211–218.
Schubotz, R. I., & von Cramon, D. Y. (2002). Dynamic patterns make the premotor cortex
interested in objects: Influence of stimulus and task revealed by fMRI. Brain Research
Cognitive Brain Research, 14, 357–369.
Schubotz, R. I., & von Cramon, D. Y. (2001). Functional organization of the lateral premo
tor cortex: fMRI reveals different regions activated by anticipation of object properties,
location and speed. Brain Research: Cognitive Brain Research, 11 (1), 97–112.
Schultz, W., & Dickinson, A. (2000). Neuronal coding of prediction errors. Annual Review of Neuroscience, 23, 473–500.
Sherman, S. M., & Guillery, R. W. (2004). The visual relays in the thalamus. In L. M.
Chalupa & J. S. Werner (Eds.), The visual neurosciences (pp. 565–592). Cambridge, MA:
MIT Press.
Shipp, S. (2005). The importance of being agranular: A comparative account of visual and
motor cortex. Philosophical Transactions of the Royal Society of London, Series B, Biolog
ical Sciences, 360, 797–814.
Shipp, S., & Zeki, S. (1989). The organization of connections between areas V5 and V2 in
macaque monkey visual cortex. European Journal of Neuroscience, 1 (4), 333–354.
Simmons, A., Matthews, S. C., Stein, M. B., & Paulus, M. P. (2004). Anticipation of emo
tionally aversive visual stimuli activates right insula. Neuroreport, 15, 2261–2265.
Spence, D. P., & Owens, K. C. (1990). Lexical co-occurrence and association strength.
Journal of Psycholinguistic Research, 19, 317–330.
Sperry, R. (1950). Neural basis of the spontaneous optokinetic response produced by visu
al inversion. Journal of Comparative and Physiological Psychology, 43, 482–489.
Summerfield, C., & Egner, T. (2009). Expectation (and attention) in visual cognition.
Trends in Cognitive Sciences, 13 (9), 403–408.
Thorpe, S. J., Rolls, E. T., & Maddison, S. (1983). Neuronal activity in the orbitofrontal
cortex of the behaving monkey. Experimental Brain Research, 49, 93–115.
Torralba, A., & Oliva, A. (2003). Statistics of natural image categories. Network, 14 (3),
391–412.
Torralba, A., Oliva, A., Castelhano, M. S., & Henderson, J. M. (2006). Contextual guidance
of attention in natural scenes: the role of global features on object search. Psychological
Review, 113, 766–786.
Tulving, E., & Schacter, D. L. (1990). Priming and human memory systems. Science, 247,
301–306.
Ullman, S. (1995). Sequence seeking and counter streams: A computational model for
bidirectional information flow in the visual cortex. Cerebral Cortex, 5, 1–11.
Ungerleider, L. G., & Desimone, R. (1986). Cortical connections of visual area MT in the
macaque. Journal of Comparative Neurology, 248 (2), 190–222.
von Stein, A., Chiang, C., König, P., & Lorincz, A. (2000). Top-down processing mediated
by interareal synchronization. Proceedings of the National Academy of Sciences U S A, 97
(26), 14748–14753.
von Stein, A., & Sarnthein, J. (2000). Different frequencies for different scales of cortical
integration: From local gamma to long range alpha/theta synchronization. International
Journal of Psychophysiology, 38 (3), 301–313.
Watanabe, T., Harner, A. M., Miyauchi, S., Sasaki, Y., Nielsen, M., Palomo, D., & Mukai, I.
(1998). Task-dependent influences of attention on the activation of human primary visual
cortex. Proceedings of the National Academy of Sciences U S A, 95, 11489–11492.
Widmann, A., Gruber, T., Kujala, T., Tervaniemi, M., & Schröger, E. (2007). Binding sym
bols and sounds: Evidence from event-related oscillatory gamma-band activity. Cerebral
Cortex, 17, 2696–2702.
Wiggs, C. L., & Martin, A. (1998). Properties and mechanisms of visual priming. Current
Opinion in Neurobiology, 8, 227–233.
Winkler, I., & Czigler, I. (1998). Mismatch negativity: Deviance detection or the mainte
nance of the “standard.” Neuroreport, 9, 3809–3813.
Wolpert, D. M., & Flanagan, J. R. (2001). Motor prediction. Current Biology, 11 (18),
R729–R732.
Wolpert, D. M., & Kawato, M. (1998). Multiple paired forward and inverse models for mo
tor control. Neural Networks, 11 (7–8), 1317–1329.
Zeki, S., & Bartels, A. (1999). Toward a theory of visual consciousness. Consciousness and
Cognition 8, 225–259.
Moshe Bar
Andreja Bubic, Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Harvard Medical School, Charlestown, MA
Neural Underpinning of Object Mental Imagery, Spatial Imagery, and Motor Imagery
Mental imagery is a cognitive function that has received much attention over the past 40 years from both philosophers and cognitive psychologists. More recently, researchers have used neuroimaging techniques to tackle fundamental properties of mental images, such as their depictive nature, which was fiercely debated for almost 30 years. Results from neuroimaging, brain-damaged patient, and transcranial magnetic stimulation studies converge in showing that visual, spatial, and motor mental imagery rely on the same basic brain mechanisms used, respectively, in visual perception, spatial cognition, and motor control. Thus, neuroimaging and lesion studies have proved critical in resolving the imagery debate between depictive and propositional theorists. Partly because of the controversy that surrounded the nature of mental images, the neural bases of mental imagery are probably more closely defined than those of any other higher cognitive function.
Keywords: visual mental imagery, spatial mental imagery, motor mental imagery, visual perception, neuroimaging,
brain-damaged patients, transcranial magnetic stimulation
When we think of the best way to load luggage in the trunk of a car, of the fastest route to go from point A to point B, or of the easiest way to assemble bookshelves, we generally rely on our ability to simulate those events by visualizing them instead of actually performing them. When we do so, we experience "seeing with the mind's eye," which is the hallmark of a specific type of representation processed by our brain, namely, visual mental images. According to Kosslyn, Thompson, and Ganis (2006), mental images are representations that are similar to those created during the initial phase of perception but that do not require external stimulation to be created. In addition, those representations preserve the perceptible properties of the stimuli they represent.
The revival of mental imagery was driven not only by empirical evidence that mental imagery was a key part of memory, problem solving, and creativity but also by the type of questions and methodologies researchers used. Researchers shifted away from phenomenological questions and introspective methods and started to focus on refining the understanding of the nature of the representations involved in mental imagery and of the cognitive processes that interpret those representations. The innovative idea was to use chronometric data as a "mental tape measure" of the underlying cognitive processes that interpret mental images in order to characterize the properties of the underlying representations and cognitive processes. One of the most influential works that helped mental imagery regain its respectability was that of Shepard and Metzler (1971). In their mental rotation paradigm, participants viewed two three-dimensional (3D) objects with several arms, each consisting of small cubes, and decided whether the two objects had the same shape, regardless of differences in their orientations. The key finding was that response times increased linearly with increasing angular disparity between the two objects. The results demonstrated for the first time that people mentally rotated one of the objects into congruence with the orientation of the other object. Other paradigms, such as the image scanning paradigm (e.g., Finke & Pinker, 1982; Kosslyn, Ball, & Reiser, 1978), allowed researchers to characterize not only the properties of the cognitive processes at play in visual mental imagery but also the nature of visual mental images. Critically, the data from these experiments suggested that visual mental images are depictive representations. By depictive, researchers mean that (1) each part of the representation corresponds to a part of the represented object, such that (2) the distances between representations of the parts (in a representational space) correspond to the distances between the parts on the object itself (see Kosslyn et al., 2006).
However, not all researchers interpreted behavioral results in mental imagery studies as evidence that mental images are depictive. For example, Pylyshyn (1973, 2002, 2003a, 2003b, 2007) proposed a propositional account of mental imagery. According to this view, results obtained in mental imagery experiments are best explained by the fact that participants rely not on visuo-spatial mental images but instead on descriptive representations (the sort of representations that underlie language). Pylyshyn (1981) championed the idea that the conscious experience of visualizing an object is purely epiphenomenal, like the power light on an electronic device: the light does not play a functional role in the way the device works. Thus, it became evident that behavioral data would not be sufficient to resolve the mental imagery debate between propositional and depictive researchers. In fact, Anderson (1978) demonstrated that any behavioral data collected in a visual mental imagery experiment could be explained equally well by inferring that depictive representations were processed or that a set of propositions was processed.
In this chapter we report results collected in positron emission tomography (PET), functional magnetic resonance imaging (fMRI), transcranial magnetic stimulation (TMS), and brain lesion studies, which converge in showing that visual mental imagery relies on the same brain areas as those elicited when one perceives the world or initiates an action. The different methods serve different ends. For example, fMRI allows researchers to monitor the whole brain at work with good spatial resolution, by contrasting the mean blood oxygen level-dependent (BOLD) signal in a baseline condition with the BOLD signal in an experimental condition. However, fMRI provides information on the correlations between the brain areas activated and the tasks performed, but not on the causal relations between the two. In contrast, brain-damaged patient and TMS studies can provide such causal relations. In fact, if performance on a particular task is selectively impaired following a virtual lesion (TMS) or an actual brain lesion, the targeted brain area plays a causal role in the cognitive processes and representations engaged by that task. However, researchers need to rely on previous fMRI or PET studies to decide which brain areas to target with TMS or which patients to include in their study. Thus, a comprehensive view of the neural underpinning of any cognitive function requires taking into account data from all of these approaches.
In this chapter, we first discuss and review the results of studies that document an overlap of the neural circuitry in the early visual cortex between visual mental imagery and visual perception. Then, we present studies that specifically examine the neural underpinning of shape-based mental imagery and spatial mental imagery. Finally, we report studies on the neural bases of motor imagery and how they overlap with those recruited when one initiates an action.
However, similar findings were reported when fMRI was used, a technique that provides better spatial resolution of the brain areas activated. For example, Klein, Paradis, Poline, Kosslyn, and Le Bihan (2000), in an event-related fMRI study, documented an activation of area 17 that started 2 seconds after the auditory cue prompting participants to form a mental image, peaked around 4 to 6 seconds, and dropped off after 8 seconds or so. In a follow-up experiment, Klein et al. (2004) demonstrated that the orientation with which a bowtie-shaped stimulus was visualized modulated the activation of the visual cortex. The activation elicited by visualizing the bowtie vertically or horizontally matched the cortical representations of the horizontal and vertical meridians. Moreover, in an fMRI study, Slotnick, Thompson, and Kosslyn (2005) found that the retinotopic maps produced by the visual presentation of rotating and flickering checkerboard wedges were similar to those produced when rotating and flickering checkerboard wedges were visualized. To some extent, those maps were even more similar than the maps produced in an attention-based condition. Finally, Thirion and colleagues (2006) adopted an "inverse retinotopy" approach to infer the content of visual images from the brain activations observed. Participants were asked in a perceptual condition to fixate rotating Gabor patches and in the mental imagery condition to visualize one of six Gabor patches rotating to the right or left of a fixation point. The authors were able to predict accurately the stimuli participants had seen and, to a certain degree, the stimuli participants had visualized. Crucially, most of the voxels leading to a correct prediction of the stimuli visualized or presented visually were located in areas 17 and 18.
Taken together, results from fMRI and PET studies converge in showing that visual mental imagery activates the early visual areas and that the spatial structure of the activations elicited by the mental imagery task is accounted for by standard retinotopic mapping. Nevertheless, the question remained as to whether activation of the early visual areas plays any functional role in visual imagery. In order to address this question, Kosslyn et al. (1999) designed a task in which participants first memorized four patterns of black and white stripes (which varied in the length, width, orientation, and spacing of the stripes; Figure 5.1) in four quadrants, and then were asked to visualize two of the patterns and to compare them on a given visual attribute (such as the orientation of the stripes). The same participants performed the task in a perceptual condition in which their judgments were based on patterns of stripes displayed on the screen. In both conditions, before participants compared two patterns of stripes, repetitive TMS (rTMS) was delivered to the early visual cortex, which had been shown to be activated using PET. In rTMS studies, a coil is used to deliver low-frequency magnetic pulses, which decrease cortical excitability for several minutes in the targeted cortical area (see Siebner et al., 2000). This technique has the advantage that the disruption is reversible and lasts only a few minutes. In addition, because the disruption is transient, there are no compensatory phenomena as with real lesions. When stimulation was delivered to the posterior occipital lobe (real rTMS condition), participants required more time to compare two patterns of stripes than when stimulation was delivered away from the brain (sham rTMS condition). The effect of real rTMS (as denoted by the difference between the sham and real stimulations; see Figure 5.1) was similar in visual mental imagery and visual perception, which makes sense if area 17 is critical for both.
Sparing et al. (2002) used another TMS approach to determine whether visual mental imagery modulates cortical excitability. The rationale of their approach was to use the phosphene threshold (PT; i.e., the minimum TMS intensity that evokes phosphenes) to determine the cortical excitability of the primary visual areas of the brain. Single-pulse TMS was delivered over the primary visual cortex to produce phosphenes in the right lower quadrant of the visual field. Concurrently, participants performed either a visual mental imagery task or an auditory control task. For each participant, the PT was determined by increasing TMS intensity on each trial until the participant reported experiencing phosphenes. Visual mental imagery decreased the PT compared with the baseline condition, whereas the auditory task had no effect on the PT. The results indicate that visual mental imagery enhances excitability in the visual cortex, which supports the functional role of the primary visual cortex in visual mental imagery. Consistent with the role of area 17 in visual mental imagery, the apparent horizontal size of the visual mental images of a patient who had the occipital lobe surgically removed in one hemisphere was half the apparent size of the mental images of normal participants (Farah, Soso, & Dasheiff, 1992).
However, not all studies reported a functional role of area 17 in visual mental imagery. In fact, neuropsychological studies offered compelling evidence that cortically blind patients can have spared visual mental imagery abilities (see Bartolomeo, 2002, 2008). Anton (1899) and Goldenberg, Müllbacher, and Nowak (1995) reported cortically blind patients who seemed able to form visual mental images. In addition, Chatterjee and Southwood (1995) reported two patients, cortically blind following medial occipital lesions, with no impairment in their capacity to imagine object forms, such as capital letters or common animals. These two patients could also draw a set of common objects from memory.
Finally, Kosslyn and Thompson (2003) reviewed more than 50 neuroimaging studies (fMRI, PET, and single-photon emission computed tomography, or SPECT) and found that in nearly half, there was no activation of the early visual cortex. A meta-analysis of the neuroimaging literature on visual mental imagery revealed three factors that accounted for the probability of activation in area 17. The sensitivity of the technique is one factor: 19 of 27 fMRI studies reported activation in area 17, compared with only 2 of 9 SPECT studies. The degree of detail of the visual mental images that need to be generated is also important, with high-resolution mental images more likely to elicit activation in area 17. Finally, the type of judgment accounts for the probability of activation in area 17: If a spatial judgment is required, activation in area 17 is less likely. Thus, activation in area 17 most likely reflects the computation needed to generate visual images, at least for certain types of mental images, such as high-resolution, shape-based mental images.
In the next section, we review neuroimaging and brain-damaged patient studies showing that shape-based mental imagery (including mental images of faces) and visual perception engage most of the same higher visual areas in the ventral stream, and that spatial mental imagery and spatial vision recruit most of the same areas in the dorsal stream.
Brain imaging and neuropsychological data document a spatial segregation of visual object representations in the higher visual areas. For example, Kanwisher and Yovel (2006) demonstrated that the lateral fusiform gyrus responds more strongly to pictures of faces than to other categories of objects, whereas the medial fusiform gyrus and the parahippocampal gyri respond selectively to pictures of buildings (Downing, Chan, Peelen, Dodds, & Kanwisher, 2006).
To demonstrate the similarity between the cognitive processes and representations in vision and visual mental imagery, researchers investigated whether the spatial segregation of visual objects in the ventral stream can be found during shape-based mental imagery. Following this logic, O'Craven and Kanwisher (2000), in an fMRI study, asked a group of participants either to recognize pictures of familiar faces and buildings or to visualize those pictures. In the perceptual condition, a direct comparison of the activation elicited by the two types of stimuli (buildings and faces) revealed a clear segregation within the ventrotemporal cortex, with activation found in the fusiform face area (FFA) for faces and in the parahippocampal place area (PPA) for buildings. In the visual mental imagery condition, the same pattern was observed but with weaker activation and smaller patches of cortex activated. Crucially, there was no hint of activation of the FFA when participants visualized buildings, nor of the PPA when they visualized faces. The similarity between vision and mental imagery in the higher visual areas was further demonstrated by the fact that more than 84 percent of the voxels activated in the mental imagery condition were also activated in the perceptual condition.
These results were replicated in another fMRI study (Ishai, Ungerleider, & Haxby, 2000). In this study, participants were asked either to view passively pictures of three object categories (i.e., faces, houses, and chairs), to view scrambled versions of these pictures (perceptual control condition), to visualize the pictures while looking at a gray background, or to stare passively at the gray background (mental imagery control condition). When the activation elicited by the three object categories was compared in the perceptual condition, after removing the activation in the respective control condition, different regions in the ventral stream showed differential responses to faces (FFA), houses (PPA), and chairs (inferior temporal gyrus). Fifteen percent of the voxels in these three ventral stream regions showed a similar pattern of activation in the mental imagery condition. Mental images of the three categories of objects elicited additional activation in the parietal and frontal regions that was not found in the perceptual condition.
In a follow-up study, Ishai, Haxby, and Ungerleider (2002) studied the activation elicited by famous faces either presented visually or visualized. In the mental imagery condition, participants studied pictures of half of the famous faces before the experiment. For the other half of the trials, participants had to rely on their long-term memory to generate the mental images of the faces. In both the mental imagery and perceptual conditions, the FFA (lateral fusiform gyrus) was activated, and 25 percent of the voxels activated in the mental imagery condition were within regions recruited during the perceptual condition. The authors found that activation within the FFA was stronger for faces studied before the experiment than for faces generated on the basis of information stored in long-term memory. In addition, given that visual attention did not modulate the activity recorded in the higher visual areas, Ishai and colleagues argued that attention and mental imagery are dissociated to some degree.
Finally, although mental imagery and perception recruit the same category-selective areas in the ventral stream, these areas are activated predominantly through bottom-up inputs during perception and through top-down inputs during mental imagery. In fact, a new analysis of the data reported by Ishai et al. (2000) revealed that the functional connectivity of ventral stream areas with the early visual areas was stronger during visual perception, whereas during visual mental imagery, stronger functional connections were found between the higher visual areas and the frontal and parietal areas (Mechelli, Price, Friston, & Ishai, 2004).
A recent fMRI study further supported the similarity of the brain areas recruited in the ventral stream during visual mental imagery and visual perception (Stokes, Thompson, Cusack, & Duncan, 2009). In this study, participants were asked to visualize an "X" or an "O" based on an auditory cue, or to view passively the two capital letters displayed on a computer screen. During both conditions (i.e., visual mental imagery and visual perception), the visual cortex was significantly activated. Above-baseline activation was recorded in the calcarine sulcus, cuneus, and lingual gyrus, and it extended to the fusiform and middle temporal gyri. In addition, in both conditions, a multivoxel pattern analysis restricted to the anterior and posterior regions of the lateral occipital cortex (LOC) revealed that different populations of neurons code for the two types of stimuli ("X" and "O"). Critically, a perceptual classifier trained on patterns of activation elicited by the perceptual presentation of the stimuli was able to predict the type of visual image generated in the mental imagery condition. The data speak to the fact that mental imagery and visual perception activate shared content-specific representations in high-level visual areas, including the LOC.
Brain lesion studies generally provide converging evidence that mental imagery and perception rely on the same cortical areas in the ventral stream (see Ganis, Thompson, Mast, & Kosslyn, 2003). The logic underlying brain lesion studies is that if visual mental imagery and perception engage the same visual areas, then the same pattern of impairment should be observed in the two functions. Moreover, given that different visual mental images (i.e., houses vs. faces) selectively elicit activation in different areas of the ventral stream, an impairment in one domain of mental imagery (color or shape) should yield a parallel deficit in that specific domain in visual perception. In fact, patients with impairment in face recognition (i.e., prosopagnosia) are impaired in their ability to visualize faces (Shuttleworth, Syring, & Allen, 1982; Young, Humphreys, Riddoch, Hellawell, & De Haan, 1994). A selective deficit in identifying animals in a single case study was accompanied by a similar deficit in describing animals or drawing them from memory (Sartori & Job, 1988). In addition, as revealed by an early review of the literature (Farah, 1984), object agnosia was generally associated with deficits in the ability to visualize objects. Even finer parallel deficits can be observed in the ventral stream. For example, Farah, Hammond, Mehta, and Ratcliff (1989) reported the case of a prosopagnosic patient with a specific deficit in his ability to visualize living things (such as animals or faces) but not in his ability to visualize nonliving things. In addition, some brain-damaged patients cannot distinguish colors perceptually and present similar deficits in color imagery (e.g., Rizzo, Smith, Pokorny, & Damasio, 1993). Critically, patients with color perception deficits have good general mental imagery abilities but are specifically impaired in color mental imagery tasks (e.g., Riddoch & Humphreys, 1987).
However, not all neuropsychological findings report parallel deficits in mental imagery and perception. Cases have been reported of patients with altered perception but preserved imagery (e.g., Bartolomeo et al., 1998; Behrmann, Moscovitch, & Winocur, 1994; Servos & Goodale, 1995). For example, Behrmann et al. (1994) reported the case of C.K., a brain-damaged patient with a left homonymous hemianopia and a possible thinning of the occipital cortex (as revealed by PET and MRI scans) who was severely impaired at recognizing objects but who had no apparent deficit in shape-based mental imagery. In fact, C.K. could draw objects with considerable detail from memory and could use information derived from visual images in a variety of tasks. Conversely, he could not identify objects presented visually, even those he had drawn from memory. A similar dissociation between perceptual impairments and relatively spared mental imagery was observed in Madame D. (Bartolomeo et al., 1998). Following bilateral brain lesions to the extrastriate visual areas (i.e., Brodmann areas 18 and 19 bilaterally and area 37 in the right hemisphere), Madame D. developed severe alexia, agnosia, prosopagnosia, and achromatopsia. Her ability to recognize objects presented visually was severely impaired except for very simple shapes like geometric figures. In contrast, she could draw objects from memory, but she could not identify them. She performed well on an object mental imagery test. Her impairment was not restricted to shape processing. In fact, she could not discriminate between colors, match colors, or point to the correct color. In sharp contrast, she presented no deficit in color imagery, being able, for example, to determine which of two objects had the darker hue when presented with a pair of object names.
In some instances, studies reported the reverse pattern of dissociation, with relatively normal perception associated with deficits in visual mental imagery (e.g., Goldenberg, 1992; Guariglia, Padovani, Pantano, & Pizzamiglio, 1993; Jacobson, Pearson, & Robertson, 2008; Moro, Berlucchi, Lerch, Tomaiuolo, & Aglioti, 2008). For example, of two patients who performed a battery of mental imagery tests in several sensory domains (visual, tactile, auditory, gustatory, olfactory, and motor), one showed a pure visual imagery deficit and the other a combined visual and tactile imagery deficit. Critically, the two patients had no apparent perceptual, language, or memory deficits (Moro et al., 2008). Lesions were located in the middle and inferior temporal gyri of the left hemisphere in one patient and in the temporo-occipital area and the left medial and superior parietal lobe in the other.
The fact that some brain-damaged patients can present spared mental imagery with deficits in visual perception, or spared visual perception with deficits in mental imagery, could reveal a double dissociation between shape- and color-based imagery and visual perception. In fact, visualizing an object relies on top-down processes that are not always necessary to perceive that object, whereas perceiving an object relies on bottom-up organizational processes not required to visualize it (e.g., Ganis, Thompson, & Kosslyn, 2004; Kosslyn, 1994). This double dissociation is supported by the fact that not all of the same brain areas are activated during visual mental imagery and visual perception (Ganis et al., 2004; Kosslyn, Thompson, & Alpert, 1997). In an attempt to quantify the similarity between visual mental imagery and visual perception, Ganis et al. (2004), in an fMRI study, asked participants to judge visual properties of objects (such as whether the object was taller than it was wide) based either on a visual mental image of the object or on a picture of the object presented visually. Across the entire brain, the overlap of the brain regions activated during visual mental imagery and visual perception reached 90 percent. The overlap in activation was smaller in the occipital and temporal lobes than in the frontal and parietal lobes, which suggests that perception relies in part on bottom-up organizational processes that are not used as extensively during mental imagery. However, visual imagery elicited activation in regions that were a subset of the regions activated during the perceptual condition.
In the same way that researchers have studied the brain areas in the ventral stream involved in shape- and color-based mental imagery, researchers have identified brain areas recruited during spatial mental imagery in the dorsal stream. A number of neuroimaging studies used a well-understood mental imagery phenomenon, namely the image scanning paradigm, to investigate the brain areas recruited during spatial mental imagery.
In the image scanning paradigm, participants first learn a map of an island with a number of landmarks; then, after hearing the names of a pair of landmarks, they mentally scan the distance between them (e.g., Denis & Cocude, 1989; Kosslyn et al., 1978). The landmarks are positioned in such a way that the distances between the pairs of landmarks all differ. The classic finding is a linear increase in response times with increasing distance between landmarks (see Denis & Kosslyn, 1999). The linear relationship between distance and scanning times suggests that spatial images incorporate the metric properties of the objects they represent, which constitutes some of the evidence that spatial images depict information. In a PET study, Mellet, Tzourio, Denis, and Mazoyer (1995) investigated the neural basis of image scanning. After learning the map of a circular island, participants were asked either to scan between the landmarks on a visually presented map in a clockwise or counterclockwise direction or to scan a mental image of the same map in the same way. When compared with a rest condition, both conditions elicited brain activation in the bilateral superior external occipital regions and in the left internal parietal region (precuneus). However, the primary visual areas were activated only in the perceptual condition.
fMRI studies provided further evidence that spatial processing of spatial images and spatial processing of the same material presented visually share the same brain areas in the dorsal stream (e.g., Trojano et al., 2000, 2004). For example, Trojano et al. (2000) asked participants to visualize two analogue clock faces and then to decide on which of them the clock hands formed the greater angle. In the perceptual task, the participants' task was identical, but the two clock faces were presented visually. When compared with a control condition (i.e., participants judged which of the two times was numerically greater), the mental imagery condition elicited activation in the posterior parietal cortex and several frontal regions. In both conditions, brain activation was found in the intraparietal sulcus (IPS). Critically, when the two conditions (imagery and perception) were directly contrasted, the activity in the IPS was no longer observed. These neuroimaging data suggest that the IPS supports spatial processing of both mental images and visual percepts. In a follow-up study using the clock-face mental imagery task in an event-related fMRI design, Formisano et al. (2002) found similar activation of the posterior parietal cortex, with a peak of activation in the IPS 2 seconds after the auditory presentation of the times to visualize.
Interestingly, the frontoparietal network at play during spatial imagery is not restricted to
the processing of static spatial representation. In fact, Kaas, Weigelt, Roebroeck, Kohler,
and Muckli (2010) studied the brain areas recruited when participants were imagining
objects in movement using fMRI. In the motion imagery task, participants were asked to
visualize a blue ball moving back and forth within either the upper right corner or the
lower left corner of a computer screen. Participants imagined the motion of the ball at dif
ferent speeds, adjusted as a function of the duration of an auditory cue. To determine
whether participants visualized the ball at the correct speed, they were required, upon
hearing a specific auditory cue, to decide which of two visual targets was closer to
the imagined blue ball. The motion imagery task elicited activation in a parietofrontal net
work comprising bilaterally the superior and inferior parietal lobules (areas 7 and 40) and
the superior frontal gyrus (area 6), in addition to activation in the left middle occipital
gyrus and hMT/V5+. Finally, in V1, V2, and V3, a negative BOLD response was found.
Kaas and colleagues argue that this negative BOLD signal might reflect inhibition of
these areas to prevent visual inputs from interfering with motion imagery in higher visual ar
eas such as hMT/V5+.
The recruitment of the dorsal route for spatial imagery does not depend on the modality in
which the information is presented. Mellet et al. (2002) found similar activation in a pari
etofrontal network (i.e., intraparietal sulcus, presupplementary motor area, and superior
frontal sulcus) when participants mentally scanned an environment described verbally or an
environment learned visually. Activation of similar brain areas in the dorsal route is also
observed when participants generate spatial images of cubes assembled on the basis of
verbal information (Mellet et al., 1996). In addition, neuroimaging studies on (p. 82) blind
participants suggest that the representations and cognitive processes at play in spatial
imagery are not inherently visual. For example, during a spatial mental imagery task, the superior occipi
tal (area 19), the precuneus, and the superior parietal lobes (area 7) were activated in the
same way in sighted and early blind participants (Vanlierde, de Volder, Wanet-Defalque, &
Veraart, 2003). The task required participants to generate a pattern in a 6 × 6 grid by fill
ing in cells based on verbal instructions. Once they generated the mental image of the
pattern, participants judged the symmetry of this pattern. The fact that vision is not nec
essary to form and to process spatial images was further demonstrated in an rTMS study.
Aleman et al. (2002) found that participants required more time to determine whether a
cross presented visually “fell” on the uppercase letter they visualized in a real rTMS con
dition (compared with a sham rTMS condition) only when repetitive pulses were delivered
over the posterior parietal cortex (P4 position) but not when delivered over the early visual
cortex (Oz position).
The functional role of the dorsal route in spatial mental imagery is supported by results
collected on brain-damaged patients (e.g., Farah et al., 1988; Levine et al., 1985; Luzzatti,
Vecchi, Agazzi, Cesa-Bianchi, & Vergani, 1998; Morton & Morris, 1995). For example,
Morton and Morris (1995) reported a patient, M.G., with a left parieto-occipital lesion
who was selectively impaired in visuo-spatial processing. M.G. had no deficit in face
recognition or visual memory tests, nor in an image inspection task. In contrast, she was
impaired not only on a mental rotation task but also on an image scanning task. She could
learn the map of the island and indicate the correct positions of the landmarks, but she
was not able to mentally scan the distance between the landmarks. She presented a similar
deficit when asked to scan the contour of block letters.
Motor Imagery
In the previous sections, we presented evidence that visual mental imagery and spatial
imagery rely on the same brain areas as the ones elicited during vision and spatial vision,
respectively. Given that motor imagery occurs when a movement is mentally simulated,
motor imagery should recruit brain areas involved in physical movement. Indeed, a grow
ing body of evidence indicates that motor areas are activated during motor imagery. In
the next section, we review evidence that motor imagery engages the same
brain areas as the ones recruited during a physical movement, including in some in
stances the primary motor cortex, and that motor imagery is one of the strategies used to
transform mental images.
Decety and Jeannerod (1995) demonstrated that if one is asked to mentally walk from
point A to point B, the time to realize this “mental travel” is similar to the time one would
take to walk that distance. This mental travel effect (i.e., the similarity between the time to
imagine an action and the time to perform it) constitutes strong evidence that motor im
agery simulates actual physical movements. Motor imagery is a particular
type of mental imagery and differs from visual imagery (and to a certain extent from spa
tial imagery). In fact, a number of studies have documented that visual mental imagery
and motor imagery rely on distinct mechanisms and brain areas (Tomasino, Borroni, Isaja,
& Rumiati, 2005; Wraga, Shephard, Church, Inati, & Kosslyn, 2005; Wraga, Thompson,
Alpert, & Kosslyn, 2003). A single-cell recording study of the motor strip of monkeys first
demonstrated that motor imagery relies partially on cortical areas that subserve motor
control: Neurons in the motor cortex fired in sequence, depending on their orientation tuning, while
monkeys were planning to move a lever along a specific arc (Georgopoulos, Lurito,
Petrides, Schwartz, & Massey, 1989). Crucially, the neurons fired when the animals were
preparing to move their arms, not actually moving them.
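The chronometric logic behind the mental travel effect can be made explicit with Fitts’s law, the relation Decety and Jeannerod (1995) tested in motor imagery; the formulation below is the standard one, not their specific parameter fit:

```latex
MT = a + b \log_2\!\left(\frac{2D}{W}\right)
```

Here, MT is movement time, D the distance to the target, W the target width, and a and b empirically determined constants. The striking result is that imagined movement times increase with task difficulty (the logarithmic term) in much the same way that actual movement times do.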
To study motor imagery in humans, researchers often used mental rotation paradigms. In
the seminal mental rotation paradigm designed by Shepard and Metzler (1971), a pair of
3D objects with several arms (each consisting of small cubes) is presented visually (Fig
ure 5.2). The task of the participants is to decide whether the two objects have the same
shape, regardless of differences in their orientation. The key finding is that the time to
make this judgment increases linearly as the angular disparity between the two objects
increases (i.e., the mental rotation effect). Subsequent studies showed that the mental rota
tion effect is found with alphanumeric stimuli (e.g., Cooper & Shepard, 1973; Koriat &
Norman, 1985), two-dimensional line drawings of letter-like asymmetrical characters
(e.g., Tarr & Pinker, 1989), and pictures of common objects (e.g., Jolicoeur, 1985).
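The mental rotation effect can be summarized as a linear chronometric function; the equation below is a schematic description of the behavioral finding rather than a fit from any single study:

```latex
RT(\theta) = a + b\,\theta
```

Here, θ is the angular disparity between the two objects, the intercept a captures encoding and decision time, and the slope b is the time cost per degree of rotation; in Shepard and Metzler’s (1971) data, the slope corresponded to a rotation rate of roughly 60 degrees per second.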
Richter et al. (2000) in an fMRI study found that mental rotation of Shepard and Metzler
stimuli elicited activation in the superior parietal lobes bilaterally, the supplementary mo
tor cortex, and the left (p. 83) primary motor cortex. Results from a hand mental rotation
study provided additional evidence that motor processes were involved during image
transformation (Parsons et al., 1995). Pictures of hands were presented in the right or left
visual field with different orientations, and participants determined whether each picture
depicted a left or right hand. Parsons and colleagues reasoned that the motor cortex
would be recruited if participants mentally rotated their own hand in congruence with the
orientation of the stimulus presented to make their judgment. Bilateral activation was
found in the supplementary motor cortex, and critically, activation in the prefrontal and
the insular premotor areas occurred in the hemisphere contralateral to the stimulus
handedness. Activation was not restricted to brain areas that implemented motor func
tions; significant activation was also reported in the frontal and parietal lobes as well as
in area 17.
According to Decety (1996), image rotation occurs because we anticipate what we would
see if we were to manipulate an object, which implies that motor areas are recruited during men
tal rotation regardless of the category of objects rotated. Kosslyn, DiGirolamo, Thompson,
and Alpert (1998) in a PET study directly tested this assumption by asking participants ei
ther to mentally rotate inanimate 3D armed objects or pictures of hands. In both condi
tions, the two objects (or the two hands) were presented with different angular dispari
ties, and participants judged whether the two objects (or hands) were identical. To deter
mine the brain areas specifically activated during mental rotation, each experimental con
dition was compared with a baseline condition in which the two objects (or hands) were
presented in the same orientation. The researchers found activation in the primary motor
cortex (area M1), premotor cortex, and posterior parietal lobe when participants rotated
hands. In contrast, none of the frontal motor areas was activated when participants men
tally rotated inanimate objects. The findings suggest that there are at least two ways ob
jects in images can be rotated: one that relies heavily on motor processes, and one that
does not. However, the type of stimuli rotated might not predict when the motor cortex is
recruited. In fact, Cohen et al. (1996) in an fMRI study found that motor areas were
activated in half of the participants in a mental rotation task using 3D armed objects
similar to those used in the Kosslyn et al. (1998) study.
The fact that mental rotation of inanimate objects elicits activation in frontal motor areas
in some participants but not others suggests that there might be more than one strategy
to rotate this type of object. Kosslyn, Thompson, Wraga, and Alpert (2001) tested whether
in a mental rotation task of 3D armed objects participants could imagine the rotation of
objects in two different ways: as if an external force (such as a motor) were rotating the
objects (i.e., external action condition), or as if the objects were being physically manipu
lated (i.e., internal action condition). Participants received different sets of instructions
and practice procedures to prompt them to use one of the two strategies (external action
vs. internal action). In the practice of the external action condition, a wooden model of a
typical Shepard and Metzler object was rotated by an electric motor. In contrast, in the
internal condition, participants rotated the wooden model physically. The object used dur
ing practice was not used on the experimental trials. On each new set of trials, partici
pants were instructed to mentally rotate the object in the exact same way the wooden
model was rotated in the preceding practice session. The crucial finding was that area
M1 was activated when participants mentally rotated the object on the internal action tri
als but not on the external action trials. However, posterior parietal and secondary motor
(p. 84) areas were recruited in both conditions. The results have two implications: First,
mental rotation in general (independently of the type of stimuli) can be achieved by imag
ining the physical manipulation of the object. Second, participants can adopt one or the
other strategy voluntarily regardless of their cognitive styles or cognitive abilities.
However, the previous study left open the question of whether one can spontaneously use
a motor strategy to perform a mental rotation task of inanimate objects. Wraga et al.
(2003) addressed this issue in a PET study. In their experiment, participants performed either
a hand mental rotation task (similar to the one used by Kosslyn et al., 1998) followed by
a Shepard and Metzler rotation task, or two Shepard and Metzler tasks in succession.
The authors reasoned that for the group that started with the mental rotation task of
hands, motor processes involved in the hand rotation task would covertly transfer to the
Shepard and Metzler task. In fact, when brain activation in the two groups of partici
pants was compared in the second mental rotation task (the Shepard and Metzler task in
both groups), activation in the motor areas (areas 6 and M1) was found only in the group
that performed a hand rotation task before the Shepard and Metzler task (Figure 5.3).
The results clearly demonstrate that motor processes can be used spontaneously to men
tally rotate objects that are not body parts.
Functional Role of Area M1
The studies we reviewed suggest that area M1 plays a role in the mental transformation
of objects. However, none addressed whether M1 plays a functional role in mental trans
formation, and more specifically in mental rotation of objects. To address this issue, Ganis,
Keenan, Kosslyn, and Pascual-Leone (2000) administered single-pulse TMS to the left pri
mary motor cortex of participants while they performed mental rotations of line drawings
of hands or feet presented in their right visual field. Single-pulse TMS was administered
at different time intervals from the stimulus onset (400 or 650 ms) to determine when pri
mary motor areas are recruited during mental rotation. In addition, to test whether men
tal rotation of body parts is achieved by imagining the movement of the corresponding
part of the body, single-pulse TMS was delivered specifically to the hand area of M1. Par
ticipants required more time and made more errors when a single-pulse TMS was deliv
ered to M1, when the single-pulse TMS was delivered 650 ms rather than 400 ms after
stimulus onset, and when participants mentally rotated hands rather than feet. Within the
limits of the spatial resolution of the TMS methodology, the results suggest that M1 is re
quired to perform mental rotation of body parts by mapping the movement on one’s own
body part but only after the visual and spatial relations of the stimuli have been encoded.
Tomasino et al. (2005) reported converging data supporting the functional role of M1 in
mental rotation by using a mental rotation task of hands in a TMS study.
However, the data are not sufficient to claim that the computations are actually taking
place in M1. It is possible that M1 relays information computed elsewhere in the brain
(such as in the posterior parietal cortex). Indeed, Sirigu, Duhamel, Cohen, Pillon,
Dubois, and Agid (1996) demonstrated that the parietal cortex, not the motor cortex, is
critical for generating mental movement representations. Patients with lesions restricted
to the parietal cortex showed a deficit in predicting the time necessary to perform specific
finger movements, whereas no such deficit was reported for a patient with lesions restricted
to M1.
Conclusion
Some remain dubious that mental imagery is functionally meaningful and can consti
tute a topic of research in its own right. However, by moving away from a purely introspec
tive approach to mental imagery and embracing more objective methods, notably
neuroimaging, researchers have collected evidence that mental images are depictive rep
resentations interpreted by cognitive processes at play in other systems, such as the percep
tual and the (p. 85) motor systems. In fact, we hope that this review of the literature has
made clear that there is little evidence against the view that most of the same
neural processes underlying perception are also used in visual mental imagery and that
motor imagery can recruit the motor system in much the same way that physical action does.
Researchers now rely on what is known of the organization of the perceptual and motor
systems and of the key features of the neural mechanisms in those systems to refine the
characterization of the cognitive mechanisms at play in the mental imagery system. The
encouraging note is that each new characterization of the perceptual and motor systems
brings a chance to better understand neural mechanisms at play in mental imagery.
Finally, with the ongoing development of more elaborate neuroimaging techniques and
analyses of the BOLD signal, mental imagery researchers have an increasing set of tools
at their disposal to resolve complicated questions about mental imagery. A number of ques
tions remain to be answered to achieve a full understanding of the neural mecha
nisms underlying shape, color, spatial, and motor imagery. For example, although much evi
dence points toward an overlapping of perceptual and visual mental imagery processes in
high-level visual cortices—temporal and parietal lobes—evidence remains mixed at this
point concerning the role of lower level processes in visual mental imagery. Indeed, we
need to understand the circumstances under which the early visual cortex is recruited
during mental imagery. Another problem that warrants further investigation is the neural
basis of the individual differences observed in mental imagery abilities. As a prerequisite,
objective methods to measure individual differences in those abilities must be developed.
References
Aleman, A., Schutter, D. J. L. G., Ramsey, N. F., van Honk, J., Kessels, R. P. C., Hoogduin, J.
M., Postma, A., Kahn, R. S., & de Haan, E. H. F. (2002). Functional anatomy of top-down
visuospatial processing in the human brain: Evidence from rTMS. Cognitive Brain Re
search, 14, 300–302.
Anton, G. (1899). Über die Selbstwahrnehmungen der Herderkranungen des Gehirns
durch den Kranken bei Rindenblindheit. Archiv für Psychiatrie und Nervenkrankheiten,
32, 86–127.
Bartolomeo, P. (2002). The relationship between visual perception and visual mental im
agery: A reappraisal of the neuropsychological evidence. Cortex, 38, 357–378.
Bartolomeo, P. (2008). The neural correlates of visual mental imagery: An ongoing de
bate. Cortex, 44, 107–108.
Bartolomeo, P., Bachoud-Levi, A. C., De Gelder, B., Denes, G., Dalla Barba, G., Brugieres,
P., et al. (1998). Multiple-domain dissociation between impaired visual perception and
preserved mental imagery in a patient with bilateral extrastriate lesions. Neuropsycholo
gia, 36, 239–249.
Behrmann, M., Moscovitch, M., & Winocur, G. (1994). Intact visual imagery and impaired
visual perception in a patient with visual agnosia. Journal of Experimental Psychology:
Human Perception and Performance, 20, 1068–1087.
Borst, G., Thompson, W. L., & Kosslyn, S. M. (2011). Understanding the dorsal and ventral
systems of the cortex: Beyond dichotomies. American Psychologist, 66, 624–632.
Chatterjee, A., & Southwood, M. H. (1995). Cortical blindness and visual imagery. Neurol
ogy, 45.
Cohen, M. S., Kosslyn, S. M., Breiter, H. C., DiGirolamo, G. J., Thompson, W. L.,
Bookheimer, S. Y., Belliveau, J. W., & Rosen, B. R. (1996). Changes in cortical activity dur
ing mental rotation: A mapping study using functional MRI. Brain, 119, 89–100.
Cooper, L. A., & Shepard, R. N. (1973). Chronometric studies of the rotation of mental im
ages. In W. G. Chase (Eds.), Visual information processing (pp. 75–176). New York: Acade
mic Press.
Decety, J. (1996). Neural representation for action. Reviews in the Neurosciences, 7, 285–
297.
Decety, J., & Jeannerod, M. (1995). Mentally simulated movements in virtual reality: Does
Fitts’s law hold in motor imagery? Behavioral Brain Research, 72, 127–134.
Denis, M., & Cocude, M. (1989). Scanning visual images generated from verbal descrip
tions. European Journal of Cognitive Psychology, 1, 293–307.
Denis, M., & Kosslyn, S. M. (1999). Scanning visual mental images: A window on the
mind. Current Psychology of Cognition, 18, 409–465.
Downing, P. E., Chan, A. W., Peelen, M. V., Dodds, C. M., & Kanwisher, N. (2006). Domain
specificity in visual cortex. Cerebral Cortex, 16, 1453–1461.
Farah, M. J. (1984). The neurological basis of mental imagery: A componential analysis.
Cognition, 18, 245–272.
Farah, M. J., Hammond, K. M., Mehta, Z., & Ratcliff, G. (1989). Category-specificity and
modality-specificity in semantic memory. Neuropsychologia, 27, 193–200.
Farah, M. J., Soso, M. J., & Dasheiff, R. M. (1992). Visual angle of the mind’s eye before
and after unilateral occipital lobectomy. Journal of Experimental Psychology: Human Per
ception and Performance, 18, 241–246.
Felleman, D. J., & Van Essen, D. C. (1991). Distributed hierarchical processing in the pri
mate cerebral cortex. Cerebral Cortex, 1, 1–47.
Finke, R. A., & Pinker, S. (1982). Spontaneous imagery scanning in mental extrapolation.
Journal of Experimental Psychology: Learning, Memory and Cognition, 8, 142–147.
Formisano, E., Linden, D. E. J., Di Salle, F., Trojano, L., Esposito, F., Sack, A. T., Grossi, D.,
Zanella, F. E., & Goebel, R. (2002). Tracking the mind’s image in the brain I: Time-re
solved fMRI during visuospatial mental imagery. Neuron, 35, 185–194.
Ganis, G., Keenan, J. P., Kosslyn, S. M., & Pascual-Leone, A. (2000). Transcranial magnetic
stimulation of primary motor cortex affects mental rotation. Cerebral Cortex, 10, 175–
180.
Ganis, G., Thompson, W. L., & Kosslyn, S. M. (2004). Brain areas underlying visual mental
imagery and visual perception: An fMRI study. Brain Research: Cognitive Brain Research,
20, 226–241.
Ganis, G., Thompson, W. L., Mast, F. W., & Kosslyn, S. M. (2003). Visual imagery in
(p. 86)
Georgopoulos, A. P., Lurito, J. T., Petrides, M., Schwartz, A. B., & Massey, J. T. (1989).
Mental rotation of the neuronal population vector. Science, 243, 234–236.
Goldenberg, G. (1992). Loss of visual imagery and loss of visual knowledge: A case study.
Neuropsychologia, 30, 1081–1099.
Goldenberg, G., Müllbacher, W., & Nowak, A. (1995). Imagery without perception: A case
study of anosognosia for cortical blindness. Neuropsychologia, 33, 1373–1382.
Goodale, M. A., & Milner, A. D. (1992). Separate visual pathways for perception and ac
tion. Trends in Neurosciences, 15, 20–25.
Guariglia, C., Padovani, A., Pantano, P., & Pizzamiglio, L. (1993). Unilateral neglect re
stricted to visual imagery. Nature, 364, 235–237.
Haxby, J. V., Grady, C. L., Horwitz, B., Ungerleider, L. G., Mishkin, M., Carson, R. E., Her
scovitch, P., Schapiro, M. B., & Rapoport, S. I. (1991). Dissociation of object and spatial
visual processing pathways in human extrastriate cortex. Proceedings of the National
Academy of Sciences USA, 88, 1621–1625.
Ishai, A., Haxby, J. V., & Ungerleider, L. G. (2002). Visual imagery of famous faces: Effects
of memory and attention revealed by fMRI. NeuroImage, 17, 1729–1741.
Ishai, A., Ungerleider, L. G., & Haxby, J. V. (2000). Distributed neural systems for the gen
eration of visual images. Neuron, 28, 979–990.
Jacobson, L. S., Pearson, P. M., & Robertson, B. (2008). Hue-specific color memory impair
ment in an individual with intact color perception and color naming. Neuropsychologia,
46, 22–36.
Jolicoeur, P. (1985). The time to name disoriented natural objects. Memory and Cognition,
13, 289–303.
Kaas, A., Weigelt, S., Roebroeck, A., Kohler, A., & Muckli, L. (2010). Imagery of a moving
object: The role of occipital cortex and human MT/V5+. NeuroImage, 49, 794–804.
Kanwisher, N., & Yovel, G. (2006). The fusiform face area: A cortical region specialized for
the perception of faces. Philosophical Transactions of the Royal Society of London B, 361,
2109–2128.
Klein, I., Dubois, J., Mangin, J. F., Kherif, F., Flandin, G., Poline, J. B., Denis, M., Kosslyn, S.
M., & Le Bihan, D. (2004). Retinotopic organization of visual mental images as revealed
by functional magnetic resonance imaging. Brain Research: Cognitive Brain Research, 22,
26–31.
Klein, I., Paradis, A.-L., Poline, J.-B., Kosslyn, S. M., & Le Bihan, D. (2000). Transient activ
ity in human calcarine cortex during visual imagery. Journal of Cognitive Neuroscience,
12, 15–23.
Koriat, A., & Norman, J. (1985). Reading rotated words. Journal of Experimental Psychol
ogy: Human Perception and Performance, 11, 490–508.
Kosslyn, S. M. (1980). Image and mind. Cambridge, MA: Harvard University Press.
Kosslyn, S. M. (1994). Image and brain. Cambridge, MA: Harvard University Press.
Kosslyn, S. M., Alpert, N. M., Thompson, W. L., Maljkovic, V., Weise, S. B., Chabris, C.,
Hamilton, S. E., & Buonanno F. S. (1993). Visual mental imagery activates topographically
organized visual cortex: PET investigations. Journal of Cognitive Neuroscience, 5, 263–
287.
Kosslyn, S. M., Ball, T. M., & Reiser, B. J. (1978). Visual images preserve metric spatial in
formation: Evidence from studies of image scanning. Journal of Experimental Psychology:
Human Perception and Performance, 4, 47–60.
Kosslyn, S. M., DiGirolamo, G., Thompson, W. L., & Alpert, N. M. (1998). Mental rotation
of objects versus hands: Neural mechanisms revealed by positron emission tomography.
Psychophysiology, 35, 151–161.
Kosslyn, S. M., Pascual-Leone, A., Felician, O., Camposano, S., Keenan, J. P., Thompson, W.
L., Ganis, G., Sukel, K. E., & Alpert, N. M. (1999). The role of area 17 in visual imagery:
Convergent evidence from PET and rTMS. Science, 284, 167–170.
Kosslyn, S. M., & Thompson, W. L. (2003). When is early visual cortex activated during vi
sual mental imagery? Psychological Bulletin, 129, 723–746.
Kosslyn, S. M., Thompson, W. L., & Alpert, N. M. (1997). Neural systems shared by visual
imagery and visual perception: A positron emission tomography study. NeuroImage, 6,
320–334.
Kosslyn, S. M., Thompson, W. L., & Ganis, G. (2006). The case for mental imagery. New
York: Oxford University Press.
Kosslyn, S. M., Thompson, W. L., Kim, I. J., & Alpert, N. M. (1995). Topographical repre
sentations of mental images in primary visual cortex. Nature, 378, 496–498.
Kosslyn, S. M., Thompson, W. L., Wraga, M., & Alpert, N. M. (2001). Imagining rotation by
endogenous versus exogenous forces: Distinct neural mechanisms. NeuroReport, 12,
2519–2525.
Levine, D. N., Warach, J., & Farah, M. J. (1985). Two visual systems in mental imagery:
Dissociation of “what” and “where” in imagery disorders due to bilateral posterior cere
bral lesions. Neurology, 35, 1010–1018.
Luzzatti, C., Vecchi, T., Agazzi, D., Cesa-Bianchi, M., & Vergani, C. (1998). A neurological
dissociation between preserved visual and impaired spatial processing in mental imagery.
Cortex, 34, 461–469.
Mechelli, A., Price, C. J., Friston, K. J., & Ishai, A. (2004). Where bottom-up meets top-
down: neuronal interactions during perception and imagery. Cerebral Cortex, 14, 1256–
1265.
Mellet, E., Briscogne, S., Crivello, F., Mazoyer, B., Denis, M., & Tzourio-Mazoyer, N.
(2002). Neural basis of mental scanning of a topographic representation built from a text.
Cerebral Cortex, 12, 1322–1330.
Mellet, E., Tzourio, N., Crivello, F., Joliot, M., Denis, M., & Mazoyer, B. (1996). Functional
anatomy of spatial mental imagery generated from verbal instructions. Journal of Neuro
science, 16, 6504–6512.
Mellet, E., Tzourio, N., Denis, M., & Mazoyer, B. (1995). A positron emission tomography
study of visual and mental spatial exploration. Journal of Cognitive Neuroscience, 4, 433–
445.
Moro, V., Berlucchi, G., Lerch, J., Tomaiuolo, F., & Aglioti, S. M. (2008). Selective deficit of
mental visual imagery with intact primary visual cortex and visual perception. Cortex, 44,
109–118.
Morton, N., & Morris, R. G. (1995). Image transformations dissociated from visuo-spatial
working memory. Cognitive Neuropsychology, 12, 767–791.
O’Craven, K. M., & Kanwisher, N. (2000). Mental imagery of faces and places activates
corresponding stimulus-specific brain regions. Journal of Cognitive Neuroscience, 12,
1013–1023.
(p. 87) Paivio, A. (1971). Imagery and verbal processes. New York: Holt, Rinehart and Win
ston.
Parsons, L. M., Fox, P. T., Downs, J. H., Glass, T., Hirsch, T. B., Martin, C. C., Jerabek, P. A.,
& Lancaster, J. L. (1995). Use of implicit motor imagery for visual shape discrimination as
revealed by PET. Nature, 375, 54–58.
Pylyshyn, Z. W. (1973). What the mind’s eye tells the mind’s brain: A critique of mental
imagery. Psychological Bulletin, 80, 1–24.
Pylyshyn, Z. W. (2003a). Return of the mental image: Are there really pictures in the
head? Trends in Cognitive Sciences, 7, 113–118.
Pylyshyn, Z. W. (2003b). Seeing and visualizing: It’s not what you think. Cambridge, MA:
MIT Press.
Pylyshyn, Z. W. (2007). Things and places: How the mind connects with the world. Cam
bridge, MA: MIT Press.
Richter, W., Somorjai, R., Summers, R., Jarmasz, M., Tegeler, C., Ugurbil, K., Menon, R.,
Gati, J. S., Georgopoulos, A. P., & Kim, S.-G. (2000). Motor area activity during mental ro
tation studied by time-resolved single-trial fMRI. Journal of Cognitive Neuroscience, 12,
310–320.
Riddoch, M. J., & Humphreys, G. W. (1987). A case of integrative visual agnosia. Brain,
110, 1431–1462.
Rizzo, M., Smith, V., Pokorny, J., & Damasio, A. (1993). Color perception profiles in central
achromatopsia. Neurology 43, 995–1001.
Sartori, G., & Job, R. (1988). The oyster with four legs: A neuropsychological study on the
interaction of visual and semantic information. Cognitive Neuropsychology, 5, 105–132.
Sereno, M. I., Dale, A. M., Reppas, J. B., Kwong, K. K., Belliveau, J. W., Brady, T. J., Rosen,
B. R., & Tootell, R. B. H. (1995). Borders of multiple visual areas in humans revealed by
functional magnetic resonance imaging. Science, 268, 889–893.
Servos, P., & Goodale, M. A. (1995). Preserved visual imagery in visual form agnosia. Neu
ropsychologia, 33 (11), 1383–1394.
Shuttleworth, E. C., Jr., Syring, V., & Allen, N. (1982). Further observations on the nature
of prosopagnosia. Brain and Cognition, 1, 307–322.
Siebner, H. R., Peller, M., Willoch, F., Minoshima, S., Boecker, H., Auer, C., Drzezga, A.,
Conrad, B., & Bartenstein, P. (2000). Lasting cortical activation after repetitive TMS of
the motor cortex: A glucose metabolic study. Neurology, 54, 956–963.
Sirigu, A., Duhamel, J.-R., Cohen, L., Pillon, B., Dubois, B., & Agid, Y. (1996). The mental
representation of hand movements after parietal cortex damage. Science, 273 (5281),
1564–1568.
Slotnick, S. D., Thompson, W. L., & Kosslyn, S. M. (2005). Visual mental imagery induces
retinotopically organized activation of early visual areas. Cerebral Cortex, 15, 1570–1583.
Sparing, R., Mottaghy, F., Ganis, G., Thompson, W. L., Toepper, R., Kosslyn, S. M., & Pas
cual-Leone, A. (2002). Visual cortex excitability increases during visual mental imagery: A
TMS study in healthy human subjects. Brain Research, 938, 92–97.
Stokes, M., Thompson, R., Cusack, R., & Duncan, J. (2009). Top-down activation of shape-
specific population codes in visual cortex during mental imagery. Journal of Neuroscience,
29, 1565–1572.
Tarr, M. J., & Pinker, S. (1989). Mental rotation and orientation-dependence in shape
recognition. Cognitive Psychology, 21, 233–282.
Thirion, B., Duchesnay, E., Hubbard, E., Dubois, J., Poline, J.-B., Lebihan, D., & Dehaene,
S. (2006). Inverse retinotopy: Inferring the visual content of images from brain activation
patterns. Neuroimage, 33, 1104–1116.
Tomasino, B., Borroni, P., Isaja, A., & Rumiati, R. I. (2005). The role of the primary motor
cortex in mental rotation: A TMS study. Cognitive Neuropsychology, 22, 348–363.
Trojano, L., Grossi, D., Linden, D. E., Formisano, E., Hacker, H., Zanella, F. E., Goebel, R.,
& Di Salle, F. (2000). Matching two imagined clocks: The functional anatomy of spatial
analysis in the absence of visual stimulation. Cerebral Cortex, 10, 473–481.
Trojano, L., Linden, D. E., Formisano, E., Grossi, D., Sack, A. T., & Di Salle, F. (2004).
What clocks tell us about the neural correlates of spatial imagery. European Journal of
Cognitive Psychology, 16, 653–672.
Ungerleider, L. G., & Mishkin, M. (1982). Two cortical visual systems. In D. J. Ingle, M. A.
Goodale, & R. J. W. Mansfield (Eds.), Analysis of visual behavior (pp. 549–586). Cam
bridge, MA: MIT Press.
Vanlierde, A., de Volder, A. G., Wanet-Defalque, M. C., & Veraart C. (2003). Occipito-pari
etal cortex activation during visuo-spatial imagery in early blind humans. NeuroImage,
19, 698–709.
Watson, J. B. (1913). Psychology as the behaviorist views it. Psychological Review, 20,
158–177.
Wraga, M., Shephard, J. M., Church, J. A., Inati, S., & Kosslyn, S. M. (2005). Imagined ro
tations of self versus objects: An fMRI study. Neuropsychologia, 43, 1351–1361.
Wraga, M. J., Thompson, W. L., Alpert, N. M., & Kosslyn, S. M. (2003). Implicit transfer of
motor strategies in mental rotation. Brain and Cognition, 52, 135–143.
Young, A. W., Humphreys, G. W., Riddoch, M. J., Hellawell, D. J., & de Haan, E. H. (1994).
Recognition impairments and face imagery. Neuropsychologia, 32, 693–702.
Grégoire Borst
Page 24 of 24
Looking at the Nose Through Human Behavior, and at Human Behavior
Through the Nose
Keywords: olfactory perception, olfaction, behavior, odorant, olfactory epithelium, olfactory discrimination, piriform cortex, eating, mating, social interaction
Introduction
Even in reviews on olfaction, it is often stated that human behavior and perception are
dominated by vision, or that humans are primarily visual creatures. This reflects the con
sensus in cognitive neuroscience (Zeki & Bartels, 1999). Indeed, if asked which distal
sense we would soonest part with, most (current authors included) would select olfaction
before audition or vision. Thus, whereas primarily olfactory animals such as rodents are
referred to as macrosmatic, humans are considered microsmatic.
That said, we trust our nose over our eyes and ears in the two most critical decisions we
make: what we eat, and with whom we mate (Figure 6.1).
We review various studies in this respect, yet first we turn to the reader’s clear intuition:
Given a beautiful-looking slice of cake that smells of sewage and a mushy-looking shapeless mixture that smells of cinnamon and banana, which do you eat? Given a gorgeous-looking individual who smells like yeast and a profoundly physically unattractive person who smells like sweet spice, with whom do you mate? In both of these key behaviors, humans, like all mammals, are primarily olfactory. With this simple truth in mind, namely, that in our most important decisions we follow our nose, should humans nevertheless still be considered microsmatic (Stoddart, 1990)?
Before considering the behavioral significance of human olfaction, we first provide a ba
sic overview of olfactory system organization. The mammalian olfactory system follows a
rather clear hierarchy, starting with transduction at the olfactory epithelium in the nose,
then initial processing subserving odor discrimination in the olfactory bulb, and finally
higher order processing related to odor object formation and odor memory in primary ol
factory cortex (R. I. Wilson & Mainen, 2006) (Figure 6.2). This organization is bilateral
and symmetrical, and although structural connectivity appears largely (p. 89) ipsilateral
(left epithelium to left bulb to left cortex) (Powell, Cowan, & Raisman, 1965), functional
measurements have implied more contralateral than ipsilateral driving of activity (Cross
et al., 2006; McBride & Slotnick, 1997; J. Porter, Anand, Johnson, Khan, & Sobel, 2005;
Savic & Gulyas, 2000; D. A. Wilson, 1997). The neural underpinnings of this functional
contralaterality remain unclear.
More than One Nose in the Nose
Odorants are concurrently processed in several neural subsystems beyond the above-de
scribed main olfactory system (Breer, Fleischer, & Strotmann, 2006) (Figure 6.3). For ex
ample, air-borne molecules are transduced at endings of the trigeminal nerve in the eye,
nose, and throat (Hummel, 2000). It is trigeminal activation that provides the cooling sensation associated with odorants such as menthol, or the stinging sensation associated with
odorants such as ammonia or onion. In rodents, at least three additional sensing mecha
nisms have been identified in the nose. These include (1) the septal organ, which consists
of a small patch of olfactory receptors that are anterior to the main epithelium (Ma et al.,
2003); (2) the Grueneberg organ, which contains small grape-like clusters of receptors at
the anterior end of the nasal passage that project to a separate subset of main olfactory
bulb targets (Storan & Key, 2006); and (3) the vomeronasal system, or accessory olfactory
system (Halpern, 1987; Wysocki & Meredith, 1987). The accessory olfactory system is
equipped with a separate bilateral epithelial structure, the vomeronasal organ, or VNO
(sometimes also referred to as Jacobson’s organ). The VNO is a (p. 90) pit-shaped struc
ture at the anterior portion of the nasal passage, containing receptors that project to an
accessory olfactory bulb, which in turn projects directly to critical components of the lim
bic system such as the amygdala and hypothalamus (Keverne, 1999; Meredith, 1983) (see
Figure 6.3). In rodents, the accessory olfactory system plays a key role in mediating so
cial chemosignaling (Halpern, 1987; Kimchi, Xu, & Dulac, 2007; Wysocki & Meredith,
1987). Whether humans have a septal organ or Grueneberg organ has not been carefully
studied, and it is largely held that humans do not have an accessory olfactory system, al
though this issue remains controversial (Frasnelli, Lundström, Boyle, Katsarkas, & Jones-Gotman; Meredith, 2001; Monti-Bloch, Jennings-White, Dolberg, & Berliner, 1994; Witt &
Hummel, 2006). Regardless of this debate, it is clear that the sensation of smell in hu
mans and other mammals is the result of common activation across several neural subsys
tems (Restrepo, Arellano, Oliva, Schaefer, & Lin, 2004; Spehr et al., 2006). However, be
fore air-borne stimuli are processed, they first must be acquired.
Mammalian olfaction starts with a sniff—a critical act of odor sampling. Sniffs are not
merely an epiphenomenon of olfaction, but rather are an intricate component of olfactory
perception (Kepecs, Uchida, & Mainen, 2006, 2007; Mainland & Sobel, 2006; Schoenfeld
& Cleland, 2006). Sniffs are in part a reflexive action (Tomori, Benacka, & Donic, 1998),
which is then rapidly modified in accordance with odorant content (Laing, 1983) (Figure
6.4). Humans begin tailoring their sniff according to odorant properties within about 160
ms of sniff onset, reducing sniff magnitude for both intense (B. N. Johnson, Mainland, &
Sobel, 2003) and unpleasant (Bensafi et al., 2003) odorants. We have proposed that the
mechanism that tailors a sniff to its content is cerebellar (Sobel, Prabhakaran, Hartley, et
al., 1998), and cerebellar lesions indeed negate this mechanism (Mainland, Johnson,
Khan, Ivry, & Sobel, 2005). Moreover, not only are sniffs the key mechanism for odorant
sampling, they also play a key role in timing and organization of neural representation in
the olfactory system. This influence of sniffing on neural representation in olfaction may
begin at the earliest phase of olfactory processing because olfactory receptors are also
mechanosensitive (Grosmaitre, Santarelli, Tan, Luo, & Ma, 2007), potentially responding
to sniffs even without odor. Sniff properties are then reflected in neural activity at both
the olfactory bulb (Verhagen, Wesson, Netoff, White, & Wachowiak, 2007) and olfactory
cortex (Sobel, Prabhakaran, Desmond, et al., 1998). Indeed, negating sniffs (whether
their execution, or only their (p. 91) intention) may underlie in part the pronounced differ
ences in olfactory system neural activity during wake and anesthesia (Rinberg, Koulakov,
& Gelperin, 2006). Finally, odor sampling is not only through the nose (orthonasal) but al
so through the mouth (retronasal): Food odors make their way to the olfactory system by
ascending through the posterior nares of the nasopharynx (Figure 6.5). Several lines of
evidence have suggested partially overlapping yet partially distinct neural substrates sub
serving orthonasal and retronasal human olfaction (Bender, Hummel, Negoias, & Small,
2009; Hummel, 2008; Small, Gerber, Mak, & Hummel, 2005).
Once sniffed, an odorant makes its way up the nasal passage, where it crosses a mucous
membrane before (p. 92) interacting with olfactory receptors that line the olfactory ep
ithelium. This step is not inconsequential to the olfactory process. Odorants cross this
mucus following the combined action of passive gradients and active transporters, which
generate an odorant-specific pattern of dispersion (Moulton, 1976). These so-called sorption properties have been hypothesized to play a key role in odorant discrimination, in
that they form a sort of chromatographic separation at the nose (Mozell & Jagodowicz,
1973). The later identification of an inordinately large family of specific olfactory receptor
types (L. Buck & Axel, 1991; Zhang & Firestein, 2002) shifted the focus of enquiry regard
ing odor discrimination to that of receptor–ligand interactions, but the chromatographic
component of this process has never been negated and likely remains a key aspect of
odorant processing.
Once an odorant crosses the mucosa, it interacts with olfactory receptors at the sensory
end of olfactory receptor neurons. Humans have about 12 million bipolar receptor neu
rons (Moran, Rowley, Jafek, & Lovell, 1982) that differ from typical neurons in that they
constantly regenerate from a basal cell layer throughout the lifespan (Graziadei & Monti
Graziadei, 1983). These neurons send their dendritic process to the olfactory epithelial
surface, where they form a knob from which five to twenty thin cilia extend into the mu
cus. These cilia contain the olfactory receptors: 7-transmembrane G-protein–coupled sec
ond-messenger receptors, where a cascade of events that starts with odorant binding cul
minates in the opening of cross-membrane cation channels that depolarize the cell
(Firestein, 2001; Spehr & Munger, 2009; Zufall, Firestein, & Shepherd, 1994) (Figure
6.6). The mammalian genome contains more than 1,000 such receptor types (L. Buck &
Axel, 1991), yet humans functionally express only about 400 of these (Gilad & Lancet,
2003). Typically, each receptor neuron expresses only one receptor type, although recent
evidence from Drosophila has suggested that in some cases a single neuron may express
two receptor types (Goldman, Van der Goes van Naters, Lessing, Warr, & Carlson, 2005).
In rodents, receptor types are grouped into four functional expression zones along a
dorsoventral epithelial axis, yet are randomly dispersed within each zone (Ressler, Sulli
van, & Buck, 1993; Strotmann, Wanner, Krieger, Raming, & Breer, 1992; Vassar, Ngai, &
Axel, 1993). Each receptor type is typically responsive to a small subset of odorants
(Hallem & Carlson, 2006; Malnic, Hirono, Sato, & Buck, 1999; Saito, Chi, Zhuang, Mat
sunami, & Mainland, 2009), although some receptors may be responsive to only very few
odorants (Keller, Zhuang, Chi, Vosshall, & Matsunami, 2007), and other receptors may be
responsive to a very wide range of odorants (Grosmaitre et al., 2009). Despite some alter
native hypotheses (Franco, Turin, Mershin, & Skoulakis, 2011), this receptor-to-odorant
specificity is widely considered the basis for olfactory coding (Su, Menuz, & Carlson,
2009).
Olfactory Bulb: A Neural Substrate for Odorant Discrimination
Whereas receptor types appear randomly dispersed throughout each epithelial subzone,
the path (p. 93) from epithelium to bulb via the olfactory nerve entails a unique pattern of
convergence that brings together all receptor neurons that express a particular receptor
type. These synapse onto one of two common points at the olfactory bulb, termed
glomeruli (Mombaerts et al., 1996). Thus, the number of glomeruli is expected to be
about double the number of receptor types, and the receptive range of a glomerulus is ex
pected to reflect the receptive range of a given receptor type (Feinstein & Mombaerts,
2004). Within the glomeruli, receptor axons contact dendrites of either mitral or tufted
output neurons and periglomerular interneurons. Whereas these rules have been learned
mostly from studies in rodents, the human olfactory system may be organized slightly dif
ferently; rather than the expected roughly 750 glomeruli (about double the number of ex
pressed receptor types), postmortem studies revealed many thousands of glomeruli in the
human olfactory bulb (Maresh, Rodriguez Gil, Whitman, & Greer, 2008).
The structural and functional properties of epithelium and bulb are relatively straightfor
ward: The epithelium is the site of transduction, where odorants become neural signals.
The bulb is the site of discrimination, where different odors form different spatiotemporal
patterns of neural activity. By contrast, the structure and function of primary olfactory
cortex remain unclear. In other words, there is no clear agreement as to what constitutes
primary olfactory cortex, let alone what it does.
By current definition, primary olfactory cortex consists of all brain regions that receive di
rect input from the mitral and tufted cell axons of the olfactory bulb (Allison, 1954;
Carmichael, Clugnet, & Price, 1994; de Olmos, Hardy, & Heimer, 1978; Haberly, 2001; J.
L. Price, 1973, 1987; J. L. Price, 1990; Shipley, 1995). These comprise most of the paleo
cortex, including (by order along the olfactory tract) the anterior olfactory cortex (also re
ferred to as the anterior olfactory nucleus) (Brunjes, Illig, & Meyer, 2005), ventral tenia
tecta, anterior hippocampal continuation and indusium griseum, olfactory tubercle, piri
form cortex, anterior cortical nucleus of the amygdala, periamygdaloid cortex, and rostral
entorhinal cortex (Carmichael et al., 1994) (Figure 6.8).
As can be appreciated by both the sheer area and diversity of cortical real estate that is
considered primary olfactory cortex, this definition is far from functional. One cannot as
sign a single function to “primary olfactory cortex” when primary olfactory cortex is a la
bel legitimately applied to a large proportion of the mammalian brain. The term primary
typically connotes basic functional roles such as early feature extraction, yet as can be ex
pected, a region comprising in part piriform cortex, amygdala, and entorhinal cortex is in
volved in far more complex sensory processing than mere early feature extraction. With
this in mind, several authors have simply shifted the definition by referring to the classic
primary olfactory structures as secondary olfactory structures, noting that the definition
of mammalian primary olfactory cortex may better fit the olfactory bulb than piriform cor
tex (Cleland & Sullivan, 2003; Haberly, 2001).
At the same time, there has been a growing tendency to use the term primary olfactory
cortex for piriform cortex alone. Piriform cortex, the largest (p. 94) component of primary
olfactory cortex in mammals, lies along the olfactory tract at the junction of temporal and
frontal lobes and continues onto the dorsomedial aspect of the temporal lobe (see Figure
6.8A and B). Consistent with the latter approach, here we restrict our review of olfactory
cortex to the piriform portion of primary olfactory cortex alone.
Piriform Cortex: A Neural Substrate for Olfactory Object Formation
Piriform cortex is three-layered paleocortex that has been described in detail (Martinez,
Blanco, Bullon, & Agudo, 1987). In brief, layer I is subdivided into layer Ia, where afferent
fibers from the olfactory bulb terminate, and layer Ib, where (p. 95) association fibers ter
minate (see Figure 6.8C). Layer II is a compact zone of neuronal cell bodies. Layer III
contains neuronal cell bodies at a lower density than layer II and a large number of den
dritic and axonal elements. Piriform input is widely distributed, and part of piriform out
put feeds back into piriform as further distributed input. Moreover, piriform cortex is rec
iprocally and extensively connected with several high-order areas of the cerebral cortex,
including the prefrontal, amygdaloid, perirhinal, and entorhinal cortices (Martinez et al.,
1987).
The current understanding of piriform cortex function largely originated from the work of
Lew Haberly and colleagues (Haberly, 2001; Haberly & Bower, 1989; Illig & Haberly,
2003). These authors hypothesized that the structural organization of piriform cortex ren
ders it highly suitable to function as a content-addressable memory system, where frag
mented input can be used to “neurally reenact” a stored representation. Haberly and col
leagues identified or predicted several aspects of piriform organization that render it an
ideal substrate for such a system. These predictions have been remarkably borne out in
later studies of structure and function. Haberly and colleagues noted that first, associa
tive networks depend on spatially distributed input systems. Several lines of evidence
have indeed suggested that the projection from bulb to piriform is in fact spatially distrib
uted. In other words, in contrast to the spatial clustering of responses at the olfactory
bulb, this ordering is apparently obliterated in the projection to piriform cortex (Stettler
& Axel, 2009). Second, the discriminative power of associative networks relies on positive
feedback via interconnections between the processing units that receive the distributed
input. Indeed, in piriform cortex, each pyramidal cell makes a small number of synaptic
contacts on a large number (>1,000) of other cells in piriform cortex at disparate loca
tions. Axons from individual pyramidal cells also arborize extensively within many neigh
boring cortical areas, most of which send strong projections back to piriform cortex (D.
M. G. Johnson, Illig, Behan, & Haberly, 2000). Third, in associative memory models, indi
vidual inputs are typically weak relative to output threshold, a situation that indeed likely
(p. 96) occurs in piriform (Barkai & Hasselmo, 1994). Finally, content-addressable memory models predict pattern completion: given a degraded version of a stored input, the network should respond as if the intact input were present. This prediction has been tested in rats by recording neural ensembles during presentation of a 10-component odorant mixture (10C) together with morphed versions in which one component was removed (10C-1) or replaced (10CR1). The overlapping input patterns evoked by 10C and 10C-1 were decorrelated by olfactory bulb mitral cell ensembles. In other words, olfactory bulb mitral cell ensembles
readily separated these overlapping patterns. In contrast, piriform cortex ensembles
showed no significant decorrelation across 10C and 10C-1 mixtures. In other words, the
piriform ensemble filled in the missing component and responded as if the full 10C mix
ture were present—consistent with pattern completion. In contrast, a shift from 10C to
10CR1 produced significant cortical ensemble pattern separation. In other words, the en
semble results were consistent with behavior whereby introduction of a novel component
into a complex mixture was relatively easy to detect, whereas removal of a single compo
nent was difficult to detect.
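Pattern completion of the kind described here can be illustrated with a minimal Hopfield-style autoassociative network. This is a toy construction of ours, not a model of piriform circuitry: a stored binary "mixture" pattern (10 active units, standing in for a 10-component mixture) is recovered from a cue missing one component.

```python
import numpy as np

# Toy stand-in: a stored binary "odor" pattern with 10 active (+1) and
# 6 inactive (-1) units, loosely mimicking a 10-component mixture (10C).
stored = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1])

# Hebbian outer-product weights, with self-connections removed.
W = np.outer(stored, stored).astype(float)
np.fill_diagonal(W, 0.0)

# Degraded cue: one active component switched off (like the 10C-1 mixture).
cue = stored.copy()
cue[0] = -1

# One synchronous recall step: each unit takes the sign of its input field.
recalled = np.sign(W @ cue)
recalled[recalled == 0] = 1  # break ties toward +1 (does not occur here)

# The network "fills in" the missing component, recovering the full pattern.
complete = bool(np.array_equal(recalled, stored))
```

One update step suffices in this toy case because the distributed Hebbian weights pull the degraded cue back to the nearest stored pattern, which is the behavior attributed to piriform ensembles above.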
Consistent with the above, Jay Gottfried and colleagues have used functional magnetic
resonance imaging (fMRI) to investigate piriform activity in humans (Gottfried & Wu,
2009). In an initial study, they uncovered a heterogeneous response profile whereby odorant
physicochemical properties were evident in activity patterns measured in anterior piri
form cortex, and odorant perceptual properties were associated with activity patterns
measured in posterior piriform (Gottfried, Winston, & Dolan, 2006). In that posterior piri
form is richer than anterior piriform in the extent of associational connectivity, this find
ing is consistent with the previously described findings in rodents. Moreover, using multi
variate fMRI analysis techniques, they found that odorants with similar perceived quality
induced similar patterns of ensemble activity in posterior piriform cortex alone (Howard,
Plailly, Grueschow, Haynes, & Gottfried, 2009). Taken together, these results from both
rodents and humans depict piriform cortex as a critical component allowing the olfactory
system to deal with an ever-changing olfactory environment, while still allowing stable ol
factory object formation and constancy.
As reviewed above, the basic functional architecture of mammalian olfaction is well un
derstood. In the olfactory epithelium, there is a thorough understanding of receptor
events that culminate in transduction of odorants into neural signals. In the olfactory
bulb, there is a comprehensive view of how such neural signals form spatiotemporal pat
terns that allow odor discrimination. Finally, in piriform cortex, there is an emerging view
of how sparse neural representation enables formation of stable olfactory objects. Howev
er, despite this good understanding of olfaction at the genetic, molecular, and cellular lev
els, we have only poor understanding of structure–function relations in this system
(Mainen, 2006). Put simply, there is not a scientist or perfumer in the world who can look
at a novel molecule and predict its odor, or smell a novel smell and predict its structure.
One reason for this state of affairs is that the olfactory stimulus, namely, a chemical, has
typically been viewed as it would be by chemists. For example, carbon chain length has
been the most consistently studied odorant property, yet there is no clear importance for
carbon chain length in mammalian olfactory behavior (Boesveldt, Olsson, & Lundstrom,
2010). Indeed, as elegantly stated by the late Larry Katz at a lecture he gave at the Asso
ciation for Chemoreception Sciences: “The olfactory system did not evolve to decode the
catalogue of Sigma-Aldrich, it evolved to decode the world around us.” In other words,
perhaps if we reexamine the olfactory stimulus space from a perceptual rather than a
chemical perspective, we may gain important insight into the function of the olfactory
system. It is with this notion in mind that we have recently generated an olfactory percep
tual metric, and tested its application to perception and neural activity in the olfactory
system.
In an effort led by Rehan Khan (Khan et al., 2007), we constructed a perceptual “odor
space” using data from the Dravnieks’ Atlas of Odor Character Profiles, wherein about
150 experts (perfumers and olfactory scientists) ranked (from 0 to 5, reflecting “absent”
to “extremely” representative) 160 odorants (144 monomolecular species and 16 mix
tures) against each of the 146 verbal descriptors (Dravnieks, 1982, 1985). We applied
principal components analysis (PCA), a well-established method for dimension reduction
that generates a new set of dimensions (principal components, or PCs) for the profile
space in which (1) each successive dimension has the maximal possible variance and (2)
all dimensions are uncorrelated. We found that the effective dimensionality of the odor
profile space was much smaller than 146, with the first four PCs accounting for 54 per
cent of the variance (Figure 6.9A). To generate a perceptual odor space, we projected the
odorants onto a subspace formed by these first four PCs (Figure 6.9B; a navigable version of this space is available at the odor space link at http://www.weizmann.ac.il/neurobiology/worg). In a series of experiments, we found that this space formed a valid representa
tion of odorant perception: Simple Euclidian distances in the space predicted both explic
it (Figure 6.9C) and implicit (Figure 6.9D) odor similarity. In other words, odorants close
in the space smell similar, and odorants far-separated in the space smell dissimilar (Khan
et al., 2007). Moreover, we found that the primary dimension in the space (PC1) was tightly linked to odorant pleasantness, that is, a continuum ranging from very unpleasant at one end to very pleasant at the other (Haddad et al., 2010; Figure 6.10).
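The dimension-reduction step described above can be sketched in a few lines. This is a minimal illustration with a random stand-in for the ratings matrix (the real analysis used the Dravnieks profiles, where the first four PCs captured 54 percent of variance), and the `odor_distance` helper is our own name, not part of the published pipeline.

```python
import numpy as np

# Illustrative stand-in: 160 "odorants" rated on 146 "descriptors" (0-5).
# The actual analysis used the Dravnieks atlas odor character profiles.
rng = np.random.default_rng(0)
ratings = rng.uniform(0, 5, size=(160, 146))

# PCA via singular value decomposition of the mean-centered profile matrix.
centered = ratings - ratings.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
explained = s**2 / np.sum(s**2)  # fraction of variance per principal component

# Project every odorant onto the first four PCs -> a 4-D "odor space".
odor_space = centered @ Vt[:4].T  # shape (160, 4)

def odor_distance(i, j):
    """Euclidean distance between two odorants in the 4-PC space,
    the similarity proxy used to predict perceived odor similarity."""
    return float(np.linalg.norm(odor_space[i] - odor_space[j]))
```

With real profile data, nearby points in `odor_space` would correspond to odorants that smell similar, and `explained[:4]` would sum to roughly 0.54.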
Finding that pleasantness was the primary dimension of human olfactory perception was
consistent with many previous efforts. Odorant pleasantness was the primary aspect of
odor spontaneously used by subjects in olfactory discrimination tasks (S. S. Schiffman,
1974), and odorant pleasantness was the primary criterion spontaneously used by sub
jects in order to combine odorants into groups (Berglund, Berglund, Engen, & Ekman,
1973; S. Schiffman, Robinson, & Erickson, 1977). When using large numbers of verbal de
scriptors in order to describe odorants, pleasantness repeatedly emerged as the primary
dimension in multidimensional analyses of the resultant descriptor space (Khan et al.,
2007; Moskowitz & Barbe, 1977). Studies with newborns suggested that at least some as
pects of olfactory pleasantness are innate (Soussignan, Schaal, Marlier, & Jiang, 1997;
Steiner, 1979). For example, neonate’s behavioral markers of disgust (nose wrinkling, up
per lip raising) discriminated between vanillin judged as being pleasant and butyric acid
judged to be unpleasant by adult raters (Soussignan et al., 1997). Moreover, there is
agreement in the assessments of pleasantness by adults and children for various pure
odorants (Schmidt & Beauchamp, 1988) and personal odors (Mallet & Schaal, 1998).
After using PCA to reduce the apparent dimensionality of olfactory perception, we set out
to independently apply the same approach to odorant structure. We used structural chem
istry software to obtain 1,514 physicochemical descriptors for each of 1,565 odorants.
These descriptors were of many types (p. 98) (p. 99) (e.g., atom counts, functional group
counts, counts of types of bonds, molecular weights, topological descriptors). We applied
PCA to these data and found that much of the variance could be explained by a relatively
small number of PCs. The first PC accounted for about 32 percent of the variance, and
the first ten accounted for about 70 percent of the variance.
Because we separately generated PC spaces for perception and structure, we could then
ask whether these two spaces were related in any way. In other words, we tested for a
correlation between perceptual PCs and physicochemical PCs. Strikingly, the strongest
correlation was between the first perceptual PC and the first physicochemical PC (Figure
6.11A). In other words, there was a privileged relationship between PC1 of perception
and PC1 of physicochemical organization. The single best axis for explaining the variance
in the physicochemical data was the best predictor of the single best axis for explaining
the variance in the perceptual data. Having established that the physicochemical space is
related to the perceptual space, we next built a linear predictive model through a cross-
validation procedure that allowed us to predict odor perception from odorant structure
(Figure 6.11B). To test the predictive power of our model, we obtained physicochemical
parameters for 52 odorants commonly used in olfaction experiments, but not present in
the set of 144 used in the model building. We applied our model to the fifty-two new mole
cules so that for each we had predicted values for the first PC of perceptual space. We
found that using these PC values, we could convincingly predict the rank-order of pleasantness of these molecules (Spearman rank correlation, r = 0.72; p = 0.0004), and modestly yet significantly predict their actual pleasantness ratings (r = 0.55; p = 0.004).
Moreover, we obtained similar predictive power across three different cultures: urban
Americans in California, rural Muslim Arab Israelis, and urban Jewish Israelis (Figure
6.12).
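The fit-then-validate logic of this prediction can be sketched as follows, under loud simplifying assumptions: the data are synthetic stand-ins, a single structural predictor replaces the many physicochemical descriptors of the published model, and the `spearman` helper is our own minimal implementation (valid only for tie-free data).

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins: structural PC1 for 144 training molecules and 52
# held-out molecules, with perceptual PC1 linearly related plus noise.
x_train = rng.normal(size=144)
y_train = 0.8 * x_train + rng.normal(scale=0.5, size=144)
x_test = rng.normal(size=52)
y_test = 0.8 * x_test + rng.normal(scale=0.5, size=52)

# Fit a linear model (slope + intercept) on training data by least squares.
A = np.column_stack([x_train, np.ones_like(x_train)])
coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)

# Predict perceptual PC1 for the held-out molecules from structure alone.
y_pred = coef[0] * x_test + coef[1]

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks
    (argsort of argsort gives ranks; assumes no ties)."""
    ra, rb = np.argsort(np.argsort(a)), np.argsort(np.argsort(b))
    return float(np.corrcoef(ra, rb)[0, 1])

rho = spearman(y_pred, y_test)  # rank-order agreement on held-out odorants
```

The key design point mirrored here is that the 52 test molecules never enter the fit, so `rho` measures genuine out-of-sample prediction of rank-ordered pleasantness rather than a refit to the same data.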
An aspect of these results that has been viewed as challenging by many is that they imply
that pleasantness is written into the molecular structure of odorants and is therefore by
definition innate. This can be viewed as inconsistent with the high levels of cross-individ
ual and cross-cultural variability in odor perception (Ayabe-Kanamura et al., 1998; Wysoc
ki, Pierce, & Gilbert, 1991). We indeed think that odor pleasantness is hard-wired and in
nate. Consistent with this, many odors have clear hedonic value despite no previous expe
rience or exposure (Soussignan et al., 1997; Steiner, 1979), and moreover, the metric that
links this hedonic value with odorant structure (PC1 of structure) predicts (p. 100) re
sponses across species (Mandairon, Poncelet, Bensafi, & Didier, 2009). Nevertheless, we
stress that an innate hard-wired link remains highly susceptible to the influences of learn
ing, experience, and context. For example, no one would argue that perceived color is in
nately and hard-wire-linked to wavelength. However, a given wavelength can be per
ceived to have very different colors as a function of context (see striking online demon
strations at http://www.purveslab.net/seeforyourself/). Moreover, no one would argue that
location in space is reflected in location on the retina in an innate and hard-wired fashion.
Nevertheless, context can alter spatial perception, as clearly evident in the Müller-Lyer illusion. Similarly, we argue that odor pleasantness is hard-wire-linked to (PC1 of) odorant
structure, yet this link is clearly modified by learning, experience, and context.
Finally, we conducted an initial investigation into the second principal component of neural activity as well. In that PC1 of neural activity reflected approach or withdrawal in
animals, we speculated that once approached, a second decision to be made regarding an
odor is whether it is edible or poisonous. Consistent with this prediction, we found signifi
cant correlations between PC2 of neural activity and odorant toxicity in mice and in rats
(Figure 6.13C), as well as a significant correlation between toxicity/edibility and PC2 of
perception in humans (Figure 6.13D). Similar findings have been obtained by others inde
pendently (Zarzo, 2008).
To conclude this section, we found that if one uses the human nose as a window onto ol
faction, one obtains a surprisingly simplified picture that explains a significant portion of
the variance in both neural activity and perception in this system. This picture relied on a
set of simple linear transforms. It suggested that the primary axis of perception was
linked to the primary axis of odorant structure, and that both of these were in turn relat
ed to the primary axis of neural activity in this system. Moreover, the second axis of per
ception was linked to the second axis of neural activity. Critically, these transforms al
lowed for modest but significant predictions of perception, structure, and neural activity
across species.
In the previous section, the human nose taught us about the mammalian olfactory system.
This was (p. 102) possible because, in contrast to popular notions, the human nose is an
astonishingly acute device. This is evident in unusually keen powers of detection and discrimination, which in some cases compete with those of macrosmatic mammals, or with
those of sophisticated analytical equipment. These abilities have been detailed within re
cent reviews (Sela & Sobel, 2010; Shepherd, 2004, 2005; Stevenson, 2010; Yeshurun &
Sobel, 2010; Zelano & Sobel, 2005). Here, we highlight key cases in which these keen ol
factory abilities clearly influence human behavior.
As noted in the introduction, two aspects of human behavior that are, in our view, macros
matic, are eating and mating: Notably, both are critical for survival. A third human behavior for which human olfactory influences are less clearly apparent, yet in our view are
nevertheless critical, is social interaction. It is beyond the scope of this chapter to provide
a comprehensive review on human chemosignaling, as recently done elsewhere (Steven
son, 2010). Here, we selectively choose examples to highlight the role of olfaction in hu
man behavior.
Eating
We eat what tastes good (Drewnowski, 1997). Taste, or more accurately flavor, is domi
nated by smell (Small, Jones-Gotman, Zatorre, Petrides, & Evans, 1997). Hence, things
taste good because they smell good (Letarte, 1997). In other words, by determining the
palatability and hedonic value of food, olfaction influences the balance of food intake
(Rolls, 2006; Saper, Chou, & Elmquist, 2002; Yeomans, 2006). In addition to this basic premise, several direct and indirect lines of evidence high
light the significance of olfaction in eating behavior. For example, olfaction drives saliva
tion even at subthreshold odor concentrations (Pangborn & Berggren, 1973; Rogers &
Hill, 1989). Odors regulate appetite (Rogers & Hill, 1989) and affect the cephalic phase of
insulin secretion (W. G. Johnson & Wildman, 1983; Louis-Sylvestre & Le Magnen, 1980)
and gastric acid secretion (Feldman & Richardson, 1986).
The interaction between olfaction and eating is bidirectional. Olfaction influences eating,
and eating behavior and mechanisms influence olfaction. The nature of this influence,
however, remains controversial. For example, whereas some studies suggest that hunger
increases olfactory sensitivity to food odors (Guild, 1956; Hammer, 1951; Schneider &
Wolf, 1955; Stafford & Welbeck, 2010), others failed to replicate these results (Janowitz &
Grossman, 1949; Zilstorff-Pedersen, 1955), or even found the opposite—higher sensitivity
in satiety (Albrecht et al., 2009). Hunger and satiety influence not only sensitivity but also
hedonics: Odors of foods consumed to satiety become less pleasant (Albrecht et al., 2009;
Rolls & Rolls, 1997). This satiety-driven shift in hedonic representation is accompanied by
altered brain representation. This was uncovered in an elegant human brain–imaging
study in which eating bananas to satiety changed the representation of banana odor in
the orbitofrontal cortex (O’Doherty et al., 2000). Also, an odor encoded during inactiva
tion of taste-cortex in rats was later remembered as the same only during similar taste-
cortex inactivation (Fortis-Santiago, Rodwin, Neseliler, Piette, & Katz, 2009). The mecha
nism for these shifted representations may be evident at the earliest stages of olfactory
processing: Perfusion of the eating-related hormones insulin and leptin onto olfactory re
ceptor neurons in rats significantly increased spontaneous firing frequency in the ab
sence of odors and decreased odorant-induced peak amplitude in response to food odors
(Ketterer et al., 2010; Savigner et al., 2009). Therefore, by increasing spontaneous activi
ty but reducing odorant-induced activity of olfactory receptor neurons, elevated levels of
insulin and leptin (such as after a meal) may result in decreased global signal-to-noise ra
tio in the olfactory epithelium (Ketterer et al., 2010; Savigner et al., 2009).
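The signal-to-noise logic of this last finding can be made concrete with a toy calculation. The firing rates below are hypothetical, chosen only to illustrate the direction of the effect, not measured values from the cited studies:

```python
# Hypothetical firing rates (Hz) for an olfactory receptor neuron, before and
# after insulin/leptin perfusion as described in the text: spontaneous activity
# rises while the odor-evoked peak falls, so the evoked/spontaneous ratio drops.
before = {"spontaneous_hz": 2.0, "evoked_peak_hz": 20.0}
after = {"spontaneous_hz": 4.0, "evoked_peak_hz": 14.0}

def snr(state):
    # A crude signal-to-noise index: odor-evoked peak relative to baseline firing.
    return state["evoked_peak_hz"] / state["spontaneous_hz"]

print(snr(before), snr(after))  # 10.0 3.5
```

Any increase in the denominator combined with a decrease in the numerator necessarily lowers this ratio, which is the sense in which elevated insulin and leptin would blunt the odor signal at the epithelium.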
The importance of olfaction for human eating behavior is clearly evidenced in cases of ol
factory loss. Anosmic patients experience distortions in flavor perception (Bonfils, Avan,
Faulcon, & Malinvaud, 2005) and changes in eating behavior (Aschenbrenner et al.,
2008). Simulating anosmia in healthy subjects by intranasal lidocaine administration re
sulted in reduced hunger ratings (Greenway et al., 2007). Nevertheless, the rate of abnormal body mass index among anosmic individuals is no higher than in the general population (Aschenbrenner et al., 2008).
Several eating disorders, ranging from obesity (Hoover, 2010; Obrebowski, Obrebowska-
Karsznia, & Gawlinski, 2000; Richardson, Vander Woude, Sudan, Thompson, & Leopold,
2004; Snyder, Duffy, Chapo, Cobbett, & Bartoshuk, 2003) to anorexia (Fedoroff, Stoner,
Andersen, Doty, & Rolls, 1995; Roessner, Bleich, Banaschewski, & Rothenberger, 2005),
have been associated with alterations in olfactory perception, and the nutritional chal
lenge associated with aging has been clearly linked to the age-related loss of olfaction
(Cain & Gent, 1991; Doty, 1989; (p. 103) S. S. Schiffman, 1997). Accordingly, artificially in
creasing the odorous properties of foods helps overcome the nutritional challenge in ag
ing (Mathey, Siebelink, de Graaf, & Van Staveren, 2001; S. S. Schiffman & Warwick, 1988;
Stevens & Lawless, 1981).
Consistent with the bidirectional influences of olfaction and eating behavior, edibility is
clearly a key category in odor perception. It was identified as the second principal axis of
perception independently by us (Haddad et al., 2010) and others (Zarzo, 2008). Consis
tent with edibility as an olfactory category, olfactory responses are stronger (Small et al.,
2005) and faster (Boesveldt, Frasnelli, Gordon, & Lundstrom, 2010), and identification is more
accurate (Fusari & Ballesteros, 2008), for food over nonfood odors. Moreover, whereas
humans are poor at spontaneous odor naming, they are very good at spontaneous rating
of odor edibility, even in childhood (de Wijk & Cain, 1994a, 1994b). Indeed, olfactory pref
erences of neonates are influenced by their mother’s food preferences during pregnancy
(Schaal, Marlier, & Soussignan, 2000), suggesting that the powerful link between olfacto
ry preferences and eating behavior is formed at the earliest stages of development.
Mating
When explaining the choice of a sexual partner, some may list physical and personality qualities, whereas others may simply attribute the choice to a “simple click” or “chemistry.” Is
this “click” indeed chemical? Although, as noted, humans tend to underestimate their
own olfactory abilities, humans can nevertheless use olfaction to discriminate the genetic
makeup of potential mating partners. The human genome includes a region called human
leukocyte antigen (HLA), which consists of many genes related to the immune system, in
addition to olfactory receptor genes and pseudogenes. Several studies have found that
women can use smell to discriminate between men as a function of similarity between
their own and the men’s HLA alleles (Eggert, Muller-Ruchholtz, & Ferstl, 1998; Jacob,
McClintock, Zelano, & Ober, 2002; Ober et al., 1997; Wedekind & Furi, 1997; Wedekind,
Seebeck, Bettens, & Paepke, 1995). Which genetic makeup smells “ideal” remains controversial, yet most evidence suggests that women prefer the odor of a man with HLA alleles
not identical to their own, but at the same time not too different (Jacob et al., 2002; T.
Roberts & Roiser, 2010). In turn, this preference may be for major histocompatibility
complex (MHC) heterozygosity rather than dissimilarity (Thornhill et al., 2003). Olfactory
mate preference, however, is plastic. For example, single women preferred odors of MHC-
similar men, whereas women in relationships preferred odors of MHC-dissimilar men (S.
C. Roberts & Little, 2008). Moreover, olfactory mate preferences are influenced by the
menstrual cycle (Gangestad & Cousins, 2001; Havlicek, Roberts, & Flegr, 2005; Little,
Jones, & Burriss, 2007; Singh & Bronstad, 2001) (Figure 6.14A) and by hormone-based
contraceptives (S. C. Roberts, Gosling, Carter, & Petrie, 2008; Wedekind et al., 1995;
Wedekind & Furi, 1997).
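The inverted-U preference described above (dissimilar, but not too dissimilar) can be caricatured in a few lines. Everything in this sketch is hypothetical: the allele sets, the overlap score, and the preference curve are illustrations of the shape of the finding, not the scoring used in the cited studies:

```python
# Toy model of an inverted-U preference over HLA dissimilarity.
def shared_alleles(a, b):
    """Count HLA alleles present in both genotypes (sets of allele names)."""
    return len(set(a) & set(b))

def toy_preference(n_shared, n_total):
    """Hypothetical inverted-U: peaks at intermediate dissimilarity,
    zero for fully identical or fully disjoint genotypes."""
    dissimilarity = 1 - n_shared / n_total
    return 4 * dissimilarity * (1 - dissimilarity)  # maximum at 0.5

woman = {"A*01", "A*02", "B*07", "B*08"}
donors = {
    "identical":    {"A*01", "A*02", "B*07", "B*08"},
    "intermediate": {"A*01", "A*02", "B*44", "B*57"},
    "disjoint":     {"A*03", "A*11", "B*44", "B*57"},
}
scores = {name: toy_preference(shared_alleles(woman, g), 4)
          for name, g in donors.items()}
best = max(scores, key=scores.get)
print(best)  # intermediate
```

The quadratic form is an arbitrary choice; any single-peaked function of dissimilarity would capture the same qualitative pattern.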
Finally, although not directly related to mate selection, the best-established case of chemical communication in humans also has clear potential implications for mating behavior. This is
the phenomenon of menstrual synchrony, whereby women who live in close proximity,
such as roommates in dorms, synchronize their menstrual cycle over time (McClintock,
1971). This effect is mediated by an odor in sweat. This was verified in a series of studies
in which experimenters obtained underarm sweat extracts from donor women during ei
ther the ovulatory or follicular menstrual phase. These extracts were then deposited on
the upper lips of recipient women, where follicular sweat accelerated ovulation, and ovu
latory sweat delayed it (Russell, Switz, & Thompson, 1980; Stern & McClintock, 1998)
(Figure 6.14B). Moreover, variation in menstrual timing can be increased by the odor of
other lactating women (Jacob et al., 2004) or regulated by the odor of male hormones
(Cutler et al., 1986; Wysocki & Preti, 2004).
Olfactory influences on mate preferences are not restricted to women. Men can detect an HLA-associated odor different from their own, whether taken from male or female donors, and rate the HLA-similar odor as more pleasant for both sexes (Thornhill et al.,
2003; Wedekind & Furi, 1997). In addition, men preferred the scent of common over rare
MHC alleles (Thornhill et al., 2003). Moreover, unrelated to HLA similarity, male raters
can detect the menstrual phase of female body odor donors. The follicular phase is rated
as more pleasant and sexy than the luteal phase (Singh & Bronstad, 2001), an effect that
is diminished when the women use hormonal contraceptives (Kuukasjarvi et al., 2004;
Thornhill et al., 2003).
These behavioral results are echoed in hormone expression. Men exposed to the scent of
an ovulating woman subsequently displayed higher levels of testosterone than did men
exposed to the scent of a (p. 104) nonovulating woman or a control scent (Miller & Maner,
2010) (Figure 6.14C). Moreover, a recent study on chemosignals in human tears revealed
a host of influences on sexual arousal (Gelstein et al., 2011). Sniffing odorless, negative-emotion-related tears obtained from women donors reduced the sexual appeal that men attributed to pictures of women’s faces. Sniffing tears also reduced self-rated sexu
al arousal, reduced physiological measures of arousal, and reduced levels of testosterone.
Finally, fMRI revealed that sniffing women’s tears selectively reduced activity in brain
substrates of sexual arousal in men (Gelstein et al., 2011) (Figure 6.15).
Social Interaction
Whereas olfactory influences on human eating and mating are intuitively obvious, olfacto
ry cues may play into aspects of human social interaction that have been less commonly
associated with smell. Many such types of social chemosignaling have been examined
(Meredith, 2001), but here we will detail only one particular case that has received more
attention than others, and that is the ability of humans to smell fear. Fear or distress
chemosignals are prevalent throughout animal species (Hauser et al., 2008; Pageat &
Gaultier, 2003). In an initial study in humans, Chen and Haviland-Jones (2000) collected
underarm odors on gauze pads from young women and men after they watched funny or
frightening movies. They later asked other women and men to determine by smell which
was the odor of people when they were “happy” or “afraid.” Women correctly identified
happiness in men and women, and fear in men. Men correctly identified happiness in
women and fear in men. A (p. 105) similar result was later obtained in a study that exam
ined women only (Ackerl, Atzmueller, & Grammer, 2002). Moreover, women had improved
performance in a cognitive verbal task after smelling fear sweat versus neutral sweat
(Chen, Katdare, & Lucas, 2006), and the smell of fearful sweat biased women toward in
terpreting ambiguous expressions as more fearful, but had no effect when the facial emo
tion was more discernible (Zhou & Chen, 2009). Moreover, subjects had an increased
startle reflex when exposed to anxiety-related sweat versus sports-related sweat (Prehn,
Ohrt, Sojka, Ferstl, & Pause, 2006). Finally, imaging studies have revealed dissociable
brain representations after smelling anxiety sweat versus sports-related sweat (Prehn-
Kristensen et al., 2009). These differences are particularly pronounced in the amygdala, a
brain substrate common to olfaction, fear responses, and emotional regulation of behav
ior (Mujica-Parodi et al., 2009). Taken together, this body of research strongly suggests
that humans can discriminate the scent of fear from other body odors, and it is not unlike
ly that this influences behavior. We think that smelling fear or distress is by no means one
of the key roles of human chemical communication, yet we have chosen to detail this par
ticular example of human social chemosignaling because it has received increased experi
mental attention. We think that chemosignaling in fact plays into many aspects of human
social interaction, and uncovering these instances of chemosignaling is a major goal for
research in our field.
Final Word
We have described the functional neuroanatomy of the mammalian sense of smell. This
system is highly conserved (Ache & Young, 2005), and therefore the human sense of smell
is not very different from that of other mammals. With this in mind, just as a deep under
standing of human visual psychophysics provided the basis for probing vision neurobiolo
gy, we propose that a solid understanding of human olfactory psychophysics is a
prerequisite to understanding the neurobiological mechanisms of the sense of smell. More
over, olfaction significantly influences critical human behaviors directly related to sur
vival, such as eating, mating, and social interaction. Better understanding of these olfac
tory influences is key, in our view, to a comprehensive picture of human behavior.
References
Ache, B. W., & Young, J. M. (2005). Olfaction: Diverse species, conserved principles. Neu
ron, 48 (3), 417–430.
Ackerl, K., Atzmueller, M., & Grammer, K. (2002). The scent of fear. Neuroendocrinology
Letters, 23 (2), 79–84.
Albrecht, J., Schreder, T., Kleemann, A. M., Schopf, V., Kopietz, R., Anzinger, A., et al.
(2009). Olfactory detection thresholds and pleasantness of a food-related and a non-food
odour in hunger and satiety. Rhinology, 47 (2), 160–165.
Allison, A. (1954). The secondary olfactory areas in the human brain. Journal of Anatomy,
88, 481–488.
Aschenbrenner, K., Hummel, C., Teszmer, K., Krone, F., Ishimaru, T., Seo, H. S., et al.
(2008). The influence of olfactory loss on dietary behaviors. Laryngoscope, 118 (1), 135–
144.
Ayabe-Kanamura, S., Schicker, I., Laska, M., Hudson, R., Distel, H., Kobayakawa, T., et al.
(1998). Differences in perception of everyday odors: A Japanese-German cross-cultural
study. Chemical Senses, 23 (1), 31–38.
Barkai, E., & Hasselmo, M. E. (1994). Modulation of the input/output function of rat piri
form cortex pyramidal cells. Journal of Neurophysiology, 72 (2), 644.
Barnes, D. C., Hofacer, R. D., Zaman, A. R., Rennaker, R. L., & Wilson, D. A. (2008). Olfac
tory perceptual stability and discrimination. Nature Neuroscience, 11 (12), 1378–1380.
Bathellier, B., Buhl, D. L., Accolla, R., & Carleton, A. (2008). Dynamic ensemble odor cod
ing in the mammalian olfactory bulb: Sensory information at different timescales. Neuron,
57 (4), 586–598.
Bender, G., Hummel, T., Negoias, S., & Small, D. M. (2009). Separate signals for or
thonasal vs. retronasal perception of food but not nonfood odors. Behavioral Neuro
science, 123 (3), 481–489.
Bensafi, M., Porter, J., Pouliot, S., Mainland, J., Johnson, B., Zelano, C., et al. (2003). Olfac
tomotor activity during imagery mimics that during perception. Nature Neuroscience, 6
(11), 1142–1144.
Berglund, B., Berglund, U., Engen, T., & Ekman, G. (1973). Multidimensional analysis of
21 odors. Scandinavian Journal of Psychology, 14 (2), 131–137.
Boesveldt, S., Frasnelli, J., Gordon, A. R., & Lundstrom, J. N. (2010). The fish is bad: Nega
tive food odors elicit faster and more accurate reactions than other odors. Biological Psy
chology, 84 (2), 313–317.
Boesveldt, S., Olsson, M. J., & Lundstrom, J. N. (2010). Carbon chain length and the stimu
lus problem in olfaction. Behavioral Brain Research, 215 (1), 110–113.
Bonfils, P., Avan, P., Faulcon, P., & Malinvaud, D. (2005). Distorted odorant perception:
Analysis of a series of 56 patients with parosmia. Archives of Otolaryngology—Head and
Neck Surgery, 131 (2), 107–112.
Breer, H., Fleischer, J., & Strotmann, J. (2006). The sense of smell: Multiple olfactory sub
systems. Cellular and Molecular Life Sciences, 63 (13), 1465–1475.
Brunjes, P. C., Illig, K. R., & Meyer, E. A. (2005). A field guide to the anterior olfactory nucleus (cortex). Brain Research Reviews, 50 (2), 305–335.
Buck, L. B. (2000). The molecular architecture of odor and pheromone sensing in mam
mals. Cell, 100 (6), 611–618.
Buck, L., & Axel, R. (1991). A novel multigene family may encode odorant receptors: a
molecular basis for odor recognition. Cell, 65 (1), 175–187.
Cain, W. S., & Gent, J. F. (1991). Olfactory sensitivity: Reliability, generality, and associa
tion with aging. Journal of Experimental Psychology: Human Perception and Performance,
17 (2), 382–391.
Carmichael, S. T., Clugnet, M. C., & Price, J. L. (1994). Central olfactory connections in
the macaque monkey. Journal of Comparative Neurology, 346 (3), 403–434.
Chen, D., & Haviland-Jones, J. (2000). Human olfactory communication of emotion.
(p. 106) Perceptual and Motor Skills, 91 (3), 771–781.
Chen, D., Katdare, A., & Lucas, N. (2006). Chemosignals of fear enhance cognitive perfor
mance in humans. Chemical Senses, 31 (5), 415.
Cleland, T. A., & Sullivan, R. M. (2003). Central olfactory structures. In R. L. Doty (Ed.),
Handbook of olfaction and gustation (2nd ed., pp. 165–180). New York: Marcel Dekker.
Cohen, Y., Reuveni, I., Barkai, E., & Maroun, M. (2008). Olfactory learning-induced long-
lasting enhancement of descending and ascending synaptic transmission to the piriform
cortex. Journal of Neuroscience, 28 (26), 6664.
Cross, D. J., Flexman, J. A., Anzai, Y., Morrow, T. J., Maravilla, K. R., & Minoshima, S.
(2006). In vivo imaging of functional disruption, recovery and alteration in rat olfactory
circuitry after lesion. NeuroImage, 32 (3), 1265–1272.
Cutler, W. B., Preti, G., Krieger, A., Huggins, G. R., Garcia, C. R., & Lawley, H. J. (1986).
Human axillary secretions influence women’s menstrual cycles: The role of donor extract
from men. Hormones and Behavior, 20 (4), 463–473.
de Olmos, J., Hardy, H., & Heimer, L. (1978). The afferent connections of the main and the
accessory olfactory bulb formations in the rat: an experimental HRP-study. Journal of
Comparative Neurology, 15 (181), 213–244.
de Wijk, R. A., & Cain, W. S. (1994a). Odor identification by name and by edibility: Life-
span development and safety. Human Factors, 36 (1), 182–187.
de Wijk, R. A., & Cain, W. S. (1994b). Odor quality: Discrimination versus free and cued
identification. Perception and Psychophysics, 56 (1), 12–18.
DeMaria, S., & Ngai, J. (2010). The cell biology of smell. Journal of Cell Biology, 191 (3),
443–452.
Doty, R. L. (1989). Influence of age and age-related diseases on olfactory function. Annals
of the New York Academy of Sciences, 561, 76–86.
Drewnowski, A. (1997). Taste preferences and food intake. Annual Review of Nutrition, 17
(1), 237–253.
Eggert, F., Muller-Ruchholtz, W., & Ferstl, R. (1998). Olfactory cues associated with the
major histocompatibility complex. Genetica, 104 (3), 191–197.
Fedoroff, I. C., Stoner, S. A., Andersen, A. E., Doty, R. L., & Rolls, B. J. (1995). Olfactory
dysfunction in anorexia and bulimia nervosa. International Journal of Eating Disorders, 18
(1), 71–77.
Feinstein, P., & Mombaerts, P. (2004). A contextual model for axonal sorting into
glomeruli in the mouse olfactory system. Cell, 117 (6), 817–831.
Feldman, M., & Richardson, C. T. (1986). Role of thought, sight, smell, and taste of food in
the cephalic phase of gastric acid secretion in humans. Gastroenterology, 90 (2), 428–433.
Ferrero, D. M., & Liberles, S. D. (2010). The secret codes of mammalian scents. Wiley In
terdisciplinary Reviews: Systems Biology and Medicine, 2 (1), 23–33.
Firestein, S. (2001). How the olfactory system makes sense of scents. Nature, 413 (6852),
211–218.
Frasnelli, J., Lundstrom, J. N., Boyle, J. A., Katsarkas, A., & Jones-Gotman, M. (2011). The
vomeronasal organ is not involved in the perception of endogenous odors. Human Brain
Mapping, 32 (3), 450–460.
Fortis-Santiago, Y., Rodwin, B. A., Neseliler, S., Piette, C. E., & Katz, D. B. (2009). State
dependence of olfactory perception as a function of taste cortical inactivation. Nature
Neuroscience, 13 (2), 158–159.
Franco, M. I., Turin, L., Mershin, A., & Skoulakis, E. M. (2011). Molecular vibration-sens
ing component in Drosophila melanogaster olfaction. Proceedings of the National Acade
my of Sciences U S A, 108 (9), 3797–3802.
Fusari, A., & Ballesteros, S. (2008). Identification of odors of edible and nonedible stimuli
as affected by age and gender. Behavior Research Methods, 40 (3), 752.
Gangestad, S. W., & Cousins, A. J. (2001). Adaptive design, female mate preferences, and
shifts across the menstrual cycle. Annual Review of Sex Research, 12, 145–185.
Gangestad, S. W., & Thornhill, R. (1998). Menstrual cycle variation in women’s prefer
ences for the scent of symmetrical men. Proceedings of the Royal Society of London. B.
Biological Sciences, 265 (1399), 927–933.
Gelstein, S., Yeshurun, Y., Rozenkrantz, L., Shushan, S., Frumin, I., Roth, Y., et al. (2011).
Human tears contain a chemosignal. Science, 331 (6014), 226–230.
Gilad, Y., & Lancet, D. (2003). Population differences in the human functional olfactory
repertoire. Molecular Biology and Evolution, 20 (3), 307–314.
Goldman, A. L., Van der Goes van Naters, W., Lessing, D., Warr, C. G., & Carlson, J. R.
(2005). Coexpression of two functional odor receptors in one neuron. Neuron, 45 (5), 661–
666.
Gottfried, J. A. (2010). Central mechanisms of odour object perception. Nature Reviews,
Neuroscience, 11 (9), 628–641.
Gottfried, J. A., Winston, J. S., & Dolan, R. J. (2006). Dissociable codes of odor quality and
odorant structure in human piriform cortex. Neuron, 49 (3), 467–479.
Gottfried, J. A., & Wu, K. N. (2009). Perceptual and neural pliability of odor objects. An
nals of the New York Academy of Sciences, 1170, 324–332.
Graziadei, P. P., & Monti Graziadei, A. G. (1983). Regeneration in the olfactory system of
vertebrates. American Journal of Otolaryngology, 4 (4), 228–233.
Greenway, F. L., Martin, C. K., Gupta, A. K., Cruickshank, S., Whitehouse, J., DeYoung, L.,
et al. (2007). Using intranasal lidocaine to reduce food intake. International Journal of
Obesity (London), 31 (5), 858–863.
Grosmaitre, X., Fuss, S. H., Lee, A. C., Adipietro, K. A., Matsunami, H., Mombaerts, P., et
al. (2009). SR1, a mouse odorant receptor with an unusually broad response profile. Jour
nal of Neuroscience, 29 (46), 14545–14552.
Grosmaitre, X., Santarelli, L. C., Tan, J., Luo, M., & Ma, M. (2007). Dual functions of mam
malian olfactory sensory neurons as odor detectors and mechanical sensors. Nature Neu
roscience, 10 (3), 348–354.
Guild, A. A. (1956). Olfactory acuity in normal and obese human subjects: Diurnal varia
tions and the effect of d-amphetamine sulphate. Journal of Laryngology and Otology, 70
(7), 408–414.
Haberly, L. B., & Bower, J. M. (1989). Olfactory cortex: Model circuit for study of associa
tive memory? Trends in Neurosciences, 12 (7), 258–264.
Haddad, R., Khan, R., Takahashi, Y. K., Mori, K., Harel, D., & Sobel, N. (2008). A metric for odorant comparison.
(p. 107) Nature Methods, 5 (5), 425–429.
Haddad, R., Lapid, H., Harel, D., & Sobel, N. (2008). Measuring smells. Current Opinion
in Neurobiology, 18 (4), 438–444.
Haddad, R., Weiss, T., Khan, R., Nadler, B., Mandairon, N., Bensafi, M., et al. (2010). Glob
al features of neural activity in the olfactory system form a parallel code that predicts ol
factory behavior and perception. Journal of Neuroscience, 30 (27), 9017–9026.
Hallem, E. A., & Carlson, J. R. (2006). Coding of odors by a receptor repertoire. Cell, 125
(1), 143–160.
Halpern, M. (1987). The organization and function of the vomeronasal system. Annual Re
view of Neuroscience, 10 (1), 325–362.
Hammer, F. J. (1951). The relation of odor, taste and flicker-fusion thresholds to food in
take. Journal of Comparative Physiology and Psychology, 44 (5), 403–411.
Hauser, R., Marczak, M., Karaszewski, B., Wiergowski, M., Kaliszan, M., Penkowski, M.,
et al. (2008). A preliminary study for identifying olfactory markers of fear in the rat. Labo
ratory Animals (New York), 37 (2), 76–80.
Havlicek, J., Roberts, S. C., & Flegr, J. (2005). Women’s preference for dominant male
odour: Effects of menstrual cycle and relationship status. Biology Letters, 1 (3), 256–259.
Howard, J. D., Plailly, J., Grueschow, M., Haynes, J. D., & Gottfried, J. A. (2009). Odor qual
ity coding and categorization in human posterior piriform cortex. Nature Neuroscience,
12 (7), 932–938.
Illig, K. R., & Haberly, L. B. (2003). Odor-evoked activity is spatially distributed in piri
form cortex. Journal of Comparative Neurology, 457 (4), 361–373.
Jacob, S., McClintock, M. K., Zelano, B., & Ober, C. (2002). Paternally inherited HLA alle
les are associated with women’s choice of male odor. Nature Genetics, 30 (2), 175–179.
Jacob, S., Spencer, N. A., Bullivant, S. B., Sellergren, S. A., Mennella, J. A., & McClintock,
M. K. (2004). Effects of breastfeeding chemosignals on the human menstrual cycle. Hu
man Reproduction, 19 (2), 422–429.
Johnson, B. A., Ong, J., & Leon, M. (2010). Glomerular activity patterns evoked by natural
odor objects in the rat olfactory bulb are related to patterns evoked by major odorant
components. Journal of Comparative Neurology, 518 (9), 1542–1555.
Johnson, B. N., Mainland, J. D., & Sobel, N. (2003). Rapid olfactory processing implicates
subcortical control of an olfactomotor system. Journal of Neurophysiology, 90 (2), 1084–
1094.
Johnson, D. M. G., Illig, K. R., Behan, M., & Haberly, L. B. (2000). New features of connec
tivity in piriform cortex visualized by intracellular injection of pyramidal cells suggest
that “primary” olfactory cortex functions like “association” cortex in other sensory sys
tems. Journal of Neuroscience, 20 (18), 6974.
Johnson, W. G., & Wildman, H. E. (1983). Influence of external and covert food stimuli on
insulin secretion in obese and normal persons. Behavioral Neuroscience, 97 (6), 1025–
1028.
Kadohisa, M., & Wilson, D. A. (2006). Separate encoding of identity and similarity of com
plex familiar odors in piriform cortex. Proceedings of the National Academy of Sciences U
S A, 103 (41), 15206–15211.
Keller, A., Zhuang, H., Chi, Q., Vosshall, L. B., & Matsunami, H. (2007). Genetic variation
in a human odorant receptor alters odour perception. Nature, 449 (7161), 468–472.
Kepecs, A., Uchida, N., & Mainen, Z. F. (2006). The sniff as a unit of olfactory processing.
Chemical Senses, 31 (2), 167.
Kepecs, A., Uchida, N., & Mainen, Z. F. (2007). Rapid and precise control of sniffing dur
ing olfactory discrimination in rats. Journal of Neurophysiology, 98 (1), 205.
Ketterer, C., Heni, M., Thamer, C., Herzberg-Schafer, S. A., Haring, H. U., & Fritsche, A.
(2010). Acute, short-term hyperinsulinemia increases olfactory threshold in healthy sub
jects. International Journal of Obesity (London), 35 (8), 1135–1138.
Khan, R., Luk, C., Flinker, A., Aggarwal, A., Lapid, H., Haddad, R., et al. (2007). Predict
ing odor pleasantness from odorant structure: Pleasantness as a reflection of the physical
world. Journal of Neuroscience, 27 (37), 10015–10023.
Kimchi, T., Xu, J., & Dulac, C. (2007). A functional circuit underlying male sexual behav
iour in the female mouse brain. Nature, 448 (7157), 1009–1014.
Kreher, S. A., Mathew, D., Kim, J., & Carlson, J. R. (2008). Translation of sensory input in
to behavioral output via an olfactory system. Neuron, 59 (1), 110–124.
Kuukasjarvi, S., Eriksson, C. J. P., Koskela, E., Mappes, T., Nissinen, K., & Rantala, M. J.
(2004). Attractiveness of women’s body odors over the menstrual cycle: the role of oral
contraceptives and receiver sex. Behavioral Ecology, 15 (4), 579–584.
Lagier, S., Carleton, A., & Lledo, P. M. (2004). Interplay between local GABAergic interneurons and relay neurons generates gamma oscillations in the rat olfactory bulb.
Journal of Neuroscience, 24 (18), 4382.
Laing, D. G. (1983). Natural sniffing gives optimum odour perception for humans. Percep
tion, 12 (2), 99–117.
Laurent, G. (1997). Olfactory processing: Maps, time and codes. Current Opinion in Neu
robiology, 7 (4), 547–553.
Laurent, G. (1999). A systems perspective on early olfactory coding. Science, 286 (5440),
723–728.
Laurent, G. (2002). Olfactory network dynamics and the coding of multidimensional sig
nals. Nature Reviews, Neuroscience, 3 (11), 884–895.
Laurent, G., Wehr, M., & Davidowitz, H. (1996). Temporal representations of odors in an
olfactory network. Journal of Neuroscience, 16 (12), 3837–3847.
Leon, M., & Johnson, B. A. (2003). Olfactory coding in the mammalian olfactory bulb.
Brain Research, Brain Research Reviews, 42 (1), 23–32.
Letarte, A. (1997). Similarities and differences in affective and cognitive origins of food
likings and dislikes. Appetite, 28 (2), 115–129.
Li, W., Lopez, L., Osher, J., Howard, J. D., Parrish, T. B., & Gottfried, J. A. (2010). Right or
bitofrontal cortex mediates conscious olfactory perception. Psychological Science, 21 (10),
1454–1463.
Linster, C., Henry, L., Kadohisa, M., & Wilson, D. A. (2007). Synaptic adaptation and odor-background segmentation.
(p. 108) Neurobiology of Learning and Memory, 87 (3), 352–360.
Linster, C., Menon, A. V., Singh, C. Y., & Wilson, D. A. (2009). Odor-specific habituation
arises from interaction of afferent synaptic adaptation and intrinsic synaptic potentiation
in olfactory cortex. Learning and Memory, 16 (7), 452–459.
Little, A. C., Jones, B. C., & Burriss, R. P. (2007). Preferences for masculinity in male bod
ies change across the menstrual cycle. Hormones and Behavior, 51 (5), 633–639.
Louis-Sylvestre, J., & Le Magnen, J. (1980). Palatability and preabsorptive insulin release.
Neuroscience and Biobehavioral Reviews, 4 (Suppl 1), 43–46.
Ma, M., Grosmaitre, X., Iwema, C. L., Baker, H., Greer, C. A., & Shepherd, G. M. (2003).
Olfactory signal transduction in the mouse septal organ. Journal of Neuroscience, 23 (1),
317.
Mainland, J., Johnson, B. N., Khan, R., Ivry, R. B., & Sobel, N. (2005). Olfactory impair
ments in patients with unilateral cerebellar lesions are selective to inputs from the con
tralesion nostril. Journal of Neuroscience, 25 (27), 6362–6371.
Mainland, J., & Sobel, N. (2006). The sniff is part of the olfactory percept. Chemical Sens
es, 31 (2), 181–196.
Mallet, P., & Schaal, B. (1998). Rating and recognition of peers’ personal odors by 9-year-
old children: an exploratory study. Journal of General Psychology, 125 (1), 47–64.
Page 33 of 41
Looking at the Nose Through Human Behavior, and at Human Behavior
Through the Nose
Malnic, B., Hirono, J., Sato, T., & Buck, L. B. (1999). Combinatorial receptor codes for
odors. Cell, 96 (5), 713–723.
Mandairon, N., Poncelet, J., Bensafi, M., & Didier, A. (2009). Humans and mice express
similar olfactory preferences. PLoS One, 4 (1), e4209.
Maresh, A., Rodriguez Gil, D., Whitman, M. C., & Greer, C. A. (2008). Principles of
glomerular organization in the human olfactory bulb—implications for odor processing.
PLoS One, 3 (7), e2640.
Martinez, M. C., Blanco, J., Bullon, M. M., & Agudo, F. J. (1987). Structure of the piriform
cortex of the adult rat: A Golgi study. J Hirnforsch, 28 (3), 341–834.
Mathey, M. F., Siebelink, E., de Graaf, C., & Van Staveren, W. A. (2001). Flavor enhance
ment of food improves dietary intake and nutritional status of elderly nursing home resi
dents. Journals of Gerontology. A. Biological Sciences and Medical Sciences, 56 (4),
M200–M205.
McBride, S. A., & Slotnick, B. (1997). The olfactory thalamocortical system and odor re
versal learning examined using an asymmetrical lesion paradigm in rats. Behavioral Neu
roscience, 111 (6), 1273.
Meredith, M. (2001). Human vomeronasal organ function: a critical review of best and
worst cases. Chemical Senses, 26 (4), 433.
Miller, S. L., & Maner, J. K. (2010). Scent of a woman: men’s testosterone responses to ol
factory ovulation cues. Psychological Sciences, 21 (2), 276–283.
Mombaerts, P., Wang, F., Dulac, C., Chao, S. K., Nemes, A., Mendelsohn, M., et al. (1996).
Visualizing an olfactory sensorymap. Cell, 87 (4), 675–686.
Monti-Bloch, L., Jennings-White, C., Dolberg, D. S., & Berliner, D. L. (1994). The human
vomeronasal system. Psychoneuroendocrinology, 19 (5–7), 673–686.
Moran, D. T., Rowley, J. C., Jafek, B. W., & Lovell, M. A. (1982). The fine-structure of the
olfactory mucosa in man. Journal of Neurocytology, 11 (5), 721–746.
Moskowitz, H. R., & Barbe, C. D. (1977). Profiling of odor components and their mixtures.
Sensory Processes, 1 (3), 212–226.
Mujica-Parodi, L. R., Strey, H. H., Frederick, B., Savoy, R., Cox, D., Botanov, Y., et al.
(2009). Chemosensory cues to conspecific emotional stress activate amygdala in humans.
PLoS One, 4 (7), 113–123.
Negoias, S., Visschers, R., Boelrijk, A., & Hummel, T. (2008). New ways to understand
aroma perception. Food Chemistry, 108 (4), 1247–1254.
Ober, C., Weitkamp, L. R., Cox, N., Dytch, H., Kostyu, D., & Elias, S. (1997). HLA and mate
choice in humans. Am J Hum Genet, 61 (3), 497–504.
Obrebowski, A., Obrebowska-Karsznia, Z., & Gawlinski, M. (2000). Smell and taste in chil
dren with simple obesity. International Journal of Pediatric Otorhinolaryngology, 55 (3),
191–196.
O’Doherty, J., Rolls, E. T., Francis, S., Bowtell, R., McGlone, F., Kobal, G., et al. (2000).
Sensory-specific satiety-related olfactory activation of the human orbitofrontal cortex.
Neuroreport, 11 (4), 893–897.
Pageat, P., & Gaultier, E. (2003). Current research in canine and feline pheromones. Vet
erinary Clinics of North America, Small Animal Practice, 33 (2), 187–211.
Pangborn, R. M., & Berggren, B. (1973). Human parotid secretion in response to pleasant
and unpleasant odorants. Psychophysiology, 10 (3), 231–237.
Plailly, J., Howard, J. D., Gitelman, D. R., & Gottfried, J. A. (2008). Attention to odor modu
lates thalamocortical connectivity in the human brain. Journal of Neuroscience, 28 (20),
5257–5267.
Porter, J., Anand, T., Johnson, B., Khan, R. M., & Sobel, N. (2005). Brain mechanisms for
extracting spatial information from smell. Neuron, 47 (4), 581–592.
Porter, J., Craven, B., Khan, R. M., Chang, S. J., Kang, I., Judkewitz, B., et al. (2007).
Mechanisms of scent-tracking in humans. Nature Neuroscience, 10 (1), 27–29.
Powell, T. P., Cowan, W. M., & Raisman, G. (1965). The central olfactory connexions. Jour
nal of Anatomy, 99 (Pt 4), 791.
Prehn, A., Ohrt, A., Sojka, B., Ferstl, R., & Pause, B. M. (2006). Chemosensory anxiety sig
nals augment the startle reflex in humans. Neuroscience Letters, 394 (2), 127–130.
Prehn-Kristensen, A., Wiesner, C., Bergmann, T. O., Wolff, S., Jansen, O., Mehdorn, H. M.,
et al. (2009). Induction of empathy by the smell of anxiety. PLoS One, 4 (6), e5987.
Page 35 of 41
Looking at the Nose Through Human Behavior, and at Human Behavior
Through the Nose
Price, J. L. (1973). An autoradiographic study of complementary laminar patterns
(p. 109)
Price, J. L. (1987). The central olfactory and accessory olfactory systems. In T. E. Finger &
W. L. Silver (Eds.), Neurobiology of taste and smell (179–203). New York: Wiley.
Price, J. L. (1990). Olfactory system. In G. Paxinos (Ed.), Human nervous system (pp. 979–
1001). San Diego: Academic Press.
Ressler, K. J., Sullivan, S. L., & Buck, L. B. (1993). A zonal organization of odorant recep
tor gene expression in the olfactory epithelium. Cell, 73 (3), 597–609.
Restrepo, D., Arellano, J., Oliva, A. M., Schaefer, M. L., & Lin, W. (2004). Emerging views
on the distinct but related roles of the main and accessory olfactory systems in respon
siveness to chemosensory signals in mice. Hormones and Behavior, 46 (3), 247–256.
Richardson, B. E., Vander Woude, E. A., Sudan, R., Thompson, J. S., & Leopold, D. A.
(2004). Altered olfactory acuity in the morbidly obese. Obesity Surgery, 14 (7), 967–969.
Rinberg, D., Koulakov, A., & Gelperin, A. (2006). Sparse odor coding in awake behaving
mice. Journal of Neuroscience, 26 (34), 8857.
Roberts, S. C., Gosling, L. M., Carter, V., & Petrie, M. (2008). MHC-correlated odour pref
erences in humans and the use of oral contraceptives. Proceedings of the Royal Society of
London. B. Biological Sciences, 275 (1652), 2715–2722.
Roberts, S. C., & Little, A. C. (2008). Good genes, complementary genes and human mate
preferences. Genetica, 132 (3), 309–321.
Roberts, T., & Roiser, J. P. (2010). In the nose of the beholder: are olfactory influences on
human mate choice driven by variation in immune system genes or sex hormone levels?
Experimental Biology and Medicine (Maywood), 235 (11), 1277–1281.
Roessner, V., Bleich, S., Banaschewski, T., & Rothenberger, A. (2005). Olfactory deficits in
anorexia nervosa. European Archives of Psychiatry and Clinical Neuroscience, 255 (1), 6–
9.
Rogers, P. J., & Hill, A. J. (1989). Breakdown of dietary restraint following mere exposure
to food stimuli: interrelationships between restraint, hunger, salivation, and food intake.
Addictive Behavior, 14 (4), 387–397.
Rolls, E. T., Critchley, H. D., & Treves, A. (1996). Representation of olfactory information
in the primate orbitofrontal cortex. Journal of Neurophysiology, 75 (5), 1982–1996.
Page 36 of 41
Looking at the Nose Through Human Behavior, and at Human Behavior
Through the Nose
Rolls, E. T., & Rolls, J. H. (1997). Olfactory sensory-specific satiety in humans. Physiology
& Behavior, 61 (3), 461–473.
Russell, M. J., Switz, G. M., & Thompson, K. (1980). Olfactory influences on the human
menstrual cycle. Pharmacology, Biochemisty, and Behavior, 13 (5), 737–738.
Saito, H., Chi, Q., Zhuang, H., Matsunami, H., & Mainland, J. D. (2009). Odor coding by a
Mammalian receptor repertoire. Science Signal, 2 (60), ra9.
Saper, C. B., Chou, T. C., & Elmquist, J. K. (2002). The need to feed: Homeostatic and he
donic control of eating. Neuron, 36 (2), 199–211.
Savic, I., & Gulyas, B. (2000). PET shows that odors are processed both ipsilaterally and
contralaterally to the stimulated nostril. Neuroreport, 11 (13), 2861–2866.
Savigner, A., Duchamp-Viret, P., Grosmaitre, X., Chaput, M., Garcia, S., Ma, M., et al.
(2009). Modulation of spontaneous and odorant-evoked activity of rat olfactory sensory
neurons by two anorectic peptides, insulin and leptin. Journal of Neurophysiology, 101
(6), 2898–2906.
Schaal, B., Marlier, L., & Soussignan, R. (2000). Human foetuses learn odours from their
pregnant mother’s diet. Chemical Senses, 25 (6), 729–737.
Schiffman, S. S. (1997). Taste and smell losses in normal aging and disease. Journal of the
American Medical Association, 278 (16), 1357–1362.
Schiffman, S. S., & Warwick, Z. S. (1988). Flavor enhancement of foods for the elderly can
reverse anorexia. Neurobiology of Aging, 9 (1), 24–26.
Schmidt, H. J., & Beauchamp, G. K. (1988). Adult-like odor preferences and aversions in
three-year-old children. Child Development, 1136–1143.
Schneider, R. A., & Wolf, S. (1955). Olfactory perception thresholds for citral utilizing a
new type olfactorium. Journal of Applied Physiology, 8 (3), 337–342.
Sela, L., Sacher, Y., Serfaty, C., Yeshurun, Y., Soroker, N., & Sobel, N. (2009). Spared and
impaired olfactory abilities after thalamic lesions. Journal of Neuroscience, 29 (39),
12059.
Page 37 of 41
Looking at the Nose Through Human Behavior, and at Human Behavior
Through the Nose
Sela, L., & Sobel, N. (2010). Human olfaction: A constant state of change-blindness. Ex
perimental Brain Research, 1–17.
Shepherd, G. M. (2004). The human sense of smell: Are we better than we think? PLoS Bi
ology, 2 (5), E146.
Shipley, M. T. (1995). Olfactory system. In G. Paxinos (Ed.), Rat nervous system (2nd ed.,
pp. 899–928). San Diego: Academic Press.
Singh, D., & Bronstad, P. M. (2001). Female body odour is a potential cue to ovulation.
Proceedings of the Royal Society of London. B. Biological Sciences, 268 (1469), 797–801.
Slotnick, B. M., & Schoonover, F. W. (1993). Olfactory sensitivity of rats with transection
of the lateral olfactory tract. Brain Research, 616 (1–2), 132–137.
Small, D. M., Gerber, J. C., Mak, Y. E., & Hummel, T. (2005). Differential neural responses
evoked by orthonasal versus retronasal odorant perception in humans. Neuron, 47 (4),
593–605.
Small, D. M., Jones-Gotman, M., Zatorre, R. J., Petrides, M., & Evans, A. C. (1997). Flavor
processing: more than the sum of its parts. Neuroreport, 8 (18), 3913–3917.
Snyder, D., Duffy, V., Chapo, A., Cobbett, L., & Bartoshuk, L. (2003). Childhood taste dam
age modulates obesity risk: Effects on fat perception and preference. Obesity Research,
11, A147–A147.
Sobel, N., Prabhakaran, V., Desmond, J. E., Glover, G. H., Goode, R. L., Sullivan, E. V., et
al. (1998). Sniffing and smelling: Separate subsystems in the human olfactory cortex. Na
ture, 392 (6673), 282–286.
Sobel, N., Prabhakaran, V., Hartley, C. A., Desmond, J. E., Zhao, Z., Glover, G. H.,
(p. 110)
et al. (1998). Odorant-induced and sniff-induced activation in the cerebellum of the hu
man. Journal of Neuroscience, 18 (21), 8990–9001.
Soussignan, R., Schaal, B., Marlier, L., & Jiang, T. (1997). Facial and autonomic responses
to biological and artificial olfactory stimuli in human neonates: Re-examining early hedo
nic discrimination of odors. Physiology & Behavior, 62 (4), 745–758.
Spehr, M., & Munger, S. D. (2009). Olfactory receptors: G protein-coupled receptors and
beyond. Journal of Neurochemistry, 109 (6), 1570–1583.
Spehr, M., Spehr, J., Ukhanov, K., Kelliher, K., Leinders-Zufall, T., & Zufall, F. (2006). Sig
naling in the chemosensory systems: Parallel processing of social signals by the mam
malian main and accessory olfactory systems. Cellular and Molecular Life Sciences, 63
(13), 1476–1484.
Page 38 of 41
Looking at the Nose Through Human Behavior, and at Human Behavior
Through the Nose
Stafford, L. D., & Welbeck, K. (2010). High hunger state increases olfactory sensitivity to
neutral but not food odors. Chemical Senses, 36 (2), 189–198.
Steiner, J. E. (1979). Human facial expressions in response to taste and smell stimulation.
Advances in Child Development and Behavior, 13, 257–295.
Stern, K., & McClintock, M. K. (1998). Regulation of ovulation by human pheromones. Na
ture, 392 (6672), 177–179.
Stettler, D. D., & Axel, R. (2009). Representations of odor in the piriform cortex. Neuron,
63 (6), 854–864.
Stevens, D. A., & Lawless, H. T. (1981). Age-related changes in flavor perception. Appetite,
2 (2), 127–136.
Stoddart, D. M. (1990). The scented ape: The biology and culture of human odour: Cam
bridge, UK: Cambridge University Press.
Storan, M. J., & Key, B. (2006). Septal organ of Gr¸neberg is part of the olfactory system.
Journal of Comparative Neurology, 494 (5), 834–844.
Strotmann, J., Wanner, I., Krieger, J., Raming, K., & Breer, H. (1992). Expression of odor
ant receptors in spatially restricted subsets of chemosensory neurons. Neuroreport, 3
(12), 1053–1056.
Su, C. Y., Menuz, K., & Carlson, J. R. (2009). Olfactory perception: receptors, cells, and
circuits. Cell, 139 (1), 45–59.
Tanabe, T., Iino, M., Ooshima, Y., & Takagi, S. F. (1974). Olfactory area in prefrontal lobe.
Brain Research, 80 (1), 127–130.
Tham, W. W. P., Stevenson, R. J., & Miller, L. A. (2010). The role of the mediodorsal thala
mic nucleus in human olfaction. Neurocase, 99999 (1), 1–12.
Thornhill, R., Gangestad, S. W., Miller, R., Scheyd, G., McCollough, J. K., & Franklin, M.
(2003). Major histocompatibility complex genes, symmetry, and body scent attractiveness
in men and women. Behavioral Ecology, 14 (5), 668–678.
Tomori, Z., Benacka, R., & Donic, V. (1998). Mechanisms and clinicophysiological implica
tions of the sniff- and gasp-like aspiration reflex. Respiration Physiology, 114 (1), 83–98.
Uchida, N., Takahashi, Y. K., Tanifuji, M., & Mori, K. (2000). Odor maps in the mammalian
olfactory bulb: Domain organization and odorant structural features. Nature Neuro
science, 3 (10), 1035–1043.
Page 39 of 41
Looking at the Nose Through Human Behavior, and at Human Behavior
Through the Nose
Vassar, R., Ngai, J., & Axel, R. (1993). Spatial segregation of odorant receptor expression
in the mammalian olfactory epithelium. Cell, 74 (2), 309–318.
Verhagen, J. V., Wesson, D. W., Netoff, T. I., White, J. A., & Wachowiak, M. (2007). Sniffing
controls an adaptive filter of sensory input to the olfactory bulb. Nature Neuroscience, 10
(5), 631–639.
Wedekind, C., & Furi, S. (1997). Body odour preferences in men and women: Do they aim
for specific MHC combinations or simply heterozygosity? Proceedings of the Royal Soci
ety of London. B. Biological Sciences, 264 (1387), 1471–1479.
Wedekind, C., Seebeck, T., Bettens, F., & Paepke, A. J. (1995). MHC-dependent mate pref
erences in humans. Proceedings of the Royal Society of London. B. Biological Sciences,
260 (1359), 245–249.
Wilson, D. A. (1997). Binaral interactions in the rat piriform cortex. Journal of Neurophys
iology, 78 (1), 160–169.
Wilson, D. A. (2009b). Pattern separation and completion in olfaction. Annals of the New
York Academy of Sciences, 1170, 306–312.
Wilson, D. A., & Stevenson, R. J. (2003). The fundamental role of memory in olfactory per
ception. Trends in Neurosciences, 26 (5), 243–247.
Wilson, R. I., & Mainen, Z. F. (2006). Early events in olfactory processing. Neuroscience,
29 (1), 163.
Witt, M., & Hummel, T. (2006). Vomeronasal versus olfactory epithelium: is there a cellu
lar basis for human vomeronasal perception? International review of cytology, 248, 209–
259.
Wysocki, C. J., & Meredith, M. (1987). The vomeronasal system. In T. E. Finger & W. L.
Silver (Eds.), Neurobiology of taste and smell (pp. 125–150). New York: John Wiley &
Sons.
Wysocki, C. J., Pierce, J. D., & Gilbert, A. N. (1991). Geographic, cross-cultural, and indi
vidual variation in human olfaction. In T. V. Getchell (Ed.), Smell and taste in health and
disease (pp. 287–314). New York: Raven Press.
Wysocki, C. J., & Preti, G. (2004). Facts, fallacies, fears, and frustrations with human
pheromones. Anatomical Record. A. Discoveries in Molecular, Cellular, and Evolutionary
Biology, 281 (1), 1201–1211.
Page 40 of 41
Looking at the Nose Through Human Behavior, and at Human Behavior
Through the Nose
Yeshurun, Y., & Sobel, N. (2010). An odor is not worth a thousand words: from multidi
mensional odors to unidimensional odor objects. Annual Review of Psychology, 61, 219–
241.
Zeki, S., & Bartels, A. (1999). Toward a theory of visual consciousness* 1. Consciousness
and Cognition, 8 (2), 225–259.
Zelano, C., & Sobel, N. (2005). Humans as an animal model for systems-level organization
of olfaction. Neuron, 48 (3), 431–454.
Zhang, X., & Firestein, S. (2002). The olfactory receptor gene superfamily of the mouse.
Nature Neuroscience, 5 (2), 124–133.
Zhou, W., & Chen, D. (2009). Fear-related chemosignals modulate recognition of fear in
ambiguous facial expressions. Psychological Science, 20 (2), 177.
Zufall, F., Firestein, S., & Shepherd, G. M. (1994). Cyclic nucleotide-gated ion channels
and sensory transduction in olfactory receptor neurons. Annual Review of Biophysics and
Biomolecular Structure, 23 (1), 577–607.
Roni Kahana
Noam Sobel
Page 41 of 41
Cognitive Neuroscience of Music
Humans engage with music in many ways, and music is associated with many aspects of
our personal and social lives. Music represents an organization of our auditory environ
ments, and many neural processes must be recruited and coordinated both to perceive
and to create musical patterns. Accordingly, our musical experiences depend on the inter
play of diverse brain systems underlying perception, cognition, action, and emotion. Com
pared with the study of other human faculties, the neuroscientific study of music is rela
tively recent. Paradigms for examining musical functions have been adopted from other
domains of neuroscience and also developed de novo. The relationship of music to other
cognitive domains, in particular language, has garnered considerable attention. This
chapter provides a survey of the experimental approaches taken, and emphasizes consis
tencies across the various studies that help us understand musical functions within the
broader field of cognitive neuroscience.
Introduction
Discoveries of ancient bone flutes illustrate that music has been a part of human society
for millennia (Adler, 2009). Although why the human brain enables musical capacities is
hotly debated—did it evolve like language, or is it an evolutionary epiphenomenon?—it is
indisputable that human individuals and societies devote considerable time and resources
to incorporating music into their lives.
Scientific study of the psychology and neuroscience of music is relatively recent. In terms
of human behaviors, music is usually viewed and judged in relation to language. Accord
ingly, the organization of music in the brain has been viewed by many in terms of modular
organization, whereby music-specific processing modules are associated with specialized
neural substrates in much the same way that discrete brain areas (e.g. Broca’s and
Wernicke’s areas) have been associated traditionally with language functions (Peretz &
Coltheart, 2003). Indeed, neuropsychological case studies in which the loss of language
abilities can be dissociated from the loss of musical abilities, and vice versa, clearly sup
port a modular view. To the extent that language has been treated as a domain of cogni
tive neuroscience that is largely separate from other functional domains—such as percep
tion, memory, attention, action, and emotion—music, too, has been regarded as a cogni
tive novelty with a unique embodiment in the human brain. In other words, music has
been subjected to the same localizationist pressures that have historically pervaded cog
nitive neuroscience. Neuroscientific inquiry into musical functions has therefore focused
most often on the auditory cortices situated along the superior surface of the temporal
lobes. The logic is simple: if music is an auditory phenomenon and the auditory cortex
processes auditory information, then music must reside in the auditory cortex.
Although it is true that we mainly listen to music, meaningful engagement with music is
much more than a perceptual phenomenon. The most obvious example is that of musical
performance, whereby the action systems of the brain must be engaged. However, as de
scribed below in detail, underneath overt perception and action lie more subtle aspects of
musical engagement: attention, various forms of memory, and covert action such as men
tal imagery. Finally, music plays on our emotions in various ways. Thus, the most parsimo
nious view of how music coexists with the other complex behaviors that the human brain
supports is not one in which all of the cognitive functions on which music depends are lo
calized to a specific area of the brain, but rather one in which musical experiences are in
stantiated through the coupling of networks that serve domain general functions. Follow
ing a brief description of key principles underlying the organization of acoustic informa
tion into music, this chapter treats the neuroscience of each of those functions in turn.
Time
The patterning of acoustic events in time is a crucial element of music that gives rise to
rhythm and meter, properties that are essential in determining how we move along with
music. We often identify “the beat” in music by tapping our feet or bobbing our heads
with a regular periodicity that seems to fit best with the music, and this capacity for en
trainment has strong social implications in terms of coordinated action and shared experi
ence (Pressing, 2002; Wiltermuth & Heath, 2009). Importantly, meter and rhythm provide
a temporal scaffold that guides our expectations for when events will occur (Jones, 1976;
London, 2004). This scaffold might be thought of in terms of coupled oscillators that fo
cus attention at particular moments in time (Large & Jones, 1999; Large & Palmer, 2002)
and thereby influence our perception of other musical attributes such as pitch (Barnes &
Jones, 2000; Jones et al., 2002).
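The "coupled oscillator" intuition above can be caricatured in a few lines: model attention as a periodic pulse centered on expected beat times, so that events falling on the beat receive more attentional weight than off-beat events. This is only a toy illustration of the dynamic-attending idea, not Large and Jones's actual model; the beat period and pulse sharpness are arbitrary values chosen for the example.

```python
import math

def attentional_weight(t, period=0.5, sharpness=20.0):
    """Periodic attentional pulse that peaks at multiples of the beat period.

    `t` is the event time in seconds; the return value is highest (1.0) for
    events exactly on a beat and falls off for events between beats.
    """
    phase = (t % period) / period        # 0.0 on the beat, 0.5 midway between beats
    distance = min(phase, 1.0 - phase)   # circular distance to the nearest beat
    return math.exp(-sharpness * distance ** 2)

# An on-beat event draws more attention than an off-beat event.
on_beat = attentional_weight(1.0)    # exactly on a beat (period 0.5 s)
off_beat = attentional_weight(1.25)  # midway between beats
print(on_beat > off_beat)  # True
```

Under this caricature, perceptual sensitivity to other attributes (e.g., pitch) would be modulated by the weight assigned to each moment, which is the sense in which the temporal scaffold "focuses attention" at particular points in time.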
Our perception of metric structure, that aspect of music that distinguishes a waltz from a
march, is associated with hierarchies of expectations for when events will occur (Jones &
Boltz, 1989; Palmer & Krumhansl, 1990). Some temporal locations within a metric struc
ture are more likely to be associated with events (“strong beats”), whereas others are less
likely (“weak beats”). The manipulation of when events actually occur relative to weak
and strong beat locations is associated with syncopation, a salient feature of many
rhythms, which in turn characterize musical styles and shape our tendency to move along
with music (Pressing, 2002).
Tonality
Tonality refers to notes (pitches) and the relationships among notes. Even though there
are many notes that differ in their fundamental frequency (e.g., the frequency associated
with each key on a piano keyboard), tonality in Western tonal music is based on twelve
pitch classes, with each pitch class associated with a label such as C or C-sharp (Figure
7.1). The organization into twelve pitch classes arises because (1) two sounds that are
separated by an octave (a doubling in frequency) are perceptually similar and are given
the same note name (e.g., F), and (2) an octave is divided into twelve equal (logarithmic)
frequency steps called semitones. When we talk about melody we refer to sequences of
notes, and when we talk about harmony, we refer to two or more notes that sound at the
same time to produce an interval or chord. Melodies are defined in large part by their
contours—the pattern of ups and downs of the notes—such that changes in contour are
easier to detect than contour-preserving shifts of isolated notes when the entire melody is
transposed to a different key (Dowling, 1978).
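The note arithmetic described above can be made concrete in a short sketch, assuming the common A4 = 440 Hz tuning standard and MIDI-style note numbering (A4 = 69); both conventions are assumptions of the example rather than anything specific to the chapter.

```python
# Twelve-tone equal temperament: an octave doubles the frequency and is divided
# into twelve equal logarithmic steps (semitones) of ratio 2**(1/12).
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def frequency(note: int) -> float:
    """Equal-temperament frequency: each semitone multiplies frequency by 2**(1/12)."""
    return 440.0 * 2 ** ((note - 69) / 12)  # MIDI note 69 = A4 = 440 Hz

def note_name(note: int) -> str:
    """Octave equivalence: notes 12 semitones apart share a pitch-class name."""
    return PITCH_CLASSES[note % 12]

# An octave up doubles the frequency but keeps the same note name.
print(note_name(69), round(frequency(69), 1))  # A 440.0
print(note_name(81), round(frequency(81), 1))  # A 880.0
```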
Central to the way that tonality works are the concepts of pitch probability distributions
and the related notion of key (Krumhansl, 1990; Temperley, 2001, 2007). When we say
that a piece of music is in the key of G-major or g-minor, it means that certain notes, such
as G or D, will be perceived as fitting well into that key, whereas others, like G-sharp or C-
sharp, will sound out of place. Tallying up the notes of many different pieces written in
the key of G-major, we would find that G and D are the notes that occur most often,
whereas G-sharp and C-sharp would not be very frequent. That is, the key of G-major is
associated with a particular probability distribution across the twelve possible
pitch classes. The key of D-major is associated with a slightly different probability distrib
ution, albeit one that is closely related to that of G-major in that the note (pitch class) D
figures prominently in both. However, the probability distribution for the key of C-sharp
major differs markedly from that of G-major. These probability distributions are often re
ferred to as key profiles or tonal hierarchies, and they simultaneously reflect the statistics
of music as well as perceptual distances between individual notes and the keys (or tonal
contexts) that have been established in the minds of listeners (Krumhansl, 1990; Temper
ley, 2007). Knowledge of the statistics underlying tonality is presumably acquired through
implicit statistical learning mechanisms (Tillmann et al., 2000), as evidenced by
the brain’s rapid adaptation to novel tonal systems (Loui et al., 2009b).
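The tallying idea described above can be sketched as a toy key finder in the spirit of the Krumhansl-Schmuckler algorithm: count a melody's pitch classes, then find the major key whose profile best correlates with the tally. The profile numbers below are approximate probe-tone values from Krumhansl's work, used purely for illustration, and the melody is a hypothetical G-major fragment.

```python
# Approximate major-key profile (probe-tone ratings), indexed C, C#, D, ..., B.
MAJOR_PROFILE = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]

def pitch_class_tally(notes):
    """Count occurrences of each of the 12 pitch classes (notes are MIDI numbers)."""
    counts = [0] * 12
    for n in notes:
        counts[n % 12] += 1
    return counts

def correlation(x, y):
    """Plain Pearson correlation, to keep the sketch dependency-free."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def estimate_key(notes):
    """Rotate the major profile to each candidate tonic and pick the best match."""
    tally = pitch_class_tally(notes)
    scores = []
    for tonic in range(12):
        profile = [MAJOR_PROFILE[(pc - tonic) % 12] for pc in range(12)]
        scores.append((correlation(tally, profile), tonic))
    return max(scores)[1]

# Hypothetical G-major fragment (G A B C D E F# G); tonic 7 corresponds to G.
melody = [67, 69, 71, 72, 74, 76, 78, 79]
print(estimate_key(melody))  # 7
```

The rotation step is the code-level counterpart of the observation in the text that the D-major profile is a shifted version of the G-major profile: every major key shares one underlying profile shape, anchored at a different tonic.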
Timbre
The third broad dimension of music is timbre. Timbre is what distinguishes instruments
from each other. Timbre is the spectrotemporal signature of an instrument: a description
of how the frequency content of the sound changes in time. If one considers the range of
musical sounds that are generated not just by physical instruments but also by electronic
synthesizers, human voices, and environmental sounds that are used for musical purpos
es, the perceptual space underlying timbre is vast. Based on multidimensional scaling
analyses of similarity judgments between pairs of sounds, it has been possible to identify
candidate dimensions of timbre (Caclin et al., 2005; Grey, 1977; Lakatos, 2000; McAdams
et al., 1995). Most consistently identified across studies are dimensions corresponding to
attack and spectral centroid. Attack refers to the onset characteristics of the amplitude
envelope of the sound (e.g., percussive sounds have a rapid attack, whereas bowed
sounds have a slower attack). Centroid refers to the location along the frequency axis
where the peak of energy lies, and is commonly described as the “brightness” of a sound.
The use of acoustic features beyond attack and centroid for the purpose of judging the
similarities between pairs of sounds is much less consistent and appears to be context
dependent (Caclin et al., 2005). For example, variation in spectral fine structure,
such as the relative weighting of odd and even frequency components, or time-varying
spectral features (spectrotemporal flux), can also influence similarity judgments. The fact that
timbre cannot be decomposed into a compact set of consistent dimensions that satisfacto
rily explain the abundance of perceptual variation somewhat complicates the search for
the functional representation of timbre in the brain.
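The two dimensions named above can be illustrated with a deliberately simplified sketch: "attack" measured as the time to the peak of an amplitude envelope, and "spectral centroid" as the amplitude-weighted mean frequency of a hand-built spectrum. Real timbre analysis works on recorded audio; the envelopes and spectra below are made-up toy data.

```python
def spectral_centroid(freqs, mags):
    """Amplitude-weighted mean frequency: a higher centroid sounds 'brighter'."""
    return sum(f * m for f, m in zip(freqs, mags)) / sum(mags)

def attack_time(envelope, sample_rate):
    """Seconds from onset to the peak of the amplitude envelope."""
    peak_index = max(range(len(envelope)), key=envelope.__getitem__)
    return peak_index / sample_rate

# Two hypothetical tones with the same partials but different energy weighting.
freqs = [220, 440, 660, 880]       # harmonics of A3
dull = [1.0, 0.5, 0.2, 0.1]        # energy concentrated in low partials
bright = [0.1, 0.2, 0.5, 1.0]      # energy concentrated in high partials
print(round(spectral_centroid(freqs, dull)))    # 367
print(round(spectral_centroid(freqs, bright)))  # 733

# A percussive envelope peaks almost immediately; a bowed one rises slowly.
percussive = [0.0, 1.0, 0.8, 0.5, 0.3, 0.1]
bowed = [0.0, 0.2, 0.4, 0.7, 0.9, 1.0]
print(attack_time(percussive, 100))  # 0.01
print(attack_time(bowed, 100))       # 0.05
```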
al., 2000). Whereas having the context of a familiar melody generally facilitates the ability
to detect when a final note is mistuned (Warrier & Zatorre, 2002), right temporal lobe
(RTL) damage reduces the amount of that facilitation (Warrier & Zatorre, 2004), indicating
that the processing of pitch relationships in the temporal lobes also affects more basic perceptual
processes such as intonation judgments. Interestingly, the melody processing functions of
the RTL appear to depend in large part on areas that are anterior to the primary auditory
cortex, which is situated on Heschl’s gyrus (HG; Johnsrude et al., 2000; Samson & Za
torre, 1988).
The results from the patient studies are supported by a number of functional magnetic
resonance imaging (fMRI) studies of pitch and melody processing. A hierarchy of pitch
processing is observed in the auditory cortex following a medial to lateral progression.
Broadband noise or sounds that have no clear pitch activate medial HG, sounds with dis
tinct pitch produce more activation in the lateral half of HG, and sequences in which the
pitch varies, in either tonal or random melodies, generate activity that extends rostrally
from HG along the superior temporal gyrus (STG) toward the planum polare, biased to
ward the right hemisphere (Patterson et al., 2002).
One of the critical features of pitch in music is the distinction between pitch height and
pitch chroma (pitch class). The chroma of a pitch is referred to by its note name (e.g., D,
D#). Critically, chroma represents the perceptual constancy that allows notes played in
different octaves to be identified as the same note. These separable aspects of pitch appear to
have partially distinct neural substrates, with preferential processing of pitch height pos
terior to HG in the planum temporale, and processing of chroma anterolateral to HG in
the planum polare (Warren et al., 2003), consistent with the proposed role of anterior
STG regions in melody processing (Griffiths et al., 1998; Patterson et al., 2002;
Schmithorst & Holland, 2003). The neural representations of individual pitches in melod
ic sequences are also influenced by the statistical structure of the sequences (Patel & Bal
aban, 2000, 2004). In these experiments, the neural representations were quantified by
examining the quality of the coupling between the amplitude modulations of the tones
used to create the sequence and the amplitude modulations in the response recorded
above the scalp using magnetoencephalography (MEG). Random sequences elicited little
coupling, whereas highly predictable musical scales elicited the strongest coupling. These
effects were strongest above temporal and lateral prefrontal sensor sites, consistent with
a hypothesis that a frontotemporal circuit supports the processing of melodic structure.
When scales or melodies end in notes that do not belong to the established key, large pos
itive deflections are evident in event-related potentials (ERPs) recorded at posterior scalp
sites, indicating the activation of congruency monitoring and context-updating processes
indexed by the P300 or late-positive complex (LPC) components of ERP waveforms
(Besson & Faïta, 1995; Besson & Macar, 1987; Paller et al., 1992). These effects are ac
centuated in subjects with musical training and when the melodies are familiar (Besson &
Faïta, 1995; Miranda & Ullman, 2007). Similarly, short sequences of chords that termi
nate with a chord that is unexpected given the tonal context established by the preceding
chords elicit P300 and LPC components (Beisteiner et al., 1999; Carrion & Bly, 2008;
Janata, 1995; Koelsch et al., 2000; Patel et al., 1998), even in natural musical contexts
(Koelsch & Mulder, 2002). As would be expected given the sensitivity of the P300 to glob
al event probability (Tueting et al., 1970), the magnitude of the posterior positive re
sponses increases as the starkness of the harmonic violation increases (Janata, 1995; Pa
tel et al., 1998).
The appearance of the late posterior positivities depends on overt processing of the tar
get chords by making either a detection or categorization judgment. When explicit judg
ments about target chords are eliminated, the most prominent deviance-related response
is distributed frontally, and typically manifests as an attenuated positivity approximately
200 ms after the onset of a deviant chord. This relative negativity in response to contextu
ally irregular chords was termed an early right anterior negativity (ERAN; Koelsch et al.,
2000), although in many subsequent studies, it was found to be distributed bilaterally.
The ERAN has been studied extensively and is interesting for two principal reasons. First,
the ERAN and the right anterior-temporal negativity (RATN; Patel et al., 1998) have been interpret
ed as markers of syntactic processing in music, paralleling the left anterior negativities
associated with the processing of syntactic deviants in language (Koelsch et al., 2000; Pa
tel et al., 1998). Localization of the ERAN to Broca’s area using MEG supports such an in
terpretation (Maess et al., 2001). (The parallels between syntactic processing in music
and language are discussed in a later subsection.) Second, the ERAN is regarded as an in
dex of automatic harmonic syntax processing in that it is elicited even when the irregular
chords themselves are not task relevant (Koelsch et al., 2000; 2002b). Whether the ERAN
is attenuated when attention is oriented away from musical material is a matter of some
debate (Koelsch et al., 2002b; Loui et al., 2005).
The ERAN is a robust index of harmonic expectancy violation processing, and it is sensi
tive to musical training. It is found in children (Koelsch et al., 2003b), and it increases in
amplitude with musical training in both adults (Koelsch et al., 2002a) and children
(Jentschke & Koelsch, 2009).
The amplitude of the ERAN is also sensitive to the probability with which a particular
chord occurs at a particular location in a sequence. For example, the ERAN to the same
irregular chord function, such as a Neapolitan sixth, is weaker when that chord occurs at
a sequence location that is more plausible from a harmonic syntax perspective (Leino et
al., 2007). Similarly, in Bach chorales, chords that appear in the original score but are not
the most expected from a music-theoretical standpoint elicit an ERAN relative to more
expected chords substituted in their place, though the ERAN is much weaker than that
elicited by highly unexpected Neapolitan sixth chords inserted at the same location
(Steinbeis et al., 2006).
The automaticity of the ERAN naturally leads to comparisons with the mismatch negativi
ty (MMN), the classic marker of preattentive detection of deviant items in auditory
streams (Näätänen, 1992; Näätänen & Winkler, 1999). Given that irregular chords might
be regarded as violations of an abstract context established by a sequence of chords, the
ERAN could just be a form of “abstract MMN.” The ERAN and MMN occur within a simi
lar latency range, and their frontocentral distributions often make them difficult to distin
guish from one another based on their scalp topographies (e.g. Koelsch et al., 2001; Leino
et al., 2007). Nonetheless, the ERAN and MMN are dissociable (Koelsch, 2009). For ex
ample, an MMN elicited by an acoustically aberrant stimulus, such as a mistuned chord
or a change in the frequency of a note (frequency MMN), does not show sensitivity to lo
cation within a harmonic context (Koelsch et al., 2001; Leino et al., 2007). Moreover, if
the sensory properties of the chord sequences are carefully controlled in terms of repeti
tion priming for specific notes or the relative roughness of target chords and those in the
preceding context, an ERAN is elicited by harmonically incongruent chords even if the
harmonically incongruent chords are more similar (p. 117) in their sensory characteristics
to the penultimate chords than are the harmonically congruent chords (Koelsch et al.,
2007).
A number of fMRI studies have contributed to the view that musical syntax is evaluated in
the ventrolateral prefrontal cortex (VLPFC), in a region comprising the ventral aspect of
the inferior frontal gyrus (IFG), frontal operculum, and anterior insula. The evaluation of
target chords in a harmonic priming task results in bilateral activation of this region, and
is greater for harmonically unrelated targets than harmonically related targets (Koelsch
et al., 2005b; Tillmann et al., 2003). Similarly, chord sequences that contain modulations
—strong shifts in the tonal center toward another key—activate this region in the right
hemisphere (Koelsch et al., 2002c). Further evidence that the VLPFC is sensitive to con
textual coherence comes from a paradigm in which subjects listened to a variety of 23-
second excerpts of familiar and unfamiliar classical music. Each excerpt was rendered
incoherent by being chopped up into 250- to 350-ms segments and then reconstituted with
random arrangement of the segments. Bilaterally, the inferior frontal cortex and adjoining
insula responded more strongly to the normal music, compared with reordered music
(Levitin & Menon, 2003).
Tonal Dynamics
Despite their considerable appeal from an experimental standpoint, trial-based expectan
cy violation paradigms are limited in their utility in investigating the brain dynamics that
accompany listening to extended passages of music in which the manipulation of ex
pectancies is typically more nuanced and ongoing. When examined more closely, chord
sequences such as those used in the experiments described above do more than establish
a particular static tonal context. They actually create patterns of movement within tonal
space—the system of major and minor keys that can be represented on a torus. The de
tails of the trajectories depend on the specific chords and the sequence in which they oc
cur (Janata, 2007). Different pieces of music will create different patterns of movement
through tonal space, depending on the notes in the melodies and harmonic accompani
ments. The time-varying pattern on the toroidal surface can be quantified for any piece of
music, and this quantification can then be used to probe the time-varying structure of the
fMRI activity recorded while a person listens to the music. This procedure identifies brain
areas that are sensitive to the movements of the music through tonal space (Janata, 2005,
2009; Janata et al., 2002b).
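The tonality-tracking procedure can be caricatured in a few lines: a feature time course derived from the music (for example, activation at one location on the toroidal key surface) is convolved with a hemodynamic response and correlated with each voxel's fMRI time series. This is a hedged sketch under invented assumptions; the toy HRF, the TR, and all function names here are illustrative, not the actual pipeline of the cited studies.

```python
import numpy as np

def hrf(t):
    """Toy hemodynamic response: gamma-like peak with a small late undershoot
    (illustrative only; not a canonical SPM/AFNI HRF)."""
    pos = t ** 6 * np.exp(-t)
    neg = t ** 16 * np.exp(-t)
    h = pos / pos.max() - 0.35 * neg / neg.max()
    return h / np.abs(h).sum()

def tonality_regressor(feature_ts, tr=2.0):
    """Convolve a tonal-surface feature time course (one value per TR) with the HRF."""
    t = np.arange(0.0, 32.0, tr)
    return np.convolve(feature_ts, hrf(t))[: len(feature_ts)]

def voxelwise_correlation(regressor, voxel_data):
    """Pearson r between the regressor and each row of voxel_data (voxels x time)."""
    r = (regressor - regressor.mean()) / regressor.std()
    v = (voxel_data - voxel_data.mean(axis=1, keepdims=True)) / voxel_data.std(
        axis=1, keepdims=True)
    return v @ r / len(r)
```

Voxels whose time series correlate strongly with the regressor are the candidates for areas tracking the music's movement through tonal space.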
The “tonality-tracking” approach has suggested a role of the medial prefrontal cortex
(MPFC) in the maintenance of tonal contexts and the integration of tonal contexts with
music-evoked autobiographical memories (Janata, 2009). When individuals underwent
fMRI scans while listening attentively to an arpeggiated melody that systematically moved
(modulated) through all twenty-four major and minor keys over the course of 8 minutes
(Janata et al., 2003), the MPFC was the one brain region that was consistently active
across three to four repeated sessions within listeners and across listeners, even though
consistent tonality tracking responses were observed at the level of individuals in several
brain areas (Janata, 2005; Janata et al., 2002b). ERP studies provide converging evidence
for a context maintenance interpretation in that a midline negativity with a very frontal
focus characterizes both the N5 component, a late negative peak that has been interpret
ed to reflect contextual integration of harmonically incongruous material (Koelsch et al.,
2000; Loui et al., 2009b), and a sustained negative shift in response to modulating se
quences (Koelsch et al., 2003a). As discussed below, tonality tracking in the MPFC is ger
mane to understanding how music interacts with functions underlying a person’s sense of
self because the MPFC is known to support such functions (Gilbert et al., 2006; Northoff
& Bermpohl, 2004; Northoff et al., 2006).
As in the case of melody perception, the question arises as to what extent auditory cortical
areas in the temporal lobe are engaged in the processing of musical rhythmic patterns.
Studies of patients in whom varying amounts of either the left or right anterior temporal
lobes have been removed indicate that the greatest impairment in reproducing rhythmic
patterns is found in patients with excisions that encroach on secondary auditory areas in
HG in the right hemisphere (Penhune et al., 1999). The deficits are observed when exact
durational patterns are to be reproduced, but not when the patterns can be encoded cate
gorically as sequences of long and short intervals.
Given the integral relationship between timing and movement, and the propensity of hu
mans to move along with the beat in the music, neuroimaging experiments of rhythm and
meter perception have examined the degree to which motor systems (p. 118) of the brain
are engaged alongside auditory areas during passive listening to rhythms or attentive lis
tening while performing a secondary discrimination task (Grahn & Rowe, 2009), listening
with the intent to subsequently synchronize with or reproduce the rhythm (Chen et al.,
2008a), or listening with the intent to make a same/different comparison with a target
rhythm (Grahn & Brett, 2007). Discrimination of metrically simple, complex, and non
metric rhythms recruits, bilaterally, the auditory cortex, cerebellum, IFG, and a set of pre
motor areas including the basal ganglia (putamen), pre-supplementary motor area
(pSMA) or supplementary motor area (SMA), and dorsal premotor cortex (PMC) (Grahn &
Brett, 2007). The putamen has been found to respond more strongly to simple rhythms
than complex rhythms, suggesting that its activation is central to the experienced
salience of a beat. A subsequent study found stronger activation throughout the basal
ganglia in response to beat versus nonbeat rhythms, along with greater coupling of the
putamen with the auditory cortex and medial and lateral premotor areas in the beat con
ditions (Grahn & Rowe, 2009). Interestingly, the response of the putamen increased as ex
ternal accenting cues weakened, suggesting that activity within the putamen is also
shaped by the degree to which listeners generate a subjective beat.
Activity in premotor areas and the cerebellum is differentiated by the degree of engage
ment with a rhythm (Chen et al., 2008a). The SMA and mid-PMC are active during pas
sive listening to rhythms of varying complexity. Listening to a rhythm with the require
ment to subsequently synchronize with that rhythm recruits these regions along with ven
tral premotor and inferior frontal areas. These regions are then also active when subjects
subsequently synchronize their taps with the rhythm. Similar results are obtained in re
sponse to short 3-second piano melodies: lateral premotor areas are activated both dur
ing listening and during execution of arbitrary key press sequences without auditory
feedback (Bangert et al., 2006). Converging evidence for the recruitment of premotor ar
eas during listening to rhythmic structure in music has been obtained through analyses of
MEG data in which a measure related to the amplitude envelope of the auditory stimulus
is correlated with the same measure applied to the MEG data (Popescu et al., 2004). One
study of attentive listening to polyphonic music also found increased activation of premo
tor areas (pSMA, mid-PMC), although the study did not seek to associate these activa
tions directly with the rhythmic structure in the music (Janata et al., 2002a).
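The envelope-correlation measure used in the MEG work can be sketched with an FFT-based Hilbert transform: extract the amplitude envelope of the auditory stimulus, extract the same measure from a sensor signal, and correlate the two. The signals, sampling rate, and modulation frequencies below are synthetic stand-ins, not the parameters of Popescu et al. (2004).

```python
import numpy as np

def amplitude_envelope(signal):
    """Amplitude envelope as the magnitude of the analytic signal,
    computed with an FFT-based Hilbert transform."""
    n = len(signal)
    spec = np.fft.fft(signal)
    h = np.zeros(n)
    h[0] = 1.0
    h[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        h[n // 2] = 1.0
    return np.abs(np.fft.ifft(spec * h))

def envelope_correlation(stimulus, sensor):
    """Pearson correlation between the two signals' amplitude envelopes."""
    a = amplitude_envelope(stimulus)
    b = amplitude_envelope(sensor)
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return float(a @ b / len(a))
```

Two signals with different carriers but a shared rhythmic envelope correlate strongly under this measure, while signals with unrelated envelopes do not, which is the sense in which sensor activity can be said to "follow" the rhythmic structure of the stimulus.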
Timbre
A number of studies have made explicit use of the multidimensional scaling approach to
examine the organization of timbral dimensions in the brain. For example, when based on
similarity judgments of isolated tones varying in attack or spectral centroid, the perceptu
al space of patients with resections of the right temporal lobe is considerably more dis
torted than is that of normal controls or individuals with left temporal lobe resections
(Samson et al., 2002). These impairments are ameliorated to a great extent, but not en
tirely, when the timbral similarity of eight-note melodies is judged (Samson et al., 2002),
although the extent to which melodic perception as opposed to simple timbral reinforce
ment drives this effect is unclear.
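The scaling step itself, recovering a low-dimensional perceptual space from pairwise dissimilarities, can be sketched with classical (Torgerson) multidimensional scaling. In the studies above the input matrix would come from listeners' similarity ratings of tone pairs; here it is synthetic, and the implementation is a minimal sketch rather than the procedure those studies used.

```python
import numpy as np

def classical_mds(dissim, n_dims=2):
    """Classical (Torgerson) MDS: embed points whose pairwise Euclidean
    distances approximate the given dissimilarity matrix."""
    n = dissim.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    b = -0.5 * j @ (dissim ** 2) @ j             # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(b)               # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:n_dims]      # keep the largest components
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))
```

Distortions of a patient group's perceptual space, as described above, would appear as systematic discrepancies between the embedding recovered from their judgments and that of controls.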
Evidence that timbral variation is assessed at relatively early stages of auditory cortical
processing comes from observations that the MMN is elicited similarly by changes in the
emotional connotation of a tone played by a violin, by a change in timbre from violin to
flute, and by changes in pitch (Goydke et al., 2004). Moreover, MMN responses are largely additive
when infrequent ignored deviant sounds deviate from standard ignored sounds on multi
ple timbre dimensions simultaneously, suggesting that timbral dimensions are processed
within separate sensory memory channels (Caclin et al., 2006). Also, within the represen
tation of the spectral centroid dimension, the (p. 119) magnitude of the MMN varies lin
early with perceptual and featural similarity (Toiviainen et al., 1998).
Attention
Most of the research described to this point was aimed at understanding the representa
tion of basic musical features and dimensions, and the processing of change along those
dimensions, without much regard for the broader and perhaps more domain-general psy
chological processes that are engaged by music.
Because expectancy violation paradigms have been a staple of cognitive neuroscience
research on music, considerable information has accumulated about attentional capture
by deviant musical events. Working from a literature based primarily
on visual attention, Corbetta and Shulman (2002) proposed a distinction between a dorsal
and a ventral attentional system, whereby the ventral attentional system is engaged by
novel or unexpected sensory input while the dorsal attentional system is active during en
dogenously guided expectations, such as the orientation of attention to a particular
spatial location. Overall, the orienting and engagement of attention in musical tasks recruits
these attention systems. Monitoring for target musical events and the targets themselves
cause activity increases in the ventral system—in the VLPFC in the region of the frontal
operculum where the IFG meets the anterior insula (Janata et al., 2002a; Koelsch et al.,
2002c; Tillmann et al., 2003). The strongest activation arises when the targets violate har
monic expectations (Maess et al., 2001; Tillmann et al., 2003).
Structural markers in music, such as the boundaries of phrases or the breaks between
movements, also cause the ventral and dorsal attentional systems to become engaged
(Nan et al., 2008; Sridharan et al., 2007), with the ventral system leading the dorsal sys
tem (Sridharan et al., 2007). Attentive listening to excerpts of polyphonic music engages
both systems even in the absence of specific targets or boundary markers (Janata et al.,
2002a; Satoh et al., 2001). Interestingly, the ventral attentional system is engaged, bilat
erally, when (1) an attentive listening task requires target detection in either selective or
divided attention conditions, or (2) selective listening is required without target detec
tion, but not during divided/global listening without target detection. When task demands
are shifted from target detection to focusing attention on an instrument as though one
were trying to memorize the part the instrument is playing, working memory areas in the
dorsolateral prefrontal cortex (DLPFC) are recruited bilaterally (Janata et al., 2002a).
Given the integral relationship between attention and timing (Jones, 1976; Large & Jones,
1999), elements of the brain’s attention and action systems interact when attention is fo
cused explicitly on timing judgments, and this interaction is modulated by individual dif
ferences in listening style. For example, while frontoparietal attention areas, together
with auditory and premotor areas, are engaged overall, greater activation is observed
within the ventral attentional system in those individuals who tend to orient their atten
tion toward a longer, rather than a subdivided, beat period (Grahn & McAuley, 2009).
Memory
Music depends on many forms of memory that aid in its perception and production. For
example, we form an implicit understanding of tonal structure that allows us to form ex
pectations and detect violations of those expectations. We also store knowledge about
musical styles in terms of the harmonic progressions, timbres, and orchestration that
characterize them. Beyond the memory for structural aspects of music are memories for
specific pieces of music or autobiographical memories that may be inextricably linked to
those pieces of music. We also depend on working memory to play music in our minds, ei
ther when imagining a familiar song that we have retrieved from long-term memory or
when repeating a jingle from a commercial that we just heard. Because linguistic
material is often an integral part of music (e.g., the lyrics of songs), the set of memory processes
that needs to be considered in association with music necessarily extends to include those
associated with language. Two questions that arise (p. 120) are, How do the different
memory systems interact, and how might they be characterized in terms of their overlap
with memory systems identified using different tasks and sensory modalities?
Working Memory
An early neuroimaging study of musical processes used short eight-note melodies and
found that parietal and lateral prefrontal areas were recruited when subjects had to com
pare the pitch of the first and last notes (Zatorre et al., 1994). This result suggested that
there is an overlap of musical working memory with more general working memory sys
tems. Direct evidence that verbal working memory and tonal working memory share the
same neural substrates—auditory, lateral prefrontal, and parietal cortices—was obtained
in two studies in which verbal and tonal material was presented and rehearsed in sepa
rate trials (Hickok et al., 2003) or in which the stimuli were identical but the task instruc
tions emphasized encoding and rehearsal of either the verbal or tonal material (Koelsch
et al., 2009). Further studies relevant to musical working memory are discussed below in
the section on musical imagery.
Episodic Memory
If we have a melody running through our minds, that is, if we are maintaining a melody in
working memory, it is likely the consequence of having heard and memorized the melody
at some point in the past. The melody need not even be one that we heard repeatedly dur
ing our childhood (remote episodic memory), but could be one that we heard for the first
time earlier in the experimental session (recent episodic memory). Neuroimaging experi
ments have examined both types of episodic memory.
One possible reason for the discrepant findings in the anterior temporal lobes is the choice
of neuroimaging technique, in that the positron emission tomography studies (Platel et al.,
2003; Satoh et al., 2006) were not susceptible to signal loss in those regions as were the
fMRI experiments (Janata, 2009; Plailly et al., 2007). Another is the use of complex
recorded musical material compared with monophonic melodies. Familiarity judgments
about monophonic melodies must be based solely on the pitch and temporal cues of a sin
gle melodic line, whereas recordings of orchestral or popular music contain a multitude
of timbral, melodic, harmonic, and rhythmic cues that can facilitate familiarity judgments.
Recruitment of the anterior temporal lobes is consistent with the neuropsychological evi
dence of melody processing and recognition deficits following damage to those areas (Ay
otte et al., 2000; Peretz, 1996). Indeed, when engaged in a recognition memory test in
which patients were first presented with twenty-four unfamiliar folk tune fragments, and
then made old/new judgments, those with right temporal lobe excisions were mainly im
paired on tune recognition, whereas left temporal lobe excisions resulted in impaired
recognition of the words (Samson & Zatorre, 1991). When tunes and words were com
bined, new words paired with old tunes led to impaired tune recognition in both patient
groups, suggesting some sort of text–tune integration process involving the temporal
lobes of both hemispheres.
Absolute Pitch
The rare ability to accurately generate the note name for a tone played in isolation with
out an external referent is popularly revered as a highly refined musical ability. Neu
roimaging experiments indicate that regions of the left DLPFC that are associated with
working memory functions become more active when absolute pitch possessors passively
listen to or make melodic interval categorization judgments about pairs of tones relative
to musically trained individuals without absolute pitch (Zatorre et al., 1998). The hypothe
sis that these regions become more active because of the process of associating a note
with a label is supported by the observation that nonmusician subjects who are trained to
associate chords with arbitrary numbers show activity within this region during that task
following training (Bermudez & Zatorre, 2005). Although the classic absolute pitch ability
as defined above is rare, the ability to judge at above-chance levels whether a recording of
popular music has been transposed by one or two semitones is common (Schellenberg &
Trehub, 2003), although not understood at a neuroscientific level.
Semantics
The question of whether music conveys meaning is another interesting point of compari
son between music and language. Music lacks sufficient specificity to unambiguously con
vey relationships between objects and concepts, but it can be evocative of scenes and
emotions (e.g., Beethoven’s Pastoral Symphony). Even without direct reference to lan
guage, music specifies relationships among successive tonal elements (e.g., notes and
chords), by virtue of the probability structures that govern music’s movement in tonal
space. Less expected transitions create a sense of tension, whereas expected transitions
release tension (Lerdahl & Krumhansl, 2007). Similarly, manipulations of timing (e.g.,
tempo, rhythm, and phrasing) parallel concepts associated with movement.
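The idea that tension tracks transition probability can be caricatured with a first-order Markov model over chord functions, scoring each transition by its surprisal (-log2 of its probability). The probability values below are invented for illustration; a real model would estimate them from a musical corpus.

```python
import numpy as np

chords = ["I", "IV", "V", "vi"]
# Invented first-order transition probabilities between chord functions
# (each row sums to 1); real values would be estimated from a corpus.
p = np.array([
    [0.10, 0.40, 0.40, 0.10],   # from I
    [0.30, 0.10, 0.50, 0.10],   # from IV
    [0.60, 0.10, 0.10, 0.20],   # from V
    [0.20, 0.40, 0.30, 0.10],   # from vi
])

def surprisal(progression):
    """Surprisal (-log2 p) of each chord-to-chord transition in the sequence."""
    idx = [chords.index(c) for c in progression]
    return [-float(np.log2(p[a, b])) for a, b in zip(idx, idx[1:])]
```

Under this toy model the deceptive cadence V-vi carries more surprisal than the authentic cadence V-I, matching the intuition that the less expected transition creates more tension.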
Evidence of music’s ability to interact with the brain’s semantic systems comes from two
elegant studies that make use of simultaneously presenting musical and linguistic materi
al. The first (Koelsch et al., 2004) used short passages of real music to prime semantic
contexts and then examined the ERP response to probe words that were either semanti
cally congruous or incongruous with the concept ostensibly primed by the music. The in
congruous words elicited an N400, indicating that the musical passage had indeed
primed a meaningful concept as intended. The second (Steinbeis & Koelsch, 2008) used
simultaneously presented words and chords, along with a moderately demanding dual
task in which attention had to be oriented to both the musical and linguistic information,
and found that semantically incongruous words affected the processing of irregular
(Neapolitan) chords. The semantic incongruities did not affect the ERAN, but rather af
fected the N500, a late frontal negativity that follows the ERAN in response to incongru
ous chords and is interpreted as a stage of contextual (p. 122) integration of the anom
alous chord (Koelsch et al., 2000).
Action
Although technological advances over the past few decades have made it possible to se
quence sounds and produce music without the need for a human to actively generate
each sound, music has been and continues to be intimately dependent on the motor sys
tems of the human brain. Musical performance is the obvious context in which the motor
systems are engaged, but the spectrum of actions associated with music extends beyond
overt playing of an instrument. Still within the realm of overt action are movements that
are coordinated with the music, be they complex dance moves or the simple tapping, nod
ding, or bobbing along with a perceived beat. Beyond that is the realm of covert action, in
which no overt movements are detectable, but movements or sensory inputs are imag
ined. Even expectancy, the formation of mental images of anticipated sensory input, can
be viewed as a form of action (Fuster, 2001; Schubotz, 2007).
As described above, listening to rhythms in the absence of overt action drives activity
within premotor areas, the basal ganglia, and the cerebellum. Here, we examine musical
engagement of the brain’s action system when some form of action, either overt or
covert, is required. One of the beautiful things about music is the ability to engage the ac
tion system across varying degrees of complexity and still have it be a compelling musical
experience, from simple isochronous synchronization with the beat to virtuosic polyrhyth
mic performance on an instrument. Within other domains of cognitive neuroscience, there
is an extensive literature on timing and sequencing behaviors that shows differential en
gagement of the action systems as a function of complexity (Janata & Grafton, 2003),
which is largely echoed in the emerging literature pertaining explicitly to music.
Sensorimotor Coordination
Tapping
Perhaps the simplest form of musical engagement is tapping isochronously with a
metronome, whereby a sense of meter is imparted through the periodic accentuation of a
subset of the pacing events (e.g., accenting every other beat to impart the sense of a
march or every third beat to impart the sense of a waltz). In these types of situations, a
very simple coupling between the posterior auditory cortex and dorsal premotor areas is
observed, in which the strength of the response in these two areas is positively correlated
with the strength of the accent that drives the metric salience and the corresponding be
havioral manifestation of longer taps to more salient events (Chen et al., 2006). When the
synchronization demands increase as simple and complex metric and then nonmetric
Basic synchronization with a beat also provides a basis for interpersonal synchronization
and musical communication. Simultaneous electroencephalogram (EEG) recordings from
guitarists given the task of mentally synchronizing with a metronome and then commenc
ing to play a melody together reveal that EEG activity recorded from electrodes situated
above premotor areas becomes synchronized both within and between the performers
(Lindenberger et al., 2009). Although the degree of interpersonal synchronization that
arises by virtue of shared sensory input is difficult to estimate, such simultaneous record
ing approaches are bound to shape our understanding of socioemotional aspects of senso
rimotor coordination.
Singing
The adjustment of one’s own actions based on sensory feedback is an important part of
singing. In its simplest form, the repeated singing/chanting of a single note, in compari
son to passive listening to complex tones, recruits auditory cortex, motor cortex, the
SMA, and the cerebellum, with possible involvement of the anterior cingulate and basal
ganglia (Brown et al., 2004b; Perry et al., 1999; Zarate & Zatorre, 2008). However, as
pitch regulation demands are increased through electronic shifting of the produced pitch,
additional premotor and attentional control regions such as the pSMA, ventral PMC,
basal ganglia, and intraparietal sulcus are recruited across tasks that require the singer
to either ignore the shift or try to compensate for it. The exact network of recruited areas
depends on the amount of singing experience (Zarate & Zatorre, 2008). In regard to more
melodic material, (p. 123) repetition of, or harmonization with, a melody also engages the
anterior STG relative to monotonic vocalization (Brown et al., 2004b), although this acti
vation is not seen during the singing of phrases from an aria that is familiar to the subject
(Kleber et al., 2007).
Performance
Tasks involving the performance of complex musical sequences, beyond those that are re
produced via the imitation of a prescribed auditory pattern, afford an opportunity to ob
serve the interaction of multiple brain systems. Performance can be externally guided by
a musical score, internally guided as in the case of improvisation, or some combination of
the two.
Score-Based
One of the first neuroimaging studies of musical functions examined performance of a
Bach partita from a score and the simpler task of playing scales contrasted with listening
to the corresponding music (Sergent et al., 1992). Aside from auditory, visual, and motor
areas recruited by the basic processes of hearing, score reading, and motor execution,
parietal regions were engaged, presumably by the visuomotor transformations associated
with linking the symbols in the score with a semantic understanding of those symbols as
well as associated actions (Bevan et al., 2003; McDonald, 2006; Schön et al., 2001; 2002).
In addition, left premotor and IFG areas were engaged, presumably reflecting some of the
sequencing complexity associated with the partita. A similar study (Parsons et al., 2005)
in which bimanual performance of memorized Bach pieces was compared with bimanual
playing of scales, found extensive activation of medial and lateral premotor areas, anteri
or auditory cortex, and subcortical activations in the thalamus and basal ganglia, presum
ably driven by the added demands of retrieving and executing complex sequences from
memory.
Other studies complicate the interpretation that premotor cortices are driven by greater
complexity in the music played. For example, a study that separately manipulated melodic
and rhythmic complexity found some areas that were biased toward processing melodic informa
tion (mainly in the STG and calcarine sulcus), whereas others were biased toward pro
cessing rhythmic information (left inferior frontal cortex and inferior temporal gyrus), but
there was no apparent activation of premotor areas (Bengtsson & Ullen, 2006).
Improvised
Music, like language, is often improvised with the intent of producing a syntactically (and semantically) coherent stream of auditory events. When given the task of continuing an unfamiliar melody or linguistic phrase with an improvised sung melody or spoken phrase, a distributed set of brain areas is engaged in common for music and language, including the SMA, motor cortex, putamen, globus pallidus, cerebellum, posterior auditory cortex, and lateral inferior frontal cortex (Brodmann area 44/45), although the extent of activation in area 44/45 is greater for language (Brown et al., 2006). Lateral premotor areas are consistently found to be active during improvisation tasks that involve varying degrees of realism in piano performance. For instance, when unimanual production of melodies is constrained by a five-key keyboard and by instructions that independently vary the amount of melodic or rhythmic freedom available to the subject, activity in mid-dorsal premotor cortex is modulated by complexity along both dimensions (Berkowitz & Ansari, 2008). A similar region is recruited during unimanual production while improvising around a visually presented score, both when the improvised performance must be memorized and when it is improvised freely without memorization (Bengtsson et al., 2007). A more dorsal premotor area is also engaged during this type of improvisation, mirroring effects found in a study of unimanual improvisation in which free improvisation without a score was contrasted with playing a jazz melody from memory (Limb & Braun, 2008). The latter study observed activation within an extensive network encompassing ventrolateral prefrontal (Brodmann area 44), middle temporal, parietal, and cerebellar areas. Emotion areas in the ventromedial prefrontal cortex were also active during improvisation, providing the first neuroimaging evidence of how motor control areas are coupled with affective areas during a highly naturalistic task. Interestingly, both of the studies in which improvisation was least constrained also found substantial activation in extrastriate visual cortices that could not be attributed to visual input or score reading, suggesting that visual mental imagery processes may accompany improvisation. Note that in all of these studies the subjects were musicians, often with high levels of training.
Imagery
Auditory Imagery
Activation of auditory association cortices is found with fMRI or PET when subjects mentally sing through a short melody in order to compare the pitch of two notes corresponding to specific words in the lyrics (Zatorre et al., 1996), or continue imagining the notes that follow the opening fragment of a television show theme song (Halpern & Zatorre, 1999). The activation of auditory areas is corroborated by EEG/MEG studies in which responses to an imagined note (Janata, 2001) or expected chord (Janata, 1995; Otsuka et al., 2008) closely resemble auditory evoked potentials with known sources in the auditory cortex (e.g., the N100). One study that used actual CD recordings of instrumental and vocal music found extensive activation of auditory association areas during silent gaps that were inserted into the recordings, with some activation of the primary auditory cortex when the gaps occurred in instrumental music (Kraemer et al., 2005). However, another study that used actual CD recordings to examine anticipatory imagery (the phenomenon of imagining the next track on a familiar album as soon as the current one ends) found no activation of the auditory cortices during the imagery period, but extensive activation of a frontal and premotor network (Leaver et al., 2009).
Premotor areas, in particular the SMA, as well as frontal regions associated with memory retrieval, have been activated in most neuroimaging studies of musical imagery that emphasize the auditory components (Halpern & Zatorre, 1999; Leaver et al., 2009; Zatorre et al., 1996), even under relatively simple conditions that could be regarded as maintenance of items in working memory during same/different comparison judgments of melodies or harmonized melodies lasting 4 to 6 seconds (Brown & Martinez, 2007). It has been argued, however, on the basis of comparing activations in an instrumental timbre imagery task with those in a visual object imagery task, that the frontal contribution may arise from general imagery task demands (Halpern et al., 2004). Nonetheless, effortful musical imagery tasks, such as those requiring the imagining of newly learned pairings of novel melodies (Leaver et al., 2009), imagery of expressive short phrases from an aria (Kleber et al., 2007), or imagining the sound or actions associated with a Mozart piano sonata when only the other modality is presented (Baumann et al., 2007), appear to be associated with activity in a widespread network of cortical and subcortical areas. This network matches well with elements of both the ventral and dorsal attentional networks (Corbetta & Shulman, 2002) and with the network observed when attentive listening to polyphonic music is contrasted with rest (Janata et al., 2002a).
Motor Imagery
Several studies have focused on motor imagery. In a group of pianists, violinists, and cellists, imagined performance of rehearsed pieces from the classical repertoire recruited frontal and parietal areas bearing resemblance to the dorsal attention network, together with the SMA, subcortical areas, and the cerebellum (Langheim et al., 2002). Imagining performing the right-hand part of one of Bartok's Mikrokosmos pieces while reading the score similarly activates the dorsal attentional network along with visual areas and the cerebellum (Meister et al., 2004). Interestingly, the SMA is not activated significantly when the source of the information to be imagined is external rather than internal (i.e., playing from memory), indicating that premotor and parietal elements of the dorsal attentional system coordinate with other brain regions based on the specific demands of the particular imagery task.
Emotion
The relationship between music and emotion is a complex one, and multiple mechanisms have been postulated through which music and the emotion systems of the brain can interact (Juslin & Vastfjall, 2008). Compared with the rather restricted set of paradigms that have been developed for probing the structural representations of music (e.g., tonality), the experimental designs for examining neural correlates of emotion in music are diverse. The precise emotional states that are captured in any given experiment, and their relevance to real music-listening experiences, are often difficult to discern when the actual comparisons between experimental conditions are considered carefully. Manipulations have tended to fall into one of two categories: (1) normal music contrasted with the same music rendered dissonant or incoherent, or (2) selection or composition of musical stimuli to fall into discrete affective categories (e.g., happy, sad, fearful). In general, studies have found modulation of activity within limbic system areas of the brain.
When the relative dissonance of a mechanical performance of a piano melody is varied via the dissonance of the accompanying chords, activity in the right parahippocampal gyrus correlates positively with increases in dissonance and perceived unpleasantness, whereas activity in the right orbitofrontal cortex and subcallosal cingulate increases as consonance and perceived pleasantness increase (Blood et al., 1999). Similarly, when listening to pleasant dance music spanning a range of mostly classical genres is contrasted with listening to the same excerpts rendered dissonant and displeasing by mixing the original with two copies pitch-shifted by a minor second and a tritone, medial temporal areas (the left parahippocampal gyrus, hippocampus, and bilateral amygdala) respond more strongly to the dissonant music, whereas areas more typically associated with listening to music (the auditory cortex, the left IFG, anterior insula and frontal operculum, and ventral premotor cortex) respond more strongly to the original pleasing versions (Koelsch et al., 2006). The same stimulus materials produce stronger theta activity along anterior midline sites in response to the pleasing music (Sammler et al., 2007). Listening to coherent excerpts of music rather than their temporally scrambled counterparts (Levitin & Menon, 2003) increases activity in parts of the dopaminergic pathway, namely the ventral tegmental area and nucleus accumbens (Menon & Levitin, 2005). These regions interact and are functionally connected with the left IFG, insula, hypothalamus, and orbitofrontal cortex, thus delineating a set of emotion-processing areas of the brain that are activated by relatively pleasing music-listening experiences.
The results of the above-mentioned studies are somewhat heterogeneous and challenging to interpret because they depend on comparisons of relatively normal (and pleasing) music-listening experiences with highly abnormal (and displeasing) listening experiences, rather than comparisons of normal pleasing listening experiences with normal displeasing experiences. Nonetheless, modulation of the brain's emotion circuitry is also observed when the statistical contrasts do not involve distorted materials. Listening to unfamiliar, pleasing popular music, compared with silent rest, activates the hippocampus, nucleus accumbens, ventromedial prefrontal cortex, right temporal pole, and anterior insula (Brown et al., 2004a). When subjects listen to excerpts of unfamiliar and familiar popular music, activity in the VMPFC increases as the degree of experienced positive affect increases (Janata, 2009). Somewhat paradoxically, listening to music that elicits chills (goosebumps or shivers down the spine), something considered by many to be highly pleasing, reduces activity in the VMPFC (where activity increases tend to be associated with positive emotional responses), and activity in the right amygdala and the left hippocampus/amygdala decreases as well (Blood & Zatorre, 2001). Activity in other brain areas associated with positive emotional responses, such as the ventral striatum and orbitofrontal cortex, increases, along with activity in the insula, premotor areas (SMA), and the cerebellum.
The amygdala has been of considerable interest, given its general role in the processing of fearful stimuli. Patients with either unilateral or bilateral damage to the amygdala show impaired recognition of scary music and difficulty differentiating peaceful music from sad music (Gosselin et al., 2005, 2007). Right amygdala damage, in particular, leaves patients unable to distinguish intended fear in music from either positive or negative affective intentions (Gosselin et al., 2005). Chord sequences that contain irregular chord functions and elicit activity in the VLPFC are also rated as less pleasing and elicit bilateral activity in the amygdala (Koelsch et al., 2008). Violations of syntactic expectations also increase the perceived tension in a piece of music and are associated with changes in electrodermal activity, a measure of emotional arousal (Steinbeis et al., 2006).
Perhaps the most common association between music and emotion is the relationship between affective valence and the mode of the music: The minor mode is consistently associated with sadness, whereas the major mode is associated with happiness. Brain activations associated with mode manipulations are not as consistent across studies, however. In one study (Khalfa et al., 2005), the intended emotions of classical music pieces played on the piano were assessed on a five-point bivalent scale (sad to happy). Relative to major pieces, minor pieces elicited activity in the posterior cingulate and in the medial prefrontal cortex, whereas pieces in the major mode were not associated with any activity increases relative to minor pieces. A similar absence of response to major mode melodies relative to minor mode was observed in a different study in which unfamiliar monophonic melodies were used (Green et al., 2008). However, minor mode melodies elicited activity in the left parahippocampal gyrus and rostral anterior cingulate, indicating engagement of the limbic system, albeit in a unique constellation. A study in which responses to short four-chord sequences that established either major or minor tonalities were compared with responses to repeated chords found bilateral activation of the IFG, irrespective of the mode (Mizuno & Sugishita, 2007). This result was consistent with the role of this region in the evaluation of musical syntax, but inconsistent with the other studies comparing major and minor musical material. Finally, a study using recordings of classical music that could be separated into distinct happy, sad, and neutral categories (Mitterschiffthaler et al., 2007) found that happy and sad excerpts strongly activated the auditory cortex bilaterally relative to neutral music. Responses to happy and sad excerpts (relative to neutral) were differentiated in that happy music elicited activity within the ventral striatum, several sections of the cingulate cortex, and the parahippocampal gyrus, whereas sad music was associated with activity in a region spanning the right hippocampus and amygdala, along with cingulate regions.
Unsurprisingly, the auditory cortex has been the specific target of several investigations of structural differences associated with musical training. An early investigation observed a larger planum temporale in the left hemisphere among musicians, although the effect was primarily driven by musicians with absolute pitch (Schlaug et al., 1995). In studies utilizing larger numbers of musically trained and untrained subjects, the volume of HG, where the primary and secondary auditory areas are situated, was found to increase with increasing musical aptitude (Schneider et al., 2002, 2005). The volumetric measures were positively correlated with the strength of the early (19–30 ms) stages of the evoked responses to amplitude-modulated pure tones (Schneider et al., 2002). Within the lateral extent of HG, volume was positively correlated with the magnitude of a slightly later peak (50 ms post-stimulus) in the waveform elicited by sounds consisting of several harmonics. Remarkably, the hemispheric asymmetry in the volume of this region was indicative of the mode of perceptual processing of these sounds, with larger left-hemisphere volumes reflecting a bias toward processing the implied fundamental frequency of the sounds and larger right-hemisphere volumes indicating a bias toward spectral processing (Schneider et al., 2005). In general, the auditory cortex appears to respond more strongly to musical sounds in musicians (Pantev et al., 1998), and as a function of the instrument with which they have had the most experience (Margulis et al., 2009).
Whole-brain analyses using techniques such as cortical thickness mapping or voxel-based morphometry (VBM) have also revealed differences between individuals trained on musical instruments and those with no training, although there is considerable variability in the regions identified across studies, possibly a consequence of differences in the composition of the samples and in the mapping technique used (Bermudez et al., 2009). Certain findings, such as greater volume in trained pianists of primary motor and somatosensory areas and of cerebellar regions responsible for hand and finger movements (Gaser & Schlaug, 2003), are relatively easy to interpret, and they parallel findings of stronger evoked responses in the hand regions of the right hemisphere, which controls the left (fingering) hand of violinists (Pascual-Leone et al., 1994). Larger volumes in musicians are also observed in lateral prefrontal cortex, both ventrally (Bermudez et al., 2009; Gaser & Schlaug, 2003; Sluming et al., 2002) and dorsally along the middle frontal gyrus (Bermudez et al., 2009). However, one particular type of musical aptitude, absolute pitch, is associated with decreased cortical thickness in dorsolateral frontal cortex, in areas similar to those that show increased activation in listeners with absolute pitch relative to other musically trained subjects (Zatorre et al., 1998). Another paradox presented by VBM is the finding of greater gray-matter volumes in amusic subjects in some of the same ventrolateral regions that show greater cortical thickness in musicians (Bermudez et al., 2009; Hyde et al., 2007).
The anatomical differences that are observed as a function of musical training are perhaps better placed into a functional context when one observes the effects of short-term training on neural responses. Nonmusicians who were trained over the course of two weeks to play a cadence consisting of broken chords on a piano keyboard exhibited a stronger MMN to deviant notes in similar note patterns, compared either with their responses before receiving training or with a group of subjects who were trained by listening to and making judgments about the sequences performed by the trained group (Lappe et al., 2008). Similarly, nonmusicians who, over the course of 5 days, learned to perform five-note melodies showed greater activation bilaterally in the dorsal IFG (Broca's area) and lateral premotor areas when listening to the trained melodies compared with listening to comparison melodies on which they had not trained (Lahav et al., 2007). Thus, perceptual responses are stronger following sensorimotor training within the networks that are utilized during the training. When the training involves reading music from a score, medial superior parietal areas also show effects of training (Stewart et al., 2003). Both mental and physical practice of five-finger piano exercises can strengthen finger representations in the motor cortex across a series of days, as measured by reduced transcranial magnetic stimulation (TMS) thresholds for eliciting movements (Pascual-Leone et al., 1995).
Disorders
Musical behaviors, like any other behaviors, are disrupted when the functioning of the neural substrates that support those behaviors is impaired. Neuropsychological investigations provide intriguing insights into component processes in musical behaviors and their likely locations in the brain. Aside from the few studies mentioned in the sections above of groups of patients who underwent brain surgery, there are many case studies documenting the effects of brain insults, typically caused by stroke, on musical functions (Brust, 2003). A synthesis of the findings from these many studies (Stewart et al., 2006) is beyond the scope of this chapter, as is a discussion of the burgeoning topic of using music for neurorehabilitation (Belin et al., 1996; Sarkamo et al., 2008; Thaut, 2005). Here, the discussion of brain disorders in relation to music is restricted to amusia, a music-specific disorder.
Amusia
Impaired identification of pitch direction, but not basic pitch discrimination, has been observed in patients with right temporal lobe excisions that encroach on auditory cortex in HG (Johnsrude et al., 2000), a finding that is likely to underlie the bias toward the right hemisphere for the processing of melodic information (Warrier & Zatorre, 2004). A diffusion tensor imaging study of amusics and normal controls found that the volume of the superior arcuate fasciculus in the right hemisphere was consistently smaller in amusics than in normal controls (Loui et al., 2009a). The arcuate fasciculus connects the temporal and frontal lobes, specifically the posterior superior and middle temporal gyri and the IFG. VBM results additionally indicate a structural anomaly in a small region of the IFG in amusics (Hyde et al., 2006, 2007). Taken together, the neuroimaging studies that have implicated the IFG in the processing of musical syntax and temporal structure (Koelsch et al., 2002c, 2005b; Levitin & Menon, 2003; Maess et al., 2001; Patel, 2003; Tillmann et al., 2003), the behavioral and structural imaging data from amusics, and the studies of deficits in melody processing in right temporal lobe lesion patients support the view that the ability to perceive, appreciate, and remember melodies depends in large part on intact functioning of a perception/action circuit in the right hemisphere (Fuster, 2000; Loui et al., 2009a).
Musical stimuli and musical tasks are capable of reaching nearly every part of the brain. Given this complexity, is there any hope of generating process models of musical functions and behaviors that can lead to a grand unified theory of music and the brain?
The obligatory components of process models are boxes with arrows between them, in which each box refers to a discrete function and, perhaps, an associated brain area. Such models have been proposed with respect to music (Koelsch & Siebel, 2005; Peretz & Coltheart, 2003). Within such models, some component processes are considered music specific, whereas others represent processes shared with other brain functions (e.g., language, emotion). The issue of music specificity, or modularity of musical functions, is of considerable interest, in large part because of its evolutionary implications (Peretz, 2006). The strongest evidence for modularity of musical functions derives from studies of individuals with brain damage in whom specific musical functions are selectively impaired (Peretz, 2006; Stewart et al., 2006). Such specificity in loss of function is remarkable given the usual extent of brain damage. However, inferences about modularity must be tempered when not all possible parallel functions in other domains have been considered. The processing functions of the right lateral temporal lobes provide a nice example. As reviewed in this chapter and elsewhere (Zatorre et al., 2002), the neuropsychological and functional and structural neuroanatomical evidence suggests that the auditory cortex in the right hemisphere is more specialized than the left for the processing of pitch, pitch relationships, and thereby melody. Voice-selective regions of the auditory cortex are highly selective in the right hemisphere (Belin et al., 2000), and in general a large extent of the right lateral temporal lobes appears to be important for the processing of emotional prosody (Ethofer et al., 2009; Ross & Monnot, 2008; Schirmer & Kotz, 2006). Given evidence of parallels between melody and prosody, such as the connotation of sadness by an interval of a minor third in both music and speech (Curtis & Bharucha, 2010), or deficits among amusic individuals in processing speech intonation contours (Patel et al., 2005), it is likely that contour-related music and language functions are highly intertwined in the right temporal lobe.
Perhaps more germane to the question of how the brain processes music is the question of how one engages with the music. Few would doubt that the brain of someone dancing a tango at a club would show more extensive engagement with the music than that of someone incidentally hearing music while choosing which cereal to buy at a supermarket. It would therefore seem that any given result from the cognitive neuroscience of music literature must be interpreted with regard to the experiential situation of the participants, both in terms of the affective, motivational, and task/goal states they might find themselves in, and in terms of how the (often abstracted) musical stimuli they are asked to interact with relate to the music they would normally encounter. In this regard, approaches that explicitly relate the dynamic, time-varying properties of real musical stimuli to the behaviors and brain responses they engender will become increasingly important.
Figure 7.2 is a highly schematized (and incomplete) summary of the sets of brain areas that may be recruited by different musical elements and by the more general processes that mediate musical experiences. It is not intended to be a process model or to represent the outcome of a quantitative meta-analysis. Rather, it serves mainly to suggest that the cognitive neuroscience of music might serve as a model system for understanding the coordination of the many processes that underlie creative goal-directed behavior in humans.
References
Adler, D. S. (2009). Archaeology: The earliest musical tradition. Nature, 460, 695–696.
Ayotte, J., Peretz, I., & Hyde, K. (2002). Congenital amusia: A group study of adults afflicted with a music-specific disorder. Brain, 125, 238–251.
Ayotte, J., Peretz, I., Rousseau, I., Bard, C., & Bojanowski, M. (2000). Patterns of music agnosia associated with middle cerebral artery infarcts. Brain, 123 (Pt 9), 1926–1938.
Bangert, M., Peschel, T., Schlaug, G., Rotte, M., Drescher, D., Hinrichs, H., Heinze, H.-J.,
& Altenmüller, E. (2006). Shared networks for auditory and motor processing in profes
sional pianists: Evidence from fMRI conjunction. NeuroImage, 30, 917–926.
Barnes, R., & Jones, M. R. (2000). Expectancy, attention, and time. Cognitive Psychology,
41, 254–311.
Baumann, S., Koeneke, S., Schmidt, C. F., Meyer, M., Lutz, K., & Jancke, L. (2007). A network for audio-motor coordination in skilled pianists and non-musicians. Brain Research, 1161, 65–78.
Beisteiner, R., Erdler, M., Mayer, D., Gartus, A., Edward, V., Kaindl, T., Golaszewski, S., Lindinger, G., & Deecke, L. (1999). A marker for differentiation of capabilities for processing of musical harmonies as detected by magnetoencephalography in musicians. Neuroscience Letters, 277, 37–40.
Belin, P., VanEeckhout, P., Zilbovicius, M., Remy, P., Francois, C., Guillaume, S., Chain, F., Rancurel, G., & Samson, Y. (1996). Recovery from nonfluent aphasia after melodic intonation therapy: A PET study. Neurology, 47, 1504–1511.
Belin, P., Zatorre, R. J., Lafaille, P., Ahad, P., & Pike, B. (2000). Voice-selective areas in human auditory cortex. Nature, 403, 309–312.
Bengtsson, S. L., Csikszentmihalyi, M., & Ullen, F. (2007). Cortical regions involved in the
generation of musical structures during improvisation in pianists. Journal of Cognitive
Neuroscience, 19, 830–842.
Bengtsson, S. L., & Ullen, F. (2006). Dissociation between melodic and rhythmic processing during piano performance from musical scores. NeuroImage, 30, 272–284.
Berkowitz, A. L., & Ansari, D. (2008). Generation of novel motor sequences: The neural
correlates of musical improvisation. NeuroImage, 41, 535–543.
Bermudez, P., Lerch, J. P., Evans, A. C., & Zatorre, R. J. (2009). Neuroanatomical correlates of musicianship as revealed by cortical thickness and voxel-based morphometry. Cerebral Cortex, 19, 1583–1596.
Bermudez, P., & Zatorre, R. J. (2005). Conditional associative memory for musical stimuli
in nonmusicians: Implications for absolute pitch. Journal of Neuroscience, 25, 7718–7723.
Besson, M., & Faïta, F. (1995). An event-related potential (ERP) study of musical expectancy: Comparison of musicians with nonmusicians. Journal of Experimental Psychology: Human Perception and Performance, 21, 1278–1296.
Besson, M., & Macar, F. (1987). An event-related potential analysis of incongruity in music and other non-linguistic contexts. Psychophysiology, 24, 14–25.
Bevan, A., Robinson, G., Butterworth, B., & Cipolotti, L. (2003). To play “B” but not to say
“B”: Selective loss of letter names. Neurocase, 9, 118–128.
Bigand, E., Vieillard, S., Madurell, F., Marozeau, J., & Dacquet, A. (2005). Multidimensional scaling of emotional responses to music: The effect of musical expertise and of the duration of the excerpts. Cognition & Emotion, 19, 1113–1139.
Blood, A. J., & Zatorre, R. J. (2001). Intensely pleasurable responses to music correlate with activity in brain regions implicated in reward and emotion. Proceedings of the National Academy of Sciences U S A, 98, 11818–11823.
Blood, A. J., Zatorre, R. J., Bermudez, P., & Evans, A. C. (1999). Emotional responses to
pleasant and unpleasant music correlate with activity in paralimbic brain regions. Nature
Neuroscience, 2, 382–387.
Brown, S., & Martinez, M. J. (2007). Activation of premotor vocal areas during musical
discrimination. Brain and Cognition, 63, 59–69.
Brown, S., Martinez, M. J., & Parsons, L. M. (2004a). Passive music listening spontaneously engages limbic and paralimbic systems. Neuroreport, 15, 2033–2037.
Brown, S., Martinez, M. J., & Parsons, L. M. (2006). Music and language side by side in
the brain: A PET study of the generation of melodies and sentences. European Journal of
Neuroscience, 23, 2791–2803.
Brown, S., Martinez, M. J., Hodges, D. A., Fox, P. T., & Parsons, L. M. (2004b). The song
system of the human brain. Cognitive Brain Research, 20, 363–375.
Brust, J. C. M. (2003). Music and the neurologist: A historical perspective. In I. Peretz &
R. J. Zatorre (Eds.), Cognitive neuroscience of music (pp. 181–191). Oxford, UK: Oxford
University Press.
Caclin, A., Brattico, E., Tervaniemi, M., Naatanen, R., Morlet, D., Giard, M. H., & McAdams, S. (2006). Separate neural processing of timbre dimensions in auditory sensory memory. Journal of Cognitive Neuroscience, 18, 1959–1972.
Caclin, A., Giard M.-H., Smith, B. K., & McAdams, S. (2007). Interactive processing of
timbre dimensions: A Garner interference study. Brain Research, 1138, 159–170.
Caclin, A., McAdams, S., Smith, B. K., & Giard, M. H. (2008). Interactive processing of
timbre dimensions: An exploration with event-related potentials. Journal of Cognitive
Neuroscience, 20, 49–64.
Caclin, A., McAdams, S., Smith, B. K., & Winsberg, S. (2005). Acoustic correlates of timbre space dimensions: A confirmatory study using synthetic tones. Journal of the Acoustical Society of America, 118, 471–482.
Carrion, R. E., & Bly, B. M. (2008). The effects of learning on event-related potential correlates of musical expectancy. Psychophysiology, 45, 759–775.
Chen, J. L., Penhune, V. B., & Zatorre, R. J. (2008a). Listening to musical rhythms recruits
motor regions of the brain. Cerebral Cortex, 18, 2844–2854.
Chen, J. L., Penhune, V. B., & Zatorre, R. J. (2008b). Moving on time: Brain network for auditory-motor synchronization is modulated by rhythm complexity and musical training. Journal of Cognitive Neuroscience, 20, 226–239.
Chen, J. L., Zatorre, R. J., & Penhune, V. B. (2006). Interactions between auditory and dorsal premotor cortex during synchronization to musical rhythms. NeuroImage, 32, 1771–1781.
Corbetta, M., & Shulman, G. L. (2002). Control of goal-directed and stimulus-driven attention in the brain. Nature Reviews, Neuroscience, 3, 201–215.
Curtis, M. E., & Bharucha, J. J. (2010). The minor third communicates sadness in speech,
mirroring its use in music. Emotion, 10, 335–348.
Dennis, M., & Hopyan, T. (2001). Rhythm and melody in children and adolescents after
left or right temporal lobectomy. Brain and Cognition, 47, 461–469.
Dowling, W. J. (1978). Scale and contour: Two components of a theory of memory for
melodies. Psychological Review, 85, 341–354.
Ethofer, T., De Ville, D. V., Scherer, K., & Vuilleumier, P. (2009). Decoding of emotional information in voice-sensitive cortices. Current Biology, 19, 1028–1033.
Fedorenko, E., Patel, A., Casasanto, D., Winawer, J., & Gibson, E. (2009). Structural integration in language and music: Evidence for a shared system. Memory and Cognition, 37, 1–9.
Foxton, J. M., Dean, J. L., Gee, R., Peretz, I., & Griffiths, T. D. (2004). Characterization of
deficits in pitch perception underlying “tone deafness.” Brain, 127, 801–810.
Foxton, J. M., Nandy, R. K., & Griffiths, T. D. (2006). Rhythm deficits in “tone deafness.”
Brain and Cognition, 62, 24–29.
Fuster, J. M. (2000). Executive frontal functions. Experimental Brain Research, 133, 66–70.
Fuster, J. M. (2001). The prefrontal cortex—an update: Time is of the essence. Neuron, 30, 319–333.
Gaser, C., & Schlaug, G. (2003). Brain structures differ between musicians and non-musicians. Journal of Neuroscience, 23, 9240–9245.
Gilbert, S. J., Spengler, S., Simons, J. S., Steele, J. D., Lawrie, S. M., Frith, C. D., &
Burgess, P. W. (2006). Functional specialization within rostral prefrontal cortex (Area 10):
A meta-analysis. Journal of Cognitive Neuroscience, 18, 932–948.
Gosselin, N., Peretz, I., Johnsen, E., & Adolphs, R. (2007). Amygdala damage impairs emotion recognition from music. Neuropsychologia, 45, 236–244.
Gosselin, N., Peretz, I., Noulhiane, M., Hasboun, D., Beckett, C., Baulac, M., & Samson, S.
(2005). Impaired recognition of scary music following unilateral temporal lobe excision.
Brain, 128, 628–640.
Goydke, K. N., Altenmuller, E., Moller, J., & Munte, T. F. (2004). Changes in emotional tone and instrumental timbre are reflected by the mismatch negativity. Cognitive Brain Research, 21, 351–359.
Grahn, J. A., & Brett, M. (2007). Rhythm and beat perception in motor areas of the brain.
Journal of Cognitive Neuroscience, 19, 893–906.
Grahn, J. A., & McAuley, J. D. (2009). Neural bases of individual differences in beat perception. NeuroImage, 47, 1894–1903.
Grahn, J. A., & Rowe, J. B. (2009). Feeling the beat: Premotor and striatal interactions in musicians and nonmusicians during beat perception. Journal of Neuroscience, 29, 7540–7548.
Green, A. C., Baerentsen, K. B., Stodkilde-Jorgensen, H., Wallentin, M., Roepstorff, A., & Vuust, P. (2008). Music in minor activates limbic structures: A relationship with dissonance? Neuroreport, 19, 711–715.
Griffiths, T. D., Buchel, C., Frackowiak, R. S. J., & Patterson, R. D. (1998). Analysis of temporal structure in sound by the human brain. Nature Neuroscience, 1, 422–427.
Halpern, A. R., & Zatorre, R. J. (1999). When that tune runs through your head: A PET investigation of auditory imagery for familiar melodies. Cerebral Cortex, 9, 697–704.
Halpern, A. R., Zatorre, R. J., Bouffard, M., & Johnson, J. A. (2004). Behavioral and neural
correlates of perceived and imagined musical timbre. Neuropsychologia, 42, 1281–1292.
Hickok, G., Buchsbaum, B., Humphries, C., & Muftuler, T. (2003). Auditory-motor interac
tion revealed by fMRI: Speech, music, and working memory in area Spt. Journal of Cogni
tive Neuroscience, 15, 673–682.
Page 31 of 42
Cognitive Neuroscience of Music
Hyde, K. L., Lerch, J. P., Zatorre, R. J., Griffiths, T. D., Evans, A. C., & Peretz, I. (2007).
Cortical thickness in congenital amusia: When less is better than more. Journal of Neuro
science, 27, 13028–13032.
Hyde, K. L., & Peretz, I. (2004). Brains that are out of tune but in time. Psychological
Science, 15, 356–360.
Hyde, K. L., Zatorre, R. J., Griffiths, T. D., Lerch, J. P., & Peretz, I. (2006). Morphometry of
the amusic brain: A two-site study. Brain, 129, 2562–2570.
Janata, P. (1995). ERP measures assay the degree of expectancy violation of harmonic
contexts in music. Journal of Cognitive Neuroscience, 7, 153–164.
Janata, P. (2001). Brain electrical activity evoked by mental formation of auditory expecta
tions and images. Brain Topography, 13, 169–193.
Janata, P. (2005). Brain networks that track musical structure. Annals of the New York
Academy of Sciences, 1060, 111–124.
Janata, P., Birk, J. L., Tillmann, B., & Bharucha, J. J. (2003). Online detection of tonal pop-
out in modulating contexts. Music Perception, 20, 283–305.
Janata, P., Birk, J. L., Van Horn, J. D., Leman, M., Tillmann, B., & Bharucha, J. J. (2002b).
The cortical topography of tonal structures underlying Western music. Science, 298,
2167–2170.
Janata, P., & Grafton, S. T. (2003). Swinging in the brain: Shared neural substrates for be
haviors related to sequencing and music. Nature Neuroscience, 6, 682–687.
Janata, P., Tillmann, B., & Bharucha, J. J. (2002a). Listening to polyphonic music recruits
domain-general attention and working memory circuits. Cognitive, Affective and Behav
ioral Neuroscience, 2, 121–140.
Jentschke, S., & Koelsch, S. (2009). Musical training modulates the development
(p. 131)
Johnsrude, I. S., Penhune, V. B., & Zatorre, R. J. (2000). Functional specificity in the right
human auditory cortex for perceiving pitch direction. Brain, 123, 155–163.
Jones, M. R. (1976). Time, our lost dimension—toward a new theory of perception, atten
tion, and memory. Psychological Review, 83, 323–355.
Page 32 of 42
Cognitive Neuroscience of Music
Jones, M. R., & Boltz, M. (1989). Dynamic attending and responses to time. Psychological
Review, 96, 459–491.
Jones, M. R., Moynihan, H., MacKenzie, N., & Puente, J. (2002). Temporal aspects of stim
ulus-driven attending in dynamic arrays. Psychological Science, 13, 313–319.
Juslin, P. N., & Vastfjall, D. (2008). Emotional responses to music: The need to consider
underlying mechanisms. Behavioral and Brain Science, 31, 559–621.
Khalfa, S., Schon, D., Anton, J. L., & Liegeois-Chauvel, C. (2005). Brain regions involved in
the recognition of happiness and sadness in music. Neuroreport, 16, 1981–1984.
Kleber, B., Birbaumer, N., Veit, R., Trevorrow, T., & Lotze, M. (2007). Overt and imagined
singing of an Italian aria. NeuroImage, 36, 889–900.
Klostermann, E. C., Loui, P., Shimamura, A. P. (2009). Activation of right parietal cortex
during memory retrieval of nonlinguistic auditory stimuli. Cognitive Affective & Behav
ioral Neuroscience, 9, 242–248.
Koelsch, S. (2009). Music-syntactic processing and auditory memory: Similarities and dif
ferences between ERAN and MMN. Psychophysiology, 46, 179–190.
Koelsch, S., & Mulder, J. (2002). Electric brain responses to inappropriate harmonies dur
ing listening to expressive music. Clinical Neurophysiology, 113, 862–869.
Koelsch, S., & Siebel, W. A. (2005). Towards a neural basis of music perception. Trends in
Cognitive Sciences, 9, 578–584.
Koelsch, S., Schmidt, B. H., & Kansok, J. (2002a). Effects of musical expertise on the early
right anterior negativity: An event-related brain potential study. Psychophysiology, 39,
657–663.
Koelsch, S., Schroger, E., & Gunter, T. C. (2002b). Music matters: Preattentive musicality
of the human brain. Psychophysiology, 39, 38–48.
Koelsch, S., Fritz, T., & Schlaug, G. (2008). Amygdala activity can be modulated by unex
pected chord functions during music listening. Neuroreport, 19, 1815–1819.
Koelsch, S., Gunter, T., Friederici, A. D., & Schroger, E. (2000). brain indices of music pro
cessing: “Nonmusicians” are musical. Journal of Cognitive Neuroscience, 12, 520–541.
Koelsch, S., Gunter, T., Schroger, E., & Friederici, A. D. (2003a). Processing tonal modula
tions: An ERP study. Journal of Cognitive Neuroscience, 15, 1149–1159.
Koelsch, S., Gunter, T. C., Wittfoth, M., & Sammler, D. (2005a). Interaction between syn
tax processing in language and in music: An ERP study. Journal of Cognitive
Neuroscience, 17, 1565–1577.
Page 33 of 42
Cognitive Neuroscience of Music
Koelsch, S., Jentschke, S., Sammler, D., & Mietchen, D. (2007). Untangling syntactic and
sensory processing: An ERP study of music perception. Psychophysiology, 44, 476–490.
Koelsch, S., Fritz, T., Schulze, K., Alsop, D., & Schlaug, G. (2005b). Adults and children
processing music: An fMRI study. NeuroImage, 25, 1068–1076.
Koelsch, S., Fritz, T., v Cramon, D. Y., Muller, K., & Friederici, A. D. (2006). Investigating
emotion with music: An fMRI study. Human Brain Mapping, 27, 239–250.
Koelsch, S., Gunter, T. C., Schroger, E., Tervaniemi, M., Sammler, D., & Friederici, A. D.
(2001). Differentiating ERAN and MMN: An ERP study. Neuroreport, 12, 1385–1389.
Koelsch, S., Gunter, T. C., v Cramon, D. Y., Zysset, S., Lohmann, G., & Friederici, A. D.
(2002c). Bach speaks: A cortical “language-network” serves the processing of music. Neu
roImage, 17, 956–966.
Koelsch, S., Grossmann, T., Gunter, T. C., Hahne, A., Schroger, E., & Friederici, A. D.
(2003b). Children processing music: Electric brain responses reveal musical competence
and gender differences. Journal of Cognitive Neuroscience, 15, 683–693.
Koelsch, S., Kasper, E., Gunter, T. C., Sammler, D., Schulze, K., & Friederici, A. D. (2004).
Music, language, and meaning: Brain signatures of semantic processing. Nature Neuro
science, 7, 302–307.
Koelsch, S., Schulze, K., Sammler, D., Fritz, T., Muller, K., & Gruber, O. (2009). Functional
architecture of verbal and tonal working memory: An fMRI study. Human Brain Mapping,
30, 859–873.
Kraemer, D. J. M., Macrae, C. N., Green, A. E., & Kelley, W. M. (2005). Musical imagery:
Sound of silence activates auditory cortex. Nature, 434, 158–158.
Krumhansl, C. L. (1990). Cognitive foundations of musical pitch. New York: Oxford Uni
versity Press.
Krumhansl, C. L., & Kessler, E. J. (1982). Tracing the dynamic changes in perceived tonal
organization in a spatial representation of musical keys. Psychological Review, 89, 334–
368.
Lahav, A., Saltzman, E., & Schlaug, G. (2007). Action representation of sound: Audiomo
tor recognition network while listening to newly acquired actions. Journal of Neuro
science, 27, 308–314.
Lakatos, S. (2000). A common perceptual space for harmonic and percussive timbres. Per
ception and Psychophysics, 62, 1426–1439.
Langheim, F. J. P., Callicott, J. H., Mattay, V. S., Duyn, J. H., & Weinberger, D. R. (2002).
Cortical systems associated with covert music rehearsal. NeuroImage, 16, 901–908.
Page 34 of 42
Cognitive Neuroscience of Music
Lappe, C., Herholz, S. C., Trainor, L. J., & Pantev, C. (2008). Cortical plasticity induced by
short-term unimodal and multimodal musical training. Journal of Neuroscience, 28, 9632–
9639.
Large, E. W., & Jones, M. R. (1999). The dynamics of attending: How people track time-
varying events. Psychological Review 106, 119–159.
Large, E. W., & Palmer, C. (2002). Perceiving temporal regularity in music. Cognitive
Science, 26, 1–37.
Leaver, A. M., Van Lare, J., Zielinski, B., Halpern, A. R., & Rauschecker, J. P. (2009). Brain
activation during anticipation of sound sequences. Journal of Neuroscience, 29, 2477–
2485.
Leino, S., Brattico, E., Tervaniemi, M., & Vuust, P. (2007). Representation of harmony
rules in the human brain: Further evidence from event-related potentials. Brain Research,
1142, 169–177.
Lerdahl, F., & Krumhansl, C. L. (2007). Modeling tonal tension. Music Perception, 24,
329–366.
Levitin, D. J., & Menon, V. (2003). Musical structure is processed in “language” areas of
the brain: A possible role for Brodmann Area 47 in temporal coherence. NeuroImage, 20,
2142–2152.
Liegeois-Chauvel, C., Peretz, I., Babai, M., Laguitton, V., & Chauvel, P. (1998).
(p. 132)
Contribution of different cortical areas in the temporal lobes to music processing. Brain,
121, 1853–1867.
Limb, C. J., & Braun, A. R. (2008). Neural substrates of spontaneous musical perfor
mance: An fMRI study of jazz improvisation. PLoS One, 3, e1679.
Lindenberger, U., Li, S. C., Gruber, W., & Muller, V. (2009). Brains swinging in concert:
Cortical phase synchronization while playing guitar. BMC Neuroscience, 10 (22), 1–12.
London, J. (2004). Hearing in time: Psychological aspects of musical meter. New York: Ox
ford University Press.
Loui, P., Alsop, D., & Schlaug, G. (2009a). Tone deafness: A new disconnection syndrome?
Journal of Neuroscience, 29, 10215–10220.
Loui, P., Grent-’t-Jong, T., Torpey, D., & Woldorff, M. (2005). Effects of attention on the
neural processing of harmonic syntax in Western music. Cognitive Brain Research, 25,
678–687.
Loui, P., Wu, E. H., Wessel, D. L., & Knight, R. T. (2009b). A generalized mechanism for
perception of pitch patterns. Journal of Neuroscience, 29, 454–459.
Page 35 of 42
Cognitive Neuroscience of Music
Maess, B., Koelsch, S., Gunter, T. C., & Friederici, A. D. (2001). Musical syntax is
processed in Broca’s area: An MEG study. Nature Neuroscience, 4, 540–545.
Margulis, E. H., Mlsna, L. M., Uppunda, A. K., Parrish, T. B., & Wong, P. C. M. (2009).
Selective neurophysiologic responses to music in instrumentalists with different listening
biographies. Human Brain Mapping, 30, 267–275.
McAdams, S., Winsberg, S., Donnadieu, S., Desoete, G., & Krimphoff, J. (1995). Perceptual
scaling of synthesized musical timbres: Common dimensions, specificities, and latent sub
ject classes. Psychological Research, 58, 177–192.
McDonald, I. (2006). Musical alexia with recovery: A personal account. Brain, 129, 2554–
2561.
Meister, I. G., Krings, T., Foltys, H., Boroojerdi, B., Muller, M., Topper, R., & Thron, A.
(2004). Playing piano in the mind: An fMRI study on music imagery and performance in
pianists. Cognitive Brain Research, 19, 219–228.
Menon, V., & Levitin, D. J. (2005). The rewards of music listening: Response and physio
logical connectivity of the mesolimbic system. NeuroImage, 28, 175–184.
Menon, V., Levitin, D. J., Smith, B. K., Lembke, A., Krasnow, B. D., Glazer, D., Glover, G. H.,
& McAdams, S. (2002). Neural correlates of timbre change in harmonic sounds. NeuroI
mage, 17, 1742–1754.
Meyer, M., Baumann, S., & Jancke, L. (2006). Electrical brain imaging reveals spatio-tem
poral dynamics of timbre perception in humans. NeuroImage, 32, 1510–1523.
Miranda, R. A., & Ullman, M. T. (2007). Double dissociation between rules and memory in
music: An event-related potential study. NeuroImage, 38, 331–345.
Mitterschiffthaler, M. T., Fu, C. H. Y., Dalton, J. A., Andrew, C. M., & Williams, S. C. R.
(2007). A functional MRI study of happy and sad affective states induced by classical mu
sic. Human Brain Mapping, 28, 1150–1162.
Mizuno, T., & Sugishita, M. (2007). Neural correlates underlying perception of tonality-re
lated emotional contents. Neuroreport, 18, 1651–1655.
Munte, T. F., Altenmuller, E., & Jancke, L. (2002). The musician’s brain as a model of neu
roplasticity. Nature Reviews, Neuroscience, 3, 473–478.
Näätänen, R., & Winkler, I. (1999). The concept of auditory stimulus representation in
cognitive neuroscience. Psychological Bulletin, 125, 826–859.
Nan, Y., Knosche, T. R., Zysset, S., & Friedericil, A. D. (2008) Cross-cultural music phrase
processing: An fMRI study. Human Brain Mapping, 29, 312–328.
Page 36 of 42
Cognitive Neuroscience of Music
Northoff, G., & Bermpohl, F. (2004). Cortical midline structures and the self. Trends in
Cognitive Sciences, 8, 102–107.
Northoff, G., Heinzel, A., Greck, M., Bennpohl, F., Dobrowolny, H., & Panksepp, J. (2006).
Self-referential processing in our brain: A meta-analysis of imaging studies on the self.
NeuroImage, 31, 440–457.
Otsuka, A., Tamaki, Y., & Kuriki, S. (2008). Neuromagnetic responses in silence after mu
sical chord sequences. Neuroreport, 19, 1637–1641.
Paller, K. A., McCarthy, G., & Wood, C. C. (1992). Event-related potentials elicited by de
viant endings to melodies. Psychophysiology, 29, 202–206.
Palmer, C., & Krumhansl, C. L. (1990). Mental representations for musical meter. Journal
of Experimental Psychology. Human Perception and Performance, 16, 728–741.
Pantev, C., Oostenveld, R., Engelien, A., Ross, B., Roberts, L. E., & Hoke, M. (1998). In
creased auditory cortical representation in musicians. Nature, 392, 811–814.
Parsons, L. M., Sergent, J., Hodges, D. A., & Fox, P. T. (2005). The brain basis of piano per
formance. Neuropsychologia, 43, 199–215.
Pascual-Leone, A., Dang, N., Cohen, L. G., Brasilneto, J. P., Cammarota, A., & Hallett, M.
(1995). Modulation of muscle responses evoked by transcranial magnetic stimulation dur
ing the acquisition of new fine motor-skills. Journal of Neurophysiology, 74, 1037–1045.
Pascual-Leone, A., Grafman, J., Hallett, M. (1994), Modulation of cortical motor output
maps during development of implicit and explicit knowledge. Science, 263, 1287–1289.
Patel, A. D. (2003). Language, music, syntax and the brain. Nature Neuroscience, 6, 674–
681.
Patel, A. D., & Balaban, E. (2000). Temporal patterns of human cortical activity reflect
tone sequence structure. Nature, 404, 80–84.
Patel, A. D., & Balaban, E. (2004). Human auditory cortical dynamics during perception of
long acoustic sequences: Phase tracking of carrier frequency by the auditory steady-state
response. Cerebral Cortex, 14, 35–46.
Patel, A. D., Foxton, J. M., & Griffiths, T. D. (2005). Musically tone-deaf individuals have
difficulty discriminating intonation contours extracted from speech. Brain and Cognition,
59, 310–313.
Patel, A. D., Gibson, E., Ratner, J., Besson, M., & Holcomb, P. J. (1998). Processing syntac
tic relations in language and music: An event-related potential study. Journal of Cognitive
Neuroscience, 10, 717–733.
Patterson, R. D., Uppenkamp, S., Johnsrude, I. S., & Griffiths, T. D. (2002). The processing
of temporal pitch and melody information in auditory cortex. Neuron, 36, 767–776.
Page 37 of 42
Cognitive Neuroscience of Music
Penhune, V. B., Zatorre, R. J., & Feindel, W. H. (1999). The role of auditory cortex in reten
tion of rhythmic patterns as studied in patients with temporal lobe removals including
Heschl’s gyrus. Neuropsychologia, 37, 315–331.
Peretz, I. (1996). Can we lose memory for music? A case of music agnosia in a nonmusi
cian. Journal of Cognitive Neuroscience, 8, 481–496.
100, 1–32.
Peretz, I., & Coltheart, M. (2003). Modularity of music processing. Nature Neuroscience,
6, 688–691.
Perry, D. W., Zatorre, R. J., Petrides, M., Alivisatos, B., Meyer, E., & Evans, A. C. (1999).
Localization of cerebral activity during simple singing. Neuroreport, 10, 3979–3984.
Plailly, J., Tillmann, B., & Royet, J.-P. (2007). The feeling of familiarity of music and odors:
The same neural signature? Cerebral Cortex, 17, 2650–2658.
Platel, H., Baron, J. C., Desgranges, B., Bernard, F., & Eustache, F. (2003). Semantic and
episodic memory of music are subserved by distinct neural networks. NeuroImage, 20,
244–256.
Popescu, M., Otsuka, A., & Ioannides, A. A. (2004). Dynamics of brain activity in motor
and frontal cortical areas during music listening: A magnetoencephalographic study. Neu
roImage, 21, 1622–1638.
Pressing, J. (2002). Black Atlantic rhythm: Its computational and transcultural founda
tions. Music Perception, 19, 285–310.
Ross, E. D., & Monnot, M. (2008). Neurology of affective prosody and its functional-
anatomic organization in right hemisphere. Brain and Language, 104, 51–74.
Sammler, D., Grigutsch, M., Fritz, T., & Koelsch, S. (2007). Music and emotion: Electro
physiological correlates of the processing of pleasant and unpleasant music. Psychophysi
ology, 44, 293–304.
Samson, S., & Zatorre, R. J. (1988). Melodic and harmonic discrimination following unilat
eral cerebral excision. Brain and Cognition, 7, 348–360.
Samson, S., & Zatorre, R. J. (1991). Recognition memory for text and melody of songs af
ter unilateral temporal lobe lesion: Evidence for dual encoding. Journal of Experimental
Psychology. Learning, Memory, and Cognition, 17, 793–804.
Samson, S., & Zatorre, R. J. (1994). Contribution of the right temporal-lobe to musical
timbre discrimination. Neuropsychologia, 32, 231–240.
Page 38 of 42
Cognitive Neuroscience of Music
Samson, S., Zatorre, R. J., & Ramsay, J. O. (2002). Deficits of musical timbre perception af
ter unilateral temporal-lobe lesion revealed with multidimensional scaling. Brain, 125,
511–523.
Sarkamo, T., Tervaniemi, M., Laitinen, S., Forsblom, A., Soinila, S., Mikkonen, M., Autti,
T., Silvennoinen, H. M., Erkkilae, J., Laine, M., Peretz, I., & Hietanen, M. (2008). Music
listening enhances cognitive recovery and mood after middle cerebral artery stroke.
Brain, 131, 866–876.
Satoh, M., Takeda, K., Nagata, K., Hatazawa, J., & Kuzuhara, S. (2001). Activated brain
regions in musicians during an ensemble: A PET study. Cognitive Brain Research, 12,
101–108.
Satoh, M., Takeda, K., Nagata, K., Shimosegawa, E., & Kuzuhara, S. (2006). Positron-
emission tomography of brain regions activated by recognition of familiar music. Ameri
can Journal of Neuroradiology, 27, 1101–1106.
Schellenberg, E. G., Iverson, P., & McKinnon, M. C. (1999). Name that tune: Identifying
popular recordings from brief excerpts. Psychonomic Bulletin and Review, 6, 641–646.
Schellenberg, E. G., & Trehub, S. E. (2003). Good pitch memory is widespread. Psycholog
ical Science, 14, 262–266.
Schirmer, A., & Kotz, S. A. (2006). Beyond the right hemisphere: brain mechanisms medi
ating vocal emotional processing. Trends in Cognitive Sciences, 10, 24–30.
Schlaug, G., Jancke, L., Huang, Y. X., & Steinmetz, H. (1995). In-vivo evidence of structur
al brain asymmetry in musicians. Science, 267, 699–701.
Schmithorst, V. J., & Holland, S. K. (2003). The effect of musical training on music pro
cessing: A functional magnetic resonance imaging study in humans. Neuroscience
Letters, 348, 65–68.
Schneider, P., Scherg, M., Dosch, H. G., Specht, H. J., Gutschalk, A., & Rupp, A. (2002).
Morphology of Heschl’s gyrus reflects enhanced activation in the auditory cortex of musi
cians. Nature Neuroscience, 5, 688–694.
Schneider, P., Sluming, V., Roberts, N., Scherg, M., Goebel, R., Specht, H. J., Dosch, H. G.,
Bleeck, S., Stippich, C., & Rupp, A. (2005). Structural and functional asymmetry of lateral
Heschl’s gyrus reflects pitch perception preference. Nature Neuroscience, 8, 1241–1247.
Schön, D., Anton, J. L., Roth, M., & Besson, M. (2002). An fMRI study of music sight-read
ing. Neuroreport, 13, 2285–2289.
Schön, D., Semenza, C., & Denes, G. (2001). Naming of musical notes: A selective deficit
in one musical clef. Cortex, 37, 407–421.
Page 39 of 42
Cognitive Neuroscience of Music
Schubotz, R. I. (2007). Prediction of external events with our motor system: Towards a
new framework. Trends in Cognitive Sciences, 11, 211–218.
Sergent, J., Zuck, E., Terriah, S., & Macdonald, B. (1992). Distributed neural network un
derlying musical sight-reading and keyboard performance. Science, 257, 106–109.
Sluming, V., Barrick, T., Howard, M., Cezayirli, E., Mayes, A., & Roberts, N. (2002). Voxel-
based morphometry reveals increased gray matter density in Broca’s area in male sym
phony orchestra musicians. NeuroImage, 17, 1613–1622.
Sridharan, D., Levitin, D. J., Chafe, C. H., Berger, J., & Menon, V. (2007). Neural dynamics
of event segmentation in music: Converging evidence for dissociable ventral and dorsal
networks. Neuron, 55, 521–532.
Steinbeis, N., & Koelsch, S. (2008). Shared neural resources between music and language
indicate semantic processing of musical tension-resolution patterns. Cerebral Cortex, 18,
1169–1178.
Steinbeis, N., Koelsch, S., & Sloboda, J. A. (2006). The role of harmonic expectancy viola
tions in musical emotions: Evidence from subjective, physiological, and neural responses.
Journal of Cognitive Neuroscience, 18, 1380–1393.
Stewart, L., Henson, R., Kampe, K., Walsh, V., Turner, R., & Frith, U. (2003). Brain
changes after learning to read and play music. Neuroimage, 20 (1), 71–83. doi: http://
dx.doi.org/10.1016/S1053-8119(03)00248-9
Stewart, L., von Kriegstein, K., Warren, J. D., & Griffiths, T. D. (2006). Music and the
brain: Disorders of musical listening. Brain, 129, 2533–2553.
Temperley, D. (2001). The cognition of basic musical structures. Cambridge, MA: MIT
Press.
Thaut, M. (2005). Rhythm, music, and the brain: Scientific foundations and clinical appli
cations. New York: Routledge.
Tillmann, B., Bharucha, J. J., & Bigand, E. (2000). Implicit learning of tonality: A self-orga
nizing approach. Psychological Review, 107, 885–913.
Tillmann, B., Janata, P., & Bharucha, J. J. (2003). Activation of the inferior frontal
(p. 134)
Toiviainen, P. (2007). Visualization of tonal content in the symbolic and audio domains.
Computing in Musicology, 15, 187–199.
Toiviainen, P., & Krumhansl, C. L. (2003). Measuring and modeling real-time responses to
music: The dynamics of tonality induction. Perception, 32, 741–766.
Page 40 of 42
Cognitive Neuroscience of Music
Toiviainen, P., Tervaniemi, M., Louhivuori, J., Saher, M., Huotilainen, M., & Naatanen, R.
(1998). Timbre similarity: Convergence of neural, behavioral, and computational ap
proaches. Music Perception, 16, 223–241.
Tueting, P., Sutton, S., & Zubin, J. (1970). Quantitative evoked potential correlates of the
probability of events. Psychophysiology, 7, 385–394.
Warren, J. D., Uppenkamp, S., Patterson, R. D., & Griffiths, T. D. (2003). Separating pitch
chroma and pitch height in the human brain. Proceedings of the National Academy of
Sciences U S A, 100, 10038–10042.
Warrier, C. M., & Zatorre, R. J. (2002). Influence of tonal context and timbral variation on
perception of pitch. Perception and Psychophysics, 64, 198–207.
Warrier, C. M., & Zatorre, R. J. (2004). Right temporal cortex is critical for utilization of
melodic contextual cues in a pitch constancy task. Brain, 127, 1616–1625.
Watanabe, T., Yagishita, S., & Kikyo, H. (2008). Memory of music: Roles of right hip
pocampus and left inferior frontal gyrus. NeuroImage, 39, 483–491.
Wiltermuth, S. S., & Heath, C. (2009). Synchrony and cooperation. Psychological Science,
20, 1–5.
Zatorre, R. J., Belin, P., & Penhune, V. B. (2002). Structure and function of auditory cortex:
Music and speech. Trends in Cognitive Sciences, 6, 37–46.
Zatorre, R. J., Evans, A. C., & Meyer, E. (1994). Neural mechanisms underlying melodic
perception and memory for pitch. Journal of Neuroscience, 14, 1908–1919.
Zatorre, R. J., Halpern, A. R., Perry, D. W., Meyer, E., & Evans, A. C. (1996). Hearing in the
mind’s ear: A PET investigation of musical imagery and perception. Journal of Cognitive
Neuroscience, 8, 29–46.
Zatorre, R. J., Perry, D. W., Beckett, C. A., Westbury, C. F., & Evans, A. C. (1998). Function
al anatomy of musical processing in listeners with absolute pitch and relative pitch. Pro
ceedings of the National Academy of Sciences U S A, 95, 3172–3177.
Petr Janata
Audition
Josh H. McDermott
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn
Audition is the process by which organisms use sound to derive information about the world. This chapter aims to provide a bird's-eye view of contemporary audition research, spanning systems and cognitive neuroscience as well as cognitive science. The author provides brief overviews of classic areas of research as well as some central themes and advances from the past ten years. The chapter covers the sound transduction of the cochlea, subcortical and cortical anatomical and functional organization of the auditory system, amplitude modulation and its measurement, adaptive coding and plasticity, the perception of sound sources (with a focus on the classic research areas of location, loudness, and pitch), and auditory scene analysis (including sound segregation, streaming, filling in, and reverberation perception). The chapter concludes with a discussion of where hearing research seems to be headed at present.
Keywords: sound transduction, auditory system anatomy, modulation, adaptation, plasticity, pitch perception, auditory scene analysis, sound segregation, streaming, reverberation
Introduction
From the cry of a baby to the rumble of a thunderclap, many events in the world produce sound. Sound is created when matter in the world vibrates, and takes the form of pressure waves that propagate through the air, containing clues about the environment around us. Audition is the process by which organisms utilize these clues to derive information about the world.
Audition is a crucial sense for most organisms. Humans, in particular, use sound to infer a vast number of important things—what someone said, their emotional state when they said it, and the whereabouts and nature of objects we cannot see, to name but a few. When hearing is impaired (via congenital conditions, noise exposure, or aging), the consequences can be devastating, such that a large industry is devoted to the design of prosthetic hearing devices.
As listeners we are largely unaware of the computations underlying our auditory system's success, but they represent an impressive feat of engineering. The computational challenges of everyday audition are reflected in the gap between biological and machine hearing systems—machine systems for interpreting sound currently fall far short of human abilities. Understanding the basis of our success in perceiving sound will hopefully help us to replicate it in machine systems and to restore it in biological auditory systems when their function becomes impaired.
The goal of this chapter is to provide a bird's-eye view of contemporary hearing research. I provide brief overviews of classic areas of research as well as some central themes and advances from the past ten years. The first section describes the sensory transduction of the cochlea. The second section outlines subcortical and cortical functional organization. (p. 136) The third section discusses modulation and its measurement by subcortical and cortical regions of the auditory system, a key research focus of the past few decades. The fourth section describes adaptive coding and plasticity, encompassing the relationship between sensory coding and the environment as well as its adaptation to task demands. The fifth section discusses the perception of sound sources, focusing on location, loudness, and pitch. The sixth section presents an overview of auditory scene analysis. I conclude with a discussion of where hearing research is headed at present. Because other chapters in this handbook are devoted to auditory attention, music, and speech, I will largely avoid these topics.
The Problem
Just by listening, we can routinely apprehend many aspects of the world around us: the
size of a room in which we are talking, whether it is windy or raining outside, the speed
of someone approaching from behind, or whether the surface someone is walking on is
gravel or marble. These abilities are nontrivial because the properties of the world that
are of interest to a listener are generally not explicit in the acoustic input—they cannot be
easily recognized or discriminated using the sound waveform itself. The brain must
process the sound entering the ear to generate representations in which the properties of
interest are more evident. One of the main objectives of hearing science is to understand
the nature of these transformations and their instantiation in the brain.
This inference problem has analogues in other sensory modalities, but the auditory version presents some uniquely challenging features.
Hearing begins with the ear, where the sound pressure waveform carried by the air is transduced into action potentials that are sent to the brain via the auditory nerve. Action potentials are a binary code, but what is conveyed to the brain is far from simply a binarized version of the incoming waveform. The transduction process is marked by several distinctive signal transformations, the most obvious of which is produced by frequency tuning.
The coarse details of sound transduction are well understood (Figure 8.2). Sound induces vibrations of the eardrum, which are transmitted via the bones of the middle ear to the cochlea, the sensory organ of the auditory system. The cochlea is a coiled, fluid-filled tube, containing several membranes that extend along its length and vibrate in response to sound. Transduction of this mechanical vibration into an electrical signal occurs in the organ of Corti, a mass of cells attached to the basilar membrane. The organ of Corti in particular contains what are known as hair cells, named for the stereocilia that protrude from them. The inner hair cells are (p. 137) responsible for sound transduction. When the section of membrane on which they lie vibrates, the resulting deformation of the hair cell body opens mechanically gated ion channels, inducing a voltage change within the cell. Neurotransmitter release is triggered by the change in membrane potential, generating action potentials in the auditory nerve fiber that the hair cell synapses with. This electrical signal is carried by the auditory nerve fiber to the brain.
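The frequency tuning introduced by this transduction chain can be caricatured in code as a bank of resonant filters, each standing in for one place along the basilar membrane. This is only a toy sketch under loose assumptions: the `resonator_energy` helper, the center frequencies, and the sampling rate are illustrative choices, and real cochlear filtering is nonlinear and actively amplified, as discussed below.

```python
import math

def resonator_energy(x, fc, fs, r=0.995):
    """Output energy of a two-pole resonator centered at fc (Hz), a crude
    stand-in for the vibration of one point on the basilar membrane."""
    w = 2 * math.pi * fc / fs
    a1, a2 = 2 * r * math.cos(w), -r * r   # feedback coefficients
    y1 = y2 = 0.0
    energy = 0.0
    for s in x:
        y = s + a1 * y1 + a2 * y2          # difference equation of the filter
        y2, y1 = y1, y
        energy += y * y
    return energy

fs = 16000
# A pure 1 kHz tone, 0.25 s long.
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(4000)]

# Three "places" on the membrane, tuned to 500, 1000, and 2000 Hz.
energies = {fc: resonator_energy(tone, fc, fs) for fc in (500, 1000, 2000)}
# The channel tuned to the tone's frequency responds most strongly,
# mirroring how a tone excites one region of the cochlea.
```

The point of the sketch is simply that a single broadband input is re-expressed as a pattern of activity across frequency-tuned channels, which is the form in which the auditory nerve conveys sound to the brain.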
The frequency tuning of the transduction process occurs because different parts of the basilar membrane vibrate in response to different frequencies. This is partly due to mechanical resonances—the thickness and stiffness of the membrane vary along its length, producing a different resonant frequency at each point. However, the mechanical resonances are actively enhanced via a feedback process, believed to be mediated largely by a second set of cells, called the outer hair cells. The outer hair cells abut the inner hair cells on the organ of Corti and serve to alter the basilar membrane vibration rather than transduce it. They expand and contract in response to sound through mechanisms that are only partially understood (Ashmore, 2008; Dallos, 2008; Hudspeth, 2008). Their motion alters the passive mechanics of the basilar membrane, amplifying the response to low-intensity sounds and tightening the frequency tuning of the resonance. The upshot is that high frequencies produce vibrations at the basal end of the cochlea (close to the eardrum), whereas low frequencies produce vibrations at the apical end (far from the eardrum).
The frequency decomposition of the cochlea is conceptually similar to the Fourier trans
form, but differs in the way that the frequency spectrum is decomposed. Whereas the
Fourier transform uses linearly spaced frequency bins, each separated by the same num
ber of hertz, the tuning bandwidth of auditory nerve fibers increases with their preferred
frequency. This characteristic can be observed in Figure 8.3A, in which the frequency re
sponse of a set of auditory nerve fibers is (p. 138) (p. 139) plotted on a logarithmic frequen
cy scale. Although the lowest frequency fibers are broader on a log scale than the high-
frequency fibers, in absolute terms their bandwidths are much lower—several hundred
hertz instead of several thousand. The distribution of best frequency along the cochlea
follows a roughly logarithmic function, apparent in Figure 8.3B, which plots the best fre
quency of a large set of nerve fibers against the distance along the cochlea of the hair cell
that they synapse with. These features of frequency selectivity are present in most biolog
ical auditory systems. It is partly for this reason that a log scale is commonly used for fre
quency.
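The contrast with linearly spaced Fourier bins can be illustrated with the equivalent rectangular bandwidth (ERB) approximation of Glasberg and Moore (1990), a standard psychophysical summary of human auditory filter widths. The 50-ms analysis window below is an arbitrary choice for the comparison, not a value from this chapter:

```python
def erb_hz(f_hz):
    """Equivalent rectangular bandwidth (Glasberg & Moore, 1990), in Hz,
    of the auditory filter centered at f_hz."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

# A Fourier transform with a 50-ms window has one fixed bin spacing:
fft_bin_hz = 1.0 / 0.050   # 20 Hz per bin, at every frequency

for f in (250, 1000, 4000, 8000):
    print(f"{f:5d} Hz: ERB = {erb_hz(f):6.1f} Hz, FFT bin = {fft_bin_hz:.1f} Hz")
```

Bandwidth grows roughly linearly with center frequency, mirroring the dependence seen in auditory nerve tuning in Figure 8.3A, whereas the Fourier bin width is constant.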
Although the frequency tuning of the cochlea is uncontroversial, the teleological question
of why the cochlear transduction process is frequency-tuned remains less settled. How
does frequency tuning aid the brain’s task of recovering useful information about the
world from its acoustic input? Over the past two decades, a growing number of re
searchers have endeavored to explain properties of sensory systems as optimal for the
task of encoding natural sensory stimuli, initially focusing on coding questions in vision,
and using notions of efficiency as the optimality criterion (Field, 1987; Olshausen & Field,
1996). Lewicki and colleagues have applied similar concepts to hearing, using algorithms
that derive efficient and sparse representations of sounds (Lewicki, 2002; Smith & Lewic
ki, 2006), properties believed to be desirable in early sensory representations. They re
port that for speech, or for (p. 140) combinations of environmental sounds and animal vo
calizations, efficient representations for sound look much like the representation pro
duced by auditory nerve fiber responses—sounds are represented with filters whose tun
ing is localized in frequency. Interestingly, the resulting representations share the depen
dence of bandwidth on frequency found in biological hearing—bandwidths increase with
frequency as they do in the ear. Moreover, representations derived in the same way for
“unnatural” sets of sounds, such as samples of white noise, do not exhibit frequency tun
ing, indicating that the result is at least somewhat specific to the sorts of sounds com
monly encountered in the world. These results suggest that frequency tuning provides an
efficient means to encode the sounds that were likely of importance when the auditory
system evolved, possibly explaining its ubiquitous presence in auditory systems. It remains to be seen whether this framework can explain potential variation in frequency tuning bandwidths across species (humans have recently been claimed to possess narrower tuning than other species; Joris, Bergevin, et al., 2011; Shera, Guinan, et al., 2002) or the broadening of frequency tuning with increasing sound intensity (Rhode, 1978), but it provides one means by which to understand the origins of peripheral auditory processing.
Amplitude Compression
A second salient transformation that occurs in the cochlea is that of amplitude compres
sion, whereby the mechanical response of the cochlea to a soft sound (and thus the neur
al response as well) is larger than would be expected given the response to a loud sound.
The response elicited by a sound is thus not proportional to the sound’s amplitude (as it
would be if the response were linear), but rather to a compressive nonlinear function of
amplitude. The dynamic range of the response to sound is thus “compressed” relative to
the dynamic range of the acoustic input. Whereas the range of audible sounds covers five
orders of magnitude, or 100 dB, the range of cochlear response covers only one or two or
ders of magnitude (Ruggero, Rich, et al., 1997).
Compression appears to serve to map the range of amplitudes that the listener needs to
hear (i.e., those commonly encountered in the environment), onto the physical operating
range of the cochlea. Without compression, either low-level sounds would be inaudible or high-level sounds would be indiscriminable (they would fall outside the range that could elicit a change in response). Compression
permits very soft sounds to produce a physical response that is (just barely) detectable,
while maintaining some discriminability of higher levels.
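This compressive input-output behavior is often approximated as a power law. The sketch below assumes a representative exponent of 0.3, an illustrative value rather than one reported in this chapter; on a dB scale a power law simply scales level by the exponent, mapping a 100-dB input range onto roughly 30 dB of response:

```python
def cochlear_response_db(input_db, exponent=0.3):
    """Toy compressive input-output function: if the mechanical response
    grows as pressure ** exponent, then response level in dB is simply
    exponent * input level in dB."""
    return exponent * input_db

audible_range_db = 100.0   # five orders of magnitude in sound pressure
out_range_db = cochlear_response_db(audible_range_db) - cochlear_response_db(0.0)
print(f"{out_range_db:.0f} dB")   # prints "30 dB": between one and two orders of magnitude
```

A soft 20-dB sound still produces a 6-dB response under this scheme, i.e., a detectable output, while the loudest sounds remain within the cochlea's physical operating range.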
Although frequency tuning and amplitude compression are at this point uncontroversial
and relatively well understood, several other empirical questions about peripheral audito
ry coding remain unresolved. One important issue involves the means by which the audi
tory nerve encodes frequency information. As a result of the frequency tuning of the audi
tory nerve, the spike rate of a nerve fiber contains information about frequency (a large
firing rate indicates that the sound input contains frequencies near the center of the
range of the fiber’s tuning). Collectively, the firing rates of all nerve fibers could thus be
used to estimate the instantaneous spectrum of a sound. However, spike timings also car
ry frequency information. At least for low frequencies, the spikes that are fired in re
sponse to sound do not occur randomly, (p. 141) but rather tend to occur at the peak dis
placements of the basilar membrane vibration. Because the motion of a particular section
of the membrane mirrors the bandpass-filtered sound waveform, the spikes occur at the
waveform peaks (Rose, Brugge, et al., 1967). If the input is a single frequency, spikes thus
occur at a fixed phase of the frequency cycle (Figure 8.4A). This behavior is known as
phase locking and produces spikes at regular intervals corresponding to the period of the
frequency. The spike timings thus carry information that could potentially augment or supersede that conveyed by the rate of firing.
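Phase locking can be caricatured in a few lines: a neuron fires at most once per cycle, always near the same stimulus phase, skipping cycles at random. Parameter values below are illustrative; the point is simply that interspike intervals fall at integer multiples of the stimulus period:

```python
import random

def phase_locked_spikes(freq_hz, dur_s=1.0, fire_prob=0.5, seed=1):
    """Emit at most one spike per stimulus cycle, always at the same
    stimulus phase (the peak), skipping cycles at random. A cartoon of
    phase locking, not a biophysical model."""
    rng = random.Random(seed)
    period = 1.0 / freq_hz
    spikes = []
    t = 0.25 * period          # the peak of a sine occurs a quarter-cycle in
    while t < dur_s:
        if rng.random() < fire_prob:
            spikes.append(t)
        t += period
    return spikes

spikes = phase_locked_spikes(500.0)    # 500-Hz tone, period = 2 ms
isis = [b - a for a, b in zip(spikes, spikes[1:])]
# every interspike interval is an integer number of 2-ms periods
print(all(abs(isi / 0.002 - round(isi / 0.002)) < 1e-9 for isi in isis))
```

A decoder reading only the intervals could thus recover the stimulus period, and hence the frequency, without knowing which fiber produced the spikes.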
Phase locking degrades in accuracy as frequency is increased (Figure 8.4B) due to limita
tions in the temporal fidelity of the hair cell membrane potential (Palmer & Russell, 1986)
and is believed to be largely absent for frequencies above 4 kHz in most mammals, al
though there is some variability across species (Johnson, 1980; Palmer & Russell, 1986).
The appeal of phase locking as a code for sound frequency is partly due to features of
rate-based frequency selectivity that are unappealing from an engineering standpoint. Al
though frequency tuning in the auditory system (as measured by auditory nerve spike
rates or psychophysical masking experiments) is narrow at low stimulus levels, it broad
ens considerably as the level is raised (Glasberg & Moore, 1990; Rhode, 1978). Phase
locking, by comparison, is robust to sound level—even though a nerve fiber responds to a
broad range of frequencies when the level is high, the time intervals between spikes con
tinue to convey frequency-specific information, as the peaks in the bandpass-filtered
waveform tend to occur at integer multiples of the periods of the component frequencies.
Our ability to discriminate frequency is impressive, with thresholds on the order of 1 per
cent (Moore, 1973), and there has been long-standing interest in whether this ability in
part depends on fine-grained spike timing information (Heinz, Colburn, et al., 2001). Al
though phase locking remains uncharacterized in humans because of the unavailability of
human auditory nerve recordings, it is presumed to occur in much the same way as in
nonhuman auditory systems. Moreover, several psychophysical phenomena are consistent
with a role for phase locking in human hearing. For instance, frequency discrimination
becomes much poorer for frequencies above 4 kHz (Moore, 1973), roughly the point at
which phase locking declines in nonhuman animals. The fundamental frequency of the highest note on a piano is approximately 4 kHz; this is also the point above which melodic intervals between pure tones (tones containing a single frequency) become much less evident (Attneave & Olson, 1971; Demany & Semal, 1990). These findings provide some
circumstantial evidence that phase locking is important for deriving precise estimates of
frequency, but definitive evidence remains elusive. It remains possible that the perceptual
degradations at high frequencies reflect a lack of experience with such frequencies, or
their relative unimportance for typical behavioral judgments, rather than a physiological
limitation.
The upper limit of phase locking is also known to decrease markedly at each successive
stage of the auditory system (Wallace, Anderson, et al., 2007). (p. 142) By primary audito
ry cortex, the upper cutoff is in the neighborhood of a few hundred hertz. It would thus
seem that the phase locking that occurs robustly in the auditory nerve would need to be
rapidly transformed into a spike rate code if it were to benefit processing throughout the
auditory system. Adding to the puzzle is the fact that frequency tuning is not thought to
be dramatically narrower at higher stages in the auditory system. Such tightening might
be expected if the frequency information provided by phase-locked spikes was trans
formed to yield improved rate-based frequency tuning at subsequent stages (but see Bit
terman, Mukamel, et al., 2008).
The auditory nerve feeds into a cascade of interconnected subcortical regions that lead
up to the auditory cortex, as shown in Figure 8.1. The subcortical auditory pathways have
complex anatomy, only some of which is depicted in Figure 8.1. In contrast to the subcor
tical pathways of the visual system, which are often argued to largely preserve the repre
sentation generated in the retina, the subcortical auditory areas exhibit a panoply of in
teresting response properties not found in the auditory nerve, many of which remain ac
tive topics of investigation. Several subcortical regions will be referred to in the sections
that follow in the context of other types of acoustic measurements or perceptual func
tions.
Like other sensory systems, the auditory system can be thought of as a processing cas
cade, extending from the sensory receptors to cortical areas believed to mediate auditory-
based decisions. This “feedforward” view of processing underlies much auditory re
search. As in other systems, however, feedback from later stages to earlier ones is ubiqui
tous and substantial, and in the auditory system is perhaps even more pronounced than
elsewhere in the brain. Unlike the visual system, for instance, the auditory pathways con
tain feedback extending all the way back to the sensory receptors. The function of much
of this feedback remains poorly understood, but one particular set of projections—the
cochlear efferent system—has been the subject of much discussion.
Efferent connections to the cochlea originate primarily from the superior olivary nucleus, an area of the brainstem a few synapses removed from the cochlea (see Figure 8.1, although the efferent pathways are not shown). The superior olive is divided into two subregions, medial and lateral, and to first order, these give rise to two efferent projections:
one from the medial superior olive to the outer hair cells, called the medial olivocochlear
(MOC) efferents, and one from the lateral superior olive to the inner hair cells, called the
lateral olivocochlear (LOC) efferents (Elgoyhen & Fuchs, 2010). The MOC efferents have
been relatively well studied. Their activation (e.g., by electrical stimulation) is known to
reduce the basilar membrane response to low-intensity sounds, and causes the frequency
tuning of the response to broaden. This is probably because the MOC efferents inhibit the
outer hair cells, which are crucial to amplifying the response to low-intensity sounds and
to sharpening frequency tuning.
The MOC efferents may serve a protective function by reducing the response to loud
sounds (Rajan, 2000), but their most commonly proposed function is to enhance the re
sponse to transient sounds in noise (Guinan, 2006). When the MOC fibers are severed, for
instance, performance on tasks involving discrimination of tones in noise is reduced (May
& McQuone, 1995). Noise-related MOC effects are proposed to derive from the influence of MOC activation on adaptation: adaptation induced by background noise reduces the detectability of transient foreground sounds by decreasing the dynamic range of the auditory nerve's response. Because MOC activation reduces the response to ongoing sound, adaptation induced by continuous background noise is reduced, thus enhancing the response to transient tones that are too brief to trigger the MOC feedback themselves (Kawase, Delgutte,
et al., 1993; Winslow & Sachs, 1987). Another interesting but controversial proposal is
that the MOC efferents play a role in auditory attention. One study, for instance, found
that patients whose vestibular nerve (containing the MOC fibers) had been severed were
better at detecting unexpected tones after the surgery, suggesting that selective attention
had been altered so as to prevent the focusing of resources on expected frequencies
(Scharf, Magnan, et al., 1997). See Guinan, 2006, for a recent review of these and other
ideas about MOC efferent function.
Less is known about the LOC efferents. One recent study found that destroying the LOC
efferents to one ear in mice caused binaural responses to become “unbalanced” (Darrow,
Maison, et al., 2006)—when sounds were presented binaurally at equal levels, responses
from the two ears that were equal under normal conditions were generally not equal fol
lowing the surgical procedure. The suggestion was that the LOC efferents serve to regu
late binaural responses so that interaural intensity (p. 143) differences, crucial to sound
localization (see below), can be accurately registered.
Tonotopy
Although many of the functional properties of subcortical and cortical neurons are dis
tinct from what is found in auditory nerve responses, frequency tuning persists. Every
subcortical region contains frequency-tuned neurons, and neurons tend to be spatially or
ganized to some extent according to their best frequency, forming “tonotopic” maps. This
organization is also evident in the cortex. Many cortical neurons have a preferred fre
quency, although they are often less responsive to pure tones (relative to sounds with
more complex spectra) and often have broader tuning than neurons in peripheral stages
(Moshitch, Las, et al., 2006). Cortical frequency maps were one of the first reported find
ings in single-unit neurophysiology studies of the auditory cortex in animals, and have
since been found using functional magnetic resonance imaging (fMRI) in humans
(Formisano, Kim, et al., 2003; Humphries, Liebenthal, et al., 2010; Talavage, Sereno, et
al., 2004) as well as monkeys (Petkov, Kayser, et al., 2006). Figure 8.5 shows an example
of a tonotopic map obtained in a human listener with fMRI. Although the comparison has never been formally quantified, tonotopy appears to be less robust than the retinotopy found in the visual system (evident, e.g., in recent optical imaging studies; Bandyopadhyay, Shamma, et al., 2010; Rothschild, Nelken, et al., 2010).
Although the presence of some degree of tonotopy in the cortex is beyond question, its
functional importance remains unclear. Frequency selectivity is not the end goal of the
auditory system, and it does not obviously bear much relevance to behavior, so it is un
clear why tonotopy would be a dominant principle of organization throughout the audito
ry system. It may be that other principles of organization are in fact more prominent but
have yet to be discovered. At present, however, tonotopy remains a staple of textbooks
and review chapters such as this.
Functional Organization
Some of the proposed organizational principles clearly derive inspiration from the visual
system. For (p. 144) instance, selectivity for vocalizations and selectivity for spatial loca
tion have been found to be partially segregated, each being most pronounced in a differ
ent part of the lateral belt (Tian, Reser, et al., 2001; Woods, Lopez, et al., 2006). These re
gions have thus been proposed to constitute the beginning of ventral “what” and dorsal
“where” pathways analogous to those in the visual system, perhaps culminating in the
same parts of the prefrontal cortex as the analogous visual pathways (Cohen, Russ, et al.,
2009; Romanski, Tian, et al., 1999). Functional imaging results in humans have also been
viewed as supportive of this framework (Alain, Arnott, et al., 2001; Warren, Zielinski, et
al., 2002). Additional evidence for a “what/where” dissociation comes from a recent study
in which sound localization and temporal pattern discrimination in cats were selectively
impaired by reversibly deactivating different regions of nonprimary auditory cortex
(Lomber & Malhotra, 2008). However, other studies have found less evidence for segre
gation of tuning properties in early auditory cortex (Bizley, Walker, et al., 2009). More
over, the properties of the “what” stream remain relatively undefined (Recanzone, 2008);
at this point, it has been defined mainly by reduced selectivity to spatial location.
There have been further attempts to extend the characterization of a ventral auditory
pathway by testing for specialization for the analysis of particular categories of sounds,
analogous to what has been found in the visual system (Kanwisher, 2010). The most wide
ly proposed specialization is for vocalizations. Using functional imaging, regions of the
anterior temporal lobe have been identified in both humans (Belin, Zatorre, et al., 2000)
and macaques (Petkov, Kayser, et al., 2008) that appear to be somewhat selectively re
sponsive to vocalizations and that could be homologous across species. Evidence for re
gions selective for other categories is less clear at present (Leaver & Rauschecker, 2010),
although see the section below on pitch perception for a discussion of a cortical region
putatively involved in pitch processing.
Another proposal is that the left and right auditory cortices are specialized for different
aspects of signal processing, with the left optimized for temporal resolution and the right
for frequency resolution (Zatorre, Belin, et al., 2002). This idea is motivated by the uncer
tainty principle of time–frequency analysis, whereby resolution cannot simultaneously be
optimized for both time and frequency. The evidence for hemispheric differences comes
mainly from functional imaging studies that manipulate spectral and temporal stimulus
characteristics (Samson, Zeffiro, et al., 2011; Zatorre & Belin, 2001) and neuropsycholo
gy studies that find pitch perception deficits associated with right temporal lesions (John
srude, Penhune, et al., 2000; Zatorre, 1985). (p. 145) A related alternative idea is that the
two hemispheres are specialized to analyze distinct timescales, with the left hemisphere
more responsive to short-scale temporal variation (e.g. tens of milliseconds) and the right
hemisphere more responsive to long-scale variation (e.g. hundreds of milliseconds)
(Boemio, Fromm, et al., 2005; Poeppel, 2003).
The cochlea decomposes the acoustic input into frequency channels, but much of the im
portant information in sound is conveyed by the way that the output of these frequency
channels is modulated in amplitude. Consider Figure 8.7A, which displays in blue the out
put of one such frequency channel for a short segment of a speech signal. The blue wave
form oscillates at a rapid rate, but its amplitude waxes and wanes at a much lower rate
(evident in the close-up view of Figure 8.7B). This waxing and waning is known as ampli
tude modulation and is a common feature of many modes of sound production (e.g., vocal
articulation). The amplitude is captured by what is known as the envelope of a signal,
shown in red for the signal of Figures 8.7A and B. Often, the envelopes of each cochlear
channel are stacked vertically and displayed as an image called a spectrogram, providing
a depiction of how the sound energy in each frequency channel varies over time (Figure
8.7C). Figure 8.7D shows the spectra of the signal and envelope shown in Figures 8.7A
and B. The signal spectrum is bandpass (because it is the output of a bandpass filter),
with energy at frequencies in the audible range. The envelope spectrum, in contrast, is
low-pass, with most of the power below 10 Hz, corresponding to the slow rate at which
the envelope changes. The frequencies that compose the envelope are typically termed
modulation frequencies, distinct from the acoustic frequencies that compose the signal
that the envelope is derived from.
The information carried by a cochlear channel can thus be viewed as the product of “fine
structure”—a (p. 146) waveform that varies rapidly, at a rate close to the center frequency
of the channel—and an amplitude envelope that varies more slowly (Rosen, 1992). The
envelope and fine structure have a clear relation to common signal processing formula
tions in which the output of a bandpass filter is viewed as a single sinusoid varying in am
plitude and frequency—the envelope describes the amplitude variation, and the fine
structure describes the frequency variation. The envelope of a frequency channel is also
straightforward to extract from the auditory nerve—it can be obtained by low-pass filter
ing a spike train (because the amplitude changes reflected in the envelope are relatively
slow). Despite the fact that envelope and fine structure are not completely independent
(Ghitza, 2001), there has been much interest in the past decade in distinguishing their
roles in different aspects of hearing (Smith, Delgutte, et al., 2002) and its impairment
(Lorenzi, Gilbert, et al., 2006).
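Envelope extraction of the kind described above can be sketched with a running RMS, which acts as rectification followed by low-pass filtering: the fast carrier (fine structure) averages out, while the slow amplitude variation (the envelope) survives. The carrier and modulation frequencies below are arbitrary illustrative values:

```python
import math

fs = 8000                      # sample rate in Hz; all values illustrative
carrier_hz, mod_hz = 1000.0, 4.0
# a 1-kHz "cochlear channel" output whose amplitude waxes and wanes at 4 Hz
sig = [(1.0 + 0.5 * math.sin(2 * math.pi * mod_hz * i / fs))
       * math.sin(2 * math.pi * carrier_hz * i / fs) for i in range(fs)]

def rms_envelope(x, fs, win_s=0.020):
    """Crude envelope: RMS within a sliding window. The window spans many
    carrier cycles but a small fraction of a modulation cycle, so only
    the slow amplitude variation remains."""
    w = int(win_s * fs)
    return [math.sqrt(sum(v * v for v in x[i:i + w]) / w)
            for i in range(len(x) - w)]

env = rms_envelope(sig, fs)
# the envelope swings slowly, at the 4-Hz modulation rate, between
# roughly 0.5 / sqrt(2) and 1.5 / sqrt(2)
print(min(env), max(env))
```

Stacking such envelopes across a bank of frequency channels yields exactly the spectrogram-style representation of Figure 8.7C.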
Modulation Tuning
Rodriguez, Chen, et al., 2010; Schreiner & Urbas, 1986, 1988; Woolley, Fremouw, et al.,
2005), loosely consistent with the idea of a modulation filter bank (Figure 8.9A). Because
such filters are typically conceived to operate on the envelope of a particular cochlear
channel, they are tuned both in acoustic frequency (courtesy of the cochlea) and modula
tion frequency.
Neurophysiological studies in nonhuman animals (Schreiner & Urbas, 1986, 1988) and
neuroimaging results in humans (Boemio, Fromm, et al., 2005; Giraud, Lorenzi, et al.,
2000; Schonwiesner & Zatorre, 2009) have generally found that the auditory cortex re
sponds preferentially to low modulation frequencies (in the range of 4–8 Hz), whereas
subcortical structures prefer higher rates (up to 100–200 Hz), with preferred modulation
frequency generally decreasing up the auditory pathway. Based on this, it is intriguing to
speculate that successive stages of the auditory system might process structure at pro
gressively longer (slower) timescales, analogous to the progressive increase in receptive
field size that occurs in the visual system from V1 to inferotemporal cortex (Lerner, Hon
ey, et al., 2011). Within the cortex, however, no hierarchy is clearly evident as of yet, at
least in the response to simple patterns of modulation (Boemio, Fromm, et al., 2005; Gi
raud, Lorenzi, et al., 2000). Moreover, there is considerable variation within each stage of
the pathway in the preferred modulation frequency of individual neurons (Miller, Escabi,
et al., 2001; Rodriguez, Chen, et al., 2010). There are several reports of topographic orga
nization for modulation frequency in the inferior colliculus, in which a gradient of pre
ferred modulation frequency is observed orthogonal to the tonotopic gradient of pre
ferred acoustic frequency (Baumann, Griffiths, et al., 2011; Langner, Sams, et al., 1997).
Whether there is topographic organization in the cortex remains unclear (Nelken, Bizley,
et al., 2008).
The STRF approximates a neuron’s output as a linear function of the cochlear input—the
result of convolving the spectrogram of the acoustic input with the STRF. However, it is
clear that linear models are inadequate to explain neuronal responses (Christianson, Sa
hani, et al., 2008; Machens, Wehr, et al., 2004; Rotman, Bar Yosef, et al., 2001; Theunis
sen, Sen, et al., 2000). Understanding the nonlinear contributions is an important direc
tion (p. 148) of future research (Ahrens, Linden, et al., 2008; David, Mesgarani, et al.,
2009), as neuronal nonlinearities likely play critical computational roles, but at present
much analysis is restricted to linear receptive field estimates. There are established
methods for computing STRFs, and they exhibit many interesting properties even though
they are clearly not the whole story.
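The linear STRF model amounts to a two-dimensional correlation of the STRF with the spectrogram: the predicted rate at time t sums, over frequency channels and time lags, the STRF weight times the spectrogram energy. A toy version, with made-up numbers rather than measured receptive fields:

```python
def strf_predict(spectrogram, strf):
    """Linear STRF model: r(t) = sum over channels f and lags d of
    strf[f][d] * spectrogram[f][t - d]. Both arguments are lists of
    rows, one row per frequency channel."""
    n_t = len(spectrogram[0])
    rate = []
    for t in range(n_t):
        r = 0.0
        for f in range(len(spectrogram)):
            for d in range(len(strf[f])):
                if t - d >= 0:
                    r += strf[f][d] * spectrogram[f][t - d]
        rate.append(r)
    return rate

# two frequency channels; an STRF excited by channel 0 at lag 0
# and inhibited by channel 1 at lag 1
strf = [[1.0, 0.0],
        [0.0, -0.5]]
spec = [[0, 1, 0, 0],
        [0, 1, 0, 0]]          # a brief sound in both channels at t = 1
print(strf_predict(spec, strf))   # [0.0, 1.0, -0.5, 0.0]
```

The immediate excitation followed by delayed suppression in the output is the kind of response structure a linear fit can capture; the nonlinearities discussed above are everything this model leaves out.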
Modulation tuning functions (e.g., those shown in Figure 8.9A) can be obtained via the
Fourier transform of the STRF. Temporal modulation tuning is commonly observed, as
previously discussed, but some tuning is normally also present for spectral modulation—
variation in power that occurs along the frequency axis. Spectral modulation is often evi
dent as well in spectrograms of speech (e.g., Figure 8.7C) and animal vocalizations. Mod
ulation results both from individual frequency components and from formants—the broad
spectral peaks that are present for vowel sounds due to vocal tract resonances. Tuning to
spectral modulation is generally less pronounced than to amplitude modulation, especial
ly subcortically (Miller, Escabi, et al., 2001), but is an important feature of cortical re
sponses (Barbour & Wang, 2003; Mesgarani, David, et al., 2008). Examples of cortical
STRFs with spectral modulation sensitivity are shown in Figure 8.9C.
The efficient coding hypothesis has also been proposed to apply to modulation tuning in
the inferior colliculus. Modulation tuning bandwidth tends to increase with preferred
modulation frequency (Rodriguez, Chen, et al., 2010), as would be predicted if the low-
pass modulation spectra of most natural sounds (Attias & Schreiner, 1997; McDermott,
Wrobleski, et al., 2011; Singh & Theunissen, 2003) were to be divided into channels con
veying equal power. Inferior colliculus neurons have also been found to convey more in
formation about sounds whose amplitude distribution follows that of natural sounds
rather than that of white noise (Escabi, Miller, et al., 2003). Along the same lines, studies
of STRFs in the bird auditory system indicate that neurons are tuned to the properties of
bird song and other natural sounds, maximizing discriminability of behaviorally important
sounds (Hsu, Woolley, et al., 2004; Woolley, Fremouw, et al., 2005). Similar arguments
have been made about the coding of binaural cues to sound localization (Harper &
McAlpine, 2004).
Other strands of research have explored whether the auditory system might further adapt
to the environment by changing its coding properties in response to changing environ
mental statistics, so as to optimally represent the current environment. Following on re
search showing that the visual system adapts to local contrast statistics (Fairhall, Lewen,
et al., 2001), numerous groups have reported evidence for neural adaptation in the audi
tory system—responses to a fixed stimulus that vary depending on the immediate history
of stimulation (Ulanovsky, Las, et al., 2003; Kvale & Schreiner, 2004). In some cases, it
can be shown that this adaptation increases information transmission. For instance, the
“tuning” of neurons in the inferior colliculus to sound intensity (i.e., the function relating
intensity to firing rate) depends on the mean and variance of the local intensity distribu
tion (Dean, Harper, et al., 2005). Qualitatively, the rate–intensity curves shift so that the
point of maximum slope (around which neural discrimination of intensity is best) is closer
to the most commonly occurring intensity. Quantitatively, this behavior results in in
creased information transmission about stimulus level.
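The Dean et al. (2005) result can be schematized with a sigmoidal rate-intensity function whose midpoint tracks the mean of the recent intensity distribution, placing the steepest slope (and thus the best intensity discrimination) near the most common levels. All parameter values below are hypothetical:

```python
import math

def rate(intensity_db, midpoint_db, slope_db=5.0, max_rate=100.0):
    """Sigmoidal rate-intensity function; its slope, and hence intensity
    discrimination, is greatest near the midpoint."""
    return max_rate / (1.0 + math.exp(-(intensity_db - midpoint_db) / slope_db))

def adapted_midpoint(recent_levels_db):
    """Schematic adaptation: the midpoint shifts toward the mean of the
    recent intensity distribution (qualitatively, Dean et al., 2005)."""
    return sum(recent_levels_db) / len(recent_levels_db)

for env in ([30.0, 35.0, 40.0], [70.0, 75.0, 80.0]):   # quiet vs. loud scene
    m = adapted_midpoint(env)
    print(f"midpoint {m:.0f} dB; rate there = {rate(m, m):.0f} spikes/s")
```

The same neuron thus devotes its limited firing-rate range to whichever intensities currently dominate, which is what increases information transmission about stimulus level.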
Some researchers have recently taken things a step further, showing that auditory re
sponses are dependent not just on the stimulus history but also on the task a listener is
performing. Fritz and colleagues found that the STRFs measured for neurons in the pri
mary auditory cortex of awake ferrets change depending on whether the animals are per
forming a task (Fritz, Shamma, et al., 2003), and that the nature of the change depends
on the task (Fritz, Elhilali, et al., 2005). For instance, STRF changes serve to accentuate
the frequency of a tone being detected, or to enhance discrimination of a target tone from
a reference. These changes are mirrored in sound-evoked responses in the prefrontal cor
tex (Fritz, David, et al., 2010), which may drive the changes that occur in auditory cortex
during behavior. In some cases the STRF changes persist long after the animals are fin
ished performing the task, and as such may play a role in sensory memory and perceptual
learning.
Perhaps because they are more easily controlled and manipulated, researchers have instead been more inclined to study the perception of isolated properties of sounds or their
sources. Much research has concentrated in particular on three well-known properties of
sound: spatial location, pitch, and loudness. This focus is in some sense unfortunate be
cause auditory perception is much richer than the hegemony of these three attributes in
hearing science would indicate. However, their study has nonetheless given rise to fruit
ful lines of research that have yielded many useful insights about hearing more generally.
Localization
Localization is less precise in hearing than in vision but is nonetheless of great value, be
cause sound enables us to localize objects that we may not be able to see. Human ob
servers can judge the location of a source to within a few degrees if conditions are opti
mal. The processes by which this occurs are among the best understood in hearing.
Spatial location is not made explicit on the cochlea, which provides a map of frequency
rather than of space, and instead must be derived from three primary sources of informa
tion. Two of these are binaural, resulting from differences in the acoustic input to the two
ears. Due to the difference in path length from the source to the ears, and to the acoustic
shadowing effect of the head, sounds to one side of the vertical meridian reach the two
ears at different times and with different intensities. These interaural time and level dif
ferences vary with direction and thus provide a cue to a sound source’s location. Binaural
cues are primarily useful for deriving the location of a sound in the horizontal plane, be
cause changes in elevation do not change interaural time or intensity differences much.
To localize sounds in the vertical dimension, or to distinguish sounds coming from in front
of the head from those from in back, listeners rely on a third source of information: the
filtering of sounds by the body and ears. This filtering is direction specific, such that a
spectral analysis can reveal peaks and valleys in the frequency spectrum that are signa
tures of location in the vertical dimension (Figure 8.10; discussed further below).
Interaural time differences (ITDs) are typically a fraction of a millisecond, and just-noticeable ITDs (which determine spatial acuity) can be as low as 10 microseconds (Klump & Eady, 1956). This is striking given that neural refractory periods (which determine the minimal interspike interval for a single neuron) are on the order of a millisecond, which one might think would limit the temporal resolution of neural representations. Typical interaural level differences (ILDs) can be as large as 20 dB, with a just-noticeable difference of about 1 dB. ILDs result from the acoustic shadow cast by the head. To first order, ILDs are more pronounced at high frequencies, because low frequencies are less affected by the acoustic shadow (their wavelengths being comparable to the dimensions of the head). ITDs, in contrast, support localization most effectively at low frequencies, where the time difference between individual cycles of sinusoidal sound components can be detected via phase-locked spikes from the two ears (phase locking, as discussed earlier, degrades at high frequencies). That said, ITDs between the envelopes of high-frequency sounds can also produce percepts of localization. The classic "duplex" view that localization is determined by either ILDs or ITDs, depending on the frequency (Rayleigh, 1907), is thus not fully appropriate for realistic natural sounds, which in general produce perceptible ITDs across the spectrum. See Middlebrooks and Green (1991) for a review of much of the classic behavioral work on sound localization.
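The geometry behind these ITDs can be made concrete with a small calculation. The sketch below uses Woodworth's classic spherical-head approximation, which is not part of this chapter's exposition; the head radius and speed of sound are assumed round values:

```python
import math

def woodworth_itd(azimuth_deg, head_radius_m=0.0875, c=343.0):
    """Approximate ITD (in seconds) for a distant source, using
    Woodworth's spherical-head model: ITD = (a/c) * (theta + sin(theta)).
    azimuth_deg is the angle from the median plane (0 = straight ahead)."""
    theta = math.radians(azimuth_deg)
    return (head_radius_m / c) * (theta + math.sin(theta))

for az in (0, 10, 45, 90):
    print(f"{az:3d} deg -> {woodworth_itd(az) * 1e6:6.1f} microseconds")
```

With these assumed values, the model yields roughly 655 microseconds for a source at 90 degrees, consistent with the "fraction of a millisecond" scale cited in the text.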
The binaural cues to sound location are extracted in the superior olive, a subcortical region where inputs from the two ears are combined. In most animals there appears to be an elegant segregation of function, with ITDs extracted in the medial superior olive (MSO) and ILDs in the lateral superior olive (LSO). In both cases, accurate coding of interaural differences is made possible by neural signaling with unusually high temporal precision. This precision is needed to encode both sub-millisecond ITDs and the ILDs of brief transient events, for which the inputs from the ears must be aligned in time. Brain structures subsequent to the superior olive largely inherit its ILD and ITD sensitivity. See Yin and Kuwada (2010) for a recent review of the physiology of binaural localization.
Binaural cues are of little use in distinguishing sounds at different locations in the vertical dimension (relative to the head), or in distinguishing front from back, because interaural time and level differences are largely unaffected by changes across these locations. Instead, listeners rely on spectral cues provided by the filtering of a sound by the torso, head, and ears of a listener. The filtering results from the reflection and absorption of sound by the surfaces of a listener's body, with sound from different directions producing different patterns of reflection and thus different patterns of filtering. The effect of these interactions on the sound that reaches the eardrum can be described by a linear filter known as the head-related transfer function (HRTF). The overall effect is that of amplifying some frequencies while attenuating others. A broadband sound entering the ear will thus be endowed with peaks and valleys in its frequency spectrum (see Figure 8.10). Compelling sound localization can be perceived when these peaks and valleys are artificially induced. The effect of the filtering is, of course, confounded with the spectrum of the unfiltered sound source, so the brain must make some assumptions about the source spectrum. When these assumptions are violated, as with narrowband sounds whose spectral energy occurs at a peak in a listener's HRTF, sounds are mislocalized (Middlebrooks, 1992). For broadband sounds, however, HRTF filtering produces signatures that are sufficiently distinct to support localization in the vertical dimension to within 5 degrees or so in some cases, although some locations are perceived more accurately than others (Makous & Middlebrooks, 1990; Wightman & Kistler, 1989).
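The spectral consequences of such filtering can be illustrated with a toy example: broadband noise passed through an impulse response consisting of a direct path plus one delayed reflection, a crude stand-in for pinna filtering. The delay and reflection gain here are arbitrary assumed values, not measured HRTF data:

```python
import numpy as np

fs = 44100
rng = np.random.default_rng(0)
noise = rng.standard_normal(fs // 2)          # broadband source

# Toy "pinna" filter: direct path plus one reflection delayed by 6
# samples (~0.14 ms). A single reflection notches the spectrum at odd
# multiples of 1/(2*delay), crudely mimicking an HRTF's valleys.
delay = 6
h = np.zeros(delay + 1)
h[0], h[delay] = 1.0, 0.6
filtered = np.convolve(noise, h)

spectrum = np.abs(np.fft.rfft(filtered))
freqs = np.fft.rfftfreq(filtered.size, 1 / fs)
notch_hz = fs / (2 * delay)                   # first spectral valley
band = (freqs > notch_hz - 200) & (freqs < notch_hz + 200)
print(f"first notch predicted near {notch_hz:.0f} Hz")
print("energy near notch relative to overall:",
      spectrum[band].mean() / spectrum.mean())
```

The energy near the predicted notch comes out well below the spectrum's average, the kind of direction-specific "valley" the text describes.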
The bulk of the filtering occurs in the outer ear (the pinna), the folds of which produce a distinctive pattern of reflections. Because pinna shapes vary across listeners, the HRTF is listener specific as well as location specific, with spectral peaks and valleys that fall in different places for different listeners. Listeners appear to learn the HRTFs for their own set of ears. When the ears are artificially modified with plastic molds that change their shape, localization initially suffers considerably, but over a period of weeks listeners regain the ability to localize with the modified ears (Hofman, Van Riswick, et al., 1998). Listeners thus learn at least some of the details of their particular HRTF through experience, although sounds can be localized even when the peaks and valleys of the pinna filtering are somewhat blurred (Kulkarni & Colburn, 1998). Moreover, compelling spatialization is often evident even when a generic HRTF is used.
The physiology of HRTF-related cues for localization is not as well developed as it is for binaural cues, but there is evidence that midbrain regions may again be important. Many inferior colliculus neurons, for instance, show tuning to sound elevation (Delgutte, Joris, et al., 1999). The selectivity for elevation presumably derives from tuning to particular spectral patterns (peaks and valleys in the spectrum) that are diagnostic of particular locations (May, Anderson, et al., 2008).
Although the key cues for sound localization are extracted subcortically, lesion studies reveal that the cortex is essential for localizing sound. Ablating auditory cortex typically produces large deficits in localization (Heffner & Heffner, 1990), with unilateral lesions producing deficits specific to locations contralateral to the side of the lesion (Jenkins & Masterton, 1982). Consistent with these findings, tuning to sound location is widespread in auditory cortical neurons, with the preferred location generally positioned in the contralateral hemifield (Middlebrooks, 2000). Topographic representations of space have not been found within individual auditory cortical areas, although one recent report argues that such topography may be evident across multiple areas (Higgins, Storace, et al., 2010).
Pitch
Although the word pitch is often used colloquially to refer to the perception of sound frequency, in hearing research it has a more specific meaning—pitch is the perceptual correlate of periodicity. Vocalizations, instrument sounds, and some machine sounds are all often produced by periodic physical processes. Our vocal cords open and close at regular intervals, producing a series of clicks separated by regular temporal intervals. Instruments produce sounds via strings that oscillate at a fixed rate, or via tubes in which the air vibrates at particular resonant frequencies, to give two examples. Machines frequently feature rotating parts, which often produce sounds at every rotation. In all these cases, the resulting sounds are periodic—the sound pressure waveform consists of a single shape that repeats at a fixed rate (Figure 8.11A). Perceptually, such sounds are heard as having a pitch that can vary from low to high, proportional to the frequency at which the waveform repeats (the fundamental frequency, or F0). Periodicity is distinct from whether a sound's frequencies fall in high or low regions of the spectrum, although in practice periodicity and the spectral center of mass are sometimes correlated.
Many physically different sounds—all those with a particular period—have the same pitch. Historically, pitch has been a focal point of hearing research because it is an important perceptual property with a nontrivial relationship to the acoustic input, one whose mechanistic characterization has resisted unambiguous solution. Debates on pitch and related phenomena date back at least to Helmholtz, and continue to occupy many researchers today (Plack, Oxenham, et al., 2005).
One central debate concerns whether pitch is derived from an analysis of frequency or of time. Periodic waveforms produce spectra whose frequencies are harmonically related—they form a harmonic series, being integer multiples of the fundamental frequency, whose period is the period of the waveform (Figure 8.11B). Although the fundamental frequency determines the pitch, the fundamental need not be physically present in the spectrum for a sound to have pitch—sounds missing the fundamental frequency but containing other harmonics of the fundamental are still perceived to have the pitch of the fundamental, an effect known as the missing fundamental illusion. What matters for pitch perception is whether the frequencies that are present are harmonically related. Pitch could thus conceivably be detected with harmonic templates applied to an estimate of a sound's spectrum obtained from the cochlea (Goldstein, 1973; Shamma & Klein, 2000; Terhardt, 1974; Wightman, 1973). Alternatively, periodicity could be assessed in the time domain, for instance via the autocorrelation function (Cariani & Delgutte, 1996; de Cheveigne & Kawahara, 2002; Meddis & Hewitt, 1991). The autocorrelation measures the correlation of a signal with a delayed copy of itself. For a periodic signal, the autocorrelation exhibits peaks at multiples of the period (Figure 8.11C).
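A minimal demonstration of the autocorrelation idea, including the missing fundamental: the complex below contains harmonics 3 through 8 of a 200-Hz fundamental but no energy at 200 Hz itself, yet its autocorrelation still peaks at the 5-ms period. The sampling rate and harmonic numbers are arbitrary illustrative choices:

```python
import numpy as np

fs = 16000
t = np.arange(int(0.1 * fs)) / fs
f0 = 200.0
# Harmonic complex *missing* its 200-Hz fundamental: harmonics 3..8 only.
x = sum(np.sin(2 * np.pi * f0 * k * t) for k in range(3, 9))

# The autocorrelation peaks at multiples of the period (1/f0 = 5 ms)
# even though there is no spectral energy at f0 itself.
ac = np.correlate(x, x, mode="full")[x.size - 1:]
lag = np.argmax(ac[fs // 400:]) + fs // 400    # search lags beyond 2.5 ms
print(f"estimated F0: {fs / lag:.1f} Hz")      # ~200 Hz
```

The estimate comes out at the absent fundamental, which is exactly what a time-domain account predicts for the missing fundamental illusion.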
Such analyses are in principle functionally equivalent, because the power spectrum is related to the autocorrelation via the Fourier transform, and detecting periodicity in one domain versus the other might simply seem a question of implementation. In the context of the auditory system, however, the two concepts diverge, because information is limited by distinct factors in the two domains. Time-domain models are typically assumed to utilize fine-grained spike timing (i.e., phase locking), with its concomitant temporal resolution limits. In contrast, frequency-based models (often known as place models, in reference to the frequency–place mapping that occurs on the basilar membrane) rely on the pattern of excitation along the cochlea, which is limited in resolution by the frequency tuning of the cochlea (Cedolin & Delgutte, 2005). Cochlear frequency selectivity is present in time-domain models of pitch as well, but its role is typically not to estimate the spectrum but simply to restrict an autocorrelation analysis to a narrow frequency band (Bernstein & Oxenham, 2005), which might improve its robustness in the presence of multiple sound sources. Reviews of the current debates and their historical origins are available elsewhere (de Cheveigne, 2004; Plack & Oxenham, 2005), and we will not discuss them exhaustively here. Suffice it to say that despite being a centerpiece of hearing research for decades, the mechanisms underlying pitch perception remain under debate.
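The formal equivalence mentioned above (power spectrum and autocorrelation as Fourier-transform pairs, the Wiener–Khinchin relation) can be verified numerically in a few lines; the test signal is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(256)

# Circular autocorrelation computed two ways:
# (1) directly, by summing lagged products;
# (2) as the inverse FFT of the power spectrum (Wiener-Khinchin).
direct = np.array([np.dot(x, np.roll(x, k)) for k in range(x.size)])
via_fft = np.fft.ifft(np.abs(np.fft.fft(x)) ** 2).real
print(np.allclose(direct, via_fft))  # True
```

The agreement shows why the interesting differences between the two classes of pitch model lie not in the mathematics but in which biological limits (phase locking versus cochlear filter bandwidth) constrain each representation.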
Research on pitch has provided many important insights about hearing even though a conclusive account of pitch remains elusive. One contribution of pitch research has been to reveal the importance of the resolvability of individual frequency components by the cochlea, a principle that has importance in other aspects of hearing as well. Because the frequency resolution of the cochlea is approximately constant on a logarithmic scale, whereas the components of a harmonic tone are equally spaced on a linear scale (separated by a fixed number of hertz, equal to the fundamental frequency of the tone; Figure 8.12A), multiple high-numbered harmonics fall within a single cochlear filter (Figure 8.12B). Because of the nature of the log scale, this is true regardless of whether the fundamental is low or high. As a result, the excitation pattern induced by a tone on the cochlea (of a human with normal hearing) is believed to contain resolvable peaks for only the first ten or so harmonics (Figure 8.12C).
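This "first ten or so" figure can be roughly reproduced with the Glasberg and Moore (1990) equivalent-rectangular-bandwidth (ERB) formula for human auditory filters. The "resolved" criterion below, which simply compares the filter bandwidth against the harmonic spacing, is a deliberate simplification for illustration:

```python
def erb_hz(f_hz):
    """Equivalent rectangular bandwidth (Hz) of the human auditory
    filter centered at f_hz (Glasberg & Moore, 1990)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

f0 = 200.0  # fundamental of a harmonic complex
for n in (1, 5, 8, 10, 15, 20):
    f = n * f0
    # A harmonic is (crudely) resolved while the 200-Hz spacing
    # between harmonics exceeds the local filter bandwidth.
    print(f"harmonic {n:2d} at {f:5.0f} Hz: ERB = {erb_hz(f):5.0f} Hz,"
          f" resolved: {erb_hz(f) < f0}")
```

With this crude criterion the crossover falls between harmonics 8 and 10 for a 200-Hz fundamental, and because both the harmonic frequencies and the ERB scale roughly linearly with F0, the harmonic number at the crossover is similar for other fundamentals, as the text notes.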
There is now abundant evidence that resolvability places strong constraints on pitch perception. For instance, the perception of pitch is determined predominantly by low-numbered harmonics (harmonics one to ten or so in the harmonic series), presumably owing to the peripheral resolvability of these harmonics. Moreover, the ability to discriminate pitch is much poorer for tones synthesized with only high-numbered harmonics than for tones containing only low-numbered harmonics, an effect not accounted for simply by the frequency range in which the harmonics occur (Houtsma & Smurzynski, 1990; Shackleton & Carlyon, 1994). This might be taken as evidence that the spatial pattern of excitation, rather than periodicity derived from the autocorrelation, underlies pitch perception, but variants of autocorrelation-based models have also been proposed to account for the effect of resolvability (Bernstein & Oxenham, 2005). Resolvability has since been shown to constrain sound segregation as well as pitch (Micheyl & Oxenham, 2010); see below.
Just as computational theories of pitch remain a matter of debate, so do its neural correlates. One might expect that neurons at some stage of the auditory system would be tuned to stimulus periodicity, and there is one recent report of such tuning in marmosets (Bendor & Wang, 2005). However, comparable results have yet to be reported in other species (Fishman, Reser, et al., 1998), and some have argued that pitch is encoded by ensembles of broadly tuned neurons rather than by single neurons selective for particular fundamental frequencies (Bizley, Walker, et al., 2010). In general, pitch-related responses can be difficult to disentangle from artifactual responses to distortions introduced by the nonlinearities of the cochlea (de Cheveigne, 2010; McAlpine, 2004).
Given the widespread presence of frequency tuning in the auditory system, and the importance of harmonic frequency relations in pitch, sound segregation (Darwin, 1997), and music (McDermott, Lehr, et al., 2010), it is natural to suppose there might be neurons with multipeaked tuning curves selective for harmonic frequencies. There are a few isolated reports of such tuning (Kadia & Wang, 2003; Sutter & Schreiner, 1991), but the tuning peaks do not always correspond to harmonic frequencies, and whether they relate to pitch is unclear. At least given how researchers have looked for it thus far, tuning for harmonicity is not as evident in the auditory system as might be expected.
If pitch is analyzed in a particular part of the brain, one might expect that region to respond more to stimuli with pitch than to those lacking it, other things being equal. Such response properties have in fact been reported in regions of auditory cortex identified with functional imaging in humans (Hall, Barrett, et al., 2005; Patterson, Uppenkamp, et al., 2002; Penagos, Melcher, et al., 2004; Schonwiesner & Zatorre, 2008). The regions are typically reported to lie outside primary auditory cortex, and could conceivably be homologous to the region claimed to contain pitch-tuned neurons in marmosets (Bendor & Wang, 2006), although again there is some controversy over whether pitch per se is implicated (Hall & Plack, 2009). See Winter (2005) and Walker, Bizley, et al. (2010) for recent reviews of the brain basis of pitch.
In many contexts (e.g., the perception of music or speech intonation), it is the changes in pitch over time that matter, rather than the absolute value of the F0. For instance, pitch increases or decreases are what convey the identity of a melody or the intention of a speaker. Less is known about how this relative pitch information is represented in the brain, but the right temporal lobe has been argued to be important, in part on the basis of
Loudness
Loudness is perhaps the most immediate perceptual property of sound, and it has been actively studied for more than 150 years. To first order, loudness is the perceptual correlate of sound intensity. In real-world listening scenarios, however, loudness exhibits additional influences suggesting that it serves to estimate the intensity of a sound source, as opposed to the intensity of the sound entering the ear (which changes with distance and with the listening environment). Nonetheless, loudness models that capture exclusively peripheral processing have considerable predictive power.
For a sound with a fixed spectral profile, such as a pure tone or a broadband noise, the relationship between loudness and intensity can be approximated by the classic Stevens power law (Stevens, 1955). However, the relation between loudness and intensity is not as simple as one might imagine. For instance, loudness increases with increasing bandwidth—a sound whose frequencies span a broad range will seem louder than a sound whose frequencies span a narrow range, even when their physical intensities are equal. Standard models of loudness thus posit something more complex than a simple power law of intensity: that loudness is linearly related to the total amount of neural activity elicited by a stimulus at the level of the auditory nerve (ANSI, 2007; Moore & Glasberg, 1996). The effect of bandwidth on loudness is explained by the compression that occurs in the cochlea: loudness is determined by neural activity summed across nerve fibers, whose spikes are generated after the output of a particular cochlear location is nonlinearly compressed. Because compression boosts low responses relative to high ones, the sum of several responses to low amplitudes (produced by the several frequency channels stimulated by a broadband sound) is greater than a single response to a high amplitude (produced by a single frequency channel responding to a narrowband sound of equal intensity). Loudness also increases with duration, for durations up to half a second or so (Buus, Florentine, et al., 1997), suggesting that it is computed from neural activity integrated over a short window.
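The bandwidth effect just described can be captured with a toy calculation: compress each channel's intensity with a power law, then sum across channels. The four-channel split and the 0.3 exponent are illustrative assumptions, not parameters of the ANSI model:

```python
def loudness_proxy(channel_intensities, exponent=0.3):
    """Toy loudness model: compress each cochlear channel's intensity
    with a power law, then sum. The 0.3 exponent is an assumed
    stand-in for cochlear compression."""
    return sum(i ** exponent for i in channel_intensities)

total = 1.0
narrow = loudness_proxy([total])            # all energy in one channel
broad = loudness_proxy([total / 4] * 4)     # same energy, four channels
print(narrow, broad)  # the broadband split yields the larger sum
```

Because the compressive exponent is below 1, four quarter-intensity responses sum to more than one full-intensity response, mirroring the finding that a broadband sound is louder than a narrowband sound of equal intensity.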
The ability to predict perceived loudness is important in many practical situations, and is a central issue in the fitting of hearing aids. Cochlear compression is typically reduced in hearing-impaired listeners, and amplification runs the risk of making sounds uncomfortably loud unless compression is introduced artificially. There has thus been long-standing interest in quantitative models of loudness.
Sounds that appear more distant seem louder than those that appear closer but have the same overall intensity. Visual cues to distance have some influence on perceived loudness (Mershon, Desaulniers, et al., 1981), but the cue provided by the amount of reverberation also seems to be important. The more distant a source, the weaker the direct sound reaching the listener, relative to the reverberant sound that arrives after reflection off surfaces in the environment (see Figure 8.14). This ratio of direct to reverberant sound appears to be used both to judge distance and to calibrate loudness perception (Zahorik & Wightman, 2001), although how the listener estimates the ratio from the sound signal remains unclear. Loudness thus appears to function somewhat like size or brightness perception in vision, in which perception is not based exclusively on retinal size or light intensity (Adelson, 2000).
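The direct-to-reverberant cue lends itself to a simple sketch: direct energy falls off roughly as the inverse square of distance while reverberant energy stays approximately constant across a room, so the ratio drops by about 6 dB per doubling of distance. The inverse-square assumption and the unit reverberant level are idealizations:

```python
import math

def drr_db(direct_power, reverberant_power):
    """Direct-to-reverberant ratio, expressed in dB."""
    return 10 * math.log10(direct_power / reverberant_power)

# Direct power ~ 1/r^2 with source distance r; reverberant power is
# treated as constant, so DRR falls as the source recedes -- the
# putative cue for judging distance and calibrating loudness.
reverb = 1.0
for r in (1.0, 2.0, 4.0):
    print(f"distance {r} m: DRR = {drr_db(1.0 / r ** 2, reverb):6.1f} dB")
```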
Historically, the "cocktail party problem" has referred to two conceptually distinct problems that in practice are closely related. The first, known as sound segregation, is the problem of deriving representations of individual sound sources from a mixture of sounds. The second is the task of directing attention to one source among many, as when listening to a particular speaker at a party. These tasks are related because the ability to segregate sounds probably depends on attention (Carlyon, Cusack, et al., 2001; Shinn-Cunningham, 2008), although the extent and nature of this dependence remains an active area of study (Macken, Tremblay, et al., 2003). Here, we focus on the first problem, sound segregation, which is usually studied under conditions in which listeners pay full attention to a target sound. Al Bregman, a Canadian psychologist, is typically credited with drawing interest to this problem and pioneering its study (Bregman, 1990).
As with other ill-posed problems, this inference is only possible with the aid of assumptions that constrain the solution. In this case, the assumptions concern the nature of sounds in the world, and are presumably learned from experience with natural sounds (or perhaps hard-wired into the auditory system by evolution).
Grouping cues (i.e., sound properties that dictate whether sound elements are heard as part of the same sound) are examples of these assumptions. For instance, natural sounds that have pitch, such as vocalizations, contain frequencies that are harmonically related, evident as banded structures in the lower half of the spectrogram of the target speaker in Figure 8.13. Harmonically related frequencies are unlikely to arise from the chance alignment of multiple different sounds; when they are present in a mixture, they are therefore likely to be due to the same sound, and are generally heard as such (de Cheveigne, McAdams, et al., 1995; Roberts & Brunstrom, 1998). Conversely, a component that is mistuned (in a tone containing otherwise harmonic frequencies) segregates from the rest of the tone (Moore, Glasberg, et al., 1986). Understanding sound segregation requires understanding the acoustic regularities, such as harmonicity, that characterize natural sound sources and that are used by the auditory system.
Perhaps the most important generic acoustic grouping cue is common onset: frequency components that begin and end at the same time are likely to belong to the same sound. Onset differences, when manipulated experimentally, cause frequency components to perceptually segregate from each other (Cutting, 1975; Darwin, 1981). Interestingly, a component that has an earlier or later onset than the rest of a set of harmonics has reduced influence over the perceived pitch of the entire tone (Darwin & Ciocca, 1992), suggesting that pitch computations operate on frequency components deemed likely to belong together, rather than on the raw acoustic input.
Not every intuitively plausible grouping cue produces a robust effect when assessed psychophysically. For instance, frequency modulation (FM) that is shared ("coherent") across multiple frequency components, as in voiced speech, has been proposed to promote their grouping (Bregman, 1990; McAdams, 1989). However, listeners are poor at discriminating coherent from incoherent FM when the component tones are not harmonically related, indicating that sensitivity to FM coherence may simply be mediated by the deviations from harmonicity that occur when harmonic tones are incoherently modulated (Carlyon, 1991).
One might also think that the task of segregating sounds would be greatly aided by the tendency of distinct sound sources in the world to originate from distinct locations. In practice, spatial cues are indeed of some benefit, for instance in hearing a target sentence from one direction amid distracting utterances from other directions (Bronkhorst, 2000; Hawley, Litovsky, et al., 2004; Ihlefeld & Shinn-Cunningham, 2008; Kidd, Arbogast, et al., 2005). However, spatial cues are surprisingly ineffective at segregating one frequency component from a group of others (Culling & Summerfield, 1995), especially when pitted against other grouping cues such as onset or harmonicity (Darwin & Hukin, 1997). The benefit of listening to a target with a distinct location may thus be due to the ease with which the target can be attentively tracked over time amid competing sound sources, rather than to a facilitation of auditory grouping per se (Darwin & Hukin, 1999). Moreover, humans are usually able to segregate monaural mixtures of sounds without difficulty, demonstrating that spatial separation is often not necessary for high performance. For instance, much popular music of the twentieth century was released in mono, and yet listeners have no trouble distinguishing the many different instruments and voices in any given recording. Spatial cues thus contribute to sound segregation, but their presence or absence does not seem to fundamentally alter the problem.
The weak effect of spatial cues on segregation may reflect their fallibility in complex auditory scenes. Binaural cues can be contaminated when sounds are combined or degraded by reverberation (Brown & Palomaki, 2006), and can even be deceptive, as when caused by echoes (whose direction generally differs from that of the original source). It is possible that the efficacy of different grouping cues in general reflects their reliability in natural conditions. Evaluating this hypothesis will require statistical analysis of natural auditory scenes, an important direction for future research.
Sequential Grouping
Because the spectrogram approximates the input that the cochlea provides to the rest of the auditory system, it is common to view the problem of sound segregation as one of deciding how to group the various parts of the spectrogram (Bregman, 1990). However, the brain does not receive an entire spectrogram at once; the auditory input arrives gradually over time. Many researchers thus distinguish between the problem of simultaneous grouping (determining how the spectral content of a short segment of the auditory input should be segregated) and sequential grouping (determining how the groups from each segment should be linked over time, e.g., to form a speech utterance or a melody) (Bregman, 1990).
Although most of the classic grouping cues (e.g., onset/comodulation, harmonicity, ITD) are quantities that could be measured over short timescales, the boundary between what is simultaneous and what is sequential is unclear for most real-world signals, and it may be more appropriate to view grouping as influenced by processes operating at multiple timescales rather than by two cleanly divided stages of processing. There are, however, contexts in which the bifurcation into simultaneous and sequential grouping stages is natural, as when the auditory input consists of discrete sound elements that do not overlap in time. In such situations, interesting differences are sometimes evident between the grouping of simultaneous and sequential elements. For instance, spatial cues, which are
Another clear case of sequential processing can be found in the effects of sound repetition. Sounds that occur repeatedly in the acoustic input are detected by the auditory system as repeating, and are inferred to be a single source. Perhaps surprisingly, this is true even when the repeating source is embedded in mixtures with other sounds and is never presented in isolation (McDermott, Wrobleski, et al., 2011). In such cases the acoustic input itself does not repeat, but the source repetition induces correlations in the input that the auditory system detects and uses to extract the repeating sound. The informativeness of repetition presumably results from the fact that mixtures of multiple sounds tend not to recur, such that when a structure does repeat, it is likely to be a single source.
Streaming
One type of sequential segregation effect has particularly captured the imagination of the hearing community and merits special mention. When two pure tones of different frequency are repeatedly presented in alternation, listeners commonly report one of two perceptual states: one in which the two repeated tones are heard as a single "stream" whose pitch varies over time, and one in which two distinct streams are heard, one containing the high tones and one the low tones (Bregman & Campbell, 1971). If the frequency separation between the two tones is small and the rate of alternation is slow, one stream is generally heard. When the frequency separation is larger or the rate is faster, two streams tend to be heard, in which case "streaming" is said to occur (van Noorden, 1975).
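Stimuli of this kind are straightforward to synthesize. The sketch below builds an alternating low/high tone sequence of the sort used in these experiments; all durations and frequencies are arbitrary illustrative choices, and real experiments would also ramp tone onsets and offsets to avoid clicks:

```python
import numpy as np

fs = 16000

def tone(freq_hz, dur_s=0.05):
    """A pure tone at freq_hz lasting dur_s seconds."""
    t = np.arange(int(dur_s * fs)) / fs
    return np.sin(2 * np.pi * freq_hz * t)

def alternating_sequence(f_low, f_high, n_pairs=10):
    """Classic streaming stimulus: low and high tones in alternation.
    Small frequency separations at slow rates tend to be heard as one
    stream; large separations or fast rates as two."""
    gap = np.zeros(int(0.05 * fs))
    pair = np.concatenate([tone(f_low), gap, tone(f_high), gap])
    return np.tile(pair, n_pairs)

seq = alternating_sequence(500.0, 800.0)
print(f"{seq.size / fs:.1f} s stimulus")  # 2.0 s
```

Varying `f_high` relative to `f_low`, or shortening the tones and gaps, traverses the one-stream/two-stream boundary that van Noorden (1975) mapped.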
An interesting hallmark of this phenomenon is that when two streams are perceived, judgments of the temporal order of elements in different streams are impaired (Bregman & Campbell, 1971; Micheyl & Oxenham, 2010). This finding provides compelling evidence for a substantive change in the representation underlying the two percepts. Subsequent research has demonstrated that separation along most dimensions of sound can elicit streaming (Moore & Gockel, 2002). The streaming effects in these simple stimuli
Filling in
Although it is common to view sound segregation as the problem of grouping the spectrogram-like output of the cochlea across frequency and time, this cannot be the whole story, in part because large swaths of a sound's time–frequency representation are often physically obscured (masked) by other sources and are thus not physically available to be grouped. Masking is evident in the green pixels of Figure 8.13, which represent points where the target source has substantial energy but where the mixture exceeds it in level. If these points are simply assigned to the target, or omitted from its representation, the target's level at those points will be misconstrued, and the sound potentially misidentified. To recover an accurate estimate of the target source, it is necessary to infer not just the grouping of the energy in the spectrogram but also the structure of the target source in the places where it is masked.
There is in fact considerable evidence that the auditory system does just this, from experiments investigating the perception of partially masked sounds. For instance, tones that are interrupted by noise bursts are "filled in" by the auditory system, such that they are heard as continuous in conditions in which physical continuity is plausible given the stimulus (Warren, Obusek, et al., 1972). This "continuity effect" occurs only when the interrupting noise bursts are sufficiently intense, in the appropriate part of the spectrum, to have masked the tone had it actually been present continuously. Continuity is also heard for frequency glides (Ciocca & Bregman, 1987; Kluender & Jenison, 1992), as well as for oscillating frequency-modulated tones (Carlyon, Micheyl, et al., 2004). The perception of continuity across intermittent maskers was in fact first reported for speech signals interrupted by noise bursts (Warren, 1970). For speech, the effect is often termed phonemic restoration, and it likely indicates that knowledge of speech acoustics (and perhaps of other types of sounds as well) influences the inference of the masked portions of sounds. Similar effects occur in the spectral domain—regions of the spectrum are perceptually filled in when evidence indicates they are likely to have been masked, e.g., by a continuous noise source (McDermott & Oxenham, 2008). Filling-in effects in hearing are conceptually similar to completion under and over occluding surfaces in vision, although the ecological constraints provided by masking (involving the relative intensity of two sounds) are distinct from those provided by occlusion (involving the relative depth of two surfaces). Neurophysiological evidence indicates that the representation of tones in primary auditory cortex reflects the perceived continuity, responding as though the tone were continuously present despite its interruption by noise (Petkov, O'Connor, et al., 2007; Riecke, van Opstal, et al., 2007).
Recent years have seen great interest in how sound segregation is instantiated in the brain. One proposal that has attracted interest is that sounds are heard as segregated when they are represented in non-overlapping neural populations at some stage of the auditory system. This idea derives largely from studies of the pure-tone streaming phenomena described earlier, with the hope that it will extend to more realistic sounds.
The notion is that conditions that cause two tones to be represented in distinct neural
populations are also those that cause sequences of two tones to be heard as separate
streams (Bee & Klump, 2004; Fishman, Arezzo, et al., 2004; Micheyl, Tian, et al., 2005;
Pressnitzer, Sayles, et al., 2008). Because of tonotopy, different frequencies are processed
in neural populations whose degree of overlap decreases as the frequencies become more
separated. Moreover, tones that are more closely spaced in time are more likely to reduce
each other’s response (via what is termed suppression), which also reduces overlap be
tween the tone representations—a tone on the outskirts of a neuron’s receptive field
might be sufficiently suppressed as to not produce a response at all. These two factors,
frequency separation and suppression, predict the two key effects in pure-tone stream
ing: that streaming should increase when tones are more separated in frequency or are
presented more quickly (van Noorden, 1975).
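A minimal sketch of the overlap account, assuming idealized Gaussian tuning curves on a log-frequency axis (the tuning bandwidth, tonotopic range, and neuron count below are illustrative assumptions, not measured values):

```python
import numpy as np

def tuning(best_freqs, stim_freq, bw_octaves=0.5):
    """Responses of a bank of neurons with Gaussian tuning on a log-frequency axis."""
    octave_dist = np.log2(best_freqs / stim_freq)
    return np.exp(-0.5 * (octave_dist / bw_octaves) ** 2)

# Best frequencies tiled uniformly in log frequency (a crude tonotopic axis).
best_freqs = np.logspace(np.log2(200), np.log2(3200), 200, base=2)

def overlap(f_a, f_b):
    """Normalized overlap between the population responses to two tones."""
    ra, rb = tuning(best_freqs, f_a), tuning(best_freqs, f_b)
    return float(np.sum(ra * rb) / np.sqrt(np.sum(ra ** 2) * np.sum(rb ** 2)))

# Overlap shrinks as the tones are pulled apart in frequency, the same
# manipulation that causes alternating tones to split into two streams.
close_pair = overlap(500, 550)   # about 1/7 octave apart
far_pair = overlap(500, 1000)    # one octave apart
print(close_pair > far_pair)
```

Adding suppression between temporally adjacent tones would shrink the overlap further, capturing the second key manipulation (presentation rate).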
Experiments over the past decade in multiple animal species indicate that pure-tone se
quences indeed produce non-overlapping neural responses under conditions in which
streaming is perceived by human listeners (Bee & Klump, 2004; Fishman, Arezzo, et al.,
2004; Micheyl, Tian, et al., 2005; Pressnitzer, Sayles, et al., 2008). Some of these experi
ments take advantage of another notable property of streaming—its strong dependence
on time. Specifically, the probability that listeners report two streams increases with time
from the beginning of the sequence, an effect termed buildup (Bregman, 1978). Buildup
has been linked to neurophysiology via neural adaptation. Because neural responses de
crease with stimulus repetition, over time it becomes less likely that two stimuli with dis
tinct properties will both exceed the spiking threshold for the same neuron, such that the
neural responses to two tones become increasingly segregated on a timescale consistent
with that of perceptual buildup (Micheyl, Tian, et al., 2005; Pressnitzer, Sayles, et al.,
2008). For a comprehensive review of these and related studies, see Snyder and Alain,
2007, and Fishman and Steinschneider, 2010.
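The proposed link between adaptation and buildup can be caricatured in a few lines. The drive values, threshold, and adaptation time constant below are arbitrary illustrative choices; the point is only that a weakly driven neuron tuned between the two tone frequencies adapts below threshold first, so the population responses to the two tones become non-overlapping as the sequence unfolds:

```python
import numpy as np

def adapted_response(drive, n_presentations, tau=4.0):
    """Response to each of n repeated presentations, with exponential adaptation."""
    return drive * np.exp(-np.arange(n_presentations) / tau)

threshold = 0.3  # spiking threshold (arbitrary units)

# A neuron tuned midway between the two tone frequencies is driven weakly by
# both tones; a neuron tuned at one tone's frequency is driven strongly by it.
mid_neuron = adapted_response(drive=0.6, n_presentations=10)
on_freq_neuron = adapted_response(drive=1.0, n_presentations=10)

mid_active = mid_neuron > threshold        # fires to both tones ("one stream")
on_freq_active = on_freq_neuron > threshold

# Early in the sequence both neurons fire, so the responses to the two tones
# overlap; once the mid-tuned neuron adapts below threshold, only on-frequency
# neurons respond and the representations segregate: a toy analog of buildup.
segregation_onset = int(np.argmax(~mid_active))
print(segregation_onset)  # first presentation at which the mid neuron is silent
```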
A curious feature of these studies is that they suggest that streaming is an accidental side
effect of what would appear to be general features of the auditory system—tonotopy, sup
pression, and (p. 161) adaptation. Given that sequential grouping seems likely to be of
great adaptive significance (because it affects our ability to recognize sounds), it would
seem important for an auditory system to behave close to optimally, that is, for the per
ception of one or two streams to be related to the likelihood of one or two streams in the
world. It is thus striking that the phenomenon is proposed to result from apparently inci
dental features of processing. Consistent with this viewpoint, a recent study showed that
synchronous high- and low-frequency tones produce neural responses that are just as
segregated as those for the classic streaming configuration of alternating high and low
tones, even though perceptual segregation does not occur when the tones are synchro
nous (Elhilali, Ma, et al., 2009). This finding indicates that non-overlapping neural re
sponses are not sufficient for perceptual segregation, and that the relative timing of neur
al responses may be more important. The significance of neural overlap thus remains un
clear, and the brain basis of streaming will undoubtedly continue to be debated in the
years to come.
Thus far we have mainly discussed how the auditory system segregates the signals from
multiple sound sources, but listeners face a second important scene analysis problem.
The sound that reaches the ear from a source is almost always altered to some extent by
the surrounding environment, and these environmental influences must be separated
from those of the source if the source content is to be estimated correctly. Typically the
sound produced by a source reflects off multiple surfaces on its way to the ears, such that
the ears receive some sound directly from the source, but also many reflected versions
(Figure 8.14). These reflected versions (echoes) are delayed because their path to the ear
is lengthened, but generally they also have altered frequency spectra because reflective
surfaces absorb some frequencies more than others. Because each reflection can be well
described with a linear filter applied to the source signal, the signal reaching the ear,
which is the sum of the direct sound along with all the reflections, can be described sim
ply as the result of applying a single composite linear filter to the source (Gardner, 1998).
Significant filtering of this sort occurs in almost every natural listening situation, such
that sound produced in anechoic conditions (in which all surfaces are minimally reflec
tive) sounds noticeably strange and unnatural.
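The single-composite-filter description can be verified directly in a toy simulation (the impulse response below is a crude stand-in for a real room measurement, with each reflection reduced to a delay and a scale factor): summing one delayed, scaled copy of the source per reflection gives the same waveform as a single convolution with the composite impulse response.

```python
import numpy as np

fs = 16000
rng = np.random.default_rng(1)
source = rng.standard_normal(fs // 4)    # 250 ms of a stand-in source signal

# A toy room impulse response: a direct path plus exponentially decaying,
# randomly signed reflections (real reflections also filter the spectrum).
ir = np.zeros(fs // 8)
ir[0] = 1.0                                   # direct sound
echo_times = rng.integers(40, ir.size, 30)    # reflection arrival samples
ir[echo_times] += 0.5 * rng.standard_normal(30) * np.exp(-echo_times / (0.02 * fs))

received = np.convolve(source, ir)            # signal reaching the ear

# Equivalently, add up one delayed, scaled copy of the source per reflection:
# the sum reproduces the single composite-filter description exactly.
by_parts = np.zeros(received.size)
for delay in np.flatnonzero(ir):
    by_parts[delay:delay + source.size] += ir[delay] * source

print(np.allclose(received, by_parts))
```

Because convolution is linear, the equivalence holds for any number of reflections, which is why a single linear filter suffices to describe an arbitrarily complicated set of echoes.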
Listeners are often interested in the properties of sound sources, and one might think of
the environmental effects as a nuisance that should simply be discounted. However, envi
ronmental filtering imbues the acoustic input with useful information—for instance, about
the size of a room where sound is produced and the distance of the source from the lis
tener. It is thus more appropriate to think of separating source and environment, at least
to some extent, rather than simply recovering the source. Reverberation is commonly
used in music production, for instance, to create a sense of space or to give a different
feel to particular instruments or voices.
The loudness constancy phenomena discussed earlier are one example of the brain inferring the properties of the sound source as distinct from those of the environment, but there are many others. One of the most interesting involves the treatment of echoes in
sound localization. The echoes that are common in most natural environments pose a
problem for localization because they generally come from directions other than that of
the source (Figure 8.14B). The auditory system appears to solve this problem by percep
tually fusing similar impulsive sounds that occur within a brief interval of each other (on
the order of 10 ms or so), and using the sound that occurs first to determine the per
ceived location. This precedence effect, so called because of the dominance of the sound
that occurs first, was described and named by Hans Wallach (Wallach, Newman, et al.,
1949), one of the great gestalt psychologists, and has since been the subject of a large
and interesting literature. For instance, the maximal delay at which echoes are perceptually suppressed increases as a pair of sounds is repeatedly presented (Freyman, Clifton, et al., 1991), presumably because the repetition provides evidence that the second sound is indeed an echo of the first, rather than being due to a distinct source (in which case it would not occur at a consistent delay following the first sound). Moreover,
reversing the order of presentation can cause an abrupt breakdown of the effect, such
that two sounds are heard rather than one, each with a different location. See Litovsky,
Colburn, et al., 1999, for a review.
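A highly simplified illustration of onset dominance (not a model of how the precedence effect is actually implemented; the click stimuli, delays, and 3-ms onset window are arbitrary choices): when an echo arrives 5 ms after the lead sound from the opposite direction, an interaural cross-correlation restricted to the first wavefront recovers the lead source's interaural time difference.

```python
import numpy as np

fs = 48000
click = np.zeros(64)
click[0] = 1.0

def place(buffer, signal, start):
    """Add `signal` into `buffer` beginning at sample `start`."""
    buffer[start:start + signal.size] += signal

dur = int(0.05 * fs)
left, right = np.zeros(dur), np.zeros(dur)

itd = 12  # samples (~0.25 ms): the lead source sits off to the left
place(left, click, 1000)
place(right, click, 1000 + itd)

# An echo arrives 5 ms later from the opposite direction (reversed ITD).
echo_delay = int(0.005 * fs)
place(left, 0.8 * click, 1000 + echo_delay + itd)
place(right, 0.8 * click, 1000 + echo_delay)

def estimated_lag(l, r):
    """Lag (in samples) maximizing cross-correlation; negative = left ear leads."""
    corr = np.correlate(l, r, mode="full")
    return int(np.argmax(corr)) - (r.size - 1)

# Restricting the estimate to a short window around the first wavefront (a
# crude stand-in for precedence) recovers the lead source's ITD, even though
# the later echo, taken on its own, points the other way.
onset_end = 1000 + int(0.003 * fs)
lag = estimated_lag(left[:onset_end], right[:onset_end])
print(lag == -itd)
```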
Despite its prevalence, reverberation typically has only a modest effect on our ability to recognize speech and other sounds. Recent work indicates that part
of our robustness to reverberation derives from a process that adapts to the history of
echo stimulation. In reverberant conditions, the intelligibility of a speech utterance has
been found to be higher when preceded by another utterance than when not, an effect
that does not occur in anechoic conditions (Brandewie & Zahorik, 2010). Such results,
like those of the precedence effect, are consistent with the idea that listeners construct a
model of the environment’s contribution to the acoustic input and use it to partially dis
count the environment when judging properties of a source. Analogous effects have been
found with nonspeech sounds. When listeners hear instrument sounds preceded by
speech or music that has been passed through a filter that “colors” the spectrum, the in
strument sound is identified differently, as though listeners internalize the filter, assume
it to be an environmental effect, and discount it to some extent when identifying the
sound (Stilp, Alexander, et al., 2010).
One important set of questions concerns the interface of audition with the rest of cognition, via attention and memory. Attention research, ironically, also flourished early on in hearing (with Cherry's [1953] classic dichotic listening studies), but then largely moved to the visual domain. Recent years have seen renewed interest (see chapter 11 in this volume), but many questions remain open. Much is still unclear about what is represented about sound in the absence of attention, about how and what auditory attention selects, and about the role of attention in perceptual organization.
Another promising research area involves working memory. Auditory short-term memory may have some striking differences from its visual counterpart (Demany, Trost, et al., 2008) and appears closely linked to auditory scene analysis (Conway, Cowan, et al., 2001).
(p. 163) Studies of these topics in audition also hold promise for informing us more generally about the structure of cognition—the similarities and differences with respect to visual cognition will reveal much about whether attention and memory mechanisms are domain general (perhaps exploiting central resources) or specific to particular sensory systems.
Interactions between audition and the other senses are also attracting increased interest.
Information from other sensory systems likely plays a crucial role in hearing given that
sound on its own often provides ambiguous information. The sounds produced by rain and
applause, for instance, can in some cases be quite similar, such that multisensory integra
tion (using visual, somatosensory, or olfactory input) may help to correctly recognize the
sound source. Cross-modal interactions in localization (Alais & Burr, 2004) are similarly
powerful. Understanding cross-modal effects within the auditory system (Bizley, Nodal, et
al., 2007; Ghazanfar, 2009; Kayser, Petkov, et al., 2008) and their role in behavior will be a
significant direction of research going forward.
In addition to the uncharted territory in perception and cognition, there remain important
open questions about peripheral processing. Some of these unresolved issues, such as the
mechanisms of outer hair cell function, have great importance for understanding hearing
impairment. Others may dovetail with higher level function. For instance, the role of ef
ferent connections to the cochlea is still uncertain, with some hypothesizing a role in at
tention or segregation (Guinan, 2006). The role of phase locking in frequency encoding
and pitch perception is another basic issue that remains controversial and that has wide
spread relevance to mid-level audition.
As audition continues to evolve as a field, I believe useful guidance will come from a com
putational analysis of the inference problems the auditory system must solve (Marr,
1982). This necessitates thinking about the behavioral demands of real-world listening
situations, as well as the constraints imposed by the way that information about the world
is encoded in a sound signal. Many of these issues are becoming newly accessible with re
cent advances in computational power and signal processing techniques.
For instance, one of the most important tasks a listener must perform with sound is surely that of recognition—determining what in the world caused a sound, be it a particular type of object or a type of event, such as something falling on the floor
(Gaver, 1993; Lutfi, 2008). Recognition is computationally challenging because the same
type of occurrence in the world typically produces a different sound waveform each time
it occurs. A recognition system must generalize across the variation that occurs within
categories, but not the variation that occurs across categories (DiCarlo & Cox, 2007). Re
alizing this computational problem allows us to ask how the auditory system solves it.
One place where these issues have been explored to some extent is speech perception
(Holt & Lotto, 2010). The ideas explored there—about how listeners achieve invariance
across different speakers and infer the state of the vocal apparatus along with the accom
panying intentions of the speaker—could perhaps be extended to audition more generally
(Rosenblum, 2004).
The inference problems of audition can also be better appreciated by examining real-
world sound signals, and formal analysis of these signals seems likely to yield valuable
clues. As discussed in previous sections, statistical analysis of natural sounds has been a
staple of recent computational auditory neuroscience (Harper & McAlpine, 2004; Ro
driguez, Chen, et al., 2010; Smith & Lewicki, 2006), where natural sound statistics have
been used to explain the mechanisms observed in the peripheral auditory system. Howev
er, sound analysis seems likely to provide insight into mid- and high-level auditory prob
lems as well. For instance, the acoustic grouping cues used in sound segregation are al
most surely rooted to some extent in natural sound statistics, and examining such statis
tics could reveal unexpected cues. Similarly, because sound recognition must generalize
across the variability that occurs within sounds produced by a particular type of source,
examining this variability in natural sounds may provide clues to how the auditory system
achieves the appropriate invariance in this domain.
The study of real-world auditory competence will also necessitate measuring auditory
abilities and physiological responses with more realistic sound signals. The tones and
noises that have been the staple of classical psychoacoustics and auditory physiology
have many uses, but also have little in common with many everyday sounds. One chal
lenge of working with realistic signals is that actual recordings of real-world sounds are
often uncontrolled, and typically introduce confounds associated with their familiarity.
Methods of synthesizing novel sounds with naturalistic properties (Cavaco & Lewicki,
2007; McDermott, Wrobleski, et al., 2011; (p. 164) McDermott & Simoncelli, 2011) are thus
likely to be useful experimental tools. Simulations of realistic auditory environments are
also increasingly within reach, with methods for generating three-dimensional auditory
scenes (Wightman & Kistler, 1989; Zahorik, 2009) being used in studies of sound localiza
tion and speech perception in realistic conditions.
We must also consider more realistic auditory behaviors. Hearing does not normally oc
cur while we are seated in a quiet room, listening over headphones, and paying full atten
tion to the acoustic stimulus, but rather in the context of everyday activities in which
sound is a means to some other goal. The need to respect this complexity while maintain
ing sufficient control over experimental conditions presents a challenge, but not one that
is insurmountable. For instance, neurophysiology experiments involving naturalistic be
havior are becoming more common, with preparations being developed that will permit
recordings from freely moving animals engaged in vocalization (Eliades & Wang, 2008) or
locomotion—ultimately, perhaps a real-world cocktail party.
Author Note
I thank Garner Hoyt von Trapp, Sam Norman-Haignere, Michael Schemitsch, and Sara
Steele for helpful comments on earlier drafts of this chapter, the authors who kindly al
lowed me to reproduce their figures (acknowledged individually in the figure captions),
and the Howard Hughes Medical Institute for support.
References
Adelson, E. H. (2000). Lightness perception and lightness illusions. In M. S. Gazzaniga (Ed.), The new cognitive neurosciences (2nd ed., pp. 339–351). Cambridge, MA: MIT Press.
Ahrens, M. B., Linden, J. F., et al. (2008). Nonlinearities and contextual influences in auditory cortical responses modeled with multilinear spectrotemporal methods. Journal of Neuroscience, 28 (8), 1929–1942.
Alain, C., Arnott, S. R., et al. (2001). “What” and “where” in the human auditory system.
Proceedings of the National Academy of Sciences U S A, 98, 12301–12306.
Alais, D., & Burr, D. E. (2004). The ventriloquist effect results from near-optimal bimodal
integration. Current Biology, 14, 257–262.
ANSI. (2007). American national standard procedure for the computation of loudness of steady sounds. ANSI S3.4-2007.
Ashmore, J. (2008). Cochlear outer hair cell motility. Physiological Review, 88, 173–210.
Attias, H., & Schreiner, C. E. (1997). Temporal low-order statistics of natural sounds. In M. Mozer, M. Jordan, & T. Petsche (Eds.), Advances in neural information processing systems 9. Cambridge, MA: MIT Press.
Attneave, F., & Olson, R. K. (1971). Pitch as a medium: A new approach to psychophysical
scaling. American Journal of Psychology, 84 (2), 147–166.
Bacon, S. P., & Grantham, D. W. (1989). Modulation masking: Effects of modulation fre
quency, depth, and phase. Journal of the Acoustical Society of America, 85, 2575–2580.
Banai, K., Hornickel, J., et al. (2009). Reading and subcortical auditory function. Cerebral
Cortex, 19 (11), 2699–2707.
Barbour, D. L., & Wang, X. (2003). Contrast tuning in auditory cortex. Science, 299, 1073–
1075.
Baumann, S., Griffiths, T. D., et al. (2011). Orthogonal representation of sound dimensions
in the primate midbrain. Nature Neuroscience, 14 (4), 423–425.
Bee, M. A., & Klump, G. M. (2004). Primitive auditory stream segregation: A neurophysio
logical study in the songbird forebrain. Journal of Neurophysiology, 92, 1088–1104.
Bee, M. A., & Micheyl, C. (2008). The cocktail party problem: What is it? How can it be
solved? And why should animal behaviorists study it? Journal of Comparative Psychology,
122 (3), 235–251.
Belin, P., Zatorre, R. J., et al. (2000). Voice-selective areas in human auditory cortex. Na
ture, 403, 309–312.
Bendor, D., & Wang, X. (2005). The neuronal representation of pitch in primate auditory cortex. Nature, 436, 1161–1165.
Bendor, D., & Wang, X. (2006). Cortical representations of pitch in monkeys and humans.
Current Opinion in Neurobiology, 16, 391–399.
Bendor, D., & Wang, X. (2008). Neural response properties of primary, rostral, and ros
trotemporal core fields in the auditory cortex of marmoset monkeys. Journal of Neuro
physiology, 100 (2), 888–906.
Bernstein, J. G. W., & Oxenham, A. J. (2005). An autocorrelation model with place depen
dence to account for the effect of harmonic number on fundamental frequency discrimi
nation. Journal of the Acoustical Society of America, 117 (6), 3816–3831.
Bitterman, Y., Mukamel, R., et al. (2008). Ultra-fine frequency tuning revealed in single
neurons of human auditory cortex. Nature, 451 (7175), 197–201.
Bizley, J. K., Nodal, F. R., et al. (2007). Physiological and anatomical evidence for multi
sensory interactions in auditory cortex. Cerebral Cortex, 17, 2172–2189.
Bizley, J. K., Walker, K. M. M., et al. (2009). Interdependent encoding of pitch, timbre, and
spatial location in auditory cortex. Journal of Neuroscience, 29 (7), 2064–2075.
Bizley, J. K., Walker, K. M. M., et al. (2010). Neural ensemble codes for stimulus periodici
ty in auditory cortex. Journal of Neuroscience, 30 (14), 5078–5091.
Boemio, A., Fromm, S., et al. (2005). Hierarchical and asymmetric temporal sensitivity in
human auditory cortices. Nature Neuroscience, 8, 389–395.
Brandewie, E., & Zahorik, P. (2010). Prior listening in rooms improves speech intelligibili
ty. Journal of the Acoustical Society of America, 128, 291–299.
Bregman, A. S., & Campbell, J. (1971). Primary auditory stream segregation and percep
tion of order in rapid sequences of tones. Journal of Experimental Psychology, 89, 244–
249.
Brown, G. J., & Palomaki, K. J. (2006). Reverberation. In D. Wang & G. J. Brown (Eds.), Computational auditory scene analysis: Principles, algorithms, and applications (pp. 209–250). Hoboken, NJ: John Wiley & Sons.
Buus, S., Florentine, M., et al. (1997). Temporal integration of loudness, loudness discrim
ination, and the form of the loudness function. Journal of the Acoustical Society of Ameri
ca, 101, 669–680.
Carcagno, S., & Plack, C. J. (2011). Subcortical plasticity following perceptual learning in
a pitch discrimination task. Journal of the Association for Research in Otolaryngology, 12,
89–100.
Cariani, P. A., & Delgutte, B. (1996). Neural correlates of the pitch of complex tones. I.
Pitch and pitch salience. Journal of Neurophysiology, 76, 1698–1716.
Carlyon, R. P. (2004). How the brain separates sounds. Trends in Cognitive Sciences, 8
(10), 465–471.
Carlyon, R. P., Cusack, R., et al. (2001). Effects of attention and unilateral neglect on auditory stream segregation. Journal of Experimental Psychology: Human Perception and Performance, 27 (1), 115–127.
Carlyon, R. P., Micheyl, C., et al. (2004). Auditory processing of real and illusory changes
in frequency modulation (FM) phase. Journal of the Acoustical Society of America, 116
(6), 3629–3639.
Cavaco, S., & Lewicki, M. S. (2007). Statistical modeling of intrinsic structures in impact
sounds. Journal of the Acoustical Society of America, 121 (6), 3558–3568.
Cedolin, L., & Delgutte, B. (2005). Pitch of complex tones: Rate-place and interspike in
terval representations in the auditory nerve. Journal of Neurophysiology, 94, 347–362.
Cherry, E. C. (1953). Some experiments on the recognition of speech, with one and two
ears. Journal of the Acoustical Society of America, 25 (5), 975–979.
Christianson, G. B., Sahani, M., et al. (2008). The consequences of response nonlineari
ties for interpretation of spectrotemporal receptive fields. Journal of Neuroscience, 28 (2),
446–455.
Ciocca, V., & Bregman, A. S. (1987). Perceived continuity of gliding and steady-state tones
through interrupting noise. Perception & Psychophysics, 42, 476–484.
Cohen, Y. E., Russ, B. E., et al. (2009). A functional role for the ventrolateral prefrontal
cortex in non-spatial auditory cognition. Proceedings of the National Academy of Sciences
U S A, 106, 20045–20050.
Conway, A. R. A., Cowan, N., et al. (2001). The cocktail party phenomenon revisited: The importance of working memory capacity. Psychonomic Bulletin & Review, 8, 331–335.
Culling, J. F., & Akeroyd, M. A. (2010). Spatial hearing. In C. J. Plack (Ed.), The Oxford handbook of auditory science: Hearing (Vol. 3, pp. 123–144). Oxford, UK: Oxford University Press.
Cusack, R., & Carlyon, R. P. (2004). Auditory perceptual organization inside and outside
the laboratory. In J. G. Neuhoff (Ed.), Ecological psychoacoustics (pp. 15–84). San Diego:
Elsevier Academic Press.
Dallos, P. (2008). Cochlear amplification, outer hair cells and prestin. Current Opinion in
Neurobiology, 18, 370–376.
Darrow, K. N., Maison, S. F., et al. (2006). Cochlear efferent feedback balances interaural
sensitivity. Nature Neuroscience, 9 (12), 1474–1476.
Darwin, C. J., & Ciocca, V. (1992). Grouping in pitch perception: Effects of onset asyn
chrony and ear of presentation of a mistuned component. Journal of the Acoustical Soci
ety of America, 91, 3381–3390.
Darwin, C. J., & Hukin, R. W. (1997). Perceptual segregation of a harmonic from a vowel
by interaural time difference and frequency proximity. Journal of the Acoustical Society of
America, 102 (4), 2316–2324.
Darwin, C. J., & Hukin, R. W. (1999). Auditory objects of attention: The role of interaural
time differences. Journal of Experimental Psychology: Human Perception and Perfor
mance, 25 (3), 617–629.
Dau, T., Kollmeier, B., et al. (1997). Modeling auditory processing of amplitude modula
tion. I. Detection and masking with narrow-band carriers. Journal of the Acoustical Soci
ety of America, 102 (5), 2892–2905.
David, S. V., Mesgarani, N., et al. (2009). Rapid synaptic depression explains nonlinear
modulation of spectro-temporal tuning in primary auditory cortex by natural stimuli. Jour
nal of Neuroscience, 29 (11), 3374–3386.
de Cheveigne, A. (2006). Multiple F0 estimation. In D. Wang & G. J. Brown (Eds.), Computational auditory scene analysis: Principles, algorithms, and applications (pp. 45–80). Hoboken, NJ: John Wiley & Sons.
de Cheveigne, A. (2010). Pitch perception. In C. J. Plack (Ed.), The Oxford handbook of auditory science: Hearing (Vol. 3, pp. 71–104). New York: Oxford University Press.
de Cheveigne, A., & Kawahara, H. (2002). YIN, a fundamental frequency estimator for
speech and music. Journal of the Acoustical Society of America, 111, 1917–1930.
de Cheveigne, A., McAdams, S., et al. (1995). Identification of concurrent harmonic and
inharmonic vowels: A test of the theory of harmonic cancellation and enhancement. Jour
nal of the Acoustical Society of America, 97 (6), 3736–3748.
Dean, I., Harper, N. S., et al. (2005). Neural population coding of sound level adapts to
stimulus statistics. Nature Neuroscience, 8 (12), 1684–1689.
Delgutte, B., Joris, P. X., et al. (1999). Receptive fields and binaural interactions for virtu
al-space stimuli in the cat inferior colliculus. Journal of Neurophysiology, 81, 2833–2851.
(p. 166) Demany, L., & Semal, C. (1990). The upper limit of “musical” pitch. Music Perception, 8, 165–176.
Demany, L., Trost, W., et al. (2008). Auditory change detection: Simple sounds are not
memorized better than complex sounds. Psychological Science, 19, 85–91.
Depireux, D. A., Simon, J. Z., et al. (2001). Spectro-temporal response field characteriza
tion with dynamic ripples in ferret primary auditory cortex. Journal of Neurophysiology,
85 (3), 1220–1234.
DiCarlo, J. J., & Cox, D. D. (2007). Untangling invariant object recognition. Trends in Cog
nitive Sciences, 11, 333–341.
Durlach, N. I., Mason, C. R., et al. (2003). Note on informational masking. Journal of the
Acoustical Society of America, 113 (6), 2984–2987.
Elgoyhen, A. B., & Fuchs, P. A. (2010). Efferent innervation and function. In P. A. Fuchs
(Ed.), The Oxford handbook of auditory science: The ear (pp. 283–306). Oxford, UK: Ox
ford University Press.
Elhilali, M., Ma, L., et al. (2009). Temporal coherence in the perceptual organization and
cortical representation of auditory scenes. Neuron, 61, 317–329.
Eliades, S. J., & Wang X. (2008). Neural substrates of vocalization feedback monitoring in
primate auditory cortex. Nature, 453, 1102–1106.
Escabi, M. A., Miller, L. M., et al. (2003). Naturalistic auditory contrast improves spec
trotemporal coding in the cat inferior colliculus. Journal of Neuroscience, 23, 11489–
11504.
Fairhall, A. L., Lewen, G. D., et al. (2001). Efficiency and ambiguity in an adaptive neural
code. Nature, 412, 787–792.
Field, D. J. (1987). Relations between the statistics of natural images and the response
profiles of cortical cells. Journal of the Optical Society of America A, 4, 2379–2394.
Fishman, Y. I., Arezzo, J. C., et al. (2004). Auditory stream segregation in monkey auditory
cortex: Effects of frequency separation, presentation rate, and tone duration. Journal of
the Acoustical Society of America, 116, 1656–1670.
Fishman, Y. I., Reser, D. H., et al. (1998). Pitch vs. spectral encoding of harmonic complex
tones in primary auditory cortex of the awake monkey. Brain Research, 786, 18–30.
Fishman, Y. I., & Steinschneider, M. (2010). Formation of auditory streams. In A. Rees & A. R. Palmer (Eds.), The Oxford handbook of auditory science: The auditory brain (pp. 215–245). Oxford, UK: Oxford University Press.
Formisano, E., Kim, D., et al. (2003). Mirror-symmetric tonotopic maps in human primary
auditory cortex. Neuron, 40 (4), 859–869.
Freyman, R. L., Clifton, R. K., et al. (1991). Dynamic processes in the precedence effect.
Journal of the Acoustical Society of America, 90, 874–884.
Fritz, J. B., David, S. V., et al. (2010). Adaptive, behaviorally gated, persistent encoding of
task-relevant auditory information in ferret frontal cortex. Nature Neuroscience, 13 (8),
1011–1019.
Fritz, J. B., Elhilali, M., et al. (2005). Differential dynamic plasticity of A1 receptive fields
during multiple spectral tasks. Journal of Neuroscience, 25 (33), 7623–7635.
Fritz, J. B., Shamma, S. A., et al. (2003). Rapid task-related plasticity of spectrotemporal
receptive fields in primary auditory cortex. Nature Neuroscience, 6, 1216–1223.
Ghazanfar, A. A. (2009). The multisensory roles for auditory cortex in primate vocal com
munication. Hearing Research, 258, 113–120.
Ghitza, O. (2001). On the upper cutoff frequency of the auditory critical-band envelope
detectors in the context of speech perception. Journal of the Acoustical Society of Ameri
ca, 110 (3), 1628–1640.
Giraud, A., Lorenzi, C., et al. (2000). Representation of the temporal envelope of sounds
in the human brain. Journal of Neurophysiology, 84 (3), 1588–1598.
Glasberg, B. R., & Moore, B. C. J. (1990). Derivation of auditory filter shapes from
notched-noise data. Hearing Research, 47, 103–138.
Goldstein, J. L. (1973). An optimum processor theory for the central formation of the pitch
of complex tones. Journal of the Acoustical Society of America, 54, 1496–1516.
Guinan, J. J. (2006). Olivocochlear efferents: Anatomy, physiology, function, and the mea
surement of efferent effects in humans. Ear and Hearing, 27 (6), 589–607.
Gygi, B., Kidd, G. R., et al. (2004). Spectral-temporal factors in the identification of envi
ronmental sounds. Journal of the Acoustical Society of America, 115 (3), 1252–1265.
Hall, D. A., & Plack, C. J. (2009). Pitch processing sites in the human auditory brain. Cere
bral Cortex, 19 (3), 576–585.
Hall, D. A., Barrett, D. J. K., Akeroyd, M. A., & Summerfield, A. Q. (2005). Cortical repre
sentations of temporal structure in sound. Journal of Neurophysiology, 94 (11), 3181–
3191.
Hall, J. W., Haggard, M. P., et al. (1984). Detection in noise by spectro-temporal pattern
analysis. Journal of the Acoustical Society of America, 76, 50–56.
Harper, N. S., & McAlpine, D. (2004). Optimal neural population coding of an auditory
spatial cue. Nature, 430, 682–686.
Hawley, M. L., Litovsky, R. Y., et al. (2004). The benefit of binaural hearing in a cocktail
party: Effect of location and type of interferer. Journal of the Acoustical Society of Ameri
ca, 115 (2), 833–843.
Heffner, H. E., & Heffner, R. S. (1990). Effect of bilateral auditory cortex lesions on sound
localization in Japanese macaques. Journal of Neurophysiology, 64 (3), 915–931.
Heinz, M. G., Colburn, H. S., et al. (2001). Evaluating auditory performance limits: I. One-
parameter discrimination using a computational model for the auditory nerve. Neural
Computation, 13, 2273–2316.
Higgins, N. C., Storace, D. A., et al. (2010). Specialization of binaural responses in ventral
auditory cortices. Journal of Neuroscience, 30 (43), 14522–14532.
Hofman, P. M., Van Riswick, J. G. A., et al. (1998). Relearning sound localization with new
ears. Nature Neuroscience, 1 (5), 417–421.
Holt, L. L., & Lotto, A. J. (2010). Speech perception as categorization. Attention, Percep
tion, and Psychophysics, 72 (5), 1218–1227.
Houtsma, A. J. M., & Smurzynski, J. (1990). Pitch identification and discrimination for
complex tones with many harmonics. Journal of the Acoustical Society of America, 87 (1),
304–310.
(p. 167) Hsu, A., Woolley, S. M., et al. (2004). Modulation power and phase spectrum of natural sounds enhance neural encoding performed by single auditory neurons. Journal of Neuroscience, 24, 9201–9211.
Humphries, C., Liebenthal, E., et al. (2010). Tonotopic organization of human auditory
cortex. NeuroImage, 50 (3), 1202–1211.
Ihlefeld, A., & Shinn-Cunningham, B. (2008). Spatial release from energetic and informa
tional masking in a divided speech identification task. Journal of the Acoustical Society of
America, 123 (6), 4380–4392.
Javel, E., & Mott, J. B. (1988). Physiological and psychophysical correlates of temporal
processes in hearing. Hearing Research, 34, 275–294.
Jenkins, W. M., & Masterton, R. G. (1982). Sound localization: Effects of unilateral lesions
in central auditory system. Journal of Neurophysiology, 47, 987–1016.
Johnson, D. H. (1980). The relationship between spike rate and synchrony in responses of
auditory-nerve fibers to single tones. Journal of the Acoustical Society of America, 68,
1115–1122.
Johnsrude, I. S., Penhune, V. B., et al. (2000). Functional specificity in the right human au
ditory cortex for perceiving pitch direction. Brain, 123 (1), 155–163.
Page 51 of 62
Audition
Joris, P. X., Bergevin, C., et al. (2011). Frequency selectivity in Old-World monkeys corrob
orates sharp cochlear tuning in humans. Proceedings of the National Academy of
Sciences U S A, 108 (42), 17516–17520.
Kaas, J. H., & Hackett, T. A. (2000). Subdivisions of auditory cortex and processing
streams in primates. Proceedings of the National Academy of Sciences U S A, 97, 11793–
11799.
Kadia, S. C., & Wang, X. (2003). Spectral integration in A1 of awake primates: Neurons
with single and multipeaked tuning characteristics. Journal of Neurophysiology, 89 (3),
1603–1622.
Kanwisher, N. (2010). Functional specificity in the human brain: A window into the func
tional architecture of the mind. Proceedings of the National Academy of Sciences U S A,
107, 11163–11170.
Kawase, T., Delgutte, B., et al. (1993). Anti-masking effects of the olivocochlear reflex. II.
Enhancement of auditory-nerve response to masked tones. Journal of Neurophysiology,
70, 2533–2549.
Kayser, C., Petkov, C. I., et al. (2008). Visual modulation of neurons in auditory cortex.
Cerebral Cortex, 18 (7), 1560–1574.
Kidd, G., Arbogast, T. L., et al. (2005). The advantage of knowing where to listen. Journal
of the Acoustical Society of America, 118 (6), 3804–3815.
Kidd, G., Mason, C. R., et al. (1994). Reducing informational masking by sound segrega
tion. Journal of the Acoustical Society of America, 95 (6), 3475–3480.
Kidd, G., Mason, C. R., et al. (2003). Multiple bursts, multiple looks, and stream coher
ence in the release from informational masking. Journal of the Acoustical Society of Amer
ica, 114 (5), 2835–2845.
Kikuchi, Y., Horwitz, B., et al. (2010). Hierarchical auditory processing directed rostrally
along the monkey’s supratemporal plane. Journal of Neuroscience, 30 (39), 13021–13030.
Kluender, K. R., & Jenison, R. L. (1992). Effects of glide slope, noise intensity, and noise
duration on the extrapolation of FM glides through noise. Perception & Psychophysics,
51, 231–238.
Klump, R. G., & Eady, H. R. (1956). Some measurements of interural time difference
thresholds. Journal of the Acoustical Society of America, 28, 859–860.
Kulkarni, A., & Colburn, H. S. (1998). Role of spectral detail in sound-source localization.
Nature, 396, 747–749.
Page 52 of 62
Audition
Kvale, M., & Schreiner, C. E. (2004). Short-term adaptation of auditory receptive fields to
dynamic stimuli. Journal of Neurophysiology, 91, 604–612.
Langner, G., Sams, M., et al. (1997). Frequency and periodicity are represented in orthog
onal maps in the human auditory cortex: Evidence from magnetoencephalography. Jour
nal of Comparative Physiology, 181, 665–676.
Lerner, Y., Honey, C. J., et al. (2011). Topographic mapping of a hierarchy of temporal re
ceptive windows using a narrated story. Journal of Neuroscience, 31 (8), 2906–2915.
Liberman, M. C. (1982). The cochlear frequency map for the cat: Labeling auditory-nerve
fibers of known characteristic frequency. Journal of the Acoustical Society of America, 72,
1441–1449.
Litovsky, R. Y., Colburn, H. S., et al. (1999). The precedence effect. Journal of the Acousti
cal Society of America, 106, 1633–1654.
Lomber, S. G., & Malhotra, S. (2008). Double dissociation of “what” and “where” process
ing in auditory cortex. Nature Neuroscience, 11 (5), 609–616.
Lorenzi, C., Gilbert, G., et al. (2006). Speech perception problems of the hearing impaired
reflect inability to use temporal fine structure. Proceedings of the National Academy of
Sciences U S A, 103, 18866–18869.
Machens, C. K., M. S. Wehr, et al. (2004). Linearity of cortical receptive fields measured
with natural sounds. Journal of Neuroscience, 24, 1089–1100.
Macken, W. J., Tremblay, S., et al. (2003). Does auditory streaming require attention? Evi
dence from attentional selectivity in short-term memory. Journal of Experimental Psychol
ogy: Human Perception and Performance, 29, 43–51.
Page 53 of 62
Audition
May, B. J., Anderson, M., et al. (2008). The role of broadband inhibition in the rate
(p. 168)
representation of spectral cues for sound localization in the inferior colliculus. Hearing
Research, 238, 77–93.
May, B. J., & McQuone, S. J. (1995). Effects of bilateral olivocochlear lesions on pure-tone
discrimination in cats. Auditory Neuroscience, 1, 385–400.
McDermott, J. H. (2009). The cocktail party problem. Current Biology, 19, R1024–R1027.
McDermott, J. H., Lehr, A. J., et al. (2010). Individual differences reveal the basis of conso
nance. Current Biology, 20, 1035–1041.
McDermott, J. H., & Oxenham, A. J. (2008a). Music perception, pitch, and the auditory
system. Current Opinion in Neurobiology, 18, 452–463.
McDermott, J. H., & Simoncelli, E. P. (2011). Sound texture perception via statistics of the
auditory periphery: Evidence from sound synthesis. Neuron, 71, 926–940.
McDermott, J. H., Wrobleski, D., et al. (2011). Recovering sound sources from embedded
repetition. Proceedings of the National Academy of Sciences U S A, 108 (3), 1188–1193.
Meddis, R., & Hewitt, M. J. (1991). Virtual pitch and phase sensitivity of a computer mod
el of the auditory periphery: Pitch identification. Journal of the Acoustical Society of
America, 89, 2866–2882.
Mershon, D. H., Desaulniers, D. H., et al. (1981). Perceived loudness and visually-deter
mined auditory distance. Perception, 10, 531–543.
Mesgarani, N., David, S. V., et al. (2008). Phoneme representation and classification in
primary auditory cortex. Journal of the Acoustical Society of America, 123 (2), 899–909.
Micheyl, C., & Oxenham, A. J. (2010). Objective and subjective psychophysical measures
of auditory stream integration and segregation. Journal of the Association for Research in
Otolaryngology, 11 (4), 709–724.
Page 54 of 62
Audition
Micheyl, C., & Oxenham, A. J. (2010). Pitch, harmonicity and concurrent sound segrega
tion: Psychoacoustical and neurophysiological findings. Hearing Research, 266, 36–51.
Micheyl, C., Tian, B., et al. (2005). Perceptual organization of tone sequences in the audi
tory cortex of awake macaques. Neuron, 48, 139–148.
Middlebrooks, J. C., & Green, D. M. (1991). Sound localization by human listeners. Annual
Review of Psychology, 42, 135–159.
Miller, L. M., Escabi, M. A., et al. (2001). Spectrotemporal receptive fields in the lemnis
cal auditory thalamus and cortex. Journal of Neurophysiology, 87, 516–527.
Miller, L. M., Escabi, M. A., et al. (2002). Spectrotemporal receptive fields in the lemnis
cal auditory thalamus and cortex. Journal of Neurophysiology, 87, 516–527.
Miller, R. L., Schilling, J. R., et al. (1997). Effects of acoustic trauma on the representation
of the vowel /e/ in cat auditory nerve fibers. Journal of the Acoustical Society of America,
101 (6), 3602–3616.
Moore, B. C. J. (2003). An introduction to the psychology of hearing. San Diego, CA: Acad
emic Press.
Moore, B. C., & Glasberg, B. R. (1996). A revision of Zwicker’s loudness model. Acta Acus
tica, 82 (2), 335–345.
Moore, B. C. J., Glasberg, B. R., et al. (1986). Thresholds for hearing mistuned partials as
separate tones in harmonic complexes. Journal of the Acoustical Society of America, 80,
479–483.
Moore, B. C. J., & Gockel, H. (2002). Factors influencing sequential stream segregation.
Acta Acustica, 88, 320–332.
Moshitch, D., Las, L., et al. (2006). Responses of neurons in primary auditory cortex (A1)
to pure tones in the halothane-anesthetized cat. Journal of Neurophysiology, 95 (6), 3756–
3769.
Page 55 of 62
Audition
Nelken, I., Bizley, J. K., et al. (2008). Responses of auditory cortex to complex stimuli:
Functional organization revealed using intrinsic optical signals. Journal of Neurophysiolo
gy, 99 (4), 1928–1941.
Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties
by learning a sparse code for natural images. Nature, 381, 607–609.
Palmer, A. R., & Russell, I. J. (1986). Phase-locking in the cochlear nerve of the guinea-pig
and its relation to the receptor potential of inner hair-cells. Hearing Research, 24, 1–15.
Patterson, R. D., Uppenkamp, S., et al. (2002). The processing of temporal pitch and
melody information in auditory cortex. Neuron, 36 (4), 767–776.
Penagos, H., Melcher, J. R., et al. (2004). A neural representation of pitch salience in non
primary human auditory cortex revealed with functional magnetic resonance imaging.
Journal of Neuroscience, 24 (30), 6810–6815.
Petkov, C. I., Kayser, C., et al. (2006). Functional imaging reveals numerous fields in the
monkey auditory cortex. PLoS Biology, 4 (7), 1213–1226.
Petkov, C. I., Kayser, C., et al. (2008). A voice region in the monkey brain. Nature Neuro
science, 11, 367–374.
Petkov, C. I., O’Connor, K. N., et al. (2007). Encoding of illusory continuity in primary au
ditory cortex. Neuron, 54, 153–165.
Plack, C. J., & Oxenham, A. J. (2005). The psychophysics of pitch. In C. J. Plack, A. J. Oxen
ham, R. R. Fay, & A. J. Popper (Eds.), Pitch: Neural coding and perception (pp. 7–55). New
York: Springer-Verlag.
Plack, C. J., Oxenham, A. J., et al. (Eds.) (2005). Pitch: Neural coding and perception.
Springer Handbook of Auditory Research. New York: Springer-Verlag.
(p. 169) Poremba, A., Saunders, R. C., et al. (2003). Functional mapping of the primate au
ditory system. Science, 299, 568–572.
Pressnitzer, D., Sayles, M., et al. (2008). Perceptual organization of sound begins in the
auditory periphery. Current Biology, 18, 1124–1128.
Page 56 of 62
Audition
Rajan, R. (2000). Centrifugal pathways protect hearing sensitivity at the cochlea in noisy
environments that exacerbate the damage induced by loud sound. Journal of Neuro
science, 20, 6684–6693.
Rauschecker, J. P., & Tian, B. (2004). Processing of band-passed noise in the lateral audi
tory belt cortex of the rhesus monkey. Journal of Neurophysiology, 91, 2578–2589.
Riecke, L., van Opstal, J., et al. (2007). Hearing illusory sounds in noise: Sensory-
(p. 170)
Roberts, B., & Brunstrom, J. M. (1998). Perceptual segregation and pitch shifts of mis
tuned components in harmonic complexes and in regular inharmonic complexes. Journal
of the Acoustical Society of America, 104 (4), 2326–2338.
Rodriguez, F. A., Chen, C., et al. (2010). Neural modulation tuning characteristics scale to
efficiently encode natural sound statistics. Journal of Neuroscience, 30, 15969–15980.
Romanski, L. M., Tian, B., et al. (1999). Dual streams of auditory afferents target multiple
domains in the primate prefrontal cortex. Nature Neuroscience, 2 (12), 1131–1136.
Rose, J. E., Brugge, J. F., et al. (1967). Phase-locked response to low-frequency tones in
single auditory nerve fibers of the squirrel monkey. Journal of Neurophysiology, 30, 769–
793.
Rosen, S. (1992). Temporal information in speech: Acoustic, auditory and linguistic as
pects. Philosophical Transactions of the Royal Society, London, Series B, 336, 367–373.
Rothschild, G., Nelken, I., et al. (2010). Functional organization and population dynamics
in the mouse primary auditory cortex. Nature Neuroscience, 13 (3), 353–360.
Page 57 of 62
Audition
Rotman, Y., Bar Yosef, O., et al. (2001). Relating cluster and population responses to nat
ural sounds and tonal stimuli in cat primary auditory cortex. Hearing Research, 152, 110–
127.
Ruggero, M. A., & Rich, N. C. (1991). Furosemide alters organ of Corti mechanics: Evi
dence for feedback of outer hair cells upon the basilar membrane. Journal of Neuro
science, 11, 1057–1067.
Ruggero, M. A., Rich, N. C., et al. (1997). Basilar-membrane responses to tones at the
base of the chinchilla cochlea. Journal of the Acoustical Society of America, 101, 2151–
2163.
Samson, F., Zeffiro, T. A., et al. (2011). Stimulus complexity and categorical effects in hu
man auditory cortex: an Activation Likelihood Estimation meta-analysis. Frontiers in Psy
chology, 1, 1–23.
Scharf, B., Magnan, J., et al. (1997). On the role of the olivocochlear bundle in hearing: 16
Case studies. Hearing Research, 102, 101–122.
Schonwiesner, M., & Zatorre, R. J. (2008). Depth electrode recordings show double disso
ciation between pitch processing in lateral Heschl’s gyrus. Experimental Brain Research,
187, 97–105.
Schreiner, C. E., & Urbas, J. V. (1986). Representation of amplitude modulation in the au
ditory cortex of the cat. I. Anterior auditory field. Hearing Research, 21, 227–241.
Schreiner, C. E., & Urbas, J. V. (1988). Representation of amplitude modulation in the au
ditory cortex of the cat. II. Comparison between cortical fields. Hearing Research, 32, 49–
64.
Shackleton, T. M., & Carlyon, R. P. (1994). The role of resolved and unresolved harmonics
in pitch perception and frequency modulation discrimination. Journal of the Acoustical So
ciety of America, 95 (6), 3529–3540.
Shamma, S. A., & Klein, D. (2000). The case of the missing pitch templates: How harmon
ic templates emerge in the early auditory system. Journal of the Acoustical Society of
America, 107, 2631–2644.
Shannon, R. V., Zeng, F. G., et al. (1995). Speech recognition with primarily temporal
cues. Science, 270 (5234), 303–304.
Page 58 of 62
Audition
Shera, C. A., Guinan, J. J., et al. (2002). Revised estimates of human cochlear tuning from
otoacoustic and behavioral measurements. Proceedings of the National Academy of
Sciences U S A, 99 (5), 3318–3323.
Singh, N. C., & Theunissen, F. E. (2003). Modulation spectra of natural sounds and etho
logical theories of auditory processing. Journal of the Acoustical Society of America, 114
(6), 3394–33411.
Skoe, E., & Kraus, N. (2010). Auditory brainstem response to complex sounds: A tutorial.
Ear and Hearing, 31 (3), 302–324.
Smith, E. C., & Lewicki, M. S. (2006). Efficient auditory coding. Nature, 439, 978–982.
Smith, Z. M., Delgutte, B., et al. (2002). Chimaeric sounds reveal dichotomies in auditory
perception. Nature, 416, 87–90.
Snyder, J. S., & Alain, C. (2007). Toward a neurophysiological theory of auditory stream
segregation. Psychological Bulletin, 133 (5), 780–799.
Stilp, C. E., Alexander, J. M., et al. (2010). Auditory color constancy: Calibration to reli
able spectral properties across nonspeech context and targets. Attention, Perception, and
Psychophysics, 72 (2), 470–480.
Sutter, M. L., & Schreiner, C. E. (1991). Physiology and topography of neurons with multi
peaked tuning curves in cat primary auditory cortex. Journal of Neurophysiology, 65,
1207–1226.
Sweet, R. A., Dorph-Petersen, K., et al. (2005). Mapping auditory core, lateral belt, and
parabelt cortices in the human superior temporal gyrus. Journal of Comparative Neurolo
gy, 491, 270–289.
Talavage, T. M., Sereno, M. I., et al. (2004). Tonotopic organization in human auditory cor
tex revealed by progressions of frequency sensitivity. Journal of Neurophysiology, 91,
1282–1296.
Tansley, B. W., & Suffield, J. B. (1983). Time-course of adaptation and recovery of chan
nels selectively sensitive to frequency and amplitude modulation. Journal of the Acousti
cal Society of America, 74, 765–775.
Terhardt, E. (1974). Pitch, consonance, and harmony. Journal of the Acoustical Society of
America, 55, 1061–1069.
Page 59 of 62
Audition
Theunissen, F. E., Sen, K., et al. (2000). Spectral-temporal receptive fields of non-linear
auditory neurons obtained using natural sounds. Journal of Neuroscience, 20, 2315–2331.
Tian, B., & Rauschecker, J. P. (2004). Processing of frequency-modulated sounds in the lat
eral auditory belt cortex of the rhesus monkey. Journal of Neurophysiology, 92, 2993–
3013.
Tian, B., Reser, D., et al. (2001). Functional specialization in rhesus monkey auditory cor
tex. Science, 292, 290–293.
Ulanovsky, N., Las, L., et al. (2003). Processing of low-probability sounds by cortical neu
rons. Nature Neuroscience, 6 (4), 391–398.
Walker, K. M. M., Bizley, J. K., et al. (2010). Cortical encoding of pitch: Recent results and
open questions. Hearing Research, 271 (1–2), 74–87.
Wallace, M. N., Anderson, L. A., et al. (2007). Phase-locked responses to pure tones in the
auditory thalamus. Journal of Neurophysiology, 98 (4), 1941–1952.
Wallach, H., Newman, E. B., et al. (1949). The precedence effect in sound localization.
American Journal of Psychology, 42, 315–336.
Warren, J. D., Zielinski, B. A., et al. (2002). Perception of sound-source motion by the hu
man brain. Neuron, 34, 139–148.
Warren, R. M., Obusek, C. J., et al. (1972). Auditory induction: perceptual synthesis of ab
sent sounds. Science, 176, 1149–1151.
Wightman, F., & Kistler, D. J. (1989). Headphone simulation of free-field listening. II. Psy
chophysical validation. Journal of the Acoustical Society of America, 85 (2), 868–878.
Winslow, R. L., & Sachs, M. B. (1987). Effect of electrical stimulation of the crossed olivo
cochlear bundle on auditory nerve response to tones in noise. Journal of Neurophysiology,
57 (4), 1002–1021.
Page 60 of 62
Audition
Wong, P. C. M., Skoe, E., et al. (2007). Musical experience shapes human brainstem en
coding of linguistic pitch patterns. Nature Neuroscience, 10 (4), 420–422.
Woods, T. M., Lopez, S. E., et al. (2006). Effects of stimulus azimuth and intensity on the
single-neuron activity in the auditory cortex of the alert macaque monkey. Journal of Neu
rophysiology, 96 (6), 3323–3337.
Woolley, S. M., Fremouw, T. E., et al. (2005). Tuning for spectro-temporal modulations as
a mechanism for auditory discrimination of natural sounds. Nature Neuroscience, 8 (10),
1371–1379.
Yates, G. K. (1990). Basilar membrane nonlinearity and its influence on auditory nerve
rate-intensity functions. Hearing Research, 50, 145–162.
Yin, T. C. T., & Kuwada, S. (2010). Binaural localization cues. In A. Rees & A. R. Palmer
(Eds.), The Oxford handbook of auditory science: The auditory brain (pp. 271–302). Ox
ford, UK: Oxford University Press.
Young, E. D. (2010). Level and spectrum. In A. Rees & A. R. Palmer (Eds.), The Oxford
handbook of auditory science: The auditory brain (pp. 93–124). Oxford, UK: Oxford Uni
versity Press.
Zahorik, P., Bangayan, P., et al. (2006). Perceptual recalibration in human sound localiza
tion: Learning to remediate front-back reversals. Journal of the Acoustical Society of
America, 120 (1), 343–359.
Zahorik, P., & Wightman, F. L. (2001). Loudness constancy with varying sound source dis
tance. Nature Neuroscience, 4 (1), 78–83.
Zatorre, R. J., & Belin, P. (2001). Spectral and temporal processing in human auditory cor
tex. Cerebral Cortex, 11, 946–953.
Zatorre, R. J., Belin, P., et al. (2002). Structure and function of auditory cortex: Music and
speech. Trends in Cognitive Sciences, 6 (1), 37–46.
Josh H. McDermott
Neural Correlates of the Development of Speech Perception and Comprehension
The development of auditory language perception proceeds from acoustic features via phonological representations to words and their relations in a sentence. Neurophysiological data indicate that infants discriminate acoustic differences relevant for phoneme categories and word stress patterns by the ages of 2 and 4 months, respectively. Salient acoustic cues that mark prosodic phrase boundaries (e.g., pauses) are also perceived at about the age of 5 months, and infants learn the rules according to which phonemes are legally combined (i.e., phonotactics). At the end of their first year of life, children recognize and produce their first words, and electrophysiological studies suggest that they establish brain mechanisms for acquiring lexical representations similar to those of adults. In their second year of life, children enlarge their lexicon, and electrophysiological data show that 24-month-olds base their processing of semantic relations in sentences on brain mechanisms comparable to those observable in adults. At this age, children are also able to recognize syntactic errors in a sentence, but it takes until 32 months before they display a brain response pattern to syntactic violations similar to that of adults. The development of comprehension of syntactically complex sentences, such as sentences with a noncanonical word order, takes several more years before adult-like processes are established.
Introduction
Language acquisition, with its remarkable speed and high levels of success, remains a mystery. At birth, infants are able to communicate by crying in different ways. From birth on, infants also distinguish the sound of their native language from that of other languages. Following these first language-related steps, there is a fast progression in the development of perceptive and expressive language skills. At about 4 months, babies start to babble, the earliest stage of language production. A mere 12 months after birth, most babies start to speak their first words, and about half a year later, they can even produce short sentences. Finally, at the end of most children’s third year of life, they have acquired at least 500 words and know how to combine them into meaningful utterances. Thus, they have mastered the entry into their native language: They have acquired a complex system with the typical sounds of a language; these sounds are combined in different ways to make up a wide vocabulary; and the vocabulary items are related to each other by means of syntactic rules.
Although developmental research has delivered increasing knowledge about language acquisition (e.g., Clark, 2003; Szagun, 2006), many questions remain. Studying how children acquire language is not easily accomplished because a great deal of learning takes place before the child is able to speak and to show overt responses to what he or she actually perceives. It is a methodological challenge to develop ways to investigate whether infants know a particular word before they can demonstrate this by producing it. The measurement of infants’ brain responses to language input can help to provide information about speech perception abilities early in life and, moreover, allows us to describe the neural basis of language perception and comprehension during development.
temporal resolution. In contrast to EEG, however, it also provides reliable spatial information about the localization of the currents responsible for the magnetic field sources. For example, an MEG study with newborns revealed infants’ instant ability to discriminate speech sounds and, moreover, reliably located this process in the auditory cortices (Kujala et al., 2004). Because movement restrictions limit the use of MEG in developmental research, this method has been primarily applied to sleeping newborns (e.g., Kujala et al., 2004; Sambeth, Ruohio, Alku, Fellman, & Huotilainen, 2008). However, the use of additional head-tracking techniques considerably expands its field of application (e.g., Imada et al., 2006).
Another method that registers the metabolic demand due to neural signaling is functional magnetic resonance imaging (fMRI). Here, the resulting changes in oxygenated hemoglobin are measured as blood-oxygen-level-dependent (BOLD) contrast. The temporal resolution of this method is considerably lower than with EEG/MEG, but its spatial resolution is excellent. Thus, fMRI studies provide information about the localization of sensory and cognitive processes, not only in surface layers of the cortex as with fNIRS, but also in deeper cortical and subcortical structures. So far, this measurement has been primarily applied with infants while they were asleep in the scanner (e.g., Dehaene-Lambertz, Dehaene, & Hertz-Pannier, 2002; Dehaene-Lambertz et al., 2010; Perani et al., 2010). The limited number of developmental fMRI studies may be due to both practical issues and methodological considerations. Movement restrictions during brain scanning make it difficult to work with children in the scanner. Moreover, there is an ongoing discussion about whether the BOLD signal in children is comparable to that in adults and whether the adult models applied so far are appropriate for developmental research (see, e.g., Muzik, Chugani, Juhasz, Shen, & Chugani, 2000; Rivkin et al., 2004; Schapiro et al., 2004). To address the latter problem, recent fMRI studies in infants and young children have used age-specific brain templates (Dehaene-Lambertz, Dehaene, & Hertz-Pannier, 2002) and optimized template construction for developmental populations (Fonov et al., 2011; Wilke, Holland, Altaye, & Gaser, 2008).
Although there are only a few studies that have applied the latter methods with infants so far, their findings strongly point to neural dispositions for language in the infant brain. In an fNIRS study with sleeping newborns, Pena et al. (2003) observed a left hemispheric dominance in temporal areas for normal speech compared with backward speech. In an fMRI experiment with 3-month-olds, Dehaene-Lambertz, Dehaene, and Hertz-Pannier (2002) also found that speech sounds, collapsed across forward and backward speech, evoked strong left hemispheric activation in the superior temporal gyrus compared with silence. This activation included Heschl’s gyrus and extended to the superior temporal sulcus and the temporal pole (Figure 9.1). Activation differences between forward and backward speech were observed in the left angular gyrus and the precuneus. An additional right frontal brain activation occurred only in infants who were awake and was interpreted as reflecting attentional factors.

In a second analysis of the fMRI data from 3-month-olds, Dehaene-Lambertz et al. (2006) found that the temporal sequence of left hemispheric activations in the different brain areas was similar to adult patterns, with the activation in the auditory cortex preceding activation both in the most posterior and anterior parts of the temporal cortex and in Broca’s area. The reported early left hemisphere specialization has been shown to be speech-specific (Dehaene-Lambertz et al., 2010) and, in addition, appears more pronounced for native relative to non-native language input (Minagawa-Kawai et al., 2011). Specifically, 2-month-olds showed stronger fMRI activation in the left posterior temporal lobe in response to language than to music stimuli (Dehaene-Lambertz et al., 2010). In an fNIRS study with 4-month-olds, Minagawa-Kawai et al. (2011) reported stronger responses in left temporal regions for speech relative to three nonspeech conditions, with native stimuli revealing stronger lateralization patterns than non-native stimuli. Left hemispheric superior temporal activation has also been reported as a discriminative response to syllables in a recent MEG study with neonates, 6-month-olds, and 12-month-olds (Imada et al., 2006). Interestingly, 6- and 12-month-olds additionally showed activation patterns in inferior frontal regions, but newborns did not yet do so. Together with the results in 3-month-olds (Dehaene-Lambertz et al., 2006), these findings indicate developmental changes in motor speech areas (Broca’s area) at an age when infants produce their first syllable sequences and words.
Thus, it appears that early during development there is a dominance of the left hemisphere for the processing of speech, and particularly for native language stimuli, in which both segmental and suprasegmental information is intact, compared with, for example, backward speech, in which this information is altered. However, the processing of more fine-grained phonological features and language-specific contrasts may lateralize later during development, once children have advanced from general perceptual abilities to attunement to their native language (e.g., Minagawa-Kawai, Mori, Naoi, & Kojima, 2007).

Given these early neuroimaging findings, it seems that the functional neural network on which language is based, with a left-hemispheric dominance for speech over nonspeech and a right-hemispheric dominance for prosody (Friederici & Alter, 2004; Hickok & Poeppel, 2007), is, in principle, established during the first 3 months of life. However, it appears that neither the specialization of the language-related areas (e.g., Brauer & Friederici, 2007; Minagawa-Kawai, Mori, Naoi, & Kojima, 2007) nor all of their structural connections are fully developed from early on (Brauer, Anwander, & Friederici, 2011; Dubois et al., 2008).
With respect to the first developmental path, neurophysiological data indicate that acoustic differences in phonemes and word stress patterns are detected by the age of 2 to 4 months. At the end of their first year, children recognize and produce their first words, and ERP studies suggest that infants have established brain mechanisms necessary to acquire lexical representations in a similar way to adults, although these are still less specific. In their second year, children enlarge their lexicon, and ERP data show that 24-month-olds process semantic relations between nouns and verbs in sentences. These processes resemble those in adults, as indicated by children at this age already displaying an adult-like N400 component reflecting semantic processes.

With respect to the second developmental path, developmental studies show that salient acoustic cues that mark prosodic phrase boundaries (e.g., pauses) are perceived at about the age of 5 months, although it takes some time before less salient cues (e.g., changes in the pitch contour) can be used to identify prosodic boundaries that divide the speech stream into lexical and syntactic units. Electrophysiological data suggest that the processing of prosodic phrase structure, reflected by the closure positive shift (CPS), evolves with the emerging ability to process syntactic phrase structure. At the age of 2 years, children are able to recognize syntactic errors in a sentence, reflected by the P600 component. However, the fast automatic syntactic phrase structure building processes, indicated by the early left anterior negativity (ELAN) in addition to the P600, do not occur before the age of 32 months. The ability to comprehend syntactically complex sentences, such as those with noncanonical word order (e.g., passive sentences, object-first sentences), takes a few more years until adult-like processes are established. Diffusion-tensor imaging data suggest that this progression depends on the maturation of the fiber bundles that connect the language-relevant brain areas in the inferior frontal gyrus (Broca’s area) and the superior temporal gyrus (posterior portion). Figure 9.2 gives an overview of the outlined developmental steps in language acquisition, and the following sections describe the related empirical evidence in more detail.
On the path from sounds to words, infants initially start to process phonological information that makes up the actual speech sounds (phonemes) and the rules according to which these sounds are combined (phonotactic rules). Soon, they process prosodic stress patterns of words, which help them to recognize lexical units in the speech stream. These information types are accessed before the processing of word meaning.
Phoneme Characteristics
As one crucial step in language acquisition, infants have to tackle the basic speech
sounds of their native language. The smallest sound units of a language, phonemes, are contrastive with one another. In a given language, a circumscribed set of approximately 40 phonemes can be combined in different ways to form unique words. Thus, the meaning of a word changes when one of its component phonemes is exchanged for another, as in the change from cat to pat.
For example, Friederici, Friedrich, and Weber (2002) investigated infants’ ability to dis
criminate between different vowel lengths in phonemes at the age of 2 months. Infants
were presented with two syllables of different duration, /ba:/(baa) versus /ba/(ba), in an
MMR paradigm. Two separate sessions tested the long syllable /ba:/(baa) as deviant in a
stream of short syllable /ba/(ba) standards, and short /ba/(ba) as deviant in a stream of
long /ba:/(baa) standards.
In Figure 9.3, the ERP difference waves display a positivity with a frontal maximum at about 500 ms post-syllable onset for deviant processing. However, this positivity was only
present for the deviancy detection of the long syllable in a stream of short syllables but
not vice versa, which can be explained by the greater perceptual saliency of a larger ele
ment in the context of smaller elements. In adults, the same experimental setting evoked
a pronounced negative deflection at about 200 ms post-stimulus onset in the difference
wave, the typical MMN response to acoustically deviating stimuli. Interestingly, in in
fants, the response varied depending on their state of alertness; children who were in qui
et sleep during the experiment showed only a positivity, while children who (p. 177) were
awake showed an adult-like MMN in addition to the positivity. From the data, it follows
that infants as young as 2 months of age are able to discriminate long syllables from short
syllables and that they display a positivity in the ERP as MMR.
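The analysis logic behind such mismatch responses, averaging epochs per condition and subtracting the standard from the deviant average, then taking the mean amplitude in a latency window, can be sketched in a few lines. This is an illustrative sketch only: the sampling rate, channel count, latency windows, and toy data below are assumptions for demonstration, not details of the study described above.

```python
import numpy as np

# Illustrative sketch of a mismatch-response (MMR) analysis:
# average epochs per condition, form the deviant-minus-standard
# difference wave, and quantify it in a latency window.
FS = 250  # sampling rate in Hz (assumed for illustration)

def difference_wave(standard_epochs, deviant_epochs):
    """Each input: array of shape (n_trials, n_samples) for one channel."""
    erp_standard = standard_epochs.mean(axis=0)  # average over trials
    erp_deviant = deviant_epochs.mean(axis=0)
    return erp_deviant - erp_standard  # MMR difference wave

def mean_amplitude(wave, t_start, t_end):
    """Mean amplitude in a latency window (seconds after stimulus onset)."""
    i0, i1 = int(t_start * FS), int(t_end * FS)
    return wave[i0:i1].mean()

# Toy data: deviant epochs carry an extra positivity around 500 ms,
# loosely mimicking the frontal positivity reported for 2-month-olds.
rng = np.random.default_rng(0)
t = np.arange(0, 1.0, 1 / FS)
standard = rng.normal(0.0, 1.0, (100, t.size))
deviant = rng.normal(0.0, 1.0, (100, t.size)) + 3.0 * np.exp(-((t - 0.5) ** 2) / 0.01)

dw = difference_wave(standard, deviant)
print(round(mean_amplitude(dw, 0.4, 0.6), 2))  # positive MMR in the window
```

Averaging over trials is what makes the small stimulus-locked response visible against ongoing background activity; the window-based mean amplitude is one common way such effects are quantified statistically.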
Word Stress
Another important phonological feature that infants have to discover and make use of
during language acquisition is the rule according to which stress is applied to multisyllab
ic words. For example, English, like German, is a stress-based language and has a bias to
ward a stress-initial pattern for bisyllabic words (Cutler & Carter, 1987). French, in con
trast, is a syllable-based language that tends to lengthen the word’s last syllable (Nazzi,
Iakimova, Bertoncini, Frédonie, & Alcantara, 2006).
Behaviorally, it has been shown that even newborns discriminate differently stressed
pseudowords (Sansavini, Bertoncini, & Giovanelli, 1997) and that between 6 and 9
months, infants acquire language-specific knowledge about the stress pattern of possible
word candidates (Jusczyk, Cutler, & Redanz, 1993; Höhle, Bijeljac-Babic, Nazzi, Herold, &
Weissenborn, 2009; Skoruppa et al., 2009). Interestingly, studies revealed that stress pattern discrimination at 6 months is shaped by language experience, as German-learning (p. 178) but not French-learning infants distinguish between stress-initial and stress-final patterns.
Neuroimaging studies that do not require infants’ attention during testing suggest that
infants are sensitive to the predominant stress pattern of their target language as early
as 4 to 5 months of age (Friederici, Friedrich, & Christophe, 2007; Weber, Hahne,
Friedrich, & Friederici, 2004). In an ERP study, Friederici, Friedrich, and Christophe
(2007) tested two groups of 4- to 5-month-old German- and French-learning infants for
their ability to discriminate between different stress patterns. In a mismatch paradigm,
the standard stimuli were bisyllabic pseudowords with stress on the first syllable (baaba),
whereas the deviant stimuli had stress on the second syllable (babaa). The data revealed
that both groups are able to discriminate between the two types of stress patterns (Fig
ure 9.4). However, results differed with respect to the amplitude of the MMR: Infants
learning German showed a larger effect for the language-nontypical iambic pattern
(stress on the second syllable), whereas infants learning French demonstrated a larger ef
fect for the language-nontypical trochaic pattern (stress on the first syllable). These re
sults suggest that the respective nontypical stress pattern is considered deviant both
within the experiment (i.e., rare stimulus in the set) and with respect to an individual
infant’s native language. This finding, in turn, presupposes that infants have established
knowledge about the predominant stress pattern of their target language by the age of 5
months.
Phonotactics
For successful language learning, infants eventually need to acquire the rules according
to which phonemes may be combined to form a word in a given language. As infants be
come more familiar with actual speech sounds, they gain probabilistic knowledge about
particular phonotactic rules. This also includes information about which phonemes or
phoneme combinations can legitimately appear at word onsets and offsets. If infants ac
quire this kind of information early on, it can support the detection of lexical units in con
tinuous speech and thus facilitate the learning of new words.
Behaviorally, it has been shown that phonotactic knowledge about word onsets and off
sets is present and used for speech segmentation at the age of 9 months, but is not yet
present at 6 months of age (Friederici & Wessels, 1993; Jusczyk, Friederici, Wessels,
Svenkerud, & Jusczyk, 1993). In ERP studies, the N400 component can serve as an elec
trophysiological marker for studying phonotactic knowledge by enabling the comparison
of brain responses to nonwords that follow the phonotactic rules of a given language and
nonsense words that do not. The N400 component is known to indicate lexical (word
form) and semantic (meaning) processes and is interpreted to mark the effort to integrate
an event into its current or long-term context, with more pronounced N400 amplitudes in
dicating lexically and semantically unfamiliar or unexpected events (Holcomb, 1993; Ku
tas & Van Petten, 1994). Regarding phonotactic processing in adults, ERP studies re
vealed larger N400 amplitudes for pseudowords (i.e., (p. 180) phonotactically legal but nonexistent in the lexicon) than for real words. In contrast, nonwords (i.e., phonotactically
illegal words) did not elicit an N400 response (e.g., Bentin, Mouchetant-Rostaing, Giard,
Echallier, & Pernier, 1999; Holcomb, 1993; Nobre & McCarthy, 1994). This suggests that
pseudowords trigger search processes for possible lexical entries, but this search fails be
cause pseudowords do not exist in the lexicon. Nonwords, in contrast, do not initiate a
similar search response because they are not even treated as possible lexical entries as
they already violate the phonotactic rules.
Phonological Familiarity
Infants’ emerging efforts to map sounds onto objects (or pictures of objects) have been captured in an additional ERP effect. An ERP study with 11-month-olds suggested a dif
ferential brain response to known compared with unknown words in the form of a nega
tivity at about 200 ms after word onset, which could be viewed as a familiarity effect
(Thierry, Vihman, & Roberts, 2003). Using a picture–word priming paradigm with 12- and
14-month-olds, Friedrich and Friederici (2005b) observed an early frontocentral negativi
ty between 100 and 400 ms for auditory word targets that matched the picture compared
with nonmatching words. This early effect was interpreted as a familiarity effect reflect
ing the fulfillment of a phonological (word) expectation after seeing the picture of an ob
ject. At this age, infants seem to have some lexical knowledge, but the specific word form
referring to a given object may not yet be sharply defined. This interpretation is support
ed by the finding that 14-month-olds show an ERP difference between known words and phonetically dissimilar nonwords, but not between known words and phonetically similar nonwords (Mills et al., 2004). The available data thus indicate that phonological informa
tion and semantic knowledge interact at about 14 months of age.
Word Meaning
As described above, the adult N400 component reflects the integration of a lexical ele
ment into a semantic context (Holcomb, 1993; Kutas & Van Petten, 1994) and can be used
as an ERP template (p. 181) against which the ERPs for lexical-semantic processing during
early development are compared.
Mills, Coffey-Corina, and Neville (1997) investigated infants’ processing of words whose
meaning they knew or did not know. Infants between 13 and 17 months of age showed a
bilateral negativity for unknown words, whereas 20-month-olds showed a left-hemispher
ic negativity, which was interpreted as a developmental change toward a hemispheric
specialization for word processing (see also Mills et al., 2004). In a more recent ERP
study, Mills and colleagues tested the effect of word experience (training) and vocabulary
size (word production) on lexical processing (Mills, Plunkett, Prat, & Schafer, 2005). In
this word-learning paradigm, 20-month-olds acquired novel words either paired with a
novel object or without an object. After training, the infant ERPs showed a repetition ef
fect indicated by a reduced N200-500 amplitude to familiar and novel unpaired words,
whereas an increased bilaterally distributed N200-500 was found for novel paired words.
This finding indicates that the N200-500 is linked to word meaning; however, it is not en
tirely clear whether the N200-500 reflects semantic processes alone or whether phono
logical familiarity also plays a role. Assuming that this early ERP effect indeed reflects se
mantic processing, its early onset may be explained by infants’ relatively small vocabular
ies. A small vocabulary results in a low number of phonologically possible alternative
word forms, allowing the brain to respond soon after hearing a word’s first phonemes
(see earlier section on phonological familiarity).
A clear semantic-context N400 effect at the word level has been observed for 14- and 19-
month-olds, but not yet for 12-month-olds (Friedrich & Friederici, 2004, 2005a, 2005b).
The ERP to words in picture contexts showed a central-parietal, bilaterally distributed
negative-going wave between 400 and 1400 ms, which was more negative for words that
did not match the picture context than those that did (Figure 9.6). Compared with adults,
this N400-like effect reached significance later and lasted longer, which suggests slower
lexical-semantic processes in children. There were also small topographic differences in the effect, as children showed a stronger involvement of frontal electrode sites than adults did. The more frontal distribution could either mean that children’s semantic process
es are still more image-based (see frontal distribution in adults for picture instead of
word processing; West & Holcomb, 2002) or that children recruit frontal brain regions
that, in adults, are associated with attention (Courchesne, 1990) and increased demands
on language processing (Brauer & Friederici, 2007). In a recent study, Friedrich and
Friederici (2010) found that the emergence of the N400 is not merely age dependent but
also related to the infants’ state of behavioral language development. Twelve-month-olds who obtained a high early word production score already displayed an N400 semantic priming effect, whereas infants with lower vocabulary scores did not.
There are few fMRI studies with children investigating lexical-semantic processes at the word level. These studies suggest that a neural network for the processing of words and their meaning, similar to that seen in adults, is established by the age of 5 years. For example,
one study used a semantic categorization task with 5- to 10-year-old children and ob
served activation in the left inferior frontal gyrus and the temporal region as well as in
the left fusiform gyrus, suggesting a left hemispheric language network similar to that in
adults (Balsamo, Xu, & Gaillard, 2006). Another study used a semantic judgment task
requiring the evaluation of the semantic relatedness of two auditorily presented words
(Chou et al., 2006). During this task, 9- to 15-year-olds showed activation in the temporal
gyrus and in the inferior frontal gyri bilaterally. In a recent fMRI language mapping study
with 8- to 17-year-old children, de Guibert et al. (2010) applied two auditory lexical-se
mantic tasks and two visual phonological tasks and observed selective activations in left
frontal and temporal regions.
In summary, in the developmental ERP studies on semantic processing at the word level
we have introduced, two ERP effects have been observed. First, an early negativity in re
sponse to picture-matching words has been found even in 12-month-olds (p. 182) and can
be interpreted as a phonological familiarity effect. Second, a later central-parietal nega
tivity for nonmatching words has been observed in 14- and 19-month-olds, an effect re
ferred to as infant N400. The occurrence of a phonological familiarity effect across all age
groups suggests that not only 14- and 19-month-olds but also 12-month-olds create lexical
expectations from picture contents, revealing that they already possess some lexical-se
mantic knowledge. However, infants at the latter age do not yet display the N400 semantic expectancy violation effect that is present in 14-month-olds, which indicates that the neural mechanisms underlying the N400 mature between 12 and 14 months of age. Furthermore, at the
end of their second year, toddlers are sensitive to semantic category relations and seman
tic relatedness of basic-level words. The finding that the N400 at this age still differs in
latency and distribution from the adult N400 suggests that the underlying brain systems
are still under development. The fact, however, that an N400 effect is present at this age
implies that this ERP component is a useful tool to further investigate semantic process
ing in young children. In this context, developmental fMRI studies have revealed left-
hemisphere activation patterns for lexical-semantic processes that resemble those of
adults. Direct developmental comparisons, however, suggest age-related activation in
creases in left inferior frontal regions and left superior temporal regions, indicating
greater lexical control and experience-based gain in lexical representations, respectively
(Booth et al., 2004; Cao et al., 2010; Schlaggar et al., 2002).
From Sounds to Sentences
On the path from sounds to sentences, prosodic information plays a central role. Senten
tial prosody is crucial for the acquisition of syntactic structure because different acoustic
cues that in combination mark prosodic phrase boundaries often signal syntactic phrase
boundaries. The detection and processing of prosodic phrase boundaries thus facilitate
the segmentation of linguistically relevant units from continuous speech and provide an
easy entry into later lexical and syntactic learning (see Gleitman & Wanner, 1982).
Sentence-Level Prosody
Intonational phrase boundaries (IPBs) mark the largest units in phrasal prosody, roughly
(p. 183) corresponding to syntactic clauses, and are characterized by several acoustic
cues, namely, preboundary lengthening, pitch change, and pausing (Selkirk, 1984). Be
haviorally, it has been shown that adult listeners make use of prosodic boundaries in the
interpretation of spoken utterances (e.g., Schafer, Speer, Warren, & White, 2000). Simi
larly, developmental studies indicate that infants perceive larger linguistic units in contin
uous speech based on prosodic boundary cues. Although 6-month-old English-learning infants detect clauses in continuous speech, they cannot yet reliably identify syntactic phrases (Nazzi, Kemler Nelson, Jusczyk, & Jusczyk, 2000; Seidl,
2007; Soderstrom, Nelson, & Jusczyk, 2005; Soderstrom, Seidl, Nelson, & Jusczyk, 2003).
In contrast, 9-month-olds demonstrate this ability at both clause and phrase level (Soder
strom et al., 2003). Thus, the perception of prosodic cues that, in combination, signal
boundaries appears to be essential for the structuring of the incoming speech signal and
enables further speech analyses.
In adult ERP studies, the offset of IPBs is associated with a positive-going deflection with
a central-parietal distribution, the CPS (Pannekamp, Toepel, Alter, Hahne, & Friederici,
2005; Steinhauer, Alter, & Friederici, 1999). This component has been interpreted as an
indicator of the closure of prosodic phrases by IPBs. The CPS has been shown to be not a
mere reaction to the acoustically salient pause (lower-level processing), but rather an in
dex for the underlying linguistic process of prosodic structure perception (higher-level
processing) because it is still present when the pause is deleted (Steinhauer, Alter, &
Friederici, 1999).
In infants, ERP responses to intonational phrase boundaries depended on the presence of the boundary pause. In contrast, adults showed a CPS in addition to obligatory ERP responses independent of the presence of the boundary pause (see also Steinhauer, Alter, & Friederici,
1999). The developmental comparison indicates that infants are sensitive to salient
acoustic cues such as pauses in the speech input, and that they process speech interrup
tions at lower perceptual levels. However, they do not yet show higher-level processing of
combined prosodic boundary cues, reflected by the CPS.
ERP studies in older children have examined when the processes associated with the CPS emerge during language learning by exploring the relationship between prosodic boundary perception and syntactic knowledge (Männel & Friederici, 2011). ERP studies on the pro
cessing of phrase structure violations have revealed a developmental shift between
children’s second and third year (Oberecker, Friedrich, & Friederici, 2005; Oberecker &
Friederici, 2006; see below). Accordingly, children were tested on IPB processing before
this developmental phase, at 21 months, and after this phase, at 3 and 6 years of age. As
can be seen from Figure 9.7, 21-month-olds do not yet show a positive shift in response to IPBs, whereas 3- and 6-year-olds do. These results indicate that prosodic structure pro
cessing, as indicated by the CPS, does not emerge until some knowledge of syntactic
phrase structure has been established.
The combined ERP findings on prosodic processing in infants and children suggest that during early stages of language acquisition, infants rely on salient acoustic aspects of prosodic information that likely contribute to the recognition of prosodic boundaries. Children may initially detect prosodic breaks through lower-level processing mechanisms until continued language experience establishes a degree of syntactic structure knowledge that, in turn, reinforces their ability to perceive prosodic phrasing at a cognitive level.
The use of prosodic boundary cues for later language learning has been shown in lexical
acquisition (Gout, Christophe, & Morgan, 2004; Seidl & Johnson, 2007), and in the acqui
sition of word order regularities (Höhle, Weissenborn, Schmitz, & Ischebeck, 2001). Thus,
from a developmental perspective, the initial analysis and segmentation of larger linguis
tically relevant units based on prosodic boundary cues seems to be particularly important
during language acquisition and likely facilitates bootstrapping into smaller syntactic and
lexical units in the speech signal later in children’s development.
(p. 184)
Sentence-Level Semantics
Sentence processing requires not only the identification of linguistic units but also the
maintenance of the related information in working memory and the integration of differ
ent information over time. To understand the meaning of a sentence, the listener has to
possess semantic knowledge about nouns and verbs as well as their respective relation
ship (for neural correlates of developmental differences between noun and verb process
ing, see Li, Shu, Liu, & Li, 2006; Mestres-Misse, Rodriguez-Fornells, & Münte, 2010; Tan
& Molfese, 2009). To investigate whether children already process word meaning and se
mantic relations in sentential context, the semantic violation paradigm can be applied
with semantically correct and incorrect sentences such as The king was murdered and
The honey was murdered, respectively (Friederici, Pfeifer, & Hahne, 1993; Hahne &
Friederici, 2002). This paradigm uses the N400 as an index of semantic integration abili
ties, with larger N400 amplitudes for higher integration efforts of semantically inappro
priate words into their context. The semantic expectation for a possible sentence ending, for example, is violated in The honey was murdered because the verb at the end of the sentence (murdered) does not semantically fit the expectation set up by the noun at the beginning (honey). In adult ERP studies, an N400 has been found in response to
such semantically unexpected sentence endings (Friederici, Pfeifer, & Hahne, 1993;
Hahne & Friederici, 2002).
Friedrich and Friederici (2005c) studied the ERP responses to semantically correct and
incorrect sentences in 19- and 24-month-old children. Semantically incorrect sentences
contained objects that violated the selection restrictions of the preceding verb, as in The
cat drinks the ball in contrast to The child rolls the ball. For both age groups, the sen
tence endings of semantically incorrect sentences evoked N400-like effects in the ERP,
with a maximum at central-parietal electrode sites (Figure 9.8). In comparison to the
adult data, the negativities in children started at about the same time (i.e., at around 400
ms post-word onset) but were longer lasting. This suggests that semantically unexpected
nouns that violate the selection restrictions of the preceding verb also initiate semantic
integration processes in children but that these integration efforts are maintained longer
than in adults. The developmental ERP data indicate that even at the age of 19 and 24
months, children are able to process semantic relations between words in sentences in a
similar manner to adults.
ERP studies on the processing of sentential lexical-semantic information have also report
ed N400-like responses to semantically incorrect sentences in older children, namely 5- to
15-year-olds (Atchley et al., 2006; Hahne et al., 2004; Holcomb, Coffey, & Neville, 1992).
Similarly, Silva-Pereyra and colleagues found that sentence endings that semantically vio
lated the preceding sentence phrases evoked several anteriorly distributed negative
peaks in 3- and 4-year-olds, whereas in 30-month-olds, an anterior negativity between 500 and 800 ms after word onset occurred (Silva-Pereyra, Klarman, Lin, & Kuhl, 2005;
Silva-Pereyra, Rivera-Gaxiola, & Kuhl, 2005). Although these studies revealed differential
responses to semantically incorrect and correct sentences in young children, the distribu
tion of these negativities did not match the usual central-parietal maximum of the N400
seen in adults.
Despite the different effects reported in the ERP studies on sentential semantic process
ing, the current ERP studies suggest that semantic processes at sentence level, as reflect
ed by an N400-like response, are, in principle, present at the end of children’s second
year of life. However, it takes a few more years (p. 185) before the neural network under
lying these processes is established in an adult-like manner.
A recent fMRI study investigating the neural network underlying sentence-level semantic
processes in 5- to 6-year-old children and adults provides some evidence for the differ
ence between the neural network recruited in children and adults (Brauer & Friederici,
2007). Activation in children was found bilaterally in the superior temporal gyri and in
the inferior and middle frontal gyri for the processing of correct sentences and semanti
cally incorrect sentences. Compared with adults, the children’s language network was
less lateralized, was less specialized with respect to different aspects of language pro
cessing (semantics versus syntax, see also below), and engaged additional areas in the in
ferior frontal cortex bilaterally. Another fMRI study examined lexical-semantic decisions
for semantically congruous and incongruous sentences in older children, aged 7 to 10
years, and adults (Moore-Parks et al., 2010). Overall, the results suggested that by the end of their first decade, children employ a cortical network for semantic processing similar to that of adults, including activation in left inferior frontal, left middle temporal, and bilater
al superior temporal gyri. However, results also revealed developmental differences, with
adults showing greater activation in the left inferior frontal gyrus, left supramarginal
gyrus, and left inferior parietal lobule as well as motor-related regions.
Syntactic Rules
In any language, a well-defined rule system determines the composition of lexical ele
ments, thus giving the sentence its structure. The analysis of syntactic relations between
words and phrases is a complicated process, yet children have acquired the basic syntac
tic rules of their native language (p. 186) by the end of their third year (see Guasti, 2002;
Hirsh-Pasek, & Golinkoff, 1996; Szagun, 2006). For successful sentence comprehension,
two aspects of syntax processing appear to be of particular relevance: first, the structure
of each phrase that has to be built on the basis of word category information; and second,
the grammatical relationship between the various sentence elements, which has to be es
tablished in order to allow the interpretation of who is doing what to whom.
Adult ERP and fMRI studies have investigated the neural correlates of syntactic process
ing during sentence comprehension by focusing on two aspects: phrase structure build
ing and the establishment of grammatical relations and thereby the sentence’s interpreta
tion. Studies of the former have used the syntactic violation paradigm (e.g., Atchley et al.,
2006; Friederici, Pfeifer, & Hahne, 1993). In this paradigm, syntactically correct and syn
tactically incorrect sentences are presented, with the latter having morphosyntactic,
phrase structure, or tense violations. In the ERP response to syntactically incorrect sen
tences containing phrase structure violations, two components have been observed. The
first is the ELAN, an early anterior negativity, which is interpreted to reflect highly auto
matic phrase structure building processes (Friederici, Pfeifer, & Hahne, 1993; Hahne &
Friederici, 1999). The second is the P600, a later-occurring central-parietal positivity,
which is interpreted to indicate processes of syntactic integration (Kaan et al., 2000) and
controlled processes of syntactic reanalysis and repair (Friederici, Hahne, & Mecklinger,
1996; Osterhout & Holcomb, 1993). This biphasic ERP pattern in response to phrase
structure violations has been observed for both passive and active sentence constructions
(Friederici, Pfeifer, & Hahne, 1993; Hahne, Eckstein, & Friederici, 2004; Hahne &
Friederici, 1999; Rossi, Gugler, Hahne, & Friederici, 2005).
Several developmental ERP studies have examined at what age children process phrase
structure violations and therefore show the syntax-related ERP components ELAN and
P600 as observed in adults (Oberecker, Friedrich, & Friederici, 2005; Oberecker & Friederi
ci, 2006). In these experiments, 24- and 32-month-old German children listened to syntac
tically correct sentences and incorrect sentences that comprised incomplete prepositional
phrases. For example, the noun after the preposition was omitted as in *The lion in the
___ roars versus The lion roars. As illustrated in Figure 9.9, the adult data revealed the ex
pected biphasic ERP pattern in response to the sentences containing a phrase structure
violation. The ERP responses of 32-month-old children showed a similar ERP pattern, al
though both components appeared in later time windows than the adult data. Interesting
ly, 24-month-old children also showed a difference between correct and incorrect sen
tences; however, in this age group only, a P600 but no ELAN occurred.
Hahne, Eckstein, and Friederici (2004) investigated the processing of phrase structure vi
olations in syntactically more complex, noncanonical sentences (i.e., passive sentences
such as The boy was kissed by the girl). In these sentences, the first noun (the boy) is not
the actor, which makes the interpretation more difficult than in active sentences. When a
syntactic violation occurred in passive sentences, the ELAN-P600 pattern was evoked in 7- to 13-year-old children. Six-year-olds, however, displayed only a late P600.
The combined ERP results point to developmental differences suggesting that automatic
syntactic processes, reflected by the ELAN, are present later during language develop
ment than processes reflected by the P600. Moreover, the adult-like ERP pattern is
present earlier for active than for passive sentences. This developmental course is in line
with behavioral findings indicating that the processing of noncanonical sentences develops late, after the age of 5 years and, depending on the syntactic structure, only around the age of 7 years (Dittmar, Abbot-Smith, Lieven, & Tomasello, 2008).
The neural network underlying syntactic processes in the developing brain has recently
been investigated in an fMRI study with 5- to 6-year-olds using the syntactic violation par
adigm (Brauer & Friederici, 2007). Sentences containing a phrase structure violation bi
laterally activated the superior temporal gyri and the inferior and middle frontal gyri
(similar to correct and semantically incorrect sentences) but, moreover, specifically acti
vated left Broca’s area. Compared with that in adults, this activation pattern was less lat
eralized, less specific, and more extended. A time course analysis of the perisylvian acti
vation across correct and incorrect sentences also revealed developmental differences. In
contrast to adults, children’s inferior frontal cortex responded much later than their superior temporal cortex (Figure 9.10). Moreover, in contrast to adults, children dis
played a temporal primacy of right-hemispheric over left-hemispheric activation (Brauer,
Neumann & Friederici, 2008), which suggests a strong reliance on right-hemisphere
prosodic processes during auditory sentence comprehension in childhood. In a recent fMRI study with 10- to 16-year-old children, Yeatman, Ben-Shachar, Glover, and Feldman
(2010) investigated sentence processing by systematically varying syntactic complexity
and observed broad activation patterns in frontal, temporal, temporal-parietal and cingu
late regions. Independent of sentence length, syntactically more complex sentences
evoked stronger activation in the left temporal-parietal junction and the right superior
temporal gyrus. Interestingly, activation changes in frontal regions correlated with vocab
ulary and syntax perception measures. Thus, individual differences in activation patterns
demonstrate that auditory sentence comprehension is based on a dynamic and distrib
uted network that is modulated by age, language skills, and task demands.
Neural Correlates of the Development of Speech Perception and Comprehension
Conclusion
The results of the reported behavioral and neuroimaging studies broadly cover the phonological/prosodic, semantic, and syntactic aspects of language acquisition during the first years of life. In developmental research, ERPs are well established and often the method of choice; however, MEG, NIRS, and fMRI have recently been adapted for use in developmental populations. Because the ERP method delivers information about the neural correlates of different aspects of language processing, it is an excellent tool for investigating the various developmental stages of language acquisition. More specifically, a particular ERP component, the MMR, which reflects the discrimination not only of acoustic but also of phonological features, can be used to examine very early stages of language acquisition, even in newborns. A further ERP component, the N400, which indicates lexical and semantic processes in adults, has been registered in 14-month-olds but not in 12-month-olds, and can be used to investigate phonotactic knowledge, word knowledge, and knowledge of lexical-semantic relations between basic-level words and between verbs and their arguments in sentences. For the syntactic domain, an adult-like biphasic ERP pattern, the ELAN-P600, is not yet present in 24-month-olds but is observed in 32-month-old children for the processing of structural dependencies within phrases, thus characterizing the developmental progression of syntax acquisition. Other methods, particularly fMRI, deliver complementary evidence that the neural basis underlying specific aspects of language processing, such as semantics and syntax, is still under development for a few more years before adult-like language processing is achieved.
References
Atchley, R. A., Rice, M. L., Betz, S. K., Kwasney, K. M., Sereno, J. A., & Jongman, A. (2006).
A comparison of semantic and syntactic event related potentials generated by children
and adults. Brain & Language, 99, 236–246.
Balsamo, L. M., Xu, B., & Gaillard, W. D. (2006). Language lateralization and the role of the fusiform gyrus in semantic processing in young children. NeuroImage, 31 (3), 1306–1314.
Bentin, S., Mouchetant-Rostaing, Y., Giard, M. H., Echallier, J. F., & Pernier, J. (1999). ERP
manifestations of processing printed words at different psycholinguistic levels: Time
course and scalp distribution. Journal of Cognitive Neuroscience, 11 (3), 235–260.
Bernal, S., Dehaene-Lambertz, G., Millotte, S., & Christophe, A. (2010). Two-year-olds compute syntactic structure on-line. Developmental Science, 13 (1), 69–76.
Booth, J. R., Burman, D. D., Meyer, J. R., Gitelman, D. R., Parrish, T. B., & Mesulam, M. M.
(2004). Development of brain mechanisms for processing orthographic and phonologic
representations. Journal of Cognitive Neuroscience, 16 (7), 1234–1249.
Brauer, J., Anwander, A., & Friederici, A. D. (2011). Neuroanatomical prerequisites for lan
guage functions in the maturing brain. Cerebral Cortex, 21, 459–466.
Brauer, J., & Friederici, A. D. (2007). Functional neural networks of semantic and syntac
tic processes in the developing brain. Journal of Cognitive Neuroscience, 19 (10), 1609–
1623.
Brauer, J., Neumann, J., & Friederici, A. D. (2008). Temporal dynamics of perisylvian acti
vation during language processing in children and adults. NeuroImage, 41 (4), 1484–
1492.
Cao, F., Khalid, K., Zaveri, R., Bolger, D. J., Bitan, T., & Booth, J. R. (2010). Neural corre
lates of priming effects in children during spoken word processing with orthographic de
mands. Brain & Language, 114 (2), 80–89.
Cheour, M., Ceponiene, R., Lehtokoski, A., Luuk, A., Allik, J., Alho, K., et al. (1998). Devel
opment of language-specific phoneme representations in the infant brain. Nature Neuro
science, 1, 351–353.
Cheour, M., Imada, T., Taulu, S., Ahonen, A., Salonen, J., & Kuhl, P. K. (2004). Magnetoen
cephalography is feasible for infant assessment of auditory discrimination. Experimental
Neurology, 190, 44–51.
Chou, T. L., Booth, J. R., Burman, D. D., Bitan, T., Bigio, J. D., Lu, D., & Cone, N. E. (2006).
Developmental changes in the neural correlates of semantic processing. NeuroImage, 29,
1141–1149.
Cutler, A., & Carter, D. (1987). The predominance of strong initial syllables in the English vocabulary. Computer Speech and Language, 2, 133–142.
de Guibert, C., Maumet, C., Ferré, J.-C., Jannin, P., Biraben, A., Allaire, C., Barillot, C., & Le Rumeur, E. (2010). FMRI language mapping in children: A panel of language tasks using visual and auditory stimulation without reading or metalinguistic requirements. NeuroImage, 51 (2), 897–909.
Dehaene-Lambertz, G., & Dehaene, S. (1994). Speed and cerebral correlates of syllable
discrimination in infants. Nature, 370, 292–295.
Dehaene-Lambertz, G., Hertz-Pannier, L., Dubois, J., Mériaux, S., Roche, A., Sigman, M.,
et al. (2006). Functional organization of perisylvian activation during presentation of sen
tences in preverbal infants. Proceedings of the National Academy of Sciences U S A, 103,
14240–14245.
Dehaene-Lambertz, G., Montavont, A., Jobert, A., Allirol, L., Dubois, J., Hertz-Pannier, L.,
& Dehaene, S. (2010). Language or music, mother or Mozart? Structural and environmen
tal influences on infants’ language networks. Brain & Language, 114 (2), 53–65.
Dittmar, M., Abbot-Smith, K., Lieven, E., & Tomasello, M. (2008). Young German
children’s early syntactic competence: A preferential looking study. Developmental
Science, 11 (4), 575–582.
Dubois, J., Dehaene-Lambertz, G., Perrin, M., Mangin, J.-F., Cointepas, Y., Duchesnay, E.,
et al. (2008). Asynchrony of the early maturation of white matter bundles in healthy in
fants: Quantitative landmarks revealed noninvasively by diffusion tensor imaging. Human
Brain Mapping, 29 (1), 14–27.
Fonov, V. S., Evans, A. C., Botteron, K., Almli, C. R., McKinstry, R. C., Collins, D. L., et al.
(2011). Unbiased average age-appropriate atlases for pediatric studies. NeuroImage, 54,
313–327.
Friederici, A. D., & Alter, K. (2004). Lateralization of auditory language functions: A dy
namic dual pathway model. Brain and Language, 89, 267–276.
Friederici, A. D., & Wessels, J. M. (1993). Phonotactic knowledge and its use in infant
speech perception. Perception and Psychophysics, 54, 287–295.
Friederici, A. D., Friedrich, M., & Christophe, A. (2007). Brain responses in 4-month-old
infants are already language specific. Current Biology, 17 (14), 1208–1211.
Friederici, A. D., Friedrich, M., & Weber, C. (2002). Neural manifestation of cognitive and
precognitive mismatch detection in early infancy. NeuroReport, 13, 1251–1254.
Friederici, A. D., Hahne, A., & Mecklinger, A. (1996). Temporal structure of syntactic
parsing: Early and late event-related brain potential effects. Journal of Experimental Psy
chology: Learning Memory and Cognition, 22, 1219–1248.
Friederici, A. D., Pfeifer, E., & Hahne, A. (1993). Event-related brain potentials during
natural speech processing: Effects of semantic, morphological and syntactic violations.
Cognitive Brain Research, 1, 183–192.
Friedrich, M., & Friederici, A. D. (2004). N400-like semantic incongruity effect in 19-
month-olds: Processing known words in picture contexts. Journal of Cognitive Neuro
science, 16, 1465–1477.
Friedrich, M., & Friederici, A. D. (2005b). Lexical priming and semantic integration re
flected in the ERP of 14-month-olds. NeuroReport, 16 (6), 653–656.
Friedrich, M., & Friederici, A. D. (2005c). Semantic sentence processing reflected in the
event-related potentials of one- and two-year-old children. NeuroReport, 16 (6), 1801–
1804.
Friedrich, M., & Friederici, A. D. (2010). Maturing brain mechanisms and developing be
havioral language skills. Brain and Language, 114, 66–71.
Gervain, J., Mehler, J., Werker, J. F., Nelson, C. A., Csibra, G., Lloyd-Fox, S., et al. (2011).
Near-infrared spectroscopy: A report from the McDonnell infant methodology consor
tium. Developmental Cognitive Neuroscience, 1 (1), 22–46.
Gleitman, L. R., & Wanner, E. (1982). The state of the state of the art. In E. Wanner & L. Gleitman (Eds.), Language acquisition: The state of the art (pp. 3–48). Cambridge, England: Cambridge University Press.
Gout, A., Christophe, A., & Morgan, J. L. (2004). Phonological phrase boundaries con
strain lexical access. II. Infant data. Journal of Memory and Language, 51, 548–567.
Goyet, L., de Schonen, S., & Nazzi, T. (2010). Words and syllables in fluent speech seg
mentation by French-learning infants: An ERP study. Brain Research, 1332, 75–89.
Gratton, G., & Fabiani, M. (2001). Shedding light on brain function: The event-related op
tical signal. Trends in Cognitive Sciences, 5, 357–363.
Grossmann, T., Johnson, M. H., Lloyd-Fox, S., Blasi, A., Deligianni, F., Elwell, C., et al.
(2008). Early cortical specialization for face-to-face communication in human infants. Pro
ceedings of the Royal Society B, 275, 2803–2811.
Grossmann, T., Oberecker, R., Koch, S. P., & Friederici, A. D. (2010). Developmental ori
gins of voice processing in the human brain. Neuron, 65, 852–858.
Guasti, M. T. (2002). Language acquisition: The growth of grammar. Cambridge, MA: MIT
Press.
Hahne, A., & Friederici, A. D. (1999). Electrophysiological evidence for two steps in syn
tactic analysis: Early automatic and late controlled processes. Journal of Cognitive Neuro
science, 11, 194–205.
Hahne, A., & Friederici, A. D. (2002). Differential task effects on semantic and syntactic
processes as revealed by ERPs. Cognitive Brain Research, 13, 339–356.
Hahne, A., & Jescheniak, J. D. (2001). What’s left if the Jabberwock gets the semantics?
An ERP investigation into semantic and syntactic processes during auditory sentence
comprehension. Cognitive Brain Research, 11, 199–212.
Hahne, A., Eckstein, K., & Friederici, A. D. (2004). Brain signatures of syntactic and se
mantic processes during children’s language development. Journal of Cognitive Neuro
science, 16, 1302–1318.
Hickok, G., & Poeppel, D. (2007). The cortical organization of speech processing. Nature
Reviews Neuroscience, 8, 393–402.
Hirsh-Pasek, K., & Golinkoff, R. M. (1996). The origins of grammar: Evidence from early
language comprehension. Cambridge, MA: MIT Press.
Höhle, B., Bijeljac-Babic, R., Herold, B., Weissenborn, J., & Nazzi, T. (2009). Language
specific prosodic preferences during the first half year of life: Evidence from German and
French infants. Infant Behavior and Development, 32 (3), 262–274.
Höhle, B., Weissenborn, J., Schmitz, M., & Ischebeck, A. (2001). Discovering word order
regularities: The role of prosodic information for early parameter setting. In J. Weis
senborn & B. Höhle (Eds.), Approaches to bootstrapping. Phonological, lexical, syntactic
and neurophysiological aspects of early language acquisition (Vol. 1, pp. 249–265). Amster
dam: John Benjamins.
Holcomb, P. J. (1993). Semantic priming and stimulus degradation: Implications for the
role of the N400 in language processing. Psychophysiology, 30, 47–61.
Holcomb, P. J., Coffey, S. A., & Neville, H. J. (1992). Visual and auditory sentence process
ing: A developmental analysis using event-related brain potentials. Developmental Neu
ropsychology, 8, 203–241.
Homae, F., Watanabe, H., Nakano, T., Asakawa, K., & Taga, G. (2006). The right hemi
sphere of sleeping infant perceives sentential prosody. Neuroscience Research, 54 (4),
276–280.
Homae, F., Watanabe, H., Nakano, T., & Taga, G. (2007). Prosodic processing in the devel
oping brain. Neuroscience Research, 59 (1), 29–39.
Houston, D. M., Jusczyk, P. W., Kuijpers, C., Coolen, R., & Cutler, A. (2000). Cross-language word segmentation by 9-month-olds. Psychonomic Bulletin & Review, 7, 504–509.
Imada, T., Zhang, Y., Cheour, M., Taulu, S., Ahonen, A., & Kuhl, P. K. (2006). Infant speech perception activates Broca's area: A developmental magnetoencephalography study. NeuroReport, 17, 957–962.
Jusczyk, P. W., Cutler, A., & Redanz, N. J. (1993). Infants’ preference for the predominant
stress patterns of English words. Child Development, 64, 675–687.
Jusczyk, P. W., Friederici, A. D., Wessels, J. M. I., Svenkerud, V., & Jusczyk, A. M. (1993).
Infants’ sensitivity to the sound patterns of native language words. Journal of Memory
and Language, 32, 402–420.
Jusczyk, P. W., Houston, D. M., & Newsome, M. (1999). The beginnings of word segmenta
tion in English-learning infants. Cognitive Psychology, 39 (3–4), 159–207.
Kaan, E., Harris, A., Gibson, E., & Holcomb, P. (2000). The P600 as an index of syntactic
integration difficulty. Language and Cognitive Processes, 15, 159–201.
Kooijman, V., Hagoort, P., & Cutler, A. (2009). Prosodic structure in early word segmenta
tion: ERP evidence from Dutch ten-month-olds. Infancy, 14, 591–612.
Kooijman, V., Johnson, E. K., & Cutler, A. (2008). Reflections on reflections of infant word
recognition. In A. D. Friederici & G. Thierry (Eds.), Early language development: Bridging
brain and behaviour (TiLAR 5, pp. 91–114). Amsterdam: John Benjamins.
Kuhl, P. K. (2004). Early language acquisition: Cracking the speech code. Nature Reviews
Neuroscience, 5, 831–843.
Kuhl, P. K., Conboy, B. T., Coffey-Corina, S., Padden, D., Rivera-Gaxiola, M., & Nelson, T.
(2008). Phonetic learning as a pathway to language: New data and native language mag
net theory expanded (NLM-e). Philosophical Transactions of the Royal Society B, 363,
979–1000.
Kuijpers, C. T. L., Coolen, R., Houston, D., & Cutler, A. (1998). Using the headturning technique to explore cross-linguistic performance differences. Advances in Infancy Research, 12, 205–220.
Kujala, A., Huotilainen, M., Hotakainen, M., Lennes, M., Parkkonen, L., Fellman, V., et al.
(2004). Speech-sound discrimination in neonates as measured with MEG. NeuroReport,
15 (13), 2089–2092.
Kushnerenko, E., Ceponiene, R., Balan, P., Fellman, V., Huotilainen, M., & Näätänen, R.
(2002b). Maturation of the auditory event-related potentials during the first year of life.
NeuroReport, 13, 47–51.
Kushnerenko, E., Ceponiene, R., Balan, P., Fellman, V., & Näätänen, R. (2002a). Matura
tion of the auditory change detection response in infants: A longitudinal ERP study. Neu
roReport, 13 (15), 1843–1846.
Kushnerenko, E., Cheour, M., Ceponiene, R., Fellman, V., Renlund, M., Soininen, K., et al.
(2001). Central auditory processing of durational changes in complex speech patterns by
newborns: An event-related brain potential study. Developmental Neuropsychology, 19
(1), 83–97.
Kutas, M., & van Petten, C. K. (1994). Psycholinguistics electrified: Event-related brain
potential investigations. In M. A. Gernsbacher (Ed.), Handbook of psycholinguistics (pp.
83–143). San Diego, CA: Academic Press.
Leach, J. L., & Holland, S. K. (2010). Functional MRI in children: Clinical and research ap
plications. Pediatric Radiology, 40, 31–49.
Leppänen, P. H. T., Pihko, E., Eklund, K. M., & Lyytinen, H. (1999). Cortical responses of infants with and without a genetic risk for dyslexia: II. Group effects. NeuroReport, 10, 969–973.
Li, X. S., Shu, H., Liu, Y. Y., & Li, P. (2006). Mental representation of verb meaning: Behavioral and electrophysiological evidence. Journal of Cognitive Neuroscience, 18 (10), 1774–1787.
Lloyd-Fox, S., Blasi, A., & Elwell, C. E. (2010). Illuminating the developing brain: The past, present and future of functional near infrared spectroscopy. Neuroscience and Biobehavioral Reviews, 34 (3), 269–284.
Männel, C., & Friederici, A. D. (2009). Pauses and intonational phrasing: ERP studies in 5-
month-old German infants and adults. Journal of Cognitive Neuroscience, 21 (10), 1988–
2006.
Männel, C., & Friederici, A. D. (2010). Prosody is the key: ERP studies on word segmenta
tion in 6- and 12-month-old children. Journal of Cognitive Neuroscience, Supplement,
261.
Männel, C., & Friederici, A. D. (2011). Intonational phrase structure processing at differ
ent stages of syntax acquisition: ERP studies in 2-, 3-, and 6-year-old children. Develop
mental Science, 14 (4), 786–798.
Mestres-Misse, A., Rodriguez-Fornells, A., & Münte, T. F. (2010). Neural differences in the
mapping of verb and noun concepts onto novel words. NeuroImage, 49, 2826–2835.
Meyer, M., Alter, K., Friederici, A. D., Lohmann, G., & von Cramon, D. Y. (2002). fMRI re
veals brain regions mediating slow prosodic modulations in spoken sentences. Human
Brain Mapping, 17 (2), 73–88.
Meyer, M., Steinhauer, K., Alter, K., Friederici, A. D., & von Cramon, D. Y. (2004). Brain
activity varies with modulation of dynamic pitch variance in sentence melody. Brain and
Language, 89 (2), 277–289.
Mills, D. L., Coffey-Corina, S. A., & Neville, H. J. (1997). Language comprehension and
cerebral specification from 13 to 20 months. Developmental Neuropsychology, 13 (3),
397–445.
Mills, D. L., Plunkett, K., Prat, C., & Schafer, G. (2005). Watching the infant brain learn
words: Effects of vocabulary size and experience. Cognitive Development, 20, 19–31.
Mills, D. L., Prat, C., Zangl, R., Stager, C. L., Neville, H. J., & Werker, J. F. (2004). Lan
guage experience and the organization of brain activity to phonetically similar words:
ERP evidence from 14- and 20-month-olds. Journal of Cognitive Neuroscience, 16 (8),
1452–1464.
Minagawa-Kawai, Y., Mori, K., Furuya, I., Hayashi, R., & Sato, Y. (2002). Assessing cerebral representations of short and long vowel categories by NIRS. NeuroReport, 13, 581–584.
Minagawa-Kawai, Y., Mori, K., Hebden, J., & Dupoux, E. (2008). Optical imaging of in
fants’ neurocognitive development: Recent advances and perspectives. Developmental
Neurobiology, 68 (6), 712–728.
Minagawa-Kawai, Y., Mori, K., Naoi, N., & Kojima, S. (2007). Neural attunement process
es in infants during the acquisition of a language-specific phonemic contrast. Journal of
Neuroscience, 27, 315–321.
Minagawa-Kawai, Y., van der Lely, H., Ramus, F., Sato, Y., Mazuka, R., & Dupoux, E.
(2011). Optical brain imaging reveals general auditory and language-specific processing
in early infant development. Cerebral Cortex, 21 (2), 254–261.
Moore-Parks, E. N., Burns, E. L., Bazzill, R., Levy, S., Posada, V., & Muller, R. A. (2010). An
fMRI study of sentence-embedded lexical-semantic decision in children and adults. Brain
and Language, 114 (2), 90–100.
Morr, M. L., Shafer, V. L., Kreuzer, J., & Kurtzberg, D. (2002). Maturation of mismatch
negativity in infants and pre-school children. Ear and Hearing, 23, 118–136.
Muzik, O., Chugani, D. C., Juhasz, C., Shen, C., & Chugani, H. T. (2000). Statistical para
metric mapping: Assessment of application in children. NeuroImage, 12, 538–549.
Nazzi, T., Dilley, L. C., Jusczyk, A. M., Shattuck-Hufnagel, S., & Jusczyk, P. W. (2005). Eng
lish-learning infants’ segmentation of verbs from fluent speech. Language and Speech,
48, 279–298.
Nazzi, T., Iakimova, G., Bertoncini, J., Frédonie, S., & Alcantara, C. (2006). Early segmen
tation of fluent speech by infants acquiring French: Emerging evidence for crosslinguistic
differences. Journal of Memory and Language, 54, 283–299.
Nazzi, T., Kemler Nelson, D. G., Jusczyk, P.W., & Jusczyk, A. M. (2000). Six-month-olds’ de
tection of clauses embedded in continuous speech: Effects of prosodic well-formedness.
Infancy, 1 (1), 123–147.
Nobre, A. C., & McCarthy, G. (1994). Language-related ERPs: Scalp distributions and modulation by word type and semantic priming. Journal of Cognitive Neuroscience, 6 (3), 233–255.
Oberecker, R., Friedrich, M., & Friederici, A. D. (2005). Neural correlates of syntactic
processing in two-year-olds. Journal of Cognitive Neuroscience, 17, 407–421.
Obrig, H., & Villringer, A. (2003). Beyond the visible: Imaging the human brain with light.
Journal of Cerebral Blood Flow and Metabolism, 23, 1–18.
Okamoto, M., Dan, H., Shimizu, K., Takeo, K., Amita, T., Oda, I., et al. (2004). Multimodal
assessment of cortical activation during apple peeling by NIRS and fMRI. NeuroImage,
21, 1275–1288.
Osterhout, L., & Holcomb, P. J. (1993). Event-related brain potentials and syntactic anom
aly: Evidence on anomaly detection during perception of continuous speech. Language
and Cognitive Processes, 8, 413–437.
Pannekamp, A., Toepel, U., Alter, K., Hahne, A., & Friederici, A. D. (2005). Prosody-driven
sentence processing: An event-related brain potential study. Journal of Cognitive Neuro
science, 17, 407–421.
Pena, M., Maki, A., Kovacic, D., Dehaene-Lambertz, G., Koizumi, H., Bouquet, F., et al.
(2003). Sounds and silence: An optical topography study of language recognition at birth.
Proceedings of the National Academy of Sciences U S A, 100 (20), 11702–11705.
Perani, D., Saccuman, M. C., Scifo, P., Spada, D., Andreolli, G., Rovelli, R., Baldoli, C., &
Koelsch, S. (2010). Functional specializations for music processing in the human newborn
brain. Proceedings of the National Academy of Sciences U S A, 107 (10), 4758–4763.
Pihko, E., Leppänen, P. H. T., Eklund, K. M., Cheour, M., Guttorm, T. K., & Lyytinen, H. (1999). Cortical responses of infants with and without a genetic risk for dyslexia: I. Age effects. NeuroReport, 10, 901–905.
Rivera-Gaxiola, M., Silva-Pereyra, J., & Kuhl, P. K. (2005). Brain potentials to native- and
non-native speech contrasts in seven- and eleven-month-old American infants. Develop
mental Science, 8, 162–172.
Rivkin, M. J., Wolraich, D., Als, H., McAnulty, G., Butler, S., Conneman, N., et al. (2004).
Prolonged T2* values in newborn versus adult brain: Implications for fMRI studies of
newborns. Magnetic Resonance in Medicine, 51 (6), 1287–1291.
Rossi, S., Gugler, M. F., Hahne, A., & Friederici, A. D. (2005). When word category infor
mation encounters morphosyntax: An ERP study. Neuroscience Letters, 384, 228–233.
Saito, Y., Aoyama, S., Kondo, T., Fukumoto, R., Konishi, N., Nakamura, K., Kobayashi, M.,
& Toshima, T. (2007a). Frontal cerebral blood flow change associated with infant-directed
speech. Archives of Disease in Childhood. Fetal and Neonatal Edition, 92 (2), F113–F116.
Saito, Y., Kondo, T., Aoyama, S., Fukumoto, R., Konishi, N., Nakamura, K., Kobayashi, M.,
& Toshima, T. (2007b). The function of the frontal lobe in neonates for response to a
prosodic voice. Early Human Development, 83 (4), 225–230.
Sambeth, A., Ruohio, K., Alku, P., Fellman, V., & Huotilainen, M. (2008). Sleeping new
borns extract prosody from continuous speech. Clinical Neurophysiology, 119 (2), 332–
341.
Sansavini, A., Bertoncini, J., & Giovanelli, G. (1997). Newborns discriminate the rhythm of
multisyllabic stressed words. Developmental Psychology, 33 (1), 3–11.
Schafer, A. J., Speer, S. R., Warren, P., & White, S. D. (2000). Intonational disambiguation
in sentence production and comprehension. Journal of Psycholinguistic Research, 29,
169–182.
Schapiro, M. B., Schmithorst, V. J., Wilke, M., Byars Weber, A., Strawsburg, R. H., & Hol
land, S. K. (2004). BOLD fMRI signal increases with age in selected brain regions in chil
dren. NeuroReport, 15 (17), 2575–2578.
Schlaggar, B. L., Brown, T. T., Lugar, H. L., Visscher, K. M., Miezin, F. M., & Petersen, S.
E. (2002). Functional neuroanatomical differences between adults and school-age chil
dren in the processing of single words. Science, 296, 1476–1479.
Schroeter, M. L., Zysset, S., Wahl, M., & von Cramon, D. Y. (2004). Prefrontal activation
due to Stroop interference increases during development: An event-related fNIRS study.
NeuroImage, 23, 1317–1325.
Seidl, A. (2007). Infants’ use and weighting of prosodic cues in clause segmentation. Jour
nal of Memory and Language, 57, 24–48.
Seidl, A., & Johnson, E. K. E. (2007). Boundary alignment facilitates 11-month-olds’ seg
mentation of vowel-initial words from speech. Journal of Child Language, 34, 1–24.
Selkirk, E. (1984). Phonology and syntax: The relation between sound and structure.
Cambridge, MA: MIT Press.
Silva-Pereyra, J., Conboy, B. T., Klarman, L., & Kuhl, P. K. (2007). Grammatical processing
without semantics? An event-related brain potential study of preschoolers using jabber
wocky sentences. Journal of Cognitive Neuroscience, 19 (6), 1–16.
Silva-Pereyra, J., Klarman, L., Lin, L. J., & Kuhl, P. K. (2005). Sentence processing in 30-
month-old children: An event-related potential study. NeuroReport, 16, 645–648.
Silva-Pereyra, J., Rivera-Gaxiola, M., & Kuhl, P. K. (2005). An event-related brain potential
study of sentence comprehension in preschoolers: Semantic and morphosyntactic pro
cessing. Cognitive Brain Research, 23, 247–258.
Skoruppa, K., Pons, F., Christophe, A., Bosch, L., Dupoux, E., Sebastián-Gallés, N., &
Peperkamp, S. (2009). Language-specific stress perception by nine-month-old French and
Spanish infants. Developmental Science, 12, 914–919.
Soderstrom, M., Nelson, D. G. K., & Jusczyk, P. W. (2005). Six-month-olds recognize claus
es embedded in different passages of fluent speech. Infant Behavior & Development, 28,
87–94.
Soderstrom, M., Seidl, A., Nelson, D. G. K., & Jusczyk, P. W. (2003). The prosodic boot
strapping of phrases: Evidence from prelinguistic infants. Journal of Memory and Lan
guage, 49 (2), 249–267.
Steinhauer, K., Alter, K., & Friederici, A. D. (1999). Brain potentials indicate immediate
use of prosodic cues in natural speech processing. Nature Neuroscience, 2, 191–196.
Tan, A., & Molfese, D. L. (2009). ERP correlates of noun and verb processing in preschool-age children. Biological Psychology, 80 (1), 46–51.
Thierry, G., Vihman, M., & Roberts, M. (2003). Familiar words capture the attention of 11-
month-olds in less than 250 ms. NeuroReport, 14, 2307–2310.
Torkildsen, J. V. K., Sannerud, T., Syversen, G., Thormodsen, R., Simonsen, H. G., Moen, I.,
et al. (2006). Semantic organization of basic level words in 20-month-olds: An ERP study.
Journal of Neurolinguistics, 19, 431–454.
Torkildsen, J. V. K., Syversen, G., Simonsen, H. G., Moen, I., Smith, L., & Lindgren, M.
(2007). Electrophysiological correlates of auditory semantic priming in 24-month-olds.
Journal of Neurolinguistics, 20, 332–351.
Trainor, L., McFadden, M., Hodgson, L., Darragh Barlow, J., Matsos, L., & Sonnadara, R.
(2003). Changes in auditory cortex and the development of mismatch negativity between
2 and 6 months of age. International Journal of Psychophysiology, 51, 5–15.
Tsao, F.-M., Liu, H.-M., & Kuhl, P. K. (2004). Speech perception in infancy predicts lan
guage development in the second year of life: A longitudinal study. Child Development,
75, 1067–1084.
Vannest, J., Karunanayaka, P. R., Schmithorst, V. J., Szaflarski, J. P., & Holland, S. K.
(2009). Language networks in children: Evidence from functional MRI studies. American
Journal of Roentgenology, 192 (5), 1190–1196.
Villringer, A., & Chance, B. (1997). Noninvasive optical spectroscopy and imaging of human brain function. Trends in Neurosciences, 20, 435–442.
Weber, C., Hahne, A., Friedrich, M., & Friederici, A. D. (2004). Discrimination of word
stress in early infant perception: Electrophysiological evidence. Cognitive Brain Research,
18, 149–161.
West, W. C., & Holcomb, P. J. (2002). Event-related potentials during discourse-level se
mantic integration of complex pictures. Cognitive Brain Research, 13, 363–375.
Wilke, M., Holland, S. K., Altaye, M., & Gaser, C. (2008). Template-O-Matic: A toolbox for
creating customized pediatric templates. NeuroImage, 41 (3), 903–913.
Yamada, Y., & Neville, H. J. (2007). An ERP study of syntactic processing in English and
nonsense sentences. Brain Research, 1130, 167–180.
Yeatman, J. D., Ben-Shachar, M., Glover, G. H., & Feldman, H. M. (2010). Individual differ
ences in auditory sentence comprehension in children: An exploratory event-related func
tional magnetic resonance imaging investigation. Brain & Language, 114 (2), 72–79.
Angela D. Friederici, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany.

Claudia Männel, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany.
Perceptual Disorders
Josef Zihl
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn
Perceptual processes provide the basis for the mental representation of the visual, auditory, olfactory, gustatory, somatosensory, and social “worlds,” as well as for guiding and controlling cognitive, social, and motor activities. All perceptual systems (i.e., vision, audition, somatosensory perception, smell and taste, and social perception) are segregated functional networks and show a parallel-hierarchical organization of information processing and encoding. In pathological conditions such as acquired brain injury, perceptual functions and abilities can be variably affected, ranging from the loss of stimulus detection to impaired recognition. Despite the functional specialization of perceptual systems, the association of perceptual deficits within sensory modalities is the rule, and disorders of a single perceptual function or ability are rare. This chapter describes cerebral visual, auditory, somatosensory, olfactory, and gustatory perceptual disorders within a neuropsychological framework. Disorders of social perception are also considered because they represent a genuine category of perceptual impairments.
Keywords: vision, audition, somatosensory perception, smell, taste, social perception, cerebral perceptual disorders
Introduction
Perception “is the process or result of becoming aware of objects, relationships, and events by means of the senses,” which includes activities such as detecting, discriminating, identifying, and recognizing. “These activities enable organisms to organize and interpret the stimuli received into meaningful knowledge” (APA, 2007). Perception is constructed in the brain and involves lower- and higher-level processes that serve simpler and more complex perceptual abilities such as detection, identification, and recognition (Mather, 2006). The behavioral significance of perception lies not only in the processing of stimuli as a basis for the mental representation of the visual, auditory, olfactory, gustatory, somatosensory, and social “worlds” but also in the guidance and control of activities. Thus there exists a reciprocal interaction among perception, cognition, and action. Attention, memory, and executive functions are crucial prerequisites for perceptual activities: they form the basis for focusing on stimuli and maintaining attention during stimulus acquisition and processing, for storing percepts as experience and concepts, and for controlling the input and output activities that allow for an optimal, flexible adaptation to extrinsic and intrinsic challenges.
The aim of this chapter is to describe the effect of pathological conditions, particularly ac
quired brain injury, on the various abilities in the domains of vision, audition, somatosen
sory perception, smell and taste, and social perception as well as the behavioral conse
quences and the significance of these disorders for the understanding of brain organiza
tion. Perceptual disorders can result from injury to the afferent sensory pathways and/or
to their subcortical and cortical processing and coding stages. Peripheral injury
usually causes “lower level” dysfunctions (e.g., threshold elevation or difficulties with
stimulus localization and sensory discrimination), whereas central injuries cause “higher
level” perceptual dysfunctions (e.g., in the domains of identification and recognition).
However, peripheral sensory deficits may also be associated with higher perceptual disor
ders because the affected sensory functions and their interactions represent a crucial
prerequisite for more complex perceptual abilities (i.e., detection and discrimination of
stimuli build the basis for identification and recognition).
Vision
Visual perception comprises lower level visual abilities (i.e., the visual field, visual acuity,
contrast sensitivity, color and form vision, and stereopsis) and higher level visual abilities,
in particular visual identification and recognition. Visual perceptual abilities also form the
basis for visually guided behavior, such as oculomotor activities, hand and finger move
ments, and spatial navigation. From its very beginning, visual neuroscience has been con
cerned with the analysis of the various visual perceptual deficits and the identification of
the location of the underlying brain injury. Early clinical reports on patients have already
demonstrated the selective loss of visual abilities after acquired brain injury. These obser
vations have suggested a functional specialization of the visual cortex, a concept verified
many years later by combined anatomical, electrophysiological, and behavioral evidence
(Desimone & Ungerleider, 1989; Grill-Spector & Malach, 2004; Orban, 2008; Zeki, 1993).
The primary visual cortical area (striate cortex, Brodmann area 17, visual area 1, or V1)
receives its input from the retina via the lateral geniculate body (LGN) and possesses a
highly accurate, topographically organized representation of the retina and thus of the vi
sual field. The central visual field occupies a large proportion of the striate cortex; about
half of the cortical surface is devoted to the central 10 degrees of the visual field, which is
only 1 percent of the visual field (Tootell, Hadjikhani, Mendola, Marrett, & Dale, 1998). In
addition, V1 distributes specific visual signals to the other visual areas that are located in
the surrounding cortex (for a review, see Bullier, 2003). This anatomical and functional or
ganization enables the visual brain to deal with the processing of global and local fea
tures of visual objects and scenes. The result of processing at distinct levels of complexity
at each stage can be flexibly and dynamically integrated into coherent perception (Bar
tels & Zeki, 1998; Tootell et al., 1998; Zeki, 1993; Zeki & Bartels, 1998). Because of the
inhomogeneity of spatial resolution and acuity in the visual field (Anstis, 1974), the field
size for processing visual details (e.g., form vision) is much smaller, comprising the inner
9 degrees of the binocular visual field (i.e., macular region; Henderson, 2003). Occipital-
parietal, posterior parietal, and prefrontal mechanisms guarantee rapid global context ex
traction as well as visual spatial working memory (Bar, 2004; Henderson, 2003; Hochstein
& Ahissar, 2002).
Ungerleider and Mishkin (1982) have characterized the functional specialization of the vi
sual brain as consisting of two processing streams: The “where” pathway or dorsal route,
comprising occipital-parietal visual areas and connections, is specialized in space pro
cessing; and the “what” pathway or ventral route, comprising occipital-temporal visual
areas and connections, is specialized in object processing. According to Milner and
Goodale (2008), information processed in the dorsal pathway is used for the implicit visu
al guidance of actions, whereas explicit perception is associated with processing in the
ventral stream. Because visual perception usually involves both space- and object-based
information processing, cooperation and interaction between the two visual streams are
required (Goodale & Westwood, 2004). In addition, both routes interact either directly or
indirectly via attention involving the inferior parietal cortex (Singh-Curry & Husain,
2009) and working memory involving the prefrontal cortex (Goodale & Westwood, 2004;
Oliveri et al., 2001). Eye movements play a crucial role in visual processing and thus in vi
sual perception (for a comprehensive review, see Martinez-Conde, Macknik, & Hubel,
2004). The posterior thalamus and its reciprocal connections with cortical regions in the
occipital, parietal, and frontal lobes and with the limbic neocortex form a cortical-subcor
tical network subserving intentionally guided and externally triggered attention as well as
saccadic eye movements that are involved in visual information processing (e.g., Anders
son et al., 2007; Dean, Crowley & Platt 2004; Himmelbach, Erb, & Karnath, 2006; Nobre,
2001; Olson et al., 2000; Schiller & Tehovnik, 2001, 2005). Complex visual stimuli (e.g.,
objects and faces) are coded as specific categories in extrastriate regions in the ventral
visual pathway (Grill-Spector, 2003; Sigala, 2004; Wierenga et al., 2009). Top-down
processes involving the prefrontal cortex facilitate visual object recognition (Bar, 2003),
and hippocampal-dependent memory builds the basis for experience-dependent visual
scanning (Smith & Squire, 2008). Yet, it is still unclear how the brain eventually
codes complex visual stimuli for accurate identification and recognition; it appears, how
ever, that complex visual stimuli are simultaneously represented in two parallel and hier
archically organized processing systems in the ventral and dorsal visual pathways (Konen
& Kastner, 2008).
About 30 percent of patients with acquired brain injury suffer from visual disorders
(Clarke, 2005; Rowe et al., 2009; Suchoff et al., 2008). Lower level visual functions and
abilities (e.g., visual detection and localization, visual acuity, contrast sensitivity, and col
or discrimination) may be understood as perceptual architecture, whereas higher level,
visual-cognitive capacities (e.g., text processing and recognition) also involve learning
and memory processes as well as executive functions. Selective visual disorders after
brain injury are the exception rather than the rule because small “strategic” lesions are
very rare and visual cortical areas are intensely interconnected. Injury to the visual brain,
that is, to visual cortical areas and fiber connections, therefore commonly causes an asso
ciation of visual disorders.
Visual Field
A homonymous visual field defect is defined as a restriction of the normal visual field
caused by injury to the afferent postchiasmatic visual pathway, that is, an interruption in
the flow of visual information between the optic chiasm and the striate cortex. Homony
mous visual field disorders are characterized by partial or total blindness in correspond
ing visual field regions of each eye. In the case of unilateral postchiasmatic brain injury,
vision may be lost in the left or right hemifield (homonymous left- or right-sided hemi
anopia), the left or right upper or lower quadrants (homonymous upper or lower quadra
nopia in the left or right hemifield), or a restricted portion in the parafoveal visual field
(paracentral scotoma). The most common type of homonymous visual field disorders is
hemianopia (loss of vision in one hemifield), followed by quadranopia (loss of vision in one
quadrant) and paracentral scotoma (island of blindness in the parafoveal field region). Vi
sual field defects are either absolute (complete loss of vision, anopia) or relative (de
pressed vision, amblyopia, hemiachromatopsia). Homonymous amblyopia typically affects
the entire hemifield (hemiamblyopia), and homonymous achromatopsia (i.e., the selective
loss of color vision) typically affects one hemifield (hemiachromatopsia) or the upper
quadrant. Visual field defects differ with respect to visual field sparing. Foveal sparing
refers to sparing of the foveal region (1 degree), macular sparing refers to the preserva
tion of the macular region (5 degrees), and macular splitting refers to a sparing of less
than 5 degrees (for review, see Harrington & Drake, 1990). In the majority of patients
(71.5 percent of 876 cases), field sparing does not exceed 5 degrees. As a rule, patients
with small visual field sparing are more disabled, especially with regard to reading.
Stroke represents the most common etiology, but other etiologies such as traumatic brain
injury, tumors, multiple sclerosis, and cortical posterior atrophy may also cause homony
mous visual field disorders (see Zihl, 2011).
About 80 percent of patients (n = 157) with unilateral homonymous visual field loss suffer
from functional impairments in reading (hemianopic dyslexia) and/or in global perception
and overview (Zihl, 2011). Homonymous visual field loss causes a restriction of the field
of view, which prevents the rapid extraction of the entire spatial configuration of the visu
al environment. It therefore impairs the top-down and bottom-up interactions that are re
quired for efficient guidance of spatial attention and oculomotor activities during scene
perception and visual search. Patients with additional injury to the posterior thalamus,
the occipital white matter route (i.e., fiber pathways to the dorsal visual route and path
ways connecting occipital, parietal, temporal, and frontal cortical areas) show disorga
nized oculomotor scanning behavior (Zihl & Hebel, 1997; Mort & Kennard, 2003). The im
pairments in global perception and visual scanning shown by these patients are more se
vere than those resulting from visual field loss alone (Zihl, 1995a). Interestingly, about 20
percent show spontaneous substitution of visual field loss by oculomotor compensation
and thus enlargement of the field of view; the percentage is even higher in familiar sur
roundings because patients can make use of their spatial knowledge of the surroundings
(Zihl, 2011). In normal subjects, global visual perception is based on the visual field, within which they can simultaneously detect and process visual stimuli. The visual field can be enlarged by eye shifts, which typically extend 50 degrees in all directions (Leigh & Zee, 2006). The resulting field of view is thus defined by the extent of the visual field when the eyes are moved, and forms the basis of global visual perception (see also Pambakian, Mannan, Hodgson, & Kennard, 2004).
Reading is impaired in patients with unilateral homonymous field loss and visual field
sparing of less than 5 degrees to the left and less than 8 degrees to the right of the fovea.
In reading, the visual brain relies on a gestalt-type visual word-form processing, the “reading span.” It is asymmetrical (larger to the right in left-to-right orthographies) and is essential for the guidance of eye movements during text processing (Rayner, 1998).
However, insufficient visual field sparing does not appear to be the only factor causing
persistent “hemianopic” dyslexia. The extent of brain injury affecting in particular the oc
cipital white matter seems to be crucial in this regard (Schuett, Heywood, Kentridge, &
Zihl, 2008a; Zihl 1995b). That reading is impaired at the pre-semantic visual sensory level
is supported by the outcome of treatment procedures involving practice with nontext ma
terial, which have been found to be as effective as word material in reestablishing eye
movement reading patterns and improving reading performance (Schuett, Heywood, Ken
tridge, & Zihl, 2008b).
In the case of bilateral postchiasmatic brain injury, both visual hemifields are affected, re
sulting in bilateral homonymous hemianopia (“tunnel vision”), bilateral upper or lower
hemianopia, bilateral paracentral scotoma, or central scotoma. Patients with bilateral vi
sual field disorders suffer from similar, but typically more debilitating, visual impairments
in global visual perception and reading. A central scotoma is a very dramatic form of
homonymous visual field loss because foveal vision is either totally lost or depressed (cen
tral amblyopia). The reduction or loss of vision in the central part of the visual field is typ
ically associated with a corresponding loss of visual spatial contrast sensitivity, visual acu
ity, and form, object, and face perception. The loss of foveal vision also causes a loss of
the central reference for optimal fixation and of the straight-ahead direction as well as an
impairment of the visual-spatial guidance of saccades and hand-motor responses. As a
consequence, patients cannot accurately fixate a visual stimulus, shift their gaze from one stimulus to another, scan a scene or a face, or guide their eye movements during scanning and reading. Patients therefore show severe impairments in locating objects,
recognizing objects and faces, finding their way in rooms or places, and reading, and of
ten get lost when scanning a word or a scene (Zihl, 2011).
After unilateral postchiasmatic brain injury, visual acuity is usually not significantly re
duced, except for cases in which the optic tract is involved (Frisén, 1980). After bilateral
postchiasmatic injury, visual acuity can either be normal, gradually diminished, or totally
lost (i.e., form vision is no longer possible) (Symonds & MacKenzie, 1957). This reduction
in visual acuity cannot be improved by optical correction.
Color Vision
(mild) hypoxia (Connolly, Barbur, Hosking, & Moorhead, 2008), multiple sclerosis (Moura
et al., 2008), Parkinson’s disease (Müller, Woitalla, Peters, Kohla, & Przuntek, 2002), and
dementia of the Alzheimer type (Jackson & Owsley, 2003). Furthermore, color hue dis
crimination accuracy can be considerably reduced in older age (Jackson & Owsley, 2003).
Spatial Vision
Disorders in visual space perception comprise deficits in visual localization, depth percep
tion, and perception of visual spatial axes. Brain injury can differentially affect retino
topic, spatiotopic, egocentric, and allocentric frames of reference. Visual-spatial disor
ders typically occur after occipital-parietal and posterior parietal injury; a right-hemi
sphere injury more frequently causes visual spatial impairments (for comprehensive re
views, see Farah, 2003; Karnath & Zihl, 2003; Landis 2000).
After unilateral brain injury, moderately defective visual spatial localization is typically found in the contralateral hemifield, but may also be present in the foveal visual field (Postma, Sterken, de Vries, & de Haan, 2000), and is associated with reduced saccadic localization accuracy. Patients with bilateral posterior brain injury, in contrast, show
moderate to severe localization inaccuracy in the entire visual field, which typically af
fects all visually guided activities, including accurately fixating objects, reaching for ob
jects, and reading and writing (Zihl, 2011). Interestingly, patients with parietal lobe injury
can show dissociation between spatial perception deficits and pointing errors (Darling,
Bartelt, Pizzimenti, & Rizzo, 2008), indicating that inaccurate pointing cannot always be
explained in terms of defective localization but may represent a genuine disorder (optic
ataxia; see Caminiti et al., 2010).
Impaired monocular and binocular depth perception (astereopsis) has been observed in
patients with unilateral and bilateral posterior brain injury, with bilateral injury causing
more severe deficits. Defective depth perception may cause difficulties in pictorial depth
perception, walking (downstairs), and reaching for objects or handles (Koh et al., 2008;
Miller et al., 1999; Turnbull, Driver, & McCarthy, 2004). Impaired distance perception, in
particular in the peripersonal space, has mainly been observed after bilateral occipital-
parietal injury (Berryhill, Fendrich, & Olson, 2009).
Shifts in the vertical and horizontal axes have been reported particularly in patients with
right occipital-parietal injury (Barton, Behrmann, & Black, 1998; Bonan, Leman, Legar
gasson, Guichard, & Yelnik, 2006). Right-sided posterior parietal injury can also cause ip
silateral and contralateral shifts in the visually perceived trunk median plane (Darling,
Pizzimenti, & Rizzo, 2003). Occipital injury more frequently causes contralateral shifts in
spatial axes, whereas posterior parietal injury also causes ipsilateral shifts. Barton and
Black (1998) suggested that the contralateral midline shift of hemianopic patients is “a
consequence of the strategic adaptation of attention into contralateral hemispace after
hemianopia” (p. 660), that is, that a change in attentional distribution might cause an ab
normal bias in line bisection. In a study of 129 patients with homonymous visual field
loss, we found the contralateral midline shift in more than 90 percent of cases. However,
the line bisection bias was not associated with efficient oculomotor compensation for the
homonymous visual loss. In addition, visual field sparing also did not modulate the degree
of midline shift. Therefore, the subjective straight-ahead deviation may be explained as a
consequence of a systematic, contralesional shift of the egocentric visual midline and may
therefore represent a genuine visual-spatial perceptual disorder (Zihl, Sämann, Schenk,
Schuett, & Dauner, 2009). This idea is supported by Darling et al. (2003), who reported
difficulties in visual perception of the trunk-fixed anterior-posterior axis in patients with
left- or right-sided unilateral posterior parietal lesions without visual field defects.
Motion Perception
Processing of direction and speed of visual motion stimuli is a genuine visual ability. How
ever, in order to know how objects move in the world, we must take into account the rota
tion of our eyes as well as of our head (Bradley, 2004; Snowden & Freeman, 2004). Mo
tion perception also enables recognition of biological movements (Giese & Poggio, 2003)
and supports face perception (Roark, Barrett, Spence, Abdi, & O’Toole, 2003). Visual area
V5 activity is the most critical basis for generating motion perception (Moutoussis & Zeki,
2008), whereas superior temporal and premotor areas subserve biological motion percep
tion (Saygin, 2007).
The first well-documented case of loss of visual motion perception (cerebral akinetopsia)
is L.M. After bilateral temporal-occipital cerebrovascular injury, she completely lost move
ment vision in all three dimensions, except for detection and direction discrimination of
single targets moving at low speed with elevated thresholds. In contrast, all other visual
abilities, including the visual field, visual acuity, color vision, stereopsis, and visual recog
nition, were spared, as was motion perception in the auditory and tactile modalities. Her
striking visual-perceptual impairment could not be explained by spatial or temporal pro
cessing deficits, impaired contrast sensitivity (Hess, Baker, & Zihl, 1989), or generalized
cognitive slowing (Zihl, von Cramon, & Mai, 1983; Zihl, von Cramon, Mai, & Schmid,
1991). L.M. was also unable to search for a moving target among stationary distractor
stimuli in a visual display (McLeod, Heywood, Driver, & Zihl, 1989) and could not see bio
logical motion stimuli (McLeod, Dittrich, Driver, Perrett, & Zihl, 1996), including facial
movements in speech reading (Campbell, Zihl, Massaro, Munhall, & Cohen, 1997). She
could not extract shape from motion and lost apparent motion perception (Rizzo, Nawrot,
& Zihl, 1995). Because of her akinetopsia, L.M. was severely handicapped in all activities
involving visual motion perception, whereby perception and action were similarly affect
ed (Schenk, Mai, Ditterich, & Zihl, 2000). Selective impairment of movement vision in
terms of threshold elevation for speed and direction has also been reported in the hemi
field contralateral to unilateral posterior brain injury for motion types of different com
plexity, combined and in separation (Billino, Braun, Bohm, Bremmer, & Gegenfurtner,
2009; Blanke, Landis, Mermoud, Spinelli, & Safran, 2003; Braun, Petersen, Schoenle, &
Fahle, 1998; Plant, Laxer, Barbaro, Schiffman, & Nakayama, 1993; Vaina, Makris,
Kennedy, & Cowey, 1998).
Visual Agnosia
Visual agnosia is the inability to identify, recognize, interpret, or comprehend the mean
ing of visual stimuli even though basic visual functions (i.e., the visual field, visual acuity,
spatial contrast sensitivity, color vision, and form discrimination) are intact or at least suf
ficiently preserved. Visual agnosia either results from defective visual perception (e.g.,
synthesis of features; apperceptive visual agnosia) or from the loss of the “bridge” be
tween the visual stimulus and its semantic associations (e.g., label, use, history; associa
tive or semantic visual agnosia). However, objects can be recognized in the auditory and
tactile modalities, and the disorder cannot be explained by supramodal cognitive or apha
sic deficits (modified after APA, 2007). Lissauer (1890) interpreted apperceptive visual ag
nosia as “mistaken identity” because incorrectly identified objects share global (e.g., size
and shape) and/or local properties (e.g., color, texture, form details) with other objects,
which causes visual misidentification. Cases with pure visual agnosia seem to be the ex
ception rather than the rule (Riddoch, Johnston, Bracewell, Boutsen, & Humphreys,
2008). Therefore, a valid and unequivocal differentiation between a “genuine” visual ag
nosia and secondary impairments in visual identification and recognition resulting from
other visual deficits is often difficult, in particular concerning the integration of global
and local information (Delvenne, Seron, Coyette, & Rossion, 2004; Thomas & Forde,
2006). In a group of 1,216 patients with acquired injury to the visual brain we have found
only seventeen patients (about 1.4 percent) with genuine visual agnosia. Visual agnosia is
typically caused by bilateral occipital-temporal injury (Barton, 2008a) but may also occur
after left- (Barton, 2008b) or right-sided posterior brain injury (Landis, Regard, Bliestle,
& Kleihues, 1988). There also exist progressive forms of visual agnosia in posterior corti
cal atrophy and in early stages of dementia (Nakachi et al., 2007; Rainville et al., 2006).
Farah (2000) has proposed a useful classification of visual agnosia according to the type
of visual material patients find difficult to identify and recognize. Patients with visual ob
ject and form agnosia are unable to visually recognize complex objects or pictures.
There exist category-specific types of object agnosia, such as for living and nonliving things (Thomas & Forde, 2006), or animals and artifacts (Takarae & Levin, 2001). A particular type of visual object agnosia is visual form agnosia. The most extensively studied case with
visual form agnosia is D.F. (Milner et al., 1991). After extensive damage to the ventral
processing stream due to carbon monoxide poisoning, this patient showed a more or less
complete loss of form perception, including form discrimination, despite having a visual
resolution capacity of 1.7 minutes of arc. Visually guided activities such as pointing to or
grasping for an object, however, were spared (Carey, Dijkerman, Murphy, Goodale, & Mil
ner, 2006; James, Culham, Humphrey, Milner, & Goodale, 2003; McIntosh, Dijkerman,
Mon-Williams, & Milner, 2004). D.F. also showed profound inability to visually recognize
objects, places, and faces, indicating a more global rather than selective visual agnosia.
Furthermore, D.F.’s visual disorder may also be explained in terms of an allocentric spa
tial deficit rather than as perceptual deficit (Schenk, 2006). As Goodale and Westwood
(2004) have pointed out, the proposed ventral-dorsal division in visual information pro
cessing may not be as exclusive as assumed, and both routes interact at various stages.
However, automatic obstacle avoidance was intact in D.F. while correct grasping was pos
sible for simple objects only (McIntosh et al., 2004), suggesting that the “what” pathway
plays no essential role in detecting and localizing objects or in the spatial guidance of
walking (Rice et al., 2006). Further cases of visual form agnosia after carbon monoxide
poisoning have been reported by Heider (2000). Despite preserved visual acuity and only
minor visual field defects, patients were severely impaired in shape and form discrimina
tion, whereas the perception of color, motion, and stereoscopic depth was relatively unim
paired. Heider (2000) identified a failure in figure–ground segregation and grouping sin
gle elements of a composite visual scene into a “gestalt” as the main underlying deficit.
Global as well as local processing can be affected after right- and left-sided occipital-tem
poral injury (Rentschler, Treutwein, & Landis, 1994); yet, typically patients find it more
difficult to process global features and integrate them into a whole percept (integrative or
simultaneous agnosia; Behrmann & Williams, 2007; Saumier, Arguin, Lefebvre, & Las
sonde, 2002; Thomas & Forde, 2006). Consequently, patients are unable to report more
than one attribute of a single object (Coslett & Lie, 2008). Encoding the spatial arrange
ments of parts of an object requires a mechanism that is different from that required for
encoding the shape of individual parts, with the former selectively compromised in inte
grative agnosia (Behrmann, Peterson, Moscovitch, & Suzuki, 2006). Integration of multi
ple object stimuli into a holistic interpretation seems to depend on the spatial distance of
local features and elements (Huberle & Karnath, 2006). Yet, shifting fixation, and thus also attention, to all elements of an object in a regular manner does not seem sufficient to “bind” together the different elements of spatially distributed stimuli (Clavagnier et al.,
2006). The integration of multiple visual elements resulting in a conscious perception of
their gestalt seems to rely on bilateral structures in the human lateral and medial inferior
parietal cortex (Himmelbach, Erb, Klockgether, Moskau, & Karnath, 2009). An alternative
explanation for the impairment in global visual perception is shrinkage of the field of at
tention and thus perception (Michel & Henaff, 2004), which might be elicited by atten
tional capture (“radical visual capture”) to single, local elements (Takaiwa, Yoshimura,
Abe, & Terai, 2003; Dalrymple, Kingstone, & Barton, 2007). The pathological restriction
and rigidity of attention impair the integration of multiple visual elements to a gestalt,
but the type of capture depends on the competitive balance between global and local
salience. The impaired disengaging of attention causes inability to “unlock” attention
from the first object or object element to other objects or elements of objects (Pavese,
Coslett, Saffran, & Buxbaum, 2002). Interestingly, facial expressions of emotion are less
affected in simultanagnosia, indicating that facial stimuli constitute a specific category of
stimuli that attract attention more effectively and are possibly processed before attention
al engagement (Pegna, Caldara-Schnetzer, & Khateb, 2008). It has been proposed that
differences in local relative to more global visual processing can be explained by different
processing modes in the dorsal and medial ventral visual pathways at an extrastriate lev
el; these characteristics can also explain category-specific deficits in visual perception
(Riddoch et al., 2008). The dual-route organization of visual information has also been ap
plied to local–global perception. Difficulties with processing of multiple stimulus elements
or features (within-object representation) are often referred to as “ventral” simultanag
nosia, and impaired processing of multiple spatial stimuli (between-object representation)
as “dorsal” simultanagnosia (Karnath, Ferber, Rorden, & Driver, 2000). Dorsal simul
tanagnosia is one component of the Bálint-Holmes syndrome, which consists of
spatial (and possibly temporal) restriction of the field of visual attention and thus visual
processing and perception, impaired visual spatial localization and orientation, and defec
tive depth perception (Moreaud, 2003; Rizzo & Vecera, 2002). In addition, patients with
severe Bálint’s syndrome find it extremely difficult to shift their gaze voluntarily or on
command (oculomotor apraxia or psychic paralysis of gaze) and are unable to direct
movement of an extremity in space under visual guidance (optic or visuomotor ataxia). As
a consequence, visually guided oculomotor and hand motor activities, visual-constructive
abilities, visual orientation, recognition, and reading are severely impaired (Ghika, Ghika-
Schmid, & Bogousslavsky, 1998).
In face agnosia (prosopagnosia), recognition of familiar faces, including one’s own face, is
impaired or lost. The difficulties prosopagnosic patients have with visual face recognition
also manifest in their oculomotor scan path during inspection of a face; global features
such as hair or the forehead, for example, are scanned in much more detail than genuine
facial features such as the eye or nose (Stephan & Caine, 2009). Other prosopagnosic
subjects may show partial processing of facial features, such as the mouth region
(Bukach, Le Grand, Kaiser, Bub, & Tanaka, 2008). Topographical agnosia (topographagnosia) or environmental agnosia refers to defective recognition of familiar environments, in reality
and on maps and pictures; however, patients may have fewer difficulties in familiar sur
roundings and with scenes with clear landmarks, and may benefit from semantic informa
tion such as street names (Mendez & Cherrier, 2003). Agnosia for letters (pure alexia) is a
form of acquired dyslexia with defective visual recognition of letters and words while au
ditory recognition of letters and words and writing are intact. The underlying disorder
may have a pre-lexical, visual-perceptual basis because patients can also exhibit difficul
ties with nonlinguistic stimuli (Mycroft, Behrmann, & Kay, 2009).
Audition
Auditory perception comprises detection, discrimination, identification, and recognition
of sounds, voice, music, and speech. The ability to detect and discriminate attributes of
sounds improves with practice (Wright & Zhang, 2009) and thus depends on auditory ex
perience. This might explain interindividual differences in auditory performance, in par
ticular recognition expertise and domain specificity concerning, for example, sounds,
voices, and music (Chartrand, Peretz, & Belin, 2008). Another factor that crucially modu
lates auditory perceptual efficiency is selective attention (Shinn-Cunningham & Best,
2008).
The auditory brain possesses tonotopic maps that show rapid task-related changes to sub
serve distinct functional roles in auditory information processing, such as pitch versus
phonetic analysis (Ozaki & Hashimoto, 2007). This task specificity can be viewed as a
form of plasticity that is embedded in a context- and cognition-related frame of reference,
whereby attention, learning and memory, and mental imagery can modulate processing
(Dahmen & King, 2007; Fritz, Elhilali, & Shamma, 2005; Weinberger, 2007; Zatorre,
Page 11 of 37
Perceptual Disorders
The auditory system also possesses a “where” and a “what” subdivision for processing
spatial and nonspatial aspects of acoustic stimuli, which allows detection, localization,
discrimination, identification, and recognition of auditory information, including vocal
communication sounds (speech perception) and music (Kraus & Nicol, 2005; Wang, Wu, &
Li, 2008).
Unilateral and bilateral injury to left- or right-sided temporal brain structures can affect spatial and temporal auditory processing capacities (Griffiths et al., 1997;
Polster & Rose, 1998) and the perception of environmental sounds (Tanaka, Nakano, &
Obayashi, 2002), sound movement (Lewald, Peters, Corballis, & Hausmann, 2009), tunes,
prosody, and voice (Peretz et al., 1994), and words (pure word deafness) (Shivashankar,
Shashikala, Nagaraja, Jayakumar, & Ratnavalli, 2001). Functional dissociation of auditory
perceptual deficits, such as preservation of speech perception and environmental sounds
but impairment of melody perception (Peretz et al., 1994), impaired speech perception
but intact environmental sound perception (Kaga, Shindo, & Tanaka, 1997), and impaired
perception of verbal but spared perception of nonverbal stimuli (Shivashankar et al.,
2001), suggests a modular architecture similar to that in the visual cortex (Polster &
Rose, 1998).
Auditory Agnosia
Somatosensory Perception
The somatosensory system provides information about object surfaces that are in direct
contact with the skin (touch) and about the position and movements of body parts (propri
oception and kinesthesis). Somatosensory perception thus includes detection and discrim
ination of (fine) differences in touch stimulation and haptic perception, that is, the per
ception of shape, size, and identity (recognition) of objects on the basis of touch and
kinesthesis. Shape is an important cue for recognizing objects by touch; edges, curvature,
and surface areas are associated with three-dimensional shape (Plaisier, Tiest, & Kap
pers, 2009). Exploratory motor procedures are directly linked to the extraction of specific
shape properties (Valenza et al., 2001). Somatosensory information is processed in anteri
or, lateral, and posterior parietal cortex, but also in frontal, cingulate, temporal, and insu
lar cortical regions (Porro, Lui, Facchin, Maieron, & Baraldi, 2005).
(Bohlhalter, Fretz, & Weder, 2002; Estanol, Baizabal-Carvallo, & Senties-Madrid, 2008).
Difficulties in identifying objects using only hand manipulation have been reported after
parietal injury (Tomberg & Desmedt, 1999). Impairment of the perception of stimulus
shape (morphagnosia) may result from defective processing of spatial orientation in two-
and three-dimensional space (Saetti, De Renzi, & Comper, 1999). (p. 202) Tactile object
recognition can be impaired without associated disorders in tactile discrimination and
manual shape exploration, indicating the existence of “pure” tactile agnosia (Reed, Casel
li, & Farah, 1996).
Disorders in body perception may affect body form and body actions selectively or in com
bination (Moro et al., 2008). Patients with injury to the premotor cortex may show ag
nosia for their body (asomatognosia); that is, they describe parts of their body as missing or as having disappeared from body awareness (Arzy, Overney, Landis, & Blanke, 2006). Macrosomatognosia and, less frequently, microsomatognosia have been reported as transient and reversible
modifications of body representation during migraine aura (Robinson & Podoll, 2000).
Asomatognosia either may involve the body as a whole (Beis, Paysant, Bret, Le Chapelain,
& Andre, 2007) or may be restricted to finger recognition (“finger agnosia”; Anema et al.,
2008). Body misperception may also take the form of a body illusion, a disturbed representation of the body’s ownership labeled “somatoparaphrenia” (Vallar & Ronchi, 2009).
Distorted body perception may also occur in chronic pain (Lotze & Moseley, 2007).
Smell perception involves the caudal orbitofrontal and medial temporal cortices. Olfacto
ry stimuli are processed in primary olfactory (piriform) cortex and also activate the amyg
dala bilaterally, regardless of valence. In posterior orbitofrontal cortex, processing of
pleasant and unpleasant odors is segregated within medial and lateral segments, respec
tively, indicating functional heterogeneity. Imaging with olfactory stimuli also shows that brain regions mediating emotional processing are differentially activated by odor valence and provides
evidence for a close anatomical coupling between olfactory and emotional processes (Got
tfried, Deichmann, Winston, & Dolan, 2002).
Gustation is vital for establishing whether a specific substance is edible and nutritious or
poisonous, and for developing preferences for specific foods. According to the well-known
taste tetrahedron, four basic taste qualities can be distinguished: sweet, salty, sour, and
bitter. A fifth taste quality is umami, a Japanese word for “good taste.” Perceptual taste
qualities are based on the pattern of activity across different classes of sensory fibers
(i.e., cross-fiber theory; Mather, 2006, p. 44) and distributed cortical processing (Simon,
de Araujo, Gutierrez, & Nicolelis, 2006). Taste information is conveyed through the cen
tral gustatory pathways to the gustatory cortical area, but is also sent to the reward sys
tem and feeding center via the prefrontal cortex, insular cortex, and amygdala (Simon et
al., 2006; Yamamoto, 2006). The sensation of eating, or flavor, involves smell and taste as
well as interactions between these and other perceptual systems, including temperature,
touch, and sight. However, flavor is not a simple summation of different sensations; smell
and taste seem to dominate flavor.
Chronic disorders in olfactory perception and recognition have been reported after (trau
matic) brain injury mainly to ventral frontal cortical structures (Fujiwara, Schwartz,
Gao, Black, & Levine, 2008; Haxel, Grant, & Mackay-Sim, 2008; Wermer, Donswijk,
Greebe, Verweij, & Rinkel, 2007), in (p. 203) Parkinson’s disease and multiple sclerosis, in
mesial temporal epilepsy, and in neurodegenerative diseases, including dementia of the
Alzheimer type, frontotemporal dementia, corticobasal degeneration, and Huntington’s
disease (Barrios et al., 2007; Doty, 2009; Jacek, Stevenson, & Miller, 2007; Pardini, Huey,
Cavanagh, & Grafman, 2009). It should be mentioned, however, that hyposmia and im
paired odor identification can also be found in older age (Wilson, Arnold, Tang, & Ben
nett, 2006), particularly in subjects with cognitive decline. Presbyosmia occurs mainly after 65 years of age, with no difference between males and females, and
with a weak relationship between self-reports of olfactory function and objective olfactory
function (Mackay-Sim, Johnston, Owen, & Burne, 2006). Olfactory perceptual changes
have also been reported among subjects receiving chemotherapy (Bernhardson, Tishel
man, & Rutqvist, 2009), in depression (Pollatos et al., 2007), and in anorexia nervosa (Roessner, Bleich, Banaschewski, & Rothenberger, 2005).
Smell and taste dysfunctions, including impaired detection, discrimination, and identifica
tion of foods, have been frequently reported in patients following (minor) stroke in tempo
ral brain structures (Green, McGregor, & King, 2008). Abnormalities in taste and smell
have also been reported in patients with Parkinson’s disease (Shah et al., 2009).
Social Perception
Social perception is an individual’s perception of social stimuli (i.e., facial expression,
prosody, gestures, and smells), which allows one to infer motives, attitudes, or values from the social behavior of other individuals. Social perception and social cognition, as well as sensitivity to social context and social action, depend on particular functional systems in the prefrontal brain (Adolphs, 2003; Adolphs, Tranel, & Damasio, 2003). The
amygdala is involved in recognizing facial emotional expressions; the orbitofrontal cortex
is important for reward processing; and the insula is involved in representing “affective” states of our own body, as in pain and empathy for pain (Adolphs, 2009). The neural substrates of
social perception are characterized by a general pattern of right-hemispheric functional
asymmetry (Brancucci, Lucci, Mazzatenta, & Tommasi, 2009). The (right) amygdala is
crucially involved in evaluating sad but not happy faces, suggesting that this brain struc
ture plays a specific role in processing negative emotions, such as sadness and fear
(Adolphs & Tranel, 2004).
Patients with traumatic brain injury may show difficulties with recognizing affective infor
mation from the face, voice, bodily movement, and posture (Bornhofen & McDonald,
2008), which may persistently interfere with successful negotiation of social interactions
(Ietswaart, Milders, Crawford, Currie, & Scott, 2008). Interestingly, face perception and
perception of visual social cues can be affected while the perception of prosody can be
relatively spared, indicating a dissociation between visual and auditory social-perceptual
abilities (Croker & McDonald, 2005; Green, Turner, & Thompson, 2004; Pell, 1998). Im
paired auditory recognition of fear and anger has been reported following bilateral amyg
dala lesions (Scott et al., 1997). Impairments of social perception, including inaccurate in
terpretation and evaluation of stimuli signifying reward or punishment in a social context,
and failures to translate emotional and social information into task- and context-appropri
ate action patterns are often observed in subjects with frontal lobe injury. Consequently,
patients may demonstrate inadequate social judgments and decision making, social inflex
ibility, and lack of self-monitoring, particularly in social situations (Rankin, 2007). Difficul
ties with facial expression perception have also been reported in mood disorders (Venn,
Watson, Gallagher, & Young, 2006).
and not cause any handicap, because there is no “internal” representation of the outside
world in our brains and thus no visual experience? Modulatory effects of producing action
on perception such that observers become selectively sensitive to similar or related ac
tions are known from visual imitation learning and social interactions (Schutz-Bosbach &
Prinz, 2007), but in both instances, perception of action and, perhaps, motivation to ob
serve and attention directed to the action in question are required. Nevertheless, a more
detailed understanding of the bidirectional relationships between perception and action
and the underlying neural networks will undoubtedly help us to understand how percep
tion modulates action and vice versa. The search for associations and dissociations of perception and action in cases of acquired brain injury, within the framework of common functional representations in terms of sensorimotor contingencies, may represent a helpful approach to studying the reciprocal relationships between perception and action.
Accurate visually guided hand actions in the absence of visual perception (Goodale, 2008)
and impaired eye–hand coordination and saccadic control in optic ataxia as a conse
quence of impaired visual-spatial processing (Pisella et al., 2009) are examples of such
dissociations and associations. Despite some conceptual concerns and limitations, the
dual-route model of visual processing proposed by Milner and Goodale (2008) is still of
theoretical and practical value (Clark, 2009).
derlying perception and cognition may further help to define perceptual dysfunction with
sufficient validity and thus contribute to the significance of perception (Pollen, 2008). Re
search on functional plasticity in subjects with perceptual disorders using experimental
practice paradigms may, in addition, contribute to a more comprehensive and integrative
understanding of perception in the framework of other functional systems in the brain,
which are known to modulate perceptual learning and thus functional plasticity in percep
tual systems (Gilbert, Li & Piech, 2009; Gilbert & Sigman, 2007).
Author Note
Preparation of this chapter has been supported in part by the German Ministry for Educa
tion and Research (BMBF grant 01GW0762). I want to thank Susanne Schuett for her
very helpful support.
References
Adolphs, R. (2003). Cognitive neuroscience of human social behaviour. Nature Reviews
Neuroscience, 4, 165–178.
Adolphs, R. (2009). The social brain: Neural basis of social knowledge. Annual Review of
Psychology, 60, 693–716.
Adolphs, R., & Tranel, D. (2004). Impaired judgments of sadness but not happiness follow
ing bilateral amygdala damage. Journal of Cognitive Neuroscience, 16, 453–462.
Adolphs, R., Tranel, D., & Damasio, A. R. (2003). Dissociable neural systems for recogniz
ing emotions. Brain and Cognition, 52, 61–69.
Aglioti, S., Bricolo, E., Cantagallo, A., & Berlucchi, G. (1999). Unconscious letter discrimina
tion is enhanced by association with conscious color perception in visual form agnosia.
Current Biology, 9, 1419–1422.
Andersson, F., Joliot, M., Perchey, G., & Petit, L. (2007). Eye position-dependent activity in
the primary visual area as revealed by fMRI. Human Brain Mapping, 28, 673–680.
Anema, H. A., Kessels, R. P., de Haan, E. H., Kappelle, L. J., Leijten, F. S., van Zandvoort,
M. J., & Dijkerman, H. C. (2008). Differences in finger localisation performance of pa
tients with finger agnosia. NeuroReport, 19, 1429–1433.
Anstis, S. M. (1974). A chart demonstrating variations in acuity with retinal position. Vi
sion Research, 14, 579–582.
Arzy, S., Overney, L. S., Landis, T., & Blanke, O. (2006). Neural mechanisms of embodiment:
Asomatognosia due to premotor cortex damage. Archives of Neurology, 63, 1022–1025.
Ayotte, J., Peretz, I., Rousseau, I., Bard, C., & Bojanowski, M. (2000). Patterns of music
agnosia associated with middle cerebral artery infarcts. Brain, 123, 1926–1938.
Bamiou, D. E., Musiek, F. E., & Luxon, L. M. (2003). The insula (Island of Reil) and its role
in auditory processing. Brain Research—Brain Research Reviews, 42, 143–154.
Barrios, F. A., Gonzalez, L., Favila, R., Alonso, M. E., Salgado, P. M., Diaz, R., & Fernan
dez-Ruiz, J. (2007). Olfaction and neurodegeneration in HD. NeuroReport, 18, 73–76.
Bartels, A., & Zeki, S. (1998). The theory of multistage integration. Proceedings of the
Royal Society London B, 265, 2327–2332.
Barton, J. J. S., & Black, S. E. (1998). Line bisection in hemianopia. Journal of Neurology,
Neurosurgery & Psychiatry, 64, 660–662.
Barton, J. J. S., Behrmann, M., & Black, S. (1998). Ocular search during line bisection: The
effects of hemi-neglect and hemianopia. Brain, 121, 1117–1131.
Behrmann, M., Peterson, M. A., Moscovitch, M., & Suzuki, S. (2006). Independent repre
sentation of parts and the relations between them: Evidence from integrative agnosia.
Journal of Experimental Psychology: Human Perception & Performance, 32, 1169–1184.
Beis, J. M., Paysant, J., Bret, D., Le Chapelain, L., & Andre, J. M. (2007). Specular right-left
disorientation, finger-agnosia, and asomatognosia in right hemisphere stroke. Cognitive &
Behavioral Neurology, 20, 163–169.
Berlucchi, G., & Aglioti, S. M. (2010). The body in the brain revisited. Experimental Brain
Research, 200, 25–35.
Bernhardson, B. M., Tishelman, C., & Rutqvist, L. E. (2009). Olfactory changes among pa
tients receiving chemotherapy. European Journal of Oncology Nursing, 13, 9–15.
Berryhill, M. E., Fendrich, R., & Olson, I. R. (2009). Impaired distance perception
(p. 206)
and size constancy following bilateral occipito-parietal damage. Experimental Brain Re
search, 194, 381–393.
Bharucha, J. J., Curtis, M., & Paroo, K. (2006). Varieties of musical experiences. Cognition,
100, 131–172.
Billino, J., Braun, D. I., Bohm, K. D., Bremmer, F., & Gegenfurtner, K. R. (2009). Cortical
networks for motion perception: effects of focal brain lesions on perception of different
motion types. Neuropsychologia, 47, 2133–2144.
Blanke, O., Landis, T., Mermoud, C., Spinelli, L., & Safran, A. B. (2003). Direction-selec
tive motion blindness after unilateral posterior brain damage. European Journal of Neuro
science, 18, 709–722.
Bohlhalter, S., Fretz, C., & Weder, B. (2002). Hierarchical versus parallel processing in
tactile object recognition: A behavioural-neuroanatomical study of apperceptive tactile
agnosia. Brain, 125, 2537–2548.
Bonan, I. V., Leman, M. C., Legargasson, J. F., Guichard, J. P., & Yelnik, A. P. (2006). Evolu
tion of subjective visual vertical perturbation after stroke. Neurorehabilitation & Neural
Repair, 20, 484–491.
Bornhofen, C., & McDonald, S. (2008). Emotion perception deficits following traumatic
brain injury: A review of the evidence and rationale for intervention. Journal of the Inter
national Neuropsychological Society, 14, 511–525.
Boso, M., Politi, P., Barale, F., & Enzo, E. (2006). Neurophysiology and neurobiology of the
musical experience. Functional Neurology, 21, 187–191.
Bouvier, S. E., & Engel, S. A. (2006). Behavioral deficits and cortical damage loci in cere
bral achromatopsia. Cerebral Cortex, 16, 183–191.
Bradley, D. (2004). Object motion: A world view. Current Biology, 14, R892–R894.
Brancucci, A., Lucci, G., Mazzatenta, A., & Tommasi, L. (2009). Asymmetries of the human
social brain in the visual, auditory, and chemical modalities. Philosophical Transactions of
the Royal Society of London—Series B: Biological Sciences, 364, 895–914.
Braun, D., Petersen, D., Schoenle, P., & Fahle, M. (1998). Deficits and recovery of first-
and second-order motion perception in patients with unilateral cortical lesions. European
Journal of Neuroscience, 10, 2117–2128.
Bukach, C. M., Le Grand, R., Kaiser, M. D., Bub, D., & Tanaka, J. W. (2008). Preservation
of mouth region processing in two cases of prosopagnosia. Journal of Neuropsychology, 2,
227–244.
Bulens, C., Meerwaldt, J. D., van der Wildt, G. J., & Keemink, D. (1986). Contrast sensitivi
ty in Parkinson’s disease. Neurology, 36, 1121–1125.
Bulens, C., Meerwaldt, J. D., van der Wildt, G. J., & Keemink, D. (1989). Spatial contrast
sensitivity in unilateral cerebral ischemic lesions involving the posterior visual pathway.
Brain, 112, 507–520.
Bullier, J. (2003). Cortical connections and functional interactions between visual cortical
areas. In M. Fahle & M. Greenlee (Eds.), The neuropsychology of vision (pp. 23–63). Ox
ford, UK: Oxford University Press.
Bussey, T. J., & Saksida, L. M. (2007). Memory, perception, and the ventral visual-perirhi
nal-hippocampal stream: thinking outside of the boxes. Hippocampus, 17, 898–908.
Caminiti, R., Chafee, M. V., Battaglia-Mayer, A., Averbeck, B. B., Crowe, D. A., & Geor
gopoulos, A. P. (2010). Understanding the parietal lobe syndrome from a neuropsychologi
cal and evolutionary perspective. European Journal of Neuroscience, 31, 2320–2340.
Campbell, R., Zihl, J., Massaro, D., Munhall, K., & Cohen, M. M. (1997). Speechreading in
the akinetopsic patient, L.M. Brain, 120, 1793–1803.
Carey, D. P., Dijkerman, H. C., Murphy, K. J., Goodale, M. A., & Milner, A. D. (2006). Point
ing to places and spaces in a patient with visual form agnosia. Neuropsychologia, 44,
1584–1594.
Cerrato, P., Lentini, A., Baima, C., Grasso, M., Azzaro, C., Bosco, G., Destefanis, E., Benna,
P., Bergui, M., & Bergamasco, B. (2005). Hypogeusia and hearing loss in a patient with an
inferior collicular lesion. Neurology, 65, 1840–1841.
Chartrand, J. P., Peretz, I., & Belin, P. (2008). Auditory recognition expertise and domain
specificity. Brain Research, 1220, 191–198.
Clark, A. (2009). Perception, action, and experience: unraveling the golden braid. Neu
ropsychologia, 47, 1460–1468.
Clarke, G. (2005). Incidence of neurological vision impairment in patients who suffer from
an acquired brain injury. International Congress Series, 1282, 365–369.
Clarke, S., Bellmann, A., Meuli, R. A., Assal, G., & Steck, A. J. (2000). Auditory agnosia
and auditory spatial deficits following left hemispheric lesions: Evidence for distinct pro
cessing pathways. Neuropsychologia, 38, 797–807.
Clavagnier, S., Fruhmann Berger, M., Klockgether, T., Moskau, S., & Karnath, H. O.
(2006). Restricted ocular exploration does not seem to explain simultanagnosia. Neu
ropsychologia, 44, 2330–2336.
Combarros, O., Miro, J., & Berciano, J. (1994). Ageusia associated with thalamic plaque in
multiple sclerosis. European Neurology, 34, 344–346.
Connolly, D. M., Barbur, J. L., Hosking, S. L., & Moorhead, I. R. (2008). Mild hypoxia im
pairs chromatic sensitivity in the mesopic range. Investigative Ophthalmology & Visual
Science, 49, 820–827.
Coslett, H. B., & Lie, G. (2008). Simultanagnosia: When a rose is not red. Journal of Cog
nitive Neuroscience, 20, 36–48.
Cowey, A. (2010). The blindsight saga. Experimental Brain Research, 200, 3–24.
Croker, V., & McDonald, S. (2005). Recognition of emotion from facial expression follow
ing traumatic brain injury. Brain Injury, 19, 787–799.
Dahmen, J. C., & King, A. J. (2007). Learning to hear: Plasticity of auditory cortical pro
cessing. Current Opinion in Neurobiology, 17, 456–464.
Dalrymple, K. A., Kingstone, A., & Barton, J. J. (2007). Seeing trees OR seeing forests in
simultanagnosia: Attentional capture can be local or global. Neuropsychologia, 45, 871–
875.
Danckert, J., & Rosetti, Y. (2005). Blindsight in action: What can the different subtypes of
blindsight tell us about the control of visually guided actions? Neuroscience & Biobehav
ioral Reviews, 29, 1035–1046.
Darling, W. G., Bartelt, R., Pizzimenti, M. A., & Rizzo, M. (2008). Spatial perception errors
do not predict pointing errors by individuals with brain lesions. Journal of Clinical & Ex
perimental Neuropsychology, 30, 102–119.
Darling, W. G., Pizzimenti, M. A., & Rizzo, M. (2003). Unilateral posterior parietal lobe le
sions affect representation of visual space. Vision Research, 43, 1675–1688.
Dean, H. L., Crowley, J. C., & Platt, M. L. (2004). Visual and saccade-related activity in
macaque posterior cingulate cortex. Journal of Neurophysiology, 92, 3056–3068.
Delvenne, J. F., Seron, X., Coyette, F., & Rossion, B. (2004). Evidence for perceptu
(p. 207)
Doty, R. L. (2009). The olfactory system and its disorders. Seminars in Neurology, 29, 74–
81.
Ekman, G., Berglund, B., Berglund, U., & Lindvall, T. (1967). Perceived intensity of odor
as a function of time of adaptation. Scandinavian Journal of Psychology, 8, 177–186.
Engelien, A., Silbersweig, D., Stern, E., Huber, W., Doring, W., Frith, C., & Frackowiak, R.
S. (1995). The functional anatomy of recovery from auditory agnosia. A PET study of
sound categorization in a neurological patient and normal controls. Brain, 118, 1395–
1409.
Estanol, B., Baizabal-Carvallo, J. F., & Senties-Madrid, H. (2008). A case of tactile agnosia
with a lesion restricted to the post-central gyrus. Neurology India, 56, 471–473.
Fikentscher, R., Roseburg, B., Spinar, H., & Bruchmuller, W. (1977). Loss of taste in the el
derly: Sex differences. Clinical Otolaryngology & Allied Sciences, 2, 183–189.
Fritz, J., Elhilali, M., & Shamma, S. (2005). Active listening: Task-dependent plasticity of
spectrotemporal receptive fields in primary auditory cortex. Hearing Research, 206, 159–
176.
Fujiwara, E., Schwartz, M. L., Gao, F., Black, S.E., & Levine, B. (2008). Ventral frontal cor
tex functions and quantified MRI in traumatic brain injury. Neuropsychologia, 46, 461–
474.
Gal, R. L. (2008). Visual function 15 years after optic neuritis: A final follow-up report
from the optic neuritis treatment trial. Ophthalmology, 115, 1079–1082.
Ghika, J., Ghika-Schmid, F., & Bogousslavsky, J. (1998). Parietal motor syndrome: A clini
cal description in 32 patients in the acute phase of pure parietal stroke studied prospec
tively. Clinical Neurology & Neurosurgery, 100, 271–282.
Giese, M. A., & Poggio, T. (2003). Neural mechanisms for the recognition of biological
movements. Nature Reviews Neuroscience, 4, 179–192.
Gilbert, C. D., Li, W., & Piech, V. (2009). Perceptual learning and adult cortical plasticity.
Journal of Physiology, 587, 2743–2751.
Gilbert, C. D., & Sigman, M. (2007). Brain states: Top-down influences in sensory process
ing. Neuron, 54, 677–696.
Goodale, M. A., & Westwood, D. A. (2004). An evolving view of duplex vision: Separate
but interacting cortical pathways for perception and action. Current Opinion in Neurobi
ology, 14, 203–211.
Gottfried, J. A., Deichmann, R., Winston, J. S., & Dolan, R. J. (2002). Functional hetero
geneity in human olfactory cortex: An event-related functional magnetic resonance imag
ing study. Journal of Neuroscience, 22, 10819–10828.
Gottlieb, J. (2007). From thought to action: The parietal cortex as a bridge between per
ception, action, and cognition. Neuron, 53, 9–16.
Green, R. E., Turner, G. R., & Thompson, W. F. (2004). Deficits in facial emotion percep
tion in adults with recent traumatic brain injury. Neuropsychologia, 42, 133–141.
Green, T. L., McGregor, L. D., & King, K. M. (2008). Smell and taste dysfunction following
minor stroke: A case report. Canadian Journal of Neuroscience Nursing, 30, 10–13.
Griffiths, T. D., Rees, A., Witton, C., Cross, P. M., Shakir, R. A., & Green, G. G. (1997).
Spatial and temporal auditory processing deficits following right hemisphere infarction: A
psychophysical study. Brain, 120, 85–94.
Grill-Spector, K. (2003). The neural basis of object recognition. Current Opinion in Neuro
biology, 13, 159–166.
Grill-Spector, K., & Malach, R. (2004). The human visual cortex. Annual Review of Neuro
science, 27, 649–677.
Harrington, D. O., & Drake, M. V. (1990). The visual fields (6th ed.). St. Louis: Mosby.
Haxel, B. R., Grant, L., & Mackay-Sim, A. (2008). Olfactory dysfunction after head injury.
Journal of Head Trauma Rehabilitation, 23, 407–413.
Heider, B. (2000). Visual form agnosia: Neural mechanisms and anatomical foundations.
Neurocase, 6, 1–12.
Henderson, J. M. (2003). Human gaze control during real-world scene perception. Trends
in Cognitive Sciences, 7, 498–504.
Hess, R. F., Zihl, J., Pointer, S. J., & Schmid, C. (1990). The contrast sensitivity deficit in
cases with cerebral lesions. Clinical Vision Sciences, 5, 203–215.
Hess, R. H., Baker, C. L. Jr., & Zihl, J. (1989). The “motion-blind” patient: Low-level spatial
and temporal filters. Journal of Neuroscience, 9, 1628–1640.
Heywood, C. A., & Kentridge, R. W. (2003). Achromatopsia, color vision, and cortex. Neu
rologic Clinics, 21, 483–500.
Heywood, C. A., Wilson, B., & Cowey, A. (1987). A case study of cortical colour blindness
with relatively intact achromatic discrimination. Journal of Neurology, Neurosurgery, and
Psychiatry, 50, 22–29.
Himmelbach, M., Erb, M., & Karnath, H.-O. (2006). Exploring the visual world: The neur
al substrate of spatial orienting. NeuroImage, 32, 1747–1759.
Himmelbach, M., Erb, M., Klockgether, T., Moskau, S., & Karnath, H. O. (2009). fMRI of
global visual perception in simultanagnosia. Neuropsychologia, 47, 1173–1177.
Hochstein, S., & Ahissar, M. (2002). View from the top: Hierarchies and reverse hierar
chies in the visual system. Neuron, 36, 791–804.
Huberle, E., & Karnath, H. O. (2006). Global shape recognition is modulated by the spa
tial distance of local elements: Evidence from simultanagnosia. Neuropsychologia, 44,
905–911.
Ietswaart, M., Milders, M., Crawford, J. R., Currie, D., & Scott, C. L. (2008). Longitudinal
aspects of emotion recognition in patients with traumatic brain injury. Neuropsychologia,
46, 148–159.
Jacek, S., Stevenson, R. J., & Miller, L. A. (2007). Olfactory dysfunction in temporal lobe
epilepsy: A case of ictus-related parosmia. Epilepsy & Behavior, 11, 466–470.
Jackson, G. R., & Owsley, C. (2003). Visual dysfunction, neurodegenerative diseases, and
aging. Neurologic Clinics, 21, 709–728.
James, T. W., Culham, J., Humphreys, G. K., Milner, A. D., & Goodale, M. A. (2003).
(p. 208)
Ventral occipital lesions impair object recognition but not object-directed grasping: an
fMRI study. Brain, 126, 2463–2475.
Janata, P. (2005). Brain networks that track musical structure. Annals of the New York
Academy of Sciences, 1060, 111–124.
Kaga, K., Shindo, M., & Tanaka, Y. (1997). Central auditory information processing in pa
tients with bilateral auditory cortex lesions. Acta Oto-Laryngologica Supplement, 532, 77–
82.
Kaga, K., Shindo, M., Tanaka, Y., & Haebara, H. (2000). Neuropathology of auditory ag
nosia following bilateral temporal lobe lesions: A case study. Acta Oto-Laryngologica, 120,
259–262.
Karnath, H.-O., Ferber, S., Rorden, C., & Driver, J. (2000). The fate of global information in
dorsal simultanagnosia. Neurocase, 6, 295–306.
Kennard, C., Lawden, M., Morland, A. B., & Ruddock, K. H. (1995). Colour identification
and colour constancy are impaired in a patient with incomplete achromatopsia associated
with prestriate cortical lesions. Proceedings of the Royal Society of London—Series B: Bi
ological Sciences, 260, 169–175.
Kentridge, R. W., Heywood, C. A., & Milner, A. D. (2004). Covert processing of visual form
in the absence of area LO. Neuropsychologia, 42, 1488–1495.
Koh, S. B., Kim, B. J., Lee, J., Suh, S. I., Kim, T. K., & Kim, S. H. (2008). Stereopsis and col
Perceptual Disorders
Josef Zihl
Josef Zihl is research group leader and head of the outpatient clinic for neuropsychol
ogy, Max Planck Institute of Psychiatry.
Varieties of Auditory Attention
Research on attention is one of the major areas of investigation within psychology, neurology, and cognitive neuroscience. Many active lines of investigation aim to understand the brain networks and mechanisms that support attention, as well as the relationship between attention and other cognitive processes such as working memory, vigilance, and learning. This chapter focuses on auditory attention, with a particular emphasis on studies that have examined the neural underpinnings of sustained, selective, and divided attention. The chapter begins with a brief discussion of the possible role of attention in the formation and perception of sound objects as the underlying units of selection. The similarities and differences in the neural networks supporting various aspects of auditory attention, including selective attention, sustained attention, and divided attention, are then discussed. The chapter concludes with a description of the neural networks involved in the control of attention and a discussion of future directions.
attentional resources between the conversation and the maître d'. Common to all three aspects of attention (i.e., sustained, selective, divided) are processes that allow us to switch from one auditory source to another as well as from one attentional mode to another. Auditory attention is both flexible and dynamic: it can be driven by external events such as loud sounds (e.g., a plate smashing on the floor) as well as by internal goal-directed actions that enable listeners to prioritize and selectively process task-relevant sounds at a deeper level (e.g., what the maître d' is saying to the restaurant manager about the reservation), often at the expense of other, less relevant stimuli. This raises another important issue in research on auditory attention: the role of bottom-up, data-driven attentional capture (e.g., one's initial response to a fire alarm) in relation to top-down, controlled attention (e.g., the voluntary control of attention often associated with goal-directed behavior). This distinction between exogenous and endogenous factors appears inherent to all three modes of auditory attention.
Because current theories of attention have been developed primarily to account for visual scene analysis, and because early work on attention demonstrated greater equity between vision and audition, there has been a tendency to assume that general principles derived from research on visual attention can also be applied to situations that involve auditory attention. Despite these links, analogies between auditory and visual attention may be misleading. For instance, in vision, attention can be allocated to a particular region of retinal space. The same does not necessarily apply to audition: as far as is known, there is no topographic representation of auditory space in the human brain. Although evidence suggests that we can allocate auditory attention to various locations of retinal space, we can also attend to sounds that are outside of our sight (e.g., Tark & Curtis, 2009), which makes auditory attention particularly important for monitoring changes that occur outside the visual field. It is important to point out, however, that we do not have to actively monitor our auditory environment in order to notice occasional or peculiar changes in sounds occurring outside our visual field. Indeed, there is ample evidence from scalp recordings of event-related potentials (ERPs) showing that infrequent changes in the ongoing auditory scene are detected automatically and can trigger shifts of attention toward them (Näätänen et al., 1978; Picton et al., 2000; Winkler et al., 2009). These changes in the auditory environment may convey important information that requires an immediate response, such as a car horn increasing in intensity. In that respect, audition might be thought of as being at the service of vision, especially in situations that require the localization of sound sources in the environment (Arnott & Alain, 2011; Kubovy & Van Valkenburg, 2001). Such considerations set the scene for the following discussion, which tackles a number of central questions pertaining to attention in auditory cognitive neuroscience. What are the similarities and differences in the brain areas associated with exogenous and endogenous auditory attention? What neural network enables us to switch attention between modalities or tasks? Are there different networks for switching attention between auditory spatial locations and objects? If so, are these the same as those used in the visual modality? What are the neural networks supporting the engagement and disengagement of auditory attention?
This chapter focuses on the psychological and neural mechanisms supporting auditory attention. We review studies on auditory attention, with a particular emphasis on human studies, although we also consider animal research when relevant. Our review is by no means exhaustive but rather aims to provide a theoretical framework upon which future research can be built. We begin with a brief discussion of the limits of attention and explore the possible mechanisms involved in the formation of auditory objects and the role of attention in sound object formation. We then discuss the similarities and differences in the neural networks supporting various aspects of auditory attention, including selective attention, sustained attention, and divided attention. We conclude by describing the neural networks involved in the control of attention and discussing future directions.
Some visual models have characterized attention in spatial or frequency terms, likening attention to a "spotlight" or "filter" that moves around, applying processing resources to whatever falls within a selected spatial region (e.g., Brefczynski & DeYoe, 1999; LaBerge, 1983; McMains & Somers, 2004). Other models discuss resource allocation on the basis of perceptual objects, in which attending to a particular object enhances processing of all features of that object (e.g., Chen & Cave, 2008). Recent models of auditory attention have been consistent with the latter conception of attention in that the underlying units of selection are discrete objects or streams, and attending to one component of an auditory object facilitates the processing of other properties of the same object (Alain & Arnott, 2000; Shinn-Cunningham, 2008).
In the visual domain, the object-based account of attention was proposed to explain why observers are better at processing visual features that belong to the same visual object than features that are distributed between different objects (Duncan, 1984; Egly et al., 1994). For instance, Duncan (1984) showed that performance declined when individuals were required to make a single judgment about each of two visually overlapping objects (e.g., the size of one object and the texture of the other), compared with when both judgments had to be made about a single object. The robustness of these findings was demonstrated not only in behavioral experiments (Baylis & Driver, 1993) but also in neuroimaging studies employing functional magnetic resonance imaging (fMRI; e.g., O'Craven et al., 1999; Yantis & Serences, 2003) and ERPs (e.g., Valdes-Sosa et al., 1998). Such findings indicate that attention is preferentially allocated to a visual object such that the processing of all features belonging to the attended object is facilitated.
Alain and Arnott (2000) proposed an analogous object-based account for audition in which listeners' attention is allocated to auditory objects derived from the ongoing auditory scene according to perceptual grouping principles (Bregman, 1990). More recently, Shinn-Cunningham (2008) has drawn a parallel between object-based auditory and visual attention, proposing that perceptual objects form the basic units of auditory attention.
Central to the object-based account of auditory attention is the notion that several perceptual objects may be simultaneously accessible for selection and that interactions between object formation and selective attention determine how competing sources interfere with perception (Alain & Arnott, 2000; Shinn-Cunningham, 2008). Although there is some behavioral evidence supporting the notion that objects can serve as an organizational principle in auditory memory (Dyson & Ishfaq, 2008; Hecht et al., 2008), further work is required to understand how sound objects are represented in memory when these objects occur simultaneously. Moreover, the object-based account of auditory attention posits that perceptual objects, rather than individual acoustic features of the stimuli such as pitch and location, form the basic unit of selection (Mondor & Terrio, 1998). In many studies, however, it is difficult to distinguish feature-based from object-based attention effects because sound objects are usually segregated from other concurrent sources using simple acoustic features such as pitch and location (e.g., the voice and location of the maître d' in the restaurant). How, then, do we know that attention is allocated to a sound object rather than to its defining acoustic features? One way to distinguish between feature- and object-based attentional accounts is to pit the grouping of stimuli into perceptual objects against the physical proximity of features (e.g., Alain et al., 1993; Alain & Woods, 1993, 1994; Arnott & Alain, 2002a; Bregman & Rudnicky, 1975; Driver & Baylis, 1989).
Figure 11.1A shows an example in the auditory modality where the physical similarity be
tween three different streams of sounds was manipulated to promote perceptual group
ing while at the same time decreasing the physical distance between task-relevant and
task-irrelevant stimuli. In that study, Alain and Woods (1993) presented participants with
rapid transient pure tones that varied along three different frequencies in random order.
In the example shown in Figure 11.1A, participants were asked to focus their attention to
the lowest pitch sound in order to detect slightly longer (Experiment 1) or louder (Experi
ment 2) target stimuli while ignoring the other sounds. In one condition, the tones com
posing the sequence were evenly spaced along the frequency domain, whereas in another
condition, the two task-irrelevant frequencies were (p. 218) grouped together by making
the extreme sound (lowest or highest, depending on the condition) more similar to the
middle pitch tone. Performance improved when the two task-irrelevant sounds were
grouped together even though this meant having more distractors closer in pitch to the
task-relevant stimuli (see Figure 11.1B). Figure 11.1C shows the effects of perceptual
Page 5 of 35
Varieties of Auditory Attention
grouping on the selective attention effects on ERPs, which was isolated in the difference
wave between the ERPs elicited by the same sounds when they were task relevant and
when they were task irrelevant. Examining the neural activity that occurred in the brain
during these types of tasks suggested that perceptual grouping enhances selective atten
tion effects in auditory cortices, primarily along the Sylvian fissure (Alain et al., 1993;
Alain & Woods, 1994; Arnott & Alain, 2002a). Taken together, these studies show that
perceptual grouping can override physical similarity effects during selective listening,
and suggest that sound objects form the basic unit for attentional selection.
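The difference-wave logic used in these ERP studies can be illustrated with a few lines of synthetic data. Everything below (the N1-like waveform, the gain values, the noise level) is invented for demonstration and is not taken from Alain and Woods (1993); it shows only how subtracting the response to a sound when it is task irrelevant from the response to the physically identical sound when it is task relevant isolates the attention effect:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(-0.1, 0.5, 601)  # time (s) relative to sound onset

def make_epochs(n_trials: int, n1_amplitude: float) -> np.ndarray:
    """Synthetic single-trial EEG: an N1-like deflection (~100 ms) plus noise."""
    n1 = -n1_amplitude * np.exp(-((t - 0.1) ** 2) / (2 * 0.02 ** 2))
    return n1 + rng.normal(0.0, 1.0, size=(n_trials, t.size))

# Attention is modeled here as a larger N1 to the same physical sound.
attended = make_epochs(200, n1_amplitude=3.0).mean(axis=0)
unattended = make_epochs(200, n1_amplitude=2.0).mean(axis=0)

# The attention effect is isolated in the attended-minus-unattended
# difference wave, as in the ERP studies described above.
difference_wave = attended - unattended
peak_idx = int(np.argmin(difference_wave))
print(f"difference-wave peak near {t[peak_idx] * 1000:.0f} ms after onset")
```

Because the two conditions use physically identical stimuli, any residual deflection in the difference wave can be attributed to attention rather than to the sounds themselves.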
One important assumption of the object-based account of auditory attention is that sound
objects are created and segregated independently of attention and that selection for fur
ther processing takes place after this initial partition of the auditory scene into its con
stituent objects. Although there is evidence to suggest that sound segregation may occur
independently of listeners’ attention, there are also some findings that suggest otherwise
(Shamma et al., 2011). The role of attention in perceptual organization has been investi
gated for sounds that occur concurrently as well as for sounds that are sequentially
grouped into distinct perceptual streams. In the case of concurrent sound segregation
(Figure 11.2), the proposal that this (p. 219) segregation is not under voli
tional control was confirmed in ERP studies using passive listening (Alain et al., 2001a,
2002; Dyson & Alain, 2004) as well as active listening paradigms that varied auditory
(Alain & Izenberg, 2003) or visual attentional demands (Dyson et al., 2005). For sequen
tial sound segregation, the results are more equivocal.
In most studies, the effect of attention on sequential sound organization has been investi
gated by mixing two sets of sound sequences that differ in terms of some acoustic feature
(e.g., in the frequency range of two sets of interleaved pure tones). In a typical frequency
paradigm, sounds are presented in patterns of “ABA—ABA—”, in which “A” and “B” are
tones of different frequencies and “—” is a silent interval (Figure 11.3A). The greater the
stimulation rate and the feature separation, the more likely and rapidly listeners are to
report hearing two separate streams of sounds (i.e., one of A’s and another of B’s), with
this type of perceptual organization or stream segregation taking several seconds to build
up. Using similar sequences as those shown in Figure 11.3A, Carlyon and colleagues
found that the buildup of stream segregation was affected by auditory (Carlyon et al.,
2001) and visual (Cusack et al., 2004) attention, and they proposed that attention may be
needed (p. 220) for stream segregation to occur. Also consistent with this hypothesis are
findings from neuropsychological studies in which patients with unilateral neglect follow
ing a brain lesion show impaired buildup in streaming relative to age-matched controls
when stimuli are presented to the neglected side (Carlyon et al., 2001). In addition to the
pro-attention studies reviewed above, there is also evidence to suggest that attention may
not be required for sequential perceptual organization to occur. For instance, patients
with unilateral neglect who are unaware of sounds presented to their neglected side ex
perience the “scale illusion” (Deutsch, 1975), which can occur only if the sounds from the
left and right ears are grouped together (Deouell et al., 2008). Such findings are difficult
to reconcile with a model invoking a required role of attention in stream segregation and
suggest that some organization must be taking place outside the focus of attention, as
others have suggested previously (e.g., Macken et al., 2003; McAdams & Bertoncini, 1997;
Snyder et al., 2006). This apparent discrepancy could be reconciled by assuming that se
quential stream segregation relies on multiple levels of representations (Snyder et al.,
2009), some of which may be more sensitive to volitional control (Gutschalk et al., 2005;
Snyder et al., 2006).
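The "ABA—" sequences described above are straightforward to synthesize. The sketch below is illustrative only; the base frequency, tone duration, and semitone separation are placeholder values, not those of any cited study:

```python
import numpy as np

SR = 44100  # audio sample rate (Hz)

def pure_tone(freq_hz: float, dur_s: float) -> np.ndarray:
    """A sinusoidal pure tone at the given frequency and duration."""
    t = np.arange(int(SR * dur_s)) / SR
    return np.sin(2 * np.pi * freq_hz * t)

def aba_sequence(f_a: float, delta_f_semitones: float,
                 n_triplets: int = 10, tone_dur: float = 0.1) -> np.ndarray:
    """Build an 'ABA-ABA-' streaming sequence.

    A larger frequency separation (delta_f) and a faster presentation
    rate make listeners more likely, and quicker, to hear the A and B
    tones split into two separate streams.
    """
    f_b = f_a * 2 ** (delta_f_semitones / 12)  # B tone, delta_f above A
    silence = np.zeros(int(SR * tone_dur))      # the '-' gap in 'ABA-'
    triplet = np.concatenate([pure_tone(f_a, tone_dur),
                              pure_tone(f_b, tone_dur),
                              pure_tone(f_a, tone_dur),
                              silence])
    return np.tile(triplet, n_triplets)

seq = aba_sequence(f_a=500.0, delta_f_semitones=7.0)
print(f"{seq.size / SR:.1f} s of audio")  # 10 triplets x 0.4 s = 4.0 s
```

Because the buildup of stream segregation takes several seconds, experiments typically use sequences much longer than this ten-triplet example.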
In summary, evidence from ERP studies suggests that perceptual organization of acoustic
features into sound objects can occur independently of attention (e.g., Alain & Izenberg,
2003; Alain & Woods, 1994; Snyder et al., 2006; Sussman et al., 2007). However, it is very
likely that attention facilitates perceptual organization and that selective attention may
determine which stream of sounds is in the foreground and which is in the background
(i.e., figure–ground segmentation) (Sussman et al., 2005). These views are compatible
with the object-based account of auditory attention in which primitive perceptual process
es sort the incoming acoustic data into its constituent sources, allowing selective process
es to work on the basis of meaningful objects (Alain & Arnott, 2000).
tended features as well as suppression in regions that are sensitive to the unattended
(task-irrelevant) features.
In the auditory attention literature, the notion of a gain mechanism was introduced early
on (e.g., Hillyard et al., 1973), although it was not originally ascribed as a feature-specific
process. For instance, early studies in nonhuman primates showed that the firing rate of
auditory neurons increased when sounds occurred at the attended location (Benson &
Hienz, 1978) or when attention was directed toward auditory rather than visual stimuli
(Hocherman et al., 1976), consistent with a mechanism that “amplifies” or enhances the
representation of task-relevant stimuli. Electroencephalography (EEG; Hillyard et al.,
1973; Woods et al., 1994) and magnetoencephalography (MEG; Ross et al., 2010; Woldorff
et al., 1993) studies provide further evidence for neural enhancement during auditory se
lective attention. For instance, the amplitude of the N1 wave (negative wave at ∼100 ms
after sound onset) from scalp-recorded auditory evoked potentials is larger when sounds
are task-relevant and fall within an “attentional spotlight” compared with when the same
sounds are task-irrelevant (Alho et al., 1987; Giard et al., 2000; Hansen & Hillyard, 1980;
Hillyard et al., 1973; Woldorff, 1995; Woods et al., 1994; Woods & Alain, 2001).
In a more recent study, Munte et al. (2010) measured auditory evoked responses from two
locations, each containing a spoken story and bandpass-filtered noise. Participants were
told to focus their attention on a designated story/location. Consistent with prior re
search, the N1 elicited by the speech probe was found to be larger at the attended than
at the unattended location. More importantly, the noise probes from the task-relevant
story’s location showed a more positive frontal ERP response at about 300 ms than did
the probes at the task-irrelevant location. Although the enhanced positivity may be indica
tive of a suppression mechanism, it may instead reflect novelty- or target-re
lated activity, which also comprises a positive wave peaking at about the same time. More
over, using a similar design, but measuring the effect of sustained selective attention in
the EEG power spectrum, Kerlin et al. (2010) found no evidence of suppression for the
task-irrelevant stream of speech.
rons representing task-relevant sounds, a mechanism that may also enhance figure–
ground separation. Kauramaki et al. (2007) used the notched noise technique, in which a
pure tone is embedded within noise containing a spectral notch centered on the tone,
whose width is parametrically varied to ease target detection (Figure 11.5). Kauramaki et al. mea
sured the N1 wave elicited by the pure tone during selective attention and found a de
crease in attention effects when the width of the notch was decreased. However,
the shape of the function was significantly different from a multiplicative one expected on
the basis of a simple gain model of selective attention (see also Okamoto et al., 2007). Ac
cording to Kauramaki et al. (2007), auditory selective attention in humans cannot be ex
plained by a gain model, whereby only the neural activity level is increased. Rather, selec
tive attention additionally enhances auditory cortex frequency selectivity. This effect of
selective attention on frequency tuning evolves rapidly, within a few seconds after atten
tion switching, and appears to occur for neurons in nonprimary auditory cortices (Ahveni
nen et al., 2011).
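A rough approximation of this notched-noise stimulus can be built by zeroing a band of a broadband noise spectrum around the target frequency. All frequencies and widths below are illustrative choices, not the values used by Kauramaki et al.:

```python
import numpy as np

SR = 44100  # sample rate (Hz)

def notched_noise(dur_s: float, center_hz: float, notch_width_hz: float,
                  seed: int = 0) -> np.ndarray:
    """White noise with a spectral notch centered on `center_hz`.

    A wider notch leaves an embedded pure tone easier to detect;
    this width is the parameter varied in the technique described above.
    """
    rng = np.random.default_rng(seed)
    n = int(SR * dur_s)
    noise = rng.normal(0.0, 1.0, n)
    spectrum = np.fft.rfft(noise)
    freqs = np.fft.rfftfreq(n, d=1 / SR)
    spectrum[np.abs(freqs - center_hz) < notch_width_hz / 2] = 0  # carve notch
    return np.fft.irfft(spectrum, n)

masker = notched_noise(1.0, center_hz=1000.0, notch_width_hz=200.0)
tone = 0.5 * np.sin(2 * np.pi * 1000.0 * np.arange(int(SR * 1.0)) / SR)
stimulus = masker + tone  # pure tone sitting inside the notch
```

Narrowing the notch brings masking energy closer to the tone, which is what makes the target harder to detect and, in the study above, reduced the attention effect on the N1.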
equivocal. Both gain and suppression mechanisms, as well as sharper receptive field tun
ing, may enhance figure–ground segregation, thereby easing the monitoring and selec
tion of task-relevant information.
Sustained Attention
The process of monitoring the auditory environment for a particular event (e.g., perhaps
you are still waiting for the maître d' to call out your name) has been studied in a number
of ways in order to reveal the neural architecture supporting sustained attention. One ex
ample is the oddball paradigm in which participants are asked to respond to infrequent
target sounds in a train of nontarget stimuli. In oddball tasks, the target stimuli differ
from the standard sounds along a particular dimension (e.g., pitch, duration, intensity, lo
cation). The detection of the task-relevant stimuli is typically accompanied by a parietal-
central positive wave of the ERP, the P300 or P3b. Moreover, fMRI studies have identified
several brain areas that show increased hemodynamic activity during the detection of
these oddball targets relative to the nontarget stimuli, including the auditory cortex bilat
erally, parietal cortex and prefrontal cortex, supramarginal gyrus, frontal operculum, and
insular cortex bilaterally (Linden et al., 1999; Yoshiura et al., 1999). The increased fMRI
signals for target versus nontarget conditions are consistent over various stimuli (i.e., au
ditory versus visual stimuli) and response modalities (i.e., button pressing for targets ver
sus silently counting the targets) and can be regarded as specific for target detection in
both the auditory modality and the visual modality (Linden et al., 1999). Interestingly, the
amount of activity in the anterior cingulate and bilateral lateral prefrontal cortex, tempo
ral-parietal junction, postcentral gyri, thalamus, and cerebellum was positively correlated
with an increase in time between targets (Stevens et al., 2005). The fact that these effects
were only observed for target and not novel stimuli suggests that the activity in these ar
eas indexes the updating of a working memory template for the target stimuli or strategic
resource allocation processes (Stevens et al., 2005).
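The oddball design itself is simple to sketch: a long train of standards with infrequent, randomly interspersed targets. The 10 percent target probability and the pitch labels below are arbitrary illustrative choices, not parameters from the cited studies:

```python
import random

def oddball_sequence(n_trials: int = 400, p_target: float = 0.1,
                     standard: str = "1000Hz", target: str = "1200Hz",
                     seed: int = 1) -> list[str]:
    """Generate a standard/target sequence for an auditory oddball task.

    Infrequent targets differ from standards along one dimension (here,
    pitch); detecting them typically elicits a parietal-central P300.
    """
    rng = random.Random(seed)
    return [target if rng.random() < p_target else standard
            for _ in range(n_trials)]

seq = oddball_sequence()
print(f"{seq.count('1200Hz')} targets out of {len(seq)} trials")
```

Real designs usually add constraints this sketch omits, for example a minimum number of standards between successive targets, which matters given the finding above that activity scales with the time elapsed between targets.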
Another task that requires sustained attention is the n-back working memory task in
which participants indicate whether the incoming stimulus matches the one occurring
one, two, or three positions earlier. Alain and colleagues (2008) used a variant of this par
adigm and asked their participants to press a button only when the incoming stimulus
sound matched the previous one (1-back) in terms of identity (i.e., same sound category
such as animal sounds, human sounds, or musical instrument sounds) or location. Distinct
patterns of neural activity were observed for sustained attention and transient target-re
lated activity. Figure 11.6 shows sustained task-related activity during sound identity and
sound location processing after taking into account transient target-related activity. The
monitoring of sound attributes recruited many of the areas previously mentioned for the
oddball task, including auditory, parietal, and prefrontal cortices (see also, Martinkauppi
et al., 2000; Ortuno et al., 2002). Interestingly, changes in task instruction modulated ac
tivity in this attentional network, with greater activity in ventral areas, including the ante
rior temporal lobe and inferior frontal gyrus, when participants’ attention was directed
toward sound identity and greater activity in dorsal areas, including the inferior parietal
lobule, superior parietal cortex, and superior frontal gyrus, when participants’ attention
was directed toward sound location.
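The 1-back judgment in this paradigm amounts to comparing each sound with its predecessor on the attended dimension only; the same stream yields different responses under identity and location instructions. A toy sketch, with an invented stimulus encoding and example stream:

```python
from typing import NamedTuple

class Sound(NamedTuple):
    category: str  # e.g., "animal", "human", "instrument"
    location: str  # e.g., "left", "center", "right"

def one_back_hits(stream: list[Sound], attend: str) -> list[int]:
    """Indices where the current sound matches the previous one on the
    attended dimension ('category' for identity, 'location' for space)."""
    return [i for i in range(1, len(stream))
            if getattr(stream[i], attend) == getattr(stream[i - 1], attend)]

stream = [Sound("animal", "left"), Sound("animal", "right"),
          Sound("human", "right"), Sound("human", "right")]

print(one_back_hits(stream, "category"))  # identity matches: [1, 3]
print(one_back_hits(stream, "location"))  # location matches: [2, 3]
```

The point of the sketch is that the stimuli are identical across conditions; only the task instruction changes, which is what lets the imaging contrast isolate attention to identity versus location.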
Although such data again suggest a relationship between specific regions or pathways
within the brain and specific perceptual and cognitive functions, it is important to consider
the extent to which clear delineations can be made. For example, although the inferior
parietal lobule (IPL) is clearly involved in spatial analysis and may play an important role
in monitoring or updating sound source location in working memory, there is also plenty
of evidence demonstrating its involvement in nonspatial processing (see Arnott et al.,
2004). In fact, some of this activity may be accounted for in terms of an action–perception
dissociation (Goodale & Milner, 1992) in which the dorsal auditory pathway brain regions
are important for acting on objects and sounds in the environment (Arnott & Alain, 2011).
For example, in a task requiring listeners to categorize various sounds as being
“material” (i.e., malleable sheets of paper, Styrofoam, aluminium foil, or plastic being
crumpled in a person’s hands), “human” (i.e., nonverbal vocalizations including coughing,
yawning, snoring, and throat clearing) or “noise” (i.e., randomized material sounds),
Arnott et al. (2008) found an increased blood-oxygen-level-dependent (BOLD) effect along a
dorsal region, the left intraparietal sulcus (IPS), in response to the material sounds. A
very similar type of activation was also reported by Lewis and colleagues when listeners
attended to hand-related (i.e., tool) sounds compared with animal vocalizations (Lewis,
2006; Lewis et al., 2005). Both groups proposed that such sounds triggered a “mental
mimicry” of the motor production sequences that most likely would have produced the
sounds, with the left hemispheric activation reflecting the right-handed dominance of the
participants. This explanation finds support in the fact that area (p. 225) hAIP, a region
shown to be active not only during real hand-grasping movements but also during imag
ined movements, as well as during passive observation of people grasping three-dimensional ob
jects, is located at the junction of the anterior IPS and inferior postcentral sul
cus (Culham et al., 2006).
Additionally, it is noteworthy that the IPS is an area of the IPL known to make use of visu
al input, and that it is particularly important for integrating auditory and visual informa
tion (Calvert, 2001; Macaluso et al., 2004; Meienbrock et al., 2007), especially with re
spect to guiding and controlling action in space (e.g., Andersen et al., 1997; Sestieri et al.,
2006). As noted earlier, knowledge about auditory material properties is largely depen
dent on prior visual experiences, at least for normally sighted individuals. Thus, it is plau
sible that the IPS auditory material–property activity reflects the integration of auditory
input with its associated visual knowledge.
The above data remind us that successful interaction with the complex and dynamic
acoustic environment that we ultimately face involves the coordination and integration of
attentional demands from other modalities such as vision, the timely initiation of task-ap
propriate action, and the maintenance of attentional processes over long periods of time.
Nevertheless, there are cases in which less important signals must be sacrificed for more
important signals, and the brain regions associated with selective attention are those to
which we now turn.
Selective Attention
Auditory selective attention was originally studied using dichotic listening situations in
which two (p. 226) different streams of speech sounds were presented simultaneously,
one to each ear (Broadbent, 1962; Cherry, 1953; for a review, see Driver, 2001). In such situa
tions, participants were asked to shadow (repeat) the message presented in one ear while
ignoring the speech sounds presented at the irrelevant location (i.e., the other ear). This
intramodal attention (i.e., between streams of sounds) involves sustained attention to a
particular stream of sounds in the midst of others, usually defined by its most salient fea
tures, such as pitch and location (e.g., Hansen & Hillyard, 1983). This form of selective at
tention differs from intermodal attention, whereby participants are presented with
streams of auditory and visual stimuli and alternatively focus on either auditory or visual
stimuli in order to detect infrequent target stimuli. The neural networks supporting in
tramodal and intermodal selective attention are examined next in turn.
Recently, there has been increased interest in examining the effects of attention on audi
tory cortical activity. This interest is motivated in part by the notion that the auditory cor
tex is not a single entity but rather comprises many cortical fields that appear to be dif
ferentially sensitive to sound frequency and sound location, and by the analogous discov
ery of feature-specific enhancement and suppression of visual neurons during visual se
lective attention. Although the effects of attention on neural activity in auditory cortical
areas are not disputed, the effects of attention on the primary auditory cortex remain
equivocal. For instance, some fMRI studies do not find evidence for attention effects on
primary auditory cortex in Heschl’s gyrus (Hill & Miller, 2009; Petkov et al., 2004),
whereas others report enhancements in frequency-sensitive regions, although the atten
tion effects are not necessarily restricted to them (Paltoglou et al., 2009). The lack of at
tention effects on the BOLD signal from the primary cortex does not mean that neural ac
tivity in primary auditory cortex is not modulated by attention; it may be too weakly or
differentially modulated such that the BOLD effect cannot capture it. As we have already
seen, studies that have used other imaging techniques such as EEG or MEG provide evi
dence suggesting that selective attention amplifies neural activity in frequency-sensitive
regions (Woods et al., 1994; Woods & Alain, 2001) as well as in or near primary areas
(Woldorff & Hillyard, 1991; Woldorff et al., 1993). In addition, single-unit research in
mammals has shown that attention can modulate the neural firing rate of neurons in pri
mary auditory cortex (Benson and Hienz, 1978; Hocherman et al., 1976).
Moreover, intramodal selective attention to location (i.e., left or right ear) has been
shown to increase BOLD activity in the right middle frontal gyrus regardless of the loca
tion of attentional focus (Lipschutz et al., 2002). In contrast, brain regions including the
middle and inferior frontal cortex, frontal eye fields (FEFs), and the superior temporal
cortex in the contralateral hemisphere did show attention-related activity according to
which ear was being attended to (Lipschutz et al., 2002). Activation in the superior tem
poral cortex extended through the temporal-parietal junction to the inferior parietal cor
tex, including the IPS (Lipschutz et al., 2002). The activation in the human homologue of
FEFs during auditory spatial attention has been reported in several studies (e.g., Lip
schutz et al., 2002; Tzourio et al., 1997; Zatorre et al., 1999), but the results should be in
terpreted with caution because these studies did not control for eye movements, which
could partly account for the activation in the FEFs. In a more recent study, Tark and Cur
tis (2009) found FEF activity during an audiospatial working memory task, even for sounds
that were presented behind the head, to which it was impossible to make saccades. Their
findings are consistent with the proposal that FEF plays an important role in processing
and maintaining sound location (Arnott & Alain, 2011).
In addition to enhanced activity in cortical areas during intramodal auditory selective at
tention, there is also evidence from fMRI that selective attention (p. 227) modulates activi
ty in the human inferior colliculus (Rinne et al., 2008). The inferior colliculus is a mid
brain nucleus of the ascending auditory pathway with diverse internal and external con
nections. The inferior colliculus also receives descending projections from the auditory
cortex, suggesting that cortical processes affect inferior colliculus operations. Enhanced
fMRI activity in the basal ganglia has also been observed during auditory selective atten
tion to speech sounds (Hill & Miller, 2009). There is also some evidence that selective at
tention may modulate neural activity at the peripheral level via descending projections.
However, the effects of attention on the peripheral and efferent auditory pathways re
main equivocal. For example, although some studies report attention effects on the pe
ripheral auditory systems as measured with evoked otoacoustic emissions (Giard et al.,
1994; Maison et al., 2001), other studies do not (Avan & Bonfils, 1992; Michie et al.,
1996; Timpe-Syverson & Decker, 1999).
et al., 2007). Accordingly, it seems plausible that intermodal attention could theoretically
alter auditory processing at its very earliest stages (i.e., at senso
ry transduction).
Further underscoring the need to consider the interaction of attentional demands be
tween modalities, there is evidence to suggest that during intermodal selective attention
tasks, attention to auditory stimuli may alter visual cortical activity. The attentional repul
sion effect is one example of this. Traditionally understood as a purely visual phenome
non, attentional repulsion refers to the perceived displacement of a vernier stimulus in a
direction that is opposite to that of a brief peripheral visual cue (Suzuki & Cavanagh,
1997). Observers in these behavioral tasks typically judge two vertical lines placed above
and below one another to be offset in a direction that is opposite to the location of a
briefly presented irrelevant visual stimulus. Under the assumption that the repulsion ef
fect exerts its effect in early retinotopic areas (i.e., visual cortex; Pratt & Turk-Browne,
2003; Suzuki & Cavanagh, 1997), Arnott and Goodale (2006) sought to determine
whether peripheral auditory events could also elicit the repulsion effect. In keeping with
the notion that sounds can alter occipital activity, peripheral sounds were also found to
elicit the repulsion effect.
More direct evidence for enhancement of activity in occipital cortex comes from an fMRI
study in which selective attention to sounds was found to activate visual cortex (Cate et
al., 2009). In that particular study, the occipital activations appeared to be specific to at
tended auditory stimuli given that the same sounds failed to produce occipital activation
when they were not being attended to (Cate et al., 2009). Moreover, there is some evi
dence that auditory attention, but not passive exposure to sounds, activates peripheral re
gions of visual cortex when participants attend to sound sources outside the visual
field. Functional connections between auditory cortex and visual cortex subserving the
peripheral visual field appear to underlie the generation of auditory occipital activations
(Cate et al., 2009). This activation may reflect the priming of visual regions to process
soon-to-appear objects associated with unseen sound sources and provides further sup
port for the idea that the auditory “where” subsystem may be in the service of the visual-
motor “where” subsystem (Kubovy & Van Valkenburg, 2001). In fact, the functional over
lap between the auditory cortical spatial network and the visual orientation network is
quite striking, as we have recently shown, suggesting that the auditory spatial network
and visual orienting network share a (p. 228) common neural substrate (Arnott & Alain,
2011). Along the same line, auditory selective attention to speech modulates activity in
the visual word form areas (Yoncheva et al., 2009), suggesting a high level of interaction
between sensory systems even at the relatively early stages of processing.
Throughout this discussion of intermodal attention, one should also keep in mind that
even though auditory information can certainly be obtained in the absence of other senso
ry input (e.g., sounds perceived in the dark, or with the eyes closed), for the vast majority
of neurologically intact individuals, auditory events are typically experienced in
the presence of other sensory (especially visual) input. Thus, there is good reason to ex
pect that in such cases, the neural architecture of auditory processing may be interwoven
with that of other sensory systems, especially in instances in which the two are depen
dent on one another. Once again, neuroimaging data derived from the experience of audi
tory material property information are useful in this regard. Unlike auditory localization
where a sound’s position can be constructed from interaural timing and intensity differ
ences, the acoustic route to the material properties of any given object depends entirely
on previous associations between sound and information from other senses (e.g., hearing
the sound that an object makes as one watches someone or something come into contact
with that object, or hearing the sound that an object makes as one touches it). Building
on research showing that visual material processing appears to be accomplished in ven
tromedial brain areas that include the collateral sulcus and parahippocampal gyrus (Cant
& Goodale, 2007), Arnott and colleagues used fMRI to investigate the brain regions in
volved in auditory material processing (Arnott et al., 2008). Relative to control sounds,
audio recordings of various materials being manipulated in someone’s hands (i.e., paper,
plastic, aluminium foil, and Styrofoam) were found to elicit greater hemodynamic activity
in the medial region of the right parahippocampus both in neurologically intact individu
als and in a cortically blind individual. Most interestingly, a concomitant visual material
experiment in which the sighted participants viewed pictures of objects rendered in dif
ferent materials (e.g., plastic, wood, marble, foil) was found to elicit right parahippocam
pal activity in an area just lateral to the auditory-evoked region. These results fit well
with animal neuroanatomy in that the most medial aspect of the monkey parahippocam
pus (i.e., area TH) has connections with auditory cortex (Blatt et al., 2003; Lavenex et al.,
2004; Suzuki et al., 2003), whereas the region situated immediately adjacent to area TH
(i.e., area TFm in the macaque or TL in the rhesus monkey) has strong inputs from areas
processing visual information, receiving little if any input from auditory areas (Blatt et al.,
2003).
The data from both the intramodal and intermodal attentional literature remind us again
that attention may have a variety of neural expressions, at both relatively early (e.g., oc
cipital) and late (e.g., parahippocampal) stages of processing, and depends to a large ex
tent on specific task demands (Griffiths et al., 2004). In this respect, attention emerges
as a pervasive and powerful influence on brain functioning.
Divided Attention
Divided attention between two concurrent streams of sounds (one in each hemispace) or
between the auditory and visual modalities has been associated with enhanced activi
ty in the precuneus, IPS, FEFs, and middle frontal gyrus compared with focused attention
to either one modality or location (Santangelo et al., 2010). Moreover, Lipschutz et al.
(2002) found comparable activation during selective and divided attention suggestive of a
common neural network. Bimodal divided attention (i.e., attending to auditory and visual
stimuli) has also been associated with enhanced activity in the posterior dorsolateral pre
frontal cortex (Johnson et al., 2007). Importantly, the same area was not significantly ac
tive when participants focused their attention to either visual or auditory stimuli or when
they were passively exposed to bimodal stimuli (Johnson & Zatorre, 2006). The impor
tance of the dorsolateral prefrontal cortex (DLPFC) during bimodal divided attention is
Our ability to attend a particular sound object or sound location is not instantaneous and
may require a number of cognitive operations. We may need to disengage from what we
are doing, switch our attention to a different sensory modality, focus on a different spatial
location or object, and then engage our selection mechanisms. It has been generally es
tablished that focusing attention on a (p. 229) particular input modality, or a particular
“what” or “where” feature, modulates cortical activity such that task-relevant representa
tions are enhanced at the expense of irrelevant ones (Alain et al., 2001b; Degerman et al.,
2006; Johnson & Zatorre, 2005, 2006). Although the object-based model can adequately
account for many findings that involve focusing attention to a task-relevant stream, there
is a growing need to better understand how attention is deployed toward an auditory ob
ject within an auditory scene and whether the mechanisms at play when directing audito
ry attention toward spatial and nonspatial cues also apply when the auditory scene com
prises multiple sound objects. Although substantial research has been carried out on sus
tained and selective attention, fewer studies have examined the deployment of attention,
especially in the auditory modality.
One popular paradigm to assess the deployment of attention consists of presenting an in
formational cue before either a target sound or streams of sounds in which the likely in
coming targets are embedded (Green et al., 2005; Stormer et al., 2009). The mechanisms
involved in the control of attention can be assessed by comparing brain activity during
the cue period (i.e., the interval between the cue and the onset of the target or stream of
sounds) in conditions in which attention is directed to a particular feature (e.g., location
or pitch) that defines either the likely incoming target or the streams of sounds to be at
tended. Such a design has enabled researchers to identify a frontal-parietal network involved in
the deployment of attention to either a particular location (Hill & Miller, 2009; Salmi et
al., 2007a, 2009; Wu et al., 2007) or sound identity (Hill & Miller, 2009). A similar frontal-
parietal network has been reported during attention switching between locations in both
the auditory and visual modalities (Salmi et al., 2007b; Shomstein & Yantis, 2004;
Smith et al., 2010), between sound locations (left vs. right ear), or between sound identities (male vs. fe
male voice) (Shomstein & Yantis, 2006). The network that mediates voluntary control of
auditory spatial and nonspatial attention encompasses several brain areas that vary
among studies and tasks and include, but are not limited to, the inferior and superior
frontal gyrus, dorsal precentral sulcus, IPS, superior parietal lobule, and auditory cortex.
Some of the areas (e.g., anterior cingulate, FEFs, superior parietal lobule) involved in ori
enting auditory spatial attention are similar to those observed during the orientation of
visual spatial attention, suggesting that control of spatial attention may be supported by a
combination of supramodal and modality- specific brain mechanism (Wu et al., 2007). The
activations in this network vary as a function of the feature to be attended, with location
recruiting the parietal cortex to a greater extent and attention to pitch recruiting the infe
rior frontal gyrus. These findings parallel those from studies of auditory working memory for sound
location and sound identity (Alain et al., 2008; Arnott et al., 2005; Rama et al., 2004).
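The analysis logic of these cueing designs, contrasting activity in the interval between cue and target across attend-to-location and attend-to-pitch conditions, reduces to simple epoch bookkeeping. A sketch with hypothetical event codes and timings (none of these labels or numbers come from the cited studies):

```python
from dataclasses import dataclass

@dataclass
class Trial:
    cue: str            # e.g., "attend-location" or "attend-pitch"
    cue_onset: float    # seconds into the recording
    target_onset: float

def cue_period_epochs(trials: list[Trial], cue_type: str) -> list[tuple[float, float]]:
    """Return (start, end) windows between cue and target onset for one
    cue type; averaging brain activity within these windows isolates
    preparatory (deployment-of-attention) activity for that feature."""
    return [(t.cue_onset, t.target_onset)
            for t in trials if t.cue == cue_type]

trials = [Trial("attend-location", 1.0, 2.5),
          Trial("attend-pitch", 5.0, 6.5),
          Trial("attend-location", 9.0, 10.5)]

print(cue_period_epochs(trials, "attend-location"))
# [(1.0, 2.5), (9.0, 10.5)]
```

Because no target has yet appeared within these windows, any difference between the attend-location and attend-pitch epochs reflects attentional control rather than target processing, which is what allows the frontal-parietal deployment network to be identified.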
Additionally, the neural networks involved in the endogenous control of attention differ
from those engaged by salient auditory oddball or novel stimuli designed to capture a
participant’s attention in an exogenous fashion. Salmi et al. (2009) used fMRI to measure brain activity elicited by infrequently occurring loudness deviant tones (LDTs) while participants were told to focus their attention on one auditory stream (e.g., the left ear) and to ignore sounds presented to the other ear (i.e., the right ear). The LDTs occurred in both streams and served as a means of assessing involuntary (i.e., bottom-up) attentional capture. The authors found impaired performance when targets were preceded by LDTs in the unattended location, an effect that coincided with enhanced activity in the ventromedial prefrontal cortex (VMPFC), possibly reflecting evaluation of the distracting event
(Figure 11.7). Together, these fMRI studies reveal a complex neural network involved in
the deployment of auditory attention. In a recent study, Gamble and Luck (2011)
measured auditory ERPs while listeners were presented with two clearly distinguishable
sound objects occurring in the left and right hemispace simultaneously. Participants indicated whether a predefined target was present or absent. They found an increased negativity between 200 and 400 ms that was maximal at anterior electrodes contralateral to the target location and was followed by a posterior contralateral positivity. These results suggest that auditory attention can be deployed quickly to a sound object's location. (p. 230) More importantly, these findings suggest that scalp recordings of ERPs
may provide a useful tool for studying the deployment of auditory attention in real-life sit
uations in which multiple sound objects are simultaneously present in the environment.
This is an important issue to address given that auditory perception often occurs in a
densely cluttered, rapidly changing acoustic environment, where multiple sound objects
compete for attention.
Varieties of Auditory Attention
Future Directions
Over the past decade, we have seen a significant increase in research activity regarding
the mechanisms supporting the varieties of auditory attention. Attention to auditory mate
rial engages a broadly distributed neural network that varies as a function of task de
mands, including selective, divided, and sustained attention. An important goal for future
research will be to clarify the role of auditory cortical areas as well as those beyond auditory cortices (p. 231) (e.g., parietal cortex) in auditory attention. This may require a combination of neuroimaging and neurostimulation techniques such as EEG, fMRI, and TMS, as well as animal studies using microstimulation and selective-deactivation (e.g., cooling) techniques combined
with behavioral measures.
Current research using relatively simple sounds (e.g., pure tones) suggests that selective
attention may involve facilitation and suppression of task-relevant and task-irrelevant
stimuli, respectively. However, further research is needed to determine the extent to
which attentional mechanisms derived from paradigms using relatively simple stimuli ac
count for the processes involved in more complex and realistic listening situations often
illustrated using the cocktail party example. Speech is a highly familiar stimulus, and our
auditory system has had the opportunity to learn about speech-specific properties (e.g.,
f0, formant transitions) that may assist listeners while they selectively attend to speech
stimuli (Rossi-Katz & Arehart, 2009). For instance, speech sounds activate schemata that may interact with more primitive mechanisms, thereby influencing which incoming acoustic data are perceptually organized and selected for further processing. It is unlikely that such
schemata play an equivalent role in the processing of pure tones, so the relative weight of bottom-up and top-down contributions to the deployment of attention may differ according to the naturalness of the auditory environment used. Lastly, spoken communication is a multimodal and highly interactive process whereby visual input can help
listeners identify speech in noise and can also influence what is heard. Hence, it is also
important to examine the role of visual information during selective attention to speech
sounds.
In short, it is clear that auditory attention plays a central (and perhaps even primary) (p. 232) role in guiding our interaction with the external world. However, in reviewing the
literature, we have noted how auditory attention is intimately connected with other fun
damental issues such as multimodal integration, the relationship between perception and
action-based processing, and how mental representations are maintained across both
space and time. In advancing the field, it will be important not to ignore the complexity of the problem, so that our understanding of the neural substrates that underlie auditory attention reflects this core mechanism at its most ecologically valid expression.
References
Ahveninen, J., Hämäläinen, M., Jääskeläinen, I. P., Ahlfors, S. P., Huang, S., Lin, F. H., Raij,
T., Sams, M., Vasios, C. E., & Belliveau, J. W. (2011). Attention-driven auditory cortex
short-term plasticity helps segregate relevant sounds from noise. Proceedings of the Na
tional Academy of Sciences U S A, 108, 4182–4187.
Alain, C., Achim, A., & Richer, F. (1993). Perceptual context and the selective attention ef
fect on auditory event-related brain potentials. Psychophysiology, 30, 572–580.
Alain, C., & Arnott, S. R. (2000). Selectively attending to auditory objects. Frontiers in
Bioscience, 5, D202–D212.
Alain, C., Arnott, S. R., Hevenor, S., Graham, S., & Grady, C. L. (2001). “What” and
“where” in the human auditory system. Proceedings of the National Academy of Sciences
U S A, 98, 12301–12306.
Alain, C., Arnott, S. R., & Picton, T. W. (2001). Bottom-up and top-down influences (p. 233) on auditory scene analysis: Evidence from event-related brain potentials. Journal of Experimental Psychology: Human Perception and Performance, 27, 1072–1089.
Alain, C., & Bernstein, L. J. (2008). From sounds to meaning: The role of attention during
auditory scene analysis. Current Opinion in Otolaryngology & Head and Neck Surgery,
16, 485–489.
Alain, C., He, Y., & Grady, C. (2008). The contribution of the inferior parietal lobe to audi
tory spatial working memory. Journal of Cognitive Neuroscience, 20, 285–295.
Alain, C., & Izenberg, A. (2003). Effects of attentional load on auditory scene analysis.
Journal of Cognitive Neuroscience, 15, 1063–1073.
Alain, C., Schuler, B. M., & McDonald, K. L. (2002). Neural activity associated with distin
guishing concurrent auditory objects. Journal of the Acoustical Society of America, 111,
990–995.
Alain, C., & Woods, D. L. (1993). Distractor clustering enhances detection speed and ac
curacy during selective listening. Perception & Psychophysics, 54, 509–514.
Alain, C., & Woods, D. L. (1994). Signal clustering modulates auditory cortical activity in
humans. Perception & Psychophysics, 56, 501–516.
Alain, C., & Woods, D. L. (1997). Attention modulates auditory pattern memory as indexed
by event-related brain potentials. Psychophysiology, 34, 534–546.
Alho, K., Tottola, K., Reinikainen, K., Sams, M., & Näätänen, R. (1987). Brain mechanism
of selective listening reflected by event-related potentials. Electroencephalography and
Clinical Neurophysiology, 68, 458–470.
Andersen, R. A., Snyder, L. H., Bradley, D. C., & Xing, J. (1997). Multiple representation of
space in the posterior parietal cortex and its use in planning movements. Annual Review
of Neuroscience, 20, 303–330.
Arnott, S. R., & Alain, C. (2002a). Effects of perceptual context on event-related brain po
tentials during auditory spatial attention. Psychophysiology, 39, 625–632.
Arnott, S. R., & Alain, C. (2002b). Stepping out of the spotlight: MMN attenuation as a
function of distance from the attended location. NeuroReport, 13, 2209–2212.
Arnott, S. R., & Alain, C. (2011). The auditory dorsal pathway: Orienting vision. Neuro
science and Biobehavioral Reviews, 35 (10), 2162–2173.
Arnott, S. R., Binns, M. A., Grady, C. L., & Alain, C. (2004). Assessing the auditory dual-
pathway model in humans. NeuroImage, 22, 401–408.
Arnott, S. R., Cant, J. S., Dutton, G. N., & Goodale, M. A. (2008). Crinkling and crumpling:
An auditory fMRI study of material properties. NeuroImage, 43, 368–378.
Arnott, S. R., Grady, C. L., Hevenor, S. J., Graham, S., & Alain, C. (2005). The functional
organization of auditory working memory as revealed by fMRI. Journal of Cognitive Neu
roscience, 17, 819–831.
Arnott, S. R., & Goodale, M. A. (2006). Distorting visual space with sound. Vision Re
search, 46, 1553–1558.
Avan, P., & Bonfils, P. (1992). Analysis of possible interactions of an attentional task with
cochlear micromechanics. Hearing Research, 57, 269–275.
Baylis, G. C., & Driver, J. (1993). Visual attention and objects: Evidence for hierarchical
coding of location. Journal of Experimental Psychology: Human Perception and Perfor
mance, 19, 451–470.
Benson, D. A., & Hienz, R. D. (1978). Single-unit activity in the auditory cortex of mon
keys selectively attending left vs. right ear stimuli. Brain Research, 159, 307–320.
Blatt, G. J., Pandya, D. N., & Rosene, D. L. (2003). Parcellation of cortical afferents to
three distinct sectors in the parahippocampal gyrus of the rhesus monkey: An anatomical
and neurophysiological study. Journal of Comparative Neurology, 466, 161–179.
Brefczynski, J. A., & DeYoe, E. A. (1999). A physiological correlate of the “spotlight” of vi
sual attention. Nature Neuroscience, 2, 370–374.
Bregman, A. S., & Rudnicky, A. I. (1975). Auditory segregation: Stream or streams? Jour
nal of Experimental Psychology: Human Perception and Performance, 1, 263–267.
Broadbent, D. E. (1962). Attention and the perception of speech. Scientific American, 206,
143–151.
Calvert, G. A. (2001). Crossmodal processing in the human brain: Insights from functional
neuroimaging studies. Cerebral Cortex, 11, 1110–1123.
Cant, J. S., & Goodale, M. A. (2007). Attention to form or surface properties modulates dif
ferent regions of human occipitotemporal cortex. Cerebral Cortex, 17, 713–731.
Carlyon, R. P. (2004). How the brain separates sounds. Trends in Cognitive Sciences, 8,
465–471.
Carlyon, R. P., Cusack, R., Foxton, J. M., & Robertson, I. H. (2001). Effects of attention
and unilateral neglect on auditory stream segregation. Journal of Experimental Psycholo
gy: Human Perception and Performance, 27, 115–127.
Cate, A. D., Herron, T. J., Yund, E. W., Stecker, G. C., Rinne, T., Kang, X., Petkov, C. I., Dis
brow, E. A., & Woods, D. L. (2009). Auditory attention activates peripheral visual cortex.
PLoS One, 4, e4645.
Chen, Z., & Cave, K. R. (2008). Object-based attention with endogenous cuing and posi
tional certainty. Perception & Psychophysics, 70, 1435–1443.
Cherry, E. C. (1953). Some experiments on the recognition of speech with one and with
two ears. Journal of the Acoustical Society of America, 25, 975–979.
Cowan, N. (1988). Evolving conceptions of memory storage, selective attention, and their
mutual constraints within the human information-processing system. Psychological Bul
letin, 104, 163–191.
Cowan, N. (1993). Activation, attention, and short-term memory. Memory & Cognition, 21,
162–167.
Culham, J. C., Cavina-Pratesi, C., & Singhal, A. (2006). The role of parietal cortex in visuomotor control: What have we learned from neuroimaging? Neuropsychologia, 44, 2668–2684.
Cusack, R., Deeks, J., Aikman, G., & Carlyon, R. P. (2004). Effects of location, frequency
region, and time course of selective attention on auditory scene analysis. Journal of Ex
perimental Psychology: Human Perception and Performance, 30, 643–656.
Degerman, A., Rinne, T., Salmi, J., Salonen, O., & Alho, K. (2006). Selective attention to
sound location or pitch studied with fMRI. Brain Research, 1077, 123–134.
Degerman, A., Rinne, T., Sarkka, A. K., Salmi, J., & Alho, K. (2008). Selective attention to
sound location or pitch studied with event-related brain potentials and magnetic fields.
European Journal of Neuroscience, 27, 3329–3341.
Delano, P. H., Elgueda, D., Hamame, C. M., & Robles, L. (2007). Selective attention to vi
sual stimuli reduces cochlear sensitivity in chinchillas. Journal of Neuroscience, 27, 4146–
4153.
Deouell, L. Y., Deutsch, D., Scabini, D., & Knight, R. T. (2008). No disillusions in auditory extinction: Perceiving a melody comprised of unperceived notes. Frontiers in Human Neuroscience, 1, 1–6.
Deutsch, D. (1975). Two-channel listening to musical scales. Journal of the Acoustical So
ciety of America, 57, 1156–1160.
Driver, J. (2001). A selective review of selective attention research from the past century.
British Journal of Psychology, 92, 53–78.
Driver, J., & Baylis, G. C. (1989). Movement and visual attention: The spotlight metaphor
breaks down. Journal of Experimental Psychology: Human Perception and Performance,
15, 448–456.
Duncan, J. (1984). Selective attention and the organization of visual information. Journal
of Experimental Psychology: General, 113, 501–517.
Dyson, B. J., & Alain, C. (2004). Representation of concurrent acoustic objects in primary
auditory cortex. Journal of the Acoustical Society of America, 115, 280–288.
Dyson, B. J., Alain, C., & He, Y. (2005). Effects of visual attentional load on low-level audi
tory scene analysis. Cognitive, Affective, & Behavioral Neuroscience, 5, 319–338.
Dyson, B. J., & Ishfaq, F. (2008). Auditory memory can be object-based. Psychonomic Bul
letin & Review, 15, 409–412.
Egly, R., Driver, J., & Rafal, R. D. (1994). Shifting visual attention between objects and locations: Evidence from normal and parietal lesion subjects. Journal of Experimental Psychology: General, 123, 161–177.
Gamble, M. L., & Luck, S. J. (2011). N2ac: An ERP component associated with the focus
ing of attention within an auditory scene. Psychophysiology, 48, 1057–1068.
Giard, M. H., Collet, L., Bouchet, P., & Pernier, J. (1994). Auditory selective attention in
the human cochlea. Brain Research, 633, 353–356.
Giard, M. H., Fort, A., Mouchetant-Rostaing, Y., & Pernier, J. (2000). Neurophysiological
mechanisms of auditory selective attention in humans. Frontiers in Bioscience, 5, D84–
D94.
Goodale, M. A., & Milner, A. D. (1992). Separate visual pathways for perception and ac
tion. Trends in Neurosciences, 15, 20–25.
Green, J. J., Teder-Salejarvi, W. A., & McDonald, J. J. (2005). Control mechanisms mediat
ing shifts of attention in auditory and visual space: A spatio-temporal ERP analysis. Exper
imental Brain Research, 166, 358–369.
Griffiths, T. D., Warren, J. D., Scott, S. K., Nelken, I., & King, A. J. (2004). Cortical process
ing of complex sound: A way forward? Trends in Neurosciences, 27, 181–185.
Grill-Spector, K., Henson, R., & Martin, A. (2006). Repetition and the brain: Neural mod
els of stimulus-specific effects. Trends in Cognitive Sciences, 10, 14–23.
Gutschalk, A., Micheyl, C., Melcher, J. R., Rupp, A., Scherg, M., & Oxenham, A. J. (2005).
Neuromagnetic correlates of streaming in human auditory cortex. Journal of Neuro
science, 25, 5382–5388.
Hansen, J. C., & Hillyard, S. A. (1980). Endogenous brain potentials associated with selec
tive auditory attention. Electroencephalography and Clinical Neurophysiology, 49, 277–
290.
Hecht, L. N., Abbs, B., & Vecera, S. P. (2008). Auditory object-based attention. Visual Cog
nition, 16, 1109–1115.
Hill, K. T., & Miller, L. M. (2009). Auditory attentional control and selection during cock
tail party listening. Cerebral Cortex, 20, 583–590.
Hillyard, S. A., Hink, R. F., Schwent, V. L., & Picton, T. W. (1973). Electrical signs of selec
tive attention in the human brain. Science, 182, 177–180.
Hillyard, S. A., Vogel, E. K., & Luck, S. J. (1998). Sensory gain control (amplification) as a
mechanism of selective attention: Electrophysiological and neuroimaging evidence. Philo
sophical Transactions of the Royal Society of London, Series B, Biological Sciences, 353,
1257–1270.
Hillyard, S. A., Woldorff, M., Mangun, G. R., & Hansen, J. C. (1987). Mechanisms of early
selective attention in auditory and visual modalities. Electroencephalography and Clinical
Neurophysiology Supplement, 39, 317–324.
Hocherman, S., Benson, D. A., Goldstein, M. H., Jr., Heffner, H. E., & Hienz, R. D. (1976).
Evoked unit activity in auditory cortex of monkeys performing a selective attention task.
Brain Research, 117, 51–68.
Johnson, J. A., Strafella, A. P., & Zatorre, R. J. (2007). The role of the dorsolateral prefrontal cortex in bimodal divided attention: Two transcranial magnetic stimulation studies. Journal of Cognitive Neuroscience, 19, 907–920.
Johnson, J. A., & Zatorre, R. J. (2005). Attention to simultaneous unrelated auditory and visual events: Behavioral and neural correlates. Cerebral Cortex, 15, 1609–1620.
Johnson, J. A., & Zatorre, R. J. (2006). Neural substrates for dividing and focusing (p. 234) attention between simultaneous auditory and visual events. NeuroImage, 31, 1673–1681.
Jones, M. R. (1976). Time, our lost dimension: Toward a new theory of perception, atten
tion, and memory. Psychological Review, 83, 323–355.
Jones, M. R., & Boltz, M. (1989). Dynamic attending and responses to time. Psychological
Review, 96, 459–491.
Kauramäki, J., Jääskeläinen, I. P., & Sams, M. (2007). Selective attention increases both
gain and feature selectivity of the human auditory cortex. PLoS One, 2, e909.
Kawashima, R., Imaizumi, S., Mori, K., Okada, K., Goto, R., Kiritani, S., Ogawa, A., &
Fukuda, H. (1999). Selective visual and auditory attention toward utterances: A PET
study. NeuroImage, 10, 209–215.
Kerlin, J. R., Shahin, A. J., & Miller, L. M. (2010). Attentional gain control of ongoing corti
cal speech representations in a “cocktail party.” Journal of Neuroscience, 30, 620–628.
Kubovy, M., & Van Valkenburg, D. (2001). Auditory and visual objects. Cognition, 80, 97–
126.
LaBerge, D. (1983). Spatial extent of attention to letters and words. Journal of Experimen
tal Psychology: Human Perception and Performance, 9, 371–379.
Lavenex, P., Suzuki, W. A., & Amaral, D. G. (2004). Perirhinal and parahippocampal cor
tices of the macaque monkey: Intrinsic projections and interconnections. Journal of Com
parative Neurology, 472, 371–394.
Leung, A. W., & Alain, C. (2011). Working memory load modulates the auditory “What”
and “Where” neural networks. NeuroImage, 55, 1260–1269.
Lewis, J. W. (2006). Cortical networks related to human use of tools. Neuroscientist, 12,
211–231.
Lewis, J. W., Brefczynski, J. A., Phinney, R. E., Janik, J. J., & DeYoe, E. A. (2005). Distinct
cortical pathways for processing tool versus animal sounds. Journal of Neuroscience, 25,
5148–5158.
Linden, D. E., Prvulovic, D., Formisano, E., Vollinger, M., Zanella, F. E., Goebel, R., &
Dierks, T. (1999). The functional neuroanatomy of target detection: An fMRI study of visu
al and auditory oddball tasks. Cerebral Cortex, 9, 815–823.
Lipschutz, B., Kolinsky, R., Damhaut, P., Wikler, D., & Goldman, S. (2002). Attention-de
pendent changes of activation and connectivity in dichotic listening. NeuroImage, 17,
643–656.
Macaluso, E., George, N., Dolan, R., Spence, C., & Driver, J. (2004). Spatial and temporal
factors during processing of audiovisual speech: A PET study. NeuroImage, 21, 725–732.
Macken, W. J., Tremblay, S., Houghton, R. J., Nicholls, A. P., & Jones, D. M. (2003). Does
auditory streaming require attention? Evidence from attentional selectivity in short-term memory. Journal of Experimental Psychology: Human Perception and Performance, 29,
43–51.
Maeder, P. P., Meuli, R. A., Adriani, M., Bellmann, A., Fornari, E., Thiran, J. P., Pittet, A., & Clarke, S. (2001). Distinct pathways involved in sound recognition and localization: A human fMRI study. NeuroImage, 14, 802–816.
Maison, S., Micheyl, C., & Collet, L. (2001). Influence of focused auditory attention on
cochlear activity in humans. Psychophysiology, 38, 35–40.
Martinkauppi, S., Rama, P., Aronen, H. J., Korvenoja, A., & Carlson, S. (2000). Working
memory of auditory localization. Cerebral Cortex, 10, 889–898.
McAdams, S., & Bertoncini, J. (1997). Organization and discrimination of repeating sound
sequences by newborn infants. Journal of the Acoustical Society of America, 102, 2945–
2953.
McMains, S. A., & Somers, D. C. (2004). Multiple spotlights of attentional selection in hu
man visual cortex. Neuron, 42, 677–686.
Meienbrock, A., Naumer, M. J., Doehrmann, O., Singer, W., & Muckli, L. (2007). Retino
topic effects during spatial audiovisual integration. Neuropsychologia, 45, 531–539.
Michie, P. T., LePage, E. L., Solowij, N., Haller, M., & Terry, L. (1996). Evoked otoacoustic
emissions and auditory selective attention. Hearing Research, 98, 54–67.
Michie, P. T., Solowij, N., Crawford, J. M., & Glue, L. C. (1993). The effects of between-
source discriminability on attended and unattended auditory ERPs. Psychophysiology, 30,
205–220.
Mondor, T. A., & Terrio, N. A. (1998). Mechanisms of perceptual organization and audito
ry selective attention: The role of pattern structure. Journal of Experimental Psychology:
Human Perception and Performance, 24, 1628–1641.
Moray, N., & O’Brien, T. (1967). Signal-detection theory applied to selective listening.
Journal of the Acoustical Society of America, 42, 765–772.
Münte, T. F., Spring, D. K., Szycik, G. R., & Noesselt, T. (2010). Electrophysiological attention effects in a virtual cocktail-party setting. Brain Research, 1307, 78–88.
Näätänen, R., Gaillard, A. W., & Mäntysalo, S. (1978). Early selective-attention effect on
evoked potential reinterpreted. Acta Psychologica (Amsterdam), 42, 313–329.
O’Craven, K. M., Downing, P. E., & Kanwisher, N. (1999). fMRI evidence for objects as the
units of attentional selection. Nature, 401, 584–587.
Okamoto, H., Stracke, H., Wolters, C. H., Schmael, F., & Pantev, C. (2007). Attention im
proves population-level frequency tuning in human auditory cortex. Journal of Neuro
science, 27, 10383–10390.
Ortuno, F., Ojeda, N., Arbizu, J., Lopez, P., Marti-Climent, J. M., Penuelas, I., & Cervera, S. (2002). Sustained attention in a counting task: Normal performance and functional neuroanatomy. NeuroImage, 17, 411–420.
Paltoglou, A. E., Sumner, C. J., & Hall, D. A. (2009). Examining the role of frequency speci
ficity in the enhancement and suppression of human cortical activity by auditory selective
attention. Hearing Research, 257, 106–118.
Petkov, C. I., Kang, X., Alho, K., Bertrand, O., Yund, E. W., & Woods, D. L. (2004). Atten
tional modulation of human auditory cortex. Nature Neuroscience, 7, 658–663.
Picton, T. W., Alain, C., Otten, L., Ritter, W., & Achim, A. (2000). Mismatch negativity: Dif
ferent water in the same river. Audiology & Neuro-otology, 5, 111–139.
Pratt, J., & Turk-Browne, N. B. (2003). The attentional repulsion effect in perception and
action. Experimental Brain Research, 152, 376–382.
Rama, P., Poremba, A., Sala, J. B., Yee, L., Malloy, M., Mishkin, M., & Courtney, S. M.
(2004). Dissociable functional cortical topographies for working memory maintenance of
voice identity and location. Cerebral Cortex, 14, 768–780.
Rauschecker, J. P., & Scott, S. K. (2009). Maps and streams in the auditory cortex: Nonhu
man primates illuminate human speech processing. Nature Neuroscience, 12, 718–724.
Rauschecker, J. P., & Tian, B. (2000). Mechanisms and streams for processing of (p. 235) “what” and “where” in auditory cortex. Proceedings of the National Academy of Sciences U S A, 97, 11800–11806.
Rimmele, J., Jolsvai, H., & Sussman, E. (2011). Auditory target detection is affected by im
plicit temporal and spatial expectations. Journal of Cognitive Neuroscience, 23, 1136–
1147.
Rinne, T., Balk, M. H., Koistinen, S., Autti, T., Alho, K., & Sams, M. (2008). Auditory selec
tive attention modulates activation of human inferior colliculus. Journal of Neurophysiolo
gy, 100, 3323–3327.
Rinne, T., Kirjavainen, S., Salonen, O., Degerman, A., Kang, X., Woods, D. L., & Alho, K.
(2007). Distributed cortical networks for focused auditory attention and distraction. Neu
roscience Letters, 416, 247–251.
Ross, B., Hillyard, S. A., & Picton, T. W. (2010). Temporal dynamics of selective attention
during dichotic listening. Cerebral Cortex, 20, 1360–1371.
Rossi-Katz, J., & Arehart, K. H. (2009). Message and talker identification in older adults:
Effects of task, distinctiveness of the talkers’ voices, and meaningfulness of the compet
ing message. Journal of Speech, Language, and Hearing Research, 52, 435–453.
Salmi, J., Rinne, T., Degerman, A., & Alho, K. (2007a). Orienting and maintenance of spa
tial attention in audition and vision: An event-related brain potential study. European
Journal of Neuroscience, 25, 3725–3733.
Salmi, J., Rinne, T., Degerman, A., Salonen, O., & Alho, K. (2007b). Orienting and mainte
nance of spatial attention in audition and vision: multimodal and modality-specific brain
activations. Brain Structure and Function, 212, 181–194.
Salmi, J., Rinne, T., Koistinen, S., Salonen, O., & Alho, K. (2009). Brain networks of bot
tom-up triggered and top-down controlled shifting of auditory attention. Brain Research,
1286, 155–164.
Sanders, L. D., & Astheimer, L. B. (2008). Temporally selective attention modulates early
perceptual processing: Event-related potential evidence. Perception & Psychophysics, 70,
732–742.
Santangelo, V., Fagioli, S., & Macaluso, E. (2010). The costs of monitoring simultaneously
two sensory modalities decrease when dividing attention in space. NeuroImage, 49, 2717–
2727.
Sestieri, C., Di Matteo, R., Ferretti, A., Del Gratta, C., Caulo, M., Tartaro, A., Olivetti Belardinelli, M., & Romani, G. L. (2006). “What” versus “where” in the audiovisual domain: An fMRI study. NeuroImage, 33, 672–680.
Shamma, S. A., Elhilali, M., & Micheyl, C. (2011). Temporal coherence and attention in
auditory scene analysis. Trends in Neurosciences, 34, 114–123.
Shams, L., Kamitani, Y., & Shimojo, S. (2000). Illusions. What you see is what you hear.
Nature, 408, 788.
Shomstein, S., & Yantis, S. (2004). Control of attention shifts between vision and audition
in human cortex. Journal of Neuroscience, 24, 10702–10706.
Shomstein, S., & Yantis, S. (2006). Parietal cortex mediates voluntary control of spatial
and nonspatial auditory attention. Journal of Neuroscience, 26, 435–439.
Smith, D. V., Davis, B., Niu, K., Healy, E. W., Bonilha, L., Fridriksson, J., Morgan, P. S., &
Rorden, C. (2010). Spatial attention evokes similar activation patterns for visual and audi
tory stimuli. Journal of Cognitive Neuroscience, 22, 347–361.
Snyder, J. S., Alain, C., & Picton, T. W. (2006). Effects of attention on neuroelectric corre
lates of auditory stream segregation. Journal of Cognitive Neuroscience, 18, 1–13.
Snyder, J. S., Carter, O. L., Hannon, E. E., & Alain, C. (2009). Adaptation reveals multiple
levels of representation in auditory stream segregation. Journal of Experimental Psycholo
gy: Human Perception and Performance, 35, 1232–1244.
Stevens, M. C., Calhoun, V. D., & Kiehl, K. A. (2005). fMRI in an oddball task: Effects of
target-to-target interval. Psychophysiology, 42, 636–642.
Störmer, V. S., Green, J. J., & McDonald, J. J. (2009). Tracking the voluntary control of auditory spatial attention with event-related brain potentials. Psychophysiology, 46, 357–366.
Sussman, E. S., Bregman, A. S., Wang, W. J., & Khan, F. J. (2005). Attentional modulation
of electrophysiological activity in auditory cortex for unattended sounds within multi
stream auditory environments. Cognitive, Affective, and Behavioral Neuroscience, 5, 93–
110.
Sussman, E. S., Horváth, J., Winkler, I., & Orr, M. (2007). The role of attention in the for
mation of auditory streams. Perception & Psychophysics, 69, 136–152.
Suzuki, K., Takei, N., Toyoda, T., Iwata, Y., Hoshino, R., Minabe, Y., & Mori, N. (2003). Au
ditory hallucinations and cognitive impairment in a patient with a lesion restricted to the
hippocampus. Schizophrenia Research, 64, 87–89.
Suzuki, S., & Cavanagh, P. (1997). Focused attention distorts visual space: An attentional
repulsion effect. Journal of Experimental Psychology: Human Perception and Performance,
23, 443–463.
Tark, K. J., & Curtis, C. E. (2009). Persistent neural activity in the human frontal cortex
when maintaining space that is off the map. Nature Neuroscience, 12, 1463–1468.
Trejo, L. J., Ryan-Jones, D. L., & Kramer, A. F. (1995). Attentional modulation of the mis
match negativity elicited by frequency differences between binaurally presented tone
bursts. Psychophysiology, 32, 319–328.
Tzourio, N., Massioui, F. E., Crivello, F., Joliot, M., Renault, B., & Mazoyer, B. (1997).
Functional anatomy of human auditory attention studied with PET. NeuroImage, 5, 63–77.
Valdes-Sosa, M., Bobes, M. A., Rodriguez, V., & Pinilla, T. (1998). Switching attention without shifting the spotlight: Object-based attentional modulation of brain potentials. Journal of Cognitive Neuroscience, 10, 137–151.
Welch, R. B., & Warren, D. H. (1980). Immediate perceptual response to intersensory dis
crepancy. Psychological Bulletin, 88, 638–667.
Winkler, I., Denham, S. L., & Nelken, I. (2009). Modeling the auditory scene: Predictive
regularity representations and perceptual objects. Trends in Cognitive Sciences, 13, 532–
540.
Woldorff, M. G. (1995). Selective listening at fast stimulus rates: So much to hear, so little time. Electroencephalography and Clinical Neurophysiology Supplement, 44, 32–51.
Woldorff, M. G., Gallen, C. C., Hampson, S. A., Hillyard, S. A., Pantev, C., Sobel, D., (p. 236) & Bloom, F. E. (1993). Modulation of early sensory processing in human auditory cortex during auditory selective attention. Proceedings of the National Academy of Sciences U S A, 90, 8722–8726.
Woldorff, M. G., Hackley, S. A., & Hillyard, S. A. (1991). The effects of channel-selective
attention on the mismatch negativity wave elicited by deviant tones. Psychophysiology,
28, 30–42.
Woldorff, M. G., & Hillyard, S. A. (1991). Modulation of early auditory processing during
selective listening to rapidly presented tones. Electroencephalography and Clinical Neu
rophysiology, 79, 170–191.
Woods, D. L., & Alain, C. (1993). Feature processing during high-rate auditory selective
attention. Perception & Psychophysics, 53, 391–402.
Claude Alain
Claude Alain is Senior Scientist and Assistant Director Rotman Research Institute,
Baycrest Centre; Professor Department of Psychology & Institute of Medical
Sciences, University of Toronto.
Stephen R. Arnott
Stephen R. Arnott, Rotman Research Institute, Baycrest Centre for Geriatric Care.
Benjamin J. Dyson
Spatial Attention
Jeffrey R. Nicol
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn
Spatial attention facilitates adaptive interaction in the environment by enhancing the per
ceptual processing associated with selected locations or objects and suppressing process
ing associated with nonselected stimuli. This chapter presents spatial attention as a di
chotomy of visual processes, mechanisms, and neural networks. Specifically, distinctions
are made between space- and object-based models of attentional selection, overt and
covert orienting, reflexive and voluntary control of the allocation of resources, and dorsal
and ventral frontoparietal neural networks. The effects of spatial attention on neurophysi
ological activity in subcortical and visual cortical areas are reviewed, as are the findings
from behavioral studies examining the effects of spatial attention on early (i.e., low-level)
visual perception.
Keywords: space-based selection, object-based selection, reflexive control, voluntary control, covert orienting,
overt orienting, eye movements, neural networks of attention, visual perception
When our eyes open to view the environment that surrounds us, we are immediately con
fronted with a massive amount of information. This situation presents a problem to the vi
sual system because it is limited in the amount of information that it can process, and
hence be aware of, at one time (e.g., Broadbent, 1958). Adaptive interaction with the en
vironment, therefore, requires a neural mechanism that effectively facilitates the selec
tion of behaviorally relevant information for enhanced processing while simultaneously
suppressing the processing associated with irrelevant information. This mechanism,
which permits differential processing of spatially and temporally contiguous sources of in
formation, is referred to as selective attention (e.g., Johnston & Dark, 1986), and it broad
ly describes “those processes that enable an observer to recruit resources for processing
selected aspects of the retinal image more fully than nonselected aspects” (Palmer, 1999,
p. 532). Although a considerable body of research has demonstrated effects of spatial at
tention in the auditory sensory modality (e.g., Spence & Driver, 1994) and cross-modally
(see Spence & Driver, 2004), the majority of research has focused on the visual modality.
Accordingly, the present chapter represents a selective review of seminal and recent
studies that have examined visual spatial attention. Spatial attention is defined here as
the focusing of limited-capacity visual processing resources on a specific location in space
for the purpose of selective perceptual enhancement of the information at that location.
Space-Based Models
Space-based theories of attention posit that space is the primary unit of selection. Gener
al support for space-based approaches comes from studies showing that performance is
negatively affected when a target stimulus is flanked by spatially contiguous distractors
(i.e., within 1 degree of visual angle), but not when distractors are more distant from
the target (Eriksen & Eriksen, 1974; Eriksen & Hoffman, 1972). Common space-based
theories use metaphors such as a spotlight, a zoom lens, or a gradient to describe the
process of attentional selection. According to the spotlight account (e.g., Posner, 1978;
Posner, Snyder, & Davidson, 1980), attention operates like a beam of light that enhances
processing of stimuli that occupy regions that fall within its boundary, and inhibits, or at
tenuates, processing of stimuli at locations outside its boundary. The spotlight metaphor
of attention was developed to account for findings that emerged from Posner and col
leagues’ (Posner, 1978; Posner, Nissen, & Ogden, 1978; Posner et al., 1980) seminal work
using the spatial cueing task.
In a typical spatial cueing task, observers fixate on a centrally presented stimulus and are
presented with a cue that directs attention to a specific spatial location. Following the
cue, a target appears, and observers are required to either detect the onset of the target
as quickly as possible or perform a target discrimination as quickly and as accurately as
possible. Generally, three different cue types are used: valid cues that indicate where the
target will appear, invalid cues that direct attention away from the target location, and
neutral cues that alert the observer to the ensuing target onset in the absence of direc
tional information (see Jonides & Mack, 1984, and Wright, Richard, & McDonald, 1995, in
regard to methodological issues surrounding neutral cues). Cost–benefit analysis of reaction
time (RT) (i.e., the perceptual cost associated with invalid cues is determined by subtracting
neutral RT from mean invalid RT, and the perceptual benefit associated with
valid cues is determined by subtracting valid RT from neutral RT) reveals that, relative to
the neutral cueing condition, targets are detected faster when they appear at the cued lo
cation and slower when they appear at the uncued location (Posner et al., 1978). Percep
tual discriminations are also more accurate for targets that appear at the cued than un
cued location (Posner et al., 1980). The spotlight account of the data forwarded by Posner
and colleagues contends that detection was more efficient at the cued location because
attentional resources were allocated to that location before the onset of the target,
whereas detection was less efficient at the uncued location because of the additional time
it took to shift attention to the actual location of the target.
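The cost–benefit arithmetic can be sketched in a few lines of code. The mean RTs below are invented for illustration only; they are not data from Posner and colleagues' experiments.

```python
# Hypothetical mean reaction times (ms) per cue condition; values are
# illustrative, not data from Posner et al.
mean_rt = {"valid": 250.0, "neutral": 280.0, "invalid": 320.0}

def cueing_benefit(rt):
    """Benefit of a valid cue: neutral RT minus valid RT (positive = faster)."""
    return rt["neutral"] - rt["valid"]

def cueing_cost(rt):
    """Cost of an invalid cue: invalid RT minus neutral RT (positive = slower)."""
    return rt["invalid"] - rt["neutral"]

print(cueing_benefit(mean_rt))  # 30.0 ms benefit at the cued location
print(cueing_cost(mean_rt))     # 40.0 ms cost at the uncued location
```

Positive values for both quantities reproduce the standard pattern: detection is faster at validly cued locations and slower at invalidly cued ones, relative to the neutral baseline.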
An important aspect of Posner and colleagues’ spotlight account is the notion that spatial
attention must disengage from its origin before it can shift to another location (e.g., Pos
ner & Petersen, 1990; Posner, Petersen, Fox, & Raichle, 1988). In fact, they propose that
shifts of spatial attention involve three distinct processes, each of which is associated
with a specific neural substrate. First, an area of the parietal lobe called the temporoparietal
junction (TPJ) activates to permit disengagement of spatial attention from the
current orientation. Next, a midbrain structure called the superior colliculus determines
the target location and initiates the attentional shift toward that location. Finally, an area
of the thalamus called the pulvinar activates to facilitate engagement of attention at the
new spatial location. Perhaps the most convincing evidence in support of the disengage
function comes from research demonstrating that perceptual discriminations improve
when the central fixation stimulus is removed just before the onset of peripherally pre
sented targets (e.g., Mackeben & Nakayama, 1993; Pratt & Nghiem, 2000)—a finding re
ferred to as the “gap effect” (Saslow, 1967).
In contrast to the spotlight metaphor, which assumes that the size of the attentional beam
is fixed, the zoom lens theory (Eriksen & St. James, 1986; Eriksen & Yeh, 1985) suggests
that the spatial extent of attention can be focused or diffuse, depending on the demands
of the task. Support for that claim comes from an experiment by LaBerge (1983) that re
quired observers to categorize five-letter words, the middle letter of five-letter words, or
the middle letter of five-letter nonwords in a speeded RT task. Within each condition, crit
ical probe trials were periodically presented. On those trials, observers categorized a
probe stimulus that appeared in one of the five letter positions (+ signs occupied the re
maining four positions). In the letter conditions, RTs were fastest when the probe ap
peared in the middle (i.e., attended) position, and slowest when it appeared at the
first and fifth positions (i.e., the data formed a V-shaped RT function). In the word condi
tion, however, RT was not affected by the position of the probe (i.e., the RT function was
flat). The findings support the zoom lens theory by demonstrating that the focus of atten
tion can indeed vary in size, in accordance with the demands of the task.
Zoom lens theory also proposes that the resolution of the visual system that is afforded by
spatial attention is determined by the variable scope of the lens: focusing the lens yields
higher resolution within a narrow spatial area, whereas widening the lens yields lower
resolution over a broader spatial area (e.g., Eriksen & Yeh, 1985). Consistent with that
proposal, Eriksen and St. James (1986) showed that RTs to targets increased as a function
of the number of spatial pre-cues presented. In the experiment, between one and four
contiguous locations of an eight-item circular array were spatially cued, followed by the
presentation of the target and distractor items. The results showed that target discrimi
nations became slower as the number of cued items in the display increased, suggesting
that attentional resources are “watered down” as they are expanded over larger spatial
areas (Eriksen & St. James, 1986, p. 235). Two other findings from that study are also
worth noting: the entire array used in the experiment subtended only 1.5 degrees of visu
al angle, so the reported effects emerged when attention was quite focused; and the cue-
to-target stimulus onset asynchrony (SOA) manipulation revealed that performance was
asymptotic beyond 100 ms, suggesting that the zoom lens required approximately that
much time to make adjustments within spatial areas of that size.
The space-based theories discussed above assume that the attentional field is indivisible
(e.g., Eriksen & Yeh, 1985; Posner et al., 1980). Several studies, however, have found evi
dence to the contrary (e.g., Bichot, Cave, & Pashler, 1999; Gobell, Tseng, & Sperling,
2004; Kramer & Hahn, 1995). For example, Kramer and Hahn (1995) presented observers
with displays of two targets with two spatially intervening distractors. On each trial, two
boxes were used to pre-cue the target positions, and observers were required to report
whether the letters that appeared within the boxes were the same or different. To avoid
an attentional capture confound associated with abrupt onsets, targets and distractors
were initially presented as square figure eights, then shortly after the cues were present
ed, segments of the figure eights were removed to produce letters. On half of the trials,
the distractors primed a same response, and on the other half, they primed a different re
sponse. The researchers found that both RT and discrimination accuracy were unaffected
by the presence of the intervening distractors and concluded that “attention can be flexi
bly deployed and maintained on multiple locations” (Kramer & Hahn, 1995, p. 384).
Object-Based Models
The finding that the attentional field can be divided into multiple noncontiguous regions
of space represents a major obstacle for space-based models of attention. According to
object-based models, however, perceptual grouping mechanisms make it possible to di
vide the attentional field across spatially disparate or partially occluded stimuli (Driver &
Baylis, 1989; Duncan, 1984; Kahneman, Treisman, & Gibbs, 1992; Kramer & Jacobson,
1991). Object-based models assert that attentional selection is determined by the number
of objects that are present in the visual field and emphasize the influence of Gestalt
grouping factors on the distribution of attention (Duncan, 1984; Neisser, 1967).
One of the original object-based models contended that attentional selection is a two-
stage process (Neisser, 1967). In the first stage, the visual field is preattentively parsed
into perceptual units (i.e., into objects) according to the Gestalt principles of grouping
(e.g., similarity, proximity). Then, in the second stage, the object is analyzed in detail by
focal attention. Proponents of object-based theories of selection generally argue that once
attention is directed to an object, all parts or features of that object are automatically se
lected regardless of spatial location (e.g., Duncan, 1984; Egly, Rafal, & Driver, 1994; Kah
neman & Treisman, 1984). As such, all parts of the same object are processed in a paral
lel fashion, whereas different objects are processed serially (Treisman, Kahneman, &
Burkell, 1983).
One of the earliest studies to comprehensively examine the idea that both automatic and
voluntary mechanisms can guide the allocation of spatial attention resources was con
ducted by Jonides (1981). He tested the independence of these two mechanisms by pre
senting observers with either a centrally presented directional cue (i.e., an arrow at fixa
tion) or a peripherally presented location cue (i.e., an arrow adjacent to a potential target
position in the periphery) and then asking them to perform a visual search for one of two
targets in a circular array of eight letters. Across three experiments, Jonides showed that
RTs to targets following central cues, but not peripheral cues, are slowed when mental
resources are simultaneously consumed in a working memory task (i.e., holding a sequence
of numbers in mind); cueing effects persist when observers are asked to ignore peripher
al cues, but not when they are asked to ignore central cues; and cueing effects are modu
lated as a function of the relative proportion of central cues that observers expect to be
presented with, but not as a function of the relative proportion of peripheral cues that
they expect to be presented with. Together, the findings were taken as evidence that ex
ogenously and endogenously driven spatial attention, activated by peripheral and central
cues respectively, “differ in the extent to which they engage attention
automatically” (Jonides, 1981, p. 200).
A two-mechanism model of spatial attention was also supported by the results of a study
showing that reflexive and voluntary orientations have different time courses with re
spect to their respective facilitative and inhibitory effects on perception. Muller and Rab
bitt (1989) presented observers with a central or a peripheral cue that directed their at
Page 6 of 30
Spatial Attention
tention to one of four target locations. Following the cue, a target stimulus appeared at
the cued or an uncued location, and observers were required to discriminate whether it
was the same or different from a previously presented stimulus. Peripheral cues, which
activate reflexive orienting, produced maximal facilitation of target discriminations at the
cued location at short (100–175 ms) SOAs and also improved performance at the uncued
location at longer (400–725 ms) SOAs. In contrast, central cues, which activate voluntary
orienting, facilitated performance at the cued location maximally at SOAs between 275
and 400 ms. Based on these results, the researchers proposed that spatial orienting is
composed of a fast-acting mechanism that briefly facilitates, but then inhibits, attentional
processing at peripherally cued locations, and a slower-acting mechanism, activated only
by central cues, that facilitates attentional processing for a more sustained interval
(Müller & Rabbitt, 1989).
It is noteworthy that the results from both studies that were just reviewed are consistent
with the findings from an earlier study by Posner and Cohen (1984) showing that an inhi
bition of return (IOR) effect occurs when spatial attention is directed exogenously in re
sponse to peripheral cues, but not when it is directed endogenously by central cues. In
the study, observers were presented with three placeholder boxes along the horizontal
meridian of the screen. In the peripheral cueing procedure, one of the peripheral boxes
brightened briefly, followed by brief brightening of the center box (i.e., to reorient spatial
attention away from the initially cued location). The central cueing procedure was similar,
except an arrow stimulus appeared in the center box, followed by the brightening of that
box 600 ms later (if no target had appeared in a peripheral box 450 ms following the on
set of the central cue). In line with previous research (e.g., Posner et al., 1978, 1980), pe
ripheral cues and central cues produced reliable cueing effects: target detection was
most efficient when the target appeared at the cued location. However, although the facil
itative effect of the cue persisted across the entire range of cue–target SOAs when cen
tral cues were used, target detection was actually inhibited at the cued location at SOAs
beyond 200 ms when peripheral cues were used. Posner and Cohen (1984) called the ef
fect IOR and suggested that it reflected an evolved mechanism that promoted efficient vi
sual searching of the environment (i.e., by assigning inhibitory tags to previously exam
ined spatial locations). The finding that the IOR effect occurs when attention is controlled
exogenously, but not when it is under endogenous control (but see Lupiáñez et al., 2004
for an exception), clearly supports a two-mechanism model of spatial attention.
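The contrasting time courses described above can be condensed into a toy lookup. This is a qualitative summary of the findings as reported in the text, not a quantitative model; the 200 ms boundary is the one given for Posner and Cohen's peripheral cues, and central-cue facilitation is treated as sustained across the SOA range.

```python
def expected_cueing_effect(cue_type, soa_ms):
    """Qualitative effect at the cued location as a function of cue type and
    cue-target SOA, summarizing the two-mechanism pattern described in the
    text (toy illustration only)."""
    if cue_type == "central":      # voluntary (endogenous) orienting
        return "facilitation"      # persists across the SOA range
    if cue_type == "peripheral":   # reflexive (exogenous) orienting
        # Early facilitation gives way to inhibition of return (IOR)
        # at SOAs beyond roughly 200 ms.
        return "facilitation" if soa_ms <= 200 else "inhibition (IOR)"
    raise ValueError("cue_type must be 'central' or 'peripheral'")
```

The crossover for peripheral cues, absent for central cues, is exactly the dissociation that supports a two-mechanism account.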
Perhaps the strongest evidence supporting the distinction between exogenous and en
dogenous attentional orienting comes from Klein and colleagues’ research (e.g., Briand &
Klein, 1987; Klein, 1994; Klein & Hansen, 1990; also see Funes, Lupiáñez, & Milliken,
2007, for a double dissociation using a spatial Stroop task) showing a double dissocia
tion between the two components of spatial attention. In one study, Briand and Klein
(1987) dissociated the two mechanisms by examining the effect of spatial cueing on the
likelihood that observers would make illusory conjunction errors. Spatial attention was di
rected to the left or right of fixation by peripheral or central cues. In each trial, a pair of
letters appeared at the cued or uncued location and observers were required to report
whether the target letter R was present or absent. The critical comparison concerned
the difference in performance when the target was absent and the letter pair pro
moted an illusory conjunction of the target (i.e., PQ), as opposed to when the target was
absent and the letter pair did not promote an illusory conjunction of the target (i.e., PB).
The conditions were referred to as conjunction and feature search conditions, respective
ly. The results revealed that search type interacted with spatial attention (p. 242) when ex
ogenous peripheral cues were used, but not when endogenous central cues were used.
Thus, the possibility of making an illusory conjunction error impaired performance when
spatial attention was controlled exogenously, but not when it was controlled endogenous
ly. Briand and Klein (1987) concluded that central and peripheral cues engage different
attentional systems, and that feature integration processes are only performed by the ex
ogenously driven component of spatial attention.
In another study, Klein (1994) dissociated the two mechanisms of spatial attention by ex
amining the effect of spatial cueing on nonspatial (i.e., perceptual motor) expectancies.
Covert spatial attention was directed by peripheral or central cues to a box on the left or
right side of fixation, and on each trial the cued or uncued box either increased or de
creased in size. For half the observers an increase in target size was far more likely to oc
cur, and for the other half a decrease in target size was far more likely to occur. The re
sults indicated that nonspatial expectancies interacted with spatial attention when en
dogenous central cues were used, but not when exogenous peripheral cues were used.
Specifically, performance was impaired by the occurrence of an unexpected event at the
uncued location when central cues were employed, whereas the effect of exogenous at
tention on performance was the same for expected and unexpected events. Klein (1994)
concluded that when taken together with evidence from the Briand and Klein (1987)
study, their results demonstrated that exogenously and endogenously controlled attention
recruits qualitatively different attentional mechanisms. In other words, “it is not just
the vehicle, but the passenger, that might differ with endogenous versus exogenous con
trol” (Klein, 1994, p. 169).
Early single-cell recording research showed that programming an eye movement caused
increased firing rates in cells in the superior colliculus whose receptive fields were at the
target location, well before the actual eye movement had begun (Goldberg & Wurtz, 1972).
These findings supported the notion that a relationship exists between eye movements
and spatial attention and that the superior colliculus may be the neural substrate that
manages that relationship. However, since then, several studies have demonstrated that
the perceptual costs and benefits produced by spatial cueing manipulations can occur in
the absence of eye movements (e.g., Bashinski & Bacharach, 1980; Eriksen & Hoffman,
1972; Posner et al., 1978). The fact that the effects of spatial attention can occur in the
absence of eye movements suggests that attention shifts and eye movements are mediat
ed by different neural structures. In fact, Posner (1980) concluded that when considered
together, the behavioral, electrophysiological, and single-cell recording findings are evi
dence that “eliminates the idea that attention and eye movements are identical
systems” (p. 13).
Although spatial attention and eye movements are not identical (Posner, 1980), the oculo
motor readiness hypothesis (OMRH) (Klein, 1980) and premotor theory (Rizzolatti,
Riggio, Dascola, & Umiltà, 1987) nevertheless propose that they are two processes of a
unitary mechanism. Both theories contend that endogenously controlled covert shifts of
attention prepare and facilitate the execution of eye movements to a target location. Or
as Rizzolatti et al. (1987) simply put it: Eye movements follow attention shifts. In this
view, the preparation of an eye movement to a target location is the endogenous orienting
mechanism (Klein, 2004). Thus, covert endogenous spatial attention is simply an intend
ed, but unexecuted, eye movement (Rizzolatti et al., 1987). Critically, the theories predict
that covert shifts of spatial attention facilitate eye movements when they are in the same
direction, and that sensory processing should be enhanced at the location of a pro
grammed eye movement. Despite representing the dominant view concerning the nature
of the relationship between spatial attention and eye movements (Palmer, 1999), the re
search has yielded equivocal results concerning the two critical predictions that emerge
from the OMRH and premotor theory.
Klein (1980; see also Hunt & Kingstone, 2003; Klein & Pontefract, 1994) conducted two
experiments to test the predictions of his OMRH. In one experiment, observers
were presented with a central cue, and then, depending on the type of trial, they either
made an eye movement or a detection response to a target at the cued or uncued location
(i.e., the type of response was determined by the physical characteristics of the target).
Most trials required a detection response, and performance in these trials suggested that
the participants had indeed shifted their attention in response to the cue. However, in
contrast to what would be predicted by the OMRH and premotor theory (Rizzolatti et al.,
1987), eye movements to a target at the cued location were no faster than they were to a
target at the uncued location (Klein, 1980). That result is not consistent with the OMRH
prediction that covert shifts of spatial attention facilitate eye movements when they are in
the same direction. In his second experiment, Klein (1980) instructed some observers to
perform a leftward eye movement on every trial and others to perform a rightward eye
movement on every trial, regardless of whether the target appeared on the left or right of
central fixation. Critically, on some trials, instead of executing an eye movement, ob
servers simply needed to make a detection response to the target. Again, the results were
inconsistent with the OMRH and premotor theory. Instead of RTs being facilitated when
detection targets appeared in the location of the prepared eye movement, as the OMRH
and premotor theory would predict, RTs were unaffected by the spatial compatibility of
the target and the direction of the eye movement. These findings and others (e.g., Hunt &
Kingstone, 2003; Posner, 1980; Remington, 1980) refute the notion that covert spatial at
tention shifts are unexecuted eye movements, and rather suggest that covert spatial at
tention and overt eye movements are independent mechanisms (Klein, 1980; Klein & Pon
tefract, 1994).
The extant research does suggest that covert spatial attention and overt eye movements
are interdependent. For example, the results of a study conducted by Shepherd et al.
(1986) showed that spatial attention and eye movements form an asymmetrical relation
ship, such that an attention shift does not require a corresponding eye movement, but an
eye movement necessarily involves a shift in the focus of attention. They examined covert
and overt attentional orienting independently by manipulating the validity of a central
cue and by instructing observers to either prepare and execute, or prepare and inhibit, an
eye movement to the target location, respectively. In a blocked design, observers were re
quired to remain fixated (i.e., fixate condition) or make an eye movement (i.e., move con
dition) following the presentation of either a valid (80 percent), uninformative (50 per
cent), or invalid (20 percent) central cue, and then respond to the detection of a target at
the cued or uncued location. Not surprisingly, the results showed a significant reduction
in detection RT when the move condition was coupled with the valid spatial cue. Interest
ingly, a benefit to RT was also found in the move condition when the cue was invalid, sug
gesting that eye movements have a more dominant effect on performance than endoge
nously controlled attention (Shepherd et al., 1986). Most germane to the OMRH (Klein,
1980) and premotor theory (Rizzolatti et al., 1987), however, was the large benefit to RT
that was found in the move condition when the cue was uninformative. Accordingly, the
authors concluded that making an eye movement “involves some allocation of attention to
the target position before the movement starts” (Shepherd et al., 1986, p. 486). This
demonstration of the interdependence between spatial attention and eye movements is at
least in partial support of the OMRH (Klein, 1980) and premotor theory (Rizzolatti et al.,
1987).
The findings from a more recent study conducted by Hoffman and Subramaniam (1995)
also support the OMRH (Klein, 1980) and premotor theory (Rizzolatti et al., 1987) by
showing that a close relationship exists between spatial attention and eye movements. In
their first experiment, observers were presented with four placeholders around fixation
(i.e., above and below, and on either side) followed by an uninformative central cue
indicating which placeholder they should prepare an eye movement toward (but providing
no prediction about the target location). Shortly after the directional cue was removed, a
tone was presented to cue observers to initiate the planned eye movement. When the
tone was presented or immediately afterward, a letter was presented in each placeholder
(i.e., three distractors and one target), and observers were required to perform a
two-alternative forced choice regarding the identity of the target. The results revealed that tar
get discriminations were approximately 20 percent better when the target appeared in
the location of the intended eye movement compared with when the target appeared at one of
the uncued locations. Because this finding indicates that observers attended to the loca
tion of the intended eye movement, even though they were aware that the cue did not
validly predict the target location, the researchers concluded that spatial attention
and eye movements are indeed related, such that eye movements to a given target
location are preceded by a shift of spatial attention to that location (Hoffman &
Subramaniam, 1995).
In their second experiment, Hoffman and Subramaniam (1995) dissociated spatial atten
tion and eye movements by requiring observers to prepare eye movements in the same di
rection on each trial, before presenting them with a valid central cue. Thus, on some tri
als, the intended eye movement and the target location matched, and on other trials they
mismatched. The results showed that regardless of whether the cue directed attention to
the target location or not (i.e., whether the cue was valid or not), target discriminations
were highly accurate when the target location and the intended eye movement matched,
and target discriminations were poor when the target location and the intended eye
movement mismatched. The findings from this experiment indicate that spatial attention
cannot be easily directed to a different location than an intended eye movement and sug
gest that an “obligatory” relationship exists between spatial attention and eye movements
such that covert spatial orienting precedes overt orienting (Hoffman & Subramaniam,
1995). Thus, the results of both experiments reported by Hoffman and Subramaniam
(1995; see also Shepherd et al., 1986) support the OMRH (Klein, 1980) and premotor the
ory (Rizzolatti et al., 1987).
Taken together, the studies reviewed above indicate that spatial attention and eye move
ments are interdependent mechanisms: Although a shift of attention can be made in the
absence of an eye movement, an eye movement cannot be executed without a preceding
shift of spatial attention. Moreover, because spatial attention appears to play a critical
role in initiating eye movements, rather than vice versa, it can be concluded that “atten
tion is the primary mechanism of visual selection, with eye movements playing an impor
tant but secondary role” (Palmer, 1999, p. 570).
One subcortical substrate, the superior colliculus, plays a key role in reflexive shifts of at
tention (it is also involved in localization of stimuli and eye movements) (Wright & Ward,
2008). Another area called the frontal eye field (FEF) is particularly important for execut
ing voluntary shifts of attention (Paus, 1996). The critical connection between the FEF
and attention shifts would be expected given the interdependent relationship between
spatial attention and eye movements that was illustrated in the previous section. Some re
searchers suggest that the superior colliculus and the FEF make up a network
that interacts to control the allocation of spatial attention. Specifically, it has been sug
gested that the FEF may serve to inhibit reflexive attention shifts generated by the supe
rior colliculus (cf. Wright & Ward, 2008).
Areas of the thalamus are also important subcortical regions associated with spatial at
tention. One area, called the pulvinar nucleus, is critically involved in covert spatial ori
enting (e.g., Robinson & Petersen, 1992). In a single-cell recording study, Petersen, Robin
son, and Morris (1987) measured activity of neurons in the dorsomedial part of the lateral
pulvinar (Pdm) of trained rhesus monkeys while they fixated centrally and made speeded
responses to targets appearing at peripherally cued or uncued locations. First, the re
searchers confirmed that the Pdm is related to spatial attention, and is independent of
eye movements, by observing enhanced activity in that area when the monkeys covertly
attended to the target location. Having established that the Pdm is related to spatial at
tention, next Petersen et al. examined changes in attentional performance that resulted
from pharmacological alteration of the Pdm by GABA-related drugs (i.e., muscimol, a GA
BA-agonist, and bicuculline, a GABA-antagonist). The results showed that pharmacologi
cal alteration of this part of the brain did in fact alter performance in the spatial cueing
task: Injections of muscimol, which increases inhibition by increasing GABA effectiveness,
impaired the monkey’s ability to execute contralateral attention shifts, and injections of
bicuculline, which decreases inhibition by decreasing GABA effectiveness, facilitated the
monkey’s ability to shift attention to the contralateral field. Given that these modulations
of attentional performance were produced by drug injections to the Pdm, it is reasonable
to conclude that the Pdm is implicated in spatial attention (Petersen et al., 1987). A study
by O’Connor, Fukui, Pinsk, and Kastner (2002) that will be reviewed later in the chapter
shows that activity in another area of the thalamus, called the lateral geniculate nucleus
(LGN), is also modulated by spatial attention.
In addition to the subcortical sources discussed above, multiple cortical sources are criti
cally involved in the allocation of spatial attention resources. A number of different atten
tional control processes involve various areas of the parietal cortex (e.g., Corbetta, 1998;
Kanwisher & Wojciulik, 2000). For example, Yantis, Schwarzbach, Serences, et al. (2002)
used event-related functional magnetic resonance imaging (fMRI) to examine changes in
brain activity that occurred when observers made covert attention shifts between two pe
ripheral target locations. In particular, the researchers were interested in determining
whether the activation of parietal cortex is associated with transient or sustained atten
tional control. The scans revealed that while shifts of spatial attention produced sustained contralateral activation in extrastriate cortex, they produced only transient increases in activation in the posterior parietal cortex (Yantis et al., 2002). Accordingly, the
authors concluded, “activation of the parietal cortex is associated with a discrete signal to
shift spatial attention, and is not the source of a signal to continuously maintain the cur
rent attentive state” (Yantis et al., 2002, p. 995).
Activation of the parietal cortex is associated with top-down attentional control that is
specifically related to the processing of the spatial pre-cue. That was demonstrated in an
event-related fMRI study that Hopfinger, Buonocore, and Mangun (2000) designed to dis
sociate neural activity associated with cue-related attentional control from the selective
sensory processing associated with target perception. Valid central cues were used to di
rect spatial attention to one of the black-and-white checkerboard targets that were pre
sented on either side of fixation. The task required observers to covertly attend to the
cued checkerboard and report whether or not it contained some gray checks. The fMRI
scans revealed that the cues activated a network for voluntary attentional control com
prising the inferior parietal (particularly the intraparietal sulcus [IPS]), superior tempo
ral, and superior frontal cortices. Moreover, the scans showed contralateral cue-related
activation in areas of extrastriate cortex that represented the spatial location of the ensu
ing target. Taken together, these findings indicate that a neural network comprising areas
of parietal, temporal, and frontal cortex is associated with top-down attentional control,
and that this network in turn modulates activity in areas of visual cortex where the target
is expected to appear.
Although one area of the parietal cortex, the IPS, is involved in generating and sustaining voluntary attention toward a cued location in the absence of sensory stimulation (Friedrich, Egly, Rafal, & Beck, 1998; Hopfinger et al., 2000; Yantis et al., 2002), another area, the temporoparietal junction (TPJ), is activated by unexpected stimulus onsets, particularly at unattended locations
(e.g., Serences et al., 2005; Shulman et al., 2003). Thus, the neuroimaging evidence from
the studies reviewed above indicates that several distinct areas of the parietal cortex are
critically involved in a number of components of spatial attention (Serences et al., 2005,
p. 1000).
Abundant findings from neuroimaging research also indicate that the parietal cortex in
terconnects with the frontal cortex to form a complex cortical network for spatial atten
tion (e.g., Corbetta, 1998; Corbetta & Shulman, 2002; Kastner & Ungerleider, 2000).
Positron emission tomography (PET) was used in a classic study by Corbetta et al. (1993)
to investigate the brain areas involved in the voluntary orienting of spatial attention. Spa
tial attention was directed endogenously by informative central cues and by instructions
to covertly shift attention to the most probable target location. The behavioral findings
showed the expected pattern of costs and benefits to RT (i.e., targets were detected
faster at cued, and slower at uncued, locations), confirming the effectiveness of the cue in
orienting attention. The PET scans revealed significant activation of superior (i.e., dorsal)
areas in both the parietal and frontal cortex during performance of the spatial cueing
task.
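The cost-benefit logic used to confirm that a cue oriented attention can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code, and the reaction-time values are invented for the example:

```python
# Hypothetical mean RTs (ms) from a spatial-cueing task; the values are
# illustrative, not data from Corbetta et al. (1993).
mean_rt = {"valid": 310.0, "neutral": 340.0, "invalid": 375.0}

def cueing_effects(rt):
    """Benefit = speed-up at validly cued locations; cost = slow-down at
    invalidly cued locations, each relative to the neutral-cue baseline."""
    benefit = rt["neutral"] - rt["valid"]    # positive => faster when validly cued
    cost = rt["invalid"] - rt["neutral"]     # positive => slower when invalidly cued
    return benefit, cost

benefit, cost = cueing_effects(mean_rt)
print(f"benefit = {benefit:.0f} ms, cost = {cost:.0f} ms")
```

Positive values for both quantities correspond to the "costs and benefits" pattern described above.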
The behavioral studies reviewed in the previous section on the control of spatial attention
provide compelling evidence that it is governed by two independent, and perhaps inter
acting, mechanisms. One mechanism shifts reflexively, provides transient attentional ef
fects on perception, and is under exogenous (i.e., stimulus-driven) control. The other
mechanism is associated with voluntary shifts, provides sustained attentional effects on
perception, and is under endogenous (i.e., goal-directed) control. Moreover, given the as
sertion made by researchers such as Klein (1994; Klein & Shore, 2000) that these two at
tentional mechanisms are specialized to process different types of visual information, one
might reasonably assume that they are each associated with distinct areas of the brain. In
fact, in an influential review of the literature, Corbetta and Shulman (2002) suggested
that spatial attention is (p. 246) associated with two “partially segregated networks of
brain areas that carry out different attentional functions” (p. 201). On one hand, they pro
pose that control of goal-directed (i.e., voluntary) spatial attention is accomplished in a bi
lateral neural network that sends top-down signals from parts of the superior frontal cor
tex (i.e., FEF) to areas of the parietal cortex (i.e., IPS). On the other hand, they propose
that control of stimulus-driven (i.e., reflexive) spatial attention takes place in a right-later
alized neural network involving temporal-parietal cortex (i.e., TPJ) and inferior frontal
cortex. Corbetta and Shulman (2002) refer to these two streams as the dorsal frontopari
etal network and ventral frontoparietal network, respectively.
The dorsal and ventral frontoparietal networks play separate roles in the control of atten
tion. The dorsal network performs attentional processing associated with voluntary con
trol of attention (e.g., Corbetta et al., 2000; Hopfinger et al., 2000; Yantis et al., 2002),
and the ventral network is more involved in attentional processing associated with the on
set and detection of salient and behaviorally relevant stimuli in the environment (Corbet
ta & Shulman, 2002; Marois, Chun, & Gore, 2000). While Corbetta and Shulman (2002)
contend that the two networks are partially segregated, they also posit that the systems
interact. Specifically, they suggest that one of the important functions of the ventral net
work is to interrupt processing in the dorsal stream when a behaviorally relevant stimu
lus is detected (Corbetta & Shulman, 2002). This “circuit breaker” role of the ventral
stream would facilitate the disengagement and reorientation of attention (Corbetta &
Shulman, 2002).
Interaction between the dorsal and ventral frontoparietal networks may also be required
for the spatial localization of salient stimuli. The ventral network is specialized to detect
the onset of salient stimuli, but localizing such stimuli in space probably requires assis
tance from the dorsal network. Consistent with that idea, an influential theory of visual
processing postulated by Ungerleider and Mishkin (1982) contends that the ventral path
way of the visual system determines “what is out there,” whereas the dorsal pathway of
the visual system determines “where it is” (see also Ungerleider & Mishkin, 1992).
Results from a recent study by Fox, Corbetta, Snyder, Vincent, and Raichle (2006)
provided some support for Corbetta and Shulman’s (2002) contention that the exchange
Mangun and Hillyard (1991) investigated the neural bases of the ubiquitous perceptual
benefits of spatial attention (e.g., more efficient and accurate detections and discrimina
tions) using event-related potentials (ERPs) (i.e., changes in the electrophysiological ac
tivity in the brain time-locked to the presentation of an external stimulus). Central cues
were used to direct spatial attention toward or away from subsequently presented periph
eral targets. Observers detected or made choice discriminations about targets that ap
peared at covertly attended or unattended locations while the electrical activity from cor
tical neurons was measured from their scalps. Visual onsets produce predictable early re
sponses over visual cortex called the P1 and N1 waveform components (e.g., Eason, 1981;
Van Voorhis & Hillyard, 1977). The P1 is the first major positive deflection, occurring be
tween 70 and 100 ms after the presentation of a visual stimulus, and the N1 is the first
major negative component that is more broadly distributed and occurs about 150 to 200
ms after the presentation of a visual stimulus (Mangun & Hillyard, 1991). The recordings
revealed that the P1 was larger for attended than unattended targets in both the detec
tion and the discrimination tasks; however, the amplitude of the N1 component only dif
fered for attended and unattended targets in the discrimination task. These findings led
Mangun and Hillyard (1991; Hillyard & Mangun, 1987) to conclude that spatial attention
facilitates a sensory gain control mechanism that enhances processing associated with
sensory signals emanating from attended locations. Interestingly, the observed attention-
related dissociation between the P1 and N1 waveform components has since been repli
cated, and has been interpreted as evidence that the costs and benefits produced by spa
tial cues may reflect the activity of qualitatively different neural mechanisms (cf. Luck,
1995).
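The P1/N1 quantification described above, averaging voltage within fixed post-stimulus latency windows, can be sketched as follows. The waveform here is simulated (the component amplitudes are invented); only the window boundaries follow the chapter's definitions:

```python
import numpy as np

# Simulated single-channel ERP sampled at 1 kHz (1 sample per ms).
# Windows follow the chapter: P1 ~ 70-100 ms (positive), N1 ~ 150-200 ms
# (negative); the Gaussian component shapes and amplitudes are invented.
fs = 1000
t = np.arange(0, 400) / fs * 1000.0         # 0-399 ms post-stimulus
erp = (2.0 * np.exp(-((t - 90.0) ** 2) / (2 * 15.0 ** 2))      # P1-like positivity
       - 3.0 * np.exp(-((t - 175.0) ** 2) / (2 * 25.0 ** 2)))  # N1-like negativity

def mean_amplitude(erp, t, lo_ms, hi_ms):
    """Mean voltage within a post-stimulus latency window (inclusive)."""
    window = (t >= lo_ms) & (t <= hi_ms)
    return erp[window].mean()

p1 = mean_amplitude(erp, t, 70, 100)   # expect a positive value
n1 = mean_amplitude(erp, t, 150, 200)  # expect a negative value
print(p1 > 0, n1 < 0)
```

Comparing such window means across attended and unattended conditions is the arithmetic behind the amplitude differences Mangun and Hillyard reported.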
Similar attention-related increases in neural activity have been shown using fMRI. In a
study by O’Connor et al. (2002), observers were presented with high- or low-contrast
flickering checkerboards on either side of fixation while they were in the scanner. There
were two viewing conditions in the experiment: In the attend condition, observers covert
ly attended to the checkerboard on the left or right of fixation and responded when they
detected random changes in target luminance; in the unattended condition, instead of
shifting attention, observers counted letters that were presented at fixation. The results
showed increased fMRI signal change in the LGN and visual cortex in the attended condi
tion compared with the unattended condition.
In the same study, O’Connor et al. (2002) also investigated the effect of spatial attention
on processing of nonselected, or unattended, information. They assumed that the spread
of spatial attention would be determined (i.e., constrained) by the relative amount of cog
nitive effort that was required to perform a perceptual task at fixation. Specifically, be
cause of the limited resource capacity of the attention system, they predicted that the
amount of processing devoted to an unattended stimulus would be determined by the
amount of resources not consumed by the attended stimulus. To test their prediction, ob
servers were required to perform either a demanding (high-load) or easy (low-load) task
at fixation while ignoring checkerboard stimuli presented in the periphery. As expected, the results showed a decrease in activation across the visual cortex in the demanding condition compared with the easy condition. Thus, activation of the LGN and visual cortex is enhanced when observers voluntarily attend to a peripheral stimulus, and it is attenuated when the same stimulus is voluntarily ignored (O’Connor et al., 2002). The researchers
concluded that spatial attention facilitates visual perception by “enhancing neural re
sponses to an attended stimulus relative to those evoked by the same stimulus when ig
nored” (Kastner, McMains, & Beck, 2009, p. 206).
Spatial attention increases contrast sensitivity (see Carrasco, 2006, for a review).
Cameron, Tai, and Carrasco (2002) examined the effect of covert spatial attention on con
trast sensitivity in an orientation discrimination task. Spatial attention was manipulated
by using an informative (i.e., (p. 248) 100 percent valid) peripheral cue that appeared at
one of eight locations in an imaginary circular array around the fixation point, or a neu
tral cue that appeared at fixation. Following the onset of the cue, observers were briefly
presented with a tilted sine wave grating (i.e., alternating fuzzy black lines in a Gaussian
envelope) at varying stimulus contrasts. The researchers found that observers’ contrast
sensitivity thresholds were lower at the attended than the unattended location. In other
words, the contrast needed for observers to perform the task at a given level of accuracy
was different at attended and unattended locations: They attained a threshold level of
performance in the orientation discrimination task at a lower contrast when targets ap
peared at the attended than the unattended location.
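The threshold comparison described above can be made concrete with a standard psychometric-function sketch. This is an illustrative model, not the authors' fitting procedure; the Weibull parameterization is a common choice in psychophysics, and all parameter values here are invented:

```python
import numpy as np

# Toy psychometric functions for a two-alternative orientation-discrimination
# task. A lower alpha shifts the curve toward lower contrasts, mimicking the
# attended condition in Cameron, Tai, & Carrasco (2002); values are invented.
def accuracy(contrast, alpha, beta=2.0, chance=0.5):
    """Weibull psychometric function: rises from chance toward 1.0."""
    return chance + (1 - chance) * (1 - np.exp(-(contrast / alpha) ** beta))

def threshold(alpha, beta=2.0, chance=0.5, criterion=0.75):
    """Contrast at which accuracy reaches the criterion (inverse Weibull)."""
    p = (criterion - chance) / (1 - chance)
    return alpha * (-np.log(1 - p)) ** (1 / beta)

attended_thr = threshold(alpha=0.04)    # hypothetical attended parameter
unattended_thr = threshold(alpha=0.08)  # hypothetical unattended parameter
print(attended_thr < unattended_thr)    # attention lowers the contrast threshold
```

A lower threshold at the attended location is exactly the pattern Cameron et al. (2002) reported: criterion accuracy is reached at a lower physical contrast.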
Similar findings were reported by Pestilli and Carrasco (2005) in a study that manipulated
spatial attention using pure exogenous cues (i.e., peripheral and uninformative). They
presented a peripheral cue (50 percent valid with two locations) or a neutral cue (i.e., at
fixation) followed by the brief presentation of a tilted sine wave grating on either side of
fixation. Shortly after the removal of the target gratings (which were presented at vary
ing contrasts), a centrally presented stimulus indicated to the observer whether the ori
entation of the left or right target grating was to be reported. Thus, trials were divided in
to three equally probable types: On valid trials the peripheral cue and the response cue
were spatially congruent; on invalid trials the peripheral cue and the response cue were
spatially incongruent; and on neutral trials the response cue, which followed the centrally
presented neutral cue, was equally likely to point to the left or right target. This elegant
experimental design permitted the researchers to evaluate the effect of attention on con
trast sensitivity at cued and uncued locations. The results showed both a benefit and cost
of spatial attention. That is, relative to the neutral condition, contrast sensitivity was en
hanced at the attended location and was impaired at the unattended location. Pestilli and
Carrasco (2005) concluded that there is a processing tradeoff associated with spatial at
tention such that the benefit to perception at the attended location means that fewer cor
tical resources are available for perceptual processing at unattended spatial locations.
An extensive amount of research has demonstrated that spatial attention improves spatial
resolution (e.g., Balz & Hock, 1997; Tsal & Shalev, 1996; Yeshurun & Carrasco, 1998, 1999; and see Carrasco & Yeshurun, 2009, for a review). In a study conducted by Yeshu
run and Carrasco (1999), which used peripheral cues to direct covert spatial attention, it
was shown that performance was better at attended than unattended locations in a vari
ety of spatial resolution tasks that required observers to either localize a spatial gap, dis
criminate between dotted and dashed lines, or determine the direction of vernier line dis
placements (Yeshurun & Carrasco, 1999). In another study (Yeshurun & Carrasco, 1998),
these researchers showed that spatial attention modulates spatial resolution via signal
enhancement (i.e., as opposed to attenuating noise or changing decisional criteria). Ob
servers performed a two-interval forced-choice texture-segregation task in which they re
ported whether the first or second display contained a unique texture patch. On peripher
al cue trials, a cue in one interval validly predicted the location of the target, but not the
interval that contained the target, and a cue in another interval simply appeared at a non
target location. Neutral cues were physically distinct from the peripheral cues, and they
provided no information concerning where the target would be in either interval. Both pe
ripheral and neutral cues were presented at a range of eccentricities from the fovea. In
terestingly, but in line with the researchers’ prediction, performance was better in cued
than neutral trials at all target eccentricities except those at, or immediately adjacent to,
the fovea. According to Yeshurun and Carrasco (1998), this counterintuitive pattern of re
sults occurred because spatial attention caused the already small spatial filters at the
fovea to become so small that their resolution was beyond what the texture segregation
task required. In other words, by enhancing spatial resolution, attention impaired task
performance at central retinal locations. Justifiably, it was concluded that one way in
which attention enhances spatial resolution is via signal enhancement (Yeshurun & Car
rasco, 1998).
Research has also revealed counterintuitive effects of spatial attention on temporal sensi
tivity. Yeshurun and Levy (2003) investigated the effect of spatial attention on temporal
sensitivity using a temporal gap detection task. Observers judged whether a briefly pre
sented target disc was on continuously, or contained a brief offset (i.e., a temporal gap).
Targets appeared following valid peripheral cues or (p. 249) after a physically distinct neu
tral cue that diffused spatial attention across the entire horizontal meridian. The results
indicated that temporal sensitivity was actually worse on valid peripheral cue trials than
on diffuse neutral cue trials. To account for the counterintuitive finding that spatial atten
tion impairs temporal resolution, Yeshurun and Levy (2003) suggested that spatial atten
tion produces an inhibitory interaction that activates the parvocellular visual pathway
and inhibits the magnocellular visual pathway. To the extent that spatial resolution de
pends on processing in the parvocellular pathway and temporal resolution depends on
processing in the magnocellular pathway, an inhibitory interaction of this nature could ex
plain why spatial attention enhances spatial resolution and degrades temporal resolution.
The effect of spatial attention on temporal sensitivity was further investigated in a study by Hein, Rolke, and Ulrich (2006) that employed both peripheral and central cues in separate temporal order judgment (TOJ) tasks. Two target dots, side by side, were presented asynchronously at either the attended or unattended location, and observers reported which
target appeared first. Consistent with the results reported by Yeshurun and Levy (2003),
when peripheral cues were used, TOJ performance was worse at the attended than the
unattended location. However, when central cues were used, performance was more ac
curate at the attended than the unattended location. To account for the qualitative differ
ence in performance across the two cue types, Hein et al. (2006) adhered to the idea that
automatic and voluntary shifts of spatial attention affect different stages of the visual sys
tem (see also Briand & Klein, 1987). Specifically, they suggested that automatic shifts of
attention influence early stages of processing and impair the temporal resolution of the
visual system, whereas voluntary shifts of attention influence processing at higher levels
and improve the temporal resolution of the visual system (Hein et al., 2006).
Spatial attention has a similar effect on the perception of temporal offsets. In a study by
Downing and Treisman (1997), subjects were exogenously cued to one side or the other
of fixation by the transient brightening of one of two target placeholders (there was also
an endogenous component to the cue because it validly predicted the location of the tar
get event on two-thirds of the trials). Following the presentation of the cue, one of two
target dots offset at either the cued or uncued location. When the target offset occurred
at the cued location, observers were faster to respond relative to when the offset oc
curred at the uncued location. That effect of cue validity shows that “attention facilitates
the detection of offsets at least as much as detection of onsets” (Downing & Treisman,
1997, p. 770).
Seemingly at odds with the results indicating that perception of stimulus onsets and off
sets are sped up at spatially attended locations, research has also shown that attention
prolongs perceived duration of briefly presented stimuli (e.g., Enns, Brehaut, & Shore,
1999). Mattes and Ulrich (1998) investigated the effect of attention on perceived duration
by assuming that more attention is allocated to a spatial cue as it becomes increasingly
valid. In a blocked design, observers were presented with central cues that validly pre
dicted the location either on 90 percent, 70 percent, or 50 percent of trials. Observers
were explicitly aware of the cue validity in each block, and their task was to judge
whether the presentation of the target stimulus was of short, medium, or long duration.
As expected, the results indicated that as cue validity increased, so did mean ratings of
perceived duration. Thus, endogenous spatial attention, activated by central cues, pro
longs the perceived duration of a briefly presented target stimulus (Mattes & Ulrich,
1998). That result was (p. 250) subsequently replicated and extended in a study by Enns et
al. (1999) showing that the illusion of prolonged duration was independent of the prior
entry effect. In other words, attended stimuli do not seem to last longer merely because they also
seem to have their onset sooner (Enns et al., 1999).
Although the effects of spatial attention on many dimensions of basic vision have been
well established in the extant research, for centuries attention researchers have contem
plated the effect of spatial attention on the perceived intensity and perceptual clarity of a
stimulus (Helmholtz, 1886; James, 1890; Wundt, 1912). In other words: Does attention al
ter the appearance of a stimulus? Despite the long-standing interest, however, only re
cently have psychophysical procedures been developed to permit attention researchers to
evaluate the question with empirical data (e.g., Carrasco, Ling, & Read, 2004; Liu,
Abrams, & Carrasco, 2009).
Carrasco et al. (2004) approached the issue by investigating the effect of transient atten
tion on perceived contrast. Uninformative peripheral cues were used to direct spatial at
tention to the left or right side of fixation, or a neutral cue was presented at fixation. On
each trial, observers were presented with a sine wave grating on either side of fixation
and were required to perform an orientation discrimination on the target grating that ap
peared higher in contrast. One of the two targets, the standard, was always presented at
a fixed contrast (i.e., near threshold), and the other target, the test, was presented at var
ious contrasts above and below that of the standard. To determine the effect of attention
on perceived contrast, the authors determined the point of subjective equality between
the test and the standard target. The results indicated that when the test target was
cued, it was perceived as being higher in contrast than it really was. In other words,
when the test was cued, its contrast was subjectively perceived as being equal to the
standard even when the actual contrast was lower than the standard. Based on this find
ing, the authors concluded “that attention changes the strength of a stimulus by increas
ing its ‘effective contrast’ or salience” (Carrasco et al., 2004, p. 308; see also Gobell &
Carrasco, 2005).
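The point-of-subjective-equality (PSE) logic used in these appearance studies can be sketched numerically. The data below are invented for illustration (they are not values from Carrasco et al., 2004), and linear interpolation stands in for the full psychometric fit the authors used:

```python
import numpy as np

# Hypothetical appearance data: proportion of trials on which the *test*
# grating was judged higher in contrast than a 22%-contrast standard, as a
# function of the test's physical contrast. Values are invented.
test_contrast = np.array([0.10, 0.14, 0.18, 0.22, 0.26, 0.30])
p_test_higher = np.array([0.05, 0.15, 0.40, 0.62, 0.85, 0.95])  # cued test

def pse(x, p, criterion=0.5):
    """Point of subjective equality: the test contrast at which test and
    standard look equal (test chosen 'higher' half the time), found here by
    linear interpolation along the psychometric function."""
    return float(np.interp(criterion, p, x))

print(f"PSE = {pse(test_contrast, p_test_higher):.3f}")
# A PSE below the 0.22 standard means the cued test looked higher in
# contrast than it physically was, i.e., attention boosted apparent contrast.
```

Comparing the PSE for cued versus uncued test gratings against the standard's physical contrast is what licenses the conclusion that attention alters appearance.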
Recently, Liu et al. (2009) showed that voluntary spatial attention also enhances subjec
tive contrast. They presented observers with valid central directional cues or neutral central cues to orient attention to one of two rapid serial visual presentation (RSVP) streams of letters, in which they were instructed to detect the rare occurrence of a specific target. At
the end of the RSVP stream, a sine wave grating was briefly presented at each location.
On trials when the target was present in the cued or uncued RSVP stream, observers
made one type of response, but if the target was not present, observers reported the ori
entation of the target grating that was higher in contrast. Similar to the Carrasco et al.
(2004) study summarized above, one target grating, the standard, was presented at a
fixed contrast in all trials, and the other target grating, the test, was presented at a vary
ing range of contrasts, above and below that of the standard. The critical finding support
ed what Carrasco et al. (2004) found using peripheral cues: In order for the pairs to be
perceived as subjectively equal in contrast, the test target needed to be presented at a
lower contrast than the standard when it was at the attended location, and needed to be
at a higher contrast than the standard target when it was at the unattended location (Liu
et al., 2009). In sum, automatic and voluntary spatial attention both alter the appearance
of stimuli by enhancing perceived contrast.
Summary
Much of our current knowledge of the visual system conceptualizes it as being, in many
ways, dichotomous (e.g., cells in the central visual pathways are mainly magnocellular
and parvocellular, and visual information is processed in the putative “where/how” dorsal
pathway and the “what” ventral pathway). The research reviewed in the present chapter
presented visual spatial attention in much the same way. Spatial attention is a selective
mechanism; it determines which sensory signals control behavior, and it constrains the
rate of information processing (Maunsell, 2009). Abundant research has attempted to de
termine the units of attentional selection. On the one hand, space-based models generally
liken spatial attention to a mental spotlight (e.g., Eriksen & Eriksen, 1974; Posner, 1978)
that selects and enhances perception of stimuli falling within a region of space. On the
other hand, object-based models assert that spatial attention selects and enhances per
ception of perceptually grouped stimuli, irrespective of spatial overlap, occlusion, or dis
parity (e.g., Duncan, 1984; Neisser, 1967). Rather than one theory or the other being cor
rect, it appears that the attentional system relies on both spatial and grouping factors in
the selection process. Indeed, recent evidence suggests that relatively minor changes to
the stimuli used in a spatial cueing task can promote space- or object-based selection
(Nicol, Watter, Gray, & Shore, 2009).
Control over the allocation of spatial attention resources is also dichotomous. (p. 251) Indeed, exogenously controlled spatial attention is deployed rapidly and automatically in re
sponse to salient stimuli that appear at peripheral locations (e.g., Jonides, 1981; Muller &
Rabbitt, 1989; Nakayama & Mackeben, 1989), whereas endogenously controlled atten
tion is deployed voluntarily based on the interpretation of centrally presented, informa
tive stimuli (e.g., Jonides, 1981; Muller & Rabbitt, 1989; Nakayama & Mackeben, 1989).
In addition to reflecting distinct responses to relevant stimuli, these two attentional con
trol mechanisms probably also process visual information in fundamentally distinct ways
(e.g., Briand & Klein, 1987; Klein, 1994).
Another dichotomy germane to spatial attention concerns overt and covert orienting. Al
though it is clear that covert attention shifts and eye movements are not identical (e.g.,
Posner, 1980), it is uncertain precisely how the two are related. Some evidence argues for
their independence (e.g., Klein & Pontefract, 1994), but according to the premotor theory
(Rizzolatti et al., 1987)—the dominant view concerning the relationship between spatial
attention and eye movements (Palmer, 1999)—the two mechanisms are actually interde
pendent. To be sure, the research indicates that although attention shifts can occur in the
absence of an eye movement, an eye movement cannot be executed before an attentional
shift (e.g., Hoffman & Subramanian, 1995; Klein, 2004; Shepherd et al., 1986).
Spatial attention can either enhance or degrade performance in tasks that measure low-
level visual perception. Spatial attention enhances contrast sensitivity (cf. Carrasco, 2006) and spatial resolution (cf. Carrasco & Yeshurun, 2009), likely by effectively shrink
ing the receptive field size of cells with receptive fields at the attended location (Moran &
Desimone, 1985), or possibly by biasing activity of cells with smaller receptive fields at
the attended location (Yeshurun & Carrasco, 1999). In contrast, spatial attention de
grades temporal sensitivity (Yeshurun & Levy, 2003; but see Hein et al., 2006, and Nicol
et al., 2009, for exceptions). One possible explanation for this counterintuitive finding is
that spatial attention induces an inhibitory interaction that favors parvocellular over mag
nocellular activity (Yeshurun & Levy, 2003).
The final dichotomy of spatial attention reviewed in the present chapter pertained to the
partially segregated, but interacting, dorsal and ventral frontoparietal neural networks.
In their influential model, Corbetta and Shulman (2002) proposed that control of goal-di
rected (i.e., voluntary) spatial attention is accomplished in a pathway comprising dorsal
frontoparietal substrates, and control of stimulus-driven (i.e., automatic) spatial attention
takes place in a right-lateralized pathway comprising ventral frontoparietal areas. Their
model further proposes that the ventral network serves as a “circuit breaker” of the dor
sal stream to facilitate the disengagement and reorientation of attention when a behav
iorally relevant stimulus is detected (Corbetta & Shulman, 2002).
are identical mechanisms, but they are related (Rizzolatti et al., 1987): Although a shift of
spatial attention can be made in the absence of an overt eye movement, eye movements
cannot be executed without a preceding attentional shift.
Throughout this chapter, spatial attention has been presented as a dichotomous neural mechanism. Spatial attention can select either spatial locations (e.g., Posner, 1980) or objects (e.g., Duncan, 1984) for further processing. The neural sources of spatial attention are both subcortical and cortical, the most critical of which are perhaps the two partially separate, but interacting, pathways that make up the dorsal and ventral frontoparietal networks (cf. Corbetta & Shulman, 2002). Finally, the allocation of spatial attention resources can be controlled automatically by stimulus-driven factors or voluntarily by top-down processes, and these two mechanisms of attentional control have distinct effects on perception and behavior (e.g., Jonides, 1981).
References
Balz, G. W., & Hock, H. S. (1997). The effect of attentional spread on spatial resolution. Vision Research, 37, 1499–1510.
Bichot, N. P., Cave, K. R., & Pashler, H. (1999). Visual selection mediated by location: Feature-based selection of noncontiguous locations. Perception & Psychophysics, 61, 403–423.
Briand, K. A., & Klein, R. M. (1987). Is Posner’s “beam” the same as Treisman’s (p. 252) “glue”? On the relation between visual orienting and feature integration theory. Journal of Experimental Psychology: Human Perception and Performance, 13, 228–241.
Cameron, E. L., Tai, J. C., & Carrasco, M. (2002). Covert attention affects the psychometric function of contrast sensitivity. Vision Research, 42, 949–967.
Carrasco, M., Ling, S., & Read, S. (2004). Attention alters appearance. Nature Neuroscience, 7, 308–313.
Corbetta, M. (1998). Frontoparietal cortical networks for directing attention and the eye to visual locations: Identical, independent, or overlapping systems? Proceedings of the National Academy of Sciences USA, 95, 831–838.
Corbetta, M., Akbudak, E., Conturo, T. E., Snyder, A. Z., Ollinger, J. M., Drury, H. A., et al. (1998). A common network of functional areas for attention and eye movements. Neuron, 21, 761–773.
Corbetta, M., Kincade, M. J., Ollinger, J. M., McAvoy, M. P., & Shulman, G. L. (2000). Voluntary orienting is dissociated from target detection in human posterior parietal cortex. Nature Neuroscience, 3, 292–297.
Corbetta, M., Miezin, F. M., Dobmeyer, S., Shulman, G. L., & Petersen, S. E. (1990). Attentional modulation of neural processing of shape, color, and velocity in humans. Science, 248, 1556–1559.
Corbetta, M., Miezin, F. M., Shulman, G. L., & Petersen, S. E. (1993). A PET study of visuospatial attention. Journal of Neuroscience, 13, 1202–1226.
Corbetta, M., & Shulman, G. L. (2002). Control of goal-directed and stimulus-driven attention in the brain. Nature Reviews Neuroscience, 3, 201–215.
Desimone, R., & Duncan, J. (1995). Neural mechanisms of selective visual attention. Annual Review of Neuroscience, 18, 193–222.
Downing, P. E., & Treisman, A. M. (1997). The line-motion illusion: Attention or impletion? Journal of Experimental Psychology: Human Perception and Performance, 23, 768–779.
Driver, J., & Baylis, G. C. (1989). Movement and visual attention: The spotlight metaphor breaks down. Journal of Experimental Psychology: Human Perception and Performance, 15, 448–456.
Duncan, J. (1984). Selective attention and the organization of visual information. Journal of Experimental Psychology: General, 113, 501–517.
Eason, R. G. (1981). Visual evoked potential correlates of early neural filtering during selective attention. Bulletin of the Psychonomic Society, 18, 203–206.
Egly, R., Driver, J., & Rafal, R. D. (1994). Shifting visual attention between objects and locations: Evidence from normal and parietal lesion subjects. Journal of Experimental Psychology: General, 123, 161–177.
Enns, J. T., Brehaut, J. C., & Shore, D. I. (1999). The duration of a brief event in the mind’s eye. Journal of General Psychology, 126, 355–372.
Eriksen, B. A., & Eriksen, C. W. (1974). Effects of noise letters upon the identification of a target letter in a nonsearch task. Perception & Psychophysics, 16, 143–149.
Eriksen, C. W., & Hoffman, J. E. (1972). Some characteristics of selective attention in visual perception determined by vocal reaction time. Perception & Psychophysics, 11, 169–171.
Eriksen, C. W., & St. James, J. D. (1986). Visual attention within and around the field of focal attention: A zoom lens model. Perception & Psychophysics, 40, 225–240.
Eriksen, C. W., & Yeh, Y. (1985). Allocation of attention in the visual field. Journal of Experimental Psychology: Human Perception and Performance, 11, 583–597.
Fox, M. D., Corbetta, M., Snyder, A. Z., Vincent, J. L., & Raichle, M. E. (2006). Spontaneous neuronal activity distinguishes human dorsal and ventral attention systems. Proceedings of the National Academy of Sciences USA, 103, 10046–10051.
Friedrich, F. J., Egly, R., Rafal, R. D., & Beck, D. (1998). Spatial attention deficits in humans: A comparison of superior parietal and temporal-parietal junction lesions. Neuropsychology, 12, 193–207.
Funes, M. J., Lupianez, J., & Milliken, B. (2007). Separate mechanisms recruited by exogenous and endogenous spatial cues: Evidence from the spatial Stroop paradigm. Journal of Experimental Psychology: Human Perception and Performance, 33, 348–362.
Gobell, J., & Carrasco, M. (2005). Attention alters the appearance of spatial frequency and gap size. Psychological Science, 16, 644–651.
Gobell, J. L., Tseng, C. H., & Sperling, G. (2004). The spatial distribution of visual attention. Vision Research, 44, 1273–1296.
Goldberg, M. E., & Wurtz, R. H. (1972). Activity of superior colliculus cells in behaving monkey. I. Visual receptive fields of single neurons. Journal of Neurophysiology, 35, 542–559.
Hein, E., Rolke, B., & Ulrich, R. (2006). Visual attention and temporal discrimination: Differential effects of automatic and voluntary cueing. Visual Cognition, 13, 29–50.
Hillyard, S. A., & Mangun, G. R. (1987). Sensory gating as a physiological mechanism for visual selective attention. In R. Johnson, R. Parasuraman, & J. W. Rohrbaugh (Eds.), Current trends in event-related potential research (pp. 61–67). Amsterdam: Elsevier.
Hoffman, J. E., & Subramanian, B. (1995). The role of visual attention in saccadic eye movements. Perception & Psychophysics, 57, 787–795.
Hopfinger, J. B., Buonocore, M. H., & Mangun, G. R. (2000). The neural mechanisms of top-down attentional control. Nature Neuroscience, 3, 284–291.
Hunt, A., & Kingstone, A. (2003). Covert and overt voluntary attention: Linked or independent? Cognitive Brain Research, 18, 102–115.
Johnston, W. A., & Dark, V. J. (1986). Selective attention. Annual Review of Psychology, 37, 43–75.
Jonides, J. (1981). Voluntary versus automatic control over the mind’s eye’s (p. 253) movement. In J. Long & A. Baddeley (Eds.), Attention and performance IX (pp. 259–276). Hillsdale, NJ: Erlbaum.
Jonides, J., & Mack, R. (1984). On the cost and benefit of cost and benefit. Psychological Bulletin, 96, 29–44.
Kahneman, D., & Treisman, A. (1984). Changing views of attention and automaticity. In R. Parasuraman & D. R. Davies (Eds.), Varieties of attention (pp. 29–62). New York: Academic Press.
Kahneman, D., Treisman, A., & Gibbs, B. J. (1992). The reviewing of object files: Object-specific integration of information. Cognitive Psychology, 24, 175–219.
Kanwisher, N., & Wojciulik, E. (2000). Visual attention: Insights from brain imaging. Nature Reviews Neuroscience, 1, 91–100.
Kastner, S., McMains, S. A., & Beck, D. M. (2009). Mechanisms of selective attention in the human visual system: Evidence from neuroimaging. In M. S. Gazzaniga (Ed.), The cognitive neurosciences (4th ed.). Cambridge, MA: MIT Press.
Kastner, S., & Ungerleider, L. G. (2000). Mechanisms of visual attention in the human cortex. Annual Review of Neuroscience, 23, 315–341.
Klein, R. M. (2004). On the control of visual orienting. In M. Posner (Ed.), Cognitive neuroscience of attention (pp. 29–43). New York: Guilford Press.
Klein, R. M., & Hansen, E. (1990). Chronometric analysis of spotlight failure in endogenous visual orienting. Journal of Experimental Psychology: Human Perception and Performance, 16, 790–801.
Klein, R. M., & Pontefract, A. (1994). Does oculomotor readiness mediate cognitive control of visual attention? Revisited! In C. Umilta (Ed.), Attention and performance XV (pp. 333–350). Hillsdale, NJ: Erlbaum.
Klein, R. M., & Shore, D. I. (2000). Relations among modes of visual orienting. In S. Monsell & J. Driver (Eds.), Attention and performance XVIII (pp. 195–208). Hillsdale, NJ: Erlbaum.
Kramer, A. F., & Hahn, S. (1995). Splitting the beam: Distribution of attention over noncontiguous regions of the visual field. Psychological Science, 6, 381–386.
Kramer, A. F., & Jacobson, A. (1991). Perceptual organization and focused attention: The role of objects and proximity in visual processing. Perception & Psychophysics, 50, 267–284.
LaBerge, D. (1983). Spatial extent of attention to letters and words. Journal of Experimental Psychology: Human Perception and Performance, 9, 371–379.
Liu, T., Abrams, J., & Carrasco, M. (2009). Voluntary attention enhances contrast appearance. Psychological Science, 20, 354–362.
Lupianez, J., Decaix, C., Sieroff, E., Chokron, S., Milliken, B., & Bartolomeo, P. (2004). Independent effects of endogenous and exogenous spatial cueing: Inhibition of return at endogenously attended target locations. Experimental Brain Research, 159, 447–457.
Mackeben, M., & Nakayama, K. (1993). Express attentional shifts. Vision Research, 33, 85–90.
Marois, R., Chun, M. M., & Gore, J. C. (2000). Neural correlates of the attentional blink. Neuron, 28, 299–308.
Mattes, S., & Ulrich, R. (1998). Directed attention prolongs the perceived duration of a brief stimulus. Perception & Psychophysics, 60, 1305–1317.
Maunsell, J. H. R. (2009). The effect of attention on the responses of individual visual neurons. In M. S. Gazzaniga (Ed.), The cognitive neurosciences (4th ed.). Cambridge, MA: MIT Press.
Moran, J., & Desimone, R. (1985). Selective attention gates visual processing in the extrastriate cortex. Science, 229, 782–784.
Muller, H. J., & Rabbitt, P. M. A. (1989). Reflexive and voluntary orienting of visual attention: Time course of activation and resistance to interruption. Journal of Experimental Psychology: Human Perception and Performance, 15, 315–330.
Nakayama, K., & Mackeben, M. (1989). Sustained and transient components of focal visual attention. Vision Research, 29, 1631–1647.
Nicol, J. R., Watter, S., Gray, K., & Shore, D. I. (2009). Object-based perception mediates the effect of exogenous attention on temporal resolution. Visual Cognition, 17, 555–573.
O’Connor, D. H., Fukui, M. M., Pinsk, M. A., & Kastner, S. (2002). Attention modulates responses in the human lateral geniculate nucleus. Nature Neuroscience, 5, 1203–1209.
Paus, T. (1996). Localization and function of the human frontal eye-field: A selective review. Neuropsychologia, 34, 475–483.
Pestilli, F., & Carrasco, M. (2005). Attention enhances contrast sensitivity at cued and impairs it at uncued locations. Vision Research, 45, 1867–1875.
Petersen, S. E., Robinson, D. L., & Morris, D. J. (1987). Contributions of the pulvinar to visual spatial attention. Neuropsychologia, 25, 97–105.
Posner, M. I., & Cohen, Y. (1984). Components of visual orienting. In H. Bouma & D. G. Bouwhuis (Eds.), Attention and performance X (pp. 521–556). Hillsdale, NJ: Erlbaum.
Posner, M. I., Nissen, M. J., & Ogden, W. C. (1978). Attended and unattended processing modes: The role of set for spatial location. In H. L. Pick & J. J. Saltzman (Eds.), Modes of perceiving and processing information (pp. 137–157). Hillsdale, NJ: Erlbaum.
Posner, M. I., & Petersen, S. E. (1990). The attention system of the human brain. Annual Review of Neuroscience, 13, 25–42.
Posner, M. I., Petersen, S. E., Fox, P. T., & Raichle, M. E. (1988). Localization of cognitive operations in the human brain. Science, 240, 1627–1631.
Posner, M. I., Snyder, C. R. R., & Davidson, B. J. (1980). Attention and the detection of signals. Journal of Experimental Psychology: General, 109, 160–174.
Pratt, J., & Nghiem, T. (2000). The role of the gap effect in the orienting of attention: Evidence for attentional shifts. Visual Cognition, 7, 629–644.
Rizzolatti, G., Riggio, L., Dascola, I., & Umilta, C. (1987). Reorienting attention across the horizontal and vertical meridians: Evidence in favour of a premotor theory of attention. Neuropsychologia, 25, 31–40.
Robinson, D. L., & Petersen, S. E. (1992). The pulvinar and visual salience. Trends in Neurosciences, 15, 127–132.
Serences, J. T., Shomstein, S., Leber, A. B., Golay, X., Egeth, H. E., & Yantis, S. (2005). Coordination of voluntary and stimulus-driven attentional control in human cortex. Psychological Science, 16, 214–222.
Shepherd, M., Findlay, J. M., & Hockey, R. J. (1986). The relationship between eye movements and spatial attention. Quarterly Journal of Experimental Psychology, 38A, 475–491.
Shore, D. I., & Spence, C. (2005). Prior entry. In L. Itti, G. Rees, & J. K. Tsotsos (Eds.), Neurobiology of attention (pp. 89–95). Amsterdam: Elsevier.
Shore, D. I., Spence, C., & Klein, R. M. (2001). Visual prior entry. Psychological Science, 12, 205–212.
Shulman, G. L., McAvoy, M. P., Cowan, M. C., Astafiev, S. V., Tansy, A. P., d’Avossa, G., et al. (2003). Quantitative analysis of attention and detection signals during visual search. Journal of Neurophysiology, 90, 3384–3397.
Spence, C., & Driver, J. (1994). Covert spatial orienting in audition: Exogenous and endogenous mechanisms. Journal of Experimental Psychology: Human Perception and Performance, 20, 555–574.
Spence, C., & Driver, J. (2004). Crossmodal space and crossmodal attention. Oxford: Oxford University Press.
Spence, C., Shore, D. I., & Klein, R. M. (2001). Multisensory prior entry. Journal of Experimental Psychology: General, 130, 799–832.
Stelmach, L. B., & Herdman, C. M. (1991). Directed attention and the perception of temporal order. Journal of Experimental Psychology: Human Perception and Performance, 17, 539–550.
Titchener, E. B. (1908). Lectures on the elementary psychology of feeling and attention. New York: Macmillan.
Treisman, A., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97–136.
Treisman, A., Kahneman, D., & Burkell, J. (1983). Perceptual objects and the cost of filtering. Perception & Psychophysics, 33, 527–532.
Tsal, Y., & Shalev, L. (1996). Inattention magnifies perceived length: The attentional receptive field hypothesis. Journal of Experimental Psychology: Human Perception and Performance, 22, 233–243.
Ungerleider, L. G., & Mishkin, M. (1982). Two cortical visual systems. In D. J. Ingle, M. A. Goodale, & R. J. W. Mansfield (Eds.), Analysis of visual behavior (pp. 549–586). Cambridge, MA: MIT Press.
Van Voorhis, S. T., & Hillyard, S. A. (1977). Visual evoked potentials and selective attention to points in space. Perception & Psychophysics, 22, 54–62.
Wright, R. D., Richard, C. M., & McDonald, J. J. (1995). Neutral location cues and cost/benefit analysis of visual attention shifts. Canadian Journal of Experimental Psychology, 49, 540–548.
Wright, R. D., & Ward, L. M. (2008). Orienting of attention. New York: Oxford University Press.
Wundt, W. (1912). An introduction to psychology (R. Pintner, Trans.). London: Allen & Unwin.
Yantis, S., Schwarzbach, J., Serences, J. T., Carlson, R. L., Steinmetz, M. A., Pekar, J. J., & Courtney, S. M. (2002). Transient neural activity in human parietal cortex during spatial attention shifts. Nature Neuroscience, 5, 995–1002.
Yeshurun, Y., & Carrasco, M. (1998). Attention improves or impairs visual performance by enhancing spatial resolution. Nature, 396, 72–75.
Yeshurun, Y., & Carrasco, M. (1999). Spatial attention improves performance in spatial resolution tasks. Vision Research, 39, 293–306.
Yeshurun, Y., & Levy, L. (2003). Transient attention degrades temporal resolution. Psychological Science, 14, 225–231.
Jeffrey R. Nicol
Attention and Action
At every moment, we face choices: Is it time to work, or to play? Should I listen to this lecture, or check my e-mail? Should I pay attention to what my significant other is saying, or do a mental inventory of what work I need to accomplish today? Should I keep my hands on the wheel, or change the radio station? Without any change to the external environment, it is possible to select a subset of these possibilities for further action. The process of selection is called attention, and it operates in many domains, from selecting our higher-level goals, to selecting the sensory information on which we focus, to selecting what actions we perform. This chapter focuses on the relationship between visual attention (selecting visual inputs) and action (selecting and executing movements of the body). As a case study, the authors focus on visual-spatial attention, the act of choosing to attend to a particular location in the visual field, and its relationship to eye-movement control. Visual attention appears to select the targets for eye movements because attention to a location necessarily precedes an eye movement to that location. Moreover, there is a great deal of overlap in the neural mechanisms that control spatial attention and eye movements, and the neural mechanisms that are specialized for spatial attention or eye movements are highly intertwined. This link between spatial attention and eye movements strongly supports the idea that a computational goal of the visual attention system is to select targets for action, and suggests that many of the design properties of the spatial attention system might be optimized for the control of eye movements. Whether this relationship will hold broadly between other forms of attention (e.g., goal selection, auditory selection, tactile selection) and other forms of action (e.g., hand movements and locomotion) is an important topic of contemporary and future research.
Keywords: attention, action, visual attention, visual-spatial attention, eye movements, spatial attention
Let us start with a fairly ordinary situation (for me at least): you are hungry, and you decide to get a snack from the refrigerator. You open the door with your left hand and arm, and you look inside. Because you just went shopping you have a fairly well-stocked refrigerator, so you have a lot of food to choose from. You are thinking that a healthy snack is in order, and decide to look for carrots. How will you find the carrots? Most likely you will
“tune your visual system” to look for orange things, and all of a sudden you will become aware of the location of your orange juice, your carrots, some sharp cheddar cheese, and any other orange things in sight. You know that you keep your juice on the top shelf, and your carrots near the bottom, so you further tune your visual system to look for orange things near the bottom. Now you’ve zeroed in on those carrots and are ready to make your move and grab them. You reach, and with extreme precision you position your hand and fingers in exactly the right shape required to grasp the carrots, and begin snacking.
I occasionally have poured juice into my cereal (by mistake), most likely because (p. 256) the milk and juice containers appeared somewhat similar in shape. But have you ever accidentally grabbed a carton of eggs instead of carrots? Have you ever reached for a bag of carrots as if they had a handle like a gallon of milk? Neither have I, and that is probably because we all have vision and action systems that have, for the most part, selected the visual inputs and the physical actions that meet our current goals. This deceptively simple act of grabbing the bag of carrots in your refrigerator is in fact an extremely impressive behavior that requires the activity of a large network of brain regions (possibly most of them!). These different regions are specialized for setting up your goal states, processing incoming visual information, and executing actions. But none of this would work if the system did not have “selective mechanisms” that enable us to select a subset of visual inputs (the carrots and not the juice), and a subset of possible actions (the carrot-reach and not the juice-reach) that match our current goal state (getting a snack and not a beverage). Not only do we have these selective mechanisms, but they also tend to work in a coordinated and seamless fashion.
The overarching purpose of this chapter is to introduce the concept of attentional selection and the role that it plays in visual perception and action. To begin, the chapter provides a general definition of attention, followed by a specific introduction to visual attention. Next, the link between visual attention and eye movements is reviewed as a case study exploring the relationship between visual attention and action. By taking you through definitions, theoretical considerations, and evidence linking visual attention and action, this chapter is aimed at three specific goals: (1) to provide a nuanced view of attention as a process that operates at multiple levels of cognition, (2) to summarize empirical evidence demonstrating that at least one form of visual attention (visual-spatial attention) interacts with the action system, and (3) to provide a theoretical framework for understanding how and why visual attention and action interact.
A useful working definition is one in which the core aspects of attention are (1) that we can selectively process some stimuli more than others (selection), (2) that we are limited in the number of things we can process simultaneously (capacity limitation), and (3) that sustained processing involves effort or exertion (effort). Such a definition applies to a wide range of cognitive processes. For instance, there are many possible things you can pay attention to in your visual field at this moment. Among other things, I currently have a keyboard, a coffee mug, a bottle of wine, and some knives in sight. The process of selecting a subset of those visual inputs to focus on is called attention. There are also many possible actions you can take at any given moment. I can move my eyes to focus on my coffee mug and pick it up, I can continue typing, or I can get up and find a corkscrew in my kitchen. The process of selecting a subset of those possible actions is also called attention. Finally, which one of these acts of selection I execute depends on my goal state. Do I want to finish my work tonight? What is the most effective approach for accomplishing this task? Should I keep typing, or reach for the coffee mug? Or would it serve me better in the long run to have a glass of wine and listen to some music, and start again tomorrow? What’s on TV? Choosing to focus on one of these trains of thought and set up stable goals is also called attention.
At first it is confusing that we refer to all of these different acts by the same term (attention by James’ definition, and others’). These different acts of selection do not seem like they could possibly be the same thing in the mind and brain. Of course, that’s because they are not the same thing in either. The difficulty here is that we tend to think about the term “attention” as a noun, and that’s the wrong way to think about it. Think of the word “attention” as a verb instead, and it is a little easier to understand how and why the term attention accurately applies to so many situations. Now attention is more like the word “walk,” which can be true of people, rats, and horses. They are all the same in that they all walk, but they are not all the same thing in any physical sense. Likewise, selecting visual inputs, selecting actions, and selecting goals are all forms of attention because they involve a form of selection, but they are not all the same thing. What they share in common is the process of selection: There are many competing representations (visual inputs, action plans, goal states), and attention is the process of (p. 257) selecting a subset of those competing representations. On this view, attention is a process that operates in multiple domains. Thus, whenever you hear the term “attention,” you should ask, “Selection in what domain? What are the competing representations that make selection necessary?” With that starting point you can then ask questions about the mechanisms by which a particular act of selection is achieved, the units of that selection process, and the consequences of selection.
This view of attention does not require that the mechanisms of selection in each domain are completely independent, or that there cannot be general mechanisms of attention. Ultimately selection operates over distinct competing representations in each domain (goals compete with other goals, visual inputs compete with other visual inputs, action representations compete with other action representations). However, it is likely that goal selection (e.g., to continue working) interacts with visual selection (e.g., to focus on the coffee mug), and that both forms of selection interact with action selection (e.g., to reach for the handle). The competition happens mostly within a domain, so the process of selection also
happens mostly within a domain. But each process operates within an integrated cognitive system, and the selective processes in one domain will influence selective processes in the others. This chapter reviews evidence that the processes of visual selection and action selection, although separable, are tightly coupled in this way.
In the following section, we introduce visual attention, including limits on visual attention, mechanisms of visual attention, and theoretical proposals for why we need visual attention; this provides a useful starting point for understanding how visual attention interacts with the control of action.
The limits on our ability to pay attention are sometimes startling, particularly in cases of inattentional blindness (Mack & Rock, 1998; Most, Scholl, Clifford, & Simons, 2005; Simons & Chabris, 1999) and change blindness (J. K. O’Regan, Rensink, & Clark, 1999; Rensink, O’Regan, & Clark, 1997; Simons & Rensink, 2005). For instance, in one study Simons and Chabris (1999) had participants watch a video in which two teams (white shirts, black shirts) tossed a basketball between players on the same team. The task was to count the number of times players on the white-shirt team tossed the ball to another player on the same team. Halfway through the video, a man in a black gorilla suit walked into the scene from off camera, passing through the middle of the action right between players, stopping in the middle of the screen to thump his chest, and then continuing off screen to the left. Remarkably, half of the participants failed to notice the gorilla, even though it was plainly visible for several seconds! Indeed, when shown a replay of the video, participants were often skeptical that the event actually occurred during their initial viewing. The original video for this experiment can be viewed at http://www.dansimons.com/.
Similar results have been found in conditions with much higher stakes. For instance, when practicing landing an airplane in a simulator with a heads-up display, in which control panels are superimposed on the cockpit window, many pilots failed to see the appearance of an unexpected vehicle on the runway (Haines, 1991). Attending to the control
panel superimposed over the runway appeared to prevent noticing completely visible and critical information. Similarly, traffic accident reports often include accounts of drivers “not seeing” clearly visible obstacles (McLay, Anderson, Sidaway, & Wilder, 1997). Such occurrences are typically interpreted as attentional lapses: It seems that we have a severely limited ability to perceive, understand, and act on information that falls outside the current focus of attention.
Change-blindness studies provide further evidence for our limited ability to attend to visual information (J. K. O’Regan, Deubel, Clark, & Rensink, 2000; J. K. O’Regan et al., 1999; Rensink et al., 1997; Simons, 2000; Simons & Levin, 1998; Simons & Rensink, 2005). In a typical (p. 258) change-blindness study, observers watch as two pictures are presented in alternation, with a blank screen between them. The two pictures are identical, except for one thing that is changing, and the task is simply to identify what is changing. Even when the difference between images is a substantial visual change (e.g., a large object repeatedly disappearing and reappearing), observers often fail to notice the change for several seconds. For demonstrations, go to http://www.dansimons.com/ or http://visionlab.harvard.edu/Members/George/demo-GradualChange.html. These studies provide a dramatic demonstration that, although our subjective experience is that we “see the whole visual field,” we are unable to detect a local change to our environment unless we focus attention on the thing that changes (Rensink et al., 1997; Scholl, 2000). Indeed, with attention, such changes are trivially easy to notice in standard change-blindness tasks, making the failure to detect such changes before they are localized all the more impressive.
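The alternation schedule of such a flicker display can be sketched as a simple frame sequence. This is only an illustrative sketch: the function name and the durations (240-ms images, 80-ms blanks) are representative of values used in this literature, not the parameters of any particular study.

```python
def flicker_sequence(n_cycles):
    """One block of a flicker display: original image (A), blank,
    changed image (A'), blank, repeated until the observer responds.
    Durations are in milliseconds and purely illustrative."""
    frames = []
    for _ in range(n_cycles):
        frames += [("A", 240), ("blank", 80), ("A_changed", 240), ("blank", 80)]
    return frames

one_cycle = flicker_sequence(1)
```

The brief blank matters because it masks the local motion transient that would otherwise draw attention directly to the changing item, forcing the observer to search the scene with focused attention; this is why detection can take many cycles.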
Thus, it appears that we need attention to keep track of information in a dynamic visual environment. We fail to notice new visual information, or changes to old visual information, unless we selectively attend to that information. If we could pay attention to all of the incoming visual information at once, we would not observe these dramatic inattentional blindness and change-blindness phenomena.
Fortunately, the visual system is equipped with a variety of mechanisms that control the allocation of attention, effectively tuning our attention toward salient and task-relevant information in the visual field. The mechanisms of attentional control are typically divided into stimulus-driven and goal-driven mechanisms (for a review, see Yantis, 1998). Stimulus-driven attentional shifts occur when there is an abrupt change in the environment, particularly the appearance of a new object (Yantis & Jonides, 1984), or more generally when certain dynamic events occur in the environment (Franconeri, Hollingworth, & Simons, 2005; Franconeri & Simons, 2005). Our attention also tends to be guided toward salient portions of the visual field, such as a location that differs from its surround in terms of color or other features (Itti & Koch, 2000).
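The intuition behind such saliency accounts, that a location attracts attention to the degree that it differs from its surround, can be illustrated with a toy center-surround computation. This sketch is a deliberate simplification of the full Itti and Koch (2000) model, which combines multiple feature maps at multiple spatial scales; the single feature dimension and fixed neighborhood size here are arbitrary choices for illustration.

```python
def saliency_map(image, surround=1):
    """Toy center-surround salience: each cell's salience is the absolute
    difference between its value and the mean of its local neighborhood
    (edge cells reuse the nearest in-bounds values)."""
    h, w = len(image), len(image[0])
    sal = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            neigh = [image[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                     for dy in range(-surround, surround + 1)
                     for dx in range(-surround, surround + 1)]
            sal[y][x] = abs(image[y][x] - sum(neigh) / len(neigh))
    return sal

# A uniform field with a single odd element: the feature singleton
# receives the highest salience, i.e., it "pops out."
field = [[0.0] * 5 for _ in range(5)]
field[2][2] = 1.0
sal = saliency_map(field)
peak = max((sal[y][x], (y, x)) for y in range(5) for x in range(5))[1]
```

In the toy example, the peak of the salience map falls on the singleton, mirroring the behavioral finding that a locally unique item captures attention regardless of the observer's goals.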
Research on object-based attention suggests that attention can select discrete objects, spreading through them and constrained by their boundaries, while suppressing information that does not belong to the selected object. There is a wide range of empirical support for this theory. In the classic demonstration of object-based attention, attention is cued to part of an object, where the speed and accuracy of perceptual processing is enhanced. Critically, performance is also enhanced at uncued locations that are part of the same object, relative to uncued locations equally distant from the cue but within part of another object (Atchley & Kramer, 2001; Avrahami, 1999; Egly, Driver, & Rafal, 1994; Z. J. He & Nakayama, 1995; Lamy & Tsal, 2000; Marino & Scholl, 2005; Vecera, 1994). Similarly, divided attention studies have shown that, when the task is to report or compare two target features, performance is better when the target features lie within the boundaries of the same object compared with when the features are spatially equidistant but appear in different objects (Ben-Shahar, Scholl, & Zucker, 2007; Duncan, 1984; Kramer, Weber, & Watson, 1997; Lavie & Driver, 1996; Valdes-Sosa, Cobo, & Pinilla, 1998; Vecera & Farah, 1994). The fact that these “same object” advantages occur suggests that visual attention automatically selects entire objects. Many other paradigms provide evidence supporting this hypothesis, including studies using functional imaging (e.g., O’Craven, Downing, & Kanwisher, 1999), visual search (e.g., Mounts & Melara, 1999), negative priming (e.g., Tipper, Driver, & Weaver, 1991), inhibition of return (e.g., Reppa & Leek, 2003; Tipper et al., 1991), object reviewing (e.g., Kahneman, Treisman, & Gibbs, 1992; Mitroff, Scholl, & Wynn, 2004), attentional capture (e.g., Hillstrom & Yantis, 1994), visual illusions (e.g., Cooper & Humphreys, 1999), and patient studies (e.g., Humphreys, 1998; Ward, Goodrich, & Driver, 1994).
Feature-based attention is the ability to tune attention to a particular feature (e.g., red)
such that all items in the visual field containing that feature are simultaneously selected.
Early research provided evidence for feature-based attention using the visual search par
adigm (Bacon & Egeth, 1997; Egeth, Virzi, & Garbart, 1984; Kaptein, Theeuwes, & Van
der Heijden, 1995; Zohary & Hochstein, 1989). For example, Bacon and Egeth (1997)
asked observers to search for a red X among red Os and black Xs. When participants
were instructed to attend to only the red items, the time to find the target was independent of the number of black Xs. This finding suggests that attention could be limited to only the red items in the display, with black items filtered out. This example fits with
the intuition that if your friend were one of a few people wearing a red shirt at a St.
Patrick’s Day party, she would be relatively easy to find in the crowd. Other evidence that
attention can be tuned to specific stimulus features comes from inattentional blindness
studies (Most & Astur, 2007; Most et al., 2001, 2005). For example, Most and colleagues
(2001) asked observers to attentively track a set of moving objects, and asked whether
they noticed the appearance of an unexpected object on one of the trials. The likelihood
of noticing the unexpected object depended systematically on its similarity to the attend
ed objects. For example, when observers attended to white objects, they were most likely
to notice the unexpected object if it was white, less likely if it was gray, and least likely if
it was black. Similarly, the time to find the target in a visual search task depends on the
similarity between targets and distractors (Duncan & Humphreys, 1989; Konkle, Brady,
Alvarez, & Oliva, 2010), as does the likelihood of a distractor capturing attention (Folk,
Remington, & Johnston, 1992). Other research has shown that attention can be tuned to
particular orientations or spatial frequencies (Rossi & Paradiso, 1995), as well as color
and motion direction (Saenz, Buracas, & Boynton, 2002, 2003). These effects of feature-
based attention appear to operate over the full visual field, such that attending to a par
ticular feature selects all objects in the visual field that share the same feature, even for
stimuli that are irrelevant to the task (Saenz et al., 2002).
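The logic of the Bacon and Egeth filtering result can be sketched as a toy linear-slope model of search time. The function name, the base and slope values, and the linear form are illustrative assumptions for this sketch, not Bacon and Egeth's actual model or data:

```python
def search_rt(n_red, n_black, select_red, base=500.0, slope=30.0):
    """Toy search-time model: if feature-based attention restricts search
    to the red items, only red items contribute to search time; otherwise
    every item does. Base and slope values (ms) are made up."""
    candidates = n_red if select_red else n_red + n_black
    return base + slope * candidates

# With selection, adding black Xs leaves search time flat:
#   search_rt(5, 4, True) == search_rt(5, 20, True)
# Without selection, each added distractor costs time:
#   search_rt(5, 20, False) > search_rt(5, 4, False)
```

The flat function of black-item count under selection is the signature of attention being tuned to the target color.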
The capacity-limitation framework assumes that the visual system is a finite system (a fair assumption), and consequently is limited in the amount of information it can process at once. On this view, we need attention to choose which inputs to the visual system will be fully processed and which will not. An
alternative framework, the attention-for-action framework, assumes that a particular
body part can only execute a single action at once (another fair assumption), and that at
tention is needed to select a target for that action. On this view, visual attention is limited
because the action system is limited. Although not necessarily mutually exclusive, these
frameworks have been presented in opposition. In this section, I summarize these theoretical frameworks and conclude (1) that visual attention is clearly required to handle information-processing constraints (supporting a capacity-limitation view) and (2)
that attention is not necessarily limited because action is limited. This is in line with the
view that attention is a process that operates at multiple levels (goal selection, action se
lection, visual selection), and highlights the point that the process of selection in one do
main need not be constrained by the process of selection in another domain. However, as
described in the following section, there is clearly a tight link between at least one form
of visual attention—visual-spatial attention—and action.
One class of attentional theory assumes that there is simply too much information flood
ing into the visual system to fully process all of it at once. Therefore, attentional mecha
nisms are required to select which inputs to process at any given moment. This informa
tion overload begins at the earliest stages of visual processing, when a massive amount of
information is registered. The retina has about 6.4 million cones and 110 to 125 million
rods (Østerberg, 1935), which is a fairly high-resolution sampling of the visual field
(equivalent to a 6.4-megapixel camera in the fovea). A million or so axons from each eye
then project information to the brain for further processing (Balazsi, Rootman, Drance,
Schulzer, & Douglas, 1984; Bruesch & Arey, 1942; Polyak, 1941; Quigley, Addicks, &
Green, 1982). These initial measurements must be passed on to higher-level processing
mechanisms that enable the visual system to recognize and localize objects in the visual
field. However, simultaneously processing all of these inputs in order to identify every ob
ject in the visual field at once would be highly computationally demanding. Intuitively,
such massively parallel processing is beyond the capacity of the human cognitive system
(Broadbent, 1958; Neisser, 1967). This intuition is also supported by computational analy
ses of visual processing (Tsotsos, 1988), which suggest that such parallel processing is
not viable within the constraints of the human visual system. Thus, the capacity limita
tions on higher-level cognitive processes, such as object identification and memory encod
ing, require a mechanism for selecting which subset of the incoming information should
gain access to these higher-level processes.
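The scale of this bottleneck can be seen with a back-of-envelope calculation using the counts cited above; the one-million-axon figure is an approximation, and the midpoint of the cited rod range is used:

```python
cones = 6.4e6          # ~6.4 million cones (Osterberg, 1935)
rods = 117.5e6         # midpoint of the cited 110-125 million range
axons_per_eye = 1.0e6  # roughly a million optic nerve fibers per eye

receptors = cones + rods
convergence = receptors / axons_per_eye
print(f"~{receptors / 1e6:.0f} million receptors feed ~1 million axons: "
      f"about {convergence:.0f}:1 convergence before signals leave the eye")
```

Even before cortex is reached, the retinal output compresses the photoreceptor array by two orders of magnitude, and further selection is needed downstream.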
Of course, there is good behavioral evidence that the visual system is not fully processing
all of the incoming information simultaneously, including the inattentional blindness and
change blindness examples described above. In your everyday life, you may have experi
enced these phenomena while driving down the road and noticing a pedestrian that
seemed to “appear out of nowhere,” or missing the traffic light turning green even
though you were looking in the general direction of the light. Or you may have experi
enced not being able to find something that was in plain sight, like your keys on a messy
desk. It seems safe to say that we are not always aware of all of the information coming
into our visual system at any given moment. Take a look at Figure 13.1, which depicts a
standard visual search task used to explore this in a controlled laboratory setting. Focus
your eyes on the “x” at the center of the display, and then look for the letter T, which is located somewhere in the display, tilted on its side in either the counterclockwise or clockwise direction. Try not to be fooled by the T-like distractors. Why does it take so long to find the T? Is it that you “can’t see it”? Not really, because now that you know where it is (far right, just below halfway down), you can see it perfectly well even while continuing to focus your eyes on the x at the center.
How can we understand our failure to “see” things that are plainly visible? One possibility
is that the visual system has many specialized neurons that code for different properties
of objects (e.g., some neurons code for color, others code for shape), and that attention is
required to combine these different features into a single coherent object representation
(Treisman & Gelade, 1980). Attending to multiple objects at once would result in confu
sion between the features of one object and the features of the other attended objects. On
this account, attention can only perform this feature integration operation correctly if it
operates over just one object at a time. Thus, you cannot see things that are plainly visible because at any given moment you can only accurately see the current object of attention.
An alternative theory, biased competition (Desimone, 1998; Desimone & Duncan, 1995),
places the capacity limit at the level of neuronal representation. In this model, when mul
tiple objects fall within the receptive field of a neuron in visual cortex, those objects com
pete for the response of the neuron. Neurons are tuned such that they respond more for
some input features than for other input features (e.g., a particular neuron might fire
strongly for shape A and not at all for shape B). How would such a neuron respond when
multiple inputs fall within its receptive field simultaneously (e.g., if both shape A and
shape B fell within the receptive field)? One could imagine that you would observe an av
erage response, somewhere between the strong response to the preferred stimulus and a
weak response to the nonpreferred stimulus. However, it turns out that the response of
the neuron depends on which object is attended, with stronger responses when the pre
ferred stimulus is attended and weaker responses when the nonpreferred stimulus is at
tended. Thus, attention appears to bias this neuronal competition in favor of the attended
stimulus. Both stimulus-driven and goal-driven mechanisms can bias the representation of
the neuron in favor of one object over the other, and the biasing signal (“tuning mecha
nism”) can be feature based or space based. This biasing signal enhances the response to
the attended object and suppresses the response to the unattended object. The capacity
limit in this case is at the level of the individual neuron, which cannot represent two ob
jects at once, and therefore attentional mechanisms are required to determine which ob
ject is represented. This theory has been supported by a variety of physiological evidence
in monkeys (for a review, see Desimone, 1998) and is consistent with behavioral evidence
in humans (Carlson, Alvarez, & Cavanagh, 2007; Motter & Simoni, 2007; Scalf & Beck,
2010; Torralbo & Beck, 2008).
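The core of the biased-competition account can be sketched as a weighted average of the neuron's responses to each stimulus alone, a simplified form of the formalization used in the physiology literature, with attention modeled as an increase in the attended stimulus's weight. The firing rates and weights below are hypothetical, chosen only to illustrate the qualitative pattern:

```python
def pair_response(r_pref, r_nonpref, w_pref=1.0, w_nonpref=1.0):
    """Response of one neuron to two stimuli in its receptive field, as a
    weighted average of its responses to each stimulus alone. Attention
    raises the attended stimulus's weight; all values are hypothetical."""
    return (w_pref * r_pref + w_nonpref * r_nonpref) / (w_pref + w_nonpref)

alone_pref, alone_nonpref = 50.0, 5.0  # spikes/s to each stimulus alone (made up)
both_unattended = pair_response(alone_pref, alone_nonpref)                # 27.5
attend_pref = pair_response(alone_pref, alone_nonpref, w_pref=4.0)        # 41.0
attend_nonpref = pair_response(alone_pref, alone_nonpref, w_nonpref=4.0)  # 14.0
```

The sketch reproduces the qualitative result described above: the pair response sits between the two single-stimulus responses, and attending to one stimulus pulls the response toward what that stimulus would evoke alone.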
An analysis of the complexity of the problems the visual system must solve rules out the
possibility that a human-sized visual system could fully process all of the incoming visual
information at once. The limits on processing capacity can be conceived as limits on specific computations (e.g., feature binding), or in terms of neuronal competition (e.g., competing for the response of a neuron). On either view, it is clear that visual selection is necessitated, at least in part if not in full, by capacity limitations that arise within the stream of visual processing owing to the architecture of the visual system.
An alternative class of theories holds that attention is limited because action is limited.
Because of physical constraints, your body parts can only move in one direction at a time:
Each hand can only move in one direction, each eye can only move in one direction, and
so on. According to the attention-for-action view, the purpose of visual attention is to se
lect the most relevant visual information for action and to suppress irrelevant visual infor
mation. Consequently, you are limited in the number of things you can pay attention to at
once because your body is limited in the number of things it can act on at once. In its
strongest form, the attention-for-action theory proposes that attentional selection limits
are not set by information processing constraints: Attention is limited because action is
limited (Allport, 1987, 1989, 1993; Neumann, 1987, 1990; Van der Heijden, 1992). The
most appealing aspect of the attention-for-action theory is the most intuitive part: It
makes sense that we would pay attention to things while we perform actions on them.
Even without any data, I am convinced that I attend to the handle of my coffee mug just
before I pick it up—and of course eye-movement studies support this intuition (Land &
Hayhoe, 2001). However, the satisfaction of this part of the attention-for-action theory
does not imply that attention is limited because action is limited. It is possible for atten
tion and action to be tightly coupled, but for attentional limitations to be determined by
factors other than the limitations on action, such as the architecture of the visual system
(e.g., neurons that encompass large regions of the visual field consisting of multiple ob
jects). I will support a view in which visual attention selects the targets for action, but in
which the limitations on visual attention are not set by the limitations on action.
To push the claim that attention is limited because action is limited, we can think about
the limits on visual attention and ask if they match with limits on physical action. For in
stance, feature-based visual attention is capable of globally selecting a feature such as
“red” or “moving to the right,” such that all objects or regions in the visual field sharing
that feature receive enhanced processing over objects or regions that do not have the selected feature (Bacon & Egeth, 1997; Rossi & Paradiso, 1995; Saenz et al., 2002, 2003;
Serences & Boynton, 2007; Treue & Martinez Trujillo, 1999). Consequently, in some situations feature-based attention results in selecting dozens of objects, which far exceeds the action limit of one (or a few) objects at once. Perhaps one could argue that such
a selected group constitutes a single “object of perception” —whereby within-group
scrutiny requires additional selective mechanisms—but because this single perceptual
group can be distributed across the entire visual field and interleaved with other “unse
lected” information, it cannot be considered a potential object of action (which is neces
sarily localized in space). Thus, action limits cannot explain the limits on feature-based at
tention.
One might argue that attention-for-action theory does not apply to feature-based atten
tion, but rather that it applies only to spatial attention. This is a fair point because action
occurs within the spatial domain. However, visual-spatial attention also shows many con
straints that do not match obvious action limits. Most notably, although action is limited
to a single location at any given moment, it is possible to attend to multiple objects or lo
cations simultaneously, in parallel (Awh & Pashler, 2000; McMains & Somers, 2004;
Pylyshyn & Storm, 1988; for a review of multifocal attention, see Cavanagh & Alvarez,
2005). For instance, a common task for exploring the limits on spatial attention is the
multiple-object-tracking task. In this task, observers view a set of identical moving ob
jects. A subset is highlighted as targets to be tracked, and then all items appear identical
and continue moving. The task is to keep track of the target items, but because there are
no distinguishing features of the objects, participants must continuously attend to the ob
jects in order to track them. Observers are able to keep track of one to eight objects con
currently, depending on the speed and spacing between the items (Alvarez & Franconeri,
2007), and the selection appears to be multifocal, selecting targets but not the space between targets (Intriligator & Cavanagh, 2001; Pylyshyn & Storm, 1988). This upper bound on spatial attention, at least eight objects, is far beyond the action limit of a single location. One might argue that multiple locations can be attended because it is possible to
plan complex actions that involve acting on multiple locations sequentially. However, on this view, there is no reason for there to be any limit on the number of locations that can be attended at once.
Other constraints on spatial attention are also difficult to explain within the attention-for-
action framework. In particular, there appear to be important low-level, anatomical con
straints on attentional processing. For instance, the spatial resolution of attention is
coarser in the upper visual field than in the lower visual field (He, Cavanagh, & Intriliga
tor, 1996; Intriligator & Cavanagh, 2001), and it is easier to maintain sustained focal at
tention along the horizontal meridian than along the vertical meridian (Mackeben, 1999).
It is difficult to understand these limitations in terms of action constraints, unless we as
sume that our hands are less able to act on the upper visual field (perhaps because of
gravity) and less able to act on the vertical meridian (it seems awkward to perform ac
tions above and below fixation). An alternative explanation is that the upper/lower asym
metry is likely to be associated with visual areas in which the lower visual field is overrep
resented. In monkeys, there is an upper/lower asymmetry that increases from relatively
modest in V1 (Tootell, Switkes, Silverman, & Hamilton, 1988) to much more pronounced
in higher visual areas like MT (Maunsell & Van Essen, 1987) and parietal cortex (Galletti,
Fattori, Kutz, & Gamberini, 1999). The horizontal/vertical meridian asymmetry is likely to
be linked to the relatively lower density of ganglion cells along the vertical meridian rela
tive to the horizontal meridian (Curcio & Allen, 1990; Perry & Cowey, 1985) and possibly
to the accelerated decline of cone density with eccentricity along the vertical meridian
relative to the horizontal meridian (Curcio, Sloan, Packer, Hendrickson, & Kalina, 1987).
It is tempting to link these sorts of hemifield effects with hemispheric control of action by
assuming that contralateral control of the body is linked to hemifield constraints on atten
tional selection (e.g., where the right hemisphere controls reaching and attention to the
left visual field, and the left hemisphere controls reaching and attention to the right visu
al field). However, there are also quadrantic limits on attentional selection (Carlson et al.,
2007) that are not amenable to such an account. Specifically, attended targets interfere
with each other to a greater extent when they both appear within the same quadrant of
the visual field than when they are equally far apart but appear in separate quadrants of
the visual field. It is difficult to understand these limits on spatial attention within an at
tention-for-action framework because there are no visual field quadrantic effects on body
movement. Instead, a capacity limit account, particularly within the competition-for-rep
resentation framework, provides a more natural explanation for these visual field effects.
For instance, by taking known anatomy into account, we can find some important clues as
to the cause of the quadrantic deficit. Visual effects that are constrained by the horizontal
and vertical meridian can be directly linked to extrastriate areas V2 and V3 (Horton &
Hoyt, 1991), which maintain noncontiguous representations of the four quadrants of the
visual field. Increasing the distance between the lower-level cortical representations of
each target appears to decrease the amount of attentional interference between them.
One possibility is that cortical distance is correlated with the degree of the overlap be
tween the receptive fields of neurons at two attended locations (i.e., the degree of compe
tition for representation). On this account, the release from interference over the meridi
ans suggests that receptive fields of neurons located on these intra-areal borders may not
extend across quadrants of the visual field. This account is purely speculative, but for our
purpose the important point is that there are quadrant-level attentional interference ef
fects, and anatomical constraints offer some potential explanations, whereas limits on ac
tion do not.
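The speculative cortical-distance account can be made concrete with a toy model in which interference decays with a cortical-distance proxy, and crossing a quadrant boundary adds distance, standing in for the noncontiguous quadrant maps of V2 and V3. Every function name and parameter value here is invented for illustration:

```python
import math

def interference(separation_deg, cross_quadrant, quadrant_penalty=2.0, scale=3.0):
    """Toy model: interference between two attended targets falls off
    exponentially with an (invented) cortical-distance proxy. Crossing a
    quadrant boundary adds a fixed penalty; all parameters are arbitrary."""
    d = separation_deg + (quadrant_penalty if cross_quadrant else 0.0)
    return math.exp(-d / scale)

# Same retinal separation, but less interference across a quadrant boundary:
same_quadrant = interference(4.0, cross_quadrant=False)
different_quadrant = interference(4.0, cross_quadrant=True)
```

The toy model captures the qualitative pattern reported by Carlson et al. (2007): equal retinal separations produce more interference within a quadrant than across quadrants.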
These are just a few examples to illustrate that visual attention is not limited only be
cause action is limited. Many constraints on visual-spatial attention are not mirrored by
constraints on the action system, including limits on the number of attentional foci that
can be maintained at once and visual field constraints on attentional selection. Such constraints can be understood in terms of capacity limits in the form of competition for representation, by taking the architecture of the human visual system into account. Thus, to a
great extent, the mechanisms and limitations on visual-spatial attention can be under
stood separately and independently from the limitations on action. This should not be tak
en to mean that the process of spatial attention is somehow completely unrelated to the
action system, or that the action system does not constrain visual selection. Visual atten
tion and action are component processes of an integrated cognitive system, and as de
scribed in the following section, the two systems are tightly coupled.
When the eyes move to a location, attention first selects that location, as if attention selects the targets of eye movements.
Indeed, theories such as the premotor theory of attention (Rizzolatti, Riggio, Dascola, &
Umilta, 1987) propose that the mechanisms of visual attention are the same mechanisms
that control eye movements. However, even within the domain of vision, attention is a process that operates at many levels; it is likely that some aspects of visual attention do not perfectly overlap with the eye-movement system, and the evidence suggests some degree of separation between the mechanisms of attention and eye movements. Nevertheless, the idea that there is a high degree of overlap between the mechanisms of visual-spatial attention and eye movements is well supported by both behavioral and neurophysiological evidence.
One logical possibility for the relationship between visual-spatial selection and eye movements is that the focus of attention is locked into alignment with the center of fixation (at the fovea). That is, you might always be attending where your eyes are fixating. However, evidence that attention can be shifted covertly away from fixation rules out this possibility (Eriksen & Yeh, 1985; J. M. Henderson, 1991; Posner, 1980). For instance,
Posner’s cueing studies suggest that observers are able to shift their attention away from
where their eyes are looking, and focus on locations in the periphery. This ability to shift
attention away from the eyes is known as covert attention.
As described above, other research has shown that it is possible to attend to multiple objects simultaneously (Pylyshyn & Storm, 1988). It is impossible to perform the multiple-object-tracking task using eye movements alone because the targets move randomly and independently of each other. In other words, you can only look directly at one object at a time, and yet it is possible
to track multiple objects simultaneously (anywhere from two to eight, depending on the
speed and spacing of the items; Alvarez & Franconeri, 2007). Also, this task cannot be
performed by grouping items into a single object (Yantis, 1992) because each item moves
randomly and independently of the other items. Consequently, even if the items are per
ceptually grouped, the vertices must be independently selected and tracked to perform
the task. Thus, it would appear that we have a multifocal, covert attention system that
can select and track multiple moving objects at once, independent of eye position (Ca
vanagh & Alvarez, 2005).
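A cartoon of why identical, independently moving items force continuous position-based selection, and why speed and spacing matter, is a nearest-neighbor tracker: with no distinguishing features, the only way to maintain target identities is to match each target's last known position to the nearest current item. This is an illustration of the correspondence problem, not a model of human performance, and all values are arbitrary:

```python
import random

def mot_trial(step, n_frames=50, seed=0):
    """Track 4 of 8 identical random-walking dots in a unit square by
    nearest-neighbor correspondence; return how many target identities
    have been lost (swapped onto the wrong dot) by the end."""
    rng = random.Random(seed)
    # Start on a coarse grid so the dots begin well separated.
    dots = [(0.125 + 0.25 * (i % 4), 0.125 + 0.25 * (i // 4)) for i in range(8)]
    tracked = {t: dots[t] for t in range(4)}  # last known target positions
    for _ in range(n_frames):
        dots = [(min(1.0, max(0.0, x + rng.uniform(-step, step))),
                 min(1.0, max(0.0, y + rng.uniform(-step, step))))
                for x, y in dots]
        for t, (tx, ty) in tracked.items():
            tracked[t] = min(dots, key=lambda p: (p[0] - tx) ** 2 + (p[1] - ty) ** 2)
    return sum(tracked[t] != dots[t] for t in range(4))
```

When motion is slow relative to the spacing between items, nearest-neighbor correspondence never breaks; when displacement per frame approaches the inter-item spacing, identities can swap, echoing the speed-and-spacing dependence of human tracking.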
It is clear that attention can be shifted covertly away from the center of fixation, but that
visual-spatial attention and eye movements appear tightly coupled. Specifically, it appears
that attention to a location precedes eye movements to that location. Early studies explor
ing eye movements during reading found evidence suggesting that attention precedes
saccadic eye movements. McConkie and Rayner (1975) developed a clever paradigm
called the “moving window” or “gaze-contingent display” to investigate perceptual pro
cessing during reading. They had observers read text (Figure 13.2a), and changed all of
the letters away from fixation into the letter x. You might think this would be easily no
ticeable to the readers, but it was possible to change most of the letters to x without read
ers noticing. To accomplish this, researchers created a “window” around fixation where
the text always had normal letters, and as observers moved their eyes, the window moved
with them. Interestingly, observers did not notice this alteration of the text when the win
dow was large (Figure 13.2b), but they did notice when the window was small (Figure
13.2c). Thus, by manipulating the size of the moving window, it is possible to determine
the span of perception during reading (the number of letters readers notice as they read).
Interestingly, the span of perception is highly asymmetrical around the point of fixation,
with three to four letters to the left of fixation, and fourteen to fifteen letters to the right
of fixation (McConkie & Rayner, 1976) (Figure 13.2d).
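The moving-window manipulation itself is easy to sketch. The window sizes below echo the cited asymmetry (about four characters left of fixation, about fourteen right), and preserving spaces is a simplification of the actual displays:

```python
def moving_window(text, fixation, left=4, right=14):
    """Gaze-contingent 'moving window' sketch: replace letters outside an
    asymmetric window around the fixated character with 'x', keeping
    spaces so word boundaries remain visible (a simplification)."""
    return "".join(
        ch if (fixation - left <= i <= fixation + right or ch == " ") else "x"
        for i, ch in enumerate(text)
    )

line = "attention precedes saccadic eye movements during reading"
print(moving_window(line, fixation=10))  # letters far from position 10 become x
```

Varying `left` and `right` until readers notice the substitution is, in effect, how the span of perception is measured.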
What does this have to do with the relationship between attention and eye movements?
One interpretation of this finding is that during reading, eye movements to the right are
preceded by a shift of attention to the right. This interpretation assumes that attention
enhances the perception of letters, which is consistent with the known enhancing effects
of attention (Carrasco, Ling, & Read, 2004; Carrasco, Williams, & Yeshurun, 2002; Titch
ener, 1908; Yeshurun & Carrasco, 1998). This interpretation is further supported by the
finding that readers of Hebrew text, which is read from right to left, show the opposite
pattern: More letters are perceived to the left of fixation than to the right of fixation (Pol
latsek, Bolozky, Well, & Rayner, 1981). This asymmetry of the perceptual span does not
appear to rely on extensive reading practice: In general, the perceptual span appears to
be asymmetrical in the direction of eye movements, even when reading in an atypical di
rection (Inhoff, Pollatsek, Posner, & Rayner, 1989) or when scanning objects in the visual
field in a task that does not involve reading (Henderson, Pollatsek, & Rayner, 1989).
Saccadic eye movements are ballistic movements of the eyes from one location to another. Direct evidence for attention preceding saccadic eye movements comes from
studies showing enhanced perceptual processing at saccade target locations before saccade execution (Chelazzi et al., 1995; Crovitz & Daves, 1962; J. M. Henderson, 1993;
Hoffman & Subramaniam, 1995; Schneider & Deubel, 1995; Shepherd, Findlay, & Hock
ey, 1986). The assumption of such studies is that faster, more accurate perceptual pro
cessing is a hallmark of visual-spatial attention. In one study, Hoffman and Subramaniam
(1995) used a dual-task paradigm in which they presented a set of four possible saccade
target locations. For the first task, an arrow cue pointed to one of the four locations, and
observers were instructed to prepare a saccade to the cued location (Figure 13.3). How
ever, observers did not execute the saccade until they heard a beep. The critical question
was whether attention would obligatorily be focused on the saccade target location,
rather than on the other locations. Because attention presumably enhances perceptual
processing at attended locations, it is possible to use a secondary task to probe whether
attention is allocated to the saccade target location. If perceptual processing were en
hanced at the saccade target location relative to other locations, then it would appear
that attention was allocated to the saccade target location. For the second task, four let
ters were briefly presented, with one letter appearing in each of the four possible sac
cade target locations (see Figure 13.3). One of those letters was the target letter (either a
T or an L), and the other letters were distractor letters, Es and Fs. The task was to identi
fy whether a T or L was present, and the T or L could appear at any position, independent
of the saccade target location. Because the letter could appear anywhere, there was no
incentive to look for the target letter in the saccade target position, and yet observers
were more accurate in detecting the target letter when it appeared at the saccade target
location. In a follow-up experiment, it was shown that observers could not attend to one
location and simultaneously move the eyes to a different location. Thus, the allocation of
attention to the saccade target location is in fact obligatory, occurring even when condi
tions are in favor of attending to a location other than the saccade target location.
Several studies have taken a similar approach, demonstrating that it does not appear pos
sible to make an eye movement without first attentionally selecting the saccade target lo
cation. Shepherd, Findlay, and Hockey (1986) showed observers two boxes, one to the left
and one to the right of fixation. A central arrow cued one of the locations, and observers
were required to shift attention to the cued location. Shortly before or after the saccade,
a probe stimulus (a small square) appeared, and observers had to press a key as quickly
as possible. Critically, observers knew that the probe would most likely appear in the
box opposite of the saccade target location, providing an incentive to attend to the box
opposite of the saccade target if possible. Nevertheless, response times were faster when
the saccade target location and the probe location were the same, relative to when they
appeared in different boxes. This was true even when the probe appeared before the sac
cade occurred. It seems difficult or impossible to move the eyes in one direction while at
tending to a location in the opposite direction. Thus, although it is possible to shift atten
tion without moving the eyes, it does not seem possible to shift the eyes without shifting
attention in that direction first.
This tight coupling between visual-spatial attention and eye movements is not only true for saccadic eye movements but also appears to hold for smooth pursuit eye movements (Khurana & Kowler, 1987; van Donkelaar, 1999; van Donkelaar & Drew, 2002).
Smooth pursuit eye movements are the smooth, continuous eye movements used to main
tain fixation on a moving object. Indirect evidence for the interaction between attention
and smooth pursuit eye movements comes from studies requiring observers to smoothly
track a moving target with their eyes, and then make a saccade toward a stationary pe
ripheral target. Such saccades are faster (Krauzlis & Miles, 1996; Tanaka, Yoshida, &
Fukushima, 1998) and more accurate (Gellman & Fletcher, 1992) to locations ahead of
the pursuit direction relative to saccades toward locations behind the pursuit direction.
These results are consistent with the notion that attention is allocated ahead of the direc
tion of ongoing smooth pursuit. Van Donkelaar (1999) provides more direct evidence in fa
vor of this interpretation. Observers were required to keep their eyes focused on a
smoothly moving target, while simultaneously monitoring for the appearance of a probe
stimulus. The probe was a small circle that could be flashed ahead of the moving target
(in the direction of pursuit) or behind the moving target (in the wake of pursuit). Critical
ly, participants were required to continue smooth pursuit throughout the trial, and to
press a key upon detecting the probe without breaking smooth pursuit. The results
showed that responses were significantly faster for probes ahead of the direction of pur
suit. This suggests that attentional selection of locations ahead of the direction of pursuit
is required to maintain smooth pursuit eye movements. In support of this interpretation,
subsequent experiments showed that probe detection is fastest at the target location and
Page 17 of 32
Attention and Action
just ahead of the target location, with peak enhancement ahead of eye position depending
on pursuit speed (greater speed, further ahead; van Donkelaar & Drew, 2002).
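The pattern above, with facilitation ahead of the pursuit direction and a peak that shifts further ahead as pursuit speeds up, can be captured by a simple toy model. The sketch below is illustrative only: the Gaussian form, the gain constant, and the spotlight width are assumptions for exposition, not parameters from van Donkelaar and Drew (2002).

```python
import math

def attention_gain(probe_offset_deg, pursuit_speed_dps,
                   lead_gain=0.05, sigma_deg=2.0):
    """Toy model: relative facilitation for a probe at probe_offset_deg
    (positive = ahead of the pursuit direction, in degrees of visual angle),
    given pursuit speed in degrees per second. The attentional "spotlight"
    is a Gaussian whose center leads the eye by lead_gain * speed."""
    peak = lead_gain * pursuit_speed_dps  # spotlight center leads the eye
    return math.exp(-((probe_offset_deg - peak) ** 2) / (2 * sigma_deg ** 2))

# Probes ahead of pursuit are facilitated more than probes behind it,
# and faster pursuit shifts the peak of facilitation further ahead.
assert attention_gain(+1.0, pursuit_speed_dps=20) > attention_gain(-1.0, pursuit_speed_dps=20)
assert attention_gain(+2.0, pursuit_speed_dps=40) > attention_gain(+2.0, pursuit_speed_dps=10)
```

In this sketch, faster detection of probes ahead of pursuit falls out of a single assumption: the selection window is anchored not to current gaze but to a predicted future gaze position.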
In summary, the behavioral evidence shows enhanced perceptual processing before eye movements at the eventual location of the eyes. Because attention is known to enhance perceptual processing in terms of both speed and accuracy, this behavioral evidence supports a model in which visual-spatial attention selects the targets of eye movements. However, the behavioral evidence alone cannot distinguish between a model in which visual-spatial attention and eye-movement control are in fact a single mechanism and one in which they are two separate but tightly interconnected mechanisms. One approach to directly address this question is to investigate the neural mechanisms of visual-spatial attention and eye-movement control.
Research on the neural basis of visual-spatial attention and eye movements strongly suggests that there is a great deal of overlap between the neural mechanisms that control attention and the neural mechanisms that control eye movements.
In one study, Beauchamp et al. (2001) had observers perform either a covert attention task or an eye-movement task. In the covert attention task, observers kept their eyes fixated on a point at the center of the screen and attended to a target that jumped from location to location in the periphery. In the eye-movement task, observers moved their eyes, attempting to keep their eyes focused on the location of the target as it jumped from position to position. In each condition, the rate at which the target changed location was varied from 0.2 times per second to 2.5 times per second. Activation for both tasks was compared with a control task in which the target remained stationary at fixation. Relative to this control task, several areas were more active during both the covert attention task and the eye-movement task, including the precentral sulcus (PreCS), the intraparietal sulcus (IPS), and the lateral occipital sulcus (LOS). Activity in the PreCS appeared to have two distinct foci, and the superior PreCS has been identified as the possible homolog of the monkey frontal eye fields (FEFs; Paus, 1996), which play a central role in the control of eye movements.
Although these brain regions were active during both the covert attention task (p. 267) and the eye-movement task, the level of activity was significantly greater during the eye-movement task. Based on this result, Beauchamp et al. (2001) proposed the intriguing hypothesis that a shift of visual-spatial attention is simply a subthreshold activation of the oculomotor control regions. Moderate, subthreshold activity in oculomotor control regions will cause a covert shift of spatial attention without triggering an eye movement, whereas higher, suprathreshold activity in those same regions will cause an eye movement. Of course, the resolution of fMRI does not allow us to determine that the exact same neurons involved in shifting attention are involved with eye movements. A plausible alternative is that separate populations of neurons underlie shifts of attention and the control of eye movements, but that these populations are located in roughly the same anatomical regions (see Corbetta et al., 2001, for a discussion of this possibility). However, if attention and eye movements had separate underlying populations of neurons, the degree of overlap between the two networks would suggest a functional purpose, perhaps to enable communication between the systems. Thus, on either view, there is strong evidence for shared or tightly interconnected neural mechanisms for visual-spatial attention and eye movements.
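The logic of the subthreshold-activation hypothesis can be stated as a minimal sketch. The code below is an illustration of the idea only; the threshold value, the function name, and the single-activation-variable form are assumptions for exposition, not Beauchamp et al.'s (2001) model.

```python
# One oculomotor activation variable at a spatial location drives both
# covert attention shifts (subthreshold) and overt saccades (suprathreshold).
SACCADE_THRESHOLD = 1.0  # illustrative units

def oculomotor_outcome(activation):
    """Map oculomotor activation at a location to a behavioral outcome."""
    if activation <= 0:
        return "no shift"
    elif activation < SACCADE_THRESHOLD:
        # Attention moves to the location, but the eyes stay put.
        return "covert attention shift"
    else:
        # The same signal, pushed past threshold, launches a saccade.
        return "saccade"

assert oculomotor_outcome(0.5) == "covert attention shift"
assert oculomotor_outcome(1.3) == "saccade"
```

On this view, covert and overt orienting differ only in the strength of one shared signal, which is what makes the hypothesis parsimonious and what the microstimulation work described below probes directly at the level of single cells.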
Recall that the behavioral evidence presented above suggests that a shift of attention necessarily precedes an eye movement. Thus, even if eye movements required neural mechanisms completely separate from shifts of attention, the brain regions involved in the eye-movement task should include both the attention regions (which behavioral work suggests must be activated to make an eye movement) and those involved specifically with making eye movements (if there are any). How do we know that all of these active regions are not just part of the attention system? The critical question is not only whether the brain regions for attention and eye movements overlap but also whether there is any nonoverlap between them, because the nonoverlap may represent eye-movement-specific neural mechanisms. Physiological studies in monkeys provide some insight into this question.
Research employing direct stimulation of neurons in the FEF and SC supports the hypothesis that the same neural mechanisms that control eye movements also control covert shifts of spatial attention. For instance, Moore and Fallah (2004) trained monkeys to detect a target (a brief dimming) in the peripheral visual field. Subthreshold stimulation of FEF cells improved performance at the peripheral locations to which the monkey's eyes would move if suprathreshold stimulation were applied. Similarly, Cavanaugh and Wurtz (2004) showed that change detection is improved by subthreshold microstimulation of the SC, and this improvement was observed only when the change occurred at the location to which the eyes would move if suprathreshold stimulation were applied. Thus, it appears that the cells in the FEF and SC that control the deployment of eye movements also control covert shifts of spatial attention.
Although compelling, these microstimulation studies are limited by the fact that microstimulation likely influences several neurons within the region of the electrode. Thus, it remains possible that eye movements and attention are controlled by different cells, but that these cells are closely intertwined. Indeed, there is evidence supporting the possibility that some neurons code for eye movements and not spatial selection, whereas other neurons code for spatial selection and not eye movements. For instance, Sato and Schall (2003) found that some neurons within the FEF appear to code for the locus of spatial attention, whereas others code only for the saccade location. Moreover, Ignashchenkova et al. (2004) measured single-unit activity in the SC and found that visual and visuomotor neurons were active during covert shifts of attention, but that purely motor neurons were not. Thus, both the FEF and SC appear to have neurons that are involved with both attention and eye movements, as well as some purely motor eye-movement neurons.
Summary
What is the relationship between attention and action? To answer this question we must first (p. 268) clarify, at least a little bit, what we mean by the term "attention." In this chapter the term has been used to refer to the process of selection that occurs at multiple levels of representation within the cognitive system, including goal selection, visual selection, and action selection. An overview of visual attention was presented, highlighting the capacity limitations on visual processing and the attentional mechanisms by which we cope with these capacity limitations, effectively tuning the system to the most important subset of the incoming information. There has been some debate about whether the limits on visual attention are due to information-processing constraints or to limitations on physical action. The idea that visual attention is limited by action is undermined by the mismatch in limitations between visual attention and physical action. Moreover, there is ample evidence demonstrating that the architecture of the visual system alone, independent of action, requires mechanisms of attention and constrains how those mechanisms operate. In other words, attention is clearly required to manage information-processing constraints within the visual system, irrespective of limitations on physical action. This does not rule out the possibility that action limitations impose additional constraints on visual attention, but it does argue against the idea that action limitations alone impose constraints on visual attention.
Despite the claim that attention is not limited only because action is limited, it is clear that visual-spatial attention and action are tightly coupled. Indeed, in the case study of spatial attention and eye movements, the attention system and action system are highly overlapping both behaviorally and neurophysiologically, and where they do not overlap completely they are tightly intertwined. In this case the overlap makes perfect sense because attending to a location in space and moving the eyes to a position in space require similar computations (i.e., specifying a particular spatial location relative to some reference frame). It is interesting to consider whether other mechanisms of visual selection (feature based, object based) are as intimately tied to action, or whether the coupling is primarily between spatial attention and action. Further, the current chapter focused on eye movements, but reaching movements and locomotion are other important forms of action that must be considered as well. Detailed models of the computations involved in these different forms of attentional selection and the control of action would predict that, as in the case of spatial attention and eye movements, visual selection and action mechanisms are likely to overlap to the extent that shared computations and representations are required by each. However, the relationship between attention and action is likely to vary across different forms of attention and action. For instance, there is some evidence that the interaction between visual-spatial attention and hand movements might be more flexible than the interaction between spatial attention and eye movements. Specifically, spatial attention appears to select the targets for both eye movements and hand movements. However, it is possible to shift attention away from the eventual hand location before the hand is moved, but it is not possible to shift attention away from the eventual eye position before the eye has moved (Deubel & Schneider, 2003).
Our understanding of visual-spatial attention and action has been greatly expanded by considering the relationship between these cognitive systems. Further exploring similar questions within other domains of attention (e.g., goal selection, other mechanisms of visual selection, auditory selection, tactile selection) and other domains of action (e.g., reaching, locomotion, complex coordinated actions) is likely to provide an even greater understanding of attention and action, and of how they operate together within an integrated cognitive system.
References
Allport, D. A. (1987). Selection for action: Some behavioral and neurophysiological considerations of attention and action. In H. Heuer & A. F. Sanders (Eds.), Perspectives on perception and action (pp. 395–419). Cambridge, MA: Erlbaum.
Allport, D. A. (1993). Attention and control: Have we been asking the wrong questions? A critical review of twenty-five years. In D. E. Meyer & S. Kornblum (Eds.), Attention and performance XIV: Synergies in experimental psychology, artificial intelligence, and cognitive neuroscience (pp. 183–218). Cambridge, MA: MIT Press.
Alvarez, G. A., & Cavanagh, P. (2005). Independent resources for attentional tracking in the left and right visual hemifields. Psychological Science, 16 (8), 637–643.
Alvarez, G. A., & Franconeri, S. L. (2007). How many objects can you track? Evidence for a resource-limited attentive tracking mechanism. Journal of Vision, 7 (13–14), 1–10.
Awh, E., & Pashler, H. (2000). Evidence for split attentional foci. Journal of Experimental Psychology: Human Perception and Performance, 26 (2), 834–846.
Balazsi, A. G., Rootman, J., Drance, S. M., Schulzer, M., & Douglas, G. R. (1984). The effect of age on the nerve fiber population of the human optic nerve. American Journal of Ophthalmology, 97 (6), 760–766.
Beauchamp, M. S., Petit, L., Ellmore, T. M., Ingeholm, J., & Haxby, J. V. (2001). A parametric fMRI study of overt and covert shifts of visuospatial attention. NeuroImage, 14 (2), 310–321.
Ben-Shahar, O., Scholl, B. J., & Zucker, S. W. (2007). Attention, segregation, and textons: Bridging the gap between object-based attention and texton-based segregation. Vision Research, 47 (6), 845–860.
Bizzi, E. (1968). Discharge of frontal eye field neurons during saccadic and following eye movements in unanesthetized monkeys. Experimental Brain Research, 6, 69–80.
Bruce, C. J., & Goldberg, M. E. (1985). Primate frontal eye fields. I. Single neurons discharging before saccades. Journal of Neurophysiology, 53 (3), 603–635.
Bruesch, S. R., & Arey, L. B. (1942). The number of myelinated and unmyelinated fibers in the optic nerve of vertebrates. Journal of Comparative Neurology, 77 (3), 631–665.
Carlson, T. A., Alvarez, G. A., & Cavanagh, P. (2007). Quadrantic deficit reveals anatomical constraints on selection. Proceedings of the National Academy of Sciences U S A, 104 (33), 13496–13500.
Carrasco, M., Ling, S., & Read, S. (2004). Attention alters appearance. Nature Neuroscience, 7 (3), 308–313.
Carrasco, M., Talgar, C. P., & Cameron, E. L. (2001). Characterizing visual performance fields: Effects of transient covert attention, spatial frequency, eccentricity, task and set size. Spatial Vision, 15 (1), 61–75.
Carrasco, M., Williams, P. E., & Yeshurun, Y. (2002). Covert attention increases spatial resolution with or without masks: Support for signal enhancement. Journal of Vision, 2 (6), 467–479.
Cavanagh, P., & Alvarez, G. A. (2005). Tracking multiple targets with multifocal attention. Trends in Cognitive Sciences, 9 (7), 349–354.
Chelazzi, L., Biscaldi, M., Corbetta, M., Peru, A., Tassinari, G., & Berlucchi, G. (1995). Oculomotor activity and visual spatial attention. Behavioural Brain Research, 71 (1–2), 81–88.
Cooper, A., & Humphreys, G. (1999). A new, object-based visual illusion. Paper presented at the Psychonomic Society, Los Angeles.
Corbetta, M., Akbudak, E., Conturo, T. E., Snyder, A. Z., Ollinger, J. M., Drury, H. A., et al. (1998). A common network of functional areas for attention and eye movements. Neuron, 21 (4), 761–773.
Crovitz, H. F., & Daves, W. (1962). Tendencies to eye movement and perceptual accuracy. Journal of Experimental Psychology, 63, 495–498.
Curcio, C. A., & Allen, K. A. (1990). Topography of ganglion cells in human retina. Journal of Comparative Neurology, 300 (1), 5–25.
Curcio, C. A., Sloan, K. R., Jr., Packer, O., Hendrickson, A. E., & Kalina, R. E. (1987). Distribution of cones in human and monkey retina: Individual variability and radial asymmetry. Science, 236 (4801), 579–582.
Desimone, R., & Duncan, J. (1995). Neural mechanisms of selective visual attention. Annual Review of Neuroscience, 18, 193–222.
Deubel, H., & Schneider, W. X. (2003). Delayed saccades, but not delayed manual aiming movements, require visual attention shifts. Annals of the New York Academy of Sciences, 1004, 289–296.
Downing, C. J., & Pinker, S. (1985). The spatial structure of visual attention. In M. Posner & O. Martin (Eds.), Attention and performance XI (pp. 171–187). Hillsdale, NJ: Erlbaum.
Duncan, J. (1984). Selective attention and the organization of visual information. Journal of Experimental Psychology: General, 113 (4), 501–517.
Duncan, J., & Humphreys, G. W. (1989). Visual search and stimulus similarity. Psychological Review, 96 (3), 433–458.
Egeth, H. E., Virzi, R. A., & Garbart, H. (1984). Searching for conjunctively defined targets. Journal of Experimental Psychology: Human Perception and Performance, 10 (1), 32–39.
Egly, R., Driver, J., & Rafal, R. D. (1994). Shifting visual attention between objects and locations: Evidence from normal and parietal lesion subjects. Journal of Experimental Psychology: General, 123 (2), 161–177.
Engel, F. L. (1971). Visual conspicuity, directed attention and retinal locus. Vision Research, 11 (6), 563–576.
Eriksen, C. W., & St. James, J. D. (1986). Visual attention within and around the field of focal attention: A zoom lens model. Perception & Psychophysics, 40 (4), 225–240.
Eriksen, C. W., & Yeh, Y. Y. (1985). Allocation of attention in the visual field. Journal of Experimental Psychology: Human Perception and Performance, 11 (5), 583–597.
Folk, C. L., Remington, R. W., & Johnston, J. C. (1992). Involuntary covert orienting is contingent on attentional control settings. Journal of Experimental Psychology: Human Perception and Performance, 18 (4), 1030–1044.
Franconeri, S. L., Alvarez, G. A., & Enns, J. T. (2007). How many locations can be selected at once? Journal of Experimental Psychology: Human Perception and Performance, 33 (5), 1003–1012.
Franconeri, S. L., Hollingworth, A., & Simons, D. J. (2005). Do new objects capture attention? Psychological Science, 16 (4), 275–281.
Franconeri, S. L., & Simons, D. J. (2005). The dynamic events that capture visual attention: A reply to Abrams and Christ (2005). Perception & Psychophysics, 67 (6), 962–966.
Funahashi, S., Bruce, C. J., & Goldman-Rakic, P. S. (1991). Neuronal activity related to saccadic eye movements in the monkey's dorsolateral prefrontal cortex. Journal of Neurophysiology, 65 (6), 1464–1483.
Galletti, C., Fattori, P., Kutz, D. F., & Gamberini, M. (1999). Brain location and visual topography of cortical area V6A in the macaque monkey. European Journal of Neuroscience, 11 (2), 575–582.
Gellman, R. S., & Fletcher, W. A. (1992). Eye position signals in human saccadic processing. Experimental Brain Research, 89 (2), 425–434.
Orbrecht & L. Stark (Eds.), Presbyopia research (pp. 171–175). New York: Plenum.
He, S., Cavanagh, P., & Intriligator, J. (1996). Attentional resolution and the locus of visual awareness. Nature, 383 (6598), 334–337.
He, Z. J., & Nakayama, K. (1995). Visual attention to surfaces in three-dimensional space. Proceedings of the National Academy of Sciences U S A, 92 (24), 11155–11159.
Helmholtz, H. V. (1962). Treatise on physiological optics (Vol. 3). New York: Dover.
Henderson, J. M., Pollatsek, A., & Rayner, K. (1989). Covert visual attention and extrafoveal information use during object identification. Perception & Psychophysics, 45 (3), 196–208.
Hillstrom, A. P., & Yantis, S. (1994). Visual motion and attentional capture. Perception & Psychophysics, 55 (4), 399–411.
Hoffman, J. E., & Subramaniam, B. (1995). The role of visual attention in saccadic eye movements. Perception & Psychophysics, 57 (6), 787–795.
Horton, J. C., & Hoyt, W. F. (1991). Quadrantic visual field defects: A hallmark of lesions in extrastriate (V2/V3) cortex. Brain, 114 (4), 1703–1718.
Ignashchenkova, A., Dicke, P. W., Haarmeier, T., & Thier, P. (2004). Neuron-specific contribution of the superior colliculus to overt and covert shifts of attention. Nature Neuroscience, 7 (1), 56–64.
Inhoff, A. W., Pollatsek, A., Posner, M. I., & Rayner, K. (1989). Covert attention and eye movements during reading. Quarterly Journal of Experimental Psychology A, 41 (1), 63–89.
Intriligator, J., & Cavanagh, P. (2001). The spatial resolution of visual attention. Cognitive Psychology, 43 (3), 171–216.
Itti, L., & Koch, C. (2000). A saliency-based search mechanism for overt and covert shifts of visual attention. Vision Research, 40 (10–12), 1489–1506.
Kahneman, D., Treisman, A., & Gibbs, B. J. (1992). The reviewing of object files: Object-specific integration of information. Cognitive Psychology, 24 (2), 175–219.
Kaptein, N. A., Theeuwes, J., & Van der Heijden, A. H. (1995). Search for a conjunctively defined target can be selectively limited to a color-defined subset of elements. Journal of Experimental Psychology: Human Perception and Performance, 21, 1053–1069.
Khurana, B., & Kowler, E. (1987). Shared attentional control of smooth eye movement and perception. Vision Research, 27 (9), 1603–1618.
Konkle, T., Brady, T. F., Alvarez, G., & Oliva, A. (2010). Conceptual distinctiveness supports detailed visual long-term memory for real-world objects. Journal of Experimental Psychology: General, 139 (3), 558–578.
Kramer, A. F., Weber, T. A., & Watson, S. E. (1997). Object-based attentional selection: Grouped arrays or spatially invariant representations? Comment on Vecera and Farah (1994). Journal of Experimental Psychology: General, 126 (1), 3–13.
Krauzlis, R. J., & Miles, F. A. (1996). Initiation of saccades during fixation or pursuit: Evidence in humans for a single mechanism. Journal of Neurophysiology, 76 (6), 4175–4179.
LaBerge, D. (1983). Spatial extent of attention to letters and words. Journal of Experimental Psychology: Human Perception and Performance, 9 (3), 371–379.
LaBerge, D., & Brown, V. (1986). Variations in size of the visual field in which targets are presented: An attentional range effect. Perception & Psychophysics, 40 (3), 188–200.
Lamy, D., & Tsal, Y. (2000). Object features, object locations, and object files: Which does selective attention activate and when? Journal of Experimental Psychology: Human Perception and Performance, 26 (4), 1387–1400.
Land, M. F., & Hayhoe, M. (2001). In what ways do eye movements contribute to everyday activities? Vision Research, 41 (25–26), 3559–3565.
Lavie, N., & Driver, J. (1996). On the spatial extent of attention in object-based visual selection. Perception & Psychophysics, 58 (8), 1238–1251.
Mack, A., & Rock, I. (1998). Inattentional blindness. Cambridge, MA: MIT Press.
Mackeben, M. (1999). Sustained focal attention and peripheral letter recognition. Spatial Vision, 12 (1), 51–72.
Marino, A. C., & Scholl, B. J. (2005). The role of closure in defining the "objects" of object-based attention. Perception & Psychophysics, 67 (7), 1140–1149.
Maunsell, J. H., & Van Essen, D. C. (1987). Topographic organization of the middle temporal visual area in the macaque monkey: Representational biases and the relationship to callosal connections and myeloarchitectonic boundaries. Journal of Comparative Neurology, 266 (4), 535–555.
McConkie, G. W., & Rayner, K. (1975). The span of the effective stimulus during fixation in reading. Perception & Psychophysics, 17, 578–586.
McConkie, G. W., & Rayner, K. (1976). Asymmetry of the perceptual span in reading. Bulletin of the Psychonomic Society, 8, 365–368.
McLay, R. W., Anderson, D. J., Sidaway, B., & Wilder, D. G. (1997). Motorcycle accident reconstruction under Daubert. Journal of the National Academy of Forensic Engineering, 14, 1–18.
McMains, S. A., & Somers, D. C. (2004). Multiple spotlights of attentional selection in human visual cortex. Neuron, 42 (4), 677–686.
Mitroff, S. R., Scholl, B. J., & Wynn, K. (2004). Divide and conquer: How object files adapt when a persisting object splits into two. Psychological Science, 15 (6), 420–425.
Moore, T., & Fallah, M. (2004). Microstimulation of the frontal eye field and its effects on covert spatial attention. Journal of Neurophysiology, 91 (1), 152–162.
Most, S. B., & Astur, R. (2007). Feature-based attentional set as a cause of traffic accidents. Visual Cognition, 15 (2), 125–132.
Most, S. B., Scholl, B. J., Clifford, E. R., & Simons, D. J. (2005). What you see is what you set: Sustained inattentional blindness and the capture of awareness. Psychological Review, 112 (1), 217–242.
(p. 271) Most, S. B., Simons, D. J., Scholl, B. J., Jimenez, R., Clifford, E., & Chabris, C. F. (2001). How not to be seen: The contribution of similarity and selective ignoring to sustained inattentional blindness. Psychological Science, 12 (1), 9–17.
Motter, B. C., & Simoni, D. A. (2007). The roles of cortical image separation and size in active visual search performance. Journal of Vision, 7 (2), 6–15.
Mountcastle, V. B., Lynch, J. C., Georgopoulos, A., Sakata, H., & Acuna, C. (1975). Posterior parietal association cortex of the monkey: Command functions for operations within extrapersonal space. Journal of Neurophysiology, 38 (4), 871–908.
Mounts, J. R., & Melara, R. D. (1999). Attentional selection of objects or features: Evidence from a modified search task. Perception & Psychophysics, 61 (2), 322–341.
Neumann, O. (1990). Visual attention and action. In O. Neumann & W. Prinz (Eds.), Relationships between perception and action: Current approaches (pp. 227–267). Berlin: Springer.
O'Craven, K. M., Downing, P. E., & Kanwisher, N. (1999). fMRI evidence for objects as the units of attentional selection. Nature, 401 (6753), 584–587.
O'Regan, J. K., Deubel, H., Clark, J. J., & Rensink, R. A. (2000). Picture changes during blinks: Looking without seeing and seeing without looking. Visual Cognition, 7 (1), 191–211.
O'Regan, J. K., Rensink, R. A., & Clark, J. J. (1999). Change-blindness as a result of "mudsplashes." Nature, 398 (6722), 34.
Østerberg, G. A. (1935). Topography of the layer of rods and cones in the human retina. Acta Ophthalmologica, 13 (Suppl 6), 1–97.
Paus, T. (1996). Location and function of the human frontal eye-field: A selective review. Neuropsychologia, 34 (6), 475–483.
Perry, V. H., & Cowey, A. (1985). The ganglion cell and cone distributions in the monkey's retina: Implications for central magnification factors. Vision Research, 25 (12), 1795–1810.
Petersen, S. E., Robinson, D. L., & Keys, W. (1985). Pulvinar nuclei of the behaving rhesus monkey: Visual responses and their modulation. Journal of Neurophysiology, 54 (4), 867–886.
Pollatsek, A., Bolozky, S., Well, A. D., & Rayner, K. (1981). Asymmetries in the perceptual span for Israeli readers. Brain and Language, 14 (1), 174–180.
Posner, M. I., Snyder, C. R., & Davidson, B. J. (1980). Attention and the detection of signals. Journal of Experimental Psychology: General, 109 (2), 160–174.
Pylyshyn, Z. W., & Storm, R. W. (1988). Tracking multiple independent targets: Evidence for a parallel tracking mechanism. Spatial Vision, 3 (3), 179–197.
Quigley, H. A., Addicks, E. M., & Green, W. R. (1982). Optic nerve damage in human glaucoma. III. Quantitative correlation of nerve fiber loss and visual field defect in glaucoma, ischemic neuropathy, papilledema, and toxic neuropathy. Archives of Ophthalmology, 100 (1), 135–146.
Remington, R., & Pierce, L. (1984). Moving attention: Evidence for time-invariant shifts of visual selective attention. Perception & Psychophysics, 35 (4), 393–399.
Rensink, R. A., O'Regan, J. K., & Clark, J. J. (1997). To see or not to see: The need for attention to perceive changes in scenes. Psychological Science, 8 (5), 368–373.
Reppa, I., & Leek, E. C. (2003). The modulation of inhibition of return by object-internal structure: Implications for theories of object-based attentional selection. Psychonomic Bulletin & Review, 10 (2), 493–502.
Rizzolatti, G., Riggio, L., Dascola, I., & Umilta, C. (1987). Reorienting attention across the horizontal and vertical meridians: Evidence in favor of a premotor theory of attention. Neuropsychologia, 25 (1A), 31–40.
Robinson, D. A. (1972). Eye movements evoked by collicular stimulation in the alert monkey. Vision Research, 12 (11), 1795–1808.
Robinson, D. L., Goldberg, M. E., & Stanton, G. B. (1978). Parietal association cortex in the primate: Sensory mechanisms and behavioral modulations. Journal of Neurophysiology, 41 (4), 910–932.
Rossi, A. F., & Paradiso, M. A. (1995). Feature-specific effects of selective visual attention. Vision Research, 35 (5), 621–634.
Saenz, M., Buracas, G. T., & Boynton, G. M. (2002). Global effects of feature-based attention in human visual cortex. Nature Neuroscience, 5 (7), 631–632.
Saenz, M., Buracas, G. T., & Boynton, G. M. (2003). Global feature-based attention for motion and color. Vision Research, 43 (6), 629–637.
Sato, T. R., & Schall, J. D. (2003). Effects of stimulus-response compatibility on neural selection in frontal eye field. Neuron, 38 (4), 637–648.
Scalf, P. E., & Beck, D. M. (2010). Competition in visual cortex impedes attention to multiple items. Journal of Neuroscience, 30 (1), 161–169.
Schall, J. D., Hanes, D. P., Thompson, K. G., & King, D. J. (1995). Saccade target selection in frontal eye field of macaque. I. Visual and premovement activation. Journal of Neuroscience, 15 (10), 6905–6918.
Schiller, P. H., & Tehovnik, E. J. (2001). Look and see: How the brain moves your eyes about. Progress in Brain Research, 134, 127–142.
Schlag, J., & Schlag-Rey, M. (1987). Evidence for a supplementary eye field. Journal of Neurophysiology, 57 (1), 179–200.
Schneider, W. X., & Deubel, H. (1995). Visual attention and saccadic eye movements: Evidence for obligatory and selective spatial coupling. In J. M. Findlay, R. Kentridge, & R. Walker (Eds.), Eye-movement research: Mechanisms, processes, and applications (pp. 317–324). New York: Elsevier.
Serences, J. T., & Boynton, G. M. (2007). Feature-based attentional modulations in the absence of direct visual stimulation. Neuron, 55 (2), 301–312.
Shepherd, M., Findlay, J. M., & Hockey, R. J. (1986). The relationship between eye movements and spatial attention. Quarterly Journal of Experimental Psychology A, 38 (3), 475–491.
Shulman, G. L., Remington, R. W., & McLean, J. P. (1979). Moving attention through visual space. Journal of Experimental Psychology: Human Perception and Performance, 5 (3), 522–526.
1/2/3, 1–15.
Simons, D. J., & Chabris, C. F. (1999). Gorillas in our midst: Sustained inattentional blindness for dynamic events. Perception, 28 (9), 1059–1074.
Simons, D. J., & Levin, D. T. (1998). Failure to detect changes to people in real-world interaction. Psychonomic Bulletin and Review, 5, 644–649.
Simons, D. J., & Rensink, R. (2005). Change blindness: Past, present, and future. Trends in Cognitive Sciences, 9 (1), 16–20.
Tanaka, M., Yoshida, T., & Fukushima, K. (1998). Latency of saccades during smooth-pursuit eye movement in man: Directional asymmetries. Experimental Brain Research, 121 (1), 92–98.
Tipper, S. P., Driver, J., & Weaver, B. (1991). Object-centred inhibition of return of visual attention. Quarterly Journal of Experimental Psychology A, 43 (2), 289–298.
Tootell, R. B., Switkes, E., Silverman, M. S., & Hamilton, S. L. (1988). Functional anatomy of macaque striate cortex. II. Retinotopic organization. Journal of Neuroscience, 8 (5), 1531–1568.
Treue, S., & Martinez Trujillo, J. C. (1999). Feature-based attention influences motion processing gain in macaque visual cortex. Nature, 399 (6736), 575–579.
Valdes-Sosa, M., Cobo, A., & Pinilla, T. (1998). Transparent motion and object-based attention. Cognition, 66 (2), B13–B23.
van Donkelaar, P., & Drew, A. S. (2002). The allocation of attention during smooth pursuit eye movements. Progress in Brain Research, 140, 267–277.
Vecera, S. P. (1994). Grouped locations and object-based attention: Comment on Egly, Driver, and Rafal (1994). Journal of Experimental Psychology: General, 123, 316–320.
Vecera, S. P., & Farah, M. J. (1994). Does visual attention select objects or locations? Journal of Experimental Psychology: General, 123 (2), 146–160.
Ward, R., Goodrich, S., & Driver, J. (1994). Grouping reduces visual extinction: Neuropsychological evidence for weight-linkage in visual selection. Visual Cognition, 1, 101–130.
Wurtz, R. H., & Goldberg, M. E. (1972a). Activity of superior colliculus in behaving monkey. III. Cells discharging before eye movements. Journal of Neurophysiology, 35 (4), 575–586.
Wurtz, R. H., & Goldberg, M. E. (1972b). Activity of superior colliculus in behaving monkey. IV. Effects of lesions on eye movements. Journal of Neurophysiology, 35 (4), 587–596.
Yantis, S. (1998). Control of visual attention. In H. Pashler (Ed.), Attention (pp. 223–256). East Sussex, UK: Psychology Press.
Page 31 of 32
Attention and Action
Yantis, S., & Jonides, J. (1984). Abrupt visual onsets and selective attention: Evidence
from visual search. Journal of Experimental Psychology: Human Perception and Perfor
mance, 10 (5), 601–621.
Yeshurun, Y., & Carrasco, M. (1998). Attention improves or impairs visual performance by
enhancing spatial resolution. Nature, 396 (6706), 72–75.
Zohary, E., & Hochstein, S. (1989). How serial is serial processing in vision? Perception,
18 (2), 191–200.
George Alvarez
Page 32 of 32
Visual Control of Action
The visual control of skilled actions, such as reaching and grasping, requires fundamen
tally different computations from those mediating our perception of the world. These dif
ferences in the computational requirements for vision for action and vision for perception
are reflected in the organization of the two prominent visual streams of processing that
arise from primary visual cortex in the primate brain. Although the ventral stream pro
jecting to inferotemporal cortex mediates the visual processing underlying our visual ex
perience of the world, the dorsal stream projecting to the posterior parietal cortex medi
ates the visual control of skilled actions. Specialized visual-motor modules have emerged
in the posterior parietal cortex for the visual control of eye, hand, and arm movements.
Although the identification of goal objects and the selection of an appropriate course of
action depend on the perceptual machinery of the ventral stream and associated cogni
tive modules in the temporal and frontal lobes, the execution of the subsequent goal-di
rected action is mediated by dedicated online control systems in the dorsal stream and
associated motor areas. Ultimately then, both streams work together in the production of
goal-directed actions.
Keywords: action, visual-motor control, dorsal stream, ventral stream, two visual streams, grasping, reaching
Introduction
The visual control of movement is a central feature of almost all our daily activities from
playing tennis to picking up our morning cup of coffee. But even though vision is essen
tial for all these activities, only recently have vision scientists turned their attention to the
study of visual-motor control. For most of the past 100 years, researchers have instead
concentrated their efforts on working out how vision constructs our perception of the
world. Psychophysics, not the study of visual-motor control, has been the dominant
methodology (Goodale, 1983). Indeed, with the notable exception of eye movements,
which have typically been regarded as an information-seeking adjunct to visual percep
tion, little attention has been paid to the way in which vision is used to program and
control our actions, particularly the movements of our hands and limbs. Nevertheless, in the
past few decades, considerable progress has been made on this front. Enormous strides,
for example, have been made in our understanding of the visual control of locomotion
(see Patla, 1997; Warren & Fajen, 2004). But in this brief review, I focus largely on the vi
sual control of reach-to-grasp movements, a class of behavior that is exceptionally well
developed in humans, and one that exemplifies the importance of vision in the control of
action.
I begin by introducing research that has examined the visual cues that play a critical role
in the control of reach-to-grasp movements. I then move on to a discussion of work on the
neural substrates of that control, reviewing evidence that the visual pathways mediating
visual-motor control are quite distinct from those supporting visual perception.
Humans are capable of reaching out and grasping objects with great dexterity, and vision
plays a critical role in this important skill. Think for a moment about what happens when
you perform the deceptively simple act of reaching out and picking up the cup of coffee
sitting on your desk. After identifying your cup among all the other objects on your desk,
you begin to reach out toward the cup, choosing a trajectory that avoids the telephone
and the computer monitor. At the same time, your fingers begin to conform to the shape
of the cup’s handle well before your hand makes contact with the cup. As your fingers
curl around the handle, the initial forces that are generated as you lift the cup are finely
tuned to its anticipated weight—and to your (implicit) predictions about the friction coef
ficients and compliance of the material from which the cup is made. Visual information is
crucial at every stage of this behavior, but the cues that are used and the neural mecha
nisms that are engaged are quite different for each of the components involved.
This so-called dual-channel hypothesis has become the dominant model of human
prehension.
According to Jeannerod’s (1981) dual-channel account, the kinematics of the reach com
ponent, whereby the hand is transported to the goal, are largely determined by visual
cues that are extrinsic to the goal object, such as its distance and location with respect to
the grasping hand. In contrast, the kinematics of the grasp component reflect the size,
shape, and other intrinsic properties of the goal object. Even though later studies (e.g.,
Chieffi & Gentilucci, 1993; Jakobson & Goodale, 1991) showed that the visual control of
the reach and grip components may be more intimately related than Jeannerod had origi
nally proposed, there is broad consensus that the two components show a good deal of
functional independence and (as will be discussed later) are mediated by relatively inde
pendent neural circuitry.
Jeannerod’s (1981) dual-channel hypothesis has not gone unchallenged. Smeets and Bren
ner (1999, 2001, 2009), for example, have proposed instead that the movements of each
finger of the grasping hand are programmed and controlled independently. According to
this account, when a person reaches out to grasp an object with a precision grip, the in
dex finger is directed to one side of the object and the thumb to the other. The apparent
scaling of grip aperture to object size is nothing more than an emergent property of the
fact that the two digits are moving (independently) toward their respective end points.
Moreover, because both digits are attached to the same limb, the so-called reach (or
transport) component is simply the joint movement of the two digits toward the
object. Simply put, it is location rather than size that drives grasping—and there is no
need to separate grasping into transport and grip components, each sensitive to a differ
ent set of visual cues.
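The contrast between the two accounts can be made concrete with a toy simulation. The sketch below is purely illustrative (it is not Smeets and Brenner's actual model, and all numbers and function names are invented for the example): each digit is simply aimed at its own contact point on the object, yet the distance between the digits, the "grip aperture," ends up scaling with object size even though no grip channel is ever programmed.

```python
import numpy as np

def digit_trajectory(start, end, n=50):
    """Straight-line path from start to end (a simplification;
    real digit trajectories are curved and typically overshoot)."""
    t = np.linspace(0.0, 1.0, n)[:, None]
    return (1 - t) * start + t * end

def simulated_grasp(object_centre, object_size):
    """Each digit moves independently toward its own contact point,
    as in the double-pointing account; aperture is merely the
    distance between the two digits at each moment."""
    start = np.array([0.0, 0.0])
    thumb_goal = object_centre + np.array([-object_size / 2, 0.0])
    finger_goal = object_centre + np.array([+object_size / 2, 0.0])
    thumb = digit_trajectory(start, thumb_goal)
    finger = digit_trajectory(start, finger_goal)
    return np.linalg.norm(finger - thumb, axis=1)

centre = np.array([0.0, 30.0])  # hypothetical target 30 cm straight ahead
small = simulated_grasp(centre, object_size=2.0)
large = simulated_grasp(centre, object_size=6.0)
# the "grip aperture" scales with object size as an emergent property
print(max(small), max(large))
```

Under these (deliberately simplified) straight-line trajectories, the maximum aperture equals the object width exactly; the point of the sketch is only that aperture scaling need not imply a dedicated grip channel.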
Smeets and Brenner’s (1999, 2001) double-pointing hypothesis has the virtue of being
parsimonious. Nevertheless, it is not without its critics (e.g., Dubrowski, Bock, Carnahan,
& Jungling, 2002; Mon-Williams & McIntosh, 2000; Mon-Williams & Tresilian, 2001; van
de Kamp & Zaal, 2007). Van de Kamp and Zaal, for example, showed that when one side
of an object was perturbed as a person reached out to grasp it, the trajectories of both
digits were adjusted in flight, a result that would not be predicted by Smeets and
Brenner’s model but one that is entirely consistent with Jeannerod’s (1981) dual-channel
hypothesis. Even more important (as we shall see later), the organization of the neural
substrates of grasping as revealed by neuroimaging and neuropsychology can be more
easily explained by the dual-channel than the double-pointing hypothesis.
Most studies of grasping, including those discussed above, have used rather unnatural
situations in which the goal object is the only object present in the workspace. In the real
world, of course, the workspace is usually cluttered with other objects, some of which
could be potential obstacles for a goal-directed movement. Nevertheless, when people
reach out to grasp an object, their hand and arm rarely collide with other objects in the
workspace. The ease with which this is accomplished belies the fact that a sophisticated
obstacle avoidance system must be at work—a system that encodes possible obstructions
to a goal-directed movement and incorporates this information into the motor plan. The
few investigations that have examined obstacle avoidance have revealed an efficient sys
tem that is capable of altering the spatial and temporal trajectories of goal-directed
reaching and grasping movements to avoid other objects in the workspace in a fluid man
ner (e.g., Castiello, 2001; Jackson, Jackson, & Rosicky, 1995; Tresilian, 1998; Vaughan,
Rosenbaum, & Meulenbroek, 2001). There has been some debate as to whether the non
goal objects are always being treated as obstacles or instead as potential targets for ac
tion (e.g., Tipper, Howard, & Jackson, 1997) or even as frames of reference for the control
of the movement (e.g., Diedrichsen, Werner, Schmidt, & Trommershauser, 2004; Obhi &
Goodale, 2005). By positioning nongoal objects in different locations in the workspace
with respect to the target, however, it is possible to show that most often these objects
are being treated as obstacles and that when individuals reach out for the goal, the trial-
to-trial adjustments of their trajectories are remarkably sensitive to the position of obsta
cles both in depth and in the horizontal plane, as well as to their height (Chapman &
Goodale, 2008, 2010). Moreover, the system behaves conservatively, moving the trajecto
ry of the hand and arm away from nontarget objects, even when those objects are unlike
ly to interfere with the target-directed movement.
Part of the reason for the rapid growth of research into the visual control of manual pre
hension has been the development of reliable technologies for recording hand and limb
movements in three dimensions. Expensive movie film was replaced with inexpensive
videotape in the 1980s—and over the past 25 years, the use of accurate recording devices
based on active or passive infrared markers, ultrasound, magnetism, instrumented
gloves, and an array of other technologies has grown enormously. This has made possible
the development of what might be termed “visual-motor psychophysics,” in which
investigators are exploring the different visual cues that are used in the programming
and control of grasping.
One of the most powerful sets of cues used by the visual-motor system in mediating
grasping comes from binocular vision (e.g., Servos, Goodale, & Jakobson, 1992). Several
studies, for example, have shown that covering one eye has clear detrimental effects on
grasping (e.g., Keefe & Watt, 2009; Loftus, Servos, Goodale, Mendarozqueta, & Mon-
Williams, 2004; Melmoth & Grant, 2006; Servos, Goodale, & Jakobson, 1992; Watt &
Bradshaw, 2000). People reach more slowly, show longer periods of deceleration, and exe
cute more online adjustments of both their trajectory and their grip during the closing
phase of the grasp. Not surprisingly, then, adults with stereo deficiencies from amblyopia
have been shown to exhibit slower and less accurate grasping movements (Melmoth, Fin
lay, Morgan, & Grant, 2009). Interestingly, however, individuals who have lost an eye are
still able to grasp objects as accurately as normally sighted individuals who are using
both eyes. It turns out that they do this by making use of monocular retinal motion cues
generated by exaggerated head movements (Marotta, Perrot, Nicolle, Servos, & Goodale,
1995). The use of these self-generated motion cues appears to be learned: the longer the
time between loss of the eye and testing, the more likely it is that these individuals will
make unusually large vertical and lateral head movements during the execution of the
grasp (Marotta, Perrot, Nicolle, & Goodale, 1995).
Computation of the required distance for the grasp has been shown to depend
more on vergence than on retinal disparity cues, whereas the scaling of the grasping
movement and the final placement of the fingers depends more on retinal disparity than
vergence (Melmoth, Storoni, Todd, Finlay, & Grant, 2007; Mon-Williams & Dijkerman,
1999). Similarly, motion parallax contributes more to the computation of reach distance
than it does to the formation of the grasp, although motion parallax becomes important
only when binocular cues are no longer available (Marotta, Kruyer, & Goodale, 1998; Watt
& Bradshaw, 2003). (It should be noted that the differential contributions that these cues
make to the reach and grasp components, respectively, are much more consistent with
Jeannerod’s, 1981, dual-channel hypothesis than they are with Smeets and Brenner’s
(1999) double-pointing account.) But even when one eye is covered and the head immobi
lized, people are still able to reach out and grasp objects reasonably well, suggesting that
static monocular cues can be used to program and control grasping movements. Marotta
and Goodale (1998, 2001), for example, showed that pictorial cues, such as height in the
visual scene and familiar size, can be exploited to program and control grasping—but the
contributions from these static monocular cues are usually overshadowed by binocular in
formation from vergence or retinal disparity, or both.
The role of the shape and orientation of the goal object in determining the formation of
the grasp is poorly understood. It is clear that the posture of the grasping hand is sensi
tive to these features (e.g., Cuijpers, Smeets, & Brenner, 2004; Goodale et al., 1994b; van
Bergen, van Swieten, Williams, & Mon-Williams, 2007), but there have been only a few
systematic investigations of how information about object shape and orientation is used
to configure the hand during the planning and execution of grasping movements (e.g.,
Cuijpers, Smeets, & Brenner, 2004; Lee, Crabtree, Norman, & Bingham, 2008; Louw,
Smeets, & Brenner, 2007; van Mierlo, Louw, Smeets, & Brenner, 2009).
But understanding the cues that are used to program and control a grasping movement is
only part of the story. To reach and grasp an object, one presumably has to direct one’s
attention to that object as well as to other objects in the workspace that could be poten
tial obstacles or alternative goals. Research on the deployment of overt and covert atten
tion in reaching and grasping tasks has accelerated over the past two decades, and it has
become clear that when vision is unrestricted, people shift their gaze toward the goal ob
ject (e.g., Ballard, Hayhoe, Li, & Whitehead, 1992; Johansson, Westling, Bäckström, &
Flanagan, 2001) and to those locations on the object where they intend to place their fin
gers, particularly the points on the object where more visual feedback is required to posi
tion the fingers properly (e.g., Binsted, Chua, Helsen, & Elliott, 2001; Brouwer, Franz, &
Gegenfurtner, 2009). In cluttered workspaces, people also tend to direct their gaze to ob
stacles that they might have to avoid (e.g., Johansson et al., 2001). Even when gaze is
maintained elsewhere in the scene, there is evidence that attention is shifted covertly to
the goal and is bound there until the movement is initiated (Deubel, Schneider, & Paprot
ta, 1998). In a persuasive account of the role of attention in reaching and grasping, Bal
dauf and Deubel (2010) have argued that the planning of a reach-to-grasp movement re
quires the formation of what they call an “attentional landscape,” in which the locations
of all the objects and features in the workspace that are relevant for the intended action
are encoded. Interestingly, their model implies parallel rather than sequential deployment
of attentional resources to multiple locations, a distinct departure from how attention is
thought to operate in more perceptual-cognitive models of attention.
Finally, and importantly for the ideas that I discuss later in this review, it should be noted
that the way in which different visual cues are weighted for the control of skilled move
ments is typically quite different from the way they are weighted for perceptual judg
ments. For example, Knill (2005) found that participants gave significantly more weight to
binocular compared with monocular cues when they were asked to place objects on a
slanted surface in a virtual display compared with when they were required to make ex
plicit judgments about the slant. Similarly, Servos (2000) demonstrated that even though
people relied much more on binocular than monocular cues when they grasped an object,
their explicit judgments about the distance of the same object were no better under
binocular than under monocular viewing conditions. These and other, even more dramatic
dissociations that I review later underscore the fundamental differences between how vi
sion is used for action and for perceptual report. In the next section, I offer a speculative
account of the origins of vision before moving on to discuss the neural organization of the
pathways supporting vision for action on the one hand and vision for perception on the
other.
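One standard way to formalize "cue weighting" is reliability-weighted (maximum-likelihood) cue combination, in which each cue is weighted by its inverse variance. The sketch below is purely illustrative (the slant estimates and variances are invented, and this is not the analysis used in the studies cited above); it simply shows how treating binocular disparity as more reliable for action than for perceptual judgment yields different combined estimates from the very same cues.

```python
def mle_weights(variances):
    """Reliability-weighted (maximum-likelihood) cue weights:
    each cue is weighted by its inverse variance, normalized to sum to 1."""
    inv = [1.0 / v for v in variances]
    total = sum(inv)
    return [x / total for x in inv]

def combine(estimates, variances):
    """Combined estimate as the reliability-weighted average of the cues."""
    return sum(w * e for w, e in zip(mle_weights(variances), estimates))

# hypothetical surface-slant estimates (degrees) from binocular disparity
# and a monocular pictorial cue, with assumed noise variances
slant_binocular, slant_monocular = 30.0, 40.0

# vision-for-action: disparity treated as far more reliable (variance 1 vs. 9)
action_estimate = combine([slant_binocular, slant_monocular], [1.0, 9.0])

# perceptual judgment: the two cues weighted evenly (equal variances)
percept_estimate = combine([slant_binocular, slant_monocular], [4.0, 4.0])

# the action estimate sits closer to the binocular cue than the percept does
print(action_estimate, percept_estimate)
```

The design point is that the same combination rule produces different outputs simply because the two systems assign the cues different reliabilities, which is one way to read the Knill (2005) and Servos (2000) dissociations.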
Beyond primary visual cortex in the primate brain, visual information is
conveyed to a bewildering number of extrastriate areas (Van Essen, 2001). Despite the
complexity of the interconnections between these different areas, two broad “streams” of
projections from primary visual cortex have been identified in the macaque monkey brain:
a ventral stream projecting eventually to the inferotemporal cortex and a dorsal stream
projecting to the posterior parietal cortex (Ungerleider & Mishkin, 1982) (Figure 14.2).
Although some caution must be exercised in generalizing from monkey to human (Sereno
& Tootell, 2005), recent neuroimaging evidence suggests that the visual projections from
early visual areas to the temporal and parietal lobes in the human brain also involve a
separation into ventral and dorsal streams (Culham & Valyear, 2006; Grill-Spector &
Malach, 2004).
Traditional accounts of the division of labor between the two streams (e.g., Ungerleider &
Mishkin, 1982) focused on the distinction between object vision and spatial vision. This
distinction between what and where resonated not only with psychological accounts of
perception that emphasized the role of vision in object recognition and spatial attention,
but also with nearly a century of neurological thought about the functions of the temporal
and parietal lobes in vision (Brown & Schäfer, 1888; Ferrier & Yeo, 1884; Holmes, 1918).
In the early 1990s, however, the what-versus-where story (p. 278) began to unravel as new
evidence emerged from work with both monkeys and neurological patients. It became ap
parent that a purely perceptual account of ventral-dorsal function could not explain these
findings. The only way to make sense of them was to consider the nature of the outputs
served by the two streams—and to work out how visual information is eventually trans
formed into motor acts.
In 1992, Goodale and Milner proposed a reinterpretation of the Ungerleider and Mishkin
(1982) account of the functional distinction between the two visual streams. According to
the Goodale-Milner model, the dorsal stream plays a critical role in the real-time control
of action, transforming moment-to-moment information about the location and disposition
of objects into the coordinate frames of the effectors being used to perform the action
(Goodale & Milner, 1992; Milner & Goodale, 2006, 2008). The ventral stream (together
with associated cognitive networks outside the ventral stream) helps to construct the rich
and detailed representations of the world that allow us to identify objects and events, at
tach meaning and significance to them, and establish their causal relations. Such opera
tions are essential for accumulating and accessing a visual knowledge base about the
world. Thus, it is the ventral stream that provides the perceptual foundation for the of
fline control of action, projecting action into the future and incorporating stored informa
tion from the past into the control of current actions. In contrast, processing in the dorsal
stream does not generate visual percepts; it generates skilled actions (in part by modulat
ing more ancient visual-motor modules in the midbrain and brainstem; see Goodale,
1996).
Some of the most compelling evidence for the division of labor proposed by Goodale and
Milner (1992) has come from studies of the visual deficits observed in patients with dam
age to either the dorsal or ventral stream. It has been known for a long time, for example,
that patients with lesions in the dorsal stream, particularly lesions of the superior regions
of the posterior parietal cortex that invade the territory of the intraparietal sulcus and/or
the parieto-occipital sulcus, can have problems using vision to direct a grasp or aiming
movement toward the correct location of a visual target placed in different positions in
the visual field, particularly the peripheral visual field. This deficit is often described as
optic ataxia (following Bálint, 1909; Bálint & Harvey, 1995). But the failure to locate an
object with the hand should not be construed as a problem in spatial vision; many of
these patients, for example, can describe the relative position of the object in space quite
accurately, even though they cannot direct their hand toward it (Perenin & Vighetto,
1988). Moreover, sometimes the deficit will be seen in one hand but not the other. (It
should be pointed out, of course, that these patients typically have no difficulty using in
put from other sensory systems, such as proprioception or audition, to guide their move
ments.) Some of these patients are unable to use visual information to rotate their hand,
scale their grip, or configure their fingers properly when reaching out to pick up an ob
ject, even though they have no difficulty describing the orientation, size, or shape of ob
jects in that part of the visual field (Goodale et al., 1994; Jakobson, Archibald, Carey, &
Goodale, 1991; Figure 14.3A). Clearly, a “disorder of spatial vision” (Holmes, 1918;
Ungerleider & Mishkin, 1982) fails to capture this range of visual-motor impairments. Instead, this pattern of deficits suggests that the posterior parietal cortex plays a critical
role in the visual control of skilled actions (for a more detailed discussion, see Milner &
Goodale, 2006).
The opposite pattern of deficits and spared abilities can be seen in patients with visual
agnosia. Take the case of patient DF, who developed a profound visual form agnosia fol
lowing carbon monoxide poisoning (Goodale et al., 1991; Milner et al., 1991). Although
magnetic resonance imaging (MRI) showed evidence of diffuse damage consistent with
hypoxia, most of the damage was evident in ventrolateral regions of the occipital cortex,
with V1 remaining largely spared. Even though DF’s “low-level” visual abilities are rea
sonably intact, she can no longer recognize everyday objects or the faces of her friends
and relatives; nor can she identify even the simplest of geometric shapes. (If an object is
placed in her hand, of course, she has no trouble identifying it by touch.) Remarkably,
however, DF shows strikingly accurate guidance of her hand movements when she at
tempts to pick up the very objects she cannot identify. Thus, when she reaches out to
grasp objects of different sizes, her hand opens wider mid-flight for larger objects than it
does for smaller ones, just like it does in people with normal vision (Figure 14.3B). Simi
larly, she rotates her hand and wrist quite normally when she reaches out to grasp ob
jects in different orientations, and she places her fingers correctly on the surface of ob
jects with different shapes (Goodale et al., 1994). At the same time, she is quite unable to
distinguish between any of these objects when they are presented to her in simple dis
crimination tests. She even fails in manual “matching” tasks, in which she is
asked to show how wide an object is by opening her index finger and thumb a corre
sponding amount. DF’s spared visual-motor skills are not limited to grasping. She can
step over obstacles during locomotion as well as controls, even though her perceptual
judgments about the height of these obstacles are far from normal. Contrary to what
would be predicted from the what-versus-where hypothesis, then, a profound loss of form
perception coexists in DF with a preserved ability to use form in guiding a broad range of
actions. Such a dissociation, of course, is consistent with the idea that there are separate
neural pathways for transforming incoming visual information for the perceptual repre
sentation of the world and for the control of action. Presumably, it is the former and not
the latter that is compromised in DF (for more details, see Goodale & Milner, 2004;
Milner & Goodale, 2006).
But where exactly is the damage in DF’s brain? As already mentioned, an early structural
MRI showed evidence of extensive bilateral damage in the ventrolateral occipital cortex.
A more recent high-resolution MRI scan confirmed this damage but revealed that the le
sions were focused in a region of the lateral occipital cortex (area LO) that we now know
is involved in the visual recognition of objects, particularly their geometric structure
(James, Culham, Humphrey, Milner, & Goodale, 2003). It would appear that this selective
damage to area LO has disrupted DF’s ability to perceive the form of objects. These le
sions have not interfered with her ability to use visual information about form to shape
her hand when she reaches out and grasps objects—presumably because the visual-motor
networks in her dorsal stream are largely spared.
Since the original work on DF, other patients with ventral stream damage have been iden
tified who show strikingly similar dissociations between vision for perception and vision
for action. Patient SB, who suffered severe bilateral damage to his ventral stream early
in life, shows remarkably preserved visual-motor skills (he plays table tennis and can ride
a motorcycle) despite having profound deficits in his ability to identify objects, faces, col
ors, visual texture, and words (Dijkerman, Lê, Démonet, & Milner, 2004; Lê, Cardebat,
et al., 2002). Recently, another patient, who sustained bilateral damage to the ventral
stream following a stroke, was tested on several of the same tests that we gave to DF
more than a decade ago. Remarkably, this new patient (JS) behaved almost identically to
DF: in other words, despite his inability to perceive the shape and orientation of objects,
he was able to use these same object features to program and control grasping move
ments directed at those objects (Karnath, Rüter, Mandler, & Himmelbach, 2009). Finally,
it is worth noting that if one reads the early clinical reports of patients with visual form
agnosia, one can find a number of examples of what appear to be spared visual-motor
skills in the face of massive deficits in form perception. Thus, Campion (1987), for
example, reports that patient RC, who showed a profound visual form agnosia after car
bon monoxide poisoning, “could negotiate obstacles in the room, reach out to shake
hands and manipulate objects or [pick up] a cup of coffee.”
Thus, the pattern of visual deficits and spared abilities in DF (and in SB, JS, and other pa
tients with visual form agnosia) is in many ways the mirror image of that observed in the
optic ataxia patients described earlier. DF, who has damage in her ventral stream, can
reach out and grasp objects whose form and orientation she does not perceive, whereas
patients with optic ataxia, who have damage in their dorsal stream, are unable to use vision to guide their reaching or grasping movements to objects whose form and orienta
tion they perceive. This “double dissociation” cannot be easily accommodated within the
traditional what-versus-where account but is entirely consistent with the division of labor
between perception and action proposed by Goodale and Milner (1992). It should be not
ed that the perception–action model is also supported by a wealth of anatomical, electro
physiological, and lesion studies in the monkey too numerous to review here (for recent
reviews, see Andersen & Buneo, 2003; Cohen & Andersen, 2002; Milner & Goodale, 2006;
Tanaka, 2003). But perhaps some of the most convincing evidence for the perception–ac
tion proposal has come from functional magnetic resonance imaging (fMRI) studies of the
dorsal and ventral streams in the human brain.
Not surprisingly therefore, an fMRI investigation of activity in DF’s brain revealed no dif
ferential activation for line drawings of common objects (vs. scrambled versions) any
where in DF’s remaining ventral stream, mirroring her poor performance in identifying
the objects depicted in the drawings (James et al., 2003) (Figure 14.4). Again, this
strongly suggests that area LO is essential for form perception, generating the geometrical struc
ture of objects by combining information about edges and surfaces that has already been
extracted from the visual array by low-level visual feature detectors.
In addition to LO, other ventral stream areas have been identified that code for faces, hu
man body parts, and places or scenes (for review, see Milner & Goodale, 2006). There is,
however, a good deal of debate about whether these areas are really category specific
(e.g., Downing, Chan, Peelen, Dodds, & Kanwisher, 2006).
medially in the intraparietal sulcus (Culham, Cavina-Pratesi, & Singhal, 2006; Grefkes &
Fink, 2005; Pierrot-Deseilligny, Milea, & Muri, 2004). A region in the anterior part of the
intraparietal sulcus has been identified that is consistently activated when people reach
out and grasp visible objects in the scanner (Binkofski et al., 1998; Culham, 2004; Culham
et al., 2003). This area has been called human AIP (hAIP) because it is thought to be
homologous with a region in the anterior intraparietal sulcus of the monkey that has also been
implicated in the visual control of grasping (for review, see Sakata, 2003). Several more
recent studies have also shown that hAIP is differentially activated during visually guided
grasping (e.g., Cavina-Pratesi, Goodale, & Culham, 2007; Frey, Vinton, Norlund, &
Grafton, 2005). Importantly, area LO in the ventral stream is not activated when subjects
reach out and grasp objects (Cavina-Pratesi, Goodale, & Culham, 2007; Culham, 2004;
Culham et al., 2003), suggesting that this object-recognition area in the ventral stream is
not required for the programming and control of visually guided grasping and that hAIP
and associated networks in the posterior parietal cortex (in association with premotor
and motor areas) can do this independently. This conclusion is considerably strengthened
by the fact that patient DF, who has large bilateral lesions of area LO, shows robust differ
ential activation in area hAIP for grasping (compared with reaching), similar to that seen
in healthy subjects (James et al., 2003) (Figure 14.5).
There is preliminary fMRI evidence to suggest that when binocular information is avail
able for the control of grasping, dorsal stream structures can mediate this control with
out any additional activity in ventral stream areas such as LO. But when only monocular
vision is available, and reliance on pictorial cues becomes more critical, activity in
ventral stream areas appears to be recruited as well.
But what about the visual control of reaching? As we saw earlier, the lesions associated
with the misreaching that defines optic ataxia have been typically found in the posterior
parietal cortex, including the intraparietal sulcus, and sometimes extending into the infe
rior or superior parietal lobules (Perenin & Vighetto, 1988). More recent quantitative
analyses of the lesion sites associated with misreaching have revealed several key foci in
the parietal cortex, including the medial occipital-parietal junction, the superior occipital
gyrus, the intraparietal sulcus, and the superior parietal lobule as well as parts of the in
ferior parietal lobule (Karnath & Perenin, 2005). As it turns out, these lesion sites map
nicely onto the patterns of activation found in a recent fMRI study of visually guided
reaching that showed reach-related activation both in a medial part of the intraparietal
sulcus (near the intraparietal lesion site identified by Karnath and Perenin) and in the me
dial occipital-parietal junction (Prado, Clavagnier, Otzenberger, Scheiber, & Perenin,
2005). A more recent study found that the reach-related focus in the medial intraparietal
sulcus was equally active for reaches with and without visual feedback, whereas an area
in the superior parietal occipital cortex (SPOC) was particularly active when visual feed
back was available (Filimon, Nelson, Huang, & Sereno, 2009). This suggests that the me
dial intraparietal region may reflect proprioceptive more than visual control of reaching,
whereas the SPOC may be more involved in visual control. Both these areas have also
been implicated in the visual control of reaching in the monkey (Andersen & Buneo,
2003; Fattori, Gamberini, Kutz, & Galletti, 2001; Snyder, Batista, & Andersen, 1997).
There is evidence to suggest that SPOC may play a role in some aspects of grasping, par
ticularly wrist rotation (Grol et al., 2007; Monaco, Sedda, Fattori, Galletti, & Culham,
2009), a result that mirrors recent findings on homologous areas in the medial occipital-
parietal cortex of the monkey (Fattori, Breveglieri, Amoroso, & Galletti, 2004; Fattori et
al., 2010). But at the same time, it seems clear from the imaging data that more anterior
parts of the intraparietal sulcus, such as hAIP, play a unique role in visually guided grasp
ing and appear not to be involved in the visual control of reaching movements. Moreover,
patients with lesions of hAIP have deficits in grasping but retain the ability to reach to
ward objects (Binkofski et al., 1998), whereas other patients with lesions in more medial
and posterior areas of the parietal lobe, including the SPOC, show deficits in reaching but
not grip scaling (Cavina-Pratesi, Ietswaart, Humphreys, Lestou, & Milner, 2010). The
identification of areas in the human posterior parietal cortex for the visual control of
reaching that are anatomically distinct from those implicated in the visual control of
grasping, particularly the scaling of grip aperture, lends additional support to Jeannerod’s
(1981) proposal that the transport and grip components of reach-to-grasp movements are
programmed and controlled relatively independently. None of these observations, however, can be easily accommodated within the double-pointing hypothesis of Smeets and
Brenner (1999).
As mentioned earlier, not only are we adept at reaching out and grasping objects, but we
are also able to avoid obstacles that might potentially interfere with our reach. Although
to date there is no neuroimaging evidence about where in the brain the location of obsta
cles is coded, there is persuasive neuropsychological evidence for a dorsal stream locus
for this coding. Thus, unlike healthy control subjects, patients with optic ataxia from dor
sal stream lesions do not automatically alter the trajectory of their grasp to avoid obsta
cles located to the left and right of the path of their hand as they reach out to touch a tar
get beyond the obstacles—even though they certainly see the obstacles and can indicate
the midpoint between them (Schindler et al., 2004). Conversely, patient DF shows normal
avoidance of the obstacles in the same task, even though she is deficient at indicating the
midpoint between the two obstacles (Rice et al., 2006).
But where is the input to all these visual-motor areas in the dorsal stream coming from?
Although it is clear that V1 has prominent projections to the motion processing area MT
and other areas that provide input to dorsal stream networks, it has been known for a
long time that humans (and monkeys) with large bilateral lesions of V1 are still capable of
performing many visually guided actions despite being otherwise blind with respect to
the controlling stimuli (for review, see Milner & Goodale, 2006; (p. 283) Weiskrantz, 1997).
These residual visual abilities, termed “blindsight” by Sanders et al. (1974), presumably
depend on projections that must run outside of the geniculostriate pathway, such as those
going from the eye to the superior colliculus, the interlaminar layers of the dorsal lateral
geniculate nucleus, or even directly to the pulvinar (for review, see Cowey, 2010). Some of
these extra-geniculate projections may also reach visual-motor networks in the dorsal
stream. It has recently been demonstrated, for example, that a patient with a complete le
sion of V1 in the right hemisphere was still capable of avoiding obstacles in his blind left
hemifield while reaching out to touch a visual target in his sighted right field (Striemer,
Chapman, & Goodale, 2009). The avoidance of obstacles in this kind of task, as we have
already seen, appears to be mediated by visual-motor networks in the dorsal stream (Rice
et al., 2006; Schindler et al., 2004). Similarly, there is evidence that such patients show some grip scaling when they reach out and grasp objects placed in their blind field (Perenin & Rossetti, 1996). This residual ability also presumably depends on
dorsal stream networks that are being accessed by extra-geniculostriate pathways. There
is increasing evidence that projections from the superior colliculus to the pulvinar—and
from there to MT and area V3—may be the relay whereby visual inputs reach the visual-
motor networks in the dorsal stream (e.g., Lyon, Nassi, & Callaway, 2010). Some have
even suggested that a direct projection from the eye to the pulvinar—and then to MT—
might be responsible (Warner, Goldshmit, & Bourne, 2010). But whatever the pathways
might be, it is clear that the visual-motor networks in the dorsal stream that are known to mediate grasping and obstacle avoidance during reaching are receiving visual input that bypasses V1.
In summary, the neuropsychological and neuroimaging data that have been amassed over
the past 25 years suggest that vision for action and vision for perception depend on dif
ferent and relatively independent visual pathways in the primate brain. In short, the visu
al signals that give us the percept of our coffee cup sitting on the breakfast table are not the same ones that guide our hand as we reach out to pick it up!
Although I have focused almost entirely on the role of the dorsal stream in the control of
action, it is important to emphasize that the posterior parietal cortex also plays a critical
role in the deployment of attention, as well as in other high-level cognitive tasks, such as
numeracy and working memory. Even so, a strong argument can be made that these func
tions of the dorsal stream (and associated networks in premotor cortex and more inferior
parietal areas) grew out of a pivotal role that the dorsal stream plays in the control of eye
movements and goal-directed limb movements (for more on these issues, see Moore,
2006; Nieder & Dehaene, 2009; Rizzolatti & Craighero, 1998; Rizzolatti, Riggio, Dascola, & Umiltà, 1987).
Although the evidence from a broad range of empirical studies points to the fact that
there are two relatively independent visual pathways in the primate cerebral cortex, the
question remains as to why two separate systems evolved in the first place. Why couldn’t
one “general purpose” visual system handle both vision for perception and vision for ac
tion? The answer to this question lies in the differences in the computational require
ments of vision for perception on the one hand and vision for action on the other. Consid
er the coffee cup example introduced earlier. To be able to grasp the cup successfully, the
visual-motor system has to deal with the actual size of the cup and its orientation and po
sition with respect to the hand you intend to use to pick it up. These computations need
to reflect the real metrics of the world, or at the very least, make use of learned “look-up
tables” that link neurons coding a particular set of sensory inputs with neurons that code
the desired state of the limb (Thaler & Goodale, 2010). The time at which these computa
tions are performed is equally critical. Observers and goal objects rarely stay in a static
relationship with one another and, as a consequence, the egocentric location of a target
object can often change radically from moment to moment. In other words, the required
coordinates for action need to be computed at the very moment the movements are per
formed.
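The "look-up table" idea mentioned above can be caricatured in a few lines of Python: a learned mapping links a coded sensory state directly to a desired limb state, with no general-purpose representation in between. The coding scheme, table entries, and values below are all invented for illustration; they are not part of Thaler and Goodale's (2010) proposal.

```python
# Hypothetical illustration of a learned sensory-to-motor "look-up table":
# neurons coding a particular sensory input are linked directly to neurons
# coding a desired limb state, bypassing any general-purpose representation.

def code_input(target_azimuth_deg, target_distance_mm):
    # Discretize the sensory state (a coarse stand-in for a population code).
    return (round(target_azimuth_deg / 10), round(target_distance_mm / 100))

# Entries would be acquired through motor learning; these are made up.
lookup = {
    (0, 4): "reach straight ahead, 400 mm",
    (2, 3): "reach 20 deg right, 300 mm",
}

print(lookup[code_input(-3, 420)])  # reach straight ahead, 400 mm
```

The point of the caricature is that the mapping is indexed by the current sensory state at movement onset, which is why such computations must be run at the moment of action rather than stored in advance.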
In contrast to vision for action, vision for perception does not need to deal with the ab
solute size of objects or their egocentric locations. In fact, very often such computations
would be counterproductive because our viewpoint with respect to objects does not re
main constant—even though our perceptual representations of those objects do show con
stancy. Indeed, one can argue that it would be better to encode the size, orientation, and
location of objects relative to each other. Such a scene-based frame of reference permits
a perceptual representation of objects that transcends particular viewpoints, (p. 284)
while preserving information about spatial relationships (as well as relative size and ori
entation) as the observer moves around. The products of perception also need to be avail
able over a much longer time scale than the visual information used in the control of ac
tion. We may need to recognize objects we have seen minutes, hours, days—or even years
before. To achieve this, the coding of the visual information has to be somewhat abstract—transcending particular viewpoints and viewing conditions. By working with perceptual
representations that are object or scene based, we are able to maintain the constancies of
size, shape, color, lightness, and relative location, over time and across different viewing
conditions. Although there is much debate about the way in which this information is coded, it is pretty clear that it is the identity of the object and its location within the scene,
not its disposition with respect to the observer, that is of primary concern to the percep
tual system. In fact, current perception, combined with stored information about previ
ously encountered objects, not only facilitates object recognition but also contributes
to the control of goal-directed movements when we are working in offline mode (i.e., con
trolling our movements, not in real time, but rather on the basis of the memory of goal
objects that are no longer visible and their remembered locations in the world).
The differences in the metrics and frames of reference used by vision for perception and
vision for action have been demonstrated in normal observers in experiments that have
made use of pictorial illusions, particularly size-contrast illusions. Aglioti, DeSouza, and
Goodale (1995), for example, showed that the scaling of grip aperture in flight was re
markably insensitive to the Ebbinghaus illusion, in which a target disk surrounded by
smaller circles appears to be larger than the same disk surrounded by larger circles. They
found that maximum grip aperture was scaled to the real, not the apparent, size of the
target disk (Figure 14.6). A similar dissociation between grip scaling and perceived size
was reported by Haffenden and Goodale (1998), under conditions where participants had
no visual feedback during the execution of grasping movements made to targets present
ed in the context of an Ebbinghaus illusion. Although grip scaling escaped the influence
of the illusion, the illusion did affect (p. 285) performance in a manual matching task, a
kind of perceptual report, in which participants were asked to open their index finger and
thumb to indicate the perceived size of a disk. (This measure is akin to the typical magni
tude estimation paradigms used in conventional psychophysics, but with the virtue that
the manual estimation makes use of the same effector that is used in the grasping task.)
To summarize, then, the aperture between the finger and thumb was resistant to the illu
sion when the vision-for-action system was engaged (i.e., when the participant grasped
the target) and sensitive to the illusion when the vision-for-perception system was en
gaged (i.e., when the participant estimated its size).
This dissociation between what people do and what they say they see underscores the dif
ferences between vision for action and vision for perception. The obligatory size-contrast
effects that give rise to the illusion (in which different elements of the array are com
pared) presumably play a crucial role in scene interpretation, a central function of vision
for perception. But the execution of a goal-directed act, such as manual prehension, re
quires computations that are centered on the target itself, rather than on the relations be
tween the target and other elements in the scene. In fact, the true size of the target for
calibrating the grip can be computed from the retinal-image size of the object coupled
with an accurate estimate of distance. Computations of this kind, which do not take into
account the relative difference in size between different objects in the scene, would be
expected to be quite insensitive to the kinds of pictorial cues that distort perception when
familiar illusions are presented.
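The target-centered computation described above reduces to simple trigonometry: given the visual angle an object subtends on the retina and an estimate of its distance, physical size follows directly, with no reference to other elements in the scene. The function and numbers below are a minimal illustrative sketch, not a model of the underlying neural computation.

```python
import math

def physical_size_mm(visual_angle_deg, distance_mm):
    """Recover physical size from retinal-image size (visual angle)
    plus an estimate of viewing distance. Illustrative only."""
    return 2 * distance_mm * math.tan(math.radians(visual_angle_deg) / 2)

# A 30-mm disk viewed at 500 mm subtends about 3.4 degrees of visual angle.
angle = 2 * math.degrees(math.atan(15 / 500))
print(round(physical_size_mm(angle, 500), 1))  # 30.0

# The same retinal angle combined with a misjudged distance yields a
# different size, which is why an accurate distance estimate matters
# for calibrating the grip.
print(round(physical_size_mm(angle, 600), 1))  # 36.0
```

Because nothing in this computation compares the target with its neighbors, flanking elements that drive a size-contrast illusion simply never enter into it.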
The initial demonstration by Aglioti et al. (1995) that grasping is refractory to the Ebbing
haus illusion engendered a good deal of interest among researchers studying vision and
motor control—and there have been numerous investigations of the effects (or not) of pic
Page 19 of 40
Visual Control of Action
torial illusions on visual-motor control. Some investigators have replicated the original
observations of Aglioti et al. with the Ebbinghaus illusion (e.g., Amazeen & DaSilva, 2005;
Fischer, 2001; Kwok & Braddick, 2003)—and others have observed a similar insensitivity
of grip scaling to the Ponzo illusion (Brenner & Smeets, 1996; Jackson & Shaw, 2000), the
horizontal-vertical illusion (Servos, Carnahan, & Fedwick, 2000), the Müller-Lyer illusion
(Dewar & Carey, 2006), and the diagonal illusion (Stöttinger & Perner, 2006; Stöttinger,
Soder, Pfusterschmied, Wagner, & Perner, 2010). Others have reported that pictorial illu
sions affect some aspects of motor control but not others (e.g., Biegstraaten et al., 2007;
Daprati & Gentilucci, 1997; Gentilucci et al., 1996; Glazebrook, de Grave, Brenner, &
Smeets, 2005; van Donkelaar, 1999). And a few investigators have found no dissociation
whatsoever between the effects of pictorial illusions on perceptual judgments and the
scaling of grip aperture (e.g., Franz et al., 2000; Franz, Bülthoff & Fahle, 2003).
Demonstrating that actions such as grasping are sometimes sensitive to illusory displays
is not by itself a refutation of the idea of two visual systems. One should not be surprised
that visual perception and visual-motor control can interact in the normal brain. Ultimate
ly, after all, perception has to affect our actions or the brain mechanisms mediating per
ception would never have evolved! The real surprise, at least for monolithic accounts of
vision, is that there are clear instances when visually guided action is apparently unaf
fected by pictorial illusions, which, by definition, affect perception. But from the stand
point of the duplex perception–action model, such instances are to be expected (see
Goodale, 2008; Milner & Goodale, 2006, 2008). Nevertheless, the fact that action has
been found to be affected by pictorial illusions in some instances has led a number of au
thors to argue that the earlier studies demonstrating a dissociation had not adequately
matched action and perception tasks for various input, attentional, and output demands
(e.g., Smeets & Brenner, 2001; Vishton & Fabre, 2003)—and that when these factors are
taken into account, the apparent differences between perceptual judgments and motor
control could be resolved without invoking the idea of two visual systems. Other authors,
notably Glover (2004), have argued that action tasks involve multiple stages of processing
from purely perceptual to more “automatic” visual-motor control. According to his plan
ning/control model, illusions would be expected to affect the early but not the late stages of a grasping movement (Glover, 2004; Glover & Dixon, 2001a, 2001b).
Some of these competing accounts, such as Glover’s (2004) planning/control model, can
be viewed simply as modifications of the original perception–action model, but there are a
number of other studies in which the results cannot easily be reconciled with the two vi
sual systems model, and it remains a real question as to why actions appear to be sensi
tive to illusions in some experiments but not in others. But as it turns out, there are sever
al reasons why grip aperture might appear (p. 286) to be sensitive to illusions under cer
tain testing conditions—even when it is not. In some cases, notably the Ebbinghaus illu
sion, the flanker elements can be treated as obstacles, influencing the posture of the fin
gers during the execution of the grasp (de Grave et al., 2005; Haffenden, Schiff, &
Goodale, 2001; Plodowski & Jackson, 2001). In other words, the apparent effect of the il
lusion on grip scaling in some experiments might simply reflect the operation of visual-
motor mechanisms that treat the flanker elements of the visual arrays as obstacles to be
avoided. Another critical variable is the timing of the grasp with respect to the presenta
tion of the stimuli. When targets are visible during the programming of a grasping move
ment, maximum grip aperture is usually not affected by size-contrast illusions, whereas
when vision is occluded before the command to initiate programming of the movement is
presented, a reliable effect of the illusion on grip aperture is typically observed (West
wood, Heath, & Roy, 2000; Westwood & Goodale, 2003; Fischer, 2001; Hu & Goodale,
2000). As discussed earlier, vision for action is designed to operate in real time and is not
normally engaged unless the target object is visible during the programming phase, when
(bottom-up) visual information can be immediately converted into the appropriate motor
commands. The observation that (top-down) memory-guided grasping is affected by the il
lusory display reflects the fact that the stored information about the target’s dimensions
was originally derived from the earlier operation of vision for perception (for a more de
tailed discussion of these and related issues, see Bruno, Bernadis, & Gentilucci, 2008;
Goodale, Westwood, & Milner, 2004).
Nevertheless, some have argued that if the perceptual and grasping tasks are appropri
ately matched, then grasping can be shown to be as sensitive to size-contrast illusions as
psychophysical judgments (Franz, 2001; Franz et al., 2000). Although this explanation, at least on the face of it, is a compelling one, it cannot explain why Aglioti et al. (1995) and
Haffenden and Goodale (1998) found that when the relative sizes of the two target ob
jects in the Ebbinghaus display were adjusted so that they appeared to be perceptually
identical, the grip aperture that participants used to pick up the two targets continued to
reflect the physical difference in their size.
Experiments by Ganel, Tanzer, and Goodale (2008b) provide evidence that is even more
difficult to explain away by appealing to a failure to match testing conditions and other
task-related variables. In this experiment, which used a version of the Ponzo illusion, a re
al difference in size was pitted against a perceived difference in size in the opposite direc
tion (Figure 14.7). The results were remarkably clear. Despite the fact that people be
lieved that the shorter object was the longer one (or vice versa), their in-flight grip aper
ture reflected the real, not the illusory, size of the target objects (Figure 14.8). In other
words, on the same trials in which participants erroneously decided that one object was
the longer (or shorter) of the two, the anticipatory opening between their fingers reflect
ed the real direction and magnitude of size differences between the two objects. More
over, the subjects in this experiment showed the same differential scaling to the real size
of the objects whether the objects were shown on the illusory display or on the control
display. Not surprisingly, when subjects were asked to use their finger and thumb to esti
mate the size of the target objects rather than pick them up, their manual estimates re
flected the apparent, not the real, size of the targets. Overall, these results underscore
once more the profound difference in the way visual information is transformed for action
and perception. Importantly, too, the results are difficult to reconcile with any argument
that suggests that grip aperture is sensitive to illusions, and that the absence of an effect
found in many studies is simply a consequence of differences in the task demands (Franz,
2001; Franz et al., 2000).
Of course, this finding (as well as the fact that actions are often resistant to size-contrast
illusions) fits well with Smeets and Brenner’s (1999, 2001) double-pointing hypothesis.
They would (p. 287) argue that the visual-motor system does not compute the size of the
object but instead computes the two locations on the surface of the object where the digits
will be placed. According to their double-pointing hypothesis, size is irrelevant to the
planning of these trajectories, and thus variation in size will not affect the accuracy with
which the finger and thumb are placed on either side of the target (p. 288) object. In short,
Weber’s law is essentially irrelevant (Smeets & Brenner, 2008). The same argument ap
plies to grasping movements made in the context of size-contrast illusions: Because grip
scaling is simply an epiphenomenon of the independent finger trajectories, grip aperture
seems to be impervious to the effects of the illusion. Although, as discussed earlier,
Smeets and Brenner’s account has been challenged, it has to be acknowledged that their
double-pointing model offers a convincing explanation of all these findings (even if the
neuropsychological and neuroimaging data are more consistent with Jeannerod’s, 1981,
two-visual-motor channel account of reach-to-grasp movements).
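The role Weber's law plays in this argument can be made concrete with a toy simulation: under Weber's law, variability in a response grows in proportion to object size, whereas real-time grip scaling is reported to show roughly constant variability across sizes. The Weber fraction, noise values, and safety margin below are invented for illustration; this is not a fit to any data.

```python
import random

random.seed(1)

WEBER_FRACTION = 0.05  # hypothetical 5% Weber fraction for perceived length

def perceptual_estimate(length_mm):
    # Perceptual judgments: noise scales with magnitude (Weber's law).
    return random.gauss(length_mm, WEBER_FRACTION * length_mm)

def real_time_grip(length_mm):
    # Real-time grip aperture: variability roughly constant across sizes,
    # i.e., the Weber's-law-violating pattern described in the text.
    return random.gauss(length_mm + 20, 2.0)  # fixed noise, 20-mm margin

def spread(fn, length_mm, n=4000):
    xs = [fn(length_mm) for _ in range(n)]
    m = sum(xs) / n
    return (sum((x - m) ** 2 for x in xs) / n) ** 0.5

# Variability grows with size for the perceptual measure...
print(spread(perceptual_estimate, 20), spread(perceptual_estimate, 80))
# ...but stays flat for the simulated real-time grasp.
print(spread(real_time_grip, 20), spread(real_time_grip, 80))
```

Running this shows the perceptual spread roughly quadrupling as length goes from 20 to 80 mm while the grip spread stays near 2 mm, which is the shape of the dissociation at issue.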
Even so, there are some behavioral observations that cannot be so easily accommodated by the Smeets and Brenner (1999, 2001) model. For example, as discussed earlier, if a delay
is introduced between viewing the target and initiating the grasp, the scaling of the antic
ipatory grip aperture is much more likely to be sensitive to size-contrast illusions (Fisch
er, 2001; Hu & Goodale, 2000; Westwood & Goodale, 2003; Westwood, Heath, & Roy,
2000). Moreover, if a similar delay is introduced in the context of the Ganel et al. (2008a)
experiments just described, grip aperture now obeys Weber’s law. These results cannot
be easily explained by the Smeets and Brenner model without conceding that—with delay
—grip scaling is no longer a consequence of programming individual digit trajectories,
but instead reflects the perceived size of the target object. Nor can the Smeets and Bren
ner model explain what happens when unpracticed finger postures (e.g., the thumb and
ring fingers) are used to pick up objects in the context of a size-contrast illusion. In con
trast to skilled grasping movements, grip scaling with unpracticed awkward grasping is
quite sensitive to the illusory difference in size between the objects (Gonzalez, Ganel,
Whitwell, Morrissey, & Goodale, 2008). Only with practice does grip aperture begin to re
flect the real size of the target objects. Smeets and Brenner’s model cannot account for
this result without positing that individual control over the digits occurs only after prac
tice. Finally, recent neuropsychological findings with patient DF suggest that she could
use action-related information about object size (presumably from her intact dorsal
stream) to make explicit judgments regarding the length of an object that she was about
to pick up (Schenk & Milner, 2006), suggesting that size, rather than two separate loca
tions, is implicitly coded during grasping. At this point, the difference between the per
ception–action model (Goodale & Milner, 1992; Milner & Goodale, 2006) and the (modi
fied) Smeets and Brenner account begins to blur. Both accounts posit that real-time con
trol of skilled grasping depends on visual-motor transformations that are quite distinct
from those involved in the control of delayed or unpracticed grasping movements. The
difference in the two accounts turns on the nature of the control exercised over skilled
movements performed in real time. But note that even if Smeets and Brenner are correct
that the trajectories of the individual digits are programmed individually on the basis of
spatial information that ignores the size of the object, this would not obviate the idea of
two visual systems, one for constructing our perception of the world and one for control
ling our actions in that world. Indeed, the virtue of the perception–action model is that it
accounts not only for the dissociations outlined above between the control of action and
psychophysical report in normal observers in a number of different settings, but it also
accounts for a broad range of neuropsychological, neurophysiological, and neuroimaging
data (and is completely consistent with Jeannerod’s dual-channel model of reach-to-grasp
movements).
It is worth noting that dissociations between perceptual report and action have been re
ported for other classes of responses as well. For example, Tavassoli and Ringach (2010)
found that eye movements in a visual tracking task responded to fluctuations in the veloc
ity of the moving target that were perceptually invisible to the subjects. Moreover, the
perceptual errors were independent of the accuracy of the pursuit eye movements. These
results are in conflict with the idea that the motor control of pursuit eye movements and perception are based on the same motion signals and are affected by shared sources
of noise. Similar dissociations have been observed between saccadic eye movements and
perceptual report. Thus, by exploiting the illusory mislocalization of a flashed target in
duced by visual motion, de’Sperati and Baud-Bovy (2008) showed that fast but not slow
saccades escaped the effects of the illusion and were directed to the real rather than the
apparent location of the target. This result underscores the fact that the control of action
often depends on processes that unfold much more rapidly than those involved in perceptual processing (e.g., Castiello & Jeannerod, 1991). Indeed, as has already been discussed, visual-motor control may often be mediated by fast feedforward mechanisms, in
contrast to conscious perception, which requires (slower) feedback to earlier visual ar
eas, including V1 (Lamme, 2001).
When the idea of a separate vision-for-action system was first proposed 20 years ago, (p. 289) the emphasis was on the independence of this system from vision for perception. But
clearly the two systems must work closely together in the generation of purposive behav
ior. One way to think about the interaction between the two streams (an interaction that
takes advantage of the complementary differences in their computational constraints) is
in terms of a “tele-assistance” model (Goodale & Humphrey, 1998). In tele-assistance, a
human operator who has identified a goal object and decided what to do with it communi
cates with a semi-autonomous robot that actually performs the required motor act on the
flagged goal object (Pook & Ballard, 1996). In terms of this tele-assistance metaphor, the
perceptual machinery in the ventral stream, with its rich and detailed representations of
the visual scene (and links with cognitive systems), would be the human operator.
Processes in the ventral stream participate in the identification of a particular goal and
flag the relevant object in the scene, perhaps by means of an attention-like process. Once
a particular goal object has been flagged, dedicated visual-motor networks in the dorsal
stream (in conjunction with related circuits in premotor cortex, basal ganglia, and brain
stem) are then activated to transform the visual information about the object into the ap
propriate coordinates for the desired motor act. This means that in many instances a
flagged object in the scene will be processed in parallel by both ventral and dorsal stream
mechanisms—each transforming the visual information in the array for different purpos
es. In other situations, where the visual stimuli are particularly salient, visual-motor
mechanisms in the dorsal stream will operate without any immediate supervision by ven
tral stream perceptual mechanisms.
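The division of labor in the tele-assistance metaphor can be sketched as two cooperating modules: an "operator" that identifies and flags the goal, and a "robot" that turns the flagged object's metrics into movement parameters. This is a cartoon of the metaphor only; the class names, the 20-mm margin, and the object values are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SceneObject:
    name: str
    width_mm: float
    distance_mm: float

class VentralOperator:
    """The 'human operator': recognizes objects and flags the current goal."""
    def flag_goal(self, scene, wanted):
        return next(obj for obj in scene if obj.name == wanted)

class DorsalRobot:
    """The 'semi-autonomous robot': converts the flagged object's metrics
    into egocentric movement parameters at the moment of action."""
    def grasp(self, obj):
        return {
            "target": obj.name,
            "max_grip_aperture_mm": obj.width_mm + 20,  # hypothetical margin
            "reach_distance_mm": obj.distance_mm,
        }

scene = [SceneObject("coffee cup", 80, 400), SceneObject("spoon", 15, 350)]
goal = VentralOperator().flag_goal(scene, "coffee cup")
plan = DorsalRobot().grasp(goal)
print(plan)  # {'target': 'coffee cup', 'max_grip_aperture_mm': 100, 'reach_distance_mm': 400}
```

Casting the interaction this way makes the key claim of the metaphor explicit: the operator never computes movement metrics, and the robot never identifies objects.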
Of course, the tele-assistance analogy is far too simplified. For one thing, the ventral
stream by itself cannot be construed as an intelligent operator that can make assess
ments and plans. Clearly, there has to be some sort of top-down executive control—almost
certainly engaging prefrontal mechanisms—that can initiate the operation of attentional
search and thus set the whole process of planning and goal selection in motion (for re
view, see Desimone & Duncan, 1995; Goodale & Haffenden, 2003). Reciprocal interac
tions between prefrontal/premotor areas and the areas in the posterior parietal cortex un
doubtedly play a critical role in recruiting specialized dorsal stream structures, such as
LIP, which appear to be involved in the control of both voluntary eye movements and
covert shifts of spatial attention in monkeys and humans (Bisley & Goldberg, 2010; Cor
betta, Kincade, & Shulman, 2002). In terms of the tele-assistance metaphor, area LIP can
be seen as acting like a video camera on the robot scanning the visual scene, and thereby
providing new inputs that the ventral stream can process and pass on to frontal systems
that assess their potential importance. In practice, of course, the video camera/LIP sys
tem does not scan the environment randomly: It is constrained to a greater or lesser degree by top-down information about the nature of the potential targets and where those
targets might be located, information that reflects the priorities of the operator/organism
that are presumably elaborated in prefrontal systems.
What happens next goes beyond even these speculations. Before instructions can be transmitted to the visual-motor control systems in the dorsal stream, the nature of the action required needs to be determined. This means that praxis systems, perhaps located in the left hemisphere, need to “instruct” the relevant visual-motor systems. After all, objects such as tools demand a particular kind of hand posture. Achieving this not only requires that the tool be identified, presumably using ventral stream mechanisms (Valyear & Culham, 2010), but also that the actions required to achieve that posture be selected via a link to these praxis systems. At the same time, the ventral stream (and related cognitive apparatus) has to communicate the locus of the goal object to these visual-motor systems in the dorsal stream. One way this ventral-dorsal transmission could happen is via recurrent projections from foci of activity in the ventral stream back downstream to primary visual cortex and other adjacent visual areas. Once a target has been “highlighted” on these retinotopic maps, its location could then be forwarded to the dorsal stream for action (for a version of this idea, see Lamme & Roelfsema, 2000). Moreover, LIP itself, by virtue of the fact that it would be “pointing” at the goal object, could also provide the requisite coordinates, once it has been cued by recognition systems in the ventral stream.
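The division of labor speculated about here can be caricatured in a few lines of code. This is purely an illustrative sketch of the hypothesized information flow: the scene representation, the grasp lookup table, and every function name below are invented for exposition and correspond to no model in the chapter.

```python
# Caricature of the hypothesized sequence: ventral identification of the goal,
# "highlighting" of its location on retinotopic maps, praxis-based selection
# of a functional grasp, and dorsal programming of the movement metrics.
# All names and values are illustrative assumptions.

def ventral_identify(scene):
    """Ventral stream: recognize objects and flag the goal (here, a 'tool')."""
    return next(obj for obj in scene if obj["kind"] == "tool")

def highlight_on_retinotopic_map(goal):
    """Recurrent feedback marks the goal's location on early visual maps."""
    return goal["location"]  # retinotopic coordinates forwarded for action

def praxis_select_grasp(goal):
    """Praxis systems: choose a functional hand posture for the identified tool."""
    postures = {"hammer": "power grip", "screwdriver": "precision grip"}
    return postures.get(goal["name"], "default grip")

def dorsal_program_action(location, grasp):
    """Dorsal stream: program the metrics of the reach to the cued location."""
    return {"reach_to": location, "grasp": grasp}

scene = [{"kind": "distractor", "name": "cup", "location": (3, 1)},
         {"kind": "tool", "name": "hammer", "location": (7, 4)}]
goal = ventral_identify(scene)
action = dorsal_program_action(highlight_on_retinotopic_map(goal),
                               praxis_select_grasp(goal))
print(action)  # {'reach_to': (7, 4), 'grasp': 'power grip'}
```

Note that even this toy pipeline is serial, whereas the text argues the real system is iterative; the sketch only fixes the proposed direction of each exchange, not its timing.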
When the particular disposition and location of the object with respect to the actor have been computed, that information has to be combined with the postural requirements of the appropriate functional grasp for the tool, requirements that, as I have already suggested, are presumably provided by praxis systems that are in turn cued by recognition mechanisms in the ventral stream. At the same time, the initial fingertip forces that should be applied to the tool (or any object, for that matter) are based on estimations of its mass, surface friction, and compliance that are derived from visual information (e.g., Gordon, Westling, Cole, & Johansson, 1993). Once contact is made, somatosensory information can be used to fine-tune the applied forces—but the specification of the initial grip and lift forces must be derived from learned associations between the object’s visual appearance and prior experience with similar objects or materials (Buckingham, Cant, & Goodale, 2009). This information presumably can be provided only by the ventral visual stream in conjunction with stored information about past interactions.
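The two-stage scheme just described, feedforward specification of grip forces from visually derived material estimates followed by somatosensory fine-tuning after contact, can be sketched as follows. The material table, safety margin, and gain are invented numbers for illustration, not empirical values from the studies cited.

```python
# Toy sketch: initial grip force is predicted from learned visual-material
# associations (feedforward); somatosensory slip signals correct it (feedback).
# The priors table and all parameters are illustrative assumptions.

# Learned associations: material category -> (estimated mass in kg,
# friction coefficient), standing in for prior experience with similar objects.
MATERIAL_PRIORS = {"metal": (1.5, 0.4), "wood": (0.6, 0.5), "foam": (0.05, 0.8)}
G = 9.81  # gravitational acceleration, m/s^2

def initial_grip_force(material, safety_margin=1.3):
    """Feedforward: specify grip force from visually estimated mass/friction."""
    mass, friction = MATERIAL_PRIORS[material]
    load = mass * G                               # expected load force (N)
    return safety_margin * load / (2 * friction)  # two-digit grip, with margin

def fine_tune(grip, slip_detected, gain=1.2):
    """Feedback: after contact, a detected slip scales the grip force up."""
    return grip * gain if slip_detected else grip

grip = initial_grip_force("wood")       # specified before contact
grip = fine_tune(grip, slip_detected=False)  # adjusted once contact is made
```

The point of the caricature is the order of operations: the feedforward estimate must exist before contact, which is why it can only come from stored visual associations of the kind the ventral stream is proposed to supply.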
Again, it must be emphasized that all of this is highly speculative. Nevertheless, whatever complex interactions might be involved, it is clear that goal-directed action is unlikely to be mediated by a simple serial processing system. Multiple iterative processing is almost certainly required, involving a constant interplay among different control systems at different levels of processing (for a more detailed discussion of these and related issues, see Milner & Goodale, 2006). A full understanding of the contrasting (and complementary) roles of the ventral and dorsal streams in this complex network will come only when we can specify the neural and functional interconnections between the two streams (and other brain areas) and the nature of the information they exchange.
References
Aglioti, S., DeSouza, J., & Goodale, M. A. (1995). Size-contrast illusions deceive the eyes but not the hand. Current Biology, 5, 679–685.
Amazeen, E. L., & DaSilva, F. (2005). Psychophysical test for the independence of perception and action. Journal of Experimental Psychology: Human Perception and Performance, 31, 170–182.
Andersen, R. A., & Buneo, C. A. (2003). Sensorimotor integration in posterior parietal cortex. Advances in Neurology, 93, 159–177.
Ballard, D. H., Hayhoe, M. M., Li, F., & Whitehead, S. D. (1992). Hand–eye coordination during sequential tasks. Philosophical Transactions of the Royal Society London B Biological Sciences, 337, 331–338.
Bálint, R. (1909). Seelenlähmung des “Schauens,” optische Ataxie, räumliche Störung der Aufmerksamkeit. Monatsschrift für Psychiatrie und Neurologie, 25, 51–81.
Bálint, R., & Harvey, M. (1995). Psychic paralysis of gaze, optic ataxia, and spatial disorder of attention. Cognitive Neuropsychology, 12, 265–281.
Baldauf, D., & Deubel, H. (2010). Attentional landscapes in reaching and grasping. Vision Research, 50, 999–1013.
Biegstraaten, M., de Grave, D. D. J., Brenner, E., & Smeets, J. B. J. (2007). Grasping the Müller-Lyer illusion: Not a change in perceived length. Experimental Brain Research, 176, 497–503.
Binkofski, F., Dohle, C., Posse, S., Stephan, K. M., Hefter, H., Seitz, R. J., & Freund, H. J. (1998). Human anterior intraparietal area subserves prehension: A combined lesion and functional MRI activation study. Neurology, 50, 1253–1259.
Binsted, G., Chua, R., Helsen, W., & Elliott, D. (2001). Eye–hand coordination in goal-directed aiming. Human Movement Sciences, 20, 563–585.
Bisley, J. W., & Goldberg, M. E. (2010). Attention, intention, and priority in the parietal lobe. Annual Review of Neuroscience, 33, 1–21.
Brenner, E., & Smeets, J. B. (1996). Size illusion influences how we lift but not how we grasp an object. Experimental Brain Research, 111, 473–476.
Brouwer, A. M., Franz, V. H., & Gegenfurtner, K. R. (2009). Differences in fixations between grasping and viewing objects. Journal of Vision, 9(18), 1–24.
Brown, S., & Schäfer, E. A. (1888). An investigation into the functions of the occipital and temporal lobes of the monkey’s brain. Philosophical Transactions of the Royal Society of London, 179, 303–327.
Bruno, N., Bernardis, P., & Gentilucci, M. (2008). Visually guided pointing, the Müller-Lyer illusion, and the functional interpretation of the dorsal-ventral split: Conclusions from 33 independent studies. Neuroscience and Biobehavioral Reviews, 32, 423–437.
Buckingham, G., Cant, J. S., & Goodale, M. A. (2009). Living in a material world: How visual cues to material properties affect the way that we lift objects and perceive their weight. Journal of Neurophysiology, 102, 3111–3118.
Cant, J. S., Arnott, S. R., & Goodale, M. A. (2009). fMR-adaptation reveals separate processing regions for the perception of form and texture in the human ventral stream. Experimental Brain Research, 192, 391–405.
Cant, J. S., & Goodale, M. A. (2007). Attention to form or surface properties modulates different regions of human occipitotemporal cortex. Cerebral Cortex, 17, 713–731.
Castiello, U. (2001). The effects of abrupt onset of 2-D and 3-D distractors on prehension movements. Perception and Psychophysics, 63, 1014–1025.
Cavina-Pratesi, C., Goodale, M. A., & Culham, J. C. (2007). fMRI reveals a dissociation between grasping and perceiving the size of real 3D objects. PLoS One, 2, e424.
Cavina-Pratesi, C., Ietswaart, M., Humphreys, G. W., Lestou, V., & Milner, A. D. (2010). Impaired grasping in a patient with optic ataxia: Primary visuomotor deficit or secondary consequence of misreaching? Neuropsychologia, 48, 226–234.
Chapman, C. S., & Goodale, M. A. (2008). Missing in action: The effect of obstacle position and size on avoidance while reaching. Experimental Brain Research, 191, 83–97.
Chapman, C. S., & Goodale, M. A. (2010). Seeing all the obstacles in your way: The effect of visual feedback and visual feedback schedule on obstacle avoidance while reaching. Experimental Brain Research, 202, 363–375.
Chieffi, S., & Gentilucci, M. (1993). Coordination between the transport and the grasp components during prehension movements. Experimental Brain Research, 94, 471–477.
Cohen, Y. E., & Andersen, R. A. (2002). A common reference frame for movement plans in the posterior parietal cortex. Nature Reviews Neuroscience, 3, 553–562.
Corbetta, M., Kincade, M. J., & Shulman, G. L. (2002). Two neural systems for visual orienting and the pathophysiology of unilateral spatial neglect. In H.-O. Karnath, A. D. Milner, & G. Vallar (Eds.), The cognitive and neural bases of spatial neglect (pp. 259–273). Oxford, UK: Oxford University Press.
Cowey, A. (2010). The blindsight saga. Experimental Brain Research, 200, 3–24.
Cuijpers, R. H., Smeets, J. B., & Brenner, E. (2004). On the relation between object shape and grasping kinematics. Journal of Neurophysiology, 91, 2598–2606.
Culham, J. C. (2004). Human brain imaging reveals a parietal area specialized for grasping. In N. Kanwisher & J. Duncan (Eds.), Attention and performance XX: Functional brain imaging of human cognition (pp. 417–438). Oxford, UK: Oxford University Press.
Culham, J. C., Cavina-Pratesi, C., & Singhal, A. (2006). The role of parietal cortex in visuomotor control: What have we learned from neuroimaging? Neuropsychologia, 44, 2668–2684.
Culham, J. C., Danckert, S. L., DeSouza, J. F. X., Gati, J. S., Menon, R. S., & Goodale, M. A. (2003). Visually guided grasping produces activation in dorsal but not ventral stream brain areas. Experimental Brain Research, 153, 158–170.
Culham, J. C., & Valyear, K. F. (2006). Human parietal cortex in action. Current Opinion in Neurobiology, 16, 205–212.
Daprati, E., & Gentilucci, G. (1997). Grasping an illusion. Neuropsychologia, 35, 1577–1582.
de Grave, D. D., Biegstraaten, M., Smeets, J. B., & Brenner, E. (2005). Effects of the Ebbinghaus figure on grasping are not only due to misjudged size. Experimental Brain Research, 163, 58–64.
Desimone, R., & Duncan, J. (1995). Neural mechanisms of selective visual attention. Annual Review of Neuroscience, 18, 193–222.
de’Sperati, C., & Baud-Bovy, G. (2008). Blind saccades: An asynchrony between seeing and looking. Journal of Neuroscience, 28, 4317–4321.
Deubel, H., Schneider, W. X., & Paprotta, I. (1998). Selective dorsal and ventral processing: Evidence for a common attentional mechanism in reaching and perception. Visual Cognition, 5, 81–107.
Dewar, M. T., & Carey, D. P. (2006). Visuomotor “immunity” to perceptual illusion: A mismatch of attentional demands cannot explain the perception-action dissociation. Neuropsychologia, 44, 1501–1508.
Diedrichsen, J., Werner, S., Schmidt, T., & Trommershauser, J. (2004). Immediate spatial distortions of pointing movements induced by visual landmarks. Perception and Psychophysics, 66, 89–103.
Dijkerman, H. C., Lê, S., Démonet, J. F., & Milner, A. D. (2004). Visuomotor performance in a patient with visual agnosia due to an early lesion. Brain Research: Cognitive Brain Research, 20, 12–25.
Downing, P. E., Chan, A. W., Peelen, M. V., Dodds, C. M., & Kanwisher, N. (2006). Domain specificity in visual cortex. Cerebral Cortex, 16, 1453–1461.
Dubrowski, A., Bock, O., Carnahan, H., & Jungling, S. (2002). The coordination of hand transport and grasp formation during single- and double-perturbed human prehension movements. Experimental Brain Research, 145, 365–371.
Fattori, P., Breveglieri, R., Amoroso, K., & Galletti, C. (2004). Evidence for both reaching and grasping activity in the medial parieto-occipital cortex of the macaque. European Journal of Neuroscience, 20, 2457–2466.
Fattori, P., Gamberini, M., Kutz, D. F., & Galletti, C. (2001). “Arm-reaching” neurons in the parietal area V6A of the macaque monkey. European Journal of Neuroscience, 13, 2309–2313.
Fattori, P., Raos, V., Breveglieri, R., Bosco, A., Marzocchi, N., & Galletti, C. (2010). The dorsomedial pathway is not just for reaching: Grasping neurons in the medial parieto-occipital cortex of the macaque monkey. Journal of Neuroscience, 30, 342–349.
Ferrier, D., & Yeo, G. F. (1884). A record of experiments on the effects of lesion of different regions of the cerebral hemispheres. Philosophical Transactions of the Royal Society of London, 175, 479–564.
Filimon, F., Nelson, J. D., Huang, R. S., & Sereno, M. I. (2009). Multiple parietal reach regions in humans: Cortical representations for visual and proprioceptive feedback during on-line reaching. Journal of Neuroscience, 29, 2961–2971.
Fischer, M. H. (2001). How sensitive is hand transport to illusory context effects? Experimental Brain Research, 136, 224–230.
Franz, V. H. (2001). Action does not resist visual illusions. Trends in Cognitive Sciences, 5, 457–459.
Franz, V. H., Bülthoff, H. H., & Fahle, M. (2003). Grasp effects of the Ebbinghaus illusion: Obstacle avoidance is not the explanation. Experimental Brain Research, 149, 470–477.
Franz, V. H., Gegenfurtner, K. R., Bülthoff, H. H., & Fahle, M. (2000). Grasping visual illusions: No evidence for a dissociation between perception and action. Psychological Science, 11, 20–25.
Frey, S. H., Vinton, D., Norlund, R., & Grafton, S. T. (2005). Cortical topography of human anterior intraparietal cortex active during visually guided grasping. Brain Research: Cognitive Brain Research, 23, 397–405.
Ganel, T., Chajut, E., & Algom, D. (2008a). Visual coding for action violates fundamental psychophysical principles. Current Biology, 18, R599–R601.
Ganel, T., Tanzer, M., & Goodale, M. A. (2008b). A double dissociation between action and perception in the context of visual illusions: Opposite effects of real and illusory size. Psychological Science, 19, 221–225.
Gentilucci, M., Chieffi, S., Daprati, E., Saetti, M. C., & Toni, I. (1996). Visual illusion and action. Neuropsychologia, 34, 369–376.
Glazebrook, C. M., Dhillon, V. P., Keetch, K. M., Lyons, J., Amazeen, E., Weeks, D. J., & Elliott, D. (2005). Perception-action and the Müller-Lyer illusion: Amplitude or endpoint bias? Experimental Brain Research, 160, 71–78.
Glover, S. (2004). Separate visual representations in the planning and control of action. Behavioral and Brain Sciences, 27, 3–24; discussion 24–78.
Glover, S., & Dixon, P. (2001a). Motor adaptation to an optical illusion. Experimental Brain Research, 137, 254–258.
Glover, S., & Dixon, P. (2001b). The role of vision in the on-line correction of illusion effects on action. Canadian Journal of Experimental Psychology, 55, 96–103.
Gonzalez, C. L. R., Ganel, T., Whitwell, R. L., Morrissey, B., & Goodale, M. A. (2008). Practice makes perfect, but only with the right hand: Sensitivity to perceptual illusions with awkward grasps decreases with practice in the right but not the left hand. Neuropsychologia, 46, 624–631.
Goodale, M. A. (1995). The cortical organization of visual perception and visuomotor control. In S. Kosslyn & D. N. Osherson (Eds.), An invitation to cognitive science: Vol. 2. Visual cognition and action (2nd ed., pp. 167–214). Cambridge, MA: MIT Press.
Goodale, M. A., & Haffenden, A. M. (1998). Frames of reference for perception and action in the human visual system. Neuroscience and Biobehavioral Reviews, 22, 161–172.
Goodale, M. A., & Haffenden, A. M. (2003). Interactions between dorsal and ventral streams of visual processing. In A. Siegel, R. Andersen, H.-J. Freund, & D. Spencer (Eds.), Advances in neurology: The parietal lobe (Vol. 93, pp. 249–267). Philadelphia: Lippincott-Raven.
Goodale, M. A., Meenan, J. P., Bülthoff, H. H., Nicolle, D. A., Murphy, K. S., & Racicot, C. I. (1994). Separate neural pathways for the visual analysis of object shape in perception and prehension. Current Biology, 4, 604–610.
Goodale, M. A., & Milner, A. D. (1992). Separate visual pathways for perception and action. Trends in Neurosciences, 15, 20–25.
Goodale, M. A., & Milner, A. D. (2004). Sight unseen: An exploration of conscious and unconscious vision. Oxford, UK: Oxford University Press.
Goodale, M. A., Milner, A. D., Jakobson, L. S., & Carey, D. P. (1991). A neurological dissociation between perceiving objects and grasping them. Nature, 349, 154–156.
Goodale, M. A., Westwood, D. A., & Milner, A. D. (2004). Two distinct modes of control for object-directed action. Progress in Brain Research, 144, 131–144.
Gordon, A. M., Westling, G., Cole, K. J., & Johansson, R. S. (1993). Memory representations underlying motor commands used during manipulation of common and novel objects. Journal of Neurophysiology, 69, 1789–1796.
Grefkes, C., & Fink, G. R. (2005). The functional organization of the intraparietal sulcus in humans and monkeys. Journal of Anatomy, 207, 3–17.
Grill-Spector, K. (2003). The neural basis of object perception. Current Opinion in Neurobiology, 13, 159–166.
Grill-Spector, K., Kushnir, T., Edelman, S., Avidan, G., Itzchak, Y., & Malach, R. (1999). Differential processing of objects under various viewing conditions in the human lateral occipital complex. Neuron, 24, 187–203.
Grill-Spector, K., Kushnir, T., Edelman, S., Itzchak, Y., & Malach, R. (1998). Cue-invariant activation in object-related areas of the human occipital lobe. Neuron, 21, 191–202.
Grill-Spector, K., & Malach, R. (2004). The human visual cortex. Annual Review of Neuroscience, 27, 649–677.
Grol, M. J., Majdandzić, J., Stephan, K. E., Verhagen, L., Dijkerman, H. C., Bekkering, H., Verstraten, F. A., & Toni, I. (2007). Parieto-frontal connectivity during visually guided grasping. Journal of Neuroscience, 27, 11877–11887.
Haffenden, A., & Goodale, M. A. (1998). The effect of pictorial illusion on prehension and perception. Journal of Cognitive Neuroscience, 10, 122–136.
Haffenden, A. M., Schiff, K. C., & Goodale, M. A. (2001). The dissociation between perception and action in the Ebbinghaus illusion: Nonillusory effects of pictorial cues on grasp. Current Biology, 11, 177–181.
Haxby, J. V., Gobbini, M. I., Furey, M. L., Ishai, A., Schouten, J. L., & Pietrini, P. (2001). Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293, 2425–2430.
Hu, Y., & Goodale, M. A. (2000). Grasping after a delay shifts size-scaling from absolute to relative metrics. Journal of Cognitive Neuroscience, 12, 856–868.
Hu, Y., Osu, R., Okada, M., Goodale, M. A., & Kawato, M. (2005). A model of the coupling between grip aperture and hand transport during human prehension. Experimental Brain Research, 167, 301–304.
Jackson, S. R., Jackson, G. M., & Rosicky, J. (1995). Are non-relevant objects represented in working memory? The effect of non-target objects on reach and grasp kinematics. Experimental Brain Research, 102, 519–530.
Jackson, S. R., & Shaw, A. (2000). The Ponzo illusion affects grip-force but not grip-aperture scaling during prehension movements. Journal of Experimental Psychology: Human Perception and Performance, 26, 418–423.
Jakobson, L. S., Archibald, Y. M., Carey, D. P., & Goodale, M. A. (1991). A kinematic analysis of reaching and grasping movements in a patient recovering from optic ataxia. Neuropsychologia, 29, 803–809.
Jakobson, L. S., & Goodale, M. A. (1991). Factors affecting higher-order movement planning: A kinematic analysis of human prehension. Experimental Brain Research, 86, 199–208.
James, T. W., Culham, J., Humphrey, G. K., Milner, A. D., & Goodale, M. A. (2003). Ventral occipital lesions impair object recognition but not object-directed grasping: An fMRI study. Brain, 126, 2463–2475.
James, T. W., Humphrey, G. K., Gati, J. S., Menon, R. S., & Goodale, M. A. (2002). Differential effects of viewpoint on object-driven activation in dorsal and ventral streams. Neuron, 35, 793–801.
Jeannerod, M. (1984). The timing of natural prehension movements. Journal of Motor Behavior, 16, 235–254.
Johansson, R., Westling, G., Bäckström, A., & Flanagan, J. R. (2001). Eye–hand coordination in object manipulation. Journal of Neuroscience, 21, 6917–6932.
Karnath, H. O., & Perenin, M.-T. (2005). Cortical control of visually guided reaching: Evidence from patients with optic ataxia. Cerebral Cortex, 15, 1561–1569.
Karnath, H. O., Rüter, J., Mandler, A., & Himmelbach, M. (2009). The anatomy of object recognition—Visual form agnosia caused by medial occipitotemporal stroke. Journal of Neuroscience, 29, 5854–5862.
Keefe, B. D., & Watt, S. J. (2009). The role of binocular vision in grasping: A small stimulus-set distorts results. Experimental Brain Research, 194, 435–444.
Knill, D. C. (2005). Reaching for visual cues to depth: The brain combines depth cues differently for motor control and perception. Journal of Vision, 5, 103–115.
Kourtzi, Z., & Kanwisher, N. (2001). Representation of perceived object shape by the human lateral occipital complex. Science, 293, 1506–1509.
Kwok, R. M., & Braddick, O. J. (2003). When does the Titchener circles illusion exert an effect on grasping? Two- and three-dimensional targets. Neuropsychologia, 41, 932–940.
Lamme, V. A. F., & Roelfsema, P. R. (2000). The distinct modes of vision offered by feedforward and recurrent processing. Trends in Neurosciences, 23, 571–579.
Lê, S., Cardebat, D., Boulanouar, K., Hénaff, M. A., Michel, F., Milner, D., Dijkerman, C., Puel, M., & Démonet, J.-F. (2002). Seeing, since childhood, without ventral stream: A behavioural study. Brain, 125, 58–74.
Lee, Y. L., Crabtree, C. E., Norman, J. F., & Bingham, G. P. (2008). Poor shape perception is the reason reaches-to-grasp are visually guided online. Perception and Psychophysics, 70, 1032–1046.
Loftus, A., Servos, P., Goodale, M. A., Mendarozqueta, N., & Mon-Williams, M. (2004). When two eyes are better than one in prehension: Monocular viewing and end-point variance. Experimental Brain Research, 158, 317–327.
Louw, S., Smeets, J. B., & Brenner, E. (2007). Judging surface slant for placing objects: A role for motion parallax. Experimental Brain Research, 183, 149–158.
Lyon, D. C., Nassi, J. J., & Callaway, E. M. (2010). A disynaptic relay from superior colliculus to dorsal stream visual cortex in macaque monkey. Neuron, 65, 270–279.
Marotta, J. J., Behrmann, M., & Goodale, M. A. (1997). The removal of binocular cues disrupts the calibration of grasping in patients with visual form agnosia. Experimental Brain Research, 116, 113–121.
Marotta, J. J., & Goodale, M. A. (1998). The role of learned pictorial cues in the programming and control of grasping. Experimental Brain Research, 121, 465–470.
Marotta, J. J., & Goodale, M. A. (2001). The role of familiar size in the control of grasping. Journal of Cognitive Neuroscience, 13, 8–17.
Marotta, J. J., Kruyer, A., & Goodale, M. A. (1998). The role of head movements in the control of manual prehension. Experimental Brain Research, 120, 134–138.
Marotta, J. J., Perrot, T. S., Nicolle, D., & Goodale, M. A. (1995). The development of adaptive head movements following enucleation. Eye, 9(3), 333–336.
Marotta, J. J., Perrot, T. S., Servos, P., Nicolle, D., & Goodale, M. A. (1995). Adapting to monocular vision: Grasping with one eye. Experimental Brain Research, 104, 107–114.
Melmoth, D. R., Finlay, A. L., Morgan, M. J., & Grant, S. (2009). Grasping deficits and adaptations in adults with stereo vision losses. Investigative Ophthalmology and Visual Science, 50, 3711–3720.
Melmoth, D. R., & Grant, S. (2006). Advantages of binocular vision for the control of reaching and grasping. Experimental Brain Research, 171, 371–388.
Melmoth, D. R., Storoni, M., Todd, G., Finlay, A. L., & Grant, S. (2007). Dissociation between vergence and binocular disparity cues in the control of prehension. Experimental Brain Research, 183, 283–298.
Menon, R. S., Ogawa, S., Kim, S. G., Ellermann, J. M., Merkle, H., Tank, D. W., & Ugurbil, K. (1992). Functional brain mapping using magnetic resonance imaging: Signal changes accompanying visual stimulation. Investigative Radiology, Suppl 2, S47–S53.
Milner, A. D., & Goodale, M. A. (2006). The visual brain in action (2nd ed.). Oxford, UK: Oxford University Press.
Milner, A. D., & Goodale, M. A. (2008). Two visual systems re-viewed. Neuropsychologia, 46, 774–785.
Milner, A. D., Perrett, D. I., Johnston, R. S., Benson, P. J., Jordan, T. R., Heeley, D. W., Bettucci, D., Mortara, F., Mutani, R., Terazzi, E., & Davidson, D. L. W. (1991). Perception and action in “visual form agnosia.” Brain, 114, 405–428.
Monaco, S., Sedda, A., Fattori, P., Galletti, C., & Culham, J. C. (2009). Functional magnetic resonance adaptation (fMRA) reveals the involvement of the dorsomedial stream in wrist orientation for grasping. Society for Neuroscience Abstracts, 307, 1.
Mon-Williams, M., & Dijkerman, H. C. (1999). The use of vergence information in the programming of prehension. Experimental Brain Research, 128, 578–582.
Mon-Williams, M., & McIntosh, R. D. (2000). A test between two hypotheses and a possible third way for the control of prehension. Experimental Brain Research, 134, 268–273.
Mon-Williams, M., & Tresilian, J. R. (2001). A simple rule of thumb for elegant prehension. Current Biology, 11, 1058–1061.
Moore, T. (2006). The neurobiology of visual attention: Finding sources. Current Opinion in Neurobiology, 16, 159–165.
Nieder, A., & Dehaene, S. (2009). Representation of number in the brain. Annual Review of Neuroscience, 32, 185–208.
Obhi, S. S., & Goodale, M. A. (2005). The effects of landmarks on the performance of delayed and real-time pointing movements. Experimental Brain Research, 167, 335–344.
Ogawa, S., Tank, D. W., Menon, R., Ellermann, J. M., Kim, S. G., Merkle, H., & Ugurbil, K. (1992). Intrinsic signal changes accompanying sensory stimulation: Functional brain mapping with magnetic resonance imaging. Proceedings of the National Academy of Sciences U S A, 89, 5951–5955.
Op de Beeck, H. P., Haushofer, J., & Kanwisher, N. G. (2008). Interpreting fMRI data: Maps, modules and dimensions. Nature Reviews Neuroscience, 9, 123–135.
Patla, A. E. (1997). Understanding the roles of vision in the control of human locomotion. Gait and Posture, 5, 54–69.
Perenin, M.-T., & Rossetti, Y. (1996). Grasping without form discrimination in a hemianopic field. NeuroReport, 7, 793–797.
Perenin, M.-T., & Vighetto, A. (1988). Optic ataxia: A specific disruption in visuomotor mechanisms. I. Different aspects of the deficit in reaching for objects. Brain, 111, 643–674.
Pierrot-Deseilligny, C. H., Milea, D., & Muri, R. M. (2004). Eye movement control by the cerebral cortex. Current Opinion in Neurology, 17, 17–25.
Plodowski, A., & Jackson, S. R. (2001). Vision: Getting to grips with the Ebbinghaus illusion. Current Biology, 11, R304–R306.
Pook, P. K., & Ballard, D. H. (1996). Deictic human/robot interaction. Robotics and Autonomous Systems, 18, 259–269.
Prado, J., Clavagnier, S., Otzenberger, H., Scheiber, C., & Perenin, M.-T. (2005). Two cortical systems for reaching in central and peripheral vision. Neuron, 48, 849–858.
Rice, N. J., McIntosh, R. D., Schindler, I., Mon-Williams, M., Démonet, J. F., & Milner, A. D. (2006). Intact automatic avoidance of obstacles in patients with visual form agnosia. Experimental Brain Research, 174, 176–188.
Rizzolatti, G., & Craighero, L. (1998). Spatial attention: Mechanisms and theories. In M. Sabourin, F. Craik, & M. Robert (Eds.), Advances in psychological science: Vol. 2. Biological and cognitive aspects (pp. 171–198). East Sussex, UK: Psychology Press.
Rizzolatti, G., Riggio, L., Dascola, I., & Umiltá, C. (1987). Reorienting attention across the horizontal and vertical meridians: Evidence in favor of a premotor theory of attention. Neuropsychologia, 25, 31–40.
Sakata, H. (2003). The role of the parietal cortex in grasping. Advances in Neurology, 93, 121–139.
Sanders, M. D., Warrington, E. K., Marshall, J., & Weiskrantz, L. (1974). “Blindsight”: Vision in a field defect. Lancet, 20, 707–708.
Schenk, T., & Milner, A. D. (2006). Concurrent visuomotor behaviour improves form discrimination in a patient with visual form agnosia. European Journal of Neuroscience, 24, 1495–1503.
Schindler, I., Rice, N. J., McIntosh, R. D., Rossetti, Y., Vighetto, A., & Milner, A. D. (2004). Automatic avoidance of obstacles is a dorsal stream function: Evidence from optic ataxia. Nature Neuroscience, 7, 779–784.
Sereno, M. I., & Tootell, R. B. (2005). From monkeys to humans: What do we now know about brain homologies? Current Opinion in Neurobiology, 15, 135–144.
Servos, P., Carnahan, H., & Fedwick, J. (2000). The visuomotor system resists the horizontal-vertical illusion. Journal of Motor Behavior, 32, 400–404.
Servos, P., Goodale, M. A., & Jakobson, L. S. (1992). The role of binocular vision in prehension: A kinematic analysis. Vision Research, 32, 1513–1521.
Smeets, J. B., & Brenner, E. (1999). A new view on grasping. Motor Control, 3, 237–271.
Smeets, J. B., & Brenner, E. (2001). Independent movements of the digits in grasping. Experimental Brain Research, 139, 92–100.
Smeets, J. B., & Brenner, E. (2008). Grasping Weber’s law. Current Biology, 18, R1089–R1090.
Smeets, J. B., Brenner, E., & Martin, J. (2009). Grasping Occam’s razor. Advances in Experimental Medicine and Biology, 629, 499–522.
Snyder, L. H., Batista, A. P., & Andersen, R. A. (1997). Coding of intention in the posterior parietal cortex. Nature, 386, 167–170.
Stöttinger, E., & Perner, J. (2006). Dissociating size representation for action and for conscious judgment: Grasping visual illusions without apparent obstacles. Consciousness and Cognition, 15, 269–284.
Stöttinger, E., Soder, K., Pfusterschmied, J., Wagner, H., & Perner, J. (2010). Division of labour within the visual system: Fact or fiction? Which kind of evidence is appropriate to clarify this debate? Experimental Brain Research, 202, 79–88.
Striemer, C., Chapman, C. S., & Goodale, M. A. (2009). “Realtime” obstacle avoidance in the absence of primary visual cortex. Proceedings of the National Academy of Sciences U S A, 106, 15996–16001.
Tanaka, K. (2003). Columns for complex visual object features in the inferotemporal cortex: Clustering of cells with similar but slightly different stimulus selectivities. Cerebral Cortex, 13, 90–99.
Tavassoli, A., & Ringach, D. L. (2010). When your eyes see more than you do. Current Biology, 20, R93–R94.
Thaler, L., & Goodale, M. A. (2010). Beyond distance and direction: The brain represents target locations non-metrically. Journal of Vision, 10(3), 1–27.
Tipper, S. P., Howard, L. A., & Jackson, S. R. (1997). Selective reaching to grasp: Evidence for distractor interference effects. Visual Cognition, 4, 1–38.
Tootell, R. B. H., Tsao, D., & Vanduffel, W. (2003). Neuroimaging weighs in: Humans meet macaques in “primate” visual cortex. Journal of Neuroscience, 23, 3981–3989.
Ungerleider, L. G., & Mishkin, M. (1982). Two cortical visual systems. In D. J. Ingle, M. A. Goodale, & R. J. W. Mansfield (Eds.), Analysis of visual behavior (pp. 549–586). Cambridge, MA: MIT Press.
Valyear, K. F., & Culham, J. C. (2010). Observing learned object-specific functional grasps preferentially activates the ventral stream. Journal of Cognitive Neuroscience, 22, 970–984.
Valyear, K. F., Culham, J. C., Sharif, N., Westwood, D., & Goodale, M. A. (2006). A double dissociation between sensitivity to changes in object identity and object orientation in the ventral and dorsal visual streams: A human fMRI study. Neuropsychologia, 44, 218–228.
van Bergen, E., van Swieten, L. M., Williams, J. H., & Mon-Williams, M. (2007). The effect
of orientation on prehension movement time. Experimental Brain Research, 178, 180–193.
van de Kamp, C., & Zaal, F. T. J. M. (2007). Prehension is really reaching and grasping. Ex
perimental Brain Research, 182, 27–34.
van Donkelaar, P. (1999). Pointing movements are affected by size-contrast illusions. Ex
perimental Brain Research, 125, 517–520.
Van Essen, D. C., Lewis, J. W., Drury, H. A., Hadjikhani, N., Tootell, R. B., Bakircioglu, M.,
& Miller, M. I. (2001). Mapping visual cortex in monkeys and humans using surface-based
atlases. Vision Research, 41, 1359–1378.
van Mierlo, C. M., Louw, S., Smeets J. B., & Brenner, E. (2009). Slant cues are processed
with different latencies for the online control of movement. Journal of Vision, 9, 25. 1–8.
Vaughan, J., Rosenbaum, D. A., & Meulenbroek, R. G. (2001). Planning reaching and
grasping movements: The problem of obstacle avoidance. Motor Control, 5, 116–135.
Verhagen, L., Dijkerman, H. C., Grol, M. J., & Toni, I. (2008). Perceptuo-motor in
(p. 295)
Vilaplana, J. M., Batlle, J. F., & Coronado, J. L. (2004). A neural model of hand grip
formation during reach to grasp. 2004 IEEE International Conference on Systems, Man, and
Cybernetics, 1–7, 542–546.
Vishton, P. M., & Fabre, E. (2003). Effects of the Ebbinghaus illusion on different
behaviors: One- and two-handed grasping; one- and two-handed manual estimation; metric and
comparative judgment. Spatial Vision, 16, 377–392.
Warner, C. E., Goldshmit, Y., & Bourne, J. A. (2010). Retinal afferents synapse with relay
cells targeting the middle temporal area in the pulvinar and lateral geniculate nuclei.
Frontiers in Neuroanatomy, 4, 8.
Warren, W. H., & Fajen, B. R. (2004). From optic flow to laws of control. In L. M. Vaina, S.
A. Beardsley, & S. K. Rushton (Eds.), Optic flow and beyond (pp. 307–337). Norwell, MA:
Kluwer Academic.
Watt, S. J., & Bradshaw, M. F. (2000). Binocular cues are important in controlling the
grasp but not the reach in natural prehension movements. Neuropsychologia, 38, 1473–
1481.
Watt, S. J., & Bradshaw, M. F. (2003). The visual control of reaching and grasping:
Binocular disparity and motion parallax. Journal of Experimental Psychology: Human Perception
and Performance, 29, 404–415.
Westwood, D. A., & Goodale, M. A. (2003). Perceptual illusion and the real-time control of
action. Spatial Vision, 16, 243–254.
Westwood, D. A., Heath, M., & Roy, E. A. (2000). The effect of a pictorial illusion on
closed-loop and open-loop prehension. Experimental Brain Research, 134, 456–463.
Notes:
(1) . Several solutions to the temporal coupling problem have been proposed (e.g., Hu,
Osu, Okada, Goodale, & Kawato, 2005; Mon-Williams & Tresilian, 2001; Vilaplana, Batlle,
& Coronado, 2004).
Melvyn A. Goodale
Melvyn A. Goodale, The Brain and Mind Institute, The University of Western Ontario
Development of Attention
M. Rosario Rueda
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn
Keywords: development, attention, cognitive development, brain development, attention networks, alerting, orienting, executive control
Attention has been a matter of interest to researchers since the emergence of psychology
as an experimental science. Titchener (1909) gave attention a central role in cognition,
considering it “the heart of the psychological enterprise.” Some years earlier, William
James had provided an insightful definition of attention, which has remained one of the
most popular to this day: “Everyone knows what attention is. It is the taking possession
of the mind in clear and vivid form of one out of what seem several simultaneous objects
or trains of thought. Focalization, concentration and consciousness are of its essence. It
implies withdrawal from some things in order to deal effectively with others …” (James,
1890).
Despite its subjective nature, this definition contains important insights about the various
aspects of attention that scientists would go on to study in the decades that followed.
For instance, James’ definition suggests that we are limited in the amount of information
we can consciously process at a time, and he attributed to the attention system the func
tion of selecting the relevant out of all the stimulation available to our sensory systems.
What is relevant to a particular individual depends on the current goals and intentions of
that individual. Therefore, although implicitly, it is also suggested that attention is intrin
sically connected to volition. Additionally, words like “concentration” and “focalization”
point to the effortful and resource-consuming nature of the continuous monitoring be
tween the flow of information and goals of the individual that is required in order to know
what (p. 297) stimulation is important to attend to at each particular moment.
After James’ insightful ideas on attention, research on inner states and cognitive
processes was largely neglected, owing to an overriding emphasis on “observ
able” behavior during most of the first half of the twentieth century. However, when the
interest in cognition was reinstituted during and after World War II, all the multiple com
ponents of the attentional construct contained in James’ definition, such as selection, ori
enting, and executive control, were further developed by many authors, including Donald
Hebb (1949), Colin Cherry (1953), Donald Broadbent (1958), Daniel Kahneman (1973),
and Michael Posner (1978, 1980), among others.
In this chapter, I first review a variety of constructs studied within the field of attention.
Then, I discuss the anatomy of brain networks related to attention as revealed by
imaging studies. Afterward, I present evidence on the developmental time course of the
attention networks during infancy and childhood. Next, I discuss how individual differ
ences in attentional efficiency have been studied over the past years as well as the influ
ence of both constitutional and experiential factors on the development of attention. Fi
nally, I present recent research focused on optimizing attentional capacities during devel
opment.
Responses to stimulation can be automatically triggered (Norman & Shallice, 1986; Posner
& Snyder, 1975). These three aspects of attention are further reviewed below.
Attention as State
The attention state of an organism varies according to changes in both internal and exter
nal conditions. Generally, alertness reflects the state of the organism for processing infor
mation and is an important condition in all tasks. Intrinsic or tonic alertness refers to a
state of general wakefulness, which clearly changes over the course of the day from sleep
to waking and within the waking state from sluggish to highly alert. Sustained attention
and vigilance have been defined as the ability to maintain a certain level of arousal and
alertness that allows responsiveness to external stimulation (Posner, 1978; Posner &
Boies, 1971). Tonic alertness requires mental effort and is subject to top-down control of
attention (Posner, 1978); thus, the attention state is likely to diminish after a period of
maintenance. In all tasks involving long periods of processing, the role of changes of state
may be important. Thus, vigilance or sustained attention effects probably rest at least in
part on changes in the tonic alerting system.
The presentation of an external stimulus also increases the state of alertness. Preparation
from warning cues (phasic alertness) can be measured by comparing the speed and accu
racy of response to stimulation with and without warning signals (Posner, 2008). Warning
cues appear to accelerate processes of response selection (Hackley & Valle-Inclán, 1998),
which commonly result in increased response speed. Depending on the conditions, this ef
fect can be automatic as it occurs with an auditory accessory event that does not predict
a target, but it can also be partly due to voluntary actions based on information conveyed
by the cue about the time of the upcoming target. The reduction of reaction time (RT) fol
lowing a warning signal is accompanied by vast changes in the physiological state of the
organism. These changes are thought to bring about the suppression of ongoing activity in
order to prepare the system for a rapid response. The response is often emitted before the
target has been sufficiently analyzed, as with short intervals between warning cue and
target, leading to reduced accuracy in performance (Morrison, 1982; Posner, 1978).
Attention as Selectivity
Attention is also an important mechanism for conscious perception. When examining a vi
sual scene, there is the general feeling that all information (p. 298) about it is available.
However, important changes can occur in the scene without being noticed by the observ
er, provided they take place away from the focus of attention. For instance, changes in
the scene are often completely missed if cues that are normally effective in producing a
shift of attention, such as luminance changes or movements, are suppressed (Rensink,
O’Regan, & Clark, 1997), a phenomenon called change blindness. Something similar
happens in the auditory modality. A classic series of studies conducted by Colin Cherry
(1953) in which different stimuli were presented simultaneously to the two ears of individ
uals showed that information presented to the unattended ear could go totally unnoticed.
To explain the role of attention on the selection of information, Broadbent (1958)
developed a model that considers that attention acts as a filter, which selects a channel of
entry and sends information to a perceptual processing system of limited capacity. Later
on, Hillyard and colleagues studied the mechanisms of attentional selection. Using event-
related potentials (ERP), they showed that attended stimuli generate early ERP compo
nents of larger amplitude than unattended ones, suggesting that attention facilitates per
ceptual processing of attended information (Hillyard, 1985; Mangun & Hillyard, 1987).
Attention is thus considered a mechanism that filters out irrelevant information
and gives priority to relevant information for conscious processing.
Much of the selection of external stimulation is achieved by orienting to the source of in
formation. This orientation can be carried out overtly, as when head and eye movements
are directed toward the source of input, or it can be carried out covertly, as when only at
tention is oriented to the stimulation. Posner (1980) developed a cueing paradigm to
study orienting of attention. In this task, a visual target appears at one of two possible lo
cations, either to the right or left of fixation, and participants are asked to detect its pres
ence by pressing a key. Before the target, a cue is presented at one of the two possible lo
cations. When the preceding cue appears at the same location as the subsequent target
(valid cue), a reduction in RT is observed compared with when there is no cue or it is dis
played at fixation (neutral cue). This allows measuring the benefit in RT of moving atten
tion to the location of the target before its appearance. Conversely, when the cue is
presented at a different location from the target (invalid cue), an RT cost is observed with
respect to neutral cues, presumably because attention must be disengaged from the wrong loca
tion and moved to where the target appeared. The same result is obtained even when the
interval between cue and target is too short for a saccadic movement (i.e., less than
200 milliseconds). This shows that attention can be oriented independently of eye and
head movements and that covert shifts of attention also enhance the speed of responding.
However, how free covert attention is from the eye movement systems is still a debated
matter (see Klein, 2004).
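As a minimal illustration of how cueing effects are quantified in this paradigm, the benefit and cost described above are simple RT differences relative to the neutral-cue condition. The sketch below uses hypothetical condition names and mean RT values; none of them are data from the chapter.

```python
# Hypothetical mean reaction times (ms) per cue condition in a Posner
# cueing task; the values are illustrative, not empirical data.
mean_rt = {"valid": 310, "neutral": 340, "invalid": 380}

# Benefit: RT saved when the cue correctly predicts the target location.
benefit = mean_rt["neutral"] - mean_rt["valid"]
# Cost: RT lost when attention must be disengaged from a miscued location.
cost = mean_rt["invalid"] - mean_rt["neutral"]

print(benefit, cost)  # -> 30 40
```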
Studies using functional magnetic resonance imaging (fMRI) and cellular recording have
demonstrated that brain areas that are activated by cues, such as the superior parietal
lobe and temporal parietal junction, play a key role in modulating activity within primary
and extrastriate visual systems when attentional orienting occurs (Corbetta & Shulman,
2002; Desimone & Duncan, 1995). This shows that the attentional system achieves selec
tion by modulating the functioning of sensory systems. Attention can thus be considered a
domain-general system that regulates the activation of domain-specific sensory process
ing systems.
Attention is also an important mechanism for action monitoring, particularly when selec
tion has to be achieved in a voluntary, effortful mode. Posner and Snyder (1975) first argued
for the central role of attention in cognitive control. In well-practiced tasks, ac
tion coordination does not require attentional control because responses can be automati
cally triggered by the stimulation. However, attention is needed in a variety of situations
in which automatic processing is either not available or is likely to produce inappropriate
responses. Years later, Norman and Shallice (1986) developed a cognitive model for dis
tinguishing between automatic and controlled modes of processing. According to their
model, psychological processing systems rely on a number of hierarchically organized
schemas of action and thought used for routine actions. These schemas are automatically
triggered and contain well-learned responses or sequences of actions. However, a differ
ent mode of operation involving the attention system is required when situations call for
more carefully elaborated responses. These are situations that involve (1) novelty, (2) er
ror correction or troubleshooting, (3) some degree of danger or difficulty, or (4) overcom
ing strong habitual responses or tendencies.
A way to study cognitive control in the lab consists of inducing conflict between respons
es by instructing people to execute a subdominant response while suppressing a domi
nant tendency. A basic measure of conflict interference is provided by the Stroop task
(Stroop, 1935). The original form of this task requires subjects to look at words denoting
colors and to report the color of ink the words are written in instead of reading them. Pre
senting incongruent perceptual and semantic information (e.g., the word “blue” written
with red ink) induces conflict and produces a delay in response time compared with when
the two sources of information match. The Flanker task is another widely used method to
study conflict resolution. In this task, the target is surrounded by irrelevant stimulation
that can either match or conflict with the response required by the target (Eriksen &
Eriksen, 1974). As with the Stroop task, resolving interference from distracting incongru
ent stimulation delays RT. Cognitive tasks involving conflict have been extensively used to
measure the efficiency with which control of action is exerted. The extent of interference
is usually measured by subtracting average RT in nonconflict conditions from that of con
ditions involving conflict. The idea is that additional involvement of the attention system
is required to detect conflict and resolve it by inhibiting the dominant but inappropriate
response (Botvinick, Braver, Barch, Carter, & Cohen, 2001; Posner & DiGirolamo, 1998).
Thus, larger interference scores are interpreted as indicative of less efficiency of cogni
tive control.
Another important form of attention regulation is related to the ability to detect and cor
rect errors. Detection and monitoring of errors has been studied using ERP (Gehring,
Gross, Coles, Meyer, & Donchin, 1993). A large negative deflection over midline frontal
channels is often observed about 100 ms after the commission of an error, called the er
ror-related negativity (ERN). This effect has been associated with an attention-based self-
regulatory mechanism (Dehaene, Posner, & Tucker, 1994) and provides a means to exam
ine the emergence of this function during infancy and childhood (Berger, Tzur, & Posner,
2006).
All three aspects of attention considered above are simultaneously involved in much of
our behavior. (p. 300) However, the distinction between alerting, orienting, and executive
control has proved useful to understanding the neural basis of attention. Over the past
decades, Posner and colleagues have developed a neurocognitive model of attention. Pos
ner proposes a division of attention into three distinct brain networks. One of these in
volves changes of state and is called alerting. The other two are closely involved with se
lection and are called orienting and executive attention (Posner, 1995; Posner & Fan,
2008; Posner & Petersen, 1990; Posner, Rueda, & Kanske, 2007). The alerting network
deals with the intensive aspect of attention related to how the organism achieves and
maintains the alert state. The orienting network deals with selective mechanisms operat
ing on sensory input. Finally, the executive network is involved in the regulation of
thoughts, feelings, and behavior.
The three attention networks in Posner’s model are not considered completely indepen
dent. For instance, as mentioned earlier, alerting signals produce faster but more inaccu
rate responses, a result that Posner interpreted as an inhibitory interaction between the
alerting and executive networks (Posner, 1978). On the other hand, alerting appears to fa
cilitate the orienting of attention by speeding up attention shifts. Also, by focusing on tar
gets, orienting aids in filtering out irrelevant information, thus enhancing the function of
executive attention (see Callejas, Lupiáñez, & Tudela, 2004, for further discussion of at
tention network interactions). However, despite their interacting nature, the three atten
tion networks have been shown to have relatively independent neuroanatomies and ap
pear to involve distinct neuromodulatory mechanisms (Posner, Rueda, & Kanske, 2007;
Posner & Fan, 2008; see also Table 15.1).
Several years ago, Fan et al. (2002) developed an experimental task to study the function
ing of the three attentional networks, called the attention network task (ANT). The task is
based on traditional experimental paradigms to study the functions of alerting (prepara
tion cues), orienting (orienting cues), and executive control (flanker task) (Figure 15.1).
Completion of the task allows calculation of three scores related to the efficiency of the
attention networks. The alerting score is calculated by subtracting RT to trials with dou
ble cue from RT to trials with no cue. This provides a measure of the benefit in perfor
mance gained from a signal that indicates the imminent onset of the target and
from using this information to get ready to respond. The orienting score provides a measure
of how much benefit is obtained in responding when information is given about the loca
tion of the upcoming target. It is calculated by subtracting RT to spatial cue trials from
that of central cue trials. Finally, the executive attention score indicates the amount of in
terference experienced in performing the task when stimulation conflicting with the tar
get is presented in the display. It is calculated by subtracting RT to congruent trials from
RT to incongruent trials. Larger scores indicate more interference from distractors and
therefore less efficiency of conflict resolution mechanisms (executive attention).
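The score arithmetic just described can be sketched in a few lines. This is an illustrative computation only, assuming mean correct RTs per condition are already available; the condition labels and example values below are hypothetical, not data from the ANT literature.

```python
# Illustrative sketch of the three ANT network scores from mean correct
# RTs (ms). Condition names and the example values are hypothetical.

def ant_scores(mean_rt):
    """Return alerting, orienting, and executive attention scores (in ms)."""
    return {
        # Alerting: benefit of a (double) warning cue over no cue.
        "alerting": mean_rt["no_cue"] - mean_rt["double_cue"],
        # Orienting: benefit of a spatial cue over a central cue.
        "orienting": mean_rt["central_cue"] - mean_rt["spatial_cue"],
        # Executive: interference from incongruent vs. congruent flankers
        # (a larger score indicates less efficient conflict resolution).
        "executive": mean_rt["incongruent"] - mean_rt["congruent"],
    }

rts = {"no_cue": 560, "double_cue": 520, "central_cue": 540,
       "spatial_cue": 495, "congruent": 510, "incongruent": 605}
print(ant_scores(rts))  # -> {'alerting': 40, 'orienting': 45, 'executive': 95}
```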
Posner’s view of attention as an organ system with its own functional anatomy (Posner &
Fan, 2008) provides a model of great heuristic power. Connecting cognitive and neural
levels of analysis aids in answering many complex issues related to attention, such as the
maturational processes underlying its development and factors that are likely to influence
maturation, such as genes and experience. In the next section, anatomy of the three at
tention networks in Posner’s model is briefly described.
Neuroanatomy of Attention
The emergence of the field of cognitive neuroscience constituted a turning point in the
study of the relationship between brain and cognition from which the study of attention
has benefited greatly. Functional neuroimaging has allowed many cognitive tasks to be
analyzed in terms of the brain areas they activate (Posner & Raichle, 1994). Studies of at
tention have been among the most often examined in this way (Corbetta & Shulman, 2002;
Driver, Eimer, & Macaluso, 2007; Posner & Fan, 2008; Wright & Ward, 2008), and per
haps the areas of activation have been more consistent for the study of attention than for
any other cognitive system (Hillyard, Di Russo, & Martinez, 2006; Posner et al., 2007; Raz
& Buhle, 2006). A summary of the anatomy and neurotransmitters involved in the three
networks is shown in Table 15.1.
Alerting Network
Research in the past years has shown that structures of the right parietal and right
frontal lobes as well as a number of midbrain neural modulators such as norepinephrine
and dopamine are involved in alertness (Sturm et al., 1999). Arousal of the central ner
vous system involves input from brainstem systems that modulate activation of the cor
tex. (p. 301) Primary among these is the locus coeruleus, which is the source of the brain’s
norepinephrine. It has been demonstrated that the influence of warning signals operates
via this brain system, because drugs that block it also prevent the changes in the alert
state that lead to improved performance after a warning signal is provided (Coull, Nobre,
& Frith, 2001; Marrocco & Davidson, 1998).
Table 15.1 Marker Tasks, Anatomy, Neurochemistry, and Genetics of Attention Networks
The involvement of frontal and parietal regions of the right hemisphere is supported by
studies showing that lesions in those areas impair patients’ ability to maintain the alert
state in the absence of warning signals (Dockree et al., 2004; Sturm et al., 1999). Imaging
studies have also supported the involvement of right frontal-parietal structures in the en
dogenous maintenance of alertness (Coull, Frith, Frackowiak, & Grasby, 1996; Dockree et
al., 2004).
However, the neural basis of tonic alertness may differ from that underlying the phasic
changes of alertness that follow warning cues. Warning signals provide a phasic change in
level of alertness over millisecond intervals. This change involves widespread variation in
autonomic signals, such as heart rate (Kahneman, 1973), and cortical changes, such as
the contingent negative variation (CNV; Walter, 1964). Several studies have shown that
the CNV is generated by activation in the frontal lobe, with specific regions depending on
the type of task being used (Cui et al., 2000; Tarkka & Basile, 1998). When using fixed
cue–target intervals, warning signals are informative of when a target will occur, thus
producing preparation in the time domain. Under these conditions, warning cues ap
pear to activate frontal-parietal structures in the left hemisphere rather than the right
(Coull, Frith, Büchel, & Nobre, 2000; Nobre, 2001). Using the ANT, Fan et al. (2005)
observed cortical frontal-parietal activation that was stronger in the left hemisphere fol
lowing warning cues, along with activation in the thalamus.
Orienting Network
The orienting system for visual events has been associated with posterior brain areas in
cluding the superior parietal lobe, the temporal-parietal junction, and the frontal eye
fields. Lesions of the parietal lobe and superior temporal lobe have been consistently re
lated to difficulties in orienting (Karnath, Ferber, & Himmelbach, 2001). Activation of cor
tical areas has been specifically associated with operations of disengagement from the
current focus of attention. Moving attention from one location to another involves the su
perior colliculus, whereas engaging attention requires thalamic areas such as the pulv
inar nucleus (Posner & Raichle, 1994). Also, Corbetta and Shulman (2002) reviewed a se
ries of imaging studies and showed that partially segregated networks appear to be in
volved in endogenous and exogenous orientation of attention. Goal-directed or top-down
selection involves activation of a dorsal network that includes intraparietal and frontal
cortices, whereas stimuli-driven (exogenous) attention activates a ventral network con
sisting of the temporal-parietal junction and inferior frontal cortex mostly on the right
hemisphere. The ventral network is involved in orienting attention in a reflexive mode to
salient events and has the capacity to overcome the voluntary orientation associated with
the dorsal system.
The orienting network is modulated by the neurotransmitter acetylcholine (ACh). It has been shown that lesions of cholinergic systems in the basal forebrain in mon
keys interfere with orienting attention (Voytko et al., 1994). In addition, administration of
a muscarinic antagonist, scopolamine, appears to delay orientation to spatial cues but not
to cues that only have an alerting effect (Davidson, Cutrell, & Marrocco, 1999). Further
evidence shows that the parietal cortex is the site where this modulation takes place. In
jections of scopolamine directly in parietal areas containing cells that respond to spatial
cues affect the ability to shift attention to the cued location. Systemic injections of scopo
lamine have a smaller effect on covert orienting of attention than do local injections in the
parietal area (Davidson & Marrocco, 2000). These observations in the monkey have also
been confirmed by similar studies in the rat (Everitt & Robbins, 1997) and by studies with
nicotine in humans (Newhouse, Potter, & Singh, 2004).
Executive Attention Network
We know from adult brain imaging studies that Stroop tasks activate the anterior cingu
late cortex (ACC). In a meta-analysis of imaging studies, the dorsal section of the ACC
was activated in response to cognitive conflict tasks such as variants of the Stroop task,
whereas the ventral section appeared to be mostly activated by emotional tasks and emo
tional states (Bush, Luu, & Posner, 2000). The two divisions of the ACC also seem to inter
act in a mutually exclusive way. For instance, when the cognitive division is activated, the
affective division tends to be deactivated, and vice versa, suggesting the possibility of rec
iprocal effortful and emotional controls of attention (Drevets & Raichle, 1998). Also, re
solving conflict from incongruent stimulation in the flanker task activates the dorsal por
tion of the ACC together with other regions of the lateral prefrontal cortex (Botvinick, Ny
strom, Fissell, Carter, & Cohen, 1999; Fan, Flombaum, McCandliss, Thomas, & Posner,
2003). Different parts of the ACC appear to be well connected to a variety of other brain
regions, including limbic structures and parietal and frontal areas (Posner, Sheese, Odlu
das, & Tang, 2006). Support for the voluntary exercise of self-regulation comes from stud
ies that examine either the instruction to control affect or the connections involved in the
exercise of that control. For example, the instruction to avoid arousal during processing
of erotic events (Beauregard, Levesque, & Bourgouin, 2001) or to ward off emotion when
looking at negative pictures (Ochsner, Bunge, Gross, & Gabrieli, 2002) produces a locus
of activation in midfrontal and cingulate areas. In addition, if people are required to se
lect an input modality, the cingulate shows functional connectivity to the selected sensory
system (Crottaz-Herbette & Menon, 2006). Similarly, when involved with emotional pro
cessing, the cingulate shows a functional connection to limbic areas (Etkin, Egner, Per
aza, Kandel, & Hirsch, 2006). These findings support the role of cingulate areas in the
control of cognition and emotion.
As with the previous networks, pharmacological studies conducted with monkeys and rats
have aided our understanding of the neurochemical mechanisms affecting efficiency of
the executive attention network. In this case, data suggest that dopamine (DA) is the im
portant neurotransmitter for executive control. Blocking DA in the dorsal-lateral pre
frontal cortex (DLPFC) of rhesus monkeys causes deficits on tasks involving inhibitory
control (Brozoski, Brown, Rosvold, & Goldman, 1979). Additionally, activation of mesocor
tical dopaminergic neurons in rats enhances activity in the prefrontal cortex (McCulloch,
Savaki, McCulloch, Jehle, & Sokoloff, 1982), as does expression of DA receptors in the an
terior cingulate cortex (Stanwood, Washington, Shumsky, & Levitt, 2001).
Attention in infancy is less developed than later in life, and the functions of alerting, ori
enting, and executive control in particular are less independent during infancy. In their
volume Attention in Early Development, Ruff and Rothbart (1996) extensively reviewed
the development of attention across early childhood. They suggest that both reactive and
self-regulatory systems of attention are at play during the first years of life. Initially, reac
tive attention is involved in more automatic engagement and orienting processes. Then,
by the end of the first year of life, attention can be more voluntarily controlled. Across the
toddler and preschool years, the self-regulatory system increasingly assumes control of
attentional processes, allowing for a more flexible and goal-oriented control of attentional
resources. (p. 303) In this section, I first discuss components of attention related to the
state of engagement and selectivity and then consider attention in relation to self-regula
tion.
The early life of the infant is concerned with changes in state. Sleep dominates at birth,
and the waking state is relatively rare at first. The newborn infant spends nearly three-
fourths of the time sleeping (Colombo & Horowitz, 1987). There is a dramatic change in
the percentage of time in the waking state over the first 3 months of life. By the 12th
postnatal week, the infant has become able to maintain the alert state during much of the
daytime hours, although this ability still depends heavily on external sensory stimulation,
much of it provided by the caregiver.
Newborns show head and eye movements toward novel stimuli, but the control of orient
ing initially rests largely on what caregivers present. Eye movements are prefer
entially directed toward moving stimuli and have been shown to depend on properties of
the stimulus, for example, how much they resemble human faces (Johnson & Morton,
1991).
Much of the response to external stimuli involves orienting toward the source of stimula
tion. The orienting response is a relatively automatic or involuntary response in reaction
to moderately intense changes in stimulation (Sokolov, 1963). It depends on the physical
characteristics of the stimulation, such as novelty and intensity. With the orienting re
sponse, the organism is alerted and prepared to learn more about the event in order to
respond appropriately. Orienting responses are accompanied by a deceleration in heart
rate that is sustained during the attention period (Richards & Casey, 1991). In babies,
other physical reactions involve decreases in motor activity, sucking, and respiration
(Graham, Anthony, & Ziegler, 1983). The heart rate deceleration is observed after 2
months of age and increases in amplitude until about 9 months. Subsequently, the ampli
tude decreases until it approximates the adult response, which consists of a small decel
eration in response to novel stimulation (Graham et al., 1983). Orienting responses of
greater magnitude lead to more sustained periods of focused attention, which likely in
crease babies’ opportunities to explore objects and scenes. Infants also become better able
to recognize objects, partly because of the maturation of temporal-parietal structures and
increases in neural transmission. As this happens, the novelty of objects, and hence their
capacity to alert, diminishes, causing shorter periods of sustained attention. Reductions
in automatic orienting may then facilitate the emergence of self-initiated, voluntary at
tention (Ruff & Rothbart, 1996).
The most frequent method of studying orienting in infancy involves the use of eye-move
ment tracking devices. As in adults, there is a close relation between the direction of
gaze and the infants’ attention. The attention system can be driven to particular locations
by external input from birth (Richards & Hunter, 1998); however, orientation to the
source of stimulation continues to improve in precision over many years. Infants’ eye
movements often fall short of the target, and peripheral targets are often foveated by a
series of head and eye movements. Although not as easy to track, the covert system like
ly follows a similar trajectory. One strategy to examine covert orienting consists of pre
senting brief cues that do not produce an eye movement followed by targets that do. Us
ing this strategy, it has been shown that the speed of the eye movement to the target is
enhanced by the cue, and this enhancement improves over the first year of life (Butcher,
2000). In more complex situations, for example, when there are competing targets, the
improvement may go on for longer periods (Enns & Cameron, 1987).
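The cueing logic described here reduces to a simple subtraction: facilitation is the mean saccade latency on uncued trials minus the mean latency on cued trials. A minimal sketch in Python, using made-up latency values rather than data from Butcher (2000):

```python
# Cue facilitation in covert orienting: a spatial cue too brief to trigger an
# eye movement should nonetheless speed the subsequent saccade to the target.
# All latency values (ms) below are hypothetical illustrations, not real data.

def mean(values):
    return sum(values) / len(values)

def cue_facilitation(cued_latencies, uncued_latencies):
    """Facilitation = mean uncued saccade latency - mean cued latency.
    A positive value means the covert cue sped orienting to the target."""
    return mean(uncued_latencies) - mean(cued_latencies)

# Hypothetical infant saccade latencies (ms) at two ages
age_4mo = cue_facilitation(cued_latencies=[420, 450, 420],
                           uncued_latencies=[440, 470, 440])
age_12mo = cue_facilitation(cued_latencies=[300, 320, 310],
                            uncued_latencies=[380, 400, 390])

print(age_4mo, age_12mo)  # facilitation grows over the first year of life
```

The developmental claim in the text corresponds to the facilitation score increasing with age, as in the two hypothetical samples above.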
The orienting network appears to be fully functional by around 6 months of age. During
the first months of life there is a progressive development of the connection between vi
sual processing pathways and parietal systems involved in attentional control. This matu
ration allows for visual orientation to be increasingly under attentional control. The dorsal visual pathway, primarily involved in processing the spatial properties of objects and locations, matures earlier than the ventral visual pathway, which is involved in object identification. This explains why infants show a preference for novel locations before they show a preference for novel objects (Harman, Posner, Rothbart, & Thomas-Thrapp, 1994). Preference for novel locations is indeed shown from very early on. Inhibition of return is shown by newborns for
very salient objects such as lights (Valenza, Simion, & Umiltà, 1994), and sometime later,
at about 6 months of age, for more complex objects (Harman et al., 1994).
Gaining control over disengaging attention is also necessary to be able to shift between
objects or locations. Infants in the first 2 or 3 months of life often have a hard time disen
gaging from salient objects and events and might become distressed before they are able
to move away from the target. By 4 months, however, infants become more able to look
away from central displays (Johnson, Posner, & Rothbart, 1991). (p. 304) From then on,
the latency to turn from central to peripheral displays decreases substantially with age
(Casey & Richards, 1988). Before disengaging attention, the heart rate begins to acceler
ate back to preattentive levels, and infants become more distractible. After termination of
attention, there seems to be a refractory period of about 3 seconds during which orient
ing to a novel event in the previous location or a nearby one is inhibited (Casey &
Richards, 1991), a process that might be related to inhibition of return. The ability to dis
engage gaze is achieved before the capacity to disengage attention. The voluntary disen
gagement of attention requires further inhibitory skills that appear to emerge later on, at
about 18 months of age (Ruff & Rothbart, 1996).
Late infancy is the time when self-regulation develops. At about the end of the first year,
executive attention-related frontal structures come into play, and this allows for a pro
gressive increase in the duration of orientation based on goals and intentions. For in
stance, periods of focused attention during free play increase steadily after the first year
of life (Ruff & Lawson, 1990). With the development of voluntary attention, young chil
dren become increasingly more sensitive to others’ line of regard, establishing a basis for
joint attention and social influences on selectivity. Increasingly, infants are able to gain
control of their own emotions and other behaviors, and this transition marks the emer
gence of the executive attention system.
Perhaps the earliest evidence of activation of the executive attention network is at about
7 months of age. As discussed earlier, an important form of self-regulation is related to
the ability to detect errors, which has been linked to activation of the ACC. One study ex
amined the ability of infants of 7 to 9 months to detect errors (Berger, Tsur, & Posner,
2006). In this study, infants observed a scenario in which one or two puppets were hidden
behind a screen. A hand was seen to reach behind the screen and either add or remove a
puppet. When the screen was removed, there were either the correct number of puppets
or an incorrect number. Wynn (1992) found that infants of 7 months looked longer when
the number was in error than when it was correct. Whether the increased looking time in
volved the same executive attention circuitry that is active in adults when they detect er
rors was simply unknown. Berger and colleagues replicated the Wynn study but used a
128-channel electroencephalogram (EEG) to determine the brain activity that occurred
during error trials in comparison with that found when the infant viewed a correct solu
tion. They found that the same EEG component over the same electrode sites differed be
tween correct and erroneous displays in both infants and adults. This suggests that brain anatomy similar to that found in adult studies is involved in infants’ ability to detect errors. Of
course, activating this anatomy for observing an error is not the same as what occurs in
adults, who actually slow down after an error and adjust their performance. However, it
suggests that even very early in life, the anatomy of the executive attention system is at
least partly functional.
Later in the first year of life, there is evidence of further development of executive func
tions, which may depend on executive attention. One example is Adele Diamond’s work
using the “A not B” task and the reaching task. These two marker tasks involve inhibition
of an action that is strongly elicited by the situation. In the “A not B” task, the experi
menter shifts the location of a hidden object from location A to location B, after the
infant’s retrieval of the object from location A has been reinforced as correct on previous trials
(Diamond, 1991). In the reaching task, visual information about the correct route (p. 305)
to a toy is put in conflict with the cues that normally guide reaching. The normal tenden
cy is to reach for an object directly along the line of sight. In the reaching task, a toy is
placed under a transparent box in front of the child. The opening of the box is on one of
the lateral sides instead of the front side. In this situation, the infant can reach the toy on
ly if the tendency to reach directly along the line of sight is inhibited. Important changes
in the performance of these tasks are observed from 6 to 12 months (Diamond, 2006).
Comparison of performance between monkeys with brain lesions and human infants on
the same marker tasks suggests that the tasks are sensitive to the development of the
prefrontal cortex, and maturation of this brain area seems to be critical for the develop
ment of this form of inhibition.
Another task that reflects the executive system involves anticipatory looking in a visual
sequence task. In the visual sequence task, stimuli are placed in front of the infant in a
fixed and predictable sequence of locations. The infant’s eyes are drawn reflexively to the
stimuli because they are designed to be attractive and interesting. After a few trials,
some infants will begin to anticipate the location of the next target by correctly moving
their eyes in anticipation of the target. Anticipatory looks are thought to reflect the devel
opment of a more voluntary attention system that might depend in part on the orienting
network and also on the early development of the executive network. It has been shown
that anticipatory looking occurs with infants as young as 3½ to 4 months (Clohessy, Pos
ner, & Rothbart, 2001; Haith, Hazan, & Goodman, 1988). However, there are also impor
tant developments that occur during infancy (Pelphrey et al., 2004) and later (Garon,
Bryson, & Smith, 2008). Learning more complex sequences of stimuli, such as sequences
in which a location is followed by one of two or more different locations, depending on
the location of previous stimuli within the sequence (e.g., location 1, then location 2, then
location 1, then location 3, and so on…), requires the monitoring of context and, in adult
studies, has been shown to depend on lateral prefrontal cortex (Keele, Ivry, Mayr, Hazel
tine, & Heuer, 2003). Infants of 4 months do not learn to go to locations where there is
conflict as to which location is the correct one. The ability to respond when such conflict
occurs is not present until about 18 to 24 months of age (Clohessy et al., 2001). At 3
years, the ability to respond correctly when there is conflict in the sequential looking task
correlates with the ability to resolve conflict in a spatial conflict task (Rothbart, Ellis,
Rueda, & Posner, 2003). These findings support the slow development of the executive at
tention network during the first and second years of life.
The visual sequence task is related to other features that reflect executive attention. One
of these is the cautious reach toward novel toys. Rothbart and colleagues found that the
slow, cautious reach of infants of 10 months predicted higher levels of effortful control as
measured by parent report at 7 years of age (Rothbart, Ahadi, Hershey, & Fisher, 2001). In
fants of 7 months who show higher levels of correct anticipatory looking in the visual se
quence task also show longer inspection before reaching toward novel objects and slower
reaching toward the object (Sheese, Rothbart, Posner, Fraundorf, & White, 2008). This
suggests that successful anticipatory looking at 7 months is one feature of self-regulation.
In addition, infants with higher levels of correct anticipatory looking also showed evi
dence for higher levels of emotionality in a distressing task and more evidence of efforts
to self-regulate their emotional reactions. Thus, even at 7 months, the executive attention
system is showing some properties of self-regulation, even though it is not yet sufficiently
developed to resolve the simple conflicts used in the visual sequence task or the task of
reaching away from the line of sight in the transparent box task.
An important question about early development of executive attention is its relation to the
orienting network. The findings to date suggest that orienting plays some of the regulatory roles in early infancy that are later exercised by the executive network. I argued earlier that the orienting network seems to have a critical role in the caregiver’s regulation of the infant’s emotion as early as 4 months. It has recently been shown that orienting as measured
from the Infant Behavior Questionnaire at 7 months is not correlated with effortful con
trol as measured in the same infants at 2 years (Sheese, Voelker, Posner, & Rothbart,
2009). However, orienting did show some relation to infants’ early regulation of emotional responding: it was positively related to positive affect and negatively related to negative affect. After toddlerhood, emotional control may transition from the orienting network to executive attention, because a negative relationship
between effortful control and negative affect has been repeatedly found in preschool-
aged and older children (Rothbart & Rueda, 2005).
tion by 4 to 5 years. This schedule parallels the order of our discussion but, as discussed in the next section, all of these functions continue to develop during childhood.
Childhood
In the preschool years, children become more able to follow instructions and perform RT
tasks. To study the development of attention functions across childhood, a child-friendly
version of the ANT was developed (Rueda, Fan, et al., 2004). This version is structurally
similar to the adult version but uses fish instead of arrows as target stimuli. This allows
contextualization of the task in a game in which the goal is to feed the middle fish (tar
get), or simply to make it happy, by pressing a key corresponding to the direction in
which it points. After the response is made, feedback consisting of an animation of the middle fish is provided, which is intended to sustain the child’s motivation to complete the task.
Using the child ANT, the development of attention networks has been traced during the
primary school period into early adolescence. In a first study, this task was used with chil
dren aged 6 to 10 years and in adults (Rueda, Fan, et al., 2004). Results showed separate
developmental trajectories for each attention function. Alerting scores showed stability
across early and middle childhood, but children’s scores were higher than the scores ob
tained by adults, suggesting further development of alerting during late childhood. Ori
enting scores showed no differences across ages, suggesting an early development of this
network. However, in this study invalid cues were not used; thus, the load of operations of
disengagement and reallocation of attention was rather low. (p. 307) Finally, flanker inter
ference scores indexing executive attention efficiency showed considerable reduction
from age 6 to 7 years. However, interference, as calculated with both RT and percentage
of errors, remained about the same from age 7 years to adulthood, suggesting that early
childhood is the period of major development of executive attention.
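The three network scores described here are simple RT subtractions. The sketch below follows the standard ANT scoring conventions (alerting = no-cue RT − double-cue RT; orienting = center-cue RT − spatial-cue RT; conflict = incongruent RT − congruent RT); the RT values are illustrative placeholders, not data from Rueda, Fan, et al. (2004):

```python
# Standard ANT network scores as RT subtractions.
# Mean RTs (ms) per condition are illustrative placeholders, not real data.

def ant_scores(rt):
    """rt: dict mapping condition name -> mean reaction time in ms."""
    return {
        # Benefit of a temporal warning signal (larger = more benefit from alerting)
        "alerting": rt["no_cue"] - rt["double_cue"],
        # Benefit of knowing in advance where the target will appear
        "orienting": rt["center_cue"] - rt["spatial_cue"],
        # Cost of resolving flanker conflict (smaller = more efficient executive attention)
        "conflict": rt["incongruent"] - rt["congruent"],
    }

child_rt = {"no_cue": 980, "double_cue": 910, "center_cue": 940,
            "spatial_cue": 900, "incongruent": 1050, "congruent": 930}
print(ant_scores(child_rt))
# -> {'alerting': 70, 'orienting': 40, 'conflict': 120}
```

On this scoring, the developmental findings above correspond to children showing larger alerting and conflict scores than adults, with the conflict score shrinking sharply between ages 6 and 7.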
One of the strengths of the child ANT is that it is a theoretically grounded task that com
bines experimental strategies widely used to study alerting (e.g., warning signals), orient
ing (e.g., spatial cues), and attention control (e.g., flanker conflict) within the same task.
However, much of the developmental research on attention has been conducted separate
ly for each function. The main findings of this research are reviewed next.
Alerting
Several studies have examined developmental changes in phasic alertness among preschoolers, older children, and adults. Increasing age is generally associated with larger reductions in RT in response to warning cues. Developmental differences in response
preparation may relate to the speed and maintenance of preparation while expecting the
target. It has been shown that young children (5-year-olds) need more time than older
children (8-year-olds) and adults to get full benefit from a warning cue (Berger, Jones,
Rothbart, & Posner, 2000), and they also seem to be less able to sustain the optimal level
of alertness over time (Morrison, 1982). Using the child ANT, Mezzacappa (2004)
observed a trend toward higher alerting scores (the difference in RT between trials with and without warning cues) with age in a sample of 5- to 7-year-old children. Older children
also show lower rates of omissions overall, which indicates greater ability to remain vigi
lant during the task period. The fact that alertness is subject to more fluctuations in
younger children can in part explain age differences in processing speed because alert
ness is thought to speed the processing of subsequent events.
Developmental changes in alertness during childhood and early adolescence appear to re
late to continuous maturation of frontal systems during this period. One way to examine
brain mechanisms underlying changes in alertness is through studying the contingent negative variation (CNV), an electrophysiological index associated with activation in the right ventral and medial frontal
areas of the brain (Segalowitz & Davies, 2004). In adolescents as well as adults, the CNV
has been shown to relate to performance in various measures of intelligence and execu
tive functions as well as functional capacity of the frontal cortex (Segalowitz, Unsal, &
Dywan, 1992). Various studies have shown that the amplitude of the CNV increases with
age, especially during middle childhood. Jonkman (2006) found that the CNV amplitude is
significantly smaller for 6- to 7-year-old children compared with adults, but no differences
were observed between 9- and 10-year-olds and adults. Moreover, the difference in CNV
amplitude between children and adults seems to be restricted to early components of the
CNV observed over right frontal-central channels (Jonkman, Lansbergen, & Stauder,
2003), which suggests a role of maturation of the frontal alerting network.
A similar developmental pattern emerges when orienting and selectivity are studied in
the auditory modality. Coch et al. (2005) developed a task to measure sustained selective
auditory attention (p. 309) in young children. They used a dichotic listening task in which
participants were asked to selectively focus attention on one auditory stream consisting of
a narration of a story while ignoring a different stream containing another story. A pic
ture related to the to-be-attended story is visually presented to the child in order to facili
tate the task. ERPs are then recorded to probes embedded in the attended and unattend
ed channels. Adults show increased P1 and N1 amplitudes to probes presented in the at
tended stream. Six- to 8-year-old children and preschoolers also show an attention-relat
ed modulation of ERP components, which is more sustained in children than in adults (Coch
et al., 2005; Sanders, Stevens, Coch, & Neville, 2006). Differences in the topographic dis
tribution of the effect between children and adults also suggest that brain mechanisms
related to sustained selectivity continue developing beyond middle childhood. Additional
ly, further development is necessary when attention has to be disengaged from the at
tended channel and moved to a different one, as occurs in the visual modality. An im
portant improvement in performance between 8 and 11 years of age has been reported
when children are asked to disengage attention from the attended channel and reallocate
it to the other channel in the dichotic listening task (Pearson & Lane, 1991).
Executive Attention
Children are able to perform simple conflict tasks in which their RT can be measured
from age 2 years on, although the tasks need to be adapted to appear child friendly. One
such adaptation is the spatial conflict task (Gerardi-Caulton, 2000), which induces con
flict between the identity and the location of objects. It is a touch-screen task in which
pictures of houses of two animals (i.e., a duck and a cat) are presented in the bottom left
and right sides of the screen, then one of the two animals appears either on the left or
right side of the screen in each trial, and the child is required to touch the house corre
sponding to the animal. Location is the dominant aspect of the stimulus, although instruc
tions require responding according to its identity. Thus, conflict trials in which the animal
appears on the side of the screen opposite to its house usually result in slower responses
and larger error rates than nonconflict trials (when the animal appears on the side of its
house). Between 2 and 4 years of age, children progressed from an almost complete in
ability to carry out the task to relatively good performance. Whereas 2-year-old children tended to perseverate on a single response, 3-year-olds performed at high accuracy levels, although, like adults, they responded more slowly and less accurately on conflict trials (Gerardi-Caulton, 2000; Rothbart, Ellis, Rueda, & Posner, 2003).
Another way to study action monitoring consists of examining the detection and correc
tion of errors. While performing the spatial conflict task, 2½- and 3-year-old children
showed longer RT following erroneous trials than following correct ones, indicating that
children were noticing their errors and using them to guide performance in the next trial.
However, no evidence of slowing following an error was found at 2 years of age (Rothbart
et al., 2003). A similar result with a different time frame was found when using a version
of the Simon Says game. In this task, children are asked to execute a response when a
command is given by one stuffed animal and to inhibit a response commanded by a sec
ond animal (Jones, Rothbart, & Posner, 2003). Children 36 to 38 months of age were un
able to inhibit their response and did not show the slowing-after-error effect, but at 39 to
41 months of age, children showed both the ability to inhibit and the slowing of RT follow
ing errors. These results suggest that between 30 and 39 months of age, children greatly
develop their ability to detect and correct erroneous responses and that this ability may
relate to the development of inhibitory control.
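The slowing-after-error effect reported in these studies is typically quantified as mean RT on trials following an error minus mean RT on trials following a correct response. A minimal sketch with hypothetical trial data:

```python
# Post-error slowing: RT on trials that follow an error minus RT on trials
# that follow a correct response. Trial data below are hypothetical.

def post_error_slowing(trials):
    """trials: list of (rt_ms, correct) tuples in presentation order.
    Returns mean RT after errors minus mean RT after correct trials."""
    after_error, after_correct = [], []
    for prev, curr in zip(trials, trials[1:]):
        (_, prev_correct), (curr_rt, _) = prev, curr
        (after_correct if prev_correct else after_error).append(curr_rt)
    mean = lambda xs: sum(xs) / len(xs)
    return mean(after_error) - mean(after_correct)

# Hypothetical toddler data: RTs rise on the trials after the two errors
trials = [(900, True), (950, True), (1100, False), (1400, True),
          (920, True), (1050, False), (1300, True), (940, True)]
print(post_error_slowing(trials))  # positive value -> slowing after errors
```

On this measure, the findings above amount to a post-error slowing score near zero before about 39 months of age and reliably positive thereafter.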
Data collected with the ANT and reported earlier suggested that the development of con
flict resolution continues during the preschool and early childhood periods. Nonetheless,
studies in which the difficulty of the conflict task is increased by other demands such as
switching rules or holding more information in working memory have shown further de
velopment of conflict resolution between late childhood and adulthood. For example,
Davidson et al. (2006) manipulated memory load, inhibitory demand, and rule switching
(cognitive flexibility) in a spatial conflict task. They found that the cost resulting from the
need for inhibitory control was larger for children than for adults. Also, even under low
memory load conditions, the switching cost was still larger for 13-year-old children than
for adults. The longer developmental course observed with this task might be due to the
recruitment of frontal brain areas additional to those involved in executive attention.
Other studies have used ERP to examine the brain mechanisms that underlie the develop
ment of executive attention. In one of these studies, a flanker task was used to compare
conflict resolution (p. 310) in three groups of children aged 5 to 6, 7 to 9, and 10 to 12
years, and a group of adults (Ridderinkhof & van der Molen, 1995). Developmental differ
ences were examined in two ERP components, one related to response preparation (LRP)
and another one related to stimulus evaluation (P3). The authors found differences be
tween children and adults in the latency of the LRP, but not in the latency of the P3 peak,
suggesting that developmental differences in the ability to resist interference are mainly
related to response competition and inhibition, but not to stimulus evaluation.
As discussed earlier, brain responses to errors are also informative of the efficiency of the
executive attention system. The amplitude of the ERN seems to reflect detection of the
error as well as its salience in the context of the task and therefore is subject to individ
ual differences in affective style or motivation. Generally, larger ERN amplitudes are as
sociated with greater engagement in the task and/or greater efficiency of the error-detec
tion system (Santesso, Segalowitz, & Schmidt, 2005; Tucker, Hartry-Speiser, McDougal,
Luu, & deGrandpre, 1999). Developmentally, the amplitude of the ERN shows a progres
sive increase during childhood into late adolescence (Segalowitz & Davies, 2004), with
young children (age 7–8 years) being less likely to show the ERN to errors than older chil
dren and adults—at least when performing a flanker task.
Another evoked potential, the N2, is also modulated by the requirement for executive
control (Kopp, Rist, & Mattler, 1996) and has been associated with a source of activation
in the ACC (van Veen & Carter, 2002). In a flanker task, such as the fish version of the
child ANT, adults show larger N2 for incongruent trials over the mid-frontal leads (Rueda,
Posner, Rothbart, & Davis-Stober, 2004b). Four-year-old children also show a larger nega
tive deflection for the incongruent condition compared with the congruent one at mid-
frontal electrodes. However, compared with adults, this congruency effect was larger and extended over a longer period of time. Later in childhood, developmental studies
have shown a progressive decrease in the amplitude and latency of the N2 effect with age
(Davis, Bruce, Snyder, & Nelson, 2004; Johnstone, Pleffer, Barry, Clarke, & Smith, 2005;
Jonkman, 2006). The reduction of the amplitude appears to relate to the increase in effi
ciency of the system and not to the overall amplitude decrease that is observed with age
(Lamm, Zelazo, & Lewis, 2006). Also, the effects are more widely distributed for young
children, and they become more focalized with age (Jonkman, 2006; Rueda et al., 2004b).
Source localization analyses indicate that, compared with adults, additional sources of activation are needed to adequately explain the scalp distribution in children (Jonkman, Sniedt, & Kemner, 2007).
The focalization of signals in adults compared with children is consistent with neuroimag
ing studies, in which children appear to activate the same network of areas as adults
when performing similar tasks, but the average volume of activation appears to be re
markably greater in children than in adults (Casey, Thomas, Davidson, Kunz, & Franzen,
2002; Durston et al., 2002). Altogether, these data suggest that the brain circuitry under
lying executive functions becomes more focal and refined as it gains in efficiency. This
maturational process involves not only greater anatomical specialization but also a reduction in the time these systems need to resolve each of the processes implicated in the task.
This is consistent with recent data showing that the network of brain areas involved in at
tentional control shows increased segregation of short-range connections but increased
integration of long-range connections with maturation (Fair et al., 2007). Segregation of
short-range connectivity may be responsible for greater local specialization, whereas in
tegration of long-range connectivity likely increases efficiency by improving coordinated
responses between different processing networks.
In summary, evidence shows that the attention networks have different developmental
courses across childhood. The different developmental courses are represented in Figure
15.2. Shortly after birth infants show increasing levels of alertness, although alertness is
highly dependent on exogenous stimulation. Then, preparation from warning signals
shows a progressive development during the first years of life, whereas the ability to en
dogenously sustain attention improves up to late childhood. Orienting also shows pro
gressive development of increasingly complex functions. Infants are able to orient to ex
ternal cues by age 4 months. From there, children’s orientation becomes increasingly pre
cise and less dependent on exogenous stimulation. Endogenous disengagement and reori
enting of attention progress up to late childhood and early adolescence. The earliest signs
of executive attention appear by the end of the first year of life. From there on, there is a
progressive development, especially during the preschool period, of the ability to inhibit
dominant responses and suppress irrelevant stimulation. However, with increasingly com
plex conditions, as when other executive functions (p. 311) (e.g., working memory, plan
ning) come into play, executive attention shows further development during childhood
and adolescence.
The relationship between poor attention, school maladjustment, and low academic
achievement seems to be consistent across ages and cultures. Mechanisms of the execu
tive attention network are likely to play a role in this relationship. We have recently found
that brain activation registered while performing the flanker task is a significant predic
tor of mathematics grades in school (Checa & Rueda, 2011). Also, in a study conducted
with a flanker task and ERPs, children who committed more errors on incongruent trials
showed smaller amplitudes of the ERN. This result suggests that these children’s brains are less sensitive to the commission of errors. Moreover, the amplitude of the ERN was
predicted by individual differences in social behavior, in that children with poorer social
sensitivity as assessed by a self-report personality questionnaire showed ERNs of smaller
amplitude (Santesso et al., 2005). On the other hand, empathy appears to show a positive
relation with amplitude of the ERN (Santesso & Segalowitz, 2009). Altogether, these data
suggest that attentional flexibility is required to link affect, internalized social norms, and
action in everyday life situations (Rueda, Checa, & Rothbart, 2010).
Studies of this sort are important for establishing links between biology and behavior.
Knowing the neural substrates of attention also provides a tool for examining which as
pects of the attention functions are subject to genetic influence, as well as how the effi
ciency of this system may be influenced by experience.
Genes
Describing the development of a specific neural network is only one step toward a biologi
cal understanding. It is also important to know the genetic and environmental influences
that together build up the neural network. Some functions may be more subject to genetic influences than others. The degree of heritability of each attention network in Posner’s model
was examined in a twin study using the scores provided by the ANT as phenotypes (Fan,
Wu, Fossella, & Posner, 2001). The study showed that the executive attention network
and, to a lesser degree, the alerting network show evidence of heritability. This suggested
that genetic variation contributes to individual differences at least in these two functions.
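Heritability in twin designs of this kind is often approximated with Falconer’s formula, h² = 2(rMZ − rDZ), which compares identical (MZ) and fraternal (DZ) twin correlations on the same phenotype. A sketch using hypothetical correlations, not the values reported by Fan, Wu, Fossella, and Posner (2001):

```python
# Falconer's approximation of heritability from twin correlations:
# h^2 = 2 * (r_MZ - r_DZ). The correlations below are hypothetical
# illustrations, not values from the Fan et al. (2001) twin study.

def falconer_h2(r_mz, r_dz):
    """Rough heritability estimate from identical (MZ) and fraternal (DZ)
    twin correlations on the same phenotype (e.g., an ANT network score)."""
    return 2 * (r_mz - r_dz)

print(falconer_h2(r_mz=0.70, r_dz=0.40))  # roughly 0.6 with these values
```

A larger MZ–DZ gap for the executive attention score than for the orienting score would correspond to the pattern of differential heritability described in the text.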
Links between specific neural networks of attention and chemical modulators make it possible to investigate the genetic basis of normal attention (Fossella et al., 2002; Green et al., 2008).
Information on the neuromodulators that influence the function of the attention networks
has been used to search for genes related to these modulators. Thus, since the sequenc
ing of the human genome in the year 2001 (Venter et al., 2001), many studies have shown
that genes influence individuals’ attention capacity (see Posner, Rothbart, & Sheese,
2007; see also Table 15.1). Polymorphisms in genes related to the norepinephrine and
dopamine systems have been associated with individual differences in the efficiency of
alerting and executive attention (Fossella et al., 2002). For example, it has been found
that scores of the executive attention network are specifically related to variations on the
dopamine receptor D4 (DRD4) gene and the monoamine oxidase-A (MAOA) gene (Fossella
et al., 2002), as well as other dopaminergic genes such as the dopamine transporter 1
(DAT1) gene (Rueda, Rothbart, McCandliss, Saccomanno, & Posner, 2005) and the cate
chol-O-methyltransferase (COMT) gene (Diamond, Briand, Fossella, & Gehlbach, 2004).
Moreover, individuals carrying the alleles associated with better performance showed
greater activation in the anterior cingulate gyrus while performing the ANT (Fan, Fossel
la, Sommer, Wu, & Posner, 2003). On the other hand, polymorphisms of genes influencing
the function of cholinergic receptors have been associated with attentional orienting as
measured with both RT and ERP (Parasuraman, Greenwood, Kumar, Fossella, et al., 2005;
Winterer et al., 2007).
Experience
Genetic data could wrongly lead to the impression that attention is not susceptible to the
environment and cannot be enhanced or harmed by experience. However, this conclusion
would greatly contradict evidence on the extraordinarily plastic capacity of the human
nervous system, especially during development (see Posner & Rothbart, 2007). There is
some evidence suggesting that susceptibility to the environment might even be embed
ded in genetic endowment because some genetic polymorphisms, often under positive se
lection, appear to make children more susceptible to environmental factors such as par
enting (Sheese, Voelker, Rothbart, & Posner, 2007).
In recent years, much evidence has accumulated in favor of the susceptibility of systems of self-regulation to the influence of experience. One piece of evidence comes from studies showing the vulnerability of attention to environmental factors such as parenting and socioeconomic status (SES; Bornstein & Bradley, 2003). Noble, McCandliss, and Farah (2007) assessed the predictive power of SES across a wide range of cognitive abilities in children. These investigators found that SES accounts for portions of variance, particularly in language but also in other higher functions, including executive attention. Parental level of education is also an important environmental factor, highly predictive of family SES. A recent study has shown that children whose parents have lower levels of education appear to have more difficulty filtering out irrelevant information, as indexed by ERPs, than children of highly educated parents (Stevens, Lauinger, & Neville, 2009). All these data indicate that children's experiences can shape the functional efficiency of brain networks, and they also suggest that providing children with appropriate experiences may be a good way to enhance attentional skills.
Combita, 2012). In this study, trained children showed faster and more efficient activation of the brain circuitry involved in executive attention, an effect that was still observed at 2 months' follow-up. An important question related to intervention is whether it has the potential to overcome the influence of negative experience or unfavorable constitutional conditions. Although further data on this question are undoubtedly needed, current evidence indicates that training may be an important tool, especially for children at greater risk of experiencing attentional difficulties.
Consistent with our results, other studies have shown beneficial effects of cognitive training on attention and other executive functions during development. For instance, auditory selective attention was improved by training with a computerized program designed to promote oral language skills in both language-impaired and typically developing children (Stevens, Fanning, Coch, Sanders, & Neville, 2008). Klingberg and colleagues have shown that working memory training has benefits and shows some degree of transfer to aspects of attention (Thorell, Lindqvist, Nutley, Bohlin, & Klingberg, 2009). The Klingberg group has also shown that training can affect various levels of brain function, including activation (Olesen, Westerberg, & Klingberg, 2004) and changes in the dopamine D1 receptor system (McNab et al., 2009) in areas of the cerebral cortex involved in the trained function.
There is also some evidence that curricular interventions carried out directly in the classroom can lead to improvements in children's cognitive control. Diamond et al. (2007) tested the influence of a specific curriculum on preschoolers' control abilities and found beneficial effects as measured by various conflict tasks. A somewhat indirect, but probably no less beneficial, way of fostering attention in school could be provided by multilingual education. There is growing evidence that bilingual individuals perform better on executive attention tasks than monolinguals (Bialystok, 1999; Costa, Hernandez, & Sebastian-Galles, 2008). The idea is that people who use multiple languages on a regular basis may train executive attention because of the need to suppress one language while using the other.
Although all this evidence shows promising results regarding the effectiveness of interventions and particular educational methods in promoting attentional skills, questions about various aspects of training remain to be answered. In future studies, it will be important to address questions such as whether genetic variation and other constitutionally based variables influence the extent to which the executive attention network can be modified by experience, and whether there are limits to the ages at which training can be effective. Additionally, further research will be needed to examine whether the beneficial effects of these interventions transfer to abilities relevant to school competence.
of alertness, selectivity, and executive control. These various functions have been addressed in cognitive studies conducted for the most part during the second half of the twentieth century. Since then, imaging studies have shown that each function is associated with activation of a particular network of brain structures. It has also been determined that the function of each attention network appears to be modulated by particular neurochemical mechanisms. Using Posner's neuroanatomical model as a theoretical framework, I have reviewed developmental studies conducted with infants and children. Evidence shows that maturation of each attention network follows a particular trajectory that extends from birth to late childhood and, in the case of executive control, may continue during adolescence. Beyond the progressive competence acquired with maturation, the efficiency of attentional functions is subject to important differences among individuals of the same age. Evidence shows that individual differences in attentional efficiency depend on constitutional as well as environmental factors, or a combination of both. Importantly, these individual differences appear to contribute substantially to central aspects of children's lives, including social-emotional development and school competence. For the future, it will be important to understand the mechanisms by which genes and experience influence the organization and efficiency of the attention networks, and whether there are sensitive periods during which interventions to foster attention may yield the greatest benefits.
Author Note
This work was supported by grants from the Spanish Ministry of Science and Innovation, refs. PSI2008-02955 and PSI2011-27746.
References
Abundis, A., Checa, P., Castellanos, C., & Rueda, M. R. (2013). Electrophysiological corre
lates of the development of attention networks in childhood. Manuscript submitted for
publication.
Beauregard, M., Levesque, J., & Bourgouin, P. (2001). Neural correlates of conscious self-
regulation of emotion. Journal of Neuroscience, 21 (18), 6993–7000.
Berger, A., Jones, L., Rothbart, M. K., & Posner, M. I. (2000). Computerized games to
study the development of attention in childhood. Behavior Research Methods, Instru
ments & Computers, 32 (2), 297–303.
Berger, A., Tzur, G., & Posner, M. I. (2006). Infant brains detect arithmetic errors. Pro
ceedings of the National Academy of Sciences, 103 (33), 12649–12653.
Bialystok, E. (1999). Cognitive complexity and attentional control in the bilingual mind.
Child Development, 70 (3), 636.
Bornstein, M. H., & Bradley, R. H. (2003). Socioeconomic status, parenting, and child de
velopment. Mahwah, NJ: Erlbaum.
Botvinick, M. M., Braver, T. S., Barch, D. M., Carter, C. S., & Cohen, J. D. (2001). Conflict
monitoring and cognitive control. Psychological Review, 108 (3), 624–652.
Botvinick, M., Nystrom, L. E., Fissell, K., Carter, C. S., & Cohen, J. D. (1999). Conflict
monitoring versus selection-for-action in anterior cingulate cortex. Nature, 402 (6758),
179–181.
Brozoski, T. J., Brown, R. M., Rosvold, H. E., & Goldman, P. S. (1979). Cognitive deficit
caused by regional depletion of dopamine in prefrontal cortex of rhesus monkey. Science,
205, 929–932.
Bush, G., Luu, P., & Posner, M. I. (2000). Cognitive and emotional influences in anterior
cingulate cortex. Trends in Cognitive Sciences, 4 (6), 215–222.
Butcher, P. R. (2000). Longitudinal studies of visual attention in infants: The early devel
opment of disengagement and inhibition of return. Meppel, The Netherlands: Aton.
Callejas, A., Lupiáñez, J., & Tudela, P. (2004). The three attentional networks: On their in
dependence and interactions. Brain and Cognition, 54 (3), 225–227.
Casey, B. J., & Richards, J. E. (1988). Sustained visual attention in young infants measured
with an adapted version of the visual preference paradigm. Child Development, 59, 1514–
1521.
Casey, B. J., & Richards, J. E. (1991). A refractory period for the heart rate response in in
fant visual attention. Developmental Psychobiology, 24, 327–340.
Casey, B., Thomas, K. M., Davidson, M. C., Kunz, K., & Franzen, P. L. (2002). Dissociating
striatal and hippocampal function developmentally with a stimulus-response compatibility
task. Journal of Neuroscience, 22 (19), 8647–8652.
Checa, P., Rodriguez-Bailon, R., & Rueda, M. R. (2008). Neurocognitive and temperamen
tal systems of self-regulation and early adolescents’ school competence. Mind, Brain and
Education, 2 (4), 177–187.
Checa, P., & Rueda, M. R. (2011). Behavioral and brain measures of executive attention
and school competence in late childhood. Developmental Neuropsychology, 36 (8), 1018–
1032.
Cherry, E. C. (1953). Some experiments on the recognition of speech, with one and with two ears. Journal of the Acoustical Society of America, 25, 975–979.
Clohessy, A. B., Posner, M. I., & Rothbart, M. K. (2001). Development of the functional vi
sual field. Acta Psychologica, 106 (1–2), 51–68.
Coch, D., Sanders, L. D., & Neville, H. J. (2005). An event-related potential study of selec
tive auditory attention in children and adults. Journal of Cognitive Neuroscience, 17 (4),
605–622.
Colombo, J. (2001). The development of visual attention in infancy. Annual Review of Psy
chology, 52, 337–367.
Colombo, J., & Horowitz, F. D. (1987). Behavioral state as a lead variable in neonatal re
search. Merrill-Palmer Quarterly, 33, 423–438.
Corbetta, M., & Shulman, G. L. (2002). Control of goal-directed and stimulus-driven atten
tion in the brain. Nature Reviews Neuroscience, 3 (3), 201–215.
Costa, A., Hernandez, M., & Sebastian-Galles, N. (2008). Bilingualism aids conflict resolu
tion: Evidence from the ANT task. Cognition, 106 (1), 59–86.
Coull, J. T., Frith, C. D., Büchel, C., & Nobre, A. C. (2000). Orienting attention in time: Be
havioral and neuroanatomical distinction between exogenous and endogenous shifts.
Neuropsychologia, 38 (6), 808–819.
Coull, J. T., Frith, C. D., Frackowiak, R. S. J., & Grasby, P. M. (1996). A fronto-parietal net
work for rapid visual information processing: A PET study of sustained attention and
working memory. Neuropsychologia, 34 (11), 1085–1095.
Coull, J. T., Nobre, A. C., & Frith, C. D. (2001). The noradrenergic a2 agonist clonidine
modulates behavioural and neuroanatomical correlates of human attentional orienting
and alerting. Cerebral Cortex, 11 (1), 73–84.
Crottaz-Herbette, S., & Menon, V. (2006). Where and when the anterior cingulate cortex
modulates attentional response: Combined fMRI and ERP evidence. Journal of Cognitive
Neuroscience, 18 (5), 766–780.
Cui, R. Q., Egkher, A., Huter, D., Lang, W., Lindinger, G., & Deecke, L. (2000). High resolu
tion spatiotemporal analysis of the contingent negative variation in simple or complex mo
tor tasks and a non-motor task. Clinical Neurophysiology, 111 (10), 1847–1859.
Curtindale, L., Laurie-Rose, C., Bennett-Murphy, L., & Hull, S. (2007). Sensory modality,
temperament, and the development of sustained attention: A vigilance study in children
and adults. Developmental Psychology, 43 (3), 576–589.
Danis, A., Pêcheux, M.-G., Lefèvre, C., Bourdais, C., & Serres-Ruel, J. (2008). A continuous
performance task in preschool children: Relations between attention and performance.
European Journal of Developmental Psychology, 5 (4), 401–418.
Davidson, M. C., Amso, D., Anderson, L. C., & Diamond, A. (2006). Development of cogni
tive control and executive functions from 4 to 13 years: Evidence from manipulations of
memory, inhibition, and task switching. Neuropsychologia, 44 (11), 2037–2078.
Davidson, M. C., Cutrell, E. B., & Marrocco, R. T. (1999). Scopolamine slows the
Davidson, M. C., & Marrocco, R. T. (2000). Local infusion of scopolamine into intrapari
etal cortex slows covert orienting in rhesus monkeys. Journal of Neurophysiology, 83 (3),
1536–1549.
Davis, E. P., Bruce, J., Snyder, K., & Nelson, C. A. (2004). The X-trials: Neural correlates of
an inhibitory control task in children and adults. Journal of Cognitive Neuroscience, 15,
532–443.
Dehaene, S., Posner, M. I., & Tucker, D. M. (1994). Localization of a neural system for er
ror detection and compensation. Psychological Science, 5 (5), 303–305.
Desimone, R., & Duncan, J. (1995). Neural mechanisms of selective visual attention. An
nual Review of Neuroscience, 18, 193–222.
Diamond, A. (1991). Neuropsychological insights into the meaning of object concept de
velopment. In S. Carey & R. Gelman (Eds.), The epigenesis of mind: Essays on biology and
cognition (pp. 67–110). Hillsdale, NJ: Erlbaum.
Diamond, A., Barnett, W. S., Thomas, J., & Munro, S. (2007). Preschool program improves
cognitive control. Science, 318 (5855), 1387–1388.
Diamond, A., Briand, L., Fossella, J., & Gehlbach, L. (2004). Genetic and neurochemical
modulation of prefrontal cognitive functions in children. American Journal of Psychiatry,
161 (1), 125–132.
Dockree, P. M., Kelly, S. P., Roche, R. A. P., Reilly, R. B., Robertson, I. H., & Hogan, M. J.
(2004). Behavioural and physiological impairments of sustained attention after traumatic
brain injury. Cognitive Brain Research, 20 (3), 403–414.
Drevets, W. C., & Raichle, M. E. (1998). Reciprocal suppression of regional cerebral blood
flow during emotional versus higher cognitive processes: Implications for interactions be
tween emotion and cognition. Cognition & Emotion, 12 (3), 353–385.
Driver, J., Eimer, M., & Macaluso, E. (2007). Neurobiology of human spatial attention: Modulation, generation and integration. In N. Kanwisher & J. Duncan (Eds.), Attention and performance XX: Functional brain imaging of visual cognition (pp. 267–300). New York: Oxford University Press.
Durston, S., Thomas, K. M., Yang, Y., Ulug, A. M., Zimmerman, R. D., & Casey, B. J. (2002). A neural basis for the development of inhibitory control. Developmental Science, 5 (4), F9–F16.
Eisenberg, N., Fabes, R. A., Nyman, M., Bernzweig, J., & Pinuelas, A. (1994). The rela
tions of emotionality and regulation to children’s anger-related reactions. Child Develop
ment, 65 (1), 109–128.
Enns, J. T., & Brodeur, D. A. (1989). A developmental study of covert orienting to peripher
al visual cues. Journal of Experimental Child Psychology, 48 (2), 171–189.
Enns, J. T., & Cameron, S. (1987). Selective attention in young children: The relations be
tween visual search, filtering, and priming. Journal of Experimental Child Psychology, 44,
38–63.
Enns, J. T., & Girgus, J. S. (1985). Developmental changes in selective and integrative vi
sual attention. Journal of Experimental Child Psychology, 40, 319–337.
Eriksen, B. A., & Eriksen, C. W. (1974). Effects of noise letters upon the identification of a
target letter in a nonsearch task. Perception & Psychophysics, 16 (1), 143–149.
Etkin, A., Egner, T., Peraza, D. M., Kandel, E. R., & Hirsch, J. (2006). Resolving emotional
conflict: A role for the rostral anterior cingulate cortex in modulating activity in the amyg
dala. Neuron, 51 (6), 871–882.
Everitt, B. J., & Robbins, T. W. (1997). Central cholinergic systems and cognition. Annual
Review of Psychology, 48.
Fair, D. A., Dosenbach, N. U. F., Church, J. A., Cohen, A. L., Brahmbhatt, S., Miezin, F. M.,
et al. (2007). Development of distinct control networks through segregation and integra
tion. PNAS Proceedings of the National Academy of Sciences of the United States of
America, 104 (33), 13507–13512.
Fan, J., Flombaum, J. I., McCandliss, B. D., Thomas, K. M., & Posner, M. I. (2003). Cogni
tive and brain consequences of conflict. NeuroImage, 18 (1), 42–57.
Fan, J., Fossella, J., Sommer, T., Wu, Y., & Posner, M. I. (2003). Mapping the genetic varia
tion of executive attention onto brain activity. Proceedings of the National Academy of
Sciences U S A, 100 (12), 7406–7411.
Fan, J., McCandliss, B. D., Fossella, J., Flombaum, J. I., & Posner, M. I. (2005). The activa
tion of attentional networks. NeuroImage, 26 (2), 471–479.
Fan, J., McCandliss, B. D., Sommer, T., Raz, A., & Posner, M. I. (2002). Testing the efficien
cy and independence of attentional networks. Journal of Cognitive Neuroscience, 14 (3),
340–347.
Fan, J., Wu, Y., Fossella, J., & Posner, M. I. (2001). Assessing the heritability of attentional
networks. BMC Neuroscience, 2, 14.
Fossella, J., Sommer, T., Fan, J., Wu, Y., Swanson, J. M., Pfaff, D. W., et al. (2002). Assess
ing the molecular genetics of attention networks. BMC Neuroscience, 3, 14.
Garon, N., Bryson, S. E., & Smith, I. M. (2008). Executive function in preschoolers: A re
view using an integrative framework. Psychological Bulletin, 134, 31–60.
Gehring, W. J., Gross, B., Coles, M. G. H., Meyer, D. E., & Donchin, E. (1993). A neural sys
tem for error detection and compensation. Psychological Science, 4, 385–390.
Graham, F. K., Anthony, B. J., & Ziegler, B. L. (1983). The orienting response and developmental processes. In D. Siddle (Ed.), Orienting and habituation: Perspectives in human research (pp. 371–430). New York: Wiley.
Green, A. E., Munafo, M. R., DeYoung, C. G., Fossella, J. A., Fan, J., & Gray, J. R. (2008).
Using genetic data in cognitive neuroscience: From growing pains to genuine insights.
Nature Reviews Neuroscience, 9, 710–720.
Hackley, S. A., & Valle-Inclán, F. (1998). Automatic alerting does not speed late motoric
processes in a reaction-time task. Nature, 391 (6669), 786–788.
Haith, M. M., Hazan, C., & Goodman, G. S. (1988). Expectation and anticipation of dynam
ic visual events by 3.5 month old babies. Child Development, 59, 467–469.
Harman, C., Posner, M. I., Rothbart, M. K., & Thomas-Thrapp, L. (1994). Development of
orienting to objects and locations in human infants. Canadian Journal of Experimental
Psychology, 48, 301–318.
Harman, C., Rothbart, M. K., & Posner, M. I. (1997). Distress and attention inter
Hebb, D. O. (1949). Organization of behavior. New York: John Wiley & Sons.
Hillyard, S. A., Di Russo, F., & Martinez, A. (2006). The imaging of visual attention. In J.
Duncan & N. Kanwisher (Eds.), Attention and performance XX: Functional brain imaging
of visual cognition (pp. 381–388). Oxford, UK: Oxford University Press.
Houghton, G., & Tipper, S. P. (1994). A dynamic model of selective attention. In D. Dagenbach & T. H. Carr (Eds.), Inhibitory mechanisms in attention, memory and language (pp. 53–113). Orlando, FL: Academic Press.
Johnson, M. H., & Morton, J. (1991). Biology and cognitive development: The case of face
recognition. Oxford, UK: Blackwell.
Johnson, M. H., Posner, M. I., & Rothbart, M. K. (1991). Components of visual orienting in
early infancy: Contingency learning, anticipatory looking, and disengaging. Journal of
Cognitive Neuroscience, 3, 335–344.
Johnstone, S. J., Pleffer, C. B., Barry, R. J., Clarke, A. R., & Smith, J. L. (2005). Develop
ment of inhibitory processing during the Go/NoGo Task: A behavioral and event-related
potential study of children and adults. Journal of Psychophysiology, 19 (1), 11–23.
Jones, L. B., Rothbart, M. K., & Posner, M. I. (2003). Development of executive attention
in preschool children. Developmental Science, 6 (5), 498–504.
Jonkman, L., Sniedt, F., & Kemner, C. (2007). Source localization of the Nogo-N2: A devel
opmental study. Clinical Neurophysiology, 118 (5), 1069–1077.
Kahneman, D. (1973). Attention and effort. Englewood Cliffs, NJ: Prentice Hall.
Karnath, H. O., Ferber, S., & Himmelbach, M. (2001). Spatial awareness is a function of
the temporal not the posterior parietal lobe. Nature, 411, 950–953.
Keele, S. W., Ivry, R. B., Mayr, U., Hazeltine, E., & Heuer, H. (2003). The cognitive and
neural architecture of sequence representation. Psychological Review, 110, 316–339.
Kopp, B., Rist, F., & Mattler, U. (1996). N200 in the flanker task as a neurobehavioral tool
for investigating executive control. Psychophysiology, 33, 282–294.
Lamm, C., Zelazo, P. D., & Lewis, M. D. (2006). Neural correlates of cognitive control in
childhood and adolescence: Disentangling the contributions of age and executive func
tion. Neuropsychologia, 44 (11), 2139–2148.
Levy, F. (1980). The development of sustained attention (vigilance) in children: Some nor
mative data. Journal of Child Psychology and Psychiatry, 21 (1), 77–84.
Lin, C. C. H., Hsiao, C. K., & Chen, W. J. (1999). Development of sustained attention as
sessed using the Continuous Performance Test among children 6–15 years of age. Journal
of Abnormal Child Psychology, 27 (5), 403–412.
Mangun, G. R., & Hillyard, S. A. (1987). The spatial allocation of visual attention as in
dexed by event-related brain potentials. Human Factors, 29 (2), 195–211.
McCulloch, J., Savaki, H. E., McCulloch, M. C., Jehle, J., & Sokoloff, L. (1982). The distrib
ution of alterations in energy metabolism in the rat brain produced by apomorphine.
Brain Research, 243, 67–80.
McNab, F., Varrone, A., Farde, L., Jucaite, A., Bystritsky, P., Forssberg, H., et al. (2009).
Changes in cortical dopamine D1 receptor binding associated with cognitive training.
Science, 323 (5915), 800–802.
Newhouse, P. A., Potter, A., & Singh, A. (2004). Effects of nicotinic stimulation on cogni
tive performance. Current Opinion in Pharmacology, 4 (1), 36–46.
Noble, K. G., McCandliss, B. D., & Farah, M. J. (2007). Socioeconomic gradients predict
individual differences in neurocognitive abilities. Developmental Science, 10 (4), 464–480.
Norman, D. A., & Shallice, T. (1986). Attention to action: Willed and automatic control of
behavior. In R. J. Davison, G. E. Schwartz, & D. Shapiro (Eds.), Consciousness and self-
regulation (pp. 1–18). New York: Plenum Press.
Ochsner, K. N., Bunge, S. A., Gross, J. J., & Gabrieli, J. D. (2002). Rethinking feelings: An
fMRI study of the cognitive regulation of emotion. Journal of Cognitive Neuroscience, 14
(8), 1215–1229.
Olesen, P. J., Westerberg, H., & Klingberg, T. (2004). Increased prefrontal and parietal ac
tivity after training of working memory. Nature Neuroscience, 7 (1), 75–79.
Parasuraman, R., Greenwood, P. M., Kumar, R., Fossella, J., et al. (2005). Beyond heritabil
ity: Neurotransmitter genes differentially modulate visuospatial attention and working
memory. Psychological Science, 16, 200–207.
Pelphrey, K. A., Reznick, J. S., Goldman, B. D., Sasson, N., Morrow, J., Donahoe, A., et al.
(2004). Development of visuospatial short-term memory in the second half of the first
year. Developmental Psychology, 40 (5), 836–851.
Gazzaniga (Ed.), The cognitive neurosciences (pp. 615–624). Cambridge, MA: MIT Press.
Posner, M. I. (2008). Measuring alertness. Annals of the New York Academy of Sciences,
1129 (Molecular and Biophysical Mechanisms of Arousal, Alertness, and Attention), 193–
199.
Posner, M. I., & Boies, S. J. (1971). Components of attention. Psychological Review, 78 (5),
391–408.
Posner, M. I., & Cohen, Y. (1984). Components of visual orienting. In H. Bouma & D.
Bouwhuis (Eds.), Attention and performance X (pp. 531–556). London: Erlbaum.
Posner, M. I., & DiGirolamo, G. J. (1998). Executive attention: Conflict, target detection,
and cognitive control. Cambridge, MA: MIT Press.
Posner, M. I., & Fan, J. (2008). Attention as an organ system. In J. R. Pomerantz (Ed.), Top
ics in integrative neuroscience (pp. 31–61). New York: Cambridge University Press.
Posner, M. I., & Petersen, S. E. (1990). The attention system of the human brain. Annual Review of Neuroscience, 13, 25–42.
Posner, M. I., & Raichle, M. E. (1994). Images of mind. New York: Scientific American Li
brary; Dist. W.H. Freeman.
Posner, M. I., & Rothbart, M. K. (2007). Educating the human brain. Washington, DC:
American Psychological Association.
Posner, M. I., Rothbart, M. K., & Sheese, B. E. (2007). Attention genes. Developmental
Science, 10 (1), 24–29.
Posner, M. I., Rueda, M. R., & Kanske, P. (2007). Probing the mechanisms of attention. In J. T. Cacioppo, L. G. Tassinary, & G. G. Berntson (Eds.), Handbook of psychophysiology (3rd ed., pp. 410–432). Cambridge, UK: Cambridge University Press.
Posner, M. I., Sheese, B. E., Odludas, Y., & Tang, Y. (2006). Analyzing and shaping human
attentional networks. Neural Networks, 19 (9), 1422–1429.
Posner, M. I., & Snyder, C. R. R. (1975). Attention and cognitive control. In R. Solso (Ed.),
Information processing and cognition: The Loyola Symposium (pp. 55–85). Hillsdale, NJ:
Erlbaum.
Raz, A., & Buhle, J. (2006). Typologies of attentional networks. Nature Reviews Neuro
science, 7 (5), 367–379.
Rensink, R. A., O'Regan, J. K., & Clark, J. J. (1997). To see or not to see: The need for attention to perceive changes in scenes. Psychological Science, 8 (5), 368–373.
Richards, J. E., & Casey, B. J. (1991). Heart rate variability during attention phases in
young infants. Psychophysiology, 28 (1), 43–53.
Richards, J. E., & Hunter, S. K. (1998). Attention and eye movements in young infants:
Neural control and development. In J. E. Richards (Ed.), Cognitive neuroscience of atten
tion. Mahwah, NJ: LEA.
Ridderinkhof, K. R., & van der Molen, M. W. (1995). A psychophysiological analysis of de
velopmental differences in the ability to resist interference. Child Development, 66 (4),
1040–1056.
Rothbart, M. K., Ahadi, S. A., Hersey, K. L., & Fisher, P. (2001). Investigations of tempera
ment at three to seven years: The Children’s Behavior Questionnaire. Child Development,
72 (5), 1394–1408.
Rothbart, M. K., Ellis, L. K., Rueda, M., & Posner, M. I. (2003). Developing mechanisms of
temperamental effortful control. Journal of Personality, 71 (6), 1113–1143.
Rothbart, M. K., & Rueda, M. R. (2005). The development of effortful control. In U. Mayr,
E. Awh, & S. W. Keele (Eds.), Developing individuality in the human brain. A tribute to
Michael I. Posner (pp. 167–188). Washington, DC: American Psychological Association.
Rueda, M. R., Checa, P., & Combita, L. M. (2012). Enhanced efficiency of the executive at
tention network after training in preschool children: Immediate changes and effects after
two months. Developmental Cognitive Neuroscience, 2S, S192–S204.
Rueda, M. R., Checa, P., & Rothbart, M. K. (2010). Contributions of attentional control to
social emotional and academic development. Early Education and Development, 21 (5),
744–764.
Rueda, M., Fan, J., McCandliss, B. D., Halparin, J. D., Gruber, D. B., Lercari, L. P., et al.
(2004). Development of attentional networks in childhood. Neuropsychologia, 42 (8),
1029–1040.
Rueda, M. R., Posner, M. I., & Rothbart, M. K. (2004a). Attentional control and self-regu
lation. In R. F. Baumeister & K. D. Vohs (Eds.), Handbook of self-regulation: Research, the
ory, and applications (pp. 283–300). New York: Guilford Press.
Rueda, M. R., Posner, M. I., Rothbart, M. K., & Davis-Stober, C. P. (2004b). Development
of the time course for processing conflict: An event-related potentials study with 4 year
olds and adults. BMC Neuroscience, 5 (39), 1–13.
Rueda, M. R., Rothbart, M. K., McCandliss, B. D., Saccomanno, L., & Posner, M. I. (2005).
Training, maturation, and genetic influences on the development of executive attention.
Proceedings of the National Academy of Sciences U S A, 102 (41), 14931–14936.
Ruff, H. A., & Lawson, K. R. (1990). Development of sustained, focused attention in young
children during free play. Developmental Psychology, 26 (1), 85–93.
Ruff, H. A., & Rothbart, M. K. (1996). Attention in early development: Themes and varia
tions. New York: Oxford University Press.
Sanders, L. D., Stevens, C., Coch, D., & Neville, H. J. (2006). Selective auditory attention
in 3-to 5-year-old children: An event-related potential study. Neuropsychologia, 44 (11),
2126–2138.
Santesso, D. L., & Segalowitz, S. J. (2009). The error-related negativity is related to risk
taking and empathy in young men. Psychophysiology, 46 (1), 143–152.
Santesso, D. L., Segalowitz, S. J., & Schmidt, L. A. (2005). ERP correlates of error moni
toring in 10-year olds are related to socialization. Biological Psychology, 70 (2), 79–87.
Schul, R., Townsend, J., & Stiles, J. (2003). The development of attentional orienting dur
ing the school-age years. Developmental Science, 6 (3), 262–272.
Segalowitz, S. J., & Davies, P. L. (2004). Charting the maturation of the frontal lobe: An
electrophysiological strategy. Brain and Cognition, 55 (1), 116–133.
Segalowitz, S. J., Unsal, A., & Dywan, J. (1992). Cleverness and wisdom in 12-year-olds:
Electrophysiological evidence for late maturation of the frontal lobe. Developmental Neu
ropsychology, 8, 279–298.
Sheese, B. E., Rothbart, M. K., Posner, M. I., Fraundorf, S. H., & White, L. K. (2008). Exec
utive attention and self-regulation in infancy. Infant Behavior & Development, 31 (3), 501–
510.
Sheese, B. E., Voelker, P., Posner, M. I., & Rothbart, M. K. (2009). Genetic variation influ
ences on the early development of reactive emotions and their regulation by attention.
Cognitive Neuropsychiatry, 14 (4–5), 332–355.
Sheese, B. E., Voelker, P. M., Rothbart, M. K., & Posner, M. I. (2007). Parenting
Simonds, J., Kieras, J. E., Rueda, M., & Rothbart, M. K. (2007). Effortful control, executive
attention, and emotional regulation in 7-10-year-old children. Cognitive Development, 22
(4), 474–488.
Sokolov, E. N. (1963). Perception and the conditioned reflex. Oxford, UK: Pergamon.
Stanwood, G. D., Washington, R. A., Shumsky, J. S., & Levitt, P. (2001). Prenatal cocaine
exposure produces consistent developmental alteration in dopamine-rich regions of the
cerebral cortex. Neuroscience, 106, 5–14.
Stevens, C., Fanning, J., Coch, D., Sanders, L., & Neville, H. (2008). Neural mechanisms
of selective auditory attention are enhanced by computerized training: Electrophysiologi
cal evidence from language-impaired and typically developing children. Brain Research,
1205, 55–69.
Stevens, C., Lauinger, B., & Neville, H. (2009). Differences in the neural mechanisms of
selective attention in children from different socioeconomic backgrounds: An event-relat
ed brain potential study. Developmental Science, 12 (4), 634–646.
Sturm, W., de Simone, A., Krause, B. J., Specht, K., Hesselmann, V., Radermacher, I., et al.
(1999). Functional anatomy of intrinsic alertness: Evidence for a fronto-parietal-thalamic-
brainstem network in the right hemisphere. Neuropsychologia, 37 (7), 797–805.
Tarkka, I. M., & Basile, L. F. H. (1998). Electric source localization adds evidence for task-
specific CNVs. Behavioural Neurology, 11 (1), 21–28.
Thorell, L. B., Lindqvist, S., Nutley, S. B., Bohlin, G., & Klingberg, T. (2009). Training and
transfer effects of executive functions in preschool children. Developmental Science, 12
(1), 106–113.
Tucker, D. M., Hartry-Speiser, A., McDougal, L., Luu, P., & deGrandpre, D. (1999). Mood
and spatial memory: Emotion and right hemisphere contribution to spatial cognition. Bio
logical Psychology, 50, 103–125.
Valenza, E., Simion, F., & Umiltà, C. (1994). Inhibition of return in newborn infants. Infant Behavior & Development, 17, 293–302.
van Veen, V., & Carter, C. (2002). The timing of action-monitoring processes in the anteri
or cingulate cortex. Journal of Cognitive Neuroscience, 14, 593–602.
Venter, J. C., et al. (2001). The sequence of the human genome. Science, 291, 1304–1351.
Voytko, M. L., Olton, D. S., Richardson, R. T., Gorman, L. K., Tobin, J. R., & Price, D. L.
(1994). Basal forebrain lesions in monkeys disrupt attention but not learning and memo
ry. Journal of Neuroscience, 14 (1), 167–186.
Wainwright, A., & Bryson, S. E. (2002). The development of exogenous orienting: Mecha
nisms of control. Journal of Experimental Child Psychology, 82 (2), 141–155.
Wainwright, A., & Bryson, S. E. (2005). The development of endogenous orienting: Con
trol over the scope of attention and lateral asymmetries. Developmental
Neuropsychology, 27 (2), 237–255.
Winterer, G., et al. (2007). Association of attentional network function with exon 5 varia
tions of the CHRNA4 gene. Human Molecular Genetics, 16, 2165–2174.
Wright, R. D., & Ward, L. E. (2008). Orienting of attention. New York: Oxford University
Press.
Wynn, K. (1992). Addition and subtraction by human infants. Nature, 358, 749–750.
M. Rosario Rueda
Attentional Disorders
Laure Pisella, A. Blangero, Caroline Tilikete, Damien Biotti, Gilles Rode, Alain
Vighetto, Jason B. Mattingley, and Yves Rossetti
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn
While reviewing Bálint’s syndrome and its different interpretations, this chapter proposes a new framework for the anatomical-functional organization of the posterior parietal cortex (dorsal visual stream), based on recent psychophysical, neuropsychological, and neuroimaging data. In particular, the authors identify two main aspects, excluding eye and hand movement disorders (optic ataxia) as specific deficit categories. The first aspect is spatial attention, and the second is visual synthesis. Ipsilesional attentional bias
(and contralesional extinction) may be caused by the dysfunction of the saliency map of
the superior parietal lobule (SPL) directly (by its lesion) or indirectly (by a unilateral lesion creating an interhemispheric imbalance between the two SPLs). Simultanagnosia
may appear as concentric shrinking of spatial attention due to bilateral SPL damage. Oth
er symptoms such as spatial disorganization (constructional apraxia) and impaired spatial
working memory or visual remapping may be attributable to the second, nonlateralized
aspect. Although initially described in patients with left-sided neglect, these deficits occur throughout space and only after right inferior parietal lobule (IPL) damage. These two aspects might also correspond to the ventral and dorsal networks of exogenous and endogenous attention, respectively.
Keywords: Bálint’s syndrome, extinction, simultanagnosia, neglect, optic ataxia, constructional apraxia, endogenous/exogenous attention, visual remapping, saliency maps, dorsal/ventral network of attention, dorsal/ventral visual streams
Introduction
In this chapter, we focus on the role of the posterior parietal cortex (PPC) in visual atten
tion and use the disorder of Bálint’s syndrome (Bálint, 1909) as an illustrative example.
The PPC lies between the parieto-occipital sulcus (POS) and the post-Rolandic sulcus. It includes the superior parietal lobule (SPL) above and the inferior parietal lobule (IPL) below the intraparietal sulcus (IPS). After reviewing the main interpretations (Andersen & Buneo, 2002; Colby & Goldberg, 1999; Milner & Goodale, 1995; Ungerleider & Mishkin, 1982) of the functional role of the dorsal (occipital-parietal) stream of visual processing
and the classic categorizations of Bálint’s syndrome, we propose a new framework for un
derstanding the anatomical-functional organization of the PPC in the right hemisphere
(Figure 16.1). In particular, we put forward that the core process of the PPC appears to be attention, which only consequently affects “vision for action.” Koch and Ullman (1985) postulated the existence of a topographic feature map that codes the “saliency” of all information within the visual field. By definition, the amount of activity at a given location within the saliency map represents the relative “conspicuity” or relevance of the corresponding location in the visual field. Interestingly, such a saliency map has been described by electrophysiologists within the macaque PPC. For instance, single-unit recordings from the lateral intraparietal (LIP) (p. 320) area suggest that the primate PPC contains a relatively sparse representation of space, with only those stimuli that are most salient or behaviorally relevant being strongly represented (Figure 16.2; Gottlieb et al., 1998). Area LIP provides one potential neural substrate for the levels of visual space representation proposed by Niebur and Koch (1997), according to whom visual perception does not correspond to a basic retinotopic representation of the visual input, but is instead a complex, prioritized interpretation of the environment (Pisella & Mattingley, 2004). The autonomous, sequential selection of salient regions to be explored, overtly or covertly, is postulated to follow the firing rate of populations of neurons in the saliency map, starting with the most salient location and sampling the visual input in decreasing order of salience (Niebur & Koch, 1997). The saliency map would thus be a representational level from which the pattern of attentional or motor exploration of the visual world is achieved.
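The selection scheme described above (winner-take-all sampling in decreasing order of salience, with each visited location then suppressed) can be sketched in a few lines of code. This is an illustrative toy sketch, not an implementation from Niebur and Koch (1997); the grid, values, and function name are hypothetical.

```python
# Illustrative sketch: sampling a saliency map in decreasing order of
# salience, with "inhibition of return" so each location is visited once.

def exploration_order(saliency):
    """Return grid locations in decreasing order of salience.

    `saliency` maps (x, y) locations to salience values, standing in
    for the topographic saliency map.
    """
    remaining = dict(saliency)
    order = []
    while remaining:
        # Winner-take-all: pick the currently most salient location.
        winner = max(remaining, key=remaining.get)
        order.append(winner)
        # Inhibition of return: suppress the visited location.
        del remaining[winner]
    return order

# A toy "visual field": the most conspicuous location is explored first.
salmap = {(0, 0): 0.2, (1, 0): 0.9, (0, 1): 0.5, (1, 1): 0.1}
print(exploration_order(salmap))  # [(1, 0), (0, 1), (0, 0), (1, 1)]
```

The returned sequence corresponds to the postulated pattern of overt or covert exploration, from the most salient location downward.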
It is usually acknowledged that the central processing of visual information follows two parallel cortical pathways. From the early occipital areas, an inferotemporal pathway (or ventral stream) processes objects for visual recognition and identification, and a parietal pathway (or dorsal stream) performs a spatial (metric) analysis of visual inputs and prepares goal-directed movements. As an oversimplification, one often considers the ventral stream the pathway of “vision for perception” and the dorsal stream the pathway of “vision for action” (Milner & Goodale, 1995), overlooking the fact that an extensive frontal-parietal network underlies the visual-spatial and attentional processes necessary for visual perception and awareness (Driver & Mattingley, 1998; Rees, 2001).
Initially, a “what” versus “where” dissociation depicted the putative function of these two
streams (Figure 16.3; Ungerleider & Mishkin 1982). Most subsequent theoretical formu
lations, however, have emphasized a dissociation between detecting the presence of a vi
sual stimulus (“conscious,” “explicit,” “perceptual”) and describing its attributes (“what,”
“semantic”), on the one hand, and performing simple, automatic motor responses toward
a visual stimulus (“unconscious,” “implicit,” “visuomotor,” “pragmatic,” “how”) on the
other (see Milner & Goodale, 1995; Jeannerod & Rossetti, (p. 321)
awareness is restricted to the ipsilesional side of space or objects. In patients with simul
tanagnosia, perceptual awareness is restricted to a reduced central field of view, or to
just one object among others, even when the objects are superimposed in the same loca
tion of space (e.g., in the case of overlapping figures).
After a lesion to the IPL and TPJ in the left hemisphere, patients may present with
Gerstmann’s syndrome, which will not be a focus of this chapter but which also includes a
visual perceptual symptom. In addition to dysgraphia, dyscalculia, and finger agnosia,
these patients are impaired at determining whether a letter is correctly oriented (left-
right confusion). This also suggests a role of the left hemisphere dorsal stream in building
oriented (canonical) perceptual representations. Accordingly, a neuroimaging study by Konen and Kastner (2008) revealed object-selective responses in visual areas of both the dorsal and ventral streams. Foci of activity in the lateral occipital cortex (ventral stream) and in the IPS (dorsal stream) have been shown to represent objects independent of viewpoint and size, whereas the responses are viewpoint and size specific in the occipital cortex. Such symptoms, pertaining to Gerstmann’s or Bálint’s syndrome, affect the perceptual domain after a lesion to the PPC and thereby highlight a necessary interaction of the dorsal and ventral streams for visual perception.
rods), which allows a detailed analysis only in the restricted area corresponding to the
fovea (Figure 16.5). If the eyes are fixed, the ventral stream will provide information
about only a small central part of the visual scene (see Figure 16.5), even though the sub
jective experience of most normally sighted individuals is of a coherent, richly detailed
world where everything is apprehended simultaneously. This subjective sensation of an
instantaneous, (p. 323) global, and detailed perception of the visual scene is an illusion, as
hinted at by the phenomenon of “change blindness” in healthy subjects (reviewed in Pisella & Mattingley, 2004), and is instead provided by mechanisms of “active vision” (Figure 16.6) relying on the PPC (Berman & Colby, 2009).
There are three mechanisms of active vision that rely on the PPC (visual attention, sac
cadic eye movements, and visual remapping), as we outline below.
First, shifts of attention without eye displacements (known as covert attention) have been
shown to improve spatial resolution and contrast sensitivity in peripheral vision (Carrasco
et al., 2000; Yeshurun & Carrasco, 1998). Attention can be defined as the neural process
es that allow us to prioritize information of interest from a cluttered visual scene while
suppressing irrelevant detail. Note that a deficit of visual attention can be expressed as a
spatial deficit (p. 324) (omissions of relevant information in contralesional peripheral visu
al space, i.e., eye-centered space) or a temporal one (increased time needed for relevant
information to be selected by attention and further processed; Husain, 2001). Thus, a spatial bias and a temporal processing impairment may be two sides of the same coin. The
level of interest or relevance of a stimulus can be provided by physical properties that al
low it to be distinguished easily from the background (bottom-up or exogenous attention
al selection) or by its similarity to a specific target defined by the current task or goal
(top-down or endogenous attentional selection).
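The two routes to attentional priority just described (bottom-up conspicuity versus top-down target similarity) can be made concrete with a toy computation. This is a hypothetical illustration, not a model from the chapter; the function, the one-dimensional feature space, and the weights are all assumptions.

```python
# Illustrative sketch: attentional priority of a stimulus as a weighted
# sum of an exogenous (bottom-up) term, driven by contrast with the
# background, and an endogenous (top-down) term, driven by similarity
# to the task-defined target. Weights are hypothetical.

def priority(feature, background, target, w_exo=0.5, w_endo=0.5):
    """Priority of a one-dimensional feature value (e.g., orientation)."""
    exogenous = abs(feature - background)             # conspicuity
    endogenous = 1.0 / (1.0 + abs(feature - target))  # target similarity
    return w_exo * exogenous + w_endo * endogenous

# A distinctive item (feature=1.0) among uniform distractors
# (background=0.0), while the task target is also 1.0, scores high
# on both exogenous and endogenous terms.
print(priority(1.0, 0.0, 1.0))  # 1.0
```

An item that matches the background but not the target would score low on both terms, capturing why it is neither conspicuous nor task-relevant.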
Second, rapid eye movements (known as saccades or overt attention) can bring a new
part of the visual scene to the fovea, allowing it to be analyzed with optimal visual acuity.
Active ocular exploration of a visual scene (three saccades per second on average) during
free viewing has been demonstrated using video-oculography by Yarbus (1967). The global image of the visual scene is thus built up progressively through active ocular exploration, with each new saccade bringing a new component of the visual scene to the fovea
to be analyzed precisely by the retina and the visual cortex. Most visual areas in the oc
cipital and posterior inferotemporal cortex (ventral stream) are organized with a retinal
topography, with a large magnification of the representation of central vision, relative to
peripheral vision (see Figure 16.4). At the level of these retinocentric visual representa
tions, the components of the visual scene that are successively brought to the fovea by
the ocular exploration actually overwrite each other, at each new ocular fixation. These
different components are thus analyzed serially by the ventral stream as unlocalized
snapshots. A spatial linkage of these different details is necessary for global visual per
ception.
This spatial linkage (or visual synthesis) of multiple snapshots is the third mechanism nec
essary to perceive a large, stable, and coherent visual environment. Brain areas more or
less directly related to saccadic eye movements (frontal and parietal eye fields, superior
colliculus) demonstrate “oculocentric” (instead of retinocentric) spatial representations
acting as “spatial buffers”: The location of successively explored components is stored
and displaced in spatial register with the new eye position after each saccade (Figure
16.7; Colby et al., 1995). The PPC is specifically involved in the “remapping” (or “updat
ing”) mechanisms compensating for the continuous changes of eye position (Heide et al.,
1995), which are crucial to building these dynamic oculocentric representations. Trans-
saccadic oculocentric remapping processes have been explored using computational mod
els (e.g., Anderson & Van Essen, 1987), and demonstrated using specific psychophysical
paradigms like the double-step saccade task (Figure 16.8), in which the retinal location of
a stimulus has to be maintained in memory and recoded with respect to a new ocular lo
cation owing to an intervening saccade. Such paradigms have been used in monkey elec
trophysiology (e.g., Colby et al., 1995), human neuroimaging (Medendorp et al., 2003;
Merriam et al., 2003, 2007), after chronic lesions to the PPC (Duhamel et al., 1992; Heide
et al., 1995; Khan et al., 2005a, 2005b; Pisella et al., 2011), and during transcranial mag
netic stimulation (TMS) of the PPC (Morris et al., 2007, Van Koningsbruggen et al., 2010).
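The core computation probed by the double-step saccade task can be sketched as simple vector arithmetic: after a saccade, the memorized eye-centered location of the second target must be updated by subtracting the eye displacement. This is a deliberately simplified, purely translational sketch (real remapping must also cope with eye rotations and noise), and the function name is hypothetical.

```python
# Simplified sketch of trans-saccadic remapping: after each saccade,
# the memorized eye-centered location of a target is updated by
# subtracting the eye displacement, as the double-step task requires.

def remap(target_retinal, saccade_vector):
    """Update an eye-centered target location after a saccade.

    Both arguments are (x, y) vectors in degrees of visual angle.
    """
    return (target_retinal[0] - saccade_vector[0],
            target_retinal[1] - saccade_vector[1])

# Double-step example: target T2 flashed at (10, 0) during fixation;
# after a first saccade of (5, 5) to T1, the saccade to T2 must aim at
# the remapped location (5, -5), not the original retinal (10, 0).
print(remap((10, 0), (5, 5)))  # (5, -5)
```

A failure of this updating step is what leaves patients with PPC lesions accurate on the first saccade but inaccurate on the second in the conditions described above.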
Consistent with these putative mechanisms of active vision, the perceptual symptoms of
Bálint-Holmes syndrome can be seen as arising from deficits in attentive vision, ocular ex
ploration, and global apprehension of visual scenes. One common component of the syn
drome is a limited capacity of attentive vision such that patients fail to perceive at any
time the totality of items forming a visual scene (leading to “piecemeal” perception of the
environment, and a local perceptual bias). Descriptions or copies of images are accordingly composed of elements of the original figure without perception of the whole scene. Patients not only explore working space in a disorganized fashion but also may return to scrutinize the same item repeatedly. This “revisiting behavior” may lead, for example, to
patients counting more dots than are actually present in a test display (personal observation of the behavior of patients with Bálint’s syndrome after posterior cortical atrophy).
This revisiting behavior has been reported during ocular exploration of visual scenes in
patients with unilateral neglect and ascribed to a spatial working memory deficit (Husain
et al., 2001). Indeed, a study has demonstrated that patients with parietal neglect show,
in addition to their characteristic right attentional bias, a specific deficit of working mem
ory for location and not for color or shape (Figure 16.9; (p. 325) Pisella et al., 2004). Pisel
la and Mattingley (2004) ascribed this revisiting behavior and spatial working memory
deficit to a loss of “visual remapping” or “visual synthesis” due to parietal lobe damage, which would contribute to a severe visual neglect syndrome.
This association of (1) spatially restricted attention and (2) a visual synthesis deficit appears as a consistent framework for understanding the different expressions of posterior parietal lobe lesions. Attention will be restricted to central vision in bilateral lesions, or to the ipsilesional (eye-centered) visual field in unilateral lesions, affecting visual perception and action in peripheral vision. An additional deficit of visual synthesis might occur in the entire visual field, producing a disorganization of active vision mechanisms and thereby increasing the severity of the patient’s restricted field of view (unilateral in neglect and bilateral in simultanagnosia).
Bálint-Holmes Syndrome
The syndrome reported by Rezső Bálint in 1909, and later described in different terms by Holmes (1918), is a clinical entity that variously combines a set of complex spatial behavior disorders following bilateral damage (p. 326) to the occipital-parietal junction (dorsal stream). Both the diversity of terminology used in the literature and biases in clinical descriptions, which often reflect the authors’ particular views on underlying mechanisms, add to the difficulty of describing and comprehending this rare and devastating syndrome. Despite these obstacles, Bálint-Holmes syndrome can be identified at bedside examination and allows robust prediction of lesion localization.
The related “visual disorientation” syndrome described a few years later by Holmes
(1918; Smith & Holmes, 1916) in soldier patients with bilateral parietal lesions (see also
Inouye, 1900, cited by Rizzo & Vecera, 2002) highlighted a particular oculomotor disorder: wandering of gaze in search of peripheral objects and difficulty maintaining fixation. This eye movement disorganization, later (p. 327) labeled gaze ataxia or apraxia (reviewed in Vallar, 2007), was accompanied by a complete deficit of visually guided reach-to-grasp movements, even when they were performed in central vision.
The comparison between Bálint’s and Holmes’s descriptions of the consequences of lesions to the posterior parietal cortex brings out three differences that still correspond to debated questions and that structure this chapter.
First, the lateralized aspect of the behavioral deficit for left visual space in Bálint’s syndrome (corresponding to räumliche Störung der Aufmerksamkeit) does not appear in Holmes’s descriptions of his patients, suggesting that the component known nowadays as unilateral neglect can be readily dissociated from the other two components of the triad by its specific right-hemispheric localization. Visual extinction is often considered a minimal form of neglect, and a prevalence of right-hemispheric lesions is reported for visual extinction as well as for neglect (Becker & Karnath, 2007). Nevertheless, their dissociation has been shown clinically (Karnath et al., 2003) and experimentally: Contralesional visual
extinction is symmetrically linked to the function of the right or left superior parietal lobules (Hilgetag et al., 2001; Pascual-Leone et al., 1994), whereas some aspects of neglect (especially those concerned with visual scanning; Ashbridge et al., 1997; Ellison et al., 2004;
Fierro et al., 2000; Fink et al., 2000; Muggleton et al., 2008) are distinct from visual extinction and specifically linked to the right inferior parietal cortex (and superior temporal cortex). Furthermore, the recent literature tends to highlight the contribution of nonlateralized deficits to neglect syndrome, that is, deficits specifically linked to right-hemispheric damage but not spatially restricted to the contralesional visual space, such as deficits of sustained attention or spatial working memory (Husain, 2001, 2008; Husain & Rorden, 2003; Malhotra et al., 2005, 2009; Pisella et al., 2004; Robertson, 1989). The dissociation between lateralized and nonlateralized deficits following parietal damage is re-explored and redefined in this chapter, notably in the section devoted to neglect, discussed with respect to visual extinction.
A third difference is that Holmes presented the (p. 328) eye movement disturbances as causing the deficit, whereas Bálint considered them consequences of higher-level attentional deficits (Seelenlähmung des Schauens, most often translated into English as “psychic paralysis of gaze”). This question corresponds to the conflicting “intentional”
and “attentional” functional views of the parietal cortex (Andersen & Buneo, 2002; Colby
& Goldberg, 1999). For the former, the PPC is anatomically segregated into regions per
mitting the planning of different types of movements (reach, grasp, saccade), whereas for
the latter, the different functional regions represent locations of interest of external and
internal space with respect to different reference frames.
This chapter reviews historical and recent arguments that have been advanced in the context of these three debates, with a first section on neglect, a second on optic ataxia (OA), and a third on the less well defined third component of Bálint’s syndrome: psychic paralysis of gaze. Visual extinction is addressed in each of these three sections. This clinical review of attentional disorders should allow us, through descriptions of patients’ behavior, to begin to delineate theoretically and neurologically the concepts of selective versus sustained attention, object versus spatial attention, saliency versus oculomotor maps, and exogenous versus endogenous attention. A conceptual segregation of these symptoms following lesions of the PPC, different from Bálint’s, will be progressively introduced and finally proposed, based on recent advances in posterior parietal functional organization made through psychophysical assessment of patients and neuroimaging data (see Figure 16.1).
Unilateral Neglect
Basic Arguments for Dissociation Between Neglect and Extinction
Unilateral neglect is defined by a set of symptoms in which the left part of space or of objects is not explicitly taken into account in behavior. This lateralized definition (left-sided neglect instead of contralesional neglect), and its global distinction from contralesional visual extinction, is acknowledged as soon as one considers that only the neglect symptoms (and not the extinction symptoms) appear in the most complex, typically human, types of visual-spatial behavior, whether ecological (Figure 16.10, left panel; e.g., applying makeup to or shaving only the right side of the face, preparing an apple pie, mentally representing a well-known space, producing art) or paper-and-pencil (see Figure 16.10, right panel; e.g., drawing, copying, enumerating, evaluating lengths between left and right spatial positions, writing on a blank sheet) tasks.
However, it is more classical to distinguish unilateral neglect from visual extinction on the basis of a more experimental subtlety, the latter being a failure to report an object located contralesionally only when it competes for attentional resources with an object located ipsilesionally. Critically, report of contralesional items presented in isolation should be normal (or better than for simultaneous stimuli) in visual extinction. Visual extinction is (p. 329) thus revealed in conditions of brief presentation and attentional competition, whereas visual neglect refers to loss of awareness of a contralesional stimulus even when it is presented alone and permanently, in free gaze conditions.
Note that the use of the term “contralesionally” rather than “in the contralesional visual field” in the above definitions reveals a lack of acknowledgment of the reference frame in which these deficits can occur. Even if visual extinction is usually tested between visual fields, a strictly eye-centered pattern has been described (Figure 16.11; Mattingley et al.,
Although the literature has treated extinction as a single phenomenon (unlike neglect), this may not be the case, which may explain why, as for neglect, the anatomical site of visual extinction is still debated. Both the tests used for diagnosis and the conditions under which they are administered are crucial. For example, the processes at work seem to differ depending on whether the stimuli are presented at small or large eccentricities (Riddoch et al., 2010). Extinction is classically tested with stimuli presented bilaterally at relatively small eccentricity (perhaps to ensure reasonable report of single items in the contralesional visual field; Riddoch et al., 2010) and has been shown to benefit from similarity-based grouping (Mattingley et al., 1997). However, the reverse effect of similarity has been observed with items presented at large eccentricities. Riddoch et al. (2010) argue that at far separations participants need to select both items serially, with attention switched from one location to another. This may also be the case when task difficulty (Figure 16.13) or attentional demand is increased. In such conditions, extinction might arise from a limitation in short-term visual memory rather than from competition for selection between stimuli that are available in parallel. Given that Pisella et al. (2004) have shown that a spatial working memory deficit is a nonlateralized component of parietal neglect, this blurs the boundary between patients considered as exhibiting extinction and those exhibiting neglect.
In sum, these experimental conditions using dots are unsatisfactory for distinguishing between visual extinction and neglect, whereas their clinical distinction, in terms of handicap and recovery and with paper-and-pencil tasks, is almost intuitive (as mentioned above). Alternatively, the presence of nonlateralized components, in addition to the common lateralized bias of attention, could be used as a criterion to distinguish “clinical” unilateral neglect from visual extinction.
gions specifically involved in remapping mechanisms using the double-step saccade paradigm with four combinations of saccade directions in different groups of stroke patients.
Patients with left PPC lesions were impaired when the initial saccade was made toward
the right, followed by a second saccade toward the left (right-left condition), but not in
the condition of two successive rightward saccades. Patients with right PPC lesions were
impaired in left-right and left-left conditions, and also in right-left condition (only the
right-right combination was correctly performed; see Figure 16.8). As reviewed in Pisella
and Mattingley (2004), this asymmetry in remapping impairment matches the clinical
consequences of lesions of the human IPL and is probably due to an asymmetry of visual
space representation between the two hemispheres. Later, Heide and Kömpf (1997) wrote
about their study (Heide et al., 1995): “our data confirm the key role of the PPC in the
analysis of visual space with a dominance of the right hemisphere” (p. 166), and provided
new information on the lesions and symptoms of their patients’ groups: the focus of PPC
lesions located “in the inferior parietal lobule along the border between the angular and
supramarginal gyrus, extending cranially toward the intraparietal sulcus, caudally to the
temporo-parietal junction, and posteriorly into the angular gyrus” (p. 158). Compatible
with this lesion site, patients of the right PPC lesion group in the study of Heide et al.
(1995) presented with symptoms of hemineglect. Furthermore, their deficit in the double-
step saccade task did correlate with patients’ impairment in copying Rey’s complex figure
(Figure 16.14), but not with other tests measuring severity of left hemineglect (Heide &
Kömpf, 1997). Accordingly, Pisella and Mattingley (2004) have suggested that an impair
ment of remapping processes may contribute to a series of symptoms that pertain to uni
lateral visual neglect syndrome and that are unexplained by the attentional hypothesis
alone (the ipsilesional attentional bias), such as revisiting, spatial transpositions, and disorganization in the whole visual field. The severity of neglect might depend on two dissociated factors: (1) the strength of the ipsilesional attentional bias, and (2) the severity of nonlateralized attentional deficits.
Another attempt to account for the prevalence of visual neglect following right-hemispheric lesion has highlighted the right hemisphere’s specialization in “nonspatial” processes that are critical for visual selection, such as sustained attention (or arousal; Robertson, 1989). In this respect, the review of Husain (2001) on the nonspatial temporal deficits associated with neglect is of prime interest. Husain et al. (1997) have shown (p. 332) that lesions of the frontal lobe (4), the parietal lobe (3), or the basal ganglia (1) causing an ipsilesional attentional bias are also associated with a limited-capacity visual processing system producing abnormal visual processing over time for letters presented at the same location in space (lengthening of the “attentional blink” in central vision, tested with the Rapid Serial Visual Presentation paradigm; Broadbent & Broadbent, 1987; Raymond et al., 1992). As reviewed in Husain (2001), a later study by di Pellegrino et al. (1998) suggested that this temporal deficit and the spatial bias were two sides of the same coin by showing, in a patient with left-sided visual extinction, that the attentional blink was lengthened when
stimuli were presented in the left visual field but within normal range when stimuli were presented in the right visual field. In conclusion, the temporal deficits of attention in central vision (such as lengthening of the attentional blink) can appear as a deficit of spatial attention and seem related neither specifically to neglect (because they also occur in patients with extinction) nor anatomically to the posterior parietal lobe. In contrast, the deficit of visual spatial working memory (visual synthesis) described in neglect, but also more recently in constructional apraxia, has been ascribed specifically to right IPL lesions (Pisella et al., 2004; Russell et al., 2010; see Figure 16.9). In this context, the crucial contribution of visual remapping impairment to severe neglect (and not to left-sided extinction) proposed by Pisella and Mattingley (2004) is not called into question by the studies showing that the spatial working memory deficit of patients with neglect also appears in a vertical display (Ferber & Danckert, 2006; Malhotra et al., 2005). Indeed, the right IPL is conceived of as able to remap (and establish relationships between) locations throughout the whole visual field (Pisella et al., 2004, 2011; see Figure 16.9). The lateralized remapping impairments revealed by Heide et al. (1995), and more recently by van Koningsbruggen et al. (2010), result from the combination of the right IPL (p. 333) specialization for space and the ipsilesional bias of attention, which additionally may delay or degrade the representation of contralesional visual stimuli. In contrast, remapping impairments may manifest as deficient visual synthesis and revisiting behavior without a lateralized spatial bias in several nonlateralized syndromes such as constructional apraxia (see Figure 16.14), which appears as a persisting visual-spatial disorder following right parietal damage once the lateralized bias of neglect has resolved (Russell et al., 2010).
The standard view of the PPC and Bálint’s syndrome, in the context of the predominant model of Milner and Goodale (1995), was that the most superior part of the PPC (dorsal stream) was devoted to action (with OA as an illustrative example of a specific visual-motor deficit), whereas the most inferior part of the PPC was more intermediate between vision for action and vision for perception, with unilateral neglect as an illustrative example. Recent converging evidence tends to distinguish, behaviorally but also anatomically, the lateralized and nonlateralized components of unilateral neglect, linking the well-known lateralized bias of neglect to dysfunction of the superior parietal-frontal network and the newly defined nonlateralized deficits to the inferior parietal-frontal network and
TPJ. Indeed, the consequences of PPC lesions in humans suggest that, within the PPC,
symmetrical and asymmetrical (right-hemispheric dominant) visual-spatial maps coexist.
TMS applied unilaterally over the SPL of either hemisphere causes contralesional visual extinction in humans (Hilgetag et al., 2001; Pascual-Leone et al., 1994): In bilateral crossed-hemifield presentation of two simultaneous objects, only the stimulus ipsilateral to the stimulation is reported.
The spatial representations of the SPL, whose damage potentially induces contralesional
OA and contralesional visual extinction, concern egocentric (eye-centered) localization in
the contralesional visual field (Blangero et al., 2010a; see Figure 16.11) and do not exhib
it right-hemispheric dominance (Blangero et al., 2010a; Hilgetag et al., 2001; Pascual-Leone et al., 1994). In contrast, hemineglect for the right space is rare and usually is
found in people who have an unusual right-hemispheric lateralization of language. Consistent with the rightward bias in midline judgment characteristic of the more classical left-sided visual neglect after a unilateral (right) lesion, brain imaging (Fink et al., 2000) and
TMS (Fierro et al., 2000) studies have revealed that this landmark (allocentric localiza
tion: perceptual line bisection) task activates in humans a specialized and lateralized net
work including the right IPL and the left cerebellum. The right IPL (IPL is used here to
distinguish it from SPL and designates a large functional region in the right hemisphere
that also includes the superior temporal gyrus and the TPJ) is also specifically involved in
processes such as sustaining and reorienting attention to spatial locations in the whole vi
sual field, useful in visual search tasks (Ashbridge et al., 1997; Corbetta et al., 2000, 2005;
Ellison et al., 2004; Malhotra et al., 2009; Mannan et al., 2005; Muggleton et al., 2008;
Shulman et al., 2007). Because neglect patients by definition (1) have a deficit of atten
tion for contralesional space and (2) fail in visual scanning tasks of line bisection and can
cellation, it seems that visual neglect syndrome is a combination of left visual extinction
(produced by damage of the right SPL) and visual synthesis deficits in the entire visual
field caused by damage to the right IPL (Pisella & Mattingley, 2004). Accordingly, Pisella et al. (2004) have shown a combination of a left-right attentional gradient and a spatial working memory deficit across the entire visual space in neglect consecutive to parietal damage (see Figure 16.9).
One can further speculate (model on Figure 16.1) that bilateral lesions of the SPL in hu
mans may cause Bálint's symmetrical shrinkage of attention, the Seelenlähmung des Schauens, later called simultanagnosia (Luria, 1959; Wolpert, 1924), in which the patient
reports only one object among two in a symmetrical way, that is, not systematically the
right or the left one. Accordingly, the symptoms of simultanagnosia may simply appear as
a bilateral visual extinction (Humphreys et al., 1994), without more severe spatial disor
ganization. This might correspond to the differential severity between the handicap con
secutive to bilateral SPL lesion after a stroke, in which the patients exhibit bilateral OA
and subclinical simultanagnosia only displayed as a shrinking of visual attention (e.g., pa
tient AT: Michel & Henaff, 2004; patient IG, personal observation), and the larger handi
cap consecutive to posterior cortical atrophy (Benson’s syndrome). In Benson’s syn
drome, simultanagnosia may be worsened by extension of the damage toward the right
IPL, thereby affecting the maps in which the whole space is represented and in which the
remapping mechanisms may specifically operate in order to integrate visual information
collected via multiple snapshots into a (p. 334) global concept. This extension of neural damage toward the right IPL would also be the crucial parameter explaining the differing severity between left-sided extinction (without clinical neglect syndrome) after a unilater
al SPL lesion and neglect syndrome, including extinction, deficit in landmark tests, and
remapping impairments or defect of visual synthesis (see Figure 16.1).
The observation of a young patient with an extremely focal lesion of the right IPL caused
by a steel nut penetrating his brain during an explosion (Patterson & Zangwill, 1944,
Case 1) is a direct argument for the model we propose in this chapter (see Figure 16.1).
The right IPL is the region the most commonly associated with visual neglect (Mort et al.,
2003). Patterson and Zangwill (1944) described a patient with left-sided extinction and “a
complex disorder affecting perception, appreciation and reproduction of spatial relation
ships in the central visual field of vision” (p. 337). This “piecemeal approach” was associ
ated with a lack of “any real grasp of the object as a whole” (p. 342) that “could be de
fined as a fragmentation of the visual contents with deficient synthesis” (p. 356). This de
fect of visual synthesis in central vision was qualitatively similar to the consequences of
bilateral lesion of the posterior parietal lobe, which causes simultanagnosia, a nonlateral
ized extinction restricting visual perception to seeing only one object at a time, even though another overlapping object may occupy the same location in space (Humphreys et al., 1994; Husain, 2001; Luria, 1959). It is striking that the behavioral impact of this focal
unilateral lesion was almost equivalent to a full Bálint’s syndrome with ipsilesional biases
and extreme restriction of attention. In addition, a defect of establishing spatial relation
ships and of integrating visual snapshots into a whole was described (which has been related to spatial working memory or visual remapping impairments in the whole visual field; Dri
ver & Husain, 2002; Husain et al., 2001; Pisella et al., 2004; Pisella & Mattingley, 2004).
Based on neuropsychological and neuroimaging observations, Corbetta et al. (2005)
developed a model based on the notion that the right inferior parietal-frontal network is
activated by unpredicted visual events throughout the whole visual field and that the su
perior parietal-frontal eye field network of attention influences the top-down stimulus–re
sponse selection in the contralesional visual space. In their model, the lesion of the right
inferior parietal-frontal network, via a “circuit-breaking signal,” decreases activity in the
ipsilateral superior parietal-frontal eye field network and consequently biases the activity
in the occipital visual cortex toward the ipsilesional visual field. In other words, this mod
el predicts that the lesion of the right IPL would, in addition to directly damaging the in
ferior parietal-frontal network, indirectly decrease functional activity in the right SPL and
thereby cause ipsilesional attentional bias and left-sided visual extinction (arrow from the
right IPL to the right SPL on the model of Figure 16.1).
The presence of lateralized attentional bias in the patient described by Patterson and
Zangwill (1944) may alternatively be understood within the general framework of inter
hemispheric balance. It appears that any lesion affecting directly or indirectly the pari
etal-frontal networks of attention in an asymmetrical way will cause an imbalance ex
pressed as a lateralized bias. According to this more general assumption, a lateralized
bias (left-right gradient of attention) is observed, for example, after a lesion to the basal
ganglia but without spatial working memory deficit, which is more specific to lesions of
the right IPL (Pisella et al., 2004; see Figure 16.9). Even less specifically, a spatial bias in directing attention (or in a limited-capacity visual processing system) may presumably arise from unopposed activity of the contralesional hemisphere.
To sum up, in our model (see Figure 16.1), visual extinction is a sensitive test to reveal
the ipsilesional attentional bias, which is the lateralized component of neglect. This ipsile
sional attentional bias (or left-right attentional gradient) is caused by the dysfunction of
the SPL directly (lesion of the SPL) or indirectly (the lesion of the right IPL causes an im
balance between the right and the left SPL; Corbetta et al., 2005), or by a lesion else
where causing similar imbalance in the dorsal parietal-frontal networks between the right
and the left hemispheres. It seems that prismatic adaptation, and most other treatments
of neglect, improve this lateralized component (common to neglect and extinction pa
tients, whatever the lesion causing the attentional gradient; see Figure 16.12; but see
Striemer et al., 2008), probably by acting on this imbalance (Luauté et al., 2006; Pisella et
al., 2006b). This is also an explanation of the paradoxical improvement of neglect (follow
ing right-hemisphere damage) by subsequent damage of the left hemisphere (Sprague ef
fect; see Weddell, 2004, for a recent reference). The other (nonlateralized) components of
neglect rely on the right IPL and its general function of visual-spatial synthesis and may
be more resistant to treatments (but see Rode et al., 2006; Schindler et al., 2009).
The basic feature of OA is a deficit in reaching and grasping a visual target in peripheral vi
sion that cannot be considered purely visual (the patient can see and describe the visual
world, and binocular vision is unaffected), proprioceptive (the patient can match joint an
gles between the two arms), or motor (the patient can move the two arms freely and can
usually reach and grasp in central vision) (Blangero et al., 2007; Garcin et al., 1967;
Perenin & Vighetto, 1988).
The two bilateral OA patients who have been tested most extensively in the last decade
(IG and AT; see Khan et al., 2005b, for detailed description of their lesion) initially pre
sented with associated Bálint’s symptoms (simultanagnosia but no neglect) causing
deficits in central vision. For example, patient AT showed a limited deficit when grasping
unfamiliar objects in central vision during the early stage of her disease, when more con
comitant Bálint’s symptoms were present (Jeannerod et al., 1994). Subsequent studies of
grasping in these patients used only peripheral object presentations (e.g., Milner et al.,
2001, 2003). The reaching study conducted in these two bilateral patients (after they re
covered from initial Bálint’s symptoms) also typically showed that accuracy was essential
ly normal in central vision, whereas errors increased dramatically with target eccentricity
(Milner et al., 1999, 2003; Rossetti et al., 2005). In light of these observations, we suggest
that patients with bilateral lesions suffering from Bálint’s syndrome may be impaired in
central vision owing to additional visual-spatial deficits such as simultanagnosia. As re
ported by IG herself soon after her stroke, simultanagnosia prevents “the concomitant
viewing of the hand and the target,” which deprives motor execution of any visual control and can cause reach-and-grasp errors in central vision. Accordingly, patients with reaching deficits in central vision have been shown to be more accurate when they are deprived of visual feedback about the hand (open-loop condition) during the execution of the movement (Jackson et al., 2005; Jakobson et al., 1991; Buxbaum & Coslett, 1998). This
demonstrates that in these patients there is a clear interference between visual informa
tion from the target guiding the action and visual information from the hand position pro
viding online feedback, which is resolved by removing the visual feedback of the hand.
Such simultanagnosia, described by Bálint (1909) as difficulty looking at an object other
than the one he was fixating, can also be invoked to explain the “magnetic misreaching” behavior (Carey et al., 1997), in which the patient can reach only toward where he is fixating, and which we have had the opportunity to observe in patient CF in the acute phase, when Bálint’s syndrome was exhibited.
For patient IG, this perceptual deficit in the acute phase is mentioned in the Pisella et al.
(2000) study, which notes that online reaching control was tested only after IG recovered from her initial simultanagnosia. Even though clinical signs of simultanagnosia had resolved after a few months (the patient had recovered the full perceptual view of her hand, whereas she had initially perceived only two fingers at a time, and could report
bilateral stimulation even when tested on a computer with simultaneous presentation of
small dots), a complaint remained, for example, about pouring water into a glass without
touching it. For patient AT, who exhibited a larger lesion extending further toward the occipital lobe and IPL, a full report of the remaining perceptual deficits has been provided
by Michel and Henaff (2004) and summed up as a concentric shrinking of the attentional
field, impairing tasks, such as mazes, in which global attention is required.
After unilateral focal lesions of the SPL, cases of “pure” OA have been described, that is,
in the absence of clinical perceptual and oculomotor symptoms (e.g., Garcin et al., 1967),
even in the acute stage. These descriptions have provided arguments for a pure vision-
for-action pathway involving the SPL-IPS network (Milner & Goodale, 1995). When OA is
observed after a unilateral lesion, the symptoms predominate in the contralesional pe
ripheral visual field (field effect), usually combined with a hand effect, the use of the con
tralesional hand causing additional errors throughout the whole visual field, especially
when vision of the hand is not provided (Figure 16.15A, B; Blangero et al., 2007; Brou
chon et al., 1986; Vighetto, 1980). This combination of hand and field effect is observed
both for reaching to stationary targets and for online motor control in response to a tar
get jump (Figure 16.15C; Blangero et al., 2008); the impairment of manual automatic
corrections is linked both to the deficient updating of target location when it jumps in the
contralesional visual field (field effect) and to the deficient monitoring of the contralesion
al hand location within the contralesional space at movement start and during ongoing
movement (hand effect). This combination of hand and field effects has been considered
characteristic of a deficit in visual-manual (p. 336) transformation. However, recent exper
imental investigation of the field effect has revealed that this deficit specifically corre
sponds to the impairment of a visual-spatial (not only visual-manual) transformation defin
ing a contralesional target location in an eye-centered reference frame (Blangero et al.,
2010a; Dijkerman et al., 2006; Gaveau et al., 2008; Khan et al., 2005a). That is, there is a
visual-spatial module coding locations in the eye-centered reference frame commonly
used for eye and hand movements within the SPL (Figure 16.16; Gaveau et al., 2008).
This common module could thus also be involved in covert (attentional) orienting shifts
with perceptual consequences. Accordingly, a lateralized bias of attention has been
demonstrated in a Posner task by Striemer et al. (2007). Michel and Henaff (2004) have
reported that target jumps were perceived without apparent motion in a bilateral OA patient (AT), and that reaction times to discriminate, with a mouse response, whether a target seen in central vision jumped left or right are delayed for contralesional directions (e.g., after a right PPC lesion, reaction times to leftward target jumps were longer than to rightward jumps and longer than control performance; unpublished observation).
Recent studies have revealed that even pure cases of unilateral OA appear to systemati
cally demonstrate a lateralized contralesional deficit of covert attention (reviewed in
Pisella et al., 2007, 2009). This attentional deficit has to be specifically explored because
it can be subclinical (see Figure 16.13). An attentional deficit or ipsilesional bias can be
revealed in conditions of increased attentional load (requiring identification of a letter rather than simple detection) or attentional competition (with the presence of an ipsilesional item or of flankers within the (p. 337) same visual field). These increases in task demand always disadvantaged performance in the contralesional visual field of unilateral OA patients, as in patients with extinction or neglect. The similarity
is also highlighted by the fact that OA can be expressed in space or in time (Rossetti et al., 2003), as already mentioned for the attentional deficit (reviewed in Husain, 2001). The last
decade of study of OA has described the deficit in space (errors in eye-centered reference
frame increasing with visual eccentricity; reviewed in Blangero et al., 2010a) and in time
(Milner et al., 1999, 2001; Pisella et al., 2000; reviewed in Pisella et al., 2006a; Rossetti &
Pisella, 2002; Rossetti et al., 2003). The time effect can be summed up as a paradoxical
improvement of visual-manual guidance in delayed offline conditions and even by guidance based on past representations (more stable than eye-centered ones) of targets (“grasping the past”; Milner et al., 2001). This description is, in the end, not far from what Husain (2001) concluded about attentional function in the context of vi
sual extinction and neglect: that it may consist in keeping track of object features
across space and time. Note that location in space is a crucial feature for reaching as well
as for feature binding in the context of perception and more complex action.
To sum up, evidence of a lateralized deficit of covert attention in OA does not allow us to
claim a pure visual-motor deficit or to maintain a functional dissociation between OA
and (subclinical) visual extinction after lesions to the SPL.
At this stage of the chapter, we have claimed that both neglect and OA exhibit a deficit of
attentional orienting toward contralesional space emerging from damage to the symmet
rical SPL-IPS network. The specific right-hemispheric localization of neglect would rely on the association with nonlateralized deficits consecutive to right IPL damage. Interestingly,
these common and different components can be revealed using the Posner paradigm (Pos
ner, 1980), a well-known means of testing attentional orienting in space. In the Posner
spatial-cueing task, participants keep their eyes on a central fixation point and respond as
fast as possible to targets presented in the right or the left visual field. Targets can be
preceded by cues presented either on the same side (valid trials) or on the opposite side
(invalid trials). An increased cost in responding to the target presented on the opposite
side of the cue (invalid trials) is attributed to a covert orienting shift toward the cue and the necessary disengagement of attention from this cue to detect the target (Posner,
1980).
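The logic of a Posner spatial-cueing analysis can be sketched in a few lines of code. The reaction-time values below are purely hypothetical, and the `validity_cost` helper is an illustrative name, not part of any published analysis; only the valid/invalid trial structure is carried over from the task description above.

```python
# Minimal sketch of a Posner spatial-cueing analysis (hypothetical data).
# Each trial records the cued side, the target side, and a reaction time (ms).
# Validity cost = mean RT(invalid trials) - mean RT(valid trials), per target side.

from statistics import mean

trials = [
    # (cue_side, target_side, rt_ms) -- illustrative values only
    ("left", "left", 320), ("left", "right", 420),
    ("right", "right", 300), ("right", "left", 450),
    ("left", "left", 330), ("right", "left", 440),
    ("right", "right", 310), ("left", "right", 410),
]

def validity_cost(trials, target_side):
    """Disengagement cost for targets on one side: invalid minus valid mean RT."""
    valid = [rt for cue, tgt, rt in trials if tgt == target_side and cue == tgt]
    invalid = [rt for cue, tgt, rt in trials if tgt == target_side and cue != tgt]
    return mean(invalid) - mean(valid)

# A pathological attentional gradient would appear as slower responses to
# contralesional (here: left) targets overall; a disengagement deficit as a
# larger validity cost when attention must be pulled away from the cued side.
for side in ("left", "right"):
    print(side, validity_cost(trials, side))
```

In patient studies these two measures dissociate: the gradient shows up in valid trials, while the disengagement deficit shows up only in the invalid-trial cost.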
In parietal patients, targets on the contralesional side are detected more slowly than
those (p. 338) (p. 339) on the ipsilesional side (Friedrich et al., 1998; Posner et al., 1984),
consistent with a pathological attentional gradient. As can be seen in Figure 16.17, the deficit in disengaging attention from a right (ipsilesional) cue in the invalid condition after right parietal damage has been highlighted as a specific additional characteristic of neglect (right hyperattention; Bartolomeo & Chokron, 1999, 2002). Note, however, in Fig
ure 16.17 that a cost to disengage attention from a left (contralesional) cue is also ob
served in the other condition of spatial invalidity of the cue in neglect patients: When the
cue appears on the left side and the target on the right side, neglect patients are about 100 ms slower than in the valid condition with the target presented on the left side, although the difference between leftward and rightward orienting in valid trials is only about 20 ms (Posner et al., 1984; redrawn in Figure 16.17). This means that for these patients
with parietal neglect, the need to generate sequential left-right orienting appears more
problematic than the need to orient attention in a direction opposite to their pathological
rightward attentional gradient (illustrated in Figure 16.12). This reflects the combination
of a rightward attentional gradient and a general (nonlateralized) double-step orienting
deficit, with an interaction between the two deficits causing the disengagement cost from
right cues to be higher than the disengagement cost from left cues. We have suggested
that remapping mechanisms, crucial for double-step orienting, can account for this gener
al disengagement cost in neglect patients and further for the observation of ipsilesional ne
glect patterns after left cueing (Pisella & Mattingley, 2004). Crucially, this hypothesis im
plies that remapping processes might work for covert shifts of attention as well as for
overt shifts of the eyes.
Contrary to neglect patients who exhibit a major disengagement deficit leading to the
longest reaction times in the two invalid conditions (Posner et al., 1984), Striemer et al.
(2007) have suggested that OA patients exhibit only an engagement deficit toward the
contralesional visual field in a Posner task and no specific disengagement deficit. In con
sequence, performance to respond to ipsilesional items is never impaired, even in invalid
trials, whereas response to contralesional items is slow in both valid and invalid trials
(see Figure 16.17). Striemer et al. (2007) have therefore suggested damage to the saliency maps representing the contralesional visual field after an SPL-IPS lesion in a patient
with OA. In other words, OA patients show the same deficit as neglect patients in orienting attention toward the contralesional visual field but no additional deficit in the double-step orienting needed in invalid trials. Accordingly, patients with OA exhibit no signs of visual synthesis
deficit. Moreover, OA patients have shown preserved visual remapping across saccades in the context of a pointing experiment (Khan et al., 2005a). This absence of visual synthesis im
pairment fits with the absence of disengagement cost in invalid trials.
Luo et al. (1998) and later Bartolomeo et al. (2000) showed that neglect patients are more
specifically impaired in exogenous (stimulus-driven) conditions of attentional orienting
and remain able to shift attention leftward when instructed (i.e., voluntarily, goal direct
ed). This framework can be used to explain why the disengagement costs exceed the bias resulting from the pathological attentional gradient in neglect patients. When the Posner task is undertaken by participants during functional magnetic
resonance imaging (Corbetta et al., 2000), symmetrical activations (contralateral to stimu
lus presentation) are observed in the SPL, and specific activation of the right IPL is addi
tionally observed in invalid trials in either visual field. Based on this pattern of activation,
Corbetta and (p. 340) Shulman (2002) have proposed this distinction between goal-direct
ed (endogenous) and stimulus-driven (exogenous) systems. This model was suggested by
the stimulus-driven nature of the invalid trials (detecting events at unpredicted locations
could only be stimulus-driven by definition) activating representation of the entire visu
al space within the IPL. If exogenous attention relies on the right IPL, according to neu
roimaging and neuropsychology of spatial neglect, then the symmetrical SPL-IPS network
could subtend endogenous attention shifts. Accordingly, the tests of attentional shifting
we have used on several OA patients (OK: Blangero et al., 2010b; Pisella et al., 2009; CF
and Mme P: see Figure 16.18) involved endogenous cues (central arrow). Patients were
instructed to undertake a letter-discrimination task at a cued location within the visual
periphery. They failed to identify the letter flashed at the cued location in their contrale
sional (ataxic) visual field. Is there a double dissociation between OA and neglect in this
respect? This remains to be investigated on a large scale. We have recently started to test
OA patients in an exogenous version of the same letter-discrimination task (adapted from
Deubel & Schneider, 1996), in which the cue was a flash at the peripheral location where the letter would be presented. The two patients who are still available for testing (CF and
Mme P) have exhibited less contralesional deficit in this exogenous covert attention task
than in the endogenous version. In the endogenous version (right panel of Figure 16.18),
the percentage of correct letter discrimination was at chance level in the contralesional
visual field and at almost 80 percent in the ipsilesional visual field during the task. In the
exogenous version (left panel of Figure 16.18), there was no longer a difference between performance in the left and right visual fields. This possible clinical dissociation within the
PPC between exogenous and endogenous covert attention might therefore be (p. 341) an
other difference between the attentional deficits of neglect and OA patients. Whether this
constitutes a systematic neuropsychological difference between SPL-IPS and right IPL le
sion consequences needs to be further investigated. Note that such dissociation between
exogenous and endogenous attention has also been reported within the frontal lobe (Ve
cera & Rizzo, 2006): Contrary to frontal neglect cases, their patient exhibited a general
impairment in orienting attention endogenously, although he could use peripheral cues to
direct attention.
To sum up, the Posner task reveals that OA patients exhibit attentional deficits different
from those exhibited by neglect patients because neglect appears as a combination of lat
eralized deficits (common with OA patients) and deficits specific to the right IPL lesion
that need to be further characterized. Another issue that needs to be further investigated
is whether the attentional deficits of OA patients are associated with their visual-motor
impairment or are causal. The same issue is developed in the following section in relation to attentional and eye movement impairments.
blink. When patients are asked to move their eyes to a target suddenly appearing in the
peripheral field, they may generate no movement, or they may initiate wandering eye
movements consisting of erratic and usually hypometric displacements of the eyes in space,
ending with incidental acquisition of the target. Patients do not seem to perceive visual
targets located away from a small area, which is usually the area of foveation, despite
preserved visual fields. They exhibit a reduction of “useful field of vision,” operationally
defined as the field of space that can be attended while keeping central fixation (Rizzo &
Vecera, 2002). This can be tested by asking patients either to direct their eyes or their
hand to, or to name, objects presented extrafoveally. Generally, responses are more likely
to be given after verbal encouragement, a finding that indicates that the deficit is not a
consequence of a reduction of the visual fields but rather of impaired attentional scanning for noncentral events.
The three elements of Bálint’s triad (1909) have been subjected to numerous interpreta
tions. As emphasized by de Renzi (1989), Holmes (1918) added to his description of pa
tients with parietal lesions a deficit of oculomotor functions, which had been excluded by
Bálint for his patient. This oculomotor deficit described by Holmes (1918) has often been
incorporated into the description of Bálint’s syndrome, and confounded with the Seelen
lähmung des Schauens, most often translated into English as “psychic paralysis of gaze.” As
de Renzi (1989) pointed out, psychic paralysis of gaze appears to be an erroneous transla
tion of Bálint’s description. A first alternative translation proposed by Hécaen and de Ajuriaguerra (1954)—psychic paralysis of visual fixation, also known as spasm of fixation
(or inability to look toward a peripheral target)—suggested that it can be dissociated from
intrinsic oculomotor disorders, as already argued by Bálint, but also from the two atten
tional disturbances of Bálint’s syndrome: a general impairment of attention that corre
sponded to a selective perception of foveal stimuli and a lateralized component of the at
tention deficit, which can now be described as unilateral neglect. Husain and Stein’s
translation (1988) of Bálint’s description of the psychic paralysis of gaze corresponds to a
restriction of the patient’s “field of view, or we can call it the psychic field of vision.” De
Renzi (1989) only distinguished two types of visual disorders: one corresponding to the
lateralized deficit known as unilateral neglect, and the other to a nonlateralized restric
tion of visual attention. These interpretations of the Bálint-Holmes syndrome excluded the
presence of intrinsic oculomotor deficits.
By contrast, Rizzo and Vecera (2002; Rizzo, 1993), like Holmes (1918), included in the
posterior parietal dysfunction the oculomotor deficits, which they distinguished from the psychic paralysis of gaze, assimilated to spasm of fixation or ocular apraxia. To these two
ocular symptoms they associated spatial disorder of attention corresponding to simul
tanagnosia (Wolpert, 1924), unilateral neglect, and a concentric restriction of the attentional field. Evaluation of the spatial disorder of attention using a motor response was therefore
considered to be possibly affected by concurrent visual-motor deficits (OA and ocular
apraxia). Verbal response was considered to more directly probe conscious or attentive
perception. This framework resembles the theory developed by Milner and Goodale
(1995) postulating a neural dissociation between perception and action, with attention po
sitioned together with perception. In this context of the dominant view of the dorsal
stream being devoted to vision for action, Bálint’s syndrome has been described
(p. 342)
as a set of visual-motor deficits: deficit of arm (reach) and hand (grasp) movements and
psychic paralysis of gaze as a deficit of eye movements. The attentional disorders that
were initially associated with the psychic paralysis of gaze have been implicitly grouped
together with the lateralized attentional deficits assimilated to unilateral neglect and sup
posed to all rely on the IPL and TPJ, a ventrodorsal pathway intermediate between the
dorsal and the ventral stream.
Altogether these apparent subtleties have given rise to several conceptualizations of the
oculomotor, perceptual, and attentional aspects of the syndrome, which reflect the difficulty of assigning these oculomotor disorders to a well-defined dysfunction, as well as the va
riety of eye movement problems observed from case to case. The first subsection deals
with the assimilation of psychic paralysis of gaze to simultanagnosia, and the second sub
section reviews in detail the oculomotor impairments that have been described after a le
sion to the PPC.
Simultanagnosia
Simultanagnosia, a term initially coined by Wolpert (1924), defines a deficit in which pa
tients see only one object at a time (Figure 16.19). This aspect was reminiscent of the de
scription of Bálint’s patient who was not able to perceive the light of a match while focus
ing on a cigarette until he felt a burning sensation, a symptom that Bálint (1909) included
within the component he called Seelenlähmung des Schauens. Bálint (1909) mentioned
that this limited capacity of attentive vision for only one object at a time does not depend
on the size of the object, which distinguishes it from a visual field deficit. Moreover, in pa
tients with simultanagnosia, description or copy of complex figures is laborious and slow,
and patients focus serially on details, apprehending portions of the pictures with a piece
meal approach but failing to switch attention from local details to global structures. Copies of images accordingly consist of tiny elements of the original figure without perception of the whole scene (Wolpert, 1924). Luria (1959) assessed visual perception in a si
multanagnosic patient with tachistoscopic presentation of two overlapping triangles in
the configuration of the Star of David. When the two triangles were drawn in the same
color, the patient reported a star, but when one was drawn in red and the other in blue,
the patient reported seeing only one triangle (never two, and never a star).
The description of isolated cases of simultanagnosia (Luria, 1959; Wolpert, 1924), without lateralized bias (unilateral visual neglect) and with OA or impaired visual capture considered secondary to a dissociated visual-motor component, has tended to confirm Bálint's classification. However, as mentioned above, a causal relationship between the attentional and the visual-motor deficits remains possible, even in patients with a subclinical attentional disorder. Michel and Hénaff (2004) provided a comprehensive examination of a patient with bilateral PPC lesions whose initial Bálint's syndrome had been reduced, 20 years after onset, to bilateral OA and a variety of attentional deficits that could all be interpreted as a concentric shrinking of the attentional field.
Spatial disorder of attention (Bálint, 1909), restriction (de Renzi, 1989) or shrinkage of the attentional field (Michel & Hénaff, 2004), and disorder of simultaneous perception (Luria, 1959) are equivalent terms for this complex symptom, which can be viewed as a limitation of visual-spatial attentional resources following bilateral lesions of the parietal-occipital junction. It is tempting to view both the visual grasp reflex and the wandering of gaze as direct consequences of the shrinking of the attentional field. Quite obviously, if one does not perceive any visual target in the periphery, one will not capture this target with the eyes; hence the gaze should remain anchored on a target once acquired (visual grasp reflex). Then, if one is forced to find a specific target in the visual scene, one would have to make blind exploratory movements, as would be the case in tunnel vision. The shrinking of the visual field would therefore also account for the wandering of gaze. However, a competition between visual objects or features for attentional resources must be added to this spatial interpretation in order to account for these patients' perception of one object among others, independent of its size (Bálint, 1909) and even when two objects are presented at the same location (Luria, 1959). These space-based and object-based aspects of simultanagnosia could be interpreted in a more general framework of attention (see a similar view for object-based and space-based neglect in Driver, 1999) as follows: the feature currently being processed extinguishes, or impairs processing of, the other feature(s).
Oculomotor scanning is highly abnormal during scene exploration (Zihl, 2000). Both accuracy of fixation and saccadic localization are impaired, and the spatial-temporal organization of eye displacements does not fit the spatial configuration of the scene to be analyzed (Tyler, 1968). Saccadic behavior is abnormal in rich (natural) environments, which require continuous selection between concurrent stimuli, but it may be normal in a simplified context, for example, when the task is to direct the eyes to a peripheral light-emitting diode in the dark (Guard et al., 1984). The issue of whether the parietal lobe is specifically involved in oculomotor processes per se should therefore first be assessed by testing such simple saccades to isolated single dots. This paradigm has been tested in patients with a specific lesion of a parietal eye field, unilateral neglect, or OA, as reviewed below.
Pierrot-Deseilligny and Müri (1997) have observed increased reaction times and hypome
tria for contradirectional reflexive saccades after a lesion to the IPL. They have therefore
postulated the existence of an oculomotor parietal region (parietal eye field; see also Müri
et al., 1996) whose lesion specifically affects the processes of planning and triggering of
contradirectional reflexive saccades. These authors mention that the increase in reaction
time for contradirectional saccades is shorter when the fixation point vanishes about 200
ms before the presentation of the target in the contralesional visual field (“gap para
digm”) than in a condition of overlap between the two visual locations. This effect sug
gests that the deficit may be linked to visual extinction or a deficit of attentional disen
gagement from the cross initially fixated.
Patients with unilateral neglect have also been reported to exhibit late and hypometric leftward saccades (e.g., Girotti et al., 1983; Walker & Findlay, 1997), although their saccadic accuracy has been reported as normal elsewhere (Behrmann et al., 1997). Finally, Niemeier and Karnath (2000) have shown that hypometria is observed only for reflexive saccades triggered in response to left peripheral target presentation, leftward and rightward saccades being equivalent in amplitude during free ocular exploration of visual scenes. By contrast, the strategy of exploration is clearly impaired in patients with unilateral neglect (e.g., Ishiai, 2002). No available result seems to rule out the possibility that these temporal and strategic deficits in neglect patients result from a deficient allocation of attention to visual targets in the left periphery.
Finally, patients with pure OA arising from damage to the SPL are, by definition, impaired in visual-manual reach-and-grasp guidance within their peripheral visual field, without primary visual, proprioceptive, or motor deficits (Garcin et al., 1967; Jeannerod, 1986); this definition is supposed also to exclude oculomotor deficits. The slight impairments of saccadic eye movements detected on clinical tests in some OA patients have not been considered sufficient to account for their major misreaching deficit (e.g., Rondot et al., 1977; Vighetto & Perenin, 1981). Indeed, they correspond to only one part of the misreaching deficit: the deficit in localizing the target, which commonly affects saccade and reach (field effect), but not the hand effect, which is specific to the reach (Gaveau et al., 2008; see Figure 16.16, stationary targets). In addition, a deficit in monitoring hand location in peripheral vision (at the start of and during reaching execution) has to be added to account for the whole deficit. A simpler explanation could be that the misreaching arises from only two deficits: one in monitoring visual information in eye-centered coordinates (impaired localization of the target and impaired visual guidance of the hand in the contralesional visual field) and the other in monitoring proprioceptive information in eye-centered coordinates (the typical hand effect as impaired proprioceptive guidance of the ataxic hand; Blangero et al., 2007).
To sum up, the most consistent observation following a unilateral lesion of the posterior parietal cortex is an increase in latency for saccades in the contralesional direction or, in the case of a bilateral lesion, a poverty of eye movements that may culminate in a condition often referred to as spasm of fixation, or visual grasp reflex. As mentioned above, parietal patients may exhibit a defect in shifting spatial attention (Verfaellie et al., 1990), whether exogenously or endogenously. This may be central to the deficit of "attentional disengagement from fixated objects" (Rizzo & Vecera, 2002) and thereby to the occurrence of the visual grasp reflex. Second, patients are usually able to move their eyes spontaneously or on verbal command, but they are impaired at performing visually guided saccades. The more attention and complex visual processing an eye movement requires, the less likely it is to be performed; along these lines, visual search behavior is particularly vulnerable. The variability and instability of the oculomotor deficit after a parietal lesion, for single saccades as well as for exploratory scanning, can be highlighted as an argument for an underlying attentional deficit. Accordingly, inactivation of area LIP, considered the parietal saccadic region in monkeys, affects visual search but does not systematically affect saccades to single targets (Li et al., 1999; Wardak et al., 2002). We therefore tend to propose that when oculomotor deficits are observed following a posterior parietal lesion, they stem from an attentional disorder. This view stands in contrast to the premotor theory of attention (Rizzolatti et al., 1987), which postulates that spatial attention emerges, functionally and anatomically, from a more basic process of motor preparation of eye movements. In our view, the causal effect is solely due to a functional coupling between attention and motor preparation. Indeed, attentional selection and motor selection rely on dissociated neural substrates (Blangero et al., 2010b; Khan et al., 2009), even if they tightly interact with each other. Moreover, our view is not theoretical or evolutionist: the direction of the causal effect is not seen as an absolute hierarchy between two systems but is rather proposed only in the case of a posterior parietal lesion. Other functional links may emerge from different lesion locations (e.g., the frontal cortex).
It is tempting to conclude from this chapter that, schematically, all deficits that occur after lesions to the superior parietal lobule (extinction, simultanagnosia, and OA) can be understood in the framework of visual attention (defined as mechanisms allowing the brain to increase spatial resolution in the visual periphery, and supposed to be expressed as deficits in time or in space), whereas all deficits consequent on a lesion extending toward the IPL include deficits of visual synthesis (spatial representations and remapping mechanisms). However, before reaching this conclusion, further investigations are needed to confirm two main assumptions that we have been forced to make to establish our model in Figure 16.1.
First, remapping processes for covert shifts of attention should exist within the right IPL. Prime et al. (2008) have shown that only TMS of the right PPC, and not of the left PPC, disrupts spatial working memory across saccades, but also in static conditions. Visual remapping across overt eye movements and spatial working memory therefore appear similar in terms of anatomical network; however, they operate at different time scales and may constitute related but different processes. Indeed, in contrast with the double-step saccade literature (Duhamel et al., 1992; Heide et al., 1995), which has claimed that patients with neglect are impaired at remapping contralesional (leftward) saccades, Vuilleumier et al. (2007) have shown an impairment in neglect patients in a perceptual spatial working memory task across saccades after a first rightward saccade. Interestingly, a more complex account of parietal attentional function has been proposed recently by Riddoch et al. (2010), distinguishing functionally several subregions within the ventral right-hemispheric attentional network of the dorsal stream: the supramarginal and angular gyri of the IPL, but also the TPJ and the superior temporal gyrus. This should provide neural substrates for a possible distinction between visual remapping of a goal stimulus for a goal-directed (saccadic or pointing) response and a perceptual response based on a spatial representation of the configuration of several salient objects in the visual scene. This would also provide a more complex framework to account for the multiple dissociations that have been reported within the neglect syndrome (e.g., between near and far space, body and external space, cancellation and bisection tasks).
Finally, the possible causal link between attentional deficit and OA needs to be further investigated, as does that between attentional deficit and eye movement deficits following a lesion to the PPC. The view developed here is that the parietal cortex contains attentionally prioritized representations of all possible targets (salient object locations; Gottlieb et al., 1998) that feed oculomotor maps (e.g., by communicating the next location to reach) but are not oculomotor maps themselves. More specifically, further studies are needed to explain how an attentional deficit can cause a visual-motor deficit, expressed spatially as hypometric errors in eye-centered coordinates that increase with target eccentricity (Blangero et al., 2010a; Gaveau et al., 2008), or temporally as a delay of visual capture. Early psychophysical studies demonstrated that low-energy targets do elicit high-latency and hypometric saccadic eye movements (Pernier et al., 1969; Prablanc & Jeannerod, 1974). An explanation of this phenomenon in terms of spatial map organization at different subcortical and cortical levels is necessary to establish a link between attention and visual-motor metric errors.
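The architecture proposed here, attentional priority representations that feed oculomotor maps without being oculomotor maps themselves, can be caricatured in a few lines of code. The sketch below is purely illustrative and is not the authors' model: it implements a winner-take-all read-out of a salience map in the spirit of Koch and Ullman (1985), and all function names and parameters (e.g., `next_saccade_goal`, `lesion_gain`) are our assumptions. An attentional lesion is modeled only as attenuated salience on one side of the map, which changes the location handed to the oculomotor stage without any change to the motor read-out itself.

```python
# Toy sketch, NOT the authors' model: a "priority map" of salient locations
# that feeds an oculomotor read-out without itself being an oculomotor map.
# The lesion_gain parameter and all names are illustrative assumptions.
import numpy as np

def next_saccade_goal(salience, lesion_gain=1.0):
    """Winner-take-all read-out of a 2D salience map.

    Columns left of the vertical midline stand in for the contralesional
    hemifield; a lesion is caricatured as attenuating their salience.
    """
    s = salience.astype(float).copy()
    midline = s.shape[1] // 2
    s[:, :midline] *= lesion_gain  # attenuated priority on the lesioned side
    return tuple(int(i) for i in np.unravel_index(np.argmax(s), s.shape))

rng = np.random.default_rng(0)
sal = rng.random((8, 8))   # background clutter
sal[2, 1] = 2.0            # one strong target in the "contralesional" field

print(next_saccade_goal(sal))                   # intact map selects (2, 1)
print(next_saccade_goal(sal, lesion_gain=0.3))  # lesioned map favors the other side
```

Under this caricature, a strong contralesional target loses the competition once salience is attenuated, so the goal passed downstream shifts ipsilesionally: the oculomotor deficit is entirely secondary to an attentional change, in the spirit of the account argued in this chapter.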
References
Andersen, R. A., & Buneo, C. A. (2002). Intentional maps in posterior parietal cortex. An
nual Review of Neuroscience, 25, 189–220.
Anderson, C., & Van Essen, D. (1987). Shifter circuits: A computational strategy for dy
namic aspects of visual processing. Proceedings of the National Academy of Sciences U S
A, 84, 6297–6301.
Ashbridge, E., Walsh, V., & Cowey, A. (1997). Temporal aspects of visual search studied by
transcranial magnetic stimulation. Neuropsychologia, 35 (8), 1121–1131.
Bálint, R. (1909). “Seelenlähmung des Schauens, optische Ataxie, raümliche Störung der
Aufmerksamkeit.” Monatsschrift für Psychiatrie und Neurologie, 25, 51–81.
Bartolomeo, P. (2000). Inhibitory processes and spatial bias after right hemisphere dam
age. Neuropsychological Rehabilitation, 10 (5), 511–526.
Bartolomeo, P., & Chokron, S. (1999). Left unilateral neglect or right hyperattention?
Neurology, 53 (9), 2023–2027.
Bartolomeo, P., & Chokron, S. (2002). Orienting of attention in left unilateral neglect.
Neuroscience and Biobehavioral Reviews, 26, 217–234.
Bartolomeo, P., Perri, R., & Gainotti, G. (2004). The influence of limb crossing on left tac
tile extinction. Journal of Neurology, Neurosurgery, and Psychiatry, 75 (1), 49–55.
Becker, E., & Karnath, H. O. (2007). Incidence of visual extinction after left versus right
hemisphere stroke. Stroke, 38 (12), 3172–3174.
Behrmann, M., Watt, S., Black, S. E., & Barton, J. J. (1997). Impaired visual search in pa
tients with unilateral neglect: An oculographic analysis. Neuropsychologia, 35 (11), 1445–
1458.
Berman, R. A., & Colby, C. (2009). Attention and active vision. Vision Research, 49 (10), 1233–1248.
Bisiach, E., Ricci, R., Lualdi, M., & Colombo, M. R. (1998). Perceptual and response bias
in unilateral neglect: Two modified versions of the Milner landmark task. Brain and Cog
nition, 37 (3), 369–386.
Blangero, A., Delporte, L., Vindras, P., Ota, H., Revol, P., Boisson, D., Rode, G., Vighetto,
A., Rossetti, Y. & Pisella, L. (2007). Optic ataxia is not only “optic”: Impaired spatial inte
gration of proprioceptive information. NeuroImage, 36, 61–68.
Blangero, A., Gaveau, V., Luauté, J., Rode, G., Salemme, R., Boisson, D., Guinard, M.,
Vighetto, A., Rossetti, Y. & Pisella, L. (2008). A hand and a field effect on on-line motor
control in unilateral optic ataxia. Cortex, 44 (5), 560–568.
Blangero, A., Khan, A. Z., Salemme, R., Laverdure, N., Boisson, D., Rode, G., Vighetto, A., Rossetti, Y., & Pisella, L. (2010b). Pre-saccadic perceptual facilitation can occur without covert orienting of attention. Cortex, 46 (9), 1132–1137.
Blangero, A., Ota, H., Rossetti, Y., Fujii, T., Luaute, J., Boisson, D., Ohtake, H., Tabuchi,
M., Vighetto, A., Yamadori, A., Vindras, P., & Pisella, L. (2010a). Systematic retinotopic er
ror vectors in unilateral optic ataxia. Cortex, 46 (1), 77–93.
Brouchon, M., Joanette, Y., & Samson, M. (1986). From movement to gesture: “Here” and
“there” as determinants of visually guided pointing. In J. L. Nespoulos, A. Perron, & R. A.
Lecours (Eds.), Biological foundations of gesture (pp. 95–107). Mahwah, NJ: Erlbaum.
Carey, D. P., Coleman R. J., & Della Sala, S. (1997). Magnetic misreaching. Cortex, 33 (4),
639–652.
Carey, D. P., Harvey, M., & Milner, A. D. (1996). Visuomotor sensitivity for shape and ori
entation in a patient with visual form agnosia. Neuropsychologia, 34 (5), 329–337.
Carrasco, M., Penpeci-Talgar, C., & Eckstein, M. (2000). Spatial covert attention enhances
contrast sensitivity across the CSF: Support for signal enhancement. Vision Research, 40,
1203–1215.
Colby, C. L., Duhamel, J.-R., & Goldberg, M. E. (1995). Oculocentric spatial representation
in parietal cortex. Cerebral Cortex, 5, 470–481.
Colby, C. L., & Goldberg, M. E. (1999). Space and attention in parietal cortex. Annual Re
view of Neuroscience, 22, 319–349.
Corbetta, M., Kincade, M. J., Lewis, C., Snyder, A. Z., & Sapir, A. (2005). Neural basis and
recovery of spatial attention deficits in spatial neglect. Nature Neuroscience, 8 (11),
1603–1610.
Corbetta, M., Kincade, J. M., Ollinger, J. M., McAvoy, M. P., & Shulman, G. L. (2000). Vol
untary orienting is dissociated from target detection in human posterior parietal cortex.
Nature Neuroscience, 3 (3), 292–297.
Corbetta, M., & Shulman, G. L. (2002). Control of goal-directed and stimulus-driven atten
tion in the brain. Nature Reviews Neuroscience, 3 (3), 201–215.
Danckert, J., & Rossetti, Y. (2005). Blindsight in action: What can the different sub-types
of blindsight tell us about the control of visually guided actions? Neuroscience and Biobe
havioral Reviews, 29 (7), 1035–1046.
Deubel, H., & Schneider, W. X. (1996). Saccade target selection and object recognition:
Evidence for a common attentional mechanism. Vision Research, 36, 1827–1837.
Dijkerman, H. C., McIntosh, R. D., Anema, H. A., de Haan, E. H., Kappelle, L. J., & Milner,
A. D. (2006). Reaching errors in optic ataxia are linked to eye position rather than head or
body position. Neuropsychologia, 44, 2766–2773.
Di Pellegrino, G., Basso, G., & Frassinetti, F. (1998). Visual extinction as a spatio-temporal
disorder of selective attention. NeuroReport, 9, 835–839.
Driver, J. (1999). Egocentric and object-based visual neglect. In N. Burgess, K. J. Jeffery &
J. O. O’Keefe (Eds.), The hippocampal and parietal foundations of spatial cognition. (pp.
66–89). Oxford, UK: Oxford University Press.
Driver, J., & Husain, M. (2002). The role of spatial working memory deficits in pathologi
cal search by neglect patients. In H. O. Karnath, A. D. Milner, & G. Vallar (Eds.), The cog
nitive and neural bases of spatial neglect (pp. 351–364). Oxford, UK: Oxford University
Press.
Driver, J., & Mattingley J. B. (1998). Parietal neglect and visual awareness. Nature Neuro
science, 1, 17–22.
Duhamel, J.-R., Goldberg, M. E., Fitzgibbon, E. J., Sirigu, A., & Grafman, J. (1992). Sac
cadic dysmetria in a patient with a right frontoparietal lesion. Brain, 115, 1387–1402.
Egly, R., Driver, J., & Rafal, R. D. (1994). Shifting visual attention between objects and lo
cations: evidence from normal and parietal lesion subjects. Journal of Experimental Psy
chology: General, 123 (2), 161–177.
Ellison, A., Schindler, I., Pattison, L. L., & Milner, A. D. (2004). An exploration of the role
of the superior temporal gyrus in visual search and spatial perception using TMS. Brain,
127 (Pt 10), 2307–2315.
Ferber, S., & Danckert, J. (2006). Lost in space—the fate of memory representations for
non-neglected stimuli. Neuropsychologia, 44 (2), 320–325.
Fierro, B., Brighina, F., Oliveri, M., Piazza, A., La Bua, V., Buffa, D., & Bisiach, E. (2000).
Contralateral neglect induced by right posterior parietal rTMS in healthy subjects. Neu
roReport, 11 (7), 1519–1521.
Fink, G. R., Marshall, J. C., Shah, N. J., Weiss, P. H., Halligan, P. W., Grosse-Ruyken, M.,
Ziemons, K., Zilles, K., & Freund, H. J. (2000). Line bisection judgments implicate right
parietal cortex and cerebellum as assessed by fMRI. Neurology, 54 (6), 1324–1331.
Friedrich, F. J., Egly, R., Rafal, R. D., & Beck, D. (1998). Spatial attention deficits in hu
mans: A comparison of superior parietal and temporo-parietal junction lesions. Neuropsy
chology, 12, 193–207.
Garcin, R., Rondot, P., & de Recondo J. (1967). Ataxie optique localisée aux deux
hémichamps visuels homonymes gauches. Revue Neurologique, 116 (6), 707–714.
Gaveau, V., Pélisson, D., Blangero, A., Urquizar, C., Prablanc, C., Vighetto, A. & Pisella, L.
(2008). Saccadic control and eye-hand coordination in optic ataxia. Neuropsychologia, 46,
475–486.
Girard, P., Salin, P. A., & Bullier, J. (1991). Visual activity in macaque area V4 depends on
area 17 input. NeuroReport, 2 (2), 81–84.
Girard, P., Salin, P. A., & Bullier, J. (1992). Response selectivity of neurons in area MT of
the macaque monkey during reversible inactivation of area V1. Journal of Neurophysiolo
gy, 67 (6), 1437–1446.
Girotti, F., Casazza, M., Musicco, M., & Avanzini, G. (1983). Oculomotor disorders in corti
cal lesions in man: The role of unilateral neglect. Neuropsychologia, 21, 543–553.
Girotti, F., Milanese, C., et al. (1982). Oculomotor disturbances in Bálint’s syndrome:
Anatomoclinical findings and electrooculographic analysis in a case. Cortex, 18 (4), 603–
614.
Goldberg, M. E., & Bruce, C. J. (1990). Primate frontal eye fields. III. Maintenance of a
spatially accurate saccade signal. Journal of Neurophysiology, 64, 489–508.
Goodale, M. A., Milner, A. D., Jacobson, L. S., & Carey, D. P. (1991). A neurological dissoci
ation between perceiving objects and grasping them. Nature, 349, 154–156.
Gottlieb, J. P., Kusunoki, M., & Goldberg, M. E. (1998). The representation of visual salience in monkey parietal cortex. Nature, 391, 481–484.
Gréa, H., Pisella, L., Rossetti, Y., Desmurget, M., Tilikete, C., Prablanc, C., & Vighetto, A.
(2002). A lesion of the posterior parietal cortex disrupts on-line adjustments during aim
ing movements. Neuropsychologia, 40, 2471–2480.
Guard, O., Perenin, M. T., Vighetto, A., Giroud, M., Tommasi, M., & Dumas, R. (1984).
Syndrome pariétal bilatéral ressemblant au syndrome de Bálint. Revue Neurologique, 140
(5), 358–367.
Hecaen, H., & De Ajuriaguerra, J. (1954). Bálint’s syndrome (psychic paralysis of visual
fixation) and its minor forms. Brain, 77 (3), 373–400.
Heide, W., Binkofski, F., Seitz, R. J., Posse, S., Nitschke, M. F., Freund, H. J., & Kömpf, D.
(2001). Activation of frontoparietal cortices during memorized triple-step sequences of
saccadic eye movements: an fMRI study. European Journal of Neuroscience, 13 (6), 1177–
1189.
Heide, W., Blankenburg, M., Zimmermann, E., & Kömpf, D. (1995). Cortical control of
double-step saccades: implications for spatial orientation. Annals of Neurology, 38, 739–
748.
Heide, W., & Kömpf, D. (1997). Specific parietal lobe contribution to spatial constancy
across saccades. In: P. Thier & H.-O. Karnath (Eds.), Parietal lobe contributions to orienta
tion in 3D space (pp. 149–172). Heidelberg: Springer-Verlag.
Hilgetag, C. C., Théoret, H., & Pascual-Leone, A. (2001). Enhanced visual spatial atten
tion ipsilateral to rTMS-induced “virtual lesions” of human parietal cortex. Nature Neuro
science, 4 (9), 953–957.
Hillis, A. E., & Caramazza, A. (1991). Deficit to stimulus-centered, letter shape represen
tations in a case of “unilateral neglect.” Neuropsychologia, 29 (12), 1223–1240.
Hillis, A. E., Chang, S., Heidler-Gary, J., Newhart, M., Kleinman, J. T., Davis, C., Barker, P. B., Aldrich, E., & Ken, L. (2006). Neural correlates of modality-specific spatial extinction. Journal of Cognitive Neuroscience, 18 (11), 1889–1898.
Humphreys, G. W., Romani, C., Olson, A., Riddoch, M. J., & Duncan, J. (1994). Non-spatial
extinction following lesions of the parietal lobe in humans. Nature, 372, 357–359.
Husain, M., Mannan, S., Hodgson, T., Wojciulik, E., Driver, J., & Kennard, C. (2001). Im
paired spatial working memory across saccades contributes to abnormal search in pari
etal neglect. Brain, 124 (Pt 5), 941–952.
Husain, M., Mattingley, J. B., Rorden, C., Kennard, C., & Driver, J. (2000). Distinguishing
sensory and motor biases in parietal and frontal neglect. Brain, 123 (Pt 8), 1643–1659.
Husain, M., & Rorden, C. (2003). Non-spatially lateralized mechanisms in hemispatial neglect. Nature Reviews Neuroscience, 4 (1), 26–36.
Husain, M., Shapiro, K., Martin, J., & Kennard, C. (1997). Abnormal temporal dynamics of
visual attention in spatial neglect patients. Nature, 385, 154–156.
Husain, M., & Stein, J. (1988). Rezsö Bálint and his most celebrated case. Archives of Neurology, 45, 89–93.
Jackson, S. R., Newport, R., Mort, D., & Husain, M. (2005). Where the eye looks, the hand
follows: Limb-dependent magnetic misreaching in optic ataxia. Current Biology, 15, 42–
46.
Jakobson, L. S., Archibald, Y. M., Carey, D. P., & Goodale, M. A. (1991). A kinematic analy
sis of reaching and grasping movements in a patient recovering from optic ataxia. Neu
ropsychologia, 29, 803–809.
Jeannerod, M., Decety, J., et al. (1994). Impairment of grasping movements following bi
lateral posterior parietal lesion. Neuropsychologia, 32, 369–380.
Karnath, H. O., Himmelbach, M., & Küker, W. (2003). The cortical substrate of visual ex
tinction. NeuroReport, 14 (3), 437–442.
Kennard, C., Mannan, S. K., Nachev, P., Parton, A., Mort, D. J., Rees, G., Hodgson, T. L., & Husain, M. (2005). Cognitive processes in saccade generation. Annals of the New York Academy of Sciences, 1039, 176–183.
Khan, A. Z., Blangero, A., Rossetti, Y., Salemme, R., Luauté, J., Laverdure, N., Rode, G.,
Boisson, D., & Pisella, L. (2009). Parietal damage dissociates saccade planning from pre-
saccadic perceptual facilitation. Cerebral Cortex, 19 (2), 383–387.
Khan, A. Z., Pisella, L., Rossetti, Y., Vighetto, A., & Crawford, J. D. (2005b). Impairment of
gaze-centered updating of reach targets in bilateral parietal-occipital damaged patients.
Cerebral Cortex, 15 (10), 1547–1560.
Khan, A. Z., Pisella, L., Vighetto, A., Cotton, F., Luauté, J., Boisson, D., Salemme, R., Craw
ford, J. D., & Rossetti, Y. (2005a). Optic ataxia errors depend on remapped, not viewed,
target location. Nature Neuroscience, 8 (4), 418–420.
Kinsbourne, M. (1993). Orientational bias model of unilateral neglect: Evidence from at
tentional gradients within hemispace. In I. H. Robertson & J. C. Marshall (Eds.), Unilater
al neglect: Clinical and experimental studies (pp. 63–86). Hove, UK: Erlbaum.
Koch, C., & Ullman, S. (1985). Shifts in selective visual attention: Towards the underlying
neural circuitry. Human Neurobiology, 4, 219–227.
Konen, C. S., & Kastner, S. (2008). Two hierarchically organized neural systems for object
information in human visual cortex. Nature Neuroscience, 11 (2), 224–231.
Làdavas, E. (1990). Selective spatial attention in patients with visual extinction. Brain,
113 (Pt 5), 1527–1538.
Li, C. S., Mazzoni, P., & Andersen, R. A. (1999). Effect of reversible inactivation of
macaque lateral intraparietal area on visual and memory saccades. Journal of Neurophys
iology, 81 (4), 1827–1838.
Luauté, J., Halligan, P., Rode, G., Jacquin-Courtois, S., & Boisson, D. (2006). Prism adapta
tion first among equals in alleviating left neglect: A review. Restorative Neurology and
Neuroscience, 24 (4–6), 409–418.
Luo, C. R., Anderson, J. M., & Caramazza, A. (1998). Impaired stimulus-driven orienting of
attention and preserved goal-directed orienting of attention in unilateral visual neglect.
American Journal of Psychology, 111 (4), 487–507.
Malhotra, P., Coulthard, E. J., & Husain, M. (2009). Role of right posterior parietal cortex
in maintaining attention to spatial locations over time. Brain, 132 (Pt 3), 645–660.
Malhotra, P., Jäger, H. R., Parton, A., Greenwood, R., Playford, E. D., Brown, M. M., Dri
ver, J., & Husain, M. (2005). Spatial working memory capacity in unilateral neglect. Brain,
128 (Pt 2), 424–435.
Mannan, S. K., Mort, D. J., Hodgson, T. L., Driver, J., Kennard, C., & Husain, M. (2005). Re
visiting previously searched locations in visual neglect: Role of right parietal and frontal
lesions in misjudging old locations as new. Journal of Cognitive Neuroscience, 17 (2), 340–
354.
Mattingley, J. B., Davis, G., & Driver, J. (1997). Preattentive filling-in of visual surfaces in
parietal extinction. Science, 275 (5300), 671–674.
Mattingley, J. B., Husain, M., Rorden, C., Kennard, C., & Driver, J. (1998). Motor role of
human inferior parietal lobe revealed in unilateral neglect patients. Nature, 392 (6672),
179–182.
Mattingley, J. B., Pisella, L., Rossetti, Y., Rode, G., Tilikete, C., Boisson, D., Vighetto, A.
(2000). Visual extinction in retinotopic coordinates: a selective bias in dividing attention
between hemifields. Neurocase 6, 465–475.
Mays, L. E., & Sparks, D. L. (1980). Dissociation of visual and saccade-related responses
in superior colliculus neurons. Journal of Neurophysiology, 43, 207–232.
McIntosh, R.D., Mulroue, A., Blangero, A., Pisella, L., & Rossetti, Y. (2011). Correlated
deficits of perception and action in optic ataxia. Neuropsychologia, 49, 131–137.
Medendorp, W. P., Goltz, H. C., Vilis, T., & Crawford, J. D. (2003). Gaze-centered updating
of visual space in human parietal cortex. Journal of Neuroscience, 23 (15), 6209–6214.
Merriam, E. P., Genovese, C. R., & Colby, C. L. (2003). Spatial updating in human parietal
cortex. Neuron, 39, 361–373.
Merriam, E. P., Genovese, C. R., & Colby, C. L. (2007). Remapping in human visual cortex.
Journal of Neurophysiology, 97 (2), 1738–1755.
Michel, F., & Hénaff, M. A. (2004). Seeing without the occipito-parietal cortex: Simul
tanagnosia as a shrinkage of the attentional visual field. Behavioural Neurology, 15, 3–13.
Michel, F., Jeannerod, M., & Devic M. (1963). Un cas de désorientation visuelle dans les
trois dimensions de l’espace (A propos du syndrome de Bálint et du syndrome décrit par
G Holmes). Revue Neurologique, 108, 983–984.
Milner, A. D., Dijkerman, H. C., McIntosh, R. D., Rossetti, Y., & Pisella, L. (2003). Delayed
reaching and grasping in patients with optic ataxia. In D. Pelisson, C. Prablanc & Y. Ros
setti (Eds.), Progress in Brain Research series: Neural control of space coding and action
production (pp. 142, 225–242). Amsterdam: Elsevier.
Milner, A. D., Dijkermann, C., Pisella, L., McIntosh, R., Tilikete, C., Vighetto, A., & Rosset
ti, Y. (2001). Grasping the past: Delaying the action improves visuo-motor performance.
Current Biology, 11 (23), 1896–1901.
Milner, A. D., & Goodale, M. A. (1995). The visual brain in action. Oxford, UK: Oxford Uni
versity Press.
Milner, A. D., & Goodale, M. A. (2008). Two visual systems re-viewed. Neuropsychologia,
46 (3), 774–785.
Milner, A. D., Paulignan Y., Dijkerman H. C., Michel F., & Jeannerod M. (1999). A paradox
ical improvement of misreaching in optic ataxia: New evidence for two separate neural
systems for visual localization. Proceedings of the Royal Society of London B, 266, 2225–
2229.
Morris, A. P., Chambers, C. D., & Mattingley, J. B. (2007). Parietal stimulation destabilizes spatial updating across saccadic eye movements. Proceedings of the National Academy of Sciences U S A, 104 (21), 9069–9074.
Mort, D. J., Malhotra, P., Mannan, S. K., Rorden, C., Pambakian, A., Kennard, C., & Hu
sain, M. (2003). The anatomy of visual neglect. Brain, 126 (Pt 9), 1986–1997.
Muggleton, N. G., Cowey, A., & Walsh, V. (2008). The role of the angular gyrus in visual
conjunction search investigated using signal detection analysis and transcranial magnetic
stimulation. Neuropsychologia, 46 (8), 2198–2202.
Müri, R. M., Iba-Zizen, M. T., et al. (1996). Location of the human posterior eye field with
functional magnetic resonance imaging. Journal of Neurology, Neurosurgery, and Psychia
try, 60, 445–448.
Niebur, E., & Koch, C. (1997). Computational architectures for attention. In R. Parasura
man (Ed.), The attentive brain (pp. 163–186). Cambridge, MA: MIT Press.
Nowak, L., & Bullier, J. (1997). The timing of information transfer in the visual system. In
J. Kaas, K. Rochland, & A. Peters (Eds.), Extrastriate cortex in primates. New York:
Plenum Press.
Page 50 of 56
Attentional Disorders
Ota, H., Fujii, T., Suzuki, K., Fukatsu, R., & Yamadori, A. (2001). Dissociation of body-cen
tered and stimulus-centered representations in unilateral neglect. Neurology, 57 (11),
2064–2069.
Pascual-Leone, A., Gomez-Tortosa, E., Grafman, J., Always, D., Nichelli, P., & Hallett, M.
(1994). Induction of visual extinction by rapid-rate transcranial magnetic stimulation of
parietal lobe. Neurology, 44 (3 Pt 1), 494–498.
Patterson, A., & Zangwill, O. L. (1944). Disorders of visual space perception associated
with lesions of the right cerebral hemisphere. Brain, 67, 331–358.
Perenin, M.-T. & Vighetto, A. (1988) Optic ataxia: a specific disruption in visuomotor
mechanisms. I. Different aspects of the deficit in reaching for objects. Brain, 111 (Pt 3),
643–674.
Pernier, J., Jeannerod, M., & Gerin, P. (1969) Preparation and decision in saccades: adap
tation to the trace of the stimulus. Vision Research, 9 (9), 1149–1165.
Pierrot-Deseilligny, C., & Müri, R. (1997). Posterior parietal cortex control of saccades in
humans. In P. Thier & H.-O. Karnath (Eds.), Parietal lobe contributions to orientation in
3D space (pp. 135–148). Heidelberg: Springer-Verlag.
Pisella, L., Alahyane, N., Blangero, A., Thery, F., Blanc, S., Rode, G., & Pelisson, D. (2011).
Right-hemispheric dominance for visual remapping in humans. Philosophical Transactions
of the Royal Society of London, Series B, Biological Sciences, 366 (1564), 572–585.
Pisella, L., Berberovic, N., & Mattingley, J. B. (2004). Impaired working memory
(p. 349)
for location but not for colour or shape in visual neglect: a comparison of parietal and
non-parietal lesions. Cortex, 40 (2), 379–390.
Pisella, L., Binkofski, F., Lasek, K., Toni, I., & Rossetti, Y. (2006a). No double-dissociation
between optic ataxia and visual agnosia: Multiple sub-streams for multiple visuo-manual
integrations. Neuropsychologia, 44 (13), 2734–2748.
Pisella, L., Gréa, H., Tilikete, C., Vighetto, A., Desmurget, M., Rode, G., Boisson, D., &
Rossetti, Y. (2000). An automatic pilot for the hand in the human posterior parietal cortex
toward a reinterpretation of optic ataxia. Nature Neuroscience, 3, 729–736.
Pisella, L., & Mattingley, J. B. (2004). The contribution of spatial remapping impairments
to unilateral visual neglect. Neuroscience and Biobehavioral Reviews, 28 (2), 181–200.
Pisella, L., Ota, H., Vighetto, A., & Rossetti, Y. (2007). Optic ataxia and Bálint syndrome:
Neurological and neurophysiological prospects. In G. Goldenberg & B. Miller (Eds.),
Handbook of clinical neurology, 3rd Series, Volume 88: Neuropsychology and behavioral
neurology (pp. 393–416). Amsterdam: Elsevier.
Page 51 of 56
Attentional Disorders
Pisella, L., Rode, G., Farne, A., Tilikete, C., & Rossetti, Y. (2006b). Prism adaptation in the
rehabilitation of patients with visuo-spatial cognitive disorders. Current Opinion in Neu
rology, 19 (6), 534–542.
Pisella, L., Sergio, L., Blangero, A., Torchin, H., Vighetto, A., & Rossetti, Y. (2009). Optic
ataxia and the function of the dorsal stream: Contribution to perception and action. Neu
ropsychologia, 47, 3033–3044.
Posner, M. I., Walker, J. A., Friedrich, F. J., & Rafal, R. D. (1984). Effects of parietal injury
on covert orienting of attention. Journal of Neuroscience, 4, 1863–1874.
Prablanc, C., & Jeannerod, M. (1974). Latence et precision des saccades en fonction de
l’intensité, de la durée et de la position rétinienne d’un stimulus. Revue E.E.G., 4 (3), 484–
488.
Prime, S. L., Vesia, M., & Crawford, J. D. (2008). Transcranial magnetic stimulation over
posterior parietal cortex disrupts transsaccadic memory of multiple objects. Journal of
Neuroscience, 28 (27), 6938–6949.
Raymond, J. E., Shapiro, K. L., & Arnell, K. M. (1992). Temporary suppression of visual
processing in RSVP task: An attentional blink? Journal of Experimental Psychology: Hu
man Perception and Performance, 18, 849–860.
Rees, G. (2001). Neuroimaging of visual awareness in patients and normal subjects. Cur
rent Opinions in Neurobiology, 11 (2), 150–156.
Riddoch, M. J., Chechlacz, M., Mevorach, C., Mavritsaki, E., Allen, H., & Humphreys, G.
W. (2010). The neural mechanisms of visual selection: The view from neuropsychology.
Annals of the N Y Academy of Science, 1191 (1), 156–181.
Rizzo, M., & Vecera, S. P. (2002). Psychoanatomical substrates of Bálint ‘s syndrome. Jour
nal of Neurology, Neurosurgery, and Psychiatry, 72 (2), 162–178.
Rizzolatti, G., Riggio, L., Dascola, I., & Umiltá, C. (1987). Reorienting attention across the
horizontal and vertical meridians: Evidence in favor of a premotor theory of attention.
Neuropsychologia, 25, 31–40.
Robertson, I. (1989). Anomalies in the laterality of omissions in unilateral left visual ne
glect: Implications for an attentional theory of neglect. Neuropsychologia, 27 (2), 157–
165.
Page 52 of 56
Attentional Disorders
Rode, G., Luauté, J., Klos, T., Courtois-Jacquin, S., Revol, P., Pisella, L., Holmes, N. P., Bois
son D., & Rossetti, Y. (2007a). Bottom-up visuo-manual adaptation: Consequences for spa
tial cognition. In P. Haggard, Y. Rossetti, & M. Kawato (Eds.), Attention and performance
XXI: Sensorimotor foundations of higher cognition (pp. 207–229). Oxford, UK: Oxford Uni
versity Press.
Rode, G., Pisella, L., Marsal, L., Mercier, S., Rossetti, Y., & Boisson, D. (2006). Prism adap
tation improves spatial dysgraphia following right brain damage. Neuropsychologia, 44
(12), 2487–2493.
Rode, G., Perenin, M. T., & Boisson, D. (1995). [Neglect of the representational space:
Demonstration by mental evocation of the map of France]. Revue Neurologique, 151 (3),
161–164.
Rode, G., Revol, P., Rossetti, Y., Boisson, D., & Bartolomeo, P. (2007b). Looking while
imagining: The influence of visual input on representational neglect. Neurology, 68 (6),
432–437.
Rondot, P., de Recondo J., & Ribadeau-Dumas, J. L. (1977). Visuomotor ataxia. Brain, 100
(2), 355–376.
Rorden, C., Mattingley, J. B., Karnath, H.-O., & Driver, J. (1997). Visual extinction as prior
entry: Impaired perception of temporal order with intact motion perception after parietal
injury. Neuropsychologia, 35, 421–433.
Rossetti, Y., McIntosh, R. M., Revol, P., Pisella, L., Rode, G., Danckert, J., Tilikete, C., Dijk
erman, H. C. M., Boisson, D., Michel, F., Vighetto, A., & Milner, A. D. (2005). Visually
guided reaching: posterior parietal lesions cause a switch from visuomotor to cognitive
control. Neuropsychologia, 43/2, 162–177.
Rossetti, Y., & Pisella L. (2002). Tutorial. Several “vision for action” systems: A guide to
dissociating and integrating dorsal and ventral functions. In W. Prinz & B. Hommel (Eds.),
Attention and performance XIX: Common mechanisms in perception and action (pp. 62–
119). Oxford, UK: Oxford University Press.
Rossetti, Y., Pisella, L., & Pélisson, D. (2000). New insights on eye blindness and hand
sight: Temporal constraints of visuomotor networks. Visual Cognition, 7, 785–808.
Rossetti, Y., Pisella, L. & Vighetto, A. (2003). Optic ataxia revisited: Visually guided action
versus immediate visuo-motor control. Experimental Brain Research, 153 (2), 171–179.
Rossetti, Y., & Revonsuo, A. (2000) Beyond dissociations: Recomposing the mind-brain af
ter all? In Y. Rossetti & A. Revonsuo (Eds.), Beyond dissociation: Interaction between dis
sociated implicit and explicit processing (pp. 1–16). Amsterdam: Benjamins.
Page 53 of 56
Attentional Disorders
Rossetti, Y., Rode, G., Pisella, L., Farne A., Ling L., Boisson D., & Perenin, M. T. (1998).
Hemispatial neglect and prism adaptation: When adaptation to rightward optical devia
tion rehabilitates the neglected left side. Nature, 395, 166–169.
Russell, C., Deidda, C., Malhotra, P., Crinion, J. T., Merola, S., & Husain, M. (2010). A
deficit of spatial remapping in constructional apraxia after right-hemisphere stroke.
Brain, 133, 1239–1251.
Schindler, I., McIntosh, R. D., Cassidy, T. P., Birchall, D., Benson, V., Ietswaart, M.,
(p. 350)
& Milner, A. D. (2009). The disengage deficit in hemispatial neglect is restricted to be
tween-object shifts and is abolished by prism adaptation. Experimental Brain Research,
192 (3), 499–510.
Schulman, G. L., Astafiev, S. V., McAvoy, M. P., d’Avossa, G., & Corbetta, M. (2007). Right
TPJ deactivation during visual search: Functional significance and support for a filter hy
pothesis. Cerebral Cortex, 17 (11), 2625–2633.
Smith, S., & Holmes, G. (1916). A case of bilateral motor apraxia with disturbance of visu
al orientation. British Medical Journal, 1, 437–441.
Snyder LH, Batista AP, Andersen RA (1997). Coding of intention in the posterior parietal
cortex. Nature, 386 (6621), 167–170.
Striemer, C., Blangero, A., Rossetti, Y., Boisson, D., Rode, G., Vighetto, A., Pisella, L., &
Danckert, J. (2007). Deficits in peripheral visual attention in patients with optic ataxia.
NeuroReport, 18 (11), 1171–1175.
Striemer, C., Blangero, A., Rossetti, Y., Boisson, D., Rode, G., Salemme, R., Vighetto, A.,
Pisella, L., & Danckert, J. (2008). Bilateral posterior parietal lesions disrupt the beneficial
effects of prism adaptation on visual attention: Evidence from a patient with optic ataxia.
Experimental Brain Research, 187 (2), 295–302.
Striemer, C., Locklin, J., Blangero, A., Rossetti, Y., Pisella, L., & Danckert, J. (2009). Atten
tion for action? Examining the link between attention and visuomotor control deficits in a
patient with optic ataxia. Neuropsychologia, 47 (6), 1491–1499.
Tian, J., Schlag J., & Schlag-Rey, M. (2000). Testing quasi-visual neurons in the monkey’s
frontal eye field with the triple-step paradigm. Experimental Brain Research, 130, 433–
440.
Ungerleider, L. G., & Mishkin, M. (1982). Two cortical visual systems. In D. J. Ingle, M. A.
Goodale, & R. J. W. Mansfield (Eds.), Analysis of visual behavior (pp. 549–586). Cam
bridge, MA: MIT Press.
Page 54 of 56
Attentional Disorders
Vallar, G. (2007). Spatial neglect, Balint-Homes’ and Gerstmann’s syndrome, and other
spatial disorders. CNS Spectrums, 12 (7), 527–536.
Vallar, G., & Perani, D. (1986). The anatomy of unilateral neglect after right-hemisphere
stroke lesions: A clinical/CT-scan correlation study in man. Neuropsychologia, 24, 609–
622.
van Koningsbruggen, M. G., Gabay, S., Sapir, A., Henik, A., & Rafal, R. D. (2010). Hemi
spheric asymmetry in the remapping and maintenance of visual saliency maps: A TMS
study. Journal of Cognitive Neurosci ence, 22 (8), 1730–1738.
Vecera, S. P., & Rizzo, M. (2006). Eye gaze does not produce reflexive shifts of attention:
Evidence from frontal-lobe damage. Neuropsychologia, 44, 150–159.
Verfaellie, M., Rapcsak, S. Z., et al. (1990). Impaired shifting of attention in Bálint ‘s syn
drome. Brain and Cognition, 12 (2), 195–204.
Vighetto, A., & Perenin, M. T. (1981). Optic ataxia: Analysis of eye and hand responses in
pointing at visual targets. Revue Neurologique, 137 (5), 357–372.
Vuilleumier, P., & Rafal, R. D. (2000). A systematic study of visual extinction: Between-
and within-field deficits of attention in hemispatial neglect. Brain, 123 (Pt 6), 1263–1279.
Vuilleumier, P., Sergent, C., Schwartz, S., Valenza, N., Girardi, M., Husain, M., & Driver, J.
(2007). Impaired perceptual memory of locations across gaze-shifts in patients with uni
lateral spatial neglect. Journal of Cognitive Neuroscience, 19 (8), 1388–1406.
Wade, A. R., Brewer, A. A., Rieger, J. W., & Wandell, B. A. (2002). Functional measure
ments of human ventral occipital cortex: retinotopy and colour. Philosophical Transac
tions of the Royal Society of London, Series B, Biological Sciences. 357 (1424), 963–973.
Walker, R., & Findlay, J. M. (1997). Eye movement control in spatial- and object-based ne
glect. In P. Thier & H.-O. Karnath (Eds.), Parietal lobe contributions to orientation in 3D
space (pp. 201–218). Heidelberg: Springer-Verlag.
Wardak, C., Olivier, E., & Duhamel, J. R. (2002). Saccadic target selection deficits after
lateral intraparietal area inactivation in monkeys. Journal of Neuroscience, 22 (22), 9877–
9884.
Wojciulik, E., Husain, M., Clarke, K., & Driver, J. (2001) Spatial working memory. Journal
of Neuropsychologia, 39 (4), 390–396.
Page 55 of 56
Attentional Disorders
Wolpert, T. (1924). Die simultanagnosie. Zeitschrift für gesamte Neurologie und Psychia
trie, 93, 397–415.
Yarbus, A. L. (1967). Eye movements and vision. New York: Plenum Press.
Yeshurun, Y., & Carrasco, M. (1998). Attention improves or impairs visual performance by
enhancing spatial resolution. Nature, 396, 72–75.
Laure Pisella
A. Blangero
Caroline Tilikete
Damien Biotti
Gilles Rode, Lyon Neuroscience Research Center, Hospices Civils de Lyon, and Hôpital Henry Gabrielle
Alain Vighetto, Lyon Neuroscience Research Center, University Lyon, Hospices Civils de Lyon, Hôpital Neurologique, Lyon, France
Jason B. Mattingley
Yves Rossetti
Semantic Memory
Eiling Yee, Evangelia G. Chrysikou, and Sharon L. Thompson-Schill
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn
Semantic memory refers to general knowledge about the world, including concepts, facts, and beliefs (e.g., that a lemon is normally yellow and sour or that Paris is in France). How is this kind of knowledge acquired or lost? How is it stored and retrieved? This chapter reviews evidence that conceptual knowledge about concrete objects is acquired through experience with them, thereby grounding knowledge in distributed representations across brain regions that are involved in perceiving or acting on them, and impaired by damage to these brain regions. The authors suggest that these distributed representations result in flexible concepts that can vary depending on the task and context, as well as on individual experience. Further, they discuss the role of brain regions implicated in selective attention in supporting such conceptual flexibility. Finally, the authors consider the neural bases of other aspects of conceptual knowledge, such as the ability to generalize (e.g., to map lemons and grapes onto the category of fruit), and the ability to represent knowledge that does not have a direct sensorimotor correlate (e.g., abstract concepts, such as peace).
Keywords: semantic memory, concepts, categories, representation, knowledge, sensorimotor, grounding, embodiment
Introduction
What Is Semantic Memory?
How do we know what we know about the world? For instance, how do we know that a cup must be concave, or that a lemon is normally yellow and sour? Psychologists and cognitive neuroscientists use the term semantic memory to refer to this kind of world knowledge. In his seminal article, “Episodic and Semantic Memory,” Endel Tulving borrowed the term semantic from linguists to refer to a memory system for “words and other verbal symbols, their meaning and referents, about relations among them, and about rules, formulas, and algorithms for manipulating them”1 (Tulving, 1972, p. 386).
Today, most psychologists use the term semantic memory more broadly—to refer to all kinds of general world knowledge, whether it is about words or concepts, facts or beliefs. What these types of world knowledge have in common is that they are made up of knowledge that is independent of specific experiences; instead, it is general information or knowledge that can be retrieved without reference to the circumstances in which it was originally acquired. For example, the knowledge that lemons are shaped like mini-footballs would be considered part of semantic memory, whereas knowledge about where you were the last time you tasted a lemon would be considered part of episodic memory. This division is reflected in a prominent taxonomy of long-term memory (Squire, 1987), in which semantic and episodic memory are characterized as distinct components of the explicit (or declarative) memory system for facts (semantic knowledge) and events (episodic knowledge).
Episodic Memory?
Although semantic memory and episodic memory are typically considered distinct, the degree to which semantic memory is dependent on episodic memory is a matter of ongoing debate. This is because in order to possess a piece of semantic information, there must have been some episode during which that information was learned. Whether this means that all information in semantic memory begins as information in episodic memory (i.e., memory linked to a specific time and place) is an open question. According to Tulving, the answer is no: “If a person possesses some semantic memory information, he obviously must have learned it, either directly or indirectly, at an earlier time, but he need not possess any mnemonic information about the episode of such learning …” (p. 389). In other words, it may be possible for information to be incorporated into our semantic memory in the absence of ever having conscious awareness of the instances in which we were exposed to it. Alternatively, episodic memory may be the “gateway” to semantic memory (see Squire & Zola, 1998, for review)—that is, it may be the route through which semantic memory must be acquired (although eventually this information may exist independently). Most of the evidence brought to bear on this debate has come from studies of patients with selective episodic or semantic memory deficits. We turn to these patients in the following two subsections.
that came into common use [Gabrieli et al., 1988] and for people who became famous [O’Kane et al., 2004] after his surgery). Thus, the evidence suggests that semantic knowledge can be acquired independently of the episodic memory system. However, semantic knowledge in these amnesic patients is not normal (e.g., it is acquired very slowly and laboriously). It is therefore possible that the acquisition of semantic memory normally depends on the episodic system,2 but other points of entry can be used (albeit less efficiently) when the episodic system is damaged. Alternatively, these patients may have enough remaining episodic memory to allow the acquisition of semantic knowledge (Squire & Zola, 1998).
Everyone occasionally experiences difficulty retrieving episodic memories (what did I eat
for dinner last night?), but can people lose their knowledge of what things are? Imagine
walking through an orchard with a friend: Your friend has no trouble navigating among
the trees; then—to your surprise—as you stroll under a lemon tree, she picks up a lemon,
holds it up and asks, “What is this thing?”
In an early report, Elizabeth Warrington (1975) described three patients who appeared to have lost this kind of knowledge. The syndrome has subsequently been termed semantic dementia (also known as the temporal variant of fronto-temporal dementia), a neurodegenerative disease that causes gradual and selective atrophy of the anterior temporal cortex (predominantly on the left; see Garrard & Hodges, 1999; Mesulam et al., 2003; Mummery et al., 1999). Although semantic dementia patients typically speak fluently and without grammatical errors, as the disease progresses, they exhibit severe word-finding difficulties and marked deficits in identifying objects, concepts, and people (Snowden et al., 1989), irrespective of stimulus modality (e.g., pictures or written or spoken words; Bozeat et al., 2000; Hodges et al., 1992; Patterson et al., 2006, 2007; Rogers & Patterson, 2007; Snowden et al., 1994, 2001).
How one conceives of the relationship between semantic and episodic memory is complicated by the fact that (as we discuss in the following section) there are different kinds of semantic knowledge. It may be that for sensorimotor aspects of semantic knowledge (e.g., knowledge about the shape, size, or smell of things), “new information enters semantic memory through our perceptual systems, not through episodic memory” (Tulving, 1991, p. 20), whereas semantic knowledge of information that does not enter directly through our senses (e.g., “encyclopedic knowledge,” such as the fact that trees photosynthesize)
Early functional neuroimaging studies, however, suggested that semantic memory may be organized along featural (also known as modality- or attribute-specific) lines—either instead of or in addition to domain-specific lines. These studies showed neuroanatomical dissociations between visual and nonvisual object attributes, even within a category (e.g., Thompson-Schill et al., 1999). For example, Martin and colleagues (1995) reported that retrieving the color of an object was associated with activation in ventral temporal cortex bilaterally, whereas retrieving action-related information was associated with activation in middle temporal and frontal cortex.
Further observations from neuropsychological patients have suggested even finer subdivisions within semantic memory (e.g., Buxbaum & Saffran, 2002; Saffran & Schwartz, 1994). In particular, in categorical frameworks, living things can be further divided into distinct subcategories (e.g., fruits and vegetables). Similarly, in featural frameworks, nonvisual features can be subdivided into knowledge about an object’s function (e.g., a spoon is used to eat) versus knowledge about how it is manipulated (e.g., a spoon is held with the thumb, index, and middle fingers, at an angle; Buxbaum, Veramonti, & Schwartz, 2000; Kellenbach, Brett, & Patterson, 2003; Sirigu et al., 1991); likewise, visual features can be subdivided into different attributes (e.g., color, size, form, or motion; see Thompson-Schill, 2003, for review).
In the remainder of this chapter, we present a number of different theories cognitive neuroscientists have proposed for the organization of semantic knowledge, and we discuss experimental evidence on how this organization might be reflected in the brain. Although some findings would appear, at first, to be consistent with an organization of semantic memory by categories of information, we will conclude that the bulk of the evidence supports an organization by features or attributes that are distributed across multiple brain regions.
As described above, a number of observations from patients with brain injuries suggest that different object categories (i.e., living and nonliving things) might be differentially influenced by brain damage. One way to instantiate the evident neural dissociation between living and nonliving things is to posit that there are distinct neural regions dedicated to processing different categories of objects. The “domain-specific” category-based model (Caramazza & Shelton, 1998) does just that. According to this model, evolutionary pressure led to the development of adaptations to facilitate recognition of categories that are particularly relevant for survival or reproduction, such as animals, plant life (i.e., fruits and vegetables), conspecifics, and possibly tools; and these adaptations led to objects from these different categories having distinct, non-overlapping neural representations. Such a system would have adaptive value to the extent that having dedicated neural mechanisms for recognizing these objects could make for faster and more accurate classification—and subsequent appropriate response.
A complication for category-based models is that despite the “category-specific” label, patients’ recognition problems do not always adhere to category boundaries—deficits can span category boundaries or affect only part of a category. This suggests a need for an account of semantic memory that does not assume a purely category-specific organization. Sensory-functional theory provides an alternative account. According to this model, conceptual knowledge is divided into anatomically distinct sensory and functional stores, and so-called category-specific deficits emerge because the representations of different kinds tend to rely on sensory and functional information to different extents (Farah & McClelland, 1991; Warrington & McCarthy, 1987). For example, representations of living things depend more on visual information than do those of artifacts, which depend more on functional information. Consequently, deficits that partially adhere to category boundaries can emerge even without semantic memory being categorically organized per se.
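The sensory-functional logic can be made concrete with a toy simulation. The sketch below is an illustration of the idea only, not Farah and McClelland’s actual network; the feature weightings are invented. Each concept is represented by how much its identification relies on visual versus functional features, and a partial lesion of the visual store degrades the living thing more than the artifact.

```python
# Toy sketch of sensory-functional theory (illustrative weightings, not the
# Farah & McClelland model itself): living things rely more on visual
# features, artifacts on functional ones, so lesioning the visual store
# degrades living-thing identification more.

def identification_score(concept, lesion_visual=0.0, lesion_functional=0.0):
    """Fraction of the concept's feature support that survives the lesion."""
    v, f = concept["visual"], concept["functional"]
    return (v * (1 - lesion_visual) + f * (1 - lesion_functional)) / (v + f)

# Hypothetical reliance on visual vs. functional features.
concepts = {
    "tiger":  {"visual": 0.8, "functional": 0.2},  # living thing
    "hammer": {"visual": 0.3, "functional": 0.7},  # artifact
}

for name, c in concepts.items():
    score = identification_score(c, lesion_visual=0.5)
    print(f"{name}: {score:.2f} of feature support intact")
```

With a 50 percent visual lesion, the “tiger” loses more of its support than the “hammer,” mimicking a category-like deficit without any category-based organization.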
Sensory-functional theory is not without its own problems, however. There exist numerous patients whose deficits cannot be captured by a binary sensory-functional divide (see Caramazza & Shelton, 1998, for a review), which demonstrates that a simple two-way partitioning of semantic attributes is overly simplistic. A related but more fully specified proposal by Alan Allport addresses this concern by pointing out that sensory information should not be considered a unitary entity but rather should be divided into multiple attributes (e.g., color, sound, form, touch). Specifically, Allport (1985) suggests that the sensorimotor systems used to experience the world are also used to represent meaning: “The essential idea is that the same neural elements that are involved in coding the sensory attributes of a (possibly unknown) object presented to eye or hand or ear also make up the elements of the auto-associated activity-patterns that represent familiar object-concepts in ‘semantic memory’” (1985, p. 53).3 Hence, according to Allport’s model, representations are sensorimotor based, and consequently, the divisions of labor that exist in sensorimotor processing should be reflected in conceptual representations. More recently, other sensorimotor-based models have made similar claims (e.g., Barsalou, 1999; Damasio, 1989; Lakoff & Johnson, 1999; in a later section, we discuss empirical studies that address these predictions).
One question that often arises with respect to these sensorimotor-based theories is whether, in addition to sensorimotor representations and the connections between them, it is useful to posit one or more specific brain regions, often called a hub or convergence zone, where higher order similarity—that is, similarity across sensory modalities—can be computed (e.g., Damasio, 1989; Simmons & Barsalou, 2003). Such an architecture may facilitate capturing similarity among concepts, thereby promoting generalization and the formation of categories (see Patterson et al., 2007, for a review). We return to these issues in later sections, where we discuss generalization and the representation of knowledge that is abstract in that it has no single direct sensorimotor correlate (e.g., the purpose for which an object is used, such as “to tell time” for a clock).
The final class of models that we discuss is commonly referred to as correlated feature-based accounts (Gonnerman et al., 1997; McRae, de Sa, & Seidenberg, 1997; Tyler & Moss, 2001). According to these models, the “features” from which concepts are built comprise not only sensorimotor-based features (such as shape, color, action, and taste) but also other (experience-based) attributes that participants produce when asked to list features of objects. For instance, for a tiger, these features might include things such as “has eyes,” “breathes,” “has legs,” and “has stripes,” whereas for a fork, they might include “made of metal,” “used for spearing,” and “has tines.”
According to one influential correlated feature-based model (Tyler & Moss, 2001), highly correlated shared features tend to support knowledge of a category as a whole, whereas distinctive features tend to support accurate identification of individual members. Further, the correlations between features enable them to support each other, making these features robust. Hence, because living things have many shared features, general category knowledge is robust for them. On the other hand, because individual living things tend to have few and uncorrelated distinctive features (e.g., “has stripes” or “has spots”), distinctive information about living things is particularly susceptible to impairment. In contrast, features that distinguish individual artifacts from others tend to be correlated (e.g., “has tines” is correlated with “used for spearing”), making this information robust. While differing in some details, Cree and McRae’s (2003) feature-based account similarly posits that objects (living and nonliving) differ with respect to number of shared versus distinctive features and that these factors vary with object category. Hence, correlated feature-based accounts hypothesize that the reason for category-specific deficits is not domain of knowledge per se, but instead is differences in the distribution of features across domains (see also Rogers & Patterson, 2007).
Summary of Models
The main division between domain-specific category-based models, on the one hand, and sensorimotor-based and correlated feature-based accounts, on the other, concerns how category knowledge is represented. For domain-specific models, object category is a primary organizing principle of semantic memory, whereas for the other accounts, category differences emerge from other organizational properties. In many ways, correlated feature-based accounts echo sensorimotor-based theories. In particular, these two classes of models are parallel in that categories emerge through co-occurrence of features, with the relevance of different features depending on the particular object, and with different parts of a representation supporting one another. The major distinguishing aspect is that sensorimotor-based theories focus on sensorimotor features—specifying that the same brain regions that encode a feature represent it. In contrast, because none of the fundamental principles of correlated feature-based accounts require that features be sensorimotor based (in fact, a concern for these models is how features should be defined), these accounts do not require that features be situated in brain regions that are tied to sensory or motor processing.
model may help integrate all three classes of models. Convergence zone theories posit dedicated regions for integrating across sensorimotor-based features, extracting statistical regularities across concepts, and ultimately producing a level of representation with a category-like topography in the brain (Simmons & Barsalou, 2003).
Functional neuroimaging techniques like positron emission tomography (PET) and functional magnetic resonance imaging (fMRI) have allowed cognitive neuroscientists to explore different hypotheses regarding the neural organization of semantic memory in undamaged brains. By means of these methodologies, researchers observe regional brain activity while participants perform cognitive tasks such as naming objects, deciding whether two stimuli belong in the same object category, or matching pictures of stimuli to their written or spoken names.
Early work attempted to examine whether specific brain regions are selectively active for knowledge of different object categories (e.g., animals or tools). These studies found that thinking about animals tends to produce increased neural activity in inferior posterior areas, including inferior temporal (Okada et al., 2000; Perani et al., 1995) and occipital regions (Grossman et al., 2002; Martin et al., 1996; Okada et al., 2000; Perani et al., 1995), whereas thinking about tools tends to activate more dorsal and frontal areas, including left dorsal (Perani et al., 1995) or inferior (Grossman et al., 2002; Okada et al., 2000) prefrontal regions, as well as left premotor (Martin et al., 1996), inferior parietal (Okada et al., 2000), and posterior middle temporal areas (Grossman et al., 2002; Martin et al., 1996; Okada et al., 2000). Further, within the inferior temporal lobe, the lateral fusiform gyrus generally shows increased neural activity in response to animals, while the medial fusiform tends to respond more to tools (see Martin, 2007, for a review).
Although these findings might seem at first glance to provide unambiguous support for a
domain-specific, category-based organization of semantic memory, the data have not al
ways been interpreted as such. Sensory-functional theories can also account for putative
ly category-specific activations because they posit that different regions of neural activity
for animals and tools reflect a tendency for differential weighting of visual and functional
features for objects within a given category, rather than an explicit category-based orga
nization (e.g., Warrington & McCarthy, 1987).
The hypothesis that a feature’s weight can vary across objects raises the possibility that
even for a given object, a feature’s weight may vary depending on its relevance to a given
context. In other words, the extent to which a particular feature becomes active for a giv
en object may be contextually dependent not only on long-term, object-related factors
(i.e., is this feature relevant in general for the identification of this object?) but also on
short-term, task-related factors (i.e., is this feature relevant for the current task?). The
following sections describe evidence suggesting that both the format of the stimulus with
which semantic memory is probed (i.e., words vs. pictures) and the demands of the task
influence which aspects of a given concept’s semantic representation are activated.
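The two-factor view sketched above (a long-term, object-level weight combined with a short-term, task-level relevance) can be pictured with a minimal toy model. The feature names, objects, and numeric weights below are invented for illustration only; they are not drawn from any cited study.

```python
# Toy illustration: a feature's momentary activation as the product of a
# long-term weight (how diagnostic the feature is for the object in general)
# and a short-term weight (how relevant the feature is to the current task).
# All names and numbers are invented for illustration.

LONG_TERM = {  # object -> feature -> long-term diagnostic weight
    "zebra": {"stripes": 0.9, "sound": 0.2, "habitat": 0.4},
    "microwave": {"shape": 0.5, "sound": 0.5, "function": 0.8},
}

def feature_activation(obj, task_relevance):
    """Combine long-term and task-driven weights multiplicatively."""
    return {f: w * task_relevance.get(f, 0.1)  # irrelevant features idle low
            for f, w in LONG_TERM[obj].items()}

# A visual task boosts visual features; a sound-judgment task boosts sound.
visual_task = {"stripes": 1.0, "shape": 1.0}
sound_task = {"sound": 1.0}

zebra_visual = feature_activation("zebra", visual_task)  # stripes dominate
zebra_sound = feature_activation("zebra", sound_task)    # sound now leads
```

On this toy scheme the same stored concept yields different activation profiles under different tasks, which is the sense in which retrieval is contextually dependent.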
Neuropsychological evidence suggests that semantic memory can be accessed via partially separable routes depending on stimulus format (Chainay & Humphreys, 2002; Rumiati & Humphreys, 1998; Saffran et al., 2003b).
That is, damage to a component accessed by one stimulus type (e.g., words) can spare
components accessed by a different stimulus type (e.g., pictures).
Converging evidence comes from a meta-analysis of four PET studies involving semantic categorization and lexical decision tasks with verbal and pictorial stimuli, which found evidence for a common semantic system for
pictures and words in the left inferior frontal gyrus and left temporal lobe (anterior and
medial fusiform, parahippocampal, and perirhinal cortices) and evidence for modality-
specific activations for words in both temporal poles and for pictures in both occipito-tem
poral cortices. Overall, evidence from studies examining access to semantic knowledge
from pictures versus words suggests that concepts are distributed patterns of brain acti
vation that can be differentially tapped by stimuli in different formats.
Retrieval from semantic memory can be influenced not only by the format of the stimuli
used to elicit that information (as described above) but also by specifics of the task, such
as the information that the participant is asked to produce and the amount of time provid
ed to respond. For example, in an elegant PET experiment, Mummery and colleagues
(1998) showed participants the names of living things or artifacts and asked them to
make judgments about either a perceptual attribute (color) or a nonperceptual attribute
(typical location). Different attribute judgments elicited distinct patterns of activation (in
creased activation in the left temporal-parietal-occipital junction for location and in
creased activation in the left anterior middle temporal cortex for color). Moreover, differ
ences between attributes were larger than differences between categories (i.e., living
things vs. artifacts), suggesting that the most prominent divisions in semantic memory
may be associated with attributes rather than categories—a structure consistent with dis
tributed, feature-based models of semantic memory (see also Moore & Price, 1999).
The amount of time provided to respond also appears to affect which aspects of a concept
become active. In an early semantic priming study, Schreuder and colleagues (1984)
observed that priming for perceptual information (e.g., between the concepts apple and
ball, which are similar in shape) emerges when task demands encourage a rapid re
sponse, whereas priming for more abstract information (e.g., between apple and banana,
which are from the same category) emerges only when responses are slower (see Yee et
al., 2011, for converging evidence). More recently, Rogers and Patterson (2007) provided
additional evidence that speed of response influences which semantic features are avail
able: When participants were under time pressure, responses were more accurate for cat
egorization judgments that did not require specific information, such as between cate
gories (e.g., distinguishing birds from vehicles), and less accurate for categorization that
did require access to specific information, such as within a category (e.g., distinguishing
between particular kinds of birds). When participants were allowed more time to re
spond, the pattern reversed. Thus, the results of these studies suggest that the specifics
of the task influence which aspects of a representation become measurably active.
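One way to picture the Rogers and Patterson (2007) result is as evidence accumulating at different rates for coarse (between-category) and fine (within-category) distinctions, so that under a response deadline only the coarse distinction is available. The sketch below is a deliberately simplified accumulator with invented rates, not a model from the cited work.

```python
# Toy accumulator: coarse, between-category evidence accrues faster than
# fine, within-category evidence, so speeded responses can support
# "bird vs. vehicle" before "robin vs. sparrow". All rates are invented.

def steps_to_threshold(rate, threshold=1.0):
    """Number of discrete time steps for evidence to reach threshold."""
    steps, evidence = 0, 0.0
    while evidence < threshold:
        evidence += rate
        steps += 1
    return steps

coarse_rate, fine_rate = 0.25, 0.05   # between- vs. within-category evidence
deadline = 5                          # a speeded-response deadline, in steps

coarse_ready = steps_to_threshold(coarse_rate) <= deadline   # available early
fine_ready = steps_to_threshold(fine_rate) <= deadline       # needs more time
```

With more generous deadlines both distinctions become available, mirroring the reversal observed when participants are allowed more time to respond.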
In sum, retrieval from semantic memory can be influenced not only by the format of the
stimuli used to elicit the information (e.g., words vs. pictures) but also by the timing of
the task and the information that the participant is asked to provide.
The format- and task-related effects reviewed earlier suggest that the most prominent di
vision in semantic memory might be in terms of attribute domains and not, necessarily,
category domains, thus offering support for distributed, feature-based models of semantic
memory. Clearly, though, differences in format or task cannot account for the fact that dif
ferences between categories can be observed even with the same format and task. How
ever, the presence of both format and task effects in semantic knowledge retrieval raises
the possibility that interactions between stimulus modality and task type can elicit catego
ry effects that these factors do not produce independently. In this section we explore how
the organization of semantic memory might accommodate stimulus, task, and category ef
fects.
For instance, the particular combinations of sensorimotor attributes retrieved from se
mantic memory might be determined by an interaction between task-type and sensorimo
tor experience (Thompson-Schill et al., 1999). For example, for living things, retrieval of
both visual and nonvisual information should require activation of visual attributes be
cause semantic memory about living things depends largely on knowledge about their vi
sual features. To illustrate, people’s experience with zebras is largely visual; hence, re
trieval of even nonvisual information about them (e.g., Do zebras live in Africa?) will en
gage visual attributes because one’s knowledge about zebras is built around their (p. 360)
visual features (assuming that retrieving more weakly represented attributes depends on
the activation of more strongly represented attributes; see Farah & McClelland, 1991). In
contrast, for nonliving things, only retrieval of visual information should require activa
tion of visual attributes. For instance, because people’s experience with microwave ovens
is distributed across a wide range of properties (e.g., visual, auditory, tactile), retrieval of
nonvisual information about them (e.g., Do microwave ovens require more electricity than
refrigerators?) will not necessarily engage visual attributes.
Thompson-Schill and colleagues (1999) found evidence for just such a dissociation: The
left fusiform gyrus (a region linked to visual knowledge) was activated by living things re
gardless of whether participants made judgments about their visual or nonvisual proper
ties. In contrast, for nonliving things, the same visual region was active only when partici
pants were asked to make judgments about visual properties. The complementary pattern
has also been observed: A region linked to action information (the left posterior middle
temporal cortex) was activated by tools for both action and nonaction tasks, but was activated by fruit only during an action task (Phillips et al., 2002). These and related findings
(Hoenig et al., 2008) suggest that category-specific activations may reflect differences in
which attributes are important for our knowledge of different object categories (but see
Caramazza, 2000, for an alternative perspective).
Related work has demonstrated that ostensibly category-specific patterns can be elimi
nated by changing the task. Both patients with herpes simplex virus encephalitis and
unimpaired participants exhibit apparently category-specific patterns when classifying
objects at the “basic” level (i.e., at the level of dog or car) as revealed by errors or by
functional activity in ventral temporal cortex, respectively. However, these differences
can be made to disappear when objects are classified more specifically (e.g., Labrador or
BMW, instead of dog or car; Lambon Ralph et al., 2007; Rogers et al., 2005). Why might
level of classification matter? One possibility relates to correlated feature-based models
(discussed earlier): Differences in the structure of the stimuli that are correlated with cat
egory may interact with the task (e.g., Humphreys et al., 1988; Price et al., 2003; Tarr &
Gauthier, 2000; see also Cree & McRae, 2003). For instance, at the basic level, animals
typically share more features (e.g., consider dog vs. goat) than do vehicles (e.g., car vs.
boat). This greater similarity for animals may produce a kind of “crowding” that makes
them particularly difficult to differentiate at the basic level (e.g., Rogers et al., 2005; Nop
peney et al., 2007; Tyler & Moss, 2001; but cf. Wiggett et al., 2009, who find that interac
tions between category and task do not always modulate category effects).
Hence, the studies described in this section provide further evidence that apparently cat
egory-specific patterns may be due to interactions between stimuli and task. More broad
ly, numerous studies have explored whether semantic memory is organized in the brain
by object category, by perceptual or functional features, or by a multimodal distributed
network of attributes. Thus far, the findings are compatible with correlated feature and
sensorimotor-based accounts and appear to suggest a highly interactive distributed se
mantic system that is engaged differently depending on object category and task de
mands (for a review, see Thompson-Schill, 2003).
The preceding evidence largely supports one main tenet of sensorimotor, feature-based
accounts—that semantic memory is distributed across different brain regions. However,
an additional claim of sensorimotor theory is that the brain regions that are involved
when perceiving and interacting with an object also encode its meaning. To address this
claim, research has attempted to explore the extent to which the different sensorimotor
properties of an object (e.g., its color, action, or sound) activate the same neural systems
as actually perceiving these properties.
With respect to color, for example, Martin and colleagues (1995) measured changes in re
gional cerebral blood flow using PET when participants generated the color or the action
associated with pictures of objects or their written names. Generating color words led to
activation in the ventral temporal lobe in an area anterior to that implicated in color per
ception, whereas generating action words was associated with activation in the middle
temporal gyrus just anterior to a region identified in the perception of motion. Martin and
colleagues interpreted these results as indicative of a distributed semantic memory net
work organized according to one’s sensorimotor experience of different object attributes
(see also Ishai et al., 2000; Wise et al., 1991). More recent studies have reported some di
rect overlap5 between regions involved in color perception and (p. 361) those involved in
retrieval of color knowledge about objects (Hsu et al., 2011; Simmons et al., 2007).
With respect to action, analogous findings have been reported regarding overlap between
perceptual-motor and conceptual processing. Chao and Martin (2000; see also Chao, Hax
by, & Martin, 1999; Gerlach et al., 2000) showed that the left ventral premotor and left
posterior parietal cortices (two areas involved in planning and performing actions) are se
lectively active when participants passively view or name pictures of manipulable tools.
The involvement of these regions despite the absence of a task requiring the retrieval of
action information (i.e., even during passive viewing) can be explained if the representa
tions of manipulable objects include areas involved in planning and performing actions. In
a recent study (Yee, Drucker, & Thompson-Schill, 2010) we obtained additional evidence
supporting this hypothesis: In left premotor cortex and inferior parietal sulcus, the neural
similarity of a pair of objects (as measured by fMRI-adaptation; see later) is correlated
with the degree of similarity in the actions used to interact with them. For example, a pi
ano and a typewriter, which we interact with using similar hand motions, have similar
representations in action regions, just as they should if representations are sensorimotor
based. Moreover, reading action words (e.g., lick, pick, kick) produces differential activity
in or near motor regions activated by actual movement of the tongue, fingers, and feet,
respectively (Hauk et al., 2004). Interestingly, it appears that this motor region activation
can be modulated by task: Reading an action verb related to leg movement (e.g., kick) ac
tivates motor regions in literal (kick the ball) but not figurative (kick the bucket) sen
tences (Raposo et al., 2009).
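The logic of the fMRI-adaptation finding (neural similarity tracking action similarity) can be sketched as a correlation between two lists of pairwise similarity scores. The object pairs and numbers below are invented, and the correlation is a plain Pearson coefficient, not the analysis pipeline of the cited study.

```python
from math import sqrt

# Invented pairwise similarity scores for three object pairs, e.g.
# (piano, typewriter), (piano, hammer), (typewriter, hammer).
action_similarity = [0.9, 0.2, 0.3]   # similarity of associated hand actions
neural_similarity = [0.8, 0.1, 0.4]   # e.g., fMRI-adaptation in premotor areas

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson(action_similarity, neural_similarity)  # strongly positive here
```

A reliably positive correlation of this kind is what licenses the claim that objects manipulated with similar hand motions have similar representations in action regions.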
Although visual and motor features have been studied most often, other modalities also
supply evidence for overlap between conceptual and perceptual processing. Regions in
volved in auditory perception and processing (posterior and superior middle temporal
gyri) are active when reading the names of objects that are strongly associated with
sounds (e.g., telephone; Kiefer et al., 2008; see also Goldberg et al., 2006; Kellenbach et
al., 2001; Noppeney & Price, 2002). Similarly, an orbitofrontal region associated with
taste and smell is activated when making decisions about objects’ flavor (Goldberg et al.,
2006), and simply reading words with strongly associated smells (e.g., cinnamon) acti
vates primary olfactory areas (Gonzalez et al., 2006).
Patients with brain damage affecting areas involved in sensorimotor processing are also
relevant to the question of whether regions underlying perception and action also under
lie conceptual knowledge. A sensorimotor-based account would predict that damage to an
auditory, visual, or motor area (for example), should affect the ability to retrieve auditory,
visual, or motor information about an object, whereas access to features corresponding to
undamaged brain regions would be less affected. There is evidence that this is indeed the
case. For instance, patients with damage to left auditory association cortex have prob
lems accessing concepts for which sound is highly relevant (e.g., thunder or telephone;
Bonner & Grossman, 2012; Trumpp et al., 2013). Likewise, a patient with damage to ar
eas involved in visual processing (right inferior occipito-temporal junction) had more diffi
culty naming pictures of objects whose representations presumably rely on visual infor
mation (e.g., living things that are not ordinarily manipulated) than objects whose repre
sentations are presumably less reliant on visual information (e.g., living or nonliving
things that are generally manipulated); the patient’s encyclopedic and auditory knowl
edge about both types of objects, in contrast, was relatively preserved (Wolk et al., 2005).
A critical function of semantic memory is the ability to generalize (or abstract) over our
experiences with a given object. Such generalization permits us to derive a representa
tion that will allow us to recognize new exemplars of it and make predictions about as
pects of these exemplars that we have not directly perceived. For example, during analog
ical thinking, generalization is critical to uncover relationships between a familiar situa
tion and a new situation that may not be well understood (e.g., that an electron is to the
nucleus as a planet is to the sun). Thus, analogical thinking involves not only retrieving
information about the two situations but also a mapping between their surface elements
based on shared abstract relationships (see Chrysikou & Thompson-Schill, 2010). Similar
ly, knowing that dogs and cats are both animals (i.e., mapping them from their basic to
their superordinate level categories) may facilitate generalization from one to the other. A
full treatment of the process of generalization would be beyond the scope of this chapter.
However, we briefly touch on some of the things that cognitive neuroscience has revealed
about the generalization process.
Several findings are consistent with the idea that different brain regions support different
levels of representation. For instance, an anterior temporal region (the perirhinal cortex,
particularly in the left hemisphere) was activated when naming pictures at the basic level (e.g., dog or
hammer), but not at the superordinate level (e.g., living or manmade), whereas a posteri
or temporal region (fusiform gyrus bilaterally) was activated for both levels (Tyler et al.,
2004, but cf. Rogers et al., 2006). In addition, greater anterior temporal lobe activity has
been observed during word–picture matching at a specific level (e.g., robin vs. kingfisher)
than at a more general level (e.g., animal vs. vehicle; Rogers et al., 2006). Further, process
ing may differ for different levels of representation: Recordings of neural activity (via
magnetoencephalography) suggest that during basic level naming, there are more recur
rent interactions between left anterior and left fusiform regions than during superordi
nate level naming (Clark et al., 2011).
One interpretation of these findings is that there exists a hierarchically structured system
along a posterior-anterior axis in the temporal cortex—with posterior regions more in
volved in coarse processing (such as the presemantic, perceptual processing required for
superordinate category discrimination) and anterior regions more involved in the integra
tion of information across modalities that facilitates basic-level discrimination (e.g., cat
vs. dog; see Martin & Chao, 2001). More broadly, these and related findings (e.g., Chan et
al., 2011; Grabowski et al., 2001; Kable et al., 2005) are consistent with the idea that se
mantic knowledge is represented at different levels of abstraction in different regions
(see also Hart & Kraut, 2007, for a mechanism by which different types of knowledge
could be integrated).
If true, this may be relevant to a puzzle that has emerged in neuroimaging tests of
Allport’s (1985) sensorimotor model of semantic memory. There is a consistent trend for
retrieval of a given physical attribute to be associated with activation of cortical areas 2
to 3 cm anterior to regions associated with perception of that attribute (Thompson-Schill,
2003). This pattern has been interpreted as coactivation of the “same areas” involved in
sensorimotor processing, as Allport hypothesized, but it could alternatively be taken as
grounds to reject the Allport model. What does this anterior shift reflect?
We believe the answer may lie in ideas developed by Rogers and colleagues (2004). They
have articulated a model of semantic memory that includes units that integrate informa
tion across all of the attribute domains (including verbal descriptions and object names;
McClelland & Rogers, 2003). As a consequence, “abstract semantic representations
emerge as a product of statistical learning mechanisms in a region of cortex suited to per
forming cross-modal mappings by virtue of its many interconnections with different per
ceptual-motor areas” (Rogers et al., 2004, p. 206). The process of abstracting away from
modality-specific representations may occur gradually across a number of cortical re
gions (perhaps converging on the temporal pole). As a result, a gradient of abstraction
may emerge in the representations throughout a given region of cortex (e.g., the ventral
extrastriate visual pathway), and the anterior shift may reflect activation of a more ab
stract representation (Kosslyn & Thompson, 2000). In other words, the conceptual simi
larity space in more anterior regions may depart a bit from the similarity space in the en
vironment, moving in the direction of abstract relations.
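The proposed posterior-to-anterior gradient can be caricatured as a similarity space that interpolates between a perceptual space and an abstract, taxonomic one. The example pair, the similarity values, and the linear mixing rule are all invented for illustration; the actual gradient, if it exists, need not be linear.

```python
# Toy gradient: similarity at a given cortical position is a mixture of
# perceptual similarity (dominant posteriorly) and abstract/taxonomic
# similarity (dominant anteriorly). All values are invented.

# Similarity of a pair like (lemon, tennis ball): perceptually alike
# (small, round, yellow) but taxonomically unrelated.
perceptual = 0.9
taxonomic = 0.1

def similarity_at(position, perceptual, taxonomic):
    """position runs from 0.0 (posterior) to 1.0 (anterior)."""
    return (1 - position) * perceptual + position * taxonomic

posterior = similarity_at(0.0, perceptual, taxonomic)  # perceptual space
anterior = similarity_at(1.0, perceptual, taxonomic)   # abstract space
```

On this caricature, the anterior shift for retrieved attributes corresponds to reading out representations further along the gradient, where the similarity space has moved toward abstract relations.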
A gradient like this could also help solve another puzzle: If concepts are sensorimotor
based, one might worry that thinking of a concept would cause (p. 363) one to hallucinate
it or execute it (e.g., thinking of lemon would cause one to hallucinate a lemon, and think
ing of kicking would produce a kick). But if concepts are represented (at least in part) at a
more abstract level than that which underlies direct sensory perception and action, then
the regions that underlie, for example, action execution, need not become sufficiently ac
tive to produce action. More work is needed to uncover the nature of the representations
—and how the similarity space may gradually change across different cortical regions.
In this section we have briefly summarized a large body of data on the neural systems
supporting semantic memory (see Noppeney, 2009, for a more complete review of func
tional neuroimaging evidence for sensorimotor-based models). We suggested that in light
of the highly consistent finding that sensorimotor regions are active during concept re
trieval, the data largely support sensorimotor-based models of semantic memory. Howev
er, there is a question that is frequently raised about activation in sensorimotor regions
during semantic knowledge retrieval: Could it be that the activation of sensorimotor re
gions that has been observed in so many studies is “epiphenomenal”6 rather than indicat
ing that aspects of semantic knowledge are encoded in these regions? (See Mahon &
Caramazza, 2008, for discussion.) For example, perhaps activation in visual areas during
semantic processing is a consequence of generating visual images, and not of semantic
knowledge per se. The patient, TMS, and behavioral interference work described above
help to address this question: It is not clear how an epiphenomenal account would ex
plain the fact that lesioning or interfering with a sensorimotor brain region affects the
ability to retrieve the corresponding attribute of a concept. These data therefore suggest
that semantic knowledge is at least partially encoded in sensorimotor regions.
However, the task effects described above raise another potential concern. Traditionally,
in the study of semantic representations (and, in fact, in cognitive psychology more
broadly) it is assumed that only effects that can be demonstrated across a variety of con
texts should be considered informative with regard to the structure and organization of
semantic memory. If one holds this tenet, then these task effects are problematic. Yet, as
highlighted by the work described in this section, task differences can be accommodated
if one considers an important consequence of postulating that the representations of con
cepts are distributed (recall that all but traditional approaches allow for a distributed ar
chitecture): Distributed models allow attention to be independently focused on specific
(e.g., contextually relevant) properties of a representation through partial activation of
the representation (see Humphreys & Forde, 2001, for a description of one such model).
This means that if a task requiring retrieval of action information, for example, produces
activation in premotor and parietal regions, but a task requiring retrieval of color does
not, the discrepancy may reflect differential focus of attention within an object concept
rather than that either attribute is not part of the object concept.
Thus, the differences between effects that emerge in different contexts lead to important
questions, such as how we are able to flexibly focus attention on relevant attributes. We
turn to this in the next section.
Several mechanisms have been proposed regarding the left ventrolateral prefrontal cortex’s role in selective activa
tion of conceptual information, among them that prefrontal cortical activity during se
mantic tasks reflects the maintenance of different attributes in semantic memory (e.g.,
Gabrieli et al., 1998) or that this region performs a “controlled retrieval” of semantic in
formation (e.g., Badre & Wagner, 2007). We and others (p. 364) have suggested that this
region, although critical in semantic memory retrieval, performs a domain-general func
tion as a dynamic filtering mechanism that biases neural responses toward task-relevant
information while gating task-irrelevant information (Shimamura, 2000; Thompson-Schill,
2003; Thompson-Schill et al., 2005). In other words, when a context or task requires us to
focus on specific aspects of our semantic memory, the left ventrolateral prefrontal cortex
biases which aspects of our distributed knowledge system will be most active.
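The dynamic-filtering idea can be sketched as a top-down bias added to task-relevant attribute dimensions of a stored distributed pattern, followed by a normalization step that lets the biased dimensions win the competition for activation. The attribute names, stored values, bias magnitude, and the softmax normalization are all invented for illustration, not taken from the cited models.

```python
from math import exp

# Toy "dynamic filtering": a prefrontal bias is added to task-relevant
# attribute dimensions, and a softmax normalization lets those dimensions
# dominate the observable activation. All numbers are invented.

STORED_HAMMER = {"action": 1.0, "color": 1.0, "sound": 0.5}

def softmax(scores):
    """Normalize scores into a competition where large values dominate."""
    z = {k: exp(v) for k, v in scores.items()}
    total = sum(z.values())
    return {k: v / total for k, v in z.items()}

def retrieve(pattern, relevant, bias=2.0):
    """Add a top-down bias to relevant dimensions, then normalize."""
    biased = {dim: act + (bias if dim in relevant else 0.0)
              for dim, act in pattern.items()}
    return softmax(biased)

action_view = retrieve(STORED_HAMMER, {"action"})  # action dominates
color_view = retrieve(STORED_HAMMER, {"color"})    # color dominates
```

The same stored pattern thus yields different measurable profiles under different task sets, without any change to the underlying representation.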
A further prediction follows from sensorimotor-based theories: If it is true that the sensorimotor regions that are active when an object is perceived
are the same regions that represent its meaning, then an individual’s experience with
that object should shape the way it is represented. In other words, the studies that we
have described so far have explored the way that concepts are represented in the “aver
age” brain, and the extent to which the findings have been consistent presumably reflects
the commonalities in human experience. Yet studying the average brain does not allow us
to explore whether, as predicted by sensorimotor-based theories, differences in individu
als’ experiences result in differences in their representation of concepts. In this section
we describe some ways in which individual differences influence the organization of se
mantic memory.
Even with much less than a lifetime of experience, the neural representation of an object
can reflect specific experience with it. Oliver and colleagues (2008) asked one set of par
ticipants to learn (by demonstration) actions for a set of novel objects, perform those ac
tions, and also view the objects, whereas a second set of participants viewed the same ob
jects without learning actions but had the same total amount of exposure to them. In a
subsequent fMRI session in which participants made judgments about visual properties of
the objects, activity in parietal cortex was found to be modulated by the amount of tactile
and action experience a participant had with a given object. These and related findings
(Kiefer et al., 2007; Weisberg et al., 2007) demonstrate a causal link between experience
with an object and its neural representation, and also show that even relatively short-
term differences in sensorimotor experience can influence an object’s representation.
Intriguingly, changes in individual experience may also lead to changes in the representa
tion of abstract concepts. Right-handers’ tendency to associate “good” with “right” and
“bad” with “left” (Casasanto, 2009) can be reversed when right hand dominance is com
promised because of stroke or a temporary laboratory-induced handicap (Casasanto &
Chrysikou, 2011).
As would be expected given the differences observed for handedness and relatively short-
term (p. 365) experience, more dramatic differences in individual experience have also
been shown to affect the organization of semantic knowledge. For instance, color influ
ences implicit similarity judgments for sighted but not for blind participants (even when
blind participants have good explicit color knowledge of the items tested; Connolly et al.,
2007). Interestingly, this difference held only for fruits and vegetables, and not for house
hold items, consistent with a large literature demonstrating that the importance of color
for an object concept varies according to how useful it is for recognizing the object (see
Tanaka & Presnell, 1999, for review).
At first glance, the individual differences that we have described in this section may seem
surprising. If our concept of a lemon, for example, is determined by experience, then no
two individuals’ concepts of a lemon will be exactly the same. Further, your own concept
of a lemon is likely to change subtly over time, probably without conscious awareness. Yet
the data described above suggest that this is, in fact, what happens. Because sensorimo
tor-based models assume that our representations of concepts are based on our experi
ences with them, these models can easily account for, and in fact predict, these differ
ences and changes. It is a challenge for future research to explore whether there are factors that influence the extent to which we attend to different types of information, and
that constrain the degree to which representations change over time.
Abstract Knowledge
Our discussion of the organization of semantic memory has thus far focused primarily on
the physical properties of concrete objects. Clearly, though, a complete theory of seman
tic memory must also provide an account for how we represent abstract concepts (e.g.,
peace) as well as abstract features of concrete objects (e.g., “used to tell time” is a prop
erty of a clock). The “concreteness effect” refers to the finding that concrete words are processed more easily than abstract words (e.g., Paivio, 1991), purportedly because their representations include sensory information that abstract words lack. However, there have been reports of se
mantic dementia patients who have more difficulty with concrete than abstract words
(Bonner et al., 2009; Breedin et al., 1994; but cf. Hoffman & Lambon Ralph, 2011, and Jef
feries et al., 2009, for evidence that the opposite pattern is more common in semantic de
mentia), suggesting that there must be more to the difference between these word types
than quantity of information. Additional evidence for a qualitative difference between the
representations of concrete and abstract words comes from work by Crutch and Warring
ton (2005). They reported a patient, AZ, with left temporal, parietal, and posterior frontal
damage who, for concrete words, exhibits more interference from words closely related
in meaning (e.g., synonyms) than from “associated” words (i.e., words that share minimal
meaning but often occur in similar contexts), whereas for abstract words, she displays the
opposite pattern.
Neuroimaging studies that have compared abstract and concrete words have identified
an inconsistent array of regions associated with abstract concepts: the left superior tem
poral gyrus (Wise et al., 2000), right anterior temporal pole, or left posterior middle tem
poral gyrus (Grossman et al., 2002). These inconsistencies may be due to the differing de
mands of the tasks employed in these studies or to differences in how “abstract” is opera
tionalized. The operational definition of abstract may be particularly important because it
varies widely across studies—ranging from words without sensorimotor associations to
words that have low imageability (p. 366) (i.e., words that are difficult to visualize) to emo
tion words (e.g., love). We surmise that these differences likely have a particularly signifi
cant influence on where brain activation is observed.
Using abstract stimuli intended to have minimal sensorimotor associations, Noppeney and
Price (2004) compared fMRI activation while subjects made judgments about words (com
prising nouns, verbs, and adjectives) referring to visual, auditory, manual action, and ab
stract concepts. Relative to the other conditions, abstract words activated the left inferior
frontal gyrus, middle temporal gyrus, superior temporal sulcus, and anterior temporal
pole. Because these are classical “language” areas, the authors suggest that the activa
tions are a consequence of the representations of abstract words being more reliant on
contextual information provided by language. Recently, Rodriguez-Ferreiro and colleagues (2011)
observed activation in these same regions for abstract verbs. They also observed that a
Page 20 of 40
Semantic Memory
greater number of regions were active for abstract relative to concrete verbs—leading
them to hypothesize that because abstract words appear in more diverse contexts (Hoff
man et al., 2011), the networks supporting them are more broadly distributed.
Like abstract words, abstract features (e.g., “used to tell time”) have no direct sensorimo
tor correlates. Our ability to conceive of abstract concepts and features—i.e., knowledge
that cannot be directly perceived from any individual sensory modality—demonstrates
that there must be more to semantic knowledge than simple sensorimotor echoes. How
might abstract concepts or features be represented in the kind of distributed architecture
that we have described? Rogers and colleagues’ model of semantic memory (introduced
above in the context of generalization) may be of service here as well. They argue that the
interaction between content-bearing perceptual representations and verbal labels pro
duces a similarity space that is not captured in any single attribute domain, but rather re
flects abstract similarity (cf. Caramazza, Hillis, Rapp, & Romani, 1990; Chatterjee, 2010;
Damasio, 1989; Plaut, 2002; Tyler, Moss, Durrant-Peatfield, & Levy, 2000).
Based on the abundant interconnections between the temporal pole and different sensori
motor areas, and on the fact that temporal pole degeneration is associated with semantic
dementia (introduced earlier), Rogers and colleagues suggest that this region may support abstract knowledge and generalization. Semantic dementia, in particular, has had a
large influence on ideas about the anterior temporal lobes’ role in semantic memory. In
this disorder, relatively focal degeneration in the anterior temporal lobes accompanies se
mantic memory deficits (e.g., problems naming, recognizing, and classifying objects, re
gardless of category), whereas other cognitive functions are relatively spared (see
Hodges & Patterson, 2007, for a review). The concomitance of the anatomical and cogni
tive impairments in semantic dementia therefore lends credence to the idea that the ante
rior temporal lobes are important for supporting semantic memory (see Patterson et al.,
2007, for a review). Additional research is needed to explore whether brain regions be
yond the anterior temporal lobe serve similar “converging” functions.
Methodological Advances
The studies reviewed in this chapter employed behavioral, neuropsychological, and neu
roimaging techniques to explore the organization and function of semantic memory. A
number of methodologies that have recently been introduced in cognitive neuroscience
hold much promise for the study of semantic memory.
First, new approaches in experimental design and data analysis for neuroimaging-based
studies allow cognitive neuroscientists to address more fine-grained questions about the
neural representation of concepts. For example, questions relating to representational
similarity can be explored with fMRI adaptation (e.g., Grill-Spector & Malach, 2001). This
technique relies on the assumption that when stimuli that are representationally similar
are presented sequentially, the repeated activation of the same set of neurons will pro
duce a reduced fMRI response. If the stimuli are representationally distinct, no such
adapted response should be observed. This method can be applied to address a number of
questions pertaining, for instance, to relationships between regions implicated in the processing of different object attributes (e.g., color, shape, and size; see Yee et al., 2010, for the attributes of function and manipulation), or to the degree to which the same neurons are involved in
perception and in conceptual representation. Similarly, multivoxel pattern analysis (e.g.,
Mitchell et al., 2008; Norman et al., 2006; Weber et al., 2009) and functional connectivity
approaches allow for analyses that exploit the distributed nature of brain activation,
rather than focusing on focal activation peaks (see Rissman & Wagner, 2012).
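The logic of fMRI adaptation can be made concrete with a toy simulation. This sketch is our illustration, not a model drawn from any of the studies cited here: the neuron counts, the 80 percent overlap proportion, and the half-gain adaptation rule are all hypothetical choices made for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)
N_NEURONS = 1000   # neurons in one simulated voxel (hypothetical number)
POP_SIZE = 100     # neurons coding any one stimulus (hypothetical number)

def population(base=None, overlap=0.0):
    """Binary vector marking which neurons code a stimulus, optionally
    sharing a proportion `overlap` of its neurons with population `base`."""
    pop = np.zeros(N_NEURONS, dtype=bool)
    n_shared = int(POP_SIZE * overlap) if base is not None else 0
    if n_shared:
        pop[rng.choice(np.flatnonzero(base), n_shared, replace=False)] = True
    free = np.flatnonzero(~pop if base is None else ~(pop | base))
    pop[rng.choice(free, POP_SIZE - n_shared, replace=False)] = True
    return pop

def bold_response(pop, adapted):
    """Summed population response; recently active (adapted) neurons
    fire at half gain, the assumed signature of repetition suppression."""
    return float(np.where(adapted, 0.5, 1.0)[pop].sum())

first = population()
similar = population(base=first, overlap=0.8)    # shares 80% of its neurons
distinct = population(base=first, overlap=0.0)   # shares no neurons

no_adapt = bold_response(first, adapted=np.zeros(N_NEURONS, dtype=bool))
after_similar = bold_response(similar, adapted=first)
after_distinct = bold_response(distinct, adapted=first)
print(no_adapt, after_similar, after_distinct)  # 100.0 60.0 100.0
```

Only the stimulus whose representation overlaps with the first one produces a reduced response, which is the inference the method rests on: a suppressed signal for sequential presentation implies shared underlying neurons.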
Second, noninvasive brain stimulation techniques, specifically TMS and transcranial di
rect current stimulation (tDCS), allow researchers to temporarily “lesion” a given brain
region and (p. 367) observe the effects on behavior (e.g., Antal et al., 2001, 2008; Walsh &
Pascual-Leone, 2003). In contrast to studying patients in the months and years after brain
injuries that produce permanent lesions, using these “virtual lesions” allows cognitive
neuroscientists to examine the role of a given brain region without the possibility that re
organization of neural function has occurred.
Much of the work in cognitive neuroscience that has been the focus of this chapter indi
cates that semantic representations are at least partially sensorimotor based. One senso
rimotor modality in particular, action, has received a great deal of attention, perhaps be
cause of the discovery of “mirror neurons”—cells that respond both when an action is
perceived and when it is performed (Rizzolatti & Craighero, 2004). This focus on action
has led to a common criticism of sensorimotor-based theory: Being impaired in perform
ing actions does not entail being unable to conceive of objects with strongly associated
actions—suggesting that action may not, in fact, be part of these conceptual representa
tions.7
There are at least three important points to keep in mind with respect to this criticism.
First, concepts are more than associated actions (and in fact many concepts—e.g., book
Third, recent research (reviewed earlier) suggests that depending on the demands of the
task, we are able to dynamically focus our attention on different aspects of a concept.
This means that sensorimotor-based distributed models are not inconsistent with finding
that an action is not routinely activated when the concept is activated, or that patients
with disorders of action can respond successfully to concepts that are action based if the
task does not require access to action information. In fact, such findings fall naturally out
of the architecture of these models. Such models allow for semantic memory to exhibit
some degree of graceful degradation (or fault tolerance) in that representations can continue to be accessed despite the loss of some of their components.
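A minimal sketch can illustrate why such architectures degrade gracefully. The example is ours, not a model from the literature reviewed here: the feature counts, random patterns, and nearest-neighbor retrieval rule are hypothetical stand-ins for the distributed models discussed.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical distributed representations: each concept is a pattern
# over 200 sensorimotor "feature units" (all numbers are illustrative).
N_UNITS = 200
concepts = {name: rng.standard_normal(N_UNITS)
            for name in ["hammer", "saw", "dog", "cat", "book"]}

def retrieve(pattern, lesioned_units):
    """Nearest-neighbor retrieval using only the surviving feature units."""
    keep = np.ones(N_UNITS, dtype=bool)
    keep[lesioned_units] = False
    sims = {name: float(np.dot(pattern[keep], vec[keep]))
            for name, vec in concepts.items()}
    return max(sims, key=sims.get)

probe = concepts["hammer"]
lesion = rng.choice(N_UNITS, size=80, replace=False)  # destroy 40% of units
print(retrieve(probe, lesioned_units=[]))      # hammer
print(retrieve(probe, lesioned_units=lesion))  # still hammer despite damage
```

Because the concept is redundantly spread over many units, the surviving components still bring the probe closer to the correct representation than to any competitor, mirroring the point that losing action components need not abolish access to an action-related concept.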
Many of the studies described in this chapter explored the organization of semantic mem
ory by comparing the neural responses to traditionally defined categories (e.g., animals
vs. tools). However, a more fruitful method of understanding conceptual representations
may be to compare individual concepts to one another, and extract dimensions that describe the emergent similarity space. The newer methods of analyzing neuroimaging data discussed above (such as fMRI adaptation and multivoxel pattern analysis, or MVPA) are
well suited to the task of describing these types of neural similarity spaces. Further, by
making inferences from these spaces, it is possible to discover what type of information is
represented in a given cortical region (e.g., Mitchell et al., 2008; Weber et al., 2009; Yee
et al., 2010). Overall, our understanding of semantic memory can benefit more from
studying individual items (e.g., Bedny et al., 2007) and their relations to each other, than
from simply examining categories as unified wholes.
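The item-by-item approach can be sketched with made-up activation patterns. This is again our illustration: the patterns and their category structure are invented so that the expected similarity space is known in advance, and the dissimilarity measure (1 minus the Pearson correlation) is just one common choice.

```python
import numpy as np

rng = np.random.default_rng(2)
items = ["dog", "cat", "hammer", "saw"]

# Made-up voxel patterns: each item is a shared category base pattern
# plus item-specific noise, so animals resemble each other, as do tools.
animal_base = rng.standard_normal(50)
tool_base = rng.standard_normal(50)
patterns = {
    "dog": animal_base + 0.5 * rng.standard_normal(50),
    "cat": animal_base + 0.5 * rng.standard_normal(50),
    "hammer": tool_base + 0.5 * rng.standard_normal(50),
    "saw": tool_base + 0.5 * rng.standard_normal(50),
}

# Representational dissimilarity matrix: one cell per item pair.
rdm = np.array([[1 - np.corrcoef(patterns[a], patterns[b])[0, 1]
                 for b in items] for a in items])

# Within-category pairs should be less dissimilar than between-category pairs.
within = (rdm[0, 1] + rdm[2, 3]) / 2   # dog-cat, hammer-saw
between = rdm[0, 2]                     # dog-hammer
print(within < between)  # True
```

Applied to real data, the same matrix computed from a given cortical region can be compared against candidate similarity structures (visual, functional, taxonomic) to infer what type of information that region represents, as in the item-level studies cited above.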
Summary
In this chapter we have briefly summarized a wide variety of data pertaining to the cogni
tive neuroscience of semantic memory. We reviewed different schemes for characterizing
the organization of semantic memory and argued that the bulk of the evidence converges
to support sensorimotor-based models (which extend sensory-functional theory). Because
these models allow for, and in fact are predicated on, a role for degree and type of experi
ence (which will necessarily vary by individual and by concept), they are able to accom
modate a wide variety of observations. Importantly, they can also make specific, testable
predictions regarding experience. Finally, it is important to emphasize that although often
pitted against one another in service of testing specific hypotheses, sensorimotor and cor
related feature-based models are not at odds with a categorical-like organization. In fact,
both were developed to provide a framework in which a categorical organization can
emerge from commonalities in the way we interact with and experience similar objects.
References
Allport, D. A. (1985). Distributed memory, modular subsystems and dysphasia. In S. K.
Newman & R. Epstein (Eds.), Current perspectives in dysphasia (pp. 207–244). Edin
burgh: Churchill Livingstone.
Amedi, A., Merabet, L., Bermpohl, F., & Pascual-Leone, A. (2005). The occipital cortex in
the blind: Lessons about plasticity and vision. Current Directions in Psychological
Science, 16, 306–311.
Antal, A., Nitsche, M. A., & Paulus, W. (2001). External modulation of visual perception in humans. NeuroReport, 12, 3553–3555.
Antal, A., & Paulus, W. (2008). Transcranial direct current stimulation of visual perception. Perception, 37, 367–374.
Badre, D., & Wagner, A. D. (2007). Left ventrolateral prefrontal cortex and the cognitive control of memory. Neuropsychologia, 45, 2883–2901.
Barsalou, L. W. (1999). Perceptual symbol systems. Behavioral and Brain Sciences, 22, 577–660.
Bedny, M., Aguirre, G. K., & Thompson-Schill, S. L. (2007). Item analysis in functional
magnetic resonance imaging. NeuroImage, 35, 1093–1102.
Beilock, S. L., Lyons, I. M., Mattarella-Micke, A., Nusbaum, H. C., & Small, S. L. (2008).
Sports experience changes the neural processing of action language. Proceedings of the
National Academy of Sciences, 105, 13269–13273.
Bonner, M. F., & Grossman, M. (2012). Gray matter density of auditory association cortex
relates to knowledge of sound concepts in primary progressive aphasia. Journal of Neuro
science, 32 (23), 7986–7991.
Bonner, M. F., Vesely, L., Price, C., Anderson, C., Richmond, L., Farag, C., et al. (2009). Re
versal of the concreteness effect in semantic dementia. Cognitive Neuropsychology, 26,
568–579.
Bozeat, S., Lambon Ralph, M. A., Patterson, K., Garrard, P., & Hodges, J. R. (2000). Non-
verbal semantic impairment in semantic dementia. Neuropsychologia, 38, 1207–1215.
Bozeat, S., Lambon Ralph, M. A., Patterson, K., & Hodges, J. R. (2002a). The influence of
personal familiarity and context on object use in semantic dementia. Neurocase, 8, 127–
134.
Bozeat, S., Lambon Ralph, M. A., Patterson, K., & Hodges, J. R. (2002b). When objects
lose their meaning: What happens to their use? Cognitive, Affective, & Behavioral Neuro
sciences, 2, 236–251.
Bozeat, S., Patterson, K., & Hodges, J. R. (2004). Relearning object use in semantic de
mentia. Neuropsychological Rehabilitation, 14, 351–363.
Bindschaedler, C., Peter-Favre, C., Maeder, P., Hirsbrunner, T., & Clarke, S. (2011). Grow
ing up with bilateral hippocampal atrophy: From childhood to teenage. Cortex, 47, 931–
944.
Breedin, S. D., Saffran, E. M., & Coslett, H. B. (1994). Reversal of the concreteness effect
in a patient with semantic dementia. Cognitive Neuropsychology, 11, 617–660.
Bright, P., Moss, H., & Tyler, L. K. (2004). Unitary vs multiple semantics: PET studies of word and picture processing. Brain and Language, 89, 417–432.
Buxbaum, L. J., & Saffran, E. M. (2002). Knowledge of object manipulations and object
function: Dissociations in apraxic and nonapraxic subjects. Brain and Language, 82, 179–
199.
Buxbaum, L. J., Veramonti, T., & Schwartz, M. F. (2000). Function and manipulation tool
knowledge in apraxia: Knowing “what for” but not “how.” Neurocase, 6, 83–97.
Caramazza, A., Hillis, A. E., Rapp, B. C., & Romani, C. (1990). The multiple semantics hy
pothesis: Multiple confusions? Cognitive Neuropsychology, 7, 161–189.
Caramazza, A., & Shelton, J. R. (1998). Domain-specific knowledge systems in the brain: The animate-inanimate distinction. Journal of Cognitive Neuroscience, 10, 1–34.
Casasanto, D. (2009). Embodiment of abstract concepts: Good and bad in right- and left-
handers. Journal of Experimental Psychology: General, 138, 351–367.
Casasanto, D., & Chrysikou, E. G. (2011). When left is “right”: Motor fluency shapes ab
stract concepts. Psychological Science, 22, 419–422.
Chainay, H., & Humphreys, G. W. (2002). Privileged access to action for objects relative to
words. Psychonomic Bulletin & Review, 9, 348–355.
Chan, A. M., Baker, J. M., Eskandar, E., Schomer, D., Ulbert, I., Marinkovic, K., Cash, S.
C., & Halgren, E. (2011). First-pass selectivity for semantic categories in human an
teroventral temporal lobe. Journal of Neuroscience, 31, 18119–18129.
Chao, L. L., Haxby, J. V., & Martin, A. (1999). Attribute-based neural substrates for per
ceiving and knowing about objects. Nature Neuroscience, 2, 913–919.
Chao, L. L., & Martin, A. (2000). Representation of manipulable man-made objects in the
dorsal stream. NeuroImage, 12, 478–484.
Chrysikou, E. G., & Thompson-Schill, S. L. (2010). Are all analogies created equal? Pre
frontal cortical functioning may predict types of analogical reasoning (commentary). Cog
nitive Neuroscience, 1, 141–142.
Clarke, A., Taylor, K. I., & Tyler, L. K. (2011). The evolution of meaning: Spatiotemporal
dynamics of visual object recognition. Journal of Cognitive Neuroscience, 23, 1887–1899.
Collins, A. M., & Quillian, M. R. (1969). Retrieval time from semantic memory. Journal of
Verbal Learning and Verbal Behavior, 8, 240–247.
Connolly, A. C., Gleitman, L. R., & Thompson-Schill, S. L. (2007). The effect of congenital
blindness on the semantic representation of some everyday concepts. Proceedings of the
National Academy of Sciences, 104, 8241–8246.
Cooper, R. M. (1974). The control of eye fixation by the meaning of spoken language: A
new methodology for the real-time investigation of speech perception, memory, and lan
guage processing. Cognitive Psychology, 6, 84–107.
Cree, G. S., & McRae, K. (2003). Analyzing the factors underlying the structure and com
putation of the meaning of chipmunk, cherry, chisel, cheese and cello (and many other
such concrete nouns). Journal of Experimental Psychology: General, 132, 163–201.
Crutch, S. J., & Warrington, E. K. (2005). Abstract and concrete concepts have structural
ly different representational frameworks. Brain, 128, 615–627.
(p. 370) Damasio, A. R. (1989). The brain binds entities and events by multiregional activation from convergence zones. Neural Computation, 1, 123–132.
Farah, M. J., & McClelland, J. L. (1991). A computational model of semantic memory im
pairment: modality specificity and emergent category specificity. Journal of Experimental
Psychology: General, 120, 339–357.
Frith, C. (2000). The role of dorsolateral prefrontal cortex in the selection of action as re
vealed by functional imaging. In S. Monsell & J. Driver (Eds.), Control of cognitive
processes (pp. 549–565). Cambridge, MA: MIT Press.
Funnell, E. (1995b). Objects and properties: A study of the breakdown of semantic memo
ry. Memory, 3, 497–518.
Funnell, E. (2001). Evidence for scripts in semantic dementia: Implications for theories of
semantic memory. Cognitive Neuropsychology, 18, 323–341.
Gabrieli, J. D. E., Cohen, N. J., & Corkin, S. (1988). The impaired learning of semantic
knowledge following bilateral medial temporal-lobe resection. Brain & Cognition, 7, 151–
177.
Gabrieli, J. D., Poldrack, R. A., & Desmond, J. E. (1998). The role of left prefrontal cortex
in language and memory. Proceedings of the National Academy of Sciences of the United
States of America, 95 (3), 906–913.
Gage, N., & Hickok, G. (2005). Multiregional cell assemblies, temporal binding, and the
representation of conceptual knowledge in cortex: A modern theory by a “classical” neu
rologist, Carl Wernicke. Cortex, 41, 823–832.
Gainotti, G. (2000). What the locus of brain lesion tells us about the nature of the cogni
tive defect underlying category-specific disorders: A review. Cortex, 36, 539–559.
Gardiner, J. M., Brandt, K. R., Baddeley, A. D., Vargha-Khadem, F., & Mishkin, M. (2008).
Charting the acquisition of semantic knowledge in a case of developmental amnesia. Neuropsychologia, 46, 2865–2868.
Garrard, P., & Hodges, J. R. (1999). Semantic dementia: Implications for the neural basis
of language and meaning. Aphasiology, 13, 609–623.
Gates, L., & Yoon, M. G. (2005). Distinct and shared cortical regions of the human brain
activated by pictorial depictions versus verbal descriptions: An fMRI study. NeuroImage,
24, 473–486.
Gerlach, C., Law, I., Gade, A., & Paulson, O.B. (2000). Categorization and category effects
in normal object recognition: A PET study. Neuropsychologia, 38, 1693–1703.
Goldberg, R. F., Perfetti, C. A., & Schneider, W. (2006). Perceptual knowledge retrieval ac
tivates sensory brain regions. Journal of Neuroscience, 26, 4917–4921.
Gonnerman, L. M., Andersen, E. S., Devlin, J. T., Kempler, D., & Seidenberg, M. S. (1997).
Double dissociation of semantic categories in Alzheimer’s disease. Brain and Language,
57, 254–279.
Gonzalez, J., Barros-Loscertales, A., Pulvermuller, F., Meseguer, V., Sanjuan, A., Belloch,
V., et al. (2006). Reading cinnamon activates olfactory brain regions. NeuroImage, 32,
906–912.
Grabowski, T. J., Damasio, H., Tranel, D., Boles Ponto, L. L., Hichwa, R.D., & Damasio, A.
R. (2001). A role for left temporal pole in the retrieval of words for unique entities. Hu
man Brain Mapping, 13, 199–212.
Graham, K. S., Lambon Ralph, M. A., & Hodges, J. R. (1997). Determining the impact of
autobiographical experience on “meaning”: New insights from investigating sports-relat
ed vocabulary and knowledge in two cases with semantic dementia. Cognitive Neuropsy
chology, 14, 801–837.
Graham, K. S., Lambon Ralph, M. A., & Hodges, J. R. (1999). A questionable semantics:
The interaction between semantic knowledge and autobiographical experience in seman
tic dementia. Cognitive Neuropsychology, 16, 689–698.
Graham, K. S., Simons, J. S., Pratt, K. H., Patterson, K., & Hodges, J. R. (2000). Insights
from semantic dementia on the relationship between episodic and semantic memory. Neu
ropsychologia, 38, 313–324.
Greve, A., van Rossum, M. C. W., & Donaldson, D. I. (2007). Investigating the functional
interaction between semantic and episodic memory: Convergent behavioral and electro
physiological evidence for the role of familiarity. NeuroImage, 34, 801–814.
Grill-Spector, K., & Malach, R. (2001). fMR-adaptation: A tool for studying the functional
properties of human cortical neurons. Acta Psychologica, 107, 293–321.
Grossman, M., Koenig, P., DeVita, C., Glosser, G., Alsop, D., Detre, J., & Gee, J. (2002). The
neural basis for category-specific knowledge: An fMRI study. NeuroImage, 15, 936–948.
Hart, J., & Kraut, M. A. (2007). Neural hybrid model of semantic object memory (version
1.1). In J. Hart & M. A. Kraut (Eds.), Neural basis of semantic memory (pp. 331–359). New
York: Cambridge University Press.
Hauk, O., Johnsrude, I., & Pulvermuller, F. (2004). Somatotopic representation of action
words in human motor and premotor cortex. Neuron, 41, 301–307.
Hauk, O., Shtyrov, Y., & Pulvermuller, F. (2008). The time course of action and action-word
comprehension in the human brain as revealed by neurophysiology. Journal of Physiology—Paris, 102, 50–58.
Hillis, A., & Caramazza, A. (1995). Cognitive and neural mechanisms underlying visual
and semantic processing: Implication from “optic aphasia.” Journal of Cognitive Neuro
science, 7, 457–478.
Hodges, J. R., Patterson, K., Oxbury, S., & Funnell, E. (1992). Semantic dementia: Pro
gressive fluent aphasia with temporal lobe atrophy. Brain, 115, 1783–1806.
Hodges, J. R., Patterson, K., Ward, R., Garrard, P., Bak, T., Perry, R., & Gregory, C. (1999).
The differentiation of semantic dementia and frontal lobe dementia (temporal and frontal
variants of frontotemporal dementia) from early Alzheimer’s disease: A comparative neu
ropsychological study. Neuropsychology, 13, 31–40.
Hoenig, K., Müller, C., Herrnberger, B., Spitzer, M., Ehret, G., & Kiefer, M. (2011). Neuro
plasticity of semantic maps for musical instruments in professional musicians. NeuroI
mage, 56, 1714–1725.
Hoenig, K., Sim, E.-J., Bochev, V., Herrnberger, B., & Kiefer, M. (2008). Conceptual flexi
bility in the human brain: Dynamic recruitment of semantic maps from visual, motion and
motor-related areas. Journal of Cognitive Neuroscience, 20, 1799–1814.
(p. 371) Hoffman, P., & Lambon Ralph, M. A. (2011). Reverse concreteness effects are not a typical feature of semantic dementia: Evidence for the hub-and-spoke model of conceptual representation. Cerebral Cortex, 21, 2103–2112.
Hoffman, P., Rogers, T. T., & Lambon Ralph, M. A. (2011). Semantic diversity accounts for
the “missing” word frequency effect in stroke aphasia: Insights using a novel method to
quantify contextual variability in meaning. Journal of Cognitive Neuroscience, 23, 2432–
2446.
Hsu, N. S., Kraemer, D. J. M., Oliver, R. T., Schlichting, M. L., & Thompson-Schill, S. L.
(2011). Color, context, and cognitive style: Variations in color knowledge retrieval as a
function of task and subject variables. Journal of Cognitive Neuroscience, 23, 2554–2557.
Huettig, F., & Altmann, G. T. M. (2005). Word meaning and the control of eye fixation: Se
mantic competitor effects and the visual world paradigm. Cognition, 96, B23–B32.
Humphreys, G. W., & Forde, E. M. (2001). Hierarchies, similarity, and interactivity in ob
ject recognition: “Category-specific” neuropsychological deficits. Behavioral and Brain
Sciences, 24, 453–476.
Humphreys, G. W., Riddoch, M. J., & Quinlan, P. T. (1988). Cascade processes in picture
identification. Cognitive Neuropsychology, 5, 67–103.
Ishai, A., Ungerleider, L. G., & Haxby, J. V. (2000). Distributed neural systems for the generation of visual images. Neuron, 28, 979–990.
Jefferies, E., Patterson, K., Jones, R. W., & Lambon Ralph, M. A. (2009). Comprehension of concrete and abstract words in semantic dementia. Neuropsychology, 23, 492–499.
Kable, J. W., Kan, I. P., Wilson, A., Thompson-Schill, S. L., & Chatterjee, A. (2005). Concep
tual representations of action in lateral temporal cortex. Journal of Cognitive Neuro
science, 17, 855–870.
Kan, I. P., Alexander, M. P., & Verfaellie, M. (2009). Contribution of prior semantic knowl
edge to new episodic learning in amnesia. Journal of Cognitive Neuroscience, 21, 938–
944.
Kan, I. P., Kable, J. W., Van Scoyoc, A., Chatterjee, A., & Thompson-Schill, S. L. (2006).
Fractionating the left frontal response to tools: Dissociable effects of motor experience
and lexical competition. Journal of Cognitive Neuroscience, 18, 267–277.
Kan, I. P., & Thompson-Schill, S. L. (2004). Effect of name agreement on prefrontal activi
ty during overt and covert picture naming. Cognitive, Affective, & Behavioral Neuro
science, 4, 43–57.
Kellenbach, M. L., Brett, M., & Patterson, K. (2001). Large, colorful or noisy? Attribute-
and modality-specific activations during retrieval of perceptual attribute knowledge. Cog
nitive, Affective, & Behavioral Neuroscience, 1, 207–221.
Kellenbach, M., Brett, M., & Patterson, K. (2003). Actions speak louder than functions:
The importance of manipulability and action in tool representation. Journal of Cognitive
Neuroscience, 15, 30–46.
Kiefer, M., Sim, E.-J., Herrnberger, B., Grothe, J., & Hoenig, K. (2008). The sound of concepts: Four markers for a link between auditory and conceptual brain systems. Journal of Neuroscience, 28, 12224–12230.
Kiefer, M., Sim, E.-J., Liebich, S., Hauk, O., & Tanaka, J. (2007). Experience-dependent
plasticity of conceptual representations in human sensory-motor areas. Journal of Cogni
tive Neuroscience, 19, 525–542.
Kosslyn, S. M., & Thompson, W. L. (2000). Shared mechanisms in visual imagery and visu
al perception: Insights from cognitive neuroscience. In M. S. Gazzaniga (Ed.), The new
cognitive neurosciences (2nd ed., pp. 975–985). Cambridge, MA: MIT Press.
Lakoff, G., & Johnson, M. (1999). Philosophy in the flesh: The embodied mind and its chal
lenge to Western thought. New York: Basic Books.
Lambon Ralph, M. A., Lowe, C., & Rogers, T. (2007). Neural basis of category-specific se
mantic deficits for living things: evidence from semantic dementia, HSVE and a neural
network model. Brain, 130 (Pt 4), 1127–1137.
Mahon, B. Z., Anzellotti, S., Schwarzbach, J., Zampini, M., & Caramazza, A. (2009). Cate
gory-specific organization in the human brain does not require visual experience. Neuron,
63, 397–405.
Mahon, B. Z., & Caramazza, A. (2003). Constraining questions about the organization &
representation of conceptual knowledge. Cognitive Neuropsychology, 20, 433–450.
Mahon, B. Z., & Caramazza, A. (2008). A critical look at the embodied cognition hypothe
sis and a new proposal for grounding conceptual content. Journal of Physiology—Paris,
102, 59–70.
Martin, A. (2007). The representation of object concepts in the brain. Annual Review of Psychology, 58, 25–45.
Martin, A., & Chao, L. L. (2001). Semantic memory and the brain: Structure and process
es. Current Opinion in Neurobiology, 11, 194–201.
Martin, A., Haxby, J. V., Lalonde, F. M., Wiggs, C. L., & Ungerleider, L. G. (1995). Discrete
cortical regions associated with knowledge of color and knowledge of action. Science,
270, 102–105.
Martin, A., Wiggs, C. L., Ungerleider, L. G., & Haxby, J. V. (1996). Neural correlates of cat
egory-specific knowledge. Nature, 379, 649–652.
McClelland, J. L., & Rogers, T. T. (2003). The parallel distributed processing approach to
semantic cognition. Nature Reviews Neuroscience, 4, 310–322.
McRae, K., de Sa, V. R., & Seidenberg, M. S. (1997). On the nature and scope of featural
representations of word meaning. Journal of Experimental Psychology: General, 126, 99–
130.
Mechelli, A., Price, C. J., Friston, K. J., & Ishai, A. (2004). Where bottom-up meets top-
down: Neuronal interactions during perception and imagery. Cerebral Cortex, 14, 1256–
1265.
Mesulam, M. M., Grossman, M., Hillis, A., Kertesz, A., & Weintraub, S. (2003). The core
and halo of primary progressive aphasia and semantic dementia. Annals of Neurology, 54,
S11–S14.
Miller, E. K., & Cohen, J. D. (2001). An integrative theory of prefrontal cortex function. An
nual Review of Neuroscience, 24, 167–202.
Mirman, D., & Graziano, K. M. (2012). Damage to temporoparietal cortex decreases inci
dental activation of thematic relations during spoken word comprehension. Neuropsy
chologia, 50, 1990–1997.
Mitchell, T. M., Shinkareva, S. V., Carlson, A., Chang, K.-M., Malave, V. L., Mason, R. A., & Just, M. A. (2008). Predicting human brain activity associated with the meanings of nouns. Science, 320, 1191–1195.
Moore, C. J., & Price, C. J. (1999). A functional neuroimaging study of the variables that
generate category-specific object processing differences. Brain, 122, 943–962.
Mummery, C. J., Patterson, K., Hodges, J. R., & Price, C. J. (1998). Functional neuroanato
my of the semantic system: Divisible by what? Journal of Cognitive Neuroscience, 10,
766–777.
(p. 372) Mummery, C. J., Patterson, K., Wise, R. J. S., Vandenbergh, R., Price, C. J., &
Hodges, J. R. (1999). Disrupted temporal lobe connections in semantic dementia. Brain,
122, 61–73.
Myung, J., Blumstein, S. E., Yee, E., Sedivy, J. C., Thompson-Schill, S. L., & Buxbaum, L. J.
(2010). Impaired access to manipulation features in apraxia: Evidence from eyetracking
and semantic judgment tasks. Brain and Language, 112, 101–112.
Noppeney, U., Friston, K., & Price, C. (2003). Effects of visual deprivation on the organiza
tion of the semantic system. Brain, 126, 1620–1627.
Noppeney, U., Patterson, K., Tyler, L. K., Moss, H., Stamatakis, E. A., Bright, P., Mummery,
C., & Price, C. J. (2007). Temporal lobe lesions and semantic impairment: A comparison of
herpes simplex virus encephalitis and semantic dementia. Brain, 130 (Pt 4), 1138–1147.
Noppeney, U., & Price, C. J. (2002). Retrieval of visual, auditory, and abstract semantics.
NeuroImage, 15, 917–926.
Noppeney, U., & Price, C. J. (2004). Retrieval of abstract semantics. NeuroImage, 22, 164–
170.
Noppeney, U., Price, C. J., Friston, K. J., & Penny, W. D. (2006). Two distinct neural mecha
nisms for category-selective responses. Cerebral Cortex, 1, 437–445.
Norman, K. A., Polyn, S. M., Detre, G. J., & Haxby, J. V. (2006). Beyond mind-reading: Mul
ti-voxel pattern analysis of fMRI data. Trends in Cognitive Sciences, 10, 424–430.
Okada, T., Tanaka, S., Nakai, T., Nishizawa, S., Inui, T., Sadato, N., Yonekura, Y., & Konishi, J. (2000). Naming of animals and tools: A functional magnetic resonance imaging study of categorical differences in the human brain areas commonly used for naming visually presented objects. Neuroscience Letters, 296, 33–36.
O’Kane, G., Kensinger, E. A., & Corkin, S. (2004). Evidence for semantic learning in profound amnesia: An investigation with patient H.M. Hippocampus, 14, 417–425.
Oliver, R. T., Geiger, E. J., Lewandowski, B. C., & Thompson-Schill, S. L. (2009). Remem
brance of things touched: How sensorimotor experience affects the neural instantiation of
object form. Neuropsychologia, 47, 239–247.
Oliver, R. T., Parsons, M. A., & Thompson-Schill, S. L. (2008). Hands on learning: Varia
tions in sensorimotor experience alter the cortical response to newly learned objects. San
Francisco: Cognitive Neuroscience Society.
Paivio, A. (1969). Mental imagery in associative learning and memory. Psychological Review, 76, 241–263.
Paivio, A. (1971). Imagery and verbal processes. New York: Holt, Rinehart, and Winston.
Paivio, A. (1991). Dual coding theory: Retrospect and current status. Canadian Journal of
Psychology, 45, 255–287.
Patterson, K., Lambon Ralph, M. A., Jefferies, E., Woollams, A., Jones, R., Hodges, J. R., &
Rogers, T. T. (2006). “Presemantic” cognition in semantic dementia: Six deficits in search
of an explanation. Journal of Cognitive Neuroscience, 18, 169–183.
Patterson, K., Nestor, P. J., & Rogers, T. T. (2007). Where do you know what you know?
The representation of semantic knowledge in the human brain. Nature Reviews Neuro
science, 8, 976–987.
Perani, D., Cappa, S. F., Bettinardi, V., Bressi, S., Gorno-Tempini, M., Matarrese, & Fazio,
F. (1995). Different neural systems for the recognition of animals and man-made tools.
NeuroReport, 6, 1637–1641.
Phillips, J. A., Noppeney, U., Humphreys, G. W., & Price, C. J. (2002). Can segregation
within the semantic system account for category specific deficits? Brain, 125, 2067–2080.
Pobric, G., Jefferies, E., & Lambon Ralph, M. A. (2010). Induction of category-specific vs. general semantic impairments in normal participants using rTMS. Current Biology, 20, 964–968.
Price, C. J., Noppeney, U., Phillips, J. A., & Devlin, J. T. (2003). How is the fusiform gyrus
related to category-specificity? Cognitive Neuropsychology, 20 (3-6), 561–574.
Pylyshyn, Z. W. (1973). What the mind’s eye tells the mind’s brain: A critique of mental
imagery. Psychological Bulletin, 80, 1–24.
Raposo, A., Moss, H. E., Stamatakis, E. A., & Tyler, L. K. (2009). Modulation of motor and
premotor cortices by actions, action words and action sentences. Neuropsychologia, 47,
388–396.
Riddoch, M. J., & Humphreys, G. W. (1987). Visual object processing in a case of optic
aphasia: A case of semantic access agnosia. Cognitive Neuropsychology, 4, 131–185.
Rissman, J., & Wagner, A. D. (2012). Distributed representations in memory: Insights from
functional brain imaging. Annual Review of Psychology, 63, 101–128.
Rizzolatti, G., & Craighero, L. (2004.) The mirror-neuron system. Annual Review of Neu
roscience 27, 169–192.
Rodriguez-Ferreiro, J., Gennari, S. P., Davies, R., & Cuetos, F. (2011). Neural correlates of
abstract verb processing, Journal of Cognitive Neuroscience, 23, 106–118.
Rogers, T. T., Hocking, J., Mechelli, A., Patterson, K., & Price, C. (2005). Fusiform activa
tion to animals is driven by the process, not the stimulus. Journal of Cognitive Neuro
science, 17, 434–445.
Rogers, T. T., Hocking, J., Noppeney, U., Mechelli, A., Gorno-Tempini, M., Patterson, K., & Price, C. (2006). The anterior temporal cortex and semantic memory: Reconciling findings from neuropsychology and functional imaging. Cognitive, Affective and Behavioral Neuroscience, 6, 201–213.
Rogers, T. T., Ivanoiu, A., Patterson, K., & Hodges, J. R. (2006). Semantic memory in Alzheimer’s disease and the frontotemporal dementias: A longitudinal study of 236 patients. Neuropsychology, 20, 319–335.
Rogers, T. T., Lambon Ralph, M. A., Garrard, P., Bozeat, S., McClelland, J. L., Hodges, J. R., & Patterson, K. (2004). The structure and deterioration of semantic memory: A neuropsychological and computational investigation. Psychological Review, 111, 205–235.
Rogers, T. T., & Patterson, K. (2007). Object categorization: Reversals and explanations of the basic-level advantage. Journal of Experimental Psychology: General, 136, 451–469.
Rumiati, R. I., & Humphreys, G. W. (1998). Recognition by action: Dissociating visual and semantic routes to action in normal observers. Journal of Experimental Psychology: Human Perception and Performance, 24, 631–647.
(p. 373)
Saffran, E. M., Coslett, H. B., Martin, N., & Boronat, C. (2003a). Access to knowledge from pictures but not words in a patient with progressive fluent aphasia. Language and Cognitive Processes, 18, 725–757.
Saffran, E. M., Coslett, H. B., & Keener, M. T. (2003b). Differences in word associations to pictures and words. Neuropsychologia, 41, 1541–1546.
Saffran, E. M., & Schwartz, M. F. (1994). Of cabbages and things: Semantic memory from a neuropsychological perspective—A tutorial review. In C. Umilta & M. Moscovitch (Eds.), Attention and performance XV (pp. 507–536). Cambridge, MA: MIT Press.
Schreuder, R., Flores D’Arcais, G. B., & Glazenborg, G. (1984). Effects of perceptual and conceptual similarity in semantic priming. Psychological Research, 45, 339–354.
Sevostianov, A., Horwitz, B., Nechaev, V., Williams, R., Fromm, S., & Braun, A. R. (2002). fMRI study comparing names versus pictures for objects. Human Brain Mapping, 16, 168–175.
Shimamura, A. P. (2000). The role of the prefrontal cortex in dynamic filtering. Psychobiology, 28, 207–218.
Simmons, K., & Barsalou, L. W. (2003). The similarity-in-topography principle: Reconciling theories of conceptual deficits. Cognitive Neuropsychology, 20, 451–486.
Simmons, W., Ramjee, V., Beauchamp, M., McRae, K., Martin, A., & Barsalou, L. (2007). A common neural substrate for perceiving and knowing about color. Neuropsychologia, 45, 2802–2810.
Sirigu, A., Duhamel, J. R., & Poncet, M. (1991). The role of sensorimotor experience in object recognition: A case of multimodal agnosia. Brain, 114, 2555–2573.
Snowden, J. S., Bathgate, D., Varma, A., Blackshaw, A., Gibbons, Z. C., & Neary, D. (2001). Distinct behavioral profiles in frontotemporal dementia and semantic dementia. Journal of Neurology, Neurosurgery, & Psychiatry, 70, 323–332.
Snowden, J. S., Goulding, P. J., & Neary, D. (1989). Semantic dementia: A form of circumscribed cerebral atrophy. Behavioural Neurology, 2, 167–182.
Snowden, J. S., Griffiths, H. L., & Neary, D. (1994). Semantic dementia: Autobiographical contribution to preservation of meaning. Cognitive Neuropsychology, 11, 265–288.
Snowden, J. S., Griffiths, H. L., & Neary, D. (1996). Semantic–episodic memory interactions in semantic dementia: Implications for retrograde memory function. Cognitive Neuropsychology, 13, 1101–1137.
Snowden, J. S., Griffiths, H. L., & Neary, D. (1999). The impact of autobiographical experience on meaning: Reply to Graham, Lambon Ralph, and Hodges. Cognitive Neuropsychology, 11, 673–687.
Spivey, M. J. (2007). The continuity of mind. New York: Oxford University Press.
Squire, L. R. (1987). Memory and brain. New York: Oxford University Press.
Squire, L. R., & Zola, S. M. (1998). Episodic memory, semantic memory, and amnesia. Hippocampus, 8, 205–211.
Tanaka, J. M., & Presnell, L. M. (1999). Color diagnosticity in object recognition. Perception & Psychophysics, 61, 1140–1153.
Tanenhaus, M. K., Spivey-Knowlton, M. J., Eberhard, K. M., & Sedivy, J. C. (1995). Integration of visual and linguistic information in spoken language comprehension. Science, 268(5217), 1632–1634.
Tarr, M. J., & Gauthier, I. (2000). FFA: A flexible fusiform area for subordinate-level visual processing automatized by expertise. Nature Neuroscience, 3, 764–769.
Taylor, L. J., & Zwaan, R. A. (2009). Action in cognition: The case of language. Language and Cognition, 1, 45–58.
Thompson-Schill, S. L., Aguirre, G. K., D’Esposito, M., & Farah, M. J. (1999). A neural basis for category and modality specificity of semantic knowledge. Neuropsychologia, 37, 671–676.
Thompson-Schill, S. L., Bedny, M., & Goldberg, R. F. (2005). The frontal lobes and the regulation of mental activity. Current Opinion in Neurobiology, 15, 219–224.
Thompson-Schill, S. L., D’Esposito, M., Aguirre, G. K., & Farah, M. J. (1997). Role of left prefrontal cortex in retrieval of semantic knowledge: A re-evaluation. Proceedings of the National Academy of Sciences, 94, 14792–14797.
Thompson-Schill, S. L., D’Esposito, M., & Kan, I. P. (1999). Effects of repetition and competition on prefrontal activity during word generation. Neuron, 23, 513–522.
Thompson-Schill, S. L., Swick, D., Farah, M. J., D’Esposito, M., Kan, I. P., & Knight, R. T. (1998). Verb generation in patients with focal frontal lesions: A neuropsychological test of neuroimaging findings. Proceedings of the National Academy of Sciences, 95, 15855–15860.
Trumpp, N., Kliese, D., Hoenig, K., Haarmaier, T., & Kiefer, M. (2013). A causal link between hearing and word meaning: Damage to auditory association cortex impairs the processing of sound-related concepts. Cortex, 49, 474–486.
Tulving, E. (1972). Episodic and semantic memory. In E. Tulving & W. Donaldson (Eds.), Organization of memory (pp. 381–403). New York: Academic Press.
Tyler, L. K., & Moss, H. E. (2001). Towards a distributed account of conceptual knowledge. Trends in Cognitive Sciences, 5, 244–252.
Tyler, L. K., Moss, H. E., Durrant-Peatfield, M. R., & Levy, J. P. (2000). Conceptual structure and the structure of concepts: A distributed account of category-specific deficits. Brain & Language, 75, 195–231.
Tyler, L. K., Stamatakis, E. A., Bright, P., Acres, K., Abdalah, S., Rodd, J. M., & Moss, H. E. (2004). Processing objects at different levels of specificity. Journal of Cognitive Neuroscience, 16, 351–362.
Vandenberghe, R., Price, C., Wise, R., Josephs, O., & Frackowiak, R. S. J. (1996). Functional anatomy of a common semantic system for words and pictures. Nature, 383, 254–256.
Vargha-Khadem, F., Gadian, D. G., Watkins, K. E., Connelly, A., Van Paesschen, W., & Mishkin, M. (1997). Differential effects of early hippocampal pathology on episodic and semantic memory. Science, 277, 376–380.
Warrington, E. K., & McCarthy, R. A. (1983). Category specific access dysphasia. Brain, 106, 869–878.
Warrington, E. K., & Shallice, T. (1984). Category specific semantic impairments. Brain, 107, 829–854.
Weber, M., Thompson-Schill, S. L., Osherson, D., Haxby, J., & Parsons, L. (2009). Predicting judged similarity of natural categories from their neural representations. Neuropsychologia, 47, 859–868.
Weisberg, J., Turennout, M., & Martin, A. (2007). A neural system for learning about object function. Cerebral Cortex, 17, 513–521.
Wiggett, A. J., Pritchard, I. C., & Downing, P. E. (2009). Animate and inanimate objects in human visual cortex: Evidence for task-independent category effects. Neuropsychologia, 47, 3111–3117.
Willems, R. M., Hagoort, P., & Casasanto, D. (2010). Body-specific representations of action verbs: Neural evidence from right- and left-handers. Psychological Science, 21, 67–74.
Wise, R. J., Chollet, F., Hadar, U., Friston, K., Hoffner, E., & Frackowiak, R. (1991). Distribution of cortical neural networks involved in word comprehension and word retrieval. Brain, 114(Pt 4), 1803–1817.
Wise, R. J. S., Howard, D., Mummery, C. J., Fletcher, P., Leff, A., Buchel, C., & Scott, S. K. (2000). Noun imageability and the temporal lobes. Neuropsychologia, 38, 985–994.
Witt, J. K., Kemmerer, D., Linkenauger, S. A., & Culham, J. (2010). A functional role for motor simulation in naming tools. Psychological Science, 21, 1215–1219.
Wolk, D. A., Coslett, H. B., & Glosser, G. (2005). The role of sensory-motor information in object recognition: Evidence from category-specific visual agnosia. Brain and Language, 94, 131–146.
Wright, N. D., Mechelli, A., Noppeney, U., Veltman, D. J., Rombouts, S. A. R. B., Glensman, J., Haynes, J. D., & Price, C. J. (2008). Selective activation around the left occipito-temporal sulcus for words relative to pictures: Individual variability or false positives? Human Brain Mapping, 29, 986–1000.
Yee, E., Chrysikou, E., Hoffman, E., & Thompson-Schill, S. L. (2013). Manual experience shapes object representations. Psychological Science, 24(6), 909–919.
Yee, E., Hufstetler, S., & Thompson-Schill, S. L. (2011). Function follows form: Activation of shape and function features during object identification. Journal of Experimental Psychology: General, 140, 348–363.
Yee, E., & Sedivy, J. C. (2006). Eye movements to pictures reveal transient semantic activation during spoken word recognition. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32, 1–14.
Notes:
(1). Linguists use the term semantic in a related, but slightly narrower, way—to refer to the meanings of words or phrases.
(2). There is mounting evidence that the reverse may also be true: semantic memory has been found to support episodic memory acquisition (Kan et al., 2009) and retrieval (Graham et al., 2000; Greve et al., 2007).
(3). These ideas about the relationship between knowledge and experience echo those of much earlier thinkers. For example, in “An Essay Concerning Human Understanding,” John Locke considers the origin of “ideas,” or what we now refer to as “concepts,” such as “whiteness, hardness, sweetness, thinking, motion, elephant …”, arguing: “Whence comes [the mind] by that vast store, which the busy and boundless fancy of man has painted on it with an almost endless variety? … To this I answer, in one word, From experience.” Furthermore, in their respective works on aphasia, Wernicke (1874) and Freud (1891) both put forth similar ideas (Gage & Hickok, 2005).
(4). Recall that the domain-specific hypothesis allows for distributed representations within different categories.
(5). Moreover (returning to the task effects discussed in 4.3), it has been suggested that the presence or absence of direct overlap may reflect the existence of multiple types of color representations that vary in resolution (or abstraction), with differences in task context influencing whether information is retrieved at a fine (high-resolution) level of detail or at a more abstract level. Retrieving high- (but not necessarily low-) resolution color knowledge results in overlap with color perception regions (Hsu et al., 2011).
(6). We use the word “epiphenomenal” here to remain consistent with the objections that are sometimes raised in this literature; however, we note that the literal meaning of this term (i.e., an event with no effectual consequence) may not be suited to descriptions of neural activity, which can always be described as having an effect on its efferent targets.
(7). Note that an analogous critique—and, importantly, a response analogous to the one that follows—could be made for any sensorimotor modality.
Eiling Yee
Eiling Yee is a staff scientist at the Basque Center on Cognition, Brain and Language.
Evangelia G. Chrysikou
Sharon L. Thompson-Schill
Cognitive Neuroscience of Episodic Memory
This chapter examines the neural underpinnings of episodic memory, focusing on processes taking place during the experience itself, or encoding, and those involved in memory reactivation or retrieval at a later time point. It first looks at the neuropsychological case study of Henry Molaison to show how the medial temporal lobe (MTL) is linked to the formation of new episodic memories. It also assesses the critical role of the MTL in general, and the hippocampus in particular, in the encoding and retrieval of episodic memories. The chapter then describes the memory as reinstatement model, episodic memory retrieval, the use of functional neuroimaging as a tool to probe episodic memory, and the difference in memory paradigm. Furthermore, it discusses hippocampal activity during encoding and its connection to associative memory, activation of the hippocampus and cortex during episodic memory retrieval, and how hippocampal reactivation mediates cortical reactivation during memory retrieval.

Keywords: episodic memory, medial temporal lobe, Henry Molaison, memory retrieval, encoding, memory reactivation, hippocampus, memory as reinstatement model, functional neuroimaging, associative memory
Introduction
What did she say? Where did you go? What did you eat? The answers to questions like
these require that you access previous experiences, or episodes of your life. Episodic
memory is memory for our unique past experiences that unfolded in a particular time and
place. By contrast, other forms of memory do not require that you access a particular
time and place from the past, such as knowing what a giraffe is (semantic memory) or
knowing how to ride a bike (procedural memory). Thus, episodic memory is, in a sense, a
collection of our personal experiences, the past that still resides inside of us and makes
up much of the narrative of our lives.
In fact, the distinction between episodic remembering and semantic knowing was described by Tulving (1972): “Episodic memory refers to memory for personal experiences and their temporal relations, while semantic memory is a system for receiving, retaining, …”
Page 1 of 26
Cognitive Neuroscience of Episodic Memory
In this chapter, we describe the current status of our understanding of the neural underpinnings of episodic memory, focusing on processes taking place during the experience itself, or encoding, and those involved in reactivating or retrieving memories (p. 376) at a later time point. To facilitate and organize the discussion, we present a current model of memory formation and retrieval that, arguably, has received the most support to date (Figure 18.1). At the very least, it is the model that motivates and shapes the design of most current investigations into episodic memory processing in the brain.
With this one case, it became clear that the MTL is absolutely critical for the formation of new episodic memories. Although there is some debate regarding the extent of presurgical memory loss (retrograde amnesia) that H.M. exhibited and whether he was able to learn new semantic information, it is well agreed that his main deficit was the inability to form new episodic memories. Thus, this major discovery set the stage for systems neuroscientists to begin to examine the MTL and how it contributes to episodic memory formation and retrieval.

The MTL regions damaged in this now-famous surgery were the hippocampus as well as portions of the underlying cortex: the entorhinal, perirhinal, and posterior parahippocampal cortices. Since this discovery, it has been demonstrated that other patients (p. 377) with damage to the hippocampus and MTL cortical structures also exhibit anterograde amnesia. Of current focus, and highlighted in this chapter, is the critical role of the MTL in general, and the hippocampus in particular, in the encoding and retrieval of episodic memories.
During an experience (see Figure 18.1A), the representation of that experience in the brain is characterized by distributed cortical and subcortical patterns of neural activation that represent our sensory-perceptual (visual, auditory, somatosensory) experience, actions, internal thoughts, and emotions. Thus, in a sense, the brain is processing and representing the current context and state of the organism. In addition, it is thought that the distributed pattern of cortical firing filters into the MTL and converges in the hippocampus, where a microrepresentation of the current episode is created. The connections between neurons that make up the hippocampal representation of each episode are thought to strengthen through long-term potentiation (LTP): connections between neurons that are simultaneously active in the hippocampus are more likely to become strengthened than those that are not (Davachi, 2004; Hebb, 1949). Importantly, this LTP-mediated strengthening can happen over time, both immediately after the experience and during post-encoding sleep (Diekelmann & Born, 2010; Ellenbogen et al., 2007). Thus, what results from a successfully encoded experience is a hippocampal neural pattern (HNP) and a corresponding cortical neural pattern (CNP). Importantly, what differentiates the two patterns is that the HNP is thought to contain the critical connections between representations that allow the CNP to be accessed later and attributed to a particular time and place, that is, as an episodic memory. Thus, without the HNP, it is difficult to recover the precise CNP associated with a prior experience.
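The Hebbian idea that simultaneously active units strengthen their mutual connections can be written down very compactly. The following toy sketch (our own illustration, with invented unit counts and learning rate; it is not a model of actual hippocampal physiology) shows the basic co-activation rule:

```python
import numpy as np

n_units = 50                       # toy "hippocampal" units
weights = np.zeros((n_units, n_units))

# One episode simultaneously activates a small subset of units.
episode = np.zeros(n_units)
episode[[2, 7, 11, 23, 42]] = 1.0

# Hebbian update: strengthen connections between co-active units.
learning_rate = 0.5
weights += learning_rate * np.outer(episode, episode)
np.fill_diagonal(weights, 0.0)     # no self-connections

# Connections among co-active units are now stronger than all others.
co_active = np.outer(episode, episode).astype(bool)
np.fill_diagonal(co_active, False)
print(weights[co_active].mean())   # 0.5
print(weights[~co_active].mean())  # 0.0
```

After the update, only the pairs of units that fired together in the episode carry nonzero weight, which is the sense in which the episode leaves behind a distinctive connection pattern.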
Episodic memory retrieval encompasses multiple stages of processing, including cue processing (see Figure 18.1B), reinstatement of the HNP (see Figure 18.1C), and finally reinstatement of the CNP (see Figure 18.1D). Cue processing simply refers to the fact that retrieval is often cued by an external stimulus or internal thought whose representation is supported by cortical regions (visual cortex, auditory cortex, etc.). In addition, the cue is thought to serve as one key that might unlock the neural patterns associated with the prior experience or episode that contained that stimulus. Specifically, it is thought that during successful retrieval, the retrieval cue triggers what has been referred to as hippocampal pattern completion (see Figure 18.1C). Pattern completion refers to the idea that a complete pattern (a memory) can be reconstructed from part of the pattern (a partial cue). In other words, part or all of the HNP that was established during encoding and subsequently strengthened becomes reinstated. Finally, the reinstatement of the HNP is then thought to reinstate aspects of the CNP (see Figure 18.1D), resulting in the concurrent reactivation of disparate cortical regions that were initially active during the experience. Importantly, it is this final stage of reactivation that is thought to underlie our subjective experience of remembering and, in turn, to drive mnemonic decision making.
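Pattern completion in this sense is classically illustrated with Hopfield-style autoassociative networks: after Hebbian storage, iteratively updating units from a partial cue drives the network back to the full stored pattern. The sketch below (a deliberately simplified toy, not a claim about hippocampal circuitry; all parameters are invented) stores one pattern of +1/-1 units and then recovers it from a degraded cue:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100

# Store one episode pattern (+1/-1 units) with a Hebbian outer-product rule.
pattern = rng.choice([-1.0, 1.0], size=n)
W = np.outer(pattern, pattern) / n
np.fill_diagonal(W, 0.0)

# Retrieval cue: a degraded copy with roughly 30% of the units silenced.
cue = pattern.copy()
cue[rng.random(n) < 0.3] = 0.0

# Recall: each unit repeatedly takes the sign of its weighted input,
# so the partial cue is driven back toward the full stored pattern.
state = cue.copy()
for _ in range(5):
    state = np.sign(W @ state)

print(np.array_equal(state, pattern))  # True: the full pattern is reinstated
```

The key property on display is exactly the one described above: a fragment of the original pattern suffices to reinstate the whole of it, because the stored connection weights encode which units belong together.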
Why use imaging to understand memory? As noted by Tulving (1983), equivalent memory output responses can arise from fundamentally different internal cognitive states. Thus, measures such as memory accuracy and response time, although important, are limited with respect to disentangling the underlying cognitive processes that support memory formation and retrieval. Furthermore, the current MAR model specifically posits that representations and processes characterizing an encoding event are at least partially reinstated during later successful retrieval. Examination of the veracity of this model is greatly facilitated by the ability to use the multivariate information contained in a functional image of brain activity during encoding and retrieval. In other words, one can ask whether a specific encoding pattern of brain activity (across hundreds of voxels) is reinstated during retrieval, and whether it relates to successful recovery of episodic details. This is arguably much more efficient than, for example, asking whether subjects reactivate a particular representation during encoding. As the evidence discussed further in this chapter demonstrates, although functional imaging of episodic memory has laid the groundwork for characterizing episodic encoding and retrieval in terms of distributed neural systems and multivariate patterns of activity across subcortical and cortical regions, important questions about the functional roles of the cortical regions comprising these systems, and their interactions, still remain.
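In practice, analyses of this kind often take the form of encoding-retrieval similarity: correlate the multi-voxel pattern evoked at encoding with the pattern evoked at retrieval, and ask whether same-event similarity exceeds different-event similarity. A minimal sketch with simulated data (the event and voxel counts, noise levels, and reinstatement strength are all invented for illustration) is:

```python
import numpy as np

rng = np.random.default_rng(2)
n_events, n_voxels = 20, 200

# Simulated multi-voxel patterns: each event's retrieval pattern is a
# noisy, partial reinstatement of its own encoding pattern.
encoding = rng.standard_normal((n_events, n_voxels))
retrieval = 0.5 * encoding + rng.standard_normal((n_events, n_voxels))

# Correlate every encoding pattern with every retrieval pattern.
sim = np.corrcoef(encoding, retrieval)[:n_events, n_events:]

same_event = np.diag(sim).mean()                        # matched events
diff_event = sim[~np.eye(n_events, dtype=bool)].mean()  # mismatched events

# Reinstatement shows up as higher same-event than different-event similarity.
print(same_event > diff_event)  # True
```

The diagonal-versus-off-diagonal comparison is the multivariate version of the question posed in the text: is the specific encoding pattern, rather than activity in general, reinstated at retrieval?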
… memory trace. It is precisely these mechanisms that are thought to be lost in patients with amnesia, who are capable of attending to stimuli but arguably never form lasting episodic memories after the onset of their amnesia. It is noteworthy, however, that recent work has demonstrated that patients with amnesia also appear to have deficits in novel associative processing and imagination (Addis et al., 2007; Hassabis et al., 2007; but see Squire et al., 2010).
Neuroimaging has been used in a multitude of ways to understand the underlying neural mechanisms supporting episodic memory. Early approaches were directly motivated by the levels-of-processing framework, which demonstrated that differential processing during encoding modulates memory formation (Craik & Lockhart, 1972). Thus, the earliest neuroimaging studies examined brain activation during cognitive tasks that required semantic or associative processing (Fletcher et al., 1998; Montaldi et al., 1998; Rombouts et al., 1997; Shallice et al., 1994). This approach was targeted because it was known that semantic or elaborative/associative processing leads to better later memory, and determining what brain systems were activated during this kind of processing could help to illuminate the neural substrates of successful episodic encoding. Results from these initial task-level investigations revealed that activation in left lateral prefrontal cortex (PFC) and the MTL was enhanced during semantic processing of study items relative to more superficial processing of those items (Fletcher et al., 1998; Montaldi et al., 1998; Rombouts et al., 1997; Shallice et al., 1994). Involvement of left PFC in semantic retrieval processes has been observed consistently across many paradigms (Badre et al., 2005; Fiez, 1997; Petersen et al., 1989; Poldrack et al., 1999; Thompson-Schill et al., 1997; Wagner et al., 2001).
Other approaches used to reveal the neural substrates of episodic memory formation have compared brain activation to novel versus familiar stimuli within the context of the same task. The logic in these studies is that, on average, we are more likely to encode novel stimuli than familiar ones (Tulving et al., 1994). The results of this approach again demonstrated greater activation in the MTL to novel compared with familiar stimuli (Dolan & Fletcher, 1997; Gabrieli et al., 1997; Stern et al., 1996), suggesting that processes in these regions are related to the encoding of novel information into memory.
A key advance in functional magnetic resonance imaging (fMRI) that enabled a tighter linking between brain activation and episodic encoding was the measurement of trial-by-trial estimates of blood-oxygen-level-dependent (BOLD) activation, compared with earlier PET and fMRI block designs. Accordingly, most studies of memory now employ event-related fMRI designs and, thus, measure brain activation and patterns of activity across multiple voxels on individual trials. The advantage of measuring trial-by-trial estimates is that brain activation during the experiencing of events that are later remembered can be directly contrasted with activity for events that are not remembered. This approach has been referred to as the difference in memory (DM) paradigm (Paller et al., 1987; Rugg, 1995; Sanquist et al., 1980; Wagner et al., 1999). This paradigm affords better experimental control because events yielding successful and unsuccessful memory encoding can be compared within the same individual performing the same task. Also, brain activity during encoding can be related to a variety of memory outcomes, measured by different retrieval tests. For example, one can determine whether each presented item was or was not remembered and whether, for example, some contextual detail was also recovered during remembering. The varying memory status of individual events can be used to query the brain data to determine what brain areas show patterns of activation relating to successful memory formation and the recovery of contextual details.
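The core DM computation is simple to sketch: sort trial-wise activation estimates by subsequent memory outcome and compare the means. The toy simulation below (our own illustration; the trial counts, noise level, and effect size are invented, and real analyses would use deconvolved trial-wise BOLD estimates and group-level statistics) builds a subsequent-memory effect into simulated data and then recovers it:

```python
import numpy as np

rng = np.random.default_rng(3)
n_trials = 120

# Simulated trial-wise activation estimates for one region, plus whether
# each study item was later remembered (True) or forgotten (False).
remembered = rng.random(n_trials) < 0.5
activation = 0.1 * rng.standard_normal(n_trials) + 0.4 * remembered

# Subsequent-memory (DM) effect: mean activation on later-remembered
# trials minus mean activation on later-forgotten trials.
dm_effect = activation[remembered].mean() - activation[~remembered].mean()
print(round(dm_effect, 2))
```

Because the contrast is computed within one individual performing one task, differences between the two trial sets can be attributed to encoding-related processing rather than to task or subject differences, which is the experimental-control point made above.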
Initial groundbreaking studies used this powerful DM approach to reveal brain regions important for successful memory formation using fMRI data. Wagner et al. (1998) demonstrated that brain activation in posterior parahippocampal cortex and left PFC during semantic processing of words was greater for words that participants later successfully recognized with high confidence. At the same time, Brewer et al. (1998) found that activation in right PFC and bilateral parahippocampal gyrus during the viewing of scene images correlated with later subjective ratings of memory using the remember/know paradigm.
Taken together, these two groundbreaking studies revealed two important principles of episodic memory formation. First, brain activation during encoding depends, in some brain regions, on the content of the stimulus itself: the lateralized PFC DM effects for verbal and visual-spatial stimuli seen in these two studies align nicely with existing work showing that left inferior frontal gyrus is important in semantic processing (Poldrack et al., 1999), whereas pictorial stimuli engage right PFC to a greater extent (Kelley, 1998). Thus, successful encoding is related to enhanced activation in a subset of the brain regions engaged during stimulus and task processing.
The second principle, evident in these early studies but not directly examined until more recently, concerns whether a specialized system or a domain-general mechanism underlies episodic memory formation. Both of these initial studies (and almost every study performed since) found that activation within the MTL correlates with successful episodic memory formation. However, how subregions within the MTL differentially contribute to episodic encoding is still not known and is debated in both the animal (Eichenbaum et al., 2012; Meunier et al., 1993; Parkinson et al., 1988; Zola-Morgan & Squire, 1986) and human literatures (Davachi et al., 2003; Jackson & Schacter, 2004; Kirwan & Stark, 2004; Mayes et al., 2004; Squire et al., 2004; Stark & Squire, 2001, 2003; for reviews, see Davachi, 2006; Diana et al., 2007; Eichenbaum et al., 2007; Ranganath et al., 2010; Wixted & Squire, 2011). The next section provides further discussion of the role of the hippocampus in associative encoding, presumably by laying down a strong HNP (see Figure 18.1A) that can later be accessed in the context of an appropriate retrieval cue.
… because it appears to be consistent with the growing body of literature suggesting that PRc mechanisms are important for knowing that an item has previously occurred even when you cannot remember in what particular episodic context.

One important advance, fueled by observations that PRc and PHc are sensitive to different stimulus classes (e.g., scenes versus objects), is the proposal that the MTL cortex may contribute to domain-specific encoding of object and scene-like, or contextual, details, whereas the hippocampus may be important in domain-general binding together of these various distinct episodic elements (Davachi, 2006). First, it has been demonstrated that PRc responds more to objects and faces than to scenes and that PHc shows the opposite response pattern: greater activation to scenes than to objects and faces (Liang et al., 2013; Litman et al., 2009). Second, when study items were scenes and the associated context was devised to be one of six repeating objects, PRc encoding activation predicted the later recovery of the associated objects (a form of context), whereas successful scene memory (a form of item memory) was supported by PHc (Awipi & Davachi, 2008). Third, it was shown that both hippocampal and PRc activation predicted whether object details were later recalled, whereas only hippocampal activation additionally predicted whether other contextual details were later recovered (Staresina & Davachi, 2008; see also Park & Rugg, 2011). Finally, in a tightly controlled study in which study items were always words but participants treated each word either as a cue to imagine an object or as a cue to imagine a scene, PRc activation predicted later source memory for the object-imagery trials and PHc activation predicted later source memory for the scene-imagery trials (Staresina et al., 2011). Taken together, it is clear that the involvement of MTL cortex in encoding is largely dependent on the content of the episode and on what aspects of the episode are attended. By contrast, across all of the aforementioned studies and a whole host of other experiments, hippocampal activation appears to selectively predict whether associated details are later recovered, irrespective of the content of those details (Awipi & Davachi, 2008; Park et al., 2012; Prince et al., 2005; Rugg et al., 2012; Staresina & Davachi, 2008; Staresina et al., 2011). These results bring some clarity to the seemingly inconsistent findings that activation in the PHc correlates with both later item memory (Davachi & Wagner, 2002; Eldridge et al., 2000; Kensinger et al., 2003) and later associative memory (Awipi & Davachi, 2008; Cansino et al., 2002; Davachi et al., 2003; Kirwan & Stark, 2004; Ranganath et al., 2004; Staresina et al., 2011) across different paradigms. It is likely that the roles of PRc (p. 381) and PHc in item versus associative encoding will vary depending on the nature of the stimuli being treated as the “item” and the “context” (Staresina et al., 2011).
It is important to note that although the studies cited above provide strong evidence for a selective role of the hippocampus in binding episodic representations so that they can be later retrieved, another proposal has recently emerged linking hippocampal processes with establishing “strong” memories—both strongly “recollected” and strongly “familiar” (Kirwan et al., 2008; Shrager et al., 2008; Song et al., 2011; see Hayes et al., 2011 for support for both accounts). This account is not necessarily in conflict with the aforementioned notion that PRc is important in both item encoding and item–feature binding. However, it does raise questions about the often-used dichotomy linking PRc and hippocampal function with the subjective senses of “knowing” and “remembering.” It is also important to keep in mind that the paradigms used to distinguish item from associative memory, remembering from knowing, and low from high memory strength all suffer from ambiguities in the interpretation of each specific condition. For example, high-confidence recognition can be associated with the recovery of all kinds of episodic detail. Thus, if you ask for only one detail and a participant fails to recover that detail, the participant might actually be recollecting other, noncriterial episodic details. Furthermore, it is unclear what underlying operations, or information processing, are being proposed to support the memory strength theory. This is in contrast to the complementary learning systems approach, which is strongly grounded in how underlying processing within PRc and hippocampus can come to support item and associative encoding.
Thus, taken together, current functional imaging results strongly suggest that greater
hippocampal activation during the encoding of an event is correlated with the later recov
ery of the details associated with that event. However, very little has been done to identi
fy whether a specific HNP needs to be reinstated or completed in order to allow for recov
ery of episodic details. Instead there have been recent reports that the level of reactiva
tion in a region of interest can correlate with later memory. These results have specifical
ly been seen in work examining post-encoding rest and sleep periods (Rasch et al., 2007;
Peigneux et al., 2004; 2006; Tambini et al., 2010). It is assumed that overall hippocampal
BOLD activation during encoding may thus be a good proxy for laying down a strong
HNP. However, this assumption needs to be tested.
Consistent with this idea, the hippocampus has been found to activate during retrieval,
specifically when the retrieval appears to be episodic in nature (i.e., characterized by the
recovery of associations or contextual details). A number of measures have been used to
isolate episodic from nonepisodic retrieval. For example, the remember/know paradigm is
a recognition memory paradigm in which participants must distinguish among studied
items that are recollected with episodic details (remember), studied items that are merely
familiar (know), and unstudied items (new). Studied items endorsed as “remembered” are
associated with greater hippocampal activity during retrieval than studied items en
dorsed as “known” or “new” (Eldridge et al., 2000; Wheeler & Buckner, 2004; see also,
Daselaar et al., 2006; Yonelinas et al., 2005). Another way to isolate episodic retrieval is
to compare the successful retrieval of an association to successful item recognition in the
absence of successful associative retrieval. The hippocampus has been found to be more
active during associative retrieval than during nonassociative retrieval (Dobbins et al.,
2003; Kirwan & Stark, 2004; Yonelinas et al., 2001). In addition, hippocampal activation
during autobiographical memory retrieval has been found to correlate with subjective re
ports of a number of recollective qualities, including detail, emotionality, and personal
significance (Addis et al., 2004). The current evidence is consistent with the notion that
the hippocampus is driven by the recovery of associations or episodic details.
The MAR model posits that episodic retrieval does not just activate hippocampus general
ly, but also specifically reactivates through pattern completion the same hippocampal
neurons (the HNP) that were active during the encoding episode. Because of methodolog
ical obstacles, there are currently no fMRI studies demonstrating that the hippocampus
(p. 382) reinstates encoding activity during retrieval (but see the later section, Hippocam
global forms of retrograde amnesia. This kind of global memory impairment would be ex
pected if the damaged cortex represented a large or crucial component of many episodic
memories. This appears to be the case in some individuals with damage to visual cortex
(Rubin & Greenberg, 1998). In contrast to amnesia caused by MTL lesions, amnesia
caused by damage to the perceptual systems seems to be predominantly retrograde in na
ture. Thus, it appears that the cortex is pivotal in representing the contents of memory
that are reactivated during remembering.
Strong support for cortical reactivation also requires evidence that regions of cortex that
are activated directly by some stimulus during encoding can be indirectly reactivated by
an associate of that stimulus (i.e., the partial cue in Figure 18.1B) during retrieval. In the
past decade, substantial evidence drawn from functional neuroimaging studies of memo
ry has supported the idea that the presentation of partial cues can lead to the reactiva
tion of cortex during episodic retrieval. These studies have largely converged on a single
paradigm, which we describe in detail here. This paradigm relies critically on the associa
tion of neutral retrieval cues with different kinds of associative or contextual information,
such that the stimuli that engage different regions of cortex during encoding are re
trieved but not actually presented during retrieval. During encoding, brain activity is
recorded while participants encounter and build associations between neutral stimuli
(e.g., words) and multiple categories of stimuli that evoke activity in different brain re
gions (e.g., pictures vs. sounds). During retrieval, brain activity is recorded while partici
pants are presented with the neutral stimuli as retrieval cues and are instructed to make
decisions about their memory of the cue or its associates. For example, participants might
make a decision about whether the cue was studied or not (recognition decision), how
well they remember encoding the cue (remember/know decision), or how the cue was en
coded or what its associates were (cued recall/source decision).
Early studies using this paradigm, relying on what we will refer to as the region of inter
est (ROI) approach, demonstrated that circumscribed cortical regions that are differen
tially engaged during the encoding of different kinds of associations are also differentially
engaged during their retrieval. For example, in an event-related fMRI study, Wheeler, Pe
tersen, and Buckner (2000) had participants associate words (e.g., dog) with either corre
sponding sounds (“WOOF!”) or corresponding pictures (e.g., a picture of a dog) during
encoding. During (p. 383) subsequent retrieval, participants were presented with each
studied word and instructed to indicate whether it was studied with a sound or picture.
Wheeler et al. (2000) found that the fusiform gyrus, a region in the visual association cor
tex that is preferentially activated by pictures compared with sounds during encoding, is
also more strongly activated during cued picture retrieval than cued sound retrieval.
They also found that Heschl’s gyrus, a region in the auditory association cortex that is
preferentially activated by sounds compared with pictures during encoding, is more
strongly activated during cued sound retrieval than cued picture retrieval. That is, re
gions that are activated during sound and picture encoding are reactivated during sound
and picture retrieval. This finding has been replicated across studies for both pictures
(Vaidya et al., 2002; Wheeler & Buckner, 2003; Wheeler & Buckner, 2004; Wheeler et al.,
2006) and sounds (Nyberg et al., 2000).
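In computational terms, the ROI approach amounts to contrasting mean activity in an independently defined region across retrieval conditions. The following is a schematic sketch on simulated data — the ROI, effect size, and trial counts are all invented for illustration, not a reanalysis of Wheeler et al.:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated mean BOLD signal in a picture-selective ROI (e.g., fusiform)
# on each retrieval trial; effect sizes and trial counts are invented.
picture_cued = rng.normal(loc=1.0, scale=1.0, size=40)  # picture associate retrieved
sound_cued = rng.normal(loc=0.0, scale=1.0, size=40)    # sound associate retrieved

# "Reactivation" in the ROI sense: greater activity in the picture-selective
# region when the (unpresented) picture associate is retrieved.
reactivation_effect = picture_cued.mean() - sound_cued.mean()

# Welch t statistic for the condition contrast, computed by hand.
se = np.sqrt(picture_cued.var(ddof=1) / 40 + sound_cued.var(ddof=1) / 40)
t_stat = reactivation_effect / se
```

The same contrast run in an auditory ROI, with the conditions swapped, would correspond to the Heschl's gyrus result described above.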
In addition, the specificity of reactivation has been demonstrated in several studies that
found that the retrieval of different visual categories evokes activity in stimulus-selective
regions of visual cortex. For example, the ventral and dorsal visual processing streams,
which process object information and location information, respectively (Ungerleider &
Mishkin, 1982), have been shown to reactivate during the retrieval of object and location
information (Khader et al., 2005). Similarly, the fusiform face area (FFA) and parahip
pocampal place area (PPA), two regions in the ventral visual stream that have been shown
to respond preferentially to visually presented faces and scenes, respectively (Epstein &
Kanwisher, 1998; Kanwisher et al., 1997), have correspondingly been shown to reactivate
during the retrieval of faces and places (O’Craven & Kanwisher, 2000; Ranganath et al.,
2004; see also Danker, Fincham, & Anderson, 2011). In a series of studies, Slotnick and
colleagues demonstrated that even regions very early in the visual processing stream re
activate during retrieval of the appropriate stimulus: Color retrieval reactivates color pro
cessing region V8 (Slotnick, 2009a), retrieval of items in motion reactivates motion pro
cessing region MT+ (Slotnick & Thakral, 2011), and retrieval of items presented to the
right or left visual field reactivates the contralateral area of extrastriate cortex (BA 18) in
a retinotopic manner (Slotnick, 2009b). These studies demonstrate that different process
ing modules within the visual system are reactivated during the retrieval of specific kinds
of visual information (for a more in-depth discussion, see Danker & Anderson, 2010).
Polyn and colleagues (2005) were the first to apply the classification method to the study
of memory in this manner. Participants studied lists containing photographs of famous
faces, famous locations, and common objects, and subsequently retrieved as many list
items as possible in a free recall paradigm (i.e., no cues were presented). Polyn et al.
trained classifiers to differentiate between encoding trials in the three conditions, and
tested the classifiers using the free recall data. Consistent with their predictions, Polyn et
al. found that the reactivation of a given stimulus type’s pattern of activity correlated with
verbal recall of items of that stimulus type. It is worth noting that the voxels that con
tributed to classification decisions overlapped with, but were not limited to, the category-
selective regions that one would expect to find using the ROI approach (i.e., the FFA and
PPA). Polyn et al. present their technique of applying MVPA to memory retrieval as “a
powerful new tool that researchers can use to test and refine theories of how people mine
the recesses of the past” (p. 1966).
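The train-on-encoding, test-on-retrieval logic that Polyn et al. describe can be sketched with an off-the-shelf classifier on synthetic data. Everything below — trial counts, voxel counts, noise levels, and the choice of logistic regression — is an invented toy illustration of the method, not their actual pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_voxels, n_trials = 50, 30
categories = ["face", "location", "object"]

# Simulated encoding data: each category evokes its own mean voxel pattern.
templates = {c: rng.normal(0, 1, n_voxels) for c in categories}
X_encode = np.vstack([templates[c] + rng.normal(0, 1, (n_trials, n_voxels))
                      for c in categories])
y_encode = np.repeat(categories, n_trials)

# Train the classifier on encoding trials only.
clf = LogisticRegression(max_iter=1000).fit(X_encode, y_encode)

# Apply it to (simulated) free-recall timepoints: the class probabilities
# act as a moment-by-moment index of category-specific reactivation.
X_recall = templates["face"] + rng.normal(0, 1, (10, n_voxels))
evidence = clf.predict_proba(X_recall)   # shape: (timepoints, categories)
face_evidence = evidence[:, clf.classes_ == "face"].ravel()
```

In the study described above, it was evidence of this kind, measured around the time of recall, that correlated with the category of the item recalled.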
As mentioned earlier, the retrieval of associations and contexts is one of the hallmarks of
episodic retrieval. Insofar as episodic retrieval is characterized by cortical reactivation,
we should expect greater reactivation when episodic details are recovered (p. 384) during
retrieval. Consistent with this, the earliest studies to find reactivation of encoding regions
during retrieval required associative retrieval (Nyberg et al., 2000; Wheeler et al., 2000).
Along the same lines, reactivation correlates with subjective reports of the recovery of
episodic details. In studies using the remember/know paradigm, items endorsed as re
membered have been found to evoke more reactivation than items endorsed as known us
ing both the ROI (Wheeler & Buckner, 2004) and classifier (Johnson et al., 2009) ap
proaches. Along the same lines, Daselaar et al. (2008) found that the degree of activation
in auditory and visual association cortex was positively correlated with participant rat
ings of reliving during autobiographical memory retrieval, suggesting that reactivation
correlates with the number or quality of retrieved details. Overall, the current evidence
suggests that reactivation correlates with subjective ratings of episodic retrieval.
It is often the case that a particular retrieval cue is associated with multiple episodes. Re
trieval becomes more difficult in the presence of competing associations (Anderson,
1974), and this is often reflected in increased prefrontal and anterior cingulate cortex
(ACC) involvement during retrieval (e.g., Danker, Gunn, & Anderson, 2008; Thompson-
Schill et al., 1997; Wagner et al., 2001). It has been theorized that this increased frontal
activity represents the engagement of control processes that select among competing al
ternatives during retrieval (e.g., Danker, Gunn, & Anderson, 2008; Thompson-Schill et al.,
1997; Wagner et al., 2001). According to Kuhl and colleagues (2010), if competition re
sults from the simultaneous retrieval of multiple episodes, then competition should be re
flected in the simultaneous reactivation of competing memories during retrieval. In their
study, Kuhl et al. (2010) instructed participants to associate words (e.g., “lamp”) with im
ages of well-known faces (e.g., Robert De Niro) or scenes (e.g., the Taj Mahal). Some words
were paired with one associate, and some words were paired with two associates: one
face and one scene. During retrieval, participants were presented with a word as a cue
and instructed to recall its most recent associate and indicate the visual category (face or
scene). Kuhl et al. (2010) used a classifier approach to capture the amount of target and
competitor reactivation during retrieval and found that competition decreased target
classifier accuracy, presumably because of increased competitor reactivation. Further
more, when classifier accuracy was low, indicating high competition, frontal engagement
was increased. A follow-up study using three kinds of images (faces, scenes, and objects)
confirmed that competition corresponded to increased competitor reactivation, and found
that competitor reactivation correlated with ACC engagement during retrieval (Kuhl,
Bainbridge, & Chun, 2012). These studies demonstrate that competition during retrieval
is reflected in the reactivation of competing memories.
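A minimal way to quantify target versus competitor reactivation — in the spirit of, though far simpler than, the classifier analyses of Kuhl and colleagues — is to correlate each retrieval pattern with category templates estimated from encoding. The templates, mixing weights, and noise level below are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n_voxels = 100

# Category templates, standing in for mean encoding patterns.
face_template = rng.normal(0, 1, n_voxels)
scene_template = rng.normal(0, 1, n_voxels)

def evidence(pattern, template):
    """Pattern-template similarity (Pearson correlation)."""
    return float(np.corrcoef(pattern, template)[0, 1])

# Retrieval trial for a cue with two associates: the target (a face) is
# retrieved strongly, while the competitor (a scene) partially reactivates.
trial = 0.8 * face_template + 0.4 * scene_template + rng.normal(0, 1, n_voxels)

target_evidence = evidence(trial, face_template)
competitor_evidence = evidence(trial, scene_template)
# High competitor evidence relative to target evidence is the signature of
# competition that tracked frontal/ACC engagement in these studies.
```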
By capitalizing on stimulus types with distinct patterns of hippocampal and cortical activi
ty, one should be able to simultaneously measure hippocampal and cortical reactivation
during retrieval using classification methods. In fact, there has already been some suc
cess in using classifiers to identify cognitive states (Hassabis et al., 2009), and even indi
vidual memories (Chadwick et al., 2010), using patterns of activity across hippocampal
voxels. However, no study has attempted to classify retrieval trials using a classifier
trained on the encoding data. This would be a true demonstration of hippocampal reacti
vation during retrieval.
References
Addis, D. R., Moscovitch, M., Crawley, A. P., & McAndrews, M. P. (2004). Recollective
qualities modulate hippocampal activation during autobiographical memory retrieval.
Hippocampus, 14, 752–762.
Addis, D. R., Wong A. T., & Schacter D. L. (2007). Remembering the past and imagining
the future: Common and distinct neural substrates during event construction and elabo
ration. Neuropsychologia, 45 (7) 1363–1377.
Alvarez, P., & Squire, L. R. (1994). Memory consolidation and the medial temporal lobe: A
simple network model. Proceedings of the National Academy of Sciences, 91, 7041–7045.
Awipi, T., & Davachi, L. (2008). Content-specific source encoding in the human medial
temporal lobe. Journal of Experimental Psychology: Learning, Memory and Cognition, 34
(4) 769–779.
Badre, D., Poldrack, R. A., Paré-Blagoev, E. J., Insler, R. Z., & Wagner, A. D. (2005). Dissociable controlled retrieval and generalized selection mechanisms in ventrolateral prefrontal cortex. Neuron, 47, 907–918.
Bowles, B., Crupi, C., Mirsattari, S. M., Pigott, S. E., Parrent, A. G., Pruessner, J. C.,
Yonelinas, A. P., & Köhler, S. (2007). Impaired familiarity with preserved recollection after
anterior temporal-lobe resection that spares the hippocampus. Proceedings of the National Academy of Sciences, 104 (41), 16382–16387.
Brewer, J. B., Zhao, Z., Desmond, J. E., Glover, G. H., & Gabrieli, J. D. E. (1998). Making
memories: Brain activity that predicts how well visual experience will be remembered.
Science, 281, 1185–1187.
Cansino, S., Maquet, P., Dolan, R. J., & Rugg, M. D. (2002). Brain activity underlying en
coding and retrieval of source memory. Cerebral Cortex, 12, 1048–1056.
Chadwick, M. J., Hassabis, D., Weiskopf, N., & Maguire, E. A. (2010). Decoding individual
episodic memory traces in the human hippocampus. Current Biology, 20, 544–547.
Craik, F. I. M., & Lockhart, R. S. (1972). Levels of processing: A framework for memory
research. Journal of Verbal Learning and Verbal Behavior, 11, 671–684.
Danker, J. F., & Anderson, J. R. (2010). The ghosts of brain states past: Remembering re
activates the brain regions engaged during encoding. Psychological Bulletin, 136, 87–102.
Danker, J. F., Fincham, J. M., & Anderson, J. R. (2011). The neural correlates of competi
tion during memory retrieval are modulated by attention to the cues. Neuropsychologia,
49, 2427–2438.
Danker, J. F., Gunn, P., & Anderson, J. R. (2008). A rational account of memory predicts
left prefrontal activation during controlled retrieval. Cerebral Cortex, 18, 2674–2685.
Daselaar, S. M., Fleck, M. S., & Cabeza, R. (2006). Triple dissociation in the medial tem
poral lobes: recollection, familiarity, and novelty. Journal of Neurophysiology, 96, 1902–
1911.
Daselaar, S. M., Rice, H. J., Greenberg, D. L., Cabeza, R., LaBar, K. S., & Rubin, D. C.
(2008). The spatiotemporal dynamics of autobiographical memory: Neural correlates of
recall, emotional intensity, and reliving. Cerebral Cortex, 18, 217–229.
Davachi, L. (2004). The ensemble that plays together, stays together. Hippocampus, 14, 1–
3.
Davachi, L., Mitchell, J. P., & Wagner, A. D. (2003). Multiple routes to memory: Distinct
medial temporal lobe processes build item and source memories. Proceedings of the Na
tional Academy of Sciences, 100, 2157–2162.
Davachi, L. (2006). Item, context and relational episodic encoding in humans. Current
Opinion in Neurobiology, 16, 693–700.
Davachi, L. (2007). Encoding: The proof is still required. In H. L. Roediger III, Y. Dudai, &
S. M. Fitzpatrick (Eds.), Science of memory: Concepts (pp. 137–143). New York: Oxford
University Press.
Dobbins, I. G., Rice, H. J., Wagner, A. D., & Schacter, D. L. (2003). Memory orientation and
success: separable neurocognitive components underlying episodic recognition. Neu
ropsychologia, 41, 318–333.
Dougal, S., Phelps, E. A., & Davachi, L. (2007). The role of the medial temporal lobe in
item recognition and source recollection of emotional stimuli. Cognitive, Affective & Be
havioral Neurosciences, 7 (3) 233–242.
Diana, R. A., Yonelinas, A. P., & Ranganath, C. (2007). Imaging recollection and familiarity
in the medial temporal lobe: a three-component model. Trends in Cognitive Sciences, 11,
379–386.
Diekelmann, S., & Born, J. (2010). The memory function of sleep. Nature Reviews Neuro
science, 11, 114–126.
Dolan, R. J., & Fletcher, P. C. (1997). Dissociating prefrontal and hippocampal function in
episodic memory encoding. Nature, 388, 582–585.
Eichenbaum, H., Yonelinas, A. P., & Ranganath, C. (2007). The medial temporal lobes and recognition memory. Annual Review of Neuroscience, 30, 123–152.
Eichenbaum, H., Sauvage, M., Fortin, N., Komorowski, R., & Lipton, P. (2012). Towards a
functional organization of episodic memory in the medial temporal lobe. Neuroscience &
Biobehavioral Reviews, 36, 1597–1608.
Eichenbaum, H., Fortin, N., Sauvage, M., Robitsek, R. J., & Farovik, A. (2010) An animal
model of amnesia that uses Receiver Operating Characteristics (ROC) analysis to distin
guish recollection from familiarity deficits in recognition memory. Neuropsychologia 48,
2281–2289.
Eldridge, L. L., Knowlton, B. J., Furmanski, C. S., Bookheimer, S. Y., & Engel, S. A. (2000).
Remembering episodes: A selective role for the hippocampus during retrieval. Nature
Neuroscience, 3, 1149–1152.
Ellenbogen, J. M., Hu, P. T., Payne, J. D., Titone, D., & Walker, M. P. (2007). Human relation
al memory requires time and sleep. Proceedings of the National Academy of Sciences,
104 (18), 7723–7728.
Epstein, R., & Kanwisher, N. (1998). A cortical representation of the local visual environ
ment. Nature, 392, 598–601.
Farah, M. J. (1988). Is visual imagery really visual? Overlooked evidence from neuropsychology. (p. 386)
Fiez, J. A. (1997). Phonology, semantics, and the role of the left inferior prefrontal cortex.
Human Brain Mapping, 5, 79–83.
Fletcher, P. C., Shallice, T., & Dolan, R. J. (1998). The functional roles of prefrontal cortex
in episodic memory. I. Encoding. Brain, 121, 1239–1248.
Gabrieli, J. D. E., Brewer, J. B., Desmond, J. E., & Glover, G. H. (1997). Separate neural
bases of two fundamental memory processes in the human medial temporal lobe. Science,
276, 264–266.
Gelbard-Sagiv, H., Mukamel, R., Harel, M., Malach, R., & Fried, I. (2008). Internally gen
erated reactivation of single neurons in human hippocampus during free recall. Science,
322, 96–101.
Giovanello, K. S., Verfaellie, M., & Keane, M. M. (2003). Disproportionate deficit in asso
ciative recognition relative to item recognition in global amnesia. Cognitive, Affective,
and Behavioral Neuroscience, 3, 186–194.
Gold, J. J., Smith, C. N., Bayley, P. J., Shrager, Y., Brewer, J. B., Stark, C. E. L., Hopkins, R.
O., & Squire, L. R. (2006). Item memory, source memory, and the medial temporal lobe:
Concordant findings from fMRI and memory-impaired patients. Proceedings of the Na
tional Academy of Sciences of the United States of America, 103, 9351–9356.
Hannula, D. E., & Ranganath, C. (2008). Medial temporal lobe activity predicts successful
relational memory binding. Journal of Neuroscience, 28 (1), 116–124.
Haskins, A. L., Yonelinas, A. P., & Ranganath, C. (2008). Perirhinal cortex supports uniti
zation and familiarity-based recognition of novel associations. Neuron, 59 (4), 554–560.
Hassabis, D., Kumaran, D., Vann, S. D., & Maguire, E. A. (2007). Patients with hippocampal amnesia cannot imagine new experiences. Proceedings of the National Academy of Sciences, 104 (5), 1726–1731.
Hassabis, D., Chu, C., Rees, G., Weiskopf, N., Molyneux, P. D., & Maguire, E. A. (2009).
Decoding neuronal ensembles in the human hippocampus. Current Biology, 19, 546–554.
Haxby, J. V., Gobbini, M. I., Furey, M. L., Ishai, A., Schouten, J. L., & Pietrini, P. (2001).
Distributed and overlapping representations of faces and objects in ventral temporal cor
tex. Science, 293, 2425–2430.
Hayes, S. M., Buchler, N., Stokes, J., Kragel, J., & Cabeza, R. (2011). Neural correlates of
confidence during item and recognition and source memory retrieval: Evidence for both
dual-process and strength memory theories. Journal of Cognitive Neuroscience, 23, 3959–
3971.
Hebb, D. O. (1949). The organization of behavior. New York: Wiley & Sons.
Jackson, O., & Schacter, D. L. (2004). Encoding activity in anterior medial temporal lobe
supports subsequent associative recognition. NeuroImage, 21, 456–462.
Johnson, J. D., Minton, B. R., & Rugg, M. D. (2008). Content dependence of the electrophysiological correlates of recollection. NeuroImage, 39, 406–416.
Johnson, J. D., McDuff, S. G. R., Rugg, M. D., & Norman, K. A. (2009). Recollection, famil
iarity, and cortical reinstatement: A multivoxel pattern analysis. Neuron, 63, 697–708.
Kanwisher, N., McDermott, J., & Chun, M. M. (1997). The fusiform face area: A module in
human extrastriate cortex specialized for face perception. Journal of Neuroscience, 17,
4302–4311.
Kelley, W. M., Miezin, F. M., McDermott, K. B., Buckner, R. L., Raichle, M. E., Cohen, N. J.,
Ollinger, J. M., Akbudak, E., Conturo, T. E., Snyder, A. Z., & Petersen, S. E. (1998). Hemi
spheric specialization in human dorsal frontal cortex and medial temporal lobe for verbal
and nonverbal memory encoding. Neuron, 20, 927–936.
Kensinger, E. A., Clarke, R. J., & Corkin, S. (2003). What neural correlates underlie suc
cessful encoding and retrieval? A functional magnetic resonance imaging study using a
divided attention paradigm. Journal of Neuroscience, 23, 2407–2415.
Kensinger, E. A., & Schacter, D. (2006). Amygdala activity is associated with the success
ful encoding of item, but not source, information for positive and negative stimuli. Journal
of Neuroscience, 26 (9), 2564–2570.
Khader, P., Burke, M., Bien, S., Ranganath, C., & Rosler, F. (2005). Content-specific activa
tion during associative long-term memory retrieval. NeuroImage, 27, 805–816.
Kirwan, C. B., & Stark, C. L. (2004). Medial temporal lobe activation during encoding and
retrieval of novel face-name pairs. Hippocampus, 14, 919–930.
Kirwan, C. B., Wixted, J. T., & Squire, L.R. (2008). Activity in the medial temporal lobe
predicts memory strength, whereas activity in the prefrontal cortex predicts recollection.
Journal of Neuroscience, 28, 10541–10548.
Komorowski, R. W., Manns, J. R., & Eichenbaum, H. (2009) Robust conjunctive item-place
coding by hippocampal neurons parallels learning what happens. Journal of
Neuroscience, 29, 9918–9929.
Kuhl, B. A., Rissman, J., Chun, M. M., & Wagner, A. D. (2010). Fidelity of neural reactiva
tion reveals competition between memories. Proceedings of the National Academy of
Sciences, 108, 5903–5908.
Kuhl, B. A., Bainbridge, W. A., & Chun, M. M. (2012). Neural reactivation reveals mecha
nisms for updating memory. Journal of Neuroscience, 32, 3453–3461.
Levin, D. T., Simons, D. J., Angelone, B. L., & Chabris, C. F. (2002). Memory for centrally attended changing objects in an incidental real-world change detection paradigm. British Journal of Psychology, 93, 289–302.
Liang, J., Wagner, A. D., & Preston, A. R. (2013). Content representation in the human me
dial temporal lobe. Cerebral Cortex, 23 (1), 80–96.
Litman, L., & Davachi, L. (2008) Distributed learning enhances relational memory consol
idation. Learning & Memory, 15, 711–716.
McClelland, J. L., McNaughton, B. L., & O’Reilly, R. C. (1995). Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and
failures of connectionist models of learning and memory. Psychological Review, 102, 419–
457.
Meunier, M., Bachevalier, J., Mishkin, M., & Murray, E. A. (1993). Effects on visual recognition of combined and separate ablations of the entorhinal and perirhinal cortex in rhesus monkeys. Journal of Neuroscience, 13, 5418–5432.
Mitchell, T., Hutchinson, R., Niculescu, S., Pereira, F., Wang, X., Just, M., & Newman, S. (2004). Learning to decode cognitive states from brain images. Machine Learning, 57, 145–175. (p. 387)
Montaldi, D., Mayes, A. R., Barnes, A., Pirie, H., Hadley, D. M., Patterson, J., & Wyper, D. J.
(1998). Associative encoding of pictures activates the medial temporal lobes. Human
Brain Mapping, 6, 85–104.
Morris, C. D., Bransford, J. D., & Franks, J. J. (1977). Levels of processing versus transfer
appropriate processing. Journal of Verbal Learning and Verbal Behavior, 16, 519–533.
Moscovitch, M., Rosenbaum, R. S., Gilboa, A., Addis, D. R., Westmacott, R., Grady, C.,
McAndrews, M. P., Levine, B., Black, S., Winocur, G., & Nadel, L. (2005). Functional neu
roanatomy of remote episodic, semantic, and spatial memory: A unified account based on
multiple trace theory. Journal of Anatomy, 207, 35–66.
Norman, K. A., & Schacter, D. L. (1997). False recognition in young and older adults: Ex
ploring the characteristics of illusory memories. Memory & Cognition, 25, 838–848.
Norman, K. A., & O’Reilly, R. C. (2003). Modeling hippocampal and neocortical contribu
tions to recognition memory: a complementary-learning-systems approach. Psychological
Review, 110, 611–646.
Nyberg, L., Habib, R., McIntosh, A. R., & Tulving, E. (2000). Reactivation of encoding-re
lated brain activity during memory retrieval. Proceedings of the National Academy of
Sciences, 97, 11120–11124.
O’Craven, K. M., & Kanwisher, N. (2000). Mental imagery of faces and places activates
corresponding stimulus-specific brain regions. Journal of Cognitive Neuroscience, 12,
1013–1023.
Paller, K. A., Kutas, M., & Mayes, A. R. (1987). Neural correlates of encoding in an inci
dental learning paradigm. Electroencephalography and Clinical Neurophysiology, 67,
360–371.
Park, H., & Rugg, M. D. (2011) Neural correlates of encoding within- and across-domain
inter-item associations. Journal of Cognitive Neuroscience, 23, 2533–2543.
Park, H., Shannon V., Biggan, J., & Spann, C. (2012). Neural activity supporting the forma
tion of associative memory versus source memory. Brain Research, 1471, 81–92.
Parkinson, J. K., Murray, E. A., & Mishkin, M. (1988). A selective mnemonic role for the
hippocampus in monkeys: memory for the location of objects. Journal of Neuroscience, 8,
4159–4167.
Peigneux, P., Laureys, S., Fuchs, S., Collette, F., Perrin, F., Reggers, J., Phillips, C., Deguel
dre, C., Del Fiore, G., Aerts, J., Luxen, A., & Maquet, P. (2004). Are spatial memories
strengthened in the human hippocampus during slow wave sleep? Neuron, 44, 535–545.
Peigneux, P., Orban, P., Balteau, E., Degueldre, C., Luxen, A., Laureys, S., & Maquet, P. (2006).
Offline persistence of memory-related cerebral activity during active wakefulness. PLoS
Biology 4 (4), e100.
Petersen, S. E., Fox, P. T., Posner, M. I., Mintun, M., & Raichle, M. E. (1989). Positron
emission tomographic studies of processing of single words. Journal of Cognitive Neuro
science, 1, 153–170.
Poldrack, R. A., Wagner, A. D., Prull, M. W., Desmond, J. E., Glover, G. H., & Gabrieli, J. D.
E. (1999). Functional specialization for semantic and phonological processing in left infe
rior frontal cortex. NeuroImage, 10, 15–35.
Polyn, S. M., Natu, V. S., Cohen, J. D., & Norman, K. A. (2005). Category-specific cortical
activity precedes retrieval during memory search. Science, 310, 1963–1966.
Prince, S. E., Daselaar, S. M., & Cabeza, R. (2005). Neural correlates of relational memo
ry: Successful encoding and retrieval of semantic and perceptual associations. Journal of
Neuroscience, 25 (5), 1203–1210.
Ranganath, C., Cohen, M. X., Dam, C., & D’Esposito, M. (2004). Inferior temporal, pre
frontal, and hippocampal contributions to visual working memory maintenance and asso
ciative memory retrieval. Journal of Neuroscience, 24, 3917–3925.
Ranganath, C. (2010). A unified framework for the functional organization of the medial
temporal lobes and the phenomenology of episodic memory. Hippocampus, 20, 1263–
1290.
Rasch, B., Buchel, C., Gais, S., & Born, J. (2007). Odor cues during slow-wave sleep
prompt declarative memory consolidation. Science, 315, 1426–1429.
Roediger, H. L. (2000). Why retrieval is the key process to understanding human memory.
In E. Tulving (Ed.), Memory, consciousness, and the brain: The Tallinn conference (pp. 52–
75). Philadelphia: Psychology Press.
Rombouts, S. A. R. B., Machielsen, W. C. M., Witter, M. P., Barkhof, F., Lindeboom, J., &
Scheltens, P. (1997). Visual associative encoding activates the medial temporal lobe: A
functional magnetic resonance imaging study. Hippocampus, 7, 594–601.
Rubin, D. C., & Greenberg, D. L. (1998). Visual memory-deficit amnesia: A distinct amne
sia presentation and etiology. Proceedings of the National Academy of Sciences, 95, 5413–
5416.
Rugg, M. D. (1995). ERP studies of memory. In M. D. Rugg & M. G. H. Coles (Eds.), Elec
trophysiology of mind: Event-related brain potentials and cognition (pp. 133–170). Lon
don: Oxford University Press.
Sanquist, T. F., Rohrbaugh, J., Syndulko, K., & Lindsley, D. B. (1980). An event-related po
tential analysis of coding processes in human memory. Progress in Brain Research, 54,
655–660.
Sauvage, M. M., Fortin, N. J., Owens, C. B., Yonelinas, A. P., & Eichenbaum, H. (2008).
Recognition memory: Opposite effects of hippocampal damage on recollection and famil
iarity. Nature Neuroscience, 11, 16–18.
Shallice, T., Fletcher, P., Frith, C. D., Grasby, P., Frackowiak, R. S. J., & Dolan, R. J. (1994). Brain regions associated with acquisition and retrieval of verbal episodic memory. Nature, 368, 633–635.
Shrager, Y., Kirwan, C. B., & Squire, L. R. (2008). Activity in both hippocampus and
perirhinal cortex predicts the memory strength of subsequently remembered information.
Neuron, 59, 547–553.
Slotnick, S. D. (2009a). Memory for color reactivates color processing region. NeuroReport, 20, 1568–1571.
Slotnick, S. D. (2009b). Rapid retinotopic reactivation during spatial memory. Brain Research, 1268, 97–111.
Slotnick, S. D., & Thakral, P. P. (2011). Memory for motion and spatial location is mediated by contralateral and ipsilateral motion processing cortex. NeuroImage, 55, 794–800.
Song, Z., Wixted, J. T., Smith, C. N., & Squire, L. R. (2011). Different nonlinear functions
in hippocampus and perirhinal cortex relating functional MRI activity to memory
strength. Proceedings of the National Academy of Sciences, 108, 5783–5788.
Spiers, H. J., Burgess, N., Hartley, T., Vargha-Khadem, F., & O’Keefe, J. (2001). (p. 388) Bilateral hippocampal pathology impairs topographical and episodic memory but not visual pattern matching. Hippocampus, 11 (6), 715–725.
Squire, L. R., Stark, C. E. L., & Clark, R. E. (2004). The medial temporal lobe. Annual Review of Neuroscience, 27, 279–306.
Squire, L. R., van der Horst, A. S., McDuff, S. G., Frascino, J. C., Hopkins, R. O., & Mauldin, K. N. (2010). Role of the hippocampus in remembering the past and imagining the future. Proceedings of the National Academy of Sciences, 107 (44), 19044–19048.
Staresina, B. P., & Davachi, L. (2008). Selective and shared contributions of the hippocampus and perirhinal cortex to episodic item and associative encoding. Journal of Cognitive Neuroscience, 20 (8), 1478–1489.
Staresina, B. P., & Davachi, L. (2009). Mind the gap: Binding experiences across space
and time in the human hippocampus. Neuron, 63 (3), 267–276.
Staresina, B. P., Duncan, K. D., & Davachi, L. (2011). Perirhinal and parahippocampal cortices differentially contribute to later recollection of object- and scene-related event details. Journal of Neuroscience, 31 (24), 8739–8747.
Stark, C. E. L., & Squire, L. R. (2001). Simple and associative recognition memory in the
hippocampal region. Learning & Memory, 8, 190–197.
Stark, C. E. L., & Squire, L. R. (2003). Hippocampal damage equally impairs memory for
single items and memory for conjunctions. Hippocampus, 13, 281–292.
Tambini, A., Ketz, N., & Davachi, L. (2010). Enhanced brain correlations during rest are
related to memory for recent experiences. Neuron, 65, 280–290.
Thompson-Schill, S. L., D’Esposito, M., Aguirre, G. K., & Farah, M. J. (1997). Role of left
inferior prefrontal cortex in retrieval of semantic knowledge: A reevaluation. Proceedings
of the National Academy of Sciences, 94, 14792–14797.
Tulving, E. (1972). Episodic and semantic memory. In E. Tulving & W. Donaldson (Eds.), Organization of memory (pp. 381–403). New York: Academic Press.
Tulving, E., & Thomson, D. M. (1973). Encoding specificity and retrieval processes in episodic memory. Psychological Review, 80, 352–373.
Tulving, E. (1983). Elements of episodic memory. New York: Oxford University Press.
Tulving, E., Markowitsch, H. J., Kapur, S., Habib, R., & Houle, S. (1994). Novelty encoding networks in the human brain: Positron emission tomography data. NeuroReport, 5, 2525–2528.
Uncapher, M. R., Otten, L. J., & Rugg, M. D. (2006). Episodic encoding is more than the sum of its parts: An fMRI investigation of multifeatural contextual encoding. Neuron, 52 (3), 547–556.
Ungerleider, L. G., & Mishkin, M. (1982). Two cortical visual systems. In D. J. Ingle, M. A. Goodale, & R. J. W. Mansfield (Eds.), Analysis of visual behavior (pp. 549–586). Cambridge, MA: MIT Press.
Vaidya, C. J., Zhao, M., Desmond, J. E., & Gabrieli, J. D. E. (2002). Evidence for cortical
specificity in episodic memory: Memory-induced re-activation of picture processing areas.
Neuropsychologia, 40, 2136–2143.
Vann, S. D., Tsivilis, D., Denby, C. E., Quamme, J. R., Yonelinas, A. P., Aggleton, J. P., Montaldi, D., & Mayes, A. R. (2009). Impaired recollection but spared familiarity in patients with extended hippocampal system damage revealed by 3 convergent methods. Proceedings of the National Academy of Sciences, 106 (13), 5442–5447.
Wagner, A. D., Schacter, D. L., Rotte, M., Koutstaal, W., Maril, A., Dale, A. M., Rosen, B. R., & Buckner, R. L. (1998). Building memories: Remembering and forgetting verbal experiences as predicted by brain activity. Science, 281, 1188–1191.
Wagner, A. D., Koutstaal, W., & Schacter, D. L. (1999). When encoding yields remembering: Insights from event-related neuroimaging. Philosophical Transactions of the Royal Society of London, Biology, 354, 1307–1324.
Wagner, A. D., Paré-Blagoev, E. J., Clark, J., & Poldrack, R. A. (2001). Recovering meaning:
Left prefrontal cortex guides controlled semantic retrieval. Neuron, 31, 329–338.
Wheeler, M. E., Petersen, S. E., & Buckner, R. L. (2000). Memory’s echo: Vivid remembering reactivates sensory-specific cortex. Proceedings of the National Academy of Sciences, 97, 11125–11129.
Wheeler, M. E., Shulman, G. L., Buckner, R. L., Miezin, F. M., Velanova, K., & Petersen, S. E. (2006). Evidence for separate perceptual reactivation and search processes during remembering. Cerebral Cortex, 16, 949–959.
Wixted, J. T., & Squire, L. R. (2004). Recall and recognition are equally impaired in patients with selective hippocampal damage. Cognitive, Affective, and Behavioral Neuroscience, 4, 58–66.
Wixted, J. T., & Squire, L. R. (2011). The medial temporal lobe and the attributes of memory. Trends in Cognitive Sciences, 15, 210–217.
Yick, Y. Y., & Wilding, E. L. (2008). Material-specific correlates of memory retrieval. NeuroReport, 19, 1463–1467.
Yonelinas, A. P., Hopfinger, J. B., Buonocore, M. H., Kroll, N. E. A., & Baynes, K. (2001).
Hippocampal, parahippocampal, and occipital-temporal contributions to associative and
item recognition memory: an fMRI study. NeuroReport, 12, 359–363.
Yonelinas, A. P., Kroll, N. E., Quamme, J. R., Lazzara, M. M., Sauvé, M. J., Widaman, K. F., & Knight, R. T. (2002). Effects of extensive temporal lobe damage or mild hypoxia on recollection and familiarity. Nature Neuroscience, 5, 1236–1241.
Yonelinas, A. P., Otten, L. J., Shaw, K. N., & Rugg, M. D. (2005). Separating the brain regions involved in recollection and familiarity in recognition memory. Journal of Neuroscience, 25, 3002–3008.
Yu, S. S., Johnson, J. D., & Rugg, M. D. (2012). Hippocampal activity during recognition memory co-varies with the accuracy and confidence of source memory judgments. Hippocampus, 22, 1429–1437.
Zola-Morgan, S., & Squire, L. R. (1986). Memory impairment in monkeys following lesions limited to the hippocampus. Behavioral Neuroscience, 100, 155–160.
Lila Davachi
Lila Davachi, Department of Psychology, Center for Neural Science, New York University
Jared Danker
Working Memory
Bradley R. Buchsbaum and Mark D'Esposito
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn
Working memory refers to the temporary retention of information that has just arrived at the senses or been retrieved from long-term memory. Although internal representations of external stimuli have a natural tendency to decay, they can be kept “in mind” through the action of maintenance or rehearsal strategies, and can be subjected to various operations that manipulate information in the service of ongoing behavior. Empirical studies of working memory using the tools of neuroscience, such as electrophysiological recordings in monkeys and functional neuroimaging in humans, have advanced our knowledge of the underlying neural mechanisms of working memory.
Keywords: short-term memory, working memory, prefrontal cortex, maintenance, neuroimaging, functional magnetic resonance imaging, phonological loop, visual-spatial sketchpad, central executive
Introduction
Humans and other animals with elaborately evolved sensory systems are prodigious consumers of information: Each successive moment in an ever-changing environment nets a vast informational catch—a rich and teeming mélange of sights, sounds, smells, and sensations. Everything that is caught by the senses, however, is not kept—and that which is kept may not be kept for long. Indeed, the portion of experience that survives the immediate moment is but a small part of the overall sensory input. With regard to memory storage, then, the brain is not a packrat, but rather is a judicious and discerning collector of the most important pieces of experience. A good collector of experience, however, is also a good speculator: The most important information to store in memory is that which is most likely to be relevant at some time in the future. Of course, a large amount of information that might be important in the next few seconds is very unlikely to be of any importance in a day, a month, or a year. It might be stated more generally that to a large degree the relevance of information is time-bounded—sense-data collected and registered in the present are far more likely to be useful in a few seconds than they are to be in a few minutes. It would seem, then, that the temporary relevance of information demands the existence of a temporary storage system: a kind of memory that is capable of holding onto
the sense-data of the “just past” in an easily accessible form, while ensuring that older, irrelevant, or distracting information is discarded or actively suppressed.
The existence of this kind of short-term or “working memory” has been well established over the past century through the detailed study of human performance on tasks designed to examine the limits, properties, and underlying structure of human memory. Moreover, in recent years, much has been learned about the neurobiological basis of working memory through the study of brain-damaged patients, the effect of cortical ablations on animal behavior, electrophysiological recordings from single cells in the nonhuman primate, and regional brain activity as measured by modern functional neuroimaging tools such as positron emission tomography (p. 390) (PET), functional magnetic resonance imaging (fMRI), and event-related potentials (ERPs). In this chapter, we examine how the psychological concept of working memory has, through a variety of cognitive neuroscientific investigations, been validated as a biological reality.
Short-Term Memory
In the mid-1960s, evidence began to accumulate supporting the view that separate functional systems underlie memory for recent events and memory for more distant events. A particularly robust finding came from studies of free recall in which it was demonstrated that when subjects are presented a list of words and asked to repeat as many as possible in any order, performance is best for the first few items (the primacy effect) and for the last few items (the recency effect)—a pattern of accuracy that when plotted as a function of serial position (Figure 19.1) appears U-shaped (Glanzer & Cunitz, 1966; Waugh & Norman, 1965).
When a brief filled retention period is interposed between stimulus presentation and recall, however, performance on early items is relatively unaffected, but the recency effect disappears (Glanzer & Cunitz, 1966; Postman & Phillips, 1965). These findings suggest that in the immediate recall condition, the last few items of a list are recalled best because they remain accessible in a short-term store, whereas early items are more permanently represented (and thus unaffected by the insertion of a filled delay) in a long-term store. This idea that memory, as a functional system, contains both short- and long-term stores is exemplified by the two-store memory model of Atkinson and Shiffrin (1968). In this prototype psychological memory model, comprising a short-term store (STS) and long-term store (LTS), information enters the system through the STS, where it is encoded and enriched, before being passed on to the LTS for permanent storage. Although the idea that short-term storage is a necessary prerequisite for entry into the LTS has not held up, the two-store model of Atkinson and Shiffrin crystallized the very idea of memory as a divisible, dichotomous system and provided the conceptual framework for the interpretation of patterns of memory deficits observed in patients with brain damage.
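The two-store account can be illustrated with a toy calculation (not a model from the chapter; the functional forms and all numbers below are invented for illustration): recall probability at each serial position is the sum of a primacy term, reflecting long-term strength accrued by early items, and a recency term, reflecting availability in the short-term store, which a filled delay abolishes.

```python
import math

# Toy two-store sketch of the serial position curve. Parameters are
# illustrative assumptions, not fitted values from the literature.
def p_recall(position, list_len=15, delay=0.0):
    primacy = 0.6 * math.exp(-0.35 * (position - 1))                # long-term store
    recency = 0.8 * math.exp(-0.8 * (list_len - position + delay))  # short-term store
    return min(1.0, 0.2 + primacy + recency)

immediate = [round(p_recall(i), 2) for i in range(1, 16)]
# A filled delay wipes out the recency term but leaves primacy intact:
delayed = [round(p_recall(i, delay=10.0), 2) for i in range(1, 16)]
```

Plotting `immediate` against position reproduces the U shape, while `delayed` keeps the primacy limb but loses the recency limb, mirroring the Glanzer and Cunitz (1966) pattern.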
the system. What was missing from this line of research was the recognition that the contents of short-term memory are not physical elements governed by certain lawful and inexorable processes of decay and interference, but rather dynamic representations of a fluid cognition, capable of being maintained, transformed, and manipulated by active, executive processes of higher control. Thus, for instance, two of the most important variables in studies of short-term memory, before the emergence of the working memory model, were time (e.g., between stimulus presentation and recall) and serial order (e.g., of a list of items), both of which are defined by the inherent structure of the environmental input. In more recent years, at least as great an emphasis has been placed on variables that reflect an ability or attribute of the subject, for instance, his or her rate of articulation (Hulme, Newton, Cowan, Stuart, & Brown, 1999), memory capacity (Cowan, 2001), degree of inhibitory control (Hasher, Zacks, & Rahhal, 1999), or ability to “refresh” information in memory (Raye, Johnson, Mitchell, Reeder, & Greene, 2002). Interest in these “internal variables” is a recognition of the fact that what is “in memory” at a moment in time is defined to various degrees by the structure of the input (e.g., time, serial order, information content), the biophysical properties of the storage medium (e.g., rate of decay, interference susceptibility), and the active processes of control that continually monitor and operate on the contents of memory. It is this last ingredient that puts the “work” into working memory; it makes explicit the active and transformative character of mental processes and acknowledges that the content of memory need not mirror the structure and arrangement of environmental input, but rather may reflect the intentions, plans, and goals of the conscious organism.
With that introduction in mind, let us now give a brief overview of the working memory
model of Baddeley and colleagues (Baddeley, 1986, 2000; Baddeley & Hitch, 1974).
Whereas contemporary models of short-term memory tended to emphasize storage
buffers as the receptacles for information arriving from the senses, Baddeley and Hitch
(1974) focused on rehearsal processes, that is, strategic mechanisms for the maintenance
of items in memory. Thus, for example, when one is trying to keep a telephone or license
plate number “in mind,” a common strategy is to repeatedly rehearse, either subvocally
or out loud, the contents of the numerical or alphanumerical sequence. Research had
shown that in tests of serial recall, when subjects are prevented from engaging in covert
rehearsal during a delay period that is inserted between stimulus presentation and recall,
overall performance is dramatically impaired (Baddeley, Thomson, & Buchanan, 1975). In
the case of verbal material, then, it was clear that in many ways the ability to keep words
in memory depended in large part on articulatory processes. This insight was central to
the development of the verbal component of working memory, the phonological loop (see
below), and led to a broader conceptualization of short-term memory that seeks (p. 392) to
explain not only how and why information enters and exits awareness but also how resources are marshaled in a strategic effort to capture and maintain the objects of memory in the focus of attention.
The initial working memory model proposed by Baddeley and Hitch (1974), but later refined somewhat (Baddeley, 1986, 2000; Salame & Baddeley, 1982), argued for the existence of three functional components of working memory. The central executive was envisioned as a control system of limited attentional capacity responsible for coordinating and controlling two subsidiary slave systems: a phonological loop and a visual-spatial sketchpad. The phonological loop was responsible for the storage and maintenance of information in a verbal form, and the visual-spatial sketchpad was dedicated to the storage and maintenance of visual-spatial information. In the last decade, a fourth component, the episodic buffer, has been added to the model in order to capture a number of phenomena that could not be readily explained within the original framework.
logical loop) in working memory (Baddeley, 1986). Because total attentional capacity is finite, there must be a mechanism that intervenes to determine how the pool of attention is to be divided among the many possible actions, with their different levels of priority and reward contingencies that are afforded by the environment. Thus, in dual-task paradigms, the central executive plays a crucial role in the scheduling and shifting of resources between tasks, and it can be used to explain the decline in performance that may be observed even when the two tasks in question involve different memory subsystems (Baddeley, 1992). Finally, it has often been pointed out that the central executive concept is too vague to act as anything other than a kind of placeholder for what is undoubtedly a much more complex system than is implied by the positing of a unitary and homunculus-like central cognitive operator (for a model of executive cognition, see Shallice, 1982). Provided, however, that the concept is not taken too literally, it can serve as a convenient way to refer to the complex and variegated set of processes that constitute the brain’s executive system.
Built into the architecture of the working memory model is a separation between domain-specific (p. 393) mechanisms of memory maintenance and domain-general mechanisms of executive control. Thus, the verbal component of working memory, or the phonological loop, is viewed as a “slave” system that can be mobilized by the central executive when verbal material has to be retained in memory over some uncertain delay. Within the phonological loop, it is the interplay of two components—the phonological store and the articulatory rehearsal process—that enables representations of verbal material to be kept in an active state. The phonological store is a passive buffer in which speech-based information can be stored for brief (approximately 2 seconds) periods. The articulatory control process serves to refresh and revivify the contents of the store, thus allowing the system to maintain short sequences of verbal items in memory for an extended interval. This division of labor between two interlocking components, one an active process and the other a passive store, is crucial to the model’s explanatory power. For instance, when the articulatory control process is interfered with through the method of articulatory suppression (e.g., by requiring subjects to say “hiya” over and over again), items in the store rapidly decay, and recall performance suffers greatly. The store, then, lacks a mechanism of reactivating its own contents but possesses memory capacity, whereas conversely, the articulatory rehearsal process lacks an intrinsic memory capacity of its own but can exert its effect indirectly by refreshing the contents of the store.
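This division of labor lends itself to a back-of-the-envelope sketch. The roughly 2-second store lifetime comes from the text above; the per-item articulation times and the all-or-none survival rule are simplifying assumptions introduced only for illustration.

```python
# Toy sketch of the phonological loop: an item survives if the
# articulatory rehearsal loop can revisit it before its trace decays,
# i.e. if one full rehearsal cycle fits within the store's lifetime.
# The ~2 s lifetime follows the text; articulation times are invented.

def recallable(n_items, articulation_time, store_lifetime=2.0):
    """Return True if every item can be refreshed before it decays."""
    cycle_time = n_items * articulation_time
    return cycle_time <= store_lifetime

# Short words (~0.3 s each): a 6-item list can be cycled in time.
print(recallable(6, 0.3))  # True
# Long words (~0.5 s each): the same list length exceeds the window.
print(recallable(6, 0.5))  # False
```

This simple rule also captures why articulatory suppression is so damaging: occupying the articulators effectively makes the cycle time infinite, so no item is refreshed before its trace decays.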
The other slave system in the working memory model is the visual-spatial sketchpad, which is critical for the online retention of object and spatial information. Again, as is suggested by the term “sketchpad,” the maintenance of visual-spatial imagery in an active state requires top-down, or strategic, processing. As with the phonological loop, where articulatory suppression interferes with the maintenance of verbal information, a concurrent processing demand in the visual-spatial domain, such as tracking a spot of light moving on a screen, making random eye movements, or presenting subjects with irrelevant visual information during learning, also impairs memory performance. Although the symmetry between sensory and motor representations of visual-spatial information is less obvious than it is in the case of speech, it has been demonstrated that covert eye movement is important for the maintenance of spatial information (Postle, D’Esposito, & Corkin, 2005). Baddeley (1986) initially proposed that in the context of spatial memory, covert eye movements can act as a way of revisiting locations in memory, and thus operate very much like the articulatory rehearsal process known to be important for the maintenance of verbal information. Moreover, requiring subjects to perform a spatial interference task that disrupts or otherwise occupies this rehearsal component significantly impairs the performance of tests of spatial working memory, but may have little effect on nonspatial visual memory tasks (Cocchini, Logie, Della Sala, MacPherson, & Baddeley, 2002; Della Sala, Gray, Baddeley, Allamano, & Wilson, 1999). In contrast, retention of visual shape or color information is interfered with by visual-perceptual input, but not by a concurrent demand in the spatial domain (Klauer & Zhao, 2004). Thus, the principles that underlie the operation of the phonological loop are qualitatively similar to those that underlie the operation of the visual-spatial sketchpad; in both cases, maintenance processes consist of covert motor performance that serves to reactivate the memory traces residing in sensory stores. This mechanism might be most simply described as “remembering by doing,” a strategy that is most effective when a motor code, which can be infinitely regenerated and is under the subject’s voluntary control, can be substituted for a fragile and less easily maintained perceptual memory code.
tive domains (Baddeley, Allen, & Vargha-Khadem, 2010; Prabhakaran, Narayanan, Zhao,
& Gabrieli, 2000).
Summary
The working memory model of Baddeley and colleagues describes a system for the maintenance and manipulation of information that is stored in domain-specific memory buffers. Separate cognitive components are dedicated to the functions of storage, rehearsal, and executive control. Informational encapsulation and domain segregation dictate that auditory-verbal and visual information be kept in separate storage subsystems—the phonological loop and the visual-spatial sketchpad, respectively. These storage subsystems themselves comprise specialized components for the passive storage of memory traces, which are subject to time- and interference-based decay, and for the reactivation of these memory traces by way of simulation, or rehearsal. Thus, storage components represent memory traces, but have no internal means of refreshing them, whereas rehearsal processes (e.g., articulatory, saccadic) have no mnemonic capacity of their own, but can reactivate the decaying traces held in temporary stores. In the succeeding sections we examine how neuroscience has built on the cognitive foundation of the working memory model of Baddeley and colleagues to refine our understanding of how information is maintained and manipulated in the brain. We will see that in some cases neuroscientific evidence has bolstered and reinforced aspects of the working memory model, whereas in other cases neuroscience has compelled a departure from certain core principles of the Baddeleyan concept.
The normal chimpanzee has considerable facility in using sticks or other objects to manipulate its environment, e.g., to reach a piece of food beyond its unaided reach. It can solve such problems when it must utilize several sticks, some of which may not be immediately available in the visual field. After ablation of the pre-frontal areas, the chimpanzee continues to use sticks as tools but it may have difficulty solving the problem if the necessary sticks and the food are not simultaneously present in the visual field. It exhibits also a characteristic “memory” defect. Given an opportunity to observe a piece of food being concealed under one of two similar cups, it fails to recall after a few seconds under which cup the lure has been hidden.… (p. 317)
In his pioneering experimental work, Jacobsen (1936) discovered that damage to the PFC of the monkey produces selective deficits in a task requiring a delayed response to the presentation of a sensory stimulus. The delayed response tasks were initially devised by Hunter (1913) as a way of differentiating between animals on the basis of their ability to use information not currently available in the sensory environment to guide an imminent response. In the classic version of this test, a monkey is shown the location of a food morsel that is then hidden from view and placed in one of two wells. After a delay period of a few seconds, the monkey chooses one of the two locations and is rewarded if his choice corresponds to the location of the food. Variations on this test include the delayed alternation task, the delayed match-to-sample task, and the delayed nonmatch-to-sample task. The family of delayed-response tasks measures a complex cognitive ability that requires at least three clearly identifiable subprocesses: to recognize and properly encode the to-be-remembered item, to hold an internal representation of the item “online” across an interval of time, and finally, to initiate the appropriate motor command when a response is prompted. Jacobsen showed that lesions to the PFC impair only the second of the above three functions, suggesting a fundamental role for the region in immediate or short-term memory. Thus, monkeys with lesions to PFC perform in the normal range on a variety of tests requiring sensorimotor behavior, such as visual pattern discrimination and motor learning and control—that is, tasks without a short-term mnemonic component. Although the impairments in the performance of delayed-response tasks in Jacobsen’s studies were caused by large (p. 395) prefrontal lesions that often extended into the frontal pole and orbital surface, later studies showed that lesions confined to the region of the principal sulcus produced equally severe deficits (Blum, 1952; Butters, Pandya, Stein, & Rosen, 1972).
Fuster and Alexander (1971) reported the first direct physiological measures of PFC involvement in short-term memory. With microelectrodes placed in the PFC, they measured the firing patterns of neurons during a spatial delayed-response task and showed that many cells had increased firing, relative to an intertrial baseline period, during both cue presentation and the later retention period. Importantly, some cells fired exclusively during the delay period and therefore could be considered pure “memory cells.” The results were interpreted as providing evidence for PFC involvement in the focusing of attention “on information that is being or that has been placed in temporary memory storage for prospective utilization” (p. 654). Many subsequent electrophysiological studies have demonstrated memory-related activity in the PFC of the monkey during delayed-response tasks of various kinds (e.g., Joseph & Barone, 1987; Niki, 1974; Niki & Watanabe, 1976; Quintana, Yajeya, & Fuster, 1988), although it was Patricia Goldman-Rakic who first drew a parallel (but see Passingham, 1985) and then firmly linked the phenomenon of persistent activity in PFC to the cognitive psychological concept of “working memory.” In a review of the existing literature on the role of the PFC in short-term memory, Goldman-Rakic (1987), citing lesion and electrophysiological studies in the monkey, human neuropsychology, and the cytoarchitectonics and cortical-cortical connections of the PFC, argued that the dorsolateral PFC (the principal sulcus of the monkey) plays an essential role in holding visual-spatial information in memory before the initiation of a response and in the absence of guiding sensory stimulation. In this and later work (especially that of Wilson, 1993), Goldman-Rakic developed a model of PFC in which visual-spatial and (visual) object working memory were topographically segregated, with the former localized to the principal sulcus and the latter localized to a more ventral region along the inferior convexity of the lateral PFC (Figure 19.2).
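The epoch-based logic behind labeling a unit a pure “memory cell” can be sketched as follows; the firing rates and the 2 Hz threshold are invented for illustration and are not values from Fuster and Alexander (1971).

```python
# Classify a unit by which task epochs elevate its firing rate above
# the intertrial baseline. Threshold and rates are illustrative only.
def classify(baseline_hz, cue_hz, delay_hz, threshold=2.0):
    cue_active = (cue_hz - baseline_hz) >= threshold
    delay_active = (delay_hz - baseline_hz) >= threshold
    if delay_active and not cue_active:
        return "pure memory cell"   # fires only while the item is held in mind
    if delay_active and cue_active:
        return "cue + delay cell"
    return "no delay activity"

print(classify(baseline_hz=5, cue_hz=6, delay_hz=12))   # pure memory cell
print(classify(baseline_hz=5, cue_hz=15, delay_hz=14))  # cue + delay cell
```

The interesting case for the working-memory interpretation is the first one: elevated firing across the delay, in the absence of any stimulus-driven response, is what ties the unit's activity to an internally maintained representation rather than to perception or movement.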
This domain-specific view of prefrontal organization, which was supported by observed dissociations in the responsivity of neurons in dorsal and ventral areas of the PFC during delayed response tasks, could be viewed as an anterior expansion of the dorsal (“where”) and ventral (“what”) streams that had been discovered in the visual system in posterior neocortex (Ungerleider & Mishkin, 1982). In addition, the parallel and modular nature of the proposed functional and neuroanatomical architecture of PFC was in keeping with the tenet of domain independence in the working memory model of Baddeley and colleagues.
The connection between persistent activation in the PFC of the monkey and a model of memory developed in the field of cognitive psychology might seem tenuous, especially in light of the fact that the working memory model was originally formulated on the basis of evidence derived from behavioral studies using linguistic material—an informational medium clearly unavailable to monkeys. For Goldman-Rakic, though, the use of the term “working memory” in the context of nonhuman primate electrophysiology was intended not as an offhand or otherwise desultory nod to psychology (Goldman-Rakic, 1990), but rather as a reasoned and deliberate effort to unify both our understanding of and manner of referencing a common neurobiological mechanism underlying an aspect of higher cognition that is well developed in primate species. Certainly, in retrospect, the decision to label the phenomenon of persistent activity in PFC with the term “working memory” has had an immeasurable impact on memory research, and indeed may be thought of as one of the two or three most important events contributing to the emergence of an integrated and unified approach to the study of neurobiology and psychology.
At about the same time as Fuster and Alexander (1971) recorded neural activity in the monkey PFC during a working memory task, Ingvar and colleagues examined variation in regional cerebral blood flow (rCBF) during tasks requiring complex mental activity. Indeed, Risberg and Ingvar (1973), in the first functional neuroimaging study of short-term memory, showed that during a backward digit span task, the largest increases in rCBF, compared with a resting baseline, were observed in pre-rolandic and anterior frontal cortex. It was not, however, until the emergence of PET and the development of the oxygen-15 tracer that the mapping of brain activity during the exercise of higher mental functions would become genuinely amenable to the evaluation of complex hypotheses about the neural basis of cognition. In the middle and late 1980s, technological advances in the PET technique, with its relatively high spatial resolution (approximately 1 cm3), were accompanied by a critical conceptual (p. 396) innovation known as cognitive subtraction, which provided the inferential machinery needed to link regional variation in brain activity to
experimental manipulations at the task or psychological level (Posner, Petersen, Fox, &
Raichle, 1988). Thus, for any set of hypothesized mental processes (a,b,c), if a task can be
devised in which one condition recruits all of the processing components (Task 1a,b,c) and
another condition recruits only a subset of the components (Task 2a,b), subtraction of the
observed regional activity during Task 2 from that observed during Task 1 should reveal
the excess neural activity attributable to Task 1, which can then be associated with
the cognitive component c. The working memory model of Baddeley, with its discrete cog
nitive components (e.g., central executive, phonological loop, and visual-spatial scratch
pad) was an ideal model with which to test the power of cognitive subtraction using mod
ern neuroimaging tools. Indeed, in the span of only 2 years, the landmark studies of
Paulesu et al. (1993), Jonides et al. (1993), and D’Esposito et al. (1995) had mapped all of
the cognitive components of the working memory model onto specific regions of the cere
bral cortex. The challenge in successive years was to go beyond this sort of “psychoneur
al transcription”—which is necessarily a unidirectional mapping between the cognitive
box and the cerebral convolution—and begin to develop models that generate hypotheses
that refer directly to the brain regions and mechanisms that underlie working memory. In
the following sections, we review how cognitive neuroscience studies of short-term mem
ory and executive control used the working memory model to gain an initial neural
foothold, upon which later studies were buttressed, and which would lead to insights and
advances in our understanding of working memory as it is implemented in the brain.
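The subtraction logic can be made concrete with a toy numerical sketch (purely illustrative; the voxel counts, signal values, and noise level here are arbitrary assumptions, not parameters from any study cited in this chapter):

```python
import numpy as np

rng = np.random.default_rng(0)
n_voxels = 1000

def noise():
    return rng.normal(0.0, 0.1, n_voxels)

# Hypothetical activation maps: each cognitive component (a, b, c)
# contributes signal to its own set of voxels.
component_a = np.zeros(n_voxels); component_a[:100] = 1.0
component_b = np.zeros(n_voxels); component_b[100:200] = 1.0
component_c = np.zeros(n_voxels); component_c[200:300] = 1.0

# Task 1 recruits components a, b, and c; Task 2 recruits only a and b.
task1 = component_a + component_b + component_c + noise()
task2 = component_a + component_b + noise()

# Subtracting Task 2 from Task 1 should leave only the activity
# attributable to component c (plus noise).
difference = task1 - task2
c_voxels = difference[200:300].mean()   # ~1.0: component c revealed
other_voxels = difference[:200].mean()  # ~0.0: shared components cancel
print(round(c_voxels, 1), round(other_voxels, 1))
```

Because components a and b contribute equally to both conditions, they cancel in the difference map, leaving only the voxels attributable to component c.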
The first PET study of visual-spatial working memory was carried out by Jonides and
colleagues in 1993 using the logic of cognitive subtraction to isolate mnemonic processes
associated with the maintenance of visual-spatial information, in a task very similar to the
tasks used by Goldman-Rakic and her colleagues with monkeys (Funahashi,
Bruce, & Goldman-Rakic, 1989; Goldman-Rakic, 1987). During “memory” scans, subjects
were shown an array of three dots appearing for 200 ms on the circumference of a 14-mm
imaginary circle and instructed to maintain the items in memory during a 3-second reten
tion interval. This was followed by a probe for location-memory consisting of a circular
outline that either did or did not (with equal probability) enclose one of the previously
memorized dots, and to which subjects responded with a yes/no decision. In “perception”
scans, the three dots and the probe outline were presented simultaneously, so that sub
jects did not have to retain the location of the items in memory during a delay, but instead
simply had to decide whether the outline encircled one of the three displayed dots (Fig
ure 19.3).
Subtraction of the “perception” scans from the “memory” scans revealed a right-lateral
ized network of cortical regions that would become a hallmark of neuroimaging studies of
visual-spatial working memory: the posterior parietal lobe, dorsal premotor cortex, occipi
tal cortex (Brodmann area 19), and PFC. In their interpretation of the findings, the au
thors suggested that the occipital activity reflected a role in the creation, but not neces
sarily the maintenance, of an internal visual image of the dot pattern, and that activity in
the PFC might reflect one of two things: (1) the literal storage of a representation of the
image in memory during the delay, or (2) the representation of a pointer or link to other
brain circuitry, perhaps in the occipital or parietal lobe, that is actually responsible for
maintaining the memory engram. These two explanations for the observation of pre
frontal activity during working memory tasks, which in later years would often be pitted
against each other, nicely framed the emerging debate on the division of labor among the
cortical regions involved in the maintenance of information in working memory.
A major aim of many of the early neuroimaging studies of visual-spatial working memory
was to duplicate the canonical finding of Goldman-Rakic and colleagues of a dorsal-ven
tral dissociation in monkey PFC for spatial and object working memory. Studies by
Petrides et al. (1993) and McCarthy et al. (1994) demonstrated with PET and fMRI, re
spectively, that mid-dorsolateral PFC (Brodmann areas 9 and 46) shows increased activity
during spatial working memory when compared with a control condition. An attempt to
show a neuroanatomical double dissociation between spatial and object working memory
was undertaken by Smith et al. (1995) in a PET study that used carefully controlled non
verbalizable object stimuli that were presented in both object and spatial task contexts.
This study found distinct brain circuits for the storage of spatial and object information,
with spatial working memory relying primarily on right-hemisphere regions in the pre
frontal (BA 46) and parietal (BA 40) cortices, and object working memory involving only a
left inferotemporal area. These results, however, only partially replicated the monkey
study of Wilson et al. (1993), who had found distinct dorsal and ventral regions in lateral
PFC for spatial and object working memory. A similar pattern was found by McCarthy et
al. (1994), in which regional differences between object and spatial working mem
ory were most pronounced across hemispheres rather than between dorsal and ventral
divisions of the PFC. In a contemporaneous review and meta-analysis of all human neu
roimaging studies of working memory, D’Esposito et al. (1998) showed that there was vir
tually no evidence for a neuroanatomical dissociation between spatial and object working
memory in the PFC, a finding that was supported by a later and more exhaustive quanti
tative meta-analysis (Wager & Smith, 2003). Indeed, establishing a correspondence be
tween the functional neuroanatomy of visual-spatial working memory in the monkey and
human brains has proved remarkably difficult, leading to a protracted debate between
monkey neurophysiologists and human neuroimaging researchers about the
proper way to conceptualize the functional topography of working memory in the PFC
(Goldman-Rakic, 2000; E. K. Miller, 2000). Increasingly, efforts were made to adapt hu
man neuroimaging studies to resemble as closely as possible the kinds of tasks used in
animal electrophysiology, such as the delayed match-to-sample procedure. The emer
gence of event-related fMRI, with spatial and temporal resolution superior to that of oxy
gen-15 PET, was critical to this new effort at cross-disciplinary synthesis and reconcilia
tion, and led to a number of fundamental insights on the brain basis of working memory,
to the discussion of which we now turn.
Early PET studies of working memory relied exclusively on the logic of cognitive subtrac
tion to isolate hypothesized components of a complex cognitive task. Thus, even for work
ing memory tasks that consisted of a number of temporal phases within a given trial (e.g.,
stimulus presentation → memory maintenance → recognition decision), the low temporal
resolution of PET prohibited separate statistical assessment of activity within a single
task phase. Event-related fMRI, on the other hand, with its temporal resolution on the or
der of 2 to 4 seconds, could be used to examine functional activity in different portions of
a multiphase trial, provided that each of the sequential task components was separated
by approximately 4 seconds (Zarahn, Aguirre, & D’Esposito, 1997). This methodology per
mits the isolation of maintenance-related activity during the delay period of a match-to-
sample procedure without relying on a complex cognitive subtraction.
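The phase-separation idea can be sketched as a toy general linear model (a hypothetical illustration: the HRF shape, trial timings, and noise level are assumptions, not the design of any study discussed here). Each trial phase gets its own regressor, formed by convolving a phase-specific boxcar with a hemodynamic response function:

```python
import numpy as np

TR = 2.0                     # seconds per scan
t = np.arange(0, 40, TR)     # one 40-s trial, 20 scans

def hrf(time):
    # Simple gamma-shaped HRF, peaking roughly 5-6 s after the event.
    return (time ** 5) * np.exp(-time) / 120.0

def phase_regressor(onset, duration):
    boxcar = ((t >= onset) & (t < onset + duration)).astype(float)
    return np.convolve(boxcar, hrf(np.arange(0, 30, TR)))[: len(t)]

# Stimulus (0-2 s), delay (2-10 s), probe (10-12 s): sequential phases
# separated by several seconds, as event-related designs require.
X = np.column_stack([phase_regressor(0, 2),
                     phase_regressor(2, 8),
                     phase_regressor(10, 2)])

# Simulate a voxel that is active only during memory maintenance.
true_betas = np.array([0.0, 1.0, 0.0])
y = X @ true_betas + np.random.default_rng(1).normal(0, 0.01, len(t))

betas, *_ = np.linalg.lstsq(X, y, rcond=None)
print(betas)   # delay-period beta is recovered separately from the others
```

Fitting the three regressors jointly is what allows maintenance-related activity to be estimated on its own, without subtracting a separate control condition.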
Courtney et al. (1998) argued that the localization of spatial delay-period activity
in the superior frontal sulcus (posterior and superior to BA 46) indicated an evolutionary
displacement in the functional anatomy of the PFC, possibly owing to the emergence of
new cognitive abilities such as abstract reasoning, complex problem solving, and plan
ning for the future. In short, then, this study was the first functional neuroimaging study
to fully replicate the object versus spatial working memory dissociation shown by Gold
man-Rakic and colleagues, insofar as one accepts their proposal that the human homo
logue to the monkey principal sulcus is located, not in the middle frontal gyrus or BA 46,
but rather in the superior frontal sulcus.
While several subsequent studies of spatial working memory offered further support (Le
ung, Seelig, & Gore, 2004; Munk et al., 2002; Sala, Rama, & Courtney, 2003; Walter et al.,
2003) for a specifically mnemonic role of the superior frontal sulcus in tasks of spatial
working memory, other studies failed to replicate the finding (Postle, Berger, &
D’Esposito, 1999; Postle, Berger, Taich, & D’Esposito, 2000; Srimal & Curtis, 2008). For in
stance, although Postle et al. (2000) observed delay-period activity in this region during a
spatial working memory task, they also found it to be equally active during the generation
of two-dimensional saccades, a task that required visual-spatial attention and motor con
trol but placed no demands on memory storage. Using a paradigm that varied the length
of the delay period in a memory-guided saccade task similar to that used by Funahashi
and colleagues in the monkey, Srimal and Curtis (2008) failed to show any maintenance-
related activity in the superior frontal sulcus, casting doubt on whether this area
is the human homologue of the monkey principal sulcus as originally suggested by Court
ney et al. (1998). Srimal and Curtis (2008) note, however, that all of the studies of spatial
working memory that have shown delay-period activity in human superior frontal sulcus
have used tasks that required maintenance of multiple spatial locations. This suggests
that activity in this region reflects the operation of higher order processes that are need
ed when one is required to maintain the spatial relation or configuration of multiple ob
jects in space. Moreover, Curtis et al. (2004) have shown using the memory-guided sac
cade task that the FEF and intraparietal sulcus (IPS) are the two areas that not only show
robust delay-period activity but also have activity that predicts the accuracy of the gener
ated saccade (Figure 19.4). This indicates that the very areas that are known to be impor
tant for the planning and preparation of eye movements and, more generally, for spatial
selective attention (Corbetta, Kincade, & Shulman, 2002) are also critical for working
memory maintenance (Logie, 1995). In addition, recent evidence has shown that internal
shifts of attention to locations in working memory not only activate the frontoparietal oculomotor sys
tem but also can activate early visual cortex (including V1) in a retinotopic manner
(Munneke, Belopolsky, & Theeuwes).
Many studies have investigated the maintenance of visual objects, mostly faces, houses,
scenes, and abstract shapes that are not easily verbalizable (e.g., Belger et al., 1998;
Courtney, Ungerleider, Keil, & Haxby, 1996, 1997; Druzgal & D’Esposito, 2001, 2003; Lin
den et al., 2003; G. McCarthy et al., 1994; Mecklinger, Gruenewald, Besson, Magnie, &
Von Cramon, 2002; Postle & D’Esposito, 1999; Postle, Druzgal, & D’Esposito, 2003; Rama,
Sala, Gillen, Pekar, & Courtney, 2001; Sala et al., 2003; Smith et al., 1995). A consistent
finding has been that posterior cortical areas within the inferior temporal lobe that may
preferentially respond to the presentation of certain categories of complex visual objects
also tend to activate during object working memory tasks. For example, the fusiform
gyrus, found along the ventral surface of the temporal lobe, shows greater activation
when a subject is shown pictures of faces than when shown other types of complex visual
stimuli like pictures of houses or scenes or household objects (Kanwisher, McDermott, &
Chun, 1997). Indeed, given its selective response properties, the fusiform gyrus
has been termed the “fusiform face area,” or FFA.
There are four important findings that indicate that posterior extrastriate cortical regions
like the FFA play an important role in the mnemonic storage of object features. First, the
FFA shows persistent delay-period activity (Druzgal & D’Esposito, 2001, 2003; Postle et
al., 2003; Rama et al., 2001) during working memory tasks. Second, the activity in the
FFA is selective for faces; it is greater during delays in which subjects are maintaining
faces compared with other objects (Sala et al., 2003). Third, as the number of faces that
are being maintained increases, the magnitude of the delay-period activity increases in
the FFA (Druzgal & D’Esposito, 2001, 2003; Jha & McCarthy, 2000). Such load effects
strongly suggest a role in short-term storage because, as the number of items that must
be represented increases, so should the storage demands. Fourth, using a delayed paired-
associates task, Ranganath et al. (2004) showed that the FFA responds during an unfilled
delay interval following the presentation of a house that the subject has learned is associ
ated with a certain face. Therefore, the delay-period FFA activity likely reflects the reacti
vated image of the associated face that was retrieved from long-term memory despite the
fact that no face was actually presented before the delay. Together, these studies suggest
that posterior regions of visual association cortex, like the FFA, participate in the internal
storage of specific classes of visual object features. Most likely, the mechanisms used to
create internal representations of objects that are no longer in our environment are simi
lar to the mechanisms used to represent objects that exist in our external environment.
What happens to memory representations for visual objects when subjects momentarily
divert their attention elsewhere (i.e., to a competing task or a different item in memo
ry)? Lewis-Peacock et al. (2012) have exploited the enhanced sensitivity of multivoxel
analyses of fMRI data to answer this question. In their paradigm, a subject is presented
with two items displayed on the screen, one word and one picture, followed by a cue indi
cating which of the two stimuli to retain in memory. After an 8-second delay, a second cue
appears indicating for the subject either to retain the same item in memory or else to re
trieve and maintain the item that the subject had previously been instructed to ignore for
another 8 seconds. The authors found that when an item is actively maintained in the fo
cus of attention, multivoxel patterns of brain activation can be used to “decode” whether
the item in memory is a picture or a word. However, if the item is not actively maintained
in working memory, there is not sufficient information in the multivoxel patterns to identi
fy it. This is surprising in light of the behavioral data showing that subjects are easily able
to retrieve the ignored item when they are cued to do so (on “switch trials”). Thus, a
memory may be resident within “short-term memory” while not being actively main
tained, and yet leave no discernible neurophysiological footprint in the blood-oxygen-
level-dependent (BOLD) signal indicating that the item is in a heightened state of activation. Two im
portant points may be taken from this work—first, that analyses of patterns of activity of
fer increased power and specificity when investigating the brain’s representations of indi
vidual memories; and, second, that increased BOLD activity during working memory
tasks reflects processes associated with active maintenance and may not be able to de
tect memory representations that are accessible but nevertheless outside the direct focus
of attention.
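The decoding approach can be sketched as follows (a schematic, assumed setup; the simple template-correlation classifier here stands in for the more sophisticated multivoxel methods actually used). A category is read out from the pattern of activity across voxels, and the readout succeeds only when the item's pattern is actively expressed:

```python
import numpy as np

rng = np.random.default_rng(2)
n_voxels = 200

# Hypothetical multivoxel "templates" for the two stimulus categories.
word_template = rng.normal(0, 1, n_voxels)
picture_template = rng.normal(0, 1, n_voxels)

def simulate_trials(template, strength, n_trials=20):
    # An actively maintained item expresses its template (strength > 0);
    # in this toy example, an unattended item expresses none of it.
    return template * strength + rng.normal(0, 1, (n_trials, n_voxels))

def decode(trials, templates):
    # Nearest-template classifier based on pattern correlation.
    corr = np.array([[np.corrcoef(trial, tpl)[0, 1] for tpl in templates]
                     for trial in trials])
    return corr.argmax(axis=1)

templates = [word_template, picture_template]
attended = simulate_trials(picture_template, strength=1.0)
unattended = simulate_trials(picture_template, strength=0.0)

acc_attended = (decode(attended, templates) == 1).mean()
acc_unattended = (decode(unattended, templates) == 1).mean()
print(acc_attended, acc_unattended)   # high vs. near-chance decoding
```

When the pattern carries no category signal, classification falls to chance even though, behaviorally, the item remains retrievable.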
that cut across the representational domain, arguing for a modality-independent storage system
that the authors linked to Cowan’s concept of a focus of attention (Cowan, 1988). What is
actually being represented in the IPS, however, is still a matter of debate; one need not
assume that what is being “stored” is a literal representation of an
object—it may instead reflect the deployment of a limited pool of attentional resources that
scales with the number of items that are maintained in memory.
In the past several years there has been a great deal of interest in the question as to
whether working memory capacity can be expanded through practice and training (Jaeg
gi, Buschkuehl, Jonides, & Perrig, 2008; Klingberg, 2010; Morrison & Chein, 2011). Stud
ies of individual differences in working memory abilities have shown that the construct is
highly correlated (r ∼ 0.5) with fluid intelligence (Engle, Tuholski, Laughlin, & Conway,
1999; Oberauer, Schulze, Wilhelm, & Suss, 2005)—and it therefore stands to reason that
if working memory capacity could be expanded through training, one might thereby enhance a
person’s general problem-solving abilities and overall intelligence. A central concern of
such training research is in demonstrating that practice with a working memory task con
fers a benefit, not just in performance on the same or similar types of tasks, but also in
cognitive performance on activities that share little surface similarity with the training
task (Shipstead, Redick, & Engle, 2012)—a phenomenon called far transfer. Although there
is some evidence supporting the hypothesis that training on a working memory task can
lead to improvements on other cognitive tasks (Chein & Morrison, 2010; Jaeggi,
Buschkuehl, Jonides, & Shah, 2011; Klingberg et al., 2005), the largest study of computer
ized cognitive training involving more than 10,000 participants failed to demonstrate far
transfer (Owen et al., 2010). Nevertheless, research on this topic is still in its infancy, and
it seems plausible that small improvements in working memory capacity could be
achieved through training.
dozens of sophisticated computational models. Finally, the study of aphasic patients has
provided a wealth of information about the neural circuitry underlying language, and sys
tematic neurological and neuropsychological inquiries into the impairments that accom
pany damage to the language system have yielded detailed neuroanatomical models. The
aphasia literature notwithstanding, the study of the neural basis of verbal working memo
ry has depended, to a much greater extent than has been the case in the visual-spatial do
main, on pure cognitive models of memory, in particular the phonological loop of Badde
ley and colleagues. Not surprisingly, as it turns out, there are notable similarities be
tween working memory for visual material and working memory for linguistic material,
despite the absence of an exactly analogous capacity in nonhuman primates.
eas, in the inferior frontal and superior temporal cortices, that subserve language func
tion.
In the 1960s, a handful of patients were described who did not fit neatly into the classic
aphasiological rubric. Both Luria (1967) and Warrington and Shallice (1969) described pa
tients with damage to the temporal-parietal cortex who were severely impaired at repeat
ing sequences of words or digits spoken aloud by the experimenter. Luria referred to the
deficit as an acoustic-mnestic aphasia, whereas Warrington and Shallice (1969), who were
perhaps more attuned to extant information processing models in cognitive psychology,
referred to the deficit as a “selective impairment of auditory-verbal short-term memory.”
In both of these cases, however, the memory impairment was accompanied by a deficit in
ordinary speech production (i.e., word-finding difficulties, errors of speech, and reading
difficulty), which was, in fact, consistent with the more common diagnosis of conduction
aphasia, and therefore complicated the argument in favor of a “pure” memory impair
ment. Several years later, however, a patient (JB) (Shallice & Butterworth, 1977), also
with a temporal-parietal lesion, was described who had a severely reduced auditory-ver
bal immediate memory span (one or two items) and yet was otherwise unimpaired in ordi
nary language use, including speech production and even long-term learning of supra-
span lists of words. Several other such patients have since been described (for a review,
see Shallice & Vallar, 1990), thus strengthening the case for the existence of an auditory-
verbal storage component located in temporal-parietal cortex.
The puzzle, of course, with respect to the classic neurological model of language dis
cussed above, is how a lesion in the middle of the perisylvian speech center could pro
duce a deficit in auditory-verbal immediate memory without any collateral deficit in basic
language functioning. One possibility is that the precise location of the brain injury is de
terminative, so that a particularly focal and well-placed lesion in temporal-parietal cortex
might spare cortex critical for speech perception and speech production, while damaging
a region dedicated to the storage of auditory-verbal information. However, the number of
patients who have been described with a selective impairment to auditory-verbal short-
term memory is small, and the lesion locations that have been reported are comparable to
those that might, in another patient, have led to conduction or Wernicke’s aphasia (A. R.
Damasio, 1992; Dronkers, Wilkins, Van Valin, Redfern, & Jaeger, 2004; Goodglass, 1993).
This would seem, then, to be a question particularly well suited to high-resolution func
tional neuroimaging.
The first study that attempted to localize the components of the phonological loop in the
brain was that of Paulesu and colleagues (1993). In one task, English letters were visually
presented on a monitor and subjects were asked to remember them. In a second task, let
ters were presented and rhyming judgments were made about them (press a button if the let
ter rhymes with “B”). In a baseline condition, Korean letters were visually presented and
subjects were asked to remember them using a visual code. According to the authors’ log
ic, the first task would require the contribution of all the components of the
phonological loop—subvocal rehearsal, phonological storage, and executive processes—
and the second (rhyming) task would only require subvocal rehearsal and executive
processes. This reasoning was based on previous research showing that when letters are
presented visually (Vallar & Baddeley, 1984), rhyming decisions engage the subvocal re
hearsal system, but not the phonological store. Thus, a subtraction of the rhyming condi
tion from the letter-rehearsal condition should isolate the neural locus of the phonological
store. First, results were presented comparing the two tasks requiring phonological processing
with the baseline task (viewing Korean letters), which did not. Several areas were shown to
be significantly more active in the phonological tasks, including (in all cases, bilaterally):
Broca’s area (BA 44/45), the supplementary motor area (SMA), the insula, the cerebel
lum, Brodmann area 22/42, and Brodmann area 40. Subtracting the rhyming condition
from the phonological short-term memory condition left a single brain area: Brodmann
area 40—the neural correlate of the phonological store.
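The component logic of this subtraction can be written out as simple set operations (a schematic restatement of the reasoning, not the authors' analysis):

```python
# Hypothesized processing components for each condition of the
# Paulesu et al. (1993) design.
memory_task = {"subvocal rehearsal", "phonological store", "executive processes"}
rhyming_task = {"subvocal rehearsal", "executive processes"}
baseline_task = set()   # Korean letters, maintained via a visual code

# Phonological tasks versus baseline: every phonological component remains.
print(memory_task - baseline_task)

# Memory task minus rhyming task: only the phonological store remains,
# which Paulesu et al. localized to Brodmann area 40.
isolated = memory_task - rhyming_task
print(isolated)   # {'phonological store'}
```

The set difference makes explicit why the final subtraction was taken to isolate the phonological store: it is the only hypothesized component not shared by the two conditions.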
Not surprisingly, the articulatory rehearsal process recruited a distributed neural circuit
that included the inferior frontal gyrus. Activation of multiple brain regions during articu
latory rehearsal is not surprising, given the complexity of the process and the variety of
lesion sites associated with a speech production deficit. On the other hand, the localiza
tion of the phonological store in a single brain region, BA 40 (or the supramarginal
gyrus), comports with the idea of a solitary “receptacle,” where phonological information
is temporarily stored. A number of follow-up PET studies, using various tasks and design
logic, generally replicated the basic finding of the Paulesu study, namely a frontal-insular
cerebellar network associated with rehearsal processes, and a parietal locus for the
phonological store (Awh et al., 1996; Jonides et al., 1998; Salmon et al., 1996; Schumach
er et al., 1996; Smith & Jonides, 1999).
In a review of these pre-millennium PET studies of verbal working memory, Becker (1999)
questioned whether the localization of the phonological store in BA 40 of the parietal cor
tex could be reconciled with the logical architecture of the phonological loop. For in
stance, one key aspect of the phonological loop model is that auditory information
(whether it be speech, tones, music, or white noise), but not visual information, has oblig
atory access to the phonological store. The reason for this asymmetry is to account for
dissociations in memory performance that depend on the modality in which information is
presented. For instance, the presentation of distracting auditory information while sub
jects attempt to retain a list of verbal items in memory impairs performance on tests of
recall. In contrast, the presentation of distracting visual information during verbal memo
ry retention has no impact on verbal recall. This phenomenon, known as the irrelevant
sound effect, is explained by assuming that auditory information—whether relevant or ir
relevant—always enters the phonological store, but that visual-verbal information only en
ters the store when it is explicitly subvocalized. Becker and colleagues, however, noted
that if indeed auditory information has obligatory access to the phonological store, its
“neural correlate” should be active even during passive auditory perception. Functional
neuroimaging studies of passive auditory listening (e.g., with no memory component),
however, do not show activity in the parietal lobe, but rather show activation that is large
ly confined to the superior temporal lobe (e.g., Binder et al., 2000). In addition, efforts to
show verbal mnemonic specificity to maintenance-related activity in the parietal lobe
have not been successful, showing instead that working memories for words, visual ob
jects, and spatial locations all activate the area (Badre, Poldrack, Pare-Blagoev, Insler, &
Wagner, 2005; Nystrom et al., 2000; Zurowski et al., 2002). Thus, it would appear that if
there were a true neural correlate to the phonological store, it must reside within the
confines of the auditory cortical zone of the superior temporal cortex.
As was the case in the visual-spatial domain, the emergence of event-related fMRI, with
its ability to isolate delay-period activity during working memory, was an inferential boon
to the study of verbal working memory. Postle (1999) showed with visual-verbal presenta
tion of letter stimuli that delay-period activity in single subjects was often localized in the
posterior-superior temporal cortex rather than the parietal lobe. Buchsbaum (2001) also
used an event-related fMRI paradigm in which, on each trial, subjects were presented
with acoustic speech information that they then rehearsed subvocally for 27 seconds, fol
lowed by a rest period. Analysis focused on identifying regions that were responsive both
during the perceptual phase and during the rehearsal phase of the trial. Activation oc
curred in two regions in the posterior superior temporal cortex, one in the posterior supe
rior temporal sulcus (pSTS) bilaterally and one along the dorsal surface of the left poste
rior planum temporale, that is, in the Sylvian fissure at the parietal-temporal
boundary (area SPT). Notably, although the parietal lobe did show delay-period activity, it
was unresponsive during auditory stimulus presentation. In a follow-up study, Hickok
(2003) showed that the same superior temporal regions (pSTS and SPT) were active both
during the perception and during delay-period maintenance of short (5-second) musical
melodies, suggesting that these posterior temporal storage sites are not restricted to
speech-based, or phonological, information (Figure 19.6). Several subsequent studies
have confirmed the role of SPT in inner speech and verbal working memory (Hashimoto,
Lee, Preus, McCarley, & Wible, 2010; Hickok, Okada, & Serences, 2009; Koelsch et al.,
2009). Acheson et al. (2011) used fMRI to identify posterior temporal regions activated
during verbal working memory maintenance, and then used repetitive transcranial mag
netic stimulation (TMS) to these sites while subjects performed a rapid-paced reading
task that involved language production but no memory load. TMS applied to the posterior
temporal area significantly interfered with paced reading, arguing for a common neural
substrate for language production and verbal working memory.
Stevens et al. (2004) and Rama et al. (2004) have shown that memory for voice identity,
independent of phonological content (i.e., matching speaker identity as opposed to word
identity), selectively activates the mid-STS and the anterior STG of the superior temporal
region, but not the more posterior and dorsally situated SPT region. Buchsbaum et al.
(2005) have further shown that the mid-STS is more active when subjects recall verbal in
formation that is acoustically presented than when the information is visually presented,
whereas area SPT shows equally strong delay-period activity for both auditory and visual
forms of input. This finding is supported by regional analyses of structural MRI in large
groups of patients with brain lesions, which have shown that damage to the STG is most
predictive of auditory short-term memory impairment (Koenigs et al., 2011; Leff et al.,
2009). Thus, it appears that different regions in the auditory association cortex of the su
perior temporal cortex are attuned to different qualities or features of a verbal stimulus,
such as voice information, input modality, phonological content, and lexical status (e.g.,
Martin & Freedman, 2001)—and all of these codes may play a role in the short-term
maintenance of verbal information.
Earlier we posed the question as to how a lesion in posterior sylvian cortex, an area of
known importance for online language processing, could occasionally produce an impair
ment restricted to phonological short-term memory. One solution to this puzzle is that
subjects with selective verbal short-term memory deficits from temporal-parietal lesions
retain their perceptual and comprehension abilities owing to the sparing of the ventral
stream pathways in the lateral temporal cortex, whereas the preservation of speech pro
duction is due to an unusual capacity in these subjects for right-hemisphere control of
speech (Buchsbaum & D’Esposito, 2008b; Hickok & Poeppel, 2004). The short-term mem
ory deficit arises, then, from a selective deficit in auditory-motor integration—or the abili
ty to translate between acoustic and articulatory speech codes—a function that is espe
cially taxed during tests of repetition and short-term memory (Buchsbaum & D’Esposito,
2008a). Conduction aphasia, the aphasic syndrome most often associated with a deficit in
auditory repetition and verbal short-term memory in the absence of any difficulty with
speech perception, may reflect a disorder of auditory-motor integration. Indeed, it has re
cently been shown that the lesion site most often implicated in conduction aphasia cir
cumscribes area SPT in the posterior-most portion of the superior temporal lobe, a link
between a disorder of verbal repetition and a region in the brain often implicated in tasks
of verbal working memory (Buchsbaum et al., 2011; Figure 19.7). Thus, impairment in the
ability to temporarily store verbal information, as occurs in conduction aphasia, may re
sult from damage to a system, area SPT, that is critical for the interfacing of audi
tory and motor representations of sound.
Three of the most important processes that serve to regulate the contents of working
memory are selection, reactivation (or updating), and suppression. Each of these opera
tions functions to regulate what is currently “in” working memory, allowing for the con
tents of working memory to be determined strategically, according to ongoing goals and
actions of the organism. Moreover, these cognitive control processes allow for the best
utilization of a system that is subject to severe capacity limitations; thus, selection is a
mechanism that regulates what enters working memory, suppression serves to prevent
unwanted information from entering (or remaining in) memory, and reactivation offers a
mechanism for retaining information in working memory. All three of these operations fall
under the general category of what we might call “top-down” signals or commands that
the PFC deploys to effectively regulate memory. In the following sections we briefly re
view research on the neural basis of these working memory control functions.
It is easy to see why selection is an important process for regulating the contents of working memory. For instance, if one tries to mentally calculate the product of the numbers 8 and 16, one might attack the problem in stages. First multiply 8 and 6 (yielding 48), then multiply 8 and 10 (yielding 80), and then add together the intermediate values (48 + 80 = 128). For this strategy to work, one has to be able to select the currently relevant numbers in working memory at each stage of the calculation. Thus, at any given
time, it is likely that many pieces of information are competing for access to working
memory. One way to study this type of selection is to devise working memory tasks that
vary the degree of proactive interference, so that on some trials subjects must select the
correct items from among a set of active but task-irrelevant competitors (see Jonides &
Nee, 2006; Vogel, McCollough, & Machizawa, 2005; Yi, Woodman, Widders, Marois, &
Chun, 2004). Functional neuroimaging studies have shown that the left inferior frontal
gyrus is modulated by the number of irrelevant competing alternatives that confront a
subject (Badre & Wagner, 2007; Thompson-Schill, D’Esposito, Aguirre, & Farah, 1997).
Moreover, behavioral measures of the interfering effects of irrelevant information are cor
related with the level of activation in the left inferior frontal gyrus (D’Esposito, Postle,
Jonides, & Smith, 1999; Smith, Jonides, Marshuetz, & Koeppe, 1998; Thompson-Schill et
al., 2002). From a mechanistic standpoint, selection may involve the maintenance of goal
information to bias activation in favor of relevant over irrelevant information competing
for access to working memory (E. K. Miller & Cohen, 2001; Rowe, Toni, Josephs, Frack
owiak, & Passingham, 2000).
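The staged strategy described earlier in this section (computing 8 × 16 by parts) can be rendered as a short sketch. It is purely illustrative: the function name and the dictionary standing in for a limited-capacity store are hypothetical conveniences, not a model of working memory; the only point is the decomposition into parts and the selection of the currently relevant intermediate values.

```python
# Illustrative only: the staged 8 x 16 calculation, with intermediate
# values held in a small dictionary standing in for a limited-capacity
# working memory store from which relevant items are selected.

def staged_product(a, b):
    """Multiply a * b in stages, as in the mental-arithmetic example."""
    wm = {}                            # stand-in "working memory" store
    tens, ones = divmod(b, 10)
    wm["ones_part"] = a * ones         # e.g., 8 * 6  = 48
    wm["tens_part"] = a * tens * 10    # e.g., 8 * 10 = 80
    # Selection: retrieve only the currently relevant intermediate values.
    return wm["ones_part"] + wm["tens_part"]

print(staged_product(8, 16))  # 128
```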
There must, however, be other means of achieving reactivation when there is no direct equivalence between a top-down (e.g., motor) and a bottom-up (e.g., sensory) code, unlike the case of speech, in which every sound that can be produced can also be perceived (Chun & Johnson, 2011). Thus, for instance, there is no obvious "motor equivalent" of an abstract shape, color, or picture of an (expressionless) face, and yet such
images can nevertheless be maintained to some degree in working memory. One possibili
ty is that the maintenance of sensory stimuli that do not have a motor analogue is
achieved by means of a more general mechanism of reactivation that need not rely on di
rect motor to sensory signals, but rather operates in a similar manner to selective atten
tion. Thus, just as attention can be focused on this or that aspect of the current perceptu
al environment, it can also be directed to some subset of the current contents of working
memory. Indeed, if, as we have been suggesting, the neural substrates of memory are
shared with those of perception, then it seems likely that the same mechanisms of selec
tive attention can be applied to both the contents of perception and the contents of mem
ory. A potential mechanism for selective modulation of memory representations is by way
of a top-down bias signal that can target populations of cells in sensory cortex and modu
late their level of activity.
Top-down suppression has the opposite effect of selection or reactivation—it serves to de
crease, rather than increase, the salience of a stimulus representation in working memo
ry. Although it is sometimes argued that suppression in attention and memory is merely
the flip side of selection, and therefore does not constitute a separate processing mecha
nism in its own right, evidence from cognitive neuroscience suggests that this is not the
case. For instance, Gazzaley and colleagues (2005) used a working memory delay task to directly study
the neural mechanisms underlying top-down activation and suppression by investigating
the processes involved when participants were required to select relevant and suppress
irrelevant information. During each trial, participants observed sequences of two faces
and two natural scenes presented in a randomized order. The tasks differed in the in
structions informing the participants how to process the stimuli: (1) remember faces and
ignore scenes, (2) remember scenes and ignore faces, or (3) passively view faces and
scenes without attempting to remember them. In the two memory tasks, the encoding of
the task-relevant stimuli required selective attention and thus permitted the dissociation
of physiological measures of enhancement and suppression relative to the passive base
line. fMRI data revealed top-down modulation of both activity magnitude and processing
speed that occurred above or below the perceptual baseline depending on task instruc
tion. Thus, during the encoding period of the delay task, FFA activity was enhanced when
faces had to be remembered compared with a condition in which they were passively
viewed. Likewise, FFA activity was suppressed when faces had to be ignored (with scenes
now being retained instead across the delay interval) compared with a condition in which
they were passively viewed. Thus, there appear to be at least two types of top-down signal: one that serves to enhance task-relevant information and another that serves to suppress task-irrelevant information. It is well documented that the nervous system uses interleaved inhibitory and excitatory mechanisms throughout the neuraxis (e.g., spinal reflex
es, cerebellar outputs, and basal ganglia movement control networks). Thus, it may not
be surprising that enhancement and suppression mechanisms may exist to control cogni
tion (Knight, Staines, Swick, & Chao, 1999; Shimamura, 2000). By generating contrast
through enhancements and suppressions of activity magnitude and processing speed, top-
down signals bias the likelihood of successful representation of relevant information in a
competitive system.
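The logic of measuring enhancement and suppression against the passive-viewing baseline can be summarized in a few lines. The activity values below are hypothetical, chosen only to mirror the qualitative pattern of results described above, not data from the cited study.

```python
# Hypothetical activity values (arbitrary units) for one region (e.g., FFA)
# under the three instruction conditions of the delay task described above.

def modulation_indices(remember, ignore, passive):
    """Enhancement and suppression of activity relative to a
    passive-viewing baseline."""
    enhancement = remember - passive   # > 0: pushed above baseline
    suppression = passive - ignore     # > 0: pushed below baseline
    return enhancement, suppression

enh, sup = modulation_indices(remember=1.8, ignore=0.9, passive=1.3)
print(round(enh, 3), round(sup, 3))  # 0.5 0.4
```

On this scheme, a region shows top-down modulation whenever both indices are positive, i.e., activity is pushed above baseline when its preferred stimulus is relevant and below baseline when it must be ignored.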
Although it has been proposed that the PFC provides the major source of the types of top-
down signals that we have described, this hypothesis largely originates from suggestive
findings rather than direct empirical evidence. However, a few studies lend direct causal
support to this hypothesis. For example, Fuster and colleagues (1985) investigated the effect of cooling inactivation of specific parts of the PFC on spiking activity of inferior temporal cortex (ITC) neurons during a delayed match-to-sample (DMS) color task. During the delay inter
val in this task—when persistent stimulus-specific activity in ITC neurons is observed—in
activation caused attenuated spiking profiles and a loss of stimulus (p. 409) specificity of
ITC neurons. These two alterations of ITC signaling strongly implicate the PFC as a
source of top-down signals necessary for maintaining robust sensory representations in
the absence of bottom-up sensory activity. Tomita and colleagues (1999) were able to isolate top-down sig
nals during the retrieval of paired associates in a visual memory task. Spiking activity
was recorded from stimulus-specific ITC neurons as cue stimuli were presented to the ip
silateral hemifield. This experiment’s unique feature was the ability to separate bottom-
up sensory signals from a top-down mnemonic reactivation, using a posterior split-brain
procedure that limited hemispheric cross-talk to the anterior corpus callosum connecting
each PFC. When a probe stimulus was presented ipsilaterally to the recording site, thus
restricting bottom-up visual input to the contralateral hemisphere, stimulus-specific neu
rons became activated at the recording site approximately 170 ms later. Because these
neurons received no bottom-up visual signals of the probe stimulus, with the only route
between the two hemispheres being through the PFC, this experiment showed that PFC
neurons were sufficient to trigger the reactivation of object-selective representations in
ITC regions in a top-down manner. The combined lesion/electrophysiological approach in
humans has rarely been implemented. Chao and Knight (1998), however, studied patients
with lateral PFC lesions during DMS tasks. It was found that when distracting stimuli are
presented during the delay period, the amplitude of the recorded ERP from posterior
electrodes was markedly increased in patients compared with controls. These results
were interpreted to show disinhibition of sensory processing and support a role of the
PFC in suppressing the representation of stimuli that are irrelevant for current behavior.
Based on the data we have reviewed thus far, we might propose that any population of
neurons within primary or unimodal association cortex can exhibit persistent neuronal ac
tivity, which serves to actively maintain the representations coded by those neuronal pop
ulations (Curtis & D’Esposito, 2003). Areas of multimodal cortex, such as PFC and pari
etal cortex, which are in a position to integrate representations through connectivity to
unimodal association cortex, are also critically involved in the active maintenance of task-
relevant information (Burgess, Gilbert, & Dumontheil, 2007; Stuss & Alexander, 2007).
Miller and Cohen (2001) have proposed that in addition to the recent sensory informa
tion, integrated representations of task contingencies and even abstract rules (e.g., if this
object, then this later response) are also maintained in the PFC. This is similar to what
Fuster (1997) has long emphasized, namely that the PFC is critically responsible for tem
poral integration and the mediation of events that are separated in time but contingent
on one another. In this way, the PFC may exert “control” in that the information it repre
sents can bias posterior unimodal association cortex in order to keep neural representa
tions of behaviorally relevant sensory information activated when they are no longer
present in the external environment (B. T. Miller & D’Esposito, 2005; Postle, 2006b; Ran
ganath et al., 2004). In a real-world example, when a person is looking at a crowd of peo
ple, the visual scene presented to the retina may include a myriad of angles, shapes, peo
ple, and objects. If that person is a police officer looking for an armed robber escaping
through the crowd, however, some mechanism of suppressing irrelevant visual informa
tion while enhancing task-relevant information is necessary for an efficient and effective
search. Thus, neural activity throughout the brain that is generated by input from the out
side world may be differentially enhanced or suppressed, presumably from top-down sig
nals emanating from integrative brain regions such as PFC, based on the context of the
situation. As Miller and Cohen (2001) state, putative top-down signals originating in PFC
may permit “the active maintenance of patterns of activity that represent goals and the
means to achieve them. They provide bias signals throughout much of the rest of the
brain, affecting visual processes and other sensory modalities, as well as systems respon
sible for response execution, memory retrieval, emotional evaluation, etc. The aggregate
effect of these bias signals is to guide the flow of neural activity along pathways that es
tablish the proper mappings between inputs, internal states and outputs needed to per
form a given task.” Computational models of this type of system have created a PFC mod
ule (e.g., O’Reilly & Norman, 2002) that consists of “rule” units whose activation leads to
the production of a response other than the one most strongly associated with a given in
put. Thus, “this module is not responsible for carrying out input–output mappings needed
for performance. Rather, this module influences the activity of other units whose respon
sibility is making the needed mappings” (e.g., Cohen, Dunbar, & McClelland, 1990). Thus,
there is no need to propose the existence of a homunculus (e.g., central executive) in the
brain that can perform a wide range of cognitive operations that (p. 410) are necessary for
the task at hand (Hazy, Frank, & O’Reilly, 2006).
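This division of labor, in which the rule module biases rather than computes the mapping, can be caricatured in a few lines in the spirit of the Stroop model of Cohen, Dunbar, and McClelland (1990). The pathway weights and bias value below are arbitrary assumptions; the only point is that a prepotent pathway wins the competition unless a task-demand ("rule") unit adds bias to its weaker, task-relevant competitor.

```python
# A caricature of biased competition: word reading is the prepotent
# pathway (stronger weight); a rule unit adds bias to whichever pathway
# the current task requires, without itself computing the mapping.

def respond(word_input, color_input, task):
    word_strength = 1.0 * word_input    # prepotent pathway
    color_strength = 0.4 * color_input  # weaker pathway
    # Top-down bias from the rule unit for the instructed task.
    if task == "name_color":
        color_strength += 0.8
    elif task == "read_word":
        word_strength += 0.8
    return "word" if word_strength > color_strength else "color"

print(respond(1.0, 1.0, task="read_word"))   # word
print(respond(1.0, 1.0, task="name_color"))  # color
```

With no bias, the word pathway would always win; the rule unit's bias reverses the outcome of the competition without replacing the pathways that actually carry the input-output mapping.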
Clearly, other areas of multimodal cortex, such as posterior parietal cortex and the hippocampus, can also be sources of top-down signals. For example, it is
thought that the hippocampus is specialized for rapid learning of arbitrary information
that can be recalled in the service of controlled processing (McClelland, McNaughton, &
O’Reilly, 1995). Several recent studies have offered evidence that the hippocampus, long
thought to contribute little to “online” cognitive activities, plays a role in the maintenance
of information in working memory for novel or high-information stimuli (Olsen et al.,
2009; Olson, Page, Moore, Chatterjee, & Verfaellie, 2006; Ranganath & D’Esposito, 2001;
Rose, Olsen, Craik, & Rosenbaum, 2012). Moreover, input from brainstem neuromodula
tory systems probably plays a critical role in modulating goal-directed behavior (Robbins,
2007). For example, the dopaminergic system probably plays a critical role in cognitive
control processes (for a review, see Cools & Robbins, 2004). Specifically, it is proposed
that phasic bursts of dopaminergic neurons may be critical for updating currently activat
ed task-relevant representations, whereas tonic dopaminergic activity serves to stabilize
such representations (e.g., Cohen, Braver, & Brown, 2002; Durstewitz, Seamans, & Se
jnowski, 2000).
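The phasic/tonic proposal can be caricatured as a gate: a phasic burst admits new content into the maintained representation, while in its absence the current content is stabilized against distractors. The class below is a hypothetical illustration of this gating idea only, not a model drawn from the cited computational work.

```python
# Illustrative gated working-memory buffer: a phasic "dopamine" burst
# opens the gate and updates the content; otherwise (tonic mode) the
# current content is maintained and new input is shut out.

class GatedBuffer:
    def __init__(self):
        self.content = None

    def step(self, stimulus, phasic_burst):
        if phasic_burst:        # update: admit the new input
            self.content = stimulus
        return self.content     # maintain: ignore the input

wm = GatedBuffer()
wm.step("cue", phasic_burst=True)          # encode task-relevant cue
wm.step("distractor", phasic_burst=False)  # distractor is gated out
print(wm.content)  # cue
```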
References
Acheson, D. J., Hamidi, M., Binder, J. R., & Postle, B. R. (2011). A common neural sub
strate for language production and verbal working memory. Journal of Cognitive Neuro
science, 23 (6), 1358–1367.
Atkinson, R. C., & Shiffrin, R. M. (1968). Human memory: A proposed system and its con
trol processes. In K. W. Spence (Ed.), The psychology of learning and motivation: Ad
vances in research and theory (Vol. 2, pp. 89–195). New York: Academic Press.
Awh, E., Jonides, J., Smith, E. E., Schumacher, E. H., Koeppe, R. A., & Katz, S. (1996). Dis
sociation of storage and rehearsal in working memory: PET evidence. Psychological
Science, 7, 25–31.
Baddeley, A. (2000). The episodic buffer: A new component of working memory? Trends in
Cognitive Sciences, 4 (11), 417–423.
Baddeley, A., Allen, R., & Vargha-Khadem, F. (2010). Is the hippocampus necessary for vi
sual and verbal binding in working memory? Neuropsychologia, 48 (4), 1089–1095.
Baddeley, A. D., & Hitch, G. J. (1974). Working memory. In G. Bower (Ed.), The psychology
of learning and motivation (Vol. 7, pp. 47–90). New York: Academic Press.
Baddeley, A., Lewis, V., & Vallar, G. (1984). Exploring the articulatory loop. Quarterly Journal of Experimental Psychology, 36, 233–252.
Baddeley, A. D., Thomson, N., & Buchanan, M. (1975). Word length and the structure of
short-term memory. Journal of Verbal Learning and Verbal Behavior, 14, 575–589.
Badre, D., Poldrack, R. A., Pare-Blagoev, E. J., Insler, R. Z., & Wagner, A. D. (2005). Disso
ciable controlled retrieval and generalized selection mechanisms in ventrolateral pre
frontal cortex. Neuron, 47 (6), 907–918.
Badre, D., & Wagner, A. D. (2007). Left ventrolateral prefrontal cortex and the cognitive
control of memory. Neuropsychologia, 45 (13), 2883–2901.
Becker, J. T., MacAndrew, D. K., & Fiez, J. A. (1999). A comment on the functional localiza
tion of the phonological storage subsystem of working memory. Brain and Cognition, 41
(1), 27–38.
Belger, A., Puce, A., Krystal, J. H., Gore, J. C., Goldman-Rakic, P., & McCarthy, G. (1998).
Dissociation of mnemonic and perceptual processes during spatial and nonspatial work
ing memory using fMRI. Human Brain Mapping, 6 (1), 14–32.
Binder, J. R., Frost, J. A., Hammeke, T. A., Bellgowan, P. S., Springer, J. A., Kaufman, J. N.,
et al. (2000). Human temporal lobe activation by speech and nonspeech sounds. Cerebral
Cortex, 10 (5), 512–528.
Blum, R. A. (1952). Effects of subtotal lesions of frontal granular cortex on delayed reac
tion in monkeys. AMA Archives of Neurology and Psychiatry, 67 (3), 375–386.
Buchsbaum, B. R., Baldo, J., Okada, K., Berman, K. F., Dronkers, N., D’Esposito, M., et al. (2011). Conduction aphasia, sensory-motor integration, and phonological short-term memory—An aggregate analysis of lesion and fMRI data. Brain and Language.
Buchsbaum, B. R., & D’Esposito, M. (2008b). The search for the phonological store: from
loop to convolution. Journal of Cognitive Neuroscience, 20 (5), 762–778.
Buchsbaum, B. R., Hickok, G., & Humphries, C. (2001). Role of the left superior temporal
gyrus in phonological processing for speech perception and production. Cognitive
Science, 25, 663–678.
Buchsbaum, B. R., Olsen, R. K., Koch, P., & Berman, K. F. (2005). Human dorsal and ven
tral auditory streams subserve rehearsal-based and echoic processes during verbal work
ing memory. Neuron, 48 (4), 687–697.
Burgess, P. W., Gilbert, S. J., & Dumontheil, I. (2007). Function and localization within ros
tral prefrontal cortex (area 10). Philosophical Transactions of the Royal Society of Lon
don, Series B, Biological Sciences, 362 (1481), 887–899.
Butters, N., Pandya, D., Stein, D., & Rosen, J. (1972). A search for the spatial engram
within the frontal lobes of monkeys. Acta Neurobiologiae Experimentalis (Wars), 32 (2),
305–329.
Chao, L. L., & Knight, R. T. (1998). Contribution of human prefrontal cortex to delay performance. Journal of Cognitive Neuroscience, 10 (2), 167–177.
Chein, J. M., & Morrison, A. B. (2010). Expanding the mind’s workspace: training and
transfer effects with a complex working memory span task. Psychonomic Bulletin and Re
view, 17 (2), 193–199.
Chevillet, M., Riesenhuber, M., & Rauschecker, J. P. (2011). Functional correlates of the
anterolateral processing hierarchy in human auditory cortex. Journal of Neuroscience, 31
(25), 9345–9352.
Chun, M. M., & Johnson, M. K. (2011). Memory: Enduring traces of perceptual and reflec
tive attention. Neuron, 72 (4), 520–535.
Cocchini, G., Logie, R. H., Della Sala, S., MacPherson, S. E., & Baddeley, A. D. (2002).
Concurrent performance of two memory tasks: Evidence for domain-specific working
memory systems. Memory and Cognition, 30 (7), 1086–1095.
Cohen, J. D., Braver, T. S., & Brown, J. W. (2002). Computational perspectives on dopamine
function in prefrontal cortex. Current Opinion in Neurobiology, 12 (2), 223–229.
Cohen, J. D., Dunbar, K., & McClelland, J. L. (1990). On the control of automatic process
es: A parallel distributed processing account of the Stroop effect. Psychological Review,
97 (3), 332–361.
Cools, R., & Robbins, T. W. (2004). Chemistry of the adaptive mind. Philosophical Transac
tions, Series A, Mathematical, Physical, and Engineering Sciences, 362 (1825), 2871–
2888.
Corbetta, M., Kincade, J. M., & Shulman, G. L. (2002). Neural systems for visual orienting and their relationships to spatial working memory. Journal of Cognitive Neuroscience, 14 (3), 508–523.
Courtney, S. M., Petit, L., Maisog, J. M., Ungerleider, L. G., & Haxby, J. V. (1998). An area
specialized for spatial working memory in human frontal cortex. Science, 279 (5355),
1347–1351.
Courtney, S. M., Ungerleider, L. G., Keil, K., & Haxby, J. V. (1996). Object and spatial visu
al working memory activate separate neural systems in human cortex. Cerebral Cortex, 6
(1), 39–49.
Courtney, S. M., Ungerleider, L. G., Keil, K., & Haxby, J. V. (1997). Transient and sustained
activity in a distributed neural system for human working memory. Nature, 386 (6625),
608–611.
Cowan, N. (1988). Evolving conceptions of memory storage, selective attention, and their
mutual constraints within the human information-processing system. Psychological Bul
letin, 104 (2), 163–191.
Cowan, N., Li, D., Moffitt, A., Becker, T. M., Martin, E. A., Saults, J. S., et al. (2011). A neural region of abstract working memory. Journal of Cognitive Neuroscience, 23 (10), 2852–2863.
Curtis, C. E., & D’Esposito, M. (2003). Persistent activity in the prefrontal cortex during
working memory. Trends in Cognitive Sciences, 7 (9), 415–423.
Curtis, C. E., Rao, V. Y., & D’Esposito, M. (2004). Maintenance of spatial and motor codes
during oculomotor delayed response tasks. Journal of Neuroscience, 24 (16), 3944–3952.
D’Esposito, M., Aguirre, G. K., Zarahn, E., Ballard, D., Shin, R. K., & Lease, J. (1998).
Functional MRI studies of spatial and nonspatial working memory. Brain Research. Cogni
tive Brain Research, 7 (1), 1–13.
D’Esposito, M., Detre, J. A., Alsop, D. C., Shin, R. K., Atlas, S., & Grossman, M. (1995). The neural basis of the central executive system of working memory. Nature, 378, 279–281.
D’Esposito, M., Postle, B. R., Jonides, J., & Smith, E. E. (1999). The neural substrate and
temporal dynamics of interference effects in working memory as revealed by event-relat
ed functional MRI. Proceedings of the National Academy of Sciences U S A, 96 (13),
7514–7519.
Damasio, A. R. (1992). Aphasia. New England Journal of Medicine, 326 (8), 531–539.
Damasio, H., & Damasio, A. R. (1980). The anatomical basis of conduction aphasia. Brain,
103 (2), 337–350.
Della Sala, S., Gray, C., Baddeley, A., Allamano, N., & Wilson, L. (1999). Pattern span: A
tool for unwelding visuo-spatial memory. Neuropsychologia, 37 (10), 1189–1199.
Dronkers, N. F., Wilkins, D. P., Van Valin, R. D., Jr., Redfern, B. B., & Jaeger, J. J. (2004). Le
sion analysis of the brain areas involved in language comprehension. Cognition, 92 (1-2),
145–177.
Druzgal, T. J., & D’Esposito, M. (2001). Activity in fusiform face area modulated as a func
tion of working memory load. Brain Research. Cognitive Brain Research, 10 (3), 355–364.
Druzgal, T. J., & D’Esposito, M. (2003). Dissecting contributions of prefrontal cortex and fusiform face area to face working memory. Journal of Cognitive Neuroscience, 15 (6), 771–784.
Engle, R. W., Tuholski, S. W., Laughlin, J. E., & Conway, A. R. (1999). Working memory,
short-term memory, and general fluid intelligence: A latent-variable approach. Journal of
Experimental Psychology: General, 128 (3), 309–331.
Ericsson, K. A., & Kintsch, W. (1995). Long-term working memory. Psychological Review,
102 (2), 211–245.
Fritz, J., Mishkin, M., & Saunders, R. C. (2005). In search of an auditory engram. Proceedings of the National Academy of Sciences U S A, 102 (26), 9359–9364.
Funahashi, S., Bruce, C. J., & Goldman-Rakic, P. S. (1989). Mnemonic coding of visual
space in the monkey’s dorsolateral prefrontal cortex. Journal of Neurophysiology, 61 (2),
331–349.
Fuster, J. M., & Alexander, G. E. (1971). Neuron activity related to short-term memory.
Science, 173 (997), 652–654.
Fuster, J. M., Bauer, R. H., & Jervey, J. P. (1985). Functional interactions between infer
otemporal and prefrontal cortex in a cognitive task. Brain Research, 330 (2), 299–307.
Gazzaley, A., Cooney, J. W., McEvoy, K., Knight, R. T., & D’Esposito, M. (2005). Top-down
enhancement and suppression of the magnitude and speed of neural activity. Journal of
Cognitive Neuroscience, 17 (3), 507–517.
Geschwind, N. (1965). Disconnexion syndromes in animals and man. I. Brain, 88 (2), 237–
294.
Glanzer, M., & Cunitz, A. R. (1966). Two storage mechanisms in free recall. Journal of Verbal Learning and Verbal Behavior, 5, 351–360.
Hasher, L., Zacks, R. T., & Rahhal, T. A. (1999). Timing, instructions, and inhibitory con
trol: some missing factors in the age and memory debate. Gerontology, 45 (6), 355–357.
Hashimoto, R., Lee, K., Preus, A., McCarley, R. W., & Wible, C. G. (2011). An fMRI study of
functional abnormalities in the verbal working memory system and the relationship to
clinical symptoms in chronic schizophrenia. Cerebral Cortex, 20 (1), 46–60.
Hazy, T. E., Frank, M. J., & O’Reilly, R. C. (2006). Banishing the homunculus: making
working memory work. Neuroscience, 139 (1), 105–118.
Hickok, G., Buchsbaum, B., Humphries, C., & Muftuler, T. (2003). Auditory-motor interac
tion revealed by fMRI: speech, music, and working memory in area Spt. Journal of Cogni
tive Neuroscience, 15 (5), 673–682.
Hickok, G., Okada, K., & Serences, J. T. (2009). Area Spt in the human planum temporale
supports sensory-motor integration for speech processing. Journal of Neurophysiology,
101 (5), 2725–2732.
Hickok, G., & Poeppel, D. (2000). Towards a functional neuroanatomy of speech perception. Trends in Cognitive Sciences, 4 (4), 131–138.
Hickok, G., & Poeppel, D. (2004). Dorsal and ventral streams: A framework for under
standing aspects of the functional anatomy of language. Cognition, 92 (1-2), 67–99.
Hulme, C., Newton, P., Cowan, N., Stuart, G., & Brown, G. (1999). Think before you speak:
Pauses, memory search, and trace redintegration processes in verbal memory span. Jour
nal of Experimental Psychology: Learning, Memory, and Cognition, 25 (2), 447–463.
Hunter, W. S. (1913). The delayed reaction in animals and children. Behavioral Mono
graphs, 2, 1–86.
Ingvar, D. H. (1977). Functional responses of the human brain studied by regional cere
bral blood flow techniques. Acta Clinical Belgica, 32 (2), 68–83.
Ingvar, D. H., & Risberg, J. (1965). Influence of mental activity upon regional cerebral
blood flow in man: A preliminary study. Acta Neurologica Scandinavica Supplementum,
14, 183–186.
Jaeggi, S. M., Buschkuehl, M., Jonides, J., & Perrig, W. J. (2008). Improving fluid intelli
gence with training on working memory. Proceedings of the National Academy of
Sciences U S A, 105 (19), 6829–6833.
Jaeggi, S. M., Buschkuehl, M., Jonides, J., & Shah, P. (2011). Short- and long-term benefits of cognitive training. Proceedings of the National Academy of Sciences U S A, 108 (25), 10081–10086.
Jha, A. P., & McCarthy, G. (2000). The influence of memory load upon delay-interval activi
ty in a working-memory task: an event-related functional MRI study. Journal of Cognitive
Neuroscience, 12 (Suppl 2), 90–105.
Jonides, J., & Nee, D. E. (2006). Brain mechanisms of proactive interference in working
memory. Neuroscience, 139 (1), 181–193.
Jonides, J., Schumacher, E. H., Smith, E. E., Koeppe, R. A., Awh, E., Reuter-Lorenz, P. A.,
et al. (1998). The role of parietal cortex in verbal working memory. Journal of Neuro
science, 18 (13), 5026–5034.
Jonides, J., Smith, E. E., Koeppe, R. A., Awh, E., Minoshima, S., & Mintun, M. A. (1993).
Spatial working memory in humans as revealed by PET. Nature, 363 (6430), 623–625.
Joseph, J. P., & Barone, P. (1987). Prefrontal unit activity during a delayed oculomotor task
in the monkey. Experimental Brain Research, 67 (3), 460–468.
Kanwisher, N., McDermott, J., & Chun, M. M. (1997). The fusiform face area: A module in
human extrastriate cortex specialized for face perception. Journal of Neuroscience, 17
(11), 4302–4311.
Klauer, K. C., & Zhao, Z. (2004). Double dissociations in visual and spatial short-term
memory. Journal of Experimental Psychology: General, 133 (3), 355–381.
Klingberg, T., Fernell, E., Olesen, P. J., Johnson, M., Gustafsson, P., Dahlstrom, K., et al.
(2005). Computerized training of working memory in children with ADHD: A randomized,
controlled trial. Journal of the American Academy of Child and Adolescent Psychiatry, 44
(2), 177–186.
Knight, R. T., Staines, W. R., Swick, D., & Chao, L. L. (1999). Prefrontal cortex regulates
inhibition and excitation in distributed neural networks. Acta Psychologica (Amst), 101
(2-3), 159–178.
Koelsch, S., Schulze, K., Sammler, D., Fritz, T., Muller, K., & Gruber, O. (2009). Functional architecture of verbal and tonal working memory: An fMRI study. Human Brain Mapping, 30 (3), 859–873.
Koenigs, M., Acheson, D. J., Barbey, A. K., Solomon, J., Postle, B. R., & Grafman, J. (2011).
Areas of left perisylvian cortex mediate auditory-verbal short-term memory. Neuropsy
chologia, 49 (13), 3612–3619.
Leff, A. P., Schofield, T. M., Crinion, J. T., Seghier, M. L., Grogan, A., Green, D. W., et al.
(2009). The left superior temporal gyrus is a shared substrate for auditory short-term
memory and speech comprehension: Evidence from 210 patients with stroke. Brain, 132
(Pt 12), 3401–3410.
Leung, H. C., Seelig, D., & Gore, J. C. (2004). The effect of memory load on cortical activi
ty in the spatial working memory circuit. Cognitive, Affective, and Behavioral Neuro
science, 4 (4), 553–563.
Lewis-Peacock, J. A., Drysdale, A. T., Oberauer, K., & Postle, B. R. (2012). Neural evidence
for a distinction between short-term memory and the focus of attention. Journal of Cogni
tive Neuroscience, 24 (1), 61–79.
Linden, D. E., Bittner, R. A., Muckli, L., Waltz, J. A., Kriegeskorte, N., Goebel, R., et al.
(2003). Cortical capacity constraints for visual working memory: Dissociation of fMRI
load effects in a fronto-parietal network. NeuroImage, 20 (3), 1518–1530.
Luck, S. J., & Vogel, E. K. (1997). The capacity of visual working memory for features and
conjunctions. Nature, 390 (6657), 279–281.
Luria, A. R., Sokolov, E. N., & Klimkowski, M. (1967). Towards a neurodynamic analysis of memory disturbances with lesions of the left temporal lobe. Neuropsychologia, 5 (1), 1–11.
Martin, R. C., & He, T. (2004). Semantic short-term memory and its role in sentence pro
cessing: a replication. Brain and Language, 89 (1), 76–82.
McCarthy, G., Blamire, A. M., Puce, A., Nobre, A. C., Bloch, G., Hyder, F., et al. (1994).
Functional magnetic resonance imaging of human prefrontal cortex activation during a
spatial working memory task. Proceedings of the National Academy of Sciences U S A, 91
(18), 8690–8694.
McCarthy, G., Puce, A., Constable, R. T., Krystal, J. H., Gore, J. C., & Goldman-Rakic, P.
(1996). Activation of human prefrontal cortex during spatial and nonspatial working mem
ory tasks measured by functional MRI. Cerebral Cortex, 6 (4), 600–611.
McCarthy, R. A., & Warrington, E. K. (1987). The double dissociation of short-term memo
ry for lists and sentences. Evidence from aphasia. Brain, 110 (Pt 6), 1545–1563.
McClelland, J. L., McNaughton, B. L., & O’Reilly, R. C. (1995). Why there are complemen
tary learning systems in the hippocampus and neocortex: Insights from the successes and
failures of connectionist models of learning and memory. Psychological Review, 102 (3),
419–457.
Mecklinger, A., Gruenewald, C., Besson, M., Magnie, M. N., & Von Cramon, D. Y. (2002).
Separable neuronal circuitries for manipulable and non-manipulable objects in working
memory. Cerebral Cortex, 12 (11), 1115–1123.
Miller, B. T., & D’Esposito, M. (2005). Searching for “the top” in top-down control. Neuron,
48 (4), 535–538.
Miller, E. K., & Cohen, J. D. (2001). An integrative theory of prefrontal cortex function. An
nual Review of Neuroscience, 24, 167–202.
Miyake, A., Friedman, N. P., Emerson, M. J., Witzki, A. H., Howerter, A., & Wager, T. D.
(2000). The unity and diversity of executive functions and their contributions to complex
“frontal lobe” tasks: A latent variable analysis. Cognitive Psychology, 41 (1), 49–100.
Morrison, A. B., & Chein, J. M. (2011). Does working memory training work? The promise
and challenges of enhancing cognition by training working memory. Psychonomic Bulletin
and Review, 18 (1), 46–60.
Munk, M. H., Linden, D. E., Muckli, L., Lanfermann, H., Zanella, F. E., Singer, W., et al.
(2002). Distributed cortical systems in visual short-term memory revealed by event-relat
ed functional magnetic resonance imaging. Cerebral Cortex, 12 (8), 866–876.
Munneke, J., Belopolsky, A. V., & Theeuwes, J. (2012). Shifting attention within memory
representations involves early visual areas. PLoS One, 7 (4), e35528.
Niki, H. (1974). Differential activity of prefrontal units during right and left delayed re
sponse trials. Brain Research, 70 (2), 346–349.
Niki, H., & Watanabe, M. (1976). Prefrontal unit activity and delayed response: Relation
to cue location versus direction of response. Brain Research, 105 (1), 79–88.
Nystrom, L. E., Braver, T. S., Sabb, F. W., Delgado, M. R., Noll, D. C., & Cohen, J. D.
(2000). Working memory for letters, shapes, and locations: fMRI evidence against stimulus-based regional organization in human prefrontal cortex. NeuroImage, 11 (5), 424–446.
Oberauer, K., Schulze, R., Wilhelm, O., & Suss, H. M. (2005). Working memory and intelli
gence—their correlation and their relation: Comment on Ackerman, Beier, and Boyle
(2005). Psychological Bulletin, 131 (1), 61–65; author reply, 72–75.
Olsen, R. K., Nichols, E. A., Chen, J., Hunt, J. F., Glover, G. H., Gabrieli, J. D., et al. (2009).
Performance-related sustained and anticipatory activity in human medial temporal lobe
during delayed match-to-sample. Journal of Neuroscience, 29 (38), 11880–11890.
Olson, I. R., Page, K., Moore, K. S., Chatterjee, A., & Verfaellie, M. (2006). Working memo
ry for conjunctions relies on the medial temporal lobe. Journal of Neuroscience, 26 (17),
4596–4601.
Owen, A. M., Hampshire, A., Grahn, J. A., Stenton, R., Dajani, S., Burns, A. S., et al.
(2010). Putting brain training to the test. Nature, 465, 775–778.
Paulesu, E., Frith, C. D., & Frackowiak, R. S. (1993). The neural correlates of the verbal
component of working memory. Nature, 362 (6418), 342–345.
Petrides, M., Alivisatos, B., Evans, A. C., & Meyer, E. (1993). Dissociation of human mid-dorsolateral from posterior dorsolateral frontal cortex in memory processing. Proceedings of the National Academy of Sciences U S A, 90 (3), 873–877.
Poldrack, R. A., Wagner, A. D., Prull, M. W., Desmond, J. E., Glover, G. H., & Gabrieli, J. D.
(1999). Functional specialization for semantic and phonological processing in the left in
ferior prefrontal cortex. NeuroImage, 10 (1), 15–35.
Posner, M. I., Petersen, S. E., Fox, P. T., & Raichle, M. E. (1988). Localization of cognitive
operations in the human brain. Science, 240 (4859), 1627–1631.
Postle, B. R. (2006b). Working memory as an emergent property of the mind and brain.
Neuroscience, 139 (1), 23–38.
Postle, B. R., Berger, J. S., & D’Esposito, M. (1999). Functional neuroanatomical double
dissociation of mnemonic and executive control processes contributing to working memory performance. Proceedings of the National Academy of Sciences U S A, 96 (22), 12959–12964.
Postle, B. R., Berger, J. S., Taich, A. M., & D’Esposito, M. (2000). Activity in human frontal
cortex associated with spatial working memory and saccadic behavior. Journal of Cogni
tive Neuroscience, 12 (Suppl 2), 2–14.
Postle, B. R., D’Esposito, M., & Corkin, S. (2005). Effects of verbal and nonverbal interfer
ence on spatial and object visual working memory. Memory and Cognition, 33 (2), 203–
212.
Postle, B. R., Druzgal, T. J., & D’Esposito, M. (2003). Seeking the neural substrates of vi
sual working memory storage. Cortex, 39 (4-5), 927–946.
Postman, L., & Phillips, L.-W. (1965). Short-term temporal changes in free recall. Quarter
ly Journal of Experimental Psychology, 17, 132–138.
Prabhakaran, V., Narayanan, K., Zhao, Z., & Gabrieli, J. D. (2000). Integration of diverse
information in working memory within the frontal lobe. Nature Neuroscience, 3 (1), 85–
90.
Quintana, J., Yajeya, J., & Fuster, J. M. (1988). Prefrontal representation of stimulus attrib
utes during delay tasks. I. Unit activity in cross-temporal integration of sensory and sen
sory-motor information. Brain Research, 474 (2), 211–221.
Rama, P., Poremba, A., Sala, J. B., Yee, L., Malloy, M., Mishkin, M., et al. (2004). Dissocia
ble functional cortical topographies for working memory maintenance of voice identity
and location. Cerebral Cortex, 14 (7), 768–780.
Rama, P., Sala, J. B., Gillen, J. S., Pekar, J. J., & Courtney, S. M. (2001). Dissociation of the
neural systems for working memory maintenance of verbal and nonspatial visual informa
tion. Cognitive, Affective, and Behavioral Neuroscience, 1 (2), 161–171.
Ranganath, C., Cohen, M. X., Dam, C., & D’Esposito, M. (2004). Inferior temporal, pre
frontal, and hippocampal contributions to visual working memory maintenance and asso
ciative memory retrieval. Journal of Neuroscience, 24 (16), 3917–3925.
Ranganath, C., & D’Esposito, M. (2001). Medial temporal lobe activity associated with ac
tive maintenance of novel information. Neuron, 31 (5), 865–873.
Rauschecker, J. P., & Scott, S. K. (2009). Maps and streams in the auditory cortex: nonhu
man primates illuminate human speech processing. Nature Neuroscience, 12 (6), 718–
724.
Rauschecker, J. P., & Tian, B. (2000). Mechanisms and streams for processing of “what”
and “where” in auditory cortex. Proceedings of the National Academy of Sciences U S A,
97 (22), 11800–11806.
Raye, C. L., Johnson, M. K., Mitchell, K. J., Reeder, J. A., & Greene, E. J. (2002). Neu
roimaging a single thought: Dorsolateral PFC activity associated with refreshing just-acti
vated information. NeuroImage, 15 (2), 447–453.
Risberg, J., & Ingvar, D. H. (1973). Patterns of activation in the grey matter of the domi
nant hemisphere during memorizing and reasoning: A study of regional cerebral blood
flow changes during psychological testing in a group of neurologically normal patients.
Brain, 96 (4), 737–756.
Romanski, L. M. (2004). Domain specificity in the primate prefrontal cortex. Cognitive, Af
fective, and Behavioral Neuroscience, 4 (4), 421–429.
Romanski, L. M., Tian, B., Fritz, J., Mishkin, M., Goldman-Rakic, P. S., & Rauschecker, J. P.
(1999). Dual streams of auditory afferents target multiple domains in the primate pre
frontal cortex. Nature Neuroscience, 2 (12), 1131–1136.
Rose, N. S., Olsen, R. K., Craik, F. I., & Rosenbaum, R. S. (2012). Working memory and
amnesia: the role of stimulus novelty. Neuropsychologia, 50 (1), 11–18.
Rowe, J. B., Toni, I., Josephs, O., Frackowiak, R. S., & Passingham, R. E. (2000). The pre
frontal cortex: response selection or maintenance within working memory? Science, 288
(5471), 1656–1660.
Sala, J. B., Rama, P., & Courtney, S. M. (2003). Functional topography of a distributed
neural system for spatial and nonspatial information maintenance in working memory.
Neuropsychologia, 41 (3), 341–356.
Salmon, E., Van der Linden, M., Collette, F., Delfiore, G., Maquet, P., Degueldre, C., et al.
(1996). Regional brain activity during working memory tasks. Brain, 119 (Pt 5), 1617–
1625.
Schumacher, E. H., Lauber, E., Awh, E., Jonides, J., Smith, E. E., & Koeppe, R. A. (1996).
PET evidence for an amodal verbal working memory system. NeuroImage, 3 (2), 79–88.
Scoville, W. B., & Milner, B. (1957). Loss of recent memory after bilateral hippocampal le
sions. Journal of Neurology, Neurosurgery, and Psychiatry, 20 (1), 11–21.
Shallice, T., & Butterworth, B. (1977). Short-term memory impairment and spontaneous
speech. Neuropsychologia, 15 (6), 729–735.
Shallice, T., & Vallar, G. (1990). The impairment of auditory-verbal short-term storage. In
G. Vallar & T. Shallice (Eds.), Neuropsychological impairments of short-term memory (pp.
11–53). Cambridge, UK: Cambridge University Press.
Shipstead, Z., Redick, T. S., & Engle, R. W. (2012). Is working memory training effective?
Psychological Bulletin, 138 (4), 628–654.
Smith, E. E., & Jonides, J. (1999). Storage and executive processes in the frontal lobes.
Science, 283 (5408), 1657–1661.
Smith, E. E., Jonides, J., Koeppe, R. A., Awh, E., Schumacher, E. H., & Minoshima, S.
(1995). Spatial versus object working-memory—PET investigations. Journal of Cognitive
Neuroscience, 7 (3), 337–356.
Smith, E. E., Jonides, J., Marshuetz, C., & Koeppe, R. A. (1998). Components of verbal
working memory: evidence from neuroimaging. Proceedings of the National Academy of
Sciences U S A, 95 (3), 876–882.
Srimal, R., & Curtis, C. E. (2008). Persistent neural activity during the maintenance of
spatial position in working memory. NeuroImage, 39 (1), 455–468.
Stevens, A. A. (2004). Dissociating the cortical basis of memory for voices, words and
tones. Brain Research. Cognitive Brain Research, 18 (2), 162–171.
Thompson-Schill, S. L., D’Esposito, M., Aguirre, G. K., & Farah, M. J. (1997). Role of left
inferior prefrontal cortex in retrieval of semantic knowledge: A reevaluation. Proceedings
of the National Academy of Sciences U S A, 94 (26), 14792–14797.
Thompson-Schill, S. L., Jonides, J., Marshuetz, C., Smith, E. E., D’Esposito, M., Kan, I. P.,
et al. (2002). Effects of frontal lobe damage on interference effects in working memory.
Cognitive, Affective, and Behavioral Neuroscience, 2 (2), 109–120.
Tian, B., Reser, D., Durham, A., Kustov, A., & Rauschecker, J. P. (2001). Functional special
ization in rhesus monkey auditory cortex. Science, 292 (5515), 290–293.
Todd, J. J., & Marois, R. (2004). Capacity limit of visual short-term memory in human pos
terior parietal cortex. Nature, 428 (6984), 751–754.
Tomita, H., Ohbayashi, M., Nakahara, K., Hasegawa, I., & Miyashita, Y. (1999). Top-down
signal from prefrontal cortex in executive control of memory retrieval. Nature, 401
(6754), 699–703.
Ungerleider, L., & Mishkin, M. (1982). Two cortical visual systems. In D. Ingle, M. A.
Goodale & R. J. W. Mansfield (Eds.), Analysis of visual behavior (pp. 549–586). Cambridge,
MA: MIT Press.
Vogel, E. K., & Machizawa, M. G. (2004). Neural activity predicts individual differences in
visual working memory capacity. Nature, 428 (6984), 748–751.
Vogel, E. K., McCollough, A. W., & Machizawa, M. G. (2005). Neural measures reveal indi
vidual differences in controlling access to working memory. Nature, 438 (7067), 500–503.
Wager, T. D., & Smith, E. E. (2003). Neuroimaging studies of working memory: A meta-
analysis. Cognitive, Affective, and Behavioral Neuroscience, 3 (4), 255–274.
Wagner, A. D., Pare-Blagoev, E. J., Clark, J., & Poldrack, R. A. (2001). Recovering meaning:
Left prefrontal cortex guides controlled semantic retrieval. Neuron, 31 (2), 329–338.
Walter, H., Wunderlich, A. P., Blankenhorn, M., Schafer, S., Tomczak, R., Spitzer, M., et al.
(2003). No hypofrontality, but absence of prefrontal lateralization comparing verbal and
spatial working memory in schizophrenia. Schizophrenia Research, 61 (2-3), 175–184.
Warrington, E., & Shallice, T. (1969). Selective impairment of auditory verbal short-term
memory. Brain, 92, 885–896.
Waugh, N. C., & Norman, D. A. (1965). Primary memory. Psychological Review, 72, 89–
104.
Wilson, F. A., Scalaidhe, S. P., & Goldman-Rakic, P. S. (1993). Dissociation of object and
spatial processing domains in primate prefrontal cortex. Science, 260 (5116), 1955–1958.
Xu, Y., & Chun, M. M. (2006). Dissociable neural mechanisms supporting visual short-
term memory for objects. Nature, 440 (7080), 91–95.
Yi, D. J., Woodman, G. F., Widders, D., Marois, R., & Chun, M. M. (2004). Neural fate of ig
nored stimuli: Dissociable effects of perceptual and working memory load. Nature Neuro
science, 7 (9), 992–996.
Zarahn, E., Aguirre, G., & D’Esposito, M. (1997). A trial-based experimental design for
fMRI. NeuroImage, 6 (2), 122–138.
Zurowski, B., Gostomzyk, J., Gron, G., Weller, R., Schirrmeister, H., Neumeier, B., et al.
(2002). Dissociating a common working memory network from different neural substrates
of phonological and spatial stimulus processing. NeuroImage, 15 (1), 45–57.
Bradley R. Buchsbaum
Mark D'Esposito
Motor Skill Learning
Introduction
Do you aspire to take up the piano? To improve your tennis or golf game? Or, perhaps you
would like to maintain your motor abilities in the face of injury, growth, advancing age, or
disease? Skill learning underlies our capacity for such adaptive motor behaviors. Re
search over the past 20 years has provided great insight into the dynamic neural process
es underlying human motor skill acquisition, focusing primarily on brain networks that
are engaged during early versus late stages of learning. What has been challenging for
the field is to tightly link these shifting neural processes with what is known about measurable behavioral changes and strategic processes that occur during learning. Because
individuals learn at different rates and often adopt different strategies, it is difficult to
characterize the dynamics of evolving motor behaviors. Moreover, skill learning is often
implicit, so verbal reports about the learning process are imprecise and unreliable. Here,
we review our current understanding of skill learning from a cognitive neuroscience
perspective, with a particular emphasis on linking the cognitive (i.e., which strategies and
behavioral processes are relied on for skill learning) with the neuroscience (i.e., which
neural networks underlie these processes, and where motor memories are stored in the
brain). Researchers in different disciplines have employed varying approaches to these
two topics with relatively little crosstalk. For example, those working from the behavioral
approach have been focused on questions such as whether implicit or explicit memory
systems are engaged during early and late learning; those working from the computation
al neuroscience approach have modeled fast and slow processes of learning; and those
working from the cognitive neuroscience approach have identified large-scale shifts in
brain networks that are engaged during early versus late learning. Here, we endeavor to bring together these predominantly autonomous schools of thought. Given the
constraints of this chapter, we focus on learning from overt practice and do not address
other aspects of the field such as sleep-dependent consolidation of learning, transfer of
learning, mental practice, and learning by observing others. Moreover, these topics have
been recently reviewed elsewhere (Garrison, Winstein, & Aziz-Zadeh, 2010; Krakauer &
Shadmehr, 2006; Robertson, Pascual-Leone, & Miall, 2004; Seidler, 2010).
Researchers studying skill acquisition have classified learning into at least two broad cat
egories: sensorimotor adaptation and sequence learning (Doyon & Benali, 2005; Willing
ham, 1998). In sensorimotor adaptation paradigms, participants modify movements to ad
just to changes in either sensory input or motor output characteristics. A real-world ex
ample is learning to drive a new car: The magnitude of vehicle movement in response to
the amount of wheel turn and accelerator depression varies across vehicles. Thus, the dri
ver must learn the new mapping between his or her actions and the resulting vehicle
movements. Another example of sensorimotor adaptation is learning how the size and
speed of hand movements of a mouse correspond to cursor movements on a computer
display screen. The study of motor performance under transformed spatial mappings
spans over 100 years. Helmholtz (1867) originally used prisms to invert the visual world,
whereas more recent investigations make use of computer displays to alter visual feed
back of movement (Cunningham, 1989; Ghilardi, Gordon, & Ghez, 1995; Krakauer, Pine,
Ghilardi, & Ghez, 2000; Seidler, Noll, & Chintalapati, 2006). These studies have demon
strated that sensorimotor adaptation occurs when movements are actively made in the
new environment.
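The trial-by-trial dynamics of such adaptation are commonly illustrated with a simple error-driven update rule, in which each movement's error proportionally corrects an internal estimate of the imposed distortion. The following is a minimal sketch; the 30-degree rotation and the learning rate are hypothetical illustrations, not values taken from any study cited here:

```python
# Minimal single-rate model of visuomotor rotation adaptation.
# On each trial the learner aims using its current estimate of the
# perturbation; the residual error drives a proportional update.

def simulate_adaptation(rotation_deg=30.0, learning_rate=0.2, n_trials=50):
    estimate = 0.0           # internal estimate of the rotation (degrees)
    errors = []
    for _ in range(n_trials):
        error = rotation_deg - estimate    # cursor misses target by this much
        estimate += learning_rate * error  # error-driven correction
        errors.append(error)
    return errors

errors = simulate_adaptation()
# Each trial retains (1 - learning_rate) of the previous error, so the
# error shrinks geometrically, mirroring the exponential learning
# curves typically observed in adaptation experiments.
```

The exponential decay follows directly from the update rule: after n trials the residual error is the initial rotation scaled by (1 - learning rate) to the nth power.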
Motor sequence learning refers to the progressive association between isolated elements
of movement, eventually allowing for a multi-element sequence to be performed quickly.
A real-world example is a gymnast learning a new tumbling routine. The gymnast already
knows how to perform the individual skills in isolation, but must practice to combine
them with seamless transitions.
William James famously urged that we make “automatic and habitual, as early as possible, as many useful actions as we can” (James, 1890). Even well before
James, the ancient Greeks knew that repetitive action brought about physiological and
behavioral changes in an organism. In his treatise, On the Motion of Animals (1908/330
BC), Aristotle noted that “the body must straightaway be moved and changed with the
changes that nature makes dependent upon one another.”
It is now more than 2,000 years since Aristotle first philosophized on the changes associ
ated with the movement of organisms. How much, or how little, have recent advances in
cognitive neuroscience changed the way we think about skill learning? What theories and
ideas do we still hold dear and which have we discarded? This section of the chapter will
highlight a few influential models of the processes underlying skill learning that have
been offered in the past 50 years or so.
In 1967 Paul Fitts and Michael Posner proposed an influential model for understanding
motor skill acquisition (Fitts & Posner, 1967). In their model, the learning of a movement
progresses through three interrelated phases: the cognitive phase, the associative phase,
and the autonomous phase. One utility of this model is that it applies well to both motor
and cognitive skill acquisition.
In the cognitive stage, learners must use their attentional resources to break down a de
sired skill into discrete components. This involves creating a mental picture of the skill,
which helps to facilitate an understanding of how these parts come together to form correct execution of the desired movement. Performance at this stage might be defined as
conscious incompetence; individuals are cognizant of the various components of the task,
yet cannot perform them efficiently and effectively. The next phase of the model, the asso
ciative phase, requires repeated practice and the use of feedback to link the component
parts into a smooth action. The ability to distinguish important from unimportant stimuli
is central to this stage of the model. For example, if a baseball player is learning to detect
and hit a curveball, attention should be paid to the most telling stimuli, such as the angle
of the wrist on the pitcher’s throwing hand, not the height of the pitcher’s leg-kick. The
last stage of the model, the autonomous stage, involves development of the learned skill
so that it becomes habitual and automatic. Individuals at this stage rely on processes that
require little or no conscious attention. Performance may be defined as unconscious competence, and is reliant on experience and stored knowledge easily accessible for
the execution of the motor skill (Figure 20.1). As will be seen throughout this chapter, this
model has had an enduring impact on the field of skill learning.
Another influential theory from the information processing perspective is Adams’ (1971)
closed-loop theory of skill acquisition. In this theory, two independent memory represen
tations, a memory trace and a perceptual trace, serve to facilitate the learning of self-
paced pointing movements. Selection and initiation of movement are the responsibility of
the memory trace, whose strength is seen as a function of the stimulus–response contigui
ty. With practice the strength of the memory trace is increased (Newell, 1991). The per
ceptual trace relies on feedback, both internal and external, to create an image of the cor
rectness of the desired skill. Motor learning then takes place through the process of de
tecting errors and inconsistencies between the perceptual trace and memory trace
(Adams, 1971). Repetition, feedback, and refinement serve to produce a set of associated
sensory representations and movement units that can be called on, depending on the req
uisite skill. Interestingly, both internal and external feedback loops are incorporated into
more modern skill acquisition theories as well.
Many theories of skill learning describe processes of optimization. The oldest of these
theories is optimization of movement cost; that is, as learning progresses, performance of
the skill becomes not only more automatic, but also less effortful. For example, in the
case of runners, refinement and optimization of running form allows athletes to run at the
same speed with less effort, and consequently to run faster at maximal effort (Conley &
Krahenbuhl, 1980; Jones, 1998). In the case of Olympic weightlifters, learning complex
techniques allows them to lift heavier weights with less effort (Enoka, 1988).
This idea fits well with the theory of evolution: strategies that minimize energy consump
tion should be favored. As a consequence, early studies of motor control in the framework
of optimization focused on the metabolic cost of movements, particularly gait (Atzler &
Herbst, 1927; Ralston, 1958). These studies demonstrated good accordance between the
metabolically optimal walking speed and preferred walking speed (Holt, Hamill, & An
dres, 1991).
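This correspondence can be made concrete with Ralston's (1958) empirical cost equation, in which gross energy expenditure grows with the square of walking speed; minimizing cost per unit distance then yields an analytic optimum. The coefficients below are the commonly quoted values for Ralston's fit and should be treated as illustrative:

```python
import math

# Gross energetic cost of walking, per Ralston's (1958) empirical fit:
#   E(v) = a + b * v**2   (cal / kg / min, with v in m/min).
# Cost per unit distance is E(v) / v, which is minimized where
#   d/dv [a/v + b*v] = 0,  i.e. at  v* = sqrt(a / b).

A, B = 32.0, 0.0050  # coefficients as commonly quoted for Ralston's equation

def cost_per_distance(v):
    return (A + B * v**2) / v  # cal / kg / m

v_opt = math.sqrt(A / B)       # analytic optimum, in m/min
print(round(v_opt, 1))         # prints 80.0 (m/min), i.e. roughly 4.8 km/h
```

The predicted optimum of about 80 m/min falls close to typical preferred walking speeds, which is the accordance the text describes.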
However, movements are not always optimized for lowest metabolic cost, even after extensive practice (Nelson, 1983). One possible reason for this is that there are simply too many degrees of freedom for the motor system to sample all possibilities. Furthermore, our environment is highly variable, our information about it is too limited, and our movements and their associated feedback are too noisy.
In an attempt to explain how individuals adapt despite such variability, researchers have
suggested that the motor system relies on a Bayesian model to maximize the likelihood of
desired outcomes (Geisler, 1989; Gepshtein, Seydell, & Trommershauser, 2007; Körding &
Wolpert, 2004; Maloney & Mamassian, 2009; Seydell, McCann, Trommershauser, & Knill,
2008; Trommershäuser, Maloney, & Landy, 2008). Such approaches predict that move
ment accuracy is stressed early in learning, which quickly eliminates many inefficient
movement patterns.
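The core of such Bayesian accounts (e.g., Körding & Wolpert, 2004) is that the motor system combines prior knowledge of a perturbation with noisy sensory feedback, weighting each cue by its reliability. The following is a minimal sketch under Gaussian assumptions; the particular means and variances are hypothetical:

```python
# Precision-weighted (Bayesian) combination of a Gaussian prior over a
# perturbation with a noisy observation of it. With Gaussian prior and
# likelihood, the posterior mean is a reliability-weighted average,
# where each weight is an inverse variance (a precision).

def posterior_estimate(prior_mean, prior_var, obs, obs_var):
    w_prior = 1.0 / prior_var
    w_obs = 1.0 / obs_var
    mean = (w_prior * prior_mean + w_obs * obs) / (w_prior + w_obs)
    var = 1.0 / (w_prior + w_obs)
    return mean, var

# Hypothetical numbers: shifts average 1.0 cm (variance 0.25);
# a noisy feedback sample reports 2.0 cm (variance 0.75).
mean, var = posterior_estimate(1.0, 0.25, 2.0, 0.75)
# The estimate lands between prior and observation, pulled toward the
# more reliable cue, and the posterior variance falls below both.
```

Because the posterior is always less variable than either cue alone, this scheme explains how accurate estimates can be maintained despite noisy movements and feedback.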
A model of Bayesian transfer, in which parameters from one task are used to more rapidly
learn a second, novel task, predicts that learning complex movements is achieved most
efficiently by first learning component, simple movements (Maloney & Mamassian, 2009).
This may seem like common sense: beginning piano students don’t start right away play
ing concertos, but rather start with single notes, scales, and chords, and progress from
there. In the beginning, basic commands, known as motor primitives, are combined to
produce a simple movement such as a key press or a pen stroke (Nishimoto & Tani, 2009;
Paine & Tani, 2004; Polyakov, Drori, Ben-Shaul, Abeles, & Flash, 2009; Thoroughman &
Shadmehr, 2000). By learning simple movements first, the brain is able to simultaneously
form a model of the environment and its uncertainty, which transfers to other tasks (Mc
Nitt-Gray, Requejo, & Flashner, 2006; Seydell et al., 2008).
The advent and widespread use of neuroimaging techniques have opened the door for in
tegration of these and other psychological theories with evolving views of brain function.
Questions that have dominated the field of neuromotor control, and which continue to
take center stage today, include: What are the neural bases of the stages of learning?
How does their identification inform our knowledge regarding the underlying processes
of skill learning? Where in the brain are motor memories formed and stored? How gener
alizable are these representations?
Early neuroimaging studies often investigated motor tasks because they allow for simple
recording and tracking of overt behavioral responses (Grafton, Mazziotta, Woods, &
Phelps, 1992; Kim et al., 1993; Pascual-Leone, Brasil-Neto, Valls-Sole, Cohen, & Hallett,
1992; Pascual-Leone, Valls-Sole, et al., 1992). Additionally, the discovery that motor repre
sentations were capable of exhibiting experience-dependent change even in the adult
brain was greeted with much interest and excitement. These earlier experiments identi
fied prominent roles for the motor cortex, cerebellum, and striatum in skill learning, com
plementing the results of earlier experiments conducted with neurological patients (Pas
cual-Leone et al., 1993; Weiner, Hallett, & Funkenstein, 1983). Much of the early neu
roimaging experiments of skill learning focused specifically on the motor cortex, identify
ing an initial expansion of functional movement representations in primary motor cortex,
followed by retraction of these representations as movement automatization progressed
(Doyon, Owen, Petrides, Sziklas, & Evans, 1996; Grafton, Mazziotta, Presty, et al., 1992;
Grafton, Woods, Mazziotta, & Phelps, 1991; Jueptner, Frith, Brooks, Frackowiak, & Pass
ingham, 1997; Jueptner, Stephan, et al., 1997; Karni et al., 1995, 1998; Pascual-Leone,
Grafman, & Hallett, 1994; Pascual-Leone & Torres, 1993).
It seems logical to begin studying motor learning processes by focusing on the motor cor
tex. However, these and other subsequent studies also reported extensive activation of
“nonmotor” brain networks during skill learning as well, including engagement of dorso
lateral prefrontal cortex, parietal cortex, anterior cingulate cortex, and associative re
gions of the striatum (cf. Doyon & Benali, 2005). This work is extensively reviewed in the
current chapter, with a particular emphasis on the processes and functions of both non
motor and motor networks that are engaged across the time course of learning.
Recent neuroimaging studies have tracked the time course of involvement of these addi
tional brain structures. Typically, frontal-parietal systems are engaged early in learning
with a shift to activation in more “basic” motor cortical and subcortical structures later in
learning. This is taken as support that early learning is a more cognitively controlled
process, whereas later learning is more automatic. This view is supported by studies
showing dual-task interference during early learning (Eversheim & Bock, 2001; Taylor &
Thoroughman, 2007, 2008) and those reporting correlations between cognitive capacity
and the rate of early learning (Anguera, Reuter-Lorenz, Willingham, & Seidler, 2010; Bo,
Borza, & Seidler, 2009; Bo & Seidler, 2009). In the subsequent sections, we outline evidence and provide reasoned speculation regarding precisely which cognitive
mechanisms are engaged during early skill learning.
In this section, we describe what is known about the neurocognitive processes of skill
learning, emphasizing the early and late stages of learning. Let’s consider an example of
a novice soccer player learning to make a shot on goal. Initially, she may be rehearsing
her coach’s instructions in working memory (see later section, Role of Working Memory
in Skill Learning), reminding herself to keep her toe down and her ankle locked, and to
hit the ball with her instep. When her first shot goes wide, she will engage error detec
tion and correction mechanisms (see next section, Error Detection and Correction) to ad
just her aim for the next attempt, relying again on working memory to adjust motor com
mands based on recent performance history. It seems intuitive, both from this example
and our own experiences, that these early learning processes are cognitively demanding.
As our budding soccer player progresses in skill, she can become less inwardly focused
on the mechanics of her own shot and start learning about more tactical aspects of the
sport. At this point, her motor performance has become fluid and automatized (see later
section, Late Learning Processes). When learning a new sport or skill, this transition from
“early” to “late” processing occurs not just once, but rather at multiple levels of compe
tency as the learner progresses through more and more complex aspects of her or his do
main of expertise (Schack & Mechsner, 2006; Wolpert & Flanagan, 2010). These stages of
learning are likely not dissociated by discrete transitions, but rather overlap in time.
When learning a new motor skill such as moving a cursor on a computer screen, predic
tion error becomes critical: New skills do not have enough of a motor history for an accu
rate forward model, resulting in large prediction errors. In this case, the process of learn
ing involves updating motor commands through multiple exposures to motor errors and
gradually reducing them by refining the forward model (Donchin, Francis, & Shadmehr,
2003; Shadmehr et al., 2010).
The mechanisms of error-based learning are often studied using visual-motor adaptation
and force field adaptation tasks. Visual-motor adaptation involves distortion of the visual
consequences of movement, whereas force field adaptation affects the proprioceptive
consequences of motor commands by altering the dynamics of movement (Lalazar & Vaa
dia, 2008; Shadmehr et al., 2010). Error processing under these two paradigms shows ex
tensive neural overlap in the cerebellum, suggesting a common mechanism for error pro
cessing and learning (Diedrichsen, Hashambhoy, Rane, & Shadmehr, 2005). Error pro
cessing that contributes to learning is distinct from online movement corrections that
happen within a trial (Diedrichsen, Hashambhoy, et al., 2005; Gomi, 2008). Rather, learn
ing is reflected in corrections made from one trial to the next, reflecting learning or up
dating of motor representations.
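Trial-to-trial updating of this kind is often formalized with state-space models; a well-known variant is the two-state model of Smith, Ghazizadeh, and Shadmehr (2006), in which a fast process learns and forgets quickly while a slow process does both slowly, echoing the fast and slow learning processes mentioned earlier. The retention (A) and learning (B) rates below are illustrative choices, not fitted values:

```python
# Two-state (fast/slow) model of trial-to-trial error-based learning:
# each trial's error updates both a fast process (high learning rate,
# poor retention) and a slow process (low learning rate, high retention).

def two_state_adaptation(perturbation, n_trials,
                         A_fast=0.60, B_fast=0.20,
                         A_slow=0.99, B_slow=0.02):
    x_fast = x_slow = 0.0
    net_output = []
    for _ in range(n_trials):
        error = perturbation - (x_fast + x_slow)   # experienced this trial
        x_fast = A_fast * x_fast + B_fast * error  # learns fast, decays fast
        x_slow = A_slow * x_slow + B_slow * error  # learns slowly, retains
        net_output.append(x_fast + x_slow)
    return net_output

out = two_state_adaptation(perturbation=30.0, n_trials=200)
# Early compensation is carried mostly by the fast process; with
# practice the slow process takes over, so learning persists even as
# the fast state decays -- a signature of multi-rate learning.
```

Because only the experienced error, not any within-trial correction, drives the updates, this sketch also illustrates the point above that detecting an error is sufficient to stimulate learning.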
Evidence suggests that the cerebellum provides the core mechanism underlying error-
based learning (Criscimagna-Hemminger, Bastian, & Shadmehr, 2010; Diedrichsen, Ver
stynen, Lehman, & Ivry, 2005; Ito, 2002; Miall, Christensen, Cain, & Stanley, 2007; Miall,
Weir, Wolpert, & Stein, 1993; Ramnani, 2006; Tseng, Diedrichsen, Krakauer, Shadmehr, &
Bastian, 2007; Wolpert & Miall, 1996). Studies with cerebellar patients demonstrate that,
although these patients are able to make online motor adjustments, their performance
across trials does not improve (Bastian, 2006; Maschke, Gomez, Ebner, & Konczak, 2004;
Morton & Bastian, 2006; Smith & Shadmehr, 2005; Tseng et al., 2007). Neuroimaging studies also provide evidence for cerebellar contributions to error-based motor
skill learning (Diedrichsen, Verstynen, et al., 2005; Imamizu, Kuroda, Miyauchi, Yoshioka,
& Kawato, 2003; Imamizu, Kuroda, Yoshioka, & Kawato, 2004; Imamizu et al., 2000). It
should be noted that correcting errors within a trial does not seem to be a prerequisite to
learning, which is reflected as correcting errors from one trial to the next, or across-trial
corrections. This is evidenced by experiments showing that learning occurs even when
participants have insufficient time for making corrections. Thus it seems that just experi
encing or detecting an error is sufficient to stimulate motor learning.
Neuroimaging studies provide evidence that brain regions other than the cerebellum may
also play a role in error-dependent learning such as the parietal cortex, striatum, and an
terior cingulate cortex (Chapman et al., 2010; Clower et al., 1996; Danckert, Ferber, &
Goodale, 2008; den Ouden, Daunizeau, Roiser, Friston, & Stephan, 2010). Studies also
show that cerebellar patients can learn from errors when a perturbation is introduced
gradually, resulting in small errors (Criscimagna-Hemminger et al., 2010). In combination
these studies suggest that not all error-based learning relies on cerebellar networks.
Recent studies demonstrate a contribution of the anterior cingulate cortex (ACC) error
processing system to motor learning (Anguera, Reuter-Lorenz, Willingham, & Seidler,
2009; Anguera, Seidler, & Gehring, 2009; Danckert et al., 2008; Ferdinand, Mecklinger, &
Kray, 2008; Krigolson & Holroyd, 2006, 2007a, 2007b; Krigolson, Holroyd, Van Gyn, &
Heath, 2008). This prefrontal performance monitoring system has been studied extensively.
Page 8 of 38
Motor Skill Learning
In a series of studies by Krigolson and colleagues, an ERN from the medial frontal region
was found to be associated with motor tracking errors (Krigolson & Holroyd, 2006,
2007a, 2007b; Krigolson et al., 2008). Interestingly, the onset of the ERN occurred before
the tracking error, indicating that the medial frontal system began to detect the error
even before the error was fully committed (Krigolson & Holroyd, 2006). The authors sug
gested that this might entail the medial frontal system predicting tracking errors by
adopting a predictive mode of control (Desmurget, Vindras, Grea, Viviani, & Grafton,
2000). They also found that target errors which allow online movement correction within
a trial did not elicit the medial frontal ERN, but rather resulted in a negative deflection in
the posterior parietal region (Krigolson & Holroyd, 2007a; Krigolson et al., 2008). These
results indicate that the contribution of the medial frontal ERN to motor error processing
is a distinct process potentially involving prediction error calibration (Krigolson & Hol
royd, 2007a; Krigolson et al., 2008).
We recently tested whether the ERN was sensitive to the magnitude of error experienced
during visual-motor adaptation and found a larger ERN magnitude on trials in which larg
er errors were made (Anguera, Seidler, et al., 2009). ERN magnitude also decreased from
the early to the late stages of learning. These results are in agreement with current theo
ries of ERN and skill acquisition. For example, as the error detection theory proposes
(Falkenstein, Hohnsbein, Hoormann, & Blanke, 1991; Gehring et al., 1993), a greater
ERN associated with larger errors indicates that the brain was monitoring the disparity
between the predicted and actual movement outcomes (Anguera, Seidler, et al., 2009).
There is also evidence that error-based learning represented in the ACC contributes to
motor sequence learning (Berns, Cohen, & Mintun, 1997). Several studies
have shown that the N200 ERP component, which is known to be sensitive to a mismatch
between the expected and actual sensory stimuli, is enhanced for a stimulus that violates
a learned motor sequence (Eimer, Goschke, Schlaghecken, & Sturmer, 1996). When ERN
magnitudes were compared between explicit (p. 422) and implicit learners, a larger ERN
was found for the explicit learners, demonstrating greater involvement of the error moni
toring system when actively searching for the regularity of a sequence (Russeler, Kuhlicke, & Munte, 2003). A more recent study demonstrated a parametric increase in the
magnitude of the ERN during sequence learning as the awareness of the sequential na
ture and the expectancy to the forthcoming sequential element increased (Ferdinand et
al., 2008).
The series of studies described above supports a role for the prefrontal ACC system in
motor error processing. The actual mechanism of how the ERN contributes to perfor
mance improvements across trials during motor learning is not well understood, however.
Additionally, it remains unclear whether this system works independently of or in collabo
ration with the cerebellar-based error processing system.
Working memory refers to the structures and processes used for temporarily storing and
manipulating information (Baddeley, 1986; Miyake & Shah, 1999). Dissociated processing
for spatial and verbal information was initially proposed by Baddeley and Hitch (1974).
Current views suggest that working memory may not be as process pure as once thought
(Cowan, 1995, 2005; Jonides et al., 2008), but the idea of separate modules for processing
different types of information still holds (Goldman-Rakic, 1987; Shah & Miyake, 1996;
Smith, Jonides, & Koeppe, 1996; Volle et al., 2008). Given that the early stages of motor
learning can be disrupted by the performance of secondary tasks (Eversheim & Bock,
2001; Remy, 2010; Taylor & Thoroughman, 2007, 2008), and the fact that similar pre
frontal cortical regions are engaged early in motor learning as those relied on to perform
spatial working memory tasks (Jonides et al., 2008; Reuter-Lorenz et al., 2000), it is plau
sible that learners engage spatial working memory processes to learn new motor skills.
Because individuals vary in the number of items they can hold and operate on in working
memory (cf. Vogel & Machizawa, 2004), working memory lends itself well to individual-differences
research approaches. To investigate whether working memory contributes to visual-motor
adaptation, in a recent study our group administered a battery of neuropsychological
assessments to participants and then had them perform a manual visual-motor
adaptation task and a spatial working memory task during magnetic resonance
imaging (MRI). The results showed that performance on the card rotation task (Ekstrom,
1976), a measure of spatial working memory, correlated with the rate of early, but not
late, learning on the visual-motor adaptation task across individuals (Anguera, Reuter-
Lorenz, Willingham, & Seidler, 2010). There were no correlations between verbal working
memory measures and either early or late learning. Moreover, the neural correlates of
early adaptation overlapped with those that participants engaged when performing a spa
tial working memory task, notably in the right dorsolateral prefrontal cortex and in the bi
lateral inferior parietal lobules. There was no neural overlap between late adaptation and
spatial working memory. These data demonstrate that early, but not late, learning en
gages spatial working memory processes (Anguera, Reuter-Lorenz, Willingham, & Sei
dler, 2010).
Despite recent assertions that visual-motor adaptation is largely implicit (Mazzoni &
Krakauer, 2006), these findings (Anguera, Reuter-Lorenz, Willingham, & Seidler, 2010)
are consistent with the hypothesis that spatial working memory processes are involved in
the early stages of acquiring new visual-motor mappings. As described in detail below,
whether a task is learned implicitly (subconsciously) or explicitly appears to be a separate
issue from the relative cognitive demands of a task. For example, adaptation to a small
and gradual visual-motor perturbation, which typically occurs outside of the learner’s
awareness, is still subject to interference by performance of a secondary task (Galea, Sa
mi, Albert, & Miall, 2010). This effect is reminiscent of Nissen and Bullemer’s (1987)
finding that implicitly acquiring a sequence of actions is attentionally demanding and can
be disrupted by secondary task performance. Thus we propose that spatial working mem
ory can be engaged for learning new motor skills even when learning is implicit.
Models of motor sequence learning also propose that working memory plays an integral
role (cf. Ashe, Lungu, Basford, & Lu, 2006; Verwey, 1996, 2001). Studies that have used
repetitive transcranial magnetic stimulation (rTMS) to disrupt the dorsolateral prefrontal
cortex, a structure involved in working memory (Jonides et al., 1993), have shown im
paired motor sequence learning (Pascual-Leone, Wassermann, Grafman, & Hallett, 1996;
Robertson, Tormos, Maeda, & Pascual-Leone, 2001). To evaluate whether working memo
ry plays a role in motor sequence learning, we determined whether individual differences
in visual-spatial working memory (p. 423) capacity affect the temporal organization of ac
quired motor sequences. We had participants perform an explicit motor sequence learn
ing task (i.e., they were explicitly informed about the sequence and instructed to learn it)
and a visual-spatial working memory task (Luck & Vogel, 1997). We found that working
memory capacity correlated with the motor sequence chunking pattern that individuals
developed; that is, individuals with a larger working memory capacity chunked more
items together when learning the motor sequence (Bo & Seidler, 2009). Moreover, these
individuals exhibited faster rates of learning. These results demonstrate that individual
differences in working memory capacity predict the temporal structure of acquired motor
sequences.
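The chunking idea can be illustrated with a toy example (this is an illustration of the concept only, not the analysis used in the study; `chunk_sequence` and the capacity values are hypothetical):

```python
# Toy illustration: segmenting a key-press sequence into chunks whose
# maximum size is capped by a (hypothetical) working memory capacity.

def chunk_sequence(sequence, capacity):
    """Group sequence elements into chunks of at most `capacity` items."""
    return [sequence[i:i + capacity] for i in range(0, len(sequence), capacity)]

moves = [3, 1, 4, 2, 4, 1, 3, 2]   # an eight-element movement sequence

print(chunk_sequence(moves, 4))    # → [[3, 1, 4, 2], [4, 1, 3, 2]]
print(chunk_sequence(moves, 2))    # → [[3, 1], [4, 2], [4, 1], [3, 2]]
```

A learner with the higher capacity groups the same sequence into fewer, longer chunks, mirroring the behavioral finding that higher-capacity individuals chunked more items together.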
How might spatial working memory be used during the motor learning process? We pro
pose that it is used for differing purposes depending on whether the learner is acquiring
a new sequence of actions or adapting to a novel sensorimotor environment. In the case
of sensorimotor adaptation, we suggest that error information from the preceding trial
(see above section) is maintained in spatial working memory and relied on when the
learner manipulates the sensorimotor map to generate a motor command that is appro
priate for the new environment. When adaptation is in response to a rotation of the visual
display, this process likely involves the mental rotation component of spatial working
memory (Jordan, 2001; Logie, 2005). This interpretation agrees with Abeele and Bock's
proposal that adaptation progresses gradually across the learning period, from small
angles of transformation through intermediate values until the prescribed angle of
rotation is reached (Abeele & Bock, 2001). Thus, late in adaptation, once the new mapping
has been formed and is in use, engagement of these spatial working memory resources is
markedly diminished compared with early adaptation. This notion is supported by
electrophysiological evidence.
Many motor sequence learning paradigms result in learning with few or no errors be
cause movements are cued element by element, as in the popular serial reaction time
task. Trial and error sequence learning is an exception; in this case spatial working mem
ory likely plays a key role in maintaining and inhibiting past erroneous responses (Hikosa
ka et al., 1999). In the case of cued sequence learning, we propose that working memory
is relied on for chunking together of movement elements based on repetition, and their
transfer into long-term memory. Both implicit and explicit motor sequence learning para
digms consistently result in activation of the frontal-parietal working memory system
(Ashe, Lungu, Basford, & Lu, 2006), consistent with this notion. Interestingly, dorsal pre
motor cortex, which serves as a node in both working memory and motor execution net
works, is the site where sensory information from working memory is thought to be con
verted into motor commands for sequence execution (Ohbayashi, 2003).
Verwey (1996, 2001) also hypothesized a close relationship between working memory ca
pacity and motor sequence learning. He proposed that participants rely on a cognitive
processor during sequence learning, which depends on “motor working memory” to allow
a certain number of sequence elements (i.e., a chunk) to be programmed in advance of
execution. At the same time, a motor processor is running in parallel to execute the ac
tions so that the entire sequence can be performed efficiently. Interestingly, Ericsson and
colleagues (1980) reported a case in which a participant with initially average
memory abilities increased his memory span from 7 to 79 digits with practice. This indi
vidual learned to group chunks of digits together to form “supergroups,” which allowed
him to dramatically increase his digit span. Presumably the process is similar for those
acquiring long motor (p. 424) sequences as well, such as a musician memorizing an entire
piece of music or a dancer learning a sequence of moves for a performance.
As individuals become more proficient in executing a task, striatal and cerebellar regions
become more active, whereas cortical regions become less active (Doyon & Benali, 2005;
Grafton, Mazziotta, Woods, et al., 1992). But the question of how and where motor memo
ries are ultimately stored is not trivial to answer. Shifts in activation do not mean that
these “late learning” structures are involved in memory; they may instead merely medi
ate performance of a well-learned set of movements. Also, the role of a specific area may
differ between types of motor tasks.
At the cellular level, the acquisition of motor memories as a modulation of synaptic con
nections between neurons was first proposed by Ramon y Cajal (1894). Memories result
from the facilitation and selective elimination of neuronal pairings due to experience.
Hebb stated the associative nature of memories in more explicit terms: “two cells or sys
tems that are repeatedly active at the same time will tend to become associated, so that
activity in one facilitates activity in the other” (1949, p. 70). This Hebbian plasticity mod
el has been well supported in model organisms (see Abel & Lattal, 2001; Martin, Grim
wood, & Morris, 2000), and it appears that similar or identical mechanisms are at play for
human motor memories (Donchin, Sawaki, Madupu, Cohen, & Shadmehr, 2002).
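Hebb's rule can be written compactly as a weight update proportional to the product of pre- and postsynaptic activity. A toy sketch (the learning rate and activity values are illustrative, not drawn from any study):

```python
# Toy Hebbian update: the connection between two units strengthens only
# when both are active together; the learning rate (0.1) is illustrative.

def hebbian_update(weight, pre, post, lr=0.1):
    """Increase the weight in proportion to co-activation of pre and post."""
    return weight + lr * pre * post

w = 0.0
for pre, post in [(1, 1)] * 5 + [(1, 0)] * 5:   # co-active, then pre alone
    w = hebbian_update(w, pre, post)
# The weight grew during the five co-active steps and was unchanged afterward,
# so activity in one unit now facilitates activity in the other.
```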
At the level of systems, motor memories appear to follow a “cascade” pattern (Krakauer
& Shadmehr, 2006): motor learning initially excites and rapidly induces associative plas
ticity in one area, for example, primary motor cortex or cerebellar cortex. For a period of
hours, this area will remain excited, and the learning will be sensitive to disruption and
interference (Brashers-Krug, Shadmehr, & Bizzi, 1996; Stefan et al., 2006). With the pas
sage of time, the activation decreases, and the learning becomes more stable and resis
tant to interference (Shadmehr & Brashers-Krug, 1997). Note that this has been debated
(Caithness et al., 2004), but reducing the effects of anterograde interference, through ei
ther intermittent practice (Overduin, Richardson, Lane, Bizzi, & Press, 2006) or washout
blocks (Krakauer, Ghez, & Ghilardi, 2005), demonstrates consolidation.
Memories for motor acts are stored hierarchically. This hierarchy appears to hold true
both for abstract representations of movements in declarative memory (Schack & Mech
sner, 2006) as well as for the motor commands themselves (Grafton & Hamilton, 2007;
Krigolson & Holroyd, 2006; Paine & Tani, 2004; Thoroughman & Shadmehr, 2000; Ya
mamoto & Fujinami, 2008). This storage method may reflect a fundamental limitation of
working memory or attention for the number of simultaneous elements that can be oper
ated on at once. In other words, hierarchical representations are a form of chunking.
Analogy learning, discussed later in this chapter, may exploit this hierarchical framework
by providing a ready-made set of assembled motor primitives, bringing a learner more
quickly to a higher hierarchical level.
The search for the site of motor memory storage has mostly focused on areas known to be
active in late learning of well-practiced tasks. Classically, it was believed that primary mo
tor cortex merely housed commands for simple trajectories, and was controlled by premo
tor areas (Fulton, 1935). However, more recent electrophysiological work in primates has shown that motor cortex stores representations of target locations (Carpenter, Geor
gopoulos, & Pellizzer, 1999), motor sequences (Lu, 2005; Matsuzaka, Picard, & Strick,
2007), adaptations to a force field (Li, Padoa-Schioppa, & Bizzi, 2001), and a visual-motor
transformation (Paz, Boraud, Natan, Bergman, & Vaadia, 2003).
Whether and how other brain areas contribute to the storage of motor memories is less
clear. The cerebellum, commonly associated with motor tasks and motor learning, and ac
tive in the performance of well-learned movements, has been identified as a likely candi
date. However, its role is still debated. With regard to sequence learning, although it is
active in performance, the cerebellum is not necessary for learning or memory (Seidler et
al., 2002) unless movements are cued in a symbolic fashion (Bo, Peltier, Noll, & Seidler,
2011; Spencer & Ivry, 2008). The striatum has been extensively linked to acquisition and
storage of sequential representations (Debas et al., 2010; Jankowski, 2009; Seitz &
Roland, 1992), although a recent experiment provides a provocative challenge to this
viewpoint (Desmurget, 2010). In contrast, substantial evidence supports a role for the
cerebellum in housing internal models acquired during sensorimotor adaptation (Gray
don, Friston, Thomas, Brooks, & Menon, 2005; Imamizu et al., 2000, 2003; Seidler & Noll,
2008; Werner, 2010).
Studies investigating ways to maximize the effects of the slow learning process provide
some interesting implications for enhancing retention of acquired skills. For example,
Joiner and Smith (2008) demonstrated that retention 24 hours after learning does not de
pend on the level of performance that participants had attained at the end of learning,
which often is viewed as an indicator of the amount of learning that has taken place.
Rather, retention depends on the level that the slow learning process has reached at the
end of learning. Further, it seems that the fast and slow processes are not fixed but rather can be exploited to maximize retention (Huang & Shadmehr, 2009). When an adaptive
stimulus is introduced gradually, errors are small, and retention is better.
The distinction between the two memory systems and the existence of parallel neural
mechanisms underlying each was initially evidenced through the study of neurological pa
tients. Patients with lesions to the medial temporal lobe and in particular the hippocam
pus (as in the case of the well-studied patient H.M.) show amnesia, which is the inability
to learn, store, and recollect new information consciously (Cohen & Squire, 1980; Scoville
& Milner, 1957). Motor learning and memory systems are spared in amnesic patients, as
shown by intact mirror drawing and rotary pursuit task learning (Brooks & Baddeley,
1976; Cavaco, Anderson, Allen, Castro-Caldas, & Damasio, 2004; Corkin, 1968). These
same implicit tasks are impaired in patients with diseases affecting the basal ganglia
(Gabrieli, Stebbins, Singh, Willingham, & Goetz, 1997; Heindel, Salmon, Shults, Walicke,
& Butters, 1989). Neuroimaging studies also demonstrate medial temporal lobe and hip
pocampal activation during explicit learning (Luo & Niki, 2005; Montaldi et al., 1998;
Staresina & Davachi, 2009) and striatal involvement during implicit learning (Lieberman,
Chang, Chiao, Bookheimer, & Knowlton, 2004; Poldrack, Prabhakaran, Seger, & Gabrieli,
1999; Seger & Cincotta, 2006). The dorsolateral prefrontal cortex is also involved during
explicit learning (Barone & Joseph, 1989a, 1989b; Sakai et al., 1998; Toni, Krams, Turner,
& Passingham, 1998), whereas the cerebellum and the cortical motor areas are involved
during implicit learning (Ashe et al., 2006; Matsumura et al., 2004).
More recent evidence suggests that the explicit and implicit systems may affect and inter
act with one another (Willingham, 2001). For example, performance on an implicit memo
ry test can be influenced by explicit memory, and vice versa (Keane, Orlando, & Verfael
lie, 2006; Kleider & Goldinger, 2004; Tunney & Fernie, 2007; Voss, Baym, & Paller, 2008).
Moreover, the hippocampus and striatum, traditionally associated with explicit and im
plicit knowledge, respectively, have been shown to interact during probabilistic classifica
tion learning (Sadeh, Shohamy, Levy, Reggev, & Maril, 2011; Shohamy & Wagner, 2008).
Studies also suggest (p. 426) that the two systems can compete with one another during learning (Eichenbaum, Fagan, Mathews, & Cohen, 1988; Packard, Hirsh, & White, 1989;
Poldrack et al., 2001).
The interaction between the explicit and implicit learning and memory systems is also ev
ident during motor skill learning. The fact that amnesic patients can still learn visual-mo
tor tasks such as mirror drawing led many to think that explicit processes are not in
volved in motor skill learning. As a consequence, motor skill learning has been predomi
nantly viewed as part of the implicit learning system by researchers in the memory field
(Gabrieli, 1998; Henke, 2010; Squire, 1992). As explicated in this chapter, however, there
are both explicit and implicit forms of motor skill learning, and there is evidence that
these processes interact during the learning of new motor skills. Moreover, the heavy in
volvement of cognitive processes such as error detection and working memory during the
early stages of skill learning supports the notion that motor skills can benefit from explic
it instruction, at least early in the learning process. The following sections further discuss
the role of implicit and explicit processes in motor sequence learning and sensorimotor
adaptation.
Implicit and explicit learning mechanisms have been studied extensively in motor se
quence learning. There is still debate, however, as to whether the two systems interact
during learning. Behavioral experiments suggest that the two systems operate in parallel with each
other without interference (Curran & Keele, 1993; Song, Howard, & Howard, 2007). Dis
tinct neural mechanisms have also been reported for the two forms of sequence learning
(Destrebecqz et al., 2005; Honda et al., 1998; Karabanov et al., 2010).
On the other hand, there is also some evidence supporting the interaction of the two sys
tems during sequence learning. For example, the medial temporal lobe and striatum,
known to serve explicit and implicit processes, respectively, have both been shown to be
engaged during both implicit and explicit sequence learning (Schendan, Searl, Melrose, &
Stern, 2003; Schneider et al., 2010; Wilkinson, Khan, & Jahanshahi, 2009). Several stud
ies also show that explicit instructions interfere with implicit learning, supporting interac
tions between the two systems (Boyd & Winstein, 2004; Green & Flowers, 1991; Reber,
1976). However, if explicit knowledge has been acquired through practice, it does not in
terfere with implicit learning (Vidoni & Boyd, 2007). The interference effect between the
two systems during sequence learning has also been supported by neuroimaging data.
Destrebecqz and colleagues showed the expected activation in the striatum during implic
it learning; during explicit learning, the ACC and medial prefrontal cortex were also ac
tive, and this activation correlated negatively with striatal activation (Destrebecqz et al.,
2005). The authors interpreted this as suppression of the implicit learning system (i.e.,
the striatum) during explicit learning. Another study showed that the intention to learn a
sequence was associated with sustained right prefrontal cortex activation and an attenua
tion of learning-related changes in the medial temporal lobe and thalamus, resulting in
the failure of implicit learning (Fletcher et al., 2005).
Some studies also emphasize that the implicit and explicit memory systems share infor
mation. Destrebecqz and Cleeremans (2001) showed that participants were able to gener
ate an implicitly learned sequence when prompted, implying the existence of explicitly ac
quired sequence knowledge. An overlap of neural activation patterns between the two
learning systems including the striatum has also been shown in neuroimaging studies of
simultaneous explicit and implicit sequence learning (Aizenstein et al., 2004; Willingham,
Salidis, & Gabrieli, 2002).
A recent study demonstrates that the two memory systems are combined during the
learning process (Ghilardi, Moisello, Silvestri, Ghez, & Krakauer, 2009). In this study, the
authors report that the implicit and explicit systems consolidate differently and are differ
entially sensitive to interference, with explicit learning being more sensitive to antero
grade and implicit more sensitive to retrograde interference. They proposed that the ex
plicit acquisition of sequential order knowledge and the implicit acquisition of accuracy
can be combined for successful learning.
Whether and how implicit and explicit processes play a role in sensorimotor adaptation is
less well understood. It seems clear that cognitive processes contribute to the early
stages of adaptive learning, as outlined above. However, cognitively demanding tasks do
not necessitate reliance on explicit processes (cf. Galea et al., 2010; Nissen & Bullemer,
1987). It has been shown that participants who gain explicit awareness by the end of
adaptation exhibit learning (p. 427) advantages (Hwang, Smith, & Shadmehr, 2006; Wern
er & Bock, 2007). However, because participants were polled after the experimental pro
cedure, it is not clear whether explicit processes aided learning per se, or arose as a re
sult of the transformation having become well learned. Malone and Bastian (2010)
recently demonstrated that participants instructed how to consciously correct errors
during adaptation learned faster than those given no instructions, whereas participants
performing a secondary distractor task learned even more slowly.
Mazzoni and Krakauer (2006) also provided participants with an explicit strategy to
counter an imposed rotation. Their surprising result was that implicit adaptation to the
rotation proceeded despite the strategy, even though it made task performance worse.
Using the same paradigm, Taylor and colleagues (2010) repeated this experiment,
this time comparing healthy adults and patients with cerebellar ataxia. The patients with
cerebellar damage performed better than the controls; they were able to implement an
explicit strategy with little interference from implicit processes, whereas the control
group demonstrated progressively poorer performance as a result of recalibration. Al
though these findings are remarkable, the data do not necessarily support independence
of implicit and explicit systems or a complete takeover by implicit processes, as proposed
by Mazzoni and Krakauer (2006): The errors made by normal controls as a result of their
adaptation only reached about one-third of the magnitude of the total rotation. These findings provide further support for the role of the cerebellum in implicit adaptation, as
separate from explicit processes.
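The logic of these findings can be sketched computationally: if an explicit strategy cancels the rotation but implicit learning is driven by the error between the aiming location and the cursor, adaptation accumulates anyway and drags the cursor off target. This is a hedged caricature of the paradigm, not the authors' model, and all parameters are illustrative:

```python
# Caricature of implicit adaptation proceeding despite an explicit strategy
# (in the spirit of Mazzoni & Krakauer, 2006); parameters are illustrative.

def strategy_with_implicit_drift(rotation=45.0, n_trials=60,
                                 retention=0.98, rate=0.1):
    aim = -rotation              # explicit strategy fully counters the rotation
    implicit = 0.0               # implicit adaptation state
    target_errors = []
    for _ in range(n_trials):
        hand = aim - implicit                 # implicit state shifts the hand
        cursor = hand + rotation              # rotated visual feedback
        target_errors.append(cursor)          # target sits at 0 degrees
        aim_error = cursor - aim              # error relative to the aim point
        implicit = retention * implicit + rate * aim_error
    return target_errors

errors = strategy_with_implicit_drift()
# Performance starts perfect (error 0) but drifts away from the target as
# implicit adaptation accumulates, worsening performance over trials.
assert errors[0] == 0.0 and abs(errors[-1]) > abs(errors[1])
```

The key design point is that the implicit update ignores the task goal (the target) and responds only to the rotation-induced discrepancy, which is why the explicit strategy cannot prevent the drift.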
That the two systems may be independent is further supported by work from Sulzenbruck
and Heuer (2009). In their task, drawing circles under changing gain conditions, explicit
and implicit processes were placed both in cooperation and in opposition; the explicit
strategy either complemented or opposed the gain change. Under these circumstances,
and when participants never fully adapted to the gain change, the two systems operated
in isolation: they did not interfere with one another, and their effects were purely summa
tive.
When the explicit and implicit systems are not placed in direct conflict, the evidence is
clearer: their contributions appear complementary and non-interacting. In a study by
Mazzoni and Wexler (2009), an explicit rule could be used to select a
target, whereas implicit processes adapted to a visual-motor rotation without interfer
ence. Interestingly, this study indicates that the two processes have a potential to inter
act. Patients in a presymptomatic stage of Huntington’s disease were unable to maintain
pure separation, and their adaptation performance suffered when explicit task control
was required. In all, a clear conclusion regarding the role of explicit control in adaptation
remains elusive, as does an understanding of the brain structures and networks involved.
Future work with careful attention to methodological detail is required to address these
issues.
Numerous studies have shown that choking under pressure occurs when a highly learned,
automatic skill is brought back into explicit awareness, leaving it prone to breakdown
under conditions of stress (Baumeister, 1984; Beilock & Carr, 2001; Beilock, Carr,
MacMahon, & Starkes, 2002). Although little neuro
science research has been dedicated to this phenomenon, presumably learners show de
creased performance in a well-learned skill when the focus of attention engages brain
networks more actively involved in initial learning, such as the prefrontal cortex and ACC
networks (see Figure 20.1, arrow on right). Researchers studying skilled performance in
How does the way that individuals learn affect their acquisition and robustness of a new
motor skill? For example, if skill acquisition requires going from a highly cognitive explic
it representation to a procedural implicit representation, are there ways to affect the
learning process so that the motor program is initially stored implicitly? This would
have the added benefit of making the skill more resistant to choking under pressure be
cause there would not be a strong explicit motor program of the skill to bring to con
scious awareness. In analogy learning, one simple heuristic is provided to the learner
(Koedijker, Oudejans, & Beek, 2008). In following this single rule, it is believed that
learners eschew hypothesis testing during skill learning. Thus, little explicit knowledge
about the task is accumulated, and demands on error detection and working memory
processes are kept to a minimum (Koedijker, Oudejans, & Beek, 2008). From a neuro
science perspective this would be akin to “bypassing” the highly cognitive prefrontal
brain networks and encoding the skill directly into more robust subcortical brain regions
such as the striatum and cerebellum (see Figure 20.1, arrow on left). For example, in a
study by Liao and Masters (2001), participants were instructed to hit a fore
hand shot in table tennis by moving the hand along an imaginary hypotenuse of a right
triangle. Analogy learners acquired few explicit rules about the task, and showed similar
performance compared with participants who had learned the skill explicitly. When a con
current secondary task was added, the performance of the explicit learners was signifi
cantly reduced. However, no significant performance impairments were seen in the analo
gy (implicit learning) group under dual-task conditions.
When new techniques are applied to the study of skill learning mechanisms, investigators
typically focus first on primary motor cortex. Although this structure is both accessible
and strongly associated with skill learning, a complete understanding of the neural mech
anisms underlying learning will come only from investigation of interactions between and
integration of cognitive and motor networks. Moreover, investigation of ecologically valid
skills in more naturalistic settings with the use of mobile brain imaging techniques (Gwin,
Gramann, Makeig, & Ferris, 2010, 2011) will bring further validity and translational ca
pacity to the study of skill learning.
Learning a new motor skill places demands not only on the motor system but also on cog
nitive processes. For example, working memory appears to be relied on for manipulating
motor commands for subsequent actions based on recently experienced (p. 429) errors.
Moreover, it may be used for chunking movement elements into hierarchical sequence
representations. Interestingly, individual differences in working memory capacity are pre
dictive of the ability to learn new motor skills. It should be noted that engagement of
working memory, error processing, and attentional processes and networks does not
necessitate that skill learning is explicit. As skill learning progresses, cognitive demands
are reduced, as reflected by decreasing interference from secondary tasks and reduced
activation of frontal-parietal brain networks.
• Do the implicit and explicit memory systems interact, either positively (reflected as
transfer) or negatively (reflected as interference), for both motor sequence learning
and sensorimotor adaptation?
• What role do genetic, experiential, and other individual differences variables play in
skill learning?
• What role do state variables, such as motivation, arousal, and fatigue, play in skill
learning?
• How can a precise understanding of the neurocognitive mechanisms of skill learning
be exploited to enhance rehabilitation, learning, and performance?
Author Note
This work was supported by the National Institutes of Health (R01 AG 24106 S1 and T32-AG00114).
References
Abeele, S., & Bock, O. (2001). Mechanisms for sensorimotor adaptation to rotated visual input. Experimental Brain Research, 139, 248–253.
Abel, T., & Lattal, K. M. (2001). Molecular mechanisms of memory acquisition, consolida
tion and retrieval. Current Opinion in Neurobiology, 11 (2), 180–187.
Aizenstein, H. J., Stenger, V. A., Cochran, J., Clark, K., Johnson, M., Nebes, R. D., et al.
(2004). Regional brain activation during concurrent implicit and explicit sequence learn
ing. Cerebral Cortex, 14 (2), 199–208.
Anguera, J. A., Reuter-Lorenz, P. A., Willingham, D. T., & Seidler, R. D. (2009). Contribu
tions of spatial working memory to visuomotor learning. Journal of Cognitive Neuro
science, 22 (9), 1917–1930.
Anguera, J. A., Seidler, R. D., & Gehring, W. J. (2009). Changes in performance monitoring
during sensorimotor adaptation. Journal of Neurophysiology, 102 (3), 1868–1879.
Aristotle (1908). The works of Aristotle (W. D. Ross & J. A. Smith, Eds.). Oxford, UK: Clarendon Press.
Ashe, J., Lungu, O. V., Basford, A. T., & Lu, X. (2006). Cortical control of motor sequences.
Current Opinion in Neurobiology, 16, 213–221.
Atzler, E., & Herbst, R. (1927). Arbeitsphysiologische Studien. Pflügers Archiv European
Journal of Physiology, 215 (1), 291–328.
Baddeley, A. D., & Hitch, G. J. (1974). Working memory. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 8, pp. 47–89). New York: Academic Press.
Barone, P., & Joseph, J. P. (1989a). Prefrontal cortex and spatial sequencing in macaque
monkey. Experimental Brain Research, 78 (3), 447–464.
Barone, P., & Joseph, J. P. (1989b). Role of the dorsolateral prefrontal cortex in organizing
visually guided behavior. Brain Behavior and Evolution, 33 (2-3), 132–135.
Bastian, A. J. (2006). Learning to predict the future: the cerebellum adapts feedforward
movement control. Current Opinion in Neurobiology, 16 (6), 645–649.
Beilock, S. L., & Carr, T. H. (2001). On the fragility of skilled performance: what governs
choking under pressure? Journal of Experimental Psychology: General, 130 (4), 701–725.
Beilock, S. L., Carr, T. H., MacMahon, C., & Starkes, J. L. (2002). When paying attention
becomes counterproductive: Impact of divided versus skill-focused attention on novice
and experienced performance of sensorimotor skills. Journal of Experimental Psychology:
Applied, 8 (1), 6–16.
Berns, G. S., Cohen, J. D., & Mintun, M. A. (1997). Brain regions responsive to novelty in
the absence of awareness. Science, 276 (5316), 1272–1275.
Bo, J., Borza, V., & Seidler, R. D. (2009). Age-related declines in visuospatial working
memory correlate with deficits in explicit motor sequence learning. Journal of Neurophys
iology, 102, 2744–2754.
Bo, J., Peltier, S., Noll, D., & Seidler, R. D. (2011). Symbolic representations in motor sequence learning. NeuroImage, 54 (1), 417–426.
Bo, J., & Seidler, R. D. (2009). Visuospatial working memory capacity predicts the organi
zation of acquired explicit motor sequences. Journal of Neurophysiology, 101 (6), 3116–
3125.
Botvinick, M. M., Braver, T. S., Barch, D. M., Carter, C. S., & Cohen, J. D. (2001). Conflict
monitoring and cognitive control. Psychological Review, 108 (3), 624–652.
Boyd, L. A., & Winstein, C. J. (2004). Providing explicit information disrupts implicit motor
learning after basal ganglia stroke. Learning and Memory, 11 (4), 388–396.
Brashers-Krug, T., Shadmehr, R., & Bizzi, E. (1996). Consolidation in human motor memory. Nature, 382, 252–255.
Brooks, D. N., & Baddeley, A. D. (1976). What can amnesic patients learn? Neuropsycholo
gia, 14 (1), 111–122.
Caithness, G., Osu, R., Bays, P., Chase, H., Klassen, J., Kawato, M., et al. (2004). Failure to
consolidate the consolidation theory of learning for sensorimotor adaptation tasks. Jour
nal of Neuroscience, 24 (40), 8662–8671.
Cajal, S. R. (1894). The Croonian Lecture: La fine structure des centres nerveux. Proceed
ings of the Royal Society of London, 55, 444–468.
Carpenter, A. F., Georgopoulos, A. P., & Pellizzer, G. (1999). Motor cortical encoding of se
rial order in a context-recall task. Science, 283 (5408), 1752–1757.
Castaneda, B., & Gray, R. (2007). Effects of focus of attention on baseball batting perfor
mance in players of different skill levels. Journal of Sport & Exercise Psychology, 29, 60–
77.
Cavaco, S., Anderson, S. W., Allen, J. S., Castro-Caldas, A., & Damasio, H. (2004). The
scope of preserved procedural memory in amnesia. Brain, 127 (Pt 8), 1853–1867.
Chapman, H. L., Eramudugolla, R., Gavrilescu, M., Strudwick, M. W., Loftus, A., Cunning
ton, R., et al. (2010). Neural mechanisms underlying spatial realignment during adapta
tion to optical wedge prisms. Neuropsychologia, 48 (9), 2595–2601.
Clower, D. M., Hoffman, J. M., Votaw, J. R., Faber, T. L., Woods, R. P., & Alexander, G. E.
(1996). Role of posterior parietal cortex in the recalibration of visually guided reaching.
Nature, 383 (6601), 618–621.
Cohen, N. J., & Squire, L. R. (1980). Preserved learning and retention of pattern-analyzing
skill in amnesia: Dissociation of knowing how and knowing that. Science, 210 (4466), 207–
210.
Conley, D. L., & Krahenbuhl, G. S. (1980). Running economy and distance running perfor
mance of highly trained athletes. Medicine and Science in Sports and Exercise, 12 (5),
357–360.
Corkin, S. (1968). Acquisition of motor skill after bilateral medial temporal-lobe excision.
Neuropsychologia, 6 (3), 255–265.
Criscimagna-Hemminger, S. E., Bastian, A. J., & Shadmehr, R. (2010). Size of error affects
cerebellar contributions to motor learning. Journal of Neurophysiology, 103 (4), 2275–
2284.
Curran, T., & Keele, S. W. (1993). Attentional and nonattentional forms of sequence learn
ing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19 (1), 189–
202.
Danckert, J., Ferber, S., & Goodale, M. A. (2008). Direct effects of prismatic lenses on vi
suomotor control: An event-related functional MRI study. European Journal of Neuro
science, 28 (8), 1696–1704.
Debas, K., Carrier, J., Orban, P., Barakat, M., Lungu, O., Vandewalle, G., Hadj Tahar, A.,
Bellec, P., Karni, A., Ungerleider, L. G., Benali, H., & Doyon, J. (2010). Brain plasticity re
lated to the consolidation of motor sequence learning and adaptation. Proceedings of the
National Academy of Sciences U S A, 107 (41), 17839–17844.
Della-Maggiore, V., Scholz, J., Johansen-Berg, H., & Paus, T. (2009). The rate of visuomo
tor adaptation correlates with cerebellar white-matter microstructure. Human Brain Map
ping, 30 (12), 4048–4053.
den Ouden, H. E., Daunizeau, J., Roiser, J., Friston, K. J., & Stephan, K. E. (2010). Striatal
prediction error modulates cortical coupling. Journal of Neuroscience, 30 (9), 3210–3219.
Desmurget, M., & Turner, R. S. (2010). Motor sequences and the basal ganglia: Kinematics, not habits. Journal of Neuroscience, 30 (22), 7685–7690.
Desmurget, M., Vindras, P., Grea, H., Viviani, P., & Grafton, S. T. (2000). Proprioception
does not quickly drift during visual occlusion. Experimental Brain Research, 134 (3), 363–
377.
Destrebecqz, A., & Cleeremans, A. (2001). Can sequence learning be implicit? New evi
dence with the process dissociation procedure. Psychonomic Bulletin and Review, 8 (2),
343–350.
Destrebecqz, A., Peigneux, P., Laureys, S., Degueldre, C., Del Fiore, G., Aerts, J., et al.
(2005). The neural correlates of implicit and explicit sequence learning: Interacting net
works revealed by the process dissociation procedure. Learning and Memory, 12 (5), 480–
490.
Diedrichsen, J., Hashambhoy, Y., Rane, T., & Shadmehr, R. (2005). Neural correlates of
reach errors. Journal of Neuroscience, 25 (43), 9919–9931.
Diedrichsen, J., Shadmehr, R., & Ivry, R. B. (2009). The coordination of movement: Opti
mal feedback control and beyond. Trends in Cognitive Sciences, 14 (1), 31–39.
Diedrichsen, J., Verstynen, T., Lehman, S. L., & Ivry, R. B. (2005). Cerebellar involvement
in anticipating the consequences of self-produced actions during bimanual movements.
Journal of Neurophysiology, 93 (2), 801–812.
Donchin, O., Francis, J. T., & Shadmehr, R. (2003). Quantifying generalization from trial-
by-trial behavior of adaptive systems that learn with basis functions: Theory and experi
ments in human motor control. Journal of Neuroscience, 23 (27), 9032–9045.
Donchin, O., Sawaki, L., Madupu, G., Cohen, L. G., & Shadmehr, R. (2002). Mechanisms
influencing acquisition and recall of motor memories. Journal of Neurophysiology, 88 (4),
2114–2123.
Doyon, J., & Benali, H. (2005). Reorganization and plasticity in the adult brain during
learning of motor skills. Current Opinion in Neurobiology, 15 (2), 161–167.
Doyon, J., Owen, A. M., Petrides, M., Sziklas, V., & Evans, A. C. (1996). Functional anato
my of visuomotor skill learning in human subjects examined with positron emission to
mography. European Journal of Neuroscience, 8 (4), 637–648.
Eichenbaum, H., Fagan, A., Mathews, P., & Cohen, N. J. (1988). Hippocampal system dys
function and odor discrimination learning in rats: Impairment or facilitation depending on
representational demands. Behavioral Neuroscience, 102 (3), 331–339.
Eimer, M., Goschke, T., Schlaghecken, F., & Sturmer, B. (1996). Explicit and implicit
learning of event sequences: evidence from event-related brain potentials. Journal of Ex
perimental Psychology: Learning, Memory, and Cognition, 22 (4), 970–987.
Ekstrom, R. B., French, J. W., Harman, H. H., et al. (1976). Manual for kit of factor-referenced cognitive tests. Princeton, NJ: Educational Testing Service.
Enoka, R. M. (1988). Load- and skill-related changes in segmental contributions to a weightlifting movement. Medicine and Science in Sports and Exercise, 20 (2), 178–187.
Ericsson, K. A., Chase, W. G., & Faloon, S. (1980). Acquisition of a memory skill. Science,
208, 1181–1182.
Eversheim, U., & Bock, O. (2001). Evidence for processing stages in skill acquisition: A
dual-task study. Learning and Memory, 8 (4), 183–189.
Falkenstein, M., Hohnsbein, J., & Hoormann, J. (1995). Event-related potential correlates
of errors in reaction tasks. Electroencephalographic and Clinical Neurophysiology Sup
plement, 44, 287–296.
Falkenstein, M., Hohnsbein, J., Hoormann, J., & Blanke, L. (1991). Effects of crossmodal
divided attention on late ERP components. II. Error processing in choice reaction tasks.
Electroencephalography and Clinical Neurophysiology, 78 (6), 447–455.
Ferdinand, N. K., Mecklinger, A., & Kray, J. (2008). Error and deviance processing in im
plicit and explicit sequence learning. Journal of Cognitive Neuroscience, 20 (4), 629–642.
Fitts, P. M., & Posner, M. I. (1967). Human performance. Belmont, CA: Brooks/Cole.
Flanagan, J. R., Vetter, P., Johansson, R. S., & Wolpert, D. M. (2003). Prediction precedes
control in motor learning. Current Biology, 13 (2), 146–150.
Fletcher, P. C., Zafiris, O., Frith, C. D., Honey, R. A., Corlett, P. R., Zilles, K., et al. (2005).
On the benefits of not trying: Brain activity and connectivity reflecting the interactions of
explicit and implicit sequence learning. Cerebral Cortex, 15 (7), 1002–1015.
Fregni, F., Boggio, P. S., Valle, A. C., Rocha, R. R., Duarte, J., Ferreira, M. J., Wagner, T.,
Fecteau, S., Rigonatti, S. P., Riberto, M., Freedman, S. D., & Pascual-Leone, A. (2006). A
sham-controlled trial of a 5-day course of repetitive transcranial magnetic stimulation of
the unaffected hemisphere in stroke patients. Stroke, 37, 2115–2122.
Fulton, J. F. (1935). A note on the definition of the “motor” and “premotor” areas. Brain,
58 (2), 311.
Gabrieli, J. D., Stebbins, G. T., Singh, J., Willingham, D. B., & Goetz, C. G. (1997). Intact
mirror-tracing and impaired rotary-pursuit skill learning in patients with Huntington’s
disease: Evidence for dissociable memory systems in skill learning. Neuropsychology, 11
(2), 272–281.
Galea, J. M., Sami, S. A., Albert, N. B., & Miall, R. C. (2010). Secondary tasks impair adap
tation to step- and gradual-visual displacements. Experimental Brain Research, 202 (2),
473–484.
Garrison, K. A., Winstein, C. J., & Aziz-Zadeh, L. (2010). The mirror neuron system: A
neural substrate for methods in stroke rehabilitation. Neurorehabilitation and Neural Re
pair, 24 (5), 404–412.
Gehring, W. J., Coles, M. G., Meyer, D. E., & Donchin, E. (1995). A brain potential manifes
tation of error-related processing. Electroencephalography and Clinical Neurophysiology
Suppl, 44, 261–272.
Gehring, W. J., Goss, B., Coles, M. G. H., Meyer, D. E., & Donchin, E. (1993). A neural sys
tem for error detection and compensation. Psychological Science, 4 (6), 385–390.
Gepshtein, S., Seydell, A., & Trommershauser, J. (2007). Optimality of human movement under natural variations of visual-motor uncertainty. Journal of Vision, 7 (5), 13, 1–18.
Ghilardi, M. F., Gordon, J., & Ghez, C. (1995). Learning a visuomotor transformation in a
local area of work space produces directional biases in other areas. Journal of Neurophys
iology, 73 (6), 2535–2539.
Ghilardi, M. F., Moisello, C., Silvestri, G., Ghez, C., & Krakauer, J. W. (2009). Learning of a
sequential motor skill comprises explicit and implicit components that consolidate differ
ently. Journal of Neurophysiology, 101 (5), 2218–2229.
Grafton, S. T., & Hamilton, A. F. (2007). Evidence for a distributed hierarchy of action rep
resentation in the brain. Human Movement Science, 26 (4), 590–616.
Grafton, S. T., Mazziotta, J. C., Presty, S., Friston, K. J., Frackowiak, R. S., & Phelps, M. E.
(1992). Functional anatomy of human procedural learning determined with regional cere
bral blood flow and PET. Journal of Neuroscience, 12 (7), 2542–2548.
Grafton, S. T., Mazziotta, J. C., Woods, R. P., & Phelps, M. E. (1992). Human functional
anatomy of visually guided finger movements. Brain, 115 (Pt 2), 565–587.
Grafton, S. T., Woods, R. P., Mazziotta, J. C., & Phelps, M. E. (1991). Somatotopic mapping
of the primary motor cortex in humans: Activation studies with cerebral blood flow and
positron emission tomography. Journal of Neurophysiology, 66 (3), 735–743.
Graydon, F., Friston, K., Thomas, C., Brooks, V., & Menon, R. (2005). Learning-related fM
RI activation associated with a rotational visuo-motor transformation. Brain Research:
Cognitive Brain Research, 22 (3), 373–383.
Green, T. D., & Flowers, J. H. (1991). Implicit versus explicit learning processes in a prob
abilistic, continuous fine-motor catching task. Journal of Motor Behavior, 23 (4), 293–300.
Gwin, J. T., Gramann, K., Makeig, S., & Ferris, D. P. (2010). Removal of movement artifact
from high-density EEG recorded during walking and running. Journal of Neurophysiology,
103 (6), 3526–3534.
Gwin, J. T., Gramann, K., Makeig, S., & Ferris, D. P. (2011). Electrocortical activity is cou
pled to gait cycle phase during treadmill walking. NeuroImage, 54 (2), 1289–1296.
Hebb, D. O. (1949). The organization of behavior. New York: Wiley & Sons.
Heindel, W. C., Salmon, D. P., Shults, C. W., Walicke, P. A., & Butters, N. (1989). Neuropsy
chological evidence for multiple implicit memory systems: A comparison of Alzheimer’s,
Huntington’s, and Parkinson’s disease patients. Journal of Neuroscience, 9 (2), 582–587.
Henke, K. (2010). A model for memory systems based on processing modes rather than
consciousness. Nature Reviews Neuroscience, 11 (7), 523–532.
Hikosaka, O., & Isoda, M. (2010). Switching from automatic to controlled behavior: corti
co-basal ganglia mechanisms. Trends in Cognitive Sciences, 14 (4), 154–161.
Hikosaka, O., Nakahara, H., Rand, M. K., Sakai, K., Lu, X., Nakamura, K., et al. (1999). Parallel neural networks for learning sequential procedures. Trends in Neurosciences, 22, 464–471.
Holt, K. G., Hamill, J., & Andres, R. O. (1991). Predicting the minimal energy costs of human walking. Medicine and Science in Sports and Exercise, 23 (4), 491.
Honda, M., Deiber, M. P., Ibanez, V., Pascual-Leone, A., Zhuang, P., & Hallett, M. (1998).
Dynamic cortical involvement in implicit and explicit motor sequence learning: A PET
study. Brain, 121 (Pt 11), 2159–2173.
Huang, V. S., & Shadmehr, R. (2009). Persistence of motor memories reflects statistics of
the learning event. Journal of Neurophysiology, 102 (2), 931–940.
Hwang, E. J., Smith, M. A., & Shadmehr, R. (2006). Dissociable effects of the implicit and
explicit memory systems on learning control of reaching. Experimental Brain Research,
173 (3), 425–437.
Imamizu, H., Kuroda, T., Miyauchi, S., Yoshioka, T., & Kawato, M. (2003). Modular organi
zation of internal models of tools in the human cerebellum. Proceedings of the National
Academy of Sciences U S A, 100 (9), 5461–5466.
Imamizu, H., Kuroda, T., Yoshioka, T., & Kawato, M. (2004). Functional magnetic reso
nance imaging examination of two modular architectures for switching multiple internal
models. Journal of Neuroscience, 24 (5), 1173–1181.
Imamizu, H., Miyauchi, S., Tamada, T., Sasaki, Y., Takino, R., Putz, B., et al. (2000). Hu
man cerebellar activity reflecting an acquired internal model of a new tool. Nature, 403
(6766), 192–195.
Isoda, M., & Hikosaka, O. (2007). Switching from automatic to controlled action by mon
key medial frontal cortex. Nature Neuroscience, 10 (2), 240–248.
Ito, M. (2002). Historical review of the significance of the cerebellum and the role of Purkinje cells in motor learning. Annals of the New York Academy of Sciences, 978, 273–288.
Jankowski, J., Scheef, L., Hüppe, C., & Boecker, H. (2009). Distinct striatal regions for
planning and executing novel and automated movement sequences. NeuroImage, 44 (4),
1369–1379.
Joiner, W. M., & Smith, M. A. (2008). Long-term retention explained by a model of short-
term learning in the adaptive control of reaching. Journal of Neurophysiology, 100 (5),
2948–2955.
Jones, A. M. (1998). A five year physiological case study of an Olympic runner. British
Journal of Sports Medicine, 32 (1), 39–43.
Jonides, J., Lewis, R. L., Nee, D. E., Lustig, C. A., Berman, M. G., & Moore, K. S. (2008).
The mind and brain of short-term memory. Annual Review of Psychology, 59, 193–224.
Jonides, J., Smith, E. E., Koeppe, R. A., Awh, E., Minoshima, S., & Mintun, M. A. (1993). Spatial working memory in humans as revealed by PET. Nature, 363, 623–625.
Jordan, K., Heinze, H. J., Lutz, K., Kanowski, M., & Jancke, L. (2001). Cortical activations during the mental rotation of different visual objects. NeuroImage, 13, 143–152.
Jueptner, M., Frith, C. D., Brooks, D. J., Frackowiak, R. S., & Passingham, R. E. (1997).
Anatomy of motor learning. II. Subcortical structures and learning by trial and error. Jour
nal of Neurophysiology, 77 (3), 1325–1337.
Jueptner, M., Stephan, K. M., Frith, C. D., Brooks, D. J., Frackowiak, R. S., & Passingham,
R. E. (1997). Anatomy of motor learning. I. Frontal cortex and attention to action. Journal
of Neurophysiology, 77 (3), 1313–1324.
Karabanov, A., Cervenka, S., de Manzano, O., Forssberg, H., Farde, L., & Ullen, F. (2010).
Dopamine D2 receptor density in the limbic striatum is related to implicit but not explicit
movement sequence learning. Proceedings of the National Academy of Sciences U S A,
107 (16), 7574–7579.
Karni, A., Meyer, G., Jezzard, P., Adams, M. M., Turner, R., & Ungerleider, L. G. (1995).
Functional MRI evidence for adult motor cortex plasticity during motor skill learning. Na
ture, 377 (6545), 155–158.
Karni, A., Meyer, G., Rey-Hipolito, C., Jezzard, P., Adams, M. M., Turner, R., et al. (1998).
The acquisition of skilled motor performance: Fast and slow experience-driven changes in
primary motor cortex. Proceedings of the National Academy of Sciences U S A, 95 (3),
861–868.
Kawato, M. (1999). Internal models for motor control and trajectory planning. Current
Opinion in Neurobiology, 9 (6), 718–727.
Keane, M. M., Orlando, F., & Verfaellie, M. (2006). Increasing the salience of fluency cues
reduces the recognition memory impairment in amnesia. Neuropsychologia, 44 (5), 834–
839.
Kim, S. G., Ashe, J., Georgopoulos, A. P., Merkle, H., Ellermann, J. M., Menon, R. S., et al.
(1993). Functional imaging of human motor cortex at high magnetic field. Journal of Neu
rophysiology, 69 (1), 297–302.
Kleider, H. M., & Goldinger, S. D. (2004). Illusions of face memory: Clarity breeds famil
iarity. Journal of Memory and Language, 50 (2), 196–211.
Koedijker, J. M., Oudejans, R. R. D., & Beek P. J. (2008). Table tennis performance, follow
ing explicit and analogy learning over 10,000 repetitions. International Journal of Sport
Psychology, 39, 237–256.
Krakauer, J. W., Ghez, C., & Ghilardi, M. F. (2005). Adaptation to visuomotor transforma
tions: Consolidation, interference, and forgetting. Journal of Neuroscience, 25 (2), 473–
478.
Krakauer, J. W., Pine, Z. M., Ghilardi, M. F., & Ghez, C. (2000). Learning of visuomotor
transformations for vectorial planning of reaching trajectories. Journal of Neuroscience,
20 (23), 8916–8924.
Krakauer, J. W., & Shadmehr, R. (2006). Consolidation of motor memory. Trends in Neuro
sciences, 29 (1), 58–64.
Krigolson, O. E., & Holroyd, C. B. (2006). Evidence for hierarchical error processing in
the human brain. Neuroscience, 137 (1), 13–17.
Krigolson, O. E., & Holroyd, C. B. (2007a). Hierarchical error processing: different errors,
different systems. Brain Research, 1155, 70–80.
Krigolson, O. E., & Holroyd, C. B. (2007b). Predictive information and error processing:
The role of medial-frontal cortex during motor control. Psychophysiology, 44 (4), 586–595.
Krigolson, O. E., Holroyd, C. B., Van Gyn, G., & Heath, M. (2008). Electroencephalograph
ic correlates of target and outcome errors. Experimental Brain Research, 190 (4), 401–
411.
Lalazar, H., & Vaadia, E. (2008). Neural basis of sensorimotor learning: Modifying inter
nal models. Current Opinion in Neurobiology, 18 (6), 573–581.
Li, C. S., Padoa-Schioppa, C., & Bizzi, E. (2001). Neuronal correlates of motor perfor
mance and motor learning in the primary motor cortex of monkeys adapting to an exter
nal force field. Neuron, 30 (2), 593–607.
Liao, C. M., & Masters, R. S. W. (2001). Analogy learning: A means to implicit motor learning. Journal of Sports Sciences, 19 (5), 307–319.
Lieberman, M. D., Chang, G. Y., Chiao, J., Bookheimer, S. Y., & Knowlton, B. J. (2004). An
event-related fMRI study of artificial grammar learning in a balanced chunk strength de
sign. Journal of Cognitive Neuroscience, 16 (3), 427–438.
Logie, R. H., Della Sala, S., Beschin, N., & Denis, M. (2005). Dissociating mental transformations and visuo-spatial storage in working memory: Evidence from representational neglect. Memory, 13, 430–434.
Lu, X., & Ashe, J. (2005). Anticipatory activity in primary motor cortex codes memorized
movement sequences. Neuron, 45 (6), 967–973.
Luck, S. J., & Vogel, E. K. (1997). The capacity of visual working memory for features and
conjunctions. Nature, 390, 279–281.
Luo, J., & Niki, K. (2005). Does hippocampus associate discontiguous events? Evidence
from event-related fMRI. Hippocampus, 15 (2), 141–148.
Malone, L. A., & Bastian, A. J. (2010). Thinking about walking: Effects of conscious correction versus distraction on locomotor adaptation. Journal of Neurophysiology, 103 (4), 1954–1962.
Maloney, L. T., & Mamassian, P. (2009). Bayesian decision theory as a model of human vi
sual perception: Testing Bayesian transfer. Visual Neuroscience, 26 (1), 147–155.
Martin, S. J., Grimwood, P. D., & Morris, R. G. (2000). Synaptic plasticity and memory: An
evaluation of the hypothesis. Annual Review of Neuroscience, 23, 649–711.
Maschke, M., Gomez, C. M., Ebner, T. J., & Konczak, J. (2004). Hereditary cerebellar atax
ia progressively impairs force adaptation during goal-directed arm movements. Journal of
Neurophysiology, 91 (1), 230–238.
Matsumura, M., Sadato, N., Kochiyama, T., Nakamura, S., Naito, E., Matsunami, K., et al.
(2004). Role of the cerebellum in implicit motor skill learning: a PET study. Brain Re
search Bulletin, 63 (6), 471–483.
Matsuzaka, Y., Picard, N., & Strick, P. L. (2007). Skill representation in the primary motor
cortex after long-term practice. Journal of Neurophysiology, 97 (2), 1819–1832.
Mazzoni, P., & Krakauer, J. W. (2006). An implicit plan overrides an explicit strategy dur
ing visuomotor adaptation. Journal of Neuroscience, 26 (14), 3642–3645.
Mazzoni, P., & Wexler, N. S. (2009). Parallel explicit and implicit control of reaching. PLoS
One, 4 (10), e7557.
McNitt-Gray, J. L., Requejo, P. S., & Flashner, H. (2006). Multijoint control strategies
transfer between tasks. Biological Cybernetics, 94 (6), 501–510.
Miall, R. C., Christensen, L. O., Cain, O., & Stanley, J. (2007). Disruption of state estimation in the human lateral cerebellum. PLoS Biology, 5 (11), e316.
Miall, R. C., Weir, D. J., Wolpert, D. M., & Stein, J. F. (1993). Is the cerebellum a Smith predictor? Journal of Motor Behavior, 25 (3), 203–216.
Miyake, A., & Shah, P. (1999). Models of working memory: Mechanisms of active mainte
nance and executive control. New York: Cambridge University Press.
Montaldi, D., Mayes, A. R., Barnes, A., Pirie, H., Hadley, D. M., Patterson, J., et al. (1998).
Associative encoding of pictures activates the medial temporal lobes. Human Brain Map
ping, 6 (2), 85–104.
Newell, K. M. (1991). Motor skill acquisition. Annual Review of Psychology, 42, 213–237.
Nishimoto, R., & Tani, J. (2009). Development of hierarchical structures for actions and
motor imagery: a constructivist view from synthetic neuro-robotics study. Psychological
Research, 73 (4), 545–558.
Nissen, M. J., & Bullemer, P. (1987). Attentional requirements of learning: Evidence from
performance measures. Cognitive Psychology, 19 (1), 1–32.
Ohbayashi, M., Ohki, K., & Miyashita, Y. (2003). Conversion of working memory to motor sequence in the monkey premotor cortex. Science, 301, 233–236.
Overduin, S. A., Richardson, A. G., Lane, C. E., Bizzi, E., & Press, D. Z. (2006). Intermit
tent practice facilitates stable motor memories. Journal of Neuroscience, 26 (46), 11888–
11892.
Packard, M. G., Hirsh, R., & White, N. M. (1989). Differential effects of fornix and caudate
nucleus lesions on two radial maze tasks: evidence for multiple memory systems. Journal
of Neuroscience, 9 (5), 1465–1472.
Paine, R. W., & Tani, J. (2004). Motor primitive and sequence self-organization in a hierar
chical recurrent neural network. Neural Networks, 17 (8-9), 1291–1309.
Pascual-Leone, A., Brasil-Neto, J. P., Valls-Sole, J., Cohen, L. G., & Hallett, M. (1992). Sim
ple reaction time to focal transcranial magnetic stimulation: Comparison with reaction
time to acoustic, visual and somatosensory stimuli. Brain, 115 (Pt 1), 109–122.
Pascual-Leone, A., Grafman, J., Clark, K., Stewart, M., Massaquoi, S., Lou, J. S., et al.
(1993). Procedural learning in Parkinson’s disease and cerebellar degeneration. Annals of
Neurology, 34 (4), 594–602.
Pascual-Leone, A., Grafman, J., & Hallett, M. (1994). Modulation of cortical motor output
maps during development of implicit and explicit knowledge. Science, 263 (5151), 1287–
1289.
Pascual-Leone, A., & Torres, F. (1993). Plasticity of the sensorimotor cortex representa
tion of the reading finger in Braille readers. Brain, 116 (Pt 1), 39–52.
Pascual-Leone, A., Valls-Sole, J., Wassermann, E. M., Brasil-Neto, J., Cohen, L. G., & Hal
lett, M. (1992). Effects of focal transcranial magnetic stimulation on simple reaction time
to acoustic, visual and somatosensory stimuli. Brain, 115 (Pt 4), 1045–1059.
Pascual-Leone, A., Wassermann, E. M., Grafman, J., & Hallett, M. (1996). The role of the
dorsolateral prefrontal cortex in implicit procedural learning. Experimental Brain Re
search, 107, 479–485.
Paz, R., Boraud, T., Natan, C., Bergman, H., & Vaadia, E. (2003). Preparatory activity in
motor cortex reflects learning of local visuomotor skills. Nature Neuroscience, 6 (8), 882–
890.
Poldrack, R. A., Clark, J., Pare-Blagoev, E. J., Shohamy, D., Creso Moyano, J., Myers, C., et
al. (2001). Interactive memory systems in the human brain. Nature, 414 (6863), 546–550.
Poldrack, R. A., Prabhakaran, V., Seger, C. A., & Gabrieli, J. D. (1999). Striatal activation
during acquisition of a cognitive skill. Neuropsychology, 13 (4), 564–574.
Polyakov, F., Drori, R., Ben-Shaul, Y., Abeles, M., & Flash, T. (2009). A compact representation of drawing movements with sequences of parabolic primitives. PLoS Computational Biology, 5 (7), e1000427.
Ralston, H. (1958). Energy-speed relation and optimal speed during level walking. Euro
pean Journal of Applied Physiology and Occupational Physiology, 17 (4), 277–283.
Ramnani, N. (2006). The primate cortico-cerebellar system: anatomy and function. Nature
Reviews Neuroscience, 7 (7), 511–522.
Reber, A. S. (1976). Implicit learning of synthetic languages: The role of instructional set.
Journal of Experimental Psychology: Human Learning and Memory, 2 (1), 88–94.
Reis, J., Schambra, H., Cohen, L. G., Buch, E. R., Fritsch, B., Zarahn, E., Celnik, P. A., &
Krakauer, J. W. (2009). Noninvasive cortical stimulation enhances motor skill acquisition
over multiple days through an effect on consolidation. Proceedings of the National Acade
my of Sciences U S A, 106 (5), 1590–1595.
Remy, F., Wenderoth, N., Lipkens, K., & Swinnen, S. P. (2010). Dual-task interference during initial learning of a new motor task results from competition for the same brain areas. Neuropsychologia, 48 (9), 2517–2527.
Reuter-Lorenz, P. A., Jonides, J., Smith, E. E., Hartley, A., Miller, A., Marshuetz, C., et al.
(2000). Age differences in the frontal lateralization of verbal and spatial working memory
revealed by PET. Journal of Cognitive Neuroscience, 12, 174–187.
Ridderinkhof, K. R., Ullsperger, M., Crone, E. A., & Nieuwenhuis, S. (2004). The role of
the medial frontal cortex in cognitive control. Science, 306 (5695), 443–447.
Robertson, E. M., Pascual-Leone, A., & Miall, R. C. (2004). Current concepts in procedur
al consolidation. Nature Reviews Neuroscience, 5 (7), 576–582.
Robertson, E. M., Tormos, J. M., Maeda, F., & Pascual-Leone, A. (2001). The role of the dorsolateral prefrontal cortex during sequence learning is specific for spatial information. Cerebral Cortex, 11, 628–635.
Russeler, J., Kuhlicke, D., & Munte, T. F. (2003). Human error monitoring during implicit
and explicit learning of a sensorimotor sequence. Neuroscience Research, 47 (2), 233–
240.
Sadeh, T., Shohamy, D., Levy, D. R., Reggev, N., & Maril, A. (2011). Cooperation between
the hippocampus and the striatum during episodic encoding. Journal of Cognitive Neuro
science, 23, 1597–1608.
Sakai, K., Hikosaka, O., Miyauchi, S., Takino, R., Sasaki, Y., & Putz, B. (1998). Transition
of brain activation from frontal to parietal areas in visuomotor sequence learning. Journal
of Neuroscience, 18 (5), 1827–1840.
Schack, T., & Mechsner, F. (2006). Representation of motor skills in human long-term
memory. Neuroscience Letters, 391 (3), 77–81.
Scheidt, R. A., Dingwell, J. B., & Mussa-Ivaldi, F. A. (2001). Learning to move amid uncer
tainty. Journal of Neurophysiology, 86 (2), 971–985.
Schendan, H. E., Searl, M. M., Melrose, R. J., & Stern, C. E. (2003). An FMRI study of the
role of the medial temporal lobe in implicit and explicit sequence learning. Neuron, 37 (6),
1013–1025.
Schneider, S. A., Wilkinson, L., Bhatia, K. P., Henley, S. M., Rothwell, J. C., Tabrizi, S. J., et
al. (2010). Abnormal explicit but normal implicit sequence learning in premanifest and
early Huntington’s disease. Movement Disorders, 25 (10), 1343–1349.
Scoville, W. B., & Milner, B. (1957). Loss of recent memory after bilateral hippocampal le
sions. Journal of Neurology, Neurosurgery, and Psychiatry, 20 (1), 11–21.
Seger, C. A., & Cincotta, C. M. (2006). Dynamics of frontal, striatal, and hippocampal sys
tems during rule learning. Cerebral Cortex, 16 (11), 1546–1555.
Seidler, R. D. (2010). Neural correlates of motor learning, transfer of learning, and learn
ing to learn. Exercise and Sport Sciences Review, 38 (1), 3–9.
Seidler, R. D., & Noll, D. C. (2008). Neuroanatomical correlates of motor acquisition and
motor transfer. Journal of Neurophysiology, 99, 1836–1845.
Seidler, R. D., Noll, D. C., & Chintalapati, P. (2006). Bilateral basal ganglia activation asso
ciated with sensorimotor adaptation. Experimental Brain Research, 175 (3), 544–555.
Seidler, R. D., Purushotham, A., Kim, S. G., Ugurbil, K., Willingham, D., & Ashe, J. (2002).
Cerebellum activation associated with performance change but not motor learning.
Science, 296 (5575), 2043–2046.
Seitz, R. J., & Roland, P. E. (1992). Learning of sequential finger movements in man: A
combined kinematic and positron emission tomography (PET) Study. European Journal of
Neuroscience, 4, 154–165.
Seydell, A., McCann, B. C., Trommershauser, J., & Knill, D. C. (2008). Learning stochastic
reward distributions in a speeded pointing task. Journal of Neuroscience, 28 (17), 4356–
4367.
Shadmehr, R., & Brashers-Krug, T. (1997). Functional stages in the formation of human
long-term motor memory. Journal of Neuroscience, 17 (1), 409–419.
Shadmehr, R., Smith, M. A., & Krakauer, J. W. (2010). Error correction, sensory predic
tion, and adaptation in motor control. Annual Review of Neuroscience, 33, 89–108.
Shah, P., & Miyake, A. (1996). The separability of working memory resources for spatial
thinking and language processing: An individual differences approach. Journal of Experi
mental Psychology: General, 125 (1), 4–27.
Shohamy, D., & Wagner, A. D. (2008). Integrating memories in the human brain: hip
pocampal-midbrain encoding of overlapping events. Neuron, 60 (2), 378–389.
Smith, E. E., Jonides, J., & Koeppe, R. A. (1996). Dissociating verbal and spatial working
memory: PET investigations. Journal of Cognitive Neuroscience, 7, 337–356.
Smith, M. A., Ghazizadeh, A., & Shadmehr, R. (2006). Interacting adaptive processes with
different timescales underlie short-term motor learning. PLoS Biology, 4 (6), e179.
Page 35 of 38
Motor Skill Learning
Smith, M. A., & Shadmehr, R. (2005). Intact ability to learn internal models of arm dy
namics in Huntington’s disease but not cerebellar degeneration. Journal of Neurophysiol
ogy, 93 (5), 2809–2821.
Song, S., Howard, J. H., Jr., & Howard, D. V. (2007). Implicit probabilistic sequence learn
ing is independent of explicit awareness. Learning and Memory, 14 (3), 167–176.
Spencer, R. M., & Ivry, R. B. (2009). Sequence learning is preserved in individuals with
cerebellar degeneration when the movements are directly cued. Journal of Cognitive Neu
roscience, 21, 1302–1310.
tems supporting learning and memory. Journal of Cognitive Neuroscience, 4 (3), 232–243.
Staresina, B. P., & Davachi, L. (2009). Mind the gap: Binding experiences across space
and time in the human hippocampus. Neuron, 63 (2), 267–276.
Stefan, K., Wycislo, M., Gentner, R., Schramm, A., Naumann, M., Reiners, K., et al.
(2006). Temporary occlusion of associative motor cortical plasticity by prior dynamic mo
tor training. Cerebral Cortex, 16 (3), 376–385.
Sulzenbruck, S., & Heuer, H. (2009). Functional independence of explicit and implicit mo
tor adjustments. Conscious Cognition, 18 (1), 145–159.
Taylor, J., Klemfuss, N., & Ivry, R. (2010). An explicit strategy prevails when the cerebel
lum fails to compute movement errors. Cerebellum, 9, 580–586.
Taylor, J. A., & Thoroughman, K. A. (2007). Divided attention impairs human motor adap
tation but not feedback control. Journal of Neurophysiology, 98 (1), 317–326.
Taylor, J. A., & Thoroughman, K. A. (2008). Motor adaptation scaled by the difficulty of a
secondary cognitive task. PLoS One, 3 (6), e2485.
Thoroughman, K. A., & Shadmehr, R. (2000). Learning of action through adaptive combi
nation of motor primitives. Nature, 407 (6805), 742–747.
Toni, I., Krams, M., Turner, R., & Passingham, R. E. (1998). The time course of changes
during motor sequence learning: A whole-brain fMRI study. NeuroImage, 8 (1), 50–61.
Trommershäuser, J., Maloney, L. T., & Landy, M. S. (2008). Decision making, movement
planning and statistical decision theory. Trends in Cognitive Sciences, 12 (8), 291–297.
Tseng, Y. W., Diedrichsen, J., Krakauer, J. W., Shadmehr, R., & Bastian, A. J. (2007). Senso
ry prediction errors drive cerebellum-dependent adaptation of reaching. Journal of Neuro
physiology, 98 (1), 54–62.
Tunney, R. J., & Fernie, G. (2007). Repetition priming affects guessing not familiarity. Be
havioral and Brain Functions, 3, 40.
Page 36 of 38
Motor Skill Learning
Vidoni, E. D., & Boyd, L. A. (2007). Achieving enlightenment: What do we know about the
implicit learning system and its interaction with explicit knowledge? Journal of Neurolog
ic Physical Therapy, 31 (3), 145–154.
Vogel, E. K., & Machizawa, M. G. (2004). Neural activity predicts individual differences in
visual working memory capacity. Nature, 428, 748–751.
Volle, E., Kinkingnéhun, S., Pochon, J. B., Mondon, K., Thiebaut de Schotten, M., Seassau,
M., et al. (2008). The functional architecture of the left posterior and lateral prefrontal
cortex in humans. Cerebral Cortex, 18 (10), 2460–2469.
Voss, J. L., Baym, C. L., & Paller, K. A. (2008). Accurate forced-choice recognition without
awareness of memory retrieval. Learning and Memory, 15 (6), 454–459.
Voss, J. L., & Paller, K. A. (2008). Brain substrates of implicit and explicit memory: The im
portance of concurrently acquired neural signals of both memory types. Neuropsycholo
gia, 46 (13), 3021–3029.
Weiner, M. J., Hallett, M., & Funkenstein, H. H. (1983). Adaptation to lateral displacement
of vision in patients with lesions of the central nervous system. Neurology, 33 (6), 766–
772.
Werner, S., & Bock, O. (2007). Effects of variable practice and declarative knowledge on
sensorimotor adaptation to rotated visual feedback. Experimental Brain Research, 178
(4), 554–559.
Werner, S., Bock, O., Gizewski, E. R., Schoch, B., & Timmann, D. (2010). Visuomotor adap
tive improvement and aftereffects are impaired differentially following cerebellar lesions
in SCA and PICA territory. Experimental Brain Research, 201 (3), 429–439.
Wilkinson, L., Khan, Z., & Jahanshahi, M. (2009). The role of the basal ganglia and its cor
tical connections in sequence learning: evidence from implicit and explicit sequence
learning in Parkinson’s disease. Neuropsychologia, 47 (12), 2564–2573.
Page 37 of 38
Motor Skill Learning
Willingham, D. B., Salidis, J., & Gabrieli, J. D. (2002). Direct comparison of neural systems
mediating conscious and unconscious skill learning. Journal of Neurophysiology, 88 (3),
1451–1460.
Wise, S. P., Moody, S. L., Blomstrom, K. J., Mitz, A. R. (1998). Changes in motor cortical
activity during visuomotor adaptation. Experimental Brain Research, 121, 285–299.
Wolpert, D. M., & Flanagan, J. R. (2010). Motor learning. Current Biology, 20 (11), R467–
R472.
Wolpert, D. M., & Miall, R. C. (1996). Forward models for physiological motor control.
Neural Networks, 9 (8), 1265–1279.
Yamamoto, T., & Fujinami, T. (2008). Hierarchical organization of the coordinative struc
ture of the skill of clay kneading. Human Movement Science, 27 (5), 812–822.
Rachael Seidler
Bryan L. Benson
Nathaniel B. Boyden
Youngbin Kwak
Page 38 of 38
Memory Consolidation
John Wixted and Denise J. Cai
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn
The idea that memories require time to consolidate has a long history, but the understanding of what consolidation means has evolved over time. In 1900, the German experimental psychologists Georg Müller and Alfons Pilzecker published a monograph in which a new theory of memory and forgetting was proposed, one that included, for the first time, a role for consolidation. Their basic method involved asking subjects to study a list of paired-associate nonsense syllables and then testing their memory using cued recall after a delay of several minutes. Typically, some of the list items were forgotten, and to investigate why that occurred, Müller and Pilzecker (1900) presented subjects with a second, interfering list of items to study before memory for the target list was tested. They found that this interpolated list reduced memory for the target list compared with a control group that was not exposed to any intervening activity. Critically, the position of the interfering list within the retention interval mattered such that interference occurring soon after learning had a more disruptive effect than interference occurring later in the retention interval. This led them to propose that memories require time to consolidate and that retroactive interference is a force that compromises the integrity of recently formed (and not-yet-consolidated) memories. In this chapter, we review the major theories of consolidation, beginning with the still-relevant account proposed by Müller and Pilzecker (1900), and we consider a variety of recent developments in what has become a rapidly evolving field.
The kind of interference envisioned by Müller and Pilzecker (1900) does not involve overloading a retrieval cue but instead involves directly compromising the integrity of a partially consolidated memory trace. In what even today seems like a radical notion to many experimental psychologists, Müller and Pilzecker (1900) assumed that the interference was nonspecific in the sense that the interfering material did not have to be similar to the originally memorized material for interference to occur. Instead, mental exertion of any kind was thought to be the interfering force (Lechner et al., 1999). “Mental exertion” is a fairly vague concept, and Wixted (2004a) suggested that the kind of intervening mental exertion that Müller and Pilzecker (1900) probably had in mind consists specifically of new learning. The basic idea is that new learning, per se, serves as an interfering force that degrades recently formed and still fragile memory traces.
Loosely speaking, it can be said that Müller and Pilzecker (1900) believed that the memory trace becomes strengthened by the process of consolidation. However, there is more than one way that a trace can become stronger, so it is important to keep in mind which meaning of a “stronger memory trace” applies in any discussion of consolidation. One way that a trace might become stronger is that it comes to more accurately reflect past experience than it did when it was first formed, much like a snapshot taken from a Polaroid camera comes into sharper focus over time. A trace that consolidated in this manner would yield an ever-clearer memory of the encoding event in response to the same retrieval cue. Another way that a trace might become stronger is that it becomes ever more likely to spring to mind in response to a retrieval cue (even more likely than it was when the memory was first formed). A memory trace that consolidated in either of these two ways would support a higher level of performance than it did at the end of training, as if additional learning occurred despite the absence of additional training.
Still another way that a trace can become stronger is that it becomes hardened against the destructive forces of interference. A trace that hardens over time (i.e., a trace that consolidates in that sense) may simultaneously become degraded over time due to the interfering force of new learning or to some other force of decay. As an analogy, a clay replica of the Statue of Liberty will be at its finest when it has just been completed and the clay is still wet, but it will also be at its most vulnerable. With the passage of time, however, the statue dries and becomes more resistant to damage even though it may now be a less accurate replica than it once was (because of the damage that occurred before the clay dried). Müller and Pilzecker’s (1900) original view of consolidation, which was later elaborated by Wickelgren (1974), was analogous to this. That is, the consolidation process was not thought to render the trace more representative of past experience or to render it more likely to come to mind than it was at the time of formation; instead, consolidation was assumed to render the trace (or its association with a retrieval cue) more resistant to interference even while the integrity of the trace was gradually being compromised by interference.
These considerations suggest a relationship between Müller and Pilzecker’s (1900) view of consolidation and the time course of forgetting. More specifically, the fact that a memory trace hardens in such a way as to become increasingly resistant to interference even as the trace fades may help to explain the general shape of the forgetting function (Wixted, 2004b). Since the seminal work of Ebbinghaus (1885), a consistent body of evidence has indicated that the proportional rate of forgetting is rapid at first and then slows to a point at which almost no further forgetting occurs. This general property is captured by the power law of forgetting (Anderson & Schooler, 1991; Wixted & Carpenter, 2007; Wixted & Ebbesen, 1991), and it is enshrined in Jost’s law of forgetting, which states that if two memory traces have equal strength but different ages, the older trace will decay at a slower rate than the younger one from that moment on (Jost, 1897). One possibility is that the continuous reduction in the rate of forgetting as a trace ages is a reflection of the increased resistance to interference as the trace undergoes a slow process of consolidation (Wickelgren, 1974; Wixted, 2004b).
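The power-law and Jost's-law claims can be made concrete with a small numerical sketch. This is purely illustrative: the retention function R(t) = a(1 + t)^(-b) and all parameter values below are assumptions chosen for demonstration, not quantities from the chapter.

```python
# Illustrative sketch of the power law of forgetting and Jost's law.
# Assumed retention function (hypothetical parameters): R(t) = a * (1 + t)**(-b),
# where t is the age of the trace in days. Its proportional forgetting rate,
# -R'(t)/R(t) = b / (1 + t), declines as the trace ages: forgetting is rapid
# at first and then slows, as described in the text.

def retention(a, b, t):
    """Strength of a memory trace of age t under the assumed power law."""
    return a * (1 + t) ** (-b)

def proportional_rate(b, t):
    """Relative forgetting rate -R'(t)/R(t) implied by the power law."""
    return b / (1 + t)

b = 0.5               # illustrative decay exponent
old_age, young_age = 10.0, 1.0
target = 0.4          # common current strength (arbitrary units)

# Jost's law setup: choose scaling constants so that an old trace and a
# young trace have exactly equal strength at this moment.
a_old = target / (1 + old_age) ** (-b)
a_young = target / (1 + young_age) ** (-b)
assert abs(retention(a_old, b, old_age) - retention(a_young, b, young_age)) < 1e-12

# One day later, the older trace has lost less strength: it decays at a
# slower rate from this moment on, exactly as Jost's law states.
loss_old = target - retention(a_old, b, old_age + 1)
loss_young = target - retention(a_young, b, young_age + 1)
assert loss_old < loss_young
```

Under this assumed form, the slowing of forgetting falls directly out of the hyperbolically decreasing proportional rate; the suggestion in the text is that consolidation, understood as increasing resistance to interference, is what such a rate reflects psychologically.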
Systems Consolidation
The temporal gradient of retrograde amnesia that is associated with injury and disease was noted long ago by Ribot (1881/1882), but he had no way of knowing what brain structures were centrally involved in this phenomenon. The experience of H.M. made it clear that the relevant structures reside in the medial temporal lobe (MTL), and the phenomenon of temporally graded retrograde amnesia suggests an extended but time-limited role for the MTL in the encoding and retrieval of new memories. That is, the MTL is needed to encode new memories, and it is needed for a time after they are encoded, but it is not needed indefinitely. The decreasing dependence of memories on the MTL is known as systems consolidation (Frankland & Bontempi, 2005; McGaugh, 2000). As a result of this process, which may last as long as several years in humans, memories are eventually reorganized and established in the neocortex in such a way that they become independent of the MTL (Squire et al., 2001). Note that this is a different view of consolidation than the resistance-to-interference view proposed by Müller and Pilzecker (1900).
The temporal gradient of retrograde amnesia exhibited by patient H.M. (documented during the early years after surgery) prompted more controlled investigations using both humans and nonhumans. These studies have shown that the temporal gradient is real and that it is evident even when bilateral lesions are limited to the hippocampus (a central structure of the MTL). For example, in a particularly well-controlled study, Anagnostaras et al. (1999) investigated the effect of hippocampal lesions in rats using a context fear-conditioning paradigm. In this task, a tone conditional stimulus (CS) is paired with a shock unconditional stimulus (US) several times in a novel context. Such training results in a fear of both the tone and the training context (measured behaviorally as the proportion of time spent freezing), and memory for the context-shock association in particular is known to depend on the hippocampus. Anagnostaras et al. (1999) trained a group of rats in two different contexts, and this training was later followed by surgical lesions of the hippocampus. Each rat received training in Context A 50 days before surgery and training in Context B 1 day before surgery. Thus, at the time lesions were induced, memory for learning in Context A was relatively old, whereas memory for learning in Context B was still new. A later test of retention showed that remote (i.e., 50-day-old) memory for contextual fear was similar to that of controls, whereas recent (i.e., 1-day-old) memory for contextual fear was greatly impaired. Thus, in rats, hippocampus-dependent memories appear to become independent of the hippocampus in a matter of weeks.
Controlled studies in humans sometimes suggest that the time course of systems consolidation often plays out over a much longer period of time, a finding that is consistent with the time window of retrograde amnesia observed for H.M. However, the time course is rather variable, and the basis for the variability is not known. Both semantic and episodic memory have been assessed in studies investigating the temporal gradient of retrograde amnesia. Semantic memory refers to memory for factual knowledge (e.g., what is the capital of Texas?), whereas episodic memory refers to memory for specific events (e.g., memory for a recently presented list of words or memory for an autobiographical event, such as a trip to the Bahamas).
The findings from lesion studies have sometimes been corroborated by neuroimaging studies performed on unimpaired subjects, though the relevant literature is somewhat mixed in this regard. Although some studies found more activity in the MTL during the recollection of recent semantic memories compared with remote semantic memories (Douville et al., 2005; Haist et al., 2001; Smith & Squire, 2009), other studies found no difference (e.g., Bernard et al., 2004; Maguire et al., 2001; Maguire & Frith, 2003). For example, using functional magnetic resonance imaging (fMRI), Bernard et al. (2004) identified brain regions associated with recognizing famous faces from two different periods: people who became famous in the 1960s to 1970s and people who became famous in the 1990s. They found that the hippocampus was similarly active during the recognition of faces from both periods (i.e., no temporal gradient was observed). It is not clear why studies vary in this regard, but one possibility is that the detection of a temporal gradient is more likely when multiple time points are assessed, especially during the several years immediately preceding memory impairment, than when only two time points are assessed (as in Bernard et al., 2004).
In an imaging study that was patterned after the lesion study reported by Manns et al. (2003), Smith and Squire (2009) measured brain activity while subjects recalled news events from multiple time points over the past 30 years. In agreement with the lesion study, they found that regions in the MTL exhibited a decrease in brain activity as a function of the age of the memory over a 12-year period (whereas activity was constant for memories from 13 to 30 years ago). In addition, they found that regions in the frontal lobe, temporal lobe, and parietal lobe exhibited an increase in activity as a function of the age of the trace. Thus, it seems that the (systems) consolidation of semantic memories is a slow process that may require years to complete.
These findings appear to suggest that the MTL plays a role in recalling personal episodes even if they happened long ago, and the apparent implication is that autobiographical memories do not undergo systems consolidation. However, by the time H.M.’s remote autobiographical memory impairment was documented, he was an elderly patient, and his brain was exhibiting signs of cortical thinning, abnormal white matter, and subcortical infarcts (Squire, 2009). Thus, these late-life brain abnormalities could account for the loss of remote memories. In addition, in most of the case studies that have documented remote autobiographical memory impairment, damage was not restricted to the MTL. To compare the remote memory effects of limited MTL damage and damage that also involved areas of the neocortex, Bayley et al. (2005) measured the ability of eight amnesic patients to recollect detailed autobiographical memories from their early life. Five of the patients had damage limited to the MTL, whereas three had damage to the neocortex in addition to MTL damage. They found that the remote autobiographical memories of the five MTL patients were quantitatively and qualitatively similar to the recollections of the control group, whereas the autobiographical memories of the three patients with additional neocortical damage were severely impaired. This result suggests that semantic memory and episodic memory both eventually become independent of the MTL through a process of systems consolidation, but the temporal gradient of retrograde amnesia associated with that process can be obscured if damage extends to the neocortex. MacKinnon and Squire (1989) also found that the temporal gradient of autobiographical memories for five MTL patients was similar in duration to the multiyear gradient associated with semantic memory.
An even shorter time scale for systems consolidation was evident in a recent fMRI study reported by Takashima et al. (2009). Subjects in that study memorized two sets of face–location stimuli, one studied 24 hours before the memory test (old memories) and the other studied 15 minutes before the memory test (new memories). To control for differences in memory strength, they compared activity for high-confidence hits associated with the old and new memories and found that hippocampal activity decreased and neocortical activity increased over the course of 24 hours. In addition, the connectivity between the hippocampus and the neocortical regions decreased, whereas cortico-cortical connectivity increased (all over the course of only 24 hours). Results like these suggest that the process of systems consolidation can occur very quickly.
What determines whether the temporal gradient is short or long? The answer is not known, but Frankland and Bontempi (2006) suggested that the critical variable may be the richness of the memorized material. To-be-remembered stimuli presented in a laboratory are largely unrelated to a subject’s personal history and thus might be integrated with prior knowledge represented in the cortex in a rather sparse (yet rapid) manner. Autobiographical memories, by contrast, are generally related to a large preexisting knowledge base. The integration of such memories into an intricate knowledge base may require more extended dialogue between the hippocampus and neocortex (McClelland, McNaughton, & O’Reilly, 1995). Alternatively, Tse et al. (2007) suggested that when memories can be incorporated into an associative “schema” of preexisting knowledge (i.e., when newly learned information is compatible with previously learned information), the process of systems consolidation is completed very rapidly (within 48 hours for rats). However, it is not clear whether this idea can account for the rapid systems consolidation that is apparent for memories of arbitrary laboratory-based stimuli in humans (e.g., Takashima et al., 2009). The as-yet-unresolved question of why memories vary in how quickly they undergo systems consolidation seems likely to remain a focus of research in this area for some time to come.
The memory of a past experience that is elicited by a retrieval cue presumably consists of the reactivation of distributed neocortical areas that were active at the time the trace was initially encoded (Damasio, 1989; Hoffman & McNaughton, 2002; Johnson, McDuff, Rugg, & Norman, 2009; McClelland, McNaughton, & O’Reilly, 1995; Squire & Alvarez, 1995). The primary sensory areas of the brain (e.g., the brain areas activated by the sights, sounds, and smells associated with a visit to the county fair) converge on association areas of the brain, which, in turn, are heavily interconnected with the MTL. Conceivably, memories are stored in widely distributed neocortical areas from the outset, but the hippocampus and other structures of the MTL are required to bind them together until cortico-cortical associations develop, eventually rendering the memory trace independent of the MTL (Wixted & Squire, 2011). In studies using fMRI, the increasing role of the direct cortico-cortical connections may be reflected in increased neocortical activity as time passes since the memory trace was formed (Takashima et al., 2009).
In agreement with this possibility, a series of trace eyeblink conditioning studies conducted by Takehara-Nishiuchi and colleagues has shown that an area of the medial prefrontal cortex (mPFC) in rats becomes increasingly necessary for the retrieval of memories as they become decreasingly dependent on the hippocampus over the course of several weeks. This was shown both by lesion studies and by direct neural recordings of task-related activity in the mPFC (Takehara, Kawahara, & Kirino, 2003; Takehara-Nishiuchi & McNaughton, 2008). In one particularly relevant experiment, Takehara-Nishiuchi and McNaughton (2008) showed that task-related activity of mPFC neurons in rats increased over the course of several weeks even in the absence of further training. In a conceptually related study, Frankland et al. (2004) trained mice in a fear-conditioning procedure and tested memory either 1 day (recent) or 36 days (remote) after training. They found that activity in multiple association cortical regions (measured by the expression of activity-regulated genes) was greater for remote than for recent memories. In addition, they found that the increased cortical activity for remote memories was not evident in mice with a gene mutation that selectively impairs remote memory. Results like these would seem to provide direct evidence of the kind of cortical reorganization that has long been thought to underlie systems consolidation.
Cellular Consolidation
Systems consolidation is not what Müller and Pilzecker (1900) had in mind when they first introduced the concept of consolidation. Their view was that a memory trace becomes increasingly resistant to interference caused by new learning as the trace consolidates, not that the trace becomes reorganized in the neocortex (and, therefore, less dependent on the MTL) over time.
Whereas a role for systems consolidation came into sharper focus in the years following the recognition of H.M.’s memory impairment, evidence for a second kind of consolidation began to emerge in the early 1970s. This kind of consolidation, called cellular consolidation, occurs at the level of neurons (not brain systems) and takes place over the hours (and, perhaps, days) after a memory is formed in the hippocampus (McGaugh, 2000). Cellular consolidation seems more directly relevant to the trace-hardening physiological processes that Müller and Pilzecker (1900) had in mind, and it had its origins in the discovery of a phenomenon known as long-term potentiation (LTP; Bliss & Lomo, 1973).
Although LTP looks like neural memory for an experience (albeit an artificial experience consisting of a train of electrical impulses), what reason is there to believe that a similar process plays a role in real memories? The induction of LTP in hippocampal neurons involves the opening of calcium channels in postsynaptic N-methyl-D-aspartate (NMDA) receptors (Bliss & Collingridge, 1993). When those receptors are blocked by an NMDA antagonist, high-frequency stimulation fails to induce LTP. Perhaps not coincidentally, NMDA antagonists have often been shown to impair the learning of hippocampus-dependent tasks in animals (e.g., Morris et al., 1986; Morris, 1989), as if an LTP-like process in the hippocampus plays an important role in the formation of new episodic memories. One study suggests that the encoding of actual memories (not just an artificial train of electrical pulses) also gives rise to LTP in the hippocampus. Whitlock et al. (2006) trained rats on an inhibitory avoidance task (a task known to be dependent on the hippocampus), and they were able to find neurons in the hippocampus that exhibited sustained LTP after training (not after an artificial tetanus). In addition, tetanic stimulation applied to these neurons after training had a lesser effect (as if those neurons were already close to ceiling levels of LTP) than tetanic stimulation applied to the neurons of animals that had not received training. These findings suggest that LTP may be more than just a model for memory formation; it may, in fact, be part of the mechanism that underlies the initial encoding of memory.
What does LTP have to do with the story of consolidation? The induction of LTP unleashes a molecular cascade in postsynaptic neurons that continues for hours and results in structural changes to those neurons. The postsynaptic changes are protein-synthesis dependent and involve morphological changes in dendritic spines (Yuste & Bonhoeffer, 2001) and the insertion of additional AMPA receptors into dendritic membranes (Lu et al., 2001). These changes are generally thought to stabilize LTP because LTP degrades rapidly if they do not occur (or are prevented from occurring by the use of a protein synthesis inhibitor).
LTP exhibits all of the characteristics of consolidation envisioned by Müller and Pilzecker (1900). In their own work, Müller and Pilzecker (1900) used an original learning phase (L1), followed by an interfering learning phase (L2), followed by a memory test for the original list (T1). Holding the retention interval between L1 and T1 constant, they essentially showed that L1-L2-----T1 yields greater interference than L1---L2---T1 (where the dashes represent units of time). In experimental animals, memories formed in the hippocampus and LTP induced in the hippocampus both exhibit a similar temporal gradient with respect to retroactive interference (Izquierdo et al., 1999; Xu et al., 1998). Whether L1 and L2 both involve hippocampus-dependent learning tasks (e.g., L1 = one-trial inhibitory avoidance learning, L2 = exploration of a novel environment), as reported by Izquierdo et al. (1999), or one involves the induction of LTP (L1) while the other involves exposure to a learning task (L2), as reported by Xu et al. (1998), the same pattern emerges. Specifically, L2 interferes with L1 when the time between them is relatively short (e.g., 1 hour), but not when the time between them is relatively long (e.g., 6 or more hours). Moreover, if an NMDA antagonist is infused into the hippocampus before L2 (thereby blocking the induction of interfering LTP that might be associated with the new learning), L2 no longer interferes with memory for L1.
The point is that hippocampus-dependent memories and hippocampal LTP both appear to
be vulnerable to interference early on and then become more resistant to interference
with the passage of time. Moreover, the interfering force is the formation of new memo
ries (or, analogously, the induction of LTP). Newly induced LTP, like a newly encoded
memory, begins life in a fragile state. Over time, as the process of cellular consolidation
(p. 443) unfolds, recently formed LTP and recently encoded memories become more sta
ble, which is to say that they become more resistant to interference caused by the induc
tion of new LTP or by the encoding of new memories.
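The temporal gradient described above can be captured in a toy model. The sketch below is purely illustrative (the function form, the time constant `tau`, and the other parameters are invented for demonstration, not drawn from the cited studies): the fraction of the L1 trace that is still fragile decays with the L1–L2 delay, so an early L2 produces more retroactive interference than a late one.

```python
import math

def retention(l1_l2_delay_hours, tau=2.0, max_interference=0.5, baseline=0.9):
    # Fraction of the L1 trace still fragile when L2 arrives; decays as
    # cellular consolidation proceeds (tau is a hypothetical time constant).
    fragile = math.exp(-l1_l2_delay_hours / tau)
    # Interference from L2 scales with how fragile the trace still is.
    return baseline * (1 - max_interference * fragile)

# L1-L2-----T1 (L2 after 1 hour) vs. L1---L2---T1 (L2 after 6 hours),
# holding the L1-T1 retention interval constant:
early = retention(1.0)   # more forgetting of L1
late = retention(6.0)    # close to the no-interference baseline
print(early < late)      # True
```

The only substantive assumption is the one the text makes: vulnerability to interference declines monotonically with time since learning.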
The use of an NMDA antagonist in rats is not the only way to induce a temporary period
of anterograde amnesia (thereby protecting recently induced LTP or recently formed
memories). In sufficient quantities, alcohol and benzodiazepines have been shown to do
the same in humans. Moreover, like NMDA antagonists, these drugs not only induce an
terograde amnesia but also inhibit the induction of LTP in the hippocampus (Del Cerro et
al., 1992; Evans & Viola-McCabe, 1996; Givens & McMahon, 1995; Roberto et al., 2002;
Sinclair & Lo, 1986). Interestingly, they also result in a phenomenon known as retrograde
facilitation. That is, numerous studies have reported that even though alcohol induces
amnesia for information studied under the influence of the drug, it actually results in im
proved memory for material studied just before consumption (e.g., Bruce & Pihl, 1997;
Lamberty, Beckwith, & Petros, 1990; Mann, Cho-Young, & Vogel-Sprott, 1984; Parker et
al., 1980, 1981). Similar findings have been frequently reported for benzodiazepines such
as diazepam and triazolam (Coenen & Van Luijtelaar, 1997; Fillmore et al., 2001;
Ghoneim, Hinrichs, & Mewaldt, 1984; Hinrichs, Ghoneim, & Mewaldt, 1984; Weingartner
et al., 1995). Predrug memories, it seems, are protected from interference that would
have been created during the postdrug amnesic state.
All of these findings are easily understood in terms of cellular consolidation (not systems
consolidation), but a recent explosion of research on the role of sleep and consolidation
has begun to suggest that the distinction between cellular consolidation and systems con
solidation may not be as sharp as previously thought.
Because sleep is not an undifferentiated state, one focus of this line of research has been
to identify the specific stage of sleep that is important for consolidation. Sleep is divided
into five stages that occur in a regular sequence within 90-minute cycles throughout the
night. Stages 1 through 4 refer to ever-deeper levels of sleep, with stages 3 and 4 often
being referred to as slow-wave sleep. Rapid eye movement (REM) sleep is a lighter stage
of sleep (p. 444) associated with vivid dreams. Although every stage of sleep occurs during
each 90-minute sleep cycle, the early sleep cycles of the night are dominated by slow-
wave sleep, and the later sleep cycles are dominated by REM sleep. For declarative mem
ory, the beneficial effects of sleep have been almost exclusively associated with slow-wave
sleep, and this is true of its possible role in systems consolidation and cellular consolida
tion. For nondeclarative memories, the beneficial effects of sleep are more often associat
ed with REM sleep.
Much evidence dating back at least to Jenkins and Dallenbach (1924) has shown that less
forgetting occurs if one sleeps during the retention interval than if one remains awake. A
reduction in interference is generally thought to play some role in this sleep-related bene
fit, but there appears to be much more to the story than that. In particular, consolidation
is an important part of the story, and the different stages of sleep play very different
roles.
Ekstrand and colleagues (Ekstrand, 1972; Yaroush, Sullivan, & Ekstrand, 1971) were the
first to address the question of whether the different stages of sleep differentially benefit
what we now call declarative memory. These researchers took advantage of the fact that
most REM sleep occurs in the second half of the night, whereas most non-REM sleep oc
curs in the first half. Some subjects in this experiment learned a list, went to sleep imme
diately, and were awakened 4 hours later for a test of recall. These subjects experienced
mostly slow-wave sleep during the 4-hour retention interval. Others slept for 4 hours,
were awakened to learn a list, slept for another 4 hours, and then took a recall test. These
subjects experienced mostly REM sleep during the 4-hour retention interval. The control
(i.e., awake) subjects learned a list during the day and were tested for recall 4 hours lat
er. The subjects all learned the initial list to a similar degree, but the results showed that
4 hours of mostly non-REM sleep resulted in less forgetting relative to the other two con
ditions, which did not differ from each other (i.e., REM sleep did not facilitate memory).
Barrett and Ekstrand (1972) reported similar results in a study that controlled for time-of-
day and circadian rhythm confounds, and the effect was later replicated in studies by Pli
hal and Born (1997, 1999). Slow-wave sleep may play a role in both cellular consolidation
and systems consolidation.
Even during a night of sleep, interference may occur, especially during REM sleep, when
considerable mental activity (mainly vivid dreaming) takes place and memories can be en
coded in the hippocampus. But memories are probably never formed during slow-wave
sleep. This is true despite the fact that a considerable degree of mental activity (consist
ing of static visual images, thinking, reflecting, etc.) occurs during slow-wave sleep. In
deed, perhaps half as much mental activity occurs during non-REM sleep as during REM
sleep (Nielsen, 2000). However, mental activity and the formation of memories are not
one and the same. The mental activity that occurs during slow-wave sleep is not remem
bered, perhaps because it occurs during a time when hippocampal plasticity is mini
mized. Because no new memories are formed in the hippocampus during this time, cellu
lar consolidation can presumably proceed in the absence of interference. During REM
sleep, however, electroencephalogram (EEG) recordings suggest that the hippocampus is
in an awake-like state, and LTP can be induced (and memories can be formed), so inter
ference is more likely to occur.
If slow-wave sleep protects recently formed memories from interference while allowing
cellular consolidation to move forward, then a temporal gradient of interference should
be observed. That is, sleep soon after learning should confer more protection than sleep
that is delayed. This can be tested by holding the retention interval between learning (L1)
and test (T1) constant (e.g., at 24 hours), with the location of sleep (S) within that reten
tion interval varied. Using the notation introduced earlier, the (p. 445) prediction would be
that L1-S-----T1 will confer greater protection than L1---S---T1. If a temporal gradient is
observed (i.e., if memory performance at T1 is greater in the first condition than the sec
ond), it would suggest that sleep does more than simply subtract out a period of retroac
tive interference that would otherwise occur. Instead, it would suggest that sleep (pre
sumably slow-wave sleep) also allows the process of cellular consolidation to proceed in
the absence of interference.
Once again, Ekstrand (1972) performed the pioneering experiment on this issue. In that
experiment, memory was tested for paired-associate words following a 24-hour retention
interval in which subjects slept either during the 8 hours that followed list presentation
or during the 8 hours that preceded the recall test. In the immediate sleep condition (in
which L1 occurred at night, just before sleep), he found that 81 percent of the items were
recalled 24 hours later; in the delayed sleep condition (in which L1 occurred in the morn
ing), only 66 percent were recalled. In other words, a clear temporal gradient associated
with the subtraction of retroactive interference was observed, one that is the mirror im
age of the temporal gradient associated with the addition of retroactive interference re
ported by Müller and Pilzecker (1900). More recent sleep studies have reinforced the idea
that the temporal gradient of retrograde facilitation is a real phenomenon, and they have
addressed various confounds that could have accounted for the results that Ekstrand
(1972) obtained (Gais, Lucas, & Born, 2006; Talamini et al., 2008). The temporal gradient
associated with sleep, like the LTP and animal learning research described earlier, is con
sistent with the notion that when memory formation is temporarily halted, recently
formed and still-fragile memories are protected from interference. As a result, they are
given a chance to become hardened against the forces of retroactive interference that
they will later encounter (perhaps through a process of cellular consolidation).
Long ago, it was discovered that the firing of particular hippocampal cells in awake rats
is coupled to specific points in the rat’s environment (O’Keefe & Dostrovsky, 1971). These
cells are known as “place cells” because they fire only when the rat traverses a particular
place in the environment. Usually, hippocampal place cells fire in relation to the rat’s po
sition on a running track. That is, as the rat traverses point A along the track, place cell 1
will reliably fire. As it traverses point B, place cell 2 will fire (and so on). An intriguing
finding that may be relevant to the mechanism that underlies systems consolidation is
that cells that fire in sequence in the hippocampus during a behavioral task tend to be
come sequentially coactive again during sleep (Wilson & McNaughton, 1994). This is the
phenomenon of neural replay.
Neural replay has most often been observed in rats during slow-wave sleep. It has also oc
casionally been observed during REM sleep, but, in that case, it occurs at a rate that is
similar to the neuron firing that occurred during learning (Louie & Wilson, 2001) and
thus may simply reflect dreaming. The neural replay that occurs during slow-wave sleep
occurs at a rate five to ten times faster than it did during the waking state (e.g., Ji & Wil
son, 2007) and may therefore reflect a biological consolidation process separate from
mental activity like dreaming. It is as if the hippocampus is replaying the earlier behav
ioral experience, perhaps as a way to reorganize the representation of that experience in
the neocortex.
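Replay of this kind is commonly quantified by comparing the order in which place cells fired during behavior with their order during a candidate sleep event. The following sketch is a generic illustration of that idea using a Spearman-style rank-order correlation; the cell indices and spike times are invented, and real analyses involve many additional steps (candidate-event detection, shuffle controls, and so on).

```python
def rank(xs):
    # Convert a list of spike times to firing-order ranks.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0] * len(xs)
    for r, i in enumerate(order):
        ranks[i] = r
    return ranks

def rank_order_correlation(template_times, event_times):
    """Correlation between the firing order of the same cells during
    behavior (template) and during a candidate replay event."""
    a, b = rank(template_times), rank(event_times)
    n = len(a)
    mean = (n - 1) / 2
    cov = sum((x - mean) * (y - mean) for x, y in zip(a, b))
    var = sum((x - mean) ** 2 for x in a)
    return cov / var

# Cells 0..4 fired in order A -> B -> C -> D -> E on the track (seconds):
template = [0.1, 0.5, 0.9, 1.3, 1.7]
# Same order during a sleep event, compressed in time (tens of ms):
replayed = [0.010, 0.022, 0.031, 0.044, 0.055]
shuffled = [0.044, 0.010, 0.055, 0.031, 0.022]

print(rank_order_correlation(template, replayed))  # 1.0: faithful replay
print(rank_order_correlation(template, shuffled))  # much lower: order scrambled
```

Note that the correlation is blind to absolute timing, which is why time-compressed replay (as described next) still scores as a perfect sequence match.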
The fact that replay of sequential place cell activity in the hippocampus occurs during
slow-wave sleep does not, by itself, suggest anything about communication between the
hippocampus and the neocortex (the kind of communication that is presumably required
for systems consolidation to take place). However, Ji and Wilson (2007) reported that hip
pocampal replay during slow-wave sleep in rats was coordinated with firing patterns in
the visual cortex, which is consistent with the idea that this process underlies the reorga
nization of memories in the neocortex. In addition, Lansink et al. (2009) performed multi
neuron recordings from the hippocampus and ventral striatum during waking and sleep
ing states. While the rats were awake, the hippocampal cells fired when the rat traversed
a (p. 446) particular point in the environment (i.e., they were place cells), whereas the stri
atal cells generally fired in response to rewards. During slow-wave sleep (but not during
REM sleep), they found that the hippocampal and striatal cells reactivated together. The
coordinated firing was particularly evident for pairs in which the hippocampal place cell
fired before the striatal reward-related neuron. Thus, the hippocampus leads reactivation
in a projection area, and this mechanism may underlie the systems consolidation of place–
reward associations.
One concern about studies of neural replay is that the animals are generally overtrained,
so little or no learning actually occurs. Thus, it is not clear whether learning-related neur
al replay takes place. However, Peyrache et al. (2009) recorded neurons in prefrontal cor
tex during the course of learning. Rats were trained on a Y-maze task in which they
learned to select the rewarded arm using one rule (e.g., choose the left arm) that
changed to a different rule as soon as a criterion level of performance was achieved (e.g.,
choose the right arm). They identified sets of neuronal assemblies with reliable coactiva
Page 15 of 34
Memory Consolidation
tions in prefrontal cortex, and some of these coactivations became stronger when the rat
started the first run of correct trials associated with the acquisition of the new rule. Fol
lowing these sessions, replay during slow-wave sleep mainly involved the learning-related
coactivations. Thus, learning-related replay—the mechanism that may underlie systems
consolidation—can be identified and appears to get underway very soon after learning.
Other evidence suggests that something akin to neural replay occurs in humans as well.
An intriguing study by Rasch et al. (2007) showed that cuing recently formed odor-associ
ated memories by odor re-exposure during slow-wave sleep—but not during REM sleep—
prompted hippocampal activation (as measured by fMRI) and resulted in less forgetting
after sleep compared with a control group. This result is consistent with the notion that
systems consolidation results from the reactivation of newly encoded hippocampal repre
sentations during slow-wave sleep. In a conceptually related study, Peigneux et al. (2004)
measured regional cerebral blood flow and showed that hippocampal areas that were ac
tivated during route learning in a virtual town (a hippocampus-dependent, spatial learn
ing task) were activated again during subsequent slow-wave sleep. Moreover, the degree
of activation during slow-wave sleep correlated with performance on the task the next
day.
In a sleep-deprivation study that also points to an almost immediate role for systems-level
consolidation processes, Sterpenich et al. (2009), using human subjects, investigated
memory for emotional and neutral pictures 6 months after encoding. Half the subjects
were deprived of sleep on the first postencoding night, and half were allowed to sleep
(and then all subjects slept normally each night thereafter). Six months later, subjects
completed a recognition test in the scanner in which each test item was given a judgment
of “remember” (previously seen and subjectively recollected), “know” (previously seen
but not subjectively recollected), or “new” (not previously seen). A contrast between ac
tivity associated with remembered items and known items yielded a smaller difference in
the sleep-deprived subjects across a variety of brain areas (ventral mPFC, precuneus,
amygdala, and occipital cortex), even though the items had been memorized 6 months
earlier, and these results were interpreted to mean that sleep during the first postencod
ing night influences the long-term systems-level consolidation of emotional memory.
The unmistakable implication from all of these studies is that the process thought to un
derlie systems consolidation—namely, neural replay (or neural reactivation)—begins to
unfold in a measurable way along a time course ordinarily associated with cellular
consolidation. That is, in the hours after a trace is formed, hippocampal LTP stabilizes, and
neural replay in the hippocampus gets underway. These findings would seem to raise the
possibility that the molecular cascade that underlies cellular consolidation also plays a
role in initiating neural replay (Mednick, Cai, Shuman, Anagnostaras, & Wixted, 2011). If
interference occurs while the trace is still fragile, then LTP will not stabilize, and presum
ably, neural replay will not be initiated. In that case, the memory will be lost. But if hip
pocampal (p. 447) LTP is allowed to stabilize (e.g., if potentially interfering memories are
blocked, or if a period of slow-wave sleep ensues after learning), then (1) the LTP will sta
bilize and become more resistant to interference and (2) neural replay in the hippocam
pus will commence and the memory will start to become reorganized in the neocortex.
Thus, on this view, cellular consolidation is an early component of systems consolidation.
With these considerations in mind, it is interesting to consider why Rasch et al. (2007)
and Peigneux et al. (2004) both observed performance benefits associated with reactiva
tion during slow-wave sleep. Results like these suggest that reactivation not only serves
to reorganize the memory trace in the neocortex but also strengthens the memory trace
in some way. But in what way is the trace strengthened? Did the reactivation process that
occurred during slow-wave sleep act as a kind of rehearsal, strengthening the memory in
much the same way that ordinary conscious rehearsal strengthens a memory (increasing
the probability that the memory will later be retrieved)? Or did the reactivation instead
serve to render the memory trace less dependent on the hippocampus and, in so doing,
protect the trace from interference caused by the encoding of new memories in the hip
pocampus (e.g., Litman & Davachi, 2008)? Either way, less forgetting would be (and was)
observed following a period of reactivation compared with a control condition.
The available evidence showing that increased reactivation during slow-wave sleep re
sults in decreased forgetting after a night of sleep does not shed any direct light on why
reactivation causes the information to be better retained. Evidence uniquely favoring a
rehearsal-like strengthening mechanism (as opposed to protection from interference)
would come from a study showing that reactivation during sleep can be associated with
an actual enhancement of performance beyond the level that was observed at the end of
training. Very few declarative memory studies exhibit that pattern, but one such study
was reported by Cai, Shuman, Gorman, Sage, and Anagnostaras (2009). Using a Pavlov
ian fear-conditioning task, they found that hippocampus-dependent contextual memory in
mice was enhanced (in an absolute sense) following a period of sleep whether the sleep
phase occurred immediately after training or 12 hours later. More specifically, following
sleep, the percentage of time spent freezing (the main dependent measure of memory) in
creased beyond that observed at the end of training. This is a rare pattern in studies of
declarative memory, but it is the kind of finding that raises the possibility that sleep-relat
ed consolidation can sometimes increase the probability that a memory will be retrieved
(i.e., it can strengthen memories in that sense).
Role of Brain Rhythms in the Encoding and Consolidation States of the Hippocampus
Most of the work on hippocampal replay of past experience has looked for the phenome
non during sleep, as if it might be a sleep-specific phenomenon. However, the key condi
tion for consolidation to occur may not be sleep, per se. Instead, the key condition may
arise whenever the hippocampus is not in an encoding state, with slow-wave sleep being
an example of such a condition. Indeed, Karlsson and Frank (2009) found frequent awake
replay of sequences of hippocampal place cells in the rat. The rats were exposed to two
environments (i.e., two different running tracks) each day, and each environment was as
sociated with a different sequence of place cell activity. The interesting finding was that
during pauses in awake activity in environment 2, replay of sequential place cell activity
associated with environment 1 was observed (replay of the local environment was also
observed). The finding that the hippocampus exhibits replay of the remote environment
while the rat is awake suggests that the hippocampus may take advantage of any down
time (including, but not limited to, sleep) to consolidate memory. That is to say, the
processes that underlie systems consolidation may unfold whenever the hippocampus is
not encoding new memories (e.g., Buzsáki, 1989).
In the encoding state, the cortex is characterized by beta oscillations (i.e., 12 to 20 Hz),
whereas the hippocampus is characterized by theta oscillations (i.e., 4 to 8 Hz). Hip
pocampal theta oscillations are thought to synchronize neural firing along an input path
way into the hippocampus. For example, in the presence of theta (but not in its absence),
the hippocampus receives rhythmic input from neurons in the input layers of the adjacent
entorhinal cortex (Chrobak & Buzsáki, 1996). In addition, Siapas, Lubenov, and Wilson
(2005) showed that neural activity in the prefrontal cortex of rats was “phase-locked” to
theta oscillations in the hippocampus in freely behaving (i.e., active-awake) rats. Findings
like these are consistent with the idea that theta rhythms coordinate the flow of informa
tion into the hippocampus, and still other findings suggest that theta rhythms may
facilitate the encoding of information flowing into the hippocampus. During the
high-acetylcholine (ACh) encoding state—which is a time when hippocampal synaptic
plasticity is high (Rasmusson,
2000)—electrical stimuli delivered at intervals equal to theta frequency are more likely to
induce LTP than stimulation delivered at other frequencies (Larson & Lynch, 1986). Thus,
theta appears to play a role both in organizing the flow of information into the hippocam
pus and in facilitating the encoding of that information.
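Because the oscillations discussed here are simply frequency ranges (theta at roughly 4 to 8 Hz, delta below 4 Hz), a recording can be classified as theta-dominated or slow-wave-dominated by comparing spectral power across bands. Below is a minimal, standard-library sketch applied to a synthetic signal; the sampling rate, signal, and band edges are assumptions for illustration, not a real analysis pipeline.

```python
import math

FS = 250  # assumed sampling rate, in Hz

def band_power(signal, lo, hi, fs=FS):
    """Crude DFT-based estimate of power in the band [lo, hi) Hz."""
    n = len(signal)
    power = 0.0
    for k in range(1, n // 2):
        freq = k * fs / n
        if lo <= freq < hi:
            re = sum(s * math.cos(2 * math.pi * k * i / n)
                     for i, s in enumerate(signal))
            im = sum(s * math.sin(2 * math.pi * k * i / n)
                     for i, s in enumerate(signal))
            power += (re * re + im * im) / n
    return power

# Synthetic 2-second trace dominated by a 6 Hz (theta) rhythm with a
# weaker 2 Hz (delta) component mixed in.
t = [i / FS for i in range(2 * FS)]
theta_state = [math.sin(2 * math.pi * 6 * x) + 0.3 * math.sin(2 * math.pi * 2 * x)
               for x in t]

theta = band_power(theta_state, 4, 8)
delta = band_power(theta_state, 0.5, 4)
print(theta > delta)  # True: theta power dominates, an encoding-like profile
```

The same comparison run on a delta-dominated trace would flip, which is the logic behind using band power to separate encoding-like from slow-wave, consolidation-like states.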
Lower levels of ACh prevail during quiet wakefulness and slow-wave sleep, and this is thought
to shift the hippocampus into the consolidating state (see Rasch, Born, & Gais, 2006). In
this state, activity along input pathways (ordinarily facilitated by theta rhythms) is sup
pressed, and hippocampal plasticity is low (i.e., hippocampal LTP is not readily induced).
As such, and as indicated earlier, recently induced LTP is protected from interference and
is given a chance to stabilize as the process of cellular consolidation unfolds. In addition,
under these conditions, the cortex is characterized by low-frequency spindle oscillations
(i.e., 7 to 14 Hz) and delta oscillations (i.e., 4 Hz or less), whereas the hippocampus is as
sociated with a more broad-spectrum pattern punctuated by brief, high-frequency sharp
waves (i.e., 30 Hz or more) and very-high-frequency “ripples” (about 200 Hz). These
sharp wave oscillations occur within the hippocampal-entorhinal output network, and syn
chronized neural discharges tend to occur along this pathway during sharp-wave/ripple
events (Buzsáki, 1986; Chrobak & Buzsáki, 1996). Thus, once again, rhythmic activity
seems to coordinate communication between adjacent brain structures, and such commu
nication has been found to occur between more distant brain structures as well. For ex
ample, ripples observed during hippocampal sharp waves have been correlated with the
occurrence of spindles in prefrontal cortex (Siapas & Wilson, 1998). Moreover, the neural
replay discussed earlier preferentially takes place during the high-frequency bursts of
spindle waves (Wilson & McNaughton, 1994). All of this suggests that rhythmically based
feedback activity from the hippocampus may serve to “train” the neocortex and thus facil
itate the process of systems consolidation. When it occurs in the hours after learning, this
kind of systems-level communication presumably involves hippocampal neurons that have
encoded information and that are successfully undergoing the process of cellular consoli
dation. If so, then, again, cellular consolidation could be regarded as an early component
of the systems consolidation process.
A novel line of research concerned with the role of sleep in consolidation was initiated by
a study suggesting that sleep also plays a role in the consolidation of nondeclarative
memories. Karni et al. (1994) presented subjects with computer-generated stimulus dis
plays that sometimes contained a small target consisting of three adjacent diagonal bars
(arranged either vertically or horizontally) embedded within a background of many hori
zontal bars. The displays were presented very briefly (10 ms) and then occluded by a visu
al mask, and the subject’s job on a given trial was to indicate whether the target items
were arranged vertically or horizontally in the just-presented display. Performance on this
task improves with practice in that subjects can correctly identify the target with shorter
and shorter delays between the stimulus and the mask.
A remarkable finding reported by Karni et al. (1994; Karni & Sagi, 1993) was that, follow
ing a night of normal sleep, performance improved on this task to levels that were higher
than the level that had been achieved at the end of training—as if further learning took
place offline during sleep. This is unlike what is typically observed on declarative memory
tasks, which only rarely show an actual performance enhancement. Various control condi
tions showed that the enhanced learning effect was not simply due to a reduction in gen
eral fatigue. Instead, some kind of performance-enhancing consolidation apparently oc
curred while the subjects slept.
Karni et al. (1994) found that depriving subjects of slow-wave sleep after learning did not
prevent the improvement of postsleep performance from occurring, but depriving them of
REM sleep did. Thus, REM sleep seems critical for the sleep-related enhancement of pro
cedural learning to occur, and similar results have been reported in a number of other
studies (Atienza et al., 2004; Gais et al., 2000; Mednick et al., 2002, 2003; Stickgold,
James, & Hobson, 2000; Walker et al., 2005). These findings have been taken to mean
that nondeclarative memories require a period of consolidation and that REM sleep in
particular is critical for such consolidation to occur. Although most work has pointed to
REM, some work has suggested a role for slow-wave sleep as well. For example, using the
same texture-discrimination task, Stickgold et al. (2000) found that the sleep-dependent
gains were correlated with the amount of slow-wave sleep early in the night and with the
amount of REM sleep late in the night (cf. Gais et al., 2000).
In the case of nondeclarative memories, the evidence for consolidation does not consist of
decreasing dependence on one brain system (as in systems consolidation) or of increasing
resistance to interference (as in cellular consolidation). Instead, the evidence consists of
an enhancement of learning beyond the level that was achieved at the end of training. At
the time Karni et al. (1994) published their findings, this was an altogether new phenome
non, and it was followed by similar demonstrations of sleep-related enhancement using
other procedural memory tasks, such as the sequential finger-tapping task (Walker et al.,
2002, 2003a, 2003b). In this task, subjects learn a sequence of finger presses, and perfor
mance improves with training (i.e., the sequence is completed with increasing speed) and
improves still further following a night of sleep, with the degree of improvement often
correlating with time spent in stage 2 sleep. Fischer, Hallschmid, Elsner, and Born (2002)
reported similar results, except that performance gains correlated with amount of REM
sleep. However, one aspect of this motor-sequence-learning phenomenon—namely, the
fact that performance improves beyond what was observed at the end of training—has
been called into question. Rickard et al. (2008) recently presented evidence suggesting
that the apparent absolute enhancement of performance on this task following sleep may
have resulted from a combination of averaging artifacts, time-of-day confounds (cf.
Keisler, Ashe, & Willingham, 2007; Song et al., 2007), and the buildup of fatigue (creating
the impression of less presleep learning than actually occurred). This result does not nec
essarily question the special role of sleep in the consolidation of motor-sequence learn
ing, but it does call into question the absolute increase in performance that has been ob
served following a period of sleep.
Somewhat more puzzling for the idea that REM plays a special role in the consolidation of
nondeclarative memory is that Rasch, Pommer, Diekelmann, and Born (2008) found that
the use of antidepressant drugs, which virtually eliminate REM sleep, did not eliminate
the apparent sleep-related enhancement of performance on two nondeclarative memory
tasks (mirror tracing and motor sequence learning). This result would appear to suggest
that REM sleep, per se, is not critical for the consolidation of learning on either task. In
stead, conditions that happen to prevail during REM sleep (rather than REM sleep per se)
may be critical. Consistent with this possibility, Rasch, Gais, and Born (2009) showed that
cholinergic receptor blockade during REM significantly impaired motor skill consolida
tion. This finding suggests that the consolidation of motor skill depends on the high
cholinergic activity that typically occurs during REM (and that presumably occurs even
when REM is eliminated by antidepressant drugs).
In another study, Cai et al. (2009) asked whether REM sleep helps the brain combine
previously primed items with new unrelated items to create new and useful
associations. They used
the Remote Associates Test, in which subjects are asked to find a fourth word that could
serve as an associative link between three presented words (such as COOKIES, SIXTEEN,
HEART). The answer to this item is SWEET (cookies are sweet, sweet sixteen, sweet
heart). It is generally thought that insight is required to hit upon solutions to problems
such as these because the correct answer is usually not the strongest associate of any of
the individual items. After priming the answers earlier in the day using an unrelated
analogies task, subjects took an afternoon nap. Cai et al. (2009) found that quiet rest and
non-REM sleep did not facilitate performance on this task, but REM sleep did. Important
ly, the ability to successfully create new associations was not attributable to conscious
memory for the previously primed items because there were no differences in recall or
recognition for the primed items between the quiet rest, non-REM sleep, and REM sleep
groups. This finding reinforces the notion that REM sleep is important for nondeclarative
memory, possibly by providing a brain state in which the associative networks of the neocortex can
reorganize without being disrupted by input from the MTL.
Reconsolidation
In recent years, a great deal of attention has focused on the possibility that it is not just
new memories that are labile for a period of time; instead, any recently reactivated mem
ory—even one that consolidated long ago—may become labile again as a result of having
been reactivated. That is, according to this idea, an old declarative memory retrieved to
conscious awareness again becomes vulnerable to disruption and modification and must
undergo the process of cellular consolidation (and, perhaps, systems consolidation) all
over again.
The idea that recently retrieved memories once again become labile was proposed long
ago (Misanin, Miller, & Lewis, 1968), but the recent resurrection of interest in the subject
was sparked by Nader, Schafe, and LeDoux (2000). Rats in this study were exposed to a
fear-conditioning procedure in which a tone was paired with shock in one chamber (con
text A). The next day, the rats were placed in another chamber (context B) and presented
with the tone to reactivate memory of the tone–shock pairing. For half the rats, a protein
synthesis inhibitor (anisomycin) was then infused into the amygdala (a structure that is
adjacent to the hippocampus and that is involved in the consolidation of emotional memo
ry). If the tone-induced reactivation of the fear memory required the memory to again un
dergo the process of consolidation in order to become stabilized, then anisomycin should
prevent that from happening, and the memory should be lost. This, in fact, was what Nad
er et al. (2000) reported. Whereas control rats exhibited considerable freezing when the
tone was presented again 1 day later (indicating long-term memory for the original tone–
shock pairing), the anisomycin-treated rats did not (as if they had forgotten the tone–
shock pairing).
Nader et al. (2000) also reported that when the administration of anisomycin was delayed for 6 hours after the memory was reactivated (thereby giving the memory a chance to reconsolidate before protein synthesis was inhibited), little effect on long-term learning was observed. More specifically, in a test 24 hours after reactivation, the treated rats and the control rats exhibited a comparable level of freezing in response to the tone (indicating memory for the tone–shock pairing). All these results parallel the effects of anisomycin on tone–shock memory when it is infused after a conditioning trial (Schafe & LeDoux, 2000). What was remarkable about the Nader et al. (2000) results was that similar consolidation effects were also observed well after conditioning and in response to the reactivation of memory caused by the presentation of the tone. Similar findings have now been reported for other tasks and other species (see Nader & Hardt, 2009, for a review).
The notion that a consolidated memory becomes fragile again merely because it is reactivated might seem implausible because personal experience does not suggest that we place our memories at risk by retrieving them. In fact, the well-known testing effect—the finding that successful retrieval enhances memory more than additional study—seems to suggest that the opposite may be true (e.g., Roediger & Karpicke, 2006). However, a fragile trace is also a malleable trace, and it has been suggested that the updating of memory—not its erasure—may be a benefit of what otherwise seems like a problematic state of affairs. As noted by Dudai (2004), the susceptibility to corruption of a retrieved memory "might be the price paid for modifiability" (p. 75). If the reactivated trace is susceptible only to agents such as anisomycin, which is not a drug that is encountered on a regular basis, then the price for modifiability might be low indeed. On the other hand, if the trace is vulnerable to corruption by new learning, as a newly learned memory trace appears to be, then the price could be considerably higher. In an intriguing new study, Monfils, Cowansage, Klann, and LeDoux (2009) showed that contextual fear memories in rats can be more readily eliminated by extinction trials if the fear memory is first reactivated by a reminder trial. For the first time, this raises the possibility that reactivated memories are vulnerable to disruption and modification by new learning (not just by protein synthesis inhibitors).
Much remains unknown about reconsolidation, and there is some debate as to whether the disruption of a recently retrieved trace is a permanent or a transient phenomenon. For example, Stafford and Lattal (2009) recently compared the effects of anisomycin administered shortly after fear conditioning (which would disrupt the consolidation of a new memory) with its effects when administered shortly after a reminder trial (which would disrupt the consolidation of a newly retrieved memory). With both groups equated on important variables such as prior learning experience, they found that the anisomycin-induced deficit on a test of long-term memory was larger and more persistent in the consolidation group than in the reconsolidation group. Still, this study adds to a large and growing literature showing that reactivated memories are vulnerable in a way that was not fully appreciated until Nader et al. (2000) drove the point home with their compelling study.
Conclusion
The idea that memories require time to consolidate was proposed more than a century ago, but empirical inquiry into the mechanisms of consolidation is now more intense than ever. With that inquiry has come the realization that the issue is complex, so much so that, used in isolation, the word "consolidation" no longer has a clear meaning. One can speak of consolidation in terms of memory becoming less dependent on the hippocampus (systems consolidation) or in terms of a trace becoming stabilized (cellular consolidation). Alternatively, one can speak of consolidation in terms of enhanced performance (over and above the level of performance achieved at the end of training), in terms of increased resistance to interference (i.e., less forgetting), or in terms of a presumed mechanism, such as neural replay or neural reactivation. A clear implication is that any use of the word consolidation should be accompanied by a statement of what it means. Similarly, any suggestion that consolidation "strengthens" the memory trace should be accompanied by a clear statement of the way (or ways) in which the trace is thought to be stronger than it was before. A more precise use of the terminology commonly used in this domain of investigation will help to make sense of the rapidly burgeoning literature on the always fascinating topic of memory consolidation.
References
Anagnostaras, S. G., Maren, S., & Fanselow, M. S. (1999). Temporally-graded retrograde amnesia of contextual fear after hippocampal damage in rats: Within-subjects examination. Journal of Neuroscience, 19, 1106–1114.
Anderson, J. R., & Schooler, L. J. (1991). Reflections of the environment in memory. Psychological Science, 2, 396–408.
Atienza, M., Cantero, J. L., & Stickgold, R. (2004). Posttraining sleep enhances automaticity in perceptual discrimination. Journal of Cognitive Neuroscience, 16, 53–64.
Barrett, T. R., & Ekstrand, B. R. (1972). Effect of sleep on memory: III. Controlling for time-of-day effects. Journal of Experimental Psychology, 96, 321–327.
Bayley, P. J., Gold, J. J., Hopkins, R. O., & Squire, L. R. (2005). The neuroanatomy of remote memory. Neuron, 46, 799–810.
Bernard, F. A., Bullmore, E. T., Graham, K. S., Thompson, S. A., Hodges, J. R., &
Bliss, T. V. P., & Collingridge, G. L. (1993). A synaptic model of memory: Long-term potentiation in the hippocampus. Nature, 361, 31–39.
Bliss, T. V. P., Collingridge, G. L., & Morris, R. G. (2003). Long-term potentiation: Enhancing neuroscience for 30 years—Introduction. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 358, 607–611.
Bliss, T. V. P., & Lomo, T. (1973). Long-lasting potentiation of synaptic transmission in the dentate area of the anaesthetized rabbit following stimulation of the perforant path. Journal of Physiology, 232, 331–356.
Bramham, C. R., & Srebro, B. (1989). Synaptic plasticity in the hippocampus is modulated by behavioral state. Brain Research, 493, 74–86.
Bruce, K. R., & Pihl, R. O. (1997). Forget "drinking to forget": Enhanced consolidation of emotionally charged memory by alcohol. Experimental and Clinical Psychopharmacology, 5, 242–250.
Buzsáki, G. (1986). Hippocampal sharp waves: Their origin and significance. Brain Research, 398, 242–252.
Buzsáki, G. (1989). A two-stage model of memory trace formation: A role for "noisy" brain states. Neuroscience, 31, 551–570.
Cai, D. J., Mednick, S. A., Harrison, E. M., Kanady, J. C., & Mednick, S. C. (2009). REM, not incubation, improves creativity by priming associative networks. Proceedings of the National Academy of Sciences U S A, 106, 10130–10134.
Cai, D. J., Shuman, T., Gorman, M. R., Sage, J. R., & Anagnostaras, S. G. (2009). Sleep selectively enhances hippocampus-dependent memory in mice. Behavioral Neuroscience, 123, 713–719.
Chrobak, J. J., & Buzsáki, G. (1996). High-frequency oscillations in the output networks of the hippocampal-entorhinal axis of the freely-behaving rat. Journal of Neuroscience, 16, 3056–3066.
Cipolotti, L., Shallice, T., Chan, D., Fox, N., Scahill, R., Harrison, G., Stevens, J., & Rudge, P. (2001). Long-term retrograde amnesia: The crucial role of the hippocampus. Neuropsychologia, 39, 151–172.
Del Cerro, S., Jung, M., & Lynch, G. (1992). Benzodiazepines block long-term potentiation in slices of hippocampus and piriform cortex. Neuroscience, 49, 1–6.
Douville, K., Woodard, J. L., Seidenberg, M., Miller, S. K., Leveroni, C. L., Nielson, K. A., Franczak, M., Antuono, P., & Rao, S. M. (2005). Medial temporal lobe activity for recognition of recent and remote famous names: An event-related fMRI study. Neuropsychologia, 43, 693–703.
Dudai, Y. (2004). The neurobiology of consolidations, or, how stable is the engram? Annual Review of Psychology, 55, 51–86.
Fillmore, M. T., Kelly, T. H., Rush, C. R., & Hays, L. (2001). Retrograde facilitation of memory by triazolam: Effects on automatic processes. Psychopharmacology, 158, 314–321.
Fischer, S., Hallschmid, M., Elsner, A. L., & Born, J. (2002). Sleep forms memory for finger skills. Proceedings of the National Academy of Sciences U S A, 99, 11987–11991.
Frankland, P. W., & Bontempi, B. (2005). The organization of recent and remote memory. Nature Reviews Neuroscience, 6, 119–130.
Frankland, P. W., & Bontempi, B. (2006). Fast track to the medial prefrontal cortex. Proceedings of the National Academy of Sciences U S A, 103, 509–510.
Frankland, P. W., Bontempi, B., Talton, L. E., Kaczmarek, L., & Silva, A. J. (2004). The involvement of the anterior cingulate cortex in remote contextual fear memory. Science, 304, 881–883.
Gais, S., Lucas, B., & Born, J. (2006). Sleep after learning aids memory recall. Learning and Memory, 13, 259–262.
Gais, S., Plihal, W., Wagner, U., & Born, J. (2000). Early sleep triggers memory for early visual discrimination skills. Nature Neuroscience, 3, 1335–1339.
Ghoneim, M. M., Hinrichs, J. V., & Mewaldt, S. P. (1984). Dose-response analysis of the behavioral effects of diazepam: I. Learning and memory. Psychopharmacology, 82, 291–295.
Givens, B., & McMahon, K. (1995). Ethanol suppresses the induction of long-term potentiation in vivo. Brain Research, 688, 27–33.
Haist, F., Bowden Gore, J., & Mao, H. (2001). Consolidation of human memory over decades revealed by functional magnetic resonance imaging. Nature Neuroscience, 4, 1139–1145.
Hinrichs, J. V., Ghoneim, M. M., & Mewaldt, S. P. (1984). Diazepam and memory: Retrograde facilitation produced by interference reduction. Psychopharmacology, 84, 158–162.
Hirano, M., & Noguchi, K. (1998). Dissociation between specific personal episodes and other aspects of remote memory in a patient with hippocampal amnesia. Perceptual and Motor Skills, 87, 99–107.
Izquierdo, I., Schröder, N., Netto, C. A., & Medina, J. H. (1999). Novelty causes time-dependent retrograde amnesia for one-trial avoidance in rats through NMDA receptor- and CaMKII-dependent mechanisms in the hippocampus. European Journal of Neuroscience, 11, 3323–3328.
Jenkins, J. G., & Dallenbach, K. M. (1924). Oblivescence during sleep and waking. American Journal of Psychology, 35, 605–612.
Ji, D., & Wilson, M. A. (2007). Coordinated memory replay in the visual cortex and hippocampus during sleep. Nature Neuroscience, 10, 100–107.
Johnson, J. D., McDuff, S. G., Rugg, M. D., & Norman, K. A. (2009). Recollection, familiarity, and cortical reinstatement: A multivoxel pattern analysis. Neuron, 63, 697–708.
Jones Leonard, B., McNaughton, B. L., & Barnes, C. A. (1987). Suppression of hippocampal synaptic activity during slow-wave sleep. Brain Research, 425, 174–177.
Jost, A. (1897). Die Assoziationsfestigkeit in ihrer Abhängigkeit von der Verteilung der Wiederholungen [The strength of associations in their dependence on the distribution of repetitions]. Zeitschrift für Psychologie und Physiologie der Sinnesorgane, 16, 436–472.
Karlsson, M. P., & Frank, L. M. (2009). Awake replay of remote experiences in the hippocampus. Nature Neuroscience, 12, 913–918.
Karni, A., & Sagi, D. (1993). The time course of learning a visual skill. Nature, 365, 250–252.
Karni, A., Tanne, D., Rubenstein, B. S., Askenasy, J. J. M., & Sagi, D. (1994). Dependence on REM sleep of overnight improvement of a perceptual skill. Science, 265, 679–682.
Keisler, A., Ashe, J., & Willingham, D. T. (2007). Time of day accounts for overnight improvement in sequence learning. Learning and Memory, 14, 669–672.
Kitchener, E. G., Hodges, J. R., & McCarthy, R. (1998). Acquisition of post-morbid vocabulary and semantic facts in the absence of episodic memory. Brain, 121, 1313–1327.
Lamberty, G. J., Beckwith, B. E., & Petros, T. V. (1990). Posttrial treatment with ethanol enhances recall of prose narratives. Physiology and Behavior, 48, 653–658.
Lansink, C. S., Goltstein, P. M., Lankelma, J. V., McNaughton, B. L., & Pennartz, C. M. A. (2009). Hippocampus leads ventral striatum in replay of place-reward information. PLoS Biology, 7(8), e1000173.
Larson, J., & Lynch, G. (1986). Induction of synaptic potentiation in hippocampus by patterned stimulation involves two events. Science, 232, 985–988.
Lechner, H. A., Squire, L. R., & Byrne, J. H. (1999). 100 years of consolidation—Remembering Müller and Pilzecker. Learning and Memory, 6, 77–87.
Litman, L., & Davachi, L. (2008). Distributed learning enhances relational memory consolidation. Learning and Memory, 15, 711–716.
Louie, K., & Wilson, M. A. (2001). Temporally structured replay of awake hippocampal ensemble activity during rapid eye movement sleep. Neuron, 29, 145–156.
Lu, W., Man, H., Ju, W., Trimble, W. S., MacDonald, J. F., & Wang, Y. T. (2001). Activation of synaptic NMDA receptors induces membrane insertion of new AMPA receptors and LTP in cultured hippocampal neurons. Neuron, 29, 243–254.
Maguire, E. A., & Frith, C. D. (2003). Lateral asymmetry in the hippocampal response to the remoteness of autobiographical memories. Journal of Neuroscience, 23, 5302–5307.
Maguire, E. A., Henson, R. N. A., Mummery, C. J., & Frith, C. D. (2001). Activity in prefrontal cortex, not hippocampus, varies parametrically with the increasing remoteness of memories. NeuroReport, 12, 441–444.
Maguire, E. A., Nannery, R., & Spiers, H. J. (2006). Navigation around London by a taxi driver with bilateral hippocampal lesions. Brain, 129, 2894–2907.
Mann, R. E., Cho-Young, J., & Vogel-Sprott, M. (1984). Retrograde enhancement by alcohol of delayed free recall performance. Pharmacology, Biochemistry and Behavior, 20, 639–642.
Manns, J. R., Hopkins, R. O., & Squire, L. R. (2003). Semantic memory and the human hippocampus. Neuron, 37, 127–133.
Maquet, P., Laureys, S., Peigneux, P., et al. (2000). Experience-dependent changes in cerebral activation during human REM sleep. Nature Neuroscience, 3, 831–836.
Martin, S. J., Grimwood, P. D., & Morris, R. G. M. (2000). Synaptic plasticity and memory: An evaluation of the hypothesis. Annual Review of Neuroscience, 23, 649–711.
Mednick, S. C., Cai, D. J., Shuman, T., Anagnostaras, S., & Wixted, J. T. (2011). An opportunistic theory of cellular and systems consolidation. Trends in Neurosciences, 34, 504–514.
Mednick, S., Nakayama, K., & Stickgold, R. (2003). Sleep-dependent learning: A nap is as good as a night. Nature Neuroscience, 6, 697–698.
Mednick, S., & Stickgold, R. (2002). The restorative effect of naps on perceptual deterioration. Nature Neuroscience, 5, 677–681.
McClelland, J. L., McNaughton, B. L., & O'Reilly, R. C. (1995). Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. Psychological Review, 102, 419–457.
Misanin, J. R., Miller, R. R., & Lewis, D. J. (1968). Retrograde amnesia produced by electroconvulsive shock after reactivation of a consolidated memory trace. Science, 160, 203–204.
Monfils, M., Cowansage, K. K., Klann, E., & LeDoux, J. E. (2009). Extinction-reconsolidation boundaries: Key to persistent attenuation of fear memories. Science, 324, 951–955.
Morris, R. G. M., Anderson, E., Lynch, G. S., & Baudry, M. (1986). Selective impairment of learning and blockade of long-term potentiation by an N-methyl-D-aspartate receptor antagonist, AP5. Nature, 319, 774–776.
Müller, G. E., & Pilzecker, A. (1900). Experimentelle Beiträge zur Lehre vom Gedächtnis [Experimental contributions to the science of memory]. Zeitschrift für Psychologie, Ergänzungsband 1, 1–300.
Nader, K., & Hardt, O. (2009). A single standard for memory: The case for reconsolidation. Nature Reviews Neuroscience, 10, 224–234.
Nader, K., Schafe, G. E., & LeDoux, J. E. (2000). Fear memories require protein synthesis in the amygdala for reconsolidation after retrieval. Nature, 406, 722–726.
Nielsen, T. A. (2000). Cognition in REM and NREM sleep: A review and possible reconciliation of two models of sleep mentation. Behavioral and Brain Sciences, 23, 851–866.
O'Keefe, J., & Dostrovsky, J. (1971). The hippocampus as a spatial map: Preliminary evidence from unit activity in the freely-moving rat. Brain Research, 34, 171–175.
Parker, E. S., Birnbaum, I. M., Weingartner, H., Hartley, J. T., Stillman, R. C., & Wyatt, R. J. (1980). Retrograde enhancement of human memory with alcohol. Psychopharmacology, 69, 219–222.
Parker, E. S., Morihisa, J. M., Wyatt, R. J., Schwartz, B. L., Weingartner, H., & Stillman, R. C. (1981). The alcohol facilitation effect on memory: A dose-response study. Psychopharmacology, 74, 88–92.
Peigneux, P., Laureys, S., Fuchs, S., Collette, F., Perrin, F., et al. (2004). Are spatial memories strengthened in the human hippocampus during slow wave sleep? Neuron, 44, 535–545.
Peyrache, A., Khamassi, M., Benchenane, K., Wiener, S. I., & Battaglia, F. P. (2009). Replay of rule-learning related neural patterns in the prefrontal cortex during sleep. Nature Neuroscience, 12, 919–926.
Plihal, W., & Born, J. (1997). Effects of early and late nocturnal sleep on declarative and procedural memory. Journal of Cognitive Neuroscience, 9, 534–547.
Plihal, W., & Born, J. (1999). Effects of early and late nocturnal sleep on priming and spatial memory. Psychophysiology, 36, 571–582.
Rasch, B. H., Born, J., & Gais, S. (2006). Combined blockade of cholinergic receptors shifts the brain from stimulus encoding to memory consolidation. Journal of Cognitive Neuroscience, 18, 793–802.
Rasch, B., Büchel, C., Gais, S., & Born, J. (2007). Odor cues during slow-wave sleep prompt declarative memory consolidation. Science, 315, 1426–1429.
Rasch, B., Gais, S., & Born, J. (2009). Impaired off-line consolidation of motor memories after combined blockade of cholinergic receptors during REM sleep-rich sleep. Neuropsychopharmacology, 34, 1843–1853.
Rasch, B., Pommer, J., Diekelmann, S., & Born, J. (2008). Pharmacological REM sleep suppression paradoxically improves rather than impairs skill memory. Nature Neuroscience, 12, 396–397.
Ribot, T. (1881). Les maladies de la mémoire [Diseases of memory]. New York: Appleton-Century-Crofts.
Rickard, T. C., Cai, D. J., Rieth, C. A., Jones, J., & Ard, M. C. (2008). Sleep does not enhance motor sequence learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34, 834–842.
Roberto, M., Nelson, T. E., Ur, C. L., & Gruol, D. L. (2002). Long-term potentiation in the rat hippocampus is reversibly depressed by chronic intermittent ethanol exposure. Journal of Neurophysiology, 87, 2385–2397.
Roediger, H. L., & Karpicke, J. D. (2006). Test-enhanced learning: Taking memory tests improves long-term retention. Psychological Science, 17, 249–255.
Rosenbaum, R. S., McKinnon, M. C., Levine, B., & Moscovitch, M. (2004). Visual imagery deficits, impaired strategic retrieval, or memory loss: Disentangling the nature of an amnesic person's autobiographical memory deficit. Neuropsychologia, 42, 1619–1635.
Schafe, G. E., & LeDoux, J. E. (2000). Memory consolidation of auditory Pavlovian fear conditioning requires protein synthesis and protein kinase A in the amygdala. Journal of Neuroscience, 20, RC96, 1–5.
Scoville, W. B., & Milner, B. (1957). Loss of recent memory after bilateral hippocampal lesions. Journal of Neurology, Neurosurgery and Psychiatry, 20, 11–21.
Siapas, A. G., Lubenov, E. V., & Wilson, M. A. (2005). Prefrontal phase locking to hippocampal theta oscillations. Neuron, 46, 141–151.
Siapas, A. G., & Wilson, M. A. (1998). Coordinated interactions between hippocampal ripples and cortical spindles during slow-wave sleep. Neuron, 21, 1123–1128.
Sinclair, J. G., & Lo, G. F. (1986). Ethanol blocks tetanic and calcium-induced long-term potentiation in the hippocampal slice. General Pharmacology, 17, 231–233.
Sirota, A., Csicsvari, J., Buhl, D., & Buzsáki, G. (2003). Communication between neocortex and hippocampus during sleep in rats and mice. Proceedings of the National Academy of Sciences U S A, 100, 2065–2069.
Smith, C. N., & Squire, L. R. (2009). Medial temporal lobe activity during retrieval of semantic memory is related to the age of the memory. Journal of Neuroscience, 29, 930–938.
Song, S. S., Howard, J. H., Jr., & Howard, D. V. (2007). Sleep does not benefit probabilistic motor sequence learning. Journal of Neuroscience, 27, 12475–12483.
Squire, L. R. (1992). Memory and the hippocampus: A synthesis from findings with rats, monkeys, and humans. Psychological Review, 99, 195–231.
Squire, L. R. (2009). The legacy of patient H.M. for neuroscience. Neuron, 61, 6–9.
Squire, L. R., & Alvarez, P. (1995). Retrograde amnesia and memory consolidation: A neurobiological perspective. Current Opinion in Neurobiology, 5, 169–177.
Squire, L. R., Clark, R. E., & Knowlton, B. J. (2001). Retrograde amnesia. Hippocampus, 11, 50–55.
Squire, L. R., & Wixted, J. T. (2011). The cognitive neuroscience of human memory since H.M. Annual Review of Neuroscience, 34, 259–288.
Squire, L. R., & Zola, S. M. (1996). Structure and function of declarative and nondeclarative memory systems. Proceedings of the National Academy of Sciences U S A, 93, 13515–13522.
Stafford, J. M., & Lattal, K. M. (2009). Direct comparisons of the size and persistence of anisomycin-induced consolidation and reconsolidation deficits. Learning and Memory, 16, 494–503.
Steinvorth, S., Levine, B., & Corkin, S. (2005). Medial temporal lobe structures are needed to re-experience remote autobiographical memories: Evidence from H.M. and W.R. Neuropsychologia, 43, 479–496.
Sterpenich, V., Albouy, G., Darsaud, A., Schmidt, C., Vandewalle, G., Dang Vu, T. T., Desseilles, M., Phillips, C., Degueldre, C., Balteau, E., Collette, F., Luxen, A., & Maquet, P. (2009). Sleep promotes the neural reorganization of remote emotional memory. Journal of Neuroscience, 29, 5143–5152.
Stickgold, R., James, L., & Hobson, J. A. (2000). Visual discrimination learning requires sleep after training. Nature Neuroscience, 3, 1237–1238.
Takashima, A., Nieuwenhuis, I. L. C., Jensen, O., Talamini, L. M., Rijpkema, M., & Fernández, G. (2009). Shift from hippocampal to neocortical centered retrieval network with consolidation. Journal of Neuroscience, 29, 10087–10093.
Takashima, A., Petersson, K. M., Rutters, F., Tendolkar, I., Jensen, O., Zwarts, M. J., McNaughton, B. L., & Fernández, G. (2006). Declarative memory consolidation in humans: A prospective functional magnetic resonance imaging study. Proceedings of the National Academy of Sciences U S A, 103, 756–761.
Takehara, K., Kawahara, S., & Kirino, Y. (2003). Time-dependent reorganization of the brain components underlying memory retention in trace eyeblink conditioning. Journal of Neuroscience, 23, 9897–9905.
Talamini, L. M., Nieuwenhuis, I. L., Takashima, A., & Jensen, O. (2008). Sleep directly following learning benefits consolidation of spatial associative memory. Learning and Memory, 15, 233–237.
Tse, D., Langston, R. F., Kakeyama, M., Bethus, I., Spooner, P. A., Wood, E. R., Witter, M. P., & Morris, R. G. (2007). Schemas and memory consolidation. Science, 316, 76–82.
Wagner, U., Gais, S., Haider, H., Verleger, R., & Born, J. (2004). Sleep inspires insight. Nature, 427, 352–355.
Walker, M. P., Brakefield, T., Hobson, J. A., & Stickgold, R. (2003a). Dissociable stages of human memory consolidation and reconsolidation. Nature, 425, 616–620.
Walker, M. P., Brakefield, T., Morgan, A., Hobson, J. A., & Stickgold, R. (2002). Practice with sleep makes perfect: Sleep-dependent motor skill learning. Neuron, 35, 205–211.
Walker, M. P., Brakefield, T., Seidman, J., Morgan, A., Hobson, J. A., & Stickgold, R. (2003b). Sleep and the time course of motor skill learning. Learning and Memory, 10, 275–284.
Walker, M. P., Stickgold, R., Jolesz, F. A., & Yoo, S. S. (2005). The functional anatomy of sleep-dependent visual skill learning. Cerebral Cortex, 15, 1666–1675.
Watkins, O. C., & Watkins, M. J. (1975). Buildup of proactive inhibition as a cue-overload effect. Journal of Experimental Psychology: Human Learning and Memory, 1, 442–452.
Weingartner, H. J., Sirocco, K., Curran, V., & Wolkowitz, O. (1995). Memory facilitation following the administration of the benzodiazepine triazolam. Experimental and Clinical Psychopharmacology, 3, 298–303.
Whitlock, J. R., Heynen, A. J., Schuler, M. G., & Bear, M. F. (2006). Learning induces long-term potentiation in the hippocampus. Science, 313, 1058–1059.
Wixted, J. T. (2004b). On common ground: Jost's (1897) law of forgetting and Ribot's (1881) law of retrograde amnesia. Psychological Review, 111, 864–879.
Wixted, J. T., & Carpenter, S. K. (2007). The Wickelgren power law and the Ebbinghaus savings function. Psychological Science, 18, 133–134.
Wixted, J. T., & Ebbesen, E. (1991). On the form of forgetting. Psychological Science, 2, 409–415.
Xu, L., Anwyl, R., & Rowan, M. J. (1998). Spatial exploration induces a persistent reversal of long-term potentiation in rat hippocampus. Nature, 394, 891–894.
Yamashita, K., Hirose, S., Kunimatsu, A., Aoki, S., Chikazoe, J., Jimura, K., Masutani, Y., Abe, O., Ohtomo, K., Miyashita, Y., & Konishi, S. (2009). Formation of long-term memory representation in human temporal cortex related to pictorial paired associates. Journal of Neuroscience, 29, 10335–10340.
Yaroush, R., Sullivan, M. J., & Ekstrand, B. R. (1971). The effect of sleep on memory: II. Differential effect of the first and second half of the night. Journal of Experimental Psychology, 88, 361–366.
Yuste, R., & Bonhoeffer, T. (2001). Morphological changes in dendritic spines associated with long-term synaptic plasticity. Annual Review of Neuroscience, 24, 1071–1089.
John Wixted
Denise J. Cai
Age-Related Decline in Working Memory and Episodic Memory
Memory is one of the cognitive functions that deteriorate most with age. The types of memory most affected by aging are working memory, the short-term maintenance and simultaneous manipulation of information, and episodic memory, our memory for personally experienced past events. Functional neuroimaging studies indicate important roles in age-related memory decline for the medial temporal lobe (MTL) and prefrontal cortex (PFC) regions, which have been linked to two major cognitive aging theories, the resource and binding deficit hypotheses, respectively. Interestingly, functional neuroimaging findings also indicate that aging is not exclusively associated with decline. Some older adults seem to deal with PFC and MTL decline by shifting to alternative brain resources that can compensate for their memory deficits. In the future, these findings may help to distinguish normal aging from early Alzheimer's dementia and to develop memory remediation therapies.
Keywords: functional neuroimaging, aging, working memory, episodic memory, medial temporal lobe, prefrontal cortex
Introduction
As we age, our brain declines in terms of both anatomy and physiology. This brain decline is accompanied by cognitive decline, which is most notable in the memory domain. Understanding the neural basis of age-related memory decline is important for two main reasons. First, in view of the growing number of older adults in today's society, cognitive aging is increasingly becoming a problem in general health care, and effective therapies can only be developed on the basis of knowledge obtained through basic research. Second, there is a subgroup of elderly people whose memory impairments are more severe, preventing normal functioning in society. For these persons, such memory impairments can be the earliest sign of pathological age-related conditions, such as Alzheimer's dementia (AD). Particularly in the early stages of this disease, the differentiation from normal age-related memory impairments is very difficult to make. Thus, it is important to map out which memory deficits can be regarded as a correlate of normal aging and which deficits are associated with age-related pathology. Working memory (WM) and episodic memory (EM) are the two types of memory most affected by the aging process. WM refers to the short-term maintenance and simultaneous manipulation of information. Clinical and functional neuroimaging evidence indicates that WM is particularly dependent on the functions of the prefrontal cortex (PFC) and the parietal cortex (Wager & Smith, 2003). EM refers to the encoding and retrieval of personally experienced events (Gabrieli, 1998; Tulving, 1983). Clinical studies have shown that EM is primarily dependent on the integrity of the medial temporal lobe (MTL) memory system (Milner, 1972; Squire, Schmolck, & Stark, 2001). However, functional neuroimaging studies have also underlined the contributions of the PFC to EM processes (Simons & Spiers, 2003). In this chapter, we examine the roles of PFC and MTL changes in age-related decline in WM and EM by reviewing findings from functional neuroimaging studies of healthy aging. With respect to EM, most studies have scanned either the encoding or the retrieval phase of EM, and hence, we will consider these two phases in separate sections. We also discuss how these WM and EM findings relate to two major cognitive theories of aging: the resource deficit hypothesis (Craik, 1986) and the binding deficit hypothesis (Johnson, Hashtroudi, & Lindsay, 1993; Naveh-Benjamin, 2000).
Regarding the role of the PFC and MTL, one influential neurocognitive view is that the age-related decline in EM and WM results from a selective PFC decline, whereas MTL degradation is an indicator of pathological age-related conditions (Buckner, 2004; Hedden & Gabrieli, 2004; West, 1996). This view is based on anatomical studies showing that the PFC is the brain region that shows the greatest atrophy with age (Raz, 2005; Raz et al., 1997). However, there is now substantial evidence that the MTL also shows substantial anatomical and functional decline in healthy older adults (OAs), and thus, it is no longer possible to ascribe age-related memory deficits exclusively to PFC decline. It should be noted, though, that not all MTL regions show decline with age. As shown in Figure 22.1, a recent longitudinal study found that in healthy OAs, the hippocampus showed atrophy similar to the PFC, whereas the rhinal cortex did not (Raz et al., 2005). The differential effects of aging on the hippocampus and rhinal cortex are very interesting because the rhinal cortex is one of the regions first affected by AD (Braak, Braak, & Bohl, 1993). As discussed later, together with recent functional magnetic resonance imaging (fMRI) evidence of dissociations between hippocampal and rhinal functions in aging (Daselaar, Fleck, Dobbins, Madden, & Cabeza, 2006), these findings have implications for the early diagnosis of AD.
This chapter focuses on two major cognitive factors thought to underlie age-related memory decline and strongly linked to PFC and MTL function, namely deficits in executive function and deficits in binding processes. The term executive function describes a set of cognitive abilities that control and regulate other abilities and behaviors. With respect to EM, executive functions are necessary to keep information available online so that it can be encoded in or retrieved from EM. The PFC is generally thought to be the key brain region underlying executive functions (Miller & Cohen, 2001). According to the resource deficit hypothesis (Craik, 1986), age-related cognitive impairments, including WM and EM deficits, are the result of a general reduction in attentional resources. As a result, OAs have greater difficulties with cognitive tasks that provide less environmental support, and hence require greater self-initiated processing. Executive and PFC functions are thought to be major factors explaining resource deficits in OAs (Craik, 1977, 1986; Glisky, 2007; West, 1996). Yet, we acknowledge that there are other factors, not discussed in this chapter, that can explain these deficits, including speed of processing (Salthouse, 1996) and inhibition deficits (Hasher, Zacks, & May, 1999).
Binding refers to our capacity to bind into one coherent representation the individual ele
ments that together make up an episode in memory, such as sensory inputs, thoughts,
and emotions. Binding is assumed to be more critical for recollection than for familiarity.
Recollection refers to remembering an item together with contextual details, whereas familiarity refers to knowing that an item occurred in the past even though its contextual details cannot be retrieved. According to relational memory theory, different subregions of MTL are differentially involved in memory for relations among items and mem
ory for individual items (Eichenbaum, Yonelinas, & Ranganath, 2007). In particular, the
hippocampal formation is more involved in relational memory (binding, recollection),
whereas the surrounding parahippocampal gyrus and rhinal cortex are more involved in
item memory (familiarity). According to the binding deficit hypothesis (Johnson et al.,
1993; Naveh-Benjamin, 2000), OAs are impaired in forming and remembering associa
tions between individual items and between items and their context. As a result, age-re
lated WM and EM impairments are more pronounced on tasks that require binding be
tween study items (i.e., relational memory and recollection-based tasks). It is important to
note that these tests also depend on executive functions mediated by PFC, and hence it is
not possible to simply associate age-related deficits in these tasks with MTL function.
Given the role of PFC in executive functions and MTL in binding operations, we will take
an in-depth look at functional neuroimaging studies of WM and EM that revealed age-re
lated functional changes in PFC or MTL regions. We will first discuss studies showing
PFC differences during WM and EM encoding and retrieval, and then MTL differences.
We will conclude with a discussion of different interpretations of age-related memory de
cline derived from these findings, and how they relate to deficits in executive function
and binding capacity.
In the case of the PFC, both age-related reductions and increases in activity have been
found. Age-related reductions in PFC activity are often accompanied by increased activity
in the contralateral hemisphere, leading to a more bilateral activation pattern in OAs than
in younger adults (YAs). This pattern has been conceptualized in a model called hemi
spheric asymmetry reduction in older adults (HAROLD), which states that, under similar
conditions, PFC activity tends to be less lateralized in OAs than in YAs (Cabeza, 2002). In
addition to the HAROLD pattern, over-recruitment of PFC regions in OAs is often found in
the same hemisphere. When appropriate, we will distinguish between neuroimaging stud
ies presenting task conditions in distinct blocks (blocked design) and studies that charac
terize stimuli on a trial-by-trial basis based on performance (event-related design).
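The degree of lateralization at stake in the HAROLD model is often summarized with a simple laterality index computed over homologous left- and right-hemisphere activations. A minimal sketch in Python (the function and the activation values are hypothetical illustrations, not data from any study discussed here):

```python
def laterality_index(left, right):
    """Laterality index: +1 = fully left-lateralized,
    -1 = fully right-lateralized, 0 = perfectly bilateral."""
    return (left - right) / (left + right)

# Hypothetical activation magnitudes illustrating the HAROLD pattern:
# younger adults strongly left-lateralized, older adults near-bilateral.
li_young = laterality_index(left=8.0, right=2.0)  # 0.6
li_old = laterality_index(left=5.0, right=4.0)    # ~0.11
```

A smaller absolute index in OAs than in YAs under matched conditions is the signature that the HAROLD model describes.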
Working Memory
Imaging studies of aging and WM function have shown altered patterns of activation in
OAs compared with YAs, particularly in PFC regions. Studies that compared simple main
tenance tasks have generally found increased PFC activity in OAs, which is correlated
with better performance (Rypma & D’Esposito, 2000). Additionally, PFC activity in OAs
not only is greater overall in these studies but also is often more bilateral, exhibiting the
aforementioned HAROLD pattern (Cabeza et al., 2004; Park et al., 2003; Reuter-Lorenz et
al., 2000).
For example, Reuter-Lorenz et al. (2000) used a maintenance task in which participants
maintained four letters in WM and then compared them to a probe letter. As shown in
Figure 22.2, YAs showed left lateralized activity, whereas OAs showed bilateral activity.
They interpreted this HAROLD pattern as compensatory. Consistent with this interpreta
tion, the OAs who showed the bilateral activation pattern were faster in the verbal WM
task than those who did not. In addition to the verbal WM condition, they also included a
spatial WM task. In this (p. 459) task, YAs activated right PFC, and OAs additionally re
cruited left PFC. Thus, even though age-related increases were in opposite hemispheres,
both verbal and spatial WM conditions yielded the HAROLD pattern (see Figure 22.2).
This finding supports the generalizability of the HAROLD model to different kinds of stim
uli.
In contrast to studies using simple maintenance tasks, recent WM studies that manipulat
ed WM load found both age-related decreases and increases in PFC activity. Cappell et al.
(Cappell, Gmeindl, & Reuter-Lorenz, 2010) used a WM maintenance task with three dif
ferent memory loads: low, medium, and high. OAs showed increased activation in right
dorsolateral PFC regions during the lower load conditions. However, during the highest
load condition, OAs showed a reduction in left dorsolateral PFC activation. Another WM
study by Schneider-Garces et al. (2009) reported similar findings. They varied WM load
between two and six letters, and measured a “throughput” variable reflecting the amount
of information processed at a given load. Whereas YAs showed increasing throughput lev
els with higher WM loads, the levels of OAs showed an asymptote-like function with lower
levels at the highest load. Matching the behavioral results, they found that overall PFC
activity showed an increasing function in YAs, but an asymptotic function in OAs, leading
to an age-related over-recruitment in PFC activity during the lower loads and an under-
recruitment during the highest load.
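A capacity measure of this kind can be sketched with a Cowan-style K estimate, which scales recognition accuracy by memory load; whether this matches the exact throughput formula used by Schneider-Garces et al. is an assumption, and the accuracy values below are invented for illustration:

```python
def capacity_k(set_size, hit_rate, false_alarm_rate):
    # Cowan-style K: estimated number of items held in WM at a given load.
    return set_size * (hit_rate - false_alarm_rate)

loads = [2, 4, 6]
# Hypothetical (hit rate, false alarm rate) pairs, not data from the study.
young = [round(capacity_k(n, h, f), 2) for n, (h, f) in
         zip(loads, [(0.95, 0.05), (0.90, 0.10), (0.85, 0.15)])]
older = [round(capacity_k(n, h, f), 2) for n, (h, f) in
         zip(loads, [(0.92, 0.08), (0.80, 0.20), (0.68, 0.32)])]
print(young)  # [1.8, 3.2, 4.2]  -> keeps rising with load
print(older)  # [1.68, 2.4, 2.16] -> levels off, then drops at the highest load
```

The young profile keeps rising with load while the older profile asymptotes, mirroring the behavioral pattern described above.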
The results by Cappell et al. and Schneider-Garces et al. are in line with the compensa
tion-related utilization of neural circuits hypothesis (CRUNCH) (Reuter-Lorenz & Cappell,
2008). CRUNCH was proposed to account for patterns of overactivation and underactiva
tion in OAs. According to CRUNCH, declining neural efficiency leads OAs to engage more
neural circuits than YAs to meet task demands. Therefore, OAs show more activity at low
er levels of task demands. However, as demands increase, YAs show greater activity to
meet increasing task loads. OAs, on the other hand, have already reached their ceiling
and will show reduced performance and underactivation.
In summary, WM studies often found that OAs show reduced activity in the PFC regions
engaged by YAs but greater activity in other PFC regions, such as contralateral PFC re
gions (i.e., the PFC hemisphere less engaged by YAs). In some cases (Reuter-Lorenz et al.,
2000), contralateral recruitment led to a more bilateral pattern of PFC activity in OAs
(i.e., HAROLD). Moreover, the WM studies by Cappell et al. and Schneider-Garces et al. il
lustrate the importance of distinguishing between difficulty levels when considering age
differences in brain activity. In general, age-related increases in PFC activity were attrib
uted to compensatory mechanisms.
The difference between blocked and event-related encoding studies is that the first
method measures overall task activity including regions involved in simply processing the
task instructions and stimuli, whereas the second method measures memory-specific ac
tivity because overall task characteristics are subtracted out. Although OAs may show a
difference in task-related processes that are not immediately relevant for memory encod
ing, such as interpreting task instructions and switching between task conditions, they
may recruit processes associated with successful memory encoding to a similar or even
greater extent.
The most common result in blocked design EM encoding studies using incidental and in
tentional learning instructions is a reduction in left PFC activity. This reduction in left
PFC activity was often coupled with an increase in right PFC activity, yielding the
HAROLD pattern. In line with the resource deficit hypothesis, PFC reductions are more
pronounced for intentional conditions—which require more self-initiated processing—
than for incidental encoding conditions. For example, Logan et al. (2002) reported that
during self-initiated, intentional encoding instructions, OAs compared with YAs showed
less activity in left PFC but greater activity in right PFC, resulting in a more bilateral ac
tivity pattern (HAROLD). Results were similar for intentional encoding of both verbal and
nonverbal material. Interestingly, further exploratory analyses revealed that this pattern
was present in a group of “old-old” (mean age = 80), but not in a group of “young-
old” (mean age = 67), suggesting that contralateral recruitment is associated with more
pronounced age-related cognitive decline. At the same time, the decrease in left PFC was
not present in the old-old group during incidental encoding instructions, suggesting that
—in line with the resource deficit hypothesis—frontal reductions can be remediated by
providing environmental support during encoding (Figure 22.3).
An incidental EM encoding study by Rosen et al. (2002) distinguishing between high- and
low-performing OAs linked the HAROLD pattern specifically to high-performing OAs. This
study distinguished between OAs with high and low memory scores based on a neuropsy
chological test battery. The authors reported equivalent left PFC activity but greater right
PFC activity in the old-high memory group relative to YAs. In contrast, the old-low memo
ry group showed reduced activity in both left and right PFC. As a result, the old-high
group showed a more bilateral pattern of PFC activity than YAs (HAROLD). These find
ings support a compensatory interpretation of HAROLD.
In contrast to blocked design EM encoding studies, event-related fMRI studies using subsequent memory paradigms have often found age-related equivalent or increased activity in left PFC (Dennis, Daselaar, & Cabeza, 2006; Duverne, Motamedinia, & Rugg, 2009; Gutchess et al., 2005; Morcom, Good, Frackowiak, & Rugg, 2003). For instance,
Morcom et al. (2003) used event-related fMRI to study subsequent memory for semanti
cally encoded words. Recognition memory for these words was tested after a short and a
longer delay. Performance of OAs at the short delay was equal to that of YAs at the long
delay. Under these conditions, activity in left (p. 461) inferior PFC was greater for subse
quently recognized than forgotten words and was equivalent in both age groups. Howev
er, OAs showed greater right PFC activity than YAs, again resulting in a more bilateral
pattern of frontal activity (HAROLD).
As noted, one explanation for the discrepancy in left PFC activity between blocked and
event-related studies may be a difference between overall task activity (reduced) and
memory-related activity (preserved or enhanced). Although OAs may show a difference in
task-related processes that are not relevant for memory encoding, they may recruit
processes associated with successful memory encoding to a similar or even greater ex
tent. Another explanation may be a difference in sustained (blocked) and transient (event-
related) subsequent memory effects. A recent study by Dennis et al. (2006) used hybrid
blocked and event-related analyses to distinguish between transient and sustained subse
quent memory effects during deep incidental encoding of words. Subsequent memory
was defined as parametric increases in encoding activity as a function of a combined sub
sequent memory and confidence scale. This parametric response was measured in each
trial (transient activity) and in blocks of eight trials (sustained activity). Similar to the re
sults of Gutchess et al., subsequent memory analyses of transient activity showed age-related increases in the left PFC. At the same time, subsequent memory analyses of sus
tained activity showed age-related reductions in the right PFC (Figure 22.4). The decline
in sustained subsequent memory activity in the PFC may involve age-related deficits in
sustained attention that affect encoding processes. The results underline the importance
of investigating aging effects on both transient and sustained neural activity.
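The hybrid logic can be sketched as two regressor types in a single design matrix: a boxcar spanning each block (sustained activity) and a mean-centered trial-wise parametric modulator (transient activity). The timing values and function below are hypothetical, and a real fMRI analysis would additionally convolve both regressors with a hemodynamic response function:

```python
import numpy as np

def hybrid_regressors(n_scans, block_onsets, block_len, event_onsets, modulator):
    """Toy design-matrix columns for a hybrid blocked/event-related analysis.
    `modulator` holds the per-trial subsequent-memory/confidence score."""
    sustained = np.zeros(n_scans)
    for onset in block_onsets:
        sustained[onset:onset + block_len] = 1.0   # block-wide (sustained) activity
    transient = np.zeros(n_scans)
    centered = np.asarray(modulator, float) - np.mean(modulator)
    for onset, m in zip(event_onsets, centered):
        transient[onset] = m                        # trial-wise parametric effect
    return sustained, transient

# Hypothetical timing: two 8-trial blocks, one trial every 2 scans.
sus, tra = hybrid_regressors(
    n_scans=40, block_onsets=[0, 20], block_len=16,
    event_onsets=list(range(0, 16, 2)) + list(range(20, 36, 2)),
    modulator=[1, 2, 3, 4, 1, 2, 3, 4, 4, 3, 2, 1, 4, 3, 2, 1])
```

Because the modulator is mean-centered, the transient column captures trial-by-trial variation in subsequent memory over and above the block-wide sustained activity.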
To summarize encoding studies, the most consistent finding in incidental and intentional
encoding studies is an age-related reduction in left PFC activity. This finding is more fre
quent for intentional than for incidental encoding studies, suggesting that, in line with
the resource deficit hypothesis, the environmental support provided by a deep semantic
encoding task may attenuate the age-related decrease in left PFC activity. This effect was
found within subjects in the study by Logan et al. (2002). The difference between inten
tional and incidental encoding conditions suggests an important strategic component in
age-related memory decline. The reduction in left PFC activity was often coupled with an
increase in right PFC activity, leading to a bilateral pattern of PFC activity in OAs
(HAROLD). Importantly, the study by Rosen et al. (2002) found the HAROLD pattern only
in high-performing OAs. This result provides support for a compensatory account of
HAROLD. In contrast to blocked EM encoding studies, studies that used subsequent
memory paradigms often found increases in left PFC activity. This discrepancy may relate
to differences between overall task activity (reduced) and memory-related activity (pre
served or enhanced) and between the different attentional components measured during
blocked and event-related paradigms.
As an example of a blocked EM retrieval study, Cabeza et al. (1997) used both a word-pair
recognition and cued-recall task. During word recognition, OAs showed reduced activity
in the right PFC. During recall, OAs also showed weaker activations in the right PFC than
YAs, but at the same time, showed greater activity than YAs in the left PFC. The net result
was that PFC activity during recall was right lateralized in YAs but bilateral in OAs. The
authors noted this change in hemispheric asymmetry and interpreted it as compensatory.
This was the first study identifying the HAROLD pattern and the first one suggesting the
compensatory interpretation of this finding. These changes were more pronounced dur
ing recall than during recognition, consistent with behavioral evidence that recall is more
sensitive to aging.
In another study by Cabeza and colleagues (2002), YAs, high-performing OAs (old-high),
and low-performing OAs (old-low) studied words presented auditorily or visually. During
scanning, they were presented with words visually and made either old/new decisions
(item memory) or heard/seen decisions (context memory). Consistent with their previous
results, YAs showed right PFC activity for context trials, whereas OAs showed bilateral
PFC activity (HAROLD). Importantly, however, this pattern was only seen for the old-high
adults, supporting a compensation account of the HAROLD pattern (Figure 22.5).
Summarizing the studies on PFC and EM retrieval, in blocked design studies, PFC differ
ences between YA and OAs have been found more frequently in studies using tasks with
little environmental support, including recall and context memory tasks, than during sim
ple item recognition. This was exemplified in the study by Cabeza et al. (1997), which in
cluded both recall and recognition tasks. These findings suggest a three-way interaction
between age, executive demand, and frontal laterality. Distinguishing between old-high
and old-low adults, the study by Cabeza et al. (2002) provided direct evidence for the
compensation account of HAROLD. Similar to EM encoding studies that used the subse
quent memory paradigm, event-related EM retrieval studies that focused on the neural
correlates of successful EM retrieval have found equivalent or increased PFC activity in
OAs. Interestingly, in line with the WM studies by Cappell et al. (2010) and Schneider-Garces et al. (2009), (p. 463) the event-related study by Morcom et al. (2007) illustrates
the importance of distinguishing between differences in task difficulty when considering
age differences in brain activity.
Frontal activations showed both age-related reductions and increases, as well as shifts in the lateralization of activation. In contrast, activation within the MTL gen
erally shows age-related decreases compared with MTL activation seen in YAs. However,
EM retrieval studies indicate a shift in the foci of activation from the hippocampus proper
to more parahippocampal regions in aging.
Working Memory
Although the MTL has been strongly linked to EM, and the PFC to WM, MTL processes
are also thought to play a role in WM tasks, particularly when these involve the binding
between different elements (Ranganath & Blumenfeld, 2005). Regarding aging, only three WM studies, all of which used nonverbal tasks, have found reductions in hippocampal activity.
The first study was conducted by Grady et al. (1998). They employed a face WM task with
varying intervals of item maintenance. Results showed that OAs have difficulty maintain
ing hippocampal activation across longer delays. As the delay extended from 1 to 6 sec
onds, left hippocampal activity increased in YAs but decreased in OAs, which implies that
OAs have difficulties initiating memory strategies mediated by MTL or sustaining MTL ac
tivity beyond very short retention intervals.
The second study was conducted by Mitchell et al. (2000). They investigated a WM para
digm with an important binding component. In each trial, participants were presented an
object in a particular screen location and had to hold in WM the object, its location, or
both (combination trials). Combination trials can be assumed to involve not only WM
maintenance but also the binding of different information into an integrated memory
trace (as in associative EM encoding). OAs showed a deficit in accuracy in the combi
nation condition but not in the object and location conditions. Two regions were differentially involved in the combination condition in YAs but not in OAs: a left anterior hip
pocampal region and an anteromedial PFC region (right Brodmann area 10) (Figure
22.6). According to the authors, a disruption of a hippocampal–PFC circuit may underlie
binding deficits in OAs.
Finally, in a study by Park et al. (2003), OAs showed reduced hippocampal activity: the left hippocampus was more activated in the viewing than in the
maintenance condition in YAs but not in OAs. As in the study by Mitchell et al. (2000), the
age-related reduction in hippocampal activity was attributed to deficits in binding opera
tions.
In sum, three nonverbal WM studies using spatial/pictorial stimuli (Grady et al., 1998;
Mitchell et al., 2000; Park et al., 2003) found age-related decreases in hippocampus activ
ity. Interestingly, no verbal WM study found such decreases (Cappell et al., 2010; Reuter-
Lorenz et al., 2000). It is possible that nonverbal tasks are more dependent on hippocam
pal-mediated relational memory processing, and hence more sensitive to age-related
deficits in MTL regions. Thus, contrary to PFC findings, WM studies have generally found
age-related reductions in MTL activity.
Although not as frequently as reductions in left PFC activity, blocked design studies using
both intentional and incidental EM encoding paradigms have found age-related decreases
in MTL activity. As an example of intentional EM encoding, in their study examining face
encoding, Grady et al. (1995) found that, compared with YAs, OAs showed less activity not
only in the left PFC but also in the MTL. Furthermore, they found a highly significant cor
relation between hippocampus and left PFC activity in YAs, but not in OAs. Based on
these results, they concluded that encoding in OAs is accompanied by reduced neural ac
tivity and diminished connectivity between PFC and MTL areas. As an example of inciden
tal EM encoding, Daselaar et al. (2003a) investigated levels of processing in aging using a
deep (living/nonliving) versus shallow (uppercase/lowercase) encoding task. Although both age groups commonly activated regions of a semantic network, activation differences emerged when comparing levels of processing. OAs revealed sig
nificantly less activation in the left anterior hippocampus during deep relative to shallow
classification. The researchers concluded that in addition to PFC changes, under-recruit
ment of MTL regions contributes, at least in part, to age-related impairments in EM en
coding.
Similar to block design studies, event-related EM encoding studies have generally found
age-related reductions in MTL activity during tasks using single words or pictures (Dennis et al., 2006; Gutchess et al., 2005). A study by Dennis et al. (2008) also included a
source memory paradigm and provided clear support for the binding deficit hypothesis.
YAs and OAs were studied with fMRI while encoding faces, scenes, and face–scene pairs
(source memory). In line with binding deficit theory, the investigators found age-related
reductions in subsequent memory activity in the hippocampus, which were more pro
nounced for face–scene pairs than for item memory (faces and scenes).
A related study by Daselaar et al. (2003b) that distinguished between high- and low-performing OAs linked MTL reductions directly to individual differences in memory performance. These researchers found an age-related reduction in activity in the anterior MTL when comparing subsequently remembered items to a motor baseline, which was
specific to low-performing OAs. Based on these findings, they concluded that MTL dys
function during encoding is an important factor in age-related memory decline.
To summarize, blocked and event-related EM encoding studies have generally found age-
related MTL reductions (Daselaar et al., 2003a, 2003b; Dennis et al., 2006; Gutchess et
al., 2005). In line with the binding deficit hypothesis, the study by Dennis et al. links age-
related MTL reductions mainly to source memory. However, other studies also found reductions for individual items. These findings suggest that age-related binding deficits
play a role not only in complex associative memory tasks but also in simpler item memory
tasks. One explanation for these results is that, in general, item memory tasks also have
an associative component. In fact, the deep processing tasks used in these studies are
specifically designed to invoke semantic associations in relation to the study items. As dis
cussed in the next section, recollection of these associations can be used as confirmatory
evidence during EM retrieval: remembering specific associations with a study item con
firms that one has seen the study items. The study by Daselaar et al. (2003b) directly
linked reduced MTL activity during single-word encoding to impaired performance on a
subsequent recognition test.
The first support for a shift from recollection-based to familiarity-based EM retrieval processes came
from a study by Cabeza et al. (2004). They investigated the effects of aging on several
cognitive tasks, including a verbal recognition task. Within the MTLs, they found a disso
ciation between a hippocampal region, which showed weaker activity in OAs than in YAs,
and a parahippocampal region, which showed the converse pattern. Given evidence that
hippocampal and parahippocampal regions are, respectively, more involved in recollec
tion than familiarity (Eichenbaum et al., 2007), this finding is consistent with the notion
that OAs are more impaired in recollection than in familiarity (e.g., Jennings & Jacoby,
1993; Parkin & Walter, 1992). Indeed, the age-related increase in parahippocampal cortex
activity suggests that OAs may be compensating for recollection deficits by relying more
on familiarity. Supporting this idea, OAs had a larger number of “know” (familiarity-based
EM retrieval—“knowing” that something is old) responses than YAs, and these responses
were positively correlated with the parahippocampal activation.
In line with the findings by Cabeza and colleagues, a recent study by Giovanello et al.
(2010) also found a hippocampal–parahippocampal shift with age during retrieval. They
used a false memory paradigm in which conjunction words presented at study were recombined at test. For instance, “blackmail” and “jailbird” were presented during the study, and “blackbird” was presented during test. False memory conjunction errors (responding that “blackbird” is old) tended to occur more frequently in OAs than in YAs. Giovanello and
colleagues found that OAs showed more parahippocampal activity during recombined
conjunction words at retrieval (false memory), but less hippocampal activity during iden
tical conjunction words (veridical memory). Given that false memories are associated with
gist-based processes that rely on familiarity (Balota et al., 1999; Dennis, Kim, & Cabeza,
2008), the age-related increase in parahippocampal cortex fits well with the results by
Cabeza et al. (2004).
Another study by Cabeza and colleagues provided a similar pattern of results (Daselaar,
Fleck, et al., 2006). YAs and OAs made old/new judgments about previously studied words
followed by a confidence judgment from low to high. On the basis of previous research
(Daselaar, Fleck, & Cabeza, 2006; Yonelinas, 2001), recollection was measured as an ex
ponential change in brain activity as a function of confidence, and familiarity was mea
sured as a linear change. The results revealed a clear double dissociation within MTL:
whereas recollection-related activity in the hippocampus was reduced by aging, familiari
ty-related activity in rhinal cortex was increased by aging (Figure 22.7A). These results
suggested that OAs compensate for deficits in recollection processes mediated by the hip
pocampus by relying more on familiarity processes mediated by rhinal cortex. Supporting
this interpretation, within-participants regression analyses based on single-trial activity
showed that recognition accuracy was determined by only hippocampal activity in YAs but
by both hippocampal and rhinal activity in OAs. Also consistent with the notion of com
pensation, functional connectivity analyses showed that correlations between the hip
pocampus and posterior regions associated with recollection were greater in YAs, where
as correlations between the rhinal cortex and bilateral PFC regions were greater in OAs
(Figure 22.7B). The latter effect suggests a top-down modulation of PFC on rhinal activity
in OAs. The finding of preserved rhinal function in healthy OAs has important clinical implications because this region is impaired early in AD (Killiany et al., 2000; Pennanen et
al., 2004).
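This analysis strategy can be sketched as a least-squares fit of an exponential and a linear predictor to a region's activity profile across confidence levels. The predictors, region profiles, and function names below are schematic assumptions rather than the study's actual model:

```python
import numpy as np

conf = np.arange(1, 7, dtype=float)            # 6-point recognition confidence scale
recollection_pred = np.exp(conf - conf.max())  # ~flat, then sharp rise at high confidence
familiarity_pred = conf - conf.mean()          # steady linear increase

def fit_weights(activity):
    """Least-squares weights of the exponential (recollection-like) and
    linear (familiarity-like) predictors for one region's activity profile."""
    X = np.column_stack([recollection_pred, familiarity_pred, np.ones_like(conf)])
    beta, *_ = np.linalg.lstsq(X, activity, rcond=None)
    return beta[0], beta[1]

# Hypothetical profiles: a "hippocampus-like" region loading on the exponential
# predictor, a "rhinal-like" region loading on the linear predictor.
hipp = 2.0 * recollection_pred + 0.2 * familiarity_pred + 1.0
rhin = 0.2 * recollection_pred + 1.5 * familiarity_pred + 1.0
```

Under this scheme, an age-related drop in the exponential weight with a preserved or increased linear weight would correspond to the hippocampal/rhinal dissociation reported above.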
In sum, retrieval studies have found both increases and decreases in MTL activity. The
findings by Cabeza and colleagues suggest that at least some of these increases reflect a
shift from recollection-based (hippocampus) to familiarity-based (parahippocampal/rhinal
cortex) retrieval. Furthermore, their functional connectivity findings suggest that the
greater reliance on familiarity processes in OAs may be mediated by a top-down frontal
modulation.
Discussion
In summary, our review of functional neuroimaging studies of cognitive aging has identi
fied considerable age-related changes in activity during WM, EM encoding, and EM re
trieval tasks not only in the PFC but also in the MTL. These findings suggest that func
tional changes in both the PFC and MTL play a role in age-related memory deficits. Fo
cusing first on PFC findings, the studies indicated both age-related reductions and in
creases in PFC activity. During WM tasks, OAs show reduced activity in the PFC regions
engaged by YAs, but greater activity in other regions, such as contralateral PFC regions.
The latter changes often resulted in the more bilateral pattern of PFC activity in OAs than
YAs known as HAROLD (Cabeza, 2002). (p. 466)
The resource deficit hypothesis postulates that aging reduces attentional resources, and
as a result, OAs have greater difficulties with cognitive tasks, including EM tasks, that re
quire greater self-initiated processing. This hypothesis predicts that age-related differ
ences should be smaller when the task provides a supportive environment that reduces
attentional demands. Among other findings, the resource deficit hypothesis is supported
by evidence that when attentional resources are reduced in YAs, they tend to show EM
deficits that resemble those of OAs (Anderson, Craik, & Naveh-Benjamin, 1998; Jennings
& Jacoby, 1993).
Regarding neural correlates, Craik (1983) proposed that OAs’ deficits in processing are
related to a reduction in the efficiency of PFC functioning. This idea fits with the fact that
this region shows the most prominent gray matter atrophy. Moreover, functional neu
roimaging studies have found age-related changes in PFC activity that are generally in
line with the resource deficit hypothesis.
Given the critical role of PFC in managing attentional resources, the resource deficit hy
pothesis predicts that age-related changes in PFC activity will be larger for tasks involv
ing greater self-initiated processing or less environmental support. The results of func
tional neuroimaging studies are generally consistent with this prediction. During EM en
coding, age-related decreases in left PFC activation were found frequently during inten
tional EM encoding conditions (which provide less environmental support) but rarely dur
ing incidental EM encoding conditions (which provide greater environmental support).
Similarly, during EM retrieval, age-related differences in PFC activity were usually larger
for recall and context memory tasks (which require greater cognitive resources) than for
recognition memory tasks (which require fewer cognitive resources). Thus, in general,
age effects on PFC activity tend to increase as a function of the demands placed on cogni
tive resources.
However, not all age-related changes in PFC activity suggested decline; on the contrary,
many studies found age-related increases in PFC that suggested compensatory mecha
nisms in the aging brain. In particular, several studies of EM encoding and EM retrieval
found activations in contralateral PFC regions in OAs that were not seen in YAs. Impor
tantly, experimental comparisons between high- and low-performing OAs (Cabeza et al.,
2002; Rosen et al., 2002) demonstrated the beneficial contribution of contralateral PFC
recruitment to memory performance in OAs. Moreover, a recent study using transcranial
magnetic stimulation (TMS) found that in YAs, EM retrieval performance was impaired
by TMS of the right PFC but not of the left PFC, whereas in OAs, it was impaired
by either right or left PFC stimulation (Rossi et al., 2004). This result indicates that the
left PFC was less critical for YAs and was used more by OAs, consistent with the compen
sation hypothesis.
It is important to note that resource deficit and compensatory interpretations are not in
compatible. In fact, it is reasonable to assume that the recruitment of additional brain re
gions (e.g., in the contralateral PFC hemisphere) reflects an attempt to compensate for
reduced cognitive resources. One way in which OAs could counteract deficits in the par
ticular pool of cognitive resources required by a cognitive task is to tap into other pools of
cognitive resources. If one task is particularly dependent on cognitive processes mediat
ed by one hemisphere, the other hemisphere represents an alternative pool of cognitive
resources. Thus, in the case of PFC-mediated cognitive resources, if OAs have deficits in
PFC activity in one hemisphere, they may compensate for these deficits by recruiting con
tralateral PFC regions. Moreover, age-related decreases suggestive of resource deficits
and age-related increases suggestive of compensation have often been found in the same
conditions. For example, intentional EM encoding studies have shown age-related de
creases in left PFC activity coupled with age-related increases in right PFC activity, lead
ing to a dramatic reduction in hemispheric asymmetry in OAs (i.e., HAROLD).
Binding Deficit Hypothesis and Medial Temporal Lobe Function
The binding deficit hypothesis postulates that age-related memory deficits are primarily
the result of difficulties in encoding and retrieving novel associations between items.
This hypothesis predicts that OAs are particularly impaired in EM tasks that involve rela
tions between individual items or between items and their context. Given that relational
memory has been strongly associated with the hippocampus (Eichenbaum, Otto, & Co
hen, 1994), this hypothesis also predicts that OAs will show decreased hippocampal activ
ity during memory tasks, particularly when they involve relational information.
Functional neuroimaging studies have identified considerable age-related changes not on
ly in the PFC but also in MTL regions. As noted, the MTL also shows substantial atrophy
in aging. Yet, the rate of decline differs for different subregions: Whereas the hippocam
pus shows a marked decline, the rhinal cortex is relatively preserved in healthy aging
(see Figure 22.1). This finding is in line with the idea that age-related memory deficits are
particularly pronounced during relational memory tasks, which depend on the hippocam
pus.
In line with anatomical findings, functional neuroimaging studies have found substantial
age-related changes in MTL activity during WM, EM encoding, and EM retrieval. Several
studies have found age-related decreases in both hippocampal and parahippocampal re
gions. During WM and EM encoding, these reductions are seen in tasks that emphasize
the binding between different elements (Dennis, Hayes, et al., 2008; Mitchell et al.,
2000). However, declines in hippocampal activation are also seen during tasks using indi
vidual stimuli in healthy OAs (e.g., Daselaar et al., 2003a, 2003b; Park et al., 2003). Final
ly, during EM retrieval some studies found decreases in hippocampal activity, but also
greater activity in parahippocampal regions, which may be compensatory (Cabeza et al.,
2004; Daselaar, Fleck, & Cabeza, 2006; Giovanello et al., 2010).
In general, age-related changes in MTL activity are consistent with the binding deficit hy
pothesis. Age-related reductions in hippocampal activity were found during EM encoding
of complex scenes, which involved associations among picture elements (Gutchess et al.,
2005), and during deep EM encoding of words, which involved
identification of semantic associations (Daselaar et al., 2003a, 2003b; Dennis et al., 2006).
Finally, one study specifically linked age-related reductions in hippocampal activity to rec
ollection, which involves recovery of item–context associations (Daselaar, Fleck, et al.,
2006). Yet, it should be noted that age-related changes in MTL activity were often accom
panied by concomitant changes in PFC activity. Hence, in these cases, it is unclear
whether such changes signal MTL dysfunction or are the result of a decline in executive
processes mediated by PFC regions. However, studies using incidental EM encoding tasks
with minimal self-initiated processing requirements have also identified age-related dif
ferences in MTL activity without significant changes in PFC activity (Daselaar et al.,
2003a, 2003b).
As in the case of PFC, not all age-related changes in MTL activity suggest decline; several
findings suggest compensation. OAs have been found to show reduced activity in the hip
pocampus but increased activity in other brain regions such as the parahippocampal
gyrus (Cabeza et al., 2004) and the rhinal cortex (Daselaar, Fleck, et al., 2006). These re
sults were interpreted as a recruitment of familiarity processes mediated by parahip
pocampal regions to compensate for the decline of recollection processes that are depen
dent on the hippocampus proper. These results fit well with the relational memory view
(Cohen & Eichenbaum, 1993; Eichenbaum et al., 1994), which states that the hippocam
pus is involved in binding an item with its context (recollection), whereas the surrounding
parahippocampal cortex mediates item-specific memory processes (familiarity).
As mentioned at the beginning of this chapter, one of the biggest challenges in cognitive
aging research is to isolate the effects of healthy aging from those of pathological aging.
The structural neuroimaging literature suggests that healthy aging is accompanied by greater
declines in frontal regions compared with MTL (Raz et al., 2005). In contrast, pathologi
cal aging is characterized by greater decline in MTL than in frontal regions (Braak,
Braak, & Bohl, 1993; Kemper, 1994). In fact, functional neuroimaging evidence suggests
that prefrontal activity tends to be maintained or even increased in early AD (Grady,
2005). Thus, these findings suggest that memory decline in healthy aging is more depen
dent on frontal than MTL deficits, whereas the opposite pattern is more characteristic of
pathological aging (Buckner, 2004; West, 1996). In view of these findings, clinical studies
aimed at an early diagnosis of age-related pathology have mainly targeted
changes in MTL (Nestor, Scheltens, & Hodges, 2004). Yet, the studies reviewed in this
chapter clearly indicate that healthy OAs are also prone to MTL decline. Hence, rather
than focusing on MTL deficits alone, diagnosis of age-related pathology may be improved
by employing some type of composite score reflecting the ratio between MTL and frontal
decline.
In terms of MTL dysfunction in healthy and pathological aging, it is also critical to assess
the specific type or loci of MTL dysfunction. Critically, a decline in hippocampal function
can be seen in both healthy aging and AD. Thus, even though hippocampal volume de
cline is an excellent marker of concurrent AD (Scheltens, Fox, Barkhof, & De Carli, 2002),
it is not a reliable measure for distinguishing normal aging from early stages of the dis
ease (Raz et al., 2005). In contrast, changes in the rhinal cortex are not apparent in
healthy aging (see Figure 22.1), but they are present in early AD patients with only mild
impairments (Dickerson et al., 2004). In a discriminant analysis, Pennanen and colleagues
(2004) showed that, although hippocampal volume is indeed the best marker to discrimi
nate AD patients from normal controls, measuring the volume of the entorhinal cortex is
much more useful for distinguishing between incipient AD (mild cognitive impairment)
and healthy aging. An fMRI study by Daselaar, Cabeza, and colleagues suggests that
applying the recollection/familiarity distinction during EM retrieval may be promising
in this respect (Daselaar, Fleck, et al.,
2006). Finally, it should be noted that, despite the rigorous screening procedures typical
of functional neuroimaging studies of healthy aging, it remains possible that early symp
toms of age-related pathology went undetected in some of the studies reviewed in this
chapter.
Summary
In this chapter, we reviewed functional neuroimaging evidence highlighting the role of
the PFC and MTL regions in age-related decline in WM and EM function. The chapter fo
cused on two major factors thought to underlie age-related memory decline and strongly
linked to PFC and MTL function, namely deficits in executive function and deficits in
binding processes. We discussed functional neuroimaging studies that generally showed
age-related decreases in PFC and MTL activity during WM, EM encoding, and EM re
trieval. Yet, some of these studies also found preserved or increased levels of PFC or MTL
activity in OAs, which may be compensatory. Regarding the PFC, several WM and EM
studies have found an age-related increase in contralateral PFC activity, leading to an
overall reduction in frontal asymmetry in OAs (HAROLD). As discussed, studies that divid
ed OAs into high and low performers provided strong support for the idea that HAROLD
reflects a successful compensatory mechanism. Regarding the MTL, several WM and EM
studies reported age-related decreases in MTL activity. Yet, studies of EM retrieval have
also found age-related increases in MTL activity. Recent findings suggest that at least
some of these increases reflect a compensatory shift from hippocampal-based recollection
processes to parahippocampal-based familiarity processes. In sum, in view of the substan
tial changes in PFC and MTL that take place when we grow older, a reduction in WM and
EM function seems inevitable. Yet, our review also suggests that some OAs can counter
act this reduction by using alternative brain resources within the PFC and MTL that allow
them to compensate for the general deficits in executive and binding operations underly
ing WM and EM decline with age.
Future Directions
In this chapter we reviewed functional neuroimaging studies of WM and EM that identi
fied considerable age-related changes in PFC and MTL activity. These findings suggest
that functional changes in both PFC and MTL play a role in age-related memory deficits.
However, as outlined below, there are several open questions that need to be addressed
in future functional neuroimaging studies of memory and aging.
What is the role of task demands in age differences in memory activations?
According to CRUNCH (Reuter-Lorenz & Cappell, 2008), OAs show more activity than YAs
at lower levels of task demand, but as demands increase, YAs show greater activity to
meet the rising load, whereas OAs, having already reached their ceiling, show reduced
performance and less activity. We discussed evidence from two recent WM studies
supporting the CRUNCH model. We also discussed similar findings regarding the HAROLD
model, indicating, for instance, a greater age-related reduction in asymmetry during more
difficult recall tasks than during simple recognition tasks. Future fMRI studies of aging and memory
should further clarify the relation between task difficulty and age-related activation dif
ferences by incorporating multiple levels of task difficulty in their design and including
task demand functions in their analyses.
Finally, two open questions remain: To what extent is the age-related shift from recollection-
based (hippocampus) to familiarity-based (rhinal cortex) retrieval indeed specific to
healthy OAs? And can the familiarity versus recollection distinction, combined with fMRI,
be used in practice for the clinical diagnosis of pathological age-related conditions?
Author Note
This work was supported by grants from the National Institute on Aging to RC (AG19731;
AG23770; AG34580).
References
Anderson, N. D., Craik, F. I. M., & Naveh-Benjamin, M. (1998). The attentional demands
of encoding and retrieval in younger and older adults: I. Evidence from divided attention
costs. Psychology and Aging, 13, 405–423.
Balota, D. A., Cortese, M. J., Duchek, J. M., Adams, D., Roediger, H. L., McDermott, K. B.,
et al. (1999). Veridical and false memories in healthy older adults and in dementia of the
Alzheimer’s type. Cognitive Neuropsychology, 16 (3-5), 361–384.
Bastin, C., & Van der Linden, M. (2003). The contribution of recollection and familiarity to
recognition memory: A study of the effects of test format and aging. Neuropsychology, 17
(1), 14–24.
Braak, H., Braak, E., & Bohl, J. (1993). Staging of Alzheimer-related cortical destruction.
European Neurology, 33 (6), 403–408.
Buckner, R. L. (2004). Memory and executive function in aging and AD: Multiple factors
that cause decline and reserve factors that compensate. Neuron, 44 (1), 195–208.
Cabeza, R. (2002). Hemispheric asymmetry reduction in older adults: The HAROLD mod
el. Psychology and Aging, 17 (1), 85–100.
Cabeza, R., Anderson, N. D., Locantore, J. K., & McIntosh, A. R. (2002). Aging gracefully:
Compensatory brain activity in high-performing older adults. NeuroImage, 17 (3), 1394–
1402.
Cabeza, R., Daselaar, S. M., Dolcos, F., Prince, S. E., Budde, M., & Nyberg, L. (2004).
Task-independent and task-specific age effects on brain activity during working memory,
visual attention and episodic retrieval. Cerebral Cortex, 14 (4), 364–375.
Cabeza, R., Grady, C. L., Nyberg, L., McIntosh, A. R., Tulving, E., Kapur, S., et al. (1997).
Age-related differences in neural activity during memory encoding and retrieval: A
positron emission tomography study. Journal of Neuroscience, 17, 391–400.
Cappell, K. A., Gmeindl, L., & Reuter-Lorenz, P. A. (2010). Age differences in prefrontal
recruitment during verbal working memory maintenance depend on memory load. Cortex,
46 (4), 462–473.
Cohen, N. J., & Eichenbaum, H. (1993). Memory, amnesia, and the hippocampal system.
Cambridge, MA: MIT Press.
Daselaar, S. M., Fleck, M. S., & Cabeza, R. (2006). Triple dissociation in the medial tem
poral lobes: Recollection, familiarity, and novelty. Journal of Neurophysiology, 96 (4),
1902–1911.
Daselaar, S. M., Fleck, M. S., Dobbins, I. G., Madden, D. J., & Cabeza, R. (2006). Effects of
healthy aging on hippocampal and rhinal memory functions: An event-related fMRI study.
Cerebral Cortex, 16 (12), 1771–1782.
Daselaar, S. M., Veltman, D. J., Rombouts, S. A., Raaijmakers, J. G., & Jonker, C. (2003a).
Deep processing activates the medial temporal lobe in young but not in old adults. Neuro
biology of Aging, 24 (7), 1005–1011.
Daselaar, S. M., Veltman, D. J., Rombouts, S. A., Raaijmakers, J. G., & Jonker, C. (2003b).
Neuroanatomical correlates of episodic encoding and retrieval in young and elderly sub
jects. Brain, 126 (Pt 1), 43–56.
Dennis, N. A., Daselaar, S., & Cabeza, R. (2006). Effects of aging on transient and sus
tained successful memory encoding activity. Neurobiology of Aging, 28 (11), 1749–1758.
Dennis, N. A., Hayes, S. M., Prince, S. E., Madden, D. J., Huettel, S. A., & Cabeza, R.
(2008). Effects of aging on the neural correlates of successful item and source memory
encoding. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34 (4),
791–808.
Dennis, N. A., Kim, H., & Cabeza, R. (2008). Age-related differences in brain activity dur
ing true and false memory retrieval. Journal of Cognitive Neuroscience, 20 (8), 1390–
1402.
Dickerson, B. C., Salat, D. H., Bates, J. F., Atiya, M., Killiany, R. J., Greve, D. N., et al.
(2004). Medial temporal lobe function and structure in mild cognitive impairment. Annals
of Neurology, 56 (1), 27–35.
Duverne, S., Motamedinia, S., & Rugg, M. D. (2009). The relationship between aging,
performance, and the neural correlates of successful memory encoding. Cerebral
Cortex, 19 (3), 733–744.
Eichenbaum, H., Otto, T., & Cohen, N. J. (1994). Two functional components of the hip
pocampal memory system. Behavioral and Brain Sciences, 17 (3), 449–472.
Eichenbaum, H., Yonelinas, A. P., & Ranganath, C. (2007). The medial temporal lobe and
recognition memory. Annual Review of Neuroscience, 30, 123–152.
Giovanello, K. S., Kensinger, E. A., Wong, A. T., & Schacter, D. L. (2010). Age-related neur
al changes during memory conjunction errors. Journal of Cognitive Neuroscience, 22 (7),
1348–1361.
Grady, C. L. (2005). Functional connectivity during memory tasks in healthy aging and de
mentia. In R. Cabeza, L. Nyberg, & D. Park (Eds.), Cognitive neuroscience of aging (pp.
286–308). New York: Oxford University Press.
Grady, C. L., McIntosh, A. R., Bookstein, F., Horwitz, B., Rapoport, S. I., & Haxby, J. V.
(1998). Age-related changes in regional cerebral blood flow during working memory for
faces. NeuroImage, 8 (4), 409–425.
Grady, C. L., McIntosh, A. R., Horwitz, B., Maisog, J. M., Ungerleider, L. G., Mentis, M. J.,
et al. (1995). Age-related reductions in human recognition memory due to impaired en
coding. Science, 269 (5221), 218–221.
Gutchess, A. H., Welsh, R. C., Hedden, T., Bangert, A., Minear, M., Liu, L. L., et al. (2005).
Aging and the neural correlates of successful picture encoding: Frontal activations com
pensate for decreased medial-temporal activity. Journal of Cognitive Neuroscience, 17 (1),
84–96.
Hasher, L., Zacks, R. T., & May, C. P. (1999). Inhibitory control, circadian arousal, and
age. In D. Gopher & A. Koriat (Eds.), Attention and performance XVII, cognitive regula
tion of performance: Interaction of theory and application (pp. 653–675). Cambridge, MA:
MIT Press.
Hedden, T., & Gabrieli, J. D. (2004). Insights into the ageing mind: A view from cognitive
neuroscience. Nature Reviews Neuroscience, 5 (2), 87–96.
Java, R. I. (1996). Effects of age on state of awareness following implicit and explicit
word-association tasks. Psychology and Aging, 11 (1), 108–111.
Jennings, J. M., & Jacoby, L. L. (1993). Automatic versus intentional uses of memory: Ag
ing, attention, and control. Psychology and Aging, 8 (2), 283–293.
Johnson, M. K., Hashtroudi, S., & Lindsay, D. S. (1993). Source monitoring. Psychological
Bulletin, 114 (1), 3–28.
Killiany, R. J., Gomez-Isla, T., Moss, M., Kikinis, R., Sandor, T., Jolesz, F., et al. (2000). Use
of structural magnetic resonance imaging to predict who will get Alzheimer’s disease. An
nals of Neurology, 47 (4), 430–439.
Logan, J. M., Sanders, A. L., Snyder, A. Z., Morris, J. C., & Buckner, R. L. (2002). Under-re
cruitment and nonselective recruitment: Dissociable neural mechanisms associated with
aging. Neuron, 33 (5), 827–840.
Mantyla, T. (1993). Knowing but not remembering: Adult age differences in recollective
experience. Memory and Cognition, 21 (3), 379–388.
Miller, E. K., & Cohen, J. D. (2001). An integrative theory of prefrontal cortex function. An
nual Review of Neuroscience, 24, 167–202.
Milner, B. (1972). Disorders of learning and memory after temporal lobe lesions in man.
Clinical Neurosurgery, 19, 421–446.
Mitchell, K. J., Johnson, M. K., Raye, C. L., & D’Esposito, M. (2000). fMRI evidence of age-
related hippocampal dysfunction in feature binding in working memory. Cognitive Brain
Research, 10 (1-2), 197–206.
Morcom, A. M., Good, C. D., Frackowiak, R. S., & Rugg, M. D. (2003). Age effects on the
neural correlates of successful memory encoding. Brain, 126, 213–229.
Morcom, A. M., Li, J., & Rugg, M. D. (2007). Age effects on the neural correlates of
episodic retrieval: Increased cortical recruitment with matched performance. Cerebral
Cortex, 17, 2491–2506.
Nestor, P. J., Scheltens, P., & Hodges, J. R. (2004). Advances in the early detection of
Alzheimer’s disease. Nature Medicine, 10 (Suppl), S34–S41.
Paller, K. A., & Wagner, A. D. (2002). Observing the transformation of experience into
memory. Trends in Cognitive Sciences, 6 (2), 93–102.
Park, D. C., Welsh, R. C., Marshuetz, C., Gutchess, A. H., Mikels, J., Polk, T. A., et al.
(2003). Working memory for complex scenes: Age differences in frontal and hippocampal
activations. Journal of Cognitive Neuroscience, 15 (8), 1122–1134.
Parkin, A. J., & Walter, B. M. (1992). Recollective experience, normal aging, and frontal
dysfunction. Psychology and Aging, 7, 290–298.
Pennanen, C., Kivipelto, M., Tuomainen, S., Hartikainen, P., Hanninen, T., Laakso, M. P., et
al. (2004). Hippocampus and entorhinal cortex in mild cognitive impairment and early
AD. Neurobiology of Aging, 25 (3), 303–310.
Ranganath, C., & Blumenfeld, R. S. (2005). Doubts about double dissociations between
short- and long-term memory. Trends in Cognitive Sciences, 9 (8), 374–380.
Raz, N. (2005). The aging brain observed in vivo. In R. Cabeza, L. Nyberg & D. C. Park
(Eds.), Cognitive neuroscience of aging (pp. 19–57). New York: Oxford University Press.
Raz, N., Gunning, F. M., Head, D., Dupuis, J. H., McQuain, J., Briggs, S. D., et al. (1997).
Selective aging of the human cerebral cortex observed in vivo: Differential vulnerability
of the prefrontal gray matter. Cerebral Cortex, 7 (3), 268–282.
Raz, N., Lindenberger, U., Rodrigue, K. M., Kennedy, K. M., Head, D., Williamson, A., et al.
(2005). Regional brain changes in aging healthy adults: General trends, individual differ
ences and modifiers. Cerebral Cortex, 15 (11), 1676–1689.
Reuter-Lorenz, P. A., & Cappell, K. A. (2008). Neurocognitive aging and the compensation
hypothesis. Current Directions in Psychological Science, 17 (3), 177–182.
Reuter-Lorenz, P. A., Jonides, J., Smith, E. E., Hartley, A., Miller, A., Marshuetz, C., et
al. (2000). Age differences in the frontal lateralization of verbal and spatial working mem
ory revealed by PET. Journal of Cognitive Neuroscience, 12, 174–187.
Reuter-Lorenz, P. A., Marshuetz, C., Jonides, J., Smith, E. E., Hartley, A., & Koeppe, R.
(2001). Neurocognitive ageing of storage and executive processes. European Journal of
Cognitive Psychology, 13 (1-2), 257–278.
Rosen, A. C., Prull, M. W., O’Hara, R., Race, E. A., Desmond, J. E., Glover, G. H., et al.
(2002). Variable effects of aging on frontal lobe contributions to memory. NeuroReport, 13
(18), 2425–2428.
Rossi, S., Miniussi, C., Pasqualetti, P., Babiloni, C., Rossini, P. M., & Cappa, S. F. (2004).
Age-related functional changes of prefrontal cortex in long-term memory: A repetitive
transcranial magnetic stimulation study. Journal of Neuroscience, 24 (36), 7939–7944.
Rypma, B., & D’Esposito, M. (2000). Isolating the neural mechanisms of age-related
changes in human working memory. Nature Neuroscience, 3 (5), 509–515.
Scheltens, P., Fox, N., Barkhof, F., & De Carli, C. (2002). Structural magnetic resonance
imaging in the practical assessment of dementia: beyond exclusion. Lancet Neurology, 1
(1), 13–21.
Schneider-Garces, N. J., Gordon, B. A., Brumback-Peltz, C. R., Shin, E., Lee, Y., Sutton, B.
P., et al. (2009). Span, CRUNCH, and beyond: Working memory capacity and the aging
brain. Journal of Cognitive Neuroscience, 22 (4), 655–669.
Simons, J. S., & Spiers, H. J. (2003). Prefrontal and medial temporal lobe interactions in
long-term memory. Nature Reviews Neuroscience, 4 (8), 637–648.
Spencer, W. D., & Raz, N. (1995). Differential effects of aging on memory for content and
context: A meta-analysis. Psychology and Aging, 10 (4), 527–539.
Squire, L. R., Schmolck, H., & Stark, S. M. (2001). Impaired auditory recognition memory
in amnesic patients with medial temporal lobe lesions. Learning and Memory, 8 (5), 252–
256.
Wager, T. D., & Smith, E. E. (2003). Neuroimaging studies of working memory: A meta-
analysis. Cognitive, Affective and Behavioral Neuroscience, 3 (4), 255–274.
Sander Daselaar
Sander Daselaar, Donders Institute for Brain, Cognition, and Behaviour, Radboud
University, Nijmegen, Netherlands, Center for Cognitive Neuroscience, Duke Univer
sity, Durham, NC
Roberto Cabeza
Memory Disorders
Barbara Wilson and Jessica Fish
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn
Following some introductory comments, this chapter describes the ways in which memo
ry systems can be affected after an insult to the brain. The main focus is on people with
nonprogressive brain injury, particularly traumatic brain injury (TBI), stroke, encephali
tis, and hypoxia. References are provided for those readers interested in memory disor
ders following progressive conditions such as dementia. The chapter then considers re
covery of memory function for people with nonprogressive memory deficits. This is fol
lowed by a section addressing the assessment of memory abilities, suggesting that both
standardized and functional assessment procedures are required to identify an
individual’s cognitive strengths and weaknesses and to plan for rehabilitation. The final
part of the chapter addresses rehabilitation of memory functioning, including compen
satory strategies, ways to improve new learning, the value of memory groups, and ways
to reduce the emotional consequences of memory impairment.
Memory impairment is one of the most common consequences of an insult to the brain
(Stilwell et al., 1999), affecting almost all people with dementia (approximately 10 per
cent of people over the age of 65 years), some 36 percent of survivors of TBI, and about
70 percent of survivors of encephalitis, as well as those with hypoxic brain damage
following cardiac or pulmonary arrest, attempted suicide, or near drowning, and those
with Parkinson's disease, multiple sclerosis, AIDS, Korsakoff's syndrome, epilepsy, cerebral tumors, and so
forth. The problem is enormous, and unfortunately, many people do not get the care and
help they and their families need.
What do we mean by memory? Memory is not one skill or function or system but is a
“complex combination of memory subsystems” (Baddeley, 1992, p. 5). We can classify or
understand these systems in a number of ways: We can consider the stages involved in
memory, the length of time for which information is stored, the type of information
stored, the modality information is stored in, whether explicit or implicit recall is re
quired, whether recall or recognition is required, whether we are trying to remember
things that have already occurred or remember to do things in the future, and whether
memories date from before or after an insult to the brain. These distinctions are elaborat
ed on in the next section of this chapter.
Although survivors of brain injury can have problems with each one of these subcate
gories of (p. 474) memory, the most typical scenario is for a memory-impaired person to
have (1) normal or near normal immediate memory; (2) problems remembering after a
delay or distraction; (3) a memory gap for the time just before the accident, illness, or in
sult to the brain; and (4) difficulty learning most new information. Those with no other
cognitive deficits (apart from memory) are said to have an amnesic syndrome, whereas
those with more widespread cognitive deficits (the majority of memory-impaired people)
are said to have memory impairment. This distinction is not always adhered to, however,
and it is not uncommon to hear the term amnesia used for all memory-impaired people.
Although people with progressive conditions will not recover or improve memory func
tioning, much can be done to help them survive with a better quality of life (Camp et al.,
2000; Clare, 2008; Clare & Woods, 2004): they can learn new information (Clare et al.,
1999, 2000) and be helped to compensate for their difficulties (Berry, 2007). Recovery for
people with nonprogressive brain injury is addressed below.
Assessment of memory functioning and other cognitive areas is an important part of any
treatment program for people with memory difficulties. We need to identify the person’s
cognitive strengths and weaknesses, and we need to identify the everyday problems caus
ing the most distress for the patient and family. Having identified these situations, we
then need to help the memory-impaired person to reduce his or her real-life problems. As
sessment and treatment are the themes of the final parts of this chapter.
There are three stages involved in remembering: namely encoding, or the stage of taking
in information; storage, the stage of retaining of information; and retrieval, the stage of
accessing information when it is required. All can be differentially affected by an insult to
the brain even though, in real life, these stages interact with one another. For example,
people with encoding problems show attention difficulties. Although it is the case that in
some circumstances remembering is unintentional and we can recall things that happen
when we are not paying attention to them, typically we need to pay attention when we
are learning something new or when it is important to remember. People with the classic
amnesic syndrome do not have encoding difficulties, while those with executive deficits,
say, following a TBI or in the context of dementia, may well have them.
Once information is registered in memory, it has to be stored there until needed. Most
people forget new information rather rapidly over the first few days, and then the rate of
forgetting slows down. This is also true for people with memory problems, bearing in
mind of course that in their case, relatively little information may be stored in the first
place. However, once the information is encoded adequately and enters the long-term
store, testing, rehearsal, or practice can help keep it there.
Retrieving information when we need it is the third stage in the memory process. We all
experience occasions when we know we know something such as a particular word or the
name of a person or a film, yet we cannot retrieve it at the right moment. This is known
as the “tip of the tongue phenomenon.” If someone provides us with a word we can usual
ly determine immediately whether or not it is correct. Retrieval problems are even more
likely for people with memory problems. If we can provide a “hook” in the form of a cue
or prompt, we may be able to help them access the correct memory. Wilson (2009)
discusses ways to improve encoding, storage, and retrieval.
Memory Systems
The short-term memory store holds information for a few seconds, and the long-term
memory store holds information for anything from minutes to years.
The term short-term memory is so misused by the general public that it may be better to
avoid it and instead to use the terms immediate memory, primary memory, and working
memory. Immediate memory and primary memory refer to the memory span as measured
by the number of digits that can be repeated in the correct order after one presentation
(seven plus or minus two for the vast majority of people; Miller, 1956) or by the recency
effect in free recall (the ability to recall the last few items in a list of words, letters, or
digits). Baddeley (2004) prefers the term primary memory when referring to the simple
unitary memory system and working memory when referring to the interacting range of
temporary memory systems people typically use in real-life situations.
Baddeley and Hitch’s (1974) working memory model, which describes how information is stored and manipulated over short periods of time, is one of the most influential theories in the history of psychology. Working memory is composed of three components: the central executive and two subsidiary systems, the phonological loop and the visual-spatial sketchpad. The phonological loop is a store that holds memory for a few seconds, although this can be increased through the use of subvocal speech; it can also convert visual material that is capable of being named into a phonological code (Baddeley, 2004). The visual-spatial sketchpad is the second subsidiary system; it allows for the temporary storage and manipulation of visual and spatial information. In clinical practice, one can see people with selective damage to each of these systems. Those with a central executive disorder are said to have the dysexecutive syndrome (Baddeley, 1986; Baddeley & Wilson, 1988a). Their main difficulties are with planning, organization, divided attention, perseveration, and dealing with new or novel situations (Evans, 2005). Those with phonological loop disorders have difficulties with speech processing and with new language acquisition. The few patients we see with immediate memory disorders, who can repeat back only one or two digits, words, or letters accurately, have a phonological loop disorder. Vallar and Papagno (2002) discuss the phonological loop in detail, whereas Baddeley and Wilson (1988b; Wilson & Baddeley, 1993) describe a patient with a phonological loop deficit and his subsequent recovery. People with visual-spatial sketchpad disorders may have deficits in just one of these functions; in other words, visual and spatial difficulties are separable (Della Sala & Logie, 2002). Although one such patient was described by Wilson, Baddeley, and Young (1999), such patients are rare in clinical practice and are not typical of memory-impaired people or of those referred for rehabilitation.
In 2000, Baddeley added a fourth component to working memory, namely, the episodic buffer, a multimodal temporary interface between the two subsidiary systems and long-term memory. When assessing an amnesic patient’s immediate recall of a prose passage, it is not uncommon to find that the score is in the normal range even though, given the capacity of the short-term store, the recall should be no more than seven words plus or minus two (Miller, 1956). The episodic buffer provides an explanation for the enhanced score because it allows amnesic patients with good intellectual functioning and no executive impairments to show apparently normal immediate memory for prose passages that would far exceed the capacity of either of the subsidiary systems. Delayed memory for people with amnesia is, of course, still severely impaired.
Episodic Memory
Episodic memory is autobiographical in nature. Tulving (2002) states that it is “about happenings in particular places at particular times, or about ‘what,’ ‘where,’ and ‘when,’” and that it “makes possible mental time travel,” in the sense that it allows one to relive previous experiences (p. 3). Broadly speaking, episodic memory is supported by the medial temporal lobes, although there is continuing debate regarding the specific roles of the hippocampus and the parahippocampal, entorhinal, and perirhinal cortices in supporting different aspects of episodic memory. As would be expected, impairment of episodic memory is common following damage (p. 476) to the temporal lobes, and it is the primary feature of Alzheimer’s disease. Tulving and colleagues have also reported a case of relatively specific and near-total episodic memory loss. The patient K.C. suffered a closed head injury that resulted in widespread and extensive brain damage, including damage to the medial temporal lobes. Subsequently, K.C. was unable to remember any events from his life before the injury and was unable to retain “new” events for more than a few minutes. In contrast, he retained semantic memory for personal details such as addresses and school names, although he could not create new personal semantic memories (see Tulving, 2002, for a summary).
In understanding disorders of episodic memory, modality and material specificity are important considerations to bear in mind. In everyday life, we may be required to remember information and events from several modalities, including things we have seen, heard, smelled, touched, and tasted. In memory research, however, the main area of concern is memory for auditory or visual information. In addition to the modality in which material is presented, we should also consider the material itself, that is, whether it is verbal or nonverbal. Not only can we remember things that we can label or read (verbal material), we can also remember things that we cannot easily verbalize, such as a person’s face (visual material). Because different parts of the brain are responsible for visual and verbal processing, these can be independently affected, with some people having a primary difficulty with nonverbal memory and others with verbal memory. This was demonstrated by Milner many years ago (1965, 1968, 1971), when she found that removal of the left temporal lobe resulted in verbal memory deficits, whereas removal of the right temporal lobe led to more difficulties with remembering nonverbal material, such as faces, patterns, and mazes. If memory for one kind of material is less affected, it might be possible to capitalize on the less damaged system in order to compensate for the damaged one. People with amnesic syndrome have problems with both systems. This does not mean that they cannot benefit from mnemonics and other memory strategies, as we will see later in this chapter.
Recall and recognition are two other episodic memory processes that may be differentially affected in people with neurological disorders. Differences between recall and recognition may be attributable to their differential reliance on recollection (having a distinct and specific memory of the relevant episode) and familiarity (a feeling of knowing in relation to the episode), and there has been considerable debate about whether familiarity and recollection are separable within the medial temporal lobes. Diana et al. (2007) reviewed the functional imaging literature on this topic and concluded that recollection is associated with patterns of increased activation within the hippocampus and posterior parahippocampal gyrus, whereas familiarity is associated with activations in the anterior parahippocampal gyrus (perirhinal cortex).
For most memory-impaired people, recall is believed to be harder than recognition. Kopelman et al. (2007), however, believe this may be due to the way the two processes are measured. They studied patients with hippocampal, medial temporal lobe, more widespread temporal lobe, or frontal pathology. Initially, it looked as if all patients found recall harder than recognition, but when the scores were converted to Z scores, the differences were eliminated. In clinical practice, it is important to measure both visual and verbal recall and recognition, as discussed in the assessment section below.
It is also well established that patients with frontal lobe damage, whose deficits reflect poor strategy application (Kopelman & Stanhope, 1998; Shallice & Burgess, 1991), benefit from cues and are therefore less impaired on recognition tasks than on recall tasks, because recognition does not require them to apply a retrieval strategy. This probably depends on which regions of the frontal lobes are affected. Stuss and Alexander (2005) found that different areas of the frontal lobes were concerned with different memory processes and that some regions were involved with nonstrategic memory encoding. A later study by McDonald et al. (2006) found that only patients with right frontal (and not those with left frontal) lesions had recognition memory deficits. Fletcher and Henson (2001) reviewed functional imaging studies of frontal lobe contributions to memory and concluded that ventrolateral frontal regions are associated with updating or maintenance in memory; dorsolateral areas with selection, manipulation, and monitoring; and anterior areas with the selection of processes or subgoals.
Semantic Memory
We all have a very large store of information about what things mean, look like, sound like, smell like, and feel like, and we do not need to remember when this information was acquired or who was present at the time. Learning typically takes place over many occasions in a variety of circumstances. Most memory-impaired people have a normal (p. 477) semantic memory, at least for information acquired before the onset of the memory impairment. People with a pure amnesic syndrome have little difficulty recalling from their semantic memory store information laid down well before the onset of the amnesia, but they may have great difficulty laying down new semantic information. This is because initially one has to depend on episodic memory in order for information to enter the semantic store (Cermak & O’Connor, 1983).
The group most likely to show problems with semantic memory includes those with progressive semantic dementia (Snowden, 2002; Snowden, Neary, Mann, et al., 1992; Warrington, 1975). This condition is characterized by a “selective degradation of core semantic knowledge, affecting all types of concept, irrespective of the modality of testing” (Lambon Ralph & Patterson, 2008, p. 61). Episodic memory in these people may be less affected. Mummery et al. (2000) reported that although several areas of frontal and temporal cortex are affected in semantic dementia, the area most consistently and significantly affected is the left anterior temporal lobe. Furthermore, the degree of atrophy in this region is associated with the degree of semantic impairment—a relationship that does not hold for the other regions affected. Semantic memory difficulties may also be seen in some survivors of nonprogressive brain injury, particularly after encephalitis or anoxia, although people with TBI can also exhibit these problems (Wilson, 1997).
Prospective Memory
Yet another way we can understand memory problems is to distinguish between retrospective memory and prospective memory. The former is memory for past events, incidents, word lists, and other experiences, as discussed above. This can be contrasted with prospective memory, which is remembering to do things rather than remembering things that have already happened. One might have to remember to do something at a particular time (e.g., to watch the news on television at 9:00 p.m.), within a given interval of time (e.g., to take your medication in the next 10 minutes), or when a certain event happens (e.g., when you next see your sister, to give her a message from an old school friend).
Groot, Wilson, Evans, and Watson (2002) showed that prospective memory can fail because of memory failures or because of executive difficulties. Indeed, there is support for the view that measures of executive functioning are better at accounting for prospective memory failures than are measures of memory (Burgess et al., 2000). In a review of the literature, Fish, Manly, and Wilson (2010) suggested a hierarchical model whereby episodic memory problems are likely to lead to prospective memory problems through forgetting of task-related information (e.g., what to do, when to do it, that there is a task to do). However, when retrospective memory functioning is adequate, other, more executive problems can lead to prospective memory failure, for example, those resulting from inadequate monitoring of the self and environment for retrieval cues, failure to initiate the intended activity, or difficulty in applying effective strategies and managing multiple task demands. Considering the multicomponential nature of prospective memory tasks, it is not surprising that problems with such tasks are among the most frequently reported complaints when people are asked what everyday memory problems they face (Baddeley, 2004). (p. 478) In clinical practice, treatment of prospective memory disorders is one of the major components of memory rehabilitation.
Procedural Memory
Procedural memory refers to the abilities involved in learning skills or routines. These abilities are generally tested in the laboratory by asking people to repeatedly perform an unfamiliar perceptual or motor task, such as reading mirror-reversed words or tracing the shape of a star with only the reflection in a mirror for visual feedback. Common real-life examples are learning how to use a computer or to ride a bicycle. The primary characteristic of this kind of learning is that it does not depend on conscious recollection; instead, the learning can be demonstrated without the need to be aware of where and how the original learning took place. For this reason, most people with memory problems show normal or relatively normal procedural learning (Brooks & Baddeley, 1976; Cohen et al., 1985). Some patients are known to show impaired procedural learning, particularly those with Huntington’s disease and those with Parkinson’s disease (Osman et al., 2008; Vakil & Herishanu-Naaman, 1998). People with Alzheimer’s disease may show a deficit (Mitchell & Schmitt, 2006) or may not (Hirono et al., 1997).
Priming
The term priming refers to the processes whereby performance on a given task is improved or biased through prior exposure or experience. Again, no conscious memory of the previous episode is necessary. Thus, if an amnesic patient is shown and reads a list of words before being shown word stems (in the form of the first two or three letters of each word), he or she is likely to respond with the previously studied words even though there is no conscious or explicit memory of seeing them before (Warrington & Weiskrantz, 1968). Further, a double dissociation has been reported between priming and recognition memory. Keane et al. (1995) reported that a patient with bilateral occipital lesions showed impaired priming but intact recognition memory, whereas patient H.M., who had bilateral medial temporal lesions, showed intact priming but impaired recognition memory. Henson (2009), integrating such evidence from neuropsychological studies along with experimental psychological and neuroimaging studies of priming, considers priming to be a form of memory dissociable from declarative memory, one that reflects plasticity in the form of reduced neural activity in areas of the brain relevant to the task in question. For example, in word-reading tasks, such reductions are seen in the left inferior frontal, left inferior temporal, and occipital regions. These reductions are thought to reflect more efficient processing of the stimuli.
Recovery
Recovery means different things to different people. Some focus on survival rates, others are more concerned with recovery of cognitive functions, and others consider only biological recovery, such as repair of brain structures. Some interpret recovery as the exact reinstatement of behaviors disrupted by the brain injury (LeVere, 1980), but this state is rarely achieved by memory-impaired people. Jennett and Bond (1975) define “good recovery” on the Glasgow Outcome Scale as “resumption of normal life even though there may be minor neurological and psychological deficits” (p. 483). This is sometimes achievable for those with organic memory problems (Wilson, 1999). Kolb (1995) says that recovery typically involves partial recovery of lost functioning together with considerable substitution of function, that is, compensating for the lost function through other means. Because this definition includes both natural recovery and compensatory approaches, it is, perhaps, the more satisfactory one for those of us working in rehabilitation.
TBI is the most common cause of brain damage and memory impairment in people younger than 25 years. Some, and often considerable, recovery may be seen in people incurring such injury. This is likely to be fairly rapid in the early weeks and months after injury, followed by a slower recovery that can continue for many years. Those with other kinds of nonprogressive injury, such as stroke (cerebrovascular accident), encephalitis, and hypoxic brain damage, may show a similar pattern of recovery, although (p. 479) this typically lasts for months rather than years. For many people with severe brain damage, recovery will be minimal, and compensatory approaches may provide the best chance of reducing everyday problems, increasing independence, and enhancing quality of life.
Mechanisms of recovery include resolution of edema or swelling of the brain, diaschisis (whereby lesions cause damage to other areas of the brain through shock), plasticity or changes to the structure of the nervous system, and regeneration or regrowth of neural tissue. Changes seen in the first few minutes (e.g., after a mild head injury) probably reflect the resolution of temporary dysfunction that has not caused structural damage. Changes seen within several days are more likely to be due to resolution of temporary structural abnormalities such as edema, vascular disruption, or the depression of enzyme metabolic activity. Recovery after several months or years is less well understood. There are several ways in which it might be achieved, including diaschisis, plasticity, or regeneration. Age at insult, diagnosis, the number of insults sustained by an individual, and the premorbid status of the individual’s brain are other factors that may influence recovery from brain injury. Kolb (2003) provides an overview of plasticity and recovery from brain injury.
So much for general recovery, but what about recovery of memory functions themselves? Although some recovery of lost memory functioning occurs in the early weeks and months following an insult to the brain, many people are left with lifelong memory problems. Published studies show contradictory evidence, with some individuals showing no improvement and others showing considerable improvement. It is clear that we can improve on natural recovery through rehabilitation (Wilson, 2009). Because restoration of episodic memory is unlikely in most cases after the acute period, compensatory approaches are the most likely to lead to changes in everyday memory functioning. Before beginning rehabilitation, however, a thorough assessment is required, and this is addressed in the next section.
Assessment
Some questions can be answered through the use of standardized tests, others need functional or behavioral assessments, and still others may require specially designed procedures. Standardized tests can help us answer questions about the nature of the memory deficit—for example, “Does this person have an episodic memory disorder?” or “Is the memory problem global or restricted to certain kinds of material (e.g., is memory for visual material better than for verbal material)?” They can also help us answer questions about indirect effects on memory functioning, such as “To what extent are the memory problems due to executive, language, perceptual, or attention difficulties?” or “Is this person depressed?” If the assessment question is “What are this person’s memory strengths and weaknesses?” we should assess immediate and delayed memory; verbal and visuo-spatial memory; recall and recognition; semantic and episodic memory; explicit and implicit memory; anterograde and retrograde amnesia; and new learning and orientation. Published standardized tests exist for many of these functions, apart from implicit memory and retrograde amnesia, for which one might need to design specific procedures (see Wilson, 2004, for a fuller discussion of the assessment of memory). Of course, memory assessments should not be carried out alone: it will also be necessary to assess general intellectual functioning; predict premorbid ability; assess language, reading, perception, attention, and executive functioning to get a clear picture of the person’s cognitive strengths and weaknesses; and assess anxiety, depression, and perhaps other areas of emotion such as post-traumatic stress disorder. These assessments are needed for rehabilitation because the emotional consequences of memory impairment should be treated alongside the memory and other cognitive consequences of any insult to the brain (Williams & Evans, 2003).
When we want to answer more treatment-related questions, such as “How are the memory difficulties manifested in everyday life?” or “What coping strategies are used?” we need a more functional or behavioral approach through observations, self-report measures (from relatives or caregivers as well as (p. 480) from the memory-impaired person), or interviews, because these are better suited to addressing real-life, practical problems. The standardized and functional assessment procedures provide complementary information: The former allow us to build up a cognitive map of a person’s strengths and weaknesses, whereas the latter enable us to target areas for treatment.
Rehabilitation of Memory
Rehabilitation should focus on improving aspects of everyday life and should address personally meaningful themes, activities, settings, and interactions (Ylvisaker & Feeney, 2000). Many survivors of brain injury will face problems in everyday life. These could be difficulties with motor functioning, impaired sensation, reduced cognitive skills, emotional troubles, conduct or behavioral problems, and impoverished social relationships. Some people will have all of these. In addition to emotional and psychosocial problems, neuropsychologists in rehabilitation are likely to treat cognitive difficulties, including memory deficits (Wilson et al., 2009). The main purposes of rehabilitation, including memory rehabilitation, are to enable people with disabilities to achieve their optimal level of well-being, to reduce the impact of their problems on everyday life, and to help them return to their own most appropriate environments. Its purpose is not to teach individuals to score better on tests, to learn lists of words, or to be faster at detecting stimuli. Apart from some emerging evidence that it might be possible to achieve some degree of restoration of working memory in children with attention deficit hyperactivity disorder (Klingberg, 2006; Klingberg et al., 2005) and possibly in stroke patients (Westerberg et al., 2007), no evidence exists for recovery of episodic memory in survivors of brain injury. Thus restoration of memory functioning is, at this time, an unrealistic goal. There is evidence, however, that we can help people to compensate for their difficulties, find ways to help them learn more efficiently, and, for those with very severe and widespread cognitive problems, organize the environment so that they can function without the need for memory (Wilson, 2009). These are more realistic goals for memory rehabilitation.
One series of studies carried out in Cambridge, England, involves the use of a paging system, “NeuroPage,” to determine whether people can carry out everyday tasks more efficiently with a pager than without one. Following a pilot study (Wilson et al., 1997), a randomized control trial (crossover design) was carried out. People were randomly allocated to pager first (Group A) or waiting list first (Group B). All participants chose their own target behaviors that they needed to remember each day. Taking medication, feeding pets, and turning on the hot water system were the kinds of target behavior selected. Most people selected between four and seven messages each day. For 2 weeks (the baseline period), participants recorded the frequency with which these targets were achieved each day; an independent observer (usually a close relative) also checked to ensure the information was accurate. In the baseline period, there were no significant differences between the two groups. Group A participants were then provided with pagers for 7 weeks, while Group B participants remained on the waiting list. There was a highly significant improvement in the targets achieved by Group A; Group B (on the waiting list) did not change. Then Group A returned the pagers, which were given to Group B. Now Group B showed a statistically significant improvement over the baseline and waiting-list periods. Group A participants dropped back a little but were still better than at baseline, showing that they had learned many of their target behaviors during the 7 weeks with the pager (Wilson et al., 2001). (p. 481) This study comprised several diagnostic groups, which have been reported separately. People with TBI performed like the main group (Wilson et al., 2005), which is not surprising because they were the largest subgroup. Four people with encephalitis all improved with the pager (Emslie et al., 2007). People with stroke performed like the main group in the baseline and treatment phases but dropped back to baseline levels when the pagers were returned (Fish et al., 2008), possibly because they were older and had more executive deficits as a result of ruptured aneurysms of the anterior communicating artery. There were twelve children in the study, with ages ranging from 8 to 18 years (Wilson et al., 2009), all of whom benefited. Approximately 80 percent of the 143 people in the study reduced their everyday memory and planning problems. As a result of the study, the local health authority set up a clinical service for people throughout the United Kingdom—so this was an example of research influencing clinical practice.
Spaced retrieval, also known as expanding rehearsal (Landauer & Bjork, 1978), is also widely used in memory rehabilitation. This method involves the presentation of material to be remembered, followed by immediate testing and then a very gradual lengthening of the retention interval. Spaced retrieval may work because it is a form of distributed practice, that is, distributing the learning trials over a period of time rather than massing them together in one block. Distributed practice is known to be more effective than massed practice (Baddeley, 1999). Camp and colleagues in the United States have used this method extensively with dementia patients (Camp et al., 1996, 2000; McKitrick & Camp, 1993), but it has also been used to help people with TBI, stroke, encephalitis, and dementia. Sohlberg (2005) discusses using this method with people with learning difficulties.
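The expanding-rehearsal schedule can be sketched in a few lines (the starting interval and doubling factor below are arbitrary illustrative choices, not values prescribed by Landauer and Bjork): the material is tested almost immediately, and each delay before the next test is longer than the last.

```python
def expanding_schedule(first_interval=15, factor=2, n_trials=6):
    """Generate expanding retention intervals (in seconds) for spaced
    retrieval: test almost immediately, then gradually lengthen the
    delay before each subsequent retrieval attempt."""
    intervals, delay = [], first_interval
    for _ in range(n_trials):
        intervals.append(delay)
        delay *= factor
    return intervals

# Test after 15 s, then 30 s, 60 s, and so on. In practice, a failed
# retrieval would drop the delay back to the last successful interval.
print(expanding_schedule())  # [15, 30, 60, 120, 240, 480]
```

Because the trials are spread over progressively longer delays rather than massed in one block, the schedule is one concrete form of distributed practice.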
Those of us without severe memory difficulties can benefit from trial-and-error learning. We are able to remember our mistakes and thus can avoid making the same mistake in future attempts. Because memory-impaired people have difficulty with this, any erroneous response may be strengthened or reinforced. This is the rationale behind errorless learning, a teaching technique whereby the likelihood of mistakes during learning is minimized as far as possible. Another way of understanding errorless learning is through the principle of Hebbian plasticity and learning (Hebb, 1949). At a synaptic level, Hebbian plasticity refers to increases in synaptic strength between neurons that fire together (“neurons that fire together wire together”). Hebbian learning refers to the detection of temporally related inputs. If an input elicits a pattern of neural activity, then, according to the Hebbian learning rule, the tendency to activate the same pattern on subsequent occasions is strengthened. This means that the likelihood of making the same response in the future, whether correct or incorrect, is increased (McClelland et al., 1999). Like implicit memory, Hebbian learning has no mechanism for filtering out errors.
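A minimal sketch of the Hebbian rule makes this point concrete (the activity values and learning rate below are illustrative, not drawn from any specific model): the weight change depends only on the product of pre- and postsynaptic activity, so nothing in the update distinguishes a correct response from an error.

```python
def hebbian_update(weights, pre, post, lr=0.1):
    """Hebbian rule: each weight change is proportional to the product
    of pre- and postsynaptic activity. The rule contains no error term,
    so whatever pattern fires is strengthened, right or wrong."""
    return [w + lr * p * post for w, p in zip(weights, pre)]

weights = [0.0, 0.0, 0.0]
cue = [1.0, 1.0, 0.0]  # hypothetical input pattern elicited by a cue

# The same update applies whether the elicited response was the target
# word or an erroneous guess:
for _ in range(3):
    weights = hebbian_update(weights, cue, post=1.0)

# The two co-active units are strengthened; the silent unit is unchanged.
print(weights)
```

This is why repeated errors during trial-and-error learning are a problem for amnesic patients: the same strengthening that consolidates correct responses also consolidates mistakes.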
Errors can be avoided by providing spoken or written instructions, guiding someone through a particular task, or modeling the steps of a procedure. There is now considerable evidence that errorless learning is superior to trial-and-error learning for people with severe memory deficits. In a meta-analysis of errorless learning, Kessels and De Haan (2003) found a large and statistically significant effect size for this kind of learning in those with severe memory deficits. The combination of (p. 482) errorless learning and spaced retrieval would appear to be a powerful learning strategy for people with progressive conditions in addition to those with nonprogressive conditions (Wilson, 2009).
There has been some debate in the literature as to whether errorless learning depends on explicit or implicit memory. Baddeley and Wilson (1994) argued that implicit memory was responsible for the efficacy of errorless learning: Amnesic patients had to rely on implicit memory, a system that is poor at eliminating errors (this is not to say that errorless learning is a measure of implicit memory). Nevertheless, there are alternative explanations. For example, the errorless learning advantage could be due to residual explicit memory processes or to a combination of both implicit and explicit systems. Hunkin et al. (1998) argued that it is due entirely to the effects of error prevention on residual explicit memory capacities, and not to implicit memory at all. Specifically, they used errorless and errorful learning protocols similar to those of Baddeley and Wilson (1994) to teach word lists to people with moderate to severe and relatively specific memory impairment. They investigated the errorless learning advantage in a fragment completion task intended to tap implicit memory of the learned words and in a cued-recall task intended to tap explicit memory. The fragment completion task presented participants with two letters from learned words in noninitial positions (e.g., _ _ T _ S _), with the task being to complete the word. The cued-recall condition presented the first two letters of learned words (e.g., A R _ _ _ _). In both of these examples the learned word is “ARTIST.” Hunkin et al. found that the errorless learning advantage was evident only in the cued-recall task, with performance in fragment completion being equivalent between errorless and errorful learning conditions. The authors interpreted this result as meaning that the errorless learning advantage relies on residual explicit memory. Tailby and Haslam (2003) also believe that the benefits of errorless learning are due to residual explicit memory processes, although they do not rule out implicit memory processes altogether. They say the issue is a complex one and that different individuals may rely on different processes. Support for this view can also be found in a paper by Kessels et al. (2005).
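The difference between the two tasks can be expressed as a simple pattern-matching exercise using the example words given above (the matching code itself is only an illustration, not part of either study's procedure): a prior error can fit the cued-recall stimulus but not the fragment.

```python
def matches(word, pattern):
    """True if a word fits a letter pattern such as '_ _ T _ S _',
    where '_' is a wildcard position."""
    slots = pattern.split()
    return len(word) == len(slots) and all(
        s == "_" or s == c for s, c in zip(slots, word)
    )

studied = "ARTIST"
errors = ["ARCHES", "AROUND"]          # hypothetical errors from learning
fragment = "_ _ T _ S _"               # noninitial-letter fragment
cue = "A R _ _ _ _"                    # first-two-letter cued-recall stimulus

# The cued-recall stimulus primes prior errors; the fragment does not:
print([w for w in [studied] + errors if matches(w, cue)])       # all three fit
print([w for w in [studied] + errors if matches(w, fragment)])  # only ARTIST
```

Because the fragment cannot re-evoke the earlier errors, Page et al. argued that it was the wrong kind of task with which to detect implicit memory for prior errors, which is the substance of their challenge below.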
Page et al. (2006) claim, however, that preserved implicit memory in the absence of explicit memory is sufficient to produce the errorless learning advantage. They challenge the conclusions of Hunkin et al. (1998) because the design of their implicit task was such that it was unlikely to be sensitive to implicit memory for prior errors. As is clear from the example above, the stimuli in the fragment completion task used by Hunkin et al. did not prime errors made during learning in the way that the cued-recall stimuli did. To continue with the previous example, if the errors "ARCHES" and "AROUND" had been made during learning, neither would fit the fragment "_ _ T _ S _." If the errorless learning advantage results from avoiding implicit memory of erroneous responses, then no advantage would be expected in this fragment completion task. Furthermore, there was an element of errorful learning in both the errorless and errorful explicit memory conditions. Page et al. also challenge the Tailby and Haslam (2003) paper because it conflates two separate questions: first, is the advantage of errorless learning due to the contribution of implicit memory; and second, is learning under errorless conditions due to implicit memory? Perhaps some people do use both implicit and explicit systems when learning material, but this does not negate the argument that the advantage (i.e., that seen at retrieval) is due to implicit memory, particularly implicit memory for prior errors following errorful learning. Some people with no or very little explicit recall can nevertheless learn under certain conditions, such as errorless learning. For example, the Baddeley and Wilson (1994) study included sixteen very densely amnesic participants with extremely little explicit memory capacity, yet every single one of them showed an errorless over errorful advantage. Page et al. (2006) also found that people with moderate memory impairment, and hence some retained explicit memory, showed no greater advantage from errorless learning than people with very severe memory impairment and very little explicit memory, which again supports the hypothesis that implicit memory is sufficient to produce the advantage.
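The logic of the Hunkin et al. cue formats can be made concrete in a short sketch. This is an illustrative reconstruction, not code from the chapter: the words and cue shapes follow the "ARTIST" example above, and the two matching functions are hypothetical helpers showing why a noninitial-letter fragment cannot re-evoke prior errors while an initial-letter stem can.

```python
def fits_fragment(word: str, fragment: str) -> bool:
    """A word fits a fragment cue if it has the same length and matches
    every revealed letter; '_' positions are unconstrained."""
    if len(word) != len(fragment):
        return False
    return all(f == "_" or f == w for f, w in zip(fragment, word))


def fits_stem(word: str, stem: str) -> bool:
    """A word fits a cued-recall stem if it starts with the revealed letters."""
    return len(word) == len(stem) and word.startswith(stem.rstrip("_"))


target = "ARTIST"
errors = ["ARCHES", "AROUND"]  # errors made during errorful learning

fragment = "__T_S_"  # noninitial letters: only the target fits
stem = "AR____"      # initial letters: the prior errors fit as well

print([w for w in [target] + errors if fits_fragment(w, fragment)])
# → ['ARTIST']
print([w for w in [target] + errors if fits_stem(w, stem)])
# → ['ARTIST', 'ARCHES', 'AROUND']
```

The asymmetry is the point of Page et al.'s critique: the stem cue primes earlier errors alongside the target, whereas the fragment cue cannot, so the fragment task gives implicit memory for errors no opportunity to hurt performance.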
Mnemonics are systems that enable us to remember things more easily and usually refer to internal strategies, such as reciting a rhyme to remember how many days there are in a month, or remembering a sentence such as "My very elderly mother just sat upon a new pin" to recall the order of the planets from the sun (where my stands for Mercury, very for Venus, elderly for Earth, etc.). Although verbal and visual mnemonic systems have been used successfully with memory-impaired people (Wilson, 2009), not everyone can use them. Instead of expecting memory-impaired people to use mnemonics spontaneously, therapists may need to employ them to help their patients (p. 483) achieve faster learning of particular pieces of information, such as names of a
Modifying the Environment for Those with Severe and Widespread Cognitive Deficits
People with very severe memory difficulties and widespread cognitive problems may be unable to learn compensatory strategies and may have major difficulties learning new episodic information. They may nevertheless be able to learn things implicitly. C.W., for example, the musician who has one of the most severe cases of amnesia on record (Wearing, 2005; Wilson et al., 2008), is unable to lay down any new episodic memories but has learned some things implicitly. Thus, if he is asked "Where is the kitchen?" he says he does not know, but if asked whether he would like to go and make himself some coffee, he will go to the kitchen without error. He has implicit but no explicit memory of how to find the kitchen. For such people, our only hope of improving quality of life and giving them as much independence as possible is probably to organize or structure the environment so that they can function without the need for memory. People with severe memory problems may not be handicapped in environments where no demands are made on memory. Thus, if doors, closets, drawers, and storage jars are clearly labeled, if rooms are cleared of dangerous equipment, and if someone appears to remind or accompany the memory-impaired person when it is time to go to the hairdresser or to have a meal, the person may cope reasonably well. Kapur et al. (2004) give other examples. Items can be left by the front door for people who forget to take belongings with them when they leave the house; a message can be left on the mirror in the hallway; and a simple flow chart can be used to help people search likely places when they cannot find a lost belonging (Moffat, 1989). Modifications can also be made to the verbal environment to avoid irritating behavior such as the repetition of a question, story, or joke. It may be possible to identify a "trigger," or antecedent, that elicits the behavior; by eliminating the trigger, one can avoid the repetitious behavior. For example, in response to the question "How are you feeling today?" one young brain-injured man would say, "Just getting over my hangover." If staff simply said "Good morning," however, he replied "Good morning," and the repetitious comments about his supposed hangover were avoided.
Hospitals and nursing homes, in addition to other public spaces such as shopping centers, may use color coding, signs, and other warning systems to reduce the chances of people getting lost. "Smart houses" already exist to help "disable the disabling environment" described by Wilson and Evans (2000). Some of the equipment used in smart houses can be employed to help severely memory-impaired patients survive more easily. For example, "photo phones" are telephones with large buttons (available from services for visually impaired people); each button can be programmed to dial a particular number, and a photograph of the person who owns that number can be pasted onto the button. Thus, if a memory-impaired person wants to call her daughter or her district nurse, she simply presses the right photograph, and the number is dialled automatically.
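The design principle of the photo phone, replacing recall of a number with recognition of a face, can be sketched in a few lines. This is a hypothetical illustration, not a description of any actual device's software; the class, labels, and numbers are invented.

```python
class PhotoPhone:
    """Toy model of a photo phone: each large button carries a photograph
    and is bound to one stored number, so dialling requires only
    recognizing the photo, never recalling the number."""

    def __init__(self) -> None:
        self._buttons: dict[str, str] = {}  # photo label -> phone number

    def program_button(self, photo: str, number: str) -> None:
        """A carer programs a button once; the user never re-enters numbers."""
        self._buttons[photo] = number

    def press(self, photo: str) -> str:
        """Pressing a photograph dials its stored number automatically."""
        return f"Dialling {self._buttons[photo]}"


phone = PhotoPhone()
phone.program_button("daughter", "01223 000000")       # invented numbers
phone.program_button("district nurse", "01223 111111")
print(phone.press("daughter"))  # → Dialling 01223 000000
```

The mapping from recognition to action is the same principle at work in the labeled doors and drawers described above: the environment, not the person, holds the memory.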
Treatment for emotional difficulties includes psychological support for individuals and for groups (Wilson et al., 2009). Individual psychological support is mostly derived from cognitive behavior therapy, which is now very much part of neuropsychological rehabilitation programs, particularly in the United Kingdom (Gracey et al., 2009). Tyerman and King (2004) provide suggestions on how to adapt psychotherapy and cognitive behavior therapy for those with memory problems. Notes, audiotapes and videotapes of sessions, frequent repetitions, mini-reviews, telephone reminders to complete homework tasks, and the use of family members as co-therapists can all help to circumvent the difficulties posed by impaired retention of the therapeutic procedures.
Group therapy can also be of great assistance in reducing anxiety and other emotional difficulties. Memory-impaired people often benefit from interaction with others who have similar problems. (p. 484) Those who fear they are losing their sanity may have their fears allayed by observing others with similar problems. Groups can reduce anxiety and distress; they can instill hope and show patients that they are not alone; and it may be easier to accept advice from peers than from therapists, or easier to use strategies that peers are using than strategies recommended by professional staff (Evans, 2009; Malley et al., 2009; Wilson, 2009).
Conclusions
1. Memory disorders can be classified in a number of ways, including the amount of time for which information is stored, the type of information stored, the type of material to be remembered, the modality employed, the stages involved in the memory process, explicit versus implicit memory, recall versus recognition, retrospective versus prospective memory, and anterograde versus retrograde amnesia.
2. Some recovery of memory functioning can be expected after an insult to the brain, especially in the early days, weeks, and months after nonprogressive damage. Age at insult, diagnosis, the number of insults sustained, and the premorbid status of the individual's brain are just a few of the factors influencing recovery. Some people will be left with lifelong memory impairment. There is no doubt that we can improve on natural recovery through rehabilitation. Given that restoration of episodic, explicit memory is unlikely in most cases after the acute period, compensatory approaches are the most likely to lead to change in everyday memory functioning.
3. Before planning treatment for someone with memory difficulties, a detailed assessment should take place. This should include a formal neuropsychological assessment of all cognitive abilities, including memory, to build up a picture of the person's cognitive strengths and weaknesses. In addition, assessment of emotional and psychosocial functioning should be carried out. Standardized tests should be complemented with observations, interviews, and self-report measures.
4. Once the assessment has been carried out, one can design a rehabilitation program. One of the major ways of helping people with memory problems cope in everyday life is to enable them to compensate through the use of external aids; we can also help them learn more efficiently; and for those who are very severely impaired, we may need to structure or organize the environment to help them function without memory. We can also provide support and psychotherapy to address the emotional consequences of memory impairment.
5. Rehabilitation can help people to compensate for, bypass, or reduce their everyday problems and thus survive more efficiently in their own most appropriate environments. Rehabilitation makes clinical and economic sense and should be widely available to all who need it.
References
Atkinson, R. C., & Shiffrin, R. M. (1971). The control of short-term memory. Scientific American, 225, 82–90.
Baddeley, A. D. (1992). Memory theory and memory therapy. In B. A. Wilson & N. Moffat (Eds.), Clinical management of memory problems (2nd ed., pp. 1–31). London: Chapman & Hall.
Baddeley, A. D. (1997). Human memory: Theory and practice (revised edition). Hove: Psychology Press.
Baddeley, A. D., & Hitch, G. J. (1974). Working memory. In G. H. Bower (Ed.), The psychology of learning and motivation: Advances in research and theory (pp. 47–89). New York: Academic Press.
Baddeley, A. D., & Wilson, B. A. (1988a). Comprehension and working memory: A single case neuropsychological study. Journal of Memory and Language, 27 (5), 479–498.
Baddeley, A. D., & Wilson, B. A. (1988b). Frontal amnesia and the dysexecutive syndrome. Brain and Cognition, 7 (2), 212–230.
Baddeley, A. D., & Wilson, B. A. (1994). When implicit learning fails: Amnesia and the problem of error elimination. Neuropsychologia, 32 (1), 53–68.
Brooks, D. N., & Baddeley, A. D. (1976). What can amnesic patients learn? Neuropsychologia, 14 (1), 111–122.
Burgess, P. W., Veitch, E., de Lacy Costello, A., & Shallice, T. (2000). The cognitive and neuroanatomical correlates of multitasking. Neuropsychologia, 38 (6), 848–863.
Camp, C. J., Bird, M., & Cherry, K. (2000). Retrieval strategies as a rehabilitation (p. 485) aid for cognitive loss in pathological aging. In R. D. Hill, L. Bäckman, & A. Stigsdotter-Neely (Eds.), Cognitive rehabilitation in old age (pp. 224–248). New York: Oxford University Press.
Camp, C. J., Foss, J. W., Stevens, A. B., & O'Hanlon, A. M. (1996). Improving prospective memory performance in persons with Alzheimer's disease. In M. A. Brandimonte, G. O. Einstein, & M. A. McDaniel (Eds.), Prospective memory: Theory and application (pp. 351–367). Mahwah, NJ: Erlbaum.
Cermak, L. S., & O'Connor, M. (1983). The anterograde and retrograde retrieval ability of a patient with amnesia due to encephalitis. Neuropsychologia, 21 (3), 213–234.
Clare, L. (2008). Neuropsychological rehabilitation and people with dementia. Hove, UK: Psychology Press.
Clare, L., Wilson, B. A., Breen, K., & Hodges, J. R. (1999). Errorless learning of face-name associations in early Alzheimer's disease. Neurocase, 5, 37–46.
Clare, L., Wilson, B. A., Carter, G., Breen, K., Gosses, A., & Hodges, J. R. (2000). Intervening with everyday memory problems in dementia of Alzheimer type: An errorless learning approach. Journal of Clinical and Experimental Neuropsychology, 22 (1), 132–146.
Clare, L., & Woods, R. T. (2004). Cognitive training and cognitive rehabilitation for people with early-stage Alzheimer's disease: A review. Neuropsychological Rehabilitation, 14, 385–401.
Cohen, N. J., Eichenbaum, H., Deacedo, B. S., & Corkin, S. (1985). Different memory systems underlying acquisition of procedural and declarative knowledge. Annals of the New York Academy of Sciences, 444, 54–71.
Corkin, S. (2002). What's new with the amnesic patient H.M.? Nature Reviews Neuroscience, 3, 153–160.
Della Sala, S., & Logie, R. H. (2002). Neuropsychological impairments of visual and spatial working memory. In A. D. Baddeley, M. D. Kopelman, & B. A. Wilson (Eds.), The handbook of memory disorders (pp. 271–292). Chichester, UK: John Wiley & Sons.
Diana, R. A., Yonelinas, A. P., & Ranganath, C. (2007). Imaging recollection and familiarity in the medial temporal lobe: A three-component model. Trends in Cognitive Sciences, 11, 379–386.
Emslie, H., Wilson, B. A., Quirk, K., Evans, J., & Watson, P. (2007). Using a paging system in the rehabilitation of encephalitic patients. Neuropsychological Rehabilitation, 17, 567–581.
Evans, J. J. (2009). The cognitive group part two: Memory. In B. A. Wilson, F. Gracey, J. J. Evans, & A. Bateman (Eds.), Neuropsychological rehabilitation: Theory, therapy and outcomes (pp. 98–111). Cambridge, UK: Cambridge University Press.
Evans, J. J., Wilson, B. A., Needham, P., & Brentnall, S. (2003). Who makes good use of memory aids: Results of a survey of 100 people with acquired brain injury. Journal of the International Neuropsychological Society, 9 (6), 925–935.
Fish, J., Manly, T., Emslie, H., Evans, J. J., & Wilson, B. A. (2008). Compensatory strategies for acquired disorders of memory and planning: Differential effects of a paging system for patients with brain injury of traumatic versus cerebrovascular aetiology. Journal of Neurology, Neurosurgery and Psychiatry, 79, 930–935.
Fish, J., Wilson, B. A., & Manly, T. (2010). The assessment and rehabilitation of prospective memory problems in people with neurological disorders: A review. Neuropsychological Rehabilitation, 20, 161–179.
Fletcher, P. C., & Henson, R. N. A. (2001). Frontal lobes and human memory: Insights from functional neuroimaging. Brain, 124, 849–881.
Glisky, E. L., Schacter, D. L., & Tulving, E. (1986). Computer learning by memory-impaired patients: Acquisition and retention of complex knowledge. Neuropsychologia, 24 (3), 313–328.
Gracey, F., Yeates, G., Palmer, S., & Psaila, K. (2009). The psychological support group. In B. A. Wilson, F. Gracey, J. J. Evans, & A. Bateman (Eds.), Neuropsychological rehabilitation: Theory, therapy and outcomes (pp. 123–137). Cambridge, UK: Cambridge University Press.
Graham, K. S., Simons, J. S., Pratt, K. H., Patterson, K., & Hodges, J. R. (2000). Insights from semantic dementia on the relationship between episodic and semantic memory. Neuropsychologia, 38, 313–324.
Groot, Y. C. T., Wilson, B. A., Evans, J. J., & Watson, P. (2002). Prospective memory functioning in people with and without brain injury. Journal of the International Neuropsychological Society, 8 (5), 645–654.
Hirono, N., Mori, E., Ikejiri, Y., Imamura, T., Shimomura, T., Ikeda, M., et al. (1997). Procedural memory in patients with mild Alzheimer's disease. Dementia and Geriatric Cognitive Disorders, 8 (4), 210–216.
Hunkin, N. M., Squires, E. J., Parkin, A. J., & Tidy, J. A. (1998). Are the benefits of errorless learning dependent on implicit memory? Neuropsychologia, 36 (1), 25–36.
Jennett, B., & Bond, M. (1975). Assessment of outcome after severe brain damage. Lancet, 1 (7905), 480–484.
Kapur, N. (1993). Focal retrograde amnesia in neurological disease: A critical review. Cortex, 29 (2), 217–234.
Kapur, N., Glisky, E. L., & Wilson, B. A. (2004). Technological memory aids for people with memory deficits. Neuropsychological Rehabilitation, 14 (1/2), 41–60.
Keane, M. M., Gabrieli, J. D., Mapstone, H. C., Johnson, K. A., & Corkin, S. (1995). Double dissociation of memory capacities after bilateral occipital-lobe or medial temporal-lobe lesions. Brain, 119, 1129–1148.
Kessels, R. P. C., Boekhorst, S. T., & Postma, A. (2005). The contribution of implicit and explicit memory to the effects of errorless learning: A comparison between young and older adults. Journal of the International Neuropsychological Society, 11 (2), 144–151.
Kime, S. K., Lamb, D. G., & Wilson, B. A. (1996). Use of a comprehensive program of external cuing to enhance procedural memory in a patient with dense amnesia. Brain Injury, 10, 17–25.
Klingberg, T., Fernell, E., Olesen, P., Johnson, M., Gustafsson, P., Dahlström, K., Gillberg, C. G., Forssberg, H., & Westerberg, H. (2005). Computerized training of working memory in children with ADHD: A randomized, controlled trial. Journal of the American Academy of Child and Adolescent Psychiatry, 44 (2), 177–186.
Kolb, B. (2003). Overview of cortical plasticity and recovery from brain injury. Physical Medicine and Rehabilitation Clinics of North America, 14 (1), S7–S25.
Kopelman, M. D. (2000). Focal retrograde amnesia and the attribution of causality: An exceptionally critical view. Cognitive Neuropsychology, 17 (7), 585–621.
Kopelman, M. D., Bright, P., Buckman, J., Fradera, A., Yoshimasu, H., Jacobson, C., & Colchester, A. C. F. (2007). Recall and recognition memory in amnesia: Patients with hippocampal, medial temporal, temporal lobe or frontal pathology. Neuropsychologia, 45, 1232–1246.
Kopelman, M. D., & Stanhope, N. (1998). Recall and recognition memory in patients with focal frontal, temporal lobe and diencephalic lesions. Neuropsychologia, 37, 939–958.
Lambon Ralph, M. A., & Patterson, K. (2008). Generalization and differentiation in semantic memory: Insights from semantic dementia. Annals of the New York Academy of Sciences, 1124, 61–76.
Landauer, T. K., & Bjork, R. A. (1978). Optimum rehearsal patterns and name learning. In M. M. Gruneberg, P. Morris, & R. N. Sykes (Eds.), Practical aspects of memory (pp. 625–632). London: Academic Press.
LeVere, T. E. (1980). Recovery of function after brain damage: A theory of the behavioral deficit. Physiological Psychology, 8, 297–308.
Malley, D., Bateman, A., & Gracey, F. (2009). Practically based project groups. In B. A. Wilson, F. Gracey, J. J. Evans, & A. Bateman (Eds.), Neuropsychological rehabilitation: Theory, therapy and outcomes (pp. 164–180). Cambridge, UK: Cambridge University Press.
McClelland, J. L., Thomas, A. G., McCandliss, B. D., & Fiez, J. A. (1999). Understanding failures of learning: Hebbian learning, competition for representational space, and some preliminary experimental data. Progress in Brain Research, 121, 75–80.
McDonald, C., Bauer, R., Filoteo, J., Grande, L., Roper, S., & Gilmore, R. (2006). Episodic memory in patients with focal frontal lobe lesions. Cortex, 42, 1080–1092.
McKitrick, L. A., & Camp, C. J. (1993). Relearning the names of things: The spaced-retrieval intervention implemented by a caregiver. Clinical Gerontologist, 14 (2), 60–62.
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63 (2), 81–97.
Milner, B. (1968). Visual recognition and recall after right temporal lobe excision in man. Neuropsychologia, 6, 191–209.
Mitchell, D. B., & Schmitt, F. A. (2006). Short- and long-term implicit memory in aging and Alzheimer's disease. Aging, Neuropsychology and Cognition, 13 (3–4), 611–635.
Mummery, C. J., Patterson, K., Price, C. J., Ashburner, J., Frackowiak, R. S. J., & Hodges, J. R. (2000). A voxel-based morphometry study of semantic dementia: Relationship between temporal lobe atrophy and semantic memory. Annals of Neurology, 47, 36–45.
Osman, M., Wilkinson, L., Beigi, M., Castaneda, C. S., & Jahanshahi, M. (2008). Patients with Parkinson's disease learn to control complex systems via procedural as well as non-procedural learning. Neuropsychologia, 46 (9), 2355–2363.
Page, M., Wilson, B. A., Shiel, A., Carter, G., & Norris, D. (2006). What is the locus of the errorless-learning advantage? Neuropsychologia, 44 (1), 90–100.
Prigatano, G. P., Klonoff, P. S., O'Brien, K. P., Altman, I. M., Amin, K., Chiapello, D., et al. (1994). Productivity after neuropsychologically oriented milieu rehabilitation. Journal of Head Trauma Rehabilitation, 9 (1), 91.
Scherer, M. (2005). Assessing the benefits of using assistive technologies and other supports for thinking, remembering and learning. Disability and Rehabilitation, 27 (13), 731–739.
Shallice, T., & Burgess, P. W. (1991). Higher-order cognitive impairments and frontal lobe lesions in man. In H. S. Levin, H. M. Eisenberg, & A. L. Benton (Eds.), Frontal lobe function and dysfunction (pp. 125–138). New York: Oxford University Press.
Snowden, J. S., Neary, D., Mann, D. M., Goulding, P. J., & Testa, H. J. (1992). Progressive language disorder due to lobar atrophy. Annals of Neurology, 31 (2), 174–183.
Squire, L. R., & Zola-Morgan, S. (1988). Memory: Brain systems and behavior. Trends in Neurosciences, 11 (4), 170–175.
Stilwell, P., Stilwell, J., Hawley, C., & Davies, C. (1999). The national traumatic brain injury study: Assessing outcomes across settings. Neuropsychological Rehabilitation, 9 (3), 277–293.
Stuss, D. T., & Alexander, M. P. (2005). Does damage to the frontal lobes produce impairment in memory? Current Directions in Psychological Science, 14, 84–88.
Sundberg, N. D., & Tyler, L. E. (1962). Clinical psychology. New York: Appleton-Century-Crofts.
Tulving, E. (1995). Organization of memory: Quo vadis? In M. S. Gazzaniga (Ed.), The cognitive neurosciences (pp. 839–847). Cambridge, MA: MIT Press.
Tulving, E. (2002). Episodic memory: From mind to brain. Annual Review of Psychology, 53, 1–25.
Tyerman, A., & King, N. (2004). Interventions for psychological problems after brain injury. In L. H. Goldstein & J. E. McNeil (Eds.), Clinical neuropsychology: A practical guide to assessment and management for clinicians (pp. 385–404). Chichester, UK: John Wiley & Sons.
Warrington, E. K., & Weiskrantz, L. (1968). A study of learning and retention in amnesic patients. Neuropsychologia, 6 (3), 283–291.
Wearing, D. (2005). Forever today: A memoir of love and amnesia. London: Doubleday.
Westerberg, H., Jacobaeus, H., Hirvikoski, T., Clevberger, P., Ostensson, M. L., Bartfai, A., & Klingberg, T. (2007). Computerized working memory training after stroke: A pilot study. Brain Injury, 21 (1), 21–29.
Wilson, B. A. (1991). Long-term prognosis of patients with severe memory disorders. Neuropsychological Rehabilitation, 1 (2), 117–134.
Wilson, B. A. (1997). Semantic memory impairments following non-progressive brain injury: A study of four cases. Brain Injury, 11 (4), 259–270.
Wilson, B. A. (1999). Case studies in neuropsychological rehabilitation. New York: Oxford University Press.
Wilson, B. A. (2009). Memory rehabilitation: Integrating theory and practice. New York: Guilford Press.
Wilson, B. A., & Baddeley, A. D. (1993). Spontaneous recovery of impaired memory span: Does comprehension recover? Cortex, 29 (1), 153–159.
Wilson, B. A., Baddeley, A. D., & Young, A. W. (1999). LE, a person who lost her "mind's eye." Neurocase, 5 (2), 119–127.
Wilson, B. A., Emslie, H. C., Quirk, K., & Evans, J. J. (2001). Reducing everyday memory and planning problems by means of a paging system: A randomised control crossover study. Journal of Neurology, Neurosurgery and Psychiatry, 70 (4), 477–482.
Wilson, B. A., Emslie, H., Quirk, K., Evans, J., & Watson, P. (2005). A randomised control trial to evaluate a paging system for people with traumatic brain injury. Brain Injury, 19, 891–894.
Wilson, B. A., Evans, J. J., Emslie, H., & Malinek, V. (1997). Evaluation of NeuroPage: A new memory aid. Journal of Neurology, Neurosurgery and Psychiatry, 63, 113–115.
Wilson, B. A., Evans, J. J., Gracey, F., & Bateman, A. (2009). Neuropsychological rehabilitation: Theory, therapy and outcomes. Cambridge, UK: Cambridge University Press.
Wilson, B. A., Fish, J., Emslie, H. C., Evans, J. J., Quirk, K., & Watson, P. (2009). The NeuroPage system for children and adolescents with neurological deficits. Developmental Neurorehabilitation, 12, 421–426.
Wilson, B. A., Kopelman, M. D., & Kapur, N. (2008). Prominent and persistent loss of self-awareness in amnesia: Delusion, impaired consciousness or coping strategy? Neuropsychological Rehabilitation, 18, 527–540.
Ylvisaker, M., & Feeney, T. (2000). Reconstruction of identity after brain injury. Brain Impairment, 1 (1), 12–28.
Notes:
(1). The two sensory memory systems most studied are visual (iconic) memory and auditory (echoic) memory. People with disorders of the sensory memory systems would usually be considered to have a visual or an auditory perceptual disorder rather than a memory impairment and thus are beyond the scope of this chapter.
Barbara Wilson
Jessica Fish
Cognitive Neuroscience of Written Language: Neural Substrates of Reading
and Writing
Spelling and reading are evolutionarily relatively new functions, so it is plausible that they are accomplished by engaging neural networks initially devoted to other functions, such as object recognition in the case of reading. However, there are unique aspects of these complex tasks, such as the fact that certain words (e.g., "regular" words) never previously encountered can often be read accurately. Furthermore, although spelling in many ways seems to be simply the reverse of reading, the relationship is not quite so simple, at least in English. For example, the spoken word "lead" can be spelled led or lead, but the printed word lead can be pronounced like "led" or like "lead" (rhyming with "feed"). Therefore, there may be some unique areas of the brain devoted to certain components of reading or spelling. This chapter reviews the cognitive processes underlying these tasks, the areas of the brain thought to be necessary for these component processes (e.g., on the basis of individuals who are impaired in each component because of lesions in a particular area), and the areas of the brain engaged in each component on the basis of functional imaging studies showing neural activation associated with a particular type of processing.
Introduction
In this chapter, we tackle the issue of correspondence between the cognitive and neural substrates underlying the processes of reading and spelling. The main questions we explore are how the brain reads and spells and how the cognitive architecture corresponds to the neural one. We focus our discussion on the role of the following areas of the left hemisphere that have been found to be involved in reading and spelling in lesion and functional neuroimaging studies: the inferior frontal gyrus (Brodmann area [BA] 44/45), fusiform gyrus (inferior and mesial part of BA 37), angular gyrus (BA 39), supramarginal gyrus (BA 40), and superior temporal gyrus (BA 22). We also discuss the role of some other areas that seem to be involved only in reading or only in spelling. The particular issues we address are (1) the distinction between areas that are engaged in, versus necessary for, reading and spelling, as shown by functional neuroimaging and lesion studies; (2) the unimodality versus multimodality of each area; and (3) whether areas are dedicated to one or more cognitive processes. The question of common regions for reading and spelling is not the focus of this chapter, but we summarize recent papers that deal with this particular issue (Philipose et al., 2007; Rapcsak, 2007; Rapp & Lipka, 2011). First, we present what is known about the cognitive architecture of reading (recognition/comprehension) and spelling (production), as derived from classic case studies in the tradition of cognitive neuropsychology as well as from relevant computational (nonrepresentational) models, briefly explaining the rationale for postulating each cognitive module or process. Subsequently, we connect what we know about neural processes (p. 492) in reading and writing with the cognitive processes described earlier.
Although both representational and connectionist models represent dynamic systems, especially in their more recent versions, the main difference between them is that in representational accounts, reading and spelling of familiar versus unfamiliar words are accomplished through two distinct routes or mechanisms, whereas in connectionist architectures they are accomplished through the same process. The two routes in the representational account are a lexical route, in which stored (learned) representations of words (phonological, semantic, and orthographic) may be accessed and made available for reading or spelling, and a sublexical route, which converts graphemes (abstract letter identities) to phonemes (speech sounds) in reading and phonemes to graphemes in writing. The lexical route is proposed to be used for reading and spelling irregular and low-frequency words. For example, to read yacht, one would access the orthographic representation (learned spelling) of yacht, the semantic representation (learned meaning) of yacht, and the phonological representation (learned pronunciation) of yacht. The sublexical (orthography-to-phonology conversion) route is proposed to be used for decoding nonwords or first-encountered words, such as unfamiliar proper names. Both routes may interact and operate in parallel; that is, there is likely to be functional interaction between processes and components (Hillis & Caramazza, 1991; Rapp et al., 2002). Hence, sublexical mechanisms (e.g., grapheme-to-phoneme conversion) might contribute (along with semantic information) to accessing stored lexical representations for output.
In a connectionist architecture, on the other hand, both words and nonwords (or unfamiliar words) are read and spelled through the same mechanisms (Seidenberg & McClelland, 1989; Plaut & Shallice, 1993). An important feature of these models is that in order to read or write any word, the system does not need to rely on any previous representation of the word or on any memory trace; instead it relies on the particular algorithm of interaction between phonological, semantic, and orthographic units, and on the frequency with which they are activated together.
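The representational two-route account can be caricatured in a few lines of code. This is a deliberately minimal sketch, not an implementation of any published model: the one-word lexicon and the single-letter-to-sound rules are invented for illustration, and real grapheme-phoneme conversion operates over multi-letter graphemes.

```python
# Toy dual-route reader: a lexical route (lookup of stored word forms) and
# a sublexical route (letter-by-letter grapheme-to-phoneme conversion).

LEXICON = {"yacht": "/jɒt/"}  # stored (learned) pronunciations of familiar words

RULES = {  # naive one-letter-to-one-sound correspondences (illustrative only)
    "y": "j", "a": "æ", "b": "b", "c": "k", "h": "h", "t": "t",
}


def read_aloud(letter_string: str) -> str:
    """Return a pronunciation, preferring the lexical route when available."""
    if letter_string in LEXICON:
        # Lexical route: familiar (including irregular) words.
        return LEXICON[letter_string]
    # Sublexical route: nonwords and first-encountered letter strings.
    return "/" + "".join(RULES.get(ch, ch) for ch in letter_string) + "/"


print(read_aloud("yacht"))  # → /jɒt/   (lexical: rules alone would misfire)
print(read_aloud("bat"))    # → /bæt/   (sublexical: no stored entry needed)
```

The sketch makes the division of labor concrete: irregular words like yacht are read correctly only via the stored entry, while the rule route handles any novel string; in the fuller account described above the two routes also interact and run in parallel rather than being tried in sequence.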
The issue of which architecture better explains the behavioral findings from both normal participants and neurologically impaired patients has generated debate over the past decades, but it has also helped to further our understanding of the cognitive architecture of reading and writing. We do not address that issue here. For heuristic reasons, we adopt a representational model to consider the neural substrates of reading and spelling, but we assume, consistent with computational models, that the component processes are interactive and operate in parallel. We also assume, as in computational models, that the representations are distributed and overlapping. That is, the phonological representation of "here" overlaps considerably with the phonological representations of "fear" and "heap," and completely with the phonological representation of "hear." Before discussing evidence for the neural substrates of reading and spelling, we (p. 493) sketch out the basic cognitive elements that have been proposed.
Reading Mechanisms
The reading process (see Figure 24.1) begins with access to graphemes from letter
shapes, with specific font, case, and so on. Once individual graphemes are accessed, they
comprise a case-, font-, and orientation-independent graphemic description, which is held
by a temporary working memory component (sometimes called the graphemic buffer).
From here, representational models split processing into the lexical and sublexical
routes. The sublexical route, used to read nonwords or first-encountered letter strings before they acquire a memory trace (orthographic representations), transforms graphemes to phonemes. The lexical mechanism, on the other hand,
consists of three components: the orthographic lexicon, which is a long-term memory
storage of orthographic representations of words that have been encountered before; the
semantic system, which contains the semantic representations (meanings) of these
words; and the phonological lexicon, which is a long-term memory storage of the phono
logical representations of words that have been encountered before. During reading, both
these computational routes end up at the peripheral component of converting phonemes
to motor plans for articulation, so that the word or nonword can be read aloud. As men
tioned earlier, the independence of these components was inferred from lesion studies, in
which patients showed patterns of performance that could be explained by assuming se
lective damage to a single component. Patients with selective deficits at the level of the
orthographic lexicon cannot access lexical orthographic representations; that is, cannot
read or understand familiar words, but they understand the same words when they hear
them (Patterson et al., 1985). Patients with selective deficits at the semantic level, on the
other hand, cannot understand either spoken or written familiar words (Howard et al.,
1984). Patients with selective deficits at the level of the phonological lexicon can under
stand written words but make word selection errors when trying to pronounce them (Bub
& Kertesz, 1982; Hillis & Caramazza, 1990; Rapp et al., 1997). Patients with deficits in
the sublexical grapheme-to-phoneme conversion mechanisms are unable to read pseudo
words or unfamiliar words but can correctly read familiar words (Beauvois & Dérouesné,
1979; Funnell, 1996; Marshall & Newcombe, 1973).
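The dual-route logic just outlined can be sketched as a toy program: familiar words are read via lexical lookup, and unfamiliar strings via grapheme-to-phoneme conversion. The mini lexicons and conversion rules are invented for illustration (with rough IPA-style phoneme symbols), not a claim about any published model such as the DRC.

```python
# Minimal sketch of the dual-route reading architecture described above.
# The tiny lexicons and grapheme-to-phoneme (GPC) rules are invented examples.

ORTHOGRAPHIC_LEXICON = {"yacht", "here"}          # stored written word forms
PHONOLOGICAL_LEXICON = {"yacht": "/jɒt/",         # stored spoken word forms
                        "here": "/hɪə/"}
GPC_RULES = {"y": "j", "a": "æ", "c": "k", "h": "h",
             "t": "t", "e": "ɛ", "r": "r"}        # sublexical GPC rules

def read_aloud(letter_string):
    """Route a letter string through the lexical or sublexical mechanism."""
    graphemes = letter_string.lower()             # graphemic buffer: abstract,
                                                  # case- and font-independent ids
    if graphemes in ORTHOGRAPHIC_LEXICON:
        # Lexical route: orthographic lexicon -> (semantics) -> phonological lexicon
        return PHONOLOGICAL_LEXICON[graphemes]
    # Sublexical route: grapheme-to-phoneme conversion, applied grapheme by grapheme
    return "/" + "".join(GPC_RULES.get(g, "?") for g in graphemes) + "/"

print(read_aloud("YACHT"))   # lexical route: /jɒt/ (irregular word read correctly)
print(read_aloud("yat"))     # sublexical route: /jæt/ (nonword, assembled)
```

The irregular word "yacht" shows why the lexical route is needed: the sublexical rules alone would regularize it, which is exactly the error pattern of patients who are forced to rely on grapheme-to-phoneme conversion.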
Spelling Mechanisms
There are two hypotheses regarding the cognitive computations of spelling: In the first,
they are described as the reverse of reading, and in the second, reading and spelling
share only the semantic system. According to the first account (also called the shared-
components account), the orthographic and phonological lexicons in spelling are the
same ones used in reading, whereas the second account (also called the independent-
components account) posits that phonological and orthographic lexicons used in spelling
are independent of those used in reading. Whether reading and spelling share the same components has long been controversial; the existence of patients with both associations and dissociations between their reading and spelling does not prima facie favor one or the other account. The question is discussed in detail by Hillis and Rapp (2004) for each mechanism: the semantic system, the orthographic lexicon, the sublexical mechanism, the
graphemic buffer, and the representation of letter shapes.
In either case, spelling to dictation starts off with a peripheral phonemic analysis system
that distinguishes the sounds of one’s own language from other languages or environmen
tal sounds, by matching the sounds to learned phonemes. Then, there are again perhaps
two routes: the lexical route and the sublexical route. In the lexical route, the sequence of
phonemes accesses the phonological lexicon, the repository of stored representations of
spoken words, so that the heard word is recognized. The phonological representation
then accesses the semantic system that evokes the meaning of the word. Finally, the
stored orthographic representation of the word is accessed from the orthographic lexi
con. In the sublexical route, the phoneme-to-grapheme conversion mechanism allows for
the heard word or nonword to be transformed to a sequence of corresponding
graphemes. Both routes, then, end up at the graphemic buffer, a temporary storage
mechanism for sequences of graphemes, which “holds” the position and identity of letters
while each letter is written or spelled aloud. Finally, graphemes (abstract letter identities)
are converted to letter shapes, with specific case, font, and so on, by evoking motor
schemes for producing the letter.
We begin with the fusiform gyrus (BA 37), then continue with the inferior frontal gyrus (IFG) and in particular BA 44/45, as well as
other perisylvian areas, such as the superior temporal gyrus (BA 22); then we discuss the
role of more controversial areas such as the angular gyrus (BA 39) and the supramarginal
gyrus/sulcus (BA 40). We end this discussion with areas more peripheral to the reading and writing process, such as premotor BA 6 (including so-called Exner’s area). The general locations of these areas are shown in Figure 24.2.
Because we will discuss brain areas involved in reading and spelling, it is important to
clarify the unique contribution of each type of study, hence the distinction between areas
“necessary for” and those “engaged in” a given cognitive process. Lesion and functional
neuroimaging studies answer different types of questions regarding the relationship be
tween brain areas and cognitive processes. The main question asked in lesion studies is
whether a brain area is necessary for a certain type of processing. If an area is necessary
for a particular aspect of reading or spelling, then a patient with damage to that area will
not be able to perform the particular type of processing. However, areas that appear to be
activated in functional neuroimaging studies in normal subjects may well be involved in,
but are not always necessary for, the particular type of processing. In the following sec
tions we discuss some brain areas found to be involved in reading and writing in a variety
of studies such as lesion-deficit correlation studies, functional neuroimaging studies with
normal control subjects, and functional neuroimaging studies with patients.
The fusiform gyrus (part of BA 37) is the area most strongly associated with written language, and it has dominated both the neuropsychological and the neuroimaging literature more than any other region. In this section we summarize and update the available evi
dence from both the lesion-deficit and the functional neuroimaging literature. We then
discuss possible interpretations of the role of the fusiform in reading and spelling.
Several lesion-deficit studies have shown that dysfunction or lesion at the left fusiform
gyrus or at the left inferior-temporal gyrus (lateral BA 37) causes disrupted access to or
thographic representations in oral reading or spelling in both acute and chronic stroke
patients (Gaillard et al., 2006; Hillis et al., 2001, 2004; Patterson & Kay, 1982; Philipose et
al., 2007; Rapcsak & Beeson, 2004; Rapcsak et al., 1990; Tsapkini & Rapp, 2010). In par
ticular, Rapcsak and Beeson (2004) examined eight patients after chronic stroke that had
caused a lesion in BA 37 and BA 20 (mid- and anterior fusiform gyrus) who showed read
ing and writing impairments. The importance of this region in oral reading of words and
pseudowords as well as in oral and written naming was also confirmed in acute lesion
studies (Hillis et al., 2004; Philipose et al., 2007). Philipose and colleagues (2007)
reported deficits in oral reading and spelling of words and pseudowords in sixty-nine cas
es of acute stroke. Their analyses showed that dysfunction of BA 37 (and BA 40, as we
discuss later), as demonstrated by hypoperfusion, was strongly correlated with impair
ments of oral reading and spelling of both words and pseudowords. In general, lesion
studies, whether they refer to reading (p. 495) only or spelling only (Hillis et al., 2002; Pat
terson & Kay, 1982; Rapcsak et al., 1990), or both (Hillis et al., 2005; Philipose et al.,
2007; Rapcsak et al., 2004; Tsapkini & Rapp, 2009), confirmed that dysfunction in left BA
37 results in impairment in written language output (oral reading and spelling), indicat
ing that function in this area is necessary for some aspect of these tasks.
However, in most lesion-deficit studies the lesions are large, and in most (but not all) cas
es, there are other areas that are dysfunctional that might contribute to the deficit. Two
recent studies (Gaillard et al., 2006; Tsapkini & Rapp, 2009) address this issue by describ
ing patients with focal lesions that do not involve extended posterior visual areas known
to be responsible for general visual processing, and they offer additional evidence of the
importance of the fusiform gyrus in reading and spelling. Gaillard and colleagues (2006)
reported on a patient who exhibited impairment in reading but not spelling after a focal
lesion to the posterior portion of the left fusiform gyrus. The deficit in this patient was in
terpreted as resulting from a disconnection between the early reading areas of the poste
rior fusiform and abstract orthographic representations whose access might require mid-
fusiform gyrus. Similar “disconnection” accounts of pure alexia (also called letter-by-let
ter reading or alexia without agraphia) after left occipital-temporal or occipital-splenial lesions, usually due to posterior cerebral artery stroke, have been given since the 1880s
(Binder et al., 1992; Cohen et al., 2003; Déjerine, 1891; Epelbaum et al., 2008; Gaillard et
al., 2006; Leff et al., 2006).
In contrast to cases of pure alexia, many studies have indicated that damage in the mid-
fusiform results in impairments in both oral reading and spelling (Hillis et al., 2005; Phili
pose et al., 2007; Rapcsak et al., 2004; Tsapkini & Rapp, 2010). For example, in a case re
port of a patient with a focal lesion in this area after resection of the more anterior mid-
fusiform gyrus, Tsapkini and Rapp (2009) showed that orthographic representations as
well as access to word meanings from print were disrupted for both reading and spelling.
Furthermore, except for the impairment in orthographic processing of words (but not of
nonwords), there was no impairment in other types of visual processing such as the pro
cessing of faces and objects. It is striking that the localization of reading and spelling is
the same in all studies in which both reading and spelling are affected (Hillis et al., 2005;
Philipose et al., 2007; Rapcsak et al., 2004; Tsapkini & Rapp, 2009). The above findings
lend support to the claim that the mid-fusiform gyrus is a necessary area for orthographic
processing (but see Price & Devlin, 2003, for a different view, as discussed later).
Functional neuroimaging studies have also found that reading words and nonwords, or
simply viewing letters relative to other visual stimuli, activates the left (greater than
right) mid-fusiform gyri. The area of activation is remarkably consistent across studies
and across printed stimuli, irrespective of location, font, or size (Cohen et al., 2000, 2002;
Dehaene et al., 2001, 2002; Gros et al., 2001; Polk et al., 2002; Polk & Farah, 2002; Price
et al., 1996; Puce et al., 1996; Uchida et al., 1999; see also Cohen & Dehaene, 2004, for
review). Some investigators have taken these data as evidence that the left mid-fusiform
gyrus has a specialized role in computing location- and font-independent visual word
forms (Cohen et al., 2003; McCandliss et al., 2003). This area has been labeled the visual
word form area (VWFA; Cohen et al., 2000, 2002, 2004a, 2004b). However, based on acti
vation of the same or a nearby area during a variety of nonreading lexical tasks, and on
the fact that lesions in this area are associated with a variety of lexical deficits, other in
vestigators have objected to the label. These opponents of the label of VWFA have pro
posed that the left mid-fusiform gyrus has a specialized role in modality-independent lexi
cal processing, rather than in reading alone (Büchel, Price, & Friston, 1998; Price et al.,
2003; Price & Devlin, 2003, 2004; see below for discussion).
Recently there have been further specifications of the VWFA proposal. Findings suggest a hierarchical organization of the fusiform gyrus in which word recognition proceeds in a posterior-to-anterior direction along the occipital-temporal cortex, tuned progressively to more specific elements, from elementary visual features to letters, bigrams, and finally whole words (Cohen et al., 2008; Dehaene et al., 2005; Vinckier et al., 2007). This proposal is accompanied by a related claim about the experiential tuning of the occipital-temporal cortex. Because reading and writing are very recent evolutionary skills, it would be hard to argue for an area “set aside” for orthographic processing only. The proposal is therefore that reading expertise develops through experience and fine-tunes neural tissue originally devoted to detailed visual processing (Cohen et al., 2008; McCandliss et al., 2003). Supporting evidence comes from the developmental literature, including Aghababian and Nazir (2000), who reported (p. 496) increases in reading accuracy and speed with age and maturation. This experiential tuning of the occipital-temporal cortex may also be the reason we see different subdivisions of the
fusiform involved in word versus pseudoword reading, or even in the frequency effects
observed (Binder et al., 2006; Bruno et al., 2008; Dehaene et al., 2005; Glezer et al., 2009;
Kronbichler et al., 2004, 2007; Mechelli et al., 2003). This familiarity effect in the fusiform
has also been found for writing (Booth et al., 2003; Norton et al., 2007), confirming a role
for the fusiform in both reading and spelling. Recent functional neuroimaging findings of
reading and spelling in the same individuals confirmed that the mid-fusiform is more sen
sitive to words than letter strings and to low- than high-frequency words, showing the
sensitivity of this area to lexical orthography for both reading and spelling (Rapp & Lipka,
2011).
A different interpretation of these results has been offered. Price and Devlin (2003) have
rigorously questioned the notion that this area is dedicated to orthographic processing
only, by arguing that this area is also found to be activated in hearing or speaking (Price
et al., 1996, 2005, 2006), undermining any claims about orthographic specificity. Other
studies have shown that the VWFA is activated when subjects process nonorthographic vi
sual patterns, and investigators have claimed that this area is not dedicated to ortho
graphic processing per se but rather to complex visual processing (Ben-Shachar et al.,
2007; Joseph et al., 2006; Starrfelt & Gerlach, 2007; Wright et al., 2008). Others have
found that this area may selectively process orthographic forms, but not exclusively (Bak
er et al., 2007; Pernet et al., 2005; Polk et al., 2002).
To evaluate the proposed roles of the mid-fusiform gyrus in orthographic processing ver
sus more general lexical processing, Hillis et al. (2005) studied eighty patients with acute,
left-hemisphere ischemic stroke on two reading tasks that require access to a visual word
form (or orthographic lexicon) but do not require lexical output—written lexical decision
and written word–picture verification. Patients were also administered other lexical tasks,
including oral and written naming of pictures, oral naming of objects from tactile explo
ration, oral reading, spelling to dictation, and spoken word–picture verification. Patients
underwent magnetic resonance imaging, including diffusion- and perfusion-weighted
imaging, the same day. It was argued that if left mid-fusiform gyrus were critical to ac
cessing visual word forms, then damage or dysfunction of this area would, at least at the
onset of stroke (before reorganization and recovery), reliably cause impaired perfor
mance on written lexical decision and written word–picture verification.
However, there was no significant association between damage or dysfunction of this re
gion and impaired written lexical decision or written word–picture verification, indicating
that left mid-fusiform gyrus is not reliably necessary for accessing visual word forms
(Hillis et al., 2005). Of the fifty-three patients who showed infarct or hypoperfusion (se
vere enough to cause dysfunction) involving the left mid-fusiform gyrus, twenty-two sub
jects had intact written word comprehension, and fifteen had intact written lexical deci
sion, indicating that both tasks can be accomplished without function of this region. How
ever, there was a strong association between damage or dysfunction of this area and im
pairment in oral reading (χ2 = 10.8; df1; p = 0.001), spoken picture naming (χ2 = 18.9;
df1; p < 0.0001), spoken object naming to tactile exploration (χ2 = 8.2; df1; p < 0.004);
and written picture naming (χ2 = 13.5; df1; p < 0.0002). These results indicate that struc
tural damage or tissue dysfunction of the left mid-fusiform gyrus was associated with im
paired lexical processing, irrespective of the input or output modality. In this study, the in
farct or hypoperfusion that included the left mid-fusiform gyrus usually extended to adja
cent areas of the left fusiform gyrus (BA 37). Therefore, it is possible that dysfunction of
areas near the left mid-fusiform gyrus, such as the lateral inferior-temporal multimodality
area (LIMA; Cohen, Jobert, Le Bihan, & Dehaene, 2004), was critical to modality-indepen
dent lexical processing. Further support for a critical role of at least a portion of the left
fusiform gyrus in modality-independent lexical output is provided by patients who show
impaired naming (with visual or tactile input) and oral reading when this region is hypop
erfused, and improved naming and oral reading when this region is reperfused (Hillis,
Kane, et al., 2002; Hillis et al., 2005). Previous studies have also reported oral naming
and oral reading deficits associated with lesions to this region (e.g., Foundas et al., 1998;
Hillis, Tuffiash, et al., 2002; Raymer et al., 1997; Sakurai, Sakai, Sakuta, & Iwata, 1994).
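The association statistics quoted above can be sanity-checked with the standard library: for df = 1, a chi-square variable is the square of a standard normal, so its upper-tail p value is erfc(√(χ2/2)). A brief sketch, using the χ2 values reported by Hillis et al. (2005):

```python
from math import erfc, sqrt

def chi2_p_df1(chi2):
    """Upper-tail p value for a chi-square statistic with df = 1.
    A 1-df chi-square variable is Z**2 for standard normal Z, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))."""
    return erfc(sqrt(chi2 / 2.0))

# Chi-square statistics reported for left mid-fusiform damage/dysfunction:
for task, chi2 in [("oral reading", 10.8),
                   ("spoken picture naming", 18.9),
                   ("tactile object naming", 8.2),
                   ("written picture naming", 13.5)]:
    print(f"{task}: chi2 = {chi2}, p = {chi2_p_df1(chi2):.5f}")
```

This reproduces, for example, p ≈ 0.001 for the oral reading association (χ2 = 10.8).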
One possible way to accommodate these various sets of results is the following. Part of
left fusiform gyrus (perhaps lateral to the VWFA identified by Cohen and colleagues) may
be critical for modality-independent lexical processing, rather than a reading-specific
process (Price & Devlin, 2003, 2004); (p. 497) and part of the left or right mid-fusiform
gyrus is essential for computing a location-, font-, and orientation-independent graphemic
description (one meaning of a visual word form). Evidence in support of this latter compo
nent of this account comes from patients with pure alexia (who have a disconnection be
tween visual input to the left mid-fusiform gyrus) (Binder & Mohr, 1992; Chialant & Cara
mazza, 1997; Marsh & Hillis, 2005; Miozzo & Caramazza, 1997; Saffran & Coslett, 1998).
In one case of acute pure alexia, written word recognition was impaired when the left
mid-fusiform gyrus was infarcted and the splenium of the corpus callosum was hypoper
fused, but recovered when the splenium was reperfused (despite persistent damage to
the left mid-fusiform gyrus). These results could be explained by proposing that the pa
tient was able to rely on the right mid-fusiform gyrus to compute case-, font- and orienta
tion-independent graphemic descriptions (when the left was damaged); but these
graphemic descriptions could not be used for reading until reperfusion of the splenium al
lowed information from the right mid-fusiform gyrus to access language cortex for addi
tional components of the reading process (Marsh & Hillis, 2005). This proposal of a criti
cal role of either the left or right mid-fusiform gyrus in computing graphemic descriptions
is consistent with findings of (1) reliable activation of this area in response to written
words, pseudowords, and letters in functional imaging; (2) activation of bilateral mid-
fusiform gyrus in written lexical decision (Fiebach et al., 2002); and (3) electrophysiological evidence of mid-fusiform activity early in the reading process (Salmelin et al., 1996;
Tarkiainen et al., 1999).
In general, there are fewer functional imaging studies of spelling than reading (for re
views and meta-analyses, see Jobard et al., 2003; Mechelli et al., 2003; Turkeltaub et al.,
2002). However, four of the six spelling studies show that the left fusiform gyrus is an
area involved in spelling (Beeson et al., 2003; Norton et al., 2007; Rapp & Hsieh, 2002;
Rapp & Lipka, 2011). Spelling might require both computation of a case-, font-, and orien
tation-independent graphemic description (proposed to require right or left mid-fusiform
gyrus) and modality-independent lexical output (proposed to require the lateral part of
left mid-fusiform gyrus). Further evidence of a role for part of the left mid-fusiform gyrus
in modality-independent lexical output is that reading Braille in blind subjects (Büchel et
al., 1998) and sign language in deaf subjects (Price et al., 2005) activates this area.
Figure 24.3 shows the remarkable similarity in the area identified as a key neural sub
strate of reading, spelling, or naming across various lesion and functional imaging stud
ies.
The left posterior IFG is probably the brain area most often implicated in language processing, and in particular in language production, since the 1800s. Damage in
the IFG has also been associated with impairments in fluency (Goodglass et al., 1969),
picture naming (Hecaen & Consoli, 1973; Hillis et al., 2004; Miceli & Caramazza, 1988),
lexical impairment (Goodglass, 1969; Perani & Cappa, 2006), semantic retrieval (Good
glass et al., 1969), and syntactic processing (Grodzinsky, 2000).
There have also been many reports of patients with impairments in reading and spelling
after lesions in Broca’s area. IFG lesions are often associated with deficits in reading or
writing nonwords using grapheme-to-phoneme conversion rules, sometimes with accu
rate performance on words. However, in most of these cases, lesions in BA 44/45 were ac
companied by damage to other perisylvian areas, such as the insula, precentral gyrus (BA
4/6), and superior temporal gyrus or Wernicke’s area (BA 22) (Coltheart, 1980; Fiez et al.,
2006; Henry et al., 2005; Rapcsak et al., 2002, 2009; Roeltgen et al., 1983). Moreover, in the chronic lesion studies no area could be identified as critical by itself for nonword reading or spelling, in the absence of damage to the surrounding areas (Rapcsak, 2007).
Studies have shown that a lesion or tissue dysfunction of this area alone may result in
deficits in spelling (see Hillis et al., 2002, for review). Many patients with relatively local
ized lesions involving Broca’s area have pure agraphia, with impaired written naming
(particularly of verbs compared with nouns), but spared oral naming of verbs and nouns,
accompanied by impaired sublexical mechanisms for converting phonemes to graphemes
(Hillis, Chang, & Breese, 2004; see also Hillis, Rapp, & Caramazza, 1999). Additional evi
dence that the IFG is crucial for written naming of verbs comes from the use of reperfu
sion in acute stroke (Hillis et al., 2002). For example, Hillis and colleagues found that im
pairment in accessing orthographic representations (particularly verbs) with relatively
spared oral naming and word meaning was associated with dysfunction (hypoperfusion)
of Broca’s area. Furthermore, (p. 498) reperfusion of Broca’s area resulted in recovery of
written naming of verbs. Thus, Broca’s area seems to be critical at least for writing verbs
(in the normal state, before reorganization).
The reason we do not always see spelling or reading deficits caused by a lesion to the IFG
in chronic stroke patients may be that the brain reorganizes so that the contribution of
this area in spelling can be taken over by other areas over time (Price & Friston, 2002).
Functional neuroimaging studies have been employed to assess whether the IFG is en
gaged in reading and spelling. Many functional neuroimaging studies of reading and
spelling have found IFG involvement, even when no overt speech production is required
(Bolger et al., 2005; Fiez & Petersen, 1998; Jobard et al. 2003; Mechelli et al., 2003;
Paulesu et al., 2000; Price 2000; Rapp & Lipka, 2011 [review]; Turkeltaub, 2002). Howev
er, not all studies have shown IFG activation during spelling. One functional neuroimag
ing study reported frontal activation in written naming versus oral naming, but the activa
tion did not clearly involve Broca’s area (Katanoda et al., 2001). Another study did report
the activation of Broca’s area in written naming, but this activation did not survive the
comparison to oral naming (Beeson et al., 2003). Nevertheless, both Beeson and col
leagues (2003) and Rapp and Hsieh (2002) found spelling-specific activation in the left
IFG. Therefore, it is hard to identify the exact role of the left IFG in reading or spelling; subparts of this area may be necessary for, or engaged in, distinct cognitive
processes underlying reading or spelling, such as lexical retrieval or selection (Price,
2000; Thompson-Schill et al., 1999); phonological processing (Pugh et al., 1996), or re
trieval or selection from the orthographic lexicon specifically (Hillis et al., 2002). The de
termination of the role of Broca’s area in reading and writing becomes even more compli
cated if one considers the many other language and nonlanguage functions that have
been attributed to this area and that may be required for reading and writing, such as
working memory (Smith & Jonides, 1999), phonological memory (Paulesu et al., 1993), syntactic pro
cessing (Thompson et al., 2007), cognitive control and task updating (Brass (p. 499) & von
Cramon, 2002, 2004; Derrfuss et al., 2005; Thompson-Schill et al., 1999), and semantic
encoding (Mummery et al., 1996; Paulesu et al., 1997; Poldrack et al., 1999).
er is sufficient to impair the neural network necessary for reading and spelling both
words and nonwords. The reason chronic lesions in the supramarginal gyrus do not often result in deficits in reading or spelling may be that other areas successfully assume its function in these networks.
The supramarginal gyrus (BA 40) has also been found to be involved in both reading and
spelling in the functional neuroimaging literature for both words and nonwords (Booth et
al., 2002, 2003; Corina et al., 2003; Law et al., 1991; Mummery et al., 1998; Price, 1998).
In these studies, the supramarginal gyrus was implicated in sublexical conversion either
from graphemes to phonemes or from phonemes to graphemes. However, in these studies
clusters at the supramarginal gyrus during sublexical conversion were always accompa
nied by clusters at the frontal operculum in the IFG, as Jobard and colleagues (2003) note
in their extensive meta-analysis. Some researchers have claimed that this set of brain re
gions sustains phonological working memory and serves as a temporary phonological
store (Becker et al., 1999; Fiez et al., 1996, 1997; Paulesu et al., 1993). Thus, it is plausi
ble that this region is activated in reading and writing because the sequential computa
tions involved in sublexical conversion must be held in working memory as the word is
written or spoken.
large area, and because even small brain areas may be functionally specialized, it is conceivable that it contains smaller subregions with different functions. Furthermore, in chronic stroke patients, other areas may take over one function but not the other, depending on individual neuronal connectivity, practice, or other unknown factors.
Functional neuroimaging studies of normal adults have been inconsistent in finding evi
dence for angular gyrus involvement; that is, some of them did find angular gyrus activa
tion (Rapp & Hsieh, 2002), but others did not (Beeson & Rapcsak, 2002). However, several developmental studies have found that angular gyrus activation is associated with reading of pseudowords more than words, and have proposed that this area may be more important during development (Pugh et al., 2001). The evidence is thus not yet conclusive about the exact contribution of the angular gyrus to reading or spelling, but the area does seem to play an important role in these tasks.
However, functional imaging studies show activation of anterior and middle areas of the
superior temporal gyrus linked to phonological processing or to grapheme-to-phoneme
conversion in reading (Jobard et al., 2003; Price et al., 1996; Wise et al., 1991). Likewise,
an MEG study of adult reading indicated that the middle part of the superior temporal
gyrus was involved in pseudoword but not word reading (Simos et al., 2002).
The observed discrepancy between lesion and functional imaging studies may be due to the fact that the superior temporal gyrus is a large area of cortex, and its posterior part
may well be related to accessing meaning from phonological or orthographic codes as
Wernicke has claimed, whereas its middle and anterior parts may be more dedicated to
phonological processing per se. Alternatively, the middle and anterior parts of the superi
or temporal gyrus may be consistently engaged in all aspects of phonological processing
(including sublexical grapheme-to-phoneme conversion and hearing), but not necessary
for these processes (perhaps because the right superior temporal gyrus is capable of at
least some aspects of phonological processing), so that unilateral lesions do not cause
deficits in these aspects.
Conclusion
The evidence reviewed in this chapter, although controversial and inconclusive on some
points, illustrates the complexity of the cognitive and neural mechanisms underlying
reading and spelling and their relationships to one another. We started (p. 501) this review by taking reading and spelling as test cases for discussing the correspondence between cognitive and neural architecture, only to discover the complexity, though not the ineffability, of this task. One of the most intriguing observations in reviewing
the bulk of the current evidence is that some brain areas seem to be crucial for more than
one language function or component, and some are more specialized. One function may
be subserved by more than one brain area, and a single brain area may subserve more
than one function. Thus, reading and spelling seem to be complex processes consisting of
distinct cognitive subprocesses that seem to rely on overlapping networks of separate
brain regions. Furthermore, the connectivity of this cortical circuitry may be altered and
reorganized over time, owing to practice or therapy (or to new infarcts, atrophy, or resec
tion). Further insights into the plasticity of the human cortex and its dynamic nature are
needed to shed greater light on the cognitive and neural processes underlying reading
and writing.
References
Aghababian, V., & Nazir, T. A. (2000). Developing normal reading skills: Aspects of the vi
sual process underlying word recognition. Journal of Experimental Child Psychology, 76,
123–150.
Alexander, M. P., Friedman, R. B., Loverso, F., & Fischer, R. S. (1992). Lesion localization
in phonological agraphia. Brain and Language, 43, 83–95.
Page 16 of 31
Cognitive Neuroscience of Written Language: Neural Substrates of Reading
and Writing
Anderson, S. W., Damasio, A. R., & Damasio, H. (1990). Troubled letters but not numbers:
Domain specific cognitive impairments following focal damage in frontal cortex. Brain,
113, 749–766.
Baker, C. I., Liu, J., Wald, L. L., et al. (2007). Visual word processing and the experiential
origins of functional selectivity in human extrastriate cortex. Proceedings of the National
Academy of Sciences U S A, 104, 9087–9092.
Beauvois, M. F., & Dérouesné, J. (1979). Phonological alexia: Three dissociations. Journal
of Neurology, Neurosurgery and Psychiatry, 42, 1115–1124.
Becker, J. T., Danean, K., MacAndrew, D. K., & Fiez, J. A. (1999). A comment on the func
tional localization of the phonological storage subsystem of working memory. Brain and
Cognition, 41, 27–38.
Beeson, P. M., & Rapcsak, S. Z. (2002). Clinical diagnosis and treatment of spelling disor
ders. In A. E. Hillis (Ed.), Handbook on adult language disorders: Integrating cognitive
neuropsychology, neurology, and rehabilitation (pp. 101–120). Philadelphia: Psychology
Press.
Beeson, P. M., & Rapcsak, S. Z. (2003). The neural substrates of sublexical spelling (INS
Abstract). Journal of the International Neuropsychological Society, 9, 304.
Beeson, P. M., Rapcsak, S. Z., Plante, E., et al. (2003). The neural substrates of writing: A
functional magnetic resonance imaging study. Aphasiology, 17, 647–665.
Behrmann, M., & Bub, D. (1992). Surface dyslexia and dysgraphia: Dual routes, single
lexicon. Cognitive Neuropsychology, 9, 209–251.
Behrmann, M., Nelson, J., & Sekuler, E. B. (1998). Visual complexity in letter-by-letter
reading: “Pure” alexia is not pure. Neuropsychologia, 36, 1115–1132.
Behrmann, M., Plaut, D. C., & Nelson, J. (1998). A literature review and new data support
ing an interactive account of letter-by-letter reading. Cognitive Neuropsychology, 15, 7–
51.
Ben-Shachar, M., Dougherty, R. F., Deutsch, G. K., & Wandell, B. A. (2007). Differential
sensitivity to words and shapes in ventral occipito-temporal cortex. Cerebral Cortex, 17,
1604–1611.
Benson, D. F. (1979). Aphasia, alexia and agraphia. New York: Churchill Livingstone.
Binder, J. R., McKiernan, K. A., Parsons, M. E., et al. (2003). Neural correlates of lexical
access during visual word recognition. Journal of Cognitive Neuroscience, 15, 372–393.
Binder, J. R., Medler, D. A., Westbury, C. F., et al. (2006). Tuning of the human left
fusiform gyrus to sublexical orthographic structure. NeuroImage, 33, 739–748.
Binder, J. R., & Mohr, J. P. (1992). The topography of callosal reading pathways. Brain,
115, 1807–1826.
Binder, J., & Price, C. J. (2001). Functional neuroimaging of language. In R. Cabeza & A.
Kingstone (Eds.), Handbook of functional neuroimaging of cognition (pp. 187–251). Cam
bridge, MA: MIT Press.
Black, S., & Behrmann, M. (1994). Localization in alexia. In A. Kertesz (Ed.), Localization
and neuroimaging in neuropsychology. San Diego: Academic Press.
Bolger, D. J., Perfetti, C. A., & Schneider, W. (2005). A cross-cultural effect on the brain re
visited. Human Brain Mapping, 25, 92–104.
Booth, J. R., Burman, D. D., Meyer, J. R, et al. (2003). Relation between brain activation
and lexical performance. Human Brain Mapping, 19, 155–169.
Booth, J. R., Burman, D. D., Meyer, J. R., Gitelman, D. R., Parrish, T. B., & Mesulam, M. M.
(2002). Modality independence of word comprehension. Human Brain Mapping, 16, 251–
261.
Borowsky, R., & Besner, D. (1993). Visual word recognition: A multistage activation mod
el. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19, 813–840.
Bowers, J. S., Bub, D., & Arguin, M. A. (1996). Characterization of the word superiority ef
fect in a case of letter-by-letter surface alexia. Cognitive Neuropsychology, 13, 415–441.
Boxer, A. L., Rankin, K. P., Miller, B. L., et al. (2003). Cinguloparietal atrophy distinguish
es Alzheimer’s disease from semantic dementia. Archives of Neurology, 60, 949–956.
Brass, M., & von Cramon, D. Y. (2002). The role of the frontal cortex in task preparation.
Cerebral Cortex, 12, 908–914.
Brass, M., & von Cramon, D. Y. (2004). Decomposing components of task preparation with
functional magnetic resonance imaging. Journal of Cognitive Neuroscience, 16, 609–620.
Bruno, J. L., Zumberge, A., Manis, F. R., et al. (2008). Sensitivity to orthographic famil
iarity in the occipito-temporal region. NeuroImage, 39, 1988–2001.
Bub, D., & Kertesz, A. (1982). Deep agraphia. Brain and Language, 17, 146–165.
Büchel, C., Price, C., Frackowiak, R. S., & Friston, K. (1998). Different activation patterns
in the visual cortex of late and congenitally blind subjects. Brain: A Journal of Neurology,
121, 409–419.
Burton, M. W., LoCasto, P. C., Krebs-Noble, D., & Gullapalli, R. P. (2005). A systematic
investigation of the functional neuroanatomy of auditory and visual phonological pro
cessing. NeuroImage, 26, 647–661.
Chan, D., Fox, N. C., Scahill, R. I., et al. (2001). Patterns of temporal lobe atrophy in se
mantic dementia and Alzheimer’s disease. Annals of Neurology, 49, 433–442.
Chialant, D., & Caramazza, A. (1997). Identity and similarity factors in repetition blind
ness: Implications for lexical processing. Cognition, 63, 79–119.
Cloutman, L., Gingis, L., Newhart, M., Davis, C., Heidler-Gary, J., Crinion, J., & Hillis, A. E.
(2009). A neural network critical for spelling. Annals of Neurology, 66 (2), 249–253.
Cohen, L., & Dehaene, S. (2004). Specialization within the ventral stream: The case for
the visual word form area. NeuroImage, 22, 466–476.
Cohen, L., Dehaene, S., Naccache, L., et al. (2000). The visual word form area: Spatial
and temporal characterization of an initial stage of reading in normal subjects and poste
rior split-brain patients. Brain, 123, 291–307.
Cohen, L., Dehaene, S., Vinckier, F., et al. (2008). Reading normal and degraded words:
Contribution of the dorsal and ventral visual pathways. NeuroImage, 40, 353–366.
Cohen, L., Henry, C., Dehaene, S., et al. (2004). The pathophysiology of letter-by-letter
reading. Neuropsychologia, 42, 1768–1780.
Cohen, L., Jobert, A., Bihan, D. L., & Dehaene, S. (2004). Distinct unimodal and multi
modal regions for word processing in left temporal cortex. NeuroImage, 23, 1256–1270.
Cohen, L., Lehéricy, S., Chochon, F., et al. (2002). Language-specific tuning of visual cor
tex? Functional properties of the Visual Word Form Area. Brain, 125, 1054–1069.
Cohen, L., Martinaud, O., Lemer, C., et al. (2003). Visual word recognition in the left and
right hemispheres: anatomical and functional correlates of peripheral alexias. Cerebral
Cortex, 13, 1313–1333.
Coltheart, M., Patterson, K., & Marshall, J. C. (1980). Deep dyslexia. London: Routledge &
Kegan Paul.
Coltheart, M., Rastle, K., Perry, C., et al. (2001). DRC: A dual route cascaded model of vi
sual word recognition and reading aloud. Psychological Review, 108, 204–256.
Corina, D. P., San Jose-Robertson, L., Guillemin, A., High, J., & Braun, A. R. (2003). Lan
guage lateralization in a bimanual language. Journal of Cognitive Neuroscience, 15, 718–
730.
Crisp, J., & Lambon Ralph, M. A. (2006). Unlocking the nature of the phonological-deep
dyslexia continuum: the keys to reading aloud are in phonology and semantics. Journal of
Cognitive Neuroscience, 18, 348–362.
Damasio, A. R., & Damasio, H. (1983). The anatomic basis of pure alexia. Neurology, 33,
1573–1583.
Dehaene, S., Cohen, L., Sigman, M., & Vinckier, F. (2005). The neural code for written
words: A proposal. Trends in Cognitive Sciences, 9, 335–341.
Dehaene, S., Le Clec, H. G., Poline, J. B., Le Bihan, D., & Cohen, L. (2002). The visual
word form area: A prelexical representation of visual words in the fusiform gyrus. Neu
roReport, 13, 321–325.
Dehaene, S., Naccache, L., Cohen, L., Bihan, D. L., Mangin, J. F., Poline, J. B., et al. (2001).
Cerebral mechanisms of word masking and unconscious repetition priming. Nature Neu
roscience, 4, 752–758.
Déjerine, J. (1891). Sur un cas de cécité verbale avec agraphie, suivi d’autopsie. Mémoires
de la Société de Biologie, 3, 197–201.
DeLeon, J., Gottesman, R. F., Kleinman, J. T., et al. (2007). Neural regions essential for dis
tinct cognitive processes underlying picture naming. Brain, 130, 1408–1422.
De Renzi, E., Zambolin, A., & Crisi, G. (1987). The pattern of neuropsychological impair
ment associated with left posterior cerebral artery infarcts. Brain, 110, 1099–1116.
Derrfuss, J., Brass, M., Neumann, J., & von Cramon, D. Y. (2005). Involvement of the infe
rior frontal junction in cognitive control: meta-analyses of switching and Stroop studies.
Human Brain Mapping, 25, 22–34.
Ellis, A. W., & Young, A. W. (1988). Human cognitive neuropsychology. Hove, UK: Erl
baum.
Epelbaum, S., Pinel, P., Gaillard, R., et al. (2008). Pure alexia as a disconnection syn
drome: new diffusion imaging evidence for an old concept. Cortex, 44, 962–974.
Exner, S. (1881). Untersuchungen über die Lokalisation der Funktionen in der
Grosshirnrinde des Menschen. Wien: Braumüller.
Farah, M. J., Stowe, R. M., & Levinson, K. L. (1996). Phonological dyslexia: Loss of a read
ing-specific component of the cognitive architecture? Cognitive Neuropsychology, 13,
849–868.
Farah, M. J., & Wallace, M. A. (1991). Pure alexia as a visual impairment: A reconsidera
tion. Cognitive Neuropsychology, 8, 313–334.
Fiez, J. A. (1997). Phonology, semantics, and the role of the left inferior prefrontal cortex.
Human Brain Mapping, 5, 79–83.
Fiez, J. A., & Petersen, S. E. (1998). Neuroimaging studies of word reading. Proceedings
of the National Academy of Sciences U S A, 95, 914–921.
Fiez, J. A., Raichle, M. E., Balota, D. A., Tallal, P., & Petersen, S. E. (1996). PET activation
of posterior temporal regions during auditory word presentation and verb generation.
Cerebral Cortex, 6, 1–10.
Fiez, J. A., Tranel, D., Seager-Frerichs, D., & Damasio, H. (2006). Specific reading and
phonological processing deficits are associated with damage to the left frontal opercu
lum. Cortex, 42, 624–643.
Foundas, A., Daniels, S. K., & Vasterling, J. J. (1998). Anomia: Case studies with lesion lo
calization. Neurocase, 4, 35–43.
Friedman, R. B. (1996). Recovery from deep alexia to phonological alexia: Points on a con
tinuum. Brain and Language, 52, 114–128.
Friedman, R. B., & Hadley, J. A. (1992). Letter-by-letter surface alexia. Cognitive Neu
ropsychology, 9, 185–208.
Funnell, E. (1996). Response bias in oral reading: An account of the co-occurrence of sur
face dyslexia and semantic dementia. Quarterly Journal of Experimental Psychology, 49A,
417–446.
Gaillard, R., Naccache, L., Pinel, P., et al. (2006). Direct intracranial, fMRI, and lesion evi
dence for the causal role of left inferotemporal cortex in reading. Neuron, 50, 191–204.
Galton, C. J., Patterson, K., Graham, K., et al. (2001). Differing patterns of temporal atro
phy in Alzheimer’s disease and semantic dementia. Neurology, 57, 216–225.
Glezer, L. S., Jiang, X., & Riesenhuber, M. (2009). Evidence for highly selective neuronal
tuning to whole words in the “visual word form area.” Neuron, 62 (2), 199–204.
Gold, B. T., & Kertesz, A. (2000). Right-hemisphere semantic processing of visual words in
an aphasic patient: An fMRI study. Brain and Language, 73, 456–465.
Goodglass, H., Hyde, M. R., & Blumstein, S. (1969). Frequency, picturability and availabil
ity of nouns in aphasia. Cortex, 5, 104–119.
Gorno-Tempini, M. L., Dronkers, N. F., Rankin, K. P., et al. (2004). Cognition and anatomy
in three variants of primary progressive aphasia. Annals of Neurology, 55, 335–346.
Graham, K. S., Hodges, J. R., & Patterson, K. (1994). The relationship between compre
hension and oral reading in progressive fluent aphasia. Neuropsychologia, 32, 299–316.
Graham, N. L., Patterson, K., & Hodges, J. R. (2000). The impact of semantic memory im
pairment on spelling: evidence from semantic dementia. Neuropsychologia, 38, 143–163.
Grodzinsky, Y. (2000). The neurology of syntax: Language use without Broca’s area. Be
havioral and Brain Sciences, 23, 1–21.
Hecaen, H., & Consoli, S. (1973). Analysis of language disorders in lesions of Broca’s
area. Neuropsychologia, 11, 377–388.
Henry, C., Gaillard, R., Volle, E., et al. (2005). Brain activations during letter-by-letter
reading: A follow-up study. Neuropsychologia, 43, 1983–1989.
Henry, M. L., Beeson, P. M., Stark, A. J., & Rapcsak, S. Z. (2007). The role of left perisyl
vian cortical regions in spelling. Brain and Language, 100, 44–52.
Hickok, G., & Poeppel, D. (2007). The cortical organization of speech processing. Nature
Reviews Neuroscience, 8, 393–402.
Hillis, A. E., Chang, S., & Breese, E. (2004). The crucial role of posterior frontal regions in
modality specific components of the spelling process. Neurocase, 10, 157–187.
Hillis, A. E., Kane, A., Barker, P., et al. (2001). Neural substrates of the cognitive process
es underlying reading: Evidence from magnetic resonance perfusion imaging in hypera
cute stroke. Aphasiology, 15, 919–931.
Hillis, A. E., Kane, A., Tuffiash, E., Beauchamp, N., Barker, P. B., Jacobs, M. A., & Wityk, R.
(2002). Neural substrates of the cognitive processes underlying spelling: Evidence from
MR diffusion and perfusion imaging. Aphasiology, 16, 425–438.
Hillis, A. E., Newhart, M., Heidler, J., et al. (2005). The roles of the “visual word form
area” in reading. NeuroImage, 24, 548–559.
Hillis, A., & Rapp, B. (2004). Cognitive and neural substrates of written language compre
hension and production. In M. Gazzaniga (Ed.), The new cognitive neurosciences (3rd ed.,
pp. 755–788). Cambridge, MA: MIT Press.
Hillis, A. E., Rapp, B. C., & Caramazza, A. (1999). When a rose is a rose in speaking but a
tulip in writing. Cortex, 35, 337–356.
Hillis, A.E., Tuffiash, E., & Caramazza, A. (2002). Modality specific deterioration in oral
naming of verbs. Journal of Cognitive Neuroscience, 14, 1099–1108.
Howard, D., Patterson, K., Franklin, S., Morton, J., & Orchard-Lisle, V. (1984). Variability
and consistency in picture naming by aphasic patients. Advances in Neurology, 42, 263–
276.
Ino, T., Tokumoto, K., Usami, K., et al. (2008). Longitudinal fMRI study of reading in a pa
tient with letter-by-letter reading. Cortex, 44, 773–781.
Jefferies, E., Sage, K., & Lambon Ralph, M. A. (2007). Do deep dyslexia, dysphasia and
dysgraphia share a common phonological impairment? Neuropsychologia, 45, 1553–1570.
Jobard, G., Crivello, F., & Tzourio-Mazoyer, N. (2003). Evaluation of the dual route theory
of reading: a metanalysis of 35 neuroimaging studies. NeuroImage, 20, 693–712.
Joseph, J. E., Cerullo, M. A., Farley, A. B., et al. (2006). fMRI correlates of cortical special
ization and generalization for letter processing. NeuroImage, 32, 806–820.
Katzir, T., Misra, M., & Poldrack, R. A. (2005). Imaging phonology without print: Assess
ing the neural correlates of phonemic awareness using fMRI. NeuroImage, 27, 106–115.
Klein, D., Milner, B., Zatorre, R. J., Zhao, V., & Nikelski, J. (1999). Cerebral organization in
bilinguals: A PET study of Chinese-English verb generation. NeuroReport, 10, 2841–2846.
Kronbichler, M., Bergmann, J., Hutzler, F., et al. (2007). Taxi vs. taksi: On orthographic
word recognition in the left ventral occipitotemporal cortex. Journal of Cognitive Neuro
science, 19, 1584–1594.
Kronbichler, M., Hutzler, F., Wimmer, H., et al. (2004). The visual word form area and the
frequency with which words are encountered: Evidence from a parametric fMRI study.
NeuroImage, 21, 946–953.
Lambert, J., Giffard, B., Nore, F. et al. (2007). Central and peripheral agraphia in
Alzheimer’s disease: From the case of Auguste D. to a cognitive neuropsychology ap
proach. Cortex, 43, 935–951.
Lambon Ralph, M. A., & Graham, N. L. (2000). Previous cases: Acquired phonological and
deep dyslexia. Neurocase, 6, 141–178.
Larsen, J., Baynes, K., & Swick, D. (2004). Right hemisphere reading mechanisms in a
global alexic patient. Neuropsychologia, 42, 1459–1476.
Law, I., Kanno, I., Fujita, H., Miura, S., Lassen, N., & Uemura, K. (1991). Left supramar
ginal/angular gyri activation during reading of syllabograms in the Japanese language.
Journal of Neurolinguistics, 6, 243–251.
Leff, A. P., Crewes, H., Plant, G. T., et al. (2001). The functional anatomy of single-word
reading in patients with hemianopic and pure alexia. Brain, 124, 510–521.
Leff, A. P., Spitsyna, G., Plant, G. T., & Wise, R. J. S. (2006). Structural anatomy of pure
and hemianopic alexia. Journal of Neurology, Neurosurgery, and Psychiatry, 77, 1004–
1007.
Lichtheim, L. (1885). On aphasia. Brain, VII, 433–484.
Lubrano, V., Roux, F. E., & Démonet, J. F. (2004). Writing-specific sites in frontal areas: A
cortical stimulation study. Journal of Neurosurgery, 101 (5), 787–798.
Luders, H., Lesser, R. P., Hahn, J., et al. (1991). Basal temporal language area. Brain, 114,
743–754.
Mainy, N., Jung, J., Baciu, M., et al. (2008). Cortical dynamics of word recognition. Human
Brain Mapping, 29, 1215–1230.
Mani, J., Diehl, B., Piao, Z., et al. (2008). Evidence for a basal temporal visual language
center: Cortical stimulation producing pure alexia. Neurology, 71, 1621–1627.
Marinkovic, K., Dhond, R. P., Dale, A. M., et al. (2003). Spatiotemporal dynamics of
modality-specific and supramodal word processing. Neuron, 38, 487–497.
Marsh, E. B., & Hillis, A. E. (2005). Cognitive and neural mechanisms underlying reading
and naming: Evidence from letter-by-letter reading and optic aphasia. Neurocase, 11,
325–337.
McCandliss, B. D., Cohen, L., & Dehaene, S. (2003). The visual word form area: Expertise
for reading in the fusiform gyrus. Trends in Cognitive Sciences, 7, 293–299.
McKay, A., Castles, M., & Davis, C. (2007). The impact of progressive semantic loss on
reading aloud. Cognitive Neuropsychology, 24, 162–186.
Mechelli, A., Gorno-Tempini, M. L., & Price, C. J. (2003). Neuroimaging studies of word
and pseudoword reading: Consistencies, inconsistencies, and limitations. Journal of Cog
nitive Neuroscience, 15, 260–271.
Miceli, G., & Caramazza, A. (1988). Dissociation of inflectional and derivational morpholo
gy. Brain and Language, 35, 24–65.
Molko, N., Cohen, L., Mangin, J. F., et al. (2002). Visualizing the neural bases of a discon
nection syndrome with diffusion tensor imaging. Journal of Cognitive Neuroscience, 14,
629–636.
Moro, A., Tettamanti, M., Perani, D., Donati, C., Cappa, S. F., & Fazio, F. (2001). Syntax
and the brain: Disentangling grammar by selective anomalies. NeuroImage, 13, 110–118.
Mummery, C. J., Patterson, K., Hodges, J. R., & Price, C. J. (1998). Functional neuroanato
my of the semantic system: Divisible by what? Journal of Cognitive Neuroscience, 10,
766–777.
Mummery, C. J., Patterson, K., Price, C. J., et al. (2000). A voxel-based morphometry study
of semantic dementia: Relationship between temporal lobe atrophy and semantic memo
ry. Annals of Neurology, 47, 36–45.
Murtha, S., Chertkow, H., Beauregard, M., & Evans, A. (1999). The neural substrate of
picture naming. Journal of Cognitive Neuroscience, 11, 399–423.
Nakamura, K., Honda, M., Hirano, S., et al. (2002). Modulation of the visual word re
trieval system in writing: A functional MRI study on the Japanese orthographies. Journal
of Cognitive Neuroscience, 14, 104–115.
Nakamura, K., Honda, M., Okada, T., et al. (2000). Participation of the left posterior inferi
or temporal cortex in writing and mental recall of kanji orthography: A functional MRI
study. Brain, 123, 954–967.
Nobre, A. C., Allison, T., & McCarthy, G. (1994). Word recognition in the human inferior
temporal lobe. Nature, 372, 260–263.
Norton, E. S., Kovelman, I., & Petitto, L.-A. (2007). Are there separate neural systems for
spelling? New insights into the role of rules and memory in spelling from functional mag
netic resonance imaging. Mind, Brain, and Education, 1, 48–59.
Ogden, J. A. (1996). Phonological dyslexia and phonological dysgraphia following left and
right hemispherectomy. Neuropsychologia, 34, 905–918.
Omura, K., Tsukamoto, T., Kotani, Y., et al. (2004). Neural correlates of phoneme-
grapheme conversion. NeuroReport, 15, 949–953.
Patterson, K. E., Marshall, J. C., & Coltheart, M. (1985). Surface dyslexia: Neuropsycho
logical and cognitive studies of phonological reading. London: Erlbaum.
Paulesu, E., Frith, C. D., & Frackowiak, R. S. (1993). The neural correlates of the verbal
component of working memory. Nature, 362 (6418), 342–345.
Paulesu, E., Goldacre, B., Scifo, P., Cappa, S. F., Gilardi, M. C., Castiglioni, I., Perani, D., &
Fazio, F. (1997). Functional heterogeneity of left inferior frontal cortex as revealed by fM
RI. NeuroReport, 8, 2011–2017.
Paulesu, E., McCrory, E., Fazio, F., Menoncello, L., Brunswick, N., Cappa, S. F., Cotelli, M.,
Cossu, G., Corte, F., Lorusso, M., Pesenti, S., Gallagher, A., Perani, D., Price, C., Frith, C.
D., & Frith, U. (2000). A cultural effect on brain function. Nature Neuroscience, 3, 91–96.
Paulesu, E., Perani, D., Blasi, V., Silani, G., Borghese, N. A., De Giovanni, U., Sensolo, S., &
Fazio, F. (2003). A functional-anatomical model for lipreading. Journal of Neurophysiology,
90 (3), 2005–2013.
Perani, D., & Cappa, S. (2006). Broca’s area and lexical-semantic processing. In Y.
Grodzinsky & K. Amunts (Eds.), Broca’s region. New York: Oxford University Press.
Perani, D., Cappa, S. F., Schnur, T., Tettamanti, M., Collina, S., Rosa, M. M., & Fazio, F.
(1999). The neural correlates of verb and noun processing: A PET study. Brain, 122, 2337–
2344.
Pernet, C., Celsis, P., & Demonet, J.-F. (2005). Selective response to letter categorization
within the left fusiform gyrus. NeuroImage, 28, 738–744.
Petersen, S. E., Fox, P. T., Posner, M. I., Mintun, M., & Raichle, M. E. (1988). Positron
emission tomographic studies of the cortical anatomy of single-word processing. Nature,
331 (6157), 585–589.
Philipose, L. E., Gottesman, R. F., Newhart, M., et al. (2007). Neural regions essential for
reading and spelling of words and pseudowords. Annals of Neurology, 62, 481–492.
Plaut, D. C., McClelland, J. L., Seidenberg, M. S., & Patterson, K. (1996). Understanding
normal and impaired word reading: Computational principles in quasi-regular domains.
Psychological Review, 103, 56–115.
Plaut, D. C., & Shallice, T. (1993). Deep dyslexia: A case study of connectionist neuropsy
chology. Cognitive Neuropsychology, 10, 377–500.
Polk, T. A., & Farah, M. J. (2002). Functional MRI evidence for an abstract, not perceptu
al, word-form area. Journal of Experimental Psychology: General, 131 (1), 65–72.
Polk, T. A., Stallcup, M., Aguirre, G. K., et al. (2002). Neural specialization for letter
recognition. Journal of Cognitive Neuroscience, 14, 145–159.
Price, C. J., & Devlin, J. T. (2003). The myth of the visual word form area. NeuroImage, 19,
473–481.
Price, C. J., & Devlin, J. T. (2004). The pro and cons of labelling a left occipitotemporal
region: “The visual word form area.” NeuroImage, 22, 477–479.
Price, C. J., Devlin, J. T., Moore, C. J., et al. (2005). Meta-analyses of object naming: Effect
of baseline. Human Brain Mapping, 25, 70–82.
Price, C. J., & Friston, K. (2002). Degeneracy and cognitive anatomy. Trends in Cognitive
Sciences, 6, 416–421.
Price, C. J., & Friston, K. J. (2005). Functional ontologies for cognition: The systematic de
finition of structure and function. Cognitive Neuropsychology, 22, 262–275.
Price, C. J., Gorno-Tempini, M. L., Graham, K. S., et al. (2003). Normal and pathological
reading: Converging data from lesion and imaging studies. NeuroImage, 20, S30–S41.
Price, C. J., Howard, D., Patterson, K., et al. (1998). A functional neuroimaging description
of two deep dyslexic patients. Journal of Cognitive Neuroscience, 10, 303–315.
Price, C. J., & Mechelli, A. (2005). Reading and reading disturbance. Current Opinion in
Neurobiology, 15, 231–238.
Price, C. J., Moore, C. J., & Frackowiak, R. S. (1996). The effect of varying stimulus rate
and duration on brain activity during reading. NeuroImage, 3 (1), 40–52.
Pugh, K. R., Mencl, W. E., Jenner, A. R., Katz, L., Frost, S. J., Lee, J. R., Shaywitz, S. E., &
Shaywitz, B. A. (2001). Neurobiological studies of reading and reading disability. Journal
of Communication Disorders, 34 (6), 479–492.
Pugh, K. R., Shaywitz, B. A., Shaywitz, S. E., Constable, R. T., Skudlarski, P., Fulbright, R.
K., et al. (1996). Cerebral organization of component processes in reading. Brain, 119,
1221–1238.
Rabinovici, G. D., Seeley, E. J., Gorno-Tempini, M. L., et al. (2008). Distinct MRI atrophy
patterns in autopsy-proven Alzheimer’s disease and frontotemporal lobar degeneration.
American Journal of Alzheimer’s Disease and Other Dementias, 22, 474–488.
Rapcsak, S. Z., & Beeson, P. M. (2002). Neuroanatomical correlates of spelling and writ
ing. In A. E. Hillis (Ed.), Handbook of adult language disorders: Integrating cognitive neu
ropsychology, neurology, and rehabilitation (pp. 71–99). Philadelphia: Psychology Press.
Rapcsak, S. Z., & Beeson, P. M. (2004). The role of left posterior inferior temporal cortex
in spelling. Neurology, 62, 2221–2229.
Rapcsak, S. Z., Beeson, P. M., Henry, M. L., Leyden, A., Kim, E., Rising, K., Andersen, S.,
& Cho, H. (2009). Phonological dyslexia and dysgraphia: Cognitive mechanisms and neur
al substrates. Cortex, 45 (5), 575–591.
Rapcsak, S. Z., Beeson, P. M., & Rubens, A. B. (1991). Writing with the right hemisphere.
Brain and Language, 41, 510–530.
Rapcsak, S. Z., Henry, M. L., Teague, S. L., et al. (2007). Do dual-route models accurately
predict reading and spelling performance in individuals with acquired alexia and
agraphia? Neuropsychologia, 45, 2519–2524.
Rapcsak, S. Z., Rubens, A. B., & Laguna, J. F. (1990). From letters to words: Procedures
for word recognition in letter-by-letter reading. Brain and Language, 38, 504–514.
Rapp, B., & Caramazza, A. (1998). A case of selective difficulty in writing verbs. Neuro
case, 4, 127–140.
Rapp, B., Epstein, C., & Tainturier, M. J. (2002). The integration of information across lexi
cal and sublexical processes in spelling. Cognitive Neuropsychology, 19, 1–29.
Rapp, B., & Hsieh, L. (2002). Functional magnetic resonance imaging of the cognitive
components of the spelling process. Cognitive Neuroscience Society Meeting, San Fran
cisco.
Rapp, B., & Lipka, K. (2011). The literate brain: The relationship between spelling and
reading. Journal of Cognitive Neuroscience, 23 (5), 1180–1197.
Raymer, A., Foundas, A. L., Maher, L. M., et al. (1997). Cognitive neuropsychological
analysis and neuroanatomical correlates in a case of acute anomia. Brain and Language,
58, 137–156.
Roeltgen, D. P., & Heilman, K. M. (1984). Lexical agraphia: Further support for the two-
system hypothesis of linguistic agraphia. Brain, 107, 811–827.
Roeltgen, D. P., Sevush, S., & Heilman, K. M. (1983). Phonological agraphia: Writing by
the lexical-semantic route. Neurology, 33, 755–765.
Rosen, H. J., Gorno-Tempini, M. L., Goldman, W. P., et al. (2002). Patterns of brain atrophy
in frontotemporal dementia and semantic dementia. Neurology, 58, 198–208.
Roux, F. E., Dufor, O., Giussani, C., Wamain, Y., Draper, L., Longcamp, M., & Démonet, J. F.
(2009). The graphemic/motor frontal area Exner’s area revisited. Annals of Neurology, 66
(4), 537–545.
Saffran, E. M., & Coslett, H. B. (1998). Implicit vs. letter-by-letter reading in pure alexia:
A tale of two systems. Cognitive Neuropsychology, 15, 141–165.
Sakurai, Y., Sakai, K., Sakuta, M., & Iwata, M. (1994). Naming difficulties in alexia with
agraphia for kanji after a left posterior inferior temporal lesion. Journal of Neurology,
Neurosurgery, and Psychiatry, 57, 609–613.
Salmelin, R. (2007). Clinical neurophysiology of language: The MEG approach. Clinical
Neurophysiology, 118, 237–254.
Salmelin, R., Service, E., Kiesilä, P., Uutela, K., & Salonen, O. (1996). Impaired visual
word processing in dyslexia revealed with magnetoencephalography. Annals of
Neurology, 40 (2), 157–162.
Schlaggar, B. L., & McCandliss, B. D. (2007). Development of neural systems for reading.
Annual Review of Neuroscience, 30, 475–503.
Shallice, T. (1981). Phonological agraphia and the lexical route in writing. Brain, 104,
413–429.
Simos, P. G., Breier, J. I., Fletcher, J. M., Foorman, B. R., Castillo, E. M., & Papanicolaou,
A. C. (2002). Brain mechanisms for reading words and pseudowords: An integrated ap
proach. Cerebral Cortex, 12 (3), 297–305.
Smith, E. E., & Jonides, J. (1999). Storage and executive processes in the frontal lobes [re
view]. Science, 283 (5408), 1657–1661.
Starrfelt, R., & Gerlach, C. (2007). The visual what for area: Words and pictures in the left
fusiform gyrus. NeuroImage, 35, 334–342.
Strain, E., Patterson, K., Graham, N., & Hodges, J. R. (1998). Word reading in
Alzheimer’s disease: Cross-sectional and longitudinal analyses of response time and accu
racy data. Neuropsychologia, 36, 155–171.
Tainturier, M.-J., & Rapp, B. (2001). The spelling process. In B. Rapp (Ed.), The handbook
of cognitive neuropsychology: What deficits reveal about the human mind (pp. 263–289).
Philadelphia: Psychology Press.
Tarkiainen, A., Helenius, P., Hansen, P. C., Cornelissen, P. L., & Salmelin, R. (1999). Dy
namics of letter string perception in the human occipitotemporal cortex. Brain, 122 (Pt
11): 2119–2132.
Tettamanti, M., Buccino, G., Saccuman, M. C., Gallese, V., Danna, M., Scifo, P., Fazio, F.,
Rizzolatti, G., Cappa, S. F., & Perani, D. (2005). Listening to action-related sentences acti
vates fronto-parietal motor circuits. Journal of Cognitive Neuroscience, 17 (2), 273–281.
Thompson-Schill, S. L., D’Esposito, M., & Kan, I. P. (1999). Effects of repetition and com
petition on activity in left prefrontal cortex during word generation. Neuron, 23 (3), 513–
522.
Page 29 of 31
Cognitive Neuroscience of Written Language: Neural Substrates of Reading
and Writing
Kyrana Tsapkini
Argye Hillis
Neural Systems Underlying Speech Perception
This chapter reviews the neural systems underlying speech perception, with a focus on the acoustic properties contributing to the phonetic dimensions of speech and the mapping of speech input to phonetic categories and word representations. Neuroimaging findings and results from brain lesions resulting in aphasia suggest that these processes recruit a neural system that includes temporal (Heschl's gyrus, superior temporal gyrus, superior temporal sulcus, and middle temporal gyrus), parietal (supramarginal and angular gyri), and frontal lobe structures (inferior frontal gyrus), and support a functional architecture with distinct stages organized such that information at one stage of processing influences and modulates information at other stages of processing. In particular, spectral-temporal analysis of speech recruits temporal areas; lexical processing, that is, the mapping of sound structure to phonetic category and lexical representations, recruits posterior superior temporal gyrus and parietal areas; and selection processes recruit frontal areas. The authors consider how evidence from cognitive neuroscience can inform a number of theoretical issues that have dominated speech perception research, including invariance, the influence of higher level processes (both lexical and sentence processing) on lower level phonetic categorization, the influence of articulatory and motor processes on speech perception, the effects of phonetic and phonological competition on lexical access, and the plasticity of phonetic category learning in the adult listener.
Keywords: speech perception, lexical processing, lexical competition, invariance, neuroimaging, aphasia, acoustic properties, phonetic categories, plasticity
Introduction
Language processing appears to be easy and seamless. However, evidence across a wide range of disciplines and approaches to the study of language, including theoretical linguistics, behavioral studies of language, and the cognitive neuroscience of language, suggests that the language system can be broken down into separable components: speech, words, syntactic structure, meaning. It is the goal of this chapter to focus on one of these components: speech.
For most of us, speech is the medium we use for language communication. Thus, in both listening and speaking, it serves as the interface with the language system. In language understanding, the auditory input must be mapped onto phonetic categories, and these must be mapped onto word representations that are ultimately mapped onto meaning or conceptual structures. Similarly, in spoken word production, concepts must be mapped onto word representations that are ultimately mapped onto articulatory plans and commands for producing the sounds of speech. Although study of the cognitive (p. 508) neuroscience of speech processing includes the processes that map sounds to words and words to speech, we will focus in this chapter on the perceptual processing stream, namely, the processes that map auditory input to phonetic category and word representations.
To begin this review it is important to provide a theoretical framework that elucidates our current understanding of the functional architecture of speech processing. Such a framework serves as a guide as we examine the neural systems underlying speech processing. At the same time, examination of the neural systems recruited during speech processing may in turn shape and constrain models of speech processing (Poldrack, 2006). For example, although most models agree that in both speech perception and speech production there is a series of successive stages in which information is transformed along the processing stream, there is disagreement about the extent to which the processing stages influence each other. In one view, the processing stages are discrete and informationally encapsulated (Fodor, 1983; Levelt, 1992). In this view, the input to a stage or module is processed and transformed, but its output is sent to the next stage without leaving any "residue" of information from the earlier processing stage. In contrast, the interactive or cascading view holds that although there are multiple stages of processing, information flow at one level of processing influences and hence modulates other stages of processing (Dell, 1986; Gaskell & Marslen-Wilson, 1997; McClelland & Elman, 1986). For example, it has been shown in spoken word production that the sound structure properties of the lexicon cascade and influence phonological planning and ultimately articulatory implementation stages of processing (Baese-Berk & Goldrick, 2009; Goldrick & Blumstein, 2006; Peramunage, Blumstein, Myers, Goldrick, & Baese-Berk, 2011).
In the remainder of this chapter we discuss in detail how this functional architecture maps onto the neural system. In broad strokes, speech perception and the mapping of sounds to words recruit a neural system that includes temporal lobe structures (Heschl's gyrus, superior temporal gyrus [STG], superior temporal sulcus [STS], and middle temporal gyrus [MTG]), parietal lobe structures (supramarginal gyrus [SMG] and angular gyrus [AG]), and frontal lobe structures (inferior frontal gyrus [IFG]) (see Figure 25.1). Early stages of processing appear to be bilateral, whereas later stages appear to recruit primarily the left hemisphere. We examine how the neural system responds to different acoustic properties contributing to the phonetic dimensions of speech. We investigate the neural substrates reflecting the adult listener's ability to respond to variability in the speech stream and to perceive a stable phonetic representation. We consider the potential relationship between speech perception and speech production processes by examining the extent to which speech perception processes recruit and are influenced by the speech production and motor system. We review the various stages implicated in the mapping from sounds to words as a means of examining whether the same neural system is recruited for resolving competition across linguistic domains or whether different areas are recruited as a function of a particular linguistic domain. Finally, we examine the neural systems underlying the plasticity of the adult listener in learning new phonetic categories.
There is fairly general consensus that processing the acoustic details of speech occurs in the temporal lobes, specifically in Heschl's gyrus, the STG, and extending into the STS (Binder & Price, 2001; Hickok & Poeppel, 2004). Phoneme identification and discrimination tasks are disrupted when the (p. 510) left posterior temporal gyrus is directly stimulated (Boatman & Miglioretti, 2005), and damage to the bilateral STG can result in deficits in processing the details of speech (Auerbach, Allard, Naeser, Alexander, & Albert, 1982; Coslett, Brashear, & Heilman, 1984; cf. Poeppel, 2001, for review). However, interesting differences emerge in the neural systems recruited in the processing of the acoustic parameters of speech. These differences appear to reflect whether the acoustic attribute is spectral or temporal, whether the acoustic information occurs over a short or long time window, and whether the cue plays a functional role in the language. We now turn to these early analysis stages, in which the acoustic-phonetic properties are mapped onto phonetic representations, the primary focus of the next section.
Voice onset time (VOT) is defined as the time between the release of an oral closure and the onset of voicing in syllable-initial stop consonants. This articulatory/acoustic parameter serves as a cue to the perception of voicing contrasts in syllable-initial stop consonants, distinguishing between voiced sounds such as /b/, /d/, and /g/, and voiceless sounds such as /p/, /t/, and /k/ (Lisker & Abramson, 1964). VOT is notable in part because small differences (10 to 20 ms) in the timing between the burst and the onset of voicing cue perceptual distinctions between voiced and voiceless stop consonants (Liberman, Delattre, & Cooper, 1958; Liberman, Harris, Hoffman, & Griffith, 1957). Nonetheless, depending on the VOT values, similar differences are perceived by listeners as belonging to the same phonetic category. Thus, stimuli on a VOT continuum are easily discriminated when they fall into two distinct speech categories (e.g., "d" versus "t"), but are difficult to distinguish if they are members of the same phonetic category. This is true even when the acoustic distance between exemplars is equivalent.
The sensitivity of the neural system to this property forms the basis for the phonetic property of voicing in stop consonants.
Consistent with these findings are results from Liégeois-Chauvel and colleagues (1999). Using intracerebral evoked potentials, they observed components time-locked to the acoustic properties of voiced and voiceless tokens for both speech and nonspeech, lateralized to the left hemisphere in Heschl's gyrus, the planum temporale (PT), and the posterior part of the STG (Brodmann area [BA] 22). Results from a magnetoencephalography (MEG) study (Papanicolaou et al., 2003) suggest that responses to nonspeech "tone-onset-time" and speech VOT stimuli do not differ in laterality early in processing (before 130 ms). However, a strong leftward laterality is observed for speech tokens, but not for nonspeech tokens, later in processing (>130 ms). Similarly, in an MEG study examining within-phonetic-category variation, Frye et al. (2007) showed greater sensitivity to within-category variation in left-hemisphere evoked responses than in right-hemisphere responses. Taken together, these findings suggest that early stages of processing recruit auditory areas bilaterally, but that at later stages, processing is lateralized to the left hemisphere, presumably when the auditory information is encoded onto higher level acoustic-phonetic properties of speech that ultimately have linguistic relevance.
Indeed, left posterior temporal activation is modulated by the nature of the phonetic category information conveyed by a particular VOT value. In particular, results of neuroimaging (p. 511) studies have shown graded activation to stimuli along a VOT continuum in left posterior temporal areas, with increasing activation as the VOT of the stimuli approached the phonetic boundary (Blumstein, Myers, & Rissman, 2005; Hutchison, Blumstein, & Myers, 2008; Myers, 2007; Myers, Blumstein, Walsh, & Eliassen, 2009). Exemplar stimuli (i.e., those stimuli perceived as a clear-cut voiced or voiceless stop consonant) showed the least activation. Stimuli that were poorer exemplars of their category, because they either were close to the category boundary (Blumstein et al., 2005) or had extreme VOT values at the periphery of the continuum (Myers, 2007), showed increased neural activation, suggesting that the posterior portion of the STG is sensitive to the acoustic-phonetic structure of the phonetic category.
Stop consonants such as /b/, /d/, and /g/ differ with respect to their place of articulation, that is, the location in the oral cavity where the closure is made to produce these sounds. In stop consonants, differences in place of articulation are signaled by the shape of formants as they transition from the burst to the vowel. Unlike VOT, which is a temporal parameter, place of articulation is cued by rapid spectral changes over a relatively short time window of some 20 to 40 ms (Stevens & Blumstein, 1978). Neuroimaging evidence largely supports the notion that place of articulation, like VOT, recruits temporal lobe structures and is processed primarily in the left hemisphere.
Joanisse and colleagues (2007) examined cortical responses to changes in phonetic identity from /ga/ to /da/ as participants listened passively to phonetic tokens while they watched a silent movie. In this study, sensitivity to between-category shifts was observed in the left STS, extending into the MTG. Of interest, similar areas were recruited for sine-wave analogues of place of articulation (Desai, Liebenthal, Waldron, & Binder, 2008). Sine-wave speech is speech that has been processed so that the energy maxima (the formants) are replaced by simple sine waves. Initially, sine-wave speech sounds like unintelligible noise; however, it has been shown that with experience, listeners become adept at perceiving these sounds as speech (Remez, Rubin, Pisoni, & Carrell, 1981). Of importance, although sine-wave speech is stripped of the precise acoustic cues that distinguish the sounds of speech, it preserves the relative shifts in spectral energy that signal place of articulation in stop consonants. In a functional magnetic resonance imaging (fMRI) study by Desai et al. (2008), participants heard sounds along a continuum from /ba/ to /da/, as well as sine-wave versions of these sounds and sine-wave stimuli that did not correspond to any speech sound. They were scanned while listening to speech and sine-wave speech sounds before and after they were made aware of the phonetic nature of the sine-wave speech sounds. Results showed that activation patterns were driven not by the spectral characteristics of the stimuli, but rather by whether the sounds were perceived as speech or not. Specifically, left posterior temporal activation was observed for real speech sounds before training and for sine-wave speech sounds once participants had begun to perceive them as speech. That sine-wave speech recruits the same areas as speech, but only when it is perceived by the listener as speech, suggests that the linguistic function of an input is critical for recruiting the left-hemisphere speech processing stream. As we will see below (see the section Perception of Lexical Tone), these findings are supported by the perception of tone when it has linguistic relevance to the listener.
Perception of Vowels
Vowel perception involves attention to spectral information at a longer time scale than that of either VOT or place of articulation, typically on the order of 150 to 300 ms. Although stop consonants are marked by spectral transitions over short time windows, spectral information in vowels is relatively steady at the vowel midpoint, although there are spectral changes as a function of the consonant environment in which a vowel occurs (Strange, Jenkins, & Johnson, 1983). Moreover, vowels are more forgiving of differences in timing; for instance, vowels as short as 25 ms can still be perceived as members of their category, suggesting that the dominant cue to vowel identity is the spectral cue, rather than durational cues (Strange, 1989). Indeed, vowel quality can be perceived based on the spectral transitions of a preceding consonant (Blumstein & Stevens, 1980).
There are few behavioral studies examining the perception of vowels following brain injury. Because participants with right-hemisphere damage typically do not present with aphasia, it is usually assumed that they do not have speech perception impairments. The sparse evidence available supports this claim (cf. Tallal & Newcombe, 1978). (p. 512) Nonetheless, dichotic listening experiments with normal individuals have typically shown that, unlike consonants, which display a strong right ear (left hemisphere) advantage, vowels show either no ear preference or a slight left ear (right hemisphere) advantage (Shankweiler & Studdert-Kennedy, 1967; Studdert-Kennedy & Shankweiler, 1970). As described below, neuroimaging experiments also suggest neural differences in the processing of consonants and vowels.
Not surprisingly, fMRI studies of vowel perception show recruitment of the temporal lobe. However, results suggest that vowel processing differs from consonant processing both in terms of the areas within the temporal lobe that are recruited and in terms of the laterality of processing.
With respect to the recruitment of temporal lobe areas, it appears that vowels recruit the anterior temporal lobe to a greater degree than consonants do. In particular, activation patterns observed by Britton et al. (2009), Leff et al. (2009), and Obleser et al. (2006) have shown that areas anterior to Heschl's gyrus are recruited in the perception of vowels. For instance, in a study by Obleser and colleagues (2006), activation was monitored using fMRI as participants listened to sequences of vowels that varied in either the first or second formant, acoustic parameters that spectrally distinguish vowels. In general, more activation was observed for vowels than for nonspeech sounds in the anterior STG bilaterally. Moreover, anterior portions of this region responded more to back vowels ([u] and [o]), whereas more posterior portions responded more to front vowels ([i] and [e]), suggesting that the anterior portions of the superior temporal cortex respond to the structure of vowel space.
These findings suggest that there may be a subdivision of temporal cortex along the anterior-posterior axis as a function of the phonetic cues giving rise to the sounds of language. Phonetic cues to stop consonants seem to recruit temporal areas posterior to Heschl's gyrus as well as the MTG (Joanisse, Zevin, & McCandliss, 2007; Liebenthal et al., 2010; Myers et al., 2009), whereas phonetic cues to vowels seem to recruit more anterior regions (Britton et al., 2009; Leff et al., 2009; Obleser et al., 2006).
Nonetheless, it appears that the laterality of vowel processing is influenced by vowel duration and the context in which vowels occur. In particular, vowels presented in more naturally occurring contexts, where either their durations are shorter or they occur in consonant contexts, are more likely to recruit left-hemisphere mechanisms. For example, Britton et al. (2009) reported a study of vowel processing in which vowels were presented at varying durations, from 75 to 300 ms. Vowels at longer durations showed greater activation in both left and right anterior STG, with a general rightward asymmetry for vowels. However, there was a decreasing right-hemisphere preference as the duration of the vowels got shorter. A study by Leff et al. (2009) examined the hemodynamic response to an unexpected vowel stimulus embedded in a /bVt/ context. Increased activation for the vowel deviant was observed in anterior portions of the left STG, whereas no such response to deviant stimuli was observed when the standard and deviant stimuli were nonspeech stimuli perceptually matched to the vowels.
Perception of Lexical Tone
Evidence suggests that even at the level of the brainstem, native speakers of tone languages respond differently to tone contours even when this information is not embedded in linguistic structure (Krishnan, Swaminathan, & Gandour, 2009), suggesting that experience in learning a tonal language fundamentally alters one's auditory perception of pitch changes in general. At the level of the cortex, however, responses to a given type of tone pattern may be more language specific. For instance, Xu et al. (2006) reported a study of tone processing that used Chinese syllables with appropriate overlaid Chinese tones, and Chinese syllables with overlaid Thai tones. The latter stimuli were not perceived by Chinese listeners as words in their language, and neither stimulus type was perceived as words by Thai listeners. When native speakers of Thai and Chinese performed a discrimination task on these stimuli, an interaction was observed in the left PT, such that activation was greatest for tone contours that corresponded to each group's native language (i.e., Chinese tones for Chinese listeners, Thai tones for Thai listeners). Of particular interest, this language-specific preference was observable irrespective of whether the stimuli had lexical content. Although processing lower level details of tone may rely on a bilaterally distributed system, it seems clear that even nonlinguistic uses of tone become increasingly left lateralized when they correspond to the tonal patterns used in one's native language.
The search for the neural correlates of invariance presents methodological challenges to the cognitive neuroscientist. The goal is to identify patterns of response that show sensitivity to variation between phonetic categories yet are insensitive to variation within a category. One way researchers have approached this issue is to use adaptation or oddball designs, in which a standard speech stimulus is repeated a number of times. This string of standard stimuli is interrupted by a stimulus that differs from the standard along some dimension, either acoustically (e.g., a different exemplar of the same phonetic category, or the same phonetic content spoken by a different speaker) or phonetically (e.g., a different phonetic category). Increases in activation for the "different" stimulus are presumed to reflect either a release from adaptation (Grill-Spector, Henson, & Martin, 2006; Grill-Spector & Malach, 2001) or an active change detection response (Zevin, Yang, Skipper, & McCandliss, 2010), but in either case they serve to index a region's sensitivity to the dimension of change.
A study by Joanisse and colleagues (2007) used such a method to investigate neural activation patterns to within- and between-category variation along a place of articulation continuum from [da] to [ga]. Sensitivity to between-category variation was observed along the left STS and into the MTG. One cluster in the left SMG showed an invariant response, that is, sensitivity to between-category changes coupled with insensitivity to within-category changes. In this study, no areas showed sensitivity to within-category variation, which leaves open the question of whether the paradigm was sensitive enough to detect small acoustic changes.
A study by Myers and colleagues (2009) investigated the same question with a similar design using a VOT continuum from [da] to [ta]. In contrast to Joanisse et al., sensitivity to both within- and between-category variation was evident in the left posterior STG, indicating that the neural system was sensitive to within-category contrasts. The (p. 514) only region to show an "invariant" response (i.e., sensitivity to 25-ms differences that signaled a change in phonetic category, coupled with insensitivity to comparable differences within a category) was in frontal areas.
The fact that invariant responses emerged in frontal areas rather than in temporal areas suggests that the perception of category invariance may arise, at least in part, from computations in frontal areas on graded phonetic input processed in the temporal lobes (Chang et al., 2010). A cognitive, rather than perceptual, role in computing phonetic category membership seems plausible in light of behavioral evidence showing that the precise location of the phonetic category boundary is not fixed, but rather varies with a host of factors including, but not limited to, lexical context, speaker identity, and speech rate (Ganong, 1980; Kraljic & Samuel, 2005; Miller & Volaitis, 1989). A mechanism that relied only on the specific properties of the acoustic signal would be unable to account for the influence of these factors on speech perception.
Listeners perceive the same phonetic category spoken by both a male and a female speaker even though the absolute acoustic properties are not the same. At the same time, the listener also recognizes that two different speakers produced the same utterance. Thus, the perceptual system must be able to respond in an invariant way at some level of processing to phonetic identity while at the same time being able to respond differently to speaker identity.
This relationship between phonetic category invariance and acoustic variability from different speakers has been the topic of much research in the behavioral literature. This type of invariance is especially relevant given the controversy about the degree to which indexical features in the speech signal, such as talker information, are retained or discarded as we map speech to meaning. Behavioral studies (Goldinger, 1998; Palmeri, Goldinger, & Pisoni, 1993) suggest that speaker information is encoded along with the lexical form of a word, and that indexical information affects later access to that form. This observation has led to episodic theories of speech perception, in which nonphonetic information such as talker information is preserved in the course of speech processing. Episodic theories stand in contrast to abstractionist theories of speech perception, which propose that the speech signal is stripped of indexical features before the mapping of sound structure to lexical form (Stevens, 1960).
Studies investigating neural activation patterns associated with changes in either speaker or phonetic information have revealed dissociations between the processing of these two types of information. Sensitivity to changing speakers in the face of constant linguistic information has been observed in the right anterior STS (Belin & Zatorre, 2003) and in the bilateral MTG and SMG (Wong, Nusbaum, & Small, 2004). In contrast, neural coding that distinguished among vowel types, but not individual speakers, was reported by Formisano and colleagues (2008) across a distributed set of regions in left and right STG and STS.
Nonetheless, none of these studies has specifically shown invariance to speaker change; that is, an area that treats phonetically equivalent utterances spoken by different speakers as the same. Using an adaptation paradigm, Salvata and colleagues (2012) showed speaker-invariant responses in the anterior portion of the STG bilaterally, suggesting that at some level of analysis, the neural system may abstract away from the acoustic variability across speakers in order to map to a common phonetic form. The fact that speaker-invariant regions were observed in the temporal lobes suggests that this abstraction may occur relatively early in the phonetic processing stream, before sound is mapped to lexical form, challenging a strict episodic view of lexical processing.
Of interest, invariant responses may in some cases be shifted by attention to different aspects of the signal. Bonte et al. (2009) used electroencephalography (EEG) to investigate the temporal alignment of evoked responses to vowels spoken by a set of three different speakers. Participants engaged in a one-back task on either phonetic identity or speaker identity. When engaged in the phonetic task, significant temporal alignment was observed for vowels regardless of the speaker, and when engaged in the speaker task, signals were aligned for different vowels spoken by the same speaker. Similarly, von Kriegstein et al. (2003) showed that attention to voices versus linguistic content resulted in shifts in activation, with attention to voices activating the right middle STS and attention to linguistic content activating the homologous area on the left.
Evidence from neuroimaging suggests that motor regions involved in speech production do play some role in speech perception. A study by Wilson and colleagues showed coactivation of a region on the border of BA 4 and BA 6 for both the perception and production of speech (Wilson, Saygin, Sereno, & Iacoboni, 2004). A similar precentral region, together with a region in the left superior temporal lobe, was also shown to be more active for the perception of non-native speech sounds than for native speech sounds (Wilson & Iacoboni, 2006), suggesting that motor regions may be recruited especially when the acoustic input is difficult to map to an existing speech sound in the native language inventory. What is less clear is the degree to which motor regions involved in articulation are necessary for the perception of speech.
Some evidence comes from the neuropsychological literature. In particular, patients with damage to the speech motor system typically show good auditory language comprehension (see Hickok, 2009, for a review). Nonetheless, although this evidence is suggestive, it is not definitive, because auditory comprehension is supported by multiple syntactic, semantic, and contextual cues that may disguise an underlying speech perception deficit. To answer this question, it is necessary to examine directly the extent to which damage to motor areas produces a frank speech perception deficit. Few studies have investigated this question. However, those that have are consistent with the view that motor area involvement is not central to speech perception. In particular, aphasic patients with anterior brain damage who showed deficits in the production of VOT in stop consonants nonetheless performed normally in the perception of a VOT continuum (Blumstein, Cooper, Zurif, & Caramazza, 1977).
But speech is more than simply identifying the sound shape of language. It serves as the vehicle for accessing the words of a language, that is, the lexicon. As we discuss later, accessing the sound shape of words activates a processing stream that includes the posterior STG, the SMG and AG, and the IFG. Of importance, the ease or difficulty with which individuals access the lexicon in auditory word recognition is influenced by a number of factors related to the sound structure properties of words in a language. In this way, the structure of the phonological lexicon influences activation patterns along this processing stream. We turn now to a discussion of these factors because they provide a window not only into the functional architecture of the language processing system but also into the neural systems engaged in selecting a word from the lexicon. In particular, they show that lexical processes are influenced by the extent to which a target word shares phonological structure with other words in the lexicon, and hence the extent to which a target word competes for access and selection.
Lexical Competition
Whether understanding or speaking, we need to select the appropriate words from our mental lexicon, a lexicon that contains thousands of words, many of which share sound-shape properties. It is generally assumed that in auditory word recognition, the auditory-phonetic-phonological input activates not only the target word but also a set of words that share sound-shape properties with it (Gaskell & Marslen-Wilson, 1997; Marslen-Wilson, 1987). The relationship between the sound structure of a word and the sound structure of other words in the lexicon affects lexical access, presumably because both processing and neural resources are influenced by the extent to which a target word has to be selected from a set of competing words that share its phonological structure.
Indeed, it has been shown that the extent to which a word shares phonological properties with other words in the lexicon affects the ease with which that word is accessed. In particular, it is possible to count the number of words in the lexicon that differ from a given word by a single phoneme and thereby quantify the density of the neighborhood in which that word resides. A word with many phonologically similar neighbors is said to come from a high-density neighborhood, and a word with few such neighbors is said to come from a low-density neighborhood (Luce & Pisoni, 1998).
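The one-phoneme neighborhood count can be made concrete with a short sketch. Everything below is illustrative: the toy lexicon and its broad phonemic transcriptions are invented for the example, and real density counts are computed over a full lexical database rather than a handful of items.

```python
def neighbors(word, lexicon):
    """Return the phonological neighbors of `word`: entries in `lexicon`
    that differ from it by exactly one phoneme substitution, deletion,
    or addition. Words are represented as tuples of phoneme symbols."""
    found = []
    for other in lexicon:
        if other == word or abs(len(word) - len(other)) > 1:
            continue
        if len(word) == len(other):
            # Substitution neighbor: exactly one position differs.
            if sum(a != b for a, b in zip(word, other)) == 1:
                found.append(other)
        else:
            # Deletion/addition neighbor: dropping one phoneme from the
            # longer form must yield the shorter form.
            short, long_ = sorted((word, other), key=len)
            if any(long_[:i] + long_[i + 1:] == short for i in range(len(long_))):
                found.append(other)
    return found

# Toy lexicon (invented transcriptions): "cat" has many neighbors and so
# sits in a relatively dense neighborhood; "dog" here has none.
lexicon = [("k", "ae", "t"), ("b", "ae", "t"), ("k", "ah", "t"),
           ("k", "ae", "p"), ("k", "ae"), ("s", "k", "ae", "t"),
           ("d", "ao", "g")]
density = len(neighbors(("k", "ae", "t"), lexicon))  # 5 neighbors
```

Splitting items by such counts into high- versus low-density sets is what underlies the behavioral and neuroimaging contrasts discussed next.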
Behavioral research has shown that reaction-time latencies in a lexical decision task are slower for words from high-density neighborhoods than for words from low-density neighborhoods (Luce & Pisoni, 1998). These findings are consistent with the view that it takes greater computational resources to select a word when there are many potential competitors than when there are few. fMRI results show greater activation in the posterior STG and the SMG in a lexical decision task for words from high-density compared with low-density neighborhoods (Okada & Hickok, 2006; Prabhakaran, Blumstein, Myers, Hutchison, & Britton, 2006). These findings suggest that these areas are recruited in accessing the lexical representations of words and that processing demands for accessing words vary as a function of the set of competing word targets. They are also consistent with studies indicating that the posterior STG and SMG are recruited in phonological and lexical processing (Binder & Price, 2001; Indefrey & Levelt, 2004; Paulesu, Frith, & Frackowiak, 1993).
Of importance, a series of fMRI studies has shown that information flow under conditions of lexical competition cascades throughout the lexical processing stream, activating the posterior STG, SMG, AG, and IFG. Using the visual world paradigm coupled with fMRI, Righi et al. (2010) showed increased activation in a temporal-parietal-frontal network for words that shared onsets, that is, initial sound segments or initial syllables. In these studies, subjects were required to look at the picture of an auditorily presented word in an array of four pictures that included the named picture (hammock), a picture of a word that shared the onset of the target word (hammer), and two other pictures that had neither a phonological nor a semantic relationship with the target word (monkey, chocolate). Participants’ eye movements were tracked during fMRI scanning. Behavioral results of participants in the scanner replicated earlier findings showing more looks to the onset competitor than to the unrelated stimuli (Allopenna, Magnuson, & Tanenhaus, 1998). These findings show that the presence of phonological competition not only modulates activation in the posterior STG and the SMG, areas implicated in phonological processing and lexical access, but also has a modulatory effect on activation in frontal areas, in particular the IFG, an area implicated in selecting among competing semantic alternatives (Thompson-Schill, D’Esposito, Aguirre, & Farah, 1997; Thompson-Schill, D’Esposito, & Kan, 1999).
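The onset overlap that defines a competitor in such displays reduces to a simple check on shared initial phonemes. The sketch below is illustrative only: the phoneme transcriptions and the three-phoneme overlap criterion are assumptions made for the example, not the stimuli or scoring used by Righi et al. (2010).

```python
def onset_competitors(target, candidates, overlap=3):
    """Return candidates that share at least `overlap` initial phonemes
    with the target, i.e., potential onset (cohort) competitors."""
    return [c for c in candidates
            if c != target and c[:overlap] == target[:overlap]]

# A four-picture display like hammock/hammer/monkey/chocolate, with
# invented broad transcriptions for illustration.
display = {
    "hammock": ("h", "ae", "m", "ah", "k"),
    "hammer": ("h", "ae", "m", "er"),
    "monkey": ("m", "ah", "ng", "k", "iy"),
    "chocolate": ("ch", "ao", "k", "l", "ah", "t"),
}
competitors = onset_competitors(display["hammock"], list(display.values()))
# Only "hammer" overlaps the target's onset; monkey and chocolate do not.
```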
That the IFG is also activated in selecting among competing phonological alternatives raises two possibilities, as yet unresolved in the literature, about the functional role of the IFG in resolving competition. One possibility is that selection processes are domain general and cut across different levels of the grammar (cf. Duncan, 2001; Duncan & Owen, 2000; Miller & Cohen, 2001; Smith & Jonides, 1999). Another possibility is that there is a functional subdivision of the IFG depending on the source of the competition. In particular, BA 44 is recruited in selection based on phonological properties (Buckner, Raichle, & Petersen, 1995; Burton, Small, & Blumstein, 2000; Fiez, 1997; Poldrack et al., 1999), and BA 45 is recruited in selection based on semantic properties (Snyder, Feigenson, & Thompson-Schill, 2007; Thompson-Schill et al., 1997, 1999). Whatever the ultimate conclusion, the critical point is that phonological properties of the lexicon have a modulatory effect along the speech-lexical processing stream, consistent with the view that information flow at one level of processing (phonological lexical access) cascades to and influences stages of processing downstream from it (lexical selection).
Research on auditory word recognition has shown deficits in aphasic patients in accessing words in the lexicon, especially under conditions of phonological and lexical competition (cf. Blumstein, 2009, for review). Of importance, different patterns of deficits emerge for patients with frontal lesions and for those with temporo-parietal lesions, suggesting that these areas play different functional roles in lexical access and lexical selection processes (Blumstein & Milberg, 2000; Utman, Blumstein, & Sullivan, 2001; Yee, Blumstein, & Sedivy, 2008).
To this point, it appears as though information flow in speech processing and in accessing the sound properties of words is unidirectional, going from lower level processes to increasingly more abstract representations and recruiting a processing stream from temporal to parietal to frontal areas. However, we also know that listeners show a lexical bias when processing the phonetic categories of speech, suggesting that higher level, lexical information may influence lower level, speech processing. In a seminal paper, Ganong (1980) showed that when presented with two continua varying along the same acoustic-phonetic attribute (e.g., the VOT of the initial stop consonant), listeners show a lexical bias: they perceive more [b]’s in a beef–peef continuum and more [p]’s in a beace–peace continuum. Of importance, the same stimulus at or near the phonetic boundary is perceived differently as a function of the lexical status of the continuum endpoints.
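One convenient way to picture the Ganong effect is as a shift of a logistic psychometric function’s category boundary toward the word endpoint of each continuum. The numbers below (a 25-ms neutral boundary, a 5-ms lexical shift, the slope) are invented for illustration; they are not estimates from Ganong (1980).

```python
import math

def p_voiceless(vot_ms, boundary_ms, slope=0.5):
    """Probability of a voiceless (/p/) response at a given VOT,
    modeled as a logistic psychometric function."""
    return 1.0 / (1.0 + math.exp(-slope * (vot_ms - boundary_ms)))

NEUTRAL_BOUNDARY = 25.0  # ms VOT; hypothetical bias-free boundary
LEXICAL_SHIFT = 5.0      # ms; hypothetical size of the lexical pull

# beef-peef: only the /b/ endpoint is a word, so the boundary moves to
# longer VOTs and listeners report more [b]'s.
boundary_beef_peef = NEUTRAL_BOUNDARY + LEXICAL_SHIFT
# beace-peace: only the /p/ endpoint is a word, so the boundary moves to
# shorter VOTs and listeners report more [p]'s.
boundary_beace_peace = NEUTRAL_BOUNDARY - LEXICAL_SHIFT

# The same physically ambiguous token is categorized differently in the
# two continua, mirroring the boundary shift Ganong observed.
ambiguous_vot = 25.0
p_beef_peef = p_voiceless(ambiguous_vot, boundary_beef_peef)      # mostly "b"
p_beace_peace = p_voiceless(ambiguous_vot, boundary_beace_peace)  # mostly "p"
```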
These findings raise the question of whether information flow in mapping sounds to words is solely bottom-up or whether top-down information affects lower level perceptual processes. There has been much debate about this question, first in the behavioral literature (cf. Burton et al., 1989; Connine & Clifton, 1987; McQueen, 1991; Pitt & Samuel, 1993), with some claiming that these results demonstrate that top-down lexical information shapes perceptual processes (Pitt & Samuel, 1993) and others claiming that they reflect postperceptual, decision-related processes (Fox, 1984), and more recently in the fMRI literature (Davis et al., 2011; Guediche et al., 2013). Evidence from functional neuroimaging provides a potential resolution to this debate. In particular, findings using fMRI (Myers & Blumstein, 2008) and MEG coupled with EEG (Gow, Segawa, Ahlfors, & Lin, 2008) are consistent with a functional architecture in which information flow is not just bottom-up but also top-down. In both studies, participants listened to stimuli taken from continua ranging from a word to a nonword (e.g., gift to kift) or from a nonword to a word (e.g., giss to kiss). Participants were asked to categorize the initial phoneme of each stimulus, and in both studies there was evidence of a shift in the perception of tokens from the continuum such that more categorizations were made that were consistent with the word endpoint (e.g., more “g” responses in the gift–kift continuum). Activation patterns in the STG reflected this shift in perception, showing modulation as a function of the lexically biased shift in the locus of the phonetic boundary. Given that modulation of activation was seen early in the neural processing stream, these data were taken as evidence that lexically biased shifts in perception affect early processing of speech stimuli in the STG. Indeed, the MEG/EEG source estimates suggest that information flow from the SMG, an area implicated in lexical processing, influenced activation in the left posterior STG (Gow et al., 2008).
Taken together, these results are consistent with a functional architecture of language in which information flow is bidirectional. Of interest, the STG appears to integrate phonetic information with multiple sources of linguistic information, including lexical and sentence-level (meaning) information (Chandrasekaran, Chan, & Wong, 2011; Obleser, Wise, Dresner, & Scott, 2007).
Evidence from fMRI and event-related potential (ERP) studies of non-native sound processing suggests that better proficiency in perceiving non-native speech contrasts is accompanied by neural activation patterns that increasingly resemble the patterns shown for native speech sounds. Specifically, increased leftward lateralization is found in the temporal lobe for those listeners who show better perceptual performance on trained non-native speech sounds (Naatanen et al., 1997; Zhang et al., 2009; cf. Naatanen, Paavilainen, Rinne, & Alho, 2007, for review). Moreover, increased sensitivity to non-native contrasts also correlates with less activation in the left IFG as measured by fMRI (Golestani & Zatorre, 2004; Myers & Blumstein, 2011). At the same time, increased proficiency in perceiving non-native sounds is accompanied by an increase in the size of the mismatch negativity, or MMN (Ylinen et al., 2010; Zhang et al., 2009), which is thought to index preattentive sensitivity to acoustic contrasts and to originate from sources in the bilateral temporal lobes and inferior frontal gyri.
These findings are not necessarily incompatible with one another. The neural sensitivity indexed by the MMN response may reflect more accurate or more efficient encoding of learned phonetic contrasts, particularly in the temporal lobes. In turn, decreases in activation in the left IFG for processing non-native speech sounds may reflect greater efficiency, and hence fewer neural resources needed to access and make decisions about category representations, when speech sound perception is well encoded in the temporal lobes.
In fact, there is some evidence suggesting that frontal areas play a more active role in determining the degree of success that learners attain with non-native contrasts. In an ERP study of bilingualism by Diaz et al. (2008), early Spanish-Catalan bilinguals were grouped by their mastery of a vowel contrast in their second language, Catalan. MMN responses to a non-native and a native vowel contrast, as well as to nonspeech stimuli, were assessed in a group of “good perceivers” and a group of “poor perceivers.” Although good and poor perceivers did not differ in their MMN response to either the native and non-native speech contrasts or the nonspeech stimuli at electrodes over the temporal poles, significant differences between the groups emerged for both types of speech contrasts over frontal electrode sites, with good perceivers showing a larger MMN to both non-native and native speech contrasts than poor perceivers.
Of interest, some recent research suggests that individual differences in activation for non-native speech sounds may arise at least in part from differences in brain morphology. In Heschl’s gyrus, greater volume in the left but not the right hemisphere correlates with individual success in learning a non-native tone contrast (Wong et al., 2008) and a non-native place of articulation contrast (Golestani, Molko, Dehaene, LeBihan, & Pallier, 2007). Given that Heschl’s gyrus is involved in auditory processing, these anatomical asymmetries give rise to the hypothesis that better learning of non-native contrasts comes about because some individuals are better able to perceive the fine-grained acoustic details of speech.
The question remains, however, whether individual differences in brain morphology are the cause or the consequence of differences in proficiency with non-native contrasts. That is, do preexisting, potentially innate differences in brain morphology support better learning, or do differences in cortical volume arise because of neural plasticity resulting from experience with the sounds of speech in adulthood? This question was examined in a recent study by Golestani and colleagues (2011) in which differences in brain morphology were measured in a group of trained phoneticians and a group of untrained controls. Greater white matter volume in Heschl’s gyrus was evident in the phoneticians compared with the controls. However, differences in these early auditory processing areas did not correlate with the amount of phonetic training experience among the phoneticians. In contrast, a subportion of the left IFG, the left pars opercularis, showed a significant correlation with years of phonetic training. Taken together, these results suggest that differences in the size and morphology of Heschl’s gyrus may confer an innate advantage in perceiving the fine-grained details of the speech stream, whereas the increased volume in frontal areas appears to reflect experience-dependent plasticity. Thus, phoneticians may become phoneticians because they have a “natural” propensity for perceiving speech, but their performance is enhanced by experience and by the extent to which they become “experts.”
Although there has been much progress, many questions remain unanswered. In particular, the direction of information flow has largely been inferred from our knowledge of the effects of lesions on speech perception processes. Studies that integrate imaging methods offering fine-grained temporal resolution (ERP, MEG) with the spatial resolution afforded by fMRI will be critical for understanding the extent to which feedforward or feedback mechanisms underlie the modulation of activation found, for example, in the influence of lexical information on the perception of the acoustic-phonetic properties of speech, the influence of motor and articulatory processes on speech perception, and the influence of the sound properties of speech on lexical access. Of particular interest is whether the IFG is involved solely in decision-related executive processes or whether it, in turn, modulates activation of neural areas downstream from it. That is, does IFG activation influence the activation of temporal lobe areas in the processes of phonetic categorization, and does it influence the activation of parietal lobe areas in the processes of lexical access?
At each level of processing, questions remain. The evidence we have reviewed suggests that the acoustic-phonetic properties of speech are extracted in different areas within the temporal lobe and, at least at early stages of processing, differentially recruit the right hemisphere. However, listeners perceive a unitary percept of individual sound segments and syllables. How does the neural system integrate or “bind” this information? As listeners we are attuned to fine acoustic differences, whether within-category variation or speaker variation, and yet we ignore these differences as we perceive a stable phonetic category or lexical representation. How does our neural system solve this invariance problem across the different sources of variability encountered by the listener? Results to date suggest that the resolution of this variability may recruit different neural areas, depending on the type of variability.
And finally, we know that adult listeners show some degree of plasticity in processing language. They can learn new languages and their attendant phonetic and phonological structures, and they show the ability to dynamically adapt to variability in the speech and language input when accessing the sound structure and lexical representations of their native language (see Kraljic & Samuel, 2007, for a review). What are the neural systems underlying this plasticity? Is the same neural system recruited that underlies adult processing of the sound structure of language, or are other areas recruited in support of such learning?
These are only a few of the questions remaining to be answered, but together they set an agenda for future research on the neural systems underlying speech perception processes.
Author Note
This research was supported in part by NIH NIDCD grants R01 DC006220, R01 DC00314, and R03 DC009495 to Brown University and NIH NIDCD grant P30 DC010751 to the University of Connecticut. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute on Deafness and Other Communication Disorders or the National Institutes of Health.
References
Allopenna, P. D., Magnuson, J. S., & Tanenhaus, M. K. (1998). Tracking the time course of
spoken word recognition using eye movements: Evidence for continuous mapping models.
Journal of Memory and Language, 38 (4), 419–439.
Auerbach, S. H., Allard, T., Naeser, M., Alexander, M. P., & Albert, M. L. (1982). Pure word
deafness: Analysis of a case with bilateral lesions and a defect at the prephonemic level.
Brain: A Journal of Neurology, 105 (Pt 2), 271–300.
Belin, P., & Zatorre, R. J. (2003). Adaptation to speaker’s voice in right anterior temporal lobe. NeuroReport, 14 (16), 2105–2109.
Binder, J. R., & Price, C. (2001). Functional neuroimaging of language. In R. Cabeza & A. Kingstone (Eds.), Handbook of functional neuroimaging of cognition (pp. 187–251). Cambridge, MA: MIT Press.
Blumstein, S. E. (2009). Auditory word recognition: Evidence from aphasia and functional
neuroimaging. Language and Linguistics Compass, 3, 824–838.
Blumstein, S. E., Cooper, W. E., Zurif, E. B., & Caramazza, A. (1977). The perception and
production of voice-onset time in aphasia. Neuropsychologia, 15, 371–383.
Blumstein, S. E., & Milberg, W. (2000). Language deficits in Broca’s and Wernicke’s aphasia: A singular impairment. In Y. Grodzinsky, L. Shapiro, & D. Swinney (Eds.), Language and the brain: Representation and processing (pp. 167–183). New York: Academic Press.
Blumstein, S. E., Myers, E. B., & Rissman, J. (2005). The perception of voice onset time:
An fMRI investigation of phonetic category structure. Journal of Cognitive Neuroscience,
17 (9), 1353–1366.
Blumstein, S. E., & Stevens, K. N. (1980). Perceptual invariance and onset spectra for
stop consonants in different vowel environments. Journal of the Acoustical Society of
America, 67 (2), 648–662.
Boatman, D. F., & Miglioretti, D. L. (2005). Cortical sites critical for speech discrimination
in normal and impaired listeners. Journal of Neuroscience, 25 (23), 5475–5480.
Bonte, M., Valente, G., & Formisano, E. (2009). Dynamic and task-dependent encoding of
speech and voice by phase reorganization of cortical oscillations. Journal of Neuroscience,
29 (6), 1699.
Bradlow, A. R., Pisoni, D. B., Akahane-Yamada, R., & Tohkura, Y. (1997). Training Japanese listeners to identify English /r/ and /l/: IV. Some effects of perceptual learning on
speech production. Journal of the Acoustical Society of America, 101 (4), 2299–2310.
Britton, B., Blumstein, S. E., Myers, E. B., & Grindrod, C. (2009). The role of spectral and
durational properties on hemispheric asymmetries in vowel perception. Neuropsychologia,
47 (4), 1096–1106.
Buckner, R. L., Raichle, M. E., & Petersen, S. E. (1995). Dissociation of human prefrontal
cortical areas across different speech production tasks and gender groups. Journal of
Neurophysiology, 74 (5), 2163–2173.
Burton, M. W. (2001). The role of inferior frontal cortex in phonological processing. Cognitive Science, 25, 695–709.
Burton, M. W. (2009). Understanding the role of the prefrontal cortex in phonological processing. Clinical Linguistics and Phonetics, 23 (3), 180–195.
Burton, M. W., Baum, S. R., & Blumstein, S. E. (1989). Lexical effects on the phonetic categorization of speech: The role of acoustic structure. Journal of Experimental Psychology:
Human Perception and Performance, 15, 567–575.
Burton, M. W., Small, S. L., & Blumstein, S. E. (2000). The role of segmentation in phonological processing: An fMRI investigation. Journal of Cognitive Neuroscience, 12 (4), 679–
690.
Caplan, D., Gow, D., & Makris, N. (1995). Analysis of lesions by MRI in stroke patients
with acoustic-phonetic processing deficits. Neurology, 45, 293–298.
Chandrasekaran, B., Chan, A. H. D., & Wong, P. C. M. (2011). Neural processing of what and who information in speech. Journal of Cognitive Neuroscience, 23 (10), 2690–2700.
Chang, E. F., Rieger, J. W., Johnson, K., Berger, M. S., Barbaro, N. M., & Knight, R. T.
(2010). Categorical speech representation in human superior temporal gyrus. Nature
Neuroscience, 13 (11), 1428–1432.
Connine, C. M., & Clifton, C. (1987). Interactive use of lexical information in speech perception. Journal of Experimental Psychology: Human Perception and Performance, 13,
291–299.
Coslett, H. B., Brashear, H. R., & Heilman, K. M. (1984). Pure word deafness after bilateral primary auditory cortex infarcts. Neurology, 34 (3), 347–352.
Davis, M. H., Ford, M. A., Kherif, F., & Johnsrude, I. S. (2011). Does semantic context benefit speech understanding through “top-down” processes? Evidence from time-resolved
sparse fMRI. Journal of Cognitive Neuroscience, 23 (12), 3914–3932.
Desai, R., Liebenthal, E., Waldron, E., & Binder, J. R. (2008). Left posterior temporal regions are sensitive to auditory categorization. Journal of Cognitive Neuroscience, 20 (7),
1174–1188.
Diaz, B., Baus, C., Escera, C., Costa, A., & Sebastian-Galles, N. (2008). Brain potentials to native phoneme discrimination reveal the origin of individual differences in learning the sounds of a second language. Proceedings of the National Academy of Sciences USA, 105 (42), 16083–16088.
Duncan, J. (2001). An adaptive model of neural function in prefrontal cortex. Nature Reviews Neuroscience, 2, 820–829.
Duncan, J., & Owen, A. M. (2000). Common regions of the human frontal lobe recruited
by diverse cognitive demands. Trends in Neurosciences, 23, 475–483.
Fiez, J. A. (1997). Phonology, semantics, and the role of the left inferior prefrontal cortex.
Human Brain Mapping, 5 (2), 79–83.
Formisano, E., De Martino, F., Bonte, M., & Goebel, R. (2008). “Who” is saying “what”?
Brain-based decoding of human voice and speech. Science, 322, 970–973.
Frye, R. E., Fisher, J. M. G., Witzel, T., Ahlfors, S. P., Swank, P., Liederman, J., &
Frye, R. E., Fisher, J. M., Coty, A., Zarella, M., Liederman, J., & Halgren, E. (2007). Linear
coding of voice onset time. Journal of Cognitive Neuroscience, 19 (9), 1476–1487.
Galantucci, B., Fowler, C. A., & Turvey, M. T. (2006). The motor theory of speech perception reviewed. Psychonomic Bulletin and Review, 13 (3), 361–377.
Gandour, J., Wong, D., Hsieh, L., Weinzapfel, B., Van Lancker, D., & Hutchins, G. D. (2000).
A crosslinguistic PET study of tone perception. Journal of Cognitive Neuroscience, 12 (1),
207–222.
Gaskell, M. G., & Marslen-Wilson, W. D. (1997). Integrating form and meaning: A distributed model of speech perception. Language and Cognitive Processes, 12, 613–656.
Goldrick, M., & Blumstein, S. E. (2006). Cascading activation from phonological planning
to articulatory processes: Evidence from tongue twisters. Language and Cognitive
Processes, 21, 649–683.
Golestani, N., Molko, N., Dehaene, S., LeBihan, D., & Pallier, C. (2007). Brain structure
predicts the learning of foreign speech sounds. Cerebral Cortex, 17 (3), 575.
Golestani, N., Price, C. J., & Scott, S. K. (2011). Born with an ear for dialects? Structural
plasticity in the expert phonetician brain. Journal of Neuroscience, 31 (11), 4213–4220.
Golestani, N., & Zatorre, R. J. (2004). Learning new sounds of speech: Reallocation of neural substrates. NeuroImage, 21 (2), 494–506.
Gow, D. W., Segawa, J. A., Ahlfors, S. P., & Lin, F.-H. (2008). Lexical influences on speech
perception: A Granger causality analysis of MEG and EEG source estimates. NeuroImage,
43 (3), 614–623.
Grill-Spector, K., Henson, R., & Martin, A. (2006). Repetition and the brain: Neural models of stimulus-specific effects. Trends in Cognitive Sciences, 10 (1), 14–23.
Grill-Spector, K., & Malach, R. (2001). fMR-adaptation: A tool for studying the functional
properties of human cortical neurons. Acta Psychologica (Amsterdam), 107 (1-3), 293–
321.
Guediche, S., Salvata, C., & Blumstein S. E. (2013). Temporal cortex reflects effects of
sentence context on phonetic processing. Journal of Cognitive Neuroscience, 25 (5), 706–
718.
Guenther, F. H., Nieto-Castanon, A., Ghosh, S. S., & Tourville, J. A. (2004). Representation
of sound categories in auditory cortical maps. Journal of Speech, Language, and Hearing
Research, 47 (1), 46–57.
Hickok, G., & Poeppel, D. (2004). Dorsal and ventral streams: A framework for understanding aspects of the functional anatomy of language. Cognition, 92 (1–2), 67–99.
Hutchison, E. R., Blumstein, S. E., & Myers, E. B. (2008). An event-related fMRI investigation of voice-onset time discrimination. NeuroImage, 40 (1), 342–352.
Indefrey, P., & Levelt, W. J. M. (2004). The spatial and temporal signatures of word production components. Cognition, 92 (1–2), 101–144.
Joanisse, M. F., Zevin, J. D., & McCandliss, B. D. (2007). Brain mechanisms implicated in the preattentive categorization of speech sounds revealed using fMRI and a short-interval habituation trial paradigm. Cerebral Cortex, 17 (9), 2084–2093.
Kasai, K., Yamada, H., Kamio, S., Nakagome, K., Iwanami, A., Fukuda, M., Itoh, K., Koshida, I., Yumoto, M., Iramina, K., Kato, N., & Ueno, S. (2001). Brain lateralization for mismatch response to across- and within-category change of vowels. NeuroReport, 12 (11), 2467–2471.
Kraljic, T., & Samuel, A. G. (2005). Perceptual learning for speech: Is there a return to
normal? Cognitive Psychology, 51 (2), 141–178.
Kraljic, T., & Samuel, A. G. (2007). Perceptual adjustments to multiple speakers. Journal
of Memory and Language, 56, 1–15.
Kuhl, P. K. (1991). Human adults and human infants show a “perceptual magnet effect”
for the prototypes of speech categories, monkeys do not. Perception and Psychophysics,
50 (2), 93–107.
Leff, A. P., Iverson, P., Schofield, T. M., Kilner, J. M., Crinion, J. T., Friston, K. J., & Price, C.
J. (2009). Vowel-specific mismatch responses in the anterior superior temporal gyrus: An
fMRI study. Cortex, 45 (4), 517–526.
Levelt, W. J. M. (1992). Accessing words in speech production: Stages, processes, and representations. Cognition, 42, 1–22.
Liberman, A. M., Delattre, P. C., & Cooper, F. S. (1958). Some cues for the distinction between voiceless and voiced stops in initial position. Language and Speech, 1, 153–157.
Liberman, A. M., Harris, K. S., Hoffman, H. S., & Griffith, B. C. (1957). The discrimination of speech sounds within and across phoneme boundaries. Journal of Experimental Psychology, 54 (5), 358–368.
Liberman, A. M., & Mattingly, I. G. (1985). The motor theory of speech perception revised. Cognition, 21 (1), 1–36.
Liebenthal, E., Desai, R., Ellingson, M. M., Ramachandran, B., Desai, A., & Binder, J. R.
(2010). Specialization along the left superior temporal sulcus for auditory categorization.
Cerebral Cortex, 20, 2958–2970.
Liégeois-Chauvel, C., de Graaf, J. B., Laguitton, V., & Chauvel, P. (1999). Specialization of
left auditory cortex for speech perception in man depends on temporal coding. Cerebral
Cortex, 9 (5), 484–496.
Lisker, L., & Abramson, A. S. (1964). A cross-language study of voicing in initial stops:
Acoustical measurements. Word, 20, 384–422.
Luce, P. A., & Pisoni, D. B. (1998). Recognizing spoken words: The neighborhood activation model. Ear and Hearing, 19 (1), 1–36.
Marslen-Wilson, W. D., & Welsh, A. (1978). Processing interactions and lexical access during word recognition in continuous speech. Cognitive Psychology, 10 (1), 29–63.
McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech perception. Cognitive Psychology, 18 (1), 1–86.
McGee, T., Kraus, N., King, C., Nicol, T., & Carrell, T. D. (1996). Acoustic elements of speechlike stimuli are reflected in surface recorded responses over the guinea pig temporal lobe. Journal of the Acoustical Society of America, 99 (6), 3606–3614.
Miller, E. K., & Cohen, J. D. (2001). An integrative theory of prefrontal cortex function. Annual Review of Neuroscience, 24, 167–202.
Miller, J. L., & Volaitis, L. E. (1989). Effect of speaking rate on the perceptual structure of
a phonetic category. Perception and Psychophysics, 46 (6), 505–512.
Myers, E. B., & Blumstein, S. E. (2008). The neural bases of the lexical effect: An fMRI investigation. Cerebral Cortex, 18 (2), 278.
Myers, E. B., Blumstein, S. E., Walsh, E., & Eliassen, J. (2009). Inferior frontal regions underlie the perception of phonetic category invariance. Psychological Science, 20 (7), 895–903.
Naatanen, R., Lehtokoski, A., Lennes, M., Cheour, M., Huotilainen, M., Iivonen, A., Vainio,
M., et al. (1997). Language-specific phoneme representations revealed by electric and
magnetic brain responses. Nature, 385 (6615), 432–434.
Naatanen, R., Paavilainen, P., Rinne, T., & Alho, K. (2007). The mismatch negativity
(MMN) in basic research of central auditory processing: a review. Clinical Neurophysiolo
gy, 118 (12), 2544–2590.
Obleser, J., Boecker, H., Drzezga, A., Haslinger, B., Hennenlotter, A., Roettinger, M., Eu
litz, C., et al. (2006). Vowel sound extraction in anterior superior temporal cortex. Human
Brain Mapping, 27 (7), 562–571.
Obleser, J., Elbert, T., Lahiri, A., & Eulitz, C. (2003). Cortical representation of vowels re
flects acoustic dissimilarity determined by formant frequencies. Cognitive Brain
Research, 15 (3), 207–213.
Obleser, J., Wise, R. J. S., Dresner, M. A., & Scott, S. K. (2007). Functional integration
across brain regions improves speech perception under adverse listening conditions. Jour
nal of Neuroscience, 27 (9), 2283–2289.
Okada, K., & Hickok, G. (2006). Identification of lexical-phonological networks in the su
perior temporal sulcus using functional magnetic resonance imaging. NeuroReport, 17
(12), 1293–1296.
Pallier, C., Bosch, L., & Sebastián-Gallés, N. (1997). A limit on behavioral plasticity in speech perception. Cognition, 64 (3), B9–B17.
Palmeri, T. J., Goldinger, S. D., & Pisoni, D. B. (1993). Episodic encoding of voice attributes and recognition memory for spoken words. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19 (2), 309–328.
Papanicolaou, A. C., Castillo, E., Breier, J. I., Davis, R. N., Simos, P. G., & Diehl, R. L. (2003). Differential brain activation patterns during perception of voice and tone onset time series: A MEG study. NeuroImage, 18 (2), 448–459.
Paulesu, E., Frith, C. D., & Frackowiak, R. S. J. (1993). The neural correlates of the verbal component of working memory. Nature, 362 (6418), 342–345.
Peramunage, D., Blumstein, S. E., Myers, E. B., Goldrick, M., & Baese-Berk, M. (2011). Phonological neighborhood effects in spoken word production: An fMRI study. Journal of Cognitive Neuroscience, 23 (3), 593–603.
Peterson, G. E., & Barney, H. L. (1952). Control methods used in a study of vowels. Journal of the Acoustical Society of America, 24, 175–184.
Pisoni, D. B., & Tash, J. (1974). Reaction times to comparisons within and across phonetic categories. Perception and Psychophysics, 15, 289–290.
Pitt, M. A., & Samuel, A. G. (1993). An empirical and meta-analytic evaluation of the phoneme identification task. Journal of Experimental Psychology: Human Perception and Performance, 19, 699–725.
Poeppel, D. (2001). Pure word deafness and the bilateral processing of the speech code. Cognitive Science, 25 (5), 679–693.
Poldrack, R. A., Wagner, A. D., Prull, M. W., Desmond, J. E., Glover, G. H., & Gabrieli, J. D. (1999). Functional specialization for semantic and phonological processing in the left inferior prefrontal cortex. NeuroImage, 10 (1), 15–35.
Prabhakaran, R., Blumstein, S. E., Myers, E. B., Hutchison, E., & Britton, B. (2006). An event-related fMRI investigation of phonological-lexical competition. Neuropsychologia, 44 (12), 2209–2221.
Remez, R. E., Rubin, P. E., Pisoni, D. B., & Carrell, T. D. (1981). Speech perception without traditional speech cues. Science, 212 (4497), 947–949.
Righi, G., Blumstein, S. E., Mertus, J., & Worden, M. S. (2010). Neural systems underlying lexical competition: An eye tracking and fMRI study. Journal of Cognitive Neuroscience, 22 (2), 213–224.
Salvata, C., Blumstein, S. E., & Myers, E. B. (2012). Speaker invariance for phonetic information: An fMRI investigation. Language and Cognitive Processes, 27 (2), 210–230.
Sharma, A., & Dorman, M. F. (1999). Cortical auditory evoked potential correlates of categorical perception of voice-onset time. Journal of the Acoustical Society of America, 106 (2), 1078–1083.
Smith, E. E., & Jonides, J. (1999). Storage and executive processes in the frontal lobes. Science, 283, 1657–1661.
Snyder, H. R., Feignson, K., & Thompson-Schill, S. L. (2007). Prefrontal cortical response to conflict during semantic and phonological tasks. Journal of Cognitive Neuroscience, 19, 761–775.
Steinschneider, M., Schroeder, C. E., Arezzo, J. C., & Vaughan, H. G. (1995). Physiologic correlates of the voice onset time boundary in primary auditory cortex (A1) of the awake monkey: Temporal response patterns. Brain and Language, 48 (3), 326–340.
Steinschneider, M., Volkov, I. O., Noh, M. D., Garell, P. C., & Howard, M. A. (1999). Temporal encoding of the voice onset time phonetic parameter by field potentials recorded directly from human auditory cortex. Journal of Neurophysiology, 82 (5), 2346–2357.
Stevens, K. N., & Blumstein, S. E. (1978). Invariant cues for place of articulation in stop consonants. Journal of the Acoustical Society of America, 64, 1358–1368.
Strange, W. (1989). Evolving theories of vowel perception. Journal of the Acoustical Society of America, 85 (5), 2081–2087.
Strange, W., Jenkins, J., & Johnson, T. L. (1983). Dynamic specification of coarticulated vowels. Journal of the Acoustical Society of America, 74 (3), 697–705.
Tallal, P., & Newcombe, F. (1978). Impairment of auditory perception and language comprehension in aphasia. Brain and Language, 5, 13–24.
Thompson-Schill, S. L., D’Esposito, M., Aguirre, G. K., & Farah, M. J. (1997). Role of the left inferior prefrontal cortex in retrieval of semantic knowledge: A reevaluation. Proceedings of the National Academy of Sciences USA, 94, 14792–14797.
Thompson-Schill, S. L., D’Esposito, M., & Kan, I. P. (1999). Effects of repetition and competition on activation in left prefrontal cortex during word generation. Neuron, 23, 513–522.
Utman, J. A., Blumstein, S. E., & Sullivan, K. (2001). Mapping from sound to meaning: Reduced lexical activation in Broca’s aphasics. Brain and Language, 79, 444–472.
von Kriegstein, K., Eger, E., Kleinschmidt, A., & Giraud, A. L. (2003). Modulation of neural responses to speech by directing attention to voices or verbal content. Cognitive Brain Research, 17 (1), 48–55.
Wildgruber, D., Pihan, H., Ackermann, H., Erb, M., & Grodd, W. (2002). Dynamic brain activation during processing of emotional intonation: Influence of acoustic parameters, emotional valence, and sex. NeuroImage, 15 (4), 856–869.
Wilson, S. M., & Iacoboni, M. (2006). Neural responses to non-native phonemes varying in producibility: Evidence for the sensorimotor nature of speech perception. NeuroImage, 33 (1), 316–325.
Wilson, S. M., Saygin, A. P., Sereno, M. I., & Iacoboni, M. (2004). Listening to speech activates motor areas involved in speech production. Nature Neuroscience, 7 (7), 701–702.
Wong, P. C. M., Nusbaum, H. C., & Small, S. L. (2004). Neural bases of talker normalization. Journal of Cognitive Neuroscience, 16 (7), 1173–1184.
Wong, P., Warrier, C. M., Penhune, V. B., Roy, A. K., Sadehh, A., Parrish, T. B., & Zatorre, R. J. (2008). Volume of left Heschl’s gyrus and linguistic pitch learning. Cerebral Cortex, 18 (4), 828–836.
Xu, Y., Gandour, J., Talavage, T., Wong, D., Dzemidzic, M., Tong, Y., Li, X., et al. (2006). Activation of the left planum temporale in pitch processing is shaped by language experience. Human Brain Mapping, 27 (2), 173–183.
Yee, E., Blumstein, S. E., & Sedivy, J. C. (2008). Lexical-semantic activation in Broca’s and Wernicke’s aphasia: Evidence from eye movements. Journal of Cognitive Neuroscience, 20, 592–612.
Ylinen, S., Uther, M., Latvala, A., Vepsäläinen, S., Iverson, P., Akahane-Yamada, R., & Näätänen, R. (2010). Training the brain to weight speech cues differently: A study of Finnish second-language users of English. Journal of Cognitive Neuroscience, 22 (6), 1319–1332.
Zatorre, R. J., Belin, P., & Penhune, V. B. (2002). Structure and function of auditory cortex: Music and speech. Trends in Cognitive Sciences, 6 (1), 37–46.
Zevin, J. D., Yang, J., Skipper, J. I., & McCandliss, B. D. (2010). Domain general change detection accounts for “dishabituation” effects in temporal-parietal regions in functional
Zhang, Y., Kuhl, P. K., Imada, T., Iverson, P., Pruitt, J., Stevens, E. B., Kawakatsu, M., et al. (2009). Neural signatures of phonetic learning in adulthood: A magnetoencephalography study. NeuroImage, 46 (1), 226–240.
Sheila Blumstein
Sheila Blumstein is the Albert D. Mead Professor of Cognitive, Linguistic and Psychological Sciences at Brown University.
Emily B. Myers
Multimodal Speech Perception
Spoken language can be understood through different sensory modalities. Audition, vision, and haptic perception can each transduce speech information from a talker as a single channel of information. The more natural context for communication, however, is for language to be perceived through multiple modalities and for multimodal integration to occur. This chapter reviews the sensory information provided by talkers and the constraints on multimodal information processing. The information generated during speech comes from a common source, the moving vocal tract, and thus shows significant correlations across modalities. In addition, the modalities provide complementary information for the perceiver; for example, the place of articulation of speech sounds is conveyed more robustly by vision. These factors explain why multisensory speech perception is more robust and accurate than unisensory perception. The neural networks responsible for this perceptual activity are diverse and still not well understood.
The brain’s ability to combine sensory information from multiple modalities into a single, unified percept is a key feature of organisms’ successful interaction with the external world. Psychophysical studies have demonstrated that multisensory integration results in perceptual enhancement by reducing ambiguity and therefore enhancing our ability to react to external events (Ernst & Banks, 2002; Welch & Warren, 1986). Furthermore, given that information specified in each sensory modality reflects different features of the same stimulus, the integration of sensory inputs provides complementary information about the environment, increasing the likelihood of its accurate identification (O’Hare, 1991). In short, multisensory integration produces a much richer and more diverse sensory experience than that offered by each sensory modality in isolation (Stein & Meredith, 1993).
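The reduction of ambiguity reported by Ernst and Banks (2002) has a standard formalization, sketched here for concreteness (the chapter itself does not give the equations). If the auditory and visual estimates of a stimulus property, s_A and s_V, are corrupted by independent noise with variances σ_A² and σ_V², the statistically optimal (maximum-likelihood) combined estimate weights each cue by its reliability:

ŝ_AV = w_A s_A + w_V s_V, where w_A = (1/σ_A²) / (1/σ_A² + 1/σ_V²) and w_V = 1 − w_A,

so that the variance of the combined estimate, σ_AV² = σ_A² σ_V² / (σ_A² + σ_V²), is always smaller than that of either cue alone. On this account, perceptual enhancement follows directly from reliability-weighted averaging of the two channels.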
Given the multiple levels of audiovisual speech, a comprehensive explanation of what information is available for speech therefore requires a more specific statement of what unit of communication is being examined. For example, if the level of communication is the message and its meaning, then global features, such as posture and tone of speech, may be important. At times, how you say something, not what you say, conveys the important information. At a more micro level, the unit might be the individual sound or word, a dimension that, again, can be acquired acoustically (i.e., speech sounds) as well as optically (i.e., articulatory movements). To a great extent, this more fine-grained analysis of speech has been the major focus of research, and it will be the level at which we consider the existing literature.
sound propagates through the vocal tract, the spatial configuration of the articulator structures (e.g., velum, palate, tongue, lips) creates resonances that give each sound its final characteristic spectral shape. Thus, when talking, the articulators must move to achieve the spatial-temporal configurations necessary for each speech sound. Such time-varying configurations of the vocal tract not only deform the immediate regions around the oral aperture but also encompass a much larger region of the face (Preminger et al., 1998; Thomas & Jordan, 2004; Yehia et al., 1998). Because some of the articulators of the oral aperture are visible along with their distributed correlates around the face, there is an inherent relationship between the acoustic speech sounds and their visible generators.
al, however, a few basic features can be indexed to signal place, manner, and voicing of consonants or frontness versus backness and height of vowels.

The absence of reliable acoustic boundary markers in phoneme segments led a number of researchers to abandon the notion that a phonemic level of representation is activated during speech processing. Alternative accounts proposed, among others, syllables (see Dupoux, 1993, for a review), context-sensitive allophones (Wickelgren, 1969), and articulatory gestures (Fowler, 1986; Liberman et al., 1967; Liberman & Mattingly, 1985)2 as the minimal units that listeners could use to parse the incoming input. Some alternative models have even proposed that lexical representations are compared directly with a transformed speech signal, with no intermediate stages of processing (Klatt, 1989).

Finally, when considering the mapping from prelexical segments to words stored in the mental lexicon, some models have proposed that information flows in only one direction, from sounds to words, without any backward influence (e.g., autonomous models such as the merge model; Norris et al., 2000). According to other, interactive models, however, the flow of information between prelexical and lexical stages of processing is bidirectional, with top-down feedback from lexical forms to earlier acoustic and phonemic processing contributing to word recognition (e.g., the TRACE model; McClelland & Elman, 1986). The debate between autonomous and interactive models of word recognition has become a central topic in the psychology of speech and seems far from being resolved (see Norris et al., 1995, for a review).
guistic context (e.g., Rönnberg et al., 1998; Samuelsson & Rönnberg, 1993) or training. Although there is great individual variability in the ability to speech-read, some highly skilled individuals are capable of reaching high comprehension levels (e.g., Andersson & Lidestam, 2005).

Because of this high degree of visual confusability, one might expect many words to be indistinguishable from each other on the basis of visual information alone (a phenomenon called homophony; Berger, 1972; Nitchie, 1916). However, some studies have shown that a reduction in phonemic distinctiveness does not necessarily imply a loss of word recognition in visual speech perception. Indeed, lexical intelligibility during speech reading is much better than would be expected on the basis of the visemic repertoire alone (Auer, 2002; Auer & Bernstein, 1997; Bernstein et al., 1997; Mattys et al., 2002). Additional information, such as knowledge of phonotactic constraints (i.e., the phoneme patterns that constitute words in the language; Auer & Bernstein, 1997), reduced lexical density (i.e., the degree of visual similarity to other words in the lexicon), high frequency of occurrence (Auer, 2009), and semantic predictability (Gagné et al., 1991), may allow individuals to identify a word even from a reduced phonetic representation.

Visual primitives of speech have also been described based on time-varying features of articulation (see Jackson, 1988; Summerfield, 1987). For instance, Rosenblum and Saldaña (1996) showed that isolated visual time-varying information of the speech event is sufficient for its identification. In fact, recent research suggests that the information that can be retrieved from visual speech may be much more detailed than previously thought.
Growing evidence shows that subtle jaw, lip, and cheek movements can be perceived by a viewer, thus allowing finer distinctions within gestures belonging to the same viseme (Vatikiotis-Bateson et al., 1996; Yehia et al., 2002). This suggests that visible speech features can be described along kinematic as well as static dimensions, a point to which we shall return in a following section (see later section, Correlated and Complementary Nature of Facial Movements and Vocal Acoustics).

Besides providing information about the phonemes, visual speech cues have been shown to improve the recognition of prosodic aspects of the message. For example, phonologically similar languages can be discriminated on the basis of visual information alone, possibly owing to differences in the rhythmic pattern of the languages (Soto-Faraco et al., 2007; Weikum et al., 2007). Similarly, other prosodic dimensions, such as emphatic stress, sentence intonation (Fisher et al., 1969), and pitch changes associated with lexical tone (Burnham et al., 2000), can be processed by using visual cues alone, sometimes even when the lower part of the face is occluded (Cvejic et al., 2010; Davis & Kim, 2006). For instance, raising eyebrow movements (Granström et al., 1999) or eye widening (Massaro & Beskow, 2002) can serve as an independent prosodic cue to prominence, and observers preferentially look at these regions in prosody-related judgments (Buchan et al., 2004; Lansing & McConkie, 1999). In fact, visual cues such as movements of the head and eyebrows have been shown to correlate with the basic auditory cues for prosody, namely changes in voice pitch, loudness, or duration (Foxton et al., 2009; Munhall et al., 2003). However, the visual contribution to the prosodic aspects of the message is not necessarily limited to the upper part of the face; head movements also involve parts of the lower face (i.e., the chin; Figure 26.2). Neck and chin movements can provide information about the identity of lexical tones (e.g., Chen & Massaro, 2008), and the magnitude of mouth movements can possibly serve as a cue for the perception of loudness (i.e., amplitude). Therefore, the different regions of the face can be informative for various dimensions and to various degrees.
When speech is perceived bimodally, such as audiovisually or even visuotactually (see later section, Tactile Contributions to Speech), perception is often enhanced in a synergistic fashion, especially if the auditory input is somehow degraded. That is, speech-reading performance in bimodal conditions is better than the pooled performance from the unimodal conditions (Summerfield, 1987). In the first known experimental demonstration of this phenomenon (Cotton, 1935), the voice of a talker speaking inside a dark sound booth was filtered and masked with a loud buzzing noise, rendering it almost unintelligible. However, when the lights of the sound booth were switched on and participants could see the talker’s face, they correctly reported most of the words. Almost 20 years later, these findings were quantified by Sumby and Pollack (1954), who showed that when the intelligibility of acoustic speech is impoverished by adding noise, the concurrent presentation of its corresponding visual speech cues can improve comprehension to a degree equivalent to increasing the acoustic signal-to-noise ratio by 15 to 20 dB (see also Rosenblum et al., 1996; Ross et al., 2006; Figure 26.3). Because this gain in performance is so large, the combination of multiple sources of information about speech has the potential to be extremely useful for listeners with hearing impairments (Berger, 1972). Indeed, the benefits derived from having access to the speaker’s facial speech information have been documented in listeners with mild to severe types of hearing loss (Payton et al., 1994; Picheny et al., 1985) and even in deaf listeners with cochlear implants (Rouger et al., 2007, 2008).
Nonetheless, the visual contribution to speech perception has been demonstrated even in circumstances in which the auditory signal is not degraded. For example, novice speakers of a second language often report that face-to-face conversation is easier than situations without visual support (Navarra & Soto-Faraco, 2007; Reisberg et al., 1987). Similarly, Reisberg et al. (1987) demonstrated that when listening to perfectly audible messages from a speaker with a heavy foreign accent or to a passage with difficult semantic content (e.g., Kant’s Critique of Pure Reason), the availability of visual information enhances comprehension (see also Arnold & Hill, 2001).
This synergy effect in bimodal stimulation contexts may be attributed to at least two separate mechanisms. First, the perceiver might exploit time-varying features common to the physical signals of both the acoustical and the visual input. That is, in addition to matching onsets and offsets, the concurrent streams of auditory and visual speech information are usually invariant in terms of their tempo, rhythmical patterning, duration, intensity variations, and even affective tone (Lewkowicz et al., 2000). These common (“amodal”) properties have been proposed to be the critical determinants for integration by some models of audiovisual speech perception (Rosenblum, 2008; Studdert-Kennedy, 1989; Summerfield, 1987).4 Second, the auditory and visual signals are fused into a percept that is more than the sum of its parts. That is, because each modality contains partially independent phonological cues about the same speech event, recognition is boosted when both sources of information are available.
Given that there is so much coherence between speech production components, it makes sense that perceivers are sensitive to this structure and benefit from the time-varying redundancies across modalities to decode the spoken message more reliably (Grant & Seitz, 2000; Schwartz et al., 2002). This redundancy may permit early modulation of audition by vision. For example, the visual signal may amplify correlated auditory inputs (Schroeder et al., 2008). Grant and Seitz (2000) showed that the ability to detect the presence
It has been suggested that the redundancy of information perceived from the talking head, in particular in the common dynamic properties of the utterance, may be the metric listeners use for perceptual grouping and phonetic perception (Munhall et al., 1996; Summerfield, 1987). In keeping with this idea, several studies have shown that the visual enhancement of speech perception depends primarily on dynamic rather than static characteristics of facial images. For example, Vitkovitch and Barber (1994) demonstrated that speech-reading accuracy decreases as the frame rate (temporal resolution) of the video of the speaker’s face drops below 16 Hz. Furthermore, the temporal characteristics of facial motion can enhance phonetic perception, as demonstrated by the use of dynamic point-light displays (Rosenblum & Saldaña, 1996). Neuropsychological data confirm the specific role of dynamic, time-varying characteristics in audiovisual speech integration. Munhall et al. (2002) presented dynamic and static (i.e., single frame) visual vowels to a patient who had suffered selective damage to the ventral stream, a brain region involved in the discrimination of forms (e.g., faces); the resulting deficit is known as agnosia. The authors found that, whereas the patient was unable to identify any speech gestures from the static photographs, she did not differ from controls in the dynamic condition (see also Campbell, 1997, for a similar case). Overall, these results suggest that local motion cues are critical for speech recognition.
The second way in which visible speech influences auditory speech is by providing complementary information. In this case, vision provides stronger cues than the auditory signal, or even information that is missing from the auditory signal. As Summerfield (1987) pointed out, the speech units that are most confusable visually are not those that are most confusable auditorily (Figure 26.4). In Miller and Nicely’s (1955) classic study of consonant perception in noise, place of articulation is one of the first features to become difficult to perceive as the signal-to-noise ratio decreases. The auditory cues to consonant manner (e.g., stops vs. fricatives) are less susceptible to noise, while voicing and the presence of nasality are even more robust. These latter cues are very weakly conveyed visually, if at all. Thus, audio and visual speech cues are complementary; visual cues may be superior in conveying information about the place of articulation (e.g.,
at the lips vs. at the back of the mouth), whereas auditory cues may be more robust for conveying other phonetic information, such as the manner of articulation and voicing. The visual input, therefore, provides redundant cues to reinforce the auditory stimulus and can also be used to disambiguate speech sounds with quite similar acoustics, such as /ba/ versus /da/, which differ in place of articulation. The most famous multisensory speech phenomenon, the McGurk effect, probably has its roots in this difference in information strength.
complementary signal for /b/. Another version of the McGurk effect, the combination illusion, reinforces this interpretation. In the combination illusion (sometimes called audiovisual phonological fusion; Radicke, 2007; see also Troyer et al., 2010), two stimuli for which the cues are very strong are presented: a visual /ba/ and an acoustic /ga/. In this case, both consonants are perceived as a cluster /bga/. This version of the illusion is more reliable and more consistently perceived than the fusion illusion.
The McGurk effect has been replicated many times with different stimuli under a variety of manipulations (e.g., Green & Gerdeman, 1995; Green et al., 1991; Jordan & Bevan, 1997; MacDonald & McGurk, 1978; Massaro & Cohen, 1996; Rosenblum & Saldaña, 1996), and has often been described as very compelling and robust. However, after extensive experience in the laboratory with this type of stimulus, one cannot help but notice that the effect and the illusory experience derived from it are not as robust as usually described in the audiovisual speech literature. For instance, it is not uncommon to find that the effect simply fails to occur for utterances produced by some talkers, even when the auditory and visual channels are accurately dubbed (Carney et al., 1999). Moreover, even for those talkers more amenable to producing the illusion, the effect is usually observed only on a percentage of trials over the course of an experiment (Brancazio, 2004; Brancazio & Miller, 2005; Massaro & Cohen, 1983), contrary to what was claimed in the original report (McGurk & MacDonald, 1976, p. 747). Furthermore, even when the audiovisual discrepancy is not recognized as such, the phenomenological experience arising from a McGurk-type stimulus is often described as different from the experience of a naturally occurring equivalent audiovisual event. On the other hand, not experiencing the illusion does not necessarily mean that there is no influence of vision on audition (Brancazio & Miller, 2005; Gentilucci & Cattaneo, 2005). Brancazio and Miller, for instance, found that a visual phonetic effect involving the influence of visual speaking rate on perceived voicing (Green & Miller, 1985) occurred even when observers did not experience the McGurk illusion. According to the authors, this result suggests that the incidence of the McGurk effect as an index of audiovisual integration may underestimate the actual extent of interaction between the two modalities. Along the same lines, Gentilucci and Cattaneo (2005) showed that even when participants did not experience the McGurk illusion, an acoustical analysis of the participants’ spoken responses revealed that these utterances were always influenced by the lip movements of the speaker, suggesting that some phonetic features present in the visual signal were being processed.
Another factor that often puzzles researchers is that the overall incidence of the McGurk effect for a given stimulus typically varies considerably across individuals (Brancazio et al., 1999), with some people not experiencing the illusion at all. Whereas these differences have been explained in terms of modality dominance (i.e., individual differences in the weighting of auditory vs. visual streams during integration; Giard & Peronnet, 1999), it is still surprising that some people can weigh the auditory and visual information in such a way that the information provided by one modality can be completely filtered out.
Given that this phenomenon has traditionally been used to quantify the necessary and sufficient conditions under which audiovisual integration occurs (some of these are discussed below), we believe a full understanding of the processes underlying the McGurk effect is required before extrapolating the results to naturally occurring (i.e., audiovisually matching) speech (see also Brancazio & Miller, 2005).
A few studies have shown that even untrained hearing individuals can benefit from this
form of natural tactile speech reading when speech is presented in adverse listening con
ditions (Fowler & Dekle, 1991; Gick et al., 2008; Sato et al., 2010). For instance, Gick et
al. (2008) found that untrained participants could identify auditory or visual syllables
about 10 percent better when they were paired with congruent tactile information from
the face. Similarily, Sato et al. (2010) presented participants with auditory syllables em
bedded in noise (e.g., / ga/) alone or together with congruent (e.g., /ga/) and incongruent
(e.g., /ba/) McGurk-like combinations (e.g., /bga/) tactile utterances. The results showed
that manual tactile contact with the speaker’s face coupled with congruent auditory infor
mation facilitated the identification of the syllables. Interestingly, they also found that al
though auditory identification was significantly decreased by (p. 533) the influence of the
incongruent tactile information, participants did not report combined illusory percepts
(e.g., /bda/). Instead, on some trials, they selected the haptically specified event. This re
sult is in line with previous findings by Fowler and Dekle (1991), who showed no evidence
of illusory McGurk percepts as a result of audiotactile incongruent pairings (only one of
seven tested participants reported hearing a new percept under McGurk conditions). The
fact that audiotactile incongruent combinations do not elicit illusory percepts, together
with the fairly small audiotactile benefits observed in the matching conditions (in compar
ison to audiovisual matching conditions), raises theoretical questions regarding the com
binative structuring of audiotactile speech information. That is, these results suggest that
rather than genuine perceptual interactions, audiotactile interactions may be the result of
postperceptual decisions reached through probability summation of the two sources of in
formation (see Massaro, 2009). Note that if visual and tactile information were not per
ceptually integrated but rather were processed in parallel and independently of each oth
er, one should observe improvements in the syllable identification (because of signal re
dundancy), but incongruent conditions should never lead to hearing new percepts. This is
the exact pattern of results found in these studies. The finding that integrated processing
in speech perception is highly dependent on the modality through which information is
initially encoded clashes with a recent hypothesis suggesting that audiotactile speech in
formation is integrated in a similar way as synchronous audiovisual information (Gick &
Derrick, 2009, p. 503). In a recent study, Gick and Derrick (2009) paired acoustic speech
utterances (i.e., the syllables /pa/ or /ba/) with small bursts of air on listeners' necks or hands
and found that participants receiving puffs of air were more likely to perceive both
sounds as aspirated (i.e., /pa/). According to the authors, the fact that this effect occurred
in untrained perceivers and at body locations unlikely to be reinforced by frequent experience suggests that tactile information is combined in a natural manner with the auditory
speech signal. However, it is more likely that this interference is due to postperceptual
analysis (Massaro, 2009).
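The postperceptual probability-summation account mentioned above can be made concrete with a small sketch. The numbers are hypothetical, chosen only for illustration (they are not data from the studies cited): if the auditory and tactile channels are processed independently, a response is correct whenever at least one channel alone succeeds.

```python
# Sketch of probability summation for two independent channels, as in
# Massaro's (2009) postperceptual account. The detection probabilities
# below are invented for illustration only.

def probability_summation(p_auditory: float, p_tactile: float) -> float:
    """P(correct) if either independent channel alone suffices."""
    return 1.0 - (1.0 - p_auditory) * (1.0 - p_tactile)

# With noisy audio alone at 60% and touch alone at 30%, independent
# parallel processing already predicts a sizable gain, with no need
# for genuine perceptual integration:
print(round(probability_summation(0.60, 0.30), 2))  # 0.72
```

On this account, the combined benefit arises from redundancy alone, which is why incongruent pairings should never yield new (fused) percepts.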
Tactile Aids
Given the substantial success of Tadoma and other natural methods for tactile communi
cation, researchers have become interested in exploiting this modality as an alternative
channel of communication in individuals with severe hearing impairments. The possibility
Page 15 of 56
Multimodal Speech Perception
The effort to develop tactile aids that assist speech reading and maximize sensory redundancy started in 1924, when Robert Gault built an (p. 534) apparatus consisting of a long tube that the speaker placed in front of the mouth and the receiver held in the hands; it delivered unprocessed sound vibrations to the receiver (Gault, 1924). Although Gault's vibra
tor only transmitted very limited cues for speech recognition (possibly only the temporal
envelope; see Levitt, 1995), it was nevertheless found to be useful to supplement speech
reading. The relative success of this early work boosted the development of more sophisticated sensory substitution systems,5 aimed at encoding different aspects of the speech signal into patterns of tactile stimulation.
Haptic devices have varied with respect to the type of transducers (electrotactile vs. vi
brotactile), the stimulated body site (e.g., finger, hand, forearm, abdomen, and thigh), and
the number and configurations of stimulators (e.g., single-channel vs. multi-channel stim
ulation; see Levitt, 1988, for a review). In the single-channel approach, for example, a sin
gle transducer directly presents minimally processed acoustic signals, thus conveying
global properties of speech such as intensity, rhythm, and energy contours (Boothroyd,
1970; Erber & Cramer, 1974; Gault, 1924; Gault & Crane, 1928; Schulte, 1972). In multi
channel schemes, the acoustic input signal is decomposed into a number of frequency
bands, the outputs of which drive separate tactile transducers that present a spectral dis
play along the user’s skin surface. The skin, thus, is provided with an artificial place
mechanism for coding frequencies. In most systems, moreover, the sound energy at a giv
en locus of stimulation cues intensity in the corresponding channel.
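The multichannel scheme just described can be sketched in a few lines. This is a toy illustration, not code for any actual device: the function name, band edges, and smoothing window are all invented for the example. The acoustic signal is split into frequency bands, and each band's amplitude envelope would drive one transducer at one skin site, giving the skin an artificial place code for frequency.

```python
# Minimal sketch of a multichannel tactile "vocoder" (illustrative only):
# split the signal into bands, extract each band's envelope, and use the
# envelopes as drive signals for hypothetical skin transducers.
import numpy as np

def tactile_vocoder(signal, fs, band_edges):
    """Return one amplitude envelope per band (one per skin site)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    envelopes = []
    for lo, hi in band_edges:
        band = spectrum * ((freqs >= lo) & (freqs < hi))   # crude bandpass
        x = np.fft.irfft(band, n=len(signal))
        env = np.abs(x)                                    # rectify
        width = int(0.02 * fs)                             # ~20-ms smoother
        kernel = np.ones(width) / width
        envelopes.append(np.convolve(env, kernel, mode="same"))
    return np.array(envelopes)

fs = 8000
t = np.arange(fs) / fs                         # 1 s of toy "speech"
sig = np.sin(2 * np.pi * 300 * t) + 0.5 * np.sin(2 * np.pi * 1500 * t)
bands = [(100, 800), (800, 2500)]              # two channels, two skin sites
drive = tactile_vocoder(sig, fs, bands)
print(drive.shape)  # (2, 8000)
```

In this sketch the envelope amplitude at each site also cues intensity in the corresponding channel, as in most of the systems reviewed above.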
Psychophysical evaluations of performance with these methods have usually been done by
comparing speech reading alone, the tactile device alone, and speech reading plus the
tactile device. These studies have shown that after extensive training, vocoders can pro
vide enough feedback to improve speech intelligibility both when used in isolation and
presented together with visual speech information (Brooks & Frost, 1983; Brooks et al.,
1986). Nevertheless, none of these artificial methods has reached the performance
achieved through the Tadoma method. Such performance differences may be partly attributed to the superior overall richness of the Tadoma display. Whereas Tadoma con
veys a variety of sensory qualities directly tied to the articulation process, tactile devices
encode and display acoustic information in an arbitrary fashion (i.e., employing a sensory
system not typically used for this sort of information; Weisenberger & Percy, 1995). Further
more, as pointed out before (see earlier section, Properties and Perception of the Audito
ry Speech Signal), phonemes are signalled by multiple acoustic cues that vary greatly as
a function of the context. It is possible, therefore, that some critical acoustic information
is lost in the processing and transduction when using such an apparatus.
In terms of the information provided, single-channel devices have been generally de
scribed as being superior in conveying supra-segmental information, such as syllable
number, syllabic stress, and intonation (Bernstein et al., 1989; Carney & Beachler, 1986).
Multichannel aids, on the other hand, show better performance for tasks requiring the
identification of fine-structure phoneme information (both single-item and connected
speech; Brooks & Frost, 1983; Plant, 1989; Summers et al., 1997; Weisenberger et al.,
1991; Weisenberger & Russell, 1989). Other studies, however, have reported fairly simi
lar-sized benefits in extracting segmental and supra-segmental information regardless of
the number of channels. For instance, Carney (1988) and Carney and Beachler (1986)
compared a single-channel and a twenty-four-channel vibrotactile device in phoneme
recognition tasks, and reported similar levels of performance under both tactile aid alone
and speech reading plus tactile aid conditions. Hanin et al. (1988) measured the percep
tion of words in sentences by speech reading with and without tactile presentation of
voice fundamental frequency (F0) using both multichannel display and single-channel dis
plays. Mean performance with the tactile displays was found to be slightly, but signifi
cantly, better than speech reading alone, but no significant differences were observed be
tween the two displays.
Overall, what is clear from these studies is that providing complementary tac
tile stimulation—in whatever form—is effective in helping hearing-impaired individuals to
decode different kinds of linguistic information such as segmental (e.g., Yuan et al.,
2005) and supra-segmental features (e.g., Auer et al., 1998; Bernstein et al., 1989; Grant
et al., 1986; Thompson, 1934), closed-set words (Brooks & Frost, 1983), and even con
nected speech (Miyamoto et al., 1987; Skinner et al., 1989).
intelligibility and the McGurk effect) are still observed. This temporal window of integra
tion is thought to compensate for small temporal delays between modalities that may
arise under natural conditions because of both the physical characteristics of the arriving
inputs (i.e., differences in the relative time of arrival of stimuli at the eye and ear) and
biophysical differences on sensory information processing (i.e., differences in neural
transduction latencies between vision and audition; Spence & Squire, 2003; Vatakis &
Spence, 2010).6
In the area of audiovisual speech perception, results suggest that the perceptual system
can handle a relatively large temporal offset between auditory and visual speech signals.
For example, studies measuring sensitivity to intermodal asynchrony (i.e., judgments
based on the temporal aspects of the stimuli) for syllables (Conrey & Pisoni, 2006) or sen
tences (Dixon & Spitz, 1980) have consistently identified a window of approximately 250
ms over which auditory-visual speech asynchronies are not reliably perceived. Further
more, this time window is often found to be longer when the visual input precedes the au
ditory input than when the auditory input precedes the visual. For instance, in one of the
first attempts to find the threshold for detection of asynchrony, Dixon and Spitz (1980)
presented participants with audiovisual speech streams that became gradually out of
sync, and instructed them to press a button as soon as they noticed the asynchrony. They
found that the auditory stream had to either lag by 258 ms or lead by 131 ms before the
discrepancy was detected. Later studies using detection methods less susceptible to bias
(see Vatakis & Spence, 2006, for a discussion on this),7 have provided a sharper delimita
tion of this temporal window, showing limits ranging between 40 and 100 ms for auditory
leading (see Soto-Faraco & Alsius, 2007, 2009; Vatakis et al., 2006) and up to 250 ms for video-leading stimuli (Grant et al., 2004; Soto-Faraco & Alsius, 2009; Vatakis & Spence, 2006; Figure 26.7).
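The asymmetric window described above can be summarized as a simple decision rule. The cutoff values below are taken from the ranges cited in the text (roughly 40 to 100 ms for auditory-leading and up to about 250 ms for video-leading stimuli); the exact numbers vary across studies, so these constants are illustrative, not definitive.

```python
# Hedged sketch of the asymmetric audiovisual integration window.
# Threshold constants reflect the ranges cited in the surrounding text.

AUDIO_LEAD_LIMIT_MS = 100   # audio before video: narrow tolerance
VIDEO_LEAD_LIMIT_MS = 250   # video before audio: wide tolerance

def asynchrony_detected(soa_ms: float) -> bool:
    """soa_ms < 0: audio leads; soa_ms > 0: video leads."""
    if soa_ms < 0:
        return -soa_ms > AUDIO_LEAD_LIMIT_MS
    return soa_ms > VIDEO_LEAD_LIMIT_MS

print(asynchrony_detected(-150))  # True: a 150-ms audio lead is noticed
print(asynchrony_detected(200))   # False: a 200-ms video lead still fuses
```

The asymmetry mirrors natural experience, in which visible articulation almost always precedes the resulting sound.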
Consistent findings are found in studies estimating the boundaries of this temporal win
dow by quantifying the magnitude of multisensory effects, such as visual enhancement of
speech intelligibility (Conrey & Pisoni, 2006; Grant & Greenberg, 2001; McGrath & Sum
merfield, 1985; Pandey et al., 1986) or the McGurk illusion (Jones & Jarick, 2006; Mas
saro & Cohen; 1993; Massaro et al., 1996; Miller & D’Esposito, 2005; Munhall et al.,
1996; Soto-Faraco, & Alsius, 2007, 2009; van Wassenhove et al., 2007). For instance,
Munhall et al. (1996) presented McGurk syllables at different asynchronies (i.e., from a
360-ms auditory lead to a 360-ms auditory lag, in 60-ms steps) and found that, although illusory percepts prevailed at small asynchronies, a significant proportion of fused responses was still observed when the audio track lagged the video track by 240 ms, or led it by 60 ms. In a recent report, however, this temporal window was reported to be
much wider (480 ms with an audio lag and 320 ms with an audio lead; Soto-Faraco & Al
sius, 2009). According to some researchers, this much greater tolerance for situations in
which the audio signal lags the visual than for situations in which the visual signal lags
the audio can be explained by perceivers’ experience with the physical properties of the
natural world (see Munhall et al., 1996). That is, in audiovisual speech, like in many other
natural audiovisual occurrences, the visual information (i.e., the visible byproducts of
speech articulation) almost always precedes the acoustic output (i.e., speech sounds). It is
conceivable, therefore, that after repeated exposure to visually leading occurrences in the
external world, our perceptual system has adapted to tolerate and bind visual-leading mul
tisensory events to a greater extent than audio-leading stimuli. Indeed, this hypothesis is
supported by recent studies showing that the temporal window of integration is adapt
able in size, as a result of perceptual experience. For instance, repeated exposure to tem
porally misaligned speech stimuli can alter the perception of synchrony or asynchrony
(Vatakis & Spence, (p. 536) 2007; see also Fujisaki et al., 2004, and Vroomen et al., 2004,
for nonspeech related results).
Furthermore, the ability of the human perceptual system to deal with audiovisual asyn
chronies appears to vary as a function of the nature of the stimuli with which the system
is being confronted. For instance, for stimuli with limited informational structure within
either modality (e.g., such as beeps and flashes; see Hirsh & Sherrick, 1961), the tempo
ral window for subjective simultaneity has been shown to be much narrower, with timing
differences of less than 60 to 70 ms being detected (Zampini et al., 2003). When the com
plexity of multisensory information increases, as in audiovisual speech or highly ecologi
cal nonspeech audiovisual events (e.g., musical instruments or object actions; Vatakis & Spence, 2006), the information content (i.e., the semantics and the inherent (p. 537) structure/dynamics that are extended in time) of the unisensory inputs may serve as an
additional factor promoting integration (Calvert et al., 1998; Laurienti et al., 2004), and
hence lead to a widening of the temporal window for integration (i.e., larger temporal and
spatial disparities are tolerated; Vatakis & Spence, 2008). For speech, some authors have
suggested, moreover, that once such complex sensory signals are merged into a single unified percept, the system assumes that this multisensory stimulus has a unique temporal onset and a common external origin (the "unity assumption hypothesis"; see Jackson, 1953). Consequently, the final perceptual outcome of multisensory integration re
duces or eliminates the original temporal asynchrony between the auditory and visual sig
nals (Vatakis & Spence, 2007; Welch & Warren, 1980). Support for the unity assumption
has primarily come from studies showing that participants' sensitivity to temporal misalignment in audiovisual speech signals is higher for mismatching audiovisual speech events
(e.g., different face-voice gender or McGurk syllables) than for matching ones (Vatakis &
Spence, 2007, and van Wassenhove et al., 2007, respectively). Other studies, however,
qualified these findings by showing that such informational congruency effects (i.e., wider
temporal windows for matching vs. nonmatching stimuli) were only observed for audiovi
sual speech stimuli, but not for other complex audiovisual events such as musical instruments or object actions (Vatakis & Spence, 2006) and even some vocalizations (Vatakis et
al., 2008). This has been used to argue that temporal synchrony perception for speech stimuli may be somewhat special (Radeau, 1994; Vatakis et al., 2008).8
A critical methodological concern in Vatakis and Spence (2007) and van Wassenhove et al.
(2007), however, is that mismatch conditions differed not only in the informational con
tent of the stimuli but also in a number of physical (sensory) dimensions (e.g., structural
factors; Welch, 1999). As a result, the reported effects may be explained by differences at
the level of simple spatial-temporal attributes. In fact, two recent studies evaluating the
temporal window of integration for different attributes and perceptual interpretations in
a set of identical multisensory objects (thus controlling for low-level factors) have chal
lenged the unity assumption hypothesis (Soto-Faraco & Alsius, 2009; Vroomen & Steke
lenburg, 2011). Soto-Faraco and Alsius (2009) explored the tolerance of the McGurk com
bination effect to a broad range of audiovisual temporal asynchronies, while measuring,
at the same time, the temporal resolution across the two modalities involved. Critically,
they found that the McGurk illusion can arise even when perceivers are able to detect the
temporal mismatch between the face and the voice (Soto-Faraco & Alsius, 2009). This
suggests that the final perceptual outcome of multisensory integration
does not overwrite the original temporal relation between the two inputs, as the unity as
sumption would predict. Instead, it demonstrates that the temporal window of multisen
sory integration has different widths depending on the perceptual attribute at stake. In
the Vroomen and Stekelenburg (2011) study, participants were required to detect asyn
chronies between synthetically modified (i.e., sine-wave speech) pseudowords and the
corresponding talking face. Sine-wave speech is an impoverished speech signal that is not
recognized as speech by naïve observers; however, when perceivers are informed of its
speech-like nature, they become able to decode its phonetic content. The authors found
that, whereas the sound was more likely to be integrated with lip-read speech if heard as speech than as non-speech (i.e., the magnitude of the McGurk effect was dependent on the speech mode; see also Tuomainen et al., 2005), observers in both a speech and a non-speech mode were equally sensitive at judging the audiovisual temporal order of the events.
This result suggests that previously found differences between speech and nonspeech
stimuli were due to low-level stimulus differences, rather than reflecting the putative spe
cial nature of speech.
The wider temporal windows of integration for audiovisual speech—as compared with
complex nonspeech stimuli or simple stimuli—observed in previous studies can be better
explained by the increased low-level time-varying correlations that underlie audiovisual
speech. That is, in contrast to simple transitory stimuli, where asynchronies can be detected primarily by temporal onset–offset cues, in slightly asynchronous matching audio-
visual speech there is still a fine temporal correlation between sound and vision (Munhall
et al., 1996). According to Vroomen and Stekelenburg, this (time-shifted) correlation
would induce a form of “temporal ventriloquist” effect (Morein-Zamir et al., 2003; Scheier
et al., 1999; Vroomen & de Gelder, 2004), by which the perceived timing of the auditory
speech stream would be actively shifted (i.e., “ventriloquized”) toward the corresponding
lip gestures, reducing differences in transmission and (p. 538) processing times of the different senses (therefore leading to a wider temporal integration window for these types of stimuli).
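The time-shifted correlation underlying this account can be illustrated with a toy cross-correlation: even when the audio stream is delayed, its envelope still correlates strongly with the lip signal at some nonzero lag. The signals and function below are synthetic stand-ins invented for this sketch, not materials from the studies discussed.

```python
# Sketch of the "time-shifted correlation" idea: a lagged audio envelope
# still tracks the lip-aperture time course, and the lag of peak
# correlation recovers the offset. All signals here are synthetic.
import numpy as np

rng = np.random.default_rng(0)
fs = 100                                   # 100 samples/s
lips = rng.standard_normal(500)            # toy lip-aperture time course
lag = 12                                   # audio lags vision by 120 ms
audio_env = np.concatenate([np.zeros(lag), lips[:-lag]])

def best_lag(a, b, max_lag):
    """Lag (in samples) at which the correlation of a and b peaks."""
    scores = {}
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:
            x, y = a[k:], b[:len(b) - k]
        else:
            x, y = a[:len(a) + k], b[-k:]
        scores[k] = np.corrcoef(x, y)[0, 1]
    return max(scores, key=scores.get)

print(best_lag(audio_env, lips, 30))  # recovers the 12-sample offset
```

Because such a lagged correlation survives moderate asynchrony, it can keep pulling the perceived timing of the audio toward the lip gestures (the temporal ventriloquist effect).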
In addition to tolerance of temporal asynchronies, several studies have shown that exten
sive spatial discrepancy between auditory and visual stimuli does not influence the
strength of the McGurk effect (Bertelson et al., 1994; Colin et al., 2001; Fisher &
Pylyshyn, 1994; Jones & Jarick, 2006; Jones & Munhall, 1997), unless auditory spatial at
tention is manipulated (Tiippana et al., 2011). Jones and Munhall (1997) measured the
magnitude of the illusion for up to 90-degree sound angles and found that the proportion
of auditory-based responses was independent of loudspeaker location. Similar findings
were reported by Jones and Jarick (2006). They measured the illusion under both temporal (from −360 to 360 ms) and spatial (five different locations) discrepancies. They found
no indication of an additive relationship between the effects of spatial and temporal sepa
rations. In other words, the McGurk illusion was not reduced by the combination of spa
tial and temporal disparities.
Besides measuring the magnitude of the McGurk illusion, a few studies also instructed
participants to make perceptual judgments of the relative location of auditory stimulus
(i.e., point to the apparent origin of the speech sounds; Bertelson et al., 1994; Driver,
1996; Jack & Thurlow, 1973). These studies showed that observers' judgments of speech
sound locations are biased toward the visual source; the so-called ventriloquist illusion
(Connor, 2000; see Bertelson & de Gelder, 2004, for studies examining the effect in
nonassociative, simple stimuli). This illusion, a classic illustration of multisensory bias in
the spatial domain, underlies our perception of a voice emanating from actors appearing
on the screen when, in reality, the soundtrack is physically located elsewhere. Just as with
the temporal ventriloquist described earlier, the spatial ventriloquist effect is thought to
occur as a result of our perceptual system assuming that the co-occurring auditory infor
mation and visual information have a single spatial origin. An interesting characteristic of
the ventriloquist illusion is that vision tends to dominate audition in the computation of
the location of the emergent percept. This makes functional sense, considering that, in
our daily life, we usually assign the origin of sounds, which may be difficult to localize, es
pecially in noisy or reverberant conditions, to events perceived visually, which can be lo
calized more accurately (Kubovy, 1988; Kubovy & Van Valkenburg, 1995).
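Visual dominance of this kind is often described with a reliability-weighted (inverse-variance) cue-combination model. That model is a standard tool in the cue-combination literature rather than a formula from this chapter, so the sketch below is illustrative, with made-up location estimates and variances.

```python
# Illustrative maximum-likelihood cue combination: each cue is weighted
# by its reliability (inverse variance), so the precise visual estimate
# dominates the imprecise auditory one. Values are invented.

def fuse(x_vis, var_vis, x_aud, var_aud):
    """Reliability-weighted average of visual and auditory locations."""
    w_vis = (1 / var_vis) / (1 / var_vis + 1 / var_aud)
    return w_vis * x_vis + (1 - w_vis) * x_aud

# Visual location 0 deg (variance 1) vs. auditory 10 deg (variance 25):
print(round(fuse(0.0, 1.0, 10.0, 25.0), 2))  # 0.38: pulled toward vision
```

With these numbers the fused estimate sits almost on the visual source, which is exactly the ventriloquist pattern: the hard-to-localize sound is captured by the well-localized visual event.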
Bertelson et al. (1994) measured both the ventriloquist and the McGurk illusions for the
very same audiovisual materials in one experiment. On each trial, participants were pre
sented with an ambiguous fragment of auditory speech, delivered from one of seven hid
den loudspeakers, together with an upright or inverted face shown on a centrally located
screen. They were instructed to do a localization task (i.e., point to the apparent origin of
the speech sounds) and an identification task (i.e., report what had been said). Whereas
spatial separations did not reduce the effectiveness of the audiovisual stimuli in produc
ing the McGurk effect, the ventriloquist illusion decreased as the loudspeaker location
moved away from the face (see Colin et al., 2001; Fisher & Pylyshyn, 1994; Jones &
Munhall, 1997, for similar results). The inverted presentation of the face, in contrast, had
no effect on the overall magnitude of the ventriloquism illusion, but it did significantly re
duce the integration of auditory and visual speech (see Jordan & Bevan, 1997). This re
versed pattern suggests that the two phenomena can be dissociated and, perhaps, involve
different components of the cognitive architecture. Note, moreover, that such dissociation
is also in disagreement with the unity assumption hypothesis described above because it
suggests that in some trials, both the unified (i.e., McGurk percepts) and low-level senso
ry discrepancies (i.e., the spatial origin) can be perceived by participants.
The finding that the McGurk illusion is impervious to spatial discrepancies suggests that
the spatial rule for multisensory integration (i.e., enhanced integration for closely located
sensory stimuli) does not apply in the specific case of audiovisual speech perception un
less the task involves judgments regarding the spatial attributes of the event. Time-vary
ing similarities in the patterning of information might prove, in this case, a more salient
feature for binding (Calvert et al., 1998; Jones & Jarick, 2006; Jones & Munhall, 1997).
Numerous studies attempting to isolate the critical aspects of the visual information have
been carried out by occluding (or freezing) different facial regions and measuring the contribution of the nonmasked areas to speech reading. These studies have generally
shown that the oral region (i.e., talker’s mouth) offers a fairly direct source of information
about the segmental properties of speech (Summerfield, 1979). For instance, Thomas and
Jordan (2004) showed that the intelligibility of an oral-movements display was similar to
that of a whole-face movements display (though see IJsseldijk, 1992). However, linguisti
cally relevant information can also be extracted from extraoral facial regions, when the
oral aperture is occluded (Preminger et al., 1998; Thomas & Jordan, 2004), probably ow
ing to the strong correlation between oral and extraoral movements described above
(Munhall & Vatikiotis-Bateson, 1998). Furthermore, visual speech influences on the audi
tory component have also been shown to remain substantially unchanged across horizon
tal viewing angles (full face, three-quarter, profile; Jordan & Thomas, 2001), rotations in
the picture plane (Jordan & Bevan, 1997), or when removing the color of the talking face
(Jordan et al., 2000). Even when the facial surface kinematics are reduced to the motion
of a collection of light points, observers still show perceptual benefit in an acoustically
noisy environment and can perceive the McGurk effect (Rosenblum & Saldaña, 1996).
This result suggests that pictorial information such as skin texture does not convey critical information for audiovisual improvements (although note that performance never reached the levels found in natural displays), and it highlights the importance of motion cues in speech perception. However, is time-varying information of the seen articulators
sufficient for audiovisual benefits to be observed? In Rosenblum’s study, the patch-light
stimuli covered extensive regions of the face and the inside of the mouth (tongue, teeth)
and the angular deformations of the point-lights could have been used to reveal the local
surface configuration. It is possible, therefore, that such point-lights did not completely
preclude configural information of the face forms (i.e., spatial relations between fea
tures), and that these cues were used together with the motion cues to support speech
identification. Indeed, other studies have found that configural information of the face is
critical for audiovisual benefits and integration to be observed. For instance, Campbell
(1996) demonstrated that the McGurk effect can be disrupted by inverting the brightness
of the face (i.e., photonegative images), a manipulation that severely degrades visual
forms of the face while preserving time-varying information. Along the same lines, in a speech-
in-noise task, Summerfield (1979) presented auditory speech stimuli together with four
point-lights tracking the motion of the lips (center of top and bottom lips and the corners
of the mouth) or with a Lissajous curve whose diameter was correlated with the amplitude
of the audio signal, and found no enhancement whatsoever. Similarly, auditory speech de
tection in noise is not facilitated by presenting a fine-grained correlated visual object
(e.g., a dynamic rectangle whose horizontal extent was correlated with the speech enve
lope; also see Bernstein et al., 2004; Ghazanfar et al., 2005). This suggests that both the
analysis of visual form and analysis of the dynamic characteristics of the seen articulators
are important factors for audiovisual integration. Further studies are required to deter
mine the contribution of each of these sources to speech identification.
The image quality of the face has also been manipulated by using various techniques that
eliminate part of the spatial frequency spectrum (Figure 26.8). Studies using this type of
procedure suggest that fine facial detail is not critical for visual and audiovisual speech
recognition. For instance, Munhall et al. (2004) degraded images by applying different
bandpass and low-pass filters and revealed that the filtered visual information was suffi
cient for attaining a higher speech intelligibility score than that of auditory-only signal
presentation. Results showed that subjects had the highest levels of speech intelligibility in
the midrange filter band with a center spectral frequency of 11 cycles per face, but that
the band with 5.5 cycles per face also significantly enhanced intelligibility. This suggests
that high spatial frequency information is not needed for speech perception. Along the same lines, other studies have shown that speech perception is reduced, but remains effective, when
facial images are spatially degraded by quantization (e.g., Campbell & Massaro, 1997;
MacDonald et al., 2000), visual blur (Thomas & Jordan, 2002; Thorn & Thorn, 1989), or
increased stimulus distance (Jordan & Sergeant, 2000).
The results of these studies, moreover, are consistent with the observation that visual
speech can be successfully encoded when presented in peripheral vision, (p. 540) several
degrees away from fixation (Smeele et al., 1998). Indeed, Paré et al. (2003) showed that
manipulations of observers’ gaze did not influence audiovisual speech integration sub
stantially until their gaze was directed at least 60 degrees eccentrically. Thus, the conclu
sion from this finding is that high-acuity, foveal vision of the oral area is not necessary to
extract linguistically relevant visual information from the face.
Altogether, results demonstrating that visual speech perception can subsist when experi
mental stimuli are restricted to low spatial frequency components of the images suggest
that visual speech information is processed at a coarse spatial level. In fact, many studies examining the eye movements of perceivers naturally looking at talking faces (i.e., with no specific instructions regarding what cues to attend to) have found that observers consistently look at the eye region more than the mouth (Klin et al., 2005). Nevertheless, if the task requires extracting fine
linguistic information (e.g., phonetic details in high background noise, word identification
or segmental cues), observers make significantly more fixations on the mouth region than
the eye region (Buchan et al., 2007, 2008; Lansing & McConkie, 2003; Vatikiotis-Bateson
et al., 1998).
volved in audio-visual integration of speech because it exhibits increased activity for audi
tory speech stimulation (Scott & Johnsrude, 2003), visual speech articulation (Bernstein
et al., 2002; Calvert et al., 1997; Campbell et al., 2001; MacSweeney et al., 2001), and
congruent audiovisual speech (Calvert et al., 2000). Furthermore, several studies have
demonstrated that this region shows enhancement to concordant audiovisual stimuli and
depression to mismatching speech (Calvert et al., 2000). Finally, the STS and STG are in
volved with visual enhancement of speech intelligibility in the presence of an acoustic
masking noise, in accordance with the principle of inverse effectiveness (i.e., multisensory enhancement is greatest when unimodal stimuli are least effective; Callan et al., 2003;
Stevenson & James, 2009).
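The principle of inverse effectiveness can be illustrated with a small numerical sketch. The response values are made up, and the gain index used here, the multisensory response relative to the best unimodal response, is one common way of quantifying enhancement rather than the specific measure used in the studies cited.

```python
# Sketch of inverse effectiveness: the proportional multisensory gain,
# (AV - max(A, V)) / max(A, V), is largest when the unimodal responses
# are weakest. All numbers are invented for illustration.

def multisensory_gain(av, a, v):
    """Relative gain of the multisensory response over the best unimodal one."""
    best_unimodal = max(a, v)
    return (av - best_unimodal) / best_unimodal

# Strong unimodal inputs: only a modest relative gain.
print(round(multisensory_gain(av=0.95, a=0.90, v=0.80), 2))  # 0.06
# Weak unimodal inputs (e.g., speech in heavy noise): a large relative gain.
print(round(multisensory_gain(av=0.60, a=0.30, v=0.20), 2))  # 1.0
```

This is why visual enhancement of intelligibility is most apparent under acoustic masking noise, where the auditory signal alone is least effective.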
Whereas these brain regions have been repeatedly shown to be involved in audiovisual
speech processing, the relatively large diversity of experimental settings and analysis
strategies across studies makes it difficult to determine the specific role of each location
in the audiovisual integration processes. Nevertheless, it is now becoming apparent that
different distributed networks of neural structures may serve different functions in audio
visual integration (i.e., time, space, content; Calvert et al., 2000). In a recent study, for ex
ample, Miller and D’Esposito (2005; see also Stevenson et al., 2011) observed that differ
ent brain areas respond preferentially to the detection of sensory correspondence and to
the perceptual fusion of speech events. That is, they found that while middle STS, middle
IPS, and IFG (p. 541) are associated with the perception of fused bimodal speech stimuli,
another network of brain areas (the SC, anterior insula, and anterior IPS) is differentially
involved in the detection of commonalities of seen and heard speech in terms of its tem
poral signature (see also Jones & Callan, 2003; Kaiser et al., 2004; Macaluso et al., 2004).
It remains unknown, however, how these functionally distinct networks of neural groups
work in concert to match and integrate multimodal input during speech perception. Ac
cording to Doesburg et al. (2008), functional coupling between the networks is achieved
through long-range phase synchronization (i.e., neuronal groups that enter into precise
phase-locking over a limited period of time), particularly the oscillations in the gamma
band (30 to 80 Hz; Engel & Singer, 2001; Fingelkurts et al., 2003; Senkowski et al., 2005;
Varela et al., 2001).
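Phase synchronization of the kind Doesburg et al. (2008) describe is commonly quantified with a phase-locking value (PLV): the mean resultant length of the phase difference between two signals, near 1 when the phases maintain a fixed relation and near 0 otherwise. The sketch below uses synthetic phase series constructed purely for illustration.

```python
# Minimal sketch of a phase-locking value (PLV) on synthetic phases.
# PLV = |mean(exp(i * phase difference))|: ~1 for locked, ~0 for unlocked.
import numpy as np

def plv(phase_a, phase_b):
    """Phase-locking value between two phase time series (radians)."""
    return np.abs(np.mean(np.exp(1j * (phase_a - phase_b))))

t = np.linspace(0, 1, 1000)
rng = np.random.default_rng(1)
phase_a = 2 * np.pi * 40 * t                        # 40-Hz (gamma) phase
locked = phase_a + 0.1 * rng.standard_normal(1000)  # nearly fixed offset
unlocked = 2 * np.pi * rng.random(1000)             # random phases

print(plv(phase_a, locked) > 0.9)    # True
print(plv(phase_a, unlocked) < 0.3)  # True
```

A sustained high PLV between distant neuronal groups over a limited time window is the signature of the long-range gamma-band coupling proposed to bind these networks.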
Furthermore, other remarkable findings demonstrate that viewing speech can modulate
the activity in the secondary (Wilson et al., 2004) and primary motor cortices, specifically
the mouth area in the left hemisphere (Ferrari et al., 2003; Watkins et al., 2003). That is,
the distributed network of brain regions associated with the perception of bimodal speech
information also includes areas that are typically involved in speech production, such as
Broca’s area, premotor cortex (Meister et al., 2007), and even primary motor cortex
(Calvert & Campbell, 2003; Campbell et al., 2001; Sato et al., 2009; Skipper et al., 2007).
These results appear in keeping with the long-standing proposal that audiovisual integra
tion of speech is mediated by the speech motor system (motor theory of speech percep
tion; Liberman et al., 1967). However, the role that these areas play in audiovisual speech
integration remains to be elucidated (see Galantucci et al., 2006; Hickok, 2009, for dis
cussions).
Moreover, there have been a number of studies in humans and animals that have re
vealed early multisensory interactions within areas traditionally considered purely visual
or purely auditory, such as MT/ V5 and Heschl’s gyrus, respectively (Beauchamp et al.,
2004; Callan et al., 2003, 2004; Calvert et al., 2000, 2001; Möttönen et al., 2004; Olson et
al., 2002), and even at earlier stages of processing (i.e., brainstem structures; Musacchia
et al., 2006).
Using functional magnetic resonance imaging (fMRI), Calvert et al. (1997), for example, found that linguistic visual cues are sufficient to activate primary auditory cortex in normal-hearing individuals in the absence of auditory speech sounds (see also Pekkola et al., 2005; though see Bernstein et al., 2002, for a nonreplication).
The first evidence that visual speech modulates activity in the unisensory auditory cortex, however, was provided by Sams et al. (1991). Using magnetoencephalography (MEG) recordings in an oddball paradigm, Sams et al. found that infrequent McGurk stimuli (incongruent audiovisually presented syllables) presented among frequent matching audiovisual standards gave rise to magnetic mismatch fields (i.e., mismatch negativity [MMN]10; Näätänen, 1982) at the level of the supratemporal region about 180 ms after stimulus onset. Other studies using mismatch paradigms have corroborated this result (Colin et al., 2002; Kislyuk et al., 2008; Möttönen et al., 2002). Using high-density electrical mapping, Saint-Amour et al. (2007) revealed a dominance of left hemispheric cortical generators during the early and late phases of the McGurk MMN, consistent with the well-known left hemispheric dominance for speech reading (Calvert & Lewis, 2004; Capek et al., 2004).
Subsequent MEG and electroencephalography (EEG) studies using the additive model (in which the event-related potential (ERP) responses to audiovisual stimulation are compared with the sum of the auditory and visual evoked responses) have estimated that the earliest interactions between the auditory and visual properties can occur approximately 100 ms after the visual stimulus onset (Besle et al., 2004; Giard & Peronnet, 1999; Klucharev et al., 2003; van Wassenhove et al., 2005) or even before (50 ms; Lebib et al., 2003), suggesting that audiovisual integration of speech occurs early in the cortical auditory processing hierarchy. For instance, van Wassenhove et al. (2005) examined the timing of audiovisual integration for both congruent (/ka/, /pa/, and /ta/) and incongruent (McGurk effect) speech syllables and found that the latency of the classic components of the auditory evoked potential (i.e., N1/P2) was shortened when congruent visual information was present. According to the authors, because the visual cues for articulation often precede the acoustic cues in natural audiovisual speech (sometimes by more than 100 ms; Chandrasekaran et al., 2009), an early representation of the speech event can be extracted from visual cues related to articulator preparation and used to constrain the processing of the forthcoming auditory input (the analysis-by-synthesis model; van Wassenhove et al., 2005). Thus, importantly, this model proposes not only that cross-modal effects occur early in time, and in areas that are generally regarded as lower (early) in the sensory hierarchy, but also that the earlier-arriving visual information constrains the activation of phonetic units in the auditory cortex. Note that such a mechanism would only be functional if the auditory processing system allowed for some sort of feedback signal from higher order areas.
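The additive-model logic can be illustrated with a minimal numerical sketch (simulated waveforms, not real ERP data; the amplitudes, latencies, and threshold below are arbitrary choices for illustration): any reliable residual AV − (A + V) is taken as evidence of audiovisual interaction, and the earliest time point at which the residual exceeds a criterion estimates the interaction latency.

```python
import numpy as np

# Simulated unisensory and audiovisual "evoked responses" on a ms time axis.
t = np.arange(0, 300, 1)  # time in ms after visual onset

def gauss(center, width, amp):
    """A Gaussian bump standing in for an evoked-response component."""
    return amp * np.exp(-0.5 * ((t - center) / width) ** 2)

a = gauss(120, 20, 5.0)           # auditory-alone response (arbitrary µV)
v = gauss(150, 30, 2.0)           # visual-alone response
av = a + v - gauss(100, 15, 1.5)  # AV response with a sub-additive interaction ~100 ms

residual = av - (a + v)           # nonzero residual implies cross-modal interaction

# Earliest latency at which the residual exceeds an (arbitrary) criterion:
threshold = 0.5
onset_ms = t[np.abs(residual) > threshold][0]
print(onset_ms)
```

In real ERP work the residual would be assessed statistically across participants and time points rather than against a fixed amplitude threshold, but the comparison being made is the same.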
The idea of feedback from higher areas modulating processing in lower sensory areas by using predictions generated at higher levels is not novel. fMRI experiments have shown modulation of activity in sensory-specific cortices during audiovisual binding (Calvert & Campbell, 2003; Calvert et al., 1997, 2000; Campbell et al., 2001; Sekiyama et al., 2003; Wright et al., 2003), and it has been proposed that the unisensory signals of multisensory objects are initially integrated in the STS and that interactions in the auditory cortex reflect feedback inputs from the STS (Calvert et al., 1999). However, although this explanation is appealing, it is unlikely that this is the only way in which audiovisual binding occurs. The relative preservation of multisensory function after lesions to higher order multisensory regions (see Ettlinger & Wilson, 1990, for a review) and studies demonstrating that interactions in auditory cortex precede activation in the STS region (Besle et al., 2008; Bushara et al., 2003; Möttönen et al., 2004; Musacchia et al., 2006) challenge a pure feedback interpretation. That is, current views on multisensory integration (Driver & Noesselt, 2008; Schroeder et al., 2008) suggest that there might be an additional direct cortical-cortical input to auditory cortex from the visual cortex (Cappe & Barone, 2005; Falchier et al., 2002; Rockland & Ojima, 2003) that might convey direct phase resetting by the motion-sensitive cortex, resulting in tuning of the auditory cortex to the upcoming sound (Arnal et al., 2009; Schroeder et al., 2008). Thus, in the light of recent evidence, a more elaborate network involving both unimodal processing streams and multisensory areas needs to be considered.
The considerable robustness of the McGurk illusion under conditions in which the observer realizes that the visual and auditory streams do not emanate from the same source makes it seem likely that higher cognitive functions have little access to the integration process and that the integration cannot be avoided or interrupted at will (Colin et al., 2002; Massaro, 1998; McGurk & MacDonald, 1976; Rosenblum & Saldaña, 1996; Soto-Faraco et al., 2004). Nevertheless, the most straightforward demonstration of the automatic nature of speech integration is provided by studies using indirect methodologies and showing that audiovisual integration emerges even when its resulting percept (e.g., the McGurk illusion) is detrimental to the task at hand (Driver, 1996; Soto-Faraco et al., 2004). In Soto-Faraco's study, participants were asked to make speeded classification judgments about the first syllable of audiovisually presented bisyllabic pseudowords, while attempting to ignore the second syllable. The paradigm, inspired by the classic Garner speeded classification task (Garner, 1974; see Pallier, 1994, for the syllabic version), is based on the finding that reaction times required to classify the first syllable (target) are slowed when the second (irrelevant) syllable varies from trial to trial, in comparison to when it remains constant. Soto-Faraco et al. used audiovisual stimuli in which the identity of the second syllable was sometimes manipulated by introducing McGurk percepts. Critically, the authors found that the occurrence of the syllabic interference effect was determined by the illusory percept, rather than by the actually presented acoustic stimuli. Overall, these results suggest that the integration of auditory and visual speech cues occurs before attentive selection of syllables, because participants were unable to focus their attention on the auditory component alone and ignore the visual influence on the irrelevant syllable.
In another study, Driver (1996) showed that selective listening to one of two spatially separated speech streams worsened when the lip movements corresponding to the distractor speech stream were displayed near the target speech stream. The interpretation of this interference is that the matching (distractor) auditory stream is ventriloquized toward the location of the seen talker's face, causing the illusory perception that the two auditory streams emanate from the same source and thus hindering selection of the target stream by spatial attention. The key finding is that participants could not avoid integrating the auditory and visual components of the speech stimulus, suggesting that multisensory binding of speech cannot be suppressed and arises before spatial selective attention is completed.
In line with the behavioral results discussed above, neurophysiological (ERP) studies using oddball paradigms have shown that an MMN can be evoked (Colin et al., 2002) or eliminated (Kislyuk et al., 2008) by McGurk percepts. Because the MMN is considered to reflect a preattentive comparison process between the neural representation of the incoming deviant trace and the neural trace of the standard, these results suggest that conflicting signals from different modalities can be combined into a unified neural representation during early sensory processing without any attentional modulation. Furthermore, the recent discovery that audiovisual interactions can arise early in functional terms (Callan et al., 2003; Calvert et al., 1997, 1999; Möttönen et al., 2004) seems consistent with the argument that binding mechanisms need to be rapid and mandatory. However, top-down modulation effects have been observed at very early stages in the neural processing pathway for both the auditory (e.g., Schadow et al., 2009) and the visual sensory modalities (Shulman et al., 1997). It cannot be ruled out, therefore, that top-down cognitive processes modulate the audiovisual integration mechanisms in an analogous fashion. In fact, attentional modulations of audiovisual interactions at hierarchically early stages of processing have been described before (Calvert et al., 1997; Pekkola et al., 2005).
Altogether, the studies reviewed above suggest that audiovisual speech integration is certainly resilient to cognitive factors. However, these findings should not be construed as conclusive evidence for automatic integration of the auditory and visual speech components in all situations. Recent studies have shown that increasing attentional load with a demanding, unrelated visual, auditory (Alsius et al., 2005; see also Tiippana et al., 2004), or tactile task (Alsius et al., 2007) decreases the percentage of illusory McGurk responses. This result conforms well to the perceptual load theory of attention11 (Lavie, 2005) and indicates that spare attentional resources are in fact needed for successful integration. Note that if audiovisual speech integration were completely independent of attention, it would remain unaltered when attentional resources were fully consumed.
Furthermore, other studies have shown that selective attention to the visual stimuli is required to perceptually bind audiovisually correlated speech (Alsius & Soto-Faraco, 2011; Fairhall & Macaluso, 2009) and McGurk-like combinations (Andersen et al., 2008). Fairhall and Macaluso measured blood-oxygen-level-dependent (BOLD) responses using fMRI while participants covertly attended to one of two visual speech streams (i.e., talking faces on the left and right) that could be either congruent or incongruent with a central auditory stream. The authors found that spatial attention to the corresponding (matching) visual component of these audiovisual speech pairs was critical for the activation of cortical and subcortical brain regions thought to reflect neural correlates of audiovisual integration. Using a similar display configuration (i.e., two laterally displaced faces and a single central auditory stream) with McGurk-like stimuli, Andersen et al. (2008) found that directing visual spatial attention toward one of the faces increased the influence of that particular face on auditory perception (i.e., the illusory percepts related to that face). Finally, Alsius and Soto-Faraco (2011) used visual and auditory search paradigms to explore how the human perceptual system processes spatially distributed speech signals (i.e., speaking faces) in order to detect matching audiovisual events. They found that the efficiency of search among faces for the match with a voice declined with the number of faces being monitored concurrently. This suggests that visual selective attention is required to perceptually bind audiovisually correlated objects. In keeping with Driver's results described above, however, they found that search among auditory speech streams for the match with a face was independent of the number of streams being monitored concurrently (though see Tiippana, 2011). It seems, therefore, that whereas the perceptual system must have full access to the spatial location of the visual event for the audiovisual correspondences to be detected, similar constraints do not apply to the auditory event. That is, multisensory matching seems to occur before the deployment of auditory spatial attention (though see Tiippana et al., 2011).
In addition to the above-mentioned results, other studies have shown that audiovisual integration can sometimes be influenced by higher cognitive properties such as observers' expectations (Windman et al., 2003), talker familiarity (Walker et al., 1995), lexical status (Barutchu et al., 2008), or instructions (Colin et al., 2005; Massaro, 1998; Summerfield & McGrath, 1984). For instance, in one study by Summerfield and McGrath (1984; see also Colin et al., 2005), half of the participants were informed about the artificial nature of the McGurk stimuli and were instructed to report syllables according to the auditory input. The remaining participants were not aware of the dubbing procedure and were simply instructed to repeat what the speaker had said. The visual influence was weaker in the first group of participants, suggesting that knowledge of how the illusion is created can facilitate the segregation of the auditory signal from the visual information. Taken together, therefore, these results question an extreme view of automaticity and suggest instead some degree of penetrability in the binding process. Along the same lines, other studies have shown that identical McGurk-like stimuli can be responded to in qualitatively different ways, depending on whether the auditory stimulus is perceived by the listener as speech or as nonspeech (Tuomainen et al., 2005), on whether the visual speech gestures are consciously processed or not (Munhall et al., 2009), or on the set of features of the audiovisual stimuli participants have to respond to (Soto-Faraco & Alsius, 2007, 2009). Thus, the broad picture that emerges from all these studies is that, although audiovisual speech integration seems certainly robust to cognitive intervention, it can sometimes adapt to the specific demands imposed by the task at hand in a dynamic and malleable manner.
Therefore, it seems that rather than there being absolute susceptibility or absolute immunity to top-down modulation, the experimental conditions at hand set the degree of automaticity of the binding process. In situations of low perceptual and cognitive processing load, where the visual speech stimuli may be particularly hard to ignore (see Lavie, 2005), audiovisual integration appears to operate quite early and in a largely automatic manner (i.e., without voluntary control). This is in line with a recent proposal by Talsma et al. (2010) stating that audiovisual integration will occur automatically (i.e., in a bottom-up manner) in low-perceptual-load settings, whereas in perceptually demanding conditions, top-down attentional control will be required. In keeping with this idea, previous studies outside the speech domain have shown that only when attention is directed to both modalities simultaneously are auditory and visual stimuli integrated very early in the sensory processing flow (about 50 ms; Senkowski et al., 2005; Talsma & Woldorff, 2005; Talsma et al., 2007) and do interactions within higher order heteromodal areas arise (Degerman et al., 2007; Fort et al., 2002; Pekkola et al., 2005; van Atteveldt et al., 2007). However, top-down modulations may impose a limit on this automaticity under certain conditions. That is, having full access (at sensory and cognitive levels) to the visual stimuli (Alsius & Soto-Faraco, 2011) and processing both the auditory and visual signals as speech seem to be critical requisites for binding (Munhall et al., 2009; Summerfield, 1979; Tuomainen et al., 2005).
In summary, there seems to be a complex and dynamic interplay between audiovisual integration and cognitive processes. As mentioned earlier (see section, Neural Correlates of Audiovisual Speech Integration), the auditory and visual sensory systems interact at multiple levels of processing during speech perception (Hertrich et al., 2009; Klucharev et al., 2003). It is possible, therefore, that top-down modulatory signals exert an influence on some of these levels (see Calvert & Thesen, 2004, for a related argument), possibly depending on the perceptual information available (Fairhall & Macaluso, 2009), the resources required (Alsius et al., 2005, 2007), spatial attention (Alsius & Soto-Faraco, 2011; Tiippana et al., 2011), task parameters (Hugenschmidt et al., 2010), and the type of attribute encoded (Munhall et al., 2009; Soto-Faraco & Alsius, 2009; Tuomainen et al., 2005).
Summary
Speech is a naturally occurring multisensory event, and we, as perceivers, are capable of using various components of this rich signal to communicate. The act of talking involves the physical generation of sound by the face and vocal tract, and the acoustics and movements are inextricably linked. These physical signals can be accessed in part by the visual, auditory, and haptic perceptual systems. Each of these perceptual systems preferentially extracts different information about the total speech event. Thus, the combination of multiple sensory channels provides a richer and more robust perceptual experience. The neural networks supporting this processing are diverse and extended. In part, the neural substrates involved depend on the task subjects are engaged in. Much remains to be learned about how low-level automatic processes contribute to multisensory processing and how and when higher cognitive strategies can modify this processing. Understanding behavioral data about task and attention is a prerequisite for a principled neuroscientific explanation of this phenomenon.
References
Alsius, A., Navarra, J., Campbell, R., & Soto-Faraco, S. (2005). Audiovisual integration of speech falters under high attention demands. Current Biology, 15, 839–843.
Alsius, A., Navarra, J., & Soto-Faraco, S. (2007). Attention to touch reduces audiovisual speech integration. Experimental Brain Research, 183, 399–404.
Andersen, T. S., Tiippana, K., Laarni, J., Kojo, I., & Sams, M. (2008). The role of visual spatial attention in audiovisual speech perception. Speech Communication, 51, 184–193.
Arnal, L. H., Morillon, B., Kell, C. A., & Giraud, A. L. (2009). Dual neural routing of visual facilitation in speech processing. Journal of Neuroscience, 29, 13445–13453.
Arnold, P., & Hill, F. (2001). Bisensory augmentation: A speechreading advantage when speech is clearly audible and intact. British Journal of Psychology, 92 (2), 339–355.
Auer, E. T., Jr. (2002). The influence of the lexicon on speech read word recognition: Contrasting segmental and lexical distinctiveness. Psychonomic Bulletin and Review, 9, 341–347.
Auer, E. T., Jr. (2009). Spoken word recognition by eye. Scandinavian Journal of Psychology, 50, 419–425.
Auer, E. T., Jr., & Bernstein, L. E. (1997). Speechreading and the structure of the lexicon: Computationally modeling the effects of reduced phonetic distinctiveness on lexical uniqueness. Journal of the Acoustical Society of America, 102 (6), 3704–3710.
Auer, E. T., Jr., Bernstein, L. E., & Coulter, D. C. (1998). Temporal and spatio-temporal vibrotactile displays for voice fundamental frequency: An initial evaluation of a new vibrotactile speech perception aid with normal-hearing and hearing-impaired individuals. Journal of the Acoustical Society of America, 104, 2477–2489.
Barutchu, A., Crewther, S., Kiely, P., & Murphy, M. (2008). When /b/ill with /g/ill becomes /d/ill: Evidence for a lexical effect in audiovisual speech perception. European Journal of Cognitive Psychology, 20 (1), 1–11.
Beauchamp, M. S., Argall, B. D., Bodurka, J., Duyn, J. H., & Martin, A. (2004). Unraveling multisensory integration: Patchy organization within human STS multisensory cortex. Nature Neuroscience, 7, 1190–1192.
Berger, K. W. (1972). Visemes and homophenous words. Teacher of the Deaf, 70, 396–399.
Bernstein, L. E., Auer, E. T., Moore, J. K., Ponton, C., Don, M., & Singh, M. (2002). Visual speech perception without primary auditory cortex activation. NeuroReport, 13, 311–315.
Bernstein, L. E., Auer, E. T., & Takayanagi, S. (2004). Auditory speech detection in noise enhanced by lipreading. Speech Communication, 44, 5–18.
Bernstein, L. E., Demorest, M. E., & Tucker, P. E. (1998). What makes a good speechreader? First you have to find one. In R. Campbell, B. Dodd, & D. Burnham (Eds.), Hearing by eye II: The psychology of speechreading and auditory–visual speech (pp. 211–228). East Sussex, UK: Psychology Press.
Bernstein, L. E., Iverson, P., & Auer, E. T., Jr. (1997). Elucidating the complex relationships between phonetic perception and word recognition in audiovisual speech perception. In C. Benoît & R. Campbell (Eds.), Proceedings of the ESCA/ESCOP workshop on audio-visual speech processing (pp. 21–24). Rhodes, Greece.
Bertelson, P., & de Gelder, B. (2004). The psychology of multimodal perception. In C. Spence & J. Driver (Eds.), Crossmodal space and crossmodal attention (pp. 141–177). Oxford, UK: Oxford University Press.
Bertelson, P., Vroomen, J., Wiegeraad, G., & de Gelder, B. (1994). Exploring the relation between McGurk interference and ventriloquism. In Proceedings of ICSLP 94, Acoustical Society of Japan (Vol. 2, pp. 559–562). Yokohama, Japan.
Besle, J., Fischer, C., Bidet-Caulet, A., Lecaignard, F., Bertrand, O., & Giard, M. H. (2008). Visual activation and audiovisual interactions in the auditory cortex during speech perception: Intracranial recordings in humans. Journal of Neuroscience, 28, 14301–14310.
Besle, J., Fort, A., Delpuech, C., & Giard, M.-H. (2004). Bimodal speech: Early suppressive visual effects in the human auditory cortex. European Journal of Neuroscience, 20, 2225–2234.
Boothroyd, A. (1970). Concept and control of fundamental voice frequency in the deaf: An experiment using a visible pitch display. Paper presented at the International Congress of Education of the Deaf, Stockholm, Sweden.
Brancazio, L., & Miller, J. L. (2005). Use of visual information in speech perception: Evidence for a visual rate effect both with and without a McGurk effect. Perception and Psychophysics, 67, 759–769.
Brancazio, L., Miller, J. L., & Paré, M. A. (1999). Perceptual effects of place of articulation on voicing for audiovisually discrepant stimuli. Journal of the Acoustical Society of America, 106, 2270.
Brooks, P. L., & Frost, B. J. (1983). Evaluation of a tactile vocoder for word recognition. Journal of the Acoustical Society of America, 74, 34–39.
Brooks, P. L., Frost, B. J., Mason, J. L., & Gibson, D. M. (1986). Continuing evaluation of Queen's University tactile vocoder I: Identification of open set words. Journal of Rehabilitation Research and Development, 23, 119–128.
Buchan, J. N., Paré, M., & Munhall, K. G. (2004). The influence of task on gaze during audiovisual speech perception. Journal of the Acoustical Society of America, 115, 2607.
Buchan, J. N., Paré, M., & Munhall, K. G. (2007). Spatial statistics of gaze fixations during dynamic face processing. Social Neuroscience, 2 (1), 1–13.
Buchan, J. N., Paré, M., & Munhall, K. G. (2008). The effect of varying talker identity and listening conditions on gaze behavior during audiovisual speech perception. Brain Research, 1242, 162–171.
Burnham, D., Ciocca, V., Lauw, C., Lau, S., & Stokes, S. (2000). Perception of visual information for Cantonese tones. In M. Barlow & P. Rose (Eds.), Proceedings of the Eighth Australian International Conference on Speech Science and Technology (pp. 86–91). Canberra: Australian Speech Science and Technology Association.
Bushara, K. O., Hanakawa, T., Immisch, I., Toma, K., Kansaku, K., & Hallett, M. (2003). Neural correlates of cross-modal binding. Nature Neuroscience, 6 (2), 190–195.
Callan, D., Jones, J. A., Munhall, K. G., Kroos, C., Callan, A., & Vatikiotis-Bateson, E. (2003). Neural processes underlying perceptual enhancement by visual speech gestures. NeuroReport, 14, 2213–2218.
Callan, D., Jones, J. A., Munhall, K. G., Kroos, C., Callan, A., & Vatikiotis-Bateson, E. (2004). Multisensory integration sites identified by perception of spatial wavelet filtered visual speech gesture information. Journal of Cognitive Neuroscience, 16, 805–816.
Calvert, G. A., Brammer, M., Bullmore, E., Campbell, R., Iversen, S. D., & David, A. (1999). Response amplification in sensory-specific cortices during crossmodal binding. NeuroReport, 10, 2619–2623.
Calvert, G. A., Brammer, M. J., & Iversen, S. D. (1998). Crossmodal identification. Trends in Cognitive Sciences, 2 (7), 247–253.
Calvert, G. A., Bullmore, E. T., Brammer, M. J., Campbell, R., Williams, S. C., McGuire, P. K., Woodruff, P. W., Iversen, S. D., & David, A. S. (1997). Activation of auditory cortex during silent lipreading. Science, 276, 593–596.
Calvert, G. A., & Campbell, R. (2003). Reading speech from still and moving faces: The neural substrates of visible speech. Journal of Cognitive Neuroscience, 15, 57–70.
Calvert, G. A., Campbell, R., & Brammer, M. J. (2000). Evidence from functional magnetic resonance imaging of crossmodal binding in the human heteromodal cortex. Current Biology, 10, 649–657.
Calvert, G. A., Hansen, P. C., Iversen, S. D., & Brammer, M. J. (2001). Detection of audiovisual integration sites in humans by application of electro-physiological criteria to the BOLD effect. NeuroImage, 14, 427–438.
Campbell, C. S., & Massaro, D. W. (1997). Visible speech perception: Influence of spatial quantization. Perception, 26, 627–644.
Campbell, R., MacSweeney, M., Surguladze, S., Calvert, G. A., Brammer, M. J., David, A. S., & Williams, S. C. R. (2001). Cortical substrates for the perception of face actions: An fMRI study of the specificity of activation for seen speech and for meaningless lower-face acts (gurning). Cognitive Brain Research, 12, 233–243.
Campbell, R., Zihl, J., Massaro, D., Munhall, K., & Cohen, M. (1997). Speechreading in the akinetopsic patient, LM. Brain, 120, 1793–1803.
Capek, C. M., Bavelier, D., Corina, D., Newman, A. J., Jezzard, P., & Neville, H. J. (2004). The cortical organization of audio-visual sentence comprehension: An fMRI study at 4 Tesla. Cognitive Brain Research, 20, 111–119.
Cappe, C., & Barone, P. (2005). Heteromodal connections supporting multisensory integration at low levels of cortical processing in the monkey. European Journal of Neuroscience, 22 (11), 2886–2902.
Carney, A. E., Clement, B. R., & Cienkowski, K. M. (1999). Talker variability effects in auditory-visual speech perception. Journal of the Acoustical Society of America, 106, 2270 (A).
Chandrasekaran, C., & Ghazanfar, A. A. (2009). Different neural frequency bands integrate faces and voices differently in the superior temporal sulcus. Journal of Neurophysiology, 101, 773–788.
Chandrasekaran, C., Trubanova, A., Stillittano, S., Caplier, A., & Ghazanfar, A. A. (2009). The natural statistics of audiovisual speech. PLoS Computational Biology, 5, e1000436.
Chen, T. H., & Massaro, D. W. (2008). Seeing pitch: Visual information for lexical tones of Mandarin Chinese. Journal of the Acoustical Society of America, 123 (4), 2356–2366.
Colin, C., Radeau, M., & Deltenre, P. (2005). Top-down and bottom-up modulation of audiovisual integration in speech. European Journal of Cognitive Psychology, 17 (4), 541–560.
Colin, C., Radeau, M., Deltenre, P., & Morais, J. (2001). Rules of intersensory integration in spatial scene analysis and speech reading. Psychologica Belgica, 41, 131–144.
Colin, C., Radeau, M., Soquet, A., Demolin, D., Colin, F., & Deltenre, P. (2002). Mismatch negativity evoked by the McGurk–MacDonald effect: A phonetic representation within short-term memory. Clinical Neurophysiology, 113, 495–506.
Conrey, B., & Pisoni, D. B. (2006). Auditory-visual speech perception and synchrony detection for speech and nonspeech signals. Journal of the Acoustical Society of America, 119, 4065–4073.
Cutler, A., & Butterfield, S. (1992). Rhythmic cues to speech segmentation: Evidence from juncture misperception. Journal of Memory and Language, 31, 218–236.
Cvejic, E., Kim, J., & Davis, C. (2010). Prosody off the top of the head: Prosodic contrasts can be discriminated by head motion. Speech Communication, 52 (6), 555–564.
Davis, C., & Kim, J. (2006). Audio-visual speech perception off the top of the head. Cognition, 100, B21–B31.
Degerman, A., Rinne, T., Pekkola, J., Autti, T., Jääskeläinen, I. P., Sams, M., & Alho, K. (2007). Human brain activity associated with audiovisual perception and attention. NeuroImage, 34, 1683–1691.
Dixon, N. F., & Spitz, L. (1980). The detection of auditory visual desynchrony. Perception, 9, 719–721.
Doesburg, S. M., Emberson, L. L., Rahi, A., Cameron, D., & Ward, L. M. (2008). Asynchrony from synchrony: Long-range gamma-band neural synchrony accompanies perception of audiovisual speech asynchrony. Experimental Brain Research, 185, 1.
Driver, J., & Noesselt, T. (2008). Multisensory interplay reveals crossmodal influences on “sensory-specific” brain regions, neural responses, and judgments. Neuron, 57 (1), 11–23.
Dupoux, E. (1993). The time course of prelexical processing: The Syllabic Hypothesis revisited. In G. T. M. Altmann & R. Shillcock (Eds.), Cognitive models of speech processing: The second Sperlonga meeting (pp. 81–114). Hillsdale, NJ: Erlbaum.
Engel, A. K., & Singer, W. (2001). Temporal binding and the neural correlates of sensory awareness. Trends in Cognitive Sciences, 5, 16–25.
Erber, N. P., & Cramer, K. D. (1974). Vibrotactile recognition of sentences. American Annals of the Deaf, 119, 716–720.
Ernst, M. O., & Banks, M. S. (2002). Humans integrate visual and haptic information in a statistically optimal fashion. Nature, 415, 429–433.
Eskelund, K., Tuomainen, J., & Andersen, T. S. (2011). Multistage audiovisual integration of speech: Dissociating identification and detection. Experimental Brain Research, 208, 447–457.
Fairhall, S., & Macaluso, E. (2009). Spatial attention can modulate audiovisual integration at multiple cortical and subcortical sites. European Journal of Neuroscience, 29, 1247–1257.
Falchier, A., Clavagnier, S., Barone, P., & Kennedy, H. (2002). Anatomical evidence of multimodal integration in primate striate cortex. Journal of Neuroscience, 22, 5749–5759.
Ferrari, P. F., Gallese, V., Rizzolatti, G., & Fogassi, L. (2003). Mirror neurons responding to the observation of ingestive and communicative mouth actions in the monkey ventral premotor cortex. European Journal of Neuroscience, 17, 1703–1714.
Fingelkurts, A. A., Fingelkurts, A. A., Krause, C. M., Möttönen, R., & Sams, M. (2003). Cortical operational synchrony during audio–visual speech integration. Brain and Language, 85, 297–312.
Fisher, B. D., & Pylyshyn, Z. W. (1994). The cognitive architecture of bimodal event perception: A commentary and addendum to Radeau. Current Psychology of Cognition, 13 (1), 92–96.
Fisher, C. G. (1969). The visibility of terminal pitch contour. Journal of Speech and Hearing Research, 12, 379–382.
Fort, A., Delpuech, C., Pernier, J., & Giard, M. H. (2002). Early auditory-visual interactions in human cortex during nonredundant target identification. Cognitive Brain Research, 14, 20–30.
Fowler, C., & Dekle, D. J. (1991). Listening with eye and hand: Cross-modal contributions to speech perception. Journal of Experimental Psychology: Human Perception and Performance, 17, 816–828.
Foxton, J. M., Weisz, N., Bauchet-Lecaignard, F., Delpuech, C., & Bertrand, O. (2009). The
neural bases underlying pitch processing difficulties. NeuroImage, 45, 1305–1313.
Fujisaki, W., Shimojo, S., Kashino, M., & Nishida, S. Y. (2004). Recalibration of audiovisual
simultaneity. Nature Neuroscience, 7 (7), 773–778.
Gagné, J. P., Tugby, K. G., & Michaud, J. (1991). Development of a Speechreading Test on
the Utilization of Contextual Cues (STUCC): Preliminary findings with normal-hearing
subjects. Journal of the Academy of Rehabilitative Audiology, 24, 157–170.
Galantucci, B., Fowler, C. A., & Turvey, M. T. (2006). The motor theory of speech percep
tion reviewed. Psychonomic Bulletin and Review, 13, 361–377.
Garner, W. R. (1974). The processing of information and structure. Hillsdale, NJ: Erlbaum.
Gault, R. H., & Crane, G.W. (1928). Tactual patterns from certain vowel qualities instru
mentally communicated from a speaker to a subject’s fingers. Journal of General Psychol
ogy, 1, 353–359.
Gentilucci, M., & Cattaneo, L. (2005). Automatic audiovisual integration in speech per
ception. Experimental Brain Research, 167, 66–75.
Ghazanfar, A. A., Maier, J. X., Hoffman, K. L., & Logothetis, N. K. (2005). Multisensory in
tegration of dynamic faces and voices in rhesus monkey auditory cortex. Journal of Neuro
science, 25, 5004–5012.
Giard, M. H., & Peronnet, F. (1999). Auditory-visual integration during multimodal object
recognition in humans: a behavioral and electrophysiological study. Journal of Cognitive
Neuroscience, 11, 473–490.
Gick, B., & Derrick, D. (2009). Aero-tactile integration in speech perception. Nature, 462,
502–504.
Gick, B., Jóhannsdóttir, K., Gibraiel, D., & Muehlbauer, J. (2008). Tactile enhancement of
auditory and visual speech perception in untrained perceivers. Journal of the Acoustical
Society of America, 123, 72–76.
Granström, B., House, D., & Lundeberg, M. (1999). Prosodic cues in multimodal speech
perception. In Proceedings of the International Congress of Phonetic Sciences (ICPhS99)
(pp. 655–658). San Francisco.
Grant, K. W. (2001). The effect of speechreading on masked detection thresholds for filtered
speech. Journal of the Acoustical Society of America, 109, 2272–2275.
Grant, K. W., Ardell, L. A., Kuhl, P. K., & Sparks, D. W. (1986). The transmission of prosod
ic information via an electrotactile speech reading aid. Ear and Hearing, 7, 328–335.
Grant, K. W., & Greenberg, S. (2001). Speech intelligibility derived from asynchronous
processing of auditory-visual information. International Conference of Auditory-Visual
Speech Processing (pp. 132–137). Santa Cruz, CA.
Grant, K. W., & Seitz, P.-F. (2000). The use of visible speech cues for improving auditory
detection of spoken sentences. Journal of the Acoustical Society of America, 108, 1197–
1208.
Grant, K. W., van Wassenhove, V., & Poeppel, D. (2004). Detection of auditory (cross-spectral)
and auditory-visual (cross-modal) synchrony. Speech Communication, 44, 43–53.
Green, K. P., & Gerdeman, A. (1995). Cross-modal discrepancies in coarticulation and the
integration of speech information: The McGurk effect with mismatched vowels. Journal of
Experimental Psychology: Human Perception and Performance, 21 (6), 1409–1426.
Green, K. P., Kuhl, P. K., Meltzoff, A. N., & Stevens, E. B. (1991). Integrating speech infor
mation across talkers, gender, and sensory modality: Female faces and male voices in the
McGurk effect. Perception and Psychophysics, 50 (6), 524–536.
Green, K. P., & Miller, J. L. (1985). On the role of visual rate information in phonetic per
ception. Perception & Psychophysics, 38, 269–276.
Hamilton, R. H., Shenton, J. T., & Coslett, H. B. (2006). An acquired deficit of audiovisual
speech processing. Brain and Language, 98, 66–73.
Hanin, L., Boothroyd, A., & Hnath-Chisolm, T. (1988). Tactile presentation of voice fre
quency as an aid to the speechreading of sentences. Ear and Hearing, 9, 335–341.
Hertrich, I., Mathiak, K., Lutzenberger, W., & Ackermann, H. (2009). Time course of early
audiovisual interactions during speech and nonspeech central auditory processing: A
magnetoencephalography study. Journal of Cognitive Neuroscience, 21, 259–274.
Hickok, G. (2009). Eight problems for the mirror neuron theory of action understanding
in monkeys and humans. Journal of Cognitive Neuroscience, 21, 1229–1243.
Hirsh, I. J., & Sherrick, C. E., Jr. (1961). Perceived order in different sense modalities.
Journal of Experimental Psychology, 62, 423–432.
Hugenschmidt, C. E., Peiffer, A. M., McCoy, T. P., Hayasaka, S., & Laurienti, P. J. (2010).
Preservation of crossmodal selective attention in healthy aging. Experimental Brain Re
search, 198, 273–285.
Jack, C. E., & Thurlow, W. R. (1973). Effects of degree of visual association and angle of
displacement on the “ventriloquism” effect. Perceptual and Motor Skills, 37, 967–979.
Jackson, P. L. (1988). The theoretical minimal unit for visual speech perception: Visemes
and coarticulation. Volta Review, 90 (5), 99–114.
Jones, J., & Callan, D. (2003). Brain activity during audiovisual speech perception: An fM
RI study of the McGurk effect. NeuroReport, 14, 1129–1133.
Jones, J. A., & Jarick, M. (2006). Multisensory integration of speech signals: The relation
ship between space and time. Experimental Brain Research, 174, 588–594.
Jones, J. A., & Munhall, K. G. (1997). The effects of separating auditory and visual sources
on audiovisual integration of speech. Canadian Acoustics, 25, 13–19.
Jordan, T. R., & Bevan, K. M. (1997). Seeing and hearing rotated faces: Influences of facial
orientation on visual and audiovisual speech recognition. Journal of Experimental Psychology:
Human Perception and Performance, 23, 388–403.
Jordan, T. R., & Sergeant, P. (2000). Effects of distance on visual and audiovisual speech
recognition. Language and Speech, 43, 107–124.
Jordan, T. R., McCotter, M. V., & Thomas, S. M. (2000). Visual and audiovisual speech per
ception with color and gray scale facial images. Perception and Psychophysics, 62, 1394–
1404.
Jordan, T. R., & Thomas, S. (2001). Effects of horizontal viewing angle on visual and audiovisual
speech recognition. Journal of Experimental Psychology: Human Perception and
Performance, 27, 1386–1403.
Kaiser, J., Hertrich, I., Ackermann, H., Mathiak, K., & Lutzenberger, W. (2004). Hearing
lips: Gamma-band activity during audiovisual speech perception. Cerebral Cortex, 15,
646–653.
Kim, J., & Davis, C. (2004). Investigating the audio-visual speech detection advantage.
Speech Communication, 44, 19–30.
Kim, J., Kroos, C., & Davis, C. (2010). Hearing a point-light talker: An auditory influence
on a visual motion detection task. Perception, 39 (3), 407–416.
Kislyuk, D. S., Möttönen, R., & Sams, M. (2008). Visual processing affects the neural basis
of auditory discrimination. Journal of Cognitive Neuroscience, 20 (12), 2175–2184.
Klin, A., Jones, W., Schultz, R., Volkmar, F., & Cohen, D. (2002). Visual fixation patterns
during viewing of naturalistic social situations as predictors of social competence in indi
viduals with autism. Archives of General Psychiatry, 59, 809–816.
Klucharev, V., Möttönen, R., & Sams, M. (2003). Electrophysiological indicators of phonet
ic and non-phonetic multisensory interactions during audiovisual speech perception. Cog
nitive Brain Research, 18 (1), 65–75.
Kubovy, M., & Van Valkenburg, D. (2001). Auditory and visual objects. Cognition, 80, 97–
126.
Lansing, C. R., & McConkie, G. W. (1994). A new method for speechreading research:
Tracking observer’s eye movements. Journal of the Academy of Rehabilitative Audiology,
27, 25–43.
Lansing, C. R., & McConkie, G. W. (1999). Attention to facial regions in segmental and
prosodic visual speech perception tasks. Journal of Speech, Language, and Hearing Re
search, 42, 526–538.
Lansing, C. R., & McConkie, G. W. (2003). Word identification and eye fixation locations in
visual and visual-plus-auditory presentations of spoken sentences. Perception and Psychophysics,
65 (4), 536–552.
Laurienti, P. J., Kraft, R. A., Maldjian, J. A., Burdette J. H., & Wallace, M. T. (2004). Seman
tic congruence is a critical factor in multisensory behavioral performance. Experimental
Brain Research, 158, 405–414.
Lavie, N. (2005). Distracted and confused? Selective attention under load. Trends in Cog
nitive Sciences, 9 (2), 75–82.
Lebib, R., Papo, D., de Bode, S., & Baudonniere, P. M. (2003). Evidence of a visual-audito
ry cross-modal sensory gating phenomenon as reflected by the human P50 event-related
potential modulation. Neuroscience Letters, 341, 185–188.
Levitt, H. (1988). Recurrent issues underlying the development of tactile sensory aids.
Ear and Hearing, 9 (6), 301–305.
Levitt, H. (1995). Processing of speech signals for physical and sensory disabilities. Proceedings
of the National Academy of Sciences USA, 92 (22), 9999–10006.
Liberman, A. M., Cooper, F. S., Shankweiler, D. P., & Studdert-Kennedy, M. (1967). Percep
tion of the speech code. Psychological Review, 74 (6), 431–461.
Liberman, A. M., Delattre, P., Cooper, F. S., & Gerstman, L. (1954). The role of consonant–
vowel transitions in the perception of the stop and nasal consonants. Psychological Mono
graphs: General and Applied, 68, 1–13.
Liberman, A. M., & Mattingly, I. G. (1985). The motor theory of speech perception re
vised. Cognition, 21, 1–36.
Lidestam, B., & Beskow, J. (2006). Visual phonemic ambiguity and speechreading. Journal
of Speech, Language, and Hearing Research, 49 (4), 835–847.
Lisker, L. (1986). “Voicing” in English: A catalog of acoustic features signaling /b/ versus /
p/ in trochees. Language and Speech, 29, 3–11.
Macaluso, E., George, N., Dolan, R., Spence, C., & Driver, J. (2004). Spatial and temporal
factors during processing of audiovisual speech: A PET study. NeuroImage, 21, 725–732.
MacDonald, J., Andersen, S., & Bachmann, T. (2000). Hearing by eye: How much spatial
degradation can be tolerated? Perception, 29, 1155–1168.
MacDonald, J., & McGurk, H. (1978). Visual influences on speech perception processes.
Perception & Psychophysics, 24, 253–257.
MacLeod, A., & Summerfield, Q. (1987). Quantifying the contribution of vision to speech
perception in noise. British Journal of Audiology, 21, 131–141.
MacSweeney, M., Campbell, R., Calvert, G. A., McGuire, P. K., David, A. S., & Suckling, J.
(2001). Dispersed activation in the left temporal cortex for speechreading in congenitally
deaf speechreaders. Proceedings of the Royal Society of London B, 268, 451–457.
Massaro, D. (2009). Caveat emptor: The meaning of perception and integration in speech
perception. Available from Nature Precedings, http://hdl.handle.net/10101/npre.2009.4016.1.
Massaro, D. W., & Beskow, J. (2002). Multimodal speech perception: A paradigm for
speech science. In B. Granström, D. House, & I. Karlsson (Eds.), Multimodality in lan
guage and speech systems (pp. 45–71). Dordrecht: Kluwer Academic Publishers.
Massaro, D. W., & Cohen, M. M. (1983). Phonological context in speech perception. Per
ception and Psychophysics, 34, 338–348.
Massaro, D. W., & Cohen, M. M. (1996). Perceiving speech from inverted faces. Perception
and Psychophysics, 58 (7), 1047–1065.
Massaro, D. W., Cohen, M. M., & Smeele, P. M. T. (1996). Perception of asynchronous and
conflicting visible and auditory speech. Journal of the Acoustical Society of America, 100,
1777–1786.
Massaro, D. W., & Light, J. (2004). Using visible speech for training perception and pro
duction of speech for hard of hearing individuals. Journal of Speech, Language, and Hear
ing Research, 47 (2), 304–320.
Mattys, S. L., Bernstein, L. E., & Auer, E. T., Jr. (2002). Stimulus-based lexical distinctiveness
as a general word-recognition mechanism. Perception and Psychophysics, 64, 667–
679.
Mayer, C., Abel, J., Barbosa, A., Black, A., & Vatikiotis-Bateson, E. (2011). The labial
viseme reconsidered: Evidence from production and perception. Journal of the Acoustical
Society of America, 129, 2456–2456.
McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech perception. Cogni
tive Psychology, 18, 1–86.
McGrath, M., & Summerfield, Q. (1985). Intermodal timing relations and audio-visual
speech recognition by normal-hearing adults. Journal of the Acoustical Society of America,
77, 678–685.
McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746–
748.
Meister, I. G., Wilson, S. M., Deblieck, C., Wu, A. D., & Iacoboni, M. (2007). The essential
role of premotor cortex in speech perception. Current Biology, 17 (19), 1692–1696.
Miller, L. M., & D’Esposito, M. (2005). Perceptual fusion and stimulus coincidence in the
cross-modal integration of speech. Journal of Neuroscience, 25, 5884–5893.
Miller, J. L., & Eimas, P. (1995). Speech perception: From signal to word. Annual Review
of Psychology, 46, 467–492.
Miller, G. A., & Nicely, P. E. (1955). An analysis of perceptual confusions among some
English consonants. Journal of the Acoustical Society of America, 27, 338–352.
Miyamoto, R. T., Myres, W. A., Wagner, M., & Punch, J. I. (1987). Vibrotactile devices as
sensory aids for the deaf. Journal of American Academy of Otolaryngology-Head and Neck
Surgery, 97, 57–63.
Morein-Zamir, S., Soto-Faraco, S., & Kingstone, A. (2003). The capture of vision by audi
tion: Deconstructing temporal ventriloquism. Cognitive Brain Research, 17, 154–163.
Möttönen, R., Krause, C. M., Tiippana, K., & Sams, M. (2002). Processing of changes in vi
sual speech in the human auditory cortex. Cognitive Brain Research, 13, 417–425.
Möttönen, R., Schurmann, M., & Sams, M. (2004). Time course of multisensory interac
tions during audiovisual speech perception in humans: A magnetoencephalographic
study. Neuroscience Letters, 363, 112–115.
Munhall, K. G., Gribble, P., Sacco, L., & Ward, M. (1996). Temporal constraints on the
McGurk effect. Perception and Psychophysics, 58, 351–362.
Munhall, K. G., Jones, J. A., Callan, D., Kuratate, T., & Vatikiotis-Bateson, E. (2004). Visual
prosody and speech intelligibility: Head movement improves auditory speech perception.
Psychological Science, 15 (2), 133–137.
Munhall, K. G., Kroos, C., Jozan, G., & Vatikiotis-Bateson, E. (2004). Spatial frequency re
quirements for audiovisual speech perception. Perception and Psychophysics, 66, 574–
583.
Munhall, K. G., Servos, P., Santi, A., & Goodale, M. (2002). Dynamic visual speech percep
tion in a patient with visual form agnosia. NeuroReport, 13 (14), 1793–1796.
Munhall, K. G., ten Hove, M., Brammer, M., & Paré, M. (2009). Audiovisual integration of
speech in a bistable illusion. Current Biology, 19 (9), 1–5.
Munhall, K. G., & Vatikiotis-Bateson, E. (1998). The moving face during speech communi
cation. In R. Campbell, B. Dodd, & D. Burnham (Eds.), Hearing by eye: Pt. 2. The psychol
ogy of speechreading and audiovisual speech (pp. 123–139). London: Taylor & Francis,
Psychology Press.
Munhall, K. G., & Vatikiotis-Bateson, E. (2004). Spatial and temporal constraints on audio
visual speech perception. In G. A. Calvert, C. Spence, B. E. Stein (Eds.), The handbook of
multisensory processing (pp. 177–188). Cambridge, MA: MIT Press.
Musacchia, G., Sams, M., Nicol, T., & Kraus, N. (2006). Seeing speech affects acoustic in
formation processing in the human brainstem. Experimental Brain Research, 168 (1-2), 1–
10.
Navarra, J., & Soto-Faraco, S. (2007). Hearing lips in a second language: Visual articulato
ry information enables the perception of L2 sounds. Psychological Research, 71 (1), 4–12.
Nitchie, E. B. (1916). The use of homophenous words. Volta Review, 18, 85–83.
Norris, D., McQueen, J. M., & Cutler, A. (1995). Competition and segmentation in spoken-
word recognition. Journal of Experimental Psychology: Learning, Memory, and Cognition,
21, 1209–1228.
Norris, D., McQueen, J. M., & Cutler, A. (2000). Merging information in speech recogni
tion: Feedback is never necessary. Behavioral and Brain Sciences, 23 (3), 299–324.
Ohala, J. (1975). Temporal regulation of speech. In G. Fant & M. A. A. Tatham (Eds.), Audi
tory analysis and perception of speech (pp. 431–453). London: Academic Press.
Olson, I. R., Gatenby, J. C., & Gore, J. C. (2002). A comparison of bound and unbound au
dio-visual information processing in the human cerebral cortex. Cognitive Brain Research,
14, 129–138.
Ouni, S., Cohen, M. M., Ishak, H., & Massaro, D. W. (2007). Visual contribution to speech
perception: Measuring the intelligibility of animated talking heads. EURASIP Journal on
Audio, Speech, and Music Processing, 2007 (Article ID 47891), 1–12.
Owens, E., & Blazek, B. (1985). Visemes observed by hearing-impaired and normal-hearing
adult viewers. Journal of Speech and Hearing Research, 28, 381–393.
Pandey, P. C., Kunov, H., & Abel, S. M. (1986). Disruptive effects of auditory signal delay
on speech perception with lipreading. Journal of Auditory Research, 26, 27–41.
Paré, M., Richler, R., ten Hove, M., & Munhall, K. G. (2003). Gaze behavior in audiovisual
speech perception: The influence of ocular fixations on the McGurk effect. Perception and
Psychophysics, 65, 553–567.
Payton, K. L., Uchanski, R. M., & Braida, L. D. (1994). Intelligibility of conversational and
clear speech in noise and reverberation for listeners with normal and impaired hearing.
Journal of the Acoustical Society of America, 95, 1581–1592.
Pekkola, J., Ojanen, V., Autti, T., Jääskeläinen, I. P., Möttönen, R., Tarkiainen, A., &
Sams, M. (2005). Primary auditory cortex activation by visual speech: An fMRI study at 3
T. NeuroReport, 16, 125–128.
Picheny, M. A., Durlach, N. I., & Braida, L. D. (1985). Speaking clearly for the hard of
hearing I: Intelligibility differences between clear and conversational speech. Journal of
Speech and Hearing Research, 28, 96–103.
Plant, G. (1989). A comparison of five commercially available tactile aids. Australian Jour
nal of Audiology, 11, 11–19.
Pöppel, E., Schill, K., & von Steinbüchel, N. (1990). Sensory integration within temporally
neutral system states: a hypothesis. Naturwissenschaften, 77, 89–91.
Potter, R. K., Kopp, G. A., & Green, H. C. (1947). Visible speech. New York: Van Nostrand.
(Dover Publications reprint 1966).
Preminger, J. E., Lin, H., Payen, M., & Levitt, H. (1998). Selective visual masking in
speechreading. Journal of Speech, Language, and Hearing Research, 41 (3), 564–575.
Reed, C. M., Durlach, N. I., & Braida, L. D. (1982). Research on tactile communication of
speech: A review. American Speech, Language, and Hearing Association, 20.
Reed, C. M., Rabinowitz, W. M., Durlach, N. I., Braida, L. D., Conway-Fithian, S., &
Schultz, M. C. (1985). Research on the Tadoma method of speech communication. Journal
of the Acoustical Society of America, 77, 247–257.
Reisberg, D., McLean, J., & Goldfield, A. (1987). Easy to hear but hard to understand: A
lip-reading advantage with intact auditory stimuli. In B. Dodd, R. Campbell (Eds.), Hear
ing by eye: The psychology of lip-reading. Hillsdale, NJ: Erlbaum.
Rockland, K. S., & Ojima, H. (2003). Multisensory convergence in calcarine visual areas
in macaque monkey. International Journal of Psychophysiology, 50, 19–26.
Rosenblum, L. D., Johnson, J. A., & Saldaña, H. M. (1996). Visual kinematic information
for embellishing speech in noise. Journal of Speech and Hearing Research, 39 (6), 1159–
1170.
Rosenblum, L. D., & Saldaña, H. M. (1996). An audiovisual test of kinematic primitives for
visual speech perception. Journal of Experimental Psychology: Human Perception and
Performance, 22 (2), 318–331.
Rosenblum, L. D., & Saldaña, H. M. (1998). Time-varying information for visual speech
perception. In R. Campbell, B. Dodd, & D. Burnham (Eds.), Hearing by eye: Pt. 2. The psy
chology of speechreading and audiovisual speech (pp. 61–81). Hillsdale, NJ: Erlbaum.
Rönnberg, J., Samuelsson, S., & Lyxell, B. (1998). Conceptual constraints in sentence-
based lipreading in the hearing-impaired. In R. Campbell, B. Dodd, & D. Burnham (Eds.),
Hearing by eye, II: Advances in the psychology of speechreading and auditory-visual
speech (pp. 143–153). East Sussex, UK: Psychology Press.
Ross, L. A., Saint-Amour, D., Leavitt, V. M., Javitt, D. C., & Foxe, J. J. (2006). Do you see what
I’m saying? Optimal visual enhancement of speech comprehension in noisy environments.
Cerebral Cortex, 17 (5), 1147–1153.
Rouger, J., Fraysse, B., Deguine, O., & Barone, P. (2008). McGurk effects in cochlear-im
planted deaf subjects. Brain Research, 1188, 87–99.
Rouger, J., Lagleyre, S., Fraysse, B., Deneve, S., Deguine, O., & Barone, P. (2007). Evidence
that cochlear-implanted deaf patients are better multisensory integrators. Proceedings
of the National Academy of Sciences USA, 104 (17), 7295–7300.
Saint-Amour, D., De Sanctis, P., Molholm, S., Ritter, W., & Foxe, J. J. (2007). Seeing voices:
High-density electrical mapping and source-analysis of the multisensory mismatch
negativity evoked during the McGurk illusion. Neuropsychologia, 45, 587–597.
Sams, M., Aulanko, R., Hämäläinen, M., Hari, R., Lounasmaa, O. V., Lu, S. T., & Simola, J.
(1991). Seeing speech: Visual information from lip movements modifies activity in the hu
man auditory cortex. Neuroscience Letters, 127, 141–145.
Samuelsson, S., & Rönnberg, J. (1993). Implicit and explicit use of scripted constraints in
lip-reading. European Journal of Cognitive Psychology, 5, 201–233.
Sato, M., Cavé, C., Ménard, L., & Brasseur, A. (2010). Auditory-tactile speech perception
in congenitally blind and sighted adults. Neuropsychologia, 48 (12), 3683–3686.
Sato, M., Tremblay, P., & Gracco, V. L. (2009). A mediating role of the premotor cortex in
phoneme segmentation. Brain and Language, 111 (1), 1–7.
Schadow, J., Lenz, D., Dettler, N., Fründ, I., & Herrmann, C. S. (2009). Early gamma-band
responses reflect anticipatory top-down modulation in the auditory cortex. NeuroImage,
47 (2), 651–658.
Scheier, C. R., Nijhawan, R., & Shimojo, S. (1999). Sound alters visual temporal resolu
tion. Investigative Ophthalmology and Visual Science, 40, 4169.
Schroeder, C. E., Lakatos, P., Kajikawa, Y., Partan, S., & Puce, A. (2008). Neuronal oscilla
tions and visual amplification of speech. Trends in Cognitive Sciences, 12, 106–113.
Schulte, K. (1972). Fonator system: Speech stimulator and speech feedback by technical
ly amplified one-channel vibration. In G. Fant (Ed.), International Symposium on Speech
Communication Ability and Profound Deafness (pp. 351–353). Washington, DC: A.G. Bell
Association for the Deaf.
Schwartz, J. L., Berthommier, F., & Savariaux, C. (2002). Audiovisual scene analysis: Evi
dence for a “very-early” integration process in audio-visual speech perception. Proceed
ings of ICSLP, 1937–1940.
Scott, S. K., & Johnsrude, I. S. (2003). The neuroanatomical and functional organization of
speech perception. Trends in Neurosciences, 26 (2), 100–107.
Sekiyama, K., Kanno, I., Miura, S., & Sugita, Y. (2003). Auditory-visual speech perception
examined by fMRI and PET. Neuroscience Research, 47 (3), 277–287.
Senkowski, D., Talsma, D., Herrmann, C., & Woldorff, M. G. (2005). Multisensory process
ing and oscillatory gamma responses: Effects of spatial selective attention. Experimental
Brain Research, 166, 411–426.
Shulman, G. L., Fiez, J. A., Corbetta, M., Buckner, R. L., Miezin, F. M., Raichle, M. E., &
Petersen, S. E. (1997). Common blood flow changes across visual tasks: II. Decreases in
cerebral cortex. Journal of Cognitive Neuroscience, 9, 648–663.
Skinner, M. W., Binzer, S. M., Fredrickson, J. M., Smith, P. G., Holden, T. A., Holden, L. K.,
Juelich, M. E., & Turner, B. A. (1989). Comparison of benefit from vibrotactile
aid and cochlear implant for post-linguistically deaf adults. Laryngoscope, 98, 1092–1099.
Skipper, J. I., van Wassenhove, V., Nusbaum, H. C., & Small, S. L. (2007). Hearing lips and
seeing voices: How cortical areas supporting speech production mediate audiovisual
speech perception. Cerebral Cortex, 17 (10), 2387–2399.
Small, D. M., Voss, J., Mak, Y. E., Simmons, K. B., Parrish, T., & Gitelman, D. (2004). Expe
rience-dependent neural integration of taste and smell in the human brain. Journal of
Neurophysiology, 92, 1892–1903.
Smeele, P., Massaro, D., Cohen, M., & Sittig, A. (1998). Laterality in visual speech perception.
Journal of Experimental Psychology: Human Perception and Performance, 24,
1232–1242.
Soto-Faraco, S., & Alsius, A. (2007). Conscious access to the unisensory components of a
crossmodal illusion. NeuroReport, 18, 347–350.
Soto-Faraco, S., & Alsius, A. (2009). Deconstructing the McGurk-MacDonald illusion. Jour
nal of Experimental Psychology: Human Perception and Performance, 35 (2), 580–587.
Soto-Faraco, S., Navarra, J., & Alsius, A. (2004). Assessing automaticity in audiovisual
speech integration: evidence from the speeded classification task. Cognition, 92 (3), B13–
B23.
Soto-Faraco, S., Navarra, J., Weikum, W. M., Vouloumanos, A., Sebastián-Gallés, N., &
Werker, J. F. (2007) Discriminating languages by speech-reading. Perception and Psy
chophysics, 69, 218–231.
Soto-Faraco, S., Sebastián-Gallés, N., & Cutler, A. (2001). Segmental and suprasegmental
mismatch in lexical access. Journal of Memory and Language, 45, 412–432.
Spence, C., & Squire, S. B. (2003). Multisensory integration: Maintaining the perception
of synchrony. Current Biology, 13, R519–R521.
Stein, B. E., & Meredith, M. A. (1993). The merging of the senses. Cambridge, MA: MIT
Press.
Stevenson, C. M., Brookes, M. J., & Morris, P. G. (2011). β-Band correlates of the fMRI
BOLD response. Human Brain Mapping, 32, 182–197.
Stevenson, R. A., & James, T. W. (2009). Audiovisual integration in human superior tempo
ral sulcus: Inverse effectiveness and the neural processing of speech and object recogni
tion. NeuroImage, 44 (3), 1210–1223.
Sumby, W., & Pollack, I. (1954). Visual contribution to speech intelligibility in noise. Journal
of the Acoustical Society of America, 26, 212–215.
Summerfield, Q., & McGrath, M. (1984). Detection and resolution of audio-visual incom
patibility in the perception of vowels. Quarterly Journal of Experimental Psychology, 36A,
51–74.
Summers, I. R., Cooper, P. G., Wright, P., Gratton, D. A., Milnes, P., & Brown, B. H. (1997).
Information from time-varying vibrotactile stimuli. Journal of the Acoustical Society of
America, 102, 3686–3696.
Talsma, D., Doty, T. J., & Woldorff, M. G. (2007). Selective attention and audiovisual inte
gration: is attending to both modalities a prerequisite for early integration? Cerebral Cor
tex, 17 (3), 679–690.
Talsma, D., Senkowski, D., Soto-Faraco, S., & Woldorff, M. G. (2010). The multifaceted in
terplay between attention and multisensory integration. Trends in Cognitive Sciences, 14,
400–410.
Talsma, D., & Woldorff, M. G. (2005). Selective attention and multisensory integration:
Multiple phases of effects on the evoked brain activity. Journal of Cognitive Neuroscience,
17 (7), 1098–1114.
Thomas, S. M., & Jordan, T. R. (2002). Determining the influence of Gaussian blurring on
inversion effects with talking faces. Perception and Psychophysics, 64, 932–944.
Thomas, S. M., & Jordan, T. R. (2004). Contributions of oral and extra-oral facial motion to
visual and audiovisual speech perception. Journal of Experimental Psychology: Human
Perception and Performance, 30, 873–888.
Thorn, F., & Thorn, S. (1989). Speechreading with reduced vision: A problem of aging.
Journal of the Optical Society of America, 6, 491–499.
Tiippana, K., Andersen, T. S., & Sams, M. (2004). Visual attention modulates audiovisual
speech perception. European Journal of Cognitive Psychology, 16, 457–472.
Tiippana, K., Puharinen, H., Möttönen, R., & Sams, M. (2011). Sound location can influ
ence audiovisual speech perception when spatial attention is manipulated. Seeing and
Perceiving, 24, 67–90.
Tremblay, C., Champoux, F., Voss, P., Bacon, B. A., & Lepore, F. (2007). Speech and non-
speech audio-visual illusions: A developmental study. PLoS ONE, 2 (8), e742.
Troyer, M., Loebach, J. L., & Pisoni, D. B. (2010). Perception of temporal asynchrony in
audiovisual phonological fusion. Research on Spoken Language Processing, Progress Re
port, 29, 156–182.
Tuomainen, J., Andersen, T., Tiippana, K., & Sams, M. (2005). Audio-visual speech percep
tion is special. Cognition, 96 (1), B13–B22.
van Atteveldt, N. M., Formisano, E., Goebel, R., & Blomert, L. (2007). Top-down task effects
overrule automatic multisensory responses to letter–sound pairs in auditory association
cortex. NeuroImage, 36 (4), 1345–1360.
van Wassenhove, V., Grant, K. W., & Poeppel, D. (2005). Visual speech speeds up the neural
processing of auditory speech. Proceedings of the National Academy of Sciences USA,
102, 1181–1186.
van Wassenhove, V., Grant, K. W., & Poeppel, D. (2007). Temporal window of integration
in bimodal speech. Neuropsychologia, 45 (3), 598–607.
Varela, F., Lachaux, J. P., Rodríguez, E., & Martinerie, J. (2001). The brainweb: Phase syn
chronization and large-scale integration. Nature Reviews Neuroscience, 2, 229–239.
Vatakis, A., Ghazanfar, A. A., & Spence, C. (2008). Facilitation of multisensory integration
by the “unity effect” reveals that speech is special. Journal of Vision, 9 (14), 1–11.
Vatakis, A., Navarra, J., Soto-Faraco, S., & Spence, C. (2008). Audiovisual temporal adaptation
of speech: Temporal order versus simultaneity judgments. Experimental Brain Research,
185 (3), 521–529.
Vatakis, A., & Spence, C. (2006). Audiovisual synchrony perception for speech and music
assessed using a temporal order judgment task. Neuroscience Letters, 393, 40–44.
Vatakis, A., & Spence, C. (2007). Crossmodal binding: Evaluating the “unity assumption”
using audiovisual speech and nonspeech stimuli. Perception and Psychophysics, 69, 744–
756.
Vatakis, A., & Spence, C. (2008). Evaluating the influence of the “unity assumption” on
the temporal perception of realistic audiovisual stimuli. Acta Psychologica, 127 (1), 12–23.
Vatakis, A., & Spence, C. (2010). Audiovisual temporal integration for complex speech,
object-action, animal call, and musical stimuli. In M. J. Naumer & J. Kaiser (Eds.), Multisensory
object perception in the primate brain (pp. 95–121). New York: Springer.
Vatikiotis-Bateson, E., Eigsti, I.-M., Yano, S., & Munhall, K. G. (1998). Eye movement of
perceivers during audiovisual speech perception. Perception and Psychophysics, 60, 926–
940.
Vatikiotis-Bateson, E., Munhall, K. G., Kasahara, Y., Garcia, F., & Yehia, H. (1996). Charac
terizing audiovisual information during speech. In Proceedings of the 4th International
Conference on Spoken Language Processing (ICSLP 96) (Vol. 3, pp. 1485–1488). New
York: IEEE Press.
Vitkovitch, M., & Barber, P. (1994). Effect of video frame rate on subjects’ ability to shad
ow one of two competing verbal passages. Journal of Speech and Hearing Research, 37
(5), 1204–1211.
Vroomen, J., & de Gelder, B. (2004). Temporal ventriloquism: Sound modulates the flash-
lag effect. Journal of Experimental Psychology: Human Perception and Performance, 30,
513–518.
Vroomen, J., & Keetels, M. (2010). Perception of intersensory synchrony: A tutorial re
view. Attention, Perception, & Psychophysics, 72, 871–884.
Vroomen, J., Keetels, M., de Gelder, B., & Bertelson, P. (2004). Recalibration of temporal
order perception by exposure to audiovisual asynchrony. Cognitive Brain Research, 22 (1),
32–35.
Walker, S., Bruce, V., & O’Malley, C. (1995). Facial identity and facial speech processing:
Familiar faces and voices in the McGurk effect. Perception and Psychophysics, 57, 1124–
1133.
Watkins, K. E., Strafella, A. P., & Paus, T. (2003). Seeing and hearing speech excites the
motor system involved in speech production. Neuropsychologia, 41 (8), 989–994.
Weikum, W. M., Vouloumanos, A., Navarra, J., Soto-Faraco, S., Sebastián-Gallés, N., &
Werker, J. F. (2007). Visual language discrimination in infancy. Science, 316, 1159.
Weisenberger, J. M., Broadstone, S. M., & Saunders, F. A. (1989). Evaluation of two multichannel tactile aids for the hearing impaired. Journal of the Acoustical Society of America, 86 (5), 1764–1775.
Weisenberger, J. M., & Kozma-Spytek, L. (1991). Evaluating tactile aids for speech perception and production by hearing-impaired adults and children. American Journal of Otology, 12 (Suppl), 188–200.
Welch, R. B. (1999). Meaning, attention and the “unity assumption” in the intersensory
bias of spatial and temporal perceptions. In G. Aschersleben, T. Bachmann, & J. Müsseler
(Eds.), Cognitive contributions to the perception of spatial and temporal events (pp. 371–
388). Amsterdam: Elsevier.
Welch, R. B., & Warren, D. H. (1980). Immediate perceptual response to intersensory dis
crepancy. Psychological Bulletin, 88 (3), 638–667.
Wilson, S. M., Saygin, A. P., Sereno, M. I., & Iacoboni, M. (2004). Listening to speech activates motor areas involved in speech production. Nature Neuroscience, 7 (7), 701–702.
Windmann, S. (2003). Effects of sentence context and expectation on the McGurk illusion.
Journal of Memory and Language, 50 (2), 212–230.
Wright, T. M., Pelphrey, K. A., Allison, T., McKeown, M. J., & McCarthy, G. (2003). Polysen
sory interactions along lateral temporal regions evoked by audiovisual speech. Cerebral
Cortex, 13, 1034–1043.
Yehia, H. C., Kuratate, T., & Vatikiotis-Bateson, E. (2002). Linking facial animation, head
motion and speech acoustics. Journal of Phonetics, 30, 555–568.
Yehia, H., Rubin, P., & Vatikiotis-Bateson, E. (1998). Quantitative association of vocal-tract
and facial behaviour. Speech Communication, 26, 23–43.
Yuan, H., Reed, C. M., & Durlach, N. I. (2005). Tactual display of consonant voicing as a
supplement to lipreading. Journal of the Acoustical Society of America, 118 (2), 1003–
1015.
Zampini, M., Guest, S., Shore, D. I., & Spence, C. (2003). Audio-visual simultaneity judgments. Perception and Psychophysics, 67 (3), 531–544.
Notes:
(2) . Two main theories (motor theory: Liberman et al., 1967; Liberman & Mattingly, 1985; and direct realism: Fowler, 1986) propose that the process of speech perception involves
the perceptual recovery and classification of articulatory gestures produced by the talker
rather than the acoustic correlates of those articulations. Thus, according to these theo
ries, articulatory movements are the invariant components in speech perception.
(3) . Whereas some researchers have used these two terms interchangeably, others have
distinguished them theoretically. Summerfield (1992; Footnote 1), for instance, defines lip
reading as the perception of speech purely by observing the talker’s articulatory ges
tures, whereas speech reading would also include other linguistic aspects of nonverbal
communication (e.g., the talker’s facial and manual gestures).
(4) . According to this model, the processing of corresponding auditory and visual infor
mation is never functionally separated (at early stages of processing). Other models claim
that audiovisual binding occurs after unisensory signals have been thoroughly processed,
thus at a late stage of processing (e.g., FLMP; Massaro, 1998).
(5) . Sensory substitution systems gather environmental energy that would normally be
processed by one sensory system (e.g., acoustic energy in the present case) and translate
this information into stimuli for another sensory system (e.g., electrotactile or vibrotactile
energy).
(6) . In air, the speed of light is much faster than that of sound (approximately 300,000,000 m/second vs. 330 m/second, respectively). Neural transduction latencies, in contrast, are shorter for auditory stimuli than for visual ones (approximately 10 ms vs. 50 ms, respectively; Pöppel et al., 1990).
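Taken together, these two asymmetries imply a viewing distance at which the physical (travel-time) lag of sound exactly offsets its neural (transduction) advantage. A minimal sketch of that arithmetic, using only the approximate values quoted in this note (the function name and round figures are illustrative, not from the chapter):

```python
# Approximate values quoted in the note above (assumptions, not measurements).
SPEED_OF_SOUND = 330.0        # m/s in air
SPEED_OF_LIGHT = 3.0e8        # m/s (approximate)
AUDITORY_LATENCY = 0.010      # s, neural transduction of an auditory stimulus
VISUAL_LATENCY = 0.050        # s, neural transduction of a visual stimulus

def perceptual_lag(distance_m):
    """Auditory-minus-visual delay (in seconds) at the level of the percept,
    for an audiovisual event `distance_m` metres away.
    Positive values mean the visual signal is available first."""
    auditory = distance_m / SPEED_OF_SOUND + AUDITORY_LATENCY
    visual = distance_m / SPEED_OF_LIGHT + VISUAL_LATENCY
    return auditory - visual

# The two delays cancel where d/330 + 0.010 == d/3e8 + 0.050,
# i.e. at roughly 330 m/s * 0.040 s ≈ 13 m from the observer.
for d in (0.0, 13.2, 30.0):
    print(f"{d:5.1f} m -> auditory lag {perceptual_lag(d) * 1000:+6.1f} ms")
```

At close range vision lags audition by about 40 ms; beyond roughly 13 m, sound's travel time dominates and audition lags instead.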
(7) . The temporal resolution has often been determined by means of temporal order judgment (TOJ) or simultaneity judgment (SJ) tasks. In a TOJ task, stimuli from different sensory modalities are presented at various stimulus onset asynchronies (SOAs; Dixon & Spitz, 1980; Hirsh & Sherrick, 1961), and observers must judge which stimulus was presented first (or, in some versions, which came second). In an SJ task, observers must judge whether the stimuli were presented simultaneously or successively.
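TOJ data are typically summarized by two parameters: the point of subjective simultaneity (PSS, the SOA judged simultaneous) and the just noticeable difference (JND, the smallest reliably detected asynchrony). As a hedged illustration only (the data, the cumulative-Gaussian model, and the probit-regression shortcut are assumptions for this sketch, not taken from the chapter):

```python
from statistics import NormalDist

# Hypothetical TOJ data: SOA in ms (positive = auditory stimulus first)
# and the proportion of "audio first" responses at each SOA.
soas = [-240, -120, -60, 0, 60, 120, 240]
p_audio_first = [0.05, 0.15, 0.35, 0.55, 0.75, 0.90, 0.97]

# Fit a cumulative Gaussian via the probit transform: z(p) = a + b * SOA.
z = [NormalDist().inv_cdf(p) for p in p_audio_first]
n = len(soas)
mean_x = sum(soas) / n
mean_z = sum(z) / n
b = sum((x - mean_x) * (y - mean_z) for x, y in zip(soas, z)) \
    / sum((x - mean_x) ** 2 for x in soas)
a = mean_z - b * mean_x

pss = -a / b                           # SOA at which p = 0.5
jnd = NormalDist().inv_cdf(0.75) / b   # 75% threshold relative to the PSS
print(f"PSS ≈ {pss:.0f} ms, JND ≈ {jnd:.0f} ms")
```

With these invented data the fit yields a PSS slightly favoring vision-leading presentations and a JND of several tens of milliseconds, in the range the tutorial reviews cited above report for complex stimuli.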
(8) . The special nature of audiovisual speech perception, compared with nonspeech au
diovisual binding, has been supported in other studies showing that audiovisual interac
tion is stronger if the very same audiovisual stimuli are treated as speech rather than
nonspeech (Tuomainen et al., 2005) and that independent maturational processes under
lie speech and nonspeech audiovisual illusory effects (Tremblay et al., 2007). At the physi
ological level, audiovisual speech and nonspeech interactions also appear to rely, at least
in part, on distinct mechanisms (Klucharev et al., 2003; van Wassenhove et al., 2005).
However, higher familiarity, extensive exposure to these stimuli in daily life, and the fact
that audiovisual speech events may be somehow more attention-grabbing (i.e., compared
with nonspeech events) are potential confounds that may explain some of the differences
reported in previous literature (see Stekelenburg & Vroomen, 2007; Vatakis et al., 2008).
(9) . Among other things, determining which components of the face are important for vis
ible speech perception will allow the development of applications with virtual three-di
mensional animated talking heads (Ouni et al., 2007). These animated agents have the po
tential to improve communication in a range of situations by supplementing auditory information (e.g., for deaf individuals, in telephone conversation, or in second language learning; Massaro & Light, 2004).
(10) . The MMN is typically evoked by an occasional auditory change (deviant) in a homogeneous sequence of auditory stimuli (standards), even when it occurs outside the subjects’ focus of attention. That is, it constitutes an electrophysiological signature of auditory discrimination abilities.
(11) . According to the perceptual load theory of attention, the level of attentional de
mands required to perform a particular process will determine, among other things, the
amount of resources available to engage in the processing of task irrelevant information.
When a relevant task exhausts the available processing resources (i.e., under conditions
of high perceptual load), other incoming stimuli/tasks will receive little, if any, attention.
However, the theory also implies that, if the target-processing load is low, attention will
inevitably spill over to the processing of distractors, even if they are task irrelevant.
Agnès Alsius
Ewen MacDonald
Kevin Munhall
Organization of Conceptual Knowledge of Objects in the Human Brain
One of the most provocative and exciting issues in cognitive science is how neural speci
ficity for semantic categories of common objects arises in the functional architecture of
the brain. Three decades of research on the neuropsychological phenomenon of category-specific semantic deficits have generated detailed claims about the organization and representation of conceptual knowledge. More recently, researchers have sought to test hypotheses developed on the basis of neuropsychological evidence with functional imaging.
From those two fields, the empirical generalization emerges that object domain and sen
sory modality jointly constrain the organization of knowledge in the brain. At the same
time, research within the embodied cognition framework has highlighted the need to articulate how information is communicated between the sensory and motor systems and the processes that represent and generalize abstract information. Those developments point
toward a new approach for understanding category-specificity in terms of the coordinat
ed influences of diverse regions and cognitive systems.
Keywords: objects, category-specific semantic deficits, conceptual knowledge, organization, object domain, sensory modality
Introduction
The scientific study of how concepts are represented in the mind/brain extends to all dis
ciplines within cognitive science. Within the psychological and brain sciences, research
has focused on studying how the perceptual, motor, and conceptual attributes of common
objects are represented and organized in the brain. Theories of conceptual representa
tion must therefore explain not only how conceptual content itself is represented and or
ganized but also the role played by conceptual content in orchestrating perceptual and
motor processes.
Page 1 of 42
Organization of Conceptual Knowledge of Objects in the Human Brain
The modern study of the representation of concepts in the brain was initiated by a series
of papers by Elizabeth Warrington, Tim Shallice, and Rosaleen McCarthy. Those authors
described patients with disproportionate semantic impairments for one, or several, cate
gories of objects compared with other categories (see Hécaen & De Ajuriaguerra, 1956,
for earlier work). Since those initial investigations, a great deal has been learned about
the causes of category-specific semantic deficits and, by extension, about the organiza
tion of object knowledge in the brain.
The focus of this review is on neuropsychological research and, in particular, on the phe
nomenon of category-specific semantic deficits. Evidence from other fields within cognitive science and neuroscience, including functional neuroimaging, is reviewed as it bears on the theoretical positions that emerge from the study of category-specific semantic deficits. In
particular, we highlight findings in functional neuroimaging related to the representation
of different semantic categories in the brain. We also discuss the degree to which concep
tual representations are grounded in sensory and motor processes and the critical role
that neuropsychological studies of patients with impairments to sensory and motor knowl
edge can play in constraining theories of semantic representation. However, the stated fo
cus of this chapter also excludes important theoretical positions in the field of semantic
memory (e.g., Patterson, Nestor, & Rogers, 2007).
of category-specific semantic impairment (for review and discussion, see Capitani et al.,
2003; Hart et al., 2007; Humphreys & Forde, 2001; Tyler & Moss, 2001). The majority of
reported patients have disproportionate impairments for living things compared with non
living things (Capitani et al., 2003).
One important aspect of the performance profile of patients with category-specific seman
tic impairment is that the impairment is to conceptual knowledge and not (only) to modal
ity-specific input or output representations. The evidence for locating the deficit at a con
ceptual level is that the category-specific deficit does not depend on stimuli being pre
sented, or on patients responding, in only one modality of input or output. For instance,
patients KC and EW (see Figure 27.1A) were impaired for naming living animate things
compared with nonliving things and fruit/vegetables. Both patients were also impaired for
answering questions about living animate things, such as “does a whale have legs” or
“are dogs domestic animals,” but were unimpaired for the same types of questions about
nonanimals (Figure 27.2A).
Patients with category-specific semantic deficits may also have additional, and also cate
gory-specific, deficits at presemantic levels of processing. For instance, patient EW was
impaired for judging whether pictures depicted real or unreal animals, but was unim
paired for the same task over nonanimal stimuli. The ability to make such decisions is as
sumed to index the integrity of the visual structural description system, a presemantic
stage of object recognition (Humphreys et al., 1988). In contrast, patient KC was relative
ly unimpaired on an object decision task, even for the category of items (living animate)
that the patient was unable to name. A similar pattern to that observed in patient KC was
present in patient APA (Miceli et al., 2000). Patient APA was selectively impaired for con
ceptual knowledge of people (see Figure 27.1C). Despite a severe impairment for naming
famous people, APA did not have a deficit at the level of face recognition (prosopagnosia).
A number of studies have now documented that variables such as lexical frequency, con
cept familiarity, and visual complexity may be unbalanced if items are sampled “random
ly” from different semantic categories (Cree & McRae, 2003; Funnell & Sheridan, 1992;
Stewart et al., 1992). In addition, Laiacona and colleagues (Barbarotto et al., 2002; Laia
cona et al., 1998) have highlighted the need to control for gender-specific effects on vari
ables such as concept familiarity (for discussion of differences between males and fe
males in the incidence of category-specific semantic deficits for different categories, see
Laiacona et al., 2006). However, the existence of category-specific semantic deficits is not
an artifact of such stimulus-specific attributes. Clear cases have been reported while
carefully controlling for those factors, and double dissociations have been reported using
the same materials (e.g., Hillis & Caramazza, 1991; see also the separate case reports in
Laiacona & Capitani, 2001, and Barbarotto et al., 1995).
(p. 557)
The sensory/functional theory is composed of two assumptions. The first—the multiple se
mantics assumption—is that conceptual knowledge is organized into subsystems that par
allel the sensory and motor modalities of input and output. The second assumption is that
the critical semantic attributes of items from different categories of objects are repre
sented in different modality-specific semantic subsystems.
The domain-specific hypothesis assumes that the first-order constraint on the organiza
tion of conceptual knowledge is object domain, with the possible domains restricted to
those that could have had an evolutionarily relevant history—living animate, living inani
mate, conspecifics, and “tools.”
Theories based on the correlated structure principle model semantic memory as a system
that (p. 558) represents statistical regularities in the co-occurrence of object properties in
the world (Caramazza et al., 1990; Devlin et al., 1998; McClelland & Rogers, 2003; Tyler
& Moss, 2001). This class of models has been instrumental in motivating large-scale em
pirical investigations of how different types of features are distributed and correlated for
different semantic categories. Several theories based on the correlated structure princi
ple have been developed to explain the causes of category-specific semantic deficits
(Caramazza et al., 1990; Devlin et al., 1998; Tyler & Moss, 2001).
This review is organized to reflect the role that different theoretical assumptions have
played in motivating empirical research. Initial hypotheses that were developed to ex
plain category-specific semantic deficits appealed to a single principle of organization
(modality specificity, domain specificity, or correlated structure). The current state of the
field of category-specific semantic deficits is characterized by complex models that inte
grate assumptions from multiple theoretical frameworks. This trajectory of theoretical po
sitions reflects the fact that, although theories based on the neural structure and correlated structure principles are contrary to one another as explanations of the causes of category-specific semantic deficits, the individual assumptions that constitute those theories are not necessarily incompatible as hypotheses about the structure of
knowledge in the brain (for discussion, see Caramazza & Mahon, 2003).
The idea that the organization of the semantic system follows the organization of the various input and output modalities to and from the semantic system was initially proposed by Beauvois (Beauvois, 1982; Beauvois et al., 1978). The original motivation for the
assumption of multiple semantics was the phenomenon of optic aphasia (e.g., Lhermitte & Beauvois, 1973; for review, see Plaut, 2002). Patients with optic aphasia present with impaired naming of visually presented objects, but relatively (or completely) spared naming
of the same objects when presented through the tactile modality (e.g., Hillis & Caramaz
za, 1995). The fact that optic aphasic patients can name objects presented through the
tactile modality indicates that the naming impairment to visual presentation is not due to
a deficit at the level of retrieving the correct names. In contrast to patients with visual ag
nosia (e.g., Milner et al., 1991), patients with optic aphasia can recognize, at a visual lev
el of processing, the stimuli they cannot name. Evidence for this is provided by the fact
that some optic aphasic patients can demonstrate the correct use of objects that they cannot name (e.g., Coslett & Saffran, 1992; Lhermitte & Beauvois, 1973; see Plaut, 2002).
Beauvois (1982) explained the performance of optic aphasic patients by assuming that the
conceptual system is functionally organized into visual and verbal semantics and that op
tic aphasia is due to a disconnection between the two semantic systems.
Along with reporting the first cases of category-specific semantic deficit, Warrington and
her collaborators (Warrington & McCarthy, 1983; Warrington & Shallice, 1984) developed
an influential explanation of the phenomenon that built on the proposal of Beauvois
(1982). Warrington and colleagues argued that category-specific semantic deficits are
due to differential damage to a modality-specific semantic subsystem that is not itself or
ganized by semantic category. Specifically, those authors noted that the patients they had
reported with impairments for living things also had impairments for foods, plants, and
precious stones (Warrington & Shallice, 1984); in contrast, a patient with an impairment
for nonliving things (Warrington & McCarthy, 1983) was spared for living things, food,
and plant life. Warrington and her collaborators reasoned that the association of impaired
and spared categories was meaningfully related to the degree to which identification of
items from those categories depends on sensory or functional knowledge. Specifically,
they argued that the ability to identify living things differentially depends on sensory
knowledge, whereas the ability to identify nonliving things differentially depends on func
tional knowledge.
Farah and McClelland (1991) implemented the theory of Warrington and colleagues in a
connectionist framework. Three predictions follow from the computational model of Farah
and McClelland (1991; for discussion, see Caramazza & Shelton, 1998). All three of those
predictions have now been tested. The first prediction is that the grain of category-specif
ic semantic deficits should not be finer than living versus nonliving. This prediction fol
lows from the assumption that all living things differentially depend on visual knowledge.
However, as represented in Figure 27.1, patients have been reported with selective se
mantic impairments for fruit/vegetables (e.g., Hart et al., 1985; Laiacona et al., 2005;
Samson & Pillon, 2003) and animals (e.g., Blundo et al., 2006; Caramazza & Shelton,
1998). The second prediction is that an impairment for a given category of knowledge will
be associated with a disproportionate impairment for the modality of (p. 559) knowledge
that is critical for that category. At variance with this prediction, it is now known that cat
egory-specific semantic deficits are associated with impairments for all types of knowl
edge (sensory and functional) about items from the impaired category (see Figure 27.2A;
e.g., Blundo et al., 2006; Caramazza & Shelton, 1998; Laiacona & Capitani, 2001; Laia
cona et al., 1993; Lambon Ralph et al., 1998; Moss et al., 1998). The third prediction is
that impairments for a type of knowledge will necessarily be associated with differential
impairments for the category that depends on that knowledge type. Patients exhibiting
patterns of impairment contrary to this prediction have been reported. For instance, Fig
ure 27.2B shows the profile of a patient who was (1) more impaired for visual compared
with functional knowledge, and (2) if anything, more impaired for nonliving things than
living things (Lambon Ralph et al., 1998; see also Figure 27.2C, Figure 27.3, and discus
sion below).
The original formulation of the sensory/functional theory was based on a simple division
between visual-perceptual knowledge and functional-associative knowledge. Warrington
and McCarthy (1987; see also Crutch & Warrington, 2003) suggested, however, that
knowledge of object color is differentially important for fruit/vegetables compared with
animals. Since Warrington and McCarthy, further sensory- and motor-based dimensions
that may be important for distinguishing between semantic categories have been articu
lated (e.g., Cree & McRae, 2003; Vinson et al., 2003).
Cree and McRae (2003) used a feature-listing task to study the types of information that
normal subjects spontaneously associate with different semantic categories. The seman
tic features were then classified into nine knowledge types: color, visual parts and surface
properties, visual motion, smell, sound, tactile, taste, function, and encyclopedic (see Vin
son et al., 2003, for a slightly different classification). Hierarchical cluster analyses were
used to determine which semantic categories differentially loaded on which feature types.
The results of those analyses indicated that (1) visual motion and function information
were the two most important knowledge types for distinguishing living animate things
(high on visual motion information) from (p. 560) nonliving things (high on function infor
mation); (2) living animate things were weighted lower on color information than fruit/
vegetables, but higher on this knowledge type than nonliving things; and (3) fruit/vegeta
bles were distinguished from living animate and nonliving things by being weighted the
highest on both color and taste information.
Cree and McRae’s analyses support the claim that the taxonomy of nine knowledge types
is effective in distinguishing among the domains living animate, fruit/vegetables, and non
living. Those analyses do not demonstrate that the nine knowledge types are critical for
distinguishing among items within the respective categories. However, and as noted
above, patients with category-specific semantic impairments do not necessarily have diffi
culty distinguishing between different domains (i.e., they might know it is an “animal,”
but cannot say which one). It is therefore not obvious that Cree and McRae’s analyses
support the claim that category-specific semantic deficits may be explained by assuming
damage to one (or more) of the nine knowledge types.
At a more general level, the open empirical question is whether the additional knowledge
types and the corresponding further functional divisions that are introduced into the se
mantic system can account for the neuropsychological evidence. Clearly, if fruit/vegeta
bles and animals are assumed to differentially depend on different types of information
(and by inference, different semantic subsystems), it is in principle possible to account for
the tripartite distinction between animals, fruit/vegetables, and nonliving. As for the origi
nal formulation of the sensory/functional theory, the question is whether fine-grained cat
egory-specific semantic impairments are associated with impairments for the type of
knowledge upon which items from the impaired category putatively depend. However, pa
tients have been reported with category-specific semantic impairments for fruit/vegeta
bles, without disproportionate impairments for color knowledge (e.g., Samson & Pillon,
2003). Patients have also been reported with impairment for knowledge of object color
without a disproportionate impairment for fruit/vegetables compared with other cate
gories of objects (see Figure 27.2C; Luzzatti & Davidoff, 1994; Miceli et al., 2001).
Another way in which investigators have sought to provide support for the sensory/func
tional theory is to study the semantic categories that are systematically impaired togeth
er. As noted above, one profile of the first reported cases that motivated the development
of the sensory/functional theory (Warrington & Shallice, 1984) was that the categories of
animals, plants, and foods tended to be impaired or spared together. Those associations of impaired and spared categories made sense if all of those categories depended on the same modality-specific system for their identification. Following the same logic, it was argued that musical instruments patterned with living things (because of the importance of sensory attributes; see Dixon et al., 2000, for relevant data), whereas body parts patterned with nonliving things (because of the importance of functional attributes associated with object use; e.g., Warrington & McCarthy, 1987). However, as was the case for the
dissociation between living animate (animals) and living inanimate (e.g., plants) things, it
is now known that musical instruments dissociate from living things, whereas body parts
dissociate from nonliving things (Caramazza & Shelton, 1998; Laiacona & Capitani, 2001;
Shelton et al., 1998; Silveri et al., 1997; Turnbull & Laws, 2000; for review and discus
sion, see Capitani et al., 2003).
More recently, Borgo and Shallice (2001, 2003) have argued that sensory-quality cate
gories, such as materials, edible substances, and drinks, are similar to animals in that
they depend on sensory information for their identification. Those authors reported that
impairment for living things was associated with impairments for sensory-quality cate
gories. However, Laiacona and colleagues (2003) reported a patient who was impaired for
living things but spared for sensory-quality categories (for further discussion, see Carroll & Garrard, 2005).
Another dimension that has been argued to be instrumental in accounting for category-
specific semantic deficits is differential similarity in the visual structure of items from dif
ferent categories. Humphreys and Forde (2001; see also Tranel et al., 1997) argued that
living things tend to be more structurally similar to one another than are nonliving things. If that were the
case, then it could be argued that damage to a system not organized by object category
would result in disproportionate disruption of items that are more “confusable” (see also
Lambon Ralph et al., 2007; Rogers et al., 2004). Within Humphreys and Forde’s frame
work, it is also assumed that activation dynamically cascades from visual object recogni
tion processes through to lexical access. Thus, perturbation of visual recognition process
es could trickle through the system to disrupt the normal functioning of subsequent
processes, resulting in a naming deficit (see Humphreys et al., 1988). (p. 561) Laws and
colleagues (Laws & Gale, 2002; Laws & Neve, 1999) also argued for the critical role of
similarity in visual structure for explaining category-specific semantic deficits. However,
in contrast to Humphreys and Forde (see also Tranel et al., 1997), Laws and colleagues
argued that nonliving things tend to be more structurally similar to one another than are living things.
Clearly, there remains much work to be done to understand the role that visual similarity
and the consequent “crowding” (Humphreys & Forde, 2001) of visual representations
have in explaining category-specific semantic deficits. On the one hand, there is no con
sensus regarding the relevant object properties over which similarity should be calculat
ed, or regarding how such a similarity metric should be calculated. On the other hand, as
suming an “agreed on” means for determining similarity in visual shape, the question re
mains open as to the role that such a factor might play in explaining the facts of category-
specific semantic deficits.
Domain-Specific Hypothesis
spatial reasoning (e.g., Cantlon et al., 2009; Feigenson et al., 2004; Hermer & Spelke,
1994).
Unique predictions are generated by the original formulation of the domain-specific hy
pothesis as it was articulated in the context of category-specific semantic deficits. One
prediction is that the grain of category-specific semantic deficits will reflect the grain of
those categories that could plausibly have had an evolutionarily relevant history (see Fig
ure 27.1). Another prediction is that category-specific semantic impairments will be asso
ciated with impairments for all types of knowledge about the impaired object type (see
Figure 27.2A). A third prediction made by the domain-specific hypothesis is that it should
be possible to observe category-specific impairments that result from early damage to the
brain. Evidence in line with this expectation is provided by the case of Adam (Farah & Rabinowitz, 2003). Patient Adam, who was 16 years old at the time of testing, suffered a
stroke at 1 day of age. Adam failed to acquire knowledge of living things, despite normal
levels of knowledge about nonliving things. As would be expected within the framework
of the domain-specific hypothesis, Adam was impaired for both visual and nonvisual
knowledge of living things (Farah & Rabinowitz, 2003).
Other researchers subsequently developed highly specified proposals based on the corre
lated structure principle, all of which build on the idea that different types of features are
differentially correlated across different semantic categories (Devlin et al., 1998; Rogers
et al., 2004; Tyler & Moss 2001). Those models of semantic memory have been imple
mented computationally, with simulated damage, to provide existence proofs that a sys
tem with no explicit functional organization may be damaged so as to produce category-
specific semantic deficits. Because theories based on the correlated structure principle
do not assume that the conceptual system has structure at the level of functional neu
roanatomy, (p. 562) they are best suited to modeling the patterns of progressive loss of
conceptual knowledge observed in neurodegenerative diseases, such as dementia of the
Alzheimer type and semantic dementia. The type of damage in such patients is diffuse
and widespread and can be modeled in connectionist architectures by removing, to vary
ing degrees, randomly selected components of the network.
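The simulated-damage logic can be sketched in a few lines. This is a deliberately minimal stand-in for the connectionist models cited above, not a reimplementation of any of them (the feature counts, retrieval criterion, and all names are illustrative assumptions): diffuse damage is modeled by deleting a randomly selected fraction of feature units, and concept retrieval degrades gradually as lesion severity increases.

```python
import random

random.seed(1)  # reproducible toy run

# Each concept is a set of feature units drawn from a shared pool;
# the sizes are arbitrary choices for the illustration.
N_FEATURES = 100
concepts = {f"concept_{i}": set(random.sample(range(N_FEATURES), 20))
            for i in range(50)}

def retrievable(features, lesioned, criterion=0.6):
    """A concept counts as retrievable if at least `criterion` of its
    feature units survive the lesion."""
    intact = len(features - lesioned)
    return intact / len(features) >= criterion

for severity in (0.0, 0.2, 0.4, 0.6):
    lesioned = set(random.sample(range(N_FEATURES), int(severity * N_FEATURES)))
    acc = sum(retrievable(f, lesioned) for f in concepts.values()) / len(concepts)
    print(f"lesion severity {severity:.0%}: {acc:.0%} of concepts retrievable")
```

The published models add the ingredients this sketch omits (learned distributed weights, feature correlations, category structure), which is what allows random damage to produce category-specific rather than merely graded deficits.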
One important proposal is the conceptual-structure account of Tyler, Moss, and col
leagues (Bright et al., 2005; Tyler & Moss, 2001). That proposal assumes that living
things have more shared features, whereas nonliving things have more distinctive fea
tures. The model further assumes that the shared features of living things are highly cor
related (has eyes/can see), whereas for nonliving things distinctive features are highly
correlated (used for spearing/has tines). If distinctive features are critical for identifica
tion, and if greater correlation confers resilience to damage, then an interaction between
the severity of overall impairment and the direction of category-specific semantic deficit
is predicted. Mild levels of impairments should produce disproportionate impairments for
living things compared with nonliving things. At more severe levels of impairments, the
distinctive features of nonliving things will be lost, and a disproportionate impairment for
this category will be observed. The opposite relation between the severity of overall impairment and the direction of category-specific impairment is predicted by the account
of Devlin and colleagues (1998) because it is assumed that as damage becomes severe,
whole sets of inter-correlated features will be lost, resulting in a disproportionate impair
ment for living things. However, it is now known that neither prediction finds clear empir
ical support (Garrard et al., 1998; Zannino et al., 2002; see also Laiacona and Capitani,
2001, for discussion within the context of focal lesions; for further discussion and theoret
ical developments, see Cree and McRae, 2003; Vinson et al., 2003).
One issue that is not resolved is whether correlations between different features should
be calculated in a concept-dependent or concept-independent manner (Zannino et al.,
2006). For instance, although the (“distinctive”) information “has tines” is highly correlat
ed with the function “used for spearing” in the concept of “fork” (correlated as concept
dependent), the co-occurrence of those properties in the world is relatively low (concept
independent). Sartori, Lombardi, and colleagues (Sartori & Lombardi, 2004; Sartori et al.,
2005) have addressed a similar issue by developing the construct of “semantic rele
vance,” which is computed through a nonlinear combination of the frequency with which
particular features are produced for an item and the distinctiveness of those features for
all concepts in the database. Those authors have shown that living things tend to be low
er, on average, than nonliving things in terms of their relevance, thus making living
things on average “harder” than nonliving things. As is the case for other accounts of cat
egory-specific semantic deficits that are based on differences across categories along a
single dimension, the existence of disproportionate deficits for the relatively “easy” category (nonliving things) is difficult to accommodate (see, e.g., Hillis & Caramazza, 1991;
Laiacona & Capitani, 2001; see Figure 27.1D). Nevertheless, the theoretical proposal of
Sartori and colleagues highlights the critical and unresolved issue of how to determine
the “psychologically relevant” metric for representing feature correlations.
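The general idea behind semantic relevance can be sketched with a tf-idf-style weighting of production frequency by distinctiveness. This is only an approximation of the published nonlinear formula, and the norming counts below are invented for illustration:

```python
import numpy as np

concepts = ["dog", "cat", "fork"]
features = ["has_legs", "has_fur", "has_tines", "used_for_spearing"]

# Hypothetical production frequencies: how many participants in a (fictional)
# norming study listed each feature for each concept.
freq = np.array([
    [15, 12,  0,  0],   # dog
    [14, 13,  0,  0],   # cat
    [ 0,  0, 16, 11],   # fork
])

# Distinctiveness weight: features produced for fewer concepts weigh more
# (a tf-idf-style stand-in for the published nonlinear combination).
n_concepts_with_feature = (freq > 0).sum(axis=0)
weight = np.log(len(concepts) / n_concepts_with_feature)

# Relevance of each concept: frequency-weighted distinctiveness of its features.
relevance = dict(zip(concepts, (freq * weight).sum(axis=1)))
print(relevance)
```

Because the living concepts here are built from features shared across several concepts, their summed relevance comes out lower than that of the artifact, mirroring the claim that living things are on average "harder" on this metric.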
Another unresolved issue is whether high correlations between features will provide “re
silience” to damage for those features, or rather will make damage “contagious” among
them. It is often assumed that high correlation confers resilience to, or insulation from,
damage; however, our understanding of how damage to one part of the brain affects oth
er regions of the brain remains poorly developed. It is also not obvious that understand
ing the behavior of connectionist architectures constitutes the needed motivation for deciding, one way or the other, whether greater correlation confers greater resilience to
damage. In fact, theoretical differences about the role of correlations in conferring re
silience to damage are in part responsible for the contrasting predictions that follow from
the models of Tyler and colleagues (Tyler & Moss, 2001) and Devlin and colleagues
(1998) (see Zannino et al., 2006, for discussion).
Another example that illustrates our current lack of understanding of the role of correla
tion in determining the patterns of impairment is provided by dissociations between sen
sory, motor, and conceptual knowledge. For instance, the visual structure of objects is
highly correlated with more abstract knowledge of the conceptual features of objects.
Even so, patients with impairments to abstract conceptual features of objects do not nec
essarily have corresponding impairments to object recognition processes (see above, and
Capitani et al., 2003, for review). Similarly, although manipulation knowledge (“how to”
knowledge) is correlated with functional knowledge (“what for” knowledge), damage to
the former does not imply damage to the latter (see Buxbaum et al., 2000; see Figure
27.3D and discussion below).
(p. 563) Theories based on the correlated structure principle are presented as alternatives
to proposals that assume neural structure within the conceptual system. The implicit as
sumption in that argument is that the theoretical construct of a semantic feature offers a
means for reducing different categories to a common set of elements (see Rogers et al.,
2004, for an alternative proposal). However, no semantic features have been described that are shared across semantic categories, aside from very abstract features such as “has mass” (Strnad, Anzellotti, & Caramazza, 2011). In other words, to the extent that semantic features are the “substance” of conceptual representations, different semantic categories would be represented by non-overlapping sets of features.
Thus, and as has been proposed on the basis of functional neuroimaging data (see, e.g.,
Haxby et al., 2001, and discussion below), it may be the case that regions of high feature
correlation (e.g., within semantic category correlations in visual structure) are reflected
in the functional neuroanatomy of the brain (see also Devlin et al., 1998, for a hybrid
model in which both focal and diffuse lesions can produce category-specific effects, and
Caramazza et al., 1990, for an earlier proposal along those lines).
Lesion Analyses
A natural issue to arise in neuropsychological research concerns which brain regions tend
to be lesioned in association with category-specific deficits. The first study to systemati
cally address this issue was by H. Damasio and colleagues (1996). Those authors found
that name retrieval deficits for pictures of famous people were associated with left tempo
ral pole lesions, a result confirmed by other investigators (see Lyons et al., 2006, for an
overview). Damasio and colleagues also found that deficits for naming animals were asso
ciated with (more posterior) lesions of anterior left ventral temporal cortex. Subsequent
research has confirmed that deficits for naming animals are associated with lesions to an
terior regions of temporal cortex (e.g., Brambati et al., 2006). Damasio and collaborators
also found that deficits for naming tools were associated with lesions to posterior and lat
eral temporal areas, overlapping the left posterior middle temporal gyrus. The critical role of the
left posterior middle temporal gyrus for knowing about tools has also since been con
firmed by other lesion studies (e.g., Brambati et al., 2006).
A subsequent report by H. Damasio and colleagues (2004) demonstrated that the same re
gions were also reliably damaged in patients with impairments for recognizing stimuli
from those three categories. In addition, Damasio and colleagues (2004) found that
deficits for naming tools, as well as fruit/vegetables, were associated with lesions to the
inferior precentral and postcentral gyri and the insula. Consensus about the association
of lesions to the regions discussed above with category-specific deficits is provided by
Gainotti’s (e.g., 2000) analyses of published reports of patients with category-specific se
mantic deficits.
A number of investigators have interpreted the differential role of anterior mesial aspects
of ventral temporal cortex in the processing of living things to reflect the fact that living
things have more shared properties than nonliving things, such that more fine-grained
discriminations are required to name them (Bright et al., 2005; Damasio et al., 2004; Sim
mons & Barsalou, 2003; see also Humphreys et al., 2001). Within this framework, the as
sociation of deficits to unique person knowledge and lesions to the most anterior aspects
of the temporal lobe is assumed to reflect the greater discrimination that is required for
distinguishing among conspecifics, compared with animals (less) and nonliving things
(even less).
Functional Imaging
Data from functional imaging, and in particular functional magnetic resonance imaging
(fMRI), have added in important ways to our understanding of how different semantic cat
egories are processed in the healthy brain. In particular, although (p. 564) the lesion over
lap approach is powerful in detecting brain regions that are critical for performing a giv
en task, functional imaging has the advantage of detecting both regions that are critical
and regions that are automatically engaged by the mere presentation of a certain type of
stimulus. Thus, in line with the lesion evidence described above, nonliving things, and in
particular tools, differentially activate the left middle temporal gyrus (Figure 27.4A; e.g.,
Martin et al., 1996; Thompson-Schill et al., 1999; see Devlin et al., 2002, for review). Oth
er imaging data indicate that this region plays an important role in processing the seman
tics of actions (e.g., Martin et al., 1995; Kable et al., 2002; Kemmerer et al., 2008), as well
as mechanical (i.e., unarticulated) motion (Beauchamp et al., 2002, 2003; Martin & Weis
berg, 2003).
In contrast, and not as apparent in lesion studies, tools differentially activate dorsal
stream regions that mediate object-directed action. The activation of some of those re
gions is independent of whether (p. 565) action information is necessary to perform the
task in which participants are engaged (e.g., picture naming). For instance, regions with
in dorsal occipital cortex, posterior parietal cortex, through to the anterior intraparietal
sulcus, are automatically activated when participants observe manipulable objects (e.g.,
Chao & Martin, 2000; Culham et al., 2003; Fang & He, 2005; Frey et al., 2005). Those re
gions are important for determining volumetric and spatial information about objects as
well as shaping and transporting the hand for object grasping. However, those dorsal oc
cipital and posterior parietal regions are not thought to be critical for object identifica
tion or naming (e.g., Goodale & Milner, 1992). Naming tools also differentially activates
the left inferior parietal lobule (e.g., Mahon et al., 2007; Rumiati et al., 2003), a structure
that is important for representing complex object-associated manipulations (e.g., for re
view, see Johnson-Frey, 2004; Lewis, 2006).
One clear way in which functional imaging data have contributed beyond lesion evidence
to our understanding of category specificity in the brain is the description of highly con
sistent topographic biases by semantic categories in the ventral object processing stream
(see Figure 27.4B and C; for reviews, see Bookheimer, 2002; Gerlach, 2002; Grill-Spector
& Malach, 2004; Op de Beeck et al., 2008; Thompson-Schill, 2003). As opposed to the an
terior-posterior mapping of semantic categories within the ventral stream described by
the lesion evidence (e.g., Damasio et al., 1996), there is also a lateral-to-medial organiza
tion. The fusiform gyrus on the ventral surface of the temporal-occipital cortex is critical
for representing object color and form (e.g., Martin, 2007; Miceli et al., 2001). Living ani
mate things such as faces and animals elicit differential neural responses in the lateral
fusiform gyrus, whereas nonliving things (tools, vehicles) elicit differential neural re
sponses in the medial fusiform gyrus (e.g., Chao et al., 1999; Mahon et al., 2007; Nop
peney et al., 2006). Stimuli that are highly definable in terms of their spatial context, such
as houses and scenes, differentially activate regions anterior to these fusiform regions, in
the vicinity of parahippocampal cortex (e.g., Bar & Aminoff, 2003; Epstein & Kanwisher,
1998). Other visual stimuli also elicit consistent topographical biases in the ventral
stream, for instance, written words (see Dehaene et al., 2005, for discussion) and images
of body parts (e.g., Downing et al., 2001).
For the sensory/functional theory to accommodate those topographic biases, a further proposal is required that combines the assumption of multiple semantics with some claim about how
information would come to be topographically segregated by semantic category. A num
ber of such proposals have been advanced, although not always in the context of the sen
sory/functional theory, or more generally within the context of theories that emerge from
category-specific semantic deficits (see, e.g., Gauthier et al., 2000; Haxby et al., 2001;
Ishai et al., 1999; Levy et al., 2001; Mechelli et al., 2006; Rogers et al., 2003).
To date, the emphasis of research on the organization of the ventral stream has been on
the stimulus properties that drive responses in a particular brain region, studied in rela
tive isolation from other regions. This approach was inherited from well-established tradi
tions in neurophysiology and psychophysics, where it has been enormously productive for
mapping psychophysical continua in primary sensory systems. It does not follow that the
same approach will yield equally useful insights for understanding the principles of the
neural organization of conceptual knowledge. The reason is that unlike the peripheral
sensory systems, the pattern of neural responses in higher order areas is only partially
driven by the physical input—it is also driven by how the stimulus is interpreted, and that
interpretation does not occur in a single, isolated region. The ventral object processing
stream is the central pathway for the extraction of object identity from visual information
in the primate brain—but what the brain does with that information about object identity
depends on how the ventral stream is connected to the rest of the brain.
On that view, specificity for faces in the lateral fusiform gyrus (fusiform face area; Martin & Weisberg, 2003; Pasley et al., 2004; Vuilleumier et al., 2004) arises because that region of the brain has connectivity with the amygdala and the superior temporal sulcus (among other regions), which are important for the extraction of socially relevant information and biological motion. Specificity for tools and manipulable objects in the medial fusiform gyrus is driven, in part, by connectivity between that region
and regions of parietal cortex that subserve object manipulation (Mahon et al., 2007;
Noppeney et al., 2006; Rushworth et al., 2006; Valyear & Culham, 2009). Connectivity-
based constraints may also be responsible for other effects of category specificity in the
ventral visual stream, such as connectivity between somatomotor areas and regions of the
ventral stream that differentially respond to body parts (extrastriate body area; Astafiev
et al., 2004; Orlov et al., 2010; Peelen & Caramazza, 2010), connectivity between left lat
eralized frontal language processing regions and ventral stream areas specialized for
printed words (visual word form area; Dehaene et al., 2005; Martin, 2006), and connectiv
ity between regions involved in spatial analysis and ventral stream regions showing dif
ferential responses to highly contextualized stimuli, such as houses, scenes, and large
nonmanipulable objects (parahippocampal place area; Bar & Aminoff, 2003).
Although recent discussion has noted the possibility that nonvisual dimensions may be
relevant in shaping the organization of the ventral stream (Cant et al., 2009; Grill-Spector
& Malach, 2004; Martin, 2007), those accounts have given far greater prominence to the
role of visual experience in their explanation of the causes of category-specific organiza
tion within the ventral stream. A number of hypotheses have been developed, and we
merely touch on them here to illustrate a common assumption: that the organization of
the ventral stream reflects the visual structure of the world, as interpreted by domain-
general processing constraints. Thus, the general thrust of those accounts is that the vi
sual structure of the world is correlated with semantic category distinctions in a way that
is captured by how visual information is organized in the brain. One of the most explicit
proposals is that there are weak eccentricity preferences in higher order visual areas that
are inherited from earlier stages in the processing stream. Those eccentricity biases in
teract with our experience of foveating some classes of items (e.g., faces) and viewing
others in the relative periphery (e.g., houses; Levy et al., 2001). Another class of propos
als is based on the supposition that items from the same category tend to look more simi
lar than items from different categories, and similarity in visual shape is mapped onto
ventral temporal occipital cortex (Haxby et al., 2001). It has also been proposed that a
given category may require differential processing relative to other categories, for in
stance, in terms of expertise (Gauthier et al., 1999), visual crowding (Rogers et al., 2005),
or the relevance of visual information for categorization (Mechelli et al., 2006). Still other
accounts (p. 567) appeal to “feature” similarity and distributed feature maps (Tyler et al.,
2003). Finally, it has been suggested that multiple, visually based dimensions of organiza
tion combine super-additively to generate the boundaries among category-preferring re
gions (Op de Beeck et al., 2008). Common to all of these accounts is the assumption that
visual experience provides the necessary structure, and that a visual dimension of organi
zation happens to be highly correlated with semantic category.
Although visual information is important in shaping how the ventral stream is organized,
recent findings indicate that visual experience is not necessary for the same, or similar,
patterns of category specificity to be present in the ventral stream. In an early positron
emission tomography (PET) study, Büchel and colleagues (1998) showed that congenitally blind subjects reading Braille words activate the same region of the ventral stream that sighted individuals activate for visually presented words. Pietrini and colleagues (2004)
used multivoxel pattern analyses to show that the pattern of activation over voxels in the
ventral stream was more consistent across different exemplars within a category than across exemplars from different categories. More recently, we (Mahon et al., 2009) have shown that the
same medial-to-lateral bias in category preferences on the ventral surface of the occipital
temporal cortex that is present in sighted individuals is present in congenitally blind sub
jects. Specifically, nonliving things, compared with animals, elicit stronger activation in
medial regions of the ventral stream (see Figure 27.3).
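The within- versus between-category pattern-consistency logic of those multivoxel analyses can be sketched with simulated data. The voxel counts, noise level, and category labels below are arbitrary assumptions, not a reanalysis of the cited studies:

```python
import numpy as np

rng = np.random.default_rng(1)
n_voxels = 200

# Each category gets a stable underlying voxel pattern; individual exemplars
# are noisy variants of it (simulated, illustrative data).
category_pattern = {c: rng.standard_normal(n_voxels)
                    for c in ("faces", "houses", "tools")}

def exemplar(category, noise=1.0):
    return category_pattern[category] + noise * rng.standard_normal(n_voxels)

def pattern_corr(a, b):
    return float(np.corrcoef(a, b)[0, 1])

# Correlate activation patterns for different exemplars of the same category
# versus exemplars drawn from different categories.
within = np.mean([pattern_corr(exemplar(c), exemplar(c))
                  for c in category_pattern])
between = np.mean([pattern_corr(exemplar("faces"), exemplar(other))
                   for other in ("houses", "tools")])
print(f"within-category r = {within:.2f}, between-category r = {between:.2f}")
```

Higher within-category than between-category pattern correlations are the signature that activation patterns carry category information, which is the form of evidence reported for both sighted and congenitally blind participants.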
Although these studies on category specificity in blind individuals represent only a first-
pass analysis of the role of visual experience in driving category specificity in the ventral
stream, they indicate that visual experience is not necessary for category specificity to
emerge in the ventral stream. This fact raises an important question—if visual experience
is not needed for the same topographical biases in category specificity to be present in
the ventral stream, then, what drives such organization? One possibility, as we have sug
gested, is innate connectivity between regions of the ventral stream and other regions of
the brain that process affective, motor, and conceptual information.
For example, when categorizing tools, functional connectivity between parietal-frontal somatomotor areas and prefrontal cortex may dominate,
whereas when categorizing faces other regions may express stronger functional coupling
to those same prefrontal regions. Such a suggestion would generate the expectation that
although damage to prefrontal-to-ventral stream connections may result in difficulties
categorizing all types of visual stimuli, disruption of the afferents to prefrontal cortex
from a specific category-preferring area could lead to categorization problems selective
to that domain.
Object-Associated Actions
The activation by tool stimuli of regions of the brain that mediate object-directed action
has been argued to follow naturally from the sensory/functional theory. On that theory,
the activation of dorsal structures by tool stimuli indexes the critical role of function
knowledge in the recognition of nonliving things (e.g., Boronat et al., 2004; Kellenbach et
al., 2003; Martin, 2000; Noppeney et al., 2006; Simmons & Barsalou, 2003). That argu
ment is weakened, however, to the extent that it can be demonstrated that the integrity of action knowledge is not necessary for having other types of knowledge about tools, such as their function.
Other neuropsychological data indicate that the integrity of action knowledge is not nec
essary for patients to have accurate knowledge of object function. Figure 27.5 depicts the
performance of patient WC (Buxbaum et al., 2000) on two picture-matching tasks. In a
picture-matching task that required knowledge of object manipulation, performance was
impaired; however, in a picture-matching task that required knowledge of object function,
performance was spared. Functional imaging studies (Boronat et al., 2004; Canessa et al.,
2008; Kellenbach et al., 2003) converge with those neuropsychological data in showing
that manipulation, but not function, knowledge modulates neural responses in the inferi
or parietal lobule. There is also evidence, from both functional neuroimaging (e.g., Canes
sa et al., 2008) and neuropsychology (e.g., Sirigu et al., 1991), that temporal, and not
parietal, cortex may be involved in the representation of function knowledge of objects.
The convergence between the neuropsychological evidence from apraxia and the func
tional imaging evidence indicates that although there is a dedicated system for knowl
edge of object manipulation, that system is not critically involved in representing knowl
edge of object function. This suggests that the automatic engagement of action process
ing by manipulable objects, as observed in neuroimaging, may have consequences for a
theory of pragmatics or action, but not necessarily for a theory of semantics (Goodale &
Milner, 1992; Jeannerod & Jacob, 2005). This in turn weakens the claim that automatic
activation of dorsal stream structures by manipulable objects is evidence for the sensory/
functional theory.
The first detailed articulation of the embodied cognition framework was by Alan Allport
(1985). Allport (1985) proposed that conceptual knowledge is organized according to sen
sory and motor modalities and that the information represented within different modali
ties was format specific:
The essential idea is that the same neural elements that are involved in coding
the sensory attributes of a (possibly unknown) object presented to eye or hand or
ear also make up the elements of the auto-associated activity-patterns that repre
sent familiar object-concepts in “semantic memory.” This model is, of course, in
radical opposition to the view, apparently held by many psychologists, that “se
mantic memory” is represented in some abstract, modality-independent, “concep
tual” domain remote from the mechanisms of perception and motor organization.
(Allport, 1985, p. 53, emphasis original)
One type of evidence, discussed above, that has been argued to support an embodied rep
resentation of object concepts is the observation that regions of the brain that directly
mediate object-directed action are automatically activated when participants observe ma
nipulable objects. However, the available neuropsychological evidence (see Figure 27.5)
reduces confidence in the claim that action knowledge plays a critical role in
(p. 569)
grounding the diverse types of knowledge that we have about tools. The strongest evi
dence for the relevance of motor and perceptual processes to conceptual processing is
provided by demonstrations that the sensory and motor systems are automatically en
gaged by linguistic stimuli that imply action (e.g., Buccino et al., 2005; Boulenger et al.,
2006; Glenberg & Kaschak, 2002; Oliveri et al., 2004). It has also been demonstrated that
activation of the motor system automatically spreads to conceptual and perceptual levels
of processing (e.g., Pulvermüller et al., 2005).
The embodied cognition hypothesis makes strong predictions about the integrity of con
ceptual processes after damage to sensory and motor processes. It predicts, necessarily,
and as Allport wrote, that “…the loss of particular attribute information in semantic mem
ory should be accompanied by a corresponding perceptual (agnostic) deficit” (p. 55,
(p. 570) emphasis original). Although there are long traditions within neuropsychology of
studying patients with deficits for sensory and motor knowledge, only recently have those
deficits been of such clear theoretical relevance to hypotheses about the nature of seman
tic memory. Systematic and theoretically informed studies of such patients will play a piv
otal role in evaluating the relation between sensory, motor, and conceptual knowledge.
Central to that enterprise will be to specify how information is dynamically exchanged be
tween systems, in the context of specific task requirements. This will be important for de
termining the degree to which sensory and motor activation is in fact a critical compo
nent of conceptual processing (see Machery, 2007; Mahon & Caramazza, 2008, for discus
sion). It is theoretically possible (and in our view, likely) that although concepts are not
exhausted by sensory and motor information, the organization of “abstract” concepts is
nonetheless shaped in important ways by the structure of the sensory and motor systems.
It is also likely, in our view, that processing of such “abstract” conceptual content is heav
ily interlaced with activation of the sensory and motor systems. We have referred to this
view as “grounding by interaction” (Mahon & Caramazza, 2008).
On one level, the patient clearly does retain the concept of hammer, and in this sense, the
concept of hammer is “symbolic,” “abstract,” and “qualitatively different” from the motor
“knowledge” that is compromised in the patient. On another level, when the patient in
stantiates the abstract and symbolic concept of hammer, that instantiation occurs in isolation
from sensory-motor information that, in the normal system, would go along with the in
stantiation of the concept.
Thus, on the one hand, there is a level of representation of meaning that is sufficiently
general and flexible that it may apply to inputs from diverse sensory modalities and be ex
pressed in action through diverse output modalities. The abstract and symbolic represen
tation of hammer could be accessed from touch, vision, or audition; similarly, that repre
sentation could be “expressed” by pantomiming the use of a hammer, producing the
sounds that make up the word hammer, writing the word hammer, and so on. In
short, there is a level of conceptual representation that is abstract and symbolic and that
is not exhausted by information represented in the sensory and motor systems.
On the other hand, conceptual information that is represented at an abstract and symbol
ic level does not, in and of itself, exhaust what we know about the world. What we know
about the world depends also on interactions between abstract conceptual content and
the sensory and motor systems. There are two ways in which such interactions may come
about. First, abstract and symbolic concepts can be activated by events in the world that
are processed by the sensory systems, and realize changes in the world through the mo
tor system (see Jeannerod & Jacob, 2005, for relevant discussion). Second, the instantia
tion of a given abstract and symbolic concept always occurs in a particular situation; as
such, the instantiation of that concept in that situation may involve highly specific senso
ry and motor processes.
Within the grounding by interaction framework, sensory and motor information colors
conceptual processing, enriches it, and provides it with a relational context. The activa
tion of the sensory and motor systems during conceptual processing serves to ground ab
stract and symbolic representations in the rich sensory and motor content that mediates
our physical interaction with the world. Of course, the specific sensory and motor infor
mation that is activated may change depending on the situation in which the abstract and
symbolic conceptual representation is instantiated.
On the grounding by interaction view, the specific sensory and motor information that
goes along with the instantiation of a concept is not constitutive of that concept. Of
course, that does not mean that that specific sensory and motor information is not impor
tant for the instantiation of a concept, in a particular way, at a given point in time. In
deed, (p. 571) such sensory and motor information may constitute, in part, that instantia
tion. A useful analogy in this regard is to linguistic processing. There is no upper limit (in
principle) on the number of completely novel sentences that a speaker may utter. This
fact formed one of the starting points for formal arguments against the behaviorist para
digm (Chomsky, 1959). Consider the (indefinite) set of sentences that a person may utter
in his life: Those sentences can have syntactic structures that are in no way specifically
tied to the particular words through which the expression of those syntactic structures
was realized. The syntax of a sentence is not exhausted by an account of the words of
which it is composed; this is the case even though it may be the first time that that syn
tactic structure has ever been produced, and even though the expression of that particu
lar syntactic structure clearly depended (de facto) on the presence of those particular
words. To close the analogy: Concepts “wear” sensory and motor information in the way
that the syntax of a sentence “wears” particular words.
Toward a Synthesis
We have organized this review around theoretical explanations of category specificity in
the human brain. One theme that emerges is the historical progression from theories
based on a single principle of organization to theories that integrate multiple dimensions
of organization. This progression is due to the broad recognition in the field that a single
dimension will not be sufficient to explain all aspects of the organization of object knowl
edge in the brain. However, not every dimension or principle of organization is of equal importance, because not all dimensions have the same explanatory scope. A rel
ative hierarchy of principles is therefore necessary to determine which of the many
known facts are theoretically important and which are of only marginal significance
(Caramazza & Mahon, 2003).
Two broad findings emerge from cognitive neuropsychological research. First, patients
have been reported with disproportionate impairments for a modality or type of knowl
edge (e.g., visual-perceptual knowledge—see Figure 27.2B; manipulation knowledge—see
Figure 27.5). Second, category-specific semantic deficits are associated with impairments
for all types of knowledge about the impaired category (see Figure 27.2A). Analogues to
those two facts are also found in functional neuroimaging. First, the attributes of some
categories of objects (e.g., tools) are differentially represented in modality specific sys
tems (i.e., motor systems). Second, within a given modality specific system (e.g., ventral
visual pathway) there is functional organization by semantic category (e.g., living animate
vs. nonliving; see Figure 27.4 for an overview). Thus, across both neuropsychological
studies and functional imaging studies, the broad empirical generalization emerges that
there are two, orthogonal, constraints on the organization of object knowledge: object do
main and sensory-motor modality. This empirical generalization is neutral with respect to
how one explains the causes of category-specific effects in both functional neuroimaging
and neuropsychology.
All current theories of the organization of conceptual knowledge assume that a concept is
composed of distinct types of information. This shared assumption permits an explanation
of how thinking about a single concept (e.g., hammer) can engage different regions of the
brain that process distinct types of information (e.g., sensory vs. motor). It also allows
for an account of how patients may present with impairments for a type or modality of
knowledge (e.g., know what a hammer looks like, but not know how to use it). However,
that assumption raises the question of how the different types of information that consti
tute a given concept are functionally unified. A central theoretical issue to be addressed
by the field is to understand the nature of the mechanisms that unify different types of
knowledge about the same entity in the world, and that give rise to a functionally unitary
concept of that entity.
Our view, which we have called the distributed domain-specific hypothesis (Caramazza & Mahon, 2003; Mahon & Caramazza, 2009, 2011), is that the organization of conceptual knowl
edge in the brain reflects the final product of a complex tradeoff of pressures, some of
which are expressed locally within a given brain region, and some of which are expressed
as connectivity between that region and the rest of the brain. Our suggestion is that con
nectivity within a domain-specific neural circuit is the first, or broadest, principle accord
ing to which conceptual knowledge is organized. For instance, visual motion properties of
living animate things are represented in a different region or system than visual form
properties of living animate things. In addition, affective properties of living animate
things may be represented by other functionally and neuroanatomically distinct systems.
However, all those types of information constitute the domain “living animate.” For that
reason, it is critical to specify the nature of the functional connectivity that relates pro
cessing across distinct subsystems specialized for different types of information. The ba
sic expectation of the distributed domain-specific hypothesis is that the functional con
nectivity that relates processing across distinct types of information (e.g., emotional val
ue versus visual form) will be concentrated around those domains that have had evolu
tionarily important histories. The strong prediction that follows from that view is that it is
those neural circuits that are disrupted or disorganized after brain damage in patients
with category-specific semantic deficits.
Behavior arises as a result of the integration of multiple cognitive processes that individual
ly operate over distinct types of knowledge. On the distributed domain-specific hypothe
sis, the distinct (and potentially modular) processes within the sensory, motor, and affec
tive systems are components of broader structures within the mind/brain. This framework
thus emphasizes the need to understand how different types of cognitive processes, oper
ating over different types of information, work in concert to orchestrate behavior.
In the more than 25 years since Warrington and colleagues’ first detailed reports of pa
tients with category-specific semantic deficits, new fields of study have emerged around
the study of the organization and representation of conceptual knowledge. Despite that
progress, the theoretical questions that currently occupy researchers are the same as
those that were initially framed and debated two decades ago: What are the principles of
neural organization that give rise to effects of category specificity? Are different types of
information involved in processing different semantic categories, and if so, what distin
guishes those different types of information? Future research will undoubtedly build on
the available theories and redeploy their individual assumptions within new theoretical
frameworks.
Author Note
Bradford Z. Mahon was supported in part by NIH training grant 5 T32 19942-13 and
R21NS076176-01A1; Alfonso Caramazza was supported by grant DC006842 from the Na
tional Institute on Deafness and Other Communication Disorders. Sections of this article
are drawn from three previous publications of the same authors: Mahon and Caramazza,
2008, 2009, and 2011. The authors are grateful to Erminio Capitani, Marcella Laiacona,
Alex Martin, and Daniel Schacter for their comments on an earlier draft.
References
Allport, D. A. (1985). Distributed memory, modular subsystems and dysphasia. In S. K.
Newman & R. Epstein (Eds.), Current perspectives in dysphasia. New York: Churchill Liv
ingstone.
Astafiev, S. V., et al. (2004). Extrastriate body area in human occipital cortex responds to
the performance of motor actions. Nature Neuroscience, 7, 542–548.
Bar, M., & Aminoff, E. (2003). Cortical analysis of visual context. Neuron, 38, 347–358.
Barbarotto, R., Capitani, E., & Laiacona, M. (1996). Naming deficit in herpes simplex en
cephalitis. Acta Neurologica Scandinavica, 93, 272–280.
Barbarotto, R., Capitani, E., Spinnler, H., & Trivelli, C. (1995). Slowly progressive seman
tic impairment with category specificity. Neurocase, 1, 107–119.
Barbarotto, R., Laiacona, M., Macchi, V., & Capitani, E. (2002). Picture reality decision,
semantic categories, and gender: A new set of pictures, with norms and an experimental
study. Neuropsychologia, 40, 1637–1653.
Beauchamp, M. S., Lee, K. E., Haxby, J. V., & Martin, A. (2002). Parallel visual motion pro
cessing streams for manipulable objects and human movements. Neuron, 24, 149–159.
Beauchamp, M. S., Lee, K. E., Haxby, J. V., & Martin, A. (2003). FMRI responses to video
and point-light displays of moving humans and manipulable objects. Journal of Cognitive
Neuroscience, 15, 991–1001.
Beauvois, M.-F. (1982). Optic aphasia: A process of interaction between vision and lan
guage. Philosophical Transactions of the Royal Society of London B, 298, 35–47.
Beauvois, M.-F., Saillant, B., Meininger, V., & Lhermitte, F. (1978). Bilateral tactile aphasia:
A tacto-verbal dysfunction. Brain, 101, 381–401.
Blundo, C., Ricci, M., & Miller, L. (2006). Category-specific knowledge deficit for animals
in a patient with herpes simplex encephalitis. Cognitive Neuropsychology, 23, 1248–1268.
Borgo, F., & Shallice, T. (2001). When living things and other “sensory-quality” categories
behave in the same fashion: A novel category-specific effect. Neurocase, 7, 201–220.
Borgo, F., & Shallice, T. (2003). Category specificity and feature knowledge: Evidence
from new sensory-quality categories. Cognitive Neuropsychology, 20, 327–353.
Boronat, C. B., Buxbaum, L. J., Coslett, H. B., Tang, K., Saffran, E. M., et al. (2004). Distinc
tions between manipulation and function knowledge of objects: Evidence from functional
magnetic resonance imaging. Cognitive Brain Research, 23, 361–373.
Boulenger, V., Roy, A. C., Paulignan, Y., Deprez, V., Jeannerod, M., & Nazir, T. A. (2006).
Cross-talk between language processes and overt motor behavior in the first 200 msec of
processing. Journal of Cognitive Neuroscience, 18, 1607–1615.
Brambati, S. M., Myers, D., Wilson, A., Rankin, K. P., Allison, S. C., Rosen, H. J., Miller, B.
L., & Gorno-Tempini, M. L. (2006). The anatomy of category-specific object naming in
neurodegenerative diseases. Journal of Cognitive Neuroscience, 18, 1644–1653.
Bright, P., Moss, H. E., Stamatakis, E. A., & Tyler, L. K. (2005). The anatomy of object pro
cessing: The role of anteromedial temporal cortex. Quarterly Journal of Experimental Psy
chology B, 58, 361–377.
Buccino, G., Riggio, L., Melli, G., Binkofski, F., Gallese, V., & Rizzolatti, G. (2005). Listen
ing to action related sentences modulates the activity of the motor system: A combined
TMS and behavioral study. Cognitive Brain Research, 24, 355–363.
Büchel, C., Price, C. J., & Friston, K. (1998). A multimodal language region in the ventral
visual pathway. Nature, 394, 274–277.
Buxbaum, L. J., Veramonti, T., & Schwartz, M. F. (2000). Function and manipulation tool
knowledge in apraxia: Knowing “what for” but not “how.” Neurocase, 6, 83–97.
Canessa, N., Borgo, F., Cappa, S. F., Perani, D., Falini, A., Buccino, G., Tettamanti, M., &
Shallice, T. (2008). The different neural correlates of action and functional knowledge in
semantic memory: An fMRI study. Cerebral Cortex, 18, 740–751.
Cant, J. S. et al. (2009) fMR-adaptation reveals separate processing regions for the per
ception of form and texture in the human ventral stream. Experimental Brain Research,
192, 391–405.
Cantlon, J. F., Platt, M., & Brannon, E. M. (2009). The number domain. Trends in Cogni
tive Sciences, 13 (2), 83–91.
Capitani, E., Laiacona, M., Mahon, B., & Caramazza, A. (2003). What are the facts of cate
gory-specific deficits? A critical review of the clinical evidence. Cognitive Neuropsycholo
gy, 20, 213–262.
Caramazza, A. (1986). On drawing inferences about the structure of normal cognitive sys
tems from the analysis of patterns of impaired performance: The case for single-patient
studies. Brain and Cognition, 5, 41–66.
Caramazza, A., Hillis, A. E., Rapp, B. C., & Romani, C. (1990). The multiple semantics hy
pothesis: Multiple confusions? Cognitive Neuropsychology, 7, 161–189.
Caramazza, A., & Mahon, B. Z. (2003). The organization of conceptual knowledge: The ev
idence from category-specific semantic deficits. Trends in Cognitive Sciences, 7, 354–361.
Caramazza, A., & Mahon, B. Z. (2006). The organisation of conceptual knowledge in the
brain: The future’s past and some future directions. Cognitive Neuropsychology, 23, 13–
38.
Caramazza, A., & Shelton, J. R. (1998). Domain specific knowledge systems in the brain:
The animate-inanimate distinction. Journal of Cognitive Neuroscience, 10, 1–34.
Carey, S., & Spelke, E. S. (1994). Domain specific knowledge and conceptual change. In
L. Hirschfeld & S. Gelman (Eds.), Mapping the mind: Domain specificity in cognition and
culture (pp. 169–200). Cambridge, UK: Cambridge University Press.
Carroll, E., & Garrard, P. (2005). Knowledge of living, nonliving and “sensory quality” cat
egories in semantic dementia. Neurocase, 11, 338–350.
Chao, L. L., Haxby, J. V., & Martin, A. (1999). Attribute-based neural substrates in posteri
or temporal cortex for perceiving and knowing about objects. Nature Neuroscience, 2,
913–919.
Chao, L. L., & Martin, A. (2000). Representation of manipulable man-made objects in the
dorsal stream. NeuroImage, 12, 478–484.
Chao, L. L., Weisberg, J., & Martin, A. (2002). Experience-dependent modulation of cate
gory related cortical activity. Cerebral Cortex, 12, 545–551.
Coslett, H. B., & Saffran, E. M. (1992). Optic aphasia and the right hemisphere: A replica
tion and extension. Brain and Language, 43, 148–161.
Crawford, J. R., & Garthwaite, P. H. (2006). Methods of testing for a deficit in single case
studies: Evaluation of statistical power by Monte Carlo simulation. Cognitive Neuropsy
chology, 23, 877–904.
Cree, G. S., & McRae, K. (2003). Analyzing the factors underlying the structure and com
putation of the meaning of chipmunk, cherry, chisel, cheese, and cello and many other
such concrete nouns. Journal of Experimental Psychology: General, 132, 163–201.
Crutch, S. J., & Warrington, E. K. (2003). The selective impairment of fruit and vegetable knowledge: A multiple processing channels account of fine-grain category specificity. Cognitive Neuropsychology, 20, 355–372.
Culham, J. C., Danckert, S. L., DeSouza, J. F. X., Gati, J. S., Menon, R. S., & Goodale, M. A.
(2003). Visually guided grasping produces fMRI activation in dorsal but not ventral
stream brain areas. Experimental Brain Research, 153, 180–189.
Damasio, H., Grabowski, T. J., Tranel, D., & Hichwa, R. D. (1996). A neural basis for lexi
cal retrieval. Nature, 380, 499–505.
Damasio, H., Tranel, D., Grabowski, T., Adolphs, R., & Damasio, A. (2004). Neural systems
behind word and concept retrieval. Cognition, 92, 179–229.
Dehaene, S., Cohen, L., Sigman, M., & Vinckier, F. (2005). The neural code for written
words: A proposal. Trends in Cognitive Sciences, 9, 335–341.
Devlin, J., Gonnerman, L., Andersen, E., & Seidenberg, M. (1998). Category-specific se
mantic deficits in focal and widespread brain damage: A computational account. Journal
of Cognitive Neuroscience, 10, 77–94.
Devlin, J. T., Moore, C. J., Mummery, C. J., Gorno-Tempini, M. L., Phillips, J. A., Noppeney,
U., Frackowiak, R. S. J., Friston, K. J., & Price, C. J. (2002). Anatomic constraints on cogni
tive theories of category-specificity. NeuroImage, 15, 675–685.
Dixon, M. J., Piskopos, M., & Schweizer, T. A. (2000). Musical instrument naming impair
ments: The crucial exception to the living/nonliving dichotomy in category-specific ag
nosia. Brain and Cognition, 43, 158–164.
Duchaine, B. C., & Yovel, G. (2008). Face recognition. The Senses: A Comprehensive Ref
erence, 2, 329–357.
Duchaine, B. C., Yovel, G., Butterworth, E. J., & Nakayama, K. (2006). Prosopagnosia as
an impairment to face-specific mechanisms: Elimination of the alternative hypotheses in a
developmental case. Cognitive Neuropsychology, 23, 714–747.
Eggert, G. H. (1977). Wernicke’s works on aphasia: A sourcebook and review (Vol. 1). The
Hague: Mouton.
Ellis, A. W., Young, A. W., & Critchley, A. M. R. (1989). Loss of memory for people follow
ing temporal lobe damage. Brain, 112, 1469–1483.
Epstein, R., & Kanwisher, N. (1998). A cortical representation of the local visual environ
ment. Nature, 392, 598–601.
Fang, F., & He, S. (2005). Cortical responses to invisible objects in the human dorsal and
ventral pathways. Nature Neuroscience, 8, 1380–1385.
Farah, M., & McClelland, J. (1991). A computational model of semantic memory impair
ment: modality specificity and emergent category specificity. Journal of Experimental Psy
chology: General, 120, 339–357.
Farah, M. J., & Rabinowitz, C. (2003). Genetic and environmental influences on the orga
nization of semantic memory in the brain: Is “living things” an innate category? Cognitive
Neuropsychology, 20, 401–408.
Feigenson, L., Dehaene, S., & Spelke, E. S. (2004). Core systems of number. Trends in
Cognitive Sciences, 8, 307–314.
Felleman, D. J., & Van Essen, D. C. (1991). Distributed hierarchical processing in primate
visual cortex. Cerebral Cortex, 1, 1–47.
Funnell, E., & Sheridan, J. (1992). Categories of knowledge: Unfamiliar aspects of living
and nonliving things. Cognitive Neuropsychology, 9, 135–153.
Gainotti, G. (2000). What the locus of brain lesion tells us about the nature of the cogni
tive defect underlying category-specific disorders: A review. Cortex, 36, 539–559.
Gallese, V., & Lakoff, G. (2005). The brain’s concepts: The role of the sensory-motor sys
tem in reason and language. Cognitive Neuropsychology, 22, 455–479.
Garrard, P., Patterson, K., Watson, P. C., & Hodges, J. R. (1998). Category-specific seman
tic loss in dementia of Alzheimer’s type: Functional-anatomical correlations from cross-
sectional analyses. Brain, 121, 633–646.
Gauthier, I., Skudlarski, P., Gore, J. C., & Anderson, A. W. (2000). Expertise for cars and
birds recruits brain areas involved in face recognition. Nature Neuroscience, 3, 191–197.
Gauthier, I., et al. (1999) Activation of the middle fusiform “face area” increases with ex
pertise in recognizing novel objects. Nature Neuroscience, 2, 568–573.
Gelman, R. (1990). First principles organize attention to and learning about relevant da
ta: Number and the animate-inanimate distinction as examples. Cognitive Science, 14,
79–106.
Goodale, M. A., & Milner, A. D. (1992). Separate visual pathways for perception and ac
tion. Trends in Neurosciences, 15, 20–25.
Grill-Spector, K., & Malach, R. (2004). The human visual cortex. Annual Review Neuro
science, 27, 649–677.
Hart, J., Anand, R., Zoccoli, S., Maguire, M., Gamino, J., Tillman, G., King, R., & Kraut, M.
A. (2007). Neural substrates of semantic memory. Journal of the International Neuropsy
chology Society, 13, 865–880.
Hart, J., Jr., Berndt, R. S., & Caramazza, A. (1985). Category-specific naming deficit fol
lowing cerebral infarction. Nature, 316, 439–440.
Haxby, J. V., Gobbini, M. I., Furey, M. L., Ishai, A., Schouten, J. L., & Pietrini, P. (2001).
Distributed and overlapping representations of faces and objects in ventral temporal cor
tex. Science, 293, 2425–2430.
Hécaen, H., & De Ajuriaguerra, J. (1956). Agnosie visuelle pour les objets inanimés par lésion unilatérale gauche. Revue Neurologique, 94, 222–233.
Hermer, L., & Spelke, E. S. (1994). A geometric process for spatial reorientation in young
children. Nature, 370, 57–59.
Hillis, A. E., & Caramazza, A. (1991). Category-specific naming and comprehension im
pairment: A double dissociation. Brain, 114, 2081–2094.
Hillis, A. E., & Caramazza, A. (1995). Cognitive and neural mechanisms underlying visual
and semantic processing: Implications from “optic aphasia.” Journal of Cognitive Neuro
science, 7, 457–478.
Humphreys, G. W., & Forde, E. M. E. (2005). Naming a giraffe but not an animal: Base-
level but not superordinate naming in a patient with impaired semantics. Cognitive Neu
ropsychology, 22, 539–558.
Ishai, A., Ungerleider, L. G., Martin, A., Schouten, J. L., & Haxby, J. V. (1999). Distributed
representation of objects in the human ventral visual pathway. Proceedings of the Nation
al Academy of Sciences U S A, 96, 9379–9384.
Jeannerod, M., & Jacob, P. (2005). Visual cognition: A new look at the two-visual systems
model. Neuropsychologia, 43, 301–312.
Johnson-Frey, S. H. (2004). The neural bases of complex tool use in humans. Trends in
Cognitive Sciences, 8, 71–78.
Kable, J. W., Lease-Spellmeyer, J., & Chatterjee, A. (2002). Neural substrates of action
event knowledge. Journal of Cognitive Neuroscience, 14, 795–805.
Kellenbach, M. L., Brett, M., & Patterson, K. (2003). Actions speak louder than functions:
The importance of manipulability and action in tool representation. Journal of Cognitive
Neuroscience, 15, 20–46.
Kemmerer, D., Gonzalez Castillo, J., Talavage, T., Patterson, S., & Wiley, C. (2008). Neu
roanatomical distribution of five semantic components of verbs: Evidence from fMRI.
Brain and Language, 107, 16–43.
Kiani, R., Esteky, H., Mirpour, K., & Tanaka, K. (2007). Object category structure in re
sponse patterns of neuronal population in monkey inferior temporal cortex. Journal of
Neurophysiology, 97, 4296–4309.
Kveraga, K., et al. (2007). Magnocellular projections as the trigger of top-down facilita
tion in recognition. Journal of Neuroscience, 27, 13232–13240.
Laiacona, M., Barbarotto, R., & Capitani, E. (1993). Perceptual and associative knowledge
in category specific impairment of semantic memory: A study of two cases. Cortex, 29,
727–740.
Laiacona, M., Barbarotto, R., & Capitani, E. (1998). Semantic category dissociation in
naming: Is there a gender effect in Alzheimer disease? Neuropsychologia, 36, 407–419.
Laiacona, M., Barbarotto, R., & Capitani, E. (2005). Animals recover but plant life knowl
edge is still impaired 10 years after herpetic encephalitis: the long-term follow-up of a pa
tient. Cognitive Neuropsychology, 22, 78–94.
Laiacona, M., Barbarotto, R., & Capitani, E. (2006). Human evolution and the brain repre
sentation of semantic knowledge: Is there a role for sex differences? Evolution and Hu
man Behaviour, 27, 158–168.
Laiacona, M., & Capitani, E. (2001). A case of prevailing deficit on nonliving categories or
a case of prevailing sparing of living categories? Cognitive Neuropsychology, 18, 39–70.
Laiacona, M., Capitani, E., & Caramazza, A. (2003). Category-specific semantic deficits do
not reflect the sensory-functional organisation of the brain: A test of the “sensory-quality”
hypothesis. Neurocase, 9, 221–231.
Lambon Ralph, M. A., Howard, D., Nightingale, G., & Ellis, A. W. (1998). Are living and
non-living category-specific deficits causally linked to impaired perceptual or associative
knowledge? Evidence from a category-specific double dissociation. Neurocase, 4, 311–
338.
Lambon Ralph, M. A., Lowe, C., & Rogers, T. T. (2007). Neural basis of category-specific
semantic deficits for living things: Evidence from semantic dementia, HSVE and a neural
network model. Brain, 130, 1127–1137.
Laws, K. R., & Gale, T. M. (2002). Category-specific naming and the “visual” characteris
tics of line drawn stimuli. Cortex, 38, 7–21.
Laws, K. R., & Neve, C. (1999). A “normal” category-specific advantage for naming living
things. Neuropsychologia, 37, 1263–1269.
Levy, I., Hasson, U., Avidan, G., Hendler, T., & Malach, R. (2001). Center-periphery organi
zation of human object areas. Nature Neuroscience, 4, 533–539.
Lewis, J. W. (2006). Cortical networks related to human use of tools. Neuroscientist, 12,
211–231.
Lhermitte, F., & Beauvois, M.-F. (1973). A visual speech disconnection syndrome: Report
of a case with optic aphasia, agnosic alexia and color agnosia. Brain, 96, 695–714.
Luzzatti, C., & Davidoff, J. (1994). Impaired retrieval of object-color knowledge with pre
served color naming. Neuropsychologia, 32, 1–18.
Lyons, F., Kay, J., Hanley, J. R., & Haslam, C. (2006). Selective preservation of memory for
people in the context of semantic memory disorder: Patterns of association and dissocia
tion. Neuropsychologia, 44, 2887–2898.
Mahon, B. Z., & Caramazza, A. (2003). Constraining questions about the organisation and
representation of conceptual knowledge. Cognitive Neuropsychology, 20, 433–450.
Mahon, B. Z., & Caramazza, A. (2005). The orchestration of the sensory-motor systems:
Clues from neuropsychology. Cognitive Neuropsychology, 22, 480–494.
Mahon, B. Z., & Caramazza, A. (2008). A critical look at the embodied cognition hypothe
sis and a new proposal for grounding conceptual content. Journal of Physiology—Paris,
102, 50–70.
Mahon, B. Z., & Caramazza, A. (2011). The distributed domain-specific hypothesis. Trends
in Cognitive Sciences. In press.
Mahon, B. Z., Milleville, S., Negri, G. A. L., Rumiati, R. I., Martin, A., & Caramazza, A.
(2007). Action-related properties of objects shape object representations in the ventral
stream. Neuron, 55, 507–520.
Mahon, B. Z., et al. (2009) Category-specific organization in the human brain does not re
quire visual experience. Neuron, 63, 397–405.
Mahon, B. Z., & Caramazza, A. (2009). Concepts and categories: A cognitive neuropsy
chological perspective. Annual Review of Psychology, 60, 1–15.
Martin, A. (2006). Shades of Déjerine—Forging a causal link between the visual word
form area and reading. Neuron, 50, 173–175.
Martin, A. (2007). The representation of object concepts in the brain. Annual Review Psy
chology, 58, 25–45.
Martin, A., Haxby, J. V., Lalonde, F. M., Wiggs, C. L., & Ungerleider, L. G. (1995). Discrete
cortical regions associated with knowledge of color and knowledge of action. Science,
270, 102–105.
Martin, A., & Weisberg, J. (2003). Neural foundations for understanding social and me
chanical concepts. Cognitive Neuropsychology, 20, 575–587.
McClelland, J. L., & Rogers, T. T. (2003). The parallel distributed processing approach to
semantic cognition. Nature Reviews Neuroscience, 4, 310–322.
Mechelli, A., Sartori, G., Orlandi, P., & Price, C. J. (2006). Semantic relevance explains
category effects in medial fusiform gyri. NeuroImage, 3, 992–1002.
Miceli, G., Capasso, R., Daniele, A., Esposito, T., Magarelli, M., & Tomaiuolo, F.
(2000). Selective deficit for people’s names following left temporal damage: An impair
ment of domain-specific conceptual knowledge. Cognitive Neuropsychology, 17, 489–516.
Miceli, G., Fouch, E., Capasso, R., Shelton, J. R., Tamaiuolo, F., & Caramazza, A. (2001).
The dissociation of color from form and function knowledge. Nature Neuroscience, 4,
662–667.
Miller, E. K., et al. (2003). Neural correlates of categories and concepts. Current Opinion
in Neurobiology, 13, 198–203.
Milner, A. D., Perrett, D. I., Johnson, R. S., Benson, O. J., Jordan, T. R., et al. (1991). Per
ception and action in “visual form agnosia.” Brain, 114, 405–428.
Mitchell, J. P., Heatherton, T. F., & Macrae, C. N. (2002). Distinct neural systems subserve
person and object knowledge. Proceedings of the National Academy of Sciences U S A,
99, 15238–15243.
Moreaud, O., Charnallet, A., & Pellat, J. (1998). Identification without manipulation: A
study of the relations between object use and semantic memory. Neuropsychologia, 36,
1295–1301.
Morris, J. S., Öhman, A., & Dolan, R. J. (1999). A subcortical pathway to the right amyg
dala mediating “unseen” fear. Proceedings of the National Academy of Sciences U S A, 96,
1680–1685.
Moscovitch, M., Winocur, G., & Behrmann, M. (1997). What is special about face recogni
tion? Nineteen experiments on a person with visual object agnosia and dyslexia but with
normal face recognition. Journal of Cognitive Neuroscience, 9, 555–604.
Moss, H. E., Tyler, L. K., Durrant-Peatfield, M., & Bunn, E. M. (1998). “Two eyes of a see-
through”: Impaired and intact semantic knowledge in a case of selective deficit for living
things. Neurocase, 4, 291–310.
Negri, G. A. L., Rumiati, R. I., Zadini, A., Ukmar, M., Mahon, B. Z., & Caramazza, A.
(2007). What is the role of motor simulation in action and object recognition? Evidence
from apraxia. Cognitive Neuropsychology, 24, 795–816.
Noppeney, U., Price, C. J., Penny, W. D., & Friston, K. J. (2006). Two distinct neural mecha
nisms for category-selective responses. Cerebral Cortex, 16, 437–445.
Ochipa, C., Rothi, L. J. G., & Heilman, K. M. (1989). Ideational apraxia: A deficit in tool se
lection and use. Annals of Neurology, 25, 190–193.
Oliveri, M., Finocchiaro, C., Shapiro, K., Gangitano, M., Caramazza, A., & Pascual-Leone,
A. (2004). All talk and no action: A transcranial magnetic stimulation study of motor cor
tex activation during action word production. Journal of Cognitive Neuroscience, 16, 374–
381.
Op de Beeck, H. P., Haushofer, J., & Kanwisher, N. G. (2008). Interpreting fMRI data:
Maps, modules, and dimensions. Nature Reviews Neuroscience, 9, 123–135.
Orlov, T., et al. (2010). Topographic representation of the human body in the occipitotem
poral cortex. Neuron, 68, 586–600.
Pasley, B. N., Mayes, L. C., & Schultz, R. T. (2004). Subcortical discrimination of unper
ceived objects during binocular rivalry. Neuron, 42, 163–172.
Patterson, K., Nestor, P. J., & Rogers, T. T. (2007). Where do you know what you know? The representation of semantic knowledge in the brain. Nature Reviews Neuroscience, 8,
976–988.
Peelen, M. V., & Caramazza, A. (2010) What body parts reveal about the organization of
the brain. Neuron, 68, 331–333.
Pietrini, P., et al. (2004) Beyond sensory images: Object-based representation in the hu
man ventral pathway. Proceedings of the National Academy of Sciences U S A, 101, 5658–
5663.
Pisella, L., Binkofski, F., Lasek, K., Toni, I., & Rossetti, Y. (2006). No double-dissociation
between optic ataxia and visual agnosia: Multiple sub-streams for multiple visuo-manual
integrations. Neuropsychologia, 44, 2734–2748.
Polk, T. A., Park, J., Smith, M. R., & Park, D. C. (2007). Nature versus nurture in ventral vi
sual cortex: A functional magnetic resonance imaging study of twins. Journal of Neuro
science, 27, 13921–13925.
Prinz, J. J. (2002). Furnishing the mind: Concepts and their perceptual basis. Cambridge,
MA: MIT Press.
Pulvermüller, F. (2005). Brain mechanisms linking language and action. Nature Reviews
Neuroscience, 6, 576–582.
Pulvermüller, F., Hauk, O., Nikolin, V. V., & Ilmoniemi, R. J. (2005). Functional links be
tween language and motor systems. European Journal of Neuroscience, 21, 793–797.
Rapcsak, S. Z., Ochipa, C., Anderson, K. C., & Poizner, H. (1995). Progressive ideomotor
apraxia: Evidence for a selective impairment in the action production system. Brain and
Cognition, 27, 213–236.
Riddoch, M. J., Humphreys, G. W., Coltheart, M., & Funnell, E. (1988). Semantic systems
or system? Neuropsychological evidence re-examined. Cognitive Neuropsychology, 5, 3–
25.
Rogers, T. T., Hocking, J., Mechelli, A., Patterson, K., & Price, C. J. (2005). Fusiform activa
tion to animals is driven by the process, not the stimulus. Journal of Cognitive Neuro
science, 17, 434–445.
Rogers, T. T., Lambon Ralph, M. A., Garrard, P., Bozeat, S., McClelland, J. L., et al. (2004).
Structure and deterioration of semantic memory: A neuropsychological and computation
al investigation. Psychological Review, 111, 205–235.
Rosci, C., Chiesa, V., Laiacona, M., & Capitani, E. (2003). Apraxia is not associated to a
disproportionate naming impairment for manipulable objects. Brain and Cognition, 53,
412–415.
Rothi, L. J., Ochipa, C., & Heilman, K. M. (1991). A cognitive neuropsychological model of
limb praxis. Cognitive Neuropsychology, 8, 443–458.
Rumiati, R. I., Zanini, S., & Vorano, L. (2001). A form of ideational apraxia as a selective
deficit of contention scheduling. Cognitive Neuropsychology, 18, 617–642.
Sacchett, C., & Humphreys, G. W. (1992). Calling a squirrel a squirrel but a canoe a wig
wam: A category-specific deficit for artifactual objects and body parts. Cognitive Neu
ropsychology, 9, 73–86.
Samson, D., & Pillon, A. (2003). A case of impaired knowledge for fruit and vegetables. Cognitive Neuropsychology, 20, 373–400.
Sartori, G., & Lombardi, L. (2004). Semantic relevance and semantic disorders. Journal of
Cognitive Neuroscience, 16, 439–452.
Sartori, G., Lombardi, L., & Mattiuzzi, L. (2005). Semantic relevance best predicts normal
and abnormal name retrieval. Neuropsychologia, 43, 754–770.
Shelton, J. R., Fouch, E., & Caramazza, A. (1998). The selective sparing of body part
knowledge: A case study. Neurocase, 4, 339–351.
Silveri, M. C., Gainotti, G., Perani, D., Cappelletti, J. Y., Carbone, G., & Fazio, F. (1997).
Naming deficit for non-living items: Neuropsychological and PET study. Neuropsychologia,
35, 359–367.
Sirigu, A., Duhamel, J., & Poncet, M. (1991). The role of sensorimotor experience in object
recognition. Brain, 114, 2555–2573.
Spelke, E. S., Breinlinger, K., Macomber, J., & Jacobson, K. (1992). Origins of knowledge.
Psychological Review, 99, 605–632.
Stewart, F., Parkin, A. J., & Hunkin, N. M. (1992). Naming impairments following recovery
from herpes simplex encephalitis. Quarterly Journal of Experimental Psychology A, 44,
261–284.
Strnad, L., Anzellotti, S., & Caramazza, A. (2011). Formal models of categorization: In
sights from cognitive neuroscience. In E. M. Pothos & A. J. Wills (Eds.), Formal approach
es in categorization (pp. 313–324). Cambridge, UK: Cambridge University Press.
Tanaka, K., et al. (1991). Coding visual images of objects in the inferotemporal cortex of
the macaque monkey. Journal of Neurophysiology, 66, 170–189.
Thompson-Schill, S. L., Aguirre, G. K., D’Esposito, M., & Farah, M. J. (1999). A neural ba
sis for category and modality specificity of semantic knowledge. Neuropsychologia, 37,
671–676.
Tranel, D., Logan, C. G., Frank, R. J., & Damasio, A. R. (1997). Explaining category-relat
ed effects in the retrieval of conceptual and lexical knowledge of concrete entities: Opera
tionalization and analysis of factor. Neuropsychologia, 35, 1329–1339.
Turnbull, O. H., & Laws, K. R. (2000). Loss of stored knowledge of object structure: Impli
cation for “category-specific” deficits. Cognitive Neuropsychology, 17, 365–389.
Tyler, L. K., & Moss, H. E. (2001). Towards a distributed account of conceptual knowl
edge. Trends in Cognitive Sciences, 5, 244–252.
Tyler, L. K., et al. (2003). Do semantic categories activate distinct cortical regions? Evi
dence for a distributed neural semantic system. Cognitive Neuropsychology, 20, 541–559.
Valyear, K. F., & Culham, J. C. (2009). Observing learned object-specific functional grasps
preferentially activates the ventral stream. Journal of Cognitive Neuroscience, 22, 970–
984.
Vinson, D. P., Vigliocco, G., Cappa, S., & Siri, S. (2003). The breakdown of semantic
knowledge: Insights from a statistical model of meaning representation. Brain and Lan
guage, 86, 347–365.
Vuilleumier, P., et al. (2004) Distant influences of amygdala lesion on visual cortical acti
vation during emotional face processing. Nature Neuroscience, 7, 1271–1278.
Warrington, E. K., & McCarthy, R. (1983). Category specific access dysphasia. Brain, 106,
859–878.
Warrington, E. K., & Shallice, T. (1984). Category specific semantic impairment. Brain,
107, 829–854.
Zannino, G. D., Perri, R., Carlesimo, G. A., Pasqualetti, P., & Caltagirone, C. (2002). Cate
gory-specific impairment in patients with Alzheimer’s disease as a function of disease
severity: A cross-sectional investigation. Neuropsychologia, 40, 2268–2279.
Zannino, G. D., Perri, R., Pasqualetti, P., Caltagirone, C., & Carlesimo, G. A. (2006). Analy
sis of the semantic representations of living and nonliving concepts: A normative study.
Cognitive Neuropsychology, 23, 515–540.
Bradford Z. Mahon
Alfonso Caramazza
Alfonso Caramazza is Daniel and Amy Starch Professor of Psychology at Harvard University.
A Parallel Architecture Model of Language Processing
The Parallel Architecture is a linguistic theory in which (1) the generative power of lan
guage is divided among phonology, syntax, and semantics; and (2) words, idioms, morpho
logical affixes, and phrase structure rules are stored in the lexicon in a common format.
This formalization leads to a theory of language processing in which the “competence”
grammar is put to work directly in “performance,” and which is compatible with psy
cholinguistic evidence concerning lexical retrieval, incremental and parallel processing,
syntactic priming, and the integration of visual context with semantic processing.
A theory of language processing has to explain how language users convert sounds into
meanings in language perception and how they convert meanings into sounds in lan
guage production. In particular, the theory has to describe what language users store in
long-term memory that enables them to do this, and how the material stored in memory is
brought to bear in understanding and creating novel utterances in real time.
These observations already suffice to call into question connectionist models of language
perception whose success is judged by their ability to predict the next word of a sen
tence, given some finite preceding context (e.g., Elman, 1990; MacDonald & Christiansen,
2002; and, as far as I can determine, Tabor & Tanenhaus, 1999). The implicit theory of
language behind such models is that well-formed language is characterized only by the
statistical distribution of word sequencing. To be sure, statistics of word sequencing are
sometimes symptomatic of meaning relations, but they do not constitute meaning rela
tions. Consider the previous sentence: (1) How could a processor predict the full succes
sion of words, and (2) what good would such predictions do in understanding the sen
tence? Moreover, predicting the next word has no bearing whatsoever on an explanation
of speech production, where the goal has to be to produce the next word in an effort to
say something meaningful.
More generally, we have known since Chomsky (1957) and Miller and Chomsky (1963)
that sequential dependencies among words in a sentence are not sufficient to determine
understanding or even grammaticality. For instance, in (1),
(1) Does the little boy in the yellow hat who Mary described as a genius like
ice cream?
the fact that the italicized verb is like rather than likes is determined by the presence of does,
fourteen words away; and we would have no difficulty making the distance longer. What is signif
icant here is not the distance in words; it is the distance in noun phrases (NPs)—the fact that
does is one NP away from like. This relation is not captured in Elman-style recurrent networks,
which (as pointed out by many critics over the past twenty years) take account only of word se
quence and have no representation of global structure.
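The contrast between distance in words and distance in structure can be made concrete in a short sketch. This is illustrative only: the toy corpus and its counts are my own, not the chapter's.

```python
from collections import Counter

# A hypothetical toy corpus; a bigram model sees only adjacent word pairs.
corpus = ("the boy likes ice cream . does the boy like ice cream ? "
          "the genius likes music . the boys like music .").split()
bigrams = Counter(zip(corpus, corpus[1:]))

# In example (1) the local context before the target verb is "genius".
# Local statistics actually favor the ungrammatical "likes" here:
print(bigrams[("genius", "like")], bigrams[("genius", "likes")])  # 0 1
```

The bare form *like* is fixed by *does*, one NP (but fourteen words) away; no amount of local word-sequence statistics recovers that structural fact.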
Other issues with connectionist models of language processing will arise below. However,
my main focus here is the Parallel Architecture, to which we now turn.
A major theoretical development in the 1970s (e.g., Goldsmith, 1979; Liberman & Prince,
1977) showed that phonology has its own units and principles of combination, incommen
surate with syntactic units, though correlated with them. For instance, consider the sen
tence in (2).
(2) Syntax:
[NP Sesame Street] [VP is [NP a production
[of [NP the Children’s Television Workshop]]]]
Phonology:
The syntactic structure of (2) consists of a noun phrase (NP), Sesame Street, followed
by a verb phrase (VP), the rest of the sentence. The VP in turn consists of the verb
is plus another NP. This NP has embedded within it a further NP, the Children’s Television
Workshop. However, the way the sentence is pronounced does not necessarily conform to
this structure: It can be broken up into intonation contours (or breath groups) in various
ways, two of which are illustrated in (2). Some of these units, for instance, Sesame Street
is a production of in the first version, do not correspond to any syntactic constituent;
moreover, this unit cannot be classified as an NP or a VP, because it cuts across the
boundaries of both. Another such example is the familiar This is the cat that chased the
rat that ate the cheese. This has relentlessly right-embedded syntax; but the intonation is
a flat structure with three parallel parts, only the last of which corresponds to a syntactic
constituent.
The proper way to characterize the pronunciation of these examples is in terms of Intona
tion Phrases (IPs), phonological units over which intonation contours and the position of
pauses are defined. The pattern of intonation phrases is to some degree independent of
syntactic structure, as seen from the two possibilities in (2). Nevertheless it is not entire
ly free. For instance, (3) is not a possible pronunciation of this sentence.
Thus there is some correlation between phonological and syntactic structure, which a
theory of intonation needs to characterize. A first approximation to the proper account for
English appears to be the following principles, stated very informally (Gee & Grosjean,
1983; Hirst, 1993; Jackendoff, 1987; Truckenbrodt, 1999)2:
(4)
a. An Utterance is composed of a string of IPs; IPs do not embed in each other.
b. An IP must begin at the beginning of a syntactic constituent. It may end be
fore the syntactic constituent does, but it may not go beyond the end of the
largest syntactic constituent that it starts.
Inspection will verify that (2) observes this matching. But (3) does not, because the sec
ond IP begins with the noun Street, and there is no constituent starting with Street that
also contains is a.
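Principle (4b) can be stated as a small checker over word-index spans. This is a minimal sketch under an encoding of my own, not the chapter's formalism; the constituent spans assumed for (2) are listed in the comments.

```python
# Constraint (4b): an IP must start where some syntactic constituent starts,
# and may not extend past the end of the largest constituent starting there.
# Constituents and IPs are half-open (start, end) spans over word positions.

def ip_ok(ip, constituents):
    start, end = ip
    ends = [c_end for c_start, c_end in constituents if c_start == start]
    return bool(ends) and end <= max(ends)

# Sentence (2): Sesame(0) Street(1) is(2) a(3) production(4) of(5)
# the(6) Children's(7) Television(8) Workshop(9). Assumed spans:
constituents = [
    (0, 10),  # S
    (0, 2),   # NP: Sesame Street
    (2, 10),  # VP: is a production of ...
    (3, 10),  # NP: a production of ...
    (5, 10),  # PP: of the Children's Television Workshop
    (6, 10),  # NP: the Children's Television Workshop
]

print(ip_ok((0, 6), constituents))  # True: "Sesame Street is a production of"
print(ip_ok((1, 6), constituents))  # False: no constituent starts at "Street"
```

The second call mirrors why (3) is impossible: an IP beginning with *Street* starts no constituent at all.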
These examples illustrate that phonological structure requires its own set of basic units
and combinatorial principles such as (4a). It is generative in the same sense that syntax
is, though not recursive. In addition, because units of phonological structure such as IPs
cannot be derived from syntactic structure, the grammar needs principles such as (4b)
that stipulate how phonological and syntactic structures can be correlated. In the Parallel
Architecture, these are called interface rules.3
Several different and incompatible approaches to semantics developed during the 1970s
and 1980s: formal semantics (Chierchia & McConnell-Ginet, 1990; Lappin, 1996; Partee,
1976), Cognitive Grammar (Lakoff, 1987; Langacker, 1987; Talmy, 1988), Conceptual Se
mantics (Jackendoff, 1983, 1990; Pinker, 1989, 2007), and approaches growing out of cog
nitive psychology (Collins & Quillian, 1969; Rosch & Mervis, 1975; Smith & Medin, 1981;
Smith, Shoben, & Rips, 1974) and artificial intelligence (Schank, 1975). Whatever radical
differences among them, they implicitly agreed on one thing: Meanings of sentences are
not made up of syntactic units such as verbs, noun phrases, and prepositions. Rather,
they are combinations of specifically semantic units such as (conceptualized) individuals,
events, times, places, properties, and quantifiers, none of which correspond one to one
with syntactic units; and these semantic units are combined according to principles that
are specific to semantics and distinct from syntactic principles. This means that seman
tics, like phonology, must be an independent generative system, not strictly derivable
from syntactic structure, but only correlated with it. The correlation between syntax and
semantics again takes the form of interface rules that state the connection between the
two types of mental representation.
The Parallel Architecture, unlike mainstream generative grammar (MGG) and most other linguistic theories
incorporated into processing models, incorporates a rich and explicit theory of semantics: Conceptual
Semantics (Jackendoff, 1983, 1990, 2002, chapters 9–12). This theory is what makes it
possible to explore the ways in which syntax does and does not match up with meaning
and the ways in which semantics interfaces with other sorts of cognitive capacities, both
perception and “world knowledge.”
Granting semantics its independence from syntax makes sense both psychologically and
biologically. Sentence meanings are, after all, the combinatorial thoughts that spoken
sentences convey. Thoughts (or concepts) have their own structure, evident even
in nonhuman primates (Cheney & Seyfarth, 1990; Hauser, 2000), and language is at its ba
sis a combinatorial system for expressing thoughts. (See Pinker & Jackendoff, 2005, and
Jackendoff & Pinker, 2005, for discussion.)
To sum up, we arrive at an architecture for language along the lines of Figure 28.1.
Here the interfaces are indicated by double arrows to signify that they characterize corre
lations of structures with each other rather than derivation of one structure from the oth
er.
A tree is built by “clipping together” these treelets at nodes they share, working from the
bottom up, or from the top down, or from anywhere in the middle, as long as the resulting
tree ends up with S at the top and terminal symbols at the bottom. No order for building
trees is logically prior to any other. Alternatively, one can take a given tree and check its
well-formedness by making sure that every part of it conforms to one of the treelets; the
structures in (6) then function as constraints on possible trees rather than as algorithmic
generative engines for producing trees. Hence the constraint-based formalism does not
presuppose any particular implementation; it is compatible with serial, parallel, top-down,
or bottom-up computation.
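The constraint-based reading can be sketched as a checker over candidate trees. This is a hypothetical encoding: the inventory S → NP VP, NP → Det N, VP → V NP is my reconstruction of the treelets in (6), and the function below is illustrative, not the chapter's formalism.

```python
# Instead of deriving a tree, check that every local expansion in a given
# tree matches some stored treelet (assumed inventory, reconstructing (6)).
TREELETS = {
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("VP", ("V", "NP")),
}

def well_formed(tree):
    """tree is (label, [child trees]) for phrases, or (label, word) for leaves."""
    label, children = tree
    if isinstance(children, str):          # terminal: a word under a category
        return True
    kids = tuple(child[0] for child in children)
    return (label, kids) in TREELETS and all(well_formed(c) for c in children)

good = ("S", [("NP", [("Det", "the"), ("N", "cat")]),
              ("VP", [("V", "sees"), ("NP", [("Det", "a"), ("N", "dog")])])])
bad  = ("S", [("VP", [("V", "sees"), ("NP", [("Det", "a"), ("N", "dog")])])])

print(well_formed(good))  # True
print(well_formed(bad))   # False: no treelet licenses S -> VP alone
```

Nothing in the checker fixes an order of construction, which is the point: the same constraints are compatible with serial, parallel, top-down, or bottom-up computation.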
This approach is advantageous in making contact with models of processing. For exam
ple, suppose an utterance begins with the word the. This is listed in the lexicon as a de
terminer, so we begin with the subtree (7).
(7)
Det is the initial node in treelet (6b), which can therefore be clipped onto (7) to produce
(8).
(8)
In turn, an initial NP fits into treelet (6a), which in turn can have (6c) clipped into its VP,
giving (9)—
(9)
—and we are on our way to anticipatory parsing, that is, setting up grammatical expectations on
the basis of an initial word. Further words in the sentence may be attached on the basis
of the top-down structure anticipated in (9). Alternatively, they may disconfirm it, as in the sen
tence The more I read, the less I understand—in which case other treelets had better be avail
able that can license the construction.4
In psycholinguistics, the term constraint-based seems generally to be used to denote a
lexically driven connectionist architecture along the lines of MacDonald et al. (1994). Like
the constraint-based linguistic theories, these feature multidimensional constraint satis
faction and the necessity to resolve competition among conflicting constraints. However,
as MacDonald and Christiansen (2002) observe, the constraint-based aspects of such pro
cessing theories can be separated from the connectionist aspects. Indeed, one of the ear
liest proposals for lexically driven constraint-based parsing, by Ford et al. (1982), is
couched in traditional symbolic terms.
The constraints that govern structure in the Parallel Architecture are not all word-based,
as they are for MacDonald et al. The projection of the into a Determiner node in (7) is in
deed word-based. But all the further steps leading to (9) are accomplished by the treelets
in (6), which are phrasal constraints that make no reference to particular words. Similar
ly, the use of the prosody–syntax interface constraint (4b) constrains syntactic structure
without reference to particular words. In general, as will be seen later, the building of
structure is constrained by a mixture of word-based, phrase-based, semantically based,
and even pragmatically based conditions.
In the Parallel Architecture, a word is treated as a small interface rule that plays a role in
the composition of sentence structure. It says that in building the structure for a sen
tence, this particular piece of phonology can be matched with this piece of meaning and
these syntactic features. So, for instance, the word cat has a lexical structure along the
lines of (10a), and the has a structure like (10b).
(10)
a. kæt₁—N₁—CAT₁
b. ðə₂—Det₂—DEF₂
The first component of (10a) is a phonological structure; the second marks it as a noun;
the third is a stand-in for whatever semantic features are necessary to distinguish cats
from other things. (10b) is similar (where DEF is the feature “definiteness”). The co-sub
scripting of the components is a way of notating that the three parts are linked in long-
term memory (even if it happens that they are localized in different parts of the brain).
When words are built into phrases, structures are built in all three components in paral
lel, yielding a linked trio of structures like (11) for the cat.
(11)
Here the subscript 1 binds together the components of cat, and the subscript 2 binds to
gether the components of the.
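The co-subscripted triples in (10) and the linked structures in (11) might be encoded along the following lines. The notation and field names are my own, purely illustrative stand-ins for the chapter's subscript notation.

```python
from dataclasses import dataclass

# Each entry binds a phonology, a syntactic category, and a meaning under
# one shared index, standing in for their linkage in long-term memory.
@dataclass(frozen=True)
class LexicalEntry:
    index: int
    phonology: str
    syntax: str
    semantics: str

cat = LexicalEntry(1, "kaet", "N", "CAT")
the = LexicalEntry(2, "the", "Det", "DEF")

# Composing "the cat" builds linked structure in all three components in
# parallel, as in (11); the indices keep each word's parts bound together.
phrase = {
    "phonology": [(the.index, the.phonology), (cat.index, cat.phonology)],
    "syntax":    ("NP", [(the.index, the.syntax), (cat.index, cat.syntax)]),
    "semantics": (cat.index, cat.semantics, {"definite": (the.index, the.semantics)}),
}

print(phrase["syntax"])  # ('NP', [(2, 'Det'), (1, 'N')])
```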
A word can stipulate contextual restrictions on parts of its environment; these include,
among other things, traditional notions of subcategorization and selectional restrictions.
For example, the transitive verb devour requires a direct object in syntactic structure. Its
semantics requires two arguments: an action of devouring must involve a devourer (the
agent) and something being devoured (the patient). Moreover, the patient has to be ex
pressed as the direct object of the verb. Thus the lexical entry for this verb can be notat
ed as (12). The material composing the verb itself is notated in roman type. The contextu
al restrictions are notated in italics: NP, X, and Y are variables that must be satisfied in order
for a structure incorporating this word to be well formed. The fact that the patient
must appear in object position is notated in terms of the subscript 4 shared by the syntac
tic and semantic structure.
In the course of parsing, if the parser encounters devour, the syntactic and semantic
structure of (12) will create an anticipation of a direct object that denotes some edible en
tity.
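The anticipation set up by (12) can be sketched as a lookup of the verb's contextual restrictions. This is a hypothetical encoding; the feature names are my own, not the chapter's.

```python
# Contextual restrictions, as in (12): "devour" demands an NP direct object
# whose referent is edible; merely encountering the verb sets up that
# expectation for the parser.
LEXICON = {
    "devour": {"syntax": "V", "object": "NP", "object_semantics": "EDIBLE"},
    "cat":    {"syntax": "N"},
}

def expectations(word):
    entry = LEXICON[word]
    if "object" in entry:  # subcategorization variable awaiting satisfaction
        return {"category": entry["object"], "meaning": entry["object_semantics"]}
    return None

print(expectations("devour"))  # {'category': 'NP', 'meaning': 'EDIBLE'}
```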
This sort of anticipation resembles that in lexically driven models of parsing such as that of MacDonald et al. (1994). Howev
er, MacDonald et al. claim that all structure is built on the basis of word-based contextual
constraints. This strategy is not feasible in light of the range of structures in which most
open-class items can appear (and it is questioned experimentally by Traxler et al., 1998).
For example, we do not want every English noun to stipulate that it can occur with a pos
sessive, with quantifiers, with prenominal adjectives, with postnominal prepositional
phrase modifiers, and with relative clauses, and if a count noun, in the plural. These pos
sibilities are a general property of noun phrases, captured in the phrasal rules, and they
do not belong in every noun’s lexical entry. Similarly, we do not want every verb to stipu
late that it can occur in every possible inflectional form, and that it can co-occur with a
sentential adverbial, a manner adverbial (if semantically appropriate), time and place
phrases, and so on. Nor, in German, do we want every verb to say that it occurs second in
main clauses and last in subordinate clauses. These sorts of linguistic phenomena are
what a general theory of syntax accounts for, and for which general phrasal rules like (6)
are essential.5 Furthermore, the constraints between prosodic and syntactic constituency
discussed earlier cannot be coded on individual words either. It is an empirical problem to
sort out which constraints on linguistic form are word-based, which are phrase based,
which involve syntactic structure, which involve semantic or prosodic structure, and
which involve interface conditions.
An important feature of the treatment of words illustrated in (12) is that it extends direct
ly to linguistic units both smaller and larger than words. For instance, consider the Eng
lish regular plural inflection, which can be formalized in a fashion entirely parallel to (12).
The phonological component of (13) says that the phoneme z is appended to a phonologi
cal word. The syntactic component says that an affix appears attached to a noun. The co-
subscripting indicates that this affix is pronounced z and the noun corresponds to the
phonological word that z is added to. The semantic component of (13) says that the con
cept expressed by this phonological word is pluralized. Thus the regular plural is formally
similar to a transitive verb; the differences lie in what syntactic category it belongs to and
what categories it attaches to in syntax and phonology.
This conception of regular affixation is somewhat different from Pinker’s (1999). Pinker
would state the regular plural as a procedural rule: “To form the plural of a noun, add z.”
In the present account, the regular plural is at once a lexical item, an interface rule, and
a rule for combining an affix and a noun, depending on one’s perspective. However, the
present analysis does preserve Pinker’s dichotomy between regular affixation and irregu
lar forms. As in his account, irregular forms must be listed individually, whereas regular
forms can be constructed by combining (13) with a singular noun. In other words, this is
a “dual-process” model of inflection. However, the “second” process, that of free combina
tion, is exactly the same as is needed for combining transitive verbs with their objects.
Notice that every theory of language needs a general process for free combination of
verbs and their objects—the combinations cannot be memorized. So parsimony does not
constitute a ground for rejecting this particular version of the dual-process model.6
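The dual-process analysis described here might be sketched as two routes. This is a deliberate simplification of my own: the phonemic /z/ of (13) is appended directly, ignoring the voicing assimilation that yields /s/ in forms like *cats*.

```python
# Route 1: irregular plurals are listed whole in the lexicon.
# Route 2: regular plurals arise by free combination of the stored
# affix (13) with a singular noun -- the same combinatory operation
# needed for combining transitive verbs with their objects.
IRREGULAR_PLURALS = {"child": "children", "mouse": "mice"}

def pluralize(noun):
    if noun in IRREGULAR_PLURALS:   # stored as a whole
        return IRREGULAR_PLURALS[noun]
    return noun + "z"               # phonemic sketch; no voicing assimilation

print(pluralize("cat"))    # catz
print(pluralize("child"))  # children
```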
Next consider lexical entries that are larger than a word. For example, the idiom kick the
bucket is a lexical VP with internal phonological and syntactic structure:
(14)
Here the three elements in phonology are linked to the three terminal elements of the VP
(V, Det, and N). However, the meaning is linked not to the individual words but rather to
the VP as a whole (subscript 10). Thus the words have no meaning on their own—only the
entire VP has meaning. This is precisely what it means for a phrase to be an idiom: Its
meaning cannot be predicted from the meanings of its parts, but instead must be learned
and stored as a whole.
Once we acknowledge that pieces of syntactic structure are stored in long-term memory
associated with idiomatic meanings in items like (14), it is a short step to also admitting
pieces of structure that lack inherent meanings, such as the “treelets” in (6). This leads to
a radical conclusion from the mainstream point of view: words, regular affixes, idioms,
and ordinary phrase structure rules like (6) can all be expressed in a common formalism,
namely as pieces of linguistic structure stored in long-term memory. The lexicon is
not a separate component of grammar from the rules that assemble sentences. Rather,
what have traditionally been distinguished as “words” and “rules” are simply different
sorts of stored structure. “Words” are idiosyncratic interface rules; “rules” may be gener
al interface rules, or they may be simply stipulations of possible structure in one component
or another. Novel sentences are “generated” by “clipping together” pieces of stored
structure, an operation called unification (Shieber, 1986).7
Under this interpretation of words and rules, the distinction between word-based parsing
and rule-based parsing disappears. This yields an immediate benefit in the description of
syntactic priming, in which the use of a particular syntactic structure such as a ditransi
tive verb phrase primes subsequent appearances (Bock, 1995; Bock & Loebell, 1990). As
Bock (1995) observes, the existence of syntactic priming is problematic within main
stream assumptions: There is no reason that rule application should behave anything like
lexical access. However, in the Parallel Architecture, where syntactic constructions and
words are both pieces of stored structure, syntactic priming is to be expected, altogether
parallel to word priming.
Summing up the last three sections, the Parallel Architecture acknowledges all the com
plexity of linguistic detail addressed by mainstream theory, but it proposes to account for
this detail in different terms. Phonology and semantics are combinatorially independent
from syntax; ordered derivations are replaced by parallel constraint checking; words are
regarded as interface rules that help mediate between the three components of language;
and words and rules are both regarded as pieces of stored structure. Jackendoff (2002)
and Culicover and Jackendoff (2005) demonstrate how this approach leads to far more
natural descriptions of many phenomena such as idioms and offbeat “syntactic nuts” that
have been either problematic or ignored in the mainstream tradition.
Culicover and Jackendoff (2005) further show how this approach leads to a considerable
reduction in the complexity of syntactic structure, an approach called Simpler Syntax.
From the point of view of psycholinguistics, this should be a welcome result. The syntac
tic structures posited by contemporary MGG are far more complex than have been or
could be investigated experimentally, whereas the structures of Simpler Syntax are for
the most part commensurate with those that have been assumed in the last three decades
of psycholinguistic and neurolinguistic research.
One ongoing dispute in the parsing literature concerns how the parser handles ambiguity: does it pursue only one preferred analysis, backing up if it makes a mistake, or does it
pursue multiple options in parallel? Through the past three decades, as these two alterna
tive hypotheses have competed in the literature and have been refined to deal with new
experimental evidence, they have become increasingly indistinguishable (Lewis, 2000).
On the one hand, a parallel model has to rank alternatives for plausibility; on the other
hand, a serial model has to be sensitive to detailed lexically conditioned alternatives that
imply either some degree of parallelism or a phenomenally fast recovery from certain
kinds of incorrect analyses.
The Parallel Architecture cannot settle this dispute, but it does place a distinct bias on
the choice. It has been clear since Swinney (1979) and Tanenhaus et al. (1979) that lexical
access in language perception is “promiscuous”: An incoming phonological string acti
vates all semantic structures associated with it, whatever their relevance to the current
semantic context, and these remain activated in parallel for some time in working memo
ry. As shown earlier, the Parallel Architecture treats syntactic treelets as the same formal
type as words: Both are pieces of structure stored in long-term memory. A structural am
biguity such as that in (15)—
(15) My professor told the girl that Bill liked the story about Harry.
—arises by activating different treelets and/or combining them in different ways, a treatment not
so different in spirit from a lexical ambiguity. This suggests that on grounds of consistency, the
Parallel Architecture recommends parallel processing. Thus in developing a model of process
ing, I assume all the standard features of parallel processing models, in particular com
petition among mutually inhibitory analyses.
A second ongoing dispute in the language processing literature concerns the character of
working memory. One view (going back at least to Neisser, 1967) sees working memory as
functionally separate from long-term memory, a “place” where incoming information can
be structured. In this view, lexical retrieval involves in some sense copying or binding the
long-term coding of a word into working memory. By contrast, semantic network and con
nectionist architectures for language processing (e.g., Smith & Medin, 1981; Elman et al.,
1996; MacDonald & Christiansen, 2002; MacDonald et al., 1994) make no distinction be
tween long-term and working memory. For them, “working memory” is just the part of
long-term memory that is currently activated (plus, in Elman’s recurrent network archi
tecture, a copy of the immediately preceding input); lexical retrieval consists simply of ac
tivating the word’s long-term encoding, in principle a simpler operation.
Such a conception, though, does not allow for the building of structure. Even if the words
of a sentence being perceived are activated, there is no way to connect them up; the dog
chased a cat, the cat chased a dog, and dog cat a chased the activate exactly the same
words. There is also no principled way to account for sentences in which the same word
occurs twice, such as my cat likes your cat: the sentence refers to two distinct cats, even
though there is (presumably) only one “cat node” in the long-term memory network. Jack
endoff (2002, section 3.5) refers to this difficulty as the “Problem of 2” and shows that it
recurs in many cognitive domains, for example, in recognizing two identical forks on the
table, or in recognizing a melody containing two identical phrases. In an approach with a
separate working memory, these problems do not arise: There are simply two copies of
the same material in working memory, each of which has its own relations to other mater
ial (including the other copy).
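The Problem of 2 and the working-memory solution described here can be illustrated schematically. This is my own toy encoding, not a claim about the chapter's mechanism.

```python
# Long-term memory has one entry per word (a type); working memory must
# hold distinct tokens of that type, each with its own bindings -- e.g.
# the two cats in "my cat likes your cat".
LONG_TERM = {"cat": {"syntax": "N", "semantics": "CAT"}}  # one "cat node"

class Token:
    def __init__(self, word, role):
        self.type = LONG_TERM[word]   # both tokens share one stored type...
        self.role = role              # ...but carry distinct relations

working_memory = [Token("cat", "subject head"), Token("cat", "object head")]

print(working_memory[0].type is working_memory[1].type)  # True: one type
print(working_memory[0].role == working_memory[1].role)  # False: two tokens
```

Mere activation of the long-term "cat node" could not distinguish the two occurrences; two copies in a separate working memory can.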
An approach lacking an independent working memory also cannot make a distinction be
tween transient and permanent linkages. For instance, recall that MacDonald et al.
(1994) propose to account for structure by building into lexical items their potential for
participating in structure. For them, structure is composed by establishing linkages
among the relevant parts of the lexical entries. However, consider the difference between
the phrases throw the shovel and kick the bucket. In the former, where composition is ac
complished on the spot, the linkage between verb and direct object has to be transient
and not affect the lexical entries of the words. But in the latter, the linkage between the
verb and direct object is part of one’s lexical knowledge and therefore permanent. This
distinction is not readily available in the MacDonald et al. model. A separate working
memory deals easily with the problem: Both examples produce linkages in working mem
ory, but only kick the bucket is linked in long-term memory.
Neural network models suffer from two other important problems (discussed at greater
length in Jackendoff, 2002, section 3.5). First, such models encode long-term memories as
connection strengths among units in the network, acquired through thousands of steps of
training. This gives no account of one-time learning of combinatorial structures, such as
the meaning of the sentence I’ll meet you for lunch at noon, a single utterance of which
can be sufficient to cause the hearer to show up for lunch. In a model with a separate
working memory, the perception of this sentence leads to copying of the composite mean
ing into episodic memory (or whatever is responsible for keeping track of obligations and
formulating plans)—which is distinct from linguistic knowledge.
Finally, a standard neural network cannot encode a general relation such as X is identical
with Y, X rhymes with Y,8 or X is the (regular) past tense of Y. Connectionists, when
pressed (e.g., Bybee & McClelland, 2005), claim that there are no such general relations
—there are only family resemblances among memorized items, to which novel examples
are assimilated by analogy. But to show that there is less generality than was thought is
not to show that there are no generalizations. The syntactic generalizations mentioned
earlier, such as the multiple possibilities for noun modification, again can be cited as
counterexamples; they require typed variables such as N, NP, V, and VP in order to be
statable. Marcus (1998, 2001), in important work that has been met with deafening si
lence by the connectionist community,9 demonstrates that neural networks in principle
cannot encode the typed variables necessary for instantiating general relations, including
those involved in linguistic combinatoriality.
This deficit is typically concealed by dealing with toy domains with small vocabularies
and a small repertoire of structures. It is no accident that the domains of language in
which neural network architectures have been most successful are those that make minimal use of structure, such as word (p. 586) retrieval, lexical phonology, and relatively simple morphology. All standard linguistic theories give us a handle on how to analyze complex sentences like the ones you are now reading; but despite more than twenty years of connectionist modeling, no connectionist model comes anywhere close. (For instance, the only example worked out in detail by MacDonald et al., 1994, is the two-word utterance, John cooked.)
Accordingly, as every theory of processing should, a processing model based on the Parallel Architecture posits a working memory separate from long-term memory: a “workbench” or “blackboard” in roughly the sense of Arbib (1982), on which structures are constructed online. Linguistic working memory has three subdivisions or “departments,” one each for the three components of grammar, plus the capability of establishing linkages among their parts in terms of online bindings of the standard (if ill-understood) sort studied by neuroscience. Because we are adopting parallel rather than serial processing, each department is capable of maintaining more than one hypothesis, linked to one or more hypotheses in other departments.10
This notion of working memory contrasts with Baddeley’s (1986) influential treatment, in which linguistic working memory is a “phonological loop” where perceived phonological structure is rehearsed. Baddeley’s approach does not tell us how phonological structure is constructed, nor how the corresponding syntactic and semantic structures are constructed and related to phonology. Although a phonological loop may be adequate for describing the memorization of strings of nonsense syllables (Baddeley’s principal concern), it is not adequate for characterizing the understanding of strings of meaningful syllables, that is, the perception of real spoken language. And relegating the rest of language processing to a general-purpose “central executive” simply puts off the problem. (See Jackendoff, 2002, pp. 205–207, for more discussion.)
An Example
With this basic conception of working memory in place, we will now see how the knowledge structures posited by the Parallel Architecture can be put to use directly in the process of language perception. Consider the following pair of sentences.
(16)
a. It’s not a parent, it’s actually a child.
b. It’s not apparent, it’s actually quite obscure.
(16a,b) are phonetically identical (at least in my dialect) up to their final two words. However, they are phonologically different: (16a) has a word boundary that (16b) lacks. They are also syntactically different: a parent is an NP, whereas apparent is an adjective phrase (AP). And of course they are semantically different as well. The question is how the two interpretations are developed in working memory and distinguished at the end.
Suppose that auditory processing deposits raw phonetic input into working memory. Figure 28.2 shows what working memory looks like when the first five syllables of (16) have been so assimilated. (I beg the reader’s indulgence in idealizing away from issues of phoneme identification, which are of course nontrivial.)
At the next stage of processing, the lexicon must be called into play in order to identify
which words are being heard. For convenience, let us represent the lexicon as in Figure
28.3, treating it as a relatively unstructured collection of phonological, syntactic, and semantic structures—sometimes linked—of the sort illustrated in (6), (10), and (12) to (14)
above.
Working memory, seeking potential lexical matches, sends a call to the lexicon, in effect asking, “Do any of you in there sound like this?” And various phonological structures “volunteer,” or are activated. Following Swinney (1979) and Tanenhaus, Leiman, and Seidenberg (1979), all possible forms with the appropriate phonetics are activated: both it’s and its, both not and knot, and both apparent and a + parent. This experimental result stands to reason, given that at this point only phonetic information is available to the processor. However, following the lexically driven parsing tradition, we can also assume that the degree and/or speed of activation of the alternative forms depends on their frequency.
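For concreteness, this lexical-access step can be sketched in a few lines of code: stored entries are keyed by a shared phonetic shape, every matching form “volunteers,” and frequency modulates activation order. The entries, toy transcriptions, and frequency values below are invented for illustration; they are not part of the Parallel Architecture itself.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    phonetics: str    # shared phonetic shape (toy transcription)
    phonology: str    # lexical phonological structure, with word boundaries
    syntax: str       # syntactic category or categories
    semantics: str    # placeholder for semantic structure
    frequency: float  # relative frequency (invented values)

# A toy fragment of the lexicon; homophones share a phonetics key.
LEXICON = [
    Entry("Its",     "it's",     "NP+V",  "BE(IT, ...)",   0.8),
    Entry("Its",     "its",      "Det",   "POSS(IT)",      0.7),
    Entry("nAt",     "not",      "Adv",   "NEG",           0.9),
    Entry("nAt",     "knot",     "N",     "KNOT",          0.1),
    Entry("@pEr@nt", "apparent", "A",     "APPARENT",      0.3),
    Entry("@pEr@nt", "a parent", "Det N", "INDEF(PARENT)", 0.4),
]

def lexical_access(phonetics):
    """All entries matching the phonetic input are activated
    (cf. Swinney, 1979); higher-frequency forms come back first."""
    matches = [e for e in LEXICON if e.phonetics == phonetics]
    return sorted(matches, key=lambda e: e.frequency, reverse=True)

for e in lexical_access("@pEr@nt"):
    print(e.phonology, e.syntax, e.semantics)
```

Both a parent and apparent come back from the query, the higher-frequency form first; choosing between them is left to later stages of integration.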
Phonological activation in the lexicon spreads to linked syntactic and semantic structures. If a lexical (p. 587) item’s semantic structure has already been primed by context, its activation will be faster or more robust, or both. Moreover, once lexical semantic structure is activated, it begins to prime semantically related lexical items. The result is depicted in Figure 28.4, in which the activated items are indicated in bold.
Next, the activated lexical items are bound to working memory. However, not only the phonological structure is bound; the syntactic and semantic structures are also bound (or copied) to the appropriate departments of working memory, yielding the configuration in Figure 28.5. Because there are alternative ways of carving the phonological content into lexical items, working memory comes to contain mutually inhibitory “drafts” (in the sense of Dennett, 1991) of what is being heard. At this point in processing, there is no way of knowing which of the two competing “drafts” is correct. (For convenience in exposition, from here on we consider just the fragment of phonetics corresponding (p. 588) to apparent or a + parent, ignoring its/it’s and not/knot.)
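The mutual inhibition between competing drafts can be caricatured in a few lines. The activation values, the inhibition weight, and the update rule are all invented; the sketch shows only that drafts with equal support coexist until outside (e.g., semantic) support breaks the tie.

```python
def settle(a, b, extra_a=0.0, extra_b=0.0,
           bottom_up=0.25, inhibition=0.5, steps=30):
    """Iterate a toy competition between two drafts: each receives
    constant bottom-up support, optional extra (contextual) support,
    and inhibition proportional to its rival's activation."""
    clamp = lambda x: max(0.0, min(1.0, x))
    for _ in range(steps):
        a, b = (clamp(a + bottom_up + extra_a - inhibition * b),
                clamp(b + bottom_up + extra_b - inhibition * a))
    return a, b

print(settle(0.5, 0.5))               # evenly matched: (0.5, 0.5)
print(settle(0.5, 0.5, extra_a=0.2))  # context favors draft a: (1.0, 0.0)
```

With symmetrical support the two drafts remain in equilibrium; a small amount of extra support for one draft drives the other to extinction along the inhibitory link, mirroring the resolution described below.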
The syntactic department of working memory now contains strings of syntactic elements, so it is now possible to undertake syntactic integration: the building of a unified syntactic structure from the fragments now present in working memory. Syntactic integration uses the same mechanism as lexical access: the strings in working memory activate treelets in long-term memory. In turn these treelets are unified with the existing strings. The string Det—N thus becomes an NP, and the adjective becomes an AP, as shown in Figure 28.6.
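A minimal sketch of this step, with an invented two-treelet inventory and a deliberately naive left-to-right matcher (real syntactic integration is of course recursive and far richer):

```python
# Treelets stored in long-term memory: a root label plus the string of
# category slots it unifies with. Inventory invented for illustration.
TREELETS = [
    ("NP", ["Det", "N"]),  # an NP may consist of Det + N
    ("AP", ["A"]),         # a bare adjective projects an AP
]

def integrate(categories):
    """Unify a left-to-right string of lexical categories with stored
    treelets, returning a list of projected phrases."""
    phrases, i = [], 0
    while i < len(categories):
        for root, slots in TREELETS:
            if categories[i:i + len(slots)] == slots:
                phrases.append((root, *slots))  # treelet unifies with string
                i += len(slots)
                break
        else:
            phrases.append(categories[i])  # no treelet matched; keep as-is
            i += 1
    return phrases

print(integrate(["Det", "N"]))  # the 'a parent' draft becomes an NP
print(integrate(["A"]))         # the 'apparent' draft becomes an AP
```

The two drafts thus receive different phrasal structure from the same phonetic input, just as in Figure 28.6.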
The other necessary step is semantic integration: building a unified semantic structure from the pieces of semantic structure bound into working memory from the lexicon. This process has to make use of at least two sets of constraints. One set is the principles of semantic well-formedness: unattached pieces of meaning have to be combined in a fashion that makes sense—both internally and also in terms of any context that may also be present in semantic working memory. In the present example, these principles will be sufficient to bring about semantic integration: INDEF and PARENT can easily be combined into a semantic constituent, and APPARENT forms a constituent on its own. The resulting state of working memory looks like Figure 28.7.
However, in more complex cases, semantic integration also has to use the syntax–semantics interface rules (also stored in the lexicon, but not stated here), so that integrated syntactic structures in working memory can direct the arrangement of the semantic fragments. In such cases, semantic integration is dependent on successful syntactic integration. Consider the following example:
(17) What did Sandy say that Pat bought last Tuesday?
In order for semantic integration to connect the meaning of what to the rest of the interpretation, syntactic integration must determine that it is the object of bought rather than the object of say. Syntactic integration must also determine that last Tuesday is a modifier of bought and not of say (compare Last Tuesday, what did Sandy say that Pat bought?). Thus in such cases we expect semantic integration (p. 589) to be (partly) dependent on the output of syntactic integration.11 In Figure 28.7, as it happens, the semantic structure is entirely parallel to the syntax, so there is no way to tell whether syntactic integration has been redundantly evoked to determine the semantics, and with what time course.
At the point reached in Figure 28.7, working memory has two complete and mutually inhibitory structures, corresponding to the meanings of the two possible interpretations of the phonetic input. How is this ambiguity resolved? As observed above, it depends on the meaning of the following context. In particular, part of the meaning of the construction in (16), It’s not X, it’s (actually) Y, is that X and Y form a semantic contrast. Suppose the input is (16a), It’s not a parent, it’s actually a child. When the second clause is semantically integrated into working memory, the result is then Figure 28.8.
At this point in processing (and only at this point), it becomes possible to detect that the lower “draft” is semantically ill formed because apparent does not form a sensible contrast with child. Thus the semantic structure of this draft comes to be inhibited or extinguished, as in Figure 28.9.
This in turn sets off a chain reaction of feedback through the entire set of linked structures. Because the semantic structure of the lower draft helps keep the rest of the lower draft stable, and because all departments of the upper draft are trying to inhibit the lower draft, the entire lower draft comes to be extinguished.
Meanwhile, through this whole process, the activity in working memory has also maintained activation of long-term memory items that are bound to working memory. Thus, when apparent and its syntactic structure are extinguished in working memory, the corresponding parts of the lexicon are deactivated as well—and they therefore cease priming semantic associates, as in Figure 28.10.
The final state of working memory is Figure 28.11. If the process from Figure 28.6 through Figure 28.11 goes quickly enough, the perceiver ends up hearing the utterance as It’s not a parent, with no sense of ambiguity or garden-pathing, even though the disambiguating information follows the ambiguous passage. Strikingly, the semantics affects the hearer’s impression of the phonology.
• Syntactic integration proceeds by activating and binding to treelets stored in the lexicon.
(p. 590)
There is ample room in this model to investigate standard processing issues such as effects of frequency and priming on competition (here localized in lexical access), relative prominence of alternative parsings (here localized in syntactic integration), influence of context (here localized in semantic integration), and conditions for garden-pathing (here, premature extinction of the ultimately correct draft) or absence thereof (as in the present example). The fact that each step of processing can be made explicit—in terms of elements independently motivated by linguistic theory—recommends the model as a means of putting all these issues in larger perspective.
Further Issues
This section briefly discusses two further sorts of phenomena that can be addressed in
the Parallel Architecture’s model of processing. I am not aware of attempts to draw these
together in other models.
Tanenhaus et al. (1995) confronted subjects with an array of objects and an instruction like (18), tracking their eye movements over the array.
At the moment in time marked by *, the question faced by the language processor is
whether on (p. 591) is going to designate where the apple is, or where it is to be put—a
classic PP attachment ambiguity. It turns out that at this point, subjects already start
scanning the relevant locations in the array in order to disambiguate the sentence (Is
there more than one apple? Is there already an apple on the towel?). Hence visual feedback is used to constrain interpretation early on in processing.
The Parallel Architecture makes it clear how this can come about. So far we have spoken
only of interfaces between semantic structure and syntax. However, semantic structure
also interfaces with other aspects of cognition. In particular, to be able to talk about what
we see, high-level representations produced by the visual system must be able to induce
the creation of semantic structures that can then be converted into utterances. Addressing this need, the Parallel Architecture (Jackendoff, 1987, 1996, 2002, 2012; Landau &
Jackendoff, 1993) proposes a level of mental representation called spatial structure,
which integrates visual, haptic, and proprioceptive inputs into the perception of physical
objects in space (including one’s body). Spatial structure is linked to semantic structure
by means of an interface similar in character to the interfaces within the language faculty.
Some linkages between semantic and spatial structure are stored in long-term memory. For instance, cat is a semantic category related to the category animal in semantic structure and associated with the phonological structure /kæt/ in long-term memory. But it is also associated with a spatial structure, which encodes what cats look like, the counterpart in the present approach to an “image of a stereotypical instance.” Other linkages must be computed combinatorially online. For instance, the spatial structure that arises from seeing an apple on a towel is not a memorized configuration, and it must be mapped online into the semantic structure [APPLE BE [ON [TOWEL]]]. Such a spatial structure has to be computed in another department of working memory that encodes one’s conception of the current spatial layout.12
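The online composition of such a structure can be sketched as a simple function from a perceived spatial configuration to a bracketed semantic structure. The nested-list encoding is an invented stand-in for the chapter's bracket notation; nothing about the actual spatial-to-semantic interface is being claimed.

```python
def scene_to_semantics(figure, relation, ground):
    """Compose, online, a semantic structure for a perceived spatial
    configuration: figure BE [relation [ground]]. No such composite
    structure need be stored in long-term memory."""
    return [figure.upper(), "BE", [relation.upper(), [ground.upper()]]]

print(scene_to_semantics("apple", "on", "towel"))
# ['APPLE', 'BE', ['ON', ['TOWEL']]]
```

Because the function composes its output from parts, the same machinery handles any novel figure–relation–ground configuration, which is the point of computing such linkages combinatorially rather than retrieving them.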
This description of the visual–linguistic interface is sufficient to give an idea of how example (18) works. In hearing (18), which refers to physical space, the goal of processing is to produce not only a semantic structure but a semantic structure that can be correlated with the current spatial structure through the semantic–spatial interface. At the point designated by *, syntactic and semantic integration have led to the two drafts in (19). (As usual, italics denote anticipatory structure to be filled by subsequent material; the semantics contains YOU because this is an imperative sentence.)
(19) [table of two drafts, each pairing syntactic structure with semantic structure containing PLACE]
Thus the hearer has performed enough semantic integration to anticipate finding a
unique referent in the visual environment for the NP beginning with the apple, and starts
scanning for one.
Suppose spatial structure turns up with two apples. Then the only draft that can be correlated consistently with spatial structure is (19b), with the expectation that the phrase on NP will provide disambiguating information. The result is that draft (19a) is extinguished, just like the lower draft in our earlier example. If it happens that the hearer sees one of the apples on something, say a towel, and the other is not on anything, the first apple can be identified as the desired unique referent—and the hearer ought to be able to anticipate the phonological word towel. Thus by connecting all levels of representation through the interfaces, it is possible to create an anticipation of phonological structure from visual input.
This account of (18) essentially follows what Tanenhaus et al. have to say about it. What is important here is how naturally and explicitly it can be couched in the Parallel Architecture—both in terms of its theory of interfaces among levels of representation and in terms of its theory of processing.
The relationship between the Parallel Architecture and its associated processing model is a two-way street: It is possible to run experiments that test linguistic hypotheses. For example, consider the phenomenon of “aspectual coercion,” illustrated in (20) (Jackendoff, 1997; Pustejovsky, 1995; Verkuyl, 1993; among others). Example (20a) conveys a sense of repeated jumping, but there is no sense of repeated sleeping in the syntactically parallel (20b).
(20)
a. Joe jumped until the bell rang.
b. Joe slept until the bell rang.
Here is how it works: The semantic effect of until is to place a temporal bound on (p. 592) an ongoing process. Sleeping is such a process, so the meanings in (20b) integrate directly. Jumping, however, denotes a point event; before until can apply, it must be coerced into an ongoing process of repeated jumping.
The Parallel Architecture makes the prediction that (20a) will look unexceptionable to the processor until semantic integration. At this point, the meanings of the words cannot be integrated, and so semantic integration attempts the more costly alternative of coercion. Thus a processing load should be incurred specifically at the time of semantic integration. Piñango et al. (1999) tested for processing load in examples like (20a,b) during auditory comprehension by measuring reaction time to a lexical decision task on an unrelated probe. The timing of the probe establishes the timing of the processing load. And indeed, extra processing load does show up in the coerced examples, in a time frame consistent with semantic rather than syntactic or lexical processing, just as predicted by the Parallel Architecture.13
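The logic of the prediction can be made concrete in a toy sketch: when a verb's aspectual class clashes with until, an extra coercion step is inserted during semantic integration, and that extra step is the locus of the measured processing load. The aspect labels and the unit cost are invented stand-ins, not measured quantities.

```python
# Invented aspectual classification for the two verbs in (20).
ASPECT = {"sleep": "process", "jump": "point-event"}

def integrate_until(verb):
    """Compose 'V until ...': processes combine with 'until' directly;
    point events must first be coerced into an iterated process, which
    adds one extra integration step (the predicted processing load)."""
    if ASPECT[verb] == "process":
        return f"UNTIL({verb.upper()})", 0        # direct integration
    return f"UNTIL(ITER({verb.upper()}))", 1      # coercion step added

print(integrate_until("sleep"))  # ('UNTIL(SLEEP)', 0)
print(integrate_until("jump"))   # ('UNTIL(ITER(JUMP))', 1)
```

The nonzero cost for jump, and its timing at semantic rather than syntactic integration, is what the probe experiments were designed to detect.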
Similar experimental results have been obtained for the “light verb construction” shown
in (21).
(21)
a. Sam gave Harry an order. (= Sam ordered Harry)
b. Sam got an order from Harry. (= Harry ordered Sam)
In these examples, the main verb order is paraphrased by the combination of the noun an
order plus the light verbs give and get. But the syntax is identical to a “nonlight” use of
the verb, as in Sam gave an orange to Harry and Sam got an orange from Harry. The light verb construction comes to paraphrase the simple verb through a semantic manipulation that combines the argument structures of the light verb and the nominal (Culicover & Jackendoff, 2005, pp. 222–225). Thus again the Parallel Architecture predicts additional semantic processing, and this is confirmed by experimental results (Piñango et al., 2006; Wittenberg et al., forthcoming; Wittenberg et al., in revision).
Final Overview
The theoretical account of processing sketched in the previous three sections follows directly from the logic of the Parallel Architecture. First, as in purely word-driven approaches to processing, this account assumes that words play an active role in determining structure at phonological, syntactic, and semantic levels: The linguistic theory posits that words, idioms, and constructions are all a part of the rule system. In particular, the interface properties of words determine the propagation of activity across the departments of working memory.
Fourth, the system makes crucial use of parallel processing: All relevant structures are processed at once in multiple “drafts,” in competition with one another. The extinction of competing drafts is carried out along pathways established by the linkages among structures and the bindings between structures in working memory and the lexicon. Because the Parallel Architecture conceives of the structure of a sentence as a linkage among three separate structures, the handling of the competition among multiple drafts is completely natural.
What is perhaps most attractive about the Parallel Architecture from a psycholinguistic perspective is that the principles of grammar are used directly by the processor. That is, unlike the classical MGG architecture, there is no “metaphor” involved in the notion of (p. 593) grammatical derivation. The formal notion of structure building in the competence model is the same as in the performance model, except that it is not anchored in time. Moreover, the principles of grammar are the only routes of communication between semantic context and phonological structure: context effects involve no “wild card” interactions using non-linguistic strategies. The Parallel Architecture thus paves the way for a much closer interaction between linguistic theory and psycholinguistics than has been possible in the past three decades.
Author Note
This article is an abridged version of Jackendoff, 2007. I am thankful to Gina Kuperberg
and Maria Mercedes Piñango for many detailed comments and suggestions on previous
drafts. Two anonymous reviewers also offered important suggestions. Martin Paczynski helped a great deal with graphics. My deepest gratitude goes to Edward Merrin for research support, through his gift of the Seth Merrin Professorship to Tufts University.
References
Arbib, M. A. (1982). From artificial intelligence to neurolinguistics. In M. A. Arbib, D. Caplan, & J. C. Marshall (Eds.), Neural models of language processes (pp. 77–94). New York: Academic Press.
Bock, K. (1995). Sentence production: From mind to mouth. In J. L. Miller & P. D. Eimas (Eds.), Handbook of perception and cognition, Vol. XI: Speech, language, and communication (pp. 181–216). Orlando, FL: Academic Press.
Bock, K., & Loebell, H. (1990). Framing sentences. Cognition, 35, 1–39.
Bybee, J., & McClelland, J. L. (2005). Alternatives to the combinatorial paradigm of linguistic theory based on domain general principles of human cognition. Linguistic Review, 22, 381–410.
Cheney, D., & Seyfarth, R. (1990). How monkeys see the world. Chicago: University of
Chicago Press.
Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Chomsky, N. (2000). New horizons in the study of language and mind. Cambridge, UK:
Cambridge University Press.
Collins, A., & Quillian, M. (1969). Retrieval time from semantic memory. Journal of Verbal
Learning and Verbal Behavior, 9, 240–247.
Culicover, P., & Jackendoff, R. (2005). Simpler syntax. Oxford, UK: Oxford University
Press.
Elman, J., Bates, E., Johnson, M., Karmiloff-Smith, A., Parisi, D., & Plunkett, K. (1996). Rethinking innateness. Cambridge, MA: MIT Press.
Ford, M., Bresnan, J., & Kaplan, R. M. (1982). A competence-based theory of syntactic closure. In J. Bresnan (Ed.), The mental representation of grammatical relations (pp. 727–796). Cambridge, MA: MIT Press.
Frazier, L., Carlson, K., & Clifton, C. (2006). Prosodic phrasing is central to language
comprehension. Trends in Cognitive Sciences, 10, 244–249.
Gee, J., & Grosjean, F. (1983). Performance structures: A psycholinguistic and linguistic appraisal. Cognitive Psychology, 15, 411–458.
(p. 595)
Hagoort, P. (2005). On Broca, brain, and binding: A new framework. Trends in Cognitive
Sciences, 9, 416–423.
Hauser, M. D. (2000). Wild minds: What animals really think. New York: Henry Holt.
Hirst, D. (1993). Detaching intonational phrases from syntactic structure. Linguistic Inquiry, 24, 781–788.
Jackendoff, R. (1987). Consciousness and the computational mind. Cambridge, MA: MIT
Press.
Jackendoff, R. (1997). The architecture of the language faculty. Cambridge, MA: MIT
Press.
. Reprinted in
R. Borsley & K. Börjars, Eds. (2011). Non-Transformational syntax (pp. 268–296). Oxford,
UK: Wiley-Blackwell.
Jackendoff, R. (2010). Meaning and the lexicon: The Parallel Architecture 1975–2010. Oxford, UK: Oxford University Press.
Jackendoff, R. (2012). A user’s guide to thought and meaning. Oxford, UK: Oxford University Press.
Jackendoff, R. (2011). What is the human language faculty? Two views. Language, 87,
586–624.
Jackendoff, R., & Pinker, S. (2005). The nature of the language faculty and its implications
for the evolution of language (reply to Fitch, Hauser, and Chomsky). Cognition, 97, 211–
225.
Lakoff, G. (1987). Women, fire, and dangerous things. Chicago: University of Chicago Press.
Landau, B., & Jackendoff, R. (1993). “What” and “where” in spatial language and spatial
cognition. Behavioral and Brain Sciences, 16, 217–238.
Langacker, R. (1987). Foundations of cognitive grammar (Vol. 1). Stanford, CA: Stanford
University Press.
Lappin, S. (1996). The handbook of contemporary semantic theory. Oxford, UK: Blackwell.
Lewis, R. (2000). Falsifying serial and parallel parsing models: Empirical conundrums and
an overlooked paradigm. Journal of Psycholinguistic Research, 29, 241–248.
Liberman, M., & Prince, A. (1977). On stress and linguistic rhythm. Linguistic Inquiry, 8,
249–336.
MacDonald, M. C., Pearlmutter, N. J., & Seidenberg, M. S. (1994). Lexical nature of syntactic ambiguity resolution. Psychological Review, 101, 676–703.
Miller, G. A., & Chomsky, N. (1963). Finitary models of language users. In R. D. Luce, R.
R. Bush, & E. Galanter (Eds.), Handbook of mathematical psychology (Vol. 2, pp. 419–
492). New York: Wiley.
Paczynski, M., Jackendoff, R., & Kuperberg, G. R. (in revision). When events change their nature.
Partee, B., Ed. (1976). Montague grammar. New York: Academic Press.
Phillips, C., & Lau, E. (2004). Foundational issues (review article on Jackendoff 2002).
Journal of Linguistics, 40, 1–21.
Piñango, M. M., Mack, J., & Jackendoff, R. (2006). Semantic combinatorial processes in
argument structure: Evidence from light verbs. Proceedings of the Berkeley Linguistics
Society.
Piñango, M. M., Zurif, E., & Jackendoff, R. (1999). Real-time processing implications of enriched composition at the syntax-semantics interface. Journal of Psycholinguistic Research, 28, 395–414.
Pinker, S., & Jackendoff, R. (2005). The faculty of language: What’s special about it? Cognition, 95, 201–236.
Pollard, C., & Sag, I. (1994). Head-driven phrase structure grammar. Chicago: University
of Chicago Press.
Prince, A., & Smolensky, P. (1993/2004). Optimality theory: Constraint interaction in generative grammar. Technical report, Rutgers University and University of Colorado at Boulder, 1993. (Revised version published by Blackwell, 2004.)
Pylkkänen, L., & McElree, B. (2006). The syntax-semantics interface: On-line composition of sentence meaning. In M. Traxler & M. A. Gernsbacher (Eds.), Handbook of psycholinguistics (2nd ed.). New York: Elsevier.
Rosch, E., & Mervis, C. (1975). Family resemblances: Studies in the internal structure of
categories. Cognitive Psychology, 7, 573–605.
Smith, E., & Medin, D. (1981). Categories and concepts. Cambridge, MA: Harvard University Press.
Smith, E., Shoben, E., & Rips, L. (1974). Structure and process in semantic memory: A
featural model for semantic decisions. Psychological Review, 81, 214–241.
Tabor, W., & Tanenhaus, M. (1999). Dynamical models of sentence processing. Cognitive
Science, 23, 491–515.
Talmy, L. (1988). Force-dynamics in language and thought. Cognitive Science, 12, 49–100.
Tanenhaus, M., Leiman, J. M., & Seidenberg, M. (1979). Evidence for multiple stages in
the processing of ambiguous words in syntactic contexts. Journal of Verbal Learning and
Verbal Behavior, 18, 427–440.
Tanenhaus, M., Spivey-Knowlton, M. J., Eberhard, K. M., & Sedivy, J. C. (1995). Integration of visual and linguistic information in spoken language comprehension. Science, 268, 1632–1634.
Traxler, M. J., Pickering, M. J., & Clifton, C. (1998). Adjunct attachment is not a form of
lexical ambiguity resolution. Journal of Memory and Language, 39, 558–592.
Verkuyl, H. (1993). A theory of aspectuality: The interaction between temporal and atemporal structure. Cambridge, UK: Cambridge University Press.
Wittenberg, E., Jackendoff, R., Kuperberg, G., Paczynski, M., Snedeker, J., & Wiese, H. (forthcoming). The processing and representation of light verb constructions. In A. Bachrach, I. Roy, & L. Stockall (Eds.), Structuring the argument. John Benjamins.
Wittenberg, E., Paczynski, M., Wiese, H., Jackendoff, R., & Kuperberg, G. (in revision). The difference between “giving a rose” and “giving a kiss”: A sustained anterior negativity to the light verb construction.
Notes:
(1) . What counts as “enough” syntactic structure might be different in perception and
production. Production is perhaps more demanding of syntax, in that the processor has to
make syntactic commitments in order to put words in the correct order; to establish the
proper inflectional forms of verbs, nouns, and adjectives (depending on the language); to
leave appropriate gaps for long-distance dependencies; and so on. Perception might be
somewhat less syntax bound in that “seat-of-the-pants” semantic processing can often get
close to a correct interpretation.
(2) . Frazier et al. (2006) suggest that there is more to the prosody–syntax interface than
rule (4b), in that the relative length of pauses between Intonation Phrases can be used to
signal the relative closeness of syntactic relationship among constituents. This result
adds a further level of sophistication to rule (4b), but it does not materially affect the
point being made here. It does, however, show how experimental techniques can be used
to refine linguistic theory.
(3) . A further point: The notion of an independently generative phonology lends itself elegantly to the description of signed languages, in which phonological structure in the visual-manual modality can easily be substituted for the usual auditory-vocal system.
(4) . Phillips and Lau (2004) find such anticipatory parsing “somewhat mysterious” in the
context of the most recent incarnation of MGG, the Minimalist Program. Frazier (1989),
assuming a mainstream architecture, suggests that the processor uses “precompiled”
phrase structure rules to create syntactic hypotheses. Taken in her terms, the treelets in
(6) are just such precompiled structures. However, in the Parallel Architecture, there are
no “prior” algorithmic phrase structure rules like (5) from which the treelets are “compiled”; rather, one’s knowledge of phrase structure is encoded directly in the repertoire of treelets.
(5) . Bybee and McClelland (2005), observing that languages are far less regular than MGG thinks, take this as license to discard general rules altogether in favor of a statistically based connectionist architecture. But most of the irregularities they discuss are word-based constraints. They completely ignore the sorts of fundamental syntactic generalizations just enumerated.
(6) . I do not exclude the possibility that high-frequency regulars are redundantly stored
in the lexicon. I would add that I do not necessarily endorse every claim made on behalf
of dual-process models of inflection. For more discussion, see Jackendoff (2002, pp. 163–167).
(7) . Unification is superficially like the Merge operation of the Minimalist Program (the
most recent version of MGG). However, there are formal and empirical differences that
favor unification as the fundamental generative process in language. See Jackendoff
(2006, 2011) for discussion.
(8) . Note that rhymes cannot be all memorized. One can judge novel rhymes that cannot
be stored in the lexicon because they involve strings of words. Examples are Gilbert and
Sullivan’s lot o’news/hypotenuse, Ira Gershwin’s embraceable you/irreplaceable you, and
Ogden Nash’s to twinkle so/I thinkle so. Moreover, although embraceable is a legal English word, it is probably an on-the-spot coinage; and thinkle is of course a distortion of think made up for the sake of a humorous rhyme. So these words are not likely stored in the lexicon (unless one has memorized the poem).
(9) . For instance, none of the connectionists referred to here cite Marcus; neither do any
of the papers in a 1999 special issue of Cognitive Science entitled “Connectionist Models
of Human Language Processing: Progress and Prospects”; neither was he cited other
than by me at a 2006 Linguistic Society of America Symposium entitled “Linguistic Structure and Connectionist Models: How Good is the Fit?”
(11) . Note that in sentence production, the dependency goes the other way: A speaker uses the semantic relations in the thought to be expressed to guide the arrangement of words in syntactic structure.
(12). Spatial structure in working memory also has the potential for multiple drafts. Is there a cat behind the bookcase or not? These hypotheses are represented as two different spatial structures corresponding to the same visual input.
(13). Another strand of psycholinguistic research on coercion (e.g., Pylkkänen & McElree, 2006) also finds evidence of increased processing load with coerced sentences. However, those authors’ experimental technique, self-paced reading, does not provide enough temporal resolution to distinguish syntactic from semantic processing load. Paczynski, Jackendoff, and Kuperberg (in revision) find ERP effects connected with aspectual coercion.
Ray Jackendoff
Ray Jackendoff is Seth Merrin Professor of Philosophy and Co-Director of the Center for Cognitive Studies at Tufts University. He was the 2003 recipient of the Jean Nicod Prize in Cognitive Philosophy and has been President of both the Linguistic Society of America and the Society for Philosophy and Psychology. His most recent books are Foundations of Language (Oxford, 2002), Simpler Syntax (with Peter Culicover, Oxford, 2005), Language, Consciousness, Culture (MIT Press, 2007), Meaning and the Lexicon (Oxford, 2010), and A User’s Guide to Thought and Meaning (Oxford, 2011).
Epilogue to The Oxford Handbook of Cognitive Neuroscience—Cognitive Neuroscience: Where Are We Going?
This epilogue looks at themes and trends that hint at future developments in cognitive neuroscience. It first considers how affective neuroscience merged the study of neuroscience and emotion, how social neuroscience merged the study of neuroscience and social behavior, and how social cognitive neuroscience merged the study of cognitive neuroscience with social cognition. Another theme is how the levels of analysis of behavior/experience can be linked with psychological process and neural instantiation. Two topics that have not yet been fully approached from a cognitive neuroscience perspective, but seem ripe for near-term future progress, are the study of the development across the lifespan of the various abilities described in the book, and the study of the functional organization of the frontal lobes and their contributions to behaviors (e.g., the ability to exert self-control). This epilogue also explores the multiple methods, both behavioral and neuroscientific, used in cognitive neuroscience, new ways of modeling relationships between levels of analysis, and the question of how to make cognitive neuroscience relevant to everyday life.
Keywords: cognitive neuroscience, emotion, social behavior, social cognition, functional organization, frontal lobes, behaviors, methods, analysis, neural instantiation
Whether you have read the two-volume Handbook of Cognitive Neuroscience from cover
to cover or have just skimmed a chapter or two, we hope that you take away a sense of
the breadth and depth of work currently being conducted in the field. Since the naming of
the field in the backseat of a New York City taxicab some 35 years ago, the field and the
approach it embodies have become a dominant—if not the dominant—mode of scientific
inquiry in the study of human cognitive, emotional, and social functions.
But where will it go from here? Where will the next 5, 10, or even 20 years take the field and its approach? Obviously, nobody can say for sure—but there are broad intellectual themes and trends that run throughout this two-volume set, and a discussion of them can be used as a springboard to thinking about possible directions future work might take.
What’s in a Name?
It is said that imitation is the sincerest form of flattery. Given the proliferation of new areas of research with names that seemingly mimic cognitive neuroscience, the original has reason to feel flattered.
Consider, for example, the development of three comparatively newer fields and the dates of their naming: social neuroscience (Cacioppo, 1994), affective neuroscience (Panksepp, 1991), and social cognitive neuroscience (Ochsner & Lieberman, 2001). Although all three fields are undoubtedly the products of unique combinations of influences (see, e.g., Cacioppo, 2002; Ochsner, 2007; Panksepp, 1998), they each followed in the footsteps of cognitive neuroscience, which merged the study of cognitive abilities with neuroscience and, in the process, made considerable progress. In like fashion, affective neuroscience combined the study of emotion with neuroscience; social neuroscience, the study of social behavior with neuroscience; and social cognitive neuroscience, the study of social cognition with cognitive neuroscience.
All three of these fields have adopted the same kind of multilevel, multimethod constraints-and-convergence approach embodied by cognitive neuroscience (as we discussed in the Introduction to this Handbook). In addition, each of these fields draws from and builds on, to differing degrees, the methods and models developed within what we can now call “classic” cognitive neuroscience (see Vol. 1 of the Handbook). These new fields are siblings in a family of fields that share similar, if not identical, research “DNA.”
It is for these reasons that Volume 2 of this Handbook has sections devoted to affect and emotion and to self and social cognition. The topics of the constituent chapters in these sections could easily appear in handbooks of affective or social or social cognitive neuroscience (and in some cases they already have; see, e.g., Cacioppo & Berntson, 2004; Todorov et al., 2011). We included this material here because it represents the same core approach that guides research on the classic cognitive topics in Volume 1 and in the latter half of Volume 2.
One might wonder whether these related disciplines are on trajectories for scientific and popular impact similar to that of classic cognitive neuroscience. In the age of the Internet, one way of quantifying impact is simply to count the number of Google hits returned by a search for specific terms, in this case, “cognitive neuroscience,” “affective neuroscience,” and so on. The results of an April 2012 Google search for field names are shown in the tables at right. The top table compares cognitive neuroscience with two of its antecedent fields: cognitive psychology (Neisser, 1967) and neuroscience. The bottom table compares the descendants of classic cognitive neuroscience that were noted above. As can be seen, cognitive psychology and neuroscience are the oldest fields and the ones with the most online mentions. By comparison, their descendant, cognitive neuroscience, which describes a narrower field than either of its ancestors, is doing quite well. And the three newest fields of social, affective, and social cognitive neuroscience, each of which describes a field even narrower than that of cognitive neuroscience, also are doing well, with combined hit counts totaling about one-third that of cognitive neuroscience, in spite of the fact that the youngest field is only about one-third of cognitive neuroscience’s age.
A theme running throughout the chapters concerns the different ways in which we can link the levels of analysis of behavior/experience, psychological process, and neural instantiation. Here, we focus on two broad issues that were addressed, explicitly or implicitly, by many of the authors of chapters in these volumes.
The first issue is the complexity of the behaviors that one is attempting to map onto underlying processes and neural systems. For example, one might ask whether we should try to map what might be called “molar” abilities, such as memory or attention, onto sets of processes and neural systems, or instead whether we should try to map “molecular” subtypes of memory and subtypes of attention onto their constituent processes and neural systems. As alluded to in the Introduction, for most of the abilities described in Volume 1, it was clear as early as 20 years ago that a more molecular, subtype-based method of mapping makes the most sense in the context of neuroscience data. The current state of the art in the study of perception, attention, memory, and language (reviewed in Volume 1 of this Handbook) clearly bears this out. All the chapters in these sections describe careful ways in which researchers have used combinations of behavioral and brain data to fractionate the processes that give rise to specific subtypes of abilities.
This leads us to the second issue, which concerns the fact that for at least some of the topics discussed in Volume 2, only recently has it become clear that more molecular mappings are possible. This is because for at least some of the Volume 2 topics, behavioral research before the rise of the cognitive neuroscience approach had not developed clearly articulated process models that specified explicitly how information is represented and processed to accomplish a particular task. This limitation was perhaps most evident for topics such as the self, some aspects of higher level social cognition such as mental state inference, and some aspects of emotion, including how emotions are generated and regulated. Twenty years ago, when functional neuroimaging burst on the scene, researchers had proposed few if any process models of these molar phenomena. Hence, initial functional imaging and other types of neuroscience studies on these topics had more of a “let’s induce an emotional state or evoke a behavior and see what happens” flavor, and often they did not attempt to test specific theories. This is not to fault these researchers; at the time, they did not have the advantage of decades of process-oriented behavioral research from cognitive psychology and vision research to help guide them (see, e.g., Ochsner & Barrett, 2001; Ochsner & Gross, 2004). Instead, researchers had to develop process models on the fly.
However, times have changed. As attested by the chapters in the first two sections of Volume 2, the incorporation of brain data into research on the self, social perception, and emotion has been very useful in developing increasingly complex, “molecular” theories of the relationships among the behavior/experience, psychological process, and neural instantiation levels.
Just as the study of memory moved beyond single-system models and toward multiple-system models (Schacter & Tulving, 1994), the study of the self, social cognition, and emotion has begun to move beyond simplistic notions that single brain regions (such as the medial prefrontal cortex or amygdala) are the seat of these abilities.
New Topics
One of the ideas that recurs in the chapters of this Handbook is that the cognitive neuroscience approach is a general-purpose scientific tool. This approach can be used to ask and answer questions about any number of topics. Indeed, even within the broad scope of this two-volume set, we have not covered every topic already being fruitfully addressed using the cognitive neuroscience approach.
That said, of the many topics that have not yet been approached from a cognitive neuroscience perspective, do any appear particularly promising? Four such topics seem ripe for near-term future progress. These topics run the gamut from the study of specific brain systems, to the study of lifespan development and differences in group or social network status, to forging links with the study of mental and physical health.
The first topic is the study of the functional organization of the frontal lobes and the contributions they make to behaviors such as the ability to exert self-control. At first blush, this might seem like a topic that has already received a great deal of attention. From one perspective, it has: over the past few decades, numerous labs have studied the relationship of the frontal lobes to behavior. From another perspective, however, not much progress has been made. What is missing are coherent process models that link specific behaviors to specific subregions of prefrontal cortex. Notably, some chapters in this Handbook (e.g., those by Badre, Christoff, and Silvers et al.) attempt to do this within specific domains. But no general theory of prefrontal cortex has yet emerged that can link the myriad behaviors in which it is involved to specific and well-described processes that in turn are instantiated in specific portions of this evolutionarily newest portion of our brain.
The second topic is the study of the development across the lifespan of the various abilities described in the Handbook. Although some Handbook sections include chapters on development and aging, many do not—precisely because the cognitive neuroscientific study of lifespan changes in many abilities has only just begun. Clearly, the development from childhood into adolescence of various cognitive, social, and affective abilities is crucially important, as are the ways in which these abilities change as we move from middle adulthood into older age (Casey et al., 2010; Charles & Carstensen, 2010; Mather, 2012). The multilevel approach that characterizes cognitive neuroscience holds promise of deepening our understanding of such phenomena. Toward this end, it is important to note that new journals devoted to some of these topics have appeared (e.g., Developmental Cognitive Neuroscience, which was first published in 2010), and various institutes within the National Institutes of Health (NIH) have called for research on these topics.
The third topic is the study of the way in which group-level variables impact the development and operation of the various processing systems described in both volumes of this Handbook. Notably, this is an area of research that is not yet represented in the Handbook, although interest in connecting the study of group-level variables to the study of the brain has been growing over the past few years. Consider, for example, emerging research suggesting that having grown up as a member of different cultural groups can dictate whether and how one engages perceptual, memory, and affective systems both when reflecting on the self and in social settings (Chiao, 2009). There is also evidence that the size of one’s social networks can impact the structure of brain systems involved in affect and affiliation (Bickart et al., 2011), and that one’s status within these networks can determine whether and when one recruits brain systems implicated in emotion and social cognition (Chiao, 2010; Muscatell et al., 2011). Forging links between group-level variables and the behavior/experience, process, and brain levels that are the focus of the current Handbook will prove challenging and may require new kinds of collaborative relationships with other disciplines, such as sociology and anthropology. As these collaborations grow to maturity, we predict this work will make its way into future editions of the Handbook.
The fourth topic is the way in which specific brain systems play important roles in physical, as well as mental, health. The Handbook already includes chapters that illustrate how cognitive neuroscience approaches are being fruitfully translated to understand the nature of dysfunction, and potential treatments for it, in various kinds of psychiatric and substance use disorders (see, e.g., Barch et al., 2009; Johnstone et al., 2007; Kober et al., 2010; Ochsner, 2008; and the section below on Translation). This type of translational work is sure to grow in the future. What the current Handbook is missing, however, is discussion of how brain systems are critically involved in physical health via their interactions with the immune system. This burgeoning area of interest seeks to connect fields such as health psychology with cognitive neuroscience and allied disciplines to understand how variables like chronic stress or disease, or social connection versus isolation, can boost or diminish physical health. Such effects would arise via interactions between the immune system and brain systems involved in emotion, social cognition, and control (Eisenberger & Cole, 2012; Muscatell & Eisenberger, 2012). This is another key area of future growth that we expect to be represented in this Handbook in the future.
New Methods
How are we going to make progress on these questions and the countless others posed in the chapters of the Handbook? On the one hand, the field will undoubtedly continue to make good use of the multiple methods—both behavioral and neuroscientific—that have been its bread and butter for the past decades. As noted in the Introduction, certain empirical and conceptual advances were made possible only by technological advances that enabled us to measure activity with dramatically new levels of spatial and temporal resolution. The advent of positron emission tomography, and later of functional magnetic resonance imaging, some 20 to 30 years ago, was a game-changing advance.
On the other hand, these functional imaging techniques are still limited in terms of their spatial and temporal resolution, and the areas of the brain they allow researchers to focus on reflect the contributions of many thousands of neurons. Other techniques, such as magnetoencephalography and scalp electroencephalography, offer relatively good temporal resolution, but their spatial localization is relatively poor. Moreover, they are best suited to studying cortical rather than subcortical regions.
We could continue to beat the drum for the use of converging methods: What one technique can’t do, another can, and by triangulating across methods, better theories can be built and evaluated. But for the next stage of game-changing methodological advances to be realized, either current technologies will need to undergo a transformation that enables them to combine spatial and temporal resolution in new ways, or new techniques with better characteristics will need to be invented.
All this said, even the greatest of technological advances will not immediately be useful
unless our ability to conceptualize the cognitive and emotional processes that lie between
brain and behavior becomes more sophisticated.
One possible response to this concern is that the description of phenomena at multiple levels of analysis allows us to sidestep this problem. One could argue that at the highest level of description, it’s just fine to use folk-psychological terms to describe behavior and experience. After all, our goal is to map these terms—which prove extremely useful for everyday discourse about human behavior—onto precise descriptions of underlying neural circuitry by reference to a set of information processing mechanisms.
Marr (1982) suggested a solution to this problem: Rely on the language of computation to characterize information processing. The language of computation characterizes what computers do, and this language often can be applied to describe what brains do. But brains are demonstrably not digital computers, and thus it is not clear whether the technical vocabulary that evolved to characterize information processing in computers can in fact always be appropriately applied to brains. Back in the 1980s, many researchers hoped that connectionist models might provide an appropriate kind of computational specificity. More recently, computational models from the reinforcement learning and neuroeconomic literatures have been advanced as offering a new level of computational specificity.
Although no existing approach has yet offered a computational language that is powerful enough to describe more than thin slices of human information processing, we believe that such a medium will become a key ingredient of the cognitive neuroscience approach in the future.
Translation
In an era in which increasing numbers of researchers are applying for a static or shrinking pool of grant funding, some have come to focus on the question of how to use cognitive neuroscience to solve problems that arise in everyday life (and thereby address the concerns of funding agencies, which often are pragmatic and applied).
Research is often divided, somewhat artificially, into two categories: “Foundational” research focuses on understanding phenomena for their own sake, whereas “translational” research focuses on using such understanding to solve real-world problems. Taking cognitive neuroscience models of abilities based on studies of healthy populations and applying them to understand and treat the bases of dysfunction in specific groups is one form of translational research. This will surely be an area of great future growth.
Already, a number of areas of psychiatric and substance use research have adopted a two-step translational research sequence (e.g., Barch et al., 2004, 2009; Carter et al., 2009; Ochsner, 2008; Paxton et al., 2008). The first step involves building a model of normal behavior, typically in healthy adults, using the cognitive neuroscience approach. The second step involves translating that model to a population of interest and using it to explain the underlying bases of the disorder or other deviation from the normal baseline—a crucial step in eventually developing effective treatments. This population could suffer from some type of clinically dysfunctional behavior, such as the four psychiatric groups described in Part 4 of Volume 2 of the Handbook. It could be an adolescent or older adult population, as described in a handful of chapters scattered across sections of the Handbook. Or—as was not covered in the Handbook, but might be in the future—it could be a vulnerable group for whom training in a specific type of cognitive, affective, or social skill would improve the quality of life.
Happily, there is evidence that federal funding agencies are beginning to under
The two-step approach allows initial research to focus on understanding core processes—considered in the context of different levels of analysis—but with an eye toward then understanding how variability in these processes gives rise to the full range of normal to abnormal behavior. Elucidating the fundamental nature of these cognitive and emotional processes, and their relation to the behavioral/experiential level above and to the neural level below, is the fundamental goal of cognitive neuroscience.
Concluding Comment
How do we measure the success of a field? By the number of important findings and insights? By the number of scientists and practitioners working within it?
If we take that late 1970s taxicab ride, when the term cognitive neuroscience was first used, as the inception point for the field, then by any and all of these metrics cognitive neuroscience has been enormously successful. Compared with physics, chemistry, medicine, and biology, however—or even compared with psychology and neuroscience—cognitive neuroscience is just beginning to hit its stride. This is to be expected, given that it has existed for only a very short period of time. Indeed, the day for cognitive neuroscience is still young.
This is good news. Even though cognitive neuroscience is entering its mid-30s, compared with these other broad disciplines, which were established hundreds of years ago, this isn’t even middle age. The hope, then, is that the field can continue to blossom and grow from its adolescence to full maturity—and make good on the promising returns it has produced so far.
References
Barch, D. M., Braver, T. S., Carter, C. S., Poldrack, R. A., & Robbins, T. W. (2009). CNTRICS final task selection: Executive control. Schizophrenia Bulletin, 35, 115–135.
Barch, D. M., Mitropoulou, V., Harvey, P. D., New, A. S., Silverman, J. M., & Siever, L. J. (2004). Context-processing deficits in schizotypal personality disorder. Journal of Abnormal Psychology, 113, 556–568.
Bickart, K. C., Wright, C. I., Dautoff, R. J., Dickerson, B. C., & Barrett, L. F. (2011). Amygdala volume and social network size in humans. Nature Neuroscience, 14, 163–164.
Cacioppo, J. T., & Berntson, G. G. (Eds.). (2004). Social neuroscience: Key readings. New York: Psychology Press.
Carter, C. S., Barch, D. M., Gur, R., Pinkham, A., & Ochsner, K. (2009). CNTRICS final task selection: Social cognitive and affective neuroscience-based measures. Schizophrenia Bulletin, 35, 153–162.
Casey, B. J., Jones, R. M., Levita, L., Libby, V., Pattwell, S. S., et al. (2010). The storm and
stress of adolescence: Insights from human imaging and mouse genetics. Developmental
Psychobiology, 52, 225–235.
Charles, S. T., & Carstensen, L. L. (2010). Social and emotional aging. Annual Review of Psychology, 61, 383–409.
Chiao, J. Y. (2009). Cultural neuroscience: a once and future discipline. Progress in Brain
Research, 178, 287–304.
Chiao, J. Y. (2010). Neural basis of social status hierarchy across species. Current Opinion
in Neurobiology, 20, 803–809.
Eisenberger, N. I., & Cole, S. W. (2012). Social neuroscience and health: Neurophysiological mechanisms linking social ties with physical health. Nature Neuroscience, 15, 669–674.
Johnstone, T., van Reekum, C. M., Urry, H. L., Kalin, N. H., & Davidson, R. J. (2007). Failure to regulate: Counterproductive recruitment of top-down prefrontal-subcortical circuitry in major depression. Journal of Neuroscience, 27, 8877–8884.
Kober, H., Mende-Siedlecki, P., Kross, E. F., Weber, J., Mischel, W., et al. (2010). Prefrontal-striatal pathway underlies cognitive regulation of craving. Proceedings of the National Academy of Sciences of the United States of America, 107, 14811–14816.
Mather, M. (2012). The emotion paradox in the aging brain. Annals of the New York Academy of Sciences, 1251, 33–49.
Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. San Francisco: W. H. Freeman.
Muscatell, K. A., Morelli, S. A., Falk, E. B., Way, B. M., Pfeifer, J. H., et al. (2012). Social status modulates neural activity in the mentalizing network. NeuroImage, 60, 1771–1777.
Noble, K. G., McCandliss, B. D., & Farah, M. J. (2007). Socioeconomic gradients predict
individual differences in neurocognitive abilities. Developmental Science, 10, 464–480.
Ochsner, K. N. (2008). The social-emotional processing stream: Five core constructs and their translational potential for schizophrenia and beyond. Biological Psychiatry, 64, 48–61.
Ochsner, K. N., & Gross, J. J. (2004). Thinking makes it so: A social cognitive neuroscience
approach to emotion regulation. In R. F. Baumeister & K. D. Vohs (Eds.), Handbook of
Self-regulation: Research, Theory, and Applications (pp. 229–255). New York: Guilford
Press.
Ochsner, K. N., & Lieberman, M. D. (2001). The emergence of social cognitive neuroscience. American Psychologist, 56, 717–734.
Panksepp, J. (1998). Affective neuroscience: The foundations of human and animal emotions. New York: Oxford University Press.
Paxton, J. L., Barch, D. M., Racine, C. A., & Braver, T. S. (2008). Cognitive control, goal
maintenance, and prefrontal function in healthy aging. Cerebral Cortex, 18, 1010–1028.
Pizzagalli, D. A., Holmes, A. J., Dillon, D. G., Goetz, E. L., Birk, J. L., Bogdan, R., et al. (2009). Reduced caudate and nucleus accumbens response to rewards in unmedicated individuals with major depressive disorder. American Journal of Psychiatry, 166(6), 702–710.
Schacter, D. L., & Tulving, E. (1994). Memory Systems 1994 (Vol. 8). Cambridge, MA: MIT
Press.
Todorov, A. B., Fiske, S. T., & Prentice, D. A. (2011). Social neuroscience: Toward understanding the underpinnings of the social mind (Vol. 8). New York: Oxford University Press.
Kevin N. Ochsner
Stephen M. Kosslyn, Center for Advanced Study in the Behavioral Sciences, Stanford University, Stanford, CA
Index
language, 173
neglect syndrome, 331
reading and spelling, 494, 499–500
speech perception, 509
speech processing system, 508f
anorexia, olfactory perception, 102
anterior cingulate cortex (ACC)
executive attention, 301t, 302, 304
learning, 419, 421
retrieval, 384
anterior olfactory nucleus, 93, 95f
anterior piriform cortex, 95f, 96
anterograde amnesia, 376, 439, 443, 474
aphasia, 368n.3, 390
verbal working memory, 402–403
apraxia, 327, 567, 568
object-related actions, 361
Aristotle, 417
articulation
linking to vocal acoustics, 525–526
speech perception, 514–515
artificial neural networks, spatial judgments, 41
asomatognosia, 202
Association for Chemoreception Science, 97
associative phase, stage model, 417–418
astereopsis, 197
Atlas of Odor Character Profiles, Dravnieks, 97
attack, timbre, 114
attention, 255–256, 267–268, 268. See also auditory attention; spatial attention
cognitive neuroscience, 313
constructs of, and marker tasks, 297–300
defining as verb, 256–257
deployment, in time, 230, 231f
development of, networks, 302–311, 312–313
selective, 225–228
as selectivity, 297–299
as state, 297
sustained, 224–225
theories of visual, 259–263
tuning, 258–259
attentional deficit
neglect and optic ataxia, 337, 339–341
oculomotor deficits, 343–344
optic ataxia, 335–337
attentional disorders
Bálint–Holmes syndrome, 326–328
dorsal and ventral visual streams, 320, 322–323
optic ataxia, 335–341
psychic paralysis of gaze, 341–344
saliency maps of dorsal and ventral visual streams, 321f
simultanagnosia, 342–343
unilateral neglect, 328–334
attentional landscape, reach-to-grasp movement, 276
attentional orienting, inhibition, 298–299
attention-for-action theory, 262–263
Attention in Early Development, Ruff and Rothbart, 302
attention networks
childhood, 306–311
developmental time course of, 311f
development of, 302–311
infancy and toddlerhood, 302–306
attention network task (ANT), 300, 306–307
attention window, spatial relation, 41–42
attentive listening, rhythm, 118
audiovisual phonological fusion, 531
audiovisual speech perception
extraction of visual cues in, 538–540
McGurk illusion, 531–532, 535, 537, 538
neural correlates of integration, 540–542
perception in noise, 529f
spatial constraints, 538
temporal constraints, 534–538
temporal window of integration, 536f, 537
audition, 135–136, 200–201. See also auditory system
amusia, 127, 201
auditory agnosia, 201
auditory scene analysis, 156–162
auditory system, 136, 137f
challenges, 136, 163–164
frequency selectivity and cochlea, 136–137, 139–140
future directions, 162–164
inference problems of, 163
blood oxygen level-dependent signal (BOLD)
attention, 230f
audiovisual speech, 543
auditory sentence comprehension, 188f
category-specific patterns in healthy brain, 559f
congenitally blind and sighted participants, 564f
episodic memory, 379
intramodal selective attention, 226
measuring early brain activity, 173
mental imagery, 75, 85
responses across neural populations, 19–20
spatial working memory, 399f, 400
sustained attention, 224
body perception disorders, 202
BOLD. See blood oxygen level-dependent signal (BOLD)
bottom-up inputs, 60, 62, 67f
brain
activation in motor areas, 84
audiovisual speech integration, 540–542
auditory attention, 229, 230f
auditory sentence comprehension, 187, 188f
basis of sound segregation, 160–161
bilateral responses for speech and singing, 174–175
category specificity, 571–572
cognitive neuropsychology, 554–555
combining sensory information, 524–525
division of labor, 31–32, 41, 50
functional magnetic resonance imaging (fMRI) measuring activation, 12
functional specialization of visual, 194–195
mapping, 3
measuring activity in early development, 172–173
musical functions, 127–129
object-, face- and place-selective cortex, 12, 13f
olfactory information, 96
semantic memory, 358
topographic maps of, 29–31
brain injury
bilateral postchiasmatic, 196
memory disorders, 473–474
mental imagery and perception, 79–80
musical functions, 128
perceptual disorders, 205
sensorimotor processing, 361
traumatic, 473, 478, 481
visual disorders, 195
Broadbent, Donald, 297
Broca’s aphasia, verbal working memory, 402–403
Cowan’s K, 401
creative problem-solving, role of sleep in, 450
Critique of Pure Reason, Kant, 50, 528
culture, pleasantness ratings by, 99, 100f
D
declarative memory, 353, 443
long-term, 475–478
sleep-related consolidation of, 444–448
delayed response task, working memory, 394–395, 396f
dementia, 203, 481
depth perception, 197
diabetes mellitus, 203
diagonal illusion, 285
Diamond, Adele, 304
difference due to memory (DM) paradigm, 379
diffusion tensor imaging (DTI), 428, 494
direction, gravity, 37–38
disconnected edges, response, 15f
distributed, 23
division of labor
analog vs. digital spatial relations, 45
brain, 31–32
brain organization, 41
visual processing, 277–278
domain-general binding, 380
domain-specific category-based models, semantic memory, 356
domain-specific encoding, 380
domain-specific hypothesis
category-specific semantic deficits, 556, 557, 561
distributed, 565–567, 572
dopamine
alerting network, 300, 301t
executive attention, 301t, 302
dorsal frontoparietal network
orientation, 308
spatial attention, 245–246, 250
dorsal premotor cortex (PMC), 118, 123, 397, 423
dorsal simultanagnosia, 199–200
dorsal system
object map, 46
object recognition, 45–48
spatial information, 43
dorsal visual stream
active vision, 323, 326
episodic retrieval, 383
interaction with ventral stream, 289–290
landmark test characterizing, 321f
object identification, 32
F
face agnosia, 200
face-selective cortex, fMRI signals, 13f
facial movements, combination with vocal acoustics, 529–531
familiarity, 458
far transfer, working memory, 402
fault tolerance, 367
fear-conditioning task, sleep, 447
feature-based attention, 258–259, 261–262
feature integration theory, 46
feedback connections, visual world, 61–62
feedforward connections, visual world, 61–62
figure-ground segregation, auditory attention, 221–222
filling in, sound segregation, 159–160
filtering, separating sound sources from environment, 161–162
Fitts, Paul, 417
flanker task, 300, 310
Fowler, Carol, 525
frames of reference
object-centered or word-centered, 38–39
spatial representation, 36–38
French
syntactic structure, 186–187
word stress, 177–179, 179f
frequency selectivity, cochlea, 136–137, 139f, 139–140
frontal eye field (FEF), 36, 226, 230f
attention, 334
orienting network, 301
remapping, 325f
spatial attention, 244
spatial attention and eye movements, 266, 267
spatial working memory, 398–399
frontal lobes, 601
functional imaging, encoding episodic memory, 378–379
(p. 612) functional magnetic resonance imaging (fMRI)
memory
navigational, 48–49
Plato, 74
recovery of functioning, 478–479
rehabilitation, 480–484
stages of remembering, 474
systems, 474–478
memory as reinstatement (MAR) model, episodic memory, 376f, 377, 378, 381
memory disorders, 473–474, 484
anterograde amnesia, 474
assessment of memory functioning, 479–480
compensating with memory aids, 480–481
declarative long-term memory, 475–478
emotional consequences, 483–484
episodic, 475–476
memory systems, 474–478
modifying the environment, 483
new learning, 481–483
non-declarative long-term memory, 478
priming, 478
procedural memory, 478
prospective memory, 477–478
recovery of memory functioning, 478–479
rehabilitation of memory, 480–484
relationship between semantic and episodic memory, 477
retrograde amnesia, 474
semantic memory, 476–477
short-term and working memory, 474–475
stages of remembering, 474
understanding, 474–484
memory-guided saccade (MGS), spatial working memory, 399f
memory impairment, 474
memory systems debate, 4
memory trace
consolidation, 437
skill learning, 418
menstrual synchrony, olfaction influence, 103
mental exertion, 437
mental imagery, 74, 84–85
brain-damaged patients, 80–81
dorsal stream and spatial, 81–82
early studies of, 74–76
imagery debate, 74–76
visual, and early visual areas, 76–78
visual, and higher visual areas, 78–82
mental mimicry, 224
mental rotation paradigm
humans, 82–83
objects, 75
mental rotation tasks, strategies in, 83–84
mesial temporal epilepsy, 203
meter, music, 112, 117–118
middle temporal gyrus, speech processing system, 508f, 509
Mikrokosmos, Bartók, 124
Milan square’s neglect experiment, 49
Miller, George, 1
mismatch negativity (MMN), 178
audiovisual speech, 541, 545n.10
chords, 116
language development, 176f
oddball paradigm, 543
phonetic learning, 518
speech sounds, 176
timbre, 118–119
mismatch paradigm, 176
mismatch response (MMR), 176
missing fundamental illusion, 152
modulation frequencies, 145
modulation tuning, sound measurement, 146–148
Molaison, Henry (H.M.), 4, 376–377, 438, 478
monkeys
navigational memory, 49
neurophysiological studies, 267
perception of space and object, 33
motion perception, 198
motor imagery, 82–84
functional role of area M1, 84
music, 124
physical movement, 82–83
strategies in mental rotation tasks, 83–84
motor systems, speech perception, 514–515
mouse, olfactory system, 90f
moving window paradigm, perception, 264, 264f
Müller, Georg, 436
Müller–Lyer illusion, 66, 285
multimodal speech perception. See speech perception, multimodal
multiple sclerosis, 473
gustatory disorders, 203
olfactory perception, 203
multiple semantics assumption, 558–559
multivoxel pattern analysis (MVPA)
episodic retrieval, 383
semantic memory, 366, 368
sensitivity to position, 17, 18
music
absolute pitch, 121
action, 122–124
disorders, 127
emotion, 124–126
episodic memory, 120–121
imagery, 123–124
improvised performance, 123
memory, 119–121
motor imagery, 124
parallels between music and language, 121–122
perception and cognition, 115–121
performance, 123
pitch and melody, 115
psychology and neuroscience, 111–112
rhythm and meter, 117–118
score-based performance, 123
semantics, 121–122
sensorimotor coordination, 122–123
singing, 122–123
syntax, 121
tapping, 122
timbre, 114–115, 118–119
time, 112
tonal dynamics, 117
tonality, 112–114, 115–117
tonal relationships, 113f
working memory, 120
working memory maintenance, 405f
N
National Institute of Mental Health (NIMH), 604
navigation, spatial information, 48–49
n-back task, sustained working memory, 224, 225f
negative priming, phenomenon, 299
neglect. See also unilateral neglect
reference frames, 39f
space, 38–40
neglect dyslexia, 40
Neapolitan sixth, 116
neural networks
invariant object recognition, 15–16
spatial attention, 244–246
neural replay, slow-wave sleep, 445–446
neuroanatomy
attention, 300–302
mammalian olfactory system, 88–96
neuroimaging
cognitive aging, 465–467
cortical 3D processing, 36
musical processes, 120
navigational tasks, 49
reach-to-grasp actions, 34–35
shape-selective actions, 33
NeuroPage, memory aid, 480–481
neurophysiology
monkeys, 267
visual-spatial attention and eye movements, 266
neuroscience, 111–112, 600
neurotransmitters, attention networks, 301t
noise-vocoded speech, 146, 147f
nondeclarative memory, 443, 448–450, 478
noradrenaline, alerting network, 301t
norepinephrine, 300, 301
O
obesity, olfactory perception, 102
object-centered, frame of reference, 38–39
object form topography, 22
object map, 46
object recognition
category-specific modules, 22
cue-invariant responses in lateral occipital complex (LOC), 14–15
distributed object form topography, 22
electrophysiology measurements, 11–12
functional magnetic resonance imaging (fMRI), 11, 12
functional organization of human ventral stream, 12–14
future directions, 23–24
lesions of dorsal system, 46
neural bases of invariant, 15–16
neural sensitivity to object view, 20
open questions, 23–24
orbitofrontal cortex (OFC), 64
process maps, 22
representations of faces and body parts, 22–23
responses to shape, edges and surfaces across ventral stream, 15f
theories of, 18, 20–21
variant neuron, 20–21
object recognition task, unilateral posterior lesions, 47f
objects. See also category-specific semantic deficits
ambiguous figures, 68–69
perceptual and conceptual processing of, 360–362
spatial attention, 239–240
pleasantness
music, 125
odorants, 97, 98f, 99–101
Ponzo illusion, 286, 287f
positron emission tomography (PET)
auditory attention, 223
changes in regional cerebral blood flow, 360
mental imagery, 75–76
reading and spelling, 494
semantic memory retrieval, 359
spatial attention, 245
visual information, 567
visual mental imagery, 76–77
visual-spatial working memory, 396–398
Posner, Michael, 297, 417
model of attention, 299–300
Posner cueing paradigm, 259, 298, 308, 339f
posterior parietal cortex (PPC), 34–36, 49, 565
category specificity, 563
mental imagery, 81, 82, 84
optic ataxia, 336f
somatosensory perception, 201, 205
spatial attention, 245
unilateral neglect, 333–334
vision for action, 277–278, 281–283, 289
visual attention, 319, 323, 327
posterior piriform cortex, 95f, 96
precedence effect, 161
prediction, relevance in visual recognition, 62–65
prefrontal cortex (PFC)
activity in young and older adults, 460f
episodic memory, 456–457
episodic memory encoding, 459–461
episodic memory retrieval, 461–463
hemispheric asymmetry reduction in older adults (HAROLD), 458–463
longitudinal changes, 457f
resource deficit hypothesis, 467
sustained and transient memory effects by aging, 461f
top-down control of working memory, 408–410
verbal and spatial working memory for older and younger adults, 459f
visual-spatial working memory, 397–398
working memory, 394–395, 396f, 458–459
premotor cortex (PMC), 36, 37, 123, 128f
attention, 230f
body perception disorders, 202
mental imagery, 83
rhythm, 118
semantic memory, 361, 364
subcortical networks
auditory system, 142
neglect, 40
spatial attention, 244
suicide, 473
Sullivan, Anne, 525f
superior colliculus, 30, 36
attention, 322, 323
audiovisual speech, 540
orienting network, 301
spatial attention, 238, 242, 244, 267
visual processing, 277f, 283
superior parietal lobe
agent operating in space, 36
eye movement, 35
limb movements, 37
superior parietal occipital cortex (SPOC), 282
superior temporal cortex, 36, 40, 187, 226, 327, 404–406, 512
superior temporal gyrus (STG), 40, 144f, 187
auditory attention, 227
auditory cortex, 144f
language, 173, 174f
memory, 225f, 402
music, 115, 128f
semantic memory, 365
speech perception, 509
speech processing system, 508f
(p. 620) unilateral neglect, 333
words
frame of reference, 38–39
mapping of sounds to, 515–518
meaning in language acquisition, 180–182
word-learning paradigm, 181
word stress in language acquisition, 177–179
working memory, 389–390, 410
aging process, 456
audition, 162–163
capacity, 400–402
capacity expansion, 402
central executive, 392, 407
cognitive control of, 407–410
compensation-related utilization of neural circuits hypothesis (CRUNCH), 459
delayed response task, 394–395, 396f
development of concept of, 391–394
dorsolateral prefrontal cortex (DLPFC), 119
emergence as neuroscientific concept, 394–395