

Oxford Library of Psychology


The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn

Print Publication Date: Dec 2013 Subject: Psychology Online Publication Date: Dec 2013

(p. ii) Oxford Library of Psychology


Editor-in-Chief

Peter E. Nathan

Area Editors:

Clinical Psychology

David H. Barlow

Cognitive Neuroscience

Kevin N. Ochsner and Stephen M. Kosslyn

Cognitive Psychology

Daniel Reisberg

Counseling Psychology

Elizabeth M. Altmaier and Jo-Ida C. Hansen

Developmental Psychology

Philip David Zelazo

Health Psychology

Howard S. Friedman

History of Psychology

David B. Baker

Methods and Measurement


Todd D. Little

Neuropsychology

Kenneth M. Adams

Organizational Psychology

Steve W. J. Kozlowski

Personality and Social Psychology

Kay Deaux and Mark Snyder

[UNTITLED]

(p. iv)

Oxford University Press is a department of the University of Oxford. It furthers the University’s objective of excellence in research, scholarship, and education by publishing worldwide.

Oxford New York
Auckland Cape Town Dar es Salaam Hong Kong Karachi
Kuala Lumpur Madrid Melbourne Mexico City Nairobi
New Delhi Shanghai Taipei Toronto

With offices in
Argentina Austria Brazil Chile Czech Republic France Greece
Guatemala Hungary Italy Japan Poland Portugal Singapore
South Korea Switzerland Thailand Turkey Ukraine Vietnam

Oxford is a registered trademark of Oxford University Press in the UK and certain other countries.

Published in the United States of America by
Oxford University Press
198 Madison Avenue, New York, NY 10016

© Oxford University Press 2013


All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by license, or under terms agreed with the appropriate reproduction rights organization. Inquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above.

You must not circulate this work in any other form and you must impose this same condition on any acquirer.

Library of Congress Cataloging-in-Publication Data

The Oxford handbook of cognitive neuroscience / edited by Kevin Ochsner, Stephen M. Kosslyn.


volumes cm.—(Oxford library of psychology)


ISBN 978-0-19-998869-3

1. Cognitive neuroscience—Handbooks, manuals, etc. 2. Neuropsychology—Handbooks, manuals, etc.
I. Ochsner, Kevin N. (Kevin Nicholas) II. Kosslyn, Stephen Michael, 1948– III. Title: Handbook of cognitive neuroscience.
QP360.5.O94 2013
612.8'233—dc23
2013026213

987654321
Printed in the United States of America
on acid-free paper

Oxford Library of Psychology



(p. vi) (p. vii) Oxford Library of Psychology


The Oxford Library of Psychology, a landmark series of handbooks, is published by Oxford
University Press, one of the world’s oldest and most highly respected publishers, with a
tradition of publishing significant books in psychology. The ambitious goal of the Oxford
Library of Psychology is nothing less than to span a vibrant, wide-ranging field and, in so
doing, to fill a clear market need.

Encompassing a comprehensive set of handbooks, organized hierarchically, the Library incorporates volumes at different levels, each designed to meet a distinct need. At one level is a set of handbooks designed broadly to survey the major subfields of psychology; at another are numerous handbooks that cover important current focal research and scholarly areas of psychology in depth and detail. Planned as a reflection of the dynamism of psychology, the Library will grow and expand as psychology itself develops, thereby highlighting significant new research that will influence the field. Adding to its accessibility and ease of use, the Library will be published in print and electronically.

The Library surveys psychology’s principal subfields with a set of handbooks that capture the current status and future prospects of those major subdisciplines. This initial set includes handbooks of social and personality psychology, clinical psychology, counseling psychology, school psychology, educational psychology, industrial and organizational psychology, cognitive psychology, cognitive neuroscience, methods and measurements, history, neuropsychology, personality assessment, developmental psychology, and more. Each handbook undertakes to review one of psychology’s major subdisciplines with breadth, comprehensiveness, and exemplary scholarship. In addition to these broadly conceived volumes, the Library also includes a large number of handbooks designed to explore in depth more specialized areas of scholarship and research, such as stress, health and coping, anxiety and related disorders, cognitive development, and child and adolescent assessment. In contrast to the broad coverage of the subfield handbooks, each of these latter volumes focuses on an especially productive, more highly focused line of scholarship and research. Whether at the broadest or most specific level, however, all of the Library handbooks offer synthetic coverage that reviews and evaluates the relevant past and present research and anticipates research in the future. Each handbook in the Library includes introductory and concluding chapters written by its editor or editors to provide a roadmap to the handbook’s table of contents and to offer informed anticipations of significant future developments in that field.

An undertaking of this scope calls for handbook editors and chapter authors who are established scholars in the areas about which they write. Many of the (p. viii) nation’s and world’s most productive and respected psychologists have agreed to edit Library handbooks or write authoritative chapters in their areas of expertise.

For whom has the Oxford Library of Psychology been written? Because of its breadth, depth, and accessibility, the Library serves a diverse audience, including graduate students in psychology and their faculty mentors, scholars, researchers, and practitioners in psychology and related fields. All will find in the Library the information they seek on the subfield or focal area of psychology in which they work or are interested.

Befitting its commitment to accessibility, each handbook includes a comprehensive index, as well as extensive references to help guide research. And because the Library was designed from its inception as an online as well as a print resource, its structure and contents will be readily and rationally searchable online. Further, once the Library is released online, the handbooks will be regularly and thoroughly updated.

In summary, the Oxford Library of Psychology will grow organically to provide a thoroughly informed perspective on the field of psychology, one that reflects both psychology’s dynamism and its increasing interdisciplinarity. Once published electronically, the Library is also destined to become a uniquely valuable interactive tool, with extended search and browsing capabilities. As you begin to consult this handbook, we sincerely hope you will share our enthusiasm for the more than 500-year tradition of Oxford University Press for excellence, innovation, and quality, as exemplified by the Oxford Library of Psychology.

Peter E. Nathan

Editor-in-Chief

Oxford Library of Psychology

Page 2 of 2
About the Editors



(p. ix) About the Editors


Kevin N. Ochsner

Kevin N. Ochsner is Associate Professor of Psychology at Columbia University. He graduated summa cum laude from the University of Illinois, where he received his B.A. in Psychology. Ochsner then received an M.A. and Ph.D. in psychology from Harvard University, working in the laboratory of Dr. Daniel Schacter, where he studied emotion and memory. Also at Harvard, he began his postdoctoral training in the lab of Dr. Daniel Gilbert, where he first began integrating social cognitive and neuroscience approaches to emotion-cognition interactions, and along with Matthew Lieberman published the first articles on the emerging field of social cognitive neuroscience. Ochsner later completed his postdoctoral training at Stanford University in the lab of Dr. John Gabrieli, where he conducted some of the first functional neuroimaging studies examining the brain systems supporting cognitive forms of regulation. He is now director of the Social Cognitive Neuroscience Laboratory at Columbia University, where current studies examine the psychological and neural bases of emotion, emotion regulation, empathy, and person perception in both healthy and clinical populations. Ochsner has received various awards for his research and teaching, including the American Psychological Association’s Division 3 New Investigator Award, the Cognitive Neuroscience Society’s Young Investigator Award, and Columbia University’s Lenfest Distinguished Faculty Award.

Stephen M. Kosslyn

Stephen M. Kosslyn is the Founding Dean of the university at the Minerva Project, based in San Francisco. Before that, he served as Director of the Center for Advanced Study in the Behavioral Sciences and Professor of Psychology at Stanford University, and was previously chair of the Department of Psychology, Dean of Social Science, and the John Lindsley Professor of Psychology in Memory of William James at Harvard University. He received a B.A. from UCLA and a Ph.D. from Stanford University, both in psychology. His original graduate training was in cognitive science, which focused on the intersection of cognitive psychology and artificial intelligence; faced with limitations in those approaches, he eventually turned to study the brain. Kosslyn’s research has focused primarily on the nature of visual cognition, visual communication, and individual differences; he has authored or coauthored 14 books and over 300 papers on these topics. Kosslyn has received the American Psychological Association’s Boyd R. McCandless Young Scientist Award, the National Academy of Sciences Initiatives in Research Award, a Cattell Award, a Guggenheim Fellowship, the J-L. Signoret (p. x) Prize (France), an honorary Doctorate from the University of Caen, an honorary Doctorate from the University of Paris Descartes, an honorary Doctorate from Bern University, and election to Academia Rodinensis pro Remediatione (Switzerland), the Society of Experimental Psychologists, and the American Academy of Arts and Sciences.

Contributors

(p. xi) Contributors

Claude Alain

Rotman Research Institute

Baycrest Centre

Toronto, Ontario, Canada

Agnès Alsius

Department of Psychology

Queen’s University

Kingston, Ontario, Canada

George A. Alvarez


Department of Psychology

Harvard University

Cambridge, MA

Stephen R. Arnott

Rotman Research Institute

Baycrest Centre

Toronto, Ontario, Canada

Moshe Bar

Martinos Center for Biomedical Imaging

Massachusetts General Hospital

Harvard Medical School

Charlestown, MA

Bryan L. Benson

Department of Psychology


School of Kinesiology

University of Michigan

Ann Arbor, MI

Damien Biotti

Lyon Neuroscience Research Center

Bron, France

Annabelle Blangero

Lyon Neuroscience Research Center

Bron, France

Sheila E. Blumstein

Department of Cognitive, Linguistic, and Psychological Sciences

Brown Institute for Brain Science

Brown University

Providence, RI


Grégoire Borst

University Paris Descartes

Laboratory for the Psychology of Child Development and Education (CNRS Unit 3521)

Paris, France

Department of Psychology

Harvard University

Cambridge, MA

Nathaniel B. Boyden

Department of Psychology

University of Michigan

Ann Arbor, MI

Andreja Bubic

Martinos Center for Biomedical Imaging

Massachusetts General Hospital


Harvard Medical School

Charlestown, MA

Bradley R. Buchsbaum

Rotman Research Institute

Baycrest Centre

Toronto, Ontario, Canada

Roberto Cabeza

Center for Cognitive Neuroscience

Duke University

Durham, NC

Denise J. Cai

Department of Psychology

University of California, San Diego

La Jolla, CA


Alfonso Caramazza

Department of Psychology

Harvard University

Cambridge, MA

Center for Mind/Brain Sciences

University of Trento

Rovereto, Italy

(p. xii) Evangelia G. Chrysikou

Department of Psychology

University of Kansas

Lawrence, KS

Jared Danker

Department of Psychology

New York University


New York, NY

Sander Daselaar

Donders Institute for Brain, Cognition, and Behaviour

Radboud University

Nijmegen, Netherlands

Center for Cognitive Neuroscience

Duke University

Durham, NC

Lila Davachi

Center for Neural Science

Department of Psychology

New York University

New York, NY

Mark D’Esposito


Helen Wills Neuroscience Institute

Department of Psychology

University of California

Berkeley, CA

Benjamin J. Dyson

Department of Psychology

Ryerson University

Toronto, Ontario, Canada

Jessica Fish

MRC Cognition and Brain Sciences Unit

Cambridge, UK

Angela D. Friederici

Department of Neuropsychology

Max Planck Institute for Human Cognitive and Brain Sciences


Leipzig, Germany

Melvyn A. Goodale

The Brain and Mind Institute

University of Western Ontario

London, Ontario, Canada

Kalanit Grill-Spector

Department of Psychology and Neuroscience Institute

Stanford University

Stanford, CA

Argye E. Hillis

Departments of Neurology, Physical Medicine and Rehabilitation, and Cognitive Science

Johns Hopkins University

Baltimore, MD


Ray Jackendoff

Center for Cognitive Studies

Tufts University

Medford, MA

Petr Janata

Center for Mind and Brain

Department of Psychology

University of California, Davis

Davis, CA

Roni Kahana

Department of Neurobiology

Weizmann Institute of Science

Rehovot, Israel

Stephen M. Kosslyn


Minerva Project

San Francisco, CA

Youngbin Kwak

Neuroscience Program

University of Michigan

Ann Arbor, MI

Bruno Laeng

Department of Psychology

University of Oslo

Oslo, Norway

Ewen MacDonald

Department of Psychology

Queen’s University

Kingston, Ontario, Canada


Centre for Applied Hearing Research

Department of Electrical Engineering

Technical University of Denmark

Lyngby, Denmark

Bradford Z. Mahon

Departments of Neurosurgery and Brain and Cognitive Sciences

University of Rochester

Rochester, NY

Claudia Männel

Department of Neuropsychology

Max Planck Institute for Human Cognitive and Brain Sciences

Leipzig, Germany

(p. xiii) Jason B. Mattingley

Queensland Brain Institute


University of Queensland

St. Lucia, Queensland, Australia

Josh H. McDermott

Department of Brain and Cognitive Sciences

Massachusetts Institute of Technology

Cambridge, MA

Kevin Munhall

Department of Psychology

Queen’s University

Kingston, Ontario, Canada

Emily B. Myers

Department of Psychology

Department of Speech, Language, and Hearing Sciences

University of Connecticut


Storrs, CT

Jeffrey Nicol

Department of Psychology

Nipissing University

North Bay, Ontario, Canada

Kevin N. Ochsner

Department of Psychology

Columbia University

New York, NY

Laure Pisella

Lyon Neuroscience Research Center

Bron, France

Gilles Rode


Lyon Neuroscience Research Center

University Lyon

Hospices Civils de Lyon

Hôpital Henry Gabrielle

St. Genis Laval, France

Yves Rossetti

Lyon Neuroscience Research Center

University Lyon

Mouvement et Handicap

Plateforme IFNL-HCL

Hospices Civils de Lyon

Lyon, France

M. Rosario Rueda

Departamento de Psicología Experimental

Centro de Investigación Mente, Cerebro y Comportamiento (CIMCYC)


Universidad de Granada

Granada, Spain

Rachael D. Seidler

Department of Psychology

School of Kinesiology

Neuroscience Program

University of Michigan

Ann Arbor, MI

Noam Sobel

Department of Neurobiology

Weizmann Institute of Science

Rehovot, Israel

Sharon L. Thompson-Schill

Department of Psychology


University of Pennsylvania

Philadelphia, PA

Caroline Tilikete

Lyon Neuroscience Research Center

University Lyon

Hospices Civils de Lyon

Hôpital Neurologique

Lyon, France

Kyrana Tsapkini

Departments of Neurology, and Physical Medicine and Rehabilitation

Johns Hopkins University

Baltimore, MD

Alain Vighetto

Lyon Neuroscience Research Center


University Lyon

Hospices Civils de Lyon

Hôpital Neurologique

Lyon, France

Barbara A. Wilson

Department of Psychology

Institute of Psychiatry

King’s College

London, UK

John T. Wixted

Department of Psychology

University of California, San Diego

La Jolla, CA

Eiling Yee


Basque Center on Cognition, Brain, and Language

San Sebastian, Spain

Josef Zihl

Neuropsychology Unit

Department of Psychology

University of Munich

Max Planck Institute of Psychiatry

Munich, Germany

(p. xiv)

Introduction to The Oxford Handbook of Cognitive Neuroscience: Cognitive Neuroscience—Where Are We Now?
Kevin N. Ochsner and Stephen Kosslyn
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn

Print Publication Date: Dec 2013 Subject: Psychology, Cognitive Neuroscience
Online Publication Date: Dec 2013 DOI: 10.1093/oxfordhb/9780199988693.013.0001

Abstract and Keywords

This two-volume set reviews the current state of the art in cognitive neuroscience. The introductory chapter outlines central elements of the cognitive neuroscience approach and provides a brief overview of the eight sections of the book’s two volumes. Volume 1 is divided into four sections comprising chapters that examine core processes, ways in which they develop across the lifespan, and ways they may break down in special populations. The first section deals with perception and addresses topics such as the abilities to represent and recognize objects and spatial relations and the use of top-down processes in visual perception. The second section focuses on attention and how it relates to action and visual motor control. The third section, on memory, covers topics such as working memory, semantic memory, and episodic memory. Finally, the fourth section, on language, includes chapters on abilities such as speech perception and production, semantics, the capacity for written language, and the distinction between linguistic competence and performance.

Keywords: cognitive neuroscience, perception, attention, language, memory, spatial relations, visual perception,
visual motor control, semantics, linguistic competence

On a night in the late 1970s, something important happened in a New York City taxicab:
A new scientific field was named. En route to a dinner at the famed Algonquin Hotel, the
neuroscientist Michael Gazzaniga and the cognitive psychologist George Miller coined
the term “cognitive neuroscience.” This field would go on to change the way we think
about the relationship between behavior, mind, and brain.

This is not to say that the field was born on that day. Indeed, as Hermann Ebbinghaus (1910) noted, “Psychology has a long past, but a short history,” and cognitive neuroscience clearly has a rich and complex set of ancestors. Although it is difficult to say exactly when a new scientific discipline came into being, the groundwork for the field had begun to be laid decades before the term was coined. As has been chronicled in detail elsewhere (Gardner, 1985; Posner & DiGirolamo, 2000), as behaviorism gave way to the cognitive revolution, and as computational and neuroscientific approaches to understanding the mind became increasingly popular, researchers in numerous allied fields came to believe that understanding the relationships between behavior and the mind required understanding their relationship to the brain.

This two-volume set reviews the current state of the art in cognitive neuroscience, some 35 years after the field was named. In these intervening years, the field has grown tremendously—so much so, in fact, that cognitive neuroscience is now less a bounded discipline focused on specific topics and more an approach that permeates psychological and neuroscientific inquiry. As such, no collection of chapters could possibly encompass the entire breadth and depth of cognitive neuroscience. That said, this two-volume set attempts systematically to survey eight core areas of inquiry in cognitive neuroscience, four per volume, in a total of 55 chapters.

As an appetizer to this scientific feast, this introductory chapter offers a quick (p. 2) sketch of some central elements of the cognitive neuroscience approach and a brief overview of the eight sections of the Handbook’s two volumes.

The Cognitive Neuroscience Approach


Among the many factors that gave rise to cognitive neuroscience, we highlight three sig­
nal insights. In part, we explicitly highlight these key ideas because they lay bare ele­
ments of the cognitive neuroscience approach that have become so commonplace today
that their importance may be forgotten even as they implicitly influence the ways re­
search is conducted.

Multiple Levels of Analysis

The first crucial influence on cognitive neuroscience was a set of insights presented in a book by the late British vision scientist David Marr. Published in 1982, the book Vision took an old idea—levels of analysis—and made a strong case that we can only understand visual perception if we integrate descriptions cast at three distinct, but fundamentally interrelated (Kosslyn & Maljkovic, 1990), levels. At the topmost computational level, one describes the problem at hand, such as how one can see edges, derive the three-dimensional structure of shapes, and so on; this level characterizes “what” the system does. At the middle algorithm level, one describes how a specific computational problem is solved by a system that includes specific processes that operate on specific representations; this level characterizes “how” the system operates. And at the lowest implementation level, one describes how the representations and processes that constitute the algorithm are instantiated in the brain. All three levels are crucial, and characteristics of the description at each level affect the way we must describe characteristics at the other levels.

This approach proved enormously influential in vision research, and researchers in other domains quickly realized that it could be applied more broadly. This multilevel approach is now the foundation for cognitive neuroscience inquiry more generally, although we often use different terminology to refer to these levels of analysis. For instance, many researchers now talk about the levels of behavior and experience, psychological processes (or information processing mechanisms), and neural systems (Mitchell, 2006; Ochsner, 2007; Ochsner & Lieberman, 2001). But the core idea is still the same as that articulated by Marr: A complete understanding of the ways in which vision, memory, emotion, or any other cognitive or emotional faculty operates necessarily involves connecting descriptions of phenomena across levels of analysis.

The resulting multilevel descriptions have many advantages over the one- or two-level accounts that are typical of traditional approaches in allied disciplines such as cognitive psychology. These advantages include the ability to use both behavioral and brain data in combination—rather than just one or the other taken alone—to draw inferences about psychological processes. In so doing, one constructs theories that are constrained by, must connect to, and must make sense in the context of more types of data than theories that are couched solely at the behavioral or at the behavioral and psychological levels. We return to some of these advantages below.

Use of Multiple Methods

If we are to study human abilities and capacities at multiple levels of analysis, we must necessarily use multiple types of methods to do so. In fact, many methods exist to measure phenomena at each of the levels of analysis, and new measures are continually being invented (Churchland & Sejnowski, 1988).

Today, this observation is taken as a given by many graduate students who study cognitive neuroscience. They take it for granted that we should use studies of patient populations, electrophysiological methods, functional imaging methods, transcranial magnetic stimulation (TMS, which uses magnetic fields to temporarily impair or enhance neural functioning in a specific brain area), and other new techniques as they are developed. But this view wasn’t always the norm. This fact is illustrated nicely by a debate that took place in the early 1990s about whether and how neuroscience data should inform psychological models of cognitive processes. On one side was the view from cognitive neuropsychology, which centered on the idea that studies of patient populations may be sufficient to understand the structure of cognitive processing (Caramazza, 1992). The claim was that by studying the ways in which behavior changes as a result of the unhappy accidents of nature (e.g., strokes, traumatic brain injuries) that caused lesions of language areas, memory areas, and so on, we can discover the processing modules that constitute the mind. The key assumption here is that researchers can identify direct relationships between behavioral deficits and specific areas of the brain that were damaged. On the other side of the debate was the view from cognitive neuroscience, (p. 3) which centered on the idea that the more methods used, the better (Kosslyn & Intriligator, 1992). Because every method has its limitations, the more methods researchers could bring to bear, the more likely they are to have a correct picture of how behavior is related to neural functioning. In the case of patient populations, for example, deficits in behavior might not simply reflect the normal functions of the damaged regions; rather, they could reflect reorganization of function after brain damage or diffuse damage to multiple regions that affects multiple separate functions. If so, then observing patterns of dissociations and associations of abilities following brain damage would not necessarily allow researchers to delineate the structure of cognitive processing. Other methods would be required (such as neuroimaging) to complement studies of brain-damaged patients.

The field quickly adopted the second perspective, drawing on multiple methods when constructing and testing theories of cognitive processing. Researchers realized that they could use multiple methods together in complementary ways: They could use functional imaging methods to describe the network of processes active in the healthy brain when engaged in a particular behavior; they could use lesion methods or TMS to assess the causal relationships between activity in specific brain areas and particular forms of information processing (which in turn give rise to particular types of behavior); they could use electrophysiological methods to study the temporal dynamics of cortical systems as they interactively relate to the behavior of interest. And so on. The cognitive neuroscience approach adopted the idea that no single technique provides all the answers.

That said, there is no denying that some techniques have proved more powerful and generative than others during the past 35 years. In particular, it is difficult to overstate the impact of functional imaging of the healthy intact human brain, first ushered in by positron emission tomography studies in the late 1980s (Petersen et al., 1988) and given a tremendous boost by the advent, and subsequent boom, of functional magnetic resonance imaging in the early 1990s (Belliveau et al., 1992). The advent of functional imaging is in many ways the single most important contributor to the rise of cognitive neuroscience. Without the ability to study cortical and subcortical brain systems in action in healthy adults, it’s not clear whether cognitive neuroscience would have become the central paradigm that it is today.

We must, however, offer a cautionary note: Functional imaging is by no means the be-all and end-all of cognitive neuroscience techniques. Like any other method, it has its own strengths and weaknesses (which have been described in detail elsewhere, e.g., Poldrack, 2006, 2008, 2011; Van Horn & Poldrack, 2009; Yarkoni et al., 2010). Researchers trained in cognitive neuroscience understand many, if not all, of these limitations, but unfortunately, many outside the field do not. This can cause two problems. The first is that newcomers to the field may improperly use functional imaging in the service of overly simplistic “brain mapping” (e.g., seeking to identify “love spots” in the brain; Fisher et al., 2002) and may commit other inferential errors (Poldrack, 2006). The second, less appreciated problem is that when nonspecialists read about studies of such overly simplistic hypotheses, they may assume that all cognitive neuroscientists traffic in this kind of experimentation and theorizing. As the chapters in these volumes make clear, most cognitive neuroscientists appreciate the strengths and limits of the various techniques they use, and understand that functional imaging is simply one of a number of techniques that allow neuroscience data to constrain theories of psychological processes. In the next section, we turn to exactly this point.

Constraints and Convergence

One implication of using multiple methods to study phenomena at multiple levels of analysis is that we have numerous types of data. These data provide converging evidence for, and constrain the nature of, theories of human cognition, emotion, and behavior. That is, the data must fit together, painting different facets of the same picture (this is what we mean by convergence). And even though each type of data alone does not dictate a particular interpretation, each type helps to narrow the range of possible interpretations (this is what we mean by constraining the nature of theories). Researchers in cognitive neuroscience acknowledge that data always can be interpreted in various ways, but they also rely on the fact that data limit the range of viable interpretations—and the more types of data, the more strongly they will narrow down the range of possible theories. In this sense, constraints and convergence are the very core of the cognitive neuroscience approach (Ochsner & Kosslyn, 1999).

We note that the principled use of constraining and converging evidence does not privi­
lege evidence couched at any one level of analysis. Brain data are not more important,
more real, or more (p. 4) intrinsically valuable than behavioral data, and vice versa.
Rather, both kinds of data constrain the range of possible theories of psychological
processes, and as such, both are valuable.

In addition, both behavioral and brain data can spark changes in theories of psychological
processes. This claim stands in contrast to claims made by those who have argued that
brain data can never change, or in any way constrain, a psychological theory. According
to this view, brain data are ambiguous without a psychological theory to interpret them
(Kihlstrom, 2012). Such arguments fail to appreciate the fact that the goal of cognitive
neuroscience is to construct theories couched at all three levels of analysis. Moreover, be­
havioral and brain data often are dependent variables collected in the same experiments.
This is not arbitrary; we have ample evidence that behavior and brain function are inti­
mately related: When the brain is damaged in a particular location, specific behaviors are
disrupted—and when a person engages in specific behaviors, specific brain areas are acti­
vated. Dependent measures are always what science uses to constrain theorizing, and
thus it follows that both behavioral and brain data must constrain our theories of the in­
tervening psychological processes.

This point is so important that we want to illustrate it with two examples. The first be­
gins with classic studies of the amnesic patient known for decades only by his initials,
H.M. (Corkin, 2002). After he died, his brain was famously donated to science and dis­
sected live on the Internet in 2009 (see http://thebrainobservatory.ucsd.edu/hm_live.php).
We now know that his name was Henry. In the 1950s, Henry suffered from severe epilepsy that arose from abnormal neural tissue in his temporal lobes and could not be treated with medication. At the time, he suffered horribly from seizures, and the last re­
maining course of potential treatment was a neurosurgical operation that removed the
tips of Henry’s temporal lobes (and with them, the neural origins of his epileptic
seizures).

When Henry awoke after his operation, the epilepsy was gone, but so was his ability to
form new memories of events he experienced. Henry was stuck in the eternal present,
forevermore awakening each day with his sense of time frozen at the age at which he had
the operation. The time horizon for his experience was about two minutes, or the amount
of time information could be retained in short-term memory before it required transfer to
a longer-term episodic memory store.

To say that the behavioral sequelae of H.M.’s operation were surprising to the scientific
community at that time is an understatement. Many psychologists and neuroscientists
spent the better part of the next 20 to 30 years reconfiguring their theories of memory in
order to accommodate these and subsequent findings. It wasn’t until the early 1990s that
the far-reaching theoretical implications of Henry’s amnesia finally became clear
(Schacter & Tulving, 1994), when a combination of behavioral, functional imaging, and
patient lesion data converged to implicate a multiple-systems account of human memory.

This understanding of H.M.’s deficits was hard won, and emerged only after an extended
“memory systems debate” in psychology and neuroscience (Schacter & Tulving, 1994).
This debate was between, on the one hand, behavioral and psychological theorists who
argued that we have a single memory system (which has multiple processes) and, on the
other hand, neuroscience-inspired theorists who argued that we have multiple memory
systems (each of which instantiates a particular kind of process or processes). The initial
observation of H.M.’s amnesia, combined with decades of subsequent careful experimen­
tation using multiple behavioral and neuroscience techniques, decisively came down on
the side of the multiple memory systems theorists. Cognitive processing relies on multiple
types of memory, and each uses a distinct set of representations and processes. This was
a clear victory for the cognitive neuroscience approach over purely behavioral approach­
es.

A second example of the utility of combining neuroscientific and behavioral evidence comes from the “imagery debate” (Kosslyn, Thompson, & Ganis, 2006). On one hand,
some psychologists and philosophers argued that the pictorial characteristics of visual
mental images that are evident to experience are epiphenomenal, like heat produced by a
light bulb when someone is reading—something that could be experienced but played no
role in accomplishing the function. On the other hand, cognitive neuroscientists argued
that visual mental images are analogous to visual percepts in that they use space in a rep­
resentation to specify space in the world.

This debate went back and forth for many years without resolution, and at one point a mathematical proof was offered that behavioral data alone could never resolve it (Anderson, 1978). The advent of neuroimaging helped bring this debate largely to a close (Kosslyn, Thompson, & Ganis, 2006). A key (p. 5) finding was that the first cortical areas that process visual input during perception are each topographically mapped, such that adjacent locations in the visual world are represented in adjacent locations in the visual cortex. That is, these areas use space on the cortex to represent space in the world. In the
early 1990s, researchers showed that visualizing objects typically activates these areas, and increasing the size of a visual mental image activates portions of this cortex that reg­
ister increasingly larger sizes in perception. Moreover, in the late 1990s researchers
showed that temporarily impairing these areas using TMS hampers imagery and percep­
tion to the same degree. Hence, these brain-based findings provided clear evidence that
visual mental images are, indeed, analogous to visual percepts in that both represent
space in the world by using space in a representation.

We have written as if both debates—about memory systems and mental imagery representation—are now definitively closed. But this is a simplification; not everyone is con­
vinced of one or another view. Our crucial point is that the advent of neuroscientific data
has shifted the terms of the debate. When only behavioral data were available, in both
cases the two alternative positions seemed equally plausible—but after the relevant neu­
roscientific data were introduced, the burden of proof shifted dramatically to one side—
and a clear consensus emerged in the field (e.g., see Reisberg, Pearson, & Kosslyn, 2003).

In the years since these debates, evidence from cognitive neuroscience has constrained
theories of a wide range of phenomena. Many such examples are chronicled in this Hand­
book.

Overview of the Handbook


Cognitive neuroscience in the new millennium is a broad and diverse field, defined by a
multileveled integrative approach. To provide a systematic overview of this field, we’ve di­
vided this Handbook into two volumes.

Volume 1

The first volume surveys classic areas of interest in cognitive neuroscience: perception,
attention, memory, and language. Twenty years ago when Kevin Ochsner was a graduate
student and Stephen Kosslyn was one of his professors, research on these topics formed
the backbone of cognitive neuroscience research. And this is still true today, for two rea­
sons.

First, when cognitive neuroscience took off, these were the areas of research within psy­
chology that had the most highly developed behavioral, psychological, and neuropsycho­
logical (i.e., brain-damaged patient based) models in place. And in the case of research on
perception, attention, and memory, these were topics for which fairly detailed models of
the underlying neural circuitry already had been developed on the basis of rodent and
nonhuman primate studies. As such, these areas were poised to benefit from the use of
brain-based techniques in humans.

Second, research on the representations and processes used in perception, attention, memory, and language in many ways forms a foundation for studying other kinds of com­
plex behaviors, which are the focus of the second volume. This is true both in terms of the findings themselves and in terms of the evidence such findings provided that the cogni­
tive neuroscience approach could be successful.

With this in mind, each of the four sections in Volume 1 includes a selection of chapters
that cover core processes and the ways in which they develop across the lifespan and may
break down in special populations.

The first section, on perception, includes chapters on the abilities to represent and recog­
nize objects and spatial relations. In addition, this section contains chapters on the use of
top-down processes in visual perception and on the ways in which such processes enable
us to construct and use mental images. We also include chapters on perceptual abilities
that have seen tremendous research growth in the past 5 to 10 years, such as the study of olfaction, audition, and music perception. Finally, there is a chapter on disorders
of perception.

The second section, on attention, includes chapters on the abilities to attend to auditory
and spatial information as well as on the relationships between attention, action, and vi­
sual motor control. These are followed by chapters on the development of attention and
its breakdown in various disorders.

The third section, on memory, includes chapters on the abilities to maintain information
in working memory as well as semantic memory, episodic memory, and the consolidation
process that governs the transfer of information from working to semantic and episodic
memory. There is also a chapter on the ability to acquire skills, which depends on differ­
ent systems than those used in other forms of memory, as well as chapters on changes in
memory function with older age and the ways in which memorial processes break down in
various disorders.

Finally, the fourth section, on language, includes chapters on abilities such as speech per­
ception and production, the distinction between linguistic (p. 6) competence and perfor­
mance, semantics, the capacity for written language, and multimodal and developmental
aspects of speech perception.

Volume 2

Whereas Volume 1 addresses the classics of cognitive neuroscience, Volume 2 focuses on the “new wave” of research that has developed primarily in the past 10 years. As noted
earlier, in many ways the success of these relatively newer research directions builds on
the successes of research in the classic domains. Indeed, our knowledge of the systems
implicated in perception, attention, memory, and language literally—and in this Handbook
—provided the foundation for the work described in Volume 2.

The first section, on emotion, begins with processes involved in interactions between
emotion, perception, and attention, as well as the generation and regulation of emotion.
This is followed by chapters that provide models for understanding broadly how emotion
affects cognition as well as the contribution that bodily sensation and control make to affective and other processes. This section concludes with chapters on genetic and develop­
mental approaches to emotion.

The second section, on self and social cognition, begins with a chapter on the processes
that give rise to the fundamental ability to know and understand oneself. This is followed
by chapters on increasingly complex abilities involved in perceiving others, starting with
the perception of nonverbal cues and perception–action links, and from there ranging to
face recognition, impression formation, drawing inferences about others’ mental states,
empathy, and social interaction. This section concludes with a chapter on the develop­
ment of social cognitive abilities.

The third section, on higher cognitive functions, surveys abilities that largely depend on
processes in the frontal lobes of the brain, which interact with the kinds of core perceptu­
al, attentional, and memorial processes described in Volume 1. Here, we include chapters
on conflict monitoring and cognitive control, the hierarchical control of action, thinking,
decision making, categorization, expectancies, numerical cognition, and neuromodulatory
influences on higher cognitive abilities.

Finally, in the fourth section, four chapters illustrate how disruptions of the mechanisms
of cognition and emotion produce abnormal functioning in clinical populations. This sec­
tion begins with a chapter on attention deficit-hyperactivity disorder and from there
moves to chapters on anxiety, post-traumatic stress disorder, and obsessive-compulsive
disorder.

Summary
Before moving from the appetizer to the main course, we offer two last thoughts.

First, we edited this Handbook with the goal of providing a broad-reaching compendium of research on cognitive neuroscience that will be accessible to a wide audience.
Toward this end, the chapters included in this Handbook are available online to be down­
loaded individually. This is the first time that chapters of a Handbook of this sort have
been made available in this way, and we hope this facilitates access to and dissemination
of some of cognitive neuroscience’s greatest hits.

Second, we hope that, whether you are a student, an advanced researcher, or an interest­
ed layperson, this Handbook whets your appetite for learning more about this exciting
and growing field. Although reading survey chapters of the sort provided here is an excel­
lent way to become oriented in the field and to start building your knowledge of the top­
ics that interest you most, we encourage you to take your interests to the next level:
Delve into the primary research articles cited in these chapters—and perhaps even get in­
volved in doing this sort of research!

References
Anderson, J. R. (1978). Arguments concerning representations for mental imagery. Psy­
chological Review, 85, 249–277.

Belliveau, J. W., Kwong, K. K., Kennedy, D. N., Baker, J. R., Stern, C. E., et al. (1992). Mag­
netic resonance imaging mapping of brain function: Human visual cortex. Investigative
Radiology, 27 (Suppl 2), S59–S65.

Caramazza, A. (1992). Is cognitive neuropsychology possible? Journal of Cognitive Neuroscience, 4, 80–95.

Churchland, P. S., & Sejnowski, T. J. (1988). Perspectives on cognitive neuroscience. Science, 242, 741–745.

Corkin, S. (2002). What’s new with the amnesic patient H.M.? Nature Reviews, Neuro­
science, 3, 153–160.

Fisher, H. E., Aron, A., Mashek, D., Li, H., & Brown, L. L. (2002). Defining the brain sys­
tems of lust, romantic attraction, and attachment. Archives of Sexual Behavior, 31, 413–
419.

Gardner, H. (1985). The mind’s new science: A history of the cognitive revolution. New
York: Basic Books.

Kihlstrom, J. F. (2012). Social neuroscience: The footprints of Phineas Gage. Social Cogni­
tion, 28, 757–782.

Kosslyn, S. M., & Intriligator, J. I. (1992). Is cognitive neuropsychology plausible? The per­
ils of sitting on a one-legged stool. Journal of Cognitive Neuroscience, 4, 96–105.

Kosslyn, S. M., & Maljkovic, V. M. (1990). Marr’s metatheory revisited. Concepts in Neu­
roscience, 1, 239–251.

Kosslyn, S. M., Thompson, W. L., & Ganis, G. (2006). The case for mental imagery. New
York: Oxford University Press.

(p. 7) Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. San Francisco: W. H. Freeman.

Mitchell, J. P. (2006). Mentalizing and Marr: An information processing approach to the study of social cognition. Brain Research, 1079, 66–75.

Ochsner, K. (2007). Social cognitive neuroscience: Historical development, core principles, and future promise. In A. Kruglanski & E. T. Higgins (Eds.), Social psychology: A handbook of basic principles (pp. 39–66). New York: Guilford Press.

Ochsner, K. N., & Kosslyn, S. M. (1999). The cognitive neuroscience approach. In B. M.
Bly & D. E. Rumelhart (Eds.), Cognitive science (pp. 319–365). San Diego, CA: Academic
Press.

Ochsner, K. N., & Lieberman, M. D. (2001). The emergence of social cognitive neuro­
science. American Psychologist, 56, 717–734.

Petersen, S. E., Fox, P. T., Posner, M. I., Mintun, M., & Raichle, M. E. (1988). Positron
emission tomographic studies of the cortical anatomy of single-word processing. Nature,
331, 585–589.

Poldrack, R. A. (2006). Can cognitive processes be inferred from neuroimaging data? Trends in Cognitive Sciences, 10, 59–63.

Poldrack, R. A. (2008). The role of fMRI in cognitive neuroscience: Where do we stand? Current Opinion in Neurobiology, 18, 223–227.

Poldrack, R. A. (2011). Inferring mental states from neuroimaging data: From reverse in­
ference to large-scale decoding. Neuron, 72, 692–697.

Posner, M. I., & DiGirolamo, G. J. (2000). Cognitive neuroscience: Origins and promise.
Psychological Bulletin, 126, 873–889.

Reisberg, D., Pearson, D. G., & Kosslyn, S. M. (2003). Intuitions and introspections about
imagery: The role of imagery experience in shaping an investigator’s theoretical views.
Applied Cognitive Psychology, 17, 147–160.

Schacter, D. L., & Tulving, E. (Eds.). (1994). Memory systems 1994. Cambridge, MA: MIT Press.

Van Horn, J. D., & Poldrack, R. A. (2009). Functional MRI at the crossroads. International
Journal of Psychophysiology, 73, 3–9.

Yarkoni, T., Poldrack, R. A., Van Essen, D. C., & Wager, T. D. (2010). Cognitive neuro­
science 2.0: Building a cumulative science of human brain function. Trends in Cognitive
Sciences, 14, 489–496.

Kevin N. Ochsner

Kevin N. Ochsner is a professor in the Department of Psychology at Columbia University in New York, NY.

Stephen Kosslyn

Stephen M. Kosslyn, Center for Advanced Study in the Behavioral Sciences, Stanford
University, Stanford, CA


Representation of Objects  
Kalanit Grill-Spector
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn

Print Publication Date: Dec 2013 Subject: Psychology, Cognitive Neuroscience


Online Publication Date: Dec 2013 DOI: 10.1093/oxfordhb/9780199988693.013.0002

Abstract and Keywords

Functional magnetic resonance imaging (fMRI) has enabled neuroscientists and psycholo­
gists to understand the neural bases of object recognition in humans. This chapter re­
views fMRI research that yielded important insights about the nature of object represen­
tations in the human brain. Combining fMRI with psychophysics may offer clues about
what kind of visual processing is implemented in distinct cortical regions. This chapter
explores how fMRI has influenced current understanding of object representations by fo­
cusing on two aspects of object representation: how the underlying representations pro­
vide for invariant object recognition and how category information is represented in the
ventral stream. It first provides a brief introduction of the functional organization of the
human ventral stream and a definition of object-selective cortex before describing cue-in­
variant responses in the lateral occipital complex (LOC), neural bases of invariant object
recognition, object and position information in the LOC, and viewpoint sensitivity across
the LOC. The chapter concludes by commenting on debates about the nature of functional
organization in the human ventral stream.

Keywords: functional magnetic resonance imaging, object recognition, psychophysics, brain, object
representation, category information, ventral stream, object-selective cortex, lateral occipital complex, functional
organization

Introduction
Humans can effortlessly recognize objects in a fraction of a second despite large variabili­
ty in the appearance of objects (Thorpe et al., 1996). What are the underlying representa­
tions and computations that enable this remarkable human ability? One way to answer
these questions is to investigate the neural mechanisms of object recognition in the hu­
man brain. With the advent of functional magnetic resonance imaging (fMRI) about 20
years ago, neuroscientists and psychologists began to examine the neural bases of object
recognition in humans. fMRI is an attractive method because it is a noninvasive tech­
nique that allows multiple measurements of brain activation in the same awake behaving
human. Among noninvasive techniques, it provides the best spatial resolution currently available, enabling us to localize cortical activations at a spatial resolution of millimeters (as fine as 1 mm) and at a reasonable time scale (on the order of seconds).

Before the advent of fMRI, knowledge about the function of the ventral stream was based
on single-unit electrophysiology measurements in monkeys and on lesion studies. These
studies showed that neurons in the monkey inferotemporal (IT) cortex respond to shapes
(Fujita et al., 1992) and complex objects such as faces (Desimone et al., 1984), and that
lesions to the ventral stream can produce specific deficits in object recognition such as
agnosia (inability to recognize objects) and prosopagnosia (inability to recognize faces;
see Farah, 1995). However, interpreting lesion data is complicated because lesions are
typically diffuse (usually more than one region is damaged), may disrupt both a cortical
region and its connectivity, (p. 12) and are not replicable across patients. Therefore, the
primary knowledge gained from fMRI research was which cortical sites in the normal hu­
man brain are involved in object recognition. The first set of fMRI studies of object and
face recognition in humans identified the regions in the human brain that respond selec­
tivity to objects and faces (Malach et al., 1995; Kanwisher et al., 1997; Grill-Spector et al.,
1998b). Next, a series of studies demonstrated that activation in object- and face-selec­
tive regions correlates with success at recognizing object and faces, respectively, provid­
ing striking evidence for the involvement of these regions in recognition (Bar et al., 2001;
Grill-Spector et al., 2000, 2004). After researchers determined which regions in cortex
are involved in object recognition, their focus shifted to examining the nature of repre­
sentations and computations that are implemented in these regions to understand how
they enable efficient object recognition in humans.

In this chapter, I review fMRI research that provided important knowledge about the na­
ture of object representations in the human brain. For example, one of the fundamental
problems in recognition is how to recognize an object across variations in its appearance
(invariant object recognition). Understanding how a biological system has solved this
problem may give hints for how to build a robust artificial recognition system. Further,
fMRI is better suited to measuring object representations than to tracking the temporal sequence of computations en route to object recognition, because the time scale of fMRI measurements is longer than that of the recognition process (the temporal resolution of fMRI is on the order of seconds, whereas object recognition takes about 100 to 250 ms).
Nevertheless, combining psychophysics with fMRI may give us some clues to the kinds of
visual processing implemented in distinct cortical regions. For example, finding regions
whose activation is correlated with success at some tasks, but not others, may suggest
the involvement of particular cortical regions in one computation, but not another.

In discussing how fMRI has affected our current understanding of object representations,
I focus on results pertaining to two aspects of object representation:

• How do the underlying representations provide for invariant object recognition?

• How is category information represented in the ventral stream?

I have chosen these topics for three reasons: (1) they are central topics in the field of object
recognition for which fMRI has substantially advanced our understanding, (2) some find­
ings related to these topics stirred considerable debate (see the later section, Debates
about the Nature of Functional Organization in the Human Ventral Stream), and (3) some
of the fMRI findings in humans are surprising given prior knowledge from single-unit
electrophysiology in monkeys. In terms of the chapter’s organization, I begin with a brief
introduction of the functional organization of the human ventral stream and a definition
of object-selective cortex, and then describe research that elucidated the properties of
these regions with respect to basic coding principles. I continue with findings related to
invariant object recognition, and then end with research and theories regarding category
representation and specialization in the human ventral stream.

Functional Organization of the Human Ventral Stream
The first set of fMRI studies on object and face recognition in humans was devoted to
identifying the regions in the brain that are object and face selective. Electrophysiology
research in monkeys suggested that neurons in higher level regions respond to shapes
and objects more than simple stimuli such as lines, edges, and patterns (Desimone et al.,
1984; Fujita et al., 1992; Logothetis et al., 1995). Based on these findings, fMRI studies
measured brain activation when people viewed pictures of objects, as opposed to when
people viewed scrambled objects (i.e., pictures that have the same local information and
statistics, but do not contain an object) or texture patterns (e.g., checkerboards, which
are robust visual stimuli, but do not elicit a percept of a global form). These studies found
a constellation of regions in the lateral occipital cortex termed the lateral occipital complex (LOC), extending from the lateral occipital sulcus, posterior to the human middle temporal (hMT+) region, ventrally to the occipito-temporal sulcus (OTS) and the fusiform gyrus (Fus), that respond more to objects than to controls. The LOC is located lateral and anterior to early visual areas (Grill-Spector et al., 1998a, 1998b) and is typically divided into two
subregions: LO, a region in lateral occipital cortex adjacent and posterior to the hMT+ re­
gion; and pFus/OTS, a ventral region overlapping the OTS and the posterior fusiform
gyrus (pFus) (Figure 2.1). More recent experiments indicate that the posterior subregion
(LO) overlaps a visual field map representation between V3a and hMT+ called LO2
(Sayres & Grill-Spector, 2008).

Figure 2.1 Object-, face- and place-selective cortex. (a) Data of one representative subject shown on her partially inflated right hemisphere. Left: lateral view. Right: ventral view. Dark gray: sulci. Light gray: gyri. White lines delineate retinotopic regions. Blue: object-selective regions (objects > scrambled objects), including LO and pFus/OTS ventrally as well as dorsal foci along the intraparietal sulcus (IPS). Red: face-selective regions (faces > objects, body parts, places & words), including two regions in the fusiform gyrus (FFA-1, FFA-2), a region in the inferior occipital gyrus (IOG), and two regions in the posterior superior temporal sulcus (STS). Magenta: overlap between face- and object-selective regions. Green: place-selective regions (places > faces, body parts, objects, and words), including the PPA and a dorsal region lateral to the IPS. Yellow: body-part-selective regions (bodies > other categories). Black: visual word form area (VWFA), words > other categories. All maps thresholded at p < 0.001, voxel level. (b) LO and pFus (but not V1) responses are correlated with recognition performance (Ungerleider et al., 1983; Grill-Spector et al., 2000). To superimpose recognition performance and fMRI signals on the same plot, all values were normalized relative to the maximal response for the 500-ms duration stimulus.

(p. 13)

The LOC responds robustly to many kinds of objects and object categories (including nov­
el objects) and is thought to be in the intermediate or high-level stages of the visual hier­
archy. Importantly, LOC activations are correlated with subjects’ object recognition per­
formance. High LOC responses correlate with successful object recognition (hits), and
low LOC responses correlate with trials in which objects are present, but are not recog­
nized (misses) (see Figure 2.1b). There are also object-selective regions in the dorsal
stream (Grill-Spector, 2003; Grill-Spector & Malach, 2004), but these regions do not cor­
relate with object recognition performance (Fang & He, 2005) and may be involved in
computations related to visually guided actions toward objects (Culham et al., 2003).
However, a comprehensive discussion of the dorsal stream’s role in object perception is
beyond the scope of this chapter.

In addition to the LOC, researchers found several ventral regions that show preferential
responses to specific object categories. Searching for regions with categorical preference
was motivated by reports that suggested that lesions to the ventral stream can produce
very specific deficits, such as the inability to recognize faces or the inability to read
words, whereas other visual (and recognition) faculties are preserved. By contrasting ac­
tivations to different kinds of objects, researchers found ventral regions that show higher
responses to specific object categories, such as lateral fusiform regions that respond
more to animals than tools and medial fusiform regions that respond to tools more than
animals (Chao et al., 1999; Martin et al., 1996); a region in the left OTS that responds
more strongly to letters than textures (the visual word form area [VWFA]; Cohen et al.,
2000); several foci that respond more strongly to faces than other objects (Grill-Spector
et al., 2004; Haxby et al., 2000; Hoffman & Haxby, 2000; Kanwisher et al., 1997; Weiner &
Grill-Spector, 2012), including the fusiform face areas (FFA-1, FFA-2; Kanwisher et al.,
1997; Weiner & Grill-Spector, 2010); regions that respond more strongly to houses and
places than faces and objects, including a region in the parahippocampal gyrus, the
parahippocampal place area (PPA; Epstein & Kanwisher, 1998); and regions that respond
more strongly to body parts than faces and objects, including a region near the MT called
the extrastriate body area (EBA; Downing et al., 2001); and a region in the fusiform gyrus,
the fusiform body area (FBA; Schwarzlose et al., 2005, or OTS-limbs, Weiner and Grill-
Spector, 2011). Nevertheless, many of these object-selective and category-selective re­
gions respond to more than one object category and also respond strongly to object frag­
ments (Grill-Spector et al., 1998b; Lerner et al., 2001, 2008). This suggests that caution is needed when interpreting the nature of the selective responses: the underlying representation may be of object parts, features, and/or fragments rather than of whole objects or object categories.

Findings of category-selective regions in the human brain initiated a fierce debate about
the (p. 14) principles of functional organization in the ventral stream. Should one consider
only the maximal responses to the preferred category, or do the nonmaximal responses
also carry information? How abstract is the information represented in these regions? For
example, is category information represented in these regions, or are low-level visual fea­
tures that are associated with categories represented? I address these questions in detail in the later section, Debates about the Nature of Functional Organization in the Human Ventral Stream.

Page 5 of 29
Representation of Objects

Cue-Invariant Responses in the Lateral Occipital Complex
Although findings of object-selective responses in the human brain were suggestive of the
involvement of these regions in processing objects, there are many differences between
objects and scrambled objects (or objects and texture patterns). Objects have a shape, surfaces, and contours; they are associated with meaning and semantic information; and they are generally more interesting than texture patterns. Each of these factors may contribute to the higher fMRI response to objects than to controls. Further, differences in low-level visual properties between objects and controls may drive differences in response amplitudes.

Figure 2.2 Selective responses to objects across multiple visual cues across the lateral occipital complex. Statistical maps of selective responses to objects from luminance, stereo, and motion information in a representative subject. All maps were thresholded at p < 0.005, voxel level, and are shown on the inflated right hemisphere of a representative subject. (a) Luminance objects > scrambled luminance objects. (b) Objects generated from random dot stereograms vs. structureless random dot stereograms (perceived as a cloud of dots). (c) Objects generated from dot motion vs. the same dots moving randomly. Visual meridians are represented by the red (upper), blue (horizontal), and green (lower) lines. White contour: motion-selective region, MT. (Adapted from Vinberg & Grill-Spector, 2008.)

Converging evidence from several studies revealed an important aspect of coding in the
LOC: it responds to object shape, not low-level visual features. These studies showed that
all LOC subregions (LO and pFus/OTS) respond more strongly when subjects view objects
independently of the type of visual information that defines the object form (Gilaie-Dotan
et al., 2002; Grill-Spector et al., 1998a; Kastner et al., 2000; Kourtzi & Kanwisher, 2000,
2001; Vinberg & Grill-Spector, 2008) (Figure 2.2). The LOC responds more strongly to (1)
objects defined by luminance compared with luminance textures, (2) objects generated
from random dot stereograms compared with structureless random dot stereograms, (3)
objects generated from structure from motion relative to random (structureless) motion,
and (4) objects generated from textures compared with texture patterns. LOC response to
objects is also similar across object format (gray-scale, line drawings, silhouettes), and it
responds to objects delineated by both real and illusory contours (Mendola et al., 1999;
Stanley & Rubin, 2003). Kourtzi and Kanwisher (2001) also showed that when objects
have the same shape but different contours, there is fMRI adaptation (fMRI-A, indicating
a common neural substrate), but there is no fMRI-A when the shared contours were iden­
tical but the perceived shape was different, suggesting that the LOC responds to global
shape rather than local contours (see also Kourtzi et al., 2003; Lerner et al., 2002). Over­
all, these studies provided fundamental knowledge showing that LOC activation is driven
by shape rather than low-level visual information.

More recently, we examined whether LOC response to objects is driven by their global
shape or their surfaces and whether LOC subregions are sensitive to border ownership.
One open question in object recognition is whether the region in the image that belongs
to the object is first segmented from the rest of the image (figure–ground segmentation)
and then recognized, or whether knowing the shape of an object aids its segmentation
(Nakayama et al., 1995; Peterson & Gibson, 1994a, 1994b). To address these questions,
we scanned subjects when they viewed stimuli that were matched for their low-level in­
formation (p. 15) but generated different percepts. Conditions included: (1) a flat object in
front of a flat background object, (2) a flat surface with a shaped hole (same shape as the
object) in front of a flat background, (3) two flat surfaces without shapes, (4) local edges
(created by scrambling the object contour) in front of a background, or (5) random dot
stimuli with no structure (Vinberg & Grill-Spector, 2008) (Figure 2.3a). Note that condi­
tions 1 and 2 both contain a shape, but only condition 1 contains an object. We repeated
the experiment twice, once with random dots that were presented stereoscopically and
once with random dots that moved, to determine whether the pattern of results varied
across stereo and motion cues. We found that LOC responses (both LO and pFus/OTS)
were higher for objects and shaped holes than for surfaces, local edges, or random stim­
uli (see Figure 2.3b). We observed these results for both motion and stereo cues. In con­
trast, LOC responses were not higher for surfaces than for random stimuli and were not
higher for local edges than for random stimuli. Thus, adding either local edge information
or global surface information does not increase LOC response. However, adding a global
shape produces a significant increase in LOC response. These results provide clear evi­
dence that cue-invariant responses in the LOC are driven by object shape, rather than by
global surface information or local edge information.

Additional studies revealed that the LOC is also sensitive to border ownership (Appel­
baum et al., 2006; Vinberg & Grill-Spector, 2008). Specifically, LO and pFus/OTS respons­
es were higher for objects (shapes presented in the foreground) than for the same shapes
when they defined holes in the foreground. Since objects and holes had the same shape,
the only difference between the objects and the holes was the border ownership of the
contour defining the shape. In the former case, the border belongs to the object, and in
the latter case, it belongs to the flat surface in which the hole is punched. Interestingly,
this higher response to objects than holes was a unique characteristic of LOC subregions
and did not occur in other visual regions (see Figure 2.3). This result suggests that LOC prefers shapes (and contours) when they define the figure region. One implication of this result is that the same cortical machinery determines what the object is in the visual input as well as which region of the visual input is the figure region corresponding to the object.

Neural Bases of Invariant Object Recognition

Figure 2.3 Responses to shape, edges, and surfaces across the ventral stream. (a) Schematic illustration of experimental conditions. Stimuli were generated from either motion or stereo information alone and had no luminance edges or surfaces (except for the screen border, which was present during the entire experiment, including blank baseline blocks). For illustration purposes, darker regions indicate front surfaces. From left to right: Object on the front surface in front of a flat background plane. Shaped hole on the front surface in front of a flat background. Disconnected edges in front of a flat background; edges were generated by scrambling the shape contours. Surfaces: two semitransparent flat surfaces at different depths. Random stimuli with no coherent structure, edges, global surfaces, or global shape. Random stimuli had the same relative disparity or depth range as other conditions. See examples of stimuli: http://www-psych.stanford.edu/~kalanit/jnpstim/. (b) Responses to objects, holes, edges, and global surfaces across the visual ventral processing hierarchy. Responses: mean ± SEM across eight subjects. O: object; H: hole; S: surfaces; E: edges; R: random. Diamonds: significantly different than random at p < 0.05. (Adapted with permission from Vinberg & Grill-Spector, 2008.)

The literature reviewed so far provides accumulating evidence that LOC is involved in
processing object form. The next question that one may ask, given the role of the LOC in
object perception, (p. 16) is: How does it deal with the variability in objects’ appearance?
There are many factors that can affect the appearance of objects. Changes in object ap­
pearance can occur as a result of the object being at different locations relative to the ob­
server, which will affect the retinal projection of the object in terms of its size and posi­
tion. Also, the two-dimensional (2D) projection of a three-dimensional (3D) object on the
retina varies considerably owing to changes in its rotation and viewpoint relative to the
observer. Other changes in appearance occur because of differential illumination conditions, which affect the object’s color, contrast, and shadowing. Nevertheless, humans are
able to recognize objects across large changes in their appearance, which is referred to
as invariant object recognition.

A central topic of research in the study of object recognition is understanding how invari­
ant recognition is accomplished. One view suggests that invariant object recognition is
accomplished because the underlying neural representations are invariant to the appear­
ance of objects. Thus, there will be similar neural responses even when the appearance of
an object changes considerably. One means by which this can be achieved is by extracting
from the visual input features or fundamental elements (such as geons; Biederman, 1987)
that are relatively insensitive to changes in objects’ appearance. According to one influen­
tial model (the recognition by components [RBC] model; Biederman, 1987), objects are
represented by a library of geons (that are easy to detect in many viewing conditions) and
their spatial relations. Other theories suggest that invariance may be generated through
a sequence of computations across a hierarchically organized processing stream in which
the level of sensitivity to object transformation decreases from one level of processing to
the next. For example, at the lowest level, neurons code local features, and in higher lev­
els of the processing stream, neurons respond to more complex shapes and are less sensi­
tive to changes in position and size (Riesenhuber & Poggio, 1999).

Neuroimaging studies of invariant object recognition found differential sensitivity across the ventral stream to object transformations such as size, position, illumination, and view­
point. Intermediate regions such as LO show higher sensitivity to image transformations
than higher level regions such as pFus/OTS. Notably, accumulating evidence from many
studies suggests that at no point in the ventral stream are neural representations entirely
invariant to object transformations. These results support an account in which invariant
recognition is supported by a pooled response across neural populations that are sensi­
tive to object transformations. One way in which this can be accomplished is by a neural
code that contains independent sensitivity to object information and object transforma­
tion (DiCarlo & Cox, 2007). For example, neurons may be sensitive to both object catego­
ry and object position. As long as the categorical preference is retained across object
transformations, invariant object information can be extracted.

Object and Position Information in the Lateral Occipital Complex
Of the object transformations that the recognition system needs to overcome, size and po­
sition invariance are thought to be accomplished in part by an increase in the size of
neural receptive fields along the visual hierarchy. That is, as one ascends the visual hier­
archy, neurons respond to stimuli across a larger part of the visual field. At the same
time, a more complex visual stimulus is necessary to elicit significant responses in neu­
rons (e.g., shapes instead of oriented lines). Findings from electrophysiology suggest that
even at the highest stages of the visual hierarchy, neurons retain some sensitivity to ob­
ject location and size, although electrophysiology reports vary significantly about the degree of position sensitivity of IT neurons (DiCarlo & Maunsell, 2003; Op de Beeck & Vogels, 2000; Rolls, 2000). A related issue is whether position sensitivity of neurons in high­
er visual areas manifests as an orderly, topographic representation of the visual field.

Several studies documented sensitivity to both eccentricity and polar angle in distinct
ventral stream regions. Both object-selective and category-selective regions in the ventral
stream respond to objects presented at multiple positions and sizes. However, the ampli­
tude of response to an object varies across different retinal positions. The LO, pFus/OTS,
and category-selective regions (e.g. FFA, PPA) respond more strongly to objects present­
ed in the contralateral compared with ipsilateral visual field (Grill-Spector et al., 1998b;
Hemond et al., 2007; McKyton & Zohary, 2007). Some regions (LO and EBA) also respond
more strongly to objects presented in the lower visual field (Sayres & Grill-Spector, 2008;
Schwarzlose et al., 2008). Responses also vary with eccentricity: the FFA and the VWFA
respond more strongly to centrally presented stimuli, and the PPA responds more strong­
ly to peripherally presented stimuli (Hasson et al., 2002, 2003; Levy et al., 2001; Sayres &
Grill-Spector, 2008). Further, Arcaro and (p. 17) colleagues discovered that the PPA contains two visual field maps (Arcaro et al., 2009).

Using fMRI-A, my colleagues and I have shown that the pFus/OTS, but not the LO, ex­
hibits some degree of insensitivity to objects’ size and position (Grill-Spector et al., 1999).
fMRI-A is a method that allows characterization of the sensitivity of neural representa­
tions to stimulus transformations at a subvoxel resolution. fMRI-A is based on findings
from single-unit electrophysiology showing that when objects repeat, there is a stimulus-
specific decrease in IT cells’ response to the repeated image, but not to other object im­
ages (Miller et al., 1991; Sawamura et al., 2006). Similarly, fMRI signals in higher visual
regions show a stimulus-specific reduction (fMRI-A) in response to repetition of identical
object images (Grill-Spector et al., 1999, 2006a; Grill-Spector & Malach, 2001). We
showed that fMRI-A can be used to test the sensitivity of neural responses to object trans­
formation by adapting cortex with a repeated presentation of an identical stimulus and
examining adaptation effects when the stimulus is changed along an object transforma­
tion (e.g., changing its position). If the response remains adapted, it indicates that neu­
rons are insensitive to the change. However, if the response returns to the initial level
(i.e., recovers from adaptation), it indicates sensitivity to the change (Grill-Spector &
Malach, 2001).
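The decision logic of an fMRI-A experiment can be summarized as a simple rule. The sketch below is illustrative only: the function name, the tolerance, and the signal values are hypothetical, not parameters from the studies cited.

```python
def interpret_fmri_a(nonadapted, adapted_same, adapted_transformed, tol=0.1):
    """Classify sensitivity to a stimulus transformation from fMRI-A responses.

    nonadapted:          response to nonrepeated stimuli (e.g., % signal change)
    adapted_same:        response after repeating the identical stimulus
    adapted_transformed: response when the repeated stimulus is changed along
                         a transformation (e.g., its position or size)
    tol:                 arbitrary tolerance for "close to" (hypothetical value)
    """
    if adapted_transformed >= (1 - tol) * nonadapted:
        # Response returned to the initial level: recovery from adaptation
        return "sensitive to transformation"
    if adapted_transformed <= (1 + tol) * adapted_same:
        # Response remains adapted: the neurons treat the stimuli as the same
        return "insensitive to transformation"
    return "partially sensitive"

# Hypothetical values mimicking the pattern described in the text:
# pFus/OTS stays adapted across a position change; LO recovers.
print(interpret_fmri_a(1.0, 0.5, 0.55))  # insensitive to transformation
print(interpret_fmri_a(1.0, 0.5, 0.97))  # sensitive to transformation
```

In practice the comparison is made with statistical tests across subjects rather than a fixed tolerance; the rule above only captures the qualitative inference.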

Using fMRI-A, we found that repeated presentation of the same face or object at the same
position and size produces reduced fMRI activation. This is thought to reflect stimulus-
specific neural adaptation. Presenting the same face or object in different positions in the
visual field or at different sizes also produces fMRI-A in pFus/OTS and FFA, indicating in­
sensitivity to object size and position in the range we tested (Grill-Spector et al., 1999;
see also Vuilleumier et al., 2002). This result is consistent with electrophysiology findings
showing that IT neurons that respond similarly to stimuli at different positions in the visu­
al field also show adaptation when the same object is shown in different positions
(Lueschow et al., 1994). In contrast, the LO recovered from fMRI-A to images of the same
face or object when presented at different sizes or positions. This indicates that the LO is
sensitive to object position and size.

Recently, several groups examined the sensitivity of the distributed response across the
visual stream to object category and object position (Sayres & Grill-Spector, 2008; Sch­
warzlose et al., 2008) and also object identity and object position (Eger et al., 2008).
These studies used multivoxel pattern analyses (MVPA) and classifier methods developed
in machine learning to examine what information is present in the distributed responses
across voxels in a cortical region. The distributed response can carry different informa­
tion from the mean response of a region of interest (ROI) when there is variation across
voxel responses.

To examine sensitivity to position information, several studies examined whether distributed response patterns to the same object category (or object exemplar) are the same (or
different) when the same stimulus is presented in a different position in the visual field.
In MVPA, researchers typically split the data into two independent sets and examine the
cross-correlation between the distributed responses to the same (or different) stimulus in
the same (or different) position across the two datasets. This gives a measure of the sen­
sitivity of distributed responses to object information and position. When responses are
position invariant, there is a high correlation between the distributed responses to the
same object category (or exemplar) at different positions. When responses are sensitive
to position, there is a low correlation between responses to the same object category (or
exemplar) at different positions.
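The split-half correlation logic can be made concrete with a small simulation. Everything below is an illustrative assumption, not an analysis of real data: the voxel count, the additive category-plus-position generative model, and the heavier weighting of position (chosen to mimic a position-sensitive region such as LO).

```python
import numpy as np

rng = np.random.default_rng(0)
n_vox = 500  # hypothetical number of voxels in the ROI

# Assumed generative model: each distributed pattern is a category signal
# plus a position signal plus measurement noise. Position is weighted more
# heavily here so that a position change decorrelates patterns strongly.
cat_sig = {c: rng.normal(0, 1, n_vox) for c in ("animals", "vehicles")}
pos_sig = {p: rng.normal(0, 1, n_vox) for p in ("left", "right")}

def pattern(category, position):
    return cat_sig[category] + 1.5 * pos_sig[position] + rng.normal(0, 0.5, n_vox)

def xcorr(a, b):
    """Correlation between two distributed response patterns."""
    return np.corrcoef(a, b)[0, 1]

# Correlate one "half" of the data with independent halves from the
# same/different category at the same/different position (cf. Figure 2.4).
half1 = pattern("animals", "left")
same_cat_same_pos = xcorr(half1, pattern("animals", "left"))
diff_cat_same_pos = xcorr(half1, pattern("vehicles", "left"))
same_cat_diff_pos = xcorr(half1, pattern("animals", "right"))

# Position sensitivity: changing position decorrelates the patterns more
# than changing category does, under these assumed weights.
assert same_cat_same_pos > diff_cat_same_pos > same_cat_diff_pos
```

Because the category and position contributions are additive (separable) in this toy model, the same correlation logic quantifies each factor independently, which is the analysis strategy behind the position and category effects described next.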

When exemplars from the same object category are shown in the same position in the visual field, LO responses are reliable (or positively correlated). Surprisingly, showing ob­
jects from the same category, but at a different position, significantly reduced the correla­
tion between activation patterns (Figure 2.4, first vs. third bars) and this reduction was
larger than changing the object category in the same position (see Figure 2.4, first vs.
second bar). Importantly, position and category effects were independent because there
were no significant interactions between position and category (all F values < 1.02, all p
values > 0.31). Thus, changing both object category and position produced maximal
decorrelation between distributed responses (see Figure 2.4, fourth bar).


Figure 2.4 Mean cross-correlations between LO distributed responses across two independent halves of the data for the same or different category at the same or different position in the visual field. Position effects: LO response patterns to the same category were substantially more similar if they were presented at the same position versus different positions (first and third bars, p < 10^-7). Category effects: the mean correlation was higher for same-category response patterns than for different-category response patterns when presented in the same retinotopic position (first two bars; p < 10^-4). Error bars indicate SEM across subjects. (Adapted with permission from Sayres & Grill-Spector, 2008.)

Is the position information in the LO a consequence of an orderly retinotopic map (similar to retinotopic organization in lower visual areas)? By measuring retinotopic maps in the
LO using standard traveling wave paradigms (Sayres & Grill-Spector, 2008; Wandell,
1999), we found a continuous mapping of the visual field in the LO in terms of both ec­
centricity and polar angle. This retinotopic map (p. 18) contained an over-representation
of the contralateral and lower visual field (more voxels preferred these visual field posi­
tions than ipsilateral and upper visual fields). Although we did not consistently find a sin­
gle visual field map (a single hemifield or quarterfield representation) in LO, it over­
lapped the visual map named LO2 (Larsson & Heeger, 2006) and extended inferior to it.
This suggests that there is retinotopic information in the LO which explains the position
sensitivity found in the MVPA.

A related recent study examined position sensitivity using pattern analysis more broadly
across the ventral stream, providing additional evidence for a hierarchical organization
across the ventral stream (Schwarzlose et al., 2008). Schwarzlose and colleagues found
that distributed responses to a particular object category (faces, body parts, or scenes)
were similar across positions in ventral temporal regions (e.g., pFus/OTS and FBA) but
changed across positions in occipital regions (e.g., EBA and LO). Thus, accumulating evi­
dence from both fMRI-A and pattern analysis studies suggests a hierarchy of representations in the human ventral stream through which representations become less sensitive to
object position as one ascends the visual hierarchy.

Implications for Theories of Object Recognition

It is important to relate imaging results to the concept of position-invariant representations of objects and object categories. What exactly is implied by the term invariance
depends on the scientific context. In some instances, this term is taken to reflect a neural
representation that is abstracted so as to be independent of viewing conditions. A fully in­
variant representation, in this meaning of the term, is expected to be completely indepen­
dent of retinal position information (Biederman & Cooper, 1991). However, in the context
of studies of visual cortex, the term is more often considered to be a graded phenomenon,
in which neural populations are expected to retain some degree of sensitivity to visual
transformations (like position changes) but in which stimulus selectivity is preserved
across these transformations (DiCarlo & Cox, 2007; Kobatake & Tanaka, 1994; Rolls &
Milward, 2000). In support of this view, a growing literature suggests that maintaining lo­
cal position information within a distributed neural representation may actually aid in­
variant recognition in several ways (DiCarlo & Cox, 2007; Dill & Edelman, 2001; Sayres &
Grill-Spector, 2008). First, maintaining separable information about position and category may allow preserving information about the structural relationships between object
parts (Edelman & Intrator, 2000). Second, separable position and object information may
provide a robust way to generate position invariance by using a population code. Accord­
ing to this model, objects are represented as manifolds in a high dimensional space
spanned by a population of neurons. The separability of position and object information
may allow for fast decisions based on linear computations (e.g., linear discriminant func­
tions) to determine the object identity (or category) across positions (see DiCarlo & Cox,
2007). Finally, separable object and position information may allow concurrent localiza­
tion and recognition of objects, that is, recognizing what the object is and also where it is.
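The population-code argument can be illustrated with a toy simulation: if category and position contribute separable components to a distributed response, a simple linear readout (here a nearest-centroid rule, a special case of a linear discriminant) trained at one position transfers to another. All names and numbers below are hypothetical assumptions, not data.

```python
import numpy as np

rng = np.random.default_rng(1)
n_vox = 300  # hypothetical voxel count

# Assumed separable code: additive category and position components plus noise.
cat_sig = {c: rng.normal(0, 1, n_vox) for c in ("face", "house")}
pos_sig = {p: rng.normal(0, 1, n_vox) for p in ("upper", "lower")}

def resp(category, position):
    return cat_sig[category] + pos_sig[position] + rng.normal(0, 0.3, n_vox)

# Train a nearest-centroid readout for category using ONE position only.
centroids = {
    c: np.mean([resp(c, "upper") for _ in range(20)], axis=0)
    for c in ("face", "house")
}

def decode_category(x):
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

# Test at the OTHER position: because categorical preference is retained
# across the position change, the linear readout transfers.
trials = [(c, resp(c, "lower")) for c in ("face", "house") for _ in range(10)]
accuracy = np.mean([decode_category(x) == c for c, x in trials])
print(accuracy)
```

In this sketch the position component shifts all patterns equally, so category remains linearly separable across positions while position itself stays decodable, which is the sense in which separable codes support both localization and recognition.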

Evidence for Viewpoint Sensitivity Across the Lateral Occipital Complex
Another source of change in object appearance that merits separate consideration is
change across rotation in depth. In contrast to position or size changes, for which invari­
ance may be achieved by a linear transformation, the shape of objects changes with depth
rotation. This is because the visual system (p. 19) receives 2D retinal projections of 3D ob­
jects. Some theories suggest that view-invariant recognition across object rotations or
changes in the observer viewing angle are accomplished by largely view-invariant repre­
sentations of objects (generalized cylinders, Marr, 1980; the RBC model, Biederman,
1987). That is, the underlying neural representations respond similarly to an object
across its views. However, other theories suggest that object representations are view de­
pendent, that is, consist of several 2D views of an object (Bulthoff et al., 1995; Bulthoff &
Edelman, 1992; Edelman & Bulthoff, 1992; Poggio & Edelman, 1990; Tarr & Bulthoff,

1995; Ullman, 1989). Invariant object recognition is accomplished by interpolation across these views (Logothetis et al., 1995; Poggio & Edelman, 1990; Ullman, 1989) or by a dis­
tributed neural code across view-tuned neurons (Perrett et al., 1998).

Single-unit electrophysiology studies in primates indicate that most neurons in monkey IT cortex are view dependent (Desimone et al., 1984; Logothetis et al., 1995; Perrett, 1996;
Vogels & Biederman, 2002; Wang et al., 1996), with a small minority (5–10 percent) of
neurons showing view-invariant responses across object rotations (Booth & Rolls, 1998; Logothetis et al., 1995).

In humans, results vary considerably. Short-lagged fMRI-A experiments, in which the test
stimulus is presented immediately after the adapting stimulus (Grill-Spector et al.,
2006a), suggest that object representations in the lateral occipital complex are view de­
pendent (Fang et al., 2007; Gauthier et al., 2002; Grill-Spector et al., 1999; but see Va­
lyear et al., 2006). However, long-lagged fMRI-A experiments, in which many intervening
stimuli occur between the test and adapting stimulus (Grill-Spector et al., 2006a), have
provided some evidence for view-invariant representations in the ventral LOC, especially
in the left hemisphere (James et al., 2002; Vuilleumier et al., 2002) and the PPA (Epstein
et al., 2008). Also, a recent study showed that the distributed LOC responses to objects
remained stable across 60-degree rotations (Eger et al., 2008). Presently, there is no con­
sensus across experimental findings in the degree to which ventral stream representa­
tions are view dependent or view invariant. These variable results may reflect differences
in the neural representations depending on object category and cortical region, or
methodological differences across studies (e.g., level of object rotation and fMRI-A para­
digm used).

To address these differential findings, in a recent study we used a parametric approach to investigate sensitivity to object rotation and used a computational model to link putative neural tuning to the resultant fMRI measurements (Andresen et al., 2009).
The parametric approach allows a richer characterization of rotation sensitivity because
it measures the degree of sensitivity to rotations rather than characterizing representa­
tions as one of two possible alternatives: “invariant” or “not invariant.” We used fMRI-A
to measure viewpoint sensitivity as a function of the rotation level for two object cate­
gories: animals and vehicles. Overall, we found sensitivity to object rotation in the LOC.
However, there were differences across categories and regions. First, there was higher
sensitivity to vehicle rotation than animal rotation. Rotations of 60 degrees produced a
complete recovery from adaptation for vehicles, but rotations of 120 degrees were neces­
sary to produce recovery from adaptation for animals (Figure 2.5). Second, we found evi­
dence for over-representation of the front view of animals in the right pFus/OTS: its re­
sponses to animals were higher for the front view than the back view (compare black and
gray circles in Figure 2.5b, right). In addition, fMRI-A effects across rotation varied ac­
cording to the adapting view (see Figure 2.5b, right). When adapting with the back view
of animals, we found recovery from adaptation for rotations of 120 degrees or larger, but
when adapting with the front view of animals, there was no significant recovery from
adaptation across rotations. One interpretation is that there is less sensitivity to rotation when adapting with front views than back views of animals. However, subjects’ behav­
ioral performance in a discrimination task across object rotations showed that they are
equally sensitive to rotations (performance decreases with rotation level) whether rota­
tions are relative to the front or back of an animal (Andresen et al., 2009), suggesting
that this interpretation is unlikely. Alternatively, the apparent rotation cross-adaptation
may be due to lower responses for back views of animals. That is, the apparent adapta­
tion across rotation from the front view to the back view is driven by lower responses to
the back view rather than adaptation across 180-degree rotations.

Figure 2.5 LO and pFus/OTS responses during fMRI-A experiments of rotation sensitivity. Each line represents the response after adapting with a front (dashed black) or back (solid gray) view of an object. The nonadapted response is indicated by diamonds (black for front view and gray for back view). The open circles indicate significant adaptation, lower than nonadapted, p < 0.05, paired t-test across subjects. (a) Vehicle data. (b) Animal data. Responses are plotted relative to a blank fixation baseline. Error bars indicate SEM across eight subjects. (Adapted with permission from Andresen, Vinberg, & Grill-Spector, 2009.)

To better characterize the underlying representations and examine which representations


may lead to our observed results, we simulated putative neural responses in a voxel and
predicted the resultant (p. 20) blood oxygen level dependent (BOLD) responses. In the model, each voxel contains a mixture of neural populations, each tuned to a different object view (Andresen et al., 2009) (Figure 2.6). BOLD responses were modeled to be proportional to the sum of responses across all neural populations. We simulated the BOLD responses in fMRI-A experiments. Results of the simula­
tions indicate that two main parameters affected the pattern of fMRI data: (1) the view tuning width of the neural population and (2) the proportion of neurons in a voxel that
prefer a specific object view.
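A model of this kind can be sketched directly. The tuning widths, the adaptation rule (each population suppressed in proportion to its response to the adapting view), and the suppression strength below are illustrative assumptions in the spirit of the simulations, not the exact parameters of Andresen et al. (2009).

```python
import numpy as np

views = np.arange(0, 360, 60)  # six populations, preferred views 60 deg apart

def circ_dist(a, b):
    """Angular distance on the viewing circle (degrees)."""
    d = np.abs(a - b) % 360
    return np.minimum(d, 360 - d)

def pop_response(stim_view, sigma, weights):
    """Response of each view-tuned population (Gaussian tuning over view)."""
    return weights * np.exp(-circ_dist(views, stim_view) ** 2 / (2 * sigma ** 2))

def bold(stim_view, sigma, weights):
    """BOLD modeled as the sum of responses across all populations."""
    return pop_response(stim_view, sigma, weights).sum()

def bold_adapted(adapt_view, test_view, sigma, weights, strength=0.6):
    """Each population is suppressed in proportion to how strongly it
    responded to the adapting view (assumed adaptation rule)."""
    drive = pop_response(adapt_view, sigma, weights)
    suppression = 1 - strength * drive / drive.max()
    return (pop_response(test_view, sigma, weights) * suppression).sum()

uniform = np.ones(6)        # equal proportions of neurons preferring each view
narrow, wide = 30.0, 80.0   # hypothetical tuning widths (degrees)

# Fractional recovery from adaptation after a 60-degree rotation:
rec_narrow = bold_adapted(0, 60, narrow, uniform) / bold(60, narrow, uniform)
rec_wide = bold_adapted(0, 60, wide, uniform) / bold(60, wide, uniform)
# Narrower view tuning recovers from adaptation at smaller rotations.
assert rec_narrow > rec_wide
```

Replacing `uniform` with a weight vector that overrepresents one view raises the nonadapted response to that view and flattens the adaptation profile when adapting with it, the pattern described for front views of animals in pFus/OTS.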

Figure 2.6a shows the response characteristics of a model of a putative voxel containing a mixture of view-dependent neural populations tuned to different object views, in which the distribution of neurons tuned to different object views is uniform. In this model, narrower neural tuning to object view (left) results in recovery from fMRI-A for smaller rotations than wider view tuning (right). Responses to front and back views are identical when there is no adaptation (see Figure 2.6a, diamonds), and the pattern of adaptation as a function of rotation is similar when adapting with the front or back views (see Figure 2.6a). Such a model provides an account of responses to vehicles across object-selective cortex (as measured with fMRI), and for animals in the LO. Thus, this model suggests that the difference between the representation of animals and vehicles in the LO is likely due to a smaller population view tuning for vehicles than for animals (a tuning width of σ < 40° produces complete recovery from adaptation for rotations larger than 60°, as observed for vehicles).

Figure 2.6b shows simulation results when there is a prevalence of neurons tuned to the front view of objects. This simulation shows higher BOLD responses to frontal views without adaptation (gray vs. black diamonds) and a flatter profile of fMRI-A across rotations when adapting with the front view. These simulation results are consistent with our observations in pFus/OTS and indicate that the differential recovery from adaptation as a function of the adapting animal view may be a consequence of a larger neural population tuned to front views of animals.


Implications for Theories of Object Recognition

Figure 2.6 Simulations predicting fMRI responses of putative voxels containing a mixture of view-dependent neural populations. Left: schematic illustration of the view tuning and distribution of neural populations tuned to different views in a voxel. Right: result of model simulations illustrating the predicted fMRI-A data. In all panels, the model includes six Gaussians tuned to specific views around the viewing circle, separated 60° apart. Across columns, the view tuning width varies; across rows, the distribution of neural populations preferring specific views varies. Diamonds, responses without adaptation; black, back view; gray, front view; lines, response after adaptation with a front view (dashed gray line) or back view (solid black line). (a) Mixture of view-dependent neural populations that are equally distributed in a voxel. Narrower tuning (left) shows recovery from fMRI-A for smaller rotations than wider view tuning (right). This model predicts the same pattern of recovery from adaptation when adapting with the front or back view. (b) Mixture of view-dependent neural populations in a voxel with a higher proportion of neurons that prefer the front view. The number on the right indicates the ratio between the percentages of neurons tuned to the front vs. back view. Top row: ratio = 1.2; bottom row: ratio = 1.4. Because there are more neurons tuned to the front view in this model, it predicts higher BOLD responses to frontal views without adaptation (gray vs. black diamonds) and a flatter profile of fMRI-A across rotations when adapting with the front view. (Adapted with permission from Andresen, Vinberg, & Grill-Spector, 2009.)

Overall, recent results provide empirical evidence for view-dependent object representation across human object-selective cortex that is evident both with standard fMRI and fMRI-A measurements. These data provide important empirical constraints for theories of object recognition and highlight the importance of parametric manipulations for capturing neural selectivity to any type of stimulus transformation. The findings also generate new questions. For example, if there is no view-invariant neural representation in the human ventral temporal cortex, how is view-invariant object recognition accomplished? One possibility is that view-invariant recognition is achieved by utilizing a population code across neurons, where each neuron itself is not view invariant, but the responses of the population to views of an object are separable from views of other objects (Perrett et al., 1998; DiCarlo & Cox, 2007). Thus the (p. 21) distributed pattern of responses across neurons separates views of one object from views of another object. Another possibility is that downstream neurons in the anterior temporal lobe or prefrontal cortex read out the information from ventral temporal cortex, and these downstream neurons contain view-invariant representations supporting behavior (Freedman et al., 2003, 2008; Quiroga et al., 2005, 2008).
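The population-code idea can be made concrete with a toy example. In this sketch the unit "templates," the view-dependent gain, and all numbers are hypothetical (not data from any study): no single unit responds invariantly across views, yet a simple linear readout of the whole population separates two objects at every view.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units = 50
views = np.arange(0.0, 360.0, 45.0)

# Hypothetical object "templates": how strongly each unit is driven by each object.
templates = rng.standard_normal((2, n_units))

def population_response(obj, view):
    """Every unit's firing is strongly view modulated (gain varies ~3x with
    view), so no single unit is view invariant; the *pattern* across units
    is what carries object identity."""
    gain = 0.5 + np.cos(np.deg2rad(view)) ** 2
    return gain * templates[obj] + 0.05 * rng.standard_normal(n_units)

def classify(response):
    """Nearest-template readout by cosine similarity (a linear decision)."""
    sims = templates @ response / (np.linalg.norm(templates, axis=1) * np.linalg.norm(response))
    return int(np.argmax(sims))
```

Because cosine similarity ignores overall response gain, the readout generalizes across all views even though each unit's firing changes roughly threefold between views.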

Debates about the Nature of Functional Organization in the Human Ventral Stream
So far, we have considered general computational principles that are required by any object recognition system. Nevertheless, it is possible that some object classes or domains require specialized computations. The rest of this chapter examines functional specialization in the ventral stream that may be linked to these putative “domain-specific” computations.

As illustrated in Figure 2.1, several regions in the ventral stream exhibit higher responses to particular object categories, such as places, faces, and body parts, compared with other object categories. Findings of category-selective regions initiated a fierce debate about the principles of functional organization in the ventral stream. Are there regions in the cortex that are specialized for (p. 22) any object category? Is there something special about computations relevant to specific categories that generates specialized cortical regions for these computations? That is, perhaps some general processing is applied to all objects, but some computations may be specific to certain domains and may require additional brain resources.

In explaining the pattern of functional selectivity in the ventral stream, four prominent views have emerged. The main debate centers on the question of whether regions that elicit maximal response for a category should be treated as a module for the representation of that category, or whether they are part of a more general object recognition system.

Handful of Category-Specific Modules and a General-Purpose Region for Processing All Other Objects
Kanwisher and coworkers (Kanwisher, 2000; Op de Beeck et al., 2008) suggested that the ventral temporal cortex contains a limited number of modules specialized for the recognition of special object categories such as faces (in the FFA), places (in the PPA), and body parts (in the EBA and FBA). The remaining object-selective cortex (LOC), which shows little selectivity for particular object categories, is a general-purpose mechanism for perceiving any kind of visually presented object or shape. The underlying hypothesis is that there are a few “domain-specific modules” that perform computations specific to these classes of stimuli beyond what would be required from a general object recognition system. For example, faces, like other objects, need to be recognized across variations in their appearance (a domain-general process). However, given the importance of face processing for social interactions, there are aspects of face processing that are unique. Specialized face processing may include identifying faces at the individual level (e.g., John vs. Harry), extracting gender information, gaze, expression, and so forth. These unique face-related computations may be implemented in face-selective regions.

Process Maps
Tarr and Gauthier (2000) proposed that object representations are clustered according to the type of processing that is required, rather than according to their visual attributes. It is possible that different levels of processing may require dedicated computations that are performed in localized cortical regions. For example, faces are usually recognized at the individual level (e.g., “Bob Jacobs”), but many objects are typically recognized at the category level (e.g., “a horse”). Following this reasoning, and evidence that objects of expertise activate the FFA more than other objects (Gauthier et al., 1999, 2000), Gauthier, Tarr, and their colleagues have suggested that the FFA is a region for subordinate identification of any object category that is automated by expertise (Gauthier et al., 1999, 2000; Tarr & Gauthier, 2000).

Distributed Object Form Topography
Haxby et al. (2001) posited an “object form topography” in which occipito-temporal cortex contains a topographically organized representation of shape attributes. The representation of an object is reflected by a distinct pattern of response across all ventral cortex, and this distributed activation produces the visual perception. Haxby et al. showed that the activation patterns for eight object categories were replicable, and that the response to a given category could be determined by the distributed pattern of activation across the ventral temporal cortex. Further, they showed that it is possible to predict what object category subjects viewed even when regions that show maximal activation to a particular category (e.g., the FFA) were excluded (Haxby et al., 2001). Thus, this model suggests that the ventral temporal cortex represents object category information in an overlapping and distributed fashion.
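The logic of this analysis can be sketched as a split-half correlation classifier. The data below are synthetic (random patterns plus noise standing in for voxel responses, with made-up category labels); only the decoding logic is the point.

```python
import numpy as np

rng = np.random.default_rng(1)
n_vox = 100
categories = ["faces", "houses", "chairs"]

# Synthetic stand-in for ventral-temporal data: each category evokes a
# reproducible distributed pattern across voxels, measured twice with noise.
true_pattern = {c: rng.standard_normal(n_vox) for c in categories}
half1 = {c: true_pattern[c] + 0.3 * rng.standard_normal(n_vox) for c in categories}
half2 = {c: true_pattern[c] + 0.3 * rng.standard_normal(n_vox) for c in categories}

def decode(test_pattern, reference):
    """Assign the category whose reference pattern correlates best with the test."""
    return max(reference, key=lambda c: np.corrcoef(test_pattern, reference[c])[0, 1])

# Decoding also survives excluding the voxels most responsive to a category,
# echoing the exclusion analysis (e.g., decoding faces without the most
# face-driven voxels).
keep = np.argsort(-np.abs(true_pattern["faces"]))[20:]  # drop 20 most face-driven voxels
reduced_ref = {c: half1[c][keep] for c in categories}
```

Here `decode(half2["faces"], half1)` recovers the category from one half of the data using the other half as reference, and continues to do so on the reduced voxel set.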

One of the reasons that this view is appealing is that a distributed code is a combinatorial code that allows representation of a large number of object categories. Given Biederman’s rough estimate that humans can recognize about 30,000 categories (Biederman, 1987), this provides a neural substrate with the capacity to represent such a large number of categories. Second, this model posited the provocative view that, when considering information in the ventral stream, one needs to consider the weak signals as much as the strong signals, because both convey useful information.
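A back-of-the-envelope calculation illustrates the capacity claim. Even under the crude assumption that each voxel contributes only one bit (above or below baseline), a handful of voxels already covers Biederman's estimate:

```python
# n voxels, each coarsely binarized, yield 2**n distinct patterns.
n = 15
patterns = 2 ** n  # 32,768 patterns, already more than ~30,000 categories
```

Real voxels carry far more than one bit, so the combinatorial capacity of a distributed code across ventral temporal cortex is vastly larger still.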

Sparsely Distributed Representations of Faces and Body Parts
Recently, using high-resolution fMRI (HR-fMRI), we reported a series of alternating face- and limb-selective activations that are arranged in a consistent spatial organization relative to each other (p. 23) as well as to retinotopic regions and hMT+ (Weiner & Grill-Spector, 2010, 2011, 2013). Specifically, our data illustrate that there is not just one distinct region selective for each category (i.e., a single FFA or FBA) in the ventral temporal cortex, but rather a series of face- and limb-selective clusters that minimally overlap, with a consistent organization relative to one another on a posterior-to-anterior axis on the OTS and fusiform gyrus (FG). Our data also show an interaction between localized cortical clusters and distributed responses across voxels outside these clusters. Our results further illustrate that even in weakly selective voxels outside of these clusters, the distributed responses for faces and limbs are distinct from one another. Nevertheless, there is significantly more face information in the distributed responses of weakly and highly selective voxels compared with nonselective voxels, indicating that weakly and highly selective voxels are more informative than nonselective voxels.

These data suggest a fourth model, a sparsely distributed organization in the ventral temporal cortex, mediating the debate between modular and distributed theories of object representation. Sparsely refers to the presence of several face- and limb-selective clusters with a distinct, minimally overlapping organization, and distributed refers to the presence of information in weakly and nonselective voxels outside of these clusters. This sparsely distributed organization is supported by recent cortical connectivity studies indicating a hybrid modular and distributed organization (Borra et al., 2010; Zangenehpour & Chaudhuri, 2005), as well as by theoretical work on sparse distributed networks (Kanerva, 1988).

Presently, there is no consensus in the field about which account best explains ventral stream functional organization. Much of the debate centers on the degree to which object processing is constrained to discrete modules or involves distributed computations across large stretches of the ventral stream (Op de Beeck et al., 2008). The debate is both about the spatial scale on which computations for object recognition occur and about the fundamental principles that underlie specialization in the ventral stream. On the one hand, domain-specific theories need to address findings of multiple foci that show selectivity. For example, there are multiple foci in the ventral stream that respond more strongly to faces than to objects; thus, a strong modular account of a single “face module” for face recognition is unlikely. Second, the spatial extent of these putative modules is undetermined, and it is unclear whether each of these category-selective regions corresponds to a visual area. On the other hand, a very distributed and overlapping account of object representation in the ventral stream suffers from the potential problem that, in order to resolve category information, the brain may need to read out information present across the entire ventral stream (which is inefficient). Further, the fact that there is information in the distributed response does not mean that the brain uses the information in the same way that an independent classifier does. It is possible that activation in localized regions is more informative for perceptual decisions than the information available across the entire ventral stream (Grill-Spector et al., 2004; Williams et al., 2007). For example, FFA responses predict when subjects recognize faces and birds, but do not predict when subjects recognize houses, guitars, or flowers (Grill-Spector et al., 2004). The sparsely distributed model we recently proposed attempts to bridge between the extreme modular views and the highly distributed and overlapping views of the organization of the ventral temporal cortex. One particular appeal of this view is that it is closely tied to the measurements and allows additional clusters to be incorporated into the model. As scanning resolutions improve for human fMRI studies, the number of clusters is likely to increase, but the alternating nature of face and limb representations in adjacent activations is likely to remain, as also suggested by monkey fMRI (Pinsk et al., 2009).

Open Questions and Future Directions
In sum, neuroimaging research has advanced our understanding of object representations in the human brain. These studies have identified regions involved in object recognition and have laid fundamental stepping stones toward understanding the neural mechanisms underlying invariant object recognition.

However, many questions remain. First, what is the relationship between neural sensitivity to object transformations and behavioral sensitivity to object transformations? Do biases in neural representations produce biases in performance? For example, empirical evidence shows over-representation of the lower visual field in LO. Does this lead to better recognition in the lower than in the upper visual field? Second, what information does the visual system use to build invariant object representations? Third, (p. 24) what computations are implemented in distinct cortical regions involved in object recognition? Does the “aha” moment in recognition involve a specific response in a particular brain region, or does it involve a distributed response across a large cortical expanse? Combining experimental methods such as fMRI and EEG will provide high spatial and temporal resolution, which is critical to addressing this question. Fourth, why do representations of a few categories, such as faces or body parts, yield local clustered activations, whereas many other categories (e.g., manmade objects) produce more diffuse and less intense responses across the ventral temporal cortex? Fifth, what is the pattern of connectivity between ventral stream visual regions in the human brain? Although the connectivity in monkey visual cortex has been extensively explored (Moeller et al., 2008; Van Essen et al., 1990), there is little knowledge about connectivity between cortical visual areas in the human ventral stream. This knowledge is necessary for building a model of hierarchical processing in humans and any neural network model of object recognition. Future directions that combine methodologies, such as psychophysics with fMRI, EEG with fMRI, or diffusion tensor imaging with fMRI, will be instrumental in addressing these fundamental questions.

Acknowledgements
I thank David Andresen, Rory Sayres, Joakim Vinberg, and Kevin Weiner for their contributions to the research summarized in this chapter. This work was supported by an NSF grant and an NEI grant.


References

Andresen, D. R., Vinberg, J., & Grill-Spector, K. (2009). The representation of object viewpoint in the human visual cortex. NeuroImage, 45, 522–536.

Appelbaum, L. G., Wade, A. R., Vildavski, V. Y., Pettet, M. W., & Norcia, A. M. (2006). Cue-invariant networks for figure and background processing in human visual cortex. Journal of Neuroscience, 26, 11695–11708.

Bar, M., Tootell, R. B., Schacter, D. L., Greve, D. N., Fischl, B., Mendola, J. D., Rosen, B. R., & Dale, A. M. (2001). Cortical mechanisms specific to explicit visual object recognition. Neuron, 29, 529–535.

Biederman, I. (1987). Recognition-by-components: A theory of human image understanding. Psychological Review, 94, 115–147.

Biederman, I., & Cooper, E. E. (1991). Evidence for complete translational and reflectional invariance in visual object priming. Perception, 20, 585–593.

Booth, M. C., & Rolls, E. T. (1998). View-invariant representations of familiar objects by neurons in the inferior temporal visual cortex. Cerebral Cortex, 8, 510–523.

Borra, E., Ichinohe, N., Sato, T., Tanifuji, M., & Rockland, K. S. (2010). Cortical connections to area TE in monkey: Hybrid modular and distributed organization. Cerebral Cortex, 20 (2), 257–270.

Bulthoff, H. H., & Edelman, S. (1992). Psychophysical support for a two-dimensional view interpolation theory of object recognition. Proceedings of the National Academy of Sciences U S A, 89, 60–64.

Bulthoff, H. H., Edelman, S. Y., & Tarr, M. J. (1995). How are three-dimensional objects represented in the brain? Cerebral Cortex, 5, 247–260.

Chao, L. L., Haxby, J. V., & Martin, A. (1999). Attribute-based neural substrates in temporal cortex for perceiving and knowing about objects. Nature Neuroscience, 2, 913–919.

Cohen, L., Dehaene, S., Naccache, L., Lehericy, S., Dehaene-Lambertz, G., Henaff, M. A., & Michel, F. (2000). The visual word form area: Spatial and temporal characterization of an initial stage of reading in normal subjects and posterior split-brain patients. Brain, 123 (2), 291–307.

Culham, J. C., Danckert, S. L., DeSouza, J. F., Gati, J. S., Menon, R. S., & Goodale, M. A. (2003). Visually guided grasping produces fMRI activation in dorsal but not ventral stream brain areas. Experimental Brain Research, 153, 180–189.

Desimone, R., Albright, T. D., Gross, C. G., & Bruce, C. (1984). Stimulus-selective properties of inferior temporal neurons in the macaque. Journal of Neuroscience, 4, 2051–2062.

DiCarlo, J. J., & Cox, D. D. (2007). Untangling invariant object recognition. Trends in Cognitive Science, 11, 333–341.

DiCarlo, J. J., & Maunsell, J. H. (2003). Anterior inferotemporal neurons of monkeys engaged in object recognition can be highly sensitive to object retinal position. Journal of Neurophysiology, 89, 3264–3278.

Dill, M., & Edelman, S. (2001). Imperfect invariance to object translation in the discrimination of complex shapes. Perception, 30, 707–724.

Downing, P. E., Jiang, Y., Shuman, M., & Kanwisher, N. (2001). A cortical area selective for visual processing of the human body. Science, 293, 2470–2473.

Edelman, S., & Bulthoff, H. H. (1992). Orientation dependence in the recognition of familiar and novel views of three-dimensional objects. Vision Research, 32, 2385–2400.

Edelman, S., & Intrator, N. (2000). (Coarse coding of shape fragments) + (retinotopy) approximately = representation of structure. Spatial Vision, 13, 255–264.

Eger, E., Ashburner, J., Haynes, J. D., Dolan, R. J., & Rees, G. (2008). fMRI activity patterns in human LOC carry information about object exemplars within category. Journal of Cognitive Neuroscience, 20, 356–370.

Epstein, R., & Kanwisher, N. (1998). A cortical representation of the local visual environment. Nature, 392, 598–601.

Epstein, R. A., Parker, W. E., & Feiler, A. M. (2008). Two kinds of fMRI repetition suppression? Evidence for dissociable neural mechanisms. Journal of Neurophysiology, 99 (6), 2877–2886.

Fang, F., & He, S. (2005). Cortical responses to invisible objects in the human dorsal and ventral pathways. Nature Neuroscience, 8, 1380–1385.

Farah, M. J. (1995). Visual agnosia. Cambridge, MA: MIT Press.

(p. 25) Freedman, D. J., Riesenhuber, M., Poggio, T., & Miller, E. K. (2001). Categorical representation of visual stimuli in the primate prefrontal cortex. Science, 291, 312–316.

Freedman, D. J., Riesenhuber, M., Poggio, T., & Miller, E. K. (2003). A comparison of primate prefrontal and inferior temporal cortices during visual categorization. Journal of Neuroscience, 23, 5235–5246.

Fujita, I., Tanaka, K., Ito, M., & Cheng, K. (1992). Columns for visual features of objects in monkey inferotemporal cortex. Nature, 360, 343–346.

Gauthier, I., Skudlarski, P., Gore, J. C., & Anderson, A. W. (2000). Expertise for cars and birds recruits brain areas involved in face recognition. Nature Neuroscience, 3, 191–197.

Gauthier, I., Tarr, M. J., Anderson, A. W., Skudlarski, P., & Gore, J. C. (1999). Activation of the middle fusiform “face area” increases with expertise in recognizing novel objects. Nature Neuroscience, 2, 568–573.

Gilaie-Dotan, S., Ullman, S., Kushnir, T., & Malach, R. (2002). Shape-selective stereo processing in human object-related visual areas. Human Brain Mapping, 15, 67–79.

Grill-Spector, K. (2003). The neural basis of object perception. Current Opinion in Neurobiology, 13, 159–166.

Grill-Spector, K., Golarai, G., & Gabrieli, J. (2008). Developmental neuroimaging of the human ventral visual cortex. Trends in Cognitive Science, 12, 152–162.

Grill-Spector, K., Henson, R., & Martin, A. (2006a). Repetition and the brain: Neural models of stimulus-specific effects. Trends in Cognitive Science, 10, 14–23.

Grill-Spector, K., Knouf, N., & Kanwisher, N. (2004). The fusiform face area subserves face perception, not generic within-category identification. Nature Neuroscience, 7, 555–562.

Grill-Spector, K., Kushnir, T., Edelman, S., Avidan, G., Itzchak, Y., & Malach, R. (1999). Differential processing of objects under various viewing conditions in the human lateral occipital complex. Neuron, 24, 187–203.

Grill-Spector, K., Kushnir, T., Edelman, S., Itzchak, Y., & Malach, R. (1998a). Cue-invariant activation in object-related areas of the human occipital lobe. Neuron, 21, 191–202.

Grill-Spector, K., Kushnir, T., Hendler, T., Edelman, S., Itzchak, Y., & Malach, R. (1998b). A sequence of object-processing stages revealed by fMRI in the human occipital lobe. Human Brain Mapping, 6, 316–328.

Grill-Spector, K., Kushnir, T., Hendler, T., & Malach, R. (2000). The dynamics of object-selective activation correlate with recognition performance in humans. Nature Neuroscience, 3, 837–843.

Grill-Spector, K., & Malach, R. (2001). fMR-adaptation: A tool for studying the functional properties of human cortical neurons. Acta Psychologica (Amst), 107, 293–321.

Grill-Spector, K., & Malach, R. (2004). The human visual cortex. Annual Review of Neuroscience, 27, 649–677.

Hasson, U., Harel, M., Levy, I., & Malach, R. (2003). Large-scale mirror-symmetry organization of human occipito-temporal object areas. Neuron, 37, 1027–1041.

Hasson, U., Levy, I., Behrmann, M., Hendler, T., & Malach, R. (2002). Eccentricity bias as an organizing principle for human high-order object areas. Neuron, 34, 479–490.

Haxby, J. V., Gobbini, M. I., Furey, M. L., Ishai, A., Schouten, J. L., & Pietrini, P. (2001). Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293, 2425–2430.

Haxby, J. V., Hoffman, E. A., & Gobbini, M. I. (2000). The distributed human neural system for face perception. Trends in Cognitive Science, 4, 223–233.

Hemond, C. C., Kanwisher, N. G., & Op de Beeck, H. P. (2007). A preference for contralateral stimuli in human object- and face-selective cortex. PLoS ONE, 2, e574.

James, T. W., Humphrey, G. K., Gati, J. S., Menon, R. S., & Goodale, M. A. (2002). Differential effects of viewpoint on object-driven activation in dorsal and ventral streams. Neuron, 35, 793–801.

Johnson, M. H. (2001). Functional brain development in humans. Nature Reviews, Neuroscience, 2, 475–483.

Kanwisher, N. (2000). Domain specificity in face perception. Nature Neuroscience, 3, 759–763.

Kanwisher, N., McDermott, J., & Chun, M. M. (1997). The fusiform face area: A module in human extrastriate cortex specialized for face perception. Journal of Neuroscience, 17, 4302–4311.

Kastner, S., De Weerd, P., & Ungerleider, L. G. (2000). Texture segregation in the human visual cortex: A functional MRI study. Journal of Neurophysiology, 83, 2453–2457.

Kobatake, E., & Tanaka, K. (1994). Neuronal selectivities to complex object features in the ventral visual pathway of the macaque cerebral cortex. Journal of Neurophysiology, 71, 856–867.

Kourtzi, Z., & Kanwisher, N. (2000). Cortical regions involved in perceiving object shape. Journal of Neuroscience, 20, 3310–3318.

Kourtzi, Z., & Kanwisher, N. (2001). Representation of perceived object shape by the human lateral occipital complex. Science, 293, 1506–1509.

Kourtzi, Z., Tolias, A. S., Altmann, C. F., Augath, M., & Logothetis, N. K. (2003). Integration of local features into global shapes: Monkey and human fMRI studies. Neuron, 37, 333–346.

Larsson, J., & Heeger, D. J. (2006). Two retinotopic visual areas in human lateral occipital cortex. Journal of Neuroscience, 26, 13128–13142.

Lerner, Y., Epshtein, B., Ullman, S., & Malach, R. (2008). Class information predicts activation by object fragments in human object areas. Journal of Cognitive Neuroscience, 20, 1189–1206.

Lerner, Y., Hendler, T., Ben-Bashat, D., Harel, M., & Malach, R. (2001). A hierarchical axis of object processing stages in the human visual cortex. Cerebral Cortex, 11, 287–297.

Lerner, Y., Hendler, T., & Malach, R. (2002). Object-completion effects in the human lateral occipital complex. Cerebral Cortex, 12, 163–177.

Levy, I., Hasson, U., Avidan, G., Hendler, T., & Malach, R. (2001). Center-periphery organization of human object areas. Nature Neuroscience, 4, 533–539.

Logothetis, N. K., Pauls, J., & Poggio, T. (1995). Shape representation in the inferior temporal cortex of monkeys. Current Biology, 5, 552–563.

Malach, R., Levy, I., & Hasson, U. (2002). The topography of high-order human object areas. Trends in Cognitive Science, 6, 176–184.

Malach, R., Reppas, J. B., Benson, R. R., Kwong, K. K., Jiang, H., Kennedy, W. A., Ledden, P. J., Brady, T. J., Rosen, B. R., & Tootell, R. B. (1995). Object-related activity revealed by functional magnetic resonance imaging in human occipital cortex. Proceedings of the National Academy of Sciences U S A, 92, 8135–8139.

(p. 26) Marr, D. (1980). Visual information processing: The structure and creation of visual representations. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 290, 199–218.

Martin, A., Wiggs, C. L., Ungerleider, L. G., & Haxby, J. V. (1996). Neural correlates of category-specific knowledge. Nature, 379, 649–652.

McKyton, A., & Zohary, E. (2007). Beyond retinotopic mapping: The spatial representation of objects in the human lateral occipital complex. Cerebral Cortex, 17, 1164–1172.

Mendola, J. D., Dale, A. M., Fischl, B., Liu, A. K., & Tootell, R. B. (1999). The representation of illusory and real contours in human cortical visual areas revealed by functional magnetic resonance imaging. Journal of Neuroscience, 19, 8560–8572.

Miller, E. K., Li, L., & Desimone, R. (1991). A neural mechanism for working and recognition memory in inferior temporal cortex. Science, 254, 1377–1379.

Moeller, S., Freiwald, W. A., & Tsao, D. Y. (2008). Patches with links: A unified system for processing faces in the macaque temporal lobe. Science, 320, 1355–1359.

Nakayama, K., He, Z. J., & Shimojo, S. (1995). Visual surface representation: A critical link between low-level and high-level vision. In S. M. Kosslyn & D. N. Osherson (Eds.), An invitation to cognitive sciences: Visual cognition. Cambridge, MA: MIT Press.

Op de Beeck, H. P., Haushofer, J., & Kanwisher, N. G. (2008). Interpreting fMRI data: Maps, modules and dimensions. Nature Reviews, Neuroscience, 9, 123–135.

Op De Beeck, H., & Vogels, R. (2000). Spatial sensitivity of macaque inferior temporal neurons. Journal of Comparative Neurology, 426, 505–518.

Perrett, D. I. (1996). View-dependent coding in the ventral stream and its consequence for recognition. In R. Camaniti, K. P. Hoffmann, & A. J. Lacquaniti (Eds.), Vision and movement mechanisms in the cerebral cortex (pp. 142–151). Strasbourg: HFSP.

Perrett, D. I., Oram, M. W., & Ashbridge, E. (1998). Evidence accumulation in cell populations responsive to faces: An account of generalisation of recognition without mental transformations. Cognition, 67, 111–145.

Peterson, M. A., & Gibson, B. S. (1994a). Must shape recognition follow figure-ground organization? An assumption in peril. Psychological Science, 5, 253–259.

Peterson, M. A., & Gibson, B. S. (1994b). Object recognition contributions to figure-ground organization: Operations on outlines and subjective contours. Perception and Psychophysics, 56, 551–564.

Pinsk, M. A., Arcaro, M., Weiner, K. S., Kalkus, J. F., Inati, S. J., Gross, C. G., & Kastner, S. (2009). Neural representations of faces and body parts in macaque and human cortex: A comparative fMRI study. Journal of Neurophysiology, 101 (5), 2581–2600.

Poggio, T., & Edelman, S. (1990). A network that learns to recognize three-dimensional objects. Nature, 343, 263–266.

Quiroga, R. Q., Mukamel, R., Isham, E. A., Malach, R., & Fried, I. (2008). Human single-neuron responses at the threshold of conscious recognition. Proceedings of the National Academy of Sciences U S A, 105, 3599–3604.

Quiroga, R. Q., Reddy, L., Kreiman, G., Koch, C., & Fried, I. (2005). Invariant visual representation by single neurons in the human brain. Nature, 435, 1102–1107.

Riesenhuber, M., & Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nature Neuroscience, 2, 1019–1025.

Rolls, E. T. (2000). Functions of the primate temporal lobe cortical visual areas in invariant visual object and face recognition. Neuron, 27, 205–218.

Rolls, E. T., & Milward, T. (2000). A model of invariant object recognition in the visual system: Learning rules, activation functions, lateral inhibition, and information-based performance measures. Neural Computation, 12, 2547–2572.

Sawamura, H., Orban, G. A., & Vogels, R. (2006). Selectivity of neuronal adaptation does not match response selectivity: A single-cell study of the fMRI adaptation paradigm. Neuron, 49, 307–318.

Sayres, R., & Grill-Spector, K. (2008). Relating retinotopic and object-selective responses in human lateral occipital cortex. Journal of Neurophysiology, 100 (1), 249–267.

Schwarzlose, R. F., Baker, C. I., & Kanwisher, N. K. (2005). Separate face and body selectivity on the fusiform gyrus. Journal of Neuroscience, 25, 11055–11059.

Schwarzlose, R. F., Swisher, J. D., Dang, S., & Kanwisher, N. (2008). The distribution of
category and location information across object-selective regions in human visual cortex.
Proceedings of the National Academy of Sciences U S A, 105, 4447–4452.

Stanley, D. A., & Rubin, N. (2003). fMRI activation in response to illusory contours and
salient regions in the human lateral occipital complex. Neuron, 37, 323–331.

Tarr, M. J., & Bulthoff, H. H. (1995). Is human object recognition better described by geon
structural descriptions or by multiple views? Comment on Biederman and Gerhardstein
(1993). Journal of Experimental Psychology: Human Perception and Performance, 21,
1494–1505.

Tarr, M. J., & Gauthier, I. (2000). FFA: A flexible fusiform area for subordinate-level visual
processing automatized by expertise. Nature Neuroscience, 3, 764–769.

Thorpe, S., Fize, D., & Marlot, C. (1996). Speed of processing in the human visual system.
Nature, 381, 520–522.

Ullman, S. (1989). Aligning pictorial descriptions: An approach to object recognition. Cognition, 32, 193–254.

Ungerleider, L. G., Mishkin, M., & Macko, K. A. (1983). Object vision and spatial vision:
Two cortical pathways. Trends in Neuroscience, 6, 414–417.

Van Essen, D. C., Felleman, D. J., DeYoe, E. A., Olavarria, J., & Knierim, J. (1990). Modular
and hierarchical organization of extrastriate visual cortex in the macaque monkey. Cold
Spring Harbor Symposia on Quantitative Biology, 55, 679–696.

Vinberg, J., & Grill-Spector, K. (2008). Representation of shapes, edges, and surfaces
across multiple cues in the human visual cortex. Journal of Neurophysiology, 99, 1380–
1393.

Vogels, R., & Biederman, I. (2002). Effects of illumination intensity and direction on ob­
ject coding in macaque inferior temporal cortex. Cerebral Cortex, 12, 756–766.

Vuilleumier, P., Henson, R. N., Driver, J., & Dolan, R. J. (2002). Multiple levels of visual ob­
ject constancy revealed by event-related fMRI of repetition priming. Nature
Neuroscience, 5, 491–499.

Wandell, B. A. (1999). Computational neuroimaging of human visual cortex. Annual Review of Neuroscience, 22, 145–173.

Wang, G., Tanaka, K., & Tanifuji, M. (1996). Optical imaging of functional organization in
the monkey inferotemporal cortex. Science, 272, 1665–1668.

(p. 27) Weiner, K. S., & Grill-Spector, K. (2010). Sparsely-distributed organization of face
and limb activations in human ventral temporal cortex. NeuroImage, 52, 1559–1573.

Weiner, K. S., & Grill-Spector, K. (2011). Not one extrastriate body area: Using anatomical landmarks, hMT+, and visual field maps to parcellate limb-selective activations in human lateral occipitotemporal cortex. NeuroImage, 56(4), 2183–2199.

Weiner, K. S., & Grill-Spector, K. (2013). Neural representations of faces and limbs neighbor in human high-level visual cortex: Evidence for a new organization principle. Psychological Research, 77(1), 74–97.

Williams, M. A., Dang, S., & Kanwisher, N. G. (2007). Only some spatial patterns of fMRI
response are read out in task performance. Nature Neuroscience, 10, 685–686.

Zangenehpour, S., & Chaudhuri, A. (2005). Patchy organization and asymmetric distribution of the neural correlates of face processing in monkey inferotemporal cortex. Current Biology, 15(11), 993–1005.

Kalanit Grill-Spector

Kalanit Grill-Spector is Associate Professor, Department of Psychology and Neuroscience Institute, Stanford University.

Representation of Spatial Relations  


Bruno Laeng
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn

Print Publication Date: Dec 2013 Subject: Psychology, Cognitive Neuroscience


Online Publication Date: Dec 2013 DOI: 10.1093/oxfordhb/9780199988693.013.0003

Abstract and Keywords

Research in cognitive neuroscience in humans and animals has revealed a considerable degree of specialization of the brain for spatial functions, and has also revealed that the
brain’s representation of space is separable from its representation of object identities.
The current picture is that multiple and parallel frames of reference cooperate to make
up a representation of space that allows efficient navigation and action within the sur­
rounding physical environment. As humans, however, we do not simply “act” in space, but
we also “know” it and “talk” about it. Hence, the human brain’s spatial representations
may involve specifically human and novel patterns of lateralization and of brain areas’
specializations. Pathologies of space perception and spatially-directed attention, like spa­
tial neglect, can be explained by damage to one or several of these maps and frames of reference. The multiple spatial cognitive maps used by the human brain clearly cooperate toward flexible representations of spatial relations that are progressively abstract
(or categorical) and may be apt to support the human ability to communicate spatial in­
formation and understand mathematical concepts. Nevertheless, a representation of
space as extended and continuous is also necessary for the control of action and naviga­
tion.

Keywords: spatial representations, frames of reference, spatial neglect, lateralization, cognitive maps

Representing the space around our human bodies seems to serve three main functions or
goals: to “act,” to “know,” and to “talk.” First of all, humans need to navigate in their en­
vironment. Some navigational skills require a very precise adaptation of the movement of
the entire body to the presence of other external bodies and obstacles (whether static or mobile). Without an adequate representation of physical distances between
external bodies (and between these and oneself), actions like running in a crowd or dri­
ving in traffic would be impossible. Humans also possess hands and need to manipulate
objects; their highly developed control of fine finger movements makes possible the con­
struction and use of tools. Engaging in complex sequences of manual actions requires the
positioning and direction of movements over precise and narrow areas of space (e.g.,
when typing on the keyboard, opening a lock with a key, making a knot, playing a musical
instrument). All these behaviors would be impossible without a fine-grained representation of the spatial distances and actual sizes of the objects and of their position relative to each other and to the body and hand involved in the action.

However, humans do not solely “act” in space. Humans also “cognize” about space. For
example, we can think about an object being present in a place although neither the ob­
ject nor the place is any longer visible (i.e., Piagetian object permanence). We can think
about simple spatial schemata or complex mathematical spaces and geometries (Aflalo &
Graziano, 2006); the ability to represent space as a continuum may lie at the basis of our
understanding of objects’ permanence in space and, therefore, (p. 29) of numerosity (De­
haene, 1997). We can also engage in endless construction of meaning and parse the phys­
ical world into classes and concepts; in fact, abstraction and recognition of equivalences
between events (i.e., categorization) has obvious advantages, as the cultural evolution of
humans demonstrates. Categories can be expressed in symbols, which can be exchanged.
All this applies to spatial representations as well. For example, the category “to the left”
identifies as equivalent a whole class of positions and can be expressed either in verbal
language or with pictorial symbols (e.g., an arrow: ←). Clearly, humans not only “act” in
space and “know” space, but they also “talk” about space. The role played by spatial cognition in linguistic function cannot be overstated (e.g., “thinking for speaking”;
Slobin, 1996). Thus, a categorical, nonmetric representation of space constitutes an additional spatial ability, one that could lay the groundwork for spatial reference in language.

The present discussion focuses on vision, in part because evolution has devoted to it a
large amount of the primate brain’s processing capacity (e.g., in humans, about 4 to 6 bil­
lion neurons and 20 percent of the entire cortical area; Maunsell & Newsome, 1987; Wandell et al., 2009); in part because vision is the most accurate spatial sense in primates
and is central to the human representation of space (“to see is to know what is where by
looking”; Marr, 1982). Nevertheless, it is important to acknowledge that a representation
of space can be obtained in humans through several sensory modalities (e.g., kinesthesia)
and that blind people do possess a detailed knowledge of space and can move and act in it
as well as talk about it. Some animals possess a representation of navigational space that
can be obtained through sensory information that is not available to humans (e.g., a
“magnetic” sense; Johnsen & Lohmann, 2005; Ritz, 2009). Honeybees’ “dances” can com­
municate spatial information that refers to locations where food can be found (Kirchner &
Braun, 1994; Menzel et al., 2000).

We begin by discussing (1) the existence of topographic maps in the brain. From these
maps, the brain extracts higher order representations of the external world that are dedi­
cated to the basic distinction between (2) “what” is out there versus “where” it is. These
two types of information need to be integrated for the control of action, and this yields in
turn the representation of (3) “how” an object can be an object of action and “which” spe­
cific object among multiple ones is located in a specific place at a given time. However, lo­
calization of objects can take place according to multiple and parallel (4) spatial frames of
reference; the deployment of spatial attention can also occur along these multiple refer­
ence frames, as the pathology of spatial attention or (5) neglect, after brain damage, has
clearly revealed. Regarding the brain’s specialization for spatial cognition, humans show
a strong degree of (6) cerebral lateralization that may be unique in the animal world. The
current evidence indicates a right hemisphere’s specialization for analog (coordinate)
spatial information versus a left hemisphere’s specialization for digital (categorical) spa­
tial information. The representation of categorical spatial relations is also relevant for (7)
object recognition because an object’s structural description is made in terms of parts
and categorical spatial relations between these (i.e., the “where of what”). Finally, we dis­
cuss the brain’s representation of large-scale space, as it is used in navigation, or the (8)
“cognitive map” of the environment.

1. The Brain’s Topographic Maps


In the evolutionary history of vision, the primeval function of photoreceptors or “eyes”
may have been a raw sense of location (e.g., cardinal directions as up vs. down based on
sunlight; detecting motion paths or distances by use of optic flow; Ings, 2007). However,
the ability to focus a detailed “image” with information about wavelengths and spatial fre­
quencies that allow the extraction of colored surfaces and forms requires the evolution of
camera-like eyes with a single lens focusing light onto a mosaic of photoreceptors (e.g.,
the image-forming eye of the squid or the eyes of vertebrates; Lamb et al., 2007). The hu­
man retina allows the formation of an image that is very detailed spatially, and this detail
seems conserved at the early cortical level in the flow of information from the eye to suc­
cessive brain areas. The smallest cortical receptive fields processing spatial information
in human vision possess receptive field centers hardly wider than the single-cone pho­
toreceptors (Smallman et al., 1996).

The retina provides the initial topographic map for humans, where nearby scene points
are represented in the responses of nearby photoreceptors and, in turn, in a matrix of
neurons to which they provide their input. Areas of the brain receiving retinal input are
also topographically organized in retinotopic “visual field maps” (Tootell, Hadjikhani, et
al., 1998; Wandell et al., 2005). These preserve, to some extent, the geometric structure
of the retina, which in turn, by the laws of optic refraction, reflects the geometric struc­
ture of the external visual (p. 30) world as a planar projection onto a two-dimensional sur­
face (Figure 3.1).

Figure 3.1 Topographic representations in primary visual cortex (V1). Reprinted with permission from Tootell, Hadjikhani, Vanduffel, et al., 1998. © 1998 National Academy of Sciences, U.S.A.

There is now consensus that the topographic features of cortical and subcortical maps
are not incidental, but instead are essential to brain function (Kaas, 1997). A topographi­
cally organized structure can depict visual information as “points” organized by their rel­
ative locations in space and varying in size, brightness, and color. Points near each other
in the represented space are represented by points near each other in the representing
substrate; this (internal) space can be used to represent (external) space (Markman,
1999). Thus, one fundamental property of the brain’s representation of space is that the
brain uses space on the cortex to represent space in the world (Kosslyn, Thompson, & Ga­
nis, 2006). Specifically, topographic maps can evaluate how input from one set of recep­
tors can be different from that of adjoining sets of receptors. Local connections among
neurons that are topographically organized can easily set up center-surround receptive
fields and compare adjacent features. Other types of brain organization between units
sensitive to adjacent points require more complex arrays and longer connections (Cherniak, 1990), which are metabolically (and evolutionarily) costly and result in increases in
neural transmission time. Cortical maps appear to be further arranged in spatial clusters
at a coarser scale (Wandell et al., 2005, 2009). This organization allows neural mosaics in
different maps that serve similar common computational goals to share resources (e.g.,
coordinating the timing of neural signals or temporarily storing memories; Wandell et al.,
2005). Thus, cortical maps may organize themselves to optimize nearest neighbor rela­
tionships (Kohonen, 2001) so that neurons that process similar information are located
near each other, minimizing wiring length.

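The nearest-neighbor principle just described can be sketched computationally. The snippet below is an illustrative toy, not a model discussed in this chapter: a one-dimensional Kohonen-style self-organizing map in which purely local, competitive updates cause units that are neighbors on the simulated “cortex” (the index axis) to become tuned to neighboring points of the input space.

```python
import random

def train_som_1d(n_units=10, n_steps=3000, seed=0):
    """Toy 1-D Kohonen self-organizing map trained on scalar inputs in [0, 1].

    On each step, the best-matching unit and its current neighbors move toward
    the input, so units that are adjacent on the map end up representing
    adjacent points of the represented space (a topographic map).
    """
    rng = random.Random(seed)
    weights = [rng.random() for _ in range(n_units)]      # random initial tuning
    for t in range(n_steps):
        x = rng.random()                                  # a point of "external space"
        frac = t / n_steps
        radius = max(1, int((1 - frac) * n_units / 2))    # shrinking neighborhood
        lr = 0.5 * (1 - frac) + 0.01                      # decaying learning rate
        winner = min(range(n_units), key=lambda i: abs(weights[i] - x))
        for i in range(n_units):
            if abs(i - winner) <= radius:
                weights[i] += lr * (x - weights[i])
    return weights

tuning = train_som_1d()
print([round(w, 2) for w in tuning])   # tuning varies smoothly along the map
```

After training, plotting tuning against unit index yields a near-monotonic curve; no unit was told its place, yet local learning alone produced the global topographic ordering.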
A topographical neural design is clearly revealed by the effects of localized cortical dam­
age, resulting in a general loss of visual function restricted to a corresponding region
within the visual field (Horton & Hoyt, 1991). Several areas of the human visual cortex,
but also brainstem nuclei (e.g., the superior colliculus) and thalamus (e.g., the lateral
geniculate nucleus and the pulvinar), are organized (p. 31) into retinotopic maps, which
preserve both left–right and top–bottom ordering. Consequently, cells that are close to­
gether on the sending surface (the retina) project to regions that are close together on
the target surface (Thivierge & Marcus, 2007). Remarkably, the dorsal surface of the hu­
man brain, extending from the posterior portion of the intraparietal sulcus forward, con­
tains several maps (400–700 mm²) that are much smaller than the V1 map (4000 mm²;
Wandell et al., 2005). The visual field is represented continuously as in V1, but the visual
field is split along the vertical meridian so that input to each hemisphere originates from
the contralateral visual hemifield. The two halves are thus “seamed” together by the long
connections of the corpus callosum.

If early vision’s topography has high resolution, later maps in the hierarchy are progres­
sively less organized topographically (Malach, Levy, & Hasson, 2002). As a result, the im­
age is represented at successive stages with decreasing spatial precision and resolution.
In addition, beyond the initial V1 cortical map, retinotopic maps are complex and patchy.
Consequently, adjacent points in the visual field are not represented in adjacent regions
of the same area in every case (Sereno et al., 1995). However, the topographic organiza­
tion of external space in the visual cortex is extraordinarily veridical compared with other
modalities. For example, the representation of bodily space in primary somatosensory
cortex (or the so-called homunculus) clearly violates a smooth and continuous spatial rep­
resentation of body parts. The face representation is inverted (Servos et al., 1999; Yang et
al., 1994), and the facial skin areas are located between those representing the thumb
and the lower lip of the mouth (Nguyen et al., 2004). Also, similarly to the extensive rep­
resentation of the thumb in somatosensory cortex (in fact larger than that of the skin of
the whole face), cortical visual maps magnify or compress distances in some portions of
the visual field. The central 15 degrees of vision take up about 70 percent of cortical
area, and the central 24 degrees cover 80 percent (Fishman, 1997; Zeki, 1969). In V2,
parts of the retina that correspond to the upper half of the visual field are represented
separately from parts that correspond to the lower half of the visual field. Area MT repre­
sents only the binocular field, and V4 only the central 30 to 40 degrees, whereas the pari­
etal areas represent more of the periphery (Gattass et al., 2005; Sereno et al., 2001). Callosal connections in humans allow areas of the inferior parietal cortex and the fusiform
gyrus in the temporal lobe to deal with stimuli presented in the ipsilateral visual field
(Tootell, Mendola, et al., 1998). These topographic organizations have been revealed by a
variety of methods, including clinical studies of patients (Fishman, 1997); animal re­
search (Felleman & Van Essen, 1991; Tootell et al., 1982); and more recently, neuroimag­
ing in healthy humans (Engel et al., 1994, 1997; Sereno et al., 1995; DeYoe et al., 1996;
Wandell, 1999) and brain stimulation (Kastner et al., 1998). In some of the retinotopic
mapping studies with functional magnetic resonance imaging (fMRI), participants performed working memory tasks or planned eye movements (Silver & Kastner, 2009).
These studies revealed the existence of previously unknown topographic maps of visual
space in the human parietal (e.g., Sereno & Huang, 2006) and frontal (e.g., Kastner et al.,
2007) lobes.

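This foveal magnification can be put in rough numbers. The snippet below is an illustrative calculation, assuming the inverse-linear magnification law M(E) = 17.3/(E + 0.75) mm/deg that Horton and Hoyt (1991) estimated for human V1; integrating it gives the cortical distance devoted to each band of eccentricity.

```python
import math

def cortical_distance_mm(ecc_deg, k=17.3, e2=0.75):
    """Cortical distance (mm) in V1 from the foveal representation out to a
    given eccentricity, obtained by integrating the inverse-linear
    magnification law M(E) = k / (E + e2) mm/deg (parameter values from
    Horton & Hoyt, 1991)."""
    return k * math.log(1.0 + ecc_deg / e2)

central = cortical_distance_mm(15.0)   # central 15 degrees
total = cortical_distance_mm(90.0)     # out to the far periphery
print(f"central 15 deg: {central:.1f} mm of {total:.1f} mm "
      f"({100 * central / total:.0f}% of the eccentricity axis)")
```

Running this shows the central 15 degrees claiming roughly two-thirds of the foveal-to-peripheral distance along the eccentricity axis; because this is linear extent rather than cortical area, it slightly understates the areal figures quoted above.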
Another clear advantage of a topographical organization of the visual brain would be in guiding ocular movements by maintaining a faithful representation of the position of the
target of a saccade (Optican, 2005). In addition, a topographical organization provides ex­
plicit and accessible information that represents the external world, beginning with the
extraction of boundaries and limits of surfaces of objects and ground (Barlow, 1981). Ac­
cording to Marr (1982): “A representation is a formal system for making explicit certain
entities or types of information.” If different properties or features of the physical infor­
mation are encoded or made explicit at any level of the flow of information, then qualita­
tively different types of information will be represented.

2. “What” Versus “Where” in the Visual System


The brain can be defined as a largely special-purpose machine in which a “division of la­
bor” between brain areas is the pervasive principle of neural organization. After more
than a century of systematic brain research, the idea that functions fractionate into a pro­
gressively modular brain structure has achieved axiomatic status (Livingstone & Hubel,
1988; Zeki, 2001). Specifically, a perceptual problem is most easily dealt with by dividing the problem into smaller subproblems, each as independent of the others as possible so as not to disrupt each other (Gattass et al., 2005). One way in which the visual brain accomplishes this division of labor is by separating visual information into two streams of processing, namely a “what” system and a “where” system. It may appear odd that the brain or cognitive system separates visual attributes that in the physical world are (p. 32) conjoined. Disjoining attributes of the same object exacerbates the problem of integrating
them (i.e., the so-called binding problem; Treisman, 1996; Revonsuo & Newman, 1999;
Seth et al., 2004). However, computer simulations with artificial neural networks have
demonstrated that two subsystems can be more efficient than one in computing different
mappings of the same input at the same time (Otto et al., 1992; Rueckl, Cave & Kosslyn,
1989).

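The computational point of these simulations can be made concrete with a toy example (a hypothetical sketch, not the actual Rueckl, Cave, and Kosslyn network): the same retinal input supports two independent readouts, one invariant to location and one invariant to identity, so each subsystem can solve its mapping without interference from the other.

```python
def make_retina(shape, pos, size=8):
    """Place a toy 'shape' on a 1-D retina of binary photoreceptors.

    Hypothetical encoding: shape 'A' activates one unit, shape 'B' two
    adjacent units, starting at position pos.
    """
    pattern = {"A": [1], "B": [1, 1]}[shape]
    retina = [0] * size
    retina[pos:pos + len(pattern)] = pattern
    return retina

def what(retina):
    """'Ventral' readout: identity from pattern content, ignoring location."""
    return "B" if sum(retina) == 2 else "A"

def where(retina):
    """'Dorsal' readout: location of the pattern, ignoring its identity."""
    return next(i for i, v in enumerate(retina) if v)

# The same input supports both mappings, and each is invariant to the other:
print(what(make_retina("A", 2)), where(make_retina("A", 2)))   # A 2
print(what(make_retina("B", 5)), where(make_retina("B", 5)))   # B 5
```

Each readout discards exactly the dimension the other depends on, which is why computing both in one undifferentiated network forces the shared representation to serve two conflicting invariances at once.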
Thus, cognitive neuroscience reveals that the brains of humans and, perhaps, of all other
vertebrates process space and forms (bodies) as independent aspects of reality. Accordingly, the
visual brain can be divided between “two cortical visual systems” (Ingle, 1967; Mishkin,
Ungerleider & Macko, 1983; Schneider, 1967; Ungerleider & Mishkin, 1982). A ventral
(mostly temporal cortex in humans) visual stream mediates object identification (“what is
it?”), and a dorsal (mostly parietal cortex) visual stream mediates localization of objects
in space (“where is it?”). This partition is most clearly documented for the visual modality
in our species (and in other primates), although it appears to be equally valid for other
sensory processes (e.g., for the auditory system, Alain et al., 2001; Lomber & Malhotra,
2008; Romanski et al., 1999; for touch, Reed et al., 2005). Information processed in the
two streams appears to be integrated at a successive stage in the superior temporal lobe
(Morel & Bullier, 1990), where integration of the two pathways could recruit ocular move­
ments to the location of a shape and provide some positional information to the temporal
areas (Otto et al., 1992). Further integration of shape and position information occurs in
the hippocampus where long-term memories of “configurations” of the life environment


(Sutherland & Rudy, 1988) or its topography (O’Keefe & Nadel, 1978) are formed.

Similarly to the classic Newtonian distinction between matter and absolute space, the dis­
tinction between what and where processing assumes that (1) objects can be represented
by the brain independently from their locations, and (2) locations can be perceived as im­
material points in an immaterial medium, empty of objects. Indeed, at a phenomenologi­
cal level, space primarily appears as an all-pervading stuff, an incorporeal or “ethereal”
receptacle that can be filled by material bodies, or an openness in which matter spreads
out. In everyday activities, a region of space is typically identified with reference to
“what” is or could be located there (e.g., “the book is on the table”; Casati & Varzi, 1999).
However, we also need to represent empty or “negative” space because trajectories and
paths toward other objects or places (Collett, 1982) occur in a spatial medium that is free
of material obstacles. The visual modality also estimates distance (i.e., the empty portion
of space between objects), and vision can represent the future position of a moving object
into some unoccupied point in space (Gregory, 2009).

Humans exemplify the division of labor between the two visual systems. Bálint (1909) was
probably the first to report a human case in which the perception of visual location was
impaired while the visual recognition of an object was relatively spared. Bálint’s patient,
who had bilateral inferior parietal lesions, was unable to reach for objects or estimate dis­
tances between objects. Holmes and Horax (1919) described similar patients and showed
that they had difficulty in judging differences in the lengths of two lines, but could easily
judge whether a unitary shape made by connecting the same two lines was a trapezoid
and not a rectangle (i.e., when the lines were part of a unitary shape). In general, pa­
tients with damage to the parietal lobes lack the ability to judge objects’ positions in
space, as shown by their difficulties in reaching, grasping, pointing to, or verbally de­
scribing their position and size (De Renzi, 1982). In contrast, “blindsight” patients, who
are unaware of the presence of an object, can locate by pointing an object in the blind
field of vision (Perenin & Jeannerod, 1978; Weiskrantz, 1986). Children with Williams (developmental) syndrome show remarkable sparing of object recognition with
severe breakdown of spatial abilities (Landau et al., 2006). In contrast, patients with infe­
rior temporal-occipital lesions, who have difficulty in recognizing the identity of objects
and reading (i.e., visual agnosia), typically show unimpaired spatial perception; they can
reach and manipulate objects and navigate without bumping into objects (Kinsbourne &
Warrington, 1962). Patients with visual agnosia after bilateral damage to the ventral sys­
tem (Goodale et al., 1991) can also guide a movement in a normal and natural manner to­
ward a vertical opening by inserting a piece of cardboard into it (Figure 3.2), but perform
at chance when asked to report either verbally or by adjusting the cardboard to match
the orientation of the opening (Milner & Goodale, 2008). It would thus seem that the spa­
tial representations of the dorsal system can effectively guide action but cannot make
even simple pattern discriminations.

Figure 3.2 Performance of a patient with extensive damage to the ventral system in perceptually matching a linear gap versus inserting an object into the gap. From Milner & Goodale, 2006. Reprinted with permission from Oxford University Press.

Importantly, several neurobiological investigations in which brain regions of nonhuman primates (p. 33) were selectively damaged have shown a double dissociation between perception of space and of objects (Mishkin et al., 1983). In one condition, monkeys had to
choose one of two identical objects (e.g., two square plaques) located closer to a land­
mark object (a cylinder). In another condition, two different objects (e.g., a cube and a
pyramid), each with a different pattern on its surface (e.g., checkerboard vs. stripes),
were shown. In both tasks, the monkeys had to reach a target object, but the solution in
the first task mainly depended on registering spatial information, whereas in the other,
information about shape and pattern was crucial to obtain the reward. A group of mon­
keys lacked parts of the parietal cortex, whereas another group lacked parts of the tem­
poral cortex. The “parietal” monkeys were significantly more impaired in the spatial task,
whereas the “temporal” monkeys were significantly more impaired in the object discrimi­
nation task. The results are counterintuitive because one would think that the monkeys
with an intact dorsal system (parietal lobe) should be able to discriminate checkerboards
and stripes (because these differ in both size and slant of their elements), despite the
damage to the ventral system (temporal lobe). The inability of monkeys to do so clearly
indicates that spatial representations of their dorsal system are used to guide action, not
to discriminate patterns (Kosslyn, 1994; Kosslyn & Koenig, 1992).

In addition, electrical recordings from individual neurons in monkeys’ parietal lobes re­
veal cells that encode the shape of objects (Sereno & Maunsell, 1998; Taira et al., 1990;;
Sakatag et al., 1997). However, these pattern (or “what”) representations in the dorsal
cortex are clearly action related; their representation of the geometrical properties of
shapes (e.g., orientation, size, depth, and motion) are used exclusively when reaching and
grasping objects. That is, they represent space in a “pragmatic” sense and without a con­
ceptual content that is reportable (Faillenot et al., 1997; Jeannerod & Jacob, 2005; Va­
lyear et al., 2005).

Neuroimaging studies show shape-selective activations in humans’ dorsal areas (Denys et al., 2004); simply seeing a manipulable human-made object (e.g., a tool like a hammer)
evokes changes in neural activity within the human dorsal system (Chao & Martin, 2000).
This is consistent with primate studies showing that the parietal lobes contain neurons
that encode the shape of objects. Thus, human parietal structures contain motor-relevant
information about the shapes of some objects, information that would seem necessary to
efficiently control specific actions. These dorsal areas’ shape information could also be
used in conjunction with the ventral system and act as an organizing principle for a “cate­
gory-specific” representation of semantic categories (in this case, for “tools”; Mahon et
al., 2007). Remarkably, shape information in the dorsal system per se does not seem to be
able to support the conscious perception of object qualities; a profound object agnosia
(that includes the recognition of manipulable objects) is observed after temporal lesions.
Dissociations of motor-relevant shape information from conscious shape perception have
also been documented with normal observers, when actions were directed toward visual
illusions (Aglioti et al., 1995; Króliczak et al., 2006). Evidently, the dorsal (parietal)
system’s visual processing does not lead to a conscious description (identification) of ob­
jects’ shapes (Fang & He, 2005; Johnson & Haggard, 2005).

It is also clear that the dorsal system’s shape sensitivity does not derive from information
relayed by the ventral system because monkeys with large temporal lesions and profound
object recognition deficits are able to normally grasp small objects (Glickstein et al.,
1998) and catch flying insects. Similarly, patient D.F. (Goodale et al., 1991; Milner et al.,
1991; Milner & Goodale, 2006) showed preserved visuomotor abilities and could catch a
ball in flight (Carey et al., 1996) but could not recognize a drawing of an apple. When
asked to make a copy of it, she arranged straight lines into a spatially incoherent square-
like configuration (Servos et al., 1999). D.F.’s drawing deficit indicates that her spared
dorsal shape representations cannot be (p. 34) accessed or expressed symbolically, de­
spite being usable as the targets of directed action. The fact that D.F. performed at
chance, when reporting the orientation of the opening in the previously described “post­
ing” experiment, does not necessarily indicate that the conscious representation of space
is an exclusive function of the ventral system (Milner & Goodale, 2006, 2008). Nor does it
indicate that the dorsal system’s representation of space should be best described as a
“zombie agent” (Koch, 2004) or as a form of “blindsight without blindness” (Weiskrantz,
1997). In fact, patient D.F. made accurate metric judgments of which elements in an array
were nearest and which were farthest; when asked to reproduce an array of irregularly
positioned colored dots on a page, her rendition of their relative positions (e.g., what ele­
ment was left of or below another element) was accurate, although their absolute posi­
tioning was deficient (Carey et al., 2006). It seems likely that a perfect reproduction of an
array of elements requires comparing the copy to the model array as an overall “gestalt”
or perceptual template, a strategy that may depend on the shape perception mechanisms
of D.F.’s (damaged) ventral system.

Finally, it would seem that the ventral system in the absence of normal parietal lobes can­
not support an entirely normal perception of shapes. Patients with extensive and bilateral
parietal lesions (i.e., with Bálint’s syndrome) do not show completely normal object or
shape perception; their object recognition is typically limited to one object or just a part
of it (Luria, 1963). Remarkably, these patients need an extraordinarily long time to ac­
complish recognition of even a single object, thus revealing a severe reduction in object
processing rate (Duncan et al., 2003). Kahneman, Treisman, and Gibbs (1992) made a dis­
tinction between object identification and object perception. They proposed that identifi­
cation (i.e., the conscious experience of seeing an instance of an object) depends on form­
ing a continuously updated, integrated representation of the shapes and their space–time
coordinates. The ventral and dorsal systems may each support a form of “phenomenal” consciousness (cf. Baars, 2002; Block, 1996), but each requires the functions of the other system in order to generate a conscious representation that is reportable and accessible to other parts of the brain (e.g., the executive areas of the frontal lobes; Lamme, 2003,
2006).

3. “Where,” “How,” or “Which” Systems?


“What” versus “where” functional distinctions have also been identified in specific areas
of the frontal lobe of monkeys (Goldman-Rakic, 1987; Passingham, 1985; Rao et al., 1997;
Wilson et al., 1993). These areas appear to support working memory (short-term reten­
tion) of associations between shape and spatial information. In other words, they encode
online information about “what is where?” or “which is it?” (when seeing multiple ob­
jects). Prompted by these findings with animals, similar functional distinctions have been
described in human patients (e.g., Newcombe & Russell, 1969) as well as in localized
brain activations in healthy subjects (e.g., Courtney et al., 1997; Haxby et al., 1991; Smith
et al., 1995; Ungerleider & Haxby, 1994; Zeki et al., 1991).

Spatial information (in particular fine-grained spatial information about distances, direc­
tion, and size) is clearly essential to action. In this respect, much of the spatial informa­
tion of the “where” system is actually in the service of movement planning and guidance
or of “praxis” (i.e., how to accomplish an action, especially early-movement planning; An­
dersen & Buneo, 2002). In particular, the posterior parietal cortex of primates performs
the function of transforming visual information into a motor plan (Snyder et al., 1997).
For example, grasping an object with one hand is a common primate behavior that, ac­
cording to studies of both monkeys and humans, depends on the spatial functions of the
parietal lobe (Castiello, 2005). Indeed, patients with damage to the superior parietal lobe
show striking deficits in visually guided grasping (i.e., optic ataxia; Perenin & Vighetto,
1988). Damage to this area may result in difficulties generating visual-motor transforma­
tions that are necessary to mold the hand’s action to the shape and size of the object
(Jeannerod et al., 1994; Khan et al., 2005), as well as to take into account the position of
potential obstacles (Schindler et al., 2004).

Neuroimaging studies of healthy participants scanned during reach-to-grasp actions show activity in the posterior parietal cortex (especially when a precision grip is required; Culham et al., 2003; Gallivan et al., 2009). The superior parietal lobe appears to contain a
topographic map that represents memory-driven saccade direction (Sereno et al., 2001)
as well as the direction of a pointing movement (Medendorp et al., 2003). The parietal
cortex may be tiled with spatiotopic maps, each representing space in the service of a
specific action (Culham & Valyear, 2006). It is likely that the computation of motor com­
mands for reaching depends on the simultaneous processing of mutually connected areas
of the parietal and frontal lobes, which (p. 35) together provide an integrated coding sys­
tem for the control of reaching (Burnod et al., 1999; Thiebaut de Schotten et al., 2005).
Importantly, while lesions to the ventral system invariably impair object recognition, ob­
ject-directed grasping is spared in the same patients (James et al., 2003).

The parietal lobes clearly support spatial representations detailed enough to provide the
coordinates for precise actions like reaching, grasping, pointing, touching, looking, and
avoiding a projectile. Given the spatial precision of both manual action and oculomotor
behavior, one would expect that the neural networks of the parietal cortex would include
units with the smallest possible spatial tuning (i.e., very small receptive fields; Gross &
Mishkin, 1977). By the same reasoning, the temporal cortex may preferentially include
units with large spatial tuning because the goal of such a neural network is to represent
the presence of a particular object, regardless of its spatial attributes (i.e., show dimen­
sional and translational invariance). However, it turns out that both the parietal and tem­
poral cortices contain units with large receptive fields (from 25 to 100 degrees; O’Reilly
et al., 1990) that exclude the fovea and can even represent large bilateral regions of the
visual field (Motter & Mountcastle, 1981).

Therefore, some property of these neural populations other than receptive field size must
underlie the ability of the parietal lobes to code positions precisely. A hint is given by
computational models (Ballard, 1986; Eurich & Schwegler, 1997; Fahle & Poggio, 1981;
Hinton, McClelland, & Rumelhart, 1986; O’Reilly et al., 1990) showing that a population
of neurons with large receptive fields, if these are appropriately overlapping, can be superior to a population of neurons with smaller receptive fields in its ability to pinpoint a stimulus’s location. Crucially, receptive fields of parietal neurons are staggered toward the pe­
riphery of the visual field, whereas receptive fields of temporal neurons tend to crowd to­
ward the central, foveal position of the visual field. Consequently, parietal neurons can ex­
ploit coarse coding to pinpoint locations, whereas the temporal neurons, which provide
less variety in output (i.e., they all respond to stimuli in the fovea), trade off the ability to
locate an object with the ability to show invariance to spatial transformations. Spatial at­
tention evokes increased activity in individual parietal cells (Constantinidis & Steinmetz,
2001) and within whole regions of the parietal lobes (Corbetta et al., 1993; 2000). There­
fore, focusing attention onto a region of space can also facilitate computations of the rela­
tive activation of overlapping receptive fields of cells and thus enhance the ability to pre­
cisely localize objects (Tsal & Bareket, 2005; Tsal, Meiran, & Lamy, 1995).
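
The coarse-coding principle described above can be illustrated with a minimal numerical sketch (the tuning widths, unit spacing, and center-of-mass readout below are illustrative assumptions, not parameters from the cited models): a population of broadly tuned, overlapping Gaussian "units" localizes a stimulus more finely than the spacing of the units themselves, whereas narrow, barely overlapping fields can do no better than report the nearest receptive-field center.

```python
import math

def response(center, width, stimulus):
    """Gaussian tuning curve: firing of a unit whose receptive field
    is centered on `center`, for a stimulus at `stimulus` (degrees)."""
    return math.exp(-((stimulus - center) ** 2) / (2.0 * width ** 2))

def decode(centers, width, stimulus):
    """Population estimate: response-weighted mean of the units'
    preferred positions (a simple center-of-mass readout)."""
    rates = [response(c, width, stimulus) for c in centers]
    return sum(r * c for r, c in zip(rates, centers)) / sum(rates)

# Ten units whose receptive-field centers tile 0-90 degrees in 10-degree steps.
centers = [10.0 * i for i in range(10)]

stimulus = 42.0
broad = decode(centers, width=25.0, stimulus=stimulus)   # large, overlapping fields
narrow = decode(centers, width=2.0, stimulus=stimulus)   # small, isolated fields
```

With these assumed parameters, the broadly tuned population recovers the 42-degree position to within about a degree, while the narrowly tuned one effectively snaps to the nearest center (40 degrees): the precision comes from the overlap, not from small receptive fields.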

Following the common parlance among neuroscientists, who refer to major divisions between neural streams with interrogative pronouns, the “where” system is to a large extent also the “how” system of the brain (Goodale & Milner, 1992), or the “vision-for-action” system (whereas the “what” system has been labeled the “vision-for-perception” system by Milner & Goodale, 1995, 2008). However, spatial representations do much more
than guide action; we also “know” space and can perceive it without having an intention­
al plan or having to perform any action. Although neuroimaging studies show that areas
within the human posterior parietal cortex are active when the individual is preparing to
act, the same areas are also active during the mere observation of others’ actions (Bucci­
no et al., 2004; Rizzolatti & Craighero, 2004). One possibility is that these areas may be
automatically registering the “affordances” of objects that could be relevant to action, if
an action were to be performed (Culham & Valyear, 2006).

The superior and intraparietal regions of the parietal lobes are particularly engaged with
eye movements (Corbetta et al., 1998; Luna et al., 1998), but neuroimaging studies also
show that the parietal areas are active when gaze is fixed on a point on the screen and no
action is required, while the observer simply attends to small objects moving randomly on
the screen (Culham et al., 1998). The larger the number of moving objects that have to be
attentively monitored on the screen, the greater is the activation in the parietal lobe (Cul­
ham et al., 2001). Moreover, covert attention to spatial positions that are empty of objects
(i.e., before they appear in the expected locations) strongly engages mappings in the pari­
etal lobes (Corbetta et al., 2000; Kastner et al., 1999). Monkeys also show neural popula­
tion activity within the parietal cortex when they solve visual maze tasks and mentally follow a path, without moving their eyes or performing any action (Crowe et al.,
2005). In human neuroimaging studies, a stimulus position judgment (left vs. right) in re­
lation to the body midline mainly activates the superior parietal lobe (Neggers et al.,
2006), even when no pointing or reaching is required.

In addition, our cognition of space is also qualitative or “categorical” (Hayward & Tarr,
1995). Such a type of spatial information is too abstract to be useful in fine motor guid­
ance. Yet, neuropsychological evidence clearly indicates that this type of (p. 36) spatial
perception and knowledge is also dependent on parietal lobe function (Laeng, 1994).
Thus, the superior and inferior regions of the parietal lobes may specialize, respectively,
in two visual-spatial functions: vision-for-action versus vision-for-knowledge, or a “prag­
matic” versus a “semantic” function that encodes the behavioral relevance or meaning of
stimuli (Freedman & Assad, 2006; Jeannerod & Jacob, 2005). The superior parietal lobe
(i.e., the dorsal part of the dorsal system) may have a functional role close to the idea of
an “agent” directly operating in space, whereas the inferior parietal lobe plays a role clos­
er to that of an “observer” that understands space and registers others’ actions as they
evolve in space (Rizzolatti & Matelli, 2003).

According to Milner and Goodale (2006), the polysensory areas of the inferior parietal
lobe and superior temporal cortex may have developed in humans as new functional ar­
eas and be absent in monkeys. Thus, they can be considered a “third stream” of visual
processing. More generally, they may function as a supramodal convergent system between the dorsal and ventral systems that supports forms of spatial cognition that are
unique to our species (e.g., use of pictorial maps; Farrell & Robertson, 2000; Semmes et
al., 1955). These high-level representational systems in the human parietal lobe may also
provide the substrate for the construction and spatial manipulation of mental images
(e.g., the three-dimensional representation of shapes and the ability to “mentally rotate”).
Indeed, mental rotation of single shapes is vulnerable to lesions of the posterior parietal
lobe (in the right hemisphere; Butters et al., 1970) or to its temporary and reversible de­
activation after transcranial magnetic stimulation (Harris & Miniussi, 2003) or stimula­
tion with cortically implanted electrodes in epileptic patients (Zacks et al., 2003). Neu­
roimaging confirms that imagining spatial transformations of shapes (i.e., mental rota­
tion) produces activity in the parietal lobes (e.g., Alivisatos & Petrides, 1997; Carpenter et
al., 1999a; Harris et al., 2000; Jordan et al., 2002; Just et al., 2001; Kosslyn, DiGirolamo,
et al., 1998). Specifically, it is the right superior parietal cortex that seems most involved
in the mental rotation of objects (Parsons, 2003). Note that space in the visual cortex is
represented as two-dimensional space (i.e., as a planar projection of space in the world),
but disparity information from each retina can be used to reconstruct the objects’ three-
dimensional (3D) shape and the depth and global 3D layout of a scene. Neuroimaging in
humans and electrical recordings in monkeys both indicate that the posterior parietal
lobe is crucial to cortical 3D processing (Naganuma et al., 2005; Tsao et al., 2003). The
parietal lobe of monkeys also contains neurons that are selectively sensitive to 3D infor­
mation from monocular information like texture gradients (Tsutsui et al., 2002).

Thus, the human inferior parietal lobe, or Brodmann area 39 (also known as the angular gyrus), would be a key brain structure for our subjective experience of space or for
“space awareness.” This area may contribute to forming individual representations of
multiple objects by representing the spatial distribution of their contours and boundaries
(Robertson et al., 1997). Such a combination of “what” with “where” information would
result in selecting “which” objects will be consciously perceived. In sum, the posterior
parietal cortex in humans and primates appears to be the control center for visual-spatial
functions and the hub of a widely distributed brain system for the processing of spatial in­
formation (Graziano & Gross, 1995; Mountcastle, 1995). This distributed spatial system
would include the premotor cortex, putamen, frontal eye fields, superior colliculus, and
hippocampus. Parietal lesions, however, would disrupt critical input to this distributed
system of spatial representations.

4. Spatial Frames of Reference


In patients with Bálint’s syndrome, bilateral parietal lesions destroy the representation of
spatial relations. These patients act as though there is no frame of reference on which to
hang the objects of vision (Robertson, 2003). Normally, in order to reach, grasp, touch,
look, point toward, or avoid something, we need to compute relative spatial locations be­
tween objects and our body (or the body’s gravitational axis) or of some of its parts (e.g., the eyes or the head’s vertical axis). Such computations are in principle possible with the
use of various frames of reference.

The initial visual frame of reference is retinotopic. However, this frame represents a view
of the world that changes with each eye movement and therefore is of limited use for con­
trolling action. In fact, the primate brain uses multiple frames of reference, which are ob­
tained by integrating information from the other sense modalities with the retinal infor­
mation. These subsequent frames of reference provide a more stable representation of
the visual world (Feldman, 1985). In the parietal lobe, neurons combine the response to a stimulus in a particular location with information about the position (p. 37) of the eyes
(Andersen & Buneo, 2002; Andersen, Essick, & Siegel, 1985), which is updated across
saccades (Heide et al., 1995). Neurons with head-centered receptive fields are also found
in regions of the monkey’s parietal lobe (Duhamel et al., 1997). Comparing location on
the retina to one internal to the observer’s body is an effective way to compute position
within a spatiotopic frame of reference, as also shown by computer simulations (Zipser &
Andersen, 1988). In the monkey, parietal neurons can also code spatial relationships as
referenced to an object and not necessarily to an absolute position relative to the viewer
(Chafee et al., 2005, 2007).
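
The gist of combining retinal location with eye position can be sketched in a few lines (a deliberately simplified, additive toy model for azimuth only; real gain-field neurons implement this implicitly rather than by explicit addition):

```python
def head_centered(retinal_azimuth, eye_azimuth):
    """Combine where a target falls on the retina with where the eyes
    point in the head; both in degrees, positive = rightward."""
    return retinal_azimuth + eye_azimuth

# A target fixed at 25 degrees right of the head's midline.
target_in_head = 25.0

# Across three different fixations, the retinal position changes with
# every "saccade," but the recovered head-centered position does not.
for eye in (-10.0, 0.0, 15.0):
    retinal = target_in_head - eye
    assert head_centered(retinal, eye) == target_in_head
```

This stability across saccades is exactly what makes such a spatiotopic frame more useful for controlling action than the raw retinotopic one.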

A reference frame can be defined by an origin and its axes. These can be conceived as
rigidly attached or fixed onto something (an object or an element of the environment) or
someone (e.g., the viewer’s head or the hand). For example, the premotor cortex of mon­
keys contains neurons that respond to touch, and their receptive fields form a crude map
of the body surface (Avillac et al., 2005). These neurons are bimodal in that they also re­
spond to visual stimuli that are adjacent in space to the area of skin they represent (e.g.,
the face or an arm). However, these cells’ receptive fields are not retinotopic; instead,
when the eyes move, their visual receptive fields remain in register with their respective
tactile fields (Gross & Graziano, 1995; Kitada et al., 2006). For example, a bimodal neu­
ron with a facial tactile field responds as if its visual field is an inflated balloon glued to
the side of the face. About 20 percent of these bimodal neurons continue their activity af­
ter lights are turned off, so as to also code the memory of an object’s location (Graziano,
Hu, & Gross, 1997). Apparently, some of these neurons are specialized for withdrawing
from an object rather than for reaching it (Graziano & Cooke, 2006). Neurons that inte­
grate several modalities at once have also been found within the premotor cortex of the
monkey; trimodal neurons (visual, tactile, and auditory; Figure 3.3) have receptive fields
that respond to a sound stimulus located in the space surrounding the head, within
roughly 30 cm (Graziano, Reiss, & Gross, 1999). Neuroimaging reveals maximal activity
in the human dorsal parieto-occipital sulcus when viewing objects looming near the face
(i.e., in a range of 13 to 17 cm), and this neural response decreases proportionally to dis­
tance from the face (Quinlan & Culham, 2007).

The superior parietal lobe of monkeys might be the substrate for body-centered positional codes for limb movements, where coordinates define the azimuth, elevation, and distance of the hand (Lacquaniti et al., 1995). In other words, premotor and parietal areas can represent visual space near the body in “arm-centered” coordinates (Graziano et al., 1994). Visual space is constructed many times over, attached to different parts of the body for different functions (Graziano & Gross, 1998). Neuroimaging in humans confirms that at least one of the topographic maps of the parietal lobes uses a head-centered coordinate frame (Sereno & Huang, 2006). Thus, a plurality of sensorimotor action spaces may be related to specific effectors that can move independently from the rest of the body (e.g., hand, head, and eye). In these motor-oriented frames of reference, a spatial relationship between two locations can be coded in terms of the movement required to get from one to the other (Paillard, 1991).

Figure 3.3 Frames of reference of neural cells of the macaque centered on a region of the face and extending into a bounded region in near space. Such cells are multimodal and can respond to either visual or auditory stimuli localized within their head-centered receptive field. From Graziano et al., 1999. Reprinted with permission from Nature.
Finally, the underpinning of our sense of direction is gravity, which leads to the percep­
tion of “up” versus “down” or of a vertical direction that is clearly (p. 38) distinct from all
other directions. This gravitational axis appears as irreversible, whereas front–back and
left–right change continuously in our frame of reference simply by our turning around
(Clément & Reschke, 2008). The multimodal cortex integrates the afferent signals from
the peripheral retina with those from the vestibular organs (Battista & Peters, 2010;
Brandt & Dieterich, 1999; Kahane et al., 2003; Waszak, Drewing, & Mausfeld, 2005) so as
to provide a sense of the body’s position in relation to the environment.

5. Neglecting Space

Localizing objects according to multiple and parallel spatial frames of reference is also
relevant to the manner in which spatial attention is deployed. After brain damage, the attentional deficit known as “neglect” of space clearly reveals how attention can be allocated
within different frames of reference. Neglect is a clinical disorder that is characterized by
a failure to notice objects to one side (typically, the left). However, “left” and “right” must
be defined with respect to some frame of reference (Beschin et al., 1997; Humphreys &
Riddoch, 1994; Pouget & Snyder, 2000), and several aspects of the neglect syndrome are
best understood in terms of different and specific frames of reference. That is, an object
can be on the left side with respect to the eyes, head, or body, or with respect to some ax­
is placed on the object (e.g., the left side of the person facing the patient). In the latter
case, one can consider the left side of an object (e.g., of a ball) as (1) based on a vector
originating from the viewer or (2) based on the intrinsic geometry of the object (e.g., the
left paw of the cat). These frames of reference can be dissociated by positioning the parts of the body or of the object differently. For example, a patient’s head may turn to the
right, but gaze can be positioned far to the left. Thus, another person directly in front of
the patient would lie to the left with respect to the patient’s head and to the right with re­
spect to the patient’s eyes. Moreover, the person in front of the patient would have her
right hand to the left of the patient’s body, but if she turned 180 degrees, her right hand
would then lie to the right of the patient’s body.
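
The dissociations just described can be made concrete with a toy computation (the angles and sign convention are illustrative assumptions): the very same target counts as “left” in one frame and “right” in another, depending on which body part’s axis defines the frame.

```python
def side_of(target_azimuth, frame_azimuth):
    """Which side of a reference axis a target falls on.
    Azimuths in degrees; positive values point rightward."""
    return "left" if target_azimuth < frame_azimuth else "right"

person = 0.0    # another person standing directly in front of the patient
head = 20.0     # the patient's head is turned 20 degrees to the right
eyes = -30.0    # but gaze is directed far to the left

assert side_of(person, head) == "left"    # left of the head's axis...
assert side_of(person, eyes) == "right"   # ...yet right of the eyes' axis
```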

Although right-hemisphere damage typically leads to severe neglect (Heilman et al., 1985; Heilman & Van Den Abell, 1980; Vallar et al., 2003), some patients with left-sided
lesions tend to neglect the ends of words (i.e., the right side, in European languages),
even when the word appears rotated 180 degrees or is written backward or in mirror
fashion (Caramazza & Hillis, 1990). Such errors occurring for a type of stimulus (e.g.,
words) in an object-centered or word-centered frame of reference imply (1) a spatial rep­
resentation of the object’s parts (e.g., of the letters, from left to right, for words) and (2)
that damage can specifically affect how one reference frame is transformed into another.
As discussed earlier, the parietal cortex contains neurons sensitive to all combinations of
eye position and target location. Consequently, a variety of reference frame transforma­
tions are possible because any function over that input space can be created with appro­
priate combinations of neurons (Pouget & Sejnowski, 1997; Pouget & Snyder, 2000). That
is, sensory information can be recoded into a flexible intermediate representation to facil­
itate the transformation into a motor command. In fact, regions of the parietal lobes
where cells represent space in eye-centered coordinates may not form any single spatial
coordinate system but rather carry the raw information necessary for other brain areas to
construct other spatial coordinate systems (Andersen & Buneo, 2002; Chafee et al., 2007;
Colby & Goldberg, 1999; Graziano & Gross, 1998; Olson, 2001, 2003; Olson & Gettner,
1995).
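
One way to see why such a population is so flexible is a basis-function sketch in the spirit of the Pouget and Sejnowski proposal (the unit counts, tuning widths, and the particular linear readout below are illustrative assumptions): each unit multiplies a retinal Gaussian by an eye-position gain field, and a single linear readout over those same units then yields a head-centered estimate.

```python
import math

# Preferred retinal positions and eye-position gain-field centers (degrees).
RET_PREFS = [10.0 * i for i in range(-4, 5)]
EYE_PREFS = [10.0 * i for i in range(-4, 5)]
WIDTH = 15.0

def unit(ret_pref, eye_pref, retinal, eye):
    """Gain-field unit: a retinal Gaussian multiplied by an
    eye-position Gaussian (its 'gain field')."""
    g_ret = math.exp(-((retinal - ret_pref) ** 2) / (2.0 * WIDTH ** 2))
    g_eye = math.exp(-((eye - eye_pref) ** 2) / (2.0 * WIDTH ** 2))
    return g_ret * g_eye

def head_centered_readout(retinal, eye):
    """One linear readout over the basis: weight each unit's activity by
    the sum of its preferred retinal and eye positions, then normalize."""
    num = den = 0.0
    for r in RET_PREFS:
        for e in EYE_PREFS:
            a = unit(r, e, retinal, eye)
            num += a * (r + e)
            den += a
    return num / den

# A target 12 degrees right on the retina while the eyes look 8 degrees
# left lies about 4 degrees right of the head's midline.
estimate = head_centered_readout(12.0, -8.0)
```

Other linear readouts over the very same population would yield other frames, which is the sense in which the raw parietal code need not commit to any single coordinate system.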

Figure 3.4 Left, The eight conditions used to probe neglect in multiple reference frames: viewer centered, object centered, and extrapersonal. In condition A, the patient viewed the cubes on a table and “near” his body. In condition B, the patient viewed the cubes on a table and “far” from his body. In condition C, the patient viewed the cubes held by the experimenter while she sat “near” the patient, facing him. In condition D, the patient viewed the cubes held by the experimenter while she sat “far” away and facing the patient. In condition E, the patient viewed the cubes held by the experimenter while she sat “far” and turned her back to the patient. In condition F, the patient viewed the cubes in the “far” mirror while these were positioned on a “near” table. In condition G, the patient viewed the cubes in the “far” mirror while the experimenter facing him held the cubes in her hands. In condition H, the patient viewed the cubes in the “far” mirror while the experimenter turned her back to the patient and held the cubes in her hands. Note that in the last three conditions, the cubes are seen only in the mirror (in extrapersonal space) and not directly (in peripersonal space). Right, Results for conditions D and E, showing a worsening of performance when the target was held in the left hand of the experimenter and in left hemispace compared with the other combinations of object-centered and viewer-centered frames of reference.

From Laeng, Brennen, et al., 2002. Reprinted with permission of Elsevier.

According to a notion of multiple coordinate systems, different forms of neglect will mani­
fest depending on the specific portions of parietal or frontal cortex that are damaged.
These will reflect a complex mixture of various coordinate frames. Thus, if a lesion of the
parietal lobe causes a deficit in a distributed code of locations that can be read out in a
variety of reference frames (Andersen & Buneo, 2002), neglect behavior will emerge in
the successive visual transformations (Driver & Pouget, 2000). It may also be manifested
within an object-centered reference frame (Behrmann & Moscovitch, 1994). Indeed, neglect patients can show various mixtures and dissociations between the reference frames; thus, some patients show both object-centered and viewer-centered neglect (Behrmann & Tipper, 1999), but other patients show neglect in just one of these frames (Hillis & Caramazza, 1991; Tipper & Behrmann, 1996). For example, Laeng and colleagues (2002)
asked a neglect patient to report the colors of two objects (cubes) that could either lie on
a table positioned near or far from the patient or be held in the left and right hands of the
experimenter. In the latter case, the experimenter either faced the patient or turned
backward so that the cubes held in her hands could lie in either the left or right hemi­
space (Figure 3.4). Thus, the cubes’ position in space was also determined by the
experimenter’s (p. 39) body position (i.e., they could be described according to an exter­
nal body’s object-centered frame). Moreover, by use of a mirror, the cubes could be seen
in the mirror far away, although they were “near” the patient’s body, so that the patient
actually looked at a “far” location (i.e., the surface of the mirror) to see the physically
near object. The experiment confirmed the presence of all forms of neglect. Not only did
the patient name the color of a cube seen in his left hemispace more slowly than in his
right hemispace, but also latencies increased for a cube held by the experimenter in her
left hand and in the patient’s left hemispace (both when the left hand was seen directly or
as a mirror reflection). Finally, the patient’s performance was worse for “far” than “near”
locations. He neglected cubes located near his body (i.e., within “grasping” space) but
seen in the mirror, thus dissociating the direction of gaze (toward extrapersonal space) from the physical location of the object (in peripersonal space).

In most accounts of spatial attention, shifting occurs within coordinate frames that can be
defined by a portion (or side) of a spatial plane that is orthogonally transected by some
egocentric axis (Bisiach et al., 1985). However, together with the classic form of neglect
for stimuli or features to the left of the body (or an object’s) midline, neglect can also oc­
cur below or above the horizontal plane or in the lower (Butter et al., 1989) versus the
upper visual field (Shelton et al., 1990). In addition, several neglect behaviors would seem
to occur in spatial frames of reference that are best defined by vectors (p. 40) (Kins­
bourne, 1993) or polar coordinates (Halligan & Marshall, 1995), so that either there is no
abrupt boundary for the deficit to occur or the neglected areas are best described by an­
nular regions of space around the patient’s body (e.g., grasping or near, peripersonal,
space). Neurological studies have identified patients with more severe neglect for stimuli
within near or reaching space than for stimuli confined beyond the peripersonal region in
far, extrapersonal, space (Halligan & Marshall, 1991; Laeng et al., 2002) as well as pa­
tients with the opposite pattern of deficit (Cowey et al., 1994; Mennemeier et al., 1992).
These findings appear consistent with the evidence from neurophysiology studies in mon­
keys (e.g., Graziano et al., 1994), where spatial position can be defined within a bounded
region of space to the head or arm. Moreover, a dysfunction within patients’ inferior pari­
etal regions is most likely to result in neglect occurring in an “egocentric” spatial frame
of reference (i.e., closely related to action control within personal space), whereas dys­
function within the superior temporal region is most likely to result in “allocentric” ne­
glect occurring in a spatial frame of reference centered on the visible objects in extraper­
sonal space (Committeri et al., 2004, 2007; Hillis, 2006).

Patients with right parietal lesions also have difficulties exploring “virtual space” (i.e., lo­
cating objects within their own mental images). For example, patients with left-sided ne­
glect are unable to describe from their visual memory left-sided buildings in a city scene
(“left” being based on their imagined position within a city’s square; Beschin et al., 2000;
Bisiach & Luzzatti, 1978). Such patients may also be unable to spell the beginning of
words (i.e., unable to read the left side of the word from an imaginary display; Baxter &
Warrington, 1983). However, patients with neglect specific to words (or “neglect dyslex­
ia”) after a left-hemisphere lesion can show a spelling deficit for the ends of words (Cara­
mazza & Hillis, 1990).

Neuropsychological findings also have shown that not only lesions of the inferior parietal
lobe but also those of the frontal lobe and the temporal-parietal-occipital junction lead to
unilateral neglect. Remarkably, damage to the rostral part of the human superior temporal cortex (of the right hemisphere) results in profound spatial neglect (Karnath et al., 2001) in humans and monkeys, characterized by a severe lack of awareness of objects
in the left hemispace. Because the homologous area of the left hemisphere is specialized
for language in humans, this may have preempted the spatial function of the left superior
temporal cortex, causing a right-sided dominance for space-related information (Wein­
traub & Mesulam, 1987). One possibility is that the right-sided superior temporal cortex
plays an integrative role with regard to the ventral and dorsal streams (Karnath, 2001)
because the superior temporal gyrus is adjacent to the inferior areas of the dorsal system
and receives input from both streams, making it a site for multimodal sensory con­
vergence (Seltzer & Pandya, 1978). However, none of these individual areas should be in­
terpreted as the “seat” of the conscious perception of spatially situated objects. In fact,
no cortical area alone may be sufficient for visual awareness (Koch, 2004; Lamme et al.,
2000). Most likely, a conscious percept is the expression of a distributed neural network
and not of any neural bottleneck. That is, a conscious percept is the gradual product of
recurrent and interacting neural activity from several reciprocally interconnected regions
and streams (Lamme, 2003, 2006). Nevertheless, the selective injury of a convergence
zone, like the superior temporal lobe, could disrupt representations that are necessary
(but not sufficient) to spatial awareness.

Interestingly, patients with subcortical lesions and without detectable damage of either
temporal or parietal cortex also show neglect symptoms. However, blood perfusion mea­
surements in these patients reveal that the inferior parietal lobe is hypoperfused and
therefore dysfunctional (Hillis et al., 2005). Similarly, damage to the temporoparietal
junction, an area neighboring both the ventral and dorsal systems, produces abnormal
correlation of the resting state signal between left and right inferior parietal lobes, which
are not directly damaged; this abnormality correlates with the severity of neglect (Corbet­
ta et al., 2008; He et al., 2007). Therefore, the “functional lesion” underlying neglect may
include a more extensive area than what is revealed by structural magnetic resonance imaging, by
disrupting underlying association or recurrent circuits (e.g., parietal-frontal pathways;
Thiebaut de Schotten et al., 2005).

Page 19 of 59
Representation of Spatial Relations

6. Lateralization of Spatial Representations


Differential functional specializations of the two sides of the brain are already present in
early vertebrates (Sovrano et al., 2005; Vallortigara & Rogers, 2005), suggesting that lat­
eralization may be the expression of a strategy of division of labor that evolved millions of
years before the appearance of the human species. In several species, the right brain ap­
pears to be specialized for vigilance (p. 41) and recognition of novel or surprising stimuli.
For example, birds appear more adept at gathering food or catching prey seen with the
right eye (i.e., left brain) than with the left eye (i.e., right brain). Such a segregation of
functions would seem at first glance maladaptive because it may put the animal at
great risk (by making its behavior predictable to both prey and predators). An evolution­
ary account that can explain this apparently nonadaptive brain organization is based on
the hypothesis that a complementary lateralization makes the animal superior in perform­
ing several tasks at the same time (Vallortigara et al., 2001), counteracting the ecological
disadvantages of lateral bias. Evidence indicates that birds that are strongly lateralized
are more efficient at parallel processing than birds of the same species that are weakly
lateralized (Rogers et al., 2004). Thus, a complementary lateral specialization would seem
to make the animals apt to attend to two domains simultaneously.

There is a clear analogy between this evolutionarily adaptive division of labor between
the vertebrate cerebral hemispheres and the performance of artificial neural networks
that segregate processing into multiple, smaller subsystems (Otto et al., 1992; Rueckl,
Cave, & Kosslyn, 1989). Most relevant, this principle of division of labor has also been ap­
plied to the modularization of function for types of spatial representations. Specifically,
Kosslyn (1987) proposed the existence of two neural subnetworks within the dorsal
stream that process qualitatively different types of spatial information. One spatial repre­
sentation is based on a quantitative parsing of space and therefore closely related to that
of spatial information in the service of action. This type of representation is called coordi­
nate (Kosslyn, 1987) because it is derived from representations that provide coordinates
for navigating through the environment as well as for performing targeted actions such as
reaching, grasping, hitting, throwing, and pointing to something. In contrast, the other
hypothesized type of spatial representation, labeled categorical spatial relation, parses
space in a qualitative manner. For example, two configurations can be described as “one
to the left of the other.” Thus, qualitative spatial relations are based on the perception of
spatial categories, where an object (but also an empty place) is assigned to a broad equiv­
alence class of spatial positions (e.g., a briefcase can be “on the floor,” and being “on the floor” is satisfied by being placed on any of the particular tiles that make up the whole floor).

Each of the two proposed separate networks would be complementarily lateralized. Thus,
the brain can represent in parallel the same spatial layout in at least two separate man­
ners (Laeng et al., 2003): a right-hemisphere mode that assesses “analog” spatial
relations (e.g., the distance between two objects) and a left-hemisphere mode that assess­
es “digital” spatial relations (e.g., whether two objects are attached to one another or one is above or below the other). The underlying assumption in the above account is that computing separately the two spatial relations (instead of, e.g., taking the quantitative repre­
sentation and making it coarser by grouping the finer locations) could result in a more ef­
ficient representation of space, where both properties can be attended simultaneously.
Artificial neural network simulations of these spatial judgments provide support for more
efficient processing in “split” networks than unitary networks (Jacobs & Kosslyn, 1994;
Kosslyn, Chabris, et al., 1992; Kosslyn & Jacobs, 1994). These studies have shown that,
when trained to make either digital or analog spatial judgments, the networks encode
more effectively each relation if their input is based, respectively, on units with relatively
small, nonoverlapping receptive fields, as opposed to units with relatively large, overlap­
ping receptive fields (Jacobs & Kosslyn, 1994). Overlap of location detectors would then
promote the representation of distance, based on a “coarse coding” strategy (Ballard,
1986; Eurich & Schwegler, 1997; Fahle & Poggio, 1981; Hinton, McClelland, & Rumel­
hart, 1986). In contrast, the absence of overlap between location detectors benefits the
representation of digital or categorical spatial relations, by effectively parsing space.
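The coarse-coding contrast described above can be illustrated in a few lines (a hypothetical sketch; the one-dimensional layout, Gaussian tuning, and all parameter values are invented for the example): a location excites a bank of receptive fields, and when the fields are large and overlapping, a graded population readout recovers fine metric position, whereas small, non-overlapping fields in effect parse space into discrete bins.

```python
import math

def population_response(x, centers, width):
    """Gaussian receptive-field responses of location detectors
    to a stimulus at position x (1-D for simplicity)."""
    return [math.exp(-((x - c) ** 2) / (2 * width ** 2)) for c in centers]

def decode_position(responses, centers):
    """Population-vector readout: response-weighted mean of centers.
    This works well only when neighboring fields overlap (coarse coding)."""
    total = sum(responses)
    return sum(r * c for r, c in zip(responses, centers)) / total

centers = [0.0, 2.0, 4.0, 6.0, 8.0]

# Large, overlapping fields: the graded pattern carries fine metric position.
broad = population_response(3.3, centers, width=2.0)
print(decode_position(broad, centers))  # close to 3.3: a "coordinate" code

# Small, non-overlapping fields: essentially a single detector fires,
# so the code only says which bin (spatial category) contains the stimulus.
narrow = population_response(3.3, centers, width=0.3)
print(centers[narrow.index(max(narrow))])  # 4.0: a categorical parse of space
```

The design choice mirrors the simulation results cited in the text: overlap trades away discreteness for metric precision, and the two readouts cannot be optimized by the same receptive-field geometry.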

Consistent with the above computational account, Laeng, Okubo, Saneyoshi, and Michi­
mata (2011) observed that spreading the attention window to encompass an area that in­
cludes two objects or narrowing it to encompass an area that includes only one of the ob­
jects can modulate the ability to represent each type of spatial relation. In this study, the
spatial attention window was manipulated to select regions of differing areas by use of
cues of differing sizes that preceded the presentation of pairs of stimuli. The main as­
sumption was that larger cues would encourage a more diffused attention allocation,
whereas the small cues would encourage a more focused mode of attention. When the at­
tention window was large (by cueing an area that included both objects as well as the
empty space between them), spatial transformations of distance between two objects
were noticed faster than when (p. 42) the attention window was relatively smaller (i.e.,
when cueing an area that included no more than one of the objects in the pair). Laeng
and colleagues concluded that a relatively larger attention window would facilitate the
processing of an increased number of overlapping spatial detectors so as to include (and
thus “measure”) the empty space in between or the spatial extent of each form (when
judging, e.g., size or volume). In contrast, smaller nonoverlapping spatial detectors would
facilitate parsing space into discrete bins or regions and, therefore, the processing of cat­
egorical spatial transformations; indeed, left–right and above–below were noticed faster
in the relatively smaller cueing condition than in the larger (see also Okubo et al., 2010).

The theoretical distinction between analog and digital spatial functions is relatively re­
cent, but early neurological investigations had already noticed that some spatial functions
(e.g., distinguishing left from right) are commonly impaired after damage to the posterior
parietal cortex of the left hemisphere, whereas impairment of other spatial functions, like
judging an object’s orientation or exact position, is typical after damage to the same area
in the opposite, right, hemisphere (Luria, 1973). The fact that different forms of spatial
dysfunctions can occur independently for each hemisphere has been repeatedly con­
firmed by studies of patients with unilateral lesions (Laeng, 1994, 2006; Palermo et al.,
2008) as well as by neuroimaging studies of normal individuals (Baciu et al., 1999; Koss­
lyn et al., 1998; Slotnick & Moo, 2006; Trojano et al., 2002). Complementary results have
been obtained with behavioral methods that use the lateralized (and tachistoscopic) pre­
sentation of visual stimuli to normal participants (e.g., Banich & Federmeier, 1999; Bruy­
er et al., 1997; Bullens & Postma, 2008; Hellige & Michimata, 1989; Kosslyn, 1987; Koss­
lyn et al., 1989, 1995; Laeng et al., 1997; Laeng & Peters, 1995; Roth & Hellige, 1998; Ry­
bash & Hoyer, 1992).

Laeng (1994, 2006) showed a double dissociation between failures to notice changes in
categorical spatial relations and changes in coordinate spatial relations. A group of pa­
tients with unilateral damage to the right hemisphere had difficulty noticing a change in
distance or angle between two figures of animals presented successively. The same pa­
tients had less difficulty noticing a change of relative orientation (e.g., left vs. right or
above vs. below) between the same animals. In contrast, the patients with left-hemi­
sphere damage had considerably less difficulty noticing that the distance between the
two animals had either increased or decreased. In another study (Laeng, 2006), similar
groups of patients with unilateral lesions made corresponding errors in spatial construc­
tion tasks from memory (e.g., building patterns made of matchsticks; relocating figures of
animals on a cardboard). Distortions in reproducing the angle between two elements and
accuracy of relocation of the objects in the original position were more common after
damage to the right hemisphere (see also Kessels et al., 2002), whereas mirror reversals
of elements of a pattern were more common after damage to the left hemisphere. A study
by Palermo and colleagues (2008) showed that patients with damage confined to the left
hemisphere had difficulty visually imaging whether a dot shown in a specific position
would fall inside or outside of a previously seen circle. These patients were relatively bet­
ter in visually imaging whether a dot shown in a specific position would be nearer to or
farther from the circle’s circumference than another dot previously seen together with
the same circle. The opposite pattern of deficit was observed in the patients with right-
hemisphere damage. Another study with patients by Amorapanth, Widick, and Chatterjee
(2010) showed that lesions to a network of areas in the left hemisphere resulted in more
severe impairment in judging categorical spatial relations (i.e., the above–below relations
between pairs of objects) than lesions to homologous areas of the right hemisphere. Also
in this study, the reverse pattern of impairment was observed for coordinate spatial pro­
cessing, where right-brain damage produced more severe deficit than left-hemisphere
damage.

Figure 3.5 Spatial memories for “coordinate” relations showed increased activity in the right hemisphere’s prefrontal cortex, whereas memories for “categorical” relations showed increased activity in the left hemisphere’s prefrontal cortex. From Slotnick & Moo, 2006. Reprinted with permission from Elsevier.

The above evidence with patients is generally consistent with that from studies with
healthy participants, in particular studies using the lateralized tachistoscopic method. In
these studies, the relative advantages in speed of response to stimuli presented either to
the left or right of fixation indicated superiority of the right hemisphere (i.e., left visual
field) for analog judgments and of the left hemisphere (i.e., right visual field) for digital
judgments. However, in studies with healthy subjects, the lateral differences appear to be
small (i.e., in the order of a few tens of milliseconds according to a meta-analysis; Laeng
et al., 2003). Nevertheless, small effect sizes identified with such a noninvasive method
are actually greater than effect sizes in percent blood oxygenation observed with fMRI.
Most important, both behavioral effects can predict very dramatic outcomes after dam­
age to the same region or side of the brain. Another method, whereby the same cortical
sites can be temporarily and reversibly deactivated (i.e., transcranial magnetic stimula­
tion [TMS]), (p. 43) provides converging evidence. Left-sided stimulation can effectively
mimic the deficit in categorical perception after left-hemisphere damage, whereas right-
sided stimulation mimics the deficit in coordinate space perception after right-hemi­
sphere damage (Slotnick et al., 2001; Trojano et al., 2006).

A common finding from studies using methods with localizing power (e.g., neuroimaging,
TMS, and selected patients) is that both parietal lobes play a key role in supporting the
perception of spatial relations (e.g., Amorapanth et al., 2010; Baciu et al., 1999; Kosslyn
et al., 1998; Laeng et al., 2002; Trojano et al., 2006). Moreover, areas of the left and right
prefrontal cortex that receive direct input from ipsilateral parietal areas also show activi­
ty when categorical or coordinate spatial information, respectively, is held in memory
(Kosslyn, Thompson, et al., 1998; Trojano et al., 2002). In an fMRI study (Slotnick & Moo,
2006), participants viewed in each trial a configuration consisting of a shape and a dot
placed at a variable distance from the shape (either “on” or “off” the shape and, in the
latter case, either “near” or “far” from the shape). In the subsequent retrieval task, the
shape was presented without the dot, and participants responded to queries about the
previously seen spatial layout (e.g., either about a categorical spatial relation property:
“was the dot ‘on’ or ‘off’ the shape?”; or about a coordinate spatial relation property:
“was the dot ‘near’ to or ‘far’ from the shape?”). Spatial memories for coordinate rela­
tions were accompanied by increased activity in the right hemisphere’s prefrontal cortex,
whereas memories for categorical relations were accompanied by activity in the left
hemisphere’s prefrontal cortex (see Figure 3.5).

One should note that the above studies on the perception of categorical and coordinate
relations do not typically involve any specific action in space, but instead involve only ob­
servational judgments (e.g., noticing or remembering the position of objects in a display).
Indeed, a child’s initial cognition of space and of objects’ numerical identity may be en­
tirely based on a purely observational representation of space whereby the child notices
that entities preserve their identities and trajectories when they disappear behind other
objects and reappear within gaps of empty space (Dehaene & Changeux, 1993; Xu &
Carey, 1996). The above findings from neuroscience studies clearly point to a role of the
dorsal system in representing spatial information beyond the mere service of action (cf.
Milner & Goodale, 2008). Thus, the evidence from categorical and coordinate spatial pro­
cessing, together with the literature on other spatial transformations or operations (e.g.,
mental rotations of shapes, visual maze solving) clearly indicates that a parietal-frontal
system supports not merely the “act” function but also two other central func­
tions of visual-spatial representations: to “know” and “talk.” The latter, symbolic function
would seem of particular relevance to our species and the only one that we do not share
with other living beings (except, perhaps, honeybees; Kirchner & Braun, 1994; Menzel et
al., 2000).

That is, humans can put into words or verbal propositions (as well as into gestures) any
type of (p. 44) spatial relations, whether quantitative (by use of numerical systems and
geometric systems specifying angles and eccentricities) or qualitative (by use of preposi­
tions and locutions). However, quantitative propositions may require measurement with
tools, whereas establishing qualitative spatial relations between objects would seem to
require merely looking at them (Ullman, 1984). If abstract spatial relations between ob­
jects in a visual scene can be effortlessly perceived, these representations are particular­
ly apt to be efficiently coded in a propositional manner (e.g., “on top of”). The latter lin­
guistic property would seem pervasive in all languages of the world and also pervade dai­
ly conversations. Some “localist” linguists have proposed that the deep semantic struc­
ture of language is intrinsically spatial (Cook, 1989). Some cognitive neuroscientists
have also suggested that language in our species may have originated precisely from the
need to transmit information about the spatial layout of an area from one person to anoth­
er (O’Keefe, 2003; O’Keefe & Nadel, 1978).

Locative prepositions are often used to refer to different spatial relations in a quick and
frugal manner (e.g., “above,” “alongside,” “around,” “behind,” “between,” “inside,” “left,”
“on top of,” “opposite,” “south,” “toward,” “underneath”); their grammatical class may
exist in all languages (Jackendoff & Landau, 1992; Johnson-Laird, 2005; Kemmerer, 2006;
Miller & Johnson-Laird, 1976; Pinker, 2007). Clearly, spatial prepositions embedded in
sentences (e.g., the man is “in” the house) can express spatial relations only in a rather
abstract manner (compared, for example, with how GPS coordinates can pinpoint space)
and can guide actions and navigation only in a very coarse sense (e.g., by narrowing
down an area of search). Locative prepositions resemble categorical spatial relations in
that they express spatial relationships in terms of sketchy or schematic structural proper­
ties of the objects, often ignoring details of spatial metrics (e.g., size, orientation, dis­
tance; Talmy, 2000). Nevertheless, the abstract relations of locative prepositions seem ex­
tremely useful to our species because they can become the referents of vital communica­
tion. Moreover, categorical spatial representations and their verbal expression counter­
parts may underlie the conceptual structure of several other useful representations
(Miller & Johnson-Laird, 1976), like the representations of time and of numerical entities
(Hubbard et al., 2005). Indeed, categorical spatial representations could provide the ba­
sic mental scaffolding for semantics (Cook, 1989; Jeannerod & Jacob, 2005), metaphors
(Lakoff & Johnson, 1999), and reasoning in general (Goel et al., 1998; Johnson-Laird,
2005; Pinker, 1990).

O’Keefe (1996, 2003) has proposed that the primary function of locative prepositions is to
identify a set of spatial vectors between places. The neural substrate supporting such a function would consist of a specific class of neurons or “place cells” within the right hip­
pocampus and of cerebral structures interconnected with the hippocampus. Specifically, a
combination of the receptive fields of several place cells would define boundaries of re­
gions in space that effectively constitute the referential meaning of a preposition. For ex­
ample, the preposition “below” would identify a “place field” with its center on the verti­
cal direction vector from a reference object. The width of such a place field would typical­
ly be larger than the width of the reference object but would taper with distance so as to
form a teardrop-shaped region attached to the bottom surface of the reference object (see al­
so Carlson et al., 2003; Hayward & Tarr, 1995).
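The geometry of such a “below” place field can be sketched as a simple membership test (a hypothetical parameterization for illustration only; the width factor and taper rate are invented, not taken from O’Keefe): a point belongs to the field if it lies under the reference object’s bottom surface and within a lateral band that starts somewhat wider than the object and narrows with vertical distance, yielding the tapering region described above.

```python
def is_below(point, ref_center, ref_width, ref_bottom,
             base_halfwidth_factor=1.2, taper=0.25):
    """Crude 'below' place field: the point must lie under the reference
    object's bottom edge, inside a lateral band that tapers with depth.
    All parameter values are illustrative, not empirical."""
    x, y = point
    if y >= ref_bottom:          # not under the object at all
        return False
    depth = ref_bottom - y       # vertical distance below the object
    halfwidth = ref_width / 2 * base_halfwidth_factor - taper * depth
    return halfwidth > 0 and abs(x - ref_center) <= halfwidth

# Reference object: centered at x=0, 4 units wide, bottom edge at y=0.
print(is_below((0.5, -1.0), 0.0, 4.0, 0.0))   # directly underneath: True
print(is_below((2.3, -0.5), 0.0, 4.0, 0.0))   # just outside the base: False
print(is_below((1.5, -6.0), 0.0, 4.0, 0.0))   # far below, band tapered away: False
```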

Cognitive neuroscience studies have found neuroanatomical correlates of locative prepositions within the left inferior prefrontal and left inferior parietal regions (Friederici,
1982; Tranel & Kemmerer, 2004). Consistently, neuroimaging studies have found that
naming spatial relationships with prepositions activated the same regions in healthy sub­
jects (Carpenter et al., 1999b; Damasio et al., 2001). Similar results have been found with
speakers of sign language (Emmorey et al., 2002). Kemmerer and Tranel (2000) found a
double dissociation between linguistic representations and perceptual representations;
that is, some patients had difficulties using locative prepositions but not making percep­
tual judgments, and other patients had the opposite problem. Laeng (1994) also noticed
that patients who made errors in a matching-to-sample task with pictures differing in
their categorical spatial relations were nonetheless able to follow the instructions of the
Token Test (where the comprehension of locative prepositions is necessary; De Renzi &
Vignolo, 1962). These findings indicate that the encoding of categorical spatial relations
(and their loss after left-hemisphere damage) cannot be easily reduced to the mediation
of semantic or verbal codes. In fact, the evidence suggests that, although perceptual rep­
resentations may be crucial for establishing the meaning of locative prepositions (Hay­
ward & Tarr, 1995), once these are learned, they can be supported and interpreted within
the semantic network and also selectively disrupted by (p. 45) brain damage. The conceptual representation of locative prepositions also appears to be separated from other lin­
guistic representations (e.g., action verbs, despite several of these verbs sharing with
prepositions the conceptual domain of space) because brain damage can dissociate the
meanings of these terms (Kemmerer & Tranel, 2003).

Although the evidence for the existence of a division of labor for analog versus digital
spatial relations in humans is now clearly established, an analogous lateralization of brain
function in nonhuman species remains unclear (Vauclair et al., 2006). Importantly, lateral­
ization is under the influence of “opportunistic” processes of brain development that opti­
mize the interaction of different subsystems within the cerebral architecture (Jacobs,
1997, 1999). Thus, in our species, local interactions with linguistic and semantic net­
works may play a key role in the manner in which the spatial system is organized. That is,
biasing categorical spatial representations within a left hemisphere’s substrate by “yok­
ing” them with linguistic processes may facilitate a joint operation between perception,
language, and thought (Jacobs & Kosslyn, 1994; Kosslyn, 1987).

7. The “Where” of “What”: Spatial Information Within the Object

The conceptual distinctions of categorical and coordinate spatial relations also have
strong similarities to different types of geometries (e.g., “topological” versus “Euclidean”
or “affine” geometries). For example, inside–outside judgments are topological judg­
ments. Piaget and Inhelder (1956) considered topological judgments as basic and sug­
gested that children learn topological spatial concepts earlier than other types of spatial
concepts, such as projective and “Euclidean-like” geometry; however, even infants are
sensitive to metric qualities (Liben, 2009).

Research in neuroscience shows that topological judgments are accomplished by the pari­
etal lobe (also in rats; Goodrich-Hunsaker et al., 2008); in humans, these judgments have
a robust left-hemisphere advantage (Wang et al., 2007). As originally reasoned by Franco
and Sperry (1977), given that we can represent multiple geometries (e.g., Euclidean,
affine, projective, topological) and the right hemisphere’s spatial abilities are superior to
those of the left (the prevalent view in the 1970s), the right hemisphere should match
shapes by their geometrical properties better than the left hemisphere. They tested this
idea with a group of (commissurotomized) “split-brain” patients in an intermodal (vision
and touch) task. Five geometrical forms of the same type were examined visually, while
one hand searched behind a curtain for one shape among three with the matching geome­
try. As expected, the left hand’s performance of the split-brain patients was clearly superi­
or to that of their right hand. This geometrical discrimination task required the percep­
tion of fine spatial properties of shapes (e.g., differences in slant and gradient of surfaces,
angular values, presence of concavities or holes). Thus, the superior performance of the
left hand, which is controlled by the right hemisphere, reflects the use of the right
hemisphere’s coordinate spatial relations system in solving a
shape discrimination task that crucially depends on the fine metrics of the forms. Later
investigations on split-brain patients showed that the left hand outperforms the right
hand also when copying drawings from memory or in rearranging blocks of the WAIS-R
Block Design Test (LeDoux, Wilson, & Gazzaniga, 1977). Specifically, LeDoux and Gaz­
zaniga (1978) proposed that the right hemisphere possesses a perceptual capacity that is
specifically dedicated to the analysis of space in the service of the organization of action
or movement planning, which they called a manipulospatial subsystem. Again, a left-hand
(right-hemisphere) superiority in these patients’ constructions is consistent with a view
that rearranging multiple items that are identical in shape (and share colors) may require
a coordinate representation of the matrix of the design or array (Laeng, 2006).

A shape is intuitively a geometrical entity that occupies a volume of space. As such, a shape is nothing but the spatial arrangement of the points in space occupied by it. How­
ever, many real-world objects can be parsed in component elements or simpler shapes,
and many objects differ in the locations of similar or identical constitutive elements (Bie­
derman, 1987). Intuitively, it would seem that a multipart object (e.g., a bicycle) is noth­
ing but the spatial relations among its parts. According to several accounts, an object can
be represented as a structural description (Marr, 1982) or a representation of connec­
tions between parts (e.g., geons, in Biederman’s, 1987, recognition by components mod­
el). In these computational models, an object’s connections are conceived as abstract spa­
tial specifications of how an object’s parts are put together. The resulting representations
can differentially describe whole classes of similar objects (e.g., cups versus buckets). In
this case, abstract, categorical, spatial relations (Hayward & Tarr, 1995; Hummel & Bie­
derman, 1992) (p. 46) could provide the spatial ingredient of such structural descriptions.
Indeed, Kosslyn (1987) proposed that the left dorsal system’s categorical spatial represen­
tations can play a role in object recognition, by representing spatial relations among the
object’s parts. In this account, shape properties are stored in a visual memory system
within the inferior temporal lobe (Tanaka et al., 1991) as a nontopographical “population
code” that ignores locations, whereas the dorsal system registers locations in a topographic map that ignores shape (e.g., the object can be represented here simply as a point, and its location specified relative to other points or indices). Such a map of in­
dices or spatial tokens could then represent the locations of objects in a scene or of parts
of objects in space and form an “object map” (Kosslyn et al., 2006) or “skeletal
image” (Kosslyn, 1980). This information can be used to reconstruct the image by posi­
tioning (back-propagating) each part representation in its correct location within the
high-detail topographic maps of the occipital lobes. When reconstituting a mental image
or, generally, in recollection (O’Regan & Nöe, 2001), the set of locations retrieved from
the “object map” could also be specified by the relation of parts to eye position during
learning (Laeng & Teodorescu, 2002).
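The proposed division of labor can be pictured as a minimal data structure (a schematic sketch of the idea; the part names, relation labels, and matching rule are invented for illustration): the ventral system supplies part identities, the dorsal “object map” supplies categorical relations among part tokens, and recognition amounts to matching both against a stored structural description.

```python
# A structural description: part identities (ventral "what") plus
# categorical spatial relations among part tokens (dorsal "object map").
# Part names and relation labels are invented for this illustration.
stored_bicycle = {
    "parts": {"frame", "front_wheel", "rear_wheel", "handlebar"},
    "relations": {
        ("front_wheel", "below", "handlebar"),
        ("front_wheel", "right_of", "rear_wheel"),
        ("frame", "above", "rear_wheel"),
    },
}

def matches(percept, stored):
    """A percept matches a stored description when it contains the same
    parts and is consistent with every stored categorical relation."""
    return (percept["parts"] == stored["parts"]
            and stored["relations"] <= percept["relations"])

seen = {
    "parts": {"frame", "front_wheel", "rear_wheel", "handlebar"},
    # The percept may carry extra relations; the stored ones must all hold.
    "relations": {
        ("front_wheel", "below", "handlebar"),
        ("front_wheel", "right_of", "rear_wheel"),
        ("frame", "above", "rear_wheel"),
        ("handlebar", "above", "frame"),
    },
}
print(matches(seen, stored_bicycle))  # True
```

Because the relations are categorical, the same description matches any bicycle regardless of size, viewing distance, or exact part positions, which is what lets one description stand for a whole class of similar objects.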

Based on the above account, one would expect that lesions of the dorsal system (in partic­
ular, of the left hemisphere) would result in object recognition problems. However, as al­
ready discussed, patients with parietal lesions do not present the dramatic object recogni­
tion deficits of patients with temporal lesions. Patients with unilateral lesions localized in
the parietal lobe often appear to lack knowledge of the spatial orientation of objects, yet
they appear to achieve normal object recognition (Turnbull et al., 1995, 1997). For example, they may fail to recognize an object’s correct orientation or, when drawing from memory,
they rotate shapes by 90 or 180 degrees. Most remarkably, patients with bilateral parietal
lesions (i.e., with Bálint’s syndrome and simultanagnosia), despite being catastrophically
impaired in their perception of spatial relations between separate objects, can recognize
an individual object (albeit very slowly; Duncan et al., 2003) on the basis of its shape.
Hence, we face something of a paradox: Object identity depends on spatial representa­
tions among parts (i.e., within-object relations), but damage to the dorsal spatial systems
does not seem to affect object recognition (Farah, 1990). It may appear that representing
spatial relations “within” and “without” shapes depends on different perceptual mecha­
nisms.

However, there exists evidence that lesions in the dorsal system can cause specific types
of object recognition deficits (e.g., Warrington, 1982; Warrington & Taylor, 1973). First of
all, patients with Bálint’s syndrome do not have entirely normal object perception (Dun­
can et al., 2003; Friedman-Hill et al., 1995; Robertson et al., 1997). Specifically, patient
R.M., with bilateral parieto-occipital lesions, is unable to judge both relative and absolute
visual locations. Concomitantly, he makes mistakes in combining the colors and shapes of
separate objects or the shape of an object with the size of another (i.e., the patient com­
mits several “illusory conjunctions”). Thus, an inadequate spatial representation or loss of
spatial awareness of the features of forms, due to damage to parietal areas, appears to
underlie both the deficit in spatial judgment and that of binding shape features. Accord­
ing to feature integration theory (Treisman, 1988), perceptual representations of two sep­
arate objects currently in view require integration of information in the dorsal and ven­
tral system, so that each object’s specific combination of features in their proper loca­
tions can be obtained.

Additional evidence that spatial information plays a role in shape recognition derives
from a study with lateralized stimuli (Laeng, Shah, & Kosslyn, 1999). This study revealed a
short-lived advantage for the left hemisphere (i.e., for stimuli presented tachistoscopical­
ly to the right visual field) in the recognition of pictures of contorted poses of animals. It
was reasoned that nonrigid multipart objects (typically animal bodies but also some arti­
facts, e.g., a bicycle) can take a number of contortions that, combined with an unusual
perspective, are likely to be novel or rarely experienced by the observer. In such cases,
the visual system may opt to operate in a different mode from the usual matching of
stored representations (i.e., bottom-up matching of global templates) and initiate a hy­
pothesis-testing procedure (i.e., a top-down search for connected parts and a serial
matching of these to stored structural descriptions). In the latter case, the retrieval of
categorical spatial information (i.e., a hypothesized dorsal and left hemisphere’s function)
seems to be crucial for recognition. Abstract spatial information about the connectivity of
the object’s parts would facilitate forming a perceptual hypothesis and verifying it by matching visible parts to the object’s memorized spatial configuration. In other
words, an “object map” in the dorsal system specifies the spatial relations among the parts’ representations of the complex pattern represented by the ventral system (Kosslyn, Ganis,
& Thompson, 2006).

Figure 3.6 Stimuli used in an object recognition task with patients with unilateral posterior lesions. Patients with damage to the left hemisphere had greater difficulties with the noncanonical views of the nonrigid objects (animals) than those with damage to the right hemisphere, whereas those with damage to the right hemisphere had relatively greater difficulties with the noncanonical views of rigid objects. Reprinted with permission from Laeng et al., 2000.

(p. 47) A subsequent study (Laeng et al., 2000) of patients with unilateral posterior damage (mainly affecting the parietal lobe) confirmed that patients with left-hemisphere dam­
age had greater difficulties in recognizing the contorted bodies of animals (i.e., the same
images used in Laeng et al.’s, 1999, study) than those with right- hemisphere damage
(Figure 3.6). However, left-hemisphere damage resulted in less difficulty than right-hemi­
sphere damage when recognizing pictures of the same animals seen in conventional pos­
es but from noncanonical (unusual) views as well as when recognizing rigid objects (arti­
facts) from noncanonical views. As originally shown in studies by Warrington (1982; War­
rington & Taylor, 1973), patients with right parietal lesions showed difficulties in the
recognition or matching of objects when viewed at unconventional perspectives or in the
presence of strong shadows. According to Marr (1982), these findings suggested that the
patients’ difficulties reflect the inability to transform or align an internal spatial frame of
reference centered on the object’s intrinsic coordinates (i.e., its axes of elongation) to
match the perceived image.

To conclude, the dorsal system plays a role in object recognition, but as an optional resource (Warrington & James, 1988), cooperating with the ventral system during challenging visual situations (e.g., novel contortions of flexible objects, very unconventional views, or difficult shape-from-shadow discriminations; Warrington & James, 1986) or when making fine judgments about the shapes of objects that differ by subtle variations in size or orientation (Aguirre & D’Esposito, 1997; Faillenot et al., 1997). In ordinary circumstances, different from these “visual problem solving” (p. 48) situations (Farah, 1990), a spatial analysis provided by the dorsal system seems neither necessary nor sufficient to achieve object recognition.

8. Cognitive Maps
As thinking agents, we accumulate over a lifetime a spatial understanding of our surrounding physical world. We can remember and think about spatial relations both in the immediate, visible, physical environment and in environments at a large geographic scale that are not currently visible. We can also manipulate virtual objects in a virtual space and imagined geometry (Aflalo & Graziano, 2008). As a communicative species, we can transfer knowledge about physical space to others through symbolic systems like language and geographical maps (Liben, 2009). Finally, as a social species, we tend to organize space into territories and safety zones, and to develop a sense of personal place.

A great deal of our daily behavior must be based on spatial decisions and choices between routes, paths, and trajectories. One type of spatial representation, called the cognitive map, appears to be concerned with knowledge of large-scale space (Cheng, 1986; Kosslyn et al., 1974; Kuipers, 1978; Tolman, 1948; Wolbers & Hegarty, 2010). A distinction can be made between (1) a map-like representation consisting of a spatial frame external to the navigating organism (this representation is formed by the overall geometric shape of the environment [survey knowledge] and/or a set of spatial relationships between locales [landmarks and places]); and (2) an internal spatial frame that is based on egocentric cues generated by self-motion (route knowledge) and vestibular information (Shelton & McNamara, 2001). Cognitive maps may be based on categorical spatial information (often referred to as topological; e.g., Poucet, 1993), which affords a coarse representation of the connectivity of space and its overall arrangement, combined with coordinate (metric) information (e.g., information about angles and distances) about the large-scale environment.

Navigation (via path integration or dead reckoning, or via the more flexible map-like representation) and environmental knowledge can be disrupted by damage to a variety of brain regions. Parietal lesions result in difficulties when navigating in immediate space (DiMattia & Kesner, 1988; Stark et al., 1996) and can degrade patients’ topographic knowledge of the environment (Newcombe & Russell, 1969; Takahashi et al., 1997). However, the areas supporting cognitive map representations in several vertebrate species appear to involve portions of the hippocampus and surrounding areas (Wilson & McNaughton, 1993). As originally revealed by single-cell recording studies in rats (O’Keefe, 1976; O’Keefe & Nadel, 1978) and later in primates (O’Keefe et al., 1998; Ludvig et al., 2004) and humans (Ekstrom et al., 2003), some hippocampal cells can provide a spatial, map-like representation within a reference frame fixed onto the external environment. For example, some of these cells have visual receptive fields that do not move with the position of the animal or with changes in viewpoint but instead fire whenever the animal (e.g., the monkey; Rolls et al., 1989; Fyhn et al., 2004) is in a certain place in the local environment. Thus, these cells can play an important functional role as part of a navigational system (Lenck-Santini et al., 2001). Reactivation of place cells has also been observed during sleep episodes in rats (Wilson & McNaughton, 1994), which can be interpreted as an offline consolidation process of spatial memories. Reactivation of whole past sequences of place cell activity has been recorded in rats during maze navigation whenever they stop at a turning point (Foster & Wilson, 2006); in some cases, place cell discharges can indicate future locations along the path (Johnson & Redish, 2007) before the animals choose between alternative trajectories. Another type of cell (Figure 3.7) has been found in the rat entorhinal cortex (adjacent to the hippocampus). These cells present tessellating firing fields, or “grids” (Hafting et al., 2005; Solstad et al., 2008), that could provide the elements of a spatial map based on path integration (Kjelstrup et al., 2008; Moser et al., 2008) and thus complement the function of the place cells.

Figure 3.7 Neural firing of “place cells” and “grid cells” of rats while navigating in their cage environment.

Reprinted with permission from Moser et al., 2008.

Ekstrom and colleagues (2003) recorded directly from hippocampal and parahippocampal cells of epileptic patients undergoing neurosurgery. The patients played a videogame (a taxi-driving game in which a player navigates within a virtual city) while neural activity was recorded simultaneously from multiple cells. A significant proportion of the recorded cells showed spiking properties identical to those of the place cells already described in the rat hippocampus. Other cells were instead view responsive: they responded to the view of a specific landmark (e.g., the picture of a particular building) and were relatively more common in the patients’ parahippocampal region. Thus, these findings support an account of the human hippocampus as computing a flexible map-like representation of space by combining visual and spatial elements with a coarser representation of salient scenes, views, and landmarks formed in the parahippocampal region. In addition, neuroimaging studies in humans (p. 49) revealed activity in the hippocampal region during navigational memory tasks (e.g., in taxi drivers recalling routes; Grön et al., 2000; Maguire et al., 1997; Wolbers et al., 2007). Lesion studies of animals and neurological cases have demonstrated deficits after temporal lesions that include the hippocampus (Barrash et al., 2000; Kessels et al., 2001; Maguire et al., 1996). However, the hippocampus and the entorhinal cortex may not constitute necessary spatial structures for humans and for all types of navigational abilities; patients with lesions in these areas can maintain a path in mind and point to (and estimate the distance from) a starting point by keeping track of a reference location while moving (Shrager et al., 2008).

In rats, the parietal cortex is also clearly involved in the processing of spatial information (Save & Poucet, 2000) and constitutes another important structure for navigation (Nitz, 2006; Rogers & Kesner, 2006). One hypothesis is that the parietal cortex is involved in combining visual-spatial information and self-motion information so that egocentrically acquired information can be relayed to the hippocampus to generate and update an allocentric representation of space. Based on the role of the human dorsal system in the computation of both categorical and coordinate types of spatial representations (Laeng et al., 2003), one would expect a strong interaction between processing in the hippocampal formation and in the posterior parietal cortex. The human parietal cortex could provide both coordinate (distance and angle) and categorical information (boundary conditions, connectivity, and topological information; Poucet, 1993) to the hippocampus. In turn, the hippocampus could combine this spatial information with the spatial scenes encoded by the parahippocampal area and with ventral areas specialized for landscape object recognition (e.g., recognition of a specific building; Aguirre et al., 1998). In addition, language-based spatial information (Hermer & Spelke, 1996; Hermer-Vazquez et al., 2001) could play an active role in this navigational system.

Neuroimaging studies with humans revealed activity in the parahippocampal cortex when healthy participants passively viewed an environment or large-scale scenes (Epstein & Kanwisher, 1998), including an empty room, as well as during navigational tasks (e.g., in virtual environments; Aguirre et al., 1996; Maguire et al., 1998, 1999). Patients with damage in this area show problems in scene recognition and route learning (Aguirre & D’Esposito, 1999; Epstein et al., 2001). Subsequent research with patients and monkeys has clarified the involvement of the parahippocampal cortex in memorizing objects’ locations within a large-scale scene or a room’s geometry (Bohbot et al., 1998; Malkova & Mishkin, 2003), more than in supporting navigation or place knowledge (Burgess & O’Keefe, 2003). One proposal is that, when remembering a place or scene, the parietal cortex, based on reciprocal connections, can also translate an allocentric (north, south, east, west) parahippocampal representation into an egocentric (left, right, ahead, behind) representation (Burgess, 2008). By this account, neglect in scene imagery (e.g., the Milan square neglect experiment of Bisiach & Luzzatti, 1978) after parietal lesions would result from an intact ventral allocentric representation of space (i.e., the whole square) along with damage to the parietal egocentric representation.

Conclusion
(p. 50) As humans, we “act” in space, “know” space, and “talk” about space: three functions that together would seem to require the whole human brain to be accomplished. Indeed, research on the human brain’s representation of spatial relations includes rather different traditions and theoretical backgrounds, which taken together provide us with a complex and rich picture of our cognition of space. Neuroscience has revealed (1) the existence of topographic maps in the brain or, in other words, the brain’s representation of space by the spatial organization of the brain itself. The visual world is then represented by two higher order representational streams of the brain, anatomically located ventrally and dorsally, that make the basic distinction between (2) “what” is in the world and “where” it is. However, these two forms of information need to be integrated in other representations that specify (3) “how” an object can be acted on and “which” object is the current target. Additionally, the brain localizes objects according to multiple and parallel (4) spatial frames of reference, which are also relevant to the manner in which spatial attention is deployed. After brain damage, (5) attentional deficits or neglect clearly reveal the relevance of allocating attention along different frames of reference. Although many of the reviewed functions are shared between humans and other animals, humans show a strong degree of (6) cerebral lateralization for spatial cognition, and the current evidence indicates complementary hemispheric specializations for digital (categorical) and analog (coordinate) spatial information. The representation of categorical spatial relations is also relevant for (7) object recognition by specifying the spatial arrangement of parts within an object (i.e., the “where of what”). Humans, like other animals, can also represent space at a very large scale, as a (8) cognitive map of the external environment, which is useful for navigation.

The most striking finding of cognitive neuroscience is the considerable degree of functional specialization of the brain’s areas. Interestingly, the discovery that the visual brain separates visual information into two streams of processing (“what” versus “where”) does particular justice to Kant’s classic concept of space as a separate mode of knowledge (Moser et al., 2008). In the Critique of Pure Reason, space was defined as what is left when one ignores all the attributes of a shape: “If we remove from our empirical concept of a body, one by one, every feature in it which is empirical, the color, the hardness or softness, the weight, even the impenetrability, there still remains the space which the body (now entirely vanished) occupied, and this cannot be removed” (Kant, 1787/2008, p. 377).

Author Note
I am grateful for comments and suggestions on drafts of this chapter to Charlie Butter, Michael Peters, and Peter Svenonius.

Please address correspondence to Bruno Laeng, Ph.D., Department of Psychology, University of Oslo, 1094 Blindern, 0317 Oslo, Norway; e-mail: bruno.laeng@psykologi.uio.no.

References
Aflalo, T. N., & Graziano, M. S. A. (2006). Possible origins of the complex topographic or­
ganization of motor cortex: Reduction of a multidimensional space onto a two-dimension­
al array. Journal of Neuroscience, 26, 6288–6297.


Aflalo, T. N., & Graziano, M. S. A. (2008). Four-dimensional spatial reasoning in humans. Journal of Experimental Psychology: Human Perception and Performance, 34, 1066–1077.

Aglioti, S., Goodale, M. A., & DeSouza, J. F. X. (1995). Size-contrast illusions deceive the eye but not the hand. Current Biology, 5, 679–685.

Aguirre, G. K., & D’Esposito, M. (1997). Environmental knowledge is subserved by separable dorsal/ventral neural areas. Journal of Neuroscience, 17, 2512–2518.

Aguirre, G. K., & D’Esposito, M. (1999). Topographical disorientation: A synthesis and taxonomy. Brain, 122, 1613–1628.

Aguirre, G. K., Zarahn, E., & D’Esposito, M. (1998). An area within human ventral cortex sensitive to “building” stimuli: Evidence and implications. Neuron, 21, 373–383.

Alain, C., Arnott, S. R., Hevenor, S., Graham, S., & Grady, C. L. (2001). “What” and
“where” in the human auditory system. Proceedings of the National Academy of Sciences
U S A, 98, 12301–12306.

Alivisatos, B., & Petrides, M. (1997). Functional activation of the human brain during
mental rotation. Neuropsychologia, 35, 111–118.

Amorapanth, P. X., Widick, P., & Chatterjee, A. (2010). The neural basis for spatial rela­
tions. Journal of Cognitive Neuroscience, 22, 1739–1753.

Andersen, R. A., & Buneo, C. A. (2002). Intentional maps in posterior parietal cortex. An­
nual Review of Neuroscience, 25, 189–220.

Andersen, R. A., Essick, G. K., & Siegel, R. M. (1985). The encoding of spatial location by
posterior parietal neurons. Science, 230, 456–458.

Avillac, M., Deneve, S., Olivier, E., Pouget, A., & Duhamel, J. R. (2005). Reference frames
for representing visual and tactile locations in parietal cortex. Nature Neuroscience, 8,
941–949.

Baars, B. J. (2002). The conscious access hypothesis: Origins and recent evidence. Trends
in Cognitive Sciences, 6, 47–52.

Baciu, M., Koenig, O., Vernier, M.-P., Bedoin, N., Rubin, C., & Segebarth, C. (1999). Cate­
gorical and coordinate spatial relations: fMRI evidence for hemispheric specialization.
Neuroreport, 10, 1373–1378.

(p. 51) Bálint, R. (1909). Seelenlähmung des “Schauens”, optische Ataxie, räumliche
Störung der Aufmerksamkeit. European Neurology, 25 (1), 51–66, 67–81.

Ballard, D. H. (1986). Cortical connections and parallel processing: Structure and func­
tion. Behavioral and Brain Sciences, 9, 67–120.


Banich, M. T., & Federmeier, K. D. (1999). Categorical and metric spatial processes distin­
guished by task demands and practice. Journal of Cognitive Neuroscience, 11 (2), 153–
166.

Barlow, H. (1981). Critical limiting factors in the design of the eye and visual cortex. Proceedings of the Royal Society of London, Biological Sciences, 212, 1–34.

Barrash, J., Damasio, H., Adolphs, R., & Tranel, D. (2000). The neuroanatomical corre­
lates of route learning impairment. Neuropsychologia, 38, 820–836.

Battista, C., & Peters, M. (2010). Ecological aspects of mental rotation around the vertical
and horizontal axis. Learning and Individual Differences, 31 (2), 110–113.

Baxter, D. M., & Warrington, E. K. (1983). Neglect dysgraphia. Journal of Neurology, Neu­
rosurgery, and Psychiatry, 46, 1073–1078.

Behrmann, M., & Moscovitch, M. (1994). Object-centered neglect in patients with unilat­
eral neglect: Effects of left-right coordinates of objects. Journal of Cognitive
Neuroscience, 6, 1–16.

Behrmann, M., & Tipper, S. P. (1999). Attention accesses multiple reference frames: Evi­
dence from visual neglect. Journal of Experimental Psychology: Human Perception and
Performance, 25, 83–101.

Beschin, N., Basso, A., & Della Sala, S. (2000). Perceiving left and imagining right: Disso­
ciation in neglect. Cortex, 36, 401–414.

Beschin, N., Cubelli, R., Della Sala, S., & Spinazzola, L. (1997). Left of what? The role of
egocentric coordinates in neglect. Journal of Neurosurgery and Psychiatry, 63, 483–489.

Biederman, I. (1987). Recognition-by-components: A theory of human image understanding. Psychological Review, 94, 115–147.

Bisiach, E., Capitani, E., & Porta, E. (1985). Two basic properties of space representation
in the brain: Evidence from unilateral neglect. Journal of Neurology, Neurosurgery, and
Psychiatry, 48, 141–144.

Bisiach, E., & Luzzatti, C. (1978). Unilateral neglect of representational space. Cortex, 14,
129–133.

Block, N. (1996). How can we find the neural correlate of consciousness. Trends in Neuro­
sciences, 19, 456–459.

Bohbot, V. D., Kalina, M., Stepankova, K., Spackova, N., Petrides, M., & Nadel, L. (1998).
Spatial memory deficits in patients with lesions to the right hippocampal and the right
parahippocampal cortex. Neuropsychologia, 36, 1217–1238.

Brandt, T., & Dieterich, M. (1999). The vestibular cortex: Its locations, functions and disorders. Annals of the New York Academy of Sciences, 871, 293–312.

Bruyer, R., Scailquin, J. C., & Coibon, P. (1997). Dissociation between categorical and co­
ordinate spatial computations: Modulation by cerebral hemispheres, task properties,
mode of response, and age. Brain and Cognition, 33, 245–277.

Buccino, G., Lui, F., Canessa, N., Patteri, I., Lagravinese, G., Benuzzi, F., Porro, C.A., &
Rizzolatti, G. (2004). Neural circuits involved in the recognition of actions performed by
nonconspecifics: An fMRI study. Journal of Cognitive Neuroscience, 16, 114–126.

Bullens, J., & Postma, A. (2008). The development of categorical and coordinate spatial
relations. Cognitive Development, 23, 38–47.

Burgess, N. (2008). Spatial cognition and the brain. Annals of the New York Academy of
Sciences, 1124, 77–97.

Burgess, N., & O’Keefe, J. (2003). Neural representations in human spatial memory.
Trends in Cognitive Sciences, 7, 517–519.

Burnod, Y., Baraduc, P., Battaglia-Mayer, A., Guigon, E., Koechlin, E., Ferraina, S., Lac­
quaniti, F., & Caminiti, R. (1999). Parieto-frontal coding of reaching: An integrated frame­
work. Experimental Brain Research, 129, 325–346.

Butter, C. M., Evans, J., Kirsh, N., & Kewman, D. (1989). Altitudinal neglect following trau­
matic brain injury: A case report. Cortex, 25, 135–146.

Butters, N., Barton, M., & Brody, B. A. (1970). Role of the right parietal lobe in the media­
tion of cross-modal associations and reversible operations in space. Cortex, 6, 174–190.

Caramazza, A., & Hillis, A. E. (1990). Spatial representation of words in the brain implied
by the studies of a unilateral neglect patient. Nature, 346, 267–269.

Carey, D. P., Dijkerman, H. C., Murphy, K. J., Goodale, M. A., & Milner, A. D. (2006). Point­
ing to places and spaces in a patient with visual form agnosia. Neuropsychologia, 44,
1584–1594.

Carey, D. P., Harvey, M., & Milner, A. D. (1996). Visuomotor sensitivity for shape and ori­
entation in a patient with visual form agnosia. Neuropsychologia, 3, 329–337.

Carlson, L., Regier, T., & Covey, E. (2003). Defining spatial relations: Reconciling axis and
vector representations. In E. van der Zee & J. Slack (Eds.), Representing direction in lan­
guage and space (pp. 111–131). Oxford, UK: Oxford University Press.

Carpenter, P. A., Just, M. A., Keller, T. A., Eddy, W., & Thulborn, K. (1999a). Graded func­
tional activation in the visuospatial system with the amount of task demand. Journal of
Cognitive Neuroscience, 11, 9–24.

Carpenter, P. A., Just, M. A., Keller, T. A., Eddy, W., & Thulborn, K. (1999b). Time-course of
fMRI activation in language and spatial networks during sentence comprehension. Neu­
roImage, 10, 216–224.


Casati, R., & Varzi, A. (1999). Parts and places: The structures of spatial representation.
Boston: MIT Press.

Castiello, U. (2005). The neuroscience of grasping. Nature Reviews: Neuroscience, 6,


726–736.

Cavanagh, P. (1998). Attention: Exporting vision to the mind. In S. Saida & P. Cavanagh
(Eds.), Selection and integration of visual information, pp. 3–11. Tsukuba, Japan: STA &
NIBH-T.

Chafee, M. V., Averbeck, B. B., & Crowe, D. A. (2007). Representing spatial relationships
in posterior parietal cortex: Single neurons code object-referenced position. Cerebral Cor­
tex, 17, 2914–2932.

Chafee, M. V., Crowe, D. A., Averbeck, B. B., & Georgopoulos, A. P. (2005). Neural corre­
lates of spatial judgement during object construction in parietal cortex. Cerebral Cortex,
15, 1393–1413.

Chao, L. L., & Martin, A. (2000). Representation of manipulable man-made objects in the
dorsal stream. NeuroImage, 12, 478–484.

Cheng, K. (1986). A purely geometric module in the rat’s spatial representation. Cognition, 23, 149–178.

Cherniak, C. (1990). The bounded brain: Toward a quantitative neuroanatomy. Journal of Cognitive Neuroscience, 2, 58–68.

(p. 52) Clément, G., & Reschke, M. F. (2008). Neuroscience in space. New York: Springer.

Colby, C. L., & Goldberg, M. E. (1999). Space and attention in parietal cortex. Annual Review of Neuroscience, 22, 319–349.

Collett, T. (1982). Do toads plan routes? A study of the detour behaviour of Bufo viridis. Journal of Comparative Physiology, 146, 261–271.

Committeri, G., Galati, G., Paradis, A. L., Pizzamiglio, L., Berthoz, A., & LeBihan, D.
(2004). Reference frames for spatial cognition: Different brain areas are involved in view­
er-, object-, and landmark-centered judgments about object location. Journal of Cognitive
Neuroscience, 16, 1517–1535.

Committeri, G., Pitzalis, S., Galati, G., Patria, F., Pelle, G., Sabatini, U., Castriota-Scander­
beg, A., Piccardi, L., Guariglia, C., & Pizzamiglio L. (2007). Neural bases of personal and
extrapersonal neglect in humans. Brain, 130, 431–441.

Constantinidis, C., & Steinmetz, M. A. (2001). Neuronal responses in Area 7a to multiple-stimulus displays: I. Neurons encode the location of the salient stimulus. Cerebral Cortex, 11, 581–591.

Cook, W. A. (1989). Case grammar theory. Washington, DC: Georgetown University Press.

Corbetta, M., Akbudak, E., Conturo, T. E., Snyder, A. Z., Ollinger, J. M., Drury, H. A., Linenweber, M. R., Petersen, S. E., Raichle, M. E., Van Essen, D. C., & Shulman, G. L. (1998). A common network of functional areas for attention and eye movements. Neuron, 21, 761–773.

Corbetta, M., Kincade, J. M., Ollinger, J. M., McAvoy, M. P., & Shulman, G. L. (2000). Voluntary orienting is dissociated from target detection in human posterior parietal cortex. Nature Neuroscience, 3, 292–297.

Corbetta, M., Miezin, F. M., Shulman, G. L., & Petersen, S. E. (1993). A PET study of visu­
ospatial attention. Journal of Neuroscience, 13, 1202–1226.

Corbetta, M., Patel, G., & Shulman, G. L. (2008). The reorienting system of the human
brain: From environment to theory of mind. Neuron, 58, 306–324.

Courtney, S. M., Ungerleider, L. G., Keil, K., & Haxby, J. V. (1997). Transient and sustained
activity in a distributed neural system for human working memory. Nature, 386, 608–611.

Cowey, A., Small, M., & Ellis, S. (1994). Left visuo-spatial neglect can be worse in far than
in near space. Neuropsychologia, 32, 1059–1066.

Crowe, D. A., Averbeck, B. B., Chafee, M. V., & Georgopoulos, A. P. (2005). Dynamics of
parietal neural activity during spatial cognitive processing. Neuron, 47, 885–891.

Culham, J. C., Brandt, S. A., Cavanagh, P., Kanwisher, N. G., Dale, A. M., & Tootell, R. B.
H. (1998). Cortical fMRI activation produced by attentive tracking of moving targets.
Journal of Neurophysiology, 80, 2657–2670.

Culham, J. C., Cavanagh, P., & Kanwisher, N. G. (2001). Attention response functions: Characterizing brain areas using fMRI activation during parametric variations of attentional load. Neuron, 32, 737–745.

Culham, J. C., Danckert, S. L., DeSouza, J. F., Gati, J. S., Menon, R. S., & Goodale, M. A.
(2003). Visually guided grasping produces fMRI activation in dorsal but not ventral
stream brain areas. Experimental Brain Research, 153 (2), 180–189.

Culham, J. C., & Valyear, K. F. (2006). Human parietal cortex in action. Current Opinion in
Neurobiology, 16 (2), 205–212.

Damasio, H., Grabowski, T. J., Tranel, D., Ponto, L. L. B., Hichwa, R. D., & Damasio, A. R.
(2001). Neural correlates of naming actions and of naming spatial relations. NeuroImage,
13, 1053–1064.

Dehaene, S. (1997). The number sense. Oxford, UK: Oxford University Press.

Dehaene, S., & Changeux, J.-P. (1993). Development of elementary numerical abilities: A
neuronal model. Journal of Cognitive Neuroscience, 5, 390–407.


Denys, K., Vanduffel, W., Fize, D., Nelissen, K., Peuskens, H., Van Essen, D., & Orban, G.
A. (2004). The processing of visual shape in the cerebral cortex of human and nonhuman
primates: A functional magnetic resonance imaging study. Journal of Neuroscience, 24,
2551–2565.

De Renzi, E. (1982). Disorders of space exploration and cognition. New York: John Wiley
& Sons.

De Renzi, E., & Vignolo, L. (1962). The Token Test: A sensitive test to detect receptive dis­
turbances in aphasics. Brain, 85, 665–678.

DeYoe, E. A., Carman, G. J., Bandettini, P., Glickman, S., Wieser, J., Cox, R., Miller, D., &
Neitz, J. (1996). Mapping striate and extrastriate visual areas in human cerebral cortex.
Proceedings of the National Academy of Sciences U S A, 93, 2382–2386.

DiMattia, B. V., & Kesner, R. P. (1988). Role of the posterior parietal association cortex in
the processing of spatial event information. Behavioral Neuroscience, 102, 397–403.

Driver, J., & Pouget, A. (2000). Object-centered visual neglect, or relative egocentric neglect? Journal of Cognitive Neuroscience, 12 (3), 542–545.

Duhamel, J.-R., Bremmer, F., BenHamed, S., & Graf, W. (1997). Spatial invariance of visual receptive fields in parietal cortex neurons. Nature, 389, 845–848.

Duncan, J., Bundesen, C., Olson, A., Humphreys, G., Ward, R., Kyllingsbæk, S., van Raams­
donk, M., Rorden, R., & Chavda, S. (2003). Attentional functions in dorsal and ventral si­
multanagnosia. Cognitive Neuropsychology, 20, 675–701.

Ekstrom, A. D., Kahana, M. J., Caplan, J. B., Fields, T. A., Isham, E. A., Newman, E. L., &
Fried, I. (2003). Cellular networks underlying human spatial navigation. Nature, 425,
184–187.

Emmorey, K., Damasio, H., McCullough, S., Grabowski, T., Ponto, L., Hichwa, R., & Bellu­
gi, U. (2002). Neural systems underlying spatial language in American Sign Language.
Neuroimage, 17, 812–824.

Engel, S. A., Glover, G. H., & Wandell, B. A. (1997). Retinotopic organization in human visual cortex and the spatial precision of functional MRI. Cerebral Cortex, 7, 181–192.

Engel, S. A., Rumelhart, D. E., Wandell, B. A., Lee, A. T., Glover, G. H., Chichilnisky, E. J.,
& Shadlen, M. N. (1994). fMRI of human visual cortex. Nature, 369, 525.

Epstein, R., DeYoe, E. A., Press, D. Z., Rosen, A. C., & Kanwisher, N. (2001). Neuropsycho­
logical evidence for a topographical learning mechanism in parahippocampal cortex. Cog­
nitive Neuropsychology, 18, 481–508.

Epstein, R., & Kanwisher, N. (1998). A cortical representation of the local visual environ­
ment. Nature, 392 (6676), 598–601.


Eurich, C. W., & Schwegler, H. (1997). Coarse coding: Calculation of the resolution
achieved by a population of large receptive field neurons. Biological Cybernetics, 76, 357–
363.

(p. 53) Fahle, M., & Poggio, T. (1981). Visual hyperacuity: Spatiotemporal interpolation in human vision. Proceedings of the Royal Society of London, Series B, 213, 451–477.

Faillenot, I., Toni, I., Decety, J., Grégoire, M.-C., & Jeannerod, M. (1997). Visual pathways
for object-oriented action and object recognition: Functional anatomy with PET. Cerebral
Cortex, 7, 77–85.

Fang, F., & He, S. (2005). Cortical responses to invisible objects in the human dorsal and
ventral pathways. Nature Neuroscience, 8, 1380–1385.

Farah, M. J. (1990). Visual agnosia: Disorders of object recognition and what they tell us
about normal vision. Cambridge, MA: MIT Press.

Farrell, M. J., & Robertson, I. H. (2000). The automatic updating of egocentric spatial re­
lationships and its impairment due to right posterior cortical lesions. Neuropsychologia,
38, 585–595.

Feldman, J. (1985). Four frames suffice: A provisional model of vision and space. Behav­
ioral and Brain Sciences, 8, 265–289.

Felleman, D. J., & Van Essen, D. C. (1991). Distributed hierarchical processing in primate cerebral cortex. Cerebral Cortex, 1, 1–47.

Fishman, R. S. (1997). Gordon Holmes, the cortical retina, and the wounds of war. Docu­
menta Ophthalmologica, 93, 9–28.

Foster, D. J., & Wilson, M. A. (2006). Reverse replay of behavioural sequences in hippocampal place cells during the awake state. Nature, 440, 680–683.

Franco, L., & Sperry, R. W. (1977). Hemisphere lateralization for cognitive processing of geometry. Neuropsychologia, 15, 107–114.

Freedman, D. J., & Assad, J. A. (2006). Experience-dependent representation of visual categories in parietal cortex. Nature, 443, 85–88.

Friederici, A. D. (1982). Syntactic and semantic processes in aphasic deficits: The avail­
ability of prepositions. Brain and Language, 15, 249–258.

Friedman-Hill, S. R., Robertson, L. C., & Treisman, A. (1995). Parietal contributions to vi­
sual feature binding: Evidence from a patient with bilateral lesions. Science, 269, 853–
855.

Fyhn, M., Molden, S., Witter, M. P., Moser, E. I., & Moser, M.-B. (2004). Spatial represen­
tation in the entorhinal cortex. Science, 305, 1258–1264.

Gallivan, J. P., Cavina-Pratesi, C., & Culham, J. C. (2009). Is that within reach? fMRI re­
veals that the human superior parieto-occipital cortex (SPOC) encodes objects reachable
by the hand. Journal of Neuroscience, 29, 4381–4391.

Gattass, R., Nascimento-Silva, S., Soares, J. G. M., Lima, B., Jansen, A. K., Diogo, A. C. M.,
Farias, M. F., Marcondes, M., Botelho, E. P., Mariani, O. S., Azzi, J., & Fiorani, M. (2005).
Cortical visual areas in monkeys: Location, topography, connections, columns, plasticity
and cortical dynamics. Philosophical Transactions of the Royal Society, B, 360, 709–731.

Glickstein, M., Buchbinder, S., & May, J. L. (1998). Visual control of the arm, the wrist and
the fingers: Pathways through the brain. Neuropsychologia, 36, 981–1001.

Goel, V., Gold, B., Kapur, S., & Houle, S. (1998). Neuroanatomical correlates of human
reasoning. Journal of Cognitive Neuroscience, 10, 293–302.

Goldman-Rakic, P. S. (1987). Circuitry of primate prefrontal cortex and regulation of behavior by representational memory. Handbook of Physiology, 5, 373–417.

Goodale, M.A., & Milner, A. D. (1992). Separate visual pathways for perception and ac­
tion. Trends in Neurosciences, 15, 20–25.

Goodale, M. A., Milner, A. D., Jakobson, L. S., & Carey, D. P. (1991). A neurological dissoci­
ation between perceiving objects and grasping them. Nature, 349, 154–156.

Goodrich-Hunsaker, N. J., Howard, B. P., Hunsaker, M. R., & Kesner, R. P. (2008). Human
topological task adapted for rats: Spatial information processes of the parietal cortex.
Neurobiology of Learning and Memory, 90, 389–394.

Graziano, M. S. A., & Cooke, D. F. (2006). Parieto-frontal interactions, personal space, and
defensive behavior. Neuropsychologia, 44, 845–859.

Graziano, M. S. A., & Gross, C. G. (1995). Multiple representations of space in the brain.
The Neuroscientist, 1, 43–50.

Graziano, M. S. A., & Gross, C. G. (1998). Spatial maps for the control of movement. Cur­
rent Opinion in Neurobiology, 8, 195–201.

Graziano, M. S. A., Hu, X. T., & Gross, C. G. (1997). Coding the locations of objects in the
dark. Science, 277, 239–241.

Graziano, M. S. A., Reiss, L. A. J., & Gross, C. G. (1999). A neuronal representation of the
location of nearby sounds. Nature, 397, 428–430.

Graziano, M. S. A., Yap, G. S., & Gross, C. G. (1994). Coding of visual space by premotor neurons. Science, 266, 1054–1057.

Gregory, R. (2009). Seeing through illusions. Oxford, UK: Oxford University Press.

Grön, G., Wunderlich, A. P., Spitzer, M., Tomczak, R., & Riepe, M. W. (2000). Brain activa­
tion during human navigation: Gender-different neural networks as a substrate of perfor­
mance. Nature Neuroscience, 3, 404–408.

Gross, C.G., & Graziano, M. S. (1995). Multiple representations of space in the brain.
Neuroscientist, 1, 43–50.

Gross, C. G., & Mishkin, M. (1977). The neural basis of stimulus equivalence across retinal translation. In S. Harnad, R. Doty, J. Jaynes, L. Goldstein, & G. Krauthamer (Eds.), Lateralization in the nervous system (pp. 109–122). New York: Academic Press.

Hafting, T., Fyhn, M., Molden, S., Moser, M.-B., & Moser, E. I. (2005). Microstructure of a
spatial map in the entorhinal cortex. Nature, 436, 801–806.

Halligan, P. W., & Marshall, J. C. (1991). Left neglect for near but not far space in man.
Nature, 350, 498–500.

Halligan, P. W., & Marshall, J. C. (1995). Lateral and radial neglect as a function of spatial
position: A case study. Neuropsychologia, 33, 1697–1702.

Harris, I. M., Egan, G. F., Sonkkila, C., Tochon-Danguy, H. J., Paxinos, G., & Watson, J. D.
(2000). Selective right parietal lobe activation during mental rotation: A parametric PET
study. Brain, 123, 65–73.

Harris, I. M., & Miniussi, C. (2003). Parietal lobe contribution to mental rotation demon­
strated with rTMS. Journal of Cognitive Neuroscience, 15, 315–323.

Haxby, J. V., Grady, C. L., Horwitz, B., Ungerleider, L. G., Mishkin, M., Carson, R. E., et al.
(1991). Dissociation of object and spatial visual processing pathways in human extrastri­
ate cortex. Proceedings of the National Academy of Sciences U S A, 88, 1621–1625.

Hayward, W. G., & Tarr, M. J. (1995). Spatial language and spatial representation. Cogni­
tion, 55, 39–84.

He, B. J., Snyder, A. Z., Vincent, J. L., Epstein, A., Shulman, G. L., & Corbetta, M. (2007). Breakdown of functional connectivity in frontoparietal networks underlies behavioral deficits in spatial neglect. Neuron, 53, 905–918.

Heide, W., Blankenburg, M., Zimmermann, E., & Kompf, D. (1995). Cortical control of double-step saccades: Implications for spatial orientation. Annals of Neurology, 38, 739–748.

Heilman, K. M., Bowers, D., Coslett, H. B., Whelan, H., & Watson, R. T. (1985). Directional hypokinesia: Prolonged reaction times for leftward movements in patients with right hemisphere lesions and neglect. Neurology, 35, 855–859.

Heilman, K. M., & Van Den Abell, T. (1980). Right hemisphere dominance for attention:
The mechanism underlying hemispheric asymmetries of inattention (neglect). Neurology,
30, 327–330.

Hellige, J. B., & Michimata, C. (1989). Categorization versus distance: Hemispheric differ­
ences for processing spatial information. Memory & Cognition, 17, 770–776.

Hermer, L., & Spelke, E. (1996). Modularity and development: The case of spatial reorien­
tation. Cognition, 61, 195–232.

Hermer-Vazquez, L., Moffet, A., & Munkholm, P. (2001). Language, space, and the devel­
opment of cognitive flexibility in humans: The case of two spatial memory tasks. Cogni­
tion, 79, 263–299.

Hillis, A. E. (2006). Neurobiology of unilateral spatial neglect. Neuroscientist, 12, 153–163.

Hillis, A. E., & Caramazza, A. (1991). Spatially-specific deficit to stimulus-centered letter shape representations in a case of “neglect dyslexia.” Neuropsychologia, 29, 1223–1240.

Hillis, A. E., Newhart, M., Heidler, J., Barker, P. B., & Degaonkar, M. (2005). Anatomy of
spatial attention: Insights from perfusion imaging and hemispatial neglect in acute
stroke. Journal of Neuroscience, 25, 3161–3167.

Hinton, G. E., McClelland, J. L., & Rumelhart, D. E. (1986). Distributed representations. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Volume 1: Foundations (pp. 77–109). Cambridge, MA: MIT Press.

Holmes, G., & Horrax, G. (1919). Disturbances of spatial orientation and visual attention, with loss of stereoscopic vision. Archives of Neurology and Psychiatry, 1, 385–407.

Horton, J. C., & Hoyt, W. F. (1991). The representation of the visual field in human striate
cortex: A revision of the classic Holmes map. Archives of Ophthalmology, 109, 816–824.

Hubbard, E. M., Piazza, M., Pinel, P., & Dehaene, S. (2005). Interactions between number
and space in parietal cortex. Nature Reviews, Neuroscience, 6, 435–448.

Hummel, J. E., & Biederman, I. (1992). Dynamic binding in a neural network for shape
recognition. Psychological Review, 99, 480–517.

Humphreys, G. W., & Riddoch, M. J. (1994). Attention to within-object and between-object spatial representation: Multiple sites for visual selection. Cognitive Neuropsychology, 11, 207–241.

Ingle, D. J. (1967). Two visual mechanisms underlying the behaviour of fish. Psychologis­
che Forschung, 31, 44–51.

Ings, S. (2007). A natural history of seeing. New York: Norton & Company.

Jackendoff, R., & Landau, B. (1992). Spatial language and spatial cognition. In R. Jackend­
off (Ed.), Languages of the mind: Essays on mental representation (pp. 99–124). Cam­
bridge, MA: MIT Press.

Jacobs, R. A. (1997). Nature, nurture, and the development of functional specializations: A computational approach. Psychonomic Bulletin & Review, 4, 299–309.

Jacobs, R. A. (1999). Computational studies of the development of functionally specialized modules. Trends in Cognitive Sciences, 3, 31–38.

Jacobs, R. A., & Kosslyn, S. M. (1994). Encoding shape and spatial relations: The role of
receptive field size in coordinating complementary representations. Cognitive Science,
18, 361–386.

James, T. W., Culham, J. C., Humphrey, G. K., Milner, A. D., & Goodale, M. A. (2003). Ventral occipital lesions impair object recognition but not object-directed grasping: An fMRI study. Brain, 126, 2463–2475.

Jeannerod, M., Decety, J., & Michel, F. (1994). Impairment of grasping movements follow­
ing a bilateral posterior parietal lesion. Neuropsychologia, 32, 369–380.

Jeannerod, M., & Jacob, P. (2005). Visual cognition: A new look at the two-visual systems
model. Neuropsychologia, 43, 301–312.

Johnsen, S., & Lohmann, K. J. (2005). The physics and neurobiology of magnetoreception.
Nature Review Neuroscience, 6, 703–712.

Johnson, H., & Haggard, P. (2005). Motor awareness without perceptual awareness. Neu­
ropsychologia, 43, 227–237.

Johnson, A., & Redish, A. D. (2007). Neural ensembles in CA3 transiently encode paths
forward of the animal at a decision point. Journal of Neuroscience, 27, 12176–12189.

Johnson-Laird, P. N. (2005). Mental models in thought. In K. Holyoak & R. J. Sternberg (Eds.), The Cambridge handbook of thinking and reasoning (pp. 179–212). Cambridge, UK: Cambridge University Press.

Jordan, K., Wustenberg, T., Heinze, H. J., Peters, M., & Jancke, L. (2002). Women and men
exhibit different cortical activation patterns during mental rotation tasks. Neuropsycholo­
gia, 40, 2397–2408.

Just, M. A., Carpenter, P. A., Maguire, M., Diwadkar, V., & McMains, S. (2001). Mental ro­
tation of objects retrieved from memory: A functional MRI study of spatial processing.
Journal of Experimental Psychology: General, 130, 493–504.

Kaas, J. H. (1997). Topographic maps are fundamental to sensory processing. Brain Re­
search Bulletin, 44, 107–112.

Kahane, P., Hoffman, D., Minotti, L., & Berthoz, A. (2003). Reappraisal of the human vestibular cortex by cortical electrical stimulation study. Annals of Neurology, 54, 615–624.

Kahneman, D., Treisman, A., & Gibbs, B. (1992). The reviewing of object files: Object-spe­
cific integration of information. Cognitive Psychology, 24, 175–219.

Kant, I. (1781). Kritik der reinen Vernunft (translated as Critique of Pure Reason, Penguin Classics, 2008).

Karnath, H. O. (2001). New insights into the functions of the superior temporal cortex. Nature Reviews, Neuroscience, 2, 568–576.

Karnath, H. O., Ferber, S., & Himmelbach, M. (2001). Spatial awareness is a function of
the temporal not the posterior parietal lobe. Nature, 411, 950–953.

Khan, A. Z., Pisella, L., Vighetto, A., Cotton, F., Luauté, J., Boisson, D., Salemme, R., Craw­
ford, J. D., & Rossetti, Y. (2005). Optic ataxia errors depend on remapped, not viewed, tar­
get location. Nature Neuroscience, 8, 418–420.

Kastner, S., Demner, I., & Ziemann, U. (1998). Transient visual field defects induced by
transcranial magnetic stimulation. Experimental Brain Research, 118, 19–26.

Kastner, S., DeSimone, K., Konen, C. S., Szczepanski, S. M., Weiner, K. S., & Schneider, K. A. (2007). Topographic maps in human frontal cortex revealed in memory-guided saccade and spatial working-memory tasks. Journal of Neurophysiology, 97, 3494–3507.

Kastner, S., Pinsk, M. A., De Weerd, P., Desimone, R., & Ungerleider, L. G. (1999). In­
creased activity in human visual cortex during directed attention in the absence of visual
stimulation. Neuron, 22, 751–761.

Kemmerer, D. (2006). The semantics of space: Integrating linguistic typology and cogni­
tive neuroscience. Neuropsychologia, 44, 1607–1621.

Kemmerer, D., & Tranel, D. (2000). A double dissociation between linguistic and percep­
tual representations of spatial relationships. Cognitive Neuropsychology, 17, 393–414.

Kemmerer, D., & Tranel, D. (2003). A double dissociation between the meanings of action
verbs and locative prepositions. Neurocase, 9, 421–435.

Kessels, R. P. C., de Haan, E. H. F., Kappelle, L. J., & Postma, A. (2001). Varieties of human
spatial memory: A meta-analysis on the effects of hippocampal lesions. Brain Research
Reviews, 35, 295–303.

Kessels, R. P. C., Kappelle, L. J., de Haan, E. H. F., & Postma, A. (2002). Lateralization of
spatial-memory processes: evidence on spatial span, maze learning, and memory for ob­
ject locations. Neuropsychologia, 40, 1465–1473.

Kirchner, W. H., & Braun, U. (1994). Dancing honey bees indicate the location of food sources using path integration rather than cognitive maps. Animal Behaviour, 48, 1437–1441.

Kinsbourne, M. (1993). Orientational bias model of unilateral neglect: Evidence from at­
tentional gradients within hemispace. In I. H. Robertson & J. C. Marshall (Eds.), Unilater­
al neglect: Clinical and experimental studies (pp. 63–86). Hillsdale, NJ: Erlbaum.

Kinsbourne, M., & Warrington, E. K. (1962). A disorder of simultaneous form perception. Brain, 85, 461–486.

Kitada, R., Kito, T., Saito, D. N., Kochiyama, T., Matsumura, M., Sadato, N., & Lederman, S. J. (2006). Multisensory activation of the intraparietal area when classifying grating orientation: A functional magnetic resonance imaging study. Journal of Neuroscience, 26, 7491–7501.

Kjelstrup, K. B., Solstad, T., Brun, V. H., Hafting, T., Leutgeb, S., Witter, M. P., Moser, E. I.,
& Moser, M.-B. (2008). Finite scales of spatial representation in the hippocampus.
Science, 321, 140–143.

Koch, C. (2004). The quest for consciousness: A neurobiological approach. Englewood, CO: Roberts and Company.

Kohonen, T. (2001). Self-organizing maps. Berlin: Springer.

Kosslyn, S. M. (1987). Seeing and imagining in the cerebral hemispheres: A computational approach. Psychological Review, 94 (2), 148–175.

Kosslyn, S. M. (1980). Image and mind. Cambridge, MA: Harvard University Press.

Kosslyn, S. M. (1994). Image and brain: The resolution of the imagery debate. Cambridge,
MA: MIT Press.

Kosslyn, S. M., Chabris, C. F., Marsolek, C. J., & Koenig, O. (1992). Categorical versus co­
ordinate spatial relations: Computational analyses and computer simulations. Journal of
Experimental Psychology: Human Perception and Performance, 18 (2), 562–577.

Kosslyn, S. M., DiGirolamo, G. J., Thompson, W. L., & Alpert, N. M. (1998). Mental rotation
of objects versus hands: neural mechanisms revealed by positron emission tomography.
Psychophysiology, 35, 151–161.

Kosslyn, S. M., & Jacobs, R. A. (1994). Encoding shape and spatial relations: A simple
mechanism for coordinating complementary representations. In V. Honavar & L. M. Uhr
(Eds.), Artificial intelligence and neural networks: Steps toward principled integration
(pp. 373–385). Boston: Academic Press.

Kosslyn, S. M., & Koenig, O. (1992). Wet mind: The new cognitive neuroscience. New York: Free Press.

Kosslyn, S. M., Koenig, O., Barrett, A., Cave, C. B., Tang, J., & Gabrieli, J. D. E. (1989). Evidence for two types of spatial representations: Hemispheric specialization for categorical and coordinate relations. Journal of Experimental Psychology: Human Perception and Performance, 15 (4), 723–735.

Kosslyn, S. M., Maljkovic, V., Hamilton, S. E., Horwitz, G., & Thompson, W. L. (1995). Two
types of image generation: Evidence for left and right hemisphere processes. Neuropsy­
chologia, 33 (11), 1485–1510.

Kosslyn, S. M., Pick, H. L., & Fariello, G. R. (1974). Cognitive maps in children and men.
Child Development, 45, 707–716.

Kosslyn, S. M., Thompson, W. L., & Ganis, G. (2006). The case for mental imagery. New York: Oxford University Press.

Kosslyn, S. M., Thompson, W. L., Gitelman, D. R., & Alpert, N. M. (1998). Neural systems that encode categorical versus coordinate spatial relations: PET investigations. Psychobiology, 26 (4), 333–347.

Króliczak, G., Heard, P., Goodale, M. A., & Gregory, R. L. (2006). Dissociation of percep­
tion and action unmasked by the hollow-face illusion. Brain Research, 1080, 9–16.

Kuipers, B. (1978). Modeling spatial knowledge. Cognitive Science, 2, 129–153.

Lacquaniti, F., Guigon, E., Bianchi, L., Ferraina, S., & Caminiti, R. (1995). Representing
spatial information for limb movement: The role of area 5 in the monkey. Cerebral Cortex,
5, 391–409.

Laeng, B. (1994). Lateralization of categorical and coordinate spatial functions: A study of unilateral stroke patients. Journal of Cognitive Neuroscience, 6 (3), 189–203.

Laeng, B. (2006). Constructional apraxia after left or right unilateral stroke. Neuropsy­
chologia, 44, 1519–1523.

Laeng, B., Brennen, T., Johannessen, K., Holmen, K., & Elvestad, R. (2002). Multiple refer­
ence frames in neglect? An investigation of the object-centred frame and the dissociation
between “near” and “far” from the body by use of a mirror. Cortex, 38, 511–528.

Laeng, B., Carlesimo, G. A., Caltagirone, C., Capasso, R., & Miceli, G. (2000). Rigid and
non-rigid objects in canonical and non-canonical views: Effects of unilateral stroke on ob­
ject identification. Cognitive Neuropsychology, 19, 697–720.

Laeng, B., Chabris, C. F., & Kosslyn, S. M. (2003). Asymmetries in encoding spatial rela­
tions. In K. Hugdahl and R. Davidson (Eds.), The asymmetrical brain (pp. 303–339). Cam­
bridge, MA: MIT Press.

Laeng, B., Okubo, M., Saneyoshi, A., & Michimata, C. (2011). Processing spatial relations
with different apertures of attention. Cognitive Science, 35, 297–329.

Laeng, B., & Peters, M. (1995). Cerebral lateralization for the processing of spatial coor­
dinates and categories in left- and right-handers. Neuropsychologia, 33, 421–439.

Laeng, B., Peters, M., & McCabe, B. (1997). Memory for locations within regions. Spatial
biases and visual hemifield differences. Memory and Cognition, 26, 97–107.

Laeng, B., Shah, J., & Kosslyn, S. M. (1999). Identifying objects in conventional and con­
torted poses: Contributions of hemisphere-specific mechanisms. Cognition, 70 (1), 53–85.

Laeng, B., & Teodorescu, D.-S. (2002). Eye scanpaths during visual imagery reenact those of perception of the same visual scene. Cognitive Science, 26, 207–231.

Lakoff, G., & Johnson, M. (1999). Philosophy in the flesh. Chicago: University of Chicago
Press.

Lamb, T. D., Collin, S. P., & Pugh, E. N. (2007). Evolution of the vertebrate eye: Opsins,
photoreceptors, retina and eye cup. Nature Reviews, Neuroscience, 8, 960–975.

Lamme, V. A. F. (2003). Why visual attention and awareness are different. Trends in Cog­
nitive Sciences, 7, 12–18.

Lamme, V. A. F. (2006). Towards a true neural stance on consciousness. Trends in Cognitive Sciences, 10, 494–501.

Lamme, V. A. F., & Roelfsema, P. R. (2000). The distinct modes of vision offered by feed­
forward and recurrent processing. Trends in Neuroscience, 23, 571–579.

Lamme, V. A. F., Super, H., Landman, R., Roelfsema, P. R., & Spekreijse, H. (2000). The
role of primary visual cortex (V1) in visual awareness. Vision Research, 40, 1507–1521.

Landau, B., Hoffman, J. E., & Kurz, N. (2006). Object recognition with severe spatial
deficits in Williams syndrome: Sparing and breakdown. Cognition, 100, 483–510.

LeDoux, J. E., & Gazzaniga, M. S. (1978). The integrated mind. New York: Plenum.

LeDoux, J. E., Wilson, D. H., & Gazzaniga, M. S. (1977). Manipulospatial aspects of cere­
bral lateralization: Clues to origin of lateralization. Neuropsychologia, 15, 743–750.

Lenck-Santini, P.-P., Save, E., & Poucet, B. (2001). Evidence for a relationship between
place-cell spatial firing and spatial memory performance. Hippocampus, 11, 337–390.

Liben, L. S. (2009). The road to understanding maps. Current Directions in Psychological Science, 18, 310–315.

Livingstone, M. S., & Hubel, D. H. (1988). Segregation of form, color, movement, and
depth: Anatomy, physiology, and perception. Science, 240, 740–749.

Lomber, S. G., & Malhotra, S. (2008). Double dissociation of “what” and “where” process­
ing in auditory cortex. Nature Neuroscience, 11, 609–616.

Ludvig, N., Tang, H. M., Gohil, B. C., & Botero, J. M. (2004). Detecting location-specific
neuronal firing rate increases in the hippocampus of freely moving monkeys. Brain Re­
search, 1014, 97–109.

Luna, B., Thulborn, K. R., Strojwas, M. H., McCurtain, B. J., Berman, R. A., Genovese, C. R., et al. (1998). Dorsal cortical regions subserving visually guided saccades in humans: An fMRI study. Cerebral Cortex, 8 (1), 40–47.

Luria, A. R. (1963). Restoration of function after brain injury. Pergamon Press.

Luria, A. R. (1973). The working brain: An introduction to neuropsychology. New York: Basic Books.

Maguire, E. A., Burgess, N., & O’Keefe, J. (1999). Human spatial navigation: Cognitive maps, sexual dimorphism, and neural substrates. Current Opinion in Neurobiology, 9, 171–177.

Maguire, E. A., Burke, T., Phillips, J., & Staunton, H. (1996). Topographical disorientation
following unilateral temporal lobe lesions in humans. Neuropsychologia, 34, 993–1001.

Maguire, E. A., Frackowiak, R. S., & Frith, C. D. (1997). Recalling routes around London:
Activation of the right hippocampus in taxi drivers. Journal of Neuroscience, 17, 7103–
7110.

Maguire, E. A., Frith, C. D., Burgess, N., Donnett, J. G., & O’Keefe, J. (1998). Knowing where things are: Parahippocampal involvement in encoding object locations in virtual large-scale space. Journal of Cognitive Neuroscience, 10, 61–76.

Mahon, B. Z., Milleville, S. C., Negri, G. A. L., Rumiati, R. I., Caramazza, A., & Martin, A.
(2007). Action-related properties shape object representations in the ventral stream. Neu­
ron, 55, 507–520.

Malach, R., Levy, I., & Hasson, U. (2002). The topography of high-order human object ar­
eas. Trends in Cognitive Science, 6, 176–184.

Malkova, L., & Mishkin, M. (2003). One-trial memory for object-place associations after separate lesions of hippocampus and posterior parahippocampal region in the monkey. Journal of Neuroscience, 23 (5), 1956–1965.

Markman, A. B. (1999). Knowledge representation. Mahwah, NJ: Psychology Press.

Marr, D. (1982). Vision. San Francisco: Freeman and Company.

Maunsell, J. H. R., & Newsome, W. T. (1987). Visual processing in the monkey extrastriate
cortex. Annual Review of Neuroscience, 10, 363–401.

Medendorp, W. P., Goltz, H. C., Vilis, T., & Crawford, J. D. (2003). Gaze-centered updating
of visual space in human parietal cortex. Journal of Neuroscience, 23, 6209–6214.

Mennemeier, M., Wertman, E., & Heilman, K. M. (1992). Neglect of near peripersonal
space. Brain, 115, 37–50.

Menzel, R., Brandt, R., Gumbert, A., Komischke, B., & Kunze, J. (2000). Two spatial memories for honeybee navigation. Proceedings of the Royal Society of London, Biological Sciences, 267, 961–968.

Miller, G. A., & Johnson-Laird, P. N. (1976). Language and perception. Cambridge, MA:
Harvard University Press.

Milner, A. D., & Goodale, M. A. (1995). The visual brain in action. Oxford: Oxford Univer­
sity Press.

Milner, A. D., & Goodale, M. A. (2006). The visual brain in action. New York: Oxford University Press.

Milner, A. D., & Goodale, M. A. (2008). Two visual systems re-viewed. Neuropsychologia, 46, 774–785.

Milner, A. D., Perrett, D. I., Johnston, R. S., Benson, P. J., Jordan, T. R., Heeley, D. W., et al. (1991). Perception and action in “visual form agnosia.” Brain, 114, 405–428.

Mishkin, M., Ungerleider, L. G., & Macko, K. A. (1983). Object vision and spatial vision:
Two cortical pathways. Trends in Neurosciences, 6, 414–417.

Morel, A., & Bullier, J. (1990). Anatomical segregation of two cortical visual pathways in
the macaque monkey. Visual Neuroscience, 4, 555–578.

Moser, E. I., Kropff, E., & Moser, M.-B. (2008). Place cells, grid cells and the brain’s spa­
tial representation system. Annual Review of Neuroscience, 31, 69–89.

Motter, B. C., & Mountcastle, V. B. (1981). The functional properties of light-sensitive neu­
rons of the posterior parietal cortex studied in waking monkeys: Foveal sparing and oppo­
nent vector organization. Journal of Neuroscience, 1, 3–26.

Mountcastle, V. B. (1995). The parietal system and some higher brain functions. Cerebral
Cortex, 5, 377–390.

Naganuma, T., Nose, I., Inoue, K., Takemoto, A., Katsuyama, N., & Taira, M. (2005). Infor­
mation processing of geometrical features of a surface based on binocular disparity cues:
An fMRI study. Neuroscience Research, 51, 147–155.

Neggers, S. F. W., Van der Lubbe, R. H. J., Ramsey, N. F., & Postma, A. (2006). Interactions
between ego- and allocentric neuronal representations of space. NeuroImage, 31, 320–
331.

Newcombe, F., & Russell, W. R. (1969). Dissociated visual perceptual and spatial deficits
in focal lesions of the right hemisphere. Journal of Neurology, Neurosurgery, and Psychia­
try, 32, 73–81.

Nguyen, B. T., Tran, T. D., Hoshiyama, M., Inui, K., & Kakigi, R. (2004). Face representation in the human primary somatosensory cortex. Neuroscience Research, 50, 227–232.

Nitz, D. A. (2006). Tracking route progression in the posterior parietal cortex. Neuron, 49,
747–756.

O’Keefe, J. (1976). Place units in the hippocampus of the freely moving rat. Experimental
Neurology, 51, 78–109.

O’Keefe, J. (1996). The spatial prepositions in English, vector grammar, and the cognitive
map theory. In P. Bloom, M. A. Peterson, L. Nadel, & M. F. Garrett (Eds.), Language and
space (pp. 277–316). Cambridge, MA: The MIT Press.

O’Keefe, J. (2003). Vector grammar, places, and the functional role of spatial prepositions in English. In E. van der Zee & J. Slack (Eds.), Representing direction in language and space (pp. 69–85). Oxford, UK: Oxford University Press.

O’Keefe, J., Burgess, N., Donnett, J. G., Jeffery, K. J., & Maguire, E. A. (1998). Place cells, navigational accuracy, and the human hippocampus. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 353, 1333–1340.

O’Keefe, J., & Nadel, L. (1978). The hippocampus as a cognitive map. Oxford, UK: Clarendon Press.

Okubo, M., Laeng, B., Saneyoshi, A., & Michimata, C. (2010). Exogenous attention differ­
entially modulates the processing of categorical and coordinate spatial relations. Acta
Psychologica, 135, 1–11.

Olson, C. R. (2001). Object-based vision and attention in primates. Current Opinion in Neurobiology, 11, 171–179.

Olson, C. R. (2003). Brain representations of object-centered space in monkeys and humans. Annual Review of Neuroscience, 26, 331–354.

Olson, C. R., & Gettner, S. N. (1995). Object-centered direction selectivity in the macaque
supplementary eye field. Science, 269, 985–988.

Optican, L. M. (2005). Sensorimotor transformation for visually guided saccades. Annals of the New York Academy of Sciences, 1039, 132–148.

O’Regan, J. K., & Noë, A. (2001). A sensorimotor account of vision and visual consciousness. Behavioral and Brain Sciences, 24, 939–1011.

O’Reilly, R. C., Kosslyn, S. M., Marsolek, C. J., & Chabris, C. F. (1990). Receptive field
characteristics that allow parietal lobe neurons to encode spatial properties of visual in­
put: A computational analysis. Journal of Cognitive Neuroscience, 2, 141–155.

Otto, I., Grandguillaume, P., Boutkhil, L., & Guigon, E. (1992). Direct and indirect cooper­
ation between temporal and parietal networks for invariant visual recognition. Journal of
Cognitive Neuroscience, 4, 35–57.

Paillard, J. (1991). Motor and representational framing of space. In J. Paillard (Ed.), Brain
and space (pp. 163–182). Oxford, UK: Oxford University Press.

Palermo, L., Bureca, I., Matano, A., & Guariglia, C. (2008). Hemispheric contribution to
categorical and coordinate representational processes: A study on brain-damaged pa­
tients. Neuropsychologia, 46, 2802–2807.

Parsons, L. M. (2003). Superior parietal cortices and varieties of mental rotation. Trends
in Cognitive Sciences, 7, 515–517.

Perenin, M.-T., & Vighetto, A. (1988). Optic ataxia: A specific disruption in visuomotor
mechanisms. Brain, 111, 643–674.

Piaget, J., & Inhelder, B. (1956). The child’s conception of space. London: Routledge &
Kegan Paul.

Perenin, M. T., & Jeannerod, M. (1978). Visual function within the hemianopic field follow­
ing early cerebral hemidecortication in man. I. Spatial localization. Neuropsychologia, 16,
1–13.

Pinker, S. (1990). A theory of graph comprehension. In R. Freedle (Ed.), Artificial intelligence and the future of testing (pp. 73–126). Hillsdale, NJ: Erlbaum.

Pinker, S. (2007). The stuff of thought: Language as a window into human nature. New
York: Penguin Books.

Poucet, B. (1993). Spatial cognitive maps in animals: New hypotheses on their structure
and neural mechanisms. Psychological Review, 100, 163–182.

Pouget, A., & Sejnowski, T. J. (1997). A new view of hemineglect based on the response
properties of parietal neurones. Philosophical Transactions of the Royal Society, Series B,
Biological Sciences, 352, 1449–1459.

Pouget, A., & Snyder, L. H. (2000). Computational approaches to sensorimotor transformations. Nature Neuroscience, 3, 1192–1198.

Quinlan, D. J., & Culham, J. C. (2007). fMRI reveals a preference for near viewing in the
human parietal-occipital cortex. NeuroImage, 36, 167–187.

Rao, S. C., Rainer, G., & Miller, E. K. (1997). Integration of what and where in the primate
prefrontal cortex. Science, 276, 821–824.

Reed, C. L., Klatzky, R. L., & Halgren, E. (2005). What vs. where in touch: An fMRI study.
NeuroImage, 25, 718–726.

Revonsuo, A., & Newman, J. (1999). Binding and consciousness. Consciousness and Cog­
nition, 8, 123–127.

Ritz, T. (2009). Magnetic sense in animal navigation. In L. Squire (Ed.), Encyclopedia of neuroscience (pp. 251–257). Elsevier: Network Version.

Rizzolatti, G., & Craighero, L. (2004). The mirror-neuron system. Annual Review of Neu­
roscience, 27, 169–192.

Rizzolatti, G., & Matelli, M. (2003). Two different streams form the dorsal visual system:
Anatomy and functions. Experimental Brain Research, 153, 146–157.

Robertson, L. C. (2003). Binding, spatial attention and perceptual awareness. Nature Re­
views: Neuroscience, 4, 93–102.

Robertson, L. C., Treisman, A., Friedman-Hill, S., & Grabowecky, M. (1997). The interaction of spatial and object pathways: Evidence from Bálint’s syndrome. Journal of Cognitive Neuroscience, 9, 295–317.

Rogers, J. L., & Kesner, R. P. (2006). Lesions of the dorsal hippocampus or parietal cortex
differentially affect spatial information processing. Behavioral Neuroscience, 120, 852–
860.

Rogers, L. J., Zucca, P., & Vallortigara, G. (2004). Advantage of having a lateralized brain.
Proceedings of the Royal Society of London B (Suppl.): Biology Letters, 271, 420–422.

Rolls, E. T., Miyashita, Y., Cahusac, P. M. B., Kesner, R. P., Niki, H., Feigenbaum, J., et al.
(1989). Hippocampal neurons in the monkey with activity related to the place in which a
stimulus is shown. Journal of Neuroscience, 9, 1835–1845.

Romanski, L. M., Tian, B., Fritz, J., Mishkin, M., Goldman-Rakic, P. S., & Rauschecker, J. P.
(1999). Dual streams of auditory afferents target multiple domains in the primate pre­
frontal cortex. Nature Neuroscience, 2, 1131–1136.

Roth, E. C., & Hellige, J. B. (1998). Spatial processing and hemispheric asymmetry: Con­
tributions of the transient/magnocellular visual system. Journal of Cognitive
Neuroscience, 10, 472–484.

Rueckl, J. G., Cave, K. R., & Kosslyn, S. M. (1989). Why are ‘what’ and ‘where’ processed
by separate visual systems? A computational investigation. Journal of Cognitive Neuro­
science, 1 (2), 171–186.

Rybash, J. M., & Hoyer, W. J. (1992). Hemispheric specialization for categorical and coordinate spatial representations: A reappraisal. Memory & Cognition, 20 (3), 271–276.

Sakata, H., Taira, M., Kusunoki, M., Murata, A., & Tanaka, Y. (1997). The parietal associa­
tion cortex in depth perception and visual control of hand action. Trends in Neuroscience,
20, 350–357.

Page 53 of 59
Representation of Spatial Relations

Save, E., & Poucet, B. (2000). Hippocampal-parietal cortical interactions in spatial cogni­
tion. Hippocampus, 10, 491–499.

Schindler, I., Rice, N. J., McIntosh, R. D., Rossetti, Y., Vighetto, A., & Milner, A. D. (2004).
Automatic avoidance of obstacles is a dorsal stream function: Evidence from optic ataxia.
Nature Neuroscience, 7, 779–784.

Schneider, G. E. (1967). Contrasting visuomotor functions of tectum and cortex in the golden hamster. Psychologische Forschung, 30, 52–62.

Seltzer, B., & Pandya, D. N. (1978). Afferent cortical connections and architectonics of the
superior temporal sulcus and surrounding cortex in the rhesus monkey. Brain Research,
149, 1–24.

Semmes, J., Weinstein, S., Ghent, L., & Teuber, H. L. (1955). Spatial orientation in man af­
ter cerebral injury: I. Analyses by locus of lesion. Journal of Psychology, 39, 227–244.

Sereno, A. B., & Maunsell, J. H. R. (1998). Shape selectivities in primate lateral intrapari­
etal cortex. Nature, 395, 500–503.

Sereno, M. I., Dale, A. M., Reppas, J. B., Kwong, K. K., Belliveau, J. W., et al. (1995). Bor­
ders of multiple visual areas in humans revealed by functional magnetic resonance imag­
ing. Science, 268, 889–893.

Sereno, M. I., & Huang, R.-S. (2006). A human parietal face area contains head-centered
visual and tactile maps. Nature Neuroscience, 9, 1337–1343.

Sereno, M. I., Pitzalis, S., & Martinez, A. (2001). Mapping of contralateral space in retino­
topic coordinates by a parietal cortical area in humans. Science, 294, 1350–1354.

Servos, P., Engel, S. A., Gati, J., & Menon, R. (1999). FMRI evidence for an inverted face
representation in human somatosensory cortex. Neuroreport, 10, 1393–1395.

Seth, A. K., McKinstry, J. L., Edelman, G. M., & Krichmar, J. L. (2004). Visual binding
through reentrant connectivity and dynamic synchronization in a brain-based device.
Cerebral Cortex, 14, 1185–1199.

Shelton, P. A., Bowers, D., & Heilman, K. M. (1990). Peripersonal and vertical neglect.
Brain, 113, 191–205.

Shelton, A. L., & McNamara, T. P. (2001). Systems of spatial reference in human memory.
Cognitive Psychology, 43, 274–310.

Shrager, Y., Kirwan, C. B., & Squire, L. R. (2008). Neural basis of the cognitive map: Path
integration does not require hippocampus or entorhinal cortex. Proceedings of the Na­
tional Academy of Sciences U S A, 105, 12034–12038.

Silver, M., & Kastner, S. (2009). Topographic maps in human frontal and parietal cortex.
Trends in Cognitive Sciences, 11, 488–495.

Slobin, D. I. (1996). From “thought and language” to “thinking for speaking.” In J. J. Gumperz & S. C. Levinson (Eds.), Rethinking linguistic relativity (pp. 70–96). Cambridge, UK: Cambridge University Press.

Slotnick, S. D., & Moo, L. R. (2006). Prefrontal cortex hemispheric specialization for cate­
gorical and coordinate visual spatial memory. Neuropsychologia, 44, 1560–1568.

Slotnick, S. D., Moo, L., Tesoro, M. A., & Hart, J. (2001). Hemispheric asymmetry in cate­
gorical versus coordinate visuospatial processing revealed by temporary cortical deacti­
vation. Journal of Cognitive Neuroscience, 13, 1088–1096.

Smallman, H. S., MacLeod, D. I. A., He, S., & Kentridge, R. W. (1996). Fine grain of the neural representation of human spatial vision. Journal of Neuroscience, 16 (5), 1852–1859.

Smith, E. E., Jonides, J., Koeppe, R. A., Awh, E., Schumacher, E., & Minoshima, S. (1995).
Spatial vs. object working memory: PET investigations. Journal of Cognitive
Neuroscience, 7, 337–358.

Snyder, L. H., Batista, A. P., & Andersen, R. A. (1997). Coding of intention in the posterior
parietal cortex. Nature, 386, 167–170.

Solstad, T., Boccara, C. N., Kropff, E., Moser, M.-B., & Moser, E. I. (2008). Representation
of geometric borders in the entorhinal cortex. Science, 322, 1865–1868.

Sovrano, V. A., Dadda, M., & Bisazza, A. (2005). Lateralized fish perform better than nonlateralized fish in spatial reorientation tasks. Behavioural Brain Research, 163, 122–127.

Stark, M., Coslett, H. B., & Saffran, E. M. (1996). Impairment of an egocentric map of lo­
cations: Implications for perception and action. Cognitive Neuropsychology, 13, 481–523.

Sutherland, R. J., & Rudy, J. W. (1988). Configural association theory: The role of the hip­
pocampal formation in learning, memory and amnesia. Psychobiology, 16, 157–163.

Tanaka, K., Saito, H., Fukada, Y., & Moriya, M. (1991). Coding visual images of objects in
the inferotemporal cortex of the macaque monkey. Journal of Neurophysiology, 66, 170–
189.

Taira, M., Mine, S., Georgopoulos, A. P., Murata, A., & Sakata, H. (1990). Parietal cortex
neurons of the monkey related to the visual guidance of hand movement. Experimental
Brain Research, 83, 29–36.

Takahashi, N., Kawamura, M., Shiota, J., Kasahata, N., & Hirayama, K. (1997). Pure topo­
graphic disorientation due to right retrosplenial lesion. Neurology, 49, 464–469.

Talmy, L. (2000). Toward a cognitive semantics. Cambridge, MA: MIT Press.


Thiebaut de Schotten, M., Urbanski, M., Duffau, H., Volle, E., Levy, R., Dubois, B., & Bar­
tolomeo, P. (2005). Direct evidence for a parietal-frontal pathway subserving spatial
awareness in humans. Science, 309, 2226–2228.

Thivierge, J.-P., & Marcus, G. (2007). The topographic brain: From neural connectivity to
cognition. Trends in Neuroscience, 30, 251–259.

Tipper, S.P., & Behrmann, M. (1996). Object-centered not scene based visual neglect.
Journal of Experimental Psychology: Human Perception and Performance, 22, 1261–1278.

Tolman, E. C. (1948). Cognitive maps in rats and men. Psychological Review, 55, 189–208.

Tootell, R. B., Hadjikhani, N., Hall, E. K., Marrett, S., Vanduffel, W., Vaughan, J.T., & Dale,
A. M. (1998). The retinotopy of visual spatial attention. Neuron, 21, 1409–1422.

Tootell, R. B., Hadjikhani, N., Vanduffel, W., Liu, A. K., Mendola, J. D., Sereno, M. I., &
Dale, A. M. (1998). Functional analysis of primary visual cortex (V1) in humans. Proceed­
ings of the National Academy of Sciences U S A, 95, 811–817.

Tootell, R. B., Mendola, J. D., Hadjikhani, N., Liu, A. K., & Dale, A. M. (1998). The repre­
sentation of the ipsilateral visual field in human cerebral cortex. Proceedings of the Na­
tional Academy of Sciences U S A, 95, 818–824.

Tootell, R. B., Silverman, M. S., Switkes, E., & DeValois, R. L. (1982). Deoxyglucose analy­
sis of retinotopic organization in primate striate cortex. Science, 218, 902–904.

Tranel, D., & Kemmerer, D. (2004). Neuroanatomical correlates of locative prepositions. Cognitive Neuropsychology, 21, 719–749.

Treisman, A. (1996). The binding problem. Current Opinion in Neurobiology, 6, 171–178.

Treisman, A. (1998). The perception of features and objects. In R. D. Wright (Ed.), (p. 59) Visual attention (pp. 26–54). New York: Oxford University Press.

Trojano, L., Conson, M., Maffei, R., & Grossi, D. (2006). Categorical and coordinate spa­
tial processing in the imagery domain investigated by rTMS. Neuropsychologia, 44, 1569–
1574.

Trojano, L., Grossi, D., Linden, D. E. J., Formisano, E., Goebel, R., & Cirillo, S. (2002). Co­
ordinate and categorical judgements in spatial imagery: An fMRI study. Neuropsychologia,
40, 1666–1674.

Tsal, Y., & Bareket, T. (2005). Localization judgments under various levels of attention.
Psychonomic Bulletin & Review, 12 (3), 559–566.


Tsal, Y., Meiran, N., & Lamy, D. (1995). Toward a resolution theory of visual attention. Vi­
sual Cognition, 2, 313–330.

Tsao, D. Y., Vanduffel, W., Sasaki, Y., Fize, D., Knutsen, T. A., Mandeville, J. B., Wald, L. L.,
Dale, A. M., Rosen, B. R., Van Essen, D. C., Livingstone, M. S., Orban, G. A., & Tootell, R.
B. H. (2003). Stereopsis activates V3A and caudal intraparietal areas in macaques and
humans. Neuron, 31, 555–568.

Tsutsui, K.-I., Sakata, H., Naganuma, T., & Taira, M. (2002). Neural correlates for percep­
tion of 3D surface orientation from texture gradients. Science, 298, 409–412.

Turnbull, O. H., Beschin, N., & Della Sala, S. (1997). Agnosia for object orientation: Impli­
cations for theories of object recognition. Neuropsychologia, 35, 153–163.

Turnbull, O. H., Laws, K. R., & McCarthy, R. A. (1995). Object recognition without knowl­
edge of object orientation. Cortex, 31, 387–395.

Ullman, S. (1984). Visual routines. Cognition, 18, 97–159.

Ungerleider, L. G., & Haxby, J. V. (1994). “What” and “where” in the human brain. Current
Opinion in Neurobiology, 4, 157–165.

Ungerleider, L. G., & Mishkin, M. (1982). Two cortical visual systems. In D. J. Ingle, M. A.
Goodale, & R. J. Mansfield (Eds.), Analysis of visual behavior (pp. 549–586). Cambridge,
MA: MIT Press.

Vallar, G., Bottini, G., & Paulesu, E. (2003). Neglect syndromes: the role of the parietal
cortex. Advances in Neurology, 93, 293–319.

Vallortigara, G., Cozzutti, C., Tommasi, L., & Rogers, L. J. (2001). How birds use their eyes:
Opposite left-right specialisation for the lateral and frontal visual hemifield in the domes­
tic chick. Current Biology, 11, 29–33.

Vallortigara, G., & Rogers, L. J. (2005). Survival with an asymmetrical brain: advantages
and disadvantages of cerebral lateralization. Behavioral and Brain Sciences, 28, 575–589.

Valyear, K. F., Culham, J. C., Sharif, N., Westwood, D., & Goodale, M. A. (2005). A double
dissociation between sensitivity to changes in object identity and object orientation in the
ventral and dorsal visual streams: A human fMRI study. Neuropsychologia, 44, 218–228.

Vauclair, J., Yamazaki, Y., & Güntürkün, O. (2006). The study of hemispheric specialization
for categorical and coordinate spatial relations in animals. Neuropsychologia, 44, 1524–
1534.

Wandell, B. A. (1999). Computational neuroimaging of human visual cortex. Annual Review of Neuroscience, 22, 145–173.

Wandell, B.A., Brewer, A.A., & Dougherty, R. F. (2005). Visual field map clusters in human
cortex. Philosophical Transactions of the Royal Society, B, 360, 693–707.

Wandell, B. A., Dumoulin, S. O., & Brewer, A. A. (2009). Visual cortex in humans. In L.
Squire (Ed.), Encyclopedia of neuroscience (pp. 251–257). Elsevier: Network Version.

Wang, B., Zhou, T. G., Zhuo, Y., & Chen, L. (2007). Global topological dominance in the
left hemisphere. Proceedings of the National Academy of Sciences U S A, 104, 21014–
21019.

Warrington, E. K. (1982). Neuropsychological studies of object recognition. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 298, 15–33.

Warrington, E. K., & James, A. M. (1986). Visual object recognition in patients with right-
hemisphere lesions: Axes or features. Perception, 15, 355–366.

Warrington, E. K., & James, A. M. (1988). Visual apperceptive agnosia: A clinico-anatomical study of three cases. Cortex, 24, 13–32.

Warrington, E. K., & Taylor, A. M. (1973). The contribution of the right parietal lobe to ob­
ject recognition. Cortex, 9, 152–164.

Waszak, F., Drewing, K., & Mausfeld, R. (2005). Viewer-external frames of reference in the mental transformation of 3-D objects. Perception & Psychophysics, 67, 1269–1279.

Weintraub, S., & Mesulam, M.-M. (1987). Right cerebral dominance in spatial attention:
Further evidence based on ipsilateral neglect. Archives of Neurology, 44, 621–625.

Weiskrantz, L. (1986). Blindsight: A case study and implications. Oxford, UK: Oxford University Press.

Weiskrantz, L. (1997). Consciousness lost and found: A neuropsychological exploration. Oxford, UK: Oxford University Press.

Wilson, F. A., Scalaidhe, S. P., & Goldman-Rakic, P. S. (1993). Dissociation of object and
spatial processing domains in primate prefrontal cortex. Science, 260, 1955–1958.

Wilson, M. A., & McNaughton, B. L. (1993). Dynamics of the hippocampal ensemble code
for space. Science, 261, 1055–1058.

Wilson, M. A., & McNaughton, B. L. (1994). Reactivation of hippocampal ensemble memories during sleep. Science, 265, 676–679.

Wolbers, T., & Hegarty, M. (2010). What determines our navigational abilities? Trends in
Cognitive Sciences, 14 (3), 138–146.

Wolbers, T., Wiener, J. M., Mallot, H. A., & Büchel, C. (2007). Differential recruitment of the hippocampus, medial prefrontal cortex, and the human motion complex during path integration in humans. Journal of Neuroscience, 27, 9408–9416.

Xu, F., & Carey, S. (1996). Infants’ metaphysics: The case of numerical identity. Cognitive
Psychology, 30, 111–153.


Yang, T. T., Gallen, C. C., Schwartz, B. J., & Bloom, F. E. (1994). Noninvasive somatosenso­
ry homunculus mapping in humans by using a large-array biomagnetometer. Proceedings
of the National Academy of Sciences U S A, 90, 3098–3102.

Zacks, J. M., Gilliam, F., & Ojemann, J. G. (2003). Selective disturbance of mental rotation
by cortical stimulation. Neuropsychologia, 41, 1659–1667.

Zeki, S. (1969). Representation of central visual fields in prestriate cortex of monkey. Brain Research, 14, 271–291.

Zeki, S. (2001). Localization and globalization in conscious vision. Annual Review of Neu­
roscience, 24, 57–86.

Zeki, S., Watson, J. D. G., Lueck, C. J., Friston, K. J., Kennard, C., & Frackowiak, R. S. J.
(1991). A direct demonstration of functional specialization in human visual cortex. Journal
of Neuroscience, 11, 641–649.

Zipser, D., & Andersen, R. A. (1988). A back-propagation programmed network that simu­
lates response properties of a subset of posterior parietal neurons. Nature, 331, 679–684.

Bruno Laeng

Bruno Laeng is Professor of Cognitive Neuropsychology at the University of Oslo.


Top-Down Effects in Visual Perception  


Moshe Bar and Andreja Bubic
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn

Print Publication Date: Dec 2013 Subject: Psychology, Cognitive Neuroscience


Online Publication Date: Dec 2013 DOI: 10.1093/oxfordhb/9780199988693.013.0004

Abstract and Keywords

The traditional belief that our perception is determined by sensory input is an illusion.
Empirical findings and theoretical ideas from recent years indicate that our experience,
memory, expectations, goals, and desires can substantially affect the appearance of the
visual information that is in front of us. Indeed, our perception is shaped as much by the incoming bottom-up information that captures the objective appearance of the world surrounding us as by our previous knowledge and personal characteristics, which constitute different sources of top-down perceptual influences. The present chapter focuses on such
feedback effects in visual perception, and describes how the interplay between prefrontal
and posterior visual cortex underlies the mechanisms through which top-down predic­
tions guide visual processing. One major conclusion is that perception and cognition are
more interactive than typically thought, and any artificial boundary between them may be
misleading.

Keywords: top-down, feedback, predictions, expectations, perception, visual cortex, prefrontal cortex

Introduction
The importance of top-down effects in visual perception is widely acknowledged by now.
There is nothing speculative in recognizing the fact that our perception is influenced by
our previous experiences, expectations, emotions, or motivation. Research in cognitive
neuroscience has revealed how such factors systematically influence the processing of
signals that originate from our retina or other sensory organs. Nevertheless, most textbooks still view percepts as objective reflections of the external world. According to
this view, when placed in a particular environment, we all in principle see the same things, but may mutually differ with respect to postperceptual, or higher-level, cognitive processes. Thus, perception is seen as exclusively determined by external sensory, so-called bottom-up, inputs. And although the importance of the constant exchange between incoming sensory data and existing knowledge, used for postulating hypotheses regarding sensations, was occasionally recognized during the early days of cognitive psychology (MacKay, 1956), top-down effects have only rarely been considered to be as important as bottom-up factors (Gregory, 1980; Koffka, 1935).

Although increased emphasis has been placed on top-down effects in perception within
modern cognitive neuroscience theories and approaches, it is important to note that these
are mostly treated as external, modulatory effects. This view incorporates an implicit and
rather strong assumption according to which top-down effects represent secondary phe­
nomena whose occurrence is not mandatory, but rather is contingent on environmental
conditions such as the level of visibility, or ambiguity. As such, they are even discussed as
exclusively attentional or higher-level executive effects that may influence and (p. 61) bias
visual perception, without necessarily constituting one of its inherent, core elements.

This chapter reviews a more comprehensive outlook on visual perception. In this account,
interpreting the visual world cannot be accomplished by relying solely on bottom-up information, but rather emerges from the integration of external information with preexisting knowledge. Several sources of such knowledge that trigger different types of top-
down biases and may modulate and guide visual processing are described here. First, we
discuss how our brain, when presented with a visual object, employs a proactive strategy
of testing multiple hypotheses regarding its identity. Another source of early top-down fa­
cilitatory bias derived from the presented object concerns its potential emotional salience
that is also extracted during the early phases of visual processing. At the same time, a
comparable associative and proactive strategy is used for formulating and testing hy­
potheses based on the context in which the object is presented and on other items likely
to accompany it. Finally, factors outside the currently presented input such as task de­
mands, previously encountered events, or the behavioral context may be used as addition­
al sources of top-down influences in visual recognition. In conclusion, although triggered
and constrained by the incoming bottom-up signals, visual perception is strongly guided
and shaped by top-down mechanisms whose origin, temporal dynamics, and neural basis
are reviewed and synthesized here.

Understanding the Visual World


The common view of visual perception posits that the flow of visual information starts
once the visual information is acquired by the eye, and continues as it is transmitted fur­
ther, to the primary and then to higher order visual cortices. This hierarchical view is in­
complete because it ignores the influences that our preexisting knowledge and various
dispositions have on the way we process and understand visual information.

In our everyday life, we are rarely presented with clearly and fully perceptible visual in­
formation, and we rarely approach our environment without certain predictions. Even
highly familiar objects are constantly encountered in different circumstances that greatly
vary in terms of lighting, occlusions, and other parameters. Naturally, all of these varia­
tions limit our ability to structure and interpret the presented scene in a meaningful man­
ner without relying on previous experience. Therefore, outside the laboratory, successful
recognition has to rely not only on the immediately available visual input but also on different sources of related preexisting information represented at higher processing levels (Bullier, 2001; Gilbert & Sigman, 2007; Kveraga et al., 2007b), which trigger the so-called
top-down effects in visual perception. Consequently, to understand visual recognition, we
need to consider differential, as well as integrative, effects of bottom-up and top-down
processes, and their respective neural bases.

Before reviewing specific instances in which top-down mechanisms are of prominent rele­
vance, it is important to mention that the phrase “top-down modulation” has typically
been utilized in several different ways in the literature. Engel et al. (2001) summarize
four variants of its use: anatomical (equating top-down influences with the feedback activ­
ity in a processing hierarchy), cognitivist (equating top-down influences with hypothesis-
or expectation-driven processing), perceptual or gestaltist (equating top-down influences
with contextual modulation of perceptual items), and dynamicist (equating top-down in­
fluences with the enslavement of local processing elements by large-scale neuronal dy­
namics). Although separable, these variants are on some occasions partly overlapping
(e.g., it may be hard to clearly separate cognitivist and perceptual definitions) or mutually
complementary (e.g., cognitivist influences may be mediated by anatomical or dynamicist
processing mechanisms). A more general definition might specify top-down influences as
instances in which complex types of information represented at higher processing stages
influence simpler processing occurring at earlier stages (Gilbert & Sigman, 2007). How­
ever, even this definition might be problematic in some instances because the term “com­
plexity of information” can prove to be hard to specify, or even to apply in some instances
of top-down modulations. Nevertheless, bringing different definitions and processing lev­
els together, it is possible to suggest that, in principle, bottom-up information flow is sup­
ported by feedforward connections that transfer information from lower to higher level
regions, in contrast to feedback connections that transmit the signals originating from
the higher level areas downstream within the processing hierarchy. Feedforward connec­
tions are typically considered to be those that originate from supragranular layers and
terminate in and around the fourth cortical layer, in contrast to feedback connections that
originate in infragranular layers and terminate outside the granular layer (Felleman & Van Essen, 1991; Fris­
ton, 2005; Maunsell & Van Essen, 1983; Mumford, 1992; Shipp, 2005). (p. 62) Further­
more, feedforward connections are relatively focused and tend to project to a smaller
number of mostly neighboring regions in the processing hierarchy. In contrast, feedback
connections are more diffused because they innervate many regions and form wider con­
nection patterns within them (Shipp & Zeki, 1989; Ungerleider & Desimone, 1986). Con­
sequently, feedforward connections carry the main excitatory input to cortical neurons
and are considered as driving connections, unlike the feedback ones that typically have
modulatory effects (Buchel & Friston, 1997; Girard & Bullier, 1989; Salin & Bullier, 1995;
Sherman & Guillery, 2004). For example, feedback connections potentiate responses of
low-order areas by nonlinear mechanisms such as gain control (Bullier et al., 2001), in­
crease the synchronization in lower order areas (Munk et al., 1995), and contribute to at­
tentional processing (Gilbert et al., 2000). Mumford (1992) has argued that feedback and
feedforward connections have to be considered of equal importance, emphasizing that
most cognitive processes reflect a balanced exchange of information between pairs of brain regions that are often hierarchically asymmetrical. However, the equal importance
does not imply equal functionality: as suggested by Lamme et al. (1998), feedforward con­
nections may be fast, but they are not necessarily linked with perceptual experience. In­
stead, attentive vision and visual awareness, which may in everyday life introspectively be
linked to a conscious perceptual experience, arise from recurrent processing within the
hierarchy (Lamme & Roelfsema, 2000).

As described in more detail later in this chapter, a similar conceptualization of feedforward and feedback connections has been advocated by a number of predictive approaches to cortical processing, all of which emphasize that efficient perception arises from a
balanced exchange of bottom-up and top-down signals. Within such recurrent pathways,
feedback connections are suggested to trigger “templates,” namely expected reconstruc­
tions of sensory signals that can be compared with the input being received from lower-
level areas by the feedforward projections. According to one theory, the residual, or the
difference between the template and incoming input, is calculated and transmitted to
higher level areas (Mumford, 1992). Different models propose different ways to handle
the discrepancy between top-down hypotheses and bottom-up input, but they mostly
agree that a comparison between ascending and descending information is needed to fa­
cilitate convergence.
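
This template-and-residual scheme can be made concrete with a small numerical sketch. The following Python fragment (the weight matrix, input values, learning rate, and iteration count are all arbitrary assumptions chosen for illustration, not parameters drawn from the models cited above) iteratively refines a higher-level hypothesis until its feedback template matches the bottom-up input:

```python
import numpy as np

# Illustrative iterative error-minimization loop in the spirit of Mumford (1992)
# and Rao & Ballard (1999). All numbers below are arbitrary assumptions for the
# demonstration, not neural measurements.
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])                  # feedback weights: hypothesis -> expected input
true_cause = np.array([0.8, -0.3])
sensory_input = W @ true_cause              # the bottom-up signal to be explained

r = np.zeros(2)                             # higher-level hypothesis, initially empty
for _ in range(200):
    template = W @ r                        # top-down reconstruction ("template")
    residual = sensory_input - template     # feedforward error signal
    r += 0.1 * W.T @ residual               # refine the hypothesis to shrink the residual

print(np.round(r, 3))                       # converges to the true cause [0.8, -0.3]
```

Because each update is proportional to the residual, the feedforward signal shrinks toward zero exactly when the top-down hypothesis has converged on the cause of the input, which is the sense in which ascending and descending information are compared until they "sufficiently overlap."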

In conclusion, understanding visual recognition requires that we characterize the sources and dynamics of top-down effects in sensory processing. First, we need
to distinguish between different types of cognitive biases in vision, some of which may be­
come available before the presentation of a specific stimulus, whereas others are trig­
gered by its immediate appearance. Specifically, although in some situations we may be
able to use prior knowledge to generate more or less specific predictions regarding the
upcoming stimulation, in other contexts hypotheses regarding the identity or other fea­
tures of incoming stimuli can only be generated after the object is presented. In either
case, visual perception does not simply reflect a buildup of independently processed stim­
ulus features (e.g., shape, color, or edge information), which are first integrated into a
recognizable image and thereafter complemented with other, already existing information
about that particular object. Instead, links between the presented input and preexisting
representations are created before, or at the initial moment of, object presentation and
continuously refined thereafter until the two sufficiently overlap and the object is suc­
cessfully recognized.

Relevance of Prediction in Simple Visual Recognition
Even the simplest, most impoverished contexts of visual recognition rely on an integra­
tion of bottom-up and top-down processing mechanisms. It has been proposed that this
type of recognition is predictive in nature, in the sense that predictions are initialized af­
ter stimulus presentation based on certain features of the presented input (e.g., global
object shape), which are rapidly processed and used for facilitating the processing of other object features (Bar, 2003, 2007; Kveraga et al., 2007b). One proposal is that perception, even when single objects are presented in isolation, progresses through several
phases that constitute an activate-predict-confirm perception cycle (Enns & Lleras, 2008).
Explicitly or implicitly, this view is in accordance with predictive theories of cortical pro­
cessing (Bar, 2003; Friston, 2005; Grossberg, 1980; Mumford, 1992; Rao & Ballard, 1999;
Ullman, 1995) that were developed in attempts to elucidate the mechanisms of iterative
recurrent cortical processing underlying successful cognition. Their advancement was
strongly motivated by the increase in knowledge regarding the structural and functional
properties of feedback and feedforward connections described earlier, as well as the
posited distinctions between (p. 63) forward and inverse models in computer vision (Bal­
lard et al., 1983; Friston, 2005; Kawato et al., 1993). Based on these developments, pre­
dictive theories suggested that different cortical systems share a common mechanism of
constant formulation and communication of expectations and other top-down biases from
higher to lower level cortical areas, which thereby become primed for the anticipated
events. This allows the input that arrives to these presensitized areas to be compared and
integrated with the postulated expectations, perhaps through specific synchrony patterns
visible across different levels of the hierarchy (Engel et al., 2001; Ghuman et al., 2008;
Kveraga et al., 2007b; von Stein et al., 2000; von Stein & Sarnthein, 2000). Such compari­
son and integration of top-down and bottom-up information has been posited to rely on it­
erative error-minimization mechanisms (Friston, 2005; Grossberg, 1980; Kveraga et al.,
2007b; Mumford, 1992; Ullman, 1995) that support successful cognitive functioning. With
respect to visual processing, this means that an object can be recognized once a match
between the prepostulated hypothesis and sensory representation is reached, such that
no significant difference exists between the two. As mentioned before, this implies that
feedforward pathways carry error signals, or information regarding the residual discrep­
ancy between predicted and actual events (Friston, 2005; Mumford, 1992; Rao & Ballard,
1999). This suggestion is highly important because it posits a privileged status for errors
of prediction in cortical processing. These events require more pronounced and pro­
longed analysis because they typically signal inadequacy of the preexisting knowledge for
efficient functioning. Consequently, in addition to providing a powerful signal for novelty
detection, discrepancy signals often trigger a reevaluation of current knowledge, new
learning, or a change in behavior (Corbetta et al., 2002; Escera et al., 2000; Schultz &
Dickinson, 2000; Winkler & Czigler, 1998). In contrast, events that are predicted correct­
ly typically carry little informational value (because of successful learning, we expected
these to occur all along) and are therefore processed in a more efficient manner (i.e.,
faster and more accurately) than the unexpected or wrongly predicted ones. Although the
described conceptualization of the general dynamics of iterative recurrent processing
across different levels of cortical hierarchies is shared by most predictive theories of
neural processing, these theories differ with respect to their conceptualizations of more
specific features of such processing (e.g., the level of abstraction of the top-down mediat­
ed templates or the existence of information exchange outside neighboring levels of corti­
cal hierarchies; cf. Bar, 2003; Kveraga et al., 2007b; Mumford, 1992).
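
The privileged status of prediction errors described above can be illustrated with a toy delta-rule learner (the learning rate and outcome values are arbitrary illustrative assumptions, not a model endorsed by these theories):

```python
# Toy delta-rule learner: updates are proportional to the prediction error, so
# a correctly predicted event barely changes the model, whereas a violated
# expectation produces a large corrective update. Learning rate and outcome
# values are arbitrary illustrative choices.

def update(prediction, outcome, learning_rate=0.3):
    error = outcome - prediction            # residual: actual minus predicted
    return prediction + learning_rate * error

p = 0.0
for _ in range(20):                         # repeated exposure to the same outcome
    p = update(p, 1.0)

small_step = abs(update(p, 1.0) - p)        # expected event: tiny adjustment
large_step = abs(update(p, 0.0) - p)        # unexpected omission: large adjustment
print(small_step < 0.001, large_step > 0.25)
```

After repeated exposure, the expected outcome yields an update near zero (it carries little informational value), whereas the violated expectation yields an update roughly a thousand times larger, capturing the sense in which discrepancy signals trigger reevaluation and new learning.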

One of the most influential models that highlight the relevance of recurrent connections
and top-down feedback in visual processing is the “global-to-local” integrated model of vi­
sual processing of Bullier (2001). This model builds on our knowledge of the anatomy and
functionality of the visual system, especially the differences between magnocellular and
parvocellular visual pathways that carry the visual information from the retina to the
brain. Specifically, it takes into account findings showing that the magnocellular and par­
vocellular pathways are separated over the first few processing stages and that, following
stimulus presentation, area V1 receives activation from lateral geniculate nucleus magno­
cellular neurons around 20 ms earlier than from parvocellular neurons. This faster activa­
tion of the M-channel (characterized by high contrast but poor chromatic sensitivity, larg­
er receptive fields, and lower spatial sampling rate), together with the high degree of
myelination, could account for the early activation of the dorsal visual processing stream
after visual stimulation, which enables the generation of fast feedback connections from
higher to lower areas (V1 and V2) at exactly the time when feedforward activation from
the P-channel arrives (Bullier, 2001; Kaplan, 2004; Kaplan & Shapley, 1986; Merigan &
Maunsell, 1993; Schiller & Malpeli, 1978; Goodale & Milner, 1992). This view is very dif­
ferent not only from the classic theories that emphasize the importance of feedforward
connections but also from the more “traditional” account of feedback connections stating
that, regardless of the overall importance of recurrent connections, the first sweep of ac­
tivity through the hierarchy of (both dorsal and ventral) visual areas is still primarily de­
termined by the pattern of feedforward connections (Thorpe & Imbert, 1989). The inte­
grated model of visual processing treats areas V1 and V2 as general-purpose representational stages that integrate computations from higher levels, allowing global information to influence the processing of finer details. V1 could be a place for perceptual integration
that reunites information returned from different higher level areas by feedback connec­
tions after being divided during the first activity sweep for a very simple reason: This cor­
tical area still has a high-resolution map of almost all relevant feature information. This
view resonates well with recent theories of visual processing and awareness, such as
Zeki’s theory of visual (p. 64) consciousness (Zeki & Bartels, 1999), Hochstein and
Ahissar’s theory of perceptual processing and learning (Hochstein & Ahissar, 2002), or
Lamme’s (2006) views on consciousness.

Another model that suggests a similar division of labor and offers a predictive view of vi­
sual processing in object recognition was put forth by Bar and colleagues (Bar, 2003,
2007; Bar et al., 2006; Kveraga et al., 2007a). This model identifies the information con­
tent that triggers top-down hypotheses and characterizes the exact cortical regions that
bias visual processing by formulating and communicating those top-down predictions to
lower-level cortical areas. It also explains how top-down information modulates bottom-up
processing in situations in which single objects are presented in isolation and in which
prediction is driven by a subset of an object’s own properties that facilitate the recogni­
tion of the object itself as well as other objects it typically appears with. Specifically, this
top-down facilitation model posits that visual recognition progresses from an initial stage
aimed at determining what an object resembles, and a later stage geared toward specify­
ing its fine details. For this to occur, the top-down facilitation model critically assumes
that different features of the presented stimulus are processed at different processing
stages. First, the coarse, global information regarding the object shape is rapidly extract­
ed and used for activating in memory existing representations that most resemble the
global properties of the given object to be recognized. These include all objects that share
the rudimentary properties of the presented object and look highly similar if viewed in
blurred or decontextualized circumstances (e.g., a mushroom, desk lamp, and umbrella).
Although an object cannot be identified with full certainty based on coarse stimulus out­
line, such rapidly extracted information is still highly useful for basic-level recognition of
resemblance, creating the so-called analogies (Bar, 2007). Generally, analogies allow the
formulation of links between the presented input and relevant preexisting representa­
tions in memory, which may be based on different types of similarity between the two
(e.g., perceptual, semantic, functional, or conceptual). In the context of visual recogni­
tion, analogies are based on global perceptual similarity between the input object and ob­
jects in memory, which allows the brain to generate multiple hypotheses or guesses re­
garding the object’s most likely identity.

For these analogies and initial guesses to be useful, they need to be triggered early dur­
ing visual processing, while they can still bias the slower incoming bottom-up input. Thus,
information regarding the coarse stimulus properties has to be processed first, before the
finer object features. Indeed, it has been suggested that such information is rapidly ex­
tracted and conveyed using low spatial frequencies of the visual input through the mag­
nocellular (M) pathway, which is ideal for transmitting coarse information regarding the
general object outlines at higher velocity compared with other pathways (Merigan &
Maunsell, 1993; Schiller & Malpeli, 1978). This information is transmitted to the or­
bitofrontal cortex (OFC) (Bar, 2003; Bar et al., 2006), a polymodal region implicated pri­
marily in the processing of rewards and affect (Barbas, 2000; Carmichael & Price, 1995;
Cavada et al., 2000; Kringelbach & Rolls, 2004), as well as supporting some aspects of vi­
sual processing (Bar et al., 2006; Freedman et al., 2003; Frey & Petrides, 2000; Meunier
et al., 1997; Rolls et al., 2005; Thorpe et al., 1983). In the present framework, the OFC is
suggested to generate predictions regarding the object’s identity by activating all repre­
sentations that share the global appearance with the presented image by relying on
rapidly analyzed low spatial frequencies (Bar, 2003, 2007). Once fed back into the ventral
processing stream, these predictions interact with the slower incoming bottom-up infor­
mation and facilitate the recognition of the presented object. Experimental findings cor­
roborate the hypothesis regarding the relevance of the M-pathway for transmitting
coarse visual information (Bar, 2003; Bar et al., 2006; Kveraga et al., 2007b), as well as
the importance of OFC activity, and the interaction between the OFC and inferior tempo­
ral cortex, for recognizing visual objects (Bar et al., 2001, 2006). It has also been suggest­
ed that low spatial frequency information is used for generating predictions regarding
other objects and events that are likely to be encountered in the same context (Bar, 2004;
Oliva & Torralba, 2007). As also suggested by Ganis and Kosslyn (2007), the associative
nature of our long-term memory plays a crucial role in matching the presented input with
preexisting representations and all associated information relevant for object identifica­
tion. Finally, as suggested by Barrett and Bar (2009), different portions of the OFC are also implicated in mediating affective predictions by supporting a representation of objects' emotional salience that constitutes an inherent part of visual recognition. The elaboration
of the exact dynamics and interaction between affective predictions and (p. 65) other
types of top-down biases in visual recognition remains to be elucidated.

Contextual Top-Down Effects


In the previous sections, we described how the processing of single objects might be facil­
itated by using rapidly processed global shape information for postulating hypotheses re­
garding their identity. However, in the real world, objects are rarely presented in isola­
tion. Instead, they are embedded in particular environments and surrounded by other ob­
jects that are typically not random, but rather are mutually related in that they typically
share the same context. Such relations that are based on frequent co-occurrence of ob­
jects within the same spatial or temporal context may be referred to as contextual associ­
ations. Importantly, objects that are typically found in the same environment can share
many qualitatively different types of relations. For example, within one particular context
(e.g., a kitchen), it may be possible to find objects that are semantically or categorically
related in the sense that they belong to the same semantic category (e.g., kitchen appli­
ances such as a dishwasher and a refrigerator), as well as those that are typically used to­
gether, thus sharing a mutual functional relation (e.g., a frying pan and oil). However, the
existence of such relations is not crucial for defining contextual associates because some
may only have the environment that they typically coinhabit in common (e.g., a shower
curtain and a toilet brush). Furthermore, some categorically or contextually related ob­
jects may be perceptually similar (e.g., an orange and a peach) or dissimilar (e.g., an ap­
ple and a banana). In addition, various contextual associates may share spatial relations
of different flexibility, for example, such that a bathroom mirror is typically placed above
the sink, whereas a microwave and a refrigerator may be encountered in different rela­
tional positions within the kitchen. Finally, although some object pairs may share only one
relation (e.g., categorical: a cat and a goat; or contextual: a towel and sunscreen), others
may be related in more than one way (e.g., a mouse and a cat are related both contextual­
ly and categorically, whereas a cat and a dog are related contextually, categorically, and
perceptually).

Although the number of different association types shared by the two objects may be of
some relevance, it is mainly the frequency and consistency of their mutual co-occurrence
that determines the strength of their associative connections (Bar, 2004; Biederman,
1981; Hutchison, 2003; Spence & Owens, 1990). Such associative strength is important in
that it provides information that our brain can use for accurate predictions. Our nervous
system is extremely efficient in capturing the statistics in natural visual scenes (Fiser &
Aslin, 2001; Torralba & Oliva, 2003), as well as learning the repeated spatial contingen­
cies of even arbitrarily distributed abstract stimuli (Chun & Jiang, 1999). In addition to
being sensitive and successful in learning contextual relations, our brain is also highly ef­
ficient in utilizing this knowledge for facilitating its own processing. In accordance with
the general notion of a proactively predictive brain (Bar, 2009), it has repeatedly been
demonstrated that the learned contextual information is constantly used for increasing
the efficiency of visual search and recognition (Bar, 2004; Biederman et al., 1982; Daven­
port & Potter, 2004; Torralba et al., 2006). Specifically, objects presented in familiar back­
grounds, especially if appearing in expected spatial configuration, are processed faster
and more accurately than those presented in noncongruent settings. Furthermore, the
contextual and semantic redundancy provided by the presentation of several related ob­
jects encountered in such settings allows the observer to resolve the uncertainties and ambiguities of individual objects more efficiently. Such context-based predictions are ex­
tremely useful because they save processing resources and reduce the need for exerting
mental effort while dealing with predictable aspects of our surroundings. They allow us to
allocate attention toward relevant environmental aspects more efficiently, and they are
very useful for guiding our actions and behavior. However, to understand the mechanisms
underlying contextual top-down facilitation effects, it is important to illustrate how the
overall meaning, or the gist, of a complex image can be processed fast enough to become
useful for guiding the processing of individual objects presented within it (Bar, 2004; Oli­
va, 2005; Oliva & Torralba, 2001).

Studies that have addressed this issue have demonstrated that extracting the gist of a scene relies mainly on the low spatial frequencies present in the image. This is not surprising given that, as described earlier, such frequencies can be rapidly analyzed within the M-pathway, in this case allowing them to aid the rapid classification of the presented context (Bar, 2004; Oliva & Torralba, 2001; Schyns & Oliva, 1994). Specifi­
cally, information regarding the global scene features can proactively be used for activat­
ing context frames, namely the representations of objects and relations that are common
to that specific context (Bar, 2004; Bar & Ullman, 1996). Similar to the ideas of schemata
(p. 66) (Hock et al., 1978), scripts (Schank, 1975), and frames (Minsky, 1975) from the 1970s and 1980s, such context frames are suggested to aid the processing of individual
objects presented in the scene. At this point, it is important to mention that context
frames should not be understood as static images that are activated in an all-or-none
manner, but rather as dynamic entities that are processed gradually. In this view, a proto­
typical spatial template of the global structure of a familiar context is activated first, and
is then filled with more instance-specific details until it develops into an episodic context
frame that includes specific information regarding an individual instantiation of the given
context. Overall, there is ample evidence that demonstrates our brain’s efficiency in ex­
tracting the gist of even highly complex visual images, allowing it to identify individual
objects typically encountered in such settings.

This, however, represents only one level of predictive, contextual top-down modulations.
On another level, it is possible that individual objects that are typically encountered to­
gether in a certain context are also able to mutually facilitate each other’s processing. In
other words, although it has long been known that a kitchen environment helps in recog­
nizing a refrigerator that it typically contains, it was less clear whether seeing an image
of that refrigerator in isolation could automatically invoke the image of the contextual set­
ting in which it is typically embedded. If so, it would be plausible to expect that present­
ing objects typically associated with a certain context in isolation could facilitate subsequent recognition of their typical contextual associates, even when these are presented
outside the shared setting. This hypothesis has been addressed and substantiated in a se­
ries of studies conducted by Bar and colleagues (Aminoff et al., 2007; Bar et al., 2008a,
2008b; Bar & Aminoff, 2003; Fenske et al., 2006) that indicated the relevance of the
parahippocampal cortex (PHC), the retrosplenial complex (RSC), and the medial pre­
frontal cortex (MPFC) for contextual processing. In addition to identifying the regions rel­
evant for contextual relations, they also revealed a greater sensitivity of the PHC to asso­
ciations with greater physical specificity, in contrast to RSC areas that seem to represent
contextual associations in a more abstract manner (Bar et al., 2008b). Similar to the
neighboring OFC’s role in generating predictions related to object identity, the involve­
ment of the neighboring part of the MPFC was suggested to reflect the formulations of
predictions based on familiar, both visual-spatial and more abstract types of contextual
associations (Bar & Aminoff, 2003; Bar et al., 2008a, 2008b). These findings are of high
relevance because they illustrate how the organization of the brain as a whole may auto­
matically and parsimoniously support the strongly associative nature of our cognitive
mind (Figure 4.1).

Before moving on to other types of top-down modulations, it is important to highlight that the described contextual effects in visual recognition represent only one type of contextu­
al phenomena in vision. As described, they mostly address the functional or even seman­
tic context in which the objects appear. On a somewhat more basic level than this, sim­
pler stimulus contextual effects also need to be mentioned as a form of modulatory influ­
ence in vision. For example, contextual effects in simple figure–ground segregation of ob­
jects are visible in situations in which, for example, the response of a neuron becomes af­
fected by the global characteristics of the contour defining the object that is outside the
neuron’s receptive field (Li et al., 2006). Generally, it is hard to determine whether such
influences should be considered as top-down because some of them might be mediated
solely by local connections intrinsic to the area involved in processing a certain feature.
Thus, strictly speaking, they might not adhere to the previous definitions of top-down
modifications as those reflecting influences from higher level processing stages (Gilbert
& Sigman, 2007). However, given the relevance of these stimulus contextual influences
on many elements of visual processing, including contour integration, scene segmenta­
tion, color constancy, and motion processing (Gilbert & Sigman, 2007), their relevance for
intact visual perception has to be acknowledged. In a sense, even gestalt rules of percep­
tual grouping may be seen as similar examples of modulations in vision because they
summarize how contours are perceptually linked as a consequence of their geometric re­
lationships (Todorović, 2007). These rules most clearly indicate how, indeed, “the whole is
greater than the sum of its parts,” because they show how our perception of each stimu­
lus strongly depends on the contextual setting in which it is embedded. A very interesting
feature of gestalt rules, and information integration in general, is that the knowledge re­
garding the grouping process itself does not easily modify the percept. In the example of
the Müller-Lyer visual illusion, knowing that the two lines are of equal length does not
necessarily make the illusion go away (Bruno & Franz, 2009).


Figure 4.1 Parallel to the systematic bottom-up progression of image details that are mainly mediated
by high spatial frequency (HSF) information along
the ventral visual pathway, rapid projections of
coarse low spatial frequencies (LSFs) trigger the
generation of hypotheses or “initial guesses” regard­
ing the exact object identity and the context within
which it typically appears. Both of these types of pre­
dictions are validated and refined with the gradual
arrival of HSFs (Bar, 2004). IT, inferior temporal cor­
tex; MPFC, medial prefrontal cortex; OFC, orbital
frontal cortex; RSC, retrosplenial cortex; PHC,
parahippocampal cortex.

Modified, with permission, from Bar (2009).

(p. 67)

Interfunctional Nature of Top-Down Modulations
In the previous section, a “purely perceptual” aspect of top-down modulations has been
introduced because the theoretical proposals and experimental findings described mostly
focused on the top-down modulations that are triggered by the presentation of a single
stimulus. However, top-down influences in visual perception include a much wider array
of phenomena. Specifically, in many cases, the top-down-mediated preparation begins be­
fore the actual presentation of the stimulus and is triggered by factors such as an instruction (Carlsson et al., 2000) or a specific task cue (Simmons et al., 2004). These types of influ­
ences may be considered contextual, in the sense that they reflect the behavioral context
of visual processing that is related to the perceptual task at hand (Watanabe et al., 1998).
In this case, the participant may choose to focus on a certain aspect of a stimulus that is
expected to be of relevance in the future, thus triggering a modulatory effect on the feed­
forward analysis of the stimulus once it appears. This is similar to the way prior knowl­
edge regarding the likelihood of the spatial location or other features of the objects ex­
pected to appear in the near future influences our perception (Bar, 2004; Biederman,
1972, 1981; Biederman et al., 1973; Driver & Baylis, 1998; Scholl, 2001). Typically, know­
ing what to expect allows one to attend to the location or other features of the expected
stimulus. Not only spatial but also object features, objects or categories, and temporal
context or other perceptual groups could be considered different subtypes of top-down in­
fluences (Gilbert & Sigman, 2007).

In addition, prior presentation of events that provide clues regarding the identity of the
incoming stimulation may also influence visual processing. Within this context, one spe­
cific form of the potential influence of previous experience on current visual processing
involves priming (Schacter et al., 2004; Tulving & Schacter, 1990; Wiggs & Martin, 1998).
Events that occur before target presentation and that influence its processing may in­
clude those that are semantically or contextually long-term related to the target event
(Kveraga et al., 2007b), (p. 68) as well as stimuli that had become associated with the tar­
get stimulus through short-term learning within the same (Schubotz & von Cramon,
2001; 2002) or a different (Widmann et al., 2007) modality. It has been suggested that, in
some cases, and especially in the auditory domain, prediction regarding the forthcoming
stimuli can be realized within the sensory system itself (Näätänen et al., 2007). In other
cases, however, these predictive sensory phenomena may be related to the computations
of the motor domain. Specifically, expectations regarding the incoming stimulation may
be formulated based on an “efference copy” elicited by the motor system in situations in
which a person’s own actions trigger such stimulation (Sperry, 1950). Indeed, von
Holst, Mittelstaedt, and Sperry in the 1950s provided the first experimental evidence
demonstrating the importance of motor-to-sensory feedback in controlling behavior (Bays
& Wolpert, 2008; Wolpert & Flanagan, 2001). This motivated an increased interest in the
so-called internal model framework that can now be considered a prevailing, widely ac­
cepted view of action generation (Miall & Wolpert, 1996; Wolpert & Kawato, 1998). Ac­
cording to this framework, not only is there a prediction for sensory events that may be
considered consequences of one’s own behavior (Blakemore et al., 1998), but the same
system may be utilized for anticipating sensory events that are strongly associated with
other sensory events co-occurring on a short time scale (Schubotz, 2007). Before moving
away from the motor system, it is important to mention one more, somewhat different
view that also emphasizes a strong link between perceptual and motor behavior. Specifi­
cally, according to Gross and colleagues (1999), when discussing perceptual processing, a
strong focus should be placed on sensorimotor anticipation because it allows one to di­
rectly characterize a visual scene in categories of behavior. Thus, perception is not simply
predictive but also is “predictive with purpose” and may thus be described as behavior
based (Gross et al., 1999) or, as suggested by Arbib (1972), action oriented.

Up to now, top-down modulatory influences have been recognized in a multitude of different contexts and circumstances. However, one that is possibly the most obvious has not
been addressed. Specifically, we have not discussed in any detail the relevance and neces­
sity of top-down modulations in situations in which we can, from our daily experience, ex­
plicitly appreciate the aid of prior knowledge for processing stimuli at hand. This mostly
concerns the situations in which the visual input is impoverished or ambiguous, and in
which we almost consciously rely on top-down information for closing the percept. For example, contextual information is crucial in recovering visual scene properties lost be­
cause of the blurs or superimpositions in visual image (Albright & Stoner, 2002). In situa­
tions in which an object is ambiguous and may be interpreted in more than one fashion,
the importance of a prior template that may guide recognition is even greater
than when viewing a clear percept (Cavanagh, 1991). Examples of such stimuli include
two faces or a vase versus one face behind a vase (Costall, 1980), or as shown in Figure
4.2, a face of a young versus an old woman or a man playing a saxophone seen in silhou­
ette versus a woman’s face in shadow (Shepard, 1990).

Figure 4.2 Ambiguous figures demonstrate how our experience of the visual world is not determined sole­
ly by bottom-up input: example of the figure of a
young woman versus an old woman, and a woman’s
face versus a man playing the saxophone.

Generally, in the previous sections, a wide array of top-down phenomena have been men­
tioned, and some were described in more detail. From this description, it became clear
how difficult it is to categorize these effects in relation to other cognitive functions, the
most important of which is attention. Specifically, not only is it hard to clearly distinguish
between attentional and other forms of top-down phenomena, but also, given that the
same effect is sometimes discussed as an attentional, and sometimes as a perceptual,
phenomenon, a clear separation may not be possible. This might not even be necessary in
all situations because, for most practical purposes, the mechanisms underlying such phe­
nomena and the effects they introduce may be considered the same. When discussing top-
down influences of selective attention, one typically refers to the processing guided by
hypotheses or expectations and the influence of prior knowledge or other (p. 69) personal
features on stimulus selection (Engel et al., 2001), which is quite concordant to the way
top-down influences have been addressed in this chapter. Even aspects of visual percep­
tion as mentioned here might, at least in part, be categorized under anticipatory atten­
tion that involves a preparation for the upcoming stimulus and improves the speed and
precision of stimulus processing (Posner & Dehaene, 1994; Brunia, 1999). Not surprising­
ly, then, instead of always categorizing a certain effect into one functional “box,” Gazzaley
(2010) uses the general term “top-down modulation,” which includes changes in sensory
cortical activity associated with relevant and irrelevant information that stand at the
crossroads of perception, memory, and attention. Similarly, a general conceptualization of
top-down influences as those that reflect an interaction of goals, action plans, working
memory, and selective attention is suggested by Engel et al. (2001). In an attempt to bet­
ter differentiate basic types of top-down influences in perception, Ganis and Kosslyn
(2007) suggested a crucial distinction between strategic top-down processes that include
those influences that are under voluntary control and may be contrasted with involuntary
reflexive types of top-down modulations. In a somewhat different view, Summerfield and
Egner (2009) argued that biases in visual processing include attentional mechanisms that
prioritize processing based on motivational relevance and expectations that bias process­
ing based on prior experience. While acknowledging these initial attempts to create a
clear taxonomy of top-down influences in perceptual processing, there is still quite a lot
of disagreement in this area that remains to be settled in the future.

On a more anatomical and physiological side, when discussing top-down modulations, it was suggested that these should not be regarded as an intrinsic property of individual
sensory areas, but instead as a phenomenon realized through long-range connections be­
tween distant brain regions (Gazzaley, 2010). In this context, a lot of work has been in­
vested in clearly defining the two types of areas involved in such interactive processing:
sites and sources of biases. Specifically, sites relate to those areas in which the analysis of
afferent signals takes place, whereas sources include those that provide the biasing infor­
mation that modulate processing (Frith & Dolan, 1997). Although some of the previously
mentioned theories of visual perception that emphasize the role of feedback connections
(Grossberg, 1980; Mumford, 1992; Ullman, 1995) often consider the immediate external
stimulation to constitute the source of feedback information, others consider modulatory
“bias” signals to be triggered by the system itself based on the prior knowledge (Engel et
al., 2001). The sources of such signals most often include the prefrontal, but also parietal
and temporal, cortices as well as the limbic system, depending on the specific type of in­
formation influencing information processing (Beck & Kastner, 2009; Engel et al., 2001;
Hopfinger et al., 2000; Miller & Cohen, 2001). All of them aid and guide visual processing
by providing relevant information necessary for the evaluation and interpretation of the
incoming stimulus.

An important thing to keep in mind, however, is that defining top-down modulations in terms of long-range connections may be limiting for recognizing some influences that have
a modulatory role in perception and contribute to our experience of an intact percept. In
this conceptualization, it is not quite clear how distant two regions have to be in order for
their interaction to be considered a “top-down” phenomenon and whether some short-
range and local connections that modulate our perception may also be considered “top-
down.” It is not clear what should be more relevant for defining top-down modulations,
the distance between regions or the existence of recurrent processing between separate
units that allows us to use prior experiences and knowledge for modulating the feedfor­
ward sweep of processing. This modulation may, as suggested in the attentional theory of
Kastner and Ungerleider (2000), be realized through an increase of the neural response
to the relevant or attended stimulus and an attenuation of the irrelevant stimulus before
or after its presentation, a claim that has been experimentally corroborated (Desimone &
Duncan, 1995; Pessoa et al., 2003; Reynolds & Chelazzi, 2004). Furthermore, as suggested by Dehaene et al. (1998), such top-down attentional amplification, or an increase in activity related to the relevant stimulus, also serves as the mechanism that allows the stimulus to be made available to consciousness.

Concluding Remarks
Visual recognition was depicted here as a process that reflects the integration of two
equally relevant streams of information. The first, the bottom-up stream, captures the input present in the visual image itself, conveyed through the senses. The second, the top-down stream, contributes prior experiences, current personal and behavioral sets, and future expectations, and modifies the processing of the presented stimulus. In this concep­
tualization, perception may be considered the process of integrating the incoming input
and our preexisting (p. 70) knowledge that exists in all contexts, even in very controlled
and simple viewing circumstances.

In this chapter, it has been argued that top-down effects in visual recognition vary in complexity and may be triggered by the stimulus itself or by information external to the presented stimulus, such as the task instruction, copies of motor commands, or the prior presentation of events informative about the current stimulation. These various types of top-down influences originate from different cortical sources, reflecting the fact that they are triggered by different types of information. Regardless of the differences in their respective origins, different types of top-down biases may nevertheless result in similar local effects, or rely on similar mechanisms of communication between lower-level visual areas and higher-level regions. In that sense, understanding the principles of one type of top-down facilitation effect may provide important clues regarding the general mechanisms of triggering and integrating top-down and bottom-up information that underlie successful visual recognition.

In conclusion, the body of research synthesized here demonstrates the richness, complexity, and importance of top-down effects in visual perception and illustrates some of the fundamental mechanisms underlying them. Critically, it is suggested that top-down effects should not be considered a secondary phenomenon that occurs only in special, extreme settings. Instead, the wide, heterogeneous set of such biases points to an ever-present and complex information-processing stream that, together with the bottom-up stream, constitutes the core of visual processing. As suggested by Gregory (1980), perception can then be defined as a dynamic search for the best interpretation of the sensory data, a claim that highlights both the active and the constructive nature of visual perception. Consequently, visual recognition itself should be considered a proactive, predictive, and dynamic process of integrating different sources of information, the success of which is determined by their mutual resonance and correspondence and by our ability to learn from the past in order to predict the future.

Page 15 of 24
Top-Down Effects in Visual Perception

Author Note
Work on this chapter was supported by NIH grant R01EY019477-01, NSF grant 0842947,
and DARPA grant N10AP20036.

References
Albright, T. D., & Stoner, G. R. (2002). Contextual influences on visual processing. Annual
Review of Neuroscience, 25, 339–379.

Aminoff, E., Gronau, N., & Bar, M. (2007). The parahippocampal cortex mediates spatial
and nonspatial associations. Cerebral Cortex, 27, 1493–1503.

Arbib, M. (1972). The metaphorical brain: An introduction to cybernetics as artificial intelligence and brain theory. New York: Wiley Interscience.

Ballard, D. H., Hinton, G. E., & Sejnowski, T. J. (1983). Parallel visual computation. Nature,
306, 21–26.

Bar, M. (2009). The proactive brain: Memory for predictions. Theme issue: Predictions in
the brain: Using our past to generate a future (M. Bar, Ed.), Philosophical Transactions of
the Royal Society, Series B, Biological Sciences, 364, 1235–1243.

Bar, M. (2007). The proactive brain: Using analogies and associations to generate predic­
tions. Trends in Cognitive Sciences, 11 (7), 280–289.

Bar, M. (2004). Visual objects in context. Nature Reviews, Neuroscience, 5 (8), 617–629.

Bar, M. (2003). A cortical mechanism for triggering top-down facilitation in visual object
recognition. Journal of Cognitive Neuroscience, 15, 600–609.

Bar, M., & Aminoff, E. (2003). Cortical analysis of visual context. Neuron, 38 (2), 347–358.

Bar, M., Aminoff, E., & Ishai, A. (2008a). Famous faces activate contextual associations in
the parahippocampal cortex. Cerebral Cortex, 18 (6), 1233–1238.

Bar, M., Aminoff, E., & Schacter, D. L. (2008b). Scenes unseen: The parahippocampal cor­
tex intrinsically subserves contextual associations, not scenes or places per se. Journal of
Neuroscience, 28, 8539–8544.

Bar, M., Kassam, K. S., Ghuman, A. S., Boshyan, J., Schmid, A. M., Dale, A. M., Hämäläi­
nen, M. S., Marinkovic, K., Schacter, D. L., Rosen, B. R., & Halgren, E. (2006). Top-down
facilitation of visual recognition. Proceedings of the National Academy of Sciences U S A,
103 (2), 449–454.

Bar, M., Tootell, R., Schacter, D., Greve, D., Fischl, B., Mendola, J., Rosen, B. R., & Dale, A.
M. (2001). Cortical mechanisms of explicit visual object recognition. Neuron, 29 (2), 529–
535.


Bar, M., & Ullman, S. (1996). Spatial context in recognition. Perception, 25 (3), 343–352.

Barbas, H. (2000). Connections underlying the synthesis of cognition, memory, and emo­
tion in primate prefrontal cortices. Brain Research Bulletin, 52 (5), 319–330.

Barrett, L. F., & Bar, M. (2009). See it with feeling: Affective predictions during object perception. Theme issue: Predictions in the brain: Using our past to generate a future (M. Bar, Ed.), Philosophical Transactions of the Royal Society, Series B, Biological Sciences, 364, 1325–1334.

Bays, P. M., & Wolpert, D. M. (2008). Predictive attenuation in the perception of touch. In
P. Haggard, Y. Rossetti, & M. Kawato (Eds.), Sensorimotor foundations of higher cogni­
tion: Attention and performance XXII (pp. 339–359). New York: Oxford University Press.

Beck, D. M., & Kastner, S. (2009). Top-down and bottom-up mechanisms in biasing com­
petition in the human brain. Vision Research, 49 (10), 1154–1165.

Biederman, I. (1981). On the semantics of a glance at a scene. In M. Kubovy & J. R. Pomerantz (Eds.), Perceptual organization (pp. 213–263). Hillsdale, NJ: Erlbaum.

Biederman, I. (1972). Perceiving real-world scenes. Science, 177, 77–80.

Biederman, I., Glass, A. L., & Stacy, W. (1973). Searching for objects in real-world scenes.
Journal of Experimental Psychology, 97, 22–27.

(p. 71) Biederman, I., Mezzanotte, R. J., & Rabinowitz, J. C. (1982). Scene perception: Detecting and judging objects undergoing relational violations. Cognitive Psychology, 14 (2), 143–177.

Blakemore, S. J., Rees, G., & Frith, C. D. (1998). How do we predict the consequences of
our actions? A functional imaging study. Neuropsychologia, 36 (6), 521–529.

Brunia, C. H. M. (1999). Neural aspects of anticipatory behavior. Acta Psychologica, 101, 213–352.

Bruno, N., & Franz, V. H. (2009). When is grasping affected by the Muller-Lyer illusion? A
quantitative review. Neuropsychologia, 47 (6), 1421–1433.

Buchel, C., & Friston, K. J. (1997). Modulation of connectivity in visual pathways by atten­
tion: Cortical interactions evaluated with structural equation modeling and fMRI. Cere­
bral Cortex, 7 (8), 768–778.

Bullier, J. (2001). Integrated model of visual processing. Brain Research Reviews, 36, 96–
107.

Carlsson, K., Petrovic, P., Skare, S., Petersson, K. M., & Ingvar, M. (2000). Tickling antici­
pations: Neural processing in anticipation of a sensory stimulus. Journal of Cognitive Neu­
roscience, 12, 691–703.


Carmichael, S. T., & Price, J. L. (1995). Limbic connections of the orbital and medial pre­
frontal cortex in macaque monkeys. Journal of Comparative Neurology, 363 (4), 615–641.

Cavada, C., Company, T., Tejedor, J., Cruz-Rizzolo, R. J., & Reinoso-Suarez, F. (2000). The
anatomical connections of the macaque monkey orbitofrontal cortex: A review. Cerebral
Cortex, 10 (3), 220–242.

Cavanagh, P. (1991). What’s up in top-down processing? In A. Gorea (Ed.), Representations of vision: Trends and tacit assumptions in vision research (pp. 295–304). Cambridge, UK: Cambridge University Press.

Chun, M. M., & Jiang, Y. (1999). Top-down attentional guidance based on implicit learning
of visual covariation. Psychological Science, 10, 360–365.

Corbetta, M., Kincade, J. M., & Shulman, G. L. (2002). Neural systems for visual orienting
and their relationships to spatial working memory. Journal of Cognitive Neuroscience, 14,
508–523.

Costall, A. (1980). The three faces of Edgar Rubin. Perception, 9, 115.

Davenport, J. L., & Potter, M. C. (2004). Scene consistency in object and background per­
ception. Psychological Science, 15 (8), 559–564.

Dehaene, S., Kerszberg, M., & Changeux, J. P. (1998). A neuronal model of a global work­
space in effortful cognitive tasks. Proceedings of the National Academy of Sciences U S A,
95, 14529–14534.

Desimone, R., & Duncan, J. (1995). Neural mechanisms of selective visual attention. Annual Review of Neuroscience, 18, 193–222.

Driver, J., & Baylis, G. C. (1998). Attention and visual object segmentation. In R. Parasura­
man (Ed.), The attentive brain (pp. 299–325). Cambridge, MA: MIT Press.

Engel, A. K., Fries, P., & Singer, W. (2001). Dynamic predictions: Oscillations and syn­
chrony in top-down processing. Nature Reviews, Neuroscience, 2 (10), 704–716.

Enns, J. T., & Lleras, A. (2008). What’s next? New evidence for prediction in human vi­
sion. Trends in Cognitive Sciences, 12, 327–333.

Escera, C., Alho, K., Schröger, E., & Winkler, I. (2000). Involuntary attention and dis­
tractibility as evaluated with event-related brain potentials. Audiology and Neurootology,
5, 151–166.

Felleman, D. J., & Van Essen, D. C. (1991). Distributed hierarchical processing in primate visual cortex. Cerebral Cortex, 1, 1–47.

Fenske, M. J., Aminoff, E., Gronau, N., & Bar, M. (2006). Top-down facilitation of visual ob­
ject recognition: Object-based and context-based contributions. Progress in Brain Re­
search, 155, 3–21.

Fiser, J., & Aslin, R. N. (2001). Unsupervised statistical learning of higher-order spatial
structures from visual scenes. Psychological Science, 12, 499–504.

Freedman, D. J., Riesenhuber, M., Poggio, T., & Miller, E. K. (2003). A comparison of pri­
mate prefrontal and inferior temporal cortices during visual categorization. Journal of
Neuroscience, 23 (12), 5235–5246.

Frey, S., & Petrides, M. (2000). Orbitofrontal cortex: A key prefrontal region for encoding
information. Proceedings of the National Academy of Sciences U S A, 97, 8723–8727.

Frith, C., & Dolan, R. J. (1997). Brain mechanisms associated with top-down processes in
perception. Philosophical Transactions of the Royal Society, Series B, Biological Sciences,
352 (1358), 1221–1230.

Friston, K. (2005). A theory of cortical responses. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 360 (1456), 815–836.

Ganis, G., & Kosslyn, S. M. (2007). Multiple mechanisms of top-down processing in vision.
In S. Funahashi (Ed.), Representation and brain (pp. 21–45). Tokyo: Springer-Verlag.

Gazzaley, A. (2010). Top-down modulation: The crossroads of perception, attention and memory. Proceedings of SPIE-IS&T Electronic Imaging, SPIE Vol. 7527, 75270A.

Ghuman, A., Bar, M., Dobbins, I. G., & Schnyer, D. (2008). The effects of priming on
frontal-temporal communication. Proceedings of the National Academy of Sciences U S A,
105 (24), 8405–8409.

Gilbert, C., Ito, M., Kapadia, M., & Westheimer, G. (2000). Interactions between attention, context and learning in primary visual cortex. Vision Research, 40 (10–12), 1217–1226.

Gilbert, C. D., & Sigman, M. (2007). Brain states: Top-down influences in sensory processing. Neuron, 54, 677–696.

Girard, P., & Bullier, J. (1989). Visual activity in area V2 during reversible inactivation of area 17 in the macaque monkey. Journal of Neurophysiology, 62 (6), 1287–1302.

Goodale, M. A., & Milner, A. D. (1992). Separate visual pathways for perception and action. Trends in Neurosciences, 15 (1), 20–25.

Gregory, R. L. (1980). Perceptions as hypotheses. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 290, 181–197.

Gross, H., Heinze, A., Seiler, T., & Stephan, V. (1999). Generative character of perception: A neural architecture for sensorimotor anticipation. Neural Networks, 12 (7–8), 1101–1129.

Grossberg, S. (1980). How does a brain build a cognitive code? Psychological Review, 87 (1), 1–51.


Hochstein, S., & Ahissar, M. (2002). Review view from the top: Hierarchies and reverse
hierarchies in the visual system. Neuron, 36, 791–804.

Hock, H. S., Romanski, L., Galie, A., & Williams, C. S. (1978). Real-world schemata and
scene recognition in adults and children. Memory and Cognition, 6, 423–431.

Hopfinger, J. B., Buonocore, M. H., & Mangun, G. R. (2000). The neural mechanisms of
top-down attentional control. Nature Neuroscience, 3 (3), 284–291.

(p. 72) Hutchison, K. A. (2003). Is semantic priming due to association strength or feature overlap? A microanalytic review. Psychonomic Bulletin & Review, 10 (4), 785–813.

Kaplan, E. (2004). The M, P, and K pathways of the primate visual system. In L. M. Chalupa & J. S. Werner (Eds.), The visual neurosciences (pp. 481–494). Cambridge, MA: MIT Press.

Kaplan, E., & Shapley, R. M. (1986). The primate retina contains two types of ganglion
cells, with high and low contrast sensitivity. Proceedings of the National Academy of
Sciences U S A, 83 (8), 2755–2757.

Kastner, S., & Ungerleider, L.G. (2000). Mechanisms of visual attention in the human cor­
tex. Annual Review of Neuroscience, 23, 315–341.

Kawato, M., Hayakawa, H., & Inui, T. (1993). A forward-inverse optics model of reciprocal
connections between visual cortical areas. Network, 4, 415–422.

Koffka, K. (1935). The principles of gestalt psychology. New York: Harcourt, Brace, &
World.

Kringelbach, M. L., & Rolls, E. T. (2004). The functional neuroanatomy of the human or­
bitofrontal cortex: Evidence from neuroimaging and neuropsychology. Progress in Neuro­
biology, 72 (5), 341–372.

Kveraga, K., Boshyan, J., & Bar, M. (2007a). Magnocellular projections as the trigger of
top-down facilitation in recognition. Journal of Neuroscience, 27 (48), 13232–13240.

Kveraga, K., Ghuman, A. S., & Bar, M. (2007b). Top-down predictions in the cognitive
brain. Brain and Cognition, 65, 145–168.

Lamme, V. A. F. (2006). Towards a true neural stance on consciousness. Trends in Cognitive Sciences, 10 (11), 494–501.

Lamme, V. A. F., & Roelfsema, P. R. (2000). The distinct modes of vision offered by feed­
forward and recurrent processing. Trends in Neurosciences, 23, 571–579.

Lamme, V. A. F., Super, H., & Spekreijse, H. (1998). Feedforward, horizontal, and feed­
back processing in the visual cortex. Current Opinion in Neurobiology, 8, 529–535.


Li, W., Piech, V., & Gilbert, C. D. (2006). Contour saliency in primary visual cortex. Neuron,
50, 951–962.

MacKay, D. (1956). Towards an information-flow model of human behaviour. British Journal of Psychiatry, 43, 30–43.

Maunsell, J. H. R., & Van Essen, D. C. (1983) Functional properties of neurons in the mid­
dle temporal visual area of the macaque monkey. II. Binocular interactions and the sensi­
tivity to binocular disparity. Journal of Neurophysiology, 49, 1148–1167.

Merigan, W. H., & Maunsell, J. H. (1993). How parallel are the primate visual pathways?
Annual Review of Neuroscience, 16, 369–402.

Meunier, M., Bachevalier, J., & Mishkin, M. (1997). Effects of orbital frontal and anterior
cingulate lesions on object and spatial memory in rhesus monkeys. Neuropsychologia, 35,
999–1015.

Miall, R. C., & Wolpert, D. M. (1996). Forward models for physiological motor control.
Neural Networks, 9 (8), 1265–1279.

Miller, E. K., & Cohen, J. D. (2001). An integrative theory of prefrontal cortex function. An­
nual Review of Neuroscience, 24, 167–202.

Minsky, M. (1975). A framework for representing knowledge. In P. H. Winston (Ed.), The psychology of computer vision (pp. 163–189). New York: McGraw-Hill.

Mumford, D. (1992). On the computational architecture of the neocortex. I. The role of cortico-cortical loops. Biological Cybernetics, 66 (3), 241–251.

Munk, M. H., Nowak, L. G., Nelson, J. I., & Bullier, J. (1995). Structural basis of cortical
synchronization. II. Effects of cortical lesions. Journal of Neurophysiology, 74 (6), 2401–
2414.

Näätänen, R., Paavilainen, P., Rinne, T., & Alho, K. (2007). The mismatch negativity (MMN) in basic research of central auditory processing: A review. Clinical Neurophysiology, 118, 2544–2590.

Oliva, A. (2005). Gist of the scene. In L. Itti, G. Rees, & J.K. Tsotsos (Eds.), Encyclopedia of
neurobiology of attention (pp. 251–256). San Diego, CA: Elsevier.

Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42, 145–175.

Oliva, A., & Torralba, A. (2007). The role of context in object recognition. Trends in Cogni­
tive Sciences, 11 (12), 520–527.

Pessoa, L., Kastner, S., & Ungerleider, L. G. (2003). Neuroimaging studies of attention:
From modulation of sensory processing to top-down control. Journal of Neuroscience, 23
(10), 3990–3998.

Posner, M. I., & Dehaene, S. (1994). Attentional networks. Trends in Neurosciences, 17, 75–79.

Rao, R. P., & Ballard, D. H. (1999). Predictive coding in the visual cortex: A functional in­
terpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2, 79–
87.

Reynolds, J. H., & Chelazzi, L. (2004). Attentional modulation of visual processing. Annual
Review of Neuroscience, 27, 611–647.

Rolls, E. T., Browning, A. S., Inoue, K., & Hernadi, I. (2005). Novel visual stimuli activate
a population of neurons in the primate orbitofrontal cortex. Neurobiology of Learning and
Memory, 84 (2), 111–123.

Salin, P. A., & Bullier, J. (1995). Corticocortical connections in the visual system: Struc­
ture and function. Physiological Reviews, 75 (1), 107–154.

Schacter, D. L., Dobbins, I. G., & Schnyer, D. M. (2004). Specificity of priming: A cognitive
neuroscience perspective. Nature Reviews Neuroscience, 5 (11), 853–862.

Schank, R. C. (1975). Conceptual information processing. New York: Elsevier Science Ltd.

Schiller, P. H., & Malpeli, J. G. (1978). Functional specificity of lateral geniculate nucleus laminae of the rhesus monkey. Journal of Neurophysiology, 41, 788–797.

Scholl, B. J. (2001). Objects and attention: The state of the art. Cognition, 80 (1–2), 1–46.

Schyns, P. G., & Oliva, A. (1994). From blobs to boundary edges: Evidence for time- and spatial-dependent scene recognition. Psychological Science, 5 (4), 195–200.

Schubotz, R. I. (2007). Prediction of external events with our motor system: towards a
new framework. Trends in Cognitive Sciences, 11, 211–218.

Schubotz, R. I., & von Cramon, D. Y. (2002). Dynamic patterns make the premotor cortex interested in objects: Influence of stimulus and task revealed by fMRI. Brain Research: Cognitive Brain Research, 14, 357–369.

Schubotz, R. I., & von Cramon, D. Y. (2001). Functional organization of the lateral premo­
tor cortex: fMRI reveals different regions activated by anticipation of object properties,
location and speed. Brain Research: Cognitive Brain Research, 11 (1), 97–112.

(p. 73) Schultz, W., & Dickinson, A. (2000). Neuronal coding of prediction errors. Annual Review of Neuroscience, 23, 473–500.

Shepard, R. (1990). Mind sights. New York: W. H. Freeman.

Sherman, S. M., & Guillery, R. W. (2004). The visual relays in the thalamus. In L. M. Chalupa & J. S. Werner (Eds.), The visual neurosciences (pp. 565–592). Cambridge, MA: MIT Press.


Shipp, S. (2005). The importance of being agranular: A comparative account of visual and
motor cortex. Philosophical Transactions of the Royal Society of London, Series B, Biolog­
ical Sciences, 360, 797–814.

Shipp, S., & Zeki, S. (1989). The organization of connections between areas V5 and V2 in
macaque monkey visual cortex. European Journal of Neuroscience, 1 (4), 333–354.

Simmons, A., Matthews, S. C., Stein, M. B., & Paulus, M. P. (2004). Anticipation of emo­
tionally aversive visual stimuli activates right insula. Neuroreport, 15, 2261–2265.

Spence, D. P., & Owens, K. C. (1990). Lexical co-occurrence and association strength.
Journal of Psycholinguistic Research, 19, 317–330.

Sperry, R. (1950). Neural basis of the spontaneous optokinetic response produced by visu­
al inversion. Journal of Comparative and Physiological Psychology, 43, 482–489.

Summerfield, C., & Egner, T. (2009). Expectation (and attention) in visual cognition.
Trends in Cognitive Sciences, 13 (9), 403–408.

Thorpe, S., & Imbert, M. (1989). Biological constraints on connectionist modeling. In R. Pfeifer, Z. Schreter, F. Fogelman-Soulié, & L. Steels (Eds.), Connectionism in perspective (pp. 63–93). Amsterdam: Elsevier.

Thorpe, S. J., Rolls, E. T., & Maddison, S. (1983). Neuronal activity in the orbitofrontal
cortex of the behaving monkey. Experimental Brain Research, 49, 93–115.

Todorović, D. (2007). W. Metzger: Laws of seeing. Gestalt Theory, 28, 176–180.

Torralba, A., & Oliva, A. (2003). Statistics of natural image categories. Network, 14 (3),
391–412.

Torralba, A., Oliva, A., Castelhano, M. S., & Henderson, J. M. (2006). Contextual guidance
of attention in natural scenes: the role of global features on object search. Psychological
Review, 113, 766–786.

Tulving, E., & Schacter, D. L. (1990). Priming and human memory systems. Science, 247,
301–306.

Ullman, S. (1995). Sequence seeking and counter streams: A computational model for bidirectional information flow in the visual cortex. Cerebral Cortex, 5, 1–11.

Ungerleider, L. G., & Desimone, R. (1986). Cortical connections of visual area MT in the
macaque. Journal of Comparative Neurology, 248 (2), 190–222.

von Stein, A., Chiang, C., Konig, P., & Lorincz, A. (2000). Top-down processing mediated
by interareal synchronization. Proceedings of the National Academy of Sciences U S A, 97
(26), 14748–14753.


von Stein, A., & Sarnthein, J. (2000). Different frequencies for different scales of cortical integration: From local gamma to long range alpha/theta synchronization. International Journal of Psychophysiology, 38 (3), 301–313.

Watanabe, T., Harner, A. M., Miyauchi, S., Sasaki, Y., Nielsen, M., Palomo, D., & Mukai, I.
(1998). Task-dependent influences of attention on the activation of human primary visual
cortex. Proceedings of the National Academy of Sciences U S A, 95, 11489–11492.

Widmann, A., Gruber, T., Kujala, T., Tervaniemi, M., & Schröger, E. (2007). Binding symbols and sounds: Evidence from event-related oscillatory gamma-band activity. Cerebral Cortex, 17, 2696–2702.

Wiggs, C. L., & Martin, A. (1998). Properties and mechanisms of visual priming. Current
Opinion in Neurobiology, 8, 227–233.

Winkler, I., & Czigler, I. (1998). Mismatch negativity: Deviance detection or the mainte­
nance of the “standard.” Neuroreport, 9, 3809–3813.

Wolpert, D. M., & Flanagan, J. R. (2001). Motor prediction. Current Biology, 11 (18),
R729–R732.

Wolpert, D. M., & Kawato, M. (1998). Multiple paired forward and inverse models for mo­
tor control. Neural Networks, 11 (7–8), 1317–1329.

Zeki, S., & Bartels, A. (1999). Toward a theory of visual consciousness. Consciousness and
Cognition 8, 225–259.

Moshe Bar

Moshe Bar is a neuroscientist, director of the Gonda Multidisciplinary Brain Research Center at Bar-Ilan University, associate professor in psychiatry and radiology at Harvard Medical School, and associate professor in psychiatry and neuroscience at Massachusetts General Hospital. He directs the Cognitive Neuroscience Laboratory at the Athinoula A. Martinos Center for Biomedical Imaging.

Andreja Bubic

Andreja Bubic, Martinos Center for Biomedical Imaging, Massachusetts General Hos­
pital, Harvard Medical School, Charlestown, MA


Neural Underpinning of Object Mental Imagery, Spatial Imagery, and Motor Imagery  
Grégoire Borst
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn

Print Publication Date: Dec 2013 Subject: Psychology, Cognitive Neuroscience


Online Publication Date: Dec 2013 DOI: 10.1093/oxfordhb/9780199988693.013.0005

Abstract and Keywords

Mental imagery is a cognitive function that has received much attention over the past 40 years from both philosophers and cognitive psychologists. More recently, researchers have used neuroimaging techniques to tackle fundamental properties of mental images, such as their depictive nature, which was fiercely debated for almost 30 years. Results from neuroimaging, brain-damaged patient, and transcranial magnetic stimulation studies converge in showing that visual, spatial, and motor mental imagery rely on the same basic brain mechanisms used, respectively, in visual perception, spatial cognition, and motor control. Thus, neuroimaging and lesion studies have proved critical in resolving the imagery debate between depictive and propositionalist theorists. Partly because of the controversy that surrounded the nature of mental images, the neural bases of mental imagery are probably more closely defined than those of any other higher cognitive function.

Keywords: visual mental imagery, spatial mental imagery, motor mental imagery, visual perception, neuroimaging,
brain-damaged patients, transcranial magnetic stimulation

When we think of the best way to load luggage into the trunk of a car, of the fastest route from point A to point B, or of the easiest way to assemble bookshelves, we generally rely on our ability to simulate those events by visualizing them instead of actually performing them. When we do so, we experience “seeing with the mind’s eye,” which is the hallmark of a specific type of representation processed by our brain, namely, visual mental images. According to Kosslyn, Thompson, and Ganis (2006), mental images are representations that are similar to those created during the initial phase of perception but that do not require external stimulation to be created. In addition, those representations preserve the perceptible properties of the stimuli they represent.


Early Studies of Mental Imagery and the Imagery Debate
Visual mental imagery has a very specific place in the study of human mental activity. Indeed, dating back to early theories of mental activity, Greek philosophers such as Plato proposed that memory might be analogous to a wax tablet into which our perceptions and thoughts stamp images of themselves, as a signet ring stamps impressions in wax. According to this view, seeing with the mind’s eye is a phenomenon closely related to perceptual activities. Thus, the idea of an analogy between mental imagery and perception is not new. However, because of the inherently private and introspective nature of mental imagery, garnering objective empirical evidence about the nature of these representations has been a great challenge for psychology researchers. The introspective nature of imagery led behaviorists (who championed the idea that psychology should focus on observable stimuli and the responses to these stimuli), such as Watson (1913), to deny the existence of mental images by asserting that thinking consisted solely of subtle movements of the vocal apparatus. Behaviorism had a (p. 75) long-lasting negative impact on the legitimacy of studying mental imagery. In fact, neither the cognitive revolution of the 1950s, during which the human mind came to be conceived of as akin to computer software, nor the first results of Paivio (1971) showing that mental imagery improves the memorization of words were sufficient to legitimize the study of mental imagery.

The revival of mental imagery was driven not only by empirical evidence that mental imagery is a key part of memory, problem solving, and creativity but also by the type of questions and methodologies researchers used. Researchers shifted away from phenomenological questions and introspective methods and started to focus on refining the understanding of the nature of the representations involved in mental imagery and of the cognitive processes that interpret those representations. The innovative idea was to use chronometric data as a “mental tape measure” of the underlying cognitive processes that interpret mental images in order to characterize the properties of the underlying representations and cognitive processes. One of the most influential works that helped mental imagery regain its respectability was reported by Shepard and Metzler (1971). In their mental rotation paradigm, participants viewed two three-dimensional (3D) objects with several arms, each consisting of small cubes, and decided whether the two objects had the same shape, regardless of differences in their orientations. The key finding was that response times increased linearly with increasing angular disparity between the two objects. The results demonstrated for the first time that people mentally rotate one of the objects into congruence with the orientation of the other object. Other paradigms, such as the image scanning paradigm (e.g., Finke & Pinker, 1982; Kosslyn, Ball, & Reiser, 1978), allowed researchers to characterize not only the properties of the cognitive processes at play in visual mental imagery but also the nature of visual mental images. Critically, the data from these experiments suggested that visual mental images are depictive representations. By depictive, researchers mean that (1) each part of the representation corresponds to a part of the represented object, such that (2) the distances between representations of the parts (in a representational space) correspond to the distances between the parts on the object itself (see Kosslyn et al., 2006).
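The chronometric logic described above can be illustrated with a toy analysis: if mental rotation proceeds at a roughly constant rate, response times should fall on a line whose slope (in seconds per degree) is the reciprocal of that rate. The following sketch uses hypothetical data, not Shepard and Metzler's published measurements, and fits the line by ordinary least squares:

```python
# Toy illustration of the chronometric logic in mental rotation studies:
# if rotation proceeds at a constant rate, response time (RT) grows
# linearly with the angular disparity between the two objects.
# The data below are hypothetical.

angles = [0, 20, 40, 60, 80, 100, 120, 140, 160, 180]  # degrees
# Hypothetical mean RTs (seconds): ~1 s baseline plus ~1 s per 60 degrees
rts = [1.02, 1.31, 1.70, 1.98, 2.35, 2.64, 3.05, 3.31, 3.68, 4.01]

n = len(angles)
mean_x = sum(angles) / n
mean_y = sum(rts) / n

# Ordinary least-squares slope and intercept
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(angles, rts)) / \
        sum((x - mean_x) ** 2 for x in angles)
intercept = mean_y - slope * mean_x

# The slope is seconds per degree; its reciprocal is the inferred rotation rate.
rate_deg_per_s = 1 / slope
print(f"RT = {intercept:.2f} s + {slope * 1000:.1f} ms/deg * angle")
print(f"Inferred rotation rate: {rate_deg_per_s:.0f} deg/s")
```

The slope recovered from such a fit is what licenses the inference about the underlying process: a reliably linear RT function is what the depictive account predicts if an image is rotated continuously through intermediate orientations.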

However, not all researchers interpreted behavioral results in mental imagery studies as evidence that mental images are depictive. For example, Pylyshyn (1973, 2002, 2003a, 2003b, 2007) proposed a propositional account of mental imagery. According to this view, results obtained in mental imagery experiments can be best explained by the fact that participants rely not on visuo-spatial mental images, but instead on descriptive representations (the sort of representations that underlie language). Pylyshyn (1981) championed the idea that the conscious experience of visualizing an object is purely epiphenomenal, like the power light on an electronic device: the light does not play a functional role in the way the device works. Thus, it became evident that behavioral data would not be sufficient to resolve the mental imagery debate between propositional and depictive theorists. In fact, Anderson (1978) demonstrated that any behavioral data collected in a visual mental imagery experiment could be explained equally well by inferring that depictive representations were processed or that a set of propositions was processed.

As cognitive neuroscience began to elucidate the neural underpinnings of a number of higher cognitive functions, including visual perception, it became evident that neuroimaging data could resolve the imagery debate initiated in the 1970s. The rationale for using neuroimaging to characterize the nature of visual mental images followed directly on the heels of the functional and structural equivalence documented in behavioral studies between visual mental imagery and visual perception (see Kosslyn, 1980). Researchers reasoned that if visual mental imagery relies on representations and cognitive processes similar to those involved during visual perception, then visual mental imagery should rely on the same brain areas that support visual perception (Kosslyn, 1994).

In this chapter we report results collected in positron tomography emission (PET), func­
tional magnetic resonance imagery (fMRI), transcranial magnetic stimulation (TMS), and
brain lesions studies, which converged in showing that visual mental imagery relies on
the same brain areas as those elicited when one perceives the world or initiates an ac­
tion. The different methods serve different means. For example, fMRI allows researchers
to monitor the whole brain at work with a good spatial resolution—by contrasting the
mean blood oxygen level–dependent signal (BOLD) in a baseline condition to the BOLD
signal in an experimental condition. However, fMRI provides information on the correlations between the brain areas activated and the tasks performed but not on the causal relations between the two. In contrast, studies of brain-damaged patients and TMS studies can provide such causal (p. 76) relations. In fact, if performance on a particular task is selectively impaired following a virtual lesion (TMS) or an actual brain lesion, this specific brain
area plays a causal role in the cognitive processes and representations engaged in that
particular task. However, researchers need to rely on previous fMRI or PET studies to de­
cide what specific brain areas to target with TMS or which patients to include in their

Page 3 of 24
Neural Underpinning of Object Mental Imagery, Spatial Imagery, and Motor
Imagery
study. Thus, a comprehensive view of the neural underpinning of any cognitive function
requires taking into account data from all of these approaches.

In this chapter, we first discuss and review the results of studies that document an over­
lap of the neural circuitry in the early visual cortex between visual mental imagery and vi­
sual perception. Then, we present studies that specifically look at the neural underpin­
ning of shape-based mental imagery and spatial mental imagery. Finally, we report stud­
ies on the neural bases of motor imagery and how they overlap with those recruited when
one initiates an action.

Visual Mental Imagery and the Early Visual Areas

The early visual cortex comprises Brodmann areas 17 and 18, which receive input from
the retina. These visual areas are retinotopically organized: Two objects located close to
each other in a visual scene activate neurons in areas 17 and 18 relatively close to each
other (Sereno et al., 1995). Thus, visual space is represented topographically in the visual
cortex using two dimensions: eccentricity and polar angle. “Eccentricity” is the distance
from the fovea (i.e., the high-resolution central region of the visual field) of a point projected
on the retina. Crucially, the farther away a point is located from the fovea, the more anterior the activation observed in the early visual cortex. Drawing on the way eccentricity
is represented on the cortex, Kosslyn and collaborators (1993) used PET to study whether
area 17 was recruited during visual mental imagery of letters. In their task, participants
visualized letters, maintained the mental images of the letters for 4 seconds, and then
were asked to make a judgment about a visual property of the letters—such as whether
the letters possess a straight line. Blood flow was monitored through PET. The hypothesis
was that if visual mental images were depictive and recruited topographical areas of the
visual cortex, then when participants were asked to visualize letters as small as possible
(while remaining visible), the activation of area 17 should be more anterior than when
participants visualized letters as large as possible (while being entirely visible). The re­
sults were consistent with their hypothesis: Large visualized letters activated posterior
regions of area 17, whereas small visualized letters recruited anterior regions of area 17.
Kosslyn, Thompson, Kim, and Alpert (1995) replicated the results in a PET study in which
participants visualized line drawings of objects previously memorized in boxes of three
different sizes. These two studies used a neuroimaging technique with limited spatial resolution, which led some to raise doubts about these results.

However, similar findings were reported when fMRI was used—a technique that provides
a better spatial resolution of the brain areas activated. For example, Klein, Paradis, Po­
line, Kosslyn, and Lebihan (2000) in an event-related fMRI study documented an activa­
tion of area 17 that started 2 seconds after the auditory cue prompting participants to
form a mental image, peaked around 4 to 6 seconds, and dropped off after 8 seconds or
so. In a follow-up experiment, Klein et al. (2004) demonstrated that the orientation with
which a bowtie shape stimulus was visualized modulated the activation of the visual cortex. The activation elicited by visualizing the bowtie vertically or horizontally matched the
cortical representations of the horizontal and vertical meridians. Moreover, in an fMRI
study, Slotnick, Thompson, and Kosslyn (2005) found that the retinotopic maps produced
by the visual presentation of rotating and flickering checkerboard wedges were similar to
the ones produced when rotating and flickering checkerboard wedges were visualized.
To some extent, those imagery maps resembled the perceptual maps more closely than did the maps produced in an attention-based condition. Finally, Thirion and colleagues (2006) adopted an “inverse retinotopy” approach to infer the content of visual images based on the brain activations observed. Participants were asked in a perceptual condition to fixate rotating Gabor patches
and in the mental imagery condition to visualize one of the six Gabor patches rotating
right or left of a fixation point. The authors were able to accurately predict the stimuli participants had seen and, to a certain degree, the stimuli participants had visualized. Crucially,
most of the voxels leading to a correct prediction of the stimuli visualized or presented visually were located in areas 17 and 18.

Figure 5.1 Set of stimuli (left) and mean response times for each participant (noted 1 to 5) in the two experimental conditions (perception vs. mental imagery) as a function of the repetitive transcranial magnetic stimulation (rTMS) condition (real vs. sham).

Taken together, results from fMRI and PET studies converge in showing that visual men­
tal imagery activates the early visual areas and that the spatial structure of the activa­
tions elicited by the mental imagery task is accounted for by standard (p. 77) retinotopic
mapping. Nevertheless, the question remained as to whether activation of the early visual areas plays any functional role in visual imagery. In order to address this question,
Kosslyn et al. (1999) designed a task in which participants first memorized four patterns
of black and white stripes (which varied in length, width, orientation, and spacing of the
stripes; Figure 5.1) in four quadrants, and then were asked to visualize two of the pat­
terns and to compare them on a given visual attribute (such as the orientation of the
stripes). The same participants performed the task in a perceptual condition in which
their judgments were based on patterns of stripes displayed on the screen. In both condi­
tions, before comparing two patterns of stripes, repetitive TMS (rTMS) was delivered to
the early visual cortex—which had been shown to be activated using PET. In rTMS studies, a coil is used to deliver low-frequency magnetic pulses, which decrease cortical excitability for several minutes in the cortical area targeted (see Siebner et al. 2000). This
technique has the advantage that the disruption is reversible and lasts for a few minutes.
In addition, because the disruption is transient, there are no compensatory phenomena as
with real lesions. When stimulation was delivered to the posterior occipital lobe (real
rTMS condition), participants required more time to compare two patterns of stripes than
when stimulation was delivered away from the brain (in a sham rTMS condition). The effect of real rTMS (as denoted by the difference between the sham and real stimulations; see Figure 5.1) was similar in visual mental imagery and visual perception, which makes sense if
area 17 is critical for both.

Sparing et al. (2002) used another TMS approach to determine whether visual mental im­
agery modulates cortex excitability. The rationale of their approach was to use the
phosphene threshold (PT; i.e., the minimum TMS intensity that evokes phosphenes) to determine the cortical excitability of the primary visual areas of the brain. Single-pulse TMS was delivered over the primary visual cortex to produce phosphenes in the right lower
quadrant of the visual field. Concurrently, participants performed either a visual mental
imagery task or an auditory control task. For each participant, the PT was determined by
increasing TMS intensity on each trial until participants reported experiencing
phosphenes. Visual mental imagery decreased the PT compared with the baseline condi­
tion, whereas the auditory task had no effect on the PT. The results indicate that visual
mental imagery enhances excitability in the visual cortex, which supports the functional role of the primary visual cortex in visual mental imagery. Consistent with the role
of area 17 in visual mental imagery, the apparent horizontal (p. 78) size of visual mental
images of a patient who had the occipital lobe surgically removed in one hemisphere was
half the apparent size of mental images in normal participants (Farah, Soso, & Dasheiff,
1992).

However, not all studies reported a functional role of area 17 in visual mental imagery. In
fact, neuropsychological studies offered compelling evidence that cortically blind patients
could have spared visual mental imagery abilities (see Bartolomeo, 2002, 2008). Anton
(1899) and Goldenberg, Müllbacher, and Nowak (1995) reported cortically blind patients
who seemed to be able to form visual mental images. In addition, Chatterjee and Southwood (1995) reported two patients, cortically blind as a result of medial occipital lesions, with no impairment of their capacity to imagine object forms—such as capital letters or
common animals. These two patients could also draw a set of common objects from mem­
ory.

Finally, Kosslyn and Thompson (2003) reviewed more than 50 neuroimaging studies (fMRI, PET, and single-photon emission computed tomography, or SPECT) and found that in
nearly half, there was no activation of the early visual cortex. A meta-analysis of the neu­
roimaging literature of visual mental imagery revealed that three factors accounted for
the probability of activation in area 17. Sensitivity of the technique is one such factor: 19 of 27 fMRI studies reported activation in area 17, compared with only 2 of 9 SPECT studies. The degree of detail of the visual mental images that need to be generated is also important, with high-resolution mental images more likely to elicit activation in area 17. Finally, the type of judgment accounts for
the probability of activation in area 17: If spatial judgment is required, activation in area
17 is less likely. Thus, activation in area 17 most likely reflects the computation needed to
generate the visual images, at least for certain types of mental images, such as high-reso­
lution, shape-based mental images.

Visual Mental Imagery and Higher Visual Areas


The overlap of the brain areas elicited by visual perception and mental imagery was stud­
ied not only in early visual areas but also in higher visual areas. The visual system is orga­
nized hierarchically, with early visual cortical areas (areas 17 and 18) located on the low­
est level (see Felleman & Van Essen, 1991). Brain lesions and neuroimaging studies docu­
ment that the visual system is then organized in two parallel streams with different func­
tions (e.g., Goodale & Milner, 1992; Haxby et al., 1991; Ungerleider & Mishkin, 1982).
The ventral stream (running from the occipital lobes down to the inferior temporal lobes)
is specialized in processing object properties of percepts (such as shape, color, and tex­
ture), whereas the dorsal stream (running from the occipital lobes up to the posterior
parietal lobes) is specialized in processing spatial properties of percepts (such as orientation and location) and action (but see Borst, Thompson, & Kosslyn, 2011, for a discussion). A critical finding is that parallel deficits occur in visual mental imagery (e.g.,
Levine, Warach, & Farah, 1985): Damage to the ventral stream disrupts the ability to visualize the shape of objects (such as the shape of a stop sign), whereas damage to the dorsal stream disrupts the ability to create a spatial mental image (such as the locations of
landmarks on a map).

In the next section, we review neuroimaging and brain-damaged patient studies showing
that shape-based mental imagery (including mental images of faces) and visual percep­
tion engage most of the same higher visual areas in the ventral stream and that spatial
mental imagery and spatial vision recruit most of the same areas in the dorsal stream.

Ventral Stream, Shape-Based Mental Imagery, and Color Imagery

Brain imaging and neuropsychological data document a spatial segregation of visual ob­
ject representations in the higher visual areas. For example, Kanwisher and Yovel (2006)
demonstrated that the lateral fusiform gyrus responds more strongly to pictures of faces
than to other categories of objects, whereas the medial fusiform gyrus and the parahippocampal gyri respond selectively to pictures of buildings (Downing, Chan, Peelen,
Dodds, & Kanwisher, 2006).

To demonstrate the similarity between the cognitive processes and representations in vi­
sion and visual mental imagery, researchers investigated whether the spatial segregation
of visual objects in the ventral stream can be found during shape-based mental imagery.
Building on this logic, O’Craven and Kanwisher (2000) asked a group of participants either to recognize pictures of familiar faces and buildings or to visualize those pictures in
an fMRI study. In the perceptual condition, a direct comparison of activation elicited by
the two types of stimuli (buildings and faces) revealed a clear segregation within the ventrotemporal cortex—with activation found in the fusiform face area (FFA) for faces and in the parahippocampal place area (PPA) for buildings. In the visual mental imagery condition, (p. 79) the same pattern was observed but with weaker activation and smaller patches of cortex activated. Crucially, there was no hint of activation of the FFA when participants visualized buildings, nor of the PPA when they visualized faces. The similarity between vision and mental imagery in the higher visual areas was further demonstrated by
the fact that more than 84 percent of the voxels activated in the mental imagery condition
were activated in the perceptual condition.

These results were replicated in another fMRI study (Ishai, Ungerleider, & Haxby, 2000).
In this study, participants were asked either to view passively pictures of three object categories (i.e., faces, houses, and chairs), to view scrambled versions of these pictures
(perceptual control condition), to visualize the pictures while looking at a gray back­
ground, or to stare passively at the gray background (mental imagery control condition).
When the activations elicited by the three object categories were compared in the perceptual
condition—after removing the activation in the respective control condition—different re­
gions in the ventral stream showed differential responses to faces (FFA), houses (PPA),
and chairs (inferior temporal gyrus). Fifteen percent of the voxels in these three ventral
stream regions showed a similar pattern of activation in the mental imagery condition.
Mental images of the three categories of objects elicited additional activation in the pari­
etal and the frontal regions that were not found in the perceptual condition.

In a follow-up study, Ishai, Haxby, and Ungerleider (2002) studied the activation elicited
by famous faces either presented visually or visualized. In the mental imagery condition,
participants studied pictures of half of the famous faces before the experiment. For the
other half of the trials, participants had to rely on their long-term memory to generate the
mental images of the faces. In the mental imagery and perceptual conditions, the FFA
(lateral fusiform gyrus) was activated, and 25 percent of the voxels activated in the men­
tal imagery condition were within regions recruited during the perceptual condition. The
authors found that activation within the FFA was stronger for faces studied before the ex­
periment than for faces generated on the basis of information stored in long-term memo­
ry. In addition, given that visual attention did not modulate the activity recorded in higher
visual areas, Ishai and colleagues argued that attention and mental imagery are dissociat­
ed to some degree.

Finally, although mental imagery and perception recruit the same category-selective ar­
eas in the ventral stream, these areas are activated predominantly through bottom-up in­
puts during perception and through top-down inputs during mental imagery. In fact, a
new analysis of the data reported by Ishai et al. (2000) revealed that the functional connectivity of ventral stream areas was stronger with the early visual areas in visual perception, whereas during visual mental imagery, stronger functional connections were found between the higher visual areas and the frontal and parietal areas (Mechelli, Price, Friston, & Ishai, 2004).

A recent fMRI study further supported the similarity of the brain areas recruited in the
ventral stream during visual mental imagery and visual perception (Stokes, Thompson,
Cusack, & Duncan, 2009). In this study, participants were asked to visualize an “X” or an
“O” based on an auditory cue, or to view passively the two capital letters displayed on a
computer screen. During both conditions (i.e., visual mental imagery and visual percep­
tion), the visual cortex was significantly activated. Above-baseline activation was record­
ed in the calcarine sulcus, cuneus, and lingual gyrus, and it extended to the fusiform and
middle temporal gyri. In addition, in both conditions, a multivoxel pattern analysis re­
stricted to the anterior and posterior regions of the lateral occipital cortex (LOC) re­
vealed that different populations of neurons code for the two types of stimuli (“X” and
“O”). Critically, a perceptual classifier trained on patterns of activation elicited by the
perceptual presentation of the stimuli was able to predict the type of visual images gener­
ated in the mental imagery condition. The data speak to the fact that mental imagery and
visual perception activate shared content-specific representations in high-level visual ar­
eas, including in the LOC.

Brain lesion studies generally present converging evidence that mental imagery and perception rely on the same cortical areas in the ventral stream (see Ganis, Thompson, Mast, & Kosslyn, 2003). The logic underlying these studies is that if visual mental
imagery and perception engage the same visual areas, then the same pattern of impairment should be observed in the two functions. Moreover, given that different visual mental
images (i.e., houses vs. faces) selectively elicit activation in different areas of the ventral
stream, an impairment in one domain of mental imagery (color or shape) should yield a parallel deficit in that specific domain in visual perception. In fact, patients with impairment in face recognition (i.e., prosopagnosia) are impaired in their ability to visualize
faces (Shuttleworth, Syring, & Allen, 1982; (p. 80) Young, Humphreys, Riddoch, Hellawell,
& De Haan, 1994). A selective deficit in identifying animals in a single case study was accompanied by a similar deficit in describing animals or drawing them from memory (Sartori & Job, 1988). In addition, as revealed by an early review of the literature (Farah, 1984), object agnosia was generally associated with deficits in the ability to visualize objects. Even
finer parallel deficits can be observed in the ventral stream. For example, Farah, Hammond, Mehta, and Ratcliff (1989) reported the case of a prosopagnosic patient with a specific deficit in his ability to visualize living things (such as animals or faces) but not in his
ability to visualize nonliving things. In addition, some brain-damaged patients cannot dis­
tinguish colors perceptually and present similar deficits in color imagery (e.g., Rizzo,
Smith, Pokorony, & Damasio, 1993). Critically, patients with color perception deficits have
good general mental imagery abilities but are specifically impaired in color mental im­
agery tasks (e.g., Riddoch & Humphreys, 1987).

However, not all neuropsychological findings report parallel deficits in mental imagery
and perception. Cases have been reported of patients who had altered perception but preserved imagery (e.g., Bartolomeo et al., 1998; Behrmann, Moscovitch, & Winocur, 1994; Servos & Goodale, 1995). For example, Behrmann et al. (1994) reported the case of C.K.,
a brain-damaged patient with a left homonymous hemianopia and a possible thinning of
the occipital cortex (as revealed by a PET and MRI scan) who was severely impaired at
recognizing objects but who had no apparent deficit in shape-based mental imagery. In
fact, C.K. could draw objects with considerable detail from memory and could use infor­
mation derived from visual images in a variety of tasks. Conversely, he could not identify
objects presented visually, even those he drew from memory. A similar dissociation be­
tween perceptual impairments and relative spared ability in mental imagery was ob­
served in Madame D. (Bartolomeo et al., 1998). Following bilateral brain lesions to the ex­
trastriate visual areas (i.e., Brodmann areas 18, 19 bilaterally and 37 in the right hemi­
sphere), Madame D. developed severe alexia, agnosia, prosopagnosia, and achromatop­
sia. Her ability to recognize objects presented visually was severely impaired except for
very simple shapes like geometric figures. In contrast, she could draw objects from mem­
ory, but she could not identify them. She performed well on an object mental imagery
test. Her impairment was not restricted to shape processing. In fact, she could not dis­
criminate between colors, match colors, or point to the correct color. In sharp contrast,
she presented no deficit in color imagery, being able, for example, to determine which of
two objects had a darker hue when presented with a pair of object names.

In some instances, studies reported the reverse pattern of dissociation with relatively nor­
mal perception associated with deficits in visual mental imagery (e.g., Goldenberg, 1992;
Guariglia, Padovani, Pantano, & Pizzamiglio, 1993; Jacobson, Pearson, & Robertson,
2008; Moro, Berlucchi, Lerch, Tomaiuolo, & Aglioti, 2008). For example, two patients who
performed a battery of mental imagery tests in several sensory domains (visual, tactile,
auditory, gustatory, olfactory, and motor) showed pure visual imagery deficit for one and
visual and tactile imagery deficit for the other. Critically, the two patients had no appar­
ent perceptual, language, or memory deficits (Moro et al., 2008). Lesions were located in
the middle and inferior temporal gyri of the left hemisphere in one patient and in the tem­
poro-occipital area and the left medial and superior parietal lobe in the other patient.

The fact that some brain-damaged patients can present spared mental imagery with
deficit in visual perception or spared visual perception with deficit in mental imagery
could reveal a double dissociation between shape- and color-based imagery and visual
perception. In fact, visualizing an object relies on top-down processes that are not always
necessary to perceive this object, whereas perceiving an object relies on bottom-up orga­
nizational processes not required to visualize it (e.g., Ganis, Thompson, & Kosslyn, 2004;
Kosslyn, 1994). This double dissociation is supported by the fact that not all of the same
brain areas are activated during visual mental imagery and visual perception (Ganis et
al., 2004; Kosslyn, Thompson, & Alpert, 1997). In an attempt to quantify the similarity be­
tween visual mental imagery and visual perception, Ganis et al. (2004) in an fMRI study
asked participants to judge visual properties of objects (such as whether the object was
taller than wide) based either on a visual mental image of that object or on a picture of
that object presented visually. Across the entire brain, the amount of overlap of the brain
regions activated during visual mental imagery and visual perception reached 90 percent.
The amount of overlap in activation was smaller in the occipital and temporal lobes than
in the frontal and parietal lobes, which suggests that perception relies in part on bottom-
up organizational processes that are not used as extensively during mental imagery. How­
ever, visual imagery elicited activation in regions that were a (p. 81) subset of the regions
activated during the perceptual condition.

Dorsal Stream and Spatial Mental Imagery

In the same way that researchers have studied brain areas in the ventral stream involved
in shape- and color-based mental imagery, researchers have identified brain areas recruit­
ed during spatial mental imagery in the dorsal stream. A number of neuroimaging studies
used a well-understood mental imagery phenomenon to investigate the brain areas recruited during spatial mental imagery, namely, image scanning.

In the image scanning paradigm, participants first learn a map of an island with a num­
ber of landmarks, then they mentally scan the distance between each pair of landmarks
after hearing the names of a pair of landmarks (e.g., Denis & Cocude, 1989; Kosslyn et al.,
1978). The landmarks are positioned in such a way that distances between each pair of
landmarks are different. The classic finding is a linear increase of response times with in­
creasing distance between landmarks (see Denis & Kosslyn, 1999). The linear relation­
ship between distance and scanning times suggests that spatial images incorporate the
metric properties of the objects they represent—which constitutes some of the evidence
that spatial images depict information. In a PET study, Mellet, Tzourio, Denis, and Mazoyer (1995) investigated the neural basis of image scanning. After learning the map of a circular island, participants were asked either to scan between the landmarks on a map presented visually, in a clockwise or counterclockwise direction, or to scan a mental image of
the same map in the same way. When compared with a rest condition, both conditions
elicited brain activation in the bilateral superior external occipital regions and in the left
internal parietal region (precuneus). However, primary visual areas were activated only in
the perceptual condition.

fMRI studies provided further evidence that spatial processing of spatial images and spa­
tial processing of the same material presented visually share the same brain areas in the
dorsal stream (e.g., Trojano et al., 2000, 2004). For example, Trojano et al. (2000) asked
participants to visualize two analogue clock faces and then to decide on which of them the clock hands formed the greater angle. In the perceptual condition, the participants' task was identical, but the two clock faces were presented visually. When compared with
a control condition (i.e., participants judged which of the two times was numerically
greater), the mental imagery condition elicited activation in the posterior parietal cortex
and several frontal regions. In both conditions, brain activation was found in the intraparietal sulcus (IPS). Critically, when the two conditions (imagery and perception) were
directly contrasted, the activity in the IPS was no longer observed. The neuroimaging da­
ta suggest that the IPS supports spatial processing of mental images and of visual per­
cepts. In a follow-up study using the clock-face mental imagery task in an event-related
fMRI study, Formisano et al. (2002) found similar activation of the posterior parietal cortex with a peak of activation in the IPS 2 seconds after the auditory presentation of the
hours to visualize.

Interestingly, the frontoparietal network at play during spatial imagery is not restricted to
the processing of static spatial representation. In fact, Kaas, Weigelt, Roebroeck, Kohler,
and Muckli (2010) studied the brain areas recruited when participants were imagining
objects in movement using fMRI. In the motion imagery task, participants were asked to
visualize a blue ball moving back and forth within either the upper right corner or the
lower left corner of a computer screen. Participants imagined the motion of the ball at different speeds—adjusted as a function of the duration of an auditory cue. To determine
whether participants visualized the ball at the correct speed, participants were required
upon hearing a specific auditory cue to decide which of two visual targets was closer to
the imagined blue ball. The motion imagery task elicited activation in a parietofrontal net­
work comprising bilaterally the superior and inferior parietal lobules (areas 7 and 40) and
the superior frontal gyrus (area 6), in addition to activation in the left middle occipital
gyrus and hMT/V5+. Finally, in V1, V2, and V3, a negative BOLD response was found.
Kaas and colleagues argue that this negative BOLD signal might reflect an inhibition of these areas to prevent visual inputs from interfering with motion imagery in higher visual areas such as hMT/V5+.

The recruitment of the dorsal route for spatial imagery is not restricted to the modality in
which information is presented. Mellet et al. (2002) found similar activation in a pari­
etofrontal network (i.e., intraparietal sulcus, presupplementary motor area, and superior
frontal sulcus) when participants mentally scan an environment described verbally or an
environment learned visually. Activation of similar brain areas in the dorsal route is also
observed when participants generate spatial images of cubes assembled on the basis of
verbal information (Mellet et al., 1996). In addition, neuroimaging studies on (p. 82) blind
participants suggest that the representations and cognitive processes in spatial imagery are not necessarily visual. For example, during a spatial mental imagery task, the superior occipital cortex (area 19), the precuneus, and the superior parietal lobes (area 7) were activated in the
same way in sighted and early blind participants (Vanlierde, de Volder, Wanet-Defalque, &
Veraart, 2003). The task required participants to generate a pattern in a 6 × 6 grid by fill­
ing in cells based on verbal instructions. Once they generated the mental image of the
pattern, participants judged the symmetry of this pattern. The fact that vision is not nec­
essary to form and to process spatial images was further demonstrated in an rTMS study.
Aleman et al. (2002) found that participants required more time to determine whether a
cross presented visually “fell” on the uppercase letter they visualized in a real rTMS con­
dition (compared with a sham rTMS condition) only when repetitive pulses were delivered
over the posterior parietal cortex (P4 position) but not when delivered over the early visual
cortex (Oz position).

The functional role of the dorsal route in spatial mental imagery is supported by results
collected on brain-damaged patients (e.g., Farah et al., 1988; Levine et al., 1985; Luzzatti,
Vecchi, Agazzi, Cesa-Bianchi, & Vergani, 1998; Morton & Morris, 1995). For example,
Morton and Morris (1995) reported a patient called M.G. with a left parieto-occipital lesion who was selectively impaired in visuo-spatial processing. M.G. had no deficit in face
recognition or visual memory tests, nor in an image inspection task. In contrast, she was impaired not only on a mental rotation task but also on an image scanning task. She could learn the map of the island and indicate the correct positions of the landmarks, but she was not able to mentally scan the distances between the landmarks. She presented a similar deficit when asked to scan the contours of block letters.

Motor Imagery
In the previous sections, we presented evidence that visual mental imagery and spatial imagery rely on the same brain areas as those engaged during vision and spatial vision, respectively. Given that motor imagery occurs when a movement is mentally simulated, motor imagery should recruit brain areas involved in physical movement. And in fact, a growing body of evidence indicates that motor areas are activated during motor imagery. In the next section, we review evidence that motor imagery engages the same brain areas as those recruited during a physical movement, including in some instances the primary motor cortex, and that motor imagery is one of the strategies used to
transform mental images.

Motor Imagery and Physical Movement

Decety and Jeannerod (1995) demonstrated that if one is asked to mentally walk from point A to point B, the time to complete this “mental travel” is similar to the time one would take to actually walk that distance. This mental travel effect (i.e., the similarity between the time to imagine an action and the time to perform it) constitutes strong evidence that motor imagery simulates actual physical movements. Motor imagery is a particular
type of mental imagery and differs from visual imagery (and to a certain extent from spa­
tial imagery). In fact, a number of studies have documented that visual mental imagery
and motor imagery rely on distinct mechanisms and brain areas (Tomasino, Borroni, Isaja, & Rumiati, 2005; Wraga, Shephard, Church, Inati, & Kosslyn, 2005; Wraga, Thompson, Alpert, & Kosslyn, 2003). Single-cell recordings from the motor strip of monkeys first demonstrated that motor imagery relies partially on areas of the cortex that carry out motor control: Neurons in the motor cortex fired in sequence depending on their directional tuning while monkeys were planning to move a lever along a specific arc (Georgopoulos, Lurito, Petrides, Schwartz, & Massey, 1989). Crucially, the neurons fired when the animals were preparing to move their arms, not actually moving them.
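The population-level readout behind this result can be sketched as follows. This is an illustrative simulation under assumed cosine directional tuning, not the original analysis; the cell count, baseline, and gain values are arbitrary assumptions.

```python
import math

# Illustrative sketch of the population-vector readout associated with
# Georgopoulos and colleagues: each motor-cortex cell fires most strongly
# for its preferred movement direction, and the planned direction can be
# decoded as the firing-rate-weighted sum of preferred-direction vectors.
# The cell count, baseline, and gain below are assumed values, not data.

def cosine_rate(preferred_deg, movement_deg, baseline=10.0, gain=8.0):
    """Firing rate (spikes/s) of a cell with cosine directional tuning."""
    delta = math.radians(movement_deg - preferred_deg)
    return max(0.0, baseline + gain * math.cos(delta))  # rates clipped at zero

def population_vector(preferred_dirs_deg, rates):
    """Decode the movement direction (degrees) from the weighted vector sum."""
    x = sum(r * math.cos(math.radians(p)) for p, r in zip(preferred_dirs_deg, rates))
    y = sum(r * math.sin(math.radians(p)) for p, r in zip(preferred_dirs_deg, rates))
    return math.degrees(math.atan2(y, x)) % 360

preferred = [i * 30 for i in range(12)]  # 12 cells evenly tiling 0-330 degrees
rates = [cosine_rate(p, movement_deg=75) for p in preferred]
print(round(population_vector(preferred, rates)))  # -> 75
```

The key point carried over to imagery is that this decoded direction exists at the planning stage, before (or without) any overt movement.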

To study motor imagery in humans, researchers have often used mental rotation paradigms. In
the seminal mental rotation paradigm designed by Shepard and Metzler (1971), a pair of
3D objects with several arms (each consisting of small cubes) is presented visually (Fig­
ure 5.2). The task of the participants is to decide whether the two objects have the same
shape, regardless of difference in their orientation. The key finding is that the time to
make this judgment increases linearly as the angular disparity between the two objects
increases (i.e., the mental rotation effect). Subsequent studies showed that the mental rotation effect is found with alphanumeric stimuli (e.g., Cooper & Shepard, 1973; Koriat &
Norman, 1985), two-dimensional line drawings of letter-like asymmetrical characters
(e.g., Tarr & Pinker, 1989), and pictures of common objects (e.g., Jolicoeur, 1985).
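The chronometric signature can be summarized with a simple linear model of response time against angular disparity. The intercept and slope below are illustrative assumptions (rotation rates vary across studies and stimuli), not Shepard and Metzler's published estimates.

```python
# A hedged sketch of the chronometric signature of mental rotation:
# response time increases linearly with the angular disparity between
# the two objects. The intercept (encoding, comparison, and response
# time) and the slope (rotation rate) are illustrative assumptions,
# not the published Shepard and Metzler (1971) estimates.

INTERCEPT_S = 1.0        # assumed fixed cost: encode, compare, respond
SECONDS_PER_DEG = 0.018  # assumed rotation rate (~55 degrees per second)

def predicted_rt(angular_disparity_deg):
    """Predicted response time (s) for a 'same shape' judgment."""
    return INTERCEPT_S + SECONDS_PER_DEG * angular_disparity_deg

for angle in (0, 60, 120, 180):
    print(f"{angle:3d} deg -> {predicted_rt(angle):.2f} s")
```

The slope of this line is what motor-strategy manipulations are expected to modulate, while the intercept absorbs rotation-independent encoding and response processes.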

Figure 5.2 Example of a pair of Shepard and Metzler–like three-dimensional objects with (a) identical and (b) different shapes, with a 50-degree rotation of the object on the right.

Richter et al. (2000) in an fMRI study found that mental rotation of Shepard and Metzler
stimuli elicited activation in the superior parietal lobes bilaterally, the supplementary mo­
tor cortex, and the left (p. 83) primary motor cortex. Results from a hand mental rotation
study provided additional evidence that motor processes were involved during image
transformation (Parsons et al., 1995). Pictures of hands were presented in the right or left
visual field with different orientations, and participants determined whether each picture
depicted a left or right hand. Parsons and colleagues reasoned that the motor cortex
would be recruited if participants mentally rotated their own hand in congruence with the
orientation of the stimulus presented to make their judgment. Bilateral activation was
found in the supplementary motor cortex, and critically, activation in the prefrontal and
the insular premotor areas occurred in the hemisphere contralateral to the stimulus
handedness. Activation was not restricted to brain areas that implemented motor func­
tions; significant activation was also reported in the frontal and parietal lobes as well as
in area 17.

According to Decety (1996), image rotation occurs because we anticipate what we would see if we manipulated an object, which implies that motor areas are recruited during mental rotation regardless of the category of objects rotated. Kosslyn, DiGirolamo, Thompson,
and Alpert (1998) in a PET study directly tested this assumption by asking participants ei­
ther to mentally rotate inanimate 3D armed objects or pictures of hands. In both condi­
tions, the two objects (or the two hands) were presented with different angular dispari­
ties, and participants judged whether the two objects (or hands) were identical. To deter­
mine the brain areas specifically activated during mental rotation, each experimental condition was compared with a baseline condition in which the two objects (or hands) were
presented in the same orientation. The researchers found activation in the primary motor
cortex (area M1), premotor cortex, and posterior parietal lobe when participants rotated
hands. In contrast, none of the frontal motor areas was activated when participants men­
tally rotated inanimate objects. The findings suggest that there are at least two ways ob­
jects in images can be rotated: one that relies heavily on motor processes, and one that
does not. However, the type of stimuli rotated might not predict when the motor cortex is
recruited. In fact, Cohen et al. (1996) in an fMRI study found that motor areas were activated in half of the participants in a mental rotation task using 3D armed objects similar to the ones used in the Kosslyn et al. (1998) study.

Strategies in Mental Rotation Tasks

The fact that mental rotation of inanimate objects elicits activation in frontal motor areas
in some participants but not others suggests that there might be more than one strategy
to rotate this type of object. Kosslyn, Thompson, Wraga, and Alpert (2001) tested whether, in a mental rotation task with 3D armed objects, participants could imagine the rotation of the objects in two different ways: as if an external force (such as a motor) were rotating the objects (i.e., the external action condition), or as if the objects were being physically manipulated (i.e., the internal action condition).
and practice procedures to prompt them to use one of the two strategies (external action
vs. internal action). In the practice of the external action condition, a wooden model of a
typical Shepard and Metzler object was rotated by an electric motor. In contrast, in the
internal condition, participants rotated the wooden model physically. The object used dur­
ing practice was not used on the experimental trials. On each new set of trials, partici­
pants were instructed to mentally rotate the object in the exact same way the wooden
model was rotated in the preceding practice session. The crucial finding was that area
M1 was activated when participants mentally rotated the object on the internal action tri­
als but not on the external action trials. However, posterior parietal and secondary motor
(p. 84) areas were recruited in both conditions. The results have two implications: First, mental rotation in general (independently of the type of stimuli) can be achieved by imagining the physical manipulation of the object. Second, participants can adopt one or the
other strategy voluntarily regardless of their cognitive styles or cognitive abilities.

However, the previous study left open the question of whether one can spontaneously use
a motor strategy to perform a mental rotation task of inanimate objects. Wraga et al.
(2003) addressed this issue in a PET study. In their experiment, participants performed either a mental rotation task with pictures of hands (similar to the one used by Kosslyn et al., 1998) followed by a Shepard and Metzler rotation task, or two Shepard and Metzler tasks in succession. The authors reasoned that for the group that started with the mental rotation task of hands, motor processes involved in the hand rotation task would covertly transfer to the Shepard and Metzler task. In fact, when brain activation in the two groups of participants was compared during the second mental rotation task (a Shepard and Metzler task in both groups), activation in motor areas (areas 6 and M1) was found only in the group that performed a hand rotation task before the Shepard and Metzler task (Figure 5.3).
The results clearly demonstrate that motor processes can be used spontaneously to men­
tally rotate objects that are not body parts.

Functional Role of Area M1

Figure 5.3 Brain activations observed in the internal action minus the external action conditions.

The studies we reviewed suggest that area M1 plays a role in the mental transformation
of objects. However, none addressed whether M1 plays a functional role in mental trans­
formation and more specifically in mental rotation of objects. To test this issue, Ganis,
Keenan, Kosslyn, and Pascual-Leone (2000) administered single-pulse TMS to the left pri­
mary motor cortex of participants while they performed mental rotations of line drawings
of hands or feet presented in their right visual field. Single-pulse TMS was administered
at different time intervals from the stimulus onset (400 or 650 ms) to determine when pri­
mary motor areas are recruited during mental rotation. In addition, to test whether men­
tal rotation of body parts is achieved by imagining the movement of the corresponding
part of the body, single-pulse TMS was delivered specifically to the hand area of M1. Par­
ticipants required more time and made more errors when a single-pulse TMS was deliv­
ered to M1, when the single-pulse TMS was delivered 650 ms rather than 400 ms after
stimulus onset, and when participants mentally rotated hands rather than feet. Within the
limits of the spatial resolution of the TMS methodology, the results suggest that M1 is re­
quired to perform mental rotation of body parts by mapping the movement on one’s own
body part but only after the visual and spatial relations of the stimuli have been encoded.
Tomasino et al. (2005) reported converging data supporting the functional role of M1 in
mental rotation by using a mental rotation task of hands in a TMS study.

However, the data are not sufficient to claim that the computations are actually taking
place in M1. It is possible that M1 relays information computed elsewhere in the brain
(such as in the posterior parietal cortex). And in fact, Sirigu, Duhamel, Cohen, Pillon,
Dubois, and Agid (1996) demonstrated that the parietal cortex, not the motor cortex, is
critical to generate mental movement representations. Patients with lesions restricted to
the parietal cortex showed a deficit in predicting the time necessary to perform specific finger movements, whereas no such deficit was reported for a patient with lesions restricted
to M1.

Conclusion
Some remain dubious that mental imagery can be functionally meaningful and can consti­
tute a topic of research on its own. However, by moving away from a purely introspective approach to mental imagery and embracing more objective approaches, notably by using
neuroimaging, researchers have collected evidence that mental images are depictive rep­
resentations interpreted by cognitive processes at play in other systems—like the percep­
tual and the (p. 85) motor systems. In fact, we hope that this review of the literature has
made clear that there is little evidence to counter the concepts that most of the same
neural processes underlying perception are also used in visual mental imagery and that
motor imagery can recruit the motor system in a similar way that physical action does.
Researchers now rely on what is known of the organization of the perceptual and motor
systems and of the key features of the neural mechanisms in those systems to refine the
characterization of the cognitive mechanisms at play in the mental imagery system. The
encouraging note is that each new characterization of the perceptual and motor systems
brings a chance to better understand neural mechanisms at play in mental imagery.

Finally, with the ongoing development of more elaborate neuroimaging techniques and
analyses of the BOLD signal, mental imagery researchers have an increasing set of tools
at their disposal to resolve complicated questions about mental imagery. A number of questions remain to be answered in order to achieve a full understanding of the neural mechanisms underlying shape, color, spatial, and motor imagery. For example, although much evidence points toward an overlap of perceptual and visual mental imagery processes in high-level visual cortices (the temporal and parietal lobes), evidence remains mixed at this
point concerning the role of lower level processes in visual mental imagery. Indeed, we
need to understand the circumstances under which the early visual cortex is recruited
during mental imagery. Another problem that warrants further investigation is the neural
basis of the individual differences observed in mental imagery abilities. As a prerequisite,
we will need to develop objective methods to measure individual differences in those abilities.

References
Aleman, A., Schutter, D. J. L. G., Ramsey, N. F., van Honk, J., Kessels, R. P. C., Hoogduin, J.
M., Postma, A., Kahn, R. S., & de Haan, E. H. F. (2002). Functional anatomy of top-down
visuospatial processing in the human brain: Evidence from rTMS. Cognitive Brain Re­
search, 14, 300–302.

Anderson, J. R. (1978). Arguments concerning representations for mental imagery. Psychological Review, 85, 249–277.

Anton, G. (1899). Über die Selbstwahrnehmungen der Herderkranungen des Gehirns
durch den Kranken bei Rindenblindheit. Archiv für Psychiatrie und Nervenkrankheiten,
32, 86–127.

Bartolomeo, P. (2002). The relationship between visual perception and visual mental im­
agery: A reappraisal of the neuropsychological evidence. Cortex, 38, 357–378.

Bartolomeo, P. (2008). The neural correlates of visual mental imagery: An ongoing de­
bate. Cortex, 44, 107–108.

Bartolomeo, P., Bachoud-Levi, A. C., De Gelder, B., Denes, G., Dalla Barba, G., Brugieres,
P., et al. (1998). Multiple-domain dissociation between impaired visual perception and
preserved mental imagery in a patient with bilateral extrastriate lesions. Neuropsycholo­
gia, 36, 239–249.

Behrmann, M., Moscovitch, M., & Winocur, G. (1994). Intact visual imagery and impaired
visual perception in a patient with visual agnosia. Journal of Experimental Psychology:
Human Perception and Performance, 20, 1068–1087.

Borst, G., Thompson, W. L., & Kosslyn, S. M. (2011). Understanding the dorsal and ventral
systems of the cortex: Beyond dichotomies. American Psychologist, 66, 624–632.

Chatterjee, A., & Southwood, M. H. (1995). Cortical blindness and visual imagery. Neurol­
ogy, 45.

Cohen, M. S., Kosslyn, S. M., Breiter, H. C., DiGirolamo, G. J., Thompson, W. L.,
Bookheimer, S. Y., Belliveau, J. W., & Rosen, B. R. (1996). Changes in cortical activity dur­
ing mental rotation: A mapping study using functional MRI. Brain, 119, 89–100.

Cooper, L. A., & Shepard, R. N. (1973). Chronometric studies of the rotation of mental images. In W. G. Chase (Ed.), Visual information processing (pp. 75–176). New York: Academic Press.

Decety, J. (1996). Neural representation for action. Reviews in the Neurosciences, 7, 285–
297.

Decety, J., & Jeannerod, M. (1995). Mentally simulated movements in virtual reality: Does
Fitts’s law hold in motor imagery? Behavioral Brain Research, 72, 127–134.

Denis, M., & Cocude, M. (1989). Scanning visual images generated from verbal descrip­
tions. European Journal of Cognitive Psychology, 1, 293–307.

Denis, M., & Kosslyn, S. M. (1999). Scanning visual mental images: A window on the
mind. Current Psychology of Cognition, 18, 409–465.

Downing, P. E., Chan, A. W., Peelen, M. V., Dodds, C. M., & Kanwisher, N. (2006). Domain
specificity in visual cortex. Cerebral Cortex, 16, 1453–1461.

Farah, M. J. (1984). The neurological basis of mental imagery: A componential analysis.
Cognition, 18, 245–272.

Farah, M. J., Hammond, K. M., Mehta, Z., & Ratcliff, G. (1989). Category-specificity and
modality-specificity in semantic memory. Neuropsychologia, 27, 193–200.

Farah, M. J., Soso, M. J., & Dasheiff, R. M. (1992). Visual angle of the mind’s eye before
and after unilateral occipital lobectomy. Journal of Experimental Psychology: Human Per­
ception and Performance, 18, 241–246.

Felleman, D. J., & Van Essen, D. C. (1991). Distributed hierarchical processing in the pri­
mate cerebral cortex. Cerebral Cortex, 1, 1–47.

Finke, R. A., & Pinker, S. (1982). Spontaneous imagery scanning in mental extrapolation.
Journal of Experimental Psychology: Learning, Memory and Cognition, 8, 142–147.

Formisano, E., Linden, D. E. J., Di Salle, F., Trojano, L., Esposito, F., Sack, A. T., Grossi, D.,
Zanella, F. E., & Goebel, R. (2002). Tracking the mind’s image in the brain I: Time-re­
solved fMRI during visuospatial mental imagery. Neuron, 35, 185–194.

Ganis, G., Keenan, J. P., Kosslyn, S. M., & Pascual-Leone, A. (2000). Transcranial magnetic
stimulation of primary motor cortex affects mental rotation. Cerebral Cortex, 10, 175–
180.

Ganis, G., Thompson, W. L., & Kosslyn, S. M. (2004). Brain areas underlying visual mental
imagery and visual perception: An fMRI study. Brain Research: Cognitive Brain Research,
20, 226–241.

(p. 86) Ganis, G., Thompson, W. L., Mast, F. W., & Kosslyn, S. M. (2003). Visual imagery in cerebral visual dysfunction. Neurologic Clinics, 21, 631–646.

Georgopoulos, A. P., Lurito, J. T., Petrides, M., Schwartz, A. B., & Massey, J. T. (1989).
Mental rotation of the neuronal population vector. Science, 243, 234–236.

Goldenberg, G. (1992). Loss of visual imagery and loss of visual knowledge: A case study.
Neuropsychologia, 30, 1081–1099.

Goldenberg, G., Müllbacher, W., & Nowak, A. (1995). Imagery without perception: A case
study of anosognosia for cortical blindness. Neuropsychologia, 33, 1373–1382.

Goodale, M. A., & Milner, A. D. (1992). Separate visual pathways for perception and ac­
tion. Trends in Neurosciences, 15, 20–25.

Guariglia, C., Padovani, A., Pantano, P., & Pizzamiglio, L. (1993). Unilateral neglect re­
stricted to visual imagery. Nature, 364, 235–237.

Haxby, J. V., Grady, C. L., Horwitz, B., Ungerleider, L. G., Mishkin, M., Carson, R. E., Herscovitch, P., Schapiro, M. B., & Rapoport, S. I. (1991). Dissociation of object and spatial visual processing pathways in human extrastriate cortex. Proceedings of the National Academy of Sciences U S A, 88, 1621–1625.

Ishai, A., Haxby, J. V., & Ungerleider, L. G. (2002). Visual imagery of famous faces: Effects
of memory and attention revealed by fMRI. NeuroImage, 17, 1729–1741.

Ishai, A., Ungerleider, L. G., & Haxby, J. V. (2000). Distributed neural systems for the gen­
eration of visual images. Neuron, 28, 979–990.

Jacobson, L. S., Pearson, P. M., & Robertson, B. (2008). Hue-specific color memory impair­
ment in an individual with intact color perception and color naming. Neuropsychologia,
46, 22–36.

Jolicoeur, P. (1985). The time to name disoriented natural objects. Memory and Cognition,
13, 289–303.

Kaas, A., Weigelt, S., Roebroeck, A., Kohler, A., & Muckli, L. (2010). Imagery of a moving
object: The role of occipital cortex and human MT/V5+. NeuroImage, 49, 794–804.

Kanwisher, N., & Yovel, G. (2006). The fusiform face area: A cortical region specialized for
the perception of faces. Philosophical Transactions of the Royal Society of London B, 361,
2109–2128.

Klein, I., Dubois, J., Mangin, J. F., Kherif, F., Flandin, G., Poline, J. B., Denis, M., Kosslyn, S.
M., & Le Bihan, D. (2004). Retinotopic organization of visual mental images as revealed
by functional magnetic resonance imaging. Brain Research: Cognitive Brain Research, 22,
26–31.

Klein, I., Paradis, A.-L., Poline, J.-B., Kosslyn, S. M., & Le Bihan, D. (2000). Transient activ­
ity in human calcarine cortex during visual imagery. Journal of Cognitive Neuroscience,
12, 15–23.

Koriat, A., & Norman, J., (1985). Reading rotated words. Journal of Experimental Psychol­
ogy: Human Perception and Performance, 11, 490–508.

Kosslyn, S. M. (1980). Image and mind. Cambridge, MA: Harvard University Press.

Kosslyn, S. M. (1994). Image and brain. Cambridge, MA: Harvard University Press.

Kosslyn, S. M., Alpert, N. M., Thompson, W. L., Maljkovic, V., Weise, S. B., Chabris, C.,
Hamilton, S. E., & Buonanno F. S. (1993). Visual mental imagery activates topographically
organized visual cortex: PET investigations. Journal of Cognitive Neuroscience, 5, 263–
287.

Kosslyn, S. M., Ball, T. M., & Reiser, B. J. (1978). Visual images preserve metric spatial in­
formation: Evidence from studies of image scanning. Journal of Experimental Psychology:
Human Perception and Performance, 4, 47–60.

Kosslyn, S. M., DiGirolamo, G., Thompson, W. L., & Alpert, N. M. (1998). Mental rotation
of objects versus hands: Neural mechanisms revealed by positron emission tomography.
Psychophysiology, 35, 151–161.

Kosslyn, S. M., Pascual-Leone, A., Felician, O., Camposano, S., Keenan, J. P., Thompson, W. L., Ganis, G., Sukel, K. E., & Alpert, N. M. (1999). The role of area 17 in visual imagery: Convergent evidence from PET and rTMS. Science, 284, 167–170.

Kosslyn, S. M., & Thompson, W. L. (2003). When is early visual cortex activated during vi­
sual mental imagery? Psychological Bulletin, 129, 723–746.

Kosslyn, S. M., Thompson, W. L., & Alpert, N. M. (1997). Neural systems shared by visual
imagery and visual perception: A positron emission tomography study. NeuroImage, 6,
320–334.

Kosslyn, S. M., Thompson, W. L., & Ganis, G. (2006). The case for mental imagery. New
York: Oxford University Press.

Kosslyn, S. M., Thompson, W. L., Kim, I. J., & Alpert, N. M. (1995). Topographical repre­
sentations of mental images in primary visual cortex. Nature, 378, 496–498.

Kosslyn, S. M., Thompson, W. L., Wraga, M., & Alpert, N. M. (2001). Imagining rotation by
endogenous versus exogenous forces: Distinct neural mechanisms. NeuroReport, 12,
2519–2525.

Levine, D. N., Warach, J., & Farah, M. J. (1985). Two visual systems in mental imagery:
Dissociation of “what” and “where” in imagery disorders due to bilateral posterior cere­
bral lesions. Neurology, 35, 1010–1018.

Luzzatti, C., Vecchi, T., Agazzi, D., Cesa-Bianchi, M., & Vergani, C. (1998). A neurological
dissociation between preserved visual and impaired spatial processing in mental imagery.
Cortex, 34, 461–469.

Mechelli, A., Price, C. J., Friston, K. J., & Ishai, A. (2004). Where bottom-up meets top-down: Neuronal interactions during perception and imagery. Cerebral Cortex, 14, 1256–1265.

Mellet, E., Briscogne, S., Crivello, F., Mazoyer, B., Denis, M., & Tzourio-Mazoyer, N. (2002). Neural basis of mental scanning of a topographic representation built from a text. Cerebral Cortex, 12, 1322–1330.

Mellet, E., Tzourio, N., Crivello, F., Joliot, M., Denis, M., & Mazoyer, B. (1996). Functional
anatomy of spatial mental imagery generated from verbal instructions. Journal of Neuro­
science, 16, 6504–6512.

Mellet, E., Tzourio, N., Denis, M., & Mazoyer, B. (1995). A positron emission tomography
study of visual and mental spatial exploration. Journal of Cognitive Neuroscience, 4, 433–
445.

Moro, V., Berlucchi, G., Lerch, J., Tomaiuolo, F., & Aglioti, S. M. (2008). Selective deficit of
mental visual imagery with intact primary visual cortex and visual perception. Cortex, 44,
109–118.

Morton, N., & Morris, R. G. (1995). Image transformations dissociated from visuo-spatial
working memory. Cognitive Neuropsychology, 12, 767–791.

O’Craven, K. M., & Kanwisher, N. (2000). Mental imagery of faces and places activates
corresponding stimulus-specific brain regions. Journal of Cognitive Neuroscience, 12,
1013–1023.

(p. 87) Paivio, A. (1971). Imagery and verbal processes. New York: Holt, Rinehart and Win­
ston.

Parsons, L. M., Fox, P. T., Downs, J. H., Glass, T., Hirsch, T. B., Martin, C. C., Jerabek, P. A., & Lancaster, J. L. (1995). Use of implicit motor imagery for visual shape discrimination as
revealed by PET. Nature, 375, 54–58.

Pylyshyn, Z. W. (1973). What the mind’s eye tells the mind’s brain: A critique of mental
imagery. Psychological Bulletin, 80, 1–24.

Pylyshyn, Z. W. (1981). Psychological explanations and knowledge-dependent processes. Cognition, 10, 267–274.

Pylyshyn, Z. W. (2002). Mental imagery: In search of a theory. Behavioral and Brain Sciences, 25, 157–237.

Pylyshyn, Z. W. (2003a). Return of the mental image: Are there really pictures in the
head? Trends in Cognitive Sciences, 7, 113–118.

Pylyshyn, Z. W. (2003b). Seeing and visualizing: It's not what you think. Cambridge, MA:
MIT Press.

Pylyshyn, Z. W. (2007). Things and places: How the mind connects with the world. Cam­
bridge, MA: MIT Press.

Richter, W., Somorjai, R., Summers, R., Jarmasz, M., Tegeler, C., Ugurbil, K., Menon, R.,
Gati, J. S., Georgopoulos, A. P., & Kim, S.-G. (2000). Motor area activity during mental ro­
tation studied by time-resolved single-trial fMRI. Journal of Cognitive Neuroscience, 12,
310–320.

Riddoch, M. J., & Humphreys, G. W. (1987). A case of integrative visual agnosia. Brain,
110, 1431–1462.

Rizzo, M., Smith, V., Pokorny, J., & Damasio, A. (1993). Color perception profiles in central
achromatopsia. Neurology 43, 995–1001.

Sartori, G., & Job, R. (1988). The oyster with four legs: A neuropsychological study on the
interaction of visual and semantic information. Cognitive Neuropsychology, 5, 105–132.
Sereno, M. I., Dale, A. M., Reppas, J. B., Kwong, K. K., Belliveau, J. W., Brady, T. J., Rosen,
B. R., & Tootell, R. B. H. (1995). Borders of multiple visual areas in humans revealed by
functional magnetic resonance imaging. Science, 268, 889–893.

Servos, P., & Goodale, M. A. (1995). Preserved visual imagery in visual form agnosia. Neu­
ropsychologia, 33 (11), 1383–1394.

Shepard, R. N., & Metzler, J. (1971). Mental rotation of three-dimensional objects. Science, 171, 701–703.

Shuttleworth, E. C., Jr., Syring, V., & Allen, N. (1982). Further observations on the nature
of prosopagnosia. Brain and Cognition, 1, 307–322.

Siebner, H. R., Peller, M., Willoch, F., Minoshima, S., Boecker, H., Auer, C., Drzezga, A.,
Conrad, B., & Bartenstein, P. (2000). Lasting cortical activation after repetitive TMS of
the motor cortex: A glucose metabolic study. Neurology, 54, 956–963.

Sirigu, A., Duhamel, J.-R., Cohen, L., Pillon, B., Dubois, B., & Agid, Y. (1996). The mental
representation of hand movements after parietal cortex damage. Science, 273 (5281),
1564–1568.

Slotnick, S. D., Thompson, W. L., & Kosslyn, S. M. (2005). Visual mental imagery induces
retinotopically organized activation of early visual areas. Cerebral Cortex, 15, 1570–1583.

Sparing, R., Mottaghy, F., Ganis, G., Thompson, W. L., Toepper, R., Kosslyn, S. M., & Pascual-Leone, A. (2002). Visual cortex excitability increases during visual mental imagery: A
TMS study in healthy human subjects. Brain Research, 938, 92–97.

Stokes, M., Thompson, R., Cusack, R., & Duncan, J. (2009). Top-down activation of shape-
specific population codes in visual cortex during mental imagery. Journal of Neuroscience,
29, 1565–1572.

Tarr, M. J., & Pinker, S. (1989). Mental rotation and orientation-dependence in shape
recognition. Cognitive Psychology, 21, 233–282.

Thirion, B., Duchesnay, E., Hubbard, E., Dubois, J., Poline, J.-B., Lebihan, D., & Dehaene,
S. (2006). Inverse retinotopy: Inferring the visual content of images from brain activation
patterns. Neuroimage, 33, 1104–1116.

Tomasino, B., Borroni, P., Isaja, A., & Rumiati, R. I. (2005). The role of the primary motor
cortex in mental rotation: A TMS study. Cognitive Neuropsychology, 22, 348–363.

Trojano, L., Grossi, D., Linden, D. E., Formisano, E., Hacker, H., Zanella, F. E., Goebel, R.,
& Di Salle, F. (2000). Matching two imagined clocks: The functional anatomy of spatial
analysis in the absence of visual stimulation. Cerebral Cortex, 10, 473–481.

Trojano, L., Linden, D. E., Formisano, E., Grossi, D., Sack, A. T., & Di Salle, F. (2004).
What clocks tell us about the neural correlates of spatial imagery. European Journal of
Cognitive Psychology, 16, 653–672.

Ungerleider, L. G., & Mishkin, M. (1982). Two cortical visual systems. In D. J. Ingle, M. A.
Goodale, & R. J. W. Mansfield (Eds.), Analysis of visual behavior (pp. 549–586). Cam­
bridge, MA: MIT Press.

Vanlierde, A., de Volder, A. G., Wanet-Defalque, M. C., & Veraart C. (2003). Occipito-pari­
etal cortex activation during visuo-spatial imagery in early blind humans. NeuroImage,
19, 698–709.

Watson, J. B. (1913). Psychology as the behaviorist views it. Psychological Review, 20,
158–177.

Wraga, M., Shephard, J. M., Church, J. A., Inati, S., & Kosslyn, S. M. (2005). Imagined ro­
tations of self versus objects: An fMRI study. Neuropsychologia, 43, 1351–1361.

Wraga, M. J., Thompson, W. L., Alpert, N. M., & Kosslyn, S. M. (2003). Implicit transfer of
motor strategies in mental rotation. Brain and Cognition, 52, 135–143.

Young, A. W., Humphreys, G. W., Riddoch, M. J., Hellawell, D. J., & de Haan, E. H. (1994).
Recognition impairments and face imagery. Neuropsychologia, 32, 693–702.

Grégoire Borst

Grégoire Borst is an assistant professor in developmental psychology and cognitive neuroscience at Paris Descartes University.


Looking at the Nose Through Human Behavior, and at Human Behavior Through the Nose
Roni Kahana and Noam Sobel
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn

Print Publication Date: Dec 2013 Subject: Psychology, Cognitive Neuroscience


Online Publication Date: Dec 2013 DOI: 10.1093/oxfordhb/9780199988693.013.0006

Abstract and Keywords

Mammalian olfaction is highly stereotyped. It consists of a sensory epithelium in the nose, where odorants are transduced to form neural signals. These neural signals are projected
via the olfactory nerve to the olfactory bulb, where they generate spatiotemporal patterns
of neural activity subserving odorant discrimination. This information is then projected
via the olfactory tract to olfactory cortex, a neural substrate optimized for olfactory ob­
ject perception. In contrast to popular notions, human olfaction is quite keen. Thus, sys­
tematic analysis of human olfactory perception has uncovered fundamental properties of
mammalian olfactory processing, and mammalian olfaction explains fundamental proper­
ties of human behaviors such as eating, mating, and social interaction, which are all criti­
cal for survival.

Keywords: olfactory perception, olfaction, behavior, odorant, olfactory epithelium, olfactory discrimination, piri­
form cortex, eating, mating, social interaction

Introduction
Even in reviews on olfaction, it is often stated that human behavior and perception are
dominated by vision, or that humans are primarily visual creatures. This reflects the con­
sensus in cognitive neuroscience (Zeki & Bartels, 1999). Indeed, if asked which distal
sense we would soonest part with, most (current authors included) would select olfaction
before audition or vision. Thus, whereas primarily olfactory animals such as rodents are
referred to as macrosmatic, humans are considered microsmatic.

That said, we trust our nose over our eyes and ears in the two most critical decisions we
make: what we eat, and with whom we mate (Figure 6.1).

We review various studies in this respect, yet first we turn to the reader’s clear intuition:
Given a beautiful-looking slice of cake that smells of sewage and a mushy-looking shape­
less mixture that smells of cinnamon and banana, which do you eat? Given a gorgeous-looking individual who smells like yeast and a profoundly physically unattractive person
who smells like sweet spice, with whom do you mate? In both of these key behaviors, hu­
mans, like all mammals, are primarily olfactory. With this simple truth in mind, namely,
that in our most important decisions we follow our nose, should humans nevertheless still
be considered microsmatic (Stoddart, 1990)?

Functional Neuroanatomy of the Mammalian Olfactory System

Figure 6.1 The primacy of human olfaction. Humans trust olfaction over vision and audition in key behav­
iors related to survival, such as mate selection and
determination of edibility.

Courtesy of Gilad Larom.


Figure 6.2 Schematic of the human olfactory system. Odorants are transduced at the olfactory epithelium
(1). Receptors of different subtypes (three illustrat­
ed, ∼1,000 in mammals) converge via the olfactory
nerve onto common glomeruli at the olfactory bulb
(2). From here, information is conveyed via the later­
al olfactory tract to primary olfactory cortex (3).
From here, information is conveyed throughout the
brain, most notably to orbitofrontal cortex (5) via a
direct and indirect route through the thalamus (4).

(From Sela & Sobel, 2010. Reprinted with permission from Springer.)

Before considering the behavioral significance of human olfaction, we first provide a ba­
sic overview of olfactory system organization. The mammalian olfactory system follows a
rather clear hierarchy, starting with transduction at the olfactory epithelium in the nose,
then initial processing subserving odor discrimination in the olfactory bulb, and finally
higher order processing related to odor object formation and odor memory in primary ol­
factory cortex (R. I. Wilson & Mainen, 2006) (Figure 6.2). This organization is bilateral
and symmetrical, and although structural connectivity appears largely (p. 89) ipsilateral
(left epithelium to left bulb to left cortex) (Powell, Cowan, & Raisman, 1965), functional
measurements have implied more contralateral than ipsilateral driving of activity (Cross
et al., 2006; McBride & Slotnick, 1997; J. Porter, Anand, Johnson, Khan, & Sobel, 2005;
Savic & Gulyas, 2000; D. A. Wilson, 1997). The neural underpinnings of this functional
contralaterality remain unclear.

More than One Nose in the Nose

Figure 6.3 More than one nose in the nose. A, The olfactory system in the mouse contains multiple sub­
systems: the olfactory epithelium (OE), the
vomeronasal organ (VNO), the Grueneberg ganglion
(GG), and the septal organ (SO). Sensory neurons po­
sitioned in the OE, SO, and GG project to the main ol­
factory bulb (MOB), whereas sensory neurons of the
VNO project to the accessory olfactory bulb (AOB).

(From Ferrero & Liberles, 2010, originally adapted from Buck, 2000.) B, The human nose is innervated
by both olfactory and trigeminal sensory nerve end­
ings. (Modification of illustration by Patrick J. Lynch.)

Odorants are concurrently processed in several neural subsystems beyond the above-de­
scribed main olfactory system (Breer, Fleischer, & Strotmann, 2006) (Figure 6.3). For ex­
ample, air-borne molecules are transduced at endings of the trigeminal nerve in the eye,
nose, and throat (Hummel, 2000). It is trigeminal activation that provides the cooling sen­
sation associated with odorants such as menthol, or the stingy sensation associated with
odorants such as ammonia or onion. In rodents, at least three additional sensing mecha­
nisms have been identified in the nose. These include (1) the septal organ, which consists
of a small patch of olfactory receptors that are anterior to the main epithelium (Ma et al.,
2003); (2) the Grueneberg organ, which contains small grape-like clusters of receptors at
the anterior end of the nasal passage that project to a separate subset of main olfactory
bulb targets (Storan & Key, 2006); and (3) the vomeronasal system, or accessory olfactory
system (Halpern, 1987; Wysocki & Meredith, 1987). The accessory olfactory system is
equipped with a separate bilateral epithelial structure, the vomeronasal organ, or VNO
(sometimes also referred to as Jacobson’s organ). The VNO is a (p. 90) pit-shaped struc­
ture at the anterior portion of the nasal passage, containing receptors that project to an
accessory olfactory bulb, which in turn projects directly to critical components of the lim­
bic system such as the amygdala and hypothalamus (Keverne, 1999; Meredith, 1983) (see Figure 6.3). In rodents, the accessory olfactory system plays a key role in mediating so­
cial chemosignaling (Halpern, 1987; Kimchi, Xu, & Dulac, 2007; Wysocki & Meredith,
1987). Whether humans have a septal organ or Grueneberg organ has not been carefully
studied, and it is largely held that humans do not have an accessory olfactory system, al­
though this issue remains controversial (Frasnelli, Lundstrˆm, Boyle, Katsarkas, & Jones
Gotman; Meredith, 2001; Monti-Bloch, Jennings-White, Dolberg, & Berliner, 1994; Witt &
Hummel, 2006). Regardless of this debate, it is clear that the sensation of smell in hu­
mans and other mammals is the result of common activation across several neural subsys­
tems (Restrepo, Arellano, Oliva, Schaefer, & Lin, 2004; Spehr et al., 2006). However, be­
fore air-borne stimuli are processed, they first must be acquired.

Sniffs: More than a Mechanism for Odorant Sampling

Figure 6.4 Sniffing. Careful visualization of human sniff airflow revealed that although the nostrils are
structurally close together, an asymmetry in nasal
airflow generates a “functional distance” between
the nostrils. A, A PIV laser light sheet was oriented in
a coronal plane intersecting the nostrils at their mid­
point. B and C, PIV images of particle-laden inspired
air stream for two example sniffs. D, A contour plot
of velocity magnitude of the inspired air stream into
the nose of a subject sniffing at 0.2 Hz. E, Velocity
profiles of the right and left naris; abscissa indicates
distance from the tip of the nose to the lateral extent
of the naris.

From Porter et al., 2007. Reprinted with permission from Nature.

Mammalian olfaction starts with a sniff—a critical act of odor sampling. Sniffs are not
merely an epiphenomenon of olfaction, but rather are an intricate component of olfactory
perception (Kepecs, Uchida, & Mainen, 2006, 2007; Mainland & Sobel, 2006; Schoenfeld
& Cleland, 2006). Sniffs are in part a reflexive action (Tomori, Benacka, & Donic, 1998),
which is then rapidly modified in accordance with odorant content (Laing, 1983) (Figure
6.4). Humans begin tailoring their sniff according to odorant properties within about 160
ms of sniff onset, reducing sniff magnitude for both intense (B. N. Johnson, Mainland, &
Sobel, 2003) and unpleasant (Bensafi et al., 2003) odorants. We have proposed that the
mechanism that tailors a sniff to its content is cerebellar (Sobel, Prabhakaran, Hartley, et al., 1998), and cerebellar lesions indeed negate this mechanism (Mainland, Johnson,
Khan, Ivry, & Sobel, 2005). Moreover, not only are sniffs the key mechanism for odorant
sampling, they also play a key role in timing and organization of neural representation in
the olfactory system. This influence of sniffing on neural representation in olfaction may
begin at the earliest phase of olfactory processing because olfactory receptors are also
mechanosensitive (Grosmaitre, Santarelli, Tan, Luo, & Ma, 2007), potentially responding
to sniffs even without odor. Sniff properties are then reflected in neural activity at both
the olfactory bulb (Verhagen, Wesson, Netoff, White, & Wachowiak, 2007) and olfactory
cortex (Sobel, Prabhakaran, Desmond, et al., 1998). Indeed, negating sniffs (whether
their execution, or only their (p. 91) intention) may underlie in part the pronounced differ­
ences in olfactory system neural activity during wake and anesthesia (Rinberg, Koulakov,
& Gelperin, 2006). Finally, odor sampling is not only through the nose (orthonasal) but al­
so through the mouth (retronasal): Food odors make their way to the olfactory system by
ascending through the posterior nares of the nasopharynx (Figure 6.5). Several lines of
evidence have suggested partially overlapping yet partially distinct neural substrates sub­
serving orthonasal and retronasal human olfaction (Bender, Hummel, Negoias, & Small,
2009; Hummel, 2008; Small, Gerber, Mak, & Hummel, 2005).

Olfactory Epithelium: The Site of Odorant Transduction

Figure 6.5 Schematic drawing of the nasal cavity with the lower, middle, and upper turbinates. Airflow
in relation to orthonasal (through the nostrils) or
retronasal (from the mouth/pharynx to the nasal cavi­
ty) is indicated by arrows, both leading to the olfacto­
ry epithelium located just beneath the cribriform
plate.

From Negoias, Visschers, Boelrijk, & Hummel, 2008. Reprinted with permission from Elsevier.

Once sniffed, an odorant makes its way up the nasal passage, where it crosses a mucous
membrane before (p. 92) interacting with olfactory receptors that line the olfactory ep­
ithelium. This step is not inconsequential to the olfactory process. Odorants cross this
mucus following the combined action of passive gradients and active transporters, which
generate an odorant-specific pattern of dispersion (Moulton, 1976). These so-called sorption properties have been hypothesized to play a key role in odorant discrimination, in
that they form a sort of chromatographic separation at the nose (Mozell & Jagodowicz,
1973). The later identification of an inordinately large family of specific olfactory receptor
types (L. Buck & Axel, 1991; Zhang & Firestein, 2002) shifted the focus of enquiry regard­
ing odor discrimination to that of receptor–ligand interactions, but the chromatographic
component of this process has never been negated and likely remains a key aspect of
odorant processing.

Once an odorant crosses the mucosa, it interacts with olfactory receptors at the sensory
end of olfactory receptor neurons. Humans have about 12 million bipolar receptor neu­
rons (Moran, Rowley, Jafek, & Lovell, 1982) that differ from typical neurons in that they
constantly regenerate from a basal cell layer throughout the lifespan (Graziadei & Monti
Graziadei, 1983). These neurons send their dendritic process to the olfactory epithelial
surface, where they form a knob from which five to twenty thin cilia extend into the mu­
cus. These cilia contain the olfactory receptors: 7-transmembrane G-protein–coupled sec­
ond-messenger receptors, where a cascade of events that starts with odorant binding cul­
minates in the opening of cross-membrane cation channels that depolarize the cell
(Firestein, 2001; Spehr & Munger, 2009; Zufall, Firestein, & Shepherd, 1994) (Figure
6.6). The mammalian genome contains more than 1,000 such receptor types (L. Buck &
Axel, 1991), yet humans functionally express only about 400 of these (Gilad & Lancet,
2003). Typically, each receptor neuron expresses only one receptor type, although recent
evidence from Drosophila has suggested that in some cases a single neuron may express
two receptor types (Goldman, Van der Goes van Naters, Lessing, Warr, & Carlson, 2005).
In rodents, receptor types are grouped into four functional expression zones along a
dorsoventral epithelial axis, yet are randomly dispersed within each zone (Ressler, Sulli­
van, & Buck, 1993; Strotmann, Wanner, Krieger, Raming, & Breer, 1992; Vassar, Ngai, &
Axel, 1993). Each receptor type is typically responsive to a small subset of odorants
(Hallem & Carlson, 2006; Malnic, Hirono, Sato, & Buck, 1999; Saito, Chi, Zhuang, Mat­
sunami, & Mainland, 2009), although some receptors may be responsive to only very few
odorants (Keller, Zhuang, Chi, Vosshall, & Matsunami, 2007), and other receptors may be
responsive to a very wide range of odorants (Grosmaitre et al., 2009). Despite some alter­
native hypotheses (Franco, Turin, Mershin, & Skoulakis, 2011), this receptor-to-odorant
specificity is widely considered the basis for olfactory coding (Su, Menuz, & Carlson,
2009).

Olfactory Bulb: A Neural Substrate for Odorant Discrimination

Figure 6.6 Receptor events in olfaction. Signal transduction in an olfactory sensory neuron. Binding of an
odorant to its cognate odorant receptor (OR) results
in the activation of heterotrimeric G protein (Gαolf
plus Gβγ). Activated Gαolf in turn activates type III
adenylyl cyclase (AC3), leading to the production of
cyclic adenosine monophosphate (cAMP) from adeno­
sine triphosphate (ATP). cAMP gates or opens the
cyclic nucleotide–gated (CNG) ion channel, leading
to the influx of Na+ and Ca2+, depolarizing the cell.
This initial depolarization is amplified through the
activation of a Ca2+-dependent Cl− channel. In addi­
tion, cAMP activates protein kinase A (PKA), which
can regulate other intracellular events, including
transcription of cAMP-regulated genes.

Reprinted with permission from DeMaria & Ngai, 2010.

Whereas receptor types appear randomly dispersed throughout each epithelial subzone,
the path (p. 93) from epithelium to bulb via the olfactory nerve entails a unique pattern of
convergence that brings together all receptor neurons that express a particular receptor
type. These synapse onto one of two common points at the olfactory bulb, termed
glomeruli (Mombaerts et al., 1996). Thus, the number of glomeruli is expected to be
about double the number of receptor types, and the receptive range of a glomerulus is ex­
pected to reflect the receptive range of a given receptor type (Feinstein & Mombaerts,
2004). Within the glomeruli, receptor axons contact dendrites of either mitral or tufted
output neurons and periglomerular interneurons. Whereas these rules have been learned
mostly from studies in rodents, the human olfactory system may be organized slightly dif­
ferently; rather than the expected about 750 glomeruli (about double the number of ex­
pressed receptor types), postmortem studies revealed many thousands of glomeruli in the
human olfactory bulb (Maresh, Rodriguez Gil, Whitman, & Greer, 2008).

The stereotyped connectivity from epithelium to bulb generates a spatial representation of receptor types on the olfactory bulb surface. In simple terms, each activated glomeru­
lus reflects the activation of a given receptor type. Thus, the spatiotemporal pattern of
bulbar activation is largely considered the base for olfactory discrimination coding
(Firestein, 2001). The common notion is that a given odorant is represented by the partic­
ular pattern of glomeruli activation in time. Indeed, various methods of recording neural
activity at the olfactory bulb have converged to support this notion (Leon & Johnson, 2003; Su et al., 2009; Uchida, Takahashi, Tanifuji, & Mori, 2000) (Figure 6.7). Although it
is easy to grasp and convey this notion of a purely spatial substrate where different odors
induce different patterns of activation, this is clearly a simplified view because the partic­
ular timing of neural activity also clearly plays a role in odor coding at this stage. The role
of temporal neural activity patterns in odor coding was primarily uncovered in insects
(Laurent, 1997, 1999; Laurent, Wehr, & Davidowitz, 1996) but has been revealed in mam­
mals as well (Bathellier, Buhl, Accolla, & Carleton, 2008; Lagier, Carleton, & Lledo, 2004;
Laurent, 2002). Moreover, it is noteworthy that olfactory bulb lesions have a surprisingly
limited impact on olfactory discrimination (Slotnick & Schoonover, 1993), and a spa­
tiotemporal bulbar activation code has yet to be linked to meaningful olfactory informa­
tion within a predictive framework (Mainen, 2006). In other words, a “map of odors” on
the olfactory bulb is a helpful concept in understanding the olfactory system, but it is not
the whole story.
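
The combinatorial logic behind this bulbar "map" can be sketched in a few lines of Python. The glomerular activation vectors below are entirely invented for illustration; the only point is that odorants activating overlapping glomerular subsets yield correlated population patterns, while odorants activating disjoint subsets do not, providing a simple substrate for discrimination.

```python
import numpy as np

# Toy combinatorial glomerular code. Each odorant is represented as an
# activation vector over glomeruli; similar odorants activate overlapping
# glomerular subsets. All numbers are invented for illustration only.
N_GLOMERULI = 50

def odor_pattern(active_glomeruli):
    """Binary activation vector with the given glomeruli switched on."""
    v = np.zeros(N_GLOMERULI)
    v[list(active_glomeruli)] = 1.0
    return v

odor_a = odor_pattern({1, 4, 9, 17, 23})
odor_b = odor_pattern({1, 4, 9, 17, 30})   # shares 4 of 5 glomeruli with A
odor_c = odor_pattern({2, 8, 31, 40, 44})  # shares none with A

def pattern_corr(x, y):
    """Pearson correlation between two glomerular activation patterns."""
    return float(np.corrcoef(x, y)[0, 1])

# Overlapping patterns are highly correlated; disjoint ones are not.
print(pattern_corr(odor_a, odor_b))  # high
print(pattern_corr(odor_a, odor_c))  # near zero
```

This purely spatial toy omits the temporal dimension emphasized above; a fuller sketch would index each glomerulus by activation latency within the sniff cycle as well.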

Primary Olfactory Cortex: A Loosely Defined Structure with Loosely Defined Function

The structural and functional properties of epithelium and bulb are relatively straightfor­
ward: The epithelium is the site of transduction, where odorants become neural signals.
The bulb is the site of discrimination, where different odors form different spatiotemporal
patterns of neural activity. By contrast, the structure and function of primary olfactory
cortex remain unclear. In other words, there is no clear agreement as to what constitutes
primary olfactory cortex, let alone what it does.

By current definition, primary olfactory cortex consists of all brain regions that receive di­
rect input from the mitral and tufted cell axons of the olfactory bulb (Allison, 1954;
Carmichael, Clugnet, & Price, 1994; de Olmos, Hardy, & Heimer, 1978; Haberly, 2001; J.
L. Price, 1973, 1987; J. L. Price, 1990; Shipley, 1995). These comprise most of the paleo­
cortex, including (by order along the olfactory tract) the anterior olfactory cortex (also re­
ferred to as the anterior olfactory nucleus) (Brunjes, Illig, & Meyer, 2005), ventral tenia
tecta, anterior hippocampal continuation and indusium griseum, olfactory tubercle, piri­
form cortex, anterior cortical nucleus of the amygdala, periamygdaloid cortex, and rostral
entorhinal cortex (Carmichael et al., 1994) (Figure 6.8).

As can be appreciated by both the sheer area and diversity of cortical real estate that is
considered primary olfactory cortex, this definition is far from functional. One cannot as­
sign a single function to “primary olfactory cortex” when primary olfactory cortex is a la­
bel legitimately applied to a large proportion of the mammalian brain. The term primary
typically connotes basic functional roles such as early feature extraction, yet as can be ex­
pected, a region comprising in part piriform cortex, amygdala, and entorhinal cortex is in­
volved in far more complex sensory processing than mere early feature extraction. With
this in mind, several authors have simply shifted the definition by referring to the classic
primary olfactory structures as secondary olfactory structures, noting that the definition of mammalian primary olfactory cortex may better fit the olfactory bulb than piriform cor­
tex (Cleland & Sullivan, 2003; Haberly, 2001).

Figure 6.7 Spatial coding at the olfactory bulb. Pat­


terns of rat glomeruli activation (by 2-deoxyglucose
uptake) evoked by different odors. Activation is rep­
resented as the average z-score pattern for both
bulbs of up to four separate rats exposed to each
odor. Warmer colors indicate higher uptake.

From Johnson, Ong, & Leon, 2010. Copyright © 2009 Wiley-Liss, Inc.

At the same time, there has been a growing tendency to use the term primary olfactory
cortex for piriform cortex alone. Piriform cortex, the largest (p. 94) component of primary
olfactory cortex in mammals, lies along the olfactory tract at the junction of temporal and
frontal lobes and continues onto the dorsomedial aspect of the temporal lobe (see Figure
6.8A and B). Consistent with the latter approach, here we restrict our review of olfactory
cortex to the piriform portion of primary olfactory cortex alone.

Piriform Cortex: A Neural Substrate for Olfactory Object Formation

Figure 6.8 Human olfactory cortex. A, Ventral view of the human brain in which the right anterior tem­
poral lobe has been resected in the coronal plane to
expose the limbic olfactory areas. B, Afferent output
from the olfactory bulb (OB) passes through the lat­
eral olfactory tract (LOT) and projects monosynapti­
cally to numerous regions, including the anterior ol­
factory nucleus (AON), olfactory tubercle (OTUB),
anterior piriform cortex (APC), posterior piriform
cortex (PPC), amygdala (AM), and entorhinal cortex
(EC). Downstream relays include the hippocampus
(HP) and the putative olfactory projection site in the
human orbitofrontal cortex (OFColf). C, Schematic
representation of the cellular organization of the piri­
form cortex. Pyramidal neurons are located in cell
body layers II and III, and their apical dendrites
project to molecular layer I. Layer I is subdivided in­
to a superficial layer (Ia) that contains the sensory
afferents from the olfactory bulb (shown in red) and
a deeper layer (Ib) that contains the associative in­
puts from other areas of the primary olfactory cortex
and higher order areas (shown in blue). Most of the
layer Ia afferents terminate in the APC, whereas
most of the layer Ib associative inputs terminate in
the posterior piriform cortex (PPC).

Reprinted with permission from Gottfried, 2010.

Piriform cortex is three-layered paleocortex that has been described in detail (Martinez,
Blanco, Bullon, & Agudo, 1987). In brief, layer I is subdivided into layer Ia, where afferent
fibers from the olfactory bulb terminate, and layer Ib, where (p. 95) association fibers ter­
minate (see Figure 6.8C). Layer II is a compact zone of neuronal cell bodies. Layer III
contains neuronal cell bodies at a lower density than layer II and a large number of den­
dritic and axonal elements. Piriform input is widely distributed, and part of piriform out­
put feeds back into piriform as further distributed input. Moreover, piriform cortex is rec­
iprocally and extensively connected with several high-order areas of the cerebral cortex,
including the prefrontal, amygdaloid, perirhinal, and entorhinal cortices (Martinez et al.,
1987).

The current understanding of piriform cortex function largely originated from the work of
Lew Haberly and colleagues (Haberly, 2001; Haberly & Bower, 1989; Illig & Haberly,
2003). These authors hypothesized that the structural organization of piriform cortex ren­
ders it highly suitable to function as a content-addressable memory system, where frag­
mented input can be used to “neurally reenact” a stored representation. Haberly and col­
leagues identified or predicted several aspects of piriform organization that render it an
ideal substrate for such a system. These predictions have been remarkably borne out in
later studies of structure and function. Haberly and colleagues noted that first, associa­
tive networks depend on spatially distributed input systems. Several lines of evidence
have indeed suggested that the projection from bulb to piriform is in fact spatially distrib­
uted. In other words, in contrast to the spatial clustering of responses at the olfactory
bulb, this ordering is apparently obliterated in the projection to piriform cortex (Stettler
& Axel, 2009). Second, the discriminative power of associative networks relies on positive
feedback via interconnections between the processing units that receive the distributed
input. Indeed, in piriform cortex, each pyramidal cell makes a small number of synaptic
contacts on a large number (>1,000) of other cells in piriform cortex at disparate loca­
tions. Axons from individual pyramidal cells also arborize extensively within many neigh­
boring cortical areas, most of which send strong projections back to piriform cortex (D.
M. G. Johnson, Illig, Behan, & Haberly, 2000). Third, in associative memory models, indi­
vidual inputs are typically weak relative to output threshold, a situation that indeed likely
(p. 96) occurs in piriform (Barkai & Hasselmo, 1994). Finally, content-addressable memory systems typically require activity-dependent changes in excitatory synaptic strengths.
Again, this pattern has since consistently been demonstrated in piriform cortex, where
enhanced olfactory learning capability is accompanied by long-term enhancement of
synaptic transmission in both the descending and ascending inputs (Cohen, Reuveni,
Barkai, & Maroun, 2008).
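
The content-addressable scheme Haberly proposed can be illustrated with a classic Hopfield-style autoassociative network, in which Hebbian learning turns stored patterns into attractors and a fragmented probe is pulled back to the full stored representation. This is a deliberately abstract toy of the "neural reenactment" idea, not a biophysical model of piriform cortex; all sizes and patterns below are invented.

```python
import numpy as np

# Hopfield-style autoassociative memory: a toy version of the
# content-addressable scheme proposed for piriform cortex.
rng = np.random.default_rng(1)
N = 200                                          # "pyramidal cells"
stored = rng.choice([-1.0, 1.0], size=(3, N))    # three stored odor patterns

# Hebbian learning: weak, widely distributed pairwise weights,
# with self-connections removed.
W = stored.T @ stored / N
np.fill_diagonal(W, 0.0)

def recall(probe, steps=10):
    """Iterate the network until it settles into a stored attractor."""
    s = probe.copy()
    for _ in range(steps):
        s = np.sign(W @ s)
        s[s == 0] = 1.0
    return s

# Fragment a stored pattern by flipping 20% of its units, then recall.
probe = stored[0].copy()
flipped = rng.choice(N, size=N // 5, replace=False)
probe[flipped] *= -1.0

recovered = recall(probe)
restored = float(np.mean(recovered == stored[0]))  # fraction of units correct
print(restored)  # close to 1.0: the degraded input is "completed"
```

The defining features named in the text map directly onto this sketch: distributed input (every unit sees every other), weak individual synapses (weights scaled by 1/N), and activity-dependent synaptic change (the Hebbian outer-product rule).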

In addition to the above materialization of Haberly’s predictions on piriform structure, several studies have similarly borne out his predictions on function. In a series of studies,
Don Wilson and colleagues have demonstrated the importance of piriform cortex associa­
tive memory-like properties in olfactory pattern formation, completion, and separation
from background (Barnes, Hofacer, Zaman, Rennaker, & Wilson, 2008; Kadohisa & Wil­
son, 2006; Linster, Henry, Kadohisa, & Wilson, 2007; Linster, Menon, Singh, & Wilson,
2009; D. A. Wilson, 2009a, 2009b; D. A. Wilson & Stevenson, 2003). In a critical recent
study, these authors taught rats to discriminate between various mixtures, each contain­
ing ten monomolecular components (Barnes et al., 2008). They found that rats easily dis­
criminated between a target mixture of ten components (10C) and a second mixture in
which only one of the ten components was replaced with a novel component (10CR1). In
turn, rats were poor at discriminating this same target mixture from a mixture where one
of the components was deleted (10C-1). The authors concluded that through pattern com­
pletion, 10C-1 was “completed” to 10C, yet through pattern separation, 10CR1 was per­
ceived as something new altogether. Critically, the authors accompanied these behavioral
studies with electrical recordings from both olfactory bulb and piriform cortex. They
found that a shift from 10C to 10C-1 induced a significant decorrelation in the activity of olfactory bulb mitral cell ensembles. In other words, olfactory bulb mitral cell ensembles
readily separated these overlapping patterns. In contrast, piriform cortex ensembles
showed no significant decorrelation across 10C and 10C-1 mixtures. In other words, the
piriform ensemble filled in the missing component and responded as if the full 10C mix­
ture were present—consistent with pattern completion. In contrast, a shift from 10C to
10CR1 produced significant cortical ensemble pattern separation. In other words, the en­
semble results were consistent with behavior whereby introduction of a novel component
into a complex mixture was relatively easy to detect, whereas removal of a single compo­
nent was difficult to detect.
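
The logic of this comparison can be sketched numerically. In the toy below, population responses are invented linear sums of invented component tuning vectors, nothing like real ensemble data; the only point is that deleting a component leaves the residual pattern highly correlated with the full-mixture pattern (leaving room for completion), whereas replacing a component injects novel activity that pushes the pattern further away (favoring separation).

```python
import numpy as np

# Toy version of the mixture comparison in Barnes et al. (2008).
# Component "tuning vectors" and the linear-sum response model are
# invented for illustration; they are not data from the study.
rng = np.random.default_rng(7)
n_cells = 500
components = rng.random((11, n_cells))   # tuning to 11 monomolecular odorants

resp_10c = components[:10].sum(axis=0)                    # target mixture 10C
resp_10c_minus1 = components[:9].sum(axis=0)              # 10C-1: one deleted
resp_10cr1 = components[:9].sum(axis=0) + components[10]  # 10CR1: one replaced

def r(x, y):
    """Pearson correlation between two population response vectors."""
    return float(np.corrcoef(x, y)[0, 1])

# Deletion preserves more of the ensemble pattern than replacement:
# raw material for cortical completion of 10C-1 and separation of 10CR1.
print(r(resp_10c, resp_10c_minus1))  # higher
print(r(resp_10c, resp_10cr1))       # lower
```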

Consistent with the above, Jay Gottfried and colleagues have used functional magnetic
resonance imaging (fMRI) to investigate piriform activity in humans (Gottfried & Wu,
2009). In an initial study, they uncovered a heterogenic response profile whereby odorant
physicochemical properties were evident in activity patterns measured in anterior piri­
form cortex, and odorant perceptual properties were associated with activity patterns
measured in posterior piriform (Gottfried, Winston, & Dolan, 2006). In that posterior piri­
form is richer than anterior piriform in the extent of associational connectivity, this find­
ing is consistent with the previously described findings in rodents. Moreover, using multi­
variate fMRI analysis techniques, they found that odorants with similar perceived quality
induced similar patterns of ensemble activity in posterior piriform cortex alone (Howard,
Plailly, Grueschow, Haynes, & Gottfried, 2009). Taken together, these results from both
rodents and humans depict piriform cortex as a critical component allowing the olfactory
system to deal with an ever-changing olfactory environment, while still allowing stable ol­
factory object formation and constancy.

Finally, beyond primary olfactory cortex, olfactory information is distributed widely throughout the brain. Whereas other sensory modalities traverse a thalamic relay en
route from periphery to primary cortex, in olfaction information reaches primary cortex
directly. This is not to say, however, that there is no olfactory thalamus. A recent lesion
study has implicated thalamic involvement in olfactory identification, hedonic processing,
and olfactory motor control (Sela et al., 2009), and a recent imaging study has implicated
a thalamic role in olfactory attention (Plailly, Howard, Gitelman, & Gottfried, 2008), a
finding further supported by lesion studies (Tham, Stevenson, & Miller, 2010). From the
thalamus, olfactory information radiates widely, yet most notable in its projections is the
orbitofrontal cortex that is largely considered secondary olfactory cortex (J. L. Price,
1990). Both human fMRI studies and single-cell recordings in monkeys suggest that or­
bitofrontal cortex is critical for coding odor identity (Rolls, Critchley, & Treves, 1996; Tan­
abe, Iino, Ooshima, & Takagi, 1974) and may further be key for conscious perception of
smell (Li et al., 2010).


(p. 97) Looking at the Nose Through Human Behavior
As reviewed above, the basic functional architecture of mammalian olfaction is well un­
derstood. In the olfactory epithelium, there is a thorough understanding of receptor
events that culminate in transduction of odorants into neural signals. In the olfactory
bulb, there is a comprehensive view of how such neural signals form spatiotemporal pat­
terns that allow odor discrimination. Finally, in piriform cortex, there is an emerging view
of how sparse neural representation enables formation of stable olfactory objects. Howev­
er, despite this good understanding of olfaction at the genetic, molecular, and cellular lev­
els, we have only poor understanding of structure–function relations in this system
(Mainen, 2006). Put simply, there is not a scientist or perfumer in the world who can look
at a novel molecule and predict its odor, or smell a novel smell and predict its structure.
One reason for this state of affairs is that the olfactory stimulus, namely, a chemical, has
typically been viewed as it would be by chemists. For example, carbon chain length has
been the most consistently studied odorant property, yet there is no clear importance for
carbon chain length in mammalian olfactory behavior (Boesveldt, Olsson, & Lundstrom,
2010). Indeed, as elegantly stated by the late Larry Katz at a lecture he gave at the Asso­
ciation for Chemoreception Science: “The olfactory system did not evolve to decode the
catalogue of Sigma-Aldrich, it evolved to decode the world around us.” In other words,
perhaps if we reexamine the olfactory stimulus space from a perceptual rather than a
chemical perspective, we may gain important insight into the function of the olfactory
system. It is with this notion in mind that we have recently generated an olfactory percep­
tual metric, and tested its application to perception and neural activity in the olfactory
system.

In an effort led by Rehan Khan (Khan et al., 2007), we constructed a perceptual “odor
space” using data from Dravnieks’ Atlas of Odor Character Profiles, wherein about
150 experts (perfumers and olfactory scientists) rated (from 0 to 5, reflecting “absent”
to “extremely” representative) 160 odorants (144 monomolecular species and 16 mix­
tures) against each of the 146 verbal descriptors (Dravnieks, 1982, 1985). We applied
principal components analysis (PCA), a well-established method for dimension reduction
that generates a new set of dimensions (principal components, or PCs) for the profile
space in which (1) each successive dimension has the maximal possible variance and (2)
all dimensions are uncorrelated. We found that the effective dimensionality of the odor
profile space was much smaller than 146, with the first four PCs accounting for 54 per­
cent of the variance (Figure 6.9A). To generate a perceptual odor space, we projected the
odorants onto a subspace formed by these first four PCs (Figure 6.9B; a navigable version
of this space is available at the odor space link at http://www.weizmann.ac.il/neurobiolo­
gy/worg). In a series of experiments, we found that this space formed a valid representa­
tion of odorant perception: Simple Euclidian distances in the space predicted both explic­
it (Figure 6.9C) and implicit (Figure 6.9D) odor similarity. In other words, odorants close
in the space smell similar, and odorants far-separated in the space smell dissimilar (Khan
et al., 2007). Moreover, we found that the primary dimension in the space (PC1) was tightly linked to odorant pleasantness, that is, a continuum ranging from very unpleasant at one end to very pleasant at the other (Haddad et al., 2010; Figure 6.10).
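The construction of the perceptual space reduces to a standard linear-algebra pipeline. A minimal sketch in Python/NumPy, using simulated ratings in place of the actual Dravnieks data (the matrix dimensions follow the text; everything else here is illustrative):

```python
import numpy as np

# Simulated stand-in for the descriptor profiles: 160 odorants rated
# against 146 verbal descriptors on a 0-5 applicability scale.
rng = np.random.default_rng(0)
profiles = rng.uniform(0.0, 5.0, size=(160, 146))

# PCA via SVD of the mean-centered matrix: the rows of vt are the
# principal components, ordered by decreasing variance explained.
centered = profiles - profiles.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)
var_explained = s**2 / np.sum(s**2)

# Project every odorant onto the subspace of the first four PCs,
# yielding the low-dimensional "odor space" described in the text.
odor_space = centered @ vt[:4].T  # shape (160, 4)

def odor_distance(i, j):
    """Euclidean distance in the perceptual space; in Khan et al. (2007)
    this distance predicted explicit and implicit odor similarity."""
    return float(np.linalg.norm(odor_space[i] - odor_space[j]))
```

With the real profile matrix in place of `profiles`, nearby points in `odor_space` would correspond to odorants that smell similar.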

Finding that pleasantness was the primary dimension of human olfactory perception was
consistent with many previous efforts. Odorant pleasantness was the primary aspect of
odor spontaneously used by subjects in olfactory discrimination tasks (S. S. Schiffman,
1974), and odorant pleasantness was the primary criterion spontaneously used by sub­
jects in order to combine odorants into groups (Berglund, Berglund, Engen, & Ekman,
1973; S. Schiffman, Robinson, & Erickson, 1977). When using large numbers of verbal de­
scriptors in order to describe odorants, pleasantness repeatedly emerged as the primary
dimension in multidimensional analyses of the resultant descriptor space (Khan et al.,
2007; Moskowitz & Barbe, 1977). Studies with newborns suggested that at least some as­
pects of olfactory pleasantness are innate (Soussignan, Schaal, Marlier, & Jiang, 1997;
Steiner, 1979). For example, neonates’ behavioral markers of disgust (nose wrinkling, upper lip raising) discriminated between vanillin, judged pleasant by adult raters, and butyric acid, judged unpleasant (Soussignan et al., 1997). Moreover, there is
agreement in the assessments of pleasantness by adults and children for various pure
odorants (Schmidt & Beauchamp, 1988) and personal odors (Mallet & Schaal, 1998).


Figure 6.9 Olfactory perceptual space. A, The proportion of (descending line) and cumulative (ascending line) variance in perceptual descriptions explained by each of the principal components (PCs). B, The 144 odorants projected into a two-dimensional space made of the first and second PCs. Nine odorants used in experiments depicted in C and D: acetophenone (AC), amyl acetate (AA), diphenyl oxide (DP), ethyl butyrate (EB), eugenol (EU), guaiacol (GU), heptanal (HP), hexanoic acid (HX), and phenyl ethanol (PEA). C, For the nine odorants, the correlation between explicit perceived similarity ratings and PCA-based distance for all pairwise comparisons. Odorants closer in the perceptual space were perceived as more similar. D, Reaction time for correct trials in a forced-choice same–different task using five of the nine odorants. Error bars reflect SE. Reaction time was longer for odorant pairs that were closer in PCA-based space, thus providing an implicit validation of the perceptual space.

Reprinted with permission from Khan et al., 2007.

Figure 6.10 Identifying pleasantness as the first PC of perception. A, The five descriptors that flanked each end of PC1 of perception. B, For the nine odorants in Figure 6.8, the correlation between the pairwise difference in pleasantness and the pairwise distance along the first PC. Distance along the first PC was a strong predictor of difference in pleasantness.

Reprinted with permission from Khan et al., 2007.

After using PCA to reduce the apparent dimensionality of olfactory perception, we set out
to independently apply the same approach to odorant structure. We used structural chem­
istry software to obtain 1,514 physicochemical descriptors for each of 1,565 odorants.
These descriptors were of many types (p. 98) (p. 99) (e.g., atom counts, functional group
counts, counts of types of bonds, molecular weights, topological descriptors). We applied
PCA to these data and found that much of the variance could be explained by a relatively
small number of PCs. The first PC accounted for about 32 percent of the variance, and
the first ten accounted for about 70 percent of the variance.
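The variance bookkeeping for the physicochemical PCA follows the same recipe, with one extra step: because descriptors come in very different units (atom counts vs. molecular weights), the columns are standardized before PCA. A small simulated sketch (the matrix is shrunk for speed; only the procedure is meant to match the analysis in the text):

```python
import numpy as np

# Simulated descriptor matrix; the real one was 1,565 odorants x 1,514
# physicochemical descriptors obtained from structural-chemistry software.
rng = np.random.default_rng(1)
descriptors = rng.normal(size=(300, 120))

# Standardize each descriptor column before PCA, since units differ.
z = (descriptors - descriptors.mean(axis=0)) / descriptors.std(axis=0)

# Singular values give the per-PC share of variance directly.
s = np.linalg.svd(z, compute_uv=False)
var_ratio = s**2 / np.sum(s**2)
cumulative = np.cumsum(var_ratio)

pc1_share = var_ratio[0]   # reported as ~32% for the real physicochemical data
first_ten = cumulative[9]  # reported as ~70% for the real physicochemical data
```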

Figure 6.11 Relating physicochemical space to perceptual space. A, The correlation between the first to fourth (descending in the figure) perceptual PCs and each of the first seven physicochemical PCs for the 144 odorants. Error bars reflect the SE from 1,000 bootstrap replicates. The best correlation was between the first PC of perception and the first PC of physicochemical space. This correlation was significantly larger than all other correlations. B, For the 144 odorants, the correlation between their actual first perceptual PC value and the value our model predicted from their physicochemical data.

Reprinted with permission from Khan et al., 2007.

Because we separately generated PC spaces for perception and structure, we could then
ask whether these two spaces were related in any way. In other words, we tested for a
correlation between perceptual PCs and physicochemical PCs. Strikingly, the strongest
correlation was between the first perceptual PC and the first physicochemical PC (Figure
6.11A). In other words, there was a privileged relationship between PC1 of perception
and PC1 of physicochemical organization. The single best axis for explaining the variance
in the physicochemical data was the best predictor of the single best axis for explaining
the variance in the perceptual data. Having established that the physicochemical space is
related to the perceptual space, we next built a linear predictive model through a cross-
validation procedure that allowed us to predict odor perception from odorant structure
(Figure 6.11B). To test the predictive power of our model, we obtained physicochemical
parameters for 52 odorants commonly used in olfaction experiments, but not present in
the set of 144 used in the model building. We applied our model to the 52 new molecules so that for each we had predicted values for the first PC of perceptual space. We
found that using these PC values, we could convincingly predict the rank order of pleasantness of these molecules (Spearman rank correlation, r = 0.72; p = 0.0004), and modestly yet significantly predict their actual pleasantness ratings (r = 0.55; p = 0.004).
Moreover, we obtained similar predictive power across three different cultures: urban
Americans in California, rural Muslim Arab Israelis, and urban Jewish Israelis (Figure
6.12).
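The prediction step amounts to fitting a linear map on the model-building odorants and scoring held-out molecules by rank agreement. A hedged sketch with simulated data (a planted linear structure-to-perception relationship stands in for the real measurements; the actual model and cross-validation details are in Khan et al., 2007):

```python
import numpy as np

rng = np.random.default_rng(2)
n_train, n_test, n_pcs = 144, 52, 7

# Simulated physicochemical PC scores and a planted linear relationship
# to the first perceptual PC (plus noise).
X = rng.normal(size=(n_train + n_test, n_pcs))
true_w = rng.normal(size=n_pcs)
y = X @ true_w + 0.5 * rng.normal(size=n_train + n_test)

# Least-squares fit (with intercept) on the model-building set.
design = np.column_stack([X[:n_train], np.ones(n_train)])
w, *_ = np.linalg.lstsq(design, y[:n_train], rcond=None)

# Predict perceptual PC1 for the held-out odorants.
pred = np.column_stack([X[n_train:], np.ones(n_test)]) @ w

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the rank orders."""
    ranks = lambda v: np.argsort(np.argsort(v))
    return float(np.corrcoef(ranks(a), ranks(b))[0, 1])

rho = spearman(pred, y[n_train:])  # rank-order predictive power on held-out data
```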

Figure 6.12 Predicting odorant pleasantness across cultures. Twenty-seven odorous molecules not commonly used in olfactory studies, and not previously tested by us, were presented to three cultural groups of naïve subjects: urban Americans (23 subjects), rural Arab Israelis (22 subjects), and urban Jewish Israelis (20 subjects).

Reprinted with permission from Khan et al., 2007.

An aspect of these results that has been viewed as challenging by many is that they imply
that pleasantness is written into the molecular structure of odorants and is therefore by
definition innate. This can be viewed as inconsistent with the high levels of cross-individ­
ual and cross-cultural variability in odor perception (Ayabe-Kanamura et al., 1998; Wysoc­
ki, Pierce, & Gilbert, 1991). We indeed think that odor pleasantness is hard-wired and in­
nate. Consistent with this, many odors have clear hedonic value despite no previous expe­
rience or exposure (Soussignan et al., 1997; Steiner, 1979), and moreover, the metric that
links this hedonic value with odorant structure (PC1 of structure) predicts (p. 100) re­
sponses across species (Mandairon, Poncelet, Bensafi, & Didier, 2009). Nevertheless, we
stress that an innate hard-wired link remains highly susceptible to the influences of learn­
ing, experience, and context. For example, no one would dispute that perceived color is innately and hard-wire-linked to wavelength. However, a given wavelength can be perceived as very different colors as a function of context (see striking online demonstrations at http://www.purveslab.net/seeforyourself/). Moreover, no one would dispute that location in space is reflected in location on the retina in an innate and hard-wired fashion.
Nevertheless, context can alter spatial perception, as clearly evident in the Müller-Lyer illusion. Similarly, we argue that odor pleasantness is hard-wire-linked to (PC1 of) odorant
structure, yet this link is clearly modified by learning, experience, and context.

Olfaction is often described as multidimensional. It is made of a multidimensional stimulus, which is transduced by about 1,000 different receptor types, giving rise to a neural image of similarly high dimensionality. Yet our results suggested that very few dimensions, in fact primarily one, capture a significant portion of the variance in olfactory perception, and critically, this one dimension allows for modest yet significant predictions of
odor perception from odorant structure. With this in mind, in an effort led by Rafi Haddad
(Haddad, Khan, et al., 2008; Haddad, Lapid, Harel, & Sobel, 2008; Haddad et al., 2010),
we set out to ask whether this reduced dimensionality was reflected in any way in neural
activity. We mined all available previously published data sets that reported the neural re­
sponse in a sizable number of receptor types or glomeruli to a sizable number of odor­
ants. This yielded 12 data sets, obtained with either electrical or optical recording methods. Once again, we applied PCA to these data. The first two PCs alone explained about 58 per­
cent of the variance in the neural activity data. Moreover, in nine of the twelve datasets
we analyzed, we found a strong correlation between PC1 of neural response space and
the summed activity of the sampled population, whether spike rates or optical signal, with
r values ranging between 0.73 and 0.98 (all p < 0.001). Considering that the summed response in the olfactory system of insects was previously identified as strongly predictive
of insect approach or withdrawal (Kreher, Mathew, Kim, & Carlson, 2008) (Figure 6.13A),
we set out here to ask whether PC1 of neural activity in the mammalian olfactory system
was similarly related to behavior and perception. One of the datasets we studied was that
of Saito et al. (2009), who reported the in vitro responses of ten human and fifty-three mouse odorant receptors to a set of sixty-two odorants. We asked eighteen human
subjects to rate the odorant pleasantness of twenty-six odorants randomly selected from
those tested by Saito et al. (2009). The correlation between human receptor PC1 and
odorant pleasantness was 0.49 (p < 0.009), and if we added the mouse receptor (p. 101)
response, it was 0.71 (p < 0.0001) (Figure 6.13B). To reiterate this critical result, PC1 of
odorant-induced neural activity measured in a dish by one group at Duke University in
the United States was a significant predictor of odorant pleasantness, as estimated by hu­
man subjects tested by a different group at the Weizmann Institute in Israel.
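The neural-space analysis follows the same template: PCA over an odorants-by-units response matrix, then a comparison of per-odorant PC1 scores with summed population activity. A simulated sketch (the matrix sizes echo the Saito et al. data set; the shared-gain structure is planted so that the PC1-versus-sum relationship is visible):

```python
import numpy as np

rng = np.random.default_rng(3)
n_odors, n_units = 62, 53

# Simulated responses: each odorant drives the whole population with a
# shared gain, plus per-unit variation (spike rates or optical signal).
gain = rng.uniform(0.2, 2.0, size=n_odors)
responses = gain[:, None] * rng.uniform(0.5, 1.5, size=(n_odors, n_units))

# PCA over the odorant x unit matrix; score each odorant on PC1.
centered = responses - responses.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)
pc1_scores = centered @ vt[0]

# Summed activity of the sampled population, per odorant.
summed = responses.sum(axis=1)

# Absolute correlation (a PC's sign is arbitrary); high values mirror
# the strong PC1-vs-summed-activity correlations reported in the text.
r = abs(float(np.corrcoef(pc1_scores, summed)[0, 1]))
```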

Finally, we also conducted an initial investigation into the second principal component of activity. Given that PC1 of neural activity reflected approach or withdrawal in
animals, we speculated that once approached, a second decision to be made regarding an
odor is whether it is edible or poisonous. Consistent with this prediction, we found signifi­
cant correlations between PC2 of neural activity and odorant toxicity in mice and in rats
(Figure 6.13C), as well as a significant correlation between toxicity/edibility and PC2 of
perception in humans (Figure 6.13D). Similar findings have been obtained by others inde­
pendently (Zarzo, 2008).

To conclude this section, we found that if one uses the human nose as a window onto ol­
faction, one obtains a surprisingly simplified picture that explains a significant portion of
the variance in both neural activity and perception in this system. This picture relied on a set of simple linear transforms. It suggested that the primary axis of perception was
linked to the primary axis of odorant structure, and that both of these were in turn relat­
ed to the primary axis of neural activity in this system. Moreover, the second axis of per­
ception was linked to the second axis of neural activity. Critically, these transforms al­
lowed for modest but significant predictions of perception, structure, and neural activity
across species.

Looking at Human Behavior Through the Nose

Figure 6.13 The principal axes of neural space reflected olfactory behavior and perception. A, Correlation between PC1 of neural population activity and the odor preferences of Drosophila larvae. Every dot represents a single odor. B, Correlation between PC1 of neural space in humans and mice with human odor pleasantness. Every dot represents a single odor. C, Correlation between PC2 of neural population activity and oral toxicity for rats (LD50 values in mg/kg). Every dot represents an odor. D, Correlation between PC2 of human perceptual space and LD50 values of rats.

Reprinted with permission from Haddad et al., 2010.

In the previous section, the human nose taught us about the mammalian olfactory system.
This was (p. 102) possible because, in contrast to popular notions, the human nose is an
astonishingly acute device. This is evident in unusually keen powers of detection and dis­
crimination, which in some cases compete with those of macrosmatic mammals, or with
those of sophisticated analytical equipment. These abilities have been detailed within re­
cent reviews (Sela & Sobel, 2010; Shepherd, 2004, 2005; Stevenson, 2010; Yeshurun &
Sobel, 2010; Zelano & Sobel, 2005). Here, we highlight key cases in which these keen ol­
factory abilities clearly influence human behavior.

As noted in the introduction, two aspects of human behavior that are, in our view, macros­
matic, are eating and mating: Notably, both are critical for survival. A third human behavior for which human olfactory influences are less clearly apparent, yet in our view are
nevertheless critical, is social interaction. It is beyond the scope of this chapter to provide
a comprehensive review on human chemosignaling, as recently done elsewhere (Steven­
son, 2010). Here, we selectively choose examples to highlight the role of olfaction in hu­
man behavior.

Eating

We eat what tastes good (Drewnowski, 1997). Taste, or more accurately flavor, is domi­
nated by smell (Small, Jones-Gotman, Zatorre, Petrides, & Evans, 1997). Hence, things
taste good because they smell good (Letarte, 1997). In other words, by determining the
palatability and hedonic value of food, olfaction influences the balance of food intake
(Rolls, 2006; Saper, Chou, & Elmquist, 2002; Yeomans, 2006). In addition to this very sim­
ple basic premise, there are also several direct and indirect lines of evidence that high­
light the significance of olfaction in eating behavior. For example, olfaction drives saliva­
tion even at subthreshold odor concentrations (Pangborn & Berggren, 1973; Rogers &
Hill, 1989). Odors regulate appetite (Rogers & Hill, 1989) and affect the cephalic phase of
insulin secretion (W. G. Johnson & Wildman, 1983; Louis-Sylvestre & Le Magnen, 1980)
and gastric acid secretion (Feldman & Richardson, 1986).

The interaction between olfaction and eating is bidirectional. Olfaction influences eating,
and eating behavior and mechanisms influence olfaction. The nature of this influence,
however, remains controversial. For example, whereas some studies suggest that hunger
increases olfactory sensitivity to food odors (Guild, 1956; Hammer, 1951; Schneider &
Wolf, 1955; Stafford & Welbeck, 2010), others failed to replicate these results (Janowitz &
Grossman, 1949; Zilstorff-Pedersen, 1955), or even found the opposite—higher sensitivity
in satiety (Albrecht et al., 2009). Hunger and satiety influence not only sensitivity but also
hedonics: Odors of foods consumed to satiety become less pleasant (Albrecht et al., 2009;
Rolls & Rolls, 1997). This satiety-driven shift in hedonic representation is accompanied by
altered brain representation. This was uncovered in an elegant human brain–imaging
study in which eating bananas to satiety changed the representation of banana odor in
the orbitofrontal cortex (O’Doherty et al., 2000). Also, an odor encoded during inactiva­
tion of taste-cortex in rats was later remembered as the same only during similar taste-
cortex inactivation (Fortis-Santiago, Rodwin, Neseliler, Piette, & Katz, 2009). The mecha­
nism for these shifted representations may be evident at the earliest stages of olfactory
processing: Perfusion of the eating-related hormones insulin and leptin onto olfactory re­
ceptor neurons in rats significantly increased spontaneous firing frequency in the ab­
sence of odors and decreased odorant-induced peak amplitude in response to food odors
(Ketterer et al., 2010; Savigner et al., 2009). Therefore, by increasing spontaneous activi­
ty but reducing odorant-induced activity of olfactory receptor neurons, elevated levels of
insulin and leptin (such as after a meal) may result in decreased global signal-to-noise ra­
tio in the olfactory epithelium (Ketterer et al., 2010; Savigner et al., 2009).

The importance of olfaction for human eating behavior is clearly evidenced in cases of ol­
factory loss. Anosmic patients experience distortions in flavor perception (Bonfils, Avan,
Faulcon, & Malinvaud, 2005) and changes in eating behavior (Aschenbrenner et al.,
2008). Simulating anosmia in healthy subjects by intranasal lidocaine administration re­
sulted in reduced hunger ratings (Greenway et al., 2007). Nevertheless, the rate of abnormal body mass index among anosmic people is no higher than in the general population (Aschenbrenner et al., 2008).

Several eating disorders, ranging from obesity (Hoover, 2010; Obrebowski, Obrebowska-
Karsznia, & Gawlinski, 2000; Richardson, Vander Woude, Sudan, Thompson, & Leopold,
2004; Snyder, Duffy, Chapo, Cobbett, & Bartoshuk, 2003) to anorexia (Fedoroff, Stoner,
Andersen, Doty, & Rolls, 1995; Roessner, Bleich, Banaschewski, & Rothenberger, 2005),
have been associated with alterations in olfactory perception, and the nutritional chal­
lenge associated with aging has been clearly linked to the age-related loss of olfaction
(Cain & Gent, 1991; Doty, 1989; (p. 103) S. S. Schiffman, 1997). Accordingly, artificially in­
creasing the odorous properties of foods helps overcome the nutritional challenge in ag­
ing (Mathey, Siebelink, de Graaf, & Van Staveren, 2001; S. S. Schiffman & Warwick, 1988;
Stevens & Lawless, 1981).

Consistent with the bidirectional influences of olfaction and eating behavior, edibility is
clearly a key category in odor perception. It was identified as the second principal axis of
perception independently by us (Haddad et al., 2010) and others (Zarzo, 2008). Consis­
tent with edibility as an olfactory category, olfactory responses are stronger (Small et al.,
2005) and faster (Boesveldt, Frasnelli, Gordon, & Lundstrom), and identification is more
accurate (Fusari & Ballesteros, 2008), for food over nonfood odors. Moreover, whereas
humans are poor at spontaneous odor naming, they are very good at spontaneous rating
of odor edibility, even in childhood (de Wijk & Cain, 1994a, 1994b). Indeed, olfactory pref­
erences of neonates are influenced by their mother’s food preferences during pregnancy
(Schaal, Marlier, & Soussignan, 2000), suggesting that the powerful link between olfacto­
ry preferences and eating behavior is formed at the earliest stages of development.

Mating

When explaining the choice of a sexual partner, some may list physical and personality
qualities, whereas others may just explain the choice by a “simple click” or “chemistry.” Is
this “click” indeed chemical? Although, as noted, humans tend to underestimate their
own olfactory abilities, humans can nevertheless use olfaction to discriminate the genetic
makeup of potential mating partners. The human genome includes a region called human
leukocyte antigen (HLA), which consists of many genes related to the immune system, in
addition to olfactory receptor genes and pseudogenes. Several studies have found that
women can use smell to discriminate between men as a function of similarity between
their own and the men’s HLA alleles (Eggert, Muller-Ruchholtz, & Ferstl, 1998; Jacob,
McClintock, Zelano, & Ober, 2002; Ober et al., 1997; Wedekind & Furi, 1997; Wedekind,
Seebeck, Bettens, & Paepke, 1995). The “ideal” smell of genetic makeup remains contro­
versial, yet most evidence suggests that women prefer an odor of a man with HLA alleles not identical to their own, but at the same time not too different (Jacob et al., 2002; T.
Roberts & Roiser, 2010). In turn, this preference may be for major histocompatibility
complex (MHC) heterozygosity rather than dissimilarity (Thornhill et al., 2003). Olfactory
mate preference, however, is plastic. For example, single women preferred odors of MHC-
similar men, whereas women in relationships preferred odors of MHC-dissimilar men (S.
C. Roberts & Little, 2008). Moreover, olfactory mate preferences are influenced by the
menstrual cycle (Gangestad & Cousins, 2001; Havlicek, Roberts, & Flegr, 2005; Little,
Jones, & Burriss, 2007; Singh & Bronstad, 2001) (Figure 6.14A) and by hormone-based
contraceptives (S. C. Roberts, Gosling, Carter, & Petrie, 2008; Wedekind et al., 1995;
Wedekind & Furi, 1997).

Finally, although not directly related to mate selection, the clearest case of chemical com­
munication in humans also has clear potential implications for mating behavior. This is
the phenomenon of menstrual synchrony, whereby women who live in close proximity,
such as roommates in dorms, synchronize their menstrual cycle over time (McClintock,
1971). This effect is mediated by an odor in sweat. This was verified in a series of studies
in which experimenters obtained underarm sweat extracts from donor women during ei­
ther the ovulatory or follicular menstrual phase. These extracts were then deposited on
the upper lips of recipient women, where follicular sweat accelerated ovulation, and ovu­
latory sweat delayed it (Russell, Switz, & Thompson, 1980; Stern & McClintock, 1998)
(Figure 6.14B). Moreover, variation in menstrual timing can be increased by the odor of
other lactating women (Jacob et al., 2004) or regulated by the odor of male hormones
(Cutler et al., 1986; Wysocki & Preti, 2004).

Olfactory influences on mate preferences are not restricted to women. Men can detect an
HLA-associated odor different from their own, whether taken from male or female odor donors, and rate the HLA-similar odor as more pleasant for both sexes (Thornhill et al.,
2003; Wedekind & Furi, 1997). In addition, men preferred the scent of common over rare
MHC alleles (Thornhill et al., 2003). Moreover, unrelated to HLA similarity, male raters
can detect the menstrual phase of female body odor donors. The follicular phase is rated
as more pleasant and sexy than the luteal phase (Singh & Bronstad, 2001), an effect that
is diminished when the women use hormonal contraceptives (Kuukasjarvi et al., 2004;
Thornhill et al., 2003).


Figure 6.14 Human chemosignaling. A, Women’s preference for symmetrical men’s odor as a function of probability of conception, based on actuarial values. Normally ovulating (non-pill-using) women only. Positive regression values reflect increased relative attraction to scent of symmetrical males; r = 0.54, p < 0.005. (From Gangestad & Thornhill, 1998.) B, Change in length of the recipient’s cycle. Cycles were shorter than baseline during exposure to follicular compounds (t = 1.78; p ≤ 0.05, 37 cycles) but longer during exposure to ovulatory compounds (t = 2.7; p ≤ 0.01, 38 cycles). Cycles during exposure to the carrier were not different from baseline (t = 0.05; p ≤ 0.96, 27 cycles). (From Stern & McClintock, 1998.) C, Post-smell testosterone levels (controlling for pre-smell testosterone levels) among men exposed to the odor of a woman close to ovulation, the odor of a woman far from ovulation, or a control odor. Error bars represent standard errors.

Reprinted with permission from Miller & Maner, 2010.

These behavioral results are echoed in hormone expression. Men exposed to the scent of
an ovulating woman subsequently displayed higher levels of testosterone than did men
exposed to the scent of a (p. 104) nonovulating woman or a control scent (Miller & Maner,
2010) (Figure 6.14C). Moreover, a recent study on chemosignals in human tears revealed
a host of influences on sexual arousal (Gelstein et al., 2011). Sniffing negative-emotion-re­
lated odorless tears obtained from women donors induced reductions in sexual appeal at­
tributed by men to pictures of women’s faces. Sniffing tears also reduced self-rated sexu­
al arousal, reduced physiological measures of arousal, and reduced levels of testosterone.
Finally, fMRI revealed that sniffing women’s tears selectively reduced activity in brain
substrates of sexual arousal in men (Gelstein et al., 2011) (Figure 6.15).

Social Interaction

Figure 6.15 A chemosignal in human tears. Sniffing odorless emotional tears obtained from women donors altered brain activity in the substrates of arousal in men and significantly lowered levels of salivary testosterone.

Whereas olfactory influences on human eating and mating are intuitively obvious, olfacto­
ry cues may play into aspects of human social interaction that have been less commonly
associated with smell. Many such types of social chemosignaling have been examined
(Meredith, 2001), but here we will detail only one particular case that has received more
attention than others, and that is the ability of humans to smell fear. Fear or distress
chemosignals are prevalent throughout animal species (Hauser et al., 2008; Pageat &
Gaultier, 2003). In an initial study in humans, Chen and Haviland-Jones (2000) collected
underarm odors on gauze pads from young women and men after they watched funny or
frightening movies. They later asked other women and men to determine by smell which
was the odor of people when they were “happy” or “afraid.” Women correctly identified
happiness in men and women, and fear in men. Men correctly identified happiness in
women and fear in men. A (p. 105) similar result was later obtained in a study that exam­
ined women only (Ackerl, Atzmueller, & Grammer, 2002). Moreover, women had improved
performance in a cognitive verbal task after smelling fear sweat versus neutral sweat
(Chen, Katdare, & Lucas, 2006), and the smell of fearful sweat biased women toward in­
terpreting ambiguous expressions as more fearful, but had no effect when the facial emo­
tion was more discernible (Zhou & Chen, 2009). Moreover, subjects had an increased
startle reflex when exposed to anxiety-related sweat versus sports-related sweat (Prehn,
Ohrt, Sojka, Ferstl, & Pause, 2006). Finally, imaging studies have revealed dissociable
brain representations after smelling anxiety sweat versus sports-related sweat (Prehn-
Kristensen et al., 2009). These differences are particularly pronounced in the amygdala, a
brain substrate common to olfaction, fear responses, and emotional regulation of behav­
ior (Mujica-Parodi et al., 2009). Taken together, this body of research strongly suggests that humans can discriminate the scent of fear from other body odors, and it is not unlike­
ly that this influences behavior. We think that smelling fear or distress is by no means one
of the key roles of human chemical communication, yet we have chosen to detail this par­
ticular example of human social chemosignaling because it has received increased experi­
mental attention. We think that chemosignaling in fact plays into many aspects of human
social interaction, and uncovering these instances of chemosignaling is a major goal for
research in our field.

Final Word
We have described the functional neuroanatomy of the mammalian sense of smell. This
system is highly conserved (Ache & Young, 2005), and therefore the human sense of smell
is not very different from that of other mammals. With this in mind, just as a deep under­
standing of human visual psychophysics provided the basis for probing vision neurobiolo­
gy, we propose that a solid understanding of human olfactory psychophysics is a
prerequisite to understanding the neurobiological mechanisms of the sense of smell. More­
over, olfaction significantly influences critical human behaviors directly related to sur­
vival, such as eating, mating, and social interaction. Better understanding of these olfac­
tory influences is key, in our view, to a comprehensive picture of human behavior.

References
Ache, B. W., & Young, J. M. (2005). Olfaction: Diverse species, conserved principles. Neu­
ron, 48 (3), 417–430.

Ackerl, K., Atzmueller, M., & Grammer, K. (2002). The scent of fear. Neuroendocrinology
Letters, 23 (2), 79–84.

Albrecht, J., Schreder, T., Kleemann, A. M., Schopf, V., Kopietz, R., Anzinger, A., et al.
(2009). Olfactory detection thresholds and pleasantness of a food-related and a non-food
odour in hunger and satiety. Rhinology, 47 (2), 160–165.

Allison, A. (1954). The secondary olfactory areas in the human brain. Journal of Anatomy,
88, 481–488.

Aschenbrenner, K., Hummel, C., Teszmer, K., Krone, F., Ishimaru, T., Seo, H. S., et al.
(2008). The influence of olfactory loss on dietary behaviors. Laryngoscope, 118 (1), 135–
144.

Ayabe-Kanamura, S., Schicker, I., Laska, M., Hudson, R., Distel, H., Kobayakawa, T., et al.
(1998). Differences in perception of everyday odors: A Japanese-German cross-cultural
study. Chemical Senses, 23 (1), 31–38.

Barkai, E., & Hasselmo, M. E. (1994). Modulation of the input/output function of rat piri­
form cortex pyramidal cells. Journal of Neurophysiology, 72 (2), 644.

Barnes, D. C., Hofacer, R. D., Zaman, A. R., Rennaker, R. L., & Wilson, D. A. (2008). Olfac­
tory perceptual stability and discrimination. Nature Neuroscience, 11 (12), 1378–1380.

Bathellier, B., Buhl, D. L., Accolla, R., & Carleton, A. (2008). Dynamic ensemble odor cod­
ing in the mammalian olfactory bulb: Sensory information at different timescales. Neuron,
57 (4), 586–598.

Bender, G., Hummel, T., Negoias, S., & Small, D. M. (2009). Separate signals for or­
thonasal vs. retronasal perception of food but not nonfood odors. Behavioral Neuro­
science, 123 (3), 481–489.

Bensafi, M., Porter, J., Pouliot, S., Mainland, J., Johnson, B., Zelano, C., et al. (2003). Olfac­
tomotor activity during imagery mimics that during perception. Nature Neuroscience, 6
(11), 1142–1144.

Berglund, B., Berglund, U., Engen, T., & Ekman, G. (1973). Multidimensional analysis of
21 odors. Scandinavian Journal of Psychology, 14 (2), 131–137.

Boesveldt, S., Frasnelli, J., Gordon, A. R., & Lundstrom, J. N. (2010). The fish is bad: Nega­
tive food odors elicit faster and more accurate reactions than other odors. Biological Psy­
chology, 84 (2), 313–317.

Boesveldt, S., Olsson, M. J., & Lundstrom, J. N. (2010). Carbon chain length and the stimu­
lus problem in olfaction. Behavioral Brain Research, 215 (1), 110–113.

Bonfils, P., Avan, P., Faulcon, P., & Malinvaud, D. (2005). Distorted odorant perception:
Analysis of a series of 56 patients with parosmia. Archives of Otolaryngology—Head and
Neck Surgery, 131 (2), 107–112.

Breer, H., Fleischer, J., & Strotmann, J. (2006). The sense of smell: Multiple olfactory sub­
systems. Cellular and Molecular Life Sciences, 63 (13), 1465–1475.

Brunjes, P. C., Illig, K. R., & Meyer, E. A. (2005). A field guide to the anterior olfactory nucleus (cortex). Brain Research, Brain Research Reviews, 50 (2), 305–335.

Buck, L. B. (2000). The molecular architecture of odor and pheromone sensing in mam­
mals. Cell, 100 (6), 611–618.

Buck, L., & Axel, R. (1991). A novel multigene family may encode odorant receptors: a
molecular basis for odor recognition. Cell, 65 (1), 175–187.

Cain, W. S., & Gent, J. F. (1991). Olfactory sensitivity: Reliability, generality, and associa­
tion with aging. Journal of Experimental Psychology: Human Perception and Performance,
17 (2), 382–391.

Carmichael, S. T., Clugnet, M. C., & Price, J. L. (1994). Central olfactory connections in
the macaque monkey. Journal of Comparative Neurology, 346 (3), 403–434.

Chen, D., & Haviland-Jones, J. (2000). Human olfactory communication of emotion. Perceptual and Motor Skills, 91 (3 Pt 1), 771.

Chen, D., Katdare, A., & Lucas, N. (2006). Chemosignals of fear enhance cognitive perfor­
mance in humans. Chemical Senses, 31 (5), 415.

Cleland, T. A., & Sullivan, R. M. (2003). Central olfactory structures. In R. L. Doty (Ed.),
Handbook of olfaction and gustation (2nd ed., pp. 165–180). New York: Marcel Dekker.

Cohen, Y., Reuveni, I., Barkai, E., & Maroun, M. (2008). Olfactory learning-induced long-
lasting enhancement of descending and ascending synaptic transmission to the piriform
cortex. Journal of Neuroscience, 28 (26), 6664.

Cross, D. J., Flexman, J. A., Anzai, Y., Morrow, T. J., Maravilla, K. R., & Minoshima, S.
(2006). In vivo imaging of functional disruption, recovery and alteration in rat olfactory
circuitry after lesion. NeuroImage, 32 (3), 1265–1272.

Cutler, W. B., Preti, G., Krieger, A., Huggins, G. R., Garcia, C. R., & Lawley, H. J. (1986).
Human axillary secretions influence women’s menstrual cycles: The role of donor extract
from men. Hormones and Behavior, 20 (4), 463–473.

de Olmos, J., Hardy, H., & Heimer, L. (1978). The afferent connections of the main and the
accessory olfactory bulb formations in the rat: an experimental HRP-study. Journal of
Comparative Neurology, 15 (181), 213–244.

de Wijk, R. A., & Cain, W. S. (1994a). Odor identification by name and by edibility: Life-
span development and safety. Human Factors, 36 (1), 182–187.

de Wijk, R. A., & Cain, W. S. (1994b). Odor quality: Discrimination versus free and cued
identification. Perception and Psychophysics, 56 (1), 12–18.

DeMaria, S., & Ngai, J. (2010). The cell biology of smell. Journal of Cell Biology, 191 (3),
443–452.

Doty, R. L. (1989). Influence of age and age-related diseases on olfactory function. Annals
of the New York Academy of Sciences, 561, 76–86.

Dravnieks, A. (1982). Odor quality: Semantically generated multi-dimensional profiles are stable. Science, 218, 799–801.

Dravnieks, A. (1985). Atlas of odor character profiles. Philadelphia: ASTM Press.

Drewnowski, A. (1997). Taste preferences and food intake. Annual Review of Nutrition, 17
(1), 237–253.

Eggert, F., Muller-Ruchholtz, W., & Ferstl, R. (1998). Olfactory cues associated with the
major histocompatibility complex. Genetica, 104 (3), 191–197.

Fedoroff, I. C., Stoner, S. A., Andersen, A. E., Doty, R. L., & Rolls, B. J. (1995). Olfactory
dysfunction in anorexia and bulimia nervosa. International Journal of Eating Disorders, 18
(1), 71–77.

Feinstein, P., & Mombaerts, P. (2004). A contextual model for axonal sorting into
glomeruli in the mouse olfactory system. Cell, 117 (6), 817–831.

Feldman, M., & Richardson, C. T. (1986). Role of thought, sight, smell, and taste of food in
the cephalic phase of gastric acid secretion in humans. Gastroenterology, 90 (2), 428–433.

Ferrero, D. M., & Liberles, S. D. (2010). The secret codes of mammalian scents. Wiley In­
terdisciplinary Reviews: Systems Biology and Medicine, 2 (1), 23–33.

Firestein, S. (2001). How the olfactory system makes sense of scents. Nature, 413 (6852),
211–218.

Frasnelli, J., Lundstrom, J. N., Boyle, J. A., Katsarkas, A., & Jones-Gotman, M. (2011). The vomeronasal organ is not involved in the perception of endogenous odors. Human Brain Mapping, 32 (3), 450–460.

Fortis-Santiago, Y., Rodwin, B. A., Neseliler, S., Piette, C. E., & Katz, D. B. (2009). State
dependence of olfactory perception as a function of taste cortical inactivation. Nature
Neuroscience, 13 (2), 158–159.

Franco, M. I., Turin, L., Mershin, A., & Skoulakis, E. M. (2011). Molecular vibration-sens­
ing component in Drosophila melanogaster olfaction. Proceedings of the National Acade­
my of Sciences U S A, 108 (9), 3797–3802.

Fusari, A., & Ballesteros, S. (2008). Identification of odors of edible and nonedible stimuli
as affected by age and gender. Behavior Research Methods, 40 (3), 752.

Gangestad, S. W., & Cousins, A. J. (2001). Adaptive design, female mate preferences, and
shifts across the menstrual cycle. Annual Review of Sex Research, 12, 145–185.

Gangestad, S. W., & Thornhill, R. (1998). Menstrual cycle variation in women’s prefer­
ences for the scent of symmetrical men. Proceedings of the Royal Society of London. B.
Biological Sciences, 265 (1399), 927–933.

Gelstein, S., Yeshurun, Y., Rozenkrantz, L., Shushan, S., Frumin, I., Roth, Y., et al. (2011).
Human tears contain a chemosignal. Science, 331 (6014), 226–230.

Gilad, Y., & Lancet, D. (2003). Population differences in the human functional olfactory
repertoire. Molecular Biology and Evolution, 20 (3), 307–314.

Goldman, A. L., Van der Goes van Naters, W., Lessing, D., Warr, C. G., & Carlson, J. R.
(2005). Coexpression of two functional odor receptors in one neuron. Neuron, 45 (5), 661–
666.

Gottfried, J. A. (2010). Central mechanisms of odour object perception. Nature Reviews,
Neuroscience, 11 (9), 628–641.

Gottfried, J. A., Winston, J. S., & Dolan, R. J. (2006). Dissociable codes of odor quality and
odorant structure in human piriform cortex. Neuron, 49 (3), 467–479.

Gottfried, J. A., & Wu, K. N. (2009). Perceptual and neural pliability of odor objects. An­
nals of the New York Academy of Sciences, 1170, 324–332.

Graziadei, P. P., & Monti Graziadei, A. G. (1983). Regeneration in the olfactory system of
vertebrates. American Journal of Otolaryngology, 4 (4), 228–233.

Greenway, F. L., Martin, C. K., Gupta, A. K., Cruickshank, S., Whitehouse, J., DeYoung, L.,
et al. (2007). Using intranasal lidocaine to reduce food intake. International Journal of
Obesity (London), 31 (5), 858–863.

Grosmaitre, X., Fuss, S. H., Lee, A. C., Adipietro, K. A., Matsunami, H., Mombaerts, P., et
al. (2009). SR1, a mouse odorant receptor with an unusually broad response profile. Jour­
nal of Neuroscience, 29 (46), 14545–14552.

Grosmaitre, X., Santarelli, L. C., Tan, J., Luo, M., & Ma, M. (2007). Dual functions of mam­
malian olfactory sensory neurons as odor detectors and mechanical sensors. Nature Neu­
roscience, 10 (3), 348–354.

Guild, A. A. (1956). Olfactory acuity in normal and obese human subjects: Diurnal varia­
tions and the effect of d-amphetamine sulphate. Journal of Laryngology and Otology, 70
(7), 408–414.

Haberly, L. B. (2001). Parallel-distributed processing in olfactory cortex: New insights


from morphological and physiological analysis of neuronal circuitry. Chemical Senses, 26
(5), 551–576.

Haberly, L. B., & Bower, J. M. (1989). Olfactory cortex: Model circuit for study of associa­
tive memory? Trends in Neurosciences, 12 (7), 258–264.

Haddad, R., Khan, R., Takahashi, Y. K., Mori, K., Harel, D., & Sobel, N. (2008). A metric for odorant comparison. Nature Methods, 5 (5), 425–429.

Haddad, R., Lapid, H., Harel, D., & Sobel, N. (2008). Measuring smells. Current Opinion
in Neurobiology, 18 (4), 438–444.

Haddad, R., Weiss, T., Khan, R., Nadler, B., Mandairon, N., Bensafi, M., et al. (2010). Glob­
al features of neural activity in the olfactory system form a parallel code that predicts ol­
factory behavior and perception. Journal of Neuroscience, 30 (27), 9017–9026.

Hallem, E. A., & Carlson, J. R. (2006). Coding of odors by a receptor repertoire. Cell, 125
(1), 143–160.

Halpern, M. (1987). The organization and function of the vomeronasal system. Annual Re­
view of Neuroscience, 10 (1), 325–362.

Hammer, F. J. (1951). The relation of odor, taste and flicker-fusion thresholds to food in­
take. Journal of Comparative Physiology and Psychology, 44 (5), 403–411.

Hauser, R., Marczak, M., Karaszewski, B., Wiergowski, M., Kaliszan, M., Penkowski, M.,
et al. (2008). A preliminary study for identifying olfactory markers of fear in the rat. Labo­
ratory Animals (New York), 37 (2), 76–80.

Havlicek, J., Roberts, S. C., & Flegr, J. (2005). Women’s preference for dominant male
odour: Effects of menstrual cycle and relationship status. Biology Letters, 1 (3), 256–259.

Hoover, K. C. (2010). Smell with inspiration: The evolutionary significance of olfaction. American Journal of Physical Anthropology, 143 (Suppl 51), 63–74.

Howard, J. D., Plailly, J., Grueschow, M., Haynes, J. D., & Gottfried, J. A. (2009). Odor qual­
ity coding and categorization in human posterior piriform cortex. Nature Neuroscience,
12 (7), 932–938.

Hummel, T. (2000). Assessment of intranasal trigeminal function. International Journal of Psychophysiology, 36 (2), 147–155.

Hummel, T. (2008). Retronasal perception of odors. Chemical Biodiversity, 5 (6), 853–861.

Illig, K. R., & Haberly, L. B. (2003). Odor-evoked activity is spatially distributed in piri­
form cortex. Journal of Comparative Neurology, 457 (4), 361–373.

Jacob, S., McClintock, M. K., Zelano, B., & Ober, C. (2002). Paternally inherited HLA alle­
les are associated with women’s choice of male odor. Nature Genetics, 30 (2), 175–179.

Jacob, S., Spencer, N. A., Bullivant, S. B., Sellergren, S. A., Mennella, J. A., & McClintock,
M. K. (2004). Effects of breastfeeding chemosignals on the human menstrual cycle. Hu­
man Reproduction, 19 (2), 422–429.

Janowitz, H. D., & Grossman, M. I. (1949). Gusto-olfactory thresholds in relation to appetite and hunger sensations. Journal of Applied Physiology, 2 (4), 217–222.

Johnson, B. A., Ong, J., & Leon, M. (2010). Glomerular activity patterns evoked by natural
odor objects in the rat olfactory bulb are related to patterns evoked by major odorant
components. Journal of Comparative Neurology, 518 (9), 1542–1555.

Johnson, B. N., Mainland, J. D., & Sobel, N. (2003). Rapid olfactory processing implicates
subcortical control of an olfactomotor system. Journal of Neurophysiology, 90 (2), 1084–
1094.

Johnson, D. M. G., Illig, K. R., Behan, M., & Haberly, L. B. (2000). New features of connectivity in piriform cortex visualized by intracellular injection of pyramidal cells suggest that "primary" olfactory cortex functions like "association" cortex in other sensory systems. Journal of Neuroscience, 20 (18), 6974.

Johnson, W. G., & Wildman, H. E. (1983). Influence of external and covert food stimuli on
insulin secretion in obese and normal persons. Behavioral Neuroscience, 97 (6), 1025–
1028.

Kadohisa, M., & Wilson, D. A. (2006). Separate encoding of identity and similarity of com­
plex familiar odors in piriform cortex. Proceedings of the National Academy of Sciences U
S A, 103 (41), 15206–15211.

Keller, A., Zhuang, H., Chi, Q., Vosshall, L. B., & Matsunami, H. (2007). Genetic variation
in a human odorant receptor alters odour perception. Nature, 449 (7161), 468–472.

Kepecs, A., Uchida, N., & Mainen, Z. F. (2006). The sniff as a unit of olfactory processing.
Chemical Senses, 31 (2), 167.

Kepecs, A., Uchida, N., & Mainen, Z. F. (2007). Rapid and precise control of sniffing dur­
ing olfactory discrimination in rats. Journal of Neurophysiology, 98 (1), 205.

Ketterer, C., Heni, M., Thamer, C., Herzberg-Schafer, S. A., Haring, H. U., & Fritsche, A.
(2010). Acute, short-term hyperinsulinemia increases olfactory threshold in healthy sub­
jects. International Journal of Obesity (London), 35 (8), 1135–1138.

Keverne, E. B. (1999). The vomeronasal organ. Science, 286 (5440), 716.

Khan, R., Luk, C., Flinker, A., Aggarwal, A., Lapid, H., Haddad, R., et al. (2007). Predict­
ing odor pleasantness from odorant structure: Pleasantness as a reflection of the physical
world. Journal of Neuroscience, 27 (37), 10015–10023.

Kimchi, T., Xu, J., & Dulac, C. (2007). A functional circuit underlying male sexual behav­
iour in the female mouse brain. Nature, 448 (7157), 1009–1014.

Kreher, S. A., Mathew, D., Kim, J., & Carlson, J. R. (2008). Translation of sensory input in­
to behavioral output via an olfactory system. Neuron, 59 (1), 110–124.

Kuukasjarvi, S., Eriksson, C. J. P., Koskela, E., Mappes, T., Nissinen, K., & Rantala, M. J.
(2004). Attractiveness of women’s body odors over the menstrual cycle: the role of oral
contraceptives and receiver sex. Behavioral Ecology, 15 (4), 579–584.

Lagier, S., Carleton, A., & Lledo, P. M. (2004). Interplay between local GABAergic interneurons and relay neurons generates gamma oscillations in the rat olfactory bulb. Journal of Neuroscience, 24 (18), 4382.

Laing, D. G. (1983). Natural sniffing gives optimum odour perception for humans. Percep­
tion, 12 (2), 99–117.

Laurent, G. (1997). Olfactory processing: Maps, time and codes. Current Opinion in Neu­
robiology, 7 (4), 547–553.
Laurent, G. (1999). A systems perspective on early olfactory coding. Science, 286 (5440),
723–728.

Laurent, G. (2002). Olfactory network dynamics and the coding of multidimensional sig­
nals. Nature Reviews, Neuroscience, 3 (11), 884–895.

Laurent, G., Wehr, M., & Davidowitz, H. (1996). Temporal representations of odors in an
olfactory network. Journal of Neuroscience, 16 (12), 3837–3847.

Leon, M., & Johnson, B. A. (2003). Olfactory coding in the mammalian olfactory bulb.
Brain Research, Brain Research Reviews, 42 (1), 23–32.

Letarte, A. (1997). Similarities and differences in affective and cognitive origins of food
likings and dislikes. Appetite, 28 (2), 115–129.

Li, W., Lopez, L., Osher, J., Howard, J. D., Parrish, T. B., & Gottfried, J. A. (2010). Right or­
bitofrontal cortex mediates conscious olfactory perception. Psychological Science, 21 (10),
1454–1463.

Linster, C., Henry, L., Kadohisa, M., & Wilson, D. A. (2007). Synaptic adaptation and odor-background segmentation. Neurobiology of Learning and Memory, 87 (3), 352–360.

Linster, C., Menon, A. V., Singh, C. Y., & Wilson, D. A. (2009). Odor-specific habituation
arises from interaction of afferent synaptic adaptation and intrinsic synaptic potentiation
in olfactory cortex. Learning and Memory, 16 (7), 452–459.

Little, A. C., Jones, B. C., & Burriss, R. P. (2007). Preferences for masculinity in male bod­
ies change across the menstrual cycle. Hormones and Behavior, 51 (5), 633–639.

Louis-Sylvestre, J., & Le Magnen, J. (1980). Palatability and preabsorptive insulin release.
Neuroscience and Biobehavioral Reviews, 4 (Suppl 1), 43–46.

Ma, M., Grosmaitre, X., Iwema, C. L., Baker, H., Greer, C. A., & Shepherd, G. M. (2003).
Olfactory signal transduction in the mouse septal organ. Journal of Neuroscience, 23 (1),
317.

Mainen, Z. F. (2006). Behavioral analysis of olfactory coding and computation in rodents. Current Opinion in Neurobiology, 16 (4), 429–434.

Mainland, J., Johnson, B. N., Khan, R., Ivry, R. B., & Sobel, N. (2005). Olfactory impair­
ments in patients with unilateral cerebellar lesions are selective to inputs from the con­
tralesion nostril. Journal of Neuroscience, 25 (27), 6362–6371.

Mainland, J., & Sobel, N. (2006). The sniff is part of the olfactory percept. Chemical Sens­
es, 31 (2), 181–196.

Mallet, P., & Schaal, B. (1998). Rating and recognition of peers’ personal odors by 9-year-
old children: an exploratory study. Journal of General Psychology, 125 (1), 47–64.
Malnic, B., Hirono, J., Sato, T., & Buck, L. B. (1999). Combinatorial receptor codes for
odors. Cell, 96 (5), 713–723.

Mandairon, N., Poncelet, J., Bensafi, M., & Didier, A. (2009). Humans and mice express
similar olfactory preferences. PLoS One, 4 (1), e4209.

Maresh, A., Rodriguez Gil, D., Whitman, M. C., & Greer, C. A. (2008). Principles of
glomerular organization in the human olfactory bulb—implications for odor processing.
PLoS One, 3 (7), e2640.

Martinez, M. C., Blanco, J., Bullon, M. M., & Agudo, F. J. (1987). Structure of the piriform
cortex of the adult rat: A Golgi study. Journal für Hirnforschung, 28 (3), 341–834.

Mathey, M. F., Siebelink, E., de Graaf, C., & Van Staveren, W. A. (2001). Flavor enhance­
ment of food improves dietary intake and nutritional status of elderly nursing home resi­
dents. Journals of Gerontology. A. Biological Sciences and Medical Sciences, 56 (4),
M200–M205.

McBride, S. A., & Slotnick, B. (1997). The olfactory thalamocortical system and odor re­
versal learning examined using an asymmetrical lesion paradigm in rats. Behavioral Neu­
roscience, 111 (6), 1273.

McClintock, M. K. (1971). Menstrual synchrony and suppression. Nature, 229 (5282), 244–245.

Meredith, M. (1983). Sensory physiology of pheromone communication. In J. G. Vandenbergh (Ed.), Pheromones and reproduction in mammals (pp. 200–252). New York: Academic Press.

Meredith, M. (2001). Human vomeronasal organ function: a critical review of best and
worst cases. Chemical Senses, 26 (4), 433.

Miller, S. L., & Maner, J. K. (2010). Scent of a woman: men’s testosterone responses to ol­
factory ovulation cues. Psychological Sciences, 21 (2), 276–283.

Mombaerts, P., Wang, F., Dulac, C., Chao, S. K., Nemes, A., Mendelsohn, M., et al. (1996).
Visualizing an olfactory sensory map. Cell, 87 (4), 675–686.

Monti-Bloch, L., Jennings-White, C., Dolberg, D. S., & Berliner, D. L. (1994). The human
vomeronasal system. Psychoneuroendocrinology, 19 (5–7), 673–686.

Moran, D. T., Rowley, J. C., Jafek, B. W., & Lovell, M. A. (1982). The fine-structure of the
olfactory mucosa in man. Journal of Neurocytology, 11 (5), 721–746.

Moskowitz, H. R., & Barbe, C. D. (1977). Profiling of odor components and their mixtures.
Sensory Processes, 1 (3), 212–226.

Moulton, D. G. (1976). Spatial patterning of response to odors in peripheral olfactory system. Physiological Reviews, 56 (3), 578–593.
Mozell, M. M., & Jagodowicz, M. (1973). Chromatographic separation of odorants by the
nose: Retention times measured across in vivo olfactory mucosa. Science, 181 (106),
1247–1249.

Mujica-Parodi, L. R., Strey, H. H., Frederick, B., Savoy, R., Cox, D., Botanov, Y., et al.
(2009). Chemosensory cues to conspecific emotional stress activate amygdala in humans.
PLoS One, 4 (7), 113–123.

Negoias, S., Visschers, R., Boelrijk, A., & Hummel, T. (2008). New ways to understand
aroma perception. Food Chemistry, 108 (4), 1247–1254.

Ober, C., Weitkamp, L. R., Cox, N., Dytch, H., Kostyu, D., & Elias, S. (1997). HLA and mate
choice in humans. American Journal of Human Genetics, 61 (3), 497–504.

Obrebowski, A., Obrebowska-Karsznia, Z., & Gawlinski, M. (2000). Smell and taste in chil­
dren with simple obesity. International Journal of Pediatric Otorhinolaryngology, 55 (3),
191–196.

O’Doherty, J., Rolls, E. T., Francis, S., Bowtell, R., McGlone, F., Kobal, G., et al. (2000).
Sensory-specific satiety-related olfactory activation of the human orbitofrontal cortex.
Neuroreport, 11 (4), 893–897.

Pageat, P., & Gaultier, E. (2003). Current research in canine and feline pheromones. Vet­
erinary Clinics of North America, Small Animal Practice, 33 (2), 187–211.

Pangborn, R. M., & Berggren, B. (1973). Human parotid secretion in response to pleasant
and unpleasant odorants. Psychophysiology, 10 (3), 231–237.

Plailly, J., Howard, J. D., Gitelman, D. R., & Gottfried, J. A. (2008). Attention to odor modu­
lates thalamocortical connectivity in the human brain. Journal of Neuroscience, 28 (20),
5257–5267.

Porter, J., Anand, T., Johnson, B., Khan, R. M., & Sobel, N. (2005). Brain mechanisms for
extracting spatial information from smell. Neuron, 47 (4), 581–592.

Porter, J., Craven, B., Khan, R. M., Chang, S. J., Kang, I., Judkewitz, B., et al. (2007).
Mechanisms of scent-tracking in humans. Nature Neuroscience, 10 (1), 27–29.

Powell, T. P., Cowan, W. M., & Raisman, G. (1965). The central olfactory connexions. Jour­
nal of Anatomy, 99 (Pt 4), 791.

Prehn, A., Ohrt, A., Sojka, B., Ferstl, R., & Pause, B. M. (2006). Chemosensory anxiety sig­
nals augment the startle reflex in humans. Neuroscience Letters, 394 (2), 127–130.

Prehn-Kristensen, A., Wiesner, C., Bergmann, T. O., Wolff, S., Jansen, O., Mehdorn, H. M.,
et al. (2009). Induction of empathy by the smell of anxiety. PLoS One, 4 (6), e5987.

Price, J. L. (1973). An autoradiographic study of complementary laminar patterns of termination of afferent fibers to the olfactory cortex. Journal of Comparative Neurology, 150, 87–108.

Price, J. L. (1987). The central olfactory and accessory olfactory systems. In T. E. Finger &
W. L. Silver (Eds.), Neurobiology of taste and smell (179–203). New York: Wiley.

Price, J. L. (1990). Olfactory system. In G. Paxinos (Ed.), Human nervous system (pp. 979–
1001). San Diego: Academic Press.

Ressler, K. J., Sullivan, S. L., & Buck, L. B. (1993). A zonal organization of odorant recep­
tor gene expression in the olfactory epithelium. Cell, 73 (3), 597–609.

Restrepo, D., Arellano, J., Oliva, A. M., Schaefer, M. L., & Lin, W. (2004). Emerging views
on the distinct but related roles of the main and accessory olfactory systems in respon­
siveness to chemosensory signals in mice. Hormones and Behavior, 46 (3), 247–256.

Richardson, B. E., Vander Woude, E. A., Sudan, R., Thompson, J. S., & Leopold, D. A.
(2004). Altered olfactory acuity in the morbidly obese. Obesity Surgery, 14 (7), 967–969.

Rinberg, D., Koulakov, A., & Gelperin, A. (2006). Sparse odor coding in awake behaving
mice. Journal of Neuroscience, 26 (34), 8857.

Roberts, S. C., Gosling, L. M., Carter, V., & Petrie, M. (2008). MHC-correlated odour pref­
erences in humans and the use of oral contraceptives. Proceedings of the Royal Society of
London. B. Biological Sciences, 275 (1652), 2715–2722.

Roberts, S. C., & Little, A. C. (2008). Good genes, complementary genes and human mate
preferences. Genetica, 132 (3), 309–321.

Roberts, T., & Roiser, J. P. (2010). In the nose of the beholder: are olfactory influences on
human mate choice driven by variation in immune system genes or sex hormone levels?
Experimental Biology and Medicine (Maywood), 235 (11), 1277–1281.

Roessner, V., Bleich, S., Banaschewski, T., & Rothenberger, A. (2005). Olfactory deficits in
anorexia nervosa. European Archives of Psychiatry and Clinical Neuroscience, 255 (1), 6–
9.

Rogers, P. J., & Hill, A. J. (1989). Breakdown of dietary restraint following mere exposure
to food stimuli: interrelationships between restraint, hunger, salivation, and food intake.
Addictive Behavior, 14 (4), 387–397.

Rolls, E. T. (2006). Brain mechanisms underlying flavour and appetite. Philosophical Transactions of the Royal Society of London. B. Biological Sciences, 361 (1471), 1123–1136.

Rolls, E. T., Critchley, H. D., & Treves, A. (1996). Representation of olfactory information
in the primate orbitofrontal cortex. Journal of Neurophysiology, 75 (5), 1982–1996.

Rolls, E. T., & Rolls, J. H. (1997). Olfactory sensory-specific satiety in humans. Physiology
& Behavior, 61 (3), 461–473.

Russell, M. J., Switz, G. M., & Thompson, K. (1980). Olfactory influences on the human
menstrual cycle. Pharmacology, Biochemisty, and Behavior, 13 (5), 737–738.

Saito, H., Chi, Q., Zhuang, H., Matsunami, H., & Mainland, J. D. (2009). Odor coding by a mammalian receptor repertoire. Science Signaling, 2 (60), ra9.

Saper, C. B., Chou, T. C., & Elmquist, J. K. (2002). The need to feed: Homeostatic and he­
donic control of eating. Neuron, 36 (2), 199–211.

Savic, I., & Gulyas, B. (2000). PET shows that odors are processed both ipsilaterally and
contralaterally to the stimulated nostril. Neuroreport, 11 (13), 2861–2866.

Savigner, A., Duchamp-Viret, P., Grosmaitre, X., Chaput, M., Garcia, S., Ma, M., et al.
(2009). Modulation of spontaneous and odorant-evoked activity of rat olfactory sensory
neurons by two anorectic peptides, insulin and leptin. Journal of Neurophysiology, 101
(6), 2898–2906.

Schaal, B., Marlier, L., & Soussignan, R. (2000). Human foetuses learn odours from their
pregnant mother’s diet. Chemical Senses, 25 (6), 729–737.

Schiffman, S. S. (1974). Physicochemical correlates of olfactory quality. Science, 185 (146), 112–117.

Schiffman, S. S. (1997). Taste and smell losses in normal aging and disease. Journal of the
American Medical Association, 278 (16), 1357–1362.

Schiffman, S., Robinson, D. E., & Erickson, R. P. (1977). Multidimensional-scaling of odorants—examination of psychological and physicochemical dimensions. Chemical Senses & Flavour, 2 (3), 375–390.

Schiffman, S. S., & Warwick, Z. S. (1988). Flavor enhancement of foods for the elderly can
reverse anorexia. Neurobiology of Aging, 9 (1), 24–26.

Schmidt, H. J., & Beauchamp, G. K. (1988). Adult-like odor preferences and aversions in
three-year-old children. Child Development, 1136–1143.

Schneider, R. A., & Wolf, S. (1955). Olfactory perception thresholds for citral utilizing a
new type olfactorium. Journal of Applied Physiology, 8 (3), 337–342.

Schoenfeld, T. A., & Cleland, T. A. (2006). Anatomical contributions to odorant sampling and representation in rodents: zoning in on sniffing behavior. Chemical Senses, 31 (2), 131.

Sela, L., Sacher, Y., Serfaty, C., Yeshurun, Y., Soroker, N., & Sobel, N. (2009). Spared and
impaired olfactory abilities after thalamic lesions. Journal of Neuroscience, 29 (39),
12059.
Sela, L., & Sobel, N. (2010). Human olfaction: A constant state of change-blindness. Ex­
perimental Brain Research, 1–17.

Shepherd, G. M. (2004). The human sense of smell: Are we better than we think? PLoS Bi­
ology, 2 (5), E146.

Shepherd, G. M. (2005). Outline of a theory of olfactory processing and its relevance to humans. Chemical Senses, 30, I3–I5.

Shipley, M. T. (1995). Olfactory system. In G. Paxinos (Ed.), Rat nervous system (2nd ed.,
pp. 899–928). San Diego: Academic Press.

Singh, D., & Bronstad, P. M. (2001). Female body odour is a potential cue to ovulation.
Proceedings of the Royal Society of London. B. Biological Sciences, 268 (1469), 797–801.

Slotnick, B. M., & Schoonover, F. W. (1993). Olfactory sensitivity of rats with transection
of the lateral olfactory tract. Brain Research, 616 (1–2), 132–137.

Small, D. M., Gerber, J. C., Mak, Y. E., & Hummel, T. (2005). Differential neural responses
evoked by orthonasal versus retronasal odorant perception in humans. Neuron, 47 (4),
593–605.

Small, D. M., Jones-Gotman, M., Zatorre, R. J., Petrides, M., & Evans, A. C. (1997). Flavor
processing: more than the sum of its parts. Neuroreport, 8 (18), 3913–3917.

Snyder, D., Duffy, V., Chapo, A., Cobbett, L., & Bartoshuk, L. (2003). Childhood taste dam­
age modulates obesity risk: Effects on fat perception and preference. Obesity Research,
11, A147–A147.

Sobel, N., Prabhakaran, V., Desmond, J. E., Glover, G. H., Goode, R. L., Sullivan, E. V., et
al. (1998). Sniffing and smelling: Separate subsystems in the human olfactory cortex. Na­
ture, 392 (6673), 282–286.

Sobel, N., Prabhakaran, V., Hartley, C. A., Desmond, J. E., Zhao, Z., Glover, G. H., et al. (1998). Odorant-induced and sniff-induced activation in the cerebellum of the human. Journal of Neuroscience, 18 (21), 8990–9001.

Soussignan, R., Schaal, B., Marlier, L., & Jiang, T. (1997). Facial and autonomic responses
to biological and artificial olfactory stimuli in human neonates: Re-examining early hedo­
nic discrimination of odors. Physiology & Behavior, 62 (4), 745–758.

Spehr, M., & Munger, S. D. (2009). Olfactory receptors: G protein-coupled receptors and
beyond. Journal of Neurochemistry, 109 (6), 1570–1583.

Spehr, M., Spehr, J., Ukhanov, K., Kelliher, K., Leinders-Zufall, T., & Zufall, F. (2006). Sig­
naling in the chemosensory systems: Parallel processing of social signals by the mam­
malian main and accessory olfactory systems. Cellular and Molecular Life Sciences, 63
(13), 1476–1484.

Page 38 of 41
Looking at the Nose Through Human Behavior, and at Human Behavior
Through the Nose
Stafford, L. D., & Welbeck, K. (2010). High hunger state increases olfactory sensitivity to
neutral but not food odors. Chemical Senses, 36 (2), 189–198.

Steiner, J. E. (1979). Human facial expressions in response to taste and smell stimulation.
Advances in Child Development and Behavior, 13, 257–295.

Stern, K., & McClintock, M. K. (1998). Regulation of ovulation by human pheromones. Na­
ture, 392 (6672), 177–179.

Stettler, D. D., & Axel, R. (2009). Representations of odor in the piriform cortex. Neuron,
63 (6), 854–864.

Stevens, D. A., & Lawless, H. T. (1981). Age-related changes in flavor perception. Appetite,
2 (2), 127–136.

Stevenson, R. J. (2010). An initial evaluation of the functions of human olfaction. Chemical
Senses, 35 (1), 3.

Stoddart, D. M. (1990). The scented ape: The biology and culture of human odour. Cam­
bridge, UK: Cambridge University Press.

Storan, M. J., & Key, B. (2006). Septal organ of Grüneberg is part of the olfactory system.
Journal of Comparative Neurology, 494 (5), 834–844.

Strotmann, J., Wanner, I., Krieger, J., Raming, K., & Breer, H. (1992). Expression of odor­
ant receptors in spatially restricted subsets of chemosensory neurons. Neuroreport, 3
(12), 1053–1056.

Su, C. Y., Menuz, K., & Carlson, J. R. (2009). Olfactory perception: receptors, cells, and
circuits. Cell, 139 (1), 45–59.

Tanabe, T., Iino, M., Ooshima, Y., & Takagi, S. F. (1974). Olfactory area in prefrontal lobe.
Brain Research, 80 (1), 127–130.

Tham, W. W. P., Stevenson, R. J., & Miller, L. A. (2010). The role of the mediodorsal thala­
mic nucleus in human olfaction. Neurocase, 99999 (1), 1–12.

Thornhill, R., Gangestad, S. W., Miller, R., Scheyd, G., McCollough, J. K., & Franklin, M.
(2003). Major histocompatibility complex genes, symmetry, and body scent attractiveness
in men and women. Behavioral Ecology, 14 (5), 668–678.

Tomori, Z., Benacka, R., & Donic, V. (1998). Mechanisms and clinicophysiological implica­
tions of the sniff- and gasp-like aspiration reflex. Respiration Physiology, 114 (1), 83–98.

Uchida, N., Takahashi, Y. K., Tanifuji, M., & Mori, K. (2000). Odor maps in the mammalian
olfactory bulb: Domain organization and odorant structural features. Nature Neuro­
science, 3 (10), 1035–1043.

Vassar, R., Ngai, J., & Axel, R. (1993). Spatial segregation of odorant receptor expression
in the mammalian olfactory epithelium. Cell, 74 (2), 309–318.

Verhagen, J. V., Wesson, D. W., Netoff, T. I., White, J. A., & Wachowiak, M. (2007). Sniffing
controls an adaptive filter of sensory input to the olfactory bulb. Nature Neuroscience, 10
(5), 631–639.

Wedekind, C., & Furi, S. (1997). Body odour preferences in men and women: Do they aim
for specific MHC combinations or simply heterozygosity? Proceedings of the Royal Soci­
ety of London. B. Biological Sciences, 264 (1387), 1471–1479.

Wedekind, C., Seebeck, T., Bettens, F., & Paepke, A. J. (1995). MHC-dependent mate pref­
erences in humans. Proceedings of the Royal Society of London. B. Biological Sciences,
260 (1359), 245–249.

Wilson, D. A. (1997). Binaral interactions in the rat piriform cortex. Journal of Neurophys­
iology, 78 (1), 160–169.

Wilson, D. A. (2009a). Olfaction as a model system for the neurobiology of mammalian
short-term habituation. Neurobiology of Learning and Memory, 92 (2), 199–205.

Wilson, D. A. (2009b). Pattern separation and completion in olfaction. Annals of the New
York Academy of Sciences, 1170, 306–312.

Wilson, D. A., & Stevenson, R. J. (2003). The fundamental role of memory in olfactory per­
ception. Trends in Neurosciences, 26 (5), 243–247.

Wilson, R. I., & Mainen, Z. F. (2006). Early events in olfactory processing. Annual Review
of Neuroscience, 29, 163–201.

Witt, M., & Hummel, T. (2006). Vomeronasal versus olfactory epithelium: Is there a cellu­
lar basis for human vomeronasal perception? International Review of Cytology, 248, 209–
259.

Wysocki, C. J., & Meredith, M. (1987). The vomeronasal system. In T. E. Finger & W. L.
Silver (Eds.), Neurobiology of taste and smell (pp. 125–150). New York: John Wiley &
Sons.

Wysocki, C. J., Pierce, J. D., & Gilbert, A. N. (1991). Geographic, cross-cultural, and indi­
vidual variation in human olfaction. In T. V. Getchell (Ed.), Smell and taste in health and
disease (pp. 287–314). New York: Raven Press.

Wysocki, C. J., & Preti, G. (2004). Facts, fallacies, fears, and frustrations with human
pheromones. Anatomical Record. A. Discoveries in Molecular, Cellular, and Evolutionary
Biology, 281 (1), 1201–1211.

Yeomans, M. R. (2006). Olfactory influences on appetite and satiety in humans. Physiolo­
gy & Behavior, 89 (1), 10–14.

Yeshurun, Y., & Sobel, N. (2010). An odor is not worth a thousand words: from multidi­
mensional odors to unidimensional odor objects. Annual Review of Psychology, 61, 219–
241.

Zarzo, M. (2008). Psychologic dimensions in the perception of everyday odors: Pleasant­
ness and edibility. Journal of Sensory Studies, 23 (3), 354–376.

Zeki, S., & Bartels, A. (1999). Toward a theory of visual consciousness. Consciousness
and Cognition, 8 (2), 225–259.

Zelano, C., & Sobel, N. (2005). Humans as an animal model for systems-level organization
of olfaction. Neuron, 48 (3), 431–454.

Zhang, X., & Firestein, S. (2002). The olfactory receptor gene superfamily of the mouse.
Nature Neuroscience, 5 (2), 124–133.

Zhou, W., & Chen, D. (2009). Fear-related chemosignals modulate recognition of fear in
ambiguous facial expressions. Psychological Science, 20 (2), 177.

Zilstorff-Pedersen, K. (1955). Olfactory threshold determinations in relation to food in­
take. Acta Otolaryngologica, 45 (1), 86–90.

Zufall, F., Firestein, S., & Shepherd, G. M. (1994). Cyclic nucleotide-gated ion channels
and sensory transduction in olfactory receptor neurons. Annual Review of Biophysics and
Biomolecular Structure, 23 (1), 577–607.

Roni Kahana

Roni Kahana, Department of Neurobiology, Weizmann Institute of Science, Rehovot, Israel

Noam Sobel

Noam Sobel, Department of Neurobiology, Weizmann Institute of Science, Rehovot, Israel

Cognitive Neuroscience of Music  


Petr Janata
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn

Print Publication Date: Dec 2013 Subject: Psychology, Cognitive Neuroscience


Online Publication Date: Dec 2013 DOI: 10.1093/oxfordhb/9780199988693.013.0007

Abstract and Keywords

Humans engage with music in many ways, and music is associated with many aspects of
our personal and social lives. Music represents an organization of our auditory environ­
ments, and many neural processes must be recruited and coordinated both to perceive
and to create musical patterns. Accordingly, our musical experiences depend on the inter­
play of diverse brain systems underlying perception, cognition, action, and emotion. Com­
pared with the study of other human faculties, the neuroscientific study of music is rela­
tively recent. Paradigms for examining musical functions have been adopted from other
domains of neuroscience and also developed de novo. The relationship of music to other
cognitive domains, in particular language, has garnered considerable attention. This
chapter provides a survey of the experimental approaches taken, and emphasizes consis­
tencies across the various studies that help us understand musical functions within the
broader field of cognitive neuroscience.

Keywords: music, auditory environments, cognition, neuroscience

Introduction
Discoveries of ancient bone flutes illustrate that music has been a part of human society
for millennia (Adler, 2009). Although why the human brain enables musical
capacities is hotly debated—did it evolve like language, or is it an evolutionary epiphe­
nomenon?—the fact that human individuals and societies devote considerable time and
resources to incorporating music into their lives is indisputable.

Scientific study of the psychology and neuroscience of music is relatively recent. In terms
of human behaviors, music is usually viewed and judged in relation to language. Accord­
ingly, the organization of music in the brain has been viewed by many in terms of modular
organization, whereby music-specific processing modules are associated with specialized
neural substrates in much the same way that discrete brain areas (e.g. Broca’s and
Wernicke’s areas) have been associated traditionally with language functions (Peretz &
Coltheart, 2003). Indeed, neuropsychological case studies in which the loss of language
abilities can be dissociated from the loss of musical abilities, and vice versa, clearly sup­
port a modular view. To the extent that language has been treated as a domain of cogni­
tive neuroscience that is largely separate from other functional domains—such as percep­
tion, memory, attention, action, and emotion—music, too, has been regarded as a cogni­
tive novelty with a unique embodiment in the human brain. In other words, music has
been subjected to the same localizationist pressures that have historically pervaded cog­
nitive neuroscience. Neuroscientific inquiry into musical functions has therefore focused
most often on the auditory cortices situated along the superior surface of the temporal
lobes. The logic is simple: if music is an auditory phenomenon and the auditory cortex
(p. 112) processes auditory information, then music must reside in the auditory cortex.

Although it is true that we mainly listen to music, meaningful engagement with music is
much more than a perceptual phenomenon. The most obvious example is that of musical
performance, whereby the action systems of the brain must be engaged. However, as de­
scribed below in detail, underneath overt perception and action lie more subtle aspects of
musical engagement: attention, various forms of memory, and covert action such as men­
tal imagery. Finally, music plays on our emotions in various ways. Thus, the most parsimo­
nious view of how music coexists with the other complex behaviors that the human brain
supports is not one in which all of the cognitive functions on which music depends are lo­
calized to a specific area of the brain, but rather one in which musical experiences are in­
stantiated through the coupling of networks that serve domain general functions. Follow­
ing a brief description of key principles underlying the organization of acoustic informa­
tion into music, this chapter treats the neuroscience of each of those functions in turn.

Building Blocks of Music


In order to discuss the cognitive neuroscience of music, it is necessary to describe briefly
some of the basic dimensions on which music is organized. These broad dimensions en­
compass time, tonality, and timbre. Music from different cultures utilizes these dimen­
sions to varying degrees, and in general it is the variability in the ways that these dimen­
sions are utilized that gives rise to concepts of musical styles and genres that vary within
and across cultures. Here I restrict my discussion to Western tonal music.

Time

The patterning of acoustic events in time is a crucial element of music that gives rise to
rhythm and meter, properties that are essential in determining how we move along with
music. We often identify “the beat” in music by tapping our feet or bobbing our heads
with a regular periodicity that seems to fit best with the music, and this capacity for en­
trainment has strong social implications in terms of coordinated action and shared experi­
ence (Pressing, 2002; Wiltermuth & Heath, 2009). Importantly, meter and rhythm provide
a temporal scaffold that guides our expectations for when events will occur (Jones, 1976;
London, 2004). This scaffold might be thought of in terms of coupled oscillators that fo­
cus attention at particular moments in time (Large & Jones, 1999; Large & Palmer, 2002)

and thereby influence our perception of other musical attributes such as pitch (Barnes &
Jones, 2000; Jones et al., 2002).

Our perception of metric structure, that aspect of music that distinguishes a waltz from a
march, is associated with hierarchies of expectations for when events will occur (Jones &
Boltz, 1989; Palmer & Krumhansl, 1990). Some temporal locations within a metric struc­
ture are more likely to be associated with events (“strong beats”), whereas others are less
likely (“weak beats”). The manipulation of when events actually occur relative to weak
and strong beat locations is associated with syncopation, a salient feature of many
rhythms, which in turn characterize musical styles and shape our tendency to move along
with music (Pressing, 2002).
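The idea of a metric hierarchy of strong and weak beats, and of syncopation as emphasis falling on weak positions, can be sketched computationally. The weights and scoring rule below are illustrative assumptions (loosely in the spirit of the Longuet-Higgins and Lee syncopation measure), not a model described in this chapter:

```python
# Metric weights for the sixteen sixteenth-note positions of one 4/4 bar:
# the downbeat is strongest, then beat 3, then beats 2 and 4, then offbeats.
# These weights are an illustrative assumption.
WEIGHTS = [4, 0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0]

def syncopation(onsets):
    """Toy syncopation score: an onset on a weak position followed (cyclically)
    by silence on the next stronger position adds the weight difference."""
    score, n = 0, len(onsets)
    for i, hit in enumerate(onsets):
        if not hit or WEIGHTS[i] == max(WEIGHTS):
            continue  # silent positions and the downbeat cannot syncopate here
        j = (i + 1) % n
        while WEIGHTS[j] <= WEIGHTS[i]:  # find the next stronger position
            j = (j + 1) % n
        if not onsets[j]:
            score += WEIGHTS[j] - WEIGHTS[i]
    return score

on_beats = [1 if i % 4 == 0 else 0 for i in range(16)]   # a plain march
off_beats = [1 if i % 4 == 2 else 0 for i in range(16)]  # everything offbeat
assert syncopation(on_beats) == 0
assert syncopation(off_beats) > syncopation(on_beats)
```

Evenly placed on-beat onsets score zero, whereas a rhythm confined to offbeat positions accumulates a positive score, mirroring the intuition that syncopation arises when events avoid strong-beat locations.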

Tonality

Tonality refers to notes (pitches) and the relationships among notes. Even though there
are many notes that differ in their fundamental frequency (e.g., the frequency associated
with each key on a piano keyboard), tonality in Western tonal music is based on twelve
pitch classes, with each pitch class associated with a label such as C or C-sharp (Figure
7.1). The organization into twelve pitch classes arises because (1) two sounds that are
separated by an octave (a doubling in frequency) are perceptually similar and are given
the same note name (e.g., F), and (2) an octave is divided into twelve equal (logarithmic)
frequency steps called semitones. When we talk about melody we refer to sequences of
notes, and when we talk about harmony, we refer to two or more notes that sound at the
same time to produce an interval or chord. Melodies are defined in large part by their
contours—the pattern of ups and downs of the notes—such that changes in contour are
easier to detect than contour-preserving shifts of isolated notes when the entire melody is
transposed to a different key (Dowling, 1978).
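The arithmetic behind these definitions (twelve equal logarithmic steps per octave, with octave-related notes sharing a pitch class) can be sketched in a few lines of Python; the 440 Hz reference for A4 and the MIDI note-number convention are standard tuning assumptions rather than details given in the text:

```python
# Equal temperament: each semitone multiplies frequency by 2**(1/12),
# so 12 semitones give exactly one octave (a frequency doubling).
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F",
                 "F#", "G", "G#", "A", "A#", "B"]

def frequency(midi_note, a4=440.0):
    """Frequency in Hz of a MIDI note number (A4 = 69) in equal temperament."""
    return a4 * 2.0 ** ((midi_note - 69) / 12.0)

def chroma(midi_note):
    """Pitch class (chroma) is the note name, ignoring octave (pitch height)."""
    return PITCH_CLASSES[midi_note % 12]

# Octave equivalence: F4 (65) and F5 (77) share a chroma, and F5 is double F4.
assert chroma(65) == chroma(77) == "F"
assert abs(frequency(77) / frequency(65) - 2.0) < 1e-9
```

Because each semitone multiplies frequency by the twelfth root of two, moving up twelve semitones doubles the frequency exactly; this is the octave equivalence that gives rise to the chroma circle.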

Figure 7.1 Tonal relationships. A, The relationship
between pitch height and pitch chroma (pitch class)
is obtained by placing successively higher pitches on
a spiral, such that one full turn of the spiral corre­
sponds to a frequency doubling. This arrangement
captures the phenomenon of octave equivalence.
When the spiral is viewed from above, the chroma
circle emerges. The chroma circle comprises the
twelve pitch classes (C, C#, D, etc.) that are used in
Western tonal music. B, The seven pitch classes
(notes) belonging to the C-major scale are shown in
musical notation. The notes on the staff are in ap­
proximate alignment with the notes along the pitch
spiral. The unfilled symbols above the notes labeled
C, D, and E illustrate the additional notes that would
be played in conjunction with each root note to form
a triad—a type of chord. Roman numerals are used to
designate the scale position of each note. The case of
the numeral indicates whether the triad formed with
that note as a root has a major (uppercase) or minor
(lowercase) quality. C, Although seven of the twelve
possible pitch classes belong to a key, they are not
equally representative of the key. This fact is embod­
ied in the concept of tonal hierarchies (key profiles),
which can be derived via psychological methods such
as goodness-of-fit ratings of probe tones correspond­
ing to each of the twelve possible pitch classes.
Shown in blue is the canonical Krumhansl & Kessler
key profile (Krumhansl, 1990). The generality of this
profile is illustrated by the red bars, which were ob­
tained from about 150 undergraduate students of
varying musical aptitude in the author’s psychology
of music class. Each student made a single judgment
about each probe tone. Very similar key profiles are
obtained when the distributions of notes in pieces of
music are generated, suggesting that tonal knowl­
edge embodies the statistics of the music we hear.
The seven notes belonging to the key (black note
names) are judged to fit better than the five notes
that don’t belong to the key (blue note names). The
number symbol (#) following a letter indicates a
pitch that is raised (sharp) by one semitone, whereas
a “b” following a letter indicates a note that is low­
ered (flat) by one semitone. D, The fact that each ma­
jor and minor key is associated with a different key
profile gives rise to the concept of distance between
the different keys. In music theory, the distances are
often represented by the circle of fifths for the major
(red) and minor (cyan) keys. The circle is so named
because working in a clockwise direction, the fifth
scale degree, which is the second most stable note in
a key (e.g., the note G in C-major), becomes the most
stable note (the tonic) of the next key (e.g., G-major).
The distances between major and minor keys (pitch
probability distributions) are represented most parsi­
moniously on the surface of a torus. The toroidal rep­
resentation is arrived at either by multidimensional
scaling of subjective distances between the keys or
by self-organizing neural networks that are served
music that moves through all twenty-four major and
minor keys as input. Each location on the torus rep­
resents a particular pitch probability distribution. Ac­
cordingly, a tally of the notes in a musical segment
can be projected onto the toroidal surface and rendered
in color to indicate the likelihood that the
piece of music is in a particular key at a particular
moment in time. Because the notes that make up a
piece’s melodies and harmonies change in time,
thereby creating variation in the momentary key pro­
files, the activation on the torus changes dynamically
in time.

Central to the way that tonality works are the concepts of pitch probability distributions
and the related notion of key (Krumhansl, 1990; Temperley, 2001, 2007). When we say
that a piece of music is in the key of G-major or g-minor, it means that certain notes, such
as G or D, will be perceived as fitting well into that key, whereas others, like G-sharp or C-
sharp, will sound out of place. Tallying up the notes of many different pieces written in
the key of G-major, we would find that G and D are the notes that occur most often,
whereas G-sharp and C-sharp would not be very frequent. That is, (p. 113) the key of G-
major is associated with a particular probability distribution across the twelve possible
pitch classes. The key of D-major is associated with a slightly different probability distrib­
ution, albeit one that is closely related to that of G-major in that the note (pitch class) D
figures prominently in both. However, the probability distribution for the key of C-sharp
major differs markedly from that of G-major. These probability distributions are often re­
ferred to as key profiles or tonal hierarchies, and they simultaneously reflect the statistics
of music as well as perceptual distances between individual notes and the keys (or tonal
contexts) that have been established in the minds of listeners (Krumhansl, 1990; Temper­
ley, 2007). Knowledge of the statistics underlying tonality is presumably acquired through
(p. 114) implicit statistical learning mechanisms (Tillmann et al., 2000), as evidenced by

the brain’s rapid adaptation to novel tonal systems (Loui et al., 2009b).
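A minimal sketch of how such key profiles can be put to work is the correlational key-finding approach associated with Krumhansl and Schmuckler: tally the pitch classes in a passage and correlate the tally against the profile of each candidate key. The profile values below are the commonly reproduced Krumhansl and Kessler major-key probe-tone ratings, and the example tally is invented for illustration:

```python
import statistics

# Krumhansl & Kessler probe-tone profile for a major key, listed from the
# tonic upward (values as commonly reproduced; treat them as an assumption).
MAJOR_PROFILE = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def correlate(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

def best_major_key(pc_counts):
    """Correlate a 12-bin pitch-class tally against all 12 rotations of the
    major-key profile and return the best-fitting tonic."""
    scores = {}
    for tonic in range(12):
        rotated = [MAJOR_PROFILE[(i - tonic) % 12] for i in range(12)]
        scores[NAMES[tonic]] = correlate(pc_counts, rotated)
    return max(scores, key=scores.get)

# A tally dominated by the notes of the C-major scale points to C.
counts = [10, 0, 6, 0, 7, 5, 0, 9, 0, 5, 0, 4]  # C, D, E, F, G, A, B emphasized
assert best_major_key(counts) == "C"
```

Rotating one profile around the chroma circle yields the probability distribution for each of the twelve major keys, which is why note tallies from real pieces tend to reproduce the profile of the key they are written in.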

Conveniently, the perceived, statistical, and music-theoretical distance relationships be­
tween keys can be represented geometrically on the surface of a ring (torus) such that
keys that have many of their notes in common are positioned close to each other on the
toroidal surface (Krumhansl & Kessler, 1982; Krumhansl, 1990). Each location on the
toroidal surface is associated with a probability distribution across the twelve pitch class­
es. If a melody or sequence of chords is played that contains notes in proportions that
correspond to the probability distribution for a particular location on the torus, then that
region of tonal space is considered activated or primed. If a chord is played whose con­
stituent notes belong to a very different probability distribution (i.e., a distantly situated
location on the torus corresponding to a distant key), the chord will sound jarring and out
of place. However, if several other chords are now played that are related to the chord
that was jarring, the perceived tonal center will shift to that other region of the torus.
Therefore, movement of music in tonal space is dynamic and dependent on the melodies
and chord progressions that are being played (Janata, 2007; Janata et al., 2002b; Toivi­
ainen, 2007; Toiviainen & Krumhansl, 2003).

Timbre

The third broad dimension of music is timbre. Timbre is what distinguishes instruments
from each other. Timbre is the spectrotemporal signature of an instrument: a description
of how the frequency content of the sound changes in time. If one considers the range of
musical sounds that are generated not just by physical instruments but also by electronic
synthesizers, human voices, and environmental sounds that are used for musical purpos­
es, the perceptual space underlying timbre is vast. Based on multidimensional scaling
analyses of similarity judgments between pairs of sounds, it has been possible to identify
candidate dimensions of timbre (Caclin et al., 2005; Grey, 1977; Lakatos, 2000; McAdams
et al., 1995). Most consistently identified across studies are dimensions corresponding to
attack and spectral centroid. Attack refers to the onset characteristics of the amplitude
envelope of the sound (e.g., percussive sounds have a rapid attack, whereas bowed
sounds have a slower attack). Centroid refers to the location along the frequency axis
where the peak of energy lies, and is commonly described as the “brightness” of a sound.
The use of acoustic features beyond attack and centroid for the purpose of judging the
similarities between pairs of sounds is much less consistent and appears to be context de­
pendent (Caclin et al., 2005). For (p. 115) example, variation in spectral fine structure,
such as the relative weighting of odd and even frequency components, or time-varying
spectral features (spectrotemporal flux) influence similarity judgments also. The fact that
timbre cannot be decomposed into a compact set of consistent dimensions that satisfacto­
rily explain the abundance of perceptual variation somewhat complicates the search for
the functional representation of timbre in the brain.
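The spectral centroid just described has a simple definition: the amplitude-weighted mean of the frequencies present in a sound. Below is a minimal sketch; the plain DFT implementation and the synthetic two-partial tone are illustrative choices, not anything specified in this chapter:

```python
import math

def spectral_centroid(signal, sample_rate):
    """Amplitude-weighted mean frequency of the magnitude spectrum, the
    acoustic correlate of perceived 'brightness' described in the text."""
    n = len(signal)
    total = weighted = 0.0
    for k in range(1, n // 2):  # positive-frequency bins of a plain DFT
        re = sum(signal[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(signal[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mag = math.hypot(re, im)
        weighted += (k * sample_rate / n) * mag
        total += mag
    return weighted / total

# Synthetic "sound": two equal-amplitude partials at 200 Hz and 800 Hz.
# Their energy is balanced, so the centroid falls midway, near 500 Hz.
sr, n = 4000, 400
tone = [math.sin(2 * math.pi * 200 * t / sr) +
        math.sin(2 * math.pi * 800 * t / sr) for t in range(n)]
assert abs(spectral_centroid(tone, sr) - 500.0) < 20.0
```

Shifting spectral energy upward (for instance, strengthening the 800 Hz partial) raises the centroid, which corresponds to a brighter-sounding timbre.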

Perception and Cognition


Tonality

Pitch and Melody


Some of the earliest attempts to understand the neural substrates for music processing
focused on the ability of patients, in whom varying amounts of the temporal lobes, and
thereby auditory cortical areas, had been removed, to detect alterations in short
melodies. A consistent result has been that right temporal lobe (RTL) damage impairs the
ability of individuals to detect a variety of changes in melodies, including the starkest
type of change in which the altered note violates both the contour of the melody and the
key, whereas left temporal lobe damage leads to no or little impairment (Dennis & Hopy­
an, 2001; Liegeois-Chauvel et al., 1998; Samson & Zatorre, 1988; Warrier & Zatorre,
2004; Zatorre, 1985). However, more difficult discriminations, in which the altered notes
preserve the contour and the key of the initial melody, suffer when either hemisphere is
damaged. Patients with RTL damage have difficulty judging whether one pitch is higher
or lower than the next, a critical ability for determining both the contour and the specific
intervals (distances between notes) that define a melody, even though their basic ability
to discriminate whether the notes are the same or different remains intact (Johnsrude et
al., 2000). Whereas having the context of a familiar melody generally facilitates the abili­
ty to detect when a final note is mistuned (Warrier & Zatorre, 2002), RTL damage re­
duces the amount of that facilitation (Warrier & Zatorre, 2004), indicating that the pro­
cessing of pitch relationships in the temporal lobes also affects more basic perceptual
processes such as intonation judgments. Interestingly, the melody processing functions of
the RTL appear to depend in large part on areas that are anterior to the primary auditory
cortex, which is situated on Heschl’s gyrus (HG; Johnsrude et al., 2000; Samson & Za­
torre, 1988).

The results from the patient studies are supported by a number of functional magnetic
resonance imaging (fMRI) studies of pitch and melody processing. A hierarchy of pitch
processing is observed in the auditory cortex following a medial to lateral progression.
Broadband noise or sounds that have no clear pitch activate medial HG, sounds with dis­
tinct pitch produce more activation in the lateral half of HG, and sequences in which the
pitch varies, in either tonal or random melodies, generate activity that extends rostrally
from HG along the superior temporal gyrus (STG) toward the planum polare, biased to­
ward the right hemisphere (Patterson et al., 2002).

One of the critical features of pitch in music is the distinction between pitch height and
pitch chroma (pitch class). The chroma of a pitch is referred to by its note name (e.g. D,
D#). Critically, chroma represent perceptual constancy that allows notes played in differ­
ent octaves to be identified as the same note. These separable aspects of pitch appear to
have partially distinct neural substrates, with preferential processing of pitch height pos­
terior to HG in the planum temporale, and processing of chroma anterolateral to HG in
the planum polare (Warren et al., 2003), consistent with the proposed role of anterior
STG regions in melody processing (Griffiths et al., 1998; Patterson et al., 2002;
Schmithorst & Holland, 2003). The neural representations of individual pitches in melod­
ic sequences are also influenced by the statistical structure of the sequences (Patel & Bal­
aban, 2000, 2004). In these experiments, the neural representations were quantified by
examining the quality of the coupling between the amplitude modulations of the tones
used to create the sequence and the amplitude modulations in the response recorded
above the scalp using magnetoencephalography (MEG). Random sequences elicited little
coupling, whereas highly predictable musical scales elicited the strongest coupling. These
effects were strongest above temporal and lateral prefrontal sensor sites, consistent with
a hypothesis that a frontotemporal circuit supports the processing of melodic structure.

Detecting Wrong Notes and Wrong Chords


The representation and processing of tonal information has been the aspect of music that
has been most extensively studied using cognitive neuroscience methods. Mirroring the
approach taken in much of the rest of cognitive neuroscience, “expectancy violation” par­
adigms have been the primary approach to establishing the existence of a cognitive
schema through which we assimilate pitch information. In other words, how does the
brain respond (p. 116) when a target event, usually the terminal note of a melody or chord
of a harmonic progression, is unexpected given the preceding musical context?

When scales or melodies end in notes that do not belong to the established key, large pos­
itive deflections are evident in event-related potentials (ERPs) recorded at posterior scalp
sites, indicating the activation of congruency monitoring and context-updating processes
indexed by the P300 or late-positive complex (LPC) components of ERP waveforms
(Besson & Faïta, 1995; Besson & Macar, 1987; Paller et al., 1992). These effects are ac­
centuated in subjects with musical training and when the melodies are familiar (Besson &
Faïta, 1995; Miranda & Ullman, 2007). Similarly, short sequences of chords that termi­
nate with a chord that is unexpected given the tonal context established by the preceding
chords elicit P300 and LPC components (Beisteiner et al., 1999; Carrion & Bly, 2008;
Janata, 1995; Koelsch et al., 2000; Patel et al., 1998), even in natural musical contexts
(Koelsch & Mulder, 2002). As would be expected given the sensitivity of the P300 to glob­
al event probability (Tueting et al., 1970), the magnitude of the posterior positive re­
sponses increases as the starkness of the harmonic violation increases (Janata, 1995; Pa­
tel et al., 1998).

The appearance of the late posterior positivities depends on overt processing of the tar­
get chords by making either a detection or categorization judgment. When explicit judg­
ments about target chords are eliminated, the most prominent deviance-related response
is distributed frontally, and typically manifests as an attenuated positivity approximately
200 ms after the onset of a deviant chord. This relative negativity in response to contextu­
ally irregular chords was termed an early right anterior negativity (ERAN; Koelsch et al.,
2000), although in many subsequent studies, it was found to be distributed bilaterally.

The ERAN has been studied extensively and is interesting for two principal reasons. First,
the ERAN and the right anterior negativity (RATN; Patel et al., 1998) have been interpret­
ed as markers of syntactic processing in music, paralleling the left anterior negativities
associated with the processing of syntactic deviants in language (Koelsch et al., 2000; Pa­
tel et al., 1998). Localization of the ERAN to Broca’s area using MEG supports such an in­
terpretation (Maess et al., 2001). (The parallels between syntactic processing in music
and language are discussed in a later subsection.) Second, the ERAN is regarded as an in­
dex of automatic harmonic syntax processing in that it is elicited even when the irregular
chords themselves are not task relevant (Koelsch et al., 2000; 2002b). Whether the ERAN
is attenuated when attention is oriented away from musical material is a matter of some
debate (Koelsch et al., 2002b; Loui et al., 2005).

The ERAN is a robust index of harmonic expectancy violation processing, and it is sensi­
tive to musical training. It is found in children (Koelsch et al., 2003b), and it increases in
amplitude with musical training in both adults (Koelsch et al., 2002a) and children
(Jentschke & Koelsch, 2009).

The amplitude of the ERAN is also sensitive to the probability with which a particular
chord occurs at a particular location in a sequence. For example, the ERAN to the same
irregular chord function, such as a Neapolitan sixth, is weaker when that chord occurs at
a sequence location that is more plausible from a harmonic syntax perspective (Leino et
al., 2007). Similarly, using Bach chorales, chords that are part of the original score, but
not the most expected from a music-theoretical point of view, elicit an ERAN, in compari­
son to more expected chords that have been substituted in, but a much weaker ERAN
than highly unexpected Neapolitan sixth chords inserted into the same location (Steinbeis
et al., 2006).

The automaticity of the ERAN naturally leads to comparisons with the mismatch negativi­
ty (MMN), the classic marker of preattentive detection of deviant items in auditory
streams (Näätänen, 1992; Näätänen & Winkler, 1999). Given that irregular chords might
be regarded as violations of an abstract context established by a sequence of chords, the
ERAN could just be a form of “abstract MMN.” The ERAN and MMN occur within a simi­
lar latency range, and their frontocentral distributions often make them difficult to distin­
guish from one another based on their scalp topographies (e.g., Koelsch et al., 2001; Leino
et al., 2007). Nonetheless, the ERAN and MMN are dissociable (Koelsch, 2009). For ex­
ample, an MMN elicited by an acoustically aberrant stimulus, such as a mistuned chord
or a change in the frequency of a note (frequency MMN), does not show sensitivity to lo­
cation within a harmonic context (Koelsch et al., 2001; Leino et al., 2007). Moreover, if
the sensory properties of the chord sequences are carefully controlled in terms of repeti­
tion priming for specific notes or the relative roughness of target chords and those in the
preceding context, an ERAN is elicited by harmonically incongruent chords even if the
harmonically incongruent chords are more similar (p. 117) in their sensory characteristics
to the penultimate chords than are the harmonically congruent chords (Koelsch et al.,
2007).

A number of fMRI studies have contributed to the view that musical syntax is evaluated in
the ventrolateral prefrontal cortex (VLPFC), in a region comprising the ventral aspect of
the inferior frontal gyrus (IFG), frontal operculum, and anterior insula. The evaluation of
target chords in a harmonic priming task results in bilateral activation of this region, and
is greater for harmonically unrelated targets than harmonically related targets (Koelsch
et al., 2005b; Tillmann et al., 2003). Similarly, chord sequences that contain modulations
—strong shifts in the tonal center toward another key—activate this region in the right
hemisphere (Koelsch et al., 2002c). Further evidence that the VLPFC is sensitive to con­
textual coherence comes from a paradigm in which subjects listened to a variety of 23-
second excerpts of familiar and unfamiliar classical music. Each excerpt was rendered in­
coherent by being chopped up into 250- to 350-ms segments and then reconstituted with
random arrangement of the segments. Bilaterally, the inferior frontal cortex and adjoining
insula responded more strongly to the normal music, compared with reordered music
(Levitin & Menon, 2003).

Tonal Dynamics
Despite their considerable appeal from an experimental standpoint, trial-based expectan­
cy violation paradigms are limited in their utility in investigating the brain dynamics that
accompany listening to extended passages of music in which the manipulation of ex­
pectancies is typically more nuanced and ongoing. When examined more closely, chord
sequences such as those used in the experiments described above do more than establish
a particular static tonal context. They actually create patterns of movement within tonal
space—the system of major and minor keys that can be represented on a torus. The de­
tails of the trajectories depend on the specific chords and the sequence in which they oc­
cur (Janata, 2007). Different pieces of music will create different patterns of movement
through tonal space, depending on the notes in the melodies and harmonic accompani­
ments. The time-varying pattern on the toroidal surface can be quantified for any piece of
music, and this quantification can then be used to probe the time-varying structure of the
fMRI activity recorded while a person listens to the music. This procedure identifies brain
areas that are sensitive to the movements of the music through tonal space (Janata, 2005,
2009; Janata et al., 2002b).
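The logic of this tonality-tracking procedure can be sketched as a simple regressor correlation: derive a time series from the music's movement through tonal space, then correlate it with each voxel's time series. The sketch below is a bare Pearson-correlation version on synthetic data; the function name, and the omission of hemodynamic convolution and proper GLM machinery, are simplifying assumptions rather than the published method.

```python
import numpy as np

def tonality_tracking_map(regressor, voxels):
    """Pearson correlation between a stimulus-derived tonal-trajectory
    regressor (n_timepoints,) and every voxel time series
    (n_timepoints, n_voxels); returns one r value per voxel."""
    x = regressor - regressor.mean()
    y = voxels - voxels.mean(axis=0)
    return (y * x[:, None]).sum(axis=0) / (
        np.linalg.norm(x) * np.linalg.norm(y, axis=0))
```

In an actual analysis the regressor would first be convolved with a hemodynamic response function and entered into a general linear model alongside nuisance regressors.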

The “tonality-tracking” approach has suggested a role of the medial prefrontal cortex
(MPFC) in the maintenance of tonal contexts and the integration of tonal contexts with
music-evoked autobiographical memories (Janata, 2009). When individuals underwent fM­
RI scans while listening attentively to an arpeggiated melody that systematically moved
(modulated) through all twenty-four major and minor keys over the course of 8 minutes
(Janata et al., 2003), the MPFC was the one brain region that was consistently active
across three to four repeated sessions within listeners and across listeners, even though
consistent tonality tracking responses were observed at the level of individuals in several
brain areas (Janata, 2005; Janata et al., 2002b). ERP studies provide converging evidence
for a context maintenance interpretation in that a midline negativity with a very frontal
focus characterizes both the N5 component, a late negative peak that has been interpret­
ed to reflect contextual integration of harmonically incongruous material (Koelsch et al.,
2000; Loui et al., 2009b), and a sustained negative shift in response to modulating se­
quences (Koelsch et al., 2003a). As discussed below, tonality tracking in the MPFC is ger­
mane to understanding how music interacts with functions underlying a person’s sense of
self because the MPFC is known to support such functions (Gilbert et al., 2006; Northoff
& Bermpohl, 2004; Northoff et al., 2006).

Rhythm and Meter

As in the case of melody perception, the question arises to what extent auditory cortical
areas in the temporal lobe are engaged in the processing of musical rhythmic patterns.
Studies of patients in whom varying amounts of either the left or right anterior temporal
lobes have been removed indicate that the greatest impairment in reproducing rhythmic
patterns is found in patients with excisions that encroach on secondary auditory areas in
HG in the right hemisphere (Penhune et al., 1999). The deficits are observed when exact
durational patterns are to be reproduced, but not when the patterns can be encoded cate­
gorically as sequences of long and short intervals.

Given the integral relationship between timing and movement, and the propensity of hu­
mans to move along with the beat in the music, neuroimaging experiments of rhythm and
meter perception have examined the degree to which motor systems (p. 118) of the brain
are engaged alongside auditory areas during passive listening to rhythms or attentive lis­
tening while performing a secondary discrimination task (Grahn & Rowe, 2009), listening
with the intent to subsequently synchronize with or reproduce the rhythm (Chen et al., 2008a), or listening with the intent to make a same/different comparison with a target
rhythm (Grahn & Brett, 2007). Discrimination of metrically simple, complex, and non­
metric rhythms recruits, bilaterally, the auditory cortex, cerebellum, IFG, and a set of pre­
motor areas including the basal ganglia (putamen), pre–supplementary motor area (pS­
MA) or supplementary motor area (SMA), and dorsal premotor cortex (PMC) (Grahn &
Brett, 2007). The putamen has been found to respond more strongly to simple rhythms
than complex rhythms, suggesting that its activation is central to the experienced
salience of a beat. A subsequent study found stronger activation throughout the basal
ganglia in response to beat versus nonbeat rhythms, along with greater coupling of the
putamen with the auditory cortex and medial and lateral premotor areas in the beat con­
ditions (Grahn & Rowe, 2009). Interestingly, the response of the putamen increased as ex­
ternal accenting cues weakened, suggesting that activity within the putamen is also
shaped by the degree to which listeners generate a subjective beat.

Activity in premotor areas and the cerebellum is differentiated by the degree of engage­
ment with a rhythm (Chen et al., 2008a). The SMA and mid-PMC are active during pas­
sive listening to rhythms of varying complexity. Listening to a rhythm with the require­
ment to subsequently synchronize with that rhythm recruits these regions along with ven­
tral premotor and inferior frontal areas. These regions are then also active when subjects
subsequently synchronize their taps with the rhythm. Similar results are obtained in re­
sponse to short 3-second piano melodies: lateral premotor areas are activated both dur­
ing listening and during execution of arbitrary key press sequences without auditory
feedback (Bangert et al., 2006). Converging evidence for the recruitment of premotor ar­
eas during listening to rhythmic structure in music has been obtained through analyses of
MEG data in which a measure related to the amplitude envelope of the auditory stimulus
is correlated with the same measure applied to the MEG data (Popescu et al., 2004). One
study of attentive listening to polyphonic music also found increased activation of premo­
tor areas (pSMA, mid-PMC), although the study did not seek to associate these activa­
tions directly with the rhythmic structure in the music (Janata et al., 2002a).
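The envelope-correlation measure used in such MEG analyses can be approximated very simply: extract an amplitude envelope from both the stimulus and the recorded signal, then correlate the two. The sketch below substitutes rectification plus a moving average for a Hilbert-transform envelope, and all signal parameters are illustrative assumptions.

```python
import numpy as np

def amplitude_envelope(signal, win=50):
    """Crude amplitude envelope: rectify the signal, then smooth with a
    moving average (a simple stand-in for a Hilbert envelope)."""
    kernel = np.ones(win) / win
    return np.convolve(np.abs(signal), kernel, mode="same")

def envelope_coupling(stimulus, recording, win=50):
    """Pearson correlation between the stimulus envelope and the
    envelope of one recorded channel."""
    a = amplitude_envelope(stimulus, win)
    b = amplitude_envelope(recording, win)
    a, b = a - a.mean(), b - b.mean()
    return float((a @ b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

A channel whose envelope rises and falls with the music yields a high coupling value even when its carrier frequency differs from the stimulus.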

Timbre

Given the strong dependence of music on variation in timbre (instrument sounds), it is surprising that relatively few studies have addressed the representation of timbre in the
brain. Similarity judgments of pairs of heard or imagined orchestral instrument sounds
drive activity in auditory cortical areas along the posterior half of the STG, around HG
and within the planum temporale (Halpern et al., 2004). When activation in response to
more complex timbres (sounds consisting of more spectral components—harmonics—and
greater temporal variation in those harmonics) is compared with simpler timbres or pure
tones, regions of the STG surrounding primary auditory areas stand out as more active
(Menon et al., 2002; Meyer et al., 2006). Processing of attack and spectral centroid cues
is more impaired in individuals with right temporal lobe resections than in those with left
temporal lobe resections or in normal controls (Samson & Zatorre, 1994).

A number of studies have made explicit use of the multidimensional scaling approach to
examine the organization of timbral dimensions in the brain. For example, when based on
similarity judgments of isolated tones varying in attack or spectral centroid, the perceptu­
al space of patients with resections of the right temporal lobe is considerably more dis­
torted than is that of normal controls or individuals with left temporal lobe resections
(Samson et al., 2002). These impairments are ameliorated to a great extent, but not en­
tirely, when the timbral similarity of eight-note melodies is judged (Samson et al., 2002),
although the extent to which melodic perception as opposed to simple timbral reinforce­
ment drives this effect is unclear.
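The multidimensional scaling step itself is standard: a matrix of pairwise dissimilarity judgments is double-centered and eigendecomposed to recover coordinates whose distances reproduce the judgments. A minimal classical (Torgerson) MDS sketch, with no claim to match the specific algorithm used in the cited studies:

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: embed n items in k dimensions from a
    symmetric n x n dissimilarity matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # double-centered Gram matrix
    w, V = np.linalg.eigh(B)              # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:k]         # keep the k largest
    L = np.sqrt(np.clip(w[idx], 0, None))
    return V[:, idx] * L
```

Distortion of a patient group's perceptual space can then be assessed by comparing the recovered configuration against that of controls.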

Evidence that timbral variation is assessed at relatively early stages of auditory cortical processing comes from observations that the MMN elicited by changes in the emotional connotation of a tone played by a violin is similar to the MMN elicited by a change in timbre from violin to flute or by changes in pitch (Goydke et al., 2004). Moreover, MMN responses are largely additive
when infrequent ignored deviant sounds deviate from standard ignored sounds on multi­
ple timbre dimensions simultaneously, suggesting that timbral dimensions are processed
within separate sensory memory channels (Caclin et al., 2006). Also, within the represen­
tation of the spectral centroid dimension, the (p. 119) magnitude of the MMN varies lin­
early with perceptual and featural similarity (Toiviainen et al., 1998).

Although timbre can be considered in terms of underlying feature dimensions, musical sounds nonetheless have a holistic object-like quality to them. Indeed, the perceptual pro­
cessing of timbre dimensions is not entirely independent (Caclin et al., 2007). The inter­
activity of timbral dimensions becomes evident when timbral categorization judgments
are required and manifest themselves mainly in the amplitude of later decision-related
components such as the P300 (Caclin et al., 2008). An understanding of how timbral ob­
jects are represented and behave within broader brain networks in a task-dependent
manner, such as when musical pieces are recognized based on very brief timbral cues
(Schellenberg et al., 1999), or emotional distinctions are made (Goydke et al., 2004; Bi­
gand et al., 2005), remains to be elaborated.

Attention

Most of the research described to this point was aimed at understanding the representa­
tion of basic musical features and dimensions, and the processing of change along those
dimensions, without much regard for the broader and perhaps more domain-general psy­
chological processes that are engaged by music.

Following from the fact that expectancy violation paradigms have been a staple of cogni­
tive neuroscience research on music, considerable information has been collected about
attentional capture by deviant musical events. Working from a literature based primarily
on visual attention, Corbetta and Shulman (2002) proposed a distinction between a dorsal
and a ventral attentional system, whereby the ventral attentional system is engaged by
novel or unexpected sensory input while the dorsal attentional system is active during en­
dogenously guided expectations, such as the orientation of attention to a particular spatial location. Overall, the orienting and engagement of attention in musical tasks recruits
these attention systems. Monitoring for target musical events and the targets themselves
cause activity increases in the ventral system—in the VLPFC in the region of the frontal
operculum where the IFG meets the anterior insula (Janata et al., 2002a; Koelsch et al.,
2002c; Tillmann et al., 2003). The strongest activation arises when the targets violate har­
monic expectations (Maess et al., 2001; Tillmann et al., 2003).

Structural markers in music, such as the boundaries of phrases or the breaks between
movements, also cause the ventral and dorsal attentional systems to become engaged
(Nan et al., 2008; Sridharan et al., 2007), with the ventral system leading the dorsal sys­
tem (Sridharan et al., 2007). Attentive listening to excerpts of polyphonic music engages
both systems even in the absence of specific targets or boundary markers (Janata et al.,
2002a; Satoh et al., 2001). Interestingly, the ventral attentional system is engaged, bilat­
erally, when (1) an attentive listening task requires target detection in either selective or
divided attention conditions, or (2) selective listening is required without target detec­
tion, but not during divided/global listening without target detection. When task demands
are shifted from target detection to focusing attention on an instrument as though one
were trying to memorize the part the instrument is playing, working memory areas in the
dorsolateral prefrontal cortex (DLPFC) are recruited bilaterally (Janata et al., 2002a).

Given the integral relationship between attention and timing (Jones, 1976; Large & Jones,
1999), elements of the brain’s attention and action systems interact when attention is fo­
cused explicitly on timing judgments, and this interaction is modulated by individual dif­
ferences in listening style. For example, while frontoparietal attention areas, together
with auditory and premotor areas, are engaged overall, greater activation is observed
within the ventral attentional system in those individuals who tend to orient their atten­
tion toward a longer, rather than a subdivided, beat period (Grahn & McAuley, 2009).

Memory

Music depends on many forms of memory that aid in its perception and production. For
example, we form an implicit understanding of tonal structure that allows us to form ex­
pectations and detect violations of those expectations. We also store knowledge about
musical styles in terms of the harmonic progressions, timbres, and orchestration that
characterize them. Beyond the memory for structural aspects of music are memories for
specific pieces of music or autobiographical memories that may be inextricably linked to
those pieces of music. We also depend on working memory to play music in our minds, ei­
ther when imagining a familiar song that we have retrieved from long-term memory or
when repeating a jingle from a commercial that we just heard. Because linguistic materi­
al is often an integral part of music (i.e., the lyrics in songs), the set of memory processes
that needs to be considered in association with music necessarily extends to include those
associated with language. Two questions that arise (p. 120) are, How do the different
memory systems interact, and how might they be characterized in terms of their overlap
with memory systems identified using different tasks and sensory modalities?

Working Memory
An early neuroimaging study of musical processes used short eight-note melodies and
found that parietal and lateral prefrontal areas were recruited when subjects had to com­
pare the pitch of the first and last notes (Zatorre et al., 1994). This result suggested that
there is an overlap of musical working memory with more general working memory sys­
tems. Direct evidence that verbal working memory and tonal working memory share the
same neural substrates—auditory, lateral prefrontal, and parietal cortices—was obtained
in two studies in which verbal and tonal material was presented and rehearsed in sepa­
rate trials (Hickok et al., 2003) or in which the stimuli were identical but the task instruc­
tions emphasized encoding and rehearsal of either the verbal or tonal material (Koelsch
et al., 2009). Further studies relevant to musical working memory are discussed below in
the section on musical imagery.

Episodic Memory
If we have a melody running through our minds, that is, if we are maintaining a melody in
working memory, it is likely the consequence of having heard and memorized the melody
at some point in the past. The melody need not even be one that we heard repeatedly dur­
ing our childhood (remote episodic memory), but could be one that we heard for the first
time earlier in the experimental session (recent episodic memory). Neuroimaging experi­
ments have examined both types of episodic memory.

During an incidental encoding phase of an episodic memory experiment, familiarity judgments about 5-second excerpts of melodies (half of which were familiar) with no associat­
ed lyrics resulted in activation of medial prefrontal and anterior temporal lobe regions
(Platel et al., 2003). The same pattern of activations was observed when a familiarity
judgment task about nursery tunes was contrasted against a change detection judgment
task using those same melodies (Satoh et al., 2006). Familiarity judgments based on CD
recordings (as opposed to synthesized melodies) of instrumental music without lyrics or
voice were found to increase activation within the MPFC, but not the anterior temporal
lobes (Plailly et al., 2007). Similarly, both increased familiarity and autobiographical
salience of popular music excerpts that did contain lyrics resulted in stronger activation
of the dorsal medial prefrontal cortex, but not the anterior temporal lobes. In addition to
showing a stronger response to familiar and memory-evoking music, the MPFC tracked
the trajectories of musical excerpts in tonal space, supporting a hypothesis that tonal con­
texts are integrated with self-relevant information within this region (Janata, 2009).

One possible reason for the discrepant findings in the anterior temporal lobes is the choice of neuroimaging technique, in that the positron emission tomography studies (Platel et al., 2003; Satoh et al., 2006) were not as susceptible to signal loss in those regions as were the fMRI experiments (Janata, 2009; Plailly et al., 2007). Another is the use of complex
recorded musical material compared with monophonic melodies. Familiarity judgments
about monophonic melodies must be based solely on the pitch and temporal cues of a sin­
gle melodic line, whereas recordings of orchestral or popular music contain a multitude
of timbral, melodic, harmonic, and rhythmic cues that can facilitate familiarity judgments.

Recruitment of the anterior temporal lobes is consistent with the neuropsychological evi­
dence of melody processing and recognition deficits following damage to those areas (Ay­
otte et al., 2000; Peretz, 1996). Indeed, when engaged in a recognition memory test in
which patients were first presented with twenty-four unfamiliar folk tune fragments, and
then made old/new judgments, those with right temporal lobe excisions were mainly im­
paired on tune recognition, whereas left temporal lobe excisions resulted in impaired
recognition of the words (Samson & Zatorre, 1991). When tunes and words were com­
bined, new words paired with old tunes led to impaired tune recognition in both patient
groups, suggesting some sort of text–tune integration process involving the temporal
lobes of both hemispheres.

In contrast to making judgments about or experiencing the long-term familiarity of musical materials, making judgments about whether a melody (either familiar or unfamiliar)
was recently heard is more comparable with typical laboratory episodic memory tasks in
which lists of items are memorized. The results from the small number of studies that
have examined brain activations during old/new judgments for musical material consis­
tently indicate that different brain areas than those described above are recruited. More­
over, the reported activation patterns are quite heterogeneous, including the right hip­
pocampus (Watanabe et al., 2008), and a large number of prefrontal, parietal, and lateral
temporal loci distributed bilaterally (Klostermann et al., 2009; (p. 121) Platel et al., 2003;
Watanabe et al., 2008). Most consistent among these findings is the recruitment of the lateral anterior prefrontal cortex along the middle frontal gyrus (Brodmann area 10) and of assorted locations in the precuneus.

Absolute Pitch
The rare ability to accurately generate the note name for a tone played in isolation with­
out an external referent is popularly revered as a highly refined musical ability. Neu­
roimaging experiments indicate that regions of the left DLPFC that are associated with
working memory functions become more active in absolute pitch possessors than in musically trained individuals without absolute pitch when they passively listen to, or make melodic interval categorization judgments about, pairs of tones (Zatorre et al., 1998). The hypothe­
sis that these regions become more active because of the process of associating a note
with a label is supported by the observation that nonmusician subjects who are trained to
associate chords with arbitrary numbers show activity within this region during that task
following training (Bermudez & Zatorre, 2005). Although the classic absolute pitch ability
as defined above is rare, the ability to distinguish above chance whether a recording of
popular music has been transposed by one or two semitones is common (Schellenberg &
Trehub, 2003), although not understood at a neuroscientific level.

Parallels Between Music and Language


Syntax

As noted in the section on the processing of harmonic/tonal structure in music, harmonic incongruities appear to engage brain regions and processes that are similar to those in­
volved in the processing of syntactic relations in language. These observations suggest
that music and language may share neural resources that are more generally devoted to
syntactic processing (Fedorenko et al., 2009; Patel, 2003). If there is a shared resource,
then processing costs or altered ERP signatures should be observed when a person is at­
tending to and making judgments about one domain and syntactic violations occur in the
unattended domain. Indeed, when chords are presented synchronously with words, and
subjects have to make syntactic or semantic congruity judgments regarding the final
word of each sentence, the amplitude of the left anterior negativity (LAN), a marker of
linguistic syntax processing, is reduced in response to a syntactically incongruous word
when it is accompanied by a harmonically irregular chord compared with when it is ac­
companied by a harmonically regular chord (Koelsch et al., 2005a). Conversely, the ERAN
is reduced in amplitude when an incongruous chord is accompanied by a syntactically in­
congruous word (Steinbeis & Koelsch, 2008). Interestingly, there is no effect of harmonic
(in)congruity on the N400, a marker of semantic incongruity, when words are being
judged (Koelsch et al., 2005a). Nor is there an effect of physical auditory incongruities
that give rise to an MMN response when sequences of tones, rather than chords, accom­
pany the words. The latter result further supports the separability of processes underly­
ing the ERAN and MMN.

Semantics

The question of whether music conveys meaning is another interesting point of compari­
son between music and language. Music lacks sufficient specificity to unambiguously con­
vey relationships between objects and concepts, but it can be evocative of scenes and
emotions (e.g., Beethoven’s Pastoral Symphony). Even without direct reference to lan­
guage, music specifies relationships among successive tonal elements (e.g., notes and
chords), by virtue of the probability structures that govern music’s movement in tonal
space. Less expected transitions create a sense of tension, whereas expected transitions
release tension (Lerdahl & Krumhansl, 2007). Similarly, manipulations of timing (e.g.,
tempo, rhythm, and phrasing) parallel concepts associated with movement.
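The link between transition probability and tension can be made concrete with a toy first-order model of chord transitions, where a less probable transition carries higher information-theoretic surprisal. The probabilities and chord labels below are invented for illustration, not drawn from any corpus or from the studies cited.

```python
import math

# Toy first-order transition probabilities between chord functions.
# Values are illustrative only; "bII6" stands in for a rare,
# Neapolitan-like continuation.
P = {
    "I": {"IV": 0.35, "V": 0.40, "vi": 0.15, "bII6": 0.001, "I": 0.099},
    "V": {"I": 0.70, "vi": 0.20, "IV": 0.05, "bII6": 0.001, "V": 0.049},
}

def surprisal(prev, nxt):
    """Surprisal of a chord transition, in bits: -log2 p(next | current).
    High values correspond to unexpected, tension-inducing transitions."""
    return -math.log2(P[prev][nxt])
```

Under this toy model the expected V-to-I resolution carries well under 1 bit of surprisal, whereas the rare Neapolitan-like continuation carries roughly 10 bits, mirroring the tension/release asymmetry described above.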

Evidence of music’s ability to interact with the brain’s semantic systems comes from two
elegant studies that make use of simultaneously presenting musical and linguistic materi­
al. The first (Koelsch et al., 2004) used short passages of real music to prime semantic
contexts and then examined the ERP response to probe words that were either semanti­
cally congruous or incongruous with the concept ostensibly primed by the music. The in­
congruous words elicited an N400, indicating that the musical passage had indeed
primed a meaningful concept as intended. The second (Steinbeis & Koelsch, 2008) used
simultaneously presented words and chords, along with a moderately demanding dual
task in which attention had to be oriented to both the musical and linguistic information,
and found that semantically incongruous words affected the processing of irregular
(Neapolitan) chords. The semantic incongruities did not affect the ERAN, but rather af­
fected the N500, a late frontal negativity that follows the ERAN in response to incongru­
ous chords and is interpreted as a stage of contextual (p. 122) integration of the anom­
alous chord (Koelsch et al., 2000).

Action
Although technological advances over the past few decades have made it possible to se­
quence sounds and produce music without the need for a human to actively generate
each sound, music has been and continues to be intimately dependent on the motor sys­
tems of the human brain. Musical performance is the obvious context in which the motor
systems are engaged, but the spectrum of actions associated with music extends beyond
overt playing of an instrument. Still within the realm of overt action are movements that
are coordinated with the music, be they complex dance moves or the simple tapping, nodding, or bobbing along with the perceived beat. Beyond that is the realm of covert action, in
which no overt movements are detectable, but movements or sensory inputs are imag­
ined. Even expectancy, the formation of mental images of anticipated sensory input, can
be viewed as a form of action (Fuster, 2001; Schubotz, 2007).

As described above, listening to rhythms in the absence of overt action drives activity
within premotor areas, the basal ganglia, and the cerebellum. Here, we examine musical
engagement of the brain’s action system when some form of action, either overt or
covert, is required. One of the beautiful things about music is the ability to engage the ac­
tion system across varying degrees of complexity and still have it be a compelling musical
experience, from simple isochronous synchronization with the beat to virtuosic polyrhyth­
mic performance on an instrument. Within other domains of cognitive neuroscience, there
is an extensive literature on timing and sequencing behaviors that shows differential en­
gagement of the action systems as a function of complexity (Janata & Grafton, 2003),
which is largely echoed in the emerging literature pertaining explicitly to music.

Sensorimotor Coordination

Tapping
Perhaps the simplest form of musical engagement is tapping isochronously with a
metronome whereby a sense of meter is imparted through the periodic accentuation of a
subset of the pacing events (e.g., accenting every other beat to impart the sense of a
march or every third beat to impart the sense of a waltz). In these types of situations, a
very simple coupling between the posterior auditory cortex and dorsal premotor areas is
observed, in which the strength of the response in these two areas is positively correlated
with the strength of the accent that drives the metric salience and the corresponding be­
havioral manifestation of longer taps to more salient events (Chen et al., 2006). When the
synchronization demands increase as simple and complex metric and then nonmetric rhythms are introduced, behavioral variability increases, particularly among nonmusicians. Positively correlated with the increased performance variability is the activity with­
in a more extensive network comprising the pSMA, SMA, ventral PMC, DLPFC, inferior
parietal lobule, thalamus, and cerebellum, with a few additional differences between mu­
sicians and non-musicians (Chen et al., 2008b). Thus, premotor areas are coupled with at­
tention and working memory areas as the sensorimotor coupling demands increase.

Basic synchronization with a beat also provides a basis for interpersonal synchronization
and musical communication. Simultaneous electroencephalogram (EEG) recordings from
guitarists given the task of mentally synchronizing with a metronome and then commenc­
ing to play a melody together reveal that EEG activity recorded from electrodes situated
above premotor areas becomes synchronized both within and between the performers
(Lindenberger et al., 2009). Although the degree of interpersonal synchronization that
arises by virtue of shared sensory input is difficult to estimate, such simultaneous record­
ing approaches are bound to shape our understanding of socioemotional aspects of senso­
rimotor coordination.

Singing
The adjustment of one’s own actions based on sensory feedback is an important part of
singing. In its simplest form, the repeated singing/chanting of a single note, in compari­
son to passive listening to complex tones, recruits auditory cortex, motor cortex, the
SMA, and the cerebellum, with possible involvement of the anterior cingulate and basal
ganglia (Brown et al., 2004b; Perry et al., 1999; Zarate & Zatorre, 2008). However, as
pitch regulation demands are increased through electronic shifting of the produced pitch,
additional premotor and attentional control regions such as the pSMA, ventral PMC,
basal ganglia, and intraparietal sulcus are recruited across tasks that require the singer
to either ignore the shift or try to compensate for it. The exact network of recruited areas
depends on the amount of singing experience (Zarate & Zatorre, 2008). In regard to more
melodic material, (p. 123) repetition of, or harmonization with, a melody also engages the
anterior STG relative to monotonic vocalization (Brown et al., 2004b), although this acti­
vation is not seen during the singing of phrases from an aria that is familiar to the subject
(Kleber et al., 2007).

Performance

Tasks involving the performance of complex musical sequences, beyond those that are re­
produced via the imitation of a prescribed auditory pattern, afford an opportunity to ob­
serve the interaction of multiple brain systems. Performance can be externally guided by
a musical score, internally guided as in the case of improvisation, or some combination of
the two.

Score-Based
One of the first neuroimaging studies of musical functions examined performance of a
Bach partita from a score and the simpler task of playing scales contrasted with listening
to the corresponding music (Sergent et al., 1992). Aside from auditory, visual, and motor
Page 18 of 42
Cognitive Neuroscience of Music

areas recruited by the basic processes of hearing, score reading, and motor execution,
parietal regions were engaged, presumably by the visuomotor transformations associated
with linking the symbols in the score with a semantic understanding of those symbols as
well as associated actions (Bevan et al., 2003; McDonald, 2006; Schön et al., 2001, 2002).
In addition, left premotor and IFG areas were engaged, presumably reflecting some of the
sequencing complexity associated with the partita. A similar study (Parsons et al., 2005)
in which bimanual performance of memorized Bach pieces was compared with bimanual
playing of scales, found extensive activation of medial and lateral premotor areas, anteri­
or auditory cortex, and subcortical activations in the thalamus and basal ganglia, presum­
ably driven by the added demands of retrieving and executing complex sequences from
memory.

Other studies complicate the interpretation that premotor cortices are driven by greater
complexity in the music played. For example, separate manipulation of melodic and rhyth­
mic complexity found some areas that were biased toward processing melodic informa­
tion (mainly in the STG and calcarine sulcus), whereas others were biased toward pro­
cessing rhythmic information (left inferior frontal cortex and inferior temporal gyrus), but
there was no apparent activation of premotor areas (Bengtsson & Ullen, 2006).

Improvised
Music, like language, is often improvised with the intent of producing a syntactically (and
semantically) coherent stream of auditory events. Given a task of continuing an unfamil­
iar melody or linguistic phrase with an improvised sung melodic or spoken linguistic
phrase, a distributed set of brain areas is engaged in common for music and language, in­
cluding the SMA, motor cortex, putamen, globus pallidus, cerebellum, posterior auditory
cortex, and lateral inferior frontal cortex (Brodmann area 44/45), although the extent of
activation in area 44/45 is greater for language (Brown et al., 2006). Lateral premotor ar­
eas are consistently found to be active during improvisation tasks that involve various de­
grees of piano performance realism. For instance, when unimanual production of
melodies is constrained by a five-key keyboard and instructions that independently vary
the amount of melodic or rhythmic freedom that can be exhibited by the subject, activity
in mid-dorsal premotor cortex is modulated by complexity along both dimensions
(Berkowitz & Ansari, 2008). A similar region is recruited during unimanual production
while improvising around a visually presented score, both when the improvised perfor­
mance must be memorized and when it is improvised freely without memorization
(Bengtsson et al., 2007). A more dorsal premotor area is engaged during this type of im­
provisation also, mirroring effects found in a study of unimanual improvisation in which
free improvisation without a score was contrasted with playing a jazz melody from memo­
ry (Limb & Braun, 2008). The latter study observed activation within an extensive net­
work encompassing the ventrolateral prefrontal (Brodmann area 44), middle temporal,
parietal, and cerebellar areas. Emotion areas in the ventromedial prefrontal cortex were
active during improvisation also, providing the first neuroimaging evidence of how motor
control areas are coupled with affective areas during a highly naturalistic task. Interest­
ingly, both of the studies in which improvisation was least constrained also found substantial
activation in extrastriate visual cortices that could not be attributed to visual input or
score reading, suggesting perhaps that visual mental imagery processes accompany im­
provisation. One must note that in all of these studies, subjects were musicians, often
with high levels of training.

Imagery

Music affords an excellent opportunity for examining mental imagery. It is common to
sing to oneself or have a song stuck in one’s head, so it would (p. 124) seem that the
brain’s sensorimotor system is covertly engaged by this mental pastime. Studies of musi­
cal imagery have tended to emphasize either the auditory or the motor components, with
an interest in determining the degree to which the primary auditory and motor cortices
are engaged.

Auditory Imagery
Activation of auditory association cortices is found using fMRI or PET when subjects sing
through a short melody in order to compare the pitch of two notes corresponding to spe­
cific words in the lyric (Zatorre et al., 1996), or continue imagining the notes following the
opening fragment of a television show theme song (Halpern & Zatorre, 1999). The activa­
tion of auditory areas is corroborated by EEG/MEG studies in which responses to an
imagined note (Janata, 2001) or expected chord (Janata, 1995; Otsuka et al., 2008) closely
resemble auditory evoked potentials with known sources in the auditory cortex (e.g., the
N100). One study that used actual CD recordings of instrumental and vocal music found
extensive activation of auditory association areas during silent gaps that were inserted in­
to the recordings, with some activation of the primary auditory cortex when the gaps oc­
curred in instrumental music (Kraemer et al., 2005). However, another study that used
actual CD recordings to examine anticipatory imagery—the phenomenon of imagining the
next track on a familiar album as soon as the current one ends—found no activation of the
auditory cortices during the imagery period but extensive activation of a frontal and pre­
motor network (Leaver et al., 2009).

Premotor areas, in particular the SMA, as well as frontal regions associated with memory
retrieval, have been activated in most neuroimaging studies of musical imagery that have
emphasized the auditory components (Halpern & Zatorre, 1999; Leaver et al., 2009; Za­
torre et al., 1996), even under relatively simple conditions that could be regarded as
maintenance of items in working memory during same/different comparison judgments of
melodies or harmonized melodies lasting 4 to 6 seconds (Brown & Martinez, 2007). It has
been argued, however, on the basis of comparing activations in an instrumental timbre
imagery task with a visual object imagery task, that the frontal contribution may arise
from general imagery task demands (Halpern et al., 2004). Nonetheless, effortful musical
imagery tasks, such as those requiring the imagining of newly learned pairings of novel
melodies (Leaver et al., 2009), imagery of expressive short phrases from an aria (Kleber
et al., 2007), or imagining the sound or actions associated with a Mozart piano sonata
when only the other modality is presented (Baumann et al., 2007), appear to be associat­
ed with activity in a widespread network of cortical and subcortical areas. This network
matches elements of both the ventral and dorsal attentional networks quite well (Corbet­
ta & Shulman, 2002) and the network observed when attentive listening to polyphonic
music is contrasted with rest (Janata et al., 2002a).

Motor Imagery
Several studies have focused on motor imagery. In a group of pianists, violinists, and cel­
lists, imagined performance of rehearsed pieces from the classical repertoire recruited
frontal and parietal areas bearing resemblance to the dorsal attention network, together
with the SMA and subcortical areas and cerebellum (Langheim et al., 2002). Imagining
performing the right-hand part of one of Bartok’s Mikrokosmos while reading the score
similarly activates the dorsal attentional network along with visual areas and the cerebel­
lum (Meister et al., 2004). Interestingly, the SMA is not activated significantly when the
source of the information to be imagined is external rather than internal (i.e., playing
from memory), indicating that premotor and parietal elements of the dorsal attentional
system coordinate with other brain regions based on the specific demands of the particu­
lar imagery task.

Emotion
The relationship between music and emotion is a complex one, and multiple mechanisms
have been postulated through which music and the emotion systems of the brain can in­
teract (Juslin & Vastfjall, 2008). Compared with the rather restricted set of paradigms
that have been developed for probing the structural representations of music (e.g., tonali­
ty), the experimental designs for examining neural correlates of emotion in music are di­
verse. The precise emotional states that are captured in any given experiment, and their
relevance to real music listening experiences, are often difficult to discern when the actu­
al comparisons between experimental conditions are considered carefully. Manipulations
have tended to fall into one of two categories: (1) normal music contrasted with the same
music rendered dissonant or incoherent, or (2) selection or composition of musical stimuli
to fall into discrete affective categories (e.g., happy, sad, fearful). In general, studies have
found modulation of activity within limbic system areas of the brain.

(p. 125) When the relative dissonance of a mechanical performance of a piano melody is
varied by the dissonance of the accompanying chords, activity in the right parahippocam­
pal gyrus correlates positively with the increases in dissonance and perceived unpleasant­
ness, whereas activity in the right orbitofrontal cortex and subcallosal cingulate increases
as the consonance and perceived pleasantness increase (Blood et al., 1999). Similarly,
when listening to pleasant dance music spanning a range of mostly classical genres is
contrasted with listening to the same excerpts rendered dissonant and displeasing by
mixing the original with two copies that have been pitch-shifted by a minor second and a
tritone, medial temporal areas—the left parahippocampal gyrus, hippocampus, and bilat­
eral amygdala—respond more strongly to the dissonant music, whereas areas more typi­
cally associated with listening to music—the auditory cortex, the left IFG, anterior insula
and frontal operculum, and ventral premotor cortex—respond more strongly to the original
pleasing versions (Koelsch et al., 2006). The same stimulus materials result in
stronger theta activity along anterior midline sites in response to the pleasing music
(Sammler et al., 2007). Listening to coherent excerpts of music rather than their tempo­
rally scrambled counterparts (Levitin & Menon, 2003) increases activity in parts of the
dopaminergic pathway—the ventral tegmental area and nucleus accumbens (Menon &
Levitin, 2005). These regions interact and are functionally connected to the left IFG, insu­
la, hypothalamus, and orbitofrontal cortex, thus delineating a set of emotion-processing
areas of the brain that are activated by music listening experiences that are relatively
pleasing.

The results of the above-mentioned studies are somewhat heterogeneous and challenging
to interpret because they depend on comparisons of relatively normal (and pleasing) mu­
sic-listening experiences with highly abnormal (and displeasing) listening experiences,
rather than comparing normal pleasing listening experiences with normal displeasing ex­
periences. Nonetheless, modulation of the brain’s emotion circuitry is also observed when
the statistical contrasts do not involve distorted materials. Listening to unfamiliar and
pleasing popular music compared with silent rest activates the hippocampus, nucleus ac­
cumbens, ventromedial prefrontal cortex, right temporal pole, and anterior insula (Brown
et al., 2004a). When listening to excerpts of unfamiliar and familiar popular music, activi­
ty in the VMPFC increases as the degree of experienced positive affect increases (Janata,
2009). Somewhat paradoxically, listening to music that elicits chills (goosebumps or shiv­
ers down the spine)—something that is considered by many to be highly pleasing—re­
duces activity in the VMPFC (where activity increases tend to be associated with positive
emotional responses), whereas activity in the right amygdala and in the left hippocampus/
amygdala also decreases (Blood & Zatorre, 2001). Activity in other brain areas associated
with positive emotional responses, such as the ventral striatum and orbitofrontal cortex,
increases, along with activity in the insula and premotor areas (SMA and cerebellum).

The amygdala has been of considerable interest, given its general role in the processing
of fearful stimuli. Patients with either unilateral or bilateral damage to the amygdala
show impaired recognition of scary music and difficulty differentiating peaceful music
from sad music (Gosselin et al., 2005, 2007). Right amygdala damage, in particular,
leaves patients unable to distinguish intended fear in music from either positive or nega­
tive affective intentions (Gosselin et al., 2005). Chord sequences that contain irregular
chord functions and elicit activity in the VLPFC are also regarded as less pleasing and
elicit activity bilaterally in the amygdala (Koelsch et al., 2008). Violations of syntactic ex­
pectations also increase the perceived tension in a piece of music and are associated with
changes in electrodermal activity—a measure of emotional arousal (Steinbeis et al.,
2006).

Perhaps the most common association between music and emotion is the relationship be­
tween affective valence and the mode of the music: The minor mode is consistently asso­
ciated with sadness, whereas the major mode is associated with happiness. Brain activa­
tions associated with mode manipulations are not as consistent across studies, however.
In one study (Khalfa et al., 2005), the intended emotions of classical music pieces played
on the piano were assessed on a five-point bivalent scale (sad to happy). Relative to major
pieces, minor pieces elicited activity in the posterior cingulate and in the medial pre­
frontal cortex, whereas pieces in the major mode were not associated with any activity in­
creases relative to minor pieces. A similar absence of response for major mode melodies
relative to minor mode was observed in a different study in which unfamiliar monophonic
melodies were used (Green et al., 2008). However, minor mode melodies elicited activity
in the left parahippocampal gyrus and rostral anterior cingulate, indicating engagement
of the limbic system, albeit in a unique constellation. A study in which responses to
(p. 126) short four-chord sequences that established either major or minor tonalities were
compared with repeated chords found bilateral activation of the IFG, irrespective of the
mode (Mizuno & Sugishita, 2007). This result was consistent with the role of this region
in the evaluation of musical syntax, but inconsistent with the other studies comparing ma­
jor and minor musical material. Finally, a study using recordings of classical music that
could be separated into distinct happy, sad, and neutral categories (Mitterschiffthaler et
al., 2007) found that happy and sad excerpts strongly activated the auditory cortex bilat­
erally relative to neutral music. Responses to happy and sad excerpts (relative to neutral)
were differentiated in that happy music elicited activity within the ventral striatum, sev­
eral sections of the cingulate cortex, and the parahippocampal gyrus, whereas sad music
was associated with activity in a region spanning the right hippocampus and amygdala,
along with cingulate regions.

Anatomy, Plasticity, and Development


Music provides an excellent arena in which to study the effects of training and expertise
on the brain, both in terms of structure and function (Munte et al., 2002), and also to ex­
amine structural differences in unique populations, such as those individuals who possess
the ability to name pitches in isolation (absolute pitch) or those who have difficulty per­
ceiving melodies (amusics). Anatomical correlates of musical expertise have been ob­
served both in perceptual and motor areas of the brain.

Unsurprisingly, the auditory cortex has been the specific target of several investigations.
An early investigation observed a larger planum temporale in the left hemisphere among
musicians, although the effect was primarily driven by musicians with absolute pitch
(Schlaug et al., 1995). In studies utilizing larger numbers of musically trained and un­
trained subjects, the volume of HG, where the primary and secondary auditory areas are
situated, was found to increase with increasing musical aptitude (Schneider et al., 2002,
2005). The volumetric measures were positively correlated with the strength of the early
(19–30 ms) stages of the evoked responses to amplitude-modulated pure tones (Schneider
et al., 2002). Within the lateral extent of HG, the volume was positively correlated with
the magnitude of a slightly later peak (50 ms post-stimulus) in the waveform elicited by
sounds consisting of several harmonics. Remarkably, the hemispheric asymmetry in the
volume of this region was indicative of the mode of perceptual processing of these
sounds, with larger left-hemisphere volumes reflecting a bias toward processing the im­
plied fundamental frequency of the sounds and larger right-hemisphere volumes indicating
a bias toward spectral processing of the sounds (Schneider et al., 2005). In general,
the auditory cortex appears to respond more strongly to musical sounds in musicians
(Pantev et al., 1998) and as a function of the instrument with which they have had the
most experience (Margulis et al., 2009).

Whole-brain analyses using techniques such as cortical thickness mapping or voxel-based
morphometry (VBM) have also revealed differences between individuals trained on musi­
cal instruments and those with no training, although there is considerable variability in
the regions identified in the different studies, possibly a consequence of differences in the
composition of the samples and the mapping technique used (Bermudez et al., 2009). Cer­
tain findings, such as a greater volume in trained pianists of primary motor and so­
matosensory areas and cerebellar regions responsible for hand and finger movements
(Gaser & Schlaug, 2003), are relatively easy to interpret, and they parallel findings of
stronger evoked responses in the hand regions of the right hemisphere that control the
left (fingering) hand of violinists (Pascual-Leone et al., 1994). Larger volumes in musi­
cians are also observed in lateral prefrontal cortex, both ventrally (Bermudez et al., 2009;
Gaser & Schlaug, 2003; Sluming et al., 2002) and dorsally along the middle frontal gyrus
(Bermudez et al., 2009). However, one particular type of musical aptitude, absolute pitch,
is associated with decreased cortical thickness in dorsolateral frontal cortex in similar ar­
eas that are associated with increases in activation in listeners with absolute pitch rela­
tive to other musically trained subjects (Zatorre et al., 1998). Another paradox presented
by VBM is the finding of greater gray-matter volumes in amusic subjects in some of the
same ventrolateral regions that show greater cortical thickness in musicians (Bermudez
et al., 2009; Hyde et al., 2007).

The anatomical differences that are observed as a function of musical training are per­
haps better placed into a functional context when one observes the effects of short-term
training on neural responses. Nonmusicians who were trained over the course of two
weeks to play a cadence consisting of broken chords on a piano keyboard exhibited a
stronger MMN to deviant notes in similar note patterns compared either with their re­
sponses before receiving training or with a group of subjects who received training by lis­
tening and making judgments about (p. 127) the sequences performed by the trained
group (Lappe et al., 2008). Similarly, nonmusicians who, over the course of five days,
learned to perform five-note melodies showed greater activation bilaterally in the dorsal
IFG (Broca’s area) and lateral premotor areas when listening to the trained melodies
compared with listening to comparison melodies on which they had not trained (Lahav et
al., 2007). Thus, perceptual responses are stronger following sensorimotor training with­
in the networks that are utilized during the training. When the training involves reading
music from a score, medial superior parietal areas also show effects of training (Stewart
et al., 2003). Both mental and physical practice of five-finger piano exercises are capable of
strengthening finger representations in the motor cortex across a series of days, as mea­
sured by reduced transcranial magnetic stimulation (TMS) thresholds for eliciting move­
ments (Pascual-Leone et al., 1995).

Disorders
Musical behaviors, like any other behaviors, are disrupted when the functioning of the
neural substrates that support those behaviors is impaired. Neuropsychological investiga­
tions provide intriguing insights into component processes in musical behaviors and their
likely locations in the brain. Aside from the few studies mentioned in the sections above
of groups of patients who underwent brain surgery, there are many case studies docu­
menting the effects of brain insults, typically caused by stroke, on musical functions
(Brust, 2003). A synthesis of the findings from these many studies (Stewart et al., 2006) is
beyond the scope of this chapter, as is a discussion of the burgeoning topic of using music
for neurorehabilitation (Belin et al., 1996; Sarkamo et al., 2008; Thaut, 2005). Here, the
discussion of brain disorders in relation to music is restricted to amusia, a music-specific
disorder.

Amusia

Amusia, commonly known as “tone deafness,” is a profound impairment in accurately
perceiving melodies. The impairment arises not so much from an inability to discriminate
one note in a melody from the next (i.e., to recognize that a different note is be­
ing played), but rather from the inability to determine the direction of the pitch change
(Ayotte et al., 2002; Foxton et al., 2004). The ability to perceive the direction of pitch
change from one note to the next is critical to discerning the contour of the melody, that
is, its defining feature. The impairment may be restricted to processing of melodic rather
than rhythmic structure (Hyde & Peretz, 2004), although processing of rhythms is im­
paired when the pitch of individual notes is also changing (Foxton et al., 2006).

Impaired identification of pitch direction, but not basic pitch discrimination, has been ob­
served in patients with right temporal lobe excisions that encroach on auditory cortex in
HG (Johnsrude et al., 2000), which is likely to underlie the bias toward the right hemi­
sphere for the processing of melodic information (Warrier & Zatorre, 2004). A diffusion
tensor imaging study of amusics and normal controls found that the volume of the superi­
or arcuate fasciculus in the right hemisphere was consistently smaller in the group of
amusics than in normal controls (Loui et al., 2009a). The arcuate fasciculus connects the
temporal and frontal lobes, specifically the posterior superior and middle temporal gyri
and the IFG. VBM results additionally indicate a structural anomaly in a small region of
the IFG in amusics (Hyde et al., 2006, 2007). Taken together, the neuroimaging studies
that have implicated the IFG in the processing of musical syntax and temporal structure
(Koelsch et al., 2002c, 2005b; Levitin & Menon, 2003; Maess et al., 2001; Patel, 2003; Till­
mann et al., 2003), the behavioral and structural imaging data from amusics, and the
studies of deficits in melody processing in right temporal lobe lesion patients support a
view that the ability to perceive, appreciate, and remember melodies depends in large
part on intact functioning of a perception/action circuit in the right hemisphere (Fuster,
2000; Loui et al., 2009a).

Music and the Brain’s Ensemble of Functions


During the past 15 to 20 years, there has been a tremendous increase in the amount of
knowledge pertaining to the ways in which the human brain interacts with music. Al­
though it is expeditious for those outside the field to regard music as a tidy circumscribed
object or unitary process that is bound to have a concrete representation in the brain, or
perhaps conversely a complex cultural phenomenon for which there is no hope of under­
standing its neural basis, even a modest amount of deeper contemplation reveals music to
be a multifaceted phenomenon that is integral to human life. My objective in this chapter
was to provide an overview of the variety of musical processes that contribute to musical
behaviors and experiences, and of the way that these processes interact with various
(p. 128) domain-general brain systems. Clearly, music is a diverse phenomenon, and musical
stimuli and musical tasks are capable of reaching almost every part of the brain. Given
this complexity, is there any hope for generating process models of musical functions and
behaviors that can lead to a grand unified theory of music and the brain?

Figure 7.2 A highly schematized and simplified summary of brain regions involved in
different facets of music psychological processes. On the left is a lateral view of the right
cerebral hemisphere. A medial view of the right hemisphere is shown on the right. There
is no intent to imply lateralization of function in this portrayal. White lettering designates
the different lobes. The colored circles correspond to the colored labels in the box below.
AG, angular gyrus; DLPFC, dorsolateral prefrontal cortex; HG, Heschl’s gyrus; IFG,
inferior frontal gyrus; IPS, intraparietal sulcus; MPFC, medial prefrontal cortex; PMC,
premotor cortex; pSMA, pre–supplementary motor area; PT, planum temporale; SMA,
supplementary motor area; STG, superior temporal gyrus; VLPFC, ventrolateral
prefrontal cortex.

Obligatory components of process models are boxes with arrows between them, in which
each box refers to a discrete function, and perhaps an associated brain area. Such models
have been proposed with respect to music (Koelsch & Siebel, 2005; Peretz & Coltheart,
2003). Within such models, some component processes are considered music specific,
whereas others represent shared processes with other brain functions (e.g., language,
emotion). The issue of music specificity, or modularity of musical functions, is of consider­
able interest, in large part because of its evolutionary implications (Peretz, 2006). The
strongest evidence for modularity of musical functions derives from studies of individuals
with brain damage in whom specific musical functions are selectively impaired (Peretz,
2006; Stewart et al., 2006). Such specificity in loss of function is remarkable given the
usual extent of brain damage. However, inferences about modularity must be tempered
when not all possible parallel functions in other domains have been considered. The pro­
cessing functions of the right lateral temporal lobes provide a nice example. As reviewed
in this chapter and elsewhere (Zatorre et al., 2002), the neuropsychological and function­
al and structural neuroanatomical evidence suggests that the auditory cortex in the right
hemisphere is more specialized than the left for the processing of pitch, pitch relation­
ships, and thereby melody. Voice-selective regions of the auditory cortex are highly selec­
tive in the right hemisphere (Belin et al., 2000), and in general a large extent of the right
lateral temporal lobes appears to be important for the processing of emotional prosody
(Ethofer et al., 2009; Ross & Monnot, 2008; Schirmer & Kotz, 2006). Given evidence of
parallels between melody and prosody, such as the connotation of sadness by an interval
of a minor third in both music and speech (Curtis & Bharucha, 2010), or deficits among
amusic individuals in processing speech intonation contours (Patel et al., 2005), it is likely
that contour-related music and language functions are highly intertwined in the right
temporal lobe.

Perhaps more germane to the question of how the brain processes music is the question
of how one engages with the music. Few would doubt that the brain of someone dancing a
tango at a club would show more extensive engagement with the music than that of some­
one incidentally hearing music while choosing what cereal to buy at a supermarket. It
would therefore seem that any given result from the cognitive neuroscience of music lit­
erature must be interpreted with regard to the experiential situation of the participants,
both in terms of the affective, motivational, and task/goal states they might find them­
selves in, and in terms of the relationship of the (often abstracted) musical stimuli that
(p. 129) they are being asked to interact with to the music they would normally interact

with. In this regard, approaches that explicitly relate the dynamic time-varying properties
of real musical stimuli to the behaviors and brain responses they engender will become
increasingly important.

Figure 7.2 is a highly schematized (and incomplete) summary of sets of brain areas that
may be recruited by different musical elements and more general processes that mediate
musical experiences. It is not intended to be a process model or to represent the outcome
of a quantitative meta-analysis. Rather, it serves mainly to suggest that one goal of
research in the cognitive neuroscience of music might be to provide a model system for
understanding the coordination of the many processes that underlie creative goal-directed
behavior in humans.

References
Adler, D. S. (2009). Archaeology: The earliest musical tradition. Nature, 460, 695–696.

Ayotte, J., Peretz, I., & Hyde, K. (2002). Congenital amusia: A group study of adults afflict­
ed with a music-specific disorder. Brain, 125, 238–251.

Ayotte, J., Peretz, I., Rousseau, I., Bard, C., & Bojanowski, M. (2000). Patterns of music
agnosia associated with middle cerebral artery infarcts. Brain, 123 (Pt 9), 1926–1938.

Bangert, M., Peschel, T., Schlaug, G., Rotte, M., Drescher, D., Hinrichs, H., Heinze, H.-J.,
& Altenmüller, E. (2006). Shared networks for auditory and motor processing in profes­
sional pianists: Evidence from fMRI conjunction. NeuroImage, 30, 917–926.

Barnes, R., & Jones, M. R. (2000). Expectancy, attention, and time. Cognitive Psychology,
41, 254–311.

Baumann, S., Koeneke, S., Schmidt, C. F., Meyer, M., Lutz, K., & Jancke, L. (2007). A net­
work for audio-motor coordination in skilled pianists and non-musicians. Brain Research, 1161,
65–78.

Beisteiner, R., Erdler, M., Mayer, D., Gartus, A., Edward, V., Kaindl, T., Golaszewski, S.,
Lindinger, G., & Deecke, L. (1999). A marker for differentiation of capabilities for process­
ing of musical harmonies as detected by magnetoencephalography in musicians. Neuro­
science Letters, 277, 37–40.

Belin, P., VanEeckhout, P., Zilbovicius, M., Remy, P., Francois, C., Guillaume, S., Chain, F.,
Rancurel, G., & Samson, Y. (1996). Recovery from nonfluent aphasia after melodic intona­
tion therapy: A PET study. Neurology, 47, 1504–1511.

Belin, P., Zatorre, R. J., Lafaille, P., Ahad, P., & Pike, B. (2000). Voice-selective areas in hu­
man auditory cortex. Nature, 403, 309–312.

Bengtsson, S. L., Csikszentmihalyi, M., & Ullen, F. (2007). Cortical regions involved in the
generation of musical structures during improvisation in pianists. Journal of Cognitive
Neuroscience, 19, 830–842.

Bengtsson, S. L., & Ullen, F. (2006). Dissociation between melodic and rhythmic process­
ing during piano performance from musical scores. NeuroImage, 30, 272–284.

Berkowitz, A. L., & Ansari, D. (2008). Generation of novel motor sequences: The neural
correlates of musical improvisation. NeuroImage, 41, 535–543.

Bermudez, P., Lerch, J. P., Evans, A. C., & Zatorre, R. J. (2009). Neuroanatomical corre­
lates of musicianship as revealed by cortical thickness and voxel-based morphometry.
Cerebral Cortex, 19, 1583–1596.

Bermudez, P., & Zatorre, R. J. (2005). Conditional associative memory for musical stimuli
in nonmusicians: Implications for absolute pitch. Journal of Neuroscience, 25, 7718–7723.

Besson, M., & Faïta, F. (1995). An event-related potential (ERP) study of musical ex­
pectancy: Comparison of musicians with nonmusicians. Journal of Experimental Psycholo­
gy: Human Perception and Performance, 21, 1278–1296.

Besson, M., & Macar, F. (1987). An event-related potential analysis of incongruity in mu­
sic and other non-linguistic contexts. Psychophysiology, 24, 14–25.

Bevan, A., Robinson, G., Butterworth, B., & Cipolotti, L. (2003). To play “B” but not to say
“B”: Selective loss of letter names. Neurocase, 9, 118–128.

Bigand, E., Vieillard, S., Madurell, F., Marozeau, J., & Dacquet, A. (2005). Multidimension­
al scaling of emotional responses to music: The effect of musical expertise and of the du­
ration of the excerpts. Cognition & Emotion, 19, 1113–1139.

Blood, A. J., & Zatorre, R. J. (2001). Intensely pleasurable responses to music correlate
with activity in brain regions implicated in reward and emotion. Proceedings of the Na­
tional Academy of Sciences U S A, 98, 11818–11823.

Blood, A. J., Zatorre, R. J., Bermudez, P., & Evans, A. C. (1999). Emotional responses to
pleasant and unpleasant music correlate with activity in paralimbic brain regions. Nature
Neuroscience, 2, 382–387.

Brown, S., & Martinez, M. J. (2007). Activation of premotor vocal areas during musical
discrimination. Brain and Cognition, 63, 59–69.

Brown, S., Martinez, M. J., & Parsons, L. M. (2004a). Passive music listening spontaneous­
ly engages limbic and paralimbic systems. Neuroreport, 15, 2033–2037.

Brown, S., Martinez, M. J., & Parsons, L. M. (2006). Music and language side by side in
the brain: A PET study of the generation of melodies and sentences. European Journal of
Neuroscience, 23, 2791–2803.

Brown, S., Martinez, M. J., Hodges, D. A., Fox, P. T., & Parsons, L. M. (2004b). The song
system of the human brain. Cognitive Brain Research, 20, 363–375.

Brust, J. C. M. (2003). Music and the neurologist: A historical perspective. In I. Peretz &
R. J. Zatorre (Eds.), Cognitive neuroscience of music (pp. 181–191). Oxford, UK: Oxford
University Press.

Caclin, A., Brattico, E., Tervaniemi, M., Naatanen, R., Morlet, D., Giard, M. H., &
McAdams, S. (2006). Separate neural processing of timbre dimensions in auditory senso­
ry memory. Journal of Cognitive Neuroscience, 18, 1959–1972.

Caclin, A., Giard M.-H., Smith, B. K., & McAdams, S. (2007). Interactive processing of
timbre dimensions: A Garner interference study. Brain Research, 1138, 159–170.

Caclin, A., McAdams, S., Smith, B. K., & Giard, M. H. (2008). Interactive processing of
timbre dimensions: An exploration with event-related potentials. Journal of Cognitive
Neuroscience, 20, 49–64.

Caclin, A., McAdams, S., Smith, B. K., & Winsberg, S. (2005). Acoustic correlates (p. 130)
of timbre space dimensions: A confirmatory study using synthetic tones. Journal of the
Acoustical Society of America, 118, 471–482.

Carrion, R. E., & Bly, B. M. (2008). The effects of learning on event-related potential cor­
relates of musical expectancy. Psychophysiology, 45, 759–775.

Chen, J. L., Penhune, V. B., & Zatorre, R. J. (2008a). Listening to musical rhythms recruits
motor regions of the brain. Cerebral Cortex, 18, 2844–2854.

Chen, J. L., Penhune, V. B., & Zatorre, R. J. (2008b). Moving on time: Brain network for au­
ditory-motor synchronization is modulated by rhythm complexity and musical training.
Journal of Cognitive Neuroscience, 20, 226–239.

Chen, J. L., Zatorre, R. J., & Penhune, V. B. (2006). Interactions between auditory and dor­
sal premotor cortex during synchronization to musical rhythms. NeuroImage, 32, 1771–
1781.

Corbetta, M., & Shulman, G. L. (2002). Control of goal-directed and stimulus-driven atten­
tion in the brain. Nature Reviews, Neuroscience, 3, 201–215.

Curtis, M. E., & Bharucha, J. J. (2010). The minor third communicates sadness in speech,
mirroring its use in music. Emotion, 10, 335–348.

Dennis, M., & Hopyan, T. (2001). Rhythm and melody in children and adolescents after
left or right temporal lobectomy. Brain and Cognition, 47, 461–469.

Dowling, W. J. (1978). Scale and contour: Two components of a theory of memory for
melodies. Psychological Review, 85, 341–354.

Ethofer, T., De Ville, D. V., Scherer, K., & Vuilleumier, P. (2009). Decoding of emotional in­
formation in voice-sensitive cortices. Current Biology, 19, 1028–1033.

Fedorenko, E., Patel, A., Casasanto, D., Winawer, J., & Gibson, E. (2009). Structural inte­
gration in language and music: Evidence for a shared system. Memory and Cognition, 37,
1–9.

Foxton, J. M., Dean, J. L., Gee, R., Peretz, I., & Griffiths, T. D. (2004). Characterization of
deficits in pitch perception underlying “tone deafness.” Brain, 127, 801–810.

Foxton, J. M., Nandy, R. K., & Griffiths, T. D. (2006). Rhythm deficits in “tone deafness.”
Brain and Cognition, 62, 24–29.

Fuster, J. M. (2000). Executive frontal functions. Experimental Brain Research, 133, 66–
70.

Fuster, J. M. (2001). The prefrontal cortex—an update: Time is of the essence. Neuron, 30,
319–333.

Gaser, C., & Schlaug, G. (2003). Brain structures differ between musicians and non-musi­
cians. Journal of Neuroscience, 23, 9240–9245.

Gilbert, S. J., Spengler, S., Simons, J. S., Steele, J. D., Lawrie, S. M., Frith, C. D., &
Burgess, P. W. (2006). Functional specialization within rostral prefrontal cortex (Area 10):
A meta-analysis. Journal of Cognitive Neuroscience, 18, 932–948.

Gosselin, N., Peretz, I., Johnsen, E., & Adolphs, R. (2007). Amygdala damage impairs emo­
tion recognition from music. Neuropsychologia, 45, 236–244.

Gosselin, N., Peretz, I., Noulhiane, M., Hasboun, D., Beckett, C., Baulac, M., & Samson, S.
(2005). Impaired recognition of scary music following unilateral temporal lobe excision.
Brain, 128, 628–640.

Goydke, K. N., Altenmuller, E., Moller, J., & Munte, T. F. (2004). Changes in emotional tone
and instrumental timbre are reflected by the mismatch negativity. Cognitive Brain Re­
search, 21, 351–359.

Grahn, J. A., & Brett, M. (2007). Rhythm and beat perception in motor areas of the brain.
Journal of Cognitive Neuroscience, 19, 893–906.

Grahn, J. A., & McAuley, J. D. (2009). Neural bases of individual differences in beat per­
ception. NeuroImage, 47, 1894–1903.

Grahn, J. A., & Rowe, J. B. (2009). Feeling the beat: Premotor and striatal interactions in
musicians and nonmusicians during beat perception. Journal of Neuroscience, 29, 7540–
7548.

Green, A. C., Baerentsen, K. B., Stodkilde-Jorgensen, H., Wallentin, M., Roepstorff, A., &
Vuust, P. (2008). Music in minor activates limbic structures: A relationship with disso­
nance? Neuroreport, 19, 711–715.

Grey, J. M. (1977). Multidimensional perceptual scaling of musical timbres. Journal of the
Acoustical Society of America, 61, 1270–1277.

Griffiths, T. D., Buchel, C., Frackowiak, R. S. J., & Patterson, R. D. (1998). Analysis of tem­
poral structure in sound by the human brain. Nature Neuroscience, 1, 422–427.

Halpern, A. R., & Zatorre, R. J. (1999). When that tune runs through your head: A PET in­
vestigation of auditory imagery for familiar melodies. Cerebral Cortex, 9, 697–704.

Halpern, A. R., Zatorre, R. J., Bouffard, M., & Johnson, J. A. (2004). Behavioral and neural
correlates of perceived and imagined musical timbre. Neuropsychologia, 42, 1281–1292.

Hickok, G., Buchsbaum, B., Humphries, C., & Muftuler, T. (2003). Auditory-motor interac­
tion revealed by fMRI: Speech, music, and working memory in area Spt. Journal of Cogni­
tive Neuroscience, 15, 673–682.

Hyde, K. L., Lerch, J. P., Zatorre, R. J., Griffiths, T. D., Evans, A. C., & Peretz, I. (2007).
Cortical thickness in congenital amusia: When less is better than more. Journal of Neuro­
science, 27, 13028–13032.

Hyde, K. L., & Peretz, I. (2004). Brains that are out of tune but in time. Psychological
Science, 15, 356–360.

Hyde, K. L., Zatorre, R. J., Griffiths, T. D., Lerch, J. P., & Peretz, I. (2006). Morphometry of
the amusic brain: A two-site study. Brain, 129, 2562–2570.

Janata, P. (1995). ERP measures assay the degree of expectancy violation of harmonic
contexts in music. Journal of Cognitive Neuroscience, 7, 153–164.

Janata, P. (2001). Brain electrical activity evoked by mental formation of auditory expecta­
tions and images. Brain Topography, 13, 169–193.

Janata, P. (2005). Brain networks that track musical structure. Annals of the New York
Academy of Sciences, 1060, 111–124.

Janata, P. (2007). Navigating tonal space. In W. B. Hewlett, E. Selfridge-Field, & E. Correia
(Eds.), Tonal theory for the digital age (pp. 39–50). Stanford, CA: Center for Computer
Assisted Research in the Humanities.

Janata, P. (2009). The neural architecture of music-evoked autobiographical memories.
Cerebral Cortex, 19, 2579–2594.

Janata, P., Birk, J. L., Tillmann, B., & Bharucha, J. J. (2003). Online detection of tonal pop-
out in modulating contexts. Music Perception, 20, 283–305.

Janata, P., Birk, J. L., Van Horn, J. D., Leman, M., Tillmann, B., & Bharucha, J. J. (2002b).
The cortical topography of tonal structures underlying Western music. Science, 298,
2167–2170.

Janata, P., & Grafton, S. T. (2003). Swinging in the brain: Shared neural substrates for be­
haviors related to sequencing and music. Nature Neuroscience, 6, 682–687.

Janata, P., Tillmann, B., & Bharucha, J. J. (2002a). Listening to polyphonic music recruits
domain-general attention and working memory circuits. Cognitive, Affective and Behav­
ioral Neuroscience, 2, 121–140.

Jentschke, S., & Koelsch, S. (2009). Musical training modulates the development (p. 131)
of syntax processing in children. NeuroImage, 47, 735–744.

Johnsrude, I. S., Penhune, V. B., & Zatorre, R. J. (2000). Functional specificity in the right
human auditory cortex for perceiving pitch direction. Brain, 123, 155–163.

Jones, M. R. (1976). Time, our lost dimension—toward a new theory of perception, atten­
tion, and memory. Psychological Review, 83, 323–355.

Jones, M. R., & Boltz, M. (1989). Dynamic attending and responses to time. Psychological
Review, 96, 459–491.

Jones, M. R., Moynihan, H., MacKenzie, N., & Puente, J. (2002). Temporal aspects of stim­
ulus-driven attending in dynamic arrays. Psychological Science, 13, 313–319.

Juslin, P. N., & Vastfjall, D. (2008). Emotional responses to music: The need to consider
underlying mechanisms. Behavioral and Brain Science, 31, 559–621.

Khalfa, S., Schon, D., Anton, J. L., & Liegeois-Chauvel, C. (2005). Brain regions involved in
the recognition of happiness and sadness in music. Neuroreport, 16, 1981–1984.

Kleber, B., Birbaumer, N., Veit, R., Trevorrow, T., & Lotze, M. (2007). Overt and imagined
singing of an Italian aria. NeuroImage, 36, 889–900.

Klostermann, E. C., Loui, P., & Shimamura, A. P. (2009). Activation of right parietal cortex
during memory retrieval of nonlinguistic auditory stimuli. Cognitive, Affective, &
Behavioral Neuroscience, 9, 242–248.

Koelsch, S. (2009). Music-syntactic processing and auditory memory: Similarities and dif­
ferences between ERAN and MMN. Psychophysiology, 46, 179–190.

Koelsch, S., & Mulder, J. (2002). Electric brain responses to inappropriate harmonies dur­
ing listening to expressive music. Clinical Neurophysiology, 113, 862–869.

Koelsch, S., & Siebel, W. A. (2005). Towards a neural basis of music perception. Trends in
Cognitive Sciences, 9, 578–584.

Koelsch, S., Schmidt, B. H., & Kansok, J. (2002a). Effects of musical expertise on the early
right anterior negativity: An event-related brain potential study. Psychophysiology, 39,
657–663.

Koelsch, S., Schroger, E., & Gunter, T. C. (2002b). Music matters: Preattentive musicality
of the human brain. Psychophysiology, 39, 38–48.

Koelsch, S., Fritz, T., & Schlaug, G. (2008). Amygdala activity can be modulated by unex­
pected chord functions during music listening. Neuroreport, 19, 1815–1819.

Koelsch, S., Gunter, T., Friederici, A. D., & Schroger, E. (2000). Brain indices of music pro­
cessing: “Nonmusicians” are musical. Journal of Cognitive Neuroscience, 12, 520–541.

Koelsch, S., Gunter, T., Schroger, E., & Friederici, A. D. (2003a). Processing tonal modula­
tions: An ERP study. Journal of Cognitive Neuroscience, 15, 1149–1159.

Koelsch, S., Gunter, T. C., Wittfoth, M., & Sammler, D. (2005a). Interaction between syn­
tax processing in language and in music: An ERP study. Journal of Cognitive
Neuroscience, 17, 1565–1577.

Koelsch, S., Jentschke, S., Sammler, D., & Mietchen, D. (2007). Untangling syntactic and
sensory processing: An ERP study of music perception. Psychophysiology, 44, 476–490.

Koelsch, S., Fritz, T., Schulze, K., Alsop, D., & Schlaug, G. (2005b). Adults and children
processing music: An fMRI study. NeuroImage, 25, 1068–1076.

Koelsch, S., Fritz, T., v Cramon, D. Y., Muller, K., & Friederici, A. D. (2006). Investigating
emotion with music: An fMRI study. Human Brain Mapping, 27, 239–250.

Koelsch, S., Gunter, T. C., Schroger, E., Tervaniemi, M., Sammler, D., & Friederici, A. D.
(2001). Differentiating ERAN and MMN: An ERP study. Neuroreport, 12, 1385–1389.

Koelsch, S., Gunter, T. C., v Cramon, D. Y., Zysset, S., Lohmann, G., & Friederici, A. D.
(2002c). Bach speaks: A cortical “language-network” serves the processing of music. Neu­
roImage, 17, 956–966.

Koelsch, S., Grossmann, T., Gunter, T. C., Hahne, A., Schroger, E., & Friederici, A. D.
(2003b). Children processing music: Electric brain responses reveal musical competence
and gender differences. Journal of Cognitive Neuroscience, 15, 683–693.

Koelsch, S., Kasper, E., Gunter, T. C., Sammler, D., Schulze, K., & Friederici, A. D. (2004).
Music, language, and meaning: Brain signatures of semantic processing. Nature Neuro­
science, 7, 302–307.

Koelsch, S., Schulze, K., Sammler, D., Fritz, T., Muller, K., & Gruber, O. (2009). Functional
architecture of verbal and tonal working memory: An fMRI study. Human Brain Mapping,
30, 859–873.

Kraemer, D. J. M., Macrae, C. N., Green, A. E., & Kelley, W. M. (2005). Musical imagery:
Sound of silence activates auditory cortex. Nature, 434, 158–158.

Krumhansl, C. L. (1990). Cognitive foundations of musical pitch. New York: Oxford Uni­
versity Press.

Krumhansl, C. L., & Kessler, E. J. (1982). Tracing the dynamic changes in perceived tonal
organization in a spatial representation of musical keys. Psychological Review, 89, 334–
368.

Lahav, A., Saltzman, E., & Schlaug, G. (2007). Action representation of sound: Audiomo­
tor recognition network while listening to newly acquired actions. Journal of Neuro­
science, 27, 308–314.

Lakatos, S. (2000). A common perceptual space for harmonic and percussive timbres. Per­
ception and Psychophysics, 62, 1426–1439.

Langheim, F. J. P., Callicott, J. H., Mattay, V. S., Duyn, J. H., & Weinberger, D. R. (2002).
Cortical systems associated with covert music rehearsal. NeuroImage, 16, 901–908.

Lappe, C., Herholz, S. C., Trainor, L. J., & Pantev, C. (2008). Cortical plasticity induced by
short-term unimodal and multimodal musical training. Journal of Neuroscience, 28, 9632–
9639.

Large, E. W., & Jones, M. R. (1999). The dynamics of attending: How people track time-
varying events. Psychological Review 106, 119–159.

Large, E. W., & Palmer, C. (2002). Perceiving temporal regularity in music. Cognitive
Science, 26, 1–37.

Leaver, A. M., Van Lare, J., Zielinski, B., Halpern, A. R., & Rauschecker, J. P. (2009). Brain
activation during anticipation of sound sequences. Journal of Neuroscience, 29, 2477–
2485.

Leino, S., Brattico, E., Tervaniemi, M., & Vuust, P. (2007). Representation of harmony
rules in the human brain: Further evidence from event-related potentials. Brain Research,
1142, 169–177.

Lerdahl, F., & Krumhansl, C. L. (2007). Modeling tonal tension. Music Perception, 24,
329–366.

Levitin, D. J., & Menon, V. (2003). Musical structure is processed in “language” areas of
the brain: A possible role for Brodmann Area 47 in temporal coherence. NeuroImage, 20,
2142–2152.

Liegeois-Chauvel, C., Peretz, I., Babai, M., Laguitton, V., & Chauvel, P. (1998). (p. 132)
Contribution of different cortical areas in the temporal lobes to music processing. Brain,
121, 1853–1867.

Limb, C. J., & Braun, A. R. (2008). Neural substrates of spontaneous musical perfor­
mance: An fMRI study of jazz improvisation. PLoS One, 3, e1679.

Lindenberger, U., Li, S. C., Gruber, W., & Muller, V. (2009). Brains swinging in concert:
Cortical phase synchronization while playing guitar. BMC Neuroscience, 10 (22), 1–12.

London, J. (2004). Hearing in time: Psychological aspects of musical meter. New York: Ox­
ford University Press.

Loui, P., Alsop, D., & Schlaug, G. (2009a). Tone deafness: A new disconnection syndrome?
Journal of Neuroscience, 29, 10215–10220.

Loui, P., Grent-’t-Jong, T., Torpey, D., & Woldorff, M. (2005). Effects of attention on the
neural processing of harmonic syntax in Western music. Cognitive Brain Research, 25,
678–687.

Loui, P., Wu, E. H., Wessel, D. L., & Knight, R. T. (2009b). A generalized mechanism for
perception of pitch patterns. Journal of Neuroscience, 29, 454–459.

Maess, B., Koelsch, S., Gunter, T. C., & Friederici, A. D. (2001). Musical syntax is
processed in Broca’s area: An MEG study. Nature Neuroscience, 4, 540–545.

Margulis, E. H., Mlsna, L. M., Uppunda, A. K., Parrish, T. B., & Wong, P. C. M. (2009).
Selective neurophysiologic responses to music in instrumentalists with different listening
biographies. Human Brain Mapping, 30, 267–275.

McAdams, S., Winsberg, S., Donnadieu, S., Desoete, G., & Krimphoff, J. (1995). Perceptual
scaling of synthesized musical timbres: Common dimensions, specificities, and latent sub­
ject classes. Psychological Research, 58, 177–192.

McDonald, I. (2006). Musical alexia with recovery: A personal account. Brain, 129, 2554–
2561.

Meister, I. G., Krings, T., Foltys, H., Boroojerdi, B., Muller, M., Topper, R., & Thron, A.
(2004). Playing piano in the mind: An fMRI study on music imagery and performance in
pianists. Cognitive Brain Research, 19, 219–228.

Menon, V., & Levitin, D. J. (2005). The rewards of music listening: Response and physio­
logical connectivity of the mesolimbic system. NeuroImage, 28, 175–184.

Menon, V., Levitin, D. J., Smith, B. K., Lembke, A., Krasnow, B. D., Glazer, D., Glover, G. H.,
& McAdams, S. (2002). Neural correlates of timbre change in harmonic sounds. NeuroI­
mage, 17, 1742–1754.

Meyer, M., Baumann, S., & Jancke, L. (2006). Electrical brain imaging reveals spatio-tem­
poral dynamics of timbre perception in humans. NeuroImage, 32, 1510–1523.

Miranda, R. A., & Ullman, M. T. (2007). Double dissociation between rules and memory in
music: An event-related potential study. NeuroImage, 38, 331–345.

Mitterschiffthaler, M. T., Fu, C. H. Y., Dalton, J. A., Andrew, C. M., & Williams, S. C. R.
(2007). A functional MRI study of happy and sad affective states induced by classical mu­
sic. Human Brain Mapping, 28, 1150–1162.

Mizuno, T., & Sugishita, M. (2007). Neural correlates underlying perception of tonality-re­
lated emotional contents. Neuroreport, 18, 1651–1655.

Munte, T. F., Altenmuller, E., & Jancke, L. (2002). The musician’s brain as a model of neu­
roplasticity. Nature Reviews, Neuroscience, 3, 473–478.

Näätänen, R. (1992). Attention and brain function. Hillsdale, NJ: Erlbaum.

Näätänen, R., & Winkler, I. (1999). The concept of auditory stimulus representation in
cognitive neuroscience. Psychological Bulletin, 125, 826–859.

Nan, Y., Knosche, T. R., Zysset, S., & Friederici, A. D. (2008). Cross-cultural music phrase
processing: An fMRI study. Human Brain Mapping, 29, 312–328.

Northoff, G., & Bermpohl, F. (2004). Cortical midline structures and the self. Trends in
Cognitive Sciences, 8, 102–107.

Northoff, G., Heinzel, A., Greck, M., Bermpohl, F., Dobrowolny, H., & Panksepp, J. (2006).
Self-referential processing in our brain: A meta-analysis of imaging studies on the self.
NeuroImage, 31, 440–457.

Otsuka, A., Tamaki, Y., & Kuriki, S. (2008). Neuromagnetic responses in silence after mu­
sical chord sequences. Neuroreport, 19, 1637–1641.

Paller, K. A., McCarthy, G., & Wood, C. C. (1992). Event-related potentials elicited by de­
viant endings to melodies. Psychophysiology, 29, 202–206.

Palmer, C., & Krumhansl, C. L. (1990). Mental representations for musical meter. Journal
of Experimental Psychology. Human Perception and Performance, 16, 728–741.

Pantev, C., Oostenveld, R., Engelien, A., Ross, B., Roberts, L. E., & Hoke, M. (1998). In­
creased auditory cortical representation in musicians. Nature, 392, 811–814.

Parsons, L. M., Sergent, J., Hodges, D. A., & Fox, P. T. (2005). The brain basis of piano per­
formance. Neuropsychologia, 43, 199–215.

Pascual-Leone, A., Dang, N., Cohen, L. G., Brasilneto, J. P., Cammarota, A., & Hallett, M.
(1995). Modulation of muscle responses evoked by transcranial magnetic stimulation dur­
ing the acquisition of new fine motor-skills. Journal of Neurophysiology, 74, 1037–1045.

Pascual-Leone, A., Grafman, J., & Hallett, M. (1994). Modulation of cortical motor output
maps during development of implicit and explicit knowledge. Science, 263, 1287–1289.

Patel, A. D. (2003). Language, music, syntax and the brain. Nature Neuroscience, 6, 674–
681.

Patel, A. D., & Balaban, E. (2000). Temporal patterns of human cortical activity reflect
tone sequence structure. Nature, 404, 80–84.

Patel, A. D., & Balaban, E. (2004). Human auditory cortical dynamics during perception of
long acoustic sequences: Phase tracking of carrier frequency by the auditory steady-state
response. Cerebral Cortex, 14, 35–46.

Patel, A. D., Foxton, J. M., & Griffiths, T. D. (2005). Musically tone-deaf individuals have
difficulty discriminating intonation contours extracted from speech. Brain and Cognition,
59, 310–313.

Patel, A. D., Gibson, E., Ratner, J., Besson, M., & Holcomb, P. J. (1998). Processing syntac­
tic relations in language and music: An event-related potential study. Journal of Cognitive
Neuroscience, 10, 717–733.

Patterson, R. D., Uppenkamp, S., Johnsrude, I. S., & Griffiths, T. D. (2002). The processing
of temporal pitch and melody information in auditory cortex. Neuron, 36, 767–776.

Penhune, V. B., Zatorre, R. J., & Feindel, W. H. (1999). The role of auditory cortex in reten­
tion of rhythmic patterns as studied in patients with temporal lobe removals including
Heschl’s gyrus. Neuropsychologia, 37, 315–331.

Peretz, I. (1996). Can we lose memory for music? A case of music agnosia in a nonmusi­
cian. Journal of Cognitive Neuroscience, 8, 481–496.

Peretz, I. (2006). The nature of music from a biological perspective. Cognition, (p. 133)
100, 1–32.

Peretz, I., & Coltheart, M. (2003). Modularity of music processing. Nature Neuroscience,
6, 688–691.

Perry, D. W., Zatorre, R. J., Petrides, M., Alivisatos, B., Meyer, E., & Evans, A. C. (1999).
Localization of cerebral activity during simple singing. Neuroreport, 10, 3979–3984.

Plailly, J., Tillmann, B., & Royet, J.-P. (2007). The feeling of familiarity of music and odors:
The same neural signature? Cerebral Cortex, 17, 2650–2658.

Platel, H., Baron, J. C., Desgranges, B., Bernard, F., & Eustache, F. (2003). Semantic and
episodic memory of music are subserved by distinct neural networks. NeuroImage, 20,
244–256.

Popescu, M., Otsuka, A., & Ioannides, A. A. (2004). Dynamics of brain activity in motor
and frontal cortical areas during music listening: A magnetoencephalographic study. Neu­
roImage, 21, 1622–1638.

Pressing, J. (2002). Black Atlantic rhythm: Its computational and transcultural founda­
tions. Music Perception, 19, 285–310.

Ross, E. D., & Monnot, M. (2008). Neurology of affective prosody and its functional-
anatomic organization in right hemisphere. Brain and Language, 104, 51–74.

Sammler, D., Grigutsch, M., Fritz, T., & Koelsch, S. (2007). Music and emotion: Electro­
physiological correlates of the processing of pleasant and unpleasant music. Psychophysi­
ology, 44, 293–304.

Samson, S., & Zatorre, R. J. (1988). Melodic and harmonic discrimination following unilat­
eral cerebral excision. Brain and Cognition, 7, 348–360.

Samson, S., & Zatorre, R. J. (1991). Recognition memory for text and melody of songs af­
ter unilateral temporal lobe lesion: Evidence for dual encoding. Journal of Experimental
Psychology. Learning, Memory, and Cognition, 17, 793–804.

Samson, S., & Zatorre, R. J. (1994). Contribution of the right temporal-lobe to musical
timbre discrimination. Neuropsychologia, 32, 231–240.

Samson, S., Zatorre, R. J., & Ramsay, J. O. (2002). Deficits of musical timbre perception af­
ter unilateral temporal-lobe lesion revealed with multidimensional scaling. Brain, 125,
511–523.

Sarkamo, T., Tervaniemi, M., Laitinen, S., Forsblom, A., Soinila, S., Mikkonen, M., Autti,
T., Silvennoinen, H. M., Erkkilae, J., Laine, M., Peretz, I., & Hietanen, M. (2008). Music
listening enhances cognitive recovery and mood after middle cerebral artery stroke.
Brain, 131, 866–876.

Satoh, M., Takeda, K., Nagata, K., Hatazawa, J., & Kuzuhara, S. (2001). Activated brain
regions in musicians during an ensemble: A PET study. Cognitive Brain Research, 12,
101–108.

Satoh, M., Takeda, K., Nagata, K., Shimosegawa, E., & Kuzuhara, S. (2006). Positron-
emission tomography of brain regions activated by recognition of familiar music. Ameri­
can Journal of Neuroradiology, 27, 1101–1106.

Schellenberg, E. G., Iverson, P., & McKinnon, M. C. (1999). Name that tune: Identifying
popular recordings from brief excerpts. Psychonomic Bulletin and Review, 6, 641–646.

Schellenberg, E. G., & Trehub, S. E. (2003). Good pitch memory is widespread. Psycholog­
ical Science, 14, 262–266.

Schirmer, A., & Kotz, S. A. (2006). Beyond the right hemisphere: brain mechanisms medi­
ating vocal emotional processing. Trends in Cognitive Sciences, 10, 24–30.

Schlaug, G., Jancke, L., Huang, Y. X., & Steinmetz, H. (1995). In-vivo evidence of structur­
al brain asymmetry in musicians. Science, 267, 699–701.

Schmithorst, V. J., & Holland, S. K. (2003). The effect of musical training on music pro­
cessing: A functional magnetic resonance imaging study in humans. Neuroscience
Letters, 348, 65–68.

Schneider, P., Scherg, M., Dosch, H. G., Specht, H. J., Gutschalk, A., & Rupp, A. (2002).
Morphology of Heschl’s gyrus reflects enhanced activation in the auditory cortex of musi­
cians. Nature Neuroscience, 5, 688–694.

Schneider, P., Sluming, V., Roberts, N., Scherg, M., Goebel, R., Specht, H. J., Dosch, H. G.,
Bleeck, S., Stippich, C., & Rupp, A. (2005). Structural and functional asymmetry of lateral
Heschl’s gyrus reflects pitch perception preference. Nature Neuroscience, 8, 1241–1247.

Schön, D., Anton, J. L., Roth, M., & Besson, M. (2002). An fMRI study of music sight-read­
ing. Neuroreport, 13, 2285–2289.

Schön, D., Semenza, C., & Denes, G. (2001). Naming of musical notes: A selective deficit
in one musical clef. Cortex, 37, 407–421.

Schubotz, R. I. (2007). Prediction of external events with our motor system: Towards a
new framework. Trends in Cognitive Sciences, 11, 211–218.

Sergent, J., Zuck, E., Terriah, S., & Macdonald, B. (1992). Distributed neural network un­
derlying musical sight-reading and keyboard performance. Science, 257, 106–109.

Sluming, V., Barrick, T., Howard, M., Cezayirli, E., Mayes, A., & Roberts, N. (2002). Voxel-
based morphometry reveals increased gray matter density in Broca’s area in male sym­
phony orchestra musicians. NeuroImage, 17, 1613–1622.

Sridharan, D., Levitin, D. J., Chafe, C. H., Berger, J., & Menon, V. (2007). Neural dynamics
of event segmentation in music: Converging evidence for dissociable ventral and dorsal
networks. Neuron, 55, 521–532.

Steinbeis, N., & Koelsch, S. (2008). Shared neural resources between music and language
indicate semantic processing of musical tension-resolution patterns. Cerebral Cortex, 18,
1169–1178.

Steinbeis, N., Koelsch, S., & Sloboda, J. A. (2006). The role of harmonic expectancy viola­
tions in musical emotions: Evidence from subjective, physiological, and neural responses.
Journal of Cognitive Neuroscience, 18, 1380–1393.

Stewart, L., Henson, R., Kampe, K., Walsh, V., Turner, R., & Frith, U. (2003). Brain changes
after learning to read and play music. NeuroImage, 20, 71–83. doi:10.1016/S1053-8119(03)00248-9

Stewart, L., von Kriegstein, K., Warren, J. D., & Griffiths, T. D. (2006). Music and the
brain: Disorders of musical listening. Brain, 129, 2533–2553.

Temperley, D. (2001). The cognition of basic musical structures. Cambridge, MA: MIT
Press.

Temperley, D. (2007). Music and probability. Cambridge, MA: MIT Press.

Thaut, M. (2005). Rhythm, music, and the brain: Scientific foundations and clinical appli­
cations. New York: Routledge.

Tillmann, B., Bharucha, J. J., & Bigand, E. (2000). Implicit learning of tonality: A self-orga­
nizing approach. Psychological Review, 107, 885–913.

Tillmann, B., Janata, P., & Bharucha, J. J. (2003). Activation of the inferior frontal (p. 134)
cortex in musical priming. Cognitive Brain Research, 16, 145–161.

Toiviainen, P. (2007). Visualization of tonal content in the symbolic and audio domains.
Computing in Musicology, 15, 187–199.

Toiviainen, P., & Krumhansl, C. L. (2003). Measuring and modeling real-time responses to
music: The dynamics of tonality induction. Perception, 32, 741–766.

Toiviainen, P., Tervaniemi, M., Louhivuori, J., Saher, M., Huotilainen, M., & Naatanen, R.
(1998). Timbre similarity: Convergence of neural, behavioral, and computational ap­
proaches. Music Perception, 16, 223–241.

Tueting, P., Sutton, S., & Zubin, J. (1970). Quantitative evoked potential correlates of the
probability of events. Psychophysiology, 7, 385–394.

Warren, J. D., Uppenkamp, S., Patterson, R. D., & Griffiths, T. D. (2003). Separating pitch
chroma and pitch height in the human brain. Proceedings of the National Academy of
Sciences U S A, 100, 10038–10042.

Warrier, C. M., & Zatorre, R. J. (2002). Influence of tonal context and timbral variation on
perception of pitch. Perception and Psychophysics, 64, 198–207.

Warrier, C. M., & Zatorre, R. J. (2004). Right temporal cortex is critical for utilization of
melodic contextual cues in a pitch constancy task. Brain, 127, 1616–1625.

Watanabe, T., Yagishita, S., & Kikyo, H. (2008). Memory of music: Roles of right hip­
pocampus and left inferior frontal gyrus. NeuroImage, 39, 483–491.

Wiltermuth, S. S., & Heath, C. (2009). Synchrony and cooperation. Psychological Science,
20, 1–5.

Zarate, J. M., & Zatorre, R. J. (2008). Experience-dependent neural substrates involved in
vocal pitch regulation during singing. NeuroImage, 40, 1871–1887.

Zatorre, R. J. (1985). Discrimination and recognition of tonal melodies after unilateral


cerebral excisions. Neuropsychologia, 23, 31–41.

Zatorre, R. J., Belin, P., & Penhune, V. B. (2002). Structure and function of auditory cortex:
Music and speech. Trends in Cognitive Sciences, 6, 37–46.

Zatorre, R. J., Evans, A. C., & Meyer, E. (1994). Neural mechanisms underlying melodic
perception and memory for pitch. Journal of Neuroscience, 14, 1908–1919.

Zatorre, R. J., Halpern, A. R., Perry, D. W., Meyer, E., & Evans, A. C. (1996). Hearing in the
mind’s ear: A PET investigation of musical imagery and perception. Journal of Cognitive
Neuroscience, 8, 29–46.

Zatorre, R. J., Perry, D. W., Beckett, C. A., Westbury, C. F., & Evans, A. C. (1998). Function­
al anatomy of musical processing in listeners with absolute pitch and relative pitch. Pro­
ceedings of the National Academy of Sciences U S A, 95, 3172–3177.

Petr Janata

Petr Janata is Professor at the University of California, Davis, in the Psychology Department and Center for Mind and Brain.


Audition  
Josh H. McDermott
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn

Print Publication Date: Dec 2013 Subject: Psychology, Cognitive Neuroscience
Online Publication Date: Dec 2013 DOI: 10.1093/oxfordhb/9780199988693.013.0008

Abstract and Keywords

Audition is the process by which organisms use sound to derive information about the
world. This chapter aims to provide a bird’s-eye view of contemporary audition research,
spanning systems and cognitive neuroscience as well as cognitive science. The author
provides brief overviews of classic areas of research as well as some central themes and
advances from the past ten years. The chapter covers the sound transduction of the
cochlea, subcortical and cortical anatomical and functional organization of the auditory
system, amplitude modulation and its measurement, adaptive coding and plasticity, the
perception of sound sources (with a focus on the classic research areas of location, loud­
ness, and pitch), and auditory scene analysis (including sound segregation, streaming,
filling in, and reverberation perception). The chapter concludes with a discussion of
where hearing research seems to be headed at present.

Keywords: sound transduction, auditory system anatomy, modulation, adaptation, plasticity, pitch perception, au­
ditory scene analysis, sound segregation, streaming, reverberation

Introduction
From the cry of a baby to the rumble of a thunderclap, many events in the world produce
sound. Sound is created when matter in the world vibrates, and takes the form of pres­
sure waves that propagate through the air, containing clues about the environment
around us. Audition is the process by which organisms utilize these clues to derive infor­
mation about the world.

Audition is a crucial sense for most organisms. Humans, in particular, use sound to infer a
vast number of important things—what someone said, their emotional state when they
said it, and the whereabouts and nature of objects we cannot see, to name but a few.
When hearing is impaired (via congenital conditions, noise exposure, or aging), the conse­
quences can be devastating, such that a large industry is devoted to the design of pros­
thetic hearing devices.

Page 1 of 62
Audition

As listeners we are largely unaware of the computations underlying our auditory system’s
success, but they represent an impressive feat of engineering. The computational chal­
lenges of everyday audition are reflected in the gap between biological and machine hear­
ing systems—machine systems for interpreting sound currently fall far short of human
abilities. Understanding the basis of our success in perceiving sound will hopefully help
us to replicate it in machine systems and to restore it in biological auditory systems when
their function becomes impaired.

The goal of this chapter is to provide a bird’s-eye view of contemporary hearing research.
I provide brief overviews of classic areas of research as well as some central themes and
advances from the past ten years. The first section describes the sensory transduction of
the cochlea. The second section outlines subcortical and cortical functional organization.
The third section discusses modulation and its measurement by subcortical and
cortical regions of the auditory system, a key research focus of the past few decades. The
fourth section describes adaptive coding and plasticity, encompassing the relationship be­
tween sensory coding and the environment as well as its adaptation to task demands. The
fifth section discusses the perception of sound sources, focusing on location, loudness,
and pitch. The sixth section presents an overview of auditory scene analysis. I conclude
with a discussion of where hearing research is headed at present. Because other chapters
in this handbook are devoted to auditory attention, music, and speech, I will largely avoid
these topics.

The Problem
Just by listening, we can routinely apprehend many aspects of the world around us: the
size of a room in which we are talking, whether it is windy or raining outside, the speed
of someone approaching from behind, or whether the surface someone is walking on is
gravel or marble. These abilities are nontrivial because the properties of the world that
are of interest to a listener are generally not explicit in the acoustic input—they cannot be
easily recognized or discriminated using the sound waveform itself. The brain must
process the sound entering the ear to generate representations in which the properties of
interest are more evident. One of the main objectives of hearing science is to understand
the nature of these transformations and their instantiation in the brain.

Like other senses, audition is further complicated by a second challenge—that of scene
analysis. Although listeners are generally interested in the properties of individual ob­
jects or events, the ears are rarely presented with the sounds from isolated sources. In­
stead, the sound signal that reaches the ear is typically a mixture of sounds from different
sources. Such situations occur frequently in natural auditory environments, for example,
in social settings, where a single speaker of interest may be talking among many others,
and in music. From the mixture it receives as input, the brain must derive representa­
tions of the individual sound sources of interest, as are needed to understand someone’s
speech, recognize a melody, or otherwise guide behavior. Known as the “cocktail party
problem” (Cherry, 1953), or “auditory scene analysis” (Bregman, 1990), this problem has
analogues in other sensory modalities, but the auditory version presents some uniquely
challenging features.

Sound Measurement—The Peripheral Auditory System
The transformation of the raw acoustic input into representations that are useful for be­
havior is apparently instantiated over many brain areas and stages of neural processing,
spanning the cochlea, midbrain, thalamus, and cortex (Figure 8.1). The early stages of
this cascade are particularly intricate in the auditory system relative to other sensory sys­
tems, with many processing stations occurring before the cortex. The sensory organ of
the cochlea is itself a complex multicomponent system, whose investigation remains a
considerable challenge—the mechanical nature of the cochlea renders it much more diffi­
cult to probe (e.g., with electrodes) than the retina or olfactory epithelium, for instance.
Peripheral coding of sound is also unusual relative to that of other senses in its degree of
clinical relevance. Unlike vision, for which the most common forms of dysfunction are op­
tical in nature, and can be fixed with glasses, hearing impairment typically involves al­
tered peripheral neural processing, and its treatment has benefited from a detailed un­
derstanding of the processes that are altered. Much of hearing research has accordingly
been devoted to understanding the nature of the measurements made by the auditory pe­
riphery, and they provide a natural starting point for any discussion of how we hear.

Frequency Selectivity and the Cochlea

Hearing begins with the ear, where the sound pressure waveform carried by the air is
transduced into action potentials that are sent to the brain via the auditory nerve. Action
potentials are a binary code, but what is conveyed to the brain is far from simply a bina­
rized version of the incoming waveform. The transduction process is marked by several
distinctive signal transformations, the most obvious of which is produced by frequency
tuning.


Figure 8.1 The auditory system. Sound is transduced by the cochlea, processed by an interconnected set of subcortical areas, and then fed into the core regions of auditory cortex.

The coarse details of sound transduction are well understood (Figure 8.2). Sound induces
vibrations of the eardrum, which are transmitted via the bones of the middle ear to the
cochlea, the sensory organ of the auditory system. The cochlea is a coiled, fluid-filled
tube, containing several membranes that extend along its length and vibrate in response
to sound. Transduction of this mechanical vibration into an electrical signal occurs in the
organ of Corti, a mass of cells attached to the basilar membrane. The organ of Corti in
particular contains what are known as hair cells, named for the stereocilia that protrude
from them. The inner hair cells are responsible for sound transduction. When the
section of membrane on which they lie vibrates, the resulting deformation of the hair cell
body opens mechanically gated ion channels, inducing a voltage change within the cell.
Neurotransmitter release is triggered by the change in membrane potential, generating
action potentials in the auditory nerve fiber that the hair cell synapses with. This electri­
cal signal is carried by the auditory nerve fiber to the brain.

The frequency tuning of the transduction process occurs because different parts of the
basilar membrane vibrate in response to different frequencies. This is partly due to me­
chanical resonances—the thickness and stiffness of the membrane vary along its length,
producing a different resonant frequency at each point. However, the mechanical reso­
nances are actively enhanced via a feedback process, believed to be mediated largely by a
second set of cells, called the outer hair cells. The outer hair cells abut the inner hair
cells on the organ of Corti and serve to alter the basilar membrane vibration rather than
transduce it. They expand and contract in response to sound through mechanisms that
are only partially understood (Ashmore, 2008; Dallos, 2008; Hudspeth, 2008). Their mo­
tion alters the passive mechanics of the basilar membrane, amplifying the response to
low-intensity sounds and tightening the frequency tuning of the resonance. The upshot is
that high frequencies produce vibrations at the basal end of the cochlea (close to the
eardrum), whereas low frequencies produce vibrations at the apical end (far from the
eardrum), with frequencies in between stimulating intermediate regions. The auditory
nerve fibers that synapse onto individual inner hair cells are thus frequency tuned—they
fire action potentials in response to a local range of frequencies, collectively providing
the rest of the auditory system with a frequency decomposition of the incoming wave­
form. As a result of this behavior, the cochlea is often described functionally as a set of
bandpass filters—filters that each pass frequencies within a particular range, and elimi­
nate those outside of it.
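To make the filterbank description concrete, here is a minimal computational sketch (not a physiological model): each channel passes one band of frequencies, bandwidths grow with center frequency, and a pure tone excites only the channel whose band contains it. The center frequencies, the Q value of 4, and the crude FFT-masking filter are all arbitrary choices for illustration.

```python
# Minimal sketch of the cochlea-as-filterbank idea (illustrative only;
# real cochlear filters are asymmetric, nonlinear, and level dependent).
import numpy as np

def bandpass(signal, fs, lo_hz, hi_hz):
    """Crude bandpass filter: zero out spectral components outside [lo, hi]."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum[(freqs < lo_hz) | (freqs > hi_hz)] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

def channel_outputs(signal, fs, center_freqs, q=4.0):
    """RMS output of each channel; bandwidth grows with center frequency."""
    outputs = []
    for cf in center_freqs:
        bw = cf / q
        band = bandpass(signal, fs, cf - bw / 2, cf + bw / 2)
        outputs.append(np.sqrt(np.mean(band ** 2)))
    return np.array(outputs)

fs = 16000
t = np.arange(int(0.1 * fs)) / fs
tone = np.sin(2 * np.pi * 1000 * t)       # 1-kHz pure tone
cfs = [250, 500, 1000, 2000, 4000]        # arbitrary example center frequencies
rms = channel_outputs(tone, fs, cfs)      # energy concentrates in the 1-kHz channel
```

In auditory modeling, gammatone filterbanks are a more common approximation to cochlear filtering than the rectangular spectral masks used here.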

Figure 8.2 Structure of the peripheral auditory system. Top right, Diagram of ear. The eardrum trans­
mits sound to the cochlea via the middle ear bones
(ossicles). Top middle, Inner ear. The semicircular
canals abut the cochlea. Sound enters the cochlea
via the oval window and causes vibrations along the
basilar membrane, which runs through the middle of
the cochlea. Top left, Cross section of cochlea. The
organ of Corti, containing the hair cells that trans­
duce sound into electrical potentials, sits on top of
the basilar membrane. Bottom, Schematic of section
of organ of Corti. The shearing that occurs between
the basilar and tectorial membranes when they vi­
brate (in response to sound) causes the hair cell
stereocilia to deform. The deformation causes a
change in the membrane potential of the inner hair
cells, transmitted to the brain via afferent auditory
nerve fibers. The outer hair cells, which are three
times more numerous than the inner hair cells, serve
as a feedback system to alter the basilar membrane
motion, tightening its tuning and amplifying the re­
sponse to low amplitude sounds.


Figure 8.3 Frequency selectivity. A, Threshold tuning curves of auditory nerve fibers from a cat ear, plotting the level that was necessary to evoke a criterion increase in firing rate for a given frequency (Miller, Schilling, et al., 1997). B, The tonotopy of the cochlea. The position along the basilar membrane at which auditory nerve fibers synapse with a hair cell (determined by dye injections) is plotted vs. their best frequency (Liberman, 1982).

Both parts of this figure are courtesy of Eric Young, 2010, who replotted data from the original sources.

The frequency decomposition of the cochlea is conceptually similar to the Fourier trans­
form, but differs in the way that the frequency spectrum is decomposed. Whereas the
Fourier transform uses linearly spaced frequency bins, each separated by the same num­
ber of hertz, the tuning bandwidth of auditory nerve fibers increases with their preferred
frequency. This characteristic can be observed in Figure 8.3A, in which the frequency re­
sponse of a set of auditory nerve fibers is plotted on a logarithmic frequen­
cy scale. Although the lowest frequency fibers are broader on a log scale than the high-
frequency fibers, in absolute terms their bandwidths are much lower—several hundred
hertz instead of several thousand. The distribution of best frequency along the cochlea
follows a roughly logarithmic function, apparent in Figure 8.3B, which plots the best fre­
quency of a large set of nerve fibers against the distance along the cochlea of the hair cell
that they synapse with. These features of frequency selectivity are present in most biolog­
ical auditory systems. It is partly for this reason that a log scale is commonly used for fre­
quency.
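The growth of bandwidth with center frequency can be made concrete with the equivalent rectangular bandwidth (ERB) formula of Glasberg and Moore (1990), a standard summary of human auditory filter bandwidths; the sketch below simply evaluates it at a few arbitrarily chosen center frequencies.

```python
# Equivalent rectangular bandwidth (ERB) of human auditory filters,
# after Glasberg & Moore (1990): bandwidth grows with center frequency.
def erb_hz(center_freq_hz):
    """ERB in Hz at the given center frequency (Hz)."""
    return 24.7 * (4.37 * center_freq_hz / 1000.0 + 1.0)

# Low-frequency filters are a few tens of Hz wide; high-frequency
# filters are several hundred Hz wide, as in Figure 8.3A.
bandwidths = {cf: round(erb_hz(cf), 1) for cf in (250, 1000, 4000, 8000)}
```

On a log-frequency axis, however, the low-frequency filters appear broader, since their bandwidth is a larger fraction of their center frequency.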

Cochlear frequency selectivity has a host of perceptual consequences—our ability to de­
tect a particular frequency is limited largely by the signal-to-noise ratio of the cochlear fil­
ter centered on the frequency, for instance. There are many treatments of frequency se­
lectivity and perception (Moore, 2003); it is perhaps the most studied aspect of hearing.


Although the frequency tuning of the cochlea is uncontroversial, the teleological question
of why the cochlear transduction process is frequency-tuned remains less settled. How
does frequency tuning aid the brain’s task of recovering useful information about the
world from its acoustic input? Over the past two decades, a growing number of re­
searchers have endeavored to explain properties of sensory systems as optimal for the
task of encoding natural sensory stimuli, initially focusing on coding questions in vision,
and using notions of efficiency as the optimality criterion (Field, 1987; Olshausen & Field,
1996). Lewicki and colleagues have applied similar concepts to hearing, using algorithms
that derive efficient and sparse representations of sounds (Lewicki, 2002; Smith & Lewic­
ki, 2006), properties believed to be desirable of early sensory representations. They re­
port that for speech, or for combinations of environmental sounds and animal vo­
calizations, efficient representations for sound look much like the representation pro­
duced by auditory nerve fiber responses—sounds are represented with filters whose tun­
ing is localized in frequency. Interestingly, the resulting representations share the depen­
dence of bandwidth on frequency found in biological hearing—bandwidths increase with
frequency as they do in the ear. Moreover, representations derived in the same way for
“unnatural” sets of sounds, such as samples of white noise, do not exhibit frequency tun­
ing, indicating that the result is at least somewhat specific to the sorts of sounds com­
monly encountered in the world. These results suggest that frequency tuning provides an
efficient means to encode the sounds that were likely of importance when the auditory
system evolved, possibly explaining its ubiquitous presence in auditory systems. It re­
mains to be seen whether this framework can explain potential variation in frequency tun­
ing bandwidths across species (humans have recently been claimed to possess narrower
tuning than other species; Joris, Bergevin, et al., 2011; Shera, Guinan, et al., 2002), or the
broadening of frequency tuning with increasing sound intensity (Rhode, 1978), but it pro­
vides one means by which to understand the origins of peripheral auditory processing.

Amplitude Compression

A second salient transformation that occurs in the cochlea is that of amplitude compres­
sion, whereby the mechanical response of the cochlea to a soft sound (and thus the neur­
al response as well) is larger than would be expected given the response to a loud sound.
The response elicited by a sound is thus not proportional to the sound’s amplitude (as it
would be if the response were linear), but rather to a compressive nonlinear function of
amplitude. The dynamic range of the response to sound is thus “compressed” relative to
the dynamic range of the acoustic input. Whereas the range of audible sounds covers five
orders of magnitude, or 100 dB, the range of cochlear response covers only one or two or­
ders of magnitude (Ruggero, Rich, et al., 1997).

Compression appears to serve to map the range of amplitudes that the listener needs to
hear (i.e., those commonly encountered in the environment), onto the physical operating
range of the cochlea. Without compression, it would have to be the case that either
sounds low in level would be inaudible, or sounds high in level would be indiscriminable
(for they would fall outside the range that could elicit a response change). Compression
permits very soft sounds to produce a physical response that is (just barely) detectable,
while maintaining some discriminability of higher levels.

The compressive nonlinearity is often approximated as a power function with an exponent
of 0.3 or so. It is not obvious why the compressive nonlinearity should take the particular
form that it does. Many different functions could in principle serve to compress the out­
put response range. It remains to be seen whether compression can be explained in terms
of optimizing the encoding of the input, as has been proposed for frequency tuning (but
see Escabi, Miller, et al., 2003). Most machine hearing applications also utilize amplitude
compression before analyzing sound, however, and it is widely agreed to be useful to am­
plify low amplitudes relative to large when processing sound.
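As a short worked example using the approximate exponent of 0.3 mentioned above (illustrative, not a cochlear model), a power-law nonlinearity maps the 100-dB range of audible amplitudes onto roughly a 30-dB output range:

```python
# Power-law amplitude compression: output = input ** 0.3.
import math

def compress(amplitude, exponent=0.3):
    """Compressive nonlinearity applied to a (positive) amplitude."""
    return amplitude ** exponent

def range_db(low, high):
    """Dynamic range between two amplitudes, in decibels."""
    return 20.0 * math.log10(high / low)

soft, loud = 1.0, 1e5                     # amplitudes spanning 5 orders of magnitude
input_range = range_db(soft, loud)        # 100 dB of input range...
output_range = range_db(compress(soft), compress(loud))  # ...becomes 30 dB out
```

The exponent sets the trade-off directly: the output range in dB is simply the input range multiplied by the exponent.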

Amplitude compression was first noticed in measurements of the physical vibrations of
the basilar membrane (Rhode, 1971; Ruggero, 1992) but is also apparent in auditory
nerve fiber responses (Yates, 1990) and is believed to account for a number of perceptual
phenomena (Moore & Oxenham, 1998). The effects of compression are related to
“cochlear amplification,” in that compression results from response enhancement that is
limited to low-intensity sounds. Compression is achieved in part via the outer hair cells,
whose motility modifies the motion of the basilar membrane in response to sound (Rug­
gero & Rich, 1991). Outer hair cell function is frequently altered in hearing impairment,
one consequence of which is a loss of compression, something that hearing aids attempt
to mimic.

Neural Coding in the Auditory Nerve

Figure 8.4 Phase locking. A, A 200-Hz pure tone stimulus waveform aligned in time with several overlaid traces of an auditory nerve fiber’s response to the tone. Note that the spikes are not uniformly distributed in time, but rather occur at particular phases of the sinusoidal input. B, A measure of phase locking for each of a set of nerve fibers in response to different frequencies. Phase locking decreases at high frequencies.

Both parts of this figure are reprinted with permission from the original source: Javel & Mott, 1988.

Although frequency tuning and amplitude compression are at this point uncontroversial
and relatively well understood, several other empirical questions about peripheral audito­
ry coding remain unresolved. One important issue involves the means by which the auditory
nerve encodes frequency information. As a result of the frequency tuning of the audi­
tory nerve, the spike rate of a nerve fiber contains information about frequency (a large
firing rate indicates that the sound input contains frequencies near the center of the
range of the fiber’s tuning). Collectively, the firing rates of all nerve fibers could thus be
used to estimate the instantaneous spectrum of a sound. However, spike timings also car­
ry frequency information. At least for low frequencies, the spikes that are fired in re­
sponse to sound do not occur randomly, but rather tend to occur at the peak dis­
placements of the basilar membrane vibration. Because the motion of a particular section
of the membrane mirrors the bandpass-filtered sound waveform, the spikes occur at the
waveform peaks (Rose, Brugge, et al., 1967). If the input is a single frequency, spikes thus
occur at a fixed phase of the frequency cycle (Figure 8.4A). This behavior is known as
phase locking and produces spikes at regular intervals corresponding to the period of the
frequency. The spike timings thus carry information that could potentially augment or
supersede that conveyed by the rate of firing.
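Phase locking of the kind shown in Figure 8.4A is commonly quantified by vector strength: each spike time is mapped to a phase of the stimulus cycle, and the magnitude of the mean phase vector is computed (1 indicates perfect phase locking, 0 indicates none). Below is a minimal sketch; the spike trains are invented for illustration.

```python
# Vector strength: magnitude of the mean unit phase vector of spike
# times relative to the stimulus cycle (1 = perfect locking, 0 = none).
import numpy as np

def vector_strength(spike_times_s, freq_hz):
    phases = 2.0 * np.pi * freq_hz * np.asarray(spike_times_s)
    return float(np.abs(np.mean(np.exp(1j * phases))))

f = 200.0                                       # 200-Hz tone, as in Figure 8.4A
locked = np.arange(100) / f                     # one spike per cycle, same phase
rng = np.random.default_rng(0)
random_spikes = rng.uniform(0.0, 100 / f, 100)  # spikes unrelated to the cycle

vs_locked = vector_strength(locked, f)          # near 1
vs_random = vector_strength(random_spikes, f)   # near 0
```

Note that spike rate plays no role in this statistic, which is one reason timing-based codes can remain informative at levels where rate-based tuning broadens.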

Phase locking degrades in accuracy as frequency is increased (Figure 8.4B) due to limita­
tions in the temporal fidelity of the hair cell membrane potential (Palmer & Russell, 1986)
and is believed to be largely absent for frequencies above 4 kHz in most mammals, al­
though there is some variability across species (Johnson, 1980; Palmer & Russell, 1986).
The appeal of phase locking as a code for sound frequency is partly due to features of
rate-based frequency selectivity that are unappealing from an engineering standpoint. Al­
though frequency tuning in the auditory system (as measured by auditory nerve spike
rates or psychophysical masking experiments) is narrow at low stimulus levels, it broad­
ens considerably as the level is raised (Glasberg & Moore, 1990; Rhode, 1978). Phase
locking, by comparison, is robust to sound level—even though a nerve fiber responds to a
broad range of frequencies when the level is high, the time intervals between spikes con­
tinue to convey frequency-specific information, as the peaks in the bandpass-filtered
waveform tend to occur at integer multiples of the periods of the component frequencies.

Our ability to discriminate frequency is impressive, with thresholds on the order of 1 per­
cent (Moore, 1973), and there has been long-standing interest in whether this ability in
part depends on fine-grained spike timing information (Heinz, Colburn, et al., 2001). Al­
though phase locking remains uncharacterized in humans because of the unavailability of
human auditory nerve recordings, it is presumed to occur in much the same way as in
nonhuman auditory systems. Moreover, several psychophysical phenomena are consistent
with a role for phase locking in human hearing. For instance, frequency discrimination
becomes much poorer for frequencies above 4 kHz (Moore, 1973), roughly the point at
which phase locking declines in nonhuman animals. The fundamental frequency of the
highest note on a piano is also approximately 4 kHz; this is also the point above which
melodic intervals between pure tones (tones containing a single frequency) are much less
evident (Attneave & Olson, 1971; Demany & Semal, 1990). These findings provide some
circumstantial evidence that phase locking is important for deriving precise estimates of
frequency, but definitive evidence remains elusive. It remains possible that the perceptual
degradations at high frequencies reflect a lack of experience with such frequencies, or
their relative unimportance for typical behavioral judgments, rather than a physiological
limitation.

The upper limit of phase locking is also known to decrease markedly at each successive
stage of the auditory system (Wallace, Anderson, et al., 2007). By primary audito­
ry cortex, the upper cutoff is in the neighborhood of a few hundred hertz. It would thus
seem that the phase locking that occurs robustly in the auditory nerve would need to be
rapidly transformed into a spike rate code if it were to benefit processing throughout the
auditory system. Adding to the puzzle is the fact that frequency tuning is not thought to
be dramatically narrower at higher stages in the auditory system. Such tightening might
be expected if the frequency information provided by phase-locked spikes was trans­
formed to yield improved rate-based frequency tuning at subsequent stages (but see Bit­
terman, Mukamel, et al., 2008).

Organization of the Auditory System


Subcortical Pathways

The auditory nerve feeds into a cascade of interconnected subcortical regions that lead
up to the auditory cortex, as shown in Figure 8.1. The subcortical auditory pathways have
complex anatomy, only some of which is depicted in Figure 8.1. In contrast to the subcor­
tical pathways of the visual system, which are often argued to largely preserve the repre­
sentation generated in the retina, the subcortical auditory areas exhibit a panoply of in­
teresting response properties not found in the auditory nerve, many of which remain ac­
tive topics of investigation. Several subcortical regions will be referred to in the sections
that follow in the context of other types of acoustic measurements or perceptual func­
tions.

Feedback to the Cochlea

Like other sensory systems, the auditory system can be thought of as a processing cas­
cade, extending from the sensory receptors to cortical areas believed to mediate auditory-
based decisions. This “feedforward” view of processing underlies much auditory re­
search. As in other systems, however, feedback from later stages to earlier ones is ubiqui­
tous and substantial, and in the auditory system is perhaps even more pronounced than
elsewhere in the brain. Unlike the visual system, for instance, the auditory pathways con­
tain feedback extending all the way back to the sensory receptors. The function of much
of this feedback remains poorly understood, but one particular set of projections—the
cochlear efferent system—has been the subject of much discussion.

Efferent connections to the cochlea originate primarily from the superior olivary nucleus,
an area of the brainstem a few synapses removed from the cochlea (see Figure 8.1, al­
though the efferent pathways are not shown). The superior olive is divided into two sub­
regions, medial and lateral, and to first order, these give rise to two efferent projections:
one from the medial superior olive to the outer hair cells, called the medial olivocochlear
(MOC) efferents, and one from the lateral superior olive to the inner hair cells, called the
lateral olivocochlear (LOC) efferents (Elgoyhen & Fuchs, 2010). The MOC efferents have
been relatively well studied. Their activation (e.g., by electrical stimulation) is known to
reduce the basilar membrane response to low-intensity sounds, and causes the frequency
tuning of the response to broaden. This is probably because the MOC efferents inhibit the
outer hair cells, which are crucial to amplifying the response to low-intensity sounds and
to sharpening frequency tuning.

The MOC efferents may serve a protective function by reducing the response to loud
sounds (Rajan, 2000), but their most commonly proposed function is to enhance the re­
sponse to transient sounds in noise (Guinan, 2006). When the MOC fibers are severed, for
instance, performance on tasks involving discrimination of tones in noise is reduced (May
& McQuone, 1995). Noise-related MOC effects are proposed to derive from its influence
on adaptation, which when induced by background noise, reduces the detectability of
transient foreground sounds by decreasing the dynamic range of the auditory nerve’s re­
sponse. Because MOC activation reduces the response to ongoing sound, adaptation in­
duced by continuous background noise is reduced, thus enhancing the response to tran­
sient tones that are too brief to trigger the MOC feedback themselves (Kawase, Delgutte,
et al., 1993; Winslow & Sachs, 1987). Another interesting but controversial proposal is
that the MOC efferents play a role in auditory attention. One study, for instance, found
that patients whose vestibular nerve (containing the MOC fibers) had been severed were
better at detecting unexpected tones after the surgery, suggesting that selective attention
had been altered so as to prevent the focusing of resources on expected frequencies
(Scharf, Magnan, et al., 1997). See Guinan, 2006, for a recent review of these and other
ideas about MOC efferent function.

Less is known about the LOC efferents. One recent study found that destroying the LOC
efferents to one ear in mice caused binaural responses to become “unbalanced” (Darrow,
Maison, et al., 2006)—when sounds were presented binaurally at equal levels, responses
from the two ears that were equal under normal conditions were generally not equal fol­
lowing the surgical procedure. The suggestion was that the LOC efferents serve to regu­
late binaural responses so that interaural intensity differences, crucial to sound
localization (see below), can be accurately registered.


Tonotopy

Figure 8.5 Tonotopy. Best frequency of voxels in the human auditory cortex, measured with fMRI, plotted on the flattened cortical surface (Humphries, Liebenthal, et al., 2010). Note that the best frequency varies quasi-smoothly over the cortical surface and is suggestive of two maps that are approximately mirror images of each other.

Although many of the functional properties of subcortical and cortical neurons are dis­
tinct from what is found in auditory nerve responses, frequency tuning persists. Every
subcortical region contains frequency-tuned neurons, and neurons tend to be spatially or­
ganized to some extent according to their best frequency, forming “tonotopic” maps. This
organization is also evident in the cortex. Many cortical neurons have a preferred fre­
quency, although they are often less responsive to pure tones (relative to sounds with
more complex spectra) and often have broader tuning than neurons in peripheral stages
(Moshitch, Las, et al., 2006). Cortical frequency maps were one of the first reported find­
ings in single-unit neurophysiology studies of the auditory cortex in animals, and have
since been found using functional magnetic resonance imaging (fMRI) in humans
(Formisano, Kim, et al., 2003; Humphries, Liebenthal, et al., 2010; Talavage, Sereno, et
al., 2004) as well as monkeys (Petkov, Kayser, et al., 2006). Figure 8.5 shows an example
of a tonotopic map obtained in a human listener with fMRI. Although never formally
quantified, it seems that tonotopy is less robust than the retinotopy found in the visual
system (evident, e.g., in recent optical imaging studies; Bandyopadhyay, Shamma, et al.,
2010; Rothschild, Nelken, et al., 2010).

Although the presence of some degree of tonotopy in the cortex is beyond question, its
functional importance remains unclear. Frequency selectivity is not the end goal of the
auditory system, and it does not obviously bear much relevance to behavior, so it is un­
clear why tonotopy would be a dominant principle of organization throughout the auditory system. It may be that other principles of organization are in fact more prominent but
have yet to be discovered. At present, however, tonotopy remains a staple of textbooks
and review chapters such as this.

Functional Organization

Largely on grounds of anatomy and connectivity, mammalian auditory cortex is standardly divided into three sets of regions, shown in Figure 8.6: a core region receiving direct
input from the thalamus, a “belt” region surrounding it, and a “parabelt” region beyond
that (Kaas & Hackett, 2000; Sweet, Dorph-Petersen, et al., 2005). Within these areas,
tonotopy is often used to delineate distinct fields (a field is typically considered to contain
a single tonotopic map). The core region is divided in this way into areas A1, R (for ros­
tral), and RT (for rostrotemporal) in primates, with A1 and R receiving direct input from
the medial geniculate nucleus of the thalamus. There are also multiple belt areas (Petkov,
Kayser, et al., 2006), each receiving input from the core areas. Functional imaging re­
veals many additional areas that respond to sound in the awake primate, including parts
of parietal and frontal cortex (Poremba, Saunders, et al., 2003). There are some indica­
tions that the three core regions have different properties (Bendor & Wang, 2008), and
that stimulus selectivity increases in complexity from the core to surrounding areas
(Kikuchi, Horwitz, et al., 2010; Rauschecker & Tian, 2004; Tian & Rauschecker, 2004),
suggestive of a hierarchy of processing. However, at present, there is not a single widely
accepted framework for auditory cortical organization. Several principles of organization
have been proposed with varying degrees of empirical support; here, we review a few of
them.


Figure 8.6 Anatomy of auditory cortex. A, Lateral view of macaque cortex. The approximate location of the parabelt region is indicated with dashed orange lines. B, View of the brain from (A) after removal of the overlying parietal cortex. Approximate locations of the core (solid red line), belt (dashed yellow line), and parabelt (dashed orange line) regions are shown. AS, arcuate sulcus; CS, central sulcus; INS, insula; LS, lateral sulcus; STG, superior temporal gyrus; STS, superior temporal sulcus. C, Connectivity between core and belt regions. Solid lines with arrows denote dense connections; dashed lines with arrows denote less dense connections. RT, R, and A1 compose the core; all three subregions receive input from the thalamus. The areas surrounding the core make up the belt, and the two regions outlined with dashed lines make up the parabelt. The core has few direct connections with the parabelt or more distant cortical areas. AL, anterolateral; CL, caudolateral; CM, caudomedial; CPB, caudal parabelt; ML, middle lateral; MM, middle medial; RM, rostromedial; RPB, rostral parabelt; RT, rostrotemporal; RTM, medial rostrotemporal; RTL, lateral rostrotemporal.

All parts reprinted from original source: Kaas & Hackett, 2000.

Some of the proposed organizational principles clearly derive inspiration from the visual
system. For (p. 144) instance, selectivity for vocalizations and selectivity for spatial loca­
tion have been found to be partially segregated, each being most pronounced in a differ­
ent part of the lateral belt (Tian, Reser, et al., 2001; Woods, Lopez, et al., 2006). These re­
gions have thus been proposed to constitute the beginning of ventral “what” and dorsal
“where” pathways analogous to those in the visual system, perhaps culminating in the
same parts of the prefrontal cortex as the analogous visual pathways (Cohen, Russ, et al.,
2009; Romanski, Tian, et al., 1999). Functional imaging results in humans have also been
viewed as supportive of this framework (Alain, Arnott, et al., 2001; Warren, Zielinski, et al., 2002). Additional evidence for a “what/where” dissociation comes from a recent study
in which sound localization and temporal pattern discrimination in cats were selectively
impaired by reversibly deactivating different regions of nonprimary auditory cortex
(Lomber & Malhotra, 2008). However, other studies have found less evidence for segre­
gation of tuning properties in early auditory cortex (Bizley, Walker, et al., 2009). More­
over, the properties of the “what” stream remain relatively undefined (Recanzone, 2008);
at this point, it has been defined mainly by reduced selectivity to spatial location.

There have been further attempts to extend the characterization of a ventral auditory
pathway by testing for specialization for the analysis of particular categories of sounds,
analogous to what has been found in the visual system (Kanwisher, 2010). The most wide­
ly proposed specialization is for vocalizations. Using functional imaging, regions of the
anterior temporal lobe have been identified in both humans (Belin, Zatorre, et al., 2000)
and macaques (Petkov, Kayser, et al., 2008) that appear to be somewhat selectively re­
sponsive to vocalizations and that could be homologous across species. Evidence for re­
gions selective for other categories is less clear at present (Leaver & Rauschecker, 2010),
although see the section below on pitch perception for a discussion of a cortical region
putatively involved in pitch processing.

Another proposal is that the left and right auditory cortices are specialized for different
aspects of signal processing, with the left optimized for temporal resolution and the right
for frequency resolution (Zatorre, Belin, et al., 2002). This idea is motivated by the uncer­
tainty principle of time–frequency analysis, whereby resolution cannot simultaneously be
optimized for both time and frequency. The evidence for hemispheric differences comes
mainly from functional imaging studies that manipulate spectral and temporal stimulus
characteristics (Samson, Zeffiro, et al., 2011; Zatorre & Belin, 2001) and neuropsycholo­
gy studies that find pitch perception deficits associated with right temporal lesions (John­
srude, Penhune, et al., 2000; Zatorre, 1985). (p. 145) A related alternative idea is that the
two hemispheres are specialized to analyze distinct timescales, with the left hemisphere
more responsive to short-scale temporal variation (e.g. tens of milliseconds) and the right
hemisphere more responsive to long-scale variation (e.g. hundreds of milliseconds)
(Boemio, Fromm, et al., 2005; Poeppel, 2003).


III. Sound Measurement—Modulation


Amplitude Modulation and the Envelope

Figure 8.7 Amplitude modulation. A, The output of a bandpass filter (centered at 340 Hz) for a recording of speech, plotted in blue, with its envelope plotted in red. B, Close-up of part of A (corresponding to the black rectangle in A). Note that the filtered sound signal (like the unfiltered signal) fluctuates around zero at a high rate, whereas the envelope is positive-valued and fluctuates more slowly. C, Spectrogram of the same speech signal. The spectrogram is formed from the envelopes (one of which is plotted in A) of a set of filters mimicking the frequency tuning of the cochlea, and is produced by plotting each envelope horizontally in grayscale. D, Power spectra of the filtered speech signal in A and its envelope. Note that the envelope contains power only at low frequencies (modulation frequencies), whereas the filtered signal has power at a restricted range of high frequencies (acoustic frequencies).

The cochlea decomposes the acoustic input into frequency channels, but much of the im­
portant information in sound is conveyed by the way that the output of these frequency
channels is modulated in amplitude. Consider Figure 8.7A, which displays in blue the out­
put of one such frequency channel for a short segment of a speech signal. The blue wave­
form oscillates at a rapid rate, but its amplitude waxes and wanes at a much lower rate
(evident in the close-up view of Figure 8.7B). This waxing and waning is known as ampli­
tude modulation and is a common feature of many modes of sound production (e.g., vocal
articulation). The amplitude is captured by what is known as the envelope of a signal,
shown in red for the signal of Figures 8.7A and B. Often, the envelopes of each cochlear
channel are stacked vertically and displayed as an image called a spectrogram, providing
a depiction of how the sound energy in each frequency channel varies over time (Figure
8.7C). Figure 8.7D shows the spectra of the signal and envelope shown in Figures 8.7A
and B. The signal spectrum is bandpass (because it is the output of a bandpass filter),
with energy at frequencies in the audible range. The envelope spectrum, in contrast, is low-pass, with most of the power below 10 Hz, corresponding to the slow rate at which
the envelope changes. The frequencies that compose the envelope are typically termed
modulation frequencies, distinct from the acoustic frequencies that compose the signal
that the envelope is derived from.

The information carried by a cochlear channel can thus be viewed as the product of “fine
structure”—a (p. 146) waveform that varies rapidly, at a rate close to the center frequency
of the channel—and an amplitude envelope that varies more slowly (Rosen, 1992). The
envelope and fine structure have a clear relation to common signal processing formula­
tions in which the output of a bandpass filter is viewed as a single sinusoid varying in am­
plitude and frequency—the envelope describes the amplitude variation, and the fine
structure describes the frequency variation. The envelope of a frequency channel is also
straightforward to extract from the auditory nerve—it can be obtained by low-pass filter­
ing a spike train (because the amplitude changes reflected in the envelope are relatively
slow). Despite the fact that envelope and fine structure are not completely independent
(Ghitza, 2001), there has been much interest in the past decade in distinguishing their
roles in different aspects of hearing (Smith, Delgutte, et al., 2002) and its impairment
(Lorenzi, Gilbert, et al., 2006).
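The envelope–fine structure decomposition described above can be sketched numerically. The fragment below (Python with NumPy) extracts an envelope as the magnitude of the analytic signal; the 4 Hz modulation rate and the synthetic carrier are illustrative choices of this sketch, not values from the speech example in Figure 8.7 (only the 340 Hz channel center is borrowed from that figure's caption).

```python
import numpy as np

def analytic_envelope(x):
    """Envelope as the magnitude of the analytic signal (an FFT-based
    Hilbert transform, equivalent to scipy.signal.hilbert)."""
    n = len(x)
    spec = np.fft.fft(x)
    mask = np.zeros(n)
    mask[0] = 1.0
    if n % 2 == 0:
        mask[n // 2] = 1.0
        mask[1:n // 2] = 2.0
    else:
        mask[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spec * mask))

fs = 16000
t = np.arange(fs) / fs                              # 1 second of signal
env_true = 1.0 + 0.8 * np.sin(2 * np.pi * 4 * t)    # slow (4 Hz) amplitude modulation
x = env_true * np.sin(2 * np.pi * 340 * t)          # carrier near the channel of Figure 8.7A

est = analytic_envelope(x)

# The envelope spectrum is low-pass: its dominant component is the 4 Hz modulation rate
env_spec = np.abs(np.fft.rfft(est - est.mean()))
peak_hz = int(np.argmax(env_spec))                  # 1 s of signal, so bin index = Hz
```

The recovered envelope tracks the slow modulation while discarding the rapid fine structure, mirroring the distinction drawn in the text between modulation frequencies and acoustic frequencies.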

Perhaps surprisingly, the temporal information contained in amplitude envelopes can be sufficient for speech comprehension even when spectral information is severely limited.
In a classic paper, Shannon and colleagues isolated the information contained in the am­
plitude envelopes of speech signals with a stimulus known as noise-vocoded speech
(Shannon, Zeng, et al., 1995). Noise-vocoded speech is generated by filtering a speech
signal and a noise signal into frequency bands, multiplying the frequency bands of the
noise by the envelopes of the speech, and then summing the modified noise bands to syn­
thesize a new sound signal. By using a small number of broad frequency bands, spectral
information can be greatly reduced, leaving amplitude variation over time (albeit smeared
across a broader than normal range of frequencies) as the primary signal cue. Examples
are shown in Figure 8.8 for two, four, and eight bands. Shannon and colleagues found that
the resulting stimulus was intelligible even when just a few bands were used (i.e., with
much broader frequency tuning than is present in the cochlea), indicating that the tempo­
ral modulation of the envelopes contains much information about speech content.
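The vocoding procedure just described can be sketched in a few lines of Python (NumPy only). The brick-wall FFT filters, the rectify-and-smooth envelope extractor, the band edges, and the gated tone standing in for speech are all simplifying assumptions of this sketch, not the filters or stimuli used by Shannon and colleagues.

```python
import numpy as np

def bandpass(x, lo, hi, fs):
    """Brick-wall bandpass filter in the frequency domain (a simplification)."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    spec[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spec, len(x))

def envelope(x, fs, cutoff=30.0):
    """Envelope by rectification followed by low-pass filtering."""
    spec = np.fft.rfft(np.abs(x))
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    spec[freqs > cutoff] = 0.0
    return np.fft.irfft(spec, len(x))

def noise_vocode(speech, fs, edges):
    """Filter speech and noise into bands, impose the speech band envelopes
    on the noise bands, and sum (the procedure described in the text)."""
    noise = np.random.default_rng(0).standard_normal(len(speech))
    out = np.zeros(len(speech))
    for lo, hi in zip(edges[:-1], edges[1:]):
        out += envelope(bandpass(speech, lo, hi, fs), fs) * bandpass(noise, lo, hi, fs)
    return out

fs = 16000
t = np.arange(fs) / fs
# Stand-in for speech: a 500 Hz tone gated on and off at a syllable-like 3 Hz rate
speech = (np.sin(2 * np.pi * 3 * t) > 0) * np.sin(2 * np.pi * 500 * t)
vocoded = noise_vocode(speech, fs, edges=[100, 1000, 4000])  # two broad bands
```

The output is noise whose energy in each band rises and falls with the corresponding band of the input; with real speech recordings and a handful of bands, this is the stimulus whose intelligibility the original study demonstrated.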

Modulation Tuning

Motivated by its perceptual importance, amplitude modulation has been proposed to be analyzed by dedicated banks of filters operating on the envelopes of cochlear filter out­
puts rather than the sound waveform itself (Dau, Kollmeier, et al., 1997). Early evidence
for such a notion came from masking and adaptation experiments, which found that the
detection of a modulated signal was impaired by a masker or adapting stimulus modulat­
ed at a similar frequency (Bacon & Grantham, 1989; Houtgast, 1989; Tansley & Suffield,
1983). There is now considerable evidence from neurophysiology that single neurons in
the midbrain, thalamus, and cortex exhibit some degree of tuning to modulation frequen­
cy (Depireux, Simon, et al., 2001; Joris, Schreiner, et al., 2004; Miller, Escabi, et al., 2001;

Page 17 of 62
Audition

Rodriguez, Chen, et al., 2010; Schreiner & Urbas, 1986, 1988; Woolley, Fremouw, et al.,
2005), loosely consistent with the idea of a modulation filter bank (Figure 8.9A). Because
such filters are typically conceived to operate on the envelope of a particular cochlear
channel, they are tuned both in acoustic frequency (courtesy of the cochlea) and modula­
tion frequency.

Neurophysiological studies in nonhuman animals (Schreiner & Urbas, 1986, 1988) and
neuroimaging results in humans (Boemio, Fromm, et al., 2005; Giraud, Lorenzi, et al.,
2000; Schonwiesner & Zatorre, 2009) have generally found that the auditory cortex re­
sponds preferentially to low modulation frequencies (in the range of 4–8 Hz), whereas
subcortical structures prefer higher rates (up to 100–200 Hz), with preferred modulation
frequency generally decreasing up the auditory pathway. Based on this, it is intriguing to
speculate that successive stages of the auditory system might process structure at pro­
gressively longer (slower) timescales, analogous to the progressive increase in receptive
field size that occurs in the visual system from V1 to inferotemporal cortex (Lerner, Hon­
ey, et al., 2011). Within the cortex, however, no hierarchy is clearly evident as of yet, at
least in the response to simple patterns of modulation (Boemio, Fromm, et al., 2005; Gi­
raud, Lorenzi, et al., 2000). Moreover, there is considerable variation within each stage of
the pathway in the preferred modulation frequency of individual neurons (Miller, Escabi,
et al., 2001; Rodriguez, Chen, et al., 2010). There are several reports of topographic orga­
nization for modulation frequency in the inferior colliculus, in which a gradient of pre­
ferred modulation frequency is observed orthogonal to the tonotopic gradient of pre­
ferred acoustic frequency (Baumann, Griffiths, et al., 2011; Langner, Sams, et al., 1997).
Whether there is topographic organization in the cortex remains unclear (Nelken, Bizley,
et al., 2008).


Figure 8.8 Noise-vocoded speech. A, Spectrogram of a speech utterance, generated as in Figure 8.7C. B–D, Spectrograms of noise-vocoded versions of the utterance from A, generated with eight (B), four (C), or two (D) channels. To generate the noise-vocoded speech, the amplitude envelope of the original speech signal was first measured in each of the frequency bands in B, C, and D. A white noise signal was then filtered into these same bands, and the noise bands were multiplied by the corresponding speech envelopes. These modulated noise bands were then summed to generate a new sound signal. It is visually apparent that the sounds in parts B to D are spectrally coarser versions of the original utterance. Good speech intelligibility is usually obtained with only four channels, indicating that patterns of amplitude modulation can support speech recognition in the absence of fine spectral detail.

Modulation tuning in single neurons is often studied by measuring spectrotemporal receptive fields (STRFs) (Depireux, Simon, et al., 2001), (p. 147) conventionally estimated
using techniques such as spike-triggered averaging. To compute an STRF, neuronal re­
sponses to a long, stochastically varying stimulus are recorded, after which the stimulus
spectrogram segments preceding each spike are averaged to yield the STRF—the stimu­
lus, described in terms of acoustic frequency content over time, that on average preceded
a spike. In Figure 8.9B, for instance, the STRF consists of a decrease in power followed
by an increase in power in the range of 10 kHz; the neuron would thus be likely to re­
spond well to a rapidly modulated 10 kHz tone, and less so to a tone whose amplitude
was constant. This STRF can be viewed as a filter that passes modulations in a certain
range of rates, that is, modulation frequencies. Note, however, that it is also tuned in
acoustic frequency (the dimension on the y-axis), responding only to modulations of fairly
high acoustic frequencies.
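The spike-triggered averaging procedure can be illustrated with a toy simulation (Python/NumPy). A model neuron is driven by a single bin of a white-noise "spectrogram" and spikes above a threshold; averaging the stimulus history preceding each spike recovers the ground-truth STRF. All dimensions, thresholds, and the threshold nonlinearity are illustrative assumptions, not parameters from the cited studies.

```python
import numpy as np

rng = np.random.default_rng(0)
n_freq, n_time, n_lags = 8, 20000, 5
stim = rng.standard_normal((n_freq, n_time))   # white-noise "spectrogram" (freq x time)

# Ground-truth STRF: the model neuron is driven by channel 6, two time steps in the past
strf_true = np.zeros((n_freq, n_lags))
strf_true[6, 2] = 1.0

# Linear drive (convolution of the STRF with the spectrogram) followed by a threshold
drive = np.zeros(n_time)
for tau in range(n_lags):
    drive[n_lags:] += strf_true[:, tau] @ stim[:, n_lags - tau:n_time - tau]
spikes = np.flatnonzero(drive > 2.0)           # spike times

# Spike-triggered average: mean stimulus history preceding each spike
sta = np.zeros((n_freq, n_lags))
for t in spikes:
    for tau in range(n_lags):
        sta[:, tau] += stim[:, t - tau]
sta /= len(spikes)
```

Because the stimulus is spectrotemporally white, the average stimulus preceding a spike is proportional to the linear filter, so the largest entry of `sta` falls at the (channel, lag) position of the true STRF.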


Figure 8.9 Modulation tuning. A, Example of temporal modulation tuning curves for neurons in the medial geniculate nucleus of the thalamus (Miller, Escabi, et al., 2002). B, Example of the spectrotemporal receptive field (STRF) from a thalamic neuron (Miller, Escabi, et al., 2002). Note that the modulation in the STRF is predominantly along the temporal dimension, and that this neuron would thus be sensitive primarily to temporal modulation. C, Example of STRFs from cortical neurons (Mesgarani, David, et al., 2008). Note that the STRFs feature spectral modulation in addition to temporal modulation, and as such are selective for more complex acoustic features. Cortical neurons typically have longer latencies than subcortical neurons, but this is not evident in the STRFs, probably because of nonlinearities in the cortical neurons that produce small artifacts in the STRFs (Stephen David, personal communication).

Figure parts are taken from the original sources.

The STRF approximates a neuron’s output as a linear function of the cochlear input—the
result of convolving the spectrogram of the acoustic input with the STRF. However, it is
clear that linear models are inadequate to explain neuronal responses (Christianson, Sa­
hani, et al., 2008; Machens, Wehr, et al., 2004; Rotman, Bar Yosef, et al., 2001; Theunis­
sen, Sen, et al., 2000). Understanding the nonlinear contributions is an important direc­
tion (p. 148) of future research (Ahrens, Linden, et al., 2008; David, Mesgarani, et al.,
2009), as neuronal nonlinearities likely play critical computational roles, but at present
much analysis is restricted to linear receptive field estimates. There are established
methods for computing STRFs, and they exhibit many interesting properties even though
they are clearly not the whole story.

Modulation tuning functions (e.g., those shown in Figure 8.9A) can be obtained via the
Fourier transform of the STRF. Temporal modulation tuning is commonly observed, as
previously discussed, but some tuning is normally also present for spectral modulation—variation in power that occurs along the frequency axis. Spectral modulation is often evi­
dent as well in spectrograms of speech (e.g., Figure 8.7C) and animal vocalizations. Mod­
ulation results both from individual frequency components and from formants—the broad
spectral peaks that are present for vowel sounds due to vocal tract resonances. Tuning to
spectral modulation is generally less pronounced than to amplitude modulation, especial­
ly subcortically (Miller, Escabi, et al., 2001), but is an important feature of cortical re­
sponses (Barbour & Wang, 2003; Mesgarani, David, et al., 2008). Examples of cortical
STRFs with spectral modulation sensitivity are shown in Figure 8.9C.

(p. 149) IV. Adaptive Coding and Plasticity


Because the auditory system evolved to enable behavior in natural auditory environ­
ments, it is likely to be adapted for the representation of naturally occurring sounds. Nat­
ural sounds thus in principle should provide hearing researchers with clues about the
structure and function of the auditory system (Attias & Schreiner, 1997). In recent years
there has been increasing interest in the use of natural sounds as experimental stimuli
and in computational analyses of the relation between auditory representation and the
environment. Most of the insights gained thus far from this approach are “postdictive”—
they offer explanations of previously observed phenomena rather than revealing previous­
ly unforeseen mechanisms. For instance, we described earlier the attempts to explain
cochlear frequency selectivity as optimal for encoding natural sounds (Lewicki, 2002;
Smith & Lewicki, 2006).

The efficient coding hypothesis has also been proposed to apply to modulation tuning in
the inferior colliculus. Modulation tuning bandwidth tends to increase with preferred
modulation frequency (Rodriguez, Chen, et al., 2010), as would be predicted if the low-
pass modulation spectra of most natural sounds (Attias & Schreiner, 1997; McDermott,
Wrobleski, et al., 2011; Singh & Theunissen, 2003) were to be divided into channels con­
veying equal power. Inferior colliculus neurons have also been found to convey more in­
formation about sounds whose amplitude distribution follows that of natural sounds
rather than that of white noise (Escabi, Miller, et al., 2003). Along the same lines, studies
of STRFs in the bird auditory system indicate that neurons are tuned to the properties of
bird song and other natural sounds, maximizing discriminability of behaviorally important
sounds (Hsu, Woolley, et al., 2004; Woolley, Fremouw, et al., 2005). Similar arguments
have been made about the coding of binaural cues to sound localization (Harper &
McAlpine, 2004).

Other strands of research have explored whether the auditory system might further adapt
to the environment by changing its coding properties in response to changing environ­
mental statistics, so as to optimally represent the current environment. Following on re­
search showing that the visual system adapts to local contrast statistics (Fairhall, Lewen,
et al., 2001), numerous groups have reported evidence for neural adaptation in the audi­
tory system—responses to a fixed stimulus that vary depending on the immediate history
of stimulation (Ulanovsky, Las, et al., 2003; Kvale & Schreiner, 2004). In some cases, it can be shown that this adaptation increases information transmission. For instance, the
“tuning” of neurons in the inferior colliculus to sound intensity (i.e., the function relating
intensity to firing rate) depends on the mean and variance of the local intensity distribu­
tion (Dean, Harper, et al., 2005). Qualitatively, the rate–intensity curves shift so that the
point of maximum slope (around which neural discrimination of intensity is best) is closer
to the most commonly occurring intensity. Quantitatively, this behavior results in in­
creased information transmission about stimulus level.
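The logic of this rate–intensity adaptation can be sketched with a toy model (Python/NumPy): a sigmoidal rate–intensity function whose midpoint shifts to the prevailing mean level places its steepest point, where intensity discrimination is best, at the most commonly occurring intensity. The sigmoid parameters and the 70 dB environment are illustrative choices, not fits to the data of Dean et al.

```python
import numpy as np

def firing_rate(level_db, midpoint_db, slope=0.3, rmax=100.0):
    """Toy sigmoidal rate-intensity function (spikes/s); parameters are illustrative."""
    return rmax / (1.0 + np.exp(-slope * (level_db - midpoint_db)))

def slope_at(level_db, midpoint_db, eps=0.01):
    """Local slope of the rate-intensity curve; discrimination is best where it is steepest."""
    return (firing_rate(level_db + eps, midpoint_db)
            - firing_rate(level_db - eps, midpoint_db)) / (2 * eps)

levels = np.random.default_rng(3).normal(70.0, 5.0, 5000)  # a loud environment, mean 70 dB

fixed_mid = 40.0               # non-adapting curve centered at a default level
adapted_mid = levels.mean()    # adaptation shifts the midpoint to the local mean

gain_fixed = slope_at(70.0, fixed_mid)      # shallow: 70 dB lies on the saturated tail
gain_adapted = slope_at(70.0, adapted_mid)  # steep: the maximum-slope point now sits near 70 dB
```

With the midpoint far from the prevailing level, the curve is saturated there and small level changes barely alter the firing rate; after the shift, the same changes produce large rate changes, which is the qualitative behavior the recordings exhibit.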

Some researchers have recently taken things a step further, showing that auditory re­
sponses are dependent not just on the stimulus history but also on the task a listener is
performing. Fritz and colleagues found that the STRFs measured for neurons in the pri­
mary auditory cortex of awake ferrets change depending on whether the animals are per­
forming a task (Fritz, Shamma, et al., 2003), and that the nature of the change depends
on the task (Fritz, Elhilali, et al., 2005). For instance, STRF changes serve to accentuate
the frequency of a tone being detected, or to enhance discrimination of a target tone from
a reference. These changes are mirrored in sound-evoked responses in the prefrontal cor­
tex (Fritz, David, et al., 2010), which may drive the changes that occur in auditory cortex
during behavior. In some cases the STRF changes persist long after the animals are fin­
ished performing the task, and as such may play a role in sensory memory and perceptual
learning.

Perhaps surprisingly, long-term plasticity appears to occur as early as the brainstem,


where recent evidence in humans suggests considerable experience-dependent variation
across individuals. The data in question derive from an evoked electrical potential known
as the auditory brainstem response (ABR) (Skoe & Kraus, 2010). The ABR is recorded at
the scalp but is believed to originate in the brainstem. It often mirrors properties of the
stimulus, such that its power spectrum, for instance, often resembles that of the acoustic
input. The extent to which the ABR preserves the stimulus can thus be interpreted as a
measure of processing integrity. Interestingly, the ABR more accurately tracks stimulus
frequency for musician listeners than nonmusicians (Wong, Skoe, et al., 2007). This could
in principle reflect innate differences in auditory ability that predispose listeners to be­
come musicians or not, but it could also reflect the substantial differences in auditory ex­
perience between the two groups. Consistent (p. 150) with the latter notion, 10 hours of
training on a pitch discrimination task is sufficient to improve the fidelity of the ABR re­
sponse to frequency, providing clear evidence of experience-dependent plasticity
(Carcagno & Plack, 2011). Aspects of the ABR are also altered in listeners with reading
problems (Banai, Hornickel, et al., 2009). This line of research suggests that potentially
important individual differences are present at early stages of the auditory system, and
that these differences are in part the result of plasticity.

V. Sound Source Perception


Ultimately, we wish to understand not only what acoustic measurements are made by the
auditory system, as were characterized in the previous sections, but also how these measurements give rise to perception—what we hear when we listen to sound. Following Helmholtz, we might suppose that the purpose of audition is to infer something about the
events in the world that produce sound. We can often identify sound sources with a ver­
bal label, for instance, and realize that we heard a finger snap, a flock of birds, or con­
struction noise. Even if we cannot determine the object that caused the sound, we may
nonetheless know something about what happened: that something fell onto a hard floor,
or into water (Gaver, 1993). Despite the richness of these aspects of auditory recognition,
remarkably little is known about them at present (speech recognition stands alone as an
exception), mainly because they are rarely studied (but see Gygi, Kidd, et al., 2004; Lutfi,
2008; McDermott & Simoncelli, 2011).

Perhaps because they are more easily controlled and manipulated, researchers have been
more inclined to instead study the perception of isolated properties of sounds or their
sources. Much research has concentrated in particular on three well-known properties of
sound: spatial location, pitch, and loudness. This focus is in some sense unfortunate be­
cause auditory perception is much richer than the hegemony of these three attributes in
hearing science would indicate. However, their study has nonetheless given rise to fruit­
ful lines of research that have yielded many useful insights about hearing more generally.

Localization

Localization is less precise in hearing than in vision but is nonetheless of great value, be­
cause sound enables us to localize objects that we may not be able to see. Human ob­
servers can judge the location of a source to within a few degrees if conditions are opti­
mal. The processes by which this occurs are among the best understood in hearing.

Spatial location is not made explicit on the cochlea, which provides a map of frequency
rather than of space, and instead must be derived from three primary sources of informa­
tion. Two of these are binaural, resulting from differences in the acoustic input to the two
ears. Due to the difference in path length from the source to the ears, and to the acoustic
shadowing effect of the head, sounds to one side of the vertical meridian reach the two
ears at different times and with different intensities. These interaural time and level dif­
ferences vary with direction and thus provide a cue to a sound source’s location. Binaural
cues are primarily useful for deriving the location of a sound in the horizontal plane, be­
cause changes in elevation do not change interaural time or intensity differences much.
To localize sounds in the vertical dimension, or to distinguish sounds coming from in front
of the head from those from in back, listeners rely on a third source of information: the
filtering of sounds by the body and ears. This filtering is direction specific, such that a
spectral analysis can reveal peaks and valleys in the frequency spectrum that are signa­
tures of location in the vertical dimension (Figure 8.10; discussed further below).


Figure 8.10 Head-related transfer function (HRTF). Example HRTF for the left ear of one human listener. The gray level represents the amount by which a frequency originating at a particular elevation is attenuated or amplified by the torso, head, and ear of the listener. Sounds are filtered differently depending on their elevation, and the spectrum that is registered by the cochlea thus provides a localization cue. Note that most of the variation in elevation-dependent filtering occurs at high frequencies (above 4 kHz).

Figure is reprinted with permission from original source: Zahorik, Bangayan, et al., 2006.

Interaural time differences (ITDs) are typically a fraction of a millisecond, and just-notice­
able ITDs (which determine spatial acuity) can be as low as 10 microseconds (Klump &
Eady, 1956). This is striking given that neural refractory periods (which determine the
minimal interspike interval for a single neuron) are on the order of a millisecond, which
one might think would put a limit on the temporal resolution of neural representations.
Typical interaural level differences (ILDs) can be as large as 20 dB, with a just-noticeable
difference of about 1 dB. ILDs result from the acoustic shadow cast by the head. To first
order, ILDs are more pronounced for high frequencies because low frequencies are less
affected by the acoustic shadow (because their wavelengths are comparable to the dimen­
sions of the head). ITDs, in contrast, support localization most effectively at low frequen­
cies, when the time difference between individual cycles of sinusoidal sound components
can be detected via phase-locked spikes from the two ears (phase locking, as we dis­
cussed earlier, degrades at high frequencies). That said, ITDs between the envelopes of
high-frequency sounds can also produce percepts of localization. The classic “duplex”
view that localization is determined by either ILDs or ITDs, depending (p. 151) on the fre­
quency (Rayleigh, 1907), is thus not fully appropriate for realistic natural sounds, which
in general produce perceptible ITDs across the spectrum. See Middlebrooks and Green
(1991), for a review of much of the classic behavioral work on sound localization.
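The use of interaural timing can be illustrated with a minimal cross-correlation sketch (Python/NumPy). The 300-microsecond delay, sampling rate, and noise source are arbitrary choices; real ITD extraction in the brainstem is performed by coincidence-detecting neurons rather than an explicit correlation, but the computation is analogous.

```python
import numpy as np

fs = 96000                                   # high rate, to resolve ~10 microsecond ITDs
rng = np.random.default_rng(1)
src = rng.standard_normal(fs // 10)          # 100 ms broadband source

itd_true = 300e-6                            # source off to one side: right ear lags by 300 us
d = int(round(itd_true * fs))                # delay in samples (~29)
left = src
right = np.concatenate([np.zeros(d), src[:-d]])

# Estimate the ITD as the lag that maximizes the interaural cross-correlation,
# the computation a bank of coincidence detectors is thought to approximate
xcorr = np.correlate(right, left, mode="full")
lag = int(np.argmax(xcorr)) - (len(left) - 1)   # samples by which the right ear lags
itd_est = lag / fs
```

Note that even at this high sampling rate, one sample corresponds to roughly 10 microseconds, which is on the order of the smallest ITDs listeners can detect; the auditory system achieves this resolution with phase-locked spikes rather than dense sampling.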

The binaural cues to sound location are extracted in the superior olive, a subcortical re­
gion where inputs from the two ears are combined. In most animals there appears to be
an elegant segregation of function, with ITDs being extracted in the medial superior olive
(MSO) and ILDs being extracted in the lateral superior olive (LSO). In both cases, accu­
rate coding of interaural differences is made possible by neural signaling with unusually
high temporal precision. This precision is needed to encode both sub-millisecond ITDs
and ILDs of brief transient events, for which the inputs from the ears must be aligned in
time. Brain structures subsequent to the superior olive largely inherit its ILD and ITD sensitivity. See Yin and Kuwada, 2010, for a recent review of the physiology of binaural lo­
calization.

Binaural cues are of little use in distinguishing sounds at different locations on the verti­
cal dimension (relative to the head), or in distinguishing front from back, because interau­
ral time and level differences are largely unaffected by changes across these locations.
Instead, listeners rely on spectral cues provided by the filtering of a sound by the torso,
head, and ears of a listener. The filtering results from the reflection and absorption of
sound by the surfaces of a listener’s body, with sound from different directions producing
different patterns of reflection and thus different patterns of filtering. The effect of these
interactions on the sound that reaches the eardrum can be described by a linear filter
known as the head-related transfer function (HRTF). The overall effect is that of amplify­
ing some frequencies while attenuating others. A broadband sound entering the ear will
thus be endowed with peaks and valleys in its frequency spectrum (see Figure 8.10).

Compelling sound localization can be perceived when these peaks and valleys are artifi­
cially induced. The effect of the filtering is obviously confounded with the spectrum of the
unfiltered sound source, and the brain must make some assumptions about the source
spectrum. When these assumptions are violated, as with narrowband sounds whose spec­
tral energy occurs at a peak in the HRTF of a listener, sounds are mislocalized (Middle­
brooks, 1992). For broadband sounds, however, HRTF filtering produces signatures that
are sufficiently distinct as to support localization in the vertical dimension to within 5 de­
grees or so in some cases, although some locations are more accurately perceived than
others (Makous & Middlebrooks, 1990; Wightman & Kistler, 1989).

The bulk of the filtering occurs in the outer ear (the pinna), the folds of which produce a
distinctive pattern of reflections. Because pinna shapes vary across listeners, the HRTF is
listener specific as well as location specific, with spectral peaks and valleys that are in
different places for different listeners. Listeners appear to learn the HRTFs for their set
of ears. When ears are artificially modified with plastic molds that change their shape, lo­
calization initially suffers considerably, but over a period of weeks, listeners regain the
ability to localize with the modified ears (Hofman, Van Riswick, et al., 1998). Listeners
thus learn at least some of the details of their particular HRTF through experience, al­
though sounds (p. 152) can be localized even when the peaks and valleys of the pinna fil­
tering are somewhat blurred (Kulkarni & Colburn, 1998). Moreover, compelling spatial­
ization is often evident even if a generic HRTF is used.

The physiology of HRTF-related cues for localization is not as developed as it is for binau­
ral cues, but there is evidence that midbrain regions may again be important. Many infe­
rior colliculus neurons, for instance, show tuning to sound elevation (Delgutte, Joris, et
al., 1999). The selectivity for elevation presumably derives from tuning to particular spec­
tral patterns (peaks and valleys in the spectrum) that are diagnostic of particular loca­
tions (May, Anderson, et al., 2008).

Although the key cues for sound localization are extracted subcortically, lesion studies re­
veal that the cortex is essential for localizing sound. Ablating auditory cortex typically
produces large deficits in localization (Heffner & Heffner, 1990), with unilateral lesions
producing deficits specific to locations contralateral to the side of the lesion (Jenkins &
Masterton, 1982). Consistent with these findings, tuning to sound location is widespread
in auditory cortical neurons, with the preferred location generally positioned in the con­
tralateral hemifield (Middlebrooks, 2000). Topographic representations of space have not
been found within individual auditory cortical areas, although one recent report argues
that such topography may be evident across multiple areas (Higgins, Storace,
et al., 2010).

Pitch

Although the word pitch is often used colloquially to refer to the perception of sound fre­
quency, in hearing research it has a more specific meaning—pitch is the perceptual corre­
late of periodicity. Vocalizations, instrument sounds, and some machine sounds are all of­
ten produced by periodic physical processes. Our vocal cords open and close at regular
intervals, producing a series of clicks separated by regular temporal intervals. Instru­
ments produce sounds via strings that oscillate at a fixed rate, or via tubes in which the
air vibrates at particular resonant frequencies, to give two examples. Machines frequent­
ly feature rotating parts, which often produce sounds at every rotation. In all these cases,
the resulting sounds are periodic—the sound pressure waveform consists of a single
shape that repeats at a fixed rate (Figure 8.11A). Perceptually, such sounds are heard as
having a pitch that can vary from low to high, proportional to the frequency at which the
waveform repeats (the fundamental frequency, i.e., the F0). The periodicity is distinct
from whether a sound’s frequencies fall in high or low regions of the spectrum, although
in practice periodicity and the spectral center of mass are sometimes correlated.

Pitch is important because periodicity is important—the period is often related to proper­
ties of the source that are useful to know, such as its size, or tension. Pitch is also used
for communicative purposes, varying in speech prosody, for instance, to convey meaning
or emotion. Pitch is a centerpiece of music, forming the basis of melody, harmony, and
tonality. Listeners also use pitch to track sound sources of interest in auditory scenes.

Many physically different sounds—all those with a particular period—have the same
pitch. Historically, pitch has been a focal point of hearing research because it is an impor­
tant perceptual property with a nontrivial relationship to the acoustic input, whose mech­
anistic characterization has been resistant to unambiguous solution. Debates on pitch and
related phenomena date back at least to Helmholtz, and continue to occupy many re­
searchers today (Plack, Oxenham, et al., 2005).

One central debate concerns whether pitch is derived from an analysis of frequency or
time. Periodic waveforms produce spectra whose frequencies are harmonically related—
they form a harmonic series, being integer multiples of the fundamental frequency, whose
period is the period of the waveform (Figure 8.11B). Although the fundamental frequency
determines the pitch, the fundamental need not be physically present in the spectrum for
a sound to have pitch—sounds missing the fundamental frequency but containing other
harmonics of the fundamental are still perceived to have the pitch of the fundamental, an
effect known as the missing fundamental illusion. What matters for pitch perception is
whether the frequencies that are present are harmonically related. Pitch could thus con­
ceivably be detected with harmonic templates applied to an estimate of a sound’s spec­
trum obtained from the cochlea (Goldstein, 1973; Shamma & Klein, 2000; Terhardt, 1974;
Wightman, 1973). Alternatively, periodicity could be assessed in the time domain, for in­
stance via the autocorrelation function (Cariani & Delgutte, 1996; de Cheveigne & Kawa­
hara, 2002; Meddis & Hewitt, 1991). The autocorrelation measures the correlation of a
signal with a delayed copy of itself. For a periodic signal that repeats with some period,
the autocorrelation exhibits peaks at multiples of the period (Figure 8.11C).
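A minimal numerical illustration of the autocorrelation account, using a tone whose fundamental is physically absent (the missing fundamental); the sampling rate, duration, and harmonic numbers are arbitrary choices made for the sketch.

```python
import numpy as np

fs = 20000                                   # sample rate (Hz), arbitrary
f0 = 200.0                                   # fundamental frequency (Hz)
t = np.arange(0, 0.1, 1 / fs)
# Harmonics 3-5 only: the fundamental itself is physically absent.
x = sum(np.cos(2 * np.pi * k * f0 * t) for k in (3, 4, 5))

ac = np.correlate(x, x, mode="full")[len(x) - 1:]    # non-negative lags
# Look for the strongest peak beyond the zero-lag lobe (2-8 ms here).
lo, hi = int(0.002 * fs), int(0.008 * fs)
period_s = (lo + np.argmax(ac[lo:hi])) / fs
f0_estimate = 1 / period_s       # ~200 Hz: the pitch of the missing fundamental
```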

Figure 8.11 Periodicity and pitch. Waveform, spec­
trum, and autocorrelation function for a note played
on an oboe. The note shown is the A above middle C,
with a fundamental frequency (F0) of 440 Hz. A, Ex­
cerpt of waveform. Note that the waveform repeats
every 2.27 ms (the period). B, Spectrum. Note the
peaks at integer multiples of the F0, characteristic of
a periodic sound. In this case, the F0 is physically
present, but the second, third, and fourth harmonics
actually have higher amplitude. C, Autocorrelation.
The correlation coefficient is always 1 at a lag of 0
ms, but because the waveform is periodic, correla­
tions close to 1 are also found at integer multiples of
the period (2.27, 4.55, 6.82, and 9.09 ms in this ex­
ample).

Figure reprinted with permission from original
source: McDermott & Oxenham, 2008.

(p. 153)

Such analyses are in principle functionally equivalent because the power spectrum is re­
lated to the autocorrelation via the Fourier transform, and detecting periodicity in one do­
main versus the other might simply seem a question of implementation. In the context of
the auditory system, however, the two concepts diverge, due to information being limited
by distinct factors in the two domains. Time–domain models are typically assumed to uti­
lize fine-grained spike timing (i.e., phase locking), with concomitant temporal resolution
limits. In contrast, frequency-based models (often known as place models, in reference to
the frequency–place mapping that occurs on the basilar membrane) rely on the pattern of
excitation along the cochlea, which is limited in resolution by the frequency tuning of the
cochlea (Cedolin & Delgutte, 2005). Cochlear frequency selectivity is present in time–do­
main models of pitch as well, but its role is typically not to estimate the spectrum but sim­
ply to restrict an autocorrelation analysis to a narrow frequency band (Bernstein & Oxen­
ham, 2005), which might help improve its robustness in the presence of multiple sound
sources. Reviews of the current debates and their historical origins are available else­
where (de Cheveigne, 2004; Plack & Oxenham, 2005), and we will not discuss them ex­
haustively here. Suffice it to say that despite being a centerpiece of hearing research for
decades, the mechanisms underlying pitch perception remain under debate.

Research on pitch has provided many important insights about hearing even though a
conclusive account of pitch remains elusive. One contribution of pitch research has been
to reveal the importance of the resolvability of individual frequency components by the
cochlea, a principle that has importance in other aspects of hearing as well. Because the
frequency resolution of the cochlea is approximately constant on a logarithmic scale,
whereas the components of a harmonic tone are equally spaced on a linear scale (separat­
ed by a fixed number of hertz, equal to the fundamental frequency of the tone; Figure
8.12A), multiple high-numbered harmonics fall within a single cochlear filter (Figure
8.12B). Because of the nature of the log scale, this is true regardless of whether the fun­
damental is low or high. As a result, the excitation pattern induced by a tone on the
cochlea (of a human with normal hearing) is believed to contain resolvable peaks for only
the first ten or so harmonics (Figure 8.12C).
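This argument can be made concrete with the equivalent rectangular bandwidth (ERB) approximation to cochlear filter bandwidths (Glasberg & Moore); treating a harmonic as unresolved once the local filter bandwidth exceeds the harmonic spacing is a deliberately crude criterion, used here only to show the scaling.

```python
def erb_hz(f_hz):
    """Equivalent rectangular bandwidth (Hz) of the auditory filter
    centered at f_hz (Glasberg & Moore approximation, normal hearing)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def first_unresolved_harmonic(f0_hz):
    """Lowest harmonic number at which the local filter bandwidth
    exceeds the harmonic spacing (a crude stand-in for 'unresolved')."""
    n = 1
    while erb_hz(n * f0_hz) < f0_hz:
        n += 1
    return n

# Because the ERB grows almost proportionally to center frequency, the
# transition falls within the first ten or so harmonics over a wide
# range of fundamentals, mirroring the resolvability limit above.
transitions = [first_unresolved_harmonic(f0) for f0 in (100.0, 200.0, 400.0)]
```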

Figure 8.12 Resolvability. A, Spectrum of a harmonic
complex tone composed of thirty-five harmonics of
equal amplitude. The fundamental frequency is 100
Hz—the frequency of the lowest component in the
spectrum and the amount by which adjacent harmon­
ics are separated. B, Frequency responses of audito­
ry filters, each of which represents a particular point
on the cochlea. Note that because a linear frequency
scale is used, the filters increase in bandwidth with
center frequency, such that many harmonics fall
within the passband of the high frequency filters. C,
The resulting pattern of excitation along the cochlea
in response to the tone in A. The excitation is the am­
plitude of vibration of the basilar membrane as a
function of characteristic frequency (the frequency to
which a particular point on the cochlea responds
best, i.e., the center frequency of the auditory filter
representing the response properties of the cochlea
at that point). Note that the first ten or so harmonics
produce resolvable peaks in the pattern of excitation,
but that higher numbered harmonics do not. The lat­
ter are thus said to be “unresolved.” D, The pattern
of vibration that would be observed on the basilar
membrane at several points along its length. When
harmonics are resolved, the vibration is dominated
by the harmonic close to the characteristic frequen­
cy, and is thus sinusoidal. When harmonics are unre­
solved, the vibration pattern is more complex, re­
flecting the multiple harmonics that stimulate the
cochlea at those points.

Figure reprinted with permission from original
source: Plack, 2005.

There is now abundant evidence that resolvability places strong constraints on pitch per­
ception. For instance, the perception of pitch is determined (p. 154) predominantly by low-
numbered harmonics (harmonics one to ten or so in the harmonic series), presumably ow­
ing to the peripheral resolvability of these harmonics. Moreover, the ability to discrimi­
nate pitch is much poorer for tones synthesized with only high-numbered harmonics than
for tones containing only low-numbered harmonics, an effect not accounted for simply by
the frequency range in which the harmonics occur (Houtsma & Smurzynski, 1990; Shack­
leton & Carlyon, 1994). This might be taken as evidence that the spatial pattern of excita­
tion, rather than the periodicity that could be derived from the autocorrelation, underlies
pitch perception, but variants of autocorrelation-based models have also been (p. 155) pro­
posed to account for the effect of resolvability (Bernstein & Oxenham, 2005). Resolvabili­
ty has since been demonstrated to constrain sound segregation as well as pitch (Micheyl
& Oxenham, 2010); see below.

Just as computational theories of pitch remain a matter of debate, so do its neural corre­
lates. One might expect that neurons at some stage of the auditory system would be
tuned to stimulus periodicity, and there is one recent report of this in marmosets (Bendor
& Wang, 2005). However, comparable results have yet to be reported in other species
(Fishman, Reser, et al., 1998), and some have argued that pitch is encoded by ensembles
of neurons with broad tuning rather than single neurons selective for particular funda­
mental frequencies (Bizley, Walker, et al., 2010). In general, pitch-related responses can
be difficult to disentangle from artifactual responses to distortions introduced by the non­
linearities of the cochlea (de Cheveigne, 2010; McAlpine, 2004).

Given the widespread presence of frequency tuning in the auditory system, and the im­
portance of harmonic frequency relations in pitch, sound segregation (Darwin, 1997), and
music (McDermott, Lehr, et al., 2010), it is natural to think there might be neurons with
multipeaked tuning curves selective for harmonic frequencies. There are a few isolated
reports of such tuning (Kadia & Wang, 2003; Sutter & Schreiner, 1991), but the tuning
peaks do not always correspond to harmonic frequencies, and whether they relate to
pitch is unclear. At least given how researchers have looked for it thus far, tuning for har­
monicity is not as evident in the auditory system as might be expected.

If pitch is analyzed in a particular part of the brain, one might expect the region to re­
spond more to stimuli with pitch than to those lacking it, other things being equal. Such
response properties have in fact been reported in regions of auditory cortex identified
with functional imaging in humans (Hall, Barrett, et al. 2005; Patterson, Uppenkamp, et
al., 2002; Penagos, Melcher, et al., 2004; Schonwiesner & Zatorre, 2008). The regions are
typically reported to lie outside primary auditory cortex, and could conceivably be homol­
ogous to the region claimed to contain pitch-tuned neurons in marmosets (Bendor &
Wang, 2006), although again there is some controversy over whether pitch per se is impli­
cated (Hall & Plack, 2009). See Winter, 2005, and Walker, Bizley, et al., 2010, for recent
reviews of the brain basis of pitch.

In many contexts (e.g., the perception of music or speech intonation), it is the changes in
pitch over time that matter rather than the absolute value of the F0. For instance, pitch
increases or decreases are what capture the identity of a melody or the intention of a
speaker. Less is known about how this relative pitch information is represented in the
brain, but the right temporal lobe has been argued to be important, in part on the basis of
brain-damaged patients with apparently selective deficits in relative pitch (Johnsrude,
Penhune, et al., 2000). See McDermott and Oxenham, 2008, for a review of the perceptu­
al and neural basis of relative pitch.

Loudness

Loudness is perhaps the most immediate perceptual property of sound, and has been
actively studied for more than 150 years. To first order, loudness is the perceptual corre­
late of sound intensity. In real-world listening scenarios, loudness exhibits additional in­
fluences that suggest it serves to estimate the intensity of a sound source, as opposed to
the intensity of the sound entering the ear (which changes with distance and the listening
environment). However, loudness models that capture exclusively peripheral processing
nonetheless have considerable predictive power.

For a sound with a fixed spectral profile, such as a pure tone or a broadband noise, the
relationship between loudness and intensity can be approximated via the classic Stevens
power law (Stevens, 1955). However, the relation between loudness and intensity is not
as simple as one might imagine. For instance, loudness increases with increasing band­
width—a sound whose frequencies lie in a broad range will seem louder than a sound
whose frequencies lie in a narrow range, even when their physical intensities are equal.

Standard models of loudness thus posit something more complex than a simple
power law of intensity: that loudness is linearly related to the total amount of neural ac­
tivity elicited by a stimulus at the level of the auditory nerve (ANSI, 2007; Moore & Glas­
berg, 1996). The effect of bandwidth on loudness is explained via the compression that
occurs in the cochlea: loudness is determined by the neural activity summed across nerve
fibers, the spikes of which are generated after the output of a particular cochlear location
is nonlinearly compressed. Because compression boosts low responses relative to high re­
sponses, the sum of several responses to low amplitudes (produced by the several fre­
quency channels stimulated by a broadband sound) is greater than a single response to a
high amplitude (produced by a single frequency (p. 156) channel responding to a narrow­
band sound of equal intensity). Loudness also increases with duration for durations up to
half a second or so (Buus, Florentine, et al., 1997), suggesting that it is computed from
neural activity integrated over some short window.
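Both effects can be caricatured in a toy model in which loudness is the sum of compressively transformed channel responses; the exponent of 0.3 is a conventional illustrative value, not a fitted parameter from the models cited above.

```python
def toy_loudness(channel_intensities, exponent=0.3):
    """Toy loudness: sum of compressively transformed channel responses."""
    return sum(i ** exponent for i in channel_intensities)

# Power-law behavior: a 10 dB increase (x10 in intensity) roughly
# doubles loudness, since 10 ** 0.3 is approximately 2.
quiet = toy_loudness([1.0])
loud = toy_loudness([10.0])

# Bandwidth effect: equal total intensity spread over four channels
# produces more summed compressed activity than one channel alone.
narrow = toy_loudness([4.0])
broad = toy_loudness([1.0, 1.0, 1.0, 1.0])
```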

The ability to predict perceived loudness is important in many practical situations, and is
a central issue in the fitting of hearing aids. Cochlear compression is typically reduced in
hearing-impaired listeners, and amplification runs the risk of making sounds uncomfort­
ably loud unless compression is introduced artificially. There has thus been long-standing
interest in quantitative models of loudness.

Loudness is also influenced in interesting ways by the apparent distance of a sound
source. Because intensity attenuates with distance from a sound source, the intensity of a
sound at the ear is determined conjointly by the intensity and distance of the source. At
least in some contexts, the auditory system appears to use loudness as a perceptual esti­
mate of a source’s intensity (i.e., the intensity at the point of origin), such that sounds
that appear more distant seem louder than those that appear closer but have the same
overall intensity. Visual cues to distance have some influence on perceived loudness (Mer­
shon, Desaulniers, et al., 1981), but the cue provided by the amount of reverberation also
seems to be important. The more distant a source, the weaker the direct sound from the
source to the listener, relative to the reverberant sound that reaches the listener after re­
flection off of surfaces in the environment (see Figure 8.14). This ratio of direct to rever­
berant sound appears to be used both to judge distance and to calibrate loudness percep­
tion (Zahorik & Wightman, 2001), although how the listener estimates this ratio from the
sound signal remains unclear at present. Loudness thus appears to function somewhat
like size or brightness perception in vision, in which perception is not based exclusively
on retinal size or light intensity (Adelson, 2000).
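Under the standard idealization that direct sound obeys the inverse-square law while reverberant energy is roughly uniform throughout a room, the direct-to-reverberant ratio falls by about 6 dB per doubling of distance. The "critical distance" parameter below (the distance at which the two are equal) is a hypothetical room property used only to anchor the sketch.

```python
import math

def direct_to_reverberant_db(distance_m, critical_distance_m):
    """Direct-to-reverberant ratio (dB) under the diffuse-field
    idealization: direct energy ~ 1/r^2, reverberant energy ~ constant.
    At the critical distance the two are equal (0 dB)."""
    return 20.0 * math.log10(critical_distance_m / distance_m)

# Each doubling of source distance lowers the ratio by about 6 dB,
# a monotonic cue to distance (and hence to source intensity).
```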

VI. Auditory Scene Analysis

Thus far we have discussed how the auditory system represents single sounds in isola­
tion, as might be produced by a note played on an instrument, or a word uttered by some­
one talking. The simplicity of such isolated sounds renders them convenient objects of
study, yet in many auditory environments, isolated sounds are not the norm. It is often the
case that many things make sound at the same time, causing the ear to receive a mixture
of multiple sources as its input. Consider Figure 8.13, which displays spectrograms of a
single “target” speaker along with that of the mixture that results from adding to it the
utterances of one, three, and seven additional speakers, as might occur in a social set­
ting. The brain’s task in this case is to take such a mixture as input and recover enough of
the content of a target sound source to allow speech comprehension or otherwise support
behavior. This is a nontrivial task. In the example of Figure 8.13, for instance, it is appar­
ent that the structure of the target utterance is progressively obscured as more speakers
are added to the mixture. Machine systems for recognizing speech suffer dramatically un­
der such conditions, performing well in quiet, but much worse in the presence of multiple
speakers (Lippmann, 1997). The presence of competing sounds greatly complicates the
computational extraction of just about any sound source property, from pitch (de
Cheveigne, 2006) to location. Human listeners, however, parse auditory scenes with a re­
markable degree of success. In the example of Figure 8.13, the target remains largely au­
dible to most listeners even in the mixture of eight speakers. This is the classic “cocktail
party problem” (Bee & Micheyl, 2008; Bregman, 1990; Bronkhorst, 2000; Carlyon, 2004;
Cherry, 1953; Darwin, 1997; McDermott, 2009).

Historically, the “cocktail party problem” has referred to two conceptually distinct prob­
lems that in practice are closely related. The first, known as sound segregation, is the
problem of deriving representations of individual sound sources from a mixture of sounds.
The second is the task of directing attention to one source among many, as when listening
to a particular speaker at a party. These tasks are related because the ability to segregate
sounds is probably dependent on attention (Carlyon, Cusack, et al., 2001; Shinn-Cunning­
ham, 2008), although the extent and nature of this dependence remain an active area of
study (Macken, Tremblay, et al., 2003). Here, we will focus on the first problem, of sound
segregation, which is usually studied under conditions in which listeners pay full atten­
tion to a target sound. Al Bregman, a Canadian psychologist, is typically credited with
drawing interest to this problem and pioneering its study (Bregman, 1990).

Sound Segregation and Acoustic Grouping Cues

Figure 8.13 The cocktail party problem. Spectro­
grams of a single “target” utterance (top row), and
the same utterance mixed with one, three, and seven
additional speech signals from different speakers.
The mixtures approximate the signal that would en­
ter the ear if the additional speakers were talking as
loud as the target speaker, but were standing twice
as far away from the listener (to simulate cocktail
party conditions). The grayscale denotes attenuation
from the maximum energy level across all of the sig­
nals (in dB), such that gray levels can be compared
across spectrograms. Spectrograms in the right col­
umn are identical to those on the left except for the
superimposed color masks. Pixels labeled green are
those where the original target speech signal is more
than –50 dB but the mixture level is at least 5 dB
higher, and thus masks the target speech. Pixels la­
beled red are those where the target had less than
–50 dB and the mixture had more than –50 dB ener­
gy. Spectrograms were computed from a filter bank
with bandwidths and frequency spacing similar to
those in the ear. Each pixel is the rms amplitude of
the signal within a frequency band and time window.

Figure reprinted with permission from original
source: McDermott, 2009.
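The pixel-labeling rule described in the caption can be written out explicitly; the −50 dB floor and 5 dB margin are the values given in the caption, and the three-pixel arrays are hypothetical levels invented for the sketch.

```python
import numpy as np

def label_pixels(target_db, mixture_db, floor_db=-50.0, margin_db=5.0):
    """Label spectrogram pixels following the rule in the caption:
    'green' = audible target energy that the mixture masks;
    'red'   = mixture energy where the target had essentially none."""
    green = (target_db > floor_db) & (mixture_db >= target_db + margin_db)
    red = (target_db < floor_db) & (mixture_db > floor_db)
    return green, red

target = np.array([-20.0, -60.0, -30.0])     # hypothetical pixel levels (dB)
mixture = np.array([-10.0, -40.0, -29.0])
green, red = label_pixels(target, mixture)
# Pixel 0: target masked (green); pixel 1: mixture-only energy (red);
# pixel 2: target within 5 dB of the mixture, so neither label applies.
```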

Sound segregation is a classic example of an ill-posed problem in perception. Many differ­
ent sets of sounds are physically consistent with the mixture (p. 157) that enters the ear
(in that their sum is equal to the mixture), only one of which actually occurred in the
world. The auditory system must infer the set of sounds that actually occurred. As in oth­
er ill-posed problems, this inference is only possible with the aid of assumptions that con­
strain the solution. In this case, the assumptions concern the nature of sounds in the
world, and are presumably learned from experience with natural sounds (or perhaps
hard-wired into the auditory system via evolution).

Grouping cues (i.e., sound properties that dictate whether sound elements are heard as
part of the same sound) are examples of these assumptions. For instance, natural sounds
that have pitch, such as vocalizations, contain frequencies that are harmonically related,
evident as banded structures in lower half of the spectrogram of the target speaker in
Figure 8.13. Harmonically related frequencies are unlikely to occur from the chance
alignment of multiple different sounds, and thus when they (p. 158) are present in a mix­
ture, they are likely to be due to the same sound and are generally heard as such (de
Cheveigne, McAdams, et al., 1995; Roberts & Brunstrom, 1998). Moreover, a component
that is mistuned (in a tone containing otherwise harmonic frequencies) segregates from
the rest of the tone (Moore, Glasberg, et al., 1986). Understanding sound segregation re­
quires understanding the acoustic regularities, such as harmonicity, that characterize nat­
ural sound sources and that are used by the auditory system.
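A minimal version of a harmonicity-based grouping check might flag components that deviate from integer multiples of a candidate fundamental; the 3 percent tolerance is an arbitrary illustrative value, not a measured perceptual threshold.

```python
def mistuned_components(freqs_hz, f0_hz, tolerance=0.03):
    """Return the components that deviate from the nearest harmonic of
    f0_hz by more than `tolerance` (expressed as a fraction of f0)."""
    flagged = []
    for f in freqs_hz:
        n = max(1, round(f / f0_hz))         # nearest harmonic number
        if abs(f - n * f0_hz) > tolerance * f0_hz:
            flagged.append(f)
    return flagged

# A component at 830 Hz amid harmonics of 200 Hz deviates from the 4th
# harmonic (800 Hz) by far more than the tolerance, so it is flagged,
# much as a mistuned component segregates perceptually from the tone.
flagged = mistuned_components([200.0, 400.0, 600.0, 830.0], f0_hz=200.0)
```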

Perhaps the most important generic acoustic grouping cue is common onset: frequency
components that begin and end at the same time are likely to belong to the same sound.
Onset differences, when manipulated experimentally, cause frequency components to per­
ceptually segregate from each other (Cutting, 1975; Darwin, 1981). Interestingly, a com­
ponent that has an earlier or later onset than the rest of a set of harmonics has reduced
influence over the perceived pitch of the entire tone (Darwin & Ciocca, 1992), suggesting
that pitch computations operate on frequency components that are deemed likely to be­
long together, rather than on the raw acoustic input.

Onset may be viewed as a special case of comodulation—amplitude modulation that is
common to different spectral regions. In some cases relatively slow comodulation pro­
motes grouping of different spectral components (Hall, Haggard, et al., 1984), although
abrupt onsets seem to be most effective. Common offset also promotes grouping but is
less effective than common onset (Darwin, 1984), perhaps because abrupt offsets are less
common in natural sounds (Cusack & Carlyon, 2004).

Not every intuitively plausible grouping cue produces a robust effect when assessed psy­
chophysically. For instance, frequency modulation (FM) that is shared (“coherent”) across
multiple frequency components, as in voiced speech, has been proposed to promote their
grouping (Bregman, 1990; McAdams, 1989). However, listeners are poor at discriminat­
ing coherent from incoherent FM if the component tones are not harmonically related, in­
dicating that sensitivity to FM coherence may simply be mediated by the deviations from
harmonicity that occur when harmonic tones are incoherently modulated (Carlyon, 1991).

One might also think that the task of segregating sounds would be greatly aided by the
tendency of distinct sound sources in the world to originate from distinct locations. In
practice, spatial cues are indeed of some benefit, for instance, in hearing a target sen­
tence from one direction amid distracting utterances from other directions (Bronkhorst,
2000; Hawley, Litovsky, et al., 2004; Ihlefeld & Shinn-Cunningham, 2008; Kidd, Arbogast,
et al., 2005). However, spatial cues are surprisingly ineffective at segregating one fre­
quency component from a group of others (Culling & Summerfield, 1995), especially
when pitted against other grouping cues such as onset or harmonicity (Darwin & Hukin,
1997). The benefit of listening to a target with a distinct location (Bronkhorst, 2000; Haw­
ley, Litovsky, et al., 2004; Ihlefeld & Shinn-Cunningham, 2008; Kidd, Arbogast, et al.,
2005) may thus be due to the ease with which the target can be attentively tracked over
time amid competing sound sources, rather than to a facilitation of auditory grouping per
se (Darwin & Hukin, 1999). Moreover, humans are usually able to segregate monaural
mixtures of sounds without difficulty, demonstrating that spatial separation is often not
necessary for high performance. For instance, much popular music of the twentieth cen­
tury was released in mono, and yet listeners have no trouble distinguishing many differ­
ent instruments and voices in any given recording. Spatial cues thus contribute to sound
segregation, but their presence or absence does not seem to fundamentally alter the
problem.

The weak effect of spatial cues on segregation may reflect their fallibility in complex audi­
tory scenes. Binaural cues can be contaminated when sounds are combined or degraded
by reverberation (Brown & Palomaki, 2006) and can even be deceptive, as when caused
by echoes (whose direction is generally different from the original sound source). It is
possible that the efficacy of different grouping cues in general reflects their reliability in
natural conditions. Evaluating this hypothesis will require statistical analysis of natural
auditory scenes, an important direction for future research.

Sequential Grouping

Because the spectrogram approximates the input that the cochlea provides to the rest of
the auditory system, it is common to view the problem of sound segregation as one of de­
ciding how to group the various parts of the spectrogram (Bregman, 1990). However, the
brain does not receive an entire spectrogram at once. Rather, the auditory input arrives
gradually over time. Many researchers thus distinguish between the problem of simulta­
neous grouping (determining how the spectral content of a short segment of the auditory
input should be segregated) and sequential grouping (determining how the (p. 159)
groups from each segment should be linked over time, e.g., to form a speech utterance or
a melody) (Bregman, 1990).

Although most of the classic grouping cues (e.g., onset/comodulation, harmonicity, ITD)
are quantities that could be measured over short timescales, the boundary between what
is simultaneous and what is sequential is unclear for most real-world signals, and it may
be more appropriate to view grouping as being influenced by processes operating at mul­
tiple timescales rather than two cleanly divided stages of processing. There are, however,
contexts in which the bifurcation into simultaneous and sequential grouping stages is nat­
ural, as when the auditory input consists of discrete sound elements that do not overlap
in time. In such situations interesting differences are sometimes evident between the
grouping of simultaneous and sequential elements. For instance, spatial cues, which are relatively weak as a simultaneous cue, have a stronger influence on sequential grouping of tones (Darwin & Hukin, 1997).

Page 35 of 62

Audition

Another clear case of sequential processing can be found in the effects of sound repeti­
tion. Sounds that occur repeatedly in the acoustic input are detected by the auditory sys­
tem as repeating, and are inferred to be a single source. Perhaps surprisingly, this is true
even when the repeating source is embedded in mixtures with other sounds, and is never
presented in isolation (McDermott, Wrobleski, et al., 2011). In such cases the acoustic in­
put itself does not repeat, but the source repetition induces correlations in the input that
the auditory system detects and uses to extract the repeating sound. The informativeness
of repetition presumably results from the fact that mixtures of multiple sounds tend not to
occur repeatedly, such that when a structure does repeat, it is likely to be a single source.
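The logic of repetition-based segregation can be sketched with a toy spectrogram model: a target spectrum recurs across several mixtures, each time summed with a different distractor. Taking the element-wise minimum across mixtures is a deliberately crude stand-in for the correlational analysis described above (it is not the analysis of McDermott, Wrobleski, et al., 2011, and all values here are invented for illustration), yet it suffices to recover the shared component:

```python
def estimate_repeating_source(mixtures):
    """Estimate a source that repeats across several mixtures.

    Each mixture is a list of nonnegative spectral energies
    (target + a different distractor each time). The element-wise
    minimum across mixtures is a crude estimator of the shared,
    repeating component.
    """
    return [min(col) for col in zip(*mixtures)]

target = [5.0, 1.0, 4.0, 0.5]            # repeating source spectrum
distractors = [[0.0, 3.0, 0.0, 2.0],     # a different distractor
               [2.0, 0.0, 1.0, 0.0],     # accompanies the target
               [0.0, 0.0, 3.0, 4.0]]     # in every mixture

# The acoustic input never contains the target alone:
mixtures = [[t + d for t, d in zip(target, dist)] for dist in distractors]
estimate = estimate_repeating_source(mixtures)  # recovers `target` here
```

Because each distractor is unlikely to have energy in every bin every time, the repeating structure is the only component common to all mixtures, which is the intuition behind the informativeness of repetition.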

Effects of repetition are also evident in classic results on "informational masking": masking-like effects on the detectability of a target tone, so called because they cannot be explained in terms of conventional "energetic masking" (in which the response to the target is swamped by a masker that falls within the same peripheral channel). Demonstra­
tions of informational masking typically present a target tone along with other tones that
lie outside a “protected region” of the spectrum, such that they are unlikely to stimulate
the same filters as the target tone. These “masking” tones nonetheless often elevate the
detection threshold for the target, sometimes quite dramatically (Durlach, Mason, et al.,
2003; Lutfi, 1992; Neff, 1995; Watson, 1987). The effect is presumably due to impair­
ments in the ability to segregate the target tone from the masker tones, and can be re­
duced when the target is repeatedly presented (Kidd, Mason et al., 1994; Kidd, Mason et
al., 2003).

Streaming

One type of sequential segregation effect has particularly captured the imagination of the
hearing community and merits special mention. When two pure tones of different fre­
quency are repeatedly presented in alternation, one of two perceptual states is commonly
reported by listeners: one in which the two repeated tones are heard as a single “stream”
whose pitch varies over time, and one in which two distinct streams are heard, one with
the high tones and one with the low tones (Bregman & Campbell, 1971). If the frequency
separation between the two tones is small, and if the rate of alternation is slow, one
stream is generally heard. When the frequency separation is larger or the rate is faster,
two streams tend to be heard, in which case “streaming” is said to occur (van Noorden,
1975).
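The joint dependence on frequency separation and presentation rate can be caricatured as a simple decision rule. The numerical thresholds below are illustrative placeholders, not van Noorden's measured boundaries:

```python
def predicted_percept(separation_semitones, tones_per_second):
    """Toy rule of thumb for the alternating-tone streaming demo.

    The cutoff values are hypothetical, chosen only to express the
    qualitative pattern: small separations at slow rates favor one
    stream; large separations or fast rates favor two.
    """
    if separation_semitones < 4 and tones_per_second < 10:
        return "one stream"
    if separation_semitones > 9 or tones_per_second > 20:
        return "two streams"
    return "bistable"  # intermediate region: the percept can flip

predicted_percept(2, 5)    # small, slow -> "one stream"
predicted_percept(12, 25)  # large, fast -> "two streams"
```

In the intermediate region listeners typically report an ambiguous, bistable percept, which is part of what has made the phenomenon attractive for studying perceptual organization.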

An interesting hallmark of this phenomenon is that when two streams are perceived,
judgments of the temporal order of elements in different streams are impaired (Bregman
& Campbell, 1971; Micheyl & Oxenham, 2010). This latter finding provides compelling ev­
idence for a substantive change in the representation underlying the two percepts. Sub­
sequent research has demonstrated that separation along most dimensions of sound can
elicit streaming (Moore & Gockel, 2002). The streaming effects in these simple stimuli may be viewed as a variant of grouping by similarity—elements are grouped together when they are similar along some dimension, and segregated when they are sufficiently
different, presumably because this similarity reflects the likelihood of having been pro­
duced by the same source.

Filling in

Although it is common to view sound segregation as the problem of grouping the spectro­
gram-like output of the cochlea across frequency and time, this cannot be the whole story,
in part because large swaths of a sound’s time–frequency representation are often physi­
cally obscured (masked) by other sources and are thus not physically available to be
grouped. Masking is evident in the green pixels of Figure 8.13, which represent points
where the target source has substantial energy, but where the mixture exceeds it in level.
If these points are simply assigned (p. 160) to the target, or omitted from its representa­
tion, the target’s level at those points will be misconstrued, and the sound potentially
misidentified. To recover an accurate estimate of the target source, it is necessary to in­
fer not just the grouping of the energy in the spectrogram but also the structure of the
target source in the places where it is masked.
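In computational auditory scene analysis this grouping decision is often formalized as an "ideal binary mask": a time–frequency cell is assigned to the target when the target's energy there exceeds that of the interference. The sketch below uses made-up energy grids for illustration; the cells labeled 0 are precisely the masked regions whose content must be inferred rather than read out:

```python
def ideal_binary_mask(target, interferer):
    """Label each time-frequency cell 1 if the target dominates, else 0.

    `target` and `interferer` are 2-D energy grids of the same shape
    (rows = frequency channels, columns = time frames). Cells labeled 0
    are masked: the target's energy there is not directly observable.
    """
    return [[1 if t > i else 0 for t, i in zip(trow, irow)]
            for trow, irow in zip(target, interferer)]

target_energy = [[4.0, 0.2],
                 [1.0, 3.0]]
interferer_energy = [[0.5, 2.0],
                     [2.5, 0.1]]
mask = ideal_binary_mask(target_energy, interferer_energy)
# mask == [[1, 0], [0, 1]]: off-diagonal cells are masker-dominated
```

Simply zeroing (or naively assigning) the masker-dominated cells misconstrues the target's level there, which is why filling in requires genuine inference.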

There is in fact considerable evidence that the auditory system does just this, from exper­
iments investigating the perception of partially masked sounds. For instance, tones that
are interrupted by noise bursts are “filled in” by the auditory system, such that they are
heard as continuous in conditions in which physical continuity is plausible given the stim­
ulus (Warren, Obusek, et al., 1972). This "continuity effect" occurs only when
the interrupting noise bursts are sufficiently intense in the appropriate part of the spec­
trum to have masked the tone should it have been present continuously. Continuity is also
heard for frequency glides (Ciocca & Bregman, 1987; Kluender & Jenison, 1992) as well
as oscillating frequency-modulated tones (Carlyon, Micheyl, et al., 2004). The perception
of continuity across intermittent maskers was actually first reported for speech signals in­
terrupted by noise bursts (Warren, 1970). For speech, the effect is often termed phonemic
restoration, and likely indicates that knowledge of speech acoustics (and perhaps of other
types of sounds as well) influences the inference of the masked portion of sounds. Similar
effects occur in the spectral domain—regions of the spectrum are perceptually filled in
when evidence indicates they are likely to have been masked, e.g., by a continuous noise source (McDermott & Oxenham, 2008). Filling-in effects in hearing are conceptually similar to the completion of partially occluded surfaces in vision, although the ecological
constraints provided by masking (involving the relative intensity of two sounds) are dis­
tinct from those provided by occlusion (involving the relative depth of two surfaces). Neu­
rophysiological evidence indicates that the representation of tones in primary auditory
cortex reflects the perceived continuity, responding as though the tone were continuously
present despite being interrupted by noise (Petkov, O’Connor, et al., 2007; Riecke, van
Opstal, et al., 2007).


Brain Basis of Sound Segregation

Recent years have seen great interest in how sound segregation is instantiated in the
brain. One proposal that has attracted interest is that sounds are heard as segregated
when they are represented in non-overlapping neural populations at some stage of the au­
ditory system. This idea derives largely from studies of the pure-tone streaming phenome­
na described earlier, with the hope that it will extend to more realistic sounds.

The notion is that conditions that cause two tones to be represented in distinct neural
populations are also those that cause sequences of two tones to be heard as separate
streams (Bee & Klump, 2004; Fishman, Arezzo, et al., 2004; Micheyl, Tian, et al., 2005;
Pressnitzer, Sayles, et al., 2008). Because of tonotopy, different frequencies are processed
in neural populations whose degree of overlap decreases as the frequencies become more
separated. Moreover, tones that are more closely spaced in time are more likely to reduce
each other’s response (via what is termed suppression), which also reduces overlap be­
tween the tone representations—a tone on the outskirts of a neuron’s receptive field
might be sufficiently suppressed as to not produce a response at all. These two factors,
frequency separation and suppression, predict the two key effects in pure-tone stream­
ing: that streaming should increase when tones are more separated in frequency or are
presented more quickly (van Noorden, 1975).
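The claim that frequency separation reduces neural overlap can be illustrated with a bank of hypothetical Gaussian tuning curves along a tonotopic axis. The bandwidth and neuron spacing below are arbitrary illustrative choices (and suppression is ignored), but the qualitative point survives: nearby tones drive largely shared populations, distant tones largely distinct ones:

```python
import math

def population_response(tone_freq_oct, n_neurons=40, bw_oct=0.5):
    """Responses of a bank of Gaussian-tuned model neurons (tonotopic
    axis in octaves) to a pure tone. Best frequencies tile the axis at
    0.25-octave spacing; spacing and bandwidth are illustrative."""
    centers = [i * 0.25 for i in range(n_neurons)]
    return [math.exp(-((tone_freq_oct - c) / bw_oct) ** 2) for c in centers]

def overlap(r1, r2):
    """Normalized inner product (cosine similarity) of two population
    response patterns: 1 = identical, 0 = fully non-overlapping."""
    dot = sum(a * b for a, b in zip(r1, r2))
    norm = math.sqrt(sum(a * a for a in r1) * sum(b * b for b in r2))
    return dot / norm

low = population_response(3.0)
near = population_response(3.25)  # small separation: largely shared neurons
far = population_response(6.0)    # large separation: largely distinct neurons
```

On this account, conditions yielding low overlap (large separations, or strong mutual suppression at fast rates) are the ones in which two streams are heard.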

Experiments over the past decade in multiple animal species indicate that pure-tone se­
quences indeed produce non-overlapping neural responses under conditions in which
streaming is perceived by human listeners (Bee & Klump, 2004; Fishman, Arezzo, et al.,
2004; Micheyl, Tian, et al., 2005; Pressnitzer, Sayles, et al., 2008). Some of these experi­
ments take advantage of another notable property of streaming—its strong dependence
on time. Specifically, the probability that listeners report two streams increases with time
from the beginning of the sequence, an effect termed buildup (Bregman, 1978). Buildup
has been linked to neurophysiology via neural adaptation. Because neural responses de­
crease with stimulus repetition, over time it becomes less likely that two stimuli with dis­
tinct properties will both exceed the spiking threshold for the same neuron, such that the
neural responses to two tones become increasingly segregated on a timescale consistent
with that of perceptual buildup (Micheyl, Tian, et al., 2005; Pressnitzer, Sayles, et al.,
2008). For a comprehensive review of these and related studies, see Snyder and Alain,
2007, and Fishman and Steinschneider, 2010.
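The adaptation account of buildup can be sketched with a single model neuron whose response decays multiplicatively with each tone repetition. The decay rate, spiking threshold, and initial response values are illustrative, not fitted to data:

```python
def adapted_responses(initial, decay=0.85, threshold=0.35, n_reps=12):
    """Spiking of one model neuron across repetitions of a tone.

    The response adapts multiplicatively with each repetition and a
    spike occurs only while it exceeds threshold (all parameter values
    are hypothetical). Returns a list of booleans, one per repetition.
    """
    r = initial
    spikes = []
    for _ in range(n_reps):
        spikes.append(r >= threshold)
        r *= decay
    return spikes

preferred = adapted_responses(1.0)  # on-BF tone: stays above threshold longer
nonpref = adapted_responses(0.5)    # off-BF tone: falls below threshold sooner
```

Early in the sequence the neuron responds to both tones (a shared representation, consistent with hearing one stream); after adaptation only the preferred tone still evokes spikes, so the two tones' representations segregate over time, mirroring perceptual buildup.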

A curious feature of these studies is that they suggest that streaming is an accidental side
effect of what would appear to be general features of the auditory system—tonotopy, sup­
pression, and (p. 161) adaptation. Given that sequential grouping seems likely to be of
great adaptive significance (because it affects our ability to recognize sounds), it would
seem important for an auditory system to behave close to optimally, that is, for the per­
ception of one or two streams to be related to the likelihood of one or two streams in the
world. It is thus striking that the phenomenon is proposed to result from apparently inci­
dental features of processing. Consistent with this viewpoint, a recent study showed that
synchronous high- and low-frequency tones produce neural responses that are just as segregated as those for the classic streaming configuration of alternating high and low
tones, even though perceptual segregation does not occur when the tones are synchro­
nous (Elhilali, Ma, et al., 2009). This finding indicates that non-overlapping neural re­
sponses are not sufficient for perceptual segregation, and that the relative timing of neur­
al responses may be more important. The significance of neural overlap thus remains un­
clear, and the brain basis of streaming will undoubtedly continue to be debated in the
years to come.

Separating Sound Sources from the Environment

Thus far we have mainly discussed how the auditory system segregates the signals from
multiple sound sources, but listeners face a second important scene analysis problem.
The sound that reaches the ear from a source is almost always altered to some extent by
the surrounding environment, and these environmental influences must be separated
from those of the source if the source content is to be estimated correctly. Typically the
sound produced by a source reflects off multiple surfaces on its way to the ears, such that
the ears receive some sound directly from the source, but also many reflected versions
(Figure 8.14). These reflected versions (echoes) are delayed because their path to the ear
is lengthened, but generally they also have altered frequency spectra because reflective
surfaces absorb some frequencies more than others. Because each reflection can be well
described with a linear filter applied to the source signal, the signal reaching the ear,
which is the sum of the direct sound along with all the reflections, can be described sim­
ply as the result of applying a single composite linear filter to the source (Gardner, 1998).
Significant filtering of this sort occurs in almost every natural listening situation, such
that sound produced in anechoic conditions (in which all surfaces are minimally reflec­
tive) sounds noticeably strange and unnatural.
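Because each reflection is a delayed, attenuated (and in general spectrally filtered) copy of the source, the composite environmental filter can be illustrated with a discrete convolution. The toy impulse response below contains just a direct path and a single echo; real room impulse responses, like that of Figure 8.14A, contain many reflections plus a decaying reverberation tail:

```python
def convolve(source, impulse_response):
    """Discrete convolution: the signal at the ear is the source passed
    through the room's composite linear filter (direct path + echoes)."""
    out = [0.0] * (len(source) + len(impulse_response) - 1)
    for i, s in enumerate(source):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

# Toy impulse response: unit direct sound at lag 0, plus one echo at
# lag 3 attenuated to 0.4 (reflective surfaces absorb some energy).
room_ir = [1.0, 0.0, 0.0, 0.4]
source = [1.0, -0.5, 0.25]
at_ear = convolve(source, room_ir)  # direct copy + delayed, scaled copy
```

The ear thus receives the sum of the source and its echoes, and the scene-analysis problem is to attribute the right parts of that sum to source versus environment.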

Listeners are often interested in the properties of sound sources, and one might think of
the environmental effects as a nuisance that should simply be discounted. However, envi­
ronmental filtering imbues the acoustic input with useful information—for instance, about
the size of a room where sound is produced and the distance of the source from the lis­
tener. It is thus more appropriate to think of separating source and environment, at least
to some extent, rather than simply recovering the source. Reverberation is commonly
used in music production, for instance, to create a sense of space or to give a different
feel to particular instruments or voices.

The loudness constancy phenomena discussed earlier are one example of the brain infer­
ring the properties of the sound source as separate from that of the environment, but
there are many others. One of the most interesting involves the treatment of echoes in
sound localization. The echoes that are common in most natural environments pose a
problem for localization because they generally come from directions other than that of
the source (Figure 8.14B). The auditory system appears to solve this problem by percep­
tually fusing similar impulsive sounds that occur within a brief interval of each other (on
the order of 10 ms or so), and using the sound that occurs first to determine the per­
ceived location. This precedence effect, so called because of the dominance of the sound that occurs first, was described and named by Hans Wallach (Wallach, Newman, et al.,
1949), one of the great gestalt psychologists, and has since been the subject of a large
and interesting literature. For instance, the maximal delay at which echoes are perceptu­
ally suppressed increases as two pairs of sounds are repeatedly presented (Freyman,
Clifton, et al., 1991), presumably because the repetition provides evidence that the sec­
ond sound is indeed an echo of the first, rather than being due to a distinct source (in
which case it would not occur at a consistent delay following the first sound). Moreover,
reversing the order of presentation can cause an abrupt breakdown of the effect, such
that two sounds are heard rather than one, each with a different location. See Litovsky,
Colburn, et al., 1999, for a review.
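A minimal sketch of the precedence effect's logic, assuming a hypothetical fusion window of about 10 ms (the approximate figure mentioned above) and representing each arriving impulsive sound as an (arrival time, azimuth) pair:

```python
def perceived_location(arrivals, fusion_window_ms=10.0):
    """Toy precedence-effect rule: impulsive sounds arriving within a
    brief window of the first are fused with it, and the first
    arrival's direction dominates the perceived location. The 10-ms
    window is the approximate figure from the text, not a precise
    constant; everything else here is an illustrative simplification.

    `arrivals` is a list of (time_ms, azimuth_deg) pairs, e.g. a
    direct sound followed by echoes from other directions.
    """
    arrivals = sorted(arrivals)
    t0, az0 = arrivals[0]
    later = [(t, az) for t, az in arrivals if t - t0 > fusion_window_ms]
    return az0, later  # fused event's location; unfused echoes are separate

loc, separate = perceived_location([(0.0, -20.0), (4.0, 35.0), (60.0, 35.0)])
# loc == -20.0: the 4-ms echo is fused and does not shift the location;
# the 60-ms echo falls outside the window and is heard as a separate event.
```

The breakdown of fusion at long delays in this sketch parallels the perceptual breakdown observed when echoes arrive too late, or when the order of the pair is reversed.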

Figure 8.14 Reverberation. A, Impulse response for a classroom. This is the sound waveform recorded in
this room in response to a click (impulse) produced
at a particular location in the room. The top arrow
indicates the impulse that reaches the microphone
directly from the source (that thus arrives first). The
lower arrow indicates one of the subsequent reflec­
tions, i.e., echoes. After the early reflections, a grad­
ually decaying reverberation tail is evident (cut off at
250 ms for clarity). The sound signal resulting from
an arbitrary source could be produced by convolving
the sound from the source with this impulse re­
sponse. B, Schematic diagram of the sound reflec­
tions that contribute to the signal that reaches a
listener’s ears in a typical room. The brown box in
the upper right corner depicts the speaker producing
sound. The green lines depict the path taken by the
direct sound to the listener’s ears. Blue and red lines
depict sound reaching the ears after one and two re­
flections, respectively. Sound reaching the ear after
more than two reflections is not shown.

Part B is reprinted with permission from Culling & Akeroyd, 2010.

Reverberation poses a problem for sound recognition in addition to localization because different environments alter the sound from a source in different ways. Large amounts of
reverberation (with prominent echoes at very long delays), as are present in some large
auditoriums, can in fact greatly reduce the intelligibility of speech. Moderate amounts of
(p. 162) reverberation, however, as are present most of the time, typically have minimal effect on our ability to recognize speech and other sounds. Recent work indicates that part
of our robustness to reverberation derives from a process that adapts to the history of
echo stimulation. In reverberant conditions, the intelligibility of a speech utterance has
been found to be higher when preceded by another utterance than when not, an effect
that does not occur in anechoic conditions (Brandewie & Zahorik, 2010). Such results,
like those of the precedence effect, are consistent with the idea that listeners construct a
model of the environment’s contribution to the acoustic input and use it to partially dis­
count the environment when judging properties of a source. Analogous effects have been
found with nonspeech sounds. When listeners hear instrument sounds preceded by
speech or music that has been passed through a filter that “colors” the spectrum, the in­
strument sound is identified differently, as though listeners internalize the filter, assume
it to be an environmental effect, and discount it to some extent when identifying the
sound (Stilp, Alexander, et al., 2010).

VII. Current and Future Directions


Hearing science is one of the oldest areas of psychology and neuroscience, with a strong
research tradition dating back over 100 years, yet there remain many important open
questions. Although research on each of the senses need not be expected to proceed ac­
cording to a single fixed trajectory, the contrast between hearing and vision nonetheless
provides useful reminders of what remains poorly understood in audition. The classic
methods of psychophysics were initially developed largely within hearing research, and
were then borrowed by vision scientists to explore sensory encoding processes in vision.
But while vision science quickly embraced perceptual and cognitive questions, hearing
science remained more focused on the periphery. This can be explained in part by the
challenge of understanding the cochlea, the considerable complexity of the early auditory
system, and the clinical importance of peripheral audition. However, the focus on the pe­
riphery has left many central aspects of audition underexplored, and recent trends in
hearing research reflect a shift toward the study of these neglected mid- and high-level
questions.

One important set of questions concerns the interface of audition with the rest of cogni­
tion, via attention and memory. Attention research ironically also flourished in hearing
early on (with Cherry’s [1953] classic dichotic listening studies), but then largely moved
to the visual domain. Recent years have seen renewed interest (see chapter 11 in this vol­
ume), but there remain many open questions. Much is still unclear about what is repre­
sented about sound in the absence of attention, about how and what auditory attention
selects, and about the role of attention in perceptual organization.


Another promising research area involves working memory. Auditory short-term memory
may have some striking differences from its visual counterpart (Demany, Trost, et al.,
2008) and appears closely linked to auditory scene analysis (Conway, Cowan, et al., 2001).
(p. 163) Studies of these topics in audition also hold promise for informing us more generally about the structure of cognition––the similarities and differences with respect to visu­
al cognition will reveal much about whether attention and memory mechanisms are do­
main general (perhaps exploiting central resources) or specific to particular sensory sys­
tems.

Interactions between audition and the other senses are also attracting increased interest.
Information from other sensory systems likely plays a crucial role in hearing given that
sound on its own often provides ambiguous information. The sounds produced by rain and
applause, for instance, can in some cases be quite similar, such that multisensory integra­
tion (using visual, somatosensory, or olfactory input) may help to correctly recognize the
sound source. Cross-modal interactions in localization (Alais & Burr, 2004) are similarly
powerful. Understanding cross-modal effects within the auditory system (Bizley, Nodal, et
al., 2007; Ghazanfar, 2009; Kayser, Petkov, et al., 2008) and their role in behavior will be a
significant direction of research going forward.

In addition to the uncharted territory in perception and cognition, there remain important
open questions about peripheral processing. Some of these unresolved issues, such as the
mechanisms of outer hair cell function, have great importance for understanding hearing
impairment. Others may dovetail with higher level function. For instance, the role of ef­
ferent connections to the cochlea is still uncertain, with some hypothesizing a role in at­
tention or segregation (Guinan, 2006). The role of phase locking in frequency encoding
and pitch perception is another basic issue that remains controversial and that has wide­
spread relevance to mid-level audition.

As audition continues to evolve as a field, I believe useful guidance will come from a com­
putational analysis of the inference problems the auditory system must solve (Marr,
1982). This necessitates thinking about the behavioral demands of real-world listening
situations, as well as the constraints imposed by the way that information about the world
is encoded in a sound signal. Many of these issues are becoming newly accessible with re­
cent advances in computational power and signal processing techniques.

For instance, one of the most important tasks a listener must perform with sound is sure­
ly that of recognition—determining what it was in the world that caused a sound, be it a
particular type of object, or a type of event, such as something falling on the floor
(Gaver, 1993; Lutfi, 2008). Recognition is computationally challenging because the same
type of occurrence in the world typically produces a different sound waveform each time
it occurs. A recognition system must generalize across the variation that occurs within
categories, but not the variation that occurs across categories (DiCarlo & Cox, 2007). Re­
alizing this computational problem allows us to ask how the auditory system solves it.
One place where these issues have been explored to some extent is speech perception
(Holt & Lotto, 2010). The ideas explored there—about how listeners achieve invariance across different speakers and infer the state of the vocal apparatus along with the accom­
panying intentions of the speaker—could perhaps be extended to audition more generally
(Rosenblum, 2004).

The inference problems of audition can also be better appreciated by examining real-
world sound signals, and formal analysis of these signals seems likely to yield valuable
clues. As discussed in previous sections, statistical analysis of natural sounds has been a
staple of recent computational auditory neuroscience (Harper & McAlpine, 2004; Ro­
driguez, Chen, et al., 2010; Smith & Lewicki, 2006), where natural sound statistics have
been used to explain the mechanisms observed in the peripheral auditory system. Howev­
er, sound analysis seems likely to provide insight into mid- and high-level auditory prob­
lems as well. For instance, the acoustic grouping cues used in sound segregation are al­
most surely rooted to some extent in natural sound statistics, and examining such statis­
tics could reveal unexpected cues. Similarly, because sound recognition must generalize
across the variability that occurs within sounds produced by a particular type of source,
examining this variability in natural sounds may provide clues to how the auditory system
achieves the appropriate invariance in this domain.

The study of real-world auditory competence will also necessitate measuring auditory
abilities and physiological responses with more realistic sound signals. The tones and
noises that have been the staple of classical psychoacoustics and auditory physiology
have many uses, but also have little in common with many everyday sounds. One chal­
lenge of working with realistic signals is that actual recordings of real-world sounds are
often uncontrolled, and typically introduce confounds associated with their familiarity.
Methods of synthesizing novel sounds with naturalistic properties (Cavaco & Lewicki,
2007; McDermott, Wrobleski et al., 2011; (p. 164) McDermott & Simoncelli, 2011) are thus
likely to be useful experimental tools. Simulations of realistic auditory environments are
also increasingly within reach, with methods for generating three-dimensional auditory
scenes (Wightman & Kistler, 1989; Zahorik, 2009) being used in studies of sound localiza­
tion and speech perception in realistic conditions.

We must also consider more realistic auditory behaviors. Hearing does not normally oc­
cur while we are seated in a quiet room, listening over headphones, and paying full atten­
tion to the acoustic stimulus, but rather in the context of everyday activities in which
sound is a means to some other goal. The need to respect this complexity while maintain­
ing sufficient control over experimental conditions presents a challenge, but not one that
is insurmountable. For instance, neurophysiology experiments involving naturalistic be­
havior are becoming more common, with preparations being developed that will permit
recordings from freely moving animals engaged in vocalization (Eliades & Wang, 2008) or
locomotion—ultimately, perhaps a real-world cocktail party.


Author Note
I thank Garner Hoyt von Trapp, Sam Norman-Haignere, Michael Schemitsch, and Sara
Steele for helpful comments on earlier drafts of this chapter, the authors who kindly al­
lowed me to reproduce their figures (acknowledged individually in the figure captions),
and the Howard Hughes Medical Institute for support.

References
Adelson, E. H. (2000). Lightness perception and lightness illusions. In: M. S. Gazzaniga
(Ed.), The new cognitive neurosciences (2nd ed., pp. 339–351). Cambridge, MA: MIT Press.

Ahrens, M. B., Linden, J. F., et al. (2008). Nonlinearities and contextual influences in auditory cortical responses modeled with multilinear spectrotemporal methods. Journal of Neuroscience, 28 (8), 1929–1942.

Alain, C., Arnott, S. R., et al. (2001). “What” and “where” in the human auditory system.
Proceedings of the National Academy of Sciences U S A, 98, 12301–12306.

Alais, D., & Burr, D. E. (2004). The ventriloquist effect results from near-optimal bimodal
integration. Current Biology, 14, 257–262.

ANSI. (2007). American national standard procedure for the computation of loudness of steady sounds (ANSI S3.4-2007).

Ashmore, J. (2008). Cochlear outer hair cell motility. Physiological Review, 88, 173–210.

Attias, H., & Schreiner, C. E. (1997). Temporal low-order statistics of natural sounds. In M. Mozer, M. Jordan, & T. Petsche (Eds.), Advances in Neural Information Processing Systems 9. Cambridge, MA: MIT Press.

Attneave, F., & Olson, R. K. (1971). Pitch as a medium: A new approach to psychophysical
scaling. American Journal of Psychology, 84 (2), 147–166.

Bacon, S. P., & Grantham, D. W. (1989). Modulation masking: Effects of modulation fre­
quency, depth, and phase. Journal of the Acoustical Society of America, 85, 2575–2580.

Banai, K., Hornickel, J., et al. (2009). Reading and subcortical auditory function. Cerebral
Cortex, 19 (11), 2699–2707.

Bandyopadhyay, S., Shamma, S. A., et al. (2010). Dichotomy of functional organization in the mouse auditory cortex. Nature Neuroscience, 13 (3), 361–368.

Barbour, D. L., & Wang, X. (2003). Contrast tuning in auditory cortex. Science, 299, 1073–
1075.

Page 44 of 62
Audition

Baumann, S., Griffiths, T. D., et al. (2011). Orthogonal representation of sound dimensions
in the primate midbrain. Nature Neuroscience, 14 (4), 423–425.

Bee, M. A., & Klump, G. M. (2004). Primitive auditory stream segregation: A neurophysio­
logical study in the songbird forebrain. Journal of Neurophysiology, 92, 1088–1104.

Bee, M. A., & Micheyl, C. (2008). The cocktail party problem: What is it? How can it be
solved? And why should animal behaviorists study it? Journal of Comparative Psychology,
122 (3), 235–251.

Belin, P., Zatorre, R. J., et al. (2000). Voice-selective areas in human auditory cortex. Na­
ture, 403, 309–312.

Bendor, D., & Wang, X. (2005). The neuronal representation of pitch in primate auditory
cortex. Nature, 436, 1161–1165.

Bendor, D., & Wang, X. (2006). Cortical representations of pitch in monkeys and humans.
Current Opinion in Neurobiology, 16, 391–399.

Bendor, D., & Wang, X. (2008). Neural response properties of primary, rostral, and ros­
trotemporal core fields in the auditory cortex of marmoset monkeys. Journal of Neuro­
physiology, 100 (2), 888–906.

Bernstein, J. G. W., & Oxenham, A. J. (2005). An autocorrelation model with place depen­
dence to account for the effect of harmonic number on fundamental frequency discrimi­
nation. Journal of the Acoustical Society of America, 117 (6), 3816–3831.

Bitterman, Y., Mukamel, R., et al. (2008). Ultra-fine frequency tuning revealed in single
neurons of human auditory cortex. Nature, 451 (7175), 197–201.

Bizley, J. K., Nodal, F. R., et al. (2007). Physiological and anatomical evidence for multi­
sensory interactions in auditory cortex. Cerebral Cortex, 17, 2172–2189.

Bizley, J. K., Walker, K. M. M., et al. (2009). Interdependent encoding of pitch, timbre, and
spatial location in auditory cortex. Journal of Neuroscience, 29 (7), 2064–2075.

Bizley, J. K., Walker, K. M. M., et al. (2010). Neural ensemble codes for stimulus periodici­
ty in auditory cortex. Journal of Neuroscience, 30 (14), 5078–5091.

Boemio, A., Fromm, S., et al. (2005). Hierarchical and asymmetric temporal sensitivity in
human auditory cortices. Nature Neuroscience, 8, 389–395.

Brandewie, E., & Zahorik, P. (2010). Prior listening in rooms improves speech intelligibili­
ty. Journal of the Acoustical Society of America, 128, 291–299.

Bregman, A. S. (1978). Auditory streaming is cumulative. Journal of Experimental Psychology: Human Perception and Performance, 4, 380–387.


Bregman, A. S. (1990). Auditory scene analysis: The perceptual organization of sound. Cambridge, MA: MIT Press.

Bregman, A. S., & Campbell, J. (1971). Primary auditory stream segregation and percep­
tion of order in rapid sequences of tones. Journal of Experimental Psychology, 89, 244–
249.

Bronkhorst, A. W. (2000). The cocktail party phenomenon: A review of research on (p. 165) speech intelligibility in multiple-talker conditions. Acustica, 86, 117–128.

Brown, G. J., & Palomaki, K. J. (2006). Reverberation. In D. Wang & G. J. Brown (Eds.), Computational auditory scene analysis: Principles, algorithms, and applications (pp. 209–250). Hoboken, NJ: John Wiley & Sons.

Buus, S., Florentine, M., et al. (1997). Temporal integration of loudness, loudness discrim­
ination, and the form of the loudness function. Journal of the Acoustical Society of Ameri­
ca, 101, 669–680.

Carcagno, S., & Plack, C. J. (2011). Subcortical plasticity following perceptual learning in
a pitch discrimination task. Journal of the Association for Research in Otolaryngology, 12,
89–100.

Cariani, P. A., & Delgutte, B. (1996). Neural correlates of the pitch of complex tones. I.
Pitch and pitch salience. Journal of Neurophysiology, 76, 1698–1716.

Carlyon, R. P. (1991). Discriminating between coherent and incoherent frequency modulation of complex tones. Journal of the Acoustical Society of America, 89, 329–340.

Carlyon, R. P. (2004). How the brain separates sounds. Trends in Cognitive Sciences, 8
(10), 465–471.

Carlyon, R. P., Cusack, R., et al. (2001). Effects of attention and unilateral neglect on
auditory stream segregation. Journal of Experimental Psychology: Human Perception and
Performance, 27 (1), 115–127.

Carlyon, R. P., Micheyl, C., et al. (2004). Auditory processing of real and illusory changes
in frequency modulation (FM) phase. Journal of the Acoustical Society of America, 116
(6), 3629–3639.

Cavaco, S., & Lewicki, M. S. (2007). Statistical modeling of intrinsic structures in impact
sounds. Journal of the Acoustical Society of America, 121 (6), 3558–3568.

Cedolin, L., & Delgutte, B. (2005). Pitch of complex tones: Rate-place and interspike in­
terval representations in the auditory nerve. Journal of Neurophysiology, 94, 347–362.

Cherry, E. C. (1953). Some experiments on the recognition of speech, with one and two
ears. Journal of the Acoustical Society of America, 25 (5), 975–979.

Christianson, G. B., Sahani, M., et al. (2008). The consequences of response nonlineari­
ties for interpretation of spectrotemporal receptive fields. Journal of Neuroscience, 28 (2),
446–455.

Ciocca, V., & Bregman, A. S. (1987). Perceived continuity of gliding and steady-state tones
through interrupting noise. Perception & Psychophysics, 42, 476–484.

Cohen, Y. E., Russ, B. E., et al. (2009). A functional role for the ventrolateral prefrontal
cortex in non-spatial auditory cognition. Proceedings of the National Academy of Sciences
U S A, 106, 20045–20050.

Conway, A. R. A., Cowan, N., et al. (2001). The cocktail party phenomenon revisited: The importance of working memory capacity. Psychonomic Bulletin & Review, 8, 331–335.

Culling, J. F., & Akeroyd, M. A. (2010). Spatial hearing. In C. J. Plack (Ed.), The Oxford handbook of auditory science: Hearing (Vol. 3, pp. 123–144). Oxford, UK: Oxford University Press.

Culling, J. F., & Summerfield, Q. (1995). Perceptual separation of concurrent speech sounds: Absence of across-frequency grouping by common interaural delay. Journal of the Acoustical Society of America, 98 (2), 785–797.

Cusack, R., & Carlyon, R. P. (2004). Auditory perceptual organization inside and outside
the laboratory. In J. G. Neuhoff (Ed.), Ecological psychoacoustics (pp. 15–84). San Diego:
Elsevier Academic Press.

Cutting, J. E. (1975). Aspects of phonological fusion. Journal of Experimental Psychology: Human Perception and Performance, 104, 105–120.

Dallos, P. (2008). Cochlear amplification, outer hair cells and prestin. Current Opinion in
Neurobiology, 18, 370–376.

Darrow, K. N., Maison, S. F., et al. (2006). Cochlear efferent feedback balances interaural
sensitivity. Nature Neuroscience, 9 (12), 1474–1476.

Darwin, C. J. (1984). Perceiving vowels in the presence of another sound: Constraints on formant perception. Journal of the Acoustical Society of America, 76 (6), 1636–1647.

Darwin, C. J. (1981). Perceptual grouping of speech components differing in fundamental frequency and onset-time. Quarterly Journal of Experimental Psychology, 33A, 185–207.

Darwin, C. J. (1997). Auditory grouping. Trends in Cognitive Sciences, 1, 327–333.

Darwin, C. J., & Ciocca, V. (1992). Grouping in pitch perception: Effects of onset asyn­
chrony and ear of presentation of a mistuned component. Journal of the Acoustical Soci­
ety of America, 91, 3381–3390.

Darwin, C. J., & Hukin, R. W. (1997). Perceptual segregation of a harmonic from a vowel
by interaural time difference and frequency proximity. Journal of the Acoustical Society of
America, 102 (4), 2316–2324.

Darwin, C. J., & Hukin, R. W. (1999). Auditory objects of attention: The role of interaural
time differences. Journal of Experimental Psychology: Human Perception and Perfor­
mance, 25 (3), 617–629.

Dau, T., Kollmeier, B., et al. (1997). Modeling auditory processing of amplitude modula­
tion. I. Detection and masking with narrow-band carriers. Journal of the Acoustical Soci­
ety of America, 102 (5), 2892–2905.

David, S. V., Mesgarani, N., et al. (2009). Rapid synaptic depression explains nonlinear
modulation of spectro-temporal tuning in primary auditory cortex by natural stimuli. Jour­
nal of Neuroscience, 29 (11), 3374–3386.

de Cheveigne, A. (2005). Pitch perception models. In C. J. Plack & A. J. Oxenham (Eds.), Pitch (pp. 169–233). New York: Springer-Verlag.

de Cheveigne, A. (2006). Multiple F0 estimation. In D. Wang & G. J. Brown (Eds.), Com­
putational auditory scene analysis: Principles, algorithms, and applications (pp. 45–80).
Hoboken, NJ: John Wiley & Sons.

de Cheveigne, A. (2010). Pitch perception. In C. J. Plack (Ed.), The Oxford handbook of auditory science: Hearing (Vol. 3, pp. 71–104). New York: Oxford University Press.

de Cheveigne, A., & Kawahara, H. (2002). YIN, a fundamental frequency estimator for
speech and music. Journal of the Acoustical Society of America, 111, 1917–1930.

de Cheveigne, A., McAdams, S., et al. (1995). Identification of concurrent harmonic and
inharmonic vowels: A test of the theory of harmonic cancellation and enhancement. Jour­
nal of the Acoustical Society of America, 97 (6), 3736–3748.

Dean, I., Harper, N. S., et al. (2005). Neural population coding of sound level adapts to
stimulus statistics. Nature Neuroscience, 8 (12), 1684–1689.

Delgutte, B., Joris, P. X., et al. (1999). Receptive fields and binaural interactions for virtu­
al-space stimuli in the cat inferior colliculus. Journal of Neurophysiology, 81, 2833–2851.

Demany, L., & Semal, C. (1990). The upper limit of “musical” pitch. Music Perception, 8, 165–176.

Demany, L., Trost, W., et al. (2008). Auditory change detection: Simple sounds are not
memorized better than complex sounds. Psychological Science, 19, 85–91.

Depireux, D. A., Simon, J. Z., et al. (2001). Spectro-temporal response field characteriza­
tion with dynamic ripples in ferret primary auditory cortex. Journal of Neurophysiology,
85 (3), 1220–1234.

DiCarlo, J. J., & Cox, D. D. (2007). Untangling invariant object recognition. Trends in Cog­
nitive Sciences, 11, 333–341.

Durlach, N. I., Mason, C. R., et al. (2003). Note on informational masking. Journal of the
Acoustical Society of America, 113 (6), 2984–2987.

Elgoyhen, A. B., & Fuchs, P. A. (2010). Efferent innervation and function. In P. A. Fuchs
(Ed.), The Oxford handbook of auditory science: The ear (pp. 283–306). Oxford, UK: Ox­
ford University Press.

Elhilali, M., Ma, L., et al. (2009). Temporal coherence in the perceptual organization and
cortical representation of auditory scenes. Neuron, 61, 317–329.

Eliades, S. J., & Wang, X. (2008). Neural substrates of vocalization feedback monitoring in
primate auditory cortex. Nature, 453, 1102–1106.

Escabi, M. A., Miller, L. M., et al. (2003). Naturalistic auditory contrast improves spec­
trotemporal coding in the cat inferior colliculus. Journal of Neuroscience, 23, 11489–
11504.

Fairhall, A. L., Lewen, G. D., et al. (2001). Efficiency and ambiguity in an adaptive neural
code. Nature, 412, 787–792.

Field, D. J. (1987). Relations between the statistics of natural images and the response
profiles of cortical cells. Journal of the Optical Society of America A, 4, 2379–2394.

Fishman, Y. I., Arezzo, J. C., et al. (2004). Auditory stream segregation in monkey auditory
cortex: Effects of frequency separation, presentation rate, and tone duration. Journal of
the Acoustical Society of America, 116, 1656–1670.

Fishman, Y. I., Reser, D. H., et al. (1998). Pitch vs. spectral encoding of harmonic complex
tones in primary auditory cortex of the awake monkey. Brain Research, 786, 18–30.

Fishman, Y. I., & Steinschneider, M. (2010). Formation of auditory streams. In A. Rees &
A. R. Palmer (Eds.), The Oxford handbook of auditory science: The auditory brain (pp.
215–245). Oxford, UK: Oxford University Press.

Formisano, E., Kim, D., et al. (2003). Mirror-symmetric tonotopic maps in human primary
auditory cortex. Neuron, 40 (4), 859–869.

Freyman, R. L., Clifton, R. K., et al. (1991). Dynamic processes in the precedence effect.
Journal of the Acoustical Society of America, 90, 874–884.

Fritz, J. B., David, S. V., et al. (2010). Adaptive, behaviorally gated, persistent encoding of
task-relevant auditory information in ferret frontal cortex. Nature Neuroscience, 13 (8),
1011–1019.

Fritz, J. B., Elhilali, M., et al. (2005). Differential dynamic plasticity of A1 receptive fields
during multiple spectral tasks. Journal of Neuroscience, 25 (33), 7623–7635.

Fritz, J. B., Shamma, S. A., et al. (2003). Rapid task-related plasticity of spectrotemporal
receptive fields in primary auditory cortex. Nature Neuroscience, 6, 1216–1223.

Gardner, W. G. (1998). Reverberation algorithms. In M. Kahrs & K. Brandenburg (Eds.), Applications of digital signal processing to audio and acoustics (pp. 85–131). Norwell, MA: Kluwer Academic.

Gaver, W. W. (1993). What in the world do we hear? An ecological approach to auditory source perception. Ecological Psychology, 5 (1), 1–29.

Ghazanfar, A. A. (2009). The multisensory roles for auditory cortex in primate vocal com­
munication. Hearing Research, 258, 113–120.

Ghitza, O. (2001). On the upper cutoff frequency of the auditory critical-band envelope
detectors in the context of speech perception. Journal of the Acoustical Society of Ameri­
ca, 110 (3), 1628–1640.

Giraud, A., Lorenzi, C., et al. (2000). Representation of the temporal envelope of sounds
in the human brain. Journal of Neurophysiology, 84 (3), 1588–1598.

Glasberg, B. R., & Moore, B. C. J. (1990). Derivation of auditory filter shapes from
notched-noise data. Hearing Research, 47, 103–138.

Goldstein, J. L. (1973). An optimum processor theory for the central formation of the pitch
of complex tones. Journal of the Acoustical Society of America, 54, 1496–1516.

Guinan, J. J. (2006). Olivocochlear efferents: Anatomy, physiology, function, and the mea­
surement of efferent effects in humans. Ear and Hearing, 27 (6), 589–607.

Gygi, B., Kidd, G. R., et al. (2004). Spectral-temporal factors in the identification of envi­
ronmental sounds. Journal of the Acoustical Society of America, 115 (3), 1252–1265.

Hall, D. A., & Plack, C. J. (2009). Pitch processing sites in the human auditory brain. Cere­
bral Cortex, 19 (3), 576–585.

Hall, D. A., Barrett, D. J. K., Akeroyd, M. A., & Summerfield, A. Q. (2005). Cortical repre­
sentations of temporal structure in sound. Journal of Neurophysiology, 94 (11), 3181–
3191.

Hall, J. W., Haggard, M. P., et al. (1984). Detection in noise by spectro-temporal pattern
analysis. Journal of the Acoustical Society of America, 76, 50–56.

Harper, N. S., & McAlpine, D. (2004). Optimal neural population coding of an auditory
spatial cue. Nature, 430, 682–686.

Hawley, M. L., Litovsky, R. Y., et al. (2004). The benefit of binaural hearing in a cocktail
party: Effect of location and type of interferer. Journal of the Acoustical Society of Ameri­
ca, 115 (2), 833–843.

Heffner, H. E., & Heffner, R. S. (1990). Effect of bilateral auditory cortex lesions on sound
localization in Japanese macaques. Journal of Neurophysiology, 64 (3), 915–931.

Heinz, M. G., Colburn, H. S., et al. (2001). Evaluating auditory performance limits: I. One-
parameter discrimination using a computational model for the auditory nerve. Neural
Computation, 13, 2273–2316.

Higgins, N. C., Storace, D. A., et al. (2010). Specialization of binaural responses in ventral
auditory cortices. Journal of Neuroscience, 30 (43), 14522–14532.

Hofman, P. M., Van Riswick, J. G. A., et al. (1998). Relearning sound localization with new
ears. Nature Neuroscience, 1 (5), 417–421.

Holt, L. L., & Lotto, A. J. (2010). Speech perception as categorization. Attention, Percep­
tion, and Psychophysics, 72 (5), 1218–1227.

Houtgast, T. (1989). Frequency selectivity in amplitude-modulation detection. Journal of the Acoustical Society of America, 85, 1676–1680.

Houtsma, A. J. M., & Smurzynski, J. (1990). Pitch identification and discrimination for
complex tones with many harmonics. Journal of the Acoustical Society of America, 87 (1),
304–310.

Hsu, A., Woolley, S. M., et al. (2004). Modulation power and phase spectrum of natural sounds enhance neural encoding performed by single auditory neurons. Journal of Neuroscience, 24, 9201–9211.

Hudspeth, A. J. (2008). Making an effort to listen: Mechanical amplification in the ear. Neuron, 59 (4), 530–545.

Humphries, C., Liebenthal, E., et al. (2010). Tonotopic organization of human auditory
cortex. NeuroImage, 50 (3), 1202–1211.

Ihlefeld, A., & Shinn-Cunningham, B. (2008). Spatial release from energetic and informa­
tional masking in a divided speech identification task. Journal of the Acoustical Society of
America, 123 (6), 4380–4392.

Javel, E., & Mott, J. B. (1988). Physiological and psychophysical correlates of temporal
processes in hearing. Hearing Research, 34, 275–294.

Jenkins, W. M., & Masterton, R. G. (1982). Sound localization: Effects of unilateral lesions
in central auditory system. Journal of Neurophysiology, 47, 987–1016.

Johnson, D. H. (1980). The relationship between spike rate and synchrony in responses of
auditory-nerve fibers to single tones. Journal of the Acoustical Society of America, 68,
1115–1122.

Johnsrude, I. S., Penhune, V. B., et al. (2000). Functional specificity in the right human au­
ditory cortex for perceiving pitch direction. Brain, 123 (1), 155–163.

Joris, P. X., Bergevin, C., et al. (2011). Frequency selectivity in Old-World monkeys corrob­
orates sharp cochlear tuning in humans. Proceedings of the National Academy of
Sciences U S A, 108 (42), 17516–17520.

Joris, P. X., Schreiner, C. E., et al. (2004). Neural processing of amplitude-modulated sounds. Physiological Reviews, 84, 541–577.

Kaas, J. H., & Hackett, T. A. (2000). Subdivisions of auditory cortex and processing
streams in primates. Proceedings of the National Academy of Sciences U S A, 97, 11793–
11799.

Kadia, S. C., & Wang, X. (2003). Spectral integration in A1 of awake primates: Neurons
with single and multipeaked tuning characteristics. Journal of Neurophysiology, 89 (3),
1603–1622.

Kanwisher, N. (2010). Functional specificity in the human brain: A window into the func­
tional architecture of the mind. Proceedings of the National Academy of Sciences U S A,
107, 11163–11170.

Kawase, T., Delgutte, B., et al. (1993). Anti-masking effects of the olivocochlear reflex. II.
Enhancement of auditory-nerve response to masked tones. Journal of Neurophysiology,
70, 2533–2549.

Kayser, C., Petkov, C. I., et al. (2008). Visual modulation of neurons in auditory cortex.
Cerebral Cortex, 18 (7), 1560–1574.

Kidd, G., Arbogast, T. L., et al. (2005). The advantage of knowing where to listen. Journal
of the Acoustical Society of America, 118 (6), 3804–3815.

Kidd, G., Mason, C. R., et al. (1994). Reducing informational masking by sound segrega­
tion. Journal of the Acoustical Society of America, 95 (6), 3475–3480.

Kidd, G., Mason, C. R., et al. (2003). Multiple bursts, multiple looks, and stream coher­
ence in the release from informational masking. Journal of the Acoustical Society of Amer­
ica, 114 (5), 2835–2845.

Kikuchi, Y., Horwitz, B., et al. (2010). Hierarchical auditory processing directed rostrally
along the monkey’s supratemporal plane. Journal of Neuroscience, 30 (39), 13021–13030.

Kluender, K. R., & Jenison, R. L. (1992). Effects of glide slope, noise intensity, and noise
duration on the extrapolation of FM glides through noise. Perception & Psychophysics,
51, 231–238.

Klump, R. G., & Eady, H. R. (1956). Some measurements of interaural time difference thresholds. Journal of the Acoustical Society of America, 28, 859–860.

Kulkarni, A., & Colburn, H. S. (1998). Role of spectral detail in sound-source localization.
Nature, 396, 747–749.

Kvale, M., & Schreiner, C. E. (2004). Short-term adaptation of auditory receptive fields to
dynamic stimuli. Journal of Neurophysiology, 91, 604–612.

Langner, G., Sams, M., et al. (1997). Frequency and periodicity are represented in orthog­
onal maps in the human auditory cortex: Evidence from magnetoencephalography. Jour­
nal of Comparative Physiology, 181, 665–676.

Leaver, A. M., & Rauschecker, J. P. (2010). Cortical representation of natural complex sounds: Effects of acoustic features and auditory object category. Journal of Neuroscience, 30 (22), 7604–7612.

Lerner, Y., Honey, C. J., et al. (2011). Topographic mapping of a hierarchy of temporal re­
ceptive windows using a narrated story. Journal of Neuroscience, 31 (8), 2906–2915.

Lewicki, M. S. (2002). Efficient coding of natural sounds. Nature Neuroscience, 5 (4), 356–363.

Liberman, M. C. (1982). The cochlear frequency map for the cat: Labeling auditory-nerve
fibers of known characteristic frequency. Journal of the Acoustical Society of America, 72,
1441–1449.

Lippmann, R. P. (1997). Speech recognition by machines and humans. Speech Communication, 22, 1–16.

Litovsky, R. Y., Colburn, H. S., et al. (1999). The precedence effect. Journal of the Acousti­
cal Society of America, 106, 1633–1654.

Lomber, S. G., & Malhotra, S. (2008). Double dissociation of “what” and “where” process­
ing in auditory cortex. Nature Neuroscience, 11 (5), 609–616.

Lorenzi, C., Gilbert, G., et al. (2006). Speech perception problems of the hearing impaired
reflect inability to use temporal fine structure. Proceedings of the National Academy of
Sciences U S A, 103, 18866–18869.

Lutfi, R. A. (1992). Informational processing of complex sounds. III. Interference. Journal of the Acoustical Society of America, 91, 3391–3400.

Lutfi, R. A. (2008). Human sound source identification. In W. A. Yost & A. N. Popper (Eds.), Springer handbook of auditory research: Auditory perception of sound sources (pp. 13–42). New York: Springer-Verlag.

Machens, C. K., Wehr, M. S., et al. (2004). Linearity of cortical receptive fields measured with natural sounds. Journal of Neuroscience, 24, 1089–1100.

Macken, W. J., Tremblay, S., et al. (2003). Does auditory streaming require attention? Evi­
dence from attentional selectivity in short-term memory. Journal of Experimental Psychol­
ogy: Human Perception and Performance, 29, 43–51.

Makous, J. C., & Middlebrooks, J. C. (1990). Two-dimensional sound localization by human listeners. Journal of the Acoustical Society of America, 87, 2188–2200.

Marr, D. C. (1982). Vision: A computational investigation into the human representation and processing of visual information. New York: Freeman.

May, B. J., Anderson, M., et al. (2008). The role of broadband inhibition in the rate representation of spectral cues for sound localization in the inferior colliculus. Hearing Research, 238, 77–93.

May, B. J., & McQuone, S. J. (1995). Effects of bilateral olivocochlear lesions on pure-tone
discrimination in cats. Auditory Neuroscience, 1, 385–400.

McAdams, S. (1989). Segregation of concurrent sounds. I. Effects of frequency modulation coherence. Journal of the Acoustical Society of America, 86, 2148–2159.

McAlpine, D. (2004). Neural sensitivity to periodicity in the inferior colliculus: Evidence for the role of cochlear distortions. Journal of Neurophysiology, 92, 1295–1311.

McDermott, J. H. (2009). The cocktail party problem. Current Biology, 19, R1024–R1027.

McDermott, J. H., Lehr, A. J., et al. (2010). Individual differences reveal the basis of conso­
nance. Current Biology, 20, 1035–1041.

McDermott, J. H., & Oxenham, A. J. (2008a). Music perception, pitch, and the auditory
system. Current Opinion in Neurobiology, 18, 452–463.

McDermott, J. H., & Oxenham, A. J. (2008b). Spectral completion of partially masked sounds. Proceedings of the National Academy of Sciences U S A, 105 (15), 5939–5944.

McDermott, J. H., & Simoncelli, E. P. (2011). Sound texture perception via statistics of the
auditory periphery: Evidence from sound synthesis. Neuron, 71, 926–940.

McDermott, J. H., Wrobleski, D., et al. (2011). Recovering sound sources from embedded
repetition. Proceedings of the National Academy of Sciences U S A, 108 (3), 1188–1193.

Meddis, R., & Hewitt, M. J. (1991). Virtual pitch and phase sensitivity of a computer mod­
el of the auditory periphery: Pitch identification. Journal of the Acoustical Society of
America, 89, 2866–2882.

Mershon, D. H., Desaulniers, D. H., et al. (1981). Perceived loudness and visually-deter­
mined auditory distance. Perception, 10, 531–543.

Mesgarani, N., David, S. V., et al. (2008). Phoneme representation and classification in
primary auditory cortex. Journal of the Acoustical Society of America, 123 (2), 899–909.

Micheyl, C., & Oxenham, A. J. (2010a). Objective and subjective psychophysical measures of auditory stream integration and segregation. Journal of the Association for Research in Otolaryngology, 11 (4), 709–724.

Micheyl, C., & Oxenham, A. J. (2010b). Pitch, harmonicity and concurrent sound segregation: Psychoacoustical and neurophysiological findings. Hearing Research, 266, 36–51.

Micheyl, C., Tian, B., et al. (2005). Perceptual organization of tone sequences in the audi­
tory cortex of awake macaques. Neuron, 48, 139–148.

Middlebrooks, J. C. (1992). Narrow-band sound localization related to external ear acoustics. Journal of the Acoustical Society of America, 92 (5), 2607–2624.

Middlebrooks, J. C. (2000). Cortical representations of auditory space. In M. S. Gazzaniga (Ed.), The new cognitive neurosciences (2nd ed., pp. 425–436). Cambridge, MA: MIT Press.

Middlebrooks, J. C., & Green, D. M. (1991). Sound localization by human listeners. Annual
Review of Psychology, 42, 135–159.

Miller, L. M., Escabi, M. A., et al. (2002). Spectrotemporal receptive fields in the lemniscal auditory thalamus and cortex. Journal of Neurophysiology, 87, 516–527.

Miller, R. L., Schilling, J. R., et al. (1997). Effects of acoustic trauma on the representation
of the vowel /e/ in cat auditory nerve fibers. Journal of the Acoustical Society of America,
101 (6), 3602–3616.

Moore, B. C. J. (1973). Frequency difference limens for short-duration tones. Journal of the Acoustical Society of America, 54, 610–619.

Moore, B. C. J. (2003). An introduction to the psychology of hearing. San Diego, CA: Acad­
emic Press.

Moore, B. C. J., & Glasberg, B. R. (1996). A revision of Zwicker’s loudness model. Acta Acustica, 82 (2), 335–345.

Moore, B. C. J., Glasberg, B. R., et al. (1986). Thresholds for hearing mistuned partials as
separate tones in harmonic complexes. Journal of the Acoustical Society of America, 80,
479–483.

Moore, B. C. J., & Gockel, H. (2002). Factors influencing sequential stream segregation.
Acta Acustica, 88, 320–332.

Moore, B. C. J., & Oxenham, A. J. (1998). Psychoacoustic consequences of compression in the peripheral auditory system. Psychological Review, 105 (1), 108–124.

Moshitch, D., Las, L., et al. (2006). Responses of neurons in primary auditory cortex (A1)
to pure tones in the halothane-anesthetized cat. Journal of Neurophysiology, 95 (6), 3756–
3769.

Neff, D. L. (1995). Signal properties that reduce masking by simultaneous, random-frequency maskers. Journal of the Acoustical Society of America, 98, 1909–1920.

Nelken, I., Bizley, J. K., et al. (2008). Responses of auditory cortex to complex stimuli:
Functional organization revealed using intrinsic optical signals. Journal of Neurophysiolo­
gy, 99 (4), 1928–1941.

Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties
by learning a sparse code for natural images. Nature, 381, 607–609.

Palmer, A. R., & Russell, I. J. (1986). Phase-locking in the cochlear nerve of the guinea-pig
and its relation to the receptor potential of inner hair-cells. Hearing Research, 24, 1–15.

Patterson, R. D., Uppenkamp, S., et al. (2002). The processing of temporal pitch and
melody information in auditory cortex. Neuron, 36 (4), 767–776.

Penagos, H., Melcher, J. R., et al. (2004). A neural representation of pitch salience in non­
primary human auditory cortex revealed with functional magnetic resonance imaging.
Journal of Neuroscience, 24 (30), 6810–6815.

Petkov, C. I., Kayser, C., et al. (2006). Functional imaging reveals numerous fields in the
monkey auditory cortex. PLoS Biology, 4 (7), 1213–1226.

Petkov, C. I., Kayser, C., et al. (2008). A voice region in the monkey brain. Nature Neuro­
science, 11, 367–374.

Petkov, C. I., O’Connor, K. N., et al. (2007). Encoding of illusory continuity in primary au­
ditory cortex. Neuron, 54, 153–165.

Plack, C. J. (2005). The sense of hearing. Mahwah, NJ: Lawrence Erlbaum.

Plack, C. J., & Oxenham, A. J. (2005). The psychophysics of pitch. In C. J. Plack, A. J. Oxen­
ham, R. R. Fay, & A. J. Popper (Eds.), Pitch: Neural coding and perception (pp. 7–55). New
York: Springer-Verlag.

Plack, C. J., Oxenham, A. J., et al. (Eds.) (2005). Pitch: Neural coding and perception.
Springer Handbook of Auditory Research. New York: Springer-Verlag.

Poeppel, D. (2003). The analysis of speech in different temporal integration windows: Cerebral lateralization as “asymmetric sampling in time.” Speech Communication, 41, 245–255.

Poremba, A., Saunders, R. C., et al. (2003). Functional mapping of the primate au­
ditory system. Science, 299, 568–572.

Pressnitzer, D., Sayles, M., et al. (2008). Perceptual organization of sound begins in the
auditory periphery. Current Biology, 18, 1124–1128.

Rajan, R. (2000). Centrifugal pathways protect hearing sensitivity at the cochlea in noisy
environments that exacerbate the damage induced by loud sound. Journal of Neuro­
science, 20, 6684–6693.

Rauschecker, J. P., & Tian, B. (2004). Processing of band-passed noise in the lateral audi­
tory belt cortex of the rhesus monkey. Journal of Neurophysiology, 91, 2578–2589.

Rayleigh, L. (1907). On our perception of sound direction. Philosophical Magazine, 3, 456–464.

Recanzone, G. H. (2008). Representation of con-specific vocalizations in the core and belt areas of the auditory cortex in the alert macaque monkey. Journal of Neuroscience, 28 (49), 13184–13193.

Rhode, W. S. (1971). Observations of the vibration of the basilar membrane in squirrel monkeys using the Mossbauer technique. Journal of the Acoustical Society of America, 49, 1218–1231.

Rhode, W. S. (1978). Some observations on cochlear mechanics. Journal of the Acoustical Society of America, 64, 158–176.

Riecke, L., van Opstal, J., et al. (2007). Hearing illusory sounds in noise: Sensory-perceptual transformations in primary auditory cortex. Journal of Neuroscience, 27 (46), 12684–12689.

Roberts, B., & Brunstrom, J. M. (1998). Perceptual segregation and pitch shifts of mis­
tuned components in harmonic complexes and in regular inharmonic complexes. Journal
of the Acoustical Society of America, 104 (4), 2326–2338.

Rodriguez, F. A., Chen, C., et al. (2010). Neural modulation tuning characteristics scale to
efficiently encode natural sound statistics. Journal of Neuroscience, 30, 15969–15980.

Romanski, L. M., Tian, B., et al. (1999). Dual streams of auditory afferents target multiple
domains in the primate prefrontal cortex. Nature Neuroscience, 2 (12), 1131–1136.

Rose, J. E., Brugge, J. F., et al. (1967). Phase-locked response to low-frequency tones in
single auditory nerve fibers of the squirrel monkey. Journal of Neurophysiology, 30, 769–
793.

Rosen, S. (1992). Temporal information in speech: Acoustic, auditory and linguistic aspects. Philosophical Transactions of the Royal Society of London, Series B, 336, 367–373.

Rosenblum, L. D. (2004). Perceiving articulatory events: Lessons for an ecological psychoacoustics. In J. G. Neuhoff (Ed.), Ecological psychoacoustics (pp. 219–248). San Diego, CA: Elsevier Academic Press.

Rothschild, G., Nelken, I., et al. (2010). Functional organization and population dynamics
in the mouse primary auditory cortex. Nature Neuroscience, 13 (3), 353–360.

Rotman, Y., Bar Yosef, O., et al. (2001). Relating cluster and population responses to nat­
ural sounds and tonal stimuli in cat primary auditory cortex. Hearing Research, 152, 110–
127.

Ruggero, M. A. (1992). Responses to sound of the basilar membrane of the mammalian cochlea. Current Opinion in Neurobiology, 2, 449–456.

Ruggero, M. A., & Rich, N. C. (1991). Furosemide alters organ of Corti mechanics: Evi­
dence for feedback of outer hair cells upon the basilar membrane. Journal of Neuro­
science, 11, 1057–1067.

Ruggero, M. A., Rich, N. C., et al. (1997). Basilar-membrane responses to tones at the
base of the chinchilla cochlea. Journal of the Acoustical Society of America, 101, 2151–
2163.

Samson, F., Zeffiro, T. A., et al. (2011). Stimulus complexity and categorical effects in human auditory cortex: An activation likelihood estimation meta-analysis. Frontiers in Psychology, 1, 1–23.

Scharf, B., Magnan, J., et al. (1997). On the role of the olivocochlear bundle in hearing: 16
Case studies. Hearing Research, 102, 101–122.

Schonwiesner, M., & Zatorre, R. J. (2008). Depth electrode recordings show double disso­
ciation between pitch processing in lateral Heschl’s gyrus. Experimental Brain Research,
187, 97–105.

Schonwiesner, M., & Zatorre, R. J. (2009). Spectro-temporal modulation transfer function of single voxels in the human auditory cortex measured with high-resolution fMRI. Proceedings of the National Academy of Sciences U S A, 106 (34), 14611–14616.

Schreiner, C. E., & Urbas, J. V. (1986). Representation of amplitude modulation in the au­
ditory cortex of the cat. I. Anterior auditory field. Hearing Research, 21, 227–241.

Schreiner, C. E., & Urbas, J. V. (1988). Representation of amplitude modulation in the au­
ditory cortex of the cat. II. Comparison between cortical fields. Hearing Research, 32, 49–
64.

Shackleton, T. M., & Carlyon, R. P. (1994). The role of resolved and unresolved harmonics
in pitch perception and frequency modulation discrimination. Journal of the Acoustical So­
ciety of America, 95 (6), 3529–3540.

Shamma, S. A., & Klein, D. (2000). The case of the missing pitch templates: How harmon­
ic templates emerge in the early auditory system. Journal of the Acoustical Society of
America, 107, 2631–2644.

Shannon, R. V., Zeng, F. G., et al. (1995). Speech recognition with primarily temporal
cues. Science, 270 (5234), 303–304.

Shera, C. A., Guinan, J. J., et al. (2002). Revised estimates of human cochlear tuning from
otoacoustic and behavioral measurements. Proceedings of the National Academy of
Sciences U S A, 99 (5), 3318–3323.

Shinn-Cunningham, B. G. (2008). Object-based auditory and visual attention. Trends in Cognitive Sciences, 12 (5), 182–186.

Singh, N. C., & Theunissen, F. E. (2003). Modulation spectra of natural sounds and etho­
logical theories of auditory processing. Journal of the Acoustical Society of America, 114
(6), 3394–3411.

Skoe, E., & Kraus, N. (2010). Auditory brainstem response to complex sounds: A tutorial.
Ear and Hearing, 31 (3), 302–324.

Smith, E. C., & Lewicki, M. S. (2006). Efficient auditory coding. Nature, 439, 978–982.

Smith, Z. M., Delgutte, B., et al. (2002). Chimaeric sounds reveal dichotomies in auditory
perception. Nature, 416, 87–90.

Snyder, J. S., & Alain, C. (2007). Toward a neurophysiological theory of auditory stream
segregation. Psychological Bulletin, 133 (5), 780–799.

Stevens, S. S. (1955). The measurement of loudness. Journal of the Acoustical Society of America, 27 (5), 815–829.

Stilp, C. E., Alexander, J. M., et al. (2010). Auditory color constancy: Calibration to reli­
able spectral properties across nonspeech context and targets. Attention, Perception, and
Psychophysics, 72 (2), 470–480.

Sutter, M. L., & Schreiner, C. E. (1991). Physiology and topography of neurons with multi­
peaked tuning curves in cat primary auditory cortex. Journal of Neurophysiology, 65,
1207–1226.

Sweet, R. A., Dorph-Petersen, K., et al. (2005). Mapping auditory core, lateral belt, and
parabelt cortices in the human superior temporal gyrus. Journal of Comparative Neurolo­
gy, 491, 270–289.

Talavage, T. M., Sereno, M. I., et al. (2004). Tonotopic organization in human auditory cor­
tex revealed by progressions of frequency sensitivity. Journal of Neurophysiology, 91,
1282–1296.

Tansley, B. W., & Suffield, J. B. (1983). Time-course of adaptation and recovery of chan­
nels selectively sensitive to frequency and amplitude modulation. Journal of the Acousti­
cal Society of America, 74, 765–775.

Terhardt, E. (1974). Pitch, consonance, and harmony. Journal of the Acoustical Society of
America, 55, 1061–1069.

Page 59 of 62
Audition

Theunissen, F. E., Sen, K., et al. (2000). Spectral-temporal receptive fields of non-linear
auditory neurons obtained using natural sounds. Journal of Neuroscience, 20, 2315–2331.

Tian, B., & Rauschecker, J. P. (2004). Processing of frequency-modulated sounds in the lat­
eral auditory belt cortex of the rhesus monkey. Journal of Neurophysiology, 92, 2993–
3013.

Tian, B., Reser, D., et al. (2001). Functional specialization in rhesus monkey auditory cor­
tex. Science, 292, 290–293.

Ulanovsky, N., Las, L., et al. (2003). Processing of low-probability sounds by cortical neu­
rons. Nature Neuroscience, 6 (4), 391–398.

van Noorden, L. P. A. S. (1975). Temporal coherence in the perception of tone sequences.


Eindhoven, The Netherlands: The Institute of Perception Research, University of Technol­
ogy.

Walker, K. M. M., Bizley, J. K., et al. (2010). Cortical encoding of pitch: Recent results and
open questions. Hearing Research, 271 (1–2), 74–87.

Wallace, M. N., Anderson, L. A., et al. (2007). Phase-locked responses to pure tones in the
auditory thalamus. Journal of Neurophysiology, 98 (4), 1941–1952.

Wallach, H., Newman, E. B., et al. (1949). The precedence effect in sound localization.
American Journal of Psychology, 42, 315–336.

Warren, J. D., Zielinski, B. A., et al. (2002). Perception of sound-source motion by the hu­
man brain. Neuron, 34, 139–148.

Warren, R. M. (1970). Perceptual restoration of missing speech sounds. Science, 167,


392–393.

Warren, R. M., Obusek, C. J., et al. (1972). Auditory induction: perceptual synthesis of ab­
sent sounds. Science, 176, 1149–1151.

Watson, C. S. (1987). Uncertainty, informational masking and the capacity of immediate


auditory memory. In W. A. Yost & C. S. Watson (Eds.), Auditory processing of complex
sounds (pp. 267–277). Hillsdale, NJ: Erlbaum.

Wightman, F. (1973). The pattern-transformation model of pitch. Journal of the Acoustical


Society of America, 54, 407–416.

Wightman, F., & Kistler, D. J. (1989). Headphone simulation of free-field listening. II. Psy­
chophysical validation. Journal of the Acoustical Society of America, 85 (2), 868–878.

Winslow, R. L., & Sachs, M. B. (1987). Effect of electrical stimulation of the crossed olivo­
cochlear bundle on auditory nerve response to tones in noise. Journal of Neurophysiology,
57 (4), 1002–1021.

Page 60 of 62
Audition

Winter, I. M. (2005). The neurophysiology of pitch. In C. J. Plack, A. J. Oxenham, R. R. Fay,


& A. J. Popper (Eds.), Pitch—Neural coding and perception (pp. 99–146). New York:
Springer-Verlag.

Wong, P. C. M., Skoe, E., et al. (2007). Musical experience shapes human brainstem en­
coding of linguistic pitch patterns. Nature Neuroscience, 10 (4), 420–422.

Woods, T. M., Lopez, S. E., et al. (2006). Effects of stimulus azimuth and intensity on the
single-neuron activity in the auditory cortex of the alert macaque monkey. Journal of Neu­
rophysiology, 96 (6), 3323–3337.

Woolley, S. M., Fremouw, T. E., et al. (2005). Tuning for spectro-temporal modulations as
a mechanism for auditory discrimination of natural sounds. Nature Neuroscience, 8 (10),
1371–1379.

Yates, G. K. (1990). Basilar membrane nonlinearity and its influence on auditory nerve
rate-intensity functions. Hearing Research, 50, 145–162.

Yin, T. C. T., & Kuwada, S. (2010). Binaural localization cues. In A. Rees & A. R. Palmer
(Eds.), The Oxford handbook of auditory science: The auditory brain (pp. 271–302). Ox­
ford, UK: Oxford University Press.

Young, E. D. (2010). Level and spectrum. In A. Rees & A. R. Palmer (Eds.), The Oxford
handbook of auditory science: The auditory brain (pp. 93–124). Oxford, UK: Oxford Uni­
versity Press.

Zahorik, P. (2009). Perceptually relevant parameters for virtual listening simulation of


small room acoustics. Journal of the Acoustical Society of America, 126, 776–791.

Zahorik, P., Bangayan, P., et al. (2006). Perceptual recalibration in human sound localiza­
tion: Learning to remediate front-back reversals. Journal of the Acoustical Society of
America, 120 (1), 343–359.

Zahorik, P., & Wightman, F. L. (2001). Loudness constancy with varying sound source dis­
tance. Nature Neuroscience, 4 (1), 78–83.

Zatorre, R. J. (1985). Discrimination and recognition of tonal melodies after unilateral


cerebral excisions. Neuropsychologia, 23 (1), 31–41.

Zatorre, R. J., & Belin, P. (2001). Spectral and temporal processing in human auditory cor­
tex. Cerebral Cortex, 11, 946–953.

Zatorre, R. J., Belin, P., et al. (2002). Structure and function of auditory cortex: Music and
speech. Trends in Cognitive Sciences, 6 (1), 37–46.

Josh H. McDermott

Page 61 of 62
Audition

Josh H. McDermott, Department of Brain and Cognitive Sciences, Massachusetts In­


stitute of Technology, Cambridge MA

Page 62 of 62
Neural Correlates of the Development of Speech Perception and Comprehension  
Angela Friederici and Claudia Männel
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn

Print Publication Date: Dec 2013 Subject: Psychology, Cognitive Neuroscience
Online Publication Date: Dec 2013 DOI: 10.1093/oxfordhb/9780199988693.013.0009

Abstract and Keywords

The development of auditory language perception proceeds from acoustic features via
phonological representations to words and their relations in a sentence. Neurophysiologi­
cal data indicate that infants discriminate acoustic differences relevant for phoneme cate­
gories and word stress patterns by the age of 2 and 4 months, respectively. Salient
acoustic cues that mark prosodic phrase boundaries (e.g., pauses) are also perceived at
about the age of 5 months and infants learn about the rules according to which phonemes
are legally combined (i.e., phonotactics). At the end of their first year of life, children recognize and produce their first words, and electrophysiological studies suggest that they
establish brain mechanisms to gain lexical representations similar to those of adults. In
their second year of life, children enlarge their lexicon, and electrophysiological data
show that 24-month-olds base their processing of semantic relations in sentences on
brain mechanisms comparable to those observable in adults. At this age, children are also
able to recognize syntactic errors in a sentence, but it takes until 32 months before they
display a brain response pattern to syntactic violations similar to adults. The development
of comprehension of syntactically complex sentences, such as sentences with a noncanon­
ical word order, however, takes several more years before adult-like processes are estab­
lished.

Keywords: phonemes, word stress, prosody, phonotactics, lexicon, semantics, syntax

Introduction
Language acquisition, with its remarkable speed and high levels of success, remains a
mystery. At birth, infants are able to communicate by crying in different ways. From birth
on, infants also distinguish the sound of their native language from that of other lan­
guages. Following these first language-related steps, there is a fast progression in the de­
velopment of perceptive and expressive language skills. At about 4 months, babies start
to babble, the earliest stages of language production. A mere 12 months after birth, most babies start to speak their first words, and about half a year later, they can even produce
short sentences. Finally, at the end of most children’s third year of life, they have ac­
quired at least 500 words and know how to combine them into meaningful utterances.
Thus, they have mastered the entry into their native language: They have acquired a com­
plex system with the typical sounds of a language, these sounds are combined in different
ways to make up a wide vocabulary, and the vocabulary items are related to each other by
means of syntactic rules.

Although developmental research has delivered increasing knowledge about language ac­
quisition (e.g., Clark, 2003; Szagun, 2006), many questions remain. Studying how chil­
dren acquire language is not easily accomplished because a great deal of learning takes
place before the child is able to speak and to show overt responses to what he or she ac­
tually perceives. It is a methodological challenge to develop ways to investigate whether
infants know (p. 172) a particular word before they can demonstrate this by producing it.
The measurement of infants’ brain responses to language input can help to provide infor­
mation about speech perception abilities early in life, and, moreover, they allow us to de­
scribe the neural basis of language perception and comprehension during development.

Measuring Brain Activity in Early Development


There are several neuroimaging methods that enable the measurement of the brain’s re­
action to environmental input such as spoken language. The most frequently used mea­
sures in infants and young children are event-related brain potentials (ERPs), as regis­
tered with electroencephalography (EEG). ERPs reflect the brain’s electrical activity in
response to a particular stimulus with an excellent temporal resolution, thus covering the
high-speed and temporally sequenced sensory and cognitive processes. Each time-locked
average response typically appears as a waveform with several positive or negative peaks
at particular latencies after stimulus onset; and each peak, or component, has a charac­
teristic scalp distribution. Although ERPs deliver restricted spatial information about the
component’s distribution in two-dimensional maps, reliable source reconstruction from
surface data still poses a methodological problem. The polarity (negative/positive inflec­
tion of the waveform relative to baseline) and the latency and the scalp distribution of dif­
ferent components allow us to dissociate perceptual and cognitive processes associated
with them. Specifically, changes within the dimensions of the ERP can be interpreted as
reflecting a slowing down of a particular cognitive process (reflected in the latency), a re­
duction in the processing demands or efficiency (amplitude of a positivity or negativity),
or alterations/maturation of cortical tissue supporting a particular process (topography).
For example, ERP measures allow the investigation of infants’ early ability to detect audi­
tory changes and how the timing and appearance of these perceptual processes vary
through the first year (Kushnerenko, Ceponiene, Balan, Fellman, & Näätänen, 2002a).
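The time-locked averaging that defines an ERP can be made concrete with a short simulation. The sketch below is illustrative only: the sampling rate, trial timing, and component shape are all hypothetical, chosen simply to show how averaging recovers a component's amplitude and latency from noisy single trials.

```python
import numpy as np

# Time-locked averaging behind an ERP, in miniature. All numbers here
# (sampling rate, trial timing, component shape) are hypothetical.
fs = 1000                                     # sampling rate (Hz)
rng = np.random.default_rng(0)
eeg = rng.normal(0.0, 5.0, size=60 * fs)      # 60 s of one-channel noise (microvolts)

# Embed a stereotyped component: a negativity peaking ~200 ms after each stimulus.
onsets = np.arange(1 * fs, 55 * fs, 2 * fs)   # stimulus onsets every 2 s
t = np.arange(int(0.6 * fs)) / fs             # 600-ms response template
template = -8.0 * np.exp(-((t - 0.2) ** 2) / (2 * 0.02 ** 2))
for onset in onsets:
    eeg[onset:onset + template.size] += template

# Cut an epoch around each onset, baseline-correct on the pre-stimulus
# interval, and average: single-trial noise cancels, the component remains.
pre, post = int(0.1 * fs), int(0.5 * fs)      # 100 ms pre- to 500 ms post-stimulus
epochs = np.stack([eeg[o - pre:o + post] for o in onsets])
epochs = epochs - epochs[:, :pre].mean(axis=1, keepdims=True)
erp = epochs.mean(axis=0)

# Peak latency of the negative component, in ms after stimulus onset.
peak_latency_ms = (np.argmin(erp) - pre) / fs * 1000
```

Averaged over trials, the random background activity shrinks while the stimulus-locked deflection survives, which is why a component's latency, amplitude, and (with more channels) topography can be read off the average waveform.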

Only recently, magnetoencephalography (MEG) has started to be used for developmental research. MEG measures the magnetic fields associated with the brain’s electrical activity. Accordingly, this method also captures information processing in the brain with a high temporal resolution. In contrast to EEG, however, it also provides reliable spatial information about the localization of the currents responsible for the magnetic field sources. For example, an MEG study with newborns revealed infants’ instant ability to discriminate speech sounds and, moreover, reliably located this process in the auditory cortices (Kujala et al., 2004). Because movement restrictions limit the use of MEG in developmental research, this method has been primarily applied to sleeping newborns (e.g., Kujala et al., 2004; Sambeth, Ruohio, Alku, Fellman, & Huotilainen, 2008). However, the use of additional head-tracking techniques considerably expands its field of application (e.g., Imada et al., 2006).

A third neuroimaging method, functional near-infrared spectroscopy (fNIRS) or optical topography (OT) (Villringer & Chance, 1997), enables the study of cortical hemodynamic
responses in infants. This method relies on the spectroscopic determination of changes in
hemoglobin concentrations resulting from increased regional blood flow in the cerebral
cortex, which can be assessed through the scalp and skull. Changes in light attenuation
at different wavelengths greatly depend on the concentration changes in oxygenated and
deoxygenated hemoglobin ([oxy-Hb] and [deoxy-Hb]) in the cerebral cortex. Because he­
modynamic responses are only slowly evolving, the temporal resolution of this method is
low, but its spatial resolution is relatively informative, depending on the number of chan­
nels measured (Obrig & Villringer, 2003; Okamoto et al., 2004; Schroeter, Zysset, Wahl, &
von Cramon, 2004). To improve the temporal resolution, event-related NIRS paradigms
have been suggested (Gratton & Fabiani, 2001). The limitations of fNIRS are at the same
time its advantages because the spatial characteristics outrank EEG and the temporal
characteristics are comparable or superior to fMRI, so fNIRS simultaneously delivers
both kinds of information in moderate resolutions. Furthermore, it is, in contrast to MEG
and fMRI, not subject to movement restrictions and seems thus particularly suitable for
infant research. For example, fNIRS was used to locate brain responses to vocal and non­
vocal sounds and revealed voice-sensitive areas in the infant brain (Grossmann, Obereck­
er, Koch, & Friederici, 2010). Another advantage of fNIRS is its compatibility with EEG
and MEG measures, which delivers complementary high temporal information (e.g.,
Grossmann et al., 2008).
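The spectroscopic step described above is often formalized with the modified Beer-Lambert law: the optical-density change at each wavelength is a weighted sum of the [oxy-Hb] and [deoxy-Hb] concentration changes, so measuring at two wavelengths yields a two-equation linear system. A minimal sketch, with placeholder (not calibrated) extinction coefficients and geometry:

```python
import numpy as np

# Modified Beer-Lambert law: at each wavelength lam, the change in optical
# density is a weighted sum of the hemoglobin concentration changes,
#   dOD(lam) = (eps_HbO(lam) * dHbO + eps_HbR(lam) * dHbR) * d * DPF,
# so recordings at two wavelengths give a 2x2 linear system in (dHbO, dHbR).
# The coefficients below are illustrative placeholders, not calibrated values.
eps = np.array([[0.15, 0.39],     # ~760 nm: [eps_HbO, eps_HbR]
                [0.30, 0.18]])    # ~850 nm
d, dpf = 30.0, 6.0                # source-detector distance (mm), pathlength factor

true_dc = np.array([0.002, -0.001])            # assumed true (dHbO, dHbR), in mM
d_od = (eps @ true_dc) * d * dpf               # forward model: simulated dOD at both wavelengths

est_dc = np.linalg.solve(eps * d * dpf, d_od)  # invert to recover the concentration changes
```

In practice the extinction coefficients and differential pathlength factor are taken from the literature for the wavelengths and tissue in question; the algebra, however, is just this 2x2 inversion applied per channel and time point.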

Another method that registers the metabolic demand due to neural signaling is functional
magnetic resonance imaging (fMRI). Here, the resulting changes in oxygenated hemoglo­
bin are measured as (p. 173) blood-oxygen-level-dependent (BOLD) contrast. The temporal
resolution of this method is considerably lower than with EEG/MEG, but its spatial resolu­
tion is excellent. Thus, fMRI studies provide information about the localization of sensory
and cognitive processes, not only in surface layers of the cortex as with fNIRS, but also in
deeper cortical and subcortical structures. So far, this measurement has been primarily
applied with infants while they were asleep in the scanner (e.g., Dehaene-Lambertz, De­
haene, & Hertz-Pannier, 2002; Dehaene-Lambertz et al., 2010; Perani et al., 2010). The
limited number of developmental fMRI studies may be due to both practical issues and
methodological considerations. Movement restrictions during brain scanning make it dif­
ficult to work with children in the scanner. Moreover, there is an ongoing discussion
whether the BOLD signal in children is comparable to the one in adults and whether the
adult models applied so far are appropriate for developmental research (see, e.g., Muzik,
Chugani, Juhasz, Shen, & Chugani, 2000; Rivkin et al., 2004; Schapiro et al., 2004). To ad­
dress the latter problem, recent fMRI studies in infants and young children have used
age-specific brain templates (Dehaene-Lambertz, Dehaene, & Hertz-Pannier, 2002) and
optimized template construction for developmental populations (Fonov et al., 2011; Wilke,
Holland, Altaye, & Gaser, 2008).

The decision to use one of these neuroimaging techniques in developmental research is thus determined by practical matters and, in addition, is highly dependent on the kind of
information sought, that is, the neuronal correlates of information processing in their
temporal or spatial resolution. Ideally, various methods using their complementary abili­
ties should be combined because results from the respective measures all add up to pro­
vide insight into the brain bases of language development in its early stages.

Neural Dispositions of Language in the Infant Brain
For about half a century, developmental researchers have used electrophysiological mea­
sures to investigate the neural basis of language development (for reviews, see Csibra,
Kushnerenko, & Grossmann, 2008; Friederici, 2005; Kuhl, 2004). Methods that allow a
better spatial resolution, (i.e., MEG, fNIRS, and fMRI) have only recently been applied in
language research with infants and young children (for reviews, see Cheour et al., 2004;
Gervain et al., 2011; Leach & Holland, 2010; Lloyd-Fox, Blasi, & Elwell, 2010; Minagawa-
Kawai, Mori, Hebden, & Dupoux, 2008; Vannest et al., 2009).

Although there are only a few studies that have applied the latter methods with infants so
far, their findings strongly point to neural dispositions for language in the infant brain. In
an fNIRS study with sleeping newborns, Pena et al. (2003) observed a left hemispheric
dominance in temporal areas for normal speech compared with backward speech. In an
fMRI experiment with 3-month-olds, Dehaene-Lambertz, Dehaene, and Hertz-Pannier
(2002) also found that speech sounds, collapsed across forward and backward speech,
compared with silence evoked strong left hemispheric activation in the superior temporal
gyrus. This activation included Heschl’s gyrus and extended to the superior temporal sul­
cus and the temporal pole (Figure 9.1). Activation differences between forward and back­
ward speech were observed in the left angular gyrus and the precuneus. An additional
right frontal brain activation occurred only in infants that were awake and was interpret­
ed as reflecting attentional factors.

In a second analysis of the fMRI data from 3-month-olds, Dehaene-Lambertz et al. (2006)
found that the temporal sequence of left hemispheric activations in the different brain ar­
eas was similar to adult patterns, with the activation in the auditory cortex preceding ac­
tivation both in the most posterior and anterior parts of the temporal cortex and in
Broca’s area. The reported early left hemisphere specialization has been shown to be
speech-specific (Dehaene-Lambertz et al., 2010) and, in addition, appears more pronounced for native relative to non-native language input (Minagawa-Kawai et al., 2011).
Specifically, 2-month-olds showed stronger fMRI activation in the left posterior temporal
lobe in response to language than music stimuli (Dehaene-Lambertz et al., 2010). In an
fNIRS study with 4-month-olds, Minagawa-Kawai et al. (2011) reported stronger respons­
es in left temporal regions for speech relative to three nonspeech conditions, with native
stimuli revealing stronger lateralization patterns than non-native stimuli. Left hemispher­
ic superior temporal activation has also been reported as a discriminative response to syl­
lables in a recent MEG study with neonates, 6-month-olds, and 12-month-olds (Imada et
al., 2006). Interestingly, 6- and 12-month-olds additionally showed activation patterns in
inferior frontal regions, but newborns did not yet do this. Together with the results in 3-
month-olds (Dehaene-Lambertz et al., 2006), these findings indicate developmental
changes in motor speech areas (Broca’s area) at an age when infants produce their first
syllable sequences and words.

Figure 9.1 Neural disposition for language in the infant brain. Brain activation of 3-month-old infants in response to speech sounds. A, Averaged brain activation in response to speech sounds (forward speech and backward speech versus rest). B, left panel, Averaged brain activation in awake infants (forward speech versus backward speech). Right panel, Averaged hemodynamic responses to forward speech and backward speech in awake and asleep infants. L, left hemisphere; R, right hemisphere.

Reprinted with permission from Dehaene-Lambertz, Dehaene, & Hertz-Pannier, 2002. Copyright © 2002, American Association for the Advancement of Science.

(p. 174)

Thus, it appears that early during the development, there is a dominance of the left hemi­
sphere for the processing of speech, and particularly native language stimuli, in which
both segmental and suprasegmental information is intact, compared with, for example,
backward speech, in which this information is altered. However, the processing of more
fine-grained phonological features and language-specific contrasts may lateralize later
during the development, once children have advanced from general perceptual abilities to
the attunement to their native language (e.g., Minagawa-Kawai, Mori, Naoi, & Kojima,
2007).

In adults, processing of sentence-level prosody (i.e., suprasegmental information) has been shown to predominantly recruit brain areas in the right hemisphere (Meyer, Alter,
Friederici, Lohmann, & von Cramon, 2002; Meyer, Steinhauer, Alter, Friederici, & von
Cramon, 2004). To investigate the brain basis for prosodic processes in infancy, Sambeth
et al. (2008) used MEG and presented sleeping newborns with varying degrees of prosod­
ic information. For normal continuous speech and singing, infants showed pronounced bi­
lateral brain responses, which, however, dramatically decreased when infants were pre­
sented with filtered low-prosody speech. Similarly, in two fNIRS studies with newborns,
Saito and colleagues observed first, increased bilateral frontal responses to (p. 175) in­
fant-directed compared with adult-directed speech, with the former featuring more pro­
nounced prosodic information (Saito et al., 2007a). Second, infants only showed this
frontal activation pattern for speech with normal prosody, but not for speech with flat
prosody (Saito et al., 2007b). An fNIRS study with 3-month-olds, which directly compared
normal and flattened speech, revealed activation differences in the right temporal-pari­
etal cortex, suggesting a right hemispheric dominance for the processing of sentential
prosody (pitch information) similar to the dominance reported in adults (Homae, Watan­
abe, Nakano, Asakawa, & Taga, 2006). Surprisingly, at 10 months, infants showed anoth­
er activation pattern, with flattened speech evoking stronger responses than normal
speech in right temporal and temporal-parietal regions and bilateral prefrontal regions,
which the authors explained by the additional effort to process unfamiliar pitch contours
in brain regions specialized for prosodic information processing and attention allocation
(Homae, Watanabe, Nakano, Asakawa, & Taga, 2007). Thus, the combined data on infant
prosodic processing suggest that infants are sensitive to prosodic information from early
on, but that the brain activation develops from a bilateral toward a more right lateralized
pattern.

Given these early neuroimaging findings, it seems that the functional neural network on
which language is based, with a left-hemispheric dominance for speech over nonspeech
and right-hemispheric dominance for prosody (Friederici & Alter, 2004; Hickok & Poep­
pel, 2007), is, in principle, established during the first 3 months of life. However, it ap­
pears that neither the specialization of the language-related areas (e.g., Brauer &
Friederici, 2007; Minagawa-Kawai, Mori, Naoi, & Kojima, 2007) nor all of their structural
connections are fully developed from early on (Brauer, Anwander, & Friederici, 2011;
Dubois et al., 2008).


Developmental Stages in Language Acquisition and Their Associated Neural Correlates
The development of auditory language perception proceeds from acoustic features via
phonological representations to the representation of words and their relations in a sen­
tence. From a schematic point of view, two parallel developmental paths can be consid­
ered: one proceeding from acoustic cues to phonemes, and then to words and their mean­
ings, and the other proceeding from acoustic cues to prosodic phrases, to syntactic phras­
es and their relations.

With respect to the first developmental path, neurophysiological data indicate that
acoustic differences in phonemes and word stress patterns are detected by the age of 2 to
4 months. At the end of their first year, children recognize and produce their first words,
and ERP studies suggest that infants have established brain mechanisms necessary to ac­
quire lexical representations in a similar way to adults, although these are still less spe­
cific. In their second year, children enlarge their lexicon, and ERP data show that 24-
month-olds process semantic relations between nouns and verbs in sentences. These
processes resemble those in adults, indicated by children at this age already displaying
an adult-like N400 component reflecting semantic processes.

With respect to the second developmental path, developmental studies show that salient
acoustic cues which mark prosodic phrase boundaries (e.g., pauses) are also perceived at
about the age of 5 months, although it takes some time before less salient cues (e.g.,
changes in the pitch contour) can be used to identify prosodic boundaries that divide the
speech stream into lexical and syntactic units. Electrophysiological data suggest that the
processing of prosodic phrase structure, reflected by the closure positive shift (CPS),
evolves with the emerging ability to process syntactic phrase structure. At the age of 2
years, children are able to recognize syntactic errors in a sentence, reflected by the P600
component. However, the fast automatic syntactic phrase structure building processes,
indicated by the early left anterior negativity (ELAN) in addition to the P600, do not oc­
cur before the age of 32 months. The ability to comprehend syntactically complex sen­
tences, such as those with noncanonical word order (e.g., passive sentences, object-first
sentences), takes a few more years until adult-like processes are established. Diffusion-
tensor imaging data suggest that this progression is dependent on the maturation of the
fiber bundles that connect the language-relevant brain areas in the inferior frontal gyrus
(Broca’s area) and in the superior temporal gyrus (posterior portion). Figure 9.2 gives an
overview of the outlined developmental steps in language acquisition and the following
sections will describe the related empirical evidence in more detail.

From Sounds to Words


Figure 9.2 Developmental stages of language acquisition. Developmental stages are specified in the top row and their associated ERP components in the bottom row. MMN, mismatch negativity.

Modified from Friederici, 2005; Männel & Friederici, 2008.

On the path from sounds to words, infants initially start to process phonological informa­
tion that makes up the actual speech sounds (phonemes) and the rules according to
which these sounds are combined (phonotactic rules). Soon, they process prosodic stress
patterns of words, which help them to recognize lexical units in the speech (p. 176) stream. These information types are accessed before the processing of word meaning.

Phoneme Characteristics
As one crucial step in language acquisition, infants have to tackle the basic speech
sounds of their native language. The smallest sound units of a language, phonemes, are
contrastive from each other, although functionally equivalent. In a given language, a cir­
cumscribed set of approximately 40 phonemes can be combined in different ways to form
unique words. Thus, the meaning of a word changes when one of its component
phonemes is exchanged with another, as when cat becomes pat.

Electrophysiological studies investigated phoneme discrimination using the mismatch paradigm. In this paradigm, two classes of stimuli are repeatedly presented, with one
stimulus occurring relatively frequently (standard) and the other one relatively rarely (de­
viant). The mismatch negativity (MMN) component is a preattentive electrophysiological
response that is evoked by any discriminable change in repetitive auditory stimulation
(Näätänen, 1990). Thus, the mismatch response (MMR) in the ERP is the result of the
brain’s automatic detection of the deviant among the standards. Several ERP studies have examined phoneme discrimination in infants and reported that the ability to detect acoustic
changes in consonant articulation (Dehaene-Lambertz & Dehaene, 1994), consonant du­
ration (Kushnerenko et al., 2001), vowel duration (Leppänen, Pihko, Eklund, & Lyytinen,
1999; Pihko et al., 1999), and vowel type (Cheour et al., 1998) is present between 2 and 4
months of age.
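The logic of the mismatch design, and of reading the MMR off a deviant-minus-standard difference wave, can be sketched in a few lines; every number below (deviant probability, noise level, shape of the simulated deflection) is hypothetical:

```python
import numpy as np

# A mismatch (oddball) design in miniature: frequent standards, ~15% deviants,
# no two deviants in a row. Sequence statistics, noise level, and the shape of
# the simulated mismatch deflection are all hypothetical.
rng = np.random.default_rng(1)
seq = []
for _ in range(400):
    if seq and seq[-1] == "deviant":
        seq.append("standard")
    else:
        seq.append("deviant" if rng.random() < 0.15 else "standard")

# Simulated single-trial responses (800 ms at 250 Hz); deviants carry an extra
# mismatch deflection peaking ~200 ms after stimulus onset.
fs, n_samp = 250, 200
t = np.arange(n_samp) / fs
mmr = -4.0 * np.exp(-((t - 0.2) ** 2) / (2 * 0.03 ** 2))
avg = {}
for label in ("standard", "deviant"):
    n = sum(s == label for s in seq)
    resp = rng.normal(0.0, 3.0, size=(n, n_samp))
    if label == "deviant":
        resp = resp + mmr
    avg[label] = resp.mean(axis=0)

# The mismatch response is read off the difference wave: deviant - standard.
diff_wave = avg["deviant"] - avg["standard"]
peak_ms = np.argmin(diff_wave) / fs * 1000
```

Because standards vastly outnumber deviants, the standard average is the more stable of the two; the subtraction removes activity common to both trial types and leaves the change-detection response.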

For example, Friederici, Friedrich, and Weber (2002) investigated infants’ ability to dis­
criminate between different vowel lengths in phonemes at the age of 2 months. Infants
were presented with two syllables of different duration, /ba:/(baa) versus /ba/(ba), in an
MMR paradigm. Two separate sessions tested the long syllable /ba:/(baa) as deviant in a stream of short syllable /ba/(ba) standards, and short /ba/(ba) as deviant in a stream of
long /ba:/(baa) standards.

Figure 9.3 Syllable discrimination. ERP responses of 2-month-old infants to standard and deviant syllables and difference waves (deviant-standard) for the long syllable /ba:/ and the short syllable /ba/ in an auditory oddball paradigm.

Modified from Friederici, Friedrich, & Weber, 2002.

In Figure 9.3, the ERP difference waves display a positivity with a frontal maximum at
about 500-ms post-syllable onset for deviant processing. However, this positivity was only
present for the deviancy detection of the long syllable in a stream of short syllables but
not vice versa, which can be explained by the greater perceptual saliency of a larger ele­
ment in the context of smaller elements. In adults, the same experimental setting evoked
a pronounced negative deflection at about 200-ms post-stimulus onset in the difference
wave, the typical MMN response to acoustically deviating stimuli. Interestingly, in in­
fants, the response varied depending on their state of alertness; children who were in qui­
et sleep during the experiment showed only a positivity, while children who (p. 177) were
awake showed an adult-like MMN in addition to the positivity. From the data, it follows
that infants as young as 2 months of age are able to discriminate long syllables from short
syllables and that they display a positivity in the ERP as MMR.

Interestingly, language-specific phonemic discrimination is established only later during infants’ development, between the age of 6 and 12 months. Electrophysiological evidence
revealed that younger infants aged 6 and 7 months show discrimination of phonemic con­
trasts that are either relevant or not relevant for their native language, whereas older in­
fants aged 11 and 12 months only display discrimination of the phonemic contrast in their
native language (Cheour et al., 1998; Rivera-Gaxiola, Silva-Pereyra, & Kuhl, 2005). Simi­
larly, Minagawa-Kawai, Mori, Naoi, and Kojima (2007) showed in an fNIRS study that in­
fants tune into their native language-specific phoneme contrast at about the age of 6 to 7
months. However, the left dominance of the phoneme-specific response in the temporal
regions was observed only in infants aged 13 months and older. These results suggest
that phoneme contrasts are initially processed as acoustic rather than linguistic differences until about 12 months, when left hemisphere regions are recruited similarly to adults (Minagawa-Kawai, Mori, Furuya, Hayashi, & Sato, 2002). In infant studies using
the mismatch paradigm, the MMR can appear as either a positive or a negative deflection
in the ERP. For example, Kushnerenko et al. (2001) presented sleeping newborns with
fricatives of different durations and observed negative MMRs, whereas Leppänen et al.
Page 9 of 36
Neural Correlates of the Development of Speech Perception and Comprehension
(1999) and Pihko et al. (1999) reported positive MMRs in sleeping newborns for syllables
with different vowel length. The outcome of ERP responses to auditory change detection
seems to be affected by several factors, for example, the infants’ state of alertness (awake
or asleep). Furthermore, stimulus discrimination difficulty or saliency seems to have an
impact on the discrimination response (Friederici, Friedrich, & Weber, 2002; Morr,
Shafer, Kreuzer, & Kurtzberg, 2002). Also, the transition from a positive to a negative
MMR has been shown to be an effect of advancing maturation (Kushnerenko et al., 2002b;
Morr et al., 2002; Trainor et al., 2003). Despite the differences in the ERP morphology of
the detection of phoneme changes, the combined data suggest that infants’ ability to au­
tomatically discriminate between different phonemes is present from early on.
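The MMR analyses summarized above come down to epoch averaging, a deviant-minus-standard subtraction, and a polarity check in a response window. A minimal Python/NumPy sketch of that arithmetic (the array shapes, the 150–300 ms window, and the function names are illustrative assumptions, not details taken from the studies cited):

```python
import numpy as np

def difference_wave(deviant_epochs, standard_epochs):
    """Deviant-minus-standard difference wave from two arrays of
    single-trial epochs, each shaped (n_trials, n_samples)."""
    return deviant_epochs.mean(axis=0) - standard_epochs.mean(axis=0)

def mmr_polarity(diff, times, window=(0.15, 0.30)):
    """Sign of the mean difference-wave amplitude in the response window:
    +1 for a positive MMR (as reported for young infants), -1 for a
    negative, MMN-like response. Window bounds are assumed here."""
    mask = (times >= window[0]) & (times <= window[1])
    return 1 if diff[mask].mean() > 0 else -1
```

In practice, EEG toolboxes such as MNE-Python wrap this arithmetic in higher-level objects, but the deviant-minus-standard subtraction itself is all that the difference waves in Figure 9.3 require.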

Word Stress
Another important phonological feature that infants have to discover and make use of
during language acquisition is the rule according to which stress is applied to multisyllab­
ic words. For example, English, like German, is a stress-based language and has a bias to­
ward a stress-initial pattern for bisyllabic words (Cutler & Carter, 1987). French, in con­
trast, is a syllable-based language that tends to lengthen the word’s last syllable (Nazzi,
Iakimova, Bertoncini, Frédonie, & Alcantara, 2006).

Behaviorally, it has been shown that even newborns discriminate differently stressed
pseudowords (Sansavini, Bertoncini, & Giovanelli, 1997) and that between 6 and 9
months, infants acquire language-specific knowledge about the stress pattern of possible
word candidates (Jusczyk, Cutler, & Redanz, 1993; Höhle, Bijeljac-Babic, Nazzi, Herold, &
Weissenborn, 2009; Skoruppa et al., 2009). Interestingly, studies revealed that stress pat­
tern discrimination at 6 months is shaped by language experience, as German-learning
(p. 178) but not French-learning infants distinguish between stress-initial and stress-final

pseudowords (Höhle et al., 2009). Similarly, at 9 months, Spanish-learning but not


French-learning infants show discrimination responses, suggesting that French infants,
although they are sensitive to the acoustic differences, do not treat stress as lexically in­
formative (Skoruppa et al., 2009).

Neuroimaging studies that do not require infants’ attention during testing suggest that
infants are sensitive to the predominant stress pattern of their target language as early
as 4 to 5 months of age (Friederici, Friedrich, & Christophe, 2007; Weber, Hahne,
Friedrich, & Friederici, 2004). In an ERP study, Friederici, Friedrich, and Christophe
(2007) tested two groups of 4- to 5-month-old German- and French-learning infants for
their ability to discriminate between different stress patterns. In a mismatch paradigm,
the standard stimuli were bisyllabic pseudowords with stress on the first syllable (baaba),
whereas the deviant stimuli had stress on the second syllable (babaa). The data revealed
that both groups are able to discriminate between the two types of stress patterns (Fig­
ure 9.4). However, results differed with respect to the amplitude of the MMR: Infants
learning German showed a larger effect for the language-nontypical iambic pattern
(stress on the second syllable), whereas infants learning French demonstrated a larger ef­
fect for the language-nontypical trochaic pattern (stress on the first syllable). These re­
sults suggest that the respective nontypical stress pattern is considered deviant both
within the experiment (i.e., rare stimulus in the set) and with respect to an individual
infant’s native language. This finding, in turn, presupposes that infants have established
knowledge about the predominant stress pattern of their target language by the age of 5
months.
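An oddball stimulus list of the kind used in these mismatch studies simply interleaves rare deviants among frequent standards. A sketch of how such a trial sequence might be generated (the 15 percent deviant probability and the no-adjacent-deviants constraint are common design conventions assumed here for illustration, not parameters reported in the studies):

```python
import random

def oddball_sequence(n_trials, p_deviant=0.15, seed=0):
    """Pseudorandom list of 'standard'/'deviant' trial labels in which
    deviants never occur back-to-back (an assumed, but common,
    constraint in mismatch designs)."""
    rng = random.Random(seed)
    seq = []
    for _ in range(n_trials):
        if seq and seq[-1] == "deviant":
            seq.append("standard")  # force a standard after each deviant
        elif rng.random() < p_deviant:
            seq.append("deviant")
        else:
            seq.append("standard")
    return seq
```

With such a constraint the realized deviant rate falls slightly below the nominal probability, which is one reason published studies report the actual standard/deviant counts.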

As behavioral and electrophysiological developmental studies suggest, early syllable identification and stress pattern discrimination support speech segmentation during later acquisition stages, performed by identifying onset and offset boundaries. Accordingly, in a
number of behavioral experiments, Nazzi, Dilley, Jusczyk, Shattuck-Hufnagel, and Jusczyk
(2005) demonstrated that both the type of initial phoneme and the stress pattern influ­
ence word segmentation from fluent speech, with a preference for the predominant pat­
terns of the infants’ native language. Similarly, infants’ word detection is facilitated when
words occur at boundary positions and are thus marked by additional prosodic informa­
tion (Gout, Christophe, & Morgan, 2004; Seidl & Johnson, 2007). Moreover, several stud­
ies that measured children’s later language outcome in lexical-semantic and syntactic do­
mains revealed the predictive value of infants’ early ERP responses to phoneme and
stress pattern contrasts (Friedrich & Friederici, 2010; Kuhl et al., 2008; Tsao, Liu, & Kuhl,
2004).

Regarding the development of word segmentation abilities, behavioral studies have demonstrated that at the age of 7.5 months, infants learning English are able to segment
bisyllabic words with stress on the first syllable from continuous speech but not those
with stress on the second syllable (Jusczyk, Houston, & Newsome, 1999). Segmentation
of stress-initial words was also reported in 9-month-old Dutch-learning infants for both
native and nonnative words, which, however, all followed the same language-specific
stress pattern rules (Houston, Jusczyk, Kuijpers, Coolen, & Cutler, 2000; Kuijpers, Coolen,
Houston, & Cutler, 1998). In contrast, the ability to segment words with stress on the sec­
ond syllable was only observed at the age of 10.5 months in English-learning infants
(Jusczyk, Houston, & Newsome, 1999). For French-learning infants, Nazzi et al. (2006)
found developmental differences between 8 and 16 months for the detection of syllables
and bisyllabic words in fluent speech. Bisyllabic words were detected as one unit only at the age of 16 months. Although no segmentation effect was found for 8-month-olds, 12-
month-olds segmented individual syllables from the speech stream, with more ease in
segmenting the second syllable, which is consistent with the rhythmic features of French.


Figure 9.4 Word stress. ERP responses of 4- to 5-month-olds to standard and deviant stress patterns in
an auditory oddball paradigm. A, Grand-average
ERPs for French infants with the trochaic stress pat­
tern /baaba/ as standard and deviant (upper row) and
the iambic stress pattern /babaa/ as standard and de­
viant (lower row). B, Grand-average ERPs for Ger­
man infants with the trochaic stress pattern /baaba/
as standard and deviant (upper row) and the iambic
stress pattern /babaa/ as standard and deviant (lower
row).

Reprinted with permission from Friederici, Friedrich, & Christophe, 2007.

Electrophysiological studies on word segmentation revealed word recognition responses for Dutch-learning 7-month-olds in the ERP to previously familiarized words (Kooijman,
Johnson, & Cutler, 2008), whereas behavioral studies observed word segmentation for 9-
month-olds, but not yet for 7.5-month-olds (Kuijpers, Coolen, Houston, & Cutler, 1998).
Interestingly, detection of words in sentences by German-learning infants was observed
even at 6 months, when during familiarization, words were particularly prosodically em­
phasized (Männel & Friederici, 2010). For the segmentation of the less familiar final-stress pattern, Dutch-learning 10-month-olds still largely relied on the strong syllable to
launch words (Kooijman, Hagoort, & Cutler, 2009). Similarly, relating to the behavioral
delay of French-learning infants in bisyllabic word segmentation, Goyet, de Schonen, and Nazzi (2010) found that, for French 12-month-olds, ERP responses to (p. 179) bisyllabic stress-final words revealed both whole word and syllable segmentation.

Phonotactics
For successful language learning, infants eventually need to acquire the rules according
to which phonemes may be combined to form a word in a given language. As infants be­
come more familiar with actual speech sounds, they gain probabilistic knowledge about
particular phonotactic rules. This also includes information about which phonemes or
phoneme combinations can legitimately appear at word onsets and offsets. If infants acquire this kind of information early on, it can support the detection of lexical units in continuous speech and thus facilitate the learning of new words.

Behaviorally, it has been shown that phonotactic knowledge about word onsets and off­
sets is present and used for speech segmentation at the age of 9 months, but is not yet
present at 6 months of age (Friederici & Wessels, 1993; Jusczyk, Friederici, Wessels,
Svenkerud, & Jusczyk, 1993). In ERP studies, the N400 component can serve as an elec­
trophysiological marker for studying phonotactic knowledge by enabling the comparison
of brain responses to nonwords that follow the phonotactic rules of a given language and
nonsense words that do not. The N400 component is known to indicate lexical (word
form) and semantic (meaning) processes and is interpreted to mark the effort to integrate
an event into its current or long-term context, with more pronounced N400 amplitudes in­
dicating lexically and semantically unfamiliar or unexpected events (Holcomb, 1993; Ku­
tas & Van Petten, 1994). Regarding phonotactic processing in adults, ERP studies re­
vealed larger N400 amplitudes for pseudowords (i.e., (p. 180) phonotactically legal but
nonexistent in the lexicon) than to real words. In contrast, nonwords (i.e., phonotactically
illegal words) did not elicit an N400 response (e.g., Bentin, Mouchetant-Rostaing, Giard,
Echallier, & Pernier, 1999; Holcomb, 1993; Nobre & McCarthy, 1994). This suggests that
pseudowords trigger search processes for possible lexical entries, but this search fails be­
cause pseudowords do not exist in the lexicon. Nonwords, in contrast, do not initiate a
similar search response because they are not even treated as possible lexical entries as
they already violate the phonotactic rules.
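N400 effects of this kind are conventionally quantified as the mean voltage within a post-stimulus window, compared between conditions. A minimal sketch of that measure (the 400–600 ms window and the function names are assumptions for illustration; the windows used in the literature vary by study and age group):

```python
import numpy as np

def mean_window_amplitude(erp, times, window=(0.40, 0.60)):
    """Mean ERP amplitude within a post-stimulus window; a more negative
    value for one condition than another indicates a larger N400-like
    response."""
    mask = (times >= window[0]) & (times <= window[1])
    return float(erp[mask].mean())

def n400_effect(erp_target, erp_control, times, window=(0.40, 0.60)):
    """Target-minus-control mean amplitude in the window; negative values
    indicate an N400 effect (e.g., pseudowords vs. real words)."""
    return (mean_window_amplitude(erp_target, times, window)
            - mean_window_amplitude(erp_control, times, window))
```

Statistical testing then compares these per-participant window means across conditions, which is how the pseudoword-versus-nonword contrasts described above are evaluated.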

Figure 9.5 Phonotactic rules. ERP data of 12-month-olds, 19-month-olds, and adults in response to phonotactically illegal nonwords and phonotactically legal pseudowords in a picture–word priming paradigm.

Modified from Friedrich & Friederici, 2005a.

In a developmental ERP study, Friedrich and Friederici (2005a) investigated phonotactic knowledge in 12- and 19-month-old toddlers by measuring brain responses to phonotactically legal pseudowords and phonotactically illegal nonwords. In a picture–word priming
paradigm, children were presented with simple colored pictures while simultaneously lis­
tening to words that either correctly labeled the picture content or were pseudowords or
nonwords. The picture content is assumed to initiate lexical-semantic priming, which re­
sults in semantic integration difficulties when the respective labels do not match the pic­
tures, reflected in enhanced N400 amplitudes. As Figure 9.5 illustrates, the ERP respons­
es of 19-month-olds are quite similar to the ones observed in adults because they demon­
strate more negative responses to phonotactically legal pseudowords than to phonotacti­
cally illegal nonwords. Adults show the typical N400 response to pseudowords, starting at
about 400 ms after stimulus onset, whereas in 19-month-olds, the negative deflection to
pseudowords is sustained longer. In contrast, data of 12-month-olds do not reveal differ­
ential ERP responses to pseudowords and nonwords. From these data it follows that, in
contrast to 12-month-olds, who do not yet have this ability, 19-month-olds possess some
phonotactic knowledge (indicated by an N400-like response) and therefore treat pseudo­
words, but not nonwords, as likely referents for picture labels. This implies that nonwords
that do not follow the language-specific phonotactic rules are not considered word candi­
dates and, from very early on, are excluded from further word learning.

Phonological Familiarity
Infants’ emerging efforts to map sounds onto objects (or pictures of objects) have been captured in an additional ERP effect. An ERP study with 11-month-olds suggested a dif­
ferential brain response to known compared with unknown words in the form of a nega­
tivity at about 200 ms after word onset, which could be viewed as a familiarity effect
(Thierry, Vihman, & Roberts, 2003). Using a picture–word priming paradigm with 12- and
14-month-olds, Friedrich and Friederici (2005b) observed an early frontocentral negativi­
ty between 100 and 400 ms for auditory word targets that matched the picture compared
with nonmatching words. This early effect was interpreted as a familiarity effect reflect­
ing the fulfillment of a phonological (word) expectation after seeing the picture of an ob­
ject. At this age, infants seem to have some lexical knowledge, but the specific word form
referring to a given object may not yet be sharply defined. This interpretation is support­
ed by the finding that 14-month-olds show an ERP difference between known words and
phonetically dissimilar known words, but not between known words and phonetically sim­
ilar words (Mills et al., 2004). The available data thus indicate that phonological informa­
tion and semantic knowledge interact at about 14 months of age.

Word Meaning
As described above, the adult N400 component reflects the integration of a lexical ele­
ment into a semantic context (Holcomb, 1993; Kutas & Van Petten, 1994) and can be used
as an ERP template (p. 181) against which the ERPs for lexical-semantic processing during
early development are compared.

Mills, Coffey-Corina, and Neville (1997) investigated infants’ processing of words whose
meaning they knew or did not know. Infants between 13 and 17 months of age showed a
bilateral negativity for unknown words, whereas 20-month-olds showed a left-hemispher­
ic negativity, which was interpreted as a developmental change toward a hemispheric
specialization for word processing (see also Mills et al., 2004). In a more recent ERP
study, Mills and colleagues tested the effect of word experience (training) and vocabulary
size (word production) on lexical processing (Mills, Plunkett, Prat, & Schafer, 2005). In
this word-learning paradigm, 20-month-olds acquired novel words either paired with a
novel object or without an object. After training, the infant ERPs showed a repetition ef­
fect indicated by a reduced N200-500 amplitude to familiar and novel unpaired words,
whereas an increased bilaterally distributed N200-500 was found for novel paired words.
This finding indicates that the N200-500 is linked to word meaning; however, it is not entirely clear whether the N200-500 reflects semantic processes alone or whether phonological familiarity also plays a role. Assuming that this early ERP effect indeed reflects semantic processing, its early onset may be explained by infants’ relatively small vocabularies. A small vocabulary results in a low number of phonologically possible alternative word forms, allowing the brain to respond soon after hearing a word’s first phonemes (see earlier section on phonological familiarity).

A clear semantic-context N400 effect at the word level has been observed for 14- and 19-month-olds, but not yet for 12-month-olds (Friedrich & Friederici, 2004, 2005a, 2005b).
The ERP to words in picture contexts showed a central-parietal, bilaterally distributed
negative-going wave between 400 and 1400 ms, which was more negative for words that
did not match the picture context than those that did (Figure 9.6). Compared with adults,
this N400-like effect reached significance later and lasted longer, which suggests slower
lexical-semantic processes in children. There were also small topographic differences of
the effect because children showed a stronger involvement of frontal electrode sites than
adults. The more frontal distribution could either mean that children’s semantic process­
es are still more image-based (see frontal distribution in adults for picture instead of
word processing; West & Holcomb, 2002) or that children recruit frontal brain regions
that, in adults, are associated with attention (Courchesne, 1990) and increased demands
on language processing (Brauer & Friederici, 2007). In a recent study, Friedrich and
Friederici (2010) found that the emergence of the N400 is not merely age dependent but
also related to the infants’ state of behavioral language development. Twelve-month-olds,
who obtained a high early word production score, already displayed an N400 semantic
priming effect, whereas infants with lower vocabulary rates did not.

Torkildsen and colleagues examined lexical-semantic processes as indicated by the N400 in 2-year-olds. In the first study, using a picture–word priming paradigm, they found that
20-month-olds showed earlier and larger N400 effects for between-category than within-
category violations, pointing to children’s early knowledge about semantic categories
(Torkildsen et al., 2006). In the second study, the authors used a unimodal lexical-seman­
tic priming paradigm with semantically related and unrelated word pairs, and demon­
strated that 24-month-olds reveal a phonological-lexical familiarity effect for related word
pairs and an N400 effect for unrelated word pairs, suggesting that semantic relatedness
priming is functional at the end of children’s second year (Torkildsen et al., 2007).

There are few fMRI studies with children investigating lexical-semantic processes at the
word level. These fMRI studies suggest that a neural network for the processing of words
and their meaning often seen in adults is established by the age of 5 years. For example,
one study used a semantic categorization task with 5- to 10-year-old children and ob­
served activation in the left inferior frontal gyrus and the temporal region as well as in
the left fusiform gyrus, suggesting a left hemispheric language network similar to that in
adults (Balsamo, Xu, & Gaillard, 2006). Another study used a semantic judgment task
requiring the evaluation of the semantic relatedness of two auditorily presented words
(Chou et al., 2006). During this task, 9- to 15-year-olds showed activation in the temporal
gyrus and in the inferior frontal gyri bilaterally. In a recent fMRI language mapping study
with 8- to 17-year-old children, de Guibert et al. (2010) applied two auditory lexical-semantic tasks and two visual phonological tasks and observed selective activations in left frontal and temporal regions.

Figure 9.6 Word meaning. ERP data and difference maps (nonmatching–matching) of 12-month-olds, 14-month-olds, 19-month-olds, and adults in response to
words matching or not matching the picture content
in a picture–word priming paradigm.

Modified from Friedrich & Friederici, 2005a, 2005b.

In summary, in the developmental ERP studies on semantic processing at the word level
we have introduced, two ERP effects have been observed. First, an early negativity in re­
sponse to picture-matching words has been found even in 12-month-olds (p. 182) and can
be interpreted as a phonological familiarity effect. Second, a later central-parietal nega­
tivity for nonmatching words has been observed in 14- and 19-month-olds, an effect re­
ferred to as infant N400. The occurrence of a phonological familiarity effect across all age
groups suggests that not only 14- and 19-month-olds but also 12-month-olds create lexical
expectations from picture contents, revealing that they already possess some lexical-se­
mantic knowledge. However, infants at the latter age do not yet display an N400 semantic
expectancy violation effect present in 14-month-olds, which indicates that the neural
mechanisms of the N400 mature between 12 and 14 months of age. Furthermore, at the
end of their second year, toddlers are sensitive to semantic category relations and seman­
tic relatedness of basic-level words. The finding that the N400 at this age still differs in
latency and distribution from the adult N400 suggests that the underlying brain systems
are still under development. The fact, however, that an N400 effect is present at this age
implies that this ERP component is a useful tool to further investigate semantic process­
ing in young children. In this context, developmental fMRI studies have revealed left-
hemisphere activation patterns for lexical-semantic processes that resemble those of
adults. Direct developmental comparisons, however, suggest age-related activation in­
creases in left inferior frontal regions and left superior temporal regions, indicating
greater lexical control and experience-based gain in lexical representations, respectively
(Booth et al., 2004; Cao et al., 2010; Schlaggar et al., 2002).

From Sounds to Sentences

On the path from sounds to sentences, prosodic information plays a central role. Senten­
tial prosody is crucial for the acquisition of syntactic structure because different acoustic
cues that in combination mark prosodic phrase boundaries often signal syntactic phrase
boundaries. The detection and processing of prosodic phrase boundaries thus facilitate
the segmentation of linguistically relevant units from continuous speech and provide an
easy entry into later lexical and syntactic learning (see Gleitman & Wanner, 1982).

Sentence-Level Prosody
Intonational phrase boundaries (IPBs) mark the largest units in phrasal prosody, roughly
(p. 183) corresponding to syntactic clauses, and are characterized by several acoustic

cues, namely, preboundary lengthening, pitch change, and pausing (Selkirk, 1984). Be­
haviorally, it has been shown that adult listeners make use of prosodic boundaries in the
interpretation of spoken utterances (e.g., Schafer, Speer, Warren, & White, 2000). Simi­
larly, developmental studies indicate that infants perceive larger linguistic units in contin­
uous speech based on prosodic boundary cues. Although 6-month-old English-learning in­
fants detect clauses in continuous speech, they cannot yet reliably identify syntactic
phrases in continuous speech (Nazzi, Kemler Nelson, Jusczyk, & Jusczyk, 2000; Seidl,
2007; Soderstrom, Nelson, & Jusczyk, 2005; Soderstrom, Seidl, Nelson, & Jusczyk, 2003).
In contrast, 9-month-olds demonstrate this ability at both clause and phrase level (Soder­
strom et al., 2003). Thus, the perception of prosodic cues that, in combination, signal
boundaries appears to be essential for the structuring of the incoming speech signal and
enables further speech analyses.

In adult ERP studies, the offset of IPBs is associated with a positive-going deflection with
a central-parietal distribution, the closure positive shift (CPS) (Pannekamp, Toepel, Alter, Hahne, & Friederici,
2005; Steinhauer, Alter, & Friederici, 1999). This component has been interpreted as an
indicator of the closure of prosodic phrases by IPBs. The CPS has been shown to be not a
mere reaction to the acoustically salient pause (lower-level processing), but rather an in­
dex for the underlying linguistic process of prosodic structure perception (higher-level
processing) because it is still present when the pause is deleted (Steinhauer, Alter, &
Friederici, 1999).

To investigate the electrophysiology underlying prosodic processing at early stages of language acquisition, a recent ERP study examined 5-month-olds’ ability to process IPBs
with and without a boundary pause (Männel & Friederici, 2009). Infants listened to sen­
tences with two different prosodic realizations determined by their particular syntactic
structure: sentences containing an IPB (e.g., Tommi verspricht, # Papa zu helfen [Tommi
promises to help Papa]), and sentences without an IPB (e.g., Tommi verspricht Papa zu
schlafen [Tommi promises Papa to sleep]). In a first experiment, 5-month-olds showed no
CPS in response to IPBs; instead, they demonstrated an obligatory ERP response to sen­
tence continuation after the pause. In a second experiment in which the boundary pause
had been deleted and only preboundary lengthening and pitch change signaled the IPBs,
another group of 5-month-olds did not reveal the obligatory ERP response observed previously. In contrast, adults showed a CPS in addition to obligatory ERP responses independent of the presence of the boundary pause (see also Steinhauer, Alter, & Friederici, 1999).
acoustic cues such as pauses in the speech input, and that they process speech interrup­
tions at lower perceptual levels. However, they do not yet show higher-level processing of
combined prosodic boundary cues, reflected by the CPS.

ERP studies in older children examined when, during language learning, the processes
associated with the CPS emerge by exploring the relationship between prosodic boundary
perception and syntactic knowledge (Männel & Friederici, 2011). ERP studies on the pro­
cessing of phrase structure violations have revealed a developmental shift between
children’s second and third year (Oberecker, Friedrich, & Friederici, 2005; Oberecker &
Friederici, 2006; see below). Accordingly, children were tested on IPB processing before
this developmental phase, at 21 months, and after this phase, at 3 and 6 years of age. As
can be seen from Figure 9.7, 21-month-olds do not yet show a positive shift in response to
IPBs, although 3- and 6-year-olds do. These results indicate that prosodic structure pro­
cessing, as indicated by the CPS, does not emerge until some knowledge of syntactic
phrase structure has been established.

The combined ERP findings on prosodic processing in infants and children suggest that
during early stages of language acquisition, infants initially rely on salient acoustic as­
pects of prosodic information that are likely contributors to the recognition of prosodic
boundaries. Children may initially detect prosodic breaks through lower-level processing
mechanisms until a degree of syntactic structure knowledge is formed through continued
language experience that, in turn, reinforces the ability of children to perceive prosodic
phrasing at a cognitive level.

The use of prosodic boundary cues for later language learning has been shown in lexical
acquisition (Gout, Christophe, & Morgan, 2004; Seidl & Johnson, 2007), and in the acqui­
sition of word order regularities (Höhle, Weissenborn, Schmitz, & Ischebeck, 2001). Thus,
from a developmental perspective, the initial analysis and segmentation of larger linguis­
tically relevant units based on prosodic boundary cues seems to be particularly important
during language acquisition and likely facilitates bootstrapping into smaller syntactic and
lexical units in the speech signal later in children’s development.


Figure 9.7 Sentence-level prosody. ERP data and difference maps (with IPB–without IPB) of 21-month-
olds, 3-year-olds, and 6-year-olds in response to sen­
tences with and without intonational phrase bound­
aries (IPB).

Modified from Männel, 2011.

(p. 184)

Sentence-Level Semantics
Sentence processing requires not only the identification of linguistic units but also the
maintenance of the related information in working memory and the integration of differ­
ent information over time. To understand the meaning of a sentence, the listener has to
possess semantic knowledge about nouns and verbs as well as their respective relation­
ship (for neural correlates of developmental differences between noun and verb process­
ing, see Li, Shu, Liu, & Li, 2006; Mestres-Misse, Rodriguez-Fornells, & Münte, 2010; Tan
& Molfese, 2009). To investigate whether children already process word meaning and se­
mantic relations in sentential context, the semantic violation paradigm can be applied
with semantically correct and incorrect sentences such as The king was murdered and
The honey was murdered, respectively (Friederici, Pfeifer, & Hahne, 1993; Hahne &
Friederici, 2002). This paradigm uses the N400 as an index of semantic integration abili­
ties, with larger N400 amplitudes for higher integration efforts of semantically inappro­
priate words into their context. The semantic expectation of a possible sentence ending,
for example, is violated in The honey was murdered because the verb at the end of the
sentence (murdered) does not semantically fit the meaning that was set up by the noun
in the beginning (honey). In adult ERP studies, an N400 has been found in response to
such semantically unexpected sentence endings (Friederici, Pfeifer, & Hahne, 1993;
Hahne & Friederici, 2002).

Friedrich and Friederici (2005c) studied the ERP responses to semantically correct and
incorrect sentences in 19- and 24-month-old children. Semantically incorrect sentences
contained objects that violated the selection restrictions of the preceding verb, as in The
cat drinks the ball in contrast to The child rolls the ball. For both age groups, the sen­
tence endings of semantically incorrect sentences evoked N400-like effects in the ERP,
with a maximum at central-parietal electrode sites (Figure 9.8). In comparison to the
adult data, the negativities in children started at about the same time (i.e., at around 400
ms post-word onset) but were longer lasting. This suggests that semantically unexpected
nouns that violate the selection restrictions of the preceding verb also initiate semantic
integration processes in children but that these integration efforts are maintained longer
than in adults. The developmental ERP data indicate that even at the age of 19 and 24

Page 19 of 36
Neural Correlates of the Development of Speech Perception and Compre­
hension
months, children are able to process semantic relations between words in sentences in a
similar manner to adults.

ERP studies on the processing of sentential lexical-semantic information have also report­
ed N400-like responses to semantically incorrect sentences in older children, namely 5- to
15-year-olds (Atchley et al., 2006; Hahne et al., 2004; Holcomb, Coffey, & Neville, 1992).
Similarly, Silva-Pereyra and colleagues found that sentence endings that semantically vio­
lated the preceding sentence phrases evoked several anteriorly distributed negative
peaks in 3- and 4-year-olds, whereas in 30-month-olds, an anterior negativity between
500- and 800-ms after word onset occurred (Silva-Pereyra, Klarman, Lin, & Kuhl, 2005;
Silva-Pereyra, Rivera-Gaxiola, & Kuhl, 2005). Although these studies revealed differential
responses to semantically incorrect and correct sentences in young children, the distribu­
tion of these negativities did not match the usual central-parietal maximum of the N400
seen in adults.

Figure 9.8 Sentence-level lexical-semantic information. ERP data and difference maps (incorrect–correct) of 19-month-olds, 24-month-olds, and adults in response to the sentence endings of semantically correct and incorrect sentences in a semantic violation paradigm.

Modified from Friedrich & Friederici, 2005c.

Despite the different effects reported in the ERP studies on sentential semantic process­
ing, the current ERP studies suggest that semantic processes at sentence level, as reflect­
ed by an N400-like response, are, in principle, present at the end of children’s second
year of life. However, it takes a few more years (p. 185) before the neural network under­
lying these processes is established in an adult-like manner.

A recent fMRI study investigating the neural network underlying sentence-level semantic
processes in 5- to 6-year-old children and adults provides some evidence for the differ­
ence between the neural network recruited in children and adults (Brauer & Friederici,

2007). Activation in children was found bilaterally in the superior temporal gyri and in
the inferior and middle frontal gyri for the processing of correct sentences and semanti­
cally incorrect sentences. Compared with adults, the children’s language network was
less lateralized, was less specialized with respect to different aspects of language pro­
cessing (semantics versus syntax, see also below), and engaged additional areas in the in­
ferior frontal cortex bilaterally. Another fMRI study examined lexical-semantic decisions
for semantically congruous and incongruous sentences in older children, aged 7 to 10
years, and adults (Moore-Parks et al., 2010). Overall, the results suggested that by the
end of children’s first decade, they employ a similar cortical network in semantic process­
ing as adults, including activation in left inferior frontal, left middle temporal, and bilater­
al superior temporal gyri. However, results also revealed developmental differences, with
adults showing greater activation in the left inferior frontal gyrus, left supramarginal
gyrus, and left inferior parietal lobule as well as motor-related regions.

Syntactic Rules
In any language, a well-defined rule system determines the composition of lexical ele­
ments, thus giving the sentence its structure. The analysis of syntactic relations between
words and phrases is a complicated process, yet children have acquired the basic syntac­
tic rules of their native language (p. 186) by the end of their third year (see Guasti, 2002;
Hirsh-Pasek, & Golinkoff, 1996; Szagun, 2006). For successful sentence comprehension,
two aspects of syntax processing appear to be of particular relevance: first, the structure
of each phrase that has to be built on the basis of word category information; and second,
the grammatical relationship between the various sentence elements, which has to be es­
tablished in order to allow the interpretation of who is doing what to whom.

Figure 9.9 Syntactic rules. ERP data of 24-month-olds, 32-month-olds, and adults in response to syntactically correct and incorrect sentences in a syntactic violation paradigm.

Modified from Oberecker, Friedrich, & Friederici, 2005.

Adult ERP and fMRI studies have investigated the neural correlates of syntactic process­
ing during sentence comprehension by focusing on two aspects: phrase structure build­
ing and the establishment of grammatical relations and thereby the sentence’s interpreta­
tion. Studies of the former have used the syntactic violation paradigm (e.g., Atchley et al.,
2006; Friederici, Pfeifer, & Hahne, 1993). In this paradigm, syntactically correct and syn­
tactically incorrect sentences are presented, with the latter having morphosyntactic,
phrase structure, or tense violations. In the ERP response to syntactically incorrect sen­
tences containing phrase structure violations, two components have been observed. The
first is the ELAN, an early anterior negativity, which is interpreted to reflect highly auto­
matic phrase structure building processes (Friederici, Pfeifer, & Hahne, 1993; Hahne &
Friederici, 1999). The second is the P600, a later-occurring central-parietal positivity,
which is interpreted to indicate processes of syntactic integration (Kaan et al., 2000) and
controlled processes of syntactic reanalysis and repair (Friederici, Hahne, & Mecklinger,
1996; Osterhout & Holcomb, 1993). This biphasic ERP pattern in response to phrase
structure violations has been observed for both passive and active sentence constructions
(Friederici, Pfeifer, & Hahne, 1993; Hahne, Eckstein, & Friederici, 2004; Hahne &
Friederici, 1999; Rossi, Gugler, Hahne, & Friederici, 2005).

Several developmental ERP studies have examined at what age children process phrase
structure violations and therefore show the syntax-related ERP components ELAN and
P600 as observed in adults (Oberecker, Friedrich, & Friederici, 2005; Oberecker & Friederici, 2006). In these experiments, 24- and 32-month-old German children listened to syntactically correct sentences and incorrect sentences that comprised incomplete prepositional
phrases. For example, the noun after the preposition was omitted as in *The lion in the
___ roars versus The lion roars. As illustrated in Figure 9.9, the adult data revealed the ex­
pected biphasic ERP pattern in response to the sentences containing a phrase structure
violation. The ERP responses of 32-month-old children showed a similar ERP pattern, although both components appeared in later time windows than the adult data. Interestingly, 24-month-old children also showed a difference between correct and incorrect sentences; however, in this age group, only a P600 but no ELAN occurred.

Recently, Bernal, Dehaene-Lambertz, Millotte, and Christophe (2010) demonstrated that 24-month-old French children compute syntactic structure when listening to spoken sentences. The authors report an early left-lateralized ERP response for word category violations (i.e., when an expected verb was incorrectly replaced by a noun, or vice versa). Silva-Pereyra and colleagues examined the processing of tense violations in sentences in
children between 30 and 48 months (Silva-Pereyra et al., 2005; (p. 187) Silva-Pereyra,
Rivera-Gaxiola, & Kuhl, 2005). The ERPs to incorrect sentences revealed a late positivity
for the older children and a very late-occurring positivity for the 30-month-olds. In a recent series of ERP experiments, Silva-Pereyra, Conboy, Klarman, and Kuhl (2007)
studied syntactic processing in 3-year-olds, using natural sentences and sentences with­
out semantic information (so-called jabberwocky sentences) in which content words are
replaced by pseudowords. Children were presented with correct sentences and incorrect
sentences containing phrase structure violations. For the natural sentences, children
showed two positivities in response to the syntactic violations, whereas for the syntacti­
cally incorrect jabberwocky sentences, two negativities were observed. This ERP pattern
is certainly different from that in adults, who show an ELAN and a P600 in normal and
jabberwocky sentences, with a constant amplitude of the ELAN and a reduced P600 for
jabberwocky sentences, in which integration is not necessary (Hahne & Jescheniak, 2001;
Yamada & Neville, 2007).

Hahne, Eckstein, and Friederici (2004) investigated the processing of phrase structure vi­
olations in syntactically more complex, noncanonical sentences (i.e., passive sentences
such as The boy was kissed by the girl). In these sentences, the first noun (the boy) is not
the actor, which makes the interpretation more difficult than in active sentences. When a
syntactic violation occurred in passive sentences, the ELAN-P600 pattern was evoked in
7- to 13-year-old children. Six-year-olds, however, only displayed a late P600.

The combined ERP results point to developmental differences suggesting that automatic
syntactic processes, reflected by the ELAN, are present later during language develop­
ment than processes reflected by the P600. Moreover, the adult-like ERP pattern is
present earlier for active than for passive sentences. This developmental course is in line
with behavioral findings indicating that the processing of noncanonical sentences only develops late, after the age of 5 years and, depending on the syntactic structure, only around the age of 7 years (Dittmar, Abbot-Smith, Lieven, & Tomasello, 2008).

The neural network underlying syntactic processes in the developing brain has recently
been investigated in an fMRI study with 5- to 6-year-olds using the syntactic violation par­
adigm (Brauer & Friederici, 2007). Sentences containing a phrase structure violation bi­
laterally activated the superior temporal gyri and the inferior and middle frontal gyri
(similar to correct and semantically incorrect sentences) but, moreover, specifically acti­
vated left Broca’s area. Compared with that in adults, this activation pattern was less lat­
eralized, less specific, and more extended. A time course analysis of the perisylvian acti­
vation across correct and incorrect sentences also revealed developmental differences. In
contrast to that in adults, children’s inferior frontal cortex responded much later than
their superior temporal cortex (Figure 9.10). Moreover, in contrast to adults, children dis­
played a temporal primacy of right-hemispheric over left-hemispheric activation (Brauer,
Neumann & Friederici, 2008), which suggests a strong reliance on right-hemisphere
prosodic processes during auditory sentence comprehension in childhood. In a recent fMRI study with 10- to 16-year-old children, Yeatman, Ben-Shachar, Glover, and Feldman
(2010) investigated sentence processing by systematically varying syntactic complexity
and observed broad activation patterns in frontal, temporal, temporal-parietal and cingu­
late regions. Independent of sentence length, syntactically more complex sentences
evoked stronger activation in the left temporal-parietal junction and the right superior
temporal gyrus. Interestingly, activation changes in frontal regions correlated with vocab­
ulary and syntax perception measures. Thus, individual differences in activation patterns
demonstrate that auditory sentence comprehension is based on a dynamic and distrib­
uted network that is modulated by age, language skills, and task demands.


Conclusion

Figure 9.10 Temporal organization of cortical activation during auditory sentence comprehension. Brain activation of adults and children in sagittal section (x = −50) and horizontal section (z = 2). Data are masked by random-effects activation maps at z = 2.33 and display a color coding for time-to-peak values in active voxels between 3.0 and 8.0 seconds. The lines indicate the cut for the corresponding section. Note the very late response in the inferior frontal cortex in children and their hemispheric differences in this region. Inserted diagrams demonstrate examples of BOLD responses to sentence comprehension in Broca’s area and in Heschl’s gyrus.

Reprinted with permission from Brauer, Neumann, & Friederici, 2008.

The results of the reported behavioral and neuroimaging studies broadly cover phonologi­
cal/prosodic, semantic, and syntactic aspects of language acquisition during the first
years of life. In developmental research, ERPs are well established and often the method
of choice; however, MEG, NIRS, and fMRI have recently been adjusted for use in develop­
mental populations. Because the ERP method delivers information about the neural corre­
lates of different aspects of language processing, it is an excellent tool for the investiga­
tion of the various developmental stages in language acquisition. More specifically, a par­
ticular ERP component, the MMR, which reflects discrimination not only of acoustic but
also of phonological features, can thus be used to examine very early stages of language
acquisition, even in newborns. A further ERP component that indicates lexical and seman­
tic processes in adults, the N400, has been registered in 14-month-olds, but has not been
found in 12-month-olds, and can (p. 188) be used to investigate phonotactic knowledge,
word knowledge, and knowledge of lexical-semantic relations between basic-level words
and verbs and their arguments in sentences. For the syntactic domain, an adult-like
biphasic ERP pattern, the ELAN-P600, is not yet present in 24-month-olds but is in 32-
month-old children for the processing of structural dependencies within phrases, thus

characterizing the developmental progression of syntax acquisition. Other methods, par­
ticularly fMRI, deliver complementary evidence that the neural basis underlying specific
aspects of language processing, such as semantics and syntax, is still under development
for a few more years before adult-like language processes are achieved.

In summary, neuroimaging methods, in addition to behavioral studies, provide relevant information on various aspects of language processing. Although developmental research is still far from a detailed outline of the exact steps in language acquisition, the use of sophisticated neuroscientific methods with high temporal or spatial resolution allows researchers to study language development from very early on and to gain a more fine-grained picture of the language acquisition process and its neural basis.

References
Atchley, R. A., Rice, M. L., Betz, S. K., Kwasney, K. M., Sereno, J. A., & Jongman, A. (2006).
A comparison of semantic and syntactic event related potentials generated by children
and adults. Brain & Language, 99, 236–246.

Balsamo, L. M., Xu, B., & Gaillard, W. D. (2006). Language lateralization and the role of the
fusiform gyrus in semantic processing in young children. NeuroImage, 31 (3), 1306–1314.

Bentin, S., Mouchetant-Rostaing, Y., Giard, M. H., Echallier, J. F., & Pernier, J. (1999). ERP
manifestations of processing printed words at different psycholinguistic levels: Time
course and scalp distribution. Journal of Cognitive Neuroscience, 11 (3), 235–260.

Bernal, S., Dehaene-Lambertz, G., Millotte, S., & Christophe, A. (2010). Two-year-olds (p. 189) compute syntactic structure online. Developmental Science, 13 (1), 69–76.

Booth, J. R., Burman, D. D., Meyer, J. R., Gitelman, D. R., Parrish, T. B., & Mesulam, M. M.
(2004). Development of brain mechanisms for processing orthographic and phonologic
representations. Journal of Cognitive Neuroscience, 16 (7), 1234–1249.

Brauer, J., Anwander, A. & Friederici, A. D. (2011). Neuroanatomical prerequisites for lan­
guage functions in the maturing brain. Cerebral Cortex, 21, 459–466.

Brauer, J., & Friederici, A. D. (2007). Functional neural networks of semantic and syntac­
tic processes in the developing brain. Journal of Cognitive Neuroscience, 19 (10), 1609–
1623.

Brauer, J., Neumann, J., & Friederici, A. D. (2008). Temporal dynamics of perisylvian acti­
vation during language processing in children and adults. NeuroImage, 41 (4), 1484–
1492.

Cao, F., Khalid, K., Zaveri, R., Bolger, D. J., Bitan, T., & Booth, J. R. (2010). Neural corre­
lates of priming effects in children during spoken word processing with orthographic de­
mands. Brain & Language, 114 (2), 80–89.

Cheour, M., Ceponiene, R., Lehtokoski, A., Luuk, A., Allik, J., Alho, K., et al. (1998). Devel­
opment of language-specific phoneme representations in the infant brain. Nature Neuro­
science, 1, 351–353.

Cheour, M., Imada, T., Taulu, S., Ahonen, A., Salonen, J., & Kuhl, P. K. (2004). Magnetoen­
cephalography is feasible for infant assessment of auditory discrimination. Experimental
Neurology, 190, 44–51.

Chou, T. L., Booth, J. R., Burman, D. D., Bitan, T., Bigio, J. D., Lu, D., & Cone, N. E. (2006).
Developmental changes in the neural correlates of semantic processing. NeuroImage, 29,
1141–1149.

Clark, E. V. (2003). First language acquisition. Cambridge, UK: Cambridge University Press.

Courchesne, E. (1990). Chronology of postnatal human brain development: Event-related potential, positron emission tomography, myelogenesis, and synaptogenesis studies. In J. W. Rohrbaugh, R. Parasuraman, & R. Johnson (Eds.), Event-related brain potentials: Basic issues and applications (pp. 210–241). New York: Oxford University Press.

Csibra, G., Kushnerenko, E., & Grossmann, T. (2008). Electrophysiological methods in studying infant cognitive development. In C. Nelson & M. Luciana (Eds.), Handbook of developmental cognitive neuroscience, 2nd ed. (pp. 247–262). Cambridge, MA: MIT Press.

Cutler, A., & Carter, D. (1987). The predominance of strong initial syllables in the English
vocabulary. Computer Speech and Language, 2, 133–142.

de Guibert, C., Maumet, C., Ferré, J.-C., Jannin, P., Biraben, A., Allaire, C., Barillot, C., & Le Rumeur, E. (2010). FMRI language mapping in children: A panel of language tasks using visual and auditory stimulation without reading or metalinguistic requirements. NeuroImage, 51 (2), 897–909.

Dehaene-Lambertz, G., & Dehaene, S. (1994). Speed and cerebral correlates of syllable
discrimination in infants. Nature, 370, 292–295.

Dehaene-Lambertz, G., Dehaene, S., & Hertz-Pannier, L. (2002). Functional neuroimaging of speech perception in infants. Science, 298 (5600), 2013–2015.

Dehaene-Lambertz, G., Hertz-Pannier, L., Dubois, J., Mériaux, S., Roche, A., Sigman, M.,
et al. (2006). Functional organization of perisylvian activation during presentation of sen­
tences in preverbal infants. Proceedings of the National Academy of Sciences U S A, 103,
14240–14245.

Dehaene-Lambertz, G., Montavont, A., Jobert, A., Allirol, L., Dubois, J., Hertz-Pannier, L.,
& Dehaene, S. (2010). Language or music, mother or Mozart? Structural and environmen­
tal influences on infants’ language networks. Brain & Language, 114 (2), 53–65.

Dittmar, M., Abbot-Smith, K., Lieven, E., & Tomasello, M. (2008). Young German
children’s early syntactic competence: A preferential looking study. Developmental
Science, 11 (4), 575–582.

Dubois, J., Dehaene-Lambertz, G., Perrin, M., Mangin, J.-F., Cointepas, Y., Duchesnay, E.,
et al. (2008). Asynchrony of the early maturation of white matter bundles in healthy in­
fants: Quantitative landmarks revealed noninvasively by diffusion tensor imaging. Human
Brain Mapping, 29 (1), 14–27.

Fonov, V. S., Evans, A. C., Botteron, K., Almli, C. R., McKinstry, R. C., Collins, D. L., et al.
(2011). Unbiased average age-appropriate atlases for pediatric studies. NeuroImage, 54,
313–327.

Friederici, A. D. (2005). Neurophysiological markers of early language acquisition: From syllables to sentences. Trends in Cognitive Sciences, 9, 481–488.

Friederici, A. D., & Alter, K. (2004). Lateralization of auditory language functions: A dy­
namic dual pathway model. Brain and Language, 89, 267–276.

Friederici, A. D., & Wessels, J. M. (1993). Phonotactic knowledge and its use in infant
speech perception. Perception and Psychophysics, 54, 287–295.

Friederici, A. D., Friedrich, M., & Christophe, A. (2007). Brain responses in 4-month-old
infants are already language specific. Current Biology, 17 (14), 1208–1211.

Friederici, A. D., Friedrich, M., & Weber, C. (2002). Neural manifestation of cognitive and
precognitive mismatch detection in early infancy. NeuroReport, 13, 1251–1254.

Friederici, A. D., Hahne, A., & Mecklinger, A. (1996). Temporal structure of syntactic
parsing: Early and late event-related brain potential effects. Journal of Experimental Psy­
chology: Learning Memory and Cognition, 22, 1219–1248.

Friederici, A. D., Pfeifer, E., & Hahne, A. (1993). Event-related brain potentials during
natural speech processing: Effects of semantic, morphological and syntactic violations.
Cognitive Brain Research, 1, 183–192.

Friedrich, M., & Friederici, A. D. (2004). N400-like semantic incongruity effect in 19-
month-olds: Processing known words in picture contexts. Journal of Cognitive Neuro­
science, 16, 1465–1477.

Friedrich, M., & Friederici, A. D. (2005a). Phonotactic knowledge and lexical-semantic priming in one-year-olds: Brain responses to words and nonsense words in picture contexts. Journal of Cognitive Neuroscience, 17 (11), 1785–1802.

Friedrich, M., & Friederici, A. D. (2005b). Lexical priming and semantic integration re­
flected in the ERP of 14-month-olds. NeuroReport, 16 (6), 653–656.

Friedrich, M., & Friederici, A. D. (2005c). Semantic sentence processing reflected in the
event-related potentials of one- and two-year-old children. NeuroReport, 16 (6), 1801–
1804.

Friedrich, M., & Friederici, A. D. (2010). Maturing brain mechanisms and developing be­
havioral language skills. Brain and Language, 114, 66–71.

Gervain, J., Mehler, J., Werker, J. F., Nelson, C. A., Csibra, G., Lloyd-Fox, S., et al. (2011).
Near-infrared spectroscopy: A report from the McDonnell infant methodology consor­
tium. Developmental Cognitive Neuroscience, 1 (1), 22–46.

Gleitman, L. R., & Wanner, E. (1982). The state of the state of the art. In E. Wanner (p. 190) & L. Gleitman (Eds.), Language acquisition: The state of the art (pp. 3–48). Cambridge, UK: Cambridge University Press.

Gout, A., Christophe, A., & Morgan, J. L. (2004). Phonological phrase boundaries con­
strain lexical access. II. Infant data. Journal of Memory and Language, 51, 548–567.

Goyet, L., de Schonen, S., & Nazzi, T. (2010). Words and syllables in fluent speech seg­
mentation by French-learning infants: An ERP study. Brain Research, 1332, 75–89.

Gratton, G., & Fabiani, M. (2001). Shedding light on brain function: The event-related op­
tical signal. Trends in Cognitive Sciences, 5, 357–363.

Grossmann, T., Johnson, M. H., Lloyd-Fox, S., Blasi, A., Deligianni, F., Elwell, C., et al.
(2008). Early cortical specialization for face-to-face communication in human infants. Pro­
ceedings of the Royal Society B, 275, 2803–2811.

Grossmann, T., Oberecker, R., Koch, S. P., & Friederici, A. D. (2010). Developmental ori­
gins of voice processing in the human brain. Neuron, 65, 852–858.

Guasti, M. T. (2002). Language acquisition: The growth of grammar. Cambridge, MA: MIT
Press.

Hahne, A., & Friederici, A. D. (1999). Electrophysiological evidence for two steps in syn­
tactic analysis: Early automatic and late controlled processes. Journal of Cognitive Neuro­
science, 11, 194–205.

Hahne, A., & Friederici, A. D. (2002). Differential task effects on semantic and syntactic
processes as revealed by ERPs. Cognitive Brain Research, 13, 339–356.

Hahne, A., & Jescheniak, J. D. (2001). What’s left if the Jabberwock gets the semantics?
An ERP investigation into semantic and syntactic processes during auditory sentence
comprehension. Cognitive Brain Research, 11, 199–212.

Hahne, A., Eckstein, K., & Friederici, A. D. (2004). Brain signatures of syntactic and se­
mantic processes during children’s language development. Journal of Cognitive Neuro­
science, 16, 1302–1318.

Hickok, G., & Poeppel, D. (2007). The cortical organization of speech processing. Nature
Reviews Neuroscience, 8, 393–402.

Hirsh-Pasek, K., & Golinkoff, R. M. (1996). The origins of grammar: Evidence from early
language comprehension. Cambridge, MA: MIT Press.

Höhle, B., Bijeljac-Babic, R., Herold, B., Weissenborn, J., & Nazzi, T. (2009). Language
specific prosodic preferences during the first half year of life: Evidence from German and
French infants. Infant Behavior and Development, 32 (3), 262–274.

Höhle, B., Weissenborn, J., Schmitz, M., & Ischebeck, A. (2001). Discovering word order
regularities: The role of prosodic information for early parameter setting. In J. Weis­
senborn & B. Höhle (Eds.), Approaches to bootstrapping. Phonological, lexical, syntactic
and neurophysiological aspects of early language acquisition (Vol. 1, pp. 249–265). Amsterdam: John Benjamins.

Holcomb, P. J. (1993). Semantic priming and stimulus degradation: Implications for the
role of the N400 in language processing. Psychophysiology, 30, 47–61.

Holcomb, P. J., Coffey, S. A., & Neville, H. J. (1992). Visual and auditory sentence process­
ing: A developmental analysis using event-related brain potentials. Developmental Neu­
ropsychology, 8, 203–241.

Homae, F., Watanabe, H., Nakano, T., Asakawa, K., & Taga, G. (2006). The right hemi­
sphere of sleeping infant perceives sentential prosody. Neuroscience Research, 54 (4),
276–280.

Homae, F., Watanabe, H., Nakano, T., & Taga, G. (2007). Prosodic processing in the devel­
oping brain. Neuroscience Research, 59 (1), 29–39.

Houston, D. M., Jusczyk, P. W., Kuijpers, C., Coolen, R., & Cutler, A. (2000). Cross-lan­
guage word segmentation by 9-month-olds. Psychonomic Bulletin & Review, 7, 504–509.

Imada, T., Zhang, Y., Cheour, M., Taulu, S., Ahonen, A., & Kuhl, P. K. (2006). Infant speech
perception activates Broca’s area: A developmental magnetoencephalography study. NeuroReport, 17, 957–962.

Jusczyk, P. W., Cutler, A., & Redanz, N. J. (1993). Infants’ preference for the predominant
stress patterns of English words. Child Development, 64, 675–687.

Jusczyk, P. W., Friederici, A. D., Wessels, J. M. I., Svenkerud, V., & Jusczyk, A. M. (1993).
Infants’ sensitivity to the sound patterns of native language words. Journal of Memory
and Language, 32, 402–420.

Jusczyk, P. W., Houston, D. M., & Newsome, M. (1999). The beginnings of word segmenta­
tion in English-learning infants. Cognitive Psychology, 39 (3–4), 159–207.

Kaan, E., Harris, A., Gibson, E., & Holcomb, P. (2000). The P600 as an index of syntactic
integration difficulty. Language and Cognitive Processes, 15, 159–201.

Kooijman, V., Hagoort, P., & Cutler, A. (2009). Prosodic structure in early word segmenta­
tion: ERP evidence from Dutch ten-month-olds. Infancy, 14, 591–612.

Kooijman, V., Johnson, E. K., & Cutler, A. (2008). Reflections on reflections of infant word
recognition. In A. D. Friederici & G. Thierry (Eds.), Early language development: Bridging
brain and behaviour (TiLAR 5, pp. 91–114). Amsterdam: John Benjamins.

Kuhl, P. K. (2004). Early language acquisition: Cracking the speech code. Nature Reviews
Neuroscience, 5, 831–843.

Kuhl, P. K., Conboy, B. T., Coffey-Corina, S., Padden, D., Rivera-Gaxiola, M., & Nelson, T.
(2008). Phonetic learning as a pathway to language: New data and native language mag­
net theory expanded (NLM-e). Philosophical Transactions of the Royal Society B, 363,
979–1000.

Kuijpers, C. T. L., Coolen, R., Houston, D., Cutler, A. (1998). Using the headturning tech­
nique to explore cross-linguistic performance differences. Advances in Infancy Research,
12, 205–220.

Kujala, A., Huotilainen, M., Hotakainen, M., Lennes, M., Parkkonen, L., Fellman, V., et al.
(2004). Speech-sound discrimination in neonates as measured with MEG. NeuroReport,
15 (13), 2089–2092.

Kushnerenko, E., Ceponiene, R., Balan, P., Fellman, V., & Näätänen, R. (2002a). Maturation of the auditory change detection response in infants: A longitudinal ERP study. NeuroReport, 13 (15), 1843–1846.

Kushnerenko, E., Ceponiene, R., Balan, P., Fellman, V., Huotilainen, M., & Näätänen, R. (2002b). Maturation of the auditory event-related potentials during the first year of life. NeuroReport, 13, 47–51.

Kushnerenko, E., Cheour, M., Ceponiene, R., Fellman, V., Renlund, M., Soininen, K., et al.
(2001). Central auditory processing of durational changes in complex speech patterns by
newborns: An event-related brain potential study. Developmental Neuropsychology, 19
(1), 83–97.

Kutas, M., & van Petten, C. K. (1994). Psycholinguistics electrified: Event-related brain
potential investigations. In M. A. Gernsbacher (Ed.), Handbook of psycholinguistics (pp.
83–143). San Diego, CA: Academic Press.

Leach, J. L., & Holland, S. K. (2010). Functional MRI in children: Clinical and research ap­
plications. Pediatric Radiology, 40, 31–49.

Leppänen, P. H. T., Pihko, E., Eklund, K. M., & Lyytinen, H. (1999). Cortical responses (p. 191) of infants with and without a genetic risk for dyslexia: II. Group effects. NeuroReport, 10, 969–973.

Li, X. S., Shu, H., Liu, Y. Y., & Li, P. (2006). Mental representation of verb meaning: Behavioral and electrophysiological evidence. Journal of Cognitive Neuroscience, 18 (10), 1774–1787.

Lloyd-Fox, S., Blasi, A., & Elwell, C. E. (2010). Illuminating the developing brain: The past, present and future of functional near infrared spectroscopy. Neuroscience and Biobehavioral Reviews, 34 (3), 269–284.

Männel, C., & Friederici, A. D. (2008). Event-related brain potentials as a window to children’s language processing: From syllables to sentences. In I. A. Sekerina, E. M. Fernandez, & H. Clahsen (Eds.), Developmental psycholinguistics: On-line methods in children’s language processing (LALD 44, pp. 29–72). Amsterdam: John Benjamins.

Männel, C., & Friederici, A. D. (2009). Pauses and intonational phrasing: ERP studies in 5-
month-old German infants and adults. Journal of Cognitive Neuroscience, 21 (10), 1988–
2006.

Männel, C., & Friederici, A. D. (2010). Prosody is the key: ERP studies on word segmenta­
tion in 6- and 12-month-old children. Journal of Cognitive Neuroscience, Supplement,
261.

Männel, C., & Friederici, A. D. (2011). Intonational phrase structure processing at differ­
ent stages of syntax acquisition: ERP studies in 2-, 3-, and 6-year-old children. Develop­
mental Science, 14 (4), 786–798.

Mestres-Misse, A., Rodriguez-Fornells, A., & Münte, T. F. (2010). Neural differences in the
mapping of verb and noun concepts onto novel words. NeuroImage, 49, 2826–2835.

Meyer, M., Alter, K., Friederici, A. D., Lohmann, G., & von Cramon, D. Y. (2002). fMRI re­
veals brain regions mediating slow prosodic modulations in spoken sentences. Human
Brain Mapping, 17 (2), 73–88.

Meyer, M., Steinhauer, K., Alter, K., Friederici, A. D., & von Cramon, D. Y. (2004). Brain
activity varies with modulation of dynamic pitch variance in sentence melody. Brain and
Language, 89 (2), 277–289.

Mills, D. L., Coffey-Corina, S. A., & Neville, H. J. (1997). Language comprehension and
cerebral specification from 13 to 20 months. Developmental Neuropsychology, 13 (3),
397–445.

Mills, D. L., Plunkett, K., Prat, C., & Schafer, G. (2005). Watching the infant brain learn
words: Effects of vocabulary size and experience. Cognitive Development, 20, 19–31.


Angela Friederici

Angela D. Friederici, Max Planck Institute for Human Cognitive and Brain Sciences,
Leipzig, Germany.

Claudia Männel

Claudia Männel, Max Planck Institute for Human Cognitive and Brain Sciences,
Leipzig, Germany.


Perceptual Disorders  
Josef Zihl
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn

Print Publication Date: Dec 2013 Subject: Psychology, Cognitive Neuroscience


Online Publication Date: Dec 2013 DOI: 10.1093/oxfordhb/9780199988693.013.0010

Abstract and Keywords

Perceptual processes provide the basis for mental representation of the visual, auditory,
olfactory, gustatory, somatosensory, and social “worlds” as well as for guiding and
controlling cognitive, social, and motor activities. All perceptual systems (i.e., vision,
audition, somatosensory perception, smell and taste, and social perception) are segregated
functional networks and show a parallel-hierarchical organization of information processing
and encoding. In pathological conditions such as acquired brain injury, perceptual functions
and abilities can be variably affected, ranging from the loss of stimulus detection to
impaired recognition. Despite the functional specialization of perceptual systems,
association of perceptual deficits within sensory modalities is the rule, and disorders of a
single perceptual function or ability are rare. This chapter describes cerebral visual, audi­
tory, somatosensory, olfactory, and gustatory perceptual disorders within a neuropsycho­
logical framework. Disorders in social perception are also considered because they repre­
sent a genuine category of perceptual impairments.

Keywords: vision, audition, somatosensory perception, smell, taste, social perception, cerebral perceptual disor­
ders

Introduction
Perception “is the process or result of becoming aware of objects, relationships, and
events by means of the senses,” which includes activities such as detecting, discriminat­
ing, identifying, and recognizing. “These activities enable organisms to organize and in­
terpret the stimuli received into meaningful knowledge” (APA, 2007). Perception is con­
structed in the brain and involves lower and higher level processes that serve simpler and
more complex perceptual abilities such as detection, identification, and recognition
(Mather, 2006). The behavioral significance of perception lies not only in the processing
of stimuli as a basis for mental representation of the visual, auditory, olfactory, gustatory,
somatosensory, and social “worlds” but also in the guidance and control of activities. Thus
there exists a reciprocal interaction between perception, cognition, and action. For per­
ceptual activities, attention, memory, and executive functions are crucial prerequisites.
They form the bases for focusing on stimuli and maintaining attention during stimulus ac­
quisition and processing, storing percepts as experience and concepts, and controlling in­
put and output activities that allow for an optimal, flexible adaptation to extrinsic and in­
trinsic challenges.

The aim of this chapter is to describe the effect of pathological conditions, particularly ac­
quired brain injury, on the various abilities in the domains of vision, audition, somatosen­
sory perception, smell and taste, and social perception as well as the behavioral conse­
quences and the significance of these disorders for the understanding of brain organiza­
tion. Perceptual disorders can result from injury to the afferent sensory pathways and/or
to their subcortical and cortical processing and coding (p. 194) stages. Peripheral injury
usually causes “lower level” dysfunctions (e.g., threshold elevation or difficulties with
stimulus localization and sensory discrimination), whereas central injuries cause “higher
level” perceptual dysfunctions (e.g., in the domains of identification and recognition).
However, peripheral sensory deficits may also be associated with higher perceptual disor­
ders because the affected sensory functions and their interactions represent a crucial
prerequisite for more complex perceptual abilities (i.e., detection and discrimination of
stimuli build the basis for identification and recognition).

Vision
Visual perception comprises lower level visual abilities (i.e., the visual field, visual acuity,
contrast sensitivity, color and form vision, and stereopsis) and higher level visual abilities,
in particular visual identification and recognition. Visual perceptual abilities also form the
basis for visually guided behavior, such as oculomotor activities, hand and finger move­
ments, and spatial navigation. From its very beginning, visual neuroscience has been con­
cerned with the analysis of the various visual perceptual deficits and the identification of
the location of the underlying brain injury. Early clinical reports on patients have already
demonstrated the selective loss of visual abilities after acquired brain injury. These obser­
vations have suggested a functional specialization of the visual cortex, a concept verified
many years later by combined anatomical, electrophysiological, and behavioral evidence
(Desimone & Ungerleider, 1989; Grill-Spector & Malach, 2004; Orban, 2008; Zeki, 1993).
The primary visual cortical area (striate cortex, Brodmann area 17, visual area 1, or V1)
receives its input from the retina via the lateral geniculate body (LGN) and possesses a
highly accurate, topographically organized representation of the retina and thus of the vi­
sual field. The central visual field occupies a large proportion of the striate cortex; about
half of the cortical surface is devoted to the central 10 degrees of the visual field, which is
only 1 percent of the visual field (Tootell, Hadjikhani, Mendola, Marrett, & Dale, 1998). In
addition, V1 distributes specific visual signals to the other visual areas that are located in
the surrounding cortex (for a review, see Bullier, 2003). This anatomical and functional or­
ganization enables the visual brain to deal with the processing of global and local fea­
tures of visual objects and scenes. The result of processing at distinct levels of complexity
at each stage can be flexibly and dynamically integrated into coherent perception (Bar­
tels & Zeki, 1998; Tootell et al., 1998; Zeki, 1993; Zeki & Bartels, 1998). Because of the
inhomogeneity of spatial resolution and acuity in the visual field (Anstis, 1974), the field
size for processing visual details (e.g., form vision) is much smaller, comprising the inner
9 degrees of the binocular visual field (i.e., macular region; Henderson, 2003). Occipital-
parietal, posterior parietal, and prefrontal mechanisms guarantee rapid global context ex­
traction as well as visual spatial working memory (Bar, 2004; Henderson, 2003; Hochstein
& Ahissar, 2002).
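The scale of this cortical magnification can be checked with a back-of-the-envelope calculation. The sketch below is illustrative only: it approximates the binocular visual field as a 200 × 130 degree ellipse and treats "the central 10 degrees" as a disc of 10 degrees eccentricity; these field extents are generic textbook approximations, not values taken from the studies cited above.

```python
import math

# Rough area estimates in squared degrees (flat-map approximation).
# Assumed extents: binocular field ~200 deg (horizontal) x ~130 deg
# (vertical) ellipse; "central 10 degrees" as a 10-deg-radius disc.
central_area = math.pi * 10.0 ** 2                # ~314 deg^2
total_area = math.pi * (200.0 / 2) * (130.0 / 2)  # ~20,420 deg^2

fraction = central_area / total_area
print(f"central 10 degrees covers about {fraction:.1%} of the visual field")
```

Under these assumptions the central disc covers on the order of 1 to 2 percent of the field, consistent with the "only 1 percent" figure quoted above, while being represented on roughly half of the striate cortical surface.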

Ungerleider and Mishkin (1982) have characterized the functional specialization of the vi­
sual brain as consisting of two processing streams: The “where” pathway or dorsal route,
comprising occipital-parietal visual areas and connections, is specialized in space pro­
cessing; and the “what” pathway or ventral route, comprising occipital-temporal visual
areas and connections, is specialized in object processing. According to Milner and
Goodale (2008), information processed in the dorsal pathway is used for the implicit visu­
al guidance of actions, whereas explicit perception is associated with processing in the
ventral stream. Because visual perception usually involves both space- and object-based
information processing, cooperation and interaction between the two visual streams are
required (Goodale & Westwood, 2004). In addition, both routes interact either directly or
indirectly via attention involving the inferior parietal cortex (Singh-Curry & Husain,
2009) and working memory involving the prefrontal cortex (Goodale & Westwood, 2004;
Oliveri et al., 2001). Eye movements play a crucial role in visual processing and thus in vi­
sual perception (for a comprehensive review, see Martinez-Conde, Macknik, & Hubel,
2004). The posterior thalamus and its reciprocal connections with cortical regions in the
occipital, parietal, and frontal lobes and with the limbic neocortex form a cortical-subcor­
tical network subserving intentionally guided and externally triggered attention as well as
saccadic eye movements that are involved in visual information processing (e.g., Anders­
son et al., 2007; Dean, Crowley & Platt 2004; Himmelbach, Erb, & Karnath, 2006; Nobre,
2001; Olson et al., 2000; Schiller & Tehovnik, 2001, 2005). Complex visual stimuli (e.g.,
objects and faces) are coded as specific categories in extrastriate regions in the ventral
visual pathway (Grill-Spector, 2003; Sigala, 2004; Wierenga et al., 2009). Top-down
processes involving the prefrontal cortex facilitate visual object recognition (Bar, 2003),
and hippocampal-dependent memory builds the basis for experience-dependent visual
scanning (Smith & Squire, 2008). (p. 195) Yet, it is still unclear how the brain eventually
codes complex visual stimuli for accurate identification and recognition; it appears, how­
ever, that complex visual stimuli are simultaneously represented in two parallel and hier­
archically organized processing systems in the ventral and dorsal visual pathways (Konen
& Kastner, 2008).

About 30 percent of patients with acquired brain injury suffer from visual disorders
(Clarke, 2005; Rowe et al., 2009; Suchoff et al., 2008). Lower level visual functions and
abilities (e.g., visual detection and localization, visual acuity, contrast sensitivity, and col­
or discrimination) may be understood as perceptual architecture, whereas higher level,
visual-cognitive capacities (e.g., text processing and recognition) also involve learning
and memory processes as well as executive functions. Selective visual disorders after
brain injury are the exception rather than the rule because small “strategic” lesions are
very rare and visual cortical areas are intensely interconnected. Injury to the visual brain,
that is, to visual cortical areas and fiber connections, therefore commonly causes an asso­
ciation of visual disorders.

Visual Field

A homonymous visual field defect is defined as a restriction of the normal visual field
caused by injury to the afferent postchiasmatic visual pathway, that is, an interruption in
the flow of visual information between the optic chiasm and the striate cortex. Homony­
mous visual field disorders are characterized by partial or total blindness in correspond­
ing visual field regions of each eye. In the case of unilateral postchiasmatic brain injury,
vision may be lost in the left or right hemifield (homonymous left- or right-sided hemi­
anopia), the left or right upper or lower quadrants (homonymous upper or lower quadra­
nopia in the left or right hemifield), or a restricted portion in the parafoveal visual field
(paracentral scotoma). The most common type of homonymous visual field disorders is
hemianopia (loss of vision in one hemifield), followed by quadranopia (loss of vision in one
quadrant) and paracentral scotoma (island of blindness in the parafoveal field region). Vi­
sual field defects are either absolute (complete loss of vision, anopia) or relative (de­
pressed vision, amblyopia, hemiachromatopsia). Homonymous amblyopia typically affects
the entire hemifield (hemiamblyopia), and homonymous achromatopsia (i.e., the selective
loss of color vision) typically affects one hemifield (hemiachromatopsia) or the upper
quadrant. Visual field defects differ with respect to visual field sparing. Foveal sparing
refers to sparing of the foveal region (1 degree), macular sparing refers to the preserva­
tion of the macular region (5 degrees), and macular splitting refers to a sparing of less
than 5 degrees (for review, see Harrington & Drake, 1990). In the majority of patients
(71.5 percent of 876 cases), field sparing does not exceed 5 degrees. As a rule, patients
with small visual field sparing are more disabled, especially with regard to reading.
Stroke represents the most common etiology, but other etiologies such as traumatic brain
injury, tumors, multiple sclerosis, and cortical posterior atrophy may also cause homony­
mous visual field disorders (see Zihl, 2011).
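The sparing taxonomy described above can be summarized in a small helper. This is a sketch of the conventions just stated (foveal region about 1 degree, macular region about 5 degrees, "macular splitting" denoting sparing of less than 5 degrees); the function and its labels are hypothetical, not a clinical instrument.

```python
def classify_field_sparing(sparing_deg: float) -> str:
    """Label central field sparing in a homonymous field defect.

    Cut-offs follow the conventions in the text: the foveal region
    spans ~1 degree and the macular region ~5 degrees; sparing of
    less than 5 degrees counts as macular splitting.
    """
    if sparing_deg >= 5.0:
        return "macular sparing"
    if sparing_deg >= 1.0:
        return "foveal sparing (macular splitting)"
    return "no foveal sparing"
```

In the series quoted above, sparing did not exceed 5 degrees in 71.5 percent of 876 cases; on this scheme, most patients therefore fall into the lower two categories and, as a rule, are more disabled, especially with regard to reading.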

About 80 percent of patients (n = 157) with unilateral homonymous visual field loss suffer
from functional impairments in reading (hemianopic dyslexia) and/or in global perception
and overview (Zihl, 2011). Homonymous visual field loss causes a restriction of the field
of view, which prevents the rapid extraction of the entire spatial configuration of the visu­
al environment. It therefore impairs the top-down and bottom-up interactions that are re­
quired for efficient guidance of spatial attention and oculomotor activities during scene
perception and visual search. Patients with additional injury to the posterior thalamus or
the occipital white matter route (i.e., fiber pathways to the dorsal visual route and
pathways connecting occipital, parietal, temporal, and frontal cortical areas) show
disorganized oculomotor scanning behavior (Zihl & Hebel, 1997; Mort & Kennard, 2003). The
impairments in global perception and visual scanning shown by these patients are more
severe than those resulting from visual field loss alone (Zihl, 1995a). Interestingly, about 20
percent show spontaneous substitution of visual field loss by oculomotor compensation
and thus enlargement of the field of view; the percentage is even higher in familiar sur­
roundings because patients can make use of their spatial knowledge of the surroundings
(Zihl, 2011). In normal subjects, global visual perception is based on the visual field
within which they can simultaneously detect and process visual stimuli. The visual field can
be enlarged by eye shifts, which typically extend up to 50 degrees in all directions (Leigh
& Zee, 2006). The resulting field of view is thus defined by the extent of the visual field
combined with these eye movements (see also Pambakian, Mannan, Hodgson, & Kennard, 2004).

Reading is impaired in patients with unilateral homonymous field loss and visual field
sparing of less than 5 degrees to the left and less than 8 degrees to the right of the fovea.
In reading, the visual brain (p. 196) relies on a gestalt-type visual word-form processing,
the “reading span.” It is asymmetrical (larger to the right in left-to right-orthographies)
and is essential for the guidance of eye movements during text processing (Rayner, 1998).
However, insufficient visual field sparing does not appear to be the only factor causing
persistent “hemianopic” dyslexia. The extent of brain injury affecting in particular the oc­
cipital white matter seems to be crucial in this regard (Schuett, Heywood, Kentridge, &
Zihl, 2008a; Zihl 1995b). That reading is impaired at the pre-semantic visual sensory level
is supported by the outcome of treatment procedures involving practice with nontext ma­
terial, which have been found to be as effective as word material in reestablishing eye
movement reading patterns and improving reading performance (Schuett, Heywood, Ken­
tridge, & Zihl, 2008b).
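The sparing criterion for reading can likewise be written as a simple predicate. The cut-offs (less than 5 degrees to the left and less than 8 degrees to the right of the fovea, reflecting the asymmetrical reading span in left-to-right orthographies) are those stated in the text; the function name is a hypothetical illustration, and, as noted above, the persistence of hemianopic dyslexia also depends on the extent of occipital white matter injury.

```python
def reading_likely_impaired(side_of_field_loss: str, sparing_deg: float) -> bool:
    """Heuristic sketch of the sparing criterion described in the text.

    Reading is typically impaired when visual field sparing is < 5
    degrees for left-sided field loss or < 8 degrees for right-sided
    loss (the reading span is larger to the right in left-to-right
    scripts).
    """
    thresholds = {"left": 5.0, "right": 8.0}
    return sparing_deg < thresholds[side_of_field_loss]
```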

In the case of bilateral postchiasmatic brain injury, both visual hemifields are affected, re­
sulting in bilateral homonymous hemianopia (“tunnel vision”), bilateral upper or lower
hemianopia, bilateral paracentral scotoma, or central scotoma. Patients with bilateral vi­
sual field disorders suffer from similar, but typically more debilitating, visual impairments
in global visual perception and reading. A central scotoma is a very dramatic form of
homonymous visual field loss because foveal vision is either totally lost or depressed (cen­
tral amblyopia). The reduction or loss of vision in the central part of the visual field is typ­
ically associated with a corresponding loss of visual spatial contrast sensitivity, visual acu­
ity, and form, object, and face perception. The loss of foveal vision also causes a loss of
the central reference for optimal fixation and of the straight-ahead direction as well as an
impairment of the visual-spatial guidance of saccades and hand-motor responses. As a
consequence, patients cannot accurately fixate a visual stimulus, shift their gaze from
one stimulus to another, scan a scene or a face, or guide their eye movements during
scanning and reading. Patients therefore show severe impairments in locating objects,
recognizing objects and faces, finding their way in rooms or places, and reading, and of­
ten get lost when scanning a word or a scene (Zihl, 2011).

Visual Acuity, Spatial Contrast Sensitivity, and Visual Adaptation

After unilateral postchiasmatic brain injury, visual acuity is usually not significantly re­
duced, except for cases in which the optic tract is involved (Frisén, 1980). After bilateral
postchiasmatic injury, visual acuity can either be normal, gradually diminished, or totally
lost (i.e., form vision is no longer possible) (Symonds & MacKenzie, 1957). This reduction
in visual acuity cannot be improved by optical correction.

When spatial contrast sensitivity is reduced, patients usually complain of “blurred” or
“foggy” vision despite normal visual acuity, accommodation, and convergence (Walsh,
1985). Impairments of contrast sensitivity have been reported in cerebrovascular dis­
eases (Bulens, Meerwaldt, van der Wildt, & Keemink, 1989; Hess, Zihl, Pointer, & Schmid,
1990); after closed head trauma, encephalitis, and hypoxia (Hess et al., 1990); in
Parkinson’s disease (Bulens, Meerwaldt, van der Wildt & Keemink, 1986; Uc et al., 2005);
multiple sclerosis (Gal, 2008); and dementia of the Alzheimer type (Jackson & Owsley,
2003). Bulens et al. (1989) have suggested that impairments of contrast sensitivity for
high spatial frequencies mainly occur after occipital injury, whereas impairments of sensi­
tivity for lower spatial frequencies occur after temporal or parietal injury. Depending on
the severity of the sensitivity loss, patients have difficulties with depth perception, text
processing, face perception, and visual recognition. Because reduction in spatial contrast
sensitivity is not necessarily associated with reduced visual acuity, assessing visual acuity
alone is not sufficient for detecting impaired spatial contrast sensitivity.

Color Vision

Color vision may be lost in the contralateral hemifield (homonymous hemiachromatopsia)
or in the upper quadrant after unilateral occipital-temporal brain injury. Because light
sensitivity and form vision are not impaired in the affected hemifield, the loss of color vi­
sion is selective (e.g., Short & Graff-Radford, 2001). Patients are usually aware of this dis­
order and report that the corresponding part of the visual environment appears “pale,” in
“black and white,” or “like in an old movie.” In the case of cerebral dyschromatopsia,
foveal color vision is affected with and without the concomitant loss of color vision in the
peripheral visual field (Koh et al., 2008; Rizzo, Smith, Pokorny, & Damasio, 1993). Pa­
tients with cerebral dyschromatopsia find it difficult to discriminate fine color hues. Bilat­
eral occipital-temporal injury causes moderate or severe loss of color vision in the entire
visual field, which is called cerebral achromatopsia (Bouvier & Engel, 2006; Heywood &
Kentridge, 2003; Meadows 1974); yet, discrimination of grays (p. 197) (Heywood, Wilson,
& Cowey, 1987) and even processing of wavelength differences (Heywood & Kentridge,
2003) may be spared. Consequently, discriminating and sorting of colors and associating
color stimuli with their names and with particular objects (e.g., yellow and banana; green
and grass) are affected. Patients may report that objects and pictures appear “drained of
color,” as “dirty brownish” or “reddish,” or as “black and white.” Cerebral hemiachro­
matopsia is a rather rare condition. Among 1,020 patients with unilateral homonymous vi­
sual field disorders after acquired posterior brain injury, we found thirty cases (3.9 per­
cent) with unilateral hemiachromatopsia and impaired foveal color discrimination; among
130 cases with bilateral occipital injury, sixteen cases (12.3 percent) showed complete
cerebral achromatopsia. Partial cerebral achromatopsia may also occur and may be asso­
ciated with impaired color constancy (Kennard, Lawden, Morland, & Ruddock, 1995). The
ventral occipital-temporal cortex is the critical lesion location of color vision deficits (Bou­
vier & Engel, 2006; Heywood & Kentridge, 2003). Color vision may also be impaired in
Page 6 of 37
Perceptual Disorders

(mild) hypoxia (Connolly, Barbur, Hosking, & Moorhead, 2008), multiple sclerosis (Moura
et al., 2008), Parkinson’s disease (Müller, Woitalla, Peters, Kohla, & Przuntek, 2002), and
dementia of the Alzheimer type (Jackson & Owsley, 2003). Furthermore, color hue dis­
crimination accuracy can be considerably reduced in older age (Jackson & Owsley, 2003).

Spatial Vision

Disorders in visual space perception comprise deficits in visual localization, depth percep­
tion, and perception of visual spatial axes. Brain injury can differentially affect retino­
topic, spatiotopic, egocentric, and allocentric frames of reference. Visual-spatial disor­
ders typically occur after occipital-parietal and posterior parietal injury; a right-hemi­
sphere injury more frequently causes visual spatial impairments (for comprehensive re­
views, see Farah, 2003; Karnath & Zihl, 2003; Landis, 2000).

After unilateral brain injury, moderate defective visual spatial localization is typically
found in the contralateral hemifield, but may also be present in the foveal visual field
(Postma, Sterken, de Vries, & de Haan, 2000), which is associated with reduced saccadic
localization accuracy. Patients with bilateral posterior brain injury, in contrast, show
moderate to severe localization inaccuracy in the entire visual field, which typically af­
fects all visually guided activities, including accurately fixating objects, reaching for ob­
jects, and reading and writing (Zihl, 2011). Interestingly, patients with parietal lobe injury
can show dissociation between spatial perception deficits and pointing errors (Darling,
Bartelt, Pizzimenti, & Rizzo, 2008), indicating that inaccurate pointing cannot always be
explained in terms of defective localization but may represent a genuine disorder (optic
ataxia; see Caminiti et al., 2010).

Impaired monocular and binocular depth perception (astereopsis) has been observed in
patients with unilateral and bilateral posterior brain injury, with bilateral injury causing
more severe deficits. Defective depth perception may cause difficulties in pictorial depth
perception, walking (downstairs), and reaching for objects or handles (Koh et al., 2008;
Miller et al., 1999; Turnbull, Driver, & McCarthy, 2004). Impaired distance perception, in
particular in the peripersonal space, has mainly been observed after bilateral occipital-
parietal injury (Berryhill, Fendrich, & Olson, 2009).

Shifts in the vertical and horizontal axes have been reported particularly in patients with
right occipital-parietal injury (Barton, Behrmann, & Black, 1998; Bonan, Leman, Legar­
gasson, Guichard, & Yelnik, 2006). Right-sided posterior parietal injury can also cause ip­
silateral and contralateral shifts in the visually perceived trunk median plane (Darling,
Pizzimenti, & Rizzo, 2003). Occipital injury more frequently causes contralateral shifts in
spatial axes, whereas posterior parietal injury also causes ipsilateral shifts. Barton and
Black (1998) suggested that the contralateral midline shift of hemianopic patients is “a
consequence of the strategic adaptation of attention into contralateral hemispace after
hemianopia” (p. 660), that is, that a change in attentional distribution might cause an ab­
normal bias in line bisection. In a study of 129 patients with homonymous visual field
loss, we found the contralateral midline shift in more than 90 percent of cases. However,
the line bisection bias was not associated with efficient oculomotor compensation for the
homonymous visual loss. In addition, visual field sparing also did not modulate the degree
of midline shift. Therefore, the subjective straight-ahead deviation may be explained as a
consequence of a systematic, contralesional shift of the egocentric visual midline and may
therefore represent a genuine visual-spatial perceptual disorder (Zihl, Sämann, Schenk,
Schuett, & Dauner, 2009). This idea is supported by Darling et al. (2003), who reported
difficulties in visual perception of the trunk-fixed anterior-posterior axis in patients with
left- or (p. 198) right-sided unilateral posterior parietal lesions without visual field defects.

Visual Motion Perception

Processing of direction and speed of visual motion stimuli is a genuine visual ability. How­
ever, in order to know how objects move in the world, we must take into account the rota­
tion of our eyes as well as of our head (Bradley, 2004; Snowden & Freeman, 2004). Mo­
tion perception also enables recognition of biological movements (Giese & Poggio, 2003)
and supports face perception (Roark, Barrett, Spence, Abdi, & O’Toole, 2003). Visual area
V5 activity is the most critical basis for generating motion perception (Moutoussis & Zeki,
2008), whereas superior temporal and premotor areas subserve biological motion percep­
tion (Saygin, 2007).

The first well-documented case of loss of visual motion perception (cerebral akinetopsia)
is L.M. After bilateral temporal-occipital cerebrovascular injury, she completely lost move­
ment vision in all three dimensions, except for detection and direction discrimination of
single targets moving at low speed, albeit with elevated thresholds. In contrast, all other visual
abilities, including the visual field, visual acuity, color vision, stereopsis, and visual recog­
nition, were spared, as was motion perception in the auditory and tactile modalities. Her
striking visual-perceptual impairment could not be explained by spatial or temporal pro­
cessing deficits, impaired contrast sensitivity (Hess, Baker, & Zihl, 1989), or generalized
cognitive slowing (Zihl, von Cramon, & Mai, 1983; Zihl, von Cramon, Mai, & Schmid,
1991). L.M. was also unable to search for a moving target among stationary distractor
stimuli in a visual display (McLeod, Heywood, Driver, & Zihl, 1989) and could not see bio­
logical motion stimuli (McLeod, Dittrich, Driver, Perrett, & Zihl, 1996), including facial
movements in speech reading (Campbell, Zihl, Massaro, Munhall, & Cohen, 1997). She
could not extract shape from motion and lost apparent motion perception (Rizzo, Nawrot,
& Zihl, 1995). Because of her akinetopsia, L.M. was severely handicapped in all activities
involving visual motion perception, with perception and action similarly affect­
ed (Schenk, Mai, Ditterich, & Zihl, 2000). Selective impairment of movement vision in
terms of threshold elevation for speed and direction has also been reported in the hemi­
field contralateral to unilateral posterior brain injury for motion types of different com­
plexity, in combination and in isolation (Billino, Braun, Bohm, Bremmer, & Gegenfurtner,
2009; Blanke, Landis, Mermoud, Spinelli, & Safran, 2003; Braun, Petersen, Schoenle, &
Fahle, 1998; Plant, Laxer, Barbaro, Schiffman, & Nakayama, 1993; Vaina, Makris,
Kennedy, & Cowey, 1998).

Visual Identification and Visual Recognition

Visual agnosia is the inability to identify, recognize, interpret, or comprehend the mean­
ing of visual stimuli even though basic visual functions (i.e., the visual field, visual acuity,
spatial contrast sensitivity, color vision, and form discrimination) are intact or at least suf­
ficiently preserved. Visual agnosia either results from defective visual perception (e.g.,
synthesis of features; apperceptive visual agnosia) or from the loss of the “bridge” be­
tween the visual stimulus and its semantic associations (e.g., label, use, history; associa­
tive or semantic visual agnosia). However, objects can be recognized in the auditory and
tactile modalities, and the disorder cannot be explained by supramodal cognitive or apha­
sic deficits (modified after APA, 2007). Lissauer (1890) interpreted apperceptive visual ag­
nosia as “mistaken identity” because incorrectly identified objects share global (e.g., size
and shape) and/or local properties (e.g., color, texture, form details) with other objects,
which causes visual misidentification. Cases with pure visual agnosia seem to be the ex­
ception rather than the rule (Riddoch, Johnston, Bracewell, Boutsen, & Humphreys,
2008). Therefore, a valid and unequivocal differentiation between a “genuine” visual ag­
nosia and secondary impairments in visual identification and recognition resulting from
other visual deficits is often difficult, in particular concerning the integration of global
and local information (Delvenne, Seron, Coyette, & Rossion, 2004; Thomas & Forde,
2006). In a group of 1,216 patients with acquired injury to the visual brain we have found
only seventeen patients (about 2.4 percent) with genuine visual agnosia. Visual agnosia is
typically caused by bilateral occipital-temporal injury (Barton, 2008a) but may also occur
after left- (Barton, 2008b) or right-sided posterior brain injury (Landis, Regard, Bliestle,
& Kleihues, 1988). There also exist progressive forms of visual agnosia in posterior corti­
cal atrophy and in early stages of dementia (Nakachi et al., 2007; Rainville et al., 2006).

Farah (2000) has proposed a useful classification of visual agnosia according to the type
of visual material patients find difficult to identify and recognize. Patients with visual ob­
ject and form agnosia are unable to visually recognize complex objects or pictures.
(p. 199) There exist category-specific types of object agnosia, such as for living and nonliv­
ing things (Thomas & Forde, 2006), or animals and artifacts (Takarae & Levin, 2001). A par­
ticular type of visual object agnosia is visual form agnosia. The most thoroughly studied case of
visual form agnosia is D.F. (Milner et al., 1991). After extensive damage to the ventral
processing stream due to carbon monoxide poisoning, this patient showed a more or less
complete loss of form perception, including form discrimination, despite having a visual
resolution capacity of 1.7 minutes of arc. Visually guided activities such as pointing to or
grasping for an object, however, were spared (Carey, Dijkerman, Murphy, Goodale, & Mil­
ner, 2006; James, Culham, Humphrey, Milner, & Goodale, 2003; McIntosh, Dijkerman,
Mon-Williams, & Milner, 2004). D.F. also showed profound inability to visually recognize
objects, places, and faces, indicating a more global rather than selective visual agnosia.
Furthermore, D.F.’s visual disorder may also be explained in terms of an allocentric spa­
tial deficit rather than as perceptual deficit (Schenk, 2006). As Goodale and Westwood
(2004) have pointed out, the proposed ventral-dorsal division in visual information pro­
cessing may not be as exclusive as assumed, and both routes interact at various stages.
However, automatic obstacle avoidance was intact in D.F. while correct grasping was pos­
sible for simple objects only (McIntosh et al., 2004), suggesting that the “what” pathway
plays no essential role in detecting and localizing objects or in the spatial guidance of
walking (Rice et al., 2006). Further cases of visual form agnosia after carbon monoxide
poisoning have been reported by Heider (2000). Despite preserved visual acuity and only
minor visual field defects, patients were severely impaired in shape and form discrimina­
tion, whereas the perception of color, motion, and stereoscopic depth was relatively unim­
paired. Heider (2000) identified a failure in figure–ground segregation and grouping sin­
gle elements of a composite visual scene into a “gestalt” as the main underlying deficit.
Global as well as local processing can be affected after right- and left-sided occipital-tem­
poral injury (Rentschler, Treutwein, & Landis, 1994); yet, typically patients find it more
difficult to process global features and integrate them into a whole percept (integrative or
simultaneous agnosia; Behrmann & Williams, 2007; Saumier, Arguin, Lefebvre, & Las­
sonde, 2002; Thomas & Forde, 2006). Consequently, patients are unable to report more
than one attribute of a single object (Coslett & Lie, 2008). Encoding the spatial arrange­
ments of parts of an object requires a mechanism that is different from that required for
encoding the shape of individual parts, with the former selectively compromised in inte­
grative agnosia (Behrmann, Peterson, Moscovitch, & Suzuki, 2006). Integration of multi­
ple object stimuli into a holistic interpretation seems to depend on the spatial distance of
local features and elements (Huberle & Karnath, 2006). Yet, shifting fixation and thus al­
so attention to all elements of an object in a regular manner does not seem sufficient to
“bind” together the different elements of spatially distributed stimuli (Clavagnier et al.,
2006). The integration of multiple visual elements resulting in a conscious perception of
their gestalt seems to rely on bilateral structures in the human lateral and medial inferior
parietal cortex (Himmelbach, Erb, Klockgether, Moskau, & Karnath, 2009). An alternative
explanation for the impairment in global visual perception is shrinkage of the field of at­
tention and thus perception (Michel & Henaff, 2004), which might be elicited by atten­
tional capture (“radical visual capture”) to single, local elements (Takaiwa, Yoshimura,
Abe, & Terai, 2003; Dalrymple, Kingstone, & Barton, 2007). The pathological restriction
and rigidity of attention impair the integration of multiple visual elements to a gestalt,
but the type of capture depends on the competitive balance between global and local
salience. The impaired disengaging of attention causes inability to “unlock” attention
from the first object or object element to other objects or elements of objects (Pavese,
Coslett, Saffran, & Buxbaum, 2002). Interestingly, facial expressions of emotion are less
affected in simultanagnosia, indicating that facial stimuli constitute a specific category of
stimuli that attract attention more effectively and are possibly processed before attention­
al engagement (Pegna, Caldara-Schnetzer, & Khateb, 2008). It has been proposed that
differences in local relative to more global visual processing can be explained by different
processing modes in the dorsal and medial ventral visual pathways at an extrastriate lev­
el; these characteristics can also explain category-specific deficits in visual perception
(Riddoch et al., 2008). The dual-route organization of visual information has also been ap­
plied to local–global perception. Difficulties with processing of multiple stimulus elements
or features (within-object representation) are often referred to as “ventral” simultanag­
nosia, and impaired processing of multiple spatial stimuli (between-object representation)
as “dorsal” simultanagnosia (Karnath, Ferber, Rorden, & Driver, 2000). Dorsal simul­
tanagnosia is one component of the Bálint-Holmes syndrome, which consists of (p. 200)
spatial (and possibly temporal) restriction of the field of visual attention and thus visual
processing and perception, impaired visual spatial localization and orientation, and defec­
tive depth perception (Moreaud, 2003; Rizzo & Vecera, 2002). In addition, patients with
severe Bálint’s syndrome find it extremely difficult to shift their gaze voluntarily or on
command (oculomotor apraxia or psychic paralysis of gaze) and are unable to direct
movement of an extremity in space under visual guidance (optic or visuomotor ataxia). As
a consequence, visually guided oculomotor and hand motor activities, visual-constructive
abilities, visual orientation, recognition, and reading are severely impaired (Ghika, Ghika-
Schmid, & Bogousslavsky, 1998).

In face agnosia (prosopagnosia), recognition of familiar faces, including one’s own face, is
impaired or lost. The difficulties prosopagnosic patients have with visual face recognition
also manifest in their oculomotor scan path during inspection of a face; global features
such as hair or the forehead, for example, are scanned in much more detail than genuine
facial features such as the eye or nose (Stephan & Caine, 2009). Other prosopagnosic
subjects may show partial processing of facial features, such as the mouth region
(Bukach, Le Grand, Kaiser, Bub, & Tanaka, 2008). Topographical agnosia (topographagnosia),
or environmental agnosia, refers to defective recognition of familiar environments, in reality
and on maps and pictures; however, patients may have fewer difficulties in familiar sur­
roundings and with scenes with clear landmarks, and may benefit from semantic informa­
tion such as street names (Mendez & Cherrier, 2003). Agnosia for letters (pure alexia) is a
form of acquired dyslexia with defective visual recognition of letters and words while au­
ditory recognition of letters and words and writing are intact. The underlying disorder
may have a pre-lexical, visual-perceptual basis because patients can also exhibit difficul­
ties with nonlinguistic stimuli (Mycroft, Behrmann, & Kay, 2009).

Audition
Auditory perception comprises detection, discrimination, identification, and recognition
of sounds, voice, music, and speech. The ability to detect and discriminate attributes of
sounds improves with practice (Wright & Zhang, 2009) and thus depends on auditory ex­
perience. This might explain interindividual differences in auditory performance, in par­
ticular recognition expertise and domain specificity concerning, for example, sounds,
voices, and music (Chartrand, Peretz, & Belin, 2008). Another factor that crucially modu­
lates auditory perceptual efficiency is selective attention (Shinn-Cunningham & Best,
2008).

The auditory brain possesses tonotopic maps that show rapid task-related changes to sub­
serve distinct functional roles in auditory information processing, such as pitch versus
phonetic analysis (Ozaki & Hashimoto, 2007). This task specificity can be viewed as a
form of plasticity that is embedded in a context- and cognition-related frame of reference,
whereby attention, learning and memory, and mental imagery can modulate processing
(Dahmen & King, 2007; Fritz, Elhilali, & Shamma, 2005; Weinberger, 2007; Zatorre,
2007). The auditory cortex forms internal representations of characteristic temporal
structures, which may form the basis for sound segmentation, the processing of complex
auditory objects, and multisensory integration (Wang, Lu, Bendor, & Bartlett,
2008). In the discrimination of speech and nonspeech stimuli, which is based on subtle
temporal acoustic features, the middle temporal gyrus, the superior temporal sulcus, the
posterior part of the inferior frontal gyrus, and the parietal operculum of the left hemi­
sphere are involved (Zaehle, Geiser, Alter, Jancke, & Meyer, 2008). Environmental sounds
are mainly processed in the middle temporal gyri in both hemispheres (Lewis et al.,
2004), whereas vocal communication sounds are preferentially coded in the insular re­
gion (Bamiou, Musiek, & Luxon, 2003). Music perception is understood as a form of com­
munication in which formal codes (i.e., acoustic patterns) and their auditory representa­
tions are employed to elicit a variety of perceptual and emotional experiences (Bharucha,
Curtis, & Paroo, 2006). Musical stimuli have also been found to activate specific path­
ways in several brain areas, which are associated with emotional behavior, such as insu­
lar and cingulate cortices, amygdala, and prefrontal cortex (Boso, Politi, Barale, & Enzo,
2006). For the representation of auditory scenes and categories within past and actual ex­
periences and contexts, the medial and ventrolateral prefrontal cortex appears to play a
particular role (Janata, 2005; Russ, Lee, & Cohen, 2007).

The auditory system also possesses a “where” and a “what” subdivision for processing
spatial and nonspatial aspects of acoustic stimuli, which allows detection, localization,
discrimination, identification, and recognition of auditory information, including vocal
communication sounds (speech perception) and music (Kraus & Nicol, 2005; Wang, Wu, &
Li, 2008).

(p. 201) Auditory Perceptual Disorders

Unilateral and bilateral injury to left- or right-sided temporal brain structures can affect
spatial and temporal auditory processing capacities (Griffiths et al., 1997;
Polster & Rose, 1998) and the perception of environmental sounds (Tanaka, Nakano, &
Obayashi, 2002), sound movement (Lewald, Peters, Corballis, & Hausmann, 2009), tunes,
prosody, and voice (Peretz et al., 1994), and words (pure word deafness) (Shivashankar,
Shashikala, Nagaraja, Jayakumar, & Ratnavalli, 2001). Functional dissociation of auditory
perceptual deficits, such as preservation of speech perception and environmental sounds
but impairment of melody perception (Peretz et al., 1994), impaired speech perception
but intact environmental sound perception (Kaga, Shindo, & Tanaka, 1997), and impaired
perception of verbal but spared perception of nonverbal stimuli (Shivashankar et al.,
2001), suggests a modular architecture similar to that in the visual cortex (Polster &
Rose, 1998).

Auditory Agnosia

Auditory agnosia is defined as the impairment or loss of recognition of auditory stimuli in
the absence of defective auditory functions and of language and cognitive disorders that can
(sufficiently) explain the recognition disorder. As in visual agnosia, it may be difficult to
validly distinguish between genuine and secondary auditory agnosia. It is impossible to
clearly differentiate sensory-perceptual from perceptual-cognitive abilities because both
domains are required for auditory recognition. For example, patients with intact process­
ing of steady-state patterns but impaired processing of dynamic acoustic patterns may ex­
hibit verbal auditory agnosia (Wang, Peach, Xu, Schneck, & Manry, 2000) or have (addi­
tional) difficulties with auditory spatial localization and auditory motion perception
(Clarke, Bellmann, Meuli, Assal, & Steck, 2000). Auditory agnosia for environmental
sounds may be associated with impaired processing of meaningful verbal information
(Saygin, Dick, Wilson, Dronkers, & Bates, 2003) and impaired recognition of music (Kaga,
Shindo, Tanaka, & Haebara, 2000); yet, perception of environmental sound (Shivashankar
et al., 2001) and music may also be spared even in the case of generalized auditory ag­
nosia (Mendez, 2001). However, there exist examples of pure agnosia for recognizing par­
ticular categories of auditory material, such as environmental sounds (Taniwaki, Tagawa,
Sato, & Iino, 2000), speech (pure word deafness) (Engelien et al., 1995; Polster & Rose,
1998), and music perception. Musical timbre perception can be affected after left- or
right temporal lobe injury (Samson, Zatorre, & Ramsay, 2002). Agnosia for music (music
agnosia, amusia) and agnosia for other auditory categories are frequently associated but
can also dissociate; they typically occur after right unilateral and bilateral temporal lobe
injury (Vignolo, 2003). Amusia may affect discrimination and recognition of familiar
melodies (Ayotte, Peretz, Rousseau, Bard, & Bojanowski, 2000; Sato et al., 2005). Howev­
er, there is evidence for a less strong hemispheric specificity for music perception be­
cause cross-hemisphere and fragmented neural substrates underlie local and global musi­
cal information processing at least in the melodic and temporal dimensions (Schuppert,
Munte, Wieringa, & Altenmüller, 2000).

Somatosensory Perception
The somatosensory system provides information about object surfaces that are in direct
contact with the skin (touch) and about the position and movements of body parts (propri­
oception and kinesthesis). Somatosensory perception thus includes detection and discrim­
ination of (fine) differences in touch stimulation and haptic perception, that is, the per­
ception of shape, size, and identity (recognition) of objects on the basis of touch and
kinesthesis. Shape is an important cue for recognizing objects by touch; edges, curvature,
and surface areas are associated with three-dimensional shape (Plaisier, Tiest, & Kap­
pers, 2009). Exploratory motor procedures are directly linked to the extraction of specific
shape properties (Valenza et al., 2001). Somatosensory information is processed in anteri­
or, lateral, and posterior parietal cortex, but also in frontal, cingulate, temporal, and insu­
lar cortical regions (Porro, Lui, Facchin, Maieron, & Baraldi, 2005).

Somatosensory Perceptual Disorders

Impaired haptic perception of (micro)geometrical properties, which may be associated
with a failure to recognize objects, has been reported after injury to the postcentral
gyrus, including somatosensory areas SI and SII, and the posterior parietal cortex
(Bohlhalter, Fretz, & Weder, 2002; Estanol, Baizabal-Carvallo, & Senties-Madrid, 2008).
Difficulties in identifying objects through hand manipulation alone have been reported after
parietal injury (Tomberg & Desmedt, 1999). Impairment of the perception of stimulus
shape (morphagnosia) may result from defective processing of spatial orientation in two-
and three-dimensional space (Saetti, De Renzi, & Comper, 1999). (p. 202) Tactile object
recognition can be impaired without associated disorders in tactile discrimination and
manual shape exploration, indicating the existence of “pure” tactile agnosia (Reed, Casel­
li, & Farah, 1996).

Body Perception Disorders

Disorders in body perception may affect body form and body actions selectively or in com­
bination (Moro et al., 2008). Patients with injury to the premotor cortex may show ag­
nosia for their body (asomatognosia); that is, they describe parts of their body as missing
or as having disappeared from body awareness (Arzy, Overney, Landis, & Blanke, 2006).
Macrosomatognosia and, less frequently, microsomatognosia have been reported as transient and reversible
modifications of body representation during migraine aura (Robinson & Podoll, 2000).
Asomatognosia either may involve the body as a whole (Beis, Paysant, Bret, Le Chapelain,
& Andre, 2007) or may be restricted to finger recognition (“finger agnosia”; Anema et al.,
2008). Body misperception may also result in body illusion, a deranged representation of
the body concerning its ownership labeled “somatoparaphrenia” (Vallar & Ronchi, 2009).
Distorted body perception may also occur in chronic pain (Lotze & Moseley, 2007).

Olfactory and Gustatory Perception


The significance of the sense of smell is still somewhat neglected. This is surprising given
that olfactory processing monitors the intake of airborne agents into the human respirato­
ry system and warns of spoiled food, leaking natural gas, polluted air, and smoke. In addi­
tion, it determines to a large degree the flavor and palatability of foods and beverages,
enhances life quality, and mediates basic elements of human social relationships and com­
munication, such as in mother–child interactions (Doty, 2009). Olfactory perception im­
plies detection, discrimination, identification, and recognition of olfactory stimuli. Olfacto­
ry perception shows selective adaptation; the perceived intensity of a smell drops by 50
percent or more after continuous exposure of about 10 minutes, and recovers again after
removal of the smell stimulus (Eckman, Berglund, Berglund, & Lindvall, 1967). Continu­
ous exposure to a particular smell, such as cigarette smoke, causes persistent adapta­
tion to that smell on the person and in the environment.

Smell perception involves the caudal orbitofrontal and medial temporal cortices. Olfacto­
ry stimuli are processed in primary olfactory (piriform) cortex and also activate the amyg­
dala bilaterally, regardless of valence. In posterior orbitofrontal cortex, processing of
pleasant and unpleasant odors is segregated within medial and lateral segments, respec­
tively, indicating functional heterogeneity. Olfactory stimuli also show that brain regions
mediating emotional processing are differentially activated by odor valence and provide
evidence for a close anatomical coupling between olfactory and emotional processes (Got­
tfried, Deichmann, Winston, & Dolan, 2002).

Gustation is vital for establishing whether a specific substance is edible and nutritious or
poisonous, and for developing preferences for specific foods. According to the well-known
taste tetrahedron, four basic taste qualities can be distinguished: sweet, salt, sour, and
bitter. A fifth taste quality is umami, a Japanese word for “good taste.” Perceptual taste
qualities are based on the pattern of activity across different classes of sensory fibers
(i.e., cross-fiber theory; Mather, 2006, p. 44) and distributed cortical processing (Simon,
de Araujo, Gutierrez, & Nicolelis, 2006). Taste information is conveyed through the cen­
tral gustatory pathways to the gustatory cortical area, but is also sent to the reward sys­
tem and feeding center via the prefrontal cortex, insular cortex, and amygdala (Simon et
al., 2006; Yamamoto, 2006). The sensation of eating, or flavor, involves smell and taste as
well as interactions between these and other perceptual systems, including temperature,
touch, and sight. However, flavor is not a simple summation of different sensations; smell
and taste seem to dominate flavor.

Olfactory Perceptual Disorders

Olfactory perception can be impaired in the domains of detection, discrimination, and
identification/recognition of smell stimuli. Typically, patients experience hyposmia or dys­
osmia (decreased or distorted smell perception) or anosmia (loss of the sense of smell) (Haxel, Grant, & Mackay-Sim, 2008).
However, distinct patterns of olfactory dysfunctions have been reported, indicating differ­
ential breakdown in olfactory perception analogous to visual and auditory modalities
(Luzzi et al., 2007). Interestingly, selective inability to recognize the favorite foods by
smell can also occur despite preserved detection and evaluation of food stimuli as pleas­
ant or familiar (Mendez & Ghajarnia, 2001).

Chronic disorders in olfactory perception and recognition have been reported after (trau­
matic) brain injury mainly to ventral frontal cortical structures (Fujiwara, Schwartz,
Gao, Black, & Levine, 2008; Haxel, Grant, & Mackay-Sim, 2008; Wermer, Donswijk,
Greebe, Verweij, & Rinkel, 2007), in (p. 203) Parkinson’s disease and multiple sclerosis, in
mesial temporal epilepsy, and in neurodegenerative diseases, including dementia of the
Alzheimer type, frontal-temporal dementia, cortical-basal degeneration, and Huntington’s
disease (Barrios et al., 2007; Doty, 2009; Jacek, Stevenson, & Miller, 2007; Pardini, Huey,
Cavanagh, & Grafman, 2009). It should be mentioned, however, that hyposmia and im­
paired odor identification can also be found in older age (Wilson, Arnold, Tang, & Ben­
nett, 2006), in particular in subjects with cognitive decline. Presbyosmia has been found
in particular after 65 years of age, with no difference between males and females, and
with a weak relationship between self-reports of olfactory function and objective olfactory
function (Mackay-Sim, Johnston, Owen, & Burne, 2006). Olfactory perceptual changes
have also been reported among subjects receiving chemotherapy (Bernhardson, Tishel­
man, & Ruthqvist, 2009), in depression (Pollatos et al., 2007), and in anorexia nervosa
(Roessner, Bleich, Banaschewski, & Rothenberger, 2005).

Gustatory Perceptual Disorders

Gustatory disorders in the form of quantitatively reduced (hypogeusia) or qualitatively
changed (dysgeusia) gustation have been reported after subcortical, inferior collicular
stroke (Cerrato et al., 2005), after pontine infarction (Landis, Leuchter, San Millan Ruiz,
Lacroix, & Landis, 2006), after left insular and opercular stroke (Mathy, Dupuis, Pigeolet,
& Jacquerye, 2003), in multiple sclerosis (Combarros, Miro, & Berciano, 1994), and in di­
abetes mellitus (Stolbova, Hahn, Benes, Andel, & Treslova, 1999). The anteromedial tem­
poral lobe plays an important role in recognizing taste quality because injury to this
structure can cause gustatory agnosia (Small, Bernasconi, Sziklas, & Jones-Gotman,
2005). Gustatory perception also decreases with age (>40 years), which is more pro­
nounced in males than in females (Fikentscher, Roseburg, Spinar, & Bruchmuller, 1977).

Smell and taste dysfunctions, including impaired detection, discrimination, and identifica­
tion of foods, have been frequently reported in patients following (minor) stroke in tempo­
ral brain structures (Green, McGregor, & King, 2008). Abnormalities in taste and smell
have also been reported in patients with Parkinson’s disease (Shah et al., 2009).

Social Perception
Social perception is an individual’s perception of social stimuli (e.g., facial expressions, prosody, gestures, and smells) that allows motives, attitudes, or values to be inferred from the social behavior of other individuals. Social perception and social cognition, as well as sensitivity to the social context and social action, depend on particular functional systems in the prefrontal brain (Adolphs, 2003; Adolphs, Tranel, & Damasio, 2003). The
amygdala is involved in recognizing facial emotional expressions; the orbitofrontal cortex
is important to reward processing; and the insula is involved in representing “affective”
states of our own body, such as empathy or pain (Adolphs, 2009). The neural substrates of
social perception are characterized by a general pattern of right-hemispheric functional
asymmetry (Brancucci, Lucci, Mazzatenta, & Tommasi, 2009). The (right) amygdala is
crucially involved in evaluating sad but not happy faces, suggesting that this brain struc­
ture plays a specific role in processing negative emotions, such as sadness and fear
(Adolphs & Tranel, 2004).

Disorders in Social Perception

Patients with traumatic brain injury may show difficulties with recognizing affective infor­
mation from the face, voice, bodily movement, and posture (Bornhofen & McDonald,
2008), which may persistently interfere with successful negotiation of social interactions
(Ietswaart, Milders, Crawford, Currie, & Scott, 2008). Interestingly, face perception and
perception of visual social cues can be affected while the perception of prosody can be
relatively spared, indicating a dissociation between visual and auditory social-perceptual
abilities (Croker & McDonald, 2005; Green, Turner, & Thompson, 2004; Pell, 1998). Im­
paired auditory recognition of fear and anger has been reported following bilateral amygdala lesions (Scott et al., 1997). Impairments of social perception, including inaccurate in­
terpretation and evaluation of stimuli signifying reward or punishment in a social context,
and failures to translate emotional and social information into task- and context-appropri­
ate action patterns are often observed in subjects with frontal lobe injury. Consequently,
patients may demonstrate inadequate social judgments and decision making, social inflex­
ibility, and lack of self-monitoring, particularly in social situations (Rankin, 2007). Difficul­
ties with facial expression perception have also been reported in mood disorders (Venn,
Watson, Gallagher, & Young, 2006).

Conclusion and Some Final Comments


The systematic study of individuals with perceptual deficits has substantially contributed
to the (p. 204) understanding of the role of perceptual abilities and their underlying so­
phisticated brain processes, as well as the neural organization of the perceptual modali­
ties. Combined neurobiological, neuroimaging, and neuropsychological evidence supports
the view that all perceptual systems are functionally segregated and show a parallel-hier­
archical type of organization of information processing and coding. Despite this type of
functional organization, pure perceptual disorders are the exception rather than the rule.
This somewhat surprising fact can be explained by three main factors: (1) focal brain in­
jury is only rarely restricted to the cortical area in question; (2) the rich, typically recipro­
cal fiber connections between cortical areas are frequently also affected; and (3) percep­
tion may depend on spatiotemporally distributed activity in more than just one cortical
area, as is known, for example, in body perception (Berlucchi & Aglioti, 2010). Thus, an
association of deficits is more likely to occur. Furthermore, disorders of complex perceptual functions, such as recognition, may also be caused by impaired lower level perceptual abilities, and
it is rather difficult to clearly distinguish between lower and higher level perceptual abili­
ties. In addition, recognition cannot be understood without reference to memory, and it is
therefore not surprising that it has been suggested that the brain structures underlying
visual memory, in particular in the medial temporal lobe, also possess perceptual func­
tions and can thus be understood as an extension of the ventral visual processing stream
(Baxter, 2009; Suzuki, 2009). Consequently, rather than trying to map perceptual func­
tions onto more or less separate brain structures, a more comprehensive understanding
of perception would benefit from the study of cortical representation of functions crucial­
ly involved in defined percepts (Bussey & Saksida, 2007). This also holds true for the per­
ception–action debate, in particular in vision, which is treated as an exploratory activity,
that is, a way of acting based on sensorimotor contingencies, as proposed by O’Regan &
Noë (2001). According to this approach, the outside visual world serves as its own repre­
sentation, whereas the experience of seeing occurs as a result of mastering the “govern­
ing laws of sensorimotor contingency” and thereby accounts for visual experience and
“visual consciousness.” If one applies this approach to the pathology of visual perception,
then the question arises as to which visual perceptual disorders would result from the im­
paired representation of the “outside” visual world, and which from the defective “mas­
tering of the governing laws of sensorimotor contingency.” Would visual perceptual disor­
ders of the first type not be experienced by patients, and thus not represent a disorder
and not cause any handicap, because there is no “internal” representation of the outside
world in our brains and thus no visual experience? Modulatory effects of producing action
on perception such that observers become selectively sensitive to similar or related ac­
tions are known from visual imitation learning and social interactions (Schutz-Bosbach &
Prinz, 2007), but in both instances, perception of action and, perhaps, motivation to ob­
serve and attention directed to the action in question are required. Nevertheless, a more
detailed understanding of the bidirectional relationships between perception and action
and the underlying neural networks will undoubtedly help us to understand how percep­
tion modulates action and vice versa. Possibly, the search for associations and dissocia­
tions of perceptions and actions in cases with acquired brain injury in the framework of
common functional representations in terms of sensorimotor contingencies represents a
helpful approach to studying the reciprocal relationships between perception and action.
Accurate visually guided hand actions in the absence of visual perception (Goodale, 2008)
and impaired eye–hand coordination and saccadic control in optic ataxia as a conse­
quence of impaired visual-spatial processing (Pisella et al., 2009) are examples of such
dissociations and associations. Despite some conceptual concerns and limitations, the
dual-route model of visual processing proposed by Milner and Goodale (2008) is still of
theoretical and practical value (Clark, 2009).

An interesting issue is implicit processing of stimuli in the absence of experience or awareness, such as detection, localization, and even discrimination of simple visual stimuli in hemianopia (“blindsight”; Cowey, 2010; Danckert & Rossetti, 2005); discrimination of letters in visual form agnosia (Aglioti, Bricolo, Cantagallo, & Berlucchi, 1999); discrimination of forms in visual agnosia (Kentridge, Heywood, & Milner, 2004; Yang, Wu, & Shen, 2006); and discrimination of faces in prosopagnosia (Le, Raufaste, Roussel, Puel, & De­
monet, 2003). Such results suggest sparing of function in the particular brain structure,
but they may also be explained by stimulus processing in structures or areas that also
contribute to a particular perceptual function. However, spared processing of stimuli is
not identical with perception of the same stimuli. A paradigmatic example of implicit pro­
cessing of visual stimuli in the absence of the primary visual (p. 205) cortex, blindsight,
has helped us to understand the nature of visual processing, but it is still unknown
whether it is used or useful in everyday life activities; that is, it may not have any percep­
tual significance (Cowey, 2010).

Furthermore, cognition plays an important role in perception, in particular attention, memory, and the monitoring of perceptual activities. Therefore, perceptual disorders can also result from, or at least be exaggerated by, cognitive dysfunctions associated with acquired brain injury. The parietal cortex may be one of the brain structures that serve as a
bridge between perception, cognition, and action (Gottlieb, 2007). Future research on
perceptual disorders should therefore also focus on the effect of injury to brain structures
engaged in attention, memory, and executive functions involved in perception, such as the
temporal lobe, hippocampus, (posterior) parietal cortex, and prefrontal cortex. As a re­
sult, the framework for interpreting perceptual disorders after brain injury, as well as in
other pathological states, could be further widened substantially. The search for funda­
mental requirements for visual perception and the coupling between brain functions underlying perception and cognition may further help to define perceptual dysfunction with
sufficient validity and thus contribute to the significance of perception (Pollen, 2008). Re­
search on functional plasticity in subjects with perceptual disorders using experimental
practice paradigms may, in addition, contribute to a more comprehensive and integrative
understanding of perception in the framework of other functional systems in the brain,
which are known to modulate perceptual learning and thus functional plasticity in percep­
tual systems (Gilbert, Li & Piech, 2009; Gilbert & Sigman, 2007).

Author Note
Preparation of this chapter has been supported in part by the German Ministry for Educa­
tion and Research (BMBF grant 01GW0762). I want to thank Susanne Schuett for her
very helpful support.

References
Adolphs, R. (2003). Cognitive neuroscience of human social behaviour. Nature Reviews
Neuroscience, 4, 165–178.

Adolphs, R. (2009). The social brain: Neural basis of social knowledge. Annual Review of
Psychology, 60, 693–716.

Adolphs, R., & Tranel, D. (2004). Impaired judgments of sadness but not happiness follow­
ing bilateral amygdala damage. Journal of Cognitive Neuroscience, 16, 453–462.

Adolphs, R., Tranel, D., & Damasio, A. R. (2003). Dissociable neural systems for recogniz­
ing emotions. Brain and Cognition, 52, 61–69.

Aglioti, S., Bricolo, E., Cantagallo, A., & Berlucchi, G. (1999). Unconscious letter discrimina­
tion is enhanced by association with conscious color perception in visual form agnosia.
Current Biology, 9, 1419–1422.

Andersson, F., Joliot, M., Perchey, G., & Petit, L. (2007). Eye position-dependent activity in
the primary visual area as revealed by fMRI. Human Brain Mapping, 28, 673–680.

Anema, H. A., Kessels, R. P., de Haan, E. H., Kappelle, L. J., Leijten, F. S., van Zandvoort,
M. J., & Dijkerman, H. C. (2008). Differences in finger localisation performance of pa­
tients with finger agnosia. NeuroReport, 19, 1429–1433.

Anstis, S. M. (1974). A chart demonstrating variations in acuity with retinal position. Vi­
sion Research, 14, 579–582.

Arzy, S., Overney, L. S., Landis, T., & Blanke, O. (2006). Neural mechanisms of embodiment:
Asomatognosia due to premotor cortex damage. Archives of Neurology, 63, 1022–1025.

Ayotte, J., Peretz, I., Rousseau, I., Bard, C., & Bojanowski, M. (2000). Patterns of music
agnosia associated with middle cerebral artery infarcts. Brain, 123, 1926–1938.

Bamiou, D. E., Musiek, F. E., & Luxon, L. M. (2003). The insula (Island of Reil) and its role
in auditory processing. Brain Research—Brain Research Reviews, 42, 143–154.

Bar, M. (2003). A cortical mechanism triggering top-down facilitation in visual object recognition. Journal of Cognitive Neuroscience, 15, 600–609.

Bar, M. (2004). Visual objects in context. Nature Reviews Neuroscience, 5, 617–628.

Barrios, F. A., Gonzalez, L., Favila, R., Alonso, M. E., Salgado, P. M., Diaz, R., & Fernan­
dez-Ruiz, J. (2007). Olfaction and neurodegeneration in HD. NeuroReport, 18, 73–76.

Bartels, A., & Zeki, S. (1998). The theory of multistage integration. Proceedings of the
Royal Society London B, 265, 2327–2332.

Barton, J. J. S. (2008a). Structure and function in acquired prosopagnosia: Lessons from a series of 10 patients with brain damage. Journal of Neuropsychology, 2, 197–225.

Barton, J. J. S. (2008b). Prosopagnosia associated with a left occipitotemporal lesion. Neuropsychologia, 46, 2214–2224.

Barton, J. J. S., & Black, S. E. (1998). Line bisection in hemianopia. Journal of Neurology,
Neurosurgery & Psychiatry, 64, 660–662.

Barton, J. J. S., Behrmann, M., & Black, S. (1998). Ocular search during line bisection: The
effects of hemi-neglect and hemianopia. Brain, 121, 1117–1131.

Baxter, M. G. (2009). Involvement of medial temporal structures in memory and perception. Neuron, 61, 667–677.

Behrmann, M., Peterson, M. A., Moscovitch, M., & Suzuki, S. (2006). Independent repre­
sentation of parts and the relations between them: Evidence from integrative agnosia.
Journal of Experimental Psychology: Human Perception & Performance, 32, 1169–1184.

Behrmann, M., & Williams, P. (2007). Impairments in part-whole representations of objects in two cases of integrative visual agnosia. Cognitive Neuropsychology, 24, 701–730.

Beis, J. M., Paysant, J., Bret, D., Le Chapelain, L., & Andre, J. M. (2007). Specular right-left
disorientation, finger-agnosia, and asomatognosia in right hemisphere stroke. Cognitive &
Behavioral Neurology, 20, 163–169.

Berlucchi, G., & Aglioti, S. M. (2010). The body in the brain revisited. Experimental Brain
Research, 200, 25–35.

Bernhardson, B. M., Tishelman, C., & Rutqvist, L. E. (2009). Olfactory changes among pa­
tients receiving chemotherapy. European Journal of Oncology Nursing, 13, 9–15.

Berryhill, M. E., Fendrich, R., & Olson, I. R. (2009). Impaired distance perception and size constancy following bilateral occipito-parietal damage. Experimental Brain Research, 194, 381–393.

Bharucha, J. J., Curtis, M., & Paroo, K. (2006). Varieties of musical experiences. Cognition,
100, 131–172.

Billino, J., Braun, D. I., Bohm, K. D., Bremmer, F., & Gegenfurtner, K. R. (2009). Cortical
networks for motion perception: effects of focal brain lesions on perception of different
motion types. Neuropsychologia, 47, 2133–2144.

Blanke, O., Landis, T., Mermoud, C., Spinelli, L., & Safran, A. B. (2003). Direction-selec­
tive motion blindness after unilateral posterior brain damage. European Journal of Neuro­
science, 18, 709–722.

Bohlhalter, S., Fretz, C., & Weder, B. (2002). Hierarchical versus parallel processing in
tactile object recognition: A behavioural-neuroanatomical study of apperceptive tactile
agnosia. Brain, 125, 2537–2548.

Bonan, I. V., Leman, M. C., Legargasson, J. F., Guichard, J. P., & Yelnik, A. P. (2006). Evolu­
tion of subjective visual vertical perturbation after stroke. Neurorehabilitation & Neural
Repair, 20, 484–491.

Bornhofen, C., & McDonald, S. (2008). Emotion perception deficits following traumatic
brain injury: A review of the evidence and rationale for intervention. Journal of the Inter­
national Neuropsychological Society, 14, 511–525.

Boso, M., Politi, P., Barale, F., & Enzo, E. (2006). Neurophysiology and neurobiology of the
musical experience. Functional Neurology, 21, 187–191.

Bouvier, S. E., & Engel, S. A. (2006). Behavioral deficits and cortical damage loci in cere­
bral achromatopsia. Cerebral Cortex, 16, 183–191.

Bradley, D. (2004). Object motion: A world view. Current Biology, 14, R892–R894.

Brancucci, A., Lucci, G., Mazzatenta, A., & Tommasi, L. (2009). Asymmetries of the human
social brain in the visual, auditory, and chemical modalities. Philosophical Transactions of
the Royal Society of London—Series B: Biological Sciences, 364, 895–914.

Braun, D., Petersen, D., Schoenle, P., & Fahle, M. (1998). Deficits and recovery of first-
and second-order motion perception in patients with unilateral cortical lesions. European
Journal of Neuroscience, 10, 2117–2128.

Bukach, C. M., Le Grand, R., Kaiser, M. D., Bub, D., & Tanaka, J. W. (2008). Preservation
of mouth region processing in two cases of prosopagnosia. Journal of Neuropsychology, 2,
227–244.

Bulens, C., Meerwaldt, J. D., van der Wildt, G. J., & Keemink, D. (1986). Contrast sensitivi­
ty in Parkinson’s disease. Neurology, 36, 1121–1125.

Bulens, C., Meerwaldt, J. D., van der Wildt, G. J., & Keemink, D. (1989). Spatial contrast
sensitivity in unilateral cerebral ischemic lesions involving the posterior visual pathway.
Brain, 112, 507–520.

Bullier, J. (2003). Cortical connections and functional interactions between visual cortical
areas. In M. Fahle & M. Greenlee (Eds.), The neuropsychology of vision (pp. 23–63). Ox­
ford, UK: Oxford University Press.

Bussey, T. J., & Saksida, L. M. (2007). Memory, perception, and the ventral visual-perirhi­
nal-hippocampal stream: thinking outside of the boxes. Hippocampus, 17, 898–908.

Caminiti, R., Chafee, M. V., Battaglia-Mayer, A., Averbeck, B. B., Crowe, D. A., & Geor­
gopoulos, A. P. (2010). Understanding the parietal lobe syndrome from a neuropsychologi­
cal and evolutionary perspective. European Journal of Neuroscience, 31, 2320–2340.

Campbell, R., Zihl, J., Massaro, D., Munhall, K., & Cohen, M. M. (1997). Speechreading in
the akinetopsic patient, L.M. Brain, 120, 1793–1803.

Carey, D. P., Dijkerman, H. C., Murphy, K. J., Goodale, M. A., & Milner, A. D. (2006). Point­
ing to places and spaces in a patient with visual form agnosia. Neuropsychologia, 44,
1584–1594.

Cerrato, P., Lentini, A., Baima, C., Grasso, M., Azzaro, C., Bosco, G., Destefanis, E., Benna,
P., Bergui, M., & Bergamasco, B. (2005). Hypogeusia and hearing loss in a patient with an
inferior collicular lesion. Neurology, 65, 1840–1841.

Chartrand, J. P., Peretz, I., & Belin, P. (2008). Auditory recognition expertise and domain
specificity. Brain Research, 1220, 191–198.

Clark, A. (2009). Perception, action, and experience: unraveling the golden braid. Neu­
ropsychologia, 47, 1460–1468.

Clarke, G. (2005). Incidence of neurological vision impairment in patients who suffer from
an acquired brain injury. International Congress Series, 1282, 365–369.

Clarke, S., Bellmann, A., Meuli, R. A., Assal, G., & Steck, A. J. (2000). Auditory agnosia
and auditory spatial deficits following left hemispheric lesions: Evidence for distinct pro­
cessing pathways. Neuropsychologia, 38, 797–807.

Clavagnier, S., Fruhmann Berger, M., Klockgether, T., Moskau, S., & Karnath, H. O.
(2006). Restricted ocular exploration does not seem to explain simultanagnosia. Neu­
ropsychologia, 44, 2330–2336.

Combarros, O., Miro, J., & Berciano, J. (1994). Ageusia associated with thalamic plaque in
multiple sclerosis. European Neurology, 34, 344–346.

Connolly, D. M., Barbur, J. L., Hosking, S. L., & Moorhead, I. R. (2008). Mild hypoxia im­
pairs chromatic sensitivity in the mesopic range. Investigative Ophthalmology & Visual
Science, 49, 820–827.

Coslett, H. B., & Lie, G. (2008). Simultanagnosia: When a rose is not red. Journal of Cog­
nitive Neuroscience, 20, 36–48.

Cowey, A. (2010). The blindsight saga. Experimental Brain Research, 200, 3–24.

Croker, V., & McDonald, S. (2005). Recognition of emotion from facial expression follow­
ing traumatic brain injury. Brain Injury, 19, 787–799.

Dahmen, J. C., & King, A. J. (2007). Learning to hear: Plasticity of auditory cortical pro­
cessing. Current Opinion in Neurobiology, 17, 456–464.

Dalrymple, K. A., Kingstone, A., & Barton, J. J. (2007). Seeing trees OR seeing forests in
simultanagnosia: Attentional capture can be local or global. Neuropsychologia, 45, 871–
875.

Danckert, J., & Rossetti, Y. (2005). Blindsight in action: What can the different subtypes of
blindsight tell us about the control of visually guided actions? Neuroscience & Biobehav­
ioral Reviews, 29, 1035–1046.

Darling, W. G., Bartelt, R., Pizzimenti, M. A., & Rizzo, M. (2008). Spatial perception errors
do not predict pointing errors by individuals with brain lesions. Journal of Clinical & Ex­
perimental Neuropsychology, 30, 102–119.

Darling, W. G., Pizzimenti, M. A., & Rizzo, M. (2003). Unilateral posterior parietal lobe le­
sions affect representation of visual space. Vision Research, 43, 1675–1688.

Dean, H. L., Crowley, J. C., & Platt, M. L. (2004). Visual and saccade-related activity in
macaque posterior cingulate cortex. Journal of Neurophysiology, 92, 3056–3068.

Delvenne, J. F., Seron, X., Coyette, F., & Rossion, B. (2004). Evidence for perceptual deficits in associative visual (prosop)agnosia: A single case study. Neuropsychologia, 42, 597–612.

Desimone, R., & Ungerleider, L. G. (1989). Neural mechanisms of visual processing in monkeys. In F. Boller & J. Grafman (Eds.), Handbook of neuropsychology (Vol. 2, pp. 267–299). Amsterdam: Elsevier.

Doty, R. L. (2009). The olfactory system and its disorders. Seminars in Neurology, 29, 74–
81.

Ekman, G., Berglund, B., Berglund, U., & Lindvall, T. (1967). Perceived intensity of odor
as a function of time of adaptation. Scandinavian Journal of Psychology, 8, 177–186.

Engelien, A., Silbersweig, D., Stern, E., Huber, W., Doring, W., Frith, C., & Frackowiak, R.
S. (1995). The functional anatomy of recovery from auditory agnosia. A PET study of
sound categorization in a neurological patient and normal controls. Brain, 118, 1395–
1409.

Estanol, B., Baizabal-Carvallo, J. F., & Senties-Madrid, H. (2008). A case of tactile agnosia
with a lesion restricted to the post-central gyrus. Neurology India, 56, 471–473.

Farah, M. (2000). The cognitive neuroscience of vision. Oxford, UK: Blackwell.

Farah, M. (2003). Disorders of visual-spatial perception and cognition. In K. M. Heilman & E. Valenstein (Eds.), Clinical neuropsychology (4th ed., pp. 146–160). New York: Oxford University Press.

Fikentscher, R., Roseburg, B., Spinar, H., & Bruchmuller, W. (1977). Loss of taste in the el­
derly: Sex differences. Clinical Otolaryngology & Allied Sciences, 2, 183–189.

Frisén, L. (1980). The neurology of visual acuity. Brain, 103, 639–670.

Fritz, J., Elhilali, M., & Shamma, S. (2005). Active listening: Task-dependent plasticity of
spectrotemporal receptive fields in primary auditory cortex. Hearing Research, 206, 159–
176.

Fujiwara, E., Schwartz, M. L., Gao, F., Black, S.E., & Levine, B. (2008). Ventral frontal cor­
tex functions and quantified MRI in traumatic brain injury. Neuropsychologia, 46, 461–
474.

Gal, R. L. (2008). Visual function 15 years after optic neuritis: A final follow-up report
from the optic neuritis treatment trial. Ophthalmology, 115, 1079–1082.

Ghika, J., Ghika-Schmid, F., & Bogousslavsky, J. (1998). Parietal motor syndrome: A clini­
cal description in 32 patients in the acute phase of pure parietal stroke studied prospec­
tively. Clinical Neurology & Neurosurgery, 100, 271–282.

Giese, M. A., & Poggio, T. (2003). Neural mechanisms for the recognition of biological
movements. Nature Reviews Neuroscience, 4, 179–192.

Gilbert, C. D., Li, W., & Piech, V. (2009). Perceptual learning and adult cortical plasticity.
Journal of Physiology, 587, 2743–2751.

Gilbert, C. D., & Sigman, M. (2007). Brain states: Top-down influences in sensory process­
ing. Neuron, 54, 677–696.

Goodale, M. A. (2008). Action without perception in human vision. Cognitive Neuropsychology, 25, 891–919.

Goodale, M. A., & Westwood, D. A. (2004). An evolving view of duplex vision: Separate
but interacting cortical pathways for perception and action. Current Opinion in Neurobi­
ology, 14, 203–211.

Gottfried, J. A., Deichmann, R., Winston, J. S., & Dolan, R. J. (2002). Functional hetero­
geneity in human olfactory cortex: An event-related functional magnetic resonance imag­
ing study. Journal of Neuroscience, 22, 10819–10828.

Gottlieb, J. (2007). From thought to action: The parietal cortex as a bridge between per­
ception, action, and cognition. Neuron, 53, 9–16.

Green, R. E., Turner, G. R., & Thompson, W. F. (2004). Deficits in facial emotion percep­
tion in adults with recent traumatic brain injury. Neuropsychologia, 42, 133–141.

Green, T. L., McGregor, L. D., & King, K. M. (2008). Smell and taste dysfunction following
minor stroke: A case report. Canadian Journal of Neuroscience Nursing, 30, 10–13.

Griffiths, T. D., Rees, A., Witton, C., Cross, P. M., Shakir, R. A., & Green, G. G. (1997).
Spatial and temporal auditory processing deficits following right hemisphere infarction: A
psychophysical study. Brain, 120, 85–94.

Grill-Spector, K. (2003). The neural basis of object recognition. Current Opinion in Neuro­
biology, 13, 159–166.

Grill-Spector, K., & Malach, R. (2004). The human visual cortex. Annual Review of Neuro­
science, 27, 649–677.

Harrington, D. O., & Drake, M. V. (1990). The visual fields (6th ed.). St. Louis: Mosby.

Haxel, B. R., Grant, L., & Mackay-Sim, A. (2008). Olfactory dysfunction after head injury.
Journal of Head Trauma Rehabilitation, 23, 407–413.

Heider, B. (2000). Visual form agnosia: Neural mechanisms and anatomical foundations.
Neurocase, 6, 1–12.

Henderson, J. M. (2003). Human gaze control during real-world scene perception. Trends
in Cognitive Sciences, 7, 498–504.

Hess, R. F., Zihl, J., Pointer, S. J., & Schmid, C. (1990). The contrast sensitivity deficit in
cases with cerebral lesions. Clinical Vision Sciences, 5, 203–215.

Hess, R. F., Baker, C. L. Jr., & Zihl, J. (1989). The “motion-blind” patient: Low-level spatial
and temporal filters. Journal of Neuroscience, 9, 1628–1640.

Heywood, C. A., & Kentridge, R. W. (2003). Achromatopsia, color vision, and cortex. Neu­
rologic Clinics, 21, 483–500.

Heywood, C. A., Wilson, B., & Cowey, A. (1987). A case study of cortical colour blindness
with relatively intact achromatic discrimination. Journal of Neurology, Neurosurgery, and
Psychiatry, 50, 22–29.

Himmelbach, M., Erb, M., & Karnath, H.-O. (2006). Exploring the visual world: The neur­
al substrate of spatial orienting. NeuroImage, 32, 1747–1759.

Himmelbach, M., Erb, M., Klockgether, T., Moskau, S., & Karnath, H. O. (2009). fMRI of
global visual perception in simultanagnosia. Neuropsychologia, 47, 1173–1177.

Hochstein, S., & Ahissar, M. (2002). View from the top: Hierarchies and reverse hierar­
chies in the visual system. Neuron, 36, 791–804.

Huberle, E., & Karnath, H. O. (2006). Global shape recognition is modulated by the spa­
tial distance of local elements: Evidence from simultanagnosia. Neuropsychologia, 44,
905–911.

Ietswaart, M., Milders, M., Crawford, J. R., Currie, D., & Scott, C. L. (2008). Longitudinal
aspects of emotion recognition in patients with traumatic brain injury. Neuropsychologia,
46, 148–159.

Jacek, S., Stevenson, R. J., & Miller, L. A. (2007). Olfactory dysfunction in temporal lobe
epilepsy: A case of ictus-related parosmia. Epilepsy & Behavior, 11, 466–470.

Jackson, G. R., & Owsley, C. (2003). Visual dysfunction, neurodegenerative diseases, and
aging. Neurologic Clinics, 21, 709–728.

James, T. W., Culham, J., Humphreys, G. K., Milner, A. D., & Goodale, M. A. (2003). Ventral occipital lesions impair object recognition but not object-directed grasping: an fMRI study. Brain, 126, 2463–2475.

Janata, P. (2005). Brain networks that track musical structure. Annals of the New York
Academy of Sciences, 1060, 111–124.

Kaga, K., Shindo, M., & Tanaka, Y. (1997). Central auditory information processing in pa­
tients with bilateral auditory cortex lesions. Acta Oto-Laryngologica Supplement, 532, 77–
82.

Kaga, K., Shindo, M., Tanaka, Y., & Haebara, H. (2000). Neuropathology of auditory ag­
nosia following bilateral temporal lobe lesions: A case study. Acta Oto-Laryngologica, 120,
259–262.

Karnath, H.-O., Ferber, S., Rorden, C., & Driver, J. (2000). The fate of global information in
dorsal simultanagnosia. Neurocase, 6, 295–306.

Karnath, H.-O., & Zihl, J. (2003). Disorders of spatial orientation. In T. Brandt, L. Caplan, J. Dichgans, C. Diener, & C. Kennard (Eds.), Neurological disorders: Course and treatment (2nd ed., pp. 277–286). New York: Academic Press.

Kennard, C., Lawden, M., Morland, A. B., & Ruddock, K. H. (1995). Colour identification
and colour constancy are impaired in a patient with incomplete achromatopsia associated
with prestriate cortical lesions. Proceedings of the Royal Society of London—Series B: Bi­
ological Sciences, 260, 169–175.

Kentridge, R. W., Heywood, C. A., & Milner, A. D. (2004). Covert processing of visual form in the absence of area LO. Neuropsychologia, 42, 1488–1495.

Koh, S. B., Kim, B. J., Lee, J., Suh, S. I., Kim, T. K., & Kim, S. H. (2008). Stereopsis and col­
or vision impairment in patients with right extrastriate cerebral lesions. European Neurol­
ogy, 60, 174–178.

Konen, C. S., & Kastner, S. (2008). Two hierarchically organized neural systems for object
information in human visual cortex. Nature Neuroscience, 11, 224–231.

Kraus, N., & Nicol, T. (2005). Brainstem origins for cortical “what” and “where” pathways
in the auditory system. Trends in Neurosciences, 28, 176–181.

Landis, B. N., Leuchter, I., San Millan Ruiz, D., Lacroix, J. S., & Landis, T. (2006). Tran­
sient hemiageusia in cerebrovascular lateral pontine lesions. Journal of Neurology, Neuro­
surgery, and Psychiatry, 77, 680–683.

Landis, T. (2000). Disruption of space perception due to cortical lesions. Spatial Vision,
13, 179–191.

Landis, T., Regard, M., Bliestle, A., & Kleihues, P. (1988). Prosopagnosia and agnosia for
noncanonical views. An autopsied case. Brain, 111, 1287–1297.

Le, S., Raufaste, E., Roussel, S., Puel, M., & Demonet, J. F. (2003). Implicit face percep­
tion in a patient with visual agnosia? Evidence from behavioural and eye-tracking analy­
ses. Neuropsychologia, 41, 702–712.

Leigh, R. J., & Zee, D. S. (2006). The neurology of eye movements (4th ed.). Philadelphia:
F. A. Davis.

Lewald, J., Peters, S., Corballis, M. C., & Hausmann, M. (2009). Perception of stationary
and moving sound following unilateral cortectomy. Neuropsychologia, 47, 962–971.

Lewis, J. W., Wightman, F. L., Brefczynski, J. A., Phinney, R. E., Binder, J. R., & DeYoe, E. A.
(2004). Human brain regions involved in recognizing environmental sounds. Cerebral
Cortex, 14, 1008–1021.

Lissauer, H. (1890). Ein Fall von Seelenblindheit nebst einem Beitrage zur Theorie dersel­
ben. [A case of mindblindness with a contribution to its theory]. Archiv für Psychiatrie
und Nervenkrankheiten, 21, 222–270.

Lotze, M., & Moseley, G. L. (2007). Role of distorted body image in pain. Current Rheuma­
tology Reports, 9, 488–496.

Luzzi, S., Snowden, J. S., Neary, D., Coccia, M., Provinciali, L., & Lambon Ralph, M. A.
(2007). Distinct patterns of olfactory impairment in Alzheimer’s disease, semantic demen­
tia, frontotemporal dementia, and corticobasal degeneration. Neuropsychologia, 45,
1823–1831.

Page 27 of 37
Perceptual Disorders

Mackay-Sim, A., Johnston, A. N., Owen, C., & Burne, T. H. (2006). Olfactory ability in the
healthy population: Reassessing presbyosmia. Chemical Senses, 31, 763–771.

Martinez-Conde, S., Macknik, S. L., & Hubel, D. H. (2004). The role of fixational eye
movements in visual perception. Nature Reviews Neuroscience, 5, 229–239.

Mather, G. (2006). Foundations of perception. Hove (UK) and New York: Psychology
Press.

Mathy, I., Dupuis, M. J., Pigeolet, Y., & Jacquerye, P. (2003). Bilateral ageusia after left insular and opercular ischemic stroke. [French]. Revue Neurologique, 159, 563–567.

McIntosh, R. D., Dijkerman, H. C., Mon-Williams, M., & Milner, A. D. (2004). Grasping
what is graspable: Evidence from visual form agnosia. Cortex, 40, 695–702.

McLeod, P., Dittrich, W., Driver, J., Perrett, D., & Zihl, J. (1996). Preserved and impaired
detection of structure from motion by a motion-blind patient. Visual Cognition, 3, 363–
391.

McLeod, P., Heywood, C., Driver, J., & Zihl, J. (1989). Selective deficit of visual search in
moving displays after extrastriate damage. Nature, 339, 466–467.

Meadows, J. C. (1974). Disturbed perception of colours associated with localized cerebral lesions. Brain, 97, 615–632.

Mendez, M. F. (2001). Generalized auditory agnosia with spared music recognition in a left-hander: Analysis of a case with a right temporal stroke. Cortex, 37, 139–150.

Mendez, M. F., & Cherrier, M. M. (2003). Agnosia for scenes in topographagnosia. Neu­
ropsychologia, 41, 1387–1395.

Mendez, M. F., & Ghajarnia, M. (2001). Agnosia for familiar faces and odors in a patient
with right temporal lobe dysfunction. Neurology, 57, 519–521.

Michel, F., & Henaff, M. A. (2004). Seeing without the occipito-parietal cortex: Simul­
tanagnosia as a shrinkage of the attentional visual field. Behavioural Neurology, 15, 3–13.

Miller, L. J., Mittenberg, S., Carey, V. M., McMorrow, M. A., Kushner, T. E., & Weinstein, J.
M. (1999). Astereopsis caused by traumatic brain injury. Archives of Clinical Neuropsy­
chology, 14, 537–543.

Milner, A. D., Perrett, D. I., Johnston, R. S., Benson, P. J., Jordan, T. R., Heeley, D. W., et al.
(1991). Perception and action in “visual form agnosia.” Brain, 114, 405–428.

Milner, A. D., & Goodale, M. A. (2008). Two visual systems re-reviewed. Neuropsychologia,
46, 774–785.

Moreaud, O. (2003). Balint syndrome. Archives of Neurology, 60, 1329–1331.

Moro, V., Urgesi, C., Pernigo, S., Lanteri, P., Pazzaglia, M., & Aglioti, S. M. (2008). The
neural basis of body form and body action agnosia. Neuron, 60, 235–246.

Mort, D. J., & Kennard, C. (2003). Visual search and its disorders. Current Opinion in Neu­
rology, 16, 51–57.

(p. 209) Moura, A. L., Teixeira, R. A., Oiwa, N. N., Costa, M. F., Feitosa-Santana, C., Callegaro, D., Hamer, R. D., & Ventura, D. F. (2008). Chromatic discrimination losses in multiple sclerosis patients with and without optic neuritis using the Cambridge Colour Test. Visual Neuroscience, 25, 463–468.

Moutoussis, K., & Zeki, S. (2008). Motion processing, directional selectivity, and conscious visual perception in the human brain. Proceedings of the National Academy of Sciences of the United States of America, 105, 16362–16367.

Müller, T., Woitalla, D., Peters, S., Kohla, K., & Przuntek, H. (2002). Progress of visual dys­
function in Parkinson’s disease. Acta Neurologica Scandinavica, 105, 256–260.

Mycroft, R. H., Behrmann, M., & Kay, J. (2009). Visuoperceptual deficits in letter-by-letter
reading? Neuropsychologia, 47, 1733–1744.

Nakachi, R., Muramatsu, T., Kato, M., Akiyama, T., Saito, F., Yoshino, F., Mimura, M., & Kashima, H. (2007). Progressive prosopagnosia at a very early stage of frontotemporal lobar degeneration. Psychogeriatrics, 7, 155–162.

Nobre, A. C. (2001). The attentive homunculus: Now you see it, now you don’t. Neuro­
science & Biobehavioral Reviews, 25, 477–496.

Oliveri, M., Turriziani, P., Carlesimo, G. A., Koch, G., Tomaiuolo, F., Panella, M., & Caltagirone, C. (2001). Parieto-frontal interactions in visual-object and visual-spatial working memory: Evidence from transcranial magnetic stimulation. Cerebral Cortex, 11, 606–618.

Olson, C. R., Gettner, S. N., Ventura, V., Carta, R., & Kass, R. E. (2000). Neuronal activity
in macaque supplementary eye field during planning of saccades in response to pattern
and spatial cues. Journal of Neurophysiology, 84, 1369–1384.

Orban, G. A. (2008). Higher order visual processing in macaque extrastriate cortex. Physi­
ological Reviews, 88, 59–89.

O’Regan, J. K., & Noë, A. (2001). A sensorimotor account of vision and visual consciousness. Behavioral and Brain Sciences, 24, 939–1031.

Ozaki, I., & Hashimoto, I. (2007). Human tonotopic maps and their rapid task-related
changes studied by magnetic source imaging. Canadian Journal of Neurological Sciences,
34, 146–153.

Pambakian, A. L. M., Mannan, S. K., Hodgson, T. L., & Kennard, C. (2004). Saccadic visual
search training: A treatment for patients with homonymous hemianopia. Journal of Neu­
rology, Neurosurgery, and Psychiatry, 75, 1443–1448.

Pardini, M., Huey, E. D., Cavanagh, A. L., & Grafman, J. (2009). Olfactory function in corti­
cobasal syndrome and frontotemporal dementia. Archives of Neurology, 66, 92–96.

Pavese, A., Coslett, H. B., Saffran, E., & Buxbaum, L. (2002). Limitations of attentional
orienting: Effects of abrupt visual onsets and offsets on naming two objects in a patient
with simultanagnosia. Neuropsychologia, 40, 1097–1103.

Pegna, A. J., Caldara-Schnetzer, A. S., & Khateb, A. (2008). Visual search for facial expres­
sions of emotion is less affected in simultanagnosia. Cortex, 44, 46–53.

Pell, M. D. (1998). Recognition of prosody following unilateral brain lesion: Influence of functional and structural attributes of prosodic contours. Neuropsychologia, 36, 701–715.

Peretz, I., Kolinsky, R., Tramo, M., Labrecque, R., Hublet, C., Demeurisse, G., & Belleville,
S. (1994). Functional dissociations following bilateral lesions of auditory cortex. Brain,
117, 1283–1301.

Pisella, L., Sergio, L., Blangero, A., Torchin, H., Vighetto, A., & Rossetti, Y. (2009). Optic
ataxia and the function of the dorsal stream: Contributions to perception and action. Neu­
ropsychologia, 47, 3033–3044.

Plaisier, M. A., Tiest, W. M., & Kappers, A. M. (2009). Salient features in 3-D haptic shape
perception. Attention Perception & Psychophysics, 71, 421–430.

Plant, G. T., Laxer, K. D., Barbaro, N. M., Schiffman, J. S., & Nakayama, K. (1993). Im­
paired visual motion perception in the contralateral hemifield following unilateral posteri­
or cerebral lesions in humans. Brain, 116, 1303–1335.

Pollatos, O., Albrecht, J., Kopietz, R., Linn, J., Schoepf, V., Kleemann, A. M., Schreder, T.,
Schandry, R., & Wiesmann, M. (2007). Reduced olfactory sensitivity in subjects with de­
pressive symptoms. Journal of Affective Disorders, 102, 101–108.

Pollen, D. A. (2008). Fundamental requirements for primary visual perception. Cerebral Cortex, 18, 1991–1998.

Polster, M. R., & Rose, S. B. (1998). Disorders of auditory processing: Evidence for modu­
larity in audition. Cortex, 34, 47–65.

Porro, C. A., Lui, F., Facchin, P., Maieron, M., & Baraldi, P. (2005). Percept-related activity
in the human somatosensory system: functional magnetic resonance imaging studies.
Magnetic Resonance Imaging, 22, 1539–1548.

Postma, A., Sterken, Y., de Vries, L., & de Haan, E. H. E. (2000). Spatial localization in pa­
tients with unilateral posterior left or right hemisphere lesions. Experimental Brain Re­
search, 134, 220–227.

Rainville, C., Joubert, S., Felician, O., Chabanne, V., Ceccaldi, M., & Peruch, P. (2006).
Wayfinding in familiar and unfamiliar environments in a case of progressive topographi­
cal agnosia. Neurocase, 11, 297–309.

Rankin, K. P. (2007). Social cognition in frontal injury. In B. L. Miller & J. L. Cummings (Eds.), The human frontal lobes: Functions and disorders (2nd ed., pp. 345–360). New York, London: The Guilford Press.

Rayner, K. (1998). Eye movements in reading and visual information processing: 20 Years
of research. Psychological Bulletin, 124, 372–422.

Reed, C. L., Caselli, R. J., & Farah, M. J. (1996). Tactile agnosia: Underlying impairment
and implications for normal tactile object recognition. Brain, 119, 875–888.

Rentschler, I., Treutwein, B., & Landis, T. (1994). Dissociation of local and global process­
ing in visual agnosia. Vision Research, 34, 963–971.

Rice, N. J., McIntosh, R. D., Schindler, I., Mon-Williams, M., Demonet, J. F., & Milner, A. D.
(2006). Intact automatic avoidance of obstacles in patients with visual form agnosia. Ex­
perimental Brain Research, 174, 76–88.

Riddoch, M. J., Humphreys, G. W., Akhtar, N., Allen, H., Bracewell, R. M., & Schofield,
A. J. (2008). A tale of two agnosias: Distinctions between form and integrative agnosia.
Cognitive Neuropsychology, 25, 56–92.

Riddoch, M. J., Johnston, R. A., Bracewell, R. M., Boutsen, L., & Humphreys, G. W. (2008).
Are faces special? A case of pure prosopagnosia. Cognitive Neuropsychology, 25, 3–26.

Rizzo, M., Nawrot, M., & Zihl, J. (1995). Motion and shape perception in cerebral akine­
topsia. Brain, 118, 1105–1127.

Rizzo, M., Smith, V., Pokorny, J., & Damasio, A. R. (1993). Colour perception profiles in
central achromatopsia. Neurology, 43, 995–1001.

Rizzo, M., & Vecera, S. P. (2002). Psychoanatomical substrates of Balint’s syndrome. Jour­
nal of Neurology, Neurosurgery, and Psychiatry, 72, 162–178.

(p. 210) Roark, D. A., Barrett, S. E., Spence, M. J., Abdi, H., & O’Toole, A. J. (2003). Psycho­
logical and neural perspectives on the role of motion in face recognition. Behavioral &
Cognitive Neuroscience Reviews, 2, 15–46.

Robinson, D., & Podoll, K. (2000). Macrosomatognosia and microsomatognosia in migraine art. Acta Neurologica Scandinavica, 101, 413–416.

Roessner, V., Bleich, S., Banaschewski, T., & Rothenberger, A. (2005). Olfactory deficits in
anorexia nervosa. European Archives of Psychiatry & Clinical Neuroscience, 255, 6–9.

Rowe, F., Brand, D., Jackson, C. A., Price, A., Walker, L., Harrison, S., Eccleston, C., Scott
C., Akerman, N., Dodridge, C., Howard, C., Shipman, T., Sperring, U., MacDiarmid, S., &
Freeman, C. (2009). Visual impairment following stroke: Do stroke patients require vision
assessment? Age and Ageing, 38, 188–193.

Russ, B. E., Lee, Y. S., & Cohen, Y. E. (2007). Neural and behavioral correlates of auditory
categorization. Hearing Research, 229, 204–212.

Saetti, M. C., De Renzi, E., & Comper, M. (1999). Tactile morphagnosia secondary to spa­
tial deficits. Neuropsychologia, 37, 1087–1100.

Samson, S., Zatorre, R. J., & Ramsay, J. O. (2002). Deficits of musical timbre perception af­
ter unilateral temporal-lobe lesion revealed with multidimensional scaling. Brain, 125,
511–523.

Satoh, M., Takeda, K., Murakami, Y., Onouchi, K., Inoue, K., & Kuzuhara, S. (2005). A case
of amusia caused by the infarction of anterior portion of bilateral temporal lobes. Cortex,
41, 77–83.

Saumier, D., Arguin, M., Lefebvre, C., & Lassonde, M. (2002). Visual object agnosia as a
problem in integrating parts and part relations. Brain & Cognition, 48, 531–537.

Saygin, A. P. (2007). Superior temporal and premotor brain areas necessary for biological
motion perception. Brain, 130, 2452–2461.

Saygin, A. P., Dick, F., Wilson, S. M., Dronkers, N. F., & Bates, E. (2003). Neural resources
for processing language and environmental sounds: Evidence from aphasia. Brain, 126,
928–945.

Schenk, T. (2006). An allocentric rather than perceptual deficit in patient DF. Nature Neu­
roscience, 9, 1369–1370.

Schenk, T., Mai, N., Ditterich, J., & Zihl, J. (2000). Can a motion-blind patient reach for
moving objects? European Journal of Neuroscience, 12, 3351–3360.

Schiller, P. H., & Tehovnik, E. J. (2001). Look and see: How the brain moves your eyes
about. Progress in Brain Research, 134, 127–142.

Schiller, P. H., & Tehovnik, E. J. (2005). Neural mechanisms underlying target selection
with saccadic eye movements. Progress in Brain Research, 149, 157–171.

Schuett, S., Heywood, C. A., Kentridge, R. W., & Zihl, J. (2008a). The significance of visual
information processing in reading: Insights from hemianopic dyslexia. Neuropsychologia,
46, 2445–2462.

Schuett, S., Heywood, C. A., Kentridge, R. W., & Zihl, J. (2008b). Rehabilitation of hemi­
anopic dyslexia: are words necessary for re-learning oculomotor control? Brain, 131,
3156–3168.

Schuppert, M., Munte, T. F., Wieringa, B. M., & Altenmuller, E. (2000). Receptive amusia:
Evidence for cross-hemispheric neural networks underlying music processing strategies.
Brain, 123, 546–559.

Schutz-Bosbach, S., & Prinz, W. (2007). Perceptual resonance: Action-induced modulation of perception. Trends in Cognitive Sciences, 11, 349–355.

Scott, S. K., Young, A. W., Calder, A. J., Hellawell, D. J., Aggleton, J. P., & Johnson, M.
(1997). Impaired auditory recognition for fear and anger following bilateral amygdala le­
sions. Nature, 385, 254–257.

Shah, M., Deeb, J., Fernando, M., Noyce, A., Visentin, E., Findley, L. J., & Hawkes, C. H.
(2009). Abnormality of taste and smell in Parkinson’s disease. Parkinsonism & Related
Disorders, 15, 232–237.

Shinn-Cunningham, B. G., & Best, V. (2008). Selective attention in normal and impaired
hearing. Trends in Amplification, 12, 283–299.

Shivashankar, N., Shashikala, H. R., Nagaraja, D., Jayakumar, P. N., & Ratnavalli, E.
(2001). Pure word deafness in two patients with subcortical lesions. Clinical Neurology &
Neurosurgery, 103, 201–205.

Short, R. A., & Graff-Radford, N. R. (2001). Localization of hemiachromatopsia. Neurocase, 7, 331–337.

Sigala, N. (2004). Visual categorization and the inferior temporal cortex. Behavioural
Brain Research, 149, 1–7.

Simon, S. A., de Araujo, I. E., Gutierrez, R., & Nicolelis, M. A. L. (2006). The neural mechanisms of gustation: A distributed processing code. Nature Reviews Neuroscience, 7, 890–901.

Singh-Curry, V., & Husain, M. (2009). The functional role of the inferior parietal lobe in
the dorsal and ventral stream dichotomy. Neuropsychologia, 47, 1434–1448.

Small, D. M., Bernasconi, N., Bernasconi, A., Sziklas, V., & Jones-Gotman, M. (2005). Gus­
tatory agnosia. Neurology, 64, 311–317.

Smith, C. N., & Squire, L. R. (2008). Experience-dependent eye movements reflect hip­
pocampus-dependent (aware) memory. Journal of Neuroscience, 28, 12825–12833.

Snowden, R. J., & Freeman, T. C. (2004). The visual perception of motion. Current Biology,
14, R828–R831.

Stephan, B. C. M., & Caine, D. (2009). Aberrant pattern of scanning in prosopagnosia re­
flects impaired face processing. Brain and Cognition, 69, 262–268.

Stolbova, K., Hahn, A., Benes, B., Andel, M., & Treslova, L. (1999). Gustatometry of dia­
betes mellitus patients and obese patients. International Tinnitus Journal, 5, 135–140.

Suchoff, I. B., Kapoor, N., Ciuffreda, K. J., Rutner, D., Han, E., & Craig, S. (2008). The fre­
quency of occurrence, types, and characteristics of visual field defects in acquired brain
injury: A retrospective analysis. Optometry, 79, 259–265.

Suzuki, W. A. (2009). Perception and the medial temporal lobe: Evaluating the current evi­
dence. Neuron, 61, 657–666.

Symonds, C., & MacKenzie, I. (1957). Bilateral loss of vision from cerebral infarction.
Brain, 80, 415–455.

Takaiwa, A., Yoshimura, H., Abe, H., & Terai, S. (2003). Radical “visual capture” observed
in a patient with severe visual agnosia. Behavioural Neurology, 14, 47–53.

Takarae, Y., & Levin, D. T. (2001). Animals and artifacts may not be treated equally: differ­
entiating strong and weak forms of category-specific visual agnosia. Brain & Cognition,
45, 249–264.

Tanaka, Y., Nakano, I., & Obayashi T. (2002). Environmental sound recognition after uni­
lateral subcortical lesions. Cortex, 38, 69–76.

Taniwaki, T., Tagawa, K., Sato, F., & Iino, K. (2000). Auditory agnosia restricted to envi­
ronmental sounds following cortical deafness and generalized auditory agnosia. Clinical
Neurology & Neurosurgery, 102, 156–162.

Thomas, R., & Forde, E. (2006). The role of local and global processing in the recognition
of living and nonliving things. Neuropsychologia, 44, 982–986.

(p. 211) Tomberg, C., & Desmedt, J. E. (1999). Failure to recognise objects by active touch (astereognosia) results from lesion of parietal-cortex representation of finger kinaesthesis. Lancet, 354, 393–394.

Tootell, R. B. H., Hadjikhani, N. K., Mendola, J. D., Marrett, S., & Dale, A. M. (1998). From
retinotopy to recognition: fMRI in human visual cortex. Trends in Cognitive Sciences, 2,
174–183.

Turnbull, O. H., Driver, J., & McCarthy, R. A. (2004). 2D but not 3D: Pictorial depth
deficits in a case of visual agnosia. Cortex, 40, 723–738.

Uc, E. Y., Rizzo, M., Anderson, S. W., Qian, S., Rodnitzky, R. L., & Dawson, J. D. (2005).
Visual dysfunction in Parkinson disease without dementia. Neurology, 65, 1907–1923.

Ungerleider, L. G., & Mishkin, M. (1982). Two cortical visual systems. In D. J. Ingle, M. A. Goodale, & R. J. W. Mansfield (Eds.), Analysis of visual behavior (pp. 549–586). Cambridge, MA: MIT Press.

Vaina, L. M., Makris, N., Kennedy, D., & Cowey, A. (1998). The selective impairment of the
perception of first-order motion by unilateral cortical brain damage. Visual Neuroscience,
15, 333–348.

Valenza, N., Ptak, R., Zimine, I., Badan, M., Lazeyras, F., & Schnider, A. (2001). Dissociated active and passive tactile shape recognition: A case study of pure tactile apraxia. Brain, 124, 2287–2298.

Vallar, G., & Ronchi R. (2009). Somatoparaphrenia: A body delusion. A review of the neu­
ropsychological literature. Experimental Brain Research, 192, 533–551.

VandenBos, G. R. (Ed.). (2007). APA dictionary of psychology. Washington, DC: American Psychological Association.

Venn, H. R., Watson, S., Gallagher, P., & Young, A. H. (2006). Facial expression percep­
tion: An objective outcome measure for treatment studies in mood disorders? Internation­
al Journal of Neuropsychopharmacology, 9, 229–245.

Vignolo, L. A. (2003). Music agnosia and auditory agnosia: Dissociations in stroke pa­
tients. Annals of the New York Academy of Sciences, 999, 50–57.

Walsh, Th. J. (1985). Blurred vision. In Th. J. Walsh (Ed.) Neuro-ophthalmology: Clinical
signs and symptoms (pp. 343–385). Philadelphia: Lea & Febiger.

Wang, E., Peach, R. K., Xu, Y., Schneck, M., & Manry, C. (2000). Perception of dynamic
acoustic patterns by an individual with unilateral verbal auditory agnosia. Brain & Lan­
guage, 73, 442–455.

Wang, X., Lu, T., Bendor, D., & Bartlett, E. (2008). Neural coding of temporal information
in auditory thalamus and cortex. Neuroscience, 157, 484–494.

Wang, W. J., Wu, X. H., & Li, L. (2008). The dual-pathway model of auditory signal pro­
cessing. Neuroscience Bulletin, 24, 173–182.

Weinberger, N. M. (2007). Auditory associative memory and representational plasticity in the primary auditory cortex. Hearing Research, 229, 54–68.

Wermer, M. J., Donswijk, M., Greebe, P., Verweij, B. H., & Rinkel, G. J. (2007). Anosmia af­
ter aneurysmal subarachnoid hemorrhage. Neurosurgery, 61, 918–922.

Wierenga, C. E., Perlstein, W. M., Benjamin, M., Leonard, C. M., Rothi, L. G., Conway, T., Cato, M. A., Gopinath, K., Briggs, R., & Crosson, B. (2009). Neural substrates of object identification: Functional magnetic resonance imaging evidence that category and visual attribute contribute to semantic knowledge. Journal of the International Neuropsychological Society, 15, 169–181.

Wilson, R. S., Arnold, S. E., Tang, Y., & Bennett, D. A. (2006). Odor identification and de­
cline in different cognitive domains in old age. Neuroepidemiology, 26, 61–67.

Wright, B. A., & Zhang, Y. (2009). A review of the generalization of auditory learning.
Philosophical Transactions of the Royal Society of London—Series B: Biological Sciences,
364, 301–311.

Yamamoto, T. (2006). Neural substrates for the processing of cognitive and affective aspects of taste in the brain. Archives of Histology & Cytology, 69, 243–255.

Yang, J., Wu, M., & Shen, Z. (2006). Preserved implicit form perception and orientation
adaptation in visual form agnosia. Neuropsychologia, 44, 1833–1842.

Zaehle, T., Geiser, E., Alter, K., Jancke, L., & Meyer, M. (2008). Segmental processing in
the human auditory dorsal stream. Brain Research, 1220, 179–190.

Zatorre, R. J. (2007). There’s more to auditory cortex than meets the ear. Hearing Re­
search, 229, 24–30.

Zeki, S. (1993). A vision of the brain. Oxford, UK: Blackwell Scientific.

Zeki, S., & Bartels, A. (1998). The autonomy of the visual systems and the modularity of conscious vision. Philosophical Transactions of the Royal Society of London B, 353, 1911–1914.

Zihl, J. (1995a). Visual scanning behavior in patients with homonymous hemianopia. Neu­
ropsychologia, 33, 287–303.

Zihl, J. (1995b). Eye movement patterns in hemianopic dyslexia. Brain, 118, 891–912.

Zihl, J. (2011). Rehabilitation of cerebral visual disorders (2nd ed.). Hove, UK: Psychology
Press.

Zihl, J., & Hebel, N. (1997). Patterns of oculomotor scanning in patients with unilateral
posterior parietal or frontal lobe damage. Neuropsychologia, 35, 893–906.

Zihl, J., Sämann, Ph., Schenk, T., Schuett S., & Dauner, R. (2009). On the origin of line bi­
section error in hemianopia. Neuropsychologia, 47, 2417–2426.

Zihl, J., von Cramon, D., & Mai, N. (1983). Selective disturbance of movement vision after
bilateral brain damage. Brain, 106, 313–334.

Zihl, J., von Cramon, D., Mai, N., & Schmid, C. (1991). Disturbance of movement vision after bilateral posterior brain damage: Further evidence and follow-up observations. Brain, 114, 2235–2252.

Josef Zihl

Josef Zihl is research group leader and head of the outpatient clinic for neuropsychol­
ogy, Max Planck Institute of Psychiatry.

Varieties of Auditory Attention  


Claude Alain, Stephen R. Arnott, and Benjamin J. Dyson
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn

Print Publication Date: Dec 2013 Subject: Psychology, Cognitive Neuroscience


Online Publication Date: Dec 2013 DOI: 10.1093/oxfordhb/9780199988693.013.0011

Abstract and Keywords

Research on attention is one of the major areas of investigation within psychology, neurology, and cognitive neuroscience. There are many areas of active investigation that aim to
understand the brain networks and mechanisms that support attention, in addition to the
relationship between attention and other cognitive processes like working memory, vigi­
lance, and learning. This chapter focuses on auditory attention, with a particular empha­
sis on studies that have examined the neural underpinnings of sustained, selective, and
divided attention. The chapter begins with a brief discussion regarding the possible role
of attention in the formation and perception of sound objects as the underlying units of
selection. The similarities and differences in neural networks supporting various aspects
of auditory attention, including selective attention, sustained attention, and divided attention, are then discussed. The chapter concludes with a description of the neural networks
involved in the control of attention and a discussion of future directions.

Keywords: attention, perception, cognitive neuroscience, working memory

Varieties of Auditory Attention


Modern research in psychology and neuroscience has shown that attention is not a uni­
tary phenomenon and that attentional processes may vary as a function of the sensory
modality and the task at hand. In audition, one can think of attention in terms of modes or
types of processes that might be engaged during everyday listening situations, namely
sustained attention, selective attention, and divided attention. Each plays an important
role in solving complex listening situations that are often illustrated colloquially using the
cocktail party example, although in most everyday situations, all three modes of attention
can be called on, depending on the context or situation. Imagine, for example, waiting to
be seated at a restaurant. While waiting, it is sustained attention that enables us to monitor the auditory environment for a particular event (e.g., the maître d' calling out our name). During the wait, we may also start to selectively attend to an interesting conversation occurring within the general dining cacophony, thereby leading to a division of our attentional resources between the conversation and the maître d'. Common to all three aspects of attention (i.e., sustained, selective, divided) are processes that allow us to switch
from one auditory source to another as well as to switch from one attentional mode to an­
other. Auditory attention is both flexible and dynamic in that it can be driven by external
events such as loud sounds (e.g., a plate smashing on the floor) as well as by internal
goal-directed actions that enable listeners to prioritize and selectively process task-relevant sounds at a deeper level (e.g., what the maître d' is saying to the restaurant manager
about the reservation), often at the expense of other, less relevant stimuli. This brings up
another important issue in research related to auditory attention, and that is (p. 216) the
role of bottom-up, data-driven attentional capture (e.g., one’s initial response to a fire
alarm) in relation to top-down, controlled attention (e.g., the voluntary control of atten­
tion often associated with goal-directed behavior). This distinction between exogenous
and endogenous factors appears inherent to all three modes of auditory attention.

Owing to the fact that current theories of attention have been developed primarily to ac­
count for visual scene analysis, as well as the fact that early work on attention demon­
strated greater equity between vision and audition, there has been a tendency to assume
that general principles derived from research on visual attention can also be applied to
situations that involve auditory attention. Despite these links, analogies between auditory
and visual attention may be misleading. For instance, in vision, it appears that attention
can be allocated to a particular region of retinal space. However, the same does not necessarily apply to audition in the sense that, as far as is known, there is no topographic
representation of auditory space in the human brain. Although evidence suggests that we
can allocate auditory attention to various locations of retinal space, we can also attend to
sounds that are outside of our sight (e.g., Tark & Curtis, 2009), which makes auditory at­
tention particularly important for monitoring changes that occur outside our visual field.
However, it is important to point out that we do not necessarily have to actively monitor
our auditory environment in order to notice occasional or peculiar changes in sounds oc­
curring outside our visual field. Indeed, there is ample evidence from scalp recordings of
event-related potentials (ERPs) showing that infrequent changes in the ongoing auditory
scene are automatically detected and can trigger attention to them (Näätänen et al.,
1978; Picton et al., 2000; Winkler et al., 2009). These changes in the auditory environ­
ment may convey important information that could require an immediate response, such
as a car horn that is increasing in intensity. In that respect, audition might be thought of
as being at the service of vision, especially in situations that require the localization of
sound sources in the environment (Arnott & Alain, 2011; Kubovy & Van Valkenburg,
2001). Such considerations set the scene for the following discussion, in which a number
of central questions pertaining to attention and auditory cognitive neuroscience will be
tackled. What are the similarities and differences in the brain areas associated with ex­
ogenous and endogenous auditory attention? What is the neural network that enables us
to switch attention between modalities or tasks? Are there different networks for switch­
ing attention between auditory spatial locations and objects? If so, are these the same as
those used in the visual modality? What are the neural networks supporting the engage­
ment and disengagement of auditory attention?

This chapter focuses on the psychological and neural mechanisms supporting auditory at­
tention. We review studies on auditory attention, with a particular emphasis on human
studies, although we also consider animal research when relevant. Our review is by no
means exhaustive but rather aims to provide a theoretical framework upon which future
research can be generated. We begin with a brief discussion regarding the limits of attention and explore the possible mechanisms involved in the formation of auditory objects and the role of attention in sound object formation. We then discuss the similarities and differences in neural networks supporting various aspects of auditory attention, including
selective attention, sustained attention, and divided attention. We conclude by describing
the neural networks involved in the control of attention and discuss future directions.

What Are We Attending To?


Although the auditory environment usually comprises a myriad of sounds from various
physical sources, only a subset of those sounds “enter” our awareness. Continuing with
our earlier analogy, for instance, we may start to listen to a nearby conversation while the maître d' speaks with other patrons. The sounds from both conversations may overlap in
time, yet we can follow and switch from one conversation to the other with seeming ef­
fortlessness: each conversation appears as a separate auditory stream, with our attention
alternating between them at will. During the past 30 years, researchers have identified
numerous cues that help listeners sort the incoming acoustic data into distinct sound
sources, hereafter referred to as auditory objects or streams. For instance, sounds with
common spatial locations, onsets, and spectral profiles usually originate from the same
physical source and therefore are usually assigned to the same perceptual object (Alain &
Bernstein, 2008; Bregman, 1990; Carlyon, 2004). The sounds that surround us often
change in a predictive manner such that a certain sound may lead us to expect the next
one. While listening to the maître d', we anticipate the upcoming sounds that indicate that
our table is ready. Knowledge and experience with the auditory environment are particu­
larly helpful in solving complex listening situations in which sounds (p. 217) from one
source may partially overlap and mask those from another source. In everyday listening
situations, our auditory scene changes constantly, and observers must be able to keep
track of multiple sound objects. Once an auditory scene has been parsed into its component objects, selectively attending to one stream (e.g., shouts from the kitchen) while ignoring all the other talkers (e.g., the maître d') and background noise becomes crucial for effective communication, especially in acoustically adverse or “cocktail party” environments (Cherry, 1953; Moray & O’Brien, 1967).

Some visual models have characterized attention in spatial or frequency terms, likening
attention to a “spotlight” or “filter” that moves around, applying processing resources to
whatever falls within a selected spatial region (e.g., Brefczynski & DeYoe, 1999; LaBerge,
1983; McMains & Somers, 2004). Other models discuss resource allocation on the basis
of perceptual objects, in which attending to a particular object enhances processing of all
features of that object (e.g., Chen & Cave, 2008). Recent models of auditory attention
have been consistent with the latter conception of attention in that the underlying units of
Page 3 of 35
Varieties of Auditory Attention

selection are discrete objects or streams, and that attending to one component of an audi­
tory object facilitates the processing of other properties of the same object (Alain &
Arnott, 2000; Shinn-Cunningham, 2008).

In the visual domain, the object-based account of attention was proposed to explain why
observers are better at processing visual features that belong to the same visual object
than when those visual features are distributed between different objects (Duncan, 1984;
Egly et al., 1994). For instance, Duncan (1984) showed that performance declined when
individuals were required to make a single judgment about each of two visually overlap­
ping objects (e.g., the size of one object and the texture of the other object), compared
with when those two judgments had to be made about a single object. The robustness of
the findings was demonstrated not only in behavioral experiments (Baylis & Driver, 1993)
but also in neuroimaging studies employing functional magnetic resonance imaging (fM­
RI; e.g., O’Craven et al., 1999; Yantis & Serences, 2003) and ERPs (e.g., Valdes-Sosa et
al., 1998). Such findings indicated that attention is preferentially allocated to a visual ob­
ject such that the processing of all features belonging to an attended object is facilitated.
Alain and Arnott (2000) proposed an analogous object-based account for audition in
which listeners’ attention is allocated to auditory objects derived from the ongoing audi­
tory scene according to perceptual grouping principles (Bregman, 1990). More recently,
Shinn-Cunningham (2008) has drawn a parallel between object-based auditory and visual
attention, proposing that perceptual objects form the basic units of auditory attention.

Central to the object-based account of auditory attention is the notion that several per­
ceptual objects may be simultaneously accessible for selection and that interactions be­
tween object formation and selective attention determine how competing sources inter­
fere with perception (Alain & Arnott, 2000; Shinn-Cunningham, 2008). Although there is
some behavioral evidence supporting the notion that objects can serve as an organiza­
tional principle in auditory memory (Dyson & Ishfaq, 2008; Hecht et al., 2008), further
work is required to understand how sound objects are represented in memory when these
objects occur simultaneously. Moreover, the object-based account of auditory attention
posits that perceptual objects form the basic unit for selection rather than individual
acoustic features of the stimuli such as pitch and location (Mondor & Terrio, 1998). How­
ever, in many studies, it is often difficult to distinguish feature- and object-based attention
effects because sound objects are usually segregated from other concurrent sources us­
ing simple acoustic features such as pitch and location (e.g., the voice and location of the
maître d in the restaurant). Therefore, how do we know that attention is allocated to a
sound object rather than its defining acoustic features? One way to distinguish between
feature- and object-based attentional accounts is to pit the grouping of stimuli into per­
ceptual objects against the physical proximity of features (e.g., Alain et al., 1993; Alain &
Woods, 1993, 1994; Arnott & Alain, 2002a; Bregman & Rudnicky, 1975; Driver & Baylis,
1989).


Figure 11.1 A, Schemata of the stimuli presented in three different conditions. In the evenly spaced (ES) condition, tones were equally spaced along the frequency axis (tone 1 = 1,048, tone 2 = 1,482, and tone 3 = 2,096 Hz). In the clustering easy (CE) condition, the two task-irrelevant tones (i.e., tone 2 = 1,482 and tone 3 = 1,570 Hz) were clustered based on frequency to promote the segregation of the task-relevant frequency (i.e., tone 1). In the clustering hard (CH) condition, the task-relevant frequency was clustered with the middle distracters (tone 1 = 1,400 Hz). Arrows indicate the frequency to be attended in each condition. Targets (defined as longer or louder than other stimuli) are shown by asterisks. B, Response time to infrequent duration targets embedded in the attended stream. C, Group mean difference wave between event-related brain potentials elicited by the same stimuli when they were attended and unattended in all three conditions from the midline frontal scalp region. The negativity is plotted upward.

Adapted from Alain & Woods, 1994.

Figure 11.1A shows an example in the auditory modality where the physical similarity be­
tween three different streams of sounds was manipulated to promote perceptual group­
ing while at the same time decreasing the physical distance between task-relevant and
task-irrelevant stimuli. In that study, Alain and Woods (1993) presented participants with
rapid transient pure tones that varied along three different frequencies in random order.
In the example shown in Figure 11.1A, participants were asked to focus their attention on
the lowest pitch sound in order to detect slightly longer (Experiment 1) or louder (Experi­
ment 2) target stimuli while ignoring the other sounds. In one condition, the tones com­
posing the sequence were evenly spaced along the frequency domain, whereas in another
condition, the two task-irrelevant frequencies were (p. 218) grouped together by making
the extreme sound (lowest or highest, depending on the condition) more similar to the
middle pitch tone. Performance improved when the two task-irrelevant sounds were
grouped together even though this meant having more distractors closer in pitch to the
task-relevant stimuli (see Figure 11.1B). Figure 11.1C shows the effect of perceptual grouping on selective attention in the ERPs, isolated in the difference wave between the ERPs elicited by the same sounds when they were task relevant and
when they were task irrelevant. Examining the neural activity that occurred in the brain
during these types of tasks suggested that perceptual grouping enhances selective atten­
tion effects in auditory cortices, primarily along the Sylvian fissure (Alain et al., 1993;
Alain & Woods, 1994; Arnott & Alain, 2002a). Taken together, these studies show that
perceptual grouping can override physical similarity effects during selective listening,
and suggest that sound objects form the basic unit for attentional selection.
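The attention effect described here is isolated by simple subtraction: the ERP averaged over trials on which a stimulus was task irrelevant is subtracted, time point by time point, from the ERP averaged over trials on which the physically identical stimulus was task relevant. A minimal sketch of that arithmetic, with illustrative toy numbers rather than data from any of the studies cited:

```python
import numpy as np

def attention_difference_wave(attended_trials, unattended_trials):
    """Attended-minus-unattended ERP difference wave.

    Each input is a (n_trials, n_timepoints) array of single-trial
    epochs (e.g., in microvolts) for physically identical stimuli.
    """
    erp_attended = attended_trials.mean(axis=0)    # average across trials
    erp_unattended = unattended_trials.mean(axis=0)
    return erp_attended - erp_unattended           # negativity = attention effect

# Toy example: 2 trials x 4 time points per condition.
att = np.array([[0.0, -2.0, -4.0, -1.0],
                [0.0, -4.0, -6.0, -1.0]])
unatt = np.array([[0.0, -1.0, -2.0, -1.0],
                  [0.0, -1.0, -2.0, -1.0]])
print(attention_difference_wave(att, unatt))  # difference wave: [0, -2, -3, 0]
```

In a real analysis the epochs would of course be baseline-corrected averages over many trials and electrodes; the point is only that the "selective attention effect" plotted in such figures is this per-timepoint subtraction.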

Figure 11.2 A, Schematic representation of harmonic complex tones (each horizontal line represents a pure tone) used in studies that have examined the role of attention on concurrent sound segregation. Participants were presented with a harmonic complex that had all tonal elements in tune (fusion) or included a mistuned harmonic. In the active listening task, participants indicated whether they heard one sound or two sounds (i.e., a buzz plus another sound with a pure tone quality). In the passive listening condition, participants watched a muted movie of their choice with subtitles. B, Auditory event-related potentials (ERPs) to complex harmonic tones were measured over the right frontal-central scalp region (FC2). The difference wave reveals the object-related negativity (ORN), an ERP component that indexes the perception of concurrent sound objects. Note the similarity in ORN amplitude during active listening (participants indicated whether they heard one sound or two concurrent sound objects) and passive listening (participants watched a muted subtitled movie of their choice). During active listening, the ORN is followed by a positive wave (P400) thought to be related to the perceptual decision.

Adapted with permission from Alain, Arnott, & Picton, 2001.


One important assumption of the object-based account of auditory attention is that sound
objects are created and segregated independently of attention and that selection for fur­
ther processing takes place after this initial partition of the auditory scene into its con­
stituent objects. Although there is evidence to suggest that sound segregation may occur
independently of listeners’ attention, there are also some findings that suggest otherwise
(Shamma et al., 2011). The role of attention on perceptual organization has been investi­
gated for sounds that occur concurrently as well as for sounds that are sequentially
grouped into distinct perceptual streams. In the case of concurrent sound segregation (Figure 11.2), the proposal that such (p. 219) segregation is not under volitional control has been supported in ERP studies using passive listening (Alain et al., 2001a,
2002; Dyson & Alain, 2004) as well as active listening paradigms that varied auditory
(Alain & Izenberg, 2003) or visual attentional demands (Dyson et al., 2005). For sequen­
tial sound segregation, the results are more equivocal.

Figure 11.3 A, Schematic representation of the ABA paradigm often used to investigate auditory stream segregation. The loss of rhythmic information is indicative of stream segregation. B, Likelihood of reporting hearing two concurrent streams of sounds as a function of the frequency separation between tone A and tone B. C, Scalp-recorded auditory event-related potentials using sequences similar to those shown in A. Note the similarities in auditory event-related potentials (ERPs) recorded during the attend and ignore conditions. The effects of frequency separation on ERPs recorded during active and passive listening were not statistically different, suggesting that encoding of ΔF (difference in frequency), which determines streaming, is little affected by attention.

Adapted with permission from Snyder, Alain, & Picton, 2006.

In most studies, the effect of attention on sequential sound organization has been investi­
gated by mixing two sets of sound sequences that differ in terms of some acoustic feature
(e.g., in the frequency range of two sets of interleaved pure tones). In a typical frequency
paradigm, sounds are presented in patterns of “ABA—ABA—”, in which “A” and “B” are
tones of different frequencies and “—” is a silent interval (Figure 11.3A). The greater the
stimulation rate and the feature separation, the more likely and rapidly listeners are to
report hearing two separate streams of sounds (i.e., one of A’s and another of B’s), with
this type of perceptual organization or stream segregation taking several seconds to build
up. Using similar sequences as those shown in Figure 11.3A, Carlyon and colleagues
found that the buildup of stream segregation was affected by auditory (Carlyon et al.,
2001) and visual (Cusack et al., 2004) attention, and they proposed that attention may be
needed (p. 220) for stream segregation to occur. Also consistent with this hypothesis are
findings from neuropsychological studies in which patients with unilateral neglect follow­
ing a brain lesion show impaired buildup in streaming relative to age-matched controls
when stimuli are presented to the neglected side (Carlyon et al., 2001). In addition to the
pro-attention studies reviewed above, there is also evidence to suggest that attention may
not be required for sequential perceptual organization to occur. For instance, patients
with unilateral neglect who are unaware of sounds presented to their neglected side ex­
perience the “scale illusion” (Deutsch, 1975), which can occur only if the sounds from the
left and right ears are grouped together (Deouell et al., 2008). Such findings are difficult
to reconcile with a model invoking a required role of attention in stream segregation and
suggest that some organization must be taking place outside the focus of attention, as
others have suggested previously (e.g., Macken et al., 2003; McAdams & Bertoncini, 1997;
Snyder et al., 2006). This apparent discrepancy could be reconciled by assuming that se­
quential stream segregation relies on multiple levels of representations (Snyder et al.,
2009), some of which may be more sensitive to volitional control (Gutschalk et al., 2005;
Snyder et al., 2006).
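Sequences of this kind are easy to generate: A and B are pure tones separated by a chosen ΔF, and each triplet ends with a silent slot. The sketch below uses illustrative parameter values (500-Hz base frequency, 100-ms tones, 16-kHz sample rate) that are not taken from any particular study:

```python
import numpy as np

def pure_tone(freq_hz, dur_s, sr=16000):
    """Sine tone of the given frequency and duration."""
    t = np.arange(int(sr * dur_s)) / sr
    return np.sin(2 * np.pi * freq_hz * t)

def aba_sequence(a_hz=500.0, delta_semitones=6, dur_s=0.1, n_triplets=10, sr=16000):
    """Concatenate "ABA—" triplets: tone A, tone B, tone A, silent gap.

    The larger delta_semitones (the A-B frequency separation) and the
    faster the presentation rate, the more readily listeners report
    hearing two segregated streams.
    """
    b_hz = a_hz * 2 ** (delta_semitones / 12)   # ΔF expressed in semitones
    silence = np.zeros(int(sr * dur_s))         # the "—" slot
    triplet = np.concatenate([pure_tone(a_hz, dur_s, sr),
                              pure_tone(b_hz, dur_s, sr),
                              pure_tone(a_hz, dur_s, sr),
                              silence])
    return np.tile(triplet, n_triplets)

seq = aba_sequence()
print(seq.shape)  # → (64000,): 10 triplets x 4 slots x 0.1 s x 16 kHz
```

With the silent slot the same length as a tone, the A tones form an isochronous rhythm only when the sequence is heard as a single stream, which is why the loss of the galloping rhythm is taken as a marker of segregation.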

In summary, evidence from ERP studies suggests that perceptual organization of acoustic
features into sound objects can occur independently of attention (e.g., Alain & Izenberg,
2003; Alain & Woods, 1994; Snyder et al., 2006; Sussman et al., 2007). However, it is very
likely that attention facilitates perceptual organization and that selective attention may
determine which stream of sounds is in the foreground and which is in the background
(i.e., figure–ground segmentation) (Sussman et al., 2005). These views are compatible
with the object-based account of auditory attention in which primitive perceptual process­
es sort the incoming acoustic data into its constituent sources, allowing selective process­
es to work on the basis of meaningful objects (Alain & Arnott, 2000).

Mechanisms of Auditory Attention


As mentioned earlier, selective attention enables us to prioritize information processing
such that only a subset of the vast sensory world (and internal (p. 221) thought) receives
more in-depth analysis. In the last decade, there has been a growing interest in three im­
portant mechanisms that could serve to optimize the contrast between sounds of interest
and those that are “task irrelevant.” These are enhancement (also referred to as gain),
the sharpening of receptive fields for task-relevant information, and the possible suppres­
sion of task-irrelevant information (for a discussion of these ideas related to stimulus rep­
etition, see Grill-Spector et al., 2006). The enhancement and suppression mechanisms
were originally proposed to account for visual attention (e.g., Hillyard et al., 1998), and
such models posit feature-specific enhancements in regions that are sensitive to the attended features as well as suppression in regions that are sensitive to the unattended
(task-irrelevant) features.

Attention as an Enhancement and Suppression Mechanism to Enhance Figure–Ground Segregation

In the auditory attention literature, the notion of a gain mechanism was introduced early
on (e.g., Hillyard et al., 1973), although it was not originally described as a feature-specific
process. For instance, early studies in nonhuman primates showed that the firing rate of
auditory neurons increased when sounds occurred at the attended location (Benson &
Hienz, 1978) or when attention was directed toward auditory rather than visual stimuli
(Hocherman et al., 1976), consistent with a mechanism that “amplifies” or enhances the
representation of task-relevant stimuli. Electroencephalography (EEG; Hillyard et al.,
1973; Woods et al., 1994) and magnetoencephalography (MEG; Ross et al., 2010; Woldorff
et al., 1993) studies provide further evidence for neural enhancement during auditory se­
lective attention. For instance, the amplitude of the N1 wave (negative wave at ∼100 ms
after sound onset) from scalp-recorded auditory evoked potentials is larger when sounds
are task-relevant and fall within an “attentional spotlight” compared with when the same
sounds are task-irrelevant (Alho et al., 1987; Giard et al., 2000; Hansen & Hillyard, 1980;
Hillyard et al., 1973; Woldorff, 1995; Woods et al., 1994; Woods & Alain, 2001).

Figure 11.4 Schematic representation of an experiment in which participants were presented with streams of sounds defined by the conjunction of pitch and location. In a typical experiment, participants were asked to listen to low pitch sounds in the left ear in order to detect occasional stimuli (i.e., targets) that slightly differed from the standard stimuli along a third dimension (e.g., duration or intensity).

The notion of feature-specific enhancement and suppression as the principal mechanisms to enhance figure–ground segregation implies that selective attention would “amplify”
neural activity in specific cortical areas that respond preferentially to particular stimulus at­
tributes. Because the task-relevant stream of sounds in most neuroimaging studies of au­
ditory selective attention is usually defined by the pitch or the location of the stimuli, the
feature-specific enhancement and suppression hypothesis can be evaluated by comparing
neural activity for identical stimuli when they are task relevant and task irrelevant. Such
comparisons have revealed different ERP-selective attention effects for pitch and location
attributes (Degerman et al., 2008; Woods & Alain, 2001). Figure 11.4 shows a schematic
diagram of a typical experiment in which participants are presented with multidimensional stimuli. Stimuli constructed by the orthogonal combination of two frequencies (i.e., 250 and 4000 Hz) and two locations (left and right ear) have proved
helpful for investigating feature-specific attention effects (e.g., attend to high tones) as
well as object-based attention effects that rely on the conjunction of sound features (e.g.,
attend to high tones in the left ear). Using such paradigms under feature-specific condi­
tions, Woods and colleagues have shown that auditory selective attention modulates activ­
ity in frequency-specific regions of auditory cortex (Woods et al., 1994; Woods & Alain,
2001). The fact that attention enhances the amplitude of the auditory evoked response
from a tonotopically organized generator provides strong support for models that posit at­
tentional gain of sensory processing (Hillyard et al., 1987; Woldorff et al., 1993). In addi­
tion to enhanced responses to features deemed task relevant by virtue of the task instruc­
tions, object-based (or conjunctive) attentional effects have also been observed, including
neural facilitation for objects expressed both during and after featural processing (e.g.,
Woods et al., 1994; Woods & Alain, 1993, 2001). For example, in complex listening situa­
tions in which target sounds are defined by a combination of features, (p. 222) nontarget
sounds that share either frequency or location features with the targets have also shown
attention-related effects that differ in amplitude distribution (Woods et al., 1994; Woods
and Alain, 1993, 2001). Differences in amplitude distribution are indicative that attention
modulates neural activity arising from different cortical fields related to the processing of
different sound features. Such findings are also consistent with the dual-pathway model
of auditory scene analysis (Rauschecker & Scott, 2009; Rauschecker & Tian, 2000) in
which sound identity and sound location are preferentially processed along ventral (what) and dorsal (where) pathways. There is also evidence that such gain mechanisms
play an important role during sustained selective attention to a single speech stream em­
bedded in a multiple-talker environment (Kerlin et al., 2010).

Although the evidence for feature-based and object-based enhancement is compelling, the evidence for suppression of auditory stimuli occurring outside the attentional spotlight is more equivocal. There are some findings consistent with an active suppression mechanism. For example, the amplitude of the P2 wave (positive deflection at ∼180 ms after
sound onset) from the auditory ERPs elicited by task-irrelevant sounds is larger during in­
tramodal attention than during intermodal (auditory-visual) attention (Degerman et al.,
2008; Michie et al., 1993). Although the enhanced P2 amplitude during intramodal atten­
tion may reflect an active suppression mechanism, there is no evidence that the suppres­
sion is feature specific. Moreover, one cannot rule out the possibility that during inter­
modal attention, participants’ attention may have wandered to the auditory stimuli, there­
by modulating the amplitude of ERPs (i.e., increased negativity) to the so-called ‘unat­
tended’ stimuli. Therefore, the higher P2 amplitude observed during selective attention
tasks may not reflect suppression, but instead may simply indicate attention effects dur­
ing the control baseline condition.

In a more recent study, Munte et al. (2010) measured auditory evoked responses from two
locations, each containing a spoken story and bandpass-filtered noise. Participants were
told to focus their attention on a designated story/location. Consistent with prior re­
search, the N1 elicited by the speech probe was found to be larger at the attended than
at the unattended location. More importantly, the noise probes from the task-relevant
story’s location showed a more positive frontal ERP response at about 300 ms than did
the probes at the task-irrelevant location. Although the enhanced positivity may be indica­
tive of a suppression mechanism, there is a possibility that it reflects novelty or target-re­
lated activity, which may comprise a positive wave that peaks about the same time. More­
over, using a similar design, but measuring the effect of sustained selective attention in
the EEG power spectrum, Kerlin et al. (2010) found no evidence of suppression for the
task-irrelevant stream of speech.

If suppression of task-irrelevant sounds does occur, what underlying mechanisms support it? The auditory system includes descending efferent pathways that
are particularly important for modulating neural activity at the peripheral level. Never­
theless, it remains unclear whether such suppression would be an appropriate strategy in
everyday listening situations. Although one could argue that a feature-specific suppres­
sion mechanism would help to prevent information overload, such a suppression mecha­
nism could also have a negative impact in the sense that important information could be
missed or undetected. Perhaps a more parsimonious account is a facilitatory process that enhances the representation of task-relevant information in sensory and short-term memory while representations of task-irrelevant information simply decay progressively, without the need to postulate active suppression. Indeed, increasing at­
tentional demands reduces the amplitude of the mismatch negativity, an ERP component
that reflects sensory memory, but has little impact on the amplitude of the N1 and P2
waves, which reflect sensory registration (Alain & Izenberg, 2003). Thus, attention ap­
pears to play an important role in maintaining and enhancing sensory representations
such that conditions that prevent listeners from attending to task-irrelevant sounds cause
a processing deficit in detecting changes in the unattended stream of sounds (depending
both on sensory memory and a comparison process between recent and incoming stimuli)
(e.g., Alain & Woods, 1997; Arnott & Alain, 2002b; Trejo et al., 1995; Woldorff et al.,
1991). This is akin to the model proposed by Cowan (1988, 1993), in which attention plays
an important role in keeping “alive” sensory representations for further and more in-
depth processing.


Attention Enhances Perception by Sharpening of Tuning Curves

Figure 11.5 A, Schematic representation of the stimuli embedded in notched noise around 1,000 Hz. B, Global field power showing the strength of the N100 auditory evoked response elicited by stimuli without masking noise (no mask) or as a function of the width of the notched noise. Note that the greater the width of the notched noise, the larger the N100. C, As selective attention is made increasingly possible by the increase in noise width, the N100 peak arrives earlier and is more sharply tuned.

Adapted with permission from Kauramaki et al., 2007.

In addition to enhancement and suppression mechanisms, the effects of attention may be (p. 223) mediated by a mechanism that selectively sharpens the receptive fields of neurons representing task-relevant sounds, a mechanism that may also enhance figure–
ground separation. Kauramaki et al. (2007) used the notched-noise technique, in which a pure tone is embedded within noise containing a spectral notch whose width around the pure tone is parametrically varied to ease target detection (Figure 11.5). Kauramaki et al. mea­
sured the N1 wave elicited by the pure tone during selective attention and found a de­
crease in attention effects when the width of the notched noise was decreased. However,
the shape of the function was significantly different from a multiplicative one expected on
the basis of a simple gain model of selective attention (see also Okamoto et al., 2007). Ac­
cording to Kauramaki et al. (2007), auditory selective attention in humans cannot be ex­
plained by a gain model, whereby only the neural activity level is increased. Rather, selec­
tive attention additionally enhances auditory cortex frequency selectivity. This effect of
selective attention on frequency tuning evolves rapidly, within a few seconds after atten­
tion switching, and appears to occur for neurons in nonprimary auditory cortices (Ahveni­
nen et al., 2011).
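The notched-noise stimulus itself can be sketched as a pure tone summed with noise whose Fourier components around the tone frequency have been zeroed; the notch width is the parameter varied across conditions. All parameter values below are illustrative, not those used by Kauramaki et al. (2007):

```python
import numpy as np

def notched_noise_stimulus(notch_hz=1000.0, notch_width_hz=400.0,
                           dur_s=0.5, sr=16000, rng=None):
    """Pure tone at notch_hz embedded in spectrally notched noise.

    White noise is generated, its Fourier components within
    notch_hz +/- notch_width_hz/2 are zeroed (the "notch"), and a
    pure tone at the notch center is added. Widening the notch
    leaves more silent spectrum around the target tone, easing
    selective listening.
    """
    rng = np.random.default_rng(rng)
    n = int(sr * dur_s)
    noise = rng.standard_normal(n)
    spectrum = np.fft.rfft(noise)
    freqs = np.fft.rfftfreq(n, d=1.0 / sr)
    spectrum[np.abs(freqs - notch_hz) < notch_width_hz / 2] = 0.0  # carve notch
    notched = np.fft.irfft(spectrum, n)
    t = np.arange(n) / sr
    tone = np.sin(2 * np.pi * notch_hz * t)
    return tone + notched
```

Sweeping `notch_width_hz` from 0 upward reproduces the logic of the paradigm: at zero width the tone is fully masked, and the wider the notch, the more cleanly frequency-selective attention can operate on the target.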

In summary, attentional effects on sensory response functions include an increase in gain and a sharpening of tuning curves that appear to be specific to the task-relevant feature.
This feature-specific attention effect is also accompanied by a more general suppression
response, although the evidence for such suppression in the auditory domain remains equivocal. Both gain and suppression mechanisms, as well as sharper receptive field tun­
ing, may enhance figure–ground segregation, thereby easing the monitoring and selec­
tion of task-relevant information.

Neural Network of Auditory Attention


The development of positron emission tomography (PET) and fMRI has allowed re­
searchers to make major strides in identifying the brain areas that play an important role
in auditory attention. In the next sections, we briefly review the brain areas involved in
sustained, selective (intramodal and intermodal), and divided attention in an effort to
draw commonalities as well as important differences (p. 224) in the neural networks sup­
porting the various types of auditory attention.

Sustained Attention

The process of monitoring the auditory environment for a particular event (e.g., perhaps
you are still waiting for the maître d to call out your name) has been studied in a number
of ways in order to reveal the neural architecture supporting sustained attention. One ex­
ample is the oddball paradigm in which participants are asked to respond to infrequent
target sounds in a train of nontarget stimuli. In oddball tasks, the target stimuli differ
from the standard sounds along a particular dimension (e.g., pitch, duration, intensity, lo­
cation). The detection of the task-relevant stimuli is typically accompanied by a parietal-
central positive wave of the ERP, the P300 or P3b. Moreover, fMRI studies have identified
several brain areas that show increased hemodynamic activity during the detection of
these oddball targets relative to the nontarget stimuli, including the auditory cortex bilat­
erally, parietal cortex and prefrontal cortex, supramarginal gyrus, frontal operculum, and
insular cortex bilaterally (Linden et al., 1999; Yoshiura et al., 1999). The increased fMRI
signals for target versus nontarget conditions are consistent over various stimuli (i.e., au­
ditory versus visual stimuli) and response modalities (i.e., button pressing for targets ver­
sus silently counting the targets) and can be regarded as specific for target detection in
both the auditory modality and the visual modality (Linden et al., 1999). Interestingly, the
amount of activity in the anterior cingulate and bilateral lateral prefrontal cortex, tempo­
ral-parietal junction, postcentral gyri, thalamus, and cerebellum was positively correlated
with an increase in time between targets (Stevens et al., 2005). The fact that these effects
were only observed for target and not novel stimuli suggests that the activity in these ar­
eas indexes the updating of a working memory template for the target stimuli or strategic
resource allocation processes (Stevens et al., 2005).
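A sketch of how such an oddball trial sequence might be generated (the 10 percent target rate and the no-consecutive-targets constraint are illustrative choices, not the parameters of any cited study):

```python
import numpy as np

def oddball_sequence(n_trials=400, p_target=0.1, rng=None):
    """Trial list for an auditory oddball task.

    Returns 'standard'/'target' labels in which targets occur
    infrequently, never on the first trial, and never back to
    back -- common constraints in oddball designs.
    """
    rng = np.random.default_rng(rng)
    seq = ["standard"] * n_trials
    i = 1                        # first trial is always a standard
    while i < n_trials:
        if rng.random() < p_target:
            seq[i] = "target"
            i += 2               # force >= 1 standard between targets
        else:
            i += 1
    return seq
```

The target/standard labels would then be mapped to stimuli differing on the chosen dimension (pitch, duration, intensity, or location), with the P3b measured to the infrequent targets.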

Another task that requires sustained attention is the n-back working memory task in
which participants indicate whether the incoming stimulus matches the one occurring
one, two, or three positions earlier. Alain and colleagues (2008) used a variant of this par­
adigm and asked their participants to press a button only when the incoming stimulus
sound matched the previous one (1-back) in terms of identity (i.e., same sound category
such as animal sounds, human sounds, or musical instrument sounds) or location. Distinct patterns of neural activity were observed for sustained attention and transient target-re­
lated activity. Figure 11.6 shows sustained task-related activity during sound identity and
sound location processing after taking into account transient target-related activity. The
monitoring of sound attributes recruited many of the areas previously mentioned for the
oddball task, including auditory, parietal, and prefrontal cortices (see also, Martinkauppi
et al., 2000; Ortuno et al., 2002). Interestingly, changes in task instruction modulated ac­
tivity in this attentional network, with greater activity in ventral areas, including the ante­
rior temporal lobe and inferior frontal gyrus, when participants’ attention was directed
toward sound identity and greater activity in dorsal areas, including the inferior parietal
lobule, superior parietal cortex, and superior frontal gyrus, when participants’ attention
was directed toward sound location.

Figure 11.6 A, Schematic of the n-back task used to investigate sustained working memory for sound identity and sound location. B, Task differences in sustained activity during a working memory task. Warm colors indicate greater sustained activity during working memory for sound identity, and cool colors indicate greater sustained activity during working memory for sound location. The activation maps are displayed on the cortical surface using surface mapping (SUMA). IFG, inferior frontal gyrus; IPL, inferior parietal lobule; STG, superior temporal gyrus.

Adapted with permission from Alain et al., 2008.

Although such data again suggest a relationship between specific regions or pathways within the brain and specific perceptual and cognitive functions, it is important to consider
the extent to which clear delineations can be made. For example, although the inferior
parietal lobule (IPL) is clearly involved in spatial analysis and may play an important role
in monitoring or updating sound source location in working memory, there is also plenty
of evidence demonstrating its involvement in nonspatial processing (see Arnott et al.,
In fact, some of this activity may be accounted for in terms of an action–perception dissociation (Goodale & Milner, 1992) in which the dorsal auditory pathway brain regions
are important for acting on objects and sounds in the environment (Arnott & Alain, 2011).
For example, in a task requiring listeners to categorize various sounds as being
“material” (i.e., malleable sheets of paper, Styrofoam, aluminium foil, or plastic being
crumpled in a person’s hands), “human” (i.e., nonverbal vocalizations including coughing,
yawning, snoring, and throat clearing) or “noise” (i.e., randomized material sounds),
Arnott et al. (2008) found an increased blood-oxygen-level-dependent (BOLD) response along a
dorsal region, the left intraparietal sulcus (IPS), in response to the material sounds. A
very similar type of activation was also reported by Lewis and colleagues when listeners
attended to hand-related (i.e., tool) sounds compared with animal vocalizations (Lewis,
2006; Lewis et al., 2005). Both groups proposed that such sounds triggered a “mental
mimicry” of the motor production sequences that most likely would have produced the
sounds, with the left hemispheric activation reflecting the right-handed dominance of the
participants. This explanation finds support in the fact that area (p. 225) hAIP, a region
shown to be active not only during real hand-grasping movements but also during imag­
ined movements, as well as passive observation of people grasping three-dimensional ob­
jects, is located proximally at the junction of the anterior IPS and inferior postcentral sul­
cus (Culham et al., 2006).

Additionally, it is noteworthy that the IPS is an area of the IPL known to make use of visu­
al input, and that it is particularly important for integrating auditory and visual informa­
tion (Calvert, 2001; Macaluso et al., 2004; Meienbrock et al., 2007), especially with re­
spect to guiding and controlling action in space (e.g., Andersen et al., 1997; Sestieri et al.,
2006). As noted earlier, knowledge about auditory material properties is largely depen­
dent on prior visual experiences, at least for normally sighted individuals. Thus, it is plau­
sible that the IPS auditory material–property activity reflects the integration of auditory
input with its associated visual knowledge.

The above data remind us that successful interaction with the complex and dynamic
acoustic environment that we ultimately face involves the coordination and integration of
attentional demands from other modalities such as vision, the timely initiation of task-ap­
propriate action, and the maintenance of attentional processes over long periods of time.
Nevertheless, there are cases in which less important signals must be sacrificed for more
important signals, and the brain regions associated with selective attention are those to
which we now turn.

Selective Attention

Auditory selective attention was originally studied using dichotic listening situations in
which two (p. 226) different streams of speech sounds were presented simultaneously,
one to each ear (Broadbent, 1962; Cherry, 1953; for a review, see Driver, 2001). In such situa­
tions, participants were asked to shadow (repeat) the message presented in one ear while
ignoring the speech sounds presented at the irrelevant location (i.e., the other ear). This
intramodal attention (i.e., between streams of sounds) involves sustained attention to a
particular stream of sounds in the midst of others, usually defined by its most salient fea­
tures, such as pitch and location (e.g., Hansen & Hillyard, 1983). This form of selective at­
tention differs from intermodal attention, whereby participants are presented with
streams of auditory and visual stimuli and alternately focus on either auditory or visual
stimuli in order to detect infrequent target stimuli. The neural networks supporting in­
tramodal and intermodal selective attention are examined next in turn.

Intramodal Selective Attention


Intramodal auditory selective attention tasks engage a network of frontal, temporal, and
parietal regions (Hill & Miller, 2009; Lipschutz et al., 2002), and activity in these areas
appears to be sensitive to task demands. For instance, using identical stimuli, Hill and
Miller (2009) found greater activity in dorsal brain regions when listeners were told to at­
tend to the location of a particular talker in a multiple-talker situation, whereas more ven­
tral activity was observed when participants attended to the pitch (voice) of the talker.
Once again, this dissociation between spatial and nonspatial auditory attention is consis­
tent with the general trend of showing greater activity in ventral (what) and dorsal
(where) brain regions in auditory tasks that require sound identification or sound localiza­
tion, respectively (Alain et al., 2001b, 2008; Arnott et al., 2004, 2005; Degerman et al.,
2006; Leung & Alain, 2011; Maeder et al., 2001; Rama et al., 2004).

Recently, there has been increased interest in examining the effects of attention on audi­
tory cortical activity. This interest is motivated in part by the notion that the auditory cor­
tex is not a single entity but rather comprises many cortical fields that appear to be dif­
ferentially sensitive to sound frequency and sound location, and by the analogous discov­
ery of feature-specific enhancement and suppression of visual neurons during visual se­
lective attention. Although the effects of attention on neural activity in auditory cortical
areas are not disputed, the effects of attention on the primary auditory cortex remain
equivocal. For instance, some fMRI studies do not find evidence for attention effects on
primary auditory cortex in Heschl’s gyrus (Hill & Miller, 2009; Petkov et al., 2004),
whereas others report enhancements in frequency-sensitive regions, although the atten­
tion effects are not necessarily restricted to them (Paltoglou et al., 2009). The lack of at­
tention effects on the BOLD signal from the primary cortex does not mean that neural ac­
tivity in primary auditory cortex is not modulated by attention; it may be too weakly or
differentially modulated such that the BOLD effect cannot capture it. As we have already
seen, studies that have used other techniques, such as EEG or MEG, provide evi­
dence suggesting that selective attention amplifies neural activity in frequency-sensitive
regions (Woods et al., 1994; Woods & Alain, 2001) as well as in or near primary areas
(Woldorff & Hillyard, 1991; Woldorff et al., 1993). In addition, single-unit research in
mammals has shown that attention can modulate the firing rate of neurons in primary
auditory cortex (Benson & Hienz, 1978; Hocherman et al., 1976).

Moreover, intramodal selective attention to location (i.e., left or right ear) has been
shown to increase BOLD activity in the right middle frontal gyrus regardless of the loca­
tion of attentional focus (Lipschutz et al., 2002). In contrast, brain regions including the
middle and inferior frontal cortex, frontal eye fields (FEFs), and the superior temporal
cortex in the contralateral hemisphere did show attention-related activity according to
which ear was being attended to (Lipschutz et al., 2002). Activation in the superior tem­
poral cortex extended through the temporal-parietal junction to the inferior parietal cor­
tex, including the IPS (Lipschutz et al., 2002). The activation in the human homologue of
FEFs during auditory spatial attention has been reported in several studies (e.g., Lip­
schutz et al., 2002; Tzourio et al., 1997; Zatorre et al., 1999), but the results should be in­
terpreted with caution because these studies did not control for eye movements, which
could partly account for the activation in the FEFs. In a more recent study, Tark and
Curtis (2009) found FEF activity during an audiospatial working memory task, even for
sounds presented behind the head, to which it was impossible to make saccades. Their
findings are consistent with the proposal that FEF plays an important role in processing
and maintaining sound location (Arnott & Alain, 2011).

In addition to enhanced activity in cortical areas during intramodal auditory selective at­
tention, there is also evidence from fMRI that selective attention (p. 227) modulates activi­
ty in the human inferior colliculus (Rinne et al., 2008). The inferior colliculus is a mid­
brain nucleus of the ascending auditory pathway with diverse internal and external con­
nections. The inferior colliculus also receives descending projections from the auditory
cortex, suggesting that cortical processes affect inferior colliculus operations. Enhanced
fMRI activity in the basal ganglia has also been observed during auditory selective atten­
tion to speech sounds (Hill & Miller, 2009). There is also some evidence that selective at­
tention may modulate neural activity at the peripheral level via descending projections.
However, the effects of attention on the peripheral and efferent auditory pathways re­
main equivocal. For example, although some studies report attention effects on the pe­
ripheral auditory systems as measured with evoked otoacoustic emissions (Giard et al.,
1994; Maison et al., 2001), other studies do not (Avan & Bonfils, 1992; Michie et al.,
1996; Timpe-Syverson & Decker, 1999).

Intermodal Auditory Attention


In intermodal attention studies, the effects of attention on auditory processing are as­
sessed by comparing neural activity to auditory stimuli when participants perform a de­
manding task in another modality (usually visual), with activity elicited by the same stim­
uli when attention is directed toward the auditory stimuli. Rinne et al. (2007) found en­
hanced activity in auditory areas during a cross-modal attention task, with the intermodal
attention effects extending to both the posterior and superior temporal gyrus. However,
there was no difference in parietal cortex or prefrontal cortex when attention was direct­
ed to auditory versus visual (i.e., picture) stimuli. Similarly, Kawashima et al. (1999)
reported comparable activation in the right prefrontal cortex during visual and auditory
attention to speech sounds. However, they did find a difference in parietal cortex activa­
tion between attention to auditory and visual stimuli (Kawashima et al., 1999), suggesting
that the parietal cortex may play a different role during auditory and visual attention. In­
terestingly, electrophysiological investigations of the chinchilla cochlea demonstrate that
as the attentional demands on the visual system increase (as in the case of an increasingly
difficult visual discrimination task), there is a corresponding decrease in the sensitivity of
the cochlea that appears to be mediated by efferent projections to the outer hair cells
(Delano et al., 2007). Accordingly, it seems plausible that intermodal attention could theoretically
alter auditory processing at its very earliest stages (i.e., at sensory transduction).

Further underscoring the need to consider the interaction of attentional demands be­
tween modalities, there is evidence to suggest that during intermodal selective attention
tasks, attention to auditory stimuli may alter visual cortical activity. The attentional repul­
sion effect is one example of this. Traditionally understood as a purely visual phenome­
non, attentional repulsion refers to the perceived displacement of a vernier stimulus in a
direction that is opposite to that of a brief peripheral visual cue (Suzuki & Cavanagh,
1997). Observers in these behavioral tasks typically judge two vertical lines placed above
and below one another to be offset in a direction that is opposite to the location of a
briefly presented irrelevant visual stimulus. Under the assumption that the repulsion ef­
fect arises in early retinotopic areas (i.e., visual cortex; Pratt & Turk-Browne,
2003; Suzuki & Cavanagh, 1997), Arnott and Goodale (2006) sought to determine
whether peripheral auditory events could also elicit the repulsion effect. In keeping with
the notion that sounds can alter occipital activity, peripheral sounds were also found to
elicit the repulsion effect.

More direct evidence for enhancement of activity in occipital cortex comes from an fMRI
study in which selective attention to sounds was found to activate visual cortex (Cate et
al., 2009). In that particular study, the occipital activations appeared to be specific to at­
tended auditory stimuli given that the same sounds failed to produce occipital activation
when they were not being attended to (Cate et al., 2009). Moreover, there is some evi­
dence that auditory attention, but not passive exposure to sounds, activates peripheral re­
gions of visual cortex when participants attend to sound sources outside the visual
field. Functional connections between auditory cortex and visual cortex subserving the
peripheral visual field appear to underlie the generation of auditory occipital activations
(Cate et al., 2009). This activation may reflect the priming of visual regions to process
soon-to-appear objects associated with unseen sound sources and provides further sup­
port for the idea that the auditory “where” subsystem may be in the service of the visual-
motor “where” subsystem (Kubovy & Van Valkenburg, 2001). In fact, the functional over­
lap between the auditory cortical spatial network and the visual orientation network is
quite striking, as we have recently shown, suggesting that the auditory spatial network
and visual orienting network share a (p. 228) common neural substrate (Arnott & Alain,
2011). Along the same line, auditory selective attention to speech modulates activity in
the visual word form areas (Yoncheva et al., 2009), suggesting a high level of interaction
between sensory systems even at the relatively early stages of processing.

Throughout this discussion of intermodal attention, one should also keep in mind that
even though auditory information can certainly be obtained in the absence of other senso­
ry input (e.g., sounds perceived in the dark, or with the eyes closed), for the vast majority
of neurologically intact individuals, auditory experience typically occurs in
the presence of other sensory (especially visual) input. Thus, there is good reason to ex­
pect that in such cases, the neural architecture of auditory processing may be interwoven
with that of other sensory systems, especially in instances in which the two are depen­
dent on one another. Once again, neuroimaging data derived from the experience of audi­
tory material property information are useful in this regard. Unlike auditory localization
where a sound’s position can be constructed from interaural timing and intensity differ­
ences, the acoustic route to the material properties of any given object depends entirely
on previous associations between sound and information from other senses (e.g., hearing
the sound that an object makes as one watches someone or something come into contact
with that object, or hearing the sound that an object makes as one touches it). Building
on research showing that visual material processing appears to be accomplished in ven­
tromedial brain areas that include the collateral sulcus and parahippocampal gyrus (Cant
& Goodale, 2007), Arnott and colleagues used fMRI to investigate the brain regions in­
volved in auditory material processing (Arnott et al., 2008). Relative to control sounds,
audio recordings of various materials being manipulated in someone’s hands (i.e., paper,
plastic, aluminium foil, and Styrofoam) were found to elicit greater hemodynamic activity
in the medial region of the right parahippocampus both in neurologically intact individu­
als and in a cortically blind individual. Most interestingly, a concomitant visual material
experiment in which the sighted participants viewed pictures of objects rendered in dif­
ferent materials (e.g., plastic, wood, marble, foil) was found to elicit right parahippocam­
pal activity in an area just lateral to the auditory-evoked region. These results fit well
with animal neuroanatomy in that the most medial aspect of the monkey parahippocam­
pus (i.e., area TH) has connections with auditory cortex (Blatt et al., 2003; Lavenex et al.,
2004; Suzuki et al., 2003), whereas the region situated immediately adjacent to area TH
(i.e., area TFm in the macaque or TL in the rhesus monkey) has strong inputs from areas
processing visual information, receiving little if any input from auditory areas (Blatt et al.,
2003).

The data from both the intramodal and intermodal attention literatures remind us again
that attention may have a variety of neural expressions, at both relatively early (e.g., oc­
cipital) and late (e.g., parahippocampal) stages of processing, and depends to a large ex­
tent on specific task demands (Griffiths et al., 2004). In this respect, attention demon­
strates itself as a pervasive and powerful influence on brain functioning.

Divided Attention

Divided attention between two concurrent streams of sounds (one in each hemispace) or
between the auditory and visual modalities has been associated with enhanced activity
in the precuneus, IPS, FEFs, and middle frontal gyrus compared with focused attention
to either one modality or location (Santangelo et al., 2010). Moreover, Lipschutz et al.
(2002) found comparable activation during selective and divided attention, suggestive of a
common neural network. Bimodal divided attention (i.e., attending to auditory and visual
stimuli) has also been associated with enhanced activity in the posterior dorsolateral pre­
frontal cortex (Johnson et al., 2007). Importantly, the same area was not significantly ac­
tive when participants focused their attention to either visual or auditory stimuli or when
they were passively exposed to bimodal stimuli (Johnson & Zatorre, 2006). The impor­
tance of the dorsolateral prefrontal cortex (DLPFC) during bimodal divided attention is
further supported by evidence from a repetitive transcranial magnetic stimulation (rTMS)
study (Johnson et al., 2007). In that particular study, performance during bimodal divided
attention was hindered by temporarily disrupting the function of the DLPFC using rTMS
compared with control site stimulation.

Deployment of Auditory Attention

Our ability to attend to a particular sound object or sound location is not instantaneous
and may require a number of cognitive operations. We may need to disengage from what we
are doing, switch our attention to a different sensory modality, focus on a different spatial
location or object, and then engage our selection mechanisms. It has been generally es­
tablished that focusing attention on a (p. 229) particular input modality, or a particular
“what” or “where” feature, modulates cortical activity such that task-relevant representa­
tions are enhanced at the expense of irrelevant ones (Alain et al., 2001b; Degerman et al.,
2006; Johnson & Zatorre, 2005, 2006). Although the object-based model can adequately
account for many findings that involve focusing attention to a task-relevant stream, there
is a growing need to better understand how attention is deployed toward an auditory ob­
ject within an auditory scene and whether the mechanisms at play when directing audito­
ry attention toward spatial and nonspatial cues also apply when the auditory scene com­
prises multiple sound objects. Although substantial research has been carried out on sus­
tained and selective attention, fewer studies have examined the deployment of attention,
especially in the auditory modality.

One popular paradigm to assess the deployment of attention consists of presenting an in­
formational cue before either a target sound or streams of sounds in which the likely in­
coming targets are embedded (Green et al., 2005; Stormer et al., 2009). The mechanisms
involved in the control of attention can be assessed by comparing brain activity during
the cue period (i.e., the interval between the cue and the onset of the target or stream of
sounds) in conditions in which attention is directed to a particular feature (e.g., location
or pitch) that defined either the likely incoming target or the streams of sounds to be at­
tended. Such a design has enabled researchers to identify a frontal-parietal network in
the deployment of attention to either a particular location (Hill & Miller, 2009; Salmi et
al., 2007a, 2009; Wu et al., 2007) or sound identity (Hill & Miller, 2009). A similar frontal-
parietal network has been reported during attention switching between locations in the
auditory and visual modalities (Salmi et al., 2007b; Shomstein & Yantis, 2004; Smith et
al., 2010), between sound locations (left vs. right ear), and between sound identities
(male vs. female voice) (Shomstein & Yantis, 2006). The network that mediates voluntary control of
auditory spatial and nonspatial attention encompasses several brain areas that vary
among studies and task and include, but are not limited to, the inferior and superior
frontal gyrus, dorsal precentral sulcus, IPS, superior parietal lobule, and auditory cortex.
Some of the areas (e.g., anterior cingulate, FEFs, superior parietal lobule) involved in ori­
enting auditory spatial attention are similar to those observed during the orientation of
visual spatial attention, suggesting that control of spatial attention may be supported by a
combination of supramodal and modality-specific brain mechanisms (Wu et al., 2007). The
activations in this network vary as a function of the feature to be attended, with location
recruiting the parietal cortex to a greater extent and attention to pitch recruiting the infe­
rior frontal gyrus. These findings resemble studies of auditory working memory for sound
location and sound identity (Alain et al., 2008; Arnott et al., 2005; Rama et al., 2004).

In addition to brain areas that appear to be specialized in terms of processing isolated
featural information in individual modalities, other sites have been identified whose role
may be considered more integrative in nature. For example, the left IPS has been found
to be equally active when attention is directed to sound identity or sound location (Hill &
Miller, 2009). Moreover, the left IPS is also activated by tool- or hand-manipulated
sounds, as previously discussed. This suggests that the IPS may be an integrative center
that coordinates attention regardless of which class of features is the focus of attention
(Hill & Miller, 2009).

Figure 11.7 Blood oxygenation level–dependent effects showing brain areas that are activated by a bottom-up, data-driven stimulus designed to capture a participant’s attention (i.e., loud deviant sounds) as well as those reflecting top-down controlled processes engaged by a spatial cuing task. Note the overlap between the frontal eye field (FEF), the temporal parietal junction (TPJ), and the superior parietal lobule (SPL) during bottom-up and top-down controlled auditory spatial attention. Cb, cerebellum; CG/medFG, cingulate gyrus/medial frontal gyrus; IFG/MFG, inferior frontal gyrus/middle frontal gyrus; IPS, intraparietal sulcus; OC, occipital cortex; PMC, premotor cortex.

Reprinted from Brain Research, 1286, Juha Salmi, Teemu Rinne, Sonja Koistinen, Oili Salonen, and Kimmo Alho, “Brain networks of bottom-up triggered and top-down controlled shifting of auditory attention,” 155–164, Copyright 2008, with permission from Elsevier.

Additionally, the neural networks involved in the endogenous control of attention differ
from those engaged by salient auditory oddball or novel stimuli designed to capture a
participant’s attention in an exogenous fashion. Salmi et al. (2009) used fMRI to measure
brain activity elicited by infrequently occurring loudness deviation tones (LDTs) while
participants were told to focus their attention on one auditory stream (e.g., left ear) and
to ignore sounds presented in the other ear (i.e., right ear). The LDTs occurred in both
streams and served as a means of assessing involuntary (i.e., bottom-up) attentional cap­
ture. The authors found impaired performance when the targets were preceded by LDTs
in the unattended location, and this effect coincided with enhanced activity in the ventro­
medial prefrontal cortex (VMPFC), possibly related to evaluation of the distracting event
(Figure 11.7). Together, these fMRI studies reveal a complex neural network involved in
the deployment of auditory attention. In a recent study, Gamble and Luck (2011)
measured auditory ERPs while listeners were presented with two clearly distinguishable
sound objects occurring in the left and right hemispace simultaneously. Participants indi­
cated whether a predefined target was present or absent. They found an increased nega­
tivity between 200 and 400 ms that was maximal over anterior electrodes contralateral to
the target location, which was followed by a posterior contralateral positivity.
These results suggest that auditory attention can be quickly deployed to the sound object
location. (p. 230) More important, these findings suggest that scalp recordings of ERPs
may provide a useful tool for studying the deployment of auditory attention in real-life sit­
uations in which multiple sound objects are simultaneously present in the environment.
This is an important issue to address given that auditory perception often occurs in a
densely cluttered, rapidly changing acoustic environment, where multiple sound objects
compete for attention.

Deployment of Attention in Time


Although the data clearly provide evidence for neural modulation with respect to auditory
spatial attention, it has been argued that the role of location in audition is less critical
than in vision (Woods et al., 2001), and that in contrast to the high spatial resolution of
the visual system, the auditory system shows similarly acute sensitivity with respect to
the temporal domain (Welch & Warren, 1980). Ventriloquism and sound-induced visual
temporal illusions (Shams et al., 2000; Recanzone, 2003) are good examples of this prop­
erty. Sanders and Astheimer (2008) showed that listeners can selectively direct their at­
tention to specific time points that differ by as little as 500 ms, and that doing so im­
proves target detection, affects baseline neural activity preceding stimulus presentation,
and modulates auditory evoked potentials at a perceptually early stage (Figure 11.8).
Rimmele, Jolsvai, and Sussman (2011) set up spatial and temporal expectations using a
moving auditory stimulus. They found that implicitly directing attention to a specific mo­
ment in time modulated the amplitude of auditory ERPs, independently from spatially di­
recting attention. These studies show that listeners can flexibly allocate temporally selec­
tive attention over short intervals (for a more extensive review, see Jones, 1976; Jones &
Boltz, 1989).

Future Directions

Figure 11.8 A, Schematic of the paradigm used to investigate the deployment of auditory attention in time. B, Accuracy in detecting targets at the designated time. C, The first row shows the scalp-recorded auditory evoked potential at the midline central site (i.e., Cz) for the whole epoch. The amplitude of the contingent negative variation (CNV) increased as the designated attended time increased. The second row shows the transient N1 and P2 waves at the midline frontal site (i.e., Fz) elicited by the stimuli when they occurred at the attended time (thick line). Note that the N1 amplitude was larger when the stimulus occurred at the attended time than when attention was allocated to a different time.

Adapted with permission from Sanders & Astheimer, 2008.

Over the past decade, we have seen a significant increase in research activity regarding
the mechanisms supporting the varieties of auditory attention. Attention to auditory mate­
rial engages a broadly distributed neural network that varies as a function of task de­
mands, including selective, divided, and sustained attention. An important goal for future
research will be to clarify the role of auditory cortical areas as well as those beyond audi­
tory cortices (p. 231) (e.g., parietal cortex) in auditory attention. This may require a com­
bination of neuroimaging techniques such as EEG, TMS, and fMRI, as well as animal stud­
ies using microstimulation and selective-deactivation (e.g., cooling) techniques combined
with behavioral measures.

Current research using relatively simple sounds (e.g., pure tones) suggests that selective
attention may involve facilitation and suppression of task-relevant and task-irrelevant
stimuli, respectively. However, further research is needed to determine the extent to
which attentional mechanisms derived from paradigms using relatively simple stimuli ac­
count for the processes involved in more complex and realistic listening situations often
illustrated using the cocktail party example. Speech is a highly familiar stimulus, and our
auditory system has had the opportunity to learn about speech-specific properties (e.g.,
f0, formant transitions) that may assist listeners while they selectively attend to speech
stimuli (Rossi-Katz & Arehart, 2009). For instance, speech sounds activate schemata that
may interact with more primitive mechanisms, thereby influencing how incoming acoustic
data are perceptually organized and selected for further processing. It is unlikely that such
schemata play an equivalent role in the processing of pure tones, so the relationship be­
tween bottom-up and top-down contributions in the deployment of attention may be dif­
ferent according to the naturalness of the auditory environment used. Lastly, spoken com­
munication is a multimodal and highly interactive process whereby visual input can help
listeners identify speech in noise and can also influence what is heard. Hence, it is also
important to examine the role of visual information during selective attention to speech
sounds.

In short, it is clear that auditory attention plays a central (p. 232) (and perhaps even pri­
mary) role in guiding our interaction with the external world. However, in reviewing the
literature, we have noted how auditory attention is intimately connected with other fun­
damental issues such as multimodal integration, the relationship between perception and
action-based processing, and how mental representations are maintained across both
space and time. In advancing the field, it will be important not to ignore the complexity of
the problem, such that our understanding of the neural substrates that underlie auditory
attention reflects this core mechanism at its most ecologically valid expression.

References
Ahveninen, J., Hämäläinen, M., Jääskeläinen, I. P., Ahlfors, S. P., Huang, S., Lin, F. H., Raij,
T., Sams, M., Vasios, C. E., & Belliveau, J. W. (2011). Attention-driven auditory cortex
short-term plasticity helps segregate relevant sounds from noise. Proceedings of the Na­
tional Academy of Sciences U S A, 108, 4182–4187.

Alain, C., Achim, A., & Richer, F. (1993). Perceptual context and the selective attention ef­
fect on auditory event-related brain potentials. Psychophysiology, 30, 572–580.

Alain, C., & Arnott, S. R. (2000). Selectively attending to auditory objects. Frontiers in
Bioscience, 5, D202–D212.

Alain, C., Arnott, S. R., Hevenor, S., Graham, S., & Grady, C. L. (2001). “What” and
“where” in the human auditory system. Proceedings of the National Academy of Sciences
U S A, 98, 12301–12306.

Alain, C., Arnott, S. R., & Picton, T. W. (2001). Bottom-up and top-down influences (p. 233)
on auditory scene analysis: Evidence from event-related brain potentials. Journal of Ex­
perimental Psychology: Human Perception and Performance, 27, 1072–1089.

Alain, C., & Bernstein, L. J. (2008). From sounds to meaning: The role of attention during
auditory scene analysis. Current Opinion in Otolaryngology & Head and Neck Surgery,
16, 485–489.

Alain, C., He, Y., & Grady, C. (2008). The contribution of the inferior parietal lobe to audi­
tory spatial working memory. Journal of Cognitive Neuroscience, 20, 285–295.

Alain, C., & Izenberg, A. (2003). Effects of attentional load on auditory scene analysis.
Journal of Cognitive Neuroscience, 15, 1063–1073.

Alain, C., Schuler, B. M., & McDonald, K. L. (2002). Neural activity associated with distin­
guishing concurrent auditory objects. Journal of the Acoustical Society of America, 111,
990–995.

Alain, C., & Woods, D. L. (1993). Distractor clustering enhances detection speed and ac­
curacy during selective listening. Perception & Psychophysics, 54, 509–514.

Alain, C., & Woods, D. L. (1994). Signal clustering modulates auditory cortical activity in
humans. Perception & Psychophysics, 56, 501–516.

Alain, C., & Woods, D. L. (1997). Attention modulates auditory pattern memory as indexed
by event-related brain potentials. Psychophysiology, 34, 534–546.

Alho, K., Tottola, K., Reinikainen, K., Sams, M., & Naatanen, R. (1987). Brain mechanism
of selective listening reflected by event-related potentials. Electroencephalography and
Clinical Neurophysiology, 68, 458–470.

Andersen, R. A., Snyder, L. H., Bradley, D. C., & Xing, J. (1997). Multiple representation of
space in the posterior parietal cortex and its use in planning movements. Annual Review
of Neuroscience, 20, 303–330.

Arnott, S. R., & Alain, C. (2002a). Effects of perceptual context on event-related brain po­
tentials during auditory spatial attention. Psychophysiology, 39, 625–632.

Arnott, S. R., & Alain, C. (2002b). Stepping out of the spotlight: MMN attenuation as a
function of distance from the attended location. NeuroReport, 13, 2209–2212.

Arnott, S. R., & Alain, C. (2011). The auditory dorsal pathway: Orienting vision. Neuro­
science and Biobehavioral Reviews, 35 (10), 2162–2173.

Arnott, S. R., Binns, M. A., Grady, C. L., & Alain, C. (2004). Assessing the auditory dual-
pathway model in humans. NeuroImage, 22, 401–408.

Arnott, S. R., Cant, J. S., Dutton, G. N., & Goodale, M. A. (2008). Crinkling and crumpling:
An auditory fMRI study of material properties. NeuroImage, 43, 368–378.

Arnott, S. R., Grady, C. L., Hevenor, S. J., Graham, S., & Alain, C. (2005). The functional
organization of auditory working memory as revealed by fMRI. Journal of Cognitive Neu­
roscience, 17, 819–831.

Arnott, S. R., & Goodale, M. A. (2006). Distorting visual space with sound. Vision Re­
search, 46, 1553–1558.

Page 25 of 35
Varieties of Auditory Attention

Avan, P., & Bonfils, P. (1992). Analysis of possible interactions of an attentional task with
cochlear micromechanics. Hearing Research, 57, 269–275.

Baylis, G. C., & Driver, J. (1993). Visual attention and objects: Evidence for hierarchical
coding of location. Journal of Experimental Psychology: Human Perception and Perfor­
mance, 19, 451–470.

Benson, D. A., & Hienz, R. D. (1978). Single-unit activity in the auditory cortex of mon­
keys selectively attending left vs. right ear stimuli. Brain Research, 159, 307–320.

Blatt, G. J., Pandya, D. N., & Rosene, D. L. (2003). Parcellation of cortical afferents to
three distinct sectors in the parahippocampal gyrus of the rhesus monkey: An anatomical
and neurophysiological study. Journal of Comparative Neurology, 466, 161–179.

Brefczynski, J. A., & DeYoe, E. A. (1999). A physiological correlate of the “spotlight” of vi­
sual attention. Nature Neuroscience, 2, 370–374.

Bregman, A. S. (1990). Auditory scene analysis: The perceptual organization of sounds.


London, UK: MIT Press.

Bregman, A. S., & Rudnicky, A. I. (1975). Auditory segregation: Stream or streams? Jour­
nal of Experimental Psychology: Human Perception and Performance, 1, 263–267.

Broadbent, D. E. (1962). Attention and the perception of speech. Scientific American, 206,
143–151.

Calvert, G. A. (2001). Crossmodal processing in the human brain: Insights from functional
neuroimaging studies. Cerebral Cortex, 11, 1110–1123.

Cant, J. S., & Goodale, M. A. (2007). Attention to form or surface properties modulates dif­
ferent regions of human occipitotemporal cortex. Cerebral Cortex, 17, 713–731.

Carlyon, R. P. (2004). How the brain separates sounds. Trends in Cognitive Sciences, 8,
465–471.

Carlyon, R. P., Cusack, R., Foxton, J. M., & Robertson, I. H. (2001). Effects of attention
and unilateral neglect on auditory stream segregation. Journal of Experimental Psycholo­
gy: Human Perception and Performance, 27, 115–127.

Cate, A. D., Herron, T. J., Yund, E. W., Stecker, G. C., Rinne, T., Kang, X., Petkov, C. I., Dis­
brow, E. A., & Woods, D. L. (2009). Auditory attention activates peripheral visual cortex.
PLoS One, 4, e4645.

Chen, Z., & Cave, K. R. (2008). Object-based attention with endogenous cuing and posi­
tional certainty. Perception & Psychophysics, 70, 1435–1443.

Cherry, E. C. (1953). Some experiments on the recognition of speech with one and with
two ears. Journal of the Acoustical Society of America, 25, 975–979.

Page 26 of 35
Varieties of Auditory Attention

Cowan, N. (1988). Evolving conceptions of memory storage, selective attention, and their
mutual constraints within the human information-processing system. Psychological Bul­
letin, 104, 163–191.

Cowan, N. (1993). Activation, attention, and short-term memory. Memory & Cognition, 21,
162–167.

Culham, J. C., Cavina-Pratesi, C., & Singhal, A. (2006). The role of parietal cortex in visuo­
motor control: what have we learned from neuroimaging? Neuropsychologia, 44, 2668–
2684.

Cusack, R., Deeks, J., Aikman, G., & Carlyon, R. P. (2004). Effects of location, frequency
region, and time course of selective attention on auditory scene analysis. Journal of Ex­
perimental Psychology: Human Perception and Performance, 30, 643–656.

Degerman, A., Rinne, T., Salmi, J., Salonen, O., & Alho, K. (2006). Selective attention to
sound location or pitch studied with fMRI. Brain Research, 1077, 123–134.

Degerman, A., Rinne, T., Sarkka, A. K., Salmi, J., & Alho, K. (2008). Selective attention to
sound location or pitch studied with event-related brain potentials and magnetic fields.
European Journal of Neuroscience, 27, 3329–3341.

Delano, P. H., Elgueda, D., Hamame, C. M., & Robles, L. (2007). Selective attention to vi­
sual stimuli reduces cochlear sensitivity in chinchillas. Journal of Neuroscience, 27, 4146–
4153.

Deouell, L. Y., Deutsch, D., Scabini, D., & Knight, R. T. (2008). No disillusions in auditory
extinction: Perceived a melody comprised of unperceived notes. Frontiers in Human Neu­
roscience, 1, 1–6.

Deutsch, D. (1975). Two-channel listening to musical scales. Journal of the Acoustical So­
ciety of America, 57, 1156–1160.

Driver, J. (2001). A selective review of selective attention research from the past century.
British Journal of Psychology, 92, 53–78.

Driver, J., & Baylis, G. C. (1989). Movement and visual attention: The spotlight metaphor
breaks down. Journal of Experimental Psychology: Human Perception and Performance,
15, 448–456.

Duncan, J. (1984). Selective attention and the organization of visual information. Journal
of Experimental Psychology: General, 113, 501–517.

Dyson, B. J., & Alain, C. (2004). Representation of concurrent acoustic objects in primary
auditory cortex. Journal of the Acoustical Society of America, 115, 280–288.

Dyson, B. J., Alain, C., & He, Y. (2005). Effects of visual attentional load on low-level audi­
tory scene analysis. Cognitive, Affective, & Behavioral Neuroscience, 5, 319–338.

Page 27 of 35
Varieties of Auditory Attention

Dyson, B. J., & Ishfaq, F. (2008). Auditory memory can be object-based. Psychonomic Bul­
letin & Review, 15, 409–412.

Egly, R., Driver, J., & Rafal, R. D. (1994). Shifting visual attention between objects and lo­
cations: evidence from normal and parietal lesion subjects. Journal of Experimental Psy­
chology: General, 123, 161–177.

Gamble, M. L., & Luck, S. J. (2011). N2ac: An ERP component associated with the focus­
ing of attention within an auditory scene. Psychophysiology, 48, 1057–1068.

Giard, M. H., Collet, L., Bouchet, P., & Pernier, J. (1994). Auditory selective attention in
the human cochlea. Brain Research, 633, 353–356.

Giard, M. H., Fort, A., Mouchetant-Rostaing, Y., & Pernier, J. (2000). Neurophysiological
mechanisms of auditory selective attention in humans. Frontiers in Bioscience, 5, D84–
D94.

Goodale, M. A., & Milner, A. D. (1992). Separate visual pathways for perception and ac­
tion. Trends in Neurosciences, 15, 20–25.

Green, J. J., Teder-Salejarvi, W. A., & McDonald, J. J. (2005). Control mechanisms mediat­
ing shifts of attention in auditory and visual space: A spatio-temporal ERP analysis. Exper­
imental Brain Research, 166, 358–369.

Griffiths, T. D., Warren, J. D., Scott, S. K., Nelken, I., & King, A. J. (2004). Cortical process­
ing of complex sound: A way forward? Trends in Neurosciences, 27, 181–185.

Grill-Spector, K., Henson, R., & Martin, A. (2006). Repetition and the brain: Neural mod­
els of stimulus-specific effects. Trends in Cognitive Sciences, 10, 14–23.

Gutschalk, A., Micheyl, C., Melcher, J. R., Rupp, A., Scherg, M., & Oxenham, A. J. (2005).
Neuromagnetic correlates of streaming in human auditory cortex. Journal of Neuro­
science, 25, 5382–5388.

Hansen, J. C., & Hillyard, S. A. (1980). Endogenous brain potentials associated with selec­
tive auditory attention. Electroencephalography and Clinical Neurophysiology, 49, 277–
290.

Hansen, J. C., & Hillyard, S. A. (1983). Selective attention to multidimensional auditory


stimuli. Journal of Experimental Psychology: Human Perception and Performance, 9, 1–19.

Hecht, L. N., Abbs, B., & Vecera, S. P. (2008). Auditory object-based attention. Visual Cog­
nition, 16, 1109–1115.

Hill, K. T., & Miller, L. M. (2009). Auditory attentional control and selection during cock­
tail party listening. Cerebral Cortex, 20, 583–590.

Hillyard, S. A., Hink, R. F., Schwent, V. L., & Picton, T. W. (1973). Electrical signs of selec­
tive attention in the human brain. Science, 182, 177–180.
Page 28 of 35
Varieties of Auditory Attention

Hillyard, S. A., Vogel, E. K., & Luck, S. J. (1998). Sensory gain control (amplification) as a
mechanism of selective attention: Electrophysiological and neuroimaging evidence. Philo­
sophical Transactions of the Royal Society of London, Series B, Biological Sciences, 353,
1257–1270.

Hillyard, S. A., Woldorff, M., Mangun, G. R., & Hansen, J. C. (1987). Mechanisms of early
selective attention in auditory and visual modalities. Electroencephalography and Clinical
Neurophysiology Supplement, 39, 317–324.

Hocherman, S., Benson, D. A., Goldstein, M. H., Jr., Heffner, H. E., & Hienz, R. D. (1976).
Evoked unit activity in auditory cortex of monkeys performing a selective attention task.
Brain Research, 117, 51–68.

Johnson, J. A., Strafella, A. P., & Zatorre, R. J. (2007). The role of the dorsolateral pre­
frontal cortex in bimodal divided attention: two transcranial magnetic stimulation studies.
Journal of Cognitive Neuroscience, 19, 907–920.

Johnson, J. A., & Zatorre, R. J. (2005). Attention to simultaneous unrelated auditory and vi­
sual events: behavioral and neural correlates. Cerebral Cortex, 15, 1609–1620.

Johnson, J. A., & Zatorre, R. J. (2006). Neural substrates for dividing and focusing
(p. 234)

attention between simultaneous auditory and visual events. NeuroImage, 31, 1673–1681.

Jones, M. R. (1976). Time, our lost dimension: Toward a new theory of perception, atten­
tion, and memory. Psychological Review, 83, 323–355.

Jones, M. R., & Boltz, M. (1989). Dynamic attending and responses to time. Psychological
Review, 96, 459–491.

Kauramaki, J., Jaaskelainen, I. P., & Sams, M. (2007). Selective attention increases both
gain and feature selectivity of the human auditory cortex. PLoS One, 2, e909.

Kawashima, R., Imaizumi, S., Mori, K., Okada, K., Goto, R., Kiritani, S., Ogawa, A., &
Fukuda, H. (1999). Selective visual and auditory attention toward utterances: A PET
study. NeuroImage, 10, 209–215.

Kerlin, J. R., Shahin, A. J., & Miller, L. M. (2010). Attentional gain control of ongoing corti­
cal speech representations in a “cocktail party.” Journal of Neuroscience, 30, 620–628.

Kubovy, M., & Van Valkenburg, D. (2001). Auditory and visual objects. Cognition, 80, 97–
126.

LaBerge, D. (1983). Spatial extent of attention to letters and words. Journal of Experimen­
tal Psychology: Human Perception and Performance, 9, 371–379.

Lavenex, P., Suzuki, W. A., & Amaral, D. G. (2004). Perirhinal and parahippocampal cor­
tices of the macaque monkey: Intrinsic projections and interconnections. Journal of Com­
parative Neurology, 472, 371–394.

Page 29 of 35
Varieties of Auditory Attention

Leung, A. W., & Alain, C. (2011). Working memory load modulates the auditory “What”
and “Where” neural networks. NeuroImage, 55, 1260–1269.

Lewis, J. W. (2006). Cortical networks related to human use of tools. Neuroscientist, 12,
211–231.

Lewis, J. W., Brefczynski, J. A., Phinney, R. E., Janik, J. J., & DeYoe, E. A. (2005). Distinct
cortical pathways for processing tool versus animal sounds. Journal of Neuroscience, 25,
5148–5158.

Linden, D. E., Prvulovic, D., Formisano, E., Vollinger, M., Zanella, F. E., Goebel, R., &
Dierks, T. (1999). The functional neuroanatomy of target detection: An fMRI study of visu­
al and auditory oddball tasks. Cerebral Cortex, 9, 815–823.

Lipschutz, B., Kolinsky, R., Damhaut, P., Wikler, D., & Goldman, S. (2002). Attention-de­
pendent changes of activation and connectivity in dichotic listening. NeuroImage, 17,
643–656.

Macaluso, E., George, N., Dolan, R., Spence, C., & Driver, J. (2004). Spatial and temporal
factors during processing of audiovisual speech: A PET study. NeuroImage, 21, 725–732.

Macken, W. J., Tremblay, S., Houghton, R. J., Nicholls, A. P., & Jones, D. M. (2003). Does
auditory streaming require attention? Evidence from attentional selectivity in shortterm
memory. Journal of Experimental Psychology: Human Perception and Performance, 29,
43–51.

Maeder, P. P., Meuli, R. A., Adriani, M., Bellmann, A., Fornari, E., Thiran, J. P., Pittet, A., &
Clarke, S. (2001). Distinct pathways involved in sound recognition and localization: a hu­
man fMRI study. NeuroImage, 14, 802–816.

Maison, S., Micheyl, C., & Collet, L. (2001). Influence of focused auditory attention on
cochlear activity in humans. Psychophysiology, 38, 35–40.

Martinkauppi, S., Rama, P., Aronen, H. J., Korvenoja, A., & Carlson, S. (2000). Working
memory of auditory localization. Cerebral Cortex, 10, 889–898.

McAdams, S., & Bertoncini, J. (1997). Organization and discrimination of repeating sound
sequences by newborn infants. Journal of the Acoustical Society of America, 102, 2945–
2953.

McMains, S. A., & Somers, D. C. (2004). Multiple spotlights of attentional selection in hu­
man visual cortex. Neuron, 42, 677–686.

Meienbrock, A., Naumer, M. J., Doehrmann, O., Singer, W., & Muckli, L. (2007). Retino­
topic effects during spatial audiovisual integration. Neuropsychologia, 45, 531–539.

Michie, P. T., LePage, E. L., Solowij, N., Haller, M., & Terry, L. (1996). Evoked otoacoustic
emissions and auditory selective attention. Hearing Research, 98, 54–67.

Page 30 of 35
Varieties of Auditory Attention

Michie, P. T., Solowij, N., Crawford, J. M., & Glue, L. C. (1993). The effects of between-
source discriminability on attended and unattended auditory ERPs. Psychophysiology, 30,
205–220.

Mondor, T. A., & Terrio, N. A. (1998). Mechanisms of perceptual organization and audito­
ry selective attention: The role of pattern structure. Journal of Experimental Psychology:
Human Perception and Performance, 24, 1628–1641.

Moray, N., & O’Brien, T. (1967). Signal-detection theory applied to selective listening.
Journal of the Acoustical Society of America, 42, 765–772.

Munte, T. F., Spring, D. K., Szycik, G. R., & Noesselt, T. (2010). Electrophysiological atten­
tion effects in a virtual cocktail-party setting. Brain Research, 1307, 78–88.

Näätänen, R., Gaillard, A. W., & Mantysalo, S. (1978). Early selective-attention effect on
evoked potential reinterpreted. Acta Psychologica (Amsterdam), 42, 313–329.

O’Craven, K. M., Downing, P. E., & Kanwisher, N. (1999). fMRI evidence for objects as the
units of attentional selection. Nature, 401, 584–587.

Okamoto, H., Stracke, H., Wolters, C. H., Schmael, F., & Pantev, C. (2007). Attention im­
proves population-level frequency tuning in human auditory cortex. Journal of Neuro­
science, 27, 10383–10390.

Ortuno, F., Ojeda, N., Arbizu, J., Lopez, P., Marti-Climent, J. M., Penuelas, I., & Cervera, S.
(2002). Sustained attention in a counting task: normal performance and functional neu­
roanatomy. NeuroImage, 17, 411–420.

Paltoglou, A. E., Sumner, C. J., & Hall, D. A. (2009). Examining the role of frequency speci­
ficity in the enhancement and suppression of human cortical activity by auditory selective
attention. Hearing Research, 257, 106–118.

Petkov, C. I., Kang, X., Alho, K., Bertrand, O., Yund, E. W., & Woods, D. L. (2004). Atten­
tional modulation of human auditory cortex. Nature Neuroscience, 7, 658–663.

Picton, T. W., Alain, C., Otten, L., Ritter, W., & Achim, A. (2000). Mismatch negativity: Dif­
ferent water in the same river. Audiology & Neuro-otology, 5, 111–139.

Pratt, J., & Turk-Browne, N. B. (2003). The attentional repulsion effect in perception and
action. Experimental Brain Research, 152, 376–382.

Rama, P., Poremba, A., Sala, J. B., Yee, L., Malloy, M., Mishkin, M., & Courtney, S. M.
(2004). Dissociable functional cortical topographies for working memory maintenance of
voice identity and location. Cerebral Cortex, 14, 768–780.

Rauschecker, J. P., & Scott, S. K. (2009). Maps and streams in the auditory cortex: Nonhu­
man primates illuminate human speech processing. Nature Neuroscience, 12, 718–724.

Page 31 of 35
Varieties of Auditory Attention

Rauschecker, J. P., & Tian, B. (2000). Mechanisms and streams for processing of
(p. 235)

“what” and “where” in auditory cortex. Proceedings of the National Academy of Sciences
U S A, 97, 11800–11806.

Recanzone, G. H. (2003). Auditory influences on visual temporal rate perception. Journal


of Neurophysiology, 89, 1078–1093.

Rimmele, J., Jolsvai, H., & Sussman, E. (2011). Auditory target detection is affected by im­
plicit temporal and spatial expectations. Journal of Cognitive Neuroscience, 23, 1136–
1147.

Rinne, T., Balk, M. H., Koistinen, S., Autti, T., Alho, K., & Sams, M. (2008). Auditory selec­
tive attention modulates activation of human inferior colliculus. Journal of Neurophysiolo­
gy, 100, 3323–3327.

Rinne, T., Kirjavainen, S., Salonen, O., Degerman, A., Kang, X., Woods, D. L., & Alho, K.
(2007). Distributed cortical networks for focused auditory attention and distraction. Neu­
roscience Letters, 416, 247–251.

Ross, B., Hillyard, S. A., & Picton, T. W. (2010). Temporal dynamics of selective attention
during dichotic listening. Cerebral Cortex, 20, 1360–1371.

Rossi-Katz, J., & Arehart, K. H. (2009). Message and talker identification in older adults:
Effects of task, distinctiveness of the talkers’ voices, and meaningfulness of the compet­
ing message. Journal of Speech, Language, and Hearing Research, 52, 435–453.

Salmi, J., Rinne, T., Degerman, A., & Alho, K. (2007a). Orienting and maintenance of spa­
tial attention in audition and vision: An event-related brain potential study. European
Journal of Neuroscience, 25, 3725–3733.

Salmi, J., Rinne, T., Degerman, A., Salonen, O., & Alho, K. (2007b). Orienting and mainte­
nance of spatial attention in audition and vision: multimodal and modality-specific brain
activations. Brain Structure and Function, 212, 181–194.

Salmi, J., Rinne, T., Koistinen, S., Salonen, O., & Alho, K. (2009). Brain networks of bot­
tom-up triggered and top-down controlled shifting of auditory attention. Brain Research,
1286, 155–164.

Sanders, L. D., & Astheimer, L. B. (2008). Temporally selective attention modulates early
perceptual processing: Event-related potential evidence. Perception & Psychophysics, 70,
732–742.

Santangelo, V., Fagioli, S., & Macaluso, E. (2010). The costs of monitoring simultaneously
two sensory modalities decrease when dividing attention in space. NeuroImage, 49, 2717–
2727.

Page 32 of 35
Varieties of Auditory Attention

Sestieri, C., Di Matteo, R., Ferretti, A., Del Gratta, C., Caulo, M., Tartaro, A., Olivetti Be­
lardinelli, M., & Romani, G. L. (2006). “What” versus “where” in the audiovisual domain:
an fMRI study. NeuroImage, 33, 672–680.

Shamma, S. A., Elhilali, M., & Micheyl, C. (2011). Temporal coherence and attention in
auditory scene analysis. Trends in Neurosciences, 34, 114–123.

Shams, L., Kamitani, Y., & Shimojo, S. (2000). Illusions. What you see is what you hear.
Nature, 408, 788.

Shinn-Cunningham, B. G. (2008). Object-based auditory and visual attention. Trends in


Cognitive Sciences, 12, 182–186.

Shomstein, S., & Yantis, S. (2004). Control of attention shifts between vision and audition
in human cortex. Journal of Neuroscience, 24, 10702–10706.

Shomstein, S., & Yantis, S. (2006). Parietal cortex mediates voluntary control of spatial
and nonspatial auditory attention. Journal of Neuroscience, 26, 435–439.

Smith, D. V., Davis, B., Niu, K., Healy, E. W., Bonilha, L., Fridriksson, J., Morgan, P. S., &
Rorden, C. (2010). Spatial attention evokes similar activation patterns for visual and audi­
tory stimuli. Journal of Cognitive Neuroscience, 22, 347–361.

Snyder, J. S., Alain, C., & Picton, T. W. (2006). Effects of attention on neuroelectric corre­
lates of auditory stream segregation. Journal of Cognitive Neuroscience, 18, 1–13.

Snyder, J. S., Carter, O. L., Hannon, E. E., & Alain, C. (2009). Adaptation reveals multiple
levels of representation in auditory stream segregation. Journal of Experimental Psycholo­
gy: Human Perception and Performance, 35, 1232–1244.

Stevens, M. C., Calhoun, V. D., & Kiehl, K. A. (2005). fMRI in an oddball task: Effects of
target-to-target interval. Psychophysiology, 42, 636–642.

Stormer, V. S., Green, J. J., & McDonald, J. J. (2009). Tracking the voluntary control of au­
ditory spatial attention with event-related brain potentials. Psychophysiology, 46, 357–
366.

Sussman, E. S., Bregman, A. S., Wang, W. J., & Khan, F. J. (2005). Attentional modulation
of electrophysiological activity in auditory cortex for unattended sounds within multi­
stream auditory environments. Cognitive, Affective, and Behavioral Neuroscience, 5, 93–
110.

Sussman, E. S., Horváth, J., Winkler, I., & Orr, M. (2007). The role of attention in the for­
mation of auditory streams. Perception & Psychophysics, 69, 136–152.

Suzuki, K., Takei, N., Toyoda, T., Iwata, Y., Hoshino, R., Minabe, Y., & Mori, N. (2003). Au­
ditory hallucinations and cognitive impairment in a patient with a lesion restricted to the
hippocampus. Schizophrenia Research, 64, 87–89.

Page 33 of 35
Varieties of Auditory Attention

Suzuki, S., & Cavanagh, P. (1997). Focused attention distorts visual space: An attentional
repulsion effect. Journal of Experimental Psychology: Human Perception and Performance,
23, 443–463.

Tark, K. J., & Curtis, C. E. (2009). Persistent neural activity in the human frontal cortex
when maintaining space that is off the map. Nature Neuroscience, 12, 1463–1468.

Timpe-Syverson, G. K., & Decker, T. N. (1999). Attention effects on distortion-product


otoacoustic emissions with contralateral speech stimuli. Journal of the American Academy
of Audiology, 10, 371–378.

Trejo, L. J., Ryan-Jones, D. L., & Kramer, A. F. (1995). Attentional modulation of the mis­
match negativity elicited by frequency differences between binaurally presented tone
bursts. Psychophysiology, 32, 319–328.

Tzourio, N., Massioui, F. E., Crivello, F., Joliot, M., Renault, B., & Mazoyer, B. (1997).
Functional anatomy of human auditory attention studied with PET. NeuroImage, 5, 63–77.

Valdes-Sosa, M., Bobes, M. A., Rodriguez, V., & Pinilla, T. (1998). Switching attention
without shifting the spotlight object-based attentional modulation of brain potentials.
Journal of Cognitive Neuroscience, 10, 137–151.

Welch, R. B., & Warren, D. H. (1980). Immediate perceptual response to intersensory dis­
crepancy. Psychological Bulletin, 88, 638–667.

Winkler, I., Denham, S. L., & Nelken, I. (2009). Modeling the auditory scene: Predictive
regularity representations and perceptual objects. Trends in Cognitive Sciences, 13, 532–
540.

Woldorff, M. G. (1995). Selective listening at fast stimulus rates: so much to hear, so little
time. Electroencephalography and Clinical Neurophysiology Supplement, 44, 32–51.

Woldorff, M. G., Gallen, C. C., Hampson, S. A., Hillyard, S. A., Pantev, C., Sobel, D.,
(p. 236)

& Bloom, F. E. (1993). Modulation of early sensory processing in human auditory cortex
during auditory selective attention. Proceedings of the National Academy of Science U S
A, 90, 8722–8726.

Woldorff, M. G., Hackley, S. A., & Hillyard, S. A. (1991). The effects of channel-selective
attention on the mismatch negativity wave elicited by deviant tones. Psychophysiology,
28, 30–42.

Woldorff, M. G., & Hillyard, S. A. (1991). Modulation of early auditory processing during
selective listening to rapidly presented tones. Electroencephalography and Clinical Neu­
rophysiology, 79, 170–191.

Woods, D. L., & Alain, C. (1993). Feature processing during high-rate auditory selective
attention. Perception & Psychophysics, 53, 391–402.

Page 34 of 35
Varieties of Auditory Attention

Woods, D. L., & Alain, C. (2001). Conjoining three auditory features: An event-related
brain potential study. Journal of Cognitive Neuroscience, 13, 492–509.

Woods, D. L., Alain, C., Diaz, R., Rhodes, D., & Ogawa, K. H. (2001). Location and fre­
quency cues in auditory selective attention. Journal of Experimental Psychology: Human
Perception and Performance, 27, 65–74.

Woods, D. L., Alho, K., & Algazi, A. (1994). Stages of auditory feature conjunction: An
event-related brain potential study. Journal of Experimental Psychology: Human Percep­
tion and Performance, 20, 81–94.

Wu, C. T., Weissman, D. H., Roberts, K. C., & Woldorff, M. G. (2007). The neural circuitry
underlying the executive control of auditory spatial attention. Brain Research, 1134, 187–
198.

Yantis, S., & Serences, J. T. (2003). Cortical mechanisms of space-based and object-based
attentional control. Current Opinion in Neurobiology, 13, 187–193.

Yoncheva, Y. N., Zevin, J. D., Maurer, U., & McCandliss, B. D. (2010). Auditory selective at­
tention to speech modulates activity in the visual word form area. Cerebral Cortex, 20,
622–632.

Yoshiura, T., Zhong, J., Shibata, D. K., Kwok, W. E., Shrier, D. A., & Numaguchi, Y. (1999).
Functional MRI study of auditory and visual oddball tasks. NeuroReport, 10, 1683–1688.

Zatorre, R. J., Mondor, T. A., & Evans, A. C. (1999). Auditory attention to space and fre­
quency activates similar cerebral systems. NeuroImage, 10, 544–554.

Claude Alain

Claude Alain is Senior Scientist and Assistant Director, Rotman Research Institute,
Baycrest Centre; Professor, Department of Psychology & Institute of Medical
Sciences, University of Toronto.

Stephen R. Arnott

Stephen R. Arnott, Rotman Research Institute, Baycrest Centre for Geriatric Care.

Benjamin J. Dyson

Benjamin J. Dyson, Department of Psychology, Ryerson University.


Spatial Attention  
Jeffrey R. Nicol
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn

Print Publication Date: Dec 2013 Subject: Psychology, Cognitive Neuroscience


Online Publication Date: Dec 2013 DOI: 10.1093/oxfordhb/9780199988693.013.0012

Abstract and Keywords

Spatial attention facilitates adaptive interaction in the environment by enhancing the per­
ceptual processing associated with selected locations or objects and suppressing process­
ing associated with nonselected stimuli. This chapter presents spatial attention as a di­
chotomy of visual processes, mechanisms, and neural networks. Specifically, distinctions
are made between space- and object-based models of attentional selection, overt and
covert orienting, reflexive and voluntary control of the allocation of resources, and dorsal
and ventral frontoparietal neural networks. The effects of spatial attention on neurophysi­
ological activity in subcortical and visual cortical areas are reviewed, as are the findings
from behavioral studies examining the effects of spatial attention on early (i.e., low-level)
visual perception.

Keywords: space-based selection, object-based selection, reflexive control, voluntary control, covert orienting,
overt orienting, eye movements, neural networks of attention, visual perception

When our eyes open to view the environment that surrounds us, we are immediately con­
fronted with a massive amount of information. This situation presents a problem to the vi­
sual system because it is limited in the amount of information that it can process, and
hence be aware of, at one time (e.g., Broadbent, 1958). Adaptive interaction with the en­
vironment, therefore, requires a neural mechanism that effectively facilitates the selec­
tion of behaviorally relevant information for enhanced processing while simultaneously
suppressing the processing associated with irrelevant information. This mechanism,
which permits differential processing of spatially and temporally contiguous sources of in­
formation, is referred to as selective attention (e.g., Johnston & Dark, 1986), and it broad­
ly describes “those processes that enable an observer to recruit resources for processing
selected aspects of the retinal image more fully than nonselected aspects” (Palmer, 1999,
p. 532). Although a considerable body of research has demonstrated effects of spatial at­
tention in the auditory sensory modality (e.g., Spence & Driver, 1994) and cross-modally
(see Spence & Driver, 2004), the majority of research has focused on the visual modality.
Accordingly, the present chapter represents a selective review of seminal and recent
studies that have examined visual spatial attention. Spatial attention is defined here as
the focusing of limited-capacity visual processing resources to a specific location in space
for the purpose of selective perceptual enhancement of the information at that location.

Theories, Models, and Metaphors of Spatial Attention
Given that mechanisms of spatial attention are charged with the critical task of selecting
behaviorally relevant information for further processing, (p. 238) the question becomes:
What does spatial attention select? In other words: What are the units of selection? For
decades, researchers of attentional selection have been largely polarized into two fac­
tions. Broadly speaking, there are those that support space-based models and those that
support object-based models of attentional selection.

Space-Based Models

Space-based theories of attention posit that space is the primary unit of selection. Gener­
al support for space-based approaches comes from studies showing that performance is
negatively affected when a target stimulus is flanked by spatially contiguous distractors
(i.e., within 1 degree of visual angle), but not when distractors are more disparate from
the target (Eriksen & Eriksen, 1974; Eriksen & Hoffman, 1972). Common space-based
theories use metaphors such as a spotlight, a zoom lens, or a gradient to describe the
process of attentional selection. According to the spotlight account (e.g., Posner, 1978;
Posner, Snyder, & Davidson, 1980), attention operates like a beam of light that enhances
processing of stimuli that occupy regions that fall within its boundary, and inhibits, or at­
tenuates, processing of stimuli at locations outside its boundary. The spotlight metaphor
of attention was developed to account for findings that emerged from Posner and col­
leagues’ (Posner, 1978; Posner, Nissen, & Ogden, 1978; Posner et al., 1980) seminal work
using the spatial cueing task.
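
The flanker and cueing studies above specify stimulus separations in degrees of visual angle (e.g., distractors within 1 degree of a target). For readers implementing such displays, the standard conversion from stimulus size and viewing distance can be sketched as follows; the 1 cm and 57 cm values are illustrative choices, not taken from the chapter:

```python
import math

def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle subtended by a stimulus of a given size at a given
    viewing distance: theta = 2 * atan(size / (2 * distance))."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

# A 1 cm stimulus viewed from 57 cm subtends approximately 1 degree,
# which is why 57 cm is a common viewing distance in attention research.
print(round(visual_angle_deg(1.0, 57.0), 2))
```

At a 57 cm viewing distance, centimeters of screen size and degrees of visual angle are nearly interchangeable, which simplifies translating the spacing manipulations described above into a physical display.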

In a typical spatial cueing task, observers fixate on a centrally presented stimulus and are
presented with a cue that directs attention to a specific spatial location. Following the
cue, a target appears, and observers are required to either detect the onset of the target
as quickly as possible or perform a target discrimination as quickly and as accurately as
possible. Generally, three different cue types are used: valid cues that indicate where the
target will appear, invalid cues that direct attention away from the target location, and
neutral cues that alert the observer to the ensuing target onset in the absence of direc­
tional information (see Jonides & Mack, 1984, and Wright, Richard, & McDonald, 1995, in
regard to methodological issues surrounding neutral cues). Cost–benefit analysis of reaction time (RT) (i.e., the perceptual cost associated with invalid cues is determined by subtracting mean neutral RT from mean invalid RT, and the perceptual benefit associated with valid cues is determined by subtracting mean valid RT from mean neutral RT) reveals that, relative to
the neutral cueing condition, targets are detected faster when they appear at the cued lo­
cation and slower when they appear at the uncued location (Posner et al., 1978). Percep­
tual discriminations are also more accurate for targets that appear at the cued than un­
cued location (Posner et al., 1980). The spotlight account of the data forwarded by Posner
and colleagues contends that detection was more efficient at the cued location because
attentional resources were allocated to that location before the onset of the target,
whereas detection was less efficient at the uncued location because of the additional time
it took to shift attention to the actual location of the target.
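
The cost-benefit analysis described above reduces to two subtractions against the neutral baseline. A minimal sketch follows; the RT values are invented for illustration, and only the subtraction logic comes from the chapter:

```python
def cueing_effects(mean_rt_valid: float, mean_rt_neutral: float,
                   mean_rt_invalid: float) -> tuple:
    """Cost-benefit analysis of spatial cueing RTs (after Posner et al., 1978).
    Benefit: how much faster targets at validly cued locations are detected
    than neutral-cue targets; cost: how much slower invalid-cue targets are."""
    benefit = mean_rt_neutral - mean_rt_valid
    cost = mean_rt_invalid - mean_rt_neutral
    return benefit, cost

# Hypothetical mean RTs in milliseconds:
benefit, cost = cueing_effects(mean_rt_valid=250, mean_rt_neutral=280,
                               mean_rt_invalid=320)
print(benefit, cost)  # 30 ms benefit, 40 ms cost
```

Positive values on both measures correspond to the pattern Posner and colleagues reported: faster detection at cued locations and slower detection at uncued locations, relative to neutral cues.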

An important aspect of Posner and colleagues’ spotlight account is the notion that spatial
attention must disengage from its origin before it can shift to another location (e.g., Pos­
ner & Petersen, 1990; Posner, Petersen, Fox, & Raichle, 1988). In fact, they propose that
shifts of spatial attention involve three distinct processes, each of which is associated
with a specific neural substrate. First, an area at the border of the temporal and parietal lobes called the temporoparietal junction (TPJ) activates to permit disengagement of spatial attention from the
current orientation. Next, a midbrain structure called the superior colliculus determines
the target location and initiates the attentional shift toward that location. Finally, an area
of the thalamus called the pulvinar activates to facilitate engagement of attention at the
new spatial location. Perhaps the most convincing evidence in support of the disengage
function comes from research demonstrating that perceptual discriminations improve
when the central fixation stimulus is removed just before the onset of peripherally pre­
sented targets (e.g., Mackeben & Nakayama, 1993; Pratt & Nghiem, 2000)—a finding re­
ferred to as the “gap effect” (Saslow, 1967).

In contrast to the spotlight metaphor, which assumes that the size of the attentional beam
is fixed, the zoom lens theory (Eriksen & St. James, 1986; Eriksen & Yeh, 1985) suggests
that the spatial extent of attention can be focused or diffuse, depending on the demands
of the task. Support for that claim comes from an experiment by LaBerge (1983) that re­
quired observers to categorize five-letter words, the middle letter of five-letter words, or
the middle letter of five-letter nonwords in a speeded RT task. Within each condition, crit­
ical probe trials were periodically presented. On those trials, observers categorized a
probe stimulus that appeared in one of the five letter positions (+ signs occupied the re­
maining four positions). In the letter conditions, RTs were fastest when the probe ap­
peared in the middle (i.e., attended) position, and slowest (p. 239) when it appeared at the
first and fifth positions (i.e., the data formed a V-shaped RT function). In the word condi­
tion, however, RT was not affected by the position of the probe (i.e., the RT function was
flat). The findings support the zoom lens theory by demonstrating that the focus of atten­
tion can indeed vary in size, in accordance with the demands of the task.

Zoom lens theory also proposes that the resolution of the visual system that is afforded by
spatial attention is determined by the variable scope of the lens: focusing the lens yields
higher resolution within a narrow spatial area, whereas widening the lens yields lower
resolution over a broader spatial area (e.g., Eriksen & Yeh, 1985). Consistent with that
proposal, Eriksen and St. James (1986) showed that RTs to targets increased as a function
of the number of spatial pre-cues presented. In the experiment, between one and four
contiguous locations of an eight-item circular array were spatially cued, followed by the
presentation of the target and distractor items. The results showed that target discrimi­
nations became slower as the number of cued items in the display increased, suggesting

that attentional resources are “watered down” as they are expanded over larger spatial
areas (Eriksen & St. James, 1986, p. 235). Two other findings from that study are also
worth noting: the entire array used in the experiment only subtended 1.5 degrees of visu­
al angle, so the reported effects emerged when attention was quite focused; and the cue-
to-target stimulus onset asynchrony (SOA) manipulation revealed that performance was
asymptotic beyond 100 ms, suggesting that the zoom lens required approximately that
much time to make adjustments within spatial areas of that size.
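
The "watered down" idea can be captured by a toy resource model (our sketch, not the authors' formalism) in which a fixed pool of attentional resources is spread evenly over however many locations the lens covers:

```python
def resolution_per_location(total_resources: float,
                            n_cued_locations: int) -> float:
    """Toy zoom-lens sketch: a fixed resource pool divided evenly over
    the cued locations, so per-location resolution falls as the lens
    widens. Illustrative only, not a fitted model."""
    return total_resources / n_cued_locations

# Widening the lens from 1 to 4 cued locations quarters the resources
# available at any single location, consistent with the slower target
# discriminations Eriksen and St. James (1986) observed for larger cues.
print([resolution_per_location(1.0, n) for n in (1, 2, 3, 4)])
```

The monotonic decline in per-location resources mirrors the monotonic RT increase with the number of cued array positions reported above.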

The space-based theories discussed above assume that the attentional field is indivisible
(e.g., Eriksen & Yeh, 1985; Posner et al., 1980). Several studies, however, have found evi­
dence to the contrary (e.g., Bichot, Cave, & Pashler, 1999; Gobell, Tseng, & Sperling,
2004; Kramer & Hahn, 1995). For example, Kramer and Hahn (1995) presented observers
with displays of two targets with two spatially intervening distractors. On each trial, two
boxes were used to pre-cue the target positions, and observers were required to report
whether the letters that appeared within the boxes were the same or different. To avoid
an attentional capture confound associated with abrupt onsets, targets and distractors
were initially presented as square figure eights, then shortly after the cues were present­
ed, segments of the figure eights were removed to produce letters. On half of the trials,
the distractors primed a same response, and on the other half, they primed a different re­
sponse. The researchers found that both RT and discrimination accuracy were unaffected
by the presence of the intervening distractors and concluded that “attention can be flexi­
bly deployed and maintained on multiple locations” (Kramer & Hahn, 1995, p. 384).

Object-Based Models

The finding that the attentional field can be divided into multiple noncontiguous regions
of space represents a major obstacle for space-based models of attention. According to
object-based models, however, perceptual grouping mechanisms make it possible to di­
vide the attentional field across spatially disparate or partially occluded stimuli (Driver &
Baylis, 1989; Duncan, 1984; Kahneman, Treisman, & Gibbs, 1992; Kramer & Jacobson,
1991). Object-based models assert that attentional selection is determined by the number
of objects that are present in the visual field and emphasize the influence of gestalt
grouping factors on the distribution of attention (Duncan, 1984; Neisser, 1967).

One of the original object-based models contended that attentional selection is a two-
stage process (Neisser, 1967). In the first stage, the visual field is preattentively parsed
into perceptual units (i.e., into objects) according to the gestalt principles of grouping
(e.g., similarity, proximity). Then, in the second stage, the object is analyzed in detail by
focal attention. Proponents of object-based theories of selection generally argue that once
attention is directed to an object, all parts or features of that object are automatically se­
lected regardless of spatial location (e.g., Duncan, 1984; Egly, Rafal, & Driver, 1994; Kah­
neman & Treisman, 1984). As such, all parts of the same object are processed in a paral­
lel fashion, whereas different objects are processed serially (Treisman, Kahneman, &
Burkell, 1983).

In a classic demonstration of object-based attentional selection, Duncan (1984) presented observers with two spatially overlapping objects, each with two attributes: a line that was
either dotted or dashed and tilted to the left or right, and a box that was either large or
small, with a gap on the left or right side. Observers were told which two attributes they
would need to report before each trial. The results showed that discriminations were less
accurate when observers reported two attributes of different objects (i.e., one attribute of
the line and one attribute of the box) than two attributes of the same object (i.e., both at­
tributes of the line or the (p. 240) box). Given that the objects appeared in the same loca­
tion, a space-based account of attentional selection would predict that performance
would not differ across the same- and different-object conditions. Thus, the results clearly
support an object-based theory of spatial attention.

Object-based attention is also supported by studies showing that attention automatically spreads across selected objects. In one such study, Egly et al. (1994) presented observers
with two rectangular placeholders oriented lengthwise on either side of fixation (or ori­
ented sideways above and below fixation). In each trial, one rectangle was cued by a brief
bolding of one end, followed by the presentation of a target disk at the cued or uncued
end of the cued rectangle, or at the end of the uncued rectangle that was adjacent to the
cued location on the cued rectangle. Critically, the uncued end of the cued rectangle and
the uncued end of the uncued rectangle were equidistant from the cued location. Despite
equal spacing across the two uncued conditions, observers were faster to respond to tar­
gets that appeared at the uncued end of the cued object than at the uncued end of the un­
cued object, suggesting that attention had automatically spread across the cued object
(Egly et al., 1994).

Control of Spatial Attention


Spatial attention can be controlled reflexively by external stimuli that capture attention,
or voluntarily by internally generated goals (cf. Klein, 2004). These two types of visual
orienting are typically referred to as exogenously driven and endogenously driven
attentional control, respectively (Posner, 1980). Similarly, a stimulus-driven shift of atten­
tion is said to be “pulled” to a peripheral location by the onset of a salient stimulus, and a
goal-directed shift of attention is said to be “pushed” to a peripheral location following
the cognitive interpretation of a centrally presented cue (e.g., Corbetta & Shul­
man, 2002).

One of the earliest studies to comprehensively examine the idea that both automatic and
voluntary mechanisms can guide the allocation of spatial attention resources was con­
ducted by Jonides (1981). He tested the independence of these two mechanisms by pre­
senting observers with either a centrally presented directional cue (i.e., an arrow at fixa­
tion) or a peripherally presented location cue (i.e., an arrow adjacent to a potential target
position in the periphery) and then asking them to perform a visual search for one of two
targets in a circular array of eight letters. Across three experiments, Jonides showed that
RTs to targets following central cues, but not peripheral cues, are slowed when mental re­

sources are simultaneously consumed in a working memory task (i.e., holding a sequence
of numbers in mind); cueing effects persist when observers are asked to ignore peripher­
al cues, but not when they are asked to ignore central cues; and cueing effects are modu­
lated as a function of the relative proportion of central cues that observers expect to be
presented with, but not as a function of the relative proportion of peripheral cues that
they expect to be presented with. Together, the findings were taken as evidence that ex­
ogenously and endogenously driven spatial attention, activated by peripheral and central
cues respectively, “differ in the extent to which they engage attention
automatically” (Jonides, 1981, p. 200).

Although concluding that reflexive or voluntary control processes can independently guide the allocation of spatial attention, Jonides (1981) nevertheless assumed that the
modes of orienting were parts of the same unitary attentional mechanism. According to
that account, exogenously and endogenously driven attentional controls differ simply in
the process by which they propel spatial attention to a specific spatial location or object
in a region of space. Other researchers, however, have argued that automatic and volun­
tary orienting are distinct attentional mechanisms (cf. Klein & Shore, 2000).
The two-mechanism model of spatial attention put forward by Nakayama and Mackeben
(1989) consists of a relatively primitive fast-acting transient component that is guided by
stimulus-driven, or bottom-up, processes and affects perception at early stages of cortical
processing, and a more recently evolved sustained component that is guided by goal-di­
rected, or top-down, processes. According to the model, the transient component is re­
sponsible for generating rapid attentional shifts to a cued location, whereas the sustained
component is needed to hold attention at that location (Nakayama & Mackeben, 1989). In
an empirical test of their two-mechanism model, Nakayama and Mackeben (1989)
instructed observers to perform either a simple search (i.e., orientation distinguished the
target from the distractors) or a conjunctive search (i.e., target orientation and color dis­
tinguished it from distractors) for a target amid an eight-by-eight display array. In the
sustained attention condition, observers’ attention was directed to the valid target loca­
tion by a location cue that remained visible for the duration of each trial and appeared at
the same position in the array across all trials. Sustained (p. 241) attention was also inves­
tigated by informing observers of the valid target location without presenting them with a
physical cue. In both scenarios, performance on cued (or informed) trials was compared
with performance when no cue was presented. In the transient attention condition, spa­
tial attention was directed to the valid target location by a spatially unpredictable loca­
tion cue with an onset just before the display of the search array. The results showed that
sustained attentional control facilitated performance in the conjunctive search task, but
not in the simple search task (also see Treisman & Gelade, 1980) and transient attention­
al control facilitated performance when the cue preceded the display array by 50 to 200
ms, but impaired performance at SOAs longer than 200 ms.

A two-mechanism model of spatial attention was also supported by the results of a study
showing that reflexive and voluntary orientations have different time courses with re­
spect to their respective facilitative and inhibitory effects on perception. Muller and Rab­
bitt (1989) presented observers with a central or a peripheral cue that directed their at­
tention to one of four target locations. Following the cue, a target stimulus appeared at
the cued or an uncued location, and observers were required to discriminate whether it
was the same or different from a previously presented stimulus. Peripheral cues, which
activate reflexive orienting, produced maximal facilitation of target discriminations at the
cued location at short (100–175 ms) SOAs and also improved performance at the uncued
location at longer (400–725 ms) SOAs. In contrast, central cues, which activate voluntary
orienting, facilitated performance at the cued location maximally at SOAs between 275
and 400 ms. Based on these results, the researchers proposed that spatial orienting is
composed of a fast-acting mechanism that briefly facilitates, but then inhibits, attentional
processing at peripherally cued locations, and a slower-acting mechanism, activated only
by central cues, that facilitates attentional processing for a more sustained interval
(Muller & Rabbitt, 1989).

It is noteworthy that the results from both studies that were just reviewed are consistent
with the findings from an earlier study by Posner and Cohen (1984) showing that an inhi­
bition of return (IOR) effect occurs when spatial attention is directed exogenously in re­
sponse to peripheral cues, but not when it is directed endogenously by central cues. In
the study, observers were presented with three placeholder boxes along the horizontal
meridian of the screen. In the peripheral cueing procedure, one of the peripheral boxes brightened briefly, followed by a brief brightening of the center box (i.e., to reorient spatial
attention away from the initially cued location). The central cueing procedure was similar,
except an arrow stimulus appeared in the center box, followed by the brightening of that
box 600 ms later (if no target had appeared in a peripheral box 450 ms following the on­
set of the central cue). In line with previous research (e.g., Posner et al., 1978, 1980), pe­
ripheral cues and central cues produced reliable cueing effects: target detection was
most efficient when the target appeared at the cued location. However, although the facil­
itative effect of the cue persisted across the entire range of cue–target SOAs when cen­
tral cues were used, target detection was actually inhibited at the cued location at SOAs
beyond 200 ms when peripheral cues were used. Posner and Cohen (1984) called the ef­
fect IOR and suggested that it reflected an evolved mechanism that promoted efficient vi­
sual searching of the environment (i.e., by assigning inhibitory tags to previously exam­
ined spatial locations). The finding that the IOR effect occurs when attention is controlled
exogenously, but not when it is under endogenous control (but see Lupianez et al., 2004
for an exception), clearly supports a two-mechanism model of spatial attention.
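
The biphasic time course for peripheral cues can be summarized as a simple piecewise rule. This is a qualitative sketch only; the 200 ms boundary is the approximate crossover reported above, not a fitted parameter:

```python
def exogenous_cueing_effect(soa_ms: float) -> str:
    """Toy summary of the biphasic pattern for peripheral (exogenous) cues
    reported by Posner and Cohen (1984): facilitation at the cued location
    at short cue-target SOAs, inhibition of return (IOR) beyond roughly
    200 ms. Qualitative sketch, not a model of the underlying mechanism."""
    if soa_ms < 200:
        return "facilitation"  # faster detection at the cued location
    return "inhibition"        # IOR: slower detection at the cued location

print(exogenous_cueing_effect(100), exogenous_cueing_effect(500))
```

Central (endogenous) cues would not follow this rule: as described above, their facilitative effect persisted across the full range of SOAs tested.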

Perhaps the strongest evidence supporting the distinction between exogenous and en­
dogenous attentional orienting comes from Klein and colleagues’ research (e.g., Briand &
Klein, 1987; Klein, 1994; Klein & Hansen, 1990; also see Funes, Lupianez, & Milliken,
2007, for a double dissociation using a spatial Stroop task) showing a double dissocia­
tion between the two components of spatial attention. In one study, Briand and Klein
(1987) dissociated the two mechanisms by examining the effect of spatial cueing on the
likelihood that observers would make illusory conjunction errors. Spatial attention was di­
rected to the left or right of fixation by peripheral or central cues. In each trial, a pair of
letters appeared at the cued or uncued location and observers were required to report
whether the target letter R was present or absent. The critical comparison con­
cerned the difference in performance when the target was absent and the letter pair pro­
moted an illusory conjunction of the target (i.e., PQ), as opposed to when the target was
absent and the letter pair did not promote an illusory conjunction of the target (i.e., PB).
The conditions were referred to as conjunction and feature search conditions, respective­
ly. The results revealed that search type interacted with spatial attention (p. 242) when ex­
ogenous peripheral cues were used, but not when endogenous central cues were used.
Thus, the possibility of making an illusory conjunction error impaired performance when
spatial attention was controlled exogenously, but not when it was controlled endogenous­
ly. Briand and Klein (1987) concluded that central and peripheral cues engage different
attentional systems, and that feature integration processes are only performed by the ex­
ogenously driven component of spatial attention.

In another study, Klein (1994) dissociated the two mechanisms of spatial attention by ex­
amining the effect of spatial cueing on nonspatial (i.e., perceptual motor) expectancies.
Covert spatial attention was directed by peripheral or central cues to a box on the left or
right side of fixation, and on each trial the cued or uncued box either increased or de­
creased in size. For half the observers an increase in target size was far more likely to oc­
cur, and for the other half a decrease in target size was far more likely to occur. The re­
sults indicated that nonspatial expectancies interacted with spatial attention when en­
dogenous central cues were used, but not when exogenous peripheral cues were used.
Specifically, performance was impaired by the occurrence of an unexpected event at the
uncued location when central cues were employed, whereas the effect of exogenous at­
tention on performance was the same for expected and unexpected events. Klein (1994)
concluded that when taken together with evidence from the Briand and Klein (1987)
study, their results demonstrated that exogenous and endogenous control recruit qualitatively different attentional mechanisms. In other words, “it is not just
the vehicle, but the passenger, that might differ with endogenous versus exogenous con­
trol” (Klein, 1994, p. 169).

Eye Movements and Spatial Attention


Orienting of spatial attention to specific locations or objects can be performed covertly or
overtly. Covert shifts of spatial attention involve internal movement of the “mind’s eye,”
whereas overt shifts of spatial attention involve observable eye movements. A consider­
able amount of research has been conducted in an attempt to understand the nature of
the relationship between eye movements and spatial attention (e.g., Goldberg & Wurtz,
1972; Klein, 1994; Posner, 1980; Remington, 1980; Rizzolatti, Riggio, Dascola, & Umilta,
1987; Shepherd, Findlay, & Hockey, 1986). Specifically, a number of studies have exam­
ined whether the processes are controlled by the same mechanism, whether they are
completely independent mechanisms, or whether they interact with each other as interde­
pendent mechanisms.

Early single-cell recording research showed that programming an eye movement caused
increased firing rates in cells in the superior colliculus whose receptive fields were at the

target location, well before the actual eye movement had begun (Goldberg & Wurtz, 1972).
These findings supported the notion that a relationship exists between eye movements
and spatial attention and that the superior colliculus may be the neural substrate that
manages that relationship. However, since then, several studies have demonstrated that
the perceptual costs and benefits produced by spatial cueing manipulations can occur in
the absence of eye movements (e.g., Bashinski & Bacharach, 1980; Eriksen & Hoffman,
1972; Posner et al., 1978). The fact that the effects of spatial attention can occur in the
absence of eye movements suggests that attention shifts and eye movements are mediat­
ed by different neural structures. In fact, Posner (1980) concluded that when considered
together, the behavioral, electrophysiological, and single-cell recording findings are evi­
dence that “eliminates the idea that attention and eye movements are identical
systems” (p. 13).

Although spatial attention and eye movements are not identical (Posner, 1980), the oculo­
motor readiness hypothesis (OMRH) (Klein, 1980) and premotor theory (Rizzolatti, Riggio, Dascola, & Umilta, 1987) nevertheless propose that they are two processes of a
unitary mechanism. Both theories contend that endogenously controlled covert shifts of
attention prepare and facilitate the execution of eye movements to a target location. Or
as Rizzolatti et al. (1987) simply put it: Eye movements follow attention shifts. In this
view, the preparation of an eye movement to a target location is the endogenous orienting
mechanism (Klein, 2004). Thus, covert endogenous spatial attention is simply an intend­
ed, but unexecuted, eye movement (Rizzolatti et al., 1987). Critically, the theories predict
that covert shifts of spatial attention facilitate eye movements when they are in the same
direction, and that sensory processing should be enhanced at the location of a pro­
grammed eye movement. Despite representing the dominant view concerning the nature
of the relationship between spatial attention and eye movements (Palmer, 1999), the re­
search has yielded equivocal results concerning the two critical predictions that emerge
from the OMRH and premotor theory.

Klein (1980; see also Hunt & Kingstone, 2003; Klein & Pontefract, 1994) conducted two
(p. 243) experiments to test the predictions of his OMRH. In one experiment, observers

were presented with a central cue, and then, depending on the type of trial, they either
made an eye movement or a detection response to a target at the cued or uncued location
(i.e., the type of response was determined by the physical characteristics of the target).
Most trials required a detection response, and performance in these trials suggested that
the participants had indeed shifted their attention in response to the cue. However, in
contrast to what would be predicted by the OMRH and premotor theory (Rizzolatti et al.,
1987), eye movements to a target at the cued location were no faster than they were to a
target at the uncued location (Klein, 1980). That result is not consistent with the OMRH
prediction that covert shifts of spatial attention facilitate eye movements when they are in
the same direction. In his second experiment, Klein (1980) instructed some observers to
perform a leftward eye movement on every trial and others to perform a rightward eye
movement on every trial, regardless of whether the target appeared on the left or right of
central fixation. Critically, on some trials, instead of executing an eye movement, ob­
servers simply needed to make a detection response to the target. Again, the results were
inconsistent with the OMRH and premotor theory. Instead of RTs being facilitated when
detection targets appeared in the location of the prepared eye movement, as the OMRH
and premotor theory would predict, RTs were unaffected by the spatial compatibility of
the target and the direction of the eye movement. These findings and others (e.g., Hunt &
Kingstone, 2003; Posner, 1980; Remington, 1980) refute the notion that covert spatial at­
tention shifts are unexecuted eye movements, and rather suggest that covert spatial at­
tention and overt eye movements are independent mechanisms (Klein, 1980; Klein & Pon­
tefract, 1994).

The extant research does suggest that covert spatial attention and overt eye movements
are interdependent. For example, the results of a study conducted by Shepherd et al.
(1986) showed that spatial attention and eye movements form an asymmetrical relation­
ship, such that an attention shift does not require a corresponding eye movement, but an
eye movement necessarily involves a shift in the focus of attention. They examined covert
and overt attentional orienting independently by manipulating the validity of a central
cue and by instructing observers to either prepare and execute, or prepare and inhibit, an
eye movement to the target location, respectively. In a blocked design, observers were re­
quired to remain fixated (i.e., fixate condition) or make an eye movement (i.e., move con­
dition) following the presentation of either a valid (80 percent), uninformative (50 per­
cent), or invalid (20 percent) central cue, and then respond to the detection of a target at
the cued or uncued location. Not surprisingly, the results showed a significant reduction
in detection RT when the move condition was coupled with the valid spatial cue. Interest­
ingly, a benefit to RT was also found in the move condition when the cue was invalid, sug­
gesting that eye movements have a more dominant effect on performance than endoge­
nously controlled attention (Shepherd et al., 1986). Most germane to the OMRH (Klein,
1980) and premotor theory (Rizzolatti et al., 1987), however, was the large benefit to RT
that was found in the move condition when the cue was uninformative. Accordingly, the
authors concluded that making an eye movement “involves some allocation of attention to
the target position before the movement starts” (Shepherd et al., 1986, p. 486). This
demonstration of the interdependence between spatial attention and eye movements is at
least in partial support of the OMRH (Klein, 1980) and premotor theory (Rizzolatti et al.,
1987).

The findings from a more recent study conducted by Hoffman and Subramaniam (1995)
also support the OMRH (Klein, 1980) and premotor theory (Rizzolatti et al., 1987) by
showing that a close relationship exists between spatial attention and eye movements. In
their first experiment, observers were presented with four placeholders around fixation
(i.e., above and below, and on either side) followed by an uninformative central cue indi­
cating the placeholder toward which they should prepare an eye movement (but providing
no prediction about the target location). Shortly after the directional cue was removed, a
tone was presented to cue observers to initiate the planned eye movement. When the
tone was presented or immediately afterward, a letter was presented in each placeholder
(i.e., three distractors and one target), and observers were required to perform a two-alternative forced choice regarding the identity of the target. The results revealed that tar­
get discriminations were approximately 20 percent better when the target appeared in
the location of the intended eye movement compared with when the target appeared at one of
the uncued locations. Because this finding indicates that observers attended to the loca­
tion of the intended eye movement, even though they were aware that the cue did not
validly predict the target location, the researchers concluded that spatial attention
and eye movements are indeed (p. 244) related, such that eye movements to a given tar­
get location are preceded by a shift of spatial attention to that location (Hoffman &
Subramaniam, 1995).

In their second experiment, Hoffman and Subramaniam (1995) dissociated spatial atten­
tion and eye movements by requiring observers to prepare eye movements in the same di­
rection on each trial, before presenting them with a valid central cue. Thus, on some tri­
als, the intended eye movement and the target location matched, and on other trials they
mismatched. The results showed that regardless of whether the cue directed attention to
the target location or not (i.e., whether the cue was valid or not), target discriminations
were highly accurate when the target location and the intended eye movement matched,
and target discriminations were poor when the target location and the intended eye
movement mismatched. The findings from this experiment indicate that spatial attention
cannot be easily directed to a different location than an intended eye movement and sug­
gest that an “obligatory” relationship exists between spatial attention and eye movements
such that covert spatial orienting precedes overt orienting (Hoffman & Subramaniam, 1995). Thus, the results of both experiments reported by Hoffman and Subramaniam
(1995; see also Shepherd et al., 1986) support the OMRH (Klein, 1980) and premotor the­
ory (Rizzolatti et al., 1987).

Taken together, the studies reviewed above indicate that spatial attention and eye move­
ments are interdependent mechanisms: Although a shift of attention can be made in the
absence of an eye movement, an eye movement cannot be executed without a preceding
shift of spatial attention. Moreover, because spatial attention appears to play a critical
role in initiating eye movements, rather than vice versa, it can be concluded that “atten­
tion is the primary mechanism of visual selection, with eye movements playing an impor­
tant but secondary role” (Palmer, 1999, p. 570).

Neural Sources of Spatial Attention


The advent of modern neuroimaging techniques has afforded researchers great insight in­
to the effects of spatial attention on neural activity in humans (e.g., Corbetta, Miezin,
Dobmeyer, Shulman, & Petersen, 1993; Desimone & Duncan, 1995; Kanwisher & Wojciu­
lik, 2000). Subcortical and cortical sources have been implicated in spatial attention.

Subcortical Networks of Spatial Attention

One subcortical substrate, the superior colliculus, plays a key role in reflexive shifts of at­
tention (it is also involved in localization of stimuli and eye movements) (Wright & Ward,
2008). Another area, the frontal eye field (FEF), located in frontal cortex, is particularly important for executing voluntary shifts of attention (Paus, 1996). The critical connection between the FEF
and attention shifts would be expected given the interdependent relationship between
spatial attention and eye movements that was illustrated in the previous section. Some re­
searchers suggest that the superior colliculus and the FEF form a network that interacts to control the allocation of spatial attention. Specifically, it has been suggested that the FEF may serve to inhibit reflexive attention shifts generated by the superior colliculus (cf. Wright & Ward, 2008).

Areas of the thalamus are also important subcortical regions associated with spatial at­
tention. One area, called the pulvinar nucleus, is critically involved in covert spatial ori­
enting (e.g., Robinson & Petersen, 1992). In a single-cell recording study, Petersen, Robin­
son, and Morris (1987) measured activity of neurons in the dorsomedial part of the lateral
pulvinar (Pdm) of trained rhesus monkeys while they fixated centrally and made speeded
responses to targets appearing at peripherally cued or uncued locations. First, the re­
searchers confirmed that the Pdm is related to spatial attention, and is independent of
eye movements, by observing enhanced activity in that area when the monkeys covertly
attended to the target location. Having established that the Pdm is related to spatial at­
tention, next Petersen et al. examined changes in attentional performance that resulted
from pharmacological alteration of the Pdm by GABA-related drugs (i.e., muscimol, a GA­
BA-agonist, and bicuculline, a GABA-antagonist). The results showed that pharmacologi­
cal alteration of this part of the brain did in fact alter performance in the spatial cueing
task: Injections of muscimol, which increases inhibition by increasing GABA effectiveness,
impaired the monkey’s ability to execute contralateral attention shifts, and injections of
bicuculline, which decreases inhibition by decreasing GABA effectiveness, facilitated the
monkey’s ability to shift attention to the contralateral field. Given that these modulations
of attentional performance were produced by drug injections to the Pdm, it is reasonable
to conclude that the Pdm is implicated in spatial attention (Petersen et al., 1987). A study
by O’Connor, Fukui, Pinsk, and Kastner (2002) that will be reviewed later in the chapter
shows that activity in another area of the thalamus, called the lateral geniculate nucleus
(LGN), is also modulated by spatial attention.

(p. 245) Cortical Networks of Spatial Attention

In addition to the subcortical sources discussed above, multiple cortical sources are criti­
cally involved in the allocation of spatial attention resources. A number of different atten­
tional control processes involve various areas of the parietal cortex (e.g., Corbetta, 1998;
Kanwisher & Wojciulik, 2000). For example, Yantis, Schwarzbach, Serences, et al. (2002)
used event-related functional magnetic resonance imaging (fMRI) to examine changes in
brain activity that occurred when observers made covert attention shifts between two pe­
ripheral target locations. In particular, the researchers were interested in determining
whether the activation of parietal cortex is associated with transient or sustained atten­
tional control. The scans revealed that while shifts of spatial attention produced sustained contralateral activation in extrastriate cortex, they produced only transient increases in activation in the posterior parietal cortex (Yantis et al., 2002). Accordingly, the authors concluded that “activation of the parietal cortex is associated with a discrete signal to shift spatial attention, and is not the source of a signal to continuously maintain the current attentive state” (Yantis et al., 2002, p. 995).

Activation of the parietal cortex is associated with top-down attentional control that is
specifically related to the processing of the spatial pre-cue. That was demonstrated in an
event-related fMRI study that Hopfinger, Buonocore, and Mangun (2000) designed to dis­
sociate neural activity associated with cue-related attentional control, from the selective
sensory processing associated with target perception. Valid central cues were used to di­
rect spatial attention to one of the black-and-white checkerboard targets that were pre­
sented on either side of fixation. The task required observers to covertly attend to the
cued checkerboard and report whether or not it contained some gray checks. The fMRI
scans revealed that the cues activated a network for voluntary attentional control com­
prising the inferior parietal (particularly the intraparietal sulcus [IPS]), superior tempo­
ral, and superior frontal cortices. Moreover, the scans showed contralateral cue-related
activation in areas of extrastriate cortex that represented the spatial location of the ensu­
ing target. Taken together, these findings indicate that a neural network comprising areas
of parietal, temporal, and frontal cortex is associated with top-down attentional control,
and that this network in turn modulates activity in areas of visual cortex where the target
is expected to appear.

Although one area of the parietal cortex, the IPS, is involved in generating and sustaining voluntary attention toward a cued location in the absence of sensory stimulation (Friedrich, Egly, Rafal, & Beck, 1998; Hopfinger et al., 2000; Yantis et al., 2002), another area, the temporoparietal junction (TPJ), is activated by unexpected stimulus onsets, particularly at unattended locations
(e.g., Serences et al., 2005; Shulman et al., 2003). Thus, the neuroimaging evidence from
the studies reviewed above indicates that several distinct areas of the parietal cortex are
critically involved in a number of components of spatial attention (Serences et al., 2005,
p. 1000).

Abundant findings from neuroimaging research also indicate that the parietal cortex in­
terconnects with the frontal cortex to form a complex cortical network for spatial atten­
tion (e.g., Corbetta, 1998; Corbetta & Shulman, 2002; Kastner & Ungerleider, 2000).
Positron emission tomography (PET) was used in a classic study by Corbetta et al. (1993)
to investigate the brain areas involved in the voluntary orienting of spatial attention. Spa­
tial attention was directed endogenously by informative central cues and by instructions
to covertly shift attention to the most probable target location. The behavioral findings
showed the expected pattern of costs and benefits to RT (i.e., targets were detected
faster at cued, and slower at uncued, locations), confirming the effectiveness of the cue in
orienting attention. The PET scans revealed significant activation of superior (i.e., dorsal)
areas in both the parietal and frontal cortex during performance of the spatial cueing
task.
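The cost-benefit logic of such cueing paradigms is easy to make concrete. The sketch below (in Python, with fabricated RTs and a hypothetical `cueing_effects` helper, not code from any study cited here) computes the benefit as neutral minus valid mean RT and the cost as invalid minus neutral:

```python
from statistics import mean

def cueing_effects(valid_rts, neutral_rts, invalid_rts):
    """Cost-benefit summary of a spatial cueing task (RTs in ms).

    Benefit: how much faster responses are at the cued location than
    under a neutral cue; cost: how much slower at the uncued location.
    """
    v, n, i = mean(valid_rts), mean(neutral_rts), mean(invalid_rts)
    return {"benefit_ms": n - v, "cost_ms": i - n, "validity_effect_ms": i - v}

# Fabricated RTs showing the expected pattern (valid < neutral < invalid):
effects = cueing_effects(valid_rts=[310, 305, 320],
                         neutral_rts=[340, 335, 345],
                         invalid_rts=[380, 375, 385])
```

The neutral baseline is what lets costs and benefits be separated at all; comparing only valid with invalid trials yields a single validity effect that confounds the two.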

Dorsal and Ventral Frontoparietal Networks

The behavioral studies reviewed in the previous section on the control of spatial attention
provide compelling evidence that it is governed by two independent, and perhaps inter­
acting, mechanisms. One mechanism shifts reflexively, provides transient attentional ef­
fects on perception, and is under exogenous (i.e., stimulus-driven) control. The other
mechanism is associated with voluntary shifts, provides sustained attentional effects on
perception, and is under endogenous (i.e., goal-directed) control. Moreover, given the as­
sertion made by researchers such as Klein (1994; Klein & Shore, 2000) that these two at­
tentional mechanisms are specialized to process different types of visual information, one
might reasonably assume that they are each associated with distinct areas of the brain. In
fact, in an influential review of the literature, Corbetta and Shulman (2002) suggested
that spatial attention is (p. 246) associated with two “partially segregated networks of
brain areas that carry out different attentional functions” (p. 201). On one hand, they pro­
pose that control of goal-directed (i.e., voluntary) spatial attention is accomplished in a bi­
lateral neural network that sends top-down signals from parts of the superior frontal cor­
tex (i.e., FEF) to areas of the parietal cortex (i.e., IPS). On the other hand, they propose
that control of stimulus-driven (i.e., reflexive) spatial attention takes place in a right-later­
alized neural network involving temporal-parietal cortex (i.e., TPJ) and inferior frontal
cortex. Corbetta and Shulman (2002) refer to these two streams as the dorsal frontopari­
etal network and ventral frontoparietal network, respectively.

The dorsal and ventral frontoparietal networks play separate roles in the control of atten­
tion. The dorsal network performs attentional processing associated with voluntary con­
trol of attention (e.g., Corbetta et al., 2000; Hopfinger et al., 2000; Yantis et al., 2002),
and the ventral network is more involved in attentional processing associated with the on­
set and detection of salient and behaviorally relevant stimuli in the environment (Corbet­
ta & Shulman, 2002; Marois, Chun, & Gore, 2000). While Corbetta and Shulman (2002)
contend that the two networks are partially segregated, they also posit that the systems
interact. Specifically, they suggest that one of the important functions of the ventral net­
work is to interrupt processing in the dorsal stream when a behaviorally relevant stimu­
lus is detected (Corbetta & Shulman, 2002). This “circuit breaker” role of the ventral
stream would facilitate the disengagement and reorientation of attention (Corbetta &
Shulman, 2002).

Interaction between the dorsal and ventral frontoparietal networks may also be required
for the spatial localization of salient stimuli. The ventral network is specialized to detect
the onset of salient stimuli, but localizing such stimuli in space probably requires assis­
tance from the dorsal network. Consistent with that idea, an influential theory of visual
processing postulated by Ungerleider and Mishkin (1982) contends that the ventral path­
way of the visual system determines “what is out there,” whereas the dorsal pathway of
the visual system determines “where it is” (see also Ungerleider & Mishkin, 1992).

Results from a recent study by Fox, Corbetta, Snyder, Vincent, and Raichle (2006)
provided some support for Corbetta and Shulman’s (2002) contention that the exchange of information between the dorsal and ventral networks is accomplished by interconnections between the right IPS and right TPJ. However, that study also found that the correlation of neural activity between the right IPS and right TPJ was no stronger than it was between several other areas across the two networks (e.g., FEF/TPJ, IPS/VFC), and the authors also conceded that spatial reorienting behaviors have been shown to persist even in patients with IPS lesions (Fox et al., 2006). Thus, the signal from the ventral network must be able to access the dorsal stream in areas other than the IPS.

Neurophysiological Effects of Spatial Attention


Spatially attended stimuli are perceived differently than unattended stimuli. Research has
revealed a number of ways that attention modulates the neural activity that gives rise to
our subjective experience of the visual world. Given the information processing con­
straints of the visual system (e.g., Desimone & Duncan, 1995), attention is needed to
serve as a selection mechanism that actively enhances processing of some stimuli at the
expense of others (e.g., Petersen et al., 1987). In a classic study of spatial attention,
Moran and Desimone (1985) investigated how attentional mechanisms filter (i.e., select)
wanted from unwanted stimuli. They recorded the activity of single cells in visual cortex
of monkeys trained to perform a match-to-sample task. The monkeys covertly attended to
either an effective (i.e., elicits a response from the cell) or ineffective stimulus and were
required to determine whether sequentially presented stimuli at the attended location
were the same or different. When the effective and the ineffective stimuli were both in­
side the cell’s receptive field, the response was determined by the attended stimulus: the
cell responded strongly when the monkey attended to the effective stimulus, and it re­
sponded poorly when the monkey attended to the ineffective stimulus. Thus, when multi­
ple stimuli fall within the receptive field of a single cell, the response is determined by
the attended stimulus. Indeed, in V4 the consequence of ignoring an effective stimulus in­
side the receptive field was a reduction in the cell’s response by more than half (Moran &
Desimone, 1985). In contrast, when the ineffective stimulus was moved outside the cell’s
receptive field, and the effective stimulus was left inside, the cell’s response to the effec­
tive stimulus was the same whether the animal attended to the ineffective stimulus out­
side the cell’s receptive field or attended to the effective stimulus inside the cell’s recep­
tive field. That pattern of data led the researchers to (p. 247) conclude that attentional
mechanisms do not serve to enhance responses to attended stimuli; rather, the neural ba­
sis of spatial attention is to attenuate processing of irrelevant information “as if the re­
ceptive field has contracted around the attended stimulus” (Moran & Desimone, 1985, p.
783).
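One way to summarize this pattern is a toy “biased competition” rule in which a cell’s response to two stimuli inside its receptive field is a weighted average dominated by the attended stimulus. The firing rates and the weight below are illustrative values only, not data or a model from Moran and Desimone (1985):

```python
def rf_response(r_attended, r_ignored, attn_weight=0.9):
    """Toy biased-competition rule: with two stimuli inside one receptive
    field, the response is a weighted average dominated by the attended
    stimulus. attn_weight is a free illustrative parameter, not a fit."""
    return attn_weight * r_attended + (1 - attn_weight) * r_ignored

# Illustrative responses to each stimulus presented alone (spikes/s):
effective, ineffective = 50.0, 5.0

attend_effective = rf_response(effective, ineffective)    # close to 50
attend_ineffective = rf_response(ineffective, effective)  # far below 50
```

With these numbers, ignoring the effective stimulus drops the response to less than half its attended value, mirroring the qualitative pattern reported in V4.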

Mangun and Hillyard (1991) investigated the neural bases of the ubiquitous perceptual
benefits of spatial attention (e.g., more efficient and accurate detections and discrimina­
tions) using event-related potentials (ERPs) (i.e., changes in the electrophysiological ac­
tivity in the brain time-locked to the presentation of an external stimulus). Central cues
were used to direct spatial attention toward or away from subsequently presented periph­
eral targets. Observers detected or made choice discriminations about targets that appeared at covertly attended or unattended locations while the electrical activity from cor­
tical neurons was measured from their scalps. Visual onsets produce predictable early re­
sponses over visual cortex called the P1 and N1 waveform components (e.g., Eason, 1981;
Van Voorhis & Hillyard, 1977). The P1 is the first major positive deflection, occurring be­
tween 70 and 100 ms after the presentation of a visual stimulus, and the N1 is the first
major negative component that is more broadly distributed and occurs about 150 to 200
ms after the presentation of a visual stimulus (Mangun & Hillyard, 1991). The recordings
revealed that the P1 was larger for attended than unattended targets in both the detec­
tion and the discrimination tasks; however, the amplitude of the N1 component only dif­
fered for attended and unattended targets in the discrimination task. These findings led
Mangun and Hillyard (1991; Hillyard & Mangun, 1987) to conclude that spatial attention
facilitates a sensory gain control mechanism that enhances processing associated with
sensory signals emanating from attended locations. Interestingly, the observed attention-
related dissociation between the P1 and N1 waveform components has since been repli­
cated, and has been interpreted as evidence that the costs and benefits produced by spa­
tial cues may reflect the activity of qualitatively different neural mechanisms (cf. Luck,
1995).
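The P1 and N1 are conventionally quantified as mean amplitudes within their respective time windows. A minimal sketch, assuming an epoch sampled at 1,000 Hz beginning at stimulus onset and using a fabricated waveform (the window boundaries follow the component latencies described above):

```python
def window_mean(erp, srate_hz, t0_ms, t1_ms, epoch_start_ms=0.0):
    """Mean amplitude of an epoched ERP within [t0_ms, t1_ms]."""
    i0 = int(round((t0_ms - epoch_start_ms) * srate_hz / 1000))
    i1 = int(round((t1_ms - epoch_start_ms) * srate_hz / 1000))
    seg = erp[i0:i1 + 1]
    return sum(seg) / len(seg)

srate = 1000  # Hz, i.e., one sample per millisecond
erp = [0.0] * 300  # fabricated 300-ms epoch, microvolts
for t in range(70, 101):   # fabricated positive P1-like bump, 70-100 ms
    erp[t] = 2.0
for t in range(150, 201):  # fabricated negative N1-like deflection, 150-200 ms
    erp[t] = -3.0

p1 = window_mean(erp, srate, 70, 100)
n1 = window_mean(erp, srate, 150, 200)
```

An attention effect like the one Mangun and Hillyard reported would then appear as a larger `p1` (and, in discrimination tasks, a larger-magnitude `n1`) for attended than unattended targets.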

Similar attention-related increases in neural activity have been shown using fMRI. In a
study by O’Connor et al. (2002), observers were presented with high- or low-contrast
flickering checkerboards on either side of fixation while they were in the scanner. There
were two viewing conditions in the experiment: In the attend condition, observers covert­
ly attended to the checkerboard on the left or right of fixation and responded when they
detected random changes in target luminance; in the unattended condition, instead of
shifting attention, observers counted letters that were presented at fixation. The results
showed increased fMRI signal change in the LGN and visual cortex in the attended condi­
tion compared with the unattended condition.
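Attention-related modulation of this kind is typically expressed as percent signal change relative to a baseline. A minimal sketch with a fabricated voxel time course (these numbers are illustrative, not O’Connor et al.’s data):

```python
def percent_signal_change(timeseries, baseline_ix, condition_ix):
    """Percent BOLD signal change of a condition relative to baseline scans."""
    base = sum(timeseries[i] for i in baseline_ix) / len(baseline_ix)
    cond = sum(timeseries[i] for i in condition_ix) / len(condition_ix)
    return 100.0 * (cond - base) / base

# Fabricated time course: fixation baseline ~100, attended block ~102,
# unattended block ~101 (arbitrary scanner units).
ts = [100, 100, 100, 102, 102, 102, 100, 100, 100, 101, 101, 101]

attended = percent_signal_change(ts, [0, 1, 2, 6, 7, 8], [3, 4, 5])
unattended = percent_signal_change(ts, [0, 1, 2, 6, 7, 8], [9, 10, 11])
```

The attention effect is then the difference between the two condition estimates for the same stimulus.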

In the same study, O’Connor et al. (2002) also investigated the effect of spatial attention
on processing of nonselected, or unattended, information. They assumed that the spread
of spatial attention would be determined (i.e., constrained) by the relative amount of cog­
nitive effort that was required to perform a perceptual task at fixation. Specifically, be­
cause of the limited resource capacity of the attention system, they predicted that the
amount of processing devoted to an unattended stimulus would be determined by the
amount of resources not consumed by the attended stimulus. To test their prediction, ob­
servers were required to perform either a demanding (high-load) or easy (low-load) task
at fixation while ignoring checkerboard stimuli presented in the periphery. As expected, the
results showed a decrease in activation across the visual cortex in the demanding condi­
tion compared with the easy condition. Thus, activation of the LGN and visual cortex is
enhanced when observers voluntarily attend to a peripheral stimulus, and it is attenuated when the same stimulus is voluntarily ignored (O’Connor et al., 2002). The researchers
concluded that spatial attention facilitates visual perception by “enhancing neural re­
sponses to an attended stimulus relative to those evoked by the same stimulus when ig­
nored” (Kastner, McMains, & Beck, 2009, p. 206).

Effects of Spatial Attention on Early Visual Perception

Abundant research has shown that spatial attention enhances performance in tasks that
are based on low-level visual perception (see Carrasco, 2006, for a review). Behavioral
studies have traditionally investigated the relationship between spatial attention and ear­
ly vision using basic dimensions such as contrast sensitivity, spatial sensitivity, and tempo­
ral sensitivity as indices of early visual processing.

Effects of Spatial Attention on Contrast Sensitivity

Spatial attention increases contrast sensitivity (see Carrasco, 2006 for a review).
Cameron, Tai, and Carrasco (2002) examined the effect of covert spatial attention on con­
trast sensitivity in an orientation discrimination task. Spatial attention was manipulated
by using an informative (i.e., (p. 248) 100 percent valid) peripheral cue that appeared at
one of eight locations in an imaginary circular array around the fixation point, or a neu­
tral cue that appeared at fixation. Following the onset of the cue, observers were briefly
presented with a tilted sine wave grating (i.e., alternating fuzzy black lines in a Gaussian
envelope) at varying stimulus contrasts. The researchers found that observers’ contrast
sensitivity thresholds were lower at the attended than the unattended location. In other
words, the contrast needed for observers to perform the task at a given level of accuracy
was different at attended and unattended locations: They attained a threshold level of
performance in the orientation discrimination task at a lower contrast when targets ap­
peared at the attended than the unattended location.
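A contrast sensitivity threshold of this kind is the contrast at which accuracy reaches a criterion level (here 75 percent correct for a two-alternative task). The sketch below interpolates that point on log contrast from fabricated accuracies; it stands in for the full psychometric-function fit such studies actually use:

```python
import math

def threshold_contrast(contrasts, p_correct, criterion=0.75):
    """Interpolate (on log contrast) the contrast yielding criterion accuracy.
    A simple stand-in for fitting a psychometric function."""
    pts = sorted(zip(contrasts, p_correct))
    for (c0, p0), (c1, p1) in zip(pts, pts[1:]):
        if p0 <= criterion <= p1:
            frac = (criterion - p0) / (p1 - p0)
            return math.exp(math.log(c0) + frac * (math.log(c1) - math.log(c0)))
    raise ValueError("criterion not bracketed by the data")

contrasts = [0.01, 0.02, 0.04, 0.08, 0.16]
attended_acc = [0.52, 0.66, 0.82, 0.95, 0.99]  # fabricated accuracies
neutral_acc = [0.50, 0.55, 0.68, 0.86, 0.97]

t_att = threshold_contrast(contrasts, attended_acc)
t_neu = threshold_contrast(contrasts, neutral_acc)
```

A lower threshold at the attended location (`t_att < t_neu`) is exactly what "attention increases contrast sensitivity" means operationally.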

Similar findings were reported by Pestilli and Carrasco (2005) in a study that manipulated
spatial attention using pure exogenous cues (i.e., peripheral and uninformative). They
presented a peripheral cue (50 percent valid with two locations) or a neutral cue (i.e., at
fixation) followed by the brief presentation of a tilted sine wave grating on either side of
fixation. Shortly after the removal of the target gratings (which were presented at vary­
ing contrasts), a centrally presented stimulus indicated to the observer whether the ori­
entation of the left or right target grating was to be reported. Thus, trials were divided in­
to three equally probable types: On valid trials the peripheral cue and the response cue
were spatially congruent; on invalid trials the peripheral cue and the response cue were
spatially incongruent; and on neutral trials the response cue, which followed the centrally
presented neutral cue, was equally likely to point to the left or right target. This elegant
experimental design permitted the researchers to evaluate the effect of attention on con­
trast sensitivity at cued and uncued locations. The results showed both a benefit and cost
of spatial attention. That is, relative to the neutral condition, contrast sensitivity was en­
hanced at the attended location and was impaired at the unattended location. Pestilli and
Carrasco (2005) concluded that there is a processing tradeoff associated with spatial at­
tention such that the benefit to perception at the attended location means that fewer cor­
tical resources are available for perceptual processing at unattended spatial locations.

Effects of Spatial Attention on Spatial Sensitivity

An extensive amount of research has demonstrated that spatial attention improves spatial
resolution (e.g., Balz & Hock, 1997; Tsal & Shalev, 1996; Yeshurun & Carrasco, 1998;
1999; and see Carrasco & Yeshurun, 2009, for a review). In a study conducted by Yeshu­
run and Carrasco (1999), which used peripheral cues to direct covert spatial attention, it
was shown that performance was better at attended than unattended locations in a vari­
ety of spatial resolution tasks that required observers to either localize a spatial gap, dis­
criminate between dotted and dashed lines, or determine the direction of vernier line dis­
placements (Yeshurun & Carrasco, 1999). In another study (Yeshurun & Carrasco, 1998),
these researchers showed that spatial attention modulates spatial resolution via signal
enhancement (i.e., as opposed to attenuating noise or changing decisional criteria). Ob­
servers performed a two-interval forced-choice texture-segregation task in which they re­
ported whether the first or second display contained a unique texture patch. On peripher­
al cue trials, a cue in one interval validly predicted the location of the target, but not the
interval that contained the target, and a cue in another interval simply appeared at a non­
target location. Neutral cues were physically distinct from the peripheral cues, and they
provided no information concerning where the target would be in either interval. Both pe­
ripheral and neutral cues were presented at a range of eccentricities from the fovea. In­
terestingly, but in line with the researchers’ prediction, performance was better in cued
than neutral trials at all target eccentricities except those at, or immediately adjacent to,
the fovea. According to Yeshurun and Carrasco (1998), this counterintuitive pattern of re­
sults occurred because spatial attention caused the already small spatial filters at the
fovea to become so small that their resolution was beyond what the texture segregation
task required. In other words, by enhancing spatial resolution, attention impaired task
performance at central retinal locations. Justifiably, it was concluded that one way in
which attention enhances spatial resolution is via signal enhancement (Yeshurun & Car­
rasco, 1998).

Effects of Spatial Attention on Temporal Sensitivity

Research has also revealed counterintuitive effects of spatial attention on temporal sensi­
tivity. Yeshurun and Levy (2003) investigated the effect of spatial attention on temporal
sensitivity using a temporal gap detection task. Observers judged whether a briefly pre­
sented target disc was on continuously, or contained a brief offset (i.e., a temporal gap).
Targets appeared following valid peripheral cues or (p. 249) after a physically distinct neu­
tral cue that diffused spatial attention across the entire horizontal meridian. The results
indicated that temporal sensitivity was actually worse on valid peripheral cue trials than
on diffuse neutral cue trials. To account for the counterintuitive finding that spatial atten­
tion impairs temporal resolution, Yeshurun and Levy (2003) suggested that spatial atten­
tion produces an inhibitory interaction that activates the parvocellular visual pathway
and inhibits the magnocellular visual pathway. To the extent that spatial resolution de­
pends on processing in the parvocellular pathway and temporal resolution depends on processing in the magnocellular pathway, an inhibitory interaction of this nature could ex­
plain why spatial attention enhances spatial resolution and degrades temporal resolution.

The effect of spatial attention on temporal sensitivity was further investigated in a study
by Hein, Rolke, and Ulrich (2006) that employed both peripheral and central cues in sepa­
rate temporal order judgment (TOJ) tasks. Two target dots, side by side, were presented asyn­
chronously at either the attended or unattended location, and observers reported which
target appeared first. Consistent with the results reported by Yeshurun and Levy (2003),
when peripheral cues were used, TOJ performance was worse at the attended than the
unattended location. However, when central cues were used, performance was more ac­
curate at the attended than the unattended location. To account for the qualitative differ­
ence in performance across the two cue types, Hein et al. (2006) adhered to the idea that
automatic and voluntary shifts of spatial attention affect different stages of the visual sys­
tem (see also Briand & Klein, 1987). Specifically, they suggested that automatic shifts of
attention influence early stages of processing and impair the temporal resolution of the
visual system, whereas voluntary shifts of attention influence processing at higher levels
and improve the temporal resolution of the visual system (Hein et al., 2006).

According to Titchener (1908), “the object of attention comes to consciousness more quickly than the objects that we are not attending to” (p. 251). He called this the “law of
prior entry.” Shore, Spence, and Klein (2001; Shore & Spence, 2005; Spence, Shore, &
Klein, 2001) investigated the prior entry effect of spatial attention using a visual temporal
order task. Peripheral exogenous cues or central endogenous cues were presented, fol­
lowed by the asynchronous onset of one target line segment on either side of fixation. On
one half of the trials, the target at the cued location appeared first, and on the other half
of the trials, the target at the uncued location appeared first. One target was a vertical
line, and the other was a horizontal line, and observers were required to report the one
they perceived first. The response and the cue were orthogonal in an attempt to reduce
the tendency of subjects to simply report that the target at the cued location appeared
first. The prior entry effect was observed in response to both exogenous and endogenous cues; that is, observers were more likely to perceive and report the target at the cued location first, even when its onset slightly lagged that of the target at the uncued location. Thus, attended stimuli are perceived sooner than unattended stimuli (e.g., Shore et
al., 2001; Stelmach & Herdman, 1991).
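Prior entry is conventionally quantified as a shift in the point of subjective simultaneity (PSS): the stimulus onset asynchrony (SOA) at which the two orders are reported equally often. A minimal sketch with fabricated proportions (not data from Shore et al., 2001), again using linear interpolation in place of a fitted psychometric function:

```python
def pss(soas_ms, p_cued_first):
    """Point of subjective simultaneity from TOJ data.

    SOA is the cued target's lead time in ms (positive = cued target
    physically first). A negative PSS indicates prior entry: the cued
    target is reported first even when it actually lagged the uncued one.
    """
    pts = sorted(zip(soas_ms, p_cued_first))
    for (s0, p0), (s1, p1) in zip(pts, pts[1:]):
        if p0 <= 0.5 <= p1:
            return s0 + (0.5 - p0) / (p1 - p0) * (s1 - s0)
    raise ValueError("50% point not bracketed by the data")

soas = [-60, -30, 0, 30, 60]  # ms; negative = cued target appeared second
p_cued_first = [0.30, 0.55, 0.75, 0.90, 0.97]  # fabricated, attention-shifted

prior_entry = pss(soas, p_cued_first)  # about -36 ms with these numbers
```

Here the cued target can lag by roughly 36 ms and still be judged first half the time, which is the signature pattern of prior entry.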

Spatial attention has a similar effect on the perception of temporal offsets. In a study by
Downing and Treisman (1997), subjects were exogenously cued to one side or the other
of fixation by the transient brightening of one of two target placeholders (there was also
an endogenous component to the cue because it validly predicted the location of the tar­
get event on two-thirds of the trials). Following the presentation of the cue, one of two
target dots offset at either the cued or uncued location. When the target offset occurred
at the cued location, observers were faster to respond relative to when the offset oc­
curred at the uncued location. That effect of cue validity shows that “attention facilitates the detection of offsets at least as much as detection of onsets” (Downing & Treisman,
1997, p. 770).

Seemingly at odds with results indicating that the perception of stimulus onsets and offsets is sped up at spatially attended locations, research has also shown that attention prolongs the perceived duration of briefly presented stimuli (e.g., Enns, Brehaut, & Shore,
1999). Mattes and Ulrich (1998) investigated the effect of attention on perceived duration
by assuming that more attention is allocated to a spatial cue as it becomes increasingly
valid. In a blocked design, observers were presented with central cues that validly predicted the target location on either 90 percent, 70 percent, or 50 percent of trials. Observers
were explicitly aware of the cue validity in each block, and their task was to judge
whether the presentation of the target stimulus was of short, medium, or long duration.
As expected, the results indicated that as cue validity increased, so did mean ratings of
perceived duration. Thus, endogenous spatial attention, activated by central cues, pro­
longs the perceived duration of a briefly presented target stimulus (Mattes & Ulrich,
1998). That result was (p. 250) subsequently replicated and extended in a study by Enns et
al. (1999) showing that the illusion of prolonged duration was independent of the prior
entry effect. In other words, attended stimuli do not seem to last longer because they also
seem to have their onset sooner (Enns et al., 1999).

Although the effects of spatial attention on many dimensions of basic vision have been
well established in the extant research, for centuries attention researchers have contem­
plated the effect of spatial attention on the perceived intensity and perceptual clarity of a
stimulus (Helmholtz, 1886; James, 1890; Wundt, 1912). In other words: Does attention al­
ter the appearance of a stimulus? Despite the long-standing interest, however, only re­
cently have psychophysical procedures been developed to permit attention researchers to
evaluate the question with empirical data (e.g., Carrasco, Ling, & Read, 2004; Liu,
Abrams, & Carrasco, 2009).

Carrasco et al. (2004) approached the issue by investigating the effect of transient atten­
tion on perceived contrast. Uninformative peripheral cues were used to direct spatial at­
tention to the left or right side of fixation, or a neutral cue was presented at fixation. On
each trial, observers were presented with a sine wave grating on either side of fixation
and were required to perform an orientation discrimination on the target grating that ap­
peared higher in contrast. One of the two targets, the standard, was always presented at
a fixed contrast (i.e., near threshold), and the other target, the test, was presented at var­
ious contrasts above and below that of the standard. To determine the effect of attention
on perceived contrast, the authors determined the point of subjective equality between
the test and the standard target. The results indicated that when the test target was
cued, it was perceived as being higher in contrast than it really was. In other words,
when the test was cued, its contrast was subjectively perceived as being equal to the
standard even when the actual contrast was lower than the standard. Based on this finding, the authors concluded “that attention changes the strength of a stimulus by increasing its ‘effective contrast’ or salience” (Carrasco et al., 2004, p. 308; see also Gobell & Carrasco, 2005).

Page 20 of 30
Spatial Attention

Recently, Liu et al. (2009) showed that voluntary spatial attention also enhances subjec­
tive contrast. They presented observers with valid central directional cues or neutral central cues to orient attention to one of two rapid serial visual presentation (RSVP) streams of letters, where they were instructed to detect the rare occurrence of a specific target. At
the end of the RSVP stream, a sine wave grating was briefly presented at each location.
On trials when the target was present in the cued or uncued RSVP stream, observers
made one type of response, but if the target was not present, observers reported the ori­
entation of the target grating that was higher in contrast. Similar to the Carrasco et al.
(2004) study summarized above, one target grating, the standard, was presented at a
fixed contrast in all trials, and the other target grating, the test, was presented at a vary­
ing range of contrasts, above and below that of the standard. The critical finding supported what Carrasco et al. (2004) found using peripheral cues: In order for the pairs to be
perceived as subjectively equal in contrast, the test target needed to be presented at a
lower contrast than the standard when it was at the attended location, and needed to be
at a higher contrast than the standard target when it was at the unattended location (Liu
et al., 2009). In sum, automatic and voluntary spatial attention both alter the appearance
of stimuli by enhancing perceived contrast.
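
The point-of-subjective-equality (PSE) logic used in these appearance studies can be illustrated with a short numerical sketch. The contrast values and response proportions below are invented for illustration (they are not data from Carrasco et al., 2004, or Liu et al., 2009), and `estimate_pse` is a hypothetical, simplified estimator; published studies typically fit a full psychometric function rather than interpolating linearly:

```python
def estimate_pse(contrasts, p_test_higher, criterion=0.5):
    """Estimate the test contrast at which the observer reports 'test
    higher in contrast' on 50% of trials, by linear interpolation
    between the two data points that bracket the criterion."""
    for i in range(len(contrasts) - 1):
        c0, c1 = contrasts[i], contrasts[i + 1]
        p0, p1 = p_test_higher[i], p_test_higher[i + 1]
        if p0 <= criterion <= p1:
            return c0 + (criterion - p0) * (c1 - c0) / (p1 - p0)
    raise ValueError("criterion not bracketed by the data")

# Test contrasts (%) presented around a standard fixed at 6% contrast,
# and the proportion of trials on which the test appeared higher.
contrasts = [2, 4, 6, 8, 10]
neutral_cue = [0.05, 0.25, 0.50, 0.75, 0.95]  # PSE at the standard (6%)
test_cued = [0.20, 0.50, 0.80, 0.95, 0.99]    # PSE shifted below 6%

print(estimate_pse(contrasts, neutral_cue))  # 6.0
print(estimate_pse(contrasts, test_cued))    # 4.0
```

A PSE below the 6 percent standard for the cued test is the signature result: the attended grating matches the standard in apparent contrast while being physically lower in contrast, which is the direction of the shifts these studies reported.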

Summary
Much of our current knowledge of the visual system conceptualizes it as being, in many
ways, dichotomous (e.g., cells in the central visual pathways are mainly magnocellular
and parvocellular, and visual information is processed in the putative “where/how” dorsal
pathway and the “what” ventral pathway). The research reviewed in the present chapter
presented visual spatial attention in much the same way. Spatial attention is a selective
mechanism; it determines which sensory signals control behavior, and it constrains the
rate of information processing (Maunsell, 2009). Abundant research has attempted to de­
termine the units of attentional selection. On the one hand, space-based models generally
liken spatial attention to a mental spotlight (e.g., Eriksen & Eriksen, 1974; Posner, 1978)
that selects and enhances perception of stimuli falling within a region of space. On the
other hand, object-based models assert that spatial attention selects and enhances per­
ception of perceptually grouped stimuli, irrespective of spatial overlap, occlusion, or dis­
parity (e.g., Duncan, 1984; Neisser, 1967). Rather than one theory or the other being cor­
rect, it appears that the attentional system relies on both spatial and grouping factors in
the selection process. Indeed, recent evidence suggests that relatively minor changes to
the stimuli used in a spatial cueing task can promote space- or object-based selection
(Nicol, Watter, Gray, & Shore, 2009).

Control over the allocation of spatial attention resources is also dichotomous. (p. 251) Indeed, exogenously controlled spatial attention is deployed rapidly and automatically in re­
sponse to salient stimuli that appear at peripheral locations (e.g., Jonides, 1981; Muller & Rabbitt, 1989; Nakayama & Mackeben, 1989), whereas endogenously controlled atten­
tion is deployed voluntarily based on the interpretation of centrally presented, informa­
tive stimuli (e.g., Jonides, 1981; Muller & Rabbitt, 1989; Nakayama & Mackeben, 1989).
In addition to reflecting distinct responses to relevant stimuli, these two attentional con­
trol mechanisms probably also process visual information in fundamentally distinct ways
(e.g., Briand & Klein, 1987; Klein, 1994).

Another dichotomy germane to spatial attention concerns overt and covert orienting. Al­
though it is clear that covert attention shifts and eye movements are not identical (e.g.,
Posner, 1980), it is uncertain precisely how the two are related. Some evidence argues for
their independence (e.g., Klein & Pontefract, 1994), but according to the premotor theory
(Rizzolatti et al., 1987)—the dominant view concerning the relationship between spatial
attention and eye movements (Palmer, 1999)—the two mechanisms are actually interde­
pendent. To be sure, the research indicates that although attention shifts can occur in the
absence of an eye movement, an eye movement cannot be executed before an attentional
shift (e.g., Hoffman & Subramaniam, 1995; Klein, 2004; Shepherd et al., 1986).

Spatial attention can either enhance or degrade performance in tasks that measure low-level visual perception. Spatial attention enhances contrast sensitivity (cf. Carrasco, 2006) and spatial resolution (cf. Carrasco & Yeshurun, 2009), likely by effectively shrink­
ing the receptive field size of cells with receptive fields at the attended location (Moran &
Desimone, 1985), or possibly by biasing activity of cells with smaller receptive fields at
the attended location (Yeshurun & Carrasco, 1999). In contrast, spatial attention de­
grades temporal sensitivity (Yeshurun & Levy, 2003; but see Hein et al., 2006, and Nicol
et al., 2009, for exceptions). One possible explanation for this counterintuitive finding is
that spatial attention induces an inhibitory interaction that favors parvocellular over mag­
nocellular activity (Yeshurun & Levy, 2003).

The final dichotomy of spatial attention reviewed in the present chapter pertained to the
partially segregated, but interacting, dorsal and ventral frontoparietal neural networks.
In their influential model, Corbetta and Shulman (2002) proposed that control of goal-di­
rected (i.e., voluntary) spatial attention is accomplished in a pathway comprising dorsal
frontoparietal substrates, and control of stimulus-driven (i.e., automatic) spatial attention
takes place in a right-lateralized pathway comprising ventral frontoparietal areas. Their
model further proposes that the ventral network serves as a “circuit breaker” of the dor­
sal stream to facilitate the disengagement and reorientation of attention when a behav­
iorally relevant stimulus is detected (Corbetta & Shulman, 2002).

In conclusion, spatial attention permits us to selectively enhance processing of behaviorally relevant stimuli and to attenuate processing of irrelevant stimuli (e.g., Yantis et al.,
2002). By enabling us to allocate resources toward a specific stimulus, spatial attention
prevents us from being overwhelmed by the massive stimulation that continually bom­
bards the visual system. Visual perception is, for the most part, enhanced by spatial at­
tention. For example, spatial attention enhances perceptual clarity by enhancing the per­
ceived contrast of a stimulus (Carrasco et al., 2004). Eye movements and spatial attention are not identical mechanisms, but they are closely related (Rizzolatti et al., 1987): Although a shift of spatial attention can be made in the absence of an overt eye movement, eye movements cannot be executed without a preceding attentional shift.

Throughout this chapter, spatial attention has been presented as a dichotomous neural
mechanism. Spatial attention can select either spatial locations (e.g. Posner, 1980) or ob­
jects (e.g., Duncan, 1984) for further processing. The neural sources of spatial attention
are both subcortical and cortical, the most critical of which are perhaps the two partially
separate, but interacting, pathways that make up the dorsal and ventral frontoparietal
networks (cf. Corbetta & Shulman, 2002). Finally, the allocation of spatial attention re­
sources can be controlled automatically by stimulus-driven factors or voluntarily by top-
down processes, and these two mechanisms of attentional control have distinct effects on
perception and behavior (e.g., Jonides, 1981).

References
Balz, G. W., & Hock, H. S. (1997). The effect of attentional spread on spatial resolution. Vi­
sion Research, 37, 1499–1510.

Bashinski, H. S., & Bacharach, V. R. (1980). Enhancement of perceptual sensitivity as the result of selectively attending to spatial locations. Perception & Psychophysics, 28, 241–248.

Bichot, N. P., Cave, K. R., & Pashler, H. (1999). Visual selection mediated by location: Fea­
ture-based selection of noncontiguous locations. Perception & Psychophysics, 61, 403–
423.

Briand, K. A., & Klein, R. M. (1987). Is Posner’s “beam” the same as Treisman’s (p. 252) “glue”? On the relation between visual orienting and feature integration theory. Journal of
Experimental Psychology: Human Perception and Performance, 13, 228–241.

Broadbent, D. E. (1958). Perception and communication. London: Pergamon Press.

Cameron, E. L., Tai, J. C., & Carrasco, M. (2002). Covert attention affects the psychomet­
ric function of contrast sensitivity. Vision Research, 42, 949–967.

Carrasco, M. (2006). Covert attention increases contrast sensitivity: Psychophysical, neurophysiological and neuroimaging studies. Progress in Brain Research, 154, 33–70.

Carrasco, M., Ling, S., & Read, S. (2004). Attention alters appearance. Nature Neuro­
science, 7, 308–313.

Carrasco, M., & Yeshurun, Y. (2009). Covert attention effects on spatial resolution. Progress in Brain Research, 176, 65–86.

Corbetta, M. (1998). Frontoparietal cortical networks for directing attention and the eye to visual locations: Identical, independent, or overlapping systems? Proceedings of the National Academy of Sciences USA, 95, 831–838.

Corbetta, M., Akbudak, E., Conturo, T. E., Snyder, A. Z., Ollinger, J. M., Drury, H. A., et al.
(1998). A common network of functional areas for attention and eye-movements. Neuron,
21, 761–773.

Corbetta, M., Kincade, M. J., Ollinger, J. M., McAvoy, M. P., & Shulman, G. L. (2000). Vol­
untary orienting is dissociated from target detection in human posterior parietal cortex.
Nature Neuroscience, 3, 292–297.

Corbetta, M., Miezin, F. M., Dobmeyer, S., Shulman, G. L., & Petersen, S. E. (1990). Atten­
tional modulation of neural processing of shape, color, and velocity in humans. Science,
248, 1556–1559.

Corbetta, M., Miezin, F. M., Shulman, G. L. & Petersen, S. E. (1993). A PET study of visu­
ospatial attention. Journal of Neuroscience, 13, 1202–1226.

Corbetta, M., & Shulman, G. L. (2002). Control of goal-directed and stimulus-driven atten­
tion in the brain. Nature Reviews Neuroscience, 3, 201–215.

Desimone, R., & Duncan, J. (1995). Neural mechanisms of selective visual attention. An­
nual Review of Neuroscience, 18, 193–222.

Driver, J., & Baylis, G. C. (1989). Movement and visual attention: The spotlight metaphor
breaks down. Journal of Experimental Psychology: Human Perception and Performance, 15,
448–456.

Downing, P. E., & Treisman, A. M. (1997). The line-motion illusion: Attention or impletion?
Journal of Experimental Psychology: Human Perception and Performance, 23, 768–779.

Duncan, J. (1984). Selective attention and the organization of visual information. Journal
of Experimental Psychology: General, 113, 501–517.

Eason, R. G. (1981). Visual evoked potential correlates of early neural filtering during se­
lective attention. Bulletin of the Psychonomic Society, 18, 203–206.

Egly, R., Driver, J., & Rafal, R. D. (1994). Shifting visual attention between objects and lo­
cations: Evidence from normal and parietal lesion subjects. Journal of Experimental Psy­
chology: General, 123, 161–177.

Enns, J. T., Brehaut, J. C., & Shore, D. I. (1999). The duration of a brief event in the mind’s
eye. Journal of General Psychology, 126, 355–372.

Eriksen, B. A., & Eriksen, C. W. (1974). Effects of noise letters upon the identification of a tar­
get letter in a nonsearch task. Perception & Psychophysics, 16, 143–149.

Eriksen, C. W., & Hoffman, J. E. (1972). Some characteristics of selective attention in visu­
al perception determined by vocal reaction time. Perception & Psychophysics, 11, 169–
171.

Eriksen, C. W., & St. James, J. D. (1986). Visual attention within and around the field of fo­
cal attention: a zoom lens model. Perception & Psychophysics, 40, 225–240.

Eriksen, C. W., & Yeh, Y. (1985). Allocation of attention in the visual field. Journal of Ex­
perimental Psychology: Human Perception and Performance, 11, 583–597.

Friedrich, F. J., Egly, R., Rafal, R. D., & Beck, D. (1998). Spatial attention deficits in hu­
mans: A comparison of superior parietal and temporal-parietal junction lesions. Neuropsy­
chology, 12, 193–207.

Fox, M. D., Corbetta, M., Snyder, A. Z., Vincent, J. L., & Raichle, M. E. (2006). Sponta­
neous neuronal activity distinguishes human dorsal and ventral attention systems. Pro­
ceedings of the National Academy of Sciences, 103, 10046–10051.

Funes, M. J., Lupianez, J., & Milliken, B. (2007). Separate mechanisms recruited by exoge­
nous and endogenous spatial cues: Evidence from the spatial Stroop paradigm. Journal of
Experimental Psychology: Human Perception and Performance, 33, 348–362.

Gobell, J., & Carrasco, M. (2005). Attention alters the appearance of spatial frequency
and gap size. Psychological Science, 16, 644–651.

Gobell, J. L., Tseng, C. H., & Sperling, G. (2004). The spatial distribution of visual atten­
tion. Vision Research, 44, 1273–1296.

Goldberg, M. E., & Wurtz, R. H. (1972). Activity of superior colliculus cells in behaving
monkey. I. Visual receptive fields of single neurons. Journal of Neurophysiology, 35, 542–
559.

Hein, E., Rolke, B., & Ulrich, R. (2006). Visual attention and temporal discrimination: Dif­
ferential effects of automatic and voluntary cueing. Visual Cognition, 13, 29–50.

Helmholtz, H. von (1866). Treatise on physiological optics (3rd ed., Vols. 2 & 3; J. P. C. Southall, Ed. and Trans.). Rochester, NY: Optical Society of America.

Hillyard, S. A., & Mangun, G. R. (1987). Sensory gating as a physiological mechanism for
visual selective attention. In R. Johnson, R. Parasuraman, & J. W. Rohrbaugh (Eds.), Cur­
rent trends in event-related potential research (pp. 61–67). Amsterdam: Elsevier.

Hoffman, J. E., & Subramaniam, B. (1995). The role of visual attention in saccadic eye
movements. Perception & Psychophysics, 57, 787–795.

Hopfinger, J. B., Buonocore, M. H., & Mangun, G. R. (2000). The neural mechanisms of top-
down attentional control. Nature Neuroscience, 3, 284–291.

Hunt, A., & Kingstone, A. (2003). Covert and overt voluntary attention: Linked or inde­
pendent? Cognitive Brain Research, 18, 102–115.

James, W. (1890). The principles of psychology. New York: Henry Holt.

Johnston, W. A., & Dark, V. J. (1986). Selective attention. Annual Review of Psychology, 37,
43–75.

Jonides, J. (1981). Voluntary versus automatic control over the mind’s eye’s movement. (p. 253) In J. Long & A. Baddeley (Eds.), Attention and performance IX (pp. 187–203). Hillsdale, NJ: Erlbaum.

Jonides, J., & Mack, R. (1984). On the cost and benefit of cost and benefit. Psychological
Bulletin, 96, 29–44.

Kahneman, D., & Treisman, A. (1984). Changing views of attention and automaticity. In R.
Parasuraman & D. R. Davies (Eds.), Varieties of attention (pp. 29–62). New York: Academ­
ic Press.

Kahneman, D., Treisman, A., & Gibbs, B. J. (1992). The reviewing of object files: Object
specific integration of information. Cognitive Psychology, 24, 175–219.

Kanwisher, N., & Wojciulik, E. (2000). Visual attention: Insights from brain imaging. Na­
ture Reviews: Neuroscience, 1, 91–100.

Kastner, S., McMains, S. A., & Beck, D. M. (2009). Mechanisms of selective attention in
the human visual system: Evidence from neuroimaging. In M. S. Gazzaniga (Ed.), The cog­
nitive neurosciences (4th ed.). Cambridge, MA: MIT Press.

Kastner, S. & Ungerleider, L. G. (2000). Mechanisms of visual attention in the human cor­
tex. Annual Review of Neuroscience, 23, 315–341.

Klein, R. M. (1994). Perceptual-motor expectancies interact with covert visual orienting under endogenous but not exogenous control. Canadian Journal of Experimental Psychology, 48, 151–166.

Klein, R. M. (2004). On the control of visual orienting. In M. Posner (ed.), Cognitive neu­
roscience of attention (pp. 29–43.). New York: Guilford Press.

Klein, R. M., & Hansen, E. (1990). Chronometric analysis of spotlight failure in endoge­
nous visual orienting. Journal of Experimental Psychology: Human Perception and Perfor­
mance, 16, 790–801.

Klein, R. M., & Pontefract, A. (1994). Does oculomotor readiness mediate cognitive con­
trol of visual attention? Revisited! In C. Umilta (Ed.), Attention and performance XV (pp.
333–350). Hillsdale, NJ: Erlbaum.

Klein, R. M., & Shore, D. I. (2000). Relations among modes of visual orienting. In S. Mon­
sell & J. Driver (Eds.), Attention and performance XVIII (pp. 195–208). Hillsdale, NJ: Erl­
baum.

Kramer, A. F., & Hahn, S. (1995). Splitting the beam: Distribution of attention over non­
contiguous regions of the visual field. Psychological Science, 6, 381–386.

Kramer, A. F., & Jacobson, A. (1991). Perceptual organization and focused attention: The
role of objects and proximity in visual processing. Perception & Psychophysics, 50, 267–
284.

LaBerge, D. (1983). Spatial extent of attention to letters and words. Journal of Experimen­
tal Psychology: Human Perception and Performance, 9, 371–379.

Liu, T., Abrams, J., & Carrasco, M. (2009). Voluntary attention enhances contrast appear­
ance. Psychological Science, 20, 354–362.

Luck, S. (1995). Multiple mechanisms of visual-spatial attention: Recent evidence from human electrophysiology. Behavioural Brain Research, 71, 113–123.

Lupianez, J., Decraix, C., Sieroff, E., Chokron, S., Milliken, B., & Bartolomeo, P. (2004). In­
dependent effects of endogenous and exogenous spatial cueing: Inhibition of return at en­
dogenously attended target locations. Experimental Brain Research, 159, 447–457.

Mackeben, M., & Nakayama, K. (1993). Express attentional shifts. Vision Research, 33,
85–90.

Mangun, G. R., & Hillyard, S. A. (1991). Modulations of sensory-evoked brain potentials indicate changes in perceptual processing during visual-spatial priming. Journal of Experimental Psychology: Human Perception and Performance, 17, 1057–1074.

Marois, R., Chun, M. M., & Gore, J. C. (2000). Neural correlates of the attentional blink.
Neuron, 28, 299–308.

Mattes, S., & Ulrich, R. (1998). Directed attention prolongs the perceived duration of a
brief stimulus. Perception & Psychophysics, 60, 1305–1317.

Maunsell, J. H. R. (2009). The effect of attention on the responses of individual visual neu­
rons. In M. S. Gazzaniga (Ed.), The cognitive neurosciences (4th ed.). Cambridge, MA:
MIT Press.

Moran, J., & Desimone, R. (1985). Selection attention gates visual processing in the ex­
trastriate cortex. Science, 229, 782–784.

Muller, H. J., & Rabbitt, P. M. A. (1989). Reflexive and voluntary orienting of visual atten­
tion: Time course of activation and resistance to interruption. Journal of Experimental
Psychology: Human Perception and Performance, 15, 315–330.

Nakayama, K., & Mackeben, M. (1989). Sustained and transient components of focal visu­
al attention. Vision Research, 29, 1631–1647.

Neisser, U. (1967). Cognitive psychology. Englewood Cliffs, NJ: Prentice-Hall.

Nicol, J. R., Watter, S., Gray, K., & Shore, D. I. (2009). Object-based perception mediates
the effect of exogenous attention on temporal resolution. Visual Cognition, 17, 555–573.

O’Connor, D. H., Fukui, M. M., Pinsk, M. A., & Kastner, S. (2002). Attention modulates re­
sponses in the human lateral geniculate nucleus. Nature Neuroscience, 5, 1203–1209.

Palmer, S. E. (1999). Vision science: Photons to phenomenology. Cambridge, MA: MIT Press.

Paus, T. (1996). Localization and function of the human frontal eye-field: A selective re­
view. Neuropsychologia, 34, 475–483.

Pestilli, F., & Carrasco, M. (2005). Attention enhances contrast sensitivity at cued and im­
pairs it at uncued locations. Vision Research, 45, 1867–1875.

Petersen, S. E., Robinson, D. L., & Morris, D. J. (1987). Contributions of the pulvinar to vi­
sual spatial attention. Neuropsychologia, 25, 97–105.

Posner, M. I. (1978). Chronometric explorations of mind. Hillsdale, NJ: Erlbaum.

Posner, M. I. (1980). Orienting of attention. Quarterly Journal of Experimental Psychology, 32A, 3–25.

Posner, M. I., & Cohen, Y. (1984). Components of visual orienting. In H. Bouma & D. G.
Bouwhuis (Eds.), Attention and performance X (pp. 521–556). Hillsdale, NJ: Erlbaum.

Posner, M. I., Nissen, M. J., & Ogden, W. C. (1978). Attended and unattended processing
modes: The role of set for spatial location. In H. L. Pick and J. J. Saltzman (Eds.), Modes of
perceiving and processing information (pp. 137–157). Hillsdale, NJ: Erlbaum.

Posner, M. I., & Petersen, S. E. (1990). The attention system of the human brain. Annual
Review of Neuroscience, 13, 25–42.

Posner, M. I., Petersen, S. E., Fox, P. T., & Raichle, M. E. (1988). Localization of cognitive
operations in the human-brain. Science, 240, 1627–1631.

Posner, M. I., Snyder, C. R. R., & Davidson, B. J. (1980). Attention and the detection of sig­
nals. Journal of Experimental Psychology: General, 109, 160–174.

Pratt, J., & Nghiem, T. (2000). The role of the gap effect in the orienting of attention: Evi­
dence for attentional shifts. Visual Cognition, 7, 629–644.

Remington, R. W. (1980). Attention and saccadic eye movements. (p. 254) Journal of Experimental Psychology: Human Perception and Performance, 6, 726–744.

Rizzolatti, G., Riggio L., Dascola, I., & Umilta, C. (1987). Reorienting attention across the
horizontal and vertical meridians: Evidence in favour of a premotor theory of attention.
Neuropsychologia, 25, 31–40.

Robinson, D. L., & Petersen, S. (1992). The pulvinar and visual salience. Trends in Neuro­
sciences, 15, 127–132.

Saslow, M. G. (1967). Effects of components of displacement-step stimuli upon latency for


saccadic eye movements. Journal of the Optical Society of America, 57, 1024–1029.

Serences, J. T., Shomstein, S., Leber, A. B., Golay, X., Egeth, H. E., & Yantis, S. (2005). Co­
ordination of voluntary and stimulus-driven attentional control in human cortex. Psycho­
logical Science, 16, 214–222.

Shepherd, M., Findlay, J. M., & Hockey, R. J. (1986). The relationship between eye move­
ments and spatial attention. Quarterly Journal of Experimental Psychology, 38A, 475–491.

Shore, D. I., & Spence, C. (2005). Prior entry. In L. Itti, G. Rees, & J. K. Tsotsos (Eds.), Neu­
robiology of attention (pp. 89–95). Amsterdam: Elsevier.

Shore, D. I., Spence, C., & Klein, R. M. (2001). Visual prior entry. Psychological Science,
12, 205–212.

Shulman, G. L., McAvoy, M. P., Cowan, M. C., Astafiev, S. V., Tansy, A. P., d’Avossa, G., et
al. (2003). Quantitative analysis of attention and detection signals during visual search.
Journal of Neurophysiology, 90, 3384–3397.

Stelmach, L. B., & Herdman, C. M. (1991). Directed attention and the perception of tem­
poral order. Journal of Experimental Psychology: Human Perception and Performance,
17, 539–550.

Spence, C., Shore, D. I., & Klein, R. M. (2001). Multisensory prior entry. Journal of Experi­
mental Psychology: General, 130, 799–832.

Spence, C., & Driver, J. (1994). Covert spatial orienting in audition—exogenous and en­
dogenous mechanisms. Journal of Experimental Psychology: Human Perception and Performance, 20, 555–574.

Spence, C., & Driver, J. (2004). Crossmodal space and crossmodal attention. London: Ox­
ford University Press.

Titchener, E. B. (1908). Lectures on the elementary psychology of feeling and attention.
New York: Macmillan.

Treisman, A., & Gelade, G. (1980). A feature integration theory of attention. Cognitive
Psychology, 12, 97–136.

Treisman, A., Kahneman, D., & Burkell, J. (1983). Perceptual objects and the cost of filter­
ing. Perception & Psychophysics, 33, 527–532.

Tsal, Y., & Shalev, L. (1996). Inattention magnifies perceived length: The attentional re­
ceptive field hypothesis. Journal of Experimental Psychology: Human Perception and Per­
formance, 22, 233–243.

Ungerleider, L. G., & Mishkin, M. (1982). Two cortical visual systems. In D. G. Ingle, M. A.
Goodale, & R. J. Q. Mansfield (Eds.), Analysis of visual behaviour (pp. 549–586). Cam­
bridge, MA: MIT Press.

Van Voorhis, S. T., & Hillyard, S. E. (1977). Visual evoked potentials and selective atten­
tion to points in space. Perception & Psychophysics, 22, 54–62.

Wright, R. D., Richard, C. M., & McDonald, J. J. (1995). Neutral location cues and cost/
benefit analysis of visual attention shifts. Canadian Journal of Experimental Psychology,
49, 540–548.

Wright, R. D., & Ward, L. M. (2008). Orienting of attention. New York: Oxford University
Press.

Wundt, W. (1912). An introduction to psychology (R. Pinter, Trans.). London: Allen & Un­
win.

Yantis, S., Schwarzbach, J., Serences, J. T., Carlson, R. L., Steinmetz, M. A., Pekar, J. J., &
Courtney, S. M. (2002). Transient neural activity in human parietal cortex during spatial
attention shifts. Nature Neuroscience, 5, 995–1002.

Yeshurun, Y., & Carrasco, M. (1998). Attention improves or impairs visual performance by
enhancing spatial resolution. Nature, 396, 72–75.

Yeshurun, Y., & Carrasco, M. (1999). Spatial attention improves performance in spatial
resolution tasks. Vision Research, 39, 293–306.

Yeshurun, Y., & Levy, L. (2003). Transient attention degrades temporal resolution. Psycho­
logical Science, 14, 225–231.

Jeffrey R. Nicol

Jeffrey R. Nicol is Assistant Professor of Psychology, Affiliate of the Northern Centre for Research on Aging and Communication (NCRAC), Nipissing University.

Attention and Action  


George Alvarez
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn

Print Publication Date: Dec 2013 Subject: Psychology, Cognitive Neuroscience


Online Publication Date: Dec 2013 DOI: 10.1093/oxfordhb/9780199988693.013.0013

Abstract and Keywords

At every moment, we face choices: Is it time to work, or to play? Should I listen to this
lecture, or check my e-mail? Should I pay attention to what my significant other is saying,
or do a mental inventory of what work I need to accomplish today? Should I keep my
hands on the wheel, or change the radio station? Without any change to the external envi­
ronment, it is possible to select a subset of these possibilities for further action. The
process of selection is called attention, and it operates in many domains, from selecting
our higher level goals, to selecting the sensory information on which we focus, to select­
ing what actions we perform. This chapter focuses on the relationship between visual at­
tention (selecting visual inputs) and action (selecting and executing movements of the
body). As a case study, the authors focus on visual-spatial attention, the act of choosing to
attend to a particular location in the visual field, and its relationship to eye-movement
control. Visual attention appears to select the targets for eye movements because atten­
tion to a location necessarily precedes an eye movement to that location. Moreover, there
is a great deal of overlap in the neural mechanisms that control spatial attention and eye
movements, and the neural mechanisms that are specialized for spatial attention or eye-
movements are highly intertwined. This link between spatial attention and eye move­
ments strongly supports the idea that a computational goal of the visual attention system
is to select targets for action, and suggests that many of the design properties of the spa­
tial attention system might be optimized for the control of eye movements. Whether this
relationship will hold broadly between other forms of attention (e.g., goal selection, audi­
tory selection, tactile selection) and other forms of action (e.g., hand movements and lo­
comotion) is an important topic of contemporary and future research.

Keywords: attention, action, visual attention, visual-spatial attention, eye movements, spatial attention

Let us start with a fairly ordinary situation (for me at least): you are hungry, and you de­
cide to get a snack from the refrigerator. You open the door with your left hand and arm,
and you look inside. Because you just went shopping you have a fairly stocked refrigera­
tor, so you have a lot of food to choose from. You are thinking that a healthy snack is in
order, and decide to look for carrots. How will you find the carrots? Most likely you will “tune your visual system” to look for orange things, and all of a sudden you will become
aware of the location of your orange juice, your carrots, some sharp cheddar cheese, and
any other orange things in sight. You know that you keep your juice on the top shelf, and
your carrots near the bottom, so you further tune your visual system to look for orange
things near the bottom. Now you’ve zeroed in on those carrots and are ready to make
your move and grab them. You reach, and with extreme precision you position your hand
and fingers in exactly the right shape required to grasp the carrots, and begin snacking.

I occasionally have poured juice into my cereal (by mistake), most likely because (p. 256) the milk and juice containers appeared somewhat similar in shape. But have you ever ac­
cidentally grabbed a carton of eggs instead of carrots? Have you ever reached for a bag
of carrots as if they had a handle like a gallon of milk? Neither have I, and that is proba­
bly because we all have vision and action systems that have, for the most part, selected
the visual inputs and the physical actions that meet our current goals. This deceptively
simple act of grabbing the bag of carrots in your refrigerator is in fact an extremely im­
pressive behavior that requires the activity of a large network of brain regions (possibly
most of them!). These different regions are specialized for setting up your goal states,
processing incoming visual information, and executing actions. But none of this would
work if the system did not have “selective mechanisms” that enable us to select a subset
of visual inputs (the carrots and not the juice), and a subset of possible actions (the car­
rot-reach and not the juice-reach) that match our current goal state (getting a snack and
not a beverage). Not only do we have these selective mechanisms, but they also tend to
work in a coordinated and seamless fashion.

The overarching purpose of this chapter is to introduce the concept of attentional selec­
tion and the role that it plays in visual perception and action. To begin, the chapter pro­
vides a general definition of attention, followed by a specific introduction to visual atten­
tion. Next, the link between visual attention and eye movements is reviewed as a case
study exploring the relationship between visual attention and action. By taking you
through definitions, theoretical considerations, and evidence linking visual attention and
action, this chapter is aimed at three specific goals: (1) to provide a nuanced view of at­
tention as a process that operates at multiple levels of cognition, (2) to summarize empiri­
cal evidence demonstrating that at least one form of visual attention (visual-spatial atten­
tion) interacts with the action system, and (3) to provide a theoretical framework for un­
derstanding how and why visual attention and action interact.

Defining Attention (Think of Attention as a Verb Instead of a Noun)
A generally accepted definition of attention has proved elusive, possibly because any sim­
ple definition of attention in one domain (say vision) will apply to attention in another do­
main (say action, or even goal setting). Take the core of what James offered as a defini­
tion of attention: “Focalization, concentration … withdrawal from some things in order to
deal effectively with others” (James, 1890). Pashler (1998) offers a similar definition, in
which the core aspects of attention are (1) we can selectively process some stimuli more
than others (selection), (2) we are limited in the number of things we can process simulta­
neously (capacity limitation), and (3) sustained processing involves effort or exertion (ef­
fort). Such a definition will apply to a wide range of cognitive processes. For instance,
there are many possible things you can pay attention to in your visual field at this mo­
ment. Among other things, I currently have a keyboard, a coffee mug, a bottle of wine,
and some knives in sight. The process of selecting a subset of those visual inputs to focus
on is called attention. There are also many possible actions you can take at any given mo­
ment. I can move my eyes to focus on my coffee mug and pick it up, I can continue typing,
or I can get up and find a corkscrew in my kitchen. The process of selecting a subset of
those possible actions is also called attention. Finally, which one of these acts of selection
I execute depends on my goal state. Do I want to finish my work tonight? What is the
most effective approach for accomplishing this task? Should I keep typing, or reach for
the coffee mug? Or would it serve me better in the long run to have a glass of wine and
listen to some music, and start again tomorrow? What’s on TV? Choosing to focus on one
of these trains of thought and set up stable goals is also called attention.

At first it is confusing that we refer to all of these different acts by the same term (atten­
tion by James’ definition, and others’). These different acts of selection do not seem like
they could possibly be the same thing in the mind and brain. Of course, that’s because
they are not the same thing in either. The difficulty here is that we tend to think about the
term “attention” as a noun, and that’s the wrong way to think about it. Think of the word
“attention” as a verb instead, and it is a little easier to understand how and why the term
attention accurately applies to so many situations. Now attention is more like the word
“walk,” which can be true of people, rats, and horses. They are all the same in that they all walk, but they are not all the same thing in any physical sense. Likewise, selecting visual
inputs, selecting actions, and selecting goals are all forms of attention because they in­
volve a form of selection, but they are not all the same thing. What they share in common
is the process of selection: There are many competing representations (visual inputs, ac­
tion plans, goal states), and attention is the process of (p. 257) selecting a subset of those
competing representations. On this view, attention is a process that operates in multiple
domains. Thus, whenever you hear the term “attention,” you should ask, “Selection in
what domain? What are the competing representations that make selection necessary?”
With that starting point you can then ask questions about the mechanisms by which a
particular act of selection is achieved, the units of that selection process, and the conse­
quences of selection.

This view of attention does not require that the mechanisms of selection in each domain
are completely independent, or that there cannot be general mechanisms of attention. Ul­
timately selection operates over distinct competing representations in each domain (goals
compete with other goals, visual inputs compete with other visual inputs, action represen­
tations compete with other action representations). However, it is likely that goal selec­
tion (e.g., to continue working) interacts with visual selection (e.g., to focus on the coffee
mug), and that both forms of selection interact with action selection (e.g., to reach for the
handle). The competition happens mostly within a domain, so the process of selection also
happens mostly within a domain. But each process operates within an integrated cogni­
tive system, and the selective processes in one domain will influence selective processes
in the others. This chapter reviews evidence that the processes of visual selection and ac­
tion selection, although separable, are tightly coupled in this way.

In the following section we introduce visual attention, including limits on visual attention,
mechanisms of visual attention, and theoretical proposals for why we need visual atten­
tion, which is a useful starting point for understanding how visual attention interacts with
the control of action.

Introduction to Visual Attention


Understanding that attention is the process of selecting among competing representa­
tions, we can now focus on attentional selection within a particular domain: vision. Visual
attention is the process of selecting a subset of the incoming visual information. Visual at­
tention can be controlled automatically by environmental factors, or intentionally based
on the current goals, and with effort it can be continuously maintained on specific objects
or events. Although there are many unanswered questions about the mechanisms and
consequences of visual attention, two things about visual attention are abundantly clear:
(1) you cannot pay attention to everything in your visual field simultaneously, and (2) you
can tune your attention in various ways to select a particular subset of the incoming visu­
al information.

Limits of Visual Attention

The limits on our ability to pay attention are sometimes startling, particularly in cases of
inattentional blindness (Mack & Rock, 1998; Most, Scholl, Clifford, & Simons, 2005; Si­
mons & Chabris, 1999) and change blindness (J. K. O’Regan, Rensink, & Clark, 1999;
Rensink, O’Regan, & Clark, 1997; Simons & Rensink, 2005). For instance, in one study Si­
mons and Chabris (1999) had participants watch a video in which two teams (white shirts,
black shirts) tossed a basketball between players on the same team. The task was to
count the number of times players on the white-shirt team tossed the ball to another play­
er on the same team. Halfway through the video, a man in a black gorilla suit walked into
the scene from off camera, passing through the middle of the action right between play­
ers, stopping in the middle of the screen to thump his chest, and then continuing off
screen to the left. Remarkably, half of the participants failed to notice the gorilla, even
though it was plainly visible for several seconds! Indeed, when shown a replay of the
video, participants were often skeptical that the event actually occurred during their initial viewing. The original video for this experiment can be viewed at http://www.dansimons.com/.

Similar results have been found in conditions with much higher stakes. For instance,
when practicing landing an airplane in a simulator with a heads-up display in which con­
trol panels are superimposed on the cockpit window, many pilots failed to see the appear­
ance of an unexpected vehicle on the runway (Haines, 1991). Attending to the control
panel superimposed over the runway appeared to prevent noticing completely visible and
critical information. Similarly, traffic accident reports often include accounts of drivers
“not seeing” clearly visible obstacles (McLay, Anderson, Sidaway, & Wilder, 1997). Such
occurrences are typically interpreted as attentional lapses: It seems that we have a se­
verely limited ability to perceive, understand, and act on information that falls outside the
current focus of attention.

Change-blindness studies provide further evidence for our limited ability to attend to vi­
sual information (J. K. O’Regan, Deubel, Clark, & Rensink, 2000; J. K. O’Regan, et al.,
1999; Rensink, et al., 1997; Simons, 2000; Simons & Levin, 1998; Simons & Rensink,
2005). In a typical (p. 258) change-blindness study, observers watch as two pictures are
presented in alternation, with a blank screen between them. The two pictures are identi­
cal, except for one thing that is changing, and the task is simply to identify what is chang­
ing. Even when the difference between images is a substantial visual change (e.g., a large
object repeatedly disappearing and reappearing), observers often fail to notice the
change for several seconds. For demonstrations, go to http://www.dansimons.com/ or
http://visionlab.harvard.edu/Members/George/demo-GradualChange.html. These studies
provide a dramatic demonstration that, although our subjective experience is that we
“see the whole visual field,” we are unable to detect a local change to our environment
unless we focus attention on the thing that changes (Rensink, et al., 1997; Scholl, 2000).
Indeed, with attention, such changes are trivially easy to notice in standard change-blind­
ness tasks, making the failure to detect such changes before they are localized all the
more impressive.

Thus, it appears that we need attention to keep track of information in a dynamic visual
environment. We fail to notice new visual information, or changes to old visual informa­
tion, unless we selectively attend to that information. If we could pay attention to all of
the incoming visual information at once, we would not observe these dramatic inatten­
tional blindness and change-blindness phenomena.

How We Tune Our Attention

Fortunately the visual system is equipped with a variety of mechanisms that control the
allocation of attention, effectively tuning our attention toward salient and task-relevant
information in the visual field. The mechanisms of attentional control are typically divided
into stimulus-driven and goal-driven mechanisms (for a review, see Yantis, 1998). Stimu­
lus-driven attentional shifts occur when there is an abrupt change in the environment,
particularly the appearance of a new object (Yantis & Jonides, 1984), or more generally
when certain dynamic events occur in the environment (Franconeri, Hollingworth, & Si­
mons, 2005; Franconeri & Simons, 2005). Our attention also tends to be guided toward
salient portions of the visual field, such as a location that differs from its surround in
terms of color or other features (Itti & Koch, 2000).

In addition to these stimulus-driven mechanisms, there are also a variety of goal-driven mechanisms that enable us to choose which visual information to select, including object-based attention, feature-based attention, and location-based attention.

Research on object-based attention suggests that attention can select discrete objects,
spreading through them and constrained by their boundaries, while suppressing informa­
tion that does not belong to the selected object. There is a wide range of empirical sup­
port for this theory. In the classic demonstration of object-based attention, attention is
cued to part of an object where the speed and accuracy of perceptual processing is en­
hanced. Critically, performance is enhanced at uncued locations that are part of the same
object, relative to uncued locations equally distant from the cue, but within part of anoth­
er object (Atchley & Kramer, 2001; Avrahami, 1999; Egly, Driver, & Rafal, 1994; Z. J. He &
Nakayama, 1995; Lamy & Tsal, 2000; Marino & Scholl, 2005; Vecera, 1994). Similarly, di­
vided attention studies have shown that, when the task is to report or compare two target
features, performance is better when the target features lie within the boundaries of the
same object compared with when the features are spatially equidistant but appear in dif­
ferent objects (Ben-Shahar, Scholl, & Zucker, 2007; Duncan, 1984; Kramer, Weber, & Wat­
son, 1997; Lavie & Driver, 1996; Valdes-Sosa, Cobo, & Pinilla, 1998; Vecera & Farah,
1994). The fact that these “same object” advantages occur suggests that visual attention
automatically selects entire objects. Many other paradigms provide evidence supporting
this hypothesis, including studies using functional imaging (e.g., O’Craven, Downing, &
Kanwisher, 1999), visual search (e.g., Mounts & Melara, 1999), negative priming (e.g.,
Tipper, Driver, & Weaver, 1991), inhibition of return (e.g., Reppa & Leek, 2003; Tipper et
al., 1991), object reviewing (e.g., Kahneman, Treisman, & Gibbs, 1992; Mitroff, Scholl, &
Wynn, 2004), attentional capture (e.g., Hillstrom & Yantis, 1994), visual illusions (e.g.,
Cooper & Humphreys, 1999), and patient studies (e.g., Humphreys, 1998; Ward,
Goodrich, & Driver, 1994).

Feature-based attention is the ability to tune attention to a particular feature (e.g., red)
such that all items in the visual field containing that feature are simultaneously selected.
Early research provided evidence for feature-based attention using the visual search par­
adigm (Bacon & Egeth, 1997; Egeth, Virzi, & Garbart, 1984; Kaptein, Theeuwes, & Van
der Heijden, 1995; Zohary & Hochstein, 1989). For example, Bacon and Egeth (1997)
asked observers to search for a red X among red Os and black Xs. When participants
were instructed to attend to (p. 259) only the red items, the time to find the target was in­
dependent of the number of black Xs. This finding suggests that attention could be limit­
ed to only the red items in the display, with black items filtered out. This example fits with
the intuition that if your friend were one of a few people wearing a red shirt at a St.
Patrick’s Day party, she would be relatively easy to find in the crowd. Other evidence that
attention can be tuned to specific stimulus features comes from inattentional blindness
studies (Most & Astur, 2007; Most et al., 2001, 2005). For example, Most and colleagues
(2001) asked observers to attentively track a set of moving objects, and asked whether
they noticed the appearance of an unexpected object on one of the trials. The likelihood
of noticing the unexpected object depended systematically on its similarity to the attend­
ed objects. For example, when observers attended to white objects, they were most likely
to notice the unexpected object if it was white, less likely if it was gray, and least likely if
it was black. Similarly, the time to find the target in a visual search task depends on the
similarity between targets and distractors (Duncan & Humphreys, 1989; Konkle, Brady,
Alvarez, & Oliva, 2010), as does the likelihood of a distractor capturing attention (Folk,
Remington, & Johnston, 1992). Other research has shown that attention can be tuned to
particular orientations or spatial frequencies (Rossi & Paradiso, 1995), as well as color
and motion direction (Saenz, Buracas, & Boynton, 2002, 2003). These effects of feature-
based attention appear to operate over the full visual field, such that attending to a par­
ticular feature selects all objects in the visual field that share the same feature, even for
stimuli that are irrelevant to the task (Saenz, et al., 2002).

Finally, location-based attention is the process of focusing attention on a particular location in space (e.g., “to the left”), so that information in the attended location is selected
and surrounding information is ignored or suppressed. You likely know from personal ex­
perience that it is possible to attend to visual information in the periphery, and early re­
searchers documented the same observation (Helmholtz, 1962; James, 1890). Early em­
pirical support for the idea that attention can be moved independent of the eyes came
from a series of experiments by Posner and colleagues (Posner, 1980; Posner, Snyder, &
Davidson, 1980). In the standard Posner cueing paradigm, observers keep their eyes fo­
cused on a fixation point at the center of the screen. The task is to respond to the presentation of a flash of light by pressing a key, but it is uncertain where that flash of light will
appear. Observers are given a hint about the most likely location via a cue that indicates
the most likely peripheral location where the flash will appear. On most trials the flash
appears in the cued location (valid trials), so there is an incentive to focus on the cued lo­
cation. However, on some trials, the flash actually appears in the uncued location (invalid
trials). Posner found that responses are faster and more accurate when they appear in the
cued location than when they appear in the uncued location, even when the eyes remain
focused at the central fixation point. This suggests that observers are able to shift their
attention away from where their eyes are looking and focus on locations in the periphery.
Other research has explored the spatial extent of the selected region (Engel, 1971; Erik­
sen & St. James, 1986; Intriligator & Cavanagh, 2001), the two-dimensional and three-di­
mensional shapes of the selected region (Downing & Pinker, 1985; LaBerge, 1983;
LaBerge & Brown, 1986), how attention is shifted from one location to another (Reming­
ton & Pierce, 1984; Shulman, Remington, & McLean, 1979), whether attention can be
split (Awh & Pashler, 2000; McMains & Somers, 2004), and the number of locations that
can be selected at once (Alvarez & Franconeri, 2007; Franconeri, Alvarez, & Enns, 2007).

Theories of Visual Attention


Our capacity to attend is quite limited, and there exist a variety of mechanisms for con­
trolling the allocation of attention so that it will be directed toward the most relevant sub­
set of the visual field. But why is our ability to attend so limited? There are many theories
of visual attention, but for the purposes of this chapter, two classes of theory are perhaps
the most relevant. Capacity-limitation theories of visual attention assume that the brain is
a finite system (fair assumption), and consequently is limited in the amount of informa­
tion it can process at once. On this view, we need attention to choose which inputs into
the visual system will be fully processed, and which inputs will not be fully processed. An
alternative framework, the attention-for-action framework, assumes that a particular
body part can only execute a single action at once (another fair assumption), and that at­
tention is needed to select a target for that action. On this view, visual attention is limited
because the action system is limited. Although not necessarily mutually exclusive, these
frameworks have been presented in opposition. In this section, I summarize these theo­
retical frameworks and conclude (1) that visual attention is clearly required to handle in­
formation processing (p. 260) constraints (supporting a capacity-limitation view) and (2)
that attention is not necessarily limited because action is limited. This is in line with the
view that attention is a process that operates at multiple levels (goal selection, action se­
lection, visual selection), and highlights the point that the process of selection in one do­
main need not be constrained by the process of selection in another domain. However, as
described in the following section, there is clearly a tight link between at least one form
of visual attention—visual-spatial attention—and action.

Capacity Limitations and Competition for Representation

One class of attentional theory assumes that there is simply too much information flood­
ing into the visual system to fully process all of it at once. Therefore, attentional mecha­
nisms are required to select which inputs to process at any given moment. This informa­
tion overload begins at the earliest stages of visual processing, when a massive amount of
information is registered. The retina has about 6.4 million cones and 110 to 125 million
rods (Østerberg, 1935), which is a fairly high-resolution sampling of the visual field
(equivalent to a 6.4-megapixel camera in the fovea). A million or so axons from each eye
then project information to the brain for further processing (Balazsi, Rootman, Drance,
Schulzer, & Douglas, 1984; Bruesch & Arey, 2004; Polyak, 1941; Quigley, Addicks, &
Green, 1982). These initial measurements must be passed on to higher-level processing
mechanisms that enable the visual system to recognize and localize objects in the visual
field. However, simultaneously processing all of these inputs in order to identify every ob­
ject in the visual field at once would be highly computationally demanding. Intuitively,
such massively parallel processing is beyond the capacity of the human cognitive system
(Broadbent, 1958; Neisser, 1967). This intuition is also supported by computational analy­
ses of visual processing (Tsotsos, 1988), which suggest that such parallel processing is
not viable within the constraints of the human visual system. Thus, the capacity limita­
tions on higher-level cognitive processes, such as object identification and memory encod­
ing, require a mechanism for selecting which subset of the incoming information should
gain access to these higher-level processes.
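The figures above imply a steep bottleneck even before cortical processing begins. A back-of-the-envelope calculation makes the compression explicit (a sketch only; the rod midpoint and the one-million-axon count are round approximations taken from the estimates cited above):

```python
# Back-of-the-envelope bottleneck from the figures cited above
# (Østerberg, 1935: ~6.4 million cones, 110-125 million rods;
# roughly a million optic-nerve axons per eye). Illustrative only.

cones = 6.4e6
rods = 117.5e6          # midpoint of the 110-125 million range
receptors = cones + rods
axons = 1.0e6           # approximate optic-nerve fibers per eye

compression = receptors / axons
print(round(compression))  # ≈ 124 receptors per axon
```

On these rough numbers, over a hundred photoreceptor measurements converge on each optic-nerve fiber, so selection and compression are built into the system well before any higher-level identification begins.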

Of course, there is good behavioral evidence that the visual system is not fully processing
all of the incoming information simultaneously, including the inattentional blindness and
change blindness examples described above. In your everyday life, you may have experi­
enced these phenomena while driving down the road and noticing a pedestrian that
seemed to “appear out of nowhere,” or missing the traffic light turning green even
though you were looking in the general direction of the light. Or you may have experi­
enced not being able to find something that was in plain sight, like your keys on a messy
desk. It seems safe to say that we are not always aware of all of the information coming
into our visual system at any given moment. Take a look at Figure 13.1, which depicts a
standard visual search task used to explore this in a controlled laboratory setting. Focus
your eyes on the “x” at the center of the display, and then look for the letter T, which is located somewhere in the display, tilted on its side in either the counterclockwise (|—) or clockwise (—|) direction. Try not to be fooled by the almost-Ts. Why does it take
so long to find the T? Is it that you “can’t see it?” Not really, because now that you know
where it is (far right, just below halfway down), you can see it perfectly well even while
continuing to focus your eyes at the x at the center.

Figure 13.1 Standard visual search display. Focus your eyes on the “x” at the center of the display, and then look for the letter T, which is located somewhere in the display, tilted on its side in either the counterclockwise (|—) or clockwise (—|) direction.

How can we understand our failure to “see” things that are plainly visible? One possibility
is that the visual system has many specialized neurons that code for different properties
of objects (e.g., some neurons code for color, others code for shape), and that attention is
required to combine these different features into a single coherent object representation
(Treisman & Gelade, 1980). Attending to multiple objects at once would result in confu­
sion between the features of one object and the features of the other attended objects. On
this account, attention can only perform this feature integration operation correctly if it
operates over just one object at a time. (p. 261) Thus, you cannot see things that are plainly visible because at any given moment you can only accurately see the current object of attention.

An alternative theory, biased competition (Desimone, 1998; Desimone & Duncan, 1995),
places the capacity limit at the level of neuronal representation. In this model, when mul­
tiple objects fall within the receptive field of a neuron in visual cortex, those objects com­
pete for the response of the neuron. Neurons are tuned such that they respond more for
some input features than for other input features (e.g., a particular neuron might fire
strongly for shape A and not at all for shape B). How would such a neuron respond when
multiple inputs fall within its receptive field simultaneously (e.g., if both shape A and
shape B fell within the receptive field)? One could imagine that you would observe an av­
erage response, somewhere between the strong response to the preferred stimulus and a
weak response to the nonpreferred stimulus. However, it turns out that the response of
the neuron depends on which object is attended, with stronger responses when the pre­
ferred stimulus is attended and weaker responses when the nonpreferred stimulus is at­
tended. Thus, attention appears to bias this neuronal competition in favor of the attended
stimulus. Both stimulus-driven and goal-driven mechanisms can bias the representation of
the neuron in favor of one object over the other, and the biasing signal (“tuning mecha­
nism”) can be feature based or space based. This biasing signal enhances the response to
the attended object and suppresses the response to the unattended object. The capacity
limit in this case is at the level of the individual neuron, which cannot represent two ob­
jects at once, and therefore attentional mechanisms are required to determine which ob­
ject is represented. This theory has been supported by a variety of physiological evidence
in monkeys (for a review, see Desimone, 1998) and is consistent with behavioral evidence
in humans (Carlson, Alvarez, & Cavanagh, 2007; Motter & Simoni, 2007; Scalf & Beck,
2010; Torralbo & Beck, 2008).

An analysis of the complexity of the problems the visual system must solve rules out the
possibility that a human-sized visual system could fully process all of the incoming visual
information at once. The limits on processing capacity can be conceived as limits on spe­
cific computations (e.g., feature binding), or in terms of neuronal competition (e.g., com­
peting for the response of a neuron). On either view, it is clear that visual selection is necessitated, at least in part if not in full, by capacity limitations that arise within the stream of visual processing owing to the architecture of the visual system.

Attention for Action

An alternative class of theories holds that attention is limited because action is limited.
Because of physical constraints, your body parts can only move in one direction at a time:
Each hand can only move in one direction, each eye can only move in one direction, and
so on. According to the attention-for-action view, the purpose of visual attention is to se­
lect the most relevant visual information for action and to suppress irrelevant visual infor­
mation. Consequently, you are limited in the number of things you can pay attention to at
once because your body is limited in the number of things it can act on at once. In its
strongest form, the attention-for-action theory proposes that attentional selection limits
are not set by information processing constraints: Attention is limited because action is
limited (Allport, 1987, 1989, 1993; Neumann, 1987, 1990; Van der Heijden, 1992). The
most appealing aspect of the attention-for-action theory is the most intuitive part: It
makes sense that we would pay attention to things while we perform actions on them.
Even without any data, I am convinced that I attend to the handle of my coffee mug just
before I pick it up—and of course eye-movement studies support this intuition (Land &
Hayhoe, 2001). However, the satisfaction of this part of the attention-for-action theory
does not imply that attention is limited because action is limited. It is possible for atten­
tion and action to be tightly coupled, but for attentional limitations to be determined by
factors other than the limitations on action, such as the architecture of the visual system
(e.g., neurons that encompass large regions of the visual field consisting of multiple ob­
jects). I will support a view in which visual attention selects the targets for action, but in
which the limitations on visual attention are not set by the limitations on action.

Where Attention and Action Diverge

To test the claim that attention is limited because action is limited, we can examine the limits on visual attention and ask whether they match the limits on physical action. For instance, feature-based visual attention is capable of globally selecting a feature such as
“red” or “moving to the right,” such that all objects or regions in the visual field sharing
that feature receive enhanced processing over objects or regions that do not have the se­
lected feature (Bacon & Egeth, 1997; Rossi & Paradiso, 1995; Saenz, et al., 2002, 2003;
Serences & Boynton, 2007; Treue & Martinez Trujillo, 1999). Consequently, in some situa­
tions feature-based (p. 262) attention results in selecting dozens of objects, which far ex­
ceeds the action limit of one (or a few) objects at once. Perhaps one could argue that such
a selected group constitutes a single “object of perception” —whereby within-group
scrutiny requires additional selective mechanisms—but because this single perceptual
group can be distributed across the entire visual field and interleaved with other “unse­
lected” information, it cannot be considered a potential object of action (which is neces­
sarily localized in space). Thus, action limits cannot explain the limits on feature-based at­
tention.

One might argue that attention-for-action theory does not apply to feature-based atten­
tion, but rather that it applies only to spatial attention. This is a fair point because action
occurs within the spatial domain. However, visual-spatial attention also shows many con­
straints that do not match obvious action limits. Most notably, although action is limited
to a single location at any given moment, it is possible to attend to multiple objects or lo­
cations simultaneously, in parallel (Awh & Pashler, 2000; McMains & Somers, 2004;
Pylyshyn & Storm, 1988; for a review of multifocal attention, see Cavanagh & Alvarez,
2005). For instance, a common task for exploring the limits on spatial attention is the
multiple-object-tracking task. In this task, observers view a set of identical moving ob­
jects. A subset is highlighted as targets to be tracked, and then all items appear identical
and continue moving. The task is to keep track of the target items, but because there are
no distinguishing features of the objects, participants must continuously attend to the ob­
jects in order to track them. Observers are able to keep track of one to eight objects con­
currently, depending on the speed and spacing between the items (Alvarez & Franconeri,
2007), and the selection appears to be multifocal, selecting targets but not the space be­
tween targets (Intriligator & Cavanagh, 2001; Pylyshyn & Storm, 1988). The upper bound
on spatial attention of at least eight objects is far beyond the action limit of a single loca­
tion. One might argue that multiple locations can be attended because it is possible to
plan complex actions that involve acting on multiple locations sequentially. However, on
this view, there is no reason for there to be any limit on the number of locations that
could be attended because in theory an infinite sequence of actions could be planned.

Page 11 of 32
Attention and Action

Thus, action limitations seem to be unrelated to the limit on the number of attentional fo­
ci that can be deployed.
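To make the paradigm concrete, the stimulus side of a tracking trial can be sketched in a few lines of Python. The object counts, speed, and field size below are illustrative choices, not values from any particular study.

```python
import math
import random

def mot_trial(n_objects=8, n_targets=4, n_frames=120, speed=4.0,
              field=500.0, seed=0):
    """Sketch of a multiple-object-tracking stimulus. All objects are
    visually identical; a subset is cued as targets before motion begins,
    and the cue then disappears."""
    rng = random.Random(seed)
    positions = [[rng.uniform(0, field), rng.uniform(0, field)]
                 for _ in range(n_objects)]
    targets = set(range(n_targets))  # cued, then indistinguishable
    for _ in range(n_frames):
        for p in positions:
            theta = rng.uniform(0.0, 2.0 * math.pi)  # independent random motion
            p[0] = min(field, max(0.0, p[0] + speed * math.cos(theta)))
            p[1] = min(field, max(0.0, p[1] + speed * math.sin(theta)))
    # Nothing in the final display marks the targets: only sustained,
    # multifocal attention can keep them distinct from the distractors.
    return positions, targets
```

Because each object moves independently, no static grouping or single-fixation strategy can substitute for attending to each target throughout the motion.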

Other constraints on spatial attention are also difficult to explain within the attention-for-
action framework. In particular, there appear to be important low-level, anatomical con­
straints on attentional processing. For instance, the spatial resolution of attention is
coarser in the upper visual field than in the lower visual field (He, Cavanagh, & Intriliga­
tor, 1996; Intriligator & Cavanagh, 2001), and it is easier to maintain sustained focal at­
tention along the horizontal meridian than along the vertical meridian (Mackeben, 1999).
It is difficult to understand these limitations in terms of action constraints, unless we as­
sume that our hands are less able to act on the upper visual field (perhaps because of
gravity) and less able to act on the vertical meridian (it seems awkward to perform ac­
tions above and below fixation). An alternative explanation is that the upper/lower asym­
metry is likely to be associated with visual areas in which the lower visual field is overrep­
resented. In monkeys, there is an upper/lower asymmetry that increases from relatively
modest in V1 (Tootell, Switkes, Silverman, & Hamilton, 1988) to much more pronounced
in higher visual areas like MT (Maunsell & Van Essen, 1987) and parietal cortex (Galletti,
Fattori, Kutz, & Gamberini, 1999). The horizontal/vertical meridian asymmetry is likely to
be linked to the relatively lower density of ganglion cells along the vertical meridian rela­
tive to the horizontal meridian (Curcio & Allen, 1990; Perry & Cowey, 1985) and possibly
to the accelerated decline of cone density with eccentricity along the vertical meridian
relative to the horizontal meridian (Curcio, Sloan, Packer, Hendrickson, & Kalina, 1987).

There is some disagreement as to whether these asymmetries should be considered effects of attentional processing or lower-level perceptual constraints (Carrasco, Talgar, &
Cameron, 2001). However, other effects, such as interference effects between targets and
distractors, or targets and other targets, are clearly limitations on attentional processing
and not visual perception (Alvarez & Cavanagh, 2005; Carlson et al., 2007). One particu­
larly dramatic demonstration of a visual field effect on attentional processing is the hemi­
field independence observed in attentive tracking (Alvarez & Cavanagh, 2005). In this at­
tentive tracking task, observers kept their eyes focused at the center of the display and
attentively tracked moving targets in one of the four quadrants of the peripheral visual
field. Surprisingly, observers could keep track of twice as many targets when they ap­
peared in separate halves of the visual field (e.g., in the top left and top right quadrants)
than when they appeared in the same half of the visual field (e.g., in the top right and bot­
tom right quadrants). (p. 263) It was as if the attentional processes required to track a
moving object could operate independently in the left and right visual hemifields. This de­
gree of independence is surprising since attentional selection is often considered to be a
high-level cognitive process, and hemifield representations are characteristic of lower-
level visual areas (cf. Bullier, 2004).
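A toy capacity rule captures the arithmetic of this hemifield independence. The per-hemifield capacity of two targets below is an illustrative stand-in for the speed-dependent limits reported by Alvarez and Franconeri (2007), not a fixed constant.

```python
def tracking_ok(targets_by_quadrant, per_hemifield_capacity=2):
    """Toy rule for hemifield-independent tracking: if the left and right
    hemifields draw on separate attentional resources, the binding limit is
    the number of targets per hemifield, not the overall total."""
    left = (targets_by_quadrant.get("top_left", 0)
            + targets_by_quadrant.get("bottom_left", 0))
    right = (targets_by_quadrant.get("top_right", 0)
             + targets_by_quadrant.get("bottom_right", 0))
    return left <= per_hemifield_capacity and right <= per_hemifield_capacity

# Four targets split across hemifields are trackable, but the same four
# targets within one hemifield exceed that hemifield's resource.
```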

It is tempting to link these sorts of hemifield effects with hemispheric control of action by
assuming that contralateral control of the body is linked to hemifield constraints on atten­
tional selection (e.g., where the right hemisphere controls reaching and attention to the
left visual field, and the left hemisphere controls reaching and attention to the right visu­
al field). However, there are also quadrantic limits on attentional selection (Carlson et al.,
2007) that are not amenable to such an account. Specifically, attended targets interfere
with each other to a greater extent when they both appear within the same quadrant of
the visual field than when they are equally far apart but appear in separate quadrants of
the visual field. It is difficult to understand these limits on spatial attention within an at­
tention-for-action framework because there are no visual field quadrantic effects on body
movement. Instead, a capacity limit account, particularly within the competition-for-rep­
resentation framework, provides a more natural explanation for these visual field effects.
For instance, by taking known anatomy into account, we can find some important clues as
to the cause of the quadrantic deficit. Visual effects that are constrained by the horizontal
and vertical meridian can be directly linked to extrastriate areas V2 and V3 (Horton &
Hoyt, 1991), which maintain noncontiguous representations of the four quadrants of the
visual field. Increasing the distance between the lower-level cortical representations of
each target appears to decrease the amount of attentional interference between them.
One possibility is that cortical distance is correlated with the degree of the overlap be­
tween the receptive fields of neurons at two attended locations (i.e., the degree of compe­
tition for representation). On this account, the release from interference over the meridi­
ans suggests that receptive fields of neurons located on these intra-areal borders may not
extend across quadrants of the visual field. This account is purely speculative, but for our
purpose the important point is that there are quadrant-level attentional interference ef­
fects, and anatomical constraints offer some potential explanations, whereas limits on ac­
tion do not.
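To make the speculative receptive-field account concrete, interference can be modeled as a toy function of cortical distance. The Gaussian form and the millimeter scale below are arbitrary illustrative choices, not measured values.

```python
import math

def interference(cortical_distance_mm, rf_scale_mm=3.0):
    """Toy competition-for-representation measure: receptive-field overlap,
    and hence attentional interference between two targets, modeled as a
    Gaussian falloff with the cortical distance between their representations."""
    return math.exp(-cortical_distance_mm ** 2 / (2.0 * rf_scale_mm ** 2))

# Two targets in the same quadrant map to nearby cortex in areas such as
# V2 and V3; equally spaced targets in different quadrants map to
# noncontiguous representations, so their cortical separation is larger
# and the predicted interference is smaller.
same_quadrant = interference(2.0)
across_meridian = interference(12.0)
```

On this sketch, a release from interference across a meridian simply reflects the larger cortical separation of the two target representations.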

These are just a few examples to illustrate that visual attention is not limited only be­
cause action is limited. Many constraints on visual-spatial attention are not mirrored by
constraints on the action system, including limits on the number of attentional foci that
can be maintained at once and visual field constraints on attentional selection. Such con­
straints can be understood in terms of capacity limits, in the form of competition for
representation, by taking the architecture of the human visual system into account. Thus, to a
great extent, the mechanisms and limitations on visual-spatial attention can be under­
stood separately and independently from the limitations on action. This should not be tak­
en to mean that the process of spatial attention is somehow completely unrelated to the
action system, or that the action system does not constrain visual selection. Visual atten­
tion and action are component processes of an integrated cognitive system, and as de­
scribed in the following section, the two systems are tightly coupled.

Spatial Attention and Eye Movements


Although we have argued that the limits on the capacity of attention are largely indepen­
dent of the limits on action, there is clearly an important role for visual attention in the
selection of the targets for action. This tight coupling between visual selection and action
selection is most apparent with the interaction between visual-spatial attention and eye
movements. There is strong evidence that before you move your eyes to a location, visual
attention first selects that location, as if attention selects the targets of eye movements.
Indeed, theories such as the premotor theory of attention (Rizzolatti, Riggio, Dascola, &
Umilta, 1987) propose that the mechanisms of visual attention are the same mechanisms
that control eye movements. However, even within the domain of vision, attention is a
process that operates at many levels, and it is likely that some aspects of visual attention
do not perfectly overlap with the eye-movement system; indeed, the evidence suggests
some degree of separation between the mechanisms of attention and eye movements.
Nevertheless, the idea that there is a high degree of overlap between the mecha­
nisms of visual-spatial attention and eye movements is well supported by both behavioral
and neurophysiological evidence.

Covert Visual-Spatial Attention

One logical possibility for the relationship between visual-spatial selection and eye move­
ments is that the focus of attention is locked into alignment with the center of fixation (at
the fovea). That (p. 264) is, you might always be attending where your eyes are fixating.
However, evidence that attention can be shifted covertly away from fixation rejects this
possibility (Eriksen & Yeh, 1985; J. M. Henderson, 1991; Posner, 1980). For instance,
Posner’s cueing studies suggest that observers are able to shift their attention away from
where their eyes are looking, and focus on locations in the periphery. This ability to shift
attention away from the eyes is known as covert attention.

As described above, other research has shown that it is possible to attend to multiple ob­
jects simultaneously (Pylyshyn & Storm, 1988). It is impossible to perform this task using
eye movements alone because the targets move randomly and independently of each oth­
er. In other words, you can only look directly at one object at a time, and yet it is possible
to track multiple objects simultaneously (anywhere from two to eight, depending on the
speed and spacing of the items; Alvarez & Franconeri, 2007). Also, this task cannot be
performed by grouping items into a single object (Yantis, 1992) because each item moves
randomly and independently of the other items. Consequently, even if the items are per­
ceptually grouped, the vertices must be independently selected and tracked to perform
the task. Thus, it would appear that we have a multifocal, covert attention system that
can select and track multiple moving objects at once, independent of eye position (Ca­
vanagh & Alvarez, 2005).

Behavioral Evidence Linking Visual-Spatial Attention and Eye Movements

Figure 13.2 Schematic of the paradigm employed by McConkie and Rayner (1975) for
examining the span of perception. Observers read some passage of text (a) while their
eyes are tracked (fixation position denoted by the *). Outside of a window around the
fixation, all letters are changed to x. When the window is large (b), observers don't
notice, but when the window is small (c), they do notice. In general, the span of
perception is asymmetrical, with a larger span ahead of the direction of eye movements (d).

It is clear that attention can be shifted covertly away from the center of fixation, but that
visual-spatial attention and eye movements appear tightly coupled. Specifically, it appears
that attention to a location precedes eye movements to that location. Early studies explor­
ing eye movements during reading found evidence suggesting that attention precedes
saccadic eye movements. McConkie and Rayner (1975) developed a clever paradigm
called the “moving window” or “gaze-contingent display” to investigate perceptual pro­
cessing during reading. They had observers read text (Figure 13.2a), and changed all of
the letters away from fixation into the letter x. You might think this would be easily no­
ticeable to the readers, but it was possible to change most of the letters to x without read­
ers noticing. To accomplish this, researchers created a “window” around fixation where
the text always had normal letters, and as observers moved their eyes, the window moved
with them. Interestingly, observers did not notice this alteration of the text when the win­
dow was large (Figure 13.2b), but they did notice when the window was small (Figure
13.2c). Thus, by manipulating the size of the moving window, it is possible to determine
the span of perception during reading (the number of letters readers notice as they read).
Notably, the span of perception is highly asymmetrical around the point of fixation,
with three to four letters to the left of fixation, and fourteen to fifteen letters to the right
of fixation (McConkie & Rayner, 1976) (Figure 13.2d).
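The core logic of the gaze-contingent display can be sketched as a masking function. The default span values echo the asymmetry described above, but the function itself is only an illustration, not the original implementation.

```python
def moving_window(text, fixation, span_left=4, span_right=14, mask="x"):
    """Sketch of a moving-window display in the spirit of McConkie and
    Rayner (1975): letters outside an asymmetric window around the current
    fixation are replaced with a mask character; spaces stay intact."""
    lo, hi = fixation - span_left, fixation + span_right
    return "".join(
        ch if (lo <= i <= hi or not ch.isalpha()) else mask
        for i, ch in enumerate(text)
    )

# In the experiment the window is recomputed from the eye tracker on every
# display refresh, so normal text always surrounds the point of fixation.
```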

What does this have to do with the relationship between attention and eye movements?
One interpretation of this finding is that during reading, eye movements to the right are
preceded by a shift of attention to the right. This interpretation assumes that attention
enhances the perception of letters, which is consistent with the known enhancing effects
of attention (Carrasco, Ling, & Read, 2004; Carrasco, Williams, & Yeshurun, 2002; Titch­
ener, 1908; Yeshurun & Carrasco, 1998). This interpretation is further supported by the
finding that readers of Hebrew text, which is read from right to left, show the opposite
pattern: More letters are perceived to the left of fixation than to the right of fixation (Pol­
latsek, Bolozky, Well, & Rayner, 1981). This asymmetry of the perceptual span does not
appear to rely on extensive reading practice: In general, the perceptual span appears to
be asymmetrical in the direction of eye movements, even when reading in an atypical di­
rection (Inhoff, Pollatsek, Posner, & Rayner, 1989) or when scanning objects in the visual
field in a task that does not involve reading (Henderson, Pollatsek, & Rayner, 1989).

Figure 13.3 Schematic of the paradigm employed by Hoffman and Subramaniam (1995)
for examining the link between saccade location and attention. A location is cued to
designate where the eyes should move, but the eye movement is not executed right away.
Then a tone is played, indicating that the eyes should now move to the target location.
Before the eyes have a chance to move, letters briefly appear. The task is to determine
whether one of those letters is a T or an L, and the target letter can appear at any of the
four possible locations. However, the letter is detected more accurately when it happens
to appear where the eye movement is planned (i.e., the eventual location of the eyes).

Saccadic eye movements are ballistic movements of the eyes from one location to anoth­
er. Direct (p. 265) evidence for attention preceding saccadic eye movements comes from
studies showing enhanced perceptual processing at saccade target locations before sac­
cade execution (Chelazzi et al., 1995; Crovitz & Daves, 1962; J. M. Henderson, 1993;
Hoffman & Subramaniam, 1995; Schneider & Deubel, 1995; Shepherd, Findlay, & Hock­
ey, 1986). The assumption of such studies is that faster, more accurate perceptual pro­
cessing is a hallmark of visual-spatial attention. In one study, Hoffman and Subramaniam
(1995) used a dual-task paradigm in which they presented a set of four possible saccade
target locations. For the first task, an arrow cue pointed to one of the four locations, and
observers were instructed to prepare a saccade to the cued location (Figure 13.3). How­
ever, observers did not execute the saccade until they heard a beep. The critical question
was whether attention would obligatorily be focused on the saccade target location,
rather than on the other locations. Because attention presumably enhances perceptual
processing at attended locations, it is possible to use a secondary task to probe whether
attention is allocated to the saccade target location. If perceptual processing were en­
hanced at the saccade target location relative to other locations, then it would appear
that attention was allocated to the saccade target location. For the second task, four let­
ters were briefly presented, with one letter appearing in each of the four possible sac­
cade target locations (see Figure 13.3). One of those letters was the target letter (either a
T or an L), and the other letters were distractor letters, Es and Fs. The task was to identi­
fy whether a T or L was present, and the T or L could appear at any position, independent
of the saccade target location. Because the letter could appear anywhere, there was no
incentive to look for the target letter in the saccade target position, and yet observers
were more accurate in detecting the target letter when it appeared at the saccade target
location. In a follow-up experiment, it was shown that observers could not attend to one
location and simultaneously move the eyes to a different location. Thus, the allocation of
attention to the saccade target location is in fact obligatory, occurring even when
conditions favor attending to a location other than the saccade target location.

Several studies have taken a similar approach, demonstrating that it does not appear pos­
sible to make an eye movement without first attentionally selecting the saccade target lo­
cation. Shepherd, Findlay, and Hockey (1986) showed observers two boxes, one to the left
and one to the right of fixation. A central arrow cued one of the locations, and observers
were required to make a saccade to the cued location. Shortly before or after the saccade,
a probe stimulus (a small square) appeared, and observers had to press a key as quickly
as possible. Critically, observers knew that the probe would most likely appear in the
box opposite of the saccade target location, providing an incentive to attend to the box
opposite of the saccade target if possible. Nevertheless, response times were faster when
the saccade target location and the probe location were the same, relative to when they
appeared in different boxes. This was true even when the probe appeared before the sac­
cade occurred. It seems difficult or impossible to move the eyes in one direction while at­
tending to a location in the opposite direction. Thus, although it is possible to shift atten­
tion without moving the eyes, it does not seem possible to shift the eyes without shifting
attention in that direction first.

This tight coupling between visual-spatial attention and eye movements is not (p. 266)
only true for saccadic eye movements but also appears to hold for smooth pursuit eye
movements (Khurana & Kowler, 1987; van Donkelaar, 1999; van Donkelaar & Drew, 2002).
Smooth pursuit eye movements are the smooth, continuous eye movements used to main­
tain fixation on a moving object. Indirect evidence for the interaction between attention
and smooth pursuit eye movements comes from studies requiring observers to smoothly
track a moving target with their eyes, and then make a saccade toward a stationary pe­
ripheral target. Such saccades are faster (Krauzlis & Miles, 1996; Tanaka, Yoshida, &
Fukushima, 1998) and more accurate (Gellman & Fletcher, 1992) to locations ahead of
the pursuit direction relative to saccades toward locations behind the pursuit direction.
These results are consistent with the notion that attention is allocated ahead of the direc­
tion of ongoing smooth pursuit. Van Donkelaar (1999) provides more direct evidence in fa­
vor of this interpretation. Observers were required to keep their eyes focused on a
smoothly moving target, while simultaneously monitoring for the appearance of a probe
stimulus. The probe was a small circle that could be flashed ahead of the moving target
(in the direction of pursuit) or behind the moving target (in the wake of pursuit). Critical­
ly, participants were required to continue smooth pursuit throughout the trial, and to
press a key upon detecting the probe without breaking smooth pursuit. The results
showed that responses were significantly faster for probes ahead of the direction of pur­
suit. This suggests that attentional selection of locations ahead of the direction of pursuit
is required to maintain smooth pursuit eye movements. In support of this interpretation,
subsequent experiments showed that probe detection is fastest at the target location and
just ahead of the target location, with peak enhancement ahead of eye position depending
on pursuit speed (greater speed, further ahead; van Donkelaar & Drew, 2002).

In summary, the behavioral evidence shows enhanced perceptual processing before eye
movements at the eventual location of the eyes. Because attention is known to enhance
perceptual processing in terms of both speed and accuracy, this behavioral evidence sup­
ports a model in which visual-spatial attention selects the targets of eye movements.
However, the behavioral evidence alone cannot distinguish between a model in which vi­
sual-spatial attention and eye-movement control are in fact a single mechanism, as op­
posed to two separate but tightly interconnected mechanisms. One approach to directly
address this question is to investigate the neural mechanisms of visual-spatial attention
and eye-movement control.

Neurophysiological Evidence Linking Visual-Spatial Attention and Eye Movements

Research on the neural basis of visual-spatial attention and eye movements strongly sug­
gests that there is a great deal of overlap between the neural mechanisms that control at­
tention and the neural mechanisms that control eye movements.

Functional Magnetic Resonance Imaging in Humans

Much of the research on the neural substrates of visual-spatial attention and eye move­
ments in humans has been conducted with fMRI. The approach of such studies is to local­
ize the brain regions that are active when observers make covert shifts of visual-spatial
attention and compare them to the brain regions that are active when observers move
their eyes (Beauchamp, Petit, Ellmore, Ingeholm, & Haxby, 2001; Corbetta et al., 1998).
If shifting attention and making eye movements activate the same brain regions, it would
suggest that visual-spatial attention and eye movements employ the same neural mecha­
nisms.

In one study, Beauchamp et al. (2001) had observers perform either a covert attention
task or an eye-movement task. In the covert attention task, observers kept their eyes fix­
ated on a point at the center of the screen and attended to a target that jumped from lo­
cation to location in the periphery. In the eye-movement task, observers moved their eyes,
attempting to keep their eyes focused on the location of the target as it jumped from posi­
tion to position. In each condition, the rate at which the target changed location was var­
ied from 0.2 times per second to 2.5 times per second. Activation for both tasks was com­
pared with a control task in which the target remained stationary at fixation. Relative to
this control task, several areas were more active during both the covert attention task
and the eye-movement task, including the precentral sulcus (PreCS), the intraparietal
sulcus (IPS), and the lateral occipital sulcus (LOS). Activity in the PreCS appeared to have two
distinct foci, and the superior PreCS has been identified as the possible homolog of the
monkey frontal eye fields (FEFs) (Paus, 1996), which play a central role in the control of
eye movements.

Although these brain regions were active during both the covert attention task (p. 267)
and the eye-movement task, the level of activity was significantly greater during the eye-
movement task. Based on this result, Beauchamp et al. (2001) proposed the intriguing hy­
pothesis that a shift of visual-spatial attention is simply a subthreshold activation of the
oculomotor control regions. Moderate, subthreshold activity in oculomotor control re­
gions will cause a covert shift of spatial attention without triggering an eye movement,
whereas higher, suprathreshold activity in those same regions will cause eye movement.
Of course, the resolution of fMRI does not allow us to determine that the exact same neu­
rons involved in shifting attention are involved with eye movements. A plausible alterna­
tive is that separate populations of neurons underlie shifts of attention and the control of
eye movements, but that these populations are located in roughly the same anatomical re­
gions (see Corbetta et al., 2001, for a discussion of this possibility). However, if attention
and eye movements had separate underlying populations of neurons, the degree of over­
lap between the two networks would suggest a functional purpose, perhaps to enable
communication between the systems. Thus, on either view, there is strong evidence for
shared or tightly interconnected neural mechanisms for visual-spatial attention and eye
movements.
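On the subthreshold-activation reading, the same oculomotor signal could be summarized by a threshold rule. The threshold value here is arbitrary and stands in for whatever activation level actually triggers a saccade.

```python
def oculomotor_outcome(activation, saccade_threshold=1.0):
    """Toy threshold rule for the shared-mechanism hypothesis: moderate
    activity in oculomotor control regions yields only a covert shift of
    spatial attention; suprathreshold activity triggers an eye movement."""
    if activation <= 0.0:
        return "no shift"
    if activation < saccade_threshold:
        return "covert attention shift"
    return "saccade"
```

A single graded control signal thus suffices to produce two qualitatively different behaviors, which is what makes the hypothesis parsimonious.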

Recall that the behavioral evidence presented above suggests that a shift of attention
necessarily precedes an eye movement. Thus, even if eye movements required neural
mechanisms completely separate from shifts of attention, the brain regions involved in
the eye movement task should include both the attention regions (which behavioral work
suggests must be activated to make an eye movement) and those involved specifically
with making eye movements (if there are any). How do we know that all of these active
regions are not just part of the attention system? The critical question is not only whether
the brain regions for attention and eye movements overlap but also whether there is any
nonoverlap between them because the nonoverlap may represent eye-movement-specific
neural mechanisms. Physiological studies in monkeys provide some insight into this ques­
tion.

Neurophysiological Studies in Monkeys


Both cortical and subcortical brain regions are involved in planning and executing sac­
cadic eye movements, including the FEF (Bizzi, 1968; Bruce & Goldberg, 1985; Schall,
Hanes, Thompson, & King, 1995; Schiller & Tehovnik, 2001), the supplementary eye field
(Schlag & Schlag-Rey, 1987), the dorsolateral prefrontal cortex (Funahashi, Bruce, &
Goldman-Rakic, 1991), the parietal cortex (Mountcastle, Lynch, Georgopoulos, Sakata, &
Acuna, 1975; D. L. Robinson, Goldberg, & Stanton, 1978), the pulvinar nucleus of the
thalamus (Petersen, Robinson, & Keys, 1985), and the superior colliculus (SC) (Robinson,
1972; Wurtz & Goldberg, 1972a, 1972b).

Research employing direct stimulation of neurons in the FEF and SC supports the hypoth­
esis that the same neural mechanisms that control eye movements also control covert
shifts of spatial attention. For instance, Moore and Fallah (2004) trained monkeys to de­
tect a target (a brief dimming) in the peripheral visual field. It was found that subthresh­
old stimulation of FEF cells improved performance at peripheral locations where the
monkey’s eyes would move if suprathreshold stimulation were applied. Similarly, Ca­
vanaugh and Wurtz (2004) have shown that change detection is improved by subthresh­
old microstimulation of SC, and this improvement was only observed when the change oc­
curred at the location where the eyes would move if suprathreshold stimulation were ap­
plied. Thus, it appears that the cells in the FEF and SC that control the deployment of eye
movements also control covert shifts of spatial attention.

Although compelling, these microstimulation studies are still limited by the possibility
that microstimulation likely influences several neurons within the region of the electrode.
Thus, it remains possible that eye movements and attention are controlled by different
cells, but these cells are closely intertwined. Indeed, there is evidence supporting the
possibility that some neurons code for eye movements and not spatial selection, whereas
other neurons code for spatial selection and not eye movements. For instance, Sato and
Schall (2003) found that some neurons within the FEF appear to code for the locus of spa­
tial attention, whereas others code only for the saccade location. Moreover, Ig­
nashchenkova et al. (2004) measured single-unit activity in the SC and found that visual
and visuomotor neurons were active during covert shifts of attention, but that purely mo­
tor neurons were not. Thus, both the FEF and SC appear to have neurons that are in­
volved with both attention and eye movements, as well as some purely motor eye-move­
ment neurons.

Summary

What is the relationship between attention and action? To answer this question we must
first (p. 268) clarify, at least a little bit, what we mean by the term “attention.” In this
chapter the term has been used to refer to the process of selection that occurs at multiple
levels of representation within the cognitive system, including goal selection, visual selec­
tion, and action selection. An overview of visual attention was presented, highlighting the
capacity limitations on visual processing and the attentional mechanisms by which we
cope with these capacity limitations, effectively tuning the system to the most important
subset of the incoming information. There has been some debate about whether the limits
on visual attention are due to information processing constraints or to limitations on
physical action. The idea that visual attention is limited by action is undermined by the
mismatch in limitations between visual attention and physical action. Moreover, there is
ample evidence demonstrating that the architecture of the visual system alone, indepen­
dent of action, requires mechanisms of attention and constrains how those mechanisms
operate. In other words, attention is clearly required to manage information processing
constraints within the visual system, irrespective of limitations on physical action. This
does not rule out the possibility that action limitations impose additional constraints on
visual attention, but it does argue against the idea that the constraints on visual attention
derive from action limitations alone.

Despite the claim that attention is not limited only because action is limited, it is clear
that visual-spatial attention and action are tightly coupled. Indeed, in the case study of
spatial attention and eye movements, the attention system and action system are highly
overlapping both behaviorally and neurophysiologically, and where they do not overlap
completely they are tightly intertwined. In this case the overlap makes perfect sense be­
cause attending to a location in space and moving the eyes to a position in space require
similar computations (i.e., specifying a particular spatial location relative to some refer­
ence frame). It is interesting to consider whether other mechanisms of visual selection
(feature based, object based) are as intimately tied to action, or whether the coupling is
primarily between spatial attention and action. Further, the current chapter focused on
eye movements, but reaching movements and locomotion are other important forms of ac­
tion that must be considered as well. Detailed models of the computations involved in
these different forms of attentional selection and the control of action would predict that,
like the case of spatial attention and eye movements, visual selection and action mecha­
nisms are likely to overlap to the extent that shared computations and representations
are required by each. However, it is likely that the relationship between attention and ac­
tion will vary across different forms of attention and action. For instance, there is some
evidence that the interaction between visual-spatial attention and hand movements might
be more flexible than the interaction between spatial attention and eye movements.
Specifically, spatial attention appears to select the targets for both eye movements and
hand movements. However, it is possible to shift attention away from the eventual hand
location before the hand is moved, but it is not possible to shift attention away from the
eventual eye position before the eye has moved (Deubel & Schneider, 2003).

Our understanding of visual-spatial attention and action has been greatly expanded by
considering the relationship between these cognitive systems. Further exploring similar
questions within other domains of attention (e.g., goal selection, other mechanisms of vi­
sual selection, auditory selection, tactile selection) and other domains of action (e.g.,
reaching, locomotion, complex coordinated actions) is likely to provide an even greater
understanding of attention and action, and of how they operate together within an inte­
grated cognitive system.

References
Allport, D. A. (1987). Selection for action: Some behavioral and neurophysiological con­
siderations of attention and action. In H. Heuer & A. F. Sanders (Eds.), Perspectives on
perception and action (pp. 395–419). Hillsdale, NJ: Erlbaum.

Allport, D. A. (1989). Visual attention. In M. I. Posner (Ed.), Foundations of cognitive science (pp. 631–682). Cambridge, MA: MIT Press.

Page 21 of 32
Attention and Action

Allport, D. A. (1993). Attention and control. Have we been asking the wrong questions? A
critical review of twenty-five years. In D. E. Meyer & S. Kornblum (Eds.), Attention and
performance XIV: Synergies in experimental psychology, artificial intelligence, and cogni­
tive neuroscience (pp. 183–218). Cambridge, MA: MIT Press.

Alvarez, G. A., & Cavanagh, P. (2005). Independent resources for attentional tracking in
the left and right visual hemifields. Psychological Science, 16 (8), 637–643.

Alvarez, G. A., & Franconeri, S. L. (2007). How many objects can you track? Evidence for
a resource-limited attentive tracking mechanism. Journal of Vision, 7 (13–14), 1–10.

Atchley, P., & Kramer, A. F. (2001). Object-based attentional selection in three-dimensional space. Visual Cognition, 8, 1–32.

Avrahami, J. (1999). Objects of attention, objects of perception. Perception & Psychophysics, 61 (8), 1604–1612.

Awh, E., & Pashler, H. (2000). Evidence for split attentional foci. Journal of Experimental
Psychology: Human Perception and Performance, 26 (2), 834–846.

Bacon, W. J., & Egeth, H. E. (1997). Goal-directed guidance of attention: Evidence (p. 269) from conjunctive visual search. Journal of Experimental Psychology: Human Perception and Performance, 23 (4), 948–961.

Balazsi, A. G., Rootman, J., Drance, S. M., Schulzer, M., & Douglas, G. R. (1984). The ef­
fect of age on the nerve fiber population of the human optic nerve. American Journal of
Ophthalmology, 97 (6), 760–766.

Beauchamp, M. S., Petit, L., Ellmore, T. M., Ingeholm, J., & Haxby, J. V. (2001). A paramet­
ric fMRI study of overt and covert shifts of visuospatial attention. NeuroImage, 14 (2),
310–321.

Ben-Shahar, O., Scholl, B. J., & Zucker, S. W. (2007). Attention, segregation, and textons:
Bridging the gap between object-based attention and texton-based segregation. Vision Re­
search, 47 (6), 845–860.

Bizzi, E. (1968). Discharge of frontal eye field neurons during saccadic and following eye
movements in unanesthetized monkeys. Experimental Brain Research, 6, 69–80.

Broadbent, D. (1958). Perception and communication. London: Pergamon.

Bruce, C. J., & Goldberg, M. E. (1985). Primate frontal eye fields. I. Single neurons dis­
charging before saccades. Journal of Neurophysiology, 53 (3), 603–635.

Bruesch, S. R., & Arey, L. B. (1942). The number of myelinated and unmyelinated fibers in
the optic nerve of vertebrates. Journal of Comparative Neurology, 77 (3), 631–665.

Bullier, J. (2004). Communications between cortical areas of the visual system. In L. M. Chalupa & J. S. Werner (Eds.), The visual neurosciences (pp. 522–540). Cambridge, MA: MIT Press.

Carlson, T. A., Alvarez, G. A., & Cavanagh, P. (2007). Quadrantic deficit reveals anatomi­
cal constraints on selection. Proceedings of the National Academy of Sciences U S A, 104
(33), 13496–13500.

Carrasco, M., Ling, S., & Read, S. (2004). Attention alters appearance. Nature Neuro­
science, 7 (3), 308–313.

Carrasco, M., Talgar, C. P., & Cameron, E. L. (2001). Characterizing visual performance
fields: Effects of transient covert attention, spatial frequency, eccentricity, task and set
size. Spatial Vision, 15 (1), 61–75.

Carrasco, M., Williams, P. E., & Yeshurun, Y. (2002). Covert attention increases spatial
resolution with or without masks: support for signal enhancement. Journal of Vision, 2 (6),
467–479.

Cavanagh, P., & Alvarez, G. A. (2005). Tracking multiple targets with multifocal attention.
Trends in Cognitive Sciences, 9 (7), 349–354.

Cavanaugh, J., & Wurtz, R. H. (2004). Subcortical modulation of attention counters change blindness. Journal of Neuroscience, 24 (50), 11236–11243.

Chelazzi, L., Biscaldi, M., Corbetta, M., Peru, A., Tassinari, G., & Berlucchi, G. (1995).
Oculomotor activity and visual spatial attention. Behavioural Brain Research, 71 (1–2),
81–88.

Cooper, A., & Humphreys, G. (1999). A new, object-based visual illusion. Paper presented
at the Psychonomic Society, Los Angeles.

Corbetta, M., Akbudak, E., Conturo, T. E., Snyder, A. Z., Ollinger, J. M., Drury, H. A., et al.
(1998). A common network of functional areas for attention and eye movements. Neuron,
21 (4), 761–773.

Crovitz, H. F., & Daves, W. (1962). Tendencies to eye movement and perceptual accuracy.
Journal of Experimental Psychology, 63, 495–498.

Curcio, C. A., & Allen, K. A. (1990). Topography of ganglion cells in human retina. Journal
of Comparative Neurology, 300 (1), 5–25.

Curcio, C. A., Sloan, K. R., Jr., Packer, O., Hendrickson, A. E., & Kalina, R. E. (1987). Dis­
tribution of cones in human and monkey retina: Individual variability and radial asymme­
try. Science, 236 (4801), 579–582.

Desimone, R. (1998). Visual attention mediated by biased competition in extrastriate visual cortex. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 353 (1373), 1245–1255.

Desimone, R., & Duncan, J. (1995). Neural mechanisms of selective visual attention. An­
nual Review of Neuroscience, 18, 193–222.

Deubel, H., & Schneider, W. X. (2003). Delayed saccades, but not delayed manual aiming
movements, require visual attention shifts. Annals of the New York Academy of Sciences,
1004, 289–296.

Downing, C. J., & Pinker, S. (1985). The spatial structure of visual attention. In M. I. Posner & O. S. M. Marin (Eds.), Attention and performance XI (pp. 171–187). Hillsdale, NJ: Erlbaum.

Duncan, J. (1984). Selective attention and the organization of visual information. Journal of Experimental Psychology: General, 113 (4), 501–517.

Duncan, J., & Humphreys, G. W. (1989). Visual search and stimulus similarity. Psychologi­
cal Review, 96 (3), 433–458.

Egeth, H. E., Virzi, R. A., & Garbart, H. (1984). Searching for conjunctively defined targets. Journal of Experimental Psychology: Human Perception and Performance, 10 (1), 32–39.

Egly, R., Driver, J., & Rafal, R. D. (1994). Shifting visual attention between objects and locations: evidence from normal and parietal lesion subjects. Journal of Experimental Psychology: General, 123 (2), 161–177.

Engel, F. L. (1971). Visual conspicuity, directed attention and retinal locus. Vision Re­
search, 11 (6), 563–576.

Eriksen, C. W., & St. James, J. D. (1986). Visual attention within and around the field of fo­
cal attention: a zoom lens model. Perception & Psychophysics, 40 (4), 225–240.

Eriksen, C. W., & Yeh, Y. Y. (1985). Allocation of attention in the visual field. Journal of Ex­
perimental Psychology: Human Perception and Performance, 11 (5), 583–597.

Folk, C. L., Remington, R. W., & Johnston, J. C. (1992). Involuntary covert orienting is contingent on attentional control settings. Journal of Experimental Psychology: Human Perception and Performance, 18 (4), 1030–1044.

Franconeri, S. L., Alvarez, G. A., & Enns, J. T. (2007). How many locations can be selected
at once? Journal of Experimental Psychology: Human Perception and Performance, 33 (5),
1003–1012.

Franconeri, S. L., Hollingworth, A., & Simons, D. J. (2005). Do new objects capture atten­
tion? Psychological Science, 16 (4), 275–281.

Franconeri, S. L., & Simons, D. J. (2005). The dynamic events that capture visual atten­
tion: A reply to Abrams and Christ (2005). Perception & Psychophysics, 67 (6), 962–966.

Funahashi, S., Bruce, C. J., & Goldman-Rakic, P. S. (1991). Neuronal activity related to
saccadic eye movements in the monkey’s dorsolateral prefrontal cortex. Journal of Neuro­
physiology, 65 (6), 1464–1483.

Galletti, C., Fattori, P., Kutz, D. F., & Gamberini, M. (1999). Brain location and visual
topography of cortical area V6A in the macaque monkey. European Journal of Neuro­
science, 11 (2), 575–582.

Gellman, R. S., & Fletcher, W. A. (1992). Eye position signals in human saccadic process­
ing. Experimental Brain Research, 89 (2), 425–434.

Haines, R. F. (1991). A breakdown in simultaneous information processing. In G. (p. 270) Obrecht & L. Stark (Eds.), Presbyopia research (pp. 171–175). New York: Plenum.

He, S., Cavanagh, P., & Intriligator, J. (1996). Attentional resolution and the locus of visual
awareness. Nature, 383 (6598), 334–337.

He, Z. J., & Nakayama, K. (1995). Visual attention to surfaces in three-dimensional space.
Proceedings of the National Academy of Sciences U S A, 92 (24), 11155–11159.

Helmholtz, H. V. (1962). Treatise on physiological optics (Vol. 3). New York: Dover.

Henderson, J. M. (1991). Stimulus discrimination following covert attentional orienting to an exogenous cue. Journal of Experimental Psychology: Human Perception and Performance, 17 (1), 91–106.

Henderson, J. M. (1993). Visual attention and saccadic eye movements. In G. d’Ydewalle & J. Van Rensbergen (Eds.), Perception and cognition: Advances in eye movement research (pp. 37–50). Amsterdam: North-Holland.

Henderson, J. M., Pollatsek, A., & Rayner, K. (1989). Covert visual attention and ex­
trafoveal information use during object identification. Perception & Psychophysics, 45 (3),
196–208.

Hillstrom, A. P., & Yantis, S. (1994). Visual motion and attentional capture. Perception &
Psychophysics, 55 (4), 399–411.

Hoffman, J. E., & Subramaniam, B. (1995). The role of visual attention in saccadic eye
movements. Perception & Psychophysics, 57 (6), 787–795.

Horton, J. C., & Hoyt, W. F. (1991). Quadrantic visual field defects: A hallmark of lesions in
extrastriate (V2/V3) cortex. Brain, 114 (4), 1703–1718.

Humphreys, G. W. (1998). Neural representation of objects in space: A dual coding account. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 353 (1373), 1341–1351.

Ignashchenkova, A., Dicke, P. W., Haarmeier, T., & Thier, P. (2004). Neuron-specific contri­
bution of the superior colliculus to overt and covert shifts of attention. Nature Neuro­
science, 7 (1), 56–64.

Inhoff, A. W., Pollatsek, A., Posner, M. I., & Rayner, K. (1989). Covert attention and eye
movements during reading. Quarterly Journal of Experimental Psychology A, 41 (1), 63–
89.

Intriligator, J., & Cavanagh, P. (2001). The spatial resolution of visual attention. Cognitive
Psychology, 43 (3), 171–216.

Itti, L., & Koch, C. (2000). A saliency-based search mechanism for overt and covert shifts
of visual attention. Vision Research, 40 (10–12), 1489–1506.

James, W. (1890). Principles of psychology. New York: Holt.

Kahneman, D., Treisman, A., & Gibbs, B. J. (1992). The reviewing of object files: Object-
specific integration of information. Cognitive Psychology, 24 (2), 175–219.

Kaptein, N. A., Theeuwes, J., & Van der Heijden, A. H. (1995). Search for a conjunctively
defined target can be selectively limited to a color-defined subset of elements. Journal of
Experimental Psychology: Human Perception and Performance, 21, 1053–1069.

Khurana, B., & Kowler, E. (1987). Shared attentional control of smooth eye movement and
perception. Vision Research, 27 (9), 1603–1618.

Konkle, T., Brady, T. F., Alvarez, G., & Oliva, A. (2010). Conceptual distinctiveness sup­
ports detailed visual long-term memory for real-world objects. Journal of Experimental
Psychology: General, 139 (3), 558–578.

Kramer, A. F., Weber, T. A., & Watson, S. E. (1997). Object-based attentional selection:
Grouped arrays or spatially invariant representations? Comment on Vecera and Farah
(1994). Journal of Experimental Psychology: General, 126 (1), 3–13.

Krauzlis, R. J., & Miles, F. A. (1996). Initiation of saccades during fixation or pursuit: Evi­
dence in humans for a single mechanism. Journal of Neurophysiology, 76 (6), 4175–4179.

LaBerge, D. (1983). Spatial extent of attention to letters and words. Journal of Experimen­
tal Psychology: Human Perception & Performance, 9 (3), 371–379.

LaBerge, D., & Brown, V. (1986). Variations in size of the visual field in which targets are
presented: An attentional range effect. Perception & Psychophysics, 40 (3), 188–200.

Lamy, D., & Tsal, Y. (2000). Object features, object locations, and object files: Which does selective attention activate and when? Journal of Experimental Psychology: Human Perception and Performance, 26 (4), 1387–1400.

Land, M. F., & Hayhoe, M. (2001). In what ways do eye movements contribute to every­
day activities? Vision Research, 41 (25–26), 3559–3565.

Lavie, N., & Driver, J. (1996). On the spatial extent of attention in object-based visual se­
lection. Perception & Psychophysics, 58 (8), 1238–1251.

Mack, A., & Rock, I. (1998). Inattentional blindness. Cambridge, MA: MIT Press.

Mackeben, M. (1999). Sustained focal attention and peripheral letter recognition. Spatial
Vision, 12 (1), 51–72.

Marino, A. C., & Scholl, B. J. (2005). The role of closure in defining the “objects” of object-
based attention. Perception & Psychophysics, 67 (7), 1140–1149.

Maunsell, J. H., & Van Essen, D. C. (1987). Topographic organization of the middle tempo­
ral visual area in the macaque monkey: Representational biases and the relationship to
callosal connections and myeloarchitectonic boundaries. Journal of Comparative Neurolo­
gy, 266 (4), 535–555.

McConkie, G. W., & Rayner, K. (1975). The span of the effective stimulus during fixation
in reading. Perception & Psychophysics, 17, 578–586.

McConkie, G. W., & Rayner, K. (1976). Asymmetry of the perceptual span in reading. Bul­
letin of the Psychonomic Society, 8, 365–368.

McLay, R. W., Anderson, D. J., Sidaway, B., & Wilder, D. G. (1997). Motorcycle accident re­
construction under Daubert. Journal of the National Academy of Forensic Engineering,
14, 1–18.

McMains, S. A., & Somers, D. C. (2004). Multiple spotlights of attentional selection in hu­
man visual cortex. Neuron, 42 (4), 677–686.

Mitroff, S. R., Scholl, B. J., & Wynn, K. (2004). Divide and conquer: How object files adapt
when a persisting object splits into two. Psychological Science, 15 (6), 420–425.

Moore, T., & Fallah, M. (2004). Microstimulation of the frontal eye field and its effects on
covert spatial attention. Journal of Neurophysiology, 91 (1), 152–162.

Most, S. B., & Astur, R. (2007). Feature-based attentional set as a cause of traffic accidents. Visual Cognition, 15 (2), 125–132.

Most, S. B., Scholl, B. J., Clifford, E. R., & Simons, D. J. (2005). What you see is what you
set: Sustained inattentional blindness and the capture of awareness. Psychological Re­
view, 112 (1), 217–242.

(p. 271) Most, S. B., Simons, D. J., Scholl, B. J., Jimenez, R., Clifford, E., & Chabris, C. F.
(2001). How not to be seen: The contribution of similarity and selective ignoring to sus­
tained inattentional blindness. Psychological Science, 12 (1), 9–17.

Motter, B. C., & Simoni, D. A. (2007). The roles of cortical image separation and size in
active visual search performance. Journal of Vision, 7 (2), 6–15.

Mountcastle, V. B., Lynch, J. C., Georgopoulos, A., Sakata, H., & Acuna, C. (1975). Posteri­
or parietal association cortex of the monkey: Command functions for operations within
extrapersonal space. Journal of Neurophysiology, 38 (4), 871–908.

Mounts, J. R., & Melara, R. D. (1999). Attentional selection of objects or features: evi­
dence from a modified search task. Perception & Psychophysics, 61 (2), 322–341.

Neisser, U. (1967). Cognitive psychology. New York: Appleton-Century-Crofts.

Neumann, O. (1987). Beyond capacity: A functional view of attention. In H. Heuer & A. F. Sanders (Eds.), Perspectives on perception and action (pp. 361–394). Hillsdale, NJ: Erlbaum.

Neumann, O. (1990). Visual attention and action. In O. Neumann & W. Prinz (Eds.), Rela­
tionships between perception and action: Current approaches (pp. 227–267). Berlin:
Springer.

O’Craven, K. M., Downing, P. E., & Kanwisher, N. (1999). fMRI evidence for objects as the
units of attentional selection. Nature, 401 (6753), 584–587.

O’Regan, J. K., Deubel, H., Clark, J. J., & Rensink, R. A. (2000). Picture changes during
blinks: Looking without seeing and seeing without looking. Visual Cognition, 7 (1), 191–
211.

O’Regan, J. K., Rensink, R. A., & Clark, J. J. (1999). Change-blindness as a result of “mud­
splashes.” Nature, 398 (6722), 34.

Østerberg, G. A. (1935). Topography of the layer of rods and cones in the human retina.
Acta Ophthalmologica, 13 (Suppl 6), 1–97.

Pashler, H. (1998). The psychology of attention. Cambridge, MA: MIT Press.

Paus, T. (1996). Location and function of the human frontal eye-field: A selective review.
Neuropsychologia, 34 (6), 475–483.

Perry, V. H., & Cowey, A. (1985). The ganglion cell and cone distributions in the monkey’s
retina: Implications for central magnification factors. Vision Research, 25 (12), 1795–
1810.

Petersen, S. E., Robinson, D. L., & Keys, W. (1985). Pulvinar nuclei of the behaving rhesus
monkey: Visual responses and their modulation. Journal of Neurophysiology, 54 (4), 867–
886.

Pollatsek, A., Bolozky, S., Well, A. D., & Rayner, K. (1981). Asymmetries in the perceptual
span for Israeli readers. Brain and Language, 14 (1), 174–180.

Polyak, S. L. (1941). The retina. Chicago: University of Chicago Press.

Posner, M. I. (1980). Orienting of attention. Quarterly Journal of Experimental Psychology, 32 (1), 3–25.

Posner, M. I., Snyder, C. R., & Davidson, B. J. (1980). Attention and the detection of signals. Journal of Experimental Psychology: General, 109 (2), 160–174.

Pylyshyn, Z. W., & Storm, R. W. (1988). Tracking multiple independent targets: Evidence
for a parallel tracking mechanism. Spatial Vision, 3 (3), 179–197.

Quigley, H. A., Addicks, E. M., & Green, W. R. (1982). Optic nerve damage in human glau­
coma. III. Quantitative correlation of nerve fiber loss and visual field defect in glaucoma,
ischemic neuropathy, papilledema, and toxic neuropathy. Archives of Ophthalmology, 100
(1), 135–146.

Remington, R., & Pierce, L. (1984). Moving attention: Evidence for time-invariant shifts of
visual selective attention. Perception & Psychophysics, 35 (4), 393–399.

Rensink, R. A., O’Regan, J. K., & Clark, J. J. (1997). To see or not to see: The need for at­
tention to perceive changes in scenes. Psychological Science, 8 (5), 368–373.

Reppa, I., & Leek, E. C. (2003). The modulation of inhibition of return by object-internal
structure: Implications for theories of object-based attentional selection. Psychonomic
Bulletin & Review, 10 (2), 493–502.

Rizzolatti, G., Riggio, L., Dascola, I., & Umilta, C. (1987). Reorienting attention across the
horizontal and vertical meridians: evidence in favor of a premotor theory of attention.
Neuropsychologia, 25 (1A), 31–40.

Robinson, D. A. (1972). Eye movements evoked by collicular stimulation in the alert mon­
key. Vision Research, 12 (11), 1795–1808.

Robinson, D. L., Goldberg, M. E., & Stanton, G. B. (1978). Parietal association cortex in the primate: Sensory mechanisms and behavioral modulations. Journal of Neurophysiology, 41 (4), 910–932.

Rossi, A. F., & Paradiso, M. A. (1995). Feature-specific effects of selective visual attention.
Vision Research, 35 (5), 621–634.

Saenz, M., Buracas, G. T., & Boynton, G. M. (2002). Global effects of feature-based atten­
tion in human visual cortex. Nature Neuroscience, 5 (7), 631–632.

Saenz, M., Buracas, G. T., & Boynton, G. M. (2003). Global feature-based attention for mo­
tion and color. Vision Research, 43 (6), 629–637.

Sato, T. R., & Schall, J. D. (2003). Effects of stimulus-response compatibility on neural se­
lection in frontal eye field. Neuron, 38 (4), 637–648.

Scalf, P. E., & Beck, D. M. (2010). Competition in visual cortex impedes attention to multi­
ple items. Journal of Neuroscience, 30 (1), 161–169.

Schall, J. D., Hanes, D. P., Thompson, K. G., & King, D. J. (1995). Saccade target selection
in frontal eye field of macaque. I. Visual and premovement activation. Journal of Neuro­
science, 15 (10), 6905–6918.

Schiller, P. H., & Tehovnik, E. J. (2001). Look and see: How the brain moves your eyes
about. Progress in Brain Research, 134, 127–142.

Schlag, J., & Schlag-Rey, M. (1987). Evidence for a supplementary eye field. Journal of
Neurophysiology, 57 (1), 179–200.

Schneider, W. X., & Deubel, H. (1995). Visual attention and saccadic eye movements: Evi­
dence for obligatory and selective spatial coupling. In J. M. Findlay, R. Kentridge & R.
Walker (Eds.), Eye-movement research: Mechanisms, processes, and applications (pp.
317–324). New York: Elsevier.

Scholl, B. J. (2000). Attenuated change blindness for exogenously attended items in a flicker paradigm. Visual Cognition, 7 (1/2/3), 377–396.

Serences, J. T., & Boynton, G. M. (2007). Feature-based attentional modulations in the ab­
sence of direct visual stimulation. Neuron, 55 (2), 301–312.

Shepherd, M., Findlay, J. M., & Hockey, R. J. (1986). The relationship between eye move­
ments and spatial attention. Quarterly Journal of Experimental Psychology A, 38 (3), 475–
491.

Shulman, G. L., Remington, R. W., & McLean, J. P. (1979). Moving attention through visual space. Journal of Experimental Psychology: Human Perception and Performance, 5 (3), 522–526.

Simons, D. J. (2000). Current approaches to change blindness. Visual Cognition, (p. 272) 7 (1/2/3), 1–15.

Simons, D. J., & Chabris, C. F. (1999). Gorillas in our midst: Sustained inattentional blind­
ness for dynamic events. Perception, 28 (9), 1059–1074.

Simons, D. J., & Levin, D. T. (1998). Failure to detect changes to people in real-world in­
teraction. Psychonomic Bulletin and Review, 5, 644–649.

Simons, D. J., & Rensink, R. (2005). Change blindness: Past, present, and future. Trends
in Cognitive Sciences, 9 (1), 16–20.

Tanaka, M., Yoshida, T., & Fukushima, K. (1998). Latency of saccades during smooth-pur­
suit eye movement in man: Directional asymmetries. Experimental Brain Research, 121
(1), 92–98.

Tipper, S. P., Driver, J., & Weaver, B. (1991). Object-centred inhibition of return of visual
attention. Quarterly Journal of Experimental Psychology A, 43 (2), 289–298.

Titchener, E. B. (1908). Lectures on the elementary psychology of feeling and attention. New York: Macmillan.

Tootell, R. B., Switkes, E., Silverman, M. S., & Hamilton, S. L. (1988). Functional anatomy
of macaque striate cortex. II. Retinotopic organization. Journal of Neuroscience, 8 (5),
1531–1568.

Torralbo, A., & Beck, D. M. (2008). Perceptual-load-induced selection as a result of local competitive interactions in visual cortex. Psychological Science, 19 (10), 1045–1050.

Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12 (1), 97–136.

Treue, S., & Martinez Trujillo, J. C. (1999). Feature-based attention influences motion pro­
cessing gain in macaque visual cortex. Nature, 399 (6736), 575–579.

Tsotsos, J. K. (1988). A “complexity level” analysis of immediate vision. International Journal of Computer Vision, 1 (4), 303–320.

Valdes-Sosa, M., Cobo, A., & Pinilla, T. (1998). Transparent motion and object-based at­
tention. Cognition, 66 (2), B13–B23.

Van der Heijden, A. H. (1992). Selective attention in vision. London: Routledge.

van Donkelaar, P. (1999). Spatiotemporal modulation of attention during smooth pursuit eye movements. NeuroReport, 10 (12), 2523–2526.

van Donkelaar, P., & Drew, A. S. (2002). The allocation of attention during smooth pursuit
eye movements. Progress in Brain Research, 140, 267–277.

Vecera, S. P. (1994). Grouped locations and object-based attention: Comment on Egly, Dri­
ver, and Rafal (1994). Journal of Experimental Psychology: General, 123, 316–320.

Vecera, S. P., & Farah, M. J. (1994). Does visual attention select objects or locations? Jour­
nal of Experimental Psychology General, 123 (2), 146–160.

Ward, R., Goodrich, S., & Driver, J. (1994). Grouping reduces visual extinction: Neuropsy­
chological evidence for weight-linkage in visual selection. Visual Cognition, 1, 101–130.

Wurtz, R. H., & Goldberg, M. E. (1972a). Activity of superior colliculus in behaving mon­
key. 3. Cells discharging before eye movements. Journal of Neurophysiology, 35 (4), 575–
586.

Wurtz, R. H., & Goldberg, M. E. (1972b). Activity of superior colliculus in behaving mon­
key. IV. Effects of lesions on eye movements. Journal of Neurophysiology, 35 (4), 587–596.

Yantis, S. (1992). Multielement visual tracking: Attention and perceptual organization. Cognitive Psychology, 24, 295–340.

Yantis, S. (1998). Control of visual attention. In H. Pashler (Ed.), Attention (pp. 223–256).
East Sussex, UK: Psychology Press.

Yantis, S., & Jonides, J. (1984). Abrupt visual onsets and selective attention: Evidence
from visual search. Journal of Experimental Psychology: Human Perception and Perfor­
mance, 10 (5), 601–621.

Yeshurun, Y., & Carrasco, M. (1998). Attention improves or impairs visual performance by
enhancing spatial resolution. Nature, 396 (6706), 72–75.

Zohary, E., & Hochstein, S. (1989). How serial is serial processing in vision? Perception,
18 (2), 191–200.

George Alvarez

George A. Alvarez, Department of Psychology, Harvard University, Cambridge, MA

Visual Control of Action  


Melvyn A. Goodale
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn

Print Publication Date: Dec 2013 Subject: Psychology, Cognitive Neuroscience


Online Publication Date: Dec 2013 DOI: 10.1093/oxfordhb/9780199988693.013.0014

Abstract and Keywords

The visual control of skilled actions, such as reaching and grasping, requires fundamen­
tally different computations from those mediating our perception of the world. These dif­
ferences in the computational requirements for vision for action and vision for perception
are reflected in the organization of the two prominent visual streams of processing that
arise from primary visual cortex in the primate brain. Although the ventral stream pro­
jecting to inferotemporal cortex mediates the visual processing underlying our visual ex­
perience of the world, the dorsal stream projecting to the posterior parietal cortex medi­
ates the visual control of skilled actions. Specialized visual-motor modules have emerged
in the posterior parietal cortex for the visual control of eye, hand, and arm movements.
Although the identification of goal objects and the selection of an appropriate course of
action depend on the perceptual machinery of the ventral stream and associated cogni­
tive modules in the temporal and frontal lobes, the execution of the subsequent goal-di­
rected action is mediated by dedicated online control systems in the dorsal stream and
associated motor areas. Ultimately then, both streams work together in the production of
goal-directed actions.

Keywords: action, visual-motor control, dorsal stream, ventral stream, two visual streams, grasping, reaching

Introduction
The visual control of movement is a central feature of almost all our daily activities, from playing tennis to picking up our morning cup of coffee. But even though vision is essential for all these activities, only recently have vision scientists turned their attention to the
study of visual-motor control. For most of the past 100 years, researchers have instead
concentrated their efforts on working out how vision constructs our perception of the
world. Psychophysics, not the study of visual-motor control, has been the dominant
methodology (Goodale, 1983). Indeed, with the notable exception of eye movements,
which have typically been regarded as an information-seeking adjunct to visual perception, little attention has been paid to the way in which vision is used to program and control our actions, particularly the movements of our hands and limbs. Nevertheless, in the
past few decades, considerable progress has been made on this front. Enormous strides,
for example, have been made in our understanding of the visual control of locomotion
(see Patla, 1997; Warren & Fajen, 2004). But in this brief review, I focus largely on the vi­
sual control of reach-to-grasp movements, a class of behavior that is exceptionally well
developed in humans, and one that exemplifies the importance of vision in the control of
action.

I begin by introducing research that has examined the visual cues that play a critical role
in the control of reach-to-grasp movements. I then move on to a discussion of work on the
neural substrates of that control, reviewing evidence that the visual pathways mediating
visual-motor control are quite distinct from those supporting visual perception.

(p. 274) Visual Control of Reach-to-Grasp Movements
Humans are capable of reaching out and grasping objects with great dexterity, and vision
plays a critical role in this important skill. Think for a moment about what happens when
you perform the deceptively simple act of reaching out and picking up the cup of coffee
sitting on your desk. After identifying your cup among all the other objects on your desk,
you begin to reach out toward the cup, choosing a trajectory that avoids the telephone
and the computer monitor. At the same time, your fingers begin to conform to the shape
of the cup’s handle well before your hand makes contact with the cup. As your fingers
curl around the handle, the initial forces that are generated as you lift the cup are finely
tuned to its anticipated weight—and to your (implicit) predictions about the friction coef­
ficients and compliance of the material from which the cup is made. Visual information is
crucial at every stage of this behavior, but the cues that are used and the neural mecha­
nisms that are engaged are quite different for each of the components involved.

Visual Control of Reach-to-Grasp Movements


Pioneering work by Jeannerod (1981, 1984, 1986, 1988) led to the idea that the reaching
component of a grasping movement is relatively independent from the formation of the
grip itself. Jeannerod showed that when a person reaches out to grasp an object, the size
of the opening between the fingers and thumb is positively correlated with the size of the
object: The bigger the object, the wider the grasp. This relationship can be clearly seen at
the point of maximum grip aperture, which is achieved well before contact is made with
the object (Figure 14.1). The velocity of the movement toward the object, however, is typi­
cally not affected that much by the size of the goal object. Instead, the peak velocity of
the reach is more closely correlated with the distance of the object from the reaching
hand: the further the object, the faster the reaching movement. These results would ap­
pear to suggest that the reach and grip components of a manual prehension movement
are generated by independent visual-motor channels, albeit ones that are temporally coupled.1 This so-called dual-channel hypothesis has become the dominant model of human prehension.

Figure 14.1 Graph showing grip aperture (the distance between the index finger and thumb) changing over time as an individual reaches out to pick up objects of three different sizes. Notice that the maximum grip aperture, which is achieved about 70% of the way through the grasp, is strongly correlated with object size, even though the hand opens much wider in flight before closing down on the goal object.

Adapted with permission from Jakobson & Goodale, 1991.

According to Jeannerod’s (1981) dual-channel account, the kinematics of the reach com­
ponent, whereby the hand is transported to the goal, are largely determined by visual
cues that are extrinsic to the goal object, such as its distance and location with respect to
the grasping hand. In contrast, the kinematics of the grasp component reflect the size,
shape, and other intrinsic properties of the goal object. Even though later studies (e.g.,
Chieffi & Gentilucci, 1993; Jakobson & Goodale, 1991) showed that the visual control of
the reach and grip components may be more intimately related than Jeannerod had origi­
nally proposed, there is broad consensus that the two components show a good deal of
functional independence and (as will be discussed later) are mediated by relatively inde­
pendent neural circuitry.

Jeannerod’s (1981) dual-channel hypothesis has not gone unchallenged. Smeets and Bren­
ner (1999, 2001, 2009), for example, have proposed instead that the movements of each
finger of the grasping hand are programmed and controlled independently. According to
this account, when a person reaches out to grasp an object with a precision grip, the in­
dex finger is directed to one side of the object and the thumb to the other. The apparent
scaling of grip aperture to object size is nothing more than an emergent property of the
fact that the two digits are moving (independently) toward their respective end points.

Moreover, because both digits are attached to the same limb, the so-called reach (or
transport) component is simply the joint movement (p. 275) of the two digits toward the
object. Simply put, it is location rather than size that drives grasping—and there is no
need to separate grasping into transport and grip components, each sensitive to a differ­
ent set of visual cues.

Smeets and Brenner’s (1999, 2001) double-pointing hypothesis has the virtue of being
parsimonious. Nevertheless, it is not without its critics (e.g., Dubrowski, Bock, Carnahan,
& Jungling, 2002; Mon-Williams & McIntosh, 2000; Mon-Williams & Tresilian, 2001; van
de Kamp & Zaal, 2007). Van de Kamp and Zaal, for example, showed that when one side
of an object was perturbed as a person reached out to grasp it, the trajectories of both
digits were adjusted in flight, a result that would not be predicted by Smeets and
Brenner’s model but one that is entirely consistent with Jeannerod’s (1981) dual-channel
hypothesis. Even more important (as we shall see later), the organization of the neural
substrates of grasping as revealed by neuroimaging and neuropsychology can be more
easily explained by the dual-channel than the double-pointing hypothesis.

Most studies of grasping, including those discussed above, have used rather unnatural
situations in which the goal object is the only object present in the workspace. In the real
world, of course, the workspace is usually cluttered with other objects, some of which
could be potential obstacles for a goal-directed movement. Nevertheless, when people
reach out to grasp an object, their hand and arm rarely collide with other objects in the
workspace. The ease with which this is accomplished belies the fact that a sophisticated
obstacle avoidance system must be at work—a system that encodes possible obstructions
to a goal-directed movement and incorporates this information into the motor plan. The
few investigations that have examined obstacle avoidance have revealed an efficient sys­
tem that is capable of altering the spatial and temporal trajectories of goal-directed
reaching and grasping movements to avoid other objects in the workspace in a fluid man­
ner (e.g., Castiello, 2001; Jackson, Jackson, & Rosicky, 1995; Tresilian, 1998; Vaughan,
Rosenbaum, & Meulenbroek, 2001). There has been some debate as to whether the non­
goal objects are always being treated as obstacles or instead as potential targets for ac­
tion (e.g., Tipper, Howard, & Jackson, 1997) or even as frames of reference for the control
of the movement (e.g., Diedrichsen, Werner, Schmidt, & Trommershauser, 2004; Obhi &
Goodale, 2005). By positioning nongoal objects in different locations in the workspace
with respect to the target, however, it is possible to show that most often these objects
are being treated as obstacles and that when individuals reach out for the goal, the trial-
to-trial adjustments of their trajectories are remarkably sensitive to the position of obsta­
cles both in depth and in the horizontal plane, as well as to their height (Chapman &
Goodale, 2008, 2010). Moreover, the system behaves conservatively, moving the trajecto­
ry of the hand and arm away from nontarget objects, even when those objects are unlike­
ly to interfere with the target-directed movement.

Part of the reason for the rapid growth of research into the visual control of manual pre­
hension has been the development of reliable technologies for recording hand and limb
movements in three dimensions. Expensive movie film was replaced with inexpensive
videotape in the 1980s—and over the past 25 years, the use of accurate recording devices
based on active or passive infrared markers, ultrasound, magnetism, instrumented
gloves, and an array of other technologies has grown enormously. This has made possible
the development of what might be termed “visual-motor psychophysics,” in which
investigators are exploring the different visual cues that are used in the programming
and control of grasping.

One of the most powerful sets of cues used by the visual-motor system in mediating
grasping comes from binocular vision (e.g., Servos, Goodale, & Jakobson, 1992). Several
studies, for example, have shown that covering one eye has clear detrimental effects on
grasping (e.g., Keefe & Watt, 2009; Loftus, Servos, Goodale, Mendarozqueta, & Mon-
Williams, 2004; Melmoth & Grant, 2006; Servos, Goodale, & Jakobson, 1992; Watt &
Bradshaw, 2000). People reach more slowly, show longer periods of deceleration, and exe­
cute more online adjustments of both their trajectory and their grip during the closing
phase of the grasp. Not surprisingly, then, adults with stereo deficiencies from amblyopia
have been shown to exhibit slower and less accurate grasping movements (Melmoth, Fin­
lay, Morgan, & Grant, 2009). Interestingly, however, individuals who have lost an eye are
still able to grasp objects as accurately as normally sighted individuals who are using
both eyes. It turns out that they do this by making use of monocular retinal motion cues
generated by exaggerated head movements (Marotta, Perrot, Nicolle, Servos, & Goodale,
1995). The use of these self-generated motion cues appears to be learned: the longer the
time between loss of the eye and testing, the more likely it is that these individuals will
make unusually large vertical and lateral head movements during the execution of the
grasp (Marotta, Perrot, Nicolle, & Goodale, 1995).

Computation of the required distance for the grasp has been shown to depend (p. 276)
more on vergence than on retinal disparity cues, whereas the scaling of the grasping
movement and the final placement of the fingers depends more on retinal disparity than
vergence (Melmoth, Storoni, Todd, Finlay, & Grant, 2007; Mon-Williams & Dijkerman,
1999). Similarly, motion parallax contributes more to the computation of reach distance
than it does to the formation of the grasp, although motion parallax becomes important
only when binocular cues are no longer available (Marotta, Kruyer, & Goodale, 1998; Watt
& Bradshaw, 2003). (It should be noted that the differential contributions that these cues
make to the reach and grasp components, respectively, are much more consistent with
Jeannerod’s, 1981, dual-channel hypothesis than they are with Smeets and Brenner’s
(1999) double-pointing account.) But even when one eye is covered and the head immobi­
lized, people are still able to reach out and grasp objects reasonably well, suggesting that
static monocular cues can be used to program and control grasping movements. Marotta
and Goodale (1998, 2001), for example, showed that pictorial cues, such as height in the
visual scene and familiar size, can be exploited to program and control grasping—but the
contributions from these static monocular cues are usually overshadowed by binocular in­
formation from vergence or retinal disparity, or both.


The role of the shape and orientation of the goal object in determining the formation of
the grasp is poorly understood. It is clear that the posture of the grasping hand is sensi­
tive to these features (e.g., Cuijpers, Smeets, & Brenner, 2004; Goodale et al., 1994b; van
Bergen, van Swieten, Williams, & Mon-Williams, 2007), but there have been only a few
systematic investigations of how information about object shape and orientation is used
to configure the hand during the planning and execution of grasping movements (e.g.,
Cuijpers, Smeets, & Brenner, 2004; Lee, Crabtree, Norman, & Bingham, 2008; Louw,
Smeets, & Brenner, 2007; van Mierlo, Louw, Smeets, & Brenner, 2009).

But understanding the cues that are used to program and control a grasping movement is
only part of the story. To reach and grasp an object, one presumably has to direct one’s
attention to that object as well as to other objects in the workspace that could be poten­
tial obstacles or alternative goals. Research on the deployment of overt and covert atten­
tion in reaching and grasping tasks has accelerated over the past two decades, and it has
become clear that when vision is unrestricted, people shift their gaze toward the goal ob­
ject (e.g., Ballard, Hayhoe, Li, & Whitehead, 1992; Johansson, Westling, Bäckström, &
Flanagan, 2001) and to those locations on the object where they intend to place their fin­
gers, particularly the points on the object where more visual feedback is required to posi­
tion the fingers properly (e.g., Binsted, Chua, Helsen, & Elliott, 2001; Brouwer, Franz, &
Gegenfurtner, 2009). In cluttered workspaces, people also tend to direct their gaze to ob­
stacles that they might have to avoid (e.g., Johansson et al., 2001). Even when gaze is
maintained elsewhere in the scene, there is evidence that attention is shifted covertly to
the goal and is bound there until the movement is initiated (Deubel, Schneider, & Paprot­
ta, 1998). In a persuasive account of the role of attention in reaching and grasping, Bal­
dauf and Deubel (2010) have argued that the planning of a reach-to-grasp movement re­
quires the formation of what they call an “attentional landscape,” in which the locations
of all the objects and features in the workspace that are relevant for the intended action
are encoded. Interestingly, their model implies parallel rather than sequential deployment
of attentional resources to multiple locations, a distinct departure from how attention is
thought to operate in more perceptual-cognitive models of attention.

Finally, and importantly for the ideas that I discuss later in this review, it should be noted
that the way in which different visual cues are weighted for the control of skilled move­
ments is typically quite different from the way they are weighted for perceptual judg­
ments. For example, Knill (2005) found that participants gave significantly more weight to
binocular compared with monocular cues when they were asked to place objects on a
slanted surface in a virtual display compared with when they were required to make ex­
plicit judgments about the slant. Similarly, Servos (2000) demonstrated that even though
people relied much more on binocular than monocular cues when they grasped an object,
their explicit judgments about the distance of the same object were no better under
binocular than under monocular viewing conditions. These and other, even more dramatic
dissociations that I review later underscore the fundamental differences between how vi­
sion is used for action and for perceptual report. In the next section, I offer a speculative
account of the origins of vision before moving on to discuss the neural organization of the
pathways supporting vision for action on the one hand and vision for perception on the
other.

(p. 277) Neural Substrates of Vision for Action


Visual systems first evolved, not to enable animals to see, but rather to provide distal sen­
sory control of their movements—so that they could direct movements with respect to ob­
jects that were some distance from the body. Vision as “sight” is a relative newcomer on
the evolutionary landscape, but its emergence has enabled animals to carry out complex
cognitive operations on visual representations of the world. Thus, vision in humans and
nonhuman primates (and perhaps other animals as well) serves two distinct but interact­
ing functions: (1) the perception of objects and their relations, which provides a visual
foundation for the organism’s cognitive life and its conscious experience of the world, and
(2) the control of actions directed at (or with respect to) those objects, in which separate
motor outputs are programmed and controlled online. These competing demands on vi­
sion have shaped the organization of the visual pathways in the primate brain, particular­
ly within the visual areas of the cerebral cortex.


Figure 14.2 The two streams of visual processing in human cerebral cortex. The retina sends projections to the dorsal part of the lateral geniculate nucleus (LGNd), which projects in turn to primary visual cortex. Within the cerebral cortex, the ventral stream arises from early visual areas and projects to the inferotemporal cortex. The dorsal stream also arises from early visual areas but projects instead to the posterior parietal cortex. Recently, it has been shown that the posterior parietal cortex also receives visual input from the pulvinar via projections to MT (middle temporal area) and V3, as well as from the interlaminar layers of LGNd via projections to MT and V3. The pulvinar receives projections from both the retina and the superior colliculus (SC). The approximate locations of the two streams are shown on a three-dimensional reconstruction of the pial surface of the brain. The two streams involve a series of complex interconnections that are not shown.

Adapted with permission from Goodale & Westwood, 2004.

Beyond the primary visual cortex in the primate cerebral cortex, visual information is
conveyed to a bewildering number of extrastriate areas (Van Essen, 2001). Despite the
complexity of the interconnections between these different areas, two broad “streams” of
projections from primary visual cortex have been identified in the macaque monkey brain:
a ventral stream projecting eventually to the inferotemporal cortex and a dorsal stream
projecting to the posterior parietal cortex (Ungerleider & Mishkin, 1982) (Figure 14.2).
Although some caution must be exercised in generalizing from monkey to human (Sereno
& Tootell, 2005), recent neuroimaging evidence suggests that the visual projections from
early visual areas to the temporal and parietal lobes in the human brain also involve a
separation into ventral and dorsal streams (Culham & Valyear, 2006; Grill-Spector &
Malach, 2004).

Traditional accounts of the division of labor between the two streams (e.g., Ungerleider &
Mishkin, 1982) focused on the distinction between object vision and spatial vision. This
distinction between what and where resonated not only with psychological accounts of
perception that emphasized the role of vision in object recognition and spatial attention,
but also with nearly a century of neurological thought about the functions of the temporal
and parietal lobes in vision (Brown & Schäfer, 1888; Ferrier & Yeo, 1884; Holmes, 1918).

In the early 1990s, however, the what-versus-where story (p. 278) began to unravel as new
evidence emerged from work with both monkeys and neurological patients. It became ap­
parent that a purely perceptual account of ventral-dorsal function could not explain these
findings. The only way to make sense of them was to consider the nature of the outputs
served by the two streams—and to work out how visual information is eventually trans­
formed into motor acts.

In 1992, Goodale and Milner proposed a reinterpretation of the Ungerleider and Mishkin
(1982) account of the functional distinction between the two visual streams. According to
the Goodale-Milner model, the dorsal stream plays a critical role in the real-time control
of action, transforming moment-to-moment information about the location and disposition
of objects into the coordinate frames of the effectors being used to perform the action
(Goodale & Milner, 1992; Milner & Goodale, 2006, 2008). The ventral stream (together
with associated cognitive networks outside the ventral stream) helps to construct the rich
and detailed representations of the world that allow us to identify objects and events, at­
tach meaning and significance to them, and establish their causal relations. Such opera­
tions are essential for accumulating and accessing a visual knowledge base about the
world. Thus, it is the ventral stream that provides the perceptual foundation for the of­
fline control of action, projecting action into the future and incorporating stored informa­
tion from the past into the control of current actions. In contrast, processing in the dorsal
stream does not generate visual percepts; it generates skilled actions (in part by modulat­
ing more ancient visual-motor modules in the midbrain and brainstem; see Goodale,
1996).

Some of the most compelling evidence for the division of labor proposed by Goodale and
Milner (1992) has come from studies of the visual deficits observed in patients with dam­
age to either the dorsal or ventral stream. It has been known for a long time, for example,
that patients with lesions in the dorsal stream, particularly lesions of the superior regions
of the posterior parietal cortex that invade the territory of the intraparietal sulcus and/or
the parieto-occipital sulcus, can have problems using vision to direct a grasp or aiming
movement toward the correct location of a visual target placed in different positions in
the visual field, particularly the peripheral visual field. This deficit is often described as
optic ataxia (following Bálint, 1909; Bálint & Harvey, 1995). But the failure to locate an
object with the hand should not be construed as a problem in spatial vision; many of
these patients, for example, can describe the relative position of the object in space quite
accurately, even though they cannot direct their hand toward it (Perenin & Vighetto,
1988). Moreover, sometimes the deficit will be seen in one hand but not the other. (It
should be pointed out, of course, that these patients typically have no difficulty using in­
put from other sensory systems, such as proprioception or audition, to guide their move­
ments.) Some of these patients are unable to use visual information to rotate their hand,
scale their grip, or configure their fingers properly when reaching out to pick up an ob­
ject, even though they have no difficulty describing the orientation, size, or shape of ob­
jects in that part of the visual field (Goodale et al., 1994; Jakobson, Archibald, Carey, &
Goodale, 1991; Figure 14.3A). Clearly, a “disorder of spatial vision” (Holmes, 1918;
Ungerleider & Mishkin, 1982) fails to capture this range of visual-motor impairments. In­
stead, this pattern of deficits suggests that the posterior parietal cortex plays a critical
role in the visual control of skilled actions (for a more detailed discussion, see Milner &
Goodale, 2006).

Figure 14.3 Graphs showing the size of the aperture between the index finger and thumb during object-directed grasping and manual estimates of object width for RV, a patient with optic ataxia, and DF, a patient with visual form agnosia. A, RV was able to indicate the size of the objects reasonably well (individual trials marked as open diamonds), but her maximal grip aperture in flight was not well tuned. She simply opened her hand as wide as possible on every trial. B, In contrast, DF showed excellent grip scaling, opening her hand wider for the 50-mm wide object than for the 25-mm wide object. DF’s manual estimates of the width of the two objects, however, were grossly inaccurate and showed enormous variability from trial to trial.

The opposite pattern of deficits and spared abilities can be seen in patients with visual
agnosia. Take the case of patient DF, who developed a profound visual form agnosia fol­
lowing carbon monoxide poisoning (Goodale et al., 1991; Milner et al., 1991). Although
magnetic resonance imaging (MRI) showed evidence of diffuse damage consistent with
hypoxia, most of the damage was evident in ventrolateral regions of the occipital cortex,
with V1 remaining largely spared. Even though DF’s “low-level” visual abilities are rea­
sonably intact, she can no longer recognize everyday objects or the faces of her friends
and relatives; nor can she identify even the simplest of geometric shapes. (If an object is
placed in her hand, of course, she has no trouble identifying it by touch.) Remarkably,
however, DF shows strikingly accurate guidance of her hand movements when she at­
tempts to pick up the very objects she cannot identify. Thus, when she reaches out to
grasp objects of different sizes, her hand opens wider mid-flight for larger objects than it
does for smaller ones, just like it does in people with normal vision (Figure 14.3B). Simi­
larly, she rotates her hand and wrist quite normally when she reaches out to grasp ob­
jects in different orientations, and she places her fingers correctly on the surface of ob­
jects with different shapes (Goodale et al., 1994). At the same time, she is quite unable to
distinguish between any of these objects when they are presented to her in simple dis­
crimination tests. She even fails (p. 279) in manual “matching” tasks, in which she is
asked to show how wide an object is by opening her index finger and thumb a corre­
sponding amount. DF’s spared visual-motor skills are not limited to grasping. She can
step over obstacles during locomotion as well as controls do, even though her perceptual
judgments about the height of these obstacles are far from normal. Contrary to what
would be predicted from the what-versus-where hypothesis, then, a profound loss of form
perception coexists in DF with a preserved ability to use form in guiding a broad range of
actions. Such a dissociation, of course, is consistent with the idea that there are separate
neural pathways for transforming incoming visual information for the perceptual repre­
sentation of the world and for the control of action. Presumably, it is the former and not
the latter that is compromised in DF (for more details, see Goodale & Milner, 2004; Milner
& Goodale, 2006).

But where exactly is the damage in DF’s brain? As already mentioned, an early structural
MRI showed evidence of extensive bilateral damage in the ventrolateral occipital cortex.
A more recent high-resolution MRI scan confirmed this damage but revealed that the le­
sions were focused in a region of the lateral occipital cortex (area LO) that we now know
is involved in the visual recognition of objects, particularly their geometric structure
(James, Culham, Humphrey, Milner, & Goodale, 2003). It would appear that this selective
damage to area LO has disrupted DF’s ability to perceive the form of objects. These le­
sions have not interfered with her ability to use visual information about form to shape
her hand when she reaches out and grasps objects—presumably because the visual-motor
networks in her dorsal stream are largely spared.

Since the original work on DF, other patients with ventral stream damage have been iden­
tified who show strikingly similar dissociations between vision for perception and vision
for action. Patient SB, who suffered severe bilateral damage to his ventral stream early
in life, shows remarkably preserved visual-motor skills (he plays table tennis and can ride
a motorcycle) despite having profound deficits in his ability to identify objects, faces, col­
ors, visual texture, and words (Dijkerman, Lê, Démonet, & Milner, 2004; Lê, Cardebat
et al., 2002). Recently, another patient, who sustained bilateral damage to the ventral
stream following a stroke, was tested on several of the same tests that we gave to DF
more than a decade ago. Remarkably, this new patient (JS) behaved almost identically to
DF: in other words, despite his inability to perceive the shape and orientation of objects,
he was able to use these same object features to program and control grasping move­
ments directed at those objects (Karnath, Rüter, Mandler, & Himmelbach, 2009). Finally,
it is worth noting that if one reads the early clinical reports of patients with visual form
agnosia, one can find a number of examples of what appear to be spared visual-motor
skills in the face of massive deficits in form perception. Thus, Campion (1987), (p. 280) for
example, reports that patient RC, who showed a profound visual form agnosia after car­
bon monoxide poisoning, “could negotiate obstacles in the room, reach out to shake
hands and manipulate objects or [pick up] a cup of coffee.”

Thus, the pattern of visual deficits and spared abilities in DF (and in SB, JS, and other pa­
tients with visual form agnosia) is in many ways the mirror image of that observed in the
optic ataxia patients described earlier. DF, who has damage in her ventral stream, can
reach out and grasp objects whose form and orientation she does not perceive, whereas
patients with optic ataxia, who have damage in their dorsal stream, are unable to use vi­
sion to guide their reaching or grasping movements to objects whose form and orienta­
tion they perceive. This “double dissociation” cannot be easily accommodated within the
traditional what-versus-where account but is entirely consistent with the division of labor
between perception and action proposed by Goodale and Milner (1992). It should be not­
ed that the perception–action model is also supported by a wealth of anatomical, electro­
physiological, and lesion studies in the monkey too numerous to review here (for recent
reviews, see Andersen & Buneo, 2003; Cohen & Andersen, 2002; Milner & Goodale, 2006;
Tanaka, 2003). But perhaps some of the most convincing evidence for the perception–ac­
tion proposal has come from functional magnetic resonance imaging (fMRI) studies of the
dorsal and ventral streams in the human brain.

Neuroimaging Evidence for Two Visual Streams


As the organization of the human visual system beyond V1 began to be revealed with the
advent of fMRI (Menon et al., 1992; Ogawa et al., 1992), it soon became apparent that
there was a remarkable correspondence between the layout of extrastriate visual areas in
monkeys and humans, including the separation of these areas into dorsal and ventral
streams (Tootell, Tsao, & Vanduffel, 2003; Van Essen et al., 2001). In the ventral stream,
regions have been identified that seem to be selectively responsive to different categories
of visual stimuli. Early on, an area was isolated within the ventrolateral part of the occipi­
tal cortex (area LO) that appears to be involved in object recognition (for review, see Grill-
Spector, 2003). As mentioned earlier, DF has bilateral lesions in the ventral stream that
include area LO in both hemispheres.

Not surprisingly, therefore, an fMRI investigation of activity in DF’s brain revealed no dif­
ferential activation for line drawings of common objects (vs. scrambled versions) any­
where in DF’s remaining ventral stream, mirroring her poor performance in identifying
the objects depicted in the drawings (James et al., 2003) (Figure 14.4). Again, this strong­
ly suggests that area LO is essential for form perception, generating the geometrical struc­
ture of objects by combining information about edges and surfaces that has already been
extracted from the visual array by low-level visual feature detectors.

In addition to LO, other ventral stream areas have been identified that code for faces, hu­
man body parts, and places or scenes (for review, see Milner & Goodale, 2006). Although
there is a good deal of debate about whether these areas are really category specific
(e.g., Downing, Chan, Peelen, Dodds, & Kanwisher, 2006) (p. 281) or instead are particular nodes in a highly distributed system (e.g., Cant, Arnott, & Goodale, 2009; Cant & Goodale, 2007; Haxby et al., 2001; Op de Beeck, Haushofer, & Kanwisher, 2008), the neuroimaging work continues to provide strong support for the idea that the ventral stream plays the major role in constructing our perceptual representation of the world. Indeed, processing within ventral stream areas, such as area LO, exhibits exactly the characteristics that one might expect to see in such a system. For example, LO shows selective activation for objects irrespective of whether the objects are defined by differences in motion, texture, or luminance contrast (Grill-Spector, Kushnir, Edelman, Itzchak, & Malach, 1998). Moreover, LO also appears to code the overall geometric shape of an object rather than simply its local contours (Kourtzi & Kanwisher, 2001). Although there is evidence that area LO shows some sensitivity to changes in object viewpoint (Grill-Spector et al., 1999), at least part of area LO appears to be largely insensitive to such changes and treats different views of the same object as equivalent (James, Humphrey, Gati, Menon, & Goodale, 2002; Valyear, Culham, Sharif, Westwood, & Goodale, 2006). Taken together, the neuroimaging work on the human ventral stream reinforces the idea that this set of pathways plays a fundamental role in constructing our perceptual representations of the world.

Figure 14.4 Neuroimaging in DF’s ventral stream. A, A right lateral view of DF’s brain, with the lesion in area LO marked in blue (the lesion is also in the left hemisphere). B, fMRI activation for line drawings (vs. scrambled drawings) plotted on a horizontal section through DF’s brain at the level of the red line on panel A. DF shows no selective activation for line drawings either in area LO or in neighboring areas. C, A control subject shows robust activation to the same drawings. The activation in the control subject’s brain, which has been mathematically morphed onto DF’s brain, coincides well with her LO lesions.

Adapted with permission from James et al., 2003.
Just as was the case for visual-perceptual areas in the ventral stream, the development of
fMRI has led to the discovery in the human dorsal stream of visual-motor areas that ap­
pear to be largely homologous with those in the monkey brain (for reviews, see Castiello,
2005; Culham & Kanwisher, 2001; Culham & Valyear, 2006). Early on, an area in the in­
traparietal sulcus of the posterior parietal cortex was identified that appeared to be acti­
vated when subjects shifted their gaze (or their covert attention) to visual targets. This
area is thought by many investigators to be homologous with an area on the lateral bank
of the intraparietal sulcus (area LIP) in the monkey that has been similarly associated
with the visual control of eye movements and attention (for reviews, see Andersen &
Buneo, 2003; Bisley & Goldberg, 2010), although in the human brain it is located more
medially in the intraparietal sulcus (Culham, Cavina-Pratesi, & Singhal, 2006; Grefkes &
Fink, 2005; Pierrot-Deseilligny, Milea, & Muri, 2004). A region in the anterior part of the
intraparietal sulcus has been identified that is consistently activated when people reach
out and grasp visible objects in the scanner (Binkofski et al., 1998; Culham, 2004; Culham
et al., 2003). This area has been called human AIP (hAIP) because it is thought to be homologous with a region in the anterior intraparietal sulcus of the monkey that has also been
implicated in the visual control of grasping (for review, see Sakata, 2003). Several more
recent studies have also shown that hAIP is differentially activated during visually guided
grasping (e.g., Cavina-Pratesi, Goodale, & Culham, 2007; Frey, Vinton, Norlund, &
Grafton, 2005). Importantly, area LO in the ventral stream is not activated when subjects
reach out and grasp objects (Cavina-Pratesi, Goodale, & Culham, 2007; Culham, 2004;
Culham et al., 2003), suggesting that this object-recognition area in the ventral stream is
not required for the programming and control of visually guided grasping and that hAIP
and associated networks in the posterior parietal cortex (in association with premotor
and motor areas) can do this independently. This conclusion is considerably strengthened
by the fact that patient DF, who has large bilateral lesions of area LO, shows robust differ­
ential activation in area hAIP for grasping (compared with reaching), similar to that seen
in healthy subjects (James et al. 2003) (Figure 14.5).

Figure 14.5 Neuroimaging in DF’s dorsal stream. Even though her cerebral cortex shows evidence of
widespread degenerative change as a result of hy­
poxia, there is still robust differential activation for
grasping versus reaching in a region of the intrapari­
etal sulcus that corresponds to hAIP. On individual
trials in the brain scanner, DF was instructed to
grasp the shape presented on the rotating drum or,
in a control condition, to simply reach out and touch
it with her knuckles.

Adapted with permission from James et al., 2003.

There is preliminary fMRI evidence to suggest that when binocular information is avail­
able for the control of grasping, dorsal stream structures can mediate this control with­
out any additional activity in ventral stream areas such as LO. But when (p. 282) only
monocular vision is available, and reliance on pictorial cues becomes more critical, activation increases in LO along with increased activation in hAIP (Verhagen, Dijkerman, Grol, & Toni, 2008). This observation is consistent with the psychophysical work reviewed
earlier showing that binocular vision plays the major role in the programming and control
of manual prehension—and helps to explain why DF has great difficulty grasping objects
under monocular viewing conditions (Marotta, Behrmann, & Goodale, 1997).

But what about the visual control of reaching? As we saw earlier, the lesions associated
with the misreaching that defines optic ataxia have been typically found in the posterior
parietal cortex, including the intraparietal sulcus, and sometimes extending into the infe­
rior or superior parietal lobules (Perenin & Vighetto, 1988). More recent quantitative
analyses of the lesion sites associated with misreaching have revealed several key foci in
the parietal cortex, including the medial occipital-parietal junction, the superior occipital
gyrus, the intraparietal sulcus, and the superior parietal lobule as well as parts of the in­
ferior parietal lobule (Karnath & Perenin, 2005). As it turns out, these lesion sites map
nicely onto the patterns of activation found in a recent fMRI study of visually guided
reaching that showed reach-related activation both in a medial part of the intraparietal
sulcus (near the intraparietal lesion site identified by Karnath and Perenin) and in the me­
dial occipital-parietal junction (Prado, Clavagnier, Otzenberger, Scheiber, & Perenin,
2005). A more recent study found that the reach-related focus in the medial intraparietal
sulcus was equally active for reaches with and without visual feedback, whereas an area
in the superior parietal occipital cortex (SPOC) was particularly active when visual feed­
back was available (Filimon, Nelson, Huang, & Sereno, 2009). This suggests that the me­
dial intraparietal region may reflect proprioceptive more than visual control of reaching,
whereas the SPOC may be more involved in visual control. Both these areas have also
been implicated in the visual control of reaching in the monkey (Andersen & Buneo,
2003; Fattori, Gamberini, Kutz, & Galletti, 2001; Snyder, Batista, & Andersen, 1997).

There is evidence to suggest that SPOC may play a role in some aspects of grasping, par­
ticularly wrist rotation (Grol et al., 2007; Monaco, Sedda, Fattori, Galletti, & Culham,
2009), a result that mirrors recent findings on homologous areas in the medial occipital-
parietal cortex of the monkey (Fattori, Breveglieri, Amoroso, & Galletti, 2004; Fattori et
al., 2010). But at the same time, it seems clear from the imaging data that more anterior
parts of the intraparietal sulcus, such as hAIP, play a unique role in visually guided grasp­
ing and appear not to be involved in the visual control of reaching movements. Moreover,
patients with lesions of hAIP have deficits in grasping but retain the ability to reach to­
ward objects (Binkofski et al., 1998), whereas other patients with lesions in more medial
and posterior areas of the parietal lobe, including the SPOC, show deficits in reaching but
not grip scaling (Cavina-Pratesi, Ietswaart, Humphreys, Lestou, & Milner, 2010). The
identification of areas in the human posterior parietal cortex for the visual control of
reaching that are anatomically distinct from those implicated in the visual control of
grasping, particularly the scaling of grip aperture, lends additional support to Jeannerod’s
(1981) proposal that the transport and grip components of reach-to-grasp movements are
programmed and controlled relatively independently. None of these observations, however, can be easily accommodated within the double-pointing hypothesis of Smeets and
Brenner (1999).

As mentioned earlier, not only are we adept at reaching out and grasping objects, but we
are also able to avoid obstacles that might potentially interfere with our reach. Although
to date there is no neuroimaging evidence about where in the brain the location of obsta­
cles is coded, there is persuasive neuropsychological evidence for a dorsal stream locus
for this coding. Thus, unlike healthy control subjects, patients with optic ataxia from dor­
sal stream lesions do not automatically alter the trajectory of their grasp to avoid obsta­
cles located to the left and right of the path of their hand as they reach out to touch a tar­
get beyond the obstacles—even though they certainly see the obstacles and can indicate
the midpoint between them (Schindler et al., 2004). Conversely, patient DF shows normal
avoidance of the obstacles in the same task, even though she is deficient at indicating the
midpoint between the two obstacles (Rice et al., 2006).

But where is the input to all these visual-motor areas in the dorsal stream coming from?
Although it is clear that V1 has prominent projections to the motion processing area MT
and other areas that provide input to dorsal stream networks, it has been known for a
long time that humans (and monkeys) with large bilateral lesions of V1 are still capable of
performing many visually guided actions despite being otherwise blind with respect to
the controlling stimuli (for review, see Milner & Goodale, 2006; (p. 283) Weiskrantz, 1997).
These residual visual abilities, termed “blindsight” by Sanders et al. (1974), presumably
depend on projections that must run outside of the geniculostriate pathway, such as those
going from the eye to the superior colliculus, the interlaminar layers of the dorsal lateral
geniculate nucleus, or even directly to the pulvinar (for review, see Cowey, 2010). Some of
these extra-geniculate projections may also reach visual-motor networks in the dorsal
stream. It has recently been demonstrated, for example, that a patient with a complete le­
sion of V1 in the right hemisphere was still capable of avoiding obstacles in his blind left
hemifield while reaching out to touch a visual target in his sighted right field (Striemer,
Chapman, & Goodale, 2009). The avoidance of obstacles in this kind of task, as we have
already seen, appears to be mediated by visual-motor networks in the dorsal stream (Rice
et al., 2006; Schindler et al., 2004). Similarly, such patients show some evidence of grip scaling when they reach out and grasp objects placed in their
blind field (Perenin & Vighetto, 1996). This residual ability also presumably depends on
dorsal stream networks that are being accessed by extra-geniculostriate pathways. There
is increasing evidence that projections from the superior colliculus to the pulvinar—and
from there to MT and area V3—may be the relay whereby visual inputs reach the visual-
motor networks in the dorsal stream (e.g., Lyon, Nassi, & Callaway, 2010). Some have
even suggested that a direct projection from the eye to the pulvinar—and then to MT—
might be responsible (Warner, Goldshmit, & Bourne, 2010). But whatever the pathways
might be, it is clear that the visual-motor networks in the dorsal stream that are known to
mediate grasping and obstacle avoidance during reaching are receiving visual input that bypasses V1.


In summary, the neuropsychological and neuroimaging data that have been amassed over
the past 25 years suggest that vision for action and vision for perception depend on dif­
ferent and relatively independent visual pathways in the primate brain. In short, the visu­
al signals that give us the percept of our coffee cup sitting on the breakfast table are not
the same ones that guide our hand as we reach out to pick it up!

Although I have focused almost entirely on the role of the dorsal stream in the control of
action, it is important to emphasize that the posterior parietal cortex also plays a critical
role in the deployment of attention, as well as in other high-level cognitive tasks, such as
numeracy and working memory. Even so, a strong argument can be made that these func­
tions of the dorsal stream (and associated networks in premotor cortex and more inferior
parietal areas) grew out of a pivotal role that the dorsal stream plays in the control of eye
movements and goal-directed limb movements (for more on these issues, see Moore,
2006; Nieder & Dehaene, 2009; Rizzolatti & Craighero, 1998; Rizzolatti, Riggio, Dascola,
& Umiltá, 1987).

Different Neural Computations for Perception and Action

Although the evidence from a broad range of empirical studies points to the fact that
there are two relatively independent visual pathways in the primate cerebral cortex, the
question remains as to why two separate systems evolved in the first place. Why couldn’t
one “general purpose” visual system handle both vision for perception and vision for ac­
tion? The answer to this question lies in the differences in the computational require­
ments of vision for perception on the one hand and vision for action on the other. Consid­
er the coffee cup example introduced earlier. To be able to grasp the cup successfully, the
visual-motor system has to deal with the actual size of the cup and its orientation and po­
sition with respect to the hand you intend to use to pick it up. These computations need
to reflect the real metrics of the world, or at the very least, make use of learned “look-up
tables” that link neurons coding a particular set of sensory inputs with neurons that code
the desired state of the limb (Thaler & Goodale, 2010). The time at which these computa­
tions are performed is equally critical. Observers and goal objects rarely stay in a static
relationship with one another and, as a consequence, the egocentric location of a target
object can often change radically from moment to moment. In other words, the required
coordinates for action need to be computed at the very moment the movements are per­
formed.


Figure 14.6 The effect of a size-contrast illusion on perception and action. A, The traditional Ebbinghaus
illusion in which the central circle in the annulus of
larger circles is typically seen as smaller than the
central circle in the annulus of smaller circles, even
though both central circles are actually the same
size. B, The same display, except that the central cir­
cle in the annulus of larger circles has been made
slightly larger. As a consequence, the two central cir­
cles now appear to be the same size. C, A three-di­
mensional (3D) version of the Ebbinghaus illusion.
Participants are instructed to pick up one of the two
3D disks placed either on the display shown in panel
A or the display shown in panel B. D, Two trials with
the display shown in panel B, in which the partici­
pant picked up the small disk on one trial and the
large disk on another. Even though the two central
disks were perceived as being the same size, the grip
aperture in flight reflected the real, not the appar­
ent, size of the disks.

Adapted with permission from Aglioti et al., 1995.

In contrast to vision for action, vision for perception does not need to deal with the ab­
solute size of objects or their egocentric locations. In fact, very often such computations
would be counterproductive because our viewpoint with respect to objects does not re­
main constant—even though our perceptual representations of those objects do show con­
stancy. Indeed, one can argue that it would be better to encode the size, orientation, and
location of objects relative to each other. Such a scene-based frame of reference permits
a perceptual representation of objects that transcends particular viewpoints, (p. 284)
while preserving information about spatial relationships (as well as relative size and ori­
entation) as the observer moves around. The products of perception also need to be avail­
able over a much longer time scale than the visual information used in the control of ac­
tion. We may need to recognize objects we have seen minutes, hours, days—or even years
before. To achieve this, the coding of the visual information has to be somewhat abstract
—transcending particular viewpoint and viewing conditions. By working with perceptual
representations that are object or scene based, we are able to maintain the constancies of
size, shape, color, lightness, and relative location, over time and across different viewing
conditions. Although there is much debate about the way in which this information is coded, it is pretty clear that it is the identity of the object and its location within the scene,
not its disposition with respect to the observer, that is of primary concern to the percep­
tual system. In fact, current perception, combined with stored information about previ­
ously encountered objects, not only facilitates object recognition but also contributes
to the control of goal-directed movements when we are working in offline mode (i.e., con­
trolling our movements, not in real time, but rather on the basis of the memory of goal
objects that are no longer visible and their remembered locations in the world).

The differences in the metrics and frames of reference used by vision for perception and
vision for action have been demonstrated in normal observers in experiments that have
made use of pictorial illusions, particularly size-contrast illusions. Aglioti, DeSouza, and
Goodale (1995), for example, showed that the scaling of grip aperture in flight was re­
markably insensitive to the Ebbinghaus illusion, in which a target disk surrounded by
smaller circles appears to be larger than the same disk surrounded by larger circles. They
found that maximum grip aperture was scaled to the real, not the apparent, size of the
target disk (Figure 14.6). A similar dissociation between grip scaling and perceived size
was reported by Haffenden and Goodale (1998), under conditions where participants had
no visual feedback during the execution of grasping movements made to targets present­
ed in the context of an Ebbinghaus illusion. Although grip scaling escaped the influence
of the illusion, the illusion did affect (p. 285) performance in a manual matching task, a
kind of perceptual report, in which participants were asked to open their index finger and
thumb to indicate the perceived size of a disk. (This measure is akin to the typical magni­
tude estimation paradigms used in conventional psychophysics, but with the virtue that
the manual estimation makes use of the same effector that is used in the grasping task.)
To summarize, then, the aperture between the finger and thumb was resistant to the illu­
sion when the vision-for-action system was engaged (i.e., when the participant grasped
the target) and sensitive to the illusion when the vision-for-perception system was en­
gaged (i.e., when the participant estimated its size).

This dissociation between what people do and what they say they see underscores the dif­
ferences between vision for action and vision for perception. The obligatory size-contrast
effects that give rise to the illusion (in which different elements of the array are com­
pared) presumably play a crucial role in scene interpretation, a central function of vision
for perception. But the execution of a goal-directed act, such as manual prehension, re­
quires computations that are centered on the target itself, rather than on the relations be­
tween the target and other elements in the scene. In fact, the true size of the target for
calibrating the grip can be computed from the retinal-image size of the object coupled
with an accurate estimate of distance. Computations of this kind, which do not take into
account the relative difference in size between different objects in the scene, would be
expected to be quite insensitive to the kinds of pictorial cues that distort perception when
familiar illusions are presented.
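As a sketch of the geometry involved (standard perspective relations, not given explicitly in the chapter), absolute size follows from angular size and viewing distance alone:

```latex
% A target of physical size S at viewing distance D subtends a visual angle:
\theta = 2\arctan\!\left(\frac{S}{2D}\right)
% Inverting, the absolute size needed to calibrate the grip is
S = 2D\tan\!\left(\frac{\theta}{2}\right) \approx D\,\theta
% (small-angle approximation, with theta in radians). Every quantity here is
% target-centered; no comparison with neighboring elements of the scene enters,
% which is why such a computation would be immune to size-contrast illusions.
```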

The initial demonstration by Aglioti et al. (1995) that grasping is refractory to the Ebbing­
haus illusion engendered a good deal of interest among researchers studying vision and
motor control—and there have been numerous investigations of the effects (or not) of pictorial illusions on visual-motor control. Some investigators have replicated the original
observations of Aglioti et al. with the Ebbinghaus illusion (e.g., Amazeen & DaSilva, 2005;
Fischer, 2001; Kwok & Braddick, 2003)—and others have observed a similar insensitivity
of grip scaling to the Ponzo illusion (Brenner & Smeets, 1996; Jackson & Shaw, 2000), the
horizontal-vertical illusion (Servos, Carnahan, & Fedwick, 2000), the Müller-Lyer illusion
(Dewar & Carey, 2006), and the diagonal illusion (Stöttinger & Perner, 2006; Stöttinger,
Soder, Pfusterschmied, Wagner, & Perner, 2010). Others have reported that pictorial illu­
sions affect some aspects of motor control but not others (e.g., Biegstraaten et al., 2007;
Daprati & Gentilucci, 1997; Gentilucci et al., 1996; Glazebrook, de Grave, Brenner, &
Smeets, 2005; van Donkelaar, 1999). And a few investigators have found no dissociation
whatsoever between the effects of pictorial illusions on perceptual judgments and the
scaling of grip aperture (e.g., Franz et al., 2000; Franz, Bülthoff, & Fahle, 2003).

Demonstrating that actions such as grasping are sometimes sensitive to illusory displays
is not by itself a refutation of the idea of two visual systems. One should not be surprised
that visual perception and visual-motor control can interact in the normal brain. Ultimate­
ly, after all, perception has to affect our actions or the brain mechanisms mediating per­
ception would never have evolved! The real surprise, at least for monolithic accounts of
vision, is that there are clear instances when visually guided action is apparently unaf­
fected by pictorial illusions, which, by definition, affect perception. But from the stand­
point of the duplex perception–action model, such instances are to be expected (see
Goodale, 2008; Milner & Goodale, 2006, 2008). Nevertheless, the fact that action has
been found to be affected by pictorial illusions in some instances has led a number of au­
thors to argue that the earlier studies demonstrating a dissociation had not adequately
matched action and perception tasks for various input, attentional, and output demands
(e.g., Smeets & Brenner, 2001; Vishton & Fabre, 2003)—and that when these factors are
taken into account, the apparent differences between perceptual judgments and motor
control could be resolved without invoking the idea of two visual systems. Other authors,
notably Glover (2004), have argued that action tasks involve multiple stages of processing
from purely perceptual to more “automatic” visual-motor control. According to his plan­
ning/control model, illusions would be expected to affect the early but not the late stages
of a grasping movement (Glover, 2004; Glover & Dixon, 2001a, 2001b).

Some of these competing accounts, such as Glover’s (2004) planning/control model, can
be viewed simply as modifications of the original perception–action model, but there are a
number of other studies in which the results cannot easily be reconciled with the two vi­
sual systems model, and it remains a real question as to why actions appear to be sensi­
tive to illusions in some experiments but not in others. But as it turns out, there are sever­
al reasons why grip aperture might appear (p. 286) to be sensitive to illusions under cer­
tain testing conditions—even when it is not. In some cases, notably the Ebbinghaus illu­
sion, the flanker elements can be treated as obstacles, influencing the posture of the fin­
gers during the execution of the grasp (de Grave et al., 2005; Haffenden, Schiff, &
Goodale, 2001; Plodowski & Jackson, 2001). In other words, the apparent effect of the il­
lusion on grip scaling in some experiments might simply reflect the operation of visual-
motor mechanisms that treat the flanker elements of the visual arrays as obstacles to be
avoided. Another critical variable is the timing of the grasp with respect to the presenta­
tion of the stimuli. When targets are visible during the programming of a grasping move­
ment, maximum grip aperture is usually not affected by size-contrast illusions, whereas
when vision is occluded before the command to initiate programming of the movement is
presented, a reliable effect of the illusion on grip aperture is typically observed (West­
wood, Heath, & Roy, 2000; Westwood & Goodale, 2003; Fischer, 2001; Hu & Goodale,
2000). As discussed earlier, vision for action is designed to operate in real time and is not
normally engaged unless the target object is visible during the programming phase, when
(bottom-up) visual information can be immediately converted into the appropriate motor
commands. The observation that (top-down) memory-guided grasping is affected by the il­
lusory display reflects the fact that the stored information about the target’s dimensions
was originally derived from the earlier operation of vision for perception (for a more de­
tailed discussion of these and related issues, see Bruno, Bernadis, & Gentilucci, 2008;
Goodale, Westwood, & Milner, 2004).

Nevertheless, some have argued that if the perceptual and grasping tasks are appropri­
ately matched, then grasping can be shown to be as sensitive to size-contrast illusions as
psychophysical judgments (Franz, 2001; Franz et al., 2000). Although this explanation, at
least on the face of it, is a compelling one, it cannot explain why Aglioti et al. (1995) and
Haffenden and Goodale (1998) found that when the relative sizes of the two target ob­
jects in the Ebbinghaus display were adjusted so that they appeared to be perceptually
identical, the grip aperture that participants used to pick up the two targets continued to
reflect the physical difference in their size.

An experiment by Ganel, Tanzer, and Goodale (2008b) provides evidence that is even more difficult to explain away by appealing to a failure to match testing conditions and other
task-related variables. In this experiment, which used a version of the Ponzo illusion, a re­
al difference in size was pitted against a perceived difference in size in the opposite direc­
tion (Figure 14.7). The results were remarkably clear. Despite the fact that people be­
lieved that the shorter object was the longer one (or vice versa), their in-flight grip aper­
ture reflected the real, not the illusory, size of the target objects (Figure 14.8). In other
words, on the same trials in which participants erroneously decided that one object was
the longer (or shorter) of the two, the anticipatory opening between their fingers reflect­
ed the real direction and magnitude of size differences between the two objects. More­
over, the subjects in this experiment showed the same differential scaling to the real size
of the objects whether the objects were shown on the illusory display or on the control
display. Not surprisingly, when subjects were asked to use their finger and thumb to esti­
mate the size of the target objects rather than pick them up, their manual estimates re­
flected the apparent, not the real, size of the targets. Overall, these results underscore
once more the profound difference in the way visual information is transformed for action
and perception. Importantly, too, the results are difficult to reconcile with any argument
that suggests that grip aperture is sensitive to illusions, and that the absence of an effect
found in many studies is simply a consequence of differences in the task demands (Franz,
2001; Franz et al., 2000).


One exceptionally interesting (and controversial) finding with respect to differences in the computations used by vision for perception and vision for action is the recent demonstration that grip scaling, unlike manual estimates of object size, does not appear to obey
Weber’s law (Ganel, Chajut, & Algom, 2008a). In other words, when people estimated the
size of an object (either by adjusting a comparison line on a computer screen or by mak­
ing a manual estimate), the just noticeable difference (JND) increased with physical size
in accord with Weber’s law; but when they reached out and picked up the object, the JND,
as indicated by differences in grip aperture, was unaffected by variations in the size of
the object. This surprising finding would appear to suggest that Weber’s law is violated
for visually guided actions, reflecting a fundamental difference in the way that object size
is computed for action and for perception.
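Stated formally (a standard textbook formulation of Weber's law, not drawn from the chapter itself), the contrast between the two response modes can be summarized as:

```latex
% Weber's law: the just noticeable difference (JND) grows in proportion
% to stimulus magnitude I, with a constant Weber fraction k.
\Delta I = k\,I
% Applied to perceptual estimates of object size s, the JND scales with s:
\mathrm{JND}_{\mathrm{perception}}(s) = k\,s
% whereas Ganel et al. (2008a) report a flat JND for in-flight grip aperture:
\mathrm{JND}_{\mathrm{grasp}}(s) \approx c, \qquad c \text{ independent of } s
```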

Figure 14.7 Stimuli and experimental design of the Ganel et al. (2008b) study. A, The experimental paradigm and the version of the Ponzo illusion used. B,
The arrangement of the objects on incongruent trials
in which the real size and the illusory size were pit­
ted against one another. In this example, object 1 is
perceived in most cases as shorter than object 2 (due
to the illusory context), although it is actually longer.
The real difference in size can be clearly seen in C,
where the two objects are placed next to one another
(for illustrative purposes) on the nonillusory control
display.

Adapted with permission from Ganel et al., 2008b.


Figure 14.8 Maximum grip aperture and perceptual estimates of length for objects placed on the illusory
display (A) and control display (B). Only incongruent
trials in which participants made erroneous deci­
sions about real size are shown for the grip aperture
and estimates with the illusory display. As Panel A
shows, despite the fact that participants erroneously
perceived the physically longer object to be the
shorter one (and vice versa), the opening between
their finger and thumb during the grasping move­
ments reflected the real difference in size between
the objects. This pattern of results was completely
reversed when they made perceptual estimates of
the length of the objects. With the control display (B), both grip aperture and manual estimates went in
the same direction.

Adapted with permission from Ganel et al., 2008b.

Of course, this finding (as well as the fact that actions are often resistant to size-contrast
illusions) fits well with Smeets and Brenner’s (1999, 2001) double-pointing hypothesis.
They would (p. 287) argue that the visual-motor system does not compute the size of the
object but instead computes the two locations on the surface of the object where the digits
will be placed. According to their double-pointing hypothesis, size is irrelevant to the
planning of these trajectories, and thus variation in size will not affect the accuracy with
which the finger and thumb are placed on either side of the target (p. 288) object. In short,
Weber’s law is essentially irrelevant (Smeets & Brenner, 2008). The same argument ap­
plies to grasping movements made in the context of size-contrast illusions: Because grip
scaling is simply an epiphenomenon of the independent finger trajectories, grip aperture
seems to be impervious to the effects of the illusion. Although, as discussed earlier,
Smeets and Brenner’s account has been challenged, it has to be acknowledged that their
double-pointing model offers a convincing explanation of all these findings (even if the
neuropsychological and neuroimaging data are more consistent with Jeannerod’s, 1981,
two-visual-motor channel account of reach-to-grasp movements).

Even so, there are some behavioral observations that cannot be accommodated by
the Smeets and Brenner (1999, 2001) model. For example, as discussed earlier, if a delay
is introduced between viewing the target and initiating the grasp, the scaling of the antic­
ipatory grip aperture is much more likely to be sensitive to size-contrast illusions (Fisch­
er, 2001; Hu & Goodale, 2000; Westwood & Goodale, 2003; Westwood, Heath, & Roy,
2000). Moreover, if a similar delay is introduced in the context of the Ganel et al. (2008a)
experiments just described, grip aperture now obeys Weber’s law. These results cannot
be easily explained by the Smeets and Brenner model without conceding that—with delay
—grip scaling is no longer a consequence of programming individual digit trajectories,
but instead reflects the perceived size of the target object. Nor can the Smeets and Bren­
ner model explain what happens when unpracticed finger postures (e.g., the thumb and
ring fingers) are used to pick up objects in the context of a size-contrast illusion. In con­
trast to skilled grasping movements, grip scaling with unpracticed awkward grasping is
quite sensitive to the illusory difference in size between the objects (Gonzalez, Ganel,
Whitwell, Morrissey, & Goodale, 2008). Only with practice does grip aperture begin to re­
flect the real size of the target objects. Smeets and Brenner’s model cannot account for
this result without positing that individual control over the digits occurs only after prac­
tice. Finally, recent neuropsychological findings with patient DF suggest that she could
use action-related information about object size (presumably from her intact dorsal
stream) to make explicit judgments regarding the length of an object that she was about
to pick up (Schenk & Milner, 2006), suggesting that size, rather than two separate loca­
tions, is implicitly coded during grasping. At this point, the difference between the per­
ception–action model (Goodale & Milner, 1992; Milner & Goodale, 2006) and the (modi­
fied) Smeets and Brenner account begins to blur. Both accounts posit that real-time con­
trol of skilled grasping depends on visual-motor transformations that are quite distinct
from those involved in the control of delayed or unpracticed grasping movements. The
difference in the two accounts turns on the nature of the control exercised over skilled
movements performed in real time. But note that even if Smeets and Brenner are correct
that the trajectories of the individual digits are programmed individually on the basis of
spatial information that ignores the size of the object, this would not obviate the idea of
two visual systems, one for constructing our perception of the world and one for control­
ling our actions in that world. Indeed, the virtue of the perception–action model is that it
accounts not only for the dissociations outlined above between the control of action and
psychophysical report in normal observers in a number of different settings, but it also
accounts for a broad range of neuropsychological, neurophysiological, and neuroimaging
data (and is completely consistent with Jeannerod’s dual-channel model of reach-to-grasp
movements).

It is worth noting that dissociations between perceptual report and action have been re­
ported for other classes of responses as well. For example, Tavassoli and Ringach (2010)
found that eye movements in a visual tracking task responded to fluctuations in the veloc­
ity of the moving target that were perceptually invisible to the subjects. Moreover, the
perceptual errors were independent of the accuracy of the pursuit eye movements. These
results are in conflict with the idea that the motor control of pursuit eye movements and
motion perception are based on the same motion signals and are affected by shared sources
of noise. Similar dissociations have been observed between saccadic eye movements and
perceptual report. Thus, by exploiting the illusory mislocalization of a flashed target in­
duced by visual motion, de’Sperati and Baud-Bovy (2008) showed that fast but not slow
saccades escaped the effects of the illusion and were directed to the real rather than the
apparent location of the target. This result underscores the fact that the control of action
often depends on processes that unfold much more rapidly than those involved in percep­
tual processing (e.g., Castiello & Jeannerod, 1991). Indeed, as has already been dis­
cussed, visual-motor control may often be mediated by fast feedforward mechanisms, in
contrast to conscious perception, which requires (slower) feedback to earlier visual ar­
eas, including V1 (Lamme, 2001).

Interactions Between the Two Streams

When the idea of a separate vision-for-action system was first proposed 20 years (p. 289)
ago, the emphasis was on the independence of this system from vision for perception. But
clearly the two systems must work closely together in the generation of purposive behav­
ior. One way to think about the interaction between the two streams (an interaction that
takes advantage of the complementary differences in their computational constraints) is
in terms of a “tele-assistance” model (Goodale & Humphrey, 1998). In tele-assistance, a
human operator who has identified a goal object and decided what to do with it communi­
cates with a semi-autonomous robot that actually performs the required motor act on the
flagged goal object (Pook & Ballard, 1996). In terms of this tele-assistance metaphor, the
perceptual machinery in the ventral stream, with its rich and detailed representations of
the visual scene (and links with cognitive systems), would be the human operator.
Processes in the ventral stream participate in the identification of a particular goal and
flag the relevant object in the scene, perhaps by means of an attention-like process. Once
a particular goal object has been flagged, dedicated visual-motor networks in the dorsal
stream (in conjunction with related circuits in premotor cortex, basal ganglia, and brain­
stem) are then activated to transform the visual information about the object into the ap­
propriate coordinates for the desired motor act. This means that in many instances a
flagged object in the scene will be processed in parallel by both ventral and dorsal stream
mechanisms—each transforming the visual information in the array for different purpos­
es. In other situations, where the visual stimuli are particularly salient, visual-motor
mechanisms in the dorsal stream will operate without any immediate supervision by ven­
tral stream perceptual mechanisms.

Of course, the tele-assistance analogy is far too simple. For one thing, the ventral
stream by itself cannot be construed as an intelligent operator that can make assess­
ments and plans. Clearly, there has to be some sort of top-down executive control—almost
certainly engaging prefrontal mechanisms—that can initiate the operation of attentional
search and thus set the whole process of planning and goal selection in motion (for re­
view, see Desimone & Duncan, 1995; Goodale & Haffenden, 2003). Reciprocal interac­
tions between prefrontal/premotor areas and the areas in the posterior parietal cortex un­
doubtedly play a critical role in recruiting specialized dorsal stream structures, such as
LIP, which appear to be involved in the control of both voluntary eye movements and
covert shifts of spatial attention in monkeys and humans (Bisley & Goldberg, 2010; Cor­
betta, Kincade, & Shulman, 2002). In terms of the tele-assistance metaphor, area LIP can
be seen as acting like a video camera on the robot scanning the visual scene, and thereby
providing new inputs that the ventral stream can process and pass on to frontal systems
that assess their potential importance. In practice, of course, the video camera/LIP sys­
tem does not scan the environment randomly: It is constrained to a greater or lesser de­
gree by top-down information about the nature of the potential targets and where those
targets might be located, information that reflects the priorities of the operator/organism
that are presumably elaborated in prefrontal systems.

What happens next goes beyond even these speculations. Before instructions can be
transmitted to the visual-motor control systems in the dorsal stream, the nature of the ac­
tion required needs to be determined. This means that praxis systems, perhaps located in
the left hemisphere, need to “instruct” the relevant visual-motor systems. After all, ob­
jects such as tools demand a particular kind of hand posture. Achieving this not only re­
quires that the tool be identified, presumably using ventral stream mechanisms (Valyear
& Culham, 2010), but also that the required actions to achieve that posture be selected
via a link to these praxis systems. At the same time, the ventral stream (and related
cognitive apparatus) has to communicate the locus of the goal object to these visual-mo­
tor systems in the dorsal stream. One way that this ventral-dorsal transmission could hap­
pen is via recurrent projections from foci of activity in the ventral stream back down­
stream to primary visual cortex and other adjacent visual areas. Once a target has been
“highlighted” on these retinotopic maps, its location could then finally be forwarded to
the dorsal stream for action (for a version of this idea, see Lamme & Roelfsema, 2000).
Moreover, LIP itself, by virtue of the fact that it would be “pointing” at the goal object,
could also provide the requisite coordinates, once it has been cued by recognition sys­
tems in the ventral stream.

When the particular disposition and location of the object with respect to the actor have
been computed, that information has to be combined with the postural requirements of
the appropriate functional grasp for the tool, which, as I have already suggested, are pre­
sumably provided by praxis systems that are in turn cued by recognition mechanisms in
the ventral (p. 290) stream. At the same time, the initial fingertip forces that should be ap­
plied to the tool (or any object, for that matter) are based on estimations of its mass, sur­
face friction, and compliance that are derived from visual information (e.g., Gordon, West­
ling, Cole, & Johansson, 1993). Once contact is made, somatosensory information can be
used to fine-tune the applied forces—but the specification of the initial grip and lift forces
must be derived from learned associations between the object’s visual appearance and
prior experience with similar objects or materials (Buckingham, Cant, & Goodale, 2009).
This information presumably can be provided only by the ventral visual stream in con­
junction with stored information about past interactions.

Again, it must be emphasized that all of this is highly speculative. Nevertheless, whatever
complex interactions might be involved, it is clear that goal-directed action is unlikely to
be mediated by a simple serial processing system. Multiple iterative processing is almost
certainly required, involving a constant interplay among different control systems at dif­
ferent levels of processing (for a more detailed discussion of these and related issues, see
Milner & Goodale, 2006). A full understanding of the contrasting (and complementary)
roles of the ventral and dorsal streams in this complex network will come only when we
can specify the neural and functional interconnections between the two streams (and oth­
er brain areas) and the nature of the information they exchange.

References
Aglioti, S., DeSouza, J., & Goodale, M. A. (1995). Size-contrast illusions deceive the eyes
but not the hand. Current Biology, 5, 679–685.

Amazeen, E. L., & DaSilva, F. (2005). Psychophysical test for the independence of percep­
tion and action. Journal of Experimental Psychology: Human Perception and Performance,
31, 170–182.

Andersen, R. A., & Buneo, C. A. (2003). Sensorimotor integration in posterior parietal cor­
tex. Advances in Neurology, 93, 159–177.

Ballard, D. H., Hayhoe, M. M., Li, F., & Whitehead, S. D. (1992). Hand–eye coordination
during sequential tasks. Philosophical Transactions of the Royal Society of London B: Bio­
logical Sciences, 337, 331–338.

Bálint, R. (1909). Seelenlähmung des “Schauens,” optische Ataxie, räumliche Störung der
Aufmerksamkeit. Monatsschrift für Psychiatrie und Neurologie, 25, 51–81.

Bálint, R., & Harvey, M. (1995). Psychic paralysis of gaze, optic ataxia, and spatial disor­
der of attention. Cognitive Neuropsychology, 12, 265–281.

Baldauf, D., & Deubel, H. (2010). Attentional landscapes in reaching and grasping. Vision
Research, 50, 999–1013.

Biegstraaten, M., de Grave, D. D. J., Brenner, E., & Smeets, J. B. J. (2007). Grasping the
Müller-Lyer illusion: Not a change in perceived length. Experimental Brain Research, 176,
497–503.

Binkofski, F., Dohle, C., Posse, S., Stephan, K. M., Hefter, H., Seitz, R. J., & Freund, H. J.
(1998). Human anterior intraparietal area subserves prehension: A combined lesion and
functional MRI activation study. Neurology, 50, 1253–1259.

Binsted, G., Chua, R., Helsen, W., & Elliott, D. (2001). Eye–hand coordination in goal-di­
rected aiming. Human Movement Science, 20, 563–585.

Bisley, J. W., & Goldberg, M. E. (2010). Attention, intention, and priority in the parietal
lobe. Annual Review of Neuroscience, 33, 1–21.

Brenner, E., & Smeets, J. B. (1996). Size illusion influences how we lift but not how we
grasp an object. Experimental Brain Research, 111, 473–476.

Brouwer, A. M., Franz, V. H., & Gegenfurtner, K. R. (2009). Differences in fixations be­
tween grasping and viewing objects. Journal of Vision, 9 (1), 18, 1–24.

Brown, S., & Schäfer, E. A. (1888). An investigation into the functions of the occipital and
temporal lobes of the monkey’s brain. Philosophical Transactions of the Royal Society of
London, 179, 303–327.

Bruno, N., Bernardis, P., & Gentilucci, M. (2008). Visually guided pointing, the Müller-Ly­
er illusion, and the functional interpretation of the dorsal-ventral split: Conclusions from
33 independent studies. Neuroscience and Biobehavioral Reviews, 32, 423–437.

Buckingham, G., Cant, J. S., & Goodale, M. A. (2009). Living in a material world: How vi­
sual cues to material properties affect the way that we lift objects and perceive their
weight. Journal of Neurophysiology, 102, 3111–3118.

Campion, J. (1987). Apperceptive agnosia: The specification and description of constructs.
In G. W. Humphreys & M. J. Riddoch (Eds.), Visual object processing: A cognitive neu­
ropsychological approach (pp. 197–232). London: Erlbaum.

Cant, J. S., Arnott, S. R., & Goodale, M. A. (2009). fMR-adaptation reveals separate pro­
cessing regions for the perception of form and texture in the human ventral stream. Ex­
perimental Brain Research, 192, 391–405.

Cant, J. S., & Goodale, M. A. (2007). Attention to form or surface properties modulates dif­
ferent regions of human occipitotemporal cortex. Cerebral Cortex, 17, 713–731.

Castiello, U. (2001). The effects of abrupt onset of 2-D and 3-D distractors on prehension
movements. Perception and Psychophysics, 63, 1014–1025.

Castiello, U. (2005). The neuroscience of grasping. Nature Reviews Neuroscience, 6, 726–
736.

Castiello, U., & Jeannerod, M. (1991). Measuring time to awareness. NeuroReport, 2,
797–800.

Cavina-Pratesi, C., Goodale, M. A., & Culham, J. C. (2007). FMRI reveals a dissociation be­
tween grasping and perceiving the size of real 3D objects. PLoS One, 2, e424.

Cavina-Pratesi, C., Ietswaart, M., Humphreys, G. W., Lestou, V., & Milner, A. D. (2010).
Impaired grasping in a patient with optic ataxia: Primary visuomotor deficit or secondary
consequence of misreaching? Neuropsychologia, 48, 226–234.

Chapman, C. S., & Goodale, M. A. (2008). Missing in action: The effect of obstacle
(p. 291) position and size on avoidance while reaching. Experimental Brain Research, 191, 83–97.

Chapman, C. S., & Goodale, M. A. (2010). Seeing all the obstacles in your way: The effect
of visual feedback and visual feedback schedule on obstacle avoidance while reaching.
Experimental Brain Research, 202, 363–375.

Chieffi, S., & Gentilucci, M. (1993). Coordination between the transport and the grasp
components during prehension movements. Experimental Brain Research, 94, 471–477.

Cohen, Y. E., & Andersen, R. A. (2002). A common reference frame for movement plans in
the posterior parietal cortex. Nature Reviews Neuroscience, 3, 553–562.

Corbetta, M., Kincade, M. J. & Shulman, G. L. (2002). Two neural systems for visual ori­
enting and the pathophysiology of unilateral spatial neglect. In H.-O. Karnath, A. D. Mil­
ner, & G. Vallar (Eds.), The cognitive and neural bases of spatial neglect (pp. 259–273).
Oxford, UK: Oxford University Press.

Cowey, A. (2010). The blindsight saga. Experimental Brain Research, 200, 3–24.

Cuijpers, R. H., Smeets, J. B., & Brenner, E. (2004). On the relation between object shape
and grasping kinematics. Journal of Neurophysiology, 91, 2598–2606.

Culham, J. C. (2004). Human brain imaging reveals a parietal area specialized for grasp­
ing. In N. Kanwisher & J. Duncan (Eds.), Attention and performance XX: Functional brain
imaging of human cognition (pp. 417–438). Oxford, UK: Oxford University Press.

Culham, J. C., Cavina-Pratesi, C., & Singhal, A. (2006). The role of parietal cortex in visuo­
motor control: What have we learned from neuroimaging? Neuropsychologia, 44, 2668–
2684.

Culham, J. C., Danckert, S. L., DeSouza, J. F. X., Gati, J. S., Menon, R. S., & Goodale, M. A.
(2003). Visually-guided grasping produces activation in dorsal but not ventral stream
brain areas. Experimental Brain Research, 153, 158–170.

Culham, J. C., & Kanwisher, N. G. (2001). Neuroimaging of cognitive functions in human
parietal cortex. Current Opinion in Neurobiology, 11, 157–163.

Culham, J. C., & Valyear, K. F. (2006). Human parietal cortex in action. Current Opinion in
Neurobiology, 16, 205–212.

Daprati, E., & Gentilucci, M. (1997). Grasping an illusion. Neuropsychologia, 35, 1577–
1582.

de Grave, D. D., Biegstraaten, M., Smeets, J. B., & Brenner, E. (2005). Effects of the Ebbing­
haus figure on grasping are not only due to misjudged size. Experimental Brain Research,
163, 58–64.

Desimone, R., & Duncan, J. (1995). Neural mechanisms of selective visual attention. An­
nual Review of Neuroscience, 18, 193–222.

de’Sperati, C., & Baud-Bovy, G. (2008). Blind saccades: An asynchrony between seeing
and looking. Journal of Neuroscience, 28, 4317–4321.

Deubel, H., Schneider, W. X., & Paprotta, I. (1998). Selective dorsal and ventral process­
ing: Evidence for a common attentional mechanism in reaching and perception. Visual
Cognition, 5, 81–107.

Dewar, M. T., & Carey, D. P. (2006). Visuomotor “immunity” to perceptual illusion: A mis­
match of attentional demands cannot explain the perception-action dissociation. Neu­
ropsychologia, 44, 1501–1508.

Diedrichsen, J., Werner, S., Schmidt, T., & Trommershauser, J. (2004). Immediate spatial
distortions of pointing movements induced by visual landmarks. Perception and Psy­
chophysics, 66, 89–103.

Dijkerman, H. C., Lê, S., Démonet, J. F., & Milner, A. D. (2004). Visuomotor performance
in a patient with visual agnosia due to an early lesion. Brain Research: Cognitive Brain
Research, 20, 12–25.

Downing, P. E., Chan, A. W., Peelen, M. V., Dodds, C. M., & Kanwisher, N. (2006). Domain
specificity in visual cortex. Cerebral Cortex, 16, 1453–1461.

Dubrowski, A., Bock, O., Carnahan, H., & Jungling, S. (2002). The coordination of hand
transport and grasp formation during single- and double-perturbed human prehension
movements. Experimental Brain Research, 145, 365–371.

Fattori, P., Breveglieri, R., Amoroso, K., & Galletti, C. (2004). Evidence for both reaching
and grasping activity in the medial parieto-occipital cortex of the macaque. European
Journal of Neuroscience, 20, 2457–2466.

Fattori, P., Gamberini, M., Kutz, D. F., & Galletti, C. (2001). “Arm-reaching” neurons in the
parietal area V6A of the macaque monkey. European Journal of Neuroscience, 13, 2309–
2313.

Fattori, P., Raos, V., Breveglieri, R., Bosco, A., Marzocchi, N., & Galletti C. (2010). The
dorsomedial pathway is not just for reaching: Grasping neurons in the medial parieto-oc­
cipital cortex of the macaque monkey. Journal of Neuroscience, 30, 342–349.

Ferrier, D., & Yeo, G. F. (1884). A record of experiments on the effects of lesion of differ­
ent regions of the cerebral hemispheres. Philosophical Transactions of the Royal Society
of London, 175, 479–564.

Filimon, F., Nelson, J. D., Huang, R. S., & Sereno, M. I. (2009). Multiple parietal reach re­
gions in humans: Cortical representations for visual and proprioceptive feedback during
on-line reaching. Journal of Neuroscience, 29, 2961–2971.

Fischer, M. H. (2001). How sensitive is hand transport to illusory context effects? Experi­
mental Brain Research, 136, 224–230.

Franz, V. H. (2001). Action does not resist visual illusions. Trends in Cognitive Sciences, 5,
457–459.

Franz, V. H., Bülthoff, H. H., & Fahle, M. (2003). Grasp effects of the Ebbinghaus illusion:
Obstacle avoidance is not the explanation. Experimental Brain Research, 149, 470–477.

Franz, V. H., Gegenfurtner, K. R., Bülthoff, H. H., & Fahle, M. (2000). Grasping visual illu­
sions: No evidence for a dissociation between perception and action. Psychological
Science, 11, 20–25.

Frey, S. H., Vinton, D., Norlund, R., & Grafton, S. T. (2005). Cortical topography of human
anterior intraparietal cortex active during visually guided grasping. Brain Research: Cog­
nitive Brain Research, 23, 397–405.

Ganel, T., Chajut, E., & Algom, D. (2008a). Visual coding for action violates fundamental
psychophysical principles. Current Biology, 18, R599–R601.

Ganel, T., Tanzer, M., & Goodale, M. A. (2008b). A double dissociation between action and
perception in the context of visual illusions: Opposite effects of real and illusory size. Psy­
chological Science, 19, 221–225.

Gentilucci, M., Chieffi, S., Daprati, E., Saetti, M. C., & Toni, I. (1996). Visual illusion and
action. Neuropsychologia, 34, 369–376.

Glazebrook, C. M., Dhillon, V. P., Keetch, K. M., Lyons, J., Amazeen, E., Weeks, D. J., &
Elliott, D. (2005). (p. 292) Perception-action and the Müller-Lyer illusion: Amplitude or end­
point bias? Experimental Brain Research, 160, 71–78.

Glover, S. (2004). Separate visual representations in the planning and control of action.
Behavioral and Brain Sciences, 27, 3–24; discussion 24–78.

Glover, S., & Dixon, P. (2001a). Motor adaptation to an optical illusion. Experimental Brain
Research, 137, 254–258.

Glover, S., & Dixon, P. (2001b). The role of vision in the on-line correction of illusion ef­
fects on action. Canadian Journal of Experimental Psychology, 55, 96–103.

Gonzalez, C. L. R., Ganel, T., Whitwell, R. L., Morrissey, B., & Goodale, M. A. (2008). Prac­
tice makes perfect, but only with the right hand: Sensitivity to perceptual illusions with
awkward grasps decreases with practice in the right but not the left hand. Neuropsy­
chologia, 46, 624–631.

Goodale, M. A. (1983). Vision as a sensorimotor system. In T. E. Robinson (Ed.), Behav­
ioral approaches to brain research (pp. 41–61). New York: Oxford University Press.

Goodale, M. A. (1995). The cortical organization of visual perception and visuomotor con­
trol. In S. Kosslyn & D. N. Osherson (Eds.), An invitation to cognitive science. Vol. 2. Vi­
sual cognition and action (2nd ed., pp. 167–214). Cambridge, MA: MIT Press.

Goodale, M. A. (1996). Visuomotor modules in the vertebrate brain. Canadian Journal of
Physiology and Pharmacology, 74, 390–400.

Goodale, M. A. (2008). Action without perception in human vision. Cognitive Neuropsy­
chology, 25, 891–919.

Goodale, M. A., & Haffenden, A. M. (1998). Frames of reference for perception and action
in the human visual system. Neuroscience and Biobehavioral Reviews, 22, 161–172.

Goodale, M. A., & Haffenden, A. M. (2003). Interactions between dorsal and ventral
streams of visual processing. In A. Siegel, R. Andersen, H.-J. Freund, & D. Spencer (Eds.),
Advances in neurology: The parietal lobe (Vol. 93, pp. 249–267). Philadelphia: Lippincott-
Raven.

Goodale, M. A., Meenan, J. P., Bülthoff, H. H., Nicolle, D. A., Murphy, K. S., & Racicot, C. I.
(1994). Separate neural pathways for the visual analysis of object shape in perception
and prehension. Current Biology, 4, 604–610.

Goodale, M. A., & Milner, A. D. (1992). Separate visual pathways for perception and ac­
tion. Trends in Neurosciences, 15, 20–25.

Goodale, M. A., & Milner, A. D. (2004). Sight unseen: An exploration of conscious and un­
conscious vision. Oxford, UK: Oxford University Press.

Goodale, M. A., Milner, A. D., Jakobson, L. S., & Carey, D. P. (1991). A neurological dissoci­
ation between perceiving objects and grasping them. Nature, 349, 154–156.

Goodale, M. A., Westwood, D. A., & Milner, A. D. (2004). Two distinct modes of control for
object-directed action. Progress in Brain Research, 144, 131–144.

Gordon, A. M., Westling, G., Cole, K. J., & Johansson, R. S. (1993). Memory representa­
tions underlying motor commands used during manipulation of common and novel ob­
jects. Journal of Neurophysiology, 69, 1789–1796.

Grefkes, C., & Fink, G. R. (2005). The functional organization of the intraparietal sulcus in
humans and monkeys. Journal of Anatomy, 207, 3–17.

Grill-Spector, K. (2003). The neural basis of object perception. Current Opinion in Neuro­
biology, 13, 159–166.

Grill-Spector, K., Kushnir, T., Edelman, S., Avidan, G., Itzchak, Y., & Malach, R. (1999). Dif­
ferential processing of objects under various viewing conditions in the human lateral oc­
cipital complex. Neuron, 24, 187–203.

Grill-Spector, K., Kushnir, T., Edelman, S., Itzchak, Y., & Malach, R. (1998). Cue-invariant
activation in object-related areas of the human occipital lobe. Neuron, 21, 191–202.

Grill-Spector, K., & Malach, R. (2004). The human visual cortex. Annual Review of Neuro­
science, 27, 649–677.

Grol, M. J., Majdandzić, J., Stephan, K. E., Verhagen, L., Dijkerman, H. C., Bekkering, H.,
Verstraten, F. A., & Toni, I. (2007). Parieto-frontal connectivity during visually guided
grasping. Journal of Neuroscience, 27, 11877–11887.

Haffenden, A., & Goodale, M. A. (1998). The effect of pictorial illusion on prehension and
perception. Journal of Cognitive Neuroscience, 10, 122–136.

Haffenden, A. M., Schiff, K. C., & Goodale, M. A. (2001). The dissociation between percep­
tion and action in the Ebbinghaus illusion: Nonillusory effects of pictorial cues on grasp.
Current Biology, 11, 177–181.

Haxby, J. V., Gobbini, M. I., Furey, M. L., Ishai, A., Schouten, J. L., & Pietrini, P. (2001).
Distributed and overlapping representations of faces and objects in ventral temporal cor­
tex. Science, 293, 2425–2430.

Holmes, G. (1918). Disturbances of visual orientation. British Journal of Ophthalmology, 2,
449–468.

Hu, Y., & Goodale, M. A. (2000). Grasping after a delay shifts size-scaling from absolute to
relative metrics. Journal of Cognitive Neuroscience, 12, 856–868.

Hu, Y., Osu, R., Okada, M., Goodale, M. A., & Kawato, M. (2005). A model of the coupling
between grip aperture and hand transport during human prehension. Experimental Brain
Research, 167, 301–304.

Jackson, S. R., Jackson, G. M., & Rosicky, J. (1995). Are non-relevant objects represented
in working memory? The effect of non-target objects on reach and grasp kinematics. Ex­
perimental Brain Research, 102, 519–530.

Jackson, S. R., & Shaw, A. (2000). The Ponzo illusion affects grip-force but not grip-aper­
ture scaling during prehension movements. Journal of Experimental Psychology: Human
Perception and Performance, 26, 418–423.

Jakobson, L. S., Archibald, Y. M., Carey, D. P., & Goodale, M. A. (1991). A kinematic analy­
sis of reaching and grasping movements in a patient recovering from optic ataxia. Neu­
ropsychologia, 29, 803–809.

Jakobson, L. S., & Goodale, M. A. (1991). Factors affecting higher-order movement plan­
ning: A kinematic analysis of human prehension. Experimental Brain Research, 86, 199–
208.

James, T. W., Culham, J., Humphrey, G. K., Milner, A. D., & Goodale, M. A. (2003). Ventral
occipital lesions impair object recognition but not object-directed grasping: An fMRI study.
Brain, 126, 2463–2475.

James, T. W., Humphrey, G. K., Gati, J. S., Menon, R. S., & Goodale, M. A. (2002). Differen­
tial effects of viewpoint on object-driven activation in dorsal and ventral streams. Neuron,
35, 793–801.

Jeannerod, M. (1981). Intersegmental coordination during reaching at natural visual ob­
jects. In J. Long & A. Baddeley (Eds.), Attention and performance IX (pp. 153–168). Hills­
dale, NJ: Erlbaum.

Jeannerod, M. (1984). The timing of natural prehension movements. Journal of Motor Be­
havior, 16, 235–254.

Jeannerod, M. (1986). The formation of finger grip during prehension: A cortically
(p. 293) mediated visuomotor pattern. Behavioural Brain Research, 19, 99–116.

Jeannerod, M. (1988). The neural and behavioural organization of goal-directed move­
ments. Oxford, UK: Clarendon Press.

Johansson, R., Westling, G., Bäckström, A., & Flanagan, J. R. (2001). Eye–hand coordina­
tion in object manipulation. Journal of Neuroscience, 21, 6917–6932.

Karnath, H. O., & Perenin, M.-T. (2005). Cortical control of visually guided reaching: Evi­
dence from patients with optic ataxia. Cerebral Cortex, 15, 1561–1569.

Karnath, H. O., Rüter, J., Mandler, A., & Himmelbach, M. (2009). The anatomy of object
recognition—Visual form agnosia caused by medial occipitotemporal stroke. Journal of
Neuroscience, 29, 5854–5862.

Keefe, B. D., & Watt, S. J. (2009). The role of binocular vision in grasping: A small stimu­
lus-set distorts results. Experimental Brain Research, 194, 435–444.

Knill, D. C. (2005). Reaching for visual cues to depth: The brain combines depth cues dif­
ferently for motor control and perception. Journal of Vision, 5, 103–115.

Kourtzi, Z., & Kanwisher, N. (2001). Representation of perceived object shape by the hu­
man lateral occipital complex. Science, 293, 1506–1509.

Kwok, R. M., & Braddick, O. J. (2003). When does the Titchener circles illusion exert an ef­
fect on grasping? Two- and three-dimensional targets. Neuropsychologia, 41, 932–940.

Lamme, V. A. F. (2001). Blindsight: The role of feedforward and feedback corticocortical
connections. Acta Psychologica, 107, 209–228.

Lamme, V. A. F., & Roelfsema, P. R. (2000). The distinct modes of vision offered by feed­
forward and recurrent processing. Trends in Neurosciences, 23, 571–579.

Lê, S., Cardebat, D., Boulanouar, K., Hénaff, M. A., Michel, F., Milner, D., Dijkerman, C.,
Puel, M., & Démonet, J.-F. (2002). Seeing, since childhood, without ventral stream: A be­
havioural study. Brain, 125, 58–74.

Lee, Y. L., Crabtree, C. E., Norman, J. F., & Bingham, G. P. (2008). Poor shape perception
is the reason reaches-to-grasp are visually guided online. Perception and Psychophysics,
70, 1032–1046.

Loftus, A., Servos, P., Goodale, M. A., Mendarozqueta, N., & Mon-Williams, M. (2004).
When two eyes are better than one in prehension: monocular viewing and end-point vari­
ance. Experimental Brain Research, 158, 317–327.

Louw, S., Smeets, J. B., & Brenner, E. (2007). Judging surface slant for placing objects: A
role for motion parallax. Experimental Brain Research, 183, 149–158.

Lyon, D. C., Nassi, J. J., & Callaway, E. M. (2010). A disynaptic relay from superior collicu­
lus to dorsal stream visual cortex in macaque monkey. Neuron, 65, 270–279.

Marotta, J. J., Behrmann, M., & Goodale, M. A. (1997). The removal of binocular cues dis­
rupts the calibration of grasping in patients with visual form agnosia. Experimental Brain
Research, 116, 113–121.

Marotta, J. J., & Goodale, M. A. (1998). The role of learned pictorial cues in the program­
ming and control of grasping. Experimental Brain Research, 121, 465–470.

Marotta, J. J., & Goodale, M. A. (2001). The role of familiar size in the control of grasping.
Journal of Cognitive Neuroscience, 13, 8–17.

Marotta, J. J., Kruyer, A., & Goodale, M. A. (1998). The role of head movements in the con­
trol of manual prehension. Experimental Brain Research, 120, 134–138.

Marotta, J. J., Perrot, T. S., Nicolle, D., & Goodale, M. A. (1995). The development of adap­
tive head movements following enucleation. Eye, 9 (3), 333–336.

Marotta, J. J., Perrot, T. S., Servos, P., Nicolle, D., & Goodale, M. A. (1995). Adapting to
monocular vision: Grasping with one eye. Experimental Brain Research, 104, 107–114.

Melmoth, D. R., Finlay, A. L., Morgan, M. J., & Grant, S. (2009). Grasping deficits and
adaptations in adults with stereo vision losses. Investigative Ophthalmology and Visual
Science, 50, 3711–3720.

Melmoth, D. R., & Grant, S. (2006). Advantages of binocular vision for the control of
reaching and grasping. Experimental Brain Research, 171, 371–388.

Melmoth, D. R., Storoni, M., Todd, G., Finlay, A. L., & Grant, S. (2007). Dissociation be­
tween vergence and binocular disparity cues in the control of prehension. Experimental
Brain Research, 183, 283–298.

Menon, R. S., Ogawa, S., Kim, S. G., Ellermann, J. M., Merkle, H., Tank, D. W., & Ugurbil,
K. (1992). Functional brain mapping using magnetic resonance imaging: Signal changes
accompanying visual stimulation. Investigative Radiology, Suppl 2, S47–S53.

Milner, A. D., & Goodale, M. A. (2006). The visual brain in action (2nd ed.). Oxford, UK:
Oxford University Press.

Milner, A. D., & Goodale, M. A. (2008). Two visual systems re-viewed. Neuropsychologia,
46, 774–785.
Page 35 of 40
Visual Control of Action

Milner, A. D., Perrett, D. I., Johnston, R. S., Benson, P. J., Jordan, T. R., Heeley, D. W., Bet­
tucci, D., Mortara, F., Mutani, R., Terazzi, E., & Davidson, D. L. W. (1991). Perception and
action in “visual form agnosia.” Brain, 114, 405–428.

Monaco, S., Sedda, A., Fattori, P., Galletti, C., & Culham, J. C. (2009). Functional magnetic
resonance adaptation (fMRA) reveals the involvement of the dorsomedial stream in wrist
orientation for grasping. Society for Neuroscience Abstracts, 307.1.

Mon-Williams, M., & Dijkerman, H. C. (1999). The use of vergence information in the pro­
gramming of prehension. Experimental Brain Research, 128, 578–582.

Mon-Williams, M., & McIntosh, R. D. (2000). A test between two hypotheses and a possi­
ble third way for the control of prehension. Experimental Brain Research, 134, 268–273.

Mon-Williams, M., & Tresilian, J. R. (2001). A simple rule of thumb for elegant prehen­
sion. Current Biology, 11, 1058–1061.

Moore, T. (2006). The neurobiology of visual attention: Finding sources. Current Opinion
in Neurobiology, 16, 159–165.

Nieder, A., & Dehaene, S. (2009). Representation of number in the brain. Annual Review
of Neuroscience, 32, 185–208.

Obhi, S. S., & Goodale, M. A. (2005). The effects of landmarks on the performance of de­
layed and real-time pointing movements. Experimental Brain Research, 167, 335–344.

Ogawa, S., Tank, D. W., Menon, R., Ellermann, J. M., Kim, S. G., Merkle, H., & Ugurbil, K.
(1992). Intrinsic signal changes accompanying sensory stimulation: Functional brain map­
ping with magnetic resonance imaging. Proceedings of the National Academy of Sciences
U S A, 89, 5951–5955.

Op de Beeck, H. P., Haushofer, J., & Kanwisher, N. G. (2008). Interpreting fMRI data:
Maps, modules and dimensions. Nature Reviews Neuroscience, 9, 123–135.

Patla, A. E. (1997). Understanding the roles of vision in the control of human locomotion.
Gait and Posture, 5, 54–69.

Perenin, M.-T., & Rossetti, Y. (1996). Grasping without form discrimination in a hemi­
anopic field. NeuroReport, 7, 793–797.

(p. 294) Perenin, M.-T., & Vighetto, A. (1988). Optic ataxia: A specific disruption in visuomotor mechanisms. I. Different aspects of the deficit in reaching for objects. Brain, 111, 643–674.

Pierrot-Deseilligny, C. H., Milea, D., & Muri, R. M. (2004). Eye movement control by the
cerebral cortex. Current Opinion in Neurology, 17, 17–25.

Plodowski, A., & Jackson, S. R. (2001). Vision: Getting to grips with the Ebbinghaus illu­
sion. Current Biology, 11, R304–R306.

Pook, P. K., & Ballard, D. H. (1996). Deictic human/robot interaction. Robotics and Au­
tonomous Systems, 18, 259–269.

Prado, J., Clavagnier, S., Otzenberger, H., Scheiber, C., & Perenin, M.-T. (2005). Two corti­
cal systems for reaching in central and peripheral vision. Neuron, 48, 849–858.

Rice, N. J., McIntosh, R. D., Schindler, I., Mon-Williams, M., Démonet, J. F., & Milner, A. D.
(2006). Intact automatic avoidance of obstacles in patients with visual form agnosia. Ex­
perimental Brain Research, 174, 176–188.

Rizzolatti, G., & Craighero, L. (1998). Spatial attention: Mechanisms and theories. In M.
Sabourin, F. Craik, & M. Robert (Eds.), Advances in psychological science: Vol.2. Biologi­
cal and cognitive aspects (pp. 171–198). East Sussex, UK: Psychology Press.

Rizzolatti, G., Riggio, L., Dascola, I., & Umiltá, C. (1987). Reorienting attention across the
horizontal and vertical meridians: Evidence in favor of a premotor theory of attention.
Neuropsychologia, 25, 31–40.

Sakata, H. (2003). The role of the parietal cortex in grasping. Advances in Neurology, 93,
121–139.

Sanders, M. D., Warrington, E. K., Marshall, J., & Weiskrantz, L. (1974). “Blindsight”: Vi­
sion in a field defect. Lancet, 20, 707–708.

Schenk, T., & Milner, A. D. (2006). Concurrent visuomotor behaviour improves form dis­
crimination in a patient with visual form agnosia. European Journal of Neuroscience, 24,
1495–1503.

Schindler, I., Rice, N. J., McIntosh, R. D., Rossetti, Y., Vighetto, A., & Milner, A. D. (2004).
Automatic avoidance of obstacles is a dorsal stream function: Evidence from optic ataxia.
Nature Neuroscience, 7, 779–784.

Sereno, M. I., & Tootell, R. B. (2005). From monkeys to humans: what do we now know
about brain homologies? Current Opinion in Neurobiology, 15, 135–144.

Servos, P., Carnahan, H., & Fedwick, J. (2000). The visuomotor system resists the horizon­
tal-vertical illusion. Journal of Motor Behavior, 32, 400–404.

Servos, P., Goodale, M. A., & Jakobson, L. S. (1992). The role of binocular vision in pre­
hension: A kinematic analysis. Vision Research, 32, 1513–1521.

Smeets, J. B., & Brenner, E. (1999). A new view on grasping. Motor Control, 3, 237–271.

Smeets, J. B., & Brenner, E. (2001). Independent movements of the digits in grasping. Ex­
perimental Brain Research, 139, 92–100.

Smeets, J. B., & Brenner, E. (2008). Grasping Weber’s law. Current Biology, 18, R1089–
R1090.


Smeets, J. B., Brenner, E., & Martin, J. (2009). Grasping Occam’s razor. Advances in Ex­
perimental Medical Biology, 629, 499–522.

Snyder, L. H., Batista, A. P., & Andersen, R. A. (1997). Coding of intention in the posterior
parietal cortex. Nature, 386, 167–170.

Stöttinger, E., & Perner, J. (2006). Dissociating size representation for action and for con­
scious judgment: Grasping visual illusions without apparent obstacles. Consciousness and
Cognition, 15, 269–284.

Stöttinger, E., Soder, K., Pfusterschmied, J., Wagner, H., & Perner, J. (2010). Division of
labour within the visual system: fact or fiction? Which kind of evidence is appropriate to
clarify this debate? Experimental Brain Research, 202, 79–88.

Striemer, C., Chapman, C. S., & Goodale, M. A. (2009). “Realtime” obstacle avoidance in
the absence of primary visual cortex. Proceedings of the National Academy of Sciences, U
S A, 106, 15996–16001.

Tanaka, K. (2003). Columns for complex visual object features in the inferotemporal cor­
tex: Clustering of cells with similar but slightly different stimulus selectivities. Cerebral
Cortex, 13, 90–99.

Tavassoli, A., & Ringach, D. L. (2010). When your eyes see more than you do. Current Bi­
ology, 20, R93–R94.

Thaler, L., & Goodale, M. A. (2010). Beyond distance and direction: The brain represents
target locations non-metrically. Journal of Vision, 10, 3. 1–27.

Tipper, S. P., Howard, L. A., & Jackson, S. R. (1997). Selective reaching to grasp: Evidence
for distractor interference effects. Visual Cognition, 4, 1–38.

Tootell, R. B. H., Tsao, D., & Vanduffel, W. (2003). Neuroimaging weighs in: Humans meet
macaques in “primate” visual cortex. Journal of Neuroscience, 23, 3981–3989.

Tresilian, J. R. (1998). Attention in action or obstruction of movement? A kinematic analysis of avoidance behavior in prehension. Experimental Brain Research, 120, 352–368.

Ungerleider, L. G., & Mishkin, M. (1982). Two cortical visual systems. In D. J. Ingle, M. A.
Goodale, & R. J. W. Mansfield (Eds.), Analysis of visual behavior (pp. 549–586). Cambridge
MA: MIT Press.

Valyear, K. F., & Culham, J. C. (2010). Observing learned object-specific functional grasps
preferentially activates the ventral stream. Journal of Cognitive Neuroscience, 22, 970–
984.

Valyear, K. F., Culham, J. C., Sharif, N., Westwood, D., & Goodale, M. A. (2006). A double
dissociation between sensitivity to changes in object identity and object orientation in the
ventral and dorsal visual streams: A human fMRI study. Neuropsychologia, 44, 218–228.


van Bergen, E., van Swieten, L. M., Williams, J. H., & Mon-Williams, M. (2007). The effect
of orientation on prehension movement time. Experimental Brain Research, 178, 180–193.

van de Kamp, C., & Zaal, F. T. J. M. (2007). Prehension is really reaching and grasping. Ex­
perimental Brain Research, 182, 27–34.

van Donkelaar, P. (1999). Pointing movements are affected by size-contrast illusions. Ex­
perimental Brain Research, 125, 517–520.

Van Essen, D. C., Lewis, J. W., Drury, H. A., Hadjikhani, N., Tootell, R. B., Bakircioglu, M.,
& Miller, M. I. (2001). Mapping visual cortex in monkeys and humans using surface-based
atlases. Vision Research, 41, 1359–1378.

van Mierlo, C. M., Louw, S., Smeets, J. B., & Brenner, E. (2009). Slant cues are processed
with different latencies for the online control of movement. Journal of Vision, 9, 25. 1–8.

Vaughan, J., Rosenbaum, D. A., & Meulenbroek, R. G. (2001). Planning reaching and
grasping movements: The problem of obstacle avoidance. Motor Control, 5, 116–135.

(p. 295) Verhagen, L., Dijkerman, H. C., Grol, M. J., & Toni, I. (2008). Perceptuo-motor interactions during prehension movements. Journal of Neuroscience, 28, 4726–4735.

Vilaplana, J. M., Batlle, J. F., & Coronado, J. L. (2004). A neural model of hand grip forma­
tion during reach to grasp. 2004 IEEE International Conference on Systems, Man, and
Cybernetics, 1–7, 542–546.

Vishton, P. M., & Fabre, E. (2003). Effects of the Ebbinghaus illusion on different behav­
iors: One- and two-handed grasping; one- and two-handed manual estimation; metric and
comparative judgment. Spatial Vision, 16, 377–392.

Warner, C. E., Goldshmit, Y., & Bourne, J. A. (2010). Retinal afferents synapse with relay
cells targeting the middle temporal area in the pulvinar and lateral geniculate nuclei.
Frontiers in Neuroanatomy, 4, 8.

Warren, W. H., & Fajen, B. R. (2004). From optic flow to laws of control. In L. M. Vaina, S.
A. Beardsley, & S. K. Rushton (Eds.), Optic flow and beyond (pp. 307–337). Norwell, MA:
Kluwer Academic.

Watt, S. J., & Bradshaw, M. F. (2000). Binocular cues are important in controlling the
grasp but not the reach in natural prehension movements. Neuropsychologia, 38, 1473–
1481.

Watt, S. J., & Bradshaw, M. F. (2003). The visual control of reaching and grasping: Binocu­
lar disparity and motion parallax. Journal of Experimental Psychology: Human Perception
and Performance, 29, 404–415.

Weiskrantz, L. (1997). Consciousness lost and found: A neuropsychological exploration. Oxford, UK: Oxford University Press.


Westwood, D. A., & Goodale, M. A. (2003). Perceptual illusion and the real-time control of
action. Spatial Vision, 16, 243–254.

Westwood, D. A., Heath, M., & Roy, E. A. (2000). The effect of a pictorial illusion on
closed-loop and open-loop prehension. Experimental Brain Research, 134, 456–463.

Notes:

(1). Several solutions to the temporal coupling problem have been proposed (e.g., Hu,
Osu, Okada, Goodale, & Kawato, 2005; Mon-Williams & Tresilian, 2001; Vilaplana, Batlle,
& Coronado, 2004).

Melvyn A. Goodale

Melvyn A. Goodale, The Brain and Mind Institute, The University of Western Ontario


Development of Attention  
M. Rosario Rueda
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn

Print Publication Date: Dec 2013 Subject: Psychology, Cognitive Neuroscience


Online Publication Date: Dec 2013 DOI: 10.1093/oxfordhb/9780199988693.013.0015

Abstract and Keywords

Functions of attention include achievement and maintenance of a state of alertness, selection of information from sensory input, and regulation of responses when dominant or
well-learned behavior is not appropriate. These functions have been associated with acti­
vation of separate networks of brain areas. In this chapter, the developmental course of
the attention networks during infancy and childhood and the neural mechanisms underly­
ing its maturation are reviewed. Alerting is active early in infancy, although the ability to
endogenously maintain the level of alertness develops through late childhood. The ability
to orient to external stimulation is also present from quite early in life; it is mainly the aspects of orienting related to the control of disengagement and voluntary orienting that improve during childhood. Executive attention starts developing by the end of the first year
of life, showing major maturational changes during the preschool years. The efficiency of
all three functions is subject to important individual differences, which may be due to
both genetic endowment and educational and social experiences. In the final section, I
discuss evidence indicating that efficiency of attention can be improved through training
during childhood. Attention training has the potential to benefit aspects of behavior cen­
tral to education and socialization processes.

Keywords: development, attention, cognitive development, brain development, attention networks, alerting, orient­
ing, executive control

Attention has been a matter of interest to researchers since the emergence of psychology
as an experimental science. Titchener (1909) gave attention a central role in cognition,
considering it “the heart of the psychological enterprise.” Some years earlier, William
James had provided an insightful definition of attention, which has become one of the
most popular to this date: “Everyone knows what attention is. It is the taking possession
of the mind in clear and vivid form of one out of what seem several simultaneous objects
or trains of thought. Focalization, concentration and consciousness are of its essence. It
implies withdrawal from some things in order to deal effectively with others …” (James,
1890).


Despite its subjective nature, this definition contains important insights about the various
aspects that would be studied by scientists interested in attention in the times to follow.
For instance, James’ definition suggests that we are limited by the amount of information
we can consciously process at a time, and he attributed to the attention system the func­
tion of selecting the relevant out of all the stimulation available to our sensory systems.
What is relevant to a particular individual depends on the current goals and intentions of
that individual. Therefore, although implicitly, it is also suggested that attention is intrin­
sically connected to volition. Additionally, words like “concentration” and “focalization”
point to the effortful and resource-consuming nature of the continuous monitoring of the flow of information against the individual’s goals that is required in order to know what (p. 297) stimulation is important to attend to at each particular moment.

After James’ insightful ideas on attention, research on inner states and cognitive processes was largely neglected, owing to an overriding emphasis on “observable” behavior during most of the first half of the twentieth century. However, when the
interest in cognition was reinstituted during and after World War II, all the multiple com­
ponents of the attentional construct contained in James’ definition, such as selection, ori­
enting, and executive control, were further developed by many authors, including Donald
Hebb (1949), Colin Cherry (1953), Donald Broadbent (1958), Daniel Kahneman (1973),
and Michael Posner (1978, 1980), among others.

In this chapter, I first review a variety of constructs studied within the field of attention.
Then, I discuss the anatomy of brain networks related to attention as revealed by
imaging studies. Afterward, I present evidence on the developmental time course of the
attention networks during infancy and childhood. Next, I discuss how individual differ­
ences in attentional efficiency have been studied over the past years as well as the influ­
ence of both constitutional and experiential factors on the development of attention. Fi­
nally, I present recent research focused on optimizing attentional capacities during devel­
opment.

Constructs of Attention and Marker Tasks


The cognitive approach to attention provides a variety of models and conceptual frame­
works for studying its development across the lifespan and underlying neural mecha­
nisms. Perhaps the most general question is whether to think of attention as one thing or
as a number of somewhat separate issues. Hebb (1949) argued that all stimuli had two ef­
fects: (1) providing information about the nature of the stimulating event after following
the major sensory pathways in the brain; and (2) keeping the cortex tuned in the waking
state through the reticular activating system pathway. A classic distinction in the field is
to divide attention by considering separately the intensive and selective aspects (Kahne­
man, 1973). Alerting effects can be contrasted with selective attention, which involves
committing processing resources to some particular event. These aspects can in turn be
separated from the role of attention in cognitive control, which is needed when situations
call for a careful and volitional control of thoughts and behavior as opposed to when responses to stimulation can be automatically triggered (Norman & Shallice, 1986; Posner
& Snyder, 1975). These three aspects of attention are further reviewed below.

Attention as State

The attention state of an organism varies according to changes in both internal and exter­
nal conditions. Generally, alertness reflects the state of the organism for processing infor­
mation and is an important condition in all tasks. Intrinsic or tonic alertness refers to a
state of general wakefulness, which clearly changes over the course of the day from sleep
to waking and within the waking state from sluggish to highly alert. Sustained attention
and vigilance have been defined as the ability to maintain a certain level of arousal and
alertness that allows responsiveness to external stimulation (Posner, 1978; Posner &
Boies, 1971). Tonic alertness requires mental effort and is subject to top-down control of
attention (Posner, 1978); thus, the attention state is likely to diminish after a period of
maintenance. In all tasks involving long periods of processing, the role of changes of state
may be important. Thus, vigilance or sustained attention effects probably rest at least in
part on changes in the tonic alerting system.

The presentation of an external stimulus also increases the state of alertness. Preparation
from warning cues (phasic alertness) can be measured by comparing the speed and accu­
racy of response to stimulation with and without warning signals (Posner, 2008). Warning
cues appear to accelerate processes of response selection (Hackley & Valle-Inclán, 1998),
which commonly results in increased response speed. Depending on the conditions, this ef­
fect can be automatic as it occurs with an auditory accessory event that does not predict
a target, but it can also be partly due to voluntary actions based on information conveyed
by the cue about the time of the upcoming target. The reduction of reaction time (RT) fol­
lowing a warning signal is accompanied by vast changes in the physiological state of the
organism. These changes are thought to bring about the suppression of ongoing activity in order to prepare the system for a rapid response. When the interval between warning cue and target is short, this preparation may occur before stimulus information has been fully processed, leading to reduced accuracy in performance (Morrison, 1982; Posner, 1978).

Attention as Selectivity

Attention is also an important mechanism for conscious perception. When examining a vi­
sual scene, there is the general feeling that all information (p. 298) about it is available.
However, important changes can occur in the scene without being noticed by the observ­
er, provided they take place away from the focus of attention. For instance, changes in
the scene are often completely missed if cues that are normally effective in producing a
shift of attention, such as luminance changes or movements, are suppressed (Rensink,
O’Regan, & Clark, 1997), a phenomenon called change blindness. Something similar
happens in the auditory modality. A classic series of studies conducted by Colin Cherry
(1953) in which different stimuli were presented simultaneously to the two ears of individ­
uals showed that information presented to the unattended ear could go totally unnoticed.
To explain the role of attention in the selection of information, Broadbent (1958) developed a model that considers that attention acts as a filter, which selects a channel of
entry and sends information to a perceptual processing system of limited capacity. Later
on, Hillyard and colleagues studied the mechanisms of attentional selection. Using event-
related potentials (ERP), they showed that attended stimuli generate early ERP compo­
nents of larger amplitude than unattended ones, suggesting that attention facilitates per­
ceptual processing of attended information (Hillyard, 1985; Mangun & Hillyard, 1987).
Attention is thus considered a mechanism that allows selecting out irrelevant information
and gives priority to relevant information for conscious processing.

Much of the selection of external stimulation is achieved by orienting to the source of in­
formation. This orientation can be carried out overtly, as when head and eye movements
are directed toward the source of input, or it can be carried out covertly, as when only at­
tention is oriented to the stimulation. Posner (1980) developed a cueing paradigm to
study orienting of attention. In this task, a visual target appears at one of two possible lo­
cations, either to the right or left of fixation, and participants are asked to detect its pres­
ence by pressing a key. Before the target, a cue is presented at one of the two possible lo­
cations. When the preceding cue appears at the same location as the subsequent target
(valid cue), a reduction in RT is observed compared with when there is no cue or it is dis­
played at fixation (neutral cue). This allows measuring benefits in RT due to moving atten­
tion to the location of the target before its appearance. On the contrary, when the cue is
presented at a different location from the target (invalid cue), an RT cost is observed with respect to neutral cues, because attention must be disengaged from the wrong location and moved to where the target appeared. The same result is obtained even when the interval between cue and target is too short to allow a saccadic eye movement (i.e., less than 200 milliseconds). This shows that attention can be oriented independently of eye and
head movements and that covert shifts of attention also enhance the speed of responding.
However, the extent to which covert attention is independent of the eye movement system remains a matter of debate (see Klein, 2004).
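The cueing costs and benefits described above reduce to simple reaction-time subtractions. The following is a minimal sketch of that arithmetic; the condition names and RT values are hypothetical illustrations, not data from any experiment:

```python
# Cueing costs and benefits in Posner's (1980) paradigm, expressed as
# differences between mean reaction times (ms). The values below are
# invented for illustration only.

def cueing_effects(mean_rt):
    """Benefit of a valid cue and cost of an invalid cue,
    each measured relative to the neutral-cue baseline."""
    return {
        "benefit": mean_rt["neutral"] - mean_rt["valid"],   # faster after a valid cue
        "cost": mean_rt["invalid"] - mean_rt["neutral"],    # slower after an invalid cue
    }

example = {"valid": 310.0, "neutral": 340.0, "invalid": 375.0}
print(cueing_effects(example))  # {'benefit': 30.0, 'cost': 35.0}
```

A positive benefit indicates that attention reached the cued location before the target appeared; a positive cost reflects the time needed to disengage and reorient after an invalid cue.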

Another important distinction concerning orienting of attention is related to whether attention shifts are triggered by external stimulation or voluntarily generated by the indi­
vidual. In Posner’s paradigm, this is studied by manipulating the nature, location, and
percentage of valid cues. Cues that consist of luminance changes at peripheral locations
are thought to capture attention automatically. On the contrary, cues that are presented
at fixation and must be interpreted before attention is moved, for example, an arrow
pointing left or right, are considered to cause attention shifts that are endogenously gen­
erated. In addition, if cues reliably predict the location of targets, as when they are valid
in more than 50 percent of the trials, they are more likely to induce voluntary shifts of at­
tention. The distinction between endogenous and exogenous orienting of attention is im­
portant because they appear to be two relatively independent mechanisms with separate
underlying neuroanatomy (Corbetta & Shulman, 2002). In developmental terms, this dis­
tinction is important to understanding the development of orienting from birth to early
adolescence, as will be discussed later.


Studies using functional magnetic resonance imaging (fMRI) and cellular recording have
demonstrated that brain areas that are activated by cues, such as the superior parietal
lobe and temporal parietal junction, play a key role in modulating activity within primary
and extrastriate visual systems when attentional orienting occurs (Corbetta & Shulman,
2002; Desimone & Duncan, 1995). This shows that the attentional system achieves selec­
tion by modulating the functioning of sensory systems. Attention can thus be considered a
domain-general system that regulates the activation of domain-specific sensory process­
ing systems.

Inhibition is also an important mechanism of attentional orienting. In cognitive tasks, inhibition is inferred from an increase in RT or an increased error rate. In orienting tasks
when a peripheral cue is presented more than half a second before a target, inhibition
takes place at the location of the cue and (p. 299) RT at that location increases, an effect
called inhibition of return (IOR; Posner & Cohen, 1984). IOR is thought to fulfill an impor­
tant function because it prevents reexamination of locations that have already been ex­
plored (Klein, 2000). Because alertness is also increased with changing environmental
conditions, it seems that orienting and alerting bias the organism for novelty and change.

Attention as Executive Control

Just as it can be oriented to external sources of stimulation, attention can be directed internally to coordinate memories, thoughts, and emotions. The phenomenon called negative prim­
ing is an example of selection at the level of memory representations. Negative priming
consists of increased RT to stimuli that have been previously ignored. This effect can be
accounted for by an inhibitory process that acts on the representation of the ignored in­
formation, allowing the system to focus on information relevant to current actions
(Houghton & Tipper, 1994). In addition, a mechanism similar to inhibition of return has
been described by Fuentes (2004) in the domain of representations in semantic memory.
Negative priming can be observed for targets preceded by semantically related primes
with sufficiently long intervals between the two events. It is as if inhibiting the represen­
tation of a concept extends to semantically related representations, making them harder
to reactivate shortly after if needed. Likewise, attention has been proposed as a central
mechanism for the control of working memory representations (Baddeley, 1993).

Attention is also an important mechanism for action monitoring, particularly when selec­
tion has to be achieved in a voluntary effortful mode. Posner and Snyder (1975) first ar­
gued about the central role of attention for cognitive control. In well-practiced tasks, ac­
tion coordination does not require attentional control because responses can be automati­
cally triggered by the stimulation. However, attention is needed in a variety of situations
in which automatic processing is either not available or is likely to produce inappropriate
responses. Years later, Norman and Shallice (1986) developed a cognitive model for dis­
tinguishing between automatic and controlled modes of processing. According to their
model, psychological processing systems rely on a number of hierarchically organized
schemas of action and thought used for routine actions. These schemas are automatically
triggered and contain well-learned responses or sequences of actions. However, a different mode of operation involving the attention system is required when situations call for
more carefully elaborated responses. These are situations that involve (1) novelty, (2) er­
ror correction or troubleshooting, (3) some degree of danger or difficulty, or (4) overcom­
ing strong habitual responses or tendencies.

A way to study cognitive control in the lab consists of inducing conflict between respons­
es by instructing people to execute a subdominant response while suppressing a domi­
nant tendency. A basic measure of conflict interference is provided by the Stroop task
(Stroop, 1935). The original form of this task requires subjects to look at words denoting
colors and to report the color of ink the words are written in instead of reading them. Pre­
senting incongruent perceptual and semantic information (e.g., the word “blue” written
with red ink) induces conflict and produces a delay in response time compared with when
the two sources of information match. The Flanker task is another widely used method to
study conflict resolution. In this task, the target is surrounded by irrelevant stimulation
that can either match or conflict with the response required by the target (Eriksen &
Eriksen, 1974). As with the Stroop task, resolving interference from distracting incongru­
ent stimulation delays RT. Cognitive tasks involving conflict have been extensively used to
measure the efficiency with which control of action is exerted. The extent of interference
is usually measured by subtracting average RT in nonconflict conditions from that of con­
ditions involving conflict. The idea is that additional involvement of the attention system
is required to detect conflict and resolve it by inhibiting the dominant but inappropriate
response (Botvinick, Braver, Barch, Carter, & Cohen, 2001; Posner & DiGirolamo, 1998).
Thus, larger interference scores are interpreted as indicative of less efficiency of cogni­
tive control.

Another important form of attention regulation is related to the ability to detect and cor­
rect errors. Detection and monitoring of errors has been studied using ERP (Gehring,
Gross, Coles, Meyer, & Donchin, 1993). A large negative deflection over midline frontal
channels is often observed about 100 ms after the commission of an error, called the er­
ror-related negativity (ERN). This effect has been associated with an attention-based self-
regulatory mechanism (Dehaene, Posner, & Tucker, 1994) and provides a means to exam­
ine the emergence of this function during infancy and childhood (Berger, Tzur, & Posner,
2006).

Posner’s Model of Attention

All three aspects of attention considered above are simultaneously involved in much of
our behavior. (p. 300) However, the distinction between alerting, orienting, and executive
control has proved useful to understanding the neural basis of attention. Over the past
decades, Posner and colleagues have developed a neurocognitive model of attention. Pos­
ner proposes a division of attention into three distinct brain networks. One of these in­
volves changes of state and is called alerting. The other two are closely involved with se­
lection and are called orienting and executive attention (Posner, 1995; Posner & Fan,
2008; Posner & Petersen, 1990; Posner, Rueda, & Kanske, 2007). The alerting network
deals with the intensive aspect of attention related to how the organism achieves and maintains the alert state. The orienting network deals with selective mechanisms operat­
ing on sensory input. Finally, the executive network is involved in the regulation of
thoughts, feelings, and behavior.

The three attention networks in Posner’s model are not considered completely indepen­
dent. For instance, as mentioned earlier, alerting signals produce faster but less accurate responses, a result that Posner interpreted as an inhibitory interaction between the
alerting and executive networks (Posner, 1978). On the other hand, alerting appears to fa­
cilitate the orienting of attention by speeding up attention shifts. Also, by focusing on targets, orienting aids in filtering out irrelevant information, thus enhancing the function of executive attention (see Callejas, Lupiáñez, & Tudela, 2004, for further discussion of attention network interactions). However, despite their interacting nature, the three atten­
tion networks have been shown to have relatively independent neuroanatomies and ap­
pear to involve distinct neuromodulatory mechanisms (Posner, Rueda, & Kanske, 2007;
Posner & Fan, 2008; see also Table 15.1).

Several years ago, Fan et al. (2002) developed the attention network task (ANT), an experimental task to study the functioning of the three attentional networks. The task is
based on traditional experimental paradigms to study the functions of alerting (prepara­
tion cues), orienting (orienting cues), and executive control (flanker task) (Figure 15.1).
Completion of the task allows calculation of three scores related to the efficiency of the
attention networks. The alerting score is calculated by subtracting RT to trials with dou­
ble cue from RT to trials with no cue. This provides a measure of the benefit to performance of having a signal that announces the imminent onset of the target and of using this information to get ready to respond. The orienting score provides a measure
of how much benefit is obtained in responding when information is given about the loca­
tion of the upcoming target. It is calculated by subtracting RT to spatial cue trials from
that of central cue trials. Finally, the executive attention score indicates the amount of in­
terference experienced in performing the task when stimulation conflicting with the tar­
get is presented in the display. It is calculated by subtracting RT to congruent trials from
RT to incongruent trials. Larger scores indicate more interference from distractors and
therefore less efficiency of conflict resolution mechanisms (executive attention).
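
The three subtraction scores described above reduce to simple arithmetic: alerting = RT(no cue) − RT(double cue); orienting = RT(central cue) − RT(spatial cue); executive = RT(incongruent) − RT(congruent). A minimal sketch of this calculation follows; the condition labels and RT values are hypothetical and chosen only for illustration, not taken from the chapter.

```python
# Illustrative sketch (not from the chapter): computing the three ANT network
# efficiency scores from per-condition mean reaction times (in milliseconds).

def ant_scores(rt):
    """Return attention-network efficiency scores (ms) from condition mean RTs."""
    return {
        # Alerting: benefit of a temporal warning signal.
        "alerting": rt["no_cue"] - rt["double_cue"],
        # Orienting: benefit of knowing the upcoming target's location.
        "orienting": rt["central_cue"] - rt["spatial_cue"],
        # Executive: interference from conflicting flankers; larger scores
        # indicate less efficient conflict resolution.
        "executive": rt["incongruent"] - rt["congruent"],
    }

# Hypothetical condition means (ms), for illustration only.
mean_rt = {
    "no_cue": 610, "double_cue": 570,
    "central_cue": 560, "spatial_cue": 520,
    "congruent": 540, "incongruent": 640,
}

print(ant_scores(mean_rt))
# {'alerting': 40, 'orienting': 40, 'executive': 100}
```

In practice these scores are computed per participant from trial-level data (often after error and outlier exclusion), but the subtractions themselves are exactly as shown.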

Posner’s view of attention as an organ system with its own functional anatomy (Posner &
Fan, 2008) provides a model of great heuristic power. Connecting cognitive and neural
levels of analysis aids in answering many complex issues related to attention, such as the
maturational processes underlying its development and factors that are likely to influence
maturation, such as genes and experience. In the next section, anatomy of the three at­
tention networks in Posner’s model is briefly described.

Neuroanatomy of Attention
The emergence of the field of cognitive neuroscience constituted a turning point in the
study of the relationship between brain and cognition from which the study of attention
has benefited greatly. Functional neuroimaging has allowed many cognitive tasks to be
analyzed in terms of the brain areas they activate (Posner & Raichle, 1994). Studies of at­
tention have been among the most often examined in this way (Corbetta & Shulman, 2002;
Driver, Eimer, & Macaluso, 2007; Posner & Fan, 2008; Wright & Ward, 2008), and per­
haps the areas of activation have been more consistent for the study of attention than for
any other cognitive system (Hillyard, Di Russo, & Martinez, 2006; Posner et al., 2007; Raz
& Buhle, 2006). A summary of the anatomy and neurotransmitters involved in the three
networks is shown in Table 15.1.

Alerting Network

Research in recent years has shown that structures of the right parietal and right
frontal lobes as well as a number of midbrain neural modulators such as norepinephrine
and dopamine are involved in alertness (Sturm et al., 1999). Arousal of the central ner­
vous system involves input from brainstem systems that modulate activation of the cor­
tex. (p. 301) Primary among these is the locus coeruleus, which is the source of the brain’s
norepinephrine. It has been demonstrated that the influence of warning signals operates via this brain system because drugs that block it also prevent the changes in the alert
state that lead to improved performance after a warning signal is provided (Coull, Nobre,
& Frith, 2001; Marrocco & Davidson, 1998).

Table 15.1 Marker Tasks, Anatomy, Neurochemistry, and Genetics of Attention Networks

Alerting
    Marker tasks: Warning signals (phasic); CPT, tasks of sustained attention (tonic)
    Anatomy: Locus coeruleus; parietal and frontal cortex (right—tonic; left—phasic)
    Neurotransmitter: Noradrenaline
    Genes: MAOA, ADRA2A, NET

Orienting
    Marker tasks: Dual tasks (selectivity); cueing task (orienting); visual search
    Anatomy: Superior colliculus; superior parietal lobe; temporal-parietal junction; inferior frontal cortex; frontal eye fields
    Neurotransmitter: Acetylcholine
    Genes: CHRNA4, CHRNA7

Executive attention
    Marker tasks: Conflict tasks; inhibition (go/no-go)
    Anatomy: Anterior cingulate cortex; prefrontal cortex
    Neurotransmitter: Dopamine
    Genes: COMT, DRD4, DAT1, DBH

The involvement of frontal and parietal regions of the right hemisphere is supported by
studies showing that lesions in those areas impair patients’ ability to maintain the alert
state in the absence of warning signals (Dockree et al., 2004; Sturm et al., 1999). Imaging
studies have also supported the involvement of right frontal-parietal structures in the en­
dogenous maintenance of alertness (Coull, Frith, Frackowiak, & Grasby, 1996; Dockree et
al., 2004).

However, the neural basis of tonic alertness may differ from that underlying phasic changes of alertness following warning cues. Warning signals provide a phasic change in
level of alertness over millisecond intervals. This change involves widespread variation in
autonomic signals, such as heart rate (Kahneman, 1973), and cortical changes, such as
the contingent negative variation (CNV; Walter, 1964). Several studies have shown that
the CNV is generated by activation in the frontal lobe, with specific regions depending on
the type of task being used (Cui et al., 2000; Tarkka & Basile, 1998). When using fixed
cue–target intervals, warning signals are informative of when a target will occur, thus
producing a preparation in the time domain. Under these conditions, warning cues ap­
pear to activate frontal-parietal structures in the left hemisphere rather than the right
(Coull, Frith, Büchel, & Nobre, 2000; Nobre, 2001). Using the ANT, Fan et al. (2005)
observed cortical frontal-parietal activation that was stronger in the left hemisphere following warning cues, along with activation of the thalamus.

Orienting Network

The orienting system for visual events has been associated with posterior brain areas in­
cluding the superior parietal lobe, the temporal-parietal junction, and the frontal eye
fields. Lesions of the parietal lobe and superior temporal lobe have been consistently re­
lated to difficulties in orienting (Karnath, Ferber, & Himmelbach, 2001). Activation of cor­
tical areas has been specifically associated with operations of disengagement from the
current focus of attention. Moving attention from one location to another involves the su­
perior colliculus, whereas engaging attention requires thalamic areas such as the pulv­
inar nucleus (Posner & Raichle, 1994). Also, Corbetta and Shulman (2002) reviewed a se­
ries of imaging studies and showed that partially segregated networks appear to be in­
volved in endogenous and exogenous orientation of attention. Goal-directed or top-down
selection involves activation of a dorsal network that includes intraparietal and frontal
cortices, whereas stimulus-driven (exogenous) attention activates a ventral network con-
sisting of the temporal-parietal junction and inferior frontal cortex mostly on the right
hemisphere. The ventral network is involved in orienting attention in a reflexive mode to
salient events and has the capacity to overcome the voluntary orientation associated with
the dorsal system.

The function of the orienting network appears to be modulated by acetylcholine (ACh). (p. 302) It has been shown that lesions of cholinergic systems in the basal forebrain in mon-
keys interfere with orienting attention (Voytko et al., 1994). In addition, administration of
a muscarinic antagonist, scopolamine, appears to delay orientation to spatial cues but not
to cues that only have an alerting effect (Davidson, Cutrell, & Marrocco, 1999). Further
evidence shows that the parietal cortex is the site where this modulation takes place. In­
jections of scopolamine directly into parietal areas containing cells that respond to spatial
cues affect the ability to shift attention to the cued location. Systemic injections of scopo­
lamine have a smaller effect on covert orienting of attention than do local injections in the
parietal area (Davidson & Marrocco, 2000). These observations in the monkey have also
been confirmed by similar studies in the rat (Everitt & Robbins, 1997) and by studies with
nicotine in humans (Newhouse, Potter, & Singh, 2004).

Executive Attention Network

We know from adult brain imaging studies that Stroop tasks activate the anterior cingu­
late cortex (ACC). In a meta-analysis of imaging studies, the dorsal section of the ACC
was activated in response to cognitive conflict tasks such as variants of the Stroop task,
whereas the ventral section appeared to be mostly activated by emotional tasks and emo­
tional states (Bush, Luu, & Posner, 2000). The two divisions of the ACC also seem to inter­
act in a mutually exclusive way. For instance, when the cognitive division is activated, the
affective division tends to be deactivated, and vice versa, suggesting the possibility of rec­
iprocal effortful and emotional controls of attention (Drevets & Raichle, 1998). Also, re­
solving conflict from incongruent stimulation in the flanker task activates the dorsal por­
tion of the ACC together with other regions of the lateral prefrontal cortex (Botvinick, Ny­
strom, Fissell, Carter, & Cohen, 1999; Fan, Flombaum, McCandliss, Thomas, & Posner,
2003). Different parts of the ACC appear to be well connected to a variety of other brain
regions, including limbic structures and parietal and frontal areas (Posner, Sheese, Odlu­
das, & Tang, 2006). Support for the voluntary exercise of self-regulation comes from stud­
ies that examine either the instruction to control affect or the connections involved in the
exercise of that control. For example, the instruction to avoid arousal during processing
of erotic events (Beauregard, Levesque, & Bourgouin, 2001) or to ward off emotion when
looking at negative pictures (Ochsner, Bunge, Gross, & Gabrieli, 2002) produces a locus
of activation in midfrontal and cingulate areas. In addition, if people are required to se­
lect an input modality, the cingulate shows functional connectivity to the selected sensory
system (Crottaz-Herbette & Menon, 2006). Similarly, when involved with emotional pro­
cessing, the cingulate shows a functional connection to limbic areas (Etkin, Egner, Per­
aza, Kandel, & Hirsch, 2006). These findings support the role of cingulate areas in the
control of cognition and emotion.

As with the previous networks, pharmacological studies conducted with monkeys and rats
have aided our understanding of the neurochemical mechanisms affecting efficiency of
the executive attention network. In this case, data suggest that dopamine (DA) is the im­
portant neurotransmitter for executive control. Blocking DA in the dorsal-lateral pre­
frontal cortex (DLPFC) of rhesus monkeys causes deficits on tasks involving inhibitory
control (Brozoski, Brown, Rosvold, & Goldman, 1979). Additionally, activation of mesocor­
tical dopaminergic neurons in rats enhances activity in the prefrontal cortex (McCulloch,
Savaki, McCulloch, Jehle, & Sokoloff, 1982), as does expression of DA receptors in the an­
terior cingulate cortex (Stanwood, Washington, Shumsky, & Levitt, 2001).

Development of Attention Networks


Each of the functions of attention considered in the neurocognitive model just described
is present to some degree in infancy, but each undergoes a long developmental process.
In the next sections, the development of these functions is traced from birth to adoles­
cence.

Infancy and Toddlerhood

Attention in infancy is less developed than later in life, and the functions of alerting, ori­
enting, and executive control in particular are less independent during infancy. In their
volume Attention in Early Development, Ruff and Rothbart (1996) extensively reviewed
the development of attention across early childhood. They suggest that both reactive and
self-regulatory systems of attention are at play during the first years of life. Initially, reac­
tive attention is involved in more automatic engagement and orienting processes. Then,
by the end of the first year of life, attention can be more voluntarily controlled. Across the
toddler and preschool years, the self-regulatory system increasingly assumes control of
attentional processes, allowing for a more flexible and goal-oriented control of attentional
resources. (p. 303) In this section, I first discuss components of attention related to the
state of engagement and selectivity and then consider attention in relation to self-regula­
tion.

The early life of the infant is concerned with changes in state. Sleep dominates at birth,
and the waking state is relatively rare at first. The newborn infant spends nearly three-
fourths of the time sleeping (Colombo & Horowitz, 1987). There is a dramatic change in
the percentage of time in the waking state over the first 3 months of life. By the 12th
postnatal week, the infant has become able to maintain the alert state during much of the
daytime hours, although this ability still depends heavily on external sensory stimulation,
much of it provided by the caregiver.

Newborns show head and eye movements toward novel stimuli, but the control of orient­
ing initially rests largely with the caregiver's presentations. Eye movements are prefer-
entially directed toward moving stimuli and have been shown to depend on properties of
the stimulus, for example, how much it resembles a human face (Johnson & Morton,
1991).

Much of the response to external stimuli involves orienting toward the source of stimula­
tion. The orienting response is a relatively automatic or involuntary response in reaction
to moderately intense changes in stimulation (Sokolov, 1963). It depends on the physical
characteristics of the stimulation, such as novelty and intensity. With the orienting re­
sponse, the organism is alerted and prepared to learn more about the event in order to
respond appropriately. Orienting responses are accompanied by a deceleration in heart
rate that is sustained during the attention period (Richards & Casey, 1991). In babies,
other physical reactions involve decreases in motor activity, sucking, and respiration
(Graham, Anthony, & Ziegler, 1983). The heart rate deceleration is observed after 2
months of age and increases in amplitude until about 9 months. Subsequently, the ampli­
tude decreases until it approximates the adult response, which consists of a small decel­
eration in response to novel stimulation (Graham et al., 1983). Orienting responses of
greater magnitude lead to more sustained periods of focused attention, which likely in­
crease babies’ opportunities to explore objects and scenes. Infants also become more able to recognize objects, partly because of maturation of temporal-parietal structures and increases in neural transmission. As this happens, the novelty of objects, and hence their
capacity to alert, diminishes, causing shorter periods of sustained attention. Reductions
of automatic orientation may then facilitate the emergence of self-initiated voluntary at­
tention (Ruff & Rothbart, 1996).

The most frequent method of studying orienting in infancy involves the use of eye-tracking devices. As in adults, there is a close relation between the direction of
gaze and the infants’ attention. The attention system can be driven to particular locations
by external input from birth (Richards & Hunter, 1998); however, orientation to the
source of stimulation continues to improve in precision over many years. Infants’ eye
movements often fall short of the target, and peripheral targets are often foveated by a
series of head and eye movements. Although not as easy to track, the covert system like-
ly follows a similar trajectory. One strategy to examine covert orienting consists of pre­
senting brief cues that do not produce an eye movement followed by targets that do. Us­
ing this strategy, it has been shown that the speed of the eye movement to the target is
enhanced by the cue, and this enhancement improves over the first year of life (Butcher,
2000). In more complex situations, for example, when there are competing targets, the
improvement may go on for longer periods (Enns & Cameron, 1987).

The orienting network appears to be fully functional by around 6 months of age. During
the first months of life there is a progressive development of the connection between vi­
sual processing pathways and parietal systems involved in attentional control. This matu­
ration allows for visual orientation to be increasingly under attentional control. The dor­
sal visual pathway, primarily involved in processing spatial properties of objects and loca­
tions, matures earlier than the ventral visual pathway, involved in object
identification. This explains why infants show preference for novel objects instead of nov­
el locations (Harman, Posner, Rothbart, & Thomas-Thrapp, 1994). Preference for novel lo­
cations is in fact shown from very early on. Inhibition of return is shown by newborns for
very salient objects such as lights (Valenza, Simion, & Umiltà, 1994), and some time later,
at about 6 months of age, for more complex objects (Harman et al., 1994).

Gaining control over disengaging attention is also necessary to be able to shift between
objects or locations. Infants in the first 2 or 3 months of life often have a hard time disen­
gaging from salient objects and events and might become distressed before they are able
to move away from the target. By 4 months, however, infants become more able to look
away from central displays (Johnson, Posner, & Rothbart, 1991). (p. 304) From then on,
the latency to turn from central to peripheral displays decreases substantially with age
(Casey & Richards, 1988). Before disengaging attention, the heart rate begins to acceler­
ate back to preattentive levels, and infants become more distractible. After termination of
attention, there seems to be a refractory period of about 3 seconds during which orient­
ing to a novel event in the previous location or a nearby one is inhibited (Casey &
Richards, 1991), a process that might be related to inhibition of return. The ability to dis­
engage gaze is achieved before the capacity to disengage attention. The voluntary disen­
gagement of attention requires further inhibitory skills that appear to emerge later on, at
about 18 months of age (Ruff & Rothbart, 1996).

Orienting to sensory input is a major mechanism for regulation of distress. Decrements in heart rate that occur with orienting responses are likely to have a relaxing effect in in-
fants. In fact, it has been reported that greater orienting skill in the laboratory is associ­
ated with lower temperamental negative emotion and greater soothability as reported by
parents (Johnson et al., 1991). Additional evidence of the regulatory function of attention
is provided by caregivers’ attempts to distract their infants by bringing their attention to
other stimuli. As infants orient, they are often quieted, and their distress appears to di­
minish. In one study, 3- to 6-month-old infants were first shown a sound and light display;
about 50 percent of the infants became distressed by the stimulation, but then strongly
oriented to interesting visual and auditory soothing events when these were presented.
While the infants oriented, facial and vocal signs of distress disappeared. However, as
soon as the orienting stopped, the infants’ distress returned to almost exactly the levels
shown before presentation of the soothing object, even when the quieted period lasted for
as long as 1 minute (Harman, Rothbart, & Posner, 1997). The authors have speculated
that an internal system involving the amygdala holds a computation of the initial level of
distress, so that this initial level returns if the infant’s orientation to the novel event is
lost.

Late infancy is the time when self-regulation develops. At about the end of the first year,
executive attention-related frontal structures come into play, and this allows for a pro­
gressive increase in the duration of orientation based on goals and intentions. For in­
stance, periods of focused attention during free play increase steadily after the first year
of life (Ruff & Lawson, 1990). With the development of voluntary attention, young chil­
dren become increasingly more sensitive to others’ line of regard, establishing a basis for
joint attention and social influences on selectivity. Increasingly, infants are able to gain
control of their own emotions and other behaviors, and this transition marks the emer­
gence of the executive attention system.

Perhaps the earliest evidence of activation of the executive attention network is at about
7 months of age. As discussed earlier, an important form of self-regulation is related to
the ability to detect errors, which has been linked to activation of the ACC. One study ex­
amined the ability of infants of 7 to 9 months to detect errors (Berger, Tzur, & Posner,
2006). In this study, infants observed a scenario in which one or two puppets were hidden
behind a screen. A hand was seen to reach behind the screen and either add or remove a
puppet. When the screen was removed, there were either the correct number of puppets
or an incorrect number. Wynn (1992) found that infants of 7 months looked longer when
the number was in error than when it was correct. Whether the increased looking time in-
volved the same executive attention circuitry that is active in adults when they detect er­
rors was simply unknown. Berger and colleagues replicated the Wynn study but used a
128-channel electroencephalogram (EEG) to determine the brain activity that occurred
during error trials in comparison with that found when the infant viewed a correct solu­
tion. They found that the same EEG component over the same electrode sites differed be­
tween correct and erroneous displays both in infants and adults. This suggests that a sim­
ilar brain anatomy as in adult studies is involved in infants’ ability to detect errors. Of
course, activating this anatomy for observing an error is not the same as what occurs in
adults, who actually slow down after an error and adjust their performance. However, it
suggests that even very early in life, the anatomy of the executive attention system is at
least partly functional.

Later in the first year of life, there is evidence of further development of executive func­
tions, which may depend on executive attention. One example is Adele Diamond’s work
using the “A not B” task and the reaching task. These two marker tasks involve inhibition
of an action that is strongly elicited by the situation. In the “A not B” task, the experi­
menter shifts the location of a hidden object from location A to location B, after the
infant’s retrieval from location A had been reinforced as correct on the previous trials
(Diamond, 1991). In the reaching task, visual information about the correct route (p. 305)
to a toy is put in conflict with the cues that normally guide reaching. The normal tenden­
cy is to reach for an object directly along the line of sight. In the reaching task, a toy is
placed under a transparent box in front of the child. The opening of the box is on one of
the lateral sides instead of the front side. In this situation, the infant can reach the toy on­
ly if the tendency to reach directly along the line of sight is inhibited. Important changes
in the performance of these tasks are observed from 6 to 12 months (Diamond, 2006).
Comparison of performance between monkeys with brain lesions and human infants on
the same marker tasks suggests that the tasks are sensitive to the development of the
prefrontal cortex, and maturation of this brain area seems to be critical for the develop­
ment of this form of inhibition.

Another task that reflects the executive system involves anticipatory looking in a visual
sequence task. In the visual sequence task, stimuli are placed in front of the infant in a
fixed and predictable sequence of locations. The infant’s eyes are drawn reflexively to the
stimuli because they are designed to be attractive and interesting. After a few trials,
some infants will begin to anticipate the location of the next target by correctly moving
their eyes in anticipation of the target. Anticipatory looks are thought to reflect the devel­
opment of a more voluntary attention system that might depend in part on the orienting
network and also on the early development of the executive network. It has been shown
that anticipatory looking occurs with infants as young as 3½ to 4 months (Clohessy, Pos­
ner, & Rothbart, 2001; Haith, Hazan, & Goodman, 1988). However, there are also impor­
tant developments that occur during infancy (Pelphrey et al., 2004) and later (Garon,
Bryson, & Smith, 2008). Learning more complex sequences of stimuli, such as sequences
in which a location is followed by one of two or more different locations, depending on
the location of previous stimuli within the sequence (e.g., location 1, then location 2, then
location 1, then location 3, and so on…), requires the monitoring of context and, in adult
studies, has been shown to depend on lateral prefrontal cortex (Keele, Ivry, Mayr, Hazel­
tine, & Heuer, 2003). Infants of 4 months do not learn to go to locations where there is
conflict as to which location is the correct one. The ability to respond when such conflict
occurs is not present until about 18 to 24 months of age (Clohessy et al., 2001). At 3
years, the ability to respond correctly when there is conflict in the sequential looking task
correlates with the ability to resolve conflict in a spatial conflict task (Rothbart, Ellis,
Rueda, & Posner, 2003). These findings support the slow development of the executive at­
tention network during the first and second years of life.

The visual sequence task is related to other features that reflect executive attention. One
of these is the cautious reach toward novel toys. Rothbart and colleagues found that the
slow, cautious reach of infants of 10 months predicted higher levels of effortful control as
measured by parent report at 7 years of age (Rothbart, Ahadi, Hershey, & Fisher, 2001). In-
fants of 7 months who show higher levels of correct anticipatory looking in the visual se­
quence task also show longer inspection before reaching toward novel objects and slower
reaching toward the object (Sheese, Rothbart, Posner, Fraundorf, & White, 2008). This
suggests that successful anticipatory looking at 7 months is one feature of self-regulation.
In addition, infants with higher levels of correct anticipatory looking also showed evi­
dence for higher levels of emotionality in a distressing task and more evidence of efforts
to self-regulate their emotional reactions. Thus, even at 7 months, the executive attention
system is showing some properties of self-regulation, even though it is not yet sufficiently
developed to resolve the simple conflicts used in the visual sequence task or the task of
reaching away from the line of sight in the transparent box task.

An important question about early development of executive attention is its relation to the
orienting network. The findings to date suggest that orienting plays some of the regulatory roles in early infancy that are later exercised by the executive network. I argued
that the orienting network seems to have a critical role in regulation of emotion by the
caregiver as early as 4 months. It has recently been shown that orienting as measured with the Infant Behavior Questionnaire at 7 months is not correlated with effortful con-
trol as measured in the same infants at 2 years (Sheese, Voelker, Posner, & Rothbart,
2009). However, orienting did show some relation with early regulation of emotional re­
sponding of the infants because it was positively related to positive affect and negatively
related to negative affect. After toddlerhood, control of emotion may shift from orienting to executive attention, as a negative relationship between effortful control and negative affect has been repeatedly found in preschool-aged and older children (Rothbart & Rueda, 2005).

In 2001, Colombo presented a summary of attentional functions in infancy, which included (p. 306) alertness, spatial orienting, object-oriented attention, and endogenous atten-
tion (Colombo, 2001). This division is similar to the network approach, but divides orient­
ing into space and features and includes the functions of interstimulus shifts and sus­
tained attention as part of endogenous attention. Colombo argues that alerting reaches
the mature state at about 4 months, orienting by 6 to 7 months, and endogenous atten­

tion by 4 to 5 years. This schedule parallels the order of our discussion, but as described in the next section, all of these functions continue to develop during childhood.

Childhood

Figure 15.1 The child ANT task. A warning tone is presented on half of the trials (no tone is presented on the other half). After a brief interval, a cue consisting of an asterisk is presented on 2/3 of the trials; the cue appears in the same location as the subsequent target (valid cue) 50% of the time and in the opposite location (invalid cue) the remaining 50%. Finally, a target is presented consisting of a colorful fish pointing either right or left. The target appears either above or below the fixation cross and is flanked by two fish on each side. The flanking fish may point in the same direction as the target (congruent trials) or in the opposite direction (incongruent trials), equally often. On successive trials, participants are instructed to discriminate the direction of the target fish as rapidly and accurately as possible, and usually both reaction time (RT) and accuracy of the response are registered.

In the preschool years, children become more able to follow instructions and perform RT
tasks. To study the development of attention functions across childhood, a child-friendly
version of the ANT was developed (Rueda, Fan, et al., 2004). This version is structurally
similar to the adult version but uses fish instead of arrows as target stimuli. This allows
contextualization of the task in a game in which the goal is to feed the middle fish (tar­
get), or simply to make it happy, by pressing a key corresponding to the direction in
which it points. After the response is made, feedback consisting of an animation of the middle fish is provided, which is intended to sustain the child’s motivation to complete the task.

Using the child ANT, the development of attention networks has been traced during the
primary school period into early adolescence. In a first study, this task was used with chil­
dren aged 6 to 10 years and in adults (Rueda, Fan, et al., 2004). Results showed separate
developmental trajectories for each attention function. Alerting scores showed stability
across early and middle childhood, but children’s scores were higher than the scores ob­

tained by adults, suggesting further development of alerting during late childhood. Ori­
enting scores showed no differences across ages, suggesting an early development of this
network. However, in this study invalid cues were not used; thus, the load of operations of
disengagement and reallocation of attention was rather low. (p. 307) Finally, flanker inter­
ference scores indexing executive attention efficiency showed considerable reduction
from age 6 to 7 years. However, interference, as calculated with both RT and percentage
of errors, remained about the same from age 7 years to adulthood, suggesting that early
childhood is the period of major development of executive attention.

Recently, we have conducted a developmental study with a slightly modified version of
the child ANT (see Figure 15.1). Callejas et al. (2004) suggested a modification of the ANT
consisting of separating the presentation of alerting and orienting signals and including
invalid orienting cues, as well as presenting alerting signals in the auditory modality. This
modification has two potential advantages over the original ANT: (1) It allows measure­
ment of alerting and orienting effects separately, and (2) it provides orienting scores with
greater load of disengagement and reorienting operations. We modified the child ANT ac­
cording to these suggestions and conducted a study with groups of 4- to 7-year-olds, 7- to
10-year-olds, and 10- to 13-year-olds and adults (Abundis, Checa, & Rueda, 2013). Again,
data showed separate developmental courses for the three functions. Executive attention
scores showed a progressive development across early and middle childhood and no dif­
ferences between the oldest children and adults. Similarly, alerting scores were larger for
the youngest groups (4–7 years and 7–10 years), with no differences between 10- to 13-
year-olds and adults. With invalid cues included, the orienting network followed a differ­
ent developmental trajectory compared with the previous study. We observed larger ori­
enting scores for all age groups compared with adults, which suggests that the develop­
ment of operations of disengagement and reorienting of attention extends over late child­
hood. In this study, we also registered electrophysiological activation during performance
of the task. Data revealed modulation of distinct ERP components associated with each
network. Alerting cues produced early positive and negative components over frontal
leads followed by the expected CNV. Consistent with the behavioral data, the two younger
groups did not show the early frontal effect, indicating that activation of frontal struc­
tures related to response preparation is delayed in younger children with respect to older
children and adults. Compared with valid orienting cues, invalid cues elicited larger P3
over parietal channels, an effect that was larger for children than adults and suggests
their need for greater engagement of parietal structures in order to disengage and reori­
ent attention. Finally, compared with congruent conditions, incongruent flankers pro­
duced a larger negativity around 300 ms in channels along the frontal midline in adults.
This effect was more sustained for children and had a broader left-lateralized anterior
distribution than in adults. Again, this suggests that the executive attention network has
not reached its highest level of efficiency during early childhood; thus, additional frontal
structures need to be activated during longer periods of time in order to reduce interfer­
ence from flankers.

One of the strengths of the child ANT is that it is a theoretically grounded task that com­
bines experimental strategies widely used to study alerting (e.g., warning signals), orient­
ing (e.g., spatial cues), and attention control (e.g., flanker conflict) within the same task.
However, much of the developmental research on attention has been conducted separate­
ly for each function. The main findings of this research are reviewed next.

Alerting
Several studies have examined developmental changes in phasic alertness between
preschoolers, older children, and adults. Increasing age is generally associated with larg­
er reductions in RT in response to warning cues. Developmental differences in response
preparation may relate to the speed and maintenance of preparation while expecting the
target. It has been shown that young children (5-year-olds) need more time than older
children (8-year-olds) and adults to get full benefit from a warning cue (Berger, Jones,
Rothbart, & Posner, 2000), and they also seem to be less able to sustain the optimal level
of alertness over time (Morrison, 1982). Using the child ANT, Mezzacappa (2004)
observed a trend toward higher alerting scores (difference between RT in trials with and
without warning cues) with age in a sample of 5- to 7-year-old children. Older children
also show lower rates of omissions overall, which indicates greater ability to remain vigi­
lant during the task period. The fact that alertness is subject to more fluctuations in
younger children can in part explain age differences in processing speed because alert­
ness is thought to speed the processing of subsequent events.

Sustained attention is frequently measured by examining variations in performance in a
task along a relatively extended period of time, as in the continuous performance task
(CPT). Variations in the level of alertness can be observed by examining the (p. 308) per­
centage of correct and/or omitted responses to targets or by means of indices of percep­
tual sensitivity (d’) over time. With young children, the percentage of individuals who are
able to complete a particular task can also be indicative of maturational differences in the
ability to sustain attention. In a study conducted with preschoolers, only 30 to 50 percent
of 3- to 4-year-olds were able to complete the task, whereas the percentage rose to 70
percent for 4- to 4½-year-olds and close to 100 percent from that age onward (Levy, 1980). Us­
ing the CPT, Danis et al. (2008) found considerable increases in the ability to maintain and
regain attention after a distraction period between 2½ and 3½ years of age, and more
consistent control of attention after 4½ years of age. However, even though the largest
development of vigilance seems to occur during the preschool period, children continue
to show larger declines in performance in CPT over time compared with adults through
middle and late childhood, especially under more difficult task conditions. For instance, 7-
to 9-year-old children show a larger decline in sensitivity (d’) and hits over time com­
pared with adults in an auditory version of the CPT, which is thought to be more challeng­
ing than the visual version of the task (Curtindale, Laurie-Rose, Bennett-Murphy, & Hull,
2007). Likewise, while performing a CPT with degraded stimuli, a steady increase of d’
and rate of hits with age has been observed, reaching the adult level by about age 13
years (Lin, Hsiao, & Chen, 1999).
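
The sensitivity index d′ used in these CPT studies is the standard signal-detection quantity, z(hit rate) − z(false-alarm rate). As a hedged illustration (this is not the cited authors' code, and the 1/(2N) extreme-rate correction is one common convention, not necessarily theirs):

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Perceptual sensitivity d' = z(hit rate) - z(false-alarm rate).
    Rates of exactly 0 or 1 are nudged by the common 1/(2N) correction
    so the inverse normal CDF stays finite."""
    z = NormalDist().inv_cdf  # inverse standard-normal CDF

    def rate(k, n):
        r = k / n
        if r == 0.0:
            r = 1 / (2 * n)
        elif r == 1.0:
            r = 1 - 1 / (2 * n)
        return r

    h = rate(hits, hits + misses)
    f = rate(false_alarms, false_alarms + correct_rejections)
    return z(h) - z(f)
```

A decline in d′ over blocks of a CPT session, computed this way per block, is the vigilance-decrement measure referred to above.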

Developmental changes in alertness during childhood and early adolescence appear to re­
late to continuous maturation of frontal systems during this period. One way to examine
brain mechanisms underlying changes in alertness is through studying the CNV, an elec­
trophysiological index associated with activation in the right ventral and medial frontal
areas of the brain (Segalowitz & Davies, 2004). In adolescents as well as adults, the CNV
has been shown to relate to performance in various measures of intelligence and execu­
tive functions as well as functional capacity of the frontal cortex (Segalowitz, Unsal, &
Dywan, 1992). Various studies have shown that the amplitude of the CNV increases with
age, especially during middle childhood. Jonkman (2006) found that the CNV amplitude is
significantly smaller for 6- to 7-year-old children compared with adults, but no differences
were observed between 9- and 10-year-olds and adults. Moreover, the difference in CNV
amplitude between children and adults seems to be restricted to early components of the
CNV observed over right frontal-central channels (Jonkman, Lansbergen, & Stauder,
2003), which suggests a role of maturation of the frontal alerting network.

Orienting and Selectivity


Aspects of the attention system that increase precision and voluntary control of orienting
continue developing throughout childhood and adolescence. For the most part, infant
studies examine overt forms of orienting. By the time children are able to follow instruc­
tions and respond to stimulation by pressing keys, both overt and covert orienting can be
measured. Mostly using Posner’s cuing paradigm, several studies have examined the de­
velopment of orienting during childhood. Despite a progressive increase in orienting
speed to valid cues during childhood (Schul, Townsend, & Stiles, 2003), data generally
show no age differences in the orienting benefit effect between young children (5–6 years
of age), older children (8–10 years) and adults (Enns & Brodeur, 1989), regardless of
whether the effect is measured in covert or overt orienting conditions (Wainwright &
Bryson, 2002). However, there seems to be an age-related decrease in the orienting cost
(Enns & Brodeur, 1989; Schul et al., 2003; Wainwright & Bryson, 2002). Moreover, the ef­
fect of age when disengagement and reorienting to an uncued location are needed ap­
pears to be larger under endogenous orienting conditions (e.g., longer intervals between
cue and target) (Schul et al., 2003; Wainwright & Bryson, 2005). This suggests that as­
pects of orienting related to the control of disengagement and voluntary orientation,
which depend on the dorsal frontoparietal network in the Corbetta and Shulman (2002)
model, improve with age during childhood. In a study in which endogenous orienting was
examined in children aged 6 to 14 years and adults, all groups but the youngest children
showed larger orienting effects (difference in RT to targets appearing at cued vs. uncued
locations) with longer cue–target intervals (Wainwright & Bryson, 2005). This indicates
that young children seem to have problems endogenously adjusting the scope of their at­
tentional focus. This idea was also suggested by Enns and Girgus (1985), who found that
attentional focusing as well as the ability to effectively divide or switch attention between
stimuli improves across ages 5, 8, and 10 years and into adulthood.

A similar developmental pattern emerges when orienting and selectivity are studied in
the auditory modality. Coch et al. (2005) developed a task to measure sustained selective
auditory attention (p. 309) in young children. They used a dichotic listening task in which
participants were asked to selectively focus attention to one auditory stream consisting of
a narration of a story while ignoring a different stream containing another story. A pic­
ture related to the to-be-attended story is visually presented to the child in order to facili­
tate the task. ERPs are then recorded to probes embedded in the attended and unattend­
ed channels. Adults show increased P1 and N1 amplitudes to probes presented in the at­
tended stream. Six- to 8-year-old children and preschoolers also show an attention-relat­
ed modulation of ERP components, which is more sustained in children than adults (Coch
et al., 2005; Sanders, Stevens, Coch, & Neville, 2006). Differences in the topographic dis­
tribution of the effect between children and adults also suggest that brain mechanisms
related to sustained selectivity continue developing beyond middle childhood. Additional­
ly, further development is necessary when attention has to be disengaged from the at­
tended channel and moved to a different one, as was found for the visual modality. An im­
portant improvement in performance between 8 and 11 years of age has been reported
when children are asked to disengage attention from the attended channel and reallocate
it to the other channel in the dichotic listening task (Pearson & Lane, 1991).

Executive Attention
Children are able to perform simple conflict tasks in which their RT can be measured
from age 2 years on, although the tasks need to be adapted to appear child friendly. One
such adaptation is the spatial conflict task (Gerardi-Caulton, 2000), which induces con­
flict between the identity and the location of objects. It is a touch-screen task in which
pictures of houses of two animals (i.e., a duck and a cat) are presented in the bottom left
and right sides of the screen, then one of the two animals appears either on the left or
right side of the screen in each trial, and the child is required to touch the house corre­
sponding to the animal. Location is the dominant aspect of the stimulus, although instruc­
tions require responding according to its identity. Thus, conflict trials in which the animal
appears on the side of the screen opposite to its house usually result in slower responses
and larger error rates than nonconflict trials (when the animal appears on the side of its
house). Between 2 and 4 years of age, children progressed from an almost complete in­
ability to carry out the task to relatively good performance. Although 2-year-old children
tended to perseverate on a single response, 3-year-olds performed at high accuracy lev­
els, though, like adults, they responded more slowly and with reduced accuracy on con­
flict trials (Gerardi-Caulton, 2000; Rothbart, Ellis, Rueda, & Posner, 2003).

Another way to study action monitoring consists of examining the detection and correc­
tion of errors. While performing the spatial conflict task, 2½- and 3-year-old children
showed longer RT following erroneous trials than following correct ones, indicating that
children were noticing their errors and using them to guide performance in the next trial.
However, no evidence of slowing following an error was found at 2 years of age (Rothbart
et al., 2003). A similar result with a different time frame was found when using a version
of the Simon Says game. In this task, children are asked to execute a response when a
command is given by one stuffed animal and to inhibit a response commanded by a sec­
ond animal (Jones, Rothbart, & Posner, 2003). Children 36 to 38 months of age were un­
able to inhibit their response and did not show the slowing-after-error effect, but at 39 to
41 months of age, children showed both the ability to inhibit and the slowing of RT follow­
ing errors. These results suggest that between 30 and 39 months of age, children greatly
develop their ability to detect and correct erroneous responses and that this ability may
relate to the development of inhibitory control.
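
The post-error slowing measure used in these studies is the difference between mean RT on trials following an error and mean RT on trials following a correct response. A minimal sketch, assuming an ordered list of trial dicts with hypothetical `rt` and `correct` keys (an illustration, not the cited authors' analysis):

```python
from statistics import mean

def post_error_slowing(trials):
    """Post-error slowing: mean RT on trials following an error minus
    mean RT on trials following a correct response. `trials` is an
    ordered list of dicts with 'rt' and 'correct' (bool). A positive
    value indicates slowing after errors, taken as a behavioral sign
    of error monitoring."""
    after_error, after_correct = [], []
    for prev, cur in zip(trials, trials[1:]):
        (after_correct if prev['correct'] else after_error).append(cur['rt'])
    return mean(after_error) - mean(after_correct)
```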

Data collected with the ANT and reported earlier suggested that the development of con­
flict resolution continues during the preschool and early childhood periods. Nonetheless,
studies in which the difficulty of the conflict task is increased by other demands such as
switching rules or holding more information in working memory have shown further de­
velopment of conflict resolution between late childhood and adulthood. For example,
Davidson et al. (2006) manipulated memory load, inhibitory demand, and rule switching
(cognitive flexibility) in a spatial conflict task. They found that the cost resulting from the
need for inhibitory control was larger for children than for adults. Also, even under low
memory load conditions, the switching cost was still larger for 13-year-old children than
for adults. The longer developmental course observed with this task might be due to the
requirement of additional frontal brain areas to those involved in executive attention.

Other studies have used ERP to examine the brain mechanisms that underlie the develop­
ment of executive attention. In one of these studies, a flanker task was used to compare
conflict resolution (p. 310) in three groups of children aged 5 to 6, 7 to 9, and 10 to 12
years, and a group of adults (Ridderinkhof & van der Molen, 1995). Developmental differ­
ences were examined in two ERP components, one related to response preparation (LRP)
and another one related to stimulus evaluation (P3). The authors found differences be­
tween children and adults in the latency of the LRP, but not in the latency of the P3 peak,
suggesting that developmental differences in the ability to resist interference are mainly
related to response competition and inhibition, but not to stimulus evaluation.

As discussed earlier, brain responses to errors are also informative of the efficiency of the
executive attention system. The amplitude of the ERN seems to reflect detection of the
error as well as its salience in the context of the task and therefore is subject to individ­
ual differences in affective style or motivation. Generally, larger ERN amplitudes are as­
sociated with greater engagement in the task and/or greater efficiency of the error-detec­
tion system (Santesso, Segalowitz, & Schmidt, 2005; Tucker, Hartry-Speiser, McDougal,
Luu, & deGrandpre, 1999). Developmentally, the amplitude of the ERN shows a progres­
sive increase during childhood into late adolescence (Segalowitz & Davies, 2004), with
young children (age 7–8 years) being less likely to show the ERN to errors than older chil­
dren and adults—at least when performing a flanker task.

Another evoked potential, the N2, is also modulated by the requirement for executive
control (Kopp, Rist, & Mattler, 1996) and has been associated with a source of activation
in the ACC (van Veen & Carter, 2002). In a flanker task, such as the fish version of the
child ANT, adults show larger N2 for incongruent trials over the mid-frontal leads (Rueda,
Posner, Rothbart, & Davis-Stober, 2004b). Four-year-old children also show a larger nega­
tive deflection for the incongruent condition compared with the congruent one at mid-
frontal electrodes. However, compared with adults, this congruency effect had a larger
size and extended over a longer period of time. Later in childhood, developmental studies
have shown a progressive decrease in the amplitude and latency of the N2 effect with age
(Davis, Bruce, Snyder, & Nelson, 2004; Johnstone, Pleffer, Barry, Clarke, & Smith, 2005;
Jonkman, 2006). The reduction of the amplitude appears to relate to the increase in effi­
ciency of the system and not to the overall amplitude decrease that is observed with age
(Lamm, Zelazo, & Lewis, 2006). Also, the effects are more widely distributed for young
children, and they become more focalized with age (Jonkman, 2006; Rueda et al., 2004b).
Source localization analyses indicate that, compared with adults, children need additional
activations to adequately explain the distribution (Jonkman, Sniedt, & Kemner, 2007).

The focalization of signals in adults compared with children is consistent with neuroimag­
ing studies, in which children appear to activate the same network of areas as adults
when performing similar tasks, but the average volume of activation appears to be re­
markably greater in children than in adults (Casey, Thomas, Davidson, Kunz, & Franzen,
2002; Durston et al., 2002). Altogether, these data suggest that the brain circuitry under­
lying executive functions becomes more focal and refined as it gains in efficiency. This
maturational process involves not only greater anatomical specialization but also reduc­
ing the time these systems need to resolve each of the processes implicated in the task.
This is consistent with recent data showing that the network of brain areas involved in at­
tentional control shows increased segregation of short-range connections but increased
integration of long-range connections with maturation (Fair et al., 2007). Segregation of
short-range connectivity may be responsible for greater local specialization, whereas in­
tegration of long-range connectivity likely increases efficiency by improving coordinated
responses between different processing networks.

Figure 15.2 Schematic representation of the devel­
opmental time course of attention networks. The
alerting and orienting networks appear to mature
largely during infancy and early childhood, although
both networks continue developing up to late child­
hood, showing improvements in the endogenous con­
trol of processes related to preparation and selectivi­
ty. The executive attention network appears to under­
go a more protracted maturation, emerging at about
the end of the first year of life and continuing during
childhood into adolescence.

In summary, evidence shows that the attention networks have different developmental
courses across childhood. The different developmental courses are represented in Figure
15.2. Shortly after birth infants show increasing levels of alertness, although alertness is
highly dependent on exogenous stimulation. Then, preparation from warning signals
shows a progressive development during the first years of life, whereas the ability to en­
dogenously sustain attention improves up to late childhood. Orienting also shows pro­
gressive development of increasingly complex functions. Infants are able to orient to ex­
ternal cues by age 4 months. From there, children’s orientation is increasingly more pre­
cise and less dependent on exogenous stimulation. Endogenous disengagement and reori­
enting of attention progress up to late childhood and early adolescence. The earlier signs
of executive attention appear by the end of the first year of life. From there on, there is a
progressive development, especially during the preschool period, of the ability to inhibit
dominant responses and suppress irrelevant stimulation. However, with increasingly com­
plex conditions, as when other executive functions (p. 311) (e.g., working memory, plan­
ning) come into play, executive attention shows further development during childhood
and adolescence.

Individual Differences in Attentional Efficiency


The case for studying the emergence of attention is strengthened because cognitive
measures of attention efficiency in laboratory tasks have been linked to aspects of
children’s behavior in naturalistic settings. For example, it has been shown that the effi­
ciency of executive attention as measured with the child ANT is related to the ability to
accommodate to social norms, like smiling when receiving a disappointing gift (Simonds,
Kieras, Rueda, & Rothbart, 2007). Additionally, Eisenberg and her colleagues have shown
that children with good attentional control tend to deal with anger by using nonhostile
verbal methods rather than overt aggressive methods (Eisenberg, Fabes, Nyman,
Bernzweig, & Pinuelas, 1994). Efficiency of attention is also related to peer-reported mea­
sures of unsocial behavior in the classroom and increased risk for social rejection (Checa,
Rodriguez-Bailon, & Rueda, 2008). In that same study a positive relation between atten­
tional efficiency and schooling competence was reported, including measures of academ­
ic achievement and skills important for school success, such as rule following and toler­
ance to frustration.

The relationship between poor attention, school maladjustment, and low academic
achievement seems to be consistent across ages and cultures. Mechanisms of the execu­
tive attention network are likely to play a role in this relationship. We have recently found
that brain activation registered while performing the flanker task is a significant predic­
tor of mathematics grades in school (Checa & Rueda, 2011). Also, in a study conducted
with a flanker task and ERPs, children who committed more errors on incongruent trials
showed smaller amplitudes of the ERN. This result suggests lower sensitivity of the brains
of these children to the commission of errors. Moreover, the amplitude of the ERN was
predicted by individual differences in social behavior, in that children with poorer social
sensitivity as assessed by a self-report personality questionnaire showed ERNs of smaller
amplitude (Santesso et al., 2005). On the other hand, empathy appears to show a positive
relation with amplitude of the ERN (Santesso & Segalowitz, 2009). Altogether, these data
suggest that attentional flexibility is required to link affect, internalized social norms, and
action in everyday life situations (Rueda, Checa, & Rothbart, 2010).

Studies of this sort are important for establishing links between biology and behavior.
Knowing the neural substrates of attention also provides a tool for examining which as­
pects of the attention functions are subject to genetic influence, as well as how the effi­
ciency of this system may be influenced by experience.

Genes

Describing the development of a specific neural network is only one step toward a biologi­
cal understanding. It is also important to know the genetic and environmental influences
that together build up the neural network.

(p. 312) Some features or psychological functions may be more subject to genetic influ­
ences than others. The degree of heritability of each attention network in Posner’s model
was examined in a twin study using the scores provided by the ANT as phenotypes (Fan,
Wu, Fossella, & Posner, 2001). The study showed that the executive attention network
and, to a lesser degree, the alerting network show evidence of heritability. This suggested
that genetic variation contributes to individual differences at least in these two functions.

Links between specific neural networks of attention and chemical modulators allow inves­
tigating the genetic basis of normal attention (Fossella et al., 2002; Green et al., 2008).
Information on the neuromodulators that influence the function of the attention networks
has been used to search for genes related to these modulators. Thus, since the sequenc­
ing of the human genome in the year 2001 (Venter et al., 2001), many studies have shown
that genes influence individuals’ attention capacity (see Posner, Rothbart, & Sheese,
2007; see also Table 15.1). Polymorphisms in genes related to the norepinephrine and
dopamine systems have been associated with individual differences in the efficiency of
alerting and executive attention (Fossella et al., 2002). For example, it has been found
that scores of the executive attention network are specifically related to variations on the
dopamine receptor D4 (DRD4) gene and the monoamine oxidase-A (MAOA) gene (Fossella
et al., 2002), as well as other dopaminergic genes such as the dopamine transporter 1
(DAT1) gene (Rueda, Rothbart, McCandliss, Saccomanno, & Posner, 2005) and the cate­
chol-O-methyltransferase (COMT) gene (Diamond, Briand, Fossella, & Gehlbach, 2004).
Moreover, individuals carrying the alleles associated with better performance showed
greater activation in the anterior cingulate gyrus while performing the ANT (Fan, Fossel­
la, Sommer, Wu, & Posner, 2003). On the other hand, polymorphisms of genes influencing
the function of cholinergic receptors have been associated with attentional orienting as
measured with both RT and ERP (Parasuraman, Greenwood, Kumar, Fossella, et al., 2005;
Winterer et al., 2007).

Experience

Genetic data could wrongly lead to the impression that attention is not susceptible to the
environment and cannot be enhanced or harmed by experience. However, this conclusion
would greatly contradict evidence on the extraordinarily plastic capacity of the human
nervous system, especially during development (see Posner & Rothbart, 2007). There is
some evidence suggesting that susceptibility to the environment might even be embed­
ded in genetic endowment because some genetic polymorphisms, often under positive se­
lection, appear to make children more susceptible to environmental factors such as par­
enting (Sheese, Voelker, Rothbart, & Posner, 2007).

In recent years, much evidence has been provided in favor of the susceptibility of sys­
tems of self-regulation to the influence of experience. One piece of evidence comes from
studies showing vulnerability of attention to environmental aspects such as parenting and
socioeconomic status (SES; Bornstein & Bradley, 2003). Noble, McCandliss, and Farah
(2007) assessed predictability of SES in a wide range of cognitive abilities in children.
These investigators found that SES accounts for portions of variance, particularly in lan­
guage but also in other higher cognitive functions, including executive attention. Parental level of
education is also an important environmental factor, highly predictive of the family SES. A
recent study has shown that children whose parents have lower levels of education ap­
pear to have more difficulty selecting out irrelevant information as shown by ERP than
those with highly educated parents (Stevens, Lauinger, & Neville, 2009). All these data
indicate that children’s experiences can shape the functional efficiency of brain networks
and also suggest that providing children with the appropriate experiences may constitute
a good method to enhance attentional skills.

Optimizing Attention Development


Several studies have shown that different intervention methods lead to significant im­
provements in attentional efficiency. Several years ago, in collaboration with Michael Pos­
ner and Mary Rothbart at the University of Oregon, we designed a set of computer exer­
cises aimed at training attention and tested a 5-day training intervention with children
between 4 and 6 years of age (Rueda et al., 2005). Before and after training, the children
performed the child ANT while their brain activation was recorded with an EEG system.
Children in the intervention group showed clear evidence of improvement in the execu­
tive attention network after training, in comparison with a control group who viewed in­
teractive videos matched to the duration of the intervention. The frontal negative ERP
typically observed in conflict tasks showed a more adult-like pattern (i.e., shorter latency
and progressively more posterior scalp distribution) in trained (p. 313) children compared
with controls, suggesting that the training altered the brain mechanisms of conflict reso­
lution in the direction of maturation. The beneficial effect of training attention also trans­
ferred to nontrained measures of fluid intelligence. Recently, we extended the number of
exercises and sessions in our training program and replicated the benefits of training in
brain activation and intelligence with a group of 5-year-old children (Rueda, Checa, &
Combita, 2012). In this study, trained children showed a faster and more efficient activa­
tion of the brain circuitry involved in executive attention, an effect that was still observed
at 2 months’ follow-up. An important question related to intervention is whether it has the
potential to overcome the influence of negative experience or unfavorable constitutional
conditions. Although further data on this question are undoubtedly needed, current evi­
dence indicates that training may be an important tool, especially for children with
greater risk for experiencing attentional difficulties.

Consistent with our results, other studies have shown beneficial effects of cognitive
training on attention and other forms of executive functions during development. For in­
stance, auditory selective attention was improved by training with a computerized pro­
gram designed to promote oral language skills in both language-impaired and typically
developing children (Stevens, Fanning, Coch, Sanders, & Neville, 2008). Klingberg and
colleagues have shown that working memory training has benefits and shows some de­
gree of transfer to aspects of attention (Thorell, Lindqvist, Nutley, Bohlin, & Klingberg,
2009). The Klingberg group has also shown evidence that training can affect various lev­
els of brain function including activation (Olesen, Westerberg, & Klingberg, 2004) and
changes in the dopamine D1 receptor system (McNab et al., 2009) in areas of the cere­
bral cortex involved in the trained function.

There is also some evidence that curricular interventions directly carried out in the class­
room can lead to improvements in children’s cognitive control. Diamond et al. (2007)
tested the influence of a specific curriculum on preschoolers’ control abilities and found
beneficial effects as measured by various conflict tasks. A more indirect, but perhaps no less beneficial, way to foster attention in school may be multilingual education. There is growing evidence that bilingual individuals perform
better on executive attention tasks than monolinguals (Bialystok, 1999; Costa, Hernan­
dez, & Sebastian-Galles, 2008). The idea is that people using multiple languages on a reg­
ular basis might train executive attention because of the need to suppress one language
while using the other.

Although all this evidence shows promising results about the effectiveness of interven­
tions and particular educational methods to promote attentional skills, questions on vari­
ous aspects of training remain to be answered. In future studies, it will be important to
address questions such as whether genetic variation and other constitutionally based
variables influence the extent to which the executive attention network can be modified
by experience, and whether there are limits to the ages at which training can be effec­
tive. Additionally, further research will be needed to examine whether the beneficial ef­
fects of these interventions transfer to abilities relevant for schooling competence.

Summary and Conclusions


The emergence of the field of cognitive neuroscience constituted a turning point in the
study of the relationship between brain and cognition, from which the study of attention
benefited greatly. Attention has been related to a variety of constructs including the state
of alertness, selectivity, and executive control. These various functions have been ad­
dressed in cognitive studies conducted for the most part during the second half of the
twentieth century. Since then, imaging studies have shown that each function is associated with activation of a particular network of brain structures. It has also been determined
that the function of each attention network appears to be modulated by particular neuro­
chemical mechanisms. Using Posner’s neuroanatomical model as a theoretical frame­
work, I have reviewed developmental studies conducted with infants and children. Evidence shows that maturation of each attention network follows a particular trajectory that
extends from birth to late childhood and, in the case of executive control, may continue
during adolescence. Apart from the progressive competence acquired with maturation,
efficiency of attentional functions is subject to important differences among individuals of
the same age. Evidence shows that individual differences in attentional efficiency depend
on constitutional as well as environmental factors, or a combination of both. Importantly, these individual differences appear to contribute substantially to central aspects of children's lives, including social-emotional development and school competence. For the future,
it will be important to understand the mechanisms by which genes and experience influ­
ence the organization and (p. 314) efficiency of the attention networks and whether there
exist sensitive periods during which interventions to foster attention yield the greatest benefits.

Author Note
This work was supported by grants from the Spanish Ministry of Science and Innovation,
refs. PSI2008-02955 and PSI2011-27746.

References
Abundis, A., Checa, P., Castellanos, C., & Rueda, M. R. (2013). Electrophysiological corre­
lates of the development of attention networks in childhood. Manuscript submitted for
publication.

Baddeley, A. D. (1993). Working memory or working attention? In A. D. Baddeley & L. Weiskrantz (Eds.), Attention, selection, awareness and control (pp. 152–170). Oxford, UK: Clarendon Press.

Beauregard, M., Levesque, J., & Bourgouin, P. (2001). Neural correlates of conscious self-
regulation of emotion. Journal of Neuroscience, 21 (18), 6993–7000.

Berger, A., Jones, L., Rothbart, M. K., & Posner, M. I. (2000). Computerized games to
study the development of attention in childhood. Behavior Research Methods, Instru­
ments & Computers, 32 (2), 297–303.

Berger, A., Tzur, G., & Posner, M. I. (2006). Infant brains detect arithmetic errors. Pro­
ceedings of the National Academy of Sciences, 103 (33), 12649–12653.

Bialystok, E. (1999). Cognitive complexity and attentional control in the bilingual mind.
Child Development, 70 (3), 636.

Bornstein, M. H., & Bradley, R. H. (2003). Socioeconomic status, parenting, and child de­
velopment. Mahwah, NJ: Erlbaum.

Botvinick, M. M., Braver, T. S., Barch, D. M., Carter, C. S., & Cohen, J. D. (2001). Conflict
monitoring and cognitive control. Psychological Review, 108 (3), 624–652.

Botvinick, M., Nystrom, L. E., Fissell, K., Carter, C. S., & Cohen, J. D. (1999). Conflict
monitoring versus selection-for-action in anterior cingulate cortex. Nature, 402 (6758),
179–181.

Broadbent, D. E. (1958). Perception and communication. New York: Pergamon.

Brozoski, T. J., Brown, R. M., Rosvold, H. E., & Goldman, P. S. (1979). Cognitive deficit
caused by regional depletion of dopamine in prefrontal cortex of rhesus monkey. Science,
205, 929–932.

Bush, G., Luu, P., & Posner, M. I. (2000). Cognitive and emotional influences in anterior
cingulate cortex. Trends in Cognitive Sciences, 4 (6), 215–222.

Butcher, P. R. (2000). Longitudinal studies of visual attention in infants: The early devel­
opment of disengagement and inhibition of return. Meppel, The Netherlands: Aton.

Callejas, A., Lupiáñez, J., & Tudela, P. (2004). The three attentional networks: On their in­
dependence and interactions. Brain and Cognition, 54 (3), 225–227.

Casey, B. J., & Richards, J. E. (1988). Sustained visual attention in young infants measured
with an adapted version of the visual preference paradigm. Child Development, 59, 1514–
1521.

Casey, B. J., & Richards, J. E. (1991). A refractory period for the heart rate response in in­
fant visual attention. Developmental Psychobiology, 24, 327–340.

Casey, B., Thomas, K. M., Davidson, M. C., Kunz, K., & Franzen, P. L. (2002). Dissociating
striatal and hippocampal function developmentally with a stimulus-response compatibility
task. Journal of Neuroscience, 22 (19), 8647–8652.

Checa, P., Rodriguez-Bailon, R., & Rueda, M. R. (2008). Neurocognitive and temperamen­
tal systems of self-regulation and early adolescents’ school competence. Mind, Brain and
Education, 2 (4), 177–187.

Checa, P., & Rueda, M. R. (2011). Behavioral and brain measures of executive attention
and school competence in late childhood. Developmental Neuropsychology, 36 (8), 1018–
1032.

Cherry, C. E. (1953). Some experiments on the recognition of speech with one and two
ears. Journal of the Acoustical Society of America, 25, 975–979.

Clohessy, A. B., Posner, M. I., & Rothbart, M. K. (2001). Development of the functional vi­
sual field. Acta Psychologica, 106 (1–2), 51–68.

Coch, D., Sanders, L. D., & Neville, H. J. (2005). An event-related potential study of selec­
tive auditory attention in children and adults. Journal of Cognitive Neuroscience, 17 (4),
605–622.

Colombo, J. (2001). The development of visual attention in infancy. Annual Review of Psy­
chology, 52, 337–367.

Colombo, J., & Horowitz, F. D. (1987). Behavioral state as a lead variable in neonatal re­
search. Merrill-Palmer Quarterly, 33, 423–438.

Corbetta, M., & Shulman, G. L. (2002). Control of goal-directed and stimulus-driven atten­
tion in the brain. Nature Reviews Neuroscience, 3 (3), 201–215.

Costa, A., Hernandez, M., & Sebastian-Galles, N. (2008). Bilingualism aids conflict resolu­
tion: Evidence from the ANT task. Cognition, 106 (1), 59–86.

Coull, J. T., Frith, C. D., Büchel, C., & Nobre, A. C. (2000). Orienting attention in time: Be­
havioral and neuroanatomical distinction between exogenous and endogenous shifts.
Neuropsychologia, 38 (6), 808–819.

Coull, J. T., Frith, C. D., Frackowiak, R. S. J., & Grasby, P. M. (1996). A fronto-parietal net­
work for rapid visual information processing: A PET study of sustained attention and
working memory. Neuropsychologia, 34 (11), 1085–1095.

Coull, J. T., Nobre, A. C., & Frith, C. D. (2001). The noradrenergic a2 agonist clonidine
modulates behavioural and neuroanatomical correlates of human attentional orienting
and alerting. Cerebral Cortex, 11 (1), 73–84.

Crottaz-Herbette, S., & Menon, V. (2006). Where and when the anterior cingulate cortex
modulates attentional response: Combined fMRI and ERP evidence. Journal of Cognitive
Neuroscience, 18 (5), 766–780.

Cui, R. Q., Egkher, A., Huter, D., Lang, W., Lindinger, G., & Deecke, L. (2000). High resolu­
tion spatiotemporal analysis of the contingent negative variation in simple or complex mo­
tor tasks and a non-motor task. Clinical Neurophysiology, 111 (10), 1847–1859.

Curtindale, L., Laurie-Rose, C., Bennett-Murphy, L., & Hull, S. (2007). Sensory modality,
temperament, and the development of sustained attention: A vigilance study in children
and adults. Developmental Psychology, 43 (3), 576–589.

Danis, A., Pêcheux, M.-G., Lefèvre, C., Bourdais, C., & Serres-Ruel, J. (2008). A continuous
performance task in preschool children: Relations between attention and performance.
European Journal of Developmental Psychology, 5 (4), 401–418.

Davidson, M. C., Amso, D., Anderson, L. C., & Diamond, A. (2006). Development of cogni­
tive control and executive functions from 4 to 13 years: Evidence from manipulations of
memory, inhibition, and task switching. Neuropsychologia, 44 (11), 2037–2078.

(p. 315) Davidson, M. C., Cutrell, E. B., & Marrocco, R. T. (1999). Scopolamine slows the orienting of attention in primates to cued visual targets. Psychopharmacology, 142 (1), 1–8.

Davidson, M. C., & Marrocco, R. T. (2000). Local infusion of scopolamine into intrapari­
etal cortex slows covert orienting in rhesus monkeys. Journal of Neurophysiology, 83 (3),
1536–1549.

Davis, E. P., Bruce, J., Snyder, K., & Nelson, C. A. (2004). The X-trials: Neural correlates of
an inhibitory control task in children and adults. Journal of Cognitive Neuroscience, 15,
432–443.

Dehaene, S., Posner, M. I., & Tucker, D. M. (1994). Localization of a neural system for er­
ror detection and compensation. Psychological Science, 5 (5), 303–305.

Desimone, R., & Duncan, J. (1995). Neural mechanisms of selective visual attention. An­
nual Review of Neuroscience, 18, 193–222.

Diamond, A. (1991). Neuropsychological insights into the meaning of object concept de­
velopment. In S. Carey & R. Gelman (Eds.), The epigenesis of mind: Essays on biology and
cognition (pp. 67–110). Hillsdale, NJ: Erlbaum.

Diamond, A. (2006). The early development of executive functions. In E. Bialystok & F. I. M. Craik (Eds.), Lifespan cognition: Mechanisms of change (pp. 70–95). New York: Oxford University Press.

Diamond, A., Barnett, W. S., Thomas, J., & Munro, S. (2007). Preschool program improves
cognitive control. Science, 318 (5855), 1387–1388.

Diamond, A., Briand, L., Fossella, J., & Gehlbach, L. (2004). Genetic and neurochemical
modulation of prefrontal cognitive functions in children. American Journal of Psychiatry,
161 (1), 125–132.

Dockree, P. M., Kelly, S. P., Roche, R. A. P., Reilly, R. B., Robertson, I. H., & Hogan, M. J.
(2004). Behavioural and physiological impairments of sustained attention after traumatic
brain injury. Cognitive Brain Research, 20 (3), 403–414.

Drevets, W. C., & Raichle, M. E. (1998). Reciprocal suppression of regional cerebral blood
flow during emotional versus higher cognitive processes: Implications for interactions be­
tween emotion and cognition. Cognition & Emotion, 12 (3), 353–385.

Driver, J., Eimer, M., & Macaluso, E. (2007). Neurobiology of human spatial attention:
Modulation, generation and integration. In N. Kanwisher & J. Duncan (Eds.), Attention
and performance XX: Functional brain imaging of visual cognition (pp. 267–300). New
York: Oxford University Press.

Durston, S., Thomas, K. M., Yang, Y., Ulug, A. M., Zimmerman, R. D., & Casey, B. (2002). A neural basis for the development of inhibitory control. Developmental Science, 5 (4), F9–F16.

Eisenberg, N., Fabes, R. A., Nyman, M., Bernzweig, J., & Pinuelas, A. (1994). The rela­
tions of emotionality and regulation to children’s anger-related reactions. Child Develop­
ment, 65 (1), 109–128.

Enns, J. T., & Brodeur, D. A. (1989). A developmental study of covert orienting to peripher­
al visual cues. Journal of Experimental Child Psychology, 48 (2), 171–189.

Enns, J. T., & Cameron, S. (1987). Selective attention in young children: The relations be­
tween visual search, filtering, and priming. Journal of Experimental Child Psychology, 44,
38–63.

Enns, J. T., & Girgus, J. S. (1985). Developmental changes in selective and integrative vi­
sual attention. Journal of Experimental Child Psychology, 40, 319–337.

Eriksen, B. A., & Eriksen, C. W. (1974). Effects of noise letters upon the identification of a
target letter in a nonsearch task. Perception & Psychophysics, 16 (1), 143–149.

Etkin, A., Egner, T., Peraza, D. M., Kandel, E. R., & Hirsch, J. (2006). Resolving emotional
conflict: A role for the rostral anterior cingulate cortex in modulating activity in the amyg­
dala. Neuron, 51 (6), 871–882.

Everitt, B. J., & Robbins, T. W. (1997). Central cholinergic systems and cognition. Annual
Review of Psychology, 48.

Fair, D. A., Dosenbach, N. U. F., Church, J. A., Cohen, A. L., Brahmbhatt, S., Miezin, F. M.,
et al. (2007). Development of distinct control networks through segregation and integra­
tion. Proceedings of the National Academy of Sciences U S A, 104 (33), 13507–13512.

Fan, J., Flombaum, J. I., McCandliss, B. D., Thomas, K. M., & Posner, M. I. (2003). Cogni­
tive and brain consequences of conflict. NeuroImage, 18 (1), 42–57.

Fan, J., Fossella, J., Sommer, T., Wu, Y., & Posner, M. I. (2003). Mapping the genetic varia­
tion of executive attention onto brain activity. Proceedings of the National Academy of
Sciences U S A, 100 (12), 7406–7411.

Fan, J., McCandliss, B. D., Fossella, J., Flombaum, J. I., & Posner, M. I. (2005). The activa­
tion of attentional networks. NeuroImage, 26 (2), 471–479.

Fan, J., McCandliss, B. D., Sommer, T., Raz, A., & Posner, M. I. (2002). Testing the efficien­
cy and independence of attentional networks. Journal of Cognitive Neuroscience, 14 (3),
340–347.

Fan, J., Wu, Y., Fossella, J., & Posner, M. I. (2001). Assessing the heritability of attentional
networks. BMC Neuroscience, 2, 14.

Fossella, J., Sommer, T., Fan, J., Wu, Y., Swanson, J. M., Pfaff, D. W., et al. (2002). Assess­
ing the molecular genetics of attention networks. BMC Neuroscience, 3, 14.

Fuentes, L. J. (2004). Inhibitory processing in the attentional networks. In M. I. Posner (Ed.), Cognitive neuroscience of attention (pp. 29–44). New York: Guilford Press.

Garon, N., Bryson, S. E., & Smith, I. M. (2008). Executive function in preschoolers: A re­
view using an integrative framework. Psychological Bulletin, 134, 31–60.

Gehring, W. J., Gross, B., Coles, M. G. H., Meyer, D. E., & Donchin, E. (1993). A neural sys­
tem for error detection and compensation. Psychological Science, 4, 385–390.

Gerardi-Caulton, G. (2000). Sensitivity to spatial conflict and the development of self-regulation in children 24–36 months of age. Developmental Science, 3 (4), 397–404.

Graham, F. K., Anthony, B. J., & Ziegler, B. L. (1983). The orienting response and develop­
mental processes. In D. Siddle (Ed.), Orienting and habituation: Perspectives in human
research (pp. 371–430). New York: Wiley.

Green, A. E., Munafo, M. R., DeYoung, C. G., Fossella, J. A., Fan, J., & Gray, J. R. (2008).
Using genetic data in cognitive neuroscience: From growing pains to genuine insights.
Nature Reviews Neuroscience, 9, 710–720.

Hackley, S. A., & Valle-Inclán, F. (1998). Automatic alerting does not speed late motoric
processes in a reaction-time task. Nature, 391 (6669), 786–788.

Haith, M. M., Hazan, C., & Goodman, G. S. (1988). Expectation and anticipation of dynamic visual events by 3.5-month-old babies. Child Development, 59, 467–469.

Harman, C., Posner, M. I., Rothbart, M. K., & Thomas-Thrapp, L. (1994). Development of
orienting to objects and locations in human infants. Canadian Journal of Experimental
Psychology, 48, 301–318.

(p. 316) Harman, C., Rothbart, M. K., & Posner, M. I. (1997). Distress and attention interactions in early infancy. Motivation and Emotion, 21 (1), 27–43.

Hebb, D. O. (1949). Organization of behavior. New York: John Wiley & Sons.

Hillyard, S. A. (1985). Electrophysiology of human selective attention. Trends in Neurosciences, 8 (9), 400–405.

Hillyard, S. A., Di Russo, F., & Martinez, A. (2006). The imaging of visual attention. In J.
Duncan & N. Kanwisher (Eds.), Attention and performance XX: Functional brain imaging
of visual cognition (pp. 381–388). Oxford, UK: Oxford University Press.

Houghton, G., & Tipper, S. P. (1994). A dynamic model of selective attention. In D. Dagenbach & T. H. Carr (Eds.), Inhibitory mechanisms in attention, memory and language (pp. 53–113). Orlando, FL: Academic Press.

James, W. (1890). The principles of psychology. New York: H. Holt.

Johnson, M. H., & Morton, J. (1991). Biology and cognitive development: The case of face
recognition. Oxford, UK: Blackwell.

Johnson, M. H., Posner, M. I., & Rothbart, M. K. (1991). Components of visual orienting in
early infancy: Contingency learning, anticipatory looking, and disengaging. Journal of
Cognitive Neuroscience, 3, 335–344.

Johnstone, S. J., Pleffer, C. B., Barry, R. J., Clarke, A. R., & Smith, J. L. (2005). Develop­
ment of inhibitory processing during the Go/NoGo Task: A behavioral and event-related
potential study of children and adults. Journal of Psychophysiology, 19 (1), 11–23.

Jones, L. B., Rothbart, M. K., & Posner, M. I. (2003). Development of executive attention
in preschool children. Developmental Science, 6 (5), 498–504.

Jonkman, L. M. (2006). The development of preparation, conflict monitoring and inhibition from early childhood to young adulthood: A Go/Nogo ERP study. Brain Research, 1097 (1), 181–193.

Jonkman, L. M., Lansbergen, M., & Stauder, J. E. A. (2003). Developmental differences in behavioral and event-related brain responses associated with response preparation and inhibition in a go/nogo task. Psychophysiology, 40 (5), 752–761.

Jonkman, L., Sniedt, F., & Kemner, C. (2007). Source localization of the Nogo-N2: A devel­
opmental study. Clinical Neurophysiology, 118 (5), 1069–1077.

Kahneman, D. (1973). Attention and effort. Englewood Cliffs, NJ: Prentice Hall.

Karnath, H. O., Ferber, S., & Himmelbach, M. (2001). Spatial awareness is a function of
the temporal not the posterior parietal lobe. Nature, 411, 950–953.

Keele, S. W., Ivry, R. B., Mayr, U., Hazeltine, E., & Heuer, H. (2003). The cognitive and
neural architecture of sequence representation. Psychological Review, 110, 316–339.

Klein, R. M. (2000). Inhibition of return. Trends in Cognitive Sciences, 4, 138–147.

Klein, R. M. (2004). On the control of visual orienting. In M. I. Posner (Ed.), Cognitive neuroscience of attention (pp. 29–44). New York: Guilford Press.

Kopp, B., Rist, F., & Mattler, U. (1996). N200 in the flanker task as a neurobehavioral tool
for investigating executive control. Psychophysiology, 33, 282–294.

Lamm, C., Zelazo, P. D., & Lewis, M. D. (2006). Neural correlates of cognitive control in
childhood and adolescence: Disentangling the contributions of age and executive func­
tion. Neuropsychologia, 44 (11), 2139–2148.

Levy, F. (1980). The development of sustained attention (vigilance) in children: Some nor­
mative data. Journal of Child Psychology and Psychiatry, 21 (1), 77–84.

Lin, C. C. H., Hsiao, C. K., & Chen, W. J. (1999). Development of sustained attention as­
sessed using the Continuous Performance Test among children 6–15 years of age. Journal
of Abnormal Child Psychology, 27 (5), 403–412.

Mangun, G. R., & Hillyard, S. A. (1987). The spatial allocation of visual attention as in­
dexed by event-related brain potentials. Human Factors, 29 (2), 195–211.

Marrocco, R. T., & Davidson, M. C. (1998). Neurochemistry of attention. In R. Parasuraman (Ed.), The attentive brain (pp. 35–50). Cambridge, MA: MIT Press.

McCulloch, J., Savaki, H. E., McCulloch, M. C., Jehle, J., & Sokoloff, L. (1982). The distrib­
ution of alterations in energy metabolism in the rat brain produced by apomorphine.
Brain Research, 243, 67–80.

McNab, F., Varrone, A., Farde, L., Jucaite, A., Bystritsky, P., Forssberg, H., et al. (2009).
Changes in cortical dopamine D1 receptor binding associated with cognitive training.
Science, 323 (5915), 800–802.

Mezzacappa, E. (2004). Alerting, orienting, and executive attention: Developmental properties and sociodemographic correlates in an epidemiological sample of young, urban children. Child Development, 75 (5), 1373–1386.

Morrison, F. J. (1982). The development of alertness. Journal of Experimental Child Psychology, 34 (2), 187–199.

Newhouse, P. A., Potter, A., & Singh, A. (2004). Effects of nicotinic stimulation on cogni­
tive performance. Current Opinion in Pharmacology, 4 (1), 36–46.

Noble, K. G., McCandliss, B. D., & Farah, M. J. (2007). Socioeconomic gradients predict
individual differences in neurocognitive abilities. Developmental Science, 10 (4), 464–480.

Nobre, A. C. (2001). Orienting attention to instants in time. Neuropsychologia, 39 (12), 1317–1328.

Norman, D. A., & Shallice, T. (1986). Attention to action: Willed and automatic control of
behavior. In R. J. Davison, G. E. Schwartz, & D. Shapiro (Eds.), Consciousness and self-
regulation (pp. 1–18). New York: Plenum Press.

Ochsner, K. N., Bunge, S. A., Gross, J. J., & Gabrieli, J. D. (2002). Rethinking feelings: An
fMRI study of the cognitive regulation of emotion. Journal of Cognitive Neuroscience, 14
(8), 1215–1229.

Olesen, P. J., Westerberg, H., & Klingberg, T. (2004). Increased prefrontal and parietal ac­
tivity after training of working memory. Nature Neuroscience, 7 (1), 75–79.

Parasuraman, R., Greenwood, P. M., Kumar, R., Fossella, J., et al. (2005). Beyond heritabil­
ity: Neurotransmitter genes differentially modulate visuospatial attention and working
memory. Psychological Science, 16, 200–207.

Pearson, D. A., & Lane, D. M. (1991). Auditory attention switching: A developmental study. Journal of Experimental Child Psychology, 51 (2), 320–334.

Pelphrey, K. A., Reznick, J. S., Goldman, B. D., Sasson, N., Morrow, J., Donahoe, A., et al.
(2004). Development of visuospatial short-term memory in the second half of the first
year. Developmental Psychology, 40 (5), 836–851.

Posner, M. I. (1978). Chronometric explorations of mind. Hillsdale, NJ: Erlbaum.

Posner, M. I. (1980). Orienting of attention. Quarterly Journal of Experimental Psychology, 32 (1), 3–25.

(p. 317) Posner, M. I. (1995). Attention in cognitive neuroscience: An overview. In M. S. Gazzaniga (Ed.), The cognitive neurosciences (pp. 615–624). Cambridge, MA: MIT Press.

Posner, M. I. (2008). Measuring alertness. Annals of the New York Academy of Sciences,
1129 (Molecular and Biophysical Mechanisms of Arousal, Alertness, and Attention), 193–
199.

Posner, M. I., & Boies, S. J. (1971). Components of attention. Psychological Review, 78 (5),
391–408.

Posner, M. I., & Cohen, Y. (1984). Components of visual orienting. In H. Bouma & D.
Bouwhuis (Eds.), Attention and performance X (pp. 531–556). London: Erlbaum.

Posner, M. I., & DiGirolamo, G. J. (1998). Executive attention: Conflict, target detection,
and cognitive control. Cambridge, MA: MIT Press.

Posner, M. I., & Fan, J. (2008). Attention as an organ system. In J. R. Pomerantz (Ed.), Top­
ics in integrative neuroscience (pp. 31–61). New York: Cambridge University Press.

Posner, M. I., & Petersen, S. E. (1990). The attention system of human brain. Annual Re­
view of Neuroscience, 13, 25–42.

Posner, M. I., & Raichle, M. E. (1994). Images of mind. New York: Scientific American Li­
brary; Dist. W.H. Freeman.

Posner, M. I., & Rothbart, M. K. (2007). Educating the human brain. Washington, DC:
American Psychological Association.

Posner, M. I., Rothbart, M. K., & Sheese, B. E. (2007). Attention genes. Developmental
Science, 10 (1), 24–29.

Posner, M. I., Rueda, M. R., & Kanske, P. (2007). Probing the mechanisms of attention. In
J. T. Cacioppo, L. G. Tassinary, & G. G. Berntson (Eds.), Handbook of psychophysiology (3rd
ed., pp. 410–432). Cambridge, UK: Cambridge University Press.

Posner, M. I., Sheese, B. E., Odludas, Y., & Tang, Y. (2006). Analyzing and shaping human
attentional networks. Neural Networks, 19 (9), 1422–1429.

Posner, M. I., & Snyder, C. R. R. (1975). Attention and cognitive control. In R. Solso (Ed.),
Information processing and cognition: The Loyola Symposium (pp. 55–85). Hillsdale, NJ:
Erlbaum.

Raz, A., & Buhle, J. (2006). Typologies of attentional networks. Nature Reviews Neuro­
science, 7 (5), 367–379.

Rensink, R. A., O’Regan, J. K., & Clark, J. J. (1997). To see or not to see: The need for at­
tention to perceive changes in scenes. Psychological Science, 8 (5), 368–373.

Richards, J. E., & Casey, B. J. (1991). Heart rate variability during attention phases in
young infants. Psychophysiology, 28 (1), 43–53.

Richards, J. E., & Hunter, S. K. (1998). Attention and eye movements in young infants:
Neural control and development. In J. E. Richards (Ed.), Cognitive neuroscience of atten­
tion. Mahwah, NJ: Erlbaum.

Ridderinkhof, K. R., & van der Molen, M. W. (1995). A psychophysiological analysis of de­
velopmental differences in the ability to resist interference. Child Development, 66 (4),
1040–1056.

Rothbart, M. K., Ahadi, S. A., Hersey, K. L., & Fisher, P. (2001). Investigations of tempera­
ment at three to seven years: The Children’s Behavior Questionnaire. Child Development,
72 (5), 1394–1408.

Rothbart, M. K., Ellis, L. K., Rueda, M., & Posner, M. I. (2003). Developing mechanisms of
temperamental effortful control. Journal of Personality, 71 (6), 1113–1143.

Rothbart, M. K., & Rueda, M. R. (2005). The development of effortful control. In U. Mayr,
E. Awh, & S. W. Keele (Eds.), Developing individuality in the human brain. A tribute to
Michael I. Posner (pp. 167–188). Washington, DC: American Psychological Association.

Rueda, M. R., Checa, P., & Combita, L. M. (2012). Enhanced efficiency of the executive at­
tention network after training in preschool children: Immediate changes and effects after
two months. Developmental Cognitive Neuroscience, 2S, S192–S204.

Rueda, M. R., Checa, P., & Rothbart, M. K. (2010). Contributions of attentional control to
social emotional and academic development. Early Education and Development, 21 (5),
744–764.

Rueda, M., Fan, J., McCandliss, B. D., Halparin, J. D., Gruber, D. B., Lercari, L. P., et al.
(2004). Development of attentional networks in childhood. Neuropsychologia, 42 (8),
1029–1040.

Rueda, M. R., Posner, M. I., & Rothbart, M. K. (2004a). Attentional control and self-regu­
lation. In R. F. Baumeister & K. D. Vohs (Eds.), Handbook of self-regulation: Research, the­
ory, and applications (pp. 283–300). New York: Guilford Press.

Rueda, M. R., Posner, M. I., Rothbart, M. K., & Davis-Stober, C. P. (2004b). Development
of the time course for processing conflict: An event-related potentials study with 4 year
olds and adults. BMC Neuroscience, 5 (39), 1–13.

Rueda, M. R., Rothbart, M. K., McCandliss, B. D., Saccomanno, L., & Posner, M. I. (2005).
Training, maturation, and genetic influences on the development of executive attention.
Proceedings of the National Academy of Sciences U S A, 102 (41), 14931–14936.

Ruff, H. A., & Lawson, K. R. (1990). Development of sustained, focused attention in young
children during free play. Developmental Psychology, 26 (1), 85–93.

Ruff, H. A., & Rothbart, M. K. (1996). Attention in early development: Themes and varia­
tions. New York: Oxford University Press.

Sanders, L. D., Stevens, C., Coch, D., & Neville, H. J. (2006). Selective auditory attention
in 3-to 5-year-old children: An event-related potential study. Neuropsychologia, 44 (11),
2126–2138.

Santesso, D. L., & Segalowitz, S. J. (2009). The error-related negativity is related to risk
taking and empathy in young men. Psychophysiology, 46 (1), 143–152.

Santesso, D. L., Segalowitz, S. J., & Schmidt, L. A. (2005). ERP correlates of error moni­
toring in 10-year olds are related to socialization. Biological Psychology, 70 (2), 79–87.

Schul, R., Townsend, J., & Stiles, J. (2003). The development of attentional orienting dur­
ing the school-age years. Developmental Science, 6 (3), 262–272.

Segalowitz, S. J., & Davies, P. L. (2004). Charting the maturation of the frontal lobe: An
electrophysiological strategy. Brain and Cognition, 55 (1), 116–133.

Segalowitz, S. J., Unsal, A., & Dywan, J. (1992). Cleverness and wisdom in 12-year-olds:
Electrophysiological evidence for late maturation of the frontal lobe. Developmental Neu­
ropsychology, 8, 279–298.

Sheese, B. E., Rothbart, M. K., Posner, M. I., Fraundorf, S. H., & White, L. K. (2008). Exec­
utive attention and self-regulation in infancy. Infant Behavior & Development, 31 (3), 501–
510.

Sheese, B. E., Voelker, P., Posner, M. I., & Rothbart, M. K. (2009). Genetic variation influ­
ences on the early development of reactive emotions and their regulation by attention.
Cognitive Neuropsychiatry, 14 (4–5), 332–355.

(p. 318) Sheese, B. E., Voelker, P. M., Rothbart, M. K., & Posner, M. I. (2007). Parenting quality interacts with genetic variation in dopamine receptor D4 to influence temperament in early childhood. Development and Psychopathology, 19 (4), 1039–1046.

Simonds, J., Kieras, J. E., Rueda, M., & Rothbart, M. K. (2007). Effortful control, executive
attention, and emotional regulation in 7-10-year-old children. Cognitive Development, 22
(4), 474–488.

Sokolov, E. N. (1963). Perception and the conditioned reflex. Oxford, UK: Pergamon.

Stanwood, G. D., Washington, R. A., Shumsky, J. S., & Levitt, P. (2001). Prenatal cocaine
exposure produces consistent developmental alteration in dopamine-rich regions of the
cerebral cortex. Neuroscience, 106, 5–14.

Stevens, C., Fanning, J., Coch, D., Sanders, L., & Neville, H. (2008). Neural mechanisms
of selective auditory attention are enhanced by computerized training: Electrophysiologi­
cal evidence from language-impaired and typically developing children. Brain Research,
1205, 55–69.

Stevens, C., Lauinger, B., & Neville, H. (2009). Differences in the neural mechanisms of
selective attention in children from different socioeconomic backgrounds: An event-relat­
ed brain potential study. Developmental Science, 12 (4), 634–646.

Stroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18, 643–662.

Sturm, W., de Simone, A., Krause, B. J., Specht, K., Hesselmann, V., Radermacher, I., et al.
(1999). Functional anatomy of intrinsic alertness: Evidence for a fronto-parietal-thalamic-
brainstem network in the right hemisphere. Neuropsychologia, 37 (7), 797–805.

Tarkka, I. M., & Basile, L. F. H. (1998). Electric source localization adds evidence for task-
specific CNVs. Behavioural Neurology, 11 (1), 21–28.

Thorell, L. B., Lindqvist, S., Nutley, S. B., Bohlin, G., & Klingberg, T. (2009). Training and
transfer effects of executive functions in preschool children. Developmental Science, 12
(1), 106–113.

Titchener, E. B. (1909). Experimental psychology of the thought processes. New York:


Macmillan.

Page 39 of 40
Development of Attention


M. Rosario Rueda, Departamento de Psicología Experimental, Universidad de Granada, Spain

Attentional Disorders  
Laure Pisella, A. Blangero, Caroline Tilikete, Damien Biotti, Gilles Rode, Alain
Vighetto, Jason B. Mattingley, and Yves Rossetti
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn

Print Publication Date: Dec 2013 Subject: Psychology, Cognitive Neuroscience


Online Publication Date: Dec 2013 DOI: 10.1093/oxfordhb/9780199988693.013.0016

Abstract and Keywords

While reviewing Bálint’s syndrome and its different interpretations, this chapter proposes a new framework for the anatomical-functional organization of the posterior parietal cortex (dorsal visual stream), based on recent psychophysical, neuropsychological, and neuroimaging data. In particular, the authors identify two main aspects that exclude eye and hand movement disorders (optic ataxia) as specific deficit categories. The first aspect is spatial attention, and the second is visual synthesis. Ipsilesional attentional bias
(and contralesional extinction) may be caused by the dysfunction of the saliency map of
the superior parietal lobule (SPL) directly (by its lesion) or indirectly (by a unilateral le­
sion creating an inter-hemispheric imbalance between the two SPLs). Simultanagnosia
may appear as concentric shrinking of spatial attention due to bilateral SPL damage. Oth­
er symptoms such as spatial disorganization (constructional apraxia) and impaired spatial
working memory or visual remapping may be attributable to the second, nonlateralized
aspect. Although initially described in patients with left-sided neglect, these deficits occur
in the entire space and only after right inferior parietal lobule (IPL) damage. These two
aspects might also correspond to the ventral and dorsal networks of exogenous and en­
dogenous attention, respectively.

Keywords: Bálint’s syndrome, extinction, simultanagnosia, neglect, optic ataxia, constructional apraxia, endoge­
nous/exogenous attention, visual remapping, saliency maps, dorsal/ventral network of attention, dorsal/ventral vi­
sual streams

Page 1 of 56
Attentional Disorders

Introduction

Figure 16.1 Schematic representation of the putative lesions underlying the main syndromes pertain­
ing to the initial Bálint’s syndrome and observed af­
ter unilateral or bilateral parietal damage. Note that
this model is schematic and deals with only two pari­
etal systems (like Corbetta & Shulman, 2002): a sym­
metrical dorsal one (empty circles) labeled SPL-IPS
similarly organized in the right and left cortical hemi­
spheres (RH and LH) and a right-hemispheric ventral
one (gray oval) that we labeled right IPL. A cross
symbolizes damage to these systems. Optic ataxia
and extinction would result from damage of the SPL-
IPS system symmetrically (but see in the text nonlo­
calizationist accounts of extinction from the hemi­
spheric balance hypothesis). Full neglect syndrome
would result from damage to both the SPL and IPL in
the right hemisphere or from damage restricted to
the right IPL when, in the acute phase, it also in­
duces inhibition of the right SPL (Corbetta et al.,
2005; this effect is symbolized by an arrow and a
cross with dashed lines in the right SPL). When the
balance between the left and right SPLs is reestab­
lished, only constructional apraxia may persist after
a restricted lesion of the right IPL (Russell et al.,
2010). Simultanagnosia would result from bilateral
damage restricted to the superior parietal lobules,
whereas extension of the damage to the right IPL after
stroke or posterior cortical atrophy (Benson’s syndrome is consecutive to a larger bilateral parietal-temporal site of neuronal degeneration) would lead to
more severe simultanagnosia with an additional deficit
of visual synthesis and revisiting behavior.

In this chapter, we focus on the role of the posterior parietal cortex (PPC) in visual atten­
tion and use the disorder of Bálint’s syndrome (Bálint, 1909) as an illustrative example.
The PPC lies between the parieto-occipital sulcus (POS) and the postcentral sulcus. It
includes a superior parietal lobule (SPL) above and an inferior parietal lobule (IPL) below
the intraparietal sulcus (IPS). After reviewing the main interpretations (Andersen &
Buneo 2002; Colby & Goldberg, 1999; Milner & Goodale 1995; Ungerleider & Mishkin
1982) of the functional role of the dorsal (occipital-parietal) stream of visual processing
and the classic categorizations of Bálint’s syndrome, we propose a new framework for un­
derstanding the anatomical-functional organization of the PPC in the right hemisphere
(Figure 16.1). In particular, we put forward that the core function of the PPC is attention,
which affects “vision for action” only as a consequence. Koch and Ullman
(1985) have postulated the existence of a topographic feature map, which codes the
“saliency” of all information within the visual field. By definition, the amount of activity at
a given location within the saliency map represents the relative “conspicuity” or rele­
vance of the corresponding location in the visual field. Interestingly, such a saliency map
has been described by electrophysiologists within the macaque PPC. For instance, single-
unit recordings from the lateral intraparietal (LIP) area suggest that the primate
PPC contains a relatively sparse representation of space, with only those stimuli that are
most salient or behaviorally relevant being strongly represented (Figure 16.2; Gottlieb et
al., 1998). Area LIP provides one potential neural substrate for the levels of visual space
representation proposed by Niebur and Koch (1997), according to whom visual percep­
tion does not correspond to a basic retinotopic representation of the visual input, but is
instead a complex, prioritized interpretation of the environment (Pisella & Mattingley
2004). The autonomous, sequential selection of salient regions to be explored, overtly or
covertly, is postulated to follow the firing rate of populations of neurons in the saliency
map, starting with the most salient location and sampling the visual input in decreasing
order of salience (Niebur & Koch 1997). The saliency map would thus be a representa­
tional level from which the pattern of attentional or motor exploration of the visual world
is achieved.
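The selection scheme just described, visiting locations in decreasing order of salience while suppressing each location once it has been sampled (inhibition of return), can be sketched in a few lines of code. This sketch is purely illustrative and is not part of the original chapter; the function name and the toy saliency map are invented for the example.

```python
def scan_by_salience(saliency_map, n_fixations):
    """Visit locations in decreasing order of salience, suppressing each
    visited location (inhibition of return) so the scan moves on."""
    # Work on a mutable copy so the caller's map is untouched.
    s = [row[:] for row in saliency_map]
    visited = []
    for _ in range(n_fixations):
        # Winner-take-all: find the currently most salient location.
        r, c = max(((r, c) for r in range(len(s)) for c in range(len(s[0]))),
                   key=lambda rc: s[rc[0]][rc[1]])
        visited.append((r, c))
        s[r][c] = float("-inf")  # inhibition of return: suppress it
    return visited

# Toy 3x3 "saliency map": the scan starts at the highest value (0.9)
# and proceeds in decreasing order of salience (0.7, then 0.5).
salmap = [[0.1, 0.9, 0.2],
          [0.4, 0.0, 0.7],
          [0.3, 0.5, 0.1]]
print(scan_by_salience(salmap, 3))  # → [(0, 1), (1, 2), (2, 1)]
```

The winner-take-all step followed by suppression is what yields the autonomous, sequential sampling of the visual input that Niebur and Koch postulate.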

Dorsal and Ventral Visual Streams: Historical Review of Their Different Functional Interpretations

It is usually acknowledged that the central processing of visual information follows two
parallel cortical pathways. From the early occipital areas, an inferotemporal pathway (or
ventral stream) processes the objects for visual recognition and identification, and a pari­
etal pathway (or dorsal stream) makes a spatial (metric) analysis of visual inputs and pre­
pares goal-directed movements. By oversimplification, one often considers that the ven­
tral stream is the pathway of “vision for perception” and the dorsal stream the pathway of
“vision for action” (Milner & Goodale 1995), overlooking the fact that an extensive
frontal-parietal network underlies the visual-spatial and attentional processes necessary
for visual perception and awareness (Driver & Mattingley, 1998; Rees, 2001).

Initially, a “what” versus “where” dissociation depicted the putative function of these two
streams (Figure 16.3; Ungerleider & Mishkin 1982). Most subsequent theoretical formu­
lations, however, have emphasized a dissociation between detecting the presence of a vi­
sual stimulus (“conscious,” “explicit,” “perceptual”) and describing its attributes (“what,”
“semantic”), on the one hand, and performing simple, automatic motor responses toward
a visual stimulus (“unconscious,” “implicit,” “visuomotor,” “pragmatic,” “how”) on the
other (see Milner & Goodale, 1995; Jeannerod & Rossetti,

Figure 16.2 Saliency map. Gottlieb et al. (1998) recorded the activity of single neurons in the lateral
intraparietal (LIP) area of macaques. The receptive field
of each neuron was first assessed in a passive task in
which visual stimuli were flashed during eye fixation.
The responses of the cells were then assessed when
stimuli were brought into their receptive field by a
saccade. A circular array of eight stimuli remained
stably presented from the beginning of the experi­
ment, so that when the monkey made a saccade to­
ward the center of the array, the receptive field of
the neurons matched the location of one of the array
elements. In this first condition, the same stimulus
that activated the cell strongly when flashed in the
passive condition entered the receptive field but
elicited far less neuronal response. In a variant task,
only seven symbols of the array were present initially
on the screen; the eighth (missing) symbol appeared
anew in the receptive field of the cell before the monkey
made the saccade toward the center of the array. This time,
the neuron responded intensely, showing that its ac­
tivity was critically dependent on the abrupt onset of
the stimulus, which rendered it salient. In another
variant, when the monkey maintained a peripheral
fixation, a cue appeared that matched one stimulus
of the array. Then the monkey made a first saccade
to the center of the array (bringing one array stimu­
lus into the receptive field) and a second saccade to
the cued array element. When the cued element was
brought into the receptive field under study by the
first saccade, the neuron discharge started around
the first saccade and continued until after the second
saccade (see panel a). In contrast, when an uncued
element of the array was brought into the receptive
field under study, the neuron did not respond to the
irrelevant array stimulus, even though it entered the
receptive field by means of the first saccade (see
panel b).

1993, Rossetti & Revonsuo, 2000). This evolution in the functional interpretation of the dorsal stream emerged from a large body of psychophysical data on visual and visual-motor functions, which became fertile ground for speculating on the neural basis of perception–action dissociations (e.g., Milner & Goodale, 1995, 2008; Rossetti, 1998; Rossetti & Pisella, 2002).

Figure 16.3 Famous use of the landmark test to characterize the function of the dorsal stream and distinguish it from the ventral stream function. In this test, an object has to be localized with respect to others lying on both sides of it (landmark test). After a lesion to the dorsal visual stream (black area of the monkey brain on the right), the monkeys are unable to find where the food is based on such a spatial landmark (the hole closest to the landmark) but remain able to find the food based on an associated texture to be identified (test on the left). The reverse pattern is observed after a lesion to the ventral visual stream (black area of the monkey brain on the left). The perceptual version of the line bisection test (Bisiach et al., 1998; Fink et al., 2000; Fierro et al., 2000) is also a landmark test.

Lesion study in monkey from Ungerleider & Mishkin, 1982.

When human neuropsychology is considered, the contrast between optic ataxia (OA) on the one hand and action-blindsight and visual agnosia on the other represents the core argument for the suggested perception–action division between the ventral and dorsal anatomical streams (Milner & Goodale 1995, 2008; Rossetti & Pisella, 2002). The ventral stream is fed almost exclusively by V1
and organized in a serial hierarchical fashion, whereas the dorsal stream is also fed by the supe­
rior colliculus and is organized in a more parallel fashion, including numerous shortcuts (Nowak
& Bullier, 1997; Rossetti, Pisella, & Pélisson, 2000). Consequently, after a lesion in V1 (action-
blindsight), only the dorsal stream remains active (Girard et al., 1991, 1992; see also review in
Danckert & Rossetti, 2005). Conversely, OA is a neurological condition encountered after dam­
age to the PPC or dorsal stream. Behaviorally, action-blindsight and OA seem to be two sides of
the same coin: In action-blindsight, the patients remain able to guide actions toward stimuli that
they do not see, whereas in OA, patients are impaired at guiding actions toward visual stimuli that
they see. Similarly, after lesions to the inferior temporal lobe (visual agnosia), patient D.F. could
not recognize objects but could nevertheless perform simple reach-and-grasp movements that
fitted the location and the object’s visual properties of size and orientation (Carey et al., 1996;
Goodale et al., 1991). These neuropsychological dissociations have provided decisive elements
for considering the PPC as a vision-for-action pathway (Milner & Goodale 1995), reinforced by
the work of Andersen and Buneo (2002) indicating that the specialized spatial maps of the PPC
are involved in the planning of different actions (eye, hand, fingers).
As reviewed later in this chapter, however, the neurological consequences of lesions of
the dorsal stream are not restricted to OA, and even OA patients show deficits in the per­
ceptual domain. Bálint-Holmes syndrome, which arises from bilateral lesions of the PPC
extended toward the inferior parietal lobule (IPL) and temporal-parietal junction (TPJ) in
the right hemisphere, includes OA but also two symptoms affecting visual perception: uni­
lateral spatial neglect and simultanagnosia. In these patients, the ventral stream is pre­
served bilaterally, but visual perception is limited. In patients with neglect, perceptual
awareness is restricted to the ipsilesional side of space or objects. In patients with simul­
tanagnosia, perceptual awareness is restricted to a reduced central field of view, or to
just one object among others, even when the objects are superimposed in the same loca­
tion of space (e.g., in the case of overlapping figures).

After a lesion to the IPL and TPJ in the left hemisphere, patients may present with
Gerstmann’s syndrome, which will not be a focus of this chapter but which also includes a
visual perceptual symptom. In addition to dysgraphia, dyscalculia, and finger agnosia,
these patients are impaired at determining whether a letter is correctly oriented (left-
right confusion). This also suggests a role of the left hemisphere dorsal stream in building
oriented (canonical) perceptual representations. Accordingly, a neuroimaging study from
Konen and Kastner (2008) has revealed object-selective responses displayed in visual ar­
eas of both dorsal and ventral streams. Foci of activity in the lateral occipital cortex (ven­
tral stream) and in the IPS (dorsal stream) have been shown to represent objects inde­
pendent of viewpoint and size, whereas the responses are viewpoint and size specific in
the occipital cortex. Such symptoms, pertaining to Gerstmann’s or Bálint’s syndrome, af­
fect the perceptual domain after a lesion to the PPC and thereby highlight a necessary in­
teraction of dorsal and ventral streams for visual perception.

Figure 16.4 Cortical visual representation magnifies central vision. The ventral stream includes multiple
specialized representations of central vision through
the prism of the processing of different elementary
visual attributes, while areas of the dorsal stream
mainly represent peripheral vision.

Adapted with permission from Wade et al., 2002.

According to an alternative characterization of the functional distinction between ventral and dorsal streams, the ventral stream includes multiple areas specialized for different el­
ementary visual attributes (e.g., color, texture, shape) mainly processed in central vision,
whereas areas of the dorsal stream mainly represent peripheral vision or the entire visual
field (Figure 16.4; Wade et al., 2002). An interaction between the ventral and dorsal visu­
al streams for perception therefore appears necessary as soon as one takes into account
the structural organization of the retina (and in particular the distribution of cones and
rods), which allows a detailed analysis only in the restricted area corresponding to the
fovea (Figure 16.5). If the eyes are fixed, the ventral stream will provide information
about only a small central part of the visual scene (see Figure 16.5), even though the sub­
jective experience of most normally sighted individuals is of a coherent, richly detailed
world where everything is apprehended simultaneously. This subjective sensation of an
instantaneous, global, and detailed perception of the visual scene is an illusion, as
hinted at by the phenomenon of “change blindness” in healthy subjects (reviewed in Pisel­
la & Mattingley, 2004) and provided thanks to mechanisms of “active vision” (Figure 16.6)
relying on the PPC (Berman & Colby, 2009).

Role of the Dorsal Stream in Visual Perception: Active Vision

Figure 16.5 The structural organization of the retina (and in particular the distribution of cones and rods)
allows a detailed analysis only in the restricted area
corresponding to the fovea. If the eyes are fixed, only
a small central part of the visual scene will be per­
ceived with a good acuity.

There are three mechanisms of active vision that rely on the PPC (visual attention, sac­
cadic eye movements, and visual remapping), as we outline below.

Figure 16.6 A complex triad of processes (oculomotor mechanisms, spatial representations, and visual attention) is in play for visual perception.

First, shifts of attention without eye displacements (known as covert attention) have been
shown to improve spatial resolution and contrast sensitivity in peripheral vision (Carrasco

et al., 2000; Yeshurun & Carrasco, 1998). Attention can be defined as the neural process­
es that allow us to prioritize information of interest from a cluttered visual scene while
suppressing irrelevant detail. Note that a deficit of visual attention can be expressed as a
spatial deficit (omissions of relevant information in contralesional peripheral visu­
al space, i.e., eye-centered space) or a temporal one (increased time needed for relevant
information to be selected by attention and further processed; Husain, 2001). So a spatial
bias and a temporal processing impairment may be the two sides of the same coin. The
level of interest or relevance of a stimulus can be provided by physical properties that al­
low it to be distinguished easily from the background (bottom-up or exogenous attention­
al selection) or by its similarity to a specific target defined by the current task or goal
(top-down or endogenous attentional selection).

Second, rapid eye movements (known as saccades or overt attention) can bring a new
part of the visual scene to the fovea, allowing it to be analyzed with optimal visual acuity.
Active ocular exploration of a visual scene (three saccades per second on average) during
free viewing has been demonstrated using video-oculography by Yarbus (1967). The glob­
al image of the visual scene is thus built on progressively through active ocular explo­
ration, with each new saccade bringing a new component of the visual scene to the fovea
to be analyzed precisely by the retina and the visual cortex. Most visual areas in the oc­
cipital and posterior inferotemporal cortex (ventral stream) are organized with a retinal
topography, with a large magnification of the representation of central vision, relative to
peripheral vision (see Figure 16.4). At the level of these retinocentric visual representa­
tions, the components of the visual scene that are successively brought to the fovea by
the ocular exploration actually overwrite each other, at each new ocular fixation. These
different components are thus analyzed serially by the ventral stream as unlocalized
snapshots. A spatial linkage of these different details is necessary for global visual per­
ception.

This spatial linkage (or visual synthesis) of multiple snapshots is the third mechanism nec­
essary to perceive a large, stable, and coherent visual environment. Brain areas more or
less directly related to saccadic eye movements (frontal and parietal eye fields, superior
colliculus) demonstrate “oculocentric” (instead of retinocentric) spatial representations
acting as “spatial buffers”: The location of successively explored components is stored
and displaced in spatial register with the new eye position after each saccade (Figure
16.7; Colby et al., 1995). The PPC is specifically involved in the “remapping” (or “updat­
ing”) mechanisms compensating for the continuous changes of eye position (Heide et al.,
1995), which are crucial to building these dynamic oculocentric representations. Trans-
saccadic oculocentric remapping processes have been explored using computational mod­
els (e.g., Anderson & Van Essen, 1987), and demonstrated using specific psychophysical
paradigms like the double-step saccade task (Figure 16.8), in which the retinal location of
a stimulus has to be maintained in memory and recoded with respect to a new ocular lo­
cation owing to an intervening saccade. Such paradigms have been used in monkey elec­
trophysiology (e.g., Colby et al., 1995), human neuroimaging (Medendorp et al., 2003;
Merriam et al., 2003, 2007), after chronic lesions to the PPC (Duhamel et al., 1992; Heide

et al., 1995; Khan et al., 2005a, 2005b; Pisella et al., 2011), and during transcranial mag­
netic stimulation (TMS) of the PPC (Morris et al., 2007, Van Koningsbruggen et al., 2010).
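As an illustrative aside (not from the original text), the computation that remapping must solve in the double-step task reduces to simple vector updating: the stored retinal vector of the second target has to be corrected by the eye displacement produced by the first saccade. The helper below is hypothetical and uses arbitrary 2-D coordinates.

```python
def remapped_vector(target_b, eye_displacement):
    """Motor vector for the second saccade after the eye has already moved.

    target_b: retinal vector of target B, coded from the initial fixation.
    eye_displacement: vector of the first saccade (fixation -> target A).
    The remapped vector is B minus the displacement already executed.
    """
    bx, by = target_b
    ex, ey = eye_displacement
    return (bx - ex, by - ey)

# Targets flashed at A = (10, 0) and B = (10, 10), both relative to fixation.
# The first saccade lands on A, so the correct second saccade is (0, 10),
# not the original retinal vector (10, 10) of target B.
print(remapped_vector((10, 10), (10, 0)))  # → (0, 10)
```

A lesioned remapping mechanism would amount to omitting the subtraction, producing exactly the retinospatial-dissonance errors described for the parietal patients above.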

Figure 16.7 Visual remapping in the lateral intraparietal (LIP) area in monkey. Monkey electrophysiol­
ogy has described dynamic oculocentric representa­
tions in which the neuronal response can outlast the
duration of a visual stimulus of interest within the
retinotopic receptive field, and this “memory” activi­
ty can be transferred to another neuron to recode
the location of the (extinguished) stimulus with re­
spect to new ocular position. Such neuronal activity
has been described in oculomotor centers also
known to be crucial for attention, that is, the superi­
or colliculus (Mays & Sparks 1980), the frontal eye
field (Goldberg & Bruce 1990; Tian et al. 2000), and
the LIP area. The role of the LIP area appears crucial
for visual remapping because it also contains neu­
rons that activity start in anticipation of a saccade
that will bring the location of the extinguished visual
stimulus into their receptive field (review in Colby et
al., 1995, Gottlieb et al., 1998).

Adapted with permission from Colby et al., 1995.

Consistent with these putative mechanisms of active vision, the perceptual symptoms of
Bálint-Holmes syndrome can be seen as arising from deficits in attentive vision, ocular ex­
ploration, and global apprehension of visual scenes. One common component of the syn­
drome is a limited capacity of attentive vision such that patients fail to perceive at any
time the totality of items forming a visual scene (leading to “piecemeal” perception of the
environment, and a local perceptual bias). Descriptions or copies of images are accordingly
composed of elements of the original figure without perception of the whole scene. Pa­
tients not only explore working space in a disorganized fashion but also may return to
scrutinize the same item repeatedly. This “revisiting behavior” may lead, for example, to
patients counting more dots than are actually present in a test display (personal observa­

tion of the behavior of patients with Bálint’s syndrome after posterior cortical atrophy).
This revisiting behavior has been reported during ocular exploration of visual scenes in
patients with unilateral neglect and ascribed to a spatial working memory deficit (Husain
et al., 2001). Indeed, a study has demonstrated that patients with parietal neglect show,
in addition to their characteristic right attentional bias, a specific deficit of working mem­
ory for location and not for color or shape (Figure 16.9; Pisella et al., 2004). Pisel­
la and Mattingley (2004) ascribed this revisiting behavior and spatial working memory
deficit to a loss of “visual remapping” or “visual synthesis” due to parietal lobe damage,
which will contribute to severe visual neglect syndrome.

This association of (1) spatially restricted attention and (2) a visual synthesis deficit provides
a consistent framework for understanding the different expressions of posterior parietal
lobe lesions. Attention will be restricted to central vision in bilateral lesions or to the ip­
silesional (eye-centered) visual field in unilateral lesions, affecting visual perception and
action in peripheral vision. An additional deficit of visual synthesis might occur in the entire
visual field, producing a disorganization of active vision mechanisms and thereby increas­
ing the severity of the patient’s restricted field of view (unilateral in neglect and bilateral
in simultanagnosia).

Bálint-Holmes Syndrome

Figure 16.8 Double-step saccade task. A, Example of a double-step stimulus with the two targets, A and B,
being flashed successively while the gaze is directed
to a central fixation point (FP). When both saccades
are performed after all of the targets have disap­
peared, the motor vector of the second saccade (A →
B) is different from the retinal vector of the second
target (FP → B or A → B’). However, in this condition,
the saccade toward position B is achieved correctly
both in humans and in animals. There is thus a need
to postulate remapping mechanisms allowing the
oculomotor system to anticipate the new retinal posi­
tion B by integrating the displacement on the retina
produced by the first saccade toward position A. (Re­
drawn from Heide & Kömpf, 1997.) B, Results of the
double-step task in patients with a left or right pari­
etal lesion (in the posterior parietal cortex [PPC])
compared with patients with a right frontal lesion (in
the frontal eye field [FEF] or prefrontal cortex [PFC])
and controls. It represents the mean absolute error
of final eye position (FEP) after double-step trials,
plotted separately for the four different stimulus con­
ditions of the study (R-R, centripetal double-step
within the right hemifield; L-L, centripetal double-
step within the left hemifield; R-L, double-step be­
tween hemifields starting with target on the right; L-
R, double-step between hemifields starting with tar­
get on the left). Double-steps with retinospatial dissonance (upper panel) necessitate remapping processes, whereas double-steps with no retinospatial dissonance do not. Significant errors relative to control performance are indicated by asterisks. Note that patients with a parietal lesion exhibit errors specific to double-steps with retinospatial dissonance (contrary to patients with a frontal lesion): patients with a right PPC lesion (black bars) are impaired for both between-hemifield double-steps and the L-L stimulus, whereas patients with a left PPC lesion are only impaired for between-hemifield stimuli. Standard deviations of the control
group are indicated by vertical lines. Results in pa­
tients with a left prefrontal cortex lesion are not
shown in this diagram because they were not signifi­
cantly different from the control group. (Reproduced
from Heide et al., 1995, with permission.)

The syndrome reported by Rezső Bálint in 1909 and later, in other terms, by Holmes
(1918) is a clinical entity that variously combines a set of complex spatial behavior disor­
ders following bilateral damage to the occipital-parietal junction (dorsal stream).
Both the diversity of terminology used in the literature and biases in clinical descriptions, which
often reflect the authors’ particular views on the underlying mechanisms, add to the difficulty
of describing and comprehending this rare and devastating syndrome. Despite
these flaws, Bálint-Holmes syndrome can be identified at bedside examination and allows
robust anticipation of lesion localization.

Initially, Bálint’s syndrome was described as a triad, composed of the following:

• Optische Ataxie, a defect of visually guided hand movements characterized by spatial errors when the patient uses the contralesional hand and/or reaches objects in periph­
eral vision in the contralesional visual field
• Räumliche Storung der Aufmerksamkeit—described as a lateralized spatial attention
disorder in which attention in the extrapersonal space is oriented to the right of the
body midline and in which stimuli lying to the left of fixation are neglected—corre­
sponding to what is now called unilateral neglect
• Seelenlähmung des Schauens—described as an extreme restriction of visual atten­
tion, such that only one object is seen at a time—which corresponds to what Luria
(1959) later called a “disorder of simultaneous perception,” following the term “simul­
tanagnosia” coined by Wolpert in 1924. This set of symptoms has often been translated
as “psychic paralysis of gaze” to highlight that although the patients exhibit no visual
field defect and no oculomotor paralysis, they manifest no attention to visual events
appearing in peripheral vision.

Figure 16.9 Spatial working memory deficit in the entire visual space in parietal neglect. Added to the
attentional left-right gradient, a deficient spatial
working memory for the whole visual space is evi­
denced by the difference between conditions of
change detection with (1 second of interstimuli inter­
val; black lines) and without (white lines) delay, in
parietal neglect only. Note that the location change
always occurred in one object only, within a vertical
quadrant (= column of the grid), as illustrated.

Adapted with permission from Pisella et al., 2004.

The related “visual disorientation” syndrome described a few years later by Holmes (1918; Smith & Holmes, 1916) in soldiers with bilateral parietal lesions (see also Inouye, 1900, cited by Rizzo & Vecera, 2002) highlighted a particular oculomotor disorder: wandering of gaze in search of peripheral objects and difficulty maintaining fixation. This eye movement disorganization, later (p. 327) labeled gaze ataxia or gaze apraxia (reviewed in Vallar, 2007), was accompanied by a complete deficit of visually guided reach-to-grasp movements, even when performed in central vision.

The comparison between Bálint’s and Holmes’ descriptions of the consequences of lesions to the posterior parietal cortex brings out three differences that still correspond to debated questions and that constitute the plan of this chapter.

First, the lateralized aspect of the behavioral deficit for left visual space in Bálint’s syndrome (corresponding to räumliche Störung der Aufmerksamkeit) does not appear in Holmes’ description of his patients, suggesting that the component known nowadays as unilateral neglect might be easily dissociated from the other two components of the triad by its specific right-hemispheric localization. Visual extinction is often considered a minimal form of neglect, and a prevalence of right-hemispheric lesions is reported for visual extinction as well as for neglect (Becker & Karnath, 2007). Nevertheless, their dissociation has been shown clinically (Karnath et al., 2003) and experimentally: Contralesional visual extinction is symmetrically linked to the function of the right or left superior parietal lobules (Hilgetag et al., 2001; Pascual-Leone, 1994), whereas some aspects of neglect (especially those concerned with visual scanning; Ashbridge et al., 1997; Ellison et al., 2004; Fierro et al., 2000; Fink et al., 2000; Muggleton et al., 2008) are distinct from visual extinction and specifically linked to the right inferior parietal cortex (and superior temporal cortex). Furthermore, the recent literature tends to highlight the contribution of nonlateralized deficits to the neglect syndrome, that is, deficits specifically linked to right-hemispheric damage but not spatially restricted to the contralesional visual space, such as deficits of sustained attention or spatial working memory (Husain, 2001, 2008; Husain & Rorden, 2003; Malhotra et al., 2005, 2009; Pisella et al., 2004; Robertson, 1989). The dissociation between lateralized and nonlateralized deficits following parietal damage is re-explored and redefined in this chapter, notably in the section devoted to neglect, discussed with respect to visual extinction.

A second difference between Bálint and Holmes is their conflicting interpretation of the visual-manual deficits of OA. Optische Ataxie was interpreted by Bálint, and later by Garcin et al. (1967) and Perenin and Vighetto (1988), as a specific interruption of visual projections to the hand motor center. For Holmes, these visual-manual errors simply reflected a general visual disorientation considered basically visual in nature (resulting from a deficit of retinal or extraocular-muscle position sense). The debate between a visual-spatial and a specifically visual-manual nature of OA deficits has been renewed by recent data showing subclinical saccadic and attentional deficits in patients with OA (reviewed in Pisella et al., 2009). In this chapter, we explore how these attentional deficits can be distinguished from those of neglect patients.

A third difference is that the eye movement troubles were presented as causing the deficit by (p. 328) Holmes, whereas they were considered consequences of higher-level attentional deficits by Bálint (Seelenlähmung des Schauens, most often translated into English as “psychic paralysis of gaze”). This question corresponds to the conflicting “intentional” and “attentional” functional views of the parietal cortex (Andersen & Buneo, 2002; Colby & Goldberg, 1999). For the former, the PPC is anatomically segregated into regions permitting the planning of different types of movements (reach, grasp, saccade), whereas for the latter, its different functional regions represent locations of interest in external and internal space with respect to different reference frames.

Figure 16.10 Unilateral neglect. Left, Unilateral ne­


glect may behaviorally manifest as omission to con­
sider the left space of the body (e.g., while shaving),
of objects (e.g., while preparing an apple pie; Rode et
al., 2007a), or of mental images (e.g., while mentally
exploring a map of France; Rode et al., 1995, 2007b).
Right, Unilateral neglect syndrome is more classical­
ly diagnosed using a series of paper-and-pencil tests,
which should include a drawing from memory (e.g.,
daisy), a drawing copy (e.g., the bicycle), a cancella­
tion task in which each object of a visual scene must
be individuated as if they were to be counted (e.g.,
line cancellation), the midline judgment in its motor
(line bisection) or perceptual (landmark test) version
and a spontaneous writing test on a white sheet.

This chapter reviews historical and recent arguments that have been advanced in the context of these three debates, with a first section on neglect, a second on OA, and a third on the less well defined third component of Bálint’s syndrome: psychic paralysis of gaze. Visual extinction is addressed in each of these three sections. This clinical review of attentional disorders should allow us, through the patients’ behavioral descriptions, to begin to delineate theoretically and neurologically the concepts of selective versus sustained attention, object versus spatial attention, saliency versus oculomotor maps, and exogenous versus endogenous attention. A conceptual segregation of these symptoms consecutive to lesions of the PPC, different from Bálint’s, will be progressively introduced and finally proposed on the basis of recent advances in posterior parietal functional organization made through psychophysical assessments of patients and neuroimaging data (see Figure 16.1).

Unilateral Neglect
Basic Arguments for Dissociation Between Neglect and Extinction

Unilateral neglect is defined by a set of symptoms in which the left part of space or of objects is not explicitly taken into account in behavior. This lateralized definition (left-sided neglect rather than contralesional neglect), and its global distinction from contralesional visual extinction, becomes apparent as soon as one considers that only the neglect symptoms (and not the extinction symptoms) appear in the most complex, typically human, types of visual-spatial behavior, whether ecological (Figure 16.10, left panel; e.g., making up or shaving only the right side of the face, preparing an apple pie, mentally representing a well-known space, art) or paper-and-pencil (see Figure 16.10, right panel; e.g., drawing, copying, enumerating, evaluating lengths between left and right spatial positions, writing on a white sheet) tasks.

Figure 16.11 Percentage of correct detection of left-sided targets. Histograms are plotted as a function of trial type (single versus double presentation) for each of the seven conditions illustrated below (each panel shows the relative locations of the visual stimuli: targets and fixation cross): fixate left, fixate center, and fixate right; fixate right, left target at 10.8° or 21.6°; fixate left, right target at 10.8° or 21.6°. The cross disappeared 50 ms before target onset. The pattern of performance suggests extinction in eye-centered coordinates.

Reprinted with permission from Mattingley et al. Copyright © 2000 Routledge.

However, it is more classical to distinguish unilateral neglect from visual extinction on the basis of a more experimental subtlety, the latter being a failure to report an object located contralesionally only when it competes for attentional resources with an object located ipsilesionally. Critically, in visual extinction the report of contralesional items presented in isolation should be normal (or better than for simultaneous stimuli). Visual extinction is (p. 329) thus revealed under conditions of brief presentation and attentional competition, whereas visual neglect refers to loss of awareness of a contralesional stimulus even when it is presented alone and permanently, under free-gaze conditions.

Note that the use of the term “contralesionally” rather than “in the contralesional visual field” in the above definitions reveals the lack of acknowledgment of the reference frame in which these deficits can occur. Even if visual extinction is usually tested between visual fields and a strictly eye-centered pattern has been described (Figure 16.11; Mattingley et al., 2000), Làdavas (1990) has reported cases in which an item in the ipsilesional field can be extinguished by a more eccentric one (i.e., allocentric extinction). Note that this is still consistent with the notion of a left-right gradient of salience or competitive strength across the entire visual field (Figure 16.12; see Figure 16.9). In cross-modal extinction between a visual stimulus near one hand and a tactile stimulus on the other hand, a limb-centered pattern revealed by crossing the hands has allowed investigators to dissociate egocentric and limb-centered spatial coordinates (Bartolomeo et al., 2004). Neglect and extinction have in common this possible occurrence (p. 330) in multiple reference frames: object-based versus space-based, allocentric versus egocentric (in body space or external space: eye-centered; Egly et al., 1994; Hillis & Caramazza, 1991; Ota et al., 2001; Riddoch et al., 2010; Vuilleumier & Rafal, 2000; but see the review by Driver, 1999, for interpretation as a common attentional reference frame), and in perceptual (in multiple sensory modalities) or motor domains (motor extinction: Hillis et al., 2006; motor neglect: Mattingley et al., 1998; see reviews in Heilman, 2004; Husain et al., 2000).

Figure 16.12 Effect of adaptation to a rightward prismatic deviation (10°, same as in Rossetti et al., 1998) on the attentional left-right gradient in a patient with extinction. The attentional gradient was measured by reaction time (RT, in milliseconds) to detect visual targets presented at varied eccentricities (pixels) in the right and left visual fields. When the target was detected, the subject had to respond by lifting the finger from a tactile screen. Two sessions were performed before (pre) and two sessions after (post) prismatic adaptation. The ANOVA showed no significant main effect of prisms (pre versus post: F(1,154) = 2.3; p = 0.12), a significant main effect of visual field (left versus right: F(1,154) = 94; p < 0.01), and a trend toward an interaction between prisms and visual field (p = 0.10), which allows us to perform planned comparisons: pre versus post was significant in the right visual field (p < 0.05) and not in the left visual field (p = 0.94). The effect therefore appears as a decrease of the ipsilesional hyperattentional bias, resulting in a flatter, more balanced attentional gradient.

Figure 16.13 A patient with a right focal posterior parietal infarct. Data indicated that the patient had no clinical neglect (assessed by classic neuropsychological paper-and-pencil tests like line bisection, drawing, and cancellation) nor clinical extinction (tested with simultaneous dots flashed in the right and left visual fields during fixation of a central cross; left panel, dark gray bars). However, a deficit of covert attention in the contralesional visual field was revealed as an extinction pattern when letters were flashed instead of dots (middle panel, light gray bars): the deficit in detecting the letter presented in the left visual field appeared only when it was in competition with a letter in the right visual field (bilateral presentation trials). The contralesional attentional deficit was worsened and manifested as a neglect pattern when letter identification (right panel, black bars) was required instead of simple detection (left, right, or both): performance in reporting the letter presented in the left visual field was affected not only in bilateral presentations but also in single presentation.

Although the literature has treated extinction as a single phenomenon (contrary to neglect), this may not be the case, which may explain why, as for neglect, the anatomical site of visual extinction is still debated. Both the tests used for diagnosis and the conditions under which they are administered are crucial. For example, the underlying processes seem to differ depending on whether the stimuli are presented at short or wide eccentricities (Riddoch et al., 2010). Extinction is classically tested with stimuli presented bilaterally at relatively short eccentricity (perhaps to ensure reasonable report of single items in the contralesional visual field; Riddoch et al., 2010) and has been shown to benefit from similarity-based grouping (Mattingley et al., 1997). However, the reverse effect of similarity has been observed with items presented at wide eccentricities. Riddoch et al. (2010) argue that at far separations participants need to select both items serially, with attention switched from one location to another. This may also be the case when task difficulty (Figure 16.13) or attentional demand increases. In such conditions, extinction might then arise from a limitation in short-term visual memory rather than from competition for selection between stimuli that are available in parallel. Given that Pisella et al. (2004) have shown that a spatial working memory deficit is a nonlateralized component of parietal neglect, this makes the frontier between patients considered as exhibiting extinction or neglect rather blurred.

As a consequence, visual extinction is often considered a mild form of visual neglect. Di Pellegrino et al. (1998) demonstrated in a patient with left-sided extinction that a stimulus presented in the left visual field can be extinguished by a right-sided one even if the left-sided stimulus was presented several hundred milliseconds before the right one. A similar default contralesional disadvantage, even when there is no initial right-sided stimulus for attention to engage upon, has been revealed through a temporal order judgment paradigm by Rorden et al. (1997). Visual extinction, after a lesion to the SPL-IPS network, is therefore better defined as a deficit of visual saliency for stimuli in the contralesional visual field than as a deficit appearing only under conditions of attentional competition from ipsilesional visual stimulation. As illustrated in Figure 16.13, extinction and neglect, as tested with dots in a simple environment, can therefore be considered graded expressions of the same attentional deficit (a lateralized ipsilesional (p. 331) bias), because both can be expressed by the same patient depending on task difficulty.

In sum, these experimental conditions using dots are unsatisfactory for distinguishing between visual extinction and neglect, whereas their clinical distinction, in terms of handicap and recovery and with paper-and-pencil tasks, is almost intuitive (as mentioned above). Alternatively, the presence of nonlateralized components, in addition to the common lateralized bias of attention, could be used as a criterion to distinguish “clinical” unilateral neglect from visual extinction.

Lateralized and Nonlateralized Deficits Within the Neglect Syndrome

The clinical prevalence of spatial neglect after right-hemispheric lesion is supported by converging arguments for a specialization of the right hemisphere for spatial cognition and attention throughout the whole visual space, as already suggested by Kinsbourne (1993). Such a representation of the whole space in the right IPL is crucially used for complex visual-spatial tasks that have to be performed throughout the visual field because they require a visual synthesis of detailed snapshots (Malhotra et al., 2009), such as the “landmark” tests that require the middle between two objects to be mentally defined in order to judge whether a landmark object is in the middle or closer to the right or to the left object (illustrated in Figure 16.3; Bisiach et al., 1998; Fierro et al., 2000; Fink et al., 2000; Ungerleider & Mishkin, 1992). Accordingly, the recent literature on neglect has highlighted that after right-hemispheric damage to the IPL (Mort et al., 2003; Vallar & Perani, 1986), neglect patients present with deficits of visual space exploration and integration that are not restricted to the contralesional hemifield (Husain, 2008; Husain et al., 2001; Husain & Rorden, 2003; Kennard et al., 2005; Pisella et al., 2004; Wojciulik et al., 2001). Pisella and Mattingley (2004) have postulated that this specific representation of the whole space within the human IPL may be a privileged map in which oculocentric remapping processes operate, thereby allowing coherent visual synthesis. Human studies have revealed the involvement of both the occipital (Merriam et al., 2007) and parietal (Heide et al., 1995; Medendorp et al., 2003; Morris et al., 2007) lobes in visual remapping, with a dominance of the right PPC (Heide & Kömpf, 1997; Heide et al., 2001; Kennard et al., 2005; Malhotra et al., 2009; Mannan et al., 2005; Pisella et al., 2011; van Koningsbruggen et al., 2010). Heide et al. (1995) provided a neuropsychological assessment of the brain regions specifically involved in remapping mechanisms, using the double-step saccadic paradigm with four combinations of saccade directions in different groups of stroke patients. Patients with left PPC lesions were impaired when the initial saccade was made toward the right, followed by a second saccade toward the left (right-left condition), but not in the condition of two successive rightward saccades. Patients with right PPC lesions were impaired in the left-right and left-left conditions, and also in the right-left condition (only the right-right combination was correctly performed; see Figure 16.8). As reviewed in Pisella and Mattingley (2004), this asymmetry in remapping impairment matches the clinical consequences of lesions of the human IPL and is probably due to an asymmetry of visual space representation between the two hemispheres. Later, Heide and Kömpf (1997) wrote about their study (Heide et al., 1995) that “our data confirm the key role of the PPC in the analysis of visual space with a dominance of the right hemisphere” (p. 166), and provided new information on the lesions and symptoms of their patient groups: the focus of PPC lesions was located “in the inferior parietal lobule along the border between the angular and supramarginal gyrus, extending cranially toward the intraparietal sulcus, caudally to the temporo-parietal junction, and posteriorly into the angular gyrus” (p. 158). Compatible with this lesion site, patients in the right PPC lesion group of Heide et al. (1995) presented with symptoms of hemineglect. Furthermore, their deficit in the double-step saccade task correlated with the patients’ impairment in copying Rey’s complex figure (Figure 16.14), but not with other tests measuring the severity of left hemineglect (Heide & Kömpf, 1997). Accordingly, Pisella and Mattingley (2004) have suggested that an impairment of remapping processes may contribute to a series of symptoms that pertain to the unilateral visual neglect syndrome and that are unexplained by the attentional hypothesis alone (the ipsilesional attentional bias), such as revisiting, spatial transpositions, and disorganization in the whole visual field. The severity of neglect might thus depend on two dissociated factors: 1) the strength of the ipsilesional attentional bias, and 2) the severity of nonlateralized attentional deficits.

Figure 16.14 Spatial transpositions in the copy of the Rey figure following right posterior parietal lesion. In his copy, the patient with neglect (bottom panel) not only omits most elements of the left side of the figure but also inappropriately adds to the right side some elements pertaining to the left side (Rode et al., 2007a). The patient with constructional apraxia without neglect (upper panel) copies almost all the figure components but exhibits errors in their relative localization (Heide & Kömpf, 1997). Another patient with constructional apraxia and neglect (unpublished) had to search for the target (a circle), which normally easily “pops out” among the distracters (squares). The lines represent the continuous eye position recorded until the patient provided his response (target present or absent). As shown by the ocular tracking, both the patient with constructional apraxia and the patient with neglect showed much revisiting behavior during their visual search, with a lack of exploration of the left part of the visual scene exhibited in addition by the neglect patient.

Another attempt to account for the prevalence of visual neglect following right-hemispheric lesion has highlighted the right hemisphere’s specialization in “nonspatial” processes that are critical for visual selection, such as sustained attention (or arousal; Robertson, 1989). In this respect, the review by Husain (2001) on the nonspatial temporal deficits associated with neglect is of prime interest. Husain et al. (1997) have shown (p. 332) that lesions of either the frontal lobe (4 patients), the parietal lobe (3), or the basal ganglia (1) causing an ipsilesional attentional bias are also associated with a limited-capacity visual processing system causing abnormal visual processing over time between letters presented at the same location in space (a lengthening of the “attentional blink” in central vision, tested with the Rapid Serial Visual Presentation paradigm; Broadbent & Broadbent, 1987; Raymond et al., 1992). As reviewed in Husain (2001), a later study by Di Pellegrino et al. (1998) suggested that this deficit in time and the spatial bias were two sides of the same coin, by showing in a patient with left-sided visual extinction that the attentional blink was lengthened when stimuli were presented in the left visual field but within the normal range when stimuli were presented in the right visual field. In conclusion, the temporal deficits of attention in central vision (such as lengthening of the attentional blink) can appear as a deficit of spatial attention and do not seem to be related specifically to neglect (because they also occur in patients with extinction), nor anatomically to the posterior parietal lobe. In contrast, the deficit of visual spatial working memory (visual synthesis) described in neglect, but also more recently in constructional apraxia, has been ascribed specifically to right IPL lesions (Pisella et al., 2004; Russell et al., 2010; see Figure 16.9). In this context, the crucial contribution of visual remapping impairment to severe neglect (and not to left-sided extinction) proposed by Pisella and Mattingley (2004) is not called into question by the studies showing that the spatial working memory deficit of patients with neglect also appears in a vertical display (Ferber & Danckert, 2006; Malhotra et al., 2005). Indeed, the right IPL is conceived as able to remap (and establish relationships between) locations throughout the whole visual field (Pisella et al., 2004, 2011; see Figure 16.9). The lateralized remapping impairments revealed by Heide et al. (1995), and more recently by van Koningsbruggen et al. (2010), result from the combination of the right IPL (p. 333) specialization for space and the ipsilesional bias of attention, which additionally may delay or degrade the representation of contralesional visual stimuli. In contrast, remapping impairments may be expressed as deficient visual synthesis and revisiting behavior without a lateralized spatial bias in several nonlateralized syndromes like constructional apraxia (see Figure 16.14), which appears as a persisting visual-spatial disorder following right parietal damage once the lateralized bias of neglect has resolved (Russell et al., 2010).

A Recent View of Posterior Parietal Cortex Organization and Unilateral Neglect Syndrome

The standard view of the PPC and Bálint’s syndrome in the context of the predominant model of Milner and Goodale (1995) was that the most superior part of the PPC (dorsal stream) was devoted to action (with OA as an illustrative example of a specific visual-motor deficit), whereas the most inferior part of the PPC was intermediate between vision for action and vision for perception, with unilateral neglect as the illustrative example. Recent converging evidence tends to distinguish, behaviorally but also anatomically, the lateralized and nonlateralized components of unilateral neglect, linking the well-known lateralized bias of neglect to dysfunction of the superior parietal-frontal network and the newly defined spatially nonlateralized deficits to the inferior parietal-frontal network and TPJ. Indeed, the consequences of PPC lesions in humans suggest that, within the PPC, symmetrical and asymmetrical (right-hemispheric dominant) visual-spatial maps coexist. TMS applied unilaterally over the SPL in humans symmetrically causes contralesional visual extinction (Hilgetag et al., 2001; Pascual-Leone et al., 1994): In bilateral crossed-hemifield visual presentation of two simultaneous objects, only the ipsilesional one is reported. The spatial representations of the SPL, whose damage potentially induces contralesional OA and contralesional visual extinction, concern egocentric (eye-centered) localization in the contralesional visual field (Blangero et al., 2010a; see Figure 16.11) and do not exhibit right-hemispheric dominance (Blangero et al., 2010a; Hilgetag et al., 2001; Pascual-Leone et al., 1994). In contrast, hemineglect for the right space is rare and is usually found in people who have an unusual right-hemispheric lateralization of language. Consistent with the rightward bias in midline judgment characteristic of the more classical left-sided visual neglect after a unilateral (right) lesion, brain imaging (Fink et al., 2000) and TMS (Fierro et al., 2000) studies have revealed that this landmark (allocentric localization: perceptual line bisection) task activates in humans a specialized and lateralized network including the right IPL and the left cerebellum. The right IPL (IPL is used here to distinguish it from the SPL and designates a large functional region in the right hemisphere that also includes the superior temporal gyrus and the TPJ) is also specifically involved in processes such as sustaining and reorienting attention to spatial locations in the whole visual field, useful in visual search tasks (Ashbridge et al., 1997; Corbetta et al., 2000, 2005; Ellison et al., 2004; Malhotra et al., 2009; Mannan et al., 2005; Muggleton et al., 2008; Shulman et al., 2007). Because neglect patients by definition (1) have a deficit of attention for contralesional space and (2) fail in the visual scanning tasks of line bisection and cancellation, visual neglect syndrome appears to be a combination of left visual extinction (produced by damage to the right SPL) and visual synthesis deficits in the entire visual field caused by damage to the right IPL (Pisella & Mattingley, 2004). Accordingly, Pisella et al. (2004) have shown a combination of a left-right attentional gradient and a spatial working memory deficit over the entire visual space in neglect consecutive to parietal damage (see Figure 16.9).

One can further speculate (model in Figure 16.1) that bilateral lesions of the SPL in humans may cause Bálint’s symmetrical shrinkage of attention, Seelenlähmung des Schauens, later called simultanagnosia (Luria, 1959; Wolpert, 1924), in which the patient reports only one object among two in a symmetrical way, that is, not systematically the right or the left one. Accordingly, the symptoms of simultanagnosia may simply appear as a bilateral visual extinction (Humphreys et al., 1994), without more severe spatial disorganization. This might correspond to the differential severity between the handicap consecutive to bilateral SPL lesion after a stroke, in which patients exhibit bilateral OA and subclinical simultanagnosia displayed only as a shrinking of visual attention (e.g., patient AT: Michel & Henaff, 2004; patient IG, personal observation), and the larger handicap consecutive to posterior cortical atrophy (Benson’s syndrome). In Benson’s syndrome, simultanagnosia may be worsened by extension of the damage toward the right IPL, thereby affecting the maps in which the whole space is represented and in which the remapping mechanisms may specifically operate in order to integrate visual information collected via multiple snapshots into a (p. 334) global concept. This extension of neural damage toward the right IPL would also be the crucial parameter explaining the differing severity between left-sided extinction (without clinical neglect syndrome) after a unilateral SPL lesion and the neglect syndrome, which includes extinction, deficits in landmark tests, and remapping impairments or a defect of visual synthesis (see Figure 16.1).

The observation of a young patient with an extremely focal lesion of the right IPL, caused by a steel nut penetrating his brain during an explosion (Patterson & Zangwill, 1944, Case 1), is a direct argument for the model we propose in this chapter (see Figure 16.1). The right IPL is the region most commonly associated with visual neglect (Mort et al., 2003). Patterson and Zangwill (1944) described a patient with left-sided extinction and “a complex disorder affecting perception, appreciation and reproduction of spatial relationships in the central visual field of vision” (p. 337). This “piecemeal approach” was associated with a lack of “any real grasp of the object as a whole” (p. 342) that “could be defined as a fragmentation of the visual contents with deficient synthesis” (p. 356). This defect of visual synthesis in central vision was qualitatively similar to the consequences of bilateral lesion of the posterior parietal lobe, which causes simultanagnosia, a nonlateralized extinction restricting visual perception to seeing only one object at a time, even though another overlapping object may occupy the same location in space (Humphreys et al., 1994; Husain, 2001; Luria, 1959). It is striking that the behavioral impact of this focal unilateral lesion was almost equivalent to a full Bálint’s syndrome, with ipsilesional biases and extreme restriction of attention. In addition, a defect in establishing spatial relationships and integrating visual snapshots as a whole was described (which has since been related to spatial working memory or visual remapping impairments in the whole visual field; Driver & Husain, 2002; Husain et al., 2001; Pisella et al., 2004; Pisella & Mattingley, 2004). On the basis of neuropsychological and neuroimaging observations, Corbetta et al. (2005) developed a model based on the notion that the right inferior parietal-frontal network is activated by unpredicted visual events throughout the whole visual field and that the superior parietal-frontal eye field network of attention influences top-down stimulus–response selection in the contralesional visual space. In their model, the lesion of the right inferior parietal-frontal network, via a “circuit-breaking signal,” decreases activity in the ipsilateral superior parietal-frontal eye field network and consequently biases activity in the occipital visual cortex toward the ipsilesional visual field. In other words, this model predicts that a lesion of the right IPL would, in addition to directly damaging the inferior parietal-frontal network, indirectly decrease functional activity in the right SPL and thereby cause an ipsilesional attentional bias and left-sided visual extinction (arrow from the right IPL to the right SPL in the model of Figure 16.1).

The presence of lateralized attentional bias in the patient described by Patterson and
Zangwill (1944) may alternatively be understood within the general framework of inter­
hemispheric balance. It appears that any lesion affecting directly or indirectly the pari­
etal-frontal networks of attention in an asymmetrical way will cause an imbalance ex­
pressed as a lateralized bias. According to this more general assumption, a lateralized
bias (left-right gradient of attention) is observed, for example, after a lesion to the basal
ganglia but without spatial working memory deficit, which is more specific to lesions of
the right IPL (Pisella et al., 2004; see Figure 16.9). Even less specifically, a spatial bias in the direction of attention (or of a limited-capacity visual processing system) may presumably occur because of unopposed activity of the contralesional (intact) hemisphere.

To sum up, in our model (see Figure 16.1), visual extinction is a sensitive test to reveal
the ipsilesional attentional bias, which is the lateralized component of neglect. This ipsile­
sional attentional bias (or left-right attentional gradient) is caused by the dysfunction of
the SPL directly (lesion of the SPL) or indirectly (the lesion of the right IPL causes an im­
balance between the right and the left SPL; Corbetta et al., 2005), or by a lesion else­
where causing similar imbalance in the dorsal parietal-frontal networks between the right
and the left hemispheres. It seems that prismatic adaptation, and most other treatments
of neglect, improve this lateralized component (common to neglect and extinction pa­
tients, whatever the lesion causing the attentional gradient; see Figure 16.12; but see
Striemer et al., 2008), probably by acting on this imbalance (Luauté et al., 2006; Pisella et
al., 2006b). This is also an explanation of the paradoxical improvement of neglect (follow­
ing right-hemisphere damage) by subsequent damage of the left hemisphere (Sprague ef­
fect; see Weddell, 2004, for a recent reference). The other (nonlateralized) components of
neglect rely on the right IPL and its general function of visual-spatial synthesis and may
be more resistant to treatments (but see Rode et al., 2006; Schindler et al., 2009).

(p. 335) Optic Ataxia


Reaching Errors Might Be Explained by Attentional Deficit

The basic feature of OA is a deficit in reaching to and grasping a visual target in peripheral vision that cannot be considered purely visual (the patient can see and describe the visual
world, and binocular vision is unaffected), proprioceptive (the patient can match joint an­
gles between the two arms), or motor (the patient can move the two arms freely and can
usually reach and grasp in central vision) (Blangero et al., 2007; Garcin et al., 1967;
Perenin & Vighetto, 1988).

The two bilateral OA patients who have been tested most extensively in the last decade
(IG and AT; see Khan et al., 2005b, for detailed description of their lesion) initially pre­
sented with associated Bálint’s symptoms (simultanagnosia but no neglect) causing
deficits in central vision. For example, patient AT showed a limited deficit when grasping
unfamiliar objects in central vision during the early stage of her disease, when more concomitant Bálint’s symptoms were present (Jeannerod et al., 1994). Subsequent studies of
grasping in these patients used only peripheral object presentations (e.g., Milner et al.,
2001, 2003). The reaching studies conducted in these two bilateral patients (after they recovered from their initial Bálint’s symptoms) typically showed that accuracy was essential­
ly normal in central vision, whereas errors increased dramatically with target eccentricity
(Milner et al., 1999, 2003; Rossetti et al., 2005). In light of these observations, we suggest
that patients with bilateral lesions suffering from Bálint’s syndrome may be impaired in
central vision owing to additional visual-spatial deficits such as simultanagnosia. As re­
ported by IG herself soon after her stroke, simultanagnosia prevents “the concomitant
viewing of the hand and the target,” which deprives motor execution of any visual control
and can cause reach-and-grasp errors in central vision. Accordingly, patients with reach­
ing deficits in central vision have been shown to be more accurate when deprived
of visual feedback about the hand (open-loop condition) during the execution of the
movement (Jackson et al., 2005; Jakobson et al., 1991; Buxbaum & Coslett, 1998). This
demonstrates that in these patients there is a clear interference between visual informa­
tion from the target guiding the action and visual information from the hand position pro­
viding online feedback, which is resolved by removing the visual feedback of the hand.
Such simultanagnosia, described by Bálint (1909) as the difficulty of looking at an object other than the one being fixated, can also be invoked to explain the “magnetic misreaching” behavior (Carey et al., 1997), in which the patient can only reach toward the point of fixation, a behavior we had the opportunity to observe in patient CF in the acute phase, when Bálint’s syndrome was exhibited.

For patient IG, this perceptual deficit in the acute phase is mentioned in the Pisella et al.
(2000) study, where it is written that online reaching control was tested only when IG recovered from her initial simultanagnosia. Even though the clinical signs of simultanagnosia had resolved after a few months (the patient had recovered a full perceptual view of her hand, whereas she initially perceived only two fingers at a time, and could report bilateral stimulation even when tested on a computer with simultaneous presentation of small dots), a complaint remained, for example, about pouring water into a glass without touching it. For patient AT, who exhibited a larger lesion extending further toward the occipital lobe and IPL, a full report of the remaining perceptual deficits has been provided
by Michel and Henaff (2004) and summed up as a concentric shrinking of the attentional
field, impairing tasks, such as mazes, in which global attention is required.

Figure 16.15 Field effect and hand effect. A, Clinical examination of optic ataxia patients (Vighetto, 1980).
The clinician stands behind the patient and asks him
to fixate straight ahead. The clinician then succes­
sively presents in the two fields target objects to be
grasped with the two hands. This patient with right
posterior parietal cortex (PPC) damage exhibits a
gross deficit when reaching to left-sided objects (con­
tralesional field effect) with his left hand (contrale­
sional hand effect). Once the object has been missed,
he exhibits exploratory movements comparable to
blind subjects. This poor visual-motor performance
can be contrasted with the ability of the patient to
describe the object and his normal ability to reach to
central targets. B, Histograms of the mean absolute
errors when pointing to visual targets in the dark (Blangero et al., 2007). Columns represent the means
and standard deviations of the end points errors (in
millimeters) for each of four combinations of hemi­
fields and pointing hands for the two patients Can
and OK presenting unilateral optic ataxia. The dotted
line shows the mean normal performance in the same
condition. C, Illustration of the four conditions of on­
line motor control tested and of the performance of
patient CF with left optic ataxia (values: percentage
of movements corrected in response to the target
jump in the four conditions of hand and jump direc­
tion). The movements were made by the healthy or
by the ataxic hand, and the target could jump from
central vision toward either the patient’s healthy or
ataxic visual field.

Reprinted from Cortex, 44(5), Blangero et al., “A hand and a field effect in on-line motor control in unilateral optic ataxia,” 560–568, Copyright (2008), with permission from Elsevier.

After unilateral focal lesions of the SPL, cases of “pure” OA have been described, that is,
in the absence of clinical perceptual and oculomotor symptoms (e.g., Garcin et al., 1967),
even in the acute stage. These descriptions have provided arguments for a pure vision-
for-action pathway involving the SPL-IPS network (Milner & Goodale, 1995). When OA is
observed after a unilateral lesion, the symptoms predominate in the contralesional peripheral visual field (field effect), usually combined with a hand effect, the use of the contralesional hand causing additional errors throughout the whole visual field, especially
when vision of the hand is not provided (Figure 16.15A, B; Blangero et al., 2007; Brou­
chon et al., 1986; Vighetto, 1980). This combination of hand and field effect is observed
both for reaching to stationary targets and for online motor control in response to a tar­
get jump (Figure 16.15C; Blangero et al., 2008); the impairment of manual automatic
corrections is linked both to the deficient updating of target location when it jumps in the
contralesional visual field (field effect) and to the deficient monitoring of the contralesion­
al hand location within the contralesional space at movement start and during ongoing
movement (hand effect). This combination of hand and field effects has been considered
characteristic of a deficit in visual-manual (p. 336) transformation. However, recent exper­
imental investigation of the field effect has revealed that this deficit specifically corre­
sponds to the impairment of a visual-spatial (not only visual-manual) transformation defin­
ing a contralesional target location in an eye-centered reference frame (Blangero et al.,
2010a; Dijkerman et al., 2006; Gaveau et al., 2008; Khan et al., 2005a). That is, there is a
visual-spatial module coding locations in the eye-centered reference frame commonly
used for eye and hand movements within the SPL (Figure 16.16; Gaveau et al., 2008).
This common module could thus also be involved in covert (attentional) orienting shifts
with perceptual consequences. Accordingly, a lateralized bias of attention has been
demonstrated in a Posner task by Striemer et al. (2007). Michel and Henaff (2004) have
reported that target jumps were perceived without an apparent motion in a bilateral OA
patient (AT), and reaction times to discriminate, with a mouse response, leftward versus rightward jumps of a target seen in central vision are delayed for contralesional directions (e.g., with a right PPC lesion, reaction times to leftward target jumps were longer than to rightward jumps and longer than control performance; unpublished observation).

Recent studies have revealed that even pure cases of unilateral OA appear to systemati­
cally demonstrate a lateralized contralesional deficit of covert attention (reviewed in
Pisella et al., 2007, 2009). This attentional deficit has to be specifically explored because
it can be subclinical (see Figure 16.13). An attentional deficit or ipsilesional bias can be
revealed in conditions of increased attentional load (requiring identification of a letter
rather than simple detection) or attentional competition (with presence of an ipsilesional
item or of flankers within the (p. 337) same visual field). These increases in task demand always disadvantaged performance within the contralesional visual field in unilateral OA patients, as in patients with extinction or neglect. The similarity
is also highlighted by the fact that OA can be expressed in space or in time (Rossetti et al.,
2003), as already mentioned for attentional deficit (reviewed in Husain, 2001). The last
decade of study of OA has described the deficit in space (errors in eye-centered reference
frame increasing with visual eccentricity; reviewed in Blangero et al., 2010a) and in time
(Milner et al., 1999, 2001; Pisella et al., 2000; reviewed in Pisella et al., 2006a; Rossetti &
Pisella, 2002; Rossetti et al., 2003). The time effect can be summed up as a paradoxical
improvement of visual-manual guidance in delayed offline conditions, and even guidance based on past representations (more stable than eye-centered ones) of targets (“grasping the past”; Milner et al., 2001). This description does not, in the end, seem too far
away from what Husain (2001) concluded about attentional function in the context of visual extinction and neglect: that it may consist in keeping track of object features
across space and time. Note that location in space is a crucial feature for reaching as well
as for feature binding in the context of perception and more complex action.

To sum up, evidence of a lateralized deficit of covert attention in OA does not allow us to
claim a pure visual-motor deficit or to maintain a functional dissociation between OA
and (subclinical) visual extinction after lesions to the SPL.

Dissociation Between the Attentional Deficits in Neglect and Optic Ataxia

At this stage of the chapter, we have claimed that both neglect and OA exhibit a deficit of
attentional orienting toward contralesional space emerging from damage to the symmet­
rical SPL-IPS network. The specific right-hemispheric localization of neglect would rely
on the association of a nonlateralized deficit consecutive to right IPL damage. Interestingly,
these common and different components can be revealed using the Posner paradigm (Pos­
ner, 1980), a well-known means of testing attentional orienting in space. In the Posner
spatial-cueing task, participants keep their eyes on a central fixation point and respond as
fast as possible to targets presented in the right or the left visual field. Targets can be
preceded by cues presented either on the same side (valid trials) or on the opposite side
(invalid trials). An increased cost in responding to the target presented on the opposite
side of the cue (invalid trials) is attributed to a covert orienting shift toward the cue
and the necessary disengagement of attention from this cue to detect the target (Posner,
1980).
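
The costs measured in such a spatial-cueing experiment are simple differences between condition-mean reaction times. As a rough illustration (the condition labels and millisecond values below are invented for the sketch, not data from this chapter), they can be computed as follows:

```python
# Hypothetical condition-mean reaction times (ms) from a Posner spatial-cueing
# task in a parietal patient; all values are invented for illustration only.
rts = {
    ("contra", "valid"): 600,    # target in the contralesional field, cue on the same side
    ("contra", "invalid"): 720,  # target contralesional, cue on the opposite side
    ("ipsi", "valid"): 580,
    ("ipsi", "invalid"): 680,
}

def disengagement_cost(rts, field):
    """Extra time to respond in a field when attention must first be
    disengaged from an invalid cue on the opposite side."""
    return rts[(field, "invalid")] - rts[(field, "valid")]

def attentional_gradient(rts):
    """Left-right bias: contralesional minus ipsilesional RT on valid trials."""
    return rts[("contra", "valid")] - rts[("ipsi", "valid")]

print(disengagement_cost(rts, "contra"))  # cost of disengaging from an ipsilesional cue
print(disengagement_cost(rts, "ipsi"))    # cost of disengaging from a contralesional cue
print(attentional_gradient(rts))
```

With these invented numbers, the two disengagement costs (120 and 100 ms) dwarf the 20-ms valid-trial gradient, the kind of pattern attributed below to parietal neglect; a patient in whom only the gradient term were large would instead fit the optic ataxia profile.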

Figure 16.16 Similar pattern of errors for saccade and reach in bilateral optic ataxia. A, Comparison of the pattern of errors between pointing and saccades to stationary peripheral visual targets in bilateral optic ataxia (patient IG). Individual pointing (left
panels) and saccadic (right panels) trajectories to­
ward peripheral targets for one control subject (up­
per panels) and bilateral optic ataxia patient IG (low­
er panels). In the pointing task, subjects were asked
to maintain their gaze fixed (black cross) while point­
ing to central (black dot) or peripheral (gray dots)
targets. The control subject accurately reached target
positions estimated in peripheral vision, whereas
IG’s pointing behavior showed a pathological hand
movement hypometria, which increased with target
eccentricity (Milner et al., 2003; Rossetti et al.,
2005). In the saccade task, subjects were instructed
to move their eyes from a fixation point (“0”) toward
one of three peripheral targets (gray dots). The control subject presented a well-known undershoot (hypometria) of his primary saccade, which increases with
target eccentricity; IG presented a pathological in­
crease of hypometria of primary saccades with target
eccentricity, which appears similar to her pointing
hypometria for peripheral targets but at further ec­
centricities (the eccentricities are provided in cen­
timeters, but in both reach and saccade conditions il­
lustrated here, the targets are similarly positioned
along a frontal-parallel line at reaching distance). It
must be noted that a following corrective saccade (not shown) will eventually correct this hypometria of primary saccades in patients with optic ataxia, as in con­
trols (Gaveau et al., 2008). B, Comparison of hand
and eye online visual-motor control in bilateral optic
ataxia (patient IG). This figure describes the perfor­
mance of a control subject and a patient with bilater­
al optic ataxia (IG) in two experimental conditions.
First, static objects were presented in central vision,
and all subjects were able to reach appropriately to
grasp them in either position C or position R. Se­
cond, when the object was quickly moved at the time
of movement onset, controls were able to alter their
ongoing trajectory and reach for the final location of
the object. The patient with bilateral optic ataxia was
specifically impaired in this condition and produced a
serial behavior. She performed a whole movement to
the first location of the object (C), which she fol­
lowed with a secondary movement toward the sec­
ond location (R). The time of target grasping was
consequently delayed with respect to stationary tar­
gets and control performance (Gréa et al., 2002; see
also Pisella et al., 2000). Similarly, her saccadic be­
havior (on the right; Gaveau et al., 2008) consisted of
two corrective saccades in addition to the primary
saccade. The first “corrective” saccade was generat­
ed was directed to the initial target location (A) with
no reaction time increase (i.e., as if the target had
not been displaced), then a second late corrective
saccade achieved visual capture of the target with a
delay with respect to stationary targets and control
performance. This pathological behavior (indicated by a black arrow, compared with the control performance indicated by an open arrow) reveals a core
deficit of online integration of visual target location.

Figure 16.17 Differential pattern of performance in the Posner paradigm between patients with optic
ataxia (OA) and patients with unilateral spatial ne­
glect (NSU). In these parietal patients, the common
deficit is that targets on the contralesional side are
detected more slowly than those on the ipsilesional
side, consistent with a pathological attentional gradi­
ent. This left-right difference is the main effect in pa­
tients with optic ataxia (Striemer et al., 2007). A
deficit in disengaging attention (in invalid trials) has
been highlighted as a specific additional characteris­
tic of neglect: For these patients, the need to gener­
ate sequential orienting toward the left then the
right side appears even more problematic than the
need to orient a saccade in a direction opposite to
their pathological rightward attentional gradient. In­
deed, data replotted from Posner et al. (1984) show
that reaction times to respond to target presented in
the ipsilesional visual field in invalid trials (ipsi-in­
valid) are almost 100 ms longer than reaction times
to respond to target presented in the contralesional
visual field in valid trials (contra-valid trials).

In parietal patients, targets on the contralesional side are detected more slowly than
those (p. 338) (p. 339) on the ipsilesional side (Friedrich et al., 1998; Posner et al., 1984),
consistent with a pathological attentional gradient. As can be seen in Figure 16.17, a deficit in disengaging attention from a right (ipsilesional) cue in the invalid condition after right parietal damage has been highlighted as a specific additional characteristic of neglect (right hyperattention; Bartolomeo & Chokron, 1999, 2002). Note, however, in Figure 16.17 that a cost to disengage attention from a left (contralesional) cue is also observed in the other condition of spatial invalidity of the cue in neglect patients: When the cue appears on the left side and the target on the right side, neglect patients are about 100 ms slower than in the valid condition with the target presented on the left side, although the difference between leftward and rightward orienting in valid trials is only about 20 ms (Posner et al., 1984; redrawn in Figure 16.17). This means that for these patients
with parietal neglect, the need to generate sequential left-right orienting appears more
problematic than the need to orient attention in a direction opposite to their pathological
rightward attentional gradient (illustrated in Figure 16.12). This reflects the combination
of a rightward attentional gradient and a general (nonlateralized) double-step orienting
deficit, with an interaction between the two deficits causing the disengagement cost from
right cues to be higher than the disengagement cost from left cues. We have suggested
that remapping mechanisms, crucial for double-step orienting, can account for this gener­
al disengagement cost in neglect patients and further for observation of ipsilesional ne­
glect patterns after left cueing (Pisella & Mattingley, 2004). Crucially, this hypothesis im­
plies that remapping processes might work for covert shifts of attention as well as for
overt shifts of the eyes.

In contrast to neglect patients, who exhibit a major disengagement deficit leading to the longest reaction times in the two invalid conditions (Posner et al., 1984), Striemer et al. (2007) have suggested that OA patients exhibit only an engagement deficit toward the contralesional visual field in a Posner task and no specific disengagement deficit. As a consequence, responses to ipsilesional items are never impaired, even in invalid trials, whereas responses to contralesional items are slow in both valid and invalid trials (see Figure 16.17). Striemer et al. (2007) have therefore suggested damage to the saliency maps representing the contralesional visual field after an SPL-IPS lesion in patients with OA. In other words, OA patients show the same deficit as neglect patients in orienting attention toward the contralesional visual field but no additional deficit in the double-step orienting needed in invalid trials. Accordingly, patients with OA exhibit no signs of a visual synthesis deficit. Moreover, OA patients have shown preserved visual remapping across saccades in the context of a pointing experiment (Khan et al., 2005a). This absence of visual synthesis impairment fits with the absence of a disengagement cost in invalid trials.

Figure 16.18 Performance of two patients with left optic ataxia in letter discrimination tasks in the visual periphery under central (endogenous) versus peripheral (exogenous) cueing. The cues are al­
ways valid. The endogenous cue consists of a central
arrow, which indicates a direction, and a color (al­
ways the green location). The exogenous cue con­
sists of a flash of the peripheral green circle on one
side. In both conditions, the letter E is flashed for
250 ms in regular or inverted orientation at the
green location at 8° of visual eccentricity, surround­
ed by flankers that also change from “8” to “2” or
“5” symbols for 250 ms. Then everything is masked
by the reappearance of the “8” symbols (the initial
image). The patients maintain central fixation
throughout the experiment (eye position is record­
ed). Both patients show a clear deficit in their left
(contralesional) visual field in the task of endogenous
covert attention shifting, whereas in the covert ex­
ogenous task, patient CF shows no deficit, and Mme
P also shows similar performance in both visual
fields.

This task is an adaptation of the paradigm of Deubel and Schneider (1996).

Luo et al. (1998) and later Bartolomeo et al. (2000) showed that neglect patients are more
specifically impaired in exogenous (stimulus-driven) conditions of attentional orienting
and remain able to shift attention leftward when instructed (i.e., voluntarily, goal direct­
ed). This framework can be used to explain why the disengagement costs exceed the bias resulting from the pathological attentional gradient in neglect pa­
tients. When the Posner task is undertaken by participants during functional magnetic
resonance imaging (Corbetta et al., 2000), symmetrical activations (contralateral to stimu­
lus presentation) are observed in the SPL, and specific activation of the right IPL is addi­
tionally observed in invalid trials in either visual field. Based on this pattern of activation,
Corbetta and (p. 340) Shulman (2002) have proposed this distinction between goal-direct­
ed (endogenous) and stimulus-driven (exogenous) systems. This model was suggested by
the stimulus-driven nature of the invalid trials (detecting events at unpredicted locations could only be stimulus-driven by definition), activating a representation of the entire visual space within the IPL. If exogenous attention relies on the right IPL, according to neu­
roimaging and neuropsychology of spatial neglect, then the symmetrical SPL-IPS network
could subserve endogenous attention shifts. Accordingly, the tests of attentional shifting
we have used on several OA patients (OK: Blangero et al., 2010b, Pisella et al., 2009; CF
and Mme P: see Figure 16.18) involved endogenous cues (central arrow). Patients were
instructed to undertake a letter-discrimination task at a cued location within the visual
periphery. They failed to identify the letter flashed at the cued location in their contrale­
sional (ataxic) visual field. Is there a double dissociation between OA and neglect in this
respect? This remains to be investigated on a large scale. We have recently started to test
OA patients in an exogenous version of the same letter-discrimination task (adapted from
Deubel & Schneider, 1996), in which the cue was a flash of the peripheral location where
the letter would be presented. The two patients who are still available for testing (CF and
Mme P) have exhibited less contralesional deficit in this exogenous covert attention task
than in the endogenous version. In the endogenous version (right panel of Figure 16.18),
the percentage of correct letter discrimination was at chance level in the contralesional
visual field and at almost 80 percent in the ipsilesional visual field during the task. In the
exogenous version (left panel of Figure 16.18), there was no longer a difference between performance in the left and right visual fields. This possible clinical dissociation within the
PPC between exogenous and endogenous covert attention might therefore be (p. 341) an­
other difference between the attentional deficits of neglect and OA patients. Whether this
constitutes a systematic neuropsychological difference between SPL-IPS and right IPL le­
sion consequences needs to be further investigated. Note that such dissociation between
exogenous and endogenous attention has also been reported within the frontal lobe (Ve­
cera & Rizzo, 2006): Contrary to frontal neglect cases, their patient exhibited a general
impairment in orienting attention endogenously, although he could use peripheral cues to
direct attention.
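
Whether discrimination accuracy in such a two-alternative letter task (regular vs. inverted "E") genuinely differs from chance can be checked with an exact binomial test. The sketch below uses only the Python standard library, and the trial counts are hypothetical illustrations, not the patients' actual data:

```python
from math import comb

def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test: total probability of all outcomes
    no more likely than observing k successes out of n under chance p."""
    probs = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    return sum(q for q in probs if q <= probs[k] + 1e-12)

# Hypothetical counts for a two-choice discrimination (chance = .5):
# 16/20 correct in the ipsilesional field, 11/20 in the contralesional field.
print(binom_two_sided_p(16, 20) < 0.05)  # ipsilesional field: reliably above chance
print(binom_two_sided_p(11, 20) < 0.05)  # contralesional field: not distinguishable from chance
```

An accuracy of "almost 80 percent" is thus statistically separable from chance with even a modest number of trials, whereas near-chance contralesional performance is not, which is what motivates reporting the two fields separately.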

To sum up, the Posner task reveals that OA patients exhibit attentional deficits different
from those exhibited by neglect patients because neglect appears as a combination of lat­
eralized deficits (common with OA patients) and deficits specific to the right IPL lesion
that need to be further characterized. Another issue that needs to be further investigated
is whether the attentional deficits of OA patients are merely associated with their visual-motor impairment or are causal. The same issue is developed in the following section in relation to attentional and eye movement impairments.

How Can We Now Define Psychic Paralysis of Gaze?
Bálint’s description of Seelenlähmung des Schauens included difficulty with both finding
visual targets with the eyes (wandering of gaze) and with being “captured” by a visual target once fixated (visual grasp reflex). Patients stare open eyed, with gaze locked to the
place they are fixating, and they may only be able to disrupt such sticky fixation after a
blink. When patients are asked to move their eyes to a target suddenly appearing in the
peripheral field, they may generate no movement, or they may initiate wandering eye
movements that consist of erratic and usually hypometric displacements of the eyes in space,
ending with incidental acquisition of the target. Patients do not seem to perceive visual
targets located away from a small area, which is usually the area of foveation, despite
preserved visual fields. They exhibit a reduction of “useful field of vision,” operationally
defined as the field of space that can be attended while keeping central fixation (Rizzo &
Vecera, 2002). This can be tested by asking patients either to direct their eyes or their
hand to, or to name, objects presented extrafoveally. Generally, responses are more likely
to be given after verbal encouragement, a finding that indicates that the deficit is not a
consequence of a reduction of the visual fields but rather of attention scanning for non­
central events.

The three elements of Bálint’s triad (1909) have been subjected to numerous interpreta­
tions. As emphasized by de Renzi (1989), Holmes (1918) added to his description of pa­
tients with parietal lesions a deficit for oculomotor functions, which had been excluded by
Bálint for his patient. This oculomotor deficit described by Holmes (1918) has often been
incorporated into the description of Bálint’s syndrome and confounded with the Seelenlähmung des Schauens, most often translated into English as a psychic paralysis of gaze. As
de Renzi (1989) pointed out, psychic paralysis of gaze appears to be an erroneous transla­
tion of Bálint’s description. A first alternative translation proposed by Hécaen and de
Ajuriaguerra (1954)—psychic paralysis of visual fixation, also known as spasm of fixation
(or inability to look toward a peripheral target)—suggested that it can be dissociated from
intrinsic oculomotor disorders, as already argued by Bálint, but also from the two atten­
tional disturbances of Bálint’s syndrome: a general impairment of attention that corre­
sponded to a selective perception of foveal stimuli and a lateralized component of the at­
tention deficit, which can now be described as unilateral neglect. Husain and Stein’s
translation (1988) of Bálint’s description of the psychic paralysis of gaze corresponds to a
restriction of the patient’s “field of view, or we can call it the psychic field of vision.” De
Renzi (1989) only distinguished two types of visual disorders: one corresponding to the
lateralized deficit known as unilateral neglect, and the other to a nonlateralized restric­
tion of visual attention. These interpretations of the Bálint-Holmes syndrome excluded the
presence of intrinsic oculomotor deficits.

By contrast, Rizzo and Vecera (2002; Rizzo, 1993), like Holmes (1918), included in the posterior parietal dysfunction the oculomotor deficits, which they distinguished from the psychic paralysis of gaze, assimilated to spasm of fixation or ocular apraxia. To these two ocular symptoms they associated a spatial disorder of attention corresponding to simultanagnosia (Wolpert, 1924), unilateral neglect, and a concentric restriction of the attentive field. Evaluation of the spatial disorder of attention using a motor response was therefore
considered to be possibly affected by concurrent visual-motor deficits (OA and ocular
apraxia). Verbal response was considered to more directly probe conscious or attentive
perception. This framework resembles the theory developed by Milner and Goodale
(1995) postulating a neural dissociation between perception and action, with attention po­
sitioned together with perception. In this context of the dominant view of the dorsal
Page 36 of 56
Attentional Disorders

stream being devoted to vision for action, Bálint’s syndrome has been described (p. 342) as a set of visual-motor deficits: a deficit of arm (reach) and hand (grasp) movements and
psychic paralysis of gaze as a deficit of eye movements. The attentional disorders that
were initially associated with the psychic paralysis of gaze have been implicitly grouped
together with the lateralized attentional deficits assimilated to unilateral neglect, all supposed to rely on the IPL and TPJ, a ventrodorsal pathway intermediate between the
dorsal and the ventral stream.

Altogether these apparent subtleties have given rise to several conceptualizations of the
oculomotor, perceptual, and attentional aspects of the syndrome, which reflect the difficulty of assigning these oculomotor disorders to a well-defined dysfunction, as well as the variety of eye movement problems observed from case to case. The first subsection deals
with the assimilation of psychic paralysis of gaze to simultanagnosia, and the second sub­
section reviews in detail the oculomotor impairments that have been described after a le­
sion to the PPC.

Simultanagnosia

Figure 16.19 One main characteristic of simultanagnosia is the lack of global perception. For example, looking at a painting by Arcimboldo, patients with simultanagnosia report the vegetables but not the face. This deficit also prevents them from performing labyrinth tasks (B), from judging whether a dot is on line 1 or 2 (A), and from judging whether a dot is inside or outside an undefined closed shape (C).

Adapted with permission from Michel & Hénaff, 2004.

Simultanagnosia, a term initially coined by Wolpert (1924), defines a deficit in which pa­
tients see only one object at a time (Figure 16.19). This aspect was reminiscent of the de­
scription of Bálint’s patient who was not able to perceive the light of a match while focus­
ing on a cigarette until he felt a burning sensation, a symptom that Bálint (1909) included
within the component he called Seelenlähmung des Schauens. Bálint (1909) mentioned
that this limited capacity of attentive vision for only one object at a time does not depend
on the size of the object, a feature that distinguishes it from a visual field deficit. Moreover, in patients with simultanagnosia, description or copy of complex figures is laborious and slow,
and patients focus serially on details, apprehending portions of the pictures with a piecemeal approach but failing to switch attention from local details to global structures. Copies of images are accordingly composed of tiny elements of the original figure without perception of the whole scene (Wolpert, 1924). Luria (1959) assessed visual perception in a simultanagnosic patient with tachistoscopic presentation of two overlapping triangles in
the configuration of the Star of David. When the two triangles were drawn in the same
color, the patient reported a star, but when one was drawn in red and the other in blue,
the patient reported seeing only one triangle (never two, and never a star).

The description of isolated cases of simultanagnosia (Luria, 1959; Wolpert, 1924), without
lateralized bias (unilateral visual neglect) and with OA or impaired visual capture considered secondary to a dissociated visual-motor component, has tended to confirm Bálint’s classification. However, as mentioned above, a causal relationship between attentional and visual-motor deficits remains a possibility, even in patients with subclinical attentional disorder. Michel and Hénaff (2004) have provided a comprehensive examination of
a patient with bilateral PPC lesions whose initial Bálint’s syndrome had been reduced 20
years after onset to bilateral OA and a variety of attentional deficits that could all be interpreted as a concentric shrinking of the attentional field.

Spatial disorder of attention (Bálint, 1909), restriction (de Renzi, 1989) or shrinkage of
the attentional field (Michel & Hénaff, 2004), or disorder of simultaneous perception
(Luria, 1959), are equivalent terms to designate this complex symptom, which can be
viewed as a limitation of visual-spatial attentional resources following bilateral lesions of
the parietal-occipital junction. It is tempting to view both the visual grasp reflex and the
wandering of gaze as direct consequences of the shrinking of the attentional field. It is
quite obvious that if one does not perceive any visual target in the periphery, one will not
capture this target with the eyes. Hence the gaze should remain anchored on a target
once acquired (visual grasp reflex). Then, if one is forced to find a specific target in the
visual scene, one would have to make blind exploratory movements, as would be the case
in tunnel vision. The shrinking of the visual (p. 343) field would therefore also account for
the wandering of gaze. However, a competition between visual objects or features for attentional resources must be integrated into the spatial interpretation, in order to account for the perception of only one object among others in these patients, independent of its
size (Bálint, 1909) and even if the two objects are presented at the same location (Luria,
1959). These space-based and object-based aspects of simultanagnosia could be inter­
preted in a more general framework of attention (see a similar view for object-based and
space-based neglect in Driver, 1999) as follows: The feature currently being processed ex­
tinguishes—or impairs processing of—the other feature(s).

Oculomotor Deficits Not Linked to Underlying Attentional Deficit?

Gaze apraxia is characterized by severe abnormalities in the generation of eye movements in response to visual targets in space, in the absence of ocular motor palsy, as ascertained by
full reflexive eye movements. Eye movement recordings usually show several abnormali­
ties, such as prolonged latency, fragmentation and hypometria of saccades, fixation drift,
and absence of smooth pursuit (Girotti et al., 1982; Michel et al., 1963). The pattern of oculomotor scanning is highly abnormal during scene exploration (Zihl, 2000). Both accuracy
of fixation and saccadic localization are impaired, and spatial-temporal organization of
eye displacements does not fit with the spatial configuration of the scene to be analyzed
(Tyler, 1968). Saccadic behavior is abnormal in rich (natural) environments, which require continuous selection between concurrent stimuli, but it may be normal in a simplified context, for example, when the task is to direct the eyes to a peripheral light-emitting diode in the dark (Guard et al., 1984). The issue of whether the parietal lobe is
specifically involved in oculomotor processes per se should first be assessed by testing
such simple saccades to isolated single dots. Such a paradigm has been tested in patients
with a specific lesion of a parietal eye field, unilateral neglect, or OA, as reviewed below.

Pierrot-Deseilligny and Müri (1997) have observed increased reaction times and hypometria for contralesional reflexive saccades after a lesion to the IPL. They have therefore postulated the existence of an oculomotor parietal region (the parietal eye field; see also Müri et al., 1996) whose lesion specifically affects the planning and triggering of contralesional reflexive saccades. These authors mention that the increase in reaction time for contralesional saccades is shorter when the fixation point vanishes about 200
ms before the presentation of the target in the contralesional visual field (“gap para­
digm”) than in a condition of overlap between the two visual locations. This effect suggests that the deficit may be linked to visual extinction or a deficit of attentional disengagement from the initially fixated cross.

Patients with unilateral neglect have also been reported to exhibit late and hypometric
leftward saccades (e.g., Girotti et al., 1983; Walker & Findlay, 1997), although their saccadic accuracy has been reported as normal elsewhere (Behrmann et al., 1997). Finally, Niemeier and
Karnath (2000) have shown that hypometria can be observed only for reflexive saccades triggered in response to left peripheral target presentation, with leftward and rightward saccades being equivalent in amplitude during free ocular exploration of visual scenes. By contrast, the strategy of exploration is clearly impaired in patients with
unilateral neglect (e.g., Ishiai, 2002). No available result seems to rule out that these tem­
poral and strategic deficits in neglect patients may result from a deficient allocation of at­
tention to visual targets in the left periphery.

Finally, patients with pure OA arising from damage to the SPL are, by definition, impaired
for visual-manual reach-and-grasp guidance within their peripheral visual field, without
primary visual, proprioceptive, and motor deficits (Garcin et al., 1967; Jeannerod, 1986);
this definition is supposed to also exclude oculomotor deficits. The slight impairments of
saccadic eye movements detected from clinical tests in some OA patients have not been
considered sufficient to account for their major misreaching deficit (e.g., Rondot et al.,
1977; Vighetto & Perenin, 1981). Indeed, they correspond to only one part of the misreaching deficit: the contribution of the deficit in localizing the target, which commonly affects saccade and reach (field effect), and not the hand effect, which is specific to the reach (Gaveau et al., 2008; see Figure 16.16, stationary targets). In addition, a deficit in monitoring hand location in peripheral vision (at the start of and during reaching execution) has to be added to account for the whole deficit. A simpler explanation could be that the misreaching arises from only two deficits: one in monitoring visual information in eye-centered coordinates (impaired localization of the target and impaired visual guidance of the hand in the contralesional visual field) and the other in monitoring proprioceptive information in eye-centered coordinates (the typical (p. 344) hand effect as impaired proprioceptive guidance of the ataxic hand; Blangero et al., 2007).

Further investigations are necessary to evaluate whether the typical mislocalization of the visual target in eye-centered coordinates (field effect; Blangero et al., 2010a) is functionally linked to a deficit or delay in making the attentional selection of the peripheral
target, as suggested by their systematic co-occurrence when the subclinical attentional
deficit is searched for. Striemer et al. (2009) have suggested that this is not the case, but McIntosh et al. (2011) have revealed a very good correlation between perceptual and visual-motor delays, using a more satisfactory design for comparing perceptual and motor deficits.

To sum up, the most consistent observation following a unilateral lesion of the posterior
parietal cortex is an increase of latency for saccades in the contralesional direction or—in
the case of a bilateral lesion—a poverty of eye movements, which may culminate in a condition often referred to as spasm of fixation, or visual grasp reflex. As mentioned above,
parietal patients may exhibit a defect in shifting spatial attention (Verfaellie et al., 1990)
exogenously or endogenously. This may be central for the deficit of “attentional disen­
gagement from fixated objects” (Rizzo & Vecera, 2002) and thereby for the visual grasp
reflex to occur. Second, patients are usually able to move their eyes spontaneously or on verbal command, but they are impaired at performing visually guided saccades. The more attention and complex visual processing an eye movement requires, the less likely it is to be performed. Along these lines, visual search behavior is particularly vulnerable. The variability and instability of the oculomotor deficit after a parietal lesion can be highlighted, for
single saccades as well as for exploratory scanning, as an argument for an underlying attentional deficit. Accordingly, inactivation of area LIP, considered the parietal saccadic region in monkeys, affects visual search but does not systematically affect saccades to single targets (Li et al., 1999; Wardak et al., 2002). We therefore tend to propose that when oculomotor
deficits are observed following a posterior parietal lesion, they stem from an attentional
disorder. This view stands in contrast to the premotor theory of attention (Rizzolatti et al.,
1987), which postulates that spatial attention emerges from (a more basic process of) mo­
tor preparation of eye movements, functionally and anatomically. In our view, the causal
effect is solely due to a functional coupling between attention and motor preparation. In­
deed, attentional selection and motor selection rely on dissociated neural substrates
(Blangero et al., 2010b; Khan et al., 2009) even if they tightly interact with each other.
Moreover, our view is neither theoretical nor evolutionary. The direction of the causal effect is
not seen as an absolute hierarchy between two systems but is rather proposed only in the
case of a posterior parietal lesion. Other functional links may emerge from different le­
sion locations (e.g., the frontal cortex).

Conclusion and Perspectives


Bálint-Holmes syndrome provides insights into the functional roles assigned to the neu­
ronal populations of the dorsal (occipital-parietal-frontal) stream. These functions include
spatial perception, gating and directing spatial attention, spatial dynamical representa­
tions in multiple reference frames, and spatial coding of eye and hand movements in the
immediate extrapersonal space. We propose to identify two main aspects of the posterior
parietal symptomatology that exclude oculomotor disorders as a specific category of
deficit. The wandering of gaze and the visual grasp reflex appear to depend on the con­
centric shrinking of the attentional field called simultanagnosia. This first component may
appear as unilateral visual extinction in the case of a unilateral lesion. Other symptoms
such as disorganized visual exploration, with typical revisiting behavior, and disappear­
ance of previously viewed components in a visual scene during its exploration or copy
may be attributable to the second aspect, namely a visual synthesis impairment. Even if this
latter component (spatial working memory or visual remapping) has been described in
patients with left neglect (Heide & Kömpf, 1997; Husain et al., 2001) after a parietal le­
sion, this deficit appears to occur in the entire space (Pisella et al., 2004, Figure 16.9) and
to be specific not to the neglect syndrome but rather to an anatomical right IPL localization
(because it is also advocated for constructional apraxia; Russell et al., 2010).

It is tempting to conclude from this chapter that, schematically, all deficits that occur af­
ter lesions to the superior parietal lobule (extinction, simultanagnosia, and OA) can be understood in the framework of visual attention (defined as mechanisms allowing the brain to increase spatial resolution in the visual periphery, and supposed to be expressed as deficits in time or in space), whereas all deficits consecutive to a lesion extending toward the IPL
include deficits of visual synthesis (spatial representations and remapping mechanisms).
However, before reaching this conclusion, further investigations are (p. 345) needed to
confirm two main assumptions that we have been forced to make to establish our model in Figure 16.1.

First, remapping processes for covert shifts of attention should exist within the right IPL.
Prime et al. (2008) have shown that TMS of the right PPC, but not of the left PPC, disrupts spatial working memory not only across saccades but also in static conditions. Visual
remapping across overt eye movements and spatial working memory therefore appear
similar in terms of anatomical network. However, they operate at different time scales
and may constitute related but different processes. Indeed, contrasting with the double-
step saccadic literature (Duhamel et al., 1992; Heide et al., 1995), which has claimed that
patients with neglect are impaired at remapping contralesional (leftward) saccades, Vuilleumier et al. (2007) have shown impairment in a perceptual spatial working memory task
across saccades after a first rightward saccade in neglect patients. Interestingly, a more
complex account of parietal attentional function has been proposed recently by Riddoch
et al. (2010), functionally distinguishing several subregions within the ventral right-hemispheric attentional network of the dorsal stream: the supramarginal and angular gyri of the IPL, but also the TPJ and the superior temporal gyrus. This should provide neural substrates for a possible distinction between visual remapping of a goal stimulus for a goal-directed (saccadic or pointing) response and a perceptual response based on a spatial representation of the configuration of several salient objects of the visual scene. This would also provide a more complex framework to account for the multiple dissociations that have
been reported within the neglect syndrome (e.g., between near and far space, body and
external space, cancellation and bisection tasks).

Finally, the possible causal link between attentional deficit and OA needs to be further in­
vestigated, as should that between attentional deficit and eye movement deficits follow­
ing a lesion to the PPC. The view developed here is that the parietal cortex contains attentionally prioritized representations of all possible targets (salient object locations; Gottlieb et al., 1998) that feed oculomotor maps (e.g., by communicating the next location to
reach) but are not oculomotor maps themselves. More specifically, further studies are
needed to explain how an attentional deficit can cause a visual-motor deficit, expressed
spatially as hypometric errors in eye-centered coordinates, increasing with target eccen­
tricity (Blangero et al., 2010a; Gaveau et al., 2008) or temporally by a delay of visual cap­
ture. Old psychophysical studies have demonstrated that low-energy targets actually elic­
it high-latency and hypometric saccadic eye movements (Pernier et al., 1969; Prablanc &
Jeannerod, 1974). An explanation of this phenomenon in terms of spatial map organiza­
tion at different subcortical and cortical levels is necessary to establish a link between at­
tention and visual-motor metric errors.

References
Andersen, R. A., & Buneo, C. A. (2002). Intentional maps in posterior parietal cortex. An­
nual Review of Neuroscience, 25, 189–220.

Anderson, C., & Van Essen, D. (1987). Shifter circuits: A computational strategy for dy­
namic aspects of visual processing. Proceedings of the National Academy of Sciences U S
A, 84, 6297–6301.

Ashbridge, E., Walsh, V., & Cowey, A. (1997). Temporal aspects of visual search studied by
transcranial magnetic stimulation. Neuropsychologia, 35 (8), 1121–1131.

Bálint, R. (1909). “Seelenlähmung des Schauens, optische Ataxie, räumliche Störung der Aufmerksamkeit.” Monatsschrift für Psychiatrie und Neurologie, 25, 51–81.

Bartolomeo, P. (2000). Inhibitory processes and spatial bias after right hemisphere dam­
age. Neuropsychological Rehabilitation, 10 (5), 511–526.

Bartolomeo, P., & Chokron, S. (1999). Left unilateral neglect or right hyperattention?
Neurology, 53 (9), 2023–2027.

Bartolomeo, P., & Chokron, S. (2002). Orienting of attention in left unilateral neglect.
Neuroscience and Biobehavioral Reviews, 26, 217–234.

Bartolomeo, P., Perri, R., & Gainotti, G. (2004). The influence of limb crossing on left tac­
tile extinction. Journal of Neurology, Neurosurgery, and Psychiatry, 75 (1), 49–55.

Becker, E., & Karnath, H. O. (2007). Incidence of visual extinction after left versus right
hemisphere stroke. Stroke, 38 (12), 3172–3174.

Behrmann, M., Watt, S., Black, S. E., & Barton, J. J. (1997). Impaired visual search in pa­
tients with unilateral neglect: An oculographic analysis. Neuropsychologia, 35 (11), 1445–
1458.

Berman, R. A., & Colby, C. (2009). Attention and active vision. Vision Research, 49 (10), 1233–1248.

Bisiach, E., Ricci, R., Lualdi, M., & Colombo, M. R. (1998). Perceptual and response bias
in unilateral neglect: Two modified versions of the Milner landmark task. Brain and Cog­
nition, 37 (3), 369–386.

Blangero, A., Delporte, L., Vindras, P., Ota, H., Revol, P., Boisson, D., Rode, G., Vighetto,
A., Rossetti, Y. & Pisella, L. (2007). Optic ataxia is not only “optic”: Impaired spatial inte­
gration of proprioceptive information. NeuroImage, 36, 61–68.

Blangero, A., Gaveau, V., Luauté, J., Rode, G., Salemme, R., Boisson, D., Guinard, M.,
Vighetto, A., Rossetti, Y. & Pisella, L. (2008). A hand and a field effect on on-line motor
control in unilateral optic ataxia. Cortex, 44 (5), 560–568.

Blangero, A., Khan, A. Z., Salemme, R., Laverdure, N., Boisson, D., Rode, G., (p. 346) Vighetto, A., Rossetti, Y., & Pisella, L. (2010b). Pre-saccadic perceptual facilitation can occur without covert orienting of attention. Cortex, 46 (9), 1132–1137.

Blangero, A., Ota, H., Rossetti, Y., Fujii, T., Luaute, J., Boisson, D., Ohtake, H., Tabuchi,
M., Vighetto, A., Yamadori, A., Vindras, P., & Pisella, L. (2010a). Systematic retinotopic er­
ror vectors in unilateral optic ataxia. Cortex, 46 (1), 77–93.

Broadbent, D. E., & Broadbent, M. H. P. (1987). From detection to identification: Response to multiple targets in rapid serial visual presentation. Perception and Psychophysics, 42, 105–113.

Brouchon, M., Joanette, Y., & Samson, M. (1986). From movement to gesture: “Here” and
“there” as determinants of visually guided pointing. In J. L. Nespoulos, A. Perron, & R. A.
Lecours (Eds.), Biological foundations of gesture (pp. 95–107). Mahwah, NJ: Erlbaum.

Buxbaum, L. J., & Coslett, H. B. (1998). Spatio-motor representations in reaching: Evidence for subtypes of optic ataxia. Cognitive Neuropsychology, 15 (3), 279–312.

Carey, D. P., Coleman R. J., & Della Sala, S. (1997). Magnetic misreaching. Cortex, 33 (4),
639–652.

Carey, D. P., Harvey, M., & Milner, A. D. (1996). Visuomotor sensitivity for shape and ori­
entation in a patient with visual form agnosia. Neuropsychologia, 34 (5), 329–337.

Carrasco, M., Penpeci-Talgar, C., & Eckstein, M. (2000). Spatial covert attention enhances
contrast sensitivity across the CSF: Support for signal enhancement. Vision Research, 40,
1203–1215.

Colby, C. L., Duhamel, J.-R., & Goldberg, M. E. (1995). Oculocentric spatial representation
in parietal cortex. Cerebral Cortex, 5, 470–481.

Colby, C. L., & Goldberg, M. E. (1999). Space and attention in parietal cortex. Annual Re­
view of Neuroscience, 22, 319–349.

Corbetta, M., Kincade, M. J., Lewis, C., Snyder, A. Z., & Sapir, A. (2005). Neural basis and
recovery of spatial attention deficits in spatial neglect. Nature Neuroscience, 8 (11),
1603–1610.

Corbetta, M., Kincade, J. M., Ollinger, J. M., McAvoy, M. P., & Shulman, G. L. (2000). Vol­
untary orienting is dissociated from target detection in human posterior parietal cortex.
Nature Neuroscience, 3 (3), 292–297.

Corbetta, M., & Shulman, G. L. (2002). Control of goal-directed and stimulus-driven atten­
tion in the brain. Nature Reviews Neuroscience, 3 (3), 201–215.

Danckert, J., & Rossetti, Y. (2005). Blindsight in action: What can the different sub-types
of blindsight tell us about the control of visually guided actions? Neuroscience and Biobe­
havioral Reviews, 29 (7), 1035–1046.

De Renzi, E. (1989). Bálint-Holmes syndrome. In Classic cases in neuropsychology (pp. 123–143). Hove, UK: Psychology Press.

Deubel, H., & Schneider, W. X. (1996). Saccade target selection and object recognition:
Evidence for a common attentional mechanism. Vision Research, 36, 1827–1837.

Dijkerman, H. C., McIntosh, R. D., Anema, H. A., de Haan, E. H., Kappelle, L. J., & Milner,
A. D. (2006). Reaching errors in optic ataxia are linked to eye position rather than head or
body position. Neuropsychologia, 44, 2766–2773.

Di Pellegrino, G., Basso, G., & Frassinetti, F. (1998). Visual extinction as a spatio-temporal
disorder of selective attention. NeuroReport, 9, 835–839.

Driver, J. (1999). Egocentric and object-based visual neglect. In N. Burgess, K. J. Jeffery &
J. O. O’Keefe (Eds.), The hippocampal and parietal foundations of spatial cognition (pp.
66–89). Oxford, UK: Oxford University Press.

Driver, J., & Husain, M. (2002). The role of spatial working memory deficits in pathologi­
cal search by neglect patients. In H. O. Karnath, A. D. Milner, & G. Vallar (Eds.), The cog­
nitive and neural bases of spatial neglect (pp. 351–364). Oxford, UK: Oxford University
Press.

Driver, J., & Mattingley J. B. (1998). Parietal neglect and visual awareness. Nature Neuro­
science, 1, 17–22.

Duhamel, J.-R., Goldberg, M. E., Fitzgibbon, E. J., Sirigu, A., & Grafman, J. (1992). Sac­
cadic dysmetria in a patient with a right frontoparietal lesion. Brain, 115, 1387–1402.

Egly, R., Driver, J., & Rafal, R. D. (1994). Shifting visual attention between objects and lo­
cations: evidence from normal and parietal lesion subjects. Journal of Experimental Psy­
chology: General, 123 (2), 161–177.

Ellison, A., Schindler, I., Pattison, L. L., & Milner, A. D. (2004). An exploration of the role
of the superior temporal gyrus in visual search and spatial perception using TMS. Brain,
127 (Pt 10), 2307–2315.

Ferber, S., & Danckert, J. (2006). Lost in space—the fate of memory representations for
non-neglected stimuli. Neuropsychologia, 44 (2), 320–325.

Fierro, B., Brighina, F., Oliveri, M., Piazza, A., La Bua, V., Buffa, D., & Bisiach, E. (2000).
Contralateral neglect induced by right posterior parietal rTMS in healthy subjects. Neu­
roReport, 11 (7), 1519–1521.

Fink, G. R., Marshall, J. C., Shah, N. J., Weiss, P. H., Halligan, P. W., Grosse-Ruyken, M.,
Ziemons, K., Zilles, K., & Freund, H. J. (2000). Line bisection judgments implicate right
parietal cortex and cerebellum as assessed by fMRI. Neurology, 54 (6), 1324–1331.

Friedrich, F. J., Egly, R., Rafal, R. D., & Beck, D. (1998). Spatial attention deficits in hu­
mans: A comparison of superior parietal and temporo-parietal junction lesions. Neuropsy­
chology, 12, 193–207.

Garcin, R., Rondot, P., & de Recondo J. (1967). Ataxie optique localisée aux deux
hémichamps visuels homonymes gauches. Revue Neurologique, 116 (6), 707–714.

Gaveau, V., Pélisson, D., Blangero, A., Urquizar, C., Prablanc, C., Vighetto, A. & Pisella, L.
(2008). Saccadic control and eye-hand coordination in optic ataxia. Neuropsychologia, 46,
475–486.

Girard, P., Salin, P. A., & Bullier, J. (1991). Visual activity in macaque area V4 depends on
area 17 input. NeuroReport, 2 (2), 81–84.

Girard, P., Salin, P. A., & Bullier, J. (1992). Response selectivity of neurons in area MT of
the macaque monkey during reversible inactivation of area V1. Journal of Neurophysiolo­
gy, 67 (6), 1437–1446.

Girotti, F., Casazza, M., Musicco, M., & Avanzini, G. (1983). Oculomotor disorders in corti­
cal lesions in man: The role of unilateral neglect. Neuropsychologia, 21, 543–553.

Girotti, F., Milanese, C., et al. (1982). Oculomotor disturbances in Bálint’s syndrome:
Anatomoclinical findings and electrooculographic analysis in a case. Cortex, 18 (4), 603–
614.

Goldberg, M. E., & Bruce, C. J. (1990). Primate frontal eye fields. III. Maintenance of a
spatially accurate saccade signal. Journal of Neurophysiology, 64, 489–508.

Goodale, M. A., Milner, A. D., Jacobson, L. S., & Carey, D. P. (1991). A neurological dissoci­
ation between perceiving objects and grasping them. Nature, 349, 154–156.

Gottlieb, J. P., Kusunoki, M., & Goldberg, M. E. (1998). (p. 347) The representation of visual salience in monkey parietal cortex. Nature, 391, 481–484.

Gréa, H., Pisella, L., Rossetti, Y., Desmurget, M., Tilikete, C., Prablanc, C., & Vighetto, A.
(2002). A lesion of the posterior parietal cortex disrupts on-line adjustments during aim­
ing movements. Neuropsychologia, 40, 2471–2480.

Guard, O., Perenin, M. T., Vighetto, A., Giroud, M., Tommasi, M., & Dumas, R. (1984).
Syndrome pariétal bilatéral ressemblant au syndrome de Bálint. Revue Neurologique, 140
(5), 358–367.

Hecaen, H., & De Ajuriaguerra, J. (1954). Bálint’s syndrome (psychic paralysis of visual
fixation) and its minor forms. Brain, 77 (3), 373–400.

Heide, W., Binkofski, F., Seitz, R. J., Posse, S., Nitschke, M. F., Freund, H. J., & Kömpf, D.
(2001). Activation of frontoparietal cortices during memorized triple-step sequences of
saccadic eye movements: an fMRI study. European Journal of Neuroscience, 13 (6), 1177–
1189.

Heide, W., Blankenburg, M., Zimmermann, E., & Kömpf, D. (1995). Cortical control of
double-step saccades: implications for spatial orientation. Annals of Neurology, 38, 739–
748.

Heide, W., & Kömpf, D. (1997). Specific parietal lobe contribution to spatial constancy
across saccades. In: P. Thier & H.-O. Karnath (Eds.), Parietal lobe contributions to orienta­
tion in 3D space (pp. 149–172). Heidelberg: Springer-Verlag.

Heilman, K. M. (2004). Intentional neglect. Frontiers in Bioscience, 9, 694–705.

Hilgetag, C. C., Théoret, H., & Pascual-Leone, A. (2001). Enhanced visual spatial atten­
tion ipsilateral to rTMS-induced “virtual lesions” of human parietal cortex. Nature Neuro­
science, 4 (9), 953–957.

Hillis, A. E., & Caramazza, A. (1991). Deficit to stimulus-centered, letter shape represen­
tations in a case of “unilateral neglect.” Neuropsychologia, 29 (12), 1223–1240.

Hillis, A. E., Chang, S., Heidler-Gary, J., Newhart, M., Kleinman, J. T., Davis, C., Barker, P.
B., Aldrich, E., & Ken, L. (2006). Neural correlates of modality-specific spatial extinction.
Journal of Cognitive Neuroscience, 18 (11), 1889–1898.

Holmes, G. (1918). Disturbances of visual orientation. British Journal of Ophthalmology, 2, 449–468, 506–518.

Humphreys, G. W., Romani, C., Olson, A., Riddoch, M. J., & Duncan, J. (1994). Non-spatial
extinction following lesions of the parietal lobe in humans. Nature, 372, 357–359.

Husain, M. (2001). A spatio-temporal framework for disorders of visual attention. In K. Shapiro (Ed.), The limits of attention: Temporal constraints in human information processing (pp. 229–246). Oxford, UK: Oxford University Press.

Husain, M. (2008). Hemispatial neglect. In G. Goldenberg & B. Miller (Eds.), Handbook of clinical neurology, 3rd series, Volume 88: Neuropsychology and behavioral neurology (pp. 359–372). Amsterdam: Elsevier.

Husain, M., Mannan, S., Hodgson, T., Wojciulik, E., Driver, J., & Kennard, C. (2001). Im­
paired spatial working memory across saccades contributes to abnormal search in pari­
etal neglect. Brain, 124 (Pt 5), 941–952.

Husain, M., Mattingley, J. B., Rorden, C., Kennard, C., & Driver, J. (2000). Distinguishing
sensory and motor biases in parietal and frontal neglect. Brain, 123 (Pt 8), 1643–1659.

Husain, M., & Rorden, C. (2003). Non-spatially lateralized mechanisms in hemispatial neglect. Nature Reviews Neuroscience, 4 (1), 26–36.

Husain, M., Shapiro, K., Martin, J., & Kennard, C. (1997). Abnormal temporal dynamics of
visual attention in spatial neglect patients. Nature, 385, 154–156.

Husain, M., & Stein, J. (1988). Reszö Bálint and his most celebrated case. Archives of
Neurology, 45, 89–93.

Ishiai, S. (2002). Perceptual and motor interaction in unilateral spatial neglect. In H. O. Karnath, A. D. Milner, & G. Vallar (Eds.), The cognitive and neural bases of spatial neglect (pp. 181–195). Oxford, UK: Oxford University Press.

Jackson, S. R., Newport, R., Mort, D., & Husain, M. (2005). Where the eye looks, the hand
follows: Limb-dependent magnetic misreaching in optic ataxia. Current Biology, 15, 42–
46.

Jakobson, L. S., Archibald, Y. M., Carey, D. P., & Goodale, M. A. (1991). A kinematic analy­
sis of reaching and grasping movements in a patient recovering from optic ataxia. Neu­
ropsychologia, 29, 803–809.

Jeannerod, M. (1986). Mechanisms of visuo-motor coordination: A study in normals and brain-damaged subjects. Neuropsychologia, 24, 41–78.


Jeannerod, M., Decety, J., et al. (1994). Impairment of grasping movements following bi­
lateral posterior parietal lesion. Neuropsychologia, 32, 369–380.

Jeannerod, M., & Rossetti, Y. (1993). Visuomotor coordination as a dissociable function: Experimental and clinical evidence. In C. Kennard (Ed.), Visual perceptual defects. Baillière’s clinical neurology, international practice and research (pp. 439–460). London: Baillière Tindall.

Karnath, H. O., Himmelbach, M., & Küker, W. (2003). The cortical substrate of visual ex­
tinction. NeuroReport, 14 (3), 437–442.

Kennard, C., Mannan, S. K., Nachev, P., Parton, A., Mort, D. J., Rees, G., Hodgson, T. L., & Husain, M. (2005). Cognitive processes in saccade generation. Annals of the New York Academy of Sciences, 1039, 176–183.

Khan, A. Z., Blangero, A., Rossetti, Y., Salemme, R., Luauté, J., Laverdure, N., Rode, G.,
Boisson, D., & Pisella, L. (2009). Parietal damage dissociates saccade planning from pre-
saccadic perceptual facilitation. Cerebral Cortex, 19 (2), 383–387.

Khan, A. Z., Pisella, L., Rossetti, Y., Vighetto, A., & Crawford, J. D. (2005b). Impairment of
gaze-centered updating of reach targets in bilateral parietal-occipital damaged patients.
Cerebral Cortex, 15 (10), 1547–1560.

Khan, A. Z., Pisella, L., Vighetto, A., Cotton, F., Luauté, J., Boisson, D., Salemme, R., Craw­
ford, J. D., & Rossetti, Y. (2005a). Optic ataxia errors depend on remapped, not viewed,
target location. Nature Neuroscience, 8 (4), 418–420.

Kinsbourne, M. (1993). Orientational bias model of unilateral neglect: Evidence from at­
tentional gradients within hemispace. In I. H. Robertson & J. C. Marshall (Eds.), Unilater­
al neglect: Clinical and experimental studies (pp. 63–86). Hove, UK: Erlbaum.

Koch, C., & Ullman, S. (1985). Shifts in selective visual attention: Towards the underlying
neural circuitry. Human Neurobiology, 4, 219–227.

Konen, C. S., & Kastner, S. (2008). Two hierarchically organized neural systems for object
information in human visual cortex. Nature Neuroscience, 11 (2), 224–231.

Làdavas, E. (1990). Selective spatial attention in patients with visual extinction. Brain,
113 (Pt 5), 1527–1538.

(p. 348) Li, C. S., Mazzoni, P., & Andersen, R. A. (1999). Effect of reversible inactivation of
macaque lateral intraparietal area on visual and memory saccades. Journal of Neurophys­
iology, 81 (4), 1827–1838.

Luauté, J., Halligan, P., Rode, G., Jacquin-Courtois, S., & Boisson, D. (2006). Prism adapta­
tion first among equals in alleviating left neglect: A review. Restorative Neurology and
Neuroscience, 24 (4–6), 409–418.


Luo, C. R., Anderson, J. M., & Caramazza, A. (1998). Impaired stimulus-driven orienting of
attention and preserved goal-directed orienting of attention in unilateral visual neglect.
American Journal of Psychology, 111 (4), 487–507.

Luria, A. R. (1959). Disorders of “simultaneous” perception in a case of occipito-parietal brain injury. Brain, 82, 437–449.

Malhotra, P., Coulthard, E. J., & Husain, M. (2009). Role of right posterior parietal cortex
in maintaining attention to spatial locations over time. Brain, 132 (Pt 3), 645–660.

Malhotra, P., Jäger, H. R., Parton, A., Greenwood, R., Playford, E. D., Brown, M. M., Dri­
ver, J., & Husain, M. (2005). Spatial working memory capacity in unilateral neglect. Brain,
128 (Pt 2), 424–435.

Mannan, S. K., Mort, D. J., Hodgson, T. L., Driver, J., Kennard, C., & Husain, M. (2005). Re­
visiting previously searched locations in visual neglect: Role of right parietal and frontal
lesions in misjudging old locations as new. Journal of Cognitive Neuroscience, 17 (2), 340–
354.

Mattingley, J. B., Davis, G., & Driver, J. (1997). Preattentive filling-in of visual surfaces in
parietal extinction. Science, 275 (5300), 671–674.

Mattingley, J. B., Husain, M., Rorden, C., Kennard, C., & Driver, J. (1998). Motor role of
human inferior parietal lobe revealed in unilateral neglect patients. Nature, 392 (6672),
179–182.

Mattingley, J. B., Pisella, L., Rossetti, Y., Rode, G., Tilikete, C., Boisson, D., & Vighetto, A. (2000). Visual extinction in retinotopic coordinates: A selective bias in dividing attention between hemifields. Neurocase, 6, 465–475.

Mays, L. E., & Sparks, D. L. (1980). Dissociation of visual and saccade-related responses
in superior colliculus neurons. Journal of Neurophysiology, 43, 207–232.

McIntosh, R.D., Mulroue, A., Blangero, A., Pisella, L., & Rossetti, Y. (2011). Correlated
deficits of perception and action in optic ataxia. Neuropsychologia, 49, 131–137.

Medendorp, W. P., Goltz, H. C., Vilis, T., & Crawford, J. D. (2003). Gaze-centered updating
of visual space in human parietal cortex. Journal of Neuroscience, 23 (15), 6209–6214.

Merriam, E. P., Genovese, C. R., & Colby, C. L. (2003). Spatial updating in human parietal
cortex. Neuron, 39, 361–373.

Merriam, E. P., Genovese, C. R., & Colby, C. L. (2007). Remapping in human visual cortex.
Journal of Neurophysiology, 97 (2), 1738–1755.

Michel, F., & Hénaff, M. A. (2004). Seeing without the occipito-parietal cortex: Simul­
tanagnosia as a shrinkage of the attentional visual field. Behavioural Neurology, 15, 3–13.


Michel, F., Jeannerod, M., & Devic, M. (1963). Un cas de désorientation visuelle dans les trois dimensions de l’espace (A propos du syndrome de Bálint et du syndrome décrit par G. Holmes). Revue Neurologique, 108, 983–984.

Milner, A. D., Dijkerman, H. C., McIntosh, R. D., Rossetti, Y., & Pisella, L. (2003). Delayed reaching and grasping in patients with optic ataxia. In D. Pelisson, C. Prablanc, & Y. Rossetti (Eds.), Progress in Brain Research series: Neural control of space coding and action production (Vol. 142, pp. 225–242). Amsterdam: Elsevier.

Milner, A. D., Dijkerman, H. C., Pisella, L., McIntosh, R., Tilikete, C., Vighetto, A., & Rossetti, Y. (2001). Grasping the past: Delaying the action improves visuo-motor performance. Current Biology, 11 (23), 1896–1901.

Milner, A. D., & Goodale, M. A. (1995). The visual brain in action. Oxford, UK: Oxford Uni­
versity Press.

Milner, A. D., & Goodale, M. A. (2008). Two visual systems re-viewed. Neuropsychologia,
46 (3), 774–785.

Milner, A. D., Paulignan, Y., Dijkerman, H. C., Michel, F., & Jeannerod, M. (1999). A paradoxical improvement of misreaching in optic ataxia: New evidence for two separate neural systems for visual localization. Proceedings of the Royal Society of London B, 266, 2225–2229.

Morris, A. P., Chambers, C. D., & Mattingley, J. B. (2007). Parietal stimulation destabilizes spatial updating across saccadic eye movements. Proceedings of the National Academy of Sciences USA, 104 (21), 9069–9074.

Mort, D. J., Malhotra, P., Mannan, S. K., Rorden, C., Pambakian, A., Kennard, C., & Hu­
sain, M. (2003). The anatomy of visual neglect. Brain, 126 (Pt 9), 1986–1997.

Muggleton, N. G., Cowey, A., & Walsh, V. (2008). The role of the angular gyrus in visual
conjunction search investigated using signal detection analysis and transcranial magnetic
stimulation. Neuropsychologia, 46 (8), 2198–2202.

Müri, R. M., Iba-Zizen, M. T., et al. (1996). Location of the human posterior eye field with
functional magnetic resonance imaging. Journal of Neurology, Neurosurgery, and Psychia­
try, 60, 445–448.

Niebur, E., & Koch, C. (1997). Computational architectures for attention. In R. Parasura­
man (Ed.), The attentive brain (pp. 163–186). Cambridge, MA: MIT Press.

Niemeier, M., & Karnath, H. O. (2000). Exploratory saccades show no direction-specific


deficit in neglect. Neurology, 54 (2), 515–518.

Nowak, L., & Bullier, J. (1997). The timing of information transfer in the visual system. In J. Kaas, K. Rockland, & A. Peters (Eds.), Extrastriate cortex in primates. New York: Plenum Press.


Ota, H., Fujii, T., Suzuki, K., Fukatsu, R., & Yamadori, A. (2001). Dissociation of body-cen­
tered and stimulus-centered representations in unilateral neglect. Neurology, 57 (11),
2064–2069.

Pascual-Leone, A., Gomez-Tortosa, E., Grafman, J., Alway, D., Nichelli, P., & Hallett, M. (1994). Induction of visual extinction by rapid-rate transcranial magnetic stimulation of parietal lobe. Neurology, 44 (3 Pt 1), 494–498.

Paterson, A., & Zangwill, O. L. (1944). Disorders of visual space perception associated with lesions of the right cerebral hemisphere. Brain, 67, 331–358.

Perenin, M.-T., & Vighetto, A. (1988). Optic ataxia: A specific disruption in visuomotor mechanisms. I. Different aspects of the deficit in reaching for objects. Brain, 111 (Pt 3), 643–674.

Pernier, J., Jeannerod, M., & Gerin, P. (1969). Preparation and decision in saccades: Adaptation to the trace of the stimulus. Vision Research, 9 (9), 1149–1165.

Pierrot-Deseilligny, C., & Müri, R. (1997). Posterior parietal cortex control of saccades in
humans. In P. Thier & H.-O. Karnath (Eds.), Parietal lobe contributions to orientation in
3D space (pp. 135–148). Heidelberg: Springer-Verlag.

Pisella, L., Alahyane, N., Blangero, A., Thery, F., Blanc, S., Rode, G., & Pelisson, D. (2011).
Right-hemispheric dominance for visual remapping in humans. Philosophical Transactions
of the Royal Society of London, Series B, Biological Sciences, 366 (1564), 572–585.

(p. 349) Pisella, L., Berberovic, N., & Mattingley, J. B. (2004). Impaired working memory for location but not for colour or shape in visual neglect: A comparison of parietal and non-parietal lesions. Cortex, 40 (2), 379–390.

Pisella, L., Binkofski, F., Lasek, K., Toni, I., & Rossetti, Y. (2006a). No double-dissociation
between optic ataxia and visual agnosia: Multiple sub-streams for multiple visuo-manual
integrations. Neuropsychologia, 44 (13), 2734–2748.

Pisella, L., Gréa, H., Tilikete, C., Vighetto, A., Desmurget, M., Rode, G., Boisson, D., & Rossetti, Y. (2000). An automatic pilot for the hand in the human posterior parietal cortex: Toward a reinterpretation of optic ataxia. Nature Neuroscience, 3, 729–736.

Pisella, L., & Mattingley, J. B. (2004). The contribution of spatial remapping impairments
to unilateral visual neglect. Neuroscience and Biobehavioral Reviews, 28 (2), 181–200.

Pisella, L., Ota, H., Vighetto, A., & Rossetti, Y. (2007). Optic ataxia and Bálint syndrome:
Neurological and neurophysiological prospects. In G. Goldenberg & B. Miller (Eds.),
Handbook of clinical neurology, 3rd Series, Volume 88: Neuropsychology and behavioral
neurology (pp. 393–416). Amsterdam: Elsevier.


Pisella, L., Rode, G., Farne, A., Tilikete, C., & Rossetti, Y. (2006b). Prism adaptation in the
rehabilitation of patients with visuo-spatial cognitive disorders. Current Opinion in Neu­
rology, 19 (6), 534–542.

Pisella, L., Sergio, L., Blangero, A., Torchin, H., Vighetto, A., & Rossetti, Y. (2009). Optic
ataxia and the function of the dorsal stream: Contribution to perception and action. Neu­
ropsychologia, 47, 3033–3044.

Posner, M. I. (1980). Orienting of attention. Quarterly Journal of Experimental Psychology, 32, 3–25.

Posner, M. I., Walker, J. A., Friedrich, F. J., & Rafal, R. D. (1984). Effects of parietal injury
on covert orienting of attention. Journal of Neuroscience, 4, 1863–1874.

Prablanc, C., & Jeannerod, M. (1974). Latence et precision des saccades en fonction de
l’intensité, de la durée et de la position rétinienne d’un stimulus. Revue E.E.G., 4 (3), 484–
488.

Prime, S. L., Vesia, M., & Crawford, J. D. (2008). Transcranial magnetic stimulation over
posterior parietal cortex disrupts transsaccadic memory of multiple objects. Journal of
Neuroscience, 28 (27), 6938–6949.

Raymond, J. E., Shapiro, K. L., & Arnell, K. M. (1992). Temporary suppression of visual processing in an RSVP task: An attentional blink? Journal of Experimental Psychology: Human Perception and Performance, 18, 849–860.

Rees, G. (2001). Neuroimaging of visual awareness in patients and normal subjects. Current Opinion in Neurobiology, 11 (2), 150–156.

Riddoch, M. J., Chechlacz, M., Mevorach, C., Mavritsaki, E., Allen, H., & Humphreys, G. W. (2010). The neural mechanisms of visual selection: The view from neuropsychology. Annals of the New York Academy of Sciences, 1191 (1), 156–181.

Rizzo, M. (1993). “Bálint syndrome” and associated visuo-spatial disorders. Baillière’s Clinical Neurology, 2, 415–437.

Rizzo, M., & Vecera, S. P. (2002). Psychoanatomical substrates of Bálint’s syndrome. Journal of Neurology, Neurosurgery, and Psychiatry, 72 (2), 162–178.

Rizzolatti, G., Riggio, L., Dascola, I., & Umiltá, C. (1987). Reorienting attention across the
horizontal and vertical meridians: Evidence in favor of a premotor theory of attention.
Neuropsychologia, 25, 31–40.

Robertson, I. (1989). Anomalies in the laterality of omissions in unilateral left visual ne­
glect: Implications for an attentional theory of neglect. Neuropsychologia, 27 (2), 157–
165.


Rode, G., Luauté, J., Klos, T., Courtois-Jacquin, S., Revol, P., Pisella, L., Holmes, N. P., Boisson, D., & Rossetti, Y. (2007a). Bottom-up visuo-manual adaptation: Consequences for spatial cognition. In P. Haggard, Y. Rossetti, & M. Kawato (Eds.), Attention and performance XXI: Sensorimotor foundations of higher cognition (pp. 207–229). Oxford, UK: Oxford University Press.

Rode, G., Pisella, L., Marsal, L., Mercier, S., Rossetti, Y., & Boisson, D. (2006). Prism adap­
tation improves spatial dysgraphia following right brain damage. Neuropsychologia, 44
(12), 2487–2493.

Rode, G., Perenin, M. T., & Boisson, D. (1995). [Neglect of the representational space:
Demonstration by mental evocation of the map of France]. Revue Neurologique, 151 (3),
161–164.

Rode, G., Revol, P., Rossetti, Y., Boisson, D., & Bartolomeo, P. (2007b). Looking while
imagining: The influence of visual input on representational neglect. Neurology, 68 (6),
432–437.

Rondot, P., de Recondo J., & Ribadeau-Dumas, J. L. (1977). Visuomotor ataxia. Brain, 100
(2), 355–376.

Rorden, C., Mattingley, J. B., Karnath, H.-O., & Driver, J. (1997). Visual extinction as prior
entry: Impaired perception of temporal order with intact motion perception after parietal
injury. Neuropsychologia, 35, 421–433.

Rossetti, Y. (1998). Implicit short-lived motor representations of space in brain damaged and healthy subjects. Consciousness and Cognition, 7, 520–558.

Rossetti, Y., McIntosh, R. D., Revol, P., Pisella, L., Rode, G., Danckert, J., Tilikete, C., Dijkerman, H. C. M., Boisson, D., Michel, F., Vighetto, A., & Milner, A. D. (2005). Visually guided reaching: Posterior parietal lesions cause a switch from visuomotor to cognitive control. Neuropsychologia, 43 (2), 162–177.

Rossetti, Y., & Pisella L. (2002). Tutorial. Several “vision for action” systems: A guide to
dissociating and integrating dorsal and ventral functions. In W. Prinz & B. Hommel (Eds.),
Attention and performance XIX: Common mechanisms in perception and action (pp. 62–
119). Oxford, UK: Oxford University Press.

Rossetti, Y., Pisella, L., & Pélisson, D. (2000). New insights on eye blindness and hand
sight: Temporal constraints of visuomotor networks. Visual Cognition, 7, 785–808.

Rossetti, Y., Pisella, L. & Vighetto, A. (2003). Optic ataxia revisited: Visually guided action
versus immediate visuo-motor control. Experimental Brain Research, 153 (2), 171–179.

Rossetti, Y., & Revonsuo, A. (2000) Beyond dissociations: Recomposing the mind-brain af­
ter all? In Y. Rossetti & A. Revonsuo (Eds.), Beyond dissociation: Interaction between dis­
sociated implicit and explicit processing (pp. 1–16). Amsterdam: Benjamins.


Rossetti, Y., Rode, G., Pisella, L., Farne, A., Li, L., Boisson, D., & Perenin, M. T. (1998). Prism adaptation to a rightward optical deviation rehabilitates left hemispatial neglect. Nature, 395, 166–169.

Russell, C., Deidda, C., Malhotra, P., Crinion, J. T., Merola, S., & Husain, M. (2010). A
deficit of spatial remapping in constructional apraxia after right-hemisphere stroke.
Brain, 133, 1239–1251.

(p. 350) Schindler, I., McIntosh, R. D., Cassidy, T. P., Birchall, D., Benson, V., Ietswaart, M., & Milner, A. D. (2009). The disengage deficit in hemispatial neglect is restricted to between-object shifts and is abolished by prism adaptation. Experimental Brain Research, 192 (3), 499–510.

Shulman, G. L., Astafiev, S. V., McAvoy, M. P., d’Avossa, G., & Corbetta, M. (2007). Right TPJ deactivation during visual search: Functional significance and support for a filter hypothesis. Cerebral Cortex, 17 (11), 2625–2633.

Smith, S., & Holmes, G. (1916). A case of bilateral motor apraxia with disturbance of visu­
al orientation. British Medical Journal, 1, 437–441.

Snyder, L. H., Batista, A. P., & Andersen, R. A. (1997). Coding of intention in the posterior parietal cortex. Nature, 386 (6621), 167–170.

Striemer, C., Blangero, A., Rossetti, Y., Boisson, D., Rode, G., Vighetto, A., Pisella, L., &
Danckert, J. (2007). Deficits in peripheral visual attention in patients with optic ataxia.
NeuroReport, 18 (11), 1171–1175.

Striemer, C., Blangero, A., Rossetti, Y., Boisson, D., Rode, G., Salemme, R., Vighetto, A.,
Pisella, L., & Danckert, J. (2008). Bilateral posterior parietal lesions disrupt the beneficial
effects of prism adaptation on visual attention: Evidence from a patient with optic ataxia.
Experimental Brain Research, 187 (2), 295–302.

Striemer, C., Locklin, J., Blangero, A., Rossetti, Y., Pisella, L., & Danckert, J. (2009). Atten­
tion for action? Examining the link between attention and visuomotor control deficits in a
patient with optic ataxia. Neuropsychologia, 47 (6), 1491–1499.

Tian, J., Schlag J., & Schlag-Rey, M. (2000). Testing quasi-visual neurons in the monkey’s
frontal eye field with the triple-step paradigm. Experimental Brain Research, 130, 433–
440.

Tyler, H. R. (1968). Abnormalities of perception with defective eye movements (Bálint’s syndrome). Cortex, 4, 154–171.

Ungerleider, L. G., & Mishkin, M. (1982). Two cortical visual systems. In D. J. Ingle, M. A.
Goodale, & R. J. W. Mansfield (Eds.), Analysis of visual behavior (pp. 549–586). Cam­
bridge, MA: MIT Press.


Vallar, G. (2007). Spatial neglect, Bálint-Holmes’ and Gerstmann’s syndrome, and other spatial disorders. CNS Spectrums, 12 (7), 527–536.

Vallar, G., & Perani, D. (1986). The anatomy of unilateral neglect after right-hemisphere
stroke lesions: A clinical/CT-scan correlation study in man. Neuropsychologia, 24, 609–
622.

van Koningsbruggen, M. G., Gabay, S., Sapir, A., Henik, A., & Rafal, R. D. (2010). Hemispheric asymmetry in the remapping and maintenance of visual saliency maps: A TMS study. Journal of Cognitive Neuroscience, 22 (8), 1730–1738.

Vecera, S. P., & Rizzo, M. (2006). Eye gaze does not produce reflexive shifts of attention:
Evidence from frontal-lobe damage. Neuropsychologia, 44, 150–159.

Verfaellie, M., Rapcsak, S. Z., et al. (1990). Impaired shifting of attention in Bálint’s syndrome. Brain and Cognition, 12 (2), 195–204.

Vighetto, A. (1980). Etude neuropsychologique et psychophysique de l’ataxie optique. Thèse, Université Claude Bernard Lyon I.

Vighetto, A., & Perenin, M. T. (1981). Optic ataxia: Analysis of eye and hand responses in
pointing at visual targets. Revue Neurologique, 137 (5), 357–372.

Vuilleumier, P., & Rafal, R. D. (2000). A systematic study of visual extinction: Between-
and within-field deficits of attention in hemispatial neglect. Brain, 123 (Pt 6), 1263–1279.

Vuilleumier, P., Sergent, C., Schwartz, S., Valenza, N., Girardi, M., Husain, M., & Driver, J.
(2007). Impaired perceptual memory of locations across gaze-shifts in patients with uni­
lateral spatial neglect. Journal of Cognitive Neuroscience, 19 (8), 1388–1406.

Wade, A. R., Brewer, A. A., Rieger, J. W., & Wandell, B. A. (2002). Functional measurements of human ventral occipital cortex: Retinotopy and colour. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 357 (1424), 963–973.

Walker, R., & Findlay, J. M. (1997). Eye movement control in spatial- and object-based ne­
glect. In P. Thier & H.-O. Karnath (Eds.), Parietal lobe contributions to orientation in 3D
space (pp. 201–218). Heidelberg: Springer-Verlag.

Wardak, C., Olivier, E., & Duhamel, J. R. (2002). Saccadic target selection deficits after
lateral intraparietal area inactivation in monkeys. Journal of Neuroscience, 22 (22), 9877–
9884.

Weddell, R. A. (2004). Subcortical modulation of spatial attention including evidence that the Sprague effect extends to man. Brain and Cognition, 55 (3), 497–506.

Wojciulik, E., Husain, M., Clarke, K., & Driver, J. (2001). Spatial working memory deficit in unilateral neglect. Neuropsychologia, 39 (4), 390–396.


Wolpert, T. (1924). Die Simultanagnosie. Zeitschrift für die gesamte Neurologie und Psychiatrie, 93, 397–415.

Yarbus, A. L. (1967). Eye movements and vision. New York: Plenum Press.

Yeshurun, Y., & Carrasco, M. (1998). Attention improves or impairs visual performance by
enhancing spatial resolution. Nature, 396, 72–75.

Zihl, J. (2000). Rehabilitation of visual disorders after brain injury. In Neuropsychological rehabilitation: A modular handbook. Hove, UK: Psychology Press.

Laure Pisella

Laure Pisella, Lyon Neuroscience Research Center, Bron, France

A. Blangero

A. Blangero, Lyon Neuroscience Research Center, Bron, France

Caroline Tilikete

Caroline Tilikete, Lyon Neuroscience Research Center, University Lyon, Hospices Civils de Lyon, Hôpital Neurologique.

Damien Biotti

Damien Biotti, Lyon Neuroscience Research Center.

Gilles Rode

Gilles Rode, Lyon Neuroscience Research Center, Hospices Civils de Lyon, and Hôpi­
tal Henry Gabrielle.

Alain Vighetto

Alain Vighetto, Lyon Neuroscience Research Center, University Lyon, Hospices Civils
de Lyon, Hôpital Neurologique, Lyon, France.

Jason B. Mattingley

Jason B. Mattingley is Professor of Cognitive Neuroscience, The University of Queensland.

Yves Rossetti

Yves Rossetti, Lyon Neuroscience Research Center, University Lyon, Mouvement et Handicap, Plateforme IFNL-HCL, Hospices Civils de Lyon.


Semantic Memory  
Eiling Yee, Evangelia G. Chrysikou, and Sharon L. Thompson-Schill
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn

Print Publication Date: Dec 2013 Subject: Psychology, Cognitive Neuroscience


Online Publication Date: Dec 2013 DOI: 10.1093/oxfordhb/9780199988693.013.0017

Abstract and Keywords

Semantic memory refers to general knowledge about the world, including concepts, facts,
and beliefs (e.g., that a lemon is normally yellow and sour or that Paris is in France). How
is this kind of knowledge acquired or lost? How is it stored and retrieved? This chapter
reviews evidence that conceptual knowledge about concrete objects is acquired through
experience with them, thereby grounding knowledge in distributed representations
across brain regions that are involved in perceiving or acting on them, and impaired by
damage to these brain regions. The authors suggest that these distributed representa­
tions result in flexible concepts that can vary depending on the task and context, as well
as on individual experience. Further, they discuss the role of brain regions implicated in
selective attention in supporting such conceptual flexibility. Finally, the authors consider
the neural bases of other aspects of conceptual knowledge, such as the ability to general­
ize (e.g., to map lemons and grapes onto the category of fruit), and the ability to repre­
sent knowledge that does not have a direct sensorimotor correlate (e.g., abstract con­
cepts, such as peace).

Keywords: semantic memory, concepts, categories, representation, knowledge, sensorimotor, grounding, embodi­
ment

Introduction
What Is Semantic Memory?

How do we know what we know about the world? For instance, how do we know that a
cup must be concave, or that a lemon is normally yellow and sour? Psychologists and cog­
nitive neuroscientists use the term semantic memory to refer to this kind of world knowl­
edge. In his seminal article, “Episodic and Semantic Memory,” Endel Tulving borrowed
the term semantic from linguists to refer to a memory system for “words and other verbal
symbols, their meaning and referents, about relations among them, and about rules, for­
mulas, and algorithms for manipulating them”1 (Tulving, 1972, p. 386).

Page 1 of 40
Semantic Memory

Today, most psychologists use the term semantic memory more broadly—to refer to all
kinds of general world knowledge, whether it is about words or concepts, facts or beliefs.
What these types of world knowledge have in common is that they are made up of knowl­
edge that is independent of specific experiences; instead, it is general information or
knowledge that can be retrieved without reference to the circumstances in which it was
originally acquired. For example, the knowledge that lemons are shaped like mini-foot­
balls would be considered part of semantic memory, whereas knowledge about where you
were the last time you tasted a lemon would be considered part of episodic memory. This
division is reflected in a prominent taxonomy of long-term memory (Squire, 1987), in
which semantic and episodic memory are characterized as distinct components of the ex­
plicit (or declarative) memory system for facts (semantic knowledge) and events (episodic
knowledge).

(p. 354) What Is the Relationship Between Semantic Memory and Episodic Memory?

Although semantic memory and episodic memory are typically considered distinct, the de­
gree to which semantic memory is dependent on episodic memory is a matter of ongoing
debate. This is because in order to possess a piece of semantic information, there must
have been some episode during which that information was learned. Whether this means
that all information in semantic memory begins as information in episodic memory (i.e.,
memory linked to a specific time and place) is an open question. According to Tulving, the
answer is no: “If a person possesses some semantic memory information, he obviously
must have learned it, either directly or indirectly, at an earlier time, but he need not pos­
sess any mnemonic information about the episode of such learning …” (p. 389). In other
words, it may be possible for information to be incorporated into our semantic memory in
the absence of ever having conscious awareness of the instances in which we were ex­
posed to it. Alternatively, episodic memory may be the “gateway” to semantic memory
(see Squire & Zola, 1998, for review)—that is, it may be the route through which seman­
tic memory must be acquired (although eventually this information may exist indepen­
dently). Most of the evidence brought to bear on this debate has come from studies of pa­
tients with selective episodic or semantic memory deficits. We turn to these patients in
the following two subsections.

How Is Semantic Memory Acquired?

Children who develop amnesia in early childhood (consequent to bilateral hippocampal damage) are relevant to the question of whether the acquisition of semantic information depends on episodic memory. If semantic knowledge is acquired through episodic memory, then because these children had limited time to acquire semantic knowledge before developing amnesia, they should have limited semantic knowledge. Interestingly, despite their episodic memory impairments, amnesic children’s semantic knowledge appears relatively intact (Bindschaedler et al., 2011; Gardiner et al., 2008; Vargha-Khadem et al., 1997). Furthermore, studies on the famous amnesic patient H.M. have revealed that he acquired some semantic knowledge after the surgery that led to his amnesia (for words that came into common use [Gabrieli et al., 1988] and for people who became famous [O’Kane et al., 2004] after his surgery). Thus, the evidence suggests that semantic knowledge can be acquired independently of the episodic memory system. However, semantic knowledge in these amnesic patients is not normal (e.g., it is acquired very slowly and laboriously). It is therefore possible that the acquisition of semantic memory normally depends on the episodic system,2 but other points of entry can be used (albeit less efficiently) when the episodic system is damaged. Alternatively, these patients may have enough remaining episodic memory to allow the acquisition of semantic knowledge (Squire & Zola, 1998).

Can Semantic Memories Be “Forgotten”?

Everyone occasionally experiences difficulty retrieving episodic memories (what did I eat
for dinner last night?), but can people lose their knowledge of what things are? Imagine
walking through an orchard with a friend: Your friend has no trouble navigating among
the trees; then—to your surprise—as you stroll under a lemon tree, she picks up a lemon,
holds it up and asks, “What is this thing?”

In an early report, Elizabeth Warrington (1975) described three patients who appeared to have lost this kind of knowledge. The syndrome has subsequently been termed semantic dementia (also known as the temporal variant of fronto-temporal dementia), a neurodegenerative disease that causes gradual and selective atrophy of the anterior temporal cortex (predominantly on the left; see Garrard & Hodges, 1999; Mesulam et al., 2003; Mummery et al., 1999). Although semantic dementia patients typically speak fluently and without grammatical errors, as the disease progresses, they exhibit severe word-finding difficulties and marked deficits in identifying objects, concepts, and people (Snowden et al., 1989) irrespective of stimulus modality (e.g., pictures or written or spoken words; Bozeat et al., 2000; Hodges et al., 1992; Patterson et al., 2006, 2007; Rogers & Patterson, 2007; Snowden et al., 1994, 2001).

Semantic dementia patients’ performance on tests of visuo-spatial reasoning and executive function is less impaired (e.g., Hodges et al., 1999; Rogers et al., 2006). Importantly, they also have relatively preserved episodic memories (e.g., Bozeat et al., 2002a, 2002b, 2004; Funnell, 1995a, 1995b, 2001; Graham et al., 1997, 1999; Snowden et al., 1994, 1996, 1999). Research on semantic dementia thus provides further evidence that the neural structures underlying episodic memory are at least partially independent of those underlying retrieval from semantic memory.

(p. 355) How one conceives of the relationship between semantic and episodic memory is complicated by the fact that (as we discuss in the following section) there are different kinds of semantic knowledge. It may be that for sensorimotor aspects of semantic knowledge (e.g., knowledge about the shape, size, or smell of things), “new information enters semantic memory through our perceptual systems, not through episodic memory” (Tulving, 1991, p. 20), whereas semantic knowledge of information that does not enter directly through our senses (e.g., “encyclopedic knowledge,” such as the fact that trees photosynthesize) depends more heavily on contextual information. Moreover, sensorimotor and nonsensorimotor components of semantic knowledge may be stored in different areas of the cortex. Of note, even encyclopedic knowledge is often acquired indirectly; for example, knowing that apple trees photosynthesize allows you to infer that lemon trees also photosynthesize. Semantic knowledge may support the ability to make these kinds of generalizations. In the next section, we introduce some influential hypotheses about what the different components of semantic knowledge might be.

What Are the Different Aspects of Semantic Memory?

Psychologists began to ask questions about how our knowledge about the world is orga­
nized following observations of different kinds of impairments in patients with brain in­
juries. More than 25 years ago, Warrington and McCarthy (1983) described a patient who
had more difficulty identifying nonliving than living things. Shortly after, Warrington and
Shallice (1984) described four patients exhibiting a different pattern of impairments:
more difficulty identifying living than nonliving things. These and other observations of
category-specific impairments led to the proposal that semantic memory might be orga­
nized in domains of knowledge such as living things (e.g., animals, vegetables, fruits) and
nonliving things (e.g., tools, artifacts), which can be selectively impaired after brain in­
jury (Warrington & McCarthy, 1994). Thus, one possible organizational framework for se­
mantic knowledge is categorical (also referred to as domain specific; e.g., Caramazza &
Shelton, 1998).

Early functional neuroimaging studies, however, suggested that semantic memory may be
organized along featural (also known as modality- or attribute-specific) lines—either in­
stead of or in addition to domain-specific lines. These studies showed neuroanatomical
dissociations between visual and nonvisual object attributes, even within a category (e.g.,
Thompson-Schill et al., 1999). For example, Martin and colleagues (1995) reported that
retrieving the color of an object was associated with activation in ventral temporal cortex
bilaterally, whereas retrieving action-related information was associated with activation
in middle temporal and frontal cortex.

Further observations from neuropsychological patients have suggested even finer subdi­
visions within semantic memory (e.g., Buxbaum & Saffran, 2002; Saffran & Schwartz,
1994). In particular, in categorical frameworks, living things can be further divided into
distinct subcategories (e.g., fruits and vegetables). Similarly, in featural frameworks, non­
visual features can be subdivided into knowledge about an object’s function (e.g., a spoon
is used to eat) versus knowledge about how it is manipulated (e.g., a spoon is held with
the thumb, index, and middle fingers, at an angle; Buxbaum, Veramonti, & Schwartz,
2000; Kellenbach, Brett, & Patterson, 2003; Sirigu et al., 1991); likewise, visual features
can be subdivided into different attributes (e.g., color, size, form, or motion; see Thomp­
son-Schill, 2003, for review).

In the remainder of this chapter, we present a number of different theories cognitive neu­
roscientists have proposed for the organization of semantic knowledge, and we discuss
experimental evidence on how this organization might be reflected in the brain. Although
some findings would appear, at first, to be consistent with an organization of semantic
memory by categories of information, we will conclude that the bulk of the evidence sup­
ports an organization by features or attributes that are distributed across multiple brain
regions.

How Is Semantic Memory Organized?


How is knowledge in semantic memory organized? Is it organized like files appear on a
computer, with separate folders for different kinds of information (Applications, Docu­
ments, Music, Movies, etc.), and subfolders within those folders providing further organi­
zation? That is, is semantic knowledge organized hierarchically? Or is it organized more
like how information is physically stored on a computer’s drives (e.g., in a RAID array), wherein
data are distributed across multiple (frequently redundant) drives to increase access speed and
reliability? That is, is semantic knowledge organized in a distributed fashion? In this section
we briefly describe four different classes of models that have been put forth to describe
the organization of semantic memory.

(p. 356) Traditional Cognitive Perspectives

Classical cognitive psychological theories have described the organization of knowledge
in semantic memory in terms of a hierarchy (e.g., a tree is a plant and a plant is a living
thing; Collins & Quillian, 1969) that is structured according to abstract relations between
concepts (i.e., the propositions, rules, or procedures that determine where a concept fits
in the hierarchy) and that may be inaccessible to conscious experience (e.g., Pylyshyn,
1973). Cognitive theorists have also considered whether semantic knowledge may be ac­
quired and stored in multiple formats akin to verbal and visual codes (e.g., Paivio, 1969,
1971, 1978). Historically, these theories have not described brain mechanisms that might
support conceptual knowledge, but these sorts of descriptions foreshadow the theories
about the organization of semantic memory (category vs. attribute based) that character­
ize cognitive neuroscience today.
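The Collins and Quillian (1969) proposal is concrete enough to sketch in a few lines of code. The toy model below is only illustrative: the concepts, properties, and the reading of traversal steps as predicted verification time are assumptions for exposition, not material from the chapter or the original study. Each concept stores only its distinctive properties and inherits the rest by following "isa" links upward.

```python
# Toy sketch of a Collins & Quillian (1969)-style hierarchical semantic network.
# Concepts, properties, and the traversal-cost reading are illustrative assumptions.

HIERARCHY = {
    "living thing": {"isa": None, "props": {"is alive"}},
    "plant": {"isa": "living thing", "props": {"photosynthesizes"}},
    "tree": {"isa": "plant", "props": {"has bark"}},
}

def has_property(concept, prop):
    """Climb the isa-chain until prop is found; steps ~ predicted verification time."""
    steps = 0
    while concept is not None:
        node = HIERARCHY[concept]
        if prop in node["props"]:
            return True, steps
        concept = node["isa"]
        steps += 1
    return False, steps

# A property stored at the concept itself is verified immediately...
print(has_property("tree", "has bark"))   # (True, 0)
# ...whereas an inherited property requires climbing the hierarchy, mirroring
# the longer verification times the hierarchical model predicts.
print(has_property("tree", "is alive"))   # (True, 2)
```

Storing "is alive" once at the root, rather than at every concept, is the economy that motivated the hierarchical proposal; the price is the longer lookup path the model turns into a behavioral prediction.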

Domain-Specific Category-Based Models

As described above, a number of observations from patients with brain injuries suggest
that different object categories (i.e., living and nonliving things) might be differentially in­
fluenced by brain damage. One way to instantiate the evident neural dissociation be­
tween living and nonliving things is to posit that there are distinct neural regions dedicat­
ed to processing different categories of objects. The “domain-specific” category-based
model (Caramazza & Shelton, 1998) does just that. According to this model, evolutionary
pressure led to the development of adaptations to facilitate recognition of categories that
are particularly relevant for survival or reproduction, such as animals, plant life (i.e.,
fruits and vegetables), conspecifics, and possibly tools; and these adaptations led to ob­
jects from these different categories having distinct, non-overlapping neural representa­
tions. Such a system would have adaptive value to the extent that having dedicated neur­
al mechanisms for recognizing these objects could make for faster and more accurate
classification—and subsequent appropriate response.

Although a fundamental principle of this model is that representations of concepts from
these different categories are processed in distinct regions and thus do not overlap, it
does not speak to how conceptual knowledge is represented within these categories. In
fact, an elaboration of this model (Mahon & Caramazza, 2003) is partially distributed and
partially sensorimotor based in that it suggests that representations may be distributed
over different sensory modalities. However, within each modality, the representations of
different categories remain distinct.

Sensory-Functional and Sensorimotor-Based Theories

A complication for category-based models is that despite the “category-specific” label, pa­
tients’ recognition problems do not always adhere to category boundaries—deficits can
span category boundaries or affect only part of a category. This suggests a need for an ac­
count of semantic memory that does not assume a purely category-specific organization.
Sensory-functional theory provides an alternative account. According to this model, con­
ceptual knowledge is divided into anatomically distinct sensory and functional stores, and
so-called category-specific deficits emerge because the representations of different kinds
tend to rely on sensory and functional information to different extents (Farah & McClel­
land, 1991; Warrington & McCarthy, 1987). For example, representations of living things
depend more on visual information than do artifacts, which depend more on functional in­
formation. Consequently, deficits that partially adhere to category boundaries can emerge
even without semantic memory being categorically organized per se.
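Farah and McClelland (1991) made this argument with a connectionist simulation; the sketch below captures only its core logic, with invented unit counts, lesion proportion, and threshold (none of these numbers come from their model). Because living things here depend mostly on "visual" units, damaging the visual pool produces a deficit that looks category-specific even though nothing in the model is organized by category.

```python
import random

# Toy sketch of the Farah & McClelland (1991) sensory-functional logic.
# Unit counts, lesion size, and threshold are made up for illustration.

random.seed(0)

def make_concept(n_visual, n_functional):
    return {"visual": [1.0] * n_visual, "functional": [1.0] * n_functional}

def lesion(concept, pool, proportion):
    """Zero out a given proportion of units in one feature pool."""
    units = concept[pool]
    for i in random.sample(range(len(units)), int(proportion * len(units))):
        units[i] = 0.0
    return concept

def identifiable(concept, threshold=0.6):
    """A concept is identified if enough of its total representation survives."""
    units = concept["visual"] + concept["functional"]
    return sum(units) / len(units) >= threshold

# Living things lean on visual units (8 vs. 2); artifacts on functional (3 vs. 7).
living = lesion(make_concept(8, 2), "visual", 0.75)
artifact = lesion(make_concept(3, 7), "visual", 0.75)

print(identifiable(living))    # False: most of its representation was visual
print(identifiable(artifact))  # True: its functional units survive the lesion
```

The same visual lesion spares the artifact simply because less of its representation lived in the damaged pool, which is exactly how a non-categorical architecture can yield a seemingly category-specific deficit.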

Sensory-functional theory is not without its own problems, however. There exist numer­
ous patients whose deficits cannot be captured by a binary sensory-functional divide (see
Caramazza & Shelton, 1998, for a review), which demonstrates that a simple two-way
partitioning of semantic attributes is overly simplistic. A related but more fully specified
proposal by Alan Allport addresses this concern by pointing out that sensory information
should not be considered a unitary entity but rather should be divided into multiple at­
tributes (e.g., color, sound, form, touch). Specifically, Allport (1985) suggests that the sen­
sorimotor systems used to experience the world are also used to represent meaning: “The
essential idea is that the same neural elements that are involved in coding the sensory at­
tributes of a (possibly unknown) object presented to eye or hand or ear also make up the
elements of the auto-associated activity-patterns that represent familiar object-concepts
in ‘semantic memory’” (1985, p. 53).3 Hence, according to Allport’s model, representa­
tions are sensorimotor based, and consequently, the divisions of labor that exist in senso­
rimotor processing should be reflected in conceptual representations. More recently, oth­
er sensorimotor-based models have made similar claims (e.g., Barsalou, 1999; (p. 357)
Damasio, 1989; Lakoff & Johnson, 1999; in a later section, we discuss empirical studies
that address these predictions).

One question that often arises with respect to these sensorimotor-based theories is
whether, in addition to sensorimotor representations and the connections between them,
it is useful to posit one or more specific brain regions, often called a hub or convergence
zone, where higher order similarity—that is, similarity across sensory modalities—can be
computed (e.g., Damasio, 1989; Simmons & Barsalou, 2003). Such an architecture may
facilitate capturing similarity among concepts, thereby promoting generalization and the
formation of categories (see Patterson et al., 2007, for a review). We return to these is­
sues in later sections, where we discuss generalization and the representation of knowl­
edge that is abstract in that it has no single direct sensorimotor correlate (e.g., the pur­
pose for which an object is used, such as “to tell time” for a clock).

Correlated Feature-Based Accounts

The final class of models that we discuss is commonly referred to as correlated feature-
based accounts (Gonnerman et al., 1997; McRae, de Sa, & Seidenberg, 1997; Tyler &
Moss, 2001). According to these models, the “features” from which concepts are built
comprise not only sensorimotor-based features (such as shape, color, action, and taste)
but also other (experience-based) attributes that participants produce when asked to list
features of objects. For instance, for a tiger, these features might include things such as
“has eyes,” “breathes,” “has legs,” and “has stripes,” whereas for a fork, they might in­
clude “made of metal,” “used for spearing,” and “has tines.”

Importantly, different classes of objects are characterized by different degrees of co-occurrence
of particular types of features. For example, for a given living thing, participants
tend to list features that are shared with other living things (e.g., “has eyes,” “breathes,”
“has legs”), whereas for artifacts, they tend to list features that are not shared with other
artifacts (e.g., “used for spearing,” “has tines”). When features tend to co-occur, they can
be said to be correlated. For example, if something has legs, it is also likely to breathe
and to have eyes. Because correlated feature-based models consider that living and non­
living things can be described through component features, they are at least partially
compatible with both sensorimotor and domain-specific theories.4

According to one influential correlated feature-based model (Tyler & Moss, 2001), highly
correlated shared features tend to support knowledge of a category as a whole, whereas
distinctive features tend to support accurate identification of individual members. Fur­
ther, the correlations between features enable them to support each other, making these
features robust. Hence, because living things have many shared features, general catego­
ry knowledge is robust for them. On the other hand, because individual living things tend
to have few and uncorrelated distinctive features (e.g., “has stripes” or “has spots”), dis­
tinctive information about living things is particularly susceptible to impairment. In con­
trast, features that distinguish individual artifacts from others tend to be correlated (e.g.,
“has tines” is correlated with “used for spearing”), making this information robust. While
differing in some details, Cree and McRae’s (2003) feature-based account similarly posits
that objects (living and nonliving) differ with respect to number of shared versus distinc­
tive features and that these factors vary with object category. Hence, correlated feature-
based accounts hypothesize that the reason for category-specific deficits is not domain of
knowledge per se, but instead is differences in the distribution of features across domains
(see also Rogers & Patterson, 2007).
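The co-occurrence logic behind these accounts can be made concrete with a toy calculation. In the sketch below the feature lists are invented (they are not the published feature norms), and feature correlation is reduced to a simple conditional co-occurrence index: the fraction of concepts bearing one feature that also bear another.

```python
# Toy sketch of feature co-occurrence in correlated feature-based accounts.
# Feature lists are invented for illustration, not taken from actual norms.

CONCEPTS = {
    "tiger": {"has eyes", "breathes", "has legs", "has stripes"},
    "dog":   {"has eyes", "breathes", "has legs"},
    "goat":  {"has eyes", "breathes", "has legs"},
    "fork":  {"made of metal", "used for spearing", "has tines"},
    "spoon": {"made of metal"},
}

def cooccurrence(f1, f2):
    """Fraction of concepts bearing f1 that also bear f2."""
    with_f1 = [c for c, feats in CONCEPTS.items() if f1 in feats]
    return sum(f2 in CONCEPTS[c] for c in with_f1) / len(with_f1)

# Shared living-thing features strongly support one another...
print(cooccurrence("has legs", "breathes"))           # 1.0
# ...but a distinctive living-thing feature receives little support in return.
print(cooccurrence("breathes", "has stripes"))        # ≈ 0.33
# Artifacts' distinctive features, by contrast, are mutually correlated.
print(cooccurrence("used for spearing", "has tines")) # 1.0
```

On this miniature scale the pattern matches the account in the text: general knowledge of living things rests on a dense web of mutually supporting features, while their distinctive features stand alone, and the reverse tendency holds for artifacts.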

Summary of Models

The main division between domain-specific category-based models, on the one hand, and
sensorimotor-based and correlated feature-based accounts, on the other, concerns how
category knowledge is represented. For domain-specific models, object category is a pri­
mary organizing principle of semantic memory, whereas for the other accounts, category
differences emerge from other organizational properties. In many ways, correlated fea­
ture-based accounts echo sensorimotor-based theories. In particular, these two classes of
models are parallel in that categories emerge through co-occurrence of features, with the
relevance of different features depending on the particular object, and with different
parts of a representation supporting one another. The major distinguishing aspect is that
sensorimotor-based theories focus on sensorimotor features—specifying that the same
brain regions that encode a feature represent it. In contrast, because none of the funda­
mental principles of correlated feature-based accounts require that features be sensori­
motor based (in fact, a concern for these models is how features should be defined), these
accounts do not require that features be situated in brain regions that are tied to sensory
or motor processing.

Incorporating a convergence zone type of architecture into a sensorimotor-based (p. 358)
model may help integrate all three classes of models. Convergence zone theories posit
dedicated regions for integrating across sensorimotor-based features, extracting statisti­
cal regularities across concepts, and ultimately producing a level of representation with a
category-like topography in the brain (Simmons & Barsalou, 2003).

What Are the Neural Systems that Support Semantic Memory, and How Do We Retrieve Semantic Information from These Systems?

Are Different Categories Supported by Different Brain Regions?

Functional neuroimaging techniques like positron emission tomography (PET) and func­
tional magnetic resonance imaging (fMRI) have allowed cognitive neuroscientists to ex­
plore different hypotheses regarding the neural organization of semantic memory in un­
damaged brains. By means of these methodologies, researchers observe regional brain
activity while participants perform cognitive tasks such as naming objects, deciding
whether two stimuli belong in the same object category, or matching pictures of stimuli to
their written or spoken names.

Early work attempted to examine whether specific brain regions are selectively active for
knowledge of different object categories (e.g., animals or tools). These studies found that
thinking about animals tends to produce increased neural activity in inferior posterior ar­
eas, including inferior temporal (Okada et al., 2000; Perani et al., 1995) and occipital re­
gions (Grossman et al., 2002; Martin et al., 1996; Okada et al., 2000; Perani et al., 1995),
whereas thinking about tools tends to activate more dorsal and frontal areas, including
left dorsal (Perani et al., 1995) or inferior (Grossman et al., 2002; Okada et al., 2000) pre­
frontal regions, as well as left premotor (Martin et al., 1996), inferior parietal (Okada et
al., 2000), and posterior middle temporal areas (Grossman et al., 2002; Martin et al.,
1996; Okada et al., 2000). Further, within the inferior temporal lobe, the lateral fusiform
gyrus generally shows increased neural activity in response to animals, while the medial
fusiform tends to respond more to tools (see Martin, 2007, for a review).

Although these findings might seem at first glance to provide unambiguous support for a
domain-specific, category-based organization of semantic memory, the data have not al­
ways been interpreted as such. Sensory-functional theories can also account for putative­
ly category-specific activations because they posit that different regions of neural activity
for animals and tools reflect a tendency for differential weighting of visual and functional
features for objects within a given category, rather than an explicit category-based orga­
nization (e.g., Warrington & McCarthy, 1987).

The hypothesis that a feature’s weight can vary across objects raises the possibility that
even for a given object, a feature’s weight may vary depending on its relevance to a given
context. In other words, the extent to which a particular feature becomes active for a giv­
en object may be contextually dependent not only on long-term, object-related factors
(i.e., is this feature relevant in general for the identification of this object?) but also on
short-term, task-related factors (i.e., is this feature relevant for the current task?). The
following sections describe evidence suggesting that both the format of the stimulus with
which semantic memory is probed (i.e., words vs. pictures) and the demands of the task
influence which aspects of a given concept’s semantic representation are activated.

Does the Format of the Stimulus Influence Semantic Memory Retrieval?

Studies of neuropsychological patients have suggested dissociations in performance
between semantic knowledge tasks that use pictorial or verbal stimuli. For example,
patients with optic aphasia are unable to identify objects presented visually, whereas their
performance with lexical/verbal stimuli remains unimpaired (e.g., Hillis & Caramazza,
1995; Riddoch & Humphreys, 1987). On the other hand, Saffran and colleagues (2003a)
described a patient whose object recognition performance was enhanced when prompted
with pictures but not with words. This neuropsychological evidence suggests that pic­
tures and words may have differential access to different components of semantic
knowledge (Chainay & Humphreys, 2002; Rumiati & Humphreys, 1998; Saffran et al., 2003b).
That is, damage to a component accessed by one stimulus type (e.g., words) can spare
components accessed by a different stimulus type (e.g., pictures).

Consistent with the neuropsychological observations, studies of healthy participants have
found that although the patterns of brain activation produced when accessing the same
concept from pictures and words can overlap significantly, there are also differences
(e.g., Gates & Yoon, 2005; Vandenberghe et al., 1996; see also Sevostianov et al., 2002).
(p. 359) Bright, Moss, and Tyler (2004; see also Wright et al., 2008) performed a meta-
analysis of four PET studies involving semantic categorization and lexical decision tasks
with verbal and pictorial stimuli. They found evidence for a common semantic system for
pictures and words in the left inferior frontal gyrus and left temporal lobe (anterior and
medial fusiform, parahippocampal, and perirhinal cortices) and evidence for modality-
specific activations for words in both temporal poles and for pictures in both occipito-tem­
poral cortices. Overall, evidence from studies examining access to semantic knowledge
from pictures versus words suggests that concepts are distributed patterns of brain acti­
vation that can be differentially tapped by stimuli in different formats.

Does the Type of Task Influence Semantic Memory Retrieval?

Retrieval from semantic memory can be influenced not only by the format of the stimuli
used to elicit that information (as described above) but also by specifics of the task, such
as the information that the participant is asked to produce and the amount of time provid­
ed to respond. For example, in an elegant PET experiment, Mummery and colleagues
(1998) showed participants the names of living things or artifacts and asked them to
make judgments about either a perceptual attribute (color) or a nonperceptual attribute
(typical location). Different attribute judgments elicited distinct patterns of activation (in­
creased activation in the left temporal-parietal-occipital junction for location and in­
creased activation in the left anterior middle temporal cortex for color). Moreover, differ­
ences between attributes were larger than differences between category (i.e., living
things vs. artifacts), suggesting that the most prominent divisions in semantic memory
may be associated with attributes rather than categories—a structure consistent with dis­
tributed, feature-based models of semantic memory (see also Moore & Price, 1999).

The amount of time provided to respond also appears to affect which aspects of a concept
become active. In an early semantic priming study, Schreuder and colleagues (1984)
observed that priming for perceptual information (e.g., between the concepts apple and
ball, which are similar in shape) emerges when task demands encourage a rapid re­
sponse, whereas priming for more abstract information (e.g., between apple and banana,
which are from the same category) emerges only when responses are slower (see Yee et
al., 2011, for converging evidence). More recently, Rogers and Patterson (2007) provided
additional evidence that speed of response influences which semantic features are avail­
able: When participants were under time pressure, responses were more accurate for cat­
egorization judgments that did not require specific information, such as between cate­
gories (e.g., distinguishing birds from vehicles), and less accurate for categorization that
did require access to specific information, such as within a category (e.g., distinguishing
between particular kinds of birds). When participants were allowed more time to re­
spond, the pattern reversed. Thus, the results of these studies suggest that the specifics
of the task influence which aspects of a representation become measurably active.

In sum, retrieval from semantic memory can be influenced not only by the format of the
stimuli used to elicit the information (e.g., words vs. pictures) but also by the timing of
the task and the information that the participant is asked to provide.

Is Retrieval Influenced by Interactions Between Category and Task?

The format- and task-related effects reviewed earlier suggest that the most prominent di­
vision in semantic memory might be in terms of attribute domains and not necessarily
category domains, thus offering support for distributed, feature-based models of semantic
memory. Clearly, though, differences in format or task cannot account for the fact that dif­
ferences between categories can be observed even with the same format and task. How­
ever, the presence of both format and task effects in semantic knowledge retrieval raises
the possibility that interactions between stimulus modality and task type can elicit catego­
ry effects that these factors do not produce independently. In this section we explore how
the organization of semantic memory might accommodate stimulus, task, and category ef­
fects.

For instance, the particular combinations of sensorimotor attributes retrieved from se­
mantic memory might be determined by an interaction between task-type and sensorimo­
tor experience (Thompson-Schill et al., 1999). For example, for living things, retrieval of
both visual and nonvisual information should require activation of visual attributes be­
cause semantic memory about living things depends largely on knowledge about their vi­
sual features. To illustrate, people’s experience with zebras is largely visual; hence, re­
trieval of even nonvisual information about them (e.g., Do zebras live in Africa?) will en­
gage visual attributes because one’s knowledge about zebras is built around their (p. 360)
visual features (assuming that retrieving more weakly represented attributes depends on
the activation of more strongly represented attributes; see Farah & McClelland, 1991). In
contrast, for nonliving things, only retrieval of visual information should require activa­
tion of visual attributes. For instance, because people’s experience with microwave ovens
is distributed across a wide range of properties (e.g., visual, auditory, tactile), retrieval of
nonvisual information about them (e.g., Do microwave ovens require more electricity than
refrigerators?) will not necessarily engage visual attributes.

Thompson-Schill and colleagues (1999) found evidence for just such a dissociation: The
left fusiform gyrus (a region linked to visual knowledge) was activated by living things re­
gardless of whether participants made judgments about their visual or nonvisual proper­
ties. In contrast, for nonliving things, the same visual region was active only when partici­
pants were asked to make judgments about visual properties. The complementary pattern
has also been observed: A region linked to action information (the left posterior middle
temporal cortex) was activated by tools for both action and nonaction tasks, but was
activated by fruit only during an action task (Phillips et al., 2002). These and related findings
(Hoenig et al., 2008) suggest that category-specific activations may reflect differences in
which attributes are important for our knowledge of different object categories (but see
Caramazza, 2000, for an alternative perspective).

Related work has demonstrated that ostensibly category-specific patterns can be elimi­
nated by changing the task. Both patients with herpes simplex virus encephalitis and
unimpaired participants exhibit apparently category-specific patterns when classifying
objects at the “basic” level (i.e., at the level of dog or car) as revealed by errors or by
functional activity in ventral temporal cortex, respectively. However, these differences
can be made to disappear when objects are classified more specifically (e.g., Labrador or
BMW, instead of dog or car; Lambon Ralph et al., 2007; Rogers et al., 2005). Why might
level of classification matter? One possibility relates to correlated feature-based models
(discussed earlier): Differences in the structure of the stimuli that are correlated with cat­
egory may interact with the task (e.g., Humphreys et al., 1988; Price et al., 2003; Tarr &
Gauthier, 2000; see also Cree & McRae, 2003). For instance, at the basic level, animals
typically share more features (e.g., consider dog vs. goat) than do vehicles (e.g., car vs.
boat). This greater similarity for animals may produce a kind of “crowding” that makes
them particularly difficult to differentiate at the basic level (e.g., Rogers et al., 2005; Nop­
peney et al., 2007; Tyler & Moss, 2001; but cf. Wiggett et al., 2009, who find that interac­
tions between category and task do not always modulate category effects).

Hence, the studies described in this section provide further evidence that apparently cat­
egory-specific patterns may be due to interactions between stimuli and task. More broad­
ly, numerous studies have explored whether semantic memory is organized in the brain
by object category, by perceptual or functional features, or by a multimodal distributed
network of attributes. Thus far, the findings are compatible with correlated feature and
sensorimotor-based accounts and appear to suggest a highly interactive distributed se­
mantic system that is engaged differently depending on object category and task de­
mands (for a review, see Thompson-Schill, 2003).

Do the Same Neural Regions Underlie Perceptual and Conceptual Processing of Objects?

The preceding evidence largely supports one main tenet of sensorimotor, feature-based
accounts—that semantic memory is distributed across different brain regions. However,
an additional claim of sensorimotor theory is that the brain regions that are involved
when perceiving and interacting with an object also encode its meaning. To address this
claim, research has attempted to explore the extent to which the different sensorimotor
properties of an object (e.g., its color, action, or sound) activate the same neural systems
as actually perceiving these properties.

With respect to color, for example, Martin and colleagues (1995) measured changes in re­
gional cerebral blood flow using PET when participants generated the color or the action
associated with pictures of objects or their written names. Generating color words led to

Page 12 of 40
Semantic Memory

activation in the ventral temporal lobe in an area anterior to that implicated in color per­
ception, whereas generating action words was associated with activation in the middle
temporal gyrus just anterior to a region identified in the perception of motion. Martin and
colleagues interpreted these results as indicative of a distributed semantic memory net­
work organized according to one’s sensorimotor experience of different object attributes
(see also Ishai et al., 2000; Wise et al., 1991). More recent studies have reported some di­
rect overlap5 between regions involved in color perception and (p. 361) those involved in
retrieval of color knowledge about objects (Hsu et al., 2011; Simmons et al., 2007).

With respect to action, analogous findings have been reported regarding overlap between
perceptual-motor and conceptual processing. Chao and Martin (2000; see also Chao, Hax­
by, & Martin, 1999; Gerlach et al., 2000) showed that the left ventral premotor and left
posterior parietal cortices (two areas involved in planning and performing actions) are se­
lectively active when participants passively view or name pictures of manipulable tools.
The involvement of these regions despite the absence of a task requiring the retrieval of
action information (i.e., even during passive viewing) can be explained if the representa­
tions of manipulable objects include areas involved in planning and performing actions. In
a recent study (Yee, Drucker, & Thompson-Schill, 2010) we obtained additional evidence
supporting this hypothesis: In left premotor cortex and inferior parietal sulcus, the neural
similarity of a pair of objects (as measured by fMRI-adaptation; see later) is correlated
with the degree of similarity in the actions used to interact with them. For example, a pi­
ano and a typewriter, which we interact with using similar hand motions, have similar
representations in action regions, just as they should if representations are sensorimotor
based. Moreover, reading action words (e.g., lick, pick, kick) produces differential activity
in or near motor regions activated by actual movement of the tongue, fingers, and feet,
respectively (Hauk et al., 2004). Interestingly, it appears that this motor region activation
can be modulated by task: Reading an action verb related to leg movement (e.g., kick) ac­
tivates motor regions in literal (kick the ball) but not figurative (kick the bucket) sen­
tences (Raposo et al., 2009).
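
The correlational logic behind findings like Yee, Drucker, and Thompson-Schill (2010) — relating neural pattern similarity across object pairs to similarity in the actions used on those objects — can be sketched in a few lines. Everything below (voxel patterns, similarity ratings) is invented for illustration; this is the shape of such an analysis, not the study's actual code:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

objects = ["piano", "typewriter", "hammer", "screwdriver"]
# Hypothetical voxel patterns (50 voxels per object); real values would be
# fMRI responses in, e.g., left premotor or parietal cortex.
patterns = {obj: rng.normal(size=50) for obj in objects}

# Hypothetical ratings of how similarly each pair is manipulated (0-1).
action_sim = {
    ("piano", "typewriter"): 0.9,
    ("piano", "hammer"): 0.2,
    ("piano", "screwdriver"): 0.1,
    ("typewriter", "hammer"): 0.2,
    ("typewriter", "screwdriver"): 0.1,
    ("hammer", "screwdriver"): 0.6,
}

def ranks(values):
    """Rank-transform (0 = smallest), for a Spearman-style correlation."""
    order = np.argsort(values)
    r = np.empty(len(values))
    r[order] = np.arange(len(values))
    return r

pairs = list(itertools.combinations(objects, 2))
# Neural similarity for each pair: correlation between the voxel patterns.
neural = [float(np.corrcoef(patterns[a], patterns[b])[0, 1]) for a, b in pairs]
behavioral = [action_sim[pair] for pair in pairs]

# Spearman correlation = Pearson correlation of the ranks.
rho = float(np.corrcoef(ranks(neural), ranks(behavioral))[0, 1])
print(f"neural-to-action rank correlation: {rho:.2f}")
```

With real data, a reliably positive rank correlation would indicate that objects manipulated alike are represented alike in the region of interest.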

Although visual and motor features have been studied most often, other modalities also
supply evidence for overlap between conceptual and perceptual processing. Regions in­
volved in auditory perception and processing (posterior and superior middle temporal
gyri) are active when reading the names of objects that are strongly associated with
sounds (e.g., telephone; Kiefer et al., 2008; see also Goldberg et al., 2006; Kellenbach et
al., 2001; Noppeney & Price, 2002). Similarly, an orbitofrontal region associated with
taste and smell is activated when making decisions about objects’ flavor (Goldberg et al.,
2006), and simply reading words with strongly associated smells (e.g., cinnamon) acti­
vates primary olfactory areas (Gonzalez et al., 2006).

Patients with brain damage affecting areas involved in sensorimotor processing are also
relevant to the question of whether regions underlying perception and action also under­
lie conceptual knowledge. A sensorimotor-based account would predict that damage to an
auditory, visual, or motor area (for example), should affect the ability to retrieve auditory,
visual, or motor information about an object, whereas access to features corresponding to

Page 13 of 40
Semantic Memory

undamaged brain regions would be less affected. There is evidence that this is indeed the
case. For instance, patients with damage to left auditory association cortex have prob­
lems accessing concepts for which sound is highly relevant (e.g., thunder or telephone;
Bonner & Grossman, 2012; Trumpp et al., 2013). Likewise, a patient with damage to ar­
eas involved in visual processing (right inferior occipito-temporal junction) had more diffi­
culty naming pictures of objects whose representations presumably rely on visual infor­
mation (e.g., living things that are not ordinarily manipulated) than objects whose repre­
sentations are presumably less reliant on visual information (e.g., living or nonliving
things that are generally manipulated); the patient’s encyclopedic and auditory knowl­
edge about both types of objects, in contrast, was relatively preserved (Wolk et al., 2005).

Similarly, apraxic patients, who have difficulty performing object-related actions—and
who typically have damage to the premotor or parietal areas subserving these actions—
show abnormally delayed access to manipulation information about objects (Myung et al.,
2010). Studies with normal participants using transcranial magnetic stimulation (TMS),
which produces a temporary and reversible “lesion,” likewise suggest that motor areas
are involved in processing motor-related concepts (e.g., Pobric et al., 2010; see Hauk et
al., 2008, for review), as do studies requiring normal participants to perform an explicit
motor task designed to interfere with activating object-appropriate motor programs (e.g.,
Witt et al., 2010; Yee et al., 2013). Finally, Gainotti (2000) conducted a comprehensive re­
view of category-specific deficits, focusing on relationships between location of brain
damage and patterns of impairment. These relationships, Gainotti observed, suggest that
the categorical nature of the deficits is produced by correlations between (damaged)
brain regions and sensorimotor information that is central to various categories.

Overall, findings from neuroimaging, neuropsychological, and TMS studies converge to
suggest that semantic knowledge about objects is built (p. 362) around their sensorimotor
attributes and that these attributes are stored in sensorimotor brain regions.

Which Neural Regions Underlie the Generalization of Semantic Knowledge?

A critical function of semantic memory is the ability to generalize (or abstract) over our
experiences with a given object. Such generalization permits us to derive a representa­
tion that will allow us to recognize new exemplars of it and make predictions about as­
pects of these exemplars that we have not directly perceived. For example, during analog­
ical thinking, generalization is critical to uncover relationships between a familiar situa­
tion and a new situation that may not be well understood (e.g., that an electron is to the
nucleus as a planet is to the sun). Thus, analogical thinking involves not only retrieving
information about the two situations but also a mapping between their surface elements
based on shared abstract relationships (see Chrysikou & Thompson-Schill, 2010). Similar­
ly, knowing that dogs and cats are both animals (i.e., mapping them from their basic to
their superordinate level categories) may facilitate generalization from one to the other. A
full treatment of the process of generalization would be beyond the scope of this chapter.
However, we briefly touch on some of the things that cognitive neuroscience has revealed
about the generalization process.

Several findings are consistent with the idea that different brain regions support different
levels of representation. For instance, an anterior temporal region (the perirhinal cortex,
particularly in the left) was activated when naming pictures at the basic level (e.g., dog or
hammer), but not at the superordinate level (e.g., living or manmade), whereas a posteri­
or temporal region (fusiform gyrus bilaterally) was activated for both levels (Tyler et al.,
2004, but cf. Rogers et al., 2006). In addition, greater anterior temporal lobe activity has
been observed during word–picture matching at a specific level (e.g., robin vs. kingfisher)
than at a more general level (e.g., animal vs. vehicle; Rogers et al., 2006). Further,
processing may differ for different levels of representation: Recordings of neural activity (via
magnetoencephalography) suggest that during basic level naming, there are more recur­
rent interactions between left anterior and left fusiform regions than during superordi­
nate level naming (Clark et al., 2011).

One interpretation of these findings is that there exists a hierarchically structured system
along a posterior-anterior axis in the temporal cortex—with posterior regions more in­
volved in coarse processing (such as the presemantic, perceptual processing required for
superordinate category discrimination) and anterior regions more involved in the integra­
tion of information across modalities that facilitates basic-level discrimination (e.g., cat
vs. dog; see Martin & Chao, 2001). More broadly, these and related findings (e.g., Chan et
al., 2011; Grabowski et al., 2001; Kable et al., 2005) are consistent with the idea that se­
mantic knowledge is represented at different levels of abstraction in different regions
(see also Hart & Kraut, 2007, for a mechanism by which different types of knowledge
could be integrated).

If true, this may be relevant to a puzzle that has emerged in neuroimaging tests of
Allport’s (1985) sensorimotor model of semantic memory. There is a consistent trend for
retrieval of a given physical attribute to be associated with activation of cortical areas 2
to 3 cm anterior to regions associated with perception of that attribute (Thompson-Schill,
2003). This pattern, which has been interpreted as coactivation of the “same areas” in­
volved in sensorimotor processing, as Allport hypothesized, could alternatively be used as
grounds to reject the Allport model. What does this anterior shift reflect?

We believe the answer may lie in ideas developed by Rogers and colleagues (2004). They
have articulated a model of semantic memory that includes units that integrate informa­
tion across all of the attribute domains (including verbal descriptions and object names;
McClelland & Rogers, 2003). As a consequence, “abstract semantic representations
emerge as a product of statistical learning mechanisms in a region of cortex suited to per­
forming cross-modal mappings by virtue of its many interconnections with different per­
ceptual-motor areas” (Rogers et al., 2004, p. 206). The process of abstracting away from
modality-specific representations may occur gradually across a number of cortical re­
gions (perhaps converging on the temporal pole). As a result, a gradient of abstraction
may emerge in the representations throughout a given region of cortex (e.g., the ventral
extrastriate visual pathway), and the anterior shift may reflect activation of a more
abstract representation (Kosslyn & Thompson, 2000). In other words, the conceptual
similarity space in more anterior regions may depart a bit from the similarity space in the
environment, moving in the direction of abstract relations.

A gradient like this could also help solve another puzzle: If concepts are sensorimotor
based, one might worry that thinking of a concept would cause (p. 363) one to hallucinate
it or execute it (e.g., thinking of lemon would cause one to hallucinate a lemon, and think­
ing of kicking would produce a kick). But if concepts are represented (at least in part) at a
more abstract level than that which underlies direct sensory perception and action, then
the regions that underlie, for example, action execution, need not become sufficiently ac­
tive to produce action. More work is needed to uncover the nature of the representations
—and how the similarity space may gradually change across different cortical regions.

Summary of the Neural Systems Supporting Semantic Memory

In this section we have briefly summarized a large body of data on the neural systems
supporting semantic memory (see Noppeney, 2009, for a more complete review of func­
tional neuroimaging evidence for sensorimotor-based models). We suggested that in light
of the highly consistent finding that sensorimotor regions are active during concept re­
trieval, the data largely support sensorimotor-based models of semantic memory. Howev­
er, there is a question that is frequently raised about activation in sensorimotor regions
during semantic knowledge retrieval: Could it be that the activation of sensorimotor re­
gions that has been observed in so many studies is “epiphenomenal”6 rather than indicat­
ing that aspects of semantic knowledge are encoded in these regions? (See Mahon &
Caramazza, 2008, for discussion.) For example, perhaps activation in visual areas during
semantic processing is a consequence of generating visual images, and not of semantic
knowledge per se. The patient, TMS, and behavioral interference work described above
helps to address this question: It is not clear how an epiphenomenal account would
explain the fact that lesioning or interfering with a sensorimotor brain region affects the
ability to retrieve the corresponding attribute of a concept. These data therefore suggest
that semantic knowledge is at least partially encoded in sensorimotor regions.

However, the task effects described above raise another potential concern. Traditionally,
in the study of semantic representations (and, in fact, in cognitive psychology more
broadly) it is assumed that only effects that can be demonstrated across a variety of con­
texts should be considered informative with regard to the structure and organization of
semantic memory. If one holds this tenet, then these task effects are problematic. Yet, as
highlighted by the work described in this section, task differences can be accommodated
if one considers an important consequence of postulating that the representations of con­
cepts are distributed (recall that all but traditional approaches allow for a distributed ar­
chitecture): Distributed models allow attention to be independently focused on specific
(e.g., contextually relevant) properties of a representation through partial activation of
the representation (see Humphreys & Forde, 2001, for a description of one such model).
This means that if a task requiring retrieval of action information, for example, produces
activation in premotor and parietal regions, but a task requiring retrieval of color does
not, the discrepancy may reflect differential focus of attention within an object concept
rather than that either attribute is not part of the object concept.

Thus, the differences between effects that emerge in different contexts lead to important
questions, such as how we are able to flexibly focus attention on relevant attributes. We
turn to this in the next section.

Biasing Semantic Representations


If our semantic knowledge is organized in a multimodal, highly interactive, distributed
system, how is it that we are able to weight certain attributes more heavily than others
depending on the circumstance—so that we can, for example, retrieve just the right com­
binations of features to identify or answer questions about concepts like a horse, a screw­
driver, or an airplane? In other words, how does our brain choose, for a given object and
given the demands of the task at hand, the appropriate pattern of activation? A number of
studies have suggested that the prefrontal cortex, particularly the left ventrolateral re­
gions, produces a modulatory signal that biases the neural response toward certain pat­
terns of features (e.g., Frith, 2000; Mechelli et al., 2004; Miller & Cohen, 2001; Noppeney
et al., 2006). For example, when, during semantic knowledge retrieval, competition
among different properties is high, a region in the left inferior frontal gyrus is activated
(Thompson-Schill et al., 1997; see also Kan & Thompson-Schill, 2004; Thompson-Schill et
al., 1998; Thompson-Schill, D’Esposito, et al., 1999).

Several mechanisms have been proposed regarding this region’s role in selective activa­
tion of conceptual information, among them that prefrontal cortical activity during se­
mantic tasks reflects the maintenance of different attributes in semantic memory (e.g.,
Gabrieli et al., 1998) or that this region performs a “controlled retrieval” of semantic in­
formation (e.g., Badre & Wagner, 2007). We and others (p. 364) have suggested that this
region, although critical in semantic memory retrieval, performs a domain-general func­
tion as a dynamic filtering mechanism that biases neural responses toward task-relevant
information while gating task-irrelevant information (Shimamura, 2000; Thompson-Schill,
2003; Thompson-Schill et al., 2005). In other words, when a context or task requires us to
focus on specific aspects of our semantic memory, the left ventrolateral prefrontal cortex
biases which aspects of our distributed knowledge system will be most active.
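
One way to picture such a biasing mechanism is as a set of task-dependent gates applied to a distributed feature representation. The toy sketch below uses invented feature names and activation values; it illustrates the general idea of top-down gating, not any specific published model:

```python
# Toy distributed concept: feature activations across modality channels
# (names and values are hypothetical, for illustration only).
hammer = {
    "visual_shape": 0.8,
    "visual_color": 0.3,
    "action_grip":  0.9,
    "action_swing": 0.9,
    "sound":        0.6,
}

def retrieve(concept, task_gates, baseline=0.1):
    """Top-down bias: scale each feature by the task's relevance gate,
    standing in for a prefrontal modulatory signal; task-irrelevant
    features are gated down to a weak baseline."""
    return {f: a * task_gates.get(f, baseline) for f, a in concept.items()}

# A color-judgment task up-weights color and gates out other attributes.
color_task = {"visual_color": 1.0, "visual_shape": 0.2}
biased = retrieve(hammer, color_task)
print(max(biased, key=biased.get))  # -> "visual_color"
```

Even though the concept's strongest stored features are action related, the gated retrieval makes the contextually relevant color feature the most active one, mirroring the task effects discussed above.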

Individual Differences in Access to and in the Organization of Semantic Memory
Earlier, we discussed two types of evidence that support sensorimotor models of semantic
memory: (1) sensorimotor regions are active during concept retrieval, and (2) damage to
a sensorimotor region affects the ability to retrieve the corresponding attribute of an ob­
ject. However, we have not yet addressed an additional claim of sensorimotor-based theo­

Page 17 of 40
Semantic Memory

ries: If it is true that the sensorimotor regions that are active when an object is perceived
are the same regions that represent its meaning, then an individual’s experience with
that object should shape the way it is represented. In other words, the studies that we
have described so far have explored the way that concepts are represented in the “aver­
age” brain, and the extent to which the findings have been consistent presumably reflects
the commonalities in human experience. Yet studying the average brain does not allow us
to explore whether, as predicted by sensorimotor-based theories, differences in individu­
als’ experiences result in differences in their representation of concepts. In this section
we describe some ways in which individual differences influence the organization of se­
mantic memory.

Are There Individual Differences in Semantic Representations?

Semantic representations appear to vary as a consequence of lifelong individual
differences in sensorimotor experience: For instance, recruitment of left parietal cortex (a
region involved in object-related action) during the retrieval of object shapes was
modulated by the amount of lifetime tactile experience associated with the objects (Oliver et al.,
2009). Similarly, right- and left-handed people, who use their hands differently to perform
various actions with manipulable objects, employ homologous but contralateral brain re­
gions to represent those objects: When participants named tools, handedness influenced
the lateralization of premotor activity (Kan et al., 2006). Critically, handedness was not a
predictor of premotor lateralization for objects that are not acted on manually (animals).
In related work, while reading action verbs (e.g., write, throw), right-handed participants
activated primarily left premotor cortex regions, whereas left-handed participants activat­
ed primarily right premotor cortex regions (Willems et al., 2010). No such difference was
observed for nonmanual action verbs (e.g., kneel, giggle). Analogous findings have been
observed for long-term experience with sports: When reading sentences describing ice
hockey (but not when reading about everyday experiences), professional ice hockey play­
ers activated premotor regions more than nonplayers did (Beilock et al., 2008). Further,
such differences are not limited to motor experience: When professional musicians identi­
fied pictures of musical instruments (but not control objects), they activated auditory as­
sociation cortex and adjacent areas more than nonmusicians did (Hoenig et al., 2011).

Even with much less than a lifetime of experience, the neural representation of an object
can reflect specific experience with it. Oliver and colleagues (2008) asked one set of par­
ticipants to learn (by demonstration) actions for a set of novel objects, perform those ac­
tions, and also view the objects, whereas a second set of participants viewed the same ob­
jects without learning actions but had the same total amount of exposure to them. In a
subsequent fMRI session in which participants made judgments about visual properties of
the objects, activity in parietal cortex was found to be modulated by the amount of tactile
and action experience a participant had with a given object. These and related findings
(Kiefer et al., 2007; Weisberg et al., 2007) demonstrate a causal link between experience
with an object and its neural representation, and also show that even relatively short-
term differences in sensorimotor experience can influence an object’s representation.

Intriguingly, changes in individual experience may also lead to changes in the representa­
tion of abstract concepts. Right-handers’ tendency to associate “good” with “right” and
“bad” with “left” (Casasanto, 2009) can be reversed when right hand dominance is com­
promised because of stroke or a temporary laboratory-induced handicap (Casasanto &
Chrysikou, 2011).

What Happens When a Sensory Modality Is Missing?

As would be expected given the differences observed for handedness and relatively short-
term (p. 365) experience, more dramatic differences in individual experience have also
been shown to affect the organization of semantic knowledge. For instance, color influ­
ences implicit similarity judgments for sighted but not for blind participants (even when
blind participants have good explicit color knowledge of the items tested; Connolly et al.,
2007). Interestingly, this difference held only for fruits and vegetables, and not for house­
hold items, consistent with a large literature demonstrating that the importance of color
for an object concept varies according to how useful it is for recognizing the object (see
Tanaka & Presnell, 1999, for review).

However, differences in sensory experience do not always produce obvious differences in
the organization of semantic knowledge. For instance, when making judgments about
hand action, blind, like sighted, participants selectively activate left posterior middle tem­
poral areas that in sighted people have been associated with processing visual motion
(Noppeney et al., 2003). Furthermore, blind participants demonstrate category-specific
(nonliving vs. animal, in this case) activation in the same visual areas as sighted partici­
pants (ventral temporal and ventral occipital regions; Mahon et al., 2009). Because senso­
rimotor-based theories posit that visual experience accounts for the activation in visual
areas, the findings in these two studies may appear to be inconsistent with sensorimotor-
based theories and instead suggest an innate specification of action representation or of
living/nonliving category differences. However, given the substantial evidence that corti­
cal reorganization occurs if visual input is absent (for a review, see Amedi, Merabet,
Bermpohl, & Pascual-Leone, 2005), another possibility is that in blind participants these
“visual” regions are sensitive to nonvisual factors (e.g., shape information that is ac­
quired tactilely) that correlate with hand action and with the living/nonliving distinction.

Summary of Individual Differences in Semantic Memory

At first glance, the individual differences that we have described in this section may seem
surprising. If our concept of a lemon, for example, is determined by experience, then no
two individuals’ concepts of a lemon will be exactly the same. Further, your own concept
of a lemon is likely to change subtly over time, probably without conscious awareness. Yet
the data described above suggest that this is, in fact, what happens. Because sensorimo­
tor-based models assume that our representations of concepts are based on our experi­
ences with them, these models can easily account for, and in fact predict, these differ­
ences and changes. It is a challenge for future research to explore whether there are
factors that influence the extent to which we attend to different types of information, and
that constrain the degree to which representations change over time.

Abstract Knowledge
Our discussion of the organization of semantic memory has thus far focused primarily on
the physical properties of concrete objects. Clearly, though, a complete theory of seman­
tic memory must also provide an account for how we represent abstract concepts (e.g.,
peace) as well as abstract features of concrete objects (e.g., “used to tell time” is a prop­
erty of a clock). According to the “concreteness effect,” concrete words are processed
more easily than abstract words (e.g., Paivio, 1991) because their representations include
sensory information that abstract words lack. However, there have been reports of se­
mantic dementia patients who have more difficulty with concrete than abstract words
(Bonner et al., 2009; Breedin et al., 1994; but cf. Hoffman & Lambon-Ralph, 2011, and Jef­
feries et al., 2009, for evidence that the opposite pattern is more common in semantic de­
mentia), suggesting that there must be more to the difference between these word types
than quantity of information. Additional evidence for a qualitative difference between the
representations of concrete and abstract words comes from work by Crutch and Warring­
ton (2005). They reported a patient AZ, with left temporal, parietal, and posterior frontal
damage, who, for concrete words, exhibits more interference from words closely related
in meaning (e.g., synonyms) than for “associated” words (i.e., words that share minimal
meaning but often occur in similar contexts), whereas for abstract words, she displays the
opposite pattern.

Neuroimaging studies that have compared abstract and concrete words have identified
an inconsistent array of regions associated with abstract concepts: the left superior tem­
poral gyrus (Wise et al., 2000), right anterior temporal pole, or left posterior middle tem­
poral gyrus (Grossman et al., 2002). These inconsistencies may be due to the differing de­
mands of the tasks employed in these studies or to differences in how “abstract” is opera­
tionalized. The operational definition of abstract may be particularly important because it
varies widely across studies—ranging from words without sensorimotor associations to
words that have low imageability (p. 366) (i.e., words that are difficult to visualize) to emo­
tion words (e.g., love). We surmise that these differences likely have a particularly signifi­
cant influence on where brain activation is observed.

Using abstract stimuli intended to have minimal sensorimotor associations, Noppeney and
Price (2004) compared fMRI activation while subjects made judgments about words (com­
prising nouns, verbs, and adjectives) referring to visual, auditory, manual action, and ab­
stract concepts. Relative to the other conditions, abstract words activated the left inferior
frontal gyrus, middle temporal gyrus, superior temporal sulcus, and anterior temporal
pole. Because these are classical “language” areas, the authors suggest that the activa­
tions are a consequence of the representations of abstract words being more reliant on
contextual information provided by language. Recently, Rodriguez and colleagues (2011)
observed activation in these same regions for abstract verbs. They also observed that a
greater number of regions were active for abstract relative to concrete verbs—leading
them to hypothesize that because abstract words appear in more diverse contexts (Hoff­
man et al., 2011), the networks supporting them are more broadly distributed.

Like abstract words, abstract features (e.g., “used to tell time”) have no direct sensorimo­
tor correlates. Our ability to conceive of abstract concepts and features—i.e., knowledge
that cannot be directly perceived from any individual sensory modality—demonstrates
that there must be more to semantic knowledge than simple sensorimotor echoes. How
might abstract concepts or features be represented in the kind of distributed architecture
that we have described? Rogers and colleagues’ model of semantic memory (introduced
above in the context of generalization) may be of service here as well. They argue that the
interaction between content-bearing perceptual representations and verbal labels pro­
duces a similarity space that is not captured in any single attribute domain, but rather re­
flects abstract similarity (cf. Caramazza, Hillis, Rapp, & Romani, 1990; Chatterjee, 2010;
Damasio, 1989; Plaut, 2002; Tyler, Moss, Durrant-Peatfield, & Levy, 2000).

Based on the abundant interconnections between the temporal pole and different sensori­
motor areas, and on the fact that temporal pole degeneration is associated with semantic
dementia (introduced earlier), Rogers and colleagues suggest that this region may
support abstract knowledge and generalization. Semantic dementia, in particular, has had a
large influence on ideas about the anterior temporal lobes’ role in semantic memory. In
this disorder, relatively focal degeneration in the anterior temporal lobes accompanies se­
mantic memory deficits (e.g., problems naming, recognizing, and classifying objects, re­
gardless of category), whereas other cognitive functions are relatively spared (see
Hodges & Patterson, 2007, for a review). The concomitance of the anatomical and cogni­
tive impairments in semantic dementia therefore lends credence to the idea that the ante­
rior temporal lobes are important for supporting semantic memory (see Patterson et al.,
2007, for a review). Additional research is needed to explore whether brain regions be­
yond the anterior temporal lobe serve similar “converging” functions.

Methodological Advances
The studies reviewed in this chapter employed behavioral, neuropsychological, and neu­
roimaging techniques to explore the organization and function of semantic memory. A
number of methodologies that have recently been introduced in cognitive neuroscience
hold much promise for the study of semantic memory.

First, new approaches in experimental design and data analysis for neuroimaging-based
studies allow cognitive neuroscientists to address more fine-grained questions about the
neural representation of concepts. For example, questions relating to representational
similarity can be explored with fMRI adaptation (e.g., Grill-Spector & Malach, 2001). This
technique relies on the assumption that when stimuli that are representationally similar
are presented sequentially, the repeated activation of the same set of neurons will pro­
duce a reduced fMRI response. If the stimuli are representationally distinct, no such
adapted response should be observed. This method can be applied to address a number of
questions pertaining, for instance, to relationships between regions implicated in the pro­
cessing of different object attributes (e.g., color, shape, and size; see Yee et al., 2010, for
function and manipulation), or to the degree to which the same neurons are involved in
perception and in conceptual representation. Similarly, multivoxel pattern analysis (e.g.,
Mitchell et al., 2008; Norman et al., 2006; Weber et al., 2009) and functional connectivity
approaches allow for analyses that exploit the distributed nature of brain activation,
rather than focusing on focal activation peaks (see Rissman & Wagner, 2012).
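
The logic of fMRI adaptation can be illustrated with a toy population model: if the second stimulus of a pair drives the same units the first stimulus just drove, repetition suppression lowers the summed response more than it does for a distinct stimulus. All of the numbers below (unit count, suppression factor, stimulus patterns) are hypothetical, chosen only to make the principle visible:

```python
import numpy as np

rng = np.random.default_rng(1)
n_units = 1000

# Hypothetical unit-level response strengths (0-1); illustrative only.
piano = rng.random(n_units)                                      # first stimulus
organ = np.clip(piano + 0.05 * rng.normal(size=n_units), 0, 1)   # similar concept
carrot = rng.random(n_units)                                     # distinct concept

SUPPRESSION = 0.5  # fraction by which a just-driven unit's response drops

def response(pattern, prior):
    """Summed population response; each unit is suppressed in proportion to
    how strongly the preceding stimulus drove it (repetition suppression)."""
    return float(np.sum(pattern * (1.0 - SUPPRESSION * prior)))

no_prior = np.zeros(n_units)
reduction_similar = 1 - response(organ, piano) / response(organ, no_prior)
reduction_distinct = 1 - response(carrot, piano) / response(carrot, no_prior)

print(f"signal reduction after a similar preceding stimulus:  {reduction_similar:.2f}")
print(f"signal reduction after a distinct preceding stimulus: {reduction_distinct:.2f}")
```

The larger reduction for the representationally similar pair is the adaptation signature that such designs measure in the BOLD response.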

Second, noninvasive brain stimulation techniques, specifically TMS and transcranial di­
rect current stimulation (tDCS), allow researchers to temporarily “lesion” a given brain
region and (p. 367) observe the effects on behavior (e.g., Antal et al., 2001, 2008; Walsh &
Pascual-Leone, 2003). In contrast to studying patients in the months and years after brain
injuries that produce permanent lesions, using these “virtual lesions” allows cognitive
neuroscientists to examine the role of a given brain region without the possibility that re­
organization of neural function has occurred.

Third, cognitive neuroscience has benefited from advances in eye-tracking research, in
which eye movements to objects are monitored as participants listen to spoken language
(Cooper 1974; Tanenhaus et al., 1995). Hearing a word (e.g., piano) produces eye move­
ments toward pictures of semantically related objects (e.g., a trumpet; Yee & Sedivy,
2006), and the probability of looking at the related object is predicted by how far away it
is in “semantic space” (calculated in terms of the degree of featural overlap; Huettig &
Altmann, 2005). This semantic eye-tracking paradigm has been used to explore specific
dimensions of featural overlap (e.g., shape, color, manipulation) and is well suited to in­
vestigating semantic representations in patients with brain damage (Mirman & Graziano,
2012; Myung et al., 2010). Such behavioral paradigms inform cognitive neuroscience of
the behavioral consequences of the manner in which semantic memory is organized.
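
A minimal sketch of a featural-overlap measure of "semantic distance" of the kind used in such studies might look as follows. The feature sets here are invented and far smaller than published feature norms (e.g., the McRae et al. norms), and the Jaccard measure is just one simple way to quantify overlap:

```python
# Hypothetical feature norms; real norms list many more features per concept.
features = {
    "piano":   {"instrument", "makes_music", "played_in_band",
                "has_keys", "large", "wooden"},
    "trumpet": {"instrument", "makes_music", "played_in_band",
                "metal", "blown"},
    "couch":   {"large", "wooden", "furniture", "sat_on"},
}

def overlap(a, b):
    """Featural overlap as Jaccard similarity: shared / total features."""
    fa, fb = features[a], features[b]
    return len(fa & fb) / len(fa | fb)

print(overlap("piano", "trumpet"))  # higher: shared instrument features
print(overlap("piano", "couch"))    # lower: only size/material shared
```

On this measure, hearing "piano" should produce more looks to a trumpet than to a couch, in line with the graded overlap effects reported by Huettig and Altmann (2005).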

Implications and Future Directions


Is There Something Special about Action?

Much of the work in cognitive neuroscience that has been the focus of this chapter indi­
cates that semantic representations are at least partially sensorimotor based. One senso­
rimotor modality in particular, action, has received a great deal of attention, perhaps be­
cause of the discovery of “mirror neurons”—cells that respond both when an action is
perceived and when it is performed (Rizzolatti & Craighero, 2004). This focus on action
has led to a common criticism of sensorimotor-based theory: Being impaired in perform­
ing actions does not entail being unable to conceive of objects with strongly associated
actions—suggesting that action may not, in fact, be part of these conceptual representa­
tions.7

There are at least three important points to keep in mind with respect to this criticism.
First, concepts are more than associated actions (and in fact many concepts—e.g.,
bookshelf or tree—may have weakly (if any) associated actions). As a result, sensorimotor-based
representations can include many different components (e.g., visual, auditory, and olfac­
tory as well as action oriented) that are distributed across cortex. For this reason, under a
sensorimotor-based account it would be surprising if all of these components were dam­
aged simultaneously. This means that losing part of a representation does not entail los­
ing the entire concept—just as losing one finger from a hand does not entail loss of the
entire hand. Moreover, as highlighted in our discussion of abstract features, all of the var­
ious sensorimotor components still make up only part of conceptual knowledge—because
semantic knowledge is only partially sensorimotor. Second, even concepts that at first
glance seem predominantly action based generally comprise more than action alone. For
example, our knowledge of kicking may include not only the action but also the contexts
in which kicking is likely to occur (see Taylor & Zwaan, 2009, for a discussion of the many
possible components of action knowledge and the resulting implications for “fault-toler­
ant comprehension”).

Third, recent research (reviewed earlier) suggests that depending on the demands of the
task, we are able to dynamically focus our attention on different aspects of a concept.
This means that sensorimotor-based distributed models are not inconsistent with finding
that an action is not routinely activated when the concept is activated, or that patients
with disorders of action can respond successfully to concepts that are action based if the
task does not require access to action information. In fact, such findings fall naturally out
of the architecture of these models. Such models allow for semantic memory to exhibit
some degree of graceful degradation (or fault tolerance) in that representations can con­
tinue to be accessed despite the loss of some of their components.
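The hand/finger analogy can be illustrated with a minimal sketch (not any specific published model): treat each concept as a vector of components spread over several modalities, "lesion" the action components, and note that enough of the pattern survives to identify the concept. All numbers and modality assignments below are invented:

```python
# Hypothetical component vectors: [visual, visual, auditory, action, action]
concepts = {
    "hammer": [0.9, 0.1, 0.2, 0.6, 0.5],
    "piano":  [0.3, 0.5, 0.9, 0.4, 0.3],
    "tree":   [0.1, 0.9, 0.0, 0.0, 0.1],
}

ACTION_DIMS = (3, 4)  # indices of the action-related components

def lesion_action(vec):
    """Simulate damage to action knowledge by zeroing the action components."""
    return [0.0 if i in ACTION_DIMS else v for i, v in enumerate(vec)]

def nearest_concept(probe):
    """Identify a (possibly damaged) pattern by smallest Euclidean distance."""
    def dist(v):
        return sum((a - b) ** 2 for a, b in zip(probe, v)) ** 0.5
    return min(concepts, key=lambda name: dist(concepts[name]))

# Even with its action components lost, the damaged hammer pattern remains
# closer to the intact "hammer" representation than to any other concept.
damaged = lesion_action(concepts["hammer"])
print(nearest_concept(damaged))  # prints "hammer"
```

Losing one modality's components degrades the representation without destroying it, which is the graceful-degradation behavior described above.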

Is Semantic Memory Really “Shared Knowledge”?

Semantic memory is often referred to as “shared knowledge,” to distinguish it from the individual experiences that make up episodic memory. Yet in this chapter we have emphasized that individual experience, task, and context all influence the extent to which different aspects of an object’s representation become active over time. Thus, when conceiving
of an object, there may be no fixed representational “outcome” that is stable across dif­
ferent episodes of conceiving of it (or even across time within an episode), let alone
across individuals. This raises a significant challenge for how to define and understand
semantic memory: Because semantic memory is “shared knowledge” only to the extent
that our experiences (both long and short term) are shared, (p. 368) understanding the
representation and retrieval of semantic knowledge may depend on our ability to de­
scribe aspects of these representations that are not shared. Future work must therefore
do more than discover the extent to which various attributes are routinely activated for
certain concepts. It should also attempt to characterize variations in the neural bases of
semantic memory, as well as the neural mechanisms by which context or task demands
modulate which aspects of a concept are activated (and at what rate), allowing for contin­
uously changing outcomes (for further discussion, see Spivey, 2007).

Page 23 of 40
Semantic Memory

From Categories to Semantic Spaces

Many of the studies described in this chapter explored the organization of semantic mem­
ory by comparing the neural responses to traditionally defined categories (e.g., animals
vs. tools). However, a more fruitful method of understanding conceptual representations
may be to compare individual concepts to one another, and extract dimensions that de­
scribe the emergent similarity space. The newer methods of analyzing neuroimaging data
discussed above (such as fMRI adaptation and multi-voxel pattern analysis, or MVPA) are
well suited to the task of describing these types of neural similarity spaces. Further, by
making inferences from these spaces, it is possible to discover what type of information is
represented in a given cortical region (e.g., Mitchell et al., 2008; Weber et al., 2009; Yee
et al., 2010). Overall, our understanding of semantic memory can benefit more from
studying individual items (e.g., Bedny et al., 2007) and their relations to each other, than
from simply examining categories as unified wholes.
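The logic of extracting an emergent similarity space from item-level responses can be sketched in a few lines. Assuming made-up "voxel" patterns for three concepts (the values below are invented, not real imaging data), pairwise pattern correlations define a neural similarity space in which within-category items (cat, dog) cohere more than cross-category pairs:

```python
def pearson(x, y):
    """Pearson correlation between two equal-length response patterns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical voxel patterns, one row per concept (invented values).
patterns = {
    "dog":    [1.2, 0.8, 0.1, 0.3],
    "cat":    [1.1, 0.9, 0.2, 0.2],
    "hammer": [0.2, 0.1, 1.0, 0.9],
}

# Pairwise pattern similarities define the emergent similarity space.
items = sorted(patterns)
similarity = {(a, b): pearson(patterns[a], patterns[b])
              for a in items for b in items if a < b}

# If a region codes this semantic dimension, within-category pairs
# (cat/dog) should correlate more than cross-category pairs.
for pair, r in similarity.items():
    print(pair, round(r, 2))
```

Comparing such item-by-item similarity structure to a candidate model of the space is the core move of the pattern-based analyses cited above.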

Where Does Semantic Memory Fit in the Traditional Taxonomy of Memory?

Traditionally, semantic memory is considered to be part of the declarative (explicit) memory system (Squire, 1987). Yet the sensorimotor-based frameworks we have discussed in
this chapter suggest that semantic memory is also partially composed of information con­
tained in sensorimotor systems and can be probed through (implicit) perceptual priming.
The amnesic patients we discussed in the first section of this chapter also support the
idea that semantic memory is at least partially implicit, in that they are able to acquire
some semantic knowledge despite severely impaired episodic memories. Hence, the cur­
rent conception of semantic memory does not seem to fit cleanly into existing descrip­
tions of either declarative (explicit) or nondeclarative (implicit) memory. Rather, our
knowledge about the world and the objects in it appears to rely on both declarative and
nondeclarative memory.

Summary
In this chapter we have briefly summarized a wide variety of data pertaining to the cogni­
tive neuroscience of semantic memory. We reviewed different schemes for characterizing
the organization of semantic memory and argued that the bulk of the evidence converges
to support sensorimotor-based models (which extend sensory-functional theory). Because
these models allow for, and in fact are predicated on, a role for degree and type of experi­
ence (which will necessarily vary by individual and by concept), they are able to accom­
modate a wide variety of observations. Importantly, they can also make specific, testable
predictions regarding experience. Finally, it is important to emphasize that although often
pitted against one another in service of testing specific hypotheses, sensorimotor and cor­
related feature-based models are not at odds with a categorical-like organization. In fact,
both were developed to provide a framework in which a categorical organization can
emerge from commonalities in the way we interact with and experience similar objects.

Page 24 of 40
Semantic Memory

References
Allport, D. A. (1985). Distributed memory, modular subsystems and dysphasia. In S. K.
Newman & R. Epstein (Eds.), Current perspectives in dysphasia (pp. 207–244). Edin­
burgh: Churchill Livingstone.

Amedi, A., Merabet, L., Bermpohl, F., & Pascual-Leone, A. (2005). The occipital cortex in
the blind: Lessons about plasticity and vision. Current Directions in Psychological
Science, 16, 306–311.

Antal, A., Nitsche, M. A., & Paulus, W. (2001). External modulation of visual perception in humans. NeuroReport, 12, 3553–3555.

Antal, A., & Paulus, W. (2008). Transcranial direct current stimulation of visual perception. Perception, 37, 367–374.

Badre, D., & Wagner, A. D. (2007). Left ventrolateral prefrontal cortex and the cognitive
control of memory. Neuropsychologia, 45, 2883–2901.

Barsalou, L. W. (1999). Perceptual symbol systems. Behavioral and Brain Sciences, 22, 577–660.

Bedny, M., Aguirre, G. K., & Thompson-Schill, S. L. (2007). Item analysis in functional
magnetic resonance imaging. NeuroImage, 35, 1093–1102.

Beilock, S. L., Lyons, I. M., Mattarella-Micke, A., Nusbaum, H. C., & Small, S. L. (2008).
Sports experience changes the neural processing of action language. Proceedings of the
National Academy of Sciences, 105, 13269–13273.

Bonner, M. F., & Grossman, M. (2012). Gray matter density of auditory association cortex
relates to knowledge of sound concepts in primary progressive aphasia. Journal of Neuro­
science, 32 (23), 7986–7991.

Bonner, M. F., Vesely, L., Price, C., Anderson, C., Richmond, L., Farag, C., et al. (2009). Re­
versal of the concreteness effect in semantic dementia. Cognitive Neuropsychology, 26,
568–579.

Bindschaedler, C., Peter-Favre, C., Maeder, P., Hirsbrunner, T., & Clarke, S. (2011). Growing up with bilateral hippocampal atrophy: From childhood to teenage. Cortex, 47, 931–944.

Bozeat, S., Lambon Ralph, M. A., Patterson, K., Garrard, P., & Hodges, J. R. (2000). Nonverbal semantic impairment in semantic dementia. Neuropsychologia, 38, 1207–1215.

Bozeat, S., Lambon Ralph, M. A., Patterson, K., & Hodges, J. R. (2002a). The influence of personal familiarity and context on object use in semantic dementia. Neurocase, 8, 127–134.

Bozeat, S., Lambon Ralph, M. A., Patterson, K., & Hodges, J. R. (2002b). When objects lose their meaning: What happens to their use? Cognitive, Affective, & Behavioral Neuroscience, 2, 236–251.

Bozeat, S., Patterson, K., & Hodges, J. R. (2004). Relearning object use in semantic dementia. Neuropsychological Rehabilitation, 14, 351–363.

Breedin, S. D., Saffran, E. M., & Coslett, H. B. (1994). Reversal of the concreteness effect in a patient with semantic dementia. Cognitive Neuropsychology, 11, 617–660.

Bright, P., Moss, H., & Tyler, L. K. (2004). Unitary vs. multiple semantics: PET studies of
word and picture processing. Brain and Language, 89, 417–432.

Buxbaum, L. J., & Saffran, E. M. (2002). Knowledge of object manipulations and object
function: Dissociations in apraxic and nonapraxic subjects. Brain and Language, 82, 179–
199.

Buxbaum, L. J., Veramonti, T., & Schwartz, M. F. (2000). Function and manipulation tool
knowledge in apraxia: Knowing “what for” but not “how.” Neurocase, 6, 83–97.

Caramazza, A. (2000). Minding the facts: A comment on Thompson-Schill et al.’s “A neural basis for category and modality specificity of semantic knowledge.” Neuropsychologia, 38, 944–949.

Caramazza, A., Hillis, A. E., Rapp, B. C., & Romani, C. (1990). The multiple semantics hy­
pothesis: Multiple confusions? Cognitive Neuropsychology, 7, 161–189.

Caramazza, A., & Shelton, J. R. (1998). Domain-specific knowledge systems in the brain: The animate-inanimate distinction. Journal of Cognitive Neuroscience, 10, 1–34.

Casasanto, D. (2009). Embodiment of abstract concepts: Good and bad in right- and left-
handers. Journal of Experimental Psychology: General, 138, 351–367.

Casasanto, D., & Chrysikou, E. G. (2011). When left is “right”: Motor fluency shapes ab­
stract concepts. Psychological Science, 22, 419–422.

Chainay, H., & Humphreys, G. W. (2002). Privileged access to action for objects relative to
words. Psychonomic Bulletin & Review, 9, 348–355.

Chan, A. M., Baker, J. M., Eskandar, E., Schomer, D., Ulbert, I., Marinkovic, K., Cash, S.
C., & Halgren, E. (2011). First-pass selectivity for semantic categories in human an­
teroventral temporal lobe. Journal of Neuroscience, 31, 18119–18129.

Chao, L. L., Haxby, J. V., & Martin, A. (1999). Attribute-based neural substrates for per­
ceiving and knowing about objects. Nature Neuroscience, 2, 913–919.

Chao, L. L., & Martin, A. (2000). Representation of manipulable man-made objects in the
dorsal stream. NeuroImage, 12, 478–484.


Chatterjee, A. (2010). Disembodying cognition. Language and Cognition, 2(1), 79–116.

Chrysikou, E. G., & Thompson-Schill, S. L. (2010). Are all analogies created equal? Pre­
frontal cortical functioning may predict types of analogical reasoning (commentary). Cog­
nitive Neuroscience, 1, 141–142.

Clarke, A., Taylor, K. I., & Tyler, L. K. (2011). The evolution of meaning: Spatiotemporal
dynamics of visual object recognition. Journal of Cognitive Neuroscience, 23, 1887–1899.

Collins, A. M., & Quillian, M. R. (1969). Retrieval time from semantic memory. Journal of
Verbal Learning and Verbal Behavior, 8, 240–247.

Connolly, A. C., Gleitman, L. R., & Thompson-Schill, S. L. (2007). The effect of congenital
blindness on the semantic representation of some everyday concepts. Proceedings of the
National Academy of Sciences, 104, 8241–8246.

Cooper, R. M. (1974). The control of eye fixation by the meaning of spoken language: A
new methodology for the real-time investigation of speech perception, memory, and lan­
guage processing. Cognitive Psychology, 6, 84–107.

Cree, G. S., & McRae, K. (2003). Analyzing the factors underlying the structure and com­
putation of the meaning of chipmunk, cherry, chisel, cheese and cello (and many other
such concrete nouns). Journal of Experimental Psychology: General, 132, 163–201.

Crutch, S. J., & Warrington, E. K. (2005). Abstract and concrete concepts have structural­
ly different representational frameworks. Brain, 128, 615–627.

(p. 370) Damasio, A. R. (1989). The brain binds entities and events by multiregional activation from convergence zones. Neural Computation, 1, 123–132.

Farah, M. J., & McClelland, J. L. (1991). A computational model of semantic memory im­
pairment: modality specificity and emergent category specificity. Journal of Experimental
Psychology: General, 120, 339–357.

Frith, C. (2000). The role of dorsolateral prefrontal cortex in the selection of action as re­
vealed by functional imaging. In S. Monsell & J. Driver (Eds.), Control of cognitive
processes (pp. 549–565). Cambridge, MA: MIT Press.

Funnell, E. (1995a). A case of forgotten knowledge. In R. Campbell & M. A. Conway (Eds.), Broken memories (pp. 225–236). Oxford, UK: Blackwell Publishers.

Funnell, E. (1995b). Objects and properties: A study of the breakdown of semantic memo­
ry. Memory, 3, 497–518.

Funnell, E. (2001). Evidence for scripts in semantic dementia: Implications for theories of
semantic memory. Cognitive Neuropsychology, 18, 323–341.


Gabrieli, J. D. E., Cohen, N. J., & Corkin, S. (1988). The impaired learning of semantic
knowledge following bilateral medial temporal-lobe resection. Brain & Cognition, 7, 151–
177.

Gabrieli, J. D., Poldrack, R. A., & Desmond, J. E. (1998). The role of left prefrontal cortex
in language and memory. Proceedings of the National Academy of Sciences of the United
States of America, 95 (3), 906–913.

Gage, N., & Hickok, G. (2005). Multiregional cell assemblies, temporal binding, and the
representation of conceptual knowledge in cortex: A modern theory by a “classical” neu­
rologist, Carl Wernicke. Cortex, 41, 823–832.

Gainotti, G. (2000). What the locus of brain lesion tells us about the nature of the cogni­
tive defect underlying category-specific disorders: A review. Cortex, 36, 539–559.

Gardiner, J. M., Brandt, K. R., Baddeley, A. D., Vargha-Khadem, F., & Mishkin, M. (2008).
Charting the acquisition of semantic knowledge in a case of developmental amnesia. Neuropsychologia, 46, 2865–2868.

Garrard, P., & Hodges, J. R. (1999). Semantic dementia: Implications for the neural basis
of language and meaning. Aphasiology, 13, 609–623.

Gates, L., & Yoon, M. G. (2005). Distinct and shared cortical regions of the human brain
activated by pictorial depictions versus verbal descriptions: An fMRI study. NeuroImage,
24, 473–486.

Gerlach, C., Law, I., Gade, A., & Paulson, O. B. (2000). Categorization and category effects
in normal object recognition: A PET study. Neuropsychologia, 38, 1693–1703.

Goldberg, R. F., Perfetti, C. A., & Schneider, W. (2006). Perceptual knowledge retrieval ac­
tivates sensory brain regions. Journal of Neuroscience, 26, 4917–4921.

Gonnerman, L. M., Andersen, E. S., Devlin, J. T., Kempler, D., & Seidenberg, M. S. (1997).
Double dissociation of semantic categories in Alzheimer’s disease. Brain and Language,
57, 254–279.

Gonzalez, J., Barros-Loscertales, A., Pulvermuller, F., Meseguer, V., Sanjuan, A., Belloch,
V., et al. (2006). Reading cinnamon activates olfactory brain regions. NeuroImage, 32,
906–912.

Grabowski, T. J., Damasio, H., Tranel, D., Boles Ponto, L. L., Hichwa, R. D., & Damasio, A.
R. (2001). A role for left temporal pole in the retrieval of words for unique entities. Hu­
man Brain Mapping, 13, 199–212.

Graham, K. S., Lambon Ralph, M. A., & Hodges, J. R. (1997). Determining the impact of autobiographical experience on “meaning”: New insights from investigating sports-related vocabulary and knowledge in two cases with semantic dementia. Cognitive Neuropsychology, 14, 801–837.

Graham, K. S., Lambon Ralph, M. A., & Hodges, J. R. (1999). A questionable semantics:
The interaction between semantic knowledge and autobiographical experience in seman­
tic dementia. Cognitive Neuropsychology, 16, 689–698.

Graham, K. S., Simons, J. S., Pratt, K. H., Patterson, K., & Hodges, J. R. (2000). Insights
from semantic dementia on the relationship between episodic and semantic memory. Neu­
ropsychologia, 38, 313–324.

Greve, A., van Rossum, M. C. W., & Donaldson, D. I. (2007). Investigating the functional
interaction between semantic and episodic memory: Convergent behavioral and electro­
physiological evidence for the role of familiarity. NeuroImage, 34, 801–814.

Grill-Spector, K., & Malach, R. (2001). fMR-adaptation: A tool for studying the functional
properties of human cortical neurons. Acta Psychologica, 107, 293–321.

Grossman, M., Koenig, P., DeVita, C., Glosser, G., Alsop, D., Detre, J., & Gee, J. (2002). The
neural basis for category-specific knowledge: An fMRI study. NeuroImage, 15, 936–948.

Hart, J., & Kraut, M. A. (2007). Neural hybrid model of semantic object memory (version
1.1). In J. Hart & M. A. Kraut (Eds.), Neural basis of semantic memory (pp. 331–359). New
York: Cambridge University Press.

Hauk, O., Johnsrude, I., & Pulvermuller, F. (2004). Somatotopic representation of action
words in human motor and premotor cortex. Neuron, 41, 301–307.

Hauk, O., Shtyrov, Y., & Pulvermuller, F. (2008). The time course of action and action-word comprehension in the human brain as revealed by neurophysiology. Journal of Physiology—Paris, 102, 50–58.

Hillis, A., & Caramazza, A. (1995). Cognitive and neural mechanisms underlying visual
and semantic processing: Implication from “optic aphasia.” Journal of Cognitive Neuro­
science, 7, 457–478.

Hodges, J. R., & Patterson, K. (2007). Semantic dementia: A unique clinicopathological syndrome. The Lancet Neurology, 6, 1004–1014.

Hodges, J. R., Patterson, K., Oxbury, S., & Funnell, E. (1992). Semantic dementia: Pro­
gressive fluent aphasia with temporal lobe atrophy. Brain, 115, 1783–1806.

Hodges, J. R., Patterson, K., Ward, R., Garrard, P., Bak, T., Perry, R., & Gregory, C. (1999).
The differentiation of semantic dementia and frontal lobe dementia (temporal and frontal
variants of frontotemporal dementia) from early Alzheimer’s disease: A comparative neu­
ropsychological study. Neuropsychology, 13, 31–40.


Hoenig, K., Müller, C., Herrnberger, B., Spitzer, M., Ehret, G., & Kiefer, M. (2011). Neuro­
plasticity of semantic maps for musical instruments in professional musicians. NeuroI­
mage, 56, 1714–1725.

Hoenig, K., Sim, E.-J., Bochev, V., Herrnberger, B., & Kiefer, M. (2008). Conceptual flexi­
bility in the human brain: Dynamic recruitment of semantic maps from visual, motion and
motor-related areas. Journal of Cognitive Neuroscience, 20, 1799–1814.

(p. 371) Hoffman, P., & Lambon Ralph, M. A. (2011). Reverse concreteness effects are not a typical feature of semantic dementia: Evidence for the hub-and-spoke model of conceptual representation. Cerebral Cortex, 21, 2103–2112.

Hoffman, P., Rogers, T. T., & Lambon Ralph, M. A. (2011). Semantic diversity accounts for
the “missing” word frequency effect in stroke aphasia: Insights using a novel method to
quantify contextual variability in meaning. Journal of Cognitive Neuroscience, 23, 2432–
2446.

Hsu, N. S., Kraemer, D. J. M., Oliver, R. T., Schlichting, M. L., & Thompson-Schill, S. L.
(2011). Color, context, and cognitive style: Variations in color knowledge retrieval as a
function of task and subject variables. Journal of Cognitive Neuroscience, 23, 2554–2557.

Huettig, F., & Altmann, G. T. M. (2005). Word meaning and the control of eye fixation: Se­
mantic competitor effects and the visual world paradigm. Cognition, 96, B23–B32.

Humphreys, G. W., & Forde, E. M. (2001). Hierarchies, similarity, and interactivity in ob­
ject recognition: “Category-specific” neuropsychological deficits. Behavioral and Brain
Sciences, 24, 453–476.

Humphreys, G. W., Riddoch, M. J., & Quinlan, P. T. (1988). Cascade processes in picture
identification. Cognitive Neuropsychology, 5, 67–103.

Ishai, A., Ungerleider, L. G., & Haxby, J. V. (2000). Distributed neural systems for the generation of visual images. Neuron, 28, 979–990.

Jefferies, E., Patterson, K., Jones, R. W., & Lambon Ralph, M. A. (2009). Comprehension of concrete and abstract words in semantic dementia. Neuropsychology, 23, 492–499.

Kable, J. W., Kan, I. P., Wilson, A., Thompson-Schill, S. L., & Chatterjee, A. (2005). Concep­
tual representations of action in lateral temporal cortex. Journal of Cognitive Neuro­
science, 17, 855–870.

Kan, I. P., Alexander, M. P., & Verfaellie, M. (2009). Contribution of prior semantic knowl­
edge to new episodic learning in amnesia. Journal of Cognitive Neuroscience, 21, 938–
944.

Kan, I. P., Kable, J. W., Van Scoyoc, A., Chatterjee, A., & Thompson-Schill, S. L. (2006).
Fractionating the left frontal response to tools: Dissociable effects of motor experience
and lexical competition. Journal of Cognitive Neuroscience, 18, 267–277.


Kan, I. P., & Thompson-Schill, S. L. (2004). Effect of name agreement on prefrontal activi­
ty during overt and covert picture naming. Cognitive, Affective, & Behavioral Neuro­
science, 4, 43–57.

Kellenbach, M. L., Brett, M., & Patterson, K. (2001). Large, colorful or noisy? Attribute-
and modality-specific activations during retrieval of perceptual attribute knowledge. Cog­
nitive, Affective, & Behavioral Neuroscience, 1, 207–221.

Kellenbach, M., Brett, M., & Patterson, K. (2003). Actions speak louder than functions:
The importance of manipulability and action in tool representation. Journal of Cognitive
Neuroscience, 15, 30–46.

Kiefer, M., Sim, E.-J., Herrnberger, B., Grothe, J., & Hoenig, K. (2008). The sound of concepts: Four markers for a link between auditory and conceptual brain systems. Journal of Neuroscience, 28, 12224–12230.

Kiefer, M., Sim, E.-J., Liebich, S., Hauk, O., & Tanaka, J. (2007). Experience-dependent
plasticity of conceptual representations in human sensory-motor areas. Journal of Cogni­
tive Neuroscience, 19, 525–542.

Kosslyn, S. M., & Thompson, W. L. (2000). Shared mechanisms in visual imagery and visu­
al perception: Insights from cognitive neuroscience. In M. S. Gazzaniga (Ed.), The new
cognitive neurosciences (2nd ed., pp. 975–985). Cambridge, MA: MIT Press.

Lakoff, G., & Johnson, M. (1999). Philosophy in the flesh: The embodied mind and its chal­
lenge to Western thought. New York: Basic Books.

Lambon Ralph, M. A., Lowe, C., & Rogers, T. T. (2007). Neural basis of category-specific semantic deficits for living things: Evidence from semantic dementia, HSVE and a neural network model. Brain, 130 (Pt 4), 1127–1137.

Mahon, B. Z., Anzellotti, S., Schwarzbach, J., Zampini, M., & Caramazza, A. (2009). Cate­
gory-specific organization in the human brain does not require visual experience. Neuron,
63, 397–405.

Mahon, B. Z., & Caramazza, A. (2003). Constraining questions about the organization &
representation of conceptual knowledge. Cognitive Neuropsychology, 20, 433–450.

Mahon, B. Z., & Caramazza, A. (2008). A critical look at the embodied cognition hypothe­
sis and a new proposal for grounding conceptual content. Journal of Physiology—Paris,
102, 59–70.

Martin, A. (2007). The representation of object concepts in the brain. Annual Review of
Psychology, 58, 25–45.

Martin, A., & Chao, L. L. (2001). Semantic memory and the brain: Structure and process­
es. Current Opinion in Neurobiology, 11, 194–201.


Martin, A., Haxby, J. V., Lalonde, F. M., Wiggs, C. L., & Ungerleider, L. G. (1995). Discrete
cortical regions associated with knowledge of color and knowledge of action. Science,
270, 102–105.

Martin, A., Wiggs, C. L., Ungerleider, L. G., & Haxby, J. V. (1996). Neural correlates of cat­
egory-specific knowledge. Nature, 379, 649–652.

McClelland, J. L., & Rogers, T. T. (2003). The parallel distributed processing approach to
semantic cognition. Nature Reviews Neuroscience, 4, 310–322.

McRae, K., de Sa, V. R., & Seidenberg, M. S. (1997). On the nature and scope of featural
representations of word meaning. Journal of Experimental Psychology: General, 126, 99–
130.

Mechelli, A., Price, C. J., Friston, K. J., & Ishai, A. (2004). Where bottom-up meets top-
down: Neuronal interactions during perception and imagery. Cerebral Cortex, 14, 1256–
1265.

Mesulam, M. M., Grossman, M., Hillis, A., Kertesz, A., & Weintraub, S. (2003). The core
and halo of primary progressive aphasia and semantic dementia. Annals of Neurology, 54,
S11–S14.

Miller, E. K., & Cohen, J. D. (2001). An integrative theory of prefrontal cortex function. An­
nual Review of Neuroscience, 24, 167–202.

Mirman, D., & Graziano, K. M. (2012). Damage to temporoparietal cortex decreases inci­
dental activation of thematic relations during spoken word comprehension. Neuropsy­
chologia, 50, 1990–1997.

Mitchell, T. M., Shinkareva, S. V., Carlson, A., Chang, K.-M., Malave, V. L., Mason, R. A.,
& Just, M. A. (2008). Predicting human brain activity associated with the meanings of
nouns. Science, 320, 1191–1195.

Moore, C. J., & Price, C. J. (1999). A functional neuroimaging study of the variables that
generate category-specific object processing differences. Brain, 122, 943–962.

Mummery, C. J., Patterson, K., Hodges, J. R., & Price, C. J. (1998). Functional neuroanato­
my of the semantic system: Divisible by what? Journal of Cognitive Neuroscience, 10,
766–777.

(p. 372) Mummery, C. J., Patterson, K., Wise, R. J. S., Vandenbergh, R., Price, C. J., &
Hodges, J. R. (1999). Disrupted temporal lobe connections in semantic dementia. Brain,
122, 61–73.

Myung, J., Blumstein, S. E., Yee, E., Sedivy, J. C., Thompson-Schill, S. L., & Buxbaum, L. J.
(2010). Impaired access to manipulation features in apraxia: Evidence from eyetracking
and semantic judgment tasks. Brain and Language, 112, 101–112.


Noppeney, U. (2009). The sensory-motor theory of semantics: Evidence from functional imaging. Language and Cognition, 1(2), 249–276.

Noppeney, U., Friston, K., & Price, C. (2003). Effects of visual deprivation on the organiza­
tion of the semantic system. Brain, 126, 1620–1627.

Noppeney, U., Patterson, K., Tyler, L. K., Moss, H., Stamatakis, E. A., Bright, P., Mummery,
C., & Price, C. J. (2007). Temporal lobe lesions and semantic impairment: A comparison of
herpes simplex virus encephalitis and semantic dementia. Brain, 130 (Pt 4), 1138–1147.

Noppeney, U., & Price, C. J. (2002). Retrieval of visual, auditory, and abstract semantics.
NeuroImage, 15, 917–926.

Noppeney, U., & Price, C. J. (2004). Retrieval of abstract semantics. NeuroImage, 22, 164–
170.

Noppeney, U., Price, C. J., Friston, K. J., & Penny, W. D. (2006). Two distinct neural mecha­
nisms for category-selective responses. Cerebral Cortex, 16, 437–445.

Norman, K. A., Polyn, S. M., Detre, G. J., & Haxby, J. V. (2006). Beyond mind-reading: Mul­
ti-voxel pattern analysis of fMRI data. Trends in Cognitive Sciences, 10, 424–430.

Okada, T., Tanaka, S., Nakai, T., Nishiwaza, S., Inui, T., Sadato, N., Yonekura, Y., & Kon­
ishi, J. (2000). Naming of animals and tools: A functional magnetic resonance imaging
study of categorical differences in the human brain areas commonly used for naming vi­
sually presented objects. Neuroscience Letters, 296, 33–36.

O’Kane, G., Kensinger, E. A., & Corkin, S. (2004). Evidence for semantic learning in profound amnesia: An investigation with patient H.M. Hippocampus, 14, 417–425.

Oliver, R. T., Geiger, E. J., Lewandowski, B. C., & Thompson-Schill, S. L. (2009). Remem­
brance of things touched: How sensorimotor experience affects the neural instantiation of
object form. Neuropsychologia, 47, 239–247.

Oliver, R. T., Parsons, M. A., & Thompson-Schill, S. L. (2008). Hands on learning: Varia­
tions in sensorimotor experience alter the cortical response to newly learned objects. San
Francisco: Cognitive Neuroscience Society.

Paivio, A. (1969). Mental imagery in associative learning and memory. Psychological Review, 76, 241–263.

Paivio, A. (1971). Imagery and verbal processes. New York: Holt, Rinehart, and Winston.

Paivio, A. (1978). The relationship between verbal and perceptual codes. In E. C. Carterette & M. P. Friedman (Eds.), Handbook of perception (Vol. 8, pp. 375–397). London: Academic Press.

Paivio, A. (1991). Dual coding theory: Retrospect and current status. Canadian Journal of
Psychology, 45, 255–287.

Patterson, K., Lambon Ralph, M. A., Jefferies, E., Woollams, A., Jones, R., Hodges, J. R., &
Rogers, T. T. (2006). “Presemantic” cognition in semantic dementia: Six deficits in search
of an explanation. Journal of Cognitive Neuroscience, 18, 169–183.

Patterson, K., Nestor, P. J., & Rogers, T. T. (2007). Where do you know what you know?
The representation of semantic knowledge in the human brain. Nature Reviews Neuro­
science, 8, 976–987.

Perani, D., Cappa, S. F., Bettinardi, V., Bressi, S., Gorno-Tempini, M., Matarrese, M., & Fazio, F. (1995). Different neural systems for the recognition of animals and man-made tools. NeuroReport, 6, 1637–1641.

Phillips, J. A., Noppeney, U., Humphreys, G. W., & Price, C. J. (2002). Can segregation
within the semantic system account for category specific deficits? Brain, 125, 2067–2080.

Plaut, D. C. (2002). Graded modality-specific specialization in semantics: A computational account of optic aphasia. Cognitive Neuropsychology, 19, 603–639.

Pobric, G., Jefferies, E., & Lambon Ralph, M. A. (2010). Induction of category-specific vs.
general semantic impairments in normal participants using rTMS. Current Biology, 20,
964–968.

Price, C. J., Noppeney, U., Phillips, J. A., & Devlin, J. T. (2003). How is the fusiform gyrus
related to category-specificity? Cognitive Neuropsychology, 20 (3-6), 561–574.

Pylyshyn, Z. W. (1973). What the mind’s eye tells the mind’s brain: A critique of mental
imagery. Psychological Bulletin, 80, 1–24.

Raposo, A., Moss, H. E., Stamatakis, E. A., & Tyler, L. K. (2009). Modulation of motor and
premotor cortices by actions, action words and action sentences. Neuropsychologia, 47,
388–396.

Riddoch, M. J., & Humphreys, G. W. (1987). Visual object processing in a case of optic
aphasia: A case of semantic access agnosia. Cognitive Neuropsychology, 4, 131–185.

Rissman, J., & Wagner, A. D. (2012). Distributed representations in memory: Insights from
functional brain imaging. Annual Review of Psychology, 63, 101–128.

Rizzolatti, G., & Craighero, L. (2004). The mirror-neuron system. Annual Review of Neuroscience, 27, 169–192.

Rodriguez-Ferreiro, J., Gennari, S. P., Davies, R., & Cuetos, F. (2011). Neural correlates of
abstract verb processing. Journal of Cognitive Neuroscience, 23, 106–118.

Rogers, T. T., Hocking, J., Mechelli, A., Patterson, K., & Price, C. (2005). Fusiform activa­
tion to animals is driven by the process, not the stimulus. Journal of Cognitive Neuro­
science, 17, 434–445.


Rogers, T. T., Hocking, J., Noppeney, U., Mechelli, A., Gorno-Tempini, M., Patterson, K., &
Price, C. (2006). The anterior temporal cortex and semantic memory: Reconciling find­
ings from neuropsychology and functional imaging. Cognitive, Affective and Behavioral
Neuroscience, 6, 201–213.

Rogers, T. T., Ivanoiu, A., Patterson, K., & Hodges, J. R. (2006). Semantic memory in
Alzheimer’s disease and the frontotemporal dementias: A longitudinal study of 236 pa­
tients. Neuropsychology, 20, 319–335.

Rogers, T. T., Lambon Ralph, M. A., Garrard, P., Bozeat, S., McClelland, J. L., Hodges, J. R.,
& Patterson, K. (2004). The structure and deterioration of semantic memory: A neuropsy­
chological and computational investigation. Psychological Review, 111, 205–235.

Rogers, T. T., & Patterson, K. (2007). Object categorization: Reversals and explanations of
the basic-level advantage. Journal of Experimental Psychology: General, 136, 451–469.

Rumiati, R. I., & Humphreys, G. W. (1998). Recognition by action: Dissociating visual and
semantic routes to action in normal observers. Journal of Experimental Psychology: Hu­
man Perception and Performance, 24, 631–647.

Saffran, E. M., Coslett, H. B., & Keener, M. T. (2003b). Differences in word associations to
pictures and words. Neuropsychologia, 41, 1541–1546.

Saffran, E. M., Coslett, H. B., Martin, N., & Boronat, C. (2003a). Access to (p. 373) knowledge from pictures but not words in a patient with progressive fluent aphasia. Language and Cognitive Processes, 18, 725–757.

Saffran, E. M., & Schwartz, M. F. (1994). Of cabbages and things: Semantic memory from
a neuropsychological perspective—A tutorial review. In C. Umilta & M. Moscovitch (Eds.),
Attention and performance XV (pp. 507–536). Cambridge, MA: MIT Press.

Schreuder, R., Flores D’Arcais, G. B., & Glazenborg, G. (1984). Effects of perceptual and
conceptual similarity in semantic priming. Psychological Research, 45, 339–354.

Sevostianov, A., Horwitz, B., Nechaev, V., Williams, R., Fromm, S., & Braun, A. R. (2002).
fMRI study comparing names versus pictures for objects. Human Brain Mapping, 16, 168–
175.

Shimamura, A. P. (2000). The role of the prefrontal cortex in dynamic filtering. Psychobi­
ology, 28, 207–218.

Simmons, K., & Barsalou, L.W. (2003). The similarity-in-topography principle: Reconciling
theories of conceptual deficits. Cognitive Neuropsychology, 20, 451–486.

Simmons, W., Ramjee, V., Beauchamp, M., McRae, K., Martin, A., & Barsalou, L. (2007). A
common neural substrate for perceiving and knowing about color. Neuropsychologia, 45,
2802–2810.

Sirigu, A., Duhamel, J. R., & Poncet, M. (1991). The role of sensorimotor experience in ob­
ject recognition: A case of multimodal agnosia. Brain, 114, 2555–2573.

Snowden, J. S., Bathgate, D., Varma, A., Blackshaw, A., Gibbons, Z. C., & Neary, D. (2001).
Distinct behavioral profiles in frontotemporal dementia and semantic dementia. Journal of
Neurology, Neurosurgery, & Psychiatry, 70, 323–332.

Snowden, J. S., Goulding, P. J., & Neary, D. (1989). Semantic dementia: A form of circum­
scribed cerebral atrophy. Behavioural Neurology, 2, 167–182.

Snowden, J. S., Griffiths, H. L., & Neary, D. (1994). Semantic dementia: Autobiographical
contribution to preservation of meaning. Cognitive Neuropsychology, 11, 265–288.

Snowden, J. S., Griffiths, H. L., & Neary, D. (1996). Semantic–episodic memory interactions
in semantic dementia: Implications for retrograde memory function. Cognitive Neuropsy­
chology, 13, 1101–1137.

Snowden, J. S., Griffiths, H. L., & Neary, D. (1999). The impact of autobiographical experi­
ence on meaning: Reply to Graham, Lambon Ralph, and Hodges. Cognitive Neuropsychol­
ogy, 11, 673–687.

Spivey, M. J. (2007). The continuity of mind. New York: Oxford University Press.

Squire, L. R. (1987). Memory and brain. New York: Oxford University Press.

Squire, L. R., & Zola, S. M. (1998). Episodic memory, semantic memory, and amnesia. Hip­
pocampus, 8, 205–211.

Tanaka, J. M., & Presnell, L. M. (1999). Color diagnosticity in object recognition. Percep­
tion & Psychophysics, 61, 1140–1153.

Tanenhaus, M. K., Spivey-Knowlton, M. J., Eberhard, K. M., & Sedivy, J. C. (1995). Integra­
tion of visual and linguistic information in spoken language comprehension. Science, 268
(5217), 1632–1634.

Tarr, M. J., & Gauthier, I. (2000). FFA: A flexible fusiform area for subordinate-level visual
processing automatized by expertise. Nature Neuroscience, 3, 764–769.

Taylor, L. J., & Zwaan, R. A. (2009). Action in cognition: The case of language. Language
and Cognition, 1, 45–58.

Thompson-Schill, S. L. (2003). Neuroimaging studies of semantic memory: Inferring how from where. Neuropsychologia, 41, 280–292.

Thompson-Schill, S. L., Aguirre, G. K., D’Esposito, M., & Farah, M. J. (1999). A neural ba­
sis for category and modality specificity of semantic knowledge. Neuropsychologia, 37,
671–676.

Thompson-Schill, S. L., Bedny, M., & Goldberg, R. F. (2005). The frontal lobes and the reg­
ulation of mental activity. Current Opinion in Neurobiology, 15, 219–224.

Thompson-Schill, S. L., D’Esposito, M., Aguirre, G. K., & Farah, M. J. (1997). Role of left
prefrontal cortex in retrieval of semantic knowledge: A re-evaluation. Proceedings of the
National Academy of Sciences, 94, 14792–14797.

Thompson-Schill, S. L., D’Esposito, M., & Kan, I. P. (1999). Effects of repetition and com­
petition on prefrontal activity during word generation. Neuron, 23, 513–522.

Thompson-Schill, S. L., Swick, D., Farah, M. J., D’Esposito, M., Kan, I. P., & Knight, R. T.
(1998). Verb generation in patients with focal frontal lesions: A neuropsychological test of
neuroimaging findings. Proceedings of the National Academy of Sciences, 95, 15855–
15860.

Trumpp, N., Kliese, D., Hoenig, K., Haarmaier, T., & Kiefer, M. (2013). A causal link be­
tween hearing and word meaning: Damage to auditory association cortex impairs the pro­
cessing of sound-related concepts. Cortex, 49, 474–486.

Tulving, E. (1972). Episodic and semantic memory. In E. Tulving & W. Donaldson (Eds.),
Organization of memory (pp. 381–403). New York: Academic Press.

Tulving, E. (1991). Concepts of human memory. In L. Squire, G. Lynch, N. M. Weinberger, & J. L. McGaugh (Eds.), Memory: Organization and locus of change (pp. 3–32). New York: Oxford University Press.

Tyler, L. K., & Moss, H. E. (2001). Towards a distributed account of conceptual knowl­
edge. Trends in Cognitive Sciences, 5, 244–252.

Tyler, L. K., Moss, H. E., Durrant-Peatfield, M. R., & Levy, J. P. (2000). Conceptual struc­
ture and the structure of concepts: A distributed account of category-specific deficits.
Brain & Language, 75, 195–231.

Tyler, L. K., Stamatakis, E. A., Bright, P., Acres, K., Abdalah, S., Rodd, J. M., & Moss, H. E.
(2004). Processing objects at different levels of specificity. Journal of Cognitive Neuro­
science, 16, 351–362.

Vandenberghe, R., Price, C., Wise, R., Josephs, O., & Frackowiak, R. S. J. (1996). Function­
al anatomy of a common semantic system for words and pictures. Nature, 383, 254–256.

Vargha-Khadem, F., Gadian, D. G., Watkins, K. E., Connelly, A., Van Paesschen, W., & Mishkin, M. (1997). Differential effects of early hippocampal pathology on episodic and semantic memory. Science, 277, 376–380.

Walsh, V., & Pascual-Leone, A. (2003). Transcranial magnetic stimulation: A neurochronometrics of mind. Cambridge, MA: MIT Press.

Warrington, E. K. (1975). The selective impairment of semantic memory. Quarterly Journal of Experimental Psychology, 27, 635–657.

Warrington, E. K., & McCarthy, R. A. (1983). Category specific access dysphasia. Brain,
106, 869–878.

Warrington, E. K., & McCarthy, R. A. (1987). Categories of knowledge: Further fractionation and an attempted integration. Brain, 110, 1273–1296.

Warrington, E. K., & McCarthy, R. A. (1994). Multiple meaning systems in the (p. 374) brain: A case for visual semantics. Neuropsychologia, 32, 1465–1473.

Warrington, E. K., & Shallice, T. (1984). Category specific semantic impairments. Brain,
107, 829–854.

Weber, M., Thompson-Schill, S. L., Osherson, D., Haxby, J., & Parsons, L. (2009). Predict­
ing judged similarity of natural categories from their neural representations. Neuropsy­
chologia, 47, 859–868.

Weisberg, J., Turennout, M., & Martin, A. (2007). A neural system for learning about ob­
ject function. Cerebral Cortex. 17, 513–521.

Wiggett, A. J., Pritchard, I. C., & Downing, P. E. (2009). Animate and inanimate objects in
human visual cortex: evidence for task-independent category effects. Neuropsychologia,
47, 3111–3117.

Willems, R. M., Hagoort, P., & Casasanto, D. (2010). Body-specific representations of ac­
tion verbs: Neural evidence from right- and left-handers. Psychological Science, 21, 67–
74.

Wise, R. J., Chollet, F., Hadar, U., Friston, K., Hoffner, E., & Frackowiak, R. (1991). Distri­
bution of cortical neural networks involved in word comprehension and word retrieval.
Brain, 114 (Pt 4), 1803–1817.

Wise, R. J. S., Howard, D., Mummery, C. J., Fletcher, P., Leff, A., Buchel, C., & Scott, S. K.
(2000). Noun imageability and the temporal lobes. Neuropsychologia, 38, 985–994.

Witt, J. K., Kemmerer, D., Linkenauger, S. A., & Culham, J. (2010). A functional role for
motor simulation in naming tools. Psychological Science, 21, 1215–1219.

Wolk, D. A., Coslett, H. B., & Glosser, G. (2005). The role of sensory-motor information in
object recognition: Evidence from category-specific visual agnosia. Brain and Language,
94, 131–146.

Wright, N. D., Mechelli, A., Noppeney, U., Veltman, D. J., Rombouts, S. A. R. B., Glensman,
J., Haynes, J. D., & Price, C. J. (2008). Selective activation around the left occipito-tempo­
ral sulcus for words relative to pictures: Individual variability or false positives? Human
Brain Mapping, 29, 986–1000.

Yee, E., Chrysikou, E., Hoffman, E., & Thompson-Schill, S. L. (2013). Manual experience
shapes object representations. Psychological Science, 24 (6), 909–919.

Yee, E., Drucker, D. M., & Thompson-Schill, S. L. (2010). fMRI-adaptation evidence of overlapping neural representations for objects related in function or manipulation. NeuroImage, 50, 753–763.

Yee, E., Hufstetler, S., & Thompson-Schill, S. L. (2011). Function follows form: Activation
of shape and function features during object identification. Journal of Experimental Psy­
chology: General, 140, 348–363.

Yee, E., & Sedivy, J. C. (2006). Eye movements to pictures reveal transient semantic acti­
vation during spoken word recognition. Journal of Experimental Psychology: Learning,
Memory, and Cognition, 32, 1–14.

Notes:

(1). Linguists use the term semantic in a related, but slightly narrower way—to refer to
the meanings of words or phrases.

(2). There is mounting evidence that the reverse may also be true: semantic memory has
been found to support episodic memory acquisition (Kan et al., 2009) and retrieval (Gra­
ham et al., 2000; Greve et al., 2007).

(3). These ideas about the relationship between knowledge and experience echo those of
much earlier thinkers. For example, in “An Essay Concerning Human Understanding,”
John Locke considers the origin of “ideas,” or what we now refer to as “concepts,” such
as “whiteness, hardness, sweetness, thinking, motion, elephant …”, arguing: “Whence
comes [the mind] by that vast store, which the busy and boundless fancy of man has
painted on it with an almost endless variety? … To this I answer, in one word, From expe­
rience.” Furthermore, in their respective works on aphasia, Wernicke (1874) and Freud
(1891) both put forth similar ideas (Gage & Hickok, 2005).

(4). Recall that the domain-specific hypothesis allows for distributed representations
within different categories.

(5). Moreover (returning to the task effects discussed in 4.3), it has been suggested that
the presence or absence of direct overlap may reflect the existence of multiple types of
color representations that vary in resolution (or abstraction) with differences in task-con­
text influencing whether information is retrieved at a fine (high-resolution) level of detail
or a more abstract level. Retrieving high- (but not necessarily low-) resolution color
knowledge results in overlap with color perception regions (Hsu et al., 2011).

(6). We use the word “epiphenomenal” here to remain consistent with the objections that
are sometimes raised in this literature; however, we note that the literal translation of the
meaning of this term (i.e., an event with no effectual consequence) may not be suited to
descriptions of neural activity, which can always be described as having an effect on its
efferent targets.

(7). Note that an analogous critique—and importantly, a response analogous to the one
that follows—could be made for any sensorimotor modality.

Eiling Yee

Eiling Yee is a staff scientist at the Basque Center on Cognition, Brain and Language.

Evangelia G. Chrysikou

Evangelia G. Chrysikou, Department of Psychology, University of Kansas, Lawrence, KS

Sharon L. Thompson-Schill

Sharon L. Thompson-Schill, Department of Psychology, University of Pennsylvania, Philadelphia, PA


Cognitive Neuroscience of Episodic Memory  


Lila Davachi and Jared Danker
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn

Print Publication Date: Dec 2013 Subject: Psychology, Cognitive Neuroscience


Online Publication Date: Dec 2013 DOI: 10.1093/oxfordhb/9780199988693.013.0018

Abstract and Keywords

This chapter examines the neural underpinnings of episodic memory, focusing on process­
es taking place during the experience itself, or encoding, and those involved in memory
reactivation or retrieval at a later time point. It first looks at the neuropsychological case
study of Henry Molaison to show how the medial temporal lobe (MTL) is linked to the for­
mation of new episodic memories. It also assesses the critical role of the MTL in general
and the hippocampus in particular in the encoding and retrieval of episodic memories.
The chapter then describes the memory as reinstatement model, episodic memory re­
trieval, the use of functional neuroimaging as a tool to probe episodic memory, and the
difference in memory paradigm. Furthermore, it discusses hippocampal activity during
encoding and its connection to associative memory, activation of the hippocampus and
cortex during episodic memory retrieval, and how hippocampal reactivation mediates cor­
tical reactivation during memory retrieval.

Keywords: episodic memory, medial temporal lobe, Henry Molaison, memory retrieval, encoding, memory reacti­
vation, hippocampus, memory as reinstatement model, functional neuroimaging, associative memory

Introduction
What did she say? Where did you go? What did you eat? The answers to questions like
these require that you access previous experiences, or episodes of your life. Episodic
memory is memory for our unique past experiences that unfolded in a particular time and
place. By contrast, other forms of memory do not require that you access a particular
time and place from the past, such as knowing what a giraffe is (semantic memory) or
knowing how to ride a bike (procedural memory). Thus, episodic memory is, in a sense, a
collection of our personal experiences, the past that still resides inside of us and makes
up much of the narrative of our lives.

In fact, the distinction between episodic remembering and semantic knowing was de­
scribed by Tulving (1972): “Episodic memory refers to memory for personal experiences
and their temporal relations, while semantic memory is a system for receiving, retaining,

Page 1 of 26
Cognitive Neuroscience of Episodic Memory

and transmitting information about meaning of words, concepts, and classification of
concepts.” (pp. 401–402). And when William James (1890) described memory, he was think­
ing of episodic memory in particular. “Memory proper, or secondary memory as it might
be styled, is the knowledge of a former state of mind after it has already once dropped
from consciousness; or rather it is the knowledge of an event, or fact, of which meantime
we have not been thinking, with the additional consciousness that we have thought or ex­
perienced it before” (p. 610).

Figure 18.1 Memory as reinstatement (MAR) model.


During an experience, the representation of that ex­
perience in the brain is a distributed pattern of brain
activation across cortical and subcortical regions
that represent our sensory-perceptual experience,
actions, and cognitive and emotional states. When an
experience is encoded, it is theorized that changes in
the connections within the hippocampus and be­
tween the hippocampus and the cortex serve to bind
together the different aspects of an experience into a
coherent episodic memory (A). Subsequently, when a
partial cue activates part of the cortical representa­
tion (B), these encoding-related changes in neural
connections enable activation to spread to the hip­
pocampus, and via pattern completion, to reactivate
the original pattern of activity first in the hippocam­
pus (C) and then in the cortex (D). It is theorized that
this sequence of neural events supports episodic
memory retrieval.

In this chapter, we describe the current status of our understanding of the neural under­
pinnings of episodic memory, focusing on processes taking place during the experience it­
self, or encoding, and those involved in reactivating or retrieving memories (p. 376) at a
later time point. To facilitate and organize the discussion, we present a current model of
memory formation and retrieval that, perhaps arguably, has received the most support to
date (Figure 18.1). At the very least, it is the model that motivates and shapes the design
of most current investigations into episodic memory processing in the brain.

Setting the Stage: Early Insights from Patient Work
It is wise to start a discussion of memory with the most well-known neuropsychological
case study of Henry Molaison, famously known as patient H.M. At the young age of 27
years, H.M. opted for surgical removal of his medial temporal lobe (MTL) bilaterally to re­
lieve the severe epilepsy from which he had suffered since he was a young child. The
surgery was a success in its intended goal: H.M.’s seizures had been relieved. However,
there was a devastating unintended consequence. H.M. was not able to form any new
episodic memories after the removal. This inability to form new memories is known as an­
terograde amnesia. The specificity of the deficit was remarkable. His intelligence was in­
tact, and he appeared to have normal working memory and skill learning, but he was con­
fined to living in the moment because the present disappeared into the past without a
trace.

With this one case, it became clear that the MTL is absolutely critical for the formation of
new episodic memories. Although there is some debate regarding the extent of presurgi­
cal memory loss (retrograde amnesia) that he exhibited and whether H.M. was able to
learn new semantic information, it is widely agreed that his main deficit was the inability to
form new episodic memories. Thus, this major discovery set the stage for systems neuro­
scientists to begin to examine the MTL and how it contributes to episodic memory forma­
tion and retrieval.

The MTL regions damaged in this now-famous surgery were the hippocampus as well as
portions of the underlying cortex: entorhinal, perirhinal, and posterior parahippocampal
cortices. Since this discovery, it has been demonstrated that other patients (p. 377) with
damage to the hippocampus and MTL cortical structures also exhibit anterograde amne­
sia. Of current focus and highlighted in this chapter is the critical role of the MTL in gen­
eral and the hippocampus in particular in the encoding and retrieval of episodic memo­
ries.

Memory as Reinstatement Model


The memory as reinstatement (MAR) model presented in Figure 18.1 represents a combi­
nation of current theory and knowledge regarding how episodic memories are formed
and subsequently accessed. It consists of elements drawn from many different models,
and should not be considered a new model but rather a summary of the common ele­
ments of many existing models (e.g., Alvarez & Squire, 1994; McClelland et al.,
1995; Moscovitch et al., 2005; Norman & O’Reilly, 2003). We will first present the model,
and following that, we will discuss the aspects of the model for which strong evidence ex­
ists and point out aspects of the model that are still somewhat lacking in empirical sup­
port.

During an experience (see Figure 18.1A), the representation of that experience in the
brain is characterized by distributed cortical and subcortical patterns of neural activation
that represent our sensory-perceptual (visual, auditory, somatosensory) experience, ac­
tions, internal thoughts, and emotions. Thus, in a sense, the brain is processing and rep­
resenting the current context and state of the organism. In addition, it is thought that the
distributed pattern of cortical firing filters into the MTL and converges in the hippocam­
pus, where a microrepresentation of the current episode is created. The connections be­
tween neurons that make up the hippocampal representation for each episode are
thought to strengthen through long-term potentiation (LTP). The idea is that the connec­
tions between neurons that are simultaneously active in the hippocampus are more likely
to become strengthened compared with those that are not (Davachi, 2004; Hebb, 1949).
Importantly, the LTP-mediated strengthening can happen over time both immediately af­
ter the experience and during post-encoding sleep (Diekelmann & Born, 2010; Ellenbo­
gen et al., 2007). Thus, what results from a successfully encoded experience is a hip­
pocampal neural pattern (HNP) and a corresponding cortical neural pattern (CNP). Im­
portantly, what differentiates the two patterns is that the HNP is thought to contain the
critical connections between representations that allow the CNP to be accessed later and
be attributed to a particular time and place, that is, an episodic memory. Thus, without
the HNP, it is difficult to recover the precise CNP associated with a prior experience.

Episodic memory retrieval encompasses multiple stages of processing, including cue pro­
cessing (see Figure 18.1B), reinstatement of the HNP (see Figure 18.1C), and finally rein­
statement of the CNP (see Figure 18.1D). Cue processing simply refers to the fact that re­
trieval is often cued by an external stimulus or internal thought whose representation is
supported by cortical regions (visual cortex, auditory cortex, etc.). In addition, the cue is
thought to serve as one key that might unlock the neural patterns associated with the pri­
or experience or episode that contained that stimulus. Specifically, it is thought that dur­
ing successful retrieval, the retrieval cue triggers what has been referred to as hippocam­
pal pattern completion (see Figure 18.1C). Pattern completion refers to the idea that a
complete pattern (a memory) can be reconstructed from part of the pattern (a partial
cue). In other words, a part of or the entire HNP that was established during encoding
and subsequently strengthened becomes reinstated. Finally, the reinstatement of the HNP
is then thought to reinstate aspects of the CNP (see Figure 18.1D), resulting in the con­
current reactivation of disparate cortical regions that were initially active during the ex­
perience. Importantly, it is this final stage of reactivation that is thought to underlie our
subjective experience of remembering and, in turn, drive mnemonic decision making.
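The pattern-completion step described above can be illustrated with a toy autoassociative (Hopfield-style) network. This is a didactic sketch, not a model from the chapter: the network size, the Hebbian outer-product learning rule, and the single stored "episode" pattern are all invented for demonstration. The stored pattern stands in for the HNP, and a degraded version of it stands in for a partial retrieval cue.

```python
import numpy as np

rng = np.random.default_rng(0)

# One stored "episode": a binary (+1/-1) pattern over N model units,
# standing in for the hippocampal neural pattern (HNP).
N = 200
episode = rng.choice([-1, 1], size=N)

# Hebbian encoding: connections between co-active units strengthen
# (outer-product learning rule, no self-connections).
W = np.outer(episode, episode) / N
np.fill_diagonal(W, 0)

# Retrieval cue: a partial/degraded version of the pattern, with half
# of the units scrambled (a partial cortical cue).
cue = episode.copy()
scrambled = rng.choice(N, size=N // 2, replace=False)
cue[scrambled] = rng.choice([-1, 1], size=N // 2)

# Pattern completion: iteratively update units from their weighted
# input until the network settles back onto the stored pattern.
state = cue.copy()
for _ in range(10):
    state = np.sign(W @ state)
    state[state == 0] = 1

overlap_before = np.mean(cue == episode)
overlap_after = np.mean(state == episode)
```

In this sketch the Hebbian weights play the role of LTP-mediated strengthening among co-active neurons; updating from the partial cue drives the network back to the complete stored pattern, so `overlap_after` exceeds `overlap_before`.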

Development of Functional Neuroimaging as a Tool to Probe Episodic Memory
Before turning to the data generated by brain imaging studies of episodic memory, it is
important to consider the behavioral approaches used to measure episodic recovery. As
emphasized in Tulving’s encoding specificity principle (Tulving & Thomson, 1973), in order to fully
understand episodic memory, it is critical to consider the interaction between encoding
and retrieval contexts (cf. Morris et al., 1977; Roediger, 2000). More specifically, al­
though the probability that an event will be later remembered critically depends on
processes taking place during the experience, or encoding, consideration of the encoding
conditions alone cannot determine whether an event will later be recalled. For example,
while you may not be able to recall what you ate last night for dinner, you may be able to
recognize it from a list of possible alternatives. This is an example in which the ability to
retrieve a memory depends on how (p. 378) memory is tested; likewise, different encoding
conditions also critically influence whether memory will be accessible on a subsequent
test. Thus, although we later discuss patterns of activation during encoding and retrieval
separately, it should be understood up front that the two interact and one cannot be com­
pletely understood without the other.

Why use imaging to understand memory? As noted by Tulving (1983), equivalent memory
output responses can arise from fundamentally different internal cognitive states. Thus
measures such as memory accuracy and response time, although important, are limited
with respect to disentangling the contributing underlying cognitive processes that sup­
port memory formation and retrieval. Furthermore, the current MAR model specifically
posits that representations and processes characterizing an encoding event are at least
partially reinstated during later successful retrieval. Examination of the veracity of this
model is greatly facilitated by the ability to use the multivariate information contained in
a functional image of brain activity during encoding and retrieval. In other words, one
can ask whether a specific encoding pattern of brain activity (across hundreds of voxels)
is reinstated during retrieval, and whether it relates to successful recovery of episodic de­
tails. This is arguably much more efficient than, for example, asking whether subjects re­
activate a particular representation during encoding. As evidence discussed further in
this chapter demonstrates, although functional imaging of episodic memory has laid the
groundwork for characterizing episodic encoding and retrieval in terms of distributed
neural systems and multivariate patterns of activity across subcortical and cortical re­
gions, important questions about the functional roles of cortical regions comprising these
systems and their interactions still remain.
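The reinstatement question posed above, whether a multivoxel encoding pattern recurs at retrieval, is commonly operationalized as an encoding-retrieval pattern-similarity analysis. The following sketch uses simulated data (the event count, voxel count, and mixing weight are all invented stand-ins for per-event beta maps): correlate each event's encoding pattern with each event's retrieval pattern, then compare same-event with different-event similarity.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated multivoxel patterns: one row per event.
n_events, n_voxels = 10, 300
encoding = rng.normal(size=(n_events, n_voxels))
# Retrieval patterns are noisy reinstatements of the encoding patterns.
retrieval = 0.6 * encoding + rng.normal(size=(n_events, n_voxels))

def corr(a, b):
    """Pearson correlation between two voxel patterns."""
    a, b = a - a.mean(), b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Full event-by-event encoding-retrieval similarity matrix.
sim = np.array([[corr(encoding[i], retrieval[j])
                 for j in range(n_events)] for i in range(n_events)])

# Reinstatement is indexed by the diagonal (same event) exceeding
# the off-diagonal (different events).
same_event = np.diag(sim).mean()
diff_event = sim[~np.eye(n_events, dtype=bool)].mean()
```

On this logic, greater same-event than different-event similarity is taken as evidence that the encoding pattern was reinstated at retrieval, which can then be related to behavioral measures of successful recovery of episodic details.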

Early Insights into Episodic Encoding Using Functional Imaging
How does experience get transformed into a long-lasting trace that can later be ac­
cessed? Surely, one needs to perceive and attend to aspects of an event in order for it to
even have a chance of being encoded into a lasting memory. However, as we also know,
we do not remember everything we see or attend to, so these factors may be necessary,
but they are not sufficient. Therefore, in addition to perceptual and attentive processes,
there must exist a set of further mechanisms that ensures the longevity of the stimulus or
event features. These mechanisms have collectively been referred to as encoding mechanisms (Davachi, 2006), or processing that is related to the formation of an enduring memory trace. It is precisely these mechanisms that are thought to be lost in patients with am­
nesia, who are capable of attending to stimuli but arguably never form lasting episodic
memories after the onset of their amnesia. It is noteworthy, however, that recent work
has demonstrated that patients with amnesia also appear to have deficits in novel associa­
tive processing and imagination (Addis et al., 2007; Hassabis et al., 2007; but see Squire et
al., 2010).

Neuroimaging has been used in a multitude of ways to understand the underlying neural
mechanisms supporting episodic memory. Early approaches were directly motivated by
the levels of processing framework that demonstrated that differential processing during
encoding modulates memory formation (Craik & Lockhart, 1972). Thus, the earliest neu­
roimaging studies examined brain activation during cognitive tasks that required seman­
tic or associative processing (Fletcher et al., 1998; Montaldi et al., 1998; Rombouts et al.,
1997; Shallice et al., 1994). This approach was targeted because it was known that se­
mantic or elaborative/associative processing leads to better later memory, and determin­
ing what brain systems were activated during this kind of processing could help to illumi­
nate the neural substrates of successful episodic encoding. Results from these initial task-
level investigations revealed that activation in left lateral prefrontal cortex (PFC), and the
MTL was enhanced during semantic processing of study items relative to more superficial
processing of those items (Fletcher et al., 1998; Montaldi et al., 1998; Rombouts et al.,
1997; Shallice et al., 1994). Involvement of left PFC in semantic retrieval processes has
been observed consistently across many paradigms (Badre et al., 2005; Fiez, 1997; Pe­
tersen et al., 1989; Poldrack et al., 1999; Thompson-Schill et al., 1997; Wagner et al., 2001).

Other approaches used to reveal the neural substrates of episodic memory formation
have compared brain activation to novel versus familiar stimuli within the context of the
same task. The logic in these studies is that, on average, we are more likely to encode
novel stimuli compared with familiar ones (Tulving et al., 1994). The results of this ap­
proach again demonstrated greater activation in the MTL to novel compared with familiar
stimuli (Dolan & Fletcher, 1997; Gabrieli et al., 1997; Stern et al., 1996), suggesting that
processes in these regions are related to the encoding of novel information into memory.

One of the early methodological breakthroughs in functional magnetic resonance (p. 379)
imaging (fMRI) that enabled a tighter linking between brain activation and episodic en­
coding was the measurement of trial-by-trial estimates of blood-oxygen-level-dependent
(BOLD) activation, compared with earlier PET and fMRI block designs. Accordingly, at
present, most studies of memory now employ event-related fMRI designs and, thus, mea­
sure brain activation and patterns of activity across multiple voxels on individual trials.
The advantage of measuring trial-by-trial estimates is that brain activation during the ex­
periencing of events that are later remembered can be directly contrasted with activity
for events that are not remembered. This approach has been referred to as the difference
in memory (DM) paradigm (Paller et al., 1987; Rugg, 1995; Sanquist et al., 1980; Wagner
et al., 1999). This paradigm affords better experimental control because events yielding
successful and unsuccessful memory encoding can be compared within the same
individual performing the same task. Also, brain activity during encoding can be related
to a variety of memory outcomes, measured by different retrieval tests. For example, one
can determine whether each presented item was or was not remembered and whether,
for example, some contextual detail was also recovered during remembering. The varying
memory status of individual events can be used to query the brain data to determine what
brain areas show patterns of activation relating to successful memory formation and the
recovery of contextual details.
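In analysis terms, the DM approach amounts to sorting trial-by-trial encoding activation by each trial's later memory outcome and contrasting the two bins. A minimal sketch with simulated data follows (the trial count, effect size, and single simulated voxel are invented; a real analysis would use per-trial GLM estimates from fMRI data and a proper statistical test).

```python
import numpy as np

rng = np.random.default_rng(1)

# Trial-by-trial encoding activation for one voxel/region, plus each
# trial's later memory outcome (1 = remembered, 0 = forgotten).
n_trials = 80
remembered = rng.choice([0, 1], size=n_trials)

# Simulate a subsequent memory effect: later-remembered trials carry
# somewhat higher encoding activation on average.
bold = rng.normal(loc=0.0, scale=1.0, size=n_trials) + 0.8 * remembered

# Difference-in-memory (DM) contrast: mean encoding activation for
# subsequently remembered minus subsequently forgotten trials.
dm_effect = bold[remembered == 1].mean() - bold[remembered == 0].mean()
```

A positive `dm_effect` in a region is the signature of that region's involvement in successful memory formation; the same sorting can be refined by richer retrieval outcomes, for example, whether contextual details were also recovered.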

Initial groundbreaking studies used this powerful DM approach to reveal brain regions
important for successful memory formation using fMRI data. Wagner et al. (1998)
demonstrated that brain activation in posterior parahippocampal cortex and left PFC dur­
ing semantic processing of words was greater during processing of words that partici­
pants later successfully recognized with high confidence. At the same time, Brewer et al.
(1998) found that activation in right PFC and bilateral parahippocampal gyrus during the
viewing of scene images correlated with later subjective ratings of memory using the re­
member/know paradigm.

Taken together, these two groundbreaking studies revealed two important principles of
episodic memory formation. First, brain activation during encoding depends, in some
brain regions, on the content of the stimulus itself, with the lateralized PFC DM effects
for verbal and visual-spatial stimuli seen in these two studies nicely aligning with existing
work showing that left inferior frontal gyrus is important in semantic processing (Pol­
drack et al., 1999), whereas pictorial stimuli have been shown to engage right PFC to a
greater extent (Kelley, 1998). Thus, successful encoding is related to enhanced activation
in a subset of brain regions engaged during stimulus and task processing.

The second principle, evident in these early studies but not directly examined until more recently, concerns whether episodic memory formation is supported by a specialized system or by a domain-general mechanism. Both of these initial studies (and almost every study
performed since) found that activation within the MTL correlates with successful episodic
memory formation. However, how subregions within the MTL differentially contribute to
episodic encoding is still not known and is debated both in the animal (Eichenbaum et al.,
2012; Meunier et al., 1993; Parkinson et al., 1988; Zola-Morgan & Squire, 1986) and hu­
man literature (Davachi et al., 2003; Jackson & Schacter, 2004; Kirwan & Stark, 2004;
Mayes et al., 2004; Squire et al., 2004; Stark & Squire, 2001, 2003; for reviews, see
Davachi, 2006; Diana et al., 2007; Eichenbaum et al., 2007; Ranganath et al., 2010; Wixted
& Squire, 2011). The next section provides further discussion focusing on the role of the
hippocampus in associative encoding, presumably by laying down a strong HNP (see Fig­
ure 18.1A) that can later be accessed in the context of an appropriate retrieval cue.


Hippocampal Activity During Encoding Predicts Later Associative Memory
The hippocampus receives direct input from medial temporal lobe cortical regions: the
entorhinal, perirhinal (PRc), and parahippocampal (PHc) cortices, each of which receives
a distinct pattern of inputs from other neocortical and subcortical regions. However, the
PHc projects strongly into PRc, so there is clearly an interplay between these regions as
well as between these regions and the hippocampus. Most researchers agree that MTL
subregions likely contribute to episodic memory in a distinct way; however, the precise
nature of this division remains unclear. That said, a number of experiments have been
performed in the past 10 years using broadly similar designs and analysis approaches,
and a consistent picture is emerging. In particular, studies have been designed to differ­
entiate patterns of brain activation during encoding that predict successful item memory
from those that predict the recovery of associated items, context, or source. These experi­
ments were fueled by (p. 380) an influential model of MTL function that posits that
item and relational encoding are supported by distinct, yet complementary, learning sys­
tems implemented within the hippocampus and perirhinal cortex (Marr, 1971; McClelland
et al., 1995; Norman & O’Reilly, 2003; O’Reilly & Rudy, 2000). For example, providing evi­
dence for a distinct role in associative encoding, many studies have shown that the mag­
nitude of encoding activation in the hippocampus is predictive of whether participants
will later remember the contextual associations from each trial (Davachi et al., 2003; Dou­
gal et al., 2007; Hannula & Ranganath, 2008; Kensinger & Schacter, 2006; Kirwan &
Stark, 2004; Park & Rugg, 2011; Ranganath et al., 2004; Staresina & Davachi, 2008, 2009;
Uncapher et al., 2006; Yu et al., 2012; but see Gold et al., 2006). Furthermore, in many of
these same studies and others, PRc activation during encoding was shown to be related
to whether items were later recognized, regardless of whether additional contextual de­
tails were also available at the time of retrieval (Davachi et al., 2003; Dougal et al., 2007;
Haskins et al., 2008; Kensinger & Schacter, 2006; Kirwan & Stark, 2004; Ranganath et al.,
2004; Staresina & Davachi, 2008, 2009). Taken together, these data suggest a division of
labor across MTL regions in their respective contributions to item and associative memory formation.

Interestingly, these distinctions between encoding mechanisms in PRc and hippocampus correspond with similar distinctions from single-cell recordings in animals (Brown & Ag­
gleton, 2001; Eichenbaum et al., 2010; Komorowski et al., 2009; Sauvage et al., 2008). Ad­
ditionally, there is notable evidence from human patient work that damage to the hip­
pocampus disproportionately impairs recollection, compared with item recognition based
on familiarity (Giovanello et al., 2003; Spiers et al., 2001; Vann et al., 2009; Yonelinas et
al., 2002; but see Wixted & Squire, 2004). Whereas hippocampal damage is relatively common, patients with damage to PRc that spares the hippocampus are very rare. In one seminal report,
however, a woman with anterior temporal lobe resection that spared hippocampus but re­
moved the left perirhinal cortex revealed an interesting behavioral pattern in her memory
performance. Specifically, she showed a higher than average propensity to recollect with
little evidence of familiarity-based memory (Bowles et al., 2007). This finding is critical
because it appears consistent with a growing body of literature suggesting that PRc mechanisms are important for knowing that an item has previously occurred even when one cannot remember the particular episodic context in which it appeared.

One important advance, fueled by observations that PRc and PHc are sensitive to differ­
ent stimulus classes (e.g., scenes versus objects), is the proposal that the MTL cortex may contribute
to domain-specific encoding of object and scene-like, or contextual, details, whereas the
hippocampus may be important in domain-general binding together of these various dis­
tinct episodic elements (Davachi, 2006). First, it has been demonstrated that PRc re­
sponds more to objects and faces than scenes and that PHc shows the opposite response
pattern: greater activation to scenes than objects and faces (Liang et al., 2013; Litman et
al., 2009). Second, when study items were scenes and the associated context was devised
to be one of six repeating objects, it was seen that PRc encoding activation now predicted the later recovery of the associated objects (a form of context), whereas successful scene
memory (a form of item memory) was supported by PHc (Awipi & Davachi, 2008). Third,
it was shown that both hippocampal and PRc activation predicted whether object details
were later recalled, whereas only hippocampal activation additionally predicted whether
other contextual details were later recovered (Staresina & Davachi, 2008; see also Park &
Rugg, 2011). Finally, in a tightly controlled study in which study items were always words
but participants treated the word either as a cue to imagine an object or a cue to imagine
a scene, it was shown that PRc activation predicted later source memory for the object-
imagery trials and that PHc activation predicted later source memory for the scene-im­
agery trials (Staresina et al., 2011). Taken together, these findings make clear that the involvement of MTL
cortex in encoding is largely dependent on the content of the episode and on what aspects
of the episode are attended. By contrast, across all of the aforementioned studies and a
whole host of other experiments, hippocampal activation appears to selectively predict
whether associated details are later recovered, irrespective of the content of those details
(Awipi & Davachi, 2008; Park et al., 2012; Prince et al., 2005; Rugg et al., 2012; Staresina
& Davachi, 2008; Staresina et al., 2011). These results bring some clarity to the seemingly
inconsistent findings that activation in the PHc has been shown to both correlate with lat­
er item (Davachi & Wagner, 2002; Eldridge et al., 2000; Kensinger et al., 2003) and asso­
ciative memory (Awipi & Davachi, 2008; Cansino et al., 2002; Davachi et al., 2003; Kirwan
& Stark, 2004; Ranganath et al., 2004; Staresina et al., 2011) across different paradigms.
It is likely that the role of PRc (p. 381) and PHc in item versus associative encoding will
vary depending on the nature of the stimuli being treated as the “item” and the “con­
text” (Staresina et al., 2011).

It is important to note that although the studies cited above provide strong evidence for a
selective role of the hippocampus in binding episodic representations so that they can be
later retrieved, another proposal has recently emerged linking hippocampal processes
with establishing “strong” memories—both strongly “recollected” and strongly “famil­
iar” (Kirwan et al., 2008; Shrager et al., 2008; Song et al., 2011; see Hayes et al., 2011 for
support for both accounts). This account is not necessarily in conflict with the aforemen­
tioned notion that PRc is important in both item encoding and item–feature binding. How­
ever, it does raise questions about the often-used dichotomy linking PRc and hippocampal
function with the subjective sense of “knowing” and “remembering.” It is also important
to keep in mind that the paradigms being used to distinguish item from associative, re­
membering from knowing, and low from high memory strength all suffer from ambiguities in the interpretation of each specific condition. For example, high confi­
dence recognition can be associated with the recovery of all kinds of episodic detail.
Thus, if you only ask for one detail and a participant fails to recover that detail, the par­
ticipant might actually be recollecting other noncriterial episodic details. Furthermore, it
is unclear what underlying operations, or information processing, are being proposed to
support the memory strength theory. This is in contrast to the complementary learning
systems approach, which is strongly grounded in how underlying processing within PRc
and hippocampus can come to support item and associative encoding.

Thus, taken together, current functional imaging results strongly suggest that greater
hippocampal activation during the encoding of an event is correlated with the later recov­
ery of the details associated with that event. However, very little has been done to identi­
fy whether a specific HNP needs to be reinstated or completed in order to allow for recov­
ery of episodic details. Instead, there have been recent reports that the level of reactiva­
tion in a region of interest can correlate with later memory. These results have specifical­
ly been seen in work examining post-encoding rest and sleep periods (Rasch et al., 2007;
Peigneux et al., 2004; 2006; Tambini et al., 2010). It is assumed that overall hippocampal
BOLD activation during encoding may thus be a good proxy for laying down a strong
HNP. However, this assumption needs to be tested.

Hippocampus Activates During Episodic Retrieval
According to the model presented in Figure 18.1, presentation of a partial cue during
episodic retrieval reactivates the original memory trace in cortex through pattern comple­
tion processes in the hippocampus. Therefore, the hippocampus is the critical hub that
connects cue processing to cortical reinstatement during episodic retrieval, and hip­
pocampal activation should be a necessary component of episodic retrieval.
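The pattern-completion idea can be illustrated with a toy autoassociative network in the spirit of the Marr-style models cited earlier. Everything here is a hypothetical simplification for exposition (binary patterns, Hebbian outer-product weights, synchronous updates), not a model of actual hippocampal circuitry.

```python
# Toy autoassociative (Hopfield-style) sketch of pattern completion:
# a stored +/-1 pattern is recovered from a partial cue (0 = unknown).
# Pattern and network size are hypothetical choices for illustration.
def store(patterns):
    """Hebbian outer-product weights over +/-1 patterns (zero diagonal)."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w

def complete(w, cue, steps=5):
    """Iteratively update the cue state until it settles on a stored pattern."""
    s = list(cue)
    for _ in range(steps):
        s = [1 if sum(w[i][j] * s[j] for j in range(len(s))) >= 0 else -1
             for i in range(len(s))]
    return s

stored = [1, -1, 1, 1, -1, -1]
w = store([stored])
partial = [1, -1, 1, 0, 0, 0]          # only half of the event is cued
print(complete(w, partial) == stored)  # the full pattern is reinstated
```

Because the cue overlaps the stored pattern, the recurrent dynamics drive the network state back to the complete stored pattern; this is the computational sense in which a partial cue can reinstate a full memory trace.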

Consistent with this idea, the hippocampus has been found to activate during retrieval,
specifically when the retrieval appears to be episodic in nature (i.e., characterized by the
recovery of associations or contextual details). A number of measures have been used to
isolate episodic from nonepisodic retrieval. For example, the remember/know paradigm is
a recognition memory paradigm in which participants must distinguish among studied
items that are recollected with episodic details (remember), studied items that are merely
familiar (know), and unstudied items (new). Studied items endorsed as “remembered” are
associated with greater hippocampal activity during retrieval than studied items en­
dorsed as “known” or “new” (Eldridge et al., 2000; Wheeler & Buckner, 2004; see also,
Daselaar et al., 2006; Yonelinas et al., 2005). Another way to isolate episodic retrieval is
to compare the successful retrieval of an association to successful item recognition in the
absence of successful associative retrieval. The hippocampus has been found to be more
active during associative retrieval than during nonassociative retrieval (Dobbins et al.,
2003; Kirwan & Stark, 2004; Yonelinas et al., 2001). In addition, hippocampal activation
during autobiographical memory retrieval has been found to correlate with subjective re­
ports of a number of recollective qualities, including detail, emotionality, and personal
significance (Addis et al., 2004). The current evidence is consistent with the notion that
the hippocampus is driven by the recovery of associations or episodic details.

The MAR model posits that episodic retrieval does not just activate hippocampus general­
ly, but also specifically reactivates through pattern completion the same hippocampal
neurons (the HNP) that were active during the encoding episode. Because of methodolog­
ical obstacles, there are currently no fMRI studies demonstrating that the hippocampus
(p. 382) reinstates encoding activity during retrieval (but see the later section, Hippocampal Reactivation Mediates Cortical Reactivation: An Open Question). However, in a landmark study using intracranial single-cell recordings in epileptic patients, Gelbard-Sagiv
and colleagues (2008) presented compelling evidence that hippocampal neurons reacti­
vate during episodic retrieval. Gelbard-Sagiv et al. recorded from neurons while partici­
pants viewed and subsequently freely recalled a series of short movie clips. A subset of
hippocampal neurons responded selectively to specific movie clips during viewing. Dur­
ing subsequent free recall, hippocampal neurons that had activated selectively during viewing of a particular clip reactivated immediately before verbal recall of that clip. For example,
one hippocampal neuron fired specifically during the viewing of a clip from “The Oprah
Winfrey Show.” That same neuron fired immediately before verbal recall of the same clip.
In contrast to hippocampal neurons, anterior cingulate neurons demonstrated selectivity
during viewing but did not reactivate during recall. This study demonstrated that hip­
pocampal neurons can indeed reactivate during episodic retrieval.

Cortex Reactivates During Episodic Retrieval


In the MAR model presented in Figure 18.1, presentation of a partial cue during episodic
retrieval reactivates the CNP through pattern completion processes in the hippocampus.
In the course of this chapter, we have already discussed and elaborated on evidence sup­
porting the role of the hippocampus in memory encoding (see Figure 18.1A) and retrieval
(see Figure 18.1C). We will now summarize and discuss evidence that partial cues reacti­
vate cortex during retrieval (see Figure 18.1D).

In contrast to MTL damage, which results in anterograde amnesia, damage to posterior cortical regions often leads to an inability to retrieve previously learned information (ret­
rograde amnesia; for a review of data demonstrating that impaired perception is often ac­
companied by impaired mental imagery, see Farah, 1988). According to Greenberg and
Rubin (2003), memory deficits due to posterior cortical damage are usually limited to the
cognitive processes affected by the impairment. That is, particular components of memo­
ries that are supported by the damaged cortex may be rendered lost or inaccessible. For
example, individuals with damage to auditory cortex might experience memories without
sound. However, in some cases, damage to posterior cortical regions can lead to more
global forms of retrograde amnesia. This kind of global memory impairment would be ex­
pected if the damaged cortex represented a large or crucial component of many episodic
memories. This appears to be the case in some individuals with damage to visual cortex
(Rubin & Greenberg, 1998). In contrast to amnesia caused by MTL lesions, amnesia
caused by damage to the perceptual systems seems to be predominantly retrograde in na­
ture. Thus, it appears that the cortex is pivotal in representing the contents of memory
that are reactivated during remembering.

Strong support for cortical reactivation also requires evidence that regions of cortex that
are activated directly by some stimulus during encoding can be indirectly reactivated by
an associate of that stimulus (i.e., the partial cue in Figure 18.1B) during retrieval. In the
past decade, substantial evidence drawn from functional neuroimaging studies of memo­
ry has supported the idea that the presentation of partial cues can lead to the reactiva­
tion of cortex during episodic retrieval. These studies have largely converged on a single
paradigm, which we describe in detail here. This paradigm relies critically on the associa­
tion of neutral retrieval cues with different kinds of associative or contextual information,
such that the stimuli that engage different regions of cortex during encoding are re­
trieved but not actually presented during retrieval. During encoding, brain activity is
recorded while participants encounter and build associations between neutral stimuli
(e.g., words) and multiple categories of stimuli that evoke activity in different brain re­
gions (e.g., pictures vs. sounds). During retrieval, brain activity is recorded while partici­
pants are presented with the neutral stimuli as retrieval cues and are instructed to make
decisions about their memory of the cue or its associates. For example, participants might
make a decision about whether the cue was studied or not (recognition decision), how
well they remember encoding the cue (remember/know decision), or how the cue was en­
coded or what its associates were (cued recall/source decision).

Early studies using this paradigm, relying on what we will refer to as the region of inter­
est (ROI) approach, demonstrated that circumscribed cortical regions that are differen­
tially engaged during the encoding of different kinds of associations are also differentially
engaged during their retrieval. For example, in an event-related fMRI study, Wheeler, Pe­
tersen, and Buckner (2000) had participants associate words (e.g., dog) with either corre­
sponding sounds (“WOOF!”) or corresponding pictures (e.g., a picture of a dog) during
encoding. During (p. 383) subsequent retrieval, participants were presented with each
studied word and instructed to indicate whether it was studied with a sound or picture.
Wheeler et al. (2000) found that the fusiform gyrus, a region in the visual association cor­
tex that is preferentially activated by pictures compared with sounds during encoding, is
also more strongly activated during cued picture retrieval than cued sound retrieval.
They also found that Heschl’s gyrus, a region in the auditory association cortex that is
preferentially activated by sounds compared with pictures during encoding, is more
strongly activated during cued sound retrieval than cued picture retrieval. That is, re­
gions that are activated during sound and picture encoding are reactivated during sound
and picture retrieval. This finding has been replicated across studies for both pictures
(Vaidya et al., 2002; Wheeler & Buckner, 2003; Wheeler & Buckner, 2004; Wheeler et al.,
2006) and sounds (Nyberg et al., 2000).

In addition, the specificity of reactivation has been demonstrated in several studies that
found that the retrieval of different visual categories evokes activity in stimulus-selective
regions of visual cortex. For example, the ventral and dorsal visual processing streams,
which process object information and location information, respectively (Ungerleider &
Mishkin, 1982), have been shown to reactivate during the retrieval of object and location
information (Khader et al., 2005). Similarly, the fusiform face area (FFA) and parahip­
pocampal place area (PPA), two regions in the ventral visual stream that have been shown
to respond preferentially to visually presented faces and scenes, respectively (Epstein &
Kanwisher, 1998; Kanwisher et al., 1997), have correspondingly been shown to reactivate
during the retrieval of faces and places (O’Craven & Kanwisher, 2000; Ranganath et al.,
2004; see also Danker, Fincham, & Anderson, 2011). In a series of studies, Slotnick and
colleagues demonstrated that even regions very early in the visual processing stream re­
activate during retrieval of the appropriate stimulus: Color retrieval reactivates color pro­
cessing region V8 (Slotnick, 2009a), retrieval of items in motion reactivates motion pro­
cessing region MT+ (Slotnick & Thakral, 2011), and retrieval of items presented to the
right or left visual field reactivates the contralateral area of extrastriate cortex (BA 18) in
a retinotopic manner (Slotnick, 2009b). These studies demonstrate that different process­
ing modules within the visual system are reactivated during the retrieval of specific kinds
of visual information (for a more in-depth discussion, see Danker & Anderson, 2010).

In recent years, a new approach known as classification or multivoxel pattern analysis (MVPA; Haxby et al., 2001; Mitchell et al., 2004) has become popular for investigating the
reactivation of encoding representations during retrieval. In contrast to the ROI ap­
proach, which is sensitive to overall activity differences between conditions within a re­
gion (i.e., a group of contiguous voxels), the classifier approach is sensitive to differences
in the pattern of activity across voxels between conditions. In a typical classification
study, a computer algorithm known as a classifier (e.g., a neural network) is trained to
differentiate the pattern of activity across voxels between two or more conditions. The
logic behind applying the classifier approach to study episodic memory is as follows: If
partial cues reactivate cortex during retrieval, then the pattern of cortical activity within
a particular condition during cued retrieval should resemble, at least partially, the pat­
tern of activity within that condition during encoding. Therefore, a classifier trained to
differentiate conditions on encoding trials should also be able to classify retrieval trials at
above chance accuracy. Greater similarity between corresponding encoding and retrieval
patterns will be reflected in greater classifier accuracy.
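This train-on-encoding, test-on-retrieval logic can be sketched as follows. For simplicity, the sketch uses a nearest-centroid rule on tiny hypothetical voxel patterns rather than the neural-network classifiers used in the literature; the conditions and numbers are invented for illustration.

```python
# Minimal sketch of the encode-train / retrieve-test classifier logic.
# Voxel patterns are tiny hypothetical vectors, not real fMRI data.
def centroid(patterns):
    """Mean pattern (per-voxel average) over a list of patterns."""
    return [sum(v) / len(patterns) for v in zip(*patterns)]

def train(encoding_trials):
    """encoding_trials: dict mapping condition -> list of voxel patterns."""
    return {cond: centroid(pats) for cond, pats in encoding_trials.items()}

def classify(model, pattern):
    """Assign the condition whose encoding centroid is nearest (squared Euclidean)."""
    def dist(cond):
        return sum((a - b) ** 2 for a, b in zip(model[cond], pattern))
    return min(model, key=dist)

# Train on (hypothetical) encoding patterns for faces vs. scenes...
model = train({
    "face":  [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]],
    "scene": [[0.1, 1.0, 0.9], [0.0, 0.8, 1.0]],
})
# ...then test on a retrieval trial: a reinstated, noisier face pattern.
print(classify(model, [0.7, 0.3, 0.2]))  # -> face
```

A classifier that labels retrieval trials above chance, despite having been trained only on encoding trials, implies that the encoding-period activity patterns are at least partially reinstated at retrieval.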

Polyn and colleagues (2005) were the first to apply the classification method to the study
of memory in this manner. Participants studied lists containing photographs of famous
faces, famous locations, and common objects, and subsequently retrieved as many list
items as possible in a free recall paradigm (i.e., no cues were presented). Polyn et al.
trained classifiers to differentiate between encoding trials in the three conditions, and
tested the classifiers using the free recall data. Consistent with their predictions, Polyn et
al. found that the reactivation of a given stimulus type’s pattern of activity correlated with
verbal recall of items of that stimulus type. It is worth noting that the voxels that con­
tributed to classification decisions overlapped with, but were not limited to, the category-
selective regions that one would expect to find using the ROI approach (i.e., the FFA and
PPA). Polyn et al. present their technique of applying MVPA to memory retrieval as “a
powerful new tool that researchers can use to test and refine theories of how people mine
the recesses of the past” (p. 1966).

As mentioned earlier, the retrieval of associations and contexts is one of the hallmarks of
episodic retrieval. Insofar as episodic retrieval is characterized by cortical reactivation,
we should expect greater reactivation when episodic details are recovered (p. 384) during
retrieval. Consistent with this, the earliest studies to find reactivation of encoding regions
during retrieval required associative retrieval (Nyberg et al., 2000; Wheeler et al., 2000).
Along the same lines, reactivation correlates with subjective reports of the recovery of
episodic details. In studies using the remember/know paradigm, items endorsed as re­
membered have been found to evoke more reactivation than items endorsed as known us­
ing both the ROI (Wheeler & Buckner, 2004) and classifier (Johnson et al., 2009) ap­
proaches. Similarly, Daselaar et al. (2008) found that the degree of activation
in auditory and visual association cortex was positively correlated with participant rat­
ings of reliving during autobiographical memory retrieval, suggesting that reactivation
correlates with the number or quality of retrieved details. Overall, the current evidence
suggests that reactivation correlates with subjective ratings of episodic retrieval.

It is often the case that a particular retrieval cue is associated with multiple episodes. Re­
trieval becomes more difficult in the presence of competing associations (Anderson,
1974), and this is often reflected in increased prefrontal and anterior cingulate cortex
(ACC) involvement during retrieval (e.g., Danker, Gunn, & Anderson, 2008; Thompson-
Schill et al., 1997; Wagner et al., 2001). It has been theorized that this increased frontal
activity represents the engagement of control processes that select among competing al­
ternatives during retrieval (e.g., Danker, Gunn, & Anderson, 2008; Thompson-Schill et al.,
1997; Wagner et al., 2001). According to Kuhl and colleagues (2010), if competition re­
sults from the simultaneous retrieval of multiple episodes, then competition should be re­
flected in the simultaneous reactivation of competing memories during retrieval. In their
study, Kuhl et al. (2010) instructed participants to associate words (e.g., “lamp”) with im­
ages of well-known faces (e.g., Robert De Niro) or scenes (e.g., Taj Mahal). Some words
were paired with one associate, and some words were paired with two associates: one
face and one scene. During retrieval, participants were presented with a word as a cue
and instructed to recall its most recent associate and indicate the visual category (face or
scene). Kuhl et al. (2010) used a classifier approach to capture the amount of target and
competitor reactivation during retrieval and found that competition decreased target
classifier accuracy, presumably because of increased competitor reactivation. Further­
more, when classifier accuracy was low, indicating high competition, frontal engagement
was increased. A follow-up study using three kinds of images (faces, scenes, and objects)
confirmed that competition corresponded to increased competitor reactivation, and found
that competitor reactivation correlated with ACC engagement during retrieval (Kuhl,
Bainbridge, & Chun, 2012). These studies demonstrate that competition during retrieval
is reflected in the reactivation of competing memories.
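The idea of reading target and competitor reactivation off classifier output can be sketched schematically. The per-category scores below are hypothetical classifier outputs for a single retrieval trial; published analyses derive evidence measures from trained classifiers rather than by normalizing raw scores this way.

```python
# Schematic sketch of target vs. competitor reactivation evidence for a
# cue studied with both a face (target, most recent) and a scene
# (competitor); "object" serves as an unstudied baseline category.
# All scores are hypothetical classifier outputs.
def evidence(scores):
    """Normalize per-category classifier scores so they sum to 1."""
    total = sum(scores.values())
    return {cat: s / total for cat, s in scores.items()}

ev = evidence({"face": 0.5, "scene": 0.4, "object": 0.1})
target_evidence = ev["face"]
competitor_evidence = ev["scene"] - ev["object"]  # competitor relative to baseline
print(competitor_evidence > 0)  # above-baseline competitor reactivation
```

On this logic, high competitor evidence on a trial signals simultaneous reactivation of the competing memory, which in the studies above was accompanied by reduced target classification accuracy and increased frontal engagement.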

Hippocampal Reactivation Mediates Cortical Reactivation: An Open Question
Figure 18.1 outlines a process whereby hippocampal reactivation mediates cortical reacti­
vation during retrieval. As discussed in this chapter, current research supports the role of
the hippocampus in episodic encoding and retrieval: Hippocampal activity during encod­
ing predicts subsequent memory, and hippocampal activity during retrieval coincides with
the recovery of episodic details. Furthermore, we have shown that cortical reactivation
occurs during retrieval and correlates with the recovery of episodic details. Despite the
fact that many models of hippocampal–cortical interaction converge on the theory that the
hippocampus mediates cortical reactivation during retrieval (Alvarez & Squire, 1994; Mc­
Clelland et al., 1995; Moscovitch et al., 2005), there is currently a paucity of empirical ev­
idence from neuroimaging studies. If hippocampal reactivation mediates cortical reactiva­
tion, then hippocampal reactivation should both precede and predict cortical reactivation.
Given the limitations of the methodological techniques currently available, there are two
major hurdles: (1) simultaneous measurement of hippocampal and cortical reactivation,
and (2) measurement of reactivation at a temporal resolution capable of distinguishing
the temporal order of hippocampal and cortical reactivation.

By capitalizing on stimulus types with distinct patterns of hippocampal and cortical activi­
ty, one should be able to simultaneously measure hippocampal and cortical reactivation
during retrieval using classification methods. In fact, there has already been some suc­
cess in using classifiers to identify cognitive states (Hassabis et al., 2009), and even indi­
vidual memories (Chadwick et al., 2010), using patterns of activity across hippocampal
voxels. However, no study has attempted to classify retrieval trials using a classifier
trained on the encoding data. This would be a true demonstration of hippocampal reacti­
vation during retrieval.

However, measuring cortical and hippocampal reactivations at a sufficient temporal resolution to distinguish their order is a more difficult methodological (p. 385) barrier. Where­
as event-related potential (ERP) studies of reactivation have provided estimates of how
early cortical reactivation occurs (Johnson et al., 2008; Slotnick, 2009b; Yick & Wilding,
2008), it would be extremely difficult to isolate a signal from the hippocampus using elec­
troencephalography (EEG) or even magnetoencephalography (MEG). Testing the hypothe­
sis that hippocampal reactivation mediates cortical reactivation during episodic retrieval
will be one of the major challenges for episodic memory researchers in the near future.


References
Addis, D. R., Moscovitch, M., Crawley, A. P., & McAndrews, M. P. (2004). Recollective
qualities modulate hippocampal activation during autobiographical memory retrieval.
Hippocampus, 14, 752–762.

Addis, D. R., Wong A. T., & Schacter D. L. (2007). Remembering the past and imagining
the future: Common and distinct neural substrates during event construction and elabo­
ration. Neuropsychologia, 45 (7) 1363–1377.

Alvarez, P., & Squire, L. R. (1994). Memory consolidation and the medial temporal lobe: A
simple network model. Proceedings of the National Academy of Sciences, 91, 7041–7045.

Anderson, J. R. (1974). Retrieval of propositional information from long-term memory. Cognitive Psychology, 5, 451–474.

Awipi, T., & Davachi L. (2008). Content-specific source encoding in the human medial
temporal lobe. Journal of Experimental Psychology: Learning, Memory and Cognition, 34
(4) 769–779.

Badre, D., Poldrack, R. A., Paré-Blagoev, E. J., Insler, R. Z., & Wagner, A. D. (2005). Dissocia­
ble controlled retrieval and generalized selection mechanisms in ventrolateral prefrontal
cortex. Neuron, 47, 907–918.

Bowles, B., Crupi, C., Mirsattari, S. M., Pigott, S. E., Parrent, A. G., Pruessner, J. C.,
Yonelinas, A. P., & Köhler, S. (2007). Impaired familiarity with preserved recollection after
anterior temporal-lobe resection that spares the hippocampus. Proceedings of the National Academy of Sciences, 104, 16382–16387.

Brewer, J. B., Zhao, Z., Desmond, J. E., Glover, G. H., & Gabrieli, J. D. E. (1998). Making
memories: Brain activity that predicts how well visual experience will be remembered.
Science, 281, 1185–1187.

Cansino, S., Maquet, P., Dolan, R. J., & Rugg, M. D. (2002). Brain activity underlying en­
coding and retrieval of source memory. Cerebral Cortex, 12, 1048–1056.

Chadwick, M. J., Hassabis, D., Weiskopf, N., & Maguire, E. A. (2010). Decoding individual
episodic memory traces in the human hippocampus. Current Biology, 20, 544–547.

Craik, F. I. M., & Lockhart, R. S. (1972). Levels of processing: A framework for memory
research. Journal of Verbal Learning and Verbal Behavior, 11, 671–684.

Danker, J. F., & Anderson, J. R. (2010). The ghosts of brain states past: Remembering re­
activates the brain regions engaged during encoding. Psychological Bulletin, 136, 87–102.

Danker, J. F., Fincham, J. M., & Anderson, J. R. (2011). The neural correlates of competi­
tion during memory retrieval are modulated by attention to the cues. Neuropsychologia,
49, 2427–2438.

Danker, J. F., Gunn, P., & Anderson, J. R. (2008). A rational account of memory predicts
left prefrontal activation during controlled retrieval. Cerebral Cortex, 18, 2674–2685.

Daselaar, S. M., Fleck, M. S., & Cabeza, R. (2006). Triple dissociation in the medial tem­
poral lobes: recollection, familiarity, and novelty. Journal of Neurophysiology, 96, 1902–
1911.

Daselaar, S. M., Rice, H. J., Greenberg, D. L., Cabeza, R., LaBar, K. S., & Rubin, D. C.
(2008). The spatiotemporal dynamics of autobiographical memory: Neural correlates of
recall, emotional intensity, and reliving. Cerebral Cortex, 18, 217–229.

Davachi, L. (2004). The ensemble that plays together, stays together. Hippocampus, 14, 1–3.

Davachi, L., Mitchell, J. P., & Wagner, A. D. (2003). Multiple routes to memory: Distinct
medial temporal lobe processes build item and source memories. Proceedings of the Na­
tional Academy of Sciences, 100, 2157–2162.

Davachi, L. (2006). Item, context and relational episodic encoding in humans. Current
Opinion in Neurobiology, 16, 693–700.

Davachi, L. (2007). Encoding: The proof is still required. In H. L. Roediger III, Y. Dudai, &
S. M. Fitzpatrick (Eds.), Science of memory: Concepts (pp. 137–143). New York: Oxford
University Press.

Dobbins, I. G., Rice, H. J., Wagner, A. D., & Schacter, D. L. (2003). Memory orientation and
success: separable neurocognitive components underlying episodic recognition. Neu­
ropsychologia, 41, 318–333.

Dougal, S., Phelps, E. A., & Davachi, L. (2007). The role of the medial temporal lobe in item recognition and source recollection of emotional stimuli. Cognitive, Affective, & Behavioral Neuroscience, 7 (3), 233–242.

Diana, R. A., Yonelinas, A. P., & Ranganath, C. (2007). Imaging recollection and familiarity
in the medial temporal lobe: a three-component model. Trends in Cognitive Sciences, 11,
379–386.

Diekelmann, S., & Born, J. (2010). The memory function of sleep. Nature Reviews Neuro­
science, 11, 114–126.

Dolan, R. J., & Fletcher, P. C. (1997). Dissociating prefrontal and hippocampal function in
episodic memory encoding. Nature, 388, 582–585.

Eichenbaum, H., Yonelinas, A. P., & Ranganath, C. (2007). The medial temporal lobes and recognition memory. Annual Review of Neuroscience, 30, 123–152.

Eichenbaum, H., Sauvage, M., Fortin, N., Komorowski, R., & Lipton, P. (2012). Towards a
functional organization of episodic memory in the medial temporal lobe. Neuroscience &
Biobehavioral Reviews, 36, 1597–1608.

Eichenbaum, H., Fortin, N., Sauvage, M., Robitsek, R. J., & Farovik, A. (2010) An animal
model of amnesia that uses Receiver Operating Characteristics (ROC) analysis to distin­
guish recollection from familiarity deficits in recognition memory. Neuropsychologia 48,
2281–2289.

Eldridge, L. L., Knowlton, B. J., Furmanski, C. S., Bookheimer, S. Y., & Engel, S. A. (2000).
Remembering episodes: A selective role for the hippocampus during retrieval. Nature
Neuroscience, 3, 1149–1152.

Ellenbogen, J. M., Hu, P. T., Payne, J. D., Titone, D., & Walker, M. P. (2007). Human relational memory requires time and sleep. Proceedings of the National Academy of Sciences, 104 (18), 7723–7728.

Epstein, R., & Kanwisher, N. (1998). A cortical representation of the local visual environ­
ment. Nature, 392, 598–601.

(p. 386) Farah, M. J. (1988). Is visual imagery really visual? Overlooked evidence from neuropsychology. Psychological Review, 95, 307–317.

Fiez, J. A. (1997). Phonology, semantics, and the role of the left inferior prefrontal cortex.
Human Brain Mapping, 5, 79–83.

Fletcher, P. C., Shallice, T., & Dolan, R. J. (1998). The functional roles of prefrontal cortex
in episodic memory. I. Encoding. Brain, 121, 1239–1248.

Gabrieli, J. D. E., Brewer, J. B., Desmond, J. E., & Glover, G. H. (1997). Separate neural
bases of two fundamental memory processes in the human medial temporal lobe. Science,
276, 264–266.

Gelbard-Sagiv, H., Mukamel, R., Harel, M., Malach, R., & Fried, I. (2008). Internally gen­
erated reactivation of single neurons in human hippocampus during free recall. Science,
322, 96–101.

Giovanello, K. S., Verfaellie, M., & Keane, M. M. (2003). Disproportionate deficit in asso­
ciative recognition relative to item recognition in global amnesia. Cognitive, Affective,
and Behavioral Neuroscience, 3, 186–194.

Gold, J. J., Smith, C. N., Bayley, P. J., Shrager, Y., Brewer, J. B., Stark, C. E. L., Hopkins, R.
O., & Squire, L. R. (2006). Item memory, source memory, and the medial temporal lobe:
Concordant findings from fMRI and memory-impaired patients. Proceedings of the Na­
tional Academy of Sciences of the United States of America, 103, 9351–9356.

Greenberg, D. L., & Rubin, D. C. (2003). The neuropsychology of autobiographical memory. Cortex, 39, 687–728.

Hannula, D. E., & Ranganath, C. (2008). Medial temporal lobe activity predicts successful
relational memory binding. Journal of Neuroscience, 28 (1), 116–124.

Haskins, A. L., Yonelinas, A. P., & Ranganath, C. (2008). Perirhinal cortex supports uniti­
zation and familiarity-based recognition of novel associations. Neuron, 59 (4), 554–560.

Hassabis, D., Kumaran, D., Vann, S. D., & Maguire, E. A. (2007). Patients with hippocampal amnesia cannot imagine new experiences. Proceedings of the National Academy of Sciences, 104 (5), 1726–1731.

Hassabis, D., Chu, C., Rees, G., Weiskopf, N., Molyneux, P. D., & Maguire, E. A. (2009).
Decoding neuronal ensembles in the human hippocampus. Current Biology, 19, 546–554.

Haxby, J. V., Gobbini, M. I., Furey, M. L., Ishai, A., Schouten, J. L., & Pietrini, P. (2001).
Distributed and overlapping representations of faces and objects in ventral temporal cor­
tex. Science, 293, 2425–2430.

Hayes, S. M., Buchler, N., Stokes, J., Kragel, J., & Cabeza, R. (2011). Neural correlates of
confidence during item recognition and source memory retrieval: Evidence for both
dual-process and strength memory theories. Journal of Cognitive Neuroscience, 23, 3959–
3971.

Hebb, D. O. (1949). The organization of behavior. New York: Wiley & Sons.

Jackson, O., & Schacter, D. L. (2004). Encoding activity in anterior medial temporal lobe
supports subsequent associative recognition. NeuroImage, 21, 456–462.

James, W. (1890). The principles of psychology. New York: Holt.

Johnson, J. D., Minton, B. R., & Rugg, M. D. (2008). Context-dependence of the electrophysiological correlates of recollection. NeuroImage, 39, 406–416.

Johnson, J. D., McDuff, S. G. R., Rugg, M. D., & Norman, K. A. (2009). Recollection, famil­
iarity, and cortical reinstatement: A multivoxel pattern analysis. Neuron, 63, 697–708.

Kanwisher, N., McDermott, J., & Chun, M. M. (1997). The fusiform face area: A module in
human extrastriate cortex specialized for face perception. Journal of Neuroscience, 17,
4302–4311.

Kelley, W. M., Miezin, F. M., McDermott, K. B., Buckner, R. L., Raichle, M. E., Cohen, N. J.,
Ollinger, J. M., Akbudak, E., Conturo, T. E., Snyder, A. Z., & Petersen, S. E. (1998). Hemi­
spheric specialization in human dorsal frontal cortex and medial temporal lobe for verbal
and nonverbal memory encoding. Neuron, 20, 927–936.

Kensinger, E. A., Clarke, R. J., & Corkin, S. (2003). What neural correlates underlie suc­
cessful encoding and retrieval? A functional magnetic resonance imaging study using a
divided attention paradigm. Journal of Neuroscience, 23, 2407–2415.

Kensinger, E. A., & Schacter, D. (2006). Amygdala activity is associated with the success­
ful encoding of item, but not source, information for positive and negative stimuli. Journal
of Neuroscience, 26 (9), 2564–2570.

Khader, P., Burke, M., Bien, S., Ranganath, C., & Rosler, F. (2005). Content-specific activa­
tion during associative long-term memory retrieval. NeuroImage, 27, 805–816.

Kirwan, C. B., & Stark, C. L. (2004). Medial temporal lobe activation during encoding and
retrieval of novel face-name pairs. Hippocampus, 14, 919–930.

Kirwan, C. B., Wixted, J. T., & Squire, L.R. (2008). Activity in the medial temporal lobe
predicts memory strength, whereas activity in the prefrontal cortex predicts recollection.
Journal of Neuroscience, 28, 10541–10548.

Komorowski, R. W., Manns, J. R., & Eichenbaum, H. (2009) Robust conjunctive item-place
coding by hippocampal neurons parallels learning what happens. Journal of
Neuroscience, 29, 9918–9929.

Kuhl, B. A., Rissman, J., Chun, M. M., & Wagner, A. D. (2010). Fidelity of neural reactiva­
tion reveals competition between memories. Proceedings of the National Academy of
Sciences, 108, 5903–5908.

Kuhl, B. A., Bainbridge, W. A., & Chun, M. M. (2012). Neural reactivation reveals mecha­
nisms for updating memory. Journal of Neuroscience, 32, 3453–3461.

Levin, D. T., Simons, D. J., Angelone, B. L., & Chabris, C. F. (2002). Memory for centrally
attended changing objects in an incidental real-world change detection paradigm. British
Journal of Psychology, 92, 289–302.

Liang, J., Wagner, A. D., & Preston, A. R. (2013). Content representation in the human me­
dial temporal lobe. Cerebral Cortex, 23 (1), 80–96.

Litman, L., & Davachi, L. (2008) Distributed learning enhances relational memory consol­
idation. Learning & Memory, 15, 711–716.

Marr, D. (1971). Simple memory: A theory for archicortex. Philosophical Transactions of the Royal Society of London, Series B, 176, 161–234.

McClelland, J. L., McNaughton, B. L., & O’Reilly, R. C. (1995). Why are there complemen­
tary learning systems in the hippocampus and neocortex: Insights from the successes and
failures of connectionist models of learning and memory. Psychological Review, 102, 419–
457.

Meunier, M., Bachevalier, J., Mishkin, M., & Murray, E. A. (1993). Effects on visual recognition of combined and separate ablations of the entorhinal and perirhinal cortex in rhesus monkeys. Journal of Neuroscience, 13, 5418–5432.

(p. 387) Mitchell, T., Hutchinson, R., Niculescu, S., Pereira, F., Wang, X., Just, M., & Newman, S. (2004). Learning to decode cognitive states from brain images. Machine Learning, 57, 145–175.

Montaldi, D., Mayes, A. R., Barnes, A., Pirie, H., Hadley, D. M., Patterson, J., & Wyper, D. J.
(1998). Associative encoding of pictures activates the medial temporal lobes. Human
Brain Mapping, 6, 85–104.

Morris, C. D., Bransford, J. D., & Franks, J. J. (1977). Levels of processing versus transfer
appropriate processing. Journal of Verbal Learning and Verbal Behavior, 16, 519–533.

Moscovitch, M., Rosenbaum, R. S., Gilboa, A., Addis, D. R., Westmacott, R., Grady, C.,
McAndrews, M. P., Levine, B., Black, S., Winocur, G., & Nadel, L. (2005). Functional neu­
roanatomy of remote episodic, semantic, and spatial memory: A unified account based on
multiple trace theory. Journal of Anatomy, 207, 35–66.

Norman, K. A., & Schacter, D. L. (1997). False recognition in young and older adults: Ex­
ploring the characteristics of illusory memories. Memory & Cognition, 25, 838–848.

Norman, K. A., & O’Reilly, R. C. (2003). Modeling hippocampal and neocortical contribu­
tions to recognition memory: a complementary-learning-systems approach. Psychological
Review, 110, 611–646.

Nyberg, L., Habib, R., McIntosh, A. R., & Tulving, E. (2000). Reactivation of encoding-re­
lated brain activity during memory retrieval. Proceedings of the National Academy of
Sciences, 97, 11120–11124.

O’Craven, K. M., & Kanwisher, N. (2000). Mental imagery of faces and places activates
corresponding stimulus-specific brain regions. Journal of Cognitive Neuroscience, 12,
1013–1023.

O’Reilly, R. C., & Rudy, J. W. (2000). Computational principles of learning in the neocortex and hippocampus. Hippocampus, 10, 389–397.

Paller, K. A., Kutas, M., & Mayes, A. R. (1987). Neural correlates of encoding in an inci­
dental learning paradigm. Electroencephalography and Clinical Neurophysiology, 67,
360–371.

Park, H., & Rugg, M. D. (2011) Neural correlates of encoding within- and across-domain
inter-item associations. Journal of Cognitive Neuroscience, 23, 2533–2543.

Park, H., Shannon V., Biggan, J., & Spann, C. (2012). Neural activity supporting the forma­
tion of associative memory versus source memory. Brain Research, 1471, 81–92.

Parkinson, J. K., Murray, E. A., & Mishkin, M. (1988). A selective mnemonic role for the
hippocampus in monkeys: memory for the location of objects. Journal of Neuroscience, 8,
4159–4167.

Peigneux, P., Laureys, S., Fuchs, S., Collette, F., Perrin, F., Reggers, J., Phillips, C., Deguel­
dre, C., Del Fiore, G., Aerts, J., Luxen, A., & Maquet, P. (2004). Are spatial memories
strengthened in the human hippocampus during slow wave sleep? Neuron, 44, 535–545.

Peigneux, P., Orban, P., Balteau, E., Degueldre, C., Luxen, A., Laureys, S., & Maquet, P. (2006). Offline persistence of memory-related cerebral activity during active wakefulness. PLoS Biology, 4 (4), e100.

Petersen, S. E., Fox, P. T., Posner, M. I., Mintun, M., & Raichle, M. E. (1989). Positron
emission tomographic studies of processing of single words. Journal of Cognitive Neuro­
science, 1, 153–170.

Poldrack, R. A., Wagner, A. D., Prull, M. W., Desmond, J. E., Glover, G. H., & Gabrieli, J. D.
E. (1999). Functional specialization for semantic and phonological processing in left infe­
rior frontal cortex. NeuroImage, 10, 15–35.

Polyn, S. M., Natu, V. S., Cohen, J. D., & Norman, K. A. (2005). Category-specific cortical
activity precedes retrieval during memory search. Science, 310, 1963–1966.

Prince, S. E., Daselaar, S. M., & Cabeza, R. (2005). Neural correlates of relational memo­
ry: Successful encoding and retrieval of semantic and perceptual associations. Journal of
Neuroscience, 25 (5), 1203–1210.

Ranganath, C., Cohen, M. X., Dam, C., & D’Esposito, M. (2004). Inferior temporal, pre­
frontal, and hippocampal contributions to visual working memory maintenance and asso­
ciative memory retrieval. Journal of Neuroscience, 24, 3917–3925.

Ranganath, C. (2010). A unified framework for the functional organization of the medial
temporal lobes and the phenomenology of episodic memory. Hippocampus, 20, 1263–
1290.

Rasch, B., Buchel, C., Gais, S., & Born, J. (2007). Odor cues during slow-wave sleep
prompt declarative memory consolidation. Science, 315, 1426–1429.

Roediger, H. L. (2000). Why retrieval is the key process to understanding human memory.
In E. Tulving (Ed.), Memory, consciousness, and the brain: The Tallinn conference (pp. 52–
75). Philadelphia: Psychology Press.

Rombouts, S. A. R. B., Machielsen, W. C. M., Witter, M. P., Barkhof, F., Lindeboom, J., &
Scheltens, P. (1997). Visual associative encoding activates the medial temporal lobe: A
functional magnetic resonance imaging study. Hippocampus, 7, 594–601.

Rubin, D. C., & Greenberg, D. L. (1998). Visual memory-deficit amnesia: A distinct amne­
sia presentation and etiology. Proceedings of the National Academy of Sciences, 95, 5413–
5416.

Rugg, M. D. (1995). ERP studies of memory. In M. D. Rugg & M. G. H. Coles (Eds.), Elec­
trophysiology of mind: Event-related brain potentials and cognition (pp. 133–170). Lon­
don: Oxford University Press.

Sanquist, T. F., Rohrbaugh, J., Syndulko, K., & Lindsley, D. B. (1980). An event-related po­
tential analysis of coding processes in human memory. Progress in Brain Research, 54,
655–660.

Sauvage, M. M., Fortin, N. J., Owens, C. B., Yonelinas, A. P., & Eichenbaum, H. (2008).
Recognition memory: Opposite effects of hippocampal damage on recollection and famil­
iarity. Nature Neuroscience, 11, 16–18.

Shallice, T., Fletcher, P., Frith C. D., Grasby, P., Frackowiak, R. S. J., & Dolan, R. J. (1994).
Brain regions associated with acquisition and retrieval of verbal episodic memory. Nature,
368, 633–635.

Shrager, Y., Kirwan, C. B., & Squire, L. R. (2008). Activity in both hippocampus and
perirhinal cortex predicts the memory strength of subsequently remembered information.
Neuron, 59, 547–553.

Slotnick, S. D. (2009a). Memory for color reactivates color processing region. NeuroRe­
port, 20, 1568–1571.

Slotnick, S. D. (2009b). Rapid retinotopic reactivation during spatial memory. Brain Re­
search, 1268, 97–111.

Slotnick, S. D., & Thakral, P. P. (2011). Memory for motion and spatial location is mediat­
ed by contralateral and ipsilateral motion processing cortex. NeuroImage, 55, 794–800.

Song, Z., Wixted, J. T., Smith, C. N., & Squire, L. R. (2011). Different nonlinear functions
in hippocampus and perirhinal cortex relating functional MRI activity to memory
strength. Proceedings of the National Academy of Sciences, 108, 5783–5788.

(p. 388) Spiers, H. J., Burgess, N., Hartley, T., Vargha-Khadem, F., & O’Keefe, J. (2001). Bilateral hippocampal pathology impairs topographical and episodic memory but not visual pattern matching. Hippocampus, 11 (6), 715–725.

Squire, L. R., Stark, C. E. L., & Clark, R. E. (2004). The medial temporal lobe. Annual Re­
view of Neuroscience, 27, 279–306.

Squire, L. R., van der Horst, A. S., McDuff, S. G., Frascino, J. C., Hopkins, R. O., &
Mauldin, K. N. (2010). Role of the hippocampus in remembering the past and imagining
the future. Proceedings of the National Academy of Sciences, 107 (44) 19044–19048.

Staresina, B. P., & Davachi, L. (2008). Selective and shared contributions of the hip­
pocampus and perirhinal cortex to episodic item and associative encoding. Journal of Cog­
nitive Neuroscience, 20 (8), 1478–1489.

Staresina, B. P., & Davachi, L. (2009). Mind the gap: Binding experiences across space
and time in the human hippocampus. Neuron, 63 (3), 267–276.

Staresina, B. P., Duncan, K. D., Davachi, L. (2011). Perirhinal and parahippocampal cor­
tices differentially contribute to later recollection of object- and scene-related event de­
tails. Journal of Neuroscience, 31 (24), 8739–8747.

Stark, C. E. L., & Squire, L. R. (2001). Simple and associative recognition memory in the
hippocampal region. Learning & Memory, 8, 190–197.

Stark, C. E. L., & Squire, L. R. (2003). Hippocampal damage equally impairs memory for
single items and memory for conjunctions. Hippocampus, 13, 281–292.

Tambini, A., Ketz, N., & Davachi, L. (2010). Enhanced brain correlations during rest are
related to memory for recent experiences. Neuron, 65, 280–290.

Thompson-Schill, S. L., D’Esposito, M., Aguirre, G. K., & Farah, M. J. (1997). Role of left
inferior prefrontal cortex in retrieval of semantic knowledge: A reevaluation. Proceedings
of the National Academy of Sciences, 94, 14792–14797.

Tulving, E., & Thomson, D. M. (1973). Encoding specificity and retrieval processes in episodic memory. Psychological Review, 80, 352–373.

Tulving, E. (1972). Episodic and semantic memory. In E. Tulving & W. Donaldson (Eds.), Organization of memory (pp. 381–403). New York: Academic Press.

Tulving, E. (1983). Elements of episodic memory. New York: Oxford University Press.

Tulving, E., Markowitsch, H. J., Kapur, S., Habib, R., & Houle, S. (1994). Novelty encoding
networks in the human brain—positron emission tomography data. NeuroReport, 5, 2525–
2528.

Uncapher, M. R., Otten, L. J., & Rugg, M. D. (2006). Episodic encoding is more than the
sum of its parts: An fMRI investigation of multifeatural contextual encoding. Neuron, 52
(3) 547–556.

Ungerleider, L. G., & Mishkin, M. (1982). Two cortical visual systems. In D. J. Ingle, M. A.
Goodale, & R. J. W. Mansfield (Eds.), Analysis of visual behavior (pp. 549–586). Cam­
bridge, MA: MIT Press.

Vaidya, C. J., Zhao, M., Desmond, J. E., & Gabrieli, J. D. E. (2002). Evidence for cortical
specificity in episodic memory: Memory-induced re-activation of picture processing areas.
Neuropsychologia, 40, 2136–2143.

Vann, S. D., Tsivilis, D., Denby, C. E., Quamme, J. R., Yonelinas, A. P., Aggleton, J. P., Montaldi, D., & Mayes, A. R. (2009). Impaired recollection but spared familiarity in patients with extended hippocampal system damage revealed by 3 convergent methods. Proceedings of the National Academy of Sciences, 106 (13), 5442–5447.

Wagner, A. D., Schacter, D. L., Rotte, M., Koutstaal, W., Maril, A., Dale, A. M., Rosen, B.
R., & Buckner, R. L. (1998). Building memories: Remembering and forgetting verbal expe­
riences as predicted by brain activity. Science, 281, 1188–1191.

Wagner, A. D., Koutstaal, W., & Schacter, D. L. (1999). When encoding yields remember­
ing: Insights from event-related neuroimaging. Philosophical Transactions of the Royal
Society of London, Biology, 354, 1307–1324.

Wagner, A. D., Paré-Blagoev, E. J., Clark, J., & Poldrack, R. A. (2001). Recovering meaning:
Left prefrontal cortex guides controlled semantic retrieval. Neuron, 31, 329–338.

Wheeler, M. E., & Buckner, R. L. (2003). Functional dissociations among components of remembering: Control, perceived oldness, and content. Journal of Neuroscience, 23, 3869–3880.

Wheeler, M. E., & Buckner, R. L. (2004). Functional-anatomic correlates of remembering and knowing. NeuroImage, 21, 1337–1349.

Wheeler, M. E., Petersen, S. E., & Buckner, R. L. (2000). Memory’s echo: Vivid remember­
ing reactivates sensory-specific cortex. Proceedings of the National Academy of Sciences,
97, 11125–11129.

Wheeler, M. E., Shulman, G. L., Buckner, R. L., Miezin, F. M., Velanova, K., & Petersen, S. E. (2006). Evidence for separate perceptual reactivation and search processes during remembering. Cerebral Cortex, 16, 949–959.

Wixted, J. T., & Squire, L.R. (2004). Recall and recognition are equally impaired in pa­
tients with selective hippocampal damage. Cognitive, Affective, and Behavioral Neuro­
science, 4, 58–66.

Wixted, J. T., & Squire, L. R. (2011). The medial temporal lobe and the attributes of mem­
ory. Trends in Cognitive Sciences, 15, 210–217.

Yick, Y. Y., & Wilding, E. L. (2008). Material-specific correlates of memory retrieval. Neu­
roReport, 19, 1463–1467.

Yonelinas, A. P., Hopfinger, J. B., Buonocore, M. H., Kroll, N. E. A., & Baynes, K. (2001).
Hippocampal, parahippocampal, and occipital-temporal contributions to associative and
item recognition memory: an fMRI study. NeuroReport, 12, 359–363.

Yonelinas, A. P., Kroll, N. E., Quamme, J. R., Lazzara, M. M., Sauvé, M. J., Widaman, K. F., & Knight, R. T. (2002). Effects of extensive temporal lobe damage or mild hypoxia on recollection and familiarity. Nature Neuroscience, 5, 1236–1241.

Yonelinas, A. P., Otten, L. J., Shaw, K. N., & Rugg, M. D. (2005). Separating the brain re­
gions involved in recollection and familiarity in recognition memory. Journal of Neuro­
science, 25, 3002–3008.

Yu, S. S., Johnson, J. D., & Rugg, M. D. (2012) Hippocampal activity during recognition
memory co-varies with the accuracy and confidence of source memory judgments. Hip­
pocampus, 22, 1429–1437.

Zola-Morgan, S., & Squire, L. R. (1986). Memory impairment in monkeys following le­
sions limited to the hippocampus. Behavioral Neuroscience, 100, 155–160.

Lila Davachi, Department of Psychology, Center for Neural Science, New York University

Jared Danker, Department of Psychology, New York University

Working Memory  
Bradley R. Buchsbaum and Mark D'Esposito
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn

Print Publication Date: Dec 2013 Subject: Psychology, Cognitive Neuroscience


Online Publication Date: Dec 2013 DOI: 10.1093/oxfordhb/9780199988693.013.0019

Abstract and Keywords

Working memory refers to the temporary retention of information that has just arrived via the senses or been retrieved from long-term memory. Although internal representations
of external stimuli have a natural tendency to decay, they can be kept “in mind” through
the action of maintenance or rehearsal strategies, and can be subjected to various opera­
tions that manipulate information in the service of ongoing behavior. Empirical studies of
working memory using the tools of neuroscience, such as electrophysiological recordings
in monkeys and functional neuroimaging in humans, have advanced our knowledge of the
underlying neural mechanisms of working memory.

Keywords: short-term memory, working memory, prefrontal cortex, maintenance, neuroimaging, functional mag­
netic resonance imaging, phonological loop, visual-spatial sketchpad, central executive

Introduction
Humans and other animals with elaborately evolved sensory systems are prodigious con­
sumers of information: Each successive moment in an ever-changing environment nets a
vast informational catch—a rich and teeming mélange of sights, sounds, smells, and sen­
sations. Everything that is caught by the senses, however, is not kept—and that which is
kept may not be kept for long. Indeed, the portion of experience that survives the immedi­
ate moment is but a small part of the overall sensory input. With regard to memory stor­
age, then, the brain is not a packrat, but rather is a judicious and discerning collector of
the most important pieces of experience. A good collector of experience, however, is also
a good speculator: The most important information to store in memory is that which is
most likely to be relevant at some time in the future. Of course, a large amount of infor­
mation that might be important in the next few seconds is very unlikely to be of any im­
portance in a day, a month, or a year. It might be stated more generally that to a large de­
gree the relevance of information is time-bounded—sense-data collected and registered in
the present are far more likely to be useful in a few seconds than they are to be in a few
minutes. It would seem, then, that the temporary relevance of information demands the
existence of a temporary storage system: a kind of memory that is capable of holding onto
the sense-data of the “just past” in an easily accessible form, while ensuring that older, ir­
relevant, or distracting information is discarded or actively suppressed.

The existence of this kind of short-term or “working memory” has been well established
over the past century through the detailed study of human performance on tasks de­
signed to examine the limits, properties, and underlying structure of human memory.
Moreover, in recent years, much has been learned about the neurobiological basis of
working memory through the study of brain-damaged patients, the effect of cortical abla­
tions on animal behavior, electrophysiological recordings from single cells in the nonhu­
man primate, and regional brain activity as measured by modern functional neuroimaging
tools such as positron emission tomography (p. 390) (PET), functional magnetic resonance
imaging (fMRI), and event-related potentials (ERPs). In this chapter, we examine how the
psychological concept of working memory has, through a variety of cognitive neuroscien­
tific investigations, been validated as a biological reality.

Short-Term Memory
In the mid-1960s, evidence began to accumulate supporting the view that separate func­
tional systems underlie memory for recent events and memory for more distant events. A
particularly robust finding came from studies of free recall in which it was demonstrated
that when subjects are presented a list of words and asked to repeat as many as possible
in any order, performance is best for the first few items (the primacy effect) and for the
last few items (the recency effect)—a pattern of accuracy that when plotted as a function
of serial position (Figure 19.1) appears U-shaped (Glanzer & Cunitz, 1966; Waugh & Nor­
man, 1965).

Figure 19.1 Plot of recall accuracy as a function of serial position in a test of free recall. Primacy and recency effects are evident in the U-shaped pattern of the curve. Adapted with permission from Glanzer & Cunitz, 1966.

When a brief filled retention period is interposed between stimulus presentation and re­
call, however, performance on early items is relatively unaffected, but the recency effect
disappears (Glanzer & Cunitz, 1966; Postman & Phillips, 1965). These findings suggest
that in the immediate recall condition, the last few items of a list are recalled best be­
cause they remain accessible in a short-term store, whereas early items are more perma­
nently represented (and thus unaffected by the insertion of a filled delay) in a long-term
store. This idea that memory, as a functional system, contains both short- and long-term
stores is exemplified by the two-store memory model of Atkinson and Shiffrin (1968). In
this prototype psychological memory model, comprising a short-term store (STS) and
long-term store (LTS), information enters the system through the STS, where it is encod­
ed and enriched, before being passed on to the LTS for permanent storage. Although the
idea that short-term storage is a necessary prerequisite for entry into the LTS has not
held up, the two-store model of Atkinson and Shiffrin crystallized the very idea of memory
as a divisible, dichotomous system and provided the conceptual framework for the inter­
pretation of patterns of memory deficits observed in patients with brain damage.
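The logic of the two-store account can be made concrete with a toy simulation. The sketch below is an illustrative buffer model in the spirit of Atkinson and Shiffrin (1968), not their actual quantitative model: the function name and all parameter values (buffer size, transfer and recall probabilities) are assumptions chosen only to reproduce the qualitative pattern described above, namely a U-shaped serial position curve in immediate recall, with the recency effect lost when a filled delay empties the short-term store.

```python
import random

def simulate_free_recall(list_length=15, buffer_size=4, p_transfer=0.25,
                         p_ltm_recall=0.6, filled_delay=False,
                         n_trials=2000, seed=0):
    """Toy two-store buffer model of free recall (illustrative parameters).

    Each item enters a limited-capacity short-term buffer; on every study
    step, each item currently held in the buffer has a chance of being
    copied into a long-term store. At test, buffer contents are recalled
    perfectly and long-term traces only probabilistically. A filled delay
    wipes the buffer before recall, which removes the recency effect.
    """
    rng = random.Random(seed)
    recalled = [0] * list_length
    for _ in range(n_trials):
        buffer, ltm = [], set()
        for item in range(list_length):
            if len(buffer) == buffer_size:            # buffer full: displace one item
                buffer.pop(rng.randrange(buffer_size))
            buffer.append(item)
            for held in buffer:                       # transfer opportunity each step
                if rng.random() < p_transfer:
                    ltm.add(held)
        if filled_delay:                              # distractor task empties the STS
            buffer = []
        for item in range(list_length):
            if item in buffer or (item in ltm and rng.random() < p_ltm_recall):
                recalled[item] += 1
    return [count / n_trials for count in recalled]

immediate = simulate_free_recall()                    # U-shaped serial position curve
delayed = simulate_free_recall(filled_delay=True)     # recency abolished
```

Running this sketch, `immediate` shows both primacy (early items accumulate extra transfer opportunities before the buffer fills) and recency (late items are still in the buffer at test), while `delayed` preserves the primacy portion but loses the recency advantage, mirroring the behavioral dissociation described above.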

Neurological Evidence for Short-Term and Long-Term Memory Stores
Perhaps the most compelling evidence for the existence of two memory stores comes
from case studies of persons with focal brain lesions. In the early 1950s a surgical proce­
dure for the treatment of intractable epilepsy that involved bilateral removal of the medial temporal lobe in patient H.M. resulted in a catastrophic impairment in his ability to form new long-term memories, though, remarkably, his short-term memory was left intact
(Scoville & Milner, 1957). Thus, H.M., although perfectly capable of repeating back a
string of digits—the classic test of short-term memory—was unable to permanently store
new facts and events. In the following decade, when Warrington and Shallice (Shallice &
Warrington, 1970; Warrington & Shallice, 1969) reported a number of case studies of pa­
tients with temporal-parietal lesions who had dramatically impaired short-term memory
for numbers and words coupled with a preserved ability to learn supra-span (e.g., greater
than ten items) word lists with repeated study, the case for a separation between short-
and long-term memory was strengthened. It is important to emphasize that the short-
term memory deficits exhibited by such patients were, in the purest cases (Shallice &
Warrington, 1977), not accompanied by any obvious deficits in ordinary language compre­
hension and speech production. Thus, for instance, patient J.B. was able to carry on con­
versations normally and to speak fluently without abnormal pauses, errors, or other
symptoms of aphasia; in short, the “language faculty,” considered to encompass the
processes necessary for the online comprehension and production (p. 391) of meaningful
speech, need not be disturbed even in the presence of a nearly complete eradication of
verbal short-term memory (Shallice & Butterworth, 1977). This established an important
dissociation between the short-term memory syndrome and the aphasic syndromes—a
class of neurological disorders that specifically affect language ability—and argued,
again, for a dedicated system in the brain for the temporary storage of information.

In summary, the discovery of “short-term memory patients,” as they were to be called, in the neuropsychological investigations of Warrington, Shallice, and others established a
double dissociation both in brain localization (long-term memory—medial temporal lobe;
verbal short-term memory—temporal-parietal cortex) and patterns of performance, be­
tween short- and long-term memory systems. In addition, the short-term memory disorder
could be clearly distinguished, at the behavioral level at least, from the major disorders of
language such as Broca’s and Wernicke’s aphasia.

Development of the Concept of Working Memory
Short-term memory had, until the landmark work of Baddeley and colleagues (Baddeley,
1986; Baddeley & Hitch, 1974), typically been viewed as a more or less passive and amor­
phous medium for the brief storage of information derived from the senses. Questions
tended to focus on the principles governing the mnemonic “life cycle” of an item in mem­
ory—for example, why and at what rate are items forgotten? What is the role of passive
decay? What is the role of interference, both proactive and retroactive, in forgetting?
What is the route from short-term memory to long-term memory, and what are the factors
that influence this process? These questions, though of fundamental importance to under­
standing how memory works, tended to emphasize the general mechanisms—the proce­
dures and principles of memory—rather than the underlying functional architecture of the system. What was missing from this line of research was the recognition that the con­
tents of short-term memory are not physical elements governed by certain lawful and in­
exorable processes of decay and interference, but rather dynamic representations of a
fluid cognition, capable of being maintained, transformed, and manipulated by active, ex­
ecutive processes of higher control. Thus, for instance, two of the most important vari­
ables in studies of short-term memory, before the emergence of the working memory
model, were time (e.g., between stimulus presentation and recall) and serial order (e.g.,
of a list of items), both of which are defined by the inherent structure of the en­
vironmental input. In more recent years, at least as great an emphasis has been placed on
variables that reflect an ability or attribute of the subject, for instance, his or her rate of
articulation (Hulme, Newton, Cowan, Stuart, & Brown, 1999), memory capacity (Cowan,
2001), degree of inhibitory control (Hasher, Zacks, & Rahhal, 1999), or ability to “re­
fresh” information in memory (Raye, Johnson, Mitchell, Reeder, & Greene, 2002). Interest
in these “internal variables” is a recognition of the fact that what is “in memory” at a mo­
ment in time is defined to various degrees by the structure of the input (e.g., time, serial
order, information content), the biophysical properties of the storage medium (e.g., rate
of decay, interference susceptibility), and the active processes of control that continually
monitor and operate on the contents of memory. It is this last ingredient that puts the
“work” into working memory; it makes explicit the active and transformative character of
mental processes and acknowledges that the content of memory need not mirror the
structure and arrangement of environmental input, but rather may reflect the intentions,
plans, and goals of the conscious organism.

With that introduction in mind, let us now give a brief overview of the working memory
model of Baddeley and colleagues (Baddeley, 1986, 2000; Baddeley & Hitch, 1974).
Whereas contemporary models of short-term memory tended to emphasize storage
buffers as the receptacles for information arriving from the senses, Baddeley and Hitch
(1974) focused on rehearsal processes, that is, strategic mechanisms for the maintenance
of items in memory. Thus, for example, when one is trying to keep a telephone or license
plate number “in mind,” a common strategy is to repeatedly rehearse, either subvocally
or out loud, the contents of the numerical or alphanumerical sequence. Research had
shown that in tests of serial recall, when subjects are prevented from engaging in covert
rehearsal during a delay period that is inserted between stimulus presentation and recall,
overall performance is dramatically impaired (Baddeley, Thomson, & Buchanan, 1975). In
the case of verbal material, then, it was clear that in many ways the ability to keep words
in memory depended in large part on articulatory processes. This insight was central to
the development of the verbal component of working memory, the phonological loop (see
below), and led to a broader conceptualization of short-term memory that seeks (p. 392) to
explain not only how and why information enters and exits awareness but also how re­
sources are marshaled in a strategic effort to capture and maintain the objects of memory
in the focus of attention.


The central tenets of the working memory model are as follows:

1. It is a limited-capacity system; at any moment in time, there is only a finite amount of information directly available for processing in memory.
2. The specialized subsystems devoted to the representation of information of a par­
ticular type, for instance, verbal or visual-spatial, are structurally independent of one
another; the integrity of information represented in one domain is protected from the
interfering effects of information that may be arriving to another domain.
3. Storage of information in memory is distinct from the processes that underlie
stimulus perception; rather, there is a two-stage process whereby sensory information
is first analyzed by perceptual modules and then transferred into specialized storage
buffers that have no other role but to temporarily “hold” preprocessed units of infor­
mation. Moreover, the pieces of information that reside in such specialized buffers
are subject to passive, time-based decay as well as inter-item interference (e.g., simi­
lar-sounding words like “man, mad, map, cap” can lead to interference within a
specialized phonological storage structure); finally, such storage buffers have no
built-in or internal mechanism for maintaining or otherwise refreshing their contents
—rather, this must occur from without, through the process of rehearsal, which
might be a motor or top-down control mechanism that can sequentially access and
refresh the contents that remain active within the store.

The initial working memory model proposed by Baddeley and Hitch (1974), but later re­
fined somewhat (Baddeley, 1986, 2000; Salame & Baddeley, 1982), argued for the exis­
tence of three functional components of working memory. The central executive was envi­
sioned as a control system of limited attentional capacity responsible for coordinating and
controlling two subsidiary slave systems: a phonological loop and a visual-spatial sketch­
pad. The phonological loop was responsible for the storage and maintenance of informa­
tion in a verbal form, and the visual-spatial sketchpad was dedicated to the storage and
maintenance of visual-spatial information. In the last decade, a fourth component, the
episodic buffer, has been added to the model in order to capture a number of phenomena
that could not be readily explained within the original framework.

The Central Executive

As has already been mentioned, working memory is viewed as a limited-capacity system. There are a number of reasons for this capacity limitation, but an important one relates to
what one might call the allocation of attention. Although many people are perfectly capa­
ble of walking and chewing gum at the same time, it is far more difficult to perform two
simultaneous cognitive activities that are both attentionally demanding. Thus, quite apart
from the structural limitations inherent to memory storage systems (e.g., the natural in­
clination of memory traces to fade with time and interference), there also appear to be
certain fundamental constraints on “how much” attention can be allocated to the set of
active tasks at any one time (Kahneman, 1973). The central executive component of work­
ing memory sits, as it were, at the helm of the cognitive apparatus and is responsible for
the dispensation of attentional resources to the subsidiary components (e.g., the phonological loop) in working memory (Baddeley, 1986). Because total attentional capacity is fi­
nite, there must be a mechanism that intervenes to determine how the pool of attention is
to be divided among the many possible actions, with their different levels of priority and
reward contingencies that are afforded by the environment. Thus, in dual-task paradigms,
the central executive plays a crucial role in the scheduling and shifting of resources be­
tween tasks, and it can be used to explain the decline in performance that may be ob­
served even when the two tasks in question involve different memory subsystems (Badde­
ley, 1992). Finally, it has often been pointed out that the central executive concept is too
vague to act as anything other than a kind of placeholder for what is undoubtedly a much
more complex system than is implied by the positing of a unitary and homunculus-like
central cognitive operator (for a model of executive cognition, see Shallice, 1982). Provid­
ed, however, that the concept is not taken too literally, it can serve as a convenient way to
refer to the complex and variegated set of processes that constitute the brain’s executive
system.

The Phonological Loop

Built into the architecture of the working memory model is a separation between domain-
specific (p. 393) mechanisms of memory maintenance and domain-general mechanisms of
executive control. Thus, the verbal component of working memory, or the phonological
loop, is viewed as a “slave” system that can be mobilized by the central executive when
verbal material has to be retained in memory over some uncertain delay. Within the
phonological loop, it is the interplay of two components—the phonological store and the
articulatory rehearsal process—that enables representations of verbal material to be kept
in an active state. The phonological store is a passive buffer in which speech-based infor­
mation can be stored for brief (approximately 2 seconds) periods. The articulatory control
process serves to refresh and revivify the contents of the store, thus allowing the system
to maintain short sequences of verbal items in memory for an extended interval. This divi­
sion of labor between two interlocking components, one an active process and the other a
passive store, is crucial to the model’s explanatory power. For instance, when the articula­
tory control process is interfered with through the method of articulatory suppression
(e.g., by requiring subjects to say “hiya” over and over again), items in the store rapidly
decay, and recall performance suffers greatly. The store, then, lacks a mechanism of reac­
tivating its own contents but possesses memory capacity, whereas conversely, the articu­
latory rehearsal process lacks an intrinsic memory capacity of its own but can exert its ef­
fect indirectly by refreshing the contents of the store.

The Visual-Spatial Sketchpad

The other slave system in the working memory model is the visual-spatial sketchpad,
which is critical for the online retention of object and spatial information. Again, as is
suggested by the term “sketchpad,” the maintenance of visual-spatial imagery in an ac­
tive state requires top-down, or strategic, processing. As with the phonological loop,
where articulatory suppression interferes with the maintenance of verbal information, a
concurrent processing demand in the visual-spatial domain, such as tracking a spot of light moving on a screen, making random eye movements, or presenting subjects with ir­
relevant visual information during learning, also impairs memory performance. Although
the symmetry between sensory and motor representations of visual-spatial information is
less obvious than it is in the case of speech, it has been demonstrated that covert eye
movement is important for the maintenance of spatial information (Postle, D’Esposito, &
Corkin, 2005). Baddeley (1986) initially proposed that in the context of spatial memory,
covert eye movements can act as a way of revisiting locations in memory, and thus operate
very much like the articulatory rehearsal process known to be important for the mainte­
nance of verbal information. Moreover, requiring subjects to perform a spatial interfer­
ence task that disrupts or otherwise occupies this rehearsal component significantly impairs performance on tests of spatial working memory, but may have little effect on
nonspatial visual memory tasks (Cocchini, Logie, Della Sala, MacPherson, & Baddeley,
2002; Della Sala, Gray, Baddeley, Allamano, & Wilson, 1999). In contrast, retention of vi­
sual shape or color information is interfered with by visual-perceptual input, but not by a
concurrent demand in the spatial domain (Klauer & Zhao, 2004). Thus, the principles that
underlie the operation of the phonological loop are qualitatively similar to those that un­
derlie the operation of the visual-spatial sketchpad; in both cases, maintenance processes
consist of covert motor performance that serves to reactivate the memory traces residing
in sensory stores. This mechanism might be most simply described as “remembering by
doing,” a strategy that is most effective when a motor code, which can be infinitely regen­
erated and that is under the subject’s voluntary control, can be substituted for a fragile
and less easily maintained perceptual memory code.

The Episodic Buffer

Working memory was originally conceived as a temporary workspace in which a small amount of information could be kept in mind through the action of rehearsal mechanisms
whose purpose was to counteract memory decay. But when these rehearsal mechanisms
are interrupted by way of experimental interventions such as articulatory suppression
(Baddeley, Lewis, & Vallar, 1984), the effect is not catastrophic. In a typical study, sup­
pression would cause digit span to drop from seven to five items. Moreover, whereas the
storage systems in working memory were initially conceived of as temporary and with a
fixed capacity, many studies have shown that working memory capacity is increased if the
experimental stimuli are familiar (Ericsson & Kintsch, 1995), meaningful (Martin & He,
2004), or structured, such as in the case of sentences (R. A. McCarthy & Warrington,
1987). This implies that prior experience influences how easily information can be encod­
ed and maintained in working memory. The episodic buffer is hypothesized to provide the
means by which integrated information, such as semantics, syntactic structures, learned
patterns, (p. 394) and multimodal long-term representations, is temporarily retrieved and
bound into the mental workspace of working memory (Baddeley, 2000). The episodic
buffer serves as an interface between a range of systems, using a common multidimensional code. The buffer is assumed to be limited in capacity because of the computational demand of providing simultaneous access to information across multiple cognitive domains (Baddeley, Allen, & Vargha-Khadem, 2010; Prabhakaran, Narayanan, Zhao,
& Gabrieli, 2000).

Summary

The working memory model of Baddeley and colleagues describes a system for the main­
tenance and manipulation of information that is stored in domain-specific memory
buffers. Separate cognitive components are dedicated to the functions of storage, re­
hearsal, and executive control. Informational encapsulation and domain segregation dic­
tate that auditory-verbal and visual information be kept in separate storage subsystems—
the phonological loop and the visual-spatial sketchpad, respectively. These storage sub­
systems themselves comprise specialized components for the passive storage of memory
traces, which are subject to time and interference-based decay, and for the reactivation of
these memory traces by way of simulation, or rehearsal. Thus, storage components repre­
sent memory traces, but have no internal means of refreshing them, whereas rehearsal
processes (e.g., articulatory, saccadic) have no mnemonic capacity of their own, but can
reactivate the decaying traces held in temporary stores. In the succeeding sections we ex­
amine how neuroscience has built on the cognitive foundation of the working memory
model of Baddeley and colleagues to refine our understanding of how information is
maintained and manipulated in the brain. We will see that in some cases neuroscientific
evidence has bolstered and reinforced aspects of the working memory model, whereas in
other cases neuroscience has compelled a departure from certain core principles of the
Baddeleyan concept.

Emergence of Working Memory as a Neuroscientific Concept
Perhaps the first insights into the neurobiological underpinnings of a memory whose pur­
pose is to bridge cross-temporal contingencies (Fuster, 1997) comes from the work of Ja­
cobsen, who studied nonhuman primate behavior after ablation to the prefrontal cortices.
In comparing normal chimpanzees to those that had suffered extensive injury to the pre­
frontal cortex (PFC), Jacobsen (1936) noted:

The normal chimpanzee has considerable facility in using sticks or other objects to
manipulate its environment, e.g., to reach a piece of food beyond its unaided
reach. It can solve such problems when it must utilize several sticks, some of
which may not be immediately available in the visual field. After ablation of the
pre-frontal areas, the chimpanzee continues to use sticks as tools but it may have
difficulty solving the problem if the necessary sticks and the food are not simulta­
neously present in the visual field. It exhibits also a characteristic “memory” de­
fect. Given an opportunity to observe a piece of food being concealed under one of
two similar cups, it fails to recall after a few seconds under which cup the lure has
been hidden.… (p. 317)

In his pioneering experimental work, Jacobsen (1936) discovered that damage to the PFC
of the monkey produces selective deficits in a task requiring a delayed response to the
presentation of a sensory stimulus. The delayed response tasks were initially devised by
Hunter (1913) as a way of differentiating between animals on the basis of their ability to
use information not currently available in the sensory environment to guide an imminent
response. In the classic version of this test, a monkey is shown the location of a food
morsel that is then hidden from view and placed in one of two wells. After a delay period
of a few seconds, the monkey chooses one of the two locations and is rewarded if his
choice corresponds to the location of the food. Variations on this test include the delayed
alternation task, the delayed match-to-sample task, and the delayed nonmatch-to-sample
task. The family of delayed-response tasks measures a complex cognitive ability that re­
quires at least three clearly identifiable subprocesses: to recognize and properly encode
the to-be-remembered item, to hold an internal representation of the item “online” across
an interval of time, and finally, to initiate the appropriate motor command when a re­
sponse is prompted. Jacobsen showed that lesions to the PFC impair only the second of
the above three functions, suggesting a fundamental role for the region in immediate or
short-term memory. Thus, monkeys with lesions to PFC perform in the normal range on a
variety of tests requiring sensorimotor behavior, such as visual pattern discrimination and
motor learning and control—that is, tasks without a short-term mnemonic component. Al­
though the impairments in the performance of delayed-response tasks in Jacobsen’s stud­
ies were caused by large (p. 395) prefrontal lesions that often extended into the frontal
pole and orbital surface, later studies showed that lesions confined to the region of the
principal sulcus produced deficits equally as severe (Blum, 1952; Butters, Pandya, Stein,
& Rosen, 1972).

Fuster and Alexander (1971) reported the first direct physiological measures of PFC in­
volvement in short-term memory. With microelectrodes placed in the PFC, they measured
the firing patterns of neurons during a spatial delayed-response task and showed that
many cells had increased firing, relative to an intertrial baseline period, during both cue
presentation and the later retention period. Importantly, some cells fired exclusively dur­
ing the delay period and therefore could be considered pure “memory cells.” The results
were interpreted as providing evidence for PFC involvement in the focusing of attention
“on information that is being or that has been placed in temporary memory storage for
prospective utilization” (p. 654). Many subsequent electrophysiological studies have
demonstrated memory-related activity in the PFC of the monkey during delayed-response
tasks of various kinds (e.g., Joseph & Barone, 1987; Niki, 1974; Niki & Watanabe, 1976;
Quintana, Yajeya, & Fuster, 1988), although it was Patricia Goldman-Rakic who first drew
a parallel (but see Passingham, 1985) and then firmly linked the phenomenon of persis­
tent activity in PFC to the cognitive psychological concept of “working memory.” In a re­
view of the existing literature on the role of the PFC in short-term memory, Goldman-Ra­
kic (1987), citing lesion and electrophysiological studies in the monkey, human neuropsy­
chology, and the cytoarchitectonics and cortical-cortical connections of the PFC, argued
that the dorsolateral PFC (the principal sulcus of the monkey) plays an essential role in
holding visual-spatial information in memory before the initiation of a response and in the absence of guiding sensory stimulation. In this and later work (especially that of Wilson,
1993), Goldman-Rakic developed a model of PFC in which visual-spatial and (visual) ob­
ject working memory were topographically segregated, with the former localized to the
principal sulcus and the latter localized to a more ventral region along the inferior con­
vexity of the lateral PFC (Figure 19.2).

This domain-specific view of the prefrontal organization, which was supported by ob­
served dissociations in the responsivity of neurons in dorsal and ventral areas of the PFC
during delayed response tasks, could be viewed as an anterior expansion of the dorsal
(“where”) and ventral (“what”) streams that had been discovered in the visual system in
posterior neocortex (Ungerleider & Mishkin, 1982). In addition, the parallel and modular
nature of the proposed functional and neuroanatomical architecture of PFC was in keep­
ing with the tenet of domain independence in the working memory model of Baddeley and
colleagues.

The connection between persistent activation in the PFC of the monkey and a model of
memory developed in the field of cognitive psychology might seem tenuous, especially in
light of the fact that the working memory model was originally formulated on the basis of
evidence derived from behavioral studies using linguistic material—an informational
medium clearly unavailable to monkeys. For Goldman-Rakic, though, the use of the term
“working memory” in the context of nonhuman primate electrophysiology was intended
not as an offhand or otherwise desultory nod to psychology (Goldman-Rakic, 1990), but
rather as a reasoned and deliberate effort to unify both our understanding of and manner
of referencing a common neurobiological mechanism underlying an aspect of higher cog­
nition that is well developed in primate species. Certainly, in retrospect, the decision to
label the phenomenon of persistent activity in PFC with the term “working memory” has
had an immeasurable impact on memory research, and indeed may be thought of as one
of the two or three most important events contributing to the emergence of an integrated
and unified approach to the study of neurobiology and psychology.

Functional Neuroimaging Studies of Working Memory

Figure 19.2 A, Diagram of the frontal lobe and location of principal sulcus and inferior convexity. B, Re­
sponses of inferior convexity neuron with visual ob­
ject-specific activity in the delayed response task.
Upper panels show delay-period activity for object
memory trials; lower panels show lack of response on
spatial memory trials. C, Responses of dorsolateral
prefrontal neuron with spatial memory selectivity.
Upper panels show lack of responsivity on object
memory trials; lower panels show delay-period activi­
ty on spatial memory trials. D, Schematic diagram il­
lustrating the dorsal and ventral streams in the visu­
al system and their connections with prefrontal cor­
tex (PFC). AS, arcuate sulcus; PS, principal sulcus.
The posterior parietal (PP) cortex is concerned with
spatial perception, and the inferior temporal (IT) cor­
tex with object recognition. These regions are con­
nected with the dorsolateral (DL) and inferior con­
vexity (IC) prefrontal cortices where, according to
the Goldman-Rakic model, memory for spatial loca­
tion and object identity are encoded in working mem­
ory.

Adapted with permission from Wilson et al., 1993.

At about the same time as Fuster and Alexander (1971) recorded neural activity in the
monkey PFC during a working memory task, Ingvar and colleagues examined variation in
regional cerebral blood flow (rCBF) during tasks requiring complex mental activity. In­
deed, Risberg and Ingvar (1973), in the first functional neuroimaging study of short-term
memory, showed that during a backward digit span task, the largest increases in rCBF,
compared with a resting baseline, were observed in pre-rolandic and anterior frontal cor­
tex. It was not, however, until the emergence of PET and the development the oxygen-15
tracer that the mapping of brain activity during the exercise of higher mental functions
would become genuinely amenable to the evaluation of complex hypotheses about of the
neural basis of cognition. In the middle and late 1980s, technological advances in the PET
technique, with its relatively high spatial resolution (approximately 1 cm3), were accom­
panied by a critical conceptual (p. 396) innovation known as cognitive subtraction, which
provided the inferential machinery needed to link regional variation in brain activity to experimental manipulations at the task or psychological level (Posner, Petersen, Fox, &
Raichle, 1988). Thus, for any set of hypothesized mental processes (a,b,c), if a task can be
devised in which one condition recruits all of the processing components (Task 1a,b,c) and
another condition recruits only a subset of the components (Task 2a,b), subtraction of the
observed regional activity during Task 2 from that observed during Task 1 should reveal
the excess neural activity due to the performance of Task 1, which can thus be associated with the cognitive component c. The working memory model of Baddeley, with its discrete cog­
nitive components (e.g., central executive, phonological loop, and visual-spatial scratch­
pad) was an ideal model with which to test the power of cognitive subtraction using mod­
ern neuroimaging tools. Indeed, in the span of only 2 years, the landmark studies of
Paulesu et al. (1993), Jonides et al. (1993), and D’Esposito et al. (1995) had mapped all of
the cognitive components of the working memory model onto specific regions of the cere­
bral cortex. The challenge in successive years was to go beyond this sort of “psychoneur­
al transcription”—which is necessarily a unidirectional mapping between the cognitive
box and the cerebral convolution—and begin to develop models that generate hypotheses
that refer directly to the brain regions and mechanisms that underlie working memory. In
the following sections, we review how cognitive neuroscience studies of short-term mem­
ory and executive control used the working memory model to gain an initial neural
foothold, upon which later studies built, and which would lead to insights and
advances in our understanding of working memory as it is implemented in the brain.
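
The subtraction logic described above can be illustrated with a small numerical sketch. The voxel counts, effect sizes, "brain region" indices, and detection threshold below are all invented for illustration; the point is only that subtracting a Task 2 map (components a, b) from a Task 1 map (components a, b, c) cancels the shared activity and leaves the signal attributable to component c.

```python
import random

random.seed(0)

# Hypothetical voxel-level activity maps (arbitrary units).
N_VOXELS = 1000
C_REGION = range(200, 250)  # voxels assumed to be driven only by component c

# Shared activity from components a and b, common to both tasks.
baseline = [random.gauss(100.0, 5.0) for _ in range(N_VOXELS)]

# Task 1 recruits a, b, and c; Task 2 recruits only a and b.
task1 = [b + (20.0 if v in C_REGION else 0.0) + random.gauss(0.0, 1.0)
         for v, b in enumerate(baseline)]
task2 = [b + random.gauss(0.0, 1.0) for b in baseline]

# Cognitive subtraction: Task 1 minus Task 2 cancels the shared components
# and leaves the excess activity attributable to component c.
difference = [t1 - t2 for t1, t2 in zip(task1, task2)]
candidate = [v for v, d in enumerate(difference) if d > 10.0]

print(min(candidate), max(candidate), len(candidate))
```

With the assumed 20-unit effect and unit-variance noise, the simple threshold recovers exactly the voxels driven by component c, whereas the large shared baseline never appears in the difference map; real designs, of course, must also defend the assumption of "pure insertion," that adding component c leaves a and b unchanged.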

Visual-Spatial Working Memory

The first PET study of visual-spatial working memory was carried out by Jonides and
colleagues in 1993 using the logic of cognitive subtraction to isolate mnemonic processes
associated with the maintenance of visual-spatial information, in a task very similar to the
tasks used by Goldman-Rakic and her colleagues with monkeys (p. 397) (Funahashi,
Bruce, & Goldman-Rakic, 1989; Goldman-Rakic, 1987). During “memory” scans, subjects
were shown an array of three dots appearing for 200 ms on the circumference of a 14-mm
imaginary circle and instructed to maintain the items in memory during a 3-second reten­
tion interval. This was followed by a probe for location-memory consisting of a circular
outline that either did or did not (with equal probability) enclose one of the previously
memorized dots, and to which subjects responded with a yes/no decision. In “perception”
scans, the three dots and the probe outline were presented simultaneously, so that sub­
jects did not have to retain the location of the items in memory during a delay, but instead
simply had to decide whether the outline encircled one of the three displayed dots (Fig­
ure 19.3).

Figure 19.3 A, Schematic presentation of spatial memory task from Jonides et al. (1993) and Smith et
al. (1995). B, Top panel shows example “memory” tri­
al; bottom panel shows example “perception” trial.

Subtraction of the “perception” scans from the “memory” scans revealed a right-lateral­
ized network of cortical regions that would become a hallmark of neuroimaging studies of
visual-spatial working memory: the posterior parietal lobe, dorsal premotor cortex, occipi­
tal cortex (Brodmann area 19), and PFC. In their interpretation of the findings, the au­
thors suggested that the occipital activity reflected a role in the creation, but not neces­
sarily the maintenance, of an internal visual image of the dot pattern, and that activity in
the PFC might reflect one of two things: (1) the literal storage of a representation of the
image in memory during the delay, or (2) the representation of a pointer or link to other
brain circuitry, perhaps in the occipital or parietal lobe, that is actually responsible for
maintaining the memory engram. These two explanations for the observation of pre­
frontal activity during working memory tasks, which in later years would often be pitted
against each other, nicely framed the emerging debate on the division of labor among the
cortical regions involved in the maintenance of information in working memory.

A major aim of many of the early neuroimaging studies of visual-spatial working memory
was to duplicate the canonical finding of Goldman-Rakic and colleagues of a dorsal-ven­
tral dissociation in monkey PFC for spatial and object working memory. Studies by
Petrides et al. (1993) and McCarthy et al. (1994) demonstrated with PET and fMRI, re­
spectively, that mid-dorsolateral PFC (Brodmann areas 9 and 46) shows increased activity
during spatial working memory when compared with a control condition. An attempt to
show a neuroanatomical double dissociation between spatial and object working memory
was undertaken by Smith et al. (1995) in a PET study that used carefully controlled non­
verbalizable object stimuli that were presented in both object and spatial task contexts.
This study found distinct brain circuits for the storage of spatial and object information,
with spatial working memory relying primarily on right-hemisphere regions in the pre­
frontal (BA 46) and parietal (BA 40) cortices, and object working memory involving only a
left inferotemporal area. These results, however, only partially replicated the monkey
study of Wilson et al. (1993), who had found distinct dorsal and ventral regions in lateral
PFC for spatial and object working memory. A similar pattern was found by McCarthy et
al. (1994), in which regional (p. 398) differences between object and spatial working mem­
ory were most pronounced across hemispheres rather than between dorsal and ventral
divisions of the PFC. In a contemporaneous review and meta-analysis of all human neu­
roimaging studies of working memory, D’Esposito et al. (1998) showed that there was vir­
tually no evidence for a neuroanatomical dissociation between spatial and object working
memory in the PFC, a finding that was supported by a later and more exhaustive quanti­
tative meta-analysis (Wager & Smith, 2003). Indeed, establishing a correspondence be­
tween the functional neuroanatomy of visual-spatial working memory in the monkey and
human brains has proved remarkably difficult, leading to a protracted debate among and
between monkey neurophysiologists and human neuroimaging researchers about the
proper way to conceptualize the functional topography of working memory in the PFC
(Goldman-Rakic, 2000; E. K. Miller, 2000). Increasingly, efforts were made to adapt hu­
man neuroimaging studies to resemble as closely as possible the kinds of tasks used in
animal electrophysiology, such as the delayed match-to-sample procedure. The emer­
gence of event-related fMRI, with its spatial and temporal resolution superior to oxy­
gen-15 PET, was critical to this new effort at cross-disciplinary synthesis and reconcilia­
tion, and led to a number of fundamental insights on the brain basis of working memory,
to the discussion of which we now turn.

Early PET studies of working memory relied exclusively on the logic of cognitive subtrac­
tion to isolate hypothesized components of a complex cognitive task. Thus, even for work­
ing memory tasks that consisted of a number of temporal phases within a given trial (e.g.,
stimulus presentation → memory maintenance → recognition decision), the low temporal
resolution of PET prohibited separate statistical assessment of activity within a single
task phase. Event-related fMRI, on the other hand, with its temporal resolution on the or­
der of 2 to 4 seconds, could be used to examine functional activity in different portions of
a multiphase trial, provided that each of the sequential task components was separated
by approximately 4 seconds (Zarahn, Aguirre, & D’Esposito, 1997). This methodology per­
mits the isolation of maintenance-related activity during the delay period of a match-to-
sample procedure without relying on a complex cognitive subtraction.
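The phase-separation logic described above can be sketched as a toy general linear model: each trial phase (encoding, delay, probe) gets its own boxcar regressor convolved with a hemodynamic response function, and delay-period activity is then read off its own beta weight. The trial timings, difference-of-gammas HRF, and signal values below are illustrative assumptions for exposition, not parameters from any of the studies discussed.

```python
import numpy as np

TR = 2.0                        # assumed volume repetition time, seconds
n_vols = 150
frame_times = np.arange(n_vols) * TR

def hrf(t):
    """Difference-of-gammas approximation to a canonical HRF."""
    pos = t ** 5 * np.exp(-t) / 120.0          # peak near 5 s
    neg = t ** 10 * np.exp(-t) / 3628800.0     # undershoot near 10 s
    return pos - 0.35 * neg

def phase_regressor(onsets, duration):
    """Boxcar for one trial phase, convolved with the HRF, sampled at TRs."""
    high_res = 10                              # samples per second
    n = int(n_vols * TR * high_res)
    box = np.zeros(n)
    for on in onsets:
        box[int(on * high_res):int((on + duration) * high_res)] = 1.0
    h = hrf(np.arange(0.0, 30.0, 1.0 / high_res))
    conv = np.convolve(box, h)[:n]
    return conv[(frame_times * high_res).astype(int)]

# One trial every 30 s: 2-s sample, 10-s delay, 2-s probe.
trial_starts = np.arange(0.0, n_vols * TR - 30.0, 30.0)
X = np.column_stack([
    phase_regressor(trial_starts, 2.0),          # encoding phase
    phase_regressor(trial_starts + 2.0, 10.0),   # delay (maintenance) phase
    phase_regressor(trial_starts + 12.0, 2.0),   # probe/decision phase
    np.ones(n_vols),                             # baseline
])

# Simulate a voxel active only during the delay, then recover the betas.
rng = np.random.default_rng(0)
y = X @ np.array([0.0, 1.5, 0.0, 100.0]) + rng.normal(0, 0.2, n_vols)
betas, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(betas, 2))   # delay beta recovered near 1.5; encoding/probe near 0
```

Because adjacent phases produce overlapping hemodynamic responses, the regressors are only separable when the phases are spaced by several seconds, which is exactly the design constraint noted by Zarahn, Aguirre, and D’Esposito (1997).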

Using event-related fMRI, Courtney et al. (1998) demonstrated a neuroanatomical dissociation between delay-period activity during working memory maintenance for either the
identity (object memory) or location (spatial memory) of a set of three face stimuli.
Greater activity during the delay period on face identity trials was observed in the left in­
ferior frontal gyrus, whereas greater activity during the delay period of the location task
was observed in dorsal frontal cortex, a finding consistent with the spatial/object domain
segregation thesis of Goldman-Rakic (1987). Unlike previous studies that had implicated
human BA 46—the presumed homologue to the monkey principal sulcus—in spatial work­
ing memory, Courtney et al. (1998) observed enhanced delay-period activity for the loca­
tion task, bilaterally, in the superior frontal sulcus (BA 8), a region just anterior to the
frontal eye field (FEF). A control task requiring sensory-guided eye movements was used
to functionally delineate the FEF and thus distinguish them from regions with a specifi­
cally mnemonic function. They concluded that the localization of spatial working memory
in the superior frontal sulcus (posterior and superior to BA 46) indicated an evolutionary
displacement in the functional anatomy of the PFC, possibly owing to the emergence of
new cognitive abilities such as abstract reasoning, complex problem solving, and plan­
ning for the future. In short, then, this study was the first functional neuroimaging study
to fully replicate the object versus spatial working memory dissociation shown by Gold­
man-Rakic and colleagues, insofar as one accepts their proposal that the human homo­
logue to the monkey principal sulcus is located, not in the middle frontal gyrus or BA 46,
but rather in the superior frontal sulcus.

While several subsequent studies of spatial working memory offered further support (Le­
ung, Seelig, & Gore, 2004; Munk et al., 2002; Sala, Rama, & Courtney, 2003; Walter et al.,
2003) for a specifically mnemonic role of the superior frontal sulcus in tasks of spatial
working memory, other studies failed to replicate the finding (Postle, Berger, &
D’Esposito, 1999; Postle, Berger, Taich, & D’Esposito, 2000; Srimal & Curtis, 2008). For in­
stance, although Postle et al. (2000) observed delay-period activity in this region during a
spatial working memory task, they also found it to be equally active during the generation
of two-dimensional saccades, a task that required visual-spatial attention and motor con­
trol but placed no demands on memory storage. Using a paradigm that varied the length
of the delay period in a memory-guided saccade task similar to that used by Funahashi
and colleagues in the monkey, Srimal and Curtis (2008) failed to show any maintenance-
related activity in the (p. 399) superior frontal sulcus, casting doubt on whether this area
is the human homologue of the monkey principal sulcus as originally suggested by Court­
ney et al. (1998). Srimal and Curtis (2008) note, however, that all of the studies of spatial
working memory that have shown delay-period activity in human superior frontal sulcus
have used tasks that required maintenance of multiple spatial locations. This suggests
that activity in this region reflects the operation of higher order processes that are need­
ed when one is required to maintain the spatial relation or configuration of multiple ob­
jects in space. Moreover, Curtis et al. (2004) have shown using the memory-guided sac­
cade task that the FEF and intraparietal sulcus (IPS) are the two areas that not only show
robust delay-period activity but also have activity that predicts the accuracy of the gener­
ated saccade (Figure 19.4). This indicates that the very areas that are known to be impor­
tant for the planning and preparation of eye movements and, more generally, for spatial
selective attention (Corbetta, Kincade, & Shulman, 2002) are also critical for working
memory maintenance (Logie, 1995). In addition, recent evidence has shown that internal
shifts to locations in working memory not only activate the frontoparietal oculomotor sys­
tem but also can activate early visual cortex (including V1) in a retinotopic manner
(Munneke, Belopolsky, & Theeuwes).


Visual Object Working Memory


Figure 19.4 Event-related study of spatial memory by Curtis et al. (2004). A, Schematic depiction of the oculomotor delayed response tasks in which subjects used the cue’s location to make a memory-guided saccade. Both the matching-to-sample (top) and nonmatching-to-sample (bottom) tasks began with the brief presentation of a small cue. During matching trials, the subject made a memory-guided saccade (depicted by the thin black line) after the disappearance of the fixation cue marking the end of the delay. Feedback was provided by the re-presentation of the cue. At this point, the subject corrected any errors by shifting gaze to the cue. The difference between the end-point fixation after the memory-guided saccade and the fixation to acquire the feedback cue was used as an index of memory accuracy. During nonmatching trials, the subject made a saccade to the square that did not match the location of the sample cue. B, Average (±S.E. bars) blood-oxygen-level-dependent (BOLD) time series data for matching-to-sample (black) and nonmatching-to-sample (gray) oculomotor delayed-response tasks. The solid gray bar represents the delay interval. The gray gradient in the background depicts the probability that the BOLD signal is emanating from the delay period, where darker indicates more probable. The frontal eye field (FEF) shows greater delay-period activity during the matching task, where an oculomotor strategy is efficient. The right intraparietal sulcus (IPS) shows greater delay-period activity during the nonmatching task, when subjects are biased from using such a strategy. C, Scatter plot showing the correlation between memory-guided saccade (MGS) accuracy and the magnitude of the delay-period parameter estimates in the right FEF. More accurate MGS was associated with greater delay-period activity.


Many studies have investigated the maintenance of visual objects, mostly faces, houses,
scenes, and abstract shapes that are not easily verbalizable (e.g., Belger et al., 1998;
Courtney, Ungerleider, Keil, & Haxby, 1996, 1997; Druzgal & D’Esposito, 2001, 2003; Lin­
den et al., 2003; G. McCarthy et al., 1994; Mecklinger, Gruenewald, Besson, Magnie, &
Von Cramon, 2002; Postle & D’Esposito, 1999; Postle, Druzgal, & D’Esposito, 2003; Rama,
Sala, Gillen, Pekar, & Courtney, 2001; Sala et al., 2003; Smith et al., 1995). A consistent
finding has been that posterior cortical areas within the inferior temporal lobe that may
preferentially respond to the presentation of certain categories of complex visual objects
also tend to activate during object working memory tasks. For example, the fusiform
gyrus, found along the ventral surface of the temporal lobe, shows greater activation
when a subject is shown pictures of faces than when shown other types of complex visual
stimuli like pictures of houses or scenes or household objects (Kanwisher, McDermott, &
Chun, 1997). Indeed, given its selective response (p. 400) properties, the fusiform gyrus
has been termed the “fusiform face area,” or FFA.

There are four important findings that indicate that posterior extrastriate cortical regions
like the FFA play an important role in the mnemonic storage of object features. First, the
FFA shows persistent delay-period activity (Druzgal & D’Esposito, 2001, 2003; Postle et
al., 2003; Rama et al., 2001) during working memory tasks. Second, the activity in the
FFA is selective for faces; it is greater during delays in which subjects are maintaining
faces compared with other objects (Sala et al., 2003). Third, as the number of faces that
are being maintained increases, the magnitude of the delay-period activity increases in
the FFA (Druzgal & D’Esposito, 2001, 2003; Jha & McCarthy, 2000). Such load effects
strongly suggest a role in short-term storage because, as the number of items that must
be represented increases, so should the storage demands. Fourth, using a delayed paired-
associates task, Ranganath et al. (2004) showed that the FFA responds during an unfilled
delay interval following the presentation of a house that the subject has learned is associ­
ated with a certain face. Therefore, the delay-period FFA activity likely reflects the reacti­
vated image of the associated face that was retrieved from long-term memory despite the
fact that no face was actually presented before the delay. Together, these studies suggest
that posterior regions of visual association cortex, like the FFA, participate in the internal
storage of specific classes of visual object features. Most likely, the mechanisms used to
create internal representations of objects that are no longer in our environment are simi­
lar to the mechanisms used to represent objects that exist in our external environment.

What happens to memory representations for visual objects when subjects momentarily
divert their attention elsewhere (i.e., to a competing task or a different item in memo­
ry)? Lewis-Peacock et al. (2012) have exploited the enhanced sensitivity of multivoxel
analyses of fMRI data to answer this question. In their paradigm, a subject is presented
with two items displayed on the screen, one word and one picture, followed by a cue indi­
cating which of the two stimuli to retain in memory. After an 8-second delay, a second cue
appears indicating for the subject either to retain the same item in memory or else to re­
trieve and maintain the item that the subject had previously been instructed to ignore for
another 8 seconds. The authors found that when an item is actively maintained in the fo­
cus of attention, multivoxel patterns of brain activation can be used to “decode” whether
the item in memory is a picture or a word. However, if the item is not actively maintained
in working memory, there is not sufficient information in the multivoxel patterns to identi­
fy it. This is surprising in light of the behavioral data showing that subjects are easily able
to retrieve the ignored item when they are cued to do so (on “switch trials”). Thus, a
memory may be resident within “short-term memory” while not being actively main­
tained, and yet there is no discernible neurophysiological footprint in the blood-oxygen-
level-dependent (BOLD) signal that the item is displaying heightened activation. Two im­
portant points may be taken from this work—first, that analyses of patterns of activity of­
fer increased power and specificity when investigating the brain’s representations of indi­
vidual memories; and, second, that increased BOLD activity during working memory
tasks reflects processes associated with active maintenance and may not be able to de­
tect memory representations that are accessible but nevertheless outside the direct focus
of attention.
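The pattern-classification logic behind such “decoding” can be illustrated with a toy correlation-based nearest-centroid classifier applied to simulated voxel patterns. The voxel count, noise level, and classifier here are illustrative assumptions, not the actual analysis pipeline of Lewis-Peacock et al. (2012).

```python
import numpy as np

# Toy multivoxel pattern analysis (MVPA): decode whether a delay-period
# activity pattern comes from a "word" or a "picture" trial.
rng = np.random.default_rng(42)
n_voxels, n_trials = 200, 80

# Each category evokes a distinct (unknown to the analyst) spatial pattern.
pattern_word = rng.normal(0, 1, n_voxels)
pattern_pic = rng.normal(0, 1, n_voxels)
labels = np.repeat([0, 1], n_trials // 2)              # 0 = word, 1 = picture
base = np.where(labels[:, None] == 0, pattern_word, pattern_pic)
trials = base + rng.normal(0, 2.0, (n_trials, n_voxels))  # noisy single trials

# Split into train/test halves; classify by correlation with class centroids.
train = np.arange(0, n_trials, 2)
test = np.arange(1, n_trials, 2)
centroids = [trials[train][labels[train] == k].mean(axis=0) for k in (0, 1)]

def corr(a, b):
    a, b = a - a.mean(), b - b.mean()
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

pred = [int(corr(x, centroids[1]) > corr(x, centroids[0])) for x in trials[test]]
accuracy = np.mean(np.array(pred) == labels[test])
print(f"decoding accuracy: {accuracy:.2f}")   # well above the 0.5 chance level
```

The key point matches the text: decoding succeeds only to the extent that the category-specific pattern is actually expressed in the measured signal, so an item held outside the focus of attention can remain behaviorally available yet invisible to the classifier.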

Working Memory Capacity


Although many studies of visual working memory, such as those reviewed above, have pri­
marily focused on the extent of overlap between stimulus selectivity during visual percep­
tion and working memory maintenance, a recent line of research has explored the physiological basis of capacity limits in visual short-term memory. This work has exploited an
extremely simple and elegant short-term memory paradigm that tests a subject’s ability
to detect a change between two sets of objects (usually simple, colored discs or squares;
Figure 19.5) separated in time by a short delay (Luck & Vogel, 1997). (p. 401) Using ERPs, Vogel et al. (2004) employed a variation of the change detection task to explore the electrophysiological indices of capacity limits in visual short-term memory. In each trial, subjects were presented with two arrays of colored squares presented 100 ms on either side of a central fixation cross, preceded by a cue indicating which side of the screen to attend. This was followed by a 900-ms delay and a test array, in which subjects had to decide whether any items in the attended visual field had changed color (items in the unattended field were always re-presented unchanged). The number of colored squares in each hemifield varied from one to ten items.

Figure 19.5 Two studies of the capacity of visual short-term memory. A, The visual change detection paradigm using an example trial for the left hemifield. B, Averaged event-related potential difference waves at lateral and posterior parietal electrode sites plotted for memory loads of 1, 2, 3, and 4. C, Region of the intraparietal sulcus (IPS), bilaterally, with level of activation that tracks visual short-term memory load as defined by Cowan’s K function. D, Behavioral performance and IPS response functions. Behavioral performance corresponds to the estimated number (K) of encoded colored discs at each set size. IM, iconic memory control task; VSTM, visual short-term memory task. Top, Reprinted by permission from Macmillan Publishers Ltd: Nature, Edward K. Vogel and Maro G. Machizawa, “Neural activity predicts individual differences in visual working memory capacity,” 428, 748–751, copyright 2004.

By computing the difference waveform between ipsilateral activity and contralateral activity over posterior parietal and occipital electrodes, an estimate of the magnitude of delay-period activity could be calculated for each memory set size. These difference waves revealed that the magnitude of activity increased linearly from one to three items and then leveled off thereafter. The physiological asymptote in the magnitude of neural activity was nearly identical to the behavioral estimate of the average capacity, as measured using a formula referred to as Cowan’s K (Cowan, 2001; Pashler, 1988), which was 2.8. Moreover, individual estimates of the increase in activity between arrays of two and four items were very highly correlated (r = 0.78) with individual estimates of Cowan’s K, a finding that strongly suggests that the electrophysiological signal reflects the number of items held in visual short-term memory. Todd and Marois (2004) used the same paradigm with fMRI and showed that activation in the posterior IPS increased from one to four items and was flat from four to eight items—a relationship that precisely mirrored the estimate of Cowan’s K in the group of subjects. Individual differences in the fMRI signal in the posterior IPS were also shown to correlate with individual estimates of K, thus replicating the result of Vogel et al. (2004). Xu et al. (2006) showed that the complexity of stored objects (rather than colored squares, multifeature objects were used as memory items) modulated the magnitude of activity in the lateral occipital cortex and the superior IPS, but did not affect activity in the posterior IPS. Thus, it appears that the posterior IPS represents a fixed number of objects, or slots, that can be used to store items or index locations in visual space. Using simultaneous auditory and visual presentation, Cowan et al. (2011) showed that the (p. 402) IPS had load-dependent activation that cut across the representational domain, arguing for a modality-independent storage system that the authors linked to Cowan’s concept of a focus of attention (Cowan, 1988). The correct interpretation as to what is actually being represented in the IPS is still a matter of debate, however; one need not assume that what is being “stored” is a literal representation of an object—it may instead reflect the deployment of a limited pool of attentional resources that scales with the number of items that are maintained in memory.
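Cowan’s K, used throughout these capacity studies, converts change-detection accuracy into an estimated number of stored items. For single-probe displays it is commonly written K = N(H − F), where N is the set size, H the hit rate, and F the false-alarm rate (Cowan, 2001). The performance values in this sketch are invented for illustration.

```python
def cowans_k(set_size, hit_rate, false_alarm_rate):
    """Single-probe change-detection capacity estimate: K = N * (H - F).

    Logic: hits exceed false alarms only via the K/N chance that the
    probed item was among the K items held in memory.
    """
    return set_size * (hit_rate - false_alarm_rate)

# Invented performance for one subject: K rises with set size, then
# plateaus near the typical three- to four-item capacity limit.
for n, hits, fas in [(2, 0.97, 0.03), (4, 0.85, 0.10), (8, 0.52, 0.15)]:
    print(f"set size {n}: K = {cowans_k(n, hits, fas):.2f}")
```

Note that this is the Cowan (2001) variant; Pashler’s (1988) formula for whole-display probes additionally divides by (1 − F), so the two can give somewhat different estimates from the same data.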

Can Working Memory Capacity Be Expanded Through Training?

In the past several years there has been a great deal of interest in the question as to
whether working memory capacity can be expanded through practice and training (Jaeg­
gi, Buschkuehl, Jonides, & Perrig, 2008; Klingberg, 2010; Morrison & Chein, 2011). Stud­
ies of individual differences in working memory abilities have shown that the construct is
highly correlated (r ∼ 0.5) with fluid intelligence (Engle, Tuholski, Laughlin, & Conway,
1999; Oberauer, Schulze, Wilhelm, & Suss, 2005)—and it therefore stands to reason that
if working memory capacity could be expanded through training, one could thereby enhance a
person’s general problem-solving abilities and overall intelligence. A central concern of
such training research is in demonstrating that practice with a working memory task con­
fers a benefit, not just in performance on the same or similar types of tasks, but also in
cognitive performance on activities that share little surface similarity with the training
task (Shipstead, Redick, & Engle, 2012)—a phenomenon called far transfer. Although there
is some evidence supporting the hypothesis that training on a working memory task can
lead to improvements on other cognitive tasks (Chein & Morrison, 2010; Jaeggi,
Buschkuehl, Jonides, & Shah, 2011; Klingberg et al., 2005), the largest study of computer­
ized cognitive training involving more than 10,000 participants failed to demonstrate far
transfer (Owen et al., 2010). Nevertheless, research on this topic is still in its infancy, and
it seems plausible that small improvements in working memory capacity could be
achieved through training.

Verbal Working Memory


Research on the neural basis of verbal working memory has, for a number of reasons, taken
a different course from corresponding work in the visual domain. First, whereas in visual
working memory many of the most influential ideas and concepts have derived from work
in the monkey, verbal working memory is a uniquely human phenomenon and has there­
fore benefited from animal research only indirectly, or by analogy with the visual system.
Even research on the primary modality relevant to verbal working memory, that of audi­
tion, is surprisingly scarce in the monkey literature, owing to the difficulty in training
nonhuman primates to perform delayed-response tasks with auditory stimuli, which can
take upward of 15,000 learning trials (see Fritz, Mishkin, & Saunders, 2005). On the oth­
er hand, an entirely different state of affairs prevails in the field of human cognitive psy­
chology, where verbal short-term and working memory has over the past 40 years been
studied extensively, almost to the exclusion of other modalities, resulting in thousands of
published articles, a host of highly reliable and replicated behavioral phenomena, and
dozens of sophisticated computational models. Finally, the study of aphasic patients has
provided a wealth of information about the neural circuitry underlying language, and sys­
tematic neurological and neuropsychological inquiries into the impairments that accom­
pany damage to the language system have yielded detailed neuroanatomical models. The
aphasia literature notwithstanding, the study of the neural basis of verbal working memo­
ry has depended, to a much greater extent than has been the case in the visual-spatial do­
main, on pure cognitive models of memory, in particular the phonological loop of Badde­
ley and colleagues. Not surprisingly, as it turns out, there are notable similarities be­
tween working memory for visual material and working memory for linguistic material,
despite the absence of an exactly analogous capacity in nonhuman primates.

Early neurological investigations of patients with language disturbances, or aphasia, revealed that lesions to specific parts of the cerebral cortex could cause extremely selective
deficits in language abilities. Thus, lesions to the inferior frontal gyrus are associated
with Broca’s aphasia, a disorder that causes severe impairments in speech production.
Broca’s aphasia is not, however, a disorder of peripheral motor coordination, such as the
ability to move and control the tongue and mouth, but rather is a disorder of the ability to
plan, program, and access the motor codes required for the production of speech (Good­
glass, 1993). The functions of speech perception and comprehension in Broca’s aphasia
are generally preserved, however. Lesions to the posterior superior temporal gyrus (STG)
and surrounding cortex, on the other hand, are associated (p. 403) with Wernicke’s apha­
sia, a complex syndrome that is characterized by fluent but error-filled production and
poor comprehension and perception of speech. A third, less studied syndrome called con­
duction aphasia, typically caused by lesions in the posterior sylvian region (generally less
extensive and relatively superior to lesions causing Wernicke’s aphasia), is associated
with preserved speech perception and comprehension, occasional errors in otherwise flu­
ent spontaneous speech (e.g., phoneme substitutions), and severe difficulties with verba­
tim repetition of words and sentences (H. Damasio & Damasio, 1980). From the stand­
point of verbal short-term memory, there are a number of important points to be drawn
from these three classic aphasic syndromes. First, the neural structures that underlie the
perception and production of speech are partly dissociable. Thus, it appears that the
brain retains at least two codes for the representation of speech: a sensory, or acoustic
code, and an articulatory, or motor code; the former is necessary for the perception of
speech, and the latter is required for the production of speech. It is tempting to postulate
that posterior temporal lesions primarily affect receptive language functions, whereas an­
terior lesions affect productive language functions—but this is not quite true: Both
Wernicke’s aphasia and conduction aphasia are caused by posterior lesions, yet only the
former is associated with a receptive language disturbance (Hickok & Poeppel, 2000). Se­
cond, all of the above-mentioned disorders affect basic aspects of language processing,
such as the comprehension, production, and perception of speech. Even conduction apha­
sia, for which a deficit in repetition of speech is often emphasized, is characterized by
speech errors that occur in the course of natural language production. Finally, the classic
Wernicke-Lichtheim-Geschwind (Geschwind, 1965) model of language explains each of
these three syndromes as disruptions to components of a neuroanatomical network of areas, in the inferior frontal and superior temporal cortices, that subserve language func­
tion.

In the 1960s a handful of patients were described that did not fit nicely into the classic
aphasiological rubric. Both Luria (1967) and Warrington and Shallice (1969) described pa­
tients with damage to the temporal-parietal cortex who were severely impaired at repeat­
ing sequences of words or digits spoken aloud by the experimenter. Luria referred to the
deficit as an acoustic-mnestic aphasia, whereas Warrington and Shallice (1969), who were
perhaps more attuned to extant information processing models in cognitive psychology,
referred to the deficit as a “selective impairment of auditory-verbal short-term memory.”
In both of these cases, however, the memory impairment was accompanied by a deficit in
ordinary speech production (i.e., word-finding difficulties, errors of speech, and reading
difficulty), which was, in fact, consistent with the more common diagnosis of conduction
aphasia, and therefore complicated the argument in favor of a “pure” memory impair­
ment. Several years later, however, a patient (JB) (Shallice & Butterworth, 1977), also
with a temporal-parietal lesion, was described who had a severely reduced auditory-ver­
bal immediate memory span (one or two items) and yet was otherwise unimpaired in ordi­
nary language use, including speech production and even long-term learning of supra-
span lists of words. Several other such patients have since been described (for a review,
see Shallice & Vallar, 1990), thus strengthening the case for the existence of an auditory-
verbal storage component located in temporal-parietal cortex.

The puzzle, of course, with respect to the classic neurological model of language dis­
cussed above, is how a lesion in the middle of the perisylvian speech center could pro­
duce a deficit in auditory-verbal immediate memory without any collateral deficit in basic
language functioning. One possibility is that the precise location of the brain injury is de­
terminative, so that a particularly focal and well-placed lesion in temporal-parietal cortex
might spare cortex critical for speech perception and speech production, while damaging
a region dedicated to the storage of auditory-verbal information. However, the number of
patients who have been described with a selective impairment to auditory-verbal short-
term memory is small, and the lesion locations that have been reported are comparable to
those that might, in another patient, have led to conduction or Wernicke’s aphasia (A. R.
Damasio, 1992; Dronkers, Wilkins, Van Valin, Redfern, & Jaeger, 2004; Goodglass, 1993).
This would seem, then, to be a question particularly well suited to high-resolution func­
tional neuroimaging.

The first study that attempted to localize the components of phonological loop in the
brain was that of Paulesu and colleagues (1993). In one task, English letters were visually
presented on a monitor and subjects were asked to remember them. In a second task, let­
ters were presented and rhyming judgments were made about them (press a button if let­
ter rhymes with “B”). In a baseline condition, Korean letters were visually presented and
subjects were asked to remember them using a visual code. According to the authors’ log­
ic, the first task would (p. 404) require the contribution of all the components of the
phonological loop—subvocal rehearsal, phonological storage, and executive processes—
and the second (rhyming) task would only require subvocal rehearsal and executive
processes. This reasoning was based on previous research showing that when letters are
presented visually (Vallar & Baddeley, 1984), rhyming decisions engage the subvocal re­
hearsal system, but not the phonological store. Thus, a subtraction of the rhyming condi­
tion from the letter-rehearsal condition should isolate the neural locus of the phonological
store. First, results were presented for the two tasks requiring phonological processing
with the baseline tasks (viewing Korean letters) that did not. Several areas were shown to
be significantly more active in the phonological tasks, including (in all cases, bilaterally):
Broca’s area (BA 44/45), the supplementary motor cortex (SMA), the insula, the cerebel­
lum, Brodmann area 22/42, and Brodmann area 40. Subtracting the rhyming condition
from the phonological short-term memory condition left a single brain area: Brodmann
area 40—the neural correlate of the phonological store.

Not surprisingly, the articulatory rehearsal process recruited a distributed neural circuit
that included the inferior frontal gyrus. Activation of multiple brain regions during articu­
latory rehearsal is not surprising, given the complexity of the process and the variety of
lesion sites associated with a speech production deficit. On the other hand, the localiza­
tion of the phonological store in a single brain region, BA 40 (or the supramarginal
gyrus), comports with the idea of a solitary “receptacle,” where phonological information
is temporarily stored. A number of follow-up PET studies, using various tasks and design
logic, generally replicated the basic finding of the Paulesu study, namely a frontal-insular
cerebellar network associated with rehearsal processes, and a parietal locus for the
phonological store (Awh et al., 1996; Jonides et al., 1998; Salmon et al., 1996; Schumach­
er et al., 1996; Smith & Jonides, 1999).

In a review of these pre-millennium PET studies of verbal working memory, Becker (1999)
questioned whether the localization of the phonological store in BA 40 of the parietal cor­
tex could be reconciled with the logical architecture of the phonological loop. For in­
stance, one key aspect of the phonological loop model is that auditory information
(whether it be speech, tones, music, or white noise), but not visual information, has oblig­
atory access to the phonological store. The reason for this asymmetry is to account for
dissociations in memory performance that depend on the modality in which information is
presented. For instance, the presentation of distracting auditory information while sub­
jects attempt to retain a list of verbal items in memory impairs performance on tests of
recall. In contrast, the presentation of distracting visual information during verbal memo­
ry retention has no impact on verbal recall. This phenomenon, known as the irrelevant
sound effect, is explained by assuming that auditory information—whether relevant or ir­
relevant—always enters the phonological store, but that visual-verbal information only en­
ters the store when it is explicitly subvocalized. Becker and colleagues, however, noted
that if indeed auditory information has obligatory access to the phonological store, its
“neural correlate” should be active even during passive auditory perception. Functional
neuroimaging studies of passive auditory listening (e.g., with no memory component),
however, do not show activity in the parietal lobe, but rather show activation that is large­
ly confined to the superior temporal lobe (e.g., Binder et al., 2000). In addition, efforts to
show verbal mnemonic specificity of maintenance-related activity in the parietal lobe
have not been successful, showing instead that working memories for words, visual ob­
Page 25 of 48
Working Memory

jects, and spatial locations all activate the area (Badre, Poldrack, Pare-Blagoev, Insler, &
Wagner, 2005; Nystrom et al., 2000; Zurowski et al., 2002). Thus, it would appear that if
there were a true neural correlate to the phonological store, it must reside within the
confines of the auditory cortical zone of the superior temporal cortex.

As was the case in the visual-spatial domain, the emergence of event-related fMRI, with
its ability to isolate delay-period activity during working memory, was an inferential boon
to the study of verbal working memory. Postle (1999) showed with visual-verbal presenta­
tion of letter stimuli that delay-period activity in single subjects was often localized in the
posterior-superior temporal cortex rather than the parietal lobe. Buchsbaum et al. (2001) also
used an event-related fMRI paradigm in which, on each trial, subjects were presented
with acoustic speech information that they then rehearsed subvocally for 27 seconds, fol­
lowed by a rest period. Analysis focused on identifying regions that were responsive both
during the perceptual phase and during the rehearsal phase of the trial. Activation oc­
curred in two regions in the posterior superior temporal cortex, one in the posterior supe­
rior temporal sulcus (pSTS) bilaterally and one along the dorsal surface of the left poste­
rior planum temporale, that is, in (p. 405) the Sylvian fissure at the parietal-temporal
boundary (area SPT). Notably, although the parietal lobe did show delay-period activity, it
was unresponsive during auditory stimulus presentation. In a follow-up study, Hickok et al. (2003) showed that the same superior temporal regions (pSTS and SPT) were active both
during the perception and during delay-period maintenance of short (5-second) musical
melodies, suggesting that these posterior temporal storage sites are not restricted to
speech-based, or phonological, information (Figure 19.6). Several subsequent studies
have confirmed the role of SPT in inner speech and verbal working memory (Hashimoto,
Lee, Preus, McCarley, & Wible, 2010; Hickok, Okada, & Serences, 2009; Koelsch et al.,
2009). Acheson et al. (2011) used fMRI to identify posterior temporal regions activated
during verbal working memory maintenance, and then used repetitive transcranial mag­
netic stimulation (TMS) to these sites while subjects performed a rapid-paced reading
task that involved language production but no memory load. TMS applied to the posterior
temporal area significantly interfered with paced reading, arguing for a common neural substrate for language production and verbal working memory.

Figure 19.6 Main results from Hickok et al.’s (2003) study of verbal and musical working memory maintenance. A, Averaged time course of activation over the course of a trial in the Sylvian fissure at the parietal-temporal boundary (area SPT) for speech and music conditions. Timeline at bottom shows structure of each trial; black bars indicate auditory stimulus presentation. Red traces indicate activation during rehearsal trials; black traces indicate activity during listen-only trials in which subjects did not rehearse stimuli at all. B, Activation maps in the left hemisphere (sagittal slices) showing three response patterns for both music rehearsal (left) and speech rehearsal trials (right): auditory-only responses shown in green; delay-period responses shown in blue; and auditory + rehearsal responses shown in red. Arrows indicate the location of area SPT. pSTS, posterior superior temporal sulcus.

Stevens et al. (2004) and Rama et al. (2004) have shown that memory for voice identity,
independent of phonological content (i.e., matching speaker identity as opposed to word
identity), selectively activates the mid-STS and the anterior STG of the superior temporal
region, but not the more posterior and dorsally situated SPT region. Buchsbaum et al.
(2005) have further shown that the mid-STS is more active when subjects recall verbal in­
formation that is acoustically presented than when the information is visually presented,
whereas area SPT shows equally strong delay-period activity for both auditory and visual
forms of input. This finding is supported by regional analyses of structural MRI in large
groups of patients with brain lesions, which have shown that damage to the STG is most
predictive of auditory short-term memory impairment (Koenigs et al. 2011; Leff et al.,
2009). Thus, it appears that different regions in the auditory association cortex of the su­
perior temporal cortex are attuned to different qualities or features of a verbal stimulus,
such as voice information, input modality, phonological content, and lexical status (e.g.,
Martin & Freedman, 2001)—and all of these codes may play a role in the short-term
maintenance of verbal information.

Figure 19.7 A comparison of conduction aphasia, phonological working memory in functional magnetic resonance imaging (fMRI), and their overlap. Left panel shows the regional distribution of lesion overlap on the cortical surface in patients with conduction aphasia (maximum is 12/14, or 85%, overlap). Middle panel shows the percentage of subjects with maintenance-related activity in a phonological working memory task. Right panel shows the area of maximal overlap between the lesion and fMRI surfaces (lesion > 85% overlap and significant fMRI activity for conjunction of encoding and rehearsal).

Additional support for a feature-based topography of auditory association cortex comes from neuroanatomical tract-tracing studies in the monkey (p. 406) that have revealed sep­
arate temporal-prefrontal pathways arising along the anterior-posterior axis of the superi­
or temporal region (Romanski, 2004; Romanski et al., 1999). The posterior part of the
STG projects to dorsolateral PFC (BA 46, 8), whereas neurons in the anterior STG are
more strongly connected to the ventral PFC, including BA 12 and 47. Several authors
have suggested that, as in the visual system, there is a dichotomy between a ventral-going auditory-object stream and a dorsal-going auditory-spatial processing stream (Rauschecker & Tian, 2000;
Tian, Reser, Durham, Kustov, & Rauschecker, 2001). Thus, studies have shown that the
neurons in the rostral STG have more selective responses to classes of complex sounds,
such as vocalizations, whereas more caudally located regions have more spatial selectivi­
ty (Chevillet, Riesenhuber, & Rauschecker, 2011; Rauschecker & Scott, 2009; Rauscheck­
er & Tian, 2000; Tian et al., 2001). Hickok and Poeppel (2000, 2004) have proposed that
human speech processing also proceeds along diverging auditory dorsal and ventral
streams, although they emphasize the distinction between perception for action, or audi­
tory-motor integration, in the dorsal stream and perception for comprehension in the ven­
tral stream. Buchsbaum et al. (2005) have shown with fMRI time-series data that, consistent
with the monkey connectivity patterns, the most posterior and dorsal part of the superior
temporal cortex, area SPT, has the strongest functional connectivity with dorsolateral and
posterior (premotor) parts of the PFC, whereas the midportion of the STS is most tightly
coupled with BA 12 and 47 of the ventrolateral PFC. Moreover, gross distinctions be­
tween anterior (BA 47) and posterior (BA 44/6) parts of the PFC have been associated
with conceptual-semantic and phonological-articulatory aspects of verbal processing (Pol­
drack et al., 1999; Wagner, Pare-Blagoev, Clark, & Poldrack, 2001).

Earlier we posed the question as to how a lesion in posterior sylvian cortex, an area of
known importance for online language processing, could occasionally produce an impair­
ment restricted to phonological short-term memory. One solution to this puzzle is that
subjects with selective verbal short-term memory deficits from temporal-parietal lesions
retain their perceptual and comprehension abilities owing to the sparing of the ventral
stream pathways in the lateral temporal cortex, whereas the preservation of speech pro­
duction is due to an unusual capacity in these subjects for right-hemisphere control of
speech (Buchsbaum & D’Esposito, 2008b; Hickok & Poeppel, 2004). The short-term mem­
ory deficit arises, then, from a selective deficit in auditory-motor integration—or the abili­
ty to translate between acoustic and articulatory speech codes—a function that is espe­
cially taxed during tests of repetition and short-term memory (Buchsbaum & D’Esposito,
2008a). Conduction aphasia, the aphasic syndrome most often associated with a deficit in
auditory repetition and verbal short-term memory in the absence of any difficulty with
speech perception, may reflect a disorder of auditory-motor integration. Indeed, it has re­
cently been shown that the lesion site most often implicated in conduction aphasia cir­
cumscribes area SPT in the posterior-most portion of the superior temporal lobe, a link
between a disorder of verbal repetition and a region in the brain often implicated in tasks
of verbal working memory (Buchsbaum et al. 2011; Figure 19.7). Thus, impairment in the
ability to temporarily store verbal information, as occurs in conduction aphasia, may re­
sult from damage to a system, area (p. 407) SPT, that is critical for the interfacing of audi­
tory and motor representations of sound.

Cognitive Control of Working Memory


In the foregoing sections we have examined how different types of information—spatial,
visual, verbal—are represented in the brain. A key conclusion has been that the regions of
the cerebral cortex that are specialized for the perception of certain classes of stimuli are
also important for the maintenance of such information in working memory. This principle
of course only applies to regions of the cerebral cortex, located primarily in the posterior
half of the neocortex, that are specialized for sensory processing in the first place. For in­
stance, the PFC, which is not easily categorized in terms of type of sensory processing,
shows little evidence of content-specific selectivity in working memory. Rather, the PFC
appears to play a more general role in maintaining, monitoring, and controlling the cur­
rent contents of working memory, irrespective of the type of information involved. Indeed,
the role of Baddeley’s somewhat elusive “central executive” is most likely fulfilled by the
coordinated action of the PFC. We learn little, however, by merely replacing the homunculus of the working memory model, the central executive, with a large area of cortex in the front of the brain (Miyake et al., 2000). In recent years, progress has been made in the study of the role of the PFC in “cognitive control” by investigating the
constituent subprocesses that together compose the brain executive system.

Three of the most important processes that serve to regulate the contents of working
memory are selection, reactivation (or updating), and suppression. Each of these opera­
tions functions to regulate what is currently “in” working memory, allowing for the con­
tents of working memory to be determined strategically, according to ongoing goals and
actions of the organism. Moreover, these cognitive control processes allow for the best
utilization of a system that is subject to severe capacity limitations; thus, selection is a
mechanism that regulates what enters working memory, suppression serves to prevent
unwanted information from entering (or remaining in) memory, and reactivation offers a
mechanism for retaining information in working memory. All three of these operations fall
under the general category of what we might call “top-down” signals or commands that
the PFC deploys to effectively regulate memory. In the following sections we briefly re­
view research on the neural basis of these working memory control functions.

It is easy to see why selection is an important process for regulating the contents of work­
ing memory. For instance, if one tries to mentally calculate the product of the numbers 8
and 16, one might attack the problem in stages. First multiply 8 and 6 (yielding 48), then multiply 8 and 10 (yielding 80), and then add together the intermediate values (48 + 80 = 128). For this strategy to work, one has to be able to select the currently rele­
vant numbers in working memory at each stage of the calculation. Thus, at any given
time, it is likely that many pieces of information are competing for access to working
memory. One way to study this type of selection is to devise working memory tasks that
vary the degree of proactive interference, so that on some trials subjects must select the
correct items from among a set of active, but task irrelevant competitors (see Jonides &
Nee, 2006; Vogel, McCollough, & Machizawa, 2005; Yi, Woodman, Widders, Marois, &
Chun, 2004). Functional neuroimaging studies have shown that the left inferior frontal
gyrus is modulated by the number of irrelevant competing alternatives that confront a
subject (Badre & Wagner, 2007; Thompson-Schill, D’Esposito, Aguirre, & Farah, 1997).
Moreover, behavioral measures of the interfering effects of irrelevant information are cor­
related with the level of activation in the left inferior frontal gyrus (D’Esposito, Postle,
Jonides, & Smith, 1999; Smith, Jonides, Marshuetz, & Koeppe, 1998; Thompson-Schill et
al., 2002). From a mechanistic standpoint, selection may involve the maintenance of goal
information to bias activation in favor of relevant over irrelevant information competing
for access to working memory (E. K. Miller & Cohen, 2001; Rowe, Toni, Josephs, Frack­
owiak, & Passingham, 2000).
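The staged multiplication strategy can be made concrete with a purely illustrative sketch, in which intermediate products are deposited in, and later selected from, a small stand-in for the contents of working memory:

```python
# Staged computation of 8 x 16, holding partial products in a small store.
store = {}                 # stand-in for the current contents of working memory
store["8x6"] = 8 * 6       # stage 1: partial product for the units digit (48)
store["8x10"] = 8 * 10     # stage 2: partial product for the tens digit (80)

# Stage 3: select the two currently relevant items and combine them.
result = store["8x6"] + store["8x10"]
print(result)  # 128
```

The point of the sketch is only that each stage requires selecting the relevant entries from among everything currently held in the store.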

Reactivation refers to the updating or refreshing of information that is currently active in memory. For instance, in verbal working memory, rehearsal is used as a strategy to keep
sequences of phonological material in mind. But why is this strategy effective? According
to the phonological loop model, articulatory rehearsal serves to “refresh” the decaying
traces in the phonological store. We know from functional neuroimaging studies (re­
viewed above) that during covert verbal rehearsal, brain areas known to be important for
speech production (e.g., Broca’s area, premotor cortex) coactivate with areas that are
known to be important for speech perception. One interpretation of this phenomenon,
consistent with the ideas of the phonological loop, is that regions in the PFC that govern
the selection of speech motor programs can also send signals to posterior cortex (p. 408)
that can “reactivate” a set of corresponding sensory traces. In this case, reactivation is
achieved presumably by way of a tight sensorimotor link between a prefrontally situated
motor system (e.g., LIFG for speech) and an auditory system located in posterior neocor­
tex.

There must yet be other means of achieving reactivation, however, when there is not a di­
rect equivalence between some top-down (e.g., motor) and bottom-up (e.g., sensory)
code, as is true in the case of speech, in which every sound that can be produced can also
be perceived (Chun & Johnson, 2011). Thus, for instance, there is no obvious “motor
equivalent” of an abstract shape, color, or picture of an (expressionless) face, and yet such
images can nevertheless be maintained to some degree in working memory. One possibili­
ty is that the maintenance of sensory stimuli that do not have a motor analogue is
achieved by means of a more general mechanism of reactivation that need not rely on di­
rect motor to sensory signals, but rather operates in a similar manner to selective atten­
tion. Thus, just as attention can be focused on this or that aspect of the current perceptu­
al environment, it can also be directed to some subset of the current contents of working
memory. Indeed, if, as we have been suggesting, the neural substrates of memory are
shared with those of perception, then it seems likely that the same mechanisms of selec­
tive attention can be applied to both the contents of perception and the contents of mem­
ory. A potential mechanism for selective modulation of memory representations is by way
of a top-down bias signal that can target populations of cells in sensory cortex and modu­
late their level of activity.

Top-down suppression has the opposite effect of selection or reactivation—it serves to de­
crease, rather than increase, the salience of a stimulus representation in working memo­
ry. Although it is sometimes argued that suppression in attention and memory is merely
the flip side of selection, and therefore does not constitute a separate processing mecha­
nism in its own right, evidence from cognitive neuroscience suggests that this is not the
case. For instance, Gazzaley et al. (2005) used a working memory delay task to directly study
the neural mechanisms underlying top-down activation and suppression by investigating
the processes involved when participants were required to select relevant and suppress
irrelevant information. During each trial, participants observed sequences of two faces
and two natural scenes presented in a randomized order. The tasks differed in the in­
structions informing the participants how to process the stimuli: (1) remember faces and
ignore scenes, (2) remember scenes and ignore faces, or (3) passively view faces and
scenes without attempting to remember them. In the two memory tasks, the encoding of
the task-relevant stimuli required selective attention and thus permitted the dissociation
of physiological measures of enhancement and suppression relative to the passive base­
line. fMRI data revealed top-down modulation of both activity magnitude and processing
speed that occurred above or below the perceptual baseline depending on task instruc­
tion. Thus, during the encoding period of the delay task, FFA activity was enhanced when
faces had to be remembered compared with a condition in which they were passively
viewed. Likewise, FFA activity was suppressed when faces had to be ignored (with scenes
now being retained instead across the delay interval) compared with a condition in which
they were passively viewed. Thus, there appears to be at least two types of top-down sig­
nal: one that serves to enhance task-relevant information and another that serves to sup­
press task-relevant information. It is well documented that the nervous system uses inter­
leaved inhibitory and excitatory mechanisms throughout the neuroaxis (e.g., spinal reflex­
es, cerebellar outputs, and basal ganglia movement control networks). Thus, it may not
be surprising that enhancement and suppression mechanisms may exist to control cogni­
tion (Knight, Staines, Swick, & Chao, 1999; Shimamura, 2000). By generating contrast
through enhancements and suppressions of activity magnitude and processing speed, top-
down signals bias the likelihood of successful representation of relevant information in a
competitive system.
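The logic of indexing enhancement and suppression against a passive perceptual baseline can be sketched in a few lines. The signal values below are hypothetical, not those reported in the study:

```python
# Hypothetical mean FFA responses (percent signal change) in three conditions.
remember_faces = 1.4   # faces are task-relevant (encode for memory)
passive_view   = 1.0   # perceptual baseline (no memory instruction)
ignore_faces   = 0.7   # faces are task-irrelevant (remember scenes instead)

# Modulation is measured relative to the passive baseline in both directions.
enhancement = remember_faces - passive_view   # activity above baseline
suppression = passive_view - ignore_faces     # activity below baseline

print(round(enhancement, 2), round(suppression, 2))  # 0.4 0.3
```

Separately positive enhancement and suppression indices are what licenses the claim that the two effects are distinct top-down operations rather than one mechanism described two ways.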

Top-Down Control Signals and the Prefrontal Cortex

Although it has been proposed that the PFC provides the major source of the types of top-
down signals that we have described, this hypothesis largely originates from suggestive
findings rather than direct empirical evidence. However, a few studies lend direct causal
support to this hypothesis. For example, Fuster (1985) investigated the effect of cooling inactivation of specific parts of the PFC on spiking activity in inferior temporal cortex (ITC) neurons during a delayed match-to-sample (DMS) color task. During the delay inter­
val in this task—when persistent stimulus-specific activity in ITC neurons is observed—in­
activation caused attenuated spiking profiles and a loss of stimulus (p. 409) specificity of
ITC neurons. These two alterations of ITC signaling strongly implicate the PFC as a
source of top-down signals necessary for maintaining robust sensory representations in
the absence of bottom-up sensory activity. Tomita et al. (1999) were able to isolate top-down sig­
nals during the retrieval of paired associates in a visual memory task. Spiking activity
was recorded from stimulus-specific ITC neurons as cue stimuli were presented to the ip­
silateral hemifield. This experiment’s unique feature was the ability to separate bottom-
up sensory signals from a top-down mnemonic reactivation, using a posterior split-brain
procedure that limited hemispheric cross-talk to the anterior corpus callosum connecting
each PFC. When a probe stimulus was presented ipsilaterally to the recording site, thus
restricting bottom-up visual input to the contralateral hemisphere, stimulus-specific neu­
rons became activated at the recording site approximately 170 ms later. Because these
neurons received no bottom-up visual signals of the probe stimulus, with the only route
between the two hemispheres being through the PFC, this experiment showed that PFC
neurons were sufficient to trigger the reactivation of object-selective representations in
ITC regions in a top-down manner. The combined lesion/electrophysiological approach in
humans has rarely been implemented. Chao and Knight (1998), however, studied patients
with lateral PFC lesions during DMS tasks. It was found that when distracting stimuli were
presented during the delay period, the amplitude of the recorded ERP from posterior
electrodes was markedly increased in patients compared with controls. These results
were interpreted to show disinhibition of sensory processing and support a role of the
PFC in suppressing the representation of stimuli that are irrelevant for current behavior.

Based on the data we have reviewed thus far, we might propose that any population of
neurons within primary or unimodal association cortex can exhibit persistent neuronal ac­
tivity, which serves to actively maintain the representations coded by those neuronal pop­
ulations (Curtis & D’Esposito, 2003). Areas of multimodal cortex, such as PFC and pari­
etal cortex, which are in a position to integrate representations through connectivity to
unimodal association cortex, are also critically involved in the active maintenance of task-
relevant information (Burgess, Gilbert, & Dumontheil, 2007; Stuss & Alexander, 2007).
Miller and Cohen (2001) have proposed that in addition to the recent sensory informa­
tion, integrated representations of task contingencies and even abstract rules (e.g., if this
object, then this later response) are also maintained in the PFC. This is similar to what
Fuster (1997) has long emphasized, namely that the PFC is critically responsible for tem­
poral integration and the mediation of events that are separated in time but contingent
on one another. In this way, the PFC may exert “control” in that the information it repre­
sents can bias posterior unimodal association cortex in order to keep neural representa­
tions of behaviorally relevant sensory information activated when they are no longer
present in the external environment (B. T. Miller & D’Esposito, 2005; Postle, 2006b; Ran­
ganath et al., 2004). In a real world example, when a person is looking at a crowd of peo­
ple, the visual scene presented to the retina may include a myriad of angles, shapes, peo­
ple, and objects. If that person is a police officer looking for an armed robber escaping
through the crowd, however, some mechanism of suppressing irrelevant visual informa­
tion while enhancing task-relevant information is necessary for an efficient and effective
search. Thus, neural activity throughout the brain that is generated by input from the out­
side world may be differentially enhanced or suppressed, presumably from top-down sig­
nals emanating from integrative brain regions such as PFC, based on the context of the
situation. As Miller and Cohen (2001) state, putative top-down signals originating in PFC
may permit “the active maintenance of patterns of activity that represent goals and the
means to achieve them. They provide bias signals throughout much of the rest of the
brain, affecting visual processes and other sensory modalities, as well as systems respon­
sible for response execution, memory retrieval, emotional evaluation, etc. The aggregate
effect of these bias signals is to guide the flow of neural activity along pathways that es­
tablish the proper mappings between inputs, internal states and outputs needed to per­
form a given task.” Computational models of this type of system have created a PFC mod­
ule (e.g., O’Reilly & Norman, 2002) that consists of “rule” units whose activation leads to
the production of a response other than the one most strongly associated with a given in­
put. Thus, “this module is not responsible for carrying out input–output mappings needed
for performance. Rather, this module influences the activity of other units whose respon­
sibility is making the needed mappings” (e.g., Cohen, Dunbar, & McClelland, 1990). Thus,
there is no need to propose the existence of a homunculus (e.g., central executive) in the
brain that can perform a wide range of cognitive operations that (p. 410) are necessary for
the task at hand (Hazy, Frank, & O’Reilly, 2006).
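The flavor of such a rule-unit architecture can be conveyed in a few lines. The sketch below is our simplification, not the published implementation: a strong prepotent pathway competes with a weak task-relevant one, and a PFC "rule" unit biases the weak pathway so that it can win the competition:

```python
# Two competing input-output pathways, as in Stroop-like biasing models:
# word reading is strong (prepotent), color naming is weak. The rule unit
# does not perform the mapping itself; it only biases the relevant pathway.
def respond(word_input, color_input, rule_bias):
    word_act = 1.0 * word_input                 # strong, habitual mapping
    color_act = 0.4 * color_input + rule_bias   # weak mapping + top-down bias
    return "word" if word_act > color_act else "color"

print(respond(1.0, 1.0, 0.0))  # "word": the prepotent response wins
print(respond(1.0, 1.0, 0.8))  # "color": the rule unit tips the competition
```

The pathway weights and bias value are arbitrary; what matters is that the rule unit changes which mapping controls the output without itself computing either mapping.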

Clearly, there are other areas of multimodal cortex such as posterior parietal cortex, and
the hippocampus, that can also be the source of top-down signals. For example, it is
thought that the hippocampus is specialized for rapid learning of arbitrary information
that can be recalled in the service of controlled processing (McClelland, McNaughton, &
O’Reilly, 1995). Several recent studies have offered evidence that the hippocampus, long
thought to contribute little to “online” cognitive activities, plays a role in the maintenance
of information in working memory for novel or high information stimuli (Olsen et al.,
2009; Olson, Page, Moore, Chatterjee, & Verfaellie, 2006; Ranganath & D’Esposito, 2001;
Rose, Olsen, Craik, & Rosenbaum, 2012). Moreover, input from brainstem neuromodula­
tory systems probably plays a critical role in modulating goal-directed behavior (Robbins,
2007). For example, the dopaminergic system probably plays a critical role in cognitive
control processes (for a review, see Cools & Robbins, 2004). Specifically, it is proposed
that phasic bursts of dopaminergic neurons may be critical for updating currently activat­
ed task-relevant representations, whereas tonic dopaminergic activity serves to stabilize
such representations (e.g., Cohen, Braver, & Brown, 2002; Durstewitz, Seamans, & Se­
jnowski, 2000).
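The proposed division of labor between phasic and tonic dopaminergic signaling can be caricatured as a simple gate on working memory updating (our simplification, not any published model):

```python
# A working-memory cell whose contents change only on a phasic dopamine burst.
def step(memory, new_input, phasic_burst):
    # Phasic burst opens the gate (update the representation); tonic activity
    # keeps the gate closed (stabilize and maintain the current contents).
    return new_input if phasic_burst else memory

wm = "goal-A"
wm = step(wm, "distractor", phasic_burst=False)  # tonic mode: distractor rejected
wm = step(wm, "goal-B", phasic_burst=True)       # phasic burst: contents updated
print(wm)  # "goal-B"
```

The gate captures the trade-off at issue: a system that always updates cannot maintain, and one that never updates cannot be controlled; phasic bursts provide the switch between the two regimes.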

Summary and Conclusions


Elucidation of the cognitive and neural architectures underlying working memory has
been an important focus of neuroscience research for much of the past two decades. The
emergence of the concept of working memory, with its emphasis on the utilization of the
information stored in memory in the service of behavioral goals, has enlarged our under­
standing and broadened the scope of neuroscience research of short-term memory. Data
from numerous studies have been reviewed and have demonstrated that a network of
brain regions, including the PFC, is critical for the active maintenance of internal repre­
sentations. Moreover, it appears that the PFC has functional subdivisions that are orga­
nized according to the domain (e.g., verbal, spatial, object) of the topographical inputs ar­
riving from posterior cortices. In addition, however, a level of representational abstract­
ness is achieved through the integration of information converging in the PFC. Finally,
working memory function is not localized to a single brain region, but rather is an emer­
gent property of the functional interactions between the PFC and other posterior neocor­
tical regions. Numerous questions remain about the neural basis of this complex cogni­
tive system, but studies such as those reviewed in this chapter should continue to yield converging evidence that may answer the many remaining questions.

References
Acheson, D. J., Hamidi, M., Binder, J. R., & Postle, B. R. (2011). A common neural sub­
strate for language production and verbal working memory. Journal of Cognitive Neuro­
science, 23 (6), 1358–1367.

Atkinson, R. C., & Shiffrin, R. M. (1968). Human memory: A proposed system and its con­
trol processes. In K. W. Spence (Ed.), The psychology of learning and motivation: Ad­
vances in research and theory (Vol. 2, pp. 89–195). New York: Academic Press.

Awh, E., Jonides, J., Smith, E. E., Schumacher, E. H., Koeppe, R. A., & Katz, S. (1996). Dis­
sociation of storage and rehearsal in working memory: PET evidence. Psychological
Science, 7, 25–31.

Baddeley, A. D. (1986). Working memory. Oxford: Clarendon Press; New York: Oxford University Press.

Baddeley, A. (1992). Working memory. Science, 255 (5044), 556–559.

Baddeley, A. (2000). The episodic buffer: A new component of working memory? Trends in
Cognitive Sciences, 4 (11), 417–423.

Baddeley, A., Allen, R., & Vargha-Khadem, F. (2010). Is the hippocampus necessary for vi­
sual and verbal binding in working memory? Neuropsychologia, 48 (4), 1089–1095.

Baddeley, A. D., & Hitch, G. J. (1974). Working memory. In G. Bower (Ed.), The psychology
of learning and motivation (Vol. 7, pp. 47–90). New York: Academic Press.

Baddeley, A., Lewis, V., & Vallar, G. (1984). Exploring the articulatory loop. Quarterly Jour­
nal of Experimental Psychology, 36, 233–252.

Baddeley, A. D., Thomson, N., & Buchanan, M. (1975). Word length and the structure of
short-term memory. Journal of Verbal Learning and Verbal Behavior, 14, 575–589.

Badre, D., Poldrack, R. A., Pare-Blagoev, E. J., Insler, R. Z., & Wagner, A. D. (2005). Disso­
ciable controlled retrieval and generalized selection mechanisms in ventrolateral pre­
frontal cortex. Neuron, 47 (6), 907–918.

Badre, D., & Wagner, A. D. (2007). Left ventrolateral prefrontal cortex and the cognitive
control of memory. Neuropsychologia, 45 (13), 2883–2901.

Becker, J. T., MacAndrew, D. K., & Fiez, J. A. (1999). A comment on the functional localiza­
tion of the phonological storage subsystem of working memory. Brain and Cognition, 41
(1), 27–38.

Belger, A., Puce, A., Krystal, J. H., Gore, J. C., Goldman-Rakic, P., & McCarthy, G. (1998).
Dissociation of mnemonic and perceptual processes during spatial and nonspatial work­
ing memory using fMRI. Human Brain Mapping, 6 (1), 14–32.

Binder, J. R., Frost, J. A., Hammeke, T. A., Bellgowan, P. S., Springer, J. A., Kaufman, J. N.,
et al. (2000). Human temporal lobe activation by speech and nonspeech sounds. Cerebral
Cortex, 10 (5), 512–528.

Blum, R. A. (1952). Effects of subtotal lesions of frontal granular cortex on delayed reac­
tion in monkeys. AMA Archives of Neurology and Psychiatry, 67 (3), 375–386.

Buchsbaum, B. R., Baldo, J., Okada, K., Berman, K. F., Dronkers, N., D’Esposito, M., et al. (2011). Conduction aphasia, sensory-motor integration, and phonological short-term memory—An aggregate analysis of lesion and fMRI data. Brain and Language.

Buchsbaum, B. R., & D’Esposito, M. (2008b). The search for the phonological store: from
loop to convolution. Journal of Cognitive Neuroscience, 20 (5), 762–778.

Buchsbaum, B. R., Hickok, G., & Humphries, C. (2001). Role of the left superior temporal
gyrus in phonological processing for speech perception and production. Cognitive
Science, 25, 663–678.

Buchsbaum, B. R., Olsen, R. K., Koch, P., & Berman, K. F. (2005). Human dorsal and ven­
tral auditory streams subserve rehearsal-based and echoic processes during verbal work­
ing memory. Neuron, 48 (4), 687–697.

Burgess, P. W., Gilbert, S. J., & Dumontheil, I. (2007). Function and localization within ros­
tral prefrontal cortex (area 10). Philosophical Transactions of the Royal Society of Lon­
don, Series B, Biological Sciences, 362 (1481), 887–899.

Butters, N., Pandya, D., Stein, D., & Rosen, J. (1972). A search for the spatial engram
within the frontal lobes of monkeys. Acta Neurobiologiae Experimentalis (Wars), 32 (2),
305–329.

Chao, L. L., & Knight, R. T. (1998). Contribution of human prefrontal cortex to delay performance. Journal of Cognitive Neuroscience, 10 (2), 167–177.

Chein, J. M., & Morrison, A. B. (2010). Expanding the mind’s workspace: training and
transfer effects with a complex working memory span task. Psychonomic Bulletin and Re­
view, 17 (2), 193–199.

Chevillet, M., Riesenhuber, M., & Rauschecker, J. P. (2011). Functional correlates of the
anterolateral processing hierarchy in human auditory cortex. Journal of Neuroscience, 31
(25), 9345–9352.

Chun, M. M., & Johnson, M. K. (2011). Memory: Enduring traces of perceptual and reflec­
tive attention. Neuron, 72 (4), 520–535.

Cocchini, G., Logie, R. H., Della Sala, S., MacPherson, S. E., & Baddeley, A. D. (2002).
Concurrent performance of two memory tasks: Evidence for domain-specific working
memory systems. Memory and Cognition, 30 (7), 1086–1095.

Cohen, J. D., Braver, T. S., & Brown, J. W. (2002). Computational perspectives on dopamine
function in prefrontal cortex. Current Opinion in Neurobiology, 12 (2), 223–229.

Cohen, J. D., Dunbar, K., & McClelland, J. L. (1990). On the control of automatic process­
es: A parallel distributed processing account of the Stroop effect. Psychological Review,
97 (3), 332–361.

Cools, R., & Robbins, T. W. (2004). Chemistry of the adaptive mind. Philosophical Transac­
tions, Series A, Mathematical, Physical, and Engineering Sciences, 362 (1825), 2871–
2888.

Corbetta, M., Kincade, J. M., & Shulman, G. L. (2002). Neural systems for visual orienting
and their relationships to spatial working memory. Journal of Cognitive Neuroscience, 14 (3), 508–523.

Courtney, S. M., Petit, L., Maisog, J. M., Ungerleider, L. G., & Haxby, J. V. (1998). An area
specialized for spatial working memory in human frontal cortex. Science, 279 (5355),
1347–1351.

Courtney, S. M., Ungerleider, L. G., Keil, K., & Haxby, J. V. (1996). Object and spatial visu­
al working memory activate separate neural systems in human cortex. Cerebral Cortex, 6
(1), 39–49.

Courtney, S. M., Ungerleider, L. G., Keil, K., & Haxby, J. V. (1997). Transient and sustained
activity in a distributed neural system for human working memory. Nature, 386 (6625),
608–611.

Cowan, N. (1988). Evolving conceptions of memory storage, selective attention, and their
mutual constraints within the human information-processing system. Psychological Bul­
letin, 104 (2), 163–191.

Cowan, N. (2001). The magical number 4 in short-term memory: A reconsideration of mental storage capacity. Behavioral and Brain Sciences, 24 (1), 87–114; discussion 114–185.

Cowan, N., Li, D., Moffitt, A., Becker, T. M., Martin, E. A., Saults, J. S., et al. (2011). A
neural region of abstract working memory. Journal of Cognitive Neuroscience, 23 (10),
2852–2863.

Curtis, C. E., & D’Esposito, M. (2003). Persistent activity in the prefrontal cortex during
working memory. Trends in Cognitive Sciences, 7 (9), 415–423.

Curtis, C. E., Rao, V. Y., & D’Esposito, M. (2004). Maintenance of spatial and motor codes
during oculomotor delayed response tasks. Journal of Neuroscience, 24 (16), 3944–3952.

D’Esposito, M., Aguirre, G. K., Zarahn, E., Ballard, D., Shin, R. K., & Lease, J. (1998).
Functional MRI studies of spatial and nonspatial working memory. Brain Research. Cogni­
tive Brain Research, 7 (1), 1–13.

D’Esposito, M., Detre, J. A., Alsop, D. C., Shin, R. K., Atlas, S., & Grossman, M. (1995). The neural basis of the central executive system of working memory. Nature, 378, 279–281.

D’Esposito, M., Postle, B. R., Jonides, J., & Smith, E. E. (1999). The neural substrate and
temporal dynamics of interference effects in working memory as revealed by event-relat­
ed functional MRI. Proceedings of the National Academy of Sciences U S A, 96 (13),
7514–7519.

Damasio, A. R. (1992). Aphasia. New England Journal of Medicine, 326 (8), 531–539.

Damasio, H., & Damasio, A. R. (1980). The anatomical basis of conduction aphasia. Brain,
103 (2), 337–350.

Della Sala, S., Gray, C., Baddeley, A., Allamano, N., & Wilson, L. (1999). Pattern span: A
tool for unwelding visuo-spatial memory. Neuropsychologia, 37 (10), 1189–1199.

Dronkers, N. F., Wilkins, D. P., Van Valin, R. D., Jr., Redfern, B. B., & Jaeger, J. J. (2004). Le­
sion analysis of the brain areas involved in language comprehension. Cognition, 92 (1-2),
145–177.

Druzgal, T. J., & D’Esposito, M. (2001). Activity in fusiform face area modulated as a func­
tion of working memory load. Brain Research. Cognitive Brain Research, 10 (3), 355–364.

Druzgal, T. J., & D’Esposito, M. (2003). Dissecting contributions of prefrontal cortex and
fusiform face area to face working memory. Journal of Cognitive Neuroscience, 15 (6),
771–784.

Durstewitz, D., Seamans, J. K., & Sejnowski, T. J. (2000). Neurocomputational models of working memory. Nature Neuroscience, 3 (Suppl), 1184–1191.

Engle, R. W., Tuholski, S. W., Laughlin, J. E., & Conway, A. R. (1999). Working memory,
short-term memory, and general fluid intelligence: A latent-variable approach. Journal of
Experimental Psychology: General, 128 (3), 309–331.

Ericsson, K. A., & Kintsch, W. (1995). Long-term working memory. Psychological Review,
102 (2), 211–245.

Fritz, J., Mishkin, M., & Saunders, R. C. (2005). In search of an auditory engram. Proceedings of the National Academy of Sciences U S A, 102 (26), 9359–9364.

Funahashi, S., Bruce, C. J., & Goldman-Rakic, P. S. (1989). Mnemonic coding of visual
space in the monkey’s dorsolateral prefrontal cortex. Journal of Neurophysiology, 61 (2),
331–349.

Fuster, J. M. (1997). Network memory. Trends in Neurosciences, 20 (10), 451–459.

Fuster, J. M. (2000). Prefrontal neurons in networks of executive memory. Brain Research Bulletin, 52 (5), 331–336.

Fuster, J. M., & Alexander, G. E. (1971). Neuron activity related to short-term memory.
Science, 173 (997), 652–654.

Fuster, J. M., Bauer, R. H., & Jervey, J. P. (1985). Functional interactions between infer­
otemporal and prefrontal cortex in a cognitive task. Brain Research, 330 (2), 299–307.

Gazzaley, A., Cooney, J. W., McEvoy, K., Knight, R. T., & D’Esposito, M. (2005). Top-down
enhancement and suppression of the magnitude and speed of neural activity. Journal of
Cognitive Neuroscience, 17 (3), 507–517.

Geschwind, N. (1965). Disconnexion syndromes in animals and man. I. Brain, 88 (2), 237–
294.

Glanzer, M., & Cunitz, A. R. (1966). Two storage mechanisms in free recall. Journal of Verbal Learning and Verbal Behavior, 5, 351–360.

Goldman-Rakic, P. S. (1987). Circuitry of primate prefrontal cortex and regulation of behavior by representational memory. In F. Plum (Ed.), Handbook of physiology—The nervous system (Vol. 5, pp. 373–417). Bethesda, MD: American Physiological Society.

Goldman-Rakic, P. S. (1990). Cellular and circuit basis of working memory in prefrontal cortex of nonhuman primates. Progress in Brain Research, 85, 325–335; discussion 335–336.

Goldman-Rakic, P. (2000). Localization of function all over again. NeuroImage, 11 (5 Pt 1), 451–457.

Goodglass, H. (1993). Understanding aphasia. San Diego, CA: Academic Press.

Hasher, L., Zacks, R. T., & Rahhal, T. A. (1999). Timing, instructions, and inhibitory con­
trol: some missing factors in the age and memory debate. Gerontology, 45 (6), 355–357.

Hashimoto, R., Lee, K., Preus, A., McCarley, R. W., & Wible, C. G. (2010). An fMRI study of
functional abnormalities in the verbal working memory system and the relationship to
clinical symptoms in chronic schizophrenia. Cerebral Cortex, 20 (1), 46–60.

Hazy, T. E., Frank, M. J., & O’Reilly, R. C. (2006). Banishing the homunculus: making
working memory work. Neuroscience, 139 (1), 105–118.

Hickok, G. (2012). Computational neuroanatomy of speech production. Nature Reviews Neuroscience, 13 (2), 135–145.

Hickok, G., Buchsbaum, B., Humphries, C., & Muftuler, T. (2003). Auditory-motor interac­
tion revealed by fMRI: speech, music, and working memory in area Spt. Journal of Cogni­
tive Neuroscience, 15 (5), 673–682.

Hickok, G., Okada, K., & Serences, J. T. (2009). Area Spt in the human planum temporale
supports sensory-motor integration for speech processing. Journal of Neurophysiology,
101 (5), 2725–2732.

Hickok, G., & Poeppel, D. (2000). Towards a functional neuroanatomy of speech perception. Trends in Cognitive Sciences, 4 (4), 131–138.

Hickok, G., & Poeppel, D. (2004). Dorsal and ventral streams: A framework for under­
standing aspects of the functional anatomy of language. Cognition, 92 (1-2), 67–99.

Hulme, C., Newton, P., Cowan, N., Stuart, G., & Brown, G. (1999). Think before you speak:
Pauses, memory search, and trace redintegration processes in verbal memory span. Jour­
nal of Experimental Psychology: Learning, Memory, and Cognition, 25 (2), 447–463.

Hunter, W. S. (1913). The delayed reaction in animals and children. Behavior Monographs, 2, 1–86.

Ingvar, D. H. (1977). Functional responses of the human brain studied by regional cere­
bral blood flow techniques. Acta Clinica Belgica, 32 (2), 68–83.

Ingvar, D. H., & Risberg, J. (1965). Influence of mental activity upon regional cerebral
blood flow in man: A preliminary study. Acta Neurologica Scandinavica Supplementum,
14, 183–186.

Jacobsen, C. F. (1936). Studies of cerebral function in primates. Comparative Psychology Monographs, 13, 1–68.

Jaeggi, S. M., Buschkuehl, M., Jonides, J., & Perrig, W. J. (2008). Improving fluid intelli­
gence with training on working memory. Proceedings of the National Academy of
Sciences U S A, 105 (19), 6829–6833.

Jaeggi, S. M., Buschkuehl, M., Jonides, J., & Shah, P. (2011). Short- and long-term benefits of cognitive training. Proceedings of the National Academy of Sciences U S A, 108 (25), 10081–10086.

Jha, A. P., & McCarthy, G. (2000). The influence of memory load upon delay-interval activi­
ty in a working-memory task: an event-related functional MRI study. Journal of Cognitive
Neuroscience, 12 (Suppl 2), 90–105.

Jonides, J., & Nee, D. E. (2006). Brain mechanisms of proactive interference in working
memory. Neuroscience, 139 (1), 181–193.

Jonides, J., Schumacher, E. H., Smith, E. E., Koeppe, R. A., Awh, E., Reuter-Lorenz, P. A.,
et al. (1998). The role of parietal cortex in verbal working memory. Journal of Neuro­
science, 18 (13), 5026–5034.

Jonides, J., Smith, E. E., Koeppe, R. A., Awh, E., Minoshima, S., & Mintun, M. A. (1993).
Spatial working memory in humans as revealed by PET. Nature, 363 (6430), 623–625.

Joseph, J. P., & Barone, P. (1987). Prefrontal unit activity during a delayed oculomotor task
in the monkey. Experimental Brain Research, 67 (3), 460–468.

Kahneman, D. (1973). Attention and effort. Englewood Cliffs, NJ: Prentice-Hall.

Kanwisher, N., McDermott, J., & Chun, M. M. (1997). The fusiform face area: A module in
human extrastriate cortex specialized for face perception. Journal of Neuroscience, 17
(11), 4302–4311.

Klauer, K. C., & Zhao, Z. (2004). Double dissociations in visual and spatial short-term
memory. Journal of Experimental Psychology: General, 133 (3), 355–381.

Klingberg, T. (2010). Training and plasticity of working memory. Trends in Cognitive Sciences, 14 (7), 317–324.

Klingberg, T., Fernell, E., Olesen, P. J., Johnson, M., Gustafsson, P., Dahlstrom, K., et al.
(2005). Computerized training of working memory in children with ADHD: A randomized,
controlled trial. Journal of the American Academy of Child and Adolescent Psychiatry, 44
(2), 177–186.

Knight, R. T., Staines, W. R., Swick, D., & Chao, L. L. (1999). Prefrontal cortex regulates
inhibition and excitation in distributed neural networks. Acta Psychologica (Amst), 101
(2-3), 159–178.

Koelsch, S., Schulze, K., Sammler, D., Fritz, T., Muller, K., & Gruber, O. (2009). Functional architecture of verbal and tonal working memory: An fMRI study. Human Brain Mapping, 30 (3), 859–873.

Koenigs, M., Acheson, D. J., Barbey, A. K., Solomon, J., Postle, B. R., & Grafman, J. (2011).
Areas of left perisylvian cortex mediate auditory-verbal short-term memory. Neuropsy­
chologia, 49 (13), 3612–3619.

Leff, A. P., Schofield, T. M., Crinion, J. T., Seghier, M. L., Grogan, A., Green, D. W., et al.
(2009). The left superior temporal gyrus is a shared substrate for auditory short-term
memory and speech comprehension: Evidence from 210 patients with stroke. Brain, 132
(Pt 12), 3401–3410.

Leung, H. C., Seelig, D., & Gore, J. C. (2004). The effect of memory load on cortical activi­
ty in the spatial working memory circuit. Cognitive, Affective, and Behavioral Neuro­
science, 4 (4), 553–563.

Lewis-Peacock, J. A., Drysdale, A. T., Oberauer, K., & Postle, B. R. (2012). Neural evidence
for a distinction between short-term memory and the focus of attention. Journal of Cogni­
tive Neuroscience, 24 (1), 61–79.

Linden, D. E., Bittner, R. A., Muckli, L., Waltz, J. A., Kriegeskorte, N., Goebel, R., et al.
(2003). Cortical capacity constraints for visual working memory: Dissociation of fMRI
load effects in a fronto-parietal network. NeuroImage, 20 (3), 1518–1530.

Logie, R. H. (1995). Visuo-spatial working memory. Hove, UK: Erlbaum.

Luck, S. J., & Vogel, E. K. (1997). The capacity of visual working memory for features and
conjunctions. Nature, 390 (6657), 279–281.

Luria, A. R., Sokolov, E. N., & Klimkowski, M. (1967). Towards a neurodynamic analysis of memory disturbances with lesions of the left temporal lobe. Neuropsychologia, 5 (1), 1–11.

Martin, R. C., & Freedman, M. L. (2001). Short-term retention of lexical-semantic representations: Implications for speech production. Memory, 9 (4), 261–280.

Martin, R. C., & He, T. (2004). Semantic short-term memory and its role in sentence pro­
cessing: a replication. Brain and Language, 89 (1), 76–82.

McCarthy, G., Blamire, A. M., Puce, A., Nobre, A. C., Bloch, G., Hyder, F., et al. (1994).
Functional magnetic resonance imaging of human prefrontal cortex activation during a
spatial working memory task. Proceedings of the National Academy of Sciences U S A, 91
(18), 8690–8694.

McCarthy, G., Puce, A., Constable, R. T., Krystal, J. H., Gore, J. C., & Goldman-Rakic, P.
(1996). Activation of human prefrontal cortex during spatial and nonspatial working mem­
ory tasks measured by functional MRI. Cerebral Cortex, 6 (4), 600–611.

McCarthy, R. A., & Warrington, E. K. (1987). The double dissociation of short-term memo­
ry for lists and sentences. Evidence from aphasia. Brain, 110 (Pt 6), 1545–1563.

McClelland, J. L., McNaughton, B. L., & O’Reilly, R. C. (1995). Why there are complemen­
tary learning systems in the hippocampus and neocortex: Insights from the successes and
failures of connectionist models of learning and memory. Psychological Review, 102 (3),
419–457.

Mecklinger, A., Gruenewald, C., Besson, M., Magnie, M. N., & Von Cramon, D. Y. (2002).
Separable neuronal circuitries for manipulable and non-manipulable objects in working
memory. Cerebral Cortex, 12 (11), 1115–1123.

Miller, B. T., & D’Esposito, M. (2005). Searching for “the top” in top-down control. Neuron,
48 (4), 535–538.

Miller, E. K. (2000). The prefrontal cortex: No simple matter. NeuroImage, 11 (5 Pt 1), 447–450.

Miller, E. K., & Cohen, J. D. (2001). An integrative theory of prefrontal cortex function. An­
nual Review of Neuroscience, 24, 167–202.

Miyake, A., Friedman, N. P., Emerson, M. J., Witzki, A. H., Howerter, A., & Wager, T. D.
(2000). The unity and diversity of executive functions and their contributions to complex
“frontal lobe” tasks: A latent variable analysis. Cognitive Psychology, 41 (1), 49–100.

Morrison, A. B., & Chein, J. M. (2011). Does working memory training work? The promise
and challenges of enhancing cognition by training working memory. Psychonomic Bulletin
and Review, 18 (1), 46–60.

Munk, M. H., Linden, D. E., Muckli, L., Lanfermann, H., Zanella, F. E., Singer, W., et al.
(2002). Distributed cortical systems in visual short-term memory revealed by event-relat­
ed functional magnetic resonance imaging. Cerebral Cortex, 12 (8), 866–876.

Munneke, J., Belopolsky, A. V., & Theeuwes, J. (2012). Shifting attention within memory representations involves early visual areas. PLoS One, 7 (4), e35528.

Niki, H. (1974). Differential activity of prefrontal units during right and left delayed re­
sponse trials. Brain Research, 70 (2), 346–349.

Niki, H., & Watanabe, M. (1976). Prefrontal unit activity and delayed response: Relation
to cue location versus direction of response. Brain Research, 105 (1), 79–88.

Nystrom, L. E., Braver, T. S., Sabb, F. W., Delgado, M. R., Noll, D. C., & Cohen, J. D. (2000). Working memory for letters, shapes, and locations: fMRI evidence against stimulus-based regional organization in human prefrontal cortex. NeuroImage, 11 (5 Pt 1), 424–446.

O’Reilly, R. C., & Norman, K. A. (2002). Hippocampal and neocortical contributions to memory: Advances in the complementary learning systems framework. Trends in Cognitive Sciences, 6 (12), 505–510.

Oberauer, K., Schulze, R., Wilhelm, O., & Suss, H. M. (2005). Working memory and intelligence—their correlation and their relation: Comment on Ackerman, Beier, and Boyle (2005). Psychological Bulletin, 131 (1), 61–65; author reply 72–75.

Olsen, R. K., Nichols, E. A., Chen, J., Hunt, J. F., Glover, G. H., Gabrieli, J. D., et al. (2009).
Performance-related sustained and anticipatory activity in human medial temporal lobe
during delayed match-to-sample. Journal of Neuroscience, 29 (38), 11880–11890.

Olson, I. R., Page, K., Moore, K. S., Chatterjee, A., & Verfaellie, M. (2006). Working memo­
ry for conjunctions relies on the medial temporal lobe. Journal of Neuroscience, 26 (17),
4596–4601.

Owen, A. M., Hampshire, A., Grahn, J. A., Stenton, R., Dajani, S., Burns, A. S., Howard, R. J., & Ballard, C. G. (2010). Putting brain training to the test. Nature, 465, 775–778.

Pashler, H. (1988). Familiarity and visual change detection. Perception and Psychophysics, 44 (4), 369–378.

Passingham, R. E. (1985). Memory of monkeys (Macaca mulatta) with lesions in prefrontal cortex. Behavioral Neuroscience, 99 (1), 3–21.

Paulesu, E., Frith, C. D., & Frackowiak, R. S. (1993). The neural correlates of the verbal
component of working memory. Nature, 362 (6418), 342–345.

Petrides, M., Alivisatos, B., Evans, A. C., & Meyer, E. (1993). Dissociation of human mid-dorsolateral from posterior dorsolateral frontal cortex in memory processing. Proceedings of the National Academy of Sciences U S A, 90 (3), 873–877.

Poldrack, R. A., Wagner, A. D., Prull, M. W., Desmond, J. E., Glover, G. H., & Gabrieli, J. D.
(1999). Functional specialization for semantic and phonological processing in the left in­
ferior prefrontal cortex. NeuroImage, 10 (1), 15–35.

Posner, M. I., Petersen, S. E., Fox, P. T., & Raichle, M. E. (1988). Localization of cognitive
operations in the human brain. Science, 240 (4859), 1627–1631.

Postle, B. R. (2006b). Working memory as an emergent property of the mind and brain.
Neuroscience, 139 (1), 23–38.

Postle, B. R., Berger, J. S., & D’Esposito, M. (1999). Functional neuroanatomical double dissociation of mnemonic and executive control processes contributing to working memory performance. Proceedings of the National Academy of Sciences U S A, 96 (22), 12959–12964.

Postle, B. R., Berger, J. S., Taich, A. M., & D’Esposito, M. (2000). Activity in human frontal
cortex associated with spatial working memory and saccadic behavior. Journal of Cogni­
tive Neuroscience, 12 (Suppl 2), 2–14.

Postle, B. R., & D’Esposito, M. (1999). “What”-then-“Where” in visual working memory: An event-related fMRI study. Journal of Cognitive Neuroscience, 11 (6), 585–597.

Postle, B. R., D’Esposito, M., & Corkin, S. (2005). Effects of verbal and nonverbal interfer­
ence on spatial and object visual working memory. Memory and Cognition, 33 (2), 203–
212.

Postle, B. R., Druzgal, T. J., & D’Esposito, M. (2003). Seeking the neural substrates of vi­
sual working memory storage. Cortex, 39 (4-5), 927–946.

Postman, L., & Phillips, L. W. (1965). Short-term temporal changes in free recall. Quarterly Journal of Experimental Psychology, 17, 132–138.

Prabhakaran, V., Narayanan, K., Zhao, Z., & Gabrieli, J. D. (2000). Integration of diverse
information in working memory within the frontal lobe. Nature Neuroscience, 3 (1), 85–
90.

Quintana, J., Yajeya, J., & Fuster, J. M. (1988). Prefrontal representation of stimulus attrib­
utes during delay tasks. I. Unit activity in cross-temporal integration of sensory and sen­
sory-motor information. Brain Research, 474 (2), 211–221.

Rama, P., Poremba, A., Sala, J. B., Yee, L., Malloy, M., Mishkin, M., et al. (2004). Dissocia­
ble functional cortical topographies for working memory maintenance of voice identity
and location. Cerebral Cortex, 14 (7), 768–780.

Rama, P., Sala, J. B., Gillen, J. S., Pekar, J. J., & Courtney, S. M. (2001). Dissociation of the
neural systems for working memory maintenance of verbal and nonspatial visual informa­
tion. Cognitive, Affective, and Behavioral Neuroscience, 1 (2), 161–171.

Ranganath, C., Cohen, M. X., Dam, C., & D’Esposito, M. (2004). Inferior temporal, pre­
frontal, and hippocampal contributions to visual working memory maintenance and asso­
ciative memory retrieval. Journal of Neuroscience, 24 (16), 3917–3925.

Ranganath, C., & D’Esposito, M. (2001). Medial temporal lobe activity associated with ac­
tive maintenance of novel information. Neuron, 31 (5), 865–873.

Rauschecker, J. P., & Scott, S. K. (2009). Maps and streams in the auditory cortex: nonhu­
man primates illuminate human speech processing. Nature Neuroscience, 12 (6), 718–
724.

Rauschecker, J. P., & Tian, B. (2000). Mechanisms and streams for processing of “what”
and “where” in auditory cortex. Proceedings of the National Academy of Sciences U S A,
97 (22), 11800–11806.

Raye, C. L., Johnson, M. K., Mitchell, K. J., Reeder, J. A., & Greene, E. J. (2002). Neu­
roimaging a single thought: Dorsolateral PFC activity associated with refreshing just-acti­
vated information. NeuroImage, 15 (2), 447–453.

Risberg, J., & Ingvar, D. H. (1973). Patterns of activation in the grey matter of the domi­
nant hemisphere during memorizing and reasoning: A study of regional cerebral blood
flow changes during psychological testing in a group of neurologically normal patients.
Brain, 96 (4), 737–756.

Robbins, T. W. (2007). Shifting and stopping: Fronto-striatal substrates, neurochemical modulation and clinical implications. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 362 (1481), 917–932.

Romanski, L. M. (2004). Domain specificity in the primate prefrontal cortex. Cognitive, Af­
fective, and Behavioral Neuroscience, 4 (4), 421–429.

Romanski, L. M., Tian, B., Fritz, J., Mishkin, M., Goldman-Rakic, P. S., & Rauschecker, J. P.
(1999). Dual streams of auditory afferents target multiple domains in the primate pre­
frontal cortex. Nature Neuroscience, 2 (12), 1131–1136.

Rose, N. S., Olsen, R. K., Craik, F. I., & Rosenbaum, R. S. (2012). Working memory and
amnesia: the role of stimulus novelty. Neuropsychologia, 50 (1), 11–18.

Rowe, J. B., Toni, I., Josephs, O., Frackowiak, R. S., & Passingham, R. E. (2000). The pre­
frontal cortex: response selection or maintenance within working memory? Science, 288
(5471), 1656–1660.

Sala, J. B., Rama, P., & Courtney, S. M. (2003). Functional topography of a distributed
neural system for spatial and nonspatial information maintenance in working memory.
Neuropsychologia, 41 (3), 341–356.

Salame, P., & Baddeley, A. D. (1982). Disruption of short-term memory by unattended speech: Implications for the structure of working memory. Journal of Verbal Learning and Verbal Behavior, 21, 150–164.

Salmon, E., Van der Linden, M., Collette, F., Delfiore, G., Maquet, P., Degueldre, C., et al.
(1996). Regional brain activity during working memory tasks. Brain, 119 (Pt 5), 1617–
1625.

Schumacher, E. H., Lauber, E., Awh, E., Jonides, J., Smith, E. E., & Koeppe, R. A. (1996).
PET evidence for an amodal verbal working memory system. NeuroImage, 3 (2), 79–88.

Scoville, W. B., & Milner, B. (1957). Loss of recent memory after bilateral hippocampal le­
sions. Journal of Neurology, Neurosurgery, and Psychiatry, 20 (1), 11–21.

Shallice, T. (1982). Specific impairments of planning. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 298 (1089), 199–209.

Shallice, T., & Butterworth, B. (1977). Short-term memory impairment and spontaneous
speech. Neuropsychologia, 15 (6), 729–735.

Shallice, T., & Vallar, G. (1990). The impairment of auditory-verbal short-term storage. In
G. Vallar & T. Shallice (Eds.), Neuropsychological impairments of short-term memory (pp.
11–53). Cambridge, UK: Cambridge University Press.

Shallice, T., & Warrington, E. K. (1970). Independent functioning of verbal memory stores: A neuropsychological study. Quarterly Journal of Experimental Psychology, 22 (2), 261–273.

Shallice, T., & Warrington, E. K. (1977). Auditory-verbal short-term-memory impairment and conduction aphasia. Brain and Language, 4 (4), 479–491.

Shimamura, A. P. (2000). Toward a cognitive neuroscience of metacognition. Consciousness and Cognition, 9 (2 Pt 1), 313–323; discussion 324–326.

Shipstead, Z., Redick, T. S., & Engle, R. W. (2012). Is working memory training effective? Psychological Bulletin, 138 (4), 628–654.

Smith, E. E., & Jonides, J. (1999). Storage and executive processes in the frontal lobes.
Science, 283 (5408), 1657–1661.

Smith, E. E., Jonides, J., Koeppe, R. A., Awh, E., Schumacher, E. H., & Minoshima, S.
(1995). Spatial versus object working-memory—PET investigations. Journal of Cognitive
Neuroscience, 7 (3), 337–356.

Smith, E. E., Jonides, J., Marshuetz, C., & Koeppe, R. A. (1998). Components of verbal
working memory: evidence from neuroimaging. Proceedings of the National Academy of
Sciences U S A, 95 (3), 876–882.

Srimal, R., & Curtis, C. E. (2008). Persistent neural activity during the maintenance of
spatial position in working memory. NeuroImage, 39 (1), 455–468.

Stevens, A. A. (2004). Dissociating the cortical basis of memory for voices, words and
tones. Brain Research. Cognitive Brain Research, 18 (2), 162–171.

Stuss, D. T., & Alexander, M. P. (2007). Is there a dysexecutive syndrome? Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 362 (1481), 901–915.

Thompson-Schill, S. L., D’Esposito, M., Aguirre, G. K., & Farah, M. J. (1997). Role of left
inferior prefrontal cortex in retrieval of semantic knowledge: A reevaluation. Proceedings
of the National Academy of Sciences U S A, 94 (26), 14792–14797.

Thompson-Schill, S. L., Jonides, J., Marshuetz, C., Smith, E. E., D’Esposito, M., Kan, I. P.,
et al. (2002). Effects of frontal lobe damage on interference effects in working memory.
Cognitive, Affective, and Behavioral Neuroscience, 2 (2), 109–120.

Tian, B., Reser, D., Durham, A., Kustov, A., & Rauschecker, J. P. (2001). Functional special­
ization in rhesus monkey auditory cortex. Science, 292 (5515), 290–293.

Todd, J. J., & Marois, R. (2004). Capacity limit of visual short-term memory in human pos­
terior parietal cortex. Nature, 428 (6984), 751–754.

Tomita, H., Ohbayashi, M., Nakahara, K., Hasegawa, I., & Miyashita, Y. (1999). Top-down
signal from prefrontal cortex in executive control of memory retrieval. Nature, 401
(6754), 699–703.

Ungerleider, L., & Mishkin, M. (1982). Two cortical visual systems. In D. Ingle, M. A. Goodale, & R. J. W. Mansfield (Eds.), Analysis of visual behavior (pp. 549–586). Cambridge, MA: MIT Press.

Vallar, G., & Baddeley, A. (1984). Fractionation of working memory: Neuropsychological evidence for a phonological short-term store. Journal of Verbal Learning and Verbal Behavior, 23, 151–161.

Vogel, E. K., & Machizawa, M. G. (2004). Neural activity predicts individual differences in
visual working memory capacity. Nature, 428 (6984), 748–751.

Vogel, E. K., McCollough, A. W., & Machizawa, M. G. (2005). Neural measures reveal indi­
vidual differences in controlling access to working memory. Nature, 438 (7067), 500–503.

Wager, T. D., & Smith, E. E. (2003). Neuroimaging studies of working memory: A meta-
analysis. Cognitive, Affective, and Behavioral Neuroscience, 3 (4), 255–274.

Wagner, A. D., Pare-Blagoev, E. J., Clark, J., & Poldrack, R. A. (2001). Recovering meaning:
Left prefrontal cortex guides controlled semantic retrieval. Neuron, 31 (2), 329–338.

Walter, H., Wunderlich, A. P., Blankenhorn, M., Schafer, S., Tomczak, R., Spitzer, M., et al.
(2003). No hypofrontality, but absence of prefrontal lateralization comparing verbal and
spatial working memory in schizophrenia. Schizophrenia Research, 61 (2-3), 175–184.

Warrington, E., & Shallice, T. (1969). Selective impairment of auditory verbal short-term memory. Brain, 92, 885–896.

Waugh, N. C., & Norman, D. A. (1965). Primary memory. Psychological Review, 72, 89–
104.

Wilson, F. A., Scalaidhe, S. P., & Goldman-Rakic, P. S. (1993). Dissociation of object and
spatial processing domains in primate prefrontal cortex. Science, 260 (5116), 1955–1958.

Xu, Y., & Chun, M. M. (2006). Dissociable neural mechanisms supporting visual short-
term memory for objects. Nature, 440 (7080), 91–95.

Yi, D. J., Woodman, G. F., Widders, D., Marois, R., & Chun, M. M. (2004). Neural fate of ig­
nored stimuli: Dissociable effects of perceptual and working memory load. Nature Neuro­
science, 7 (9), 992–996.

Zarahn, E., Aguirre, G., & D’Esposito, M. (1997). A trial-based experimental design for
fMRI. NeuroImage, 6 (2), 122–138.

Zurowski, B., Gostomzyk, J., Gron, G., Weller, R., Schirrmeister, H., Neumeier, B., et al.
(2002). Dissociating a common working memory network from different neural substrates
of phonological and spatial stimulus processing. NeuroImage, 15 (1), 45–57.

Bradley R. Buchsbaum

Bradley R. Buchsbaum, Rotman Research Institute, Baycrest Centre, Toronto, Ontario, Canada

Mark D'Esposito

Mark D’Esposito is Professor of Neuroscience and Psychology, and Director of the Henry H. Wheeler, Jr. Brain Imaging Center at the Helen Wills Neuroscience Institute at the University of California, Berkeley. He is also Director of the Neurorehabilitation Unit at the Northern California VA Health Care System and Adjunct Professor of Neurology at UCSF.

Motor Skill Learning  


Rachael Seidler, Bryan L. Benson, Nathaniel B. Boyden, and Youngbin Kwak
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn

Print Publication Date: Dec 2013 Subject: Psychology, Cognitive Neuroscience


Online Publication Date: Dec 2013 DOI: 10.1093/oxfordhb/9780199988693.013.0020

Abstract and Keywords

Early neuroscience experiments on skill learning focused predominantly on motor cortical
plasticity. More recently, experiments have shown that cognitive processes such as work­
ing memory and error detection are engaged in the service of motor skill learning, partic­
ularly early in the learning process. This engagement early in learning maps onto pre­
frontal and striatal brain regions. As learning progresses, skill performance becomes au­
tomated and is no longer associated with prefrontal cortical recruitment. “Choking under
pressure” may involve a return to early learning cognitive control mechanisms, resulting
in disruption of highly learned skills. Conversely, analogy learning may speed skill acqui­
sition by allowing learners to bypass the more cognitively demanding early stages. Ques­
tions for the future include the relative involvement of and interaction between implicit
and explicit memory systems for different types of skill learning, and the impact of experi­
ential and genetic individual differences on learning success.

Keywords: skill learning, working memory, error, implicit, explicit

Introduction
Do you aspire to take up the piano? To improve your tennis or golf game? Or, perhaps you
would like to maintain your motor abilities in the face of injury, growth, advancing age, or
disease? Skill learning underlies our capacity for such adaptive motor behaviors. Re­
search over the past 20 years has provided great insight into the dynamic neural process­
es underlying human motor skill acquisition, focusing primarily on brain networks that
are engaged during early versus late stages of learning. What has been challenging for
the field is to tightly link these shifting neural processes with what is known about mea­
sureable behavioral changes and strategic processes that occur during learning. Because
individuals learn at different rates and often adopt different strategies, it is difficult to
characterize the dynamics of evolving motor behaviors. Moreover, skill learning is often
implicit, so verbal reports about the learning process are imprecise and unreliable. Here,
we review our current understanding of skill learning from a cognitive neuroscience
perspective, with a particular emphasis on linking the cognitive (i.e., which strategies and
behavioral processes are relied on for skill learning) with the neuroscience (i.e., which
neural networks underlie these processes, and where motor memories are stored in the
brain). Researchers in different disciplines have employed varying approaches to these
two topics with relatively little crosstalk. For example, those working from the behavioral
approach have been focused on questions such as whether implicit or explicit memory
systems are engaged during early and late learning; those working from the computation­
al neuroscience approach have modeled fast and slow processes of learning; and those
working from the cognitive neuroscience approach have identified large-scale shifts in
brain networks that are engaged (p. 417) during early versus late learning. Here, we en­
deavor to bring together these predominantly autonomous schools of thought. Given the
constraints of this chapter, we focus on learning from overt practice and do not address
other aspects of the field such as sleep-dependent consolidation of learning, transfer of
learning, mental practice, and learning by observing others. Moreover, these topics have
been recently reviewed elsewhere (Garrison, Winstein, & Aziz-Zadeh, 2010; Krakauer &
Shadmehr, 2006; Robertson, Pascual-Leone, & Miall, 2004; Seidler, 2010).

Researchers studying skill acquisition have classified learning into at least two broad cat­
egories: sensorimotor adaptation and sequence learning (Doyon & Benali, 2005; Willing­
ham, 1998). In sensorimotor adaptation paradigms, participants modify movements to ad­
just to changes in either sensory input or motor output characteristics. A real-world ex­
ample is learning to drive a new car: The magnitude of vehicle movement in response to
the amount of wheel turn and accelerator depression varies across vehicles. Thus, the dri­
ver must learn the new mapping between his or her actions and the resulting vehicle
movements. Another example of sensorimotor adaptation is learning how the size and
speed of hand movements of a mouse correspond to cursor movements on a computer
display screen. The study of motor performance under transformed spatial mappings
spans over 100 years. Helmholtz (1867) originally used prisms to invert the visual world,
whereas more recent investigations make use of computer displays to alter visual feed­
back of movement (Cunningham, 1989; Ghilardi, Gordon, & Ghez, 1995; Krakauer, Pine,
Ghilardi, & Ghez, 2000; Seidler, Noll, & Chintalapati, 2006). These studies have demon­
strated that sensorimotor adaptation occurs when movements are actively made in the
new environment.

Motor sequence learning refers to the progressive association between isolated elements
of movement, eventually allowing for a multi-element sequence to be performed quickly.
A real-world example is a gymnast learning a new tumbling routine. The gymnast already
knows how to perform the individual skills in isolation, but must practice to combine
them with seamless transitions.

Behavioral Models of Skill Learning


Acquiring a skill requires practice and repetition. The father of modern psychology,
William James, understood this when he wrote: “We must make automatic and habitual,
as early as possible, as many useful actions as we can” (James, 1890). Even well before
James, the ancient Greeks knew that repetitive action brought about physiological and
behavioral changes in an organism. In his treatise, On the Motion of Animals (1908/330
BC), Aristotle noted that “the body must straightaway be moved and changed with the
changes that nature makes dependent upon one another.”

It is now more than 2,000 years since Aristotle first philosophized on the changes associ­
ated with the movement of organisms. How much, or how little, have recent advances in
cognitive neuroscience changed the way we think about skill learning? What theories and
ideas do we still hold dear and which have we discarded? This section of the chapter will
highlight a few influential models of the processes underlying skill learning that have
been offered in the past 50 years or so.

Fitts’ and Posner’s Stage Model

In 1967 Paul Fitts and Michael Posner proposed an influential model for understanding
motor skill acquisition (Fitts & Posner, 1967). In their model, the learning of a movement
progresses through three interrelated phases: the cognitive phase, the associative phase,
and the autonomous phase. One utility of this model is that it applies well to both motor
and cognitive skill acquisition.

Figure 20.1 The shift from early to late learning processes that occurs as a function of skill acquisition. Choking under pressure has been linked to a regression back to the cognitively demanding early stages of learning, whereas analogy learning is thought to accelerate performance improvements through bypassing this stage.

In the cognitive stage, learners must use their attentional resources to break down a de­
sired skill into discrete components. This involves creating a mental picture of the skill,
which helps to facilitate an understanding of how these parts come together to form
correct execution of the desired movement. Performance at this stage might be defined as
conscious incompetence; individuals are cognizant of the various components of the task,
yet cannot perform them efficiently and effectively. The next phase of the model, the asso­
ciative phase, requires repeated practice and the use of feedback to link the component
parts into a smooth action. The ability to distinguish important from unimportant stimuli
is central to this stage of the model. For example, if a baseball player is learning to detect
and hit a curveball, attention should be paid to the most telling stimuli, such as the angle
of the wrist on the pitcher’s throwing hand, not the height of the pitcher’s leg-kick. The
last stage of the model, the autonomous stage, involves development of the learned skill
so that it becomes habitual and automatic. Individuals at this stage rely on processes that
require little or no conscious attention. Performance may be defined as unconscious com­
petence, and is (p. 418) reliant on experience and stored knowledge easily accessible for
the execution of the motor skill (Figure 20.1). As will be seen throughout this chapter, this
model has had an enduring impact on the field of skill learning.

Closed-Loop and Open-Loop Control

Another influential theory from the information processing perspective is Adams’ (1971)
closed-loop theory of skill acquisition. In this theory, two independent memory represen­
tations, a memory trace and a perceptual trace, serve to facilitate the learning of self-
paced pointing movements. Selection and initiation of movement are the responsibility of
the memory trace, whose strength is seen as a function of the stimulus–response contigui­
ty. With practice the strength of the memory trace is increased (Newell, 1991). The per­
ceptual trace relies on feedback, both internal and external, to create an image of the cor­
rectness of the desired skill. Motor learning then takes place through the process of de­
tecting errors and inconsistencies between the perceptual trace and memory trace
(Adams, 1971). Repetition, feedback, and refinement serve to produce a set of associated
sensory representations and movement units that can be called on, depending on the req­
uisite skill. Interestingly, both internal and external feedback loops are incorporated into
more modern skill acquisition theories as well.

Although Adams’ theory led to numerous empirical examinations of two-state memory
representations of motor learning, there are theoretical problems associated with the
closed-loop theory. One such problem concerns novel movements. If movement units are
stored with a memory trace and a perceptual trace in a one-to-one matching, then how
does a learner generate new movements? Schmidt’s (1975) schema theory sought to deal
with this problem by allowing for recognition mechanisms of movement to be generalized
and mapped in a one-to-many fashion. Schema theory proposes that learning motor skills
results in the construction of “generalized motor programs” dependent on the relation­
ship between variables rather than the absolute instantiations of the variable themselves
(Newell, 1991). Thus, representations of individual actions themselves are not stored.
Rather, abstract relationships or rules for motor programs are stored and called on
through associative stimuli and habit strengths and can be executed without delay (Sch­
midt, 1975).


Bayesian Models of Skill Learning

Many theories of skill learning describe processes of optimization. The oldest of these
theories is optimization of movement cost; that is, as learning progresses, performance of
the skill becomes not only more automatic, but also less effortful. For example, in the
case of runners, refinement and optimization of running form allows athletes to run at the
same speed with less effort, and consequently to run faster at maximal effort (Conley &
Krahenbuhl, 1980; Jones, 1998). In the case of Olympic weightlifters, learning complex
techniques allows them to lift heavier weights with less effort (Enoka, 1988).

This idea fits well with the theory of evolution: strategies that minimize energy consump­
tion should be favored. As a consequence, early studies of motor control in the framework
of optimization focused on the metabolic cost of movements, particularly gait (Atzler &
Herbst, 1927; Ralston, 1958). These studies demonstrated good accordance between the
metabolically optimal walking speed and preferred walking speed (Holt, Hamill, & An­
dres, 1991).

However, movements are not always optimized for lowest metabolic cost, even after ex­
tensive (p. 419) practice (Nelson, 1983). One possible reason for this is that there are sim­
ply too many degrees of freedom for the motor system to sample all possibilities. Further­
more, our environment is highly variable, our information about it is too limited, and our
movements and their associated feedback are too noisy.

In an attempt to explain how individuals adapt despite such variability, researchers have
suggested that the motor system relies on a Bayesian model to maximize the likelihood of
desired outcomes (Geisler, 1989; Gepshtein, Seydell, & Trommershauser, 2007; Körding &
Wolpert, 2004; Maloney & Mamassian, 2009; Seydell, McCann, Trommershauser, & Knill,
2008; Trommershäuser, Maloney, & Landy, 2008). Such approaches predict that move­
ment accuracy is stressed early in learning, which quickly eliminates many inefficient
movement patterns.
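
The core computation in these Bayesian accounts (e.g., Körding & Wolpert, 2004) is the fusion of a learned prior over perturbations with noisy sensory feedback: for Gaussian prior and likelihood, the posterior estimate is a precision-weighted average of the two. A minimal sketch in Python; the numerical values are illustrative assumptions, not taken from any study:

```python
def bayes_estimate(prior_mean, prior_var, obs, obs_var):
    """Posterior mean and variance for a Gaussian prior combined with a
    Gaussian observation: a precision-weighted average of the two sources."""
    w = prior_var / (prior_var + obs_var)      # weight given to the observation
    post_mean = prior_mean + w * (obs - prior_mean)
    post_var = prior_var * obs_var / (prior_var + obs_var)
    return post_mean, post_var

# Hypothetical trial: lateral cursor shifts are drawn from a prior centered
# at 1 cm (variance 0.25); today's blurred feedback suggests 2 cm (variance 1.0).
mean, var = bayes_estimate(1.0, 0.25, 2.0, 1.0)
# The estimate is pulled only partway toward the noisy observation:
# mean = 1.2, var = 0.2
```

Because the observation here is four times noisier than the prior, it receives only a 0.2 weight, mirroring the partial reliance on feedback reported in this literature.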

A model of Bayesian transfer, in which parameters from one task are used to more rapidly
learn a second, novel task, predicts that learning complex movements is achieved most
efficiently by first learning component, simple movements (Maloney & Mamassian, 2009).
This may seem like common sense: beginning piano students don’t start right away play­
ing concertos, but rather start with single notes, scales, and chords, and progress from
there. In the beginning, basic commands, known as motor primitives, are combined to
produce a simple movement such as a key press or a pen stroke (Nishimoto & Tani, 2009;
Paine & Tani, 2004; Polyakov, Drori, Ben-Shaul, Abeles, & Flash, 2009; Thoroughman &
Shadmehr, 2000). By learning simple movements first, the brain is able to simultaneously
form a model of the environment and its uncertainty, which transfers to other tasks (Mc­
Nitt-Gray, Requejo, & Flashner, 2006; Seydell et al., 2008).

The advent and widespread use of neuroimaging techniques have opened the door for in­
tegration of these and other psychological theories with evolving views of brain function.
Questions that have dominated the field of neuromotor control, and which continue to
take center stage today, include: What are the neural bases of the stages of learning?
How does their identification inform our knowledge regarding the underlying processes
of skill learning? Where in the brain are motor memories formed and stored? How gener­
alizable are these representations?

Cognitive Neuroscience of Skill Learning


The Beginning

Early neuroimaging studies often investigated motor tasks because they allow for simple
recording and tracking of overt behavioral responses (Grafton, Mazziotta, Woods, &
Phelps, 1992; Kim et al., 1993; Pascual-Leone, Brasil-Neto, Valls-Sole, Cohen, & Hallett,
1992; Pascual-Leone, Valls-Sole, et al., 1992). Additionally, the discovery that motor repre­
sentations were capable of exhibiting experience-dependent change even in the adult
brain was greeted with much interest and excitement. These earlier experiments identi­
fied prominent roles for the motor cortex, cerebellum, and striatum in skill learning, com­
plementing the results of earlier experiments conducted with neurological patients (Pas­
cual-Leone et al., 1993; Weiner, Hallett, & Funkenstein, 1983). Much of the early neu­
roimaging experiments of skill learning focused specifically on the motor cortex, identify­
ing an initial expansion of functional movement representations in primary motor cortex,
followed by retraction of these representations as movement automatization progressed
(Doyon, Owen, Petrides, Sziklas, & Evans, 1996; Grafton, Mazziotta, Presty, et al., 1992;
Grafton, Woods, Mazziotta, & Phelps, 1991; Jueptner, Frith, Brooks, Frackowiak, & Pass­
ingham, 1997; Jueptner, Stephan, et al., 1997; Karni et al., 1995, 1998; Pascual-Leone,
Grafman, & Hallett, 1994; Pascual-Leone & Torres, 1993).

It seems logical to begin studying motor learning processes by focusing on the motor cor­
tex. However, these and other subsequent studies also reported extensive activation of
“nonmotor” brain networks during skill learning as well, including engagement of dorso­
lateral prefrontal cortex, parietal cortex, anterior cingulate cortex, and associative re­
gions of the striatum (cf. Doyon & Benali, 2005). This work is extensively reviewed in the
current chapter, with a particular emphasis on the processes and functions of both non­
motor and motor networks that are engaged across the time course of learning.

Recent neuroimaging studies have tracked the time course of involvement of these addi­
tional brain structures. Typically, frontal-parietal systems are engaged early in learning
with a shift to activation in more “basic” motor cortical and subcortical structures later in
learning. This is taken as support that early learning is a more cognitively controlled
process, whereas later learning is more automatic. This view is supported by studies
showing dual-task interference during early learning (Eversheim & Bock, 2001; Taylor &
Thoroughman, 2007, 2008) and those reporting correlations between cognitive capacity
and the rate of early learning (Anguera, Reuter-Lorenz, Willingham, & Seidler, 2010; Bo,
Borza, & Seidler, 2009; Bo & Seidler, 2009). (p. 420) In the subsequent sections, we outline
evidence and provide reasoned speculation regarding precisely which cognitive
mechanisms are engaged during early skill learning.

Early Learning Processes

In this section, we describe what is known about the neurocognitive processes of skill
learning, emphasizing the early and late stages of learning. Let’s consider an example of
a novice soccer player learning to make a shot on goal. Initially, she may be rehearsing
her coach’s instructions in working memory (see later section, Role of Working Memory
in Skill Learning), reminding herself to keep her toe down and her ankle locked, and to
hit the ball with her instep. When her first shot goes wide, she will engage error detec­
tion and correction mechanisms (see next section, Error Detection and Correction) to ad­
just her aim for the next attempt, relying again on working memory to adjust motor com­
mands based on recent performance history. It seems intuitive, both from this example
and our own experiences, that these early learning processes are cognitively demanding.
As our budding soccer player progresses in skill, she can become less inwardly focused
on the mechanics of her own shot and start learning about more tactical aspects of the
sport. At this point, her motor performance has become fluid and automatized (see later
section, Late Learning Processes). When learning a new sport or skill, this transition from
“early” to “late” processing occurs not just once, but rather at multiple levels of compe­
tency as the learner progresses through more and more complex aspects of her or his do­
main of expertise (Schack & Mechsner, 2006; Wolpert & Flanagan, 2010). These stages of
learning are likely not dissociated by discrete transitions, but rather overlap in time.

Error Detection and Correction


Learning from errors is one of the basic principles of motor skill acquisition. Current
ideas about error-based learning stem from forward model control theories (Diedrichsen,
Shadmehr, & Ivry, 2009; Kawato, 1999; Shadmehr, Smith, & Krakauer, 2010; Wolpert &
Miall, 1996). When movement errors are detected by sensory systems, the information is
used to update the motor commands for subsequent actions. However, relying solely on
sensory feedback does not allow for efficient motor adjustments because of the time de­
lay between the initial motor command and the arrival of sensory feedback. Movement in­
duces continuous changes to state variables such as limb position and velocity. To allow
for accurate movement adjustments, the motor system relies on a forward model that
makes predictions of the sensory outcomes (i.e., changes in position and velocity) associ­
ated with a given motor command (Bastian, 2006; Flanagan, Vetter, Johansson, &
Wolpert, 2003). Differences between the predicted and actual sensory outcome serve as
the feedback error signal that updates forthcoming motor commands.

When learning a new motor skill such as moving a cursor on a computer screen, predic­
tion error becomes critical: New skills do not have enough of a motor history for an accu­
rate forward model, resulting in large prediction errors. In this case, the process of learn­
ing involves updating motor commands through multiple exposures to motor errors and
gradually reducing them by refining the forward model (Donchin, Francis, & Shadmehr,
2003; Shadmehr et al., 2010).
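
Trial-to-trial updating of this kind is often formalized as a linear state-space model (cf. Donchin, Francis, & Shadmehr, 2003), in which the internal estimate of the perturbation is nudged by a fraction of each trial's prediction error. A toy simulation under assumed values — the 30-degree rotation and 0.2 learning rate are arbitrary illustrative choices, not fitted parameters:

```python
def adapt(rotation_deg, learning_rate, n_trials):
    """Simulate trial-by-trial adaptation to a visuomotor rotation:
    the internal estimate x approaches the rotation as errors shrink."""
    x = 0.0                         # current internal estimate of the perturbation
    errors = []
    for _ in range(n_trials):
        error = rotation_deg - x    # prediction error experienced on this trial
        errors.append(error)
        x += learning_rate * error  # update the forward model by a fraction of the error
    return x, errors

x, errors = adapt(rotation_deg=30.0, learning_rate=0.2, n_trials=40)
# Errors decay exponentially (30, 24, 19.2, ...) and x converges toward 30.
```

The exponential decay of error produced by this update rule is the signature learning curve seen in visual-motor and force field adaptation experiments.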

The mechanisms of error-based learning are often studied using visual-motor adaptation
and force field adaptation tasks. Visual-motor adaptation involves distortion of the visual
consequences of movement, whereas force field adaptation affects the proprioceptive
consequences of motor commands by altering the dynamics of movement (Lalazar & Vaa­
dia, 2008; Shadmehr et al., 2010). Error processing under these two paradigms shows ex­
tensive neural overlap in the cerebellum, suggesting a common mechanism for error pro­
cessing and learning (Diedrichsen, Hashambhoy, Rane, & Shadmehr, 2005). Error pro­
cessing that contributes to learning is distinct from online movement corrections that
happen within a trial (Diedrichsen, Hashambhoy, et al., 2005; Gomi, 2008). Rather, learn­
ing is reflected in corrections made from one trial to the next, reflecting learning or up­
dating of motor representations.

Evidence suggests that the cerebellum provides the core mechanism underlying error-
based learning (Criscimagna-Hemminger, Bastian, & Shadmehr, 2010; Diedrichsen, Ver­
stynen, Lehman, & Ivry, 2005; Ito, 2002; Miall, Christensen, Cain, & Stanley, 2007; Miall,
Weir, Wolpert, & Stein, 1993; Ramnani, 2006; Tseng, Diedrichsen, Krakauer, Shadmehr, &
Bastian, 2007; Wolpert & Miall, 1996). Studies with cerebellar patients demonstrate that,
although these patients are able to make online motor adjustments, their performance
across trials does not improve (Bastian, 2006; Maschke, Gomez, Ebner, & Konczak, 2004;
Morton & Bastian, 2006; Smith & Shadmehr, 2005; Tseng et al., 2007). (p. 421) Neu­
roimaging studies also provide evidence for cerebellar contributions to error-based motor
skill learning (Diedrichsen, Verstynen, et al., 2005; Imamizu, Kuroda, Miyauchi, Yoshioka,
& Kawato, 2003; Imamizu, Kuroda, Yoshioka, & Kawato, 2004; Imamizu et al., 2000). It
should be noted that correcting errors within a trial does not seem to be a prerequisite to
learning, which is reflected as correcting errors from one trial to the next, or across-trial
corrections. This is evidenced by experiments showing that learning occurs even when
participants have insufficient time for making corrections. Thus it seems that just experi­
encing or detecting an error is sufficient to stimulate motor learning.

Neuroimaging studies provide evidence that brain regions other than the cerebellum may
also play a role in error-dependent learning such as the parietal cortex, striatum, and an­
terior cingulate cortex (Chapman et al., 2010; Clower et al., 1996; Danckert, Ferber, &
Goodale, 2008; den Ouden, Daunizeau, Roiser, Friston, & Stephan, 2010). Studies also
show that cerebellar patients can learn from errors when a perturbation is introduced
gradually, resulting in small errors (Criscimagna-Hemminger et al., 2010). In combination
these studies suggest that not all error-based learning relies on cerebellar networks.

Recent studies demonstrate a contribution of the anterior cingulate cortex (ACC) error
processing system to motor learning (Anguera, Reuter-Lorenz, Willingham, & Seidler,
2009; Anguera, Seidler, & Gehring, 2009; Danckert et al., 2008; Ferdinand, Mecklinger, &
Kray, 2008; Krigolson & Holroyd, 2006, 2007a, 2007b; Krigolson, Holroyd, Van Gyn, &
Heath, 2008). This prefrontal performance monitoring system has been studied extensively
by recording the error-related negativity (ERN), an event-related potential (ERP) component
that is locked to an erroneous response (Falkenstein, Hohnsbein, & Hoormann,
1995; Gehring, Coles, Meyer, & Donchin, 1995; Gehring, Goss, Coles, Meyer, & Donchin,
1993). The ERN has attracted a great deal of interest, both within the ERP research com­
munity and in cognitive neuroscience more generally. Much of this interest arose because
of evidence that the ERN is generated in the ACC, which is known to serve cognitive con­
trol functions that enable the brain to adapt behavior to changing task demands and envi­
ronmental circumstances (Botvinick, Braver, Barch, Carter, & Cohen, 2001; Ridderinkhof,
Ullsperger, Crone, & Nieuwenhuis, 2004). The presupplementary motor area (pre-SMA),
a region anatomically in the vicinity of the ACC, has also been proposed to play a role in
error processing (Hikosaka & Isoda, 2010; Isoda & Hikosaka, 2007). As opposed to the
ACC, however, it is thought that the pre-SMA corrects for movement errors in a proactive
manner (Isoda & Hikosaka, 2007).

In a series of studies by Krigolson and colleagues, an ERN from the medial frontal region
was found to be associated with motor tracking errors (Krigolson & Holroyd, 2006,
2007a, 2007b; Krigolson et al., 2008). Interestingly, the onset of the ERN occurred before
the tracking error, indicating that the medial frontal system began to detect the error
even before the error was fully committed (Krigolson & Holroyd, 2006). The authors sug­
gested that this might entail the medial frontal system predicting tracking errors by
adopting a predictive mode of control (Desmurget, Vindras, Grea, Viviani, & Grafton,
2000). They also found that target errors which allow online movement correction within
a trial did not elicit the medial frontal ERN, but rather resulted in a negative deflection in
the posterior parietal region (Krigolson & Holroyd, 2007a; Krigolson et al., 2008). These
results indicate that the contribution of the medial frontal ERN to motor error processing
is a distinct process potentially involving prediction error calibration (Krigolson & Hol­
royd, 2007a; Krigolson et al., 2008).

We recently tested whether the ERN was sensitive to the magnitude of error experienced
during visual-motor adaptation and found a larger ERN magnitude on trials in which larg­
er errors were made (Anguera, Seidler, et al., 2009). ERN magnitude also decreased from
the early to the late stages of learning. These results are in agreement with current theo­
ries of ERN and skill acquisition. For example, as the error detection theory proposes
(Falkenstein, Hohnsbein, Hoormann, & Blanke, 1991; Gehring et al., 1993), a greater
ERN associated with larger errors indicates that the brain was monitoring the disparity
between the predicted and actual movement outcomes (Anguera, Seidler, et al., 2009).

There is also evidence that error-based learning represented in the ACC
contributes to motor sequence learning (Berns, Cohen, & Mintun, 1997). Several studies
have shown that the N200 ERP component, which is known to be sensitive to a mismatch
between the expected and actual sensory stimuli, is enhanced for a stimulus that violates
a learned motor sequence (Eimer, Goschke, Schlaghecken, & Sturmer, 1996). When ERN
magnitudes were compared between explicit (p. 422) and implicit learners, a larger ERN
was found for the explicit learners, demonstrating greater involvement of the error moni­
toring system when actively searching for the regularity of a sequence (Russeler, Kuhlicke,
& Munte, 2003). A more recent study demonstrated a parametric increase in the
magnitude of the ERN during sequence learning as the awareness of the sequential na­
ture and the expectancy of the forthcoming sequential element increased (Ferdinand et
al., 2008).

The series of studies described above supports a role for the prefrontal ACC system in
motor error processing. The actual mechanism of how the ERN contributes to perfor­
mance improvements across trials during motor learning is not well understood, however.
Additionally, it remains unclear whether this system works independently of or in collabo­
ration with the cerebellar-based error processing system.

Role of Working Memory in Skill Learning

Working memory refers to the structures and processes used for temporarily storing and
manipulating information (Baddeley, 1986; Miyake & Shah, 1999). Dissociated processing
for spatial and verbal information was initially proposed by Baddeley and Hitch (1974).
Current views suggest that working memory may not be as process pure as once thought
(Cowan, 1995, 2005; Jonides et al., 2008), but the idea of separate modules for processing
different types of information still holds (Goldman-Rakic, 1987; Shah & Miyake, 1996;
Smith, Jonides, & Koeppe, 1996; Volle et al., 2008). Given that the early stages of motor
learning can be disrupted by the performance of secondary tasks (Eversheim & Bock,
2001; Remy, 2010; Taylor & Thoroughman, 2007, 2008), and the fact that similar pre­
frontal cortical regions are engaged early in motor learning as those relied on to perform
spatial working memory tasks (Jonides et al., 2008; Reuter-Lorenz et al., 2000), it is plau­
sible that learners engage spatial working memory processes to learn new motor skills.

Because variation exists in the number of items that individuals can hold and operate on
in working memory (cf. Vogel & Machizawa, 2004), it lends itself well to individual differ­
ences research approaches. In a recent study, our group investigated whether working memory
contributes to visual-motor adaptation: we administered a battery of neuropsychological
assessments to participants and then had them perform a manual visual-
motor adaptation task and a spatial working memory task during magnetic resonance
imaging (MRI). The results showed that performance on the card rotation task (Ekstrom,
1976), a measure of spatial working memory, correlated with the rate of early, but not
late, learning on the visual-motor adaptation task across individuals (Anguera, Reuter-
Lorenz, Willingham, & Seidler, 2010). There were no correlations between verbal working
memory measures and either early or late learning. Moreover, the neural correlates of
early adaptation overlapped with those that participants engaged when performing a spa­
tial working memory task, notably in the right dorsolateral prefrontal cortex and in the bi­
lateral inferior parietal lobules. There was no neural overlap between late adaptation and
spatial working memory. These data demonstrate that early, but not late, learning en­
gages spatial working memory processes (Anguera, Reuter-Lorenz, Willingham, & Sei­
dler, 2010).
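
Capacity in the change-detection tasks used for such individual-differences analyses (cf. Vogel & Machizawa, 2004) is typically scored with Cowan's K formula, K = set size × (hit rate − false-alarm rate). A brief sketch; the participant's rates below are invented for illustration:

```python
def cowan_k(set_size, hit_rate, false_alarm_rate):
    """Estimate visual working memory capacity from change-detection
    performance using Cowan's K: K = S * (H - FA)."""
    return set_size * (hit_rate - false_alarm_rate)

# Hypothetical participant tested with 4-item arrays:
k = cowan_k(set_size=4, hit_rate=0.85, false_alarm_rate=0.15)
# k = 2.8, within the 2-4 item range typically reported in this literature
```

It is these K estimates, varying across individuals, that can then be correlated with early versus late learning rates.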

Page 10 of 38
Motor Skill Learning

Despite recent assertions that visual-motor adaptation is largely implicit (Mazzoni &
Krakauer, 2006), these findings (Anguera, Reuter-Lorenz, Willingham, & Seidler, 2010)
are consistent with the hypothesis that spatial working memory processes are involved in
the early stages of acquiring new visual-motor mappings. As described in detail below,
whether a task is learned implicitly (subconsciously) or explicitly appears to be a separate
issue from the relative cognitive demands of a task. For example, adaptation to a small
and gradual visual-motor perturbation, which typically occurs outside of the learner’s
awareness, is still subject to interference by performance of a secondary task (Galea, Sa­
mi, Albert, & Miall, 2010). This effect is reminiscent of Nissen and Bullemer’s (1987)
finding that implicitly acquiring a sequence of actions is attentionally demanding and can
be disrupted by secondary task performance. Thus we propose that spatial working mem­
ory can be engaged for learning new motor skills even when learning is implicit.

Models of motor sequence learning also propose that working memory plays an integral
role (cf. Ashe, Lungu, Basford, & Lu, 2006; Verwey, 1996, 2001). Studies that have used
repetitive transcranial magnetic stimulation (rTMS) to disrupt the dorsolateral prefrontal
cortex, a structure involved in working memory (Jonides et al., 1993), have shown im­
paired motor sequence learning (Pascual-Leone, Wassermann, Grafman, & Hallett, 1996;
Robertson, Tormos, Maeda, & Pascual-Leone, 2001). To evaluate whether working memo­
ry plays a role in motor sequence learning, we determined whether individual differences
in visual-spatial working memory (p. 423) capacity affect the temporal organization of ac­
quired motor sequences. We had participants perform an explicit motor sequence learn­
ing task (i.e., they were explicitly informed about the sequence and instructed to learn it)
and a visual-spatial working memory task (Luck & Vogel, 1997). We found that working
memory capacity correlated with the motor sequence chunking pattern that individuals
developed; that is, individuals with a larger working memory capacity chunked more
items together when learning the motor sequence (Bo & Seidler, 2009). Moreover, these
individuals exhibited faster rates of learning. These results demonstrate that individual
differences in working memory capacity predict the temporal structure of acquired motor
sequences.
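Capacity in change-detection tasks of this kind is commonly summarized with Cowan's K, computed as set size times the difference between hit rate and false-alarm rate. A minimal sketch (the performance numbers below are made up for illustration):

```python
def cowan_k(set_size, hit_rate, false_alarm_rate):
    """Cowan's K estimate of visual working memory capacity
    from change-detection performance: K = N * (H - F)."""
    return set_size * (hit_rate - false_alarm_rate)

# Hypothetical observer: set size 4, 85% hits, 10% false alarms
k = cowan_k(4, 0.85, 0.10)  # about 3 items
```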

How might spatial working memory be used during the motor learning process? We pro­
pose that it is used for differing purposes depending on whether the learner is acquiring
a new sequence of actions or adapting to a novel sensorimotor environment. In the case
of sensorimotor adaptation, we suggest that error information from the preceding trial
(see above section) is maintained in spatial working memory and relied on when the
learner manipulates the sensorimotor map to generate a motor command that is appro­
priate for the new environment. When adaptation is in response to a rotation of the visual
display, this process likely involves the mental rotation component of spatial working
memory (Jordan, 2001; Logie, 2005). This interpretation agrees with Abeele and Bock’s
proposal that adaptation progresses in a gradual fashion across the learning period from
small angles of transformation through intermediate values until the prescribed angle of
rotation is reached (Abeele & Bock, 2001). Thus, the engagement of these spatial working memo­
ry resources late in adaptation is markedly diminished, compared with early adaptation,
when the new mapping has been formed and is in use. This notion is supported by electrophysiological data demonstrating an interaction between motor areas and a frontal-parietal network during motor adaptation (Wise, 1998). Dual-task studies of motor adaptation also support the proposed timeline.
adaptation support the appropriate timeline for this proposal as well. Taylor and Thor­
oughman (2007, 2008) have shown that sensorimotor adaptation is most affected when
attention is distracted by a secondary task imposed late in the trial, when error informa­
tion becomes available. These authors suggest that cognitive resources are engaged be­
tween trials so that error information can be integrated to update visual-motor maps for
the subsequent trial because a secondary task performed early in the trial did not pro­
duce interference. Thus, it seems that spatial working memory is engaged in the service
of correcting motor errors across trials to improve performance over time. This is
anatomically plausible regardless of whether the cerebellar motor error system or the
ACC error detection and correction network (or both; see previous section) is relied on
for motor learning because both structures have connections to the lateral frontal-pari­
etal system that supports working memory and both structures have been reported to be
activated in the early stages of sensorimotor adaptation (Anguera, Reuter-Lorenz, Willing­
ham, & Seidler, 2010; Imamizu et al., 2000).

Many motor sequence learning paradigms result in learning with few or no errors be­
cause movements are cued element by element, as in the popular serial reaction time
task. Trial and error sequence learning is an exception; in this case spatial working mem­
ory likely plays a key role in maintaining and inhibiting past erroneous responses (Hikosa­
ka et al., 1999). In the case of cued sequence learning, we propose that working memory
is relied on for chunking together of movement elements based on repetition, and their
transfer into long-term memory. Both implicit and explicit motor sequence learning para­
digms consistently result in activation of the frontal-parietal working memory system
(Ashe, Lungu, Basford, & Lu, 2006), consistent with this notion. Interestingly, dorsal pre­
motor cortex, which serves as a node in both working memory and motor execution net­
works, is the site where sensory information from working memory is thought to be con­
verted into motor commands for sequence execution (Ohbayashi, 2003).

Verwey (1996, 2001) also hypothesized a close relationship between working memory ca­
pacity and motor sequence learning. He proposed that participants rely on a cognitive
processor during sequence learning, which depends on “motor working memory” to allow
a certain number of sequence elements (i.e., a chunk) to be programmed in advance of
execution. At the same time, a motor processor is running in parallel to execute the ac­
tions so that the entire sequence can be performed efficiently. Interestingly, Ericsson and colleagues (1980) reported a case in which a participant with initially average
memory abilities increased his memory span from 7 to 79 digits with practice. This indi­
vidual learned to group chunks of digits together to form “supergroups,” which allowed
him to dramatically increase his digit span. Presumably the process is similar for those
acquiring long motor (p. 424) sequences as well, such as a musician memorizing an entire
piece of music or a dancer learning a sequence of moves for a performance.
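The chunking strategy described above can be illustrated with a trivial sketch: regrouping a flat sequence into larger units reduces the number of items that must be held in working memory (the digits and chunk size are arbitrary examples):

```python
def chunk(sequence, size):
    """Group a flat sequence into fixed-size chunks; the learner then
    holds a few higher-level units instead of many raw elements."""
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]

digits = list("194519621989")   # 12 raw items: beyond a typical span
years = chunk(digits, 4)        # 3 chunks, each recodable as a year
```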

Late Learning Processes

As individuals become more proficient in executing a task, striatal and cerebellar regions
become more active, whereas cortical regions become less active (Doyon & Benali, 2005;
Grafton, Mazziotta, Woods, et al., 1992). But the question of how and where motor memo­
ries are ultimately stored is not trivial to answer. Shifts in activation do not mean that
these “late learning” structures are involved in memory; they may instead merely medi­
ate performance of a well-learned set of movements. Also, the role of a specific area may
differ between types of motor tasks.

At the cellular level, the acquisition of motor memories as a modulation of synaptic con­
nections between neurons was first proposed by Ramon y Cajal (1894). Memories result
from the facilitation and selective elimination of neuronal pairings due to experience.
Hebb stated the associative nature of memories in more explicit terms: “two cells or sys­
tems that are repeatedly active at the same time will tend to become associated, so that
activity in one facilitates activity in the other” (1949, p. 70). This Hebbian plasticity mod­
el has been well supported in model organisms (see Abel & Lattal, 2001; Martin, Grim­
wood, & Morris, 2000), and it appears that similar or identical mechanisms are at play for
human motor memories (Donchin, Sawaki, Madupu, Cohen, & Shadmehr, 2002).
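Hebb's principle is often rendered computationally as a weight change proportional to the product of pre- and postsynaptic activity. A minimal sketch (the learning rate and activity patterns are illustrative, not drawn from the cited studies):

```python
import numpy as np

def hebbian_update(w, pre, post, lr=0.1):
    """Hebb's rule: delta_w = lr * post * pre, so a connection is
    strengthened only when both neurons are active together."""
    return w + lr * np.outer(post, pre)

# One output neuron, two inputs; only the first input is co-active
w = np.zeros((1, 2))
pre = np.array([1.0, 0.0])
post = np.array([1.0])
for _ in range(10):
    w = hebbian_update(w, pre, post)
# w[0, 0] has grown toward 1.0; w[0, 1] is unchanged at 0.0
```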

At the level of systems, motor memories appear to follow a “cascade” pattern (Krakauer
& Shadmehr, 2006): motor learning initially excites and rapidly induces associative plas­
ticity in one area, for example, primary motor cortex or cerebellar cortex. For a period of
hours, this area will remain excited, and the learning will be sensitive to disruption and
interference (Brashers-Krug, Shadmehr, & Bizzi, 1996; Stefan et al., 2006). With the pas­
sage of time, the activation decreases, and the learning becomes more stable and resis­
tant to interference (Shadmehr & Brashers-Krug, 1997). Note that this has been debated
(Caithness et al., 2004), but reducing the effects of anterograde interference, through ei­
ther intermittent practice (Overduin, Richardson, Lane, Bizzi, & Press, 2006) or washout
blocks (Krakauer, Ghez, & Ghilardi, 2005), demonstrates consolidation.

Memories for motor acts are stored hierarchically. This hierarchy appears to hold true
both for abstract representations of movements in declarative memory (Schack & Mech­
sner, 2006) as well as for the motor commands themselves (Grafton & Hamilton, 2007;
Krigolson & Holroyd, 2006; Paine & Tani, 2004; Thoroughman & Shadmehr, 2000; Ya­
mamoto & Fujinami, 2008). This storage method may reflect a fundamental limitation of
working memory or attention for the number of simultaneous elements that can be oper­
ated on at once. In other words, hierarchical representations are a form of chunking.
Analogy learning, discussed later in this chapter, may exploit this hierarchical framework
by providing a ready-made set of assembled motor primitives, bringing a learner more
quickly to a higher hierarchical level.

The search for the site of motor memory storage has mostly focused on areas known to be
active in late learning of well-practiced tasks. Classically, it was believed that primary mo­
tor cortex merely housed commands for simple trajectories, and was controlled by premo­
tor areas (Fulton, 1935). However, more recent electrophysiological work in primates has
shown that motor cortex stores representations of target locations (Carpenter, Geor­
gopoulos, & Pellizzer, 1999), motor sequences (Lu, 2005; Matsuzaka, Picard, & Strick,
2007), adaptations to a force field (Li, Padoa-Schioppa, & Bizzi, 2001), and a visual-motor
transformation (Paz, Boraud, Natan, Bergman, & Vaadia, 2003).

Whether and how other brain areas contribute to the storage of motor memories is less
clear. The cerebellum, commonly associated with motor tasks and motor learning, and ac­
tive in the performance of well-learned movements, has been identified as a likely candi­
date. However, its role is still debated. With regard to sequence learning, although it is
active in performance, the cerebellum is not necessary for learning or memory (Seidler et
al., 2002) unless movements are cued in a symbolic fashion (Bo, Peltier, Noll, & Seidler,
2011; Spencer & Ivry, 2008). The striatum has been extensively linked to acquisition and
storage of sequential representations (Debas et al., 2010; Jankowski, 2009; Seitz &
Roland, 1992), although a recent experiment provides a provocative challenge to this
viewpoint (Desmurget, 2010). In contrast, substantial evidence supports a role for the
cerebellum in housing internal models acquired during sensorimotor adaptation (Gray­
don, Friston, Thomas, Brooks, & Menon, 2005; Imamizu et al., 2000, 2003; Seidler & Noll,
2008; Werner, 2010).

Fast and Slow Learning

Computational neuroscience approaches have contributed much to our understanding of the time-varying (p. 425) processes underlying motor learning, although these results await
strong integration into cognitive neuroscience views. Computational modeling work sup­
ports a role for error processing in skill learning by showing that performance change
across trials is related to the magnitude of errors that have been recently experienced
(Scheidt, Dingwell, & Mussa-Ivaldi, 2001; Thoroughman & Shadmehr, 2000). More re­
cently it has been demonstrated that a single state model (i.e., rate of learning is depen­
dent on error magnitude) does not account for all features of motor learning such as the
savings that occur when relearning a previously experienced task, and the fact that un­
learning occurs more quickly than learning. Work by Shadmehr and colleagues provides
evidence that two time-varying processes with differential responsiveness to errors and
varying retention profiles can account for these features (Huang & Shadmehr, 2009; Join­
er & Smith, 2008; Shadmehr et al., 2010; Smith, Ghazizadeh, & Shadmehr, 2006). These
authors propose that a fast learning system responds strongly to errors but does not re­
tain information well, whereas in contrast, a slow learning system responds weakly to er­
rors but exhibits better retention.
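A minimal simulation sketch of such a two-state model (in the spirit of Smith, Ghazizadeh, & Shadmehr, 2006; the retention factors A and error sensitivities B below are illustrative assumptions, not values fitted to data):

```python
import numpy as np

A_FAST, B_FAST = 0.59, 0.21   # fast process: poor retention, strong error response
A_SLOW, B_SLOW = 0.992, 0.02  # slow process: good retention, weak error response

def simulate(perturbation):
    """Two-state model: net adaptation is the sum of a fast and a slow
    state, each updated from the same trial-by-trial error."""
    x_fast = x_slow = 0.0
    net, slow = [], []
    for p in perturbation:
        error = p - (x_fast + x_slow)          # residual motor error
        x_fast = A_FAST * x_fast + B_FAST * error
        x_slow = A_SLOW * x_slow + B_SLOW * error
        net.append(x_fast + x_slow)
        slow.append(x_slow)
    return np.array(net), np.array(slow)

# 200 trials of a constant unit perturbation
net, slow = simulate(np.ones(200))

# With no further error feedback, each state simply decays by its A factor,
# so what survives a delay is essentially the slow state.
retained_fast = (net[-1] - slow[-1]) * A_FAST ** 50
retained_slow = slow[-1] * A_SLOW ** 50
```

Consistent with the behavioral findings described above, late in training most of the adaptation in this sketch is carried by the slow state, and it is the slow state, rather than the final performance level, that survives the simulated delay.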

Studies investigating ways to maximize the effects of the slow learning process provide
some interesting implications for enhancing retention of acquired skills. For example,
Joiner and Smith (2008) demonstrated that retention 24 hours after learning does not de­
pend on the level of performance that participants had attained at the end of learning,
which often is viewed as an indicator of the amount of learning that has taken place.
Rather, retention depends on the level that the slow learning process has reached at the
end of learning. Further, it seems that the fast and slow processes are not fixed but rather

Page 14 of 38
Motor Skill Learning

can be exploited to maximize retention (Huang & Shadmehr, 2009). When an adaptive
stimulus is introduced gradually, errors are small, and retention is better.

Explicit and Implicit Memory Systems in Skill Learning

Understanding the cognitive and neural mechanisms of motor skill learning requires tak­
ing into account the relative roles of the explicit and implicit learning and memory sys­
tems. The explicit learning and memory system refers to the neurocognitive process that
accompanies conscious awareness of learning and remembering, whereas the implicit
system refers to the same process without concomitant awareness (Cohen & Squire,
1980; Reber, 1967; Squire, 1992; Voss & Paller, 2008). Laboratory evaluations of explicit
memory have a specific reference to information learned earlier, such as recall of a previ­
ously learned list of words. Implicit memory is demonstrated by changes in performance
that are due to prior experience or practice and that may not be consciously remembered
(Voss & Paller, 2008).

The distinction between the two memory systems and the existence of parallel neural
mechanisms underlying each was initially evidenced through the study of neurological pa­
tients. Patients with lesions to the medial temporal lobe and in particular the hippocam­
pus (as in the case of the well-studied patient H.M.) show amnesia, which is the inability
to learn, store, and recollect new information consciously (Cohen & Squire, 1980; Scoville
& Milner, 1957). Motor learning and memory systems are spared in amnesic patients, as
shown by intact mirror drawing and rotary pursuit task learning (Brooks & Baddeley,
1976; Cavaco, Anderson, Allen, Castro-Caldas, & Damasio, 2004; Corkin, 1968). These
same implicit tasks are impaired in patients with diseases affecting the basal ganglia
(Gabrieli, Stebbins, Singh, Willingham, & Goetz, 1997; Heindel, Salmon, Shults, Walicke,
& Butters, 1989). Neuroimaging studies also demonstrate medial temporal lobe and hip­
pocampal activation during explicit learning (Luo & Niki, 2005; Montaldi et al., 1998;
Staresina & Davachi, 2009) and striatal involvement during implicit learning (Lieberman,
Chang, Chiao, Bookheimer, & Knowlton, 2004; Poldrack, Prabhakaran, Seger, & Gabrieli,
1999; Seger & Cincotta, 2006). The dorsolateral prefrontal cortex is also involved during
explicit learning (Barone & Joseph, 1989a, 1989b; Sakai et al., 1998; Toni, Krams, Turner,
& Passingham, 1998), whereas the cerebellum and the cortical motor areas are involved
during implicit learning (Ashe et al., 2006; Matsumura et al., 2004).

More recent evidence suggests that the explicit and implicit systems may affect and inter­
act with one another (Willingham, 2001). For example, performance on an implicit memo­
ry test can be influenced by explicit memory, and vice versa (Keane, Orlando, & Verfael­
lie, 2006; Kleider & Goldinger, 2004; Tunney & Fernie, 2007; Voss, Baym, & Paller, 2008).
Moreover, the hippocampus and striatum, traditionally associated with explicit and im­
plicit knowledge, respectively, have been shown to interact during probabilistic classifica­
tion learning (Sadeh, Shohamy, Levy, Reggev, & Maril, 2011; Shohamy & Wagner, 2008).
Studies also suggest (p. 426) that the two systems can compete with one another during
learning (Eichenbaum, Fagan, Mathews, & Cohen, 1988; Packard, Hirsh, & White, 1989;
Poldrack et al., 2001).

The interaction between the explicit and implicit learning and memory systems is also ev­
ident during motor skill learning. The fact that amnesic patients can still learn visual-mo­
tor tasks such as mirror drawing led many to think that explicit processes are not in­
volved in motor skill learning. As a consequence, motor skill learning has been predomi­
nantly viewed as part of the implicit learning system by researchers in the memory field
(Gabrieli, 1998; Henke, 2010; Squire, 1992). As explicated in this chapter, however, there
are both explicit and implicit forms of motor skill learning, and there is evidence that
these processes interact during the learning of new motor skills. Moreover, the heavy in­
volvement of cognitive processes such as error detection and working memory during the
early stages of skill learning supports the notion that motor skills can benefit from explic­
it instruction, at least early in the learning process. The following sections further discuss
the role of implicit and explicit processes in motor sequence learning and sensorimotor
adaptation.

Explicit and Implicit Processes in Sequence Learning

Implicit and explicit learning mechanisms have been studied extensively in motor se­
quence learning. There is still debate, however, as to whether the two systems interact
during learning. Behavioral experiments suggest that the two systems act in parallel without interference (Curran & Keele, 1993; Song, Howard, & Howard, 2007). Dis­
tinct neural mechanisms have also been reported for the two forms of sequence learning
(Destrebecqz et al., 2005; Honda et al., 1998; Karabanov et al., 2010).

On the other hand, there is also some evidence supporting the interaction of the two sys­
tems during sequence learning. For example, the medial temporal lobe and striatum,
known to serve explicit and implicit processes, respectively, have both been shown to be
engaged during both implicit and explicit sequence learning (Schendan, Searl, Melrose, &
Stern, 2003; Schneider et al., 2010; Wilkinson, Khan, & Jahanshahi, 2009). Several stud­
ies also show that explicit instructions interfere with implicit learning, supporting interac­
tions between the two systems (Boyd & Winstein, 2004; Green & Flowers, 1991; Reber,
1976). However, if explicit knowledge has been acquired through practice, it does not in­
terfere with implicit learning (Vidoni & Boyd, 2007). The interference effect between the
two systems during sequence learning has also been supported by neuroimaging data.
Destrebecqz and colleagues showed the expected activation in the striatum during implic­
it learning; during explicit learning, the ACC and medial prefrontal cortex were also ac­
tive, and this activation correlated negatively with striatal activation (Destrebecqz et al.,
2005). The authors interpreted this as suppression of the implicit learning system (i.e.,
the striatum) during explicit learning. Another study showed that the intention to learn a
sequence was associated with sustained right prefrontal cortex activation and an attenua­
tion of learning-related changes in the medial temporal lobe and thalamus, resulting in
the failure of implicit learning (Fletcher et al., 2005).

Some studies also emphasize that the implicit and explicit memory systems share infor­
mation. Destrebecqz and Cleeremans (2001) showed that participants were able to gener­
ate an implicitly learned sequence when prompted, implying the existence of explicitly ac­
quired sequence knowledge. An overlap of neural activation patterns between the two
learning systems including the striatum has also been shown in neuroimaging studies of
simultaneous explicit and implicit sequence learning (Aizenstein et al., 2004; Willingham,
Salidis, & Gabrieli, 2002).

A recent study demonstrates that the two memory systems are combined during the
learning process (Ghilardi, Moisello, Silvestri, Ghez, & Krakauer, 2009). In this study, the
authors report that the implicit and explicit systems consolidate differently and are differ­
entially sensitive to interference, with explicit learning being more sensitive to antero­
grade and implicit more sensitive to retrograde interference. They proposed that the ex­
plicit acquisition of sequential order knowledge and the implicit acquisition of accuracy
can be combined for successful learning.

Explicit and Implicit Processes in Sensorimotor Adaptation

Whether and how implicit and explicit processes play a role in sensorimotor adaptation is
less well understood. It seems clear that cognitive processes contribute to the early
stages of adaptive learning, as outlined above. However, cognitively demanding tasks do
not necessitate reliance on explicit processes (cf. Galea et al., 2010; Nissen & Bullemer,
1987). It has been shown that participants who gain explicit awareness by the end of
adaptation exhibit learning (p. 427) advantages (Hwang, Smith, & Shadmehr, 2006; Wern­
er & Bock, 2007). However, because participants were polled after the experimental pro­
cedure, it is not clear whether explicit processes aided learning per se, or arose as a re­
sult of the transformation having become well learned. Malone and Bastian (2010) recently demonstrated that participants instructed how to consciously correct er­
rors during adaptation learned faster than those given no instructions, and participants
performing a secondary distractor task learned even more slowly.

Mazzoni and Krakauer (2006) also provided participants with an explicit strategy to
counter an imposed rotation. Their surprising result was that implicit adaptation to the rotation proceeded despite the strategy, even though this adaptation made task performance worse. Using the same paradigm, Taylor and colleagues (2010) repeated this experiment,
this time comparing healthy adults and patients with cerebellar ataxia. The patients with
cerebellar damage performed better than the controls; they were able to implement an
explicit strategy with little interference from implicit processes, whereas the control
group demonstrated progressively poorer performance as a result of recalibration. Al­
though these findings are remarkable, the data do not necessarily support independence
of implicit and explicit systems or a complete takeover by implicit processes, as proposed
by Mazzoni and Krakauer (2006): The errors made by normal controls as a result of their
adaptation only reached about one-third of the magnitude of the total rotation. These
findings provide further support for the role of the cerebellum in implicit adaptation, as
separate from explicit processes.

That the two systems may be independent is further supported by work from Sülzenbrück and Heuer (2009). In their task, drawing circles under changing gain conditions, explicit
and implicit processes were placed both in cooperation and in opposition; the explicit
strategy either complemented or opposed the gain change. Under these circumstances,
and when participants never fully adapted to the gain change, the two systems operated
in isolation: they did not interfere with one another, and their effects were purely summa­
tive.

When the explicit and implicit systems are not placed in direct conflict, the evidence
seems to be clearer. Explicit and implicit involvements are complementary and non-inter­
acting. In a study by Mazzoni and Wexler (2009), an explicit rule could be used to select a
target, whereas implicit processes adapted to a visual-motor rotation without interfer­
ence. Interestingly, this study indicates that the two processes have a potential to inter­
act. Patients in a presymptomatic stage of Huntington’s disease were unable to maintain
pure separation, and their adaptation performance suffered when explicit task control
was required. In all, a clear conclusion regarding the role of explicit control in adaptation
remains elusive, as does an understanding of the brain structures and networks involved.
Future work with careful attention to methodological detail is required to address these
issues.

Practical Implications for Skill Learning


The cognitive neuroscience of skill acquisition has many practical implications, especially
as it applies to sporting activities and rehabilitation after injury. As mentioned previously,
when individuals learn a new skill, they progress from a more cognitive early phase of
learning relying on prefrontal and parietal brain networks to a highly automatic late
phase of learning that relies on more subcortical structures. Moving participants quickly
through these stages and avoiding regressing to lower levels of performance is a topic of
interest not only to scientific researchers but also to coaches, physical therapists, sports
educators, and people from many other related disciplines. This section highlights the
practical implications of skill learning in sports, and focuses on the phenomenon of “chok­
ing under pressure,” as well as on ways to circumvent and avoid such lapses in perfor­
mance.

Numerous studies have shown that choking under pressure occurs when a highly learned and automatic skill is brought back into explicit awareness, leaving it prone to breakdown under conditions of stress (Baumeister, 1984;
Beilock & Carr, 2001; Beilock, Carr, MacMahon, & Starkes, 2002). Although little neuro­
science research has been dedicated to this phenomenon, presumably learners show de­
creased performance in a well-learned skill when the focus of attention engages brain
networks more actively involved in initial learning, such as the prefrontal cortex and ACC
networks (see Figure 20.1, arrow on right). Researchers studying skilled performance in
high-stakes conditions often analyze such breakdowns in performance by reporting on the differences in the focus of attention between experts and less skilled individuals. For example, in a study by Castaneda and Gray (2007), novice and expert baseball players performed differentially depending on the focus of attention specified in their instructions before the task. Specifically, it was found that the (p. 428) optimal focus of attention for highly skilled batters is one that is external and directs attention to the perceptual aspects of the action. This external focus of attention is thought to allow smooth processing of the proceduralized knowledge of swinging a baseball bat (Castaneda & Gray, 2007). On the other hand, less skilled batters benefit from a focus that attends to the step-by-step execution of the swing and are hampered by focusing on the effects of their action (Castaneda & Gray, 2007). Similarly, in a study using expert and novice soccer players, Beilock
et al. (2002) found that when novices were required to focus on the execution of the skill,
their performance as measured by time to dribble a soccer ball through an obstacle
course improved compared with a dual-task condition. In experts, though, this result was
reversed. Expert players performed better when there was a concomitant task that took
their attention off of executing this well-learned skill. Interestingly, however, when right-
footed expert players were instructed to perform the task with their nondominant left
foot, their performance was enhanced in the skill-focus condition. Overall, this research
implies that when it comes to highly learned motor skills, performance is improved with
an external focus. It is only when performing a less-well-learned skill that performance
benefits from a focus on the component movements that compose the complex skill.

How does the way that individuals learn affect the acquisition and robustness of a new
motor skill? For example, if skill acquisition requires going from a highly cognitive explic­
it representation to a procedural implicit representation, are there ways to affect the
learning process so that the motor program is initially stored implicitly? This would also
have the added benefit of making the skill more resistant to choking under pressure be­
cause there would not be a strong explicit motor program of the skill to bring to con­
scious awareness. In analogy learning, one simple heuristic is provided to the learner
(Koedijker, Oudejans, & Beek, 2008). In following this single rule, it is believed that
learners eschew hypothesis testing during skill learning. Thus, little explicit knowledge
about the task is accumulated, and demands on error detection and working memory
processes are kept to a minimum (Koedijker, Oudejans, & Beek, 2008). From a neuro­
science perspective this would be akin to “bypassing” the highly cognitive prefrontal
brain networks and encoding the skill directly into more robust subcortical brain regions
such as the striatum and cerebellum (see Figure 20.1, arrow on left). For example, in a
2001 study conducted by Liao and Masters (2001), subjects were instructed to hit a fore­
hand shot in table tennis by moving the hand along an imaginary hypotenuse of a right
triangle. Analogy learners acquired few explicit rules about the task, and showed similar
performance compared with participants who had learned the skill explicitly. When a con­
current secondary task was added, the performance of the explicit learners was signifi­
cantly reduced. However, no significant performance impairments were seen in the analo­
gy (implicit learning) group under dual-task conditions.

Conclusions and Future Directions


Recent studies on the cognitive neuroscience of skill learning have begun to incorporate
new techniques and approaches. For example, it has been shown using diffusion tensor
imaging (DTI) that individual differences in cerebellar white matter microstructure corre­
late with skill learning ability (Della-Maggiore, 2009). Moreover, a single session of motor
learning, but not simple motor performance, has been shown to modulate frontal-parietal
network connectivity. The field has advanced in terms of both elucidating basic mecha­
nisms of skill learning and making strides in translational approaches. Our nascent un­
derstanding of the physiological mechanisms underlying skill learning has been exploited
for the design of brain stimulation protocols that inhibit abnormal brain activity in stroke
patients (Fregni et al., 2006) or accelerate skill learning by applying transcranial direct
current stimulation to the motor cortex (Reis et al., 2009).
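The individual-differences finding above amounts to a simple brain–behavior correlation: one structural value per participant (e.g., fractional anisotropy, FA, from DTI) against one behavioral value (learning rate). The sketch below illustrates that computation with fabricated numbers; the FA values, learning rates, and hand-rolled Pearson function are assumptions for illustration, not data or code from the cited study.

```python
# Illustrative sketch: correlating a per-participant white-matter measure
# (fractional anisotropy, FA) with visuomotor learning rate, in the spirit
# of the DTI finding described above. All values are fabricated.
import statistics

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

fa_values = [0.41, 0.45, 0.39, 0.48, 0.44, 0.50]       # hypothetical FA
learning_rates = [0.12, 0.18, 0.10, 0.22, 0.16, 0.24]  # hypothetical rates

print(f"r = {pearson_r(fa_values, learning_rates):.2f}")
```

A positive r of this kind is what supports the claim that microstructural variation predicts how quickly individuals adapt.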

When new techniques are applied to the study of skill learning mechanisms, investigators
typically focus first on primary motor cortex. Although this structure is both accessible
and strongly associated with skill learning, a complete understanding of the neural mech­
anisms underlying learning will come only from investigation of interactions between and
integration of cognitive and motor networks. Moreover, investigation of ecologically valid
skills in more naturalistic settings with the use of mobile brain imaging techniques (Gwin,
Gramann, Makeig, & Ferris, 2010, 2011) will bring further validity and translational ca­
pacity to the study of skill learning.

Learning a new motor skill places demands not only on the motor system but also on cog­
nitive processes. For example, working memory appears to support the adjustment of
motor commands for subsequent actions based on recently experienced (p. 429) errors.
Moreover, it may be used for chunking movement elements into hierarchical sequence
representations. Interestingly, individual differences in working memory capacity are pre­
dictive of the ability to learn new motor skills. It should be noted that engagement of
working memory, error processing, and attentional processes and networks does not
necessitate that skill learning is explicit. As skill learning progresses, cognitive demands
are reduced, as reflected by decreasing interference from secondary tasks and reduced
activation of frontal-parietal brain networks.
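The chunking idea mentioned above is often operationalized behaviorally by splitting a practiced key-press sequence at long inter-response intervals (IRIs), which are read as boundaries between chunks. The sketch below is an illustrative implementation of that general logic, not the specific procedure of any study cited here; the keys, IRIs, and 350-ms threshold are hypothetical.

```python
# Illustrative sketch (hypothetical data): inferring chunk boundaries in a
# learned key-press sequence from inter-response intervals (IRIs), where a
# long pause is taken to mark the start of a new chunk.

def chunk_sequence(keys, iris, threshold_ms=350):
    """Split a sequence wherever the preceding IRI exceeds the threshold.

    keys: list of responses; iris: list of len(keys) - 1 intervals in ms.
    """
    chunks, current = [], [keys[0]]
    for key, iri in zip(keys[1:], iris):
        if iri > threshold_ms:  # long pause -> close the current chunk
            chunks.append(current)
            current = []
        current.append(key)
    chunks.append(current)
    return chunks

keys = list("ABCDABCD")
iris = [150, 160, 140, 420, 150, 155, 145]  # ms between successive presses
print(chunk_sequence(keys, iris))  # two four-element chunks
```

Under this scheme, higher working memory capacity has been associated with longer chunks, consistent with capacity limits shaping how sequences are hierarchically organized.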

Questions for the Future


• How overlapping are the following continua?
  • Cognitive and procedural control of skill
  • Implicit and explicit memory systems
  • Fast and slow skill learning processes
• Do the implicit and explicit memory systems interact, either positively (reflected as
transfer) or negatively (reflected as interference), for both motor sequence learning
and sensorimotor adaptation?
• What role do genetic, experiential, and other individual-difference variables play in
skill learning?
• What role do state variables, such as motivation, arousal, and fatigue, play in skill
learning?
• How can a precise understanding of the neurocognitive mechanisms of skill learning
be exploited to enhance rehabilitation, learning, and performance?

Author Note
This work was supported by the National Institutes of Health (R01 AG 24106 S1 and T32-
AG00114).

References
Abeele, S., & Bock, O. (2001). Mechanisms for sensorimotor adaptation to rotated visual in­
put. Experimental Brain Research, 139, 248–253.

Abel, T., & Lattal, K. M. (2001). Molecular mechanisms of memory acquisition, consolida­
tion and retrieval. Current Opinion in Neurobiology, 11 (2), 180–187.

Adams, J. A. (1971). A closed-loop theory of motor learning. Journal of Motor Behavior, 3
(2), 111–149.

Aizenstein, H. J., Stenger, V. A., Cochran, J., Clark, K., Johnson, M., Nebes, R. D., et al.
(2004). Regional brain activation during concurrent implicit and explicit sequence learn­
ing. Cerebral Cortex, 14 (2), 199–208.

Anguera, J. A., Reuter-Lorenz, P. A., Willingham, D. T., & Seidler, R. D. (2010). Contribu­
tions of spatial working memory to visuomotor learning. Journal of Cognitive Neuro­
science, 22 (9), 1917–1930.

Anguera, J. A., Seidler, R. D., & Gehring, W. J. (2009). Changes in performance monitoring
during sensorimotor adaptation. Journal of Neurophysiology, 102 (3), 1868–1879.

Aristotle; Ross, W. D., & Smith, J. A. (1908). The works of Aristotle. Oxford, UK: Clarendon
Press.


Ashe, J., Lungu, O. V., Basford, A. T., & Lu, X. (2006). Cortical control of motor sequences.
Current Opinion in Neurobiology, 16, 213–221.

Atzler, E., & Herbst, R. (1927). Arbeitsphysiologische Studien. Pflügers Archiv European
Journal of Physiology, 215 (1), 291–328.

Baddeley, A. D. (1986). Working memory. Oxford, UK: Oxford University Press.

Baddeley, A. D., & Hitch, G. J. (1974). Working memory. In G. A. Bower (Ed.), Recent ad­
vances in learning and motivation (Vol. 8, pp. 47–89). New York: Academic Press.

Barone, P., & Joseph, J. P. (1989a). Prefrontal cortex and spatial sequencing in macaque
monkey. Experimental Brain Research, 78 (3), 447–464.

Barone, P., & Joseph, J. P. (1989b). Role of the dorsolateral prefrontal cortex in organizing
visually guided behavior. Brain Behavior and Evolution, 33 (2-3), 132–135.

Bastian, A. J. (2006). Learning to predict the future: the cerebellum adapts feedforward
movement control. Current Opinion in Neurobiology, 16 (6), 645–649.

Baumeister, R. F. (1984). Choking under pressure: Self-consciousness and paradoxical ef­


fects of incentives on skillful performance. Journal of Personality and Social Psychology,
46 (3), 610–620.

Beilock, S. L., & Carr, T. H. (2001). On the fragility of skilled performance: what governs
choking under pressure? Journal of Experimental Psychology: General, 130 (4), 701–725.

Beilock, S. L., Carr, T. H., MacMahon, C., & Starkes, J. L. (2002). When paying attention
becomes counterproductive: Impact of divided versus skill-focused attention on novice
and experienced performance of sensorimotor skills. Journal of Experimental Psychology:
Applied, 8 (1), 6–16.

Berns, G. S., Cohen, J. D., & Mintun, M. A. (1997). Brain regions responsive to novelty in
the absence of awareness. Science, 276 (5316), 1272–1275.

Bo, J., Borza, V., & Seidler, R. D. (2009). Age-related declines in visuospatial working
memory correlate with deficits in explicit motor sequence learning. Journal of Neurophys­
iology, 102, 2744–2754.

Bo, J., Peltier, S., Noll, D., & Seidler, R. D. (2011). Symbolic representations in motor se­
quence learning. NeuroImage, 54 (1), 417–426.

Bo, J., & Seidler, R. D. (2009). Visuospatial working memory capacity predicts the organi­
zation of acquired explicit motor sequences. Journal of Neurophysiology, 101 (6), 3116–
3125.

Botvinick, M. M., Braver, T. S., Barch, D. M., Carter, C. S., & Cohen, J. D. (2001). Conflict
monitoring and cognitive control. Psychological Review, 108 (3), 624–652.


Boyd, L. A., & Winstein, C. J. (2004). Providing explicit information disrupts implicit motor
learning after basal ganglia stroke. Learning and Memory, 11 (4), 388–396.

(p. 430) Brashers-Krug, T., Shadmehr, R., & Bizzi, E. (1996). Consolidation in human mo­
tor memory. Nature, 382 (6588), 252–255.

Brooks, D. N., & Baddeley, A. D. (1976). What can amnesic patients learn? Neuropsycholo­
gia, 14 (1), 111–122.

Caithness, G., Osu, R., Bays, P., Chase, H., Klassen, J., Kawato, M., et al. (2004). Failure to
consolidate the consolidation theory of learning for sensorimotor adaptation tasks. Jour­
nal of Neuroscience, 24 (40), 8662–8671.

Cajal, S. R. (1894). The Croonian Lecture: La fine structure des centres nerveux. Proceed­
ings of the Royal Society of London, 55, 444–468.

Carpenter, A. F., Georgopoulos, A. P., & Pellizzer, G. (1999). Motor cortical encoding of se­
rial order in a context-recall task. Science, 283 (5408), 1752–1757.

Castaneda, B., & Gray, R. (2007). Effects of focus of attention on baseball batting perfor­
mance in players of different skill levels. Journal of Sport & Exercise Psychology, 29, 60–
77.

Cavaco, S., Anderson, S. W., Allen, J. S., Castro-Caldas, A., & Damasio, H. (2004). The
scope of preserved procedural memory in amnesia. Brain, 127 (Pt 8), 1853–1867.

Chapman, H. L., Eramudugolla, R., Gavrilescu, M., Strudwick, M. W., Loftus, A., Cunning­
ton, R., et al. (2010). Neural mechanisms underlying spatial realignment during adapta­
tion to optical wedge prisms. Neuropsychologia, 48 (9), 2595–2601.

Clower, D. M., Hoffman, J. M., Votaw, J. R., Faber, T. L., Woods, R. P., & Alexander, G. E.
(1996). Role of posterior parietal cortex in the recalibration of visually guided reaching.
Nature, 383 (6601), 618–621.

Cohen, N. J., & Squire, L. R. (1980). Preserved learning and retention of pattern-analyzing
skill in amnesia: Dissociation of knowing how and knowing that. Science, 210 (4466), 207–
210.

Conley, D. L., & Krahenbuhl, G. S. (1980). Running economy and distance running perfor­
mance of highly trained athletes. Medicine and Science in Sports and Exercise, 12 (5),
357–360.

Corkin, S. (1968). Acquisition of motor skill after bilateral medial temporal-lobe excision.
Neuropsychologia, 6 (3), 255–265.

Cowan, N. (1995). Attention and memory: An integrated framework. Oxford Psychology
Series (Vol. 26). New York: Oxford University Press.

Cowan, N. (2005). Working memory capacity. New York: Psychology Press.



Criscimagna-Hemminger, S. E., Bastian, A. J., & Shadmehr, R. (2010). Size of error affects
cerebellar contributions to motor learning. Journal of Neurophysiology, 103 (4), 2275–
2284.

Cunningham, H. A. (1989). Aiming error under transformed spatial mappings suggests a
structure for visual-motor maps. Journal of Experimental Psychology: Human Perception
and Performance, 15 (3), 493–506.

Curran, T., & Keele, S. W. (1993). Attentional and nonattentional forms of sequence learn­
ing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19 (1), 189–
202.

Danckert, J., Ferber, S., & Goodale, M. A. (2008). Direct effects of prismatic lenses on vi­
suomotor control: An event-related functional MRI study. European Journal of Neuro­
science, 28 (8), 1696–1704.

Debas, K., Carrier, J., Orban, P., Barakat, M., Lungu, O., Vandewalle, G., Hadj Tahar, A.,
Bellec, P., Karni, A., Ungerleider, L. G., Benali, H., & Doyon, J. (2010). Brain plasticity re­
lated to the consolidation of motor sequence learning and adaptation. Proceedings of the
National Academy of Sciences U S A, 107 (41), 17839–17844.

Della-Maggiore, V., Scholz, J., Johansen-Berg, H., & Paus, T. (2009). The rate of visuomo­
tor adaptation correlates with cerebellar white-matter microstructure. Human Brain Map­
ping, 30 (12), 4048–4053.

den Ouden, H. E., Daunizeau, J., Roiser, J., Friston, K. J., & Stephan, K. E. (2010). Striatal
prediction error modulates cortical coupling. Journal of Neuroscience, 30 (9), 3210–3219.

Desmurget, M., & Turner, R. S. (2010). Motor sequences and the basal ganglia: Kinematics,
not habits. Journal of Neuroscience, 30 (22), 7685–7690.

Desmurget, M., Vindras, P., Grea, H., Viviani, P., & Grafton, S. T. (2000). Proprioception
does not quickly drift during visual occlusion. Experimental Brain Research, 134 (3), 363–
377.

Destrebecqz, A., & Cleeremans, A. (2001). Can sequence learning be implicit? New evi­
dence with the process dissociation procedure. Psychonomic Bulletin and Review, 8 (2),
343–350.

Destrebecqz, A., Peigneux, P., Laureys, S., Degueldre, C., Del Fiore, G., Aerts, J., et al.
(2005). The neural correlates of implicit and explicit sequence learning: Interacting net­
works revealed by the process dissociation procedure. Learning and Memory, 12 (5), 480–
490.

Diedrichsen, J., Hashambhoy, Y., Rane, T., & Shadmehr, R. (2005). Neural correlates of
reach errors. Journal of Neuroscience, 25 (43), 9919–9931.


Diedrichsen, J., Shadmehr, R., & Ivry, R. B. (2009). The coordination of movement: Opti­
mal feedback control and beyond. Trends in Cognitive Sciences, 14 (1), 31–39.

Diedrichsen, J., Verstynen, T., Lehman, S. L., & Ivry, R. B. (2005). Cerebellar involvement
in anticipating the consequences of self-produced actions during bimanual movements.
Journal of Neurophysiology, 93 (2), 801–812.

Donchin, O., Francis, J. T., & Shadmehr, R. (2003). Quantifying generalization from trial-
by-trial behavior of adaptive systems that learn with basis functions: Theory and experi­
ments in human motor control. Journal of Neuroscience, 23 (27), 9032–9045.

Donchin, O., Sawaki, L., Madupu, G., Cohen, L. G., & Shadmehr, R. (2002). Mechanisms
influencing acquisition and recall of motor memories. Journal of Neurophysiology, 88 (4),
2114–2123.

Doyon, J., & Benali, H. (2005). Reorganization and plasticity in the adult brain during
learning of motor skills. Current Opinion in Neurobiology, 15 (2), 161–167.

Doyon, J., Owen, A. M., Petrides, M., Sziklas, V., & Evans, A. C. (1996). Functional anato­
my of visuomotor skill learning in human subjects examined with positron emission to­
mography. European Journal of Neuroscience, 8 (4), 637–648.

Eichenbaum, H., Fagan, A., Mathews, P., & Cohen, N. J. (1988). Hippocampal system dys­
function and odor discrimination learning in rats: Impairment or facilitation depending on
representational demands. Behavioral Neuroscience, 102 (3), 331–339.

Eimer, M., Goschke, T., Schlaghecken, F., & Sturmer, B. (1996). Explicit and implicit
learning of event sequences: evidence from event-related brain potentials. Journal of Ex­
perimental Psychology: Learning, Memory, and Cognition, 22 (4), 970–987.

Ekstrom, R. B., French, J. W., Harman, H. H., et al. (1976). Manual for kit of factor refer­
enced cognitive tests. Princeton, NJ: Educational Testing Service.

(p. 431) Enoka, R. M. (1988). Load- and skill-related changes in segmental contributions to
a weightlifting movement. Medicine and Science in Sports and Exercise, 20 (2), 178–187.

Ericsson, K. A., Chase, W. G., & Faloon, S. (1980). Acquisition of a memory skill. Science,
208, 1181–1182.

Eversheim, U., & Bock, O. (2001). Evidence for processing stages in skill acquisition: A
dual-task study. Learning and Memory, 8 (4), 183–189.

Falkenstein, M., Hohnsbein, J., & Hoormann, J. (1995). Event-related potential correlates
of errors in reaction tasks. Electroencephalography and Clinical Neurophysiology. Sup­
plement, 44, 287–296.


Falkenstein, M., Hohnsbein, J., Hoormann, J., & Blanke, L. (1991). Effects of crossmodal
divided attention on late ERP components. II. Error processing in choice reaction tasks.
Electroencephalography and Clinical Neurophysiology, 78 (6), 447–455.

Ferdinand, N. K., Mecklinger, A., & Kray, J. (2008). Error and deviance processing in im­
plicit and explicit sequence learning. Journal of Cognitive Neuroscience, 20 (4), 629–642.

Fitts, P. M., & Posner, M. I. (1967). Human performance. Belmont, CA: Brooks/Cole.

Flanagan, J. R., Vetter, P., Johansson, R. S., & Wolpert, D. M. (2003). Prediction precedes
control in motor learning. Current Biology, 13 (2), 146–150.

Fletcher, P. C., Zafiris, O., Frith, C. D., Honey, R. A., Corlett, P. R., Zilles, K., et al. (2005).
On the benefits of not trying: Brain activity and connectivity reflecting the interactions of
explicit and implicit sequence learning. Cerebral Cortex, 15 (7), 1002–1015.

Fregni, F., Boggio, P. S., Valle, A. C., Rocha, R. R., Duarte, J., Ferreira, M. J., Wagner, T.,
Fecteau, S., Rigonatti, S. P., Riberto, M., Freedman, S. D., & Pascual-Leone, A. (2006). A
sham-controlled trial of a 5-day course of repetitive transcranial magnetic stimulation of
the unaffected hemisphere in stroke patients. Stroke, 37, 2115–2122.

Fulton, J. F. (1935). A note on the definition of the “motor” and “premotor” areas. Brain,
58 (2), 311.

Gabrieli, J. D. (1998). Cognitive neuroscience of human memory. Annual Review of Psy­
chology, 49, 87–115.

Gabrieli, J. D., Stebbins, G. T., Singh, J., Willingham, D. B., & Goetz, C. G. (1997). Intact
mirror-tracing and impaired rotary-pursuit skill learning in patients with Huntington’s
disease: Evidence for dissociable memory systems in skill learning. Neuropsychology, 11
(2), 272–281.

Galea, J. M., Sami, S. A., Albert, N. B., & Miall, R. C. (2010). Secondary tasks impair adap­
tation to step- and gradual-visual displacements. Experimental Brain Research, 202 (2),
473–484.

Garrison, K. A., Winstein, C. J., & Aziz-Zadeh, L. (2010). The mirror neuron system: A
neural substrate for methods in stroke rehabilitation. Neurorehabilitation and Neural Re­
pair, 24 (5), 404–412.

Gehring, W. J., Coles, M. G., Meyer, D. E., & Donchin, E. (1995). A brain potential manifes­
tation of error-related processing. Electroencephalography and Clinical Neurophysiology.
Supplement, 44, 261–272.

Gehring, W. J., Goss, B., Coles, M. G. H., Meyer, D. E., & Donchin, E. (1993). A neural sys­
tem for error detection and compensation. Psychological Science, 4 (6), 385–390.


Geisler, W. S. (1989). Sequential ideal-observer analysis of visual discriminations. Psycho­
logical Review, 96 (2), 267–314.

Gepshtein, S., Seydell, A., & Trommershauser, J. (2007). Optimality of human movement
under natural variations of visual-motor uncertainty. Journal of Vision, 7 (5), 13.1–18.

Ghilardi, M. F., Gordon, J., & Ghez, C. (1995). Learning a visuomotor transformation in a
local area of work space produces directional biases in other areas. Journal of Neurophys­
iology, 73 (6), 2535–2539.

Ghilardi, M. F., Moisello, C., Silvestri, G., Ghez, C., & Krakauer, J. W. (2009). Learning of a
sequential motor skill comprises explicit and implicit components that consolidate differ­
ently. Journal of Neurophysiology, 101 (5), 2218–2229.

Goldman-Rakic, P. (1987). Circuitry of primate prefrontal cortex and regulation of behav­
ior by representational memory. In F. Plum (Ed.), Handbook of physiology (pp. 373–417).
Washington, DC: American Psychological Society.

Gomi, H. (2008). Implicit online corrections of reaching movements. Current Opinion in
Neurobiology, 18 (6), 558–564.

Grafton, S. T., & Hamilton, A. F. (2007). Evidence for a distributed hierarchy of action rep­
resentation in the brain. Human Movement Science, 26 (4), 590–616.

Grafton, S. T., Mazziotta, J. C., Presty, S., Friston, K. J., Frackowiak, R. S., & Phelps, M. E.
(1992). Functional anatomy of human procedural learning determined with regional cere­
bral blood flow and PET. Journal of Neuroscience, 12 (7), 2542–2548.

Grafton, S. T., Mazziotta, J. C., Woods, R. P., & Phelps, M. E. (1992). Human functional
anatomy of visually guided finger movements. Brain, 115 (Pt 2), 565–587.

Grafton, S. T., Woods, R. P., Mazziotta, J. C., & Phelps, M. E. (1991). Somatotopic mapping
of the primary motor cortex in humans: Activation studies with cerebral blood flow and
positron emission tomography. Journal of Neurophysiology, 66 (3), 735–743.

Graydon, F., Friston, K., Thomas, C., Brooks, V., & Menon, R. (2005). Learning-related fM­
RI activation associated with a rotational visuo-motor transformation. Brain Research:
Cognitive Brain Research, 22 (3), 373–383.

Green, T. D., & Flowers, J. H. (1991). Implicit versus explicit learning processes in a prob­
abilistic, continuous fine-motor catching task. Journal of Motor Behavior, 23 (4), 293–300.

Gwin, J. T., Gramann, K., Makeig, S., & Ferris, D. P. (2010). Removal of movement artifact
from high-density EEG recorded during walking and running. Journal of Neurophysiology,
103 (6), 3526–3534.

Gwin, J. T., Gramann, K., Makeig, S., & Ferris, D. P. (2011). Electrocortical activity is cou­
pled to gait cycle phase during treadmill walking. NeuroImage, 54 (2), 1289–1296.


Hebb, D. O. (1949). The organization of behavior. New York: Wiley & Sons.

Heindel, W. C., Salmon, D. P., Shults, C. W., Walicke, P. A., & Butters, N. (1989). Neuropsy­
chological evidence for multiple implicit memory systems: A comparison of Alzheimer’s,
Huntington’s, and Parkinson’s disease patients. Journal of Neuroscience, 9 (2), 582–587.

Helmholtz, H. V. (1867). Handbuch der physiologischen Optik. Leipzig: Voss.

Henke, K. (2010). A model for memory systems based on processing modes rather than
consciousness. Nature Reviews Neuroscience, 11 (7), 523–532.

Hikosaka, O., & Isoda, M. (2010). Switching from automatic to controlled behavior: corti­
co-basal ganglia mechanisms. Trends in Cognitive Sciences, 14 (4), 154–161.

Hikosaka, O., Nakahara, H., Rand, M. K., Sakai, K., Lu, X., Nakamura, K., et al. (1999).
Parallel neural networks for learning and sequential procedures. Trends in
Neurosciences, 22, 464–471.

(p. 432) Holt, K. G., Hamill, J., & Andres, R. O. (1991). Predicting the minimal energy costs
of human walking. Medicine and Science in Sports and Exercise, 23 (4), 491.

Honda, M., Deiber, M. P., Ibanez, V., Pascual-Leone, A., Zhuang, P., & Hallett, M. (1998).
Dynamic cortical involvement in implicit and explicit motor sequence learning: A PET
study. Brain, 121 (Pt 11), 2159–2173.

Huang, V. S., & Shadmehr, R. (2009). Persistence of motor memories reflects statistics of
the learning event. Journal of Neurophysiology, 102 (2), 931–940.

Hwang, E. J., Smith, M. A., & Shadmehr, R. (2006). Dissociable effects of the implicit and
explicit memory systems on learning control of reaching. Experimental Brain Research,
173 (3), 425–437.

Imamizu, H., Kuroda, T., Miyauchi, S., Yoshioka, T., & Kawato, M. (2003). Modular organi­
zation of internal models of tools in the human cerebellum. Proceedings of the National
Academy of Sciences U S A, 100 (9), 5461–5466.

Imamizu, H., Kuroda, T., Yoshioka, T., & Kawato, M. (2004). Functional magnetic reso­
nance imaging examination of two modular architectures for switching multiple internal
models. Journal of Neuroscience, 24 (5), 1173–1181.

Imamizu, H., Miyauchi, S., Tamada, T., Sasaki, Y., Takino, R., Putz, B., et al. (2000). Hu­
man cerebellar activity reflecting an acquired internal model of a new tool. Nature, 403
(6766), 192–195.

Isoda, M., & Hikosaka, O. (2007). Switching from automatic to controlled action by mon­
key medial frontal cortex. Nature Neuroscience, 10 (2), 240–248.


Ito, M. (2002). Historical review of the significance of the cerebellum and the role of
Purkinje cells in motor learning. Annals of the New York Academy of Science, 978, 273–
288.

James, W. (1890). The principles of psychology. New York: H. Holt.

Jankowski, J., Scheef, L., Hüppe, C., & Boecker, H. (2009). Distinct striatal regions for
planning and executing novel and automated movement sequences. NeuroImage, 44 (4),
1369–1379.

Joiner, W. M., & Smith, M. A. (2008). Long-term retention explained by a model of short-
term learning in the adaptive control of reaching. Journal of Neurophysiology, 100 (5),
2948–2955.

Jones, A. M. (1998). A five year physiological case study of an Olympic runner. British
Journal of Sports Medicine, 32 (1), 39–43.

Jonides, J., Lewis, R. L., Nee, D. E., Lustig, C. A., Berman, M. G., & Moore, K. S. (2008).
The mind and brain of short-term memory. Annual Review of Psychology, 59, 193–224.

Jonides, J., Smith, E. E., Koeppe, R. A., Awh, E. Minoshima, S., & Mintun, M. A. (1993).
Spatial working memory in humans as revealed by PET. Nature, 363, 623–625.

Jordan, K., Heinze, H. J., Lutz, K., Kanowski, M., & Jancke, L. (2001). Cortical activations
during the mental rotation of different visual objects. NeuroImage, 13, 143–152.

Jueptner, M., Frith, C. D., Brooks, D. J., Frackowiak, R. S., & Passingham, R. E. (1997).
Anatomy of motor learning. II. Subcortical structures and learning by trial and error. Jour­
nal of Neurophysiology, 77 (3), 1325–1337.

Jueptner, M., Stephan, K. M., Frith, C. D., Brooks, D. J., Frackowiak, R. S., & Passingham,
R. E. (1997). Anatomy of motor learning. I. Frontal cortex and attention to action. Journal
of Neurophysiology, 77 (3), 1313–1324.

Karabanov, A., Cervenka, S., de Manzano, O., Forssberg, H., Farde, L., & Ullen, F. (2010).
Dopamine D2 receptor density in the limbic striatum is related to implicit but not explicit
movement sequence learning. Proceedings of the National Academy of Sciences U S A,
107 (16), 7574–7579.

Karni, A., Meyer, G., Jezzard, P., Adams, M. M., Turner, R., & Ungerleider, L. G. (1995).
Functional MRI evidence for adult motor cortex plasticity during motor skill learning. Na­
ture, 377 (6545), 155–158.

Karni, A., Meyer, G., Rey-Hipolito, C., Jezzard, P., Adams, M. M., Turner, R., et al. (1998).
The acquisition of skilled motor performance: Fast and slow experience-driven changes in
primary motor cortex. Proceedings of the National Academy of Sciences U S A, 95 (3),
861–868.


Kawato, M. (1999). Internal models for motor control and trajectory planning. Current
Opinion in Neurobiology, 9 (6), 718–727.

Keane, M. M., Orlando, F., & Verfaellie, M. (2006). Increasing the salience of fluency cues
reduces the recognition memory impairment in amnesia. Neuropsychologia, 44 (5), 834–
839.

Kim, S. G., Ashe, J., Georgopoulos, A. P., Merkle, H., Ellermann, J. M., Menon, R. S., et al.
(1993). Functional imaging of human motor cortex at high magnetic field. Journal of Neu­
rophysiology, 69 (1), 297–302.

Kleider, H. M., & Goldinger, S. D. (2004). Illusions of face memory: Clarity breeds famil­
iarity. Journal of Memory and Language, 50 (2), 196–211.

Koedijker, J. M., Oudejans, R. R. D., & Beek P. J. (2008). Table tennis performance, follow­
ing explicit and analogy learning over 10,000 repetitions. International Journal of Sport
Psychology, 39, 237–256.

Körding, K. P., & Wolpert, D. M. (2004). Bayesian integration in sensorimotor learning.


Nature, 427 (6971), 244–247.

Krakauer, J. W., Ghez, C., & Ghilardi, M. F. (2005). Adaptation to visuomotor transforma­
tions: Consolidation, interference, and forgetting. Journal of Neuroscience, 25 (2), 473–
478.

Krakauer, J. W., Pine, Z. M., Ghilardi, M. F., & Ghez, C. (2000). Learning of visuomotor
transformations for vectorial planning of reaching trajectories. Journal of Neuroscience,
20 (23), 8916–8924.

Krakauer, J. W., & Shadmehr, R. (2006). Consolidation of motor memory. Trends in Neuro­
sciences, 29 (1), 58–64.

Krigolson, O. E., & Holroyd, C. B. (2006). Evidence for hierarchical error processing in
the human brain. Neuroscience, 137 (1), 13–17.

Krigolson, O. E., & Holroyd, C. B. (2007a). Hierarchical error processing: different errors,
different systems. Brain Research, 1155, 70–80.

Krigolson, O. E., & Holroyd, C. B. (2007b). Predictive information and error processing:
The role of medial-frontal cortex during motor control. Psychophysiology, 44 (4), 586–595.

Krigolson, O. E., Holroyd, C. B., Van Gyn, G., & Heath, M. (2008). Electroencephalograph­
ic correlates of target and outcome errors. Experimental Brain Research, 190 (4), 401–
411.

Lalazar, H., & Vaadia, E. (2008). Neural basis of sensorimotor learning: Modifying inter­
nal models. Current Opinion in Neurobiology, 18 (6), 573–581.


Li, C. S., Padoa-Schioppa, C., & Bizzi, E. (2001). Neuronal correlates of motor perfor­
mance and motor learning in the primary motor cortex of monkeys adapting to an exter­
nal force field. Neuron, 30 (2), 593–607.

(p. 433) Liao, C. M., & Masters, R. S. W. (2001). Analogy learning: A means to implicit mo­
tor learning. Journal of Sports Sciences, 19, 307–319.

Lieberman, M. D., Chang, G. Y., Chiao, J., Bookheimer, S. Y., & Knowlton, B. J. (2004). An
event-related fMRI study of artificial grammar learning in a balanced chunk strength de­
sign. Journal of Cognitive Neuroscience, 16 (3), 427–438.

Logie, R. H., Della Sala, S., Beschin, N., & Denis, M. (2005). Dissociating mental transforma­
tions and visuo-spatial storage in working memory: Evidence from representational ne­
glect. Memory, 13, 430–434.

Lu, X., & Ashe, J. (2005). Anticipatory activity in primary motor cortex codes memorized
movement sequences. Neuron, 45 (6), 967–973.

Luck, S. J., & Vogel, E. K. (1997). The capacity of visual working memory for features and
conjunctions. Nature, 390, 279–281.

Luo, J., & Niki, K. (2005). Does hippocampus associate discontiguous events? Evidence
from event-related fMRI. Hippocampus, 15 (2), 141–148.

Malone, L. A., & Bastian, A. J. (2010). Thinking about walking: Effects of conscious correc­
tion versus distraction on locomotor adaptation. Journal of Neurophysiology, 103 (4),
1954–1962.

Maloney, L. T., & Mamassian, P. (2009). Bayesian decision theory as a model of human vi­
sual perception: Testing Bayesian transfer. Visual Neuroscience, 26 (1), 147–155.

Martin, S. J., Grimwood, P. D., & Morris, R. G. (2000). Synaptic plasticity and memory: An
evaluation of the hypothesis. Annual Review of Neuroscience, 23, 649–711.

Maschke, M., Gomez, C. M., Ebner, T. J., & Konczak, J. (2004). Hereditary cerebellar atax­
ia progressively impairs force adaptation during goal-directed arm movements. Journal of
Neurophysiology, 91 (1), 230–238.

Matsumura, M., Sadato, N., Kochiyama, T., Nakamura, S., Naito, E., Matsunami, K., et al.
(2004). Role of the cerebellum in implicit motor skill learning: a PET study. Brain Re­
search Bulletin, 63 (6), 471–483.

Matsuzaka, Y., Picard, N., & Strick, P. L. (2007). Skill representation in the primary motor
cortex after long-term practice. Journal of Neurophysiology, 97 (2), 1819–1832.

Mazzoni, P., & Krakauer, J. W. (2006). An implicit plan overrides an explicit strategy dur­
ing visuomotor adaptation. Journal of Neuroscience, 26 (14), 3642–3645.


Mazzoni, P., & Wexler, N. S. (2009). Parallel explicit and implicit control of reaching. PLoS
One, 4 (10), e7557.

McNitt-Gray, J. L., Requejo, P. S., & Flashner, H. (2006). Multijoint control strategies
transfer between tasks. Biological Cybernetics, 94 (6), 501–510.

Miall, R. C., Christensen, L. O., Cain, O., & Stanley, J. (2007). Disruption of state estima­
tion in the human lateral cerebellum. PLoS Biol, 5 (11), e316.

Miall, R. C., Weir, D. J., Wolpert, D. M., & Stein, J. F. (1993). Is the cerebellum a smith pre­
dictor? Journal of Motor Behavior, 25 (3), 203–216.

Miyake, A., & Shah, P. (1999). Models of working memory: Mechanisms of active mainte­
nance and executive control. New York: Cambridge University Press.

Montaldi, D., Mayes, A. R., Barnes, A., Pirie, H., Hadley, D. M., Patterson, J., et al. (1998).
Associative encoding of pictures activates the medial temporal lobes. Human Brain Map­
ping, 6 (2), 85–104.

Morton, S. M., & Bastian, A. J. (2006). Cerebellar contributions to locomotor adaptations
during splitbelt treadmill walking. Journal of Neuroscience, 26 (36), 9107–9116.

Nelson, W. L. (1983). Physical principles for economies of skilled movements. Biological
Cybernetics, 46 (2), 135–147.

Newell, K. M. (1991). Motor skill acquisition. Annual Review of Psychology, 42, 213–237.

Nishimoto, R., & Tani, J. (2009). Development of hierarchical structures for actions and
motor imagery: a constructivist view from synthetic neuro-robotics study. Psychological
Research, 73 (4), 545–558.

Nissen, M. J., & Bullemer, P. (1987). Attentional requirements of learning: Evidence from
performance measures. Cognitive Psychology, 19 (1), 1–32.

Ohbayashi, M., Ohki, K., & Miyashita, Y. (2003). Conversion of working memory to motor se­
quence in the monkey premotor cortex. Science, 301, 233–236.

Overduin, S. A., Richardson, A. G., Lane, C. E., Bizzi, E., & Press, D. Z. (2006). Intermit­
tent practice facilitates stable motor memories. Journal of Neuroscience, 26 (46), 11888–
11892.

Packard, M. G., Hirsh, R., & White, N. M. (1989). Differential effects of fornix and caudate
nucleus lesions on two radial maze tasks: evidence for multiple memory systems. Journal
of Neuroscience, 9 (5), 1465–1472.

Paine, R. W., & Tani, J. (2004). Motor primitive and sequence self-organization in a hierar­
chical recurrent neural network. Neural Networks, 17 (8-9), 1291–1309.


Pascual-Leone, A., Brasil-Neto, J. P., Valls-Sole, J., Cohen, L. G., & Hallett, M. (1992). Sim­
ple reaction time to focal transcranial magnetic stimulation: Comparison with reaction
time to acoustic, visual and somatosensory stimuli. Brain, 115 (Pt 1), 109–122.

Pascual-Leone, A., Grafman, J., Clark, K., Stewart, M., Massaquoi, S., Lou, J. S., et al.
(1993). Procedural learning in Parkinson’s disease and cerebellar degeneration. Annals of
Neurology, 34 (4), 594–602.

Pascual-Leone, A., Grafman, J., & Hallett, M. (1994). Modulation of cortical motor output
maps during development of implicit and explicit knowledge. Science, 263 (5151), 1287–
1289.

Pascual-Leone, A., & Torres, F. (1993). Plasticity of the sensorimotor cortex representa­
tion of the reading finger in Braille readers. Brain, 116 (Pt 1), 39–52.

Pascual-Leone, A., Valls-Sole, J., Wassermann, E. M., Brasil-Neto, J., Cohen, L. G., & Hal­
lett, M. (1992). Effects of focal transcranial magnetic stimulation on simple reaction time
to acoustic, visual and somatosensory stimuli. Brain, 115 (Pt 4), 1045–1059.

Pascual-Leone, A., Wassermann, E. M., Grafman, J., & Hallett, M. (1996). The role of the
dorsolateral prefrontal cortex in implicit procedural learning. Experimental Brain Re­
search, 107, 479–485.

Paz, R., Boraud, T., Natan, C., Bergman, H., & Vaadia, E. (2003). Preparatory activity in
motor cortex reflects learning of local visuomotor skills. Nature Neuroscience, 6 (8), 882–
890.

Poldrack, R. A., Clark, J., Pare-Blagoev, E. J., Shohamy, D., Creso Moyano, J., Myers, C., et
al. (2001). Interactive memory systems in the human brain. Nature, 414 (6863), 546–550.

Poldrack, R. A., Prabhakaran, V., Seger, C. A., & Gabrieli, J. D. (1999). Striatal activation
during acquisition of a cognitive skill. Neuropsychology, 13 (4), 564–574.

Polyakov, F., Drori, R., Ben-Shaul, Y., Abeles, M., & Flash, T. (2009). A compact representation of drawing movements with sequences of parabolic primitives. PLoS Computational Biology, 5 (7), e1000427.

Ralston, H. (1958). Energy-speed relation and optimal speed during level walking. Euro­
pean Journal of Applied Physiology and Occupational Physiology, 17 (4), 277–283.

Ramnani, N. (2006). The primate cortico-cerebellar system: anatomy and function. Nature
Reviews Neuroscience, 7 (7), 511–522.

Reber, A. S. (1967). Implicit learning of artificial grammars. Journal of Verbal Learning and Verbal Behavior, 6 (6), 855–863.

Reber, A. S. (1976). Implicit learning of synthetic languages: The role of instructional set.
Journal of Experimental Psychology: Human Learning and Memory, 2 (1), 88–94.

Reis, J., Schambra, H., Cohen, L. G., Buch, E. R., Fritsch, B., Zarahn, E., Celnik, P. A., &
Krakauer, J. W. (2009). Noninvasive cortical stimulation enhances motor skill acquisition
over multiple days through an effect on consolidation. Proceedings of the National Acade­
my of Sciences U S A, 106 (5), 1590–1595.

Remy, F., Wenderoth, N., Lipkens, K., & Swinnen, S. P. (2010). Dual-task interference during initial learning of a new motor task results from competition for the same brain areas. Neuropsychologia, 48 (9), 2517–2527.

Reuter-Lorenz, P. A., Jonides, J., Smith, E. E., Hartley, A., Miller, A., Marshuetz, C., et al.
(2000). Age differences in the frontal lateralization of verbal and spatial working memory
revealed by PET. Journal of Cognitive Neuroscience, 12, 174–187.

Ridderinkhof, K. R., Ullsperger, M., Crone, E. A., & Nieuwenhuis, S. (2004). The role of
the medial frontal cortex in cognitive control. Science, 306 (5695), 443–447.

Robertson, E. M., Pascual-Leone, A., & Miall, R. C. (2004). Current concepts in procedur­
al consolidation. Nature Reviews Neuroscience, 5 (7), 576–582.

Robertson, E. M., Tormos, J. M., Maeda, F., & Pascual-Leone, A. (2001). The role of the dorsolateral prefrontal cortex during sequence learning is specific for spatial information. Cerebral Cortex, 11, 628–635.

Russeler, J., Kuhlicke, D., & Munte, T. F. (2003). Human error monitoring during implicit
and explicit learning of a sensorimotor sequence. Neuroscience Research, 47 (2), 233–
240.

Sadeh, T., Shohamy, D., Levy, D. R., Reggev, N., & Maril, A. (2011). Cooperation between
the hippocampus and the striatum during episodic encoding. Journal of Cognitive Neuro­
science, 23, 1597–1608.

Sakai, K., Hikosaka, O., Miyauchi, S., Takino, R., Sasaki, Y., & Putz, B. (1998). Transition
of brain activation from frontal to parietal areas in visuomotor sequence learning. Journal
of Neuroscience, 18 (5), 1827–1840.

Schack, T., & Mechsner, F. (2006). Representation of motor skills in human long-term
memory. Neuroscience Letters, 391 (3), 77–81.

Scheidt, R. A., Dingwell, J. B., & Mussa-Ivaldi, F. A. (2001). Learning to move amid uncer­
tainty. Journal of Neurophysiology, 86 (2), 971–985.

Schendan, H. E., Searl, M. M., Melrose, R. J., & Stern, C. E. (2003). An FMRI study of the
role of the medial temporal lobe in implicit and explicit sequence learning. Neuron, 37 (6),
1013–1025.

Schmidt, R. A. (1975). A schema theory of discrete motor learning. Psychological Review, 82 (4), 225–260.

Schneider, S. A., Wilkinson, L., Bhatia, K. P., Henley, S. M., Rothwell, J. C., Tabrizi, S. J., et
al. (2010). Abnormal explicit but normal implicit sequence learning in premanifest and
early Huntington’s disease. Movement Disorders, 25 (10), 1343–1349.

Scoville, W. B., & Milner, B. (1957). Loss of recent memory after bilateral hippocampal le­
sions. Journal of Neurology, Neurosurgery, and Psychiatry, 20 (1), 11–21.

Seger, C. A., & Cincotta, C. M. (2006). Dynamics of frontal, striatal, and hippocampal sys­
tems during rule learning. Cerebral Cortex, 16 (11), 1546–1555.

Seidler, R. D. (2010). Neural correlates of motor learning, transfer of learning, and learn­
ing to learn. Exercise and Sport Sciences Review, 38 (1), 3–9.

Seidler, R. D., & Noll, D. C. (2008). Neuroanatomical correlates of motor acquisition and
motor transfer. Journal of Neurophysiology, 99, 1836–1845.

Seidler, R. D., Noll, D. C., & Chintalapati, P. (2006). Bilateral basal ganglia activation asso­
ciated with sensorimotor adaptation. Experimental Brain Research, 175 (3), 544–555.

Seidler, R. D., Purushotham, A., Kim, S. G., Ugurbil, K., Willingham, D., & Ashe, J. (2002).
Cerebellum activation associated with performance change but not motor learning.
Science, 296 (5575), 2043–2046.

Seitz, R. J., & Roland, P. E. (1992). Learning of sequential finger movements in man: A
combined kinematic and positron emission tomography (PET) Study. European Journal of
Neuroscience, 4, 154–165.

Seydell, A., McCann, B. C., Trommershauser, J., & Knill, D. C. (2008). Learning stochastic
reward distributions in a speeded pointing task. Journal of Neuroscience, 28 (17), 4356–
4367.

Shadmehr, R., & Brashers-Krug, T. (1997). Functional stages in the formation of human
long-term motor memory. Journal of Neuroscience, 17 (1), 409–419.

Shadmehr, R., Smith, M. A., & Krakauer, J. W. (2010). Error correction, sensory predic­
tion, and adaptation in motor control. Annual Review of Neuroscience, 33, 89–108.

Shah, P., & Miyake, A. (1996). The separability of working memory resources for spatial
thinking and language processing: An individual differences approach. Journal of Experi­
mental Psychology: General, 125 (1), 4–27.

Shohamy, D., & Wagner, A. D. (2008). Integrating memories in the human brain: hip­
pocampal-midbrain encoding of overlapping events. Neuron, 60 (2), 378–389.

Smith, E. E., Jonides, J., & Koeppe, R. A. (1996). Dissociating verbal and spatial working
memory: PET investigations. Journal of Cognitive Neuroscience, 7, 337–356.

Smith, M. A., Ghazizadeh, A., & Shadmehr, R. (2006). Interacting adaptive processes with
different timescales underlie short-term motor learning. PLoS Biology, 4 (6), e179.

Smith, M. A., & Shadmehr, R. (2005). Intact ability to learn internal models of arm dy­
namics in Huntington’s disease but not cerebellar degeneration. Journal of Neurophysiol­
ogy, 93 (5), 2809–2821.

Song, S., Howard, J. H., Jr., & Howard, D. V. (2007). Implicit probabilistic sequence learn­
ing is independent of explicit awareness. Learning and Memory, 14 (3), 167–176.

Spencer, R. M., & Ivry, R. B. (2009). Sequence learning is preserved in individuals with
cerebellar degeneration when the movements are directly cued. Journal of Cognitive Neu­
roscience, 21, 1302–1310.

Squire, L. R. (1992). Declarative and nondeclarative memory: multiple brain systems supporting learning and memory. Journal of Cognitive Neuroscience, 4 (3), 232–243.

Staresina, B. P., & Davachi, L. (2009). Mind the gap: Binding experiences across space
and time in the human hippocampus. Neuron, 63 (2), 267–276.

Stefan, K., Wycislo, M., Gentner, R., Schramm, A., Naumann, M., Reiners, K., et al.
(2006). Temporary occlusion of associative motor cortical plasticity by prior dynamic mo­
tor training. Cerebral Cortex, 16 (3), 376–385.

Sulzenbruck, S., & Heuer, H. (2009). Functional independence of explicit and implicit mo­
tor adjustments. Conscious Cognition, 18 (1), 145–159.

Taylor, J., Klemfuss, N., & Ivry, R. (2010). An explicit strategy prevails when the cerebel­
lum fails to compute movement errors. Cerebellum, 9, 580–586.

Taylor, J. A., & Thoroughman, K. A. (2007). Divided attention impairs human motor adap­
tation but not feedback control. Journal of Neurophysiology, 98 (1), 317–326.

Taylor, J. A., & Thoroughman, K. A. (2008). Motor adaptation scaled by the difficulty of a
secondary cognitive task. PLoS One, 3 (6), e2485.

Thoroughman, K. A., & Shadmehr, R. (2000). Learning of action through adaptive combi­
nation of motor primitives. Nature, 407 (6805), 742–747.

Toni, I., Krams, M., Turner, R., & Passingham, R. E. (1998). The time course of changes
during motor sequence learning: A whole-brain fMRI study. NeuroImage, 8 (1), 50–61.

Trommershäuser, J., Maloney, L. T., & Landy, M. S. (2008). Decision making, movement
planning and statistical decision theory. Trends in Cognitive Sciences, 12 (8), 291–297.

Tseng, Y. W., Diedrichsen, J., Krakauer, J. W., Shadmehr, R., & Bastian, A. J. (2007). Senso­
ry prediction errors drive cerebellum-dependent adaptation of reaching. Journal of Neuro­
physiology, 98 (1), 54–62.

Tunney, R. J., & Fernie, G. (2007). Repetition priming affects guessing not familiarity. Be­
havioral and Brain Functions, 3, 40.

Verwey, W. B. (1996). Buffer loading and chunking in sequential keypressing. Journal of Experimental Psychology: Human Perception and Performance, 22, 544–562.

Verwey, W. B. (2001). Concatenating familiar movement sequences: The versatile cognitive processor. Acta Psychologica, 106, 69–95.

Vidoni, E. D., & Boyd, L. A. (2007). Achieving enlightenment: What do we know about the
implicit learning system and its interaction with explicit knowledge? Journal of Neurolog­
ic Physical Therapy, 31 (3), 145–154.

Vogel, E. K., & Machizawa, M. G. (2004). Neural activity predicts individual differences in
visual working memory capacity. Nature, 428, 748–751.

Volle, E., Kinkingnéhun, S., Pochon, J. B., Mondon, K., Thiebaut de Schotten, M., Seassau,
M., et al. (2008). The functional architecture of the left posterior and lateral prefrontal
cortex in humans. Cerebral Cortex, 18 (10), 2460–2469.

Voss, J. L., Baym, C. L., & Paller, K. A. (2008). Accurate forced-choice recognition without
awareness of memory retrieval. Learning and Memory, 15 (6), 454–459.

Voss, J. L., & Paller, K. A. (2008). Brain substrates of implicit and explicit memory: The im­
portance of concurrently acquired neural signals of both memory types. Neuropsycholo­
gia, 46 (13), 3021–3029.

Weiner, M. J., Hallett, M., & Funkenstein, H. H. (1983). Adaptation to lateral displacement
of vision in patients with lesions of the central nervous system. Neurology, 33 (6), 766–
772.

Werner, S., & Bock, O. (2007). Effects of variable practice and declarative knowledge on
sensorimotor adaptation to rotated visual feedback. Experimental Brain Research, 178
(4), 554–559.

Werner, S., Bock, O., Gizewski, E. R., Schoch, B., & Timmann, D. (2010). Visuomotor adap­
tive improvement and aftereffects are impaired differentially following cerebellar lesions
in SCA and PICA territory. Experimental Brain Research, 201 (3), 429–439.

Wilkinson, L., Khan, Z., & Jahanshahi, M. (2009). The role of the basal ganglia and its cor­
tical connections in sequence learning: evidence from implicit and explicit sequence
learning in Parkinson’s disease. Neuropsychologia, 47 (12), 2564–2573.

Willingham, D. B. (1998). A neuropsychological theory of motor skill learning. Psychological Review, 105 (3), 558–584.

Willingham, D. B. (2001). Becoming aware of motor skill. Trends in Cognitive Sciences, 5 (5), 181–182.

Willingham, D. B., Salidis, J., & Gabrieli, J. D. (2002). Direct comparison of neural systems
mediating conscious and unconscious skill learning. Journal of Neurophysiology, 88 (3),
1451–1460.

Wise, S. P., Moody, S. L., Blomstrom, K. J., & Mitz, A. R. (1998). Changes in motor cortical activity during visuomotor adaptation. Experimental Brain Research, 121, 285–299.

Wolpert, D. M., & Flanagan, J. R. (2010). Motor learning. Current Biology, 20 (11), R467–
R472.

Wolpert, D. M., & Miall, R. C. (1996). Forward models for physiological motor control.
Neural Networks, 9 (8), 1265–1279.

Yamamoto, T., & Fujinami, T. (2008). Hierarchical organization of the coordinative struc­
ture of the skill of clay kneading. Human Movement Science, 27 (5), 812–822.

Rachael Seidler

Rachael D. Seidler, Department of Psychology, School of Kinesiology, and Neuroscience Program, University of Michigan, Ann Arbor, MI

Bryan L. Benson

Bryan L. Benson, Department of Psychology, School of Kinesiology, University of Michigan, Ann Arbor, MI

Nathaniel B. Boyden

Nathaniel B. Boyden, Department of Psychology, University of Michigan, Ann Arbor, MI

Youngbin Kwak

Youngbin Kwak, Neuroscience Program, University of Michigan, Ann Arbor, MI

Memory Consolidation  
John Wixted and Denise J. Cai
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn

Print Publication Date: Dec 2013 Subject: Psychology, Cognitive Neuroscience


Online Publication Date: Dec 2013 DOI: 10.1093/oxfordhb/9780199988693.013.0021

Abstract and Keywords

Memory consolidation is a multifaceted concept. At a minimum, it refers to both cellular
consolidation and systems consolidation. Cellular consolidation takes place in the hours
after learning, stabilizing the memory trace—a process that may involve structural
changes in hippocampal neurons. Systems consolidation refers to a more protracted
process by which memories become independent of the hippocampus as they are estab­
lished in cortical neurons—a process that may involve neural replay. Both forms of consol­
idation may preferentially unfold whenever the hippocampus is not encoding new infor­
mation, although some theories hold that consolidation occurs exclusively during sleep. In
recent years, the notion of reconsolidation has been added to the mix. According to this
idea, previously consolidated memories, when later retrieved, undergo consolidation all
over again. With new findings coming to light seemingly every day, the concept of consoli­
dation will likely evolve in interesting and unpredictable ways in the years to come.

Keywords: cellular consolidation, systems consolidation, reconsolidation, sleep and consolidation

The idea that memories require time to consolidate has a long history, but the under­
standing of what consolidation means has evolved over time. In 1900, the German experi­
mental psychologists Georg Müller and Alfons Pilzecker published a monograph in which
a new theory of memory and forgetting was proposed, one that included—for the first
time—a role for consolidation. Their basic method involved asking subjects to study a list
of paired-associate nonsense syllables and then testing their memory using cued recall af­
ter a delay of several minutes. Typically, some of the list items were forgotten, and to in­
vestigate why that occurred, Müller and Pilzecker (1900) presented subjects with a sec­
ond, interfering list of items to study before memory for the target list was tested. They
found that this interpolated list reduced memory for the target list compared with a con­
trol group that was not exposed to any intervening activity. Critically, the position of the
interfering list within the retention interval mattered such that interference occurring
soon after learning had a more disruptive effect than interference occurring later in the
retention interval. This led them to propose that memories require time to consolidate
and that retroactive interference is a force that compromises the integrity of recently
formed (and not-yet-consolidated) memories. In this chapter, we review the major theo­
ries of consolidation—beginning with the still-relevant account proposed by Müller and
Pilzecker (1900)—and we consider a variety of recent developments in what has become
a rapidly evolving field.

The Early View: Consolidation and Resistance to Interference
According to Müller and Pilzecker’s (1900) view, consolidation consists of “trace
hardening” (cf. Wickelgren, 1974) in the sense that some physiological process
perseverates and eventually renders the memory trace less vulnerable to interference
caused by new learning. The kind of interference that a consolidated trace theoretically
resists differs from the kind of interference that most experimental psychologists have in
mind when they study forgetting. In the field of experimental psychology, new learning
has long been thought to generate interference by creating competing associations linked
to a retrieval cue, not by affecting the integrity of a fragile memory trace (e.g., Keppel,
1968; Underwood, 1957; Watkins & Watkins, 1975). Traditionally, this kind of interference
has been investigated using an A-B, A-C paired-associates paradigm in which the same
cue words (the A items) are paired with different to-be-remembered target words across
two lists (the B and C items, respectively). In a standard retroactive interference para­
digm, for example, the memory test consists of presenting the A items and asking partici­
pants to recall the B items. Having learned the A-C associations after learning the A-B as­
sociations, the ability of participants to recall the B items is typically impaired, and this
impairment is usually assumed to reflect retrieval competition from the C items. The pow­
erful effect of this kind of “cue overload” interference on retention has been well estab­
lished by decades of psychological research, but it is almost certainly not the only kind of
interference that causes forgetting.

The kind of interference envisioned by Müller and Pilzecker (1900) does not involve over­
loading a retrieval cue but instead involves directly compromising the integrity of a par­
tially consolidated memory trace. In what even today seems like a radical notion to many
experimental psychologists, Müller and Pilzecker (1900) assumed that the interference
was nonspecific in the sense that the interfering material did not have to be similar to the
originally memorized material for interference to occur. Instead, mental exertion of any
kind was thought to be the interfering force (Lechner et al., 1999). “Mental exertion” is
a fairly vague concept, and Wixted (2004a) suggested that the kind of intervening mental
exertion that Müller and Pilzecker (1900) probably had in mind consists specifically of
new learning. The basic idea is that new learning, per se, serves as an interfering force
that degrades recently formed and still fragile memory traces.

Loosely speaking, it can be said that Müller and Pilzecker (1900) believed that the memo­
ry trace becomes strengthened by the process of consolidation. However, there is more
than one way that a trace can become stronger, so it is important to keep in mind which
meaning of a “stronger memory trace” applies in any discussion of consolidation. One
way that a trace might become stronger is that it comes to more accurately reflect past
experience than it did when it was first formed, much like a snapshot taken from a Po­
laroid camera comes into sharper focus over time. A trace that consolidated in this man­
ner would yield an ever-clearer memory of the encoding event in response to the same re­
trieval cue. Another way that a trace might become stronger is that it becomes ever more
likely to spring to mind in response to a retrieval cue (even more likely than it was when
the memory was first formed). A memory trace that consolidated in either of these two
ways would support a higher level of performance than it did at the end of training, as if
additional learning occurred despite the absence of additional training.

Still another way that a trace can become stronger is that it becomes hardened against
the destructive forces of interference. A trace that hardens over time (i.e., a trace that
consolidates in that sense) may simultaneously become degraded over time due to the in­
terfering force of new learning or to some other force of decay. As an analogy, a clay
replica of the Statue of Liberty will be at its finest when it has just been completed and
the clay is still wet, but it will also be at its most vulnerable. With the passage of time,
however, the statue dries and becomes more resistant to damage even though it may now
be a less accurate replica than it once was (because of the damage that occurred before
the clay dried). Müller and Pilzecker’s (1900) original view of consolidation, which was
later elaborated by Wickelgren (1974), was analogous to this. That is, the consolidation
process was not thought to render the trace more representative of past experience or to
render it more likely to come to mind than it was at the time of formation; instead, consol­
idation was assumed to render the trace (or its association with a retrieval cue) more re­
sistant to interference even while the integrity of the trace was gradually being compro­
mised by interference.

These considerations suggest a relationship between Müller and Pilzecker’s (1900) view
of consolidation and the time course of forgetting. More specifically, the fact that a memo­
ry trace hardens in such a way as to become increasingly resistant to interference even
as the trace fades may help to explain the general shape of the forgetting function
(Wixted, 2004b). Since the seminal work of Ebbinghaus (1885), a consistent body of evi­
dence has indicated that the proportional rate of forgetting is rapid at first and
then slows to a point at which almost no further forgetting occurs. This general property
is captured by the power law of forgetting (Anderson & Schooler, 1991; Wixted & Carpen­
ter, 2007; Wixted & Ebbesen, 1991), and it is enshrined in Jost’s law of forgetting, which
states that if two memory traces have equal strength but different ages, the older trace
will decay at a slower rate than the younger one from that moment on (Jost, 1897). One
possibility is that the continuous reduction in the rate of forgetting as a trace ages is a re­
flection of the increased resistance to interference as the trace undergoes a slow process
of consolidation (Wickelgren, 1974; Wixted, 2004b).

Modern Views of Consolidation


The view of consolidation advanced by the pioneering experimental psychologists Müller
and Pilzecker was not embraced by the field of experimental psychology in the latter half
of the twentieth century. Ironically, during that same period of time, the notion that mem­
ories consolidate became the “standard story” in the field of neuroscience. The impetus
for this way of thinking among neuroscientists can be traced in large part to the realiza­
tion that the structures of the medial temporal lobe (MTL) play a critical role in the formation
of new memories. The importance of these structures became clear when patient H.M. re­
ceived a bilateral medial temporal lobe resection in an effort to control his epileptic
seizures (Scoville & Milner, 1957). Although successful in that regard, H.M. was also un­
expectedly left with a profound case of anterograde amnesia (i.e., the inability to form
new memories from that point on) despite retaining normal perceptual and intellectual
functioning, including normal working memory capacity. Another outcome—one that is
relevant to the issue of consolidation—was that H.M. also exhibited temporally graded
retrograde amnesia (Scoville & Milner, 1957; Squire, 2009). That is, memories that were
formed before surgery were also impaired, and the degree of impairment was greater for
memories that had been formed just before surgery than for memories that had been
formed well before. Although memories from up to 3 years before his surgery were seeming­
ly impaired, H.M.’s older memories appeared to be largely intact (Scoville & Milner,
1957). This result suggests that the brain systems involved in the maintenance of memory
change over time.

Systems Consolidation

The temporal gradient of retrograde amnesia that is associated with injury and disease
was noted long ago by Ribot (1881/1882), but he had no way of knowing what brain struc­
tures were centrally involved in this phenomenon. The experience of H.M. made it clear
that the relevant structures reside in the MTL, and the phenomenon of temporally graded
retrograde amnesia suggests an extended but time-limited role for the MTL in the encod­
ing and retrieval of new memories. That is, the MTL is needed to encode new memories,
and it is needed for a time after they are encoded, but it is not needed indefinitely. The
decreasing dependence of memories on the MTL is known as systems consolidation (Fran­
kland & Bontempi, 2005; McGaugh, 2000). As a result of this process, which may last as
long as several years in humans, memories are eventually reorganized and established in
the neocortex in such a way that they become independent of the MTL (Squire et al.,
2001). Note that this is a different view of consolidation than the resistance-to-interfer­
ence view proposed by Müller and Pilzecker (1900).

The temporal gradient of retrograde amnesia exhibited by patient H.M. (documented dur­
ing early years after surgery) prompted more controlled investigations using both hu­
mans and nonhumans. These studies have shown that the temporal gradient is real and
that it is evident even when bilateral lesions are limited to the hippocampus (a central
structure of the MTL). For example, in a particularly well-controlled study, Anagnostaras
et al. (1999) investigated the effect of hippocampal lesions in rats using a context fear-
conditioning paradigm. In this task, a tone conditional stimulus (CS) is paired with a
shock unconditional stimulus (US) several times in a novel context. Such training results
in a fear of both the tone and the training context (measured behaviorally as the propor­
tion of time spent freezing), and memory for the context-shock association in particular is
known to depend on the hippocampus. Anagnostaras et al. (1999) trained a group of rats
in two different contexts, and this training was later followed by surgical lesions of the
hippocampus. Each rat received training in Context A 50 days before surgery and train­
ing in Context B 1 day before surgery. Thus, at the time lesions were induced, memory for
learning in Context A was relatively old, whereas memory for learning in Context B was
still new. A later test of retention showed that remote (i.e., 50-day-old) memory for con­
textual fear was similar to that of controls, whereas recent (i.e., 1-day-old) memory for
contextual fear was greatly impaired. Thus, in rats, hippocampus-dependent memories
appear to become independent of the hippocampus in a matter of weeks.

Controlled studies in humans sometimes suggest that the time course of systems consolidation
plays out over a much longer period of time, a finding that is consistent
with the time window of retrograde amnesia observed for H.M. However, the time
course is rather variable, and the basis for the variability is not known. Both semantic and
episodic memory have been assessed in studies investigating the temporal gradient of
retrograde amnesia. Semantic memory refers to memory for factual knowledge (e.g.,
what is the capital of Texas?), whereas episodic memory refers to memory for specific
events (e.g., memory for a recently presented list of words or memory for an autobio­
graphical event, such as a trip to the Bahamas).

Temporal Gradient of Semantic Memory


Semantic knowledge is generally acquired gradually across multiple episodes of learning
and is forgotten slowly, so it seems reasonable to suppose that the systems consolidation
of such knowledge would be extended in time. In one recent study of this issue, Manns et
al. (2003) measured factual knowledge in six amnesic patients with damage limited to the
hippocampal region. Participants were asked questions about news events that had oc­
curred from 1950 to early 2002 (e.g., Which tire manufacturer recalled thousands of
tires? [Firestone] What software company was accused of running a monopoly? [Mi­
crosoft]). The data for a particular patient (and for several controls matched to that pa­
tient) were analyzed according to the year in which the patient became amnesic. As
might be expected, memory for factual knowledge was reduced for the period of time fol­
lowing the onset of memory impairment. Thus, for example, if a patient became amnesic
in 1985, then memory for news events that occurred after 1985 was impaired (i.e., an­
terograde amnesia was observed). In addition, and more to the point, factual knowledge
for news events that occurred during the several years immediately before the onset of
memory impairment (e.g., 1980 to 1985) was also impaired, particularly when memory
was measured by free recall rather than by recognition. However, memory for events that
occurred 11 to 30 years before the onset of memory impairment (e.g., 1955 to 1974) was
intact. These older memories, it seems, had become fully consolidated in the neocortex
and were no longer dependent on the structures of the MTL.

The findings from lesion studies have sometimes been corroborated by neuroimaging
studies performed on unimpaired subjects, though the relevant literature is somewhat
mixed in this regard. Although some studies found more activity in the MTL during the
recollection of recent semantic memories compared with remote semantic memories
(Douville et al., 2005; Haist et al., 2001; Smith & Squire, 2009), other studies found no
difference (e.g., Bernard et al., 2004; Maguire et al., 2001; Maguire & Frith, 2003). For
example, using functional magnetic resonance imaging (fMRI), Bernard et al. (2004)
identified brain regions associated with recognizing famous faces from two different peri­
ods: people who became famous in the 1960s to 1970s and people who became famous in
the 1990s. They found that the hippocampus was similarly active during the recognition
of faces from both periods (i.e., no temporal gradient was observed). It is not clear why
studies vary in this regard, but one possibility is that the detection of a temporal gradient
is more likely when multiple time points are assessed, especially during the several years
immediately preceding memory impairment, than when only two time points are assessed
(as in Bernard et al., 2004).

In an imaging study that was patterned after the lesion study reported by Manns et al.
(2003), Smith and Squire (2009) measured brain activity while subjects recalled news
events from multiple time points over the past 30 years. In agreement with the lesion
study, they found that regions in the MTL exhibited a decrease in brain activity as a func­
tion of the age of the memory over a 12-year period (whereas activity was constant for
memories from 13 to 30 years ago). In addition, they found that regions in the frontal
lobe, temporal lobe, and parietal lobe exhibited an increase in activity as a function of the
age of the trace. Thus, it seems that the (systems) consolidation of semantic memories is
a slow process that may require years to complete.

Temporal Gradient of Autobiographical (Episodic) Memory


Although lesion studies and neuroimaging studies point to a temporal gradient for seman­
tic memory lasting years, there is some debate about whether episodic memory—in par­
ticular autobiographical memory—exhibits any temporal gradient at all. For example,
some studies of H.M. conducted not long before he passed away in 2008 showed that his
memory for remote personal experiences, unlike his memory for remote factual
knowledge, was not preserved (Steinvorth, Levine, & Corkin, 2005).
In addition, (p. 440) a number of case studies of memory-impaired patients have reported
impairments of childhood memories (Cipolotti et al., 2001; Eslinger, 1998; Hirano &
Noguchi, 1998; Kitchener et al., 1998; Maguire et al., 2006; Rosenbaum et al., 2004).

These findings appear to suggest that the MTL plays a role in recalling personal episodes
even if they happened long ago, and the apparent implication is that autobiographical
memories do not undergo systems consolidation. However, by the time H.M.’s remote au­
tobiographical memory impairment was documented, he was an elderly patient, and his
brain was exhibiting signs of cortical thinning, abnormal white matter, and subcortical in­
farcts (Squire, 2009). Thus, these late-life brain abnormalities could account for the loss
of remote memories. In addition, in most of the case studies that have documented re­
mote autobiographical memory impairment, damage was not restricted to the MTL. To
compare the remote memory effects of limited MTL damage and damage that also in­
volved areas of the neocortex, Bayley et al. (2005) measured the ability of eight amnesic
patients to recollect detailed autobiographical memories from their early life. Five of the
patients had damage limited to the MTL, whereas three had damage to the neocortex in
addition to MTL damage. They found that the remote autobiographical memories of the
five MTL patients were quantitatively and qualitatively similar to the recollections of the
control group, whereas the autobiographical memories of the three patients with addi­
tional neocortical damage were severely impaired. This result suggests that semantic
memory and episodic memory both eventually become independent of the MTL through a
process of systems consolidation, but the temporal gradient of retrograde amnesia
associated with that process can be obscured if damage extends to the neocortex. MacKinnon
and Squire (1989) also found that the temporal gradient of autobiographical memories for
five MTL patients was similar in duration to the multiyear gradient associated with se­
mantic memory.

Temporal Gradients Involving a Shorter Time Scale


Recent neuroimaging studies have documented a temporal gradient of activity for memo­
ry of simple laboratory stimuli on a time scale that is vastly shorter than the multiyear
process of consolidation suggested by lesion studies of semantic memory and autobio­
graphical memory. For example, using fMRI, Yamashita et al. (2009) measured activity in
the hippocampus and temporal neocortex associated with memory for two sets of paired-
associate figures that subjects had previously memorized. One set was studied 8 weeks
before the memory test (old memories), and the other was studied immediately before the
memory test (new memories). Overall accuracy at the time of test was equated for the
two conditions by providing extra study time for the items that were studied 8 weeks be­
fore. Thus, any differences in activity associated with old and new memories could not be
attributed to differences in memory strength. The results showed that a region in right
hippocampus was associated with greater activity during retrieval of new memories than
old memories, whereas in left temporal neocortex, the opposite activation pattern (i.e.,
old > new) was observed. These results are consistent with a decreasing role of the hip­
pocampus and increasing role of the neocortex as memories age over a period as short as
50 days (cf. Takashima et al., 2006), a time scale of consolidation that is similar to that ob­
served in experimental animals (e.g., Anagnostaras et al., 1999).

An even shorter time scale for systems consolidation was evident in a recent fMRI study
reported by Takashima et al. (2009). Subjects in that study memorized two sets of face–lo­
cation stimuli, one studied 24 hours before the memory test (old memories) and the other
studied 15 minutes before the memory test (new memories). To control for differences in
memory strength, they compared activity for high-confidence hits associated with the old
and new memories and found that hippocampal activity decreased and neocortical activi­
ty increased over the course of 24 hours. In addition, the connectivity between the hip­
pocampus and the neocortical regions decreased, whereas cortico-cortical connectivity
increased (all over the course of only 24 hours). Results like these suggest that the
process of systems consolidation can occur very quickly.

What determines whether the temporal gradient is short or long? The answer is not
known, but Frankland and Bontempi (2006) suggested that the critical variable may be
the richness of the memorized material. To-be-remembered stimuli presented in a labora­
tory are largely unrelated to a subject’s personal history and thus might be integrated
with prior knowledge represented in the cortex in a rather sparse (yet rapid) manner. Au­
tobiographical memories, by contrast, are generally related to a large preexisting knowl­
edge base. The integration of such memories into an intricate knowledge base may re­
quire more extended dialogue between (p. 441) the hippocampus and neocortex (McClel­
land, McNaughton, & O’Reilly, 1995). Alternatively, Tse et al. (2007) suggested that when
memories can be incorporated into an associative “schema” of preexisting knowledge
(i.e., when newly learned information is compatible with previously learned information),
the process of systems consolidation is completed very rapidly (within 48 hours for rats).
However, it is not clear whether this idea can account for the rapid systems consolidation
that is apparent for memories of arbitrary laboratory-based stimuli in humans (e.g.,
Takashima et al., 2009). The as-yet-unresolved question of why memories vary in how
quickly they undergo systems consolidation seems likely to remain a focus of research in
this area for some time to come.

Decreasing Hippocampal Activity Versus Increasing Neocortical Activity


The findings discussed above support the view that memories undergo a process of sys­
tems consolidation in which the structures of the MTL play a decreasing role with the
passage of time. An interesting feature of several of the neuroimaging studies discussed
above is that not only does MTL activity often decrease with time, but also neocortical ac­
tivity increases over time (Smith & Squire, 2009; Takashima et al., 2009; Yamashita et al.,
2009). What might that increased activity signify?

The memory of a past experience that is elicited by a retrieval cue presumably consists of
the reactivation of distributed neocortical areas that were active at the time the trace was
initially encoded (Damasio, 1989; Hoffman & McNaughton, 2002; Johnson, McDuff, Rugg,
& Norman, 2009; McClelland, McNaughton, & O’Reilly, 1995; Squire & Alvarez, 1995).
The primary sensory areas of the brain (e.g., the brain areas activated by the sights,
sounds, and smells associated with a visit to the county fair) converge on association ar­
eas of the brain, which, in turn, are heavily interconnected with the MTL. Conceivably,
memories are stored in widely distributed neocortical areas from the outset, but the hip­
pocampus and other structures of the MTL are required to bind them together until corti­
co-cortical associations develop, eventually rendering the memory trace independent of
the MTL (Wixted & Squire, 2011). In studies using fMRI, the increasing role of the direct
cortico-cortical connections may be reflected in increased neocortical activity as time
passes since the memory trace was formed (Takashima et al., 2009).

In agreement with this possibility, a series of trace eyeblink conditioning studies conduct­
ed by Takehara-Nishiuchi and colleagues has shown that an area of the medial prefrontal
cortex (mPFC) in rats becomes increasingly necessary for the retrieval of memories as
they become decreasingly dependent on the hippocampus over the course of several
weeks. This was shown both by lesion studies and by direct neural recordings of task-re­
lated activity in the mPFC (Takehara, Kawahara, & Kirino, 2003; Takehara-Nishiuchi &
McNaughton, 2008). In one particularly relevant experiment, Takehara-Nishiuchi and Mc­
Naughton (2008) showed that task-related activity of mPFC neurons in rats increased
over the course of several weeks even in the absence of further training. In a conceptual­
ly related study, Frankland et al. (2004) trained mice in a fear-conditioning procedure and
tested memory either 1 day (recent) or 36 days (remote) after training. They found that
activity in multiple association cortical regions (measured by the expression of activity-
regulated genes) was greater for remote than for recent memories. In addition, they
found that the increased cortical activity for remote memories was not evident in mice
with a gene mutation that selectively impairs remote memory. Results like these would
seem to provide direct evidence of the kind of cortical reorganization that has long been
thought to underlie systems consolidation.

Cellular Consolidation

Systems consolidation is not what Müller and Pilzecker (1900) had in mind when they
first introduced the concept of consolidation. Their view was that a memory trace be­
comes increasingly resistant to interference caused by new learning as the trace consoli­
dates, not that the trace becomes reorganized in the neocortex (and, therefore, less de­
pendent on the MTL) over time.

Whereas a role for systems consolidation came into sharper focus in the years following
the recognition of H.M.’s memory impairment, evidence for a second kind of consolida­
tion began to emerge in the early 1970s. This kind of consolidation—called cellular con­
solidation—occurs at the level of neurons (not brain systems) and takes place over the
hours (and, perhaps, days) after a memory is formed in the hippocampus (McGaugh,
2000). Cellular consolidation seems more directly relevant to the trace-hardening physio­
logical processes that Müller and Pilzecker (1900) had in mind, and it had its origins in
the discovery of a phenomenon known as long-term potentiation (LTP; Bliss & Lomo,
1973).

LTP is a relatively long-lasting enhancement of synaptic efficacy that is induced by
(p. 442) a brief burst of high-frequency electrical stimulation (a tetanus) delivered to
presynaptic neurons (Bliss & Collingridge, 1993). Before the tetanus, a single (weak) test pulse of
electrical stimulation applied to the presynaptic neuron elicits a certain baseline re­
sponse in the postsynaptic neuron, but after the tetanus, that same test pulse elicits a
greater response. The enhanced reactivity typically lasts hours or days (and sometimes
weeks), so it presumably does not represent the way in which memories are permanently
coded. Still, LTP is readily induced in hippocampal neurons, and it is, by far, the leading
approach to modeling the neural basis of initial memory formation (Bliss, Collingridge, &
Morris, 2003; Martin, Grimwood, & Morris, 2000). In this model, tetanic stimulation is
analogous to the effect of a behavioral experience, and the enhanced efficacy of the
synapse is analogous to the memory of that experience.

Although LTP looks like neural memory for an experience (albeit an artificial experience
consisting of a train of electrical impulses), what reason is there to believe that a similar
process plays a role in real memories? The induction of LTP in hippocampal neurons in­
volves the opening of calcium channels in postsynaptic N-methyl-D-aspartate (NMDA) re­
ceptors (Bliss & Collingridge, 1993). When those receptors are blocked by an NMDA an­
tagonist, high-frequency stimulation fails to induce LTP. Perhaps not coincidentally, NM­
DA antagonists have often been shown to impair the learning of hippocampus-dependent
tasks in animals (e.g., Morris et al., 1986; Morris, 1989), as if an LTP-like process in the
hippocampus plays an important role in the formation of new episodic memories. One
study suggests that the encoding of actual memories (not just an artificial train of electri­
cal pulses) also gives rise to LTP in the hippocampus. Whitlock et al. (2006) trained rats
on an inhibitory avoidance task (a task known to be dependent on the hippocampus), and
they were able to find neurons in the hippocampus that exhibited sustained LTP after
training (not after an artificial tetanus). In addition, tetanic stimulation applied to these
neurons after training now had a lesser effect (as if those neurons were already close to
ceiling levels of LTP) than tetanic stimulation applied to the neurons of animals that had
not received training. These findings suggest that LTP may be more than just a model for
memory formation; it may, in fact, be part of the mechanism that underlies the initial en­
coding of memory.

What does LTP have to do with the story of consolidation? The induction of LTP unleashes
a molecular cascade in postsynaptic neurons that continues for hours and results in struc­
tural changes to those neurons. The postsynaptic changes are protein-synthesis depen­
dent and involve morphological changes in dendritic spines (Yuste & Bonhoeffer, 2001)
and the insertion of additional AMPA receptors into dendritic membranes (Lu et al.,
2001). These changes are generally thought to stabilize LTP because LTP degrades rapid­
ly if they do not occur (or are prevented from occurring by the use of a protein synthesis
inhibitor).

LTP exhibits all of the characteristics of consolidation envisioned by Müller and Pilzecker
(1900). In their own work, Müller and Pilzecker (1900) used an original learning phase
(L1), followed by an interfering learning phase (L2), followed by a memory test for the
original list (T1). Holding the retention interval between L1 and T1 constant, they
essentially showed that L1-L2-----T1 yields greater interference than L1---L2---T1 (where the
dashes represent units of time). In experimental animals, memories formed in the hip­
pocampus and LTP induced in the hippocampus both exhibit a similar temporal gradient
with respect to retroactive interference (Izquierdo et al., 1999; Xu et al., 1998). Whether
L1 and L2 both involve hippocampus-dependent learning tasks (e.g., L1 = one-trial in­
hibitory avoidance learning, L2 = exploration of a novel environment), as reported by
Izquierdo et al. (1999), or one involves the induction of LTP (L1) while the other involves
exposure to a learning task (L2), as reported by Xu et al. (1998), the same pattern
emerges. Specifically, L2 interferes with L1 when the time between them is relatively
short (e.g., 1 hour), but not when the time between them is relatively long (e.g., 6 or more
hours). Moreover, if an NMDA antagonist is infused into the hippocampus before L2
(thereby blocking the induction of interfering LTP that might be associated with the
learning of a potentially interfering task), no interference effect is observed even when
the L1-L2 temporal interval is short.

The point is that hippocampus-dependent memories and hippocampal LTP both appear to
be vulnerable to interference early on and then become more resistant to interference
with the passage of time. Moreover, the interfering force is the formation of new memo­
ries (or, analogously, the induction of LTP). Newly induced LTP, like a newly encoded
memory, begins life in a fragile state. Over time, as the process of cellular consolidation
(p. 443) unfolds, recently formed LTP and recently encoded memories become more stable,
which is to say that they become more resistant to interference caused by the induction
of new LTP or by the encoding of new memories.

The use of an NMDA antagonist in rats is not the only way to induce a temporary period
of anterograde amnesia (thereby protecting recently induced LTP or recently formed
memories). In sufficient quantities, alcohol and benzodiazepines have been shown to do
the same in humans. Moreover, like NMDA antagonists, these drugs not only induce an­
terograde amnesia but also inhibit the induction of LTP in the hippocampus (Del Cerro et
al., 1992; Evans & Viola-McCabe, 1996; Givens & McMahon, 1995; Roberto et al., 2002;
Sinclair & Lo, 1986). Interestingly, they also result in a phenomenon known as retrograde
facilitation. That is, numerous studies have reported that even though alcohol induces
amnesia for information studied under the influence of the drug, it actually results in im­
proved memory for material studied just before consumption (e.g., Bruce & Pihl, 1997;
Lamberty, Beckwith, & Petros, 1990; Mann, Cho-Young, & Vogel-Sprott, 1984; Parker et
al., 1980, 1981). Similar findings have been frequently reported for benzodiazepines such
as diazepam and triazolam (Coenen & Van Luijtelaar, 1997; Fillmore et al., 2001;
Ghoneim, Hinrichs, & Mewaldt, 1984; Hinrichs, Ghoneim, & Mewaldt, 1984; Weingartner
et al., 1995). Predrug memories, it seems, are protected from interference that would
have been created during the postdrug amnesic state.

It is important to emphasize that postlearning amnesia-inducing agents (such as NMDA
antagonists used in rats or alcohol and benzodiazepines used in humans) do not enhance
predrug memories in an absolute sense. That is, in response to these drugs, the memories
do not more accurately represent past experience and are not more likely to be retrieved
than they were at the end of learning. Instead, memories formed before drug intake are
forgotten to a lesser degree than memories formed before placebo. By limiting the forma­
tion of new memories, alcohol and benzodiazepines (like NMDA antagonists) may protect
memories that were formed just before drug intake. While protected from the trace-de­
grading force of new memory formation, these memories may be allowed to consolidate
(via cellular consolidation) in a way that hardens them against the interference they will
later encounter when new memories are once again formed. If so, then less forgetting
should be observed than would otherwise be the case.

All of these findings are easily understood in terms of cellular consolidation (not systems
consolidation), but a recent explosion of research on the role of sleep and consolidation
has begun to suggest that the distinction between cellular consolidation and systems con­
solidation may not be as sharp as previously thought.

Sleep and Consolidation


In recent years, the idea that sleep plays a special role in the consolidation of both declar­
ative and nondeclarative memory has received a great deal of attention. Declarative mem­
ory consists of the conscious remembrance of either factual information (i.e., semantic
memory) or past experience (i.e., episodic memory), and it is the kind of memory that we
have discussed thus far in connection with systems consolidation and cellular consolida­
tion. Nondeclarative memory, on the other hand, refers to the acquisition and retention of
nonconscious skills and abilities, with the prototypical example being the ability to ride a
bike. With practice, one’s riding ability improves, but the memory of how to balance on
two wheels is not realized by consciously remembering anything about the past (as in the
case of declarative memory). Instead, that memory is realized by climbing on the bike and
discovering that you can ride it without falling off. Whereas declarative memory depends
on the structures of the MTL, nondeclarative memories do not (Squire, 1992; Squire &
Zola, 1996). As a result, amnesic patients with MTL damage have an impairment of de­
clarative memory (both anterograde amnesia and temporally graded retrograde amne­
sia), but they are generally unimpaired at learning and retaining procedural skills
(Squire, 1992). An amnesic could, for example, learn to ride a bike as easily as you could,
but, unlike you, the amnesic would have no conscious declarative memory of the practice
sessions. Recent research suggests that sleep plays a role in the consolidation of both de­
clarative and nondeclarative memories.

Because sleep is not an undifferentiated state, one focus of this line of research has been
to identify the specific stage of sleep that is important for consolidation. Sleep is divided
into five stages that occur in a regular sequence within 90-minute cycles throughout the
night. Stages 1 through 4 refer to ever-deeper levels of sleep, with stages 3 and 4 often
being referred to as slow-wave sleep. Rapid eye movement (REM) sleep is a lighter stage
of sleep (p. 444) associated with vivid dreams. Although every stage of sleep occurs during
each 90-minute sleep cycle, the early sleep cycles of the night are dominated by slow-
wave sleep, and the later sleep cycles are dominated by REM sleep. For declarative mem­
ory, the beneficial effects of sleep have been almost exclusively associated with slow-wave
sleep, and this is true of its possible role in systems consolidation and cellular consolida­
tion. For nondeclarative memories, the beneficial effects of sleep are more often associat­
ed with REM sleep.

Sleep-Related Consolidation of Declarative Memory

Much evidence dating back at least to Jenkins and Dallenbach (1924) has shown that less
forgetting occurs if one sleeps during the retention interval than if one remains awake. A
reduction in interference is generally thought to play some role in this sleep-related bene­
fit, but there appears to be much more to the story than that. In particular, consolidation
is an important part of the story, and the different stages of sleep play very different
roles.

Ekstrand and colleagues (Ekstrand, 1972; Yaroush, Sullivan, & Ekstrand, 1971) were the
first to address the question of whether the different stages of sleep differentially benefit
what we now call declarative memory. These researchers took advantage of the fact that
most REM sleep occurs in the second half of the night, whereas most non-REM sleep oc­
curs in the first half. Some subjects in this experiment learned a list, went to sleep imme­
diately, and were awakened 4 hours later for a test of recall. These subjects experienced
mostly slow-wave sleep during the 4-hour retention interval. Others slept for 4 hours,
were awakened to learn a list, slept for another 4 hours, and then took a recall test. These
subjects experienced mostly REM sleep during the 4-hour retention interval. The control
(i.e., awake) subjects learned a list during the day and were tested for recall 4 hours lat­
er. The subjects all learned the initial list to a similar degree, but the results showed that
4 hours of mostly non-REM sleep resulted in less forgetting relative to the other two con­
ditions, which did not differ from each other (i.e., REM sleep did not facilitate memory).
Barrett and Ekstrand (1972) reported similar results in a study that controlled for time-of-
day and circadian rhythm confounds, and the effect was later replicated in studies by Pli­
hal and Born (1997, 1999). Slow-wave sleep may play a role in both cellular consolidation
and systems consolidation.

Slow-Wave Sleep and Cellular Consolidation


Why is slow-wave sleep more protective of recently formed memories than REM sleep?
One possibility is that slow-wave sleep is more conducive to cellular consolidation than
REM sleep. In experiments performed on sleeping rats, Jones Leonard et al. (1987)
showed that LTP can be induced in the hippocampus during REM sleep but not during
slow-wave sleep. Whereas slow-wave sleep inhibits the induction of LTP, it does not
disrupt the maintenance of previously induced LTP (Bramham & Srebro, 1989). In that sense,
slow-wave sleep is like the NMDA antagonists discussed earlier (i.e., they block the induc­
tion of new LTP but not the maintenance of previously induced LTP). By contrast, with re­
gard to synaptic plasticity in the hippocampus, REM sleep is similar to the awake state
(i.e., LTP can be induced during REM).

Even during a night of sleep, interference may occur, especially during REM sleep, when
considerable mental activity (mainly vivid dreaming) takes place and memories can be en­
coded in the hippocampus. But memories are probably never formed during slow-wave
sleep. This is true despite the fact that a considerable degree of mental activity (consist­
ing of static visual images, thinking, reflecting, etc.) occurs during slow-wave sleep. In­
deed, perhaps half as much mental activity occurs during non-REM sleep as during REM
sleep (Nielsen, 2000). However, mental activity and the formation of memories are not
one and the same. The mental activity that occurs during slow-wave sleep is not remem­
bered, perhaps because it occurs during a time when hippocampal plasticity is mini­
mized. Because no new memories are formed in the hippocampus during this time, cellu­
lar consolidation can presumably proceed in the absence of interference. During REM
sleep, however, electroencephalogram (EEG) recordings suggest that the hippocampus is
in an awake-like state, and LTP can be induced (and memories can be formed), so inter­
ference is more likely to occur.

If slow-wave sleep protects recently formed memories from interference while allowing
cellular consolidation to move forward, then a temporal gradient of interference should
be observed. That is, sleep soon after learning should confer more protection than sleep
that is delayed. This can be tested by holding the retention interval between learning (L1)
and test (T1) constant (e.g., at 24 hours), with the location of sleep (S) within that reten­
tion interval varied. Using the notation introduced earlier, the (p. 445) prediction would be
that L1-S-----T1 will confer greater protection than L1---S---T1. If a temporal gradient is
observed (i.e., if memory performance at T1 is greater in the first condition than the sec­
ond), it would suggest that sleep does more than simply subtract out a period of retroac­
tive interference that would otherwise occur. Instead, it would suggest that sleep (pre­
sumably slow-wave sleep) also allows the process of cellular consolidation to proceed in
the absence of interference.

Once again, Ekstrand (1972) performed the pioneering experiment on this issue. In that
experiment, memory was tested for paired-associate words following a 24-hour retention
interval in which subjects slept either during the 8 hours that followed list presentation
or during the 8 hours that preceded the recall test. In the immediate sleep condition (in
which L1 occurred at night, just before sleep), he found that 81 percent of the items were
recalled 24 hours later; in the delayed sleep condition (in which L1 occurred in the morn­
ing), only 66 percent were recalled. In other words, a clear temporal gradient associated
with the subtraction of retroactive interference was observed, one that is the mirror im­
age of the temporal gradient associated with the addition of retroactive interference re­
ported by Müller and Pilzecker (1900). More recent sleep studies have reinforced the idea
that the temporal gradient of retrograde facilitation is a real phenomenon, and they have
addressed various confounds that could have accounted for the results that Ekstrand
(1972) obtained (Gais, Lucas, & Born, 2006; Talamini et al., 2008). The temporal gradient
associated with sleep, like the LTP and animal learning research described earlier, is con­
sistent with the notion that when memory formation is temporarily halted, recently
formed and still-fragile memories are protected from interference. As a result, they are
given a chance to become hardened against the forces of retroactive interference that
they will later encounter (perhaps through a process of cellular consolidation).

Slow-Wave Sleep and Systems Consolidation


Recent sleep studies have also shed light on the mechanism that may account for systems
consolidation, which presumably involves some relatively long-lasting form of communi­
cation between the hippocampus and the neocortex (Marr, 1971). The mechanism of com­
munication is not known, but a leading candidate is neural replay, and most of the work
on this topic comes from sleep studies. The phenomenon of neural replay was initially ob­
served in hippocampal cells of sleeping rats after they had run along a familiar track, and
its discovery was tied to the earlier discovery of place cells in the hippocampus.


Long ago, it was discovered that the firing of particular hippocampal cells in awake rats
is coupled to specific points in the rat’s environment (O’Keefe & Dostrovsky, 1971). These
cells are known as “place cells” because they fire only when the rat traverses a particular
place in the environment. Usually, hippocampal place cells fire in relation to the rat’s po­
sition on a running track. That is, as the rat traverses point A along the track, place cell 1
will reliably fire. As it traverses point B, place cell 2 will fire (and so on). An intriguing
finding that may be relevant to the mechanism that underlies systems consolidation is
that cells that fire in sequence in the hippocampus during a behavioral task tend to be­
come sequentially coactive again during sleep (Wilson & McNaughton, 1994). This is the
phenomenon of neural replay.

Neural replay has most often been observed in rats during slow-wave sleep. It has also
occasionally been observed during REM sleep, but, in that case, it occurs at a rate that is
similar to the neuron firing that occurred during learning (Louie & Wilson, 2001) and
thus may simply reflect dreaming. By contrast, the neural replay that occurs during
slow-wave sleep proceeds at a rate five to ten times faster than the original firing during
the waking state (e.g., Ji & Wilson, 2007) and may therefore reflect a biological
consolidation process separate from
mental activity like dreaming. It is as if the hippocampus is replaying the earlier behav­
ioral experience, perhaps as a way to reorganize the representation of that experience in
the neocortex.

The fact that replay of sequential place cell activity in the hippocampus occurs during
slow-wave sleep does not, by itself, suggest anything about communication between the
hippocampus and the neocortex (the kind of communication that is presumably required
for systems consolidation to take place). However, Ji and Wilson (2007) reported that hip­
pocampal replay during slow-wave sleep in rats was coordinated with firing patterns in
the visual cortex, which is consistent with the idea that this process underlies the reorga­
nization of memories in the neocortex. In addition, Lansink et al. (2009) performed multi­
neuron recordings from the hippocampus and ventral striatum during waking and sleep­
ing states. While the rats were awake, the hippocampal cells fired when the rat traversed
a (p. 446) particular point in the environment (i.e., they were place cells), whereas the stri­
atal cells generally fired in response to rewards. During slow-wave sleep (but not during
REM sleep), they found that the hippocampal and striatal cells reactivated together. The
coordinated firing was particularly evident for pairs in which the hippocampal place cell
fired before the striatal reward-related neuron. Thus, the hippocampus leads reactivation
in a projection area, and this mechanism may underlie the systems consolidation of place–
reward associations.

One concern about studies of neural replay is that the animals are generally overtrained,
so little or no learning actually occurs. Thus, it is not clear whether learning-related neur­
al replay takes place. However, Peyrache et al. (2009) recorded neurons in prefrontal cor­
tex during the course of learning. Rats were trained on a Y-maze task in which they
learned to select the rewarded arm using one rule (e.g., choose the left arm) that
changed to a different rule as soon as a criterion level of performance was achieved (e.g.,
choose the right arm). They identified sets of neuronal assemblies with reliable coactivations in prefrontal cortex, and some of these coactivations became stronger when the rat

Page 15 of 34
Memory Consolidation
started the first run of correct trials associated with the acquisition of the new rule. Fol­
lowing these sessions, replay during slow-wave sleep mainly involved the learning-related
coactivations. Thus, learning-related replay—the mechanism that may underlie systems
consolidation—can be identified and appears to get underway very soon after learning.

Other evidence suggests that something akin to neural replay occurs in humans as well.
An intriguing study by Rasch et al. (2007) showed that cuing recently formed odor-associ­
ated memories by odor re-exposure during slow-wave sleep—but not during REM sleep—
prompted hippocampal activation (as measured by fMRI) and resulted in less forgetting
after sleep compared with a control group. This result is consistent with the notion that
systems consolidation results from the reactivation of newly encoded hippocampal repre­
sentations during slow-wave sleep. In a conceptually related study, Peigneux et al. (2004)
measured regional cerebral blood flow and showed that hippocampal areas that were ac­
tivated during route learning in a virtual town (a hippocampus-dependent, spatial learn­
ing task) were activated again during subsequent slow-wave sleep. Moreover, the degree
of activation during slow-wave sleep correlated with performance on the task the next
day.

In both these studies, the hippocampal reactivation (perhaps reflective of hippocampo-neocortical dialogue) occurred within hours of the learning episode, a time course of con­
solidation ordinarily associated with cellular consolidation. The timing observed in these
studies is not unlike that observed in a neuroimaging study discussed earlier in which
hippocampal activity decreased, and neocortical activity increased, over a period as short
as 24 hours (Takashima et al., 2009). Moreover, the timing fits with studies in rats show­
ing that learning-related neural replay is evident in the first slow-wave sleep episode that
follows learning (Peyrache et al., 2009).

In a sleep-deprivation study that also points to an almost immediate role for systems-level
consolidation processes, Sterpenich et al. (2009), using human subjects, investigated
memory for emotional and neutral pictures 6 months after encoding. Half the subjects
were deprived of sleep on the first postencoding night, and half were allowed to sleep
(and then all subjects slept normally each night thereafter). Six months later, subjects
completed a recognition test in the scanner in which each test item was given a judgment
of “remember” (previously seen and subjectively recollected), “know” (previously seen
but not subjectively recollected), or “new” (not previously seen). A contrast between ac­
tivity associated with remembered items and known items yielded a smaller difference in
the sleep-deprived subjects across a variety of brain areas (ventral mPFC, precuneus,
amygdala, and occipital cortex), even though the items had been memorized 6 months
earlier, and these results were interpreted to mean that sleep during the first postencod­
ing night influences the long-term systems-level consolidation of emotional memory.

The unmistakable implication from all of these studies is that the process thought to un­
derlie systems consolidation—namely, neural replay (or neural reactivation)—begins to
unfold in a measurable way along a time course ordinarily associated with cellular consolidation. That is, in the hours after a trace is formed, hippocampal LTP stabilizes, and
neural replay in the hippocampus gets underway. These findings would seem to raise the
possibility that the molecular cascade that underlies cellular consolidation also plays a
role in initiating neural replay (Mednick, Cai, Shuman, Anagnostaras, & Wixted, 2011). If
interference occurs while the trace is still fragile, then LTP will not stabilize, and presum­
ably, neural replay will not be initiated. In that case, the memory will be lost. But if hip­
pocampal (p. 447) LTP is allowed to stabilize (e.g., if potentially interfering memories are
blocked, or if a period of slow-wave sleep ensues after learning), then (1) the LTP will sta­
bilize and become more resistant to interference and (2) neural replay in the hippocam­
pus will commence and the memory will start to become reorganized in the neocortex.
Thus, on this view, cellular consolidation is an early component of systems consolidation.

With these considerations in mind, it is interesting to consider why Rasch et al. (2007)
and Peigneux et al. (2004) both observed performance benefits associated with reactiva­
tion during slow-wave sleep. Results like these suggest that reactivation not only serves
to reorganize the memory trace in the neocortex but also strengthens the memory trace
in some way. But in what way is the trace strengthened? Did the reactivation process that
occurred during slow-wave sleep act as a kind of rehearsal, strengthening the memory in
much the same way that ordinary conscious rehearsal strengthens a memory (increasing
the probability that the memory will later be retrieved)? Or did the reactivation instead
serve to render the memory trace less dependent on the hippocampus and, in so doing,
protect the trace from interference caused by the encoding of new memories in the hip­
pocampus (e.g., Litman & Davachi, 2008)? Either way, less forgetting would be (and was)
observed following a period of reactivation compared with a control condition.

The available evidence showing that increased reactivation during slow-wave sleep re­
sults in decreased forgetting after a night of sleep does not shed any direct light on why
reactivation causes the information to be better retained. Evidence uniquely favoring a
rehearsal-like strengthening mechanism (as opposed to protection from interference)
would come from a study showing that reactivation during sleep can be associated with
an actual enhancement of performance beyond the level that was observed at the end of
training. Very few declarative memory studies exhibit that pattern, but one such study
was reported by Cai, Shuman, Gorman, Sage, and Anagnostaras (2009). Using a Pavlov­
ian fear-conditioning task, they found that hippocampus-dependent contextual memory in
mice was enhanced (in an absolute sense) following a period of sleep whether the sleep
phase occurred immediately after training or 12 hours later. More specifically, following
sleep, the percentage of time spent freezing (the main dependent measure of memory) in­
creased beyond that observed at the end of training. This is a rare pattern in studies of
declarative memory, but it is the kind of finding that raises the possibility that sleep-relat­
ed consolidation can sometimes increase the probability that a memory will be retrieved
(i.e., it can strengthen memories in that sense).


Role of Brain Rhythms in the Encoding and Consolidation States of the Hip­
pocampus
Most of the work on hippocampal replay of past experience has looked for the phenome­
non during sleep, as if it might be a sleep-specific phenomenon. However, the key condi­
tion for consolidation to occur may not be sleep, per se. Instead, the key condition may
arise whenever the hippocampus is not in an encoding state, with slow-wave sleep being
an example of such a condition. Indeed, Karlsson and Frank (2009) found frequent awake
replay of sequences of hippocampal place cells in the rat. The rats were exposed to two
environments (i.e., two different running tracks) each day, and each environment was as­
sociated with a different sequence of place cell activity. The interesting finding was that
during pauses in awake activity in environment 2, replay of sequential place cell activity
associated with environment 1 was observed (replay of the local environment was also
observed). The finding that the hippocampus exhibits replay of the remote environment
while the rat is awake suggests that the hippocampus may take advantage of any down
time (including, but not limited to, sleep) to consolidate memory. That is to say, the
processes that underlie systems consolidation may unfold whenever the hippocampus is
not encoding new memories (e.g., Buzsáki, 1989).

In a two-stage model advanced by Buzsáki (1989), the hippocampus is assumed to alternate between what might be referred to as an “encoding state” and a “consolidating
state.” In the encoding state, the hippocampus receives (and encodes) information from
the sensory and association areas of the neocortex. In the consolidating state, the hip­
pocampus sends encoded information back to the neocortex. Hasselmo (1999) argued
that changes in the level of acetylcholine (Ach) mediate the directional flow of informa­
tion to and from the hippocampus. High levels of Ach, which occur during both active-
awake and REM sleep, are associated with the encoding state, whereas low levels of Ach,
which occur during both quiet awake (i.e., when the animal is passive) and slow-wave
sleep, are associated with the consolidating state. Thus, according to this view, the con­
solidating state is not specific to sleep, but it does occur during sleep. Critically, (p. 448)
the encoding and consolidating states are also associated with characteristic rhythmic ac­
tivity, and a basic assumption of this account is that communication between the hip­
pocampus and neocortex is mediated by coordinated oscillatory rhythms across different
structures of the brain (Sirota, Csicsvari, Buhl, & Buzsáki, 2003).

In the encoding state, the cortex is characterized by beta oscillations (i.e., 12 to 20 Hz),
whereas the hippocampus is characterized by theta oscillations (i.e., 4 to 8 Hz). Hip­
pocampal theta oscillations are thought to synchronize neural firing along an input path­
way into the hippocampus. For example, in the presence of theta (but not in its absence),
the hippocampus receives rhythmic input from neurons in the input layers of the adjacent
entorhinal cortex (Chrobak & Buzsáki, 1996). In addition, Siapas, Lubenov, and Wilson
(2005) showed that neural activity in the prefrontal cortex of rats was “phase-locked” to
theta oscillations in the hippocampus in freely behaving (i.e., active-awake) rats. Findings
like these are consistent with the idea that theta rhythms coordinate the flow of informa­
tion into the hippocampus, and still other findings suggest that theta rhythms may facilitate the encoding of information flowing into the hippocampus. During the high-Ach en­
coding state—which is a time when hippocampal synaptic plasticity is high (Rasmusson,
2000)—electrical stimuli delivered at intervals equal to theta frequency are more likely to
induce LTP than stimulation delivered at other frequencies (Larson & Lynch, 1986). Thus,
theta appears to play a role both in organizing the flow of information into the hippocam­
pus and in facilitating the encoding of that information.

Lower levels of Ach prevail during quiet-awake states and slow-wave sleep, and this is thought
to shift the hippocampus into the consolidating state (see Rasch, Born, & Gais, 2006). In
this state, activity along input pathways (ordinarily facilitated by theta rhythms) is sup­
pressed, and hippocampal plasticity is low (i.e., hippocampal LTP is not readily induced).
As such, and as indicated earlier, recently induced LTP is protected from interference and
is given a chance to stabilize as the process of cellular consolidation unfolds. In addition,
under these conditions, the cortex is characterized by low-frequency spindle oscillations
(i.e., 7 to 14 Hz) and delta oscillations (i.e., 4 Hz or less), whereas the hippocampus is as­
sociated with a more broad-spectrum pattern punctuated by brief, high-frequency sharp
waves (i.e., 30 Hz or more) and very-high-frequency “ripples” (about 200 Hz). These
sharp wave oscillations occur within the hippocampal-entorhinal output network, and syn­
chronized neural discharges tend to occur along this pathway during sharp-wave/ripple
events (Buzsáki, 1986; Chrobak & Buzsáki, 1996). Thus, once again, rhythmic activity
seems to coordinate communication between adjacent brain structures, and such commu­
nication has been found to occur between more distant brain structures as well. For ex­
ample, ripples observed during hippocampal sharp waves have been correlated with the
occurrence of spindles in prefrontal cortex (Siapas & Wilson, 1998). Moreover, the neural
replay discussed earlier preferentially takes place during the high-frequency bursts of
spindle waves (Wilson & McNaughton, 1994). All of this suggests that rhythmically based
feedback activity from the hippocampus may serve to “train” the neocortex and thus facil­
itate the process of systems consolidation. When it occurs in the hours after learning, this
kind of systems-level communication presumably involves hippocampal neurons that have
encoded information and that are successfully undergoing the process of cellular consoli­
dation. If so, then, again, cellular consolidation could be regarded as an early component
of the systems consolidation process.

Sleep-Related Consolidation of Nondeclarative Memory

A novel line of research concerned with the role of sleep in consolidation was initiated by
a study suggesting that sleep also plays a role in the consolidation of nondeclarative
memories. Karni et al. (1994) presented subjects with computer-generated stimulus dis­
plays that sometimes contained a small target consisting of three adjacent diagonal bars
(arranged either vertically or horizontally) embedded within a background of many hori­
zontal bars. The displays were presented very briefly (10 ms) and then occluded by a visu­
al mask, and the subject’s job on a given trial was to indicate whether the target items
were arranged vertically or horizontally in the just-presented display. Performance on this


task improves with practice in that subjects can correctly identify the target with shorter
and shorter delays between the stimulus and the mask.

The detection of element orientation differences in these visual displays is a preattentive process that occurs rapidly and automatically (i.e., no deliberate search is required). In
addition, the learning that takes place with practice presumably reflects plasticity in the
early processing areas of the visual cortex, which would account for why the learning is
extremely specific to the trained stimuli (e.g., if the (p. 449) targets always appear in one
quadrant of the screen during training, no transfer of learning is apparent when the tar­
gets are presented in a different quadrant). Thus, the visual segregation task is not a hip­
pocampus-dependent task involving conscious memory (i.e., it is not a declarative memo­
ry task); instead, it is a nondeclarative memory task.

A remarkable finding reported by Karni et al. (1994; Karni & Sagi, 1993) was that, follow­
ing a night of normal sleep, performance improved on this task to levels that were higher
than the level that had been achieved at the end of training—as if further learning took
place offline during sleep. This is unlike what is typically observed on declarative memory
tasks, which only rarely show an actual performance enhancement. Various control condi­
tions showed that the enhanced learning effect was not simply due to a reduction in gen­
eral fatigue. Instead, some kind of performance-enhancing consolidation apparently oc­
curred while the subjects slept.

Karni et al. (1994) found that depriving subjects of slow-wave sleep after learning did not
prevent the improvement of postsleep performance from occurring, but depriving them of
REM sleep did. Thus, REM sleep seems critical for the sleep-related enhancement of pro­
cedural learning to occur, and similar results have been reported in a number of other
studies (Atienza et al., 2004; Gais et al., 2000; Mednick et al., 2002, 2003; Stickgold,
James, & Hobson, 2000; Walker et al., 2005). These findings have been taken to mean
that nondeclarative memories require a period of consolidation and that REM sleep in
particular is critical for such consolidation to occur. Although most work has pointed to
REM, some work has suggested a role for slow-wave sleep as well. For example, using the
same texture-discrimination task, Stickgold et al. (2000) found that the sleep-dependent
gains were correlated with the amount of slow-wave sleep early in the night and with the
amount of REM sleep late in the night (cf. Gais et al., 2000).

In the case of nondeclarative memories, the evidence for consolidation does not consist of
decreasing dependence on one brain system (as in systems consolidation) or of increasing
resistance to interference (as in cellular consolidation). Instead, the evidence consists of
an enhancement of learning beyond the level that was achieved at the end of training. At
the time Karni et al. (1994) published their findings, this was an altogether new phenome­
non, and it was followed by similar demonstrations of sleep-related enhancement using
other procedural memory tasks, such as the sequential finger-tapping task (Walker et al.,
2002, 2003a, 2003b). In this task, subjects learn a sequence of finger presses, and perfor­
mance improves with training (i.e., the sequence is completed with increasing speed) and
improves still further following a night of sleep, with the degree of improvement often


correlating with time spent in stage 2 sleep. Fischer, Hallschmid, Elsner, and Born (2002)
reported similar results, except that performance gains correlated with amount of REM
sleep. However, one aspect of this motor-sequence-learning phenomenon—namely, the
fact that performance improves beyond what was observed at the end of training—has
been called into question. Rickard et al. (2008) recently presented evidence suggesting
that the apparent absolute enhancement of performance on this task following sleep may
have resulted from a combination of averaging artifacts, time-of-day confounds (cf.
Keisler, Ashe, & Willingham, 2007; Song et al., 2007), and the buildup of fatigue (creating
the impression of less presleep learning than actually occurred). This result does not nec­
essarily question the special role of sleep in the consolidation of motor-sequence learn­
ing, but it does call into question the absolute increase in performance that has been ob­
served following a period of sleep.

Somewhat more puzzling for the idea that REM plays a special role in the consolidation of
nondeclarative memory is that Rasch, Pommer, Diekelmann, and Born (2008) found that
the use of antidepressant drugs, which virtually eliminate REM sleep, did not eliminate
the apparent sleep-related enhancement of performance on two nondeclarative memory
tasks (mirror tracing and motor sequence learning). This result would appear to suggest
that REM sleep, per se, is not critical for the consolidation of learning on either task. In­
stead, conditions that happen to prevail during REM sleep (rather than REM sleep per se)
may be critical. Consistent with this possibility, Rasch, Gais, and Born (2009) showed that
cholinergic receptor blockade during REM significantly impaired motor skill consolida­
tion. This finding suggests that the consolidation of motor skill depends on the high
cholinergic activity that typically occurs during REM (and that presumably occurs even
when REM is eliminated by antidepressant drugs).

What consolidation mechanism is responsible for sleep-related enhancement of performance on perceptual learning tasks? Hippocampal replay discussed earlier seems like an
unlikely candidate because this is not a hippocampus-dependent (p. 450) task. However,
some form of neural reactivation in the cortex may be involved, as suggested by one
study using positron emission tomography (PET). Specifically, Maquet et al. (2000)
showed that patterns of widely distributed brain activity evident during the learning of an
implicit serial reaction time task were again evident during REM sleep. Such offline re­
hearsal may reflect a neural replay mechanism that underlies the consolidation of proce­
dural learning, but the evidence on this point is currently quite limited.

Role of Sleep in Creative Problem-Solving


In addition to facilitating the consolidation of rote perceptual and (perhaps) motor learn­
ing, REM might also be an optimal state for reorganizing semantic knowledge (via
spreading activation) in neocortical networks. This could occur because hippocampal in­
put to the neocortex is suppressed during REM, thus allowing for cortical-cortical com­
munication without interference from the hippocampus. Consolidation of this kind could
facilitate insight and creative problem solving (Wagner et al., 2004). In this regard, a re­
cent study by Cai, Mednick, Harrison, Kanady, and Mednick (2009) found that REM sleep,
compared with quiet rest and non-REM sleep, enhanced the integration of previously

primed items with new unrelated items to create new and useful associations. They used
the Remote Associations Test, in which subjects are asked to find a fourth word that could
serve as an associative link between three presented words (such as COOKIES, SIXTEEN,
HEART). The answer to this item is SWEET (cookies are sweet, sweet sixteen, sweet­
heart). It is generally thought that insight is required to hit upon solutions to problems
such as these because the correct answer is usually not the strongest associate of any of
the individual items. After priming the answers earlier in the day using an unrelated
analogies task, subjects took an afternoon nap. Cai et al. (2009) found that quiet rest and
non-REM sleep did not facilitate performance on this task, but REM sleep did. Important­
ly, the ability to successfully create new associations was not attributable to conscious
memory for the previously primed items because there were no differences in recall or
recognition for the primed items between the quiet rest, non-REM sleep, and REM sleep
groups. This finding reinforces the notion that REM sleep is important for nondeclarative
memory, possibly by providing a brain state in which the association areas of the neocortex can reorganize without being disrupted by input from the MTL.

Reconsolidation
In recent years, a great deal of attention has focused on the possibility that it is not just
new memories that are labile for a period of time; instead, any recently reactivated mem­
ory—even one that consolidated long ago—may become labile again as a result of having
been reactivated. That is, according to this idea, an old declarative memory retrieved to
conscious awareness again becomes vulnerable to disruption and modification and must
undergo the process of cellular consolidation (and, perhaps, systems consolidation) all
over again.

The idea that recently retrieved memories once again become labile was proposed long
ago (Misanin, Miller, & Lewis, 1968), but the recent resurrection of interest in the subject
was sparked by Nader, Schafe, and LeDoux (2000). Rats in this study were exposed to a
fear-conditioning procedure in which a tone was paired with shock in one chamber (con­
text A). The next day, the rats were placed in another chamber (context B) and presented
with the tone to reactivate memory of the tone–shock pairing. For half the rats, a protein
synthesis inhibitor (anisomycin) was then infused into the amygdala (a structure that is
adjacent to the hippocampus and that is involved in the consolidation of emotional memo­
ry). If the tone-induced reactivation of the fear memory required the memory to again un­
dergo the process of consolidation in order to become stabilized, then anisomycin should
prevent that from happening, and the memory should be lost. This, in fact, was what Nad­
er et al. (2000) reported. Whereas control rats exhibited considerable freezing when the
tone was presented again 1 day later (indicating long-term memory for the original tone–
shock pairing), the anisomycin-treated rats did not (as if they had forgotten the tone–
shock pairing).

In the absence of a protein synthesis inhibitor, a reactivated memory should consolidate over the course of the next several hours. In accordance with this prediction, Nader et al.


(2000) also reported that when the administration of anisomycin was delayed for 6 hours
after the memory was reactivated (thereby giving the memory a chance to reconsolidate
before protein synthesis was inhibited), little effect on long-term learning was observed.
More specifically, in a test 24 hours after reactivation, the treated rats and the control
rats exhibited a comparable level of freezing in response to the tone (indicating memory
for the tone–shock pairing). (p. 451) All these results parallel the effects of anisomycin on
tone–shock memory when it is infused after a conditioning trial (Schafe & LeDoux, 2000).
What was remarkable about the Nader et al. (2000) results was that similar consolidation
effects were also observed well after conditioning and in response to the reactivation of
memory caused by the presentation of the tone. Similar findings have now been reported
for other tasks and other species (see Nader & Hardt, 2009, for a review).

The notion that a consolidated memory becomes fragile again merely because it is reacti­
vated might seem implausible because personal experience does not suggest that we
place our memories at risk by retrieving them. In fact, the well-known testing effect—the
finding that successful retrieval enhances memory more than additional study—seems to
suggest that the opposite may be true (e.g., Roediger & Karpicke, 2006). However, a frag­
ile trace is also a malleable trace, and it has been suggested that the updating of memory
—not its erasure—may be a benefit of what otherwise seems like a problematic state of
affairs. As noted by Dudai (2004), the susceptibility to corruption of a retrieved memory
“might be the price paid for modifiability” (p. 75). If the reactivated trace is susceptible
only to agents such as anisomycin, which is not a drug that is encountered on a regular
basis, then the price for modifiability might be low indeed. On the other hand, if the trace
is vulnerable to corruption by new learning, as a newly learned memory trace appears to
be, then the price could be considerably higher. In an intriguing new study, Monfils, Cow­
ansage, Klann, and LeDoux (2009) showed that contextual fear memories in rats can be
more readily eliminated by extinction trials if the fear memory is first reactivated by a re­
minder trial. For the first time, this raises the possibility that reactivated memories are
vulnerable to disruption and modification by new learning (not just by protein synthesis
inhibitors).

Much remains unknown about reconsolidation, and there is some debate as to whether
the disruption of a recently retrieved trace is a permanent or a transient phenomenon.
For example, Stafford and Lattal (2009) recently compared the effects of anisomycin ad­
ministered shortly after fear conditioning (which would disrupt the consolidation of a new
memory) or shortly after a reminder trial (which would disrupt the consolidation of a new­
ly retrieved memory). With both groups equated on important variables such as prior
learning experience, they found that the anisomycin-induced deficit on a test of long-term
memory was larger and more persistent in the consolidation group compared with the re­
consolidation group. Still, this study adds to a large and growing literature showing that
reactivated memories are in some way vulnerable in a way that was not fully appreciated
until Nader et al. (2000) drove the point home with their compelling study.


Conclusion
The idea that memories require time to consolidate was proposed more than a century
ago, but empirical inquiry into the mechanisms of consolidation is now more intense than
ever. With that inquiry has come the realization that the issue is complex, so much so
that, used in isolation, the word “consolidation” no longer has a clear meaning. One can
speak of consolidation in terms of memory becoming less dependent on the hippocampus
(systems consolidation) or in terms of a trace becoming stabilized (cellular consolidation).
Alternatively, one can speak of consolidation in terms of enhanced performance (over and
above the level of performance achieved at the end of training), in terms of increased re­
sistance to interference (i.e., less forgetting), or in terms of a presumed mechanism, such
as neural replay or neural reactivation. A clear implication is that any use of the word
consolidation should be accompanied by a statement of what it means. Similarly, any sug­
gestion that consolidation “strengthens” the memory trace should be accompanied by a
clear statement of the way (or ways) in which the trace is thought to be stronger than it was
before. A more precise use of the terminology commonly used in this domain of investiga­
tion will help to make sense of the rapidly burgeoning literature on the always fascinating
topic of memory consolidation.

References
Anagnostaras, S. G., Maren, S., & Fanselow, M. S. (1999). Temporally graded retrograde
amnesia of contextual fear after hippocampal damage in rats: Within-subjects examina­
tion. Journal of Neuroscience, 19, 1106–1114.

Anderson, J. R., & Schooler, L. J. (1991). Reflections of the environment in memory. Psy­
chological Science, 2, 396–408.

Atienza, M., Cantero, J. L., & Stickgold, R. (2004). Posttraining sleep enhances automatici­
ty in perceptual discrimination. Journal of Cognitive Neuroscience, 16, 53–64.

Barrett, T. R., & Ekstrand, B. R. (1972). Effect of sleep on memory: III. Controlling for
time-of-day effects. Journal of Experimental Psychology, 96, 321–327.

Bayley, P. J., Gold, J. J., Hopkins, R. O., & Squire, L. R. (2005). The neuroanatomy of remote
memory. Neuron, 46, 799–810.

Bernard, F. A., Bullmore, E. T., Graham, K. S., Thompson, S. A., Hodges, J. R., & (p. 452) Fletcher, P. C. (2004). The hippocampal region is involved in successful recognition of both remote and recent famous faces. NeuroImage, 22, 1704–1714.

Bliss, T. V. P., & Collingridge, G. L. (1993). A synaptic model of memory: Long-term poten­
tiation in the hippocampus. Nature, 361, 31–39.

Bliss, T. V. P., Collingridge, G. L., & Morris, R. G. (2003). Long-term potentiation: Enhancing neuroscience for 30 years – Introduction. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 358, 607–611.

Bliss, T. V. P., & Lomo, T. (1973). Long-lasting potentiation of synaptic transmission in the
dentate area of the anaesthetized rabbit following stimulation of the perforant path. Jour­
nal of Physiology, 232, 331–356.

Bramham, C. R., & Srebo, B. (1989). Synaptic plasticity in the hippocampus is modulated
by behavioral state. Brain Research, 493, 74–86.

Bruce, K. R., & Pihl, R. O. (1997). Forget “drinking to forget”: Enhanced consolidation of
emotionally charged memory by alcohol. Experimental and Clinical Psychopharmacology,
5, 242–250.

Buzsáki, G. (1986). Hippocampal sharp waves: Their origin and significance. Brain Re­
search, 398, 242–252.

Buzsáki, G. (1989). A two-stage model of memory trace formation: A role for “noisy” brain
states. Neuroscience, 31, 551–570.

Cai, D. J., Mednick, S. A., Harrison, E. M., Kanady, J. C., & Mednick, S. C. (2009). REM,
not incubation, improves creativity by priming associative networks. Proceedings of the
National Academy of Sciences U S A, 106, 10130–10134.

Cai, D. J., Shuman, T., Gorman, M. R., Sage, J. R., & Anagnostaras, S. G. (2009). Sleep se­
lectively enhances hippocampus-dependent memory in mice. Behavioral Neuroscience,
123, 713–719.

Chrobak, J. J., & Buzsáki, G. (1996) High-frequency oscillations in the output networks of
the hippocampal-entorhinal axis of the freely-behaving rat. Journal of Neuroscience, 16,
3056–3066.

Cipolotti, L., Shallice, T., Chan, D., Fox, N., Scahill, R., Harrison, G., Stevens, J., & Rudge,
P. (2001). Long-term retrograde amnesia: the crucial role of the hippocampus. Neuropsy­
chologia, 39, 151–172.

Coenen, A. M. L., & Van Luijtelaar, E. L. J. M. (1997). Effects of benzodiazepines, sleep


and sleep deprivation on vigilance and memory. Acta Neurologica Belgica, 97, 123–129.

Damasio, A.R. (1989). Time-locked multiregional retroactivation: A systems-level proposal


for the neural substrates of recall and recognition. Cognition, 33, 25–62.

Del Cerro, S., Jung, M., & Lynch, L. (1992). Benzodiazepines block long-term potentiation
in slices of hippocampus and piriform cortex. Neuroscience, 49, 1–6.

Douville, K., Woodard, J. L., Seidenberg, M., Miller, S. K., Leveroni, C. L., Nielson, K. A.,
Franczak, M., Antuono, P., & Rao, S. M. (2005). Medial temporal lobe activity for recogni­
tion of recent and remote famous names: An event related fMRI study. Neuropsychologia,
43, 693–703.

Page 25 of 34
Memory Consolidation

Dudai, Y. (2004). The neurobiology of consolidations, or, how stable is the engram? Annu­
al Review of Psychology, 55, 51–86.

Ebbinghaus, H. (1885). Über das Gedchtnis. Untersuchungen zur experimentellen Psy­


chologie. Leipzig: Duncker & Humblot.

English edition: Ebbinghaus, H. (1913). Memory: A contribution to experimental psychol­


ogy. New York: Teachers College, Columbia University.

Ekstrand, B. R. (1972). To sleep, perchance to dream (about why we forget). In C. P. Dun­


can, L. Sechrest, & A. W. Melton (Eds.), Human memory: Festschrift for Benton J. Under­
wood (pp. 59–82). New York: Appelton-Century-Crofts.

Eslinger, P. J. (1998). Autobiographical memory after temporal lobe lesions. Neurocase, 4,


481–495.

Evans, M. S., & Viola-McCabe, K. E. (1996). Midazolam inhibits long-term potentiation


through modulation of GABAA receptors. Neuropharmacology, 35, 347–357.

Fillmore, M. T., Kelly, T. H., Rush, C. R., & Hays, L. (2001). Retrograde facilitation of mem­
ory by triazolam: Effects on automatic processes. Psychopharmacology, 158, 314–321.

Fischer, S., Hallschmid, M., Elsner, A. L., & Born, J. (2002). Sleep forms memory for fin­
ger skills. Proceedings of the National Academy of Sciences U S A, 99, 11987–11991.

Frankland, P. W., & Bontempi, B. (2005). The organization of recent and remote memory.
Nature Reviews Neuroscience, 6, 119–130.

Frankland, P. W., & Bontempi, B. (2006). Fast track to the medial prefrontal cortex. Pro­
ceedings of the National Academy of Sciences U S A, 103, 509–510.

Frankland, P. W., Bontempi, B., Talton, L. E., Kaczmarek, L., & Silva, A. J. (2004). The in­
volvement of the anterior cingulate cortex in remote contextual fear memory. Science,
304, 881–883.

Gais, S., Lucas, B., & Born, J. (2006). Sleep after learning aids memory recall. Learning
and Memory, 13, 259–262.

Gais, S., Plihal, W., Wagner, U., Born, J. (2000). Early sleep triggers memory for early visu­
al discrimination skills. Nature Neuroscience, 3, 1335–1339.

Ghoneim, M. M., Hinrichs, J. V., & Mewaldt, S. P. (1984). Dose-response analysis of the be­
havioral effects of diazepam: I. Learning and memory. Psychopharmacology, 82, 291–295.

Givens, B., & McMahon, K. (1995). Ethanol suppresses the induction of long-term potenti­
ation in vivo. Brain Research, 688, 27–33.

Page 26 of 34
Memory Consolidation

Haist, F., Bowden Gore, J., & Mao, H. (2001). Consolidation of human memory over
decades revealed by functional magnetic resonance imaging. Nature Neuroscience, 4,
1139–1145.

Hasselmo, M. E. (1999) Neuromodulation: Acetylcholine and memory consolidation.


Trends in Cognitive Sciences, 3, 351–359.

Hinrichs, J. V., Ghoneim, M. M., & Mewaldt, S. P. (1984). Diazepam and memory: Retro­
grade facilitation produced by interference reduction. Psychopharmacology, 84, 158–162.

Hirano, M., & Noguchi, K. (1998). Dissociation between specific personal episodes and
other aspects of remote memory in a patient with hippocampal amnesia. Perceptual and
Motor Skills, 87, 99–107.

Hoffman, K. L., & McNaughton, B. L. (2002). Coordinated reactivation of distributed


memory traces in primate neocortex. Science, 297, 2070–2073.

Izquierdo, I., Schröder, N., Netto, C. A., & Medina, J. H. (1999). Novelty causes time-de­
pendent retrograde amnesia for one-trial avoidance in rats through NMDA receptor- and
CaMKII-dependent mechanisms in the hippocampus. European Journal of Neuroscience,
11, 3323–3328.

Jenkins, J. B., & Dallenbach, K. M. (1924). Oblivescence during sleep and waking. Ameri­
can Journal of Psychology, 35, 605–612.

Ji, D., & Wilson, M. A. (2007). Coordinated memory replay in the visual cortex and hip­
pocampus during sleep. Nature Neuroscience, 10, 100–107.

Johnson, J. D., McDuff, S. G., Rugg, M. D., & Norman, K. A. (2009). Recollection,
(p. 453)

familiarity, and cortical reinstatement: A multivoxel pattern analysis. Neuron, 63, 697–
708.

Jones Leonard, B., McNaughton, B. L., & Barnes, C. A. (1987). Suppression of hippocam­
pal synaptic activity during slow-wave sleep. Brain Research, 425, 174–177.

Jost, A. (1897). Die Assoziationsfestigkeit in ihrer Abhängigkeit von der Verteilung der
Wiederholungen [The strength of associations in their dependence on the distribution of
repetitions]. Zeitschrift fur Psychologie und Physiologie der Sinnesorgane, 16, 436–472.

Karlsson, M. P., & Frank, L. M. (2009). Awake replay of remote experiences in the hip­
pocampus. Nature Neuroscience, 12, 913–918.

Karni, A., & Sagi, D. (1993) The time course of learning a visual skill. Nature, 365, 250–
252.

Karni, A., Tanne, D., Rubenstein, B. S., Askenasy, J. J. M., & Sagi, D. (1994). Dependence
on REM sleep of overnight improvement of a perceptual skill. Science, 265, 679–682.

Page 27 of 34
Memory Consolidation

Keppel, G. (1968). Retroactive and proactive inhibition. In T. R. Dixon & D. L. Horton


(Eds.), Verbal behavior and general behavior theory (pp. 172–213). Englewood Cliffs, NJ:
Prentice-Hall.

Keisler, A., Ashe, J., & Willingham, D.T. (2007). Time of day accounts for overnight im­
provement in sequence learning. Learning and Memory, 14, 669–672.

Kitchener, E. G., Hodges, J. R., & McCarthy, R. (1998). Acquisition of post-morbid vocabu­
lary and semantic facts in the absence of episodic memory. Brain 121, 1313–1327.

Lamberty, G. J., Beckwith, B. E., & Petros, T. V. (1990). Posttrial treatment with ethanol
enhances recall of prose narratives. Physiology and Behavior, 48, 653–658.

Lansink, C. S., Goltstein, P. M., Lankelma, J. V., McNaughton, B. L., & Pennartz, C. M. A.
(2009). Hippocampus leads ventral striatum in replay of place-reward information. PLoS
Biology, 7 (8), e1000173.

Larson, J., & Lynch, G. (1986). Induction of synaptic potentiation in hippocampus by pat­
terned stimulation involves two events. Science, 23, 985–988.

Lechner, H. A., Squire, L. R., & Byrne, J. H. (1999). 100 years of consolidation—Remem­
bering Müller and Pizecker. Learning and Memory, 6, 77–87.

Litman, L., & Davachi, L. (2008) Distributed learning enhances relational memory consol­
idation. Learning and Memory, 15, 711–716.

Louie, K., & Wilson, M. A. (2001). Temporally structured replay of awake hippocampal en­
semble activity during rapid eye movement sleep. Neuron, 29, 145–156.

Lu, W., Man, H., Ju, W., Trimble, W. S., MacDonald, J. F., & Wang, Y. T. (2001). Activation of
synaptic NMDA receptors induces membrane insertion of new AMPA receptors and LTP
in cultured hippocampal neurons. Neuron, 29, 243–254.

MacKinnon, D., & Squire, L. R. (1989). Autobiographical memory in amnesia. Psychobiolo­


gy, 17, 247–256.

Maguire, E. A., & Frith, C. D. (2003) Lateral asymmetry in the hippocampal response to
the remoteness of autobiographical memories. Journal of Neuroscience, 23, 5302–5307.

Maguire, E. A., Henson, R. N. A., Mummery, C. J., & Frith, C. D. (2001) Activity in pre­
frontal cortex, not hippocampus, varies parametrically with the increasing remoteness of
memories. NeuroReport, 12, 441–444.

Maguire, E. A., Nannery, R., & Spiers, H. J. (2006). Navigation around London by a taxi
driver with bilateral hippocampal lesions. Brain, 129, 2894–2907.

Mann, R. E., Cho-Young, J., & Vogel-Sprott, M. (1984). Retrograde enhancement by alco­
hol of delayed free recall performance. Pharmacology, Biochemistry and Behavior, 20,
639–642.
Page 28 of 34
Memory Consolidation

Manns, J. R., Hopkins, R. O., & Squire, L. R. (2003). Semantic memory and the human hip­
pocampus. Neuron, 37, 127–133.

Maquet, P., Laureys, S., Peigneux, P., et al. (2000) Experience-dependent changes in cere­
bral activation during human REM sleep. Nature Neuroscience, 3, 831–836.

Marr, D. (1971). Simple memory: A theory for archicortex. Philosophical Transactions of


the Royal Society of London, Series B, Biological Sciences, 262, 23–81.

Martin, S. J., Grimwood, P. D., & Morris, R. G. M. (2000). Synaptic plasticity and memory:
An evaluation of the hypothesis. Annual Review of Neuroscience, 23, 649–711.

Mednick, S. C., Cai, D. J., Shuman, T., Anagnostaras, S., & Wixted, J. T. (2011). An oppor­
tunistic theory of cellular and systems consolidation. Trends in Neurosciences, 34, 504–
514.

Mednick, S., Nakayam, K., & Stockgold, R. (2003). Sleep-dependent learning: A nap is as
good as a night. Nature Neuroscience, 6, 697–698.

Mednick, S., & Stickgold, R. (2002). The restorative effect of naps on perceptual deterio­
ration. Nature Neuroscience, 5, 677–681.

McClelland, J. L., McNaughton, B. L., & O’Reilly, R. C. (1995). Why there are complemen­
tary learning systems in the hippocampus and neocortex: Insights from the successes and
failures of connectionist models of learning and memory. Psychological Review, 102, 419–
457.

McGaugh, J. L. (2000). Memory: A century of consolidation. Science, 287, 248–251.

Misanin, J. R., Miller, R. R., & Lewis, D. J. (1968). Retrograde amnesia produced by elec­
troconvulsive shock after reactivation of a consolidated memory trace. Science, 160, 203–
204.

Monfils, M., Cowansage, K. K., Klann, E., & LeDoux, J. E. (2009). Extinction-reconsolida­
tion boundaries: Key to persistent attenuation of fear memories. Science, 324, 951–955.

Morris, R. G. M. (1989). Synaptic plasticity and learning: Selective impairment of learn­


ing in rats and blockade of long-term potentiation in vivo by the N-methyl-D-aspartate re­
ceptor antagonist AP5. Journal of Neuroscience, 9, 3040–3057.

Morris, R. G. M., Anderson, E., Lynch, G. S., & Baudry, M. (1986). Selective impairment of
learning and blockade of long-term potentiation by an N-methyl-D-aspartate receptor an­
tagonist, AP5. Nature, 319, 774–776.

Müller, G. E., & Pizecker, A. (1900). Experimentelle Beiträge zur Lehre vom Gedächtnis.
Z. Psychol. Ergänzungsband (Experimental contributions to the science of memory), 1, 1–
300.

Page 29 of 34
Memory Consolidation

Nader, K., & Hardt, O. (2009). A single standard for memory: The case for reconsolida­
tion. Nature Reviews Neuroscience, 10, 224–234.

Nader, K., Schafe, G. E., LeDoux, J. E. (2000). Fear memories require protein synthesis in
the amygdala for reconsolidation after retrieval. Nature, 406, 722–726.

Nielsen, T. A. (2000). Cognition in REM and NREM sleep: A review and possible reconcili­
ation of two models of sleep mentation. Behavioral and Brain Sciences, 23, 851–866.

O’Keefe, J., & Dostrovsky, J. (1971). The hippocampus as a spatial map: Prelimi­
(p. 454)

nary evidence from unit activity in the freely-moving rat. Brain Research, 34, 171–175.

Parker, E. S., Birnbaum, I. M., Weingartner, H., Hartley, J. T., Stillman, R. C., & Wyatt, R. J.
(1980). Retrograde enhancement of human memory with alcohol. Psychopharmacology,
69, 219–222.

Parker, E. S., Morihisa, J. M., Wyatt, R. J., Schwartz, B. L., Weingartner, H., & Stillman, R.
C. (1981). The alcohol facilitation effect on memory: A dose-response study. Psychophar­
macology, 74, 88–92.

Peigneux, P., Laureys, S., Fuchs, S., Collette, F., Perrin, F., et al. (2004). Are spatial memo­
ries strengthened in the human hippocampus during slow wave sleep? Neuron, 44, 535–
545.

Peyrache, A., Khamassi, M., Benchenane, K., Wiener, S. I., & Battaglia, F. P. (2009). Re­
play of rule-learning related neural patterns in the prefrontal cortex during slee p. Nature
Neuroscience, 12, 919–926.

Plihal, W., & Born, J. (1997). Effects of early and late nocturnal sleep on declarative and
procedural memory. Journal of Cognitive Neuroscience, 9, 534–547.

Plihal, W., & Born, J. (1999). Effects of early and late nocturnal sleep on priming and spa­
tial memory. Psychophysiology, 36, 571–582.

Rasch, B. H., Born, J., & Gais, S. (2006). Combined blockade of cholinergic receptors
shifts the brain from stimulus encoding to memory consolidation. Journal of Cognitive
Neuroscience, 18, 793–802.

Rasch, B., Buchel, C., Gais, S., & Born, J. (2007). Odor cues during slow-wave sleep
prompt declarative memory consolidation. Science, 315, 1426–1429.

Rasch, B., Gais, S., & Born, J. (2009) Impaired off-line consolidation of motor memories af­
ter combined blockade of cholinergic receptors during REM sleep-rich sleep. Neuropsy­
chopharmacology, 34, 1843–1853.

Rasch, B., Pommer, J., Diekelmann, S., & Born, J. (2008). Pharmacological REM sleep sup­
pression paradoxically improves rather than impairs skill memory. Nature Neuroscience,
12, 396–397.

Page 30 of 34
Memory Consolidation

Rasmusson, D. D. (2000). The role of acetylcholine in cortical synaptic plasticity. Behav­


ioural Brain Research, 115, 205–218.

Ribot, T. (1881). Les maladies de la memoire [Diseases of memory]. New York: Appleton-
Century-Crofts.

Ribot, T. (1882). Diseases of memory: An essay in positive psychology. London: Kegan


Paul, Trench & Co.

Rickard, T. C., Cai, D. J., Rieth, C. A., Jones, J., & Ard, M. C. (2008). Sleep does not en­
hance motor sequence learning. Journal of Experimental Psychology: Learning, Memory,
and Cognition, 34, 834–842.

Roberto, M., Nelson, T. E., Ur, C. L., & Gruol, D. L. (2002). Long-term potentiation in the
rat hippocampus is reversibly depressed by chronic intermittent ethanol exposure. Jour­
nal of Neurophysiology, 87, 2385–2397.

Roediger, H. L., & Karpicke, J.D. (2006). Test-enhanced learning: Taking memory tests im­
proves long-term retention. Psychological Science, 17, 249–255.

Rosenbaum, R.S., McKinnon, M. C., Levine, B., & Moscovitch, M. (2004). Visual imagery
deficits, impaired strategic retrieval, or memory loss: Disentangling the nature of an am­
nesic person’s autobiographical memory deficit. Neuropsychologia, 42, 1619–1635.

Schafe, G. E., & LeDoux, J. E. (2000). Memory consolidation of auditory Pavlovian fear
conditioning requires protein synthesis and protein kinase A in the amygdala. Journal of
Neuroscience, 20, RC96, 1–5.

Scoville, W. B., & Milner, B. (1957). Loss of recent memory after bilateral hippocampal le­
sions. Journal of Neurology, Neurosurgery and Psychiatry, 20, 11–21.

Siapas, A. G., Lubenov, E. V., & Wilson, M. A., (2005). Prefrontal phase locking to hip­
pocampal theta oscillations. Neuron, 46, 141–151.

Siapas, A. G., & Wilson, M. A. (1998). Coordinated interactions between hippocampal rip­
ples and cortical spindles during slow-wave sleep. Neuron, 21, 1123–1128.

Sinclair, J. G., & Lo, G. F. (1986). Ethanol blocks tetanic and calcium-induced long-term
potentiation in the hippocampal slice. General Pharmacology, 17, 231–233.

Sirota, A., Csicsvari, J. Buhl, D., & Buzsáki, G. (2003). Communication between neocortex
and hippocampus during sleep in rats and mice. Proceedings of the National Academy of
Sciences U S A, 100, 2065–2069.

Smith, C. N., & Squire, L. R. (2009). Medial temporal lobe activity during retrieval of se­
mantic memory is related to the age of the memory. Journal of Neuroscience, 29, 930–
938.

Page 31 of 34
Memory Consolidation

Song, S. S., Howard, J. H., Jr., & Howard, D. V. (2007). Sleep does not benefit probabilistic
motor sequence learning. Journal of Neuroscience, 27, 12475–12483.

Squire, L. R. (1992) Memory and the hippocampus: a synthesis from findings with rats,
monkeys, and humans. Psychological Review, 99, 195–231.

Squire, L. R. (2009). The legacy of patient H.M. for neuroscience. Neuron, 61, 6–9.

Squire, L. R., & Alvarez, P. (1995). Retrograde amnesia and memory consolidation: A neu­
robiological perspective. Current Opinion in Neurobiology, 5, 169–177.

Squire, L. R., Clark, R. E., & Knowlton, B. J. (2001). Retrograde amnesia. Hippocampus,
11, 50–55.

Squire, L. R., & Wixted, J. T. (2011). The cognitive neuroscience of human memory since
H.M. Annual Review of Neuroscience, 34, 259–288.

Squire, L. R., & Zola, S. M. (1996). Structure and function of declarative and nondeclara­
tive memory systems. Proceedings of the National Academy of Sciences U S A, 93, 13515–
13522.

Stafford, J. M., & Lattal, K. M. (2009). Direct comparisons of the size and persistence of
anisomycin-induced consolidation and reconsolidation deficits. Learning and Memory, 16,
494–503.

Steinvorth, S., Levine, B., & Corkin, S. (2005). Medial temporal lobe structures are need­
ed to re-experience remote autobiographical memories: evidence from H.M. and W.R.
Neuropsychologia, 43, 479–496.

Sterpenich, V., Albouy, G., Darsaud, A., Schmidt, C., Vandewalle, G., Dang Vu, T. T., Des­
seilles, M., Phillips, C., Degueldre, C., Balteau, E., Collette, F., Luxen, A., & Maquet, P.
(2009). Sleep promotes the neural reorganization of remote emotional memory. Journal of
Neuroscience, 16, 5143–5152.

Stickgold, R., James, L., & Hobson, J. A. (2000). Visual discrimination learning requires
sleep after training. Nature Neuroscience, 3, 1237–1238.

Takashima, A., Nieuwenhuis, I. L. C., Jensen, O., Talamini, L. M., Rijpkema, M., & Fernán­
dez, G. (2009). Shift from hippocampal to neocortical centered retrieval network with
consolidation. Journal of Neuroscience, 29, 10087–10093.

(p. 455) Takashima, A., Petersson, K. M., Rutters, F., Tendolkar, I., Jensen, O., Zwarts, M.
J., McNaughton, B. L., & Fernández, G. (2006). Declarative memory consolidation in hu­
mans: A prospective functional magnetic resonance imaging study. Proceedings of the Na­
tional Academy of Sciences U S A, 103, 756–761.

Page 32 of 34
Memory Consolidation

Takehara, K., Kawahara, S., & Kirino, Y. (2003). Time-dependent reorganization of the
brain components underlying memory retention in trace eyeblink conditioning. Journal of
Neuroscience, 23, 9897–9905.

Takehara-Nishiuchi, K., & McNaughton, B. L. (2008). Spontaneous changes of neocortical


code for associative memory during consolidation. Science, 322, 960–963.

Talamini, L. M., Nieuwenhuis, I. L., Takashima, A., & Jensen, O. (2008). Sleep directly fol­
lowing learning benefits consolidation of spatial associative memory. Learning and Memo­
ry, 15, 233–237.

Tse, D., Langston, R. F., Kakeyama, M., Bethus, I., Spooner, P. A., Wood, E. R., Witter, M.
P., & Morris, R. G. (2007). Schemas and memory consolidation. Science, 316, 76–82.

Underwood, B. J. (1957). Interference and forgetting. Psychological Review, 64, 49–60.

Wagner, U., Gais, S., Haider, H., Verleger, R., & Born, J. (2004) Sleep inspires insight. Na­
ture, 427, 352–355.

Walker, M. P., Brakefield, T., Hobson, J. A., & Stickgold, R. (2003a). Dissociable stages of
human memory consolidation and reconsolidation. Nature, 425, 616–620.

Walker, M. P., Brakefield, T., Morgan, A., Hobson, J. A., & Stickgold, R. (2002). Practice
with sleep makes perfect: Sleep-dependent motor skill learning. Neuron, 35, 205–211.

Walker, M. P., Brakefield, T., Seidman, J., Morgon, A., Hobson, J. A., & Stickgold, R.
(2003b). Sleep and the time course of motor skill learning. Learning and Memory, 10,
275–284.

Walker, M. P., Stickgold, R., Jolesz, F. A., & Yoo, S. S. (2005). The functional anatomy of
sleep-dependent visual skill learning. Cerebral Cortex, 15, 1666–1675.

Watkins, C., & Watkins, M. J. (1975). Buildup of proactive inhibition as a cue-overload ef­
fect. Journal of Experimental Psychology: Human Learning and Memory, 1, 442–452.

Weingartner, H. J., Sirocco, K., Curran, V., & Wolkowitz, O. (1995). Memory facilitation fol­
lowing the administration of the benzodiazepine triazolam. Experimental and Clinical Psy­
chopharmacology, 3, 298–303.

Whitlock, J. R., Heynen A. J., Schuler M. G., & Bear M. F. (2006). Learning induces long-
term potentiation in the hippocampus. Science, 313, 1058–1059.

Wickelgren, W. A. (1974). Single-trace fragility theory of memory dynamics. Memory and


Cognition, 2, 775–780.

Wilson, M. A., & McNaughton, B. L. (1994). Reactivation of hippocampal ensemble memo­


ries during sleep. Science, 265, 676–679.

Page 33 of 34
Memory Consolidation

Wixted, J. T. (2004a). The psychology and neuroscience of forgetting. Annual Review of


Psychology, 55, 235–269.

Wixted, J. T. (2004b). On common ground: Jost’s (1897) law of forgetting and Ribot’s
(1881) law of retrograde amnesia. Psychological Review, 111, 864–879.

Wixted, J. T., & Carpenter, S. K. (2007). The Wickelgren power law and the Ebbinghaus
savings function. Psychological Science, 18, 133–134.

Wixted, J. T., & Ebbesen, E. (1991). On the form of forgetting. Psychological Science, 2,
409–415.

Xu, L., Anwyl, R., & Rowan, M. J. (1998). Spatial exploration induces a persistent reversal
of long-term potentiation in rat hippocampus. Nature, 394, 891–894.

Yamashita, K., Hirose, S., Kunimatsu, A., Aoki, S., Chikazoe, J., Jimura, K., Masutani, Y.,
Abe, O., Ohtomo, K., Miyashita, Y., & Konishi, S. (2009). Formation of long-term memory
representation in human temporal cortex related to pictorial paired associates. Journal of
Neuroscience, 29, 10335–10340.

Yaroush, R., Sullivan, M. J., & Ekstrand, B. R. (1971). The effect of sleep on memory: II.
Differential effect of the first and second half of the night. Journal of Experimental Psy­
chology, 88, 361–366.

Yuste, R., & Bonhoeffer, T. (2001). Morphological changes in dendritic spines associated
with long-term synaptic plasticity. Annual Review of Neuroscience, 24, 1071–1089.

John Wixted

John Wixted is Distinguished Professor of Psychology at the University of California, San Diego.

Denise J. Cai

Denise J. Cai, University of California, San Diego


Age-Related Decline in Working Memory and Episodic Memory  
Sander Daselaar and Roberto Cabeza
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn

Print Publication Date: Dec 2013 Subject: Psychology, Cognitive Neuroscience
Online Publication Date: Dec 2013 DOI: 10.1093/oxfordhb/9780199988693.013.0022

Abstract and Keywords

Memory is one of the cognitive functions that deteriorate most with age. The types of
memory most affected by aging are working memory, the short-term maintenance and simultaneous manipulation of information, and episodic memory, our memory
for personally experienced past events. Functional neuroimaging studies indicate impor­
tant roles in age-related memory decline for the medial temporal lobe (MTL) and pre­
frontal cortex (PFC) regions, which have been linked to two major cognitive aging theo­
ries, the resource and binding deficit hypotheses, respectively. Interestingly, functional
neuroimaging findings also indicate that aging is not exclusively associated with decline.
Some older adults seem to deal with PFC and MTL decline by shifting to alternative brain
resources that can compensate for their memory deficits. In the future, these findings may help to distinguish normal aging from early Alzheimer’s dementia and may aid the development of memory remediation therapies.

Keywords: functional neuroimaging, aging, working memory, episodic memory, medial temporal lobe, prefrontal
cortex


Introduction

Figure 22.1 Longitudinal changes in volumes of prefrontal cortex (A), hippocampus (B), and rhinal cortex (C) as a function of baseline age. From Raz et al., 2005. Reprinted with permission from Oxford University Press.

As we age, our brain declines both in terms of anatomy and physiology. This brain decline
is accompanied by cognitive decline, which is most notable in the memory domain. Un­
derstanding the neural basis of age-related memory decline is important for two main
reasons. First, in view of the growing number of older adults in today’s society, cognitive
aging is increasingly becoming a problem in general health care, and effective therapies
can only be developed on the basis of knowledge obtained through basic research. Se­
cond, there is a subgroup of elderly people whose memory impairments are more severe,
preventing normal functioning in society. For these persons, such memory impairments
can be the earliest sign of pathological age-related conditions, such as Alzheimer’s de­
mentia (AD). Particularly in the early stages of this disease, the differentiation from nor­
mal age-related memory impairments is very difficult to make. Thus, it is important to
map out which memory deficits can be regarded as a correlate of normal aging and which
deficits are associated with age-related pathology. Working memory (WM) and episodic
memory (EM) are the two types of memory most affected by the aging process. WM
refers to the short-term maintenance and simultaneous manipulation of information. Clinical and functional neuroimaging evidence indicates that WM is particularly dependent on the functions of the prefrontal cortex (PFC) and the parietal cortex (Wager &
Smith, 2003). EM refers to the encoding and retrieval of personally experienced events
(Gabrieli, 1998; Tulving, 1983). Clinical studies have shown that EM is primarily depen­
dent on the integrity of the medial temporal lobe (MTL) memory system (Milner, 1972;
Squire, Schmolck, & Stark, 2001). However, functional neuroimaging (p. 457) studies have
also underlined the contributions of PFC to EM processes (Simons & Spiers, 2003). In
this chapter, we examine the roles of PFC and MTL changes in age-related decline in WM
and EM by reviewing findings from functional neuroimaging studies of healthy aging.
With respect to EM, most studies have scanned either the encoding or the retrieval phase
of EM, and hence, we will consider these two phases in separate sections. We also discuss
how these WM and EM findings relate to two major cognitive theories of aging, the re­
source deficit hypothesis (Craik, 1986), and the binding deficit hypothesis (Johnson,
Hashtroudi, & Lindsay, 1993; Naveh-Benjamin, 2000).

Regarding the role of PFC and MTL, one influential neurocognitive view is that the age-
related decline in EM and WM results from a selective PFC decline, whereas MTL degra­
dation is an indicator of pathological age-related conditions (Buckner, 2004; Hedden &
Gabrieli, 2004; West, 1996). This view is based on anatomical studies showing that the PFC is the brain region with the greatest age-related atrophy (Raz, 2005; Raz et al.,
1997). However, there is now substantial evidence that the MTL also shows substantial
anatomical and functional decline in healthy older adults (OAs), and thus, it is no longer
possible to ascribe age-related memory deficits exclusively to PFC decline. It should be
noted, though, that not all MTL regions show decline with age. As shown in Figure 22.1, a
recent longitudinal study found that in healthy OAs, the hippocampus showed atrophy
similar to the PFC, whereas the rhinal cortex did not (Raz et al., 2005). The differential effects of aging on the hippocampus and rhinal cortex are very interesting because the rhinal
cortex is one of the regions first affected by AD (Braak, Braak, & Bohl, 1993). As dis­
cussed later, together with recent functional magnetic resonance imaging (fMRI) evi­
dence of dissociations between hippocampal and rhinal functions in aging (Daselaar,
Fleck, Dobbins, Madden, & Cabeza, 2006), these findings have implications for the early
diagnosis of AD.

This chapter focuses on two major cognitive factors thought to underlie age-related mem­
ory decline and strongly linked to PFC and MTL function, namely deficits in executive
function and deficits in binding processes. The term executive function describes a set of
cognitive abilities that control and regulate other abilities and behaviors. With respect to
EM, executive functions are necessary (p. 458) to keep information available online so that
it can be encoded in or retrieved from EM. The PFC is generally thought to be the key
brain region underlying executive functions (Miller & Cohen, 2001). According to the re­
source deficit hypothesis (Craik, 1986), age-related cognitive impairments, including WM
and EM deficits, are the result of a general reduction in attentional resources. As a result,
OAs have greater difficulties with cognitive tasks that provide less environmental sup­
port, and hence require greater self-initiated processing. Executive and PFC functions are
thought to be major factors explaining resource deficits in OAs (Craik, 1977, 1986; Glisky,
2007; West, 1996). Yet, we acknowledge that there are other factors, not discussed in this
chapter, that can explain these deficits, including speed of processing (Salthouse, 1996)
and inhibition deficits (Hasher, Zacks, & May, 1999).

Binding refers to our capacity to bind into one coherent representation the individual ele­
ments that together make up an episode in memory, such as sensory inputs, thoughts,
and emotions. Binding is assumed to be more critical for recollection than for familiarity.
Recollection refers to remembering an item together with contextual details, whereas familiarity refers to knowing that an item occurred in the past even though its contextual details cannot be retrieved. According to relational memory theory, different subregions of the MTL are differentially involved in memory for relations among items and memory for individual items (Eichenbaum, Yonelinas, & Ranganath, 2007). In particular, the
hippocampal formation is more involved in relational memory (binding, recollection),
whereas the surrounding parahippocampal gyrus and rhinal cortex are more involved in
item memory (familiarity). According to the binding deficit hypothesis (Johnson et al.,
1993; Naveh-Benjamin, 2000), OAs are impaired in forming and remembering associa­
tions between individual items and between items and their context. As a result, age-re­
lated WM and EM impairments are more pronounced on tasks that require binding be­
tween study items (i.e., relational memory and recollection-based tasks). It is important to
note that these tests also depend on executive functions mediated by PFC, and hence it is
not possible to simply associate age-related deficits in these tasks with MTL function.

Given the role of PFC in executive functions and MTL in binding operations, we will take
an in-depth look at functional neuroimaging studies of WM and EM that revealed age-re­
lated functional changes in PFC or MTL regions. We will first discuss studies showing
PFC differences during WM and EM encoding and retrieval, and then MTL differences.
We will conclude with a discussion of different interpretations of age-related memory de­
cline derived from these findings, and how they relate to deficits in executive function
and binding capacity.

Neuroimaging Studies of Working Memory and Episodic Memory
Prefrontal Cortex

In the case of the PFC, both age-related reductions and increases in activity have been
found. Age-related reductions in PFC activity are often accompanied by increased activity
in the contralateral hemisphere, leading to a more bilateral activation pattern in OAs than
in younger adults (YAs). This pattern has been conceptualized in a model called hemi­
spheric asymmetry reduction in older adults (HAROLD), which states that, under similar
conditions, PFC activity tends to be less lateralized in OAs than in YAs (Cabeza, 2002). In
addition to the HAROLD pattern, over-recruitment of PFC regions in OAs is often found in
the same hemisphere. When appropriate, we will distinguish between neuroimaging stud­
ies presenting task conditions in distinct blocks (blocked design) and studies that charac­
terize stimuli on a trial-by-trial basis based on performance (event-related design).

Working Memory
Imaging studies of aging and WM function have shown altered patterns of activation in
OAs compared with YAs, particularly in PFC regions. Studies that compared simple main­
tenance tasks have generally found increased PFC activity in OAs, which is correlated
with better performance (Rypma & D’Esposito, 2000). Additionally, PFC activity in OAs
not only is greater overall in these studies but also is often more bilateral, exhibiting the aforementioned HAROLD pattern (Cabeza et al., 2004; Park et al., 2003; Reuter-Lorenz et
al., 2000).

Figure 22.2 Young participants show left-lateralized prefrontal cortex (PFC) activity during verbal working memory, and right PFC activity during spatial working memory, whereas older adults show bilateral PFC activity in both tasks: the HAROLD pattern. From Reuter-Lorenz et al., 2000. Courtesy of P. Reuter-Lorenz.

For example, Reuter-Lorenz et al. (2000) used a maintenance task in which participants
maintained four letters in WM and then compared them to a probe letter. As shown in
Figure 22.2, YAs showed left lateralized activity, whereas OAs showed bilateral activity.
They interpreted this HAROLD pattern as compensatory. Consistent with this interpreta­
tion, the OAs who showed the bilateral activation pattern were faster in the verbal WM
task than those who did not. In addition to the verbal WM condition, they also included a
spatial WM task. In this task, YAs activated right PFC, and OAs additionally recruited left PFC. Thus, even though age-related increases were in opposite hemispheres,
both verbal and spatial WM conditions yielded the HAROLD pattern (see Figure 22.2).
This finding supports the generalizability of the HAROLD model to different kinds of stim­
uli.

In contrast to studies using simple maintenance tasks, recent WM studies that manipulat­
ed WM load found both age-related decreases and increases in PFC activity. Cappell et al.
(Cappell, Gmeindl, & Reuter-Lorenz, 2010) used a WM maintenance task with three dif­
ferent memory loads: low, medium, and high. OAs showed increased activation in right
dorsolateral PFC regions during the lower load conditions. However, during the highest
load condition, OAs showed a reduction in left dorsolateral PFC activation. Another WM
study by Schneider-Garces et al. (2009) reported similar findings. They varied WM load
between two and six letters, and measured a “throughput” variable reflecting the amount
of information processed at a given load. Whereas YAs showed increasing throughput lev­
els with higher WM loads, the levels of OAs showed an asymptote-like function with lower
levels at the highest load. Matching the behavioral results, they found that overall PFC
activity showed an increasing function in YAs, but an asymptotic function in OAs, leading
to an age-related over-recruitment in PFC activity during the lower loads and an under-
recruitment during the highest load.
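
As a concrete illustration of a load-dependent capacity measure, a common estimate for probe-recognition WM tasks is Cowan's K. Whether the "throughput" variable of Schneider-Garces et al. corresponds exactly to this formula is an assumption made here for illustration, and the performance rates below are made up.

```python
# Cowan's K capacity estimate for a single-probe WM recognition task:
# K = set_size * (hit_rate - false_alarm_rate).
# Illustrative only: the exact "throughput" formula used by
# Schneider-Garces et al. (2009) may differ, and the rates are hypothetical.

def cowans_k(set_size, hit_rate, false_alarm_rate):
    """Estimated number of items held in working memory at a given load."""
    return set_size * (hit_rate - false_alarm_rate)

# An asymptote-like capacity profile: K rises with load, then levels off.
for load, hits, fas in [(2, 0.98, 0.02), (4, 0.90, 0.10), (6, 0.70, 0.15)]:
    print(f"load={load}: K={cowans_k(load, hits, fas):.2f}")
```

With these made-up rates, K increases from the low to the medium load but barely changes at the highest load, mirroring the asymptotic behavioral pattern described above.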

The results by Cappell et al. and Schneider-Garces et al. are in line with the compensa­
tion-related utilization of neural circuits hypothesis (CRUNCH) (Reuter-Lorenz & Cappell,
2008). CRUNCH was proposed to account for patterns of overactivation and underactiva­
tion in OAs. According to CRUNCH, declining neural efficiency leads OAs to engage more
neural circuits than YAs to meet task demands. Therefore, OAs show more activity at low­
er levels of task demands. However, as demands increase, YAs show greater activity to
meet increasing task loads. OAs, on the other hand, have already reached their ceiling
and will show reduced performance and underactivation.
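
The CRUNCH logic can be captured in a toy model (a sketch with hypothetical parameters, not anything fitted in the studies above): activity rises with demand at a rate governed by neural efficiency, up to a fixed resource ceiling.

```python
# Toy sketch of the CRUNCH account (hypothetical parameters, not fitted to
# any of the studies discussed): activity grows with demand at a rate set by
# neural efficiency, but only up to a fixed resource ceiling.

def activation(demand, efficiency, ceiling):
    """Recruited activity: demand scaled by 1/efficiency, capped at the ceiling."""
    return min(demand / efficiency, ceiling)

loads = [2, 4, 6, 8]  # increasing task demand (e.g., WM load)
for load in loads:
    young = activation(load, efficiency=1.0, ceiling=12.0)
    older = activation(load, efficiency=0.6, ceiling=7.0)
    pattern = "over-recruitment" if older > young else "under-recruitment"
    print(f"load={load}: young={young:.1f}, older={older:.1f} -> OA {pattern}")
```

With these illustrative numbers, OAs over-recruit at loads 2 to 6 but hit their ceiling and under-recruit at load 8, reproducing the crossover pattern that CRUNCH predicts.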

In summary, WM studies often found that OAs show reduced activity in the PFC regions
engaged by YAs but greater activity in other PFC regions, such as contralateral PFC re­
gions (i.e., the PFC hemisphere less engaged by YAs). In some cases (Reuter-Lorenz et al.,
2000), contralateral recruitment led to a more bilateral pattern of PFC activity in OAs
(i.e., HAROLD). Moreover, the WM studies by Cappell et al. and Schneider-Garces et al. illustrate the importance of distinguishing between difficulty levels when considering age
differences in brain activity. In general, age-related increases in PFC activity were attrib­
uted to compensatory mechanisms.

Episodic Memory Encoding


There are two general categories of EM encoding studies: blocked design studies and
event-related studies using the subsequent memory paradigm. In blocked design studies,
the EM encoding condition is typically compared with a baseline condition, such as read­
ing. Some blocked studies used intentional learning instructions, asking participants to
memorize items for a subsequent memory test, whereas others used incidental learning instructions, asking them to make a judgment (e.g., semantic or size) on each item without
mentioning or emphasizing the subsequent test. In event-related studies using the subse­
quent memory paradigm, activity associated with successful encoding operations
is identified by comparing study-phase activity for items remembered versus forgotten in
a subsequent memory test (for a review, see Paller & Wagner, 2002).
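
The logic of the subsequent memory contrast can be sketched in a few lines. This is a simplified illustration with made-up trial values; actual analyses estimate trial responses with a general linear model on the fMRI time series rather than using raw means.

```python
# Minimal sketch of a subsequent memory contrast (hypothetical data):
# study-phase activity for later-remembered minus later-forgotten items.

def subsequent_memory_effect(trial_activity, later_remembered):
    """Mean encoding activity for remembered trials minus forgotten trials."""
    rem = [a for a, r in zip(trial_activity, later_remembered) if r]
    forg = [a for a, r in zip(trial_activity, later_remembered) if not r]
    return sum(rem) / len(rem) - sum(forg) / len(forg)

# Study-phase activity (e.g., beta estimates) for six encoding trials,
# and whether each item was later remembered on the memory test.
activity = [1.2, 0.8, 1.5, 0.4, 1.1, 0.3]
remembered = [True, False, True, False, True, False]
print(subsequent_memory_effect(activity, remembered))  # positive = encoding success effect
```

A positive difference in a region indicates that its study-phase activity predicts later memory, which is the signature the event-related studies below test for.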

The difference between blocked and event-related encoding studies is that the first
method measures overall task activity including regions involved in simply processing the
task instructions and stimuli, whereas the second method measures memory-specific ac­
tivity because overall task characteristics are subtracted out. Although OAs may show a
difference in task-related processes that are not immediately relevant for memory encod­
ing, such as interpreting task instructions and switching between task conditions, they
may recruit processes associated with successful memory encoding to a similar or even
greater extent.

The most common result in blocked design EM encoding studies using incidental and in­
tentional learning instructions is a reduction in left PFC activity. This reduction in left
PFC activity was often coupled with an increase in right PFC activity, yielding the
HAROLD pattern. In line with the resource deficit hypothesis, PFC reductions are more
pronounced for intentional conditions—which require more self-initiated processing—
than for incidental encoding conditions. For example, Logan et al. (2002) reported that during self-initiated, intentional encoding instructions, OAs compared with YAs showed
less activity in left PFC but greater activity in right PFC, resulting in a more bilateral ac­
tivity pattern (HAROLD). Results were similar for intentional encoding of both verbal and
nonverbal material. Interestingly, further exploratory analyses revealed that this pattern
was present in a group of “old-old” (mean age = 80), but not in a group of “young-
old” (mean age = 67), suggesting that contralateral recruitment is associated with more
pronounced age-related cognitive decline. At the same time, the decrease in left PFC was
not present in the old-old group during incidental encoding instructions, suggesting that
—in line with the resource deficit hypothesis—frontal reductions can be remediated by
providing environmental support during encoding (Figure 22.3).

Figure 22.3 Young and young-old adults show left-lateralized prefrontal cortex (PFC) activity during intentional episodic memory encoding of words, whereas old-old adults show bilateral PFC activity (HAROLD). Reprinted from Neuron, 33(5), Jessica M. Logan, Amy L. Sanders, Abraham Z. Snyder, John C. Morris, and Randy L. Buckner, "Under-Recruitment and Nonselective Recruitment: Dissociable Neural Mechanisms Associated with Aging," 827–840, Copyright (2002), with permission from Elsevier.

An incidental EM encoding study by Rosen et al. (2002) distinguishing between high- and
low-performing OAs linked the HAROLD pattern specifically to high-performing OAs. This
study distinguished between OAs with high and low memory scores based on a neuropsy­
chological test battery. The authors reported equivalent left PFC activity but greater right
PFC activity in the old-high memory group relative to YAs. In contrast, the old-low memo­
ry group showed reduced activity in both left and right PFC. As a result, the old-high
group showed a more bilateral pattern of PFC activity than YAs (HAROLD). These find­
ings support a compensatory interpretation of HAROLD.

In contrast to blocked design EM encoding studies, event-related fMRI studies using subsequent memory paradigms have often found age-related equivalent or increased activity in left PFC (Dennis, Daselaar, & Cabeza, 2006; Duverne, Motamedinia, & Rugg, 2009; Gutchess et al., 2005; Morcom, Good, Frackowiak, & Rugg, 2003). For instance,
Morcom et al. (2003) used event-related fMRI to study subsequent memory for semanti­
cally encoded words. Recognition memory for these words was tested after a short and a
longer delay. Performance of OAs at the short delay was equal to that of YAs at the long
delay. Under these conditions, activity in left (p. 461) inferior PFC was greater for subse­
quently recognized than forgotten words and was equivalent in both age groups. Howev­
er, OAs showed greater right PFC activity than YAs, again resulting in a more bilateral
pattern of frontal activity (HAROLD).

Figure 22.4 Age differences associated with sustained and transient subsequent memory effects. Older adults show greater transient activation in left prefrontal cortex (PFC), but younger adults showed greater sustained activation in right PFC. The bar graph represents difference scores of functional activation (beta weights) between successful and unsuccessful encoding conditions for sustained and transient subsequent memory activity in both younger and older adults. Reprinted from Neurobiology of Aging, Vol. 28, Nancy A. Dennis, Sander Daselaar, and Roberto Cabeza, "Effects of aging on transient and sustained successful memory encoding activity," 1749–1758, Copyright (2007), with permission from Elsevier.

As noted, one explanation for the discrepancy in left PFC activity between blocked and
event-related studies may be a difference between overall task activity (reduced) and
memory-related activity (preserved or enhanced). Although OAs may show a difference in
task-related processes that are not relevant for memory encoding, they may recruit
processes associated with successful memory encoding to a similar or even greater extent. Another explanation may be a difference in sustained (blocked) and transient (event-related) subsequent memory effects. A recent study by Dennis et al. (2006) used hybrid
blocked and event-related analyses to distinguish between transient and sustained subse­
quent memory effects during deep incidental encoding of words. Subsequent memory
was defined as parametric increases in encoding activity as a function of a combined sub­
sequent memory and confidence scale. This parametric response was measured in each
trial (transient activity) and in blocks of eight trials (sustained activity). Similar to the re­
sults of Gutchess et al., subsequent memory analyses of transient activity showed age-related increases in the left PFC. At the same time, subsequent memory analyses of sustained activity showed age-related reductions in the right PFC (Figure 22.4). The decline
in sustained subsequent memory activity in the PFC may involve age-related deficits in
sustained attention that affect encoding processes. The results underline the importance
of investigating aging effects on both transient and sustained neural activity.

To summarize encoding studies, the most consistent finding in incidental and intentional
encoding studies is an age-related reduction in left PFC activity. This finding is more fre­
quent for intentional than for incidental encoding studies, suggesting that, in line with
the resource deficit hypothesis, the environmental support provided by a deep semantic
encoding task may attenuate the age-related decrease in left PFC activity. This effect was
found within subjects in the study by Logan et al. (2002). The difference between inten­
tional and incidental encoding conditions suggests an important strategic component in
age-related memory decline. The reduction in left PFC activity was often coupled with an
increase in right PFC activity, leading to a bilateral pattern of PFC activity in OAs
(HAROLD). Importantly, the study by Rosen et al. (2002) found the HAROLD pattern only
in high-performing OAs. This result provides support for a compensatory account of
HAROLD. In contrast to blocked EM encoding studies, studies that used subsequent
memory paradigms often found increases in left PFC activity. This discrepancy may relate
to differences between overall task activity (reduced) and memory-related activity (pre­
served or enhanced) and between the different attentional components measured during
blocked and event-related paradigms.

Episodic Memory Retrieval


In line with the resource deficit hypothesis, age-related deficits in episodic retrieval tend
to be more pronounced for recall and context memory tasks than for recognition tasks
(Spencer & Raz, 1995). However, considerable differences in activity have also
been observed during simple recognition tasks. Similar to EM encoding studies, whereas
studies using blocked designs have often found decreases in PFC activity, more recent
studies using event-related fMRI designs have found equivalent or increased activity.

As an example of a blocked EM retrieval study, Cabeza et al. (1997) used both a word-pair
recognition and cued-recall task. During word recognition, OAs showed reduced activity
in the right PFC. During recall, OAs also showed weaker activations in the right PFC than
YAs, but at the same time, showed greater activity than YAs in the left PFC. The net result
was that PFC activity during recall was right lateralized in YAs but bilateral in OAs. The
authors noted this change in hemispheric asymmetry and interpreted it as compensatory.
This was the first study identifying the HAROLD pattern and the first one suggesting the
compensatory interpretation of this finding. These changes were more pronounced dur­
ing recall than during recognition, consistent with behavioral evidence that recall is more
sensitive to aging.

In another study by Cabeza and colleagues (2002), YAs, high-performing OAs (old-high),
and low-performing OAs (old-low) studied words presented auditorily or visually. During
scanning, they were presented with words visually and made either old/new decisions (item memory) or heard/seen decisions (context memory). Consistent with their previous
results, YAs showed right PFC activity for context trials, whereas OAs showed bilateral
PFC activity (HAROLD). Importantly, however, this pattern was only seen for the old-high
adults, supporting a compensation account of the HAROLD pattern (Figure 22.5).

Figure 22.5 Prefrontal cortex (PFC) activity during episodic memory retrieval is right lateralized in young and old-low participants, but bilateral in old-high subjects (HAROLD). Reprinted from NeuroImage, Vol. 17, Roberto Cabeza, Nicole D. Anderson, Jill K. Locantore, and Anthony R. McIntosh, "Aging Gracefully: Compensatory Brain Activity in High-Performing Older Adults," 1394–1402, Copyright (2002), with permission from Elsevier.

As an example of an event-related EM retrieval study, Morcom et al. (Morcom, Li, & Rugg, 2007) studied age differences during successful retrieval using a source memory
task involving pictures with two encoding sources: animacy classifications or size judg­
ments. They also distinguished between an easy and a difficult condition. During the easy
condition, OAs encoded the items three times and YAs two times to equate memory per­
formance. Under these conditions of equal performance, OAs showed more activity dur­
ing source retrieval (animacy vs. size) in both the left and right PFC regions, as well as in
several other cortical areas. Interestingly, in line with CRUNCH (Reuter-Lorenz & Cappell, 2008) and the WM studies by Cappell et al. (2010) and Schneider-Garces et al.
(2009), OAs did show overall reductions in activity during the difficult source retrieval
condition (single encoding presentation). The authors concluded that the over-recruit­
ment of regions during the easy condition reflects an age-related decline in the efficiency
with which neural populations support cognitive function.

Summarizing the studies on PFC and EM retrieval, in blocked design studies, PFC differ­
ences between YA and OAs have been found more frequently in studies using tasks with
little environmental support, including recall and context memory tasks, than during sim­
ple item recognition. This was exemplified in the study by Cabeza et al. (1997), which in­
cluded both recall and recognition tasks. These findings suggest a three-way interaction
between age, executive demand, and frontal laterality. Distinguishing between old-high
and old-low adults, the study by Cabeza et al. (2002) provided direct evidence for the
compensation account of HAROLD. Similar to EM encoding studies that used the subse­
quent memory paradigm, event-related EM retrieval studies that focused on the neural
correlates of successful EM retrieval have found equivalent or increased PFC activity in
OAs. Interestingly, in line with the WM studies by Cappell et al. (2010) and Schneider-Garces et al. (2009), the event-related study by Morcom et al. (2007) illustrates
the importance of distinguishing between differences in task difficulty when considering
age differences in brain activity.

Medial Temporal Lobes

Frontal activations showed both reductions and increases with aging, as well as shifts in the lateralization of activation. On the other hand, activation within the MTL generally shows age-related decreases compared with the MTL activation seen in YAs. However, EM retrieval studies indicate a shift in the foci of activation from the hippocampus proper to more parahippocampal regions in aging.

Working Memory
Although the MTL has been strongly linked to EM, and the PFC to WM, MTL processes
are also thought to play a role in WM tasks, particularly when these involve the binding
between different elements (Ranganath & Blumenfeld, 2005). Regarding aging, only three WM studies, all of which used nonverbal tasks, found reductions in hippocampal activity during WM.

The first study was conducted by Grady et al. (1998). They employed a face WM task with
varying intervals of item maintenance. Results showed that OAs have difficulty maintain­
ing hippocampal activation across longer delays. As the delay extended from 1 to 6 sec­
onds, left hippocampal activity increased in YAs but decreased in OAs, which implies that
OAs have difficulties initiating memory strategies mediated by MTL or sustaining MTL ac­
tivity beyond very short retention intervals.

Figure 22.6 The left hippocampus showed an age × condition interaction. In young adults, hippocampal activity was greater in combination trials (object + location) than in the object-only and location-only conditions. In older adults, activation was lower in the combination trials than in the object-only condition.
The second study was conducted by Mitchell et al. (2000). They investigated a WM para­
digm with an important binding component. In each trial, participants were presented an
object in a particular screen location and had to hold in WM the object, its location, or
both (combination trials). Combination trials can be assumed to involve not only WM
maintenance but also the binding of different information into an integrated memory
trace (associative memory EM encoding). OAs showed a deficit in accuracy in the combi­
nation condition but not in the object and location conditions. Two regions were differentially involved in the combination condition in YAs but not in OAs: a left anterior hippocampal region and an anteromedial PFC region (right Brodmann area 10) (Figure
22.6). According to the authors, a disruption of a hippocampal–PFC circuit may underlie
binding deficits in OAs.

Finally, in a study by Park et al. (2003), OAs showed an age-related reduction in hip­
pocampal activity. The left hippocampus was more activated in the viewing than in the
maintenance condition in YAs but not in OAs. As in the study by Mitchell et al. (2000), the
age-related reduction in hippocampal activity was attributed to deficits in binding opera­
tions.

In sum, three nonverbal WM studies using spatial/pictorial stimuli (Grady et al., 1998;
Mitchell et al., 2000; Park et al., 2003) found age-related decreases in hippocampus activ­
ity. Interestingly, no verbal WM study found such decreases (Cappell et al., 2010; Reuter-
Lorenz et al., 2000). It is possible that nonverbal tasks are more dependent on hippocam­
pal-mediated relational memory processing, and hence more sensitive to age-related
deficits in MTL regions. Thus, contrary to PFC findings, WM studies have generally found
age-related reductions in MTL activity.

Episodic Memory Encoding


During EM encoding, frontal activations showed both reductions and increases with aging. On the other hand, similar to WM, activation within the MTL during EM encoding generally shows age-related decreases compared with the MTL activation seen in YAs.
We will briefly discuss the general findings of blocked-design studies and then more re­
cent event-related fMRI studies using the subsequent memory paradigm.

Although not as frequently as reductions in left PFC activity, blocked design studies using
both intentional and incidental EM encoding paradigms have found age-related decreases
in MTL activity. As an example of intentional EM encoding, in their study examining face
encoding, Grady et al. (1995) found that, compared with YAs, OAs showed less activity not
only in the left PFC but also in the MTL. Furthermore, they found a highly significant cor­
relation between hippocampus and left PFC activity in YAs, but not in OAs. Based on
these results, they concluded that encoding in OAs is accompanied by reduced neural ac­
tivity and diminished connectivity between PFC and MTL areas. As an example of inciden­
tal EM encoding, Daselaar et al. (2003a) investigated levels of processing in aging using a
deep (living/nonliving) versus shallow (uppercase/lowercase) encoding task. Despite see­
ing common activation of regions involved in a semantic network across both age groups,
activation differences were seen when comparing levels of processing. OAs revealed sig­
nificantly less activation in the left anterior hippocampus during deep relative to shallow
classification. The researchers concluded that in addition to PFC changes, under-recruit­
ment of MTL regions contributes, at least in part, to age-related impairments in EM en­
coding.

Similar to block design studies, event-related EM encoding studies have generally found
age-related reductions in MTL activity during tasks using single words or pictures (Dennis et al., 2006; Gutchess et al., 2005). A study by Dennis et al. (2008) also included a
source memory paradigm and provided clear support for the binding deficit hypothesis.
YAs and OAs were studied with fMRI while encoding faces, scenes, and face–scene pairs
(source memory). In line with binding deficit theory, the investigators found age-related
reductions in subsequent memory activity in the hippocampus, which were more pro­
nounced for face–scene pairs than for item memory (faces and scenes).

The aforementioned study by Daselaar et al. (2003b) that distinguished between high-
and low-performing OAs linked MTL reductions directly to individual differences in memory performance. These researchers found an age-related reduction in activity in the anterior MTL when comparing subsequently remembered items to a motor baseline, which was
specific to low-performing OAs. Based on these findings, they concluded that MTL dys­
function during encoding is an important factor in age-related memory decline.

To summarize, blocked and event-related EM encoding studies have generally found age-
related MTL reductions (Daselaar et al., 2003a, 2003b; Dennis et al., 2006; Gutchess et
al., 2005). In line with the binding deficit hypothesis, the study by Dennis et al. links age-
related MTL reductions mainly to source memory. However, other studies also found reductions for individual items. These findings suggest that age-related binding deficits
play a role not only in complex associative memory tasks but also in simpler item memory
tasks. One explanation for these results is that, in general, item memory tasks also have
an associative component. In fact, the deep processing tasks used in these studies are
specifically designed to invoke semantic associations in relation to the study items. As dis­
cussed in the next section, recollection of these associations can be used as confirmatory
evidence during EM retrieval: remembering specific associations with a study item con­
firms that one has seen the study items. The study by Daselaar et al. (2003b) directly
linked reduced MTL activity during single-word encoding to impaired performance on a
subsequent recognition test.

Episodic Memory Retrieval


As noted, relational memory theory asserts that the hippocampal formation is more in­
volved in binding or relational memory operations, whereas the surrounding parahip­
pocampal gyrus is more involved in individual item memory. Recent EM retrieval studies
have indicated a shift from hippocampal to parahippocampal regions with age that may
reflect a reduced employment of relational memory operations during EM retrieval
(Cabeza et al., 2004; Daselaar, Fleck, Dobbins, Madden, & Cabeza, 2006; Giovanello,
Kensinger, Wong, & Schacter, 2010). This idea is supported by a large number of behav­
ioral studies indicating that OAs show an increased reliance on familiarity-based retrieval
as opposed to recollection-based retrieval (e.g., Bastin & Van der Linden, 2003; Davidson
& Glisky, 2002; Java, 1996; Mantyla, 1993; Parkin & Walter, 1992). As mentioned before,
recollection refers to remembering an item together with contextual details,
which is more dependent on binding, whereas familiarity refers to knowing that an item
occurred in the past even though its contextual details cannot be retrieved.

The first support for a shift from recollection to familiarity EM retrieval processes came
from a study by Cabeza et al. (2004). They investigated the effects of aging on several
cognitive tasks, including a verbal recognition task. Within the MTLs, they found a disso­
ciation between a hippocampal region, which showed weaker activity in OAs than in YAs,
and a parahippocampal region, which showed the converse pattern. Given evidence that the hippocampal and parahippocampal regions are more involved in recollection and familiarity, respectively (Eichenbaum et al., 2007), this finding is consistent with the notion
that OAs are more impaired in recollection than in familiarity (e.g., Jennings & Jacoby,
1993; Parkin & Walter, 1992). Indeed, the age-related increase in parahippocampal cortex
activity suggests that OAs may be compensating for recollection deficits by relying more
on familiarity. Supporting this idea, OAs had a larger number of “know” (familiarity-based
EM retrieval—“knowing” that something is old) responses than YAs, and these responses
were positively correlated with the parahippocampal activation.

In line with the findings by Cabeza and colleagues, a recent study by Giovanello et al.
(2010) also found a hippocampal–parahippocampal shift with age during retrieval. They
used a false memory paradigm in which conjunction words during study were recombined
during testing. For instance, “blackmail” and “jailbird” were presented during the study,
and “blackbird” was presented during test. False memory conjunction errors (responding
“blackbird” is old) tended to occur more frequently in OAs than in YAs. Giovanello and
colleagues found that OAs showed more parahippocampal activity during recombined
conjunction words at retrieval (false memory), but less hippocampal activity during iden­
tical conjunction words (veridical memory). Given that false memories are associated with
gist-based processes that rely on familiarity (Balota et al., 1999; Dennis, Kim, & Cabeza,
2008), the age-related increase in parahippocampal cortex fits well with the results by
Cabeza et al. (2004).

Another study by Cabeza and colleagues provided a similar pattern of results (Daselaar,
Fleck, et al., 2006). YAs and OAs made old/new judgments about previously studied words
followed by a confidence judgment from low to high. On the basis of previous research
(Daselaar, Fleck, & Cabeza, 2006; Yonelinas, 2001), recollection was measured as an ex­
ponential change in brain activity as a function of confidence, and familiarity was mea­
sured as a linear change. The results revealed a clear double dissociation within MTL:
whereas recollection-related activity in the hippocampus was reduced by aging, familiari­
ty-related activity in rhinal cortex was increased by aging (Figure 22.7A). These results
suggested that OAs compensate for deficits in recollection processes mediated by the hip­
pocampus by relying more on familiarity processes mediated by rhinal cortex. Supporting
this interpretation, within-participants regression analyses based on single-trial activity
showed that recognition accuracy was determined by only hippocampal activity in YAs but
by both hippocampal and rhinal activity in OAs. Also consistent with the notion of com­
pensation, functional connectivity analyses showed that correlations between the hip­
pocampus and posterior regions associated with recollection were greater in YAs, where­
as correlations between the rhinal cortex and bilateral PFC regions were greater in OAs
(Figure 22.7B). The latter effect suggests a top-down modulation of PFC on rhinal activity
in OAs. The finding of preserved rhinal function in healthy OAs has important clinical implications because this region is impaired early in AD (Killiany et al., 2000; Pennanen et al., 2004).
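The exponential-versus-linear logic of this analysis can be sketched numerically. The following is a minimal illustration with fabricated confidence-binned activity values (the six bin values, the rate of 0.8, and the slope of -0.2 are invented, and the log-space fit is just one simple way to estimate an exponential rate, not necessarily the authors' exact procedure):

```python
import math

def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical mean activity in six confidence bins (lowest to highest
# perceived oldness); all numbers are invented for illustration.
confidence = [1, 2, 3, 4, 5, 6]
# Recollection-like signal: flat at low confidence, sharp rise at the top
# (generated here as an exponential with rate 0.8).
hippocampus = [0.1 * math.exp(0.8 * c) for c in confidence]
# Familiarity-like signal: graded linear decrease across all bins.
rhinal = [2.0 - 0.2 * c for c in confidence]

# Familiarity measure: slope of a straight-line fit.
familiarity_slope, _ = linear_fit(confidence, rhinal)
# Recollection measure: exponential rate, recovered here as the slope
# of a linear fit to log-transformed activity.
recollection_rate, _ = linear_fit(confidence, [math.log(a) for a in hippocampus])

print(round(familiarity_slope, 3), round(recollection_rate, 3))  # -> -0.2 0.8
```

In this framing, the recovered rate plays the role of the λ parameter in Figure 22.7 (how sharply perceived-oldness activity accelerates at high confidence), while the straight-line slope plays the role of the graded rhinal familiarity measure.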

In sum, retrieval studies have found both increases and decreases in MTL activity. The
findings by Cabeza and colleagues suggest that at least some of these increases reflect a
shift from recollection-based (hippocampus) to familiarity-based (parahippocampal/rhinal
cortex) retrieval. Furthermore, their functional connectivity findings suggest that the
greater reliance on familiarity processes in OAs may be mediated by a top-down frontal
modulation.

Discussion
In summary, our review of functional neuroimaging studies of cognitive aging has identi­
fied considerable age-related changes in activity during WM, EM encoding, and EM re­
trieval tasks not only in the PFC but also in the MTL. These findings suggest that func­
tional changes in both the PFC and MTL play a role in age-related memory deficits. Fo­
cusing first on PFC findings, the studies indicated both age-related reductions and in­
creases in PFC activity. During WM tasks, OAs show reduced activity in the PFC regions
engaged by YAs, but greater activity in other regions, such as contralateral PFC regions.
The latter changes often resulted in the more bilateral pattern of PFC activity in OAs than
YAs known as HAROLD (Cabeza (p. 466) et al., 2004; Park et al., 2003; Reuter-Lorenz et al., 2000, 2001). In general, age-related PFC increases and HAROLD findings have been attributed to functional compensation in the aging brain. During EM encoding tasks, the most consistent finding has been a reduction in left PFC activity. This finding is more frequent for intentional than for incidental EM encoding tasks. The age-related reduction in left PFC activity was often coupled with an age-related increase in right PFC activity (i.e., HAROLD). EM retrieval was also associated with HAROLD, and this pattern was found more often in studies using more challenging recall and context memory tasks than during simple item recognition tasks. Finally, (p. 467) EM retrieval studies suggest a shift from hippocampal (recollection) to parahippocampal (familiarity) retrieval processes.

Figure 22.7 The effects of aging yielded a double dissociation between two medial temporal lobe (MTL) subregions: Whereas recollection-related activity (exponential increase) in the hippocampus was attenuated by aging, familiarity-related activity (linear decrease) in the rhinal cortex was enhanced by aging. The hippocampal exponential rate parameter (λ) provides a measure of the sharpness of the exponential increase of the perceived oldness function in the hippocampus. The rhinal slope parameter provides a measure of the steepness of the perceived oldness function in the rhinal cortex. From Daselaar, Fleck, Dobbins, Madden, & Cabeza, 2005. Reprinted with permission from Oxford University Press.

Linking Cognitive Theories to Age-Related Changes in the Prefrontal Cortex and Medial Temporal Lobes
In the first part of this chapter, we discussed two important cognitive hypotheses that have been put forward to account for age-related deficits in WM and EM: the resource deficit hypothesis and the binding deficit hypothesis. Below, we connect these behavioral
and neurobiological findings by linking the resource and binding deficit hypotheses to
PFC and MTL function in OAs. Finally, we discuss the relevance of these findings in terms
of the clinical distinction between healthy and pathological deficits in EM.


Resource Deficit Hypothesis and Prefrontal Cortex Function

The resource deficit hypothesis postulates that aging reduces attentional resources, and
as a result, OAs have greater difficulties with cognitive tasks, including EM tasks, that re­
quire greater self-initiated processing. This hypothesis predicts that age-related differ­
ences should be smaller when the task provides a supportive environment that reduces
attentional demands. Among other findings, the resource deficit hypothesis is supported
by evidence that when attentional resources are reduced in YAs, they tend to show EM
deficits that resemble those of OAs (Anderson, Craik, & Naveh-Benjamin, 1998; Jennings
& Jacoby, 1993).

Regarding neural correlates, Craik (1983) proposed that OAs’ deficits in processing are
related to a reduction in the efficiency of PFC functioning. This idea fits with the fact that
this region shows the most prominent gray matter atrophy. Moreover, functional neu­
roimaging studies have found age-related changes in PFC activity that are generally in
line with the resource deficit hypothesis.

Given the critical role of PFC in managing attentional resources, the resource deficit hy­
pothesis predicts that age-related changes in PFC activity will be larger for tasks involv­
ing greater self-initiated processing or less environmental support. The results of func­
tional neuroimaging studies are generally consistent with this prediction. During EM en­
coding, age-related decreases in left PFC activation were found frequently during inten­
tional EM encoding conditions (which provide less environmental support) but rarely dur­
ing incidental EM encoding conditions (which provide greater environmental support).
Similarly, during EM retrieval, age-related differences in PFC activity were usually larger
for recall and context memory tasks (which require greater cognitive resources) than for
recognition memory tasks (which require fewer cognitive resources). Thus, in general,
age effects on PFC activity tend to increase as a function of the demands placed on cogni­
tive resources.

However, not all age-related changes in PFC activity suggested decline; on the contrary,
many studies found age-related increases in PFC that suggested compensatory mecha­
nisms in the aging brain. In particular, several studies of EM encoding and EM retrieval
found activations in contralateral PFC regions in OAs that were not seen in YAs. Impor­
tantly, experimental comparisons between high- and low-performing OAs (Cabeza et al.,
2002; Rosen et al., 2002) demonstrated the beneficial contribution of contralateral PFC
recruitment to memory performance in OAs. Moreover, a recent study using transcranial
magnetic stimulation (TMS) found that in YAs, episodic EM retrieval performance was im­
paired by TMS of the right PFC but not of the left PFC, whereas in OAs, it was impaired
by either right or left PFC stimulation (Rossi et al., 2004). This result indicates that the
left PFC was less critical for YAs and was used more by OAs, consistent with the compen­
sation hypothesis.

It is important to note that resource deficit and compensatory interpretations are not in­
compatible. In fact, it is reasonable to assume that the recruitment of additional brain re­
gions (e.g., in the contralateral PFC hemisphere) reflects an attempt to compensate for
reduced cognitive resources. One way in which OAs could counteract deficits in the par­
ticular pool of cognitive resources required by a cognitive task is to tap into other pools of
cognitive resources. If one task is particularly dependent on cognitive processes mediat­
ed by one hemisphere, the other hemisphere represents an alternative pool of cognitive
resources. Thus, in the case of PFC-mediated cognitive resources, if OAs have deficits in
PFC activity in one hemisphere, they may compensate for these deficits by recruiting con­
tralateral PFC regions. Moreover, age-related decreases suggestive of resource deficits
and age-related increases suggestive of compensation have often been found in the same
conditions. For example, intentional EM encoding studies have shown age-related de­
creases in left PFC activity coupled with age-related increases in right PFC activity, lead­
ing to a dramatic reduction in hemispheric asymmetry in OAs (i.e., HAROLD).

(p. 468) Binding Deficit Hypothesis and Medial Temporal Lobe Function

The binding deficit hypothesis postulates that age-related memory deficits are primarily
the result of difficulties in EM encoding and retrieving novel associations between items.
This hypothesis predicts that OAs are particularly impaired in EM tasks that involve rela­
tions between individual items or between items and their context. Given that relational
memory has been strongly associated with the hippocampus (Eichenbaum, Otto, & Co­
hen, 1994), this hypothesis also predicts that OAs will show decreased hippocampal activ­
ity during memory tasks, particularly when they involve relational information.

Functional neuroimaging studies have identified considerable age-related changes not on­
ly in the PFC but also in MTL regions. As noted, the MTL also shows substantial atrophy
in aging. Yet, the rate of decline differs for different subregions: Whereas the hippocam­
pus shows a marked decline, the rhinal cortex is relatively preserved in healthy aging
(see Figure 22.1). This finding is in line with the idea that age-related memory deficits are
particularly pronounced during relational memory tasks, which depend on the hippocam­
pus.

In line with anatomical findings, functional neuroimaging studies have found substantial
age-related changes in MTL activity during WM, EM encoding, and EM retrieval. Several
studies have found age-related decreases in both hippocampal and parahippocampal re­
gions. During WM and EM encoding, these reductions are seen in tasks that emphasize
the binding between different elements (Dennis, Hayes, et al., 2008; Mitchell et al.,
2000). However, declines in hippocampal activation are also seen during tasks using indi­
vidual stimuli in healthy OAs (e.g., Daselaar et al., 2003a, 2003b; Park et al., 2003). Final­
ly, during EM retrieval some studies found decreases in hippocampal activity, but also
greater activity in parahippocampal regions, which may be compensatory (Cabeza et al.,
2004; Daselaar, Fleck, & Cabeza, 2006; Giovanello et al., 2010).

In general, age-related changes in MTL activity are consistent with the binding deficit hy­
pothesis. Age-related reductions in hippocampal activity were found during the mainte­
nance of EM encoding of complex scenes, which involved associations among picture ele­
ments (Gutchess et al., 2005), and during deep EM encoding of words, which involved
identification of semantic associations (Daselaar et al., 2003a, 2003b; Dennis et al., 2006).
Finally, one study specifically linked age-related reductions in hippocampal activity to rec­
ollection, which involves recovery of item–context associations (Daselaar, Fleck, et al.,
2006). Yet, it should be noted that age-related changes in MTL activity were often accom­
panied by concomitant changes in PFC activity. Hence, in these cases, it is unclear
whether such changes signal MTL dysfunction or are the result of a decline in executive
processes mediated by PFC regions. However, studies using incidental EM encoding tasks
with minimal self-initiated processing requirements have also identified age-related dif­
ferences in MTL activity without significant changes in PFC activity (Daselaar et al.,
2003a, 2003b).

As in the case of PFC, not all age-related changes in MTL activity suggest decline; several
findings suggest compensation. OAs have been found to show reduced activity in the hip­
pocampus but increased activity in other brain regions such as the parahippocampal
gyrus (Cabeza et al., 2004) and the rhinal cortex (Daselaar, Fleck, et al., 2006). These re­
sults were interpreted as a recruitment of familiarity processes mediated by parahip­
pocampal regions to compensate for the decline of recollection processes that are depen­
dent on the hippocampus proper. These results fit well with the relational memory view
(Cohen & Eichenbaum, 1993; Eichenbaum et al., 1994), which states that the hippocam­
pus is involved in binding an item with its context (recollection), whereas the surrounding
parahippocampal cortex mediates item-specific memory processes (familiarity).

Healthy Versus Pathological Aging

As mentioned at the beginning of this chapter, one of the biggest challenges in cognitive
aging research is to isolate the effects of healthy aging from those of pathological aging.
Structural neuroimaging literature suggests that healthy aging is accompanied by greater
declines in frontal regions compared with MTL (Raz et al., 2005). In contrast, pathologi­
cal aging is characterized by greater decline in MTL than in frontal regions (Braak,
Braak, & Bohl, 1993; Kemper, 1994). In fact, functional neuroimaging evidence suggests
that prefrontal activity tends to be maintained or even increased in early AD (Grady,
2005). Thus, these findings suggest that memory decline in healthy aging is more depen­
dent on frontal than MTL deficits, whereas the opposite pattern is more characteristic of
pathological aging (Buckner, 2004; West, 1996). In view of these findings, clinical studies
aimed at an early diagnosis of (p. 469) age-related pathology have mainly targeted
changes in MTL (Nestor, Scheltens, & Hodges, 2004). Yet, the studies reviewed in this
chapter clearly indicate that healthy OAs are also prone to MTL decline. Hence, rather
than focusing on MTL deficits alone, diagnosis of age-related pathology may be improved
by employing some type of composite score reflecting the ratio between MTL and frontal
decline.

In terms of MTL dysfunction in healthy and pathological aging, it is also critical to assess
the specific type or loci of MTL dysfunction. Critically, a decline in hippocampal function
can be seen in both healthy aging and AD. Thus, even though hippocampal volume de­
cline is an excellent marker of concurrent AD (Scheltens, Fox, Barkhof, & De Carli, 2002),
it is not a reliable measure for distinguishing normal aging from early stages of the dis­
ease (Raz et al., 2005). In contrast, changes in the rhinal cortex are not apparent in
healthy aging (see Figure 22.1), but they are present in early AD patients with only mild
impairments (Dickerson et al., 2004). In a discriminant analysis, Pennanen and colleagues
(2004) showed that, although hippocampal volume is indeed the best marker to discrimi­
nate AD patients from normal controls, measuring the volume of the entorhinal cortex is
much more useful for distinguishing between incipient AD (mild cognitive impairment)
and healthy aging. The fMRI study by Daselaar, Cabeza, and colleagues provides indica­
tions that the implementation of the recollection/familiarity distinction during EM re­
trieval in combination with fMRI may be promising in that respect (Daselaar, Fleck, et al.,
2006). Finally, it should be noted that, despite the rigorous screening procedures typical
of functional neuroimaging studies of healthy aging, it remains possible that early symp­
toms of age-related pathology went undetected in some of the studies reviewed in this
chapter.

Summary
In this chapter, we reviewed functional neuroimaging evidence highlighting the role of
the PFC and MTL regions in age-related decline in WM and EM function. The chapter fo­
cused on two major factors thought to underlie age-related memory decline and strongly
linked to PFC and MTL function, namely deficits in executive function and deficits in
binding processes. We discussed functional neuroimaging studies that generally showed
age-related decreases in PFC and MTL activity during WM, EM encoding, and EM re­
trieval. Yet, some of these studies also found preserved or increased levels of PFC or MTL
activity in OAs, which may be compensatory. Regarding the PFC, several WM and EM
studies have found an age-related increase in contralateral PFC activity, leading to an
overall reduction in frontal asymmetry in OAs (HAROLD). As discussed, studies that divid­
ed OAs into high and low performers provided strong support for the idea that HAROLD
reflects a successful compensatory mechanism. Regarding the MTL, several WM and EM
studies reported age-related decreases in MTL activity. Yet, studies of EM retrieval have
also found age-related increases in MTL activity. Recent findings suggest that at least
some of these increases reflect a compensatory shift from hippocampal-based recollection
processes to parahippocampal-based familiarity processes. In sum, in view of the substan­
tial changes in PFC and MTL that take place when we grow older, a reduction in WM and
EM function seems inevitable. Yet, our review also suggests that some OAs can counter­
act this reduction by using alternative brain resources within the PFC and MTL that allow
them to compensate for the general deficits in executive and binding operations underly­
ing WM and EM decline with age.

Future Directions
In this chapter we reviewed functional neuroimaging studies of WM and EM that identi­
fied considerable age-related changes in PFC and MTL activity. These findings suggest
that functional changes in both PFC and MTL play a role in age-related memory deficits.
However, as outlined below, there are several open questions that need to be addressed
in future functional neuroimaging studies of memory and aging.

What is the role of PFC–MTL functional connectivity in age-related memory decline? In
this chapter, we discussed the roles of PFC and MTL in age-related memory decline in
WM and EM separately, and only mentioned a few cases in which age-related differences
in PFC activation were correlated with MTL activations across participants. However,
these studies did not assess age differences in functional connectivity, which need to be
measured within participants on a trial-by-trial basis. Moreover, the role of white matter
integrity in age-related differences in PFC–MTL connectivity has been ignored in studies
of aging and memory. Future aging and memory research should elucidate the relation
between age-related memory decline and PFC–MTL coupling by combining functional
connectivity measures with structural connectivity in the form of white matter integrity.
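One common way to obtain the within-participant, trial-by-trial connectivity measure called for here is a "beta series" correlation: a single-trial response amplitude is estimated per region per trial, and the two amplitude series are correlated within each participant. A minimal sketch (region names and all amplitude values are hypothetical, for illustration only):

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical single-trial response amplitudes (one value per trial)
# extracted for one participant from two regions of interest.
pfc_betas = [0.9, 1.4, 0.7, 1.8, 1.1, 1.6, 0.8, 1.3]
mtl_betas = [0.5, 1.1, 0.4, 1.5, 0.8, 1.2, 0.6, 1.0]

# Within-participant PFC-MTL coupling for this (hypothetical) participant;
# a group analysis would then compare these r values between YAs and OAs.
r = pearson_r(pfc_betas, mtl_betas)
print(r > 0.9)
```

The point of computing r within participants, trial by trial, is that it captures moment-to-moment coupling rather than the across-participant correlations of mean activations mentioned above.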

(p. 470) What is the role of task demands in age differences in memory activations?
According to CRUNCH (Reuter-Lorenz & Cappell, 2008), OAs show more activity at lower
levels of task demands, but as demands increase, YAs show greater activity to meet in­
creasing task loads. Yet, OAs have already reached their ceiling and will show reduced
performance and less activity. We discussed evidence from two recent WM studies sup­
porting the CRUNCH model. We also discussed similar findings regarding the HAROLD
model, indicating, for instance, greater age-related asymmetry during more difficult re­
call tasks than during simple recognition tasks. Future fMRI studies of aging and memory
should further clarify the relation between task difficulty and age-related activation dif­
ferences by incorporating multiple levels of task difficulty in their design and including
task demand functions in their analyses.
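The crossover that CRUNCH predicts can be made concrete with a toy activation-versus-load function (all parameters are invented for illustration, not fit to data): activity grows with task demand until a resource ceiling is hit, and OAs are modeled with a steeper rise but a lower ceiling.

```python
def crunch_activity(load, gain, ceiling):
    """Toy CRUNCH curve: activity grows with task load until it saturates."""
    return min(gain * load, ceiling)

# Invented parameters: OAs recruit more activity per unit load (higher gain)
# but saturate sooner (lower ceiling) than YAs.
def ya(load):
    return crunch_activity(load, gain=1.0, ceiling=6.0)

def oa(load):
    return crunch_activity(load, gain=1.5, ceiling=4.5)

for load in (1, 2, 3, 4, 5, 6):
    print(load, ya(load), oa(load))
# At low loads OAs over-recruit relative to YAs (e.g., load 2: 3.0 vs 2.0);
# past the OA ceiling the pattern reverses (e.g., load 5: 4.5 vs 5.0).
```

Designs with multiple demand levels, as recommended above, are what allow both arms of this crossover to be observed in the same experiment.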

Finally, two open questions remain: To what extent is the age-related shift from recollection-based (hippocampus) to familiarity-based (rhinal cortex) retrieval indeed specific to healthy OAs? And can the familiarity versus recollection distinction, combined with fMRI, be used in practice for the clinical diagnosis of pathological age-related conditions?

Author Note
This work was supported by grants from the National Institute on Aging to RC (AG19731;
AG23770; AG34580).

References
Anderson, N. D., Craik, F. I. M., & Naveh-Benjamin, M. (1998). The attentional demands
of encoding and retrieval in younger and older adults: I. Evidence from divided attention
costs. Psychology and Aging, 13, 405–423.

Balota, D. A., Cortese, M. J., Duchek, J. M., Adams, D., Roediger, H. L., McDermott, K. B.,
et al. (1999). Veridical and false memories in healthy older adults and in dementia of the
Alzheimer’s type. Cognitive Neuropsychology, 16 (3-5), 361–384.

Bastin, C., & Van der Linden, M. (2003). The contribution of recollection and familiarity to
recognition memory: A study of the effects of test format and aging. Neuropsychology, 17
(1), 14–24.

Braak, H., Braak, E., & Bohl, J. (1993). Staging of Alzheimer-related cortical destruction.
European Neurology, 33 (6), 403–408.

Buckner, R. L. (2004). Memory and executive function in aging and AD: Multiple factors
that cause decline and reserve factors that compensate. Neuron, 44 (1), 195–208.

Cabeza, R. (2002). Hemispheric asymmetry reduction in older adults: The HAROLD mod­
el. Psychology and Aging, 17 (1), 85–100.

Cabeza, R., Anderson, N. D., Locantore, J. K., & McIntosh, A. R. (2002). Aging gracefully:
Compensatory brain activity in high-performing older adults. NeuroImage, 17 (3), 1394–
1402.

Cabeza, R., Daselaar, S. M., Dolcos, F., Prince, S. E., Budde, M., & Nyberg, L. (2004).
Task-independent and task-specific age effects on brain activity during working memory,
visual attention and episodic retrieval. Cerebral Cortex, 14 (4), 364–375.

Cabeza, R., Grady, C. L., Nyberg, L., McIntosh, A. R., Tulving, E., Kapur, S., et al. (1997).
Age-related differences in neural activity during memory encoding and retrieval: A
positron emission tomography study. Journal of Neuroscience, 17, 391–400.

Cappell, K. A., Gmeindl, L., & Reuter-Lorenz, P. A. (2010). Age differences in prefrontal
recruitment during verbal working memory maintenance depend on memory load. Cortex,
46 (4), 462–473.

Cohen, N. J., & Eichenbaum, H. (1993). Memory, Amnesia and the Hippocampal system.
Cambridge MA: MIT Press.

Craik, F. I. M. (1977). Age differences in human memory. In J. E. Birren & K. W. Schaie (Eds.), Handbook of the psychology of aging (pp. 384–420). New York: Van Nostrand Reinhold.

Craik, F. I. M. (1983). On the transfer of information from temporary to permanent memory. Philosophical Transactions of the Royal Society, London, Series B, 302, 341–359.

Craik, F. I. M. (1986). A functional account of age differences in memory. In F. Klix & H. Hagendorf (Eds.), Human memory and cognitive capabilities, mechanisms, and performances (pp. 409–422). Amsterdam: Elsevier.

Daselaar, S. M., Fleck, M. S., & Cabeza, R. (2006). Triple dissociation in the medial tem­
poral lobes: Recollection, familiarity, and novelty. Journal of Neurophysiology, 96 (4),
1902–1911.

Daselaar, S. M., Fleck, M. S., Dobbins, I. G., Madden, D. J., & Cabeza, R. (2006). Effects of
healthy aging on hippocampal and rhinal memory functions: An event-related fMRI study.
Cerebral Cortex, 16 (12), 1771–1782.

Daselaar, S. M., Veltman, D. J., Rombouts, S. A., Raaijmakers, J. G., & Jonker, C. (2003a).
Deep processing activates the medial temporal lobe in young but not in old adults. Neuro­
biology of Aging, 24 (7), 1005–1011.

Daselaar, S. M., Veltman, D. J., Rombouts, S. A., Raaijmakers, J. G., & Jonker, C. (2003b).
Neuroanatomical correlates of episodic encoding and retrieval in young and elderly sub­
jects. Brain, 126 (Pt 1), 43–56.

Davidson, P. S., & Glisky, E. L. (2002). Neuropsychological correlates of recollection and familiarity in normal aging. Cognitive, Affective and Behavioral Neuroscience, 2 (2), 174–186.

Dennis, N. A., Daselaar, S., & Cabeza, R. (2006). Effects of aging on transient and sus­
tained successful memory encoding activity. Neurobiology of Aging, 28 (11), 1749–1758.

Dennis, N. A., Hayes, S. M., Prince, S. E., Madden, D. J., Huettel, S. A., & Cabeza, R.
(2008). Effects of aging on the neural correlates of successful item and source memory
encoding. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34 (4),
791–808.

Dennis, N. A., Kim, H., & Cabeza, R. (2008). Age-related differences in brain activity dur­
ing true and false memory retrieval. Journal of Cognitive Neuroscience, 20 (8), 1390–
1402.

Dickerson, B. C., Salat, D. H., Bates, J. F., Atiya, M., Killiany, R. J., Greve, D. N., et al.
(2004). Medial temporal lobe function and structure in mild cognitive impairment. Annals
of Neurology, 56 (1), 27–35.

(p. 471) Duverne, S., Motamedinia, S., & Rugg, M. D. (2009). The relationship between aging, performance, and the neural correlates of successful memory encoding. Cerebral Cortex, 19 (3), 733–744.

Eichenbaum, H., Otto, T., & Cohen, N. J. (1994). Two functional components of the hip­
pocampal memory system. Behavioral and Brain Sciences, 17 (3), 449–472.

Eichenbaum, H., Yonelinas, A. P., & Ranganath, C. (2007). The medial temporal lobe and
recognition memory. Annual Review in Neuroscience, 30, 123–152.

Gabrieli, J. D. (1998). Cognitive neuroscience of human memory. Annual Review of Psychology, 49, 87–115.

Giovanello, K. S., Kensinger, E. A., Wong, A. T., & Schacter, D. L. (2010). Age-related neur­
al changes during memory conjunction errors. Journal of Cognitive Neuroscience, 22 (7),
1348–1361.

Glisky, E. L. (2007). Changes in cognitive function in human aging. In D. R. Riddle (Ed.), Brain aging: Models, methods, and mechanisms (pp. 1–15). Boca Raton, FL: CRC Press.

Grady, C. L. (2005). Functional connectivity during memory tasks in healthy aging and de­
mentia. In R. Cabeza, L. Nyberg, & D. Park (Eds.), Cognitive neuroscience of aging (pp.
286–308). New York: Oxford University Press.

Grady, C. L., McIntosh, A. R., Bookstein, F., Horwitz, B., Rapoport, S. I., & Haxby, J. V.
(1998). Age-related changes in regional cerebral blood flow during working memory for
faces. NeuroImage, 8 (4), 409–425.

Grady, C. L., McIntosh, A. R., Horwitz, B., Maisog, J. M., Ungerleider, L. G., Mentis, M. J.,
et al. (1995). Age-related reductions in human recognition memory due to impaired en­
coding. Science, 269 (5221), 218–221.

Gutchess, A. H., Welsh, R. C., Hedden, T., Bangert, A., Minear, M., Liu, L. L., et al. (2005).
Aging and the neural correlates of successful picture encoding: Frontal activations com­
pensate for decreased medial-temporal activity. Journal of Cognitive Neuroscience, 17 (1),
84–96.

Hasher, L., Zacks, R. T., & May, C. P. (1999). Inhibitory control, circadian arousal, and
age. In D. Gopher & A. Koriat (Eds.), Attention and performance XVII, cognitive regula­
tion of performance: Interaction of theory and application (pp. 653–675). Cambridge, MA:
MIT Press.

Hedden, T., & Gabrieli, J. D. (2004). Insights into the ageing mind: A view from cognitive
neuroscience. Nature Reviews Neuroscience, 5 (2), 87–96.

Java, R. I. (1996). Effects of age on state of awareness following implicit and explicit
word-association tasks. Psychology and Aging, 11 (1), 108–111.

Jennings, J. M., & Jacoby, L. L. (1993). Automatic versus intentional uses of memory: Ag­
ing, attention, and control. Psychology and Aging, 8 (2), 283–293.

Johnson, M. K., Hashtroudi, S., & Lindsay, D. S. (1993). Source monitoring. Psychological
Bulletin, 114 (1), 3–28.

Kemper, T. (1994). Neuroanatomical and neuropathological changes during aging and in dementia. In M. L. Albert & E. J. E. Knoepfel (Eds.), Clinical neurology of aging (2nd ed., pp. 3–67). New York: Oxford University Press.

Killiany, R. J., Gomez-Isla, T., Moss, M., Kikinis, R., Sandor, T., Jolesz, F., et al. (2000). Use
of structural magnetic resonance imaging to predict who will get Alzheimer’s disease. An­
nals of Neurology, 47 (4), 430–439.

Logan, J. M., Sanders, A. L., Snyder, A. Z., Morris, J. C., & Buckner, R. L. (2002). Under-re­
cruitment and nonselective recruitment: Dissociable neural mechanisms associated with
aging. Neuron, 33 (5), 827–840.

Mantyla, T. (1993). Knowing but not remembering: Adult age differences in recollective
experience. Memory and Cognition, 21 (3), 379–388.

Miller, E. K., & Cohen, J. D. (2001). An integrative theory of prefrontal cortex function. An­
nual Review of Neuroscience, 24, 167–202.

Milner, B. (1972). Disorders of learning and memory after temporal lobe lesions in man.
Clinical Neurosurgery, 19, 421–446.

Mitchell, K. J., Johnson, M. K., Raye, C. L., & D’Esposito, M. (2000). fMRI evidence of age-
related hippocampal dysfunction in feature binding in working memory. Cognitive Brain
Research, 10 (1-2), 197–206.

Morcom, A. M., Good, C. D., Frackowiak, R. S., & Rugg, M. D. (2003). Age effects on the
neural correlates of successful memory encoding. Brain, 126, 213–229.

Morcom, A. M., Li, J., & Rugg, M. D. (2007). Age effects on the neural correlates of
episodic retrieval: Increased cortical recruitment with matched performance. Cerebral
Cortex, 17, 2491–2506.

Naveh-Benjamin, M. (2000). Adult age differences in memory performance: Tests of an associative deficit hypothesis. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26 (5), 1170–1187.

Nestor, P. J., Scheltens, P., & Hodges, J. R. (2004). Advances in the early detection of
Alzheimer’s disease. Nature Medicine, 10 (Suppl), S34–S41.

Paller, K. A., & Wagner, A. D. (2002). Observing the transformation of experience into
memory. Trends in Cognitive Sciences, 6 (2), 93–102.

Park, D. C., Welsh, R. C., Marshuetz, C., Gutchess, A. H., Mikels, J., Polk, T. A., et al.
(2003). Working memory for complex scenes: Age differences in frontal and hippocampal
activations. Journal of Cognitive Neuroscience, 15 (8), 1122–1134.

Parkin, A. J., & Walter, B. M. (1992). Recollective experience, normal aging, and frontal
dysfunction. Psychology and Aging, 7, 290–298.

Pennanen, C., Kivipelto, M., Tuomainen, S., Hartikainen, P., Hanninen, T., Laakso, M. P., et
al. (2004). Hippocampus and entorhinal cortex in mild cognitive impairment and early
AD. Neurobiology of Aging, 25 (3), 303–310.

Ranganath, C., & Blumenfeld, R. S. (2005). Doubts about double dissociations between
short- and long-term memory. Trends in Cognitive Sciences, 9 (8), 374–380.

Raz, N. (2005). The aging brain observed in vivo. In R. Cabeza, L. Nyberg & D. C. Park
(Eds.), Cognitive neuroscience of aging (pp. 19–57). New York: Oxford University Press.

Raz, N., Gunning, F. M., Head, D., Dupuis, J. H., McQuain, J., Briggs, S. D., et al. (1997).
Selective aging of the human cerebral cortex observed in vivo: Differential vulnerability
of the prefrontal gray matter. Cerebral Cortex, 7 (3), 268–282.

Raz, N., Lindenberger, U., Rodrigue, K. M., Kennedy, K. M., Head, D., Williamson, A., et al.
(2005). Regional brain changes in aging healthy adults: General trends, individual differ­
ences and modifiers. Cerebral Cortex, 15 (11), 1676–1689.

Reuter-Lorenz, P. A., & Cappell, K. A. (2008). Neurocognitive aging and the compensation
hypothesis. Current Directions in Psychological Science, 17 (3), 177–182.

(p. 472) Reuter-Lorenz, P., Jonides, J., Smith, E. S., Hartley, A., Miller, A., Marshuetz, C., et al. (2000). Age differences in the frontal lateralization of verbal and spatial working memory revealed by PET. Journal of Cognitive Neuroscience, 12, 174–187.

Reuter-Lorenz, P. A., Marshuetz, C., Jonides, J., Smith, E. E., Hartley, A., & Koeppe, R.
(2001). Neurocognitive ageing of storage and executive processes. European Journal of
Cognitive Psychology, 13 (1-2), 257–278.

Rosen, A. C., Prull, M. W., O’Hara, R., Race, E. A., Desmond, J. E., Glover, G. H., et al.
(2002). Variable effects of aging on frontal lobe contributions to memory. NeuroReport, 13
(18), 2425–2428.

Rossi, S., Miniussi, C., Pasqualetti, P., Babiloni, C., Rossini, P. M., & Cappa, S. F. (2004).
Age-related functional changes of prefrontal cortex in long-term memory: A repetitive
transcranial magnetic stimulation study. Journal of Neuroscience, 24 (36), 7939–7944.

Rypma, B., & D’Esposito, M. (2000). Isolating the neural mechanisms of age-related
changes in human working memory. Nature Neuroscience, 3 (5), 509–515.

Salthouse, T. A. (1996). The processing-speed theory of adult age differences in cognition.


Psychological Review, 103 (3), 403–428.

Scheltens, P., Fox, N., Barkhof, F., & De Carli, C. (2002). Structural magnetic resonance
imaging in the practical assessment of dementia: beyond exclusion. Lancet Neurology, 1
(1), 13–21.

Page 26 of 27
Age-Related Decline in Working Memory and Episodic Memory

Schneider-Garces, N. J., Gordon, B. A., Brumback-Peltz, C. R., Shin, E., Lee, Y., Sutton, B.
P., et al. (2009). Span, CRUNCH, and beyond: Working memory capacity and the aging
brain. Journal of Cognitive Neuroscience, 22 (4), 655–669.

Simons, J. S., & Spiers, H. J. (2003). Prefrontal and medial temporal lobe interactions in
long-term memory. Nature Reviews Neuroscience, 4 (8), 637–648.

Spencer, W. D., & Raz, N. (1995). Differential effects of aging on memory for content and
context: A meta-analysis. Psychology and Aging, 10 (4), 527–539.

Squire, L. R., Schmolck, H., & Stark, S. M. (2001). Impaired auditory recognition memory
in amnesic patients with medial temporal lobe lesions. Learning and Memory, 8 (5), 252–
256.

Tulving, E. (1983). Elements of episodic memory. Oxford, UK: Clarendon Press.

Wager, T. D., & Smith, E. E. (2003). Neuroimaging studies of working memory: A meta-
analysis. Cognitive, Affective and Behavioral Neuroscience, 3 (4), 255–274.

West, R. L. (1996). An application of prefrontal cortex function theory to cognitive aging.


Psychological Bulletin, 120 (2), 272–292.

Yonelinas, A. P. (2001). Components of episodic memory: The contribution of recollection


and familiarity. Philosophical Transactions of the Royal Society of London, Series B, Bio­
logical Sciences, 356 (1413), 1363–1374.

Sander Daselaar

Sander Daselaar, Donders Institute for Brain, Cognition, and Behaviour, Radboud University, Nijmegen, Netherlands; Center for Cognitive Neuroscience, Duke University, Durham, NC

Roberto Cabeza

Roberto Cabeza is professor at the Department of Psychology and Neuroscience and a core member at the Center for Cognitive Neuroscience, Duke University.

Memory Disorders  
Barbara Wilson and Jessica Fish
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn

Print Publication Date: Dec 2013 Subject: Psychology, Cognitive Neuroscience


Online Publication Date: Dec 2013 DOI: 10.1093/oxfordhb/9780199988693.013.0023

Abstract and Keywords

Following some introductory comments, this chapter describes the ways in which memo­
ry systems can be affected after an insult to the brain. The main focus is on people with
nonprogressive brain injury, particularly traumatic brain injury (TBI), stroke, encephali­
tis, and hypoxia. References are provided for those readers interested in memory disor­
ders following progressive conditions such as dementia. The chapter then considers re­
covery of memory function for people with nonprogressive memory deficits. This is fol­
lowed by a section addressing the assessment of memory abilities, suggesting that both
standardized and functional assessment procedures are required to identify an
individual’s cognitive strengths and weaknesses and to plan for rehabilitation. The final
part of the chapter addresses rehabilitation of memory functioning, including compen­
satory strategies, ways to improve new learning, the value of memory groups, and ways
to reduce the emotional consequences of memory impairment.

Keywords: memory disorders, brain injury, memory function, memory systems

Introduction and Overview


The neuroanatomical structures involved in memory functioning include the hippocampi
and surrounding areas, the limbic structures, and the frontal-temporal cortex. Given the
number of structures and networks involved, it is not surprising that memory problems
are so frequently reported after many kinds of brain damage.

Memory impairment is one of the most common consequences of an insult to the brain (Stilwell et al., 1999). It affects almost all people with dementia (approximately 10 percent of people over the age of 65 years), some 36 percent of survivors of TBI, and about 70 percent of survivors of encephalitis, as well as many people with hypoxic brain damage (following cardiac or pulmonary arrest, attempted suicide, or near drowning), Parkinson’s disease, multiple sclerosis, AIDS, Korsakoff’s syndrome, epilepsy, and cerebral tumors. The problem is enormous, and unfortunately, many people do not get the care and
help they and their families need.

What do we mean by memory? Memory is not one skill or function or system but is a
“complex combination of memory subsystems” (Baddeley, 1992, p. 5). We can classify or
understand these systems in a number of ways: We can consider the stages involved in
memory, the length of time for which information is stored, the type of information
stored, the modality information is stored in, whether explicit or implicit recall is re­
quired, whether recall or recognition is required, whether we are trying to remember
things that have already occurred or remember to do things in the future, and whether
memories date from before or after an insult to the brain. These distinctions are elaborat­
ed on in the next section of this chapter.

Although survivors of brain injury can have problems with each one of these subcategories of memory, the most typical scenario is for a memory-impaired person to
have (1) normal or near normal immediate memory; (2) problems remembering after a
delay or distraction; (3) a memory gap for the time just before the accident, illness, or in­
sult to the brain; and (4) difficulty learning most new information. Those with no other
cognitive deficits (apart from memory) are said to have an amnesic syndrome, whereas
those with more widespread cognitive deficits (the majority of memory-impaired people)
are said to have memory impairment. This distinction is not always adhered to, however,
and it is not uncommon to hear the term amnesia used for all memory-impaired people.

Although people with progressive conditions will not recover or improve memory functioning, much can be done to help them live with a better quality of life (Camp et al., 2000; Clare, 2008; Clare & Woods, 2004): they can learn new information (Clare et al., 1999, 2000) and can be helped to compensate for their difficulties (Berry, 2007). Recovery for people with nonprogressive brain injury is addressed below.

Assessment of memory functioning and other cognitive areas is an important part of any
treatment program for people with memory difficulties. We need to identify the person’s
cognitive strengths and weaknesses, and we need to identify the everyday problems caus­
ing the most distress for the patient and family. Having identified these situations, we
then need to help the memory-impaired person to reduce his or her real-life problems. As­
sessment and treatment are the themes of the final parts of this chapter.

Understanding Memory Disorders


Stages of Remembering

There are three stages involved in remembering: encoding, the stage of taking in information; storage, the stage of retaining information; and retrieval, the stage of accessing information when it is required. All can be differentially affected by an insult to the brain even though, in real life, these stages interact with one another. For example, people with attention difficulties typically show encoding problems. Although it is the case that in
some circumstances remembering is unintentional and we can recall things that happen
when we are not paying attention to them, typically we need to pay attention when we
are learning something new or when it is important to remember. People with the classic
amnesic syndrome do not have encoding difficulties, while those with executive deficits,
say, following a TBI or in the context of dementia, may well have them.

Once information is registered in memory, it has to be stored there until needed. Most
people forget new information rather rapidly over the first few days, and then the rate of
forgetting slows down. This is also true for people with memory problems, bearing in
mind of course that in their case, relatively little information may be stored in the first
place. However, once the information is encoded adequately and enters the long-term
store, testing, rehearsal, or practice can help keep it there.

Retrieving information when we need it is the third stage in the memory process. We all
experience occasions when we know we know something such as a particular word or the
name of a person or a film, yet we cannot retrieve it at the right moment. This is known
as the “tip of the tongue phenomenon.” If someone provides us with a word we can usual­
ly determine immediately whether or not it is correct. Retrieval problems are even more
likely for people with memory problems. If we can provide a “hook” in the form of a cue
or prompt, we may be able to help them access the correct memory. Wilson (2009)
discusses ways to improve encoding, storage, and retrieval.

Anterograde and Retrograde Amnesia

A distinction frequently applied to memory disorders is that between anterograde and


retrograde amnesia. Retrograde amnesia refers to the period of impaired recall of events
that took place before the insult to the brain. Anterograde amnesia refers to the memory
difficulties that follow an insult to the brain. Most memory-impaired people have both ret­
rograde and anterograde memory deficits, with a variable period of retrograde amnesia.
It may be as short as a few minutes following TBI and may extend back for decades fol­
lowing encephalitis (Wilson et al., 2008). A few reports exist of people with an isolated
retrograde amnesia and no anterograde deficits (Kapur, 1993, 1999; Kopelman, 2000), al­
though some of these at least are likely to be of psychogenic origin (Kopelman, 2004).
Conversely, there are a few people with severe anterograde amnesia with no loss of mem­
ories before the insult (Wilson, 1999). In clinical practice, anterograde amnesia is more
handicapping for memory-impaired patients and is, therefore, the main focus of rehabili­
tation, although a few people need help with recall of their earlier autobiographical
knowledge.

Memory Systems

Short-Term and Working Memory


Atkinson and Shiffrin’s (1971) modal model classified human memory on the broad basis of the length of time for which information is stored. On this basis, sensory memory stores information for less than one-fourth of a second (250 ms),1 the short-term memory store holds information for a few seconds, and the long-term memory store holds information for anything from minutes to years.

The term short-term memory is so misused by the general public that it may be better to
avoid it and instead to use the terms immediate memory, primary memory, and working
memory. Immediate memory and primary memory refer to the memory span as measured
by the number of digits that can be repeated in the correct order after one presentation
(seven plus or minus two for the vast majority of people; Miller, 1956) or by the recency
effect in free recall (the ability to recall the last few items in a list of words, letters, or
digits). Baddeley (2004) prefers the term primary memory when referring to the simple
unitary memory system and working memory when referring to the interacting range of
temporary memory systems people typically use in real-life situations.

Baddeley and Hitch’s (1974) working memory model, which describes how information is
stored and manipulated over short periods of time, is one of the most influential theories
in the history of psychology. Working memory is composed of three components: the cen­
tral executive and two subsidiary systems, the phonological loop and the visual-spatial
sketchpad. The phonological loop is a store that holds memory traces for a few seconds; this retention can be extended through the use of subvocal speech, which can also convert visual material that is capable of being named into a phonological code (Baddeley, 2004).
The visual-spatial sketchpad is the second subsidiary system that allows for the tempo­
rary storage and manipulation of visual and spatial information. In clinical practice, one
can see people with selective damage to each of these systems. Those with a central exec­
utive disorder are said to have the dysexecutive syndrome (Baddeley, 1986; Baddeley &
Wilson, 1988a). Their main difficulties are with planning, organization, divided attention,
perseveration, and dealing with new or novel situations (Evans, 2005). Those with phono­
logical loop disorders have difficulties with speech processing and with new language ac­
quisition. The few patients we see with immediate memory disorders, who can only re­
peat back one or two digits, words, or letters accurately, have a phonological loop disor­
der. Vallar and Papagno (2002) discuss the phonological loop in detail, whereas Baddeley
and Wilson (1988b; Wilson & Baddeley, 1993) describe a patient with a phonological loop
deficit and his subsequent recovery. People with visual-spatial sketchpad disorders may
have deficits in just one of these functions, in other words, visual and spatial difficulties
are separable (Della Sala & Logie, 2002). Although one such patient was described by
Wilson, Baddeley, and Young (1999), such patients are rare in clinical practice and are not
typical of memory-impaired people or of those referred for rehabilitation.

In 2000, Baddeley added a fourth component to working memory, namely, the episodic
buffer, a multimodal temporary interface between the two subsidiary systems and long-
term memory. When assessing an amnesic patient’s immediate recall of a prose passage,
it is not uncommon to find that the score is in the normal range even though, given the
capacity of the short-term store, the recall should be no more than seven words plus or
minus two (Miller, 1956). The episodic buffer provides an explanation for the enhanced
score because it allows amnesic patients with good intellectual functioning and no execu­
tive impairments to show apparently normal immediate memory for prose passages that would far exceed the capacity of either of the subsidiary systems. Delayed memory for
people with amnesia is, of course, still severely impaired.

Declarative Long-Term Memory


The term declarative memory refers to forms of memory that involve explicit or conscious
recollection. Within this, Tulving (1972) introduced the distinction between semantic
memory and episodic memory. Memory for general knowledge, such as facts, the mean­
ings of words, the visual appearance of objects, and the color of things, is known as se­
mantic memory. In contrast, episodic memory relates to memory for personal experiences
and events, such as the memory of one’s last vacation or paying the telephone bill.

Episodic Memory
Episodic memory is autobiographical in nature. Tulving (2002) states that it is “about happenings in particular places at particular times, or about ‘what,’ ‘where,’ and ‘when,’” and that it “makes possible mental time travel,” in the sense that it allows one to relive
previous experiences (p. 3). Broadly speaking, episodic memory is supported by the medi­
al temporal lobes, although there is continuing debate regarding the specific roles of the
hippocampus and the parahippocampal, entorhinal, and perirhinal cortices in supporting
different aspects of episodic memory. As would be expected, impairment of episodic memory is common following damage to the temporal lobes, and it is the primary feature of Alzheimer’s disease. Tulving and colleagues have also reported a case of relatively
specific and near-total episodic memory loss. The patient K.C. suffered a closed head in­
jury that resulted in widespread and extensive brain damage, including damage to the
medial temporal lobes. Subsequently, K.C. was unable to remember any events from his
life before the injury and was unable to retain “new” events for more than a few minutes.
In contrast, he retained semantic memory for personal details such as addresses and
school names, although he could not create new personal semantic memories (see Tulv­
ing, 2002, for a summary).

In understanding disorders of episodic memory, modality and material specificity are im­
portant considerations to bear in mind. In everyday life, we may be required to remember
information and events from several modalities, including things we have seen, heard,
smelled, touched, and tasted. In memory research, however, the main area of concern is
memory for auditory or visual information. In addition to the modality that material is
presented in, we should also consider the material itself, that is, whether it is verbal or
nonverbal. Not only can we remember things that we can label or read (verbal material),
we can also remember things that we cannot easily verbalize, such as a person’s face (visual material). Because different parts of the brain are responsible for visual and verbal processing, these can be independently affected, with some people having a primary difficulty with nonverbal memory and others with verbal memory. Milner demonstrated this many years ago (1965, 1968, 1971) when she found that removal of the left temporal lobe resulted in verbal memory deficits, whereas removal of the right temporal lobe led to more difficulties with remembering nonverbal material, such as faces, patterns, and mazes. If
memory for one kind of material is less affected, it might be possible to capitalize on the less damaged system in order to compensate for the damaged one. People with amnesic
syndrome have problems with both systems. This does not mean that they cannot benefit
from mnemonics and other memory strategies, as we will see later in this chapter.

Recall and recognition are two other episodic memory processes that may be differential­
ly affected in people with neurological disorders. Differences between recall and recogni­
tion may be attributable to their differential reliance on recollection (having distinct and
specific memory of the relevant episode) and familiarity (a feeling of knowing in relation
to the episode), and there has been considerable debate about whether familiarity and
recollection are separable within the medial temporal lobes. Diana et al. (2007) reviewed
functional imaging literature on this topic, and concluded that recollection is associated
with patterns of increased activation within the hippocampus and posterior parahip­
pocampal gyrus, whereas familiarity is associated with activations in anterior parahip­
pocampal gyrus (perirhinal cortex).

For most memory-impaired people, recall is believed to be harder than recognition. Kopel­
man et al. (2007), however, believe this may be due to the way the two processes are
measured. They studied patients with hippocampal, medial temporal lobe, more wide­
spread temporal lobe, or frontal pathology. Initially, it looked as if all patients found recall
harder than recognition, but when the scores were converted to Z scores, the differences were eliminated. In clinical practice, it is important to measure both visual and verbal recall and recognition, as discussed in the assessment section below.

It is also well established that patients with frontal lobe damage, who show poor strategy application (Kopelman & Stanhope, 1998; Shallice & Burgess, 1991), benefit from cues: they are thus less impaired on recognition tasks than on recall tasks, because recognition does not require them to apply a retrieval strategy. This probably depends on which regions of the frontal lobes
are affected. Stuss and Alexander (2005) found that different areas of the frontal lobes
were concerned with different memory processes and that some regions were involved
with nonstrategic memory encoding. A later study by McDonald et al. (2006) found that
only patients with right frontal (and not those with left frontal) lesions had recognition
memory deficits. Fletcher and Henson (2001) reviewed functional imaging studies of
frontal lobe contributions to memory and concluded that ventrolateral frontal regions are associated with updating or maintenance in memory; dorsolateral areas with selection, manipulation, and monitoring; and anterior areas with the selection of processes or subgoals.

Semantic Memory
We all have a very large store of information about what things mean, look like, sound
like, smell like, and feel like, and we do not need to remember when this information was
acquired or who was present at the time. Learning typically takes place over many occasions in a variety of circumstances. Most memory-impaired people have a normal
semantic memory at least for information acquired before the onset of the memory im­
pairment. People with a pure amnesic syndrome have little difficulty recalling from their
semantic memory store information laid down well before the onset of the amnesia, but they may have great difficulty laying down new semantic information. This is because ini­
tially one has to depend on episodic memory in order for information to enter the seman­
tic store (Cermak & O’Connor, 1983).

The group most likely to show problems with semantic memory includes those with pro­
gressive semantic dementia (Snowden, 2002; Snowden, Neary, Mann, et al., 1992; War­
rington, 1975). This condition is characterized by a “selective degradation of core seman­
tic knowledge, affecting all types of concept, irrespective of the modality of
testing” (Lambon Ralph & Patterson, 2008, p. 61). Episodic memory in these people may be less affected. Mummery et al. (2000) reported that although several areas of
frontal and temporal cortex are affected in semantic dementia, the area most consistently
and significantly affected is the left anterior temporal lobe. Furthermore, the degree of atrophy in this region is associated with the degree of semantic impairment—a relationship that does not hold for the other regions affected. Semantic
memory difficulties may also be seen in some survivors of nonprogressive brain injury,
particularly after encephalitis or anoxia, although people with TBI can also exhibit these
problems (Wilson, 1997).

Relationship Between Semantic Memory and Episodic Memory


Tulving’s initial conceptualization saw semantic and episodic memory as wholly indepen­
dent systems. Subsequent revisions to this conceptualization (e.g., Tulving, 1995) have,
however, incorporated the idea that episodic memory is dependent on semantic memory
and, as such, that the system is hierarchical. This idea has intuitive appeal because it is
difficult to imagine how one could remember an event (e.g., eating an apple for break­
fast) in the absence of knowledge about the constituent factors (e.g., knowing what an ap­
ple is). It has, however, been demonstrated that episodic memory can indeed be pre­
served in the absence of semantic memory. For example, Graham et al. (2000) found that
people with impaired semantic memory retained the ability to recognize previously seen
objects about which they had no semantic knowledge, as long as the objects were perceptually identical at the study and test phases. This suggests that episodic learning can be supported by perceptual processes in the absence of conceptual ones, and argues against
a strictly hierarchical model. Baddeley (1997, 2004) also disagrees with the view that se­
mantic and episodic memory are independent, albeit from a different perspective. He
says that in most situations, there is a blend of the two: If one recalls what happened last
Christmas (an episodic task), then this will be influenced by the semantic knowledge of
what one typically does at Christmas. Most people with memory problems have episodic
memory deficits, which are their major handicap in everyday life.

Prospective Memory
Yet another way we can understand memory problems is to distinguish between retro­
spective memory and prospective memory. The former is memory for past events, inci­
dents, word lists, and other experiences as discussed above. This can be contrasted with
prospective memory, which is remembering to do things rather than remembering things
that have already happened. One might have to remember to do something at a particular time (e.g., to watch the news on television at 9.00 p.m.), within a given interval of time
(e.g., to take your medication in the next 10 minutes), or when a certain event happens
(e.g., when you next see your sister, to give her a message from an old school friend).
Groot, Wilson, Evans, and Watson (2002) showed that prospective memory can fail be­
cause of memory failures or because of executive difficulties. Indeed, there is support for
the view that measures of executive functioning are better at accounting for prospective
memory failures than are measures of memory (Burgess et al., 2000). In a review of the
literature, Fish, Manly, and Wilson (2010) suggested a hierarchical model whereby
episodic memory problems are likely to lead to prospective memory problems due to for­
getting task-related information (e.g., what to do, when to do it, that there is a task to do).
However, when retrospective memory functioning is adequate, other more executive
problems can lead to prospective memory failure, for example, those resulting from inad­
equate monitoring of the self and environment for retrieval cues, failure to initiate the in­
tended activity, or difficulty in applying effective strategies and managing multiple task
demands. Considering the multi-componential nature of prospective memory tasks, it is
not surprising that problems with such tasks are one of the most frequently reported
complaints when people are asked what everyday memory problems they face (Baddeley,
2004). In clinical practice, treatment of prospective memory disorders is one of
the major components of memory rehabilitation.

Nondeclarative Long-Term Memory


As mentioned previously, the term nondeclarative memory refers to memory that does not
involve explicit or conscious remembering. Squire and Zola-Morgan (1988) identified sev­
eral heterogeneous subtypes of nondeclarative memory, including procedural memory,
priming, classical conditioning, and adaptation effects. These types of memory are gener­
ally reported to be intact in even densely amnesic patients, such as the famous case of
H.M. (for a review, see Corkin, 2002). The nondeclarative abilities most relevant in terms
of memory disorders are procedural memory and priming.

Procedural Memory
Procedural memory refers to the abilities involved in learning skills or routines. These
abilities are generally tested in the laboratory by asking people to repeatedly perform an
unfamiliar perceptual or motor task, such as reading mirror-reversed words or tracing
the shape of a star with only the reflection of a mirror for visual feedback. Common real-
life examples are learning how to use a computer or to ride a bicycle. The primary char­
acteristic of this kind of learning is that it does not depend on conscious recollection; in­
stead, the learning can be demonstrated without the need to be aware of where and how
the original learning took place. For this reason, most people with memory problems
show normal or relatively normal procedural learning (Brooks & Baddeley, 1976; Cohen
et al., 1985). Some patients are known to show impaired procedural learning, particularly
those with Huntington’s disease and those with Parkinson’s disease (Osman et al., 2008;
Vakil & Herishanu-Naaman, 1998). People with Alzheimer’s disease may show a deficit
(Mitchell & Schmitt, 2006) or may not (Hirono et al., 1997).

Priming
The term priming refers to the processes whereby performance on a given task is im­
proved or biased through prior exposure or experience. Again, no conscious memory of
the previous episode is necessary. Thus, if an amnesic patient reads a list of words and is later shown word stems (the first two or three letters of each word) to complete, he or she is likely to respond with the previously studied words even though
there is no conscious or explicit memory of seeing them before (Warrington & Weiskrantz,
1968). Further, a double dissociation has been reported between priming and recognition
memory. Keane et al. (1995) reported that a patient with bilateral occipital lesions showed
impaired priming but intact recognition memory, whereas patient H.M., who had bilateral
medial temporal lesions, showed intact priming but impaired recognition memory. Hen­
son (2009), integrating such evidence from neuropsychological studies, along with experi­
mental psychological and neuroimaging studies of priming, considers that priming is a
form of memory dissociable from declarative memory, one that reflects plasticity in the form
of reduced neural activity in areas of the brain relevant to the task in question. For example, in word-reading tasks, such reductions are seen in the left inferior frontal, left inferior
temporal, and occipital regions. These reductions are thought to reflect more efficient
processing of the stimuli.

Recovery of Memory Functioning

Recovery means different things to different people. Some focus on survival rates, others
are more concerned with recovery of cognitive functions, and others only consider biolog­
ical recovery such as repair of brain structures. Some interpret recovery as the exact re­
instatement of behaviors disrupted by the brain injury (LeVere, 1980), but this state is
rarely achieved by memory-impaired people. Jennett and Bond (1975) define “good recov­
ery” on the Glasgow Outcome Scale as “resumption of normal life even though there may
be minor neurological and psychological deficits” (p. 483). This is sometimes achievable
for those with organic memory problems (Wilson, 1999). Kolb (1995) says that recovery
typically involves partial recovery of lost functioning together with considerable substitu­
tion of function, that is, compensating for the lost function through other means. Because
Kolb’s definition includes both natural recovery and compensatory approaches, it is, perhaps, more satisfactory for those of us working in rehabilitation.

TBI is the most common cause of brain damage and memory impairment in people
younger than 25 years. Some, and often considerable, recovery may be seen in people in­
curring such injury. This is likely to be fairly rapid in the early weeks and months after in­
jury, followed by a slower recovery that can continue for many years. Those with other
kinds of nonprogressive injury such as stroke (cerebrovascular accident), encephalitis,
and hypoxic brain damage may show a similar pattern of recovery, although this
typically lasts for months rather than years. For many people with severe brain damage,
recovery will be minimal, and compensatory approaches may provide the best chance of
reducing everyday problems, increasing independence, and enhancing quality of life.

Page 9 of 26
Memory Disorders

Mechanisms of recovery include resolution of edema (swelling of the brain), resolution of
diaschisis (whereby lesions cause dysfunction in other areas of the brain through shock),
plasticity (changes to the structure of the nervous system), and regeneration (regrowth of
neural tissue). Changes seen in the first few minutes (e.g., after a mild head injury) probably
reflect the resolution of temporary dysfunction that has not caused structural damage.
Changes seen within several days are more likely to be due to resolution of temporary
structural abnormalities such as edema, vascular disruption, or the depression of enzyme
metabolic activity. Recovery after several months or years is less well understood. There
are several ways in which this might be achieved, including resolution of diaschisis,
plasticity, and regeneration. Age at insult, diagnosis, the number of insults sustained by an individual, and
the premorbid status of the individual’s brain are other factors that may influence recov­
ery from brain injury. Kolb (2003) provides an overview of plasticity and recovery from
brain injury.

So much for general recovery, but what about recovery of memory functions themselves?
Although some recovery of lost memory functioning occurs in the early weeks and months
following an insult to the brain, many people remain with lifelong memory problems. Pub­
lished studies show contradictory evidence, with some individuals showing no improve­
ment and others showing considerable improvement. It is clear that we can improve on
natural recovery through rehabilitation (Wilson, 2009). Because restoration of episodic
memory is unlikely in most cases after the acute period, compensatory approaches are
the most likely to lead to changes in everyday memory functioning. Before beginning re­
habilitation, however, a thorough assessment is required, and this is addressed in the
next section.

Assessment of Memory Functioning

Assessment is the systematic collection, organization, and interpretation of information
about a person and his or her situation. It is also concerned with the prediction of behavior
in new situations (Sundberg & Tyler, 1962). Of course, the means by which we collect,
organize, and interpret this information depends on the purpose of the assessment. We
carry out assessments in order to answer questions, and the questions determine the as­
sessment procedure. A research question such as whether there are clear distinctions be­
tween immediate and delayed memory will be answered one way, whereas clinical ques­
tions such as, “what are the most frequent everyday memory problems faced by this pa­
tient?” will require a different approach.

Some questions can be answered through the use of standardized tests, others need func­
tional or behavioral assessments, and others may require specially designed procedures.
Standardized tests can help us answer questions about the nature of the memory deficit—
for example, “Does this person have an episodic memory disorder?” or “Is the memory
problem global or restricted to certain kinds of material (e.g., is memory for visual material
better than for verbal material)?” They can also help us answer questions about indi­
rect effects on memory functioning such as, “To what extent are the memory problems
due to executive, language, perceptual, or attention difficulties?” or “Is this person
depressed?” If the assessment question is, “What are this person’s memory strengths and
weaknesses?” we should assess immediate and delayed memory; verbal and visuo-spatial
memory; recall and recognition; semantic and episodic memory; explicit and implicit
memory; anterograde and retrograde amnesia; and new learning and orientation. Pub­
lished standardized tests exist for many of these functions, apart from implicit memory
and retrograde amnesia, for which one might need to design specific procedures (see Wil­
son, 2004, for a fuller discussion of the assessment of memory). Of course, memory as­
sessments should not be carried out alone—it will also be necessary to assess general in­
tellectual functioning; predict premorbid ability; assess language, reading, perception, at­
tention, and executive functioning to get a clear picture of the person’s cognitive
strengths and weaknesses; and assess anxiety, depression, and perhaps other areas of
emotion such as post-traumatic stress disorder. These assessments are needed for reha­
bilitation because the emotional consequences of memory impairment should be treated
alongside the memory and other cognitive consequences of any insult to the brain
(Williams & Evans, 2003).

When we want to answer more treatment-related questions like, “How are the memory
difficulties manifested in everyday life?” or “What coping strategies are used?” we need a
more functional or behavioral approach through observations, self-report measures (from
relatives or caregivers as well as from the memory-impaired person), or inter­
views because these are better suited to answering real-life, practical problems. The
standardized and functional assessment procedures provide complementary information:
The former allow us to build up a cognitive map of a person’s strengths and weaknesses,
whereas the latter enable us to target areas for treatment.

Rehabilitation of Memory

Rehabilitation should focus on improving aspects of everyday life and should address per­
sonally meaningful themes, activities, settings, and interactions (Ylvisaker & Feeney,
2000). Many survivors of brain injury will face problems in everyday life. These could be
difficulties with motor functioning, impaired sensation, reduced cognitive skills, emotion­
al troubles, conduct or behavioral problems, and impoverished social relationships. Some
people will have all of these. In addition to emotional and psychosocial problems, neu­
ropsychologists in rehabilitation are likely to treat cognitive difficulties, including memo­
ry deficits (Wilson et al., 2009). The main purposes of rehabilitation, including memory
rehabilitation, are to enable people with disabilities to achieve their optimal level of well-
being, to reduce the impact of their problems on everyday life, and to help them return to
their own most appropriate environments. Its purpose is not to teach individuals to score
better on tests or to learn lists of words or to be faster at detecting stimuli. Apart from
some emerging evidence that it might be possible to achieve some degree of restoration
of working memory in children with attention deficit hyperactivity disorder (Klingberg,
2006; Klingberg et al., 2005) and possibly in stroke patients (Westerberg et al., 2007), no
evidence exists for recovery of episodic memory in survivors of brain injury. Thus restora­
tion of memory functioning is at this time an unrealistic goal. There is evidence, however,
that we can help people to compensate for their difficulties, find ways to help them learn
more efficiently, and for those with very severe and widespread cognitive problems, orga­
nize the environment so that they can function without the need for memory (Wilson,
2009). These are more realistic goals for memory rehabilitation.

Helping People to Compensate for Their Memory Difficulties Through the Use of External Memory Aids
A wide variety of external memory aids exist to help people remember what has to be
done in everyday life. These range from simple Post-It notes to pill boxes, alarms, and
talking watches to very sophisticated aids such as electronic organizers and global posi­
tioning system devices. External aids may alert someone to the fact that something needs
to be done at a particular time and place, such as taking medication or collecting children from
school, or they may act as systems to store information unrelated to a particular time or
place, such as an address and telephone book. Because the use of such aids involves
memory, the people who need them most usually have the greatest difficulty in learning
to use them. Nevertheless, some people use compensatory aids well (Evans et al., 2003;
Kime et al., 1996; Wilson, 1991). If aids are to be used successfully, people need insight;
motivation; certain cognitive, emotional, and motivational characteristics; previous use of
memory aids; demands on memory; support from family, school, or work; and availability
of appropriate aids (Scherer, 2005). Several studies have looked at the efficacy of exter­
nal aids for memory-impaired people (summarized in Wilson, 2009).

One series of studies carried out in Cambridge, England, involved the use of a paging
system, “NeuroPage,” to determine whether people carry out everyday tasks more efficiently
with a pager than without one. Following a pilot study (Wilson et al., 1997), a randomized
controlled trial (cross-over design) was carried out. People were randomly allocated to pager
first (Group A) or waiting list first (Group B). All participants chose their own target be­
haviors that they needed to remember each day. Taking medication, feeding pets, and
turning on the hot water system were the kinds of target behavior selected. Most people
selected between four and seven messages each day. For 2 weeks (the baseline period),
participants recorded the frequency with which these targets were achieved each day; an
independent observer (usually a close relative) also checked to ensure the information
was accurate. In the baseline period, there were no significant differences between the
two groups. Group A participants were then provided with pagers for 7 weeks, while
Group B participants remained on the waiting list. There was a very significant improve­
ment in the targets achieved by Group A. Group B (on the waiting list) did not change.
Then Group A returned the pagers, which were given to Group B. Now Group B showed a
statistically significant improvement over baseline and waiting list periods. Group A par­
ticipants dropped back a little but were still better than baseline, showing that they had
learned many of their target behaviors during the 7 weeks with the pager (Wilson et al.,
2001). This study comprised several diagnostic groups, which have been reported
separately. People with TBI performed like the main group (Wilson et al., 2005), which is
not surprising because they were the largest subgroup. Four people with encephalitis all
improved with the pager (Emslie et al., 2007). People with stroke performed like the main
group in the baseline and treatment phases but dropped back to baseline levels when the
pagers were returned (Fish et al., 2008), possibly because they were older and had more
executive deficits as a result of ruptured aneurysms on the anterior communicating
artery. There were twelve children in the study with ages ranging from 8 to 18 years (Wil­
son et al., 2009), all of whom benefited. Approximately 80 percent of the 143 people in
the study reduced their everyday memory and planning problems. As a result of the study,
the local health authority set up a clinical service for people throughout the United King­
dom—so this was an example of research influencing clinical practice.

New Learning for Memory-Impaired People


One of the most handicapping aspects of severe memory impairment is the great difficul­
ty in learning new information. Although many think that repetition is the answer, rote re­
hearsal or simply repeating material is not a particularly good learning strategy for peo­
ple with memory deficits. We can hear or read something many times over and still not
remember it, and the information may simply “go in one ear and out the other.” There are
ways to help memory-impaired people learn more efficiently, and this is one of the main
focuses of memory rehabilitation. The method of vanishing cues, spaced retrieval,
mnemonics, and errorless learning are, perhaps, the main ways to enhance new learning.
In the method of vanishing cues, prompts are provided and then gradually faded out. For
example, someone learning a new name might be expected first to copy the whole name,
then the last letter would be deleted; the name would be copied again and the last letter
inserted by the memory-impaired person, then the last two letters would be deleted and
the process repeated until all letters were completed by the memory-impaired person.
Glisky et al. (1986) were the first to report this method with memory-impaired people.
Several studies have since been published with both patients with nonprogressive conditions
and those with dementia (see Wilson, 2009, for a full discussion).
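The fading schedule just described can be sketched in a few lines. This is an illustrative sketch only: the function name and the example name are hypothetical, and in a real protocol each cue would be presented on a separate learning trial rather than generated all at once.

```python
def vanishing_cues(word):
    """Generate the cue sequence for the method of vanishing cues.

    The complete word is presented first; on each subsequent trial one
    more letter is removed from the end, so the learner supplies an
    ever-larger portion of the word until no cue remains.
    """
    # Trial 0 shows the whole word; the final trial shows nothing.
    return [word[:len(word) - i] for i in range(len(word) + 1)]

# Hypothetical target name for illustration.
for cue in vanishing_cues("ANNA"):
    print(cue or "(no cue)")
```

In practice the fading would be contingent on success: if the learner fails at a given step, the therapist would step back to the previous, fuller cue.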

Spaced retrieval, also known as expanding rehearsal (Landauer & Bjork, 1978), is also
widely used in memory rehabilitation. This method involves the presentation of material
to be remembered, followed by immediate testing, then a very gradual lengthening of the
retention interval. Spaced retrieval may work because it is a form of distributed practice,
that is, distributing the learning trials over a period of time rather than massing them to­
gether in one block. Distributed practice is known to be more effective than massed prac­
tice (Baddeley, 1999). Camp and colleagues in the United States have used this method
extensively with dementia patients (Camp et al., 1996; 2000; McKitrick & Camp, 1993),
but it has also been used to help people with TBI, stroke, and encephalitis.
Sohlberg (2005) discusses using this method in people with learning difficulties.
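As a rough sketch, an expanding-rehearsal schedule might look like the following. The starting interval, number of trials, and doubling factor are assumptions chosen for illustration; the chapter does not prescribe specific values.

```python
def expanding_schedule(first_interval=15, n_trials=6, factor=2):
    """Return retention intervals (in seconds) for expanding rehearsal.

    Material is tested immediately after presentation, then after
    progressively longer delays; a doubling schedule is one common way
    to illustrate the gradual lengthening (the exact numbers here are
    assumptions, not values from the chapter).
    """
    delays, delay = [], first_interval
    for _ in range(n_trials):
        delays.append(delay)
        delay *= factor  # lengthen the interval after each successful recall
    return delays

print(expanding_schedule())  # 15 s, 30 s, 1 min, 2 min, 4 min, 8 min
```

A clinical version would adapt the schedule to the patient: after a failed recall, the interval is typically shortened back to the last successful delay rather than continuing to expand.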

Those of us without severe memory difficulties can benefit from trial-and-error learning.
We are able to remember our mistakes and thus can avoid making the same mistake in fu­
ture attempts. Because memory-impaired people have difficulty with this, any erroneous
response may be strengthened or reinforced. This is the rationale behind errorless learn­
ing, a teaching technique whereby the likelihood of mistakes during learning is mini­
mized as far as possible. Another way of understanding errorless learning is through the
principle of Hebbian plasticity and learning (Hebb, 1949). At a synaptic level, Hebbian
plasticity refers to increases in synaptic strength between neurons that fire together
(“neurons that fire together wire together”). Hebbian learning refers to the detection of
temporally related inputs. If an input elicits a pattern of neural activity, then, according to
the Hebbian learning rule, the tendency to activate the same pattern on subsequent occa­
sions is strengthened. This means that the likelihood of making the same response in the
future, whether correct or incorrect, is strengthened (McClelland et al., 1999). Like im­
plicit memory, Hebbian learning has no mechanism for filtering out errors.
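The point that the Hebbian rule reinforces whatever pattern occurs, correct or not, can be made concrete with a toy sketch. All values, dimensions, and the pairing of cue and response below are arbitrary illustrations, not data from the chapter.

```python
def hebbian_update(w, pre, post, lr=0.1):
    """One step of the Hebbian rule: w[i][j] += lr * post[i] * pre[j].

    Weights grow whenever two units are active together; the rule
    carries no error signal, so an erroneous response is strengthened
    exactly as a correct one would be.
    """
    return [[w_ij + lr * p_i * p_j for w_ij, p_j in zip(row, pre)]
            for row, p_i in zip(w, post)]

cue = [1.0, 0.0, 1.0]        # input pattern (arbitrary)
wrong_response = [0.0, 1.0]  # an *erroneous* output pattern
w = [[0.0] * 3 for _ in range(2)]
for _ in range(5):           # each repetition of the error reinforces it
    w = hebbian_update(w, cue, wrong_response)
# The connections from the cue to the wrong response have grown,
# making that same error more likely on the next attempt.
print(w)
```

This is the formal counterpart of the rationale for errorless learning: because the update depends only on co-activation, preventing errors during learning is the only way to stop them being wired in.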

Errors can be avoided through the provision of spoken or written instructions or guiding
someone through a particular task or modeling the steps of a procedure. There is now
considerable evidence that errorless learning is superior to trial-and-error learning for
people with severe memory deficits. In a meta-analysis of errorless learning, Kessels and
De Haan (2003) found a large and statistically significant effect size for this kind of learning
for those with severe memory deficits. The combination of errorless learning
and spaced retrieval would appear to be a powerful learning strategy for people with pro­
gressive conditions in addition to those with nonprogressive conditions (Wilson, 2009).

There has been some debate in the literature as to whether errorless learning depends on
explicit or implicit memory. Baddeley and Wilson (1994) argued that implicit memory was
responsible for the efficacy of errorless learning: Amnesic patients had to rely on implicit
memory, a system that is poor at eliminating errors (this is not to say that errorless learn­
ing is a measure of implicit memory). Nevertheless, there are alternative explanations.
For example, the errorless learning advantage could be due to residual explicit memory
processes or to a combination of both implicit and explicit systems. Hunkin et al. (1998)
argued that it is due entirely to the effects of error prevention on the residual explicit
memory capacities, and not to implicit memory at all. Specifically, they used errorless and
errorful learning protocols similar to Baddeley and Wilson (1994) to teach word lists to
people with moderate to severe and relatively specific memory impairment. They investi­
gated the errorless learning advantage in a fragment completion task intended to tap im­
plicit memory of the learned words and in a cued-recall task intended to tap explicit mem­
ory. The fragment completion task presented participants with two letters from learned
words from noninitial positions (e.g., _ _ T _ S _ ), with the task being to complete the
word. The cued-recall condition presented the first two letters of learned words (e.g., A R
_ _ _ _). In both of these examples the learned word is “ARTIST.” Hunkin et al. found that
the errorless learning advantage was only evident in the cued-recall task, with perfor­
mance in fragment completion being equivalent between errorless and errorful learning
conditions. The authors interpreted this result as meaning that the errorless learning ad­
vantage relies on residual explicit memory. Tailby and Haslam (2003) also believe that the
benefits of errorless learning are due to concurrent residual explicit memory processes, although they do
not rule out implicit memory processes altogether. They say the issue is a complex one
and that different individuals may rely on different processes. Support for this view can
also be found in a paper by Kessels et al. (2005).

Page et al. (2006) claim, however, that preserved implicit memory in the absence of ex­
plicit memory is sufficient to produce the errorless learning advantage. They challenge
the conclusions of Hunkin et al. (1998) because the design of their implicit task was such
that it was unlikely to be sensitive to implicit memory for prior errors. As is clear from the
example above, the stimuli in the fragment completion task used by Hunkin et al. did not
prime errors made during learning in the same way that the cued-recall stimuli did. To
continue with the previous example, if the errors “ARCHES” and “AROUND” had been
made during learning, neither would fit with the fragment “_ _ T _ S _.” If the errorless
learning advantage results from avoiding implicit memory of erroneous responses, then
an advantage would not be expected within this fragment completion task. Furthermore,
there was an element of errorful learning in both the errorless and errorful explicit mem­
ory conditions. They also challenge the Tailby and Haslam (2003) paper because it con­
flates two separate questions. First, is the advantage of errorless learning due to the con­
tribution of implicit memory? And second, is learning under errorless conditions due to
implicit memory? Perhaps some people do use both implicit and explicit systems when
learning material, but this does not negate the argument that the advantage (i.e., that
seen at retrieval) is due to implicit memory, particularly implicit memory for prior errors
following errorful learning. Some people with no or very little explicit recall can learn un­
der certain conditions such as errorless learning. For example, the Baddeley and Wilson
(1994) study included sixteen very densely amnesic participants with extremely little ex­
plicit memory capacity, yet nevertheless, every single one of them showed an errorless
over errorful advantage. Page et al. (2006) also found that people with moderate memory
impairment, and hence some retention of explicit memory, showed no greater advantage
from errorless learning than people with very severe memory impairment who had very
little explicit memory, which again supports the hypothesis that implicit memory is suffi­
cient to produce that advantage.
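The contrast between the two cue types can be illustrated with a small sketch. The helper function and the representation of cues are hypothetical, not the actual Hunkin et al. materials.

```python
def fits(word, fragment):
    """Check whether a word is consistent with a letter fragment.

    A fragment is a list with None for blanks; e.g., the cue
    "_ _ T _ S _" for ARTIST becomes [None, None, 'T', None, 'S', None].
    """
    return (len(word) == len(fragment) and
            all(f is None or f == c for c, f in zip(word, fragment)))

fragment = [None, None, "T", None, "S", None]   # noninitial letters of ARTIST
cued     = ["A", "R", None, None, None, None]   # first two letters (cued recall)

print(fits("ARTIST", fragment))  # the target fits the fragment cue
print(fits("ARCHES", fragment))  # a prior error does not fit it
print(fits("ARCHES", cued))      # but the cued-recall stimulus can re-elicit that error
```

This makes the critical asymmetry explicit: erroneous responses such as "ARCHES" or "AROUND" remain consistent with the cued-recall stimulus but not with the noninitial-letter fragment, so only the cued-recall task could be primed by earlier mistakes.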

Mnemonics are systems that enable us to remember things more easily and usually refer
to internal strategies, such as reciting a rhyme to remember how many days there are in
a month, or remembering the order of information such as “My very elderly mother just
sat upon a new pin” to remember the order of the planets around the sun (where my
stands for Mercury, very for Venus, elderly for Earth, etc). Although verbal and visual
mnemonic systems have been used successfully with memory-impaired people (Wilson,
2009), not everyone can use them. Instead of expecting memory-impaired people to use
mnemonics spontaneously, therapists may need to employ them to help their patients
achieve faster learning for particular pieces of information, such as names of a few people or a new address.

Modifying the Environment for Those with Severe and Widespread Cogni­
tive Deficits
People with very severe memory difficulties and widespread cognitive problems may be
unable to learn compensatory strategies and have major difficulties learning new episodic
information. They may be able to learn things implicitly. C.W., for example, the musician
who has one of the most severe cases of amnesia on record (Wearing, 2005; Wilson et al.,
2008), is unable to lay down any new episodic memories but has learned some things
implicitly. Thus, if he is asked “Where is the kitchen?” he says he does not know, but if asked
if he would like to go and make himself some coffee, he will go to the kitchen without
error. He has implicit but no explicit memory of how to find the kitchen. For such people,
our only hope of improving quality of life and giving them as much independence as possi­
ble is probably to organize or structure the environment so that they can function without
the need for memory. People with severe memory problems may not be handicapped in
environments where there are no demands made on memory. Thus, if doors, closets,
drawers, and storage jars are clearly labeled, if rooms are cleared of dangerous equip­
ment, if someone appears to remind or accompany the memory-impaired person when it
is time to go to the hairdresser or to have a meal, the person may cope reasonably well.

Kapur et al. (2004) give other examples. Items can be left by the front door for people
who forget to take belongings with them when they leave the house; a message can be
left on the mirror in the hallway; and a simple flow chart can be used to help people
search in likely places when they cannot find a lost belonging (Moffat, 1989). Modifica­
tions can also be made to verbal environments to avoid irritating behavior such as the
repetition of a question, story, or joke. It might be possible to identify a “trigger” or an an­
tecedent that elicits this behavior. Thus, by eliminating the “trigger,” one can avoid the
repetitious behavior. For example, in response to the question “How are you feeling to­
day?” one young brain-injured man would say “Just getting over my hangover.” If staff
simply said “Good morning,” however, he replied “Good morning,” so the repetitious com­
ments about his supposed hangover were avoided.

Hospitals and nursing homes, in addition to other public spaces such as shopping cen­
ters, may use color coding, signs, and other warning systems to reduce the chances of
getting lost. “Smart houses” are already in existence to help “disable the disabling envi­
ronment” described by Wilson and Evans (2000). Some of the equipment used in smart
houses can be employed to help the severely memory-impaired patient survive more easi­
ly. For example, “photo phones” are telephones with large buttons (available from ser­
vices for visually impaired people); each button can be programmed to dial a particular
number, and a photo of the person who owns that number can be pasted onto the large
button. Thus, if a memory-impaired person wants to dial her daughter or her district
nurse, she simply presses the right photograph, and the number is automatically dialed.

Emotional Consequences of Memory Impairment


In addition to their memory problems, many memory-impaired people have other cogni­
tive deficits such as impaired attention, word-finding problems, and difficulties with plan­
ning, judgment, and reasoning, and they may also suffer emotional disorders such as anx­
iety, depression, mood swings, and anger problems. When neuropsychological rehabilita­
tion programs address the cognitive, emotional, and psychosocial consequences of brain
injury, patients experience less emotional distress, increased self-esteem, and greater
productivity (Prigatano, 1994; Prigatano et al., 1999).

Treatment for emotional difficulties includes psychological support for individuals and for
groups (Wilson et al., 2009). Individual psychological support is mostly derived from cog­
nitive behavior therapy, which is now very much part of neuropsychological rehabilitation
programs, particularly in the United Kingdom (Gracey et al., 2009). Tyerman and King
(2004) provide suggestions on how to adapt psychotherapy and cognitive behavior thera­
py for those with memory problems. Notes, audiotapes, and videotapes of sessions, fre­
quent repetitions, mini-reviews, telephone reminders to complete homework tasks, and
use of family members as co-therapists can all help to circumvent the difficulties posed by
impaired retention of the therapeutic procedures.

Group therapy can also be of great assistance in helping to reduce anxiety and other emo­
tional difficulties. Memory-impaired people often benefit from interaction with others
having similar problems. Those who fear they are losing their sanity may have
their fears allayed through the observation of others with similar problems. Groups can
reduce anxiety and distress; they can instill hope and show patients that they are not
alone; and it may be easier to accept advice from peers than from therapists or easier to
use strategies that peers are using rather than strategies recommended by professional
staff (Evans, 2009; Malley et al., 2009; Wilson, 2009).

Conclusions
1. Memory disorders can be classified in a number of ways, including the amount of
time for which information is stored, the type of information stored, the type of mate­
rial to be remembered, the modality being employed, the stages involved in the mem­
ory process, explicit and implicit memory, recall and recognition, retrospective and
prospective memory, and anterograde and retrograde amnesia.
2. Some recovery of memory functioning can be expected after an insult to the brain,
especially in the early days, weeks, and months after nonprogressive damage. Age at
insult, diagnosis, the number of insults sustained by the individual, and the premor­
bid status of the individual’s brain are just a few of the factors influencing recovery.
Some people will remain with lifelong memory impairment. There is no doubt that we
can improve on natural recovery through rehabilitation. Given that restoration
of episodic, explicit memory is unlikely in most cases after the acute period,
compensatory approaches are the most likely to lead to change in everyday memory
functioning.
3. Before planning treatment for someone with memory difficulties, a detailed as­
sessment should take place. This should include a formal neuropsychological assess­
ment of all cognitive abilities, including memory, to build up a picture of a person’s
cognitive strengths and weaknesses. In addition, assessment of emotional and psy­
chosocial functioning should be carried out. Standardized tests should be comple­
mented with observations, interviews, and self-report measures.
4. Once the assessment has been carried out, one can design a rehabilitation pro­
gram. One of the major ways of helping people with memory problems cope in every­
day life is to enable them to compensate through the use of external aids; we can al­
so help them learn more efficiently, and for those who are very severely impaired, we
may need to structure or organize the environment to help them function without a
memory. We can also provide support and psychotherapy to address the emotional
consequences of memory impairment.
5. Rehabilitation can help people to compensate for, bypass, or reduce their everyday
problems and thus survive more efficiently in their own most appropriate environ­
ments. Rehabilitation makes clinical and economic sense and should be widely avail­
able to all those who need it.

References
Atkinson, R. C., & Shiffrin, R. M. (1971). The control of short-term memory. Scientific
American, 225, 82–90.

Baddeley, A. D. (1986). Working memory. Oxford, UK: Clarendon Press.

Baddeley, A. D. (1992). Memory theory and memory therapy. In B. A. Wilson & N. Moffat
(Eds.), Clinical management of memory problems (2nd ed., pp. 1–31). London: Chapman
& Hall.

Baddeley, A. D. (1997). Human memory: Theory and practice (revised edition). Hove, UK: Psychology Press.

Baddeley, A. D. (1999). Essentials of human memory. Hove, UK: Psychology Press.

Baddeley, A. D. (2000). The episodic buffer: A new component of working memory? Trends in Cognitive Sciences, 4 (11), 417–423.

Baddeley, A. D. (2004). The psychology of memory. In A. D. Baddeley, M. D. Kopelman, & B. A. Wilson (Eds.), The essential handbook of memory disorders for clinicians (pp. 1–14). Chichester, UK: John Wiley & Sons.

Baddeley, A. D., & Hitch, G. J. (1974). Working memory. In G. H. Bower (Ed.), The psychol­
ogy of learning and motivation: Advances in research and theory (pp. 47–89). New York:
Academic Press.

Baddeley, A. D., & Wilson, B. A. (1988a). Comprehension and working memory: A single
case neuropsychological study. Journal of Memory and Language, 27 (5), 479–498.

Baddeley, A. D., & Wilson, B. A. (1988b). Frontal amnesia and the dysexecutive syndrome.
Brain and Cognition, 7 (2), 212–230.

Baddeley, A. D., & Wilson, B. A. (1994). When implicit learning fails: Amnesia and the
problem of error elimination. Neuropsychologia, 32 (1), 53–68.

Berry, E. (2007). Using SenseCam, a wearable camera, to alleviate autobiographical memory loss. Talk given at the British Psychological Society Annual Conference, York, UK.

Brooks, D. N., & Baddeley, A. D. (1976). What can amnesic patients learn? Neuropsycholo­
gia, 14 (1), 111–122.

Burgess, P. W., Veitch, E., de Lacy Costello, A., & Shallice, T. (2000). The cognitive and
neuroanatomical correlates of multitasking. Neuropsychologia, 38 (6), 848–863.

Camp, C. J., Bird, M., & Cherry, K. (2000). Retrieval strategies as a rehabilitation aid for
cognitive loss in pathological aging. In R. D. Hill, L. Bäckman, & A. Stigsdotter-
Neely (Eds.), Cognitive rehabilitation in old age (pp. 224–248). New York: Oxford Univer­
sity Press.

Camp, C. J., Foss, J. W., Stevens, A. B., & O’Hanlon, A. M. (1996). Improving prospective
memory performance in persons with Alzheimer’s disease. In M. A. Brandimonte, G.O.
Einstein, & M. A. McDaniel (Eds.), Prospective memory: Theory and application (pp. 351–
367). Mahwah, NJ: Erlbaum.

Cermak, L. S., & O’Connor, M. (1983). The anterograde and retrograde retrieval ability of
a patient with amnesia due to encephalitis. Neuropsychologia, 21 (3), 213–234.

Clare, L. (2008). Neuropsychological rehabilitation and people with dementia. Hove, UK:
Psychology Press.

Clare, L., Wilson, B. A., Breen, K., & Hodges, J. R. (1999). Errorless learning of face-name
associations in early Alzheimer’s disease. Neurocase, 5, 37–46.

Clare, L., Wilson, B. A., Carter, G., Breen, K., Gosses, A., & Hodges, J. R. (2000). Interven­
ing with everyday memory problems in dementia of Alzheimer type: An errorless learning
approach. Journal of Clinical and Experimental Neuropsychology, 22 (1), 132–146.

Clare, L., & Woods, R. T. (2004) Cognitive training and cognitive rehabilitation for people
with early-stage Alzheimer’s disease: A review. Neuropsychological Rehabilitation, 14,
385–401.

Cohen, N. J., Eichenbaum, H., Deacedo, B. S., & Corkin, S. (1985). Different memory sys­
tems underlying acquisition of procedural and declarative knowledge. Annals of the New
York Academy of Sciences, 444, 54–71.

Corkin, S. (2002). What’s new with the amnesic patient H.M.? Nature Reviews Neuro­
science, 3, 153–160.

Della Sala, S., & Logie, R. H. (2002). Neuropsychological impairments of visual and spa­
tial working memory. In A. D. Baddeley, M. D. Kopelman, & B. A. Wilson (Eds.), The hand­
book of memory disorders (pp. 271–292). Chichester, UK: John Wiley & Sons.

Diana, R. A., Yonelinas, A. P., & Ranganath, C. (2007). Imaging recollection and familiarity
in the medial temporal lobe: a three-component model. Trends in Cognitive Sciences, 11,
379–386.

Emslie, H., Wilson, B. A., Quirk, K., Evans, J., & Watson, P. (2007). Using a paging system
in the rehabilitation of encephalitic patients. Neuropsychological Rehabilitation, 17, 567–
581.

Page 19 of 26
Memory Disorders

Evans, J. J. (2005). Can executive impairments be effectively treated? In P. W. Halligan &


D. Wade (Eds.), The effectiveness of rehabilitation for cognitive deficits (pp. 247–256). Ox­
ford, UK: Oxford University Press.

Evans, J. J. (2009). The cognitive group part two: Memory. In B. A. Wilson, F. Gracey, J. J.
Evans, & A. Bateman (Eds.), Neuropsychological rehabilitation: Theory, therapy and out­
comes (pp. 98–111). Cambridge, UK: Cambridge University Press.

Evans, J. J., Wilson, B. A., Needham, P., & Brentnall, S. (2003). Who makes good use of
memory aids: Results of a survey of 100 people with acquired brain injury. Journal of the
International Neuropsychological Society, 9 (6), 925–935.

Fish, J., Manly, T., Emslie, H., Evans, J. J., & Wilson, B. A. (2008). Compensatory strategies
for acquired disorders of memory and planning: Differential effects of a paging system for
patients with brain injury of traumatic versus cerebrovascular aetiology. Journal of Neu­
rology, Neurosurgery and Psychiatry, 79, 930–935.

Fish, J., Wilson, B. A., & Manly, T. (2010). The assessment and rehabilitation of prospec­
tive memory problems in people with neurological disorders: A review. Neuropsychologi­
cal Rehabilitation, 20, 161–179.

Fletcher, P. C., & Henson, R. N. A. (2001). Frontal lobes and human memory: Insights
from functional neuroimaging. Brain, 124, 849–881.

Glisky, E. L., Schacter, D. L., & Tulving, E. (1986). Computer learning by memory-im­
paired patients: Acquisition and retention of complex knowledge. Neuropsychologia, 24
(3), 313–328.

Gracey, F., Yeates, G., Palmer, S., & Psaila, K. (2009) The psychological support group. In
B. A. Wilson, F. Gracey, J. J. Evans, & A. Bateman (Eds.), Neuropsychological Rehabilita­
tion: Theory, therapy and outcomes (pp. 123–137). Cambridge, UK: Cambridge University
Press.

Graham, K. S., Simons, J.S., Pratt, K. H., Patterson, K., & Hodges, J. R. (2000). Insights
from semantic dementia on the relationship between episodic and semantic memory. Neu­
ropsychologia, 38, 313–324.

Groot, Y. C. T., Wilson, B. A., Evans, J. J., & Watson, P. (2002). Prospective memory func­
tioning in people with and without brain injury. Journal of the International Neuropsycho­
logical Society, 8 (05), 645–654.

Hebb, D. O. (1949). The organization of behavior: A neuropsychological theory. Chich­


ester, UK: Wiley.

Henson, R. N. (2009). Priming. In L. Squire, T. Albright, F. Bloom, F. Gage & N. Spitzer


(Eds.), New encyclopedia of neuroscience (pp. 1055–1063). Online: Elsevier.

Page 20 of 26
Memory Disorders

Hirono, N., Mori, E., Ikejiri, Y., Imamura, T., Shimomura, T., Ikeda, M., et al. (1997). Pro­
cedural memory in patients with mild Alzheimer’s disease. Dementia and Geriatric Cogni­
tive Disorders, 8 (4), 210–216.

Hunkin, N. M., Squires, E. J., Parkin, A. J., & Tidy, J. A. (1998). Are the benefits of error­
less learning dependent on implicit memory? Neuropsychologia, 36 (1), 25–36.

Jennett, B., & Bond, M. (1975). Assessment of outcome after severe brain damage.
Lancet, 1 (7905), 480–484.

Kapur, N. (1993). Focal retrograde amnesia in neurological disease: A critical review. Cor­
tex; a Journal Devoted to the Study of the Nervous System and Behavior, 29 (2), 217–234.

Kapur, N. (1999). Syndromes of retrograde amnesia: A conceptual and empirical synthe­


sis. Psychological Bulletin, 125, 800–825.

Kapur, N., Glisky, E. L., & Wilson, B. A. (2004). Technological memory aids for people with
memory deficits. Neuropsychological Rehabilitation, 14 (1/2), 41–60.

Keane, M. M., Gabrieli, J. D., Mapstone, H. C., Johnson, K. A., & Corkin, S. (1995). Double
dissociation of memory capacities after bilateral occipital-lobe or medial temporal-lobe le­
sions. Brain, 119, 1129–1148.

Kessels, R. P. C., Boekhorst, S. T., & Postma, A. (2005). The contribution of implicit and ex­
plicit memory to the effects of errorless learning: A comparison between young and older
adults. Journal of the International Neuropsychological Society, 11 (2), 144–151.

Kessels, R. P. C., & de Haan, E. H. F. (2003). Implicit learning in memory rehabilitation: A


meta-analysis on errorless learning and vanishing cues methods. Journal of Clinical and
Experimental Neuropsychology, 25 (6), 805–814.

Kime, S. K., Lamb, D. G., & Wilson, B. A. (1996). Use of a comprehensive program of ex­
ternal cuing to enhance procedural memory in a patient with dense amnesia. Brain Injury,
10, 17–25.

Klingberg, T. (2006). Development of a superior frontalintraparietal network for


(p. 486)

visuo-spatial working memory. Neuropsychologia, 44 (11), 2171–2177.

Klingberg, T., Fernell, E., Olesen, P., Johnson, M., Gustafsson, P., Dahlström, K., Gillberg,
C. G., Forssberg, H., & Westerberg, H. (2005). Computerized training of working memory
in children with ADHD—a randomized, controlled trial. Journal of the American Academy
of Child and Adolescent Psychiatry, 44 (2), 177–186.

Kolb, B. (1995). Brain plasticity and behaviour. Hillsdale, NJ: Erlbaum.

Kolb, B. (2003). Overview of cortical plasticity and recovery from brain injury. Physical
Medicine and Rehabilitation Clinics of North America, 14 (1), S7–S25.

Page 21 of 26
Memory Disorders

Kopelman, M. D. (2000). Focal retrograde amnesia and the attribution of causality: An ex­
ceptionally critical view. Cognitive Neuropsychology, 17 (7), 585–621.

Kopelman, M. D. (2004). Psychogenic amnesia. In A. D. Baddeley, M. D. Kopelman, & B. A.


Wilson (Eds.), The essential handbook of memory disorders for clinicians (pp. 69–90).
Chichester, UK: Wiley.

Kopelman, M. D., Bright, P., Buckman, J., Fradera, A., Yoshimasu, H., Jacobson, C., &
Colchester, A. C. F. (2007). Recall and recognition memory in amnesia: Patients with hip­
pocampal, medial temporal, temporal lobe or frontal pathology. Neuropsychologia, 45,
1232–1246.

Kopelman, M. D., & Stanhope, N. (1998). Recall and recognition memory in patients with
focal frontal, temporal lobe and diencephalic lesions. Neuropsychologia, 37, 939–958.

Lambon Ralph, M. A., & Patterson, K. (2008). Generalization and differentiation in seman­
tic memory: Insights from semantic dementia. Annals of the New York Academy of
Sciences, 1124, 61–76.

Landauer, T. K., & Bjork, R. A. (1978). Optimum rehearsal patterns and name learning. In
M. M. Gruneberg, P. Morris, & R. N. Sykes (Eds.), Practical aspects of memory (pp. 625–
632). London: Academic Press.

LeVere, T. E. (1980). Recovery of function after brain damage: A theory of the behavioral
deficit. Physiological Psychology, 8, 297–308.

Malley, D., Bateman, A., & Gracey, F. (2009). Practically based project groups. In B. A.
Wilson, F. Gracey, J. J. Evans, & A. Bateman (Eds.), Neuropsychological rehabilitation:
Theory, therapy and outcomes (pp. 164–180). Cambridge, UK: Cambridge University
Press.

McClelland, J. L., Thomas, A. G., McCandliss, B. D., & Fiez, J. A. (1999). Understanding
failures of learning: Hebbian learning, competition for representational space, and some
preliminary experimental data. Progress in Brain Research, 121, 75–80.

McDonald, C., Bauer, R., Filoteo, J., Grande, L., Roper, S., & Gilmore, R. (2006). Episodic
memory in patients with focal frontal lobe lesions. Cortex, 42, 1080–1092.

McKitrick, L. A., & Camp, C. J. (1993). Relearning the names of things: The spaced-re­
trieval intervention implemented by a caregiver. Clinical Gerontologist, 14 (2), 60–62.

Miller, G. A. (1956). The magical number seven plus or minus two: Some limits on our ca­
pacity for processing information. Psychological Review, 63 (2), 81–97.

Milner, B. (1965). Visually-guided maze learning in man: Effects of bilateral hippocampal,


bilateral frontal, and unilateral cerebral lesions. Neuropsychologia, 3 (3), 17–338.

Page 22 of 26
Memory Disorders

Milner, B. (1968). Visual recognition and recall after right temporal lobe excision in man.
Neuropsychologia, 6, 191–209.

Milner, B. (1971). Interhemispheric differences in the localisation of psychological


processes in man. British Medical Bulletin, 27 (3), 272–277.

Mitchell, D. B., & Schmitt, F. A. (2006). Short- and long-term implicit memory in aging
and Alzheimer’s disease. Neuropsychology, Development, and Cognition. Section B, Ag­
ing, Neuropsychology and Cognition, 13 (3-4), 611–635.

Moffat, N. (1989). Home-based cognitive rehabilitation with the elderly. In L. W. Poon, D.


C. Rubin, & B. A. Wilson (Eds.), Everyday cognition in adulthood and later life (pp. 659–
680). Cambridge, UK: Cambridge University Press.

Mummery, C. J., Patterson, K., Price, C. J., Ashburner, J., Frackowiak, R. S. J., & Hodges, J.
R. (2000). A voxelbased morphometry study of semantic dementia: relationship between
temporal lobe atrophy and semantic memory. Annals of Neurology, 47, 36–45.

Osman, M., Wilkinson, L., Beigi, M., Castaneda, C. S., & Jahanshahi, M. (2008). Patients
with Parkinson’s disease learn to control complex systems via procedural as well as non-
procedural learning. Neuropsychologia, 46 (9), 2355–2363.

Page, M., Wilson, B. A., Shiel, A., Carter, G., & Norris, D. (2006). What is the locus of the
errorless-learning advantage? Neuropsychologia, 44 (1), 90–100.

Prigatano, G. P. (1999). Principles of neuropsychological rehabilitation. New York: Oxford


University Press.

Prigatano, G. P., Klonoff, P. S., O’Brien, K. P., Altman, I. M., Amin, K., Chiapello, D., et al.
(1994). Productivity after neuropsychologically oriented milieu rehabilitation. Journal of
Head Trauma Rehabilitation, 9 (1), 91.

Scherer, M. (2005). Assessing the benefits of using assistive technologies and other sup­
ports for thinking, remembering and learning. Disability and Rehabilitation, 27 (13), 731–
739.

Shallice, T., & Burgess, P. W. (1991). Higher-order cognitive impairments and frontal lobe
lesions in man. In H. S. Levin, H. M. Eisenberg, & A. L. Benton (Eds.), Frontal lobe func­
tion and dysfunction (pp. 125–138). New York: Oxford University Press.

Snowden, J. S. (2002). Disorders of semantic memory. In A. D. Baddeley, M. D. Kopelman,


& B. A. Wilson (Eds.), The handbook of memory disorders. (pp. 293–314). Chichester, UK:
John Wiley & Sons.

Snowden, J. S., Neary, D., Mann, D. M., Goulding, P. J., & Testa, H. J. (1992). Progressive
language disorder due to lobar atrophy. Annals of Neurology, 31 (2), 174–183.

Page 23 of 26
Memory Disorders

Sohlberg, M. M. (2005). External aids for management of memory impairment. In W.


High, A. Sander, K. M. Struchen, & K. A. Hart (Eds.), Rehabilitation for traumatic brain in­
jury (pp. 47–70). New York: Oxford University Press.

Squire, L. R., & Zola-Morgan, S. (1988). Memory: brain systems and behavior. Trends in
neurosciences, 11 (4), 170–175.

Stilwell, P., Stilwell, J., Hawley, C., & Davies, C. (1999). The national traumatic brain in­
jury study: Assessing outcomes across settings. Neuropsychological Rehabilitation, 9 (3),
277–293.

Stuss D. T., & Alexander, M. P. (2005). Does damage to the frontal lobes produce impair­
ment in memory? Current Directions in Psychological Science, 14, 84–88.

Sundberg, N. D., & Tyler, L. E. (1962). Clinical psychology. New York: Appleton-Century-
Crofts.

Tailby, R., & Haslam, C. (2003). An investigation of errorless learning in memory-impaired


patients: improving the technique and clarifying theory. Neuropsychologia, 41 (9), 1230–
1240.

Tulving, E. (1972). Episodic and semantic memory. In E. Tulving & W. Donaldson


(p. 487)

(Eds.), Organization of memory (pp. 381–402). New York: Academic Press.

Tulving, E. (1995). Organization of memory: Quo vadis? In M. S. Gazzaniga (Ed.), The cog­
nitive neurosciences (pp. 839–847). Cambridge, MA: MIT Press.

Tulving, E. (2002). Episodic memory: From mind to brain. Annual Review of Psychology,
53, 1–25.

Tyerman, A., & King, N. (2004). Interventions for psychological problems after brain in­
jury. In L. H. Goldstein & J. E. McNeil (Eds.), Clinical neuropsychology: A practical guide
to assessment and management for clinicians (pp. 385–404). Chichester, UK: John Wiley
& Sons.

Vakil, E., & Herishanu-Naaman, S. (1998). Declarative and procedural learning in


Parkinson’s disease patients having tremor or bradykinesia as the predominant symptom.
Cortex; a Journal Devoted to the Study of the Nervous System and Behavior, 34 (4), 611–
620.

Vallar, G., & Papagno, C. (2002). Neuropsychological impairments of verbal short-term


memory. In Handbook of memory disorders (2nd ed., pp. 249–270). Chichester, UK: Wiley.

Warrington, E. K. (1975). The selective impairment of semantic memory. Quarterly Jour­


nal of Experimental Psychology, 27 (4), 635–657.

Warrington, E. K., & Weiskrantz, L. (1968). A study of learning and retention in amnesic
patients. Neuropsychologia, 6 (3), 283–291.

Page 24 of 26
Memory Disorders

Wearing, D. (2005). Forever today: A memoir of love and amnesia. London: Doubleday.

Westerberg, H., Jacobaeus, H., Hirvikoski, T., Clevberger, P., Ostensson, M. L., Bartfai, A.,
& Klingberg, T. (2007). Computerized working memory training after stroke—a pilot
study. Brain Injury, 21 (1), 21–29.

Williams, W. H., & Evans, J. J. (2003). Biopsychosocial approaches in neurorehabilitation:


Assessment and management of neuropsychiatric, mood and behaviour disorders. Special
Issue of Neuropsychological Rehabilitation, 13, 1–336.

Wilson, B. A. (1991). Long-term prognosis of patients with severe memory disorders. Neu­
ropsychological Rehabilitation, 1 (2), 117–134.

Wilson, B. A. (1997). Semantic memory impairments following non progressive brain in­
jury a study of four cases. Brain Injury, 11 (4), 259–270.

Wilson, B. A. (1999). Case studies in neuropsychological rehabilitation (p. 384). New York:
Oxford University Press.

Wilson, B. A. (2004). Assessment of memory disorders. In A. D. Baddeley, M. D. Kopel­


man, & B. A. Wilson (Eds.), The essential handbook of memory disorders for clinicians
(pp. 159–178). Chichester, UK: John Wiley & Sons.

Wilson, B. A. (2009). Memory rehabilitation: Integrating theory and practice. New York:
Guilford Press.

Wilson, B. A., & Baddeley, A. D. (1993). Spontaneous recovery of impaired memory span:
Does comprehension recover? Cortex, 29 (1), 153–159.

Wilson, B. A., Baddeley, A. D., & Young, A. W. (1999). LE, a person who lost her “mind’s
eye.” Neurocase, 5 (2), 119–127.

Wilson, B. A., Emslie, H. C., Quirk, K., & Evans, J. J. (2001). Reducing everyday memory
and planning problems by means of a paging system: A randomised control crossover
study. Journal of Neurology, Neurosurgery and Psychiatry, 70 (4), 477–482.

Wilson, B. A., Emslie, H., Quirk, K., Evans, J., & Watson, P. (2005). A randomised control
trial to evaluate a paging system for people with traumatic brain injury. Brain Injury, 19,
891–894.

Wilson, B. A., & Evans, J. J. (2000). Practical management of memory problems. In G. E.


Berrios & J. R. Hodges (Eds.), Memory disorders in psychiatric practice (pp. 291–310).
Cambridge, UK: Cambridge University Press.

Wilson, B. A., Evans, J. J., Emslie, H., & Malinek, V. (1997). Evaluation of NeuroPage: A
new memory aid. Journal of Neurology, Neurosurgery and Psychiatry, 63, 113–115.

Wilson, B. A., Evans, J. J., Gracey, F., & Bateman, A. (2009). Neuropsychological rehabilita­
tion: Theory, therapy and outcomes. Cambridge, UK: Cambridge University Press.
Page 25 of 26
Memory Disorders

Wilson, B. A., Fish, J., Emslie, H. C., Evans, J. J., Quirk, K., & Watson, P. (2009). The Neu­
roPage system for children and adolescents with neurological deficits. Developmental
Neurorehabilitation, 12, 421–426.

Wilson, B. A., Kopelman, M. D., & Kapur, N. (2008). Prominent and persistent loss of self-
awareness in amnesia: Delusion, impaired consciousness or coping strategy? Neuropsy­
chological Rehabilitation, 18, 527–540.

Ylvisaker, M., & Feeney, T. (2000). Reconstruction of identity after brain injury. Brain Im­
pairment, 1 (1), 12–28.

Notes:

(1). The two sensory memory systems most studied are visual or iconic memory and auditory or echoic memory. People with disorders in the sensory memory systems would usually be considered to have a visual or an auditory perceptual disorder rather than a memory impairment and thus are beyond the scope of this chapter.

Barbara Wilson

Barbara A. Wilson, MRC Cognition and Brain Sciences Unit, Cambridge, UK

Jessica Fish

Jessica Fish, MRC Cognition and Brain Sciences Unit, Cambridge, UK

Cognitive Neuroscience of Written Language: Neural Substrates of Reading and Writing
Kyrana Tsapkini and Argye Hillis
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn

Print Publication Date: Dec 2013 Subject: Psychology, Cognitive Neuroscience
Online Publication Date: Dec 2013 DOI: 10.1093/oxfordhb/9780199988693.013.0024

Abstract and Keywords

Spelling and reading are evolutionarily relatively new functions, and therefore it is plausi­
ble that they are accomplished by engaging neural networks initially devoted to other
functions, such as object recognition in the case of reading. However, there are unique
aspects of these complex tasks, such as the fact that certain words (e.g., “regular” words)
never previously encountered can often be read accurately. Furthermore, spelling in
many ways seems to be simply the reverse of reading, but the relationship is not quite so
simple, at least in English. For example, the spoken word “lead” can be spelled led or
lead, but the printed word lead can be pronounced like “led” or “lead” (rhyming with
“feed”). Therefore, there may be some unique areas of the brain devoted to certain com­
ponents of reading or spelling. This chapter reviews the cognitive processes underlying
these tasks as well as areas of the brain that are thought to be necessary for these com­
ponent processes (e.g., on the basis of individuals who are impaired in each component
because of lesions in a particular area) and areas of the brain engaged in each compo­
nent on the basis of functional imaging studies showing neural activation associated with
a particular type of processing.

Keywords: reading, writing, dyslexia, dysgraphia, neuroimaging

Introduction
In this chapter, we tackle the issue of correspondence between cognitive and neural sub­
strates underlying the processes of reading and spelling. The main questions we explore
are how the brain reads and spells and how the cognitive architecture corresponds to the
neural one. We focus our discussion on the role of the following areas of the left hemi­
sphere that have been found to be involved in the processes of reading and spelling in le­
sion and functional neuroimaging studies: the inferior frontal gyrus (Brodmann area [BA]
44/45), fusiform gyrus (inferior and mesial part of BA 37), angular gyrus (BA 39), supra­
marginal gyrus (BA 40), and superior temporal gyrus (BA 22). We also discuss the role of

some other areas that seem to be involved only in reading or spelling. The particular is­
sues we address are (1) the distinction between areas that are engaged in, versus neces­
sary for, reading and spelling, as shown by functional neuroimaging and lesion studies;
(2) the unimodality versus multimodality of each area; and (3) whether areas are dedicat­
ed to one or more cognitive processes. The question of common regions for reading and
spelling is not the focus of this chapter, but we summarize recent papers that deal with
this particular issue (Philipose et al., 2007; Rapcsak, 2007; Rapp & Lipka, 2011). First, we
present what is known about the cognitive architecture of reading (recognition/compre­
hension) and spelling (production) as derived from classic case studies in the tradition of
cognitive neuropsychology as well as relative computational (nonrepresentational) mod­
els, briefly exposing the rationale for postulating each cognitive module or process. Sub­
sequently, we connect what we know about neural processes (p. 492) in reading and writ­
ing with the cognitive processes described earlier.

Cognitive Architecture of Reading and Spelling


Investigators have tried to integrate the processes of reading and spelling of the literate
mind in a single cognitive model, in both cognitive neuropsychological accounts (see Colt­
heart et al., 1980; Ellis & Young, 1988) and connectionist accounts (see Seidenberg & Mc­
Clelland, 1989; Plaut & Shallice, 1993). In these models, each cognitive component pro­
posed is derived from evidence from neurologically impaired subjects. Although compo­
nents in these models are depicted as independent, there is ample evidence that these
modules interact and that the processing is parallel rather than strictly serial. Figure 24.1
represents a cognitive model of reading and spelling according to a representational ac­
count. We chose an integrative model for reading and spelling because we will not ex­
plore the issue of shared versus independent components between reading and spelling
that has been addressed elsewhere (see Tainturier & Rapp, 2001, for a discussion).


Figure 24.1 A representational model of reading and spelling.

Although both representational and connectionist models represent dynamic systems, es­
pecially in their more recent versions, the main difference between them is that in repre­
sentational accounts, reading and spelling of familiar versus unfamiliar words are accom­
plished through two distinct routes or mechanisms, whereas in connectionist architec­
ture, they are accomplished through the same process. The two routes in the representa­
tional account are a lexical route, where stored (learned) representations of the words
(phonological, semantic, and orthographic) may be accessed and available for reading or
spelling; and a sublexical route, which converts graphemes (abstract letter identities) to
phonemes (speech sounds) in reading and phonemes to graphemes in writing. The lexical
route is proposed to be used for reading and spelling irregular and low-frequency words.
For example, to read yacht, one would access the orthographic representation (learned
spelling) of yacht, the semantic representation (learned meaning) of yacht, and the
phonological representation (learned pronunciation) of yacht. The sublexical (orthogra­
phy-to-phonology conversion) route is proposed to be used for decoding nonwords or
first-encountered words, such as unfamiliar proper names. Both routes may interact and
operate in parallel; that is, there is likely to be functional interaction between processes
and components (Hillis & Caramazza, 1991; Rapp et al., 2002). Hence, the sublexical
mechanisms (e.g., grapheme-to-phoneme conversion) might contribute (along with se­
mantic information) to accessing stored lexical representations for output.
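The division of labor between the two routes can be caricatured in a few lines of Python. This is a minimal sketch of the representational account only: the miniature lexicon, the pseudo-phonological forms, and the grapheme-to-phoneme rules below are invented for illustration and are not drawn from the chapter or from any serious model of English.

```python
# Toy sketch of the dual-route (representational) account described above.
# Lexicon entries and grapheme-to-phoneme rules are invented for illustration.

LEXICON = {
    # orthographic form -> stored (learned) phonological form
    "yacht": "/jot/",   # irregular: sublexical rules alone would fail here
    "mint": "/mint/",
}

# A few regular grapheme-to-phoneme correspondences (sublexical route).
GP_RULES = {"m": "m", "i": "i", "n": "n", "t": "t", "h": "h"}

def read_aloud(letter_string):
    """Pronounce via the lexical route when the word is familiar,
    otherwise assemble a pronunciation via the sublexical route."""
    if letter_string in LEXICON:
        return LEXICON[letter_string]                    # lexical route
    phonemes = [GP_RULES[g] for g in letter_string]      # sublexical route
    return "/" + "".join(phonemes) + "/"

print(read_aloud("yacht"))  # familiar, irregular word -> /jot/
print(read_aloud("hint"))   # novel but regular string  -> /hint/
```

Note that if the lexical entry for "yacht" were deleted, as after damage to the lexical route, the sublexical route would have no correspondence for "y" or "ch" and would fail or regularize the word, which is the kind of dissociation lesion studies exploit.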

In a connectionist architecture, on the other hand, both words and nonwords (or unfamil­
iar words) are read and spelled through the same mechanisms (Seidenberg & McClel­
land, 1989; Plaut & Shallice, 1993). An important feature of these models is that in order
to read or write any word, the system does not need to rely on any previous representa­
tion of the word or on any memory trace, but instead relies on the particular algorithm of
interaction between phonological units, semantic units, and orthographic units, and on the frequency with which they are activated together.

The issue of which architecture better explains the behavioral findings from both normal
participants and neurologically impaired patients has generated debate over the past
decades but has also helped to further our understanding of the cognitive architecture of
reading and writing. We do not address the issue here. For heuristic reasons, we adopt a
representational model to consider the neural substrates of reading and spelling, but we
assume, consistent with computational models, that the component processes are interac­
tive and operate in parallel. We also assume, as in computational models, that the repre­
sentations are distributed and overlapping. That is, the phonological representation of
“here” overlaps considerably with the phonological representations of “fear” and “heap,”
and completely with the phonological representation of “hear.” Before discussing evidence for the neural substrates of reading and spelling, we sketch out the basic cognitive elements that have been proposed.

Reading Mechanisms

The reading process (see Figure 24.1) begins with access to graphemes from letter
shapes, with specific font, case, and so on. Once individual graphemes are accessed, they
comprise a case-, font-, and orientation-independent graphemic description, which is held
by a temporary working memory component (sometimes called the graphemic buffer).
From here, representational models split processing into the lexical and sublexical
routes. The sublexical route, used to read nonwords or first-encountered letter strings be­
fore they acquire a memory trace (orthographic representations), is a sublexical mecha­
nism that transforms graphemes to phonemes. The lexical mechanism, on the other hand,
consists of three components: the orthographic lexicon, which is a long-term memory
storage of orthographic representations of words that have been encountered before; the
semantic system, which contains the semantic representations (meanings) of these
words; and the phonological lexicon, which is a long-term memory storage of the phono­
logical representations of words that have been encountered before. During reading, both
these computational routes end up at the peripheral component of converting phonemes
to motor plans for articulation, so that the word or nonword can be read aloud. As men­
tioned earlier, the independence of these components was inferred from lesion studies, in
which patients showed patterns of performance that could be explained by assuming se­
lective damage to a single component. Patients with selective deficits at the level of the
orthographic lexicon cannot access lexical orthographic representations; that is, cannot
read or understand familiar words, but they understand the same words when they hear
them (Patterson et al., 1985). Patients with selective deficits at the semantic level, on the
other hand, cannot understand either spoken or written familiar words (Howard et al.,
1984). Patients with selective deficits at the level of the phonological lexicon can under­
stand written words but make word selection errors when trying to pronounce them (Bub
& Kertesz, 1982; Hillis & Caramazza, 1990; Rapp et al., 1997). Patients with deficits in
the sublexical grapheme-to-phoneme conversion mechanisms are unable to read pseudowords or unfamiliar words but can read familiar words correctly (Beauvois & Derousne,
1979; Funnell, 1996; Marshall & Newcombe, 1973).

Spelling Mechanisms

There are two hypotheses regarding the cognitive computations of spelling: In the first,
they are described as the reverse of reading, and in the second, reading and spelling
share only the semantic system. According to the first account (also called the shared-
components account), the orthographic and phonological lexicons in spelling are the
same ones used in reading, whereas the second account (also called the independent-
components account) posits that phonological and orthographic lexicons used in spelling
are independent from those used in reading. Whether reading and spelling share the same components has been controversial. The existence of patients with both associations and dissociations between their reading and spelling does not prima facie favor one
or the other account. The issue of whether reading and writing share the same computa­
tional components is discussed in great detail by Hillis and Rapp (2004) for each mecha­
nism: the semantic system, the orthographic lexicon, the sublexical mechanism, the
graphemic buffer, and the representation of letter shapes.

In either case, spelling to dictation starts off with a peripheral phonemic analysis system
that distinguishes the sounds of one’s own language from other languages or environmen­
tal sounds, by matching the sounds to learned phonemes. Then, there are again perhaps
two routes: the lexical route and the sublexical route. In the lexical route, the sequence of
phonemes accesses the phonological lexicon, the repository of stored representations of
spoken words, so that the heard word is recognized. The phonological representation
then accesses the semantic system that evokes the meaning of the word. Finally, the
stored orthographic representation of the word is accessed from the orthographic lexi­
con. In the sublexical route, the phoneme-to-grapheme conversion mechanism allows for
the heard word or nonword to be transformed to a sequence of corresponding
graphemes. Both routes, then, end up at the graphemic buffer, a temporary storage
mechanism for sequences of graphemes, which “holds” the position and identity of letters
while each letter is written or spelled aloud. Finally, graphemes (abstract letter identities)
are converted to letter shapes, with specific case, font, and so on, by evoking motor
schemes for producing the letter.
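Why the sublexical route alone cannot settle on a single spelling in English can be shown with a toy phoneme-to-grapheme table; the correspondences listed here are a small, invented subset of the real ones, used only to make the point.

```python
# Toy illustration of why sublexical spelling is nondeterministic in English:
# one phoneme often maps to several plausible graphemes. The table below is
# a small invented subset, not an exhaustive rule set.

from itertools import product

PG_RULES = {
    "f": ["f", "ph"],    # as in "fun", "phone"
    "i:": ["ee", "ea"],  # as in "feed", "lead"
    "t": ["t"],
}

def candidate_spellings(phonemes):
    """Enumerate every spelling the sublexical route could assemble."""
    options = [PG_RULES[p] for p in phonemes]
    return ["".join(combo) for combo in product(*options)]

# The spoken sequence /f i: t/ yields several rule-consistent candidates.
print(candidate_spellings(["f", "i:", "t"]))
# ['feet', 'feat', 'pheet', 'pheat']
```

Both "feet" and "feat" are legitimate outputs of the rules; only a stored orthographic representation retrieved through the lexical route distinguishes them, just as with led/lead in the opening example of this chapter.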

Neural Correlates of Reading and Writing


In this section we discuss evidence from both lesion-deficit correlation studies in patients
with impairments in reading or spelling (including studies using voxel-based morphometry and diffusion tensor imaging [DTI]) as well as data from functional neuroimaging (positron emission tomography [PET], functional magnetic resonance imaging [fMRI],
and magnetoencephalography [MEG]). Despite the appeal of the enterprise, there re­
mains much to be learned about the neural correlates of cognitive processes. We attempt
in this chapter to review evidence that certain areas are involved in reading and writing.
We examine data from both lesion-deficit and functional neuroimaging studies in order to
specify the particular role of these areas in particular cognitive processes. We start with
the role of the inferior occipital-temporal cortex and fusiform gyrus in particular (BA 37),

then continue with the inferior frontal gyrus (IFG) and in particular BA 44/45, as well as
other perisylvian areas, such as the superior temporal gyrus (BA 22); then we discuss the
role of more controversial areas such as the angular gyrus (BA 39) and the supramarginal
gyrus/sulcus (BA 40). We end this discussion with the role of areas more peripheral to the process, such as premotor BA 6 (including so-called Exner’s area). The general locations of these areas are shown in Figure 24.2.

Figure 24.2 General locations of the Brodmann areas involved in reading and spelling. Red circles
represent areas important for reading and spelling;
blue circle represents area important for spelling on­
ly. Note that there is substantial variation in the pre­
cise location of cytoarchitectural fields (as well as in
the shapes and sizes of brains) across individuals.

Because we will discuss brain areas involved in reading and spelling, it is important to
clarify the unique contribution of each type of study, hence the distinction between areas
“necessary for” and those “engaged in” a given cognitive process. Lesion and functional
neuroimaging studies answer different types of questions regarding the relationship be­
tween brain areas and cognitive processes. The main question asked in lesion studies is
whether a brain area is necessary for a certain type of processing. If an area is necessary
for a particular aspect of reading or spelling, then a patient with damage to that area will
not be able to perform the particular type of processing. However, areas that appear to be
activated in functional neuroimaging studies in normal subjects may well be involved in,
but are not always necessary for, the particular type of processing. In the following sec­
tions we discuss some brain areas found to be involved in reading and writing in a variety
of studies such as lesion-deficit correlation studies, functional neuroimaging studies with
normal control subjects, and functional neuroimaging studies with patients.

Fusiform Gyrus (BA 37)

The fusiform gyrus (part of BA 37) is the area most closely associated with written language, and it has dominated both the neuropsychological and the neuroimaging literature more than any other area. In this section we summarize and update the available evidence from both the lesion-deficit and the functional neuroimaging literature. We then
discuss possible interpretations of the role of the fusiform in reading and spelling.

Several lesion-deficit studies have shown that dysfunction or lesion at the left fusiform
gyrus or at the left inferior-temporal gyrus (lateral BA 37) causes disrupted access to or­
thographic representations in oral reading or spelling in both acute and chronic stroke
patients (Gaillard et al., 2006; Hillis et al., 2001, 2004; Patterson & Kay, 1982; Philipose et
al., 2007; Rapcsak & Beeson, 2004; Rapcsak et al., 1990; Tsapkini & Rapp, 2010). In par­
ticular, Rapcsak and Beeson (2004) examined eight patients after chronic stroke that had
caused a lesion in BA 37 and BA 20 (mid- and anterior fusiform gyrus) who showed read­
ing and writing impairments. The importance of this region in oral reading of words and
pseudowords as well as in oral and written naming was also confirmed in acute lesion
studies (Hillis et al., 2004; Philipose et al., 2007). Philipose and colleagues (2007)
reported deficits in oral reading and spelling of words and pseudowords in sixty-nine cas­
es of acute stroke. Their analyses showed that dysfunction of BA 37 (and BA 40, as we
discuss later), as demonstrated by hypoperfusion, was strongly correlated with impair­
ments of oral reading and spelling of both words and pseudowords. In general, lesion
studies, whether they refer to reading (p. 495) only or spelling only (Hillis et al., 2002; Pat­
terson & Kay, 1982; Rapcsak et al., 1990), or both (Hillis et al., 2005; Philipose et al.,
2007; Rapcsak et al., 2004; Tsapkini & Rapp, 2009), confirmed that dysfunction in left BA
37 results in impairment in written language output (oral reading and spelling), indicat­
ing that function in this area is necessary for some aspect of these tasks.

However, in most lesion-deficit studies the lesions are large, and in most (but not all) cases, there are other dysfunctional areas that might contribute to the deficit. Two
recent studies (Gaillard et al., 2006; Tsapkini & Rapp, 2009) address this issue by describ­
ing patients with focal lesions that do not involve extended posterior visual areas known
to be responsible for general visual processing, and they offer additional evidence of the
importance of the fusiform gyrus in reading and spelling. Gaillard and colleagues (2006)
reported on a patient who exhibited impairment in reading but not spelling after a focal
lesion to the posterior portion of the left fusiform gyrus. The deficit in this patient was in­
terpreted as resulting from a disconnection between the early reading areas of the poste­
rior fusiform and abstract orthographic representations whose access might require mid-
fusiform gyrus. Similar “disconnection” accounts of pure alexia (also called letter-by-let­
ter reading or alexia without agraphia) after left occipital-temporal or occipital-splenial lesions, usually due to posterior cerebral artery stroke, have been given since the 1880s
(Binder et al., 1992; Cohen et al., 2003; Déjerine, 1891; Epelbaum et al., 2008; Gaillard et
al., 2006; Leff et al., 2006).

In contrast to cases of pure alexia, many studies have indicated that damage in the mid-
fusiform results in impairments in both oral reading and spelling (Hillis et al., 2005; Phili­
pose et al., 2007; Rapcsak et al., 2004; Tsapkini & Rapp, 2010). For example, in a case re­
port of a patient with a focal lesion in this area after resection of the more anterior mid-
fusiform gyrus, Tsapkini and Rapp (2009) showed that orthographic representations as
well as access to word meanings from print were disrupted for both reading and spelling.

Furthermore, apart from the impairment in orthographic processing of words (but not of nonwords), there was no impairment in other types of visual processing, such as the processing of faces and objects. It is striking that the localization of reading and spelling is
the same in all studies in which both reading and spelling are affected (Hillis et al., 2005;
Philipose et al., 2007; Rapcsak et al., 2004; Tsapkini & Rapp, 2009). The above findings
lend support to the claim that the mid-fusiform gyrus is a necessary area for orthographic
processing (but see Price & Devlin, 2003, below, for a different view, as discussed later).

Functional neuroimaging studies have also found that reading words and nonwords, or
simply viewing letters relative to other visual stimuli, activates the left (greater than
right) mid-fusiform gyri. The area of activation is remarkably consistent across studies
and across printed stimuli, irrespective of location, font, or size (Cohen et al., 2000, 2002;
Dehaene et al., 2001, 2002; Gros et al., 2001; Polk et al., 2002; Polk & Farah, 2002; Price
et al., 1996; Puce et al., 1996; Uchida et al., 1999; see also Cohen & Dehaene, 2004, for
review). Some investigators have taken these data as evidence that the left mid-fusiform
gyrus has a specialized role in computing location- and font-independent visual word
forms (Cohen et al., 2003; McCandliss et al., 2003). This area has been labeled the visual
word form area (VWFA; Cohen et al., 2000, 2002, 2004a, 2004b). However, based on acti­
vation of the same or a nearby area during a variety of nonreading lexical tasks, and on
the fact that lesions in this area are associated with a variety of lexical deficits, other in­
vestigators have objected to the label. These opponents of the label of VWFA have pro­
posed that the left mid-fusiform gyrus has a specialized role in modality-independent lexi­
cal processing, rather than in reading alone (Büchel, Price, & Friston, 1998; Price et al.,
2003; Price & Devlin, 2003, 2004; see below for discussion).

Recently there have been further specifications of the VWFA proposal. Recent findings
suggest that there is a hierarchical organization in the fusiform gyrus and that word recognition is accomplished in a posterior-to-anterior direction in the occipital-temporal cortex, which tunes progressively to more complex elements, from elementary visual features to letters, bigrams, and finally whole words (Cohen et al., 2008;
Dehaene et al., 2005; Vinckier et al., 2007). This proposal is accompanied by a related claim about the experiential tuning of the occipital-temporal cortex. Because reading and writing are very recent evolutionary skills, it would be hard to argue for an area “set aside” for orthographic processing only. Therefore, the proposal is that reading expertise develops through experience and fine-tunes neural tissue originally devoted to detailed visual processing (Cohen et al., 2008; McCandliss et al., 2003). Supporting evidence for this proposal comes from the developmental literature as well as from Aghababian and Nazir (2000), who reported (p. 496) increases in reading accuracy and
speed with age and maturation. This experiential tuning may also be the reason we see different subdivisions of the
fusiform involved in word versus pseudoword reading, or even in the frequency effects
observed (Binder et al., 2006; Bruno et al., 2008; Dehaene et al., 2005; Glezer et al., 2009;
Kronbichler et al., 2004, 2007; Mechelli et al., 2003). This familiarity effect in the fusiform
has also been found for writing (Booth et al., 2003; Norton et al., 2007), confirming a role
for the fusiform in both reading and spelling. Recent functional neuroimaging findings of
reading and spelling in the same individuals confirmed that the mid-fusiform is more sen­
sitive to words than letter strings and to low- than high-frequency words, showing the
sensitivity of this area to lexical orthography for both reading and spelling (Rapp & Lipka,
2011).

A different interpretation of these results has been offered. Price and Devlin (2003) have
rigorously questioned the notion that this area is dedicated to orthographic processing
only, by arguing that this area is also found to be activated in hearing or speaking (Price
et al., 1996, 2005, 2006), undermining any claims about orthographic specificity. Other
studies have shown that the VWFA is activated when subjects process nonorthographic vi­
sual patterns, and investigators have claimed that this area is not dedicated to ortho­
graphic processing per se but rather to complex visual processing (Ben-Shachar et al.,
2007; Joseph et al., 2006; Starrfelt & Gerlach, 2007; Wright et al., 2008). Others have
found that this area may selectively process orthographic forms, but not exclusively (Bak­
er et al., 2007; Pernet et al., 2005; Polk et al., 2002).

To evaluate the proposed roles of the mid-fusiform gyrus in orthographic processing ver­
sus more general lexical processing, Hillis et al. (2005) studied eighty patients with acute,
left-hemisphere ischemic stroke on two reading tasks that require access to a visual word
form (or orthographic lexicon) but do not require lexical output—written lexical decision
and written word–picture verification. Patients were also administered other lexical tasks,
including oral and written naming of pictures, oral naming of objects from tactile explo­
ration, oral reading, spelling to dictation, and spoken word–picture verification. Patients
underwent magnetic resonance imaging, including diffusion- and perfusion-weighted
imaging, the same day. It was argued that if left mid-fusiform gyrus were critical to ac­
cessing visual word forms, then damage or dysfunction of this area would, at least at the
onset of stroke (before reorganization and recovery), reliably cause impaired perfor­
mance on written lexical decision and written word–picture verification.

However, there was no significant association between damage or dysfunction of this re­
gion and impaired written lexical decision or written word–picture verification, indicating
that left mid-fusiform gyrus is not reliably necessary for accessing visual word forms
(Hillis et al., 2005). Of the fifty-three patients who showed infarct or hypoperfusion (se­
vere enough to cause dysfunction) involving the left mid-fusiform gyrus, twenty-two sub­
jects had intact written word comprehension, and fifteen had intact written lexical deci­
sion, indicating that both tasks can be accomplished without function of this region. How­
ever, there was a strong association between damage or dysfunction of this area and im­
pairment in oral reading (χ2 = 10.8; df1; p = 0.001), spoken picture naming (χ2 = 18.9;
df1; p < 0.0001), spoken object naming to tactile exploration (χ2 = 8.2; df1; p < 0.004);
and written picture naming (χ2 = 13.5; df1; p < 0.0002). These results indicate that struc­
tural damage or tissue dysfunction of the left mid-fusiform gyrus was associated with im­
paired lexical processing, irrespective of the input or output modality. In this study, the in­
farct or hypoperfusion that included the left mid-fusiform gyrus usually extended to adja­
cent areas of the left fusiform gyrus (BA 37). Therefore, it is possible that dysfunction of
areas near the left mid-fusiform gyrus, such as the lateral inferior-temporal multimodality area (LIMA; Cohen, Jobert, Le Bihan, & Dehaene, 2004), was critical to modality-independent lexical processing. Further support for a critical role of at least a portion of the left
fusiform gyrus in modality-independent lexical output is provided by patients who show
impaired naming (with visual or tactile input) and oral reading when this region is hypop­
erfused, and improved naming and oral reading when this region is reperfused (Hillis,
Kane, et al., 2002; Hillis et al., 2005). Previous studies have also reported oral naming
and oral reading deficits associated with lesions to this region (e.g., Foundas et al., 1998;
Hillis, Tuffiash, et al., 2002; Raymer et al., 1997; Sakurai, Sakai, Sakuta, & Iwata, 1994).
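As a quick arithmetic check, the p-values quoted above can be recomputed from the reported χ² statistics: for one degree of freedom, p = erfc(√(χ²/2)). The short sketch below uses only this standard identity and the four χ² values from the text; the grouping and labels are ours, not part of the original analysis. The recomputed values agree with the reported p-values to within rounding.

```python
# For a chi-square statistic with 1 degree of freedom, the p-value equals
# erfc(sqrt(chi2 / 2)); recompute the statistics quoted above as a check.
from math import erfc, sqrt

def chi2_p_df1(chi2: float) -> float:
    """p-value for a chi-square statistic with 1 degree of freedom."""
    return erfc(sqrt(chi2 / 2.0))

# Reported associations: oral reading (10.8), spoken picture naming (18.9),
# tactile object naming (8.2), written picture naming (13.5).
for chi2 in (10.8, 18.9, 8.2, 13.5):
    print(f"chi2 = {chi2:5.1f}  ->  p = {chi2_p_df1(chi2):.5f}")
```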

One possible way to accommodate these various sets of results is the following. Part of
left fusiform gyrus (perhaps lateral to the VWFA identified by Cohen and colleagues) may
be critical for modality-independent lexical processing, rather than a reading-specific
process (Price & Devlin, 2003, 2004); (p. 497) and part of the left or right mid-fusiform
gyrus is essential for computing a location-, font-, and orientation-independent graphemic
description (one meaning of a visual word form). Evidence in support of the latter component of this account comes from patients with pure alexia (who have a disconnection between visual input and the left mid-fusiform gyrus) (Binder & Mohr, 1992; Chialant & Caramazza, 1997; Marsh & Hillis, 2005; Miozzo & Caramazza, 1997; Saffran & Coslett, 1998).
In one case of acute pure alexia, written word recognition was impaired when the left
mid-fusiform gyrus was infarcted and the splenium of the corpus callosum was hypoper­
fused, but recovered when the splenium was reperfused (despite persistent damage to
the left mid-fusiform gyrus). These results could be explained by proposing that the pa­
tient was able to rely on the right mid-fusiform gyrus to compute case-, font-, and orientation-independent graphemic descriptions (when the left was damaged); but these
graphemic descriptions could not be used for reading until reperfusion of the splenium al­
lowed information from the right mid-fusiform gyrus to access language cortex for addi­
tional components of the reading process (Marsh & Hillis, 2005). This proposal of a criti­
cal role of either the left or right mid-fusiform gyrus in computing graphemic descriptions
is consistent with findings of (1) reliable activation of this area in response to written
words, pseudowords, and letters in functional imaging; (2) activation of bilateral mid-
fusiform gyrus in written lexical decision (Fiebach et al., 2002); and (3) electrophysiological evidence of mid-fusiform activity early in the reading process (Salmelin et al., 1996; Tarkiainen et al., 1999).

In general, there are fewer functional imaging studies of spelling than reading (for re­
views and meta-analyses, see Jobard et al., 2003; Mechelli et al., 2003; Turkeltaub et al.,
2002). However, four of the six spelling studies show that the left fusiform gyrus is an
area involved in spelling (Beeson et al., 2003; Norton et al., 2007; Rapp & Hsieh, 2002;
Rapp & Lipka, 2011). Spelling might require both computation of a case-, font-, and orien­
tation-independent graphemic description (proposed to require right or left mid-fusiform
gyrus) and modality-independent lexical output (proposed to require the lateral part of
left mid-fusiform gyrus). Further evidence of a role for part of the left mid-fusiform gyrus
in modality-independent lexical output is that reading Braille in blind subjects (Büchel et
al., 1998) and sign language in deaf subjects (Price et al., 2005) activates this area.

Figure 24.3 shows the remarkable similarity in the area identified as a key neural sub­
strate of reading, spelling, or naming across various lesion and functional imaging stud­
ies.

Inferior Frontal Gyrus and Broca’s Area (BA 44/45)

The left posterior IFG is probably the brain area most often implicated in language processing, and in particular in language production, since the 1800s. Damage in
the IFG has also been associated with impairments in fluency (Goodglass et al., 1969),
picture naming (Hecaen & Consoli, 1973; Hillis et al., 2004; Miceli & Caramazza, 1988),
lexical processing (Goodglass, 1969; Perani & Cappa, 2006), semantic retrieval (Goodglass et al., 1969), and syntactic processing (Grodzinsky, 2000).

There have also been many reports of patients with impairments in reading and spelling
after lesions in Broca’s area. IFG lesions are often associated with deficits in reading or
writing nonwords using grapheme-to-phoneme conversion rules, sometimes with accu­
rate performance on words. However, in most of these cases, lesions in BA 44/45 were ac­
companied by damage to other perisylvian areas, such as the insula, precentral gyrus (BA
4/6), and superior temporal gyrus or Wernicke’s area (BA 22) (Coltheart, 1980; Fiez et al.,
2006; Henry et al., 2005; Rapcsak et al., 2002, 2009; Roeltgen et al., 1983). Indeed, in the chronic lesion studies no single area could be identified as critical for nonword reading or spelling by itself, in the absence of damage to the surrounding areas (Rapcsak, 2007).


Figure 24.3 A, Cluster of voxels where hypoperfusion/infarct was most strongly associated with impaired spelling of words in acute stroke. B, Cluster of
voxels with greatest activation associated with read­
ing of words in a functional magnetic resonance
imaging (fMRI) study. C, Cluster of voxels with great­
est activation associated with spelling of words in an
fMRI study. D, Magnetic resonance perfusion-weight­
ed image showing area of hypoperfusion in patient
with impaired oral reading, oral naming, and spelling
in acute stroke. In each panel, the arrow points to the
important voxels within left Brodmann area 37 in the
mid-fusiform gyrus.

A, adapted with permission from Philipose et al., 2007. Copyright © 2007, John Wiley and Sons; B, reprinted from Trends in Cognitive Sciences, Vol. 7, Issue 7, Bruce D. McCandliss, Laurent Cohen, and Stanislas Dehaene, “The visual word form area: expertise for reading in the fusiform gyrus,” 293–299, Copyright 2003, with permission from Elsevier; C, adapted with permission from Beeson et al., 2003. Copyright © 2003 Routledge; D, adapted with permission from Hillis et al., 2004. Copyright © 2004 Routledge.

Studies have shown that a lesion or tissue dysfunction of this area alone may result in
deficits in spelling (see Hillis et al., 2002, for review). Many patients with relatively local­
ized lesions involving Broca’s area have pure agraphia, with impaired written naming
(particularly of verbs compared with nouns), but spared oral naming of verbs and nouns,
accompanied by impaired sublexical mechanisms for converting phonemes to graphemes
(Hillis, Chang, & Breese, 2004; see also Hillis, Rapp, & Caramazza, 1999). Additional evi­
dence that the IFG is crucial for written naming of verbs comes from the use of reperfu­
sion in acute stroke (Hillis et al., 2002). For example, Hillis and colleagues found that im­
pairment in accessing orthographic representations (particularly verbs) with relatively
spared oral naming and word meaning was associated with dysfunction (hypoperfusion)
of Broca’s area. Furthermore, (p. 498) reperfusion of Broca’s area resulted in recovery of
written naming of verbs. Thus, Broca’s area seems to be critical at least for writing verbs
(in the normal state, before reorganization).

The reason we do not always see spelling or reading deficits caused by a lesion to the IFG
in chronic stroke patients may be that the brain reorganizes so that the contribution of this area to spelling can be taken over by other areas over time (Price & Friston, 2002).

Functional neuroimaging studies have been employed to assess whether the IFG is en­
gaged in reading and spelling. Many functional neuroimaging studies of reading and
spelling have found IFG involvement, even when no overt speech production is required
(Bolger et al., 2005; Fiez & Petersen, 1998; Jobard et al., 2003; Mechelli et al., 2003; Paulesu et al., 2000; Price, 2000; Rapp & Lipka, 2011 [review]; Turkeltaub et al., 2002). However, not all studies have shown IFG activation during spelling. One functional neuroimaging study reported frontal activation in written naming versus oral naming, but the activation did not clearly involve Broca’s area (Katanoda et al., 2001). Another study did report
the activation of Broca’s area in written naming, but this activation did not survive the
comparison to oral naming (Beeson et al., 2003). Nevertheless, both Beeson and col­
leagues (2003) and Rapp and Hsieh (2002) found spelling-specific activation in the left
IFG. Therefore, it is hard to identify the exact role of the left IFG in the process of read­
ing or spelling; subparts of this area may be necessary or engaged in distinct cognitive
processes underlying reading or spelling, such as lexical retrieval or selection (Price,
2000; Thompson-Schill et al., 1999), phonological processing (Pugh et al., 1996), or retrieval or selection from the orthographic lexicon specifically (Hillis et al., 2002). The determination of the role of Broca’s area in reading and writing becomes even more complicated if one considers the many other language and nonlanguage functions that have
been attributed to this area and that may be required for reading and writing, such as
working memory (Smith & Jonides, 1999), verbal short-term memory (Paulesu et al., 1993), syntactic processing (Thompson et al., 2007), cognitive control and task updating (Brass (p. 499) & von
Cramon, 2002, 2004; Derrfuss et al., 2005; Thompson-Schill et al., 1999), and semantic
encoding (Mummery et al., 1996; Paulesu et al., 1997; Poldrack et al., 1999).

Other Areas Implicated in Reading and Writing

Supramarginal Gyrus (BA 40)


The supramarginal gyrus has been found to be compromised (often along with other peri­
sylvian areas such as Broca’s area, precentral gyrus, and Wernicke’s area) mostly in pa­
tients with impaired reading or spelling of nonwords or unfamiliar words (Alexander et
al., 1992; Coltheart et al., 1980; Fiez et al., 2006; Henry et al., 2007; Lambon Ralph &
Graham, 2000; Rapcsak & Beeson, 2002; Rapcsak et al., 2009; Roeltgen et al., 1984; Shal­
lice, 1981). However, acute lesions of supramarginal gyrus do seem to interfere with
spelling familiar words as well (Hillis et al., 2002). Probably the strongest evidence that
the supramarginal gyrus is important in reading and spelling words as well as nonwords
comes from a perfusion study in acute stroke patients (Philipose et al., 2007). In this
study, the authors found that the supramarginal gyrus was one of the two areas (the other
was the fusiform gyrus) in which hypoperfusion (independently of hypoperfusion in other
regions) predicted impairments in reading and spelling words and nonwords. These re­
gions might be tightly coupled acutely so that dysfunction in one brain region or the other is sufficient to impair the neural network necessary for reading and spelling both
words and nonwords. The reason we do not often see chronic lesions in the supramarginal gyrus result in deficits in reading or spelling may be that other areas successfully assume its function in the network.

Figure 24.4 Magnetic resonance diffusion-weighted image (left) and perfusion image (right) of a patient
with acute infarct and hypoperfusion in left angular
gyrus, who was selectively impaired in reading and
spelling of words and nonwords.

The supramarginal gyrus (BA 40) has also been found to be involved in both reading and
spelling in the functional neuroimaging literature for both words and nonwords (Booth et
al., 2002, 2003; Corina et al., 2003; Law et al., 1991; Mummery et al., 1998; Price, 1998).
In these studies, the supramarginal gyrus was implicated in sublexical conversion either
from graphemes to phonemes or from phonemes to graphemes. However, in these studies
clusters at the supramarginal gyrus during sublexical conversion were always accompa­
nied by clusters at the frontal operculum in the IFG, as Jobard and colleagues (2003) note
in their extensive meta-analysis. Some researchers have claimed that this set of brain re­
gions sustains phonological working memory and serves as a temporary phonological
store (Becker et al., 1999; Fiez et al., 1996, 1997; Paulesu et al., 1993). Thus, it is plausi­
ble that this region is activated in reading and writing because the sequential computa­
tions involved in sublexical conversion must be held in working memory as the word is
written or spoken.

Angular Gyrus (BA 39)


Lesion studies provide evidence for a critical role of angular gyrus in reading (Benson,
1979; Black & Behrmann, 1994; Déjerine, 1892) and spelling (Hillis et al., 2001; Rapcsak
& Beeson, 2002; Roeltgen & Heilman, 1984). The angular gyrus appears to be critical not only for both sublexical conversion mechanisms, that is, orthography to phonology in reading (Hillis et al., 2001) and phonology to orthography in spelling (Hillis et al., 2002), but also for access to lexical orthographic representations (Hillis et al., 2002). Figure 24.4 shows the MRIs of a patient with acute infarct and hypoperfusion in left angular
gyrus, who was (p. 500) selectively impaired in reading and spelling of words and non­
words. However, some patients with lesions in this area show deficits for only reading or
spelling. This finding should not be too surprising because the angular gyrus is a fairly large area, and it is conceivable that, because even small brain areas may be specialized, it contains smaller subregions with different functions. Furthermore, in
chronic stroke patients, there may be other areas that take over one function but not the
other, depending on factors of individual neuronal connectivity or practice, or for some
other unknown reason.

Functional neuroimaging studies of normal adults have been inconsistent in finding evi­
dence for angular gyrus involvement; that is, some of them did find angular gyrus activa­
tion (Rapp & Hsieh, 2002), but others did not (Beeson & Rapcsak, 2002). However, sever­
al developmental studies have found that angular gyrus activation is associated with reading of pseudowords more than words and proposed that this area may be more important during development (Pugh et al., 2001). Therefore, the evidence is not yet conclusive about the exact contribution of the angular gyrus to reading or spelling, but this area
seems to have an important role in these tasks.

Exner’s Area (BA 6)


Hillis et al. (2002) have found that hypoperfusion (causing dysfunction) of Exner’s area
was highly associated with impaired allographic conversion (converting from an abstract
grapheme to a particular letter shape or letter-specific motor plan), but not with impaired
access to orthographic representations. This finding agrees with Exner’s original propos­
al for a cortical center of movements for writing. The contribution of Exner’s area to the control of hand movements has also been found in lesion studies (Anderson et al., 1990), in
which patients were impaired in written but not oral spelling. It is also consistent with
fMRI studies in which the comparison between written and oral spelling showed activa­
tion in Exner’s area and with cortical stimulation studies in which temporary lesions in
Exner’s area caused difficulty with motor aspects of writing (see Roux et al., 2009, for a
review). In functional neuroimaging studies of spelling, BA 6 was also found to be in­
volved (Beeson et al., 2003; Rapp & Hsieh, 2002). It seems that this area is more involved in writing than in reading and probably plays a role in the neural machinery needed for motoric aspects of written output.

Superior Temporal Gyrus (BA 22)


The left superior temporal gyrus is involved, in conjunction with the other perisylvian areas (BA 44/45, insula, BA 40), in phonological processing (see Rapcsak, 2009, for a review). In lesion studies, especially of left middle cerebral artery stroke, it has been shown to be involved in spoken language comprehension. Dysfunction of this
area has been shown to disrupt understanding of oral and written words, as well as oral and written naming, in acute stroke patients (Hillis et al., 2001). Furthermore, in a recent
study, Philipose and colleagues (2007) found that hypoperfusion in this area was associat­
ed with impaired word reading. The above evidence was taken as indication that this area
is critical for linking words to their meaning, and it was concluded that this area may be
important for both reading and spelling words to the extent that they require access to
meaning. However, this area does not seem to play a critical role in either reading or
spelling of nonwords (Philipose et al., 2007).

However, functional imaging studies show activation of anterior and middle areas of the
superior temporal gyrus linked to phonological processing or to grapheme-to-phoneme
conversion in reading (Jobard et al., 2003; Price et al., 1996; Wise et al., 1991). Likewise,
an MEG study of adult reading indicated that the middle part of the superior temporal
gyrus was involved in pseudoword but not word reading (Simos et al., 2002).

The observed discrepancy between lesion and functional imaging studies may be due to
the fact that the superior temporal gyrus is a large area of cortex, and its posterior part
may well be related to accessing meaning from phonological or orthographic codes as
Wernicke has claimed, whereas its middle and anterior parts may be more dedicated to
phonological processing per se. Alternatively, the middle and anterior parts of the superi­
or temporal gyrus may be consistently engaged in all aspects of phonological processing
(including sublexical grapheme-to-phoneme conversion and hearing), but not necessary
for these processes (perhaps because the right superior temporal gyrus is capable of at
least some aspects of phonological processing), so that unilateral lesions do not cause
deficits in these aspects.

Conclusion
The evidence reviewed in this chapter, although controversial and inconclusive on some points, illustrates the complexity of the cognitive and neural mechanisms underlying reading and spelling and their relationships to one another. We began (p. 501) this review by taking reading and spelling as test cases for examining the correspondence between cognitive and neural architecture, only to discover the complexity, though not the intractability, of this task. One of the most intriguing observations to emerge from the bulk of the current evidence is that some brain areas seem to be crucial for more than one language function or component, whereas others are more specialized. One function may be subserved by more than one brain area, and a single brain area may subserve more than one function. Reading and spelling thus appear to be complex processes composed of distinct cognitive subprocesses that rely on overlapping networks of separate brain regions. Furthermore, the connectivity of this cortical circuitry may be altered and reorganized over time, owing to practice or therapy (or to new infarcts, atrophy, or resection). Further insights into the plasticity and dynamic nature of the human cortex are needed to shed greater light on the cognitive and neural processes underlying reading and writing.

References
Aghababian, V., & Nazir, T. A. (2000). Developing normal reading skills: Aspects of the vi­
sual process underlying word recognition. Journal of Experimental Child Psychology, 76,
123–150.

Alexander, M. P., Friedman, R. B., Loverso, F., & Fischer, R. S. (1992). Lesion localization
in phonological agraphia. Brain and Language, 43, 83–95.

Anderson, S.W., Damasio, A. R., & Damasio, H. (1990). Troubled letters but not numbers:
Domain specific cognitive impairments following focal damage in frontal-cortex. Brain,
113, 749–766.

Baker, C. I., Liu, J., Wald, L. L., et al. (2007). Visual word processing and the experiential
origins of functional selectivity in human extrastriate cortex. Proceedings of the National
Academy of Sciences U S A, 104, 9087–9092.

Beauvois, M. F., & Dérouesné, J. (1979). Phonological alexia: Three dissociations. Journal
of Neurology, Neurosurgery and Psychiatry, 42, 1115–1124.

Becker, J. T., Danean, K., MacAndrew, D. K., & Fiez, J. A. (1999). A comment on the func­
tional localization of the phonological storage subsystem of working memory. Brain and
Cognition, 41, 27–38.

Beeson, P. M., & Rapcsak, S. Z. (2002). Clinical diagnosis and treatment of spelling disor­
ders. In A. E. Hillis (Ed.), Handbook on adult language disorders: Integrating cognitive
neuropsychology, neurology, and rehabilitation (pp. 101–120). Philadelphia: Psychology
Press.

Beeson, P. M., & Rapcsak, S. Z. (2003). The neural substrates of sublexical spelling (INS
Abstract). Journal of the International Neuropsychological Society, 9, 304.

Beeson, P. M., Rapcsak, S. Z., Plante, E., et al. (2003). The neural substrates of writing: A
functional magnetic resonance imaging study. Aphasiology, 17, 647–665.

Behrmann, M., & Bub, D. (1992). Surface dyslexia and dysgraphia: Dual routes, single
lexicon. Cognitive Neuropsychology, 9, 209–251.

Behrmann, M., Nelson, J., & Sekuler, E. B. (1998). Visual complexity in letter-by-letter
reading: “Pure” alexia is not pure. Neuropsychologia, 36, 1115–1132.

Behrmann, M., Plaut, D. C., & Nelson, J. (1998). A literature review and new data support­
ing an interactive account of letter-by-letter reading. Cognitive Neuropsychology, 15, 7–
51.

Ben-Shachar, M., Dougherty, R. F., Deutsch, G. K., & Wandell, B. A. (2007). Differential
sensitivity to words and shapes in ventral occipito-temporal cortex. Cerebral Cortex, 17,
1604–1611.

Benson, D. F. (1979). Aphasia, alexia and agraphia. New York: Churchill Livingstone.

Binder, J. R., McKiernan, K. A., Parsons, M. E., et al. (2003). Neural correlates of lexical
access during visual word recognition. Journal of Cognitive Neuroscience, 15, 372–393.

Binder, J. R., Medler, D. A., Westbury, C. F., et al. (2006). Tuning of the human left
fusiform gyrus to sublexical orthographic structure. NeuroImage, 33, 739–748.

Binder, J. R., & Mohr, J. P. (1992). The topography of callosal reading pathways. Brain,
115, 1807–1826.

Binder, J., & Price, C. J. (2001). Functional neuroimaging of language. In R. Cabeza & A.
Kingstone (Eds.), Handbook of functional neuroimaging of cognition (pp. 187–251). Cam­
bridge, MA: MIT Press.

Black, S., & Behrmann, M. (1994). Localization in alexia. In A. Kertesz (Ed.), Localization
and neuroimaging in neuropsychology. San Diego: Academic Press.

Bolger, D. J., Perfetti, C. A., & Schneider, W. (2005). A cross-cultural effect on the brain re­
visited. Human Brain Mapping, 25, 92–104.

Booth, J. R., Burman, D. D., Meyer, J. R, et al. (2003). Relation between brain activation
and lexical performance. Human Brain Mapping, 19, 155–169.

Booth, J. R., Burman, D. D., Meyer, J. R., Gitelman, D. R., Parrish, T. B., & Mesulam, M. M.
(2002). Modality independence of word comprehension. Human Brain Mapping, 16, 251–
261.

Borowsky, R., & Besner, D. (1993). Visual word recognition: A multistage activation mod­
el. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19, 813–840.

Bowers, J. S., Bub, D., & Arguin, M. A. (1996). Characterization of the word superiority ef­
fect in a case of letter-by-letter surface alexia. Cognitive Neuropsychology, 13, 415–441.

Boxer, A. L., Rankin, K. P., Miller, B. L., et al. (2003). Cinguloparietal atrophy distinguish­
es Alzheimer’s disease from semantic dementia. Archives of Neurology, 60, 949–956.

Brass, M., & von Cramon, D. Y. (2002). The role of the frontal cortex in task preparation.
Cerebral Cortex, 12, 908–914.

Brass, M., & von Cramon, D. Y. (2004). Decomposing components of task preparation with
functional magnetic resonance imaging. Journal of Cognitive Neuroscience, 16, 609–620.

Bruno, J. L., Zumberge, A., Manis, F. R., et al. (2008). Sensitivity to orthographic familiarity in the occipito-temporal region. NeuroImage, 39, 1988–2001.

Bub, D., & Kertesz, A. (1982). Deep agraphia. Brain and Language, 17, 146–165.

Büchel, C., Price, C., Frackowiak, R. S., & Friston, K. (1998). Different activation patterns
in the visual cortex of late and congenitally blind subjects. Brain: A Journal of Neurology,
121, 409–419.

(p. 502) Burton, M. W., LoCasto, P. C., Krebs-Noble, D., & Gullapalli, R. P. (2005). A systematic investigation of the functional neuroanatomy of auditory and visual phonological processing. NeuroImage, 26, 647–661.

Chan, D., Fox, N. C., Scahill, R. I., et al. (2001). Patterns of temporal lobe atrophy in se­
mantic dementia and Alzheimer’s disease. Annals of Neurology, 49, 433–442.

Chialant, D., & Caramazza, A. (1997). Identity and similarity factors in repetition blind­
ness: Implications for lexical processing. Cognition, 63, 79–119.

Cloutman, L., Gingis, L., Newhart, M., Davis, C., Heidler-Gary, J., Crinion, J., & Hillis, A. E.
(2009). A neural network critical for spelling. Annals of Neurology, 66 (2), 249–253.

Cohen, L., & Dehaene, S. (2004). Specialization within the ventral stream: The case for
the visual word form area. NeuroImage, 22, 466–476.

Cohen, L., Dehaene, S., Naccache, L., et al. (2000). The visual word form area: Spatial
and temporal characterization of an initial stage of reading in normal subjects and poste­
rior split-brain patients. Brain, 123, 291–307.

Cohen, L., Dehaene, S., Vinckier, F., et al. (2008). Reading normal and degraded words:
Contribution of the dorsal and ventral visual pathways. NeuroImage, 40, 353–366.

Cohen, L., Henry, C., Dehaene, S., et al. (2004). The pathophysiology of letter-by-letter
reading. Neuropsychologia, 42, 1768–1780.

Cohen, L., Jobert, A., Bihan, D. L., & Dehaene, S. (2004). Distinct unimodal and multi­
modal regions for word processing in left temporal cortex. NeuroImage, 23, 1256–1270.

Cohen, L., Lehéricy, S., Chochon, F., et al. (2002). Language-specific tuning of visual cor­
tex? Functional properties of the Visual Word Form Area. Brain, 125, 1054–1069.

Cohen, L., Martinaud, O., Lemer, C., et al. (2003). Visual word recognition in the left and
right hemispheres: anatomical and functional correlates of peripheral alexias. Cerebral
Cortex, 13, 1313–1333.

Coltheart, M., Patterson, K., & Marshall, J. C. (1980). Deep dyslexia. London: Routledge &
Kegan Paul.

Coltheart, M., Rastle, K., Perry, C., et al. (2001). DRC: A dual route cascaded model of vi­
sual word recognition and reading aloud. Psychological Review, 108, 204–256.

Corina, D. P., San Jose-Robertson, L., Guillemin, A., High, J., & Braun, A. R. (2003). Lan­
guage lateralization in a bimanual language. Journal of Cognitive Neuroscience, 15, 718–
730.

Crisp, J., & Lambon Ralph, M. A. (2006). Unlocking the nature of the phonological-deep
dyslexia continuum: the keys to reading aloud are in phonology and semantics. Journal of
Cognitive Neuroscience, 18, 348–362.

Damasio, A. R., & Damasio, H. (1983). The anatomic basis of pure alexia. Neurology, 33,
1573–1583.

Dehaene, S., Cohen, L., Sigman, M., & Vinckier, F. (2005). The neural code for written words: A proposal. Trends in Cognitive Sciences, 9, 335–341.

Dehaene, S., Le Clec, H. G., Poline, J. B., Le Bihan, D., & Cohen, L. (2002). The visual
word form area: A prelexical representation of visual words in the fusiform gyrus. Neu­
roReport, 13, 321–325.

Dehaene, S., Naccache, L., Cohen, L., Bihan, D. L., Mangin, J. F., Poline, J. B., et al. (2001).
Cerebral mechanisms of word masking and unconscious repetition priming. Nature Neu­
roscience, 4, 752–758.

Déjerine, J. (1891). Sur un cas de cécité verbale avec agraphie, suivi d’autopsie. Mém Soc
Biol, 3, 197–201.

Déjerine, J. (1892). Contribution à l'étude anatomo-pathologique et clinique des différentes variétés de cécité verbale. Mém Soc Biol, 4, 61–90.

DeLeon, J., Gottesman, R. F., Kleinman, J. T., et al. (2007). Neural regions essential for dis­
tinct cognitive processes underlying picture naming. Brain, 130, 1408–1422.

De Renzi, E., Zambolin, A., & Crisi, G. (1987). The pattern of neuropsychological impair­
ment associated with left posterior cerebral artery infarcts. Brain, 110, 1099–1116.

Derrfuss, J., Brass, M., Neumann, J., & von Cramon, D. Y. (2005). Involvement of the infe­
rior frontal junction in cognitive control: meta-analyses of switching and Stroop studies.
Human Brain Mapping, 25, 22–34.

Ellis, A. W., & Young, A. W. (1988). Human cognitive neuropsychology. Hove, UK: Erl­
baum.

Epelbaum, S., Pinel, P., Gaillard, R., et al. (2008). Pure alexia as a disconnection syn­
drome: new diffusion imaging evidence for an old concept. Cortex, 44, 962–974.

Exner, S. (1881). Lokalisation der Funktion der Grosshirnrinde des Menschen. Wien: Braumüller.

Farah, M. J., Stowe, R. M., & Levinson, K. L. (1996). Phonological dyslexia: Loss of a read­
ing-specific component of the cognitive architecture? Cognitive Neuropsychology, 13,
849–868.

Farah, M. J., & Wallace, M. A. (1991). Pure alexia as a visual impairment: A reconsidera­
tion. Cognitive Neuropsychology, 8, 313–334.

Fiez, J. A. (1997). Phonology, semantics, and the role of the left inferior prefrontal cortex.
Human Brain Mapping, 5, 79–83.

Fiez, J. A., & Petersen, S. E. (1998). Neuroimaging studies of word reading. Proceedings of the National Academy of Sciences U S A, 95, 914–921.

Fiez, J. A., Raichle, M. E., Balota, D. A., Tallal, P., & Petersen, S. E. (1996). PET activation
of posterior temporal regions during auditory word presentation and verb generation.
Cerebral Cortex, 6, 1–10.

Fiez, J. A., Tranel, D., Seager-Frerichs, D., & Damasio, H. (2006). Specific reading and
phonological processing deficits are associated with damage to the left frontal opercu­
lum. Cortex, 42, 624–643.

Foundas, A., Daniels, S. K., & Vasterling, J. J. (1998). Anomia: Case studies with lesion lo­
calization. Neurocase, 4, 35–43.

Friedman, R. B. (1996). Recovery from deep alexia to phonological alexia: Points on a con­
tinuum. Brain and Language, 52, 114–128.

Friedman, R. B., & Hadley, J. A. (1992). Letter-by-letter surface alexia. Cognitive Neu­
ropsychology, 9, 185–208.

Funnell, E. (1996). Response bias in oral reading: An account of the co-occurrence of sur­
face dyslexia and semantic dementia. Quarterly Journal of Experimental Psychology, 49A,
417–446.

Gaillard, R., Naccache, L., Pinel, P., et al. (2006). Direct intracranial, fMRI, and lesion evi­
dence for the causal role of left inferotemporal cortex in reading. Neuron, 50, 191–204.

Galton, C. J., Patterson, K., Graham, K., et al. (2001). Differing patterns of temporal atro­
phy in Alzheimer’s disease and semantic dementia. Neurology, 57, 216–225.

Gelb, I. J. (1963). A study of writing. Chicago: University of Chicago Press.

Glezer, L. S., Jiang, X., & Riesenhuber, M. (2009). Evidence for highly selective neuronal
tuning to whole words in the “visual word form area.” Neuron, 62 (2), 199–204.

(p. 503) Glosser, G., & Friedman, R. B. (1990). The continuum of deep/phonological alexia. Cortex, 26, 343–359.

Gold, B. T., & Kertesz, A. (2000). Right-hemisphere semantic processing of visual words in
an aphasic patient: An fMRI study. Brain and Language, 73, 456–465.

Goodglass, H., Hyde, M. R., & Blumstein, S. (1969). Frequency, picturability and availabil­
ity of nouns in aphasia. Cortex, 5, 104–119.

Gorno-Tempini, M. L., Dronkers, N. F., Rankin, K. P., et al. (2004). Cognition and anatomy
in three variants of primary progressive aphasia. Annals of Neurology, 55, 335–346.

Graham, K. S., Hodges, J. R., & Patterson, K. (1994). The relationship between compre­
hension and oral reading in progressive fluent aphasia. Neuropsychologia, 32, 299–316.

Graham, N. L., Patterson, K., & Hodges, J. R. (2000). The impact of semantic memory im­
pairment on spelling: evidence from semantic dementia. Neuropsychologia, 38, 143–163.

Grodzinsky, Y. (2000). The neurology of syntax: Language use without Broca’s area. Be­
havioral and Brain Sciences, 23, 1–21.

Hecaen, H., & Consoli, S. (1973). Analysis of language disorders in lesions of Broca’s
area. Neuropsychologia, 11, 377–388.

Henry, C., Gaillard, R., Volle, E., et al. (2005). Brain activations during letter-by-letter reading: A follow-up study. Neuropsychologia, 43, 1983–1989.

Henry, M. L., Beeson, P. M., Stark, A. J., & Rapcsak, S. Z. (2007). The role of left perisyl­
vian cortical regions in spelling. Brain and Language, 100, 44–52.

Hickok, G., & Poeppel, D. (2007). The cortical organization of speech processing. Nature
Reviews Neuroscience, 8, 393–402.

Hillis, A. E., Chang, S., & Breese, E. (2004). The crucial role of posterior frontal regions in
modality specific components of the spelling process. Neurocase, 10, 157–187.

Hillis, A. E., Kane, A., Barker, P., et al. (2001). Neural substrates of the cognitive process­
es underlying reading: Evidence from magnetic resonance perfusion imaging in hypera­
cute stroke. Aphasiology, 15, 919–931.

Hillis, A. E., Kane, A., Tuffiash, E., Beauchamp, N., Barker, P. B., Jacobs, M. A., & Wityk, R. (2002). Neural substrates of the cognitive processes underlying spelling: Evidence from MR diffusion and perfusion imaging. Aphasiology, 16, 425–438.

Hillis, A. E., Newhart, M., Heidler, J., et al. (2005). The roles of the “visual word form
area” in reading. NeuroImage, 24, 548–559.

Hillis, A., & Rapp, B. (2004). Cognitive and neural substrates of written language compre­
hension and production. In M. Gazzaniga (Ed.), The new cognitive neurosciences (3rd ed.,
pp. 755–788). Cambridge, MA: MIT Press.

Hillis, A. E., Rapp, B. C., & Caramazza, A. (1999). When a rose is a rose in speaking but a
tulip in writing. Cortex, 35, 337–356.

Hillis, A.E., Tuffiash, E., & Caramazza, A. (2002). Modality specific deterioration in oral
naming of verbs. Journal of Cognitive Neuroscience, 14, 1099–1108.

Hodges, J. R., & Patterson, K. (2007). Semantic dementia: A unique clinicopathological syndrome. Lancet Neurology, 6, 1004–1014.

Howard, D., Patterson, K., Franklin, S., Morton, J., & Orchard-Lisle, V. (1984). Variability
and consistency in picture naming by aphasic patients. Advances in Neurology, 42, 263–
276.
Ino, T., Tokumoto, K., Usami, K., et al. (2008). Longitudinal fMRI study of reading in a pa­
tient with letter-by-letter reading. Cortex, 44, 773–781.

Jefferies, E., Sage, K., & Lambon Ralph, M. A. (2007). Do deep dyslexia, dysphasia and
dysgraphia share a common phonological impairment? Neuropsychologia, 45, 1553–1570.

Jobard, G., Crivello, F., & Tzourio-Mazoyer, N. (2003). Evaluation of the dual route theory of reading: A meta-analysis of 35 neuroimaging studies. NeuroImage, 20, 693–712.

Johnson, M. H. (2001). Functional brain development in humans. Nature Reviews Neuroscience, 2, 475–483.

Joseph, J. E., Cerullo, M. A., Farley, A. B., et al. (2006). fMRI correlates of cortical special­
ization and generalization for letter processing. NeuroImage, 32, 806–820.

Katzir, T., Misra, M., & Poldrack, R. A. (2005). Imaging phonology without print: Assess­
ing the neural correlates of phonemic awareness using fMRI. NeuroImage, 27, 106–115.

Klein, D., Milner, B., Zatorre, R. J., Zhao, V., & Nikelski, J. (1999). Cerebral organization in
bilinguals: A PET study of Chinese-English verb generation. NeuroReport, 10, 2841–2846.

Kronbichler, M., Bergmann, J., Hutzler, F., et al. (2007). Taxi vs. taksi: On orthographic word recognition in the left ventral occipitotemporal cortex. Journal of Cognitive Neuroscience, 19, 1584–1594.

Kronbichler, M., Hutzler, F., Wimmer, H., et al. (2004). The visual word form area and the
frequency with which words are encountered: Evidence from a parametric fMRI study.
NeuroImage, 21, 946–953.

Lambert, J., Giffard, B., Nore, F., et al. (2007). Central and peripheral agraphia in Alzheimer's disease: From the case of Auguste D. to a cognitive neuropsychology approach. Cortex, 43, 935–951.

Lambon Ralph, M. A., & Graham, N. L. (2000). Previous cases: Acquired phonological and
deep dyslexia. Neurocase, 6, 141–178.

Larsen, J., Baynes, K., & Swick, D. (2004). Right hemisphere reading mechanisms in a
global alexic patient. Neuropsychologia, 42, 1459–1476.

Law, I., Kannao, I., Fujita, H., Miura, S., Lassen, N., & Uemura, K. (1991). Left supramar­
ginal/angular gyri activation during reading of syllabograms in the Japanese language.
Journal of Neurolinguistics, 6, 243–251.

Leff, A. P., Crewes, H., Plant, G. T., et al. (2001). The functional anatomy of single-word
reading in patients with hemianopic and pure alexia. Brain, 124, 510–521.

Leff, A. P., Spitsyna, G., Plant, G. T., & Wise, R. J. S. (2006). Structural anatomy of pure
and hemianopic alexia. Journal of Neurology, Neurosurgery, and Psychiatry, 77, 1004–
1007.
Lichtheim, L. (1885). On aphasia. Brain, VII, 433–484.

Lubrano, V., Roux, F. E., & Démonet, J. F. (2004). Writing-specific sites in frontal areas: A cortical stimulation study. Journal of Neurosurgery, 101 (5), 787–798.

Luders, H., Lesser, R. P., Hahn, J., et al. (1991). Basal temporal language area. Brain, 114,
743–754.

Mainy, N., Jung, J., Baciu, M., et al. (2008). Cortical dynamics of word recognition. Human
Brain Mapping, 29, 1215–1230.

Mani, J., Diehl, B., Piao, Z., et al. (2008). Evidence for a basal temporal visual language
center: Cortical stimulation producing pure alexia. Neurology, 71, 1621–1627.

(p. 504) Marinkovic, K., Dhond, R. P., Dale, A. M., et al. (2003). Spatiotemporal dynamics of modality-specific and supramodal word processing. Neuron, 38, 487–497.

Marsh, E. B., & Hillis, A. E. (2005). Cognitive and neural mechanisms underlying reading
and naming: Evidence from letter-by-letter reading and optic aphasia. Neurocase, 11,
325–337.

Marshall, J. C., & Newcombe, F. (1973). Patterns of paralexia: A psycholinguistic approach. Journal of Psycholinguistic Research, 2, 175–199.

McCandliss, B. D., Cohen, L., & Dehaene, S. (2003). The visual word form area: Expertise
for reading in the fusiform gyrus. Trends in Cognitive Sciences, 7, 293–299.

McKay, A., Castles, M., & Davis, C. (2007). The impact of progressive semantic loss on
reading aloud. Cognitive Neuropsychology, 24, 162–186.

Mechelli, A., Gorno-Tempini, M. L., & Price, C. J. (2003). Neuroimaging studies of word
and pseudoword reading: Consistencies, inconsistencies, and limitations. Journal of Cog­
nitive Neuroscience, 15, 260–271.

Miceli, G., & Caramazza, A. (1988). Dissociation of inflectional and derivational morpholo­
gy. Brain and Language, 35, 24–65.

Miozzo, M., & Caramazza, A. (1997). Retrieval of lexical-syntactic features in tip-of-the-tongue states. Journal of Experimental Psychology: Learning, Memory, and Cognition, 23, 1410–1423.

Molko, N., Cohen, L., Mangin, J. F., et al. (2002). Visualizing the neural bases of a discon­
nection syndrome with diffusion tensor imaging. Journal of Cognitive Neuroscience, 14,
629–636.

Moro, A., Tettamanti, M., Perani, D., Donati, C., Cappa, S. F., & Fazio, F. (2001). Syntax
and the brain: Disentangling grammar by selective anomalies. NeuroImage, 13, 110–118.

Mummery, C. J., Patterson, K., Hodges, J. R., & Price, C. J. (1998). Functional neuroanatomy of the semantic system: Divisible by what? Journal of Cognitive Neuroscience, 10, 766–777.

Mummery, C. J., Patterson, K., Price, C. J., et al. (2000). A voxel-based morphometry study
of semantic dementia: Relationship between temporal lobe atrophy and semantic memo­
ry. Annals of Neurology, 47, 36–45.

Murtha, S., Chertkow, H., Beauregard, M., & Evans, A. (1999). The neural substrate of picture naming. Journal of Cognitive Neuroscience, 11, 399–423.

Nakamura, K., Honda, M., Hirano, S., et al. (2002). Modulation of the visual word re­
trieval system in writing: A functional MRI study on the Japanese orthographies. Journal
of Cognitive Neuroscience, 14, 104–115.

Nakamura, K., Honda, M., Okada, T., et al. (2000). Participation of the left posterior inferi­
or temporal cortex in writing and mental recall of kanji orthography: A functional MRI
study. Brain, 123, 954–967.

Nobre, A. C., Allison, T., & McCarthy, G. (1995). Word recognition in the human inferior
temporal lobe. Nature, 372, 260–273.

Norton, E. S., Kovelman, I., & Petitto, L.-A. (2007). Are there separate neural systems for
spelling? New insights into the role of rules and memory in spelling from functional mag­
netic resonance imaging. Mind, Brain, and Education, 1, 48–59.

Ogden, J. A. (1996). Phonological dyslexia and phonological dysgraphia following left and
right hemispherectomy. Neuropsychologia, 34, 905–918.

Omura, K., Tsukamoto, T., Kotani, Y., et al. (2004). Neural correlates of phoneme-
grapheme conversion. NeuroReport, 15, 949–953.

Patterson, K. E., & Kay, J. (1982). Letter-by-letter reading: Psychological descriptions of a neurological syndrome. Quarterly Journal of Experimental Psychology, 34A, 411–441.

Patterson, K. E., Marshall, J. C., & Coltheart, M. (1985). Surface dyslexia: Neuropsycho­
logical and cognitive studies of phonological reading. London: Erlbaum.

Paulesu, E., Frith, C. D., & Frackowiak, R. S. (1993). The neural correlates of the verbal
component of working memory. Nature, 362 (6418), 342–345.

Paulesu, E., Goldacre, B., Scifo, P., Cappa, S. F., Gilardi, M. C., Castiglioni, I., Perani, D., &
Fazio, F. (1997). Functional heterogeneity of left inferior frontal cortex as revealed by fM­
RI. NeuroReport, 8, 2011–2017.

Paulesu, E., McCrory, E., Fazio, F., Menoncello, L., Brunswick, N., Cappa, S. F., Cotelli, M.,
Cossu, G., Corte, F., Lorusso, M., Pesenti, S., Gallagher, A., Perani, D., Price, C., Frith, C.
D., & Frith, U. (2000). A cultural effect on brain function. Nature Neuroscience, 3, 91–96.

Paulesu, E., Perani, D., Blasi, V., Silani, G., Borghese, N. A., De Giovanni, U., Sensolo, S., &
Fazio, F. (2003). A functional-anatomical model for lipreading. Journal of Neurophysiology,
90 (3), 2005–2013.

Perani, D., & Cappa, S. (2006). Broca's area and lexical-semantic processing. In Y. Grodzinsky & K. Amunts (Eds.), Broca's region. New York: Oxford University Press.

Perani, D., Cappa, S. F., Schnur, T., Tettamanti, M., Collina, S., Rosa, M. M., & Fazio, F.
(1999). The neural correlates of verb and noun processing: A PET study. Brain, 122, 2337–
2344.

Perfetti, C. A. (2003). The universal grammar of reading. Scientific Studies of Reading, 7, 3–24.

Pernet, C., Celsis, P., & Demonet, J.-F. (2005). Selective response to letter categorization
within the left fusiform gyrus. NeuroImage, 28, 738–744.

Petersen, S. E., Fox, P. T., Posner, M. I., Mintun, M., & Raichle, M. E. (1988). Positron
emission tomographic studies of the cortical anatomy of single-word processing. Nature,
331 (6157), 585–589.

Philipose, L. E., Gottesman, R. F., Newhart, M., et al. (2007). Neural regions essential for
reading and spelling of words and pseudowords. Annals of Neurology, 62, 481–492.

Plaut, D. C., McClelland, J. L., Seidenberg, M. S., & Patterson, K. (1996). Understanding
normal and impaired word reading: Computational principles in quasi-regular domains.
Psychological Review, 103, 56–115.

Plaut, D. C., & Shallice, T. (1993). Deep dyslexia: A case study of connectionist neuropsy­
chology. Cognitive Neuropsychology, 10, 377–500.

Polk, T. A., & Farah, M. J. (2002). Functional MRI evidence for an abstract, not perceptu­
al, word-form area. Journal of Experimental Psychology: General, 131 (1), 65–72.

Polk, T. A., Stallcup, M., Aguirre, G. K., et al. (2002). Neural specialization for letter
recognition. Journal of Cognitive Neuroscience, 14, 145–159.

Price, C. J. (2000). The anatomy of language: Contributions from functional neuroimaging [review]. Journal of Anatomy, 197 (Pt 3), 335–359.

Price, C. J., & Devlin, J. T. (2003). The myth of the visual word form area. NeuroImage, 19,
473–481.

(p. 505) Price, C. J., & Devlin, J. T. (2004). The pro and cons of labelling a left occipitotemporal region: "The visual word form area." NeuroImage, 22, 477–479.

Price, C. J., Devlin, J. T., Moore, C. J., et al. (2005). Meta-analyses of object naming: Effect
of baseline. Human Brain Mapping, 25, 70–82.

Price, C. J., & Friston, K. (2002). Degeneracy and cognitive anatomy. Trends in Cognitive
Sciences, 6, 416–421.

Price, C. J., & Friston, K. J. (2005). Functional ontologies for cognition: The systematic de­
finition of structure and function. Cognitive Neuropsychology, 22, 262–275.

Price, C. J., Gorno-Tempini, M. L., Graham, K. S., et al. (2003). Normal and pathological
reading: Converging data from lesion and imaging studies. NeuroImage, 20, S30–S41.

Price, C. J., Howard, D., Patterson, K., et al. (1998). A functional neuroimaging description
of two deep dyslexic patients. Journal of Cognitive Neuroscience, 10, 303–315.

Price, C. J., & Mechelli, A. (2005). Reading and reading disturbance. Current Opinion in
Neurobiology, 15, 231–238.

Price, C. J., Moore, C. J., & Frackowiak, R. S. (1996). The effect of varying stimulus rate
and duration on brain activity during reading. NeuroImage, 3 (1), 40–52.

Pugh, K. R., Mencl, W. E., Jenner, A. R., Katz, L., Frost, S. J., Lee, J. R., Shaywitz, S. E., &
Shaywitz, B. A. (2001). Neurobiological studies of reading and reading disability. Journal
of Communication Disorders, 34 (6), 479–492.

Pugh, K. R., Shaywitz, B. A., Shaywitz, S. E., Constable, R. T., Skudlarski, P., Fulbright, R.
K., et al. (1996). Cerebral organization of component processes in reading. Brain, 119,
1221–1238.

Rabinovici, G. D., Seeley, E. J., Gorno-Tempini, M. L., et al. (2008). Distinct MRI atrophy
patterns in autopsy-proven Alzheimer’s disease and frontotemporal lobar degeneration.
American Journal of Alzheimer’s Disease and Other Dementias, 22, 474–488.

Rapcsak, S. Z., & Beeson, P. M. (2002). Neuroanatomical correlates of spelling and writ­
ing. In A. E. Hillis (Ed.), Handbook of adult language disorders: Integrating cognitive neu­
ropsychology, neurology, and rehabilitation (pp. 71–99). Philadelphia: Psychology Press.

Rapcsak, S. Z., & Beeson, P. M. (2004). The role of left posterior inferior temporal cortex
in spelling. Neurology, 62, 2221–2229.

Rapcsak, S. Z., Beeson, P. M., Henry, M. L., Leyden, A., Kim, E., Rising, K., Andersen, S.,
& Cho, H. (2009). Phonological dyslexia and dysgraphia: Cognitive mechanisms and neur­
al substrates. Cortex, 45 (5), 575–591.

Rapcsak, S. Z., Beeson, P. M., & Rubens, A. B. (1991). Writing with the right hemisphere.
Brain and Language, 41, 510–530.

Rapcsak, S. Z., Henry, M. L., Teague, S. L., et al. (2007). Do dual-route models accurately
predict reading and spelling performance in individuals with acquired alexia and
agraphia? Neuropsychologia, 45, 2519–2524.

Rapcsak, S. Z., Rubens, A. B., & Laguna, J. F. (1990). From letters to words: Procedures
for word recognition in letter-by-letter reading. Brain and Language, 38, 504–514.

Rapp, B. (2002). Uncovering the cognitive architecture of spelling. In A. E. Hillis (Ed.), Handbook on adult language disorders: Integrating cognitive neuropsychology, neurology, and rehabilitation. Philadelphia: Psychology Press.

Rapp, B., & Caramazza, A. (1998). A case of selective difficulty in writing verbs. Neuro­
case, 4, 127–140.

Rapp, B., Epstein, C., & Tainturier, M. J. (2002). The integration of information across lexi­
cal and sublexical processes in spelling. Cognitive Neuropsychology, 19, 1–29.

Rapp, B., & Hsieh, L. (2002). Functional magnetic resonance imaging of the cognitive
components of the spelling process. Cognitive Neuroscience Society Meeting, San Fran­
cisco.

Rapp, B., & Lipka, K. (2011). The literate brain: The relationship between spelling and
reading. Journal of Cognitive Neuroscience, 23 (5), 1180–1197.

Raymer, A., Foundas, A. L., Maher, L. M., et al. (1997). Cognitive neuropsychological
analysis and neuroanatomical correlates in a case of acute anomia. Brain and Language,
58, 137–156.

Roeltgen, D. P., & Heilman, K. M. (1984). Lexical agraphia: Further support for the two-
system hypothesis of linguistic agraphia. Brain, 107, 811–827.

Roeltgen, D. P., Sevush, S., & Heilman, K. M. (1983). Phonological agraphia: Writing by
the lexical-semantic route. Neurology, 33, 755–765.

Rosen, H. J., Gorno-Tempini, M. L., Goldman, W. P., et al. (2002). Patterns of brain atrophy
in frontotemporal dementia and semantic dementia. Neurology, 58, 198–208.

Roux, F. E., Dufor, O., Giussani, C., Wamain, Y., Draper, L., Longcamp, M., & Démonet, J. F.
(2009). The graphemic/motor frontal area Exner’s area revisited. Annals of Neurology, 66
(4), 537–545.

Saffran, E. M., & Coslett, H. B. (1998). Implicit vs. letter-by-letter reading in pure alexia:
A tale of two systems. Cognitive Neuropsychology, 15, 141–165.

Sakurai, Y., Sakai, K., Sakuta, M., & Iwata, M. (1994). Naming difficulties in alexia with
agraphia for kanji after a left posterior inferior temporal lesion. Journal of Neurology,
Neurosurgery, and Psychiatry, 57, 609–613.

Salmelin, R. (2007). Clinical neurophysiology of language: The MEG approach. Clinical
Neurophysiology, 118, 237–254.

Salmelin, R., Service, E., Kiesilä, P., Uutela, K., & Salonen, O. (1996). Impaired visual
word processing in dyslexia revealed with magnetoencephalography. Annals of
Neurology, 40 (2), 157–162.

Schlaggar, B. L., & McCandliss, B. D. (2007). Development of neural systems for reading.
Annual Review of Neuroscience, 30, 475–503.

Seidenberg, M. S., & McClelland, J. L. (1989). A distributed, developmental model of word recognition and naming. Psychological Review, 96 (4), 523–568.

Shallice, T. (1981). Phonological agraphia and the lexical route in writing. Brain, 104,
413–429.

Simos, P. G., Breier, J. I., Fletcher, J. M., Foorman, B. R., Castillo, E. M., & Papanicolaou,
A. C. (2002). Brain mechanisms for reading words and pseudowords: An integrated ap­
proach. Cerebral Cortex, 12 (3), 297–305.

Smith, E. E., & Jonides, J. (1999). Storage and executive processes in the frontal lobes [re­
view]. Science, 283 (5408), 1657–1661.

Starrfelt, R., & Gerlach, C. (2007). The visual what for area: Words and pictures in the left
fusiform gyrus. NeuroImage, 35, 334–342.

Strain, E., Patterson, K., Graham, N., & Hodges, J. R. (1998). Word reading in (p. 506) Alzheimer’s disease: Cross-sectional and longitudinal analyses of response time and accuracy data. Neuropsychologia, 36, 155–171.

Tainturier, M.-J., & Rapp, B. (2001). The spelling process. In B. Rapp (Ed.), The handbook
of cognitive neuropsychology: What deficits reveal about the human mind (pp. 263–289).
Philadelphia: Psychology Press.

Tarkiainen, A., Helenius, P., Hansen, P. C., Cornelissen, P. L., & Salmelin, R. (1999). Dynamics of letter string perception in the human occipitotemporal cortex. Brain, 122 (Pt 11), 2119–2132.

Tettamanti, M., Buccino, G., Saccuman, M. C., Gallese, V., Danna, M., Scifo, P., Fazio, F., Rizzolatti, G., Cappa, S. F., & Perani, D. (2005). Listening to action-related sentences activates fronto-parietal motor circuits. Journal of Cognitive Neuroscience, 17 (2), 273–281.

Thompson-Schill, S. L., D’Esposito, M., & Kan, I. P. (1999). Effects of repetition and com­
petition on activity in left prefrontal cortex during word generation. Neuron, 23 (3), 513–
522.

Tsapkini, K., & Rapp, B. (2009). Neural response to word and pseudoword reading after a
left fusiform gyrus resection: An fMRI investigation. Poster presented at the Academy of
Aphasia, Boston.

Tsapkini, K., & Rapp, B. (2010). The orthography-specific functions of the left fusiform
gyrus: Evidence of modality and category specificity. Cortex, 46, 185–205.

Turkeltaub, P. E., Eden, G. F., Jones, K. M., & Zeffiro, T. A. (2002). Meta-analysis of the
functional neuroanatomy of single-word reading: method and validation. NeuroImage, 16
(3 Pt 1), 765–780.

Usui, K., Ikeda, A., Takayama, M., et al. (2003). Conversion of semantic information into
phonological representation: A function in left posterior basal temporal area. Brain, 126,
632–641.

Usui, K., Ikeda, A., Takayama, M., et al. (2005). Processing of Japanese morphogram and
syllabogram in the left basal temporal area: Electrical cortical stimulation studies. Cogni­
tive Brain Research, 24, 274–283.

Vanier, M., & Caplan, D. (1985). CT correlates of surface dyslexia. In K. E. Patterson, J. C. Marshall, & M. Coltheart (Eds.), Surface dyslexia: Neuropsychological and cognitive studies of phonological reading (pp. 511–525). London: Erlbaum.

Vigneau, M., Beaucousin, V., Hervé, P. Y., et al. (2006). Meta-analyzing left hemisphere
language areas: Phonology, semantics, and sentence processing. NeuroImage, 30, 1414–
1432.

Vinckier, F., Dehaene, S., Jobert, A., et al. (2007). Hierarchical coding of letter strings in the ventral stream: Dissecting the inner organization of the visual word-form system. Neuron, 55, 143–156.

Weekes, B. (1995). Right hemisphere writing and spelling. Aphasiology, 9, 305–319.

Weekes, B., Coltheart, M., & Gordon, E. (1997). Deep dyslexia and right-hemisphere read­
ing: Regional cerebral blood flow study. Aphasiology, 11, 1139–1158.

Wernicke, C. (1894). Grundriss der Psychiatrie in klinischen Vorlesungen. Leipzig: Verlag von Georg Thieme.

Wilson, S. M., Brambati, S. M., Henry, R. G., et al. (2009). The neural basis of surface
dyslexia in semantic dementia. Brain, 132, 71–86.

Wilson, T. W., Leuthold, A. C., Lewis, S. M., et al. (2005). Cognitive dimensions of ortho­
graphic stimuli affect occipitotemporal dynamics. Experimental Brain Research, 167,
141–147.

Wilson, T. W., Leuthold, A. C., Moran, J. E., et al. (2007). Reading in a deep orthography:
Neuromagnetic evidence of dual mechanisms. Experimental Brain Research, 180, 247–
262.

Wise, R., Hadar, U., Howard, D., & Patterson, K. (1991). Language activation studies with
positron emission tomography [review]. Ciba Foundation Symposium, 163, 218–228; dis­
cussion 228–234.

Woollams, A., Lambon Ralph, M. A., Plaut, D. C., & Patterson, K. (2007). SD-squared: On
the association between semantic dementia and surface dyslexia. Psychological Review,
114, 316–339.

Wright, N., Mechelli, A., Noppeney, U., et al. (2008). Selective activation around the left
occipito-temporal sulcus for words relative to pictures: Individual variability of false posi­
tives? Human Brain Mapping, 29, 986–1000.

Xue, G., & Poldrack, R. A. (2007). The neural substrates of visual perceptual learning of
words: Implications for the visual word form area hypothesis. Journal of Cognitive Neuro­
science, 19, 1643–1655.

Zaidel, E. (1990). Language functions in the two hemispheres following complete cerebral commissurotomy and hemispherectomy. In F. Boller & J. Grafman (Eds.), Handbook of neuropsychology (Vol. 4, pp. 115–150). Amsterdam: Elsevier.

Kyrana Tsapkini

Kyrana Tsapkini, Departments of Neurology, and Physical Medicine and Rehabilitation, Johns Hopkins University, Baltimore, MD

Argye Hillis

Argye Hillis, Johns Hopkins University School of Medicine, Departments of Neurology and Physical Medicine and Rehabilitation, and Department of Cognitive Science, Johns Hopkins University


Neural Systems Underlying Speech Perception  


Sheila Blumstein and Emily B. Myers
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn

Print Publication Date: Dec 2013 Subject: Psychology, Cognitive Neuroscience


Online Publication Date: Dec 2013 DOI: 10.1093/oxfordhb/9780199988693.013.0025

Abstract and Keywords

This chapter reviews the neural systems underlying speech perception processes with a
focus on the acoustic properties contributing to the phonetic dimensions of speech and
the mapping of speech input to phonetic categories and word representations. Neu­
roimaging findings and results from brain lesions resulting in aphasia suggest that these
processes recruit a neural system that includes temporal (Heschl’s gyrus, superior tempo­
ral gyrus, superior temporal sulcus, and middle temporal gyrus), parietal (supramarginal
and angular gyri), and frontal lobe structures (inferior frontal gyrus) and support a func­
tional architecture in which there are distinct stages organized such that information at
one stage of processing influences and modulates information at other stages of processing. In particular, spectral-temporal analysis of speech recruits temporal areas; lexical processing, i.e., the mapping of sound structure to phonetic category and lexical representations, recruits posterior superior temporal gyrus and parietal areas; and selection processes recruit frontal areas. The authors consider how evidence from cognitive neuroscience can inform a number of theoretical issues that have dominated speech perception
research, including invariance, the influence of higher level processes (both lexical and
sentence processing) on lower level phonetic categorization processes, the influence of
articulatory and motor processes on speech perception, the effects of phonetic and
phonological competition on lexical access, and the plasticity of phonetic category learn­
ing in the adult listener.

Keywords: speech perception, lexical processing, lexical competition, invariance, neuroimaging, aphasia, acoustic
properties, phonetic categories, plasticity

Introduction
Language processing appears to be easy and seamless. However, evidence across a wide
range of disciplines and approaches to the study of language, including theoretical lin­
guistics, behavioral studies of language, and the cognitive neuroscience of language, sug­
gests that the language system can be broken down into separable components—speech,
words, syntactic structure, meaning. It is the goal of this chapter to focus on one of these
components—namely, speech—and to examine the neural systems recruited in its processing.

For most of us, speech is the medium we use for language communication. Thus, in both
listening and speaking, it serves as the interface with the language system. In language
understanding, the auditory input must be mapped onto phonetic categories, and these
must be mapped onto word representations that are ultimately mapped onto meaning or
conceptual structures. Similarly, in spoken word production, concepts must be mapped
onto word representations that are ultimately mapped onto articulatory plans and com­
mands for producing the sounds of speech. Although study of the cognitive (p. 508) neuro­
science of speech processing includes the processes that map sounds to words and words
to speech, we will focus in this chapter on the perceptual processing stream, namely, the
processes that map auditory input to phonetic category and word representations.

Figure 25.1 Functional architecture (right panel) of the auditory speech processing system from auditory input to lexical access and the neural systems underlying this processing (left panel).

To begin this review it is important to provide a theoretical framework that elucidates our
current understanding of the functional architecture of speech processing. Such a frame­
work serves as a guide as we examine the neural systems underlying speech processing.
At the same time, examination of the neural systems recruited during speech processing
may in turn shape and constrain models of speech processing (Poldrack, 2006). For exam­
ple, although most models agree that in both speech perception and speech production
there is a series of successive stages in which information is transformed along the pro­
cessing stream, there is disagreement about the extent to which the processing stages in­
fluence each other. In one view, the processing stages are discrete and informationally en­
capsulated (Fodor, 1983; Levelt, 1992). In this view, the input to a stage or module is
processed and transformed, but its output is sent to the next stage without leaving any
“residue” of information from the earlier processing stage. In contrast, the interactive or
cascading view holds that although there are multiple stages of processing, information
flow at one level of processing influences and hence modulates other stages of processing
(Dell, 1986; Gaskell & Marslen-Wilson, 1997; McClelland & Elman, 1986). For example, it
has been shown in spoken word production that the sound structure properties of the lex­
icon cascade and influence phonological planning and ultimately articulatory implementa­
tion stages or processing (Baese-Berk & Goldrick, 2009; Goldrick & Blumstein, 2006; Per­
amunage, Blumstein, Myers, Goldrick, & Baese-Berk, 2011).

The theoretical framework we adopt views speech processing as an integrated system in which there are multiple stages of processing organized in a network-like architecture
such that information at one stage of processing influences that at other stages of pro­
cessing (Gaskell & Marslen-Wilson, 1999; Marslen-Wilson, 1987; Marslen-Wilson & Welsh,
1978; McClelland & Elman, 1986). In speech perception, as shown in Figure 25.1, the au­
ditory input is mapped onto acoustic-phonetic (spectral-temporal) properties of speech,
which are then mapped onto more abstract phonological properties such as features, segments, and syllables; these phonological properties are in turn mapped onto lexical (sound structure) representations of words.

At each stage of processing, representations are coded in terms of patterns of activation and inhibition. The extent of activation of a representation unit is not all or none, but
rather is graded and depends on the goodness of the input and the extent to which there
are other representational units that share properties with it and hence compete for ac­
cess. Indeed, the extent of activation and the subsequent selection of a particular repre­
sentational unit occur at each stage of processing. Thus, acoustic-phonetic representa­
tions that are similar compete with each other (e.g., [da] and [ta] share spectral proper­
ties but are distinguished by temporal properties of voicing), phonological representa­
tions compete with each other (e.g., /da/ and /ba/ (p. 509) share features of manner of ar­
ticulation and voicing but are distinguished by place of articulation), and word represen­
tations compete with each other (e.g., hammock and hammer share the same onset prop­
erties but are distinguished by the sound structure of the final syllable).

In the remainder of this chapter we discuss in detail how this functional architecture
maps onto the neural system. In broad strokes, speech perception and the mapping of
sounds to words recruits a neural system which includes temporal lobe structures
(Heschl’s gyrus, superior temporal gyrus [STG], superior temporal sulcus [STS], and mid­
dle temporal gyrus [MTG]), parietal lobe structures (supramarginal gyrus [SMG] and an­
gular gyrus [AG]), and frontal lobe structures (inferior frontal gyrus [IFG]) (see Figure
25.1). Early stages of processing appear to be bilateral, whereas later stages appear to
recruit primarily the left hemisphere. We examine how the neural system responds to dif­
ferent acoustic properties contributing to the phonetic dimensions of speech. We investi­
gate the neural substrates reflecting the adult listener’s ability to respond to variability in
the speech stream and to perceive a stable phonetic representation. We consider the po­
tential relationship between speech perception and speech production processes by ex­
amining the extent to which speech perception processes recruit and are influenced by
the speech production and motor system. We review the various stages implicated in the
mapping from sounds to words as a means of examining whether the same neural system
is recruited for resolving competition across linguistic domains or whether different areas
are recruited as a function of a particular linguistic domain. Finally, we examine the neur­
al systems underlying the plasticity of the adult listener in learning new phonetic cate­
gories.

There is a rich literature investigating speech processing in cognitive neuroscience using a range of methods. Much of the early work that laid the foundations for structure–function relations in speech processing comes from lesion-based research examining the nature of speech perception and production deficits in aphasia. The strength of this method
is that it provides a rich tapestry of behaviors in which one can see impairment or sparing
of particular language functions in the same patient. Of importance, it is also possible to
see different patterns of impairment across different types of aphasics, allowing for in­
sights into the potential functional role of particular neural areas in accomplishing a par­
ticular language task. Nonetheless, because the brain lesions of patients tend to be large,
it is difficult to make strong inferences about structure–function relations. Evidence from
the neuroimaging literature can speak to these issues, allowing for a closer examination
of the neural areas activated during speech processing and the potential modulation of
activation along the processing stream. For this reason, this chapter looks at converging
evidence from multiple cognitive neuroscience methods in examining the neural systems
underlying speech perception.

Perception of the Acoustic Properties of Speech
The listener faces a daunting task in perceiving speech. In order to map the complex
acoustic signal onto a speech sound in his linguistic repertoire, the acoustic details of the
speech signal that are relevant for phonetic identity must be extracted. These acoustic
properties include changes in the fine-grained spectral details of speech over short and
long time scales. At a short time scale, fine-grained timing distinctions on the order of 10
ms between the onset of the burst and the onset of voicing serve as a cue to voicing in
stop consonants in many languages, and differentiate sounds such as /d/ versus /t/ (Lisker
& Abramson, 1964). Spectral sweeps about 40 ms in length mark the transitions from
consonant to vowel in stop consonants, and signal place of articulation, distinguishing be­
tween syllables such as /da/ and /ga/. At a longer time scale, on the order of 150 to 300
ms, relatively steady-state spectral patterns distinguish between vowels and between
fricative consonants such as /f/ and /s/. At a yet longer time scale, changes in pitch over
the course of an entire word are important for establishing lexical tone, which determines
lexical identity in languages such as Chinese and Thai. Moreover, information at short
and long temporal durations may be useful for processing indexical properties of speech
such as speaker identity, speaker emotional state, speech rate, and prosodic boundaries
(Wildgruber, Pihan, Ackermann, Erb, & Grodd, 2002). Taken together, evidence suggests
that the perception of speech is supported by lower level, bottom-up acoustic processes
that detect and analyze the acoustic details of speech.

There is fairly general consensus that processing the acoustic details of speech occurs in
the temporal lobes, specifically in Heschl’s gyrus, the STG, and extending into the STS
(Binder & Price, 2001; Hickok & Poeppel, 2004). Phoneme identification and discrimination tasks are disrupted when the (p. 510) left posterior temporal gyrus is directly stimulated (Boatman & Miglioretti, 2005), and damage to the bilateral STG can result in
deficits in processing the details of speech (Auerbach, Allard, Naeser, Alexander, & Al­
bert, 1982; Coslett, Brashear, & Heilman, 1984; cf. Poeppel, 2001, for review). However,
interesting differences emerge in the neural systems recruited in the processing of the
acoustic parameters of speech. These differences appear to reflect whether the acoustic
attribute is spectral or temporal, whether the acoustic information occurs over a short or
long time window, and whether the cue plays a functional role in the language. We now
turn to these early analysis stages in which the acoustic-phonetic properties are mapped
onto phonetic representations, which is the primary focus of the next section.

Perception of the Temporal Properties of Speech: Voicing in Stop Consonants

Voice onset time (VOT) is defined as the time between the release of an oral closure and
the onset of voicing in syllable-initial stop consonants. This articulatory/acoustic parame­
ter serves as a cue to the perception of voicing contrasts in syllable-initial stop conso­
nants, distinguishing between voiced sounds such as /b/, /d/, and /g/, and voiceless sounds
such as /p/, /t/, and /k/ (Lisker & Abramson, 1964). VOT is notable in part because small
differences (10 to 20 ms) in the timing between the burst and onset of voicing cue per­
ceptual distinctions between voiced and voiceless stop consonants (Liberman, Delattre, &
Cooper, 1958; Liberman, Harris, Hoffman, & Griffith, 1957). Nonetheless, depending on
the VOT values, similar differences are perceived by listeners as belonging to the same
phonetic category. Thus, stimuli on a VOT continuum are easily discriminated when they
fall into two distinct speech categories (e.g., “d” versus “t”), but are difficult to distin­
guish if they are members of the same phonetic category. This is true even when the
acoustic distance between exemplars is equivalent.

Evidence from intracerebral evoked potentials and magnetic and electrophysiological recordings suggests that at least some aspects of this type of discontinuous perception
may arise as a result of properties of the auditory system. Steinschneider and colleagues
(1999) recorded intracerebral auditory evoked potentials from three patients undergoing
cortical mapping before surgery for intractable epilepsy. In these patients, evoked re­
sponses from Heschl’s gyrus and the posterior portion of the STG in the right hemisphere
showed distinct patterns of response to tokens with VOTs less than 20 ms than those with
VOTs greater than 20 ms. For VOTs less than 20 ms, only one component was observed,
regardless of the timing of the burst with respect to voicing (cf. Sharma & Dorman,
1999). For speech tokens with VOTs greater than 20 ms, two components were observed,
one that was time-locked to the onset of the burst, which signals the release of the stop
closure, and another that was time-locked to the onset of voicing. Given that 20-ms VOT
is near the boundary between voiced and voiceless phonetic categories in some lan­
guages, this categorical response to voiced tokens was taken as evidence that physiologi­
cal properties of auditory cortex support the categorical perceptual response. A similar
pattern has been observed in nonhuman primates and more phylogenetically distant
species such as guinea pigs (McGee, Kraus, King, Nicol, & Carrell, 1996; Steinschneider,
Schroeder, Arezzo, & Vaughan, 1995), suggesting that the sensitivity of the auditory sys­
tem to this property forms the basis for the phonetic property of voicing in stop conso­
nants.

Consistent with these findings are results from Liégeois-Chauvel and colleagues (1999).
Using intracerebral evoked potentials, they observed components time-locked to the
acoustic properties of voiced and voiceless tokens for both speech and nonspeech lateral­
ized to the left hemisphere in Heschl’s gyrus, the planum temporale (PT), and the posteri­
or part of the STG (Brodmann area [BA] 22). Results from a magnetoencephalography
(MEG) study (Papanicolaou et al., 2003) suggest that responses to nonspeech “tone-onset-
time” and speech VOT stimuli do not differ in laterality early in processing (before 130
ms). However, a strong leftward laterality is observed for speech tokens, but not for non­
speech tokens, later in processing (>130 ms). Similarly, in a MEG study examining within-
phonetic-category variation, Frye et al. (2007) showed greater sensitivity to within-category variation in left-hemisphere evoked responses than in the right hemisphere. Taken together, these findings suggest that early stages of processing recruit auditory areas bilaterally, but that at later stages, the processing is lateralized to the left hemisphere, presumably when the auditory information is encoded onto higher level acoustic-phonetic
properties of speech that ultimately have linguistic relevance.

Indeed, left posterior temporal activation is modulated by the nature of the phonetic cate­
gory information conveyed by a particular VOT value. In particular, results of neuroimag­
ing (p. 511) studies have shown graded activation to stimuli along a VOT continuum in left
posterior temporal areas with increasing activation as the VOT of the stimuli approached
the phonetic boundary (Blumstein, Myers, & Rissman, 2005; Hutchison, Blumstein, & My­
ers, 2008; Myers, 2007; Myers, Blumstein, Walsh, & Eliassen, 2009). Exemplar stimuli
(i.e., those stimuli perceived as a clear-cut voiced or voiceless stop consonant) showed
the least activation. Stimuli that were poorer exemplars of their category, because those
stimuli either were close to the category boundary (Blumstein et al., 2005) or had ex­
treme VOT values at the periphery of the continuum (Myers, 2007), showed increased
neural activation, suggesting that the posterior portion of the STG is sensitive to the
acoustic-phonetic structure of the phonetic category.

Perception of the Spectral Properties of Speech: Place of Articulation in Stop Consonants

Stop consonants such as /b/, /d/, and /g/ differ with respect to their place of articulation—
that is, the location in the oral cavity where the closure is made to produce these sounds.
In stop consonants, differences in place of articulation are signaled by the shape of for­
mants as they transition from the burst to the vowel. Unlike VOT, which is a temporal pa­
rameter, place of articulation is cued by rapid spectral changes over a relatively short
time window of some 20 to 40 ms (Stevens & Blumstein, 1978). Neuroimaging evidence
largely supports the notion that place of articulation, like VOT, recruits temporal lobe
structures and is processed primarily in the left hemisphere.

Joanisse and colleagues (2007) examined cortical responses to changes in phonetic identi­
ty from /ga/ to /da/ as participants listened passively to phonetic tokens while they
watched a silent movie. In this study, sensitivity to between-category shifts was observed
in the left STS, extending into the MTG. Of interest, similar areas were recruited for sine-
wave analogues of place of articulation (Desai, Liebenthal, Waldron, & Binder, 2008).
Sine-wave speech is speech that has been filtered so that the energy maxima are re­
placed by simple sine waves. Initially, sine-wave speech sounds like unintelligible noise;
however, it has been shown that with experience, listeners become adept at perceiving
these sounds as speech (Remez, Rubin, Pisoni, & Carrell, 1981). Of importance, although
sine-wave speech is stripped of the precise acoustic cues that distinguish the sounds of
speech, it preserves relative shifts in spectral energy that signal place of articulation in
stop consonants. In a functional magnetic resonance imaging (fMRI) study by Desai et al.
(2008), participants heard sounds along a continuum from /ba/ to /da/, as well as sine-
wave versions of these sounds and sine-wave stimuli that did not correspond to any
speech sound. They were scanned while listening to speech and sine-wave speech sounds
before and after they were made aware of the phonetic nature of the sine-wave speech
sounds. Results showed that activation patterns were driven not by the spectral charac­
teristics of the stimuli, but rather by whether sounds were perceived as speech or not.
Specifically, left posterior temporal activation was observed for real speech sounds before
training and for sine-wave speech sounds once participants had begun to perceive them
as speech. That sine-wave speech recruits the same areas as speech but only when it is
perceived by the listener as speech suggests that the linguistic function of an input is crit­
ical for recruiting the left-hemisphere speech processing stream. As we will see below
(see the section, Perception of Lexical Tone), these findings are supported by the percep­
tion of tone when it has linguistic relevance to a speaker.

Perception of Vowels

Vowel perception involves attention to spectral information at a longer time scale than
that of either VOT or place of articulation, typically on the order of 150 to 300 ms. Al­
though stop consonants are marked by spectral transitions over short time windows,
spectral information in vowels is relatively steady at the vowel midpoint, although there
are spectral changes as a function of the consonant environment in which a vowel occurs
(Strange, Jenkins, & Johnson, 1983). Moreover, vowels are more forgiving of differences
in timing—for instance, vowels as short as 25 ms can still be perceived as members of
their category, suggesting that the dominant cue to vowel identity is the spectral cue,
rather than durational cues (Strange, 1989). Indeed, vowel quality can be perceived
based on the spectral transitions of a preceding consonant (Blumstein & Stevens, 1980).

There are few behavioral studies examining the perception of vowels following brain injury. Because participants with right-hemisphere damage typically do not present with
aphasia, it is usually assumed that they do not have speech perception impairments. The
sparse evidence available supports this claim (cf. Tallal & Newcombe, 1978). (p. 512)
Nonetheless, dichotic listening experiments with normal individuals have typically shown
that, unlike consonants, which display a strong right ear (left hemisphere) advantage,
vowels show either no ear preference or a slight left ear (right hemisphere) advantage
(Shankweiler & Studdert-Kennedy, 1967; Studdert-Kennedy & Shankweiler, 1970). As de­
scribed below, neuroimaging experiments also suggest neural differences in the process­
ing of consonants and vowels.

Not surprisingly, fMRI studies of vowel perception show recruitment of the temporal lobe.
However, results suggest that vowel processing appears to differ from consonant process­
ing both in terms of the areas within the temporal lobe that are recruited and in terms of
the laterality of processing.

With respect to the recruitment of temporal lobe areas, it appears that vowels recruit the
anterior temporal lobe to a greater degree than consonant processing does. In particular, activation patterns observed by Britton et al. (2009), Leff et al. (2009), and Obleser et al.
(2006) have shown that areas anterior to Heschl’s gyrus are recruited in the perception of
vowels. For instance, in a study by Obleser and colleagues (2006), activation was moni­
tored using fMRI as participants listened to sequences of vowels that varied in either the
first or second formants, acoustic parameters that spectrally distinguish vowels. In gener­
al, more activation was observed for vowels than nonspeech sounds in the anterior STG
bilaterally. Moreover, anterior portions of this region responded more to back vowels ([u]
and [o]), whereas more posterior portions responded more to front vowels ([i] and [e]),
suggesting that the anterior portions of the superior temporal cortex respond to the
structure of vowel space.

These findings suggest that there may be a subdivision of temporal cortex along the ante­
rior-posterior axis as a function of the phonetic cues giving rise to the sounds of lan­
guage. Phonetic cues to stop consonants seem to recruit temporal areas posterior to
Heschl’s gyrus as well as the MTG (Joanisse, Zevin, & McCandliss, 2007; Liebenthal et
al., 2010; Myers et al., 2009), whereas phonetic cues to vowels seem to recruit more ante­
rior regions (Britton et al., 2009; Leff et al., 2009; Obleser et al., 2006).

With respect to laterality, the preponderance of neuroimaging evidence suggests a bilateral or somewhat right-lateralized organization for vowel processing (Kasai et al., 2001;
Obleser et al., 2006; Obleser, Elbert, Lahiri, & Eulitz, 2003). For instance, a study by
Formisano and colleagues (2008) used support vector machines to learn the patterns of
fMRI activation associated with three different vowels ([i], [a], [u]) spoken by three speak­
ers. Activation in both the right and left STG and extending into the STS distinguished be­
tween vowels in both the trained speakers’ voices and a novel speaker’s voice. Consistent
with these findings, Guenther and colleagues (2004) report activation in the right STG for
500-ms isolated vowel stimuli, which reflected a vowel’s prototypicality as a member of
its phonetic category.
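The cross-speaker decoding logic of the Formisano et al. (2008) study can be sketched with synthetic data: learn vowel-specific activation patterns from some speakers, then test whether those patterns identify vowels spoken by a held-out speaker. The sketch below uses made-up "voxel" values and a minimal nearest-centroid classifier standing in for their support vector machines; every name and parameter value is illustrative, not taken from the study.

```python
import random

random.seed(0)
VOWELS = ["i", "a", "u"]
N_VOX = 20

# Hypothetical vowel-specific activation templates (one value per "voxel")
vowel_template = {v: [random.gauss(0, 1) for _ in range(N_VOX)] for v in VOWELS}
# Speaker-specific offsets stand in for acoustic differences between talkers
speaker_offset = {s: [random.gauss(0, 0.5) for _ in range(N_VOX)] for s in range(3)}

def trial(vowel, speaker, noise=0.3):
    """Simulate one noisy activation pattern for a vowel/speaker pair."""
    return [t + o + random.gauss(0, noise)
            for t, o in zip(vowel_template[vowel], speaker_offset[speaker])]

# "Train": average patterns for each vowel over speakers 0 and 1
centroids = {}
for v in VOWELS:
    trials = [trial(v, s) for s in (0, 1) for _ in range(10)]
    centroids[v] = [sum(col) / len(col) for col in zip(*trials)]

def classify(pattern):
    """Assign the vowel whose training centroid is nearest (squared distance)."""
    def dist(c):
        return sum((p - q) ** 2 for p, q in zip(pattern, c))
    return min(VOWELS, key=lambda v: dist(centroids[v]))

# "Test": decode vowels spoken by the held-out speaker 2
test_trials = [(v, trial(v, 2)) for v in VOWELS for _ in range(20)]
accuracy = sum(classify(p) == v for v, p in test_trials) / len(test_trials)
print(f"cross-speaker decoding accuracy: {accuracy:.2f}")
```

Because the vowel-specific component of each pattern is shared across speakers while the speaker-specific component is not, the classifier generalizes to the novel speaker, which is the signature of an abstract, speaker-invariant vowel representation.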

Nonetheless, the laterality of vowel processing appears to be influenced by vowel duration and the context in which vowels occur. In particular, vowels presented in more naturally occurring contexts, where their durations are shorter or they occur in consonant contexts, are more likely to recruit left-hemisphere mechanisms. For example, Britton et al. (2009) reported a study of vowel processing in which vowels were presented at varying durations, from 75 to 300 ms. Vowels at longer durations showed
greater activation in both left and right anterior STG, with a general rightward asymme­
try for vowels. However, there was a decreasing right-hemisphere preference as the dura­
tion of the vowels got shorter. A study by Leff et al. (2009) examined the hemodynamic re­
sponse to an unexpected vowel stimulus embedded in a /bVt/ context. Increased activa­
tion for the vowel deviant was observed in anterior portions of the left STG, whereas no
such response to deviant stimuli was observed when the standard and deviant stimuli
were nonspeech stimuli that were perceptually matched to the vowels.

Perception of Lexical Tone


Until now we have examined acoustic features that are limited to phonetic segments.
However, suprasegmental information in the acoustic stream also has linguistic rele­
vance. One such suprasegmental property of language is lexical tone, in which changes in
the pitch contour over a word signal differences in meaning. For example, tonal lan­
guages such as Chinese and Thai employ distinct pitch patterns (e.g., rising and falling)
to distinguish between lexical items with the same segmental structure. Of interest, lexi­
cal tone is an acoustic property with a relatively long duration: on the order of 300 to 500
ms. Nonlinguistic processing of pitch information, for instance in the perception of music,
has long been considered a right-hemisphere process (Zatorre, Belin, & Penhune, 2002).
Nonetheless, for native speakers of tone languages, lexical tone appears to be processed
in (p. 513) the left hemisphere. In contrast, native speakers of English do not show left-
hemisphere processing for Chinese or Thai words with different tones because lexical
tones do not play a linguistic role (Gandour et al., 2000).

Evidence suggests that even at the level of the brainstem, native speakers of tone lan­
guages respond differently to tone contours even when this information is not embedded
in linguistic structure (Krishnan, Swaminathan, & Gandour, 2009), suggesting that expe­
rience in learning a tonal language fundamentally alters one’s auditory perception of
pitch changes in general. At the level of the cortex, however, responses to a given type of
tone pattern may be more language specific. For instance, Xu et al. (2006) reported a
study of tone processing that used Chinese syllables with appropriate overlaid Chinese
tones, and Chinese syllables with overlaid Thai tones. This latter group of stimuli was not
perceived by Chinese listeners as words in their language, and neither stimulus type was
perceived as words by Thai listeners. When native speakers of Thai and Chinese per­
formed a discrimination task on these stimuli, an interaction was observed in the left PT,
such that activation was greatest for tone contours that corresponded to each group’s na­
tive language (i.e., Chinese tones for Chinese listeners, Thai tones for Thai listeners). Of
particular interest, this language-specific preference was observable irrespective of
whether stimuli had lexical content or not. Although processing lower level details of tone
may rely on a bilaterally distributed system, it seems clear that even nonlinguistic uses of
tone become increasingly left lateralized when they correspond to the tonal patterns used
in one’s native language.

Invariance for Phonetic Categories of Speech


Exemplars of speech categories vary in the way they are realized acoustically (Peterson &
Barney, 1952). This variability can come from numerous sources, including different vocal
tract properties and sizes associated with distinct talkers, imprecise motor control in
speech articulation, and changes in speaking rate. Listeners are surprisingly sensitive to
these variations. For example, evidence suggests that not all members of a phonetic cate­
gory are perceived as equal (Kuhl, 1991; Pisoni & Tash, 1974). This sensitivity to within-
category variation is reflected in neural sensitivity in the posterior STG, where there are
graded neural responses, with increased activation as a member of a phonetic category
approaches the acoustic-phonetic boundary (Frye et al., 2007, 2008; Guenther, Nieto-Cas­
tanon, Ghosh, & Tourville, 2004; Myers, 2007; Myers et al., 2009). However, for the listen­
er, the primary purpose of the speech signal is to denote meaning. Indeed, listeners per­
ceive a stable phonetic percept despite this variability. As such, a crucial question is how
this acoustic variability is mapped onto a stable representation of a phonetic category. It
is this question that has been termed the “invariance problem.”

The search for the neural correlates of invariance presents methodological challenges to
the cognitive neuroscientist. The goal is to identify patterns of response that show sensi­
tivity to variation between phonetic categories yet are insensitive to variation within the
category. One way researchers have approached this issue is to use adaptation or oddball
designs, in which a standard speech stimulus is repeated a number of times. This string
of standard stimuli is interrupted by a stimulus that differs from the standard along some
dimension, either acoustically (e.g., a different exemplar of the same phonetic category,
or the same phonetic content spoken by a different speaker) or phonetically (e.g., a differ­
ent phonetic category). Increases in activation for the “different” stimulus are presumed
to reflect either a release from adaptation (Grill-Spector, Henson, & Martin, 2006; Grill-
Spector & Malach, 2001) or an active change detection response (Zevin, Yang, Skipper, &
McCandliss, 2010), but in either case serve to index a region’s sensitivity to the dimen­
sion of change.
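The logic of such adaptation or oddball designs can be sketched as a simple stimulus-sequence generator: a run of standards is occasionally interrupted by a deviant that differs either acoustically (same category, different speaker) or phonetically (different category). This is an illustrative sketch; the stimulus labels, deviant probability, and spacing constraint are hypothetical rather than drawn from any particular study.

```python
import random

random.seed(1)

def oddball_sequence(standard, deviant, n_trials=40, p_deviant=0.15, min_gap=3):
    """Build a sequence of repeated standards with occasional deviants,
    enforcing a minimum number of standards between successive deviants."""
    seq, since_last = [], min_gap
    for _ in range(n_trials):
        if since_last >= min_gap and random.random() < p_deviant:
            seq.append(deviant)
            since_last = 0
        else:
            seq.append(standard)
            since_last += 1
    return seq

# Acoustic change (same category, new speaker) vs. phonetic change (new category)
acoustic_run = oddball_sequence("da_speaker1", "da_speaker2")
phonetic_run = oddball_sequence("da_speaker1", "ta_speaker1")
print(acoustic_run[:10])
```

Comparing the response to the deviant in the two runs indexes whether a region is sensitive to the acoustic change, the phonetic change, or both.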

A study by Joanisse and colleagues (2007) used such a method to investigate neural acti­
vation patterns to within- and between-category variation along a place of articulation
continuum from [da] to [ga]. Sensitivity to between-category variation was observed
along the left STS and into the MTG. One cluster in the left SMG showed an invariant re­
sponse, that is, sensitivity to between-category changes in the face of insensitivity to
within-category changes. In this study, no areas showed sensitivity to within-category
variation, which leaves open the question of whether the paradigm was sensitive enough
to detect small acoustic changes.

A study by Myers and colleagues (2009) investigated the same question with a similar de­
sign using a VOT continuum from [da] to [ta]. In contrast to Joanisse et al., sensitivity to
both within- and between-category variation was evident in the left posterior STG, indi­
cating that the neural system was sensitive to within-category contrasts. The (p. 514) only
region to show an “invariant” response (i.e., sensitivity to 25-ms differences that signaled phonetic between-category differences and insensitivity to 25-ms differences that signaled within-category differences) was in the posterior portion of the left IFG extending
into the precentral gyrus, a region that was not imaged in the study by Joanisse et al.
(2007).

The fact that invariant responses emerged in frontal areas rather than in temporal areas
suggests that the perception of category invariance may arise, at least in part, from com­
putations in frontal areas on graded phonetic input processed in the temporal lobes
(Chang et al., 2010). A cognitive, rather than perceptual, role in computing phonetic cate­
gory membership seems plausible in light of behavioral evidence showing that the precise
location of the phonetic category boundary is not fixed, but rather varies with a host of
factors including, but not limited to, lexical context, speaker identity, and speech rate
(Ganong, 1980; Kraljic & Samuel, 2005; Miller & Volaitis, 1989). A mechanism that relied
only on the specific properties of the acoustic signal would be unable to account for the
influence of these factors on speech perception.

Listeners perceive the same phonetic category spoken by both a male and a female
speaker even though the absolute acoustic properties are not the same. At the same time,
the listener also recognizes that two different speakers produced the same utterance.
Thus, the perceptual system must be able to respond in an invariant way at some level of
processing to phonetic identity while at the same time being able to respond differently to
speaker identity.

This relationship between phonetic category invariance and acoustic variability from dif­
ferent speakers has been the topic of much research in the behavioral literature. This
type of invariance is especially relevant given the controversy about the degree to which
indexical features in the speech signal such as talker information are retained or discard­
ed as we map speech to meaning. Behavioral studies (Goldinger, 1998; Palmeri,
Goldinger, & Pisoni, 1993) suggest that speaker information is encoded along with the
lexical form of a word, and that indexical information affects later access to that form.
This observation has led to episodic theories of speech perception, in which nonphonetic
information such as talker information is preserved in the course of speech processing.
Episodic theories stand in contrast to abstractionist theories of speech perception, which
propose that the speech signal is stripped of indexical features before the mapping of
sound structure to lexical form (Stevens, 1960).

Studies investigating neural activation patterns associated with changes in either speaker
or phonetic information have revealed dissociations between processing of these two
types of information. Sensitivity to changing speakers in the face of constant linguistic in­
formation has been observed in the right anterior STS (Belin & Zatorre, 2003) and in the
bilateral MTG and SMG (Wong, Nusbaum, & Small, 2004). In contrast, neural coding that
distinguished among vowel types, but not individual speakers, was reported by Formisano
and colleagues (2008) across a distributed set of regions in left and right STG and STS.

Nonetheless, none of these studies has specifically shown invariance to speaker change—
that is, an area that treats phonetically equivalent utterances spoken by different speak­
ers as the same. Using an adaptation paradigm, Salvata and colleagues (2012) showed
speaker-invariant responses in the anterior portion of the STG bilaterally, suggesting that
at some level of analysis, the neural system may abstract away from the acoustic variabili­
ty across speakers in order to map to a common phonetic form. The fact that speaker-in­
variant regions were observed in the temporal lobes suggests that this abstraction may
occur relatively early in the phonetic processing stream, before sound is mapped to lexi­
cal form, challenging a strict episodic view of lexical processing.

Of interest, invariant responses may in some cases be shifted by attention to different as­
pects of the signal. Bonte et al. (2009) used electroencephalography (EEG) to investigate
the temporal alignment of evoked responses to vowels spoken by a set of three different
speakers. Participants engaged in a one-back task on either phonetic identity or speaker
identity. When engaged in the phonetic task, significant temporal alignment was observed
for vowels regardless of the speaker, and when engaged in the speaker task, signals were
aligned for different vowels spoken by the same speaker. Similarly, von Kriegstein et al.
(2003) showed that attention to voices versus linguistic content resulted in shifts in acti­
vation, with attention to voices activating the right middle STS and attention to linguistic
content activating the homologous area on the left.

Articulatory and Motor Influences on Speech Perception

Until now, we have focused on the influence of information in the acoustic stream on the
mapping to (p. 515) a phonetic category. However, acoustic information is only part of the
array of multimodal information associated with phonetic categories. Importantly, speech
sound categories are linked to the articulatory codes that are necessary to produce these
same sounds. This has led to the question of the precise role that articulatory and motor
codes play in the perception of speech (Galantucci, Fowler, & Turvey, 2006; Liberman &
Mattingly, 1985).

Evidence from neuroimaging suggests that motor regions involved during speech produc­
tion do play some role in speech perception processes. A study by Wilson and colleagues
showed coactivation of a region on the border of BA 4 and BA 6 for both the perception
and production of speech (Wilson, Saygin, Sereno, & Iacoboni, 2004). A similar precen­
tral region, together with a region in the left superior temporal lobe, was also shown to
be more active for the perception of non-native speech sounds than for native speech
sounds (Wilson & Iacoboni, 2006), suggesting that motor regions may be recruited, espe­
cially when the acoustic input is difficult to map to an existing speech sound in the native
language inventory. What is less clear is the degree to which motor regions involved in ar­
ticulation are necessary for the perception of speech.

A transcranial magnetic stimulation (TMS) study by Mottonen & Watkins (2009) investigated this issue. TMS was applied to the lip region of motor cortex, after which
participants performed phonetic categorization and discrimination tasks on items along
four acoustic phonetic continua. In two of them, [ba]-[da] and [pa]-[ta], at one end of the
continuum, the phonetic category involved articulation at the lips (i.e., [ba] and [pa]), and
at the other end it did not (i.e., [da] and [ta]). In the two others, [ka]-[ga] and [da]-[ga],
none of the phonetic categories involved articulation at the lips. Results showed that, after TMS stimulation, both the slope of the identification function and discrimination performance changed for stimuli near the phonetic boundary for the continua that involved the lips. No changes were found for the
other two continua. Of importance, the changes in phonetic identification and in discrimi­
nation occurred only for those stimuli near the phonetic boundary and did not affect the
end-point, good-exemplar stimuli. Thus, although TMS affected the perception of speech,
it only affected those sounds that were poorer exemplars of the phonetic category. It is
possible then that, similar to the findings of Wilson and Iacoboni (2006), who showed that
motor regions appear to be recruited when the acoustic input is difficult to map to an ex­
isting speech sound in the native language inventory, motor regions are recruited in
speech perception when the acoustic input is poor. The question remains as to whether
these regions form the core of speech perception abilities, or instead are recruited as a
supporting resource in cases in which the acoustic signal is compromised in some way.

Some evidence comes from the neuropsychological literature. In particular, it is the case
that patients with damage to the speech motor system typically show good auditory lan­
guage comprehension (see Hickok, 2009, for a review). Nonetheless, although this evi­
dence is suggestive, it is not definitive because auditory comprehension is supported by
multiple syntactic, semantic, and contextual cues that may disguise an underlying speech
perception deficit. To answer this question, it is necessary to examine directly the extent
to which damage to motor areas produces a frank speech perception deficit. Few studies
have investigated this question. However, those that have are consistent with the view
that motor area involvement is not central to speech perception. In particular, aphasic pa­
tients with anterior brain damage who showed deficits in the production of VOT in stop
consonants nonetheless performed normally in the perception of a VOT continuum (Blum­
stein, Cooper, Zurif, & Caramazza, 1977).

Mapping of Sounds to Words


As we have seen, the perception of the sound properties of speech engages a processing
stream in the temporal lobes in which the auditory information is hierarchically organized
and transformed into successively more abstract representations. Although early analysis
stages recruit the temporal lobes, other neural areas, including the IFG and inferior pari­
etal areas such as the left SMG and AG (Burton, 2001, 2009), are recruited in speech per­
ception processes. Evidence from the aphasia literature (Blumstein, Cooper, Zurif, &
Caramazza, 1977) is consistent with these findings because patients with anterior lesions
as well as those with lesions extending into parietal areas present with deficits in speech
perception (Caplan, Gow, & Makris, 1995). It is generally assumed that the functional role
of these areas is to map acoustic-phonetic properties of speech onto phonetic category representations and to ultimately make phonetic decisions.

But speech is more than simply identifying the sound shape of language. It serves as the
vehicle for accessing the words of a language, that is the lexicon. As we discuss later, ac­
cessing the sound shape of words activates a processing stream that (p. 516) includes the
posterior STG, the SMG and AG, and the IFG. Of importance, the ease or difficulty with
which individuals access the lexicon in auditory word recognition is influenced by a num­
ber of factors related to the sound structure properties of words in a language. In this
way, the structure of the phonological lexicon influences activation patterns along this
processing stream. We turn now to a discussion of these factors because they provide a
window into not only the functional architecture of the language processing system but
also the neural systems engaged in selecting a word from the lexicon. In particular, they
show that lexical processes are influenced by the extent to which a target word shares
phonological structure with other words in the lexicon, and hence the extent to which a
target word competes for access and selection.

Lexical Competition

Whether understanding or speaking, we need to select the appropriate words from our
mental lexicon, a lexicon that contains thousands of words, many of which share sound-
shape properties. It is generally assumed that in auditory word recognition, the auditory-
phonetic-phonological input activates not only the target word but also a set of words that
share sound-shape properties with it (Gaskell & Marslen-Wilson, 1997; Marslen-Wilson,
1987). The relationship between the sound structure of a word and the sound structure of
other words in the lexicon affects lexical access, presumably because both processing and
neural resources are influenced by the extent to which a target word has to be selected
from a set of competing words that share this phonological structure.

Indeed, it has been shown that the extent to which a word shares phonological properties
with other words in the lexicon affects the ease with which that word is accessed. In par­
ticular, it is possible to count the number of words that share all but one phoneme with
other words in the lexicon and quantify the density of the neighborhood in which a partic­
ular word resides. A word that has many words sharing phonological properties with it is said to come from a high-density neighborhood, and a word with few such neighbors is said to come from a low-density neighborhood (Luce &
Pisoni, 1998).
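The neighborhood-density computation described above can be made concrete: count the lexicon entries reachable from a target word by substituting, adding, or deleting a single phoneme (the one-phoneme rule of Luce & Pisoni, 1998). In the sketch below, letters stand in for phonemes and the toy lexicon is invented for illustration; a real computation would operate over phonemic transcriptions of a full lexicon.

```python
def one_phoneme_apart(w1, w2):
    """True if w2 differs from w1 by exactly one phoneme substitution,
    addition, or deletion (the Luce & Pisoni, 1998, neighbor rule)."""
    if w1 == w2:
        return False
    n1, n2 = len(w1), len(w2)
    if abs(n1 - n2) > 1:
        return False
    if n1 == n2:  # same length: exactly one substitution
        return sum(a != b for a, b in zip(w1, w2)) == 1
    short, long_ = (w1, w2) if n1 < n2 else (w2, w1)
    # length differs by one: deleting some single phoneme from the
    # longer form must yield the shorter form
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))

def neighborhood_density(word, lexicon):
    """Number of lexicon entries that are phonological neighbors of word."""
    return sum(one_phoneme_apart(word, w) for w in lexicon)

# Toy lexicon with words as phoneme strings (letters stand in for phonemes)
lexicon = ["kat", "bat", "hat", "kab", "kats", "at", "dog"]
print(neighborhood_density("kat", lexicon))  # neighbors: bat, hat, kab, kats, at -> 5
```

By this measure, "kat" comes from a denser neighborhood than "dog" would in the same toy lexicon, so it should incur more competition during access.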

Behavioral research has shown that reaction-time latencies are slower in a lexical deci­
sion task for words from high-density neighborhoods compared with low-density neigh­
borhoods (Luce & Pisoni, 1998). These findings are consistent with the view that it takes
greater computational resources to select a word when there are many potential competi­
tors compared with when there are a few potential competitors. FMRI results show that
there is greater activation in the posterior STG and the SMG in a lexical decision task for
words from high-density compared with low-density neighborhoods (Okada & Hickok, 2006; Prabhakaran, Blumstein, Myers, Hutchison, & Britton, 2006). These findings suggest that these areas are recruited in accessing the lexical representations of words and
that there are greater processing demands for accessing words as a function of the set of
competing word targets. They are also consistent with studies indicating that posterior
STG and SMG are recruited in phonological and lexical processing (Binder & Price, 2001;
Indefrey & Levelt, 2004; Paulesu, Frith, & Frackowiak, 1993).

Of importance, a series of fMRI studies has shown that information flow under conditions of lexical competition cascades throughout the lexical processing stream, activating the posterior STG, SMG, AG, and IFG. Using the visual world paradigm coupled with
fMRI, Righi et al. (2010) showed increased activation in a temporal-parietal-frontal net­
work for words that shared onsets, that is, initial sound segments or initial syllables. In
these studies, subjects were required to look at the picture of an auditorily presented
word from an array of four pictures including the named picture (hammock), a picture of
a word that shared the onset of the target word (hammer), and two other pictures that
had neither a phonological nor semantic relationship with the target word (monkey,
chocolate). Participants’ eye movements were tracked during fMRI scanning. Behavioral
results of participants in the scanner replicated earlier findings showing more looks to
the onset competitor than to the unrelated stimuli (Allopenna, Magnuson, & Tanenhaus,
1998). These findings show that the presence of phonological competition not only modu­
lates activation in the posterior STG and the SMG, areas implicated in phonological pro­
cessing and lexical access, but also has a modulatory effect on activation in frontal areas
and in particular the IFG, an area implicated in selecting among competing semantic al­
ternatives (Thompson-Schill, D’Esposito, Aguirre, & Farah 1997; Thompson-Schill,
D’Esposito, & Kan, 1999).

That the IFG is also activated in selecting among competing phonological alternatives raises two possibilities, as yet unresolved in the literature, about the functional role of the IFG in resolving (p. 517) competition. One possibility is that selection processes
are domain general and cut across different levels of the grammar (cf. Duncan, 2001;
Duncan & Owen, 2000; Miller & Cohen, 2001; Smith & Jonides, 1999). Another possibility
is that there is a functional subdivision of the IFG depending on the source of the compe­
tition. In particular, BA 44 is recruited in selection based on phonological properties
(Buckner, Raichle, & Petersen, 1995; Burton, Small, & Blumstein, 2000; Fiez, 1997; Pol­
drack et al., 1999), and BA 45 is recruited in selection based on semantic properties (Sny­
der, Feignson, & Thompson-Schill, 2007; Thompson-Schill et al., 1997, 1999). Whatever
the ultimate conclusion, the critical issue is that phonological properties of the lexicon
have a modulatory effect along the speech-lexical processing stream consistent with the
view that information flow at one level of processing (phonological lexical access) cas­
cades and influences other stages of processing downstream from it (lexical selection).

Converging evidence from lesion-based studies supports these general conclusions. In particular, lexical processing deficits emerge in patients with IFG lesions and those with
temporo-parietal lesions involving the posterior portions of the STG, the SMG, and the
AG. A series of studies using both lexical decision and eye-tracking paradigms in auditory

Page 15 of 30
Neural Systems Underlying Speech Perception

word recognition has shown deficits in aphasic patients in accessing words in the lexicon,
especially under conditions of phonological and lexical competition (cf. Blumstein, 2009,
for review). Of importance, different patterns of deficits emerge for patients with frontal
lesions and for those with temporo-parietal lesions, suggesting that these areas play dif­
ferent functional roles in lexical access and lexical selection processes (Blumstein & Mil­
berg, 2000; Utman, Blumstein, & Sullivan, 2001; Yee, Blumstein, & Sedivy, 2008).

Nature of Information Flow

To this point, it appears as though information flow in speech processing and in accessing
the sound properties of words is unidirectional, going from lower level processes to in­
creasingly more abstract representations, and recruiting a processing stream from tem­
poral to parietal to frontal areas. However, we also know that listeners have a lexical bias
when processing the phonetic categories of speech, suggesting that higher level, lexical
information may influence lower level, speech processing. In a seminal paper, Ganong
(1980) showed that when presented with two continua varying along the same acoustic-
phonetic attribute (e.g., VOT of the initial stop consonant), listeners show a lexical bias.
Thus, they perceive more [b]’s in a beef–peef continuum and more [p]’s in a beace–peace
continuum. Of importance, the same stimulus at or near the phonetic boundary is per­
ceived differently as a function of the lexical status of the stimulus.
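The Ganong lexical bias can be sketched as a shift in the boundary of a logistic identification function: the same ambiguous VOT value yields different category labels depending on which endpoint of the continuum is a word. The boundary location, shift size, and slope below are hypothetical values chosen for illustration, not estimates from Ganong (1980).

```python
import math

def p_voiceless(vot_ms, boundary_ms, slope=0.3):
    """Probability of a /k/ ("voiceless") response for a given VOT,
    modeled as a logistic function of distance from the boundary."""
    return 1.0 / (1.0 + math.exp(-slope * (vot_ms - boundary_ms)))

NEUTRAL_BOUNDARY = 30.0  # hypothetical /g/-/k/ boundary (ms of VOT)
LEXICAL_SHIFT = 5.0      # hypothetical shift induced by lexical status

# In "gift-kift" only the /g/ endpoint is a word, so the boundary moves up
# (more /g/ responses); in "giss-kiss" only /k/ is a word, so it moves down.
boundary = {"gift-kift": NEUTRAL_BOUNDARY + LEXICAL_SHIFT,
            "giss-kiss": NEUTRAL_BOUNDARY - LEXICAL_SHIFT}

vot = 30.0  # an ambiguous token sitting at the neutral boundary
for continuum, b in boundary.items():
    print(continuum, round(p_voiceless(vot, b), 2))
```

With these illustrative values, the same 30-ms token is labeled /k/ only about 18% of the time in the gift-kift continuum but about 82% of the time in the giss-kiss continuum, reproducing the direction of the Ganong shift.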

These findings raise the question of whether information flow in mapping sounds to
words is solely bottom-up or whether top-down information affects lower level perceptual
processes. There has been much debate about this question in the behavioral literature (cf. Burton et al., 1989; Connine & Clifton, 1987; McQueen, 1991; Pitt & Samuel, 1993), with some claiming that these results demonstrate that top-down lexical information shapes perceptual processes (Pitt & Samuel, 1993) and others claiming that they reflect postperceptual decision-related processes (Fox, 1984), and more recently in the fMRI literature (Davis et al., 2011; Guediche et al., 2013). Evidence from functional neuroimaging provides a potential resolution to this debate. In particular, findings using fMRI (Myers &
provides a potential resolution to this debate. In particular, findings using fMRI (Myers &
Blumstein, 2008) and MEG coupled with EEG (Gow, Segawa, Ahlfors, & Lin, 2008) are
consistent with a functional architecture in which information flow is not just bottom-up
but is also top-down. In both studies, participants listened to stimuli taken from continua
ranging from a word to a nonword (e.g., gift to kift) or from a nonword to a word (e.g.,
giss to kiss). Participants were asked to categorize the initial phoneme of each stimulus,
and in both studies there was evidence of a shift in the perception of tokens from the con­
tinuum such that more categorizations were made that were consistent with the word
endpoint (e.g., more “g” responses in the “gift–kift” continuum). Activation patterns in the
STG reflected this shift in perception, showing modulation as a function of the lexically bi­
ased shift in the locus of the phonetic boundary. Given that modulation of activation was
seen early in the neural processing stream, these data were taken as evidence that lexi­
cally biased shifts in perception affect early processing of speech stimuli in the STG. In­
deed, the MEG/EEG source estimates suggest that the information flow from the SMG, an area implicated in lexical processing, influenced activation in the left posterior STG (Gow
et al., 2008).

Taken together, these results are consistent with a functional architecture of language in
which information flow is bidirectional. Of interest, the STG appears to integrate phonetic
information with multiple sources of linguistic information, including (p. 518) lexical and
sentence level (meaning) information (Chandrasekaran, Chan, & Wong, 2011; Obleser,
Wise, Dresner, & Scott, 2007).

Neural Plasticity: Phonetic Category Learning in Adults

The neural architecture involved in processing mature, native-language phonetic category distinctions has been well studied. What has been less well addressed is how this structure arises. Second language learning offers a window into the processes that give rise to
language organization in the brain. In particular, one can explore the neural systems that
support the learning of a new phonetic contrast and examine whether the neural struc­
tures that support speech perception are plastic into adulthood. Although typically-devel­
oping children learn to perceive the sounds of their native languages, and indeed can
gain native-like proficiency in learning a second language, adult second-language learn­
ers meet with much more variable success in learning new speech sound contrasts (Brad­
low, Pisoni, Akahane-Yamada, & Tohkura, 1997).

Explanations for the locus of difficulties in the perception of second-language contrasts differ, yet it is clear that native-like discrimination of many non-native speech contrasts is
unobtainable for some listeners, even with considerable experience or training (Pallier,
Bosch, & Sebastian-Galles, 1997). Taken at face value, this finding argues against signifi­
cant neural plasticity of the structures involved in phonetic learning in adulthood. Howev­
er, using an individual differences approach, a number of investigations have looked for
brain areas that correlate with better success in learning non-native contrasts.

Evidence from fMRI and event-related potential (ERP) studies of non-native sound pro­
cessing suggests that better proficiency in perceiving non-native speech contrasts is ac­
companied by neural activation patterns that increasingly resemble the patterns shown
for native speech sounds. Specifically, increased leftward lateralization is found in the
temporal lobe for those listeners who show better perceptual performance for trained
non-native speech sounds (Naatanen et al., 1997; Zhang et al., 2009; cf. Naatanen, Paavi­
lainen, Rinne, & Alho, 2007, for review). Moreover, increased sensitivity to non-native
contrasts also correlates with less activation in the left IFG as measured by fMRI
(Golestani & Zatorre, 2004; Myers & Blumstein, 2011). At the same time, increased profi­
ciency in perceiving non-native sounds is accompanied by an increase in the size of the
mismatch negativity, or MMN (Ylinen et al., 2010; Zhang et al., 2009), which is thought to

Page 17 of 30
Neural Systems Underlying Speech Perception

index preattentive sensitivity to acoustic contrasts and to originate from sources in the bi­
lateral temporal lobes and inferior frontal gyri.

These findings are not necessarily incompatible with one another. The neural sensitivity
indexed by the MMN response may reflect more accurate or more efficient encoding of
learned phonetic contrasts, particularly in the temporal lobes. In turn, decreases in acti­
vation in the left IFG for processing non-native speech sounds may be related to greater
efficiency and hence fewer neural resources needed to access and make decisions about
category representations when speech sound perception is well encoded in the temporal
lobes.

In fact, there is some evidence suggesting that frontal areas play a more active role in de­
termining the degree of success that learners attain on non-native contrasts. In an ERP
study of bilingualism by Diaz et al. (2008), early Spanish-Catalan bilinguals were grouped
by their mastery of a vowel contrast in their second language, Catalan. MMN responses
to a non-native and a native vowel contrast as well as to nonspeech stimuli were assessed
in a group of “good perceivers” and a group of “poor perceivers.” Although good and poor
perceivers did not differ in their MMN response to either the native and non-native
speech contrasts or the nonspeech stimuli at electrodes over the temporal poles, signifi­
cant differences between groups emerged for both types of speech contrasts over frontal
electrode sites, with good perceivers showing a larger MMN to both non-native and na­
tive speech contrasts than poor perceivers.

Of interest, some recent research suggests that individual differences in activation for
non-native speech sounds may arise at least in part because of differences in brain mor­
phology. In Heschl’s gyrus, greater volume in the left but not right hemisphere correlates
with individual success in learning a non-native tone contrast (Wong et al., 2008) and a
non-native place of articulation contrast (Golestani, Molko, Dehaene, LeBihan, & Pallier,
2007). Given that Heschl’s gyrus is involved in auditory processing, these anatomical
asymmetries give rise to the hypothesis that better learning of non-native contrasts
comes about because some individuals are better able to perceive the fine-grained
acoustic details of speech.

The question remains, however, of whether individual differences in brain morphology are
the cause or consequence of differences in proficiency (p. 519) with non-native contrasts.
That is, do preexisting, potentially innate differences in brain morphology support better
learning, or do differences in cortical volume arise because of neural plasticity resulting
from experience with the sounds of speech in adulthood? This question was examined
in a recent study by Golestani and colleagues (2011) in which differences in brain mor­
phology were measured in a group of trained phoneticians and a group of untrained con­
trols. Greater white matter volume in Heschl’s gyrus was evident in the group of phoneti­
cians compared to the normal controls. However, differences in these early auditory pro­
cessing areas did not correlate with amount of phonetic training experience among the
phoneticians. In contrast, regions in a subportion of the left IFG, the left pars opercularis,
showed a significant correlation with years of phonetic training. Taken together, these
results suggest that differences in the size and morphology of Heschl’s gyrus may confer an
innate advantage in perceiving the fine-grained details of the speech stream. The in­
creased volume in frontal areas appears instead to reflect experience-dependent plastici­
ty. Thus, phoneticians may become phoneticians because they have a “natural” propensity
for perceiving speech. But their performance is enhanced by experience and the extent to
which they become “experts.”

Summary and Future Directions


In this chapter, we have reviewed the nature of speech perception processing and the
neural systems underlying such processing. As we have seen, the processing of the
sounds of language recruits a neural processing stream involving temporal, parietal, and
frontal structures. Together they support a functional architecture in which information
flow is progressively transformed in stages from the auditory input to spectral-temporal
properties of speech, phonetic category representations, and ultimately lexical represen­
tations. Of importance, patterns of neural activation are modulated throughout the pro­
cessing stream, suggesting that the system is highly interactive.

Although there has been much progress, many questions remain unanswered. In particu­
lar, the direction of information flow has largely been inferred from our knowledge of the
effects of lesions on speech perception processes. Studies that integrate imaging meth­
ods with fine-grained temporal resolution (ERP, MEG) with the spatial resolution afforded
by fMRI will be critical for our understanding of the extent to which feedforward or feed­
back mechanisms underlie the modulation of activation found in, for example, the influence of lexical information on the perception of the acoustic-phonetic properties of
speech, the influence of motor and articulatory processes on speech perception, and the
influence of sound properties of speech on lexical access. Of particular interest is
whether the IFG is involved solely in decision-related executive processes or whether it,
in turn, modulates activation of neural areas downstream from it. That is, does IFG acti­
vation influence the activation of temporal lobe areas in the processes of phonetic catego­
rization, and does it influence the activation of parietal lobe areas in the processes of lexi­
cal access?

At each level of processing, questions remain. Evidence we have reviewed suggests that
the acoustic-phonetic properties of speech are extracted in different areas within the tem­
poral lobe, and at least at early stages of processing, they differentially recruit the right
hemisphere. However, listeners perceive a unitary percept of individual sound segments
and syllables. How does the neural system integrate or “bind” this information? As listen­
ers we are attuned to fine acoustic differences, whether it is to within-category variation
or speaker variation, and yet we ignore these differences as we perceive a stable phonet­
ic category or lexical representation. How does our neural system solve this invariance
problem across the different sources of variability encountered by the listener? Results to
date suggest that the resolution of this variability may recruit different neural areas, de­
pending on the type of variability.

And finally, we know that adult listeners show some degree of plasticity in processing lan­
guage. They can learn new languages and their attendant phonetic and phonological
structures, and they show the ability to dynamically adapt to variability in the speech and
language input when accessing the sound structure and lexical representations of their
native language (see Kraljic & Samuel, 2007, for a review). What are the neural systems
underlying this plasticity? Is the same neural system recruited that underlies adult pro­
cessing of the sound structure of language, or are other areas recruited in support of
such learning?

These are only a few of the questions remaining to be answered, but together they set an
agenda for future research on the neural systems underlying speech perception process­
es.

Author Note
This research was supported in part by NIH NIDCD Grants R01 DC006220, R01 DC00314, and R03 DC009495 to Brown University, and NIH NIDCD Grant P30 DC010751
to the (p. 520) University of Connecticut. The content is solely the responsibility of the au­
thors and does not necessarily represent the official views of the National Institute on
Deafness and Other Communication Disorders or the National Institutes of Health.

References
Allopenna, P. D., Magnuson, J. S., & Tanenhaus, M. K. (1998). Tracking the time course of
spoken word recognition using eye movements: Evidence for continuous mapping models.
Journal of Memory and Language, 38 (4), 419–439.

Auerbach, S. H., Allard, T., Naeser, M., Alexander, M. P., & Albert, M. L. (1982). Pure word
deafness: Analysis of a case with bilateral lesions and a defect at the prephonemic level.
Brain: A Journal of Neurology, 105 (Pt 2), 271–300.

Baese-Berk, M., & Goldrick, M. (2009). Mechanisms of interaction in speech production. Language and Cognitive Processes, 24 (4), 527–554.

Belin, P., & Zatorre, R. J. (2003). Adaptation to speaker’s voice in right anterior temporal lobe. NeuroReport, 14 (16), 2105–2109.

Binder, J. R., & Price, C. (2001). Functional neuroimaging of language. In R. Cabeza & A.
Kingstone (Eds.), Handbook of functional neuroimaging of cognition (pp. 187–251). Cam­
bridge, MA: MIT Press.

Blumstein, S. E. (2009). Auditory word recognition: Evidence from aphasia and functional
neuroimaging. Language and Linguistics Compass, 3, 824–838.

Blumstein, S. E., Cooper, W. E., Zurif, E. B., & Caramazza, A. (1977). The perception and
production of voice-onset time in aphasia. Neuropsychologia, 15, 371–383.

Blumstein, S. E., & Milberg, W. (2000). Language deficits in Broca’s and Wernicke’s aphasia: A singular impairment. In Y. Grodzinsky, L. Shapiro, & D. Swinney (Eds.), Language and the brain: Representation and processing (pp. 167–183). New York: Academic Press.

Blumstein, S. E., Myers, E. B., & Rissman, J. (2005). The perception of voice onset time:
An fMRI investigation of phonetic category structure. Journal of Cognitive Neuroscience,
17 (9), 1353–1366.

Blumstein, S. E., & Stevens, K. N. (1980). Perceptual invariance and onset spectra for
stop consonants in different vowel environments. Journal of the Acoustical Society of
America, 67 (2), 648–662.

Boatman, D. F., & Miglioretti, D. L. (2005). Cortical sites critical for speech discrimination
in normal and impaired listeners. Journal of Neuroscience, 25 (23), 5475–5480.

Bonte, M., Valente, G., & Formisano, E. (2009). Dynamic and task-dependent encoding of
speech and voice by phase reorganization of cortical oscillations. Journal of Neuroscience,
29 (6), 1699.

Bradlow, A. R., Pisoni, D. B., Akahane-Yamada, R., & Tohkura, Y. (1997). Training Japan­
ese listeners to identify English /r/ and /l/: IV. Some effects of perceptual learning on
speech production. Journal of the Acoustical Society of America, 101 (4), 2299–2310.

Britton, B., Blumstein, S. E., Myers, E. B., & Grindrod, C. (2009). The role of spectral and
durational properties on hemispheric asymmetries in vowel perception. Neuropsychologia,
47 (4), 1096–1106.

Buckner, R. L., Raichle, M. E., & Petersen, S. E. (1995). Dissociation of human prefrontal
cortical areas across different speech production tasks and gender groups. Journal of
Neurophysiology, 74 (5), 2163–2173.

Burton, M. W. (2001). The role of inferior frontal cortex in phonological processing. Cogni­
tive Science, 25, 695–709.

Burton, M. W. (2009). Understanding the role of the prefrontal cortex in phonological pro­
cessing. Clinical Linguistics and Phonetics, 23 (3), 180–195.

Burton, M. W., Baum, S. R., & Blumstein, S. E. (1989). Lexical effects on the phonetic cat­
egorization of speech: The role of acoustic structure. Journal of Experimental Psychology:
Human Perception and Performance, 15, 567–575.

Burton, M. W., Small, S. L., & Blumstein, S. E. (2000). The role of segmentation in phono­
logical processing: An fMRI investigation. Journal of Cognitive Neuroscience, 12 (4), 679–
690.

Caplan, D., Gow, D., & Makris, N. (1995). Analysis of lesions by MRI in stroke patients
with acoustic-phonetic processing deficits. Neurology, 45, 293–298.

Chandrasekaran, B., Chan, A. H. D., & Wong, P. C. M. (2011). Neural processing of what and who information in speech. Journal of Cognitive Neuroscience, 23 (10), 2690–2700.

Chang, E. F., Rieger, J. W., Johnson, K., Berger, M. S., Barbaro, N. M., & Knight, R. T.
(2010). Categorical speech representation in human superior temporal gyrus. Nature
Neuroscience, 13 (11), 1428–1432.

Connine, C. M., & Clifton, C. (1987). Interactive use of lexical information in speech per­
ception. Journal of Experimental Psychology: Human Perception and Performance, 13,
291–299.

Coslett, H. B., Brashear, H. R., & Heilman, K. M. (1984). Pure word deafness after bilater­
al primary auditory cortex infarcts. Neurology, 34 (3), 347–352.

Davis, M. H., Ford, M. A., Kherif, F., & Johnsrude, I. S. (2011). Does semantic context ben­
efit speech understanding through “top–down” processes? Evidence from time-resolved
sparse fMRI. Journal of Cognitive Neuroscience, 23 (12), 3914–3932.

Dell, G. S. (1986). A spreading activation theory of retrieval in sentence production. Psychological Review, 93, 283–321.

Desai, R., Liebenthal, E., Waldron, E., & Binder, J. R. (2008). Left posterior temporal re­
gions are sensitive to auditory categorization. Journal of Cognitive Neuroscience, 20 (7),
1174–1188.

Diaz, B., Baus, C., Escera, C., Costa, A., & Sebastian-Galles, N. (2008). Brain potentials to
native phoneme discrimination reveal the origin of individual differences in learning the
sounds of a second language. Proceedings of the National Academy of Sciences U S A, 105
(42), 16083–16088.

Duncan, J. (2001). An adaptive model of neural function in prefrontal cortex. Nature Re­
views Neuroscience, 2, 820–829.

Duncan, J., & Owen, A. M. (2000). Common regions of the human frontal lobe recruited
by diverse cognitive demands. Trends in Neurosciences, 23, 475–483.

Fiez, J. A. (1997). Phonology, semantics, and the role of the left inferior prefrontal cortex.
Human Brain Mapping, 5 (2), 79–83.

Fodor, J. A. (1983). The modularity of mind. Cambridge, MA: MIT Press.

Formisano, E., De Martino, F., Bonte, M., & Goebel, R. (2008). “Who” is saying “what”?
Brain-based decoding of human voice and speech. Science, 322, 970–973.

Fox, R. A. (1984). Effect of lexical status on phonetic categorization. Journal of Experimental Psychology: Human Perception and Performance, 10, 526–540.

(p. 521) Frye, R. E., Fisher, J. M. G., Witzel, T., Ahlfors, S. P., Swank, P., Liederman, J., & Halgren, E. (2008). Objective phonological and subjective perceptual characteristics of syllables modulate spatiotemporal patterns of superior temporal gyrus activity. NeuroImage, 40 (4), 1888–1901.

Frye, R. E., Fisher, J. M., Coty, A., Zarella, M., Liederman, J., & Halgren, E. (2007). Linear
coding of voice onset time. Journal of Cognitive Neuroscience, 19 (9), 1476–1487.

Galantucci, B., Fowler, C. A., & Turvey, M. T. (2006). The motor theory of speech percep­
tion reviewed. Psychonomic Bulletin and Review, 13 (3), 361–377.

Gandour, J., Wong, D., Hsieh, L., Weinzapfel, B., Van Lancker, D., & Hutchins, G. D. (2000).
A crosslinguistic PET study of tone perception. Journal of Cognitive Neuroscience, 12 (1),
207–222.

Ganong, W. F. (1980). Phonetic categorization in auditory word perception. Journal of Experimental Psychology: Human Perception and Performance, 6 (1), 110–125.

Gaskell, M. G., & Marslen-Wilson, W. D. (1997). Integrating form and meaning: A distrib­
uted model of speech perception. Language and Cognitive Processes, 12, 613–656.

Gaskell, M. G., & Marslen-Wilson, W. D. (1999). Ambiguity, competition, and blending in spoken word recognition. Cognitive Science, 23 (4), 439–462.

Goldinger, S. D. (1998). Echoes of echoes? An episodic theory of lexical access. Psychological Review, 105 (2), 251–278.

Goldrick, M., & Blumstein, S. E. (2006). Cascading activation from phonological planning
to articulatory processes: Evidence from tongue twisters. Language and Cognitive
Processes, 21, 649–683.

Golestani, N., Molko, N., Dehaene, S., LeBihan, D., & Pallier, C. (2007). Brain structure
predicts the learning of foreign speech sounds. Cerebral Cortex, 17 (3), 575.

Golestani, N., Price, C. J., & Scott, S. K. (2011). Born with an ear for dialects? Structural
plasticity in the expert phonetician brain. Journal of Neuroscience, 31 (11), 4213–4220.

Golestani, N., & Zatorre, R. J. (2004). Learning new sounds of speech: Reallocation of neural substrates. NeuroImage, 21 (2), 494–506.

Gow, D. W., Segawa, J. A., Ahlfors, S. P., & Lin, F.-H. (2008). Lexical influences on speech
perception: A Granger causality analysis of MEG and EEG source estimates. NeuroImage,
43 (3), 614–623.

Grill-Spector, K., Henson, R., & Martin, A. (2006). Repetition and the brain: Neural mod­
els of stimulus-specific effects. Trends in Cognitive Sciences, 10 (1), 14–23.

Grill-Spector, K., & Malach, R. (2001). fMR-adaptation: A tool for studying the functional
properties of human cortical neurons. Acta Psychologica (Amsterdam), 107 (1-3), 293–
321.

Guediche, S., Salvata, C., & Blumstein S. E. (2013). Temporal cortex reflects effects of
sentence context on phonetic processing. Journal of Cognitive Neuroscience, 25 (5), 706–
718.

Guenther, F. H., Nieto-Castanon, A., Ghosh, S. S., & Tourville, J. A. (2004). Representation
of sound categories in auditory cortical maps. Journal of Speech, Language, and Hearing
Research, 47 (1), 46–57.

Hickok, G. (2009). The functional neuroanatomy of language. Physics of Life Reviews, 6, 121–143.

Hickok, G., & Poeppel, D. (2004). Dorsal and ventral streams: A framework for under­
standing aspects of the functional anatomy of language. Cognition, 92 (1-2), 67–99.

Hutchison, E. R., Blumstein, S. E., & Myers, E. B. (2008). An event-related fMRI investiga­
tion of voice-onset time discrimination. NeuroImage, 40 (1), 342–352.

Indefrey, P., & Levelt, W. J. M. (2004). The spatial and temporal signatures of word pro­
duction components, Cognition, 92 (1–2), 101–144.

Joanisse, M. F., Zevin, J. D., & McCandliss, B. D. (2007). Brain mechanisms implicated in
the preattentive categorization of speech sounds revealed using fMRI and a short-interval
habituation trial paradigm. Cerebral Cortex, 17 (9), 2084–2093.

Kasai, K., Yamada, H., Kamio, S., Nakagome, K., Iwanami, A., Fukuda, M., Itoh, K., Koshi­
da, I., Yumoto, M., Iramina, K., Kato, N., & Ueno, S. (2001). Brain lateralization for mis­
match response to across- and within-category change of vowels. NeuroReport, 12 (11),
2467–2471.

Kraljic, T., & Samuel, A. G. (2005). Perceptual learning for speech: Is there a return to
normal? Cognitive Psychology, 51 (2), 141–178.

Kraljic, T., & Samuel, A. G. (2007). Perceptual adjustments to multiple speakers. Journal
of Memory and Language, 56, 1–15.

Krishnan, A., Swaminathan, J., & Gandour, J. T. (2009). Experience-dependent enhancement of linguistic pitch representation in the brainstem is not specific to a speech context. Journal of Cognitive Neuroscience, 21 (6), 1092–1105.

Kuhl, P. K. (1991). Human adults and human infants show a “perceptual magnet effect”
for the prototypes of speech categories, monkeys do not. Perception and Psychophysics,
50 (2), 93–107.

Leff, A. P., Iverson, P., Schofield, T. M., Kilner, J. M., Crinion, J. T., Friston, K. J., & Price, C.
J. (2009). Vowel-specific mismatch responses in the anterior superior temporal gyrus: An
fMRI study. Cortex, 45 (4), 517–526.

Levelt, W. J. M (1992). Accessing words in speech production: Stages, processes, and rep­
resentations. Cognition, 42, 1–22.

Liberman, A. M., Delattre, P. C., & Cooper, F. S. (1958). Some cues for the distinction be­
tween voiceless and voiced stops in initial position. Language and Speech, 1, 153–157.

Liberman, A. M., Harris, K. S., Hoffman, H. S., & Griffith, B. C. (1957). The discrimination
of speech sounds within and across phoneme boundaries. Journal of Experimental Psy­
chology, 54 (5), 358–368.

Liberman, A. M., & Mattingly, I. G. (1985). The motor theory of speech perception re­
vised. Cognition, 21 (1), 1–36.

Liebenthal, E., Desai, R., Ellingson, M. M., Ramachandran, B., Desai, A., & Binder, J. R.
(2010). Specialization along the left superior temporal sulcus for auditory categorization.
Cerebral Cortex, 20, 2958–2970.

Liégeois-Chauvel, C., de Graaf, J. B., Laguitton, V., & Chauvel, P. (1999). Specialization of
left auditory cortex for speech perception in man depends on temporal coding. Cerebral
Cortex, 9 (5), 484–496.

Lisker, L., & Abramson, A. S. (1964). A cross-language study of voicing in initial stops:
Acoustical measurements. Word, 20, 384–422.

Luce, P. A., & Pisoni, D. B. (1998). Recognizing spoken words: The neighborhood activa­
tion model. Ear and Hearing, 19 (1), 1–36.

Marslen-Wilson, W. D. (1987). Functional parallelism in spoken word-recognition. Cognition, 25 (1-2), 71–102.

Marslen-Wilson, W. D., & Welsh, A. (1978). Processing interactions and lexical access dur­
ing word recognition in continuous speech. Cognitive Psychology, 10 (1), 29–63.

McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech perception. Cogni­
tive Psychology, 18 (1), 1–86.

(p. 522) McGee, T., Kraus, N., King, C., Nicol, T., & Carrell, T. D. (1996). Acoustic elements of speechlike stimuli are reflected in surface recorded responses over the guinea pig temporal lobe. Journal of the Acoustical Society of America, 99 (6), 3606–3614.

McQueen, J. M. (1991). The influence of the lexicon on phonetic categorization: Stimulus quality in word-final ambiguity. Journal of Experimental Psychology: Human Perception and Performance, 17, 433–443.

Miller, E. K., & Cohen, J. D. (2001). An integrative theory of prefrontal cortex function. An­
nual Review of Neuroscience, 24, 167–202.

Miller, J. L., & Volaitis, L. E. (1989). Effect of speaking rate on the perceptual structure of
a phonetic category. Perception and Psychophysics, 46 (6), 505–512.

Mottonen, R., & Watkins, K. E. (2009). Motor representations of articulators contribute to categorical perception of speech sounds. Journal of Neuroscience, 29 (31), 9819.

Myers, E. B. (2007). Dissociable effects of phonetic competition and category typicality in a phonetic categorization task: An fMRI investigation. Neuropsychologia, 45 (7), 1463–1473.

Myers, E. B., & Blumstein, S. E. (2008). The neural bases of the lexical effect: An fMRI in­
vestigation. Cerebral Cortex, 18 (2), 278.

Myers, E. B., & Blumstein, S. E. (2011). Individual differences in neural sensitivity to a novel phonetic contrast. Unpublished manuscript.

Myers, E. B., Blumstein, S. E., Walsh, E., & Eliassen, J. (2009). Inferior frontal regions un­
derlie the perception of phonetic category invariance. Psychological Science, 20 (7), 895–
903.

Näätänen, R., Lehtokoski, A., Lennes, M., Cheour, M., Huotilainen, M., Iivonen, A., Vainio, M., et al. (1997). Language-specific phoneme representations revealed by electric and magnetic brain responses. Nature, 385 (6615), 432–434.

Näätänen, R., Paavilainen, P., Rinne, T., & Alho, K. (2007). The mismatch negativity (MMN) in basic research of central auditory processing: A review. Clinical Neurophysiology, 118 (12), 2544–2590.

Obleser, J., Boecker, H., Drzezga, A., Haslinger, B., Hennenlotter, A., Roettinger, M., Eu­
litz, C., et al. (2006). Vowel sound extraction in anterior superior temporal cortex. Human
Brain Mapping, 27 (7), 562–571.

Obleser, J., Elbert, T., Lahiri, A., & Eulitz, C. (2003). Cortical representation of vowels re­
flects acoustic dissimilarity determined by formant frequencies. Cognitive Brain
Research, 15 (3), 207–213.

Obleser, J., Wise, R. J. S., Dresner, M. A., & Scott, S. K. (2007). Functional integration
across brain regions improves speech perception under adverse listening conditions. Jour­
nal of Neuroscience, 27 (9), 2283–2289.

Okada, K., & Hickok, G. (2006). Identification of lexical-phonological networks in the su­
perior temporal sulcus using functional magnetic resonance imaging. NeuroReport, 17
(12), 1293–1296.

Pallier, C., Bosch, L., & Sebastián-Gallés, N. (1997). A limit on behavioral plasticity in
speech perception. Cognition, 64 (3), B9–B17.

Palmeri, T. J., Goldinger, S. D., & Pisoni, D. B. (1993). Episodic encoding of voice attributes and recognition memory for spoken words. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19 (2), 309–328.

Papanicolaou, A. C., Castillo, E., Breier, J. I., Davis, R. N., Simos, P. G., & Diehl, R. L.
(2003). Differential brain activation patterns during perception of voice and tone onset
time series: a MEG study. NeuroImage, 18 (2), 448–459.

Paulesu, E., Frith, C. D., & Frackowiak, R. S. J. (1993). The neural correlates of the verbal
component of working memory. Nature, 362 (6418), 342–345.

Peramunage, D., Blumstein, S. E., Myers, E. B., Goldrick, M., & Baese-Berk, M. (2011).
Phonological neighborhood effects in spoken word production: An fMRI study. Journal of
Cognitive Neuroscience, 23 (3), 593–603.

Peterson, G. E., & Barney, H. L. (1952). Control methods used in a study of vowels. Jour­
nal of the Acoustical Society of America, 24, 175–184.

Pisoni, D. B., & Tash, J. (1974). Reaction times to comparisons within and across phonetic
categories. Perception and Psychophysics, 15, 289–290.

Pitt, M. A., & Samuel, A. G. (1993). An empirical and meta-analytic evaluation of the
phoneme identification task. Journal of Experimental Psychology: Human Perception and
Performance, 19, 699–725.

Poeppel, D. (2001). Pure word deafness and the bilateral processing of the speech code.
Cognitive Science, 25 (5), 679–693.

Poldrack, R. A. (2006). Can cognitive processes be inferred from neuroimaging data? Trends in Cognitive Sciences, 10, 59–63.

Poldrack, R. A., Wagner, A. D., Prull, M. W., Desmond, J. E., Glover, G. H., & Gabrieli, J. D.
(1999). Functional specialization for semantic and phonological processing in the left in­
ferior prefrontal cortex. NeuroImage, 10 (1), 15–35.

Prabhakaran, R., Blumstein, S. E., Myers, E. B., Hutchison, E., & Britton, B. (2006). An
event-related fMRI investigation of phonological-lexical competition. Neuropsychologia,
44 (12), 2209–2221.

Remez, R. E., Rubin, P. E., Pisoni, D. B., & Carrell, T. D. (1981). Speech perception with­
out traditional speech cues. Science, 212 (4497), 947–949.

Righi, G., Blumstein, S. E., Mertus, J., & Worden, M. S. (2010). Neural systems underlying
lexical competition: An eye tracking and fMRI study. Journal of Cognitive Neuroscience,
22 (2), 213–224.

Salvata, C., Blumstein, S. E., & Myers, E. B. (2012). Speaker invariance for phonetic infor­
mation: An fMRI investigation, Language and Cognitive Processes, 27 (2), 210–230.

Shankweiler, D., & Studdert-Kennedy, M. (1967). Identification of consonants and vowels presented to left and right ears. Quarterly Journal of Experimental Psychology, 19, 59–63.

Sharma, A., & Dorman, M. F. (1999). Cortical auditory evoked potential correlates of cate­
gorical perception of voice-onset time. Journal of the Acoustical Society of America, 106
(2), 1078–1083.

Smith, E. E., & Jonides, J. (1999). Storage and executive processes in the frontal lobes.
Science, 283, 1657–1661.

Snyder, H. R., Feignson, K., & Thompson-Schill, S. L. (2007). Prefrontal cortical response
to conflict during semantic and phonological tasks. Journal of Cognitive Neuroscience, 19,
761–775.

Steinschneider, M., Schroeder, C. E., Arezzo, J. C., & Vaughan, H. G. (1995). Physiologic
correlates of the voice onset time boundary in primary auditory cortex (A1) of the awake
monkey: Temporal response patterns. Brain and Language, 48 (3), 326–340.

Steinschneider, M., Volkov, I. O., Noh, M. D., Garell, P. C., & Howard, M. A. (1999). Tem­
poral encoding of the voice onset time phonetic parameter by field potentials recorded di­
rectly from human auditory cortex. Journal of Neurophysiology, 82 (5), 2346–2357.

(p. 523) Stevens, K. N. (1960). Toward a model for speech recognition. Journal of the Acoustical Society of America, 32 (1), 47–55.

Stevens, K. N., & Blumstein, S. E. (1978). Invariant cues for place of articulation in stop
consonants. Journal of the Acoustical Society of America, 64, 1358–1368.

Strange, W. (1989). Evolving theories of vowel perception. Journal of the Acoustical Soci­
ety of America, 85 (5), 2081–2087.

Strange, W., Jenkins, J., & Johnson, T. L. (1983). Dynamic specification of coarticulated
vowels. Journal of the Acoustical Society of America, 74 (3), 697–705.

Studdert-Kennedy, M., & Shankweiler, D. (1970). Hemispheric specialization for speech perception. Journal of the Acoustical Society of America, 48, 579–594.

Tallal, P., & Newcombe, F. (1978). Impairment of auditory perception and language com­
prehension in aphasia. Brain and Language, 5, 13–24.

Thompson-Schill, S. L., D’Esposito, M., Aguirre, G. K., & Farah, M. J. (1997). Role of the
left inferior prefrontal cortex in retrieval of semantic knowledge: A reevaluation. Proceed­
ings of the National Academy of Sciences U S A, 94, 14792–14797.

Thompson-Schill, S. L., D’Esposito, M., & Kan, I. P. (1999). Effects of repetition and com­
petition on activation in left prefrontal cortex during word generation. Neuron, 23, 513–
522.

Utman, J. A., Blumstein, S. E., & Sullivan, K. (2001). Mapping from sound to meaning: Re­
duced lexical activation in Broca’s aphasics, Brain and Language, 79, 444–472.

von Kriegstein, K., Eger, E., Kleinschmidt, A., & Giraud, A. L. (2003). Modulation of neur­
al responses to speech by directing attention to voices or verbal content. Cognitive Brain
Research, 17 (1), 48–55.

Wildgruber, D., Pihan, H., Ackermann, H., Erb, M., & Grodd, W. (2002). Dynamic brain ac­
tivation during processing of emotional intonation: Influence of acoustic parameters,
emotional valence, and sex. NeuroImage, 15 (4), 856–869.

Wilson, S. M., & Iacoboni, M. (2006). Neural responses to non-native phonemes varying in
producibility: Evidence for the sensorimotor nature of speech perception. NeuroImage, 33
(1), 316–325.

Wilson, S. M., Saygin, A. P., Sereno, M. I., & Iacoboni, M. (2004). Listening to speech acti­
vates motor areas involved in speech production. Nature Neuroscience, 7 (7), 701–702.

Wong, P. C. M., Nusbaum, H. C., & Small, S. L. (2004). Neural bases of talker normaliza­
tion. Journal of Cognitive Neuroscience, 16 (7), 1173–1184.

Wong, P., Warrier, C. M., Penhune, V. B., Roy, A. K., Sadehh, A., Parrish, T. B., & Zatorre,
R. J. (2008). Volume of left Heschl’s gyrus and linguistic pitch learning. Cerebral Cortex,
18 (4), 828–836.

Xu, Y., Gandour, J., Talavage, T., Wong, D., Dzemidzic, M., Tong, Y., Li, X., et al. (2006). Ac­
tivation of the left planum temporale in pitch processing is shaped by language experi­
ence. Human Brain Mapping, 27 (2), 173–183.

Yee, E., Blumstein, S. E., & Sedivy, J. C. (2008). Lexical-semantic activation in Broca’s and
Wernicke’s aphasia: Evidence from eye movements. Journal of Cognitive Neuroscience,
20, 592–612.

Ylinen, S., Uther, M., Latvala, A., Vepsäläinen, S., Iverson, P., Akahane-Yamada, R., &
Näätänen, R. (2010). Training the brain to weight speech cues differently: A study of
Finnish second-language users of English. Journal of Cognitive Neuroscience, 22 (6),
1319–1332.

Zatorre, R. J., Belin, P., & Penhune, V. B. (2002). Structure and function of auditory cortex:
Music and speech. Trends in Cognitive Sciences, 6 (1), 37–46.

Zevin, J. D., Yang, J., Skipper, J. I., & McCandliss, B. D. (2010). Domain general change de­
tection accounts for “dishabituation” effects in temporal-parietal regions in functional

Page 29 of 30
Neural Systems Underlying Speech Perception

magnetic resonance imaging studies of speech perception. Journal of Neuroscience, 30


(3), 1110.

Zhang, Y., Kuhl, P. K., Imada, T., Iverson, P., Pruitt, J., Stevens, E. B., Kawakatsu, M., et al.
(2009). Neural signatures of phonetic learning in adulthood: A magnetoencephalography
study. NeuroImage, 46 (1), 226–240.

Sheila Blumstein

Sheila Blumstein is the Albert D. Mead Professor of Cognitive, Linguistic and Psychological Sciences at Brown University.

Emily B. Myers

Emily B. Myers, Department of Psychology, University of Connecticut, Storrs, CT

Multimodal Speech Perception  


Agnès Alsius, Ewen MacDonald, and Kevin Munhall
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn

Print Publication Date: Dec 2013 Subject: Psychology, Cognitive Neuroscience


Online Publication Date: Dec 2013 DOI: 10.1093/oxfordhb/9780199988693.013.0026

Abstract and Keywords

Spoken language can be understood through different sensory modalities. Audition, vi­
sion, and haptic perception each can transduce speech information from a talker as a sin­
gle channel of information. The more natural context for communication is for language
to be perceived through multiple modalities and for multimodal integration to occur. This
chapter reviews the sensory information provided by talkers and the constraints on multi­
modal information processing. The information generated during speech comes from a
common source, the moving vocal tract, and thus shows significant correlations across
modalities. In addition, the modalities provide complementary information for the per­
ceiver. For example, the place of articulation of speech sounds is conveyed more robustly
by vision. These factors explain the fact that multisensory speech perception is more ro­
bust and accurate than unisensory perception. The neural networks responsible for this
perceptual activity are diverse and still not well understood.

Keywords: sensory modalities, spoken language, multisensory speech perception

Multisensory Integration: Cross-Talk Between the Senses
Evolution has equipped organisms with multiple senses, each one conveying a unique and
particular aspect of the outside world. For example, when we eat we not only perceive
gustatory and olfactory information of the food (Small et al., 2004), but we also often ap­
preciate its visual appearance (imagine having to eat a green steak!), feel its texture in
our mouths (i.e., somatosensation), or even hear the sounds it produces when we chew it.
One important challenge facing researchers is to understand how the brain orchestrates
the processing of all this multisensory information, synthesizing the massive and constant
influx into coherent representations to produce a single perceptual reality.

Page 1 of 56
Multimodal Speech Perception

The brain’s ability to combine sensory information from multiple modalities into a single,
unified percept is a key feature of organisms’ successful interaction with the external
world. Psychophysical studies have demonstrated that multisensory integration results in
perceptual enhancement by reducing ambiguity and thereby improving our ability to react to external events (Ernst & Banks, 2002; Welch & Warren, 1986). Furthermore, given
that information specified in each sensory modality reflects different features of the same
stimulus, the integration of sensory inputs provides complementary information about the
environment, increasing the likelihood of its accurate identification (O’Hare, 1991). In
short, multisensory integration produces a much richer and a more diverse sensory expe­
rience than that offered by each sensory modality in isolation (Stein & Meredith, 1993).

Whereas multisensory redundancy is known to be highly advantageous, its benefits can only arise if the information carried along the different modalities is perceived as belonging to the same external (p. 525) object or event. How this process is accomplished, given
that information from the various sensory systems and transducers differs in resolution,
neural processing time, and type of physical energy, has been the focus of much behav­
ioral and neuroscientific research (see Calvert et al., 2004, for a review). Across all com­
binations of sensory modalities, both structural factors, such as temporal and spatial
proximity, and cognitive factors, such as semantic congruency (i.e., “appropriateness” of
intermodality match) have been defined as major determinants for co-registration. That
is, semantically congruent stimuli occurring at an approximately coincident time and
space are more likely to be considered as originating from a common external
source, and therefore integrated by the perceptual system (see King, 2005).

Interpersonal communication provides a rich perceptual environment, with information available across the different sensory modalities. Although the auditory signal by itself is often sufficient for accurate speech comprehension, in face-to-face conversations
the sight of the talker conveys an additional source of information that can be actively
used by the listener during speech perception. Speech perceivers are, in the words of
Carol Fowler (2004, p. 189), "informational omnivores": When we see someone speaking,
our processing is not limited to the linguistic analysis of the words, but also encompasses
the perception of visible speech movements and other nonverbal information such as fa­
cial expression, body movements, posture, gaze, manual gestures, and the tone and the
timing of the voice. This broad spectrum of audiovisual information is produced in paral­
lel by the talker and must be processed and integrated by the listener in order to grasp
the talker’s full intent. Thus human communication, in its most natural form (i.e., face to
face), is multisensorial and multidimensional (Figure 26.1).

Page 2 of 56
Multimodal Speech Perception

Figure 26.1 A portrait of Helen Keller (seated on the left), Alexander Graham Bell (seated on the right), and Anne Sullivan (standing). This remarkable photograph shows communication happening simultaneously in different modalities. Bell is listening to Sullivan while Keller performs tactile lip reading. Bell and Sullivan are also watching Keller, who is also using tactile finger spelling with Bell.

From Parks Canada; Cultural Heritage Image Gallery.

Given the multiple levels of audiovisual speech, a comprehensive explanation of the information available therefore requires a more specific statement of what
unit of communication is being examined. For example, if the level of communication is
the message and its meaning, then global features, such as posture and tone of speech,
may be important. At times, how you say something—not what you say—conveys the im­
portant information. At a more micro level, the unit might be the individual sound or
word, a dimension that, again, can be acquired acoustically (i.e., speech sounds) as well
as optically (i.e., articulatory movements). To a great extent, this more fine-grained analy­
sis of speech has been the major focus of research, and it will be the level at which we
consider the existing literature.

Speech Production: Inherent Link Between Articulatory Movements and Vocal Acoustics
The production of speech is a mixture of cognitive-linguistic planning and the biophysics
of sound production. Both levels leave their trace in the acoustics of speech. Since the
middle of the last century, research has depicted sound production as an interaction be­
tween the sources of sound and their filtering by the vocal tract resonances. For vowels
and some consonants, the primary source of sound is the vocal folds vibrating. As the

Page 3 of 56
Multimodal Speech Perception

sound propagates through the vocal tract, the spatial configuration of the articulator
structures (e.g., velum, palate, tongue, lips) creates resonances that give each sound its
final characteristic spectral shape. Thus, when talking, the articulators must move to
achieve the (p. 526) spatial-temporal configurations necessary for each speech sound.
Such time-varying configurations of the vocal tract not only deform the immediate re­
gions around the oral aperture but also encompass a much larger region of the face (Pre­
minger et al., 1998; Thomas & Jordan, 2004; Yehia et al., 1998). Because some of the ar­
ticulators of the oral aperture are visible along with their distributed correlates around
the face, there is an inherent relationship between the acoustic speech sounds and their
visible generators.

Properties and Perception of the Auditory Speech Signal
Spoken language comprehension is one of our most complex and striking cognitive abili­
ties: A noisy, incomplete, and usually ambiguous acoustic waveform must be parsed into linguistic representational units, which will ultimately allow extraction and decoding of the
meaning intended by the talker (Pisoni & Luce, 1987). A major challenge in speech per­
ception, therefore, has been to provide an account of how these mappings (i.e., from a
continuously varying speech waveform onto the discrete linguistic units such as
phonemes; and from discrete units to semantics) are accomplished. Following the devel­
opment of the sound spectrograph1 (Joos, 1948; Potter et al., 1947), much of the study of
speech perception focused on the acoustic invariants of the complex signal underlying the
perception of individual sound segments or phonemes (see Raphael, 2005, for a review of
this body of work). This research failed to find consistent correspondences between phys­
ical properties of the acoustic signal and the perceived phonemes (Miller & Eimas, 1995).
Rather, the speech signal is continuous and exhibits a considerable amount of variability
(i.e., the “lack of invariance”). One of the sources of this variability is coarticulation: In
natural, fluent speech, given the obvious need for rapid transitions from one articulatory
configuration to another, the acoustic signal at any particular time reflects not only the
current segment being produced but also previous and subsequent segments. For in­
stance, even though /di/ and /du/ audibly share the /d/ phoneme, the acoustical character­
istics (e.g., formant transitions) of /d/ vary considerably depending on the following vowel
(Liberman et al., 1954). Furthermore, in addition to coarticulation, differences between
talkers’ vocal tracts, speaking rate, and dialect, along with variation in social contexts
and environmental characteristics (e.g., reverberation, noise, transmission media), can
produce large changes in the acoustic properties of speech (Klatt, 1986). The fact that lis­
teners are able to identify phonemes categorically despite such acoustic/phonemic vari­
ability suggests that the acoustic signal is replete with numerous redundant features, which
vary with the context of the spoken utterance. Individually, none of these acoustic cues
would be necessary or sufficient to signal the phonemic identity (Liberman et al., 1967).
For instance, Lisker (1986) identified sixteen possible cues to the opposition of stop consonants (e.g., /b/ and /p/) in an intervocalic lexical context (e.g., rabid vs. rapid). In general, however, a few basic features can be indexed to signal place, manner, and voicing of consonants, or frontness versus backness and height of vowels.

The absence of reliable acoustic boundary markers in phoneme segments led a number of
researchers to abandon the notion that a phonemic level of representation is activated
during speech processing. Alternative accounts proposed, among others, syllables (see
Dupoux, 1993, for a review), context-sensitive allophones (Wickelgren, 1969), and articu­
latory gestures (Fowler, 1986; Liberman et al., 1967; Liberman & Mattingly, 1985)2 as the
minimal units that the listeners could use to parse the incoming input. Some alternative
models have even proposed that lexical representations are compared directly with a
transformed speech signal, with no intermediate stages of processing (Klatt, 1989).

According to the great majority of psycholinguistic models of word recognition, therefore, listeners would extract prelexical representational units (in whatever form) that would
then be matched to representations of words stored in long-term memory. These segmen­
tal cues would also convey information about a word’s functional, semantic, and syntactic
roles that would help the listener to parse and interpret the utterance. Prosodic informa­
tion (i.e., suprasegmental features such as rhythm, stress, and intonation of speech) is al­
so processed to define word, phrase, and sentence boundaries, stress patterns, and syn­
tactic structure (Cutler & Butterfield, 1992; Soto-Faraco et al., 2001). Besides the linguis­
tic structure of the message, suprasegmental features also provide information about cer­
tain pragmatic aspects of the conversational situation, such as the emotional state or
communicative intent.

Finally, when considering the mapping from prelexical segments to words stored in the
mental lexicon, some models have proposed that information only flows in one direction:
from sounds to words, without any backward influence (e.g., autonomous models such as
the merge model; (p. 527) Norris et al., 2000). According to other interactive models, how­
ever, the flow of information between prelexical and lexical stages of processing is bidi­
rectional, with top-down feedback from lexical forms to earlier acoustic and phonemic
processing contributing to word recognition (e.g., TRACE model; McClelland & Elman,
1986). The debate between autonomous and interactive models of word recognition has
become a central topic in the psychology of speech and seems far from being resolved (see Norris et al., 1995, for a review).

Properties and Perception of the Visual Speech Signal
In comparison to what is now known about speech perception, the study of the specific
contribution of visual information to speech processing in the absence of sound (also
called speech reading or lip reading)3 has a more limited history, with far fewer studies of
its features and its perception. Relative to auditory speech stimuli that are presented un­
der good listening conditions, the ability to discriminate words from sight alone is gener­
ally rather limited (Bernstein et al., 1998) and can be influenced by factors such as linguistic context (e.g., Rönnberg et al., 1998; Samuelsson & Rönnberg, 1993) or training.
Although there is great individual variability in the ability to speech-read, some highly
skilled individuals are capable of reaching high comprehension levels (e.g., Andersson &
Lidestam, 2005).

The impoverishment of visual phonetic signals relative to acoustic phonetic signals is mainly due to the ambiguity of the visual patterns. Even though visual speech contains
highly salient cues to certain critical aspects of speech (e.g., place of articulation, vowel
rounding), other aspects of speech (e.g., manner of articulation, voicing) are realized by
articulators usually hidden from view (MacLeod & Summerfield, 1987). For example, the
vibrations of the vocal folds, a critical feature to distinguish voiced and unvoiced conso­
nants, cannot be perceived through the visual modality (Jackson, 1988; though see Mayer
et al., 2011). Furthermore, those phonemes that can be easily seen are often indistin­
guishable from each other because the places of articulation may be very closely located
(Lidestam & Beskow, 2006). As a result, only clusters of phonemes can be visually distin­
guished from each other. These are referred to in the literature as visemes (Fisher, 1968).
Differences in clarity of articulation across talkers and the external conditions in which
the sounds are produced influence the visual information that is present. Thus, there is a
lack of agreement among researchers regarding the exact grouping of phonemes into
visemes. Depending on the study, the approximately forty-two phonemes of American
English have been grouped into as few as five to as many as fifteen visemes (see Jackson,
1988). Some clusters, however, have been consistently defined in all these studies. For ex­
ample, the phonemes /p/, /b/, and /m/ (bilabial group) are articulated at the same place
(lips) and appear the same visually (Auer & Bernstein, 1997; Massaro, 1998; Summer­
field, 1987). Consonants and vowels have been shown to map onto distinct visemes
(Campbell & Massaro, 1997; Owens & Blazek, 1985; Rosenblum & Saldaña, 1998).

Because of this high degree of visual confusability, one might expect many words to be indistinguishable from each other on the basis of the visual information alone
(a phenomenon called homophony; Berger, 1972; Nitchie, 1916). However, some studies
have shown that a reduction in phonemic distinctiveness does not necessarily imply a loss
of word recognition in visual speech perception. Indeed, lexical intelligibility during
speech reading is much better than would be expected on the basis of the visemic reper­
toire alone (Auer, 2002; Auer & Bernstein, 1997; Bernstein et al., 1997; Mattys et al.,
2002). Additional information such as knowledge of phonotactic constraints (i.e.,
phoneme patterns that constitute words in the language; Auer & Bernstein, 1997), re­
duced lexical density (i.e., the degree of visual similarity to other words in the lexicon),
high frequency of occurrence (Auer, 2009), and semantic predictability (Gagné et al.,
1991) may allow individuals to identify a word even in a reduced phonetic representation.

Visual primitives of speech have also been described based on time-varying features of ar­
ticulation (see Jackson, 1988; Summerfield, 1987). For instance, Rosenblum & Saldaña
(1996) showed that isolated visual time-varying information of the speech event is suffi­
cient for its identification. In fact, recent research suggests that the information that can
be retrieved from the visual speech may be much more detailed than previously thought.

Growing evidence shows that subtle jaw, lip, and cheek movements can be perceived by a
viewer, thus allowing finer distinctions within gestures belonging to the same viseme
(Vatikiotis-Bateson et al., 1996; Yehia et al., 2002). This suggests that visible speech fea­
tures can be described along kinematic as well as static dimensions, a point to which we
shall return in a following section (see later section, Correlated and Complementary Na­
ture of Facial Movements and Vocal Acoustics).

Besides providing information about the phonemes, visual speech cues have been (p. 528) shown to improve the recognition of prosodic aspects of the message. For example,
phonologically similar languages can be discriminated on the basis of the visual informa­
tion alone, possibly owing to differences in the rhythmic pattern of the languages (Soto-
Faraco et al., 2007; Weikum et al., 2007). Similarly, other prosodic dimensions, such as
emphatic stress, sentence intonation (Fisher et al., 1969), and pitch changes associated
with lexical tone (Burnham et al., 2000), can be processed by using visual cues alone,
sometimes even when the lower part of the face is occluded (Cvejic et al., 2010; Davis &
Kim, 2006). For instance, raising eyebrow movements (Granström et al., 1999) or eye
widening (Massaro & Beskow, 2002) can serve as an independent prosodic cue to promi­
nence, and observers preferentially look at these regions in prosody-related judgments
(Buchan et al., 2004; Lansing & McConkie, 1999). In fact, visual cues such as movements
of the head and eyebrows have been shown to correlate with the basic cues for prosody in
the auditory domain, namely changes in voice pitch, loudness, or duration (Foxton et al.,
2009; Munhall et al., 2003). However, the visual contribution to the prosodic aspects of
the message is not necessarily limited to the upper part of the face. That is, head move­
ments also include parts of the lower face (i.e., chin; Figure 26.2). The neck and chin
movements can provide information about the identity of lexical tones (e.g., Chen & Massaro, 2008), and the magnitude of mouth movements can possibly serve as cues for the perception of loudness (i.e., amplitude). Therefore, the different regions of the face can be informative for various dimensions and to various degrees.

Visual Contributions to Speech Intelligibility

Figure 26.2 Two-dimensional plots of the motion of seventy markers on the face during speech production. Panel A shows the motion in space that results from the combined face and head motion. Panel B shows the facial motion with the head motion removed.

Figure 26.3 Perception of speech in noise for auditory-only presentation (blue line) and audiovisual presentation (black line) in a closed-set task with thirty-two words.

Adapted from Sumby & Pollack, 1954. Reprinted with permission from the Acoustical Society of America.

When speech is perceived bimodally, such as audiovisually or even visuotactually (see lat­
er section, Tactile Contributions to Speech), perception is often enhanced in a synergistic
fashion, especially if the auditory input is somehow degraded. That is, speech-reading
performance in bimodal conditions is better than the pooled performance from the uni­
modal conditions (Summerfield, 1987). In the first known experimental demonstration of
this phenomenon (Cotton, 1935), the voice of a talker speaking inside a dark sound booth
was filtered and masked with a loud buzzing noise, rendering it almost unintelligible.
However, when the lights of the sound booth were switched on and participants could see
the talker’s face, they correctly reported most of the words. Almost 20 years later, these
findings were quantified by Sumby and Pollack (1954), who showed that when the intelli­
gibility of acoustic speech is impoverished by adding noise, the concurrent presentation
of its corresponding visual speech cues can improve comprehension to a degree equiva­
lent to increasing acoustic signal-to-noise ratio by 15 to 20 dB (see also Rosenblum et al.,
1996; Ross et al., 2006; Figure 26.3). Because this gain in performance is so large, the
combination of multiple sources of information about speech has the potential to be ex­
tremely useful for listeners with hearing impairments (Berger, 1972). Indeed, the benefits
derived from having access to the speaker’s facial speech information have been docu­
mented in listeners with mild to more severe types of hearing loss (p. 529) (Payton et al.,
1994; Picheny et al., 1985) and even deaf listeners with cochlear implants (Rouger et al.,
2007, 2008).
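To appreciate the size of this benefit, recall that decibels are logarithmic: a 15 to 20 dB gain in signal-to-noise ratio corresponds to a roughly 30- to 100-fold increase in signal power relative to the noise. A minimal sketch of the conversion (an editorial illustration, not part of the original studies):

```python
def db_to_power_ratio(gain_db):
    """Convert a decibel gain to the equivalent power ratio
    (dB = 10 * log10(P1 / P0))."""
    return 10 ** (gain_db / 10)

# Sumby and Pollack's 15-20 dB visual benefit, expressed as power ratios:
for db in (15, 20):
    print(f"{db} dB ~= {db_to_power_ratio(db):.0f}x signal power")
# prints: 15 dB ~= 32x signal power, 20 dB ~= 100x signal power
```

Framed this way, seeing the talker's face can be worth as much as making the acoustic signal dozens of times more powerful relative to the background noise.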

Nonetheless, the visual contribution to speech perception has been demonstrated even in
circumstances in which the auditory signal is not degraded. For example, novice speakers
of a second language often report that face-to-face conversation is easier than situations
without visual support (Navarra & Soto-Faraco, 2007; Reisberg et al., 1987). Similarly,
Reisberg et al. (1987) demonstrated that when listening to perfectly audible messages
from a speaker with a heavy foreign accent or to a passage with difficult semantic con­
tent (e.g., Kant’s Critique of Pure Reason), the availability of the visual information en­
hances comprehension (see also Arnold & Hill, 2001).

This synergy effect in bimodal stimulation contexts may be attributed to at least two sepa­
rate mechanisms. First, the perceiver might exploit time-varying features common to the
physical signals of both the acoustical and the visual input. That is, in addition to match­
ing onsets and offsets, the concurrent streams of auditory and visual speech information
are usually invariant in terms of their tempo, rhythmical patterning, duration, intensity
variations, and even affective tone (Lewkowicz et al., 2000). These common (“amodal”)
properties have been proposed to be the critical determinants for integration by some
models of audiovisual speech perception (Rosenblum, 2008; Studdert-Kennedy, 1989;
Summerfield, 1987).4 Second, the auditory and visual signals are fused to a percept that
is more than the sum of its parts. That is, because each modality contains partially inde­
pendent phonological cues about the same speech event, recognition is boosted when
both sources of information are available.

Correlated and Complementary Nature of Facial Movements and Vocal Acoustics
As the act of talking unfolds, the movement of articulators to produce the acoustic output
and the speech-related facial motion result in a structural coupling between auditory
and visual events. Whereas the primary locus of visual speech information is around the
mouth and jaw—owing to their principal role in speech sound generation—the motion of
articulation spreads across the entire face (Vatikiotis-Bateson et al., 1996). Observation of
the articulators during speech can give direct information regarding not only the place of
articulation but also the onset, offset, and rate of change of speech, as well as the overall
amplitude contour (Grant & Seitz, 2000) and the spectral properties of the acoustic signal
(Grant, 2001). In fact, investigators have found robust correlations between the spa­
tiotemporal characteristics of vocal tract configurations, visual facial information, and the
acoustic output. For instance, Yehia et al. (1998) showed that the spectral envelope of the
speech signal can be estimated with more than 90 percent accuracy simply by tracking
the position of the talker’s moving head or even by the motion of the tongue alone (an ar­
ticulator that is not necessarily coupled with the face; Yehia et al., 1998). Similarly,
Munhall et al. (2004) showed a kinematic–acoustic relation between head motion alone
(i.e., with no facial motion) and the pitch (fundamental frequency) and amplitude root
mean square (RMS) of the speech sound during natural speech production. Chan­
drasekaran et al. (2009) computed the frequency spectra of the envelopes of the auditory
signal and the corresponding mouth area function to identify the temporal structure of
the auditory envelope and the movement of the mouth, and found that audiovisual speech
has a stereotypical rhythm that is between 2 and 7 Hz (see also Ohala, 1975).
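The envelope analysis described above can be approximated in a few lines: rectify the waveform, smooth it to obtain the amplitude envelope, and take the envelope's spectrum. The sketch below is a simplified stand-in for the published method; the synthetic signal, the 50 ms smoothing window, and the 1 to 10 Hz search band are illustrative assumptions, not the authors' exact parameters.

```python
import numpy as np

def modulation_spectrum(signal, fs):
    """Amplitude-envelope spectrum: rectify, smooth with a ~50 ms
    moving average (a crude low-pass filter), then FFT the envelope."""
    env = np.abs(signal)
    win = int(0.05 * fs)
    env = np.convolve(env, np.ones(win) / win, mode="same")
    env = env - env.mean()                     # remove the DC component
    spec = np.abs(np.fft.rfft(env))
    freqs = np.fft.rfftfreq(env.size, 1.0 / fs)
    return freqs, spec

fs = 1000                                      # Hz, sampling rate
t = np.arange(0, 5, 1.0 / fs)
# A 200 Hz carrier amplitude-modulated at a syllable-like 4 Hz rate
signal = (1 + np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 200 * t)

freqs, spec = modulation_spectrum(signal, fs)
band = (freqs >= 1) & (freqs <= 10)
peak = freqs[band][np.argmax(spec[band])]
print(f"dominant envelope rhythm: {peak:.1f} Hz")   # ~4 Hz
```

A real analysis would use a proper low-pass filter and recorded audio plus a tracked mouth-area function, but a 2 to 7 Hz envelope peak of the kind the chapter describes would show up in the same way.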

Given that there is so much coherence between speech production components, it makes
sense that perceivers are sensitive to this structure and benefit from the time-varying re­
dundancies across modalities to decode the spoken message more reliably (Grant & Seitz,
2000; Schwartz et al., 2002). This redundancy may permit early modulation of audition by
vision. For example, the visual signal may amplify correlated auditory inputs (Schroeder (p. 530) et al., 2008). Grant and Seitz (2000) showed that the ability to detect the presence of auditory speech in a background of noise was improved by seeing a face articulating the same utterance. Critically, this benefit depended on the correlation between the sound intensity (RMS energy in the mid- to high-frequency energy envelope) and the articulator movement (see also Bernstein et al., 2004; Eskelund et al., 2011; Kim & Davis, 2004).
Similar detection advantages have been recently reported in the opposite direction, that
is, when the visual input is presented in noise. Kim et al. (2010) showed that the detection
of a point-light talking face among dynamic visual noise is improved when a temporally
correlated, noninformative, auditory speech stream is present.
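The audio-kinematic correlations these studies measure are easy to picture: compute a frame-by-frame RMS energy track for the audio and correlate it with a movement track. The toy sketch below uses made-up signals; the frame length and the synthetic "lip aperture" series are illustrative assumptions, not data from any of the cited experiments.

```python
import math

def frame_rms(samples, frame_len):
    """Frame-wise RMS energy of an audio signal (non-overlapping frames)."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

# Toy demonstration: audio whose amplitude follows a hypothetical
# lip-aperture track yields a high RMS-kinematics correlation.
aperture = [1 + math.sin(2 * math.pi * k / 10) for k in range(50)]
audio = [a * math.sin(2 * math.pi * 5 * i / 100)
         for a in aperture for i in range(100)]
rms = frame_rms(audio, 100)
print(f"r = {pearson_r(rms, aperture):.3f}")   # close to 1.0
```

In real recordings the correlation is far from perfect, which is exactly why its strength can be used, as in Grant and Seitz's study, to predict how much the visual signal helps detection.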

Figure 26.4 Confusion trees for (A) lip reading and (B) auditory perception of speech. At the top of each tree, all of the consonants can be distinguished. As the branches are followed downward, the tree displays the common confusions that are made in each modality as (A) the degree of visual distinctiveness decreases or (B) signal-to-noise level decreases.

Reprinted with permission from Summerfield, 1987.

It has been suggested that the redundancy of information perceived from the talking
head, in particular in the common dynamic properties of the utterance, may be the metric
listeners use for perceptual grouping and phonetic perception (Munhall et al., 1996; Sum­
merfield, 1987). In keeping with this idea, several studies have shown that the visual en­
hancement of speech perception depends primarily on dynamic rather than static charac­
teristics of facial images. For example, Vitkovitch and Barber (1994) have demonstrated
that speech-reading accuracy for auditory dynamic noise decreases as the frame rate
(temporal resolution) of the video of the speaker’s face drops below 16 Hz. Furthermore,
the temporal characteristics of facial motion can enhance phonetic perception, as demon­
strated by use of dynamic point-light displays (Rosenblum & Saldaña, 1996). Neuropsy­
chological data confirm the specific role of the dynamic, time-varying characteristics on
audiovisual speech integration. Munhall et al. (2002) presented dynamic and static (i.e.,
single frame) visual vowels to a patient who had suffered selective damage to the ventral
stream, a brain region involved in the discrimination of forms such as faces (damage that produces a deficit known as agnosia). The authors found that, whereas the patient was unable to identify any
speech gestures from the static photographs, she did not differ from controls in the dy­
namic condition (see also Campbell, 1997, for a similar case). Overall, these results sug­
gest that local motion cues are critical for speech recognition.

The second way in which visible speech influences auditory speech is by providing com­
plementary information. In this case, vision provides stronger cues than the auditory sig­
nal, or even information that is missing from the auditory signal. As Summerfield (1987)
pointed out, the speech units that are most confusable visually are not those that are
most confusable with auditory stimuli (Figure 26.4). In Miller and Nicely’s (1955) classic
study of consonant perception in noise, place of articulation is one of the first things to
become difficult to perceive as the signal-to-noise ratio decreases. The auditory cues re­
garding manner of consonants (e.g., stops vs. fricatives) are less susceptible to noise, while voicing and the presence of nasality are even more robust. These latter cues are very
weakly conveyed visually, if at all. Thus, audio and visual speech cues are complementary;
visual cues may be superior in conveying information about the place of articulation (e.g.,
at the lips vs. at the back of the mouth), whereas auditory cues may be more robust for
conveying other phonetic information, such as the manner of articulation and voicing. The
visual input, therefore, provides (p. 531) redundant cues to reinforce the auditory stimulus
and can also be used to disambiguate some speech sounds with quite similar acoustics,
such as /ba/ versus /da/, which differ in place of articulation. The most famous multisenso­
ry speech phenomenon, the McGurk effect, probably has its roots in this difference in in­
formation strength.

Classic Demonstration of Audiovisual Speech Integration: McGurk Effect

Figure 26.5 Schematic showing the McGurk effect. In the standard version of the illusion, the face is presented saying the syllable /ga/ while the auditory syllable /ba/ is played simultaneously. Common perceptions are shown on the right.

The most compelling illustration of the consequences of audiovisual speech integration is the McGurk illusion (McGurk & MacDonald, 1976), which does not involve noisy acoustic
conditions or complex messages. In this illusion, observers exposed to mismatched audi­
tory and visual speech signals often experience (i.e., hear) a phoneme different from that
originally presented in either modality (Figure 26.5). For instance, in the classic “fusion”
version of the illusion, a visual /ga/ consonant is dubbed in synchrony with an acoustic /ba/. This new audiovisual syllable is perceived by most subjects as beginning with a different consonant. Depending on the particular tokens used, most people hear /da/, /tha/,
or /ga/. The contradictory visual information alters perception of the acoustic signal, /ba/,
and this has been characterized as a fusion of phonetic features. There are a number of
important aspects to this illusion. First, the voicing category of the consonant is deter­
mined by the auditory signal. Second, the use of /b/ as the auditory consonant is impor­
tant to the illusion. Although this stop consonant is correctly categorized when presented
acoustically, this perception is not necessarily “strong.” The best McGurk illusions result
from “weaker” /b/ tokens, that is, tokens for which the acoustic cues regarding place of
articulation are more ambiguous. Finally, in the illusion, the perceived place of articula­
tion changes from bilabial to something else. This may result either from the absence of
visual bilabial movement, one of the clearest of visual speech cues, or from perceiving the
visual information for other consonants. For most talkers, the visual cues for a /g/ are also
somewhat ambiguous. Thus, the main visual cue received by the listener is that a nonlabi­
al consonant is being produced. In summary, the illusory percept in the McGurk effect re­
sults from the presence of a weak auditory signal and the absence of the expected and
complementary signal for /b/. Another version of the McGurk effect, the combination illu­
sion, reinforces this interpretation. In the combination illusion (sometimes called audiovi­
sual phonological fusion; Radicke, 2007; see also Troyer et al., 2010), two stimuli for
which the cues are very strong are presented: a visual /ba/ and an acoustic /ga/. In this
case, both consonants are perceived as a cluster /bga/. This version of the illusion is more
reliable and more consistently perceived than the fusion illusion.

The McGurk effect has been replicated many times with different stimuli under a variety
of manipulations (e.g., Green & Gerdeman, 1995; Green et al., 1991; Jordan & Bevan,
1997; MacDonald & McGurk, 1978; Massaro & Cohen, 1996; Rosenblum & Saldaña,
1996), and has often been described as being very compelling and robust. However, after
extensive experience in the laboratory with these types of stimuli, one cannot help but notice that the effect and the illusory experience derived from it are not as robust as usually
described in the audiovisual speech literature. For instance, it is not uncommon to find
that the effect simply fails to occur for utterances produced by some talkers, even when
the auditory and visual channels are accurately dubbed (Carney et al., 1999). Moreover,
even for those talkers more amenable to producing the illusion, the effect is usually only
observed in a (p. 532) percentage of trials over the course of an experiment (Brancazio,
2004; Brancazio & Miller, 2005; Massaro & Cohen, 1983), contrary to what was claimed
in the original report (McGurk & MacDonald, 1976, p. 747). Furthermore, even when au­
diovisual discrepancy is not recognized as such, the phenomenological experience arising
from a McGurk-type stimulus is often described as being different from the experience of
a naturally occurring equivalent audiovisual event. On the other hand, not experiencing
the illusion does not necessarily mean that there is no influence of vision on audition
(Brancazio & Miller, 2005; Gentilucci & Cattaneo, 2005). Brancazio and Miller, for in­
stance, found that a visual phonetic effect involving the influence of visual speaking rate
on perceived voicing (Green & Miller, 1985) occurred even when observers did not experience the McGurk illusion. According to the authors, this result suggests that the incidence of the McGurk effect as an index of audiovisual integration may underestimate the actual extent of interaction between the two modalities. Along the same lines, Gentilucci and Cattaneo (2005) showed that even when participants did not experience the
McGurk illusion, an acoustical analysis of the participants’ spoken responses revealed
that these utterances were always influenced by the lip movements of the speaker, sug­
gesting that some phonetic features present in the visual signal were being processed.

Another factor that often puzzles researchers is that the overall incidence of the McGurk
effect for a given stimulus typically varies considerably across individuals (Brancazio et
al., 1999), with some people not experiencing the illusion at all. Whereas these differ­
ences have been explained in terms of modality dominance (i.e., individual differences in
the weighting of auditory vs. visual streams during integration; Giard & Peronnet, 1999),
it is still surprising that some people can weigh the auditory and visual information in
such a way that the information provided by one modality can be completely filtered out.

Given that this phenomenon has traditionally been used to quantify the necessary and
sufficient conditions under which audiovisual integration occurs (some of these are dis­
cussed below), we believe a full understanding of the processes underlying the McGurk
effect is required before extrapolating the results to naturally occurring (i.e., audiovisual
matching) speech (see also Brancazio & Miller, 2005).

Tactile Contributions to Speech


Besides vision and hearing, the tactile modality has also been shown to be an efficient in­
put channel for processing speech information. Strong support for the capacity to use
touch as a communicative sense is provided by individuals born both deaf and blind who, by means of a variety of natural communication methods, have been able to acquire
a full range of spoken language abilities. For instance, in tactile finger spelling, the manu­
al alphabet of the local sign language is adapted, so that by placing the palms over the
signer’s hands, the deaf-blind person can feel the shape, movement, and location of the
different signs. Particularly noteworthy, however, is the Tadoma method of speech percep­
tion, which is based on the vibrotactile reception of the articulatory movements and ac­
tions that occur during the production of speech. In this method, the hand of the deaf-
blind receiver is placed over the speaker’s face in such a way that the little finger, on the
throat, detects laryngeal vibration, the ring and middle fingers pick up information on the
jaw and cheek movement, the index finger detects nasal resonance, and the thumb de­
tects lip movement and airflow changes (Weisenberger et al., 1989; Figure 26.6). Previ­
ous research has demonstrated that high levels of comprehension can be reached by us­
ing the Tadoma method. That is, proficient users achieve almost normal communication
with this method (i.e., they can track 80 percent of the key words in running speech, at a
rate of three syllables per second; Reed et al., 1982, 1985). This suggests that the sense
of touch has sufficient capacity for decoding time-varying complex cues in the speech sig­
nal.

Figure 26.6 Drawing of an individual communicating using Tadoma. The perceiver places his hand on the face and throat of the talker as she talks and can perceive her speech at a high level of intelligibility.

A few studies have shown that even untrained hearing individuals can benefit from this
form of natural tactile speech reading when speech is presented in adverse listening con­
ditions (Fowler & Dekle, 1991; Gick et al., 2008; Sato et al., 2010). For instance, Gick et
al. (2008) found that untrained participants could identify auditory or visual syllables
about 10 percent better when they were paired with congruent tactile information from
the face. Similarily, Sato et al. (2010) presented participants with auditory syllables em­
bedded in noise (e.g., / ga/) alone or together with congruent (e.g., /ga/) and incongruent
(e.g., /ba/) McGurk-like combinations (e.g., /bga/) tactile utterances. The results showed
that manual tactile contact with the speaker’s face coupled with congruent auditory infor­
mation facilitated the identification of the syllables. Interestingly, they also found that al­
though auditory identification was significantly decreased by (p. 533) the influence of the
incongruent tactile information, participants did not report combined illusory percepts
(e.g., /bda/). Instead, on some trials, they selected the haptically specified event. This re­
sult is in line with previous findings by Fowler and Dekle (1991), who showed no evidence
of illusory McGurk percepts as a result of audiotactile incongruent pairings (only one of
seven tested participants reported hearing a new percept under McGurk conditions). The
fact that audiotactile incongruent combinations do not elicit illusory percepts, together
with the fairly small audiotactile benefits observed in the matching conditions (in compar­
ison to audiovisual matching conditions), raises theoretical questions regarding the com­
binative structuring of audiotactile speech information. That is, these results suggest that
rather than genuine perceptual interactions, audiotactile interactions may be the result of
postperceptual decisions reached through probability summation of the two sources of information (see Massaro, 2009). Note that if auditory and tactile information were not perceptually integrated but rather were processed in parallel and independently of each other, one should observe improvements in syllable identification (because of signal redundancy), but incongruent conditions should never lead to hearing new percepts. This is
the exact pattern of results found in these studies. The finding that integrated processing
in speech perception is highly dependent on the modality through which information is
initially encoded clashes with a recent hypothesis suggesting that audiotactile speech in­
formation is integrated in a similar way as synchronous audiovisual information (Gick &
Derrick, 2009, p. 503). In a recent study, Gick and Derrick (2009) paired acoustic speech
utterances (i.e., the syllables /pa/ or /ba/) with small bursts of air applied to participants’ necks or hands, and found that those receiving the puffs of air were more likely to perceive both
sounds as aspirated (i.e., /pa/). According to the authors, the fact that this effect occurred
in untrained perceivers and at body locations unlikely to be reinforced by frequent experience suggests that tactile information is combined in a natural manner with the auditory
speech signal. However, it is more likely that this interference is due to postperceptual
analysis (Massaro, 2009).
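The probability-summation benchmark invoked here can be made concrete. A minimal sketch follows, assuming two independent channels whose responses are pooled post-perceptually; the accuracy values are illustrative, not data from the cited studies:

```python
def probability_summation(p_auditory: float, p_tactile: float) -> float:
    """Predicted identification rate if the two channels are detected
    independently and combined post-perceptually: the listener succeeds
    unless BOTH channels fail."""
    return 1.0 - (1.0 - p_auditory) * (1.0 - p_tactile)

# Hypothetical single-channel accuracies for syllables in noise
# (illustrative values only, not from the studies discussed above).
p_a = 0.60   # auditory-alone identification
p_t = 0.15   # tactile-alone identification

predicted = probability_summation(p_a, p_t)
print(f"Independent-channels prediction: {predicted:.2f}")  # prints 0.66
```

Observed audiotactile accuracy at or below this prediction is consistent with post-perceptual summation; accuracy reliably above it, or new illusory percepts under mismatch, would instead point to perceptual integration.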

Tactile Aids

Given the substantial success of Tadoma and other natural methods for tactile communi­
cation, researchers have become interested in exploiting this modality as an alternative
channel of communication in individuals with severe hearing impairments. The possibility
of tactile information working as a speech-reading supplement is grounded in the complementary nature of visual and tactile cues, analogous to the previously described relationship between auditory and visual information in speech (Summerfield, 1987).
lip and jaw movements, two of the parameters Tadoma users exploit to extract linguistic
information, are available to the sighted lip reader with hearing deficits.

The effort in developing tactile aids to assist speech reading and maximize sensory redun­
dancy started in 1924 when Dr. Robert Gault built an (p. 534) apparatus, a long tube that
the speaker placed in front of the mouth and the receiver held in her hands, which delivered unprocessed sound vibrations to the receiver (Gault, 1924).
tor only transmitted very limited cues for speech recognition (possibly only the temporal
envelope; see Levitt, 1995), it was nevertheless found to be useful to supplement speech
reading. The relative success of this early work boosted the development of more sophisticated sensory substitution systems,5 aimed at decoding different aspects of speech signals into patterns of tactile stimulation.

Haptic devices have varied with respect to the type of transducers (electrotactile vs. vi­
brotactile), the stimulated body site (e.g., finger, hand, forearm, abdomen, and thigh), and
the number and configurations of stimulators (e.g., single-channel vs. multi-channel stim­
ulation; see Levitt, 1988, for a review). In the single-channel approach, for example, a sin­
gle transducer directly presents minimally processed acoustic signals, thus conveying
global properties of speech such as intensity, rhythm, and energy contours (Boothroyd,
1970; Erber & Cramer, 1974; Gault, 1924; Gault & Crane, 1928; Schulte, 1972). In multi­
channel schemes, the acoustic input signal is decomposed into a number of frequency
bands, the outputs of which drive separate tactile transducers that present a spectral dis­
play along the user’s skin surface. The skin, thus, is provided with an artificial place
mechanism for coding frequencies. In most systems, moreover, the sound energy at a giv­
en locus of stimulation cues intensity in the corresponding channel.
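The multichannel band-splitting scheme described above can be sketched in a few lines. This is a toy illustration, not the signal path of any actual device; the channel count, frequency range, and logarithmic band spacing are assumptions:

```python
import numpy as np

def multichannel_vocoder_frame(signal, sample_rate, n_channels=8,
                               f_lo=100.0, f_hi=8000.0):
    """Toy sketch of one analysis frame of a multichannel tactile vocoder:
    split the spectrum into log-spaced bands and return one energy value
    per band. Each value would drive one tactile transducer, giving the
    skin an artificial 'place' code for frequency."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)
    energies = np.empty(n_channels)
    for ch in range(n_channels):
        band = (freqs >= edges[ch]) & (freqs < edges[ch + 1])
        energies[ch] = spectrum[band].sum()
    return energies

# A 500 Hz tone concentrates its energy in a low channel.
sr = 16000
t = np.arange(1024) / sr
frame = np.sin(2 * np.pi * 500 * t)
drive = multichannel_vocoder_frame(frame, sr)
print(int(np.argmax(drive)))  # prints 2 (the band containing 500 Hz)
```

In a real device the per-band energies would also be compressed into the skin's narrow dynamic range before driving the transducers.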

Psychophysical evaluations of performance with these methods have usually been done by
comparing speech reading alone, the tactile device alone, and speech reading plus the
tactile device. These studies have shown that after extensive training, vocoders can pro­
vide enough feedback to improve speech intelligibility both when used in isolation and
presented together with visual speech information (Brooks & Frost, 1983; Brooks et al.,
1986). Nevertheless, none of these artificial methods has reached the performance
achieved through the Tadoma method. Such performance differences may be partly attributed to the superior overall richness of the Tadoma display. Whereas Tadoma conveys a variety of sensory qualities directly tied to the articulation process, tactile devices
encode and display acoustic information in an arbitrary fashion (i.e., employing a sensory
system not typically used for this sort of information; Weisenberg & Percy, 1995). Further­
more, as pointed out before (see earlier section, Properties and Perception of the Audito­
ry Speech Signal), phonemes are signalled by multiple acoustic cues that vary greatly as
a function of the context. It is possible, therefore, that some critical acoustic information
is lost in the processing and transduction when using such an apparatus.

In terms of the information provided, single-channel devices have been generally de­
scribed as being superior in conveying supra-segmental information, such as syllable
number, syllabic stress, and intonation (Bernstein et al., 1989; Carney & Beachler, 1986).
Multichannel aids, on the other hand, show better performance for tasks requiring the
identification of fine-structure phoneme information (both single-item and connected
speech; Brooks & Frost, 1983; Plant, 1989; Summers et al., 1997; Weisenberger et al.,
1991; Weisenberger & Russell, 1989). Other studies, however, have reported fairly simi­
lar-sized benefits in extracting segmental and supra-segmental information regardless of
the number of channels. For instance, Carney (1988) and Carney and Beachler (1986)
compared a single-channel and a twenty-four-channel vibrotactile device in phoneme
recognition tasks, and reported similar levels of performance under both tactile aid alone
and speech reading plus tactile aid conditions. Hanin et al. (1988) measured the percep­
tion of words in sentences by speech reading with and without tactile presentation of
voice fundamental frequency (F0) using both multichannel display and single-channel dis­
plays. Mean performance with the tactile displays was found to be slightly, but signifi­
cantly, better than speech reading alone, but no significant differences were observed be­
tween the two displays.

Overall, what is clear from these studies is that providing complementary tactile stimulation—in whatever form—is effective in helping hearing-impaired individuals to decode different kinds of linguistic information, such as segmental (e.g., Yuan et al., 2005) and supra-segmental features (e.g., Auer et al., 1998; Bernstein et al., 1989; Grant
et al., 1986; Thompson, 1934), closed-set words (Brooks & Frost, 1983), and even con­
nected speech (Miyamoto et al., 1987; Skinner et al., 1989).

Sensory Factors in Audiovisual Speech Integration
Temporal Constraints in Audiovisual Speech Perception

As previously mentioned (see earlier section, Multisensory Integration: Cross-Talk Between The (p. 535) Senses), the temporal synchrony of audiovisual signals provides a powerful cue for linking multisensory inputs (see Vroomen & Keetels, 2010, for a tutorial review). One compelling example of the role that temporal synchrony plays in multisensory
speech perception can be seen in the discomfort experienced when stimuli from different
modalities occur separated by a large temporal gap, as can happen during television
broadcast or video playback in which the face and the voice are out of sync (see Hamilton
et al., 2006, for a study of a patient who consistently perceives natural speech in asyn­
chrony). Studies investigating the temporal structure of multisensory events have shown,
however, that whereas multisensory signals must be relatively synchronized to be per­
ceived as a single event, strict temporal overlapping is by no means necessary. Instead,
there is a temporal window of integration over which asynchronies between sensory
modalities are not detected and multisensory effects (i.e., visual enhancement of speech
intelligibility and the McGurk effect) are still observed. This temporal window of integra­
tion is thought to compensate for small temporal delays between modalities that may
arise under natural conditions because of both the physical characteristics of the arriving
inputs (i.e., differences in the relative time of arrival of stimuli at the eye and ear) and
biophysical differences in sensory information processing (i.e., differences in neural transduction latencies between vision and audition; Spence & Squire, 2003; Vatakis &
Spence, 2010).6

In the area of audiovisual speech perception, results suggest that the perceptual system
can handle a relatively large temporal offset between auditory and visual speech signals.
For example, studies measuring sensitivity to intermodal asynchrony (i.e., judgments
based on the temporal aspects of the stimuli) for syllables (Conrey & Pisoni, 2006) or sen­
tences (Dixon & Spitz, 1980) have consistently identified a window of approximately 250
ms over which auditory-visual speech asynchronies are not reliably perceived. Further­
more, this time window is often found to be longer when the visual input precedes the au­
ditory input than when the auditory input precedes the visual. For instance, in one of the
first attempts to find the threshold for detection of asynchrony, Dixon and Spitz (1980)
presented participants with audiovisual speech streams that became gradually out of
sync, and instructed them to press a button as soon as they noticed the asynchrony. They
found that the auditory stream had to either lag by 258 ms or lead by 131 ms before the
discrepancy was detected. Later studies using detection methods less susceptible to bias
(see Vatakis & Spence, 2006, for a discussion on this),7 have provided a sharper delimita­
tion of this temporal window, showing limits ranging between 40 and 100 ms for auditory
leading (see Soto-Faraco & Alsius, 2007, 2009; Vatakis et al., 2006) and up to 250 ms for video-leading stimuli (Grant et al., 2004; Soto-Faraco & Alsius, 2009; Vatakis & Spence, 2006; Figure 26.7).
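The asymmetric limits just described amount to a simple range check on the audiovisual offset. In this sketch the cutoffs are rounded from the ranges reported above and are illustrative only, not canonical thresholds:

```python
def within_integration_window(audio_lead_ms: float,
                              lead_limit_ms: float = 100.0,
                              lag_limit_ms: float = 250.0) -> bool:
    """Return True if an audiovisual offset falls inside an asymmetric
    temporal window of integration. Positive values mean the audio leads
    the video; negative values mean the video leads. The default limits
    are rounded from the reported ranges (roughly 40-100 ms for
    audio-leading, up to ~250 ms for video-leading) and are assumptions
    for illustration."""
    if audio_lead_ms >= 0:
        return audio_lead_ms <= lead_limit_ms
    return -audio_lead_ms <= lag_limit_ms

print(within_integration_window(50))     # audio leads by 50 ms  -> True
print(within_integration_window(-200))   # video leads by 200 ms -> True
print(within_integration_window(-400))   # video leads by 400 ms -> False
```

The asymmetry encodes the tolerance for video-leading offsets that, as discussed below, matches the natural precedence of visible articulation over the acoustic output.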

Figure 26.7 Meta-analysis of the window of perceived synchrony for a range of different asynchronies and different stimuli. SC, syllable categorization; SJ, simultaneity judgment; TJ, temporal order judgment; 2IFC, two-interval forced choice. The white area in the temporal window of integration column corresponds to the stimulus onset asynchrony (SOA) range in which the auditory information is presented before the visual information. The shaded area corresponds to the SOA range in which the visual information is presented before the auditory information. The dark horizontal lines correspond to the window of audiovisual integration.

Consistent findings emerge from studies estimating the boundaries of this temporal window by quantifying the magnitude of multisensory effects, such as visual enhancement of
speech intelligibility (Conrey & Pisoni, 2006; Grant & Greenberg, 2001; McGrath & Summerfield, 1985; Pandey et al., 1986) or the McGurk illusion (Jones & Jarick, 2006; Massaro & Cohen, 1993; Massaro et al., 1996; Miller & D’Esposito, 2005; Munhall et al.,
1996; Soto-Faraco, & Alsius, 2007, 2009; van Wassenhove et al., 2007). For instance,
Munhall et al. (1996) presented McGurk syllables at different asynchronies (i.e., from a
360-ms auditory lead to a 360-ms auditory lag, in 60-ms steps) and found that, although
illusory percepts prevailed at small asynchronies, a significant proportion of fused responses were still observed when the audio track lagged the video track by 240 ms, or when it led by 60 ms. In a recent report, however, this temporal window was reported to be
much wider (480 ms with an audio lag and 320 ms with an audio lead; Soto-Faraco & Al­
sius, 2009). According to some researchers, this much greater tolerance for situations in
which the audio signal lags the visual than for situations in which the visual signal lags
the audio can be explained by perceivers’ experience with the physical properties of the
natural world (see Munhall et al., 1996). That is, in audiovisual speech, like in many other
natural audiovisual occurrences, the visual information (i.e., the visible byproducts of
speech articulation) almost always precedes the acoustic output (i.e., speech sounds). It is
conceivable, therefore, that after repeated exposure to visually leading occurrences in the
external world, our perceptual system has adapted to tolerate and bind visual-leading multisensory events to a greater extent than audio-leading stimuli. Indeed, this hypothesis is
supported by recent studies showing that the temporal window of integration is adapt­
able in size, as a result of perceptual experience. For instance, repeated exposure to tem­
porally misaligned speech stimuli can alter the perception of synchrony or asynchrony
(Vatakis & Spence, (p. 536) 2007; see also Fujisaki et al., 2004, and Vroomen et al., 2004,
for nonspeech related results).

Furthermore, the ability of the human perceptual system to deal with audiovisual asyn­
chronies appears to vary as a function of the nature of the stimuli with which the system
is being confronted. For instance, for stimuli with limited informational structure within
either modality (e.g., beeps and flashes; see Hirsh & Sherrick, 1961), the temporal window for subjective simultaneity has been shown to be much narrower, with timing
differences of less than 60 to 70 ms being detected (Zampini et al., 2003). When the complexity of multisensory information increases, as in audiovisual speech or highly ecological nonspeech audiovisual events (e.g., musical instruments or object actions; Vatakis & Spence, 2006), the information content (i.e., the semantics and the inherent (p. 537) structure/dynamics that are extended in time) of the unisensory inputs may serve as an
additional factor promoting integration (Calvert et al., 1998; Laurienti et al., 2004), and
hence lead to a widening of the temporal window for integration (i.e., larger temporal and
spatial disparities are tolerated; Vatakis & Spence, 2008). For speech, some authors have
suggested, moreover, that once such complex sensory signals are merged into a single
unified percept, the system interprets this multisensory stimulus as having a unique temporal onset and a common external origin (see Jackson, 1953), an idea known as the “unity assumption hypothesis.” Consequently, the final perceptual outcome of multisensory integration reduces or eliminates the original temporal asynchrony between the auditory and visual signals (Vatakis & Spence, 2007; Welch & Warren, 1980). Support for the unity assumption
has primarily come from studies showing that participants’ sensitivity to temporal misalignment in audiovisual speech signals is higher for mismatching audiovisual speech events
(e.g., different face-voice gender or McGurk syllables) than for matching ones (Vatakis &
Spence, 2007, and van Wassenhove et al., 2007; respectively). Other studies, however,
qualified these findings by showing that such informational congruency effects (i.e., wider
temporal windows for matching vs. nonmatching stimuli) were only observed for audiovisual speech stimuli, but not for other complex audiovisual events such as musical instruments or object actions (Vatakis & Spence, 2006) and even some vocalizations (Vatakis et
al., 2008). This has been used to argue that the temporal synchrony perception for speech
stimuli may be somewhat special (Radeau, 1994; Vatakis et al., 2008).8

A critical methodological concern in Vatakis and Spence (2007) and van Wassenhove et al.
(2007), however, is that mismatch conditions differed not only in the informational content of the stimuli but also in a number of physical (sensory) dimensions (e.g., structural
factors; Welch, 1999). As a result, the reported effects may be explained by differences at
the level of simple spatial-temporal attributes. In fact, two recent studies evaluating the
temporal window of integration for different attributes and perceptual interpretations in
a set of identical multisensory objects (thus controlling for low-level factors) have chal­
lenged the unity assumption hypothesis (Soto-Faraco & Alsius, 2009; Vroomen & Steke­
lenburg, 2011). Soto-Faraco and Alsius (2009) explored the tolerance of the McGurk com­
bination effect to a broad range of audiovisual temporal asynchronies, while measuring,
at the same time, the temporal resolution across the two modalities involved. Critically,
they found that the McGurk illusion can arise even when perceivers are able to detect the
temporal mismatch between the face and the voice (Soto-Faraco & Alsius, 2009). This
suggests that the final perceptual outcome of multisensory integration perceptual input
does not overwrite the original temporal relation between the two inputs, as the unity as­
sumption would predict. Instead, it demonstrates that the temporal window of multisen­
sory integration has different widths depending on the perceptual attribute at stake. In
the Vroomen and Stekelnburg (2011) study, participants were required to detect asyn­
chronies between synthetically modified (i.e., sine-wave speech) pseudowords and the
corresponding talking face. Sine-wave speech is an impoverished speech signal that is not
recognized as speech by naïve observers; however, when perceivers are informed of its
speech-like nature, they become able to decode its phonetic content. The authors found
that, whereas the sound was more likely to be integrated with lip-read speech if heard as speech than as nonspeech (i.e., the magnitude of the McGurk effect depended on the speech mode; see also Tuomainen et al., 2005), observers in both the speech and nonspeech modes were equally sensitive at judging the audiovisual temporal order of the events.
This result suggests that previously found differences between speech and nonspeech
stimuli were due to low-level stimulus differences, rather than reflecting the putative spe­
cial nature of speech.

The wider temporal windows of integration for audiovisual speech—as compared with
complex nonspeech stimuli or simple stimuli—observed in previous studies can be better
explained by the increased low-level time-varying correlations that underlie audiovisual
speech. That is, in contrast to simple transitory stimuli, where asynchronies can be detected primarily by temporal onset–offset cues, in slightly asynchronous matching audiovisual speech there is still a fine temporal correlation between sound and vision (Munhall
et al., 1996). According to Vroomen and Stekelenburg, this (time-shifted) correlation
would induce a form of “temporal ventriloquist” effect (Morein-Zamir et al., 2003; Scheier
et al., 1999; Vroomen & de Gelder, 2004), by which the perceived timing of the auditory
speech stream would be actively shifted (i.e., “ventriloquized”) toward the corresponding
lip gestures, reducing differences in transmission and (p. 538) processing times of the different senses (therefore leading to a wider temporal integration window for these types of stimuli).
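One way to caricature this proposed shift is as a partial capture of the perceived audio timing by the correlated visual stream. The linear form and the capture_strength parameter below are illustrative assumptions, not part of the cited account:

```python
def perceived_asynchrony(physical_offset_ms: float,
                         capture_strength: float = 0.6) -> float:
    """Caricature of temporal ventriloquism: the perceived timing of the
    auditory stream is pulled toward the correlated lip gestures,
    shrinking the apparent audiovisual offset. capture_strength (0..1)
    is a made-up illustrative parameter, not an empirical estimate."""
    return physical_offset_ms * (1.0 - capture_strength)

# A physical 100-ms offset would feel like a 40-ms one under 60% capture.
print(perceived_asynchrony(100.0))  # prints 40.0
```

Under this toy view, the same physical asynchrony is subjectively smaller for correlated audiovisual speech than for unrelated transients, which would widen the apparent window of integration.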

Spatial Constraints in Audiovisual Speech Perception

In addition to tolerance of temporal asynchronies, several studies have shown that exten­
sive spatial discrepancy between auditory and visual stimuli does not influence the
strength of the McGurk effect (Bertelson et al., 1994; Colin et al., 2001; Fisher &
Pylyshyn, 1994; Jones & Jarick, 2006; Jones & Munhall, 1997), unless auditory spatial attention is manipulated (Tiippana et al., 2011). Jones and Munhall (1997) measured the
magnitude of the illusion for up to 90-degree sound angles and found that the proportion
of auditory-based responses was independent of loudspeaker location. Similar findings
were reported by Jones and Jarrick (2006). They measured the illusion both under tempo­
ral (from–360 to 360 ms) and spatial (five different locations) discrepancies. They found
no indication of an additive relationship between the effects of spatial and temporal sepa­
rations. In other words, the McGurk illusion was not reduced by the combination of spa­
tial and temporal disparities.

Besides measuring the magnitude of the McGurk illusion, a few studies also instructed
participants to make perceptual judgments of the relative location of the auditory stimulus
(i.e., point to the apparent origin of the speech sounds; Bertelson et al., 1994; Driver,
1996; Jack & Thurlow, 1973). These studies showed that observers’ judgments of speech
sound locations are biased toward the visual source; the so-called ventriloquist illusion
(Connor, 2000; see Bertelson & de Gelder, 2004, for studies examining the effect in
nonassociative, simple stimuli). This illusion, a classic illustration of multisensory bias in
the spatial domain, underlies our perception of a voice emanating from actors appearing
on the screen when, in reality, the soundtrack is physically located elsewhere. Just as with
the temporal ventriloquist described earlier, the spatial ventriloquist effect is thought to
occur as a result of our perceptual system assuming that the co-occurring auditory infor­
mation and visual information have a single spatial origin. An interesting characteristic of
the ventriloquist illusion is that vision tends to dominate audition in the computation of
the location of the emergent percept. This makes functional sense, considering that, in
our daily life, we usually assign the origin of sounds, which may be difficult to localize, es­
pecially in noisy or reverberant conditions, to events perceived visually, which can be lo­
calized more accurately (Kubovy, 1988; Kubovy & Van Valkenburg, 1995).
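This visual dominance in localization is often formalized, outside the specific studies cited here, as reliability-weighted (maximum-likelihood) cue combination, in which each modality's location estimate is weighted by its precision. The following is a generic sketch of that idea, with purely illustrative values, not a model taken from the studies above:

```python
def fuse_location(mu_visual, var_visual, mu_auditory, var_auditory):
    """Reliability-weighted (maximum-likelihood) fusion of a visual and an
    auditory location estimate. Each cue is weighted by its inverse variance,
    so the more precise cue dominates the fused percept."""
    w_visual = (1 / var_visual) / (1 / var_visual + 1 / var_auditory)
    fused_mean = w_visual * mu_visual + (1 - w_visual) * mu_auditory
    fused_var = 1 / (1 / var_visual + 1 / var_auditory)
    return fused_mean, fused_var

# Vision is usually the more precise localizer, so the fused location is
# pulled toward the visual source -- the ventriloquist bias.
mu, var = fuse_location(mu_visual=0.0, var_visual=1.0,
                        mu_auditory=10.0, var_auditory=4.0)
```

With these illustrative numbers the fused estimate lies much closer to the visual source at 0 than to the auditory source at 10, and its variance is smaller than that of either cue alone, capturing why assigning sound locations to visible events makes functional sense.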

Bertelson et al. (1994) measured both the ventriloquist and the McGurk illusions for the
very same audiovisual materials in one experiment. On each trial, participants were pre­
sented with an ambiguous fragment of auditory speech, delivered from one of seven hid­
den loudspeakers, together with an upright or inverted face shown on a centrally located
screen. They were instructed to do a localization task (i.e., point to the apparent origin of
the speech sounds) and an identification task (i.e., report what had been said). Whereas
spatial separations did not reduce the effectiveness of the audiovisual stimuli in produc­
ing the McGurk effect, the ventriloquist illusion decreased as the loudspeaker location
moved away from the face (see Colin et al., 2001; Fisher & Pylyshyn, 1994; Jones &
Munhall, 1997, for similar results). The inverted presentation of the face, in contrast, had
no effect on the overall magnitude of the ventriloquism illusion, but it did significantly re­
duce the integration of auditory and visual speech (see Jordan & Bevan, 1997). This re­
versed pattern suggests that the two phenomena can be dissociated and, perhaps, involve
different components of the cognitive architecture. Note, moreover, that such dissociation
is also in disagreement with the unity assumption hypothesis described above because it
suggests that in some trials, both the unified percept (i.e., the McGurk percept) and low-level
sensory discrepancies (i.e., the spatial origin) can be perceived by participants.

The findings that the McGurk illusion is impervious to spatial discrepancies suggest that
the spatial rule for multisensory integration (i.e., enhanced integration for closely located
sensory stimuli) does not apply in the specific case of audiovisual speech perception un­
less the task involves judgments regarding the spatial attributes of the event. Time-vary­
ing similarities in the patterning of information might prove, in this case, a more salient
feature for binding (Calvert et al., 1998; Jones & Jarrick, 2006; Jones & Munhall, 1997).

Extraction of Visual Cues in Audiovisual Speech

The study of the visual aspects of speech perception has generally been addressed
from two perspectives. Whereas some researchers have explored the essential visual in­
put required for a successful processing of linguistic information (Munhall et al., 2004;
Preminger et al., 1998; Rosenblum & Saldaña, (p. 539) 1996; Thomas & Jordan, 2004),9
others have investigated how the observer actively selects this information by examining
the pattern of eye movements during speech (e.g., Lansing & McConkie, 1994; Vatikiotis-
Bateson et al., 1998).

Numerous studies attempting to isolate the critical aspects of the visual information have
been carried out by occluding (or freezing) different parts of facial regions and measur­
ing the impact of the nonmasked areas on speech reading. These studies have generally
shown that the oral region (i.e., talker’s mouth) offers a fairly direct source of information
about the segmental properties of speech (Summerfield, 1979). For instance, Thomas and
Jordan (2004) showed that the intelligibility of an oral-movements display was similar to
that of a whole-face movements display (though see IJsseldijk, 1992). However, linguisti­
cally relevant information can also be extracted from extraoral facial regions, when the
oral aperture is occluded (Preminger et al., 1998; Thomas & Jordan, 2004), probably ow­
ing to the strong correlation between oral and extraoral movements described above
(Munhall & Vatikiotis-Bateson, 1998). Furthermore, visual speech influences on the audi­
tory component have also been shown to remain substantially unchanged across horizon­
tal viewing angles (full face, three-quarter, profile; Jordan & Thomas, 2001), rotations in
the picture plane (Jordan & Bevan, 1997), or when removing the color of the talking face
(Jordan et al., 2000). Even when the facial surface kinematics are reduced to the motion
of a collection of light points, observers still show perceptual benefit in an acoustically
noisy environment and can perceive the McGurk effect (Rosenblum & Saldaña, 1996).
This result suggests that pictorial information such as skin texture does not convey criti­
cal information for audiovisual improvements (although note that performance never
reached the levels found in natural displays), and it highlights the importance of motion
cues in speech perception. However, is time-varying information of the seen articulators
sufficient for audiovisual benefits to be observed? In Rosenblum’s study, the patch-light
stimuli covered extensive regions of the face and the inside of the mouth (tongue, teeth)
and the angular deformations of the point-lights could have been used to reveal the local
surface configuration. It is possible, therefore, that such point-lights did not completely
preclude configural information of the face forms (i.e., spatial relations between fea­
tures), and that these cues were used together with the motion cues to support speech
identification. Indeed, other studies have found that configural information of the face is
critical for audiovisual benefits and integration to be observed. For instance, Campbell
(1996) demonstrated that the McGurk effect can be disrupted by inverting the brightness
of the face (i.e., photonegative images), a manipulation that severely degrades visual
forms of the face while preserving time-varying information. In the same line, in a speech-
in-noise task, Summerfield (1979) presented auditory speech stimuli together with four
point-lights tracking the motion of the lips (center of top and bottom lips and the corners
of the mouth) or with a Lissajous curve whose diameter was correlated with the amplitude
of the audio signal, and found no enhancement whatsoever. Similarly, auditory speech de­
tection in noise is not facilitated by presenting a fine-grained correlated visual object
(e.g., a dynamic rectangle whose horizontal extent was correlated with the speech enve­
lope; also see Bernstein et al., 2004; Ghazanfar et al., 2005). This suggests that both the
analysis of visual form and analysis of the dynamic characteristics of the seen articulators
are important factors for audiovisual integration. Further studies are required to deter­
mine the contribution of each of these sources to speech identification.

The image quality of the face has also been manipulated by using various techniques that
eliminate part of the spatial frequency spectrum (Figure 26.8). Studies using this type of
procedure suggest that fine facial detail is not critical for visual and audiovisual speech
recognition. For instance, Munhall et al. (2004) degraded images by applying different
bandpass and low-pass filters and revealed that the filtered visual information was suffi­
cient for attaining a higher speech intelligibility score than that of auditory-only signal
presentation. Results showed that subjects had the highest levels of speech intelligibility in
the midrange filter band with a center spectral frequency of 11 cycles per face, but that
the band with 5.5 cycles per face also significantly enhanced intelligibility. This suggests
that high spatial frequency information is not needed for speech perception. Along the same
lines, other studies have shown that speech perception is reduced, but remains effective, when
facial images are spatially degraded by quantization (e.g., Campbell & Massaro, 1997;
MacDonald et al., 2000), visual blur (Thomas & Jordan, 2002; Thorn & Thorn, 1989), or
increased stimulus distance (Jordan & Sergeant, 2000).
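The kind of low-pass spatial filtering used in these studies can be sketched as a Fourier-domain mask whose cutoff is expressed in cycles per face. This is a generic illustration (assuming a square grayscale image spanning one face width), not the exact filters used by Munhall et al.:

```python
import numpy as np

def lowpass_face(image, cycles_per_face):
    """Low-pass filter a square grayscale face image in the Fourier domain,
    keeping only spatial frequencies at or below the cutoff, expressed in
    cycles per face (the image is assumed to span one face width)."""
    n = image.shape[0]
    freqs = np.fft.fftfreq(n) * n          # frequencies in cycles per image
    fx, fy = np.meshgrid(freqs, freqs)
    radius = np.sqrt(fx ** 2 + fy ** 2)    # radial spatial frequency
    mask = radius <= cycles_per_face       # ideal circular low-pass mask
    return np.real(np.fft.ifft2(np.fft.fft2(image) * mask))
```

Cutoffs in the neighborhood of 5.5 or 11 cycles per face would mimic the coarse bands that Munhall et al. (2004) found sufficient to enhance intelligibility, although their study used bandpass as well as low-pass filters.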

Figure 26.8 Three versions of the same image of a talker are shown with different
amounts of spatial frequency filtering. The image on the left contains only very-low-
frequency content and would provide minimal visual speech information. The middle
image has a higher spatial frequency filter cutoff but is still quite blurry. Video images
with this degree of filtering have been found to enhance the perception of auditory
speech in noise as much as unfiltered video like the image on the right.

The results of these studies, moreover, are consistent with the observation that visual
speech can be successfully encoded when presented in peripheral vision, (p. 540) several
degrees away from fixation (Smeele et al., 1998). Indeed, Paré et al. (2003) showed that
manipulations of observers’ gaze did not influence audiovisual speech integration sub­
stantially until their gaze was directed at least 60 degrees eccentrically. Thus, the conclu­
sion from this finding is that high-acuity, foveal vision of the oral area is not necessary to
extract linguistically relevant visual information from the face.

Altogether, results demonstrating that visual speech perception can subsist when experi­
mental stimuli are restricted to low spatial frequency components of the images suggest
that visual speech information is processed at a coarse spatial level. In fact, many studies
examining the eye movements of perceivers naturally looking at talking faces (i.e., with no
specific instructions regarding what cues to attend to) have shown that observers consistently
look at the eye region more than the mouth (Klin et al., 2005). Nevertheless, if the task requires extracting fine
linguistic information (e.g., phonetic details in high background noise, word identification
or segmental cues), observers make significantly more fixations on the mouth region than
the eye region (Buchan et al., 2007, 2008; Lansing & McConkie, 2003; Vatikiotis-Bateson
et al., 1998).

Neural Correlates of Audiovisual Speech Integration

The brain contains many structures that receive projections from more than one sensory
system. In some, these projections remain functionally and anatomically segregated (such
as in the thalamus). However, in others there is a convergence of multisensory informa­
tion onto the same neurons. These regions, therefore, are possibly involved in audiovisual
integration operations. This latter type of area traditionally includes several structures in
the high-level associative or heteromodal cortices, such as the superior temporal sulcus
(STS), intraparietal sulcus (IPS), inferior frontal gyrus (IFG), insula, claustrum, and sub-
cortical structures like the superior colliculus (SC). Among these brain areas, the majori­
ty of functional imaging studies emphasize the caudal part of the STS as a key region in­
volved in audio-visual integration of speech because it exhibits increased activity for audi­
tory speech stimulation (Scott & Johnsrude, 2003), visual speech articulation (Bernstein
et al., 2002; Calvert et al., 1997; Campbell et al., 2001; MacSweeney et al., 2001), and
congruent audiovisual speech (Calvert et al., 2000). Furthermore, several studies have
demonstrated that this region shows enhancement to concordant audiovisual stimuli and
depression to mismatching speech (Calvert et al., 2000). Finally, the STS and the superior
temporal gyrus (STG) are involved in visual enhancement of speech intelligibility in the
presence of an acoustic masking noise, in accordance with the principle of inverse
effectiveness (i.e., multisensory enhancement is greatest when unimodal stimuli are least
effective; Callan et al., 2003;
Stevenson & James, 2009).
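The principle of inverse effectiveness is commonly quantified with a multisensory enhancement index, the proportional gain of the multisensory response over the best unisensory response. A generic sketch with illustrative accuracy values (not data from the cited studies):

```python
def multisensory_enhancement(av_response, best_unisensory_response):
    """Multisensory enhancement index: proportional gain of the audiovisual
    response over the best unisensory response."""
    return (av_response - best_unisensory_response) / best_unisensory_response

# Inverse effectiveness: the same absolute audiovisual benefit yields a
# larger proportional enhancement when the unisensory baseline is weak.
gain_clear = multisensory_enhancement(0.95, 0.90)  # clear speech
gain_noisy = multisensory_enhancement(0.55, 0.30)  # speech in masking noise
```

With these hypothetical values, the proportional gain in noise is roughly fifteen times the gain for clear speech, the pattern the principle describes.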

Whereas these brain regions have been repeatedly shown to be involved in audiovisual
speech processing, the relatively large diversity of experimental settings and analysis
strategies across studies makes it difficult to determine the specific role of each location
in the audiovisual integration processes. Nevertheless, it is now becoming apparent that
different distributed networks of neural structures may serve different functions in audio­
visual integration (i.e., time, space, content; Calvert et al., 2000). In a recent study, for ex­
ample, Miller and D’Esposito (2005; see also Stevenson et al., 2011) observed that differ­
ent brain areas respond preferentially to the detection of sensory correspondence and to
the perceptual fusion of speech events. That is, they found that while middle STS, middle
IPS, and IFG (p. 541) are associated with the perception of fused bimodal speech stimuli,
another network of brain areas (the SC, anterior insula, and anterior IPS) is differentially
involved in the detection of commonalities of seen and heard speech in terms of its tem­
poral signature (see also Jones & Callan, 2003; Kaiser et al., 2004; Macaluso et al., 2004).
It remains unknown, however, how these functionally distinct networks of neural groups
work in concert to match and integrate multimodal input during speech perception. Ac­
cording to Doesburg et al. (2008), functional coupling between the networks is achieved
through long-range phase synchronization (i.e., neuronal groups that enter into precise
phase-locking over a limited period of time), particularly the oscillations in the gamma
band (30 to 80 Hz; Engel & Singer, 2001; Fingelkurts et al., 2003; Senkowski et al., 2005;
Varela et al., 2001).
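Long-range phase synchronization of this kind is typically quantified with a phase-locking value (PLV) between band-limited signals. The sketch below, using SciPy's Hilbert transform, is a generic illustration (signal names and parameters are assumptions, not taken from the cited studies); it returns 1.0 for perfectly locked phases and values near 0 for unrelated signals:

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def phase_locking_value(x, y, fs, band=(30.0, 80.0)):
    """Phase-locking value between two signals within a frequency band
    (default: gamma, 30-80 Hz). Returns 1.0 when the instantaneous phase
    difference is constant over time, and values near 0 for unrelated
    signals."""
    nyq = fs / 2.0
    b, a = butter(4, [band[0] / nyq, band[1] / nyq], btype="band")
    phase_x = np.angle(hilbert(filtfilt(b, a, x)))  # instantaneous phase of x
    phase_y = np.angle(hilbert(filtfilt(b, a, y)))  # instantaneous phase of y
    return float(np.abs(np.mean(np.exp(1j * (phase_x - phase_y)))))
```

Two signals sharing a 40-Hz component with a fixed lag yield a PLV near 1 even though their phases differ, which is why the measure indexes coupling rather than identity.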

Furthermore, other remarkable findings demonstrate that viewing speech can modulate
the activity in the secondary (Wilson et al., 2004) and primary motor cortices, specifically
the mouth area in the left hemisphere (Ferrari et al., 2003; Watkins et al., 2003). That is,
the distributed network of brain regions associated with the perception of bimodal speech
information also includes areas that are typically involved in speech production, such as
Broca’s area, premotor cortex (Meister et al., 2007), and even primary motor cortex
(Calvert & Campbell, 2003; Campbell et al., 2001; Sato et al., 2009; Skipper et al., 2007).
These results appear in keeping with the long-standing proposal that audiovisual integra­
tion of speech is mediated by the speech motor system (motor theory of speech percep­
tion; Liberman et al., 1967). However, the role that these areas play in audiovisual speech
integration remains to be elucidated (see Galantucci et al., 2006; Hickok, 2009, for dis­
cussions).

Moreover, there have been a number of studies in humans and animals that have re­
vealed early multisensory interactions within areas traditionally considered purely visual
or purely auditory, such as MT/ V5 and Heschl’s gyrus, respectively (Beauchamp et al.,
2004; Callan et al., 2003, 2004; Calvert et al., 2000, 2001; Möttönen et al., 2004; Olson et
al., 2002), and even at earlier stages of processing (i.e., brainstem structures; Musacchia
et al., 2006).

Using functional magnetic resonance imaging (fMRI), Calvert et al. (1997), for example,
found that linguistic visual cues are sufficient to activate primary auditory cortex in nor­
mal-hearing individuals in the absence of auditory speech sounds (see also Pekkola et al.,
2005; though see Bernstein et al., 2002, for a nonreplication).

The first evidence that visual speech modulates activity in the unisensory auditory cortex,
however, was provided by Sams et al. (1991). Using magnetoencephalography (MEG)
recordings in an oddball paradigm, Sams et al. found that infrequent McGurk stimuli (in­
congruent audiovisually presented syllables) presented among frequent matching audiovi­
sual standards gave rise to magnetic mismatch fields (i.e., mismatch negativity [MMN]10;
Näätänen, 1982) at the level of the supratemporal region about 180 ms after stimulus on­
set. Other studies using mismatch paradigms have corroborated this result (Colin et al.,
2002; Kislyuk et al., 2008; Möttönen et al., 2002). Using high-density electrical mapping,
Saint-Amour et al. (2007) revealed a dominance of left hemispheric cortical generators
during the early and late phases of the McGurk MMN, consistent with the well-known left
hemispheric dominance for speech reading (Calvert & Lewis, 2004; Capek et al., 2004).
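Mismatch responses of this kind are derived by subtracting the average evoked response to the frequent standards from the response to the rare deviants. A generic sketch of that subtraction and of extracting the peak latency (the arrays are illustrative, not the recordings from these studies):

```python
import numpy as np

def mismatch_response(erp_deviant, erp_standard, times):
    """Difference wave (deviant minus standard) and the latency of its
    largest absolute deflection, as in oddball MMN analyses."""
    diff = erp_deviant - erp_standard
    peak_latency = times[np.argmax(np.abs(diff))]
    return diff, peak_latency
```

Applied to averaged responses, a deflection in the difference wave around 180 ms would correspond to the magnetic mismatch field latency reported by Sams et al. (1991).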

Subsequent MEG and electroencephalography (EEG) studies using the additive model (in
which the event-related potential (ERP) responses to audiovisual stimulation are com­
pared with the sum of auditory and visual evoked responses) have estimated that the ear­
liest interactions between the auditory and visual properties can occur approximately 100
ms after the visual stimulus onset (Besle et al., 2004; Giard & Peronnet, 1999; Klucharev
et al., 2003; van Wassenhove et al., 2005) or even before (50 ms; Lebib et al., 2003), sug­
gesting that audiovisual integration of speech occurs early in the cortical auditory pro­
cessing hierarchy. For instance, van Wassenhove et al. (2005) examined the timing of AV
integration for both congruent (/ka/, /pa/, and /ta/) and incongruent (McGurk effect)
speech syllables and found that the latency of the classic components of the auditory
evoked potential (i.e., N1/P2) speeded up when congruent visual information was present.
According to the authors, because the visual cues for articulation often precede the
acoustic cues in natural audiovisual speech (sometimes by more than 100 ms; Chan­
drasekaran et al., 2009), an early representation of the speech event can be extracted
from visual cues related to articulator preparation and used to constrain the processing
of the forthcoming auditory input (analysis by synthesis model; van Wassenhove et al.,
2005). Thus, importantly, this model proposes not only that cross-modal effects occur ear­
ly in time, and in areas that are generally regarded as lower (early) in the sensory hierar­
chy, but also that the earlier-arriving visual information constrains the activation of pho­
netic units in the auditory cortex. Note that such a (p. 542) mechanism would only be
functional if the auditory processing system allowed for some sort of feedback signal from
higher order areas.
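The additive-model comparison described above, in which the audiovisual response is tested against the sum of the unisensory responses, can be sketched on averaged ERP arrays (function names, data, and the threshold here are illustrative assumptions):

```python
import numpy as np

def additive_model_residual(erp_av, erp_a, erp_v):
    """Additive-model comparison: AV - (A + V). Nonzero values index
    audiovisual interactions beyond the linear sum of the unisensory
    evoked responses."""
    return erp_av - (erp_a + erp_v)

def earliest_interaction(residual, times, threshold):
    """First time point at which |AV - (A + V)| exceeds the threshold,
    or None if it never does."""
    idx = np.flatnonzero(np.abs(residual) > threshold)
    return times[idx[0]] if idx.size else None
```

Estimates such as the ~100-ms interaction latencies cited above correspond, in this scheme, to the first time point at which the residual reliably departs from zero.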

The idea of feedback modulating the processing in lower sensory areas by using predic­
tions generated by higher areas is not novel. Evidence from fMRI experiments has shown
modulation of activity in sensory-specific cortices during audiovisual binding (Calvert &
Campbell, 2003; Calvert et al., 1997, 2000; Campbell et al., 2001; Sekiyama et al., 2003;
Wright et al., 2003) and have proposed that unisensory signals of multisensory objects
are initially integrated in the STS and that interactions in the auditory cortex reflect feed­
back inputs from the STS (Calvert et al., 1999). However, although this explanation is ap­
pealing, it is unlikely that this is the only way in which audiovisual binding occurs. The
relative preservation of multisensory function after lesions to higher order multisensory
regions (see Ettlinger & Wilson, 1990, for a review) and studies demonstrating that inter­
actions in auditory cortex preceded activation in the STS region (Besle et al., 2008;
Bushara, 2003; Möttönen et al., 2004; Musacchia et al., 2006) challenge a pure feedback
interpretation. That is, current views on multisensory integration (Driver & Noesselt,
2008; Schroeder et al., 2008) suggest that there might be an additional direct cortical-
cortical input to auditory cortex from the visual cortex (Cappe & Barone, 2005; Falchier
et al., 2002; Rockland & Ojima, 2003) that might convey direct phase resetting by the mo­
tion-sensitive cortex, resulting in tuning of the auditory cortex to the upcoming sound (Ar­
nal et al., 2009; Schroeder et al., 2008). Thus, in the light of recent evidence, a more elab­
orate network involving both unimodal processing streams and multisensory areas needs
to be considered.

Cognitive Influences on Audiovisual Speech Integration

Currently, one of the most controversial issues in the study of audiovisual integration con­
cerns the question of whether audiovisual integration is an automatic process that occurs
early and independently of top-down factors such as attention, expectations, and task
sets, or whether it can be modulated by cognitive processes (Colin et al., 2002; Massaro, 1983;
McGurk & MacDonald, 1976; Soto-Faraco et al., 2004).

The considerable robustness of the McGurk illusion under conditions in which the observ­
er realizes that the visual and auditory streams do not emanate from the same source
makes it seem likely that higher cognitive functions have little access to the integration
process and that the integration cannot be avoided or interrupted at will (Colin et al.,
2002; Massaro, 1998; McGurk & MacDonald, 1976; Rosenblum & Saldaña, 1996; Soto-
Faraco et al., 2004). Nevertheless, the most straightforward demonstration for the auto­
matic nature of speech integration is provided by studies using indirect methodologies
and showing that audiovisual integration emerges even when its resulting percept (e.g.,
McGurk illusion) is detrimental to the task at hand (Driver, 1996; Soto-Faraco et al.,
2004). In Soto-Faraco’s study, participants were asked to make speeded classification
judgments about the first syllable of audiovisually presented bisyllabic pseudowords,
while attempting to ignore the second syllable. The paradigm, inspired by the classic Gar­
ner speeded classification task (Garner, 1974; see Pallier, 1994, for the syllabic version),
is based on the finding that reaction times required to classify the first syllable (target)
are slowed down when the second (irrelevant) syllable varies from trial to trial, in com­
parison to when it remains constant. Soto-Faraco et al. used audiovisual stimuli for which
the identity of the second syllable was sometimes manipulated by introducing McGurk
percepts. Critically, the authors found that the occurrence of the syllabic interference ef­
fect was determined by the illusory percept, rather than the actually presented acoustic
stimuli. Overall, these results suggest that the integration of auditory and visual speech
cues occurs before attentive selection of syllables because participants were unable to fo­
cus their attention on the auditory component alone and ignore the visual influence in the
irrelevant syllable.

In another study, Driver (1996) showed that selective listening to one of two spatially sep­
arated speech streams worsened when the lip movements corresponding to the distractor
speech stream were displayed near the target speech stream. The interpretation for this
interference is that the matching (distractor) auditory stream is ventriloquized toward the
location of the seen talker face, causing an illusory perception that the two auditory
streams emanate from the same source, and thus worsening the selection of the target
stream by spatial attention. The key finding is that participants could not avoid integrat­
ing the auditory and visual components of the speech stimulus, suggesting that multisen­
sory binding of speech cannot be suppressed and arises before spatial selective attention
is completed.

In line with the behavioral results discussed above, neurophysiological (ERP) studies us­
ing (p. 543) oddball paradigms have shown that an MMN can be evoked (Colin et al.,
2002) or eliminated (Kislyuk et al., 2008) by McGurk percepts. Because the MMN is con­
sidered to reflect a preattentive comparison process between the neural representation of
the incoming deviant trace and the neural trace of the standard, these results suggest
that conflicting signals from different modalities can be combined into a unified neural
representation during early sensory processing without any attentional modulation. Fur­
thermore, the recent discovery that audiovisual interactions can arise early in functional
terms (Callan et al., 2003; Calvert et al., 1997, 1999; Möttönen et al., 2004) seems to be
consistent with the argument that binding mechanisms need to be rapid and mandatory.
However, top-down modulation effects have been observed at very early stages in the
neural processing pathway both for the auditory (e.g., Schadow et al., 2009) and the visu­
al sensory modalities (Shulman et al., 1997). It cannot be ruled out, therefore, that top-
down cognitive processes modulate the audiovisual integration mechanisms in an analo­
gous fashion. In fact, attentional modulations of audiovisual interactions at hierarchically
early stages of processing have been described before (Calvert et al., 1997; Pekkola et al.,
2005).

Altogether, the studies reviewed above suggest that audiovisual speech integration is cer­
tainly resilient to cognitive factors. However, these findings should not be construed as
conclusive evidence for automatic integration of the auditory and visual speech compo­
nents in all situations. Recent studies have shown that increasing attentional load to a de­
manding, unrelated visual, auditory (Alsius et al., 2005; see also Tiippana et al., 2004), or
tactile task (Alsius et al., 2007) decreases the percentage of illusory McGurk responses.
This result conforms well to the perceptual load theory of attention11 (Lavie, 2005) and in­
dicates that spare attentional resources are in fact needed for successful integration.
Note that if audiovisual speech integration were completely independent of attention, it
would remain unaltered when attentional resources were fully consumed.

Furthermore, other studies have shown that selective attention to the visual stimuli is re­
quired to perceptually bind audiovisually correlated speech (Alsius & Soto-Faraco, 2011;
Fairhall & Macaluso, 2009) and McGurk-like combinations (Andersen et al., 2008).
Fairhall and Macaluso measured blood-oxygen-level-dependent (BOLD) responses using fMRI
while participants were covertly attending to one of two visual speech streams (i.e., talk­
ing faces on the left and right) that could be either congruent or incongruent with a cen­
tral auditory stream. The authors found that spatial attention to the corresponding
(matching) visual component of these audiovisual speech pairs was critical for the activa­
tion of cortical and subcortical brain regions thought to reflect neural correlates of audio­
visual integration. Using a similar display configuration (i.e., two laterally displaced faces
and a single central auditory stream) with McGurk-like stimuli, Andersen et al. (2008)
found that directing visual spatial attention toward one of the faces increased the influ­
ence of that particular face on auditory perception (i.e., the illusory percepts related to
that face). Finally, Alsius and Soto-Faraco (2011) used visual and auditory search para­
digms to explore how the human perceptual system processes spatially distributed
speech signals (i.e., speaking faces) in order to detect matching audio-visual events. They
found that search efficiency among faces for the match with a voice declined with the
number of faces being monitored concurrently. This suggests that visual selective atten­
tion is required to perceptually bind audiovisually correlated objects. In keeping with
Driver’s results described above, however, they found that search among auditory speech
streams for the match with a face was independent of the number of streams being moni­
tored concurrently (though see Tiippana, 2011). It seems, therefore, that whereas the
perceptual system has to have full access to the spatial location of the visual event for the
audiovisual correspondences to be detected, similar constraints do not apply to the audi­
tory event. That is, multisensory matching seems to occur before the deployment of audi­
tory spatial attention (though see Tiippana et al., 2011).

In addition to the above-mentioned results, other studies have shown that audiovisual in­
tegration can sometimes be influenced by higher cognitive properties such as observers’
expectations (Windman et al., 2003), talker familiarity (Walker et al., 1995), lexical status
(Barutchu et al., 2008), or instructions (Colin et al., 2005; Massaro, 1998; Summerfield &
McGrath, 1984). For instance, in one study by Summerfield and McGrath (1984; see also
Colin et al., 2005), half of the participants were informed about the artificial nature of the
McGurk stimuli and were instructed to report syllables according to the auditory input.
The rest of the participants were not aware of the dubbing procedure and were simply in­
structed to repeat what the (p. 544) speaker had said. The visual influence was weaker in
the first group of participants, suggesting that knowledge of how the illusion is created
can facilitate the segregation of the auditory signal from the visual information. Taken to­
gether, therefore, these results question an extreme view of automaticity and suggest in­
stead some degree of penetrability in the binding process. Along the same lines, other
studies have shown that identical McGurk-like stimuli can be responded to in qualitative­
ly different ways, depending on whether the auditory stimuli is perceived by the listener
as speech or as nonspeech (Tuomainen et al., 2005), on whether the visual speech ges­
tures are consciously processed or not (Munhall et al., 2009), or on the set of features of
the audiovisual stimuli participants have to respond to (Soto-Faraco & Alsius, 2007;
2009). Thus, the broad picture that emerges from all these studies is that, although au­
diovisual speech integration seems certainly robust to cognitive intervention, it can some­
times adapt to the specific demands imposed by the task at hand in a dynamic and mal­
leable manner.

Therefore, it seems that rather than absolute susceptibility or absolute immunity to top-
down modulation, the experimental conditions at hand will set the degree of automaticity
of the binding process. In situations of low perceptual and cognitive processing load,
where the visual speech stimuli may be particularly hard to ignore (see Lavie, 2005), au­
dio-visual integration appears to operate quite early and in a largely automatic manner
(i.e., without voluntary control). This is in line with a recent proposal by Talsma et al. (2010)
stating that audiovisual integration will occur automatically (i.e., in a bottom-up manner)
in low-perceptual-load settings, whereas in perceptually demanding conditions, top-down
attentional control will be required. In keeping with this idea, previous studies outside the speech domain have shown that only when attention is directed to both modalities simultaneously are auditory and visual stimuli integrated very early in the flow of sensory processing (about 50 ms; Senkowski et al., 2005; Talsma & Woldorff, 2005; Talsma et al., 2007) and do interactions within higher-order heteromodal areas arise (Degerman et al., 2007; Fort et al., 2002; Pekkola et al., 2005; van Atteveldt et al., 2007). However, top-down modulations may impose a limit on this automaticity under certain conditions. That is, having full access (at sensory and cognitive levels) to the visual stimuli (Alsius & Soto-Faraco, 2011) and processing both the auditory and visual signals as speech seem to be critical requisites for binding (Munhall et al., 2009; Summerfield, 1979; Tuomainen et al., 2005).

In summary, there seems to be a complex and dynamic interplay between audiovisual integration and cognitive processes. As mentioned earlier (see the section Neural Correlates of Audiovisual Speech Integration), the auditory and visual sensory systems interact at multiple levels of processing during speech perception (Hertrich et al., 2009; Klucharev et al., 2003). It is possible, therefore, that top-down modulatory signals exert an influence at some of these levels (see Calvert & Thesen, 2004, for a related argument), possibly depending on the perceptual information available (Fairhall & Macaluso, 2009), the resources required (Alsius et al., 2005; 2007), spatial attention (Alsius & Soto-Faraco, 2011; Tiippana et al., 2011), task parameters (Hugenschmidt et al., 2010), and the type of attribute encoded (Munhall et al., 2009; Soto-Faraco & Alsius, 2009; Tuomainen et al., 2005).

Page 31 of 56
Multimodal Speech Perception

Summary
Speech is a naturally occurring multisensory event, and we, as perceivers, are capable of using various components of this rich signal to communicate. The act of talking involves the physical generation of sound by the face and vocal tract, and the resulting acoustics and movements are inextricably linked. These physical signals can be accessed in part by the visual, auditory, and haptic perceptual systems, each of which preferentially extracts different information about the total speech event. Thus, the combination of multiple sensory channels provides a richer and more robust perceptual experience. The neural networks supporting this processing are diverse and extended; in part, the neural substrates involved depend on the task in which subjects are engaged. Much remains to be learned about how low-level automatic processes contribute to multisensory processing and about how and when higher cognitive strategies can modify this processing. Understanding behavioral data about task and attention is a prerequisite for a principled neuroscientific explanation of this phenomenon.

References
Alsius, A., Navarra, J., Campbell, R., & Soto-Faraco, S. (2005). Audiovisual integration of
speech falters under high attention demands. Current Biology, 15, 839–843.

Alsius, A., Navarra, J., & Soto-Faraco, S. (2007). Attention to touch reduces audiovisual speech integration. Experimental Brain Research, 183, 399–404.

Alsius, A., & Soto-Faraco, S. (2011). Searching for audiovisual correspondence in multiple speaker scenarios. Experimental Brain Research, 213 (2–3), 175–183.

Andersen, T. S., Tiippana, K., Laarni, J., Kojo, I., & Sams, M. (2008). The role of visual spa­
tial attention in audiovisual speech perception. Speech Communication, 51, 184–193.

Andersson, U., & Lidestam, B. (2005). Bottom-up driven speechreading in a speechreading expert: The case of AA (JK023). Ear and Hearing, 26, 214–224.

Arnal, L. H., Morillon, B., Kell, C. A., & Giraud, A. L. (2009). Dual neural routing of visual
facilitation in speech processing. Journal of Neuroscience, 29, 13445–13453.

Arnold, P., & Hill, F. (2001). Bisensory augmentation: A speechreading advantage when
speech is clearly audible and intact. British Journal of Psychology, 92 (2), 339–355.

Auer, E. T., Jr. (2002). The influence of the lexicon on speech read word recognition: Con­
trasting segmental and lexical distinctiveness. Psychonomic Bulletin and Review, 9, 341–
347.

Auer, E. T., Jr. (2009). Spoken word recognition by eye. Scandinavian Journal of Psycholo­
gy, 50, 419–425.

Auer, E. T., Jr., & Bernstein, L. E. (1997). Speechreading and the structure of the lexicon:
Computationally modeling the effects of reduced phonetic distinctiveness on lexical
uniqueness. Journal of the Acoustical Society of America, 102 (6), 3704–3710.

Auer, E. T., Jr., Bernstein, L. E., & Coulter, D. C. (1998). Temporal and spatio-temporal vi­
brotactile displays for voice fundamental frequency: An initial evaluation of a new vibro­
tactile speech perception aid with normal-hearing and hearing-impaired individuals. Jour­
nal of the Acoustical Society of America, 104, 2477–2489.

Barutchu, A., Crewther, S., Kiely, P., & Murphy, M. (2008). When /b/ill with /g/ill becomes /
d/ill: Evidence for a lexical effect in audiovisual speech perception. European Journal of
Cognitive Psychology, 20 (1), 1–11.

Beauchamp, M. S., Argall, B. D., Bodurka, J., Duyn, J. H., & Martin, A. (2004). Unraveling
multisensory integration: patchy organization within human STS multisensory cortex. Na­
ture Neuroscience, 7, 1190–1192.

Berger, K. W. (1972). Visemes and homophenous words. Teacher of the Deaf, 70, 396–399.

Bernstein, L. E., Auer, E. T., Moore, J. K., Ponton, C., Don, M., & Singh, M. (2002). Visual
speech perception without primary auditory cortex activation. NeuroReport, 13, 311–315.

Bernstein, L. E., Auer, E. T., & Takayanagi, S. (2004). Auditory speech detection in noise enhanced by lipreading. Speech Communication, 44, 5–18.

Bernstein, L. E., Demorest, M. E., & Tucker, P. E. (1998). What makes a good speechread­
er? First you have to find one. In R. Campbell, B. Dodd, & D. Burnham (Eds.), Hearing by
eye. II: The Psychology of speechreading and auditory–visual speech (pp. 211–228). East
Sussex, UK: Psychology Press.

Bernstein, L. E., Eberhart, S. P., & Demorest, M. E. (1989). Single-channel vibrotactile supplements to visual perception of intonation and stress. Journal of the Acoustical Society of America, 85, 397–405.

Bernstein, L. E., Iverson, P., & Auer, E. T., Jr. (1997). Elucidating the complex relation­
ships between phonetic perception and word recognition in audiovisual speech percep­
tion. In C. Benoît & R. Campbell (Eds.), Proceedings of the ESCA/ ESCOP workshop on
audio-visual speech processing (pp. 21–24). Rhodes, Greece.

Bertelson, P., & de Gelder, B. (2004). The psychology of multimodal perception. In C. Spence & J. Driver (Eds.), Crossmodal space and crossmodal attention (pp. 141–177). Oxford, UK: Oxford University Press.

Bertelson, P., Vroomen, J., Wiegeraad, G., & de Gelder, B. (1994). Exploring the relation
between McGurk interference and ventriloquism. In Proceedings of ICSLP 94, Acoustical
Society of Japan (Vol. 2, pp. 559–562). Yokohama, Japan.

Besle, J., Fischer, C., Bidet-Caulet, A., Lecaignard, F., Bertrand, O., & Giard, M.H. (2008).
Visual activation and audiovisual interactions in the auditory cortex during speech per­
ception: intracranial recordings in humans. Journal of Neuroscience, 28, 14301–14310.

Besle, J., Fort, A., Delpuech, C., & Giard, M.-H. (2004). Bimodal speech: Early suppressive
visual effects in the human auditory cortex. European Journal of Neuroscience, 20, 2225–
2234.

Boothroyd, A. (1970). Concept and control of fundamental voice frequency in the deaf: An
experiment using a visible pitch display. Paper presented at the International Congress of
Education of the Deaf, Stockholm, Sweden.

Brancazio, L. (2004). Lexical influences in audiovisual speech perception. Journal of Experimental Psychology: Human Perception and Performance, 30, 445–463.

Brancazio, L., & Miller, J. L. (2005). Use of visual information in speech perception: Evi­
dence for a visual rate effect both with and without a McGurk effect. Perception and Psy­
chophysics, 67, 759–769.

Brancazio, L., Miller, J. L., & Paré, M. A. (1999). Perceptual effects of place of articulation
on voicing for audiovisually discrepant stimuli. Journal of the Acoustical Society of Ameri­
ca, 106, 2270.

Brooks, P. L., & Frost, B. J. (1983). Evaluation of a tactile vocoder for word recognition.
Journal of the Acoustical Society of America, 74, 34–39.

Brooks, P. L., Frost, B. J., Mason, J. L., & Gibson, D. M. (1986). Continuing evaluation of
Queen’s University tactile vocoder I: Identification of open set words. Journal of Rehabili­
tation Research and Development, 23, 119–128.

Buchan, J. N., Paré, M., & Munhall, K. G. (2004). The influence of task on gaze during au­
diovisual speech perception. Journal of the Acoustical Society of America, 115, 2607.

Buchan, J. N., Paré, M., & Munhall, K. G. (2007). Spatial statistics of gaze fixations during
dynamic face processing. Social Neuroscience, 2 (1), 1–13.

Buchan, J. N., Paré, M., & Munhall, K. G. (2008). The effect of varying talker identity and
listening conditions on gaze behavior during audiovisual speech perception. Brain Re­
search, 1242, 162–171.

Burnham, D., Ciocca, V., Lauw, C., Lau, S., & Stokes, S. (2000). Perception of visual infor­
mation for Cantonese tones. In M. Barlow & P. Rose (Eds.), Proceedings of the Eighth
Australian International Conference on Speech Science and Technology (pp. 86–91). Aus­
tralian Speech Science and Technology Association, Canberra.

Bushara, K. O., Hanakawa, T., Immisch, I., Toma, K., Kansaku, K., & Hallet, M. (2003).
Neural correlates of cross-modal binding. Nature Neuroscience, 6 (2), 190–195.

Callan, D., Jones, J. A., Munhall, K. G., Kroos, C., Callan, A., & Vatikiotis-Bateson, E.
(2003). Neural processes underlying perceptual enhancement by visual speech gestures.
NeuroReport, 14, 2213–2218.

Callan, D., Jones, J. A., Munhall, K. G., Kroos, C., Callan, A., & Vatikiotis-Bateson, E.
(2004). Multisensory integration sites identified by perception of spatial wavelet filtered
visual speech gesture information. Journal of Cognitive Neuroscience, 16, 805–816.

Calvert, G. A., Brammer, M., Bullmore, E., Campbell, R., Iversen, S. D., & David, A.
(1999). Response amplification in sensory-specific cortices during crossmodal binding.
Neuroreport, 10, 2619–2623.

Calvert, G. A., Brammer, M. J., & Iversen, S. D. (1998). Crossmodal identification. Trends
in Cognitive Sciences, 2 (7), 247–253.

Calvert, G. A., Bullmore, E. T., Brammer, M. J., Campbell, R., Williams, S. C., McGuire, P.
K., Woodruff, P. W., Iversen, S. D., & David, A. S. (1997). Activation of auditory cortex dur­
ing silent lipreading. Science, 276, 593–596.

Calvert, G. A., & Campbell, R. (2003). Reading speech from still and moving faces: The
neural substrates of visible speech. Journal of Cognitive Neuroscience, 15, 57–70.

Calvert, G.A., Campbell, R., & Brammer, M. J. (2000). Evidence from functional magnetic
resonance imaging of crossmodal binding in the human heteromodal cortex. Current Biol­
ogy, 10, 649–657.

Calvert, G. A., Hansen, P. C., Iversen, S. D., & Brammer, M. J. (2001). Detection of audiovi­
sual integration sites in humans by application of electro-physiological criteria to the
BOLD effect. NeuroImage, 14, 427–438.

Calvert, G. A., & Lewis, J. W. (2004). Hemodynamic studies of audiovisual interactions. In G. Calvert, C. Spence, & B. Stein (Eds.), Handbook of multisensory processes (pp. 483–502). Cambridge, MA: MIT Press.

Calvert, G. A., & Thesen, T. (2004). Multisensory integration: Methodological approaches and emerging principles in the human brain. Journal of Physiology, Paris, 98, 191–205.

Campbell, C. S., & Massaro, D. W. (1997). Visible speech perception: Influence of spatial
quantization. Perception, 26, 627–644.

Campbell, R. (1992). The neuropsychology of lipreading. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 335, 39–45.

Campbell, R. (1996). Seeing brains reading speech: A review and speculations. In D. G. Stork & M. E. Hennecke (Eds.), Speechreading by humans and machines: Models, systems, and applications (pp. 115–134). Berlin: Springer.

Campbell, R., MacSweeney, M., Surguladze, S., Calvert, G. A., Brammer, M. J., David, A. S., & Williams, S. C. R. (2001). Cortical substrates for the perception of face actions: An fMRI study of the specificity of activation for seen speech and for meaningless lower-face acts (gurning). Cognitive Brain Research, 12, 233–243.

Campbell, R., Zihl, J., Massaro, D., Munhall, K., & Cohen, M. (1997). Speechreading in the
akinetopsic patient, LM. Brain, 120, 1793–1803.

Capek, C. M., Bavelier, D., Corina, D., Newman, A. J., Jezzard, P., & Neville, H. J. (2004).
The cortical organization of audio-visual sentence comprehension: An fMRI study at 4
Tesla. Cognitive Brain Research, 20, 111–119.

Cappe, C., & Barone, P. (2005). Heteromodal connections supporting multisensory inte­
gration at low levels of cortical processing in the monkey. European Journal of Neuro­
science, 22 (11), 2886–2902.

Carney, A. E. (1988). Vibrotactile perception of segmental features of speech: A comparison of single-channel and multichannel instruments. Journal of Speech and Hearing Research, 31, 438–448.

Carney, A. E., & Beachler, C. R. (1986). Vibrotactile perception of suprasegmental features of speech: A comparison of single-channel and multi-channel instruments. Journal of the Acoustical Society of America, 79, 131–140.

Carney, A. E., Clement, B. R., & Cienkowski, K. M. (1999). Talker variability effects in auditory-visual speech perception. Journal of the Acoustical Society of America, 106, 2270(A).

Chandrasekaran, C., & Ghazanfar, A. A. (2009). Different neural frequency bands inte­
grate faces and voices differently in the superior temporal sulcus. Journal of Neurophysi­
ology, 101, 773–788.

Chandrasekaran, C., Trubanova, A., Stillittano, S., Caplier, A., & Ghazanfar, A. A. (2009).
The natural statistics of audiovisual speech. PLoS Computational Biology, 5, e1000436.

Chen, T. H., & Massaro, D. W. (2008). Seeing pitch: Visual information for lexical tones of Mandarin Chinese. Journal of the Acoustical Society of America, 123 (4), 2356–2366.

Colin, C., Radeau, M., & Deltenre, P. (2005). Top-down and bottom-up modulation of au­
diovisual integration in speech. European Journal of Cognitive Psychology, 17 (4), 541–
560.

Colin, C., Radeau, M., Deltenre, P., & Morais, J. (2001). Rules of intersensory integration
in spatial scene analysis and speech reading. Psychologica Belgica, 41, 131–144.

Colin, C., Radeau, M., Soquet, A., Demolin, D., Colin, F., & Deltenre, P. (2002). Mismatch
negativity evoked by the McGurk–MacDonald effect: A phonetic representation within
short-term memory. Clinical Neurophysiology, 113, 495–506.

Connor, S. (2000). Dumbstruck: A cultural history of ventriloquism. Oxford, UK: Oxford University Press.

Conrey, B., & Pisoni, D. B. (2006). Auditory-visual speech perception and synchrony de­
tection for speech and nonspeech signals. Journal of the Acoustical Society of America,
119, 4065–4073.

Cotton, J. C. (1935). Normal “visual hearing.” Science, 82, 592–593.

Cutler, A., & Butterfield, S. (1992). Rhythmic cues to speech segmentation: Evidence from
juncture misperception. Journal of Memory and Language, 31, 218–236.

Cvejic, E., Kim, J., & Davis, C. (2010). Prosody off the top of the head: Prosodic contrasts
can be discriminated by head motion. Speech Communication, 52 (6), 555–564.

Davis, C., & Kim, J. (2006). Audio-visual speech perception off the top of the head. Cognition, 100, B21–B31.

Degerman, A., Rinne, T., Pekkola, J., Autti, T., Jääskeläinen, I. P., Sams, M., & Alho, K.
(2007). Human brain activity associated with audiovisual perception and attention. Neu­
roImage, 34, 1683–1691.

Dixon, N. F., & Spitz, L. (1980). The detection of auditory visual desynchrony. Perception,
9, 719–721.

Doesburg, S. M., Emberson, L. L., Rahi, A., Cameron, D., & Ward, L. M. (2008). Asynchrony from synchrony: Long-range gamma-band neural synchrony accompanies perception of audiovisual speech asynchrony. Experimental Brain Research, 185, 1.

Driver, J. (1996). Enhancement of selective listening by illusory mislocation of speech sounds due to lip-reading. Nature, 381, 66–68.

Driver, J., & Noesselt, T. (2008). Multisensory interplay reveals crossmodal influences on
“sensory-specific” brain regions, neural responses, and judgments. Neuron, 57 (1), 11–23.

Dupoux, E. (1993). The time course of prelexical processing: The Syllabic Hypothesis revisited. In G. T. M. Altmann & R. Shillcock (Eds.), Cognitive models of speech processing: The second Sperlonga meeting (pp. 81–114). Hillsdale, NJ: Erlbaum.

Engel, A. K., & Singer, W. (2001). Temporal binding and the neural correlates of sensory
awareness. Trends in Cognitive Sciences, 5, 16–25.

Erber, N. P., & Cramer, K. D. (1974). Vibrotactile recognition of sentences. American An­
nals of the Deaf, 119, 716–720.

Ernst, M. O., & Banks, M. S. (2002). Humans integrate visual and haptic information in a
statistically optimal fashion. Nature, 415, 429–433.

Eskelund, K., Tuomainen, J., & Andersen, T. S. (2011). Multistage audiovisual integration
of speech: Dissociating identification and detection. Experimental Brain Research, 208,
447–457.

Ettlinger, G., & Wilson, W. A. (1990). Cross-modal performance: Behavioural processes, phylogenetic considerations and neural mechanisms. Behavioural Brain Research, 40, 169–192.

Fairhall, S., & Macaluso, E. (2009). Spatial attention can modulate audiovisual integration
at multiple cortical and subcortical sites. European Journal of Neuroscience, 29, 1247–
1257.

Falchier, A., Clavagnier, S., Barone, P., & Kennedy, H. (2002). Anatomical evidence of mul­
timodal integration in primate striate cortex. Journal of Neuroscience, 22, 5749–5759.

Ferrari, P. F., Gallese, V., Rizzolatti, G., & Fogassi, L. (2003). Mirror neurons responding to
the observation of ingestive and communicative mouth actions in the monkey ventral pre­
motor cortex. European Journal of Neuroscience, 17, 1703–1714.

Fingelkurts, A. A., Fingelkurts, A. A., Krause, C. M., Möttönen, R., & Sams, M. (2003).
Cortical operational synchrony during audio–visual speech integration. Brain Language,
85, 297–312.

Fisher, B. D., & Pylyshyn, Z. W. (1994). The cognitive architecture of bimodal event per­
ception: A commentary and addendum to Radeau. Current Psychology of Cognition, 13
(1), 92–96.

Fisher, C. G. (1968). Confusions among visually perceived consonants. Journal of Speech and Hearing Research, 11 (4), 796–804.

Fisher, C. G. (1969). The visibility of terminal pitch contour. Journal of Speech and Hear­
ing Research, 12, 379–382.

Fort, A., Delpuech, C., Pernier, J., & Giard, M. H. (2002). Early auditory-visual interactions in human cortex during nonredundant target identification. Cognitive Brain Research, 14, 20–30.

Fowler, C. A. (1986). An event approach to the study of speech perception from a direct-realist perspective. Journal of Phonetics, 14, 3–28.

Fowler, C. A. (2004). Speech as a supramodal or amodal phenomenon. In G. Calvert, C. Spence, & B. E. Stein (Eds.), The handbook of multisensory processes. Cambridge, MA: MIT Press.

Fowler, C., & Deckle, D. J. (1991). Listening with eye and hand: Cross-modal contributions
to speech perception. Journal of Experimental Psychology, Human Perception and Perfor­
mance, 17, 816–828.

Foxton, J. M., Weisz, N., Bauchet-Lecaignard, F., Delpuech, C., & Bertrand, O. (2009). The
neural bases underlying pitch processing difficulties. NeuroImage, 45, 1305–1313.

Fujisaki, W., Shimojo, S., Kashino, M., & Nishida, S. Y. (2004). Recalibration of audiovisual
simultaneity. Nature Neuroscience, 7 (7), 773–778.

Gagné, J. P., Tugby, K. G., & Michaud, J. (1991). Development of a Speechreading Test on the Utilization of Contextual Cues (STUCC): Preliminary findings with normal-hearing subjects. Journal of the Academy of Rehabilitative Audiology, 24, 157–170.

Galantucci, B., Fowler, C. A., & Turvey, M. T. (2006). The motor theory of speech perception reviewed. Psychonomic Bulletin and Review, 13, 361–377.

Garner, W. R. (1974). The processing of information and structure. Hillsdale, NJ: Erlbaum.

Gault, R. H. (1924). Progress in experiments on tactual interpretation of oral speech. Journal of Abnormal Psychology and Social Psychology, 14, 155–159.

Gault, R. H., & Crane, G. W. (1928). Tactual patterns from certain vowel qualities instrumentally communicated from a speaker to a subject’s fingers. Journal of General Psychology, 1, 353–359.

Gentilucci, M., & Cattaneo, L. (2005). Automatic audiovisual integration in speech per­
ception. Experimental Brain Research, 167, 66–75.

Ghazanfar, A. A., Maier, J. X., Hoffman, K. L., & Logothetis, N. K. (2005). Multisensory in­
tegration of dynamic faces and voices in rhesus monkey auditory cortex. Journal of Neuro­
science, 25, 5004–5012.

Giard, M. H., & Peronnet, F. (1999). Auditory-visual integration during multimodal object
recognition in humans: a behavioral and electrophysiological study. Journal of Cognitive
Neuroscience, 11, 473–490.

Gick, B., & Derrick, D. (2009). Aero-tactile integration in speech perception. Nature, 462,
502–504.

Gick, B., Jóhannsdóttir, K., Gibraiel, D., & Muehlbauer, J. (2008). Tactile enhancement of
auditory and visual speech perception in untrained perceivers. Journal of the Acoustical
Society of America, 123, 72–76.

Granström, B., House, D., & Lundeberg, M. (1999). Prosodic cues in multimodal speech
perception. In Proceedings of the International Congress of Phonetic Sciences (ICPhS99)
(pp. 655–658). San Francisco.

Grant, K. W. (2001). The effect of speechreading on masked detection thresholds for fil­
tered speech. Journal of the Acoustical Society of America, 109, 2272–2275.

Grant, K. W., Ardell, L. A., Kuhl, P. K., & Sparks, D. W. (1986). The transmission of prosod­
ic information via an electrotactile speech reading aid. Ear and Hearing, 7, 328–335.

Grant, K. W., & Greenberg, S. (2001). Speech intelligibility derived from asynchronous
processing of auditory-visual information. International Conference of Auditory-Visual
Speech Processing (pp. 132–137). Santa Cruz, CA.

Grant, K. W., & Seitz, P.-F. (2000). The use of visible speech cues for improving auditory
detection of spoken sentences. Journal of the Acoustical Society of America, 108, 1197–
1208.

Grant, K. W., van Wassenhove, V., & Poeppel, D. (2004). Detection of auditory (cross-spectral) and auditory-visual (cross-modal) synchrony. Speech Communication, 44, 43–53.

Green, K. P., & Gerdeman, A. (1995). Cross-modal discrepancies in coarticulation and the integration of speech information: The McGurk effect with mismatched vowels. Journal of Experimental Psychology: Human Perception and Performance, 21 (6), 1409–1426.

Green, K. P., Kuhl, P. K., Meltzoff, A. N., & Stevens, E. B. (1991). Integrating speech infor­
mation across talkers, gender, and sensory modality: Female faces and male voices in the
McGurk effect. Perception and Psychophysics, 50 (6), 524–536.

Green, K. P., & Miller, J. L. (1985). On the role of visual rate information in phonetic per­
ception. Perception & Psychophysics, 38, 269–276.

Hamilton, R. H., Shenton, J. T., & Coslett, H. B. (2006). An acquired deficit of audiovisual speech processing. Brain and Language, 98, 66–73.

Hanin, L., Boothroyd, A., & Hnath-Chisolm, T. (1988). Tactile presentation of voice fre­
quency as an aid to the speechreading of sentences. Ear and Hearing, 9, 335–341.

Hertrich, I., Mathiak, K., Lutzenberger, W., & Ackermann, H. (2009). Time course of early
audiovisual interactions during speech and nonspeech central auditory processing: A
magnetoencephalography study. Journal of Cognitive Neuroscience, 21, 259–274.

Hickok, G. (2009). Eight problems for the mirror neuron theory of action understanding
in monkeys and humans. Journal of Cognitive Neuroscience, 21, 1229–1243.

Hirsh, I. J., & Sherrick, C. E., Jr. (1961). Perceived order in different sense modalities.
Journal of Experimental Psychology, 62, 423–432.

Hugenschmidt, C. E., Peiffer, A. M., McCoy, T. P., Hayasaka, S., & Laurienti, P. J. (2010).
Preservation of crossmodal selective attention in healthy aging. Experimental Brain Re­
search, 198, 273–285.

IJsseldijk, F. J. (1992). Speechreading performance under different conditions of video image, repetition, and speech rate. Journal of Speech and Hearing Research, 35, 466–471.

Jack, C. E., & Thurlow, W. R. (1973). Effects of degree of visual association and angle of
displacement on the “ventriloquism” effect. Perceptual and Motor Skills, 37, 967–979.

Jackson, C. V. (1953). Visual factors in auditory localization. Quarterly Journal of Experimental Psychology, 5, 52–65.

Jackson, P. L. (1988). The theoretical minimal unit for visual speech perception: Visemes
and coarticulation. Volta Review, 90 (5), 99–114.

Jones, J., & Callan, D. (2003). Brain activity during audiovisual speech perception: An fM­
RI study of the McGurk effect. NeuroReport, 14, 1129–1133.

Jones, J. A., & Jarick, M. (2006). Multisensory integration of speech signals: The relation­
ship between space and time. Experimental Brain Research, 174, 588–594.

Jones, J. A., & Munhall, K. G. (1997). The effects of separating auditory and visual sources
on audiovisual integration of speech. Canadian Acoustics, 25, 13–19.

Joos, M. (1948). Acoustic phonetics. Baltimore: Linguistic Society of America.

Jordan, T. R., & Bevan, K. M. (1997). Seeing and hearing rotated faces: Influences of facial orientation on visual and audiovisual speech recognition. Journal of Experimental Psychology: Human Perception and Performance, 23, 388–403.

Jordan, T. R., & Sergeant, P. (2000). Effects of distance on visual and audiovisual speech
recognition. Language and Speech, 43, 107–124.

Jordan, T. R., McCotter, M. V., & Thomas, S. M. (2000). Visual and audiovisual speech per­
ception with color and gray scale facial images. Perception and Psychophysics, 62, 1394–
1404.

Jordan, T. R., & Thomas, S. (2001). Effects of horizontal viewing angle on visual and audiovisual speech recognition. Journal of Experimental Psychology: Human Perception and Performance, 27, 1386–1403.

Kaiser, J., Hertrich, I., Ackermann, H., Mathiak, K., & Lutzenberger, W. (2004). Hearing lips: Gamma-band activity during audiovisual speech perception. Cerebral Cortex, 15, 646–653.

Kim, J., & Davis, C. (2004). Investigating the audio-visual speech detection advantage.
Speech Communication, 44, 19–30.

Kim, J., Kroos, C., & Davis, C. (2010). Hearing a point-light talker: An auditory influence
on a visual motion detection task. Perception, 39 (3), 407–416.

King, A. J. (2005). Multisensory integration: Strategies for synchronization. Current Biology, 15 (9), 339–341.

Kislyuk, D. S., Möttönen, R., & Sams, M. (2008). Visual processing affects the neural basis of auditory discrimination. Journal of Cognitive Neuroscience, 20 (12), 2175–2184.

Klatt, D. H. (1986). The problem of variability in speech recognition and in models of speech perception. In J. Perkell & D. Klatt (Eds.), Invariance and variability in speech processes (pp. 300–319). Hillsdale, NJ: Erlbaum.

Klatt, D. (1989). Review of selected models of speech perception. In W. D. Marslen-Wilson (Ed.), Lexical representation and process (pp. 169–226). Cambridge, MA: MIT Press.

Klin, A., Jones, W., Schultz, R., Volkmar, F., & Cohen, D. (2005). Visual fixation patterns
during viewing of naturalistic social situations as predictors of social competence in indi­
viduals with autism. Archives of General Psychiatry, 59, 809–816.

Klucharev, V., Möttönen, R., & Sams, M. (2003). Electrophysiological indicators of phonet­
ic and non-phonetic multisensory interactions during audiovisual speech perception. Cog­
nitive Brain Research, 18 (1), 65–75.

Kubovy, M. (1988). Should we resist the seductiveness of the space:time::vision:audition analogy? Journal of Experimental Psychology: Human Perception and Performance, 14, 318–320.

Kubovy, M., & Van Valkenburg, J. (1995). Auditory and visual objects. Cognition, 80, 97–
126.

Lansing, C. R., & McConkie, G. W. (1994). A new method for speechreading research:
Tracking observer’s eye movements. Journal of the Academy of Rehabilitative Audiology,
27, 25–43.

Lansing, C. R., & McConkie, G. W. (1999). Attention to facial regions in segmental and
prosodic visual speech perception tasks. Journal of Speech, Language, and Hearing Re­
search, 42, 526–538.

Lansing, C. R., & McConkie, G. W. (2003). Word identification and eye fixation locations in visual and visual-plus-auditory presentations of spoken sentences. Perception and Psychophysics, 65 (4), 536–552.

Laurienti, P. J., Kraft, R. A., Maldjian, J. A., Burdette, J. H., & Wallace, M. T. (2004). Semantic congruence is a critical factor in multisensory behavioral performance. Experimental Brain Research, 158, 405–414.

Lavie, N. (2005). Distracted and confused? Selective attention under load. Trends in Cog­
nitive Sciences, 9 (2), 75–82.

Lebib, R., Papo, D., de Bode, S., & Baudonniere, P. M. (2003). Evidence of a visual-audito­
ry cross-modal sensory gating phenomenon as reflected by the human P50 event-related
potential modulation. Neuroscience Letters, 341, 185–188.

Levitt, H. (1988). Recurrent issues underlying the development of tactile sensory aids.
Ear and Hearing, 9 (6), 301–305.

Levitt, H. (1995). Processing of speech signals for physical and sensory disabilities. Proceedings of the National Academy of Sciences USA, 92 (22), 9999–10006.

Lewkowicz, D. J. (2000). The development of intersensory temporal perception: An epigenetic systems/limitations view. Psychological Bulletin, 126 (2), 281–308.

Liberman, A. M., Cooper, F. S., Shankweiler, D. P., & Studdert-Kennedy, M. (1967). Percep­
tion of the speech code. Psychological Review, 74 (6), 431–461.

Liberman, A. M., Delattre, P., Cooper, F. S., & Gerstman, L. (1954). The role of consonant–
vowel transitions in the perception of the stop and nasal consonants. Psychological Mono­
graphs: General and Applied, 68, 1–13.

Liberman, A. M., & Mattingly, I. G. (1985). The motor theory of speech perception re­
vised. Cognition, 21, 1–36.

Lidestam, B., & Beskow, J. (2006). Visual phonemic ambiguity and speechreading. Journal
of Speech, Language, and Hearing Research, 49 (4), 835–847.

Lisker, L. (1986). “Voicing” in English: A catalog of acoustic features signaling /b/ versus /
p/ in trochees. Language and Speech, 29, 3–11.

Macaluso, E., George, N., Dolan, R., Spence, C., & Driver, J. (2004). Spatial and temporal
factors during processing of audiovisual speech: A PET study. NeuroImage, 21, 725–732.

MacDonald, J., Andersen, S., & Bachmann, T. (2000). Hearing by eye: How much spatial
degradation can be tolerated? Perception, 29, 1155–1168.

MacDonald, J., & McGurk, H. (1978). Visual influences on speech perception processes.
Perception & Psychophysics, 24, 253–257.

MacLeod, A., & Summerfield, Q. (1987). Quantifying the contribution of vision to speech
perception in noise. British Journal of Audiology, 21, 131–141.

MacSweeney, M., Campbell, R., Calvert, G. A., McGuire, P. K., David, A. S., & Suckling, J.
(2001). Dispersed activation in the left temporal cortex for speechreading in congenitally
deaf speechreaders. Proceedings of the Royal Society of London B, 268, 451–457.

Massaro, D. W. (1998). Perceiving talking faces: From speech perception to a behavioral principle. Cambridge, MA: MIT Press.

Massaro, D. (2009). Caveat emptor: The meaning of perception and integration in speech perception. Available from Nature Precedings, http://hdl.handle.net/10101/npre.2009.4016.1.

Massaro, D. W., & Beskow, J. (2002). Multimodal speech perception: A paradigm for
speech science. In B. Granström, D. House, & I. Karlsson (Eds.), Multimodality in lan­
guage and speech systems (pp. 45–71). Dordrecht: Kluwer Academic Publishers.

Massaro, D. W., & Cohen, M. M. (1983). Phonological context in speech perception. Per­
ception and Psychophysics, 34, 338–348.

Massaro, D. W., & Cohen, M. M. (1993). Perceiving asynchronous speech in consonant-vowel and vowel syllables. Speech Communication, 13, 127–134.

Massaro, D. W., & Cohen, M. M. (1996). Perceiving speech from inverted faces. Perception and Psychophysics, 58 (7), 1047–1065.

Massaro, D. W., Cohen, M. M., & Smeele, P. M. T. (1996). Perception of asynchronous and
conflicting visible and auditory speech. Journal of the Acoustical Society of America, 100,
1777–1786.

Massaro, D. W., & Light, J. (2004). Using visible speech for training perception and pro­
duction of speech for hard of hearing individuals. Journal of Speech, Language, and Hear­
ing Research, 47 (2), 304–320.

Mattys, S. L., Bernstein, L. E., & Auer, E. T., Jr. (2002). Stimulus based lexical distinctive­
ness as a general word-recognition mechanism. Perception and Psychophysics, 64, 667–
679.

Mayer, C., Abel, J., Barbosa, A., Black, A., & Vatikiotis-Bateson, E. (2011). The labial
viseme reconsidered: Evidence from production and perception. Journal of the Acoustic
Society of America, 129, 2456–2456.

McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech perception. Cogni­
tive Psychology, 18, 1–86.

McGrath, M., & Summerfield, Q. (1985). Intermodal timing relations and audio-visual
speech recognition by normal-hearing adults. Journal of the Acoustic Society of America,
77, 678–685.

McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746–748.

Meister, I. G., Wilson, S. M., Deblieck, C., Wu, A. D., & Iacoboni, M. (2007). The essential
role of premotor cortex in speech perception. Current Biology, 17 (19), 1692–1696.

Miller, L. M., & D’Esposito, M. (2005). Perceptual fusion and stimulus coincidence in the
cross-modal integration of speech. Journal of Neuroscience, 25, 5884–5893.

Miller, J. L., & Eimas, P. (1995). Speech perception: From signal to word. Annual Review
of Psychology, 46, 467–492.

Miller, G. A., & Nicely, P. E. (1955). An analysis of perceptual confusions among some English consonants. Journal of the Acoustical Society of America, 27, 338–352.

Miyamoto, R. T., Myres, W. A., Wagner, M., & Punch, J. I. (1987). Vibrotactile devices as
sensory aids for the deaf. Journal of American Academy of Otolaryngology-Head and Neck
Surgery, 97, 57–63.

Morein-Zamir, S., Soto-Faraco, S., & Kingstone, A. (2003). The capture of vision by audi­
tion: Deconstructing temporal ventriloquism. Cognitive Brain Research, 17, 154–163.

Möttönen, R., Krause, C. M., Tiippana, K., & Sams, M. (2002). Processing of changes in vi­
sual speech in the human auditory cortex. Cognitive Brain Research, 13, 417–425.

Möttönen, R., Schurmann, M., & Sams, M. (2004). Time course of multisensory interac­
tions during audiovisual speech perception in humans: A magnetoencephalographic
study. Neuroscience Letters, 363, 112–115.

Munhall, K. G., Gribble, P., Sacco, L., & Ward, M. (1996). Temporal constraints on the
McGurk effect. Perception and Psychophysics, 58, 351–362.

Munhall, K. G., Jones, J. A., Callan, D., Kuratate, T., & Vatikiotis-Bateson, E. (2003). Visual prosody and speech intelligibility: Head movement improves auditory speech perception. Psychological Science, 15 (2), 133–137.

Munhall, K. G., Kroos, C., Jozan, G., & Vatikiotis-Bateson, E. (2004). Spatial frequency re­
quirements for audiovisual speech perception. Perception and Psychophysics, 66, 574–
583.

Munhall, K. G., Servos, P., Santi, A., & Goodale, M. (2002). Dynamic visual speech percep­
tion in a patient with visual form agnosia. NeuroReport, 13 (14), 1793–1796.

Munhall, K. G., ten Hove, M., Brammer, M., & Paré, M. (2009). Audiovisual integration of
speech in a bistable illusion. Current Biology, 19 (9), 1–5.

Munhall, K. G., & Vatikiotis-Bateson, E. (1998). The moving face during speech communi­
cation. In R. Campbell, B. Dodd, & D. Burnham (Eds.), Hearing by eye: Pt. 2. The psychol­
ogy of speechreading and audiovisual speech (pp. 123–139). London: Taylor & Francis,
Psychology Press.

Munhall, K. G., & Vatikiotis-Bateson, E. (2004). Spatial and temporal constraints on audio­
visual speech perception. In G. A. Calvert, C. Spence, B. E. Stein (Eds.), The handbook of
multisensory processing (pp. 177–188). Cambridge, MA: MIT Press.

Musacchia, G., Sams, M., Nicol, T., & Kraus, N. (2006). Seeing speech affects acoustic in­
formation processing in the human brainstem. Experimental Brain Research, 168 (1-2), 1–
10.

Näätänen, R. (1982). Processing negativity: An evoked-potential reflection of selective attention. Psychological Bulletin, 92, 605–640.

Navarra, J., & Soto-Faraco, S. (2007). Hearing lips in a second language: Visual articulato­
ry information enables the perception of L2 sounds. Psychological Research, 71 (1), 4–12.

Nitchie, E. B. (1916). The use of homophenous words. Volta Review, 18, 85–83.

Norris, D., McQueen, J. M., & Cutler, A. (1995). Competition and segmentation in spoken-
word recognition. Journal of Experimental Psychology: Learning, Memory, and Cognition,
21, 1209–1228.

Norris, D., McQueen, J. M., & Cutler, A. (2000). Merging information in speech recogni­
tion: Feedback is never necessary. Behavioral and Brain Sciences, 23 (3), 299–324.

O’Hare, J. J. (1991). Perceptual integration. Journal of the Washington Academy of Sciences, 81, 44–59.

Ohala, J. (1975). Temporal regulation of speech. In G. Fant & M. A. A. Tatham (Eds.), Audi­
tory analysis and perception of speech (pp. 431–453). London: Academic Press.

Olson, I. R., Gatenby, J. C., & Gore, J. C. (2002). A comparison of bound and unbound au­
dio-visual information processing in the human cerebral cortex. Cognitive Brain Research,
14, 129–138.

Ouni, S., Cohen, M. M., Ishak, H., & Massaro, D. W. (2007). Visual contribution to speech perception: Measuring the intelligibility of animated talking heads. EURASIP Journal on Audio, Speech, and Music Processing, 2007 (Article ID 47891), 1–12.

Owens, E., & Blazek, B. (1985). Visemes observed by hearing-impaired and normal-hearing adult viewers. Journal of Speech and Hearing Research, 28, 381–393.

Pallier, C. (1994). Role de la syllabe dans la perception de la parole: Etudes attentionelles. Doctoral dissertation, Ecole des Hautes Etudes en Sciences Sociales, Paris.

Pandey, P. C., Kunov, H., & Abel, S. M. (1986). Disruptive effects of auditory signal delay
on speech perception with lipreading. Journal of Auditory Research, 26, 27–41.

Paré, M., Richler, R., ten Hove, M., & Munhall, K. G. (2003). Gaze behavior in audiovisual
speech perception: The influence of ocular fixations on the McGurk effect. Perception and
Psychophysics, 65, 553–567.

Payton, K. L., Uchanski, R. M., & Braida, L. D. (1994). Intelligibility of conversational and
clear speech in noise and reverberation for listeners with normal and impaired hearing.
Journal of the Acoustical Society of America, 95, 1581–1592.

Pekkola, J., Ojanen, V., Autti, T., Jaaskelainen, I. P., Mottonen, R., Tarkiainen, A., & Sams, M. (2005). Primary auditory cortex activation by visual speech: An fMRI study at 3 T. NeuroReport, 16, 125–128.

Picheny, M. A., Durlach, N. I., & Braida, L. D. (1985). Speaking clearly for the hard of
hearing I: Intelligibility differences between clear and conversational speech. Journal of
Speech and Hearing Research, 28, 96–103.

Pisoni, D. B., & Luce, P. A. (1987). Acoustic-phonetic representations in word recognition. Cognition, 25, 21–52.

Plant, G. (1989). A comparison of five commercially available tactile aids. Australian Jour­
nal of Audiology, 11, 11–19.

Pöppel, E., Schill, K., & von Steinbüchel, N. (1990). Sensory integration within temporally
neutral systems states: a hypothesis. Naturwissenschaften, 77, 89–91.

Potter, R. K., Kopp, G. A., & Green, H. C. (1947). Visible speech. New York: Van Nostrand.
(Dover Publications reprint 1966).

Preminger, J. E., Lin, H., Payen, M., & Levitt, H. (1998). Selective visual masking in
speech reading. Journal of Speech, Language, and Hearing Research, 41 (3), 564–575.

Radeau, M. (1994). Auditory-visual spatial interaction and modularity. Current Psychology of Cognition, 13 (1), 3–51.

Radicke, J. L. (2007). Audiovisual phonological fusion. Unpublished master’s thesis, Indiana University, Bloomington, IN.

Raphael, L. J. (2005). Acoustic cues to the perception of segmental phonemes. In D. B. Pisoni & R. E. Remez (Eds.), The handbook of speech perception (pp. 182–206). Oxford, UK: Blackwell.

Reed, C. M., Durlach, N. I., & Braida, L. D. (1982). Research on tactile communication of
speech: A review. American Speech, Language, and Hearing Association, 20.

Reed, C. M., Rabinowitz, W. M., Durlach, N. I., Braida, L. D., Conway-Fithian, S., &
Schultz, M. C. (1985). Research on the Tadoma method of speech communication. Journal
of the Acoustical Society of America, 77, 247–257.

Reisberg, D., McLean, J., & Goldfield, A. (1987). Easy to hear but hard to understand: A
lip-reading advantage with intact auditory stimuli. In B. Dodd & R. Campbell (Eds.), Hear­
ing by eye: The psychology of lip-reading. Hillsdale, NJ: Erlbaum.

Rockland, K. S., & Ojima, H. (2003). Multisensory convergence in calcarine visual areas
in macaque monkey. International Journal of Psychophysiology, 50, 19–26.

Rosenblum, L. D. (2008). Speech perception as a multimodal phenomenon. Current Directions in Psychological Science, 17 (6), 405–409.

Rosenblum, L. D., Johnson, J. A., & Saldaña, H. M. (1996). Visual kinematic information
for embellishing speech in noise. Journal of Speech and Hearing Research, 39 (6), 1159–
1170.

Rosenblum, L. D., & Saldaña, H. M. (1996). An audiovisual test of kinematic primitives for
visual speech perception. Journal of Experimental Psychology: Human Perception and
Performance, 22 (2), 318–331.

Rosenblum, L. D., & Saldaña, H. M. (1998). Time-varying information for visual speech
perception. In R. Campbell, B. Dodd, & D. Burnham (Eds.), Hearing by eye: Pt. 2. The psy­
chology of speechreading and audiovisual speech (pp. 61–81). Hillsdale, NJ: Erlbaum.

Rönnberg, J., Samuelsson, S., & Lyxell, B. (1998). Conceptual constraints in sentence-
based lipreading in the hearing-impaired. In R. Campbell, B. Dodd, & D. Burnham (Eds.),
Hearing by eye, II: Advances in the psychology of speechreading and auditory-visual
speech (pp. 143–153). East Sussex, UK: Psychology Press.

Ross, L., Saint-Amour, D., Leavitt, V., Javitt, D. C., & Foxe, J. J. (2006). Do you see what I’m saying? Optimal visual enhancement of speech comprehension in noisy environments. Cerebral Cortex, 17 (5), 1147–1153.

Rouger, J., Fraysse, B., Deguine, O., & Barone, P. (2008). McGurk effects in cochlear-im­
planted deaf subjects. Brain Research, 1188, 87–99.

Rouger, J., Lagleyre, S., Fraysse, B., Deneve, S., Deguine, O., & Barone, P. (2007). Evi­
dence that cochlear-implanted deaf patients are better multisensory integrators. Proceed­
ings of the National Academy of Sciences U S A, 104 (17), 7295–7300.

Saint-Amour, D., De Sanctis, S. P., Molholm, S., Ritter, W., & Foxe, J. J. (2007). Seeing voic­
es: High-density electrical mapping and source-analysis of the multisensory mismatch
negativity evoked during the McGurk illusion. Neuropsychologia, 45, 587–597.

Sams, M., Aulanko, R., Hamalainen, M., Hari, R., Lounasmaa, O. V., Lu, S. T., & Simola, J.
(1991). Seeing speech: Visual information from lip movements modifies activity in the hu­
man auditory cortex. Neuroscience Letters, 127, 141–145.

Samuelsson, S., & Rönnberg, J. (1993). Implicit and explicit use of scripted constraints in
lip-reading. European Journal of Cognitive Psychology, 5, 201–233.

Sato, M., Cavé, C., Ménard, L., & Brasseur, A. (2010). Auditory-tactile speech perception
in congenitally blind and sighted adults. Neuropsychologia, 48 (12), 3683–3686.

Sato, M., Tremblay, P., & Gracco, V. L. (2009). A mediating role of the premotor cortex in
phoneme segmentation. Brain and Language, 111 (1), 1–7.

Schadow, J., Lenz, D., Dettler, N., Fründ, I., & Herrmann, C. S. (2009). Early gamma-band
responses reflect anticipatory top-down modulation in the auditory cortex. NeuroImage,
47 (2), 651–658.

Scheier, C. R., Nijhawan, R., & Shimojo, S. (1999). Sound alters visual temporal resolu­
tion. Investigative Ophthalmology and Visual Science, 40, 4169.

Schroeder, C. E., Lakatos, P., Kajikawa, Y., Partan, S., & Puce, A. (2008). Neuronal oscilla­
tions and visual amplification of speech. Trends in Cognitive Sciences, 12, 106–113.

Schulte, K. (1972). Fonator system: Speech stimulator and speech feedback by technical­
ly amplified one-channel vibration. In G. Fant (Ed.), International Symposium on Speech
Communication Ability and Profound Deafness (pp. 351–353). Washington, DC: A.G. Bell
Association for the Deaf.

Schwartz, J. L., Berthommier, F., & Savariaux, C. (2002). Audiovisual scene analysis: Evi­
dence for a “very-early” integration process in audio-visual speech perception. Proceed­
ings of ICSLP, 1937–1940.

Scott, S. K., & Johnsrude, I. S. (2003). The neuroanatomical and functional organization of
speech perception. Trends in Neurosciences, 26 (2), 100–107.

Sekiyama, K., Kanno, I., Miura, S., & Sugita, Y. (2003). Auditory-visual speech perception
examined by fMRI and PET. Neuroscience Research, 47 (3), 277–287.

Senkowski, D., Talsma, D., Herrmann, C., & Woldorff, M. G. (2005). Multisensory process­
ing and oscillatory gamma responses: Effects of spatial selective attention. Experimental
Brain Research, 166, 411–426.

Shulman, G. L., Fiez, J. A., Corbetta, M., Buckner, R. L., Miezin, F. M., Raichle, M. E., &
Petersen, S. E. (1997). Common blood flow changes across visual tasks: II. Decreases in
cerebral cortex. Journal of Cognitive Neuroscience, 9, 648–663.

Skinner, M. W., Rinzer, S. M., Fredrickson, J. M., Smith, P. G., Holden, T. A., Holden, L. K., Juelich, M. E., & Turner, B. A. (1989). Comparison of benefit from vibrotactile aid and cochlear implant for post-linguistically deaf adults. Laryngoscope, 98, 1092–1099.

Skipper, J. I., van Wassenhove, V., Nusbaum, H. C., & Small, S. L. (2007). Hearing lips and
seeing voices: How cortical areas supporting speech production mediate audiovisual
speech perception. Cerebral Cortex, 17 (10), 2387–2399.

Small, D. M., Voss, J., Mak, Y. E., Simmons, K. B., Parrish, T., & Gitelman, D. (2004). Expe­
rience-dependent neural integration of taste and smell in the human brain. Journal of
Neurophysiology, 92, 1892–1903.

Smeele, P., Massaro, D., Cohen, M., & Sittig, A. (1998). Laterality in visual speech perception. Journal of Experimental Psychology: Human Perception and Performance, 24, 1232–1242.

Soto-Faraco, S., & Alsius, A. (2007). Conscious access to the unisensory components of a
crossmodal illusion. NeuroReport, 18, 347–350.

Soto-Faraco, S., & Alsius, A. (2009). Deconstructing the McGurk-MacDonald illusion. Jour­
nal of Experimental Psychology: Human Perception and Performance, 35 (2), 580–587.

Soto-Faraco, S., Navarra, J., & Alsius, A. (2004). Assessing automaticity in audiovisual
speech integration: evidence from the speeded classification task. Cognition, 92 (3), B13–
B23.

Soto-Faraco, S., Navarra, J., Weikum, W. M., Vouloumanos, A., Sebastián-Gallés, N., &
Werker, J. F. (2007) Discriminating languages by speech-reading. Perception and Psy­
chophysics, 69, 218–231.

Soto-Faraco, S., Sebastián-Gallés, N., & Cutler, A. (2001). Segmental and suprasegmental
mismatch in lexical access. Journal of Memory and Language, 45, 412–432.

Spence, C., & Squire, S. B. (2003). Multisensory integration: Maintaining the perception
of synchrony. Current Biology, 13, R519–R521.

Stein, B. E., & Meredith, M. A. (1993). The merging of the senses. Cambridge, MA: MIT
Press.

Stekelenburg, J. J., & Vroomen, J. (2007). Neural correlates of multisensory integration of ecologically valid audiovisual events. Journal of Cognitive Neuroscience, 19 (12), 1964–1973.

Stevenson, C. M., Brookes, M. J., & Morris, P. G. (2011). Beta-band correlates of the fMRI
BOLD response. Human Brain Mapping, 32, 182–197.

Stevenson, R. A., & James, T. W. (2009). Audiovisual integration in human superior tempo­
ral sulcus: Inverse effectiveness and the neural processing of speech and object recogni­
tion. NeuroImage, 44 (3), 1210–1223.

Studdert-Kennedy, M. (1989). Feature fitting: A comment on K. N. Stevens’ “On the quantal nature of speech.” Journal of Phonetics, 17, 135–144.

Sumby, W., & Pollack, I. (1954). Visual contribution to speech intelligibility in noise. Journal of the Acoustical Society of America, 26, 212–215.

Summerfield, Q. A. (1979). Use of visual information for phonetic perception. Phonetica, 36, 314–331.

Summerfield, Q. (1987). Some preliminaries to a comprehensive account of audio-visual speech perception. In B. Dodd & R. Campbell (Eds.), Hearing by eye: The psychology of lipreading (pp. 3–51). Hillsdale, NJ: Erlbaum.

Summerfield, Q. (1992). Lipreading and audio-visual speech perception. Philosophical Transactions of the Royal Society, Series B, Biological Sciences, 335, 71–78.

Summerfield, Q., & McGrath, M. (1984). Detection and resolution of audio-visual incom­
patibility in the perception of vowels. Quarterly Journal of Experimental Psychology, 36A,
51–74.

Summers, I. R., Cooper, P. G., Wright, P., Gratton, D. A., Milnes, P., & Brown, B. H. (1997).
Information from time-varying vibrotactile stimuli. Journal of the Acoustical Society of
America, 102, 3686–3696.

Talsma, D., Doty, T. J., & Woldorff, M. G. (2007). Selective attention and audiovisual inte­
gration: is attending to both modalities a prerequisite for early integration? Cerebral Cor­
tex, 17 (3), 679–690.

Talsma, D., Senkowski, D., Soto-Faraco, S., & Woldorff, M. G. (2010). The multifaceted in­
terplay between attention and multisensory integration. Trends in Cognitive Sciences, 14,
400–410.

Talsma, D., & Woldorff, M. G. (2005). Selective attention and multisensory integration:
Multiple phases of effects on the evoked brain activity. Journal of Cognitive Neuroscience,
17 (7), 1098–1114.

Thomas, S. M., & Jordan, T. R. (2002). Determining the influence of Gaussian blurring on
inversion effects with talking faces. Perception and Psychophysics, 64, 932–944.

Thomas, S. M., & Jordan, T. R. (2004). Contributions of oral and extra-oral facial motion to
visual and audiovisual speech perception. Journal of Experimental Psychology: Human
Perception and Performance, 30, 873–888.

Thompson, D. M. (1934). On the detection of emphasis in spoken sentences by means of visual, tactual, and visual-tactual cues. Journal of General Psychology, 11, 160–172.

Thorn, F., & Thorn, S. (1989). Speechreading with reduced vision: A problem of aging.
Journal of the Optical Society of America, 6, 491–499.

Tiippana, K., Andersen, T. S., & Sams, M. (2004). Visual attention modulates audiovisual
speech perception. European Journal of Cognitive Psychology, 16, 457–472.

Tiippana, K., Puharinen, H., Möttönen, R., & Sams, M. (2011). Sound location can influ­
ence audiovisual speech perception when spatial attention is manipulated. Seeing and
Perceiving, 24, 67–90.

Tremblay, C., Champoux, F., Voss, P., Bacon, B. A., & Lepore, F. (2007). Speech and non-
speech audio-visual illusions: A developmental study. PLoS ONE, 2 (8), e742.

Troyer, M., Loebach, J. L., & Pisoni, D. B. (2010). Perception of temporal asynchrony in
audiovisual phonological fusion. Research on Spoken Language Processing, Progress Re­
port, 29, 156–182.

Tuomainen, J., Andersen, T., Tiippana, K., & Sams, M. (2005). Audio-visual speech percep­
tion is special. Cognition, 96 (1), B13–B22.

van Atteveldt, N. M., Formisano, E., Goebel, R., & Blomert, L. (2007). Top-down task ef­
fects overrule automatic multisensory responses to letter–sound pairs in auditory associa­
tion cortex. NeuroImage, 36 (4), 1345–1360.

van Wassenhove, V., Grant, K. W., & Poeppel, D. (2005). Visual speech speeds up the neur­
al processing of auditory speech. Proceedings of the National Academy of Sciences U S A,
102, 1181–1186.

van Wassenhove, V., Grant, K. W., & Poeppel, D. (2007). Temporal window of integration
in bimodal speech. Neuropsychologia, 45 (3), 598–607.

Varela, F., Lachaux, J. P., Rodríguez, E., & Martinerie, J. (2001). The brainweb: Phase syn­
chronization and large-scale integration. Nature Reviews Neuroscience, 2, 229–239.

Vatakis, A., Ghazanfar, A. A., & Spence, C. (2008). Facilitation of multisensory integration
by the “unity effect” reveals that speech is special. Journal of Vision, 9 (14), 1–11.

Vatakis, A., Navarra, J., Soto-Faraco, S., & Spence, C. (2008). Audiovisual temporal adaptation of speech: Temporal order versus simultaneity judgments. Experimental Brain Research, 185, 521–529.

Vatakis, A., & Spence, C. (2006). Audiovisual synchrony perception for speech and music
assessed using a temporal order judgment task. Neuroscience Letters, 393, 40–44.

Vatakis, A., & Spence, C. (2007). Crossmodal binding: Evaluating the “unity assumption”
using audiovisual speech and nonspeech stimuli. Perception and Psychophysics, 69, 744–
756.

Vatakis, A., & Spence, C. (2008). Evaluating the influence of the “unity assumption” on
the temporal perception of realistic audiovisual stimuli. Acta Psychologica, 127 (1), 12–23.

Vatakis, A., & Spence, C. (2010). Audiovisual temporal integration for complex speech, object-action, animal call, and musical stimuli. In M. J. Naumer & J. Kaiser (Eds.), Multisensory object perception in the primate brain (pp. 95–121). New York: Springer.

Vatikiotis-Bateson, E., Eigsti, I.-M., Yano, S., & Munhall, K. G. (1998). Eye movement of
perceivers during audiovisual speech perception. Perception and Psychophysics, 60, 926–
940.

Vatikiotis-Bateson, E., Munhall, K. G., Kasahara, Y., Garcia, F., & Yehia, H. (1996). Charac­
terizing audiovisual information during speech. In Proceedings of the 4th International
Conference on Spoken Language Processing (ICSLP 96) (Vol. 3, pp. 1485–1488). New
York: IEEE Press.

Vitkovitch, M., & Barber, P. (1994). Effect of video frame rate on subjects’ ability to shad­
ow one of two competing verbal passages. Journal of Speech and Hearing Research, 37
(5), 1204–1211.

Vroomen, J., & de Gelder, B. (2004). Temporal ventriloquism: Sound modulates the flash-
lag effect. Journal of Experimental Psychology: Human Perception and Performance, 30,
513–518.

Vroomen, J., & Keetels, M. (2010). Perception of intersensory synchrony: A tutorial re­
view. Attention, Perception, & Psychophysics, 72, 871–884.

Vroomen, J., Keetels, M., de Gelder, B., & Bertelson, P. (2004). Recalibration of temporal
order perception by exposure to audiovisual asynchrony. Cognitive Brain Research, 22 (1),
32–35.

Vroomen, J., & Stekelenburg, J. J. (2011). Perception of intersensory synchrony in audiovisual speech: Not that special. Cognition, 118, 78–86.

Walker, S., Bruce, V., & O’Malley, C. (1995). Facial identity and facial speech processing:
Familiar faces and voices in the McGurk effect. Perception and Psychophysics, 57, 1124–
1133.

Watkins, K. E., Strafella, A. P., & Paus, T. (2003). Seeing and hearing speech excites the
motor system involved in speech production. Neuropsychologia, 41 (8), 989–994.

Weikum, W. M., Vouloumanos, A., Navarra, J., Soto-Faraco, S., Sebastián-Gallés, N., &
Werker, J. F. (2007). Visual language discrimination in infancy. Science, 316, 1159.

Weisenberger, J. M., Broadstone, S. P., & Kozma-Spytek, L. (1991). Relative performance of single-channel and multichannel tactile aids for speech perception. Journal of Rehabilitation Research and Development, 28, 45–56.

Weisenberger, J. M., Broadstone, S. M., & Saunders, F. A. (1989). Evaluation of two multichannel tactile aids for the hearing impaired. Journal of the Acoustical Society of America, 86, 1764–1775.

Weisenberger, J. M., & Kozma-Spytek, L. (1991). Evaluating tactile aids for speech per­
ception and production by hearing-impaired adults and children. American Journal of Otol­
ogy, 12 (Suppl), 188–200.

Weisenberger, J. M., & Percy, M. (1995). The transmission of phoneme-level information by multichannel tactile speech perception aids. Ear and Hearing, 16, 392–406.

Weisenberger, J. M., & Russel, A. F. (1989). Comparison of two single-channel vibrotactile aids for the hearing-impaired. Journal of Speech and Hearing Research, 32, 83–92.

Welch, R. B. (1999). Meaning, attention and the “unity assumption” in the intersensory
bias of spatial and temporal perceptions. In G. Aschersleben, T. Bachmann, & J. Müsseler

(Eds.), Cognitive contributions to the perception of spatial and temporal events (pp. 371–
388). Amsterdam: Elsevier.

Welch, R. B., & Warren, D. H. (1980). Immediate perceptual response to intersensory dis­
crepancy. Psychological Bulletin, 88 (3), 638–667.

Welch, R. B., & Warren, D. H. (1986). Intersensory interactions. In K. R. Boff, L. Kaufman, & J. P. Thomas (Eds.), Handbook of perception and human performance. Vol. 1: Sensory processes and perception (pp. 1–36). New York: Wiley.

Wickelgren, W. A. (1969). Context-sensitive coding, associative memory, and serial order in (speech) behavior. Psychological Review, 76, 1–15.

Wilson, S. M., Pinar Saygin, A., Sereno, M. I., & Iacoboni, M. (2004). Listening to speech
activates motor areas involved in speech production. Nature Neuroscience, 7 (7), 701–702.

Windmann, S. (2003). Effects of sentence context and expectation on the McGurk illusion.
Journal of Memory and Language, 50 (2), 212–230.

Wright, T. M., Pelphrey, K. A., Allison, T., McKeown, M. J., & McCarthy, G. (2003). Polysen­
sory interactions along lateral temporal regions evoked by audiovisual speech. Cerebral
Cortex, 13, 1034–1043.

Yehia, H. C., Kuratate, T., & Vatikiotis-Bateson, E. (2002). Linking facial animation, head
motion and speech acoustics. Journal of Phonetics, 30, 555–568.

Yehia, H., Rubin, P., & Vatikiotis-Bateson, E. (1998). Quantitative association of vocal-tract
and facial behaviour. Speech Communication, 26, 23–43.

Yuan, H., Reed, C. M., & Durlach, N. I. (2005). Tactual display of consonant voicing as a
supplement to lipreading. Journal of the Acoustical Society of America, 118 (2), 1003–
1015.

Zampini, M., Guest, S., Shore, D. I., & Spence, C. (2003). Audio-visual simultaneity judg­
ments. Perception and Psychophysics, 67 (3), 531–544.

Notes:

(1) . A spectrograph is an instrument that separates an incoming wave into a frequency spectrum.

(2) . Two main theories (motor theory, Liberman et al., 1967, Liberman & Mattingly, 1985;
and direct realism, Fowler, 1986) propose that the process of speech perception involves
the perceptual recovery and classification of articulatory gestures produced by the talker
rather than the acoustic correlates of those articulations. Thus, according to these theo­
ries, articulatory movements are the invariant components in speech perception.

(3) . Whereas some researchers have used these two terms interchangeably, others have
distinguished them theoretically. Summerfield (1992; Footnote 1), for instance, defines lip
reading as the perception of speech purely by observing the talker’s articulatory ges­
tures, whereas speech reading would also include other linguistic aspects of nonverbal
communication (e.g., the talker’s facial and manual gestures).

(4) . According to this model, the processing of corresponding auditory and visual infor­
mation is never functionally separated (at early stages of processing). Other models claim
that audiovisual binding occurs after unisensory signals have been thoroughly processed,
thus at a late stage of processing (e.g., FLMP; Massaro, 1998).

(5) . Sensory substitution systems gather environmental energy that would normally be
processed by one sensory system (e.g., acoustic energy in the present case) and translate
this information into stimuli for another sensory system (e.g., electrotactile or vibrotactile
energy).

(6) . In air, the speed of light is much faster than that of sound (approximately 300,000,000 m/second vs. 330 m/second, respectively). Neural transduction latencies are shorter for auditory stimuli than for visual ones (approximately 10 ms vs. 50 ms, respectively; Pöppel et al., 1990).

(7) . The temporal resolution has often been determined by means of temporal order judg­
ment (TOJ) or simultaneity judgment (SJ) tasks. In a TOJ task, stimuli are presented with
multi-sensory information at various stimulus onset asynchronies (SOAs; Dixon & Spitz,
1980; Hirsh & Sherrick, 1961), and observers may judge which stimulus came first or
which came second. In an SJ task, observers must judge whether the stimuli were pre­
sented simultaneously or successively.

(8) . The special nature of audiovisual speech perception, compared with nonspeech au­
diovisual binding, has been supported in other studies showing that audiovisual interac­
tion is stronger if the very same audiovisual stimuli are treated as speech rather than
nonspeech (Tuomainen et al., 2005) and that independent maturational processes under­
lie speech and nonspeech audiovisual illusory effects (Tremblay et al., 2007). At the physi­
ological level, audiovisual speech and nonspeech interactions also appear to rely, at least
in part, on distinct mechanisms (Klucharev et al., 2003; van Wassenhove et al., 2005).
However, higher familiarity, extensive exposure to these stimuli in daily life, and the fact
that audiovisual speech events may be somehow more attention grabbing (i.e., compared
with nonspeech events) are potential confounds that may explain some of the differences
reported in previous literature (see Stekelenburg & Vroomen, 2007; Vatakis et al., 2008).

(9) . Among other things, determining which components of the face are important for vis­
ible speech perception will allow the development of applications with virtual three-di­
mensional animated talking heads (Ouni et al., 2007). These animated agents have the po­
tential to improve communication in a range of situations, by supporting auditory infor­

mation (e.g., deaf population, telephone conversation, second language learning; Massaro
& Light, 2004).

(10) . The MMN is typically evoked by an occasional auditory change (deviant) in a ho­
mogenous sequence of auditory stimuli (standards), even when it occurs outside the sub­
jects’ focus of attention. That is, it constitutes an electrophysiological signature of audito­
ry discrimination abilities.

(11) . According to the perceptual load theory of attention, the level of attentional de­
mands required to perform a particular process will determine, among other things, the
amount of resources available to engage in the processing of task irrelevant information.
When a relevant task exhausts the available processing resources (i.e., under conditions
of high perceptual load), other incoming stimuli/tasks will receive little, if any, attention.
However, the theory also implies that, if the target-processing load is low, attention will
inevitably spill over to the processing of distractors, even if they are task irrelevant.

Agnès Alsius, Department of Psychology, Queen’s University, Ontario, Canada

Ewen MacDonald, Department of Psychology, Queen’s University, Ontario, Canada; Centre for Applied Hearing Research, Department of Electrical Engineering, Technical University of Denmark, Lyngby, Denmark

Kevin Munhall is Professor and Coordinator of Graduate Studies, Queen's University.

Organization of Conceptual Knowledge of Objects in the Human Brain  
Bradford Z. Mahon and Alfonso Caramazza
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn

Print Publication Date: Dec 2013 Subject: Psychology, Cognitive Neuroscience


Online Publication Date: Dec 2013 DOI: 10.1093/oxfordhb/9780199988693.013.0027

Abstract and Keywords

One of the most provocative and exciting issues in cognitive science is how neural speci­
ficity for semantic categories of common objects arises in the functional architecture of
the brain. Three decades of research on the neuropsychological phenomenon of category-
specific semantic deficits has generated detailed claims about the organization and repre­
sentation of conceptual knowledge. More recently, researchers have sought to test hy­
potheses developed on the basis of neuropsychological evidence with functional imaging.
From those two fields, the empirical generalization emerges that object domain and sen­
sory modality jointly constrain the organization of knowledge in the brain. At the same
time, research within the embodied cognition framework has highlighted the need to ar­
ticulate how information is communicated between the sensory and motor systems, and
processes that represent and generalize abstract information. Those developments point
toward a new approach for understanding category-specificity in terms of the coordinat­
ed influences of diverse regions and cognitive systems.

Keywords: objects, category-specific semantic deficits, conceptual knowledge, organization, object domain, senso­
ry modality

Introduction
The scientific study of how concepts are represented in the mind/brain extends to all dis­
ciplines within cognitive science. Within the psychological and brain sciences, research
has focused on studying how the perceptual, motor, and conceptual attributes of common
objects are represented and organized in the brain. Theories of conceptual representa­
tion must therefore explain not only how conceptual content itself is represented and or­
ganized but also the role played by conceptual content in orchestrating perceptual and
motor processes.

Page 1 of 42
Organization of Conceptual Knowledge of Objects in the Human Brain

Cognitive neuropsychological studies of brain-damaged patients provide strong evidence about the representation of conceptual knowledge and the relationship between concep­
tual knowledge and perceptual and motor processes. The cognitive neuropsychological
approach ultimately seeks to evaluate models of cognitive processing through the proxi­
mate goal of explaining the profile of behavioral performance observed in brain-damaged
patients. To the extent that the functional locus of impairment in a patient can be established within a given model of cognitive functioning, other assumptions of that model can be tested through further experiments with that patient. Dissociations of
abilities in patients (and of processes in models) are central to the neuropsychological ap­
proach. This is because, if a given behavior/process X can be impaired while another be­
havior/process Y is preserved, then one may conclude that the former process is not
causally involved in the latter process. Another important source of evidence from neu­
ropsychology is aspects of cognitive functioning that are observed to be systematically
impaired or spared together (for discussion of methodological issues in (p. 555) cognitive
neuropsychology, see Caramazza, 1986, 1992; Shallice, 1988).

Scope of the Review

The modern study of the representation of concepts in the brain was initiated by a series
of papers by Elizabeth Warrington, Tim Shallice, and Rosaleen McCarthy. Those authors
described patients with disproportionate semantic impairments for one, or several, cate­
gories of objects compared with other categories (see Hécaen & De Ajuriaguerra, 1956,
for earlier work). Since those initial investigations, a great deal has been learned about
the causes of category-specific semantic deficits and, by extension, about the organiza­
tion of object knowledge in the brain.

Figure 27.1 Representative picture naming performance of patients with category-specific semantic
deficits. a. (Upper Left) Category-specific semantic
deficits for living animate things. b. (Upper Right)
Category-specific semantic deficits for fruit/vegeta­
bles. c. (Lower Left) Category-specific semantic
deficits for conspecifics. d. (Lower Right). Category-
specific semantic deficits for nonliving.

The focus of this review is on neuropsychological research and, in particular, on the phe­
nomenon of category-specific semantic deficits. Evidence from other fields within cogni­
tive science and neuroscience and functional neuroimaging is reviewed as it bears on the
theoretical positions that emerge from the study of category-specific semantic deficits. In
particular, we highlight findings in functional neuroimaging related to the representation
of different semantic categories in the brain. We also discuss the degree to which concep­
tual representations are grounded in sensory and motor processes and the critical role
that neuropsychological studies of patients with impairments to sensory and motor knowl­
edge can play in constraining theories of semantic representation. However, the stated fo­
cus of this chapter also excludes important theoretical positions in the field of semantic
memory (e.g., Patterson, Nestor, & Rogers, 2007).

Category-Specific Semantic Deficits: Introduction to the Phenomenon
Patients with category-specific semantic deficits present with disproportionate or even se­
lective impairments for one semantic category compared with other semantic categories.
Figure 27.1 illustrates cases of disproportionate impairment for animals (see Figure
27.1A; Blundo et al., 2006; Caramazza & Shelton, 1998), (p. 556) fruit/vegetables (see Fig­
ure 27.1B; Hart et al., 1985; Samson & Pillon, 2003), conspecifics (see Figure 27.1C;
Miceli et al., 2000; Ellis et al., 1989), and nonliving things (see Figure 27.1D; Hillis &
Caramazza, 1991; Laiacona & Capitani, 2001). There have been over 100 reported cases
of category-specific semantic impairment (for review and discussion, see Capitani et al.,
2003; Hart et al., 2007; Humphreys & Forde, 2001; Tyler & Moss, 2001). The majority of
reported patients have disproportionate impairments for living things compared with non­
living things (Capitani et al., 2003).

One important aspect of the performance profile of patients with category-specific seman­
tic impairment is that the impairment is to conceptual knowledge and not (only) to modal­
ity-specific input or output representations. The evidence for locating the deficit at a con­
ceptual level is that the category-specific deficit does not depend on stimuli being pre­
sented, or on patients responding, in only one modality of input or output. For instance,
patients KC and EW (see Figure 27.1A) were impaired for naming living animate things
compared with nonliving things and fruit/vegetables. Both patients were also impaired for
answering questions about living animate things, such as “does a whale have legs” or
“are dogs domestic animals,” but were unimpaired for the same types of questions about
nonanimals (Figure 27.2A).

Patients with category-specific semantic deficits may also have additional, and also cate­
gory-specific, deficits at presemantic levels of processing. For instance, patient EW was
impaired for judging whether pictures depicted real or unreal animals, but was unim­
paired for the same task over nonanimal stimuli. The ability to make such decisions is as­
sumed to index the integrity of the visual structural description system, a presemantic
stage of object recognition (Humphreys et al., 1988). In contrast, patient KC was relative­
ly unimpaired on an object decision task, even for the category of items (living animate)
that the patient was unable to name. A similar pattern to that observed in patient KC was
present in patient APA (Miceli et al., 2000). Patient APA was selectively impaired for con­
ceptual knowledge of people (see Figure 27.1C). Despite a severe impairment for naming
famous people, APA did not have a deficit at the level of face recognition (prosopagnosia).

Another important aspect of patients with category-specific semantic impairments is that they have difficulty distinguishing among basic-level items within the impaired category,
but do not necessarily have problems assigning items they cannot identify to the correct
superordinate-level category (e.g., they may know that a picture of a dog is an animal, but
do not know which animal; see Humphreys & Forde, 2005, for a patient with greater diffi­
culty at a superordinate than a basic level across all semantic categories).

A number of studies have now documented that variables such as lexical frequency, con­
cept familiarity, and visual complexity may be unbalanced if items are sampled “random­
ly” from different semantic categories (Cree & McRae, 2003; Funnell & Sheridan, 1992;
Stewart et al., 1992). In addition, Laiacona and colleagues (Barbarotto et al., 2002; Laia­
cona et al., 1998) have highlighted the need to control for gender-specific effects on vari­
ables such as concept familiarity (for discussion of differences between males and fe­
males in the incidence of category-specific semantic deficits for different categories, see
Laiacona et al., 2006). However, the existence of category-specific semantic deficits is not
an artifact of such stimulus-specific attributes. Clear cases have been reported while
carefully controlling for those factors, and double dissociations have been reported using
the same materials (e.g., Hillis & Caramazza, 1991; see also the separate case reports in
Laiacona & Capitani, 2001, and Barbarotto et al., 1995).

Overview of Theoretical Explanations of the Causes of Category-Specific Semantic Deficits
Theories developed to explain category-specific semantic deficits fall into two broad
groups (Caramazza, 1998). Theories within the first group, based on the neural structure
principle, assume dissociable neural substrates are differentially (or exclusively) involved
in representing different semantic categories. Theories within the second group, based on
the correlated structure principle, assume that conceptual knowledge of items from dif­
ferent semantic categories is not represented in functionally dissociable regions of the
brain.

According to theories based on the neural structure principle, category-specific semantic deficits are due to differential or selective damage to the neural substrate upon which the
impaired category of items depends. Two broad classes of theories based on the neural
structure principle are the sensory/functional theory (Warrington & McCarthy, 1983,
1987; Warrington & Shallice, 1984) and the domain-specific hypothesis (Caramazza &
Shelton, 1998).

Figure 27.2 Relation between impairments for a type or modality of knowledge and category-specific se­
mantic deficits. These data show that: a. (Upper
Graph) category-specific semantic impairments are
associated with impairments for all types of knowl­
edge about the impaired category; b. (Middle Graph)
differential impairments for visual/perceptual knowl­
edge can be associated with (if anything) a dispro­
portionate impairment for nonliving things compared
to living things; and c. (Lower Graph) selective im­
pairment for knowledge of object color is not associ­
ated with a corresponding disproportionate deficit
for fruit/vegetables.

References for patient initials from a. (Upper Graph): EW–Caramazza & Shelton, 1998; GR and FM–Laiacona et al., 1993; DB–Lambon Ralph et al., 1998; RC–Moss et al., 1998.

(p. 557)

The sensory/functional theory is composed of two assumptions. The first—the multiple se­
mantics assumption—is that conceptual knowledge is organized into subsystems that par­
allel the sensory and motor modalities of input and output. The second assumption is that
the critical semantic attributes of items from different categories of objects are repre­
sented in different modality-specific semantic subsystems.

The domain-specific hypothesis assumes that the first-order constraint on the organiza­
tion of conceptual knowledge is object domain, with the possible domains restricted to
those that could have had an evolutionarily relevant history—living animate, living inani­
mate, conspecifics, and “tools.”

Theories based on the correlated structure principle model semantic memory as a system
that (p. 558) represents statistical regularities in the co-occurrence of object properties in
the world (Caramazza et al., 1990; Devlin et al., 1998; McClelland & Rogers, 2003; Tyler
& Moss, 2001). This class of models has been instrumental in motivating large-scale em­
pirical investigations of how different types of features are distributed and correlated for
different semantic categories. Several theories based on the correlated structure princi­
ple have been developed to explain the causes of category-specific semantic deficits
(Caramazza et al., 1990; Devlin et al., 1998; Tyler & Moss, 2001).

This review is organized to reflect the role that different theoretical assumptions have
played in motivating empirical research. Initial hypotheses that were developed to ex­
plain category-specific semantic deficits appealed to a single principle of organization
(modality specificity, domain specificity, or correlated structure). The current state of the
field of category-specific semantic deficits is characterized by complex models that inte­
grate assumptions from multiple theoretical frameworks. This trajectory of theoretical po­
sitions reflects the fact that, although theories that have been developed based on the
neural and correlated structure principles are mutually contrary as explanations about
the causes of category-specific semantic deficits, the individual assumptions that consti­
tute those theories are not necessarily incompatible as hypotheses about the structure of
knowledge in the brain (for discussion, see Caramazza & Mahon, 2003).

Neural Structure Principle


Multiple Semantics Assumption

The idea that the organization of the semantic system follows the organization of the various input and output modalities to and from the semantic system was first proposed by Beauvois (Beauvois, 1982; Beauvois et al., 1978). The original motivation for the assumption of multiple semantics was the phenomenon of optic aphasia (e.g., Lhermitte & Beauvois, 1973; for review, see Plaut, 2002). Patients with optic aphasia present with im­
paired naming of visually presented objects, but relatively (or completely) spared naming
of the same objects when presented through the tactile modality (e.g., Hillis & Caramaz­
za, 1995). The fact that optic aphasic patients can name objects presented through the
tactile modality indicates that the naming impairment to visual presentation is not due to
a deficit at the level of retrieving the correct names. In contrast to patients with visual ag­
nosia (e.g., Milner et al., 1991), patients with optic aphasia can recognize, at a visual lev­
el of processing, the stimuli they cannot name. Evidence for this is provided by the fact
that some optic aphasic patients can demonstrate the correct use of objects that they can­
not name (e.g., Coslett & Saffran, 1992; Lhermitte & Beauvois, 1973; see Plaut, 2002).
Beauvois (1982) explained the performance of optic aphasic patients by assuming that the
conceptual system is functionally organized into visual and verbal semantics and that op­
tic aphasia is due to a disconnection between the two semantic systems.

Along with reporting the first cases of category-specific semantic deficit, Warrington and
her collaborators (Warrington & McCarthy, 1983; Warrington & Shallice, 1984) developed
an influential explanation of the phenomenon that built on the proposal of Beauvois
(1982). Warrington and colleagues argued that category-specific semantic deficits are

Page 7 of 42
Organization of Conceptual Knowledge of Objects in the Human Brain

due to differential damage to a modality-specific semantic subsystem that is not itself or­
ganized by semantic category. Specifically, those authors noted that the patients they had
reported with impairments for living things also had impairments for foods, plants, and
precious stones (Warrington & Shallice, 1984); in contrast, a patient with an impairment
for nonliving things (Warrington & McCarthy, 1983) was spared for living things, food,
and plant life. Warrington and her collaborators reasoned that the association of impaired
and spared categories was meaningfully related to the degree to which identification of
items from those categories depends on sensory or functional knowledge. Specifically,
they argued that the ability to identify living things differentially depends on sensory
knowledge, whereas the ability to identify nonliving things differentially depends on func­
tional knowledge.

Farah and McClelland (1991) implemented the theory of Warrington and colleagues in a
connectionist framework. Three predictions follow from the computational model of Farah
and McClelland (1991; for discussion, see Caramazza & Shelton, 1998). All three of those
predictions have now been tested. The first prediction is that the grain of category-specif­
ic semantic deficits should not be finer than living versus nonliving. This prediction fol­
lows from the assumption that all living things differentially depend on visual knowledge.
However, as represented in Figure 27.1, patients have been reported with selective se­
mantic impairments for fruit/vegetables (e.g., Hart et al., 1985; Laiacona et al., 2005;
Samson & Pillon, 2003) and animals (e.g., Blundo et al., 2006; Caramazza & Shelton,
1998). The second prediction is that an impairment for a given category of knowledge will
be associated with a disproportionate impairment for the modality of (p. 559) knowledge
that is critical for that category. At variance with this prediction, it is now known that cat­
egory-specific semantic deficits are associated with impairments for all types of knowl­
edge (sensory and functional) about items from the impaired category (see Figure 27.2A;
e.g., Blundo et al., 2006; Caramazza & Shelton, 1998; Laiacona & Capitani, 2001; Laia­
cona et al., 1993; Lambon Ralph et al., 1998; Moss et al., 1998). The third prediction is
that impairments for a type of knowledge will necessarily be associated with differential
impairments for the category that depends on that knowledge type. Patients exhibiting
patterns of impairment contrary to this prediction have been reported. For instance, Fig­
ure 27.2B shows the profile of a patient who was (1) more impaired for visual compared
with functional knowledge, and (2) if anything, more impaired for nonliving things than
living things (Lambon Ralph et al., 1998; see also Figure 27.2C, Figure 27.3, and discus­
sion below).

Second-Generation Sensory/Functional Theories

Figure 27.3 Category-specific patterns of BOLD response in the healthy brain (data from Chao et al., 2002; graphics provided by Alex Martin). This figure
shows in red, a network of regions that are differen­
tially activated for living animate things, and in blue,
a network of regions that are differentially activated
for nonliving things.

The original formulation of the sensory/functional theory was based on a simple division
between visual-perceptual knowledge and functional-associative knowledge. Warrington
and McCarthy (1987; see also Crutch & Warrington, 2003) suggested, however, that
knowledge of object color is differentially important for fruit/vegetables compared with
animals. Since Warrington and McCarthy, further sensory- and motor-based dimensions
that may be important for distinguishing between semantic categories have been articu­
lated (e.g., Cree & McRae, 2003; Vinson et al., 2003).

Cree and McRae (2003) used a feature-listing task to study the types of information that
normal subjects spontaneously associate with different semantic categories. The seman­
tic features were then classified into nine knowledge types: color, visual parts and surface
properties, visual motion, smell, sound, tactile, taste, function, and encyclopedic (see Vin­
son et al., 2003, for a slightly different classification). Hierarchical cluster analyses were
used to determine which semantic categories differentially loaded on which feature types.
The results of those analyses indicated that (1) visual motion and function information
were the two most important knowledge types for distinguishing living animate things
(high on visual motion information) from (p. 560) nonliving things (high on function infor­
mation); (2) living animate things were weighted lower on color information than fruit/
vegetables, but higher on this knowledge type than nonliving things; and (3) fruit/vegeta­
bles were distinguished from living animate and nonliving things by being weighted the
highest on both color and taste information.
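The kind of hierarchical cluster analysis just described can be sketched in a few lines. The knowledge-type weights below are invented for illustration (Cree and McRae derived theirs from thousands of participant-listed features, over nine knowledge types rather than the four used here); the point is only to show how domains with similar feature-type profiles merge together under single-linkage agglomeration.

```python
import math

# Illustrative (invented) weights of three domains on four knowledge types.
knowledge_types = ["color", "visual_motion", "taste", "function"]
profiles = {
    "living_animate":   [0.5, 0.8, 0.3, 0.2],  # high on visual motion
    "fruit_vegetables": [0.9, 0.2, 0.9, 0.1],  # highest on color and taste
    "nonliving":        [0.1, 0.2, 0.0, 0.9],  # high on function
}

def dist(a, b):
    """Euclidean distance between two domains' knowledge-type profiles."""
    return math.dist(profiles[a], profiles[b])

# Single-linkage agglomeration: repeatedly merge the two closest clusters.
clusters = [{name} for name in profiles]
merge_order = []
while len(clusters) > 1:
    i, j = min(
        ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
        key=lambda p: min(dist(a, b) for a in clusters[p[0]] for b in clusters[p[1]]),
    )
    merged = clusters[i] | clusters[j]
    merge_order.append(merged)
    clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    print("merged:", sorted(merged))
```

With these invented weights the two living domains merge first, and nonliving things join only at the final step, mirroring the tripartite separation reported in the actual analyses.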

Cree and McRae’s analyses support the claim that the taxonomy of nine knowledge types
is effective in distinguishing among the domains living animate, fruit/vegetables, and nonliving. Those analyses do not demonstrate that the nine knowledge types are critical for
distinguishing among items within the respective categories. However, and as noted
above, patients with category-specific semantic impairments do not necessarily have diffi­
culty distinguishing between different domains (i.e., they might know it is an “animal,”
but cannot say which one). It is therefore not obvious that Cree and McRae’s analyses
support the claim that category-specific semantic deficits may be explained by assuming
damage to one (or more) of the nine knowledge types.

At a more general level, the open empirical question is whether the additional knowledge
types and the corresponding further functional divisions that are introduced into the se­
mantic system can account for the neuropsychological evidence. Clearly, if fruit/vegeta­
bles and animals are assumed to differentially depend on different types of information
(and by inference, different semantic subsystems), it is in principle possible to account for
the tripartite distinction between animals, fruit/vegetables, and nonliving. As for the origi­
nal formulation of the sensory/functional theory, the question is whether fine-grained cat­
egory-specific semantic impairments are associated with impairments for the type of
knowledge upon which items from the impaired category putatively depend. However, pa­
tients have been reported with category-specific semantic impairments for fruit/vegeta­
bles, without disproportionate impairments for color knowledge (e.g., Samson & Pillon,
2003). Patients have also been reported with impairment for knowledge of object color
without a disproportionate impairment for fruit/vegetables compared with other cate­
gories of objects (see Figure 27.2C; Luzzatti & Davidoff, 1994; Miceli et al., 2001).

Another way in which investigators have sought to provide support for the sensory/func­
tional theory is to study the semantic categories that are systematically impaired togeth­
er. As noted above, one profile of the first reported cases that motivated the development
of the sensory/functional theory (Warrington & Shallice, 1984) was that the categories of
animals, plants, and foods tended to be impaired or spared together. Those associations
of impairing and sparing of categories made sense if all of those categories depended on
the same modality-specific system for their identification. Following the same logic, it was
argued that musical instruments patterned with living things (because of the importance
of sensory attributes; see Dixon et al., 2000, for relevant data), whereas body parts pat­
terned with nonliving things (because of the importance of functional attributes associated with object use; e.g., Warrington & McCarthy, 1987). However, as was the case for the
dissociation between living animate (animals) and living inanimate (e.g., plants) things, it
is now known that musical instruments dissociate from living things, whereas body parts
dissociate from nonliving things (Caramazza & Shelton, 1998; Laiacona & Capitani, 2001;
Shelton et al., 1998; Silveri et al., 1997; Turnbull & Laws, 2000; for review and discus­
sion, see Capitani et al., 2003).

More recently, Borgo and Shallice (2001, 2003) have argued that sensory-quality cate­
gories, such as materials, edible substances, and drinks, are similar to animals in that
they depend on sensory information for their identification. Those authors reported that
impairment for living things was associated with impairments for sensory-quality cate­
gories. However, Laiacona and colleagues (2003) reported a patient who was impaired for
living things but spared for sensory-quality categories (for further discussion, see Carroll
& Garrard, 2005).

Another dimension that has been argued to be instrumental in accounting for category-
specific semantic deficits is differential similarity in the visual structure of items from dif­
ferent categories. Humphreys and Forde (2001; see also Tranel et al., 1997) argued that
living things tend to be more structurally similar than nonliving things. If that were the
case, then it could be argued that damage to a system not organized by object category
would result in disproportionate disruption of items that are more “confusable” (see also
Lambon Ralph et al., 2007; Rogers et al., 2004). Within Humphreys and Forde’s frame­
work, it is also assumed that activation dynamically cascades from visual object recogni­
tion processes through to lexical access. Thus, perturbation of visual recognition process­
es could trickle through the system to disrupt the normal functioning of subsequent
processes, resulting in a naming deficit (see Humphreys et al., 1988). (p. 561) Laws and
colleagues (Laws & Gale, 2002; Laws & Neve, 1999) also argued for the critical role of
similarity in visual structure for explaining category-specific semantic deficits. However,
in contrast to Humphreys and Forde (see also Tranel et al., 1997), Laws and colleagues
argued that nonliving things tend to be more similar than living things.

Clearly, there remains much work to be done to understand the role that visual similarity
and the consequent “crowding” (Humphreys & Forde, 2001) of visual representations
have in explaining category-specific semantic deficits. On the one hand, there is no con­
sensus regarding the relevant object properties over which similarity should be calculat­
ed, or regarding how such a similarity metric should be calculated. On the other hand, as­
suming an “agreed on” means for determining similarity in visual shape, the question re­
mains open as to the role that such a factor might play in explaining the facts of category-
specific semantic deficits.

Domain-Specific Hypothesis

The domain-specific hypothesis of the organization of conceptual knowledge in the brain (Caramazza & Shelton, 1998) assumes that the first-order constraint on the organiza­
of information within the conceptual system is object domain. The semantic categories
that may be organized by domain-specific constraints are limited to those that could have
had an evolutionarily relevant history: living animate, living inanimate, conspecifics, and
tools. On this proposal the phenomenon of category-specific semantic deficit reflects dif­
ferential or selective damage to the neural substrates that support one or another domain
of knowledge. Research from developmental psychology converges with the assumption
that conceptual knowledge is organized, in part, by innately specified constraints on ob­
ject knowledge (e.g., Baillargeon, 1998; Carey & Spelke, 1994; Gallistel, 1990; R. Gelman,
1990; Keil, 1981; Spelke et al., 1992; Wellman & S. Gelman, 1992; see Santos & Caramaz­
za, 2002, for review; see, e.g., Kiani et al., 2007, for convergent findings using neurophys­
iological methods with nonhuman primates). Research in developmental psychology has
also highlighted other domains of knowledge beyond those motivated by neuropsychologi­
cal research on patients with category-specific deficits, such as number and geometric/
spatial reasoning (e.g., Cantlon et al., 2009; Feigenson et al., 2004; Hermer & Spelke,
1994).

Unique predictions are generated by the original formulation of the domain-specific hy­
pothesis as it was articulated in the context of category-specific semantic deficits. One
prediction is that the grain of category-specific semantic deficits will reflect the grain of
those categories that could plausibly have had an evolutionarily relevant history (see Fig­
ure 27.1). Another prediction is that category-specific semantic impairments will be asso­
ciated with impairments for all types of knowledge about the impaired object type (see
Figure 27.2A). A third prediction made by the domain-specific hypothesis is that it should
be possible to observe category-specific impairments that result from early damage to the
brain. Evidence in line with this expectation is provided by the case of Adam (Farah & Rabinowitz, 2003). Patient Adam, who was 16 years old at the time of testing, suffered a
stroke at 1 day of age. Adam failed to acquire knowledge of living things, despite normal
levels of knowledge about nonliving things. As would be expected within the framework
of the domain-specific hypothesis, Adam was impaired for both visual and nonvisual
knowledge of living things (Farah & Rabinowitz, 2003).

Correlated Structure Principle


Theories based on the correlated structure principle assume that the conceptual system
has no structure that is specifically reflected in functional neuroanatomy. For instance,
the organized unitary content hypothesis (OUCH; Caramazza et al., 1990) was initially
formulated as an explanation of optic aphasia that did not invoke the assumption of multi­
ple semantics. Caramazza and colleagues (1990; see also Riddoch et al., 1988) argued
that there are privileged relationships between certain types of input representations
(e.g., visual form) and certain types of output representations (e.g., knowledge of object
manipulation), thus explaining how optic aphasic patients might be spared for gesturing
to objects while impaired for naming them.

Other researchers subsequently developed highly specified proposals based on the corre­
lated structure principle, all of which build on the idea that different types of features are
differentially correlated across different semantic categories (Devlin et al., 1998; Rogers
et al., 2004; Tyler & Moss, 2001). Those models of semantic memory have been imple­
mented computationally, with simulated damage, to provide existence proofs that a sys­
tem with no explicit functional organization may be damaged so as to produce category-
specific semantic deficits. Because theories based on the correlated structure principle
do not assume that the conceptual system has structure at the level of functional neu­
roanatomy, (p. 562) they are best suited to modeling the patterns of progressive loss of
conceptual knowledge observed in neurodegenerative diseases, such as dementia of the
Alzheimer type and semantic dementia. The type of damage in such patients is diffuse
and widespread and can be modeled in connectionist architectures by removing, to vary­
ing degrees, randomly selected components of the network.
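
The lesioning logic described above can be sketched in a few lines. In the sketch below, the concepts and feature assignments are invented for illustration (they are not from any published feature norms), and nearest-neighbour matching by feature overlap stands in for the cited connectionist models:

```python
import random

# Toy feature-based semantic system. Living things share many features
# (indices 0-5) and have one distinctive feature each; nonliving things
# have few shared features and mostly distinctive ones. All invented.
concepts = {
    "dog":   {0, 1, 2, 3, 4, 5, 10},
    "horse": {0, 1, 2, 3, 4, 5, 11},
    "tiger": {0, 1, 2, 3, 4, 5, 12},
    "fork":   {0, 13, 14},
    "hammer": {1, 15, 16},
    "car":    {2, 17, 18},
}
LIVING = {"dog", "horse", "tiger"}

def lesion(features, damage, rng):
    """Diffuse damage: remove each feature independently with probability `damage`."""
    return {f for f in features if rng.random() > damage}

def identify(damaged_features):
    """Identify a damaged representation by maximal feature overlap."""
    return max(concepts, key=lambda c: len(damaged_features & concepts[c]))

def accuracy(damage, trials=500):
    """Identification accuracy per category under random diffuse damage."""
    rng = random.Random(42)  # fixed seed for reproducibility
    correct = {"living": 0, "nonliving": 0}
    for _ in range(trials):
        for name, feats in concepts.items():
            kind = "living" if name in LIVING else "nonliving"
            if identify(lesion(feats, damage, rng)) == name:
                correct[kind] += 1
    n_liv = trials * len(LIVING)
    n_non = trials * (len(concepts) - len(LIVING))
    return {"living": correct["living"] / n_liv,
            "nonliving": correct["nonliving"] / n_non}
```

With these invented feature statistics, mild diffuse damage hurts living things more, because each living item is separated from its neighbours by only a single distinctive feature; different feature statistics would reverse the pattern, which is exactly the point of contention between the models discussed below.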

Page 12 of 42
Organization of Conceptual Knowledge of Objects in the Human Brain

One important proposal is the conceptual-structure account of Tyler, Moss, and col­
leagues (Bright et al., 2005; Tyler & Moss, 2001). That proposal assumes that living
things have more shared features, whereas nonliving things have more distinctive fea­
tures. The model further assumes that the shared features of living things are highly cor­
related (has eyes/can see), whereas for nonliving things distinctive features are highly
correlated (used for spearing/has tines). If distinctive features are critical for identifica­
tion, and if greater correlation confers resilience to damage, then an interaction between
the severity of overall impairment and the direction of category-specific semantic deficit
is predicted. Mild levels of impairment should produce disproportionate impairments for living things compared with nonliving things. At more severe levels of impairment, the
distinctive features of nonliving things will be lost, and a disproportionate impairment for
this category will be observed. The opposite prediction regarding the severity of overall
impairment and the direction of category-specific impairment is predicted by the account
of Devlin and colleagues (1998) because it is assumed that as damage becomes severe,
whole sets of inter-correlated features will be lost, resulting in a disproportionate impair­
ment for living things. However, it is now known that neither prediction finds clear empir­
ical support (Garrard et al., 1998; Zannino et al., 2002; see also Laiacona and Capitani,
2001, for discussion within the context of focal lesions; for further discussion and theoret­
ical developments, see Cree and McRae, 2003; Vinson et al., 2003).

One issue that is not resolved is whether correlations between different features should
be calculated in a concept-dependent or concept-independent manner (Zannino et al.,
2006). For instance, although the (“distinctive”) feature “has tines” is highly correlated with the function “used for spearing” within the concept “fork” (a concept-dependent correlation), the co-occurrence of those properties in the world at large is relatively low (a concept-independent correlation). Sartori, Lombardi, and colleagues (Sartori & Lombardi, 2004; Sartori et al.,
2005) have addressed a similar issue by developing the construct of “semantic rele­
vance,” which is computed through a nonlinear combination of the frequency with which
particular features are produced for an item and the distinctiveness of those features for
all concepts in the database. Those authors have shown that living things tend to be lower in relevance, on average, than nonliving things, making living things on average “harder” than nonliving things. As is the case for other accounts of category-specific semantic deficits that are based on differences across categories along a single dimension, the existence of disproportionate deficits for the relatively “easy” category (nonliving things) is difficult to accommodate (see, e.g., Hillis & Caramazza, 1991;
Laiacona & Capitani, 2001; see Figure 27.1D). Nevertheless, the theoretical proposal of
Sartori and colleagues highlights the critical and unresolved issue of how to determine
the “psychologically relevant” metric for representing feature correlations.
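
The intuition behind a frequency-by-distinctiveness measure can be illustrated with a tf-idf-style score. The sketch below mirrors the spirit of semantic relevance (frequency combined with distinctiveness) but is not Sartori and Lombardi's published formula, and the production counts are invented:

```python
import math

# Hypothetical feature-production counts: norms[concept][feature] is the
# number of participants who listed that feature for that concept.
# All values are invented for illustration.
norms = {
    "dog":    {"has legs": 18, "has fur": 15, "barks": 20},
    "cat":    {"has legs": 17, "has fur": 16, "meows": 19},
    "fork":   {"has tines": 20, "used for spearing": 18},
    "hammer": {"has handle": 14, "used for pounding": 20},
}

def relevance(concept, feature):
    """Production frequency weighted by distinctiveness (tf-idf-style)."""
    freq = norms[concept].get(feature, 0)
    n_with_feature = sum(1 for c in norms if feature in norms[c])
    # Features shared by many concepts carry little distinctive weight.
    distinctiveness = math.log(len(norms) / n_with_feature)
    return freq * distinctiveness

def mean_relevance(concept):
    """Average relevance across a concept's features."""
    feats = norms[concept]
    return sum(relevance(concept, f) for f in feats) / len(feats)
```

Because living things' features are shared with other living things, their mean relevance comes out lower than that of nonliving things, matching the pattern described in the text.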

Another unresolved issue is whether high correlations between features will provide “re­
silience” to damage for those features, or rather will make damage “contagious” among
them. It is often assumed that high correlation confers resilience to, or insulation from,
damage; however, our understanding of how damage to one part of the brain affects oth­
er regions of the brain remains poorly developed. It is also not obvious that understand­
ing the behavior of connectionist architectures constitutes the needed motivation for deciding, one way or the other, whether greater correlation confers greater resilience to
damage. In fact, theoretical differences about the role of correlations in conferring re­
silience to damage are in part responsible for the contrasting predictions that follow from
the models of Tyler and colleagues (Tyler & Moss, 2001) and Devlin and colleagues
(1998) (see Zannino et al., 2006, for discussion).

Another example that illustrates our current lack of understanding of the role of correla­
tion in determining the patterns of impairment is provided by dissociations between sen­
sory, motor, and conceptual knowledge. For instance, the visual structure of objects is
highly correlated with more abstract knowledge of the conceptual features of objects.
Even so, patients with impairments to abstract conceptual features of objects do not nec­
essarily have corresponding impairments to object recognition processes (see above, and
Capitani et al., 2003, for review). Similarly, although manipulation knowledge (“how to”
knowledge) is correlated with functional knowledge (“what for” knowledge), damage to
the former does not imply damage to the latter (see Buxbaum et al., 2000; see Figure
27.3D and discussion below).

(p. 563) Theories based on the correlated structure principle are presented as alternatives
to proposals that assume neural structure within the conceptual system. The implicit as­
sumption in that argument is that the theoretical construct of a semantic feature offers a
means for reducing different categories to a common set of elements (see Rogers et al.,
2004, for an alternative proposal). There are, however, no semantic features that have
been described that are shared across semantic categories, aside from very abstract fea­
tures such as “has mass” (Strnad, Anzellotti, & Caramazza, 2011). In other words, to the extent that semantic features are the “substance” of conceptual representations, different semantic categories would be represented by non-overlapping sets of features.
Thus, and as has been proposed on the basis of functional neuroimaging data (see, e.g.,
Haxby et al., 2001, and discussion below), it may be the case that regions of high feature
correlation (e.g., within semantic category correlations in visual structure) are reflected
in the functional neuroanatomy of the brain (see also Devlin et al., 1998, for a hybrid
model in which both focal and diffuse lesions can produce category-specific effects, and
Caramazza et al., 1990, for an earlier proposal along those lines).

Anatomy of Category Specificity


An important development in cognitive neuroscience that has paralleled the articulation
of theories of semantic organization is the discovery of multiple channels of visual pro­
cessing (Goodale & Milner, 1992; Ungerleider & Mishkin, 1982). It is now known that visu­
al processing bifurcates into two independent but interconnected streams (for discussion
of how best to characterize the two streams, see Pisella et al., 2006). The ventral visual
object processing stream projects from V1 through ventral occipital and temporal cor­
tices, terminating in anterior regions of the temporal lobe, and subserves visual object
identification. The dorsal object processing stream projects from V1 through dorsal occip­
ital cortex to posterior parietal cortex and subserves object-directed action and spatial analysis for the purposes of object-directed grasping. The two-visual-systems hypothesis has played a central role in understanding the neuroanatomy of category specificity.

Lesion Analyses

A natural issue to arise in neuropsychological research concerns which brain regions tend
to be lesioned in association with category-specific deficits. The first study to systemati­
cally address this issue was by H. Damasio and colleagues (1996). Those authors found
that name retrieval deficits for pictures of famous people were associated with left tempo­
ral pole lesions, a result confirmed by other investigators (see Lyons, et al., 2006, for an
overview). Damasio and colleagues also found that deficits for naming animals were asso­
ciated with (more posterior) lesions of anterior left ventral temporal cortex. Subsequent
research has confirmed that deficits for naming animals are associated with lesions to an­
terior regions of temporal cortex (e.g., Brambati et al., 2006). Damasio and collaborators
also found that deficits for naming tools were associated with lesions to posterior and lat­
eral temporal areas, overlapping the left posterior middle temporal gyrus. The critical role of the
left posterior middle temporal gyrus for knowing about tools has also since been con­
firmed by other lesion studies (e.g., Brambati et al., 2006).

A subsequent report by H. Damasio and colleagues (2004) demonstrated that the same re­
gions were also reliably damaged in patients with impairments for recognizing stimuli
from those three categories. In addition, Damasio and colleagues (2004) found that
deficits for naming tools, as well as fruit/vegetables, were associated with lesions to the
inferior precentral and postcentral gyri and the insula. Converging support for the association between lesions to the regions discussed above and category-specific deficits is provided by
Gainotti’s (e.g., 2000) analyses of published reports of patients with category-specific se­
mantic deficits.
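
The logic of the lesion-overlap approach is simple to state: across patients who share a behavioral deficit, count how often each brain region is damaged. A minimal sketch follows, with invented patients and region labels; real analyses operate over voxelwise lesion masks rather than named regions:

```python
from collections import Counter

# Toy patient records: each has a category-specific deficit and a set of
# lesioned regions. All entries are invented for illustration.
patients = [
    {"deficit": "animals", "lesion": {"ATL", "ventral temporal cortex"}},
    {"deficit": "animals", "lesion": {"ATL", "insula"}},
    {"deficit": "tools",   "lesion": {"pMTG", "parietal cortex"}},
    {"deficit": "tools",   "lesion": {"pMTG"}},
]

def overlap_map(deficit):
    """Tally how often each region is lesioned among patients sharing a deficit."""
    counts = Counter()
    for p in patients:
        if p["deficit"] == deficit:
            counts.update(p["lesion"])
    return counts
```

Regions with the highest counts for a given deficit are the candidate critical regions, subject to the caveat noted below that lesion overlap cannot reveal regions that are merely engaged rather than necessary.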

A number of investigators have interpreted the differential role of anterior mesial aspects
of ventral temporal cortex in the processing of living things to reflect the fact that living
things have more shared properties than nonliving things, such that more fine-grained
discriminations are required to name them (Bright et al., 2005; Damasio et al., 2004; Sim­
mons & Barsalou, 2003; see also Humphreys et al., 2001). Within this framework, the as­
sociation of deficits to unique person knowledge and lesions to the most anterior aspects
of the temporal lobe is assumed to reflect the greater discrimination that is required for
distinguishing among conspecifics, compared with animals (less) and nonliving things
(even less).


Functional Imaging

Figure 27.4 Congenitally blind and sighted participants were presented with auditorily spoken words
of living things (animals) and nonliving things (tools,
nonmanipulable objects) and were asked to make
size judgments about the referents of the words. The
sighted participants were also shown pictures corre­
sponding to the same stimuli in a separate scan. For
sighted participants viewing pictures, the known
finding was replicated that nonliving things such as
tools and large nonmanipulable objects lead to differ­
ential neural responses in medial aspects of ventral
temporal-occipital cortex. This pattern of differential
BOLD responses for nonliving things in medial as­
pects of ventral temporal-occipital cortex was also
observed in congenitally blind participants and sight­
ed participants performing the size judgment task
over auditory stimuli. These data indicate that the
medial-to-lateral bias in the distribution of category-
specific responses does not depend on visual experi­
ence. For details of the study, see Mahon and colleagues (2009).

Data from functional imaging, and in particular functional magnetic resonance imaging
(fMRI), have added in important ways to our understanding of how different semantic cat­
egories are processed in the healthy brain. In particular, although (p. 564) the lesion over­
lap approach is powerful in detecting brain regions that are critical for performing a giv­
en task, functional imaging has the advantage of detecting both regions that are critical
and regions that are automatically engaged by the mere presentation of a certain type of
stimulus. Thus, in line with the lesion evidence described above, nonliving things, and in
particular tools, differentially activate the left middle temporal gyrus (Figure 27.4A; e.g.,
Martin et al., 1996; Thompson-Schill et al., 1999; see Devlin et al., 2002, for review). Oth­
er imaging data indicate that this region plays an important role in processing the seman­
tics of actions (e.g., Martin et al., 1995; Kable et al., 2002; Kemmerer et al., 2008), as well as mechanical (i.e., unarticulated) motion (Beauchamp et al., 2002, 2003; Martin & Weisberg, 2003).

In contrast, and not as apparent in lesion studies, tools differentially activate dorsal
stream regions that mediate object-directed action. The activation of some of those re­
gions is independent of whether (p. 565) action information is necessary to perform the
task in which participants are engaged (e.g., picture naming). For instance, regions with­
in dorsal occipital cortex, posterior parietal cortex, through to the anterior intraparietal
sulcus, are automatically activated when participants observe manipulable objects (e.g.,
Chao & Martin, 2000; Culham et al., 2003; Fang & He, 2005; Frey et al., 2005). Those re­
gions are important for determining volumetric and spatial information about objects as
well as shaping and transporting the hand for object grasping. However, those dorsal oc­
cipital and posterior parietal regions are not thought to be critical for object identifica­
tion or naming (e.g., Goodale & Milner, 1992). Naming tools also differentially activates
the left inferior parietal lobule (e.g., Mahon et al., 2007; Rumiati et al., 2003), a structure
that is important for representing complex object-associated manipulations (e.g., for re­
view, see Johnson-Frey, 2004; Lewis, 2006).

One clear way in which functional imaging data have contributed beyond lesion evidence
to our understanding of category specificity in the brain is the description of highly con­
sistent topographic biases by semantic categories in the ventral object processing stream
(see Figure 27.4B and C; for reviews, see Bookheimer, 2002; Gerlach, 2002; Grill-Spector
& Malach, 2004; Op de Beeck et al., 2008; Thompson-Schill, 2003). In addition to the an­
terior-posterior mapping of semantic categories within the ventral stream described by
the lesion evidence (e.g., Damasio et al., 1996), there is also a lateral-to-medial organiza­
tion. The fusiform gyrus on the ventral surface of the temporal-occipital cortex is critical
for representing object color and form (e.g., Martin, 2007; Miceli et al., 2001). Living ani­
mate things such as faces and animals elicit differential neural responses in the lateral
fusiform gyrus, whereas nonliving things (tools, vehicles) elicit differential neural re­
sponses in the medial fusiform gyrus (e.g., Chao et al., 1999; Mahon et al., 2007; Nop­
peney et al., 2006). Stimuli that are highly definable in terms of their spatial context, such
as houses and scenes, differentially activate regions anterior to these fusiform regions, in
the vicinity of parahippocampal cortex (e.g., Bar & Aminoff, 2003; Epstein & Kanwisher,
1998). Other visual stimuli also elicit consistent topographical biases in the ventral
stream, for instance, written words (see Dehaene et al., 2005, for discussion) and images
of body parts (e.g., Downing et al., 2001).

Distributed Domain-Specific Hypothesis


The basic phenomenon of consistent topographic biases by semantic category in the ven­
tral stream sits more naturally with the distributed domain-specific hypothesis than the
sensory/functional theory. To explain those data within the context of the sensory/func­
tional theory, further assumptions are necessary about why there would be an organiza­
tion by semantic category within the (putative) visual modality. In short, a hybrid model is required that combines the assumption of multiple semantics with some claim about how
information would come to be topographically segregated by semantic category. A num­
ber of such proposals have been advanced, although not always in the context of the sen­
sory/functional theory, or more generally within the context of theories that emerge from
category-specific semantic deficits (see, e.g., Gauthier et al., 2000; Haxby et al., 2001;
Ishai et al., 1999; Levy et al., 2001; Mechelli et al., 2006; Rogers et al., 2003).

To date, the emphasis of research on the organization of the ventral stream has been on
the stimulus properties that drive responses in a particular brain region, studied in rela­
tive isolation from other regions. This approach was inherited from well-established tradi­
tions in neurophysiology and psychophysics, where it has been enormously productive for
mapping psychophysical continua in primary sensory systems. It does not follow that the
same approach will yield equally useful insights for understanding the principles of the
neural organization of conceptual knowledge. The reason is that unlike the peripheral
sensory systems, the pattern of neural responses in higher order areas is only partially
driven by the physical input—it is also driven by how the stimulus is interpreted, and that
interpretation does not occur in a single, isolated region. The ventral object processing
stream is the central pathway for the extraction of object identity from visual information
in the primate brain—but what the brain does with that information about object identity
depends on how the ventral stream is connected to the rest of the brain.

We have extended the domain-specific hypothesis, as developed in the context of category-specific semantic deficits, to explain the causes of category specificity in the ventral
stream (Caramazza & Mahon, 2003; Mahon & Caramazza, 2009). This reformulation of
the theory is referred to as the distributed domain-specific hypothesis. On that view, what
is given innately is the connectivity; specialization by semantic category in the ventral
stream is driven by that connectivity (Mahon & Caramazza, 2011). The implication of this
proposal is that the organization of the ventral stream by category is (p. 566) relatively in­
variant to visually based, bottom-up constraints. This approach corrects an imbalance in
explanations of the causes of the consistent topography by semantic category in the ven­
tral object processing stream by giving greater prominence to endogenously determined
constraints on brain organization.

An important characteristic of domain-specific systems is that the computations that must be performed over items from the domain are sufficiently “eccentric” (Fodor, 1983) to merit a specialized process. In other words, the coupling across different brain regions that is
necessary for successful processing of a given domain is different in kind from the types
of coupling that are needed for other domains of knowledge. For instance, the need to in­
tegrate motor-relevant information with visual information is present for tools and other
graspable objects and less so for animals or faces. In contrast, the need to integrate affec­
tive information, biological motion processing, and visual form information is strong for
conspecifics and animals, and less so for tools or places. Thus, our proposal is that do­
main-specific constraints are expressed as patterns of connectivity among regions of the
ventral stream and other areas of the brain that process nonvisual information about the
same classes of items. For instance, specialization for faces in the lateral fusiform gyrus (fusiform face area; Martin & Weisberg, 2003; Pasley et al., 2004; Vuilleumier et al., 2004)
arises because that region of the brain has connectivity with the amygdala and the supe­
rior temporal sulcus (among other regions), which are important for the extraction of so­
cially relevant information and biological motion. Specificity for tools and manipulable ob­
jects in the medial fusiform gyrus is driven, in part, by connectivity between that region
and regions of parietal cortex that subserve object manipulation (Mahon et al., 2007;
Noppeney et al., 2006; Rushworth et al., 2006; Valyear & Culham, 2009). Connectivity-
based constraints may also be responsible for other effects of category specificity in the
ventral visual stream, such as connectivity between somatomotor areas and regions of the
ventral stream that differentially respond to body parts (extrastriate body area; Astafiev
et al., 2004; Orlov et al., 2010; Peelen & Caramazza, 2010), connectivity between left lat­
eralized frontal language processing regions and ventral stream areas specialized for
printed words (visual word form area; Dehaene et al., 2005; Martin, 2006), and connectiv­
ity between regions involved in spatial analysis and ventral stream regions showing dif­
ferential responses to highly contextualized stimuli, such as houses, scenes, and large
nonmanipulable objects (parahippocampal place area; Bar & Aminoff, 2003).

Role of Visual Experience

According to the distributed domain-specific hypothesis, the organization by category in the ventral stream not only is a reflection of the visual structure of the world but also re­
flects the structure of how ventral visual cortex is connected to other regions of the brain
(Mahon & Caramazza, 2009; Mahon et al., 2007; Riesenhuber, 2007). However, visual ex­
perience and dimensions of visual similarity are also critical in shaping the organization
of the ventral stream (Felleman & Van Essen, 1991; Op de Beeck et al., 2006)—after all,
the principal afferents to the ventral stream come from earlier stages in the visual hierar­
chy (Tanaka et al., 1991).

Although recent discussion has noted the possibility that nonvisual dimensions may be
relevant in shaping the organization of the ventral stream (Cant et al., 2009; Grill-Spector
& Malach, 2004; Martin, 2007), those accounts have given far greater prominence to the
role of visual experience in their explanation of the causes of category-specific organiza­
tion within the ventral stream. A number of hypotheses have been developed, and we
merely touch on them here to illustrate a common assumption: that the organization of
the ventral stream reflects the visual structure of the world, as interpreted by domain-
general processing constraints. Thus, the general thrust of those accounts is that the vi­
sual structure of the world is correlated with semantic category distinctions in a way that
is captured by how visual information is organized in the brain. One of the most explicit
proposals is that there are weak eccentricity preferences in higher order visual areas that
are inherited from earlier stages in the processing stream. Those eccentricity biases in­
teract with our experience of foveating some classes of items (e.g., faces) and viewing
others in the relative periphery (e.g., houses; Levy et al., 2001). Another class of propos­
als is based on the supposition that items from the same category tend to look more simi­
lar than items from different categories, and similarity in visual shape is mapped onto
ventral temporal occipital cortex (Haxby et al., 2001). It has also been proposed that a given category may require differential processing relative to other categories, for in­
stance, in terms of expertise (Gauthier et al., 1999), visual crowding (Rogers et al., 2005),
or the relevance of visual information for categorization (Mechelli et al., 2006). Still other
accounts (p. 567) appeal to “feature” similarity and distributed feature maps (Tyler et al.,
2003). Finally, it has been suggested that multiple, visually based dimensions of organiza­
tion combine super-additively to generate the boundaries among category-preferring re­
gions (Op de Beeck et al., 2008). Common to all of these accounts is the assumption that
visual experience provides the necessary structure, and that a visual dimension of organi­
zation happens to be highly correlated with semantic category.

Although visual information is important in shaping how the ventral stream is organized,
recent findings indicate that visual experience is not necessary for the same, or similar,
patterns of category specificity to be present in the ventral stream. In an early positron
emission tomography (PET) study, Büchel and colleagues (1998) showed that congenitally
blind subjects have activation for words (presented in Braille) in the same region of the
ventral stream as sighted individuals (presented visually). Pietrini and colleagues (2004)
used multivoxel pattern analyses to show that the pattern of activation over voxels in the
ventral stream was more consistent across different exemplars within a category than across exemplars from different categories. More recently, we (Mahon et al., 2009) have shown that the
same medial-to-lateral bias in category preferences on the ventral surface of the occipital
temporal cortex that is present in sighted individuals is present in congenitally blind sub­
jects. Specifically, nonliving things, compared with animals, elicit stronger activation in
medial regions of the ventral stream (see Figure 27.4).
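
The multivoxel logic referred to above (cf. Pietrini et al., 2004; Haxby et al., 2001) can be illustrated with synthetic data: if each category evokes a characteristic response pattern, correlations between patterns from exemplars of the same category exceed correlations between categories. The data below are simulated, not from any study; the number of voxels and noise level are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

# Each category gets a fixed "template" pattern over 50 simulated voxels;
# exemplars are noisy copies of their category's template.
n_voxels = 50
templates = {"animals": rng.normal(size=n_voxels),
             "tools":   rng.normal(size=n_voxels)}

def exemplar(category, noise=0.8):
    """A simulated single-exemplar activation pattern."""
    return templates[category] + noise * rng.normal(size=n_voxels)

def pattern_corr(a, b):
    """Pearson correlation between two voxel patterns."""
    return float(np.corrcoef(a, b)[0, 1])

# Correlation-based MVPA signature: within-category pattern similarity
# should exceed between-category similarity.
within = np.mean([pattern_corr(exemplar("tools"), exemplar("tools"))
                  for _ in range(100)])
between = np.mean([pattern_corr(exemplar("tools"), exemplar("animals"))
                   for _ in range(100)])
```

The same comparison can be run on patterns recorded in any modality, which is what makes the approach suitable for asking whether category structure in the ventral stream survives in the absence of visual input.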

Although these studies on category specificity in blind individuals represent only a first-
pass analysis of the role of visual experience in driving category specificity in the ventral
stream, they indicate that visual experience is not necessary for category specificity to
emerge in the ventral stream. This fact raises an important question—if visual experience
is not needed for the same topographical biases in category specificity to be present in
the ventral stream, then, what drives such organization? One possibility, as we have sug­
gested, is innate connectivity between regions of the ventral stream and other regions of
the brain that process affective, motor, and conceptual information.

Connectivity as an Innate Domain-Specific Constraint

A critical component of the distributed domain-specific hypothesis is the notion of connectivity. The most obvious candidate to mediate such networks is white matter connectivity.
However, it is important to underline that functional networks need not be restricted by
the grain of white matter connectivity, and perhaps more important, task- and state-de­
pendent changes may bias processing toward different components of a broader anatomi­
cal brain network. For instance, connectivity between lateral and orbital prefrontal re­
gions and ventral temporal-occipital cortex (Kveraga et al., 2007; Miller et al., 2003) is
critical for categorization of visual input. It remains an open question whether multiple
functional networks are subserved by this circuit, each determined by the type of visual
stimulus being categorized. For instance, when categorizing manipulable objects, connectivity between parietal-frontal somatomotor areas and prefrontal cortex may dominate,
whereas when categorizing faces other regions may express stronger functional coupling
to those same prefrontal regions. Such a suggestion would generate the expectation that
although damage to prefrontal-to-ventral stream connections may result in difficulties
categorizing all types of visual stimuli, disruption of the afferents to prefrontal cortex
from a specific category-preferring area could lead to categorization problems selective
to that domain.

Object-Associated Actions
The activation by tool stimuli of regions of the brain that mediate object-directed action
has been argued to follow naturally from the sensory/functional theory. On that theory,
the activation of dorsal structures by tool stimuli indexes the critical role of function
knowledge in the recognition of nonliving things (e.g., Boronat et al., 2004; Kellenbach et
al., 2003; Martin, 2000; Noppeney et al., 2006; Simmons & Barsalou, 2003). That argu­
ment is weakened, however, to the extent that it can be demonstrated that the integrity
of action knowledge is not necessary in order to have other types of knowledge about
tools, such as their function.

The neuropsychological phenomenon of apraxia offers a way of testing whether action knowledge is critical for supporting conceptual processing of tools. Apraxia refers to an
impairment for using objects that cannot be explained by a deficit in visual object recog­
nition or an impairment to low-level motor processes themselves. Figure 27.5A
summarizes the performance profile of the patient reported by Ochipa and colleagues
(1989) who was impaired for using objects but relatively preserved for naming the same
objects (see also Figure 27.5B for similar dissociations in a series of single-case analyses;
Negri et al., 2007; see also Rosci et al., 2003; for clear (p. 568) case studies, see Moreaud
et al., 1998; Rapcsak et al., 2001; Rumiati et al., 2001; see Rothi et al., 1991, for an influ­
ential cognitive model). Apraxic deficits for using objects are often observed subsequent
to lesions in the regions of the dorsal stream, reviewed above, that are automatically acti­
vated when participants name tools (in particular, the left inferior parietal lobule). The
fact that patients are able to name objects that they cannot use indicates that the activa­
tion of those regions during naming tasks is not, in and of itself, necessary for successful
completion of the task. At the same time, lesions to parietal cortex, in the context of le­
sions to the middle temporal gyrus or frontal motor areas, do modulate performance in
object identification. In a recent analysis (Mahon et al., 2007), a group of unilateral
stroke patients were separated into two groups according to the anatomical criterion of
having lesions involving (see Figure 27.5C, left) or not involving parietal cortex (see Fig­
ure 27.5C, right). There was a relationship between performance in object identification
and object use at the group level only in patients with lesions involving parietal cortex,
suggesting that action knowledge associated with objects is not irrelevant for successful
identification.


Other neuropsychological data indicate that the integrity of action knowledge is not nec­
essary for patients to have accurate knowledge of object function. Figure 27.5 depicts the
performance of patient WC (Buxbaum et al., 2000) on two picture-matching tasks. In a
picture-matching task that required knowledge of object manipulation, performance was
impaired; however, in a picture-matching task that required knowledge of object function,
performance was spared. Functional imaging studies (Boronat et al., 2004; Canessa et al.,
2008; Kellenbach et al., 2003) converge with those neuropsychological data in showing
that manipulation, but not function, knowledge modulates neural responses in the inferi­
or parietal lobule. There is also evidence, from both functional neuroimaging (e.g., Canes­
sa et al., 2008) and neuropsychology (e.g., Sirigu et al., 1991), that temporal, and not
parietal, cortex may be involved in the representation of function knowledge of objects.

The convergence between the neuropsychological evidence from apraxia and the func­
tional imaging evidence indicates that although there is a dedicated system for knowl­
edge of object manipulation, that system is not critically involved in representing knowl­
edge of object function. This suggests that the automatic engagement of action process­
ing by manipulable objects, as observed in neuroimaging, may have consequences for a
theory of pragmatics or action, but not necessarily for a theory of semantics (Goodale &
Milner, 1992; Jeannerod & Jacob, 2005). This in turn weakens the claim that automatic
activation of dorsal stream structures by manipulable objects is evidence for the sensory/
functional theory.

Relation Between Sensory, Motor, and Conceptual Knowledge
Early formulations of the sensory/functional theory assumed that conceptual content, al­
though tied in important ways to the sensory and motor systems, was more abstract than
the token-based information contained within the sensory and motor systems (Warrington
& McCarthy, 1983, 1987; Warrington & Shallice, 1984; see also Crutch & Warrington,
2003). More recent formulations of the multiple semantics approach have argued, within
the embodied cognition framework, that conceptual content can be reductively grounded
in sensory and motor processes (e.g., Barsalou, 1999, 2008; H. Damasio et al., 2004;
Gallese & Lakoff, 2005; Patterson et al., 2007; Pulvermüller, 2005; Prinz, 2002; Zwaan,
2004).

The first detailed articulation of the embodied cognition framework was by Alan Allport (1985), who proposed that conceptual knowledge is organized according to sensory and motor modalities and that the information represented within different modalities is format specific:

The essential idea is that the same neural elements that are involved in the coding of
the sensory attributes of a (possibly unknown) object presented to eye or hand or
ear also make up the elements of the auto-associated activity-patterns that represent familiar object-concepts in “semantic memory.” This model is, of course, in radical opposition to the view, apparently held by many psychologists, that “semantic memory” is represented in some abstract, modality-independent, “conceptual” domain remote from the mechanisms of perception and motor organization. (Allport, 1985, p. 53, emphasis original)

Figure 27.5 Relation between knowledge of how to manipulate tools and other knowledge of tools. a. (Upper Left). Ochipa and colleagues (1989) reported a patient with a severe impairment for manipulating objects but relatively preserved naming of the same objects. b. (Upper Right). A multiple single case study of unselected unilateral stroke patients asked patients to use and identify the same set of objects (Negri et al., 2007). Performance of the patients is plotted as t values (Crawford & Garthwaite, 2006) compared to control (n = 25) performance. c. Lesions to parietal cortex, in the context of lesions to lateral temporal and frontal regions, can be instrumental in modulating the relationship between performance in object identification and object use, at the group level (see Mahon et al., 2007, Figure 7, for details and lesion overlap analyses). Each circle in the plots represents the performance of a single patient in object identification and object use. The 95% confidence intervals around the regression lines are shown. Reproduced with permission from Mahon and colleagues (2007). d. (Lower Graph). Patient WC (Buxbaum et al., 2000) was impaired for matching pictures based on how objects are manipulated but was spared for matching pictures based on the function of the objects.

One type of evidence, discussed above, that has been argued to support an embodied rep­
resentation of object concepts is the observation that regions of the brain that directly
mediate object-directed action are automatically activated when participants observe manipulable objects. However, the available neuropsychological evidence (see Figure 27.5) reduces confidence in the claim that action knowledge plays a critical role in
grounding the diverse types of knowledge that we have about tools. The strongest evi­
dence for the relevance of motor and perceptual processes to conceptual processing is
provided by demonstrations that the sensory and motor systems are automatically en­
gaged by linguistic stimuli that imply action (e.g., Buccino et al., 2005; Boulenger et al.,
2006; Glenberg & Kaschak, 2002; Oliveri et al., 2004). It has also been demonstrated that
activation of the motor system automatically spreads to conceptual and perceptual levels
of processing (e.g., Pulvermüller et al., 2005).

The embodied cognition hypothesis makes strong predictions about the integrity of con­
ceptual processes after damage to sensory and motor processes. It predicts, necessarily,
and as Allport wrote, that “…the loss of particular attribute information in semantic memory should be accompanied by a corresponding perceptual (agnostic) deficit” (p. 55, emphasis original). Although there are long traditions within neuropsychology of studying patients with deficits for sensory and motor knowledge, only recently have those
deficits been of such clear theoretical relevance to hypotheses about the nature of seman­
tic memory. Systematic and theoretically informed studies of such patients will play a piv­
otal role in evaluating the relation between sensory, motor, and conceptual knowledge.
Central to that enterprise will be to specify how information is dynamically exchanged be­
tween systems, in the context of specific task requirements. This will be important for de­
termining the degree to which sensory and motor activation is in fact a critical compo­
nent of conceptual processing (see Machery, 2007; Mahon & Caramazza, 2008, for discus­
sion). It is theoretically possible (and in our view, likely) that although concepts are not
exhausted by sensory and motor information, the organization of “abstract” concepts is
nonetheless shaped in important ways by the structure of the sensory and motor systems.
It is also likely, in our view, that processing of such “abstract” conceptual content is heav­
ily interlaced with activation of the sensory and motor systems. We have referred to this
view as “grounding by interaction” (Mahon & Caramazza, 2008).

Grounding by Interaction: A Hypothesis about the Representation of Conceptual Content
Consider the hypothetical apraxic patient with whom one might have a conversation
about hammers. The patient might be able to recount the history of the hammer as an in­
vention, the materials of which the first hammer was made, or what hammers typically
weigh. The patient may even look at a hammer and name it without apparent difficulty.
But when presented with a hammer, the patient is profoundly impaired at demonstrating
how the object is physically manipulated to accomplish its function. This impairment is
not due to a peripheral motor deficit because the patient may be able to imitate meaning­
less gestures without difficulty. What is the functional locus of damage in the patient? Has
the patient “lost” part of her concept of “hammer”?

On one level, the patient clearly does retain the concept of hammer, and in this sense, the concept of hammer is “symbolic,” “abstract,” and “qualitatively different” from the motor “knowledge” that is compromised in the patient. On another level, when the patient instantiates the abstract and symbolic concept of hammer, that instantiation occurs in isolation from sensory-motor information that, in the normal system, would go along with the instantiation of the concept.

Thus, on the one hand, there is a level of representation of meaning that is sufficiently
general and flexible that it may apply to inputs from diverse sensory modalities and be ex­
pressed in action through diverse output modalities. The abstract and symbolic represen­
tation of hammer could be accessed from touch, vision, or audition; similarly, that repre­
sentation could be “expressed” by pantomiming the use of a hammer, producing the
sounds that make up the word hammer, writing the word hammer, and so on. In
short, there is a level of conceptual representation that is abstract and symbolic and that
is not exhausted by information represented in the sensory and motor systems.

On the other hand, conceptual information that is represented at an abstract and symbol­
ic level does not, in and of itself, exhaust what we know about the world. What we know
about the world depends also on interactions between abstract conceptual content and
the sensory and motor systems. There are two ways in which such interactions may come
about. First, abstract and symbolic concepts can be activated by events in the world that
are processed by the sensory systems, and realize changes in the world through the mo­
tor system (see Jeannerod & Jacob, 2005, for relevant discussion). Second, the instantia­
tion of a given abstract and symbolic concept always occurs in a particular situation; as
such, the instantiation of that concept in that situation may involve highly specific senso­
ry and motor processes.

Within the grounding by interaction framework, sensory and motor information colors
conceptual processing, enriches it, and provides it with a relational context. The activa­
tion of the sensory and motor systems during conceptual processing serves to ground ab­
stract and symbolic representations in the rich sensory and motor content that mediates
our physical interaction with the world. Of course, the specific sensory and motor infor­
mation that is activated may change depending on the situation in which the abstract and
symbolic conceptual representation is instantiated.

On the grounding by interaction view, the specific sensory and motor information that
goes along with the instantiation of a concept is not constitutive of that concept. Of
course, that does not mean that that specific sensory and motor information is not important for the instantiation of a concept, in a particular way, at a given point in time. Indeed, such sensory and motor information may constitute, in part, that instantiation. A useful analogy in this regard is to linguistic processing. There is no upper limit (in
principle) on the number of completely novel sentences that a speaker may utter. This
fact formed one of the starting points for formal arguments against the behaviorist para­
digm (Chomsky, 1959). Consider the (indefinite) set of sentences that a person may utter
in his life: Those sentences can have syntactic structures that are in no way specifically tied to the particular words through which the expression of those syntactic structures
were realized. The syntax of a sentence is not exhausted by an account of the words of
which it is composed; this is the case even though it may be the first time that that syn­
tactic structure has ever been produced, and even though the expression of that particu­
lar syntactic structure clearly depended (de facto) on the presence of those particular
words. To close the analogy: Concepts “wear” sensory and motor information in the way
that the syntax of a sentence “wears” particular words.

Toward a Synthesis
We have organized this review around theoretical explanations of category specificity in
the human brain. One theme that emerges is the historical progression from theories
based on a single principle of organization to theories that integrate multiple dimensions
of organization. This progression is due to the broad recognition in the field that a single
dimension will not be sufficient to explain all aspects of the organization of object knowledge in the brain. However, not every dimension or principle of organization is of equal importance, because not all dimensions have the same explanatory scope. A relative hierarchy of principles is therefore necessary to determine which of the many known facts are theoretically important and which are of only marginal significance (Caramazza & Mahon, 2003).

Two broad findings emerge from cognitive neuropsychological research. First, patients
have been reported with disproportionate impairments for a modality or type of knowl­
edge (e.g., visual-perceptual knowledge—see Figure 27.2B; manipulation knowledge—see
Figure 27.5). Second, category-specific semantic deficits are associated with impairments
for all types of knowledge about the impaired category (see Figure 27.2A). Analogues to
those two facts are also found in functional neuroimaging. First, the attributes of some
categories of objects (e.g., tools) are differentially represented in modality specific sys­
tems (i.e., motor systems). Second, within a given modality specific system (e.g., ventral
visual pathway) there is functional organization by semantic category (e.g., living animate
vs. nonliving; see Figure 27.4 for an overview). Thus, across both neuropsychological
studies and functional imaging studies, the broad empirical generalization emerges that
there are two, orthogonal, constraints on the organization of object knowledge: object do­
main and sensory-motor modality. This empirical generalization is neutral with respect to
how one explains the causes of category-specific effects in both functional neuroimaging
and neuropsychology.

Many theoretical proposals of the causes of category specificity articulate dimensions along which semantic categories differ (e.g., Cree & McRae, 2003; Devlin et al., 1998;
Gauthier et al., 2000; Haxby et al., 2001; Humphreys & Forde, 2001; Laws & Gale, 2002;
Levy et al., 2001; Mechelli et al., 2006; Op de Beeck et al., 2008; Rogers et al., 2004; Sar­
tori & Lombardi, 2004; Simmons & Barsalou, 2003; Tranel et al., 1997; Tyler & Moss,
2001; Warrington & Shallice, 1984; Zannino et al., 2006). Understanding the role that
such dimensions play in the genesis of category specificity in a particular part of the brain, or a particular component of a cognitive model, will be central to characterizing
the functioning of that component of the system. However, progress in understanding the
causes of category specificity in one region of the brain, or one functional component of a
cognitive model, will require an understanding of how category specificity is realized
throughout the whole brain and throughout the whole cognitive model.

All current theories of the organization of conceptual knowledge assume that a concept is
composed of distinct types of information. This shared assumption permits an explanation
of how thinking about a single concept (e.g., hammer) can engage different regions of the
brain that process distinct types of information (e.g., sensory vs. motor). It also allows
for an account of how patients may present with impairments for a type or modality of
knowledge (e.g., know what a hammer looks like, but not know how to use it). However,
that assumption begs the question of how the different types of information that consti­
tute a given concept are functionally unified. A central theoretical issue to be addressed
by the field is to understand the nature of the mechanisms that unify different types of
knowledge about the same entity in the world, and that give rise to a functionally unitary
concept of that entity.

Our proposal, the distributed domain-specific hypothesis (Caramazza & Mahon, 2003; Mahon & Caramazza, 2009, 2011), is that the organization of conceptual knowledge in the brain reflects the final product of a complex tradeoff of pressures, some of
which are expressed locally within a given brain region, and some of which are expressed
as connectivity between that region and the rest of the brain. Our suggestion is that con­
nectivity within a domain-specific neural circuit is the first, or broadest, principle accord­
ing to which conceptual knowledge is organized. For instance, visual motion properties of
living animate things are represented in a different region or system than visual form
properties of living animate things. In addition, affective properties of living animate
things may be represented by other functionally and neuroanatomically distinct systems.
However, all those types of information constitute the domain “living animate.” For that
reason, it is critical to specify the nature of the functional connectivity that relates pro­
cessing across distinct subsystems specialized for different types of information. The ba­
sic expectation of the distributed domain-specific hypothesis is that the functional con­
nectivity that relates processing across distinct types of information (e.g., emotional val­
ue versus visual form) will be concentrated around those domains that have had evolu­
tionarily important histories. The strong prediction that follows from that view is that it is
those neural circuits that are disrupted or disorganized after brain damage in patients
with category-specific semantic deficits.

Independently of whether the distributed domain-specific hypothesis is empirically confirmed, it serves to highlight two key aspects of human conceptual processing. First, humans do not have systems that support rich conceptual knowledge of objects just to have
them. We have those systems because they serve action, and ultimately have been in the
service of survival (Goodale & Milner, 1992). An understanding of the architecture of the
conceptual system must therefore be situated in the context of the real-world computational problems that the conceptual system is structured to support. Second, human behavior arises as a result of the integration of multiple cognitive processes that individually operate over distinct types of knowledge. On the distributed domain-specific hypothe­
sis, the distinct (and potentially modular) processes within the sensory, motor, and affec­
tive systems are components of broader structures within the mind/brain. This framework
thus emphasizes the need to understand how different types of cognitive processes, oper­
ating over different types of information, work in concert to orchestrate behavior.

In the more than 25 years since Warrington and colleagues’ first detailed reports of pa­
tients with category-specific semantic deficits, new fields of study have emerged around
the study of the organization and representation of conceptual knowledge. Despite that
progress, the theoretical questions that currently occupy researchers are the same as
those that were initially framed and debated two decades ago: What are the principles of
neural organization that give rise to effects of category specificity? Are different types of
information involved in processing different semantic categories, and if so, what distin­
guishes those different types of information? Future research will undoubtedly build on
the available theories and redeploy their individual assumptions within new theoretical
frameworks.

Author Note
Bradford Z. Mahon was supported in part by NIH training grant 5 T32 19942-13 and
R21NS076176-01A1; Alfonso Caramazza was supported by grant DC006842 from the Na­
tional Institute on Deafness and Other Communication Disorders. Sections of this article
are drawn from three previous publications of the same authors: Mahon and Caramazza,
2008, 2009, and 2011. The authors are grateful to Erminio Capitani, Marcella Laiacona,
Alex Martin, and Daniel Schacter for their comments on an earlier draft.

References
Allport, D. A. (1985). Distributed memory, modular subsystems and dysphasia. In S. K.
Newman & R. Epstein (Eds.), Current perspectives in dysphasia. New York: Churchill Liv­
ingstone.

Astafiev, S. V., et al. (2004). Extrastriate body area in human occipital cortex responds to
the performance of motor actions. Nature Neuroscience, 7, 542–548.

Baillargeon, R. (1998). Infants’ understanding of the physical world. In M. Sabourin, F. Craik, & M. Robert (Eds.), Advances in psychological science: 2. Biological and cognitive aspects (pp. 503–529). London: Psychology Press.

Bar, M., & Aminoff, E. (2003). Cortical analysis of visual context. Neuron, 38, 347–358.

Barbarotto, R., Capitani, E., & Laiacona, M. (1996). Naming deficit in herpes simplex en­
cephalitis. Acta Neurologica Scandinavica, 93, 272–280.

Barbarotto, R., Capitani, E., Spinnler, H., & Trivelli, C. (1995). Slowly progressive seman­
tic impairment with category specificity. Neurocase, 1, 107–119.

Barbarotto, R., Laiacona, M., Macchi, V., & Capitani, E. (2002). Picture reality decision,
semantic categories, and gender: A new set of pictures, with norms and an experimental
study. Neuropsychologia, 40, 1637–1653.

Barsalou, L. W. (1999). Perceptual symbol systems. Behavioral and Brain Sciences, 22, 637–660.

Barsalou, L. W. (2008). Grounded cognition. Annual Review of Psychology, 59, 617–645.

Beauchamp, M. S., Lee, K. E., Haxby, J. V., & Martin, A. (2002). Parallel visual motion pro­
cessing streams for manipulable objects and human movements. Neuron, 24, 149–159.

Beauchamp, M. S., Lee, K. E., Haxby, J. V., & Martin, A. (2003). FMRI responses to video
and point-light displays of moving humans and manipulable objects. Journal of Cognitive
Neuroscience, 15, 991–1001.

Beauvois, M.-F. (1982). Optic aphasia: A process of interaction between vision and language. Philosophical Transactions of the Royal Society of London B, 298, 35–47.

Beauvois, M.-F., Saillant, B., Meininger, V., & Lhermitte, F. (1978). Bilateral tactile aphasia:
A tacto-verbal dysfunction. Brain, 101, 381–401.

Blundo, C., Ricci, M., & Miller, L. (2006). Category-specific knowledge deficit for animals
in a patient with herpes simplex encephalitis. Cognitive Neuropsychology, 23, 1248–1268.

Bookheimer, S. (2002). Functional MRI of language: New approaches to understanding the cortical organization of semantic processing. Annual Review of Neuroscience, 25, 151–188.

Borgo, F., & Shallice, T. (2001). When living things and other “sensory-quality” categories
behave in the same fashion: A novel category-specific effect. Neurocase, 7, 201–220.

Borgo, F., & Shallice, T. (2003). Category specificity and feature knowledge: Evidence
from new sensory-quality categories. Cognitive Neuropsychology, 20, 327–353.

Boronat, C. B., Buxbaum, L. J., Coslett, H. B., Tang, K., Saffran, E. M., et al. (2004). Distinctions between manipulation and function knowledge of objects: Evidence from functional magnetic resonance imaging. Cognitive Brain Research, 23, 361–373.

Boulenger, V., Roy, A. C., Paulignan, Y., Deprez, V., Jeannerod, M., & Nazir, T. A. (2006).
Cross-talk between language processes and overt motor behavior in the first 200 msec of
processing. Journal of Cognitive Neuroscience, 18, 1607–1615.

Brambati, S. M., Myers, D., Wilson, A., Rankin, K. P., Allison, S. C., Rosen, H. J., Miller, B.
L., & Gorno-Tempini, M. L. (2006). The anatomy of category-specific object naming in
neurodegenerative diseases. Journal of Cognitive Neuroscience, 18, 1644–1653.

Bright, P., Moss, H. E., Stamatakis, E. A., & Tyler, L. K. (2005). The anatomy of object pro­
cessing: The role of anteromedial temporal cortex. Quarterly Journal of Experimental Psy­
chology B, 58, 361–377.

Buccino, G., Riggio, L., Melli, G., Binkofski, F., Gallese, V., & Rizzolatti, G. (2005). Listen­
ing to action related sentences modulates the activity of the motor system: A combined
TMS and behavioral study. Cognitive Brain Research, 24, 355–363.

Büchel, C., Price, C. J., & Friston, K. (1998). A multimodal language region in the ventral
visual pathway. Nature, 394, 274–277.

Buxbaum, L. J., Veramonti, T., & Schwartz, M. F. (2000). Function and manipulation tool
knowledge in apraxia: Knowing “what for” but not “how.” Neurocase, 6, 83–97.

Canessa, N., Borgo, F., Cappa, S. F., Perani, D., Falini, A., Buccino, G., Tettamanti, M., &
Shallice, T. (2008). The different neural correlates of action and functional knowledge in
semantic memory: An fMRI study. Cerebral Cortex, 18, 740–751.

Cant, J. S., et al. (2009). fMR-adaptation reveals separate processing regions for the per­
ception of form and texture in the human ventral stream. Experimental Brain Research,
192, 391–405.

Cantlon, J. F., Platt, M., & Brannon, E. M. (2009). The number domain. Trends in Cogni­
tive Sciences, 13 (2), 83–91.

Capitani, E., Laiacona, M., Mahon, B., & Caramazza, A. (2003). What are the facts of cate­
gory-specific deficits? A critical review of the clinical evidence. Cognitive Neuropsycholo­
gy, 20, 213–262.

Caramazza, A. (1986). On drawing inferences about the structure of normal cognitive sys­
tems from the analysis of patterns of impaired performance: The case for single-patient
studies. Brain and Cognition, 5, 41–66.

Caramazza, A. (1992). Is cognitive neuropsychology possible? Journal of Cognitive Neuroscience, 4, 80–95.

Caramazza, A. (1998). The interpretation of semantic category-specific deficits: What do they reveal about the organization of conceptual knowledge in the brain? Neurocase, 4, 265–272.

Caramazza, A., Hillis, A. E., Rapp, B. C., & Romani, C. (1990). The multiple semantics hy­
pothesis: Multiple confusions? Cognitive Neuropsychology, 7, 161–189.

Caramazza, A., & Mahon, B. Z. (2003). The organization of conceptual knowledge: The ev­
idence from category-specific semantic deficits. Trends in Cognitive Sciences, 7, 354–361.

Caramazza, A., & Mahon, B. Z. (2006). The organisation of conceptual knowledge in the
brain: The future’s past and some future directions. Cognitive Neuropsychology, 23, 13–
38.

Caramazza, A., & Shelton, J. R. (1998). Domain specific knowledge systems in the brain:
The animate-inanimate distinction. Journal of Cognitive Neuroscience, 10, 1–34.

Carey, S., & Spelke, E. S. (1994). Domain specific knowledge and conceptual change. In
L. Hirschfeld & S. Gelman (Eds.), Mapping the mind: Domain specificity in cognition and
culture (pp. 169–200). Cambridge, UK: Cambridge University Press.

Carroll, E., & Garrard, P. (2005). Knowledge of living, nonliving and “sensory quality” cat­
egories in semantic dementia. Neurocase, 11, 338–350.

Chao, L. L., Haxby, J. V., & Martin, A. (1999). Attribute-based neural substrates in posteri­
or temporal cortex for perceiving and knowing about objects. Nature Neuroscience, 2,
913–919.

Chao, L. L., & Martin, A. (2000). Representation of manipulable man-made objects in the
dorsal stream. NeuroImage, 12, 478–484.

Chao, L. L., Weisberg, J., & Martin, A. (2002). Experience-dependent modulation of cate­
gory related cortical activity. Cerebral Cortex, 12, 545–551.

Coslett, H. B., & Saffran, E. M. (1992). Optic aphasia and the right hemisphere: A replica­
tion and extension. Brain and Language, 43, 148–161.

Crawford, J. R., & Garthwaite, P. H. (2006). Methods of testing for a deficit in single case
studies: Evaluation of statistical power by Monte Carlo simulation. Cognitive Neuropsy­
chology, 23, 877–904.

Cree, G. S., & McRae, K. (2003). Analyzing the factors underlying the structure and computation of the meaning of chipmunk, cherry, chisel, cheese, and cello (and many other such concrete nouns). Journal of Experimental Psychology: General, 132, 163–201.

Crutch, S. J., & Warrington, E. K. (2003). The selective impairment of fruit and vegetable knowledge: A multiple processing channels account of fine-grain category specificity. Cognitive Neuropsychology, 20, 355–372.

Culham, J. C., Danckert, S. L., DeSouza, J. F. X., Gati, J. S., Menon, R. S., & Goodale, M. A.
(2003). Visually guided grasping produces fMRI activation in dorsal but not ventral
stream brain areas. Experimental Brain Research, 153, 180–189.

Damasio, H., Grabowski, T. J., Tranel, D., & Hichwa, R. D. (1996). A neural basis for lexi­
cal retrieval. Nature, 380, 499–505.

Damasio, H., Tranel, D., Grabowski, T., Adolphs, R., & Damasio, A. (2004). Neural systems
behind word and concept retrieval. Cognition, 92, 179–229.

Dehaene, S., Cohen, L., Sigman, M., & Vinckier, F. (2005). The neural code for written
words: A proposal. Trends in Cognitive Sciences, 9, 335–341.

Devlin, J., Gonnerman, L., Andersen, E., & Seidenberg, M. (1998). Category-specific se­
mantic deficits in focal and widespread brain damage: A computational account. Journal
of Cognitive Neuroscience, 10, 77–94.

Devlin, J. T., Moore, C. J., Mummery, C. J., Gorno-Tempini, M. L., Phillips, J. A., Noppeney,
U., Frackowiak, R. S. J., Friston, K. J., & Price, C. J. (2002). Anatomic constraints on cogni­
tive theories of category-specificity. NeuroImage, 15, 675–685.

Dixon, M. J., Piskopos, M., & Schweizer, T. A. (2000). Musical instrument naming impair­
ments: The crucial exception to the living/nonliving dichotomy in category-specific ag­
nosia. Brain and Cognition, 43, 158–164.

Duchaine, B. C., & Yovel, G. (2008). Face recognition. The Senses: A Comprehensive Ref­
erence, 2, 329–357.

Duchaine, B. C., Yovel, G., Butterworth, E. J., & Nakayama, K. (2006). Prosopagnosia as
an impairment to face-specific mechanisms: Elimination of the alternative hypotheses in a
developmental case. Cognitive Neuropsychology, 23, 714–747.

Eggert, G. H. (1977). Wernicke’s works on aphasia: A sourcebook and review (Vol. 1). The
Hague: Mouton.

Ellis, A. W., Young, A. W., & Critchley, A. M. R. (1989). Loss of memory for people follow­
ing temporal lobe damage. Brain, 112, 1469–1483.

Epstein, R., & Kanwisher, N. (1998). A cortical representation of the local visual environ­
ment. Nature, 392, 598–601.

Fang, F., & He, S. (2005). Cortical responses to invisible objects in the human dorsal and
ventral pathways. Nature Neuroscience, 8, 1380–1385.

Farah, M., & McClelland, J. (1991). A computational model of semantic memory impair­
ment: modality specificity and emergent category specificity. Journal of Experimental Psy­
chology: General, 120, 339–357.

Farah, M. J., & Rabinowitz, C. (2003). Genetic and environmental influences on the orga­
nization of semantic memory in the brain: Is “living things” an innate category? Cognitive
Neuropsychology, 20, 401–408.

Feigenson, L., Dehaene, S., & Spelke, E. S. (2004). Core systems of number. Trends in
Cognitive Sciences, 8, 307–314.

Felleman, D. J., & Van Essen, D. C. (1991). Distributed hierarchical processing in primate
visual cortex. Cerebral Cortex, 1, 1–47.

Fodor, J. (1983). Modularity of mind. Cambridge, MA: MIT Press.


Funnell, E., & Sheridan, J. (1992). Categories of knowledge: Unfamiliar aspects of living
and nonliving things. Cognitive Neuropsychology, 9, 135–153.

Gainotti, G. (2000). What the locus of brain lesion tells us about the nature of the cogni­
tive defect underlying category-specific disorders: A review. Cortex, 36, 539–559.

Gallistel, C. R. (1990). The organization of learning. Cambridge, MA: Bradford/MIT Press.

Gallese, V., & Lakoff, G. (2005). The brain’s concepts: The role of the sensory-motor sys­
tem in reason and language. Cognitive Neuropsychology, 22, 455–479.

Garrard, P., Patterson, K., Watson, P. C., & Hodges, J. R. (1998). Category-specific seman­
tic loss in dementia of Alzheimer’s type: Functional-anatomical correlations from cross-
sectional analyses. Brain, 121, 633–646.

Gauthier, I., Skudlarski, P., Gore, J. C., & Anderson, A. W. (2000). Expertise for cars and
birds recruits brain areas involved in face recognition. Nature Neuroscience, 3, 191–197.

Gauthier, I., et al. (1999) Activation of the middle fusiform “face area” increases with ex­
pertise in recognizing novel objects. Nature Neuroscience, 2, 568–573.

Gelman, R. (1990). First principles organize attention to and learning about relevant da­
ta: Number and the animate-inanimate distinction as examples. Cognitive Science, 14,
79–106.

Gerlach, C. (2007). A review of functional imaging studies on category specificity. Journal


of Cognitive Neuroscience, 19, 296–314.

Glenberg, A. M., & Kaschak, M. P. (2002). Grounding language in action. Psychonomic


Bulletin and Review, 9, 558–565.

Goodale, M. A., & Milner, A. D. (1992). Separate visual pathways for perception and ac­
tion. Trends in Neurosciences, 15, 20–25.

Grill-Spector, K., & Malach, R. (2004). The human visual cortex. Annual Review Neuro­
science, 27, 649–677.

Hart, J., Anand, R., Zoccoli, S., Maguire, M., Gamino, J., Tillman, G., King, R., & Kraut, M.
A. (2007). Neural substrates of semantic memory. Journal of the International Neuropsy­
chology Society, 13, 865–880.

Hart, J., Jr., Berndt, R. S., & Caramazza, A. (1985). Category-specific naming deficit fol­
lowing cerebral infarction. Nature, 316, 439–440.

Haxby, J. V., Gobbini, M. I., Furey, M. L., Ishai, A., Schouten, J. L., & Pietrini, P. (2001).
Distributed and overlapping representations of faces and objects in ventral temporal cor­
tex. Science, 293, 2425–2430.

Page 33 of 42
Organization of Conceptual Knowledge of Objects in the Human Brain

Hécaen, H., & De Ajuriaguerra, J. (1956). Agnosie visuelle pour les objets inanimées par
lésion unilatérale gauche. Révue Neurologique, 94, 222–233.

Hermer, L., & Spelke, E. S. (1994). A geometric process for spatial reorientation in young
children. Nature, 370, 57–59.

Hillis, A. E., & Caramazza, A. (1991). Category-specific naming and comprehension im­
pairment: A double dissociation. Brain, 114, 2081–2094.

Hillis, A. E., & Caramazza, A. (1995). Cognitive and neural mechanisms underlying visual
and semantic processing: Implications from “optic aphasia.” Journal of Cognitive Neuro­
science, 7, 457–478.

Humphreys, G. W., & Forde, E. M. E. (2001). Hierarchies, similarity, and interactivity in


object recognition: “Category-specific” neuropsychological deficits. Behavioral and Brain
Science, 24, 453–475.

Humphreys, G. W., & Forde, E. M. E. (2005). Naming a giraffe but not an animal: Base-
level but not superordinate naming in a patient with impaired semantics. Cognitive Neu­
ropsychology, 22, 539–558.

Humphreys, G. W., Riddoch, M. J., & Quinlan, P. T. (1988). Cascade processes in


(p. 575)

picture identification. Cognitive Neuropsychology, 5, 67–103.

Ishai, A., Ungerleider, L. G., Martin, A., Schourten, J. L., & Haxby, J. V. (1999). Distributed
representation of objects in the human ventral visual pathway. Proceedings of the Nation­
al Academy of Sciences U S A, 96, 9379–9384.

Jeannerod, M., & Jacob, P. (2005). Visual cognition: A new look at the two-visual systems
model. Neuropsychologia, 43, 301–312.

Johnson-Frey, S. H. (2004). The neural bases of complex tool use in humans. Trends in
Cognitive Sciences, 8, 71–78.

Kable, J. W., Lease-Spellmeyer, J., & Chatterjee, A. (2002). Neural substrates of action
event knowledge. Journal of Cognitive Neuroscience, 14, 795–805.

Keil, F. C. (1981). Constraints on knowledge and cognitive development. Psychological Re­


view, 88, 197–227.

Kellenbach, M. L., Brett, M., & Patterson, K. (2003). Actions speak louder than functions:
The importance of manipulability and action in tool representation. Journal of Cognitive
Neuroscience, 15, 20–46.

Kemmerer, D., Gonzalez Castillo, J., Talavage, T., Patterson, S., & Wiley, C. (2008). Neu­
roanatomical distribution of five semantic components of verbs: Evidence from fMRI.
Brain and Language, 107, 16–43.

Page 34 of 42
Organization of Conceptual Knowledge of Objects in the Human Brain

Kiani, R., Esteky, H., Mirpour, K., & Tanaka, K. (2007). Object category structure in re­
sponse patterns of neuronal population in monkey inferior temporal cortex. Journal of
Neurophysiology, 97, 4296–4309.

Kveraga, K., et al. (2007). Magnocellular projections as the trigger of top-down facilita­
tion in recognition. Journal of Neuroscience, 27, 13232–13240.

Laiacona, M., Barbarotto, R., & Capitani, E. (1993). Perceptual and associative knowledge
in category specific impairment of semantic memory: A study of two cases. Cortex, 29,
727–740.

Laiacona, M., Barbarotto, R., & Capitani, E. (1998). Semantic category dissociation in
naming: Is there a gender effect in Alzheimer disease? Neuropsychologia, 36, 407–419.

Laiacona, M., Barbarotto, R., & Capitani, E. (2005). Animals recover but plant life knowl­
edge is still impaired 10 years after herpetic encephalitis: the long-term follow-up of a pa­
tient. Cognitive Neuropsychology, 22, 78–94.

Laiacona, M., Barbarotto, R., & Capitani, E. (2006). Human evolution and the brain repre­
sentation of semantic knowledge: Is there a role for sex differences? Evolution and Hu­
man Behaviour, 27, 158–168.

Laiacona, M., & Capitani, E. (2001). A case of prevailing deficit on nonliving categories or
a case of prevailing sparing of living categories? Cognitive Neuropsychology, 18, 39–70.

Laiacona, M., Capitani, E., & Caramazza, A. (2003). Category-specific semantic deficits do
not reflect the sensory-functional organisation of the brain: A test of the “sensory-quality”
hypothesis. Neurocase, 9, 3221–3231.

Lambon Ralph, M. A., Howard, D., Nightingale, G., & Ellis, A. W. (1998). Are living and
non-living category-specific deficits causally linked to impaired perceptual or associative
knowledge? Evidence from a category-specific double dissociation. Neurocase, 4, 311–
338.

Lambon Ralph, M. A., Lowe, C., & Rogers, T. T. (2007). Neural basis of category-specific
semantic deficits for living things: Evidence from semantic dementia, HSVE and a neural
network model. Brain, 130, 1127–1137.

Laws, K. R., & Gale, T. M. (2002). Category-specific naming and the “visual” characteris­
tics of line drawn stimuli. Cortex, 38, 7–21.

Laws, K. R., & Neve, C. (1999). A “normal” category-specific advantage for naming living
things. Neuropsychologia, 37, 1263–1269.

Levy, I., Hasson, U., Avidan, G., Hendler, T., & Malach, R. (2001). Center-periphery organi­
zation of human object areas. Nature Neuroscience, 4, 533–539.

Page 35 of 42
Organization of Conceptual Knowledge of Objects in the Human Brain

Lewis, J. W. (2006). Cortical networks related to human use of tools. Neuroscientist, 12,
211–231.

Lhermitte, F., & Beauvois, M.-F. (1973). A visual speech disconnection syndrome: Report
of a case with optic aphasia, agnosic alexia and color agnosia. Brain, 96, 695–714.

Luzzatti, C., & Davidoff, J. (1994). Impaired retrieval of object-color knowledge with pre­
served color naming. Neuropsychologia, 32, 1–18.

Lyons, F., Kay, J., Hanley, J. R., & Haslam, C. (2006). Selective preservation of memory for
people in the context of semantic memory disorder: Patterns of association and dissocia­
tion. Neuropsychologia, 44, 2887–2898.

Mahon, B. Z., & Caramazza, A. (2003). Constraining questions about the organisation and
representation of conceptual knowledge. Cognitive Neuropsychology, 20, 433–450.

Mahon, B. Z., & Caramazza, A. (2005). The orchestration of the sensory-motor systems:
Clues from neuropsychology. Cognitive Neuropsychology, 22, 480–494.

Mahon, B. Z., & Caramazza, A. (2008). A critical look at the embodied cognition hypothe­
sis and a new proposal for grounding conceptual content. Journal of Physiology—Paris,
102, 50–70.

Mahon, B. Z., & Caramazza, A. (2011). The distributed domain-specific hypothesis. Trends
in Cognitive Sciences. In Production

Mahon, B. Z., Milleville, S., Negri, G. A. L., Rumiati, R. I., Martin, A., & Caramazza, A.
(2007). Action-related properties of objects shape object representations in the ventral
stream. Neuron, 55, 507–520.

Mahon, B. Z., et al. (2009) Category-specific organization in the human brain does not re­
quire visual experience. Neuron, 63, 397–405.

Mahon, B. Z., & Caramazza, A. (2009). Concepts and categories: A cognitive neuropsy­
chological perspective. Annual Review of Psychology, 60, 1–15.

Machery, E. (2007). Concept empiricism: A methodological critique. Cognition, 104, 19–


46.

Martin, A. (2006). Shades of Déjerine—Forging a causal link between the visual word
form area and reading. Neuron, 50, 173–175.

Martin, A. (2007). The representation of object concepts in the brain. Annual Review Psy­
chology, 58, 25–45.

Martin, A., Haxby, J. V., Lalonde, F. M., Wiggs, C. L., & Ungerleider, L. G. (1995). Discrete
cortical regions associated with knowledge of color and knowledge of action. Science,
270, 102–105.

Page 36 of 42
Organization of Conceptual Knowledge of Objects in the Human Brain

Martin, A., & Weisberg, J. (2003). Neural foundations for understanding social and me­
chanical concepts. Cognitive Neuropsychology, 20, 575–587.

McClelland, J. L., & Rogers, T. T. (2003). The parallel distributed processing approach to
semantic cognition. Nature Reviews Neuroscience, 4, 310–322.

Mechelli, A., Sartori, G., Orlandi, P., & Price, C. J. (2006). Semantic relevance explains
category effects in medial fusiform gyri. NeuroImage, 3, 992–1002.

Miceli, G., Capasso, R., Daniele, A., Esposito, T., Magarelli, M., & Tomaiuolo, F.
(p. 576)

(2000). Selective deficit for people’s names following left temporal damage: An impair­
ment of domain-specific conceptual knowledge. Cognitive Neuropsychology, 17, 489–516.

Miceli, G., Fouch, E., Capasso, R., Shelton, J. R., Tamaiuolo, F., & Caramazza, A. (2001).
The dissociation of color from form and function knowledge. Nature Neuroscience, 4,
662–667.

Miller, E. K., et al. (2003). Neural correlates of categories and concepts. Current Opinion
in Neurobiology, 13, 198–203.

Milner, A. D., Perrett, D. I., Johnson, R. S., Benson, O. J., Jordan, T. R., et al. (1991). Per­
ception and action “visual form agnosia.” Brain, 114, 405–428.

Mitchell, J. P., Heatherton, T. F., & Macrae, C. N. (2002). Distinct neural systems subserve
person and object knowledge. Proceedings of the National Academy of Sciences U S A,
99, 15238–15243.

Moreaud, O., Charnallet, A., & Pellat, J. (1998). Identification without manipulation: A
study of the relations between object use and semantic memory. Neuropsychologia, 36,
1295–1301.

Morris, J. S., öhman, A., & Dolan, R. J. (1999). A subcortical pathway to the right amyg­
dala mediating “unseen” fear. Proceedings of the National Academy of Sciences U S A, 96,
1680–1685.

Moscovitch, M., Winocur, G., & Behrmann, M. (1997). What is special about face recogni­
tion? Nineteen experiments on a person with visual object agnosia and dyslexia but with
normal face recognition. Journal of Cognitive Neuroscience, 9, 555–604.

Moss, H. E., Tyler, L. K., Durrant-Peatfield, M., & Bunn, E. M. (1998). “Two eyes of a see-
through”: Impaired and intact semantic knowledge in a case of selective deficit for living
things. Neurocase, 4, 291–310.

Negri, G. A. L., Rumiati, R. I., Zadini, A., Ukmar, M., Mahon, B. Z., & Caramazza, A.
(2007). What is the role of motor simulation in action and object recognition? Evidence
from apraxia. Cognitive Neuropsychology, 24, 795–816.

Page 37 of 42
Organization of Conceptual Knowledge of Objects in the Human Brain

Noppeney, U., Price, C. J., Penny, W. D., & Friston, K. J. (2006). Two distinct neural mecha­
nisms for category-selective responses. Cerebral Cortex, 16, 437–445.

Nunn, J. A., & Pearson, R. (2001). Developmental prosopagnosia: Should it be taken at


face value? Neurocase, 7, 15–27.

Ochipa, C., Rothi, L. J. G., & Heilman, K. M. (1989). Ideational apraxia: A deficit in tool se­
lection and use. Annals of Neurology, 25, 190–193.

Oliveri, M., Finocchiaro, C., Shapiro, K., Gangitano, M., Caramazza, A., & Pascual-Leone,
A. (2004). All talk and no action: A transcranial magnetic stimulation study of motor cor­
tex activation during action word production. Journal of Cognitive Neuroscience, 16, 374–
381.

Op de Beeck, H. P., Haushofer, J., & Kanwisher, N. G. (2008). Interpreting fMRI data:
Maps, modules, and dimensions. Nature Reviews Neuroscience, 9, 123–135.

Op de Beeck, H. P., et al. (2006). Discrimination training alters object representations in


human extrastriate cortex. Journal of Neuroscience, 26, 13025–13036.

Orlov, T., et al. (2010). Topographic representation of the human body in the occipitotem­
poralcortex. Neuron, 68, 586–600.

Pasley, B. N., Mayes, L. C., & Schultz, R. T. (2004). Subcortical discrimination of unper­
ceived objects during binocular rivalry. Neuron, 42, 163–172.

Patterson, K., Nestor, P. J., & Rogers, T. (2007). What do you know what you know? The
representation of semantic knowledge in the brain. Nature Neuroscience Reviews, 8,
976–988.

Peelen, M. V., & Caramazza, A. (2010) What body parts reveal about the organization of
the brain. Neuron, 68, 331–333.

Pietrini, P., et al. (2004) Beyond sensory images: Object-based representation in the hu­
man ventral pathway. Proceedings of the National Academy of Sciences U S A, 101, 5658–
5663.

Pisella, L., Binkofski, B. F., Lasek, K., Toni, I., & Rossetti Y. 2006. No double-dissociation
between optic ataxia and visual agnosia: Multiple sub-streams for multiple visuo-manual
integrations. Neuropsychologia, 44, 2734–2748.

Plaut, D. C. (2002). Graded modality-specific specialization in semantics: a computational


account of optic aphasia. Cognitive Neuropsychology, 19, 603–639.

Polk, T. A., Park, J., Smith, M. R., & Park, D. C. (2007). Nature versus nurture in ventral vi­
sual cortex: A functional magnetic resonance imaging study of twins. Journal of Neuro­
science, 27, 13921–13925.

Page 38 of 42
Organization of Conceptual Knowledge of Objects in the Human Brain

Prinz, J. J. (2002). Furnishing the mind: Concepts and their perceptual basis. Cambridge,
MA: MIT Press.

Pulvermüller, F. (2005). Brain mechanisms linking language and action. Nature Reviews
Neuroscience, 6, 576–582.

Pulvermüller, F., Hauk, O., Nikolin, V. V., & Ilmoniemi, R. J. (2005). Functional links be­
tween language and motor systems. European Journal of Neuroscience, 21, 793–797.

Rapcsak, S. Z., Ochipa, C., Anderson, K. C., & Poizner, H. (1995). Progressive ideomotor
apraxia: Evidence for a selective impairment in the action production system. Brain and
Cognition, 27, 213–236.

Riddoch, M. J., Humphreys, G. W., Coltheart, M., & Funnell, E. (1988). Semantic systems
or system? Neuropsychological evidence re-examined. Cognitive Neuropsychology, 5, 3–
25.

Riesenhuber, M. (2007). Appearance isn’t everything: News on object representation in


cortex. Neuron, 55, 341–344.

Rogers, T. T., Hocking, J., Mechelli, A., Patterson, K., & Price, C. J. (2003). Fusiform activa­
tion to animals is driven by the process, not the stimulus. Journal of Cognitive Neuro­
science, 17, 434–445.

Rogers, T. T., Lambon Ralph, M. A., Garrard, P., Bozeat, S., McClelland, J. L., et al. (2004).
Structure and deterioration of semantic memory: A neuropsychological and computation­
al investigation. Psychological Review, 111, 205–235.

Rogers, T. T., et al. (2005). Fusiform activation to animals is driven by the process, not the
stimulus. Journal of Cognitive Neuroscience, 17, 434–445.

Rosci, C., Chiesa, V., Laiacona, M., & Capitani, E. (2003). Apraxia is not associated to a
disproportionate naming impairment for manipulable objects. Brain and Cognition, 53,
412–415.

Rothi, L. J., Ochipa, C., & Heilman, K. M. (1991). A cognitive neuropsychological model of
limb praxis. Cognitive Neuropsychology, 8, 443–458.

Rumiati, R. L., Zanini, S., & Vorano, L. (2001). A form of ideational apraxia as a selective
deficit of contention scheduling. Cognitive Neuropsychology, 18, 617–642.

Rushworth, M. F. S., et al. (2006). Connection patterns distinguish 3 regions of human


parietal cortex. Cerebral Cortex, 16, 1418–1430.

Sacchett, C., & Humphreys, G. W. (1992). Calling a squirrel a squirrel but a canoe a wig­
wam: A category-specific deficit for artifactual objects and body parts. Cognitive Neu­
ropsychology, 9, 73–86.

Page 39 of 42
Organization of Conceptual Knowledge of Objects in the Human Brain

Samson, D., & Pillon, A. (2003). A case of impaired knowledge for fruit and veg­
(p. 577)

etables. Cognitive Neuropsychology, 20, 373–400.

Santos, L. R., & Caramazza, A. (2002). The domain-specific hypothesis: A developmental


and comparative perspective on category-specific deficits. In E. M. E. Forde & G. W.
Humphreys (Eds.), Category-specificity in the brain and mind. New York: Psychology
Press.

Sartori, G., & Lombardi, L. (2004). Semantic relevance and semantic disorders. Journal of
Cognitive Neuroscience, 16, 439–452.

Sartori, G., Lombardi, L., & Mattiuzzi, L. (2005). Semantic relevance best predicts normal
and abnormal name retrieval. Neuropsychologia, 43, 754–770.

Shallice, T. (1988). From neuropsychology to mental structure. Cambridge, UK: Cam­


bridge University Press.

Shallice, T. (1993). Multiple semantics: Whose confusions? Cognitive Neuropsychology,


10, 251–261.

Shelton, J. R., Fouch, E., & Caramazza, A. (1998). The selective sparing of body part
knowledge: A case study. Neurocase, 4, 339–351.

Silveri, M. C., Gainotti, G., Perani, D., Cappelletti, J. Y., Carbone, G., & Fazio, F. (1997).
Naming deficit for non-living items: Neuropsychological and PET study. Neuropsychologia,
35, 359–367.

Simmons, W. K., & Barsalou, L. W. (2003). The Similarity-in-Topography Principle: Recon­


ciling theories of conceptual deficits. Cognitive Neuropsychology, 20, 451–486.

Sirigu, A., Duhamel, J., & Poncet, M. (1991). The role of sensorimotor experience in object
recognition. Brain, 114, 2555–2573.

Spelke, E. S., Breinlinger, K., Macomber, J., & Jacobson, K. (1992). Origins of knowledge.
Psychological Review, 99, 605–632.

Stewart, F., Parkin, A. J., & Hunkin, N. M. (1992). Naming impairments following recovery
from herpes simplex encephalitis. Quarterly Journal of Experimental Psychology A, 44,
261–284.

Strnad, L., Anzellotti, S., & Caramazza, A. (2011). Formal models of categorization: In­
sights from cognitive neuroscience. In E. M. Pothos & A. J. Wills (Eds.), Formal approach­
es in categorization (pp. 313–324). Cambridge, UK: Cambridge University Press.

Tanaka, K., et al. (1991). Coding visual images of objects in the inferotemporal cortex of
the macaque monkey. Journal of Neurophysiology, 66, 170–189.

Thompson-Schill, S. L. (2003). Neuroimaging studies of semantic memory: Inferring


“how” from “where.” Neuropsychologia, 41, 280–292.
Page 40 of 42
Organization of Conceptual Knowledge of Objects in the Human Brain

Thompson-Schill, S. L., Aguirre, G. K., D’Esposito, M., & Farah, M. J. (1999). A neural ba­
sis for category and modality specificity of semantic knowledge. Neuropsychologia, 37,
671–676.

Tranel, D., Logan, C. G., Frank, R. J., & Damasio, A. R. (1997). Explaining category-relat­
ed effects in the retrieval of conceptual and lexical knowledge of concrete entities: Opera­
tionalization and analysis of factor. Neuropsychologia, 35, 1329–1339.

Turnbull, O. H., & Laws, K. R. (2000). Loss of stored knowledge of object structure: Impli­
cation for “category-specific” deficits. Cognitive Neuropsychology, 17, 365–389.

Tyler, L. K., & Moss, H. E. (2001). Towards a distributed account of conceptual knowl­
edge. Trends in Cognitive Sciences, 5, 244–252.

Tyler, L. K., et al. (2003). Do semantic categories activate distinct cortical regions? Evi­
dence for a distributed neural semantic system. Cognitive Neuropsychology, 20, 541–559.

Valyear, K. F., & Culham, J. C. (2009). Observing learned object-specific functional grasps
preferentially activates the ventral stream. Journal of Cognitive Neuroscience, 22, 970–
984.

Vinson, D. P., Vigliocco, G., Cappa, S., & Siri, S. (2003). The breakdown of semantic
knowledge: Insights from a statistical model of meaning representation. Brain and Lan­
guage, 86, 347–365.

Vuilleumier, P., et al. (2004) Distant influences of amygdala lesion on visual cortical acti­
vation during emotional face processing. Nature Neuroscience, 7, 1271–1278.

Warrington, E. K., & McCarthy, R. (1983). Category specific access dysphasia. Brain, 106,
859–878.

Warrington, E. K., & McCarthy, R. A. (1987). Categories of knowledge: Further fractiona­


tions and an attempted integration. Brain, 110, 1273–1296.

Warrington, E. K., & Shallice, T. (1984). Category specific semantic impairment. Brain,
107, 829–854.

Wellman, H. M., & Gelman, S. A. (1992). Cognitive development: Foundational theories of


core domains. Annual Review of Psychology, 43, 337–375.

Zannino, G. D., Perri, R., Carlesimo, G. A., Pasqualetti, P., & Caltagirone, C. (2002). Cate­
gory-specific impairment in patients with Alzheimer’s disease as a function of disease
severity: A cross-sectional investigation. Neuropsychologia, 40, 2268–2279.

Zannino, G. D., Perri, R., Pasqualetti, P., Caltagirone, C., & Carlesimo, G. A. (2006). Analy­
sis of the semantic representations of living and nonliving concepts: A normative study.
Cognitive Neuropsychology, 23, 515–540.

Page 41 of 42
Organization of Conceptual Knowledge of Objects in the Human Brain

Zwaan, R. A. (2004). The immersed experiencer: Toward an embodied theory of language


comprehension. In B. H. Ross (Ed.), The psychology of learning and motivation. New York:
Academic Press.

Bradford Z. Mahon

Bradford Z. Mahon, Departments of Neurosurgery and Brain and Cognitive Sciences,


University of Rochester, Rochester, NY

Alfonso Caramazza

Alfonso Caramazza is Daniel and Amy Starch Professor of Psychology at Harvard Uni­
versity.

A Parallel Architecture Model of Language Processing


 
Ray Jackendoff
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn

Print Publication Date: Dec 2013 Subject: Psychology, Cognitive Neuroscience
Online Publication Date: Dec 2013 DOI: 10.1093/oxfordhb/9780199988693.013.0028

Abstract and Keywords

The Parallel Architecture is a linguistic theory in which (1) the generative power of lan­
guage is divided among phonology, syntax, and semantics; and (2) words, idioms, morpho­
logical affixes, and phrase structure rules are stored in the lexicon in a common format.
This formalization leads to a theory of language processing in which the “competence”
grammar is put to work directly in “performance,” and which is compatible with psy­
cholinguistic evidence concerning lexical retrieval, incremental and parallel processing,
syntactic priming, and the integration of visual context with semantic processing.

Keywords: parallel architecture, language processing

Goals of a Theory of Language Processing—and Goals of Language Processing
The Parallel Architecture (Jackendoff 1997, 2002, 2010; Culicover & Jackendoff, 2005) is
a theory of language that preserves all the mentalistic and biological aspects of main­
stream generative grammar (e.g., Chomsky 1965, 1981, 1995, 2000), but which employs a
theoretical technology better in tune with linguistic and psycholinguistic discoveries of
the past 30 years. This chapter shows how the Parallel Architecture lends itself to a direct
relation between linguistic structure (or “competence”) and language processing (or
“performance”).

A theory of language processing has to explain how language users convert sounds into
meanings in language perception and how they convert meanings into sounds in lan­
guage production. In particular, the theory has to describe what language users store in
long-term memory that enables them to do this, and how the material stored in memory is
brought to bear in understanding and creating novel utterances in real time.

Linguistic theory is an account of the repertoire of utterances available to a speaker, abstracting away from the real-time aspects of language processing and from the distinction
between perception and production. I take it that one should seek a linguistic theory that
embeds gracefully into an account of language processing, and that can be tested
through experimental techniques as well as through grammaticality judgments.

Unfortunately, many linguists assert that a theory of performance has no bearing on a theory of competence, and many psycholinguists “retaliate” by asserting that a theory of
processing has no need for a theory of competence. But a linguistic theory that disre­
gards processing cuts itself off from valuable sources of evidence and from potential inte­
gration into cognitive science. From the other side, processing theories that claim to do
without a theory of competence always implicitly embody such a theory anyway, usually a
theory that severely underestimates the complexity and richness of the repertoire of ut­
terances. The goal here is to develop competence and performance theories that are ade­
quate on their own turf and that also interact meaningfully with each other.

All linguistic theories consider utterances to be structured in several domains: at least (p. 579) phonological (sound) structure, syntactic (grammatical) structure, and semantic (meaning) structure. Therefore, a plausible working hypothesis is that the goal of language processing is to produce a correlated set of phonological, syntactic, and semantic
structures that together match sound to meaning. In perceiving an utterance, the starting
point is an unstructured phonetic string being apprehended over time, possibly with some
gaps or uncertainty; the end point is a meaning correlated with a structured string of
sounds. In producing an utterance, the starting point is a meaning (or thought), possibly
complete, possibly developing as the utterance is being produced; the end point is a fully
structured meaning correlated with a structured string of motor instructions that pro­
duce sounds. Because the correlation of sound and meaning is mediated by syntactic
structure, the processor must also develop enough syntactic structure in both perception
and production to be able to make the relation of sound and meaning explicit.1

These observations already suffice to call into question connectionist models of language
perception whose success is judged by their ability to predict the next word of a sen­
tence, given some finite preceding context (e.g., Elman, 1990; MacDonald & Christiansen,
2002; and, as far as I can determine, Tabor & Tanenhaus, 1999). The implicit theory of
language behind such models is that well-formed language is characterized only by the
statistical distribution of word sequencing. To be sure, statistics of word sequencing are
sometimes symptomatic of meaning relations, but they do not constitute meaning rela­
tions. Consider the previous sentence: (1) How could a processor predict the full succes­
sion of words, and (2) what good would such predictions do in understanding the sen­
tence? Moreover, predicting the next word has no bearing whatsoever on an explanation
of speech production, where the goal has to be to produce the next word in an effort to
say something meaningful.

More generally, we have known since Chomsky (1957) and Miller and Chomsky (1963)
that sequential dependencies among words in a sentence are not sufficient to determine
understanding or even grammaticality. For instance, in (1),

(1) Does the little boy in the yellow hat who Mary described as a genius like ice cream?

the fact that the italicized verb is like rather than likes is determined by the presence of does,
fourteen words away; and we would have no difficulty making the distance longer. What is signif­
icant here is not the distance in words; it is the distance in noun phrases (NPs)—the fact that
does is one NP away from like. This relation is not captured in Elman-style recurrent networks,
which (as pointed out by many critics over the past twenty years) take account only of word se­
quence and have no representation of global structure.
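The contrast between word distance and structural distance can be made concrete with a small script. The sentence frames and function names below are invented for illustration; the point is only that the number of words between does and like is unbounded, while exactly one NP intervenes in every variant.

```python
# Toy illustration of the point about example (1): the does/like dependency
# can span any number of WORDS while remaining one noun phrase away.

def question(modifiers):
    """Build 'does [NP] like ice cream' with extra material inside the subject NP."""
    np = " ".join(["the little boy"] + modifiers)
    return f"does {np} like ice cream"

short = question(["in the yellow hat"])
longer = question(["in the yellow hat",
                   "who Mary described as a genius",
                   "whose dog chased the cat"])

def intervening_words(sentence):
    """Number of words between 'does' and 'like'."""
    words = sentence.split()
    return words.index("like") - words.index("does") - 1

# The word distance grows without bound as modifiers are added...
print(intervening_words(short), intervening_words(longer))   # → 7 18
# ...yet in every variant exactly one NP (the whole subject) separates does
# from like: the structural generalization that a model trained only on word
# sequences cannot state.
```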
Other issues with connectionist models of language processing will arise below. However,
my main focus here is the Parallel Architecture, to which we now turn.

The Parallel Architecture


The Parallel Architecture differs from mainstream generative grammar (MGG) in three
important respects.

• MGG is syntactocentric: The generative power of language is invested in the syntax, and phonology and semantics are “interpretive,” derived from syntactic structure. In the Parallel Architecture, phonology, syntax, and semantics are independent generative components, linked by interfaces.
• MGG is derivation-based: The structure of a sentence is produced through a step-by-
step algorithmic process, and is inherently directional. The Parallel Architecture is con­
straint-based and nondirectional.
• MGG maintains a strict formal distinction between the lexicon and rules of grammar.
In the Parallel Architecture, words are relatively idiosyncratic rules in a continuum of
generality with more general grammatical structure.

I take up these aspects of the Parallel Architecture in turn.

Phonology as an Independent Generative Component

A major theoretical development in the 1970s (e.g., Goldsmith, 1979; Liberman & Prince,
1977) showed that phonology has its own units and principles of combination, incommen­
surate with syntactic units, though correlated with them. For instance, consider the sen­
tence in (2).

(2) Syntax:
[NP Sesame Street] [VP is [NP a production
[of [NP the Children’s Television Workshop]]]]
Phonology:

[Sesame Street is a production of]
[the Children’s Television Workshop]
or
[Sesame Street] [is a production]
[of the Children’s Television Workshop]

(p. 580) The syntactic structure of (2) consists of a noun phrase (NP), Sesame Street, followed by a verb phrase (VP), the rest of the sentence. The VP in turn consists of the verb
is plus another NP. This NP has embedded within it a further NP, the Children’s Television
Workshop. However, the way the sentence is pronounced does not necessarily conform to
this structure: It can be broken up into intonation contours (or breath groups) in various
ways, two of which are illustrated in (2). Some of these units, for instance, Sesame Street
is a production of in the first version, do not correspond to any syntactic constituent;
moreover, this unit cannot be classified as an NP or a VP, because it cuts across the
boundaries of both. Another such example is the familiar This is the cat that chased the
rat that ate the cheese. This has relentlessly right-embedded syntax; but the intonation is
a flat structure with three parallel parts, only the last of which corresponds to a syntactic
constituent.

The proper way to characterize the pronunciation of these examples is in terms of Intona­
tion Phrases (IPs), phonological units over which intonation contours and the position of
pauses are defined. The pattern of intonation phrases is to some degree independent of
syntactic structure, as seen from the two possibilities in (2). Nevertheless it is not entire­
ly free. For instance, (3) is not a possible pronunciation of this sentence.

(3) *[Sesame] [Street is a] [production of the Children’s] [Television Workshop]

Thus there is some correlation between phonological and syntactic structure, which a
theory of intonation needs to characterize. A first approximation to the proper account for
English appears to be the following principles, stated very informally (Gee & Grosjean,
1983; Hirst, 1993; Jackendoff, 1987; Truckenbrodt, 1999)2:

(4)
a. An Utterance is composed of a string of IPs; IPs do not embed in each other.
b. An IP must begin at the beginning of a syntactic constituent. It may end be­
fore the syntactic constituent does, but it may not go beyond the end of the
largest syntactic constituent that it starts.

Inspection will verify that (2) observes this matching. But (3) does not, because the sec­
ond IP begins with the noun Street, and there is no constituent starting with Street that
also contains is a.
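
The licensing conditions in (4) lend themselves to a mechanical statement. The following sketch is my own illustration, not the chapter's notation: constituents are encoded as (start, end) word-index spans over a simplified constituency for sentence (2), and the function tests whether an Intonation Phrase obeys (4b).

```python
# Sketch of constraint (4b): an Intonation Phrase must begin where some
# syntactic constituent begins, and may not extend past the end of the
# largest constituent that starts there. Spans are (start, end) word
# indices, end exclusive; the constituency below is a simplified guess.

def ip_licensed(ip, constituents):
    """True if the IP span obeys the matching principle (4b)."""
    start, end = ip
    ends = [c_end for c_start, c_end in constituents if c_start == start]
    if not ends:
        return False            # IP does not begin at a constituent boundary
    return end <= max(ends)     # may stop early, but may not overshoot

# Words: 0 Sesame 1 Street 2 is 3 a 4 production 5 of 6 the 7 Workshop
CONSTITUENTS = [
    (0, 2),  # NP  Sesame Street
    (0, 8),  # S   the whole sentence
    (2, 8),  # VP  is a production of ...
    (3, 8),  # NP  a production of ...
    (5, 8),  # PP  of the Children's Television Workshop
    (6, 8),  # NP  the Children's Television Workshop
]
```

On this encoding both bracketings of (2) pass: (0, 2) and (0, 6) each begin at a constituent boundary and stop within the sentence. The second IP of the ill-formed (3), Street is a, fails because no constituent begins with Street.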

These examples illustrate that phonological structure requires its own set of basic units
and combinatorial principles such as (4a). It is generative in the same sense that syntax
is, though not recursive. In addition, because units of phonological structure such as IPs
cannot be derived from syntactic structure, the grammar needs principles such as (4b) that stipulate how phonological and syntactic structures can be correlated. In the Parallel Architecture, these are called interface rules.3

Page 4 of 31
A Parallel Architecture Model of Language Processing

Semantics as an Independent Generative Component

Several different and incompatible approaches to semantics developed during the 1970s
and 1980s: formal semantics (Chierchia & McConnell-Ginet, 1990; Lappin, 1996; Partee,
1976), Cognitive Grammar (Lakoff, 1987; Langacker, 1987; Talmy, 1988), Conceptual Se­
mantics (Jackendoff, 1983, 1990; Pinker, 1989, 2007), and approaches growing out of cog­
nitive psychology (Collins & Quillian, 1969; Rosch & Mervis, 1975; Smith & Medin, 1981;
Smith, Shoben, & Rips, 1974) and artificial intelligence (Schank, 1975). Whatever their radical differences, these approaches implicitly agreed on one thing: Meanings of sentences are
not made up of syntactic units such as verbs, noun phrases, and prepositions. Rather,
they are combinations of specifically semantic units such as (conceptualized) individuals,
events, times, places, properties, and quantifiers, none of which correspond one to one
with syntactic units; and these semantic units are combined according to principles that
are specific to semantics and distinct from syntactic principles. This means that seman­
tics, like phonology, must be an independent generative system, not strictly derivable
from syntactic structure, but only correlated with it. The correlation between syntax and
semantics again takes the form of interface rules that state the connection between the
two types of mental representation.

The Parallel Architecture, unlike MGG and most other linguistic theories embedded in processing models, incorporates a rich and explicit theory of semantics: Conceptual
Semantics (Jackendoff, 1983, 1990, 2002, chapters 9–12). This theory is what makes it
possible to explore the ways in which syntax does and does not match up with meaning
and the ways in which semantics interfaces with other sorts of cognitive capacities, both
perception and “world knowledge.”

Figure 28.1 The Parallel Architecture.

Granting semantics its independence from syntax makes sense both psychologically and
biologically. Sentence meanings are, after all, the combinatorial thoughts that spoken
sentences convey. Thoughts (or concepts) have their own structure, evident even in nonhuman primates (Cheney & Seyfarth, 1990; Hauser, 2000), and language is at its basis a combinatorial system for expressing thoughts. (See Pinker & Jackendoff, 2005, and
Jackendoff & Pinker, 2005, for discussion.)

To sum up, we arrive at an architecture for language along the lines of Figure 28.1.

Here the interfaces are indicated by double arrows to signify that they characterize corre­
lations of structures with each other rather than derivation of one structure from the oth­
er.

Constraint-Based Principles of Grammar


A second major feature of the Parallel Architecture is that it is constraint based and
nondirectional. Instead of classical phrase structure rules such as (5), in which the sym­
bol S is expanded as or rewritten as NP plus VP and so on, the Parallel Architecture lists
available pieces of structure or “treelets,” as in (6).

A tree is built by “clipping together” these treelets at nodes they share, working from the
bottom up, or from the top down, or from anywhere in the middle, as long as the resulting
tree ends up with S at the top and terminal symbols at the bottom. No order for building
trees is logically prior to any other. Alternatively, one can take a given tree and check its
well-formedness by making sure that every part of it conforms to one of the treelets; the
structures in (6) then function as constraints on possible trees rather than as algorithmic
generative engines for producing trees. Hence the constraint-based formalism does not
presuppose any particular implementation; it is compatible with serial, parallel, top-down,
or bottom-up computation.
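
The dual use of treelets, as building blocks and as well-formedness checks, can be sketched as follows (a toy rendering of mine; the tuple encoding of (6) is an assumption, not the chapter's formalism):

```python
# Sketch of treelets as constraints: a tree is well formed iff every
# internal node matches one of the stored treelets, regardless of the
# order in which the tree was assembled (bottom-up, top-down, or from
# the middle out).

TREELETS = {
    ("S",  ("NP", "VP")),     # cf. (6a)
    ("NP", ("Det", "N")),     # cf. (6b)
    ("VP", ("V", "NP")),      # cf. (6c)
}

def well_formed(tree):
    """tree = (label, children) for internal nodes, a plain string for leaves."""
    if isinstance(tree, str):
        return True                         # terminal symbol
    label, children = tree
    child_labels = tuple(c if isinstance(c, str) else c[0] for c in children)
    if (label, child_labels) not in TREELETS:
        return False                        # no treelet licenses this node
    return all(well_formed(c) for c in children)

good = ("S", (("NP", ("Det", "N")), ("VP", ("V", ("NP", ("Det", "N"))))))
```

Note that the checker never specifies a construction order; the treelets function purely as constraints on the finished tree, in keeping with the nondirectional formalism.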

The constraint-based formalism is not confined to the Parallel Architecture. It is a major feature of several other non-mainstream versions of generative grammar, such as Lexical
Functional Grammar (Bresnan, 2001), Head-driven Phrase Structure Grammar (Pollard &
Sag, 1994), and Optimality Theory (Prince & Smolensky, 1993/2004). An important part of
this formalism is that constraints can be violable and can compete with each other; it is
beyond the scope of this article to describe the various theoretical approaches to resolv­
ing constraint conflict.

This approach is advantageous in making contact with models of processing. For exam­
ple, suppose an utterance begins with the word the. This is listed in the lexicon as a de­
terminer, so we begin with the subtree (7).

(7)

Det is the initial node in treelet (6b), which can therefore be clipped onto (7) to produce
(8).

(8)

In turn, an initial NP fits into treelet (6a), which in turn can have (6c) clipped into its VP,
giving (9)—

(9)

—and we are on our way to anticipatory parsing, that is, setting up grammatical expectations on
the basis of an initial word. Further words in the sentence may be attached on the basis
of the top-down structure anticipated in (9). Alternatively, they may disconfirm it, as in the sen­
tence The more I read, the less I understand—in which case other treelets had better be avail­
able that can license the construction.4
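
The anticipatory projection in (7) through (9) can be illustrated with a small sketch (mine, not the chapter's): from the category of the initial word, we repeatedly clip on the treelet whose first daughter matches, collecting the sister categories the parser now expects.

```python
# Sketch of anticipatory parsing as in (7)-(9): project upward from the
# category of an initial word and collect the anticipated sisters.

TREELETS = [
    ("S",  ["NP", "VP"]),   # cf. (6a)
    ("NP", ["Det", "N"]),   # cf. (6b)
    ("VP", ["V", "NP"]),    # cf. (6c)
]

def expectations(category):
    """Categories the parser comes to expect, innermost first."""
    expected, node = [], category
    while True:
        match = next(((p, kids) for p, kids in TREELETS if kids[0] == node),
                     None)
        if match is None:
            return expected
        parent, kids = match
        expected.extend(kids[1:])     # sisters still to come
        if parent == "S":
            return expected           # reached the root; cf. (9)
        node = parent

# Hearing "the" (a Det) sets up the expectation of an N and then a VP.
```

As the text notes, these expectations are defeasible: a continuation like the more I read simply fails to satisfy them, and other stored treelets must license the structure instead.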
In psycholinguistics, the term constraint-based seems generally to be used to denote a
lexically driven connectionist architecture along the lines of MacDonald et al. (1994). Like
the constraint-based linguistic theories, these feature multidimensional constraint satis­
faction and the necessity to resolve competition among conflicting constraints. However,
as MacDonald and Christiansen (2002) observe, the constraint-based aspects of such pro­
cessing theories can be separated from the connectionist aspects. Indeed, one of the ear­
liest proposals for lexically driven constraint-based parsing, by Ford et al. (1982), is
couched in traditional symbolic terms.

The constraints that govern structure in the Parallel Architecture are not all word-based,
as they are for MacDonald et al. The projection of the into a Determiner node in (7) is in­
deed word-based. But all the further steps leading to (9) are accomplished by the treelets
in (6), which are phrasal constraints that make no reference to particular words. Similar­
ly, the use of the prosody–syntax interface constraint (4b) constrains syntactic structure
without reference to particular words. In general, as will be seen later, the building of
structure is constrained by a mixture of word-based, phrase-based, semantically based,
and even pragmatically based conditions.

No Strict Lexicon Versus Grammar Distinction


Every mentalistic linguistic theory takes a word to be an association in long-term memory
between pieces of phonological, syntactic, and semantic structure. The phonological and
semantic structures of words are typically much richer than their syntactic structures.
For example, the words dog, cat, chicken, kangaroo, worm, and elephant are differentiat­
ed in sound and meaning, but they are syntactically indistinguishable: they are all just
singular count nouns. Similarly for all the color words and for all the verbs of locomotion
such as walk, jog, swagger, slither, and so on.

In the Parallel Architecture, a word is treated as a small interface rule that plays a role in
the composition of sentence structure. It says that in building the structure for a sen­
tence, this particular piece of phonology can be matched with this piece of meaning and
these syntactic features. So, for instance, the word cat has a lexical structure along the
lines of (10a), and the has a structure like (10b).

(10)
a. kæt1—N1—CAT1
b. ðə2—Det2—DEF2

The first component of (10a) is a phonological structure; the second marks it as a noun;
the third is a stand-in for whatever semantic features are necessary to distinguish cats
from other things. (10b) is similar (where DEF is the feature “definiteness”). The co-sub­
scripting of the components is a way of notating that the three parts are linked in long-
term memory (even if it happens that they are localized in different parts of the brain).

When words are built into phrases, structures are built in all three components in paral­
lel, yielding a linked trio of structures like (11) for the cat.

(11)

Here the subscript 1 binds together the components of cat, and the subscript 2 binds to­
gether the components of the.
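
One way to render the co-subscripted triples of (10) and (11) concretely is shown below (a sketch under my own encoding assumptions; integer keys play the role of the subscripts):

```python
# Sketch of words as small interface rules: co-indexed triples of
# phonology, syntax, and semantics, as in (10). Building "the cat" links
# structure in all three components at once, as in (11); the integer
# subscripts bind the pieces of each word together across components.

CAT = {"phon": "kæt", "syn": "N",   "sem": "CAT"}   # cf. (10a), subscript 1
THE = {"phon": "ðə",  "syn": "Det", "sem": "DEF"}   # cf. (10b), subscript 2

def combine_np(det, noun):
    """Parallel phonological, syntactic, and semantic structures for [Det N]."""
    return {
        "phon": [(2, det["phon"]), (1, noun["phon"])],
        "syn":  ("NP", [(2, det["syn"]), (1, noun["syn"])]),
        "sem":  (2, det["sem"], (1, noun["sem"])),    # roughly [DEF [CAT]]
    }

the_cat = combine_np(THE, CAT)
```

The point of the encoding is that no single component contains "the structure of the phrase": the phrase is the linked trio, with the indices doing the work of the subscripts in (11).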

A word can stipulate contextual restrictions on parts of its environment; these include,
among other things, traditional notions of subcategorization and selectional restrictions.
For example, the transitive verb devour requires a direct object in syntactic structure. Its
semantics requires two arguments: an action of devouring must involve a devourer (the
agent) and something being devoured (the patient). Moreover, the patient has to be expressed as the direct object of the verb. Thus the lexical entry for this verb can be notated as (12). The material composing the verb itself is notated in roman type. The contextual restrictions are notated in italics: NP, X, and Y are variables that must be satisfied in order for a structure incorporating this word to be well formed. The fact that the patient
must appear in object position is notated in terms of the subscript 4 shared by the syntac­
tic and semantic structure.

(12) dəvawr3 – V3 NP4 – [[X; ANIMATE] DEVOUR3 [Y; EDIBLE]4]

In the course of parsing, if the parser encounters devour, the syntactic and semantic
structure of (12) will create an anticipation of a direct object that denotes some edible en­
tity.
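
A sketch of how contextual restrictions like those in (12) can drive anticipation follows. The feature names ANIMATE and EDIBLE come from (12); the dictionary encoding and function names are my own illustration.

```python
# Sketch of the contextual restrictions in (12): "devour" subcategorizes
# for an NP object and selects an ANIMATE agent and an EDIBLE patient.
# Meeting the verb lets the parser post expectations before the object
# arrives.

DEVOUR = {
    "phon": "dəvawr",
    "syn":  ("V", {"object": "NP"}),                        # subcategorization
    "sem":  ("DEVOUR", {"agent": "ANIMATE", "patient": "EDIBLE"}),
}

def anticipate(verb):
    """Expectations posted on encountering the verb."""
    _, subcat = verb["syn"]
    _, selection = verb["sem"]
    return {"syntactic": subcat["object"], "semantic": selection["patient"]}

def object_ok(noun_features, verb):
    """Does a candidate direct object satisfy the selectional restriction?"""
    _, selection = verb["sem"]
    return selection["patient"] in noun_features
```

The expectations are posted in both the syntactic and semantic departments at once, which is the parallel-structures point: the verb's entry is itself a small interface rule.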

The word-based projection of structure illustrated in (12) is entirely parallel to that in lexically driven models of parsing such as that of MacDonald et al. (1994). However, MacDonald et al. claim that all structure is built on the basis of word-based contextual
constraints. This strategy is not feasible in light of the range of structures in which most
open-class items can appear (and it is questioned experimentally by Traxler et al., 1998).
For example, we do not want every English noun to stipulate that it can occur with a pos­
sessive, with quantifiers, with prenominal adjectives, with postnominal prepositional
phrase modifiers, and with relative clauses, and if a count noun, in the plural. These pos­
sibilities are a general property of noun phrases, captured in the phrasal rules, and they
do not belong in every noun’s lexical entry. Similarly, we do not want every verb to stipu­
late that it can occur in every possible inflectional form, and that it can co-occur with a
sentential adverbial, a manner adverbial (if semantically appropriate), time and place
phrases, and so on. Nor, in German, do we want every verb to say that it occurs second in
main clauses and last in subordinate clauses. These sorts of linguistic phenomena are
what a general theory of syntax accounts for, and for which general phrasal rules like (6)
are essential.5 Furthermore, the constraints between prosodic and syntactic constituency
discussed earlier cannot be coded on individual words either. It is an empirical problem to
sort out which constraints on linguistic form are word-based, which are phrase based,
which involve syntactic structure, which involve semantic or prosodic structure, and
which involve interface conditions.

An important feature of the treatment of words illustrated in (12) is that it extends direct­
ly to linguistic units both smaller and larger than words. For instance, consider the Eng­
lish regular plural inflection, which can be formalized in a fashion entirely parallel to (12).

(13) Wd6+z5 – N6+aff5 – [PLUR5 (X6)]

The phonological component of (13) says that the phoneme z is appended to a phonologi­
cal word. The syntactic component says that an affix appears attached to a noun. The co-
subscripting indicates that this affix is pronounced z and the noun corresponds to the
phonological word that z is added to. The semantic component of (13) says that the con­
cept expressed by this phonological word is pluralized. Thus the regular plural is formally
similar to a transitive verb; the differences lie in what syntactic category it belongs to and
what categories it attaches to in syntax and phonology.
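
The dual-process picture just described can be summarized in a few lines (a deliberately crude sketch of mine: the phonology is simplified to string concatenation, and the irregular list is illustrative, not exhaustive):

```python
# Sketch of the "dual-process" treatment of the plural: irregular forms
# are listed items retrieved whole; regular forms are built by free
# combination of a singular noun with the stored affix entry (13). Both
# routes draw on the same lexicon of stored structures.

IRREGULAR_PLURALS = {"child": "children", "foot": "feet", "sheep": "sheep"}

PLURAL_AFFIX = {"phon": "z", "syn": "aff", "sem": "PLUR"}   # cf. (13)

def pluralize(noun):
    listed = IRREGULAR_PLURALS.get(noun)      # route 1: stored whole
    if listed is not None:
        return listed
    return noun + PLURAL_AFFIX["phon"]        # route 2: free combination

# pluralize("cat") -> "catz" in this crude phonological notation
```

The second route is the same combinatory operation needed to attach a verb to its object, which is why, on this view, the dual-process model costs nothing extra in parsimony.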

This conception of regular affixation is somewhat different from Pinker’s (1999). Pinker
would state the regular plural as a procedural rule: “To form the plural of a noun, add z.”
In the present account, the regular plural is at once a lexical item, an interface rule, and
a rule for combining an affix and a noun, depending on one’s perspective. However, the
present analysis does preserve Pinker’s dichotomy between regular affixation and irregu­
lar forms. As in his account, irregular forms must be listed individually, whereas regular
forms can be constructed by combining (13) with a singular noun. In other words, this is
a “dual-process” model of inflection. However, the “second” process, that of free combina­
tion, is exactly the same as is needed for combining transitive verbs with their objects.
Notice that every theory of language needs a general process for free combination of
verbs and their objects—the combinations cannot be memorized. So parsimony does not
constitute a ground for rejecting this particular version of the dual-process model.6

Next consider lexical entries that are larger than a word. For example, the idiom kick the
bucket is a lexical VP with internal phonological and syntactic structure:

(14)

Here the three elements in phonology are linked to the three terminal elements of the VP
(V, Det, and N). However, the meaning is linked not to the individual words but rather to
the VP as a whole (subscript 10). Thus the words have no meaning on their own—only the
entire VP has meaning. This is precisely what it means for a phrase to be an idiom: Its
meaning cannot be predicted from the meanings of its parts, but instead must be learned
and stored as a whole.

Once we acknowledge that pieces of syntactic structure are stored in long-term memory
associated with idiomatic meanings in items like (14), it is a short step to also admitting
pieces of structure that lack inherent meanings, such as the “treelets” in (6). This leads to
a radical conclusion from the mainstream point of view: words, regular affixes, idioms,
and ordinary phrase structure rules like (6) can all be expressed in a common formalism,
namely as pieces of linguistic structure stored in long-term memory. The lexicon is
not a separate component of grammar from the rules that assemble sentences. Rather,
what have traditionally been distinguished as “words” and “rules” are simply different
sorts of stored structure. “Words” are idiosyncratic interface rules; “rules” may be general interface rules, or they may be simply stipulations of possible structure in one component or another. Novel sentences are “generated” by “clipping together” pieces of stored
structure, an operation called unification (Shieber, 1986).7
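
Unification in this sense can be illustrated over flat feature sets (a bare-bones sketch; Shieber's formalism also handles reentrancy and recursive feature structures, which this toy version omits):

```python
# Sketch of unification: two pieces of stored structure clip together
# iff their shared features agree; the result carries the union of the
# information in both.

def unify(a, b):
    """Union of two feature sets, or None if any shared feature clashes."""
    result = dict(a)
    for feature, value in b.items():
        if feature in result and result[feature] != value:
            return None               # incompatible pieces cannot unify
        result[feature] = value
    return result

treelet_slot = {"cat": "N"}                    # the N position in treelet (6b)
cat_entry    = {"cat": "N", "phon": "kæt"}     # a lexical item, abridged
verb_entry   = {"cat": "V", "phon": "dəvawr"}
```

Here unify(treelet_slot, cat_entry) succeeds and merges the information, while unify(treelet_slot, verb_entry) fails, since a verb cannot fill an N slot. The same operation serves for words, affixes, idioms, and treelets alike, which is the force of the common-formalism claim.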

Under this interpretation of words and rules, the distinction between word-based parsing
and rule-based parsing disappears. This yields an immediate benefit in the description of
syntactic priming, in which the use of a particular syntactic structure such as a ditransi­
tive verb phrase primes subsequent appearances (Bock, 1995; Bock & Loebell, 1990). As
Bock (1995) observes, the existence of syntactic priming is problematic within main­
stream assumptions: There is no reason that rule application should behave anything like
lexical access. However, in the Parallel Architecture, where syntactic constructions and
words are both pieces of stored structure, syntactic priming is to be expected, altogether
parallel to word priming.

Summing up the last three sections, the Parallel Architecture acknowledges all the com­
plexity of linguistic detail addressed by mainstream theory, but it proposes to account for
this detail in different terms. Phonology and semantics are combinatorially independent
from syntax; ordered derivations are replaced by parallel constraint checking; words are
regarded as interface rules that help mediate between the three components of language;
and words and rules are both regarded as pieces of stored structure. Jackendoff (2002)
and Culicover and Jackendoff (2005) demonstrate how this approach leads to far more
natural descriptions of many phenomena such as idioms and offbeat “syntactic nuts” that
have been either problematic or ignored in the mainstream tradition.

Culicover and Jackendoff (2005) further show how this approach leads to a considerable
reduction in the complexity of syntactic structure, an approach called Simpler Syntax.
From the point of view of psycholinguistics, this should be a welcome result. The syntac­
tic structures posited by contemporary MGG are far more complex than have been or
could be investigated experimentally, whereas the structures of Simpler Syntax are for
the most part commensurate with those that have been assumed in the last three decades
of psycholinguistic and neurolinguistic research.

Processing in the Parallel Architecture: General Considerations
The Parallel Architecture is motivated primarily on grounds of its ability to account for
the phenomena addressed by traditional linguistic theory; that is, it is a “competence”
model in the classical sense. However, we have begun to see that it also has implications
for processing. We now turn more specifically to embedding the Parallel Architecture in a
processing theory that helps clarify certain debates in psycholinguistics and that also al­
lows psycholinguistic evidence to bear directly on issues of linguistic theory. (See also
Jackendoff, 2002, chapter 7, which discusses production as well as perception.)

To begin, it is necessary to discuss two general considerations in processing. The first is serial versus parallel processing. When the parser encounters a local structural ambiguity, does it only pursue one preferred analysis, backing up if it makes a mistake—or does it
pursue multiple options in parallel? Through the past three decades, as these two alterna­
tive hypotheses have competed in the literature and have been refined to deal with new
experimental evidence, they have become increasingly indistinguishable (Lewis, 2000).
On the one hand, a parallel model has to rank alternatives for plausibility; on the other
hand, a serial model has to be sensitive to detailed lexically conditioned alternatives that
imply either some degree of parallelism or a phenomenally fast recovery from certain
kinds of incorrect analyses.

The Parallel Architecture cannot settle this dispute, but it does place a distinct bias on
the choice. It has been clear since Swinney (1979) and Tanenhaus et al. (1979) that lexical
access in language perception is “promiscuous”: An incoming phonological string acti­
vates all semantic structures associated with it, whatever their relevance to the current
semantic context, and these remain activated in parallel for some time in working memo­
ry. As shown earlier, the Parallel Architecture treats syntactic treelets as the same formal
type as words: Both are pieces of structure stored in long-term memory. A structural am­
biguity such as that in (15)—

(15) My professor told the girl that Bill liked the story about Harry.

—arises by activating different treelets and/or combining them in different ways, a treatment not
so different in spirit from a lexical ambiguity. This suggests that on grounds of consistency, the
Parallel Architecture recommends parallel processing. Thus in developing a model of processing, I assume all the standard features of parallel processing models, in particular competition among mutually inhibitory analyses.
A second ongoing dispute in the language processing literature concerns the character of
working memory. One view (going back at least to Neisser, 1967) sees working memory as
functionally separate from long-term memory, a “place” where incoming information can
be structured. In this view, lexical retrieval involves in some sense copying or binding the
long-term coding of a word into working memory. By contrast, semantic network and con­
nectionist architectures for language processing (e.g., Smith & Medin, 1981; Elman et al.,
1996; MacDonald & Christiansen, 2002; MacDonald et al., 1994) make no distinction be­
tween long-term and working memory. For them, “working memory” is just the part of
long-term memory that is currently activated (plus, in Elman’s recurrent network archi­
tecture, a copy of the immediately preceding input); lexical retrieval consists simply of ac­
tivating the word’s long-term encoding, in principle a simpler operation.

Such a conception, though, does not allow for the building of structure. Even if the words
of a sentence being perceived are activated, there is no way to connect them up; the dog
chased a cat, the cat chased a dog, and dog cat a chased the activate exactly the same
words. There is also no principled way to account for sentences in which the same word
occurs twice, such as my cat likes your cat: the sentence refers to two distinct cats, even
though there is (presumably) only one “cat node” in the long-term memory network. Jack­
endoff (2002, section 3.5) refers to this difficulty as the “Problem of 2” and shows that it
recurs in many cognitive domains, for example, in recognizing two identical forks on the
table, or in recognizing a melody containing two identical phrases. In an approach with a
separate working memory, these problems do not arise: There are simply two copies of
the same material in working memory, each of which has its own relations to other mater­
ial (including the other copy).
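
The type/token remedy that a separate working memory provides can be made concrete (a sketch with invented names, not the chapter's machinery): each working-memory token is bound to the single long-term entry, yet carries its own transient relations.

```python
# Sketch of the "Problem of 2": one long-term "cat" entry, but each
# occurrence in "my cat likes your cat" is a distinct working-memory
# token with its own token-specific links.

LEXICON = {"cat": {"syn": "N", "sem": "CAT"}}   # a single long-term node

class Token:
    """A working-memory copy bound to a long-term lexical entry."""
    def __init__(self, word):
        self.word = word
        self.entry = LEXICON[word]    # binding to long-term memory
        self.relations = {}           # transient, token-specific linkages

my_cat, your_cat = Token("cat"), Token("cat")
my_cat.relations["possessor"] = "my"
your_cat.relations["possessor"] = "your"
```

The two tokens share one long-term entry yet carry distinct relations: two cats, one "cat node," which is precisely what an activation-only account cannot represent.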

An approach lacking an independent working memory also cannot make a distinction be­
tween transient and permanent linkages. For instance, recall that MacDonald et al.
(1994) propose to account for structure by building into lexical items their potential for
participating in structure. For them, structure is composed by establishing linkages
among the relevant parts of the lexical entries. However, consider the difference between
the phrases throw the shovel and kick the bucket. In the former, where composition is ac­
complished on the spot, the linkage between verb and direct object has to be transient
and not affect the lexical entries of the words. But in the latter, the linkage between the
verb and direct object is part of one’s lexical knowledge and therefore permanent. This
distinction is not readily available in the MacDonald et al. model. A separate working
memory deals easily with the problem: Both examples produce linkages in working mem­
ory, but only kick the bucket is linked in long-term memory.

Neural network models suffer from two other important problems (discussed at greater
length in Jackendoff, 2002, section 3.5). First, such models encode long-term memories as
connection strengths among units in the network, acquired through thousands of steps of
training. This gives no account of one-time learning of combinatorial structures, such as
the meaning of the sentence I’ll meet you for lunch at noon, a single utterance of which
can be sufficient to cause the hearer to show up for lunch. In a model with a separate
working memory, the perception of this sentence leads to copying of the composite mean­
ing into episodic memory (or whatever is responsible for keeping track of obligations and
formulating plans)—which is distinct from linguistic knowledge.

Finally, a standard neural network cannot encode a general relation such as X is identical
with Y, X rhymes with Y,8 or X is the (regular) past tense of Y. Connectionists, when
pressed (e.g., Bybee & McClelland, 2005), claim that there are no such general relations
—there are only family resemblances among memorized items, to which novel examples
are assimilated by analogy. But to show that there is less generality than was thought is
not to show that there are no generalizations. The syntactic generalizations mentioned
earlier, such as the multiple possibilities for noun modification, again can be cited as
counterexamples; they require typed variables such as N, NP, V, and VP in order to be
statable. Marcus (1998, 2001), in important work that has been met with deafening si­
lence by the connectionist community,9 demonstrates that neural networks in principle
cannot encode the typed variables necessary for instantiating general relations, including
those involved in linguistic combinatoriality.

This deficit is typically concealed by dealing with toy domains with small vocabularies
and a small repertoire of structures. It is no accident that the domains of language in
which neural network architectures have been most successful are those that make minimal use of structure, such as word retrieval, lexical phonology, and relatively simple morphology. All standard linguistic theories give us a handle on how to analyze complex sentences like the ones you are now reading; but despite more than twenty years of
connectionist modeling, no connectionist model comes anywhere close. (For instance, the
only example worked out in detail by MacDonald et al., 1994, is the two-word utterance,
John cooked.)

Accordingly, as every theory of processing should, a processing model based on the Paral­
lel Architecture posits a working memory separate from long-term memory: a “work­
bench” or “blackboard” in roughly the sense of Arbib (1982), on which structures are con­
structed online. Linguistic working memory has three subdivisions or “departments,” one
each for the three components of grammar, plus the capability of establishing linkages
among their parts in terms of online bindings of the standard (if ill-understood) sort stud­
ied by neuroscience. Because we are adopting parallel rather than serial processing, each
department is capable of maintaining more than one hypothesis, linked to one or more hy­
potheses in other departments.10

This notion of working memory contrasts with Baddeley’s (1986) influential treatment, in
which linguistic working memory is a “phonological loop” where perceived phonological
structure is rehearsed. Baddeley’s approach does not tell us how phonological structure
is constructed, nor how the corresponding syntactic and semantic structures are con­
structed and related to phonology. Although a phonological loop may be adequate for de­
scribing the memorization of strings of nonsense syllables (Baddeley’s principal concern),
it is not adequate for characterizing the understanding of strings of meaningful syllables,
that is, the perception of real spoken language. And relegating the rest of language pro­
cessing to a general-purpose “central executive” simply puts off the problem. (See Jack­
endoff, 2002, pp. 205–207, for more discussion.)

An Example

Figure 28.2 Linguistic working memory after phonetic processing of the first five syllables of (16a,b).

With this basic conception of working memory in place, we will now see how the knowl­
edge structures posited by the Parallel Architecture can be put to use directly in the
process of language perception. Consider the following pair of sentences.

(16)
a. It’s not a parent, it’s actually a child.
b. It’s not apparent, it’s actually quite obscure.

(16a,b) are phonetically identical (at least in my dialect) up to their final two words. However, they are phonologically different: (16a) has a word boundary that (16b) lacks. They
are also syntactically different: a parent is an NP, whereas apparent is an adjective phrase
(AP). And of course they are semantically different as well. The question is how the two
interpretations are developed in working memory and distinguished at the end.

Suppose that auditory processing deposits raw phonetic input into working memory. Fig­
ure 28.2 shows what working memory looks like when the first five syllables of (16) have
been so assimilated. (I beg the reader’s indulgence in idealizing away from issues of
phoneme identification, which are of course nontrivial.)

At the next stage of processing, the lexicon must be called into play in order to identify
which words are being heard. For convenience, let us represent the lexicon as in Figure
28.3, treating it as a relatively unstructured collection of phonological, syntactic, and se­
mantic structures—sometimes linked—of the sort illustrated in (6), (10), and (12) to (14)
above.

Working memory, seeking potential lexical matches, sends a call to the lexicon, in effect
asking, “Do any of you in there sound like this?” And various phonological structures “vol­
unteer” or are activated. Following Swinney (1979) and Tanenhaus, Leiman, and Seiden­
berg (1979), all possible forms with the appropriate phonetics are activated: both it’s and
its, both not and knot, and both apparent and a + parent. This experimental result stands
to reason, given that at this point only phonetic information is available to the processor.
However, following the lexically driven parsing tradition, we also can assume that the de­
gree and/or speed of activation of the alternative forms depends on their frequency.

Figure 28.3 A fragment of the lexicon.

Figure 28.4 The lexicon after being called by working memory in Figure 28.2.

Figure 28.5 Activated lexical items are copied/bound into working memory, creating multiple “drafts.”

Phonological activation in the lexicon spreads to linked syntactic and semantic structures. If a lexical item’s semantic structure has already been primed by context,
its activation will be faster or more robust, or both. Moreover, once lexical semantic
structure is activated, it begins to prime semantically related lexical items. The result is
depicted in Figure 28.4, in which the activated items are indicated in bold.

Figure 28.6 The status of working memory and the lexicon after syntactic integration.

Next, the activated lexical items are bound to working memory. However, not only the
phonological structure is bound; the syntactic and semantic structures are also bound (or
copied) to the appropriate departments of working memory, yielding the configuration in
Figure 28.5. Because there are alternative ways of carving the phonological content into
lexical items, working memory comes to contain mutually inhibitory “drafts” (in the sense
of Dennett, 1991) of what is being heard. At this point in processing, there is no way of
knowing which of the two competing “drafts” is correct. (For convenience in exposition,
from here on we consider just the fragment of phonetics corresponding (p. 588) to apparent or a + parent, ignoring its/it’s and not/knot.)
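
The resulting state of working memory can be sketched as a small data structure. The Draft class and the placeholder strings below are invented for illustration; the real structures are of course far richer.

```python
# Toy sketch: working memory holding competing "drafts" (Dennett, 1991), each
# a linked triple of phonological, syntactic, and semantic structure.
class Draft:
    def __init__(self, phonology, syntax, semantics):
        self.phonology = phonology
        self.syntax = syntax
        self.semantics = semantics
        self.active = True  # a losing draft can later be extinguished

# Both carvings of the same phonetic string coexist until context decides.
working_memory = [
    Draft("a-parent", "[Det a][N parent]", "[INDEF; PARENT]"),
    Draft("apparent", "[A apparent]", "[APPARENT]"),
]

live = [d for d in working_memory if d.active]
print(len(live))  # 2
```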

The syntactic department of working memory now contains strings of syntactic elements,
so it is now possible to undertake syntactic integration: the building of a unified syntactic
structure from the fragments now present in working memory. Syntactic integration uses
the same mechanism as lexical access: the strings in working memory activate treelets in long-term memory. In turn these treelets are unified with the existing strings. The string
Det—N thus becomes an NP, and the adjective becomes an AP, as shown in Figure 28.6.
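
The unification step can be sketched as a lookup against stored treelets. The treelet inventory and category labels below are simplified placeholders, not the chapter's formalism, and real unification is of course more than exact matching.

```python
# Toy sketch: syntactic integration by matching category strings in working
# memory against treelets stored in the lexicon. Inventory is a placeholder.
TREELETS = {
    ("Det", "N"): "NP",   # [NP Det N]
    ("A",): "AP",         # [AP A]
}

def integrate(categories):
    """Unify a string of categories with a matching treelet, if any."""
    key = tuple(categories)
    if key in TREELETS:
        return {"node": TREELETS[key], "daughters": list(categories)}
    return None  # no stored treelet matches; integration must wait

np = integrate(["Det", "N"])
print(np["node"])  # NP
```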

Figure 28.7 The status of working memory after semantic integration.

The other necessary step is semantic integration: building a unified semantic structure
from the pieces of semantic structure bound into working memory from the lexicon. This
process has to make use of at least two sets of constraints. One set is the principles of se­
mantic well-formedness: unattached pieces of meaning have to be combined in a fashion
that makes sense—both internally and also in terms of any context that may also be
present in semantic working memory. In the present example, these principles will be suf­
ficient to bring about semantic integration: INDEF and PARENT can easily be combined
into a semantic constituent, and APPARENT forms a constituent on its own. The resulting
state of working memory looks like Figure 28.7.

However, in more complex cases, semantic integration also has to use the syntax–seman­
tics interface rules (also stored in the lexicon, but not stated here), so that integrated syn­
tactic structures in working memory can direct the arrangement of the semantic frag­
ments. In such cases, semantic integration is dependent on successful syntactic integra­
tion. Consider the following example:

(17) What did Sandy say that Pat bought last Tuesday?

Figure 28.8 Status of working memory after it’s actually a child is semantically integrated.

Figure 28.9 The semantics of the lower draft is extinguished.

In order for semantic integration to connect the meaning of what to the rest of the inter­
pretation, syntactic integration must determine that it is the object of bought rather than
the object of say. Syntactic integration must also determine that last Tuesday is a modifier of bought and not say (compare to Last Tuesday, what did Sandy say that Pat bought?).
Thus in such cases we expect semantic integration (p. 589) to be (partly) dependent on the
output of syntactic integration.11 In Figure 28.7, as it happens, the semantic structure is
entirely parallel to the syntax, so there is no way to tell whether syntactic integration has
been redundantly evoked to determine the semantics, and with what time course.

At the point reached in Figure 28.7, working memory has two complete and mutually in­
hibitory structures, corresponding to the meanings of the two possible interpretations of
the phonetic input. How is this ambiguity resolved? As observed above, it depends on the
meaning of the following context. In particular, part of the meaning of the construction in
(16), It’s not X, it’s (actually) Y, is that X and Y form a semantic contrast. Suppose the in­
put is (16a), It’s not a parent, it’s actually a child. When the second clause is semantically
integrated into working memory, the result is then Figure 28.8.

At this point in processing (and only at this point), it becomes possible to detect that the
lower “draft” is semantically ill formed because apparent does not form a sensible con­
trast with child. Thus the semantic structure of this draft comes to be inhibited or extin­
guished, as in Figure 28.9.

This in turn sets off a chain reaction of feedback through the entire set of linked struc­
tures. Because the semantic structure of the lower draft helps keep the rest of the lower
draft stable, and because all departments of the upper draft are trying to inhibit the low­
er draft, the entire lower draft comes to be extinguished.
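
The chain reaction can be sketched as extinction spreading through a graph of linked structures: once one structure is inhibited, everything reachable through its links loses support in turn. The link graph below is an invented placeholder.

```python
# Toy sketch of the extinction chain reaction: inhibiting one structure in a
# draft spreads along the links until the whole draft is extinguished.
def extinguish(links, start):
    """Return the set of structures extinguished, starting from `start`."""
    dead = {start}
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for linked in links.get(current, []):
            if linked not in dead:
                dead.add(linked)       # feedback: linked structure collapses
                frontier.append(linked)
    return dead

# Links among the lower draft's three structures (placeholder graph).
lower_draft = {
    "semantics": ["syntax"],
    "syntax": ["phonology", "semantics"],
    "phonology": ["syntax"],
}
print(sorted(extinguish(lower_draft, "semantics")))
# ['phonology', 'semantics', 'syntax']
```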

Meanwhile, through this whole process, the activity in working memory has also main­
tained activation of long-term memory items that are bound to working memory. Thus,
when apparent and its syntactic structure are extinguished in working memory, the corre­
sponding parts of the lexicon are deactivated as well—and they therefore cease priming
semantic associates, as in Figure 28.10.

The final state of working memory is Figure 28.11. If the process from Figure 28.6
through Figure 28.11 goes quickly enough, the perceiver ends up hearing the utterance
as It’s not a parent, with no sense of ambiguity or garden-pathing, even though the disam­
biguating information follows the ambiguous passage. Strikingly, the semantics affects
the hearer’s impression of the phonology.

To sum up the process just sketched:

• Phonetic processing provides strings of phonemes in phonological working memory.
• The phonemic strings initiate a call to the lexicon in long-term memory, seeking can­
didate words that match parts of the strings.
• Activated lexical items set up candidate phonological parsings, often in multiple
drafts, each draft linked to a lexical item or sequence of lexical items.
• Activated lexical items also set up corresponding strings of syntactic units and collec­
tions of semantic units in the relevant departments of working memory.


• Syntactic integration proceeds by activating and binding to treelets stored in the lex­
icon.

Figure 28.10 The syntax and phonology of the lower draft and their links to the lexicon are extinguished.

(p. 590)

Figure 28.11 The resolution of the ambiguity.

• When semantic integration depends on syntactic constituency, it cannot begin until syntactic integration of the relevant constituents is complete. (However, semantic integration does not have to wait for the entire sentence to be syntactically integrated—only for local constituents.)
• Semantic disambiguation among multiple drafts requires semantic integration with
the context (linguistic or nonlinguistic). In general, semantic disambiguation will there­
fore be slower than syntactic disambiguation.
• The last step in disambiguation is the suppression of phonological candidates by
feedback.
• Priming is an effect of lexical activation in long-term memory. Early in processing, se­
mantic associates of all possible meanings of the input are primed. After semantic dis­
ambiguation, priming by disfavored readings terminates.
• Priming need not be confined to the semantics of words. Because syntactic treelets
are also part of the lexicon, it is possible to account for syntactic or constructional
priming (Bock, 1995) in similar terms.
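
The steps just summarized can be strung together as a single toy pipeline. All names and representations below are invented for illustration, and each stage is compressed to its bare logic.

```python
# Toy end-to-end sketch of the comprehension steps summarized above.
def comprehend(phonetic_input, lexicon, context_contrast):
    # Steps 1-4: lexical access sets up competing drafts.
    drafts = [dict(entry) for entry in lexicon.get(phonetic_input, [])]
    # Steps 5-6: syntactic/semantic integration (trivial here: per-draft).
    # Step 7: semantic disambiguation against the context.
    survivors = [d for d in drafts
                 if context_contrast is None
                 or d["semantics"] != context_contrast["excluded"]]
    # Step 8: feedback suppresses the losing phonological candidates.
    return [d["words"] for d in survivors]

LEX = {"aparent": [{"words": "a parent", "semantics": "[INDEF PARENT]"},
                   {"words": "apparent", "semantics": "[APPARENT]"}]}
# "...it's actually a child" rules out the APPARENT reading:
print(comprehend("aparent", LEX, {"excluded": "[APPARENT]"}))  # ['a parent']
```

With no disambiguating context (context_contrast=None), both drafts survive, matching the observation that disambiguation waits on semantic integration with context.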

There is ample room in this model to investigate standard processing issues such as ef­
fects of frequency and priming on competition (here localized in lexical access), relative
prominence of alternative parsings (here localized in syntactic integration), influence of
context (here localized in semantic integration), and conditions for garden-pathing (here,
premature extinction of the ultimately correct draft) or absence thereof (as in the present example). The fact that each step of processing can be made explicit—in terms of elements independently motivated by linguistic theory—recommends the model as a means
of putting all these issues in larger perspective.

Further Issues
This section briefly discusses two further sorts of phenomena that can be addressed in
the Parallel Architecture’s model of processing. I am not aware of attempts to draw these
together in other models.

Visually Guided Parsing

Tanenhaus et al. (1995) confronted subjects with an array of objects and an instruction
like (18), and their eye movements over the array were tracked.

(18) Put the apple on *the towel in the cup.

At the moment in time marked by *, the question faced by the language processor is
whether on (p. 591) is going to designate where the apple is, or where it is to be put—a
classic PP attachment ambiguity. It turns out that at this point, subjects already start
scanning the relevant locations in the array in order to disambiguate the sentence (Is
there more than one apple? Is there already an apple on the towel?). Hence visual feed­
back is used to constrain interpretation early on in processing.

The Parallel Architecture makes it clear how this can come about. So far we have spoken
only of interfaces between semantic structure and syntax. However, semantic structure
also interfaces with other aspects of cognition. In particular, to be able to talk about what
we see, high-level representations produced by the visual system must be able to induce
the creation of semantic structures that can then be converted into utterances. Address­
ing this need, the Parallel Architecture (Jackendoff, 1987, 1996, 2002, 2012; Landau &
Jackendoff, 1993) proposes a level of mental representation called spatial structure,
which integrates visual, haptic, and proprioceptive inputs into the perception of physical
objects in space (including one’s body). Spatial structure is linked to semantic structure
by means of an interface similar in character to the interfaces within the language faculty.

Some linkages between semantic and spatial structure are stored in long-term memory.
For instance, cat is a semantic category related to the category animal in semantic struc­
ture and associated with the phonological structure /kæt/ in long-term memory. But it is
also associated with a spatial structure, which encodes what cats look like, the counter­
part in the present approach to an “image of a stereotypical instance.” Other linkages
must be computed combinatorially on line. For instance, the spatial structure that arises
from seeing an apple on a towel is not a memorized configuration, and it must be mapped
online into the semantic structure [APPLE BE [ON [TOWEL]]]. Such a spatial structure
has to be computed in another department of working memory that encodes one’s con­
ception of the current spatial layout.12


This description of the visual–linguistic interface is sufficient to give an idea of how exam­
ple (18) works. In hearing (18), which refers to physical space, the goal of processing is to
produce not only a semantic structure but a semantic structure that can be correlated
with the current spatial structure through the semantic–spatial interface. At the point
designated by *, syntactic and semantic integration have led to the two drafts in (19). (As
usual, italics denote anticipatory structure to be filled by subsequent material; the seman­
tics contains YOU because this is an imperative sentence.)

(19)

   Syntax                                      Semantics
a. [VP put [NP the apple] [PP on NP]]          YOU PUT [APPLE; DEF] [ON X]
b. [VP put [NP the apple [PP on NP]] PP]       YOU PUT [APPLE; DEF; [Place ON X]] PLACE

Thus the hearer has performed enough semantic integration to anticipate finding a
unique referent in the visual environment for the NP beginning with the apple, and starts
scanning for one.

Suppose spatial structure turns up with two apples. Then the only draft that can be corre­
lated consistently with spatial structure is (19b), with the expectation that the phrase on
NP will provide disambiguating information. The result is that draft (19a) is extinguished,
just like the lower draft in our earlier example. If it happens that the hearer sees one of
the apples on something, say a towel, and the other is not on anything, the first apple can
be identified as the desired unique referent—and the hearer ought to be able to antici­
pate the phonological word towel. Thus by connecting all levels of representation through
the interfaces, it is possible to create an anticipation of phonological structure from visual
input.
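
This filtering of drafts against the visual scene can be sketched as follows; the scene encoding and draft shapes are invented placeholders, standing in for spatial structure and its interface with semantics.

```python
# Toy sketch of visually guided disambiguation (after Tanenhaus et al., 1995):
# a draft survives only if its semantics can be correlated with the scene.
def filter_by_scene(drafts, scene):
    survivors = []
    for d in drafts:
        if d["needs_unique_referent"] and scene.count(d["referent"]) != 1:
            continue  # e.g. two apples: bare 'the apple' picks out nothing
        survivors.append(d)
    return survivors

drafts = [
    {"name": "19a", "referent": "apple", "needs_unique_referent": True},
    {"name": "19b", "referent": "apple on towel", "needs_unique_referent": True},
]
scene = ["apple", "apple", "apple on towel", "cup"]
print([d["name"] for d in filter_by_scene(drafts, scene)])  # ['19b']
```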

This account of (18) essentially follows what Tanenhaus et al. have to say about it. What is
important here is how naturally and explicitly it can be couched in the Parallel Architec­
ture—both in terms of its theory of interfaces among levels of representation and in terms
of its theory of processing.


Semantic Structure Without Syntax or Phonology

The relationship between the Parallel Architecture and its associated processing model is
a two-way street: It is possible to run experiments that test linguistic hypotheses. For ex­
ample, consider the phenomenon of “aspectual coercion,” illustrated in (20) (Jackendoff,
1997; Pustejovsky, 1995; Verkuyl, 1993; among others). Example (20a) conveys a sense of
repeated jumping, but there is no sense of repeated sleeping in the syntactically parallel
(20b).

(20)
a. Joe jumped until the bell rang.
b. Joe slept until the bell rang.

Here is how it works: The semantic effect of until is to place a temporal bound on (p. 592) a continuous process. Since sleep denotes a continuous process, semantic integration in
(20b) is straightforward. In contrast, jump is a point-action verb: a jump has a definite
ending, namely when one lands. Thus it cannot integrate properly with until. However, re­
peated jumping is a continuous process, so by construing the sentence in this fashion, se­
mantic integration can proceed. Crucially, in the Parallel Architecture, the sense of repeti­
tion is encoded in none of the words. It is a free-floating semantic operator that can be
used to “fix up” or “coerce” interpretations under certain conditions. A substantial num­
ber of linguistic phenomena have now been explained in terms of coercion (see Jackend­
off, 1997, for several examples; some important more recent examples appear in Culicov­
er & Jackendoff 2005, chapter 12).
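
The coercion step can be sketched as a fallback within semantic integration; the verb classes and the encoding of the REPEAT operator below are simplified placeholders.

```python
# Toy sketch of aspectual coercion: 'until' needs a continuous process, so a
# point-action verb can combine only after being wrapped in a free-floating
# REPEAT operator, at extra processing cost. Verb classes are invented here.
PROCESS_VERBS = {"sleep", "run"}
POINT_ACTION_VERBS = {"jump", "cough"}

def integrate_until(verb):
    """Return (semantic structure, whether coercion cost was incurred)."""
    if verb in PROCESS_VERBS:
        return (f"[UNTIL [{verb.upper()}]]", False)
    if verb in POINT_ACTION_VERBS:
        # coercion: construe repeated jumping as a continuous process
        return (f"[UNTIL [REPEAT [{verb.upper()}]]]", True)
    raise ValueError("verb class unknown in this toy lexicon")

print(integrate_until("sleep"))  # ('[UNTIL [SLEEP]]', False)
print(integrate_until("jump"))   # ('[UNTIL [REPEAT [JUMP]]]', True)
```

The second return value marks exactly the extra cost that the experiments discussed below localize at the moment of semantic integration.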

The Parallel Architecture makes the prediction that (20a) will look unexceptionable to the
processor until semantic integration. At this point, the meanings of the words cannot be
integrated, and so semantic integration attempts the more costly alternative of coercion.
Thus a processing load should be incurred specifically at the time of semantic integra­
tion. Piñango et al. (1999) test for processing load in examples like (20a,b) during audito­
ry comprehension by measuring reaction time to a lexical decision task on an unrelated
probe. The timing of the probe establishes the timing of the processing load. And indeed
extra processing load does show up in the coerced examples, in a time frame consistent
with semantic rather than syntactic or lexical processing, just as predicted by the Parallel
Architecture.13

Similar experimental results have been obtained for the “light verb construction” shown
in (21).

(21)
a. Sam gave Harry an order. (= Sam ordered Harry)
b. Sam got an order from Harry. (= Harry ordered Sam)

In these examples, the main verb order is paraphrased by the combination of the noun an
order plus the light verbs give and get. But the syntax is identical to a “nonlight” use of
the verb, as in Sam gave an orange to Harry and Sam got an orange from Harry. The light verb construction comes to paraphrase the simple verb through a semantic manipulation
that combines the argument structures of the light verb and the nominal (Culicover &
Jackendoff, 2005, pp. 222–225). Thus again the Parallel Architecture predicts additional
semantic processing, and this is confirmed by experimental results (Piñango et al., 2006;
Wittenberg et al., forthcoming; Wittenberg et al., in revision).

Final Overview
The theoretical account of processing sketched in the previous three sections follows di­
rectly from the logic of the Parallel Architecture. First, as in purely word-driven ap­
proaches to processing, this account assumes that words play an active role in determin­
ing structure at phonological, syntactic, and semantic levels: The linguistic theory posits
that words, idioms, and constructions are all a part of the rule system. In particular, the
interface properties of words determine the propagation of activity across the depart­
ments of working memory.

Second, unlike purely word-driven approaches to processing, the Parallel Architecture’s processing model builds hierarchical structure in working memory, using pieces of phrase
structure along with structure inherent in words. This enables the processing model to
encompass sentences of any degree of complexity and to overcome issues such as the
Problem of 2.

Third, structural information is available in processing as soon as a relevant rule (or word) can be activated in the lexicon and bound into working memory. That is, processing is opportunistic or incremental—in accord with much experimental evidence. This characteristic of processing is consistent with the constraint-based formalism of the Parallel Architecture, which permits structure to be propagated from any point in the sentence—
phonology, semantics, top-down, bottom-up, left to right. Contextual influences from dis­
course or even from the visual system can be brought to bear on semantic integration as
soon as semantic fragments are made available through lexical access.

Fourth, the system makes crucial use of parallel processing: All relevant structures are
processed at once in multiple “drafts,” in competition with one another. The extinction of
competing drafts is carried out along pathways established by the linkages among struc­
tures and the bindings between structures in working memory and the lexicon. Because
the Parallel Architecture conceives of the structure of a sentence as a linkage among
three separate structures, the handling of the competition among multiple drafts is com­
pletely natural.

What is perhaps most attractive about the Parallel Architecture from a psycholinguistic
perspective is that the principles of grammar are used directly by the processor. That is,
unlike the classical MGG architecture, there is no “metaphor” involved in the notion of
(p. 593) grammatical derivation. The formal notion of structure building in the compe­

tence model is the same as in the performance model, except that it is not anchored in
time. Moreover, the principles of grammar are the only routes of communication between
semantic context and phonological structure: context effects involve no “wild card” inter­
actions using non-linguistic strategies. The Parallel Architecture thus paves the way for a
much closer interaction between linguistic theory and psycholinguistics than has been
possible in the past three decades.

Author Note
This article is an abridged version of Jackendoff, 2007. I am thankful to Gina Kuperberg
and Maria Mercedes Piñango for many detailed comments and suggestions on previous
drafts. Two anonymous reviewers also offered important suggestions. Martin Paczynski
helped a great deal with graphics. My deepest gratitude goes to Edward Merrin for re­
search support, through his gift of the Seth Merrin Professorship to Tufts University.

References
Arbib, M. A. (1982). From artificial intelligence to neurolinguistics. In M. A. Arbib, D. Ca­
plan, J. C. Marshall (Eds.), Neural models of language processes (pp. 77–94). New York:
Academic Press.

Baddeley, A. (1986). Working memory. Oxford, UK: Clarendon Press.

Bock, K. (1995). Sentence production: From mind to mouth. In J. L. Miller & P. D. Eimas (Eds.), Handbook of perception and cognition, Vol. XI: Speech, language, and communication (pp. 181–216). Orlando, FL: Academic Press.

Bock, K., & Loebell, H. (1990). Framing sentences. Cognition, 35, 1–39.

Bresnan, J. (2001). Lexical-functional syntax. Oxford, UK: Blackwell.

Bybee, J., & McClelland, J. L. (2005). Alternatives to the combinatorial paradigm of lin­
guistic theory based on domain general principles of human cognition. Linguistic Review,
22, 381–410.

Cheney, D., & Seyfarth, R. (1990). How monkeys see the world. Chicago: University of
Chicago Press.

Chierchia, G., & McConnell-Ginet, S. (1990). Meaning and grammar: An introduction to semantics. Cambridge, MA: MIT Press.

Chomsky, N. (1957). Syntactic structures. The Hague: Mouton.

Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT Press.

Chomsky, N. (1981). Lectures on government and binding. Dordrecht: Foris.

Chomsky, N. (1995). The minimalist program. Cambridge, MA: MIT Press.


Chomsky, N. (2000). New horizons in the study of language and mind. Cambridge, UK:
Cambridge University Press.

Collins, A., & Quillian, M. (1969). Retrieval time from semantic memory. Journal of Verbal Learning and Verbal Behavior, 8, 240–247.

Culicover, P., & Jackendoff, R. (2005). Simpler syntax. Oxford, UK: Oxford University
Press.

Dennett, D. C. (1991). Consciousness explained. Boston: Little, Brown.

Elman, J. (1990). Finding structure in time. Cognitive Science, 14, 179–211.

Elman, J., Bates, E., Johnson, M., Karmiloff-Smith, A., Parisi, D., & Plunkett, K. (1996). Re­
thinking innateness. Cambridge, MA: MIT Press.

Ford, M., Bresnan, J., & Kaplan, R. C. (1982). A competence-based theory of syntactic closure. In J. Bresnan (Ed.), The mental representation of grammatical relations (pp. 727–796). Cambridge, MA: MIT Press.

Frazier, L. (1989). Against lexical generation of syntax. In W. Marslen-Wilson (Ed.), Lexical representation and process (pp. 505–528). Cambridge, MA: MIT Press.

Frazier, L., Carlson, K., & Clifton, C. (2006). Prosodic phrasing is central to language
comprehension. Trends in Cognitive Sciences, 10, 244–249.

(p. 595) Gee, J., & Grosjean, F. (1983). Performance structures: A psycholinguistic and linguistic appraisal. Cognitive Psychology, 15, 411–458.

Goldsmith, J. (1979). Autosegmental phonology. New York: Garland Press.

Hagoort, P. (2005). On Broca, brain, and binding: A new framework. Trends in Cognitive
Sciences, 9, 416–423.

Hauser, M. D. (2000). Wild minds: What animals really think. New York: Henry Holt.

Hirst, D. (1993). Detaching intonational phrases from syntactic structure. Linguistic In­
quiry, 24, 781–788.

Jackendoff, R. (1983). Semantics and cognition. Cambridge, MA: MIT Press.

Jackendoff, R. (1987). Consciousness and the computational mind. Cambridge, MA: MIT
Press.

Jackendoff, R. (1990). Semantic structures. Cambridge, MA: MIT Press.

Jackendoff, R. (1996). The architecture of the linguistic-spatial interface. In P. Bloom, M. A. Peterson, L. Nadel, & M. F. Garrett (Eds.), Language and space (pp. 1–30). Cambridge, MA: MIT Press.


Jackendoff, R. (1997). The architecture of the language faculty. Cambridge, MA: MIT
Press.

Jackendoff, R. (2002). Foundations of language. Oxford, UK: Oxford University Press.

Jackendoff, R. (2006). Alternative minimalist visions of language. In Proceedings of the 41st meeting of the Chicago Linguistic Society, Chicago. Reprinted in R. Borsley & K. Börjars (Eds.). (2011). Non-Transformational syntax (pp. 268–296). Oxford, UK: Wiley-Blackwell.

Jackendoff, R. (2007). A Parallel Architecture perspective on language processing. Brain Research, 1146, 2–22.

Jackendoff, R. (2010). Meaning and the lexicon: The Parallel Architecture 1975–2010. Ox­
ford, UK: Oxford University Press.

Jackendoff, R. (2011). What is the human language faculty? Two views. Language, 87, 586–624.

Jackendoff, R. (2012). A user’s guide to thought and meaning. Oxford, UK: Oxford University Press.

Jackendoff, R., & Pinker, S. (2005). The nature of the language faculty and its implications
for the evolution of language (reply to Fitch, Hauser, and Chomsky). Cognition, 97, 211–
225.

Lakoff, G. (1987). Women, fire, and dangerous things. Chicago: University of Chicago Press.

Landau, B., & Jackendoff, R. (1993). “What” and “where” in spatial language and spatial
cognition. Behavioral and Brain Sciences, 16, 217–238.

Langacker, R. (1987). Foundations of cognitive grammar (Vol. 1). Stanford, CA: Stanford
University Press.

Lappin, S. (Ed.). (1996). The handbook of contemporary semantic theory. Oxford, UK: Blackwell.

Lewis, R. (2000). Falsifying serial and parallel parsing models: Empirical conundrums and
an overlooked paradigm. Journal of Psycholinguistic Research, 29, 241–248.

Liberman, M., & Prince, A. (1977). On stress and linguistic rhythm. Linguistic Inquiry, 8,
249–336.

MacDonald, M. C., & Christiansen, M. H. (2002). Reassessing working memory: Comment on Just and Carpenter (1992) and Waters and Caplan (1996). Psychological Review, 109, 35–54.


MacDonald, M. C., Pearlmutter, N. J., & Seidenberg, M. S. (1994). Lexical nature of syn­
tactic ambiguity resolution. Psychological Review, 101, 676–703.

Marcus, G. (1998). Rethinking eliminative connectionism. Cognitive Psychology, 37, 243–282.

Marcus, G. (2001). The algebraic mind. Cambridge, MA: MIT Press.

Miller, G. A., & Chomsky, N. (1963). Finitary models of language users. In R. D. Luce, R.
R. Bush, & E. Galanter (Eds.), Handbook of mathematical psychology (Vol. 2, pp. 419–
492). New York: Wiley.

Neisser, U. (1967). Cognitive psychology. Englewood Cliffs, NJ: Prentice-Hall.

Paczynski, M., Jackendoff, R., & Kuperberg, G. R. (under revision). When events change their nature.

Partee, B., Ed. (1976). Montague grammar. New York: Academic Press.

Phillips, C., & Lau, E. (2004). Foundational issues (review article on Jackendoff 2002).
Journal of Linguistics, 40, 1–21.

Piñango, M. M., Mack, J., & Jackendoff, R. (2006). Semantic combinatorial processes in
argument structure: Evidence from light verbs. Proceedings of the Berkeley Linguistics
Society.

Piñango, M. M., Zurif, E., & Jackendoff, R. (1999). Real-time processing implications of
enriched composition at the syntax-semantics interface. Journal of Psycholinguistic Re­
search, 28, 395–414.

Pinker, S. (1989). Learnability and cognition. Cambridge, MA: MIT Press.

Pinker, S. (1999). Words and rules. New York: Basic Books.

Pinker, S. (2007). The stuff of thought. New York: Penguin.

Pinker, S., & Jackendoff, R. (2005). The faculty of language: What’s special about it? Cog­
nition, 95, 201–236.

Pollard, C., & Sag, I. (1994). Head-driven phrase structure grammar. Chicago: University
of Chicago Press.

Prince, A., & Smolensky, P. (1993/2004). Optimality theory: Constraint interaction in gen­
erative grammar. Technical report, Rutgers University and University of Colorado at
Boulder, 1993. (Revised version published by Blackwell, 2004.)

Pustejovsky, J. (1995). The generative lexicon. Cambridge, MA: MIT Press.


Pylkkänen, L., & McElree, B. (2006). The syntax-semantics interface: On-line composition
of sentence meaning. In M. Traxler & M. A. Gernsbacher (Eds.), Handbook of psycholin­
guistics (2nd ed.) New York: Elsevier.

Rosch, E., & Mervis, C. (1975). Family resemblances: Studies in the internal structure of
categories. Cognitive Psychology, 7, 573–605.

Schank, R. (1975). Conceptual information processing. New York: Elsevier.

Shieber, S. (1986). An introduction to unification-based approaches to grammar. Stanford, CA: CSLI.

Smith, E., & Medin, D. (1981). Categories and concepts. Cambridge, MA: Harvard Univer­
sity Press.

Smith, E., Shoben, E., & Rips, L. (1974). Structure and process in semantic memory: A
featural model for semantic decisions. Psychological Review, 81, 214–241.

Swinney, D. (1979). Lexical access during sentence comprehension: (Re)consideration of context effects. Journal of Verbal Learning and Verbal Behavior, 18, 645–659.

Tabor, W., & Tanenhaus, M. (1999). Dynamical models of sentence processing. Cognitive
Science, 23, 491–515.

Talmy, L. (1988). Force-dynamics in language and thought. Cognitive Science, 12, 49–100.

Tanenhaus, M., Leiman, J. M., & Seidenberg, M. (1979). Evidence for multiple stages in
the processing of ambiguous words in syntactic contexts. Journal of Verbal Learning and
Verbal Behavior, 18, 427–440.

Tanenhaus, M., Spivey-Knowlton, M. J., Eberhard, K. M., & Sedivy, J. C. (1995). Integra­
tion of visual and linguistic information in spoken language comprehension. Science, 268,
1632–1634.

Traxler, M. J., Pickering, M. J., & Clifton, C. (1998). Adjunct attachment is not a form of
lexical ambiguity resolution. Journal of Memory and Language, 39, 558–592.

Truckenbrodt, H. (1999). On the relation between syntactic phrases and phonological phrases. Linguistic Inquiry, 30, 219–255.

Verkuyl, H. (1993). A theory of aspectuality: The interaction between temporal and atem­
poral structure. Cambridge, MA: Cambridge University Press.

Wittenberg, E., Jackendoff, R., Kuperberg, G., Paczynski, M., Snedeker, J., & Wiese, H. (forthcoming). The processing and representation of light verb constructions. In A. Bachrach, I. Roy, & L. Stockall (Eds.), Structuring the argument. John Benjamins.


Wittenberg, E., Paczynski, M., Wiese, H., Jackendoff, R., & Kuperberg, G. (in revision).
The difference between “giving a rose” and “giving a kiss”: A sustained anterior negativi­
ty to the light verb construction.

Notes:

(1) . What counts as “enough” syntactic structure might be different in perception and
production. Production is perhaps more demanding of syntax, in that the processor has to
make syntactic commitments in order to put words in the correct order; to establish the
proper inflectional forms of verbs, nouns, and adjectives (depending on the language); to
leave appropriate gaps for long-distance dependencies; and so on. Perception might be
somewhat less syntax bound in that “seat-of-the-pants” semantic processing can often get
close to a correct interpretation.

(2) . Frazier et al. (2006) suggest that there is more to the prosody–syntax interface than
rule (4b), in that the relative length of pauses between Intonation Phrases can be used to
signal the relative closeness of syntactic relationship among constituents. This result
adds a further level of sophistication to rule (4b), but it does not materially affect the
point being made here. It does, however, show how experimental techniques can be used
to refine linguistic theory.

(3) . A further point: The notion of an independently generative phonology lends itself ele­
gantly to the description of signed languages, in which phonological structure in the visu­
al-manual modality can easily be substituted for the usual auditory-vocal system.

(4) . Phillips and Lau (2004) find such anticipatory parsing “somewhat mysterious” in the
context of the most recent incarnation of MGG, the Minimalist Program. Frazier (1989),
assuming a mainstream architecture, suggests that the processor uses “precompiled”
phrase structure rules to create syntactic hypotheses. Taken in her terms, the treelets in
(6) are just such precompiled structures. However, in the Parallel Architecture, there are
no “prior” algorithmic phrase structure rules like (5) from which the treelets are “com­
piled”; rather, one’s knowledge of phrase structure is encoded directly in the repertoire of
treelets.

(5) . Bybee and McClelland (2005), observing that languages are far less regular than
MGG thinks, take this as license to discard general rules altogether in favor of a statisti­
cally based connectionist architecture. But most of the irregularities they discuss are
word-based constraints. They completely ignore the sorts of fundamental syntactic gener­
alizations just enumerated.

(6). I do not exclude the possibility that high-frequency regulars are redundantly stored
in the lexicon. I would add that I do not necessarily endorse every claim made on behalf
of dual-process models of inflection. For more discussion, see Jackendoff (2002, pp.
163–167).

Page 29 of 31
A Parallel Architecture Model of Language Processing

(7). Unification is superficially like the Merge operation of the Minimalist Program (the
most recent version of MGG). However, there are formal and empirical differences that
favor unification as the fundamental generative process in language. See Jackendoff
(2006, 2011) for discussion.

(8). Note that rhymes cannot all be memorized. One can judge novel rhymes that cannot
be stored in the lexicon because they involve strings of words. Examples are Gilbert and
Sullivan’s lot o’news/hypotenuse, Ira Gershwin’s embraceable you/irreplaceable you, and
Ogden Nash’s to twinkle so/I thinkle so. Moreover, although embraceable is a legal Eng­
lish word, it is probably an on-the-spot coinage; and thinkle is of course a distortion of
think made up for the sake of a humorous rhyme. So these words are not likely stored in
the lexicon (unless one has memorized the poem).

(9). For instance, none of the connectionists referred to here cite Marcus; neither do any
of the papers in a 1999 special issue of Cognitive Science entitled “Connectionist Models
of Human Language Processing: Progress and Prospects”; neither was he cited other
than by me at a 2006 Linguistic Society of America Symposium entitled “Linguistic Struc­
ture and Connectionist Models: How Good is the Fit?”

(10). In describing linguistic working memory as having three “departments,” I do not
wish to commit to whether or not they involve different neural mechanisms or different
brain localizations. The intended distinction is only that phonological working memory is
devoted to processing and constructing phonological structures, syntactic working memo­
ry to syntactic structures, and semantic working memory to semantic structures. This is
compatible with various theories of functional and neural realization. However, Hagoort
(2005) offers an interpretation of the three parallel departments of linguistic working
memory in terms of brain localization.

(11). Note that in sentence production, the dependency goes the other way: A speaker us­
es the semantic relations in the thought to be expressed to guide the arrangement of
words in syntactic structure.

(12). Spatial structure in working memory also has the potential for multiple drafts. Is
there a cat behind the bookcase or not? These hypotheses are represented as two differ­
ent spatial structures corresponding to the same visual input.

(13). Another strand of psycholinguistic research on coercion (e.g., Pylkkänen & McElree,
2006) also finds evidence of increased processing load with coerced sentences. However,
those authors’ experimental technique, self-paced reading, does not provide enough tem­
poral resolution to distinguish syntactic from semantic processing load. Paczynski, Jack­
endoff, and Kuperberg (in revision) find ERP effects connected with aspectual coercion.

Ray Jackendoff

Ray Jackendoff is Seth Merrin Professor of Philosophy and Co-Director of the Center
for Cognitive Studies at Tufts University. He was the 2003 recipient of the Jean Nicod
Prize in Cognitive Philosophy and has been President of both the Linguistic Society of
America and the Society for Philosophy and Psychology. His most recent books are
Foundations of Language (Oxford, 2002), Simpler Syntax (with Peter Culicover, Ox­
ford, 2005), Language, Consciousness, Culture (MIT Press, 2007), Meaning and the
Lexicon (Oxford, 2010), and A User’s Guide to Thought and Meaning (Oxford, 2011).

Epilogue to The Oxford Handbook of Cognitive Neuroscience—Cognitive Neuroscience: Where Are We Going?
Kevin N. Ochsner and Stephen Kosslyn
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn

Print Publication Date: Dec 2013 Subject: Psychology, Cognitive Neuroscience


Online Publication Date: Dec 2013 DOI: 10.1093/oxfordhb/9780199988693.013.0029

Abstract and Keywords

This epilogue looks at themes and trends that hint at future developments in cognitive
neuroscience. It first considers how affective neuroscience merged the study of neuro­
science and emotion, how social neuroscience merged the study of neuroscience and so­
cial behavior, and how social cognitive neuroscience merged the study of cognitive neuro­
science with social cognition. Another theme is how the levels of analysis of behavior/ex­
perience can be linked with psychological process and neural instantiation. Two topics
that have not yet been fully approached from a cognitive neuroscience perspective, but
seem ripe for near-term future progress, are the study of the development across the
lifespan of the various abilities described in the book, and the study of the functional or­
ganization of the frontal lobes and their contributions to behaviors (e.g., the ability to ex­
ert self-control). This epilogue also explores the multiple methods, both behavioral and
neuroscientific, used in cognitive neuroscience, new ways of modeling relationships be­
tween levels of analysis, and the question of how to make cognitive neuroscience relevant
to everyday life.

Keywords: cognitive neuroscience, emotion, social behavior, social cognition, functional organization, frontal
lobes, behaviors, methods, analysis, neural instantiation

Whether you have read the two-volume Handbook of Cognitive Neuroscience from cover
to cover or have just skimmed a chapter or two, we hope that you take away a sense of
the breadth and depth of work currently being conducted in the field. Since the naming of
the field in the backseat of a New York City taxicab some 35 years ago, the field and the
approach it embodies have become a dominant—if not the dominant—mode of scientific
inquiry in the study of human cognitive, emotional, and social functions.

But where will it go from here? Where will the next 5, 10, or even 20 years take the field
and its approach? Obviously, nobody can say for sure—but there are broad intellectual
themes and trends that run throughout this two-volume set, and a discussion of them can
be used as a springboard to thinking about possible directions future work might take.

Themes and Trends


Here we discuss themes and trends that hint at possible future developments, focusing on
those that may be more likely to occur in the relatively near term.

What’s in a Name?

It is said that imitation is the sincerest form of flattery. Given the proliferation of new ar­
eas of research with names that seemingly mimic cognitive neuroscience, the original has
reason to feel flattered.

Consider, for example, the development of three comparatively newer fields and the dates
of their naming: social neuroscience (Cacioppo, 1994), affective neuroscience (Panksepp,
1991), and social cognitive neuroscience (Ochsner & Lieberman, 2001). Although all
three fields are undoubtedly the (p. 600) products of unique combinations of influences
(see, e.g., Cacioppo, 2002; Ochsner, 2007; Panksepp, 1998), they each followed in the
footsteps of cognitive neuroscience. In cognitive neuroscience the study of cognitive abili­
ties and neuroscience were merged, and in the process of doing so, the field has made
considerable progress. In like fashion, affective neuroscience combined the study of emo­
tion with neuroscience; social neuroscience, the study of social behavior with neuro­
science; and social cognitive neuroscience, the study of social cognition with cognitive
neuroscience.

All three of these fields have adopted the same kind of multilevel, multimethod con­
straints and convergence approach embodied by cognitive neuroscience (as we discussed
in the Introduction to this Handbook). In addition, each of these fields draws from and
builds on, to differing degrees, the methods and models developed within what we can
now call “classic” cognitive neuroscience (see Vol. 1 of the Handbook). These new fields
are siblings in a family of fields that share similar, if not identical, research “DNA.”

It is for these reasons that Volume 2 of this Handbook has sections devoted to affect and
emotion and to self and social cognition. The topics of the constituent chapters in these
sections could easily appear in handbooks of affective or social or social cognitive neuro­
science (and in some cases, they already have, see, e.g., Cacioppo & Berntson, 2004;
Todorov et al., 2011). We included this material here because it represents the same core
approach that guides research on the classic cognitive topics in Volume 1 and in the lat­
ter half of Volume 2.

One might wonder whether these related disciplines are on trajectories for scientific and
popular impact similar to that of classic cognitive neuroscience. In the age of the Inter­
net, one way of quantifying impact is simply to count the number of Google hits returned
by a search for specific terms, in this case, “cognitive neuroscience,” “affective neuro­
science,” and so on. The results of an April 2012 Google search for field names are shown
in the tables at right. The top table compares cognitive neuroscience with two of its an­
tecedent fields: cognitive psychology (Neisser, 1967) and neuroscience. The bottom table
compares the descendants of classic cognitive neuroscience that were noted above. As
can be seen, cognitive psychology and neuroscience are the oldest fields and the ones
with the most online mentions. By comparison, their descendant, cognitive neuroscience,
which describes a narrower field than either of its ancestors, is doing quite well. And the
three newest fields of social, affective, and social cognitive neuroscience, each of which
describes fields even narrower than that of cognitive neuroscience, also are doing well,
with combined hit counts totaling about one-third that of cognitive neuroscience, in spite
of the fact that the youngest field is only about one-third of cognitive neuroscience’s age.
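
The comparison just described is simple arithmetic over hit counts. As a sketch, with placeholder numbers (the chapter's actual tables are not reproduced in this text-only version), the "about one-third" relationship amounts to:

```python
# Back-of-the-envelope comparison of field "impact" via search-hit
# counts. All counts are placeholders, not the chapter's actual data;
# only the ratio computation is the point.
hits = {
    "cognitive neuroscience":        3_000_000,  # hypothetical
    "social neuroscience":             500_000,  # hypothetical
    "affective neuroscience":          300_000,  # hypothetical
    "social cognitive neuroscience":   200_000,  # hypothetical
}

descendants = ["social neuroscience",
               "affective neuroscience",
               "social cognitive neuroscience"]
combined = sum(hits[d] for d in descendants)
ratio = combined / hits["cognitive neuroscience"]
print(f"combined descendants / cognitive neuroscience = {ratio:.2f}")
# With these placeholder numbers, the ratio is one-third.
```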

How Do We Link Levels of Analysis?

A theme running throughout the chapters concerns the different ways in which we can
link the levels of analysis of behavior/experience, psychological process, and neural in­
stantiation. Here, we focus on two broad issues that were addressed, explicitly or implic­
itly, by many of the authors of chapters in these volumes.

The first issue is the complexity of the behaviors that one is attempting to map onto un­
derlying processes and neural systems. For example, one might ask whether we should
try to map what might be called “molar” abilities, such as memory or attention, onto sets
of processes and neural systems, or instead whether we should try to map “molecular”
subtypes of memory and subtypes of attention onto their constituent processes and neur­
al systems. As alluded to in the Introduction, for most of the abilities described in Volume
1, it was clear as early as 20 years ago that a more molecular, subtype-based method of
mapping makes the most sense in the context of neuroscience data. The current state-of-
the-art in the study of perception, attention, memory, and language (reviewed in Volume 1 of
this Handbook) clearly bears this out. All the chapters in these sections describe careful
ways in which researchers have used combinations of behavioral and brain data to frac­
tionate the processes that give rise to specific subtypes of abilities.

This leads us to the second issue, which concerns the fact that for at least some of the
topics discussed (p. 601) in Volume 2, only recently has it become clear that more molecu­
lar mappings are possible. This is because for at least some of the Volume 2 topics, behav­
ioral research before the rise of the cognitive neuroscience approach had not developed
clearly articulated process models that specified explicitly how information is represent­
ed and processed to accomplish a particular task. This limitation was perhaps most evi­
dent for topics such as the self, some aspects of higher level social cognition such as men­
tal state inference, and some aspects of emotion, including how emotions are generated
and regulated. Twenty years ago, when functional neuroimaging burst on the scene, re­
searchers had proposed few if any process models of these molar phenomena. Hence, ini­
tial functional imaging and other types of neuroscience studies on these topics had more
of a “let’s induce an emotional state or evoke a behavior and see what happens” flavor,
and often they did not attempt to test specific theories. This is not to fault these re­
searchers; at the time, they did not have the advantage of decades of process-oriented be­
havioral research from cognitive psychology and vision research to help guide them (see,
e.g., Ochsner & Barrett, 2001; Ochsner & Gross, 2004). Instead, researchers had to devel­
op process models on the fly.

However, times have changed. As attested by the chapters in the first two sections of Vol­
ume 2, the incorporation of brain data into research on the self, social perception, and
emotion has been very useful in developing increasingly complex, “molecular” theories of
the relationships between the behavior/experience, psychological process, and neural in­
stantiation.

Just as the study of memory moved beyond single-system models and toward multiple-sys­
tem models (Schacter & Tulving, 1994), the study of the self, social cognition, and emo­
tion has begun to move beyond simplistic notions that single brain regions (such as the
medial prefrontal cortex or amygdala) are the seat of these abilities.

Looking Toward the Future


Without question, progress has been made. What might the current state of cognitive
neuroscience research augur for the future of cognitive neuroscience research? Here we
address this question in four ways.

New Topics

One of the ideas that recurs in the chapters of this Handbook is that the cognitive neuro­
science approach is a general-purpose scientific tool. This approach can be used to ask
and answer questions about any number of topics. Indeed, even within the broad scope of
this two-volume set, we have not covered every topic already being fruitfully addressed
using the cognitive neuroscience approach.

That said, of the many topics that have not yet been approached from a cognitive neuro­
science perspective, do any appear particularly promising? Four such topics seem ripe for
near-term future progress. These topics run the gamut from the study of specific brain
systems to the study of lifespan development and differences in group or social network
status, to forging links with the study of mental and physical health.

The first topic is the study of the functional organization of the frontal lobes and the con­
tributions they make to behaviors such as the ability to exert self-control. At first blush,
this might seem like a topic that already has received a great deal of attention. From one
perspective, it has. Over the past few decades numerous labs have studied the relation­
ship of the frontal lobes to behavior. From another perspective, however, not much
progress has been made. What is missing are coherent process models that link specific
behaviors to specific subregions of prefrontal cortex. Notably, some chapters in this
Handbook (e.g., those by Badre, Christoff, and Silvers et al.) attempt to do this within spe­
cific domains. But no general theory of prefrontal cortex has yet emerged that can link
the myriad behaviors in which it is involved to specific and well-described processes that
in turn are instantiated in specific portions of this evolutionarily newest portion of our
brain.

The second topic is the study of the development across the lifespan of the various abili­
ties described in the Handbook. Although some Handbook sections include chapters on
development and aging, many do not—precisely because the cognitive neuroscientific
study of lifespan changes in many abilities has only just begun. Clearly, the development
from childhood into adolescence of various cognitive, social, and affective abilities is cru­
cially important, as are the ways in which these abilities change as we move from middle
adulthood into older age (Casey et al., 2010; Charles & Carstensen, 2010; Mather, 2012).
The multilevel approach that characterizes the cognitive neuroscience approach holds
promise of deepening our understanding of such phenomena. Toward this end, it is impor­
tant to note that new journals devoted to some of these topics have (p. 602) appeared
(e.g., Developmental Cognitive Neuroscience, which was first published in 2010), and var­
ious institutes within the National Institutes of Health (NIH) have called for research on
these topics.

The third topic is the study of the way in which group-level variables impact the develop­
ment and operation of the various processing systems described in both Volumes of this
Handbook. Notably, this is an area of research that is not yet represented in the Hand­
book, although interest in connecting the study of group-level variables to the study of
the brain has been growing over the past few years. Consider, for example, emerging
research suggesting that growing up in different cultural groups can dictate whether and
how one engages perceptual, memory, and affective systems both when
reflecting on the self and in social settings (Chiao, 2009). There is also evidence that the
size of one’s social networks can impact the structure of brain systems involved in affect
and affiliation (Bickart et al., 2011), and that one’s status within these networks can
determine whether and when one recruits brain systems implicated in emotion and social
cognition (Chiao, 2010; Muscatell et al., 2011). Forging links be­
tween group-level variables and the behavior/experience, process, and brain levels that
are the focus in the current Handbook will prove challenging and may require new kinds
of collaborative relationships with other disciplines, such as sociology and anthropology.
As these collaborations grow to maturity, we predict this work will make its way into fu­
ture editions of the Handbook.

The fourth topic is the way in which specific brain systems play important roles in physi­
cal, as well as mental, health. The Handbook already includes chapters that illustrate how
cognitive neuroscience approaches are being fruitfully translated to understand the na­
ture of dysfunction, and potential treatments for it, in various kinds of psychiatric and
substance use disorders (see, e.g., Barch et al., 2009; Johnstone et al., 2007; Kober et al.,
2010; Ochsner, 2008 and the section below on Translation). This type of translational
work is sure to grow in the future. What the current Handbook is missing, however, is dis­
cussion of how brain systems are critically involved in physical health via their interac­
tions with the immune system. This burgeoning area of interest seeks to connect fields
such as health psychology with cognitive neuroscience and allied disciplines to under­
stand how variables like chronic stress or disease, or social connection vs. isolation, can
boost or diminish physical health. Such an effect would arise via interactions between the
immune system and brain systems involved in emotion, social cognition, and control
(Muscatell & Eisenberger, 2012; Eisenberger & Cole, 2012). This is another key area of
future growth that we expect to be represented in this Handbook in the future.

New Methods

How are we going to make progress on these questions and the countless others posed in
the chapters of the Handbook? On the one hand, the field will undoubtedly continue to
make good use of the multiple methods—both behavioral and neuroscientific—that have
been its bread and butter for the past decades. As noted in the Introduction, certain em­
pirical and conceptual advances were only made possible by technological advances,
which enabled us to measure activity with dramatically new levels of spatial and temporal
resolution. Positron emission tomography and, later, functional magnetic resonance
imaging (introduced 20–30 years ago) were game-changing advances.

On the other hand, these functional imaging techniques are still limited in terms of their
spatial and temporal resolution, and the areas of the brain they allow researchers to fo­
cus on reflect the contributions of many thousands of neurons. Other techniques, such as
magnetoencephalography and scalp electroencephalography, offer relatively good tempo­
ral resolution, but their spatial localization is relatively poor. Moreover, they are best suit­
ed to studying cortical rather than subcortical regions.

We could continue to beat the drum for the use of converging methods: What one tech­
nique can’t do, another can, and by triangulating across methods, better theories can be
built and evaluated. But for the next stage of game-changing methodological advances to
be realized, either current technologies will need to undergo a transformation that en­
ables them to combine spatial and temporal resolution in new ways or new techniques
that have better characteristics will need to be invented.

New Ways of Modeling Relationships Between Levels of Analysis

All this said, even the greatest of technological advances will not immediately be useful
unless our ability to conceptualize the cognitive and emotional processes that lie between
brain and behavior becomes more sophisticated.

At present, most theorizing in cognitive neuroscience makes use of commonsense terminology
for describing human abilities. We talk about memory, (p. 603) perception, emo­
tion, and so on. We break these molar abilities into more molecular parts and character­
ize them in terms of their automatic or controlled operation, whether the mental repre­
sentations are relational, and so on. Surely, however, the computations performed by spe­
cific brain regions did not evolve to instantiate our folk-psychological ideas about how
best to describe the processes underlying behavior.

One possible response to this concern is that the description of phenomena at multiple
levels of analysis allows us to sidestep this problem. One could argue that at the highest
level of description, it’s just fine to use folk-psychological terms to describe behavior and
experience. After all, our goal is to map these terms—which prove extremely useful for
everyday discourse about human behavior—onto precise descriptions of underlying neur­
al circuitry by reference to a set of information processing mechanisms.

Unfortunately, however, many researchers do not restrict intuitively understandable folk-
psychological terms to describing behavior and experience, but also use such terms to
describe information processing itself. In this case, process-level descriptions are not likely
to map in a direct way onto neural mechanisms.

Marr (1982) suggested a solution to this problem: Rely on the language of computation to
characterize information processing. The language of computation characterizes what
computers do, and this language often can be applied to describe what brains do. But
brains are demonstrably not digital computers, and thus it is not clear whether the tech­
nical vocabulary that evolved to characterize information processing in computers can in
fact always be appropriately applied to brains. Back in the 1980s, many researchers
hoped that connectionist models might provide an appropriate kind of computational
specificity. More recently, computational models from the reinforcement learning and
neuroeconomic literatures have been advanced as offering a new level of computational
specificity.
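
For readers unfamiliar with what such computational specificity looks like, the core of many reinforcement-learning models is a single delta-rule update of a value estimate by a reward-prediction error. The sketch below is illustrative only; the learning rate and reward schedule are arbitrary choices tied to no particular published model.

```python
# Minimal reward-prediction-error (delta-rule) sketch; alpha and the
# reward schedule are arbitrary illustrative choices, not drawn from
# any specific study.
alpha = 0.1   # learning rate
V = 0.0       # learned value estimate for a cue
for trial in range(100):
    r = 1.0              # reward delivered on every trial
    delta = r - V        # prediction error: outcome minus expectation
    V += alpha * delta   # nudge the estimate toward the outcome
print(round(V, 3))       # converges toward 1.0
```

The appeal for cognitive neuroscience is that every quantity here (the estimate, the error, the rate of updating) is precisely defined and can, in principle, be mapped onto a neural signal, in contrast to folk-psychological process labels.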

Although no existing approach has yet offered a computational language that is powerful
enough to describe more than thin slices of human information processing, we believe
that such a medium will become a key ingredient of the cognitive neuroscience approach
in the future.

Translation

In an era in which increasing numbers of researchers are applying for a static or shrink­
ing pool of grant funding, some have come to focus on the question of how to use cogni­
tive neuroscience to solve problems that arise in everyday life (and therefore address the
concerns of funding agencies, which often are pragmatic and applied).

Research is often divided into two categories (somewhat artificially): “Foundational” re­
search focuses on understanding phenomena for their own sake, whereas “translational” re­
search focuses on using such understanding to solve a real-world problem. Taking cogni­
tive neuroscience models of abilities based on studies of healthy populations and applying
them to understand and treat the bases of dysfunction in specific groups is one form of
translational research. This will surely be an area of great future growth.

Already, a number of areas of psychiatric and substance use research have adopted a two-
step translational research sequence (e.g., Barch et al., 2004, 2009; Carter et al., 2009;
Ochsner, 2008; Paxton et al., 2008). The first step involves building a model of normal be­
havior, typically in healthy adults, using the cognitive neuroscience approach. The second
step involves translating that model to a population of interest, and using the model to ex­
plain the underlying bases of the disorder or other deviation from the normal baseline—
and this would be a crucial step in eventually developing effective treatments. This popu­
lation could suffer from some type of clinically dysfunctional behavior, such as the four
psychiatric groups described in Part 4 of Volume 2 of the Handbook. It could be an ado­
lescent or older adult population, as described in a handful of chapters scattered across
sections of the Handbook. Or—as was not covered in the Handbook, but might be in the
future—it could be a vulnerable group for whom training in a specific type of cognitive,
affective, or social skill would improve the quality of life.

The possibilities abound—and it would behoove researchers in cognitive neuroscience to
capitalize on as many of them as possible. Not just for the pragmatic reason that they
may be more likely to be funded but, more importantly, for the principled reason that it
matters. It matters that we understand real-world, consequential behavior. Yes, we need
to start by studying the ability to learn a list of words in the lab, and we need to under­
stand the brain systems responsible for such relatively simple tasks. But then we need to
move toward understanding, for example, how these brain systems do or do not function
normally in a child growing up in an impoverished household compared with a child af­
forded every advantage (Noble et al., 2007).

Happily, there is evidence that federal funding agencies are beginning to (p. 604) understand
the importance of this two-step, foundational-to-translational research sequence. In
2011, the National Institute of Mental Health (NIMH) announced the Research Domain
Criteria (RDoC) framework as part of NIMH’s Strategic Plan to “Develop, for research
purposes, new ways of classifying mental disorders based upon dimensions of observable
behavior and neurobiological functioning” (http://www.nimh.nih.gov/about/strategic-plan­
ning-reports/index.shtml). In essence, the RDoC framework aims to replace the traditional
symptom-based means of describing abnormal behavior (which characterizes traditional
psychiatric diagnosis) with a means of describing the full range of normal to ab­
normal behavior in terms of fundamental underlying processes. The idea is that, over
time, researchers will seek to target and understand the nature of these processes, the
ways in which they can go awry, and the behavioral variability to which they can give rise
—as opposed to targeting traditionally defined clinical groups. For example, a researcher
could target processes for generating positive or negative affect, or their control, or the
ways in which interactions between affect and control processes break down to produce
anhedonia or a preponderance of negative affect—as opposed to focusing on a discretely
defined disorder such as major depression (e.g., Pizzagalli et al., 2009).

The two-step approach allows initial research to focus on understanding core processes—
considered in the context of different levels of analysis—but with an eye toward then un­
derstanding how variability in these processes gives rise to the full range of normal to ab­
normal behavior. Elucidating the fundamental nature of these cognitive and emotional
processes, and their relation to the behavioral/experiential level above and to the neural
level below, is the fundamental goal of cognitive neuroscience.

Concluding Comment
How do we measure the success of a field? By the number of important findings and in­
sights? By the number of scientists and practitioners working within it?

If we take that late-1970s taxicab ride, when the term cognitive neuroscience was first
used, as the inception point for the field, then by any and all of these metrics, cognitive
neuroscience has been enormously successful. Compared with physics, chemistry, medi­
cine, and biology, however—or even compared with psychology and neuroscience—cogni­
tive neuroscience is just beginning to hit its stride. This is to be expected, given that it
has existed only for a very short period of time. Indeed, the day for cognitive neuro­
science is still young.

This is good news. Even though cognitive neuroscience is entering its mid-30s, compared
with these other broad disciplines that were established hundreds of years ago, this isn’t
even middle age. The hope, then, is that the field can continue to blossom and grow from
its adolescence to full maturity—and make good on the promising returns it has produced
so far.

References
Barch, D. M., Braver, T. S., Carter, C. S., Poldrack, R. A., & Robbins, T. W. (2009). CN­
TRICS Final task selection: Executive control. Schizophrenia Bulletin, 35, 115–135.

Barch, D. M., Mitropoulou, V., Harvey, P. D., New, A. S., Silverman, J. M., & Siever, L. J.
(2004). Context-processing deficits in schizotypal personality disorder. Journal of Abnor­
mal Psychology, 113, 556–568.

Bickart, K. C., Wright, C. I., Dautoff, R. J., Dickerson, B. C., & Barrett, L. F. (2011). Amyg­
dala volume and social network size in humans. Nature Neuroscience, 14, 163–164.

Cacioppo, J. T. (1994). Social neuroscience: Autonomic, neuroendocrine, and immune responses to stress. Psychophysiology, 31, 113–128.

Cacioppo, J. T. (2002). Social neuroscience: Understanding the pieces fosters understanding the whole and vice versa. American Psychologist, 57, 819–831.

Cacioppo, J. T., & Berntson, G. G. (Eds.). (2004). Social neuroscience: Key readings. New York: Psychology Press.

Carter, C. S., Barch, D. M., Gur, R., Pinkham, A., & Ochsner, K. (2009). CNTRICS final task selection: Social cognitive and affective neuroscience-based measures. Schizophrenia Bulletin, 35, 153–162.

Casey, B. J., Jones, R. M., Levita, L., Libby, V., Pattwell, S. S., et al. (2010). The storm and
stress of adolescence: Insights from human imaging and mouse genetics. Developmental
Psychobiology, 52, 225–235.

Charles, S. T., & Carstensen, L. L. (2010). Social and emotional aging. Annual Review of Psychology, 61, 383–409.

Chiao, J. Y. (2009). Cultural neuroscience: A once and future discipline. Progress in Brain Research, 178, 287–304.

Chiao, J. Y. (2010). Neural basis of social status hierarchy across species. Current Opinion
in Neurobiology, 20, 803–809.

Eisenberger, N. I., & Cole, S. W. (2012). Social neuroscience and health: Neurophysiological mechanisms linking social ties with physical health. Nature Neuroscience, 15, 669–674.

Johnstone, T., van Reekum, C. M., Urry, H. L., Kalin, N. H., & Davidson, R. J. (2007). Failure to regulate: Counterproductive recruitment of top-down prefrontal-subcortical circuitry in major depression. Journal of Neuroscience, 27, 8877–8884.

Kober, H., Mende-Siedlecki, P., Kross, E. F., Weber, J., Mischel, W., et al. (2010). Prefrontal-striatal pathway underlies cognitive regulation of craving. Proceedings of the National Academy of Sciences of the United States of America, 107, 14811–14816.

Mather, M. (2012). The emotion paradox in the aging brain. Annals of the New York Academy of Sciences, 1251, 33–49.

(p. 605) Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. San Francisco: W. H. Freeman.

Muscatell, K. A., & Eisenberger, N. I. (2012). A social neuroscience perspective on stress and health. Social and Personality Psychology Compass, 6, 890–904.

Muscatell, K. A., Morelli, S. A., Falk, E. B., Way, B. M., Pfeifer, J. H., et al. (2012). Social status modulates neural activity in the mentalizing network. NeuroImage, 60, 1771–1777.

Neisser, U. (1967). Cognitive psychology. New York: Appleton-Century-Crofts.

Noble, K. G., McCandliss, B. D., & Farah, M. J. (2007). Socioeconomic gradients predict
individual differences in neurocognitive abilities. Developmental Science, 10, 464–480.

Ochsner, K. (2007). Social cognitive neuroscience: Historical development, core principles, and future promise. In A. Kruglanski & E. T. Higgins (Eds.), Social Psychology: A Handbook of Basic Principles (pp. 39–66). New York: Guilford Press.

Ochsner, K. N. (2008). The social-emotional processing stream: Five core constructs and
their translational potential for schizophrenia and beyond. Biological Psychiatry, 64, 48–
61.

Ochsner, K. N., & Barrett, L. F. (2001). A multiprocess perspective on the neuroscience of emotion. In T. J. Mayne & G. A. Bonanno (Eds.), Emotions: Current Issues and Future Directions (pp. 38–81). New York: Guilford Press.

Ochsner, K. N., & Gross, J. J. (2004). Thinking makes it so: A social cognitive neuroscience
approach to emotion regulation. In R. F. Baumeister & K. D. Vohs (Eds.), Handbook of
Self-regulation: Research, Theory, and Applications (pp. 229–255). New York: Guilford
Press.

Ochsner, K. N., & Lieberman, M. D. (2001). The emergence of social cognitive neuroscience. American Psychologist, 56, 717–734.

Panksepp, J. (1991). Affective neuroscience: A conceptual framework for the neurobiological study of emotions. International Review of Studies on Emotion, 1, 59–99.

Panksepp, J. (1998). Affective neuroscience: The foundations of human and animal emo­
tions. New York: Oxford University Press.

Paxton, J. L., Barch, D. M., Racine, C. A., & Braver, T. S. (2008). Cognitive control, goal
maintenance, and prefrontal function in healthy aging. Cerebral Cortex, 18, 1010–1028.

Pizzagalli, D. A., Holmes, A. J., Dillon, D. G., Goetz, E. L., Birk, J. L., Bogdan, R., et al. (2009). Reduced caudate and nucleus accumbens response to rewards in unmedicated individuals with major depressive disorder. American Journal of Psychiatry, 166, 702–710.

Schacter, D. L., & Tulving, E. (Eds.). (1994). Memory systems 1994. Cambridge, MA: MIT Press.
Todorov, A. B., Fiske, S. T., & Prentice, D. A. (2011). Social neuroscience: Toward understanding the underpinnings of the social mind. New York: Oxford University Press.

Kevin N. Ochsner

Kevin N. Ochsner is a professor in the Department of Psychology at Columbia University in New York, NY.

Stephen Kosslyn

Stephen M. Kosslyn, Center for Advanced Study in the Behavioral Sciences, Stanford University, Stanford, CA


Index  
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn

Print Publication Date: Dec 2013 Subject: Psychology Online Publication Date: Dec 2013

(p. 606) (p. 607) Index


A
abstract knowledge, 365–366
acetylcholine, 301t
orienting network, 302
achromatopsia, 80
acoustic frequencies, 145
acoustic properties
facial movements and vocal acoustics, 529–531
linking articulatory movements to vocal, 525–526
speech perception, 509–512
acoustic shadowing effect, 150
action
attention and, 255–256, 267–268
attention for, 261
divergence from attention, 261–263
imagery, 123–124
music, 122–124
object-associated, 567–568
performance, 123
sensorimotor coordination, 122–123
sensorimotor modality, 367
action-blindsight, neuropsychology, 322
action monitoring, attention, 299
activate-predict-confirm perception cycle, 62–63
adaptive coding, auditory system, 149–150
affective neuroscience, 599, 600
aging
binding deficit hypothesis and medial temporal lobe (MTL) function, 468
cognitive theories of, 457
compensation-related utilization of neural circuits hypothesis (CRUNCH), 459, 462, 470
double dissociation between medial temporal lobe (MTL) regions, 466f
episodic memory encoding, 459–461, 464
episodic memory retrieval, 461–463, 464–465
Page 1 of 46
Index

functional neuroimaging of cognitive, 465–467
future directions, 469–470
healthy vs. pathological, 468–469
hemispheric asymmetry reduction in older adults (HAROLD), 458–463, 470
medial temporal lobes, 463–465
prefrontal cortex (PFC) activity during episodic memory retrieval, 462f
resource deficit hypothesis and PFC function, 467
sustained and transient memory effects, 461f
working memory, 458–459, 463
working memory and episodic memory, 456
working memory and prefrontal cortex (PFC) for younger and older adults, 459f
agnosia, 80, 530
auditory, 201
visual, 198–200, 204, 278, 279f
visual, of patient D. F., 199
agnosia for letters, 200
agraphia, 497
alexia without, 495
AIDS, 473
akinetopsia, 198
alerting network, attention, 300–301
alexia, 80
alexia without agraphia, 495
algorithm level, cognitive neuroscience, 2
allocentric neglect, 40
Allport, D. A., 356, 568
sensorimotor model of semantic memory, 362
Alzheimer’s dementia, 203, 456, 562
amnesia, 474
amnesic patient, seizures, 4, 438
amnesic syndrome, 474
amplitude compression, sound, 140
amplitude modulation, sound measurement, 145–146
amusia, 127, 201
amygdala, 90, 93, 95f, 601, 602
audition, 200
emotion, 125
gustation, 202, 203
reconsolidation, 450
sensory, 304
sleep, 446
social perception, 203
visual experience, 566
anatomical, 61
anatomy
attention networks, 301t
music and brain, 126–127
angular gyrus (AG), 36, 128f, 491

language, 173
neglect syndrome, 331
reading and spelling, 494, 499–500
speech perception, 509
speech processing system, 508f
anorexia, olfactory perception, 102
anterior cingulate cortex (ACC)
executive attention, 301t, 302, 304
learning, 419, 421
retrieval, 384
anterior olfactory nucleus, 93, 95f
anterior piriform cortex, 95f, 96
anterograde amnesia, 376, 439, 443, 474
aphasia, 368n.3, 390
verbal working memory, 402–403
apraxia, 327, 567, 568
object-related actions, 361
Aristotle, 417
articulation
linking to vocal acoustics, 525–526
speech perception, 514–515
artificial neural networks, spatial judgments, 41
asomatognosia, 202
Association for Chemoreception Science, 97
associative phase, stage model, 417–418
astereopsis, 197
Atlas of Odor Character Profiles, Dravnieks, 97
attack, timbre, 114
attention, 255–256, 267–268. See also auditory attention; spatial attention
cognitive neuroscience, 313
constructs of, and marker tasks, 297–300
defining as verb, 256–257
deployment, in time, 230, 231f
(p. 608) development of, networks, 302–311, 312–313

divergence from action, 261–263


efficiency, 311
as executive control, 299
experience, 312
genes, 311–312
introduction to visual, 257–259
musical processing, 119
neuroanatomy of, 300–302
optimizing development, 312–313
parietal cells, 35
perceptual load theory of, 545n.11
Posner’s model of, 299–300
relationship to action, 255–256, 267–268
remapping, 344–345

selective, 225–228
as selectivity, 297–299
as state, 297
sustained, 224–225
theories of visual, 259–263
tuning, 258–259
attentional deficit
neglect and optic ataxia, 337, 339–341
oculomotor deficits, 343–344
optic ataxia, 335–337
attentional disorders
Bálint–Holmes syndrome, 326–328
dorsal and ventral visual streams, 320, 322–323
optic ataxia, 335–341
psychic paralysis of gaze, 341–344
saliency maps of dorsal and ventral visual streams, 321f
simultanagnosia, 342–343
unilateral neglect, 328–334
attentional landscape, reach-to-grasp movement, 276
attentional orienting, inhibition, 298–299
attention-for-action theory, 262–263
Attention in Early Development, Ruff and Rothbart, 302
attention networks
childhood, 306–311
developmental time course of, 311f
development of, 302–311
infancy and toddlerhood, 302–306
attention network task (ANT), 300, 306–307
attention window, spatial relation, 41–42
attentive listening, rhythm, 118
audiovisual phonological fusion, 531
audiovisual speech perception
extraction of visual cues in, 538–540
McGurk illusion, 531–532, 535, 537, 538
neural correlates of integration, 540–542
perception in noise, 529f
spatial constraints, 538
temporal constraints, 534–538
temporal window of integration, 536f, 537
audition, 135–136, 200–201. See also auditory system
amusia, 127, 201
auditory agnosia, 201
auditory scene analysis, 156–162
auditory system, 136, 137f
challenges, 136, 163–164
frequency selectivity and cochlea, 136–137, 139–140
future directions, 162–164
inference problems of, 163

interaction with other senses, 163
interface with cognition, 162
perceptual disorders, 201
peripheral auditory system, 138f
sound measurements, 136–142
sound source perception, 150–156
auditory attention. See also attention
brain areas, 229, 230f
concurrent sound segregation, 218–219
conjunction of pitch and location, 221f
deployment of, 228–230
divided attention, 228
enhancement and suppression mechanism, 221–222
environment, 216–220
figure-ground segregation, 221–222
future directions, 230–232
intermodal, 227–228
intramodal selective attention, 226–227
mechanisms of, 220–223
neural network of, 223–230
object-based, 217–218
selective attention, 225–228
sequential sound organization, 219–220
sharpening tuning curve, 222–223
sustained attention, 224–225
varieties of, 215–216
auditory brainstem response (ABR), evoked electrical potential, 149–150
auditory cues, speech production, 528f
auditory filters, pitch, 154f
auditory imagery, music, 124
auditory nerve, neural coding in, 140–142
auditory objects, 216
auditory oddball paradigm, 176, 177f
auditory scene analysis, 156–162
cocktail party problem, 157f
filling in, 159–160
sequential grouping, 158–159
sound segregation and acoustic grouping cues, 156–158
streaming, 159
auditory speech perception, confusion trees for, 530f
auditory speech signal, properties and perception of, 526–527
auditory streams, 216
auditory system
adaptive coding and plasticity, 149–150
anatomy of auditory cortex, 144f
auditory scene analysis, 156–162
brain basis of sound segregation, 160–161
feedback to cochlea, 142–143

filling in, 159–160
functional organization, 143–145
future directions, 162–164
phase locking, 141–142
reverberation, 161–162
schematic, 137f
separating sound sources from environment, 161–162
sequential grouping, 158–159
sound segregation and acoustic grouping cues, 156–158
sound source perception, 150–156
sound transduction, 136, 137f
streaming, 159
structure of peripheral, 138f
subcortical pathways, 142
tonotopy, 143
auditory task, congenitally blind and sighted participants, 564f
autobiographical memory, temporal gradient of, 439–440
autonomous phase, stage model, 417–418
B
Bálint, Rezső, 326
Bálint–Holmes syndrome, 199, 326–328, 344
Bálint’s syndrome, 34, 36, 46, 200, 320f, 327, 334
bandwidth, loudness, 155
basal ganglia
attention, 227
vision, 289
rhythm, 118
selection, 332
singing, 122
skill learning, 425
working memory, 408
Bayesian models, skill learning, 418–419
behavioral models, skill learning, 417–419
behaviorism, mental imagery, 74–75
Bell, Alexander Graham, 525f
biased competition, visual attention, 261
bilateral brain responses, speech and singing, 174–175
bilateral paracentral scotoma, 196
bilateral parieto-occipital lesions, 46
bilateral postchiasmatic brain injury, 196
binaural cues, sound localization, 150, 151
binding, 458
binding deficit hypothesis
cognitive theory of aging, 457, 458, 464
medial temporal lobe (MTL) function, 468
binocular vision, grasping, 275
bipolar receptor neurons, humans, 92
birds, brain organization, 41

blindsight, residual visual abilities, 283
(p. 609) blood oxygen level-dependent signal (BOLD)

attention, 230f
audiovisual speech, 543
auditory sentence comprehension, 188f
category-specific patterns in healthy brain, 559f
congenitally blind and sighted participants, 564f
episodic memory, 379
intramodal selective attention, 226
measuring early brain activity, 173
mental imagery, 75, 85
responses across neural populations, 19–20
spatial working memory, 399f, 400
sustained attention, 224
body perception disorders, 202
BOLD. See blood oxygen level-dependent signal (BOLD)
bottom-up inputs, 60, 62, 67f
brain
activation in motor areas, 84
audiovisual speech integration, 540–542
auditory attention, 229, 230f
auditory sentence comprehension, 187, 188f
basis of sound segregation, 160–161
bilateral responses for speech and singing, 174–175
category specificity, 571–572
cognitive neuropsychology, 554–555
combining sensory information, 524–525
division of labor, 31–32, 41, 50
functional magnetic resonance imaging (fMRI) measuring activation, 12
functional specialization of visual, 194–195
mapping, 3
measuring activity in early development, 172–173
musical functions, 127–129
object-, face- and place-selective cortex, 12, 13f
olfactory information, 96
semantic memory, 358
topographic maps of, 29–31
brain injury
bilateral postchiasmatic, 196
memory disorders, 473–474
mental imagery and perception, 79–80
musical functions, 128
perceptual disorders, 205
sensorimotor processing, 361
traumatic, 473, 478, 481
visual disorders, 195
Broadbent, Donald, 297
Broca’s aphasia, verbal working memory, 402–403

Broca’s area, 111, 127
auditory sentence comprehension, 188f
infant brain, 173
reading and writing, 497–499
syntactic rules, 187
Brodmann areas, 36, 80
reading and spelling, 494f
visual cortex for areas 17 and 18, 76
buildup, 160
C
capacity, working memory, 400–402
capacity limitation theory, visual attention, 259, 260–261
carbon monoxide poisoning, visual agnosia, 278
categorical, spatial relations, 41–45
category-specific semantic deficits
anatomy of category specificity, 563–565
BOLD response in healthy brain, 559f
congenitally blind and sighted participants, 564f
connectivity as innate domain-specific constraint, 567
correlated structure principle, 561–563
distributed domain-specific hypothesis, 565–567
domain-specific hypothesis, 556, 557, 561
embodied cognition hypothesis, 569
explanations of causes of, 556–558
functional imaging, 563–565
knowledge of tool use, 569f
lesion analyses, 563
multiple semantics assumption, 558–559
object-associated actions, 567–568
phenomenon, 555–556
picture naming performance, 555f
relation between impairments and, 557f
relation between sensory, motor and conceptual knowledge, 568–570
representation of conceptual content, 570–571
role of visual experience, 566–567
second-generation sensory/functional theories, 559–561
sensory/functional theory, 556, 557
toward a synthesis, 571–572
cellular consolidation, 441–443. See also consolidation
long-term potentiation (LTP), 441–443
slow-wave sleep and, 444–445
central executive, working memory, 392, 407, 475
central scotoma, 196
central vision, 322, 323f
centroid, 114
cerebellum, rhythm activity, 118
cerebral achromatopsia, 196–197
cerebral akinetopsia, 198

cerebral dyschromatopsia, 196
cerebral hemiachromatopsia, 197
change blindness, 322
phenomenon, 298
visual attention, 257–258, 260
chemosignaling
accessory olfactory system, 90
human tears, 104f
social interaction, 104–105
Cherry, Colin, 297
childhood
alerting, 307–308
attention network development, 306–311
attention network task (ANT), 306f
orienting, 308–309
selectivity, 308–309
Children’s Television Workshop, 579, 580
Chinese language, speech perception, 512–513
chroma of pitch, 113f, 115
cilia, olfactory receptors, 92
closed-loop, skill learning, 418
closure positive shift (CPS), 175, 183
cochlea
cochlear amplification, 140
feedback to, 142–143
frequency selectivity, 136–137, 139f, 139–140
cocktail party problem, 156, 157f
cognitive maps, spatial representation, 48–49
cognitive neuropsychology, 2
cognitive neuroscience, 1, 2, 600, 601, 604
advances for semantic memory, 366–367
attention, 313
audiovisual speech integration, 542–544
constraints and convergence, 3–5
locative prepositions, 44
looking to the future, 601–604
modeling relationships, 602–603
multiple levels of analysis, 2
neural underpinning, 75
skill learning, 416–417, 419–420
themes and trends, 599–601
translation, 603–604
use of multiple methods, 2–3
visual brain, 32
cognitive phase, stage model, 417–418
cognitive psychology, 600
cognitive subtraction, 396
cognitivist, 61

color imagery, visual, 79–80
color vision, 196–197
communication. See speech perception
comodulation, 158
compensation-related utilization of neural circuits hypothesis (CRUNCH), aging, 459, 462, 470
competition for representation, visual attention, 260–261
computational analysis, 2
computational neuroscience, fast and slow learning, 424–425
conduction aphasia, verbal working memory, 403, 406f
(p. 610) conscious incompetence, skill learning, 417, 418f

consolidation, 436, 451


cellular, 441–443
early views, 436–438
hippocampal activity vs. neocortical activity, 441
modern views of, 438–443
reconsolidation, 450–451
resistance to interference, 436–438
role of brain rhythms in encoding and, states of hippocampus, 447–448
role of sleep in creative problem-solving, 450
sleep and, 443–450
sleep-related, of nondeclarative memory, 448–450
systems, 438–441
temporal gradient of episodic memory, 439–440
temporal gradient of semantic memory, 439
temporal gradients involving shorter time scale, 440–441
testing effect, 451
constraints, cognitive neuroscience, 3–5
context frames, visual processing, 65–66
contextual associations, visual processing, 65–66
contingent negative variation (CNV), 231f, 301, 307, 308
continuous performance task (CPT), 307–308
contrast sensitivity
spatial, 196
spatial attention, 247–248
convergence, cognitive neuroscience, 3–5
convergence zone, 357
coordinate, spatial relations, 41–45
correlated feature-based account, semantic memory, 357
cortical activation, auditory sentence comprehension, 187, 188f
cortical networks, spatial attention, 245
cortical neural pattern (CNP), 377
cortical reactivation, episodic memory, 376f, 382–384
covert attention
eye-movement task, 266–267
shifts of attention without eye displacements, 323
covert orienting
infancy, 303
spatial attention, 240–242, 251, 263–264

Cowan’s K, 401
creative problem-solving, role of sleep in, 450
Critique of Pure Reason, Kant, 50, 528
culture, pleasantness ratings by, 99, 100f
D
declarative memory, 353, 443
long-term, 475–478
sleep-related consolidation of, 444–448
delayed response task, working memory, 394–395, 396f
dementia, 203, 481
depth perception, 197
diabetes mellitus, 203
diagonal illusion, 285
Diamond, Adele, 304
difference in memory (DM) paradigm, 379
diffusion tensor imaging (DTI), 428, 494
direction, gravity, 37–38
disconnected edges, response, 15f
distributed, 23
division of labor
analog vs. digital spatial relations, 45
brain, 31–32
brain organization, 41
visual processing, 277–278
domain-general binding, 380
domain-specific category-based models, semantic memory, 356
domain-specific encoding, 380
domain-specific hypothesis
category-specific semantic deficits, 556, 557, 561
distributed, 565–567, 572
dopamine
alerting network, 300, 301t
executive attention, 301t, 302
dorsal frontoparietal network
orientation, 308
spatial attention, 245–246, 250
dorsal premotor cortex (PMC), 118, 123, 397, 423
dorsal simultanagnosia, 199–200
dorsal system
object map, 46
object recognition, 45–48
spatial information, 43
dorsal visual stream
active vision, 323, 326
episodic retrieval, 383
interaction with ventral stream, 289–290
landmark test characterizing, 321f
object identification, 32

perception and vision with, damage, 279–280
peripheral vision, 322
saliency map, 321f
shape sensitivity, 33–34
spatial mental imagery, 81–82
vision, 194–195
visual processing, 277f
dorsolateral prefrontal cortex (DLPFC), 128f
absolute pitch, 121
attention, 267
divided attention, 228
executive attention, 302
sensorimotor coordination, 122
skill learning, 419, 422, 425
working memory, 119
double-pointing hypothesis, visual-motor system, 275, 286–288
double-step saccade task, 323, 325f
Drosophila, 92
dual-channel hypothesis, visual-motor control, 274–275, 276
dual-task paradigm, saccade location and attention, 265
Dutch, word stress, 178
dynamicist, 61
dysexecutive syndrome, 475
dysgeusia, 203
dyslexia, 196
E
early left anterior negativity (ELAN)
syntactic phrase structure, 175
syntactic rules, 187
early right anterior negativity (ERAN)
harmonic expectancy violation, 116
music and language, 121
eating, olfaction influence, 102–103
eating disorders, olfactory perception, 102–103
Ebbinghaus, Hermann, 1
Ebbinghaus illusion, 284, 285, 286
echoic memory, 484n.1
edges, response, 15f
efference copy, 68
efficient coding hypothesis, auditory system, 149
egocentric spatial, frame of reference, 40
electroencephalography (EEG)
audiovisual speech, 541
cortical reactivation, 385
executive attention, 304
measuring early brain activity, 172–173
phonetic categories of speech, 514
REM sleep, 444

sensorimotor coordination, 122
electrophysiology
language processing, 175
neurons in monkey IT cortex, 19
object and position sensitivity, 16
object recognition, 11–12
prosodic processing, 183
word segmentation studies, 178
emotion, 2
brain regions, 128f
memory impairment, 483–484
music and, 124–126
encephalitis, 473, 481
encoding
episodic memory, 375
episodic memory mechanisms, 378
episodic memory with aging, 459–461, 464
functional imaging, 378–379
stage of remembering, 474
encoding specificity principle, episodic memory, 377
(p. 611) encyclopedic knowledge, 355

English, word stress, 177–179


enhancement and suppression mechanism, auditory attention, 221–222
entorhinal cortex, 48, 49, 93, 95f, 448, 469
envelope, amplitude capture, 145–146
environment, separating sound sources from, 161–162
environmental agnosia, 200
epilepsy, 203, 473
epiphenomenal, 363, 369n.6
episodic buffer, working memory, 392, 393–394, 475
episodic memory, 353, 375–376, 475–476
aging process, 456
coding and hemispheric asymmetry reduction in older adults (HAROLD), 459–461
cortex reactivating during retrieval, 382–384
early insights from patient work, 376–377
encoding using functional imaging, 378–379
functional neuroimaging as tool, 377–378
hippocampal activity during encoding, 379–381
hippocampal reactivation mediating cortical reactivation, 384–385
hippocampus activating during retrieval of, 381–382
medial temporal lobes (MTL) and encoding, 464
memory as reinstatement (MAR) model, 376f, 377
MTL and retrieval, 464–465
musical processes, 120–121
relationship to semantic memory, 354, 477
retrieval and hemispheric asymmetry reduction in older adults (HAROLD), 461–463
temporal gradient of, 439–440
error-dependent learning, 421

error detection and correction, skill learning, 420–425
errorless learning, 481–482
error-minimization mechanisms, 63
error-related negativity (ERN), 299, 310, 311, 421–422
event-related potentials (ERPs)
attention, 298, 309
audiovisual speech, 541, 542–543
auditory attention, 219f
auditory scene, 216
infant N400, 182
language development, 176f
language perception, 175
learning, 421
measuring early brain activity, 172
musical training, 116
music and language, 121
N400 effect, 181, 182
phonetic learning, 518, 519
phonotactics, 179–180
prosodic processing, 183, 184f
sentence-level semantics, 184–185, 185f
spatial attention, 247
syllable discrimination, 177f
syntactic rules, 186f
word meaning, 180–182, 182f
word stress, 178, 179f
evolution, brain organization, 41
excitation pattern, pitch, 154f
executive attention
attention networks, 301t
childhood, 309–311
network in infants, 304–305
executive control
attention as, 299
top-down, for visual-motor system, 289
executive function, 457
Exner’s area, reading and writing, 494, 500
experimental psychology, resistance to new learning, 437
explicit memory, skill learning, 425–427
extinction, dissociation between neglect and, 328–331
extrastriate body area (EBA), 13
eye movements
behavioral evidence linking spatial attention to, 264–266
overt attention, 323
parietal lobes, 35
and spatial attention, 242–244, 251
visual attention and, 256
visual tracking task, 288

F
face agnosia, 200
face-selective cortex, fMRI signals, 13f
facial movements, combination with vocal acoustics, 529–531
familiarity, 458
far transfer, working memory, 402
fault tolerance, 367
fear-conditioning task, sleep, 447
feature-based attention, 258–259, 261–262
feature integration theory, 46
feedback connections, visual world, 61–62
feedforward connections, visual world, 61–62
figure-ground segregation, auditory attention, 221–222
filling in, sound segregation, 159–160
filtering, separating sound sources from environment, 161–162
Fitts, Paul, 417
flanker task, 300, 310
Fowler, Carol, 525
frames of reference
object-centered or word-centered, 38–39
spatial representation, 36–38
French
syntactic structure, 186–187
word stress, 177–179, 179f
frequency selectivity, cochlea, 136–137, 139f, 139–140
frontal eye field (FEF), 36, 226, 230f
attention, 334
orienting network, 301
remapping, 325f
spatial attention, 244
spatial attention and eye movements, 266, 267
spatial working memory, 398–399
frontal lobes, 601
functional imaging, encoding episodic memory, 378–379
(p. 612) functional magnetic resonance imaging (fMRI)

abstract knowledge, 366


attention, 217
attentional orienting, 298
audiovisual speech, 542
auditory imagery, 124
category specificity, 563–565
dissociations with aging, 457
episodic memory, 379
grasping control, 281–282
intramodal selective attention, 226–227
lexical competition, 516
measuring brain activation, 12
measuring early brain activity, 172–173

mental imagery, 75–76
neural bases of object recognition, 11, 12
neural dispositions of language in infant brain, 173, 174f
neuroimaging of patient’s dorsal stream, 281f
neuroimaging of patient’s ventral stream, 280f
piriform activity in humans, 96
pitch and melody, 115
reading and spelling, 494, 498f
retinotopic mapping, 31
semantic memory, 439
sentence comprehension, 187
signals of object-, face- and placeselective cortex, 12, 13f
slow-wave sleep, 446
spatial attention, 245
spatial mental imagery, 81
speech articulation, 511
tonal dynamics, 117
verbal working memory, 404–405
visually guided reaching, 282
visual mental imagery, 76–77
visual-spatial attention and eye movements, 266–267
visual-spatial working memory, 397–398
functional magnetic resonance imaging–adaptation (fMRI–A), 14, 361
neural representations, 17
responses of putative voxel, 20, 21f
rotation sensitivity, 20f
functional near-infrared spectroscopy (fNIRS)
infant brain, 173
measuring early brain activity, 172–173
functional neuroimaging
cognitive aging, 465–467
medial temporal lobe (MTL), 468
prefrontal cortex (PFC), 467
reading words and nonwords, 495, 500
tool for episodic memory, 377–378
working memory, 395–400
fusiform body area (FBA), 13, 22–23
fusiform face area (FFA), 13
domain-specific hypothesis, 566
sparsely distributed representation of, 22–23
visual mental imagery, 78–79
fusiform gyrus, 12, 13, 399, 491, 494–497
G
Gabor patches, 76
gap effect, 238
Gault, Robert, 533
gaze, psychic paralysis of, 341–344
gaze ataxia, 327

gaze-contingent display, perception, 264, 264f
Gazzaniga, Michael, 1
genes, attention networks, 301t, 311–312
geometry, cognitive maps, 48–49
German, word stress, 177–179, 179f
Gerstmann’s syndrome, 322
gestalt, 34, 199
gestaltist, 61
Glasgow Outcome Scale, 478
global-to-local integrated model, visual processing, 63–64
glomeruli
olfactory bulb, 93
patterns for rat, activation, 94f
goal-driven attentional shifts, 258
Gottfried, Jay, 96
graceful degradation, 367
grammar
constraint-based principles of, 581–582
mainstream generative, (MGG), 579
no strict lexicon vs., 582–584
grapheme-to-phoneme conversion, 492, 500
graphemic buffer, 493
grasping
binocular vision, 275
illusory displays, 285–286, 287f
objects, 274, 283
reach-to-grasp movements, 274–276
shape and orientation of goal object, 276
studies, 275
visual-motor system, 285–286, 288
gravity, sense of direction, 37–38
grip aperture
grip scaling, 288
optic ataxia vs. visual agnosia, 279f
reach-to-grasp movements, 274f
sensitivity to illusions, 285–286
Weber’s law, 286, 288
grounding by interaction, 570–571
grouping cues
sequential grouping, 158–159
sound segregation, 156–158
group therapy, memory impairment, 483–484
Grueneberg organ, 89, 90
gustation, 202
gustatory perceptual disorders, 203
H
Haberly, Lew, 95–96
Handbook

linking analysis, 600–601
looking to future, 601–604
new methods, 602
new topics, 601–602
new ways of modeling relationships, 602–603
overview of, 5–6, 600
themes and trends, 599–601
translation, 603–604
haptic devices, 534
harmony, music, 112
head-related transfer function (HRTF), sound localization, 151–152
hearing. See also auditory system
frequency selectivity and cochlea, 136–137, 139f, 139–140
research on pitch, 153
hearing science, future directions, 162–164
Hebb, Donald, 297
Hebbian plasticity, 424, 481
hemianopia, 80, 195, 196, 204
hemianopic dyslexia, 196
hemifield constraints, attention and action, 263
hemispheric asymmetry reduction in older adults (HAROLD), prefrontal cortex (PFC), 458–463,
470
Heschl’s gyrus (HG), 115, 117, 126–127, 128f
acoustic properties of speech, 510
attention, 226
auditory sentence comprehension, 188f
episodic retrieval, 383
language, 173
phonetic learning, 518–519
speech perception, 509, 510, 512
high spatial frequency (HSF) information, ventral visual pathway, 67f
hippocampal cells, cognitive maps, 48–49
hippocampal neuro pattern (HNP), 377
hippocampal pattern completion, 376f, 377
hippocampus, 21, 32, 36, 44, 48, 49, 95f
activity during encoding, 379–381
activity during retrieval, 381–382
activity vs. neocortical activity, 441
aging and condition interaction, 463f
consolidation, 438, 440–445
longitudinal changes, 457f
memory, 120, 124, 376
reactivation, 384–385
role of brain rhythms in encoding and consolidation, 447–448
skill learning, 425
homonymous visual field defect, 195
homophony, 527
homunculus, 31
honeybees, spatial information, 29
horizontal-vertical illusion, 285
hub, 357
human behavior, olfactory system, 101–105
human cerebral cortex, two streams of visual processing, 277f
human cognitive psychology, verbal and working memory, 402
human leukocyte antigen (HLA), mating, 103
human olfaction. See also mammalian olfactory system
primacy of, 88, 89f
schematic of system, 89f
human parietal cortex, spatial information, 49
humans
chemosignaling, 104f
microsmatic, 88
motor imagery, 82–83
olfactory cortex, 95f
olfactory system, 90f
schematic of nasal cavity, 91f
human ventral stream. See also ventral visual stream
functional organization of, 12–14
nature of functional organization, 21–23
Huntington’s disease, 203, 427
hypogeusia, 203
I
iconic memory, 484n.1
image rotation, motor imagery, 83
imagery
brain regions, 128f
debate, 4–5
mental imagery and, 74–76
music, 123–124
image scanning paradigm, landmarks, 81
immediate memory, 475
implementation analysis, 2
implicit memory, skill learning, 425–427
improvised performance, music, 123
inattentional blindness, 260
independent-components account, 493
(p. 613) Infant Behavior Questionnaire, 305
infant brain, neural dispositions of language, 173–175
infants, attention network development, 302–306
inferior colliculus, 137f, 146, 148f, 149, 152, 227
inferior frontal gyrus (IFG)
audiovisual speech, 540
lexical competition, 516–517
reading and writing, 497–499
speech processing system, 508f
inferior parietal lobe (IPL), 319
neglect, 40, 334
polysensory areas, 36
inferior temporal lobe, 46, 78, 322, 358, 399
inferotemporal (IT) cortex, responses to shapes and objects, 11
information flow, speech processing, 517–518
information processing, 3
inhibition, attentional orienting, 298–299
inhibition of return (IOR), 299
inner hair cells, 136
integrative agnosia, 199
intelligibility, visual contributions to speech, 528–529
interaural level differences (ILDs), sound, 150–151
interaural time differences (ITDs), sound, 150–151
interference
attention and action, 262–263
consolidation and resistance to, 436–438
intermodal auditory selective attention, 227–228
internal model framework, action generation, 68
intonational phrase boundaries (IPBs), 182–183
intramodal auditory selective attention, 226–227
intraparietal sulcus (IPS), 13f, 31, 81, 559f
attention, 224, 230f, 319
audiovisual speech, 540
auditory attention, 224–225
neglect syndrome, 331
singing, 122, 128f
spatial attention, 245
visual-spatial attention, 266, 278, 281
working memory, 399, 401f
invariance, 18
invariance problem, speech perception, 513–514
invariant object recognition, neural bases of, 15–16
inverse retinotopy, 76
irrelevant sound effect, verbal working memory, 404
J
Jacobson’s organ, 89–90
James, William, 296, 375, 417
Jost’s law of forgetting, 438
K
Kahneman, Daniel, 297
Katz, Larry, 97
Keller, Helen, 525f
key, music, 112, 113f
key profile, music, 113
Khan, Rehan, 97
Korsakoff’s syndrome, 473
L
landmarks, image scanning, 81
language. See also reading and writing
information processing, 603
parallels between music and, 121–122
perception of lexical tone, 512–513
working memory maintenance, 405f
language acquisition, 171–172
auditory sentence comprehension, 187, 188f
developmental stages, 176f
from sounds to sentences, 182–187
from sounds to words, 175–182
neural dispositions of language in infant brain, 173–175
phoneme characteristics, 176–177
phonological familiarity, 180
phonotactics, 179–180
sentence-level prosody, 182–183
sentence-level semantics, 184–185
syntactic rules, 185–187
word meaning, 180–182
word stress, 177–179
language processing, 507–509. See also Parallel Architecture; speech perception
Parallel Architecture, 578–579
working memory, 584–586
lateral geniculate nucleus (LGN), spatial attention, 244
lateral inferior-temporal multimodality area (LIMA), 496
lateral intraparietal (LIP) area, 321f
attention, 319–320
visual remapping, 325f
lateralization, spatial representations, 40–45
lateral occipital complex (LOC), 12, 13f
cue-invariant responses in, 14–15
object and position information in, 16–18
position and category effects, 17, 18f
selective responses to objects, 14f
viewpoint sensitivity across, 18–21
visual mental imagery, 78
lateral olivocochlear efferents, feedback to cochlea, 142
lateral superior olive (LSO), sound localization, 151
law of prior entry, spatial attention, 249–250
learning, memory-impaired people, 481–483
left hemisphere
categorical perception after damage, 43
digital spatial relations, 41, 42
lesions in, 42–43
object recognition after damage, 47
lesion analyses. See also brain injury
category-specific deficits, 563
object recognition, 11–12
letter-by-letter reading, 495
letter discrimination, optic ataxia, 340f
lexical competition, speech perception, 516–517
lexical-semantic information, sentence-level, 184–185, 185f
lexical tone, speech perception, 512–513
lexicon
after working memory, 587f
fragment of, 587f
phonological activation, 586–588
speech processing system, 508f
life cycle, memory, 391
linguistic theories, 578–579
linguistic working memory, 586f, 593n.10
lip reading, 527, 530f
listening, separating sound from environment, 161–162
localization, sound source, 150–152
location-based attention, 259
locative prepositions, spatial relations, 44–45
Locke, John, 368n.3
long-term memory, 475
long-term potentiation (LTP), cellular consolidation, 441–443
long-term store (LTS), working memory, 390–391
loudness
constancy phenomena, 161
deviation tones, 229
sound, 155–156
love spots, 3
M
McCarthy, Rosaleen, 555
McGurk illusion, 537, 538, 542, 543
audiovisual speech integration, 531–532, 535
schematic showing, 531f
magnetic misreaching, 335
magnetic resonance imaging (MRI)
diffusion-weighted, 499f
perfusion-weighted image, 498f, 499f
working memory, 422
magnetoencephalography (MEG)
acoustic properties of speech, 510
audiovisual speech, 541
cortical reactivation, 385
measuring early brain activity, 172–173
reading and spelling, 494
mainstream generative grammar (MGG), Parallel Architecture, 579
maintenance, working memory, 391, 405f
(p. 614) major histocompatibility complex (MHC), mating, 103
mammalian olfactory system
eating, 102–103
human olfactory cortex, 95f
looking at human behavior through the nose, 101–105
looking at nose through human behavior, 97–101
mating, 103–104
mouse and human, 90f
multiple sensing mechanisms, 89–90
neuroanatomy of, 88–96
olfactory bulb for odorant discrimination, 92–93
olfactory epithelium, 91–92
olfactory perceptual space, 97, 98f
physicochemical space to perceptual space, 99f
piriform cortex, 94–96
pleasantness across cultures, 99, 100f
pleasantness identification, 97, 98f, 99–101, 100f
primary olfactory cortex, 93–94
schematic of human olfactory system, 89f
sniffing, 90–91, 91f
social interaction, 104–105
marker tasks, attention, 297, 301t
mating, olfaction influence, 103–104
medial olivocochlear efferents, feedback to cochlea, 142
medial prefrontal cortex (MPFC), 128f, 601
contextual processing, 66, 67f
learning, 426
memory, 441
tonal dynamics, 117, 120, 123
medial superior olive (MSO), sound localization, 151
medial temporal lobe (MTL)
aging and double dissociation between regions, 466f
binding deficit hypothesis and MTL function, 468
consolidation, 438, 439–440
dysfunction in healthy and pathological aging, 468–469
episodic memory, 456
episodic memory encoding, 464
episodic memory retrieval, 464–465
hippocampal activity during encoding, 379–380
surgical removal, 376–377
working memory, 463
melody
brain regions, 128f
music, 112
tonality, 115
memory, 2. See also episodic memory; semantic memory; working memory
aids in loss compensation, 480–481
assessment of, functioning, 479–480
audition, 162–163
auditory, 217
brain regions, 128f
music and, 119–121
navigational, 48–49
Plato, 74
recovery of functioning, 478–479
rehabilitation, 480–484
stages of remembering, 474
systems, 474–478
memory as reinstatement (MAR) model, episodic memory, 376f, 377, 378, 381
memory disorders, 473–474, 484
anterograde amnesia, 474
assessment of memory functioning, 479–480
compensating with memory aids, 480–481
declarative long–term memory, 475–478
emotional consequences, 483–484
episodic, 475–476
memory systems, 474–478
modifying the environment, 483
new learning, 481–483
non-declarative long-term memory, 478
priming, 478
procedural memory, 478
prospective memory, 477–478
recovery of memory functioning, 478–479
rehabilitation of memory, 480–484
relationship between semantic and episodic memory, 477
retrograde amnesia, 474
semantic memory, 476–477
short-term and working memory, 474–475
stages of remembering, 474
understanding, 474–484
memory-guided saccade (MGS), spatial working memory, 399f
memory impairment, 474
memory systems debate, 4
memory trace
consolidation, 437
skill learning, 418
menstrual synchrony, olfaction influence, 103
mental exertion, 437
mental imagery, 74, 84–85
brain-damaged patients, 80–81
dorsal stream and spatial, 81–82
early studies of, 74–76
imagery debate, 74–76
visual, and early visual areas, 76–78
visual, and higher visual areas, 78–82
mental mimicry, 224
mental rotation tasks, strategies in, 83–84
mental rotation paradigm
humans, 82–83
objects, 75
mesial temporal epilepsy, 203
meter, music, 112, 117–118
middle temporal gyrus, speech processing system, 508f, 509
Mikrokosmos, Bartók, 124
Milan square’s neglect experiment, 49
Miller, George, 1
mismatch negativity (MMN), 178
audiovisual speech, 541, 545n.10
chords, 116
language development, 176f
oddball paradigm, 543
phonetic learning, 518
speech sounds, 176
timbre, 118–119
mismatch paradigm, 176
mismatch response (MMR), 176
missing fundamental illusion, 152
modulation frequencies, 145
modulation tuning, sound measurement, 146–148
Molaison, Henry (H.M.), 4, 376–377, 438, 478
monkeys
navigational memory, 49
neurophysiological studies, 267
perception of space and object, 33
motion perception, 198
motor imagery, 82–84
functional role of area M1, 84
music, 124
physical movement, 82–83
strategies in mental rotation tasks, 83–84
motor systems, speech perception, 514–515
mouse, olfactory system, 90f
moving window paradigm, perception, 264, 264f
Müller, Georg, 436
Müller–Lyer illusion, 66, 285
multimodal speech perception. See speech perception, multimodal
multiple sclerosis, 473
gustatory disorders, 203
olfactory perception, 203
multiple semantics assumption, 558–559
multivoxel pattern analysis (MVPA)
episodic retrieval, 383
semantic memory, 366, 368
sensitivity to position, 17, 18
music
absolute pitch, 121
action, 122–124
amusia, 127, 201
anatomy, plasticity and development, 126–127
attention, 119
auditory imagery, 124
auditory perception, 200
brain’s functions, 127–129
building blocks of, 112–115
(p. 615) detecting wrong notes and wrong chords, 115–117
disorders, 127
emotion, 124–126
episodic memory, 120–121
imagery, 123–124
improvised performance, 123
memory, 119–121
motor imagery, 124
parallels between music and language, 121–122
perception and cognition, 115–121
performance, 123
pitch and melody, 115
psychology and neuroscience, 111–112
rhythm and meter, 117–118
score-based performance, 123
semantics, 121–122
sensorimotor coordination, 122–123
singing, 122–123
syntax, 121
tapping, 122
timbre, 114–115, 118–119
time, 112
tonal dynamics, 117
tonality, 112–114, 115–117
tonal relationships, 113f
working memory, 120
working memory maintenance, 405f
N
National Institute of Mental Health (NIMH), 604
navigation, spatial information, 48–49
n-back task, sustained working memory, 224, 225f
negative priming, phenomenon, 299
neglect. See also unilateral neglect
reference frames, 39f
space, 38–40
neglect dyslexia, 40
Neapolitan sixth, 116
neural networks
invariant object recognition, 15–16
spatial attention, 244–246
neural replay, slow-wave sleep, 445–446
neuroanatomy
attention, 300–302
mammalian olfactory system, 88–96
neuroimaging
cognitive aging, 465–467
cortical 3D processing, 36
musical processes, 120
navigational tasks, 49
reach-to-grasp actions, 34–35
shape-selective actions, 33
NeuroPage, memory aid, 480–481
neurophysiology
monkeys, 267
visual-spatial attention and eye movements, 266
neuroscience, 111–112, 600
neurotransmitters, attention networks, 301t
noise-vocoded speech, 146, 147f
nondeclarative memory, 443, 448–450, 478
noradrenaline, alerting network, 301t
norepinephrine, 300, 301
O
obesity, olfactory perception, 102
object-centered, frame of reference, 38–39
object form topography, 22
object map, 46
object recognition
category-specific modules, 22
cue-invariant responses in lateral occipital complex (LOC), 14–15
distributed object form topography, 22
electrophysiology measurements, 11–12
functional magnetic resonance imaging (fMRI), 11, 12
functional organization of human ventral stream, 12–14
future directions, 23–24
lesions of dorsal system, 46
neural bases of invariant, 15–16
neural sensitivity to object view, 20
open questions, 23–24
orbitofrontal cortex (OFC), 64
process maps, 22
representations of faces and body parts, 22–23
responses to shape, edges and surfaces across ventral stream, 15f
theories of, 18, 20–21
variant neuron, 20–21
object recognition task, unilateral posterior lesions, 47f
objects. See also category-specific semantic deficits
ambiguous figures, 68–69
perceptual and conceptual processing of, 360–362
spatial attention, 239–240
spatial information within the, 45–48
visual attention, 258
visual working memory, 399–400
object-selective cortex, fMRI signals, 13f
obstacle avoidance, visual-motor networks, 283
oculomotor deficits, 343–344
oculomotor readiness hypothesis (OMRH), spatial attention, 242–244
oddball paradigm, attention, 224
odorants
discrimination at olfactory bulb, 92–93
pleasantness, 97, 98f, 99–101
sensing mechanisms, 89–90
sniffing, 90–91
transduction at olfactory epithelium, 91–92
odor coding, olfactory bulb, 93
odor space, 97
olfaction. See also human olfaction; mammalian olfactory system
rating pleasantness, 99–101
receptor events in, 92f
olfactory bulb, 95f
odorant discrimination, 92–93
spatial coding, 94f
olfactory cortex, primary, 93–94
olfactory epithelium, odorant transduction, 91–92
olfactory perception, disorders, 202–203
olfactory perceptual space, 97, 98f
olfactory receptors, humans, 92
olfactory tubercle, 95f
On the Motion of Animals, Aristotle, 417
open-loop control, skill learning, 418
optical topography (OT), early brain activity, 172
optic aphasia, 558
optic ataxia, 278, 279f, 327, 328
central cue vs. peripheral cue, 340f
dissociation between attentional deficits in neglect and, 337, 339–341
errors for saccade and reach, 338f
field effect and hand effect, 336f
neuropsychology, 322
Posner paradigm, 339f
reaching errors, 335–337
orbitofrontal cortex (OFC), visual information, 64, 67f
organized unitary content hypothesis (OUCH), 561
organ of Corti, peripheral auditory system, 136, 138f
orienting network
attention, 301–302
childhood, 308–309
infants, 303–304, 305
outer hair cells, 137
overt attention, eye movements, 323
overt orienting, spatial attention, 240–242, 251
P
parahippocampal cortex (PHC), 49, 66, 67f, 379, 465, 468, 565
parahippocampal gyrus, music, 125
parahippocampal place area (PPA), 13, 78–79, 383, 566
Parallel Architecture, 579–581, 581f, 592–593
activated lexical items, 587f
constraint-based principles of grammar, 581–582
example, 586–590
fragment of lexicon, 587f
goals of theory, 578–579
language processing, 578–579
lexicon after working memory, 587f
mainstream generative grammar (MGG), 579
no strict lexicon vs. grammar distinction, 582–584
noun phrases (NPs), 579
phonology, 579–580
processing, 584–586
(p. 616) semantics as independent generative component, 580–581
semantic structure without syntax or phonology, 591–592
syntactic integration, 588
visually guided parsing, 590–591
working memory after semantic integration, 588f, 589f
parietal cortex, rats, 49
parietal lesions, 40, 48
parietal lobe
mapping, 35–36
position, 35
speech perception, 509
unilateral lesions, 46
parietal neglect, spatial working memory, 326f
Parkinson’s disease, 203, 473
perceptual, 61
perceptual disorders, 193–194, 203–205
audition, 200–201
auditory agnosia, 201
body perception disorders, 202
color vision, 196–197
future research, 205
gustatory, 203
olfactory, 202–203
olfactory and gustatory perception, 202–203
social perception, 203
somatosensory perception, 201–202
spatial contrast sensitivity, 196
spatial vision, 197–198
vision, 194–200
visual acuity, 196
visual adaptation, 196
visual agnosia, 198–200
visual field, 195–196
visual identification and recognition, 198–200
visual motion perception, 198
perceptual odor space, 97
perceptual trace, skill learning, 418
performance, music, 123
periodicity, pitch, 153f
peripheral vision, dorsal stream, 322
phase locking
sound frequency, 141, 141f
upper limit of, 141–142
phonemes. See also language acquisition
auditory speech signal, 526–527
characteristics, 176–177
restoration, 160
speech sounds, 175
visual speech signal, 527
phonological loop
verbal working memory, 403–404
working memory, 392–393, 407, 475
phonology, Parallel Architecture, 579–580
phonotactics, language learning, 179–180
phosphene threshold, 77
physical movements, motor imagery, 82–83
picture-matching task, 568
picture-naming performance, category-specific semantic deficits, 555f
picture viewing, 564f
picture-word priming paradigm, phonotactics, 180, 181
Pilzecker, Alfons, 436
piriform cortex, 93, 202
olfactory object formation, 94–96
understanding, 95–96
pitch
absolute, 121
resolvability, 154f
sound source, 152–155
tonality, 115
place cells, 48, 49f
place models, pitch, 153
place-selective cortex, 13f
plasticity
auditory system, 149–150
music, 126–127
phonetic learning in adults, 518–519
Plato, 74
pleasantness
music, 125
odorants, 97, 98f, 99–101
Ponzo illusion, 286, 287f
positron emission tomography (PET)
auditory attention, 223
changes in regional cerebral blood flow, 360
mental imagery, 75–76
reading and spelling, 494
semantic memory retrieval, 359
spatial attention, 245
visual information, 567
visual mental imagery, 76–77
visual-spatial working memory, 396–398
Posner, Michael, 297, 417
model of attention, 299–300
Posner cueing paradigm, 259, 298, 308, 339f
posterior parietal cortex (PPC), 34–36, 49, 565
category specificity, 563
mental imagery, 81, 82, 84
optic ataxia, 336f
somatosensory perception, 201, 205
spatial attention, 245
unilateral neglect, 333–334
vision for action, 277–278, 281–283, 289
visual attention, 319, 323, 327
posterior piriform cortex, 95f, 96
precedence effect, 161
prediction, relevance in visual recognition, 62–65
prefrontal cortex (PFC)
activity in young and older adults, 460f
episodic memory, 456–457
episodic memory encoding, 459–461
episodic memory retrieval, 461–463
hemispheric asymmetry reduction in older adults (HAROLD), 458–463
longitudinal changes, 457f
resource deficit hypothesis, 467
sustained and transient memory effects by aging, 461f
top-down control of working memory, 408–410
verbal and spatial working memory for older and younger adults, 459f
visual-spatial working memory, 397–398
working memory, 394–395, 396f, 458–459
premotor cortex (PMC), 36, 37, 123, 128f
attention, 230f
body perception disorders, 202
mental imagery, 83
rhythm, 118
semantic memory, 361, 364
sensory-functional theory, 559f
skill learning, 423
speech perception, 541
visual control of action, 283, 289
working memory, 397, 409
premotor theory
attention, 344
spatial attention, 242–244
pre-supplementary motor area (pSMA)
rhythm discrimination, 118
sensorimotor coordination, 122
primacy effect, 390
primary memory, 475
primary olfactory cortex, structure and function, 93–94
primary somatosensory cortex, 31
priming, memory, 478
principal components analysis (PCA), odor space, 97
procedural memory, 478
process maps, object recognition, 22
progressive semantic dementia, 477
prosody, sentence-level, 182–183
prosopagnosia, 80, 200
prospective memory, 477–478
psychic paralysis of gaze, 341–344
Balint’s description, 341
oculomotor deficits, 343–344
simultanagnosia, 342–343
psychophysics, 273, 275
pulvinar, spatial attention, 238
pulvinar nucleus, spatial attention, 244
pure alexia, 200, 495, 497
putative voxels, functional magnetic resonance imaging (fMRI) responses, 20, 21f
Q
qualitative spatial relations, 44
quantitative propositions, 44
R
(p. 617) radical visual capture, 199
random stimuli, response, 15f
rapid eye movement (REM) sleep, sleep and consolidation, 443–444
rapid serial visual presentation (RSVP), spatial attention, 250
rats
hippocampal lesions in, 438
maze navigation, 48–49
spatial coding of olfactory bulb, 94f
reach-to-grasp actions
bilateral optic ataxia, 338f
dual-channel hypothesis, 274–275
neuroimaging, 34–35
visual control of, 274–276
reaction time, spatial attention, 238–239
reactivation, working memory, 407–408
reading
cognitive architecture, 492–493
mechanisms, 493
representational model, 492f
visual field, 195–196
reading and writing, 491–492, 500–501
angular gyrus (BA 39), 499–500
Brodmann areas, 494f
Broca’s area (BA 44/45), 497–499
cognitive architecture of, 492–493
Exner’s area (BA 6), 500
functional magnetic resonance imaging (fMRI), 498f
fusiform gyrus (BA 37), 494–497
inferior frontal gyrus (IFG), 497–499
neural correlates of, 493–500
superior temporal gyrus (BA 22), 500
supramarginal gyrus (BA 40), 499
recency effect, 390
recognition, listener, 163
recognition by components (RBC) model, 16, 19
recollection, 458
reconsolidation, 450–451
recovery
mechanisms of memory, 479
memory functioning, 478–479
reference frame, 37. See also frames of reference
reflexive control, spatial attention, 240–242
regional cerebral blood flow, positron emission tomography (PET), 360, 395
region of interest (ROI), episodic retrieval, 382–384
remapping, attention, 344–345
Remote Associations Test, 450
repetitive transcranial magnetic stimulation (rTMS), 77, 228, 422
representation of conceptual content, 570–571
Research Domain Criteria (RDoC), 604
resilience, 562
resource deficit hypothesis
cognitive theory of aging, 457, 458, 461
episodic memory retrieval, 461
prefrontal cortex (PFC) function, 467
retina, image formation, 29–31
retinotopic mapping, stimuli and response times, 77f
retrieval
episodic memory, 375, 377
episodic memory with aging, 461–463, 464–465
hippocampus activating during episodic, 381–382
semantic memory, 358–360
spaced, 481
stage of remembering, 474
retrograde amnesia, 438, 474
retrograde facilitation, 443
retrosplenial complex (RSC), 66, 67f
reverberation, sound, 161–162
rhinal cortex, longitudinal changes, 457f
rhythm, music, 112, 117–118
right anterior-temporal negativity (RATN), 116
right hemisphere
analog spatial relation, 41, 42
coordinate space perception after damage, 43
neglect, 38
object recognition after damage, 47
speed of response to stimuli, 43
right orbitofrontal cortex, music, 125
right temporal lobe (RTL)
agnosia, 201
episodic memory, 120, 476
music, 115, 118, 120, 127, 128
rodents, macrosmatic, 88
S
saccadic eye movements
bilateral optic ataxia, 338f
central vision, 323f
overt attention, 323
spatial attention, 264–266, 265f
visual perception, 324f
visual tracking task, 288
saliency maps, dorsal and ventral visual streams, 321f
score-based performance, music, 123
seizures, amnesic patient, 4, 438
selectivity
action and attention, 256–257
attention as, 237, 297–299
childhood, 308–309
mechanisms, 256
self-regulation, late infancy, 304
semantic, 368n.1
semantic categorization task, 181
semantic dementia, 354–355, 562
semantic integration, 588–590, 592
semantic judgment task, 181
semantic knowledge, 355, 359–360
semantic memory, 353, 355, 367–368, 475, 476–477
abstract knowledge, 365–366
acquisition, 354
advances in neuroscience methods, 366–367
biasing representations, 363–364
brain regions, 358
cognitive perspectives, 356
correlated feature-based accounts, 357
differences in sensory experience, 364–365
domain-specific category-based models, 356
future directions, 367–368
individual differences, 364–365
interactions between category and task, 359–360
models, 356–358
neural regions underlying semantic knowledge, 362–363
neural systems supporting, 358–363
organization, 355–358
perceptual and conceptual processing of objects, 360–362
relationship to episodic memory, 354, 477
semantic dementia, 354–355
semantic space, 367, 368
sensorimotor-based theory, 356–357
sensory-functional theory, 356–357
stimulus influence on retrieval, 358–359
task influence on retrieval, 359
temporal gradient of, 439
semantic relevance, 562
semantics
music and language, 121–122
Parallel Architecture, 580–581
sentence-level, 184–185
semantic violation paradigm, 184
semitones, music, 112
sensorimotor adaptation
explicit and implicit processes, 426–427
skill acquisition, 417
sensorimotor-based theory, semantic memory, 356–357, 364–365
sensorimotor contingency, 204
sensorimotor coordination, music, 122–123
sensory-functional theory
category-specific semantic deficits, 556, 557, 559–561
semantic memory, 356–357
sensory memory, 475
sentence-level prosody, 182–183, 184f
sentence-level semantics, 184–185
septal organ, 89, 90
sequence learning
explicit and implicit processes, 426
models of, 422–423
motor, paradigms, 423
skill acquisition, 417
working memory capacity and, 423–424
(p. 618) Sesame Street, Children’s Television Workshop, 579, 580
Shallice, Tim, 555
shape
geometrical entity, 45–46
hole, 15f
information in dorsal system, 33–34
shared-components account, 493
short-term memory, 390, 474–475
short-term store (STS), working memory, 390–391
sight, 277
sign language, double dissociation, 44
Simpler Syntax, 584
simultanagnosia, 46, 199, 335, 342–343, 344
simultaneous agnosia, 199
singing, music, 122–123
single-photon emission computer tomography (SPECT), 78
size-contrast illusions
Ebbinghaus illusion, 284, 285, 286
Ponzo illusion, 286, 287f
vision for perception and action, 284–288
skeletal image, 46
skill learning, 416
Bayesian models of, 418–419
behavioral models of, 417–419
closed-loop and open-loop control, 418
cognitive neuroscience, 416–417, 419–420
error detection and correction, 420–425
explicit and implicit memory systems, 425–427
fast and slow learning, 424–425
Fitts’ and Posner’s stage model, 417–418
future directions, 428–429
late learning processes, 424
practical implications for, 427–428
questions for future, 429
role of working memory, 422–424
sleep
consolidation, 443–450
rapid eye movement (REM), 443–444
role in creative problem–solving, 450
slow-wave, and cellular consolidation, 444–445
slow-wave, and systems consolidation, 445–447
slow-wave sleep
cellular consolidation, 444–445
systems consolidation, 445–447
smart houses, 483
smell, 202. See also human olfaction; mammalian olfactory system
sniffing
mechanism for odorant sampling, 90–91
visualization of human sniff airflow, 91f
social cognitive neuroscience, 599
social interaction, olfactory influence, 104–105
social neuroscience, 599, 600
social perception, disorders in, 203
somatosensory perception, 201–202
sound measurements
amplitude compression, 140
amplitude modulation and envelope, 145–146
auditory system, 137f
frequency selectivity and cochlea, 136–137, 139f, 139–140
mapping sounds to words, 515–518
modulation, 145–148
modulation tuning, 146–148
neural coding in auditory nerve, 140–142
peripheral auditory system, 136–142
structure of peripheral auditory system, 138f
sound segregation
acoustic grouping cues, 156–158
auditory attention, 218–219, 219f
brain basis of, 160–161
separating from environment, 161–162
sound source perception. See also auditory system
localization, 150–152
loudness, 155–156
pitch, 152–155
space, 28
cognitive maps, 48–49
models of attention, 238–239
neglecting, 38–40
spaced retrieval, 481
sparsely distributed representations, faces and body parts, 22–23
spatial attention, 250–251. See also attention
behavioral evidence linking, to eye movements, 264–266
control of, 240–242
cortical networks of, 245
covert visual-, 263–264
dorsal and ventral frontoparietal networks, 245–246
early visual perception, 247–250
effect on contrast sensitivity, 247–248
effects on spatial sensitivity, 248
effects on temporal sensitivity, 248–250
eye movements and, 242–244, 263–266
functional magnetic resonance imaging (fMRI) in humans, 266–267
law of prior entry, 249–250
neural sources of, 244–246
neurophysiological effects of, 246–247
object-based models, 239–240
oculomotor readiness hypothesis (OMRH), 242–244
parietal cells, 35
premotor theory, 242–244
reflexive and voluntary control, 240–242
space-based models, 238–239
subcortical networks of, 244
zoom lens theory, 238–239
spatial coding, olfactory bulb, 94f
spatial contrast sensitivity, 196
spatial information
coordinate and categorical relations, 42f
shape recognition, 46
within object, 45–48
spatial memory task, schematic, 397f
spatial representations, 28–29, 50
actions, 34–36
brain’s topographic maps, 29–31
central vision, 323f
cognition by humans, 28–29
cognitive maps, 48–49
distinction between analog and digital, 42
lateralization of, 40–45
neglecting space, 38–40
spatial frames of reference, 36–38
spatial information within the object, 45–48
visual perception, 324f
what vs. where in visual system, 31–34
“where,” “how,” or “which” systems, 34–36
spatial sensitivity, spatial attention, 248
spatial vision, 197–198
spatial working memory, younger and older adults, 459f
spectrogram, 145
noise-vocoded speech, 147f
speech utterance, 147f
spectrotemporal receptive fields (STRFs), modulation tuning, 146–148
spectrum, pitch, 154f
speech. See also language acquisition
left hemisphere dominance for processing, 174
speech reading, 527
tactile contributions to, 532–534
visual contributions to, intelligibility, 528–529
working memory maintenance, 405f
speech perception, 507–509
acoustic properties, 509–512
articulation in stop consonants, 511
articulatory and motor influences on, 514–515
functional architecture of auditory system for, 508f
future directions, 519
invariance for phonetic categories, 513–514
lexical competition, 516–517
lexical tone, 512–513
mapping of sounds to words, 515–518
nature of information flow, 517–518
neural plasticity, 518–519
(p. 619) phonetic category learning in adults, 518–519
spectral properties, 511
temporal properties, 510–511
voice onset time (VOT), 511–512, 515
voicing in stop consonants, 510–511
vowels, 511–512
speech perception, multimodal, 544–545
audiovisual, 534–538, 545n.8
audiovisual speech integration, 531–532, 542–544
auditory perception of speech, 530f
auditory speech signal, 526–527
cross-talk between senses, 524–525
facial movements and vocal acoustics, 529–531
linking articulatory movements and vocal acoustics, 525–526
lip reading, 530f
McGurk effect, 531–532
simultaneous communication by modalities, 525f
tactile aids, 533–534
tactile contributions, 532–534
visual contributions to speech intelligibility, 528–529
visual speech signal, 527–528
spelling. See also reading and writing
cognitive architecture of, 492–493
mechanisms, 493
representational model, 492f
split-brain patient, 45
spotlight, spatial attention, 238
stage model, skill acquisition, 417–418
state, attention as, 297
stimulus-driven, attentional shifts, 258
storage, stage of remembering, 474
streaming, sound segregation, 159
stroke, 481
gustatory disorders, 203
object identification, 568
reading or spelling, 494
spelling, 497, 498f
visual field disorders, 195
Stroop task, 299, 302
subcallosal cingulate, 125
subcortical networks
auditory system, 142
neglect, 40
spatial attention, 244
suicide, 473
Sullivan, Anne, 525f
superior colliculus, 30, 36
attention, 322, 323
audiovisual speech, 540
orienting network, 301
spatial attention, 238, 242, 244, 267
visual processing, 277f, 283
superior parietal lobe
agent operating in space, 36
eye movement, 35
limb movements, 37
superior parietal occipital cortex (SPOC), 282
superior temporal cortex, 36, 40, 187, 226, 327, 404–406, 512
superior temporal gyrus (STG), 40, 144f, 187
auditory attention, 227
auditory cortex, 144f
language, 173, 174f
memory, 225f, 402
music, 115, 128f
semantic memory, 365
speech perception, 509
speech processing system, 508f
unilateral neglect, 333
written language, 491, 494, 497, 500
superior temporal sulcus (STS), 144f, 566
auditory brain, 200
audiovisual speech, 540
language, 173
semantic memory, 366
speech perception, 509
working memory, 404, 405f
supplementary motor area (SMA)
rhythm discrimination, 118
sensorimotor coordination, 122
suppression, 160
supramarginal gyrus (SMG)
reading and spelling, 494, 499
speech perception, 509
speech processing system, 508f
surfaces, response, 15f
syntactic phrase structure
emerging ability to process, 175
language, 185–187
syntactic violation paradigm, 186, 187
syntax, music and language, 121
systems consolidation, 438–441. See also consolidation
slow-wave sleep and, 445–447
temporal gradient of autobiographical memory, 439–440
temporal gradient of semantic memory, 439
T
tactile contributions
aids, 533–534
speech, 532–534
Tadoma method
individual communicating by, 533f
sensory qualities, 534
tapping, music, 122
taste, qualities, 202
tears, chemosignaling, 104f
tele-assistance model, metaphor for ventral-dorsal stream interaction, 289
temporal lobe structures, speech perception, 509
temporally graded retrograde amnesia, 443
temporal order judgment (TOJ), 536f, 545n.7
temporal-parietal junction (TPJ), 238
temporal properties, speech perception, 510–511
temporal sensitivity, spatial attention, 248–250
temporal ventriloquist effect, 537
testing effect, consolidation, 451
Thai language, speech perception, 512–513
timbre
music, 114–115
music perception, 118–119
time
deployment of attention in, 230, 231f
music, 112, 113f
toddlers, attention network development, 302–306
Token Test, 44
tonality. See also music
brain regions, 128f
detecting wrong notes and wrong chords, 115–117
dynamics, 117
music, 112–114
pitch and melody, 115
tonal hierarchies, 113
tone deafness, amusia, 127, 201
tonotopy, auditory system, 143
top-down effects
contextual, 65–66
interfunctional nature, 67–69
modulation, 61, 69
visual perception, 60–61
working memory, 408–410
topographagnosia, 200
topographic maps, brain, 29–31
trace hardening, consolidation, 436–437
transcranial direct current stimulation (tDCS), 366
transcranial magnetic stimulation (TMS)
analysis method, 2
mental imagery, 75–76
motor imagery, 84
object-related actions, 361
phonetic categorization, 515
prefrontal cortex function, 467
repetitive TMS (rTMS), 77, 228, 422
semantic memory, 366
verbal working memory, 405
transient attention, perceived contrast, 250
traumatic brain injury (TBI), 473, 478, 481
trial-and-error learning, 481
trigeminal nerve, odorants, 89–90
Tulving, Endel, 353
tuning curve, attention, 222–223
tuning mechanism, attention, 261
tunnel vision, 196
U
unconscious competence, skill learning, 417, 418f
unification, 584
unilateral neglect, 328f, 328–334
dissociation between neglect and extinction, 328–331
lateralized and nonlateralized deficits within, 331–333
Posner paradigm, 339f
posterior parietal cortex organization, 333–334
unilateral posterior lesions, object recognition task, 47f
unimodal lexical-semantic priming paradigm, 181
V
ventral frontoparietal network, spatial attention, 245–246, 250
ventromedial prefrontal cortex (VMPFC), attention, 229
ventral visual stream
central vision, 322
domain-specific hypothesis, 565–567
episodic retrieval, 383
functional organization of human, 12–14
interaction with dorsal stream, 289–290
landmark test characterizing, 321f
linear gap vs. object in gap, 33f
nature of functional organization in human, 21–23
object identification, 32
perception and vision with, damage, 279–280
responses to shape, edges and surfaces, 15f
saliency map, 321f
vision, 194–195
visual processing, 277f
ventrolateral prefrontal cortex (VLPFC), 128f, 364
attention system, 119
audition, 200
musical syntax, 117
ventromedial prefrontal cortex (VMPFC), 123, 125, 229
verbal working memory, 402–407
aphasia, 402–403
language disturbances, 402–403
phonological loop, 403–404
prefrontal cortex (PFC) for younger and older adults, 459f
short-term memory deficits, 406
vertical and horizontal axes, spatial vision, 197
vibration, pitch, 154f
vision, 2
color, 196–197
dorsal visual pathway, 194–195
human representation of space, 29
image formation on retina, 29–31
motion perception, 198
neural computations for perception, 283–288
spatial, 197–198
spatial contrast sensitivity, 196
ventral visual pathway, 194–195
visual acuity, 196
visual adaptation, 196
visual field, 195–196
visual acuity, 196, 324f
visual adaptation, 196
visual agnosia, 558
bilateral damage, 32, 33f
grasping for patient with, 279f
neuropsychology, 322
visual identification and recognition, 198–200
visual attention. See also attention
capacity limitations, 259, 260–261
change-blindness studies, 257–258
feature-based, 258–259
introduction to, 257–259
limits of, 257–258
location-based, 259
object-based, 258
tuning, 258–259
visual brain, functional specialization, 194–195
visual control of action, 273
interactions between two streams, 289–290
neural computations for perception and action, 283–288
neural substrates, 277–280
neuroimaging evidence for two visual streams, 280–290
neuroimaging of DF’s dorsal stream, 281f
neuroimaging of DF’s ventral stream, 280f
reaching, 282–283
reach-to-grasp movements, 274–276
size-contrast illusion, 284–288
superior parietal occipital cortex (SPOC) and visual feedback, 282
visual cortex, topographic representation, 30f
visual cues
extraction of, in audiovisual speech, 538–540
loudness, 156
speech production, 528f
visual disorientation syndrome, 327
visual mental imagery
early visual areas, 76–78
higher visual areas, 78–82
reverse pattern of dissociation, 80
ventral stream, shape-based mental imagery and color imagery, 78–81
visual-motor psychophysics, 275
visual-motor system, 273. See also visual control of action
cues for grasping, 275
double-pointing hypothesis, 286–288
grasping, 283, 285–286, 288
grip aperture, 285, 288
interactions of ventral–dorsal streams, 289–290
neural computations for perception and action, 283–288
Ponzo illusion, 286, 287f
size-contrast illusion, 284f
visual object working memory, 399–400
visual perception, 4–5
activate-predict-confirm perception cycle, 62–63
ambiguous objects, 68–69
bottom-up progression, 67f
contextual top-down effects, 65–66
error-minimization mechanisms, 63
feedback connections, 61–62
feedforward connections, 61–62
global-to-local integrated model, 63–64
importance of top-down effects, 60–61, 69–70
interfunctional nature of top-down modulations, 67–69
magnocellular (M) pathway, 64, 65
prediction in simple recognition, 62–65
spatial attention, 240–242, 247–250
top-down facilitation model, 64
triad of processes, 324f
understanding visual world, 61–62
visual processing, division of labor, 277–278
visual sequence task, 305
visual spatial attention, 237
visual spatial localization, 197
visual-spatial sketchpad, working memory, 392, 393, 475
visual-spatial working memory, 396–399
visual speech signal, properties and perception of, 527–528
visual synthesis, 323
visual synthesis impairment, 344
visual tracking task, eye movements, 288
visual word form area (VWFA), 13, 495–496
visuo-spatial attention
central vision, 323f
visual perception, 324f
voice onset time (VOT), speech perception, 510–511, 515
voluntary control, spatial attention, 240–242
vomeronasal organ (VNO), 89–90
vowels, speech perception, 511–512
voxel-based morphometry
brain analyses, 126, 127
reading and spelling, 494
W
WAIS-R Block Design Test, 45
Wallach, Hans, 161
Warrington, Elizabeth, 354, 555
Weber’s law, 286, 288
Wernicke’s aphasia, 403
Wernicke’s area, 111
Williams’ syndrome, 32
Wilson, Don, 96
words
frame of reference, 38–39
mapping of sounds to, 515–518
meaning in language acquisition, 180–182
word-learning paradigm, 181
word stress in language acquisition, 177–179
working memory, 389–390, 410
aging process, 456
audition, 162–163
capacity, 400–402
capacity expansion, 402
central executive, 392, 407
cognitive control of, 407–410
compensation-related utilization of neural circuits hypothesis (CRUNCH), 459
delayed response task, 394–395, 396f
development of concept of, 391–394
dorsolateral prefrontal cortex (DLPFC), 119
emergence as neuroscientific concept, 394–395
episodic buffer, 393–394
event-related study of spatial, 399f
frontal lobe, 34
functional neuroimaging studies, 395–400
hemispheric asymmetry reduction in older adults (HAROLD), 458–459
lexical matches, 586, 587f
linguistic, 593n.10
maintenance, 391, 405f
medial temporal lobes (MTL), 463
musical processes, 120
n-back task for sustained, 224, 225f
neurological evidence for short- and long-term stores, 390–391
phonological loop, 392–393, 407
positron emission tomography (PET) studies, 397–398
prefrontal cortex, 194
prefrontal cortex (PFC), 394–395
reactivation, 407–408
recall accuracy as function of serial position, 390f
role in skill learning, 422–424
short-term and, 390, 474–475
spatial, 326f
spatial memory task, 397f
syntactic department of, 588
verbal, 402–407
visual object, 399–400
visual-spatial, 396–399
visual-spatial sketchpad, 393
written language. See reading and writing
Z
zombie agent, dorsal system, 34
zoom lens theory, spatial attention, 238–239