Progress in the Simulation of Emergent Communication and Language
Kyle Wagner¹, James A. Reggia², Juan Uriagereka³, Gerald S. Wilkinson⁴
¹Sparta, Inc.
²Department of Computer Science, University of Maryland, College Park
³Department of Linguistics, University of Maryland, College Park
⁴Department of Biology, University of Maryland, College Park
This article reviews recent progress made by computational studies investigating the emergence, via
learning or evolutionary mechanisms, of communication among a collection of agents. This work
spans issues related to animal communication and the origins and evolution of language. The studies
reviewed show how population size, spatial constraints on agent interactions, and the tasks involved
can all influence the nature of the communication systems and the ease with which they are learned
and/or evolved. Although progress in this area has been substantial, we are able to identify some
important areas for future research in the evolution of language, including the need for further computational investigation of key aspects of language such as open vocabulary and the more complex
aspects of syntax.
Keywords multi-agent systems · evolution of communication · genetic algorithms · neural networks ·
animal communication · language
1 Introduction
How does an effective communication system arise
among a collection of initially noncommunicating
individuals? Answering this question is important for
at least two reasons. First, scientifically, it is desirable
to understand the evolution of animal communication,
the origins of language, and how language has evolved
and is culturally transmitted. Second, technologically,
there is the potential that an understanding of the fundamental principles involved may lead to innovative
communication methods for use by interacting software agents and in multi-robot systems. Support for
this latter point of view comes from the successful
development of other forms of biologically inspired
computation (neural networks, genetic algorithms, ant
colony optimization algorithms, immunologically inspired computing, etc.) that have emerged during the
last few decades.
As an example, consider understanding the origins of human language. Progress in this area has been
slow, mainly due to scanty, ambiguous evidence and
the difficulty in finding appropriate species and
behaviors for comparative studies. After more than a
century of intense study there are still many conflicting theories about the origins and evolution of language (see, for example, Dingwall, 1988; Wind,
Pulleyblank, de Grolier, & Bichakjian, 1989; Donald,
1993; Pinker, 1994; Aitchison, 1996; Dunbar, 1996;
Deacon, 1997; Bickerton, 1998; Dickins, 2001). Our
understanding of this issue is impaired by the limitations of experimental investigative methods in analyzing
a process (communication) that has not left a
meaningful fossil record. In this context, there has been
a recent surge of interest in using computer simulations
to ask “what if” questions about specific scenarios. By
building a computational model, the assumptions
of a theory about the evolution of animal
communication or language can be made explicit and
their implications examined. Although the actual story of the origins
of language will surely contain unknowable details, some general trends and features may be
discovered through the convergence of simulation work
and more traditional experimental approaches. For
example, one would like to know (in principle, at least)
which processes and behaviors are necessary for, or facilitate, the emergence of language (working memory,
learning abilities, cognitive prerequisites, etc.), what plausible intermediate stages lie on a path from simple signaling
to language, which social factors are involved in the acquisition of
language from a community, and so forth. Computer
simulation experiments may suggest answers to many
of these questions.

Correspondence to: K. Wagner, Sparta, Inc., 1911 N. Fort Myer Dr., Suite 1100, Arlington, VA 22209, USA.
E-mail: kwagner@sparta.com, Tel.: +1-703-7973009, Fax: +1-703-5580045
Copyright © 2003 International Society for Adaptive Behavior (2003), Vol 11(1): 37–69.
[1059–7123 (200303) 11:1; 37–69; 035919]
The goal of this article is to review and critique the
recent rapid progress made, using computer simulations, in studying how shared communication systems
can arise in a population of interacting agents (individuals) via learning or simulated evolution. Although
many of these computer simulations have aimed to illuminate the emergence¹ of communication, the results in some cases apply to the special case of human
language. While Parisi and Steels reviewed the
progress of simulations investigating the evolution of
language in 1997 (Steels, 1997; Parisi, 1997), much
has happened since then that makes a new review
timely. Another review by Kirby has been published
that covers the emergence of language (Kirby, 2002).
Kirby’s review focuses on syntax, meaning (grounding), and one specific method of acquisition (the iterated learning method), whereas we take a broader
view here, including not only work on language, but
also on how animal communication (that may relate to
human language) arises. We discuss various methods
of acquisition/transmission, and we also focus on
properties of communication systems in general, using
a different framework (Hockett and Altmann’s “design
features”). Although our coverage is fairly complete,
it is not exhaustive.² In addition, there is a very recent
collection of articles on the evolution of language and
communication (Cangelosi & Parisi, 2002) containing
papers very similar to earlier versions that we have already
reviewed here. Regardless, we have tried to be
representative of the many issues examined and approaches taken.
Our analysis is organized as follows. Section 2
begins by briefly describing the kinds of simulations
we will be considering and suggests a framework that
places each simulation in one of four general categories. In each category, we first describe a few representative studies, and then we briefly summarize the
results of many others. Section 3 analyzes the issue of
which of the many aspects of language have actually
been addressed by the simulations we reviewed. This
could be done in a number of ways, but we chose to
use the feature system of Hockett and Altmann (Hockett & Altmann, 1968) to organize the analysis. This
well-known framework characterizes any communication system in terms of a collection of features or
properties (repertoire, structure, groundedness, etc.)
that applies both to animal and human communication. Hockett and Altmann’s framework does not
address many language-specific concerns (e.g., syntactic properties), but it is more amenable to the problem
of communication in general. Since it antedates the
computational studies we review and was developed
independently of them, it provides a useful and objective context in which to assess the accomplishments
and limitations of models of emergent communication.
Section 4 concludes our analysis, summarizing the conclusions and suggesting important directions for future
research.
2 Computational Models of Emergent Communication
In this section, we review a broad array of models of
communication that emerges among initially noncommunicating agents via either learning or simulated
evolution. Although a number of approaches might be
taken to organizing this material, we find it intuitive
and useful to divide past work into four main categories, based on whether the agents involved are situated
in an artificial world, and whether the communication
acts use single or several unstructured tokens versus
structured utterances composed of multiple tokens.
Situated agents should be able to develop a closer connection than nonsituated agents between each signal
and its meaning, especially because each meaning will
be related to some object or context in the world (e.g.,
as argued and demonstrated in Harnad, 1990). Studies
of nonsituated agents sacrifice realism and grounded
signals since they have no world or body to relate
these signals to, but we have discovered that they are
generally able to focus more closely on the dynamics
of the emergence and use of a communication system.
Structured utterances may be necessary for agents that
operate within complex environments (and this is certainly a general trend in the studies we present in this
review), whereas unstructured communication should
suffice for agents that need to perform tasks with
fewer nuances (e.g., finding food or avoiding predators). The approaches within each of these four categories
tend to be similar: most studies within a category use the same adaptive process and similar
tasks.
Situated simulations place agents in an environment or “artificial world” to which the agents have
some causal connection.³ Just being in an artificial
world in which objects can be perceived is not enough
for an agent to be classified as situated in this review.
To be situated, an agent must also interact in noncommunicative ways with various entities such as food,
predators, and other agents and must have outputs that
can affect the environment and/or modify its own
internal state. On the other hand, in nonsituated simulations an agent’s actions consist solely of sending and
receiving signals. Such nonembodied agents do not
have noncommunicative interactions with objects or
each other beyond being able perhaps to perceive
objects or events.
Simulations can also be divided based on the
kinds of communication employed by agents: structured versus unstructured. Structured utterances are
composed of smaller units, such as the words forming
a phrase. They can be emitted sequentially or simultaneously. Agents sending sequentially structured utterances produce each unit of the utterance over time,
such as a string of symbols or a series of speech articulator commands. We include in this category agents
that produce a structured utterance all at once, where
hearers interpret the utterance as having parts (a bit
like reading and parsing an entire sentence in a single
moment). Other agents use unstructured utterances
where the utterance is one unit. This includes agents
whose utterances consist of single units on multiple
channels, but the values on different channels have no
relationship to each other and are not dependent on the
other channels for their interpretation. Thus, if the
response to a multi-channel utterance depends on
knowing the values of both channels, then we classify
the utterance as structured. On the other hand, if the
response to the utterance requires knowing the value
of one channel and ignoring the other, then the utterance is unstructured. These divisions yield four basic
types of simulations: nonsituated, unstructured; nonsituated, structured; situated, unstructured; and situated, structured. Accordingly, we organize our review
of past work into these four categories below.
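The multi-channel rule above can be made concrete with a small sketch. It is only one possible reading of the criterion, not a procedure taken from any of the reviewed studies: the helper names (`depends_on`, `classify_utterance`, `category`) and the brute-force check over a small set of channel values are illustrative assumptions.

```python
from itertools import product

def depends_on(response, channel, values):
    """Return True if changing `channel` (0 or 1) can change the hearer's response.

    `response` is a hypothetical function mapping two channel values to an action.
    """
    for a, b in product(values, repeat=2):
        for alt in values:
            changed = (alt, b) if channel == 0 else (a, alt)
            if response(a, b) != response(*changed):
                return True
    return False

def classify_utterance(response, values):
    """Structured if the response needs both channels, unstructured otherwise."""
    needs_both = depends_on(response, 0, values) and depends_on(response, 1, values)
    return "structured" if needs_both else "unstructured"

def category(situated, structured):
    """Name one of the four categories used to organize the review."""
    world = "situated" if situated else "nonsituated"
    form = "structured" if structured else "unstructured"
    return world + ", " + form

# A hearer that ignores the second channel versus one that combines both.
ignores_second = classify_utterance(lambda a, b: a, [0, 1])
combines_both = classify_utterance(lambda a, b: a ^ b, [0, 1])
```

Under this reading, a hearer whose response is the same whatever the second channel carries is treated as receiving an unstructured utterance, even though two channels are present.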
From a computational perspective, the simulations reviewed here are multi-agent systems (Ferber,
1999; Weiss, 1999), meaning that they simulate an
entire population of individuals, or agents, allowing
each agent a chance to act. Agent behavioral mechanisms include finite-state machines, neural networks
of many kinds, lookup tables, production systems, and
hybrid or novel mechanisms. Agents acquire a shared
communication system either by using machine learning methods (e.g., backpropagation of errors in neural
networks) or through a simulated evolutionary process
(e.g., genetic algorithms). In nonsituated simulations,
where agents typically interact with each other in the
absence of a world or environment, the interactions
are usually but not always between pairs of agents.
An interaction within a pair of agents in general
involves each member of the pair both “speaking”
and “listening,” possibly learning from their interactions. Nonsituated simulations typically treat agents
as signal encoders/decoders, and the task is often to
communicate as effectively as possible. In contrast,
situated multi-agent simulations usually allow agents
to interact with and affect multiple other agents in an
artificial world, and multiple speakers may send signals simultaneously, requiring hearer agents to ignore
all but one signal. Often there is a noncommunicative
task to solve for which communication may be helpful
(e.g., finding food or other items, avoiding predators,
moving objects from one location to another).
2.1 Nonsituated, Unstructured Communication
In simulations involving nonsituated agents and
unstructured signals, agents are typically paired randomly and given arbitrary meanings or internal states
to communicate to each other. Usually, the agent’s
task is to encode an arbitrary meaning as a signal and
send it to another agent, who decodes the signal back
into a meaning (see Figure 1). We summarize the
results of 24 such studies here that are mostly encoder/decoder games, although a few simulations involve
mating calls and female preferences or visual discrimination (see Table 1). These simulations typically
involve agents who are evolving or learning to communicate (rarely both). Simple feedforward neural
networks, lookup tables, and similar associative memories are the mechanisms usually used for relating
meanings or internal states to signals.

Figure 1 An encoder/decoder interaction between two agents in a typical nonsituated simulation. In step I, two agents
are randomly chosen from the population. Here, agents 2 and 5 are chosen from a population of seven agents. In step
II, agent 2 is designated to be the sender while agent 5 is designated as the receiver. Agent 2 is given a “meaning” (or
state) to communicate to the receiver, and it encodes this meaning, meaning1, as a signal. Agent 5 decodes the signal
to derive meaning2. In step III, the receiver’s decoded meaning2 is compared with the original meaning1 given to the
sender. If they match, then communication was successful. After successful communication has concluded, either the
sender or receiver or both will be awarded fitness points (for evolutionary simulations), or they will learn from the interaction (for nonevolutionary simulations).
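The encoder/decoder interaction of Figure 1 can be sketched in a few lines. This is a minimal illustration rather than the code of any reviewed study: it assumes lookup-table agents, and the names `make_agent`, `play_round`, and `success_rate`, as well as the meaning and signal labels, are hypothetical.

```python
import random

def make_agent(meanings, signals, rng):
    """Hypothetical lookup-table agent with a random encoder and a random decoder."""
    encoder = {m: rng.choice(signals) for m in meanings}
    decoder = {s: rng.choice(meanings) for s in signals}
    return {"encoder": encoder, "decoder": decoder}

def play_round(population, meanings, rng):
    """One interaction, following steps I to III of Figure 1."""
    sender, receiver = rng.sample(population, 2)   # step I: choose two agents
    meaning1 = rng.choice(meanings)                # step II: sender is given a meaning
    signal = sender["encoder"][meaning1]           #          and encodes it as a signal
    meaning2 = receiver["decoder"][signal]         #          which the receiver decodes
    return meaning1 == meaning2                    # step III: compare the two meanings

def success_rate(population, meanings, rounds, rng):
    """Fraction of successful interactions over a number of rounds."""
    wins = sum(play_round(population, meanings, rng) for _ in range(rounds))
    return wins / rounds

rng = random.Random(0)
meanings = ["m0", "m1", "m2", "m3"]
signals = ["s0", "s1", "s2", "s3"]

# Seven agents with unrelated random tables: the starting point of a simulation.
population = [make_agent(meanings, signals, rng) for _ in range(7)]
baseline = success_rate(population, meanings, 1000, rng)

# Two agents sharing one consistent one-to-one mapping: the target of a simulation.
shared = {"encoder": dict(zip(meanings, signals)),
          "decoder": dict(zip(signals, meanings))}
perfect = success_rate([shared, shared], meanings, 100, rng)
```

The adaptive process (fitness points or learning) is deliberately left out here; the sketch only shows what is being measured when a study reports communicative success.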
Overall, these simulations demonstrate several
properties in the emergence of simple communication
systems. They establish in the simplest of settings that
a shared communication system can readily evolve or
be learned by a population, and that the type of learning involved can be of different forms. Consensus
among evolving signalers is best achieved when the
signaler (at least) benefits from good communication, whereas agents who are endowed with observational learning can best achieve consensus when their
population size is small. These simulations have also
shown that spatial constraints encourage the emergence of signaling but can lead to local dialects and
global variations. They show that while population
flux can introduce variation into a communication
system, it does not always disrupt the system. Finally,
genetic factors and female choice are found to play a
role in the kinds of communication that can evolve.
2.1.1 Featured Examples We consider two studies
that are representative of work in this area: One
focuses on the evolution of communication and the
other looks at how a population could learn a system.
Both use experimental designs to investigate the
effects of specific factors on the resulting communication systems, and one (Levin, 1995) describes a set
of highly controlled experiments. In the first of these,
Levin (1995) studied various ecological and evolutionary factors in the evolution of communication.
Populations of agents that had internal states and
externally observable states (observables, represented
as vectors of integers) were simulated. Encoders and
decoders were matrices, specified in the agent’s
genome (see Figure 2). Each agent’s goal was to
guess another agent’s internal states by paying attention to that agent’s observables. During each generation, each agent A_i was randomly paired with members
of a subset of the population (A_j). The size of this subset was determined by a parameter, gregariousness,
defined as the fraction of the population with which an
agent interacts. For each pairing, A_i was given a random
internal state and encoded it as a signal (Levin’s
observables). A_i’s partner, A_j, decoded this signal. A_i’s
fitness would increase proportionately to the similarity between its actual internal state and A_j’s guess. A
genetic algorithm was used to replace the least fit two-thirds of the population with offspring from the most fit one-third, creating new agents by applying multiple
mutations and sometimes crossover to the agents with
the highest fitness.

Figure 2 (A) Example encoder and decoder matrices for two internal states and two observables as used in Levin’s
(1995) work, plus the genome that specifies them. (B) Example of using the M_enc matrix and a set of internal states I to
produce a set of observables O. I is supplied by the program and has arbitrary values for each interaction. (C) Example
of using another agent’s M_dec matrix to decode the O in the previous example. In this case, the second agent’s decoder
perfectly decodes O so that I_2 is the same as I.

Figure 3 Three-layer feedforward neural network used by Hutchins and Hazlehurst’s (1995) agents in the first experiment. A “visual scene” (this is a label of convenience only; there is nothing spatial about the inputs or how the net interprets them) is presented to the net’s 36 input units. Activation propagates through the net to each layer of units. The
goal is to make the output layer the same as the input layer (autoassociation). However, the crucial task for the agent is
to discover and use the “verbal” layer as a description of the visual scenes. A scene is presented to the net, and the activations of the units in the verbal layer are interpreted as the agent’s signal, describing the scene. This signal can be presented to another agent by setting its verbal units to the first agent’s verbal layer activations. If the second agent
produces a visual output consistent with the first agent’s visual input, then effective communication has occurred.
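A rough sketch of one generation in the style of Levin’s (1995) setup is given below. It is not Levin’s implementation: the real-valued matrix entries, the binary internal states, the negative squared distance used as the similarity measure, the mutation parameters, and all function names are assumptions made for illustration, and crossover is omitted for brevity.

```python
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def random_matrix(n, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]

def make_agent(n, rng):
    """An agent's genome specifies an encoder and a decoder matrix."""
    return {"enc": random_matrix(n, rng), "dec": random_matrix(n, rng)}

def similarity(a, b):
    """Assumed similarity measure: negative squared distance (0 is a perfect match)."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))

def fitness(sender, partners, n, rng):
    """Average similarity between the sender's states and its partners' guesses."""
    total = 0.0
    for partner in partners:
        state = [rng.choice([0, 1]) for _ in range(n)]   # random internal state I
        observables = matvec(sender["enc"], state)       # O = M_enc I
        guess = matvec(partner["dec"], observables)      # I_2 = M_dec O
        total += similarity(state, guess)
    return total / len(partners)

def mutate(agent, rng, rate=0.1, scale=0.2):
    """Copy an agent and perturb some matrix entries (assumed mutation operator)."""
    child = {key: [row[:] for row in matrix] for key, matrix in agent.items()}
    for matrix in child.values():
        for row in matrix:
            for j in range(len(row)):
                if rng.random() < rate:
                    row[j] += rng.gauss(0.0, scale)
    return child

def next_generation(population, n, gregariousness, rng):
    """Keep the most fit third and refill the rest with mutated offspring."""
    k = max(1, int(gregariousness * (len(population) - 1)))
    scored = []
    for agent in population:
        others = [a for a in population if a is not agent]
        scored.append((fitness(agent, rng.sample(others, k), n, rng), agent))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    survivors = [agent for _, agent in scored[: len(population) // 3]]
    offspring = [mutate(rng.choice(survivors), rng)
                 for _ in range(len(population) - len(survivors))]
    return survivors + offspring

rng = random.Random(1)
population = [make_agent(2, rng) for _ in range(30)]
for _ in range(20):
    population = next_generation(population, 2, 0.4, rng)
```

Here `gregariousness` sets the fraction of the other agents that each sender is paired with in a generation, matching the definition given above; every other numerical choice is a placeholder.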
Levin manipulated the population size, selection
method, mutation rate, use of crossover, number of
states and signals, gregariousness, and number of
interactions per agent. In most cases, the population
converged to one mapping of states to signals (with
four states and signals). Most manipulations had little
overall effect on the evolution of consensus among the
agents. However, larger population sizes caused the
population to converge (achieve consensus) more
quickly than smaller ones, while more observables
and internal states slowed convergence. Crossover
with mutation sped up evolution more than mutation alone, as would be expected: because the signal systems were represented by matrices, crossover could
splice together good sections of matrices to create
something better than the parents. Finally, gregariousness at around 40% (contact with around 120 agents)
was optimal for consensus.
Table 1 Studies involving nonsituated, unstructured communication

Simulation                      Adaptive processᵃ   Behavioral mechanismᵇ   Type of communication/Task
Berrah and Laboissière 1999     L                   Assoc?                  Encoding/decoding
Bullock 1998                    E                   FNN                     Visual discrimination
Bullock and Cliff 1997          E                   FNN                     Visual discrimination
De Boer and Vogt 1999           L                   Assoc                   Encoding/decoding
Dircks and Stoness 1999         L                   FNN                     Encoding/decoding
Enquist and Arak 1994           E                   FNN                     Visual discrimination
Hurd et al. 1995                E                   FNN                     Visual discrimination
Hurford 1989                    E+L                 Table                   Encoding/decoding
Hutchins and Hazlehurst 1995    L                   FNN                     Encoding/decoding
Johnstone 1994                  E                   FNN                     Visual discrimination
Kaplan 2000                     L                   Table                   Encoding/decoding
Krakauer and Johnstone 1995     E                   FNN                     Encoding/decoding
Krakauer and Pagel 1995         CA                  Fixed strategy          Encoding/decoding
Levin 1995                      E                   Table                   Encoding/decoding
Livingstone and Fyfe 1999a, b   L                   FNN                     Encoding/decoding
Noble 1999a                     E                   Table                   Mating advertisement
Noble 1999b                     E                   Params                  Mating advertisement
Oliphant 1996                   E                   Table                   Encoding/decoding
Oliphant 1999                   L                   FNN                     Encoding/decoding
Ryan et al. 2001                E+L                 RNN                     Mating call discrimination
Smith, K. 2002a                 E+L                 FNN                     Encoding/decoding
Smith, K. 2002b                 L                   FNN                     Encoding/decoding
Steels and Kaplan 1999          L                   DT, Assoc, Rob          Object description
Wagner and Reggia 2002          E+L                 FNN, Table              Encoding/decoding

ᵃ CA = cellular-automaton adaptation, E = evolution, L = learning
ᵇ Assoc = associative memory, DT = discrimination trees, FNN = feedforward neural net, Params = agent/contest
parameters, RNN = recurrent neural net, Rob = robotic, Table = lookup table/matrix, ? = paper does not provide
enough information

In another encoder/decoder study, Hutchins and
Hazlehurst (1995) used agents with autoassociative
two- and three-layer neural nets. The interactions
were similar to Levin’s above, except that these agents
learned with backpropagation instead of evolving. In
their first experiment, six agents with three layers of
weights learned a set of associations between 12 “visual” input patterns and themselves. Visual patterns
were 6 × 6 “scenes” representing phases of the moon,
and each agent had to reproduce in its outputs the
same scene that was given to its inputs (36 units for
each input and output layer). Agents also developed
two hidden-layer representations (note that while
supervised learning was being used, these hidden
units were free to develop any adequate hidden-layer
representation). One of these layers was designated
the “verbal layer” (see Figure 3). Pairs of agents—a
speaker and a listener—were chosen at random and
shown one of the 12 scenes. The listener used the
speaker’s verbal layer as a target during supervised
learning (backpropagation). The listener also learned
to autoassociate the scene, which helped ensure a
unique verbal layer. In a first experiment, agents were
able to develop a unique signal for each scene, and
they were able to pass on their system to other agents.
Eventually, the whole population achieved consensus
(low variability between agents for each signal).
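Consensus in this sense, together with the complementary requirement that different scenes receive distinct signals, can be quantified with two simple measures. The sketch below is an assumption-laden illustration and not Hutchins and Hazlehurst’s own computation: it uses Euclidean distance between verbal-layer activation vectors, and the function names are hypothetical.

```python
from itertools import combinations

def distance(u, v):
    """Euclidean distance between two activation vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def same_scene_variability(signals):
    """Average distance between different agents' signals for the same scene.

    signals[agent][scene] is that agent's verbal-layer vector for that scene.
    Low values indicate consensus.
    """
    n_scenes = len(signals[0])
    return mean(distance(a[s], b[s])
                for a, b in combinations(signals, 2)
                for s in range(n_scenes))

def different_scene_variability(signals):
    """Average distance between one agent's signals for different scenes.

    High values indicate that each scene has a distinct signal.
    """
    return mean(distance(agent[s], agent[t])
                for agent in signals
                for s, t in combinations(range(len(agent)), 2))

# Two agents, two scenes, identical and well-separated signals.
signals = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]
agreement = same_scene_variability(signals)
separation = different_scene_variability(signals)
```

A population has a usable shared system when the first measure is low and the second is high at the same time; either one alone is not sufficient.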
In a second experiment using simpler agents over
thousands of interactions, each agent’s verbal activation space showed distinct representations for each
scene. Thus, consensus could be achieved through a
supervised learning paradigm. Figure 4 shows how
variability among different agents’ signals for a particular scene decreased over time, as must happen
for the population to arrive at a consensus. Conversely, the same graph shows how each meaning’s
signals came to differ from those of the other meanings, which is also crucial
for distinguishing among different meanings. However,
as population size increased from 5 to 15, consensus
was harder to achieve (distinctions between agents’
signals for the same meaning did not drop nearly as
quickly when there were 15 as opposed to 5 agents).
This study was limited both by the small population
sizes in which communication would arise, and the
unrealistic assumption,⁴ fairly common among simulations incorporating learning, that agents could use
supervised learning.

Figure 4 Summary of Hutchins and Hazlehurst’s (1995)
graphs of signal divergence versus agent interactions
(time). Each signal emitted by an agent describes a visual
scene (the “meaning” of the signal). Two different measures are shown for populations of size 5 and 15. One
measure (same scenes) is the variability between different agents’ signals for the same scene. Initially, agents
will use different signals for the same scene and so
exhibit higher variability. This variability continues to
decrease slightly over time, indicating that the population
closely agrees on a single signal for each meaning. The
other measurement (different scenes) is variability between
signals for different scenes. This increases over time, indicating that each meaning eventually gets a distinct signal.
Smaller populations take less time to create unique and
distinct signals than larger populations. Adapted from
Hutchins & Hazlehurst (1995).

The effects of population size on convergence conflict with Levin’s findings (above). This is almost certainly due to the different acquisition (transmission)
mechanisms: Hutchins and Hazlehurst’s agents used
learning whereas Levin’s agents used evolution. When
agents teach each other their signals for various “meanings,” then the more agents in the initial population, the
more variation exists, and thus the more interactions
will be needed before everyone settles into a stable set
of associations. For genetically endowed signaling, initial population variation is important for natural selection to work and hastens the rate of evolution. Higher
initial population variation due to larger population
sizes raises the probabilities for good initial signal–
meaning mappings, which can become represented and
modified in greater numbers in future populations.
With a larger population, the chances are greater that
at least a few agents will initially have partially compatible signaling systems. This would accelerate the
process of evolving consensus.

2.1.2 Survey of Other Related Work Other work with
nonsituated agents using unstructured communication
has shown similar results to the simulations described
above, demonstrating that either learning or evolution
can account for the emergence of a shared communication system. Studies have begun to explore the
effects of what is learned as well as when and how
something is learned. For example, in one study agents
focused on learning shared signal systems (Oliphant,
1999) and were able to achieve consensus when using a
variant of Hebbian learning that employed lateral inhibition to encourage unique signals for different meanings (how). Just as with Hutchins and Hazlehurst’s
(1995) and Wagner and Reggia’s (2002) simulations,
increasing the number of signals and meanings, as
well as increasing the population size, increased the
time to convergence on one communication system.
Wagner and Reggia’s work further showed that
larger population sizes allowed agents to achieve
consensus more easily when agents evolved than
when they learned (Wagner & Reggia, 2002). Furthermore, another study demonstrated that the stability of a learned communication system is enhanced
when older agents cannot learn, that is, only young
agents learn (de Boer & Vogt, 1999) (when). In perhaps the earliest encoder/decoder simulation, Hurford
(1989) showed that the learning strategy (what) is very
important to overall communicative success. Saussurean learners, who learn their encoder and decoder
associations from others’ decoder outputs (but not from
others’ encoder outputs) perform better than agents
using more precise forms of imitation, where the agent
uses others’ encoder and decoder outputs to learn its
own encoder and decoder associations.
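The idea of Hebbian learning with lateral inhibition mentioned above (Oliphant, 1999) can be illustrated with a small association matrix. The update rule below is a simplified assumption chosen for clarity, not the rule reported in that study: an observed meaning–signal pair is strengthened, while the competing entries in the same row and column are weakened, which pushes the learner toward a one-to-one mapping. The function names and the unit learning rate are hypothetical.

```python
def observe(weights, meaning, signal, rate=1.0):
    """Update the association matrix after seeing `signal` used for `meaning`."""
    n_meanings = len(weights)
    n_signals = len(weights[0])
    weights[meaning][signal] += rate        # Hebbian strengthening of the observed pair
    for s in range(n_signals):              # inhibit other signals for this meaning
        if s != signal:
            weights[meaning][s] -= rate
    for m in range(n_meanings):             # inhibit other meanings for this signal
        if m != meaning:
            weights[m][signal] -= rate

def produce(weights, meaning):
    """Send the signal most strongly associated with the meaning."""
    row = weights[meaning]
    return max(range(len(row)), key=lambda s: row[s])

def interpret(weights, signal):
    """Decode a signal as the meaning most strongly associated with it."""
    return max(range(len(weights)), key=lambda m: weights[m][signal])

# A learner observes three meaning-signal pairs from other agents.
weights = [[0.0] * 3 for _ in range(3)]
for meaning, signal in [(0, 2), (1, 0), (2, 1)]:
    observe(weights, meaning, signal)

produced = [produce(weights, m) for m in range(3)]
decoded = [interpret(weights, s) for s in range(3)]
```

Because the same matrix is read by row for production and by column for interpretation, the learner’s encoder and decoder stay consistent with each other by construction.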
Other work has examined how consensus is
affected by various factors. Agents achieve consensus
more readily when they have learning biases that favor
one-to-one mappings between meanings and signals,
and a genome can confer both learning biases and
learning rules in the aid of learning (Smith, 2002a, b).
Benefits to the hearer and sender are also important
factors in the emergence of communication. It has been
found that consensus in a communication system could
evolve when the sender and hearer both benefited from
accurate communication, but that if only the hearer benefited, spatial constraints were needed (Oliphant, 1996).
Specifically, when just hearers were rewarded, a population of agents could achieve consensus only if agents
were restricted to signaling to and mating with nearby agents
(and offspring were placed nearby). Such spatial constraints have important implications concerning communication variation. When agents learn from each
other, spatial constraints can lead to consensus, but local
dialects will develop and there will be substantial global
variation (Livingstone & Fyfe, 1999a, b). Population
flux (migrants entering a population) also clearly adds
variation to a communication system, although the system can still remain stable overall (de Boer & Vogt,
1999; Kaplan, 2000). Variation can also arise due to the
noise and variability inherent in learning and in whom
agents interact with (Dircks & Stoness, 1999).
Simulations have shown that populations of communicators can self-organize their communication
systems, and some studies have found this even without direct pressure to do so. Kaplan found that the
utterances developed by a population (in this case,
sets of digits, for example, “25291”) tended to move
toward medium length (Kaplan, 2000).⁵ Very short
utterances could be interpreted as something entirely
different if there were just one error among the utterance components (e.g., “12” instead of “17”), whereas
long utterances were more susceptible to higher levels
of noise. Medium-length utterances might still be
understood with an error but would be less likely to
have an error in the first place. Another useful feature
of a self-organized communication system is openness, where new “words” can be added and “meanings” can change over time. A robotic simulation by
Steels and Kaplan (1999) showed that agents could
continually reshape their lexicon, adding new words
and refining or modifying the meanings of old words
in response to encountering new objects.⁶ Robotic
agents perceived objects as collections of features and
used different “words” to describe a distinctive feature
about a particular object. However, since each agent
perceived different features of each object, ambiguities would arise as to what object was being referred
to. Nevertheless, given enough time and with the
added help of “pointing” to objects, agents could create a shared lexicon that could also be extended or
modified when new objects were added to the group
they had to describe (though pointing may not be necessary, as A. Smith’s (2001) work suggests).
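The trade-off behind the medium-length utterances reported by Kaplan (2000) can be illustrated with elementary probability. The sketch below is not Kaplan’s model: it assumes that each component of an utterance is corrupted independently with a fixed probability, and the function names and the error rate of 0.05 are placeholders.

```python
def p_any_error(length, p_component_error):
    """Probability that at least one component of the utterance is corrupted,
    assuming independent errors per component."""
    return 1.0 - (1.0 - p_component_error) ** length

def fraction_corrupted_by_one_error(length):
    """Share of the utterance lost when exactly one component is wrong."""
    return 1.0 / length

# Compare short, medium, and long utterances at an assumed error rate.
rows = [(n, p_any_error(n, 0.05), fraction_corrupted_by_one_error(n))
        for n in (2, 5, 10)]
```

Longer utterances are more likely to contain an error, whereas a single error destroys a larger share of a shorter utterance; intermediate lengths balance the two effects.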
Somewhat different work has addressed how
human vowel systems might self-organize. For example, one study showed that agents could create realistic
vowel systems based on discrimination constraints and
lookup “error” (when the wrong item is recalled from an
associative memory due to a noisy cue; de Boer & Vogt,
1999). These agents learned by hearing the speaker’s
vowel, trying to reproduce it, and using feedback from
the speaker to modify their own production. However,
agents do not necessarily have to rely on the speaker for
feedback, as shown by Berrah and Laboissière (1999).
In this simulation, agents modified a vowel sound until
it was close enough to the vowel sound they had heard.
In both of these simulations, the agents’ vowel systems
were claimed to be similar to real, human vowel systems
along certain featural dimensions.
In addition to population dynamics and learning
methods, simulations have shown that details of the
evolutionary and genetic processes themselves can
play an important role in the emergence of a signaling system. Properties of “calls” can be affected by
female preferences (Noble, 1999a; Ryan, Phelps, &
Rand, 2001) and historical remnants of earlier evolutionary processes (Ryan et al., 2001). Preferences for
symmetrical visual signals can arise in position- and
orientation-invariant object recognition due to sensory biases (Enquist & Arak, 1994), biases for
homogeneity (Bullock & Cliff, 1997), and mate recognition (Johnstone, 1994); distinct signals can arise
from competition among signalers for receivers
(Hurd, Wachtmeister, & Enquist, 1995); and honest
signaling can arise under a variety of conditions (Bullock, 1998). Honesty in calls (mate advertisement) is
usually necessary for them to carry information about
the signaler, but honest signals require extra pressures,
such as costly signals (Krakauer & Johnstone, 1995)
or spatial constraints (Krakauer & Pagel, 1995)
before they will emerge. Nevertheless, calls do not
need to be honest (yield a fitness benefit for the
hearer) in the face of certain genetic correlations
(pleiotropy, hitchhiking7), mutational lag (when mutations are slower than environmental changes), or sensory biases (preferences for certain kinds of sounds
due to other sensory needs such as predator vigilance).
Table 2 Studies involving nonsituated, structured communications

Simulation                     Adaptive process^a   Behavioral mechanism^b   Type of communication/Task
Batali 1994                    E+L                  RNN                      String recognition
Batali 1998                    L                    RNN                      Encoding/decoding
Brighton 2002                  L                    FSMs                     Encoding/decoding
Hare and Elman 1995            L                    FNN                      Encoding/decoding
Kirby 1998                     L                    Grammar                  Encoding/decoding
Kirby 1999                     L                    DCG                      Encoding/decoding
Kirby 2001                     L                    DCG                      Encoding/decoding
Kirby and Hurford 1997         E+L                  Table                    Encoding/decoding
Kvasnicka and Pospichal 1999   E+L                  RNN                      Encoding/decoding
MacLennan and Burghardt 1993   E+L                  FSM                      Object names
Smith, A. 2001                 L                    Table                    Object description
Steels 1998a                   L                    PS?                      Object description
Steels 1998b                   L                    DT, Assoc, Rob           Object description
Steels and Oudeyer 2000        L                    Assoc                    Encoding/decoding
Werner and Todd 1997           E                    Preference matrix        Mate choice

a E = evolution, L = learning
b Assoc = associative memory, DCG = definite clause grammar, DT = discrimination trees, FNN = feedforward
neural net, FSM = finite-state machine(s), Grammar = production grammar, PS = production system, RNN =
recurrent neural net, Rob = robotic, Table = lookup table, ? = paper does not provide enough information
In fact, if hearers have a negative payoff for responding to a call, they may still evolve to respond to signals
due to mutational lag or sensory bias (Noble, 1999b).
2.2 Nonsituated, Structured Communication
A second class of simulations has given nonsituated
agents the capacity for more complex communication
and has studied how structured utterances can emerge.
Some investigations have focused on the changes that
occur to a complex communication system and what
can bring about those changes. For the most part, the
issues are similar to those for unstructured signals. However,
structured signals require more complex mechanisms
and a motivation for that complexity to be built and
maintained. Just as with the nonsituated, unstructured
simulations of Section 2.1, many of these simulations
present agents with an abstract “meaning” (some vector, string, or number that does not correspond to anything in a world since there is no world) that the agent
encodes as a structured utterance that another agent
must decode. Agents typically do not have any internal states except for the purpose of producing a
sequential stream of symbols. Since they are nonsituated, agents do not perform actions.
Not all of the simulations are of the encoder/
decoder variety. Some simulations involve the description and naming of objects or the choosing of mates,
and some deal with cooperation among a group of
agents. Perhaps due to the more language-like nature of
the signals, the tasks are a bit more language-like themselves. The majority are still encoder/decoder games,
but object naming and description, as well as cooperation, are plausible tasks for linguistic behaviors.
We review 15 simulations here (see Table 2). A
wide variety of mechanisms are used by the various
agents, including recurrent neural networks, lookup
tables, and associative memories. Learning (both
supervised and reinforcement types) is the predominant form of adaptation, but a few simulations use
evolution in conjunction with learning. These simulations demonstrate that structured communication can
emerge under certain circumstances and that it is often
related to the structure inherent in a task. Evolution
and learning together are shown to be more effective
than either alone. As with unstructured simulations,
spatial constraints are again found to lead to local dialects. Linguistic variation may also be explainable by
transmission errors (younger generations imperfectly
learning from older ones) as well as by parsability and
production constraints (constructions that are difficult to transmit or understand might eventually be
replaced by easier constructions). Finally, phonological and grammatical classes have proven to be natural
solutions to the problem of producing a large repertoire of utterances.
2.2.1 Featured Examples We present two representative examples in detail. One example uses a common
mechanism for both production and comprehension of
signals that has become a popular tool in later work,
and the other explores how agents could come to name
objects without strong supervision. In the first of these,
Batali (1998) showed how structured utterances might
be created and acquired by agents by learning from
each other (using backpropagation, a supervised learning algorithm). There were 100 meanings to convey,
represented as pronoun–predicate tuples, with 10 pronouns and 10 predicates. Each agent used a recurrent
neural network to produce a stream of tokens up to 20
long (there were 4 tokens, yielding $\sum_{i=1}^{20} 4^i$ possible
utterances, an astronomical number). The agents were
all initially given random weights in their neural nets.
From the 30 agents in the population, a randomly chosen learner agent was paired with 10 randomly chosen
teacher agents. For each teacher, the learner trained
once on each of the teacher’s utterances for all 100
meanings. After at least 15,000 rounds of training,
agents learned to communicate about a number of situations (“meanings”) using a small repertoire of tokens
emitted in a temporal sequence.
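To make the size of this utterance space concrete, the count of all token strings of length 1 to 20 over a 4-token alphabet can be evaluated directly. The Python sketch below is an illustration added here, not code from Batali (1998); the function name count_utterances is hypothetical.

```python
def count_utterances(num_tokens: int, max_length: int) -> int:
    """Count the non-empty token strings of length 1..max_length."""
    return sum(num_tokens ** length for length in range(1, max_length + 1))


# 4 tokens, utterances of up to 20 tokens, as described in the text.
total = count_utterances(4, 20)
print(total)  # 1466015503700
```

With roughly 1.5 x 10^12 possible strings available for only 100 meanings, the agents must converge on a tiny, shared subset of the utterance space.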
Each agent used its neural network both to send
and to receive signals. The network could take a
sequence of tokens—one at a time—as input, and output a meaning vector, M′, with 10 values between 0
and 1. The sending agent’s task was to send a string of
tokens {a, b, c, d}, one token at a time, to a hearer/
receiver agent that then had to decode what meaning
M the sender was communicating. To decode an utterance, a hearer processed each token in the utterance
(using its recurrent layer to remember the past
tokens); the resulting M′ was the decoding of utterance U. To produce utterance U, an agent passed each
token through its network and chose the token that
would cause it to produce a meaning vector (M′) closest to the M it had to communicate. It then chose the
next token in the same way, until M′ = M or 20 tokens
had been sent (a “give up” limit).
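The token-by-token production procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Batali's implementation: toy_decode is a hypothetical stand-in for the recurrent network (it maps a token string to a two-value meaning vector rather than ten values), and the stopping test "M′ = M" is approximated by a tolerance, tol, which is our assumption.

```python
from typing import Callable, List, Sequence

TOKENS = ["a", "b", "c", "d"]
MAX_LEN = 20  # the "give up" limit mentioned in the text

# Hypothetical token effects: each token pulls one vector component
# halfway toward 0 or 1. A trained recurrent network would go here.
EFFECT = {"a": (0, 1.0), "b": (0, 0.0), "c": (1, 1.0), "d": (1, 0.0)}


def toy_decode(utterance: Sequence[str]) -> List[float]:
    """Stand-in decoder: map a token sequence to a meaning vector M'."""
    m = [0.5, 0.5]
    for tok in utterance:
        idx, goal = EFFECT[tok]
        m[idx] += 0.5 * (goal - m[idx])
    return m


def distance(m1: Sequence[float], m2: Sequence[float]) -> float:
    """Squared Euclidean distance between two meaning vectors."""
    return sum((x - y) ** 2 for x, y in zip(m1, m2))


def produce(meaning: Sequence[float],
            decode: Callable[[Sequence[str]], List[float]] = toy_decode,
            tol: float = 0.3) -> str:
    """Greedily pick, at each step, the token whose decoding is closest to M."""
    utterance: List[str] = []
    while len(utterance) < MAX_LEN:
        best = min(TOKENS, key=lambda t: distance(decode(utterance + [t]), meaning))
        utterance.append(best)
        decoded = decode(utterance)
        if all(abs(d - m) < tol for d, m in zip(decoded, meaning)):
            break  # the speaker's own decoding now matches the target
    return "".join(utterance)


print(produce([1.0, 0.0]))  # ad
```

The key design point carried over from the text is that the speaker uses its own comprehension mechanism to evaluate candidate tokens, so production and comprehension share one set of weights.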
Meaning vectors had some regularity. The first 4
bits of the vector were taken from a set of 10 arbitrary
bit patterns (intended to correspond to pronoun referents such as “you” or “me”). The last 6 bits of the vector were taken from one of 10 arbitrary bit patterns
intended to represent predicates such as “happy” or
“sad.” Batali found that agents initially developed a
repertoire of token sequences that were different from
each other, although error was still high and sequences
were quite long. These differences were mainly
attributable to the random weights with which each
agent began. After the repertoire was distinguishable, error dropped and the average sequence length
for each meaning fell from 20 to 4. The resulting
token sequences exhibited some systematicity consistent with the structure found in the meaning vectors. Most of the predicates were represented by a
common token sequence “root” (e.g., cd for “happy”
and b for “sad”) and the pronouns were often represented by a common suffix (e.g., ab for “you, singular”). Thus, an utterance representing “you-singular
happy” would look like cdab and “you-singular sad”
would look like bab. Agents also generalized their signaling system to new meanings fairly well. Kvasnicka
and Pospichal (1999) extended this work by adding
genetic and memetic components to agents. Agent genomes specified hidden-layer size and connectivity, and each child inherited
some of the mappings that its parents had created (the
memetic contribution), which is similar to having parents teach their children before sending them out into
the world. The populations showed similar results to
Batali’s as well as demonstrating the Baldwin effect
(learning affects the genome, for example, Baldwin,
1996) when memetic components were added.
In a second example, Steels (1998b) studied a
variety of agents in experimental conditions similar to
Batali’s but used a different approach to learning.
Agents played various “language games” with each
other, usually involving the description of an object to
another agent. Agents were located in a room, where
they could “see” but not affect a set of objects. The
objects could be perceived by low-level sensors and
each agent first learned to build feature detectors for
distinctive sensor readings. For example, if there were
five objects in a room, and one object could be distinguished by its red color and its spherical shape, then
an agent might develop a feature detector for red
colors, spherical shapes, or perhaps both features.
Because every agent developed its own feature detectors, each agent might have different feature detectors,
although most agents would be likely to share feature
detectors for most colors, shapes, and so forth. Once
each agent could distinguish each object from all of
the others, the whole population was given a lexicon-creation task. This task involved agents that could
expand their lexicons to describe new situations.
Initially, each agent began with an empty lexicon
(a mapping from features to words). Two agents were
paired and engaged in language games where the first
agent (speaker) would “point” to and then attempt to
describe one object (the topic) out of the set of objects
to the other agent (the receiver). When a speaker
could not describe one of the topic’s features, it
invented a new word for that feature. If the speaker’s
description failed to help the receiver pick out the
object from the group, the receiver modified its lexicon. Receivers modified the associations between
words and features or added a word when they did
not know it. After thousands of interactions, agents
achieved a high rate of communicative success,
demonstrating that agents can develop a lexicon
from simple object-description interactions despite
having different internal representations of meanings (the objects’ features). An extension to this
work has shown that agents can still develop a shared
lexicon without resorting to pointing (A. Smith, 2001).
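The invent-and-adopt dynamics of such language games can be illustrated with a much-simplified Python sketch. It is not Steels's model: we assume that all agents share the same feature set (whereas his agents built their own feature detectors), that each game concerns a single feature revealed by "pointing," and that a receiver simply adopts the speaker's word after a failure. The function name and all parameter values are hypothetical.

```python
import itertools
import random


def play_naming_games(num_agents=5, features=("red", "round", "large"),
                      num_games=2000, seed=0):
    """Run simplified naming games; return final lexicons and game outcomes."""
    rng = random.Random(seed)
    fresh_words = (f"w{i}" for i in itertools.count())  # never repeats a word
    lexicons = [dict() for _ in range(num_agents)]      # feature -> word
    outcomes = []
    for _ in range(num_games):
        speaker, hearer = rng.sample(range(num_agents), 2)
        topic = rng.choice(features)  # "pointing" tells the hearer the topic
        if topic not in lexicons[speaker]:
            # The speaker cannot describe the feature, so it invents a word.
            lexicons[speaker][topic] = next(fresh_words)
        word = lexicons[speaker][topic]
        success = lexicons[hearer].get(topic) == word
        if not success:
            # Simplifying assumption: the hearer adopts the speaker's word.
            lexicons[hearer][topic] = word
        outcomes.append(success)
    return lexicons, outcomes


lexicons, outcomes = play_naming_games()
print("success rate over last 200 games:", sum(outcomes[-200:]) / 200)
```

Even with this crude update rule, competing words for the same feature are gradually eliminated, which conveys the flavor of how a shared lexicon can arise from local, pairwise interactions alone.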
Further work by Steels has shown that robotic
agents using similar feature detectors and lexicons
could develop a precursor to syntax: word order
(Steels, 1998a). Agents were given the capability to
create frames that held words in a certain order and
the object features they related to. When objects with
multiple, distinctive features were used, a group of
words could be used to describe them, and from these
common concatenated phrases a simple structure
could arise. Sequential utterances are an important
step in creating a syntactic communication system,
although there are many other features (e.g., hierarchical utterances) that need to appear before syntax could
be said to be present.
2.2.2 Survey of Other Related Work Most other nonsituated, structured simulations have focused on factors affecting structure features, the contributions of
learning and evolution to the emergence of structured
communication, or the role of grammatical and phonological classes in structured communication. Several studies have indicated that evolutionary processes
and learning combined are more effective than either
alone since evolution can lay a foundation from which
learning can proceed (MacLennan & Burghardt, 1993;
Batali, 1994; Kirby & Hurford, 1997).8 There is also some evidence that structured signals used for cooperation can evolve when the number
of situations to communicate is larger than the repertoire of signal components (MacLennan & Burghardt,
1993).
Communication systems can vary geographically
(spatial variation), they can change over time (temporal
variation), they can vary based on the relationship
between speaker and hearer, and they can even vary
within a single speaker. Spatial variation across speakers
emerged in work by Kirby (1999; and also Livingstone
and Fyfe, 1999a, from Section 2.1). In this model, spatial constraints prevented agents from communicating
with others too far away, so local areas developed with
one dialect while other areas farther away could retain a
different dialect, both equally efficient. The multi-agent work reviewed in this section has also explored
temporal variation. Temporal variation might result
from sexual selection. In one simulation, females chose
mates on the basis of the relative novelty of each male’s
song (Werner & Todd, 1997). Female preferences and
male songs were genetically fixed. Males who produced
novel songs, while still adhering to a basic pattern, could
gain more mating opportunities. Song variation between
males was greatest when females could choose from
fewer males and when females preferred “surprising”
songs. But sexual selection is not the only mechanism
responsible for temporal variation in a communication
system. Studies have argued that linguistic selection
could also account for how certain grammatical features
could come to predominate in a language-using population: Parsability and ease of production could both play
a role in generating more efficient grammars, while spatial constraints could account for variability among a
multitude of equally efficient possibilities (Kirby &
Hurford, 1997; Kirby, 1998). These results notwithstanding, linguistic variation does not always have to be
based on optimality. Simulations have also demonstrated that transmission error, frequency of presentation, and ease of learning can explain some forms of
linguistic variation (Hare & Elman, 1995): changes in regular and irregular verb inflections
occurred over time as one agent learned from another,
which then trained another, and so on. The kinds of variation that arose were similar in some ways to those observed
in modern languages (in this case, from Old English to
modern English).
Finally, grammatical and phonological classes
have been shown to be useful for agents communicating about a large number of meanings or things.
Among several available sound production systems,
the one that used phonological classes was much
more efficient with respect to memory size (Steels &
Oudeyer, 2000; also demonstrated mathematically in
Nowak, Krakauer, & Dress, 1999). If a large repertoire of sounds (words in human languages) is necessary, rote memory of each sound becomes impractical.
Phonological classes help both in the reduction of
memory required to store and produce each distinct
sound, as well as in the classification of each sound.
Similarly, grammatical classes allow for a grammar
with fewer and more general rules. One simulation
showed that a simple grammar could be inductively
learned to express a large number of meanings (Kirby, 1999). The meanings
were in the form of propositions (e.g., p, p(a, b),
p(q(a), b)). The agents evolved a grammar to reflect
the forms of the meanings by creating classes for each
kind of object and predicate atom, as well as using
recursive rules to deal with higher-degree (embedded)
propositions. In an extension to this work, Kirby
showed that constraints on the frequencies of each
meaning—a communication bottleneck—could give
rise to irregular forms in the grammar (Kirby, 2001).
Using the same paradigm as Kirby, Brighton
showed that when the communication bottleneck was
small (i.e., agents were not able to communicate a
large portion of their language when describing various objects to each other), compositional languages
emerged and tended to be more stable than holistic
(noncompositional) languages (Brighton, 2002). Because
of its ability to generalize, a compositional system
could capture more of a language given fewer interactions than a holistic system could. But generalization
was only possible when objects had many features but
few values for each feature (so that different objects
were likely to share common features and values). In
Kirby and Brighton’s simulations, the population size
was 2 (one agent training the other); larger populations
may exhibit very different dynamics.
2.3 Situated, Unstructured Communication
The work we have considered so far, being nonsituated, is unrealistic in associating no external task with
communication acts. To address this, many studies of
the evolution of communication have examined situated agents using unstructured signals. As with nonsituated, unstructured simulations, agents send single
atomic signals, but now agents exist and interact with
the environment in an artificial world that is usually a
two-dimensional landscape. In a few cases, agents
send several atomic signals on multiple channels, but
the signals are not related to each other. In other
words, hearers choose to pay attention to only one
channel, or use the information on each channel separately (e.g., an alarm call sent simultaneously with a
mating call). Unlike with nonsituated simulations,
agents are evaluated based on their performance on a
task instead of being directly evaluated on their communication abilities.
Most past simulations involving the emergence
of situated, unstructured communication have been
directly or indirectly motivated by observations of
animal communication rather than language. Animals communicate about many things: dominance,
mate selection, food, predators, and so forth. For
example, several species of tamarins and marmosets
give one call type upon discovering food and another
call type while consuming it (Elowson, Tannenbaum,
& Snowdon, 1991; Benz, 1993; Caine, 1995). Vervet
monkeys use four phonically different alarm calls to
indicate the identity of terrestrial, aerial, arboreal or
other predators (Cheney & Seyfarth, 1990). On the
other hand, many alarm calls do not differentiate
among predator types, although some alarm calls convey information about the urgency of the threat
(Manser, 2001). A critical issue in the evolution of
such communication acts is how they benefit signalers
and receivers (and are therefore selected for). At
present, close kinship provides one plausible explanation and has been an issue in some of the simulations
described here.
We consider 17 studies involving situated agents
and unstructured communication (see Table 3). These
studies extend the results from nonsituated simulations by demonstrating that grounded signals can
evolve or be learned. A grounded signal is one that is
somehow related to the organism or its environment
(see Harnad, 1990 for a discussion). Simulations using
Table 3 Studies involving situated, unstructured communication

Simulation                     Adaptive process^a   Behavioral mechanism^b   Type of communication/Task
Ackley and Littman 1994        E                    FNN                      Food & alarm calls
Baray 1997                     E                    PS                       Recruitment calls (food, predators)
Baray 1998                     E                    PS                       Recruitment calls (food, predators)
Billard and Dautenhahn 1999    L                    RNN                      Object/location/orientation names
Cangelosi and Parisi 1998      E                    FNN                      Object description
de Bourcier and Wheeler 1995   E                    Params                   Aggression
Di Paolo 2000                  E                    DNN                      Finding another agent
Grim et al. 1999               E                    Fixed strategy           Food & alarm calls
Grim et al. 2000               CA                   Fixed strategy           Food calls
Murciano and Millán 1997       L                    Hybrid NN/PS             Object description
Noble 1998                     E                    RNN                      Aggression
Oudeyer 1999                   L                    Assoc                    Encoding/decoding
Quinn 2001                     E                    RNN                      Movement coordination
Reggia et al. 2001             E                    FSM                      Food & alarm calls
Saunders and Pollack 1996      E                    RNN + FSM                Food calls
Wagner 2000                    E                    Table                    Food calls
Werner and Dyer 1991           E                    RNN                      Mate-finding

a CA = cellular-automaton adaptation, E = evolution, L = learning
b Assoc = associative memory, DNN = dynamic neural net, FNN = feedforward neural net, FSM = finite-state machine,
Params = agent parameters, PS = production system, RNN = recurrent neural net, Table = lookup table
nonsituated agents cannot explore this kind of communication, and even the situated agent simulations
described here only demonstrate a simple kind of
grounding, tying signals directly to basic needs such
as finding food. Nevertheless, these simple forms of
concrete grounding serve as an important starting
point for the study of meaning in communication.
In these simulations the agents typically have a
task relevant to a real species such as finding food,
finding a mate, or avoiding predators. Agents usually
have a small repertoire of signals, but these signals
are often initially unassociated with particular actions
or situations. Through various adaptive processes,
agents eventually come to associate each signal with
a specific action or situation. Food, alarm and recruitment calls, mate-finding and other agent-finding signals, as well as object description/naming/location
signals are utilized by agents to solve their various
tasks. A few of these studies employ learning, but
evolution is the most common adaptive process used.
Agents are represented by neural nets, production
systems, lookup tables, and a few less common kinds
of mechanisms.
There are several implications of these simulations. They show that grounded signals can evolve in
response to more realistic tasks, and they have assessed
how environmental parameters such as the distribution
of food sources or the density of predators influence the
evolution of communication. Kin selection and spatial
constraints are found to encourage the emergence of
altruistic (selfless) food and alarm calls, while population size affects how useful food and alarm calls really
are. Other findings are that signal cost ensures the
sender’s honesty, that continuous signals can evolve to
be interpreted as having discrete meanings, and that the
entrainment of signaling between two communicators
can be useful.
2.3.1 Featured Examples We consider two illustrative examples of the types of artificial, multi-agent
worlds that have been studied. These examples both
highlight the typical kinds of tasks that situated agents
face and demonstrate the power of controlled experimental design (which is somewhat rare among simulations of emergent communication). In the first of
Figure 5 Automata model summarizing the behavioral states of agents in Reggia et al.’s (2001) simulations. States
are indicated by labeled circles and transitions by oriented arcs. Built-in transition priorities are depicted at the lower left.
Noncommunicating agents ignore signals, that is, their behavioral state changes cannot follow “heard of…” links.
these, Reggia, Schulz, Wilkinson, and Uriagereka (2001)
simulated a two-dimensional world with food, predators, and agents to determine environmental conditions
under which food-call and alarm-call behaviors would
evolve in a population of initially noncommunicating
agents. Agents moved around their world, looking for
food and avoiding predators, tasks that might be
achieved nonoptimally in the absence of communication. If they found food, they could replenish their
food stores. Agents would die if they reached old age,
starved, or were caught by a predator. Each agent
could only see what was immediately in front of it,
and could move toward food or flee from predators if
aware of these things (which they may not be due to
the limited directionality and distance of their visual
information). Each agent’s genome specified the type
of agent it was: a noncommunicator that could neither
send nor hear calls, a food-caller that sent food calls
when it was near food and moved toward food calls
when it heard them, an alarm-caller that sent alarm
calls when predators were near (and moved away
from alarm calls it heard), or an agent that could use
both kinds of calls. In other words, this work assumes
a communication system with a prespecified form. All
noncommunicative behaviors were built into a finite
state machine that determined the agent’s movements,
whether or not it would eat food, flee predators, or
react to signals (Figure 5). An agent’s fitness was
based on its food stores. New living agents replaced
the dead and thereby maintained a constant population
size. Simulations typically began with 50–100 noncommunicating agents (no other types of agents at the
start) and were run for 100,000 iterations, after which
the proportions of each type of agent were measured.
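The four genetically specified agent types can be pictured as two independent on/off traits. The Python sketch below is only an illustration of that idea: the two-flag Genome encoding, the function names, and the string labels are our assumptions, and the priority-ordered finite state machine of Figure 5 is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Genome:
    """Hypothetical two-flag encoding of the four agent types."""
    uses_food_calls: bool
    uses_alarm_calls: bool


def calls_to_send(genome: Genome, food_nearby: bool, predator_nearby: bool) -> List[str]:
    """Calls an agent emits in the current situation."""
    calls = []
    if genome.uses_food_calls and food_nearby:
        calls.append("food")
    if genome.uses_alarm_calls and predator_nearby:
        calls.append("alarm")
    return calls


def response_to_call(genome: Genome, call: str) -> str:
    """Reaction to a heard call; sending and hearing share one flag,
    so a listener that never signals cannot be represented."""
    if call == "food" and genome.uses_food_calls:
        return "approach"
    if call == "alarm" and genome.uses_alarm_calls:
        return "flee"
    return "ignore"


noncommunicator = Genome(False, False)
alarm_caller = Genome(False, True)
print(calls_to_send(alarm_caller, food_nearby=True, predator_nearby=True))  # ['alarm']
print(response_to_call(noncommunicator, "alarm"))  # ignore
```

Tying the ability to hear a call to the ability to send it is what makes the form of the communication system prespecified: evolution can only change which of the four types predominates.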
In this study, evolutionary and ecological factors
were manipulated to explore their effects on the evolution of alarm calls and food calls among populations
that initially consisted of only noncommunicating
agents. Alarm calls evolved when population density
was high enough (so that enough hearers could benefit). Only a few predators needed to be present for
alarm calls to confer a benefit. Such altruistic signaling was able to evolve since any agent that could hear
alarm calls would also send them (i.e., no cheating
was possible, an important limitation of this study).
Alarm calls did have an implicit cost, since any agent
hearing an alarm call would flee and thus not be able
to feed. Accordingly, alarm calls did not evolve in
conditions where feeding was more important to producing offspring than surviving for a long time. Spatial constraints on mate selection had no effect on
the evolution of alarm calls.
Food calls evolved most often when food sites were
rich but few in number. This was because food was
harder to find but yielded a substantial fitness bonus if
found; thus signals leading to food would greatly accelerate locating rare but rich food sites. Food sites with little food did not encourage food calls because they
would be quickly depleted. Furthermore, spatial selection and the placement of offspring near parents tended
to favor food calls. This is because a new cluster of food
signalers near each other in a large population of noncommunicators could succeed, even if they made up a
small portion of the population. In contrast, with offspring dispersal, signalers would become too far apart
for their signals to reach other listeners. Without spatial selection, signalers might not reproduce with their
nearby kin (who would also be signalers), so some of
their offspring would be nonsignalers.
In a second example of situated, unstructured
communication, Wagner (2000) placed agents in a
similar two-dimensional cellular world, allowing
them to move around and look for food. There were
no predators. Several agents could occupy a cell
simultaneously, and a food item might be present in
some of the cells (based on a food abundance parameter). An agent could only acquire food when at least
one other agent was in the same cell. Agents could
only see other agents and any food in their current
cell, but they could hear signals from several cells
away. Agents used lookup tables that mapped their
inputs (food and agents seen, signal heard) to a specific action (do nothing, signal, wander, move toward
a signal). Sending a signal carried a fitness cost. Signal cost was necessary to achieve meaningful results
because senders and receivers had different interests
(cf. the handicap principle, Zahavi & Zahavi, 1997).
When signals had no cost, agents evolved to emit signals constantly since they could only benefit from
agents flocking toward them; as a consequence,
receivers tended to ignore signals because they carried
little information. Costly signals forced senders to
have the same interests as receivers. This simulation
was limited by the assumption of a direct cost to signaling as well as a narrowly defined task.
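A lookup-table agent of the kind described above can be sketched in a few lines of Python. The sketch assumes, for simplicity, that each input (food seen, agent seen, signal heard) is binary, giving eight table entries; the input encoding, the mutation operator, and all names are our assumptions rather than details taken from Wagner (2000).

```python
import itertools
import random

ACTIONS = ("do_nothing", "signal", "wander", "move_toward_signal")

# Assumed binary inputs: (food_seen, agent_seen, signal_heard).
SITUATIONS = list(itertools.product((False, True), repeat=3))


def random_table(rng):
    """A genome: one action for every possible input combination."""
    return {situation: rng.choice(ACTIONS) for situation in SITUATIONS}


def choose_action(table, food_seen, agent_seen, signal_heard):
    """Behavior is a pure table lookup; the agent has no internal state."""
    return table[(food_seen, agent_seen, signal_heard)]


def mutate(table, rng, rate=0.05):
    """Hypothetical mutation: re-draw each entry with probability `rate`."""
    return {s: (rng.choice(ACTIONS) if rng.random() < rate else a)
            for s, a in table.items()}


rng = random.Random(1)
parent = random_table(rng)
child = mutate(parent, rng)
print(choose_action(child, food_seen=True, agent_seen=False, signal_heard=False))
```

In such a representation, an evolved food call would correspond to a table that maps "food seen, no agent seen" to the signal action, while a responsive hearer maps "signal heard" to moving toward the signal.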
Population density, food abundance, and signal
cost were varied to determine ecological effects on the
evolution of food calls. Agents only evolved to send
food calls under conditions in which population density was not too high. Otherwise, it was easy to find
other agents by wandering around, and listening to
signals accrued no benefit to the hearer. In addition,
food abundance had to be high enough so that the signaler could benefit from continuously signaling while
waiting for another agent to follow its signal (otherwise, the signaler would be better off not signaling,
since signaling had a cost). These results complement
rather than contradict Reggia et al.’s results (above),
showing that food calls are useful when population
sizes are large enough to ensure agents are often
within range of signalers. The benefit of high food
abundance is much like the benefit from Reggia et
al.’s rich food sites. High population densities eliminated the need for signals since agents could easily
find each other by wandering around.
2.3.2 Survey of Other Related Work Many other simulations have shown the evolution of food calls
(Ackley & Littman, 1994; Saunders & Pollack, 1996;
Baray, 1997, 1998; Grim, Kokalis, Tafti, & Kilb,
1999, 2000) and alarm calls (Ackley & Littman, 1994;
Baray, 1997, 1998; Grim et al., 1999). Further, alarm
calls tend to be more costly than food calls (Grim et
al., 1999; Reggia et al., 2001), so predation pressure
must be severe enough to outweigh the costs to foraging before they will evolve. A prominent finding is
that spatially constrained mating and offspring placement (leading to kin selection) encourages the evolution of altruistic food and alarm calls (Ackley &
Littman, 1994; Grim et al., 1999). Simulations have
also shown the benefit of kin selection for food and
alarm calls by using homogeneous populations of
agents (Baray, 1997, 1998). As opposed to other kinds
of alarm calls (resulting in flee responses), these latter
agents evolved recruitment alarm calls, which caused
other agents to flock to the signaler and confuse the
predator (Baray, 1997). However, alarm calls were
less useful when the population increased beyond a
minimal size because agents would propagate the
alarm call, eventually causing all agents in the world
to respond to one agent in need (negating the specificity and usefulness of alarm calls). When population
sizes were in a middle range (about 20–75), food calls
were most useful (Baray, 1998). Overall, the combined results of simulations discussed in this section
suggest that signals (particularly food calls) are generally useful in medium-density populations, since too
few agents means that hearers are scarce and far away,
and too many agents negates the need to signal at all.
A variety of other types of “artificial worlds”
have been studied. For example, food and alarm calls
also emerged in a cellular automaton world, but noise
(small errors in action choice) was crucial to the stability of a signaling strategy (Grim et al., 1999, 2000).
Agents needed to find food before they gave a food
call, but they needed to “open their mouths” to find
food (which costs energy). An ideal strategy for an
agent is to wait for a food call before opening its
mouth (preventing it from accruing huge costs by
keeping its mouth open constantly). Without noise,
neither signaling nor mouth opening would be initiated by these “ideal” agents, so they would never
begin to eat or to signal (a sort of prisoner’s dilemma).
Other kinds of grounded signals have also been
learned or evolved by situated agents. Mate finding
is important to many animal species (frogs, birds,
and insects in particular). For example, in one study
females and males were set in a two-dimensional
world and had to try to find each other (Werner &
Dyer, 1991). Females began by simply signaling
their presence (only males could move). Eventually,
females evolved to signal directions that males followed to find them (effectively, “turn left”, “straight
ahead”, etc.). A later simulation with much greater
realism pitted agents against several kinds of predators in an attempt to evolve food calls, mating calls, or
predator-specific alarm calls (Werner & Dyer, 1994).
It is interesting to note that, in this simulation, signaling did not evolve since another noncommunicative
solution was evolved by the agents. Sometimes signals are not as useful as they might appear to be from
an analytical standpoint. A similar finding occurred
independently while attempting to evolve “intention
signals,” which are often used in displays of aggression to avoid a costly conflict (Noble, 1998). Intention
displays did not evolve: agents instead evolved a nonsignaling (but less efficient) strategy. A spatial version
of this work, using a different agent representation
and aggressive-interaction task, showed that reliable
signals would evolve, but only when the signals were
costly (and therefore honest) or if the signals were
partially reliable and the only means of gaining information about a potential opponent (de Bourcier &
Wheeler, 1995). Another spatial version of the evolution of intention signals demonstrated that agents trying to maintain a set distance away from each other
could evolve a “signaling” protocol using proximity
sensors and back-and-forth movements despite the
absence of a dedicated communication channel (Quinn,
2001).
Given that communication about objects is so
common among humans and found in a variety of
other species (e.g., vervets: Cheney & Seyfarth, 1990;
meerkats: Manser, 2001; prairie dogs: Slobodchikoff,
Kiriazis, Fischer, & Creef, 1991; dolphins: Sayigh,
Tyack, Wells, Scott, & Irvine, 1995), it is natural to
explore how it might emerge. One study has shown
that object descriptions and the proper approach to
those objects can evolve even when only the hearer
would benefit (Cangelosi & Parisi, 1998). Agents can
also learn to describe objects when trying to collect
them efficiently (Murciano & Millán, 1997). Object
names, locations, and orientations have also been
learned by agents that follow a teacher agent closely
(so that the learner’s position and orientation were similar to those of the teacher; Billard & Dautenhahn,
1999).
That discrete signals and meanings can emerge
from continuous-valued signals was shown in two
studies. In one simulation several agents in a small
arena evolved to emit a food call by using two continuous channels (Saunders & Pollack, 1996). Agents
evolved to emit oscillatory signals on one channel, but
when near food they would change the phase of the
oscillations. In another study two agents placed in a
small arena and trying to find each other could emit a
continuous-valued intensity on one channel (Di Paolo,
2000). Agents evolved to use cyclical intensity rhythms
to entrain on each other, essentially synchronizing
their signal oscillations as well as movements to find
each other quickly.
As shown in nonsituated simulations, population
flux can also lead to stable communication in a population of situated agents. Too much population flux
will prevent consensus from developing. Encouraging
agents to move toward those with similar signals can
cause dialects to form. Even more interesting is that,
when two groups come into contact, they can either
“bounce” off of each other or they can merge, merging
their lexicons as well (Oudeyer, 1999).
2.4 Situated, Structured Communication
Table 4 Studies involving situated, structured communication

Simulation                 Adaptive process (a)  Behavioral mechanism (b)  Type of communication/Task
Alterman and Garland 2001  L                     CBR                       Requests for help/replies
Cangelosi 1999             E+L                   FNN                       Object description
Cangelosi and Parisi 2001  E                     FNN                       Response to object/action commands
Moukas and Hayes 1996      L                     Robots and NNs            Food information

(a) E = evolution, L = learning
(b) CBR = case-based reasoner, FNN = feedforward neural net

The complexity of human language understandably
makes it difficult to simulate, and accordingly only a
few simulations involving situated agents using structured communication have been done (see Table 4). All
of them have dealt with built-in structure. For example,
two studies have explicitly focused on the emergence
of structured signals for facilitating simple, cooperative tasks (Cangelosi, 1999; Cangelosi & Parisi, 2001)
involving actions related to several objects. This work
has shown how signal structure might become related
to the agents’ ecology. Two other studies have shown
how a given communication system might emerge to
help coordinate a group of agents (Moukas & Hayes,
1996; Alterman & Garland, 2001). In all four of these
studies, structure is mostly or completely built into the
communication systems, so the emergence of communicative structure in the first place remains unstudied.
2.4.1 Featured Example We now consider a situated example where, in contrast to the above, evolution of structured communication was the issue. In
this study, Cangelosi (1999) simulated agents that had
to approach properly three edible and three poisonous
types of mushroom. Each of the three edible mushrooms differed in the proper approach toward it (a different way of eating each one), and all poisonous
mushrooms were to be avoided. Each mushroom type
had a pattern with some regularities that would indicate what type of mushroom the agent was looking at.
Each agent’s neural net (see Figure 6) could output an
action concerning a mushroom (avoid, or eat in one of
three ways) as well as produce a two-component output (two sets of competitive “linguistic” nodes, one
set of six units and one set of two units). Agents were
first evolved using a genetic algorithm to properly
approach the different mushroom types. Then the population of agents was trained using backpropagation to
name the mushroom types using their linguistic units
(names for the three poisonous and three edible
types). Evolution was used again to select for those
agents that were best at approaching mushrooms.
Figure 6 Neural networks used by agents in Cangelosi’s (1999) study of structured communication. Position units detect if there is a mushroom at one of three 40°
arcs in front of the agent. The first two action units code
continuous values for the agent’s movement (how much
forward, how much turning left/right). The third action unit
codes for which “approach” to take to a mushroom (three
ranges, one for each edible mushroom type). Linguistic
units are grouped into two competitive sets, one with six
units and one with two units. Initially, agents were
selected to output the appropriate actions given different
kinds of mushrooms in different positions (using a genetic
algorithm). Later, agents evolved and learned to use their
linguistic units to refer to the different mushroom types.
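The two-component linguistic output can be illustrated with a minimal sketch. It assumes a simple winner-take-all rule within each competitive group; the function names and the activation values are hypothetical and are not taken from Cangelosi's (1999) implementation. Only the group sizes (six and two) come from the description above.

```python
from typing import Sequence, Tuple

# Group sizes follow the description in the text: six units and two units.
GROUP_SIZES = (6, 2)


def winner(values: Sequence[float]) -> int:
    """Index of the most active unit in one competitive group."""
    return max(range(len(values)), key=lambda i: values[i])


def utterance(linguistic_activations: Sequence[float]) -> Tuple[int, int]:
    """Map eight linguistic-unit activations to a two-component utterance."""
    first_size, second_size = GROUP_SIZES
    if len(linguistic_activations) != first_size + second_size:
        raise ValueError("expected 8 linguistic-unit activations")
    first = linguistic_activations[:first_size]
    second = linguistic_activations[first_size:]
    # One winner per group; both components are produced simultaneously.
    return winner(first), winner(second)


# Hypothetical activations, chosen only for illustration.
example = [0.1, 0.9, 0.2, 0.0, 0.3, 0.4, 0.7, 0.2]
print(utterance(example))  # (1, 0)
```

With one winner per group, such an architecture can produce at most 6 × 2 = 12 distinct utterances, which is one reason the grouping itself biases the kind of structure that can emerge.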
Utterances were much like a multi-faceted signal with each component presented simultaneously
(e.g., like the hand motion and hand shape of a manual sign). Over many experimental repetitions, each
signal component often evolved to correspond to a
distinct action toward a specific mushroom type.
Furthermore, the two linguistic output sets were
often specialized for distinct types of information:
one output for object description (a “subject” or
“noun”) and the other output for the action toward the
object (a “predicate” or “verb”). In 7 of the 18 experiments with initially random agent populations, the
population evolved and learned to use its linguistic
units in a structured manner closely reflecting its environment. Nevertheless, genetic drift may be responsible for these results—the relative contributions of
evolution and learning are difficult to tease apart in
this set of experiments. In the populations that developed structured signals, the six-unit competitive linguistic units were used to name the specific type of
mushroom while the two-unit group was used to name
the general action associated with poisonous or edible
mushrooms (“avoid” or “approach”). The neural nets
were somewhat biased in favor of this result because
of the competitive unit groupings (six and two). However, the structure that the network found was in some
cases related to the task and not to explicit training for
signal–input correlations. It remains to be seen if
agents can build linguistic structure based—at least in
part—on the structure in the environment (as Batali,
1998 has begun to explore using abstract meanings).
2.4.2 Survey of other related work Another simulation also focused on the evolution of structured
communication, this time involving two objects (A
and B) and two actions (push and pull; Cangelosi &
Parisi, 2001). Agents evolved one set of units (“verbs”)
associated with the actions push and pull, and another
set of units (“nouns”) associated with the objects A
and B. Agents were better communicators when they
were first selected for nonlinguistic tasks where they
would see an object and always had to perform the
same action with that object (push A, pull B). Something about the nonlinguistic task appears to have
facilitated performance on the linguistic tasks,
although many of the architectural assumptions would
need to be examined to show the precise mechanisms
involved, and the result would need to be scaled up
to accommodate a more extensive communication
system.
Two other studies did not focus on structured signals but used them as part of the task. One, a robotic
study, showed that a complex visual language could
be learned and associated with actions (Moukas &
Hayes, 1996). Robots observed a teacher using a preprogrammed communication system whose components indicated a location, direction, and amount of
power (like a food source for bees). Using a competitive learning approach, agents were able to associate
each of three signal components with the three food
variables (distance, angle, amount). A very different
study of cooperation among several agents showed
that offline learning could be used to acquire specific
structured utterances that made a cooperative task
more efficient (Alterman & Garland, 2001).
3 Language Features
It is important to point out that, with respect to human
language, we are taking as a given that it evolved for
communicative purposes. This is by no means a universally accepted view. There are many researchers,
including one of us (JU), who believe that language
emerged as part of a repertoire of cognitive abilities
unrelated to communication (Chomsky, 1975). Many of
these researchers would place syntactic concerns much
more centrally to an investigation of language
(Uriagereka, 1998; Saddy & Uriagereka, in press). For
example, all languages can be viewed as falling into the
Chomsky hierarchy of languages (Chomsky, 1956)
based on their syntactic properties. Under this hierarchy
are four classes of increasingly complex languages:
regular (ordered strings), context-free (phrases: embedded/ordered sets of strings), context-sensitive (including
transformations: ordered sets of phrases), and recursively enumerable (all computable functions). Each
class contains the one before it and places fewer restrictions than the preceding class on the kinds of rules that can generate or recognize its languages. A communication
system for a regular language, for instance, would
require less complex machinery than a system for a
context-free language, which would require a more
flexible memory. There are many critical issues in the
development of syntax (many related to the Chomsky
hierarchy) that have not been addressed by any
multi-agent computational modeling to date, including phrase structures (e.g., parts of speech, connectives) and transformations (e.g., question formation:
“Which article did you read?”; Saddy & Uriagereka,
in press). Nevertheless, we are organizing this review
in accordance with a framework by Charles Hockett
both because these syntactic issues have generally not
been addressed by multi-agent models and because we
are addressing issues related to communication in
general rather than those specific to human language.
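The difference in required machinery between a regular and a context-free language can be made concrete with a minimal sketch. The two toy languages and the function names are illustrative assumptions and do not come from any of the reviewed studies: (ab)* is regular and can be recognized with a fixed number of states, whereas a^n b^n is context-free and needs a counter that can grow without bound (a minimal form of stack memory).

```python
def accepts_regular(s: str) -> bool:
    """Recognize (ab)* with a two-state finite automaton (no extra memory)."""
    state = 0  # 0: expecting 'a' (also the accepting state); 1: expecting 'b'
    for ch in s:
        if state == 0 and ch == "a":
            state = 1
        elif state == 1 and ch == "b":
            state = 0
        else:
            return False
    return state == 0


def accepts_context_free(s: str) -> bool:
    """Recognize a^n b^n (n >= 0) using an unbounded counter as a minimal stack."""
    depth = 0       # number of unmatched 'a' symbols seen so far
    seen_b = False  # once a 'b' appears, no further 'a' is allowed
    for ch in s:
        if ch == "a":
            if seen_b:
                return False
            depth += 1
        elif ch == "b":
            seen_b = True
            depth -= 1
            if depth < 0:
                return False
        else:
            return False
    return depth == 0
```

The first recognizer never stores more than its current state, however long the input is; the second must remember how many symbols are still unmatched, which is the kind of more flexible memory referred to above.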
The broad range of simulations described above
have been successful in showing that communication
can emerge via learning/evolution in multi-agent systems under a wide variety of interesting conditions.
However, a key question remains: To what extent do
these simulations shed light on the origins and evolution of language? We have chosen to answer this question within a well-known system of communication
features originally proposed by Hockett in the late
1950s (Hockett, 1959, 1960; Demers, 1988) to understand the origins of human speech. Hockett argued
that all communication systems fall within a multidimensional feature space (see Figure 7). Hockett’s
original list of 13 features (Hockett, 1960) has been
refined (Hockett & Altmann, 1968; Hockett, 1990) by
classifying the features into groups [frameworks in
Hockett and Altmann’s (1968) terminology] and treating features not as binary properties but as dimensions
along which any communication system can vary. We
further refine the term “feature” to indicate either a
dimension or a finite set of possible values. For example, the various possibilities for acquiring a communication system form an unordered set (e.g., various
learning and evolutionary processes) and cannot be
located along a continuum. Along more of a continuum is utterance structure. Human utterances are hierarchically structured and rule-like, whereas gibbon
calls consist of sequences of units that appear in a
somewhat rule-like order (Mitani & Marler, 1989;
Ujhelyi, 1996), and vervet monkey alarm calls appear
to be completely unstructured in the sense that each
call is not used as a component in any other utterance
(Cheney & Seyfarth, 1990). Within this continuum are
several species of monkeys that use elements of syntax in their calls (Robinson, 1994; Zuberbuhler, 2002),
and sac-winged bats (Davidson & Wilkinson, 2002)
and humpback whales (Cerchio, Jacobsen, & Norris,
2001; Darling & Berube, 2001) that produce songs
with recurring notes, although it is not clear if different orderings of these notes have any significance in
these songs.
Hockett’s features provide an objective “checklist” against which the computational work reviewed
above can be assessed for completeness and significance. Even a casual comparison to these features
indicates a number of limitations of the simulation
studies we have reviewed. For example, no past work,
to our knowledge, has substantially examined Hockett’s features of duration, referents, and displacement,
making these significant issues for further research.
Table 5 provides a summary of Hockett’s features that
have been explored by simulations. We group these
features into three frameworks: form (structural), ecological, and social. Structural features relate to the
form of the utterances themselves (e.g., are the signals
composed of smaller parts?), whereas ecological features
relate somehow to the signaler’s ecology (e.g., do signals relate to internal motivations or external events?)
and social features relate to the social environment of
the signaler (e.g., how are the signals acquired?). We
find that, with a few exceptions, most of the features
have received very limited attention.

Figure 7 Communication systems can be viewed as
points in a multi-dimensional space, where each dimension corresponds to one of Hockett and Altmann’s (1968)
features. This two-dimensional graph is only meant to
illustrate how a multi-dimensional feature space
might be filled by all known communication systems. In the figure, utterance structure acts mostly like an
ordinal scale, roughly following the Chomsky hierarchy of
languages (e.g., position indicates to some extent the relative structural complexity of a given system). Cultural
transmission is represented as the proportion of the communication system that is transmitted culturally (as opposed
to genetically).
3.1 Form and Structural Features
3.1.1 Realization The realization of utterances refers
to how they are perceived in relation to how they are
realized. Utterances or their components can be perceived as continuous values along some dimension
(such as volume or pitch), or they can be discrete,
meaning that they are perceived as units, rather than
as the continuous signals that they are at the physical
level. Thus, a letter p in a word spoken by a loud baritone or a quiet child will still be perceived by an English speaker as a discrete phoneme /p/. This is known
as categorical perception. Alternatively, it is possible
that a communication system could relate the continuous value of a signal to its meaning or response; this
may be the case with some alarm cries, whose intensity may signal the degree of alarm (a continuous,
rather than a discrete, relationship).
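The contrast between discrete and continuous realization can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a model from any reviewed study: the nearest-prototype rule, the single acoustic cue, and the prototype values are all hypothetical.

```python
from typing import Mapping


def categorize(cue: float, prototypes: Mapping[str, float]) -> str:
    """Discrete realization: return the label of the prototype nearest the cue."""
    return min(prototypes, key=lambda label: abs(prototypes[label] - cue))


def graded_response(intensity: float, max_intensity: float = 1.0) -> float:
    """Continuous realization: response strength scales with signal intensity."""
    return max(0.0, min(1.0, intensity / max_intensity))


# Hypothetical prototype values on a single acoustic cue (illustrative only).
PROTOTYPES = {"b": 0.0, "p": 60.0}

# Different physical values fall into the same perceived category.
print(categorize(45.0, PROTOTYPES), categorize(70.0, PROTOTYPES))  # p p
# A graded signal keeps its magnitude instead of collapsing to a category.
print(graded_response(0.25))  # 0.25
```

In the discrete case the hearer discards within-category variation, as with the loud baritone and the quiet child above; in the continuous case the variation itself carries the message.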
Among computational models, only a few studies
have tackled the problem of discrete perception of
continuous signals (Saunders & Pollack, 1996; Moukas & Hayes, 1996; Di Paolo, 2000), although some
work has addressed continuous inputs and discrete
behavior (Ryan et al., 2001). Since human utterances
have hierarchical, discrete structures (morphemes/syllables composing words, words composing phrases)
(Jannedy, Poletto, & Weldon, 1994), the problem is
even more complex and this issue remains mostly
untouched by simulations (but see de Boer & Vogt,
1999; Dircks & Stoness, 1999; Berrah & Laboissière,
1999; Steels & Oudeyer, 2000).

Table 5 Features explored by multi-agent simulations

Category                  Feature              Featural aspect         Relevant work
Form                      Realization          Continuous→discrete     Saunders & Pollack 1996, Moukas & Hayes 1996, Steels & Oudeyer 2000, Ryan et al. 2001
                          Utterance structure  Rule-like               Batali 1998, Kirby 1998, 1999, 2001
                                               Sequential              Batali 1998, 1994, MacLennan & Burghardt 1993, Steels 1998a, Brighton 2002
                                               Instantaneous/parallel  Cangelosi 1999
                          Repertoire           Open, learned           Steels 1998; 1998a, Kirby 1999
Ecological relationships  Groundedness         Food calls              Ackley & Littman 1994, Reggia et al. 2001, Baray 1997, 1998, Wagner 2000, Grim et al. 2000, Saunders & Pollack 1996
                                               Alarm calls             Ackley & Littman 1994, Reggia et al. 2001, Baray 1997, 1998, Grim et al. 1999
                                               Mating                  Werner & Dyer 1991, Werner & Todd 1997
                                               Navigation              Moukas & Hayes 1996, Billard & Dautenhahn 1999
                                               Object discrimination   Cangelosi & Parisi 1998, Cangelosi 1999, Murciano & Millán 1997, Steels 1998
                                               Group coordination      Grim et al. 2000, Murciano & Millán 1997, Baray 1997, Alterman & Garland 2001
                          Signal elicitation   Internal, goal-driven   Alterman & Garland 2001
                                               Internal, aggression    Noble 1998, de Bourcier & Wheeler 1995
                                               External                Most situated simulations
Social relationships      Scope                Private                 Most nonsituated simulations
                                               Public                  Most situated simulations
                          Variation            Mating                  Werner & Dyer 1991, Werner & Todd 1997
                                               Spatial                 Livingstone & Fyfe 1999, Kirby 1998
                                               Refinement              Alterman & Garland 2001, Steels 1998
                                               Parsability             Kirby & Hurford 1997, Kirby 1998
                                               Transmission error      Hare & Elman 1995, Kaplan 2000
                          Acquisition          Genetic                 Werner & Dyer 1991
                                               Teaching                Hutchins & Hazlehurst 1995
                                               Imitation/observation   Kirby & Hurford 1997, Kirby 1998
3.1.2 Utterance Structure Utterances may have no
internal structure (as with most alarm and food calls),
they may be composed of several units (as with mockingbird songs), or they may even have rule-like or
hierarchical structures (as with language). Human language, as well as several other known animal communication systems [e.g., gibbons (Mitani & Marler,
1989), songbirds (Catchpole & Slater, 1995)], consists
of utterances that exhibit a compositional or rule-based utterance structure: Utterances are built out of
smaller units that are ordered according to rule-like
constraints. The origin of structured utterances is one
of the biggest mysteries in the evolution of language.
Constraints on adaptation and on creating a mapping
from signals to meanings have been explored in mathematical modeling (Nowak et al., 1999; Nowak, Plotkin, & Jansen, 2000), but the ecological pressures involved
have mostly been explored through multi-agent simulations. Nevertheless, the ecological motivation for
structured utterances has only begun to be explored
computationally. MacLennan and Burghardt (1993)
set up a situation in which there were more “conversational topics” than signals. Thus, agents had to combine signals into longer utterances to communicate
about every situation in their world. Hockett had suggested this as a possible motivation for the development of sequential signals (Hockett, 1960). Cangelosi’s
mushroom identification task was also structured by
requiring different approaches to different mushroom
types (Cangelosi, 1999). However, only one structure
was available and the range of possibilities was limited.
It remains an open question as to whether a sequential
signaling system could then lead to syntactic rules and
thematic roles for utterance components.
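The pressure toward sequential signals when topics outnumber signals follows from a simple counting argument: with k distinct signals, there are k^L different utterances of length L, so covering n topics requires the smallest L with k^L >= n. The sketch below works this out; the function names and the example numbers are hypothetical and are not the values used by MacLennan and Burghardt (1993), whose agents evolved their mapping rather than having it assigned.

```python
from itertools import product
from typing import Dict, Sequence, Tuple


def min_utterance_length(num_signals: int, num_topics: int) -> int:
    """Smallest fixed utterance length L such that num_signals ** L >= num_topics."""
    if num_signals < 2 or num_topics < 1:
        raise ValueError("need at least two signals and one topic")
    length, capacity = 1, num_signals
    while capacity < num_topics:
        length += 1
        capacity *= num_signals
    return length


def assign_utterances(
    signals: Sequence[str], topics: Sequence[str]
) -> Dict[str, Tuple[str, ...]]:
    """Give every topic a distinct signal sequence of the minimal length."""
    length = min_utterance_length(len(signals), len(topics))
    # zip stops once every topic has received one sequence.
    return dict(zip(topics, product(signals, repeat=length)))


# Hypothetical numbers: 4 signals cannot name 10 topics one-to-one,
# but pairs of signals (4 ** 2 = 16 combinations) can.
print(min_utterance_length(4, 10))  # 2
```

The argument only shows that longer utterances become necessary; it says nothing about whether the resulting sequences will be ordered by rules, which is the open question raised above.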
Batali’s simulations demonstrated that recurrent
neural networks can support the emergence of a structured, sequential communication system (Batali, 1998). It is
possible that rule-like utterances could emerge from
the rule-like nature of conversational topics. However, this kind of structure is more complex than the
sequential utterances created by Batali’s agents. The
story may involve not only the structure in the environment, but key nonlinguistic cognitive constraints
(e.g., memory limitations, attention span, poverty of
input) and production and comprehension constraints
(e.g., Hare & Elman, 1995; Kirby & Hurford, 1997;
Kirby, 1999; Kaplan, 2000; Brighton, 2002).
For example, Batali’s agents were given “meanings” composed of a predicate and a referent (although
meanings were not grounded in the agent’s actions).
The agents learned a communication system that often
divided utterance components into a predicate part and a
referent part. This is just the beginning; other external
structures might be used by agents when structuring
their communications, such as the relationships between
objects and the structure of common events.
Other simulations have shown that if agents needed
to communicate about embedded propositional meanings, a kind of grammar could arise to match this
embedded structure (Kirby, 1999, 2001; Brighton,
2002). Also, the natural sequential naming of individual object features can serve as a starting point for
compositional utterances (Steels, 1998a). Still others
have indicated how learning and social interactions
might play a role in the emergence of structured utterances (Batali, 1998; Steels, 1998b). Presumably other
processes—especially exaptation9—endowed hominids with the ability to process sequences of input.
Sequential processing of inputs might have arisen
because of demands from noncommunicative tasks
such as tool usage (Savage-Rumbaugh & Lewin,
1994) or attending to complex social events (e.g., as
with vervet monkeys, Cheney & Seyfarth, 1990). Collectively these simulations indicate that the emergence
of compositional or rule-based utterances may require
the existence of some kind of working memory (a
phonological loop10 or the equivalent). However, at
least some of the structure of utterances might be
acquired through learning and without a mechanism
specialized for that structure [as in Batali’s work or
Moukas and Hayes’ (1996) work].
3.1.3 Repertoire The repertoire of most communication systems is fixed or closed, but human language
is mainly open. That is, most systems do not allow
signalers to add new components or utterances to the
system, but humans, mockingbirds, and possibly other
species are able to add new components to their signal
repertoires. This is not a claim that the systems are
unbounded in size, but merely that new items can be
added to the repertoire during the organism’s lifetime.
Human language is open through two processes: the
construction of new sentences from existing words
and phrases (open utterance repertoire), and the invention of new words (open lexicon).11
Only very limited work has used agents with an
open utterance repertoire and the potential for a truly
open lexicon (Steels, 1998b), and in it the utterance
structure was fixed, effectively using <property,value>
pairs that correspond to the properties of the objects
being described. The mechanism used, only briefly
described, is mostly symbolic, something like a production system. Another study also had an open lexicon and open utterances (Kirby, 1999), augmenting a
simple grammar based on the structure of the meanings to be expressed.
3.2 Ecological Relationships
3.2.1 Groundedness In natural communication, signals exhibit what is referred to as groundedness: utterances relate to states and events in the world that are
relevant to the sender and receiver. Grounding has
been relatively well studied compared to other features (see sections above on situated simulations).
Simulations have repeatedly shown that food finding,
mate finding, and predator avoidance all seem able to
give rise to simple (i.e., unstructured) communication
systems. For example, food calls are given most often
when food is difficult to find but highly rewarding
when it is found, and alarm calls can be costly in some
cases due to the lost foraging opportunities resulting
from fleeing (Wagner, 2000; Reggia et al., 2001).
Most of the usefulness of the signal lies in its being
emitted and in its being distinct from other signals. As
such, these kinds of pressures may not be the best
foundations for a theory of language evolution. Object
discrimination is a more demanding task requiring
more complex signaling (Steels, 1998b; Cangelosi,
1999), but simulations showing this have not been
truly grounded (agents had no actions other than communication), or involve rather artificial situations (one
agent describing a mushroom to another). Future work
on groundedness needs to place agents into more
interesting worlds and set them to performing descriptive tasks under more natural circumstances.
3.2.2 Signal Elicitation Related to groundedness is
signal elicitation, that is, what it is that causes the
elicitation of signals. Signals can be internally or
externally elicited. External elicitation of signals has
been studied by virtually every situated simulation.
The presence of food, predators, and other agents can
cause agents to communicate about them. In addition,
goal-driven signals have been employed to a small
extent (Alterman & Garland, 2001), and motivations
like aggression and mating have also been explored
(de Bourcier & Wheeler, 1995; Noble, 1998, 1999b).
Since human linguistic interactions might relate to
motivations (hunger, sex, pain) and goals (finding a
mate, hunting prey, escaping a trap, playing games),
much more study needs to be made of these internal
motivations to understand the evolution of human language. Deception is also important, as it implies theories of mind as well as internal goals and goal-driven
behavior. Much more needs to be studied in this vein,
as only a start has been made (Krakauer & Johnstone,
1995; Noble, 1998).
3.3 Social Relationships
3.3.1 Scope Speakers may broadcast their message
publicly for many to hear (e.g., sparrow food call,
Ficken, 1989), or they may direct the message to a
few individuals in private (e.g., bowerbird mating
dance). Scope specifies the kind of audience to which
a speaker directs an utterance. For human languages,
this can be private or public or both. Public messages
require the receiver to filter other messages out, since
many senders can simultaneously broadcast in the
same area (the cocktail party phenomenon; Sagi et al.,
2001).
Most nonsituated simulations have used private
scope, as they typically involve the pairwise interaction of encoder/decoder agents. Most situated simulations have used public scope since the agents are
trying to solve tasks in which signals are used to find
something (food: Ackley & Littman, 1994; Wagner,
2000; Grim et al., 2000; Reggia et al., 2001; a mate:
Werner & Dyer, 1991) or avoid something (a predator: Ackley & Littman, 1994; Baray, 1997, 1998;
Grim et al., 1999; Reggia et al., 2001). Even so, no
simulation work has explicitly focused on the problems of scope, particularly publicly broadcast signals.
Although some studies handle multiple, simultaneous
signals by letting agents select which one(s) they will
respond to and which they will ignore (e.g., Baray,
1999; Reggia et al., 2001), a systematic study of how
this should be accomplished remains to be done. Others have shown how agents might ignore their own
signals and pay attention to others through the use of
rhythmic entrainment and cyclic movement (Di Paolo,
2000). Future work should address the mechanisms
required to deal with public utterances, as well as the
specific uses to which private and public communication are put.
3.3.2 Variation A communication system may exhibit
a degree of variation from group to group or over time.
Variation refers to how the existing system may be
modified or acquire new parts. Variation can appear in
form, form–meaning associations, responses to utterances, mode of transmission, or other features. It may
potentially be due to either genetic or cultural factors,
and it can result from natural population dynamics or
from external pressures for change.
Many aspects of variation have been studied via
simulations. As described in Section 2.3, Werner and
Dyer (1991) described historical changes in the mate-finding system their agents evolved. Their work suggests an outline for how human language could have
evolved in a series of stages, from unstructured signals to sequential signals and eventually to our modern hierarchical structures. Several causes of variation
have also been explored, most prominently the spatial
constraints on communications. Spatial constraints
on partners learning to communicate can create local
dialects, each one slightly different from the others
nearby (Kirby, 1998; Livingstone & Fyfe, 1999a). In
addition, movement of agents within a spatial environment can reduce global stability of a language, but
clusters of dialects can form and even merge when
groups come into contact (Oudeyer, 1999). These kinds
of geographical and temporal variation are similar in
some ways to the variation exhibited by real neighboring language groups (e.g., Labov, 1972; Jannedy et al.,
1994). Refinement of a system (making it more accurate or efficient) has been found to cause meanings to
change or even new words to be coined (Steels, 1998b;
Steels & Kaplan, 1999). Parsability and other cognitive constraints may also play a role (Kirby & Hurford,
1997; Kirby, 1998). Population flux is not necessary for
large amounts of change to occur (Dircks & Stoness,
1999). Finally, simulations have shown how transmission and reception errors between speakers could influence changes in a communication system over
generations (Hare & Elman, 1995; Kaplan, 2000), in
addition to the accumulation of error through statistical sampling of the linguistic environment (Dircks &
Stoness, 1999).
3.3.3 Acquisition The acquisition via evolution or
learning of a communication system can depend on its
complexity, the cognitive abilities of the species in
question, and other factors. Both phylogenetic (i.e.,
occurring over generations) and ontogenetic (i.e.,
occurring within the organism’s lifetime) acquisition
are possible. The form (phonological and morphological) and pragmatics (proper use) of all human languages are acquired partially by cultural transmission.
Cultural transmission usually implies that some kind
of observational learning occurs. Its presence can
allow for transmission of traits that are not necessarily
the most fit from a biological standpoint (e.g., Neff,
2000). Cultural transmission plays a role in the communication systems of many nonhuman species such
as vervet monkeys (Seyfarth & Cheney, 1997), Belding’s ground squirrels (Mateo, 1996), bottlenose dolphins (Sayigh et al., 1995), and songbirds (Marler,
1991; Catchpole & Slater, 1995; Marler, 1997; Nelson, Khanna, & Marler, 2001). Which parts of human
language are developmentally canalized and which
are learned is an unresolved issue (Pinker & Bloom,
1990; Crain, 1991; Elman et al., 1996).
Most of the studies reviewed in this article
involve the acquisition of a communication system,
including the demonstration that an increasingly complex communication system can be acquired genetically by a population that had no such system to begin
with (Werner & Dyer, 1991). On the other hand, a system could be entirely learned through explicit teaching (although there would need to be some “innate”
ability to communicate; Hutchins & Hazlehurst, 1995;
Moukas & Hayes, 1996; Billard & Dautenhahn, 1999).
More relevant perhaps to human language and a few
animal systems (e.g., Belding’s ground squirrels,
Mateo, 1996) are those simulations showing that acquisition can involve a genetically endowed system that is
modified based on feedback from the world or other
communicators (MacLennan & Burghardt, 1993; Batali,
1994). However, only one of these studies (Brighton,
2002) has begun to address the fundamental problem
of the poverty of the stimulus, the claim that children
do not get enough information in their linguistic environment to learn a language. This claim is central
to theories of human language acquisition. The poverty of the stimulus argument states that if children
indeed fail to receive enough information to learn how
to speak their language(s), then they must have some
kind of specialized language-learning mechanism or
even some innate knowledge of language. The implications of this claim and even the validity of the poverty of the stimulus argument are hotly debated (Chomsky, 1975;
Elman et al., 1996; Pullum & Scholz, 2002).
Many studies have revealed that population size,
social structure, and linguistic constraints have important effects on the dynamics of acquiring a communication system through learning. Population size affects
learning populations and evolved populations in opposite ways. Whereas consensus is easier to attain as
evolved populations increase in size (due to greater
genetic variation; Wagner & Reggia, 2002), attaining
consensus becomes more difficult for learning populations as they increase in size (Levin, 1995; Hutchins
& Hazlehurst, 1995; Oliphant, 1999). Not only size,
but social structure—the social networks within a
population—can affect the transmission of a communication system. Tribal and other social structures can
affect how broad the transmission of linguistic features will be (Steele, 1994), even if their contribution
to fitness is zero or negative. Linguistic constraints, as
opposed to ecological fitness, may affect the acquisition of certain features of a language (Kirby, 1998;
Berrah & Laboissière, 1999). These constraints have
been proposed to account for the acquisition of various grammatical features that may not have obvious
fitness benefits (Kirby & Hurford, 1997; Kirby,
1998).
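The effect of population size on consensus among learners can be illustrated with a deliberately minimal imitation game. The sketch below is not the model used in any of the studies cited above; it assumes a single meaning, a fixed set of candidate signals, and a hearer that simply adopts the speaker's signal on every interaction. The names rounds_to_consensus and mean_rounds, and all parameter values, are hypothetical choices made for illustration.

```python
import random


def rounds_to_consensus(n_agents, n_signals=5, max_rounds=200_000, seed=0):
    """Count pairwise interactions until every agent uses the same signal.

    Assumption: one meaning only, and the hearer always copies the speaker
    (a crude stand-in for observational learning).
    Returns None if no consensus is reached within max_rounds interactions.
    """
    rng = random.Random(seed)
    # Each agent starts with a randomly chosen signal for the single meaning.
    signals = [rng.randrange(n_signals) for _ in range(n_agents)]
    for t in range(max_rounds + 1):
        if len(set(signals)) == 1:
            return t  # t interactions have taken place so far
        speaker, hearer = rng.sample(range(n_agents), 2)
        signals[hearer] = signals[speaker]
    return None


def mean_rounds(n_agents, trials=20):
    """Average the consensus time over several random seeds."""
    results = [rounds_to_consensus(n_agents, seed=s) for s in range(trials)]
    return sum(results) / len(results)


if __name__ == "__main__":
    for n in (5, 10, 20, 40):
        print(n, mean_rounds(n))
```

Running the script prints the mean number of interactions needed for several population sizes, which can be compared against the trend reported for learning populations. Because the sketch has no evolutionary component, it says nothing about the opposite trend reported for evolved populations.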
4
Discussion
As demonstrated by the studies reviewed above, very
substantial progress has been made during recent
years in developing computational models of emergent communication in multi-agent settings. The most
fundamental result of this work has been the convincing demonstration that shared communication systems
can readily appear among initially noncommunicating
agents in a very wide range of contexts. This has been
shown to be true for both structured and unstructured
communication, when agents are situated versus when
they are not, and when adaptation is brought about via
learning, evolution, or both. The ease with which simulations have repeatedly led to simple shared communication systems suggests that the common occurrence of
such systems in natural/biological settings is not surprising.
Each of the four general categories of simulation
work has revealed different things about communication. Nonsituated simulations have the advantage of
clearly illustrating general principles of communication systems (dynamics, effectiveness of various transmission techniques) whereas situated simulations are
the only ones that can explore how utterances come to
have meanings. Nonsituated simulations tend to use
learning whereas situated ones have tended to use evolutionary adaptation; perhaps this trend should be
reversed in the future, and more work should be done
with simulations combining both evolutionary and
learning mechanisms. Because of their relative simplicity, unstructured simulations have been able to
reveal how communication can emerge from initially
silent agents as well as what kinds of ecological pressures might bring forth signals in the first place.
Agents have tended to be simpler due to the complexity of their noncommunicative behaviors or due to
the complexity of the experimental setup. On the
other hand, structured simulations have shown how
agents might come to use utterances with structure;
these simulations have rarely been situated, so the
ecological motivations are all but nonexistent. Some
of the mechanisms used in these simulations (e.g.,
recurrent neural nets) are reasonable candidates to
explore in situated simulations to attempt to ground signals. There seems to be a preponderance of encoder/
decoder research; although this research has clearly
produced important insights into the emergence of
communication systems, future simulations should
probably focus on deeper representations of meaning
and grounded signals. Work is also evidently lacking
in situated, structured simulations. This is likely to be
a very fruitful area to explore in the future, although it
is also the most difficult.
These simulations have also provided insight
about a number of factors that influence the likelihood that a communication system will emerge, or
its nature when it does. Introducing spatial relationships between agents with restricted communication
ranges has repeatedly been shown to affect the learning or evolutionary process. For example, spatial
restrictions can influence the likelihood that communication will develop and, when it does, encourage
variability and the appearance of local dialects. In situated simulations where agents interact with an environment in a causal fashion, many other factors have
been shown to affect communication, including
agent density, food distribution, predator density,
signal honesty, and sexual selection. Such results are
directly relevant to many issues in the evolution of
animal communication and may ultimately guide
interpretation of the rapidly expanding experimental
data in this area (Hauser, 1996; Bradbury & Vehrencamp, 1998). Furthermore, software agents and robotic
systems may benefit from a better understanding of
factors that encourage a shared communication system. It is difficult to design by hand a communication
system or set of interaction protocols for a large group
of agents. Instead, simulated or robotic agents could
be allowed to evolve and/or learn how to communicate
(using the techniques from situated simulations) to
increase their efficiency at performing their task. Spatial
restrictions, agent density, and individual task assignment could be tailored to aid the agents, and the acquisition technique (evolutionary/learning algorithms) could
be chosen to match the task: an “observational” learning
algorithm could be used for small populations of
homogeneous agents, while some kind of evolutionary algorithm might be more effective with large populations of agents with specialized tasks.
Although these results are encouraging for communication in general, less progress has been made in
the quest to gain insight into the origins and evolution
of the more complex linguistic features such as thematic roles, parts of speech, connectives, and transformations. On the positive side, many of the simulations
reviewed in this article contribute to our understanding
of specific features of communication that are widely
recognized to be important in language (groundedness,
variation, etc.). Some simulations, mostly nonsituated
ones involving supervised learning, have gone so far
as to demonstrate the appearance of structured communication, showing how sequential and rule-like utterances can arise, how their structure may be related to
the agents’ ecology or isomorphic to task structures,
and how they depend on agent-to-agent interactions.
Nonetheless, substantial gaps remain. For example, the
origins of the open repertoire of human language have
not been adequately explored, and the ecological
validity of structured communication for situated
agents has not been established. Such gaps are to be
expected since the field is still quite young. As it
matures, one hope is that future work will attempt to
tackle existing hypotheses for the origins of communication/language and thoroughly test them. Currently,
most researchers do not explicitly test existing biological, cognitive, or anthropological hypotheses for
the origins of a communication system. Only a few
computational works (Enquist & Arak, 1994; Bullock
& Cliff, 1997; Noble, 1998, 1999b) take seriously several hypotheses on the origins of communication: the
handicap hypothesis (Zahavi & Zahavi, 1997) and
related hypotheses, although one simulation has begun
to explore perceptual biases as one possible origin of
mating calls (Ryan et al., 2001). Some related work in
robotics has looked at cricket calls and female song
preferences (Lund, Webb, & Hallam, 1998; Webb &
Hallam, 1996), taking mechanism and situatedness
(especially embodiment) very seriously. Although
computational modelers have only begun to enter this
area, there is a large literature on mathematical
approaches to biology, including a significant body of
game-theoretic work that covers many issues directly
and indirectly relevant to communicative hypotheses
(e.g., Newman & Caraco, 1989; Caraco & Brown,
1986; Mesterton-Gibbons & Dugatkin, 1999). Unfortunately, coverage of this literature is beyond the scope
of this article.
It is curious to note that most simulations have
demonstrated that agents always succeed in developing a working communication system (except for
Levin, 1995; Noble, 1998; Grim et al., 1999; Wagner, 2000; Reggia et al., 2001). There is a clear need
for careful studies of when communication will not
emerge. This leads into a second criticism of work in
the field: Most of the work that we have reviewed has
suffered from a lack of experimental controls (but see
Levin, 1995; Noble, 1998; Baray, 1998; Grim et al.,
1999; Wagner, 2000; Reggia et al., 2001). The use of
controlled experiments would allow the discovery of
specific factors responsible for the emergence of some
communication system. For example, Wagner used
agent and food density to demonstrate conditions
under which communication would not be any more
useful than remaining silent (Wagner, 2000).
Perhaps the greatest limitation of the work surveyed here with respect to language is that it has not
yet shed substantial light on the origins and evolution
of syntax. We have reviewed these simulations in the
light of Hockett and Altmann’s (1968) features, but
there is an entire field of literature based on elements
of Chomsky’s language hierarchy as well as other
theories that focus in more detail on language and
syntax (e.g., Langacker, 1987; Uriagereka, 1998).
The Chomsky hierarchy of formal languages incorporates levels of complexity involving sequential components, phrase structures and transformations (Saddy
& Uriagereka, in press). So far, multi-agent work has
revealed only sequential elements of syntax, with little
investigation into phrase structure (only Batali, 1998;
Kvasnicka & Pospichal, 1999; Kirby, 2001; Brighton,
2002 to a very limited degree) and no work (that we
know of) on transformations. Progress with respect to
language has been limited to dynamics (e.g., of language change) and simpler formal properties (e.g.,
lower regions of Chomsky’s hierarchy of languages,
dealing with ordering of components and very simple
phrase structures). Although progress has been made,
it is relatively small compared to what has to be done
to explore all aspects of language fully. Given that the
field of multi-agent simulations in the evolution of
communication is only about 10 years old and that the
majority of work has been done in the past 7 years, it
is not surprising that many explorations are still in
their infancy.
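The difference between the sequential level and the phrase-structure level can be made concrete with two toy string patterns. The sketch below is an illustration only and is not drawn from any of the simulations cited above. It assumes the conventional reading in which the sequential level corresponds to finite-state (regular) patterns, here (ab)*, and the phrase-structure level to nested patterns such as a^n b^n, which no finite-state device can recognize for unbounded n. The function names is_sequential and is_nested are hypothetical.

```python
import re


def is_sequential(s):
    """Regular (finite-state) pattern: zero or more repetitions of 'ab'."""
    return re.fullmatch(r"(ab)*", s) is not None


def is_nested(s):
    """Context-free pattern a^n b^n: n 'a's followed by exactly n 'b's.

    Recognizing this for arbitrary n requires counting, which is beyond
    any finite-state recognizer.
    """
    n = len(s) // 2
    return len(s) % 2 == 0 and s == "a" * n + "b" * n


if __name__ == "__main__":
    for s in ("abab", "aabb", "aab"):
        print(s, is_sequential(s), is_nested(s))
```

An agent whose utterances fit only the first kind of pattern shows ordering of components; producing or parsing the second kind requires keeping track of dependencies between elements that are not adjacent.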
Theories concerning the origins of language have
often differed in their viewpoint on syntax. For the
functionalist tradition (e.g., Haiman, 1985), syntax is
viewed as a side effect of functional demands on
effective communication. This approach has met with
much skepticism from syntacticians, since it does little to account for actual conditions found by research.
On the other hand, until recently no research within
the generative tradition was devoted to language origins, as the question was deemed too obscure to pursue. That changed in the last decade, when two schools
of thought emerged among generative grammarians.
First, Bickerton (1990), Pinker and Bloom (1990), and
Newmeyer (1991, 1992) tried to argue for different
aspects of a neo-Darwinian approach to the evolution
of syntax. Second, Chomsky (1980), Piatelli-Palmarini
(1989), Lightfoot (1991), and Gould (1991) voiced a
new kind of skepticism, based on punctuated equilibrium theories of evolution, showing that linguistic
structure is not obviously adaptive [Christiansen (1994)
summarizes the two positions]. The last few years have
seen two new developments. Some researchers have
argued that language is complex enough to demand a
sophisticated explanation based on both kinds of theories (e.g., Kirby, 1996; Carstairs-McCarthy, 1999). In
turn, developments in theories of “complexity” have
resulted in both interdisciplinary teamwork and new
models for the emergence of structure (Knight, Studdert-Kennedy, & Hurford, 1998). The reaction from syntacticians, however, remains skeptical (e.g., Uriagereka,
1998), primarily because the research in question still has
little to say about the hallmarks of syntax, among these
the locality and economy character of derivations and the
recursive properties of syntax (but see Kirby, 1999). A
combination of the multi-agent work reviewed in this
article and mathematical modeling (Nowak, Komarova,
& Niyogi, 2002) may eventually shed light on this difficult problem. Nevertheless, much more computational
modeling work will be needed to address properly the
many issues surrounding the evolution of syntax.
In more general terms, the multi-agent models we
have reviewed here leave several areas largely unexamined, suggesting some important directions for future
research. Even considering just Hockett and Altmann’s feature set, it becomes evident that communication features such as utterance duration, arbitrary
versus iconic referents, and displacement have basically
been untouched by simulation work. Other features,
such as discrete realization of continuous signals, open
repertoires, internal signal elicitation, and scope have
only begun to be studied with computational models.
Future simulation work examining these features is
likely to be fruitful, as is study of combinations of these
features (e.g., studying repertoire and utterance structure interactions may reveal how open repertoires and
hierarchical utterances interact).
With respect to language, as noted above the most
critical issue needing further study is how syntactic
processing can evolve. Although some mathematical
modeling work has explored the general sorts of pressures and initial conditions required for signals to
become structured (Nowak et al., 1999, 2000), this type
of investigation cannot reveal why hominids in particular developed the communication system that they did,
nor can it reveal how individual-level dynamics will
affect the outcome. It seems probable that more realistic
and complex neural network models may be able to
investigate this. However, such research will be limited by the complexities involved in evolving neural
nets, and by the large computational costs involved in
combining evolutionary computation and neural network learning methods. Past research on emergent language has largely emphasized nonsituated agents and
has primarily used supervised learning to examine cultural transmission. Although many researchers believe
that this approach is justified, many others
believe that it is not biologically
plausible in many language-learning situations (e.g.,
Elman et al., 1996). Future work that focuses on structured communication and syntax might benefit from
focusing more on situated agents (e.g., so groundedness could be examined in this context) and by more
emphasis on self-organizing communication systems
based on unsupervised and reinforcement learning
[see Dickins (2001) for a discussion of the kinds of
learning processes that are likely to have played an
important role].
There has been growth in each of the four subdivisions (on the situated/structured axes) of this field.
Little has yet been done with situated/structured simulations as noted earlier. Much of the work up to 1997
focused on more general aspects of the emergence of
communication, asking questions about what mechanisms could make it happen (proof of concept) and
what ecological pressures might bring it about. Later
work has asked more detailed questions about mechanisms, dynamics, the relative contributions of learning
and evolutionary processes, and the structure of signals.
A few of the most recent simulations have employed
controlled experiments, and the hope is that this is the
primary improvement to occur in future work; controlled experimentation will bring this burgeoning
field into maturity.
Notes
1 We use the term emergence in the sense that it is often used in artificial life and other fields (Cottrell, 1977; Ronald, Sipper, & Capcarrere, 1999), that is, to mean the appearance of a new global property of a complex system that derives from the local interactions of its numerous parts. In our case, interacting agents form the principal “parts,” a multi-agent artificial world is the complex system, and a shared communication protocol arising via learning/evolution is the global property.
2 For example, we do not include work published in languages other than English, nor much work relevant to communication in social insects (ant pheromone trails, bee “dances,” etc.), nor work on designing rather than learning/evolving inter-agent communication protocols (e.g., KQML) from the field of distributed artificial intelligence.
3 We combine the concepts of embodiment and situatedness under the single heading, situated. Although any communication occurring in the context of a population might be viewed as “situated,” we do not adopt that view in this review.
4 Some have argued that supervised learning is not justified in learning language (e.g., Elman et al., 1996).
5 This is an example of indirect pressure for utterance length: Kaplan’s agents were directly evaluated on the basis of communicative accuracy, not on the basis of the length of their utterances. Direct pressure would have been selecting agents based on the length of their utterances.
6 We classify this study as nonsituated according to our criteria, stated earlier, that the agents here have no effect on the world they are in.
7 Pleiotropy refers to a gene that is responsible for several traits, and hitchhiking refers to two genes that are very close on the same chromosome so that a relatively unrelated trait may be carried forward during evolution because its gene is located physically close to another gene that conveys fitness.
8 But see Noble and Cliff (1996) for a close replication of MacLennan and Burghardt (1993) that did not show an advantage for evolution and learning over evolution alone due to a slightly different population structure that did not allow agents to predict each other as accurately as MacLennan and Burghardt’s agents could.
9 Exaptation is the process by which a trait emerges for one purpose and is later used by evolution to perform a different purpose. Archaeopteryx’s feathers are one possible example: initially, feathers may have been used for heat radiation and only later became useful for short gliding and finally for flight.
10 A component of the working memory model (Baddeley, 1992), used to store and manipulate about 2 s of speech input.
11 For a critical perspective on this view, see Fodor (1998) and Fodor and Lepore (1998).
Acknowledgments
We thank Michael Gasser and David Poeppel for useful discussions relating to this article, Reiner Schulz for preparation of a
figure, and three anonymous reviewers for helpful comments.
Dr. Wagner is supported by an NIH Post-doctoral Fellowship
(T32 DC00061) and by funding from the University of Maryland Institute for Advanced Computer Studies. Dr. Reggia is
supported by NINDS Award NS35460 and ONR Award
N000140210810. Dr. Wilkinson is supported by NSF Grant
DEB0077878. Dr. Uriagereka is supported by NSF Grant BCS9817569.
References
Ackley, D., & Littman, M. (1994). Altruism in the evolution of
communication. In R. Brooks & P. Maes (Eds.), Artificial
life IV: Proceedings of the Fourth International Workshop
on the Synthesis and Simulation of Living Systems (pp.
40–48). Cambridge, MA: MIT Press.
Aitchison, J. (1996). The seeds of speech: Language origin and
evolution. Cambridge, UK: Cambridge University Press.
Alterman, R., & Garland, A. (2001). Convention in joint activity. Cognitive Science, 25(4), 611–657.
Baddeley, A. D. (1992). Working memory. Science, 255, 556–559.
Baldwin, J. M. (1996). A new factor in evolution. In R. Belew
& M. Mitchell (Eds.), Adaptive individuals in evolving
populations (pp. 59–79). Reading, MA: Addison-Wesley.
Baray, C. (1997). Evolving cooperation via communication in
homogeneous multi-agent systems. In Proceedings of
Intelligent Information Systems (pp. 204–208). Los Alamitos, CA: IEEE Computer Society.
Baray, C. (1998). Effects of population size upon emergent
group behavior. Complexity International, 6. URL: http://life.csu.edu.au/complex/ci/vol6/baray/
Baray, C. (1999). Evolution of coordination in reactive multiagent systems. Unpublished doctoral dissertation, Indiana
University, Bloomington.
Batali, J. (1994). Innate biases and critical periods: Combining
evolution and learning in the acquisition of syntax. In R.
Brooks & P. Maes (Eds.), Artificial life IV: Proceedings of
the Fourth International Workshop on the Synthesis and
Simulation of Living Systems (pp. 160–171). Cambridge,
MA: MIT Press.
Batali, J. (1998). Computational simulations of the emergence of
grammar. In J. R. Hurford, M. Studdert-Kennedy, & C.
Knight (Eds.), Approaches to the evolution of language (pp.
405–426). Cambridge, UK: Cambridge University Press.
Benz, J. (1993). Food-elicited vocalizations in golden lion tamarins: Design features for representational communication. Animal Behaviour, 45, 443–455.
Berrah, A.-R., & Laboissière, R. (1999). Species: An evolutionary model for the emergence of phonetic structures in
an artificial society of speech agents. In D. Floreano, J.-D.
Nicoud, & F. Mondada (Eds.), Advances in artificial life:
The Fifth European Conference (ECAL ’99) (Vol. 1674,
pp. 674–678). Berlin: Springer.
Bickerton, D. (1990). Language and species. Chicago: University
of Chicago Press.
Bickerton, D. (1998). Catastrophic evolution: The case for a
single step from protolanguage to full human language. In
J. R. Hurford, M. Studdert-Kennedy, & C. Knight (Eds.),
Approaches to the evolution of language (pp. 341–358).
Cambridge, UK: Cambridge University Press.
Billard, A., & Dautenhahn, K. (1999). Experiments in learning
by imitation—grounding and use of communication in
robotic agents. Adaptive Behavior, 7(3/4), 415–438.
Boer, B. de, & Vogt, P. (1999). Emergence of speech sounds in
changing populations. In D. Floreano, J.-D. Nicoud, & F.
Mondada (Eds.), Advances in artificial life: The Fifth
European Conference (ECAL ’99) (Vol. 1674, pp. 664–
672). Berlin: Springer.
Bourcier, P. de, & Wheeler, M. (1995). Aggressive signaling meets
adaptive receiving: Further experiments in synthetic behavioural ecology. In F. Morán, A. Moreno, J. Merelo, & P.
Chacón (Eds.), Advances in artificial life: 3rd European Conference on Artificial Life (pp. 760–771). Berlin: Springer.
Bradbury, J. M., & Vehrencamp, S. L. (1998). Principles of
animal communication. Sunderland, MA: Sinauer.
Brighton, H. (2002). Compositional syntax from cultural transmission. Artificial Life, 8, 25–54.
Bullock, S. (1998). A continuous evolutionary simulation model
of the attainability of honest signalling equilibria. In C.
Adami, R. Belew, H. Kitano, & C. Taylor (Eds.), Artificial
life VI (pp. 339–348). Cambridge, MA: MIT Press.
Bullock, S., & Cliff, D. (1997). The role of ‘hidden preferences’ in the artificial co-evolution of symmetrical signals. Proceedings of the Royal Society of London, Series
B, 264, 505–511.
Caine, N. (1995). Factors affecting the rates of food calls given
by red-bellied tamarins. Animal Behaviour, 50, 53–60.
Cangelosi, A. (1999). Modeling the evolution of communication: From stimulus associations to grounded symbolic
associations. In D. Floreano, J.-D. Nicoud, & F. Mondada
(Eds.), Advances in artificial life: The Fifth European
Conference (ECAL ’99) (pp. 654–663). Berlin: Springer.
Cangelosi, A., & Parisi, D. (1998). The emergence of a “language” in an evolving population of neural networks.
Connection Science, 10, 83–97.
Cangelosi, A., & Parisi, D. (2001). How nouns and verbs differentially affect the behavior of artificial organisms. In J.
D. Moore & K. Stenning (Eds.), Proceedings of the 23rd
Annual Conference of the Cognitive Science Society (pp.
170–175). London: Erlbaum.
Cangelosi, A., & Parisi, D. (2002). Simulating the evolution of
language. New York: Springer.
Caraco, T., & Brown, J. L. (1986). A game between communal
breeders: When is food-sharing stable? Journal of Theoretical Biology, 118, 379–393.
Carstairs-McCarthy, A. (1999). The origins of complex language: An inquiry into the evolutionary beginnings of sentences, syllables, and truth. Oxford: Oxford University
Press.
Catchpole, C. K., & Slater, P. J. B. (1995). Bird song: Biological themes and variations. Cambridge, UK: Cambridge
University Press.
Cerchio, S., Jacobsen, J. K., & Norris, T. F. (2001). Temporal
and geographical variation in songs of humpback whales,
Megaptera novaeangliae: Synchronous change in Hawaiian and Mexican breeding assemblages. Animal Behaviour, 62, 313–329.
Cheney, D. L., & Seyfarth, R. M. (1990). How monkeys see the
world: Inside the mind of another species. Chicago: University of Chicago Press.
Chomsky, N. (1956). Three models for the description of language. IRE Transactions on Information Theory, 2, 113–
124.
Chomsky, N. (1975). Reflections on language. New York: Pantheon.
Chomsky, N. (1980). Human language and other semiotic
systems. In T. A. Sebeok & J. Umiker-Sebeok (Eds.),
Speaking of apes. A critical anthology of two-way communication with man (pp. 429–440). New York: Plenum
Press.
Christiansen, M. (1994). Infinite languages, finite minds: Connectionism, learning and linguistic structure. Unpublished
doctoral dissertation, University of Edinburgh.
Cottrell, A. (1977). Emergent properties of complex systems.
In R. Duncan & M. Weston-Smith (Eds.), Encyclopedia of
ignorance (pp. 129–135). Oxford: Pergamon.
Crain, S. (1991). Language acquisition in the absence of experience. Behavioral and Brain Sciences, 14, 597–650.
Darling, J. D., & Berube, M. (2001). Interactions of singing
humpback whales with other males. Marine Mammal Science, 17(3), 570–584.
Davidson, S. M., & Wilkinson, G. S. (2002). Geographic and
individual variation in vocalizations by male Saccopteryx
bilineata (Chiroptera: Emballonuridae). Journal of Mammalogy, 83, 526–535.
Deacon, T. W. (1997). The symbolic species. New York: Norton.
Demers, R. A. (1988). Linguistics and animal communication.
In F. J. Newmeyer (Ed.), Language: Psychological and
biological aspects (Vol. 3, pp. 314–335). New York: Cambridge University Press.
Dickins, T. E. (2001). On the origin of symbols. Connexions, 5.
URL: http://www.shef.ac.uk/uni/academic/N-Q/phil/connex/index.html.
Dingwall, W. O. (1988). The evolution of human communicative behavior. In F. J. Newmeyer (Ed.), Language: Psychological and biological aspects (Vol. 3, pp. 274–313).
Cambridge, UK: Cambridge University Press.
Di Paolo, E. A. (2000). Behavioral coordination, structural congruence and entrainment in a simulation of acoustically
coupled agents. Adaptive Behavior, 8(1), 27–48.
Dircks, C., & Stoness, S. C. (1999). Effective lexicon change in
the absence of population flux. In D. Floreano, J.-D.
Nicoud, & F. Mondada (Eds.), Advances in artificial life:
The Fifth European Conference (ECAL ’99) (Vol. 1674,
pp. 720–724). Berlin: Springer.
Donald, M. W. (1993). Precis of origins of the modern mind:
Three stages in the evolution of culture and cognition.
Behavioral and Brain Sciences, 16, 737–791.
Dunbar, R. (1996). Grooming, gossip, and the evolution of language. Cambridge, MA: Harvard University Press.
Elman, J. L., Bates, E. A., Johnson, M. H., Karmiloff-Smith,
A., Parisi, D., & Plunkett, K. (1996). Rethinking innateness: A connectionist perspective on development. Cambridge, MA: MIT Press.
Elowson, A., Tannenbaum, P., & Snowdon, C. (1991). Food-associated calls correlate with food preferences in cotton-top tamarins. Animal Behaviour, 42, 931–937.
Enquist, M., & Arak, A. (1994). Symmetry, beauty and evolution. Nature, 372, 169–172.
Ferber, J. (1999). Multi-agent systems. London: Addison-Wesley.
Ficken, M. S. (1989). Acoustic characteristics of alarm calls
associated with predation risk in chickadees. Animal Behaviour, 39(2), 400–401.
Fodor, J. (1998). Concepts: Where cognitive science went
wrong. New York: Oxford University Press.
Fodor, J., & Lepore, E. (1998). The emptiness of the lexicon:
Critical reflections on J. Pustejovsky’s the generative lexicon. Linguistic Inquiry, 29(2), 269–288.
Gould, S. (1991). Exaptation: A crucial tool for evolutionary
psychology. Journal of Social Issues, 47, 43–65.
Grim, P., Kokalis, T., Tafti, A., & Kilb, N. (1999). Evolution of
communication with a spatialized genetic algorithm. Evolution of Communication, 3(2), 105–134.
Grim, P., Kokalis, T., Tafti, A., & Kilb, N. (2000). Evolution of
communication in perfect and imperfect worlds. World
Futures: The Journal of General Evolution, 56, 179–197.
Haiman, J. (Ed.). (1985). Natural syntax: Iconicity and erosion.
Cambridge, UK: Cambridge University Press.
Hare, M., & Elman, J. L. (1995). Learning and morphological
change. Cognition, 56, 61–98.
Harnad, S. (1990). The symbol grounding problem. Physica D,
42, 335–346.
Hauser, M. D. (1996). The evolution of communication. Cambridge, MA: MIT Press, Bradford Books.
Hockett, C. F. (1959). Animal “languages” and human language. In J. N. Spuhler (Ed.), The evolution of man’s
capacity for culture (pp. 32–39). Detroit, MI: Wayne State
University Press.
Hockett, C. F. (1960). The origin of speech. Scientific American, 203(3), 89–96.
Hockett, C. F. (1990). A comment on design features. Anthropological Linguistics, 32(3–4), 361–363.
Hockett, C. F., & Altmann, S. A. (1968). A note on design features. In T. A. Sebeok (Ed.), Animal communication: Techniques of study and results of research (pp. 61–72).
Bloomington: Indiana University Press.
Hurd, P. L., Wachtmeister, C.-A., & Enquist, M. (1995). Darwin’s
principle of antithesis revisited: A role for perceptual
biases in the evolution of intraspecific signals. Proceedings of the Royal Society of London, Series B, 259, 201–205.
Hurford, J. R. (1989). Biological evolution of the Saussurean
sign as a component of the language acquisition device.
Lingua, 77, 187–222.
Hutchins, E., & Hazlehurst, B. (1995). How to invent a lexicon:
The development of shared symbols in interaction. In N.
Gilbert & R. Conte (Eds.), Artificial societies: The computer simulation of social life (pp. 157–189). London: UCL
Press.
Jannedy, S., Poletto, R., & Weldon, T. L. (Eds.). (1994). Language files: Materials for an introduction to language and
linguistics (6th ed.). Columbus: Ohio State University
Press.
Johnstone, R. A. (1994). Female preference for symmetrical
males as a by-product of selection for mate recognition.
Nature, 372, 172–175.
Adaptive Behavior 11(1)
Kaplan, F. (2000). Semiotic schemata: Selection units for linguistic cultural evolution. In M. Bedau, J. McCaskill, N.
Packard, & S. Rasmussen (Eds.), Artificial life VII: Proceedings of the Seventh Artificial Life Conference (pp.
372–381). Cambridge, MA: MIT Press.
Kirby, S. (1996). Function, selection and innateness: The emergence of language universals. Unpublished doctoral dissertation, University of Edinburgh.
Kirby, S. (1998). Fitness and the selective adaptation of language. In J. R. Hurford, M. Studdert-Kennedy, & C.
Knight (Eds.), Approaches to the evolution of language
(pp. 359–383). Cambridge, UK: Cambridge University
Press.
Kirby, S. (1999). Syntax out of learning: The cultural evolution
of structured communication in a population of induction
algorithms. In D. Floreano, J.-D. Nicoud, & F. Mondada
(Eds.), Advances in artificial life: The Fifth European
Conference (ECAL ’99) (pp. 694–703). Berlin: Springer.
Kirby, S. (2001). Spontaneous evolution of linguistic structure—an iterated learning model of the emergence of regularity and irregularity. IEEE Transactions on Evolutionary
Computation, 5(2), 102–110.
Kirby, S. (2002). Natural language from artificial life. Artificial
Life, 8(2), 185–215.
Kirby, S., & Hurford, J. (1997). Learning, culture and evolution
in the origin of linguistic constraints. In P. Husbands & I.
Harvey (Eds.), Proceedings of the Fourth European Conference on Artificial Life (pp. 493–502). Cambridge, MA:
MIT Press.
Knight, C., Studdert-Kennedy, M., & Hurford, J. (1998). The
evolutionary emergence of language: Social function and
the origins of linguistic form. Cambridge, UK: Cambridge
University Press.
Krakauer, D. C., & Johnstone, R. A. (1995). The evolution of
exploitation and honesty in animal communication: A model
using artificial neural networks. Philosophical Transactions
of the Royal Society of London, Series B, 348, 355–361.
Krakauer, D. C., & Pagel, M. (1995). Spatial structure and the
evolution of honest cost-free signalling. Proceedings of
the Royal Society of London, Series B, 260, 365–372.
Kvasnicka, V., & Pospichal, J. (1999). An emergence of coordinated communication in populations of agents. Artificial
Life, 5, 319–342.
Labov, W. (1972). Sociolinguistic patterns. Philadelphia: University of Pennsylvania Press.
Langacker, R. (1987). Foundations of cognitive grammar I:
Theoretical prerequisites. Stanford, CA: Stanford University Press.
Levin, M. (1995). The evolution of understanding: A genetic
algorithm model of the evolution of communication. BioSystems, 36, 167–178.
Lightfoot, D. (1991). Subjacency and sex. Language and Communication, 11, 67–69.
Livingstone, D., & Fyfe, C. (1999a, April). Diversity in learned
communication. Paper presented at AISB’99 Convention,
Edinburgh.
Livingstone, D., & Fyfe, C. (1999b). Modelling the evolution
of linguistic diversity. In D. Floreano, J.-D. Nicoud, & F.
Mondada (Eds.), Advances in artificial life: The Fifth
European Conference (ECAL ’99) (pp. 704–708). Berlin:
Springer.
Lund, H. H., Webb, B., & Hallam, J. (1998). Physical and temporal scaling considerations in a robot model of cricket
calling song preference. Artificial Life, 4(1), 95–107.
MacLennan, B. J., & Burghardt, G. M. (1993). Synthetic ethology and the evolution of cooperative communication.
Adaptive Behavior, 2(2), 161–188.
Manser, M. B. (2001). The acoustic structure of suricate’s
alarm calls varies with predator type and the level of
response urgency. Proceedings of the Royal Society of
London B, 268, 2315–2324.
Marler, P. (1991). Song-learning behavior—the interface with
neuroethology. Trends in Neurosciences, 14(5), 199–206.
Marler, P. (1997). Three models of song learning: Evidence
from behavior. Journal of Neurobiology, 33(5), 501–516.
Mateo, J. M. (1996). The development of alarm-call response
behaviour in free-living juvenile Belding’s ground squirrels. Animal Behaviour, 52, 489–505.
Mesterton-Gibbons, M., & Dugatkin, L. A. (1999). On the evolution of delayed recruitment to food bonanzas. Behavioral Ecology, 10(4), 377–390.
Mitani, J. C., & Marler, P. (1989). A phonological analysis of
male gibbon singing behavior. Behaviour, 109(1–2), 20–45.
Moukas, A., & Hayes, G. (1996). Synthetic robotic language
acquisition by observation. In P. Maes, M. J. Mataric, J.-A. Meyer, J. Pollack, & S. W. Wilson (Eds.), From animals to animats 4: Proceedings of the Fourth International
Conference on Simulation of Adaptive Behavior (pp. 568–579). Cambridge, MA: MIT Press.
Murciano, A., & Millán, J. D. (1997). Learning signaling
behaviors and specialization in cooperative agents. Adaptive Behavior, 5, 5–28.
Neff, H. (2000). On evolutionary ecology and evolutionary
archaeology: Some common ground? Current Anthropology, 41(3), 427–429.
Nelson, D. A., Khanna, H., & Marler, P. (2001). Learning by
instruction or selection: Implications for patterns of geographic variation in bird song. Behaviour, 138(9), 1137–1160.
Newman, J. A., & Caraco, T. (1989). Co-operative and non-cooperative bases of food-calling. Journal of Theoretical
Biology, 141, 197–209.
Newmeyer, F. (1991). Functional explanation in linguistics and
the origins of language. Language and Communication,
11, 3–28.
Newmeyer, F. (1992). Iconicity and generative grammar. Language, 68, 756–796.
Noble, J. (1998). Tough guys don’t dance: Intention movements and the evolution of signalling in animal contests.
In R. Pfeifer, B. Blumberg, J.-A. Meyer, & S. W. Wilson
(Eds.), From animals to animats 5: Proceedings of the
Fifth International Conference on Simulation of Adaptive
Behavior (pp. 471–476). Cambridge, MA: MIT Press.
Noble, J. (1999a). Cooperation, conflict and the evolution of
communication. Adaptive Behavior, 7(3/4), 349–370.
Noble, J. (1999b). Sexual signalling in an artificial population:
When does the handicap principle work? In D. Floreano,
J.-D. Nicoud, & F. Mondada (Eds.), Advances in artificial
life: The Fifth European Conference (ECAL ’99) (Vol.
1674, pp. 644–653). Berlin: Springer.
Noble, J., & Cliff, D. (1996). On simulating the evolution of
communication. In P. Maes, M. J. Mataric, J.-A. Meyer, J.
Pollack, & S. W. Wilson (Eds.), From animals to animats
4: Proceedings of the Fourth International Conference on
Simulation of Adaptive Behavior (pp. 608–617). Cambridge, MA: MIT Press.
Nowak, M. A., Komarova, N. L., & Niyogi, P. (2002). Computational and evolutionary aspects of language. Nature,
417, 611–617.
Nowak, M. A., Krakauer, D. C., & Dress, A. (1999). An error
limit for the evolution of language. Proceedings of the
Royal Society of London B, 266, 2131–2136.
Nowak, M. A., Plotkin, J. B., & Jansen, V. A. A. (2000). The
evolution of syntactic communication. Nature, 404, 495–498.
Oliphant, M. (1996). The dilemma of Saussurean communication. Biosystems, 37(1–2), 31–38.
Oliphant, M. (1999). The learning barrier: Moving from innate
to learned systems of communication. Adaptive Behavior,
7(3–4), 371–384.
Oudeyer, P.-Y. (1999). Self-organization of a lexicon in a structured society of agents. In D. Floreano, J.-D. Nicoud, & F.
Mondada (Eds.), Advances in artificial life: The Fifth
European Conference (ECAL ’99) (Vol. 1674, pp. 725–729). Berlin: Springer.
Parisi, D. (1997). An artificial life approach to language. Brain
and Language, 59, 121–146.
Piattelli-Palmarini, M. (1989). Evolution, selection and cognition: From “learning” to parameter setting in biology and
the study of language. Cognition, 31, 1–44.
Pinker, S. (1994). The language instinct. New York: Morrow.
Pinker, S., & Bloom, P. (1990). Natural language and natural
selection. Behavioral and Brain Sciences, 13, 707–784.
Pullum, G., & Scholz, B. (2002). Empirical assessment of stimulus poverty arguments. The Linguistic Review, 19(1–2),
9–50.
Quinn, M. (2001). Evolving communication without dedicated
communication channels. In J. Kelemen & P. Sosík
(Eds.), Advances in artificial life: The Sixth European
Conference (ECAL 2001) (pp. 357–366). Berlin: Springer.
Reggia, J. A., Schulz, R., Wilkinson, G. S., & Uriagereka, J.
(2001). Conditions enabling the emergence of inter-agent
signalling in an artificial world. Artificial Life, 7(1), 3–32.
Robinson, J. G. (1994). Syntactic structures in the vocalizations of
wedge-capped capuchin monkeys, Cebus olivaceus. Behaviour, 90, 46–79.
Ronald, E., Sipper, M., & Capcarrere, M. (1999). Design,
observation, surprise! A test of emergence. Artificial Life,
5, 225–239.
Ryan, M. J., Phelps, S. M., & Rand, A. S. (2001). How evolutionary history shapes recognition mechanisms. Trends in
Cognitive Sciences, 5(4), 143–148.
Saddy, D., & Uriagereka, J. (in press). Language and complexity: A tutorial. International Journal of Bifurcation and
Chaos.
Sagi, B., Nemat-Nasser, S. C., Kerr, R., Hayek, R., Downing,
C., & Hecht-Nielsen, R. (2001). A biologically motivated
solution to the cocktail party problem. Neural Computation, 13(7), 1575–1602.
Saunders, G. M., & Pollack, J. B. (1996). The evolution of
communication schemes over continuous channels. In P.
Maes, M. J. Mataric, J.-A. Meyer, J. Pollack, & S. W. Wilson (Eds.), From animals to animats 4: Proceedings of the
Fourth International Conference on Simulation of Adaptive Behavior (pp. 580–589). Cambridge, MA: MIT Press.
Savage-Rumbaugh, S., & Lewin, R. (1994). Kanzi: The ape on
the brink of the human mind. New York: Wiley.
Sayigh, L. S., Tyack, P. L., Wells, R. S., Scott, M. D., & Irvine,
A. B. (1995). Sex difference in signature whistle production of free-ranging bottlenose dolphins, Tursiops truncatus. Behavioral Ecology and Sociobiology, 36, 171–177.
Seyfarth, R. M., & Cheney, D. L. (1997). Some general features of vocal development in nonhuman primates. In C.
T. Snowdon & M. Hausberger (Eds.), Social influences on
vocal development (pp. 249–273). Cambridge, UK: Cambridge University Press.
Slobodchikoff, C. N., Kiriazis, J., Fischer, C., & Creef, E.
(1991). Semantic information distinguishing individual
predators in the alarm calls of Gunnison’s prairie dogs.
Animal Behaviour, 42, 713–719.
Smith, A. D. M. (2001). Establishing communication systems
without explicit meaning transmission. In J. Kelemen & P.
Sosík (Eds.), Advances in artificial life: The Sixth European
Conference (ECAL 2001) (pp. 381–390). Berlin: Springer.
Smith, K. (2002a). Natural selection and cultural selection in
the evolution of communication. Adaptive Behavior, 10, 25–44.
Smith, K. (2002b). The cultural evolution of communication in
a population of neural networks. Connection Science,
14(1), 65–84.
Steele, J. (1994). Communication networks and dispersal patterns in human evolution: A simple simulation model.
World Archaeology, 26(2), 126–143.
Steels, L. (1997). The synthetic modeling of language origins.
Evolution of Communication, 1(1), 1–34.
Steels, L. (1998a). The origins of syntax in visually grounded
robotic agents. Artificial Intelligence, 103(1–2), 133–156.
Steels, L. (1998b). Synthesising the origins of language and
meaning using co-evolution, self-organisation and level
formation. In J. R. Hurford, M. Studdert-Kennedy, & C.
Knight (Eds.), Approaches to the evolution of language
(pp. 384–404). Cambridge, UK: Cambridge University
Press.
Steels, L., & Kaplan, F. (1999). Collective learning and semiotic
dynamics. In D. Floreano, J.-D. Nicoud, & F. Mondada
(Eds.), Advances in artificial life: The Fifth European Conference (ECAL ’99) (pp. 679–688). Berlin: Springer.
Steels, L., & Oudeyer, P.-Y. (2000). The cultural evolution of
syntactic constraints in phonology. In M. Bedau, J.
McCaskill, N. Packard, & S. Rasmussen (Eds.), Artificial
life VII: Proceedings of the Seventh Artificial Life Conference (pp. 382–391). Cambridge, MA: MIT Press.
Ujhelyi, M. (1996). Is there any intermediate stage between
animal communication and language? Journal of Theoretical Biology, 180, 71–76.
Uriagereka, J. (1998). Rhyme and reason. Cambridge, MA: MIT Press.
Wagner, K. (2000). Cooperative strategies and the evolution of
communication. Artificial Life, 6(2), 149–179.
Wagner, K., & Reggia, J. A. (2002). Evolving consensus
among a population of communicators. Complexity International, 9. http://www.life.csu.au/ci/vol09/wagner01.
Webb, B., & Hallam, J. (1996). How to attract females: Further
robotic experiments in cricket phonotaxis. In P. Maes, M.
J. Mataric, J.-A. Meyer, J. Pollack, & S. W. Wilson (Eds.),
From animals to animats 4: Proceedings of the Fourth
International Conference on Simulation of Adaptive
Behavior (pp. 75–83). Cambridge, MA: MIT Press.
Weiss, G. (Ed.). (1999). Multiagent systems. Cambridge, MA:
MIT Press.
Werner, G. M., & Dyer, M. G. (1991). Evolution of communication in artificial organisms. In Artificial life II, SFI studies in the sciences of complexity (Vol. X, pp. 659–687).
Reading, MA: Addison-Wesley.
Werner, G. M., & Dyer, M. G. (1994). Bioland: A massively
parallel simulation environment for evolving distributed
forms of intelligent behavior. In H. Kitano (Ed.), Massively parallel AI (pp. 317–349). Cambridge, MA: MIT
Press.
Werner, G. M., & Todd, P. M. (1997). Too many love songs:
Sexual selection and the evolution of communication. In
P. Husbands & I. Harvey (Eds.), Fourth European Conference on Artificial Life (pp. 434–443). Cambridge,
MA: MIT Press.
Wind, J., Pulleyblank, E. G., Grolier, E. de, & Bichakjian, B. H.
(Eds.). (1989). Studies in language origins (Vol. I). Amsterdam: Benjamins.
Zahavi, A., & Zahavi, A. (1997). The handicap principle: A
missing piece of Darwin’s puzzle. New York: Oxford University Press.
Zuberbühler, K. (2002). A syntactic rule in forest monkey communication. Animal Behaviour, 63, 293–299.
About the Authors
Kyle Wagner received his doctorate in computer science and cognitive science (double
major) from Indiana University. He spent two years in an NIH postdoctoral fellowship at
the University of Maryland Baltimore and at the University of Maryland Institute for
Advanced Computer Studies (at UM College Park). His work has focused mainly on artificial life investigations of the evolution of communication and language. He currently
works at Sparta, Inc., designing and writing software for physics-based modeling and
using genetic algorithms for optimization.
James A. Reggia is a professor of computer science at the University of Maryland, with
joint appointments in the Institute for Advanced Computer Studies and in the Department
of Neurology of the School of Medicine. He received his Ph.D. in computer science from
the University of Maryland and also has an M.D. with advanced training and board certification in neurology. His research interests are in the general area of biologically inspired
computation, including neural computation, adaptive and/or self-organizing systems, and
evolutionary computation. Address: Department of Computer Science, University of Maryland, College Park, MD 20742, USA. E-mail: reggia@cs.umd.edu
Juan Uriagereka is professor of linguistics at the University of Maryland, and visiting chair
at the Philology Department of the University of the Basque Country. He received his Ph.D.
in linguistics from the University of Connecticut. His research interests are in syntax, comparative grammar, and architectural questions of language, including its origins and its
development in infants, as well as its neuro-biological bases. He received the National
Euskadi Prize for research in the social sciences in 2001, from the Basque government.
Gerald S. Wilkinson is a professor of biology at the University of Maryland, College Park.
He received his Ph.D. in biology from the University of California at San Diego and held
postdoctoral fellowships at the Department of Biological Sciences, University of Sussex
and at the Institute of Behavioral Genetics, University of Colorado, Boulder. His research
focuses on the evolution of social behavior, especially communication and cooperation in
bats and sexual selection and mating behavior in flies.