Inferring semantic maps
TERRY REGIER, NAVEEN KHETARPAL, and ASIFA MAJID
Abstract
Semantic maps are a means of representing universal structure underlying semantic variation. However, no algorithm has existed for inferring a graph-based semantic map from cross-language data. Here, we note that this open
problem is formally identical to the known problem of inferring a social network from disease outbreaks. From this identity it follows that semantic map
inference is computationally intractable, but that an efficient approximation
algorithm for it exists. We demonstrate that this algorithm produces sensible
semantic maps from two existing bodies of data. We conclude that universal semantic graph structure can be automatically approximated from cross-language semantic data.
Keywords: algorithm, indefiniteness, semantic maps, semantics, spatial relations
1. Introduction
Languages vary in their semantic categories – that is, in the range of semantic
functions or uses picked out by their linguistic forms. However, many possible semantic categories are not attested, and similar categories often appear in
unrelated languages. This pattern of constrained variation suggests a universal
conceptual basis underlying the variation, such that different languages provide different snapshots of the same conceptual terrain. A semantic map is a
means of capturing this idea, representing both presumed universal structure
and language-specific partitionings of that structure.
A semantic map often takes the form of a discrete graph structure (e.g., Bybee et al. 1994, van der Auwera & Plungian 1998, Haspelmath 1997). More
Linguistic Typology 17 (2013), 89–105
DOI 10.1515/lingty-2013-0003
1430–0532/2013/017-089
©Walter de Gruyter
Brought to you by | Max-Planck-Gesellschaft - WIB6417
Authenticated | 195.169.108.39
Download Date | 8/23/13 8:18 AM
[Figure 1: graph with nodes predicative possessor, external possessor, direction, recipient, beneficiary, purpose, experiencer, and judicantis; the region labeled "to" is outlined.]
Figure 1. A semantic map of typical dative functions, with the semantic range of English
to shown in dotted outline. French à is similar to English to, but excludes “purpose”
and includes “predicative possessor”. From Haspelmath (2003: 213).
recently semantic maps based on continuous representations have also been
proposed (e.g., Croft & Poole 2008; Cysouw 2001, 2007; Levinson et al. 2003;
Majid et al. 2008). In both traditions, the inferred underlying structure is sometimes interpreted as capturing the conceptual similarity between different semantic functions (e.g., Croft 2003, Croft & Poole 2008); in other work, no
such attribution is made, and a semantic map is viewed simply as a compact
description of attested variation, leaving open the possibility that the structure
of the map may reflect extra-cognitive, such as diachronic or communicative,
factors (e.g., Bybee et al. 1994, Cristofaro 2010). A carefully neutral statement
of the purpose of a semantic map is that it attempts to “visually represent crosslinguistic regularity in semantic structure” (Cysouw et al. 2010: 1). In this article, we use the term “semantic map” to refer specifically to graph-based maps,
and we do not assume that the structure of the graph must necessarily accurately reflect cognitive reality – although we agree with Croft (2010) that it is
likely to often do so.
Formally, a (graph-based) semantic map is a graph in which vertices (nodes)
represent semantic functions or uses, and edges (links) connect closely related
semantic functions. For a given semantic map, the semantic functions and the
connections between them are assumed to be universal. The meaning of a given
linguistic form is then represented as a language-specific grouping of vertices
into a connected region of the universal graph.
An example is given in Figure 1. This semantic map, from Haspelmath
(2003: 213), shows a set of typical semantic functions of the dative, and also
shows the semantic range of the English word to as a connected subset of this
universal graph. This to subset comprises the functions direction (e.g., She
went to Philadelphia), recipient (e.g., He gave the book to his sister), experiencer (e.g., That seems loud to me), and purpose (e.g., I did it to see what
would happen). French à occupies an overlapping but distinct connected subset, and forms from other languages occupy yet other connected subsets. The
semantic map connectivity hypothesis (Croft 2003: 134) is the proposal
that language-specific categories will always pick out connected subsets of the
graph. For example, given the semantic map in Figure 1, this hypothesis predicts that any linguistic form that expresses both recipient and purpose will
also express direction, since any connected region containing both recipient
and purpose must also include direction.
This hypothesis captures the widely-shared intuition that linguistic categories denote connected regions of conceptual or perceptual space: cf. Nerlove
& Romney’s (1967) observation that languages tend to avoid disjunctively defined kinship categories, and Roberson’s (2005) notion of “grouping by similarity” in color naming. Once a semantic map has been constructed to fit a
body of cross-language data, the expectation is that new categories from as-yet-unexamined languages will also pick out connected subgraphs – possibly
novel connected subgraphs. A semantic map thus compactly represents what
patterns of variation one may and may not expect to find in a given semantic
domain, and the underlying graph has been taken to represent “a common human cognitive heritage” (Croft 2003: 139). Semantic maps have been widely
used to represent cross-language semantic variation over a presumably universal base; for recent reviews see Haspelmath (2003) and Cysouw et al. (2010)
plus other papers in the same volume.
The task of constructing a semantic map in graph form from cross-language
data is generally done by hand, and the task can be time-consuming with moderate to large-sized datasets. It would therefore be useful to automate this process; however, the computational problem of inferring such a universal semantic
map from cross-language data has not been formally addressed. Croft & Poole
(2008) conjectured that this problem may be computationally intractable, and
they considered this potential intractability to be a shortcoming of graph-based
semantic maps as a representational tool in semantic typology. In contrast, a
continuous map may be straightforwardly inferred from data using well-known
computational techniques such as multidimensional scaling, and this fact has
been held to be an advantage of continuous over graph-based representations
for semantic maps (Croft & Poole 2008, Cysouw 2001, Wälchli 2010). Here,
we address the semantic map inference problem in formal terms, in the previously unexplored case of graph-based semantic maps.
In what follows, we first note that the semantic map inference problem is
formally identical to another problem that superficially appears unrelated: inferring a social network from outbreaks of disease in a population. Angluin et
al. (2010) have recently shown that this social network inference problem is
computationally intractable, but that an efficient algorithm exists that approximates the optimal solution nearly as well as is theoretically possible; it follows
that both the computational intractability and the applicability of the approximation algorithm hold of semantic map inference. We then apply this algorithm
to the cross-language data of Haspelmath (1997) on indefinite pronouns and of
Levinson et al. (2003) on spatial categories, in both cases yielding sensible and
useful results. We conclude that presumptively universal structure consistent
with cross-language semantic data can be straightforwardly inferred, that the
issue of computational intractability – while real – need not deter researchers,
and that formalization of problems in semantic typology can highlight useful
connections to structurally related problems elsewhere.
2. The semantic map inference problem
The semantic map inference problem can be stated informally as follows. We
are given a set of semantic functions or uses within a particular semantic range
(e.g., recipient, purpose, direction, etc. from the range of the dative, as in
Figure 1). We are also given a set of groupings of these functions into semantic
categories from various languages; each such grouping picks out the semantic
functions that may be expressed by a given linguistic form (e.g., the functions
of English to shown in dotted outline in Figure 1). We assume that each such
grouping picks out a connected region of an underlying universal network of
semantic functions, but we are not given the connections of that network. Instead, we wish to infer the set of connections between semantic functions that
best explains the observed groupings.
This problem can be formalized as follows, illustrated in Figure 2. Given
a set V of vertices (representing semantic functions), and a set of constraints
Si ⊆ V (representing a set of language-specific groupings of these functions into
categories), we wish to find the minimum set of edges E between the vertices
of V such that each Si picks out a connected subgraph of the graph G = (V, E).
By asking for the minimum set E we avoid trivial and uninformative solutions
such as those in which all vertices are connected. Moreover, because edges
are inferred rather than directly observed, the existence of each edge must be
assumed; this means that by minimizing the number of edges, we minimize the
number of assumptions made, and thus privilege parsimonious solutions to the
problem.
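The connectivity requirement in this formalization can be checked mechanically. The following is a minimal sketch, not the code of Appendix A: the function names `is_connected` and `satisfies_all` are hypothetical, and the toy vertices and edges are chosen for illustration only and are not taken from any published map. It tests whether a candidate edge set E makes every constraint Si a connected subgraph of G = (V, E).

```python
from collections import deque


def is_connected(subset, edges):
    """Return True if `subset` induces a connected subgraph under `edges`."""
    subset = set(subset)
    if len(subset) <= 1:
        return True  # empty and singleton sets are trivially connected
    # Build adjacency restricted to the subset (the induced subgraph).
    neighbors = {v: set() for v in subset}
    for a, b in edges:
        if a in subset and b in subset:
            neighbors[a].add(b)
            neighbors[b].add(a)
    # Breadth-first search from an arbitrary vertex of the subset.
    start = next(iter(subset))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in neighbors[v] - seen:
            seen.add(w)
            queue.append(w)
    return seen == subset


def satisfies_all(constraints, edges):
    """Return True if every constraint picks out a connected subgraph."""
    return all(is_connected(s, edges) for s in constraints)


# Illustrative (hypothetical) constraints and candidate edge sets.
constraints = [
    {"direction", "recipient"},
    {"direction", "purpose"},
    {"direction", "recipient", "purpose"},
]
good_edges = [("direction", "recipient"), ("direction", "purpose")]
bad_edges = [("recipient", "purpose")]

print(satisfies_all(constraints, good_edges))
print(satisfies_all(constraints, bad_edges))
```

A check of this kind only verifies a proposed solution; finding the minimum edge set that passes it is the hard part of the problem.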
Angluin et al. (2010) treated a formally identical problem. They wished to
infer a social network from observations of disease outbreaks in a population.
Thus vertices V now represent people, and each constraint Si ⊆ V represents the
subset of people observed to have been affected by a particular disease outbreak
i. For example, a particular Si might represent the set of people observed to
have caught a cold last November. Angluin et al. (2010) assumed that disease
is spread by social contact, and they represented social contact between two
people as an edge between the corresponding two vertices. They wished to find
Figure 2. Formalization of the semantic map inference problem. We are given a set of
semantic functions (vertices V, shown as small circles), and groupings of these functions
into language-specific categories (constraints Si ⊆ V, each shown by a dashed outline).
We seek the minimum set of edges E (shown as links between vertices) such that each
grouping picks out a connected region of the overall graph G = (V, E).
the social network that could best account for the observed outbreaks – that is,
the minimum set of edges E such that each constraint Si picks out a connected
subgraph of the overall social graph G = (V, E). This social network inference
problem is formally the same as the semantic map inference problem; therefore
any formal results concerning one also apply to the other.1 (See also Dahl 2001:
1469 for a different disease analogy concerning grammaticalization.)
Some problems can be shown to be computationally intractable, in the sense
that it is expected that there does not exist an efficient algorithm that will always find the optimal solution (Garey & Johnson 1979). If a problem is computationally intractable in this sense, it is natural to abandon the search for an
optimal solution and to ask instead whether an approximation to the optimal
solution can be found efficiently. For some problems it can be shown that even
this fallback goal of approximation is hard (e.g., Trevisan 2004; Vazirani 2001:
306–333), meaning that there exists a value r such that no efficient algorithm
can be expected to always approximate the optimal solution to within a factor
of r. Angluin et al. (2010) showed that the social network inference problem
is hard to approximate in this sense; therefore the same holds of the semantic
map inference problem. This result confirms Croft & Poole’s (2008) suspicion:
the semantic map inference problem is indeed computationally intractable, and
1. Angluin et al. (2010) considered several variants of the social network inference problem.
The specific variant to which we refer here is the one they label the “offline uniform cost
network inference problem”; it corresponds to traditional graph-based semantic maps with
unweighted edges. Other variants discussed by Angluin et al. (2010) are applicable to the
suggestion (Cysouw 2007: 233) that edges in semantic maps may usefully be weighted, to
capture how often a given pair of semantic functions co-occurs.
moreover is hard to approximate. However, this finding leaves open the possibility that an efficient algorithm may nonetheless produce approximations that
are of high enough quality to be useful.
3. The network inference algorithm
Angluin et al. (2010) presented an efficient algorithm for the social network
inference problem and proved that it approximates the optimal solution nearly
as closely as theoretically possible. Following the statement of the inference
problem above, their algorithm is given a set V of vertices (which in the case of
semantic map inference represent semantic functions) and a set of constraints
Si ⊆ V (which in the case of semantic maps represent a set of language-specific
groupings of these functions into categories). It begins with no edges E between the vertices. It then introduces edges one by one in order of their utility
(specified below), until each constraint Si picks out a connected region of the
overall graph.
Informally, the utility or usefulness of a proposed edge is the extent to which
it contributes to the overall goal of the algorithm, namely a graph in which each
constraint Si picks out a connected region. For example, in Figure 2, it is visually clear that the already-inserted edge in the upper right portion of the graph
(call it e) contributes to the connectedness of two constraints, whereas other
already-inserted edges and other possible edges (not shown) each contribute to
the connectedness of one constraint or no constraints. For this reason, beginning with no edges at all, e would have the highest utility and would be the first
edge to be introduced.
This informal notion is captured formally by Angluin et al. (2010) by relying
on the notion of a connected component. A connected component of a graph
is a maximal connected subgraph – that is, a connected subgraph to which
no further vertices may be added without losing this connectedness. Consider
again the graph in Figure 2. Prior to any edges having been inserted, the initial
graph (consisting only of vertices) would have had four connected components,
one corresponding to each vertex. The same graph but with only the above-identified edge e inserted, and no other edges, would have three connected
components: one component consisting of e and the two vertices it connects,
and one component for each of the two remaining vertices. Finally, the graph
as it is shown contains just one connected component, because the graph as a
whole is connected. With this by way of background, the Angluin et al. (2010)
algorithm operates as follows.
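Before the algorithm itself, the component counts just described can be reproduced with a short sketch. It assumes a four-vertex graph with placeholder vertex names ("a" to "d"), since the actual layout of Figure 2 is not recoverable from the text, and it uses a union-find structure, which is one common way to count components and is our choice here rather than something the source specifies.

```python
def count_components(vertices, edges):
    """Count connected components of the graph (vertices, edges)."""
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent links to the representative, compressing the path.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    count = len(parent)  # with no edges, every vertex is its own component
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb  # merging two components lowers the count by one
            count -= 1
    return count


vertices = ["a", "b", "c", "d"]  # hypothetical stand-ins for four vertices
print(count_components(vertices, []))                                      # 4
print(count_components(vertices, [("a", "b")]))                            # 3
print(count_components(vertices, [("a", "b"), ("b", "c"), ("c", "d")]))    # 1
```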
Let si denote the subgraph of G = (V, E) that is picked out (induced) by constraint Si, and let ncci denote the number of connected components within si.
When there are no edges connecting the vertices of si, ncci equals the number
of vertices in si; this is its maximum possible value. When si is connected, ncci
equals 1, its minimum possible value. In general, the lower the value of ncci,
the closer constraint Si is to being satisfied, i.e., the closer the subgraph induced
by Si is to being connected. Let C be an objective function defined as:
$$C = \sum_i (1 - \mathrm{ncc}_i)$$
The algorithm begins with an empty edge set E. This yields a strongly negative
value for C (except in the trivial case in which each constraint contains only one
vertex). The algorithm then adds to E the edge that yields the greatest increase2
in C. This steepest ascent step is repeated until all constraints are satisfied,
i.e., until C = 0. Python code implementing this algorithm can be found in
Appendix A (and a sample input file in Appendix B) in the Supplementary
Online Material.
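The steepest ascent procedure described above can be sketched in a few lines. This is an illustrative reconstruction from the prose description, not the Appendix A code: the function names (`ncc`, `objective`, `infer_map`) and the toy data are assumptions, every constraint is assumed to be a subset of the vertex set, and the objective is naively recomputed for each candidate edge, which is simple but not optimized. Ties are broken by the order in which candidate edges are considered.

```python
from itertools import combinations


def ncc(subset, edges):
    """Number of connected components in the subgraph induced by `subset`."""
    parent = {v: v for v in subset}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    count = len(parent)
    for a, b in edges:
        if a in parent and b in parent:  # ignore edges leaving the subset
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb
                count -= 1
    return count


def objective(constraints, edges):
    """C = sum over constraints of (1 - ncc_i); equals 0 when all are connected."""
    return sum(1 - ncc(s, edges) for s in constraints)


def infer_map(vertices, constraints):
    """Greedily add the edge giving the largest increase in C until C == 0."""
    edges = []
    candidates = list(combinations(sorted(vertices), 2))
    while objective(constraints, edges) < 0:
        # max() returns the first maximal candidate, so ties fall to edge order.
        best = max(candidates, key=lambda e: objective(constraints, edges + [e]))
        edges.append(best)
        candidates.remove(best)
    return edges


# Hypothetical toy input: four functions and four language-specific groupings.
vertices = {"a", "b", "c", "d"}
constraints = [{"a", "b"}, {"b", "c"}, {"a", "b", "c"}, {"c", "d"}]
print(infer_map(vertices, constraints))
# [('a', 'b'), ('b', 'c'), ('c', 'd')]
```

On this toy input C starts at −5; the first two edges each help two constraints at once, and the third satisfies the last remaining constraint.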
Because this is an approximation algorithm, it is not guaranteed to find the
optimal solution to a given instance of the problem. For our purposes, the relevant question is whether the degree of approximation attained by this algorithm
is adequate to produce high-quality semantic maps from cross-language data.
We will consider a map to be high-quality if it is relatively parsimonious –
i.e., if it accommodates the data using few edges – and we leave for future
work the exploration of other criteria of success, e.g., correctly inferring independently known cognitive or diachronic connections. With this parsimony
criterion in mind, we turn now to test the algorithm empirically, against two
well-established bodies of such data.
4. Indefinite pronouns
Haspelmath (1997) examined the semantic uses of indefinite pronouns, such as
anybody, someone, and semantically related forms in other languages, through
a large-scale cross-language study. His primary database contained 140 semantic categories, each associated with a linguistic form, from a total of 40
languages. This database is presented in full in his 1997 book. Each category
picked out some subset of the following nine semantic functions, illustrated
below with examples from Haspelmath (1997):
(i) specific, known to speaker: Somebody called while you were away: guess who!
(ii) specific, unknown to speaker: I heard something, but I couldn’t tell what kind of sound it was.
(iii) non-specific, irrealis: Please try somewhere else.
2. There may be instances in which more than one edge yields the same maximal increase in C.
In such circumstances, the choice between these possibilities is not specified by the algorithm
statement given here, and our implementation chooses among these possibilities arbitrarily,
by the order in which edges are considered.
(iv) polar question: Did anybody tell you anything about it?
(v) conditional protasis: If you see anything, tell me immediately.
(vi) standard of comparison: In Freiburg, the weather is nicer than anywhere in Germany.
(vii) direct negation: Nobody knows the answer.
(viii) indirect negation: I don’t think that anybody knows the answer.
(ix) free choice: Anybody can solve this simple problem.
For example, the English form someone can serve the following five semantic functions: specific known, specific unknown, irrealis, question, and
conditional. Based on the cross-language database, Haspelmath (1997: 64)
constructed the semantic map shown in Figure 3. Each of the categories in his
40-language database corresponds to a connected subgraph of this graph, and
the expectation is that the same will hold for forms from languages not yet
examined.
Croft & Poole (2008) re-examined Haspelmath’s (1997) 40-language database, and concluded that the edge from irrealis non-specific to conditional is not necessary – that is, that the connectedness of each category in
the database can be maintained without this edge. They took this finding to
support their argument that “the best conceptual space is not easy to find by
hand” (Croft & Poole 2008: 6), and concluded that the absence of an automated method for inferring semantic maps from data is a potentially serious
limitation.
We ran Angluin et al.’s (2010) algorithm on Haspelmath’s 40-language
database, which he kindly shared with us in electronic form, and obtained the
semantic map suggested by Croft & Poole’s observation – that is, the same
map as Haspelmath’s minus the one disputed edge. Thus this algorithm, and
Croft & Poole, have found a simpler map than that provided by Haspelmath.
Moreover, this simpler map is guaranteed by the algorithm to be sufficient to
account for the 40-language sample. Whether this simpler map will also account for further data remains an open question. Haspelmath (1997: 64) states
that his map was based both on the 40-language sample and on some data
[Figure 3: graph with nodes specific known, specific unknown, irrealis non-specific, question, conditional, indirect negation, direct negation, comparative, and free choice.]
Figure 3. A semantic map for indefinite pronouns, adapted from Haspelmath (1997: 64).
The dashed edge from “irrealis non-specific” to “conditional” is included by Haspelmath, but not by Angluin et al.’s (2010) network inference algorithm.
beyond it, so it is possible that the disputed edge is necessitated by data outside the sample. However, whatever the outcome of that question, the present
study demonstrates that Angluin et al.’s (2010) algorithm produces output that
is comparable in quality (parsimony) with an influential published semantic
map, and thus establishes the usefulness of this algorithm as a means for inferring universal structure from cross-language data.
5. Spatial categories
Having tested the algorithm against a dataset that covers a small number of
semantic functions or uses, we wished to further assess it using a dataset that
covers a greater number. This would be very time-consuming to do by hand;
it is presumably for this reason that most published semantic maps are small.
We had two specific goals. The first was to determine whether the structure
produced by the algorithm over this more complex domain was intuitively sensible. The second goal was to determine whether the inferred structure would
accommodate data from a language other than those considered in building the
map – that is, whether the structure inferred by the algorithm would generalize
beyond the training set.
We conducted this test in the semantic domain of spatial relations. Spatial
categories across languages show both universal tendencies and cross-language
differences, as illustrated in Figure 4 and supported in greater detail by Bowerman (1996), Levinson et al. (2003), and Talmy (2000), among others. This
mixture of universals and variation seems in principle capturable in terms of a
semantic map – and indeed it has been captured in terms of continuous maps
Figure 4. Universal tendencies and variation in spatial categorization. The four spatial
relations in the left panel all fall in the same category in English (in), and also all fall
in a single category in Dutch and in Yélî-Dnye. The four spatial relations in the right
panel all fall in the same category in English (on; long dashed outline), but they are
categorized differently in Dutch (solid outlines) and Yélî-Dnye (short dashed outlines).
Based on the spatial dataset we treat in this article.
(e.g., Croft & Poole 2008, Levinson et al. 2003). We sought to accommodate
the same data using a large-scale automatically constructed graph-based map.
We relied on an existing dataset of cross-language spatial naming data, based
on 71 pictures portraying simple spatial relations. These stimuli were originally
designed by Bowerman & Pederson (1992, 1993); the scenes in Figure 4 are
adapted from scenes in this set. Levinson et al. (2003) analyzed the spatial
terms applied to these pictured spatial relations by speakers of nine unrelated
languages: Basque, Dutch, Ewe, Lao, Lavukaleve, Tiriyó, Trumai, Yélî-Dnye,
and Yukatek. They describe the spatial naming data elicitation technique as
follows (Levinson et al. 2003: 487):
Each picture has a designated figure (or theme or trajector) colored yellow, and
a ground object (or relatum or landmark), and the researcher uses the pictures to
set up a verbal scenario as close as culturally possible to that depicted, and asks
the consultant to answer a question of the form: ‘Where is the [Figure]?’ (given
the sketched scenario).
Levinson and Meira kindly shared with us the spatial naming data they had
available, resulting from elicitation sessions with speakers of the above nine
languages, against the 71 scenes described above. We took these data as our
dataset.
Our treatment of the data followed theirs as closely as possible. They describe their data treatment as follows (Levinson et al. 2003: 503):
[E]ach language was treated on its own. An average of the consultants’ responses
was calculated: for the languages with many consultants [. . . ] a picture was ascribed to a certain adposition when more than 50 % of the consultants used it;
for languages with four or five consultants, a picture was ascribed to a certain
adposition if at least two of them used it; for the languages with three or fewer
consultants, a picture was ascribed to a certain adposition if any of the consultants
used it.
We followed this procedure for those 7 of the 9 languages for which data from
individual speakers were currently available. However, for the remaining two
languages, Ewe and Yukatek, data from individual speakers were not available,
and thus we could not follow the above procedure. For these two languages
only, we instead used summary data provided by Sérgio Meira.
A natural means of assessing a semantic map is to first construct the map
based on data from one set of languages (which may be considered the training
set), and to then see whether the resulting map also accommodates data from
other languages (the test set). By the definition of the semantic map inference
problem, the categories in the training set are guaranteed to pick out connected
subgraphs of the resulting map; what is not known is whether the categories in
the test set will as well. They should, to the extent that the inferred structure
accurately reflects universal constraints on semantic variation.
We took the data from the nine languages in the dataset to be our training
set, and we took the spatial terms of English, applied to the same stimuli, to
be our test set. The three authors, all native speakers of English, each independently named each of the scenes in English. A scene was assigned to an English
spatial term when at least two of the three authors used the term to name that
scene.
We obtained a semantic map from the training set via Angluin et al.’s (2010)
network inference algorithm. The results are shown in Figure 5. Edges tend to
connect closely conceptually related scenes; thus it appears that the inferred
structure is intuitively sensible, at least on informal inspection. We then asked
whether the spatial naming system of English was compatible with this map –
that is, whether the spatial categories of English pick out connected subgraphs
of the overall graph. Figure 5 shows that they do for all categories but one.
There are four classes of English spatial category in this figure, distinguished
by their outlines. The categories shown in light dotted outline (against, behind,
in front of, through) contain only one scene each and are thus uninformative
about connectivity. The categories shown in dashed outline (next to, under)
correspond to connected subgraphs that were also present in the training set
(associated with other, non-English, forms) and that are therefore necessarily
connected in this map. The categories shown in solid outline (around, in, on)
are informative: these categories are not present in the training data, and are
nonetheless connected – thus they confirm a prediction implicitly made by the
structure of the semantic map concerning what categories one may expect to
find beyond the training data. Finally, the one category shown in dotted-and-dashed outline (over) is not present in the training data, and is not connected in
this map – it thus violates the prediction that novel categories should conform
to the induced structure.
How are we to evaluate these results? Unlike the case of indefinite pronouns
discussed above, in the case of the spatial dataset there is no human-generated
graph-based semantic map to which we may compare our results; they must be
evaluated in their own terms. Strictly speaking, the model has failed to accommodate all the English data. At the same time, it has succeeded in accommodating almost all the data. What we need, rather than a categorical designation
of success or failure, is a quantitative measure of degree of fit. There is no standard measure of degree of fit for graph-based semantic maps (Cysouw 2007:
228), so we propose our own. Specifically, we use the objective function C
described in Section 3 above. That function reaches its maximum value of 0
when each semantic category in the data corresponds to a connected region of
the network. The extent to which C is less than 0 measures how far a given
semantic map is from fitting the data perfectly.
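As an illustration of using C as a degree-of-fit measure for held-out categories, the sketch below scores a set of test categories against a fixed map and reports which categories fail to be connected. The function names (`components`, `degree_of_fit`), the numbered scenes, and the term labels are all hypothetical; the map here is a four-vertex path, not the map of Figure 5.

```python
def components(subset, edges):
    """Split `subset` into the connected components it induces under `edges`."""
    subset = set(subset)
    neighbors = {v: set() for v in subset}
    for a, b in edges:
        if a in subset and b in subset:
            neighbors[a].add(b)
            neighbors[b].add(a)
    remaining, parts = set(subset), []
    while remaining:
        # Depth-first search from an arbitrary unvisited vertex.
        stack = [remaining.pop()]
        part = set(stack)
        while stack:
            v = stack.pop()
            for w in neighbors[v] - part:
                part.add(w)
                stack.append(w)
        remaining -= part
        parts.append(part)
    return parts


def degree_of_fit(categories, edges):
    """Return (C, names of unconnected categories) for a fixed map `edges`."""
    c, violations = 0, []
    for name, scenes in categories.items():
        if not scenes:
            continue  # an empty category carries no information
        n = len(components(scenes, edges))
        c += 1 - n
        if n > 1:
            violations.append(name)
    return c, violations


# Hypothetical map over four scenes and three held-out categories.
map_edges = [(1, 2), (2, 3), (3, 4)]
test_categories = {"term_a": {1, 2}, "term_b": {1, 3}, "term_c": {4}}
print(degree_of_fit(test_categories, map_edges))
# (-1, ['term_b'])
```

Here one category splits into two components, so C = −1, and singleton categories contribute nothing either way.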
[Figure 5: graph of spatial scenes with outlined English categories AGAINST, ON, OVER, BEHIND, NEXT TO, IN FRONT OF, AROUND, UNDER, THROUGH, and IN.]
Figure 5. A semantic map of spatial meanings, obtained from Levinson et al.’s (2003)
spatial language data. Spatial semantic categories of English are shown as outlined
regions of this map. Dotted outline = singleton category; dashed outline = category
present in the training data; solid outline = novel connected category; dotted-and-dashed outline = novel unconnected category. A higher-resolution version of this figure
is available at http://linguistics.berkeley.edu/~regier/semantic-maps/.
In the case of the semantic map shown in Figure 5, tested against the English data shown in that figure, C = −1. This is close to the ideal of 0, but
that fact leaves important questions unresolved: Is it in any sense surprising or
informative that the map fits the English data to the degree that it does? Would
any system of categories of complexity comparable to English fit the semantic map as well as this? Or does English fit the structure of the semantic map
significantly better than other systems of comparable complexity would?
We sought to answer these questions through a permutation test, as follows.
We began with the semantic map shown in Figure 5, in which each scene is
labeled with an English spatial term. We then considered hypothetical variants
that retained the same network structure, the same number of English categories,
and the same number of scenes per category – but randomly reassigned
which English labels were assigned to which scenes.3 Thus we consider a space
of possible naming systems that are of complexity comparable to English, but
that differ in the ways they partition the spatial semantic map. We sampled
105 (100,000) such hypothetical systems, without replacement, and measured
C for each. We found that the value of C obtained from the actual English data
shown in Figure 5 was higher than for any of these hypothetical systems (min
= −46, max = −11). We conclude that the English system fits the structure
of the network better than do hypothetical systems of comparable complexity.
Importantly, if the semantic map had been very densely connected, rearrangements of the labels should not have affected the degree of fit very much because
most possible categories in such a map would be connected. Thus the present
outcome suggests that the inferred semantic map of Figure 5 provides a description of the data that is sparse (parsimonious) enough, and thus constrained
enough, that it accommodates attested data better than it does comparable hypothetical data.
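The permutation test described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function `connectivity_score` is a hypothetical stand-in for C (here assumed to be minus the number of extra connected components summed over categories, so that 0 means every category is connected), and the six-scene path graph and its labels are toy data. Shuffling the label list preserves the number of categories and the number of scenes per category; unlike the procedure in the text, this sketch does not enforce sampling without replacement.

```python
import random
from collections import defaultdict


def count_components(nodes, adjacency):
    """Count connected components of the subgraph induced by `nodes`."""
    nodes = set(nodes)
    seen = set()
    components = 0
    for start in nodes:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            v = stack.pop()
            for w in adjacency.get(v, ()):
                if w in nodes and w not in seen:
                    seen.add(w)
                    stack.append(w)
    return components


def connectivity_score(labeling, adjacency):
    """ASSUMED stand-in for C: 0 if all categories are connected, else negative."""
    categories = defaultdict(set)
    for scene, term in labeling.items():
        categories[term].add(scene)
    return -sum(count_components(s, adjacency) - 1 for s in categories.values())


def shuffled_labeling(labeling, rng):
    """Reassign labels to scenes at random, preserving category sizes."""
    scenes = list(labeling)
    terms = [labeling[s] for s in scenes]
    rng.shuffle(terms)
    return dict(zip(scenes, terms))


def permutation_test(labeling, adjacency, n_samples=1000, seed=0):
    """Return the observed score and the (min, max) over shuffled systems."""
    rng = random.Random(seed)
    observed = connectivity_score(labeling, adjacency)
    sampled = [
        connectivity_score(shuffled_labeling(labeling, rng), adjacency)
        for _ in range(n_samples)
    ]
    return observed, min(sampled), max(sampled)


# Toy data (hypothetical): a path of six scenes split into two categories.
toy_adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
toy_labeling = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

observed, lowest, highest = permutation_test(toy_labeling, toy_adjacency, n_samples=200)
print(observed, lowest, highest)
```

If the observed score exceeds the maximum over the shuffled systems, as reported for English in the text, the attested system fits the map better than any sampled system of comparable complexity.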
The semantic map of Figure 5 is based on a small set of languages, against
a larger set of stimuli than is common. The map’s approximation to universal
structure is presumably correspondingly loose – as is suggested by its imperfect fit to a novel language, English. A more complete test of these ideas will
require a larger set of crosslinguistic data. Nonetheless, these results do show
that the network inference algorithm can produce interpretable and intuitively
reasonable semantic maps with a large number of vertices, that at least some of
the predictions the resulting map makes about categories from new languages
are supported, and that the resulting map is relatively parsimonious. These findings support the proposal (e.g., Croft & Poole 2008, Levinson et al. 2003) that a
universal representation may underlie the substantial cross-language variation
in spatial semantic systems.
3. The procedure we used to create each such hypothetical variant is as follows. Randomly select
one of the English spatial terms – call the number of scenes associated with this term k. Then
select k random scenes and group them into a category. Continue by selecting another English
term and creating the next category from the set of as-yet-uncategorized scenes. Repeat this
until all scenes are categorized and all English terms have been selected.
These results also raise a more general theoretical question, concerning the
adequacy of connectedness as a constraint on semantic categories. The map in
Figure 5 supports the categories in the training set, and most of those in the test
set, as connected regions – but it also supports many other connected regions
that seem implausible as semantic categories. For example, one may trace an
elongated connected region that starts at one corner of the figure and extends
in a chain to the opposite corner, picking out a series of connected scenes that
each seem conceptually related to their immediate neighbors in the chain, but
that do not hang together as a whole, and that exclude other conceptually related scenes. Categories do often pick out short chains of related meanings
(e.g., Lakoff 1987). For example, Bowerman & Pederson (1993) have identified an apparently universal sequencing or chain of spatial meanings (a subset
of those meanings explored here), ranging from in to on, such that spatial terms
from different languages pick out different subchains of the overall chain; this
is effectively a semantic map in the form of a chain, with spatial terms picking out subchains. But these were relatively short chains of meaning, and it
seems counterintuitive that a category would have the extremely high degree
of elongation shown by the chain imagined above, in the context of Figure
5, without any coherent and reasonably compact core or central region. Thus,
these results underscore the previously noted fact that connectedness appears
to be too loose a constraint on category shape (Croft 2003: 138, Cysouw 2001:
609), and that categories may tend to be more compact and coherent than is
suggested by this constraint alone. This question mirrors a debate in the literature on color naming over whether color terms pick out merely connected
regions of perceptual color space, which might exhibit high degrees of chaining
or elongation (Roberson et al. 2000, Roberson 2005), or regions that are both
connected and compact (Jameson & D’Andrade 1997). Although the question
is implicit in the use of connectedness as a constraint in semantic maps generally, it becomes especially prominent given large maps such as that in Figure 5
– and the creation of such maps is facilitated by the availability of an algorithm
for inferring such maps from data.
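The contrast between merely connected and compact regions can be made concrete with a small sketch. The measure used here, the diameter of the region within the map (the longest shortest path between two of its vertices, using only edges inside the region), is an assumption chosen for illustration; it is not a measure proposed in this article. The 3 × 3 grid graph and the two five-vertex regions are hypothetical toy data.

```python
from collections import deque


def grid_adjacency(rows, cols):
    """Adjacency lists for a rows x cols grid graph (toy stand-in for a map)."""
    adjacency = {}
    for r in range(rows):
        for c in range(cols):
            neighbors = []
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    neighbors.append((rr, cc))
            adjacency[(r, c)] = neighbors
    return adjacency


def induced_distances(source, region, adjacency):
    """Breadth-first distances from `source`, staying inside `region`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adjacency[v]:
            if w in region and w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def region_diameter(region, adjacency):
    """Longest shortest path inside `region`; None if the region is not connected."""
    region = set(region)
    longest = 0
    for v in region:
        dist = induced_distances(v, region, adjacency)
        if len(dist) < len(region):
            return None
        longest = max(longest, max(dist.values()))
    return longest


adjacency = grid_adjacency(3, 3)
# A chain running from one corner to the opposite corner.
chain = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]
# A compact region of the same size around the central vertex.
compact = [(1, 1), (0, 1), (2, 1), (1, 0), (1, 2)]

print(region_diameter(chain, adjacency), region_diameter(compact, adjacency))
```

Both regions satisfy connectedness and have the same number of vertices, yet the chain has twice the diameter of the compact region. A constraint stated only in terms of connectedness cannot distinguish them.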
6. General discussion
We have seen that the problem of inferring presumptively universal structure
from cross-language semantic data is formally identical to the problem of inferring a social network from disease outbreaks in a population. From this identity
it follows that semantic map inference is computationally intractable, confirming an earlier conjecture to this effect. However, it also follows that an existing
approximation algorithm for social network inference may be applied to linguistic data, and we have seen that this algorithm yields sensible results when
applied to two cross-language datasets of semantic categories.
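To give a sense of what an approximation to this inference problem can look like, the following is a minimal greedy sketch. It is an illustration under assumptions, not the code of Appendix A and not necessarily the algorithm used in this article: starting from a graph with no edges, it repeatedly adds the edge that joins currently separate components in the largest number of categories, and stops when every category is connected. The function names and the three toy categories are hypothetical.

```python
from itertools import combinations


def components(nodes, edges):
    """Map each node to a component representative in the subgraph induced by `nodes`."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, w in edges:
        if u in parent and w in parent:
            parent[find(u)] = find(w)
    return {v: find(v) for v in nodes}


def greedy_semantic_map(categories):
    """Greedily add edges until every category induces a connected subgraph."""
    categories = [set(c) for c in categories]
    vertices = sorted(set().union(*categories))
    edges = []
    while True:
        component_maps = [components(c, edges) for c in categories]
        best_edge, best_gain = None, 0
        for u, w in combinations(vertices, 2):
            # Gain = number of categories in which this edge merges two components.
            gain = sum(
                1
                for cm in component_maps
                if u in cm and w in cm and cm[u] != cm[w]
            )
            if gain > best_gain:
                best_edge, best_gain = (u, w), gain
        if best_edge is None:  # no edge helps: all categories are connected
            return edges
        edges.append(best_edge)


# Toy input (hypothetical): each set is one language-specific category.
toy_categories = [{"a", "b"}, {"b", "c"}, {"a", "b", "c"}]
print(greedy_semantic_map(toy_categories))
```

On the toy input the sketch connects all three categories with two edges and omits the third possible edge, which no category requires. This is the sense in which a sparser map is more parsimonious.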
Several questions are left open by these findings. It is unclear how well this
algorithm, or any approximation algorithm that may be proposed to replace
it, will perform on other datasets. It is also unclear which semantic domains,
and which questions within these domains, are best approached using graphbased semantic maps, rather than another means of inferring the universal bases
of semantic variation – for example, continuous representations such as those
produced by multi-dimensional scaling and similar procedures (e.g., Cysouw
2001, Croft & Poole 2008, Levinson et al. 2003, Majid et al. 2008). Finally, the
present results highlight the possibility that connectedness may be too loose a
constraint on category shape, but they do not determine how best to address this
shortcoming: whether it is preferable to supplement connectedness by further
constraints (e.g., Croft 2003: 138), to use weighted edges that reflect the frequency with which pairs of semantic functions co-occur (Cysouw 2007: 233),
or to pursue a different account altogether, such as the view that semantic systems across languages reflect the need for informative communication (e.g.,
Jameson & D’Andrade 1997, Kemp & Regier 2012, Regier et al. 2007). Settling these open questions will require further investigation.
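The weighted-edge option mentioned above can be illustrated with a short sketch. It assumes, for illustration only, that the weight of a pair of semantic functions is the number of categories in which both occur; this is one simple reading of co-occurrence frequency and is not taken from Cysouw (2007). The function name and the toy categories are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_weights(categories):
    """Count, for each pair of functions, the categories that contain both."""
    weights = Counter()
    for category in categories:
        for u, w in combinations(sorted(set(category)), 2):
            weights[(u, w)] += 1
    return weights


# Toy input (hypothetical): each set is one language-specific category.
toy_categories = [{"a", "b"}, {"b", "c"}, {"a", "b", "c"}]
for pair, weight in sorted(cooccurrence_weights(toy_categories).items()):
    print(pair, weight)
```

Under such a scheme, pairs that are grouped together by many categories receive heavier edges than pairs that co-occur only rarely, which gives a graded alternative to the all-or-nothing presence of an edge.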
Nonetheless, two broad conclusions can be drawn. First, high-quality (that
is, relatively parsimonious) semantic maps can be efficiently inferred from
cross-language data, and the question of computational tractability should
therefore not be viewed as an obstacle to using them. Second, and more generally, these results suggest that the formalization of problems in semantic typology can lead to insight from structurally similar problems in unrelated domains.
Received: 16 July 2011
Revised: 25 September 2012
University of California, Berkeley
University of Chicago
Max Planck Institute for Psycholinguistics
Correspondence addresses: (Regier) Department of Linguistics, 1203 Dwinelle Hall, University
of California Berkeley, Berkeley, CA 94720, U.S.A.; e-mail: terry.regier@berkeley.edu; (Khetarpal)
Department of Psychology, The University of Chicago, 5848 South University Avenue, Chicago,
IL, 60637, U.S.A.; e-mail: khetarpal@uchicago.edu; (Majid) Max Planck Institute for Psycholinguistics, PO Box 310, 6500 AH Nijmegen, The Netherlands; e-mail: asifa.majid@mpi.nl.
Acknowledgments: We thank Martin Haspelmath for sharing his data, and Stephen Levinson and
Sérgio Meira for sharing theirs. We also thank four anonymous reviewers for their very helpful
comments and suggestions. This work was supported by NSF under grants SBE-0541957 and
SBE-1041707, the Spatial Intelligence and Learning Center (SILC).
Supplementary Online Material
Appendix A. Python code for the network inference algorithm
Appendix B. Sample input file (Excel) for the network inference algorithm
http://www.degruyter.com/view/j/lity.2013.17.issue-00001/lity-2013-0001/lity2013-0003.xml?format=INT
References
Angluin, Dana, James Aspnes & Lev Reyzin. 2010. Inferring social networks from outbreaks.
In Marcus Hutter, Frank Stephan, Vladimir Vovk & Thomas Zeugmann (eds.), Algorithmic
learning theory 2010 (Lecture Notes in Computer Science 6331), 104–118. Berlin: Springer.
Bowerman, Melissa. 1996. Learning how to structure space for language: A cross-linguistic perspective. In Paul Bloom, Mary A. Peterson, Lynn Nadel & Merrill F. Garrett (eds.) Language
and space, 385–436. Cambridge, MA: MIT Press.
Bowerman, Melissa & Eric Pederson. 1992. Topological relations picture series. In Stephen C.
Levinson (ed.), Space stimuli kit 1.2, 51. Nijmegen: Max Planck Institute for Psycholinguistics.
Bowerman, Melissa & Eric Pederson. 1993. Cross-linguistic studies of spatial semantic organization. In Annual Report of the Max Planck Institute for Psycholinguistics 1992, 53–56.
Bybee, Joan, Revere Perkins & William Pagliuca. 1994. The evolution of grammar: Tense, aspect,
and modality in the languages of the world. Chicago: University of Chicago Press.
Cristofaro, Sonia. 2010. Semantic maps and mental representation. Linguistic Discovery 8. 35–52.
Croft, William. 2003. Typology and universals. 2nd edn. Cambridge: Cambridge University Press.
Croft, William. 2010. What do semantic maps tell us? Linguistic Discovery 8. 53–60.
Croft, William & Keith T. Poole. 2008. Inferring universals from grammatical variation: Multidimensional scaling for typological analysis. Theoretical Linguistics 34. 1–37.
Cysouw, Michael. 2001. Review of Haspelmath (1997). Journal of Linguistics 37. 607–612.
Cysouw, Michael. 2007. Building semantic maps: The case of person marking. In Matti Miestamo
& Bernhard Wälchli (eds.), New challenges in typology, 225–247. Berlin: Mouton de Gruyter.
Cysouw, Michael, Martin Haspelmath & Andrej L. Malchukov. 2010. Introduction to the special
issue “Semantic maps: Methods and applications”. Linguistic Discovery 8. 1–3.
Dahl, Östen. 2001. Principles of areal typology. In Martin Haspelmath, Ekkehard König, Wulf
Oesterreicher & Wolfgang Raible (eds.), Language typology and language universals, Vol. 2,
1456–1470. Berlin: De Gruyter.
Garey, Michael & David Johnson. 1979. Computers and intractability: A guide to the theory of
NP-completeness. New York: Freeman.
Haspelmath, Martin. 1997. Indefinite pronouns. Oxford: Oxford University Press.
Haspelmath, Martin. 2003. The geometry of grammatical meaning: Semantic maps and crosslinguistic comparison. In Michael Tomasello (ed.), The new psychology of language, Vol. 2,
211–242. Mahwah, NJ: Erlbaum.
Jameson, Kimberly & Roy D’Andrade. 1997. It’s not really red, green, yellow and blue: An inquiry
into perceptual color space. In C. L. Hardin & Luisa Maffi (eds.), Color categories in thought
and language, 295–319. Cambridge: Cambridge University Press.
Kemp, Charles & Terry Regier. 2012. Kinship categories across languages reflect general communicative principles. Science 336. 1049–1054.
Lakoff, George. 1987. Women, fire, and dangerous things: What categories reveal about the mind.
Chicago: University of Chicago Press.
Levinson, Stephen, Sérgio Meira & the Language and Cognition Group. 2003. ‘Natural concepts’
in the spatial topological domain – adpositional meanings in crosslinguistic perspective: An
exercise in semantic typology. Language 79. 485–516.
Majid, Asifa, James Boster & Melissa Bowerman. 2008. The cross-linguistic categorization of
everyday events: A study of cutting and breaking. Cognition 109. 235–250.
Nerlove, Sara & A. Kimball Romney. 1967. Sibling terminology and cross-sex behavior. American
Anthropologist 69. 179–187.
Regier, Terry, Paul Kay & Naveen Khetarpal. 2007. Color naming reflects optimal partitions of
color space. Proceedings of the National Academy of Sciences 104. 1436–1441.
Roberson, Debi. 2005. Color categories are culturally diverse in cognition as well as in language.
Cross-Cultural Research 39. 56–71.
Roberson, Debi, Ian Davies & Jules Davidoff. 2000. Color categories are not universal: Replications and new evidence from a Stone-age culture. Journal of Experimental Psychology:
General 129. 369–398.
Talmy, Leonard. 2000. How language structures space. In Leonard Talmy, Toward a cognitive
semantics, Vol. 1: Concept structuring systems, 177–254. Cambridge, MA: MIT Press.
Trevisan, Luca. 2004. Inapproximability of combinatorial optimization problems. Technical report
TR04-065, Electronic Colloquium on Computational Complexity.
van der Auwera, Johan & Vladimir Plungian. 1998. Modality’s semantic map. Linguistic Typology
2. 79–124.
Vazirani, Vijay. 2001. Approximation algorithms. New York: Springer.
Wälchli, Bernhard. 2010. Similarity semantics and building probabilistic semantic maps from parallel texts. Linguistic Discovery 8. 331–371.
Review Articles
Language typology and syntactic description
MARIA KOPTJEVSKAJA-TAMM and HENRIK LILJEGREN
Timothy Shopen (ed.), Language typology and syntactic description. Second edition.
3 volumes. Cambridge: Cambridge University Press, 2007. Vol. I: xx + 477 pages, ISBN
978-0-521-58156-1 (hardback), ISBN 978-0-521-58857-7 (paperback), ISBN 978-0-511-61942-7 (online, January 2010); Vol. II: xxii + 465 pages, ISBN 978-0-521-58157-8 (hardback), ISBN 978-0-521-58856-0 (paperback), ISBN 978-0-511-61943-4 (online, December 2009); Vol. III: xxii + 426 pages, ISBN 978-0-521-58158-5 (hardback),
ISBN 978-0-521-58855-3 (paperback), ISBN 978-0-511-61843-7 (online, December
2009); GBP 186 (hardback, 3-volume set), 28.99 (paperback, each volume).
1. Introduction
In the first edition of Language typology and syntactic description, appearing in
1985, the editor, Timothy Shopen, had stated his and his contributors’ purpose
in the introduction to each of the three volumes:
Our purpose has been to do a cross-linguistic survey of syntactic and morphological structure that can serve as a manual for field workers, and for anyone interested
in relating observations about particular languages to a general theory of language.
When Language typology and syntactic description saw a second edition,
22 years later, it was not quite as explicit, but the following passage from the
acknowledgments, as well as providing historical context, gave an idea of what
was to be expected:
I [Timothy Shopen] came up with the idea used to organize the first edition at a
conference on field work questionnaires held at the Center for Applied Linguistics,
Washington, DC. I said the best way to prepare for field work is to gain a good
idea of what to look for. People thought this was right so I was asked to do the
organizing. There have been surveys in the past but I believe none with this scope.
Linguistic Typology 17 (2013), 107–156
DOI 10.1515/lingty-2013-0004
1430–0532/2013/017-0107
©Walter de Gruyter