Inferring semantic maps

Naveen Khetarpal

2013, Linguistic Typology

TERRY REGIER, NAVEEN KHETARPAL, and ASIFA MAJID

Abstract

Semantic maps are a means of representing universal structure underlying semantic variation. However, no algorithm has existed for inferring a graph-based semantic map from cross-language data. Here, we note that this open problem is formally identical to the known problem of inferring a social network from disease outbreaks. From this identity it follows that semantic map inference is computationally intractable, but that an efficient approximation algorithm for it exists. We demonstrate that this algorithm produces sensible semantic maps from two existing bodies of data. We conclude that universal semantic graph structure can be automatically approximated from cross-language semantic data.

Keywords: algorithm, indefiniteness, semantic maps, semantics, spatial relations

1. Introduction

Languages vary in their semantic categories – that is, in the range of semantic functions or uses picked out by their linguistic forms. However, many possible semantic categories are not attested, and similar categories often appear in unrelated languages. This pattern of constrained variation suggests a universal conceptual basis underlying the variation, such that different languages provide different snapshots of the same conceptual terrain. A semantic map is a means of capturing this idea, representing both presumed universal structure and language-specific partitionings of that structure. A semantic map often takes the form of a discrete graph structure (e.g., Bybee et al. 1994, van der Auwera & Plungian 1998, Haspelmath 1997).
More recently, semantic maps based on continuous representations have also been proposed (e.g., Croft & Poole 2008; Cysouw 2001, 2007; Levinson et al. 2003; Majid et al. 2008). In both traditions, the inferred underlying structure is sometimes interpreted as capturing the conceptual similarity between different semantic functions (e.g., Croft 2003, Croft & Poole 2008); in other work, no such attribution is made, and a semantic map is viewed simply as a compact description of attested variation, leaving open the possibility that the structure of the map may reflect extra-cognitive, such as diachronic or communicative, factors (e.g., Bybee et al. 1994, Cristofaro 2010). A carefully neutral statement of the purpose of a semantic map is that it attempts to “visually represent crosslinguistic regularity in semantic structure” (Cysouw et al. 2010: 1). In this article, we use the term “semantic map” to refer specifically to graph-based maps, and we do not assume that the structure of the graph must necessarily accurately reflect cognitive reality – although we agree with Croft (2010) that it is likely to often do so.

Linguistic Typology 17 (2013), 89–105. DOI 10.1515/lingty-2013-0003. 1430–0532/2013/017-089 © Walter de Gruyter

Figure 1. A semantic map of typical dative functions (vertices: predicative possessor, external possessor, direction, recipient, beneficiary, purpose, experiencer, judicantis), with the semantic range of English to shown in dotted outline. French à is similar to English to, but excludes “purpose” and includes “predicative possessor”. From Haspelmath (2003: 213).

Formally, a (graph-based) semantic map is a graph in which vertices (nodes) represent semantic functions or uses, and edges (links) connect closely related semantic functions.
For a given semantic map, the semantic functions and the connections between them are assumed to be universal. The meaning of a given linguistic form is then represented as a language-specific grouping of vertices into a connected region of the universal graph. An example is given in Figure 1. This semantic map, from Haspelmath (2003: 213), shows a set of typical semantic functions of the dative, and also shows the semantic range of the English word to as a connected subset of this universal graph. This to subset comprises the functions direction (e.g., She went to Philadelphia), recipient (e.g., He gave the book to his sister), experiencer (e.g., That seems loud to me), and purpose (e.g., I did it to see what would happen). French à occupies an overlapping but distinct connected subset, and forms from other languages occupy yet other connected subsets. The semantic map connectivity hypothesis (Croft 2003: 134) is the proposal that language-specific categories will always pick out connected subsets of the graph. For example, given the semantic map in Figure 1, this hypothesis predicts that any linguistic form that expresses both recipient and purpose will also express direction, since any connected region containing both recipient and purpose must also include direction. This hypothesis captures the widely shared intuition that linguistic categories denote connected regions of conceptual or perceptual space: cf. Nerlove & Romney’s (1967) observation that languages tend to avoid disjunctively defined kinship categories, and Roberson’s (2005) notion of “grouping by similarity” in color naming. Once a semantic map has been constructed to fit a body of cross-language data, the expectation is that new categories from as-yet-unexamined languages will also pick out connected subgraphs – possibly novel connected subgraphs.
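The prediction made by the connectivity hypothesis can be checked mechanically. The following is a minimal sketch, assuming a hypothetical three-vertex fragment (purpose, direction, recipient, linked in a chain) chosen only to be consistent with the prediction described above; it is not the full map of Figure 1, and the function name `is_connected` is our own illustrative choice.

```python
def is_connected(category, edges):
    """Return True if `category` induces a connected subgraph under `edges`."""
    category = set(category)
    if not category:
        return True
    start = next(iter(category))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for a, b in edges:
            # Follow only edges that touch v and stay inside the category.
            if v in (a, b):
                w = b if a == v else a
                if w in category and w not in seen:
                    seen.add(w)
                    stack.append(w)
    return seen == category


# Hypothetical fragment (an assumption for illustration), NOT the full Figure 1 map.
fragment = [("purpose", "direction"), ("direction", "recipient")]

print(is_connected({"recipient", "purpose"}, fragment))               # False
print(is_connected({"recipient", "direction", "purpose"}, fragment))  # True
```

Under this assumed fragment, a category covering recipient and purpose is connected only if it also covers direction, which is the kind of implicational prediction the hypothesis yields.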
A semantic map thus compactly represents what patterns of variation one may and may not expect to find in a given semantic domain, and the underlying graph has been taken to represent “a common human cognitive heritage” (Croft 2003: 139). Semantic maps have been widely used to represent cross-language semantic variation over a presumably universal base; for recent reviews see Haspelmath (2003) and Cysouw et al. (2010) plus other papers in the same volume. The task of constructing a semantic map in graph form from cross-language data is generally done by hand, and the task can be time-consuming with moderate to large-sized datasets. It would therefore be useful to automate this process; however the computational problem of inferring such a universal semantic map from cross-language data has not been formally addressed. Croft & Poole (2008) conjectured that this problem may be computationally intractable, and they considered this potential intractability to be a shortcoming of graph-based semantic maps as a representational tool in semantic typology. In contrast, a continuous map may be straightforwardly inferred from data using well-known computational techniques such as multidimensional scaling, and this fact has been held to be an advantage of continuous over graph-based representations for semantic maps (Croft & Poole 2008, Cysouw 2001, Wälchli 2010). Here, we address the semantic map inference problem in formal terms, in the previously unexplored case of graph-based semantic maps. In what follows, we first note that the semantic map inference problem is formally identical to another problem that superficially appears unrelated: inferring a social network from outbreaks of disease in a population. Angluin et al. 
(2010) have recently shown that this social network inference problem is computationally intractable, but that an efficient algorithm exists that approximates the optimal solution nearly as well as is theoretically possible; it follows that both the computational intractability and the applicability of the approximation algorithm hold of semantic map inference. We then apply this algorithm to the cross-language data of Haspelmath (1997) on indefinite pronouns and of Levinson et al. (2003) on spatial categories, in both cases yielding sensible and useful results. We conclude that presumptively universal structure consistent with cross-language semantic data can be straightforwardly inferred, that the issue of computational intractability – while real – need not deter researchers, and that formalization of problems in semantic typology can highlight useful connections to structurally related problems elsewhere.

2. The semantic map inference problem

The semantic map inference problem can be stated informally as follows. We are given a set of semantic functions or uses within a particular semantic range (e.g., recipient, purpose, direction, etc. from the range of the dative, as in Figure 1). We are also given a set of groupings of these functions into semantic categories from various languages; each such grouping picks out the semantic functions that may be expressed by a given linguistic form (e.g., the functions of English to shown in dotted outline in Figure 1). We assume that each such grouping picks out a connected region of an underlying universal network of semantic functions, but we are not given the connections of that network. Instead, we wish to infer the set of connections between semantic functions that best explains the observed groupings.
This problem can be formalized as follows, illustrated in Figure 2. Given a set V of vertices (representing semantic functions), and a set of constraints $S_i \subseteq V$ (representing a set of language-specific groupings of these functions into categories), we wish to find the minimum set of edges E between the vertices of V such that each $S_i$ picks out a connected subgraph of the graph G = (V, E). By asking for the minimum set E we avoid trivial and uninformative solutions such as those in which all vertices are connected. Moreover, because edges are inferred rather than directly observed, the existence of each edge must be assumed; this means that by minimizing the number of edges, we minimize the number of assumptions made, and thus privilege parsimonious solutions to the problem.

Figure 2. Formalization of the semantic map inference problem. We are given a set of semantic functions (vertices V, shown as small circles), and groupings of these functions into language-specific categories (constraints $S_i \subseteq V$, each shown by a dashed outline). We seek the minimum set of edges E (shown as links between vertices) such that each grouping picks out a connected region of the overall graph G = (V, E).

Angluin et al. (2010) treated a formally identical problem. They wished to infer a social network from observations of disease outbreaks in a population. Thus vertices V now represent people, and each constraint $S_i \subseteq V$ represents the subset of people observed to have been affected by a particular disease outbreak i. For example, a particular $S_i$ might represent the set of people observed to have caught a cold last November. Angluin et al. (2010) assumed that disease is spread by social contact, and they represented social contact between two people as an edge between the corresponding two vertices. They wished to find
the social network that could best account for the observed outbreaks – that is, the minimum set of edges E such that each constraint $S_i$ picks out a connected subgraph of the overall social graph G = (V, E). This social network inference problem is formally the same as the semantic map inference problem; therefore any formal results concerning one also apply to the other.¹ (See also Dahl 2001: 1469 for a different disease analogy concerning grammaticalization.)

Some problems can be shown to be computationally intractable, in the sense that it is expected that there does not exist an efficient algorithm that will always find the optimal solution (Garey & Johnson 1979). If a problem is computationally intractable in this sense, it is natural to abandon the search for an optimal solution and to ask instead whether an approximation to the optimal solution can be found efficiently. For some problems it can be shown that even this fallback goal of approximation is hard (e.g., Trevisan 2004; Vazirani 2001: 306–333), meaning that there exists a value r such that no efficient algorithm can be expected to always approximate the optimal solution to within a factor of r. Angluin et al. (2010) showed that the social network inference problem is hard to approximate in this sense; therefore the same holds of the semantic map inference problem. This result confirms Croft & Poole’s (2008) suspicion: the semantic map inference problem is indeed computationally intractable, and moreover is hard to approximate. However, this finding leaves open the possibility that an efficient algorithm may nonetheless produce approximations that are of high enough quality to be useful.

¹ Angluin et al. (2010) considered several variants of the social network inference problem. The specific variant to which we refer here is the one they label the “offline uniform cost network inference problem”; it corresponds to traditional graph-based semantic maps with unweighted edges. Other variants discussed by Angluin et al. (2010) are applicable to the suggestion (Cysouw 2007: 233) that edges in semantic maps may usefully be weighted, to capture how often a given pair of semantic functions co-occurs.

3. The network inference algorithm

Angluin et al. (2010) presented an efficient algorithm for the social network inference problem and proved that it approximates the optimal solution nearly as closely as theoretically possible. Following the statement of the inference problem above, their algorithm is given a set V of vertices (which in the case of semantic map inference represent semantic functions) and a set of constraints $S_i \subseteq V$ (which in the case of semantic maps represent a set of language-specific groupings of these functions into categories). It begins with no edges E between the vertices. It then introduces edges one by one in order of their utility (specified below), until each constraint $S_i$ picks out a connected region of the overall graph. Informally, the utility or usefulness of a proposed edge is the extent to which it contributes to the overall goal of the algorithm, namely a graph in which each constraint $S_i$ picks out a connected region. For example, in Figure 2, it is visually clear that the already-inserted edge in the upper right portion of the graph (call it e) contributes to the connectedness of two constraints, whereas other already-inserted edges and other possible edges (not shown) each contribute to the connectedness of one constraint or no constraints. For this reason, beginning with no edges at all, e would have the highest utility and would be the first edge to be introduced. This informal notion is captured formally by Angluin et al. (2010) by relying on the notion of a connected component.
A connected component of a graph is a maximal connected subgraph – that is, a connected subgraph to which no further vertices may be added without losing this connectedness. Consider again the graph in Figure 2. Prior to any edges having been inserted, the initial graph (consisting only of vertices) would have had four connected components, one corresponding to each vertex. The same graph but with only the above-identified edge e inserted, and no other edges, would have three connected components: one component consisting of e and the two vertices it connects, and one component for each of the two remaining vertices. Finally, the graph as it is shown contains just one connected component, because the graph as a whole is connected.

With this by way of background, the Angluin et al. (2010) algorithm operates as follows. Let $s_i$ denote the subgraph of G = (V, E) that is picked out (induced) by constraint $S_i$, and let $ncc_i$ denote the number of connected components within $s_i$. When there are no edges connecting the vertices of $s_i$, $ncc_i$ equals the number of vertices in $s_i$; this is its maximum possible value. When $s_i$ is connected, $ncc_i$ equals 1, its minimum possible value. In general, the lower the value of $ncc_i$, the closer constraint $S_i$ is to being satisfied, i.e., the closer the subgraph induced by $S_i$ is to being connected. Let C be an objective function defined as:

$$C = \sum_{i} (1 - ncc_i)$$

The algorithm begins with an empty edge set E. This yields a strongly negative value for C (except in the trivial case in which each constraint contains only one vertex). The algorithm then adds to E the edge that yields the greatest increase² in C. This steepest ascent step is repeated until all constraints are satisfied, i.e., until C = 0.
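The steepest ascent procedure described above can be sketched in a few lines of Python. This is a minimal illustration written for this passage, not the authors' implementation from Appendix A; the function names (`n_components`, `objective`, `infer_map`) and the four-vertex toy input are our own assumptions. Ties are broken by the order in which candidate edges are enumerated.

```python
from itertools import combinations


def n_components(vertices, edges):
    """Number of connected components of the subgraph induced by `vertices`."""
    vertices = set(vertices)
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path compression
            v = parent[v]
        return v

    for a, b in edges:
        if a in vertices and b in vertices:  # ignore edges leaving the subset
            parent[find(a)] = find(b)
    return len({find(v) for v in vertices})


def objective(constraints, edges):
    """C = sum over constraints of (1 - ncc_i); C == 0 means all are connected."""
    return sum(1 - n_components(s, edges) for s in constraints)


def infer_map(vertices, constraints):
    """Greedily add the edge giving the largest increase in C until C == 0."""
    edges = []
    candidates = list(combinations(sorted(vertices), 2))
    while objective(constraints, edges) < 0:
        best = max(candidates, key=lambda e: objective(constraints, edges + [e]))
        edges.append(best)
        candidates.remove(best)
    return edges


# Hypothetical toy input (an assumption for illustration only).
toy_vertices = {"a", "b", "c", "d"}
toy_constraints = [{"a", "b"}, {"b", "c"}, {"a", "b", "c"}, {"c", "d"}]
toy_edges = infer_map(toy_vertices, toy_constraints)
print(toy_edges, objective(toy_constraints, toy_edges))
```

The sketch recomputes C from scratch for every candidate edge, which keeps the code short but is wasteful; for larger inputs one would presumably update component counts incrementally.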
Python code implementing this algorithm can be found in Appendix A (and a sample input file in Appendix B) in the Supplementary Online Material. Because this is an approximation algorithm, it is not guaranteed to find the optimal solution to a given instance of the problem. For our purposes, the relevant question is whether the degree of approximation attained by this algorithm is adequate to produce high-quality semantic maps from cross-language data. We will consider a map to be high-quality if it is relatively parsimonious – i.e., if it accommodates the data using few edges – and we leave for future work the exploration of other criteria of success, e.g., correctly inferring independently known cognitive or diachronic connections. With this parsimony criterion in mind, we turn now to test the algorithm empirically, against two well-established bodies of such data.

² There may be instances in which more than one edge yields the same maximal increase in C. In such circumstances, the choice between these possibilities is not specified by the algorithm statement given here, and our implementation chooses among these possibilities arbitrarily, by the order in which edges are considered.

4. Indefinite pronouns

Haspelmath (1997) examined the semantic uses of indefinite pronouns, such as anybody, someone, and semantically related forms in other languages, through a large-scale cross-language study. His primary database contained 140 semantic categories, each associated with a linguistic form, from a total of 40 languages. This database is presented in full in his 1997 book. Each category picked out some subset of the following nine semantic functions, illustrated below with examples from Haspelmath (1997):

(i) specific, known to speaker: Somebody called while you were away: guess who!
(ii) specific, unknown to speaker: I heard something, but I couldn’t tell what kind of sound it was.
(iii) non-specific, irrealis: Please try somewhere else.
(iv) polar question: Did anybody tell you anything about it?
(v) conditional protasis: If you see anything, tell me immediately.
(vi) standard of comparison: In Freiburg, the weather is nicer than anywhere in Germany.
(vii) direct negation: Nobody knows the answer.
(viii) indirect negation: I don’t think that anybody knows the answer.
(ix) free choice: Anybody can solve this simple problem.

For example, the English form someone can serve the following five semantic functions: specific known, specific unknown, irrealis, question, and conditional. Based on the cross-language database, Haspelmath (1997: 64) constructed the semantic map shown in Figure 3. Each of the categories in his 40-language database corresponds to a connected subgraph of this graph, and the expectation is that the same will hold for forms from languages not yet examined.

Croft & Poole (2008) re-examined Haspelmath’s (1997) 40-language database, and concluded that the edge from irrealis non-specific to conditional is not necessary – that is, that the connectedness of each category in the database can be maintained without this edge. They took this finding to support their argument that “the best conceptual space is not easy to find by hand” (Croft & Poole 2008: 6), and concluded that the absence of an automated method for inferring semantic maps from data is a potentially serious limitation. We ran Angluin et al.’s (2010) algorithm on Haspelmath’s 40-language database, which he kindly shared with us in electronic form, and obtained the semantic map suggested by Croft & Poole’s observation – that is, the same map as Haspelmath’s minus the one disputed edge. Thus this algorithm, and Croft & Poole, have found a simpler map than that provided by Haspelmath.
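The kind of check behind Croft & Poole's observation, whether a given edge can be dropped without disconnecting any attested category, can be sketched as follows. The three-vertex graph and the three categories below are a hypothetical toy example and are not Haspelmath's data; `redundant_edges` is an illustrative name of our own.

```python
def n_components(vertices, edges):
    """Number of connected components of the subgraph induced by `vertices`."""
    vertices = set(vertices)
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        if a in vertices and b in vertices:
            parent[find(a)] = find(b)
    return len({find(v) for v in vertices})


def redundant_edges(edges, categories):
    """Edges that can each be removed (one at a time) with all categories still connected."""
    redundant = []
    for e in edges:
        rest = [f for f in edges if f != e]
        if all(n_components(c, rest) == 1 for c in categories):
            redundant.append(e)
    return redundant


# Hypothetical toy map and categories (assumptions for illustration only).
toy_map = [("irrealis", "question"),
           ("question", "conditional"),
           ("irrealis", "conditional")]
toy_categories = [{"irrealis", "question"},
                  {"question", "conditional"},
                  {"irrealis", "question", "conditional"}]

print(redundant_edges(toy_map, toy_categories))
# [('irrealis', 'conditional')]
```

Note that the sketch tests each edge on its own; two edges that are individually removable need not be removable together.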
Moreover, this simpler map is guaranteed by the algorithm to be sufficient to account for the 40-language sample. Whether this simpler map will also account for further data remains an open question. Haspelmath (1997: 64) states that his map was based both on the 40-language sample and on some data beyond it, so it is possible that the disputed edge is necessitated by data outside the sample. However, whatever the outcome of that question, the present study demonstrates that Angluin et al.’s (2010) algorithm produces output that is comparable in quality (parsimony) with an influential published semantic map, and thus establishes the usefulness of this algorithm as a means for inferring universal structure from cross-language data.

Figure 3. A semantic map for indefinite pronouns, adapted from Haspelmath (1997: 64). Vertices: specific known, specific unknown, irrealis non-specific, question, conditional, indirect negation, direct negation, comparative, free choice. The dashed edge from “irrealis non-specific” to “conditional” is included by Haspelmath, but not by Angluin et al.’s (2010) network inference algorithm.

5. Spatial categories

Having tested the algorithm against a dataset that covers a small number of semantic functions or uses, we wished to further assess it using a dataset that covers a greater number. This would be very time-consuming to do by hand; it is presumably for this reason that most published semantic maps are small. We had two specific goals. The first was to determine whether the structure produced by the algorithm over this more complex domain was intuitively sensible. The second goal was to determine whether the inferred structure would accommodate data from a language other than those considered in building the map – that is, whether the structure inferred by the algorithm would generalize beyond the training set.
We conducted this test in the semantic domain of spatial relations. Spatial categories across languages show both universal tendencies and cross-language differences, as illustrated in Figure 4 and supported in greater detail by Bowerman (1996), Levinson et al. (2003), and Talmy (2000), among others. This mixture of universals and variation seems in principle capturable in terms of a semantic map – and indeed it has been captured in terms of continuous maps (e.g., Croft & Poole 2008, Levinson et al. 2003). We sought to accommodate the same data using a large-scale automatically constructed graph-based map.

Figure 4. Universal tendencies and variation in spatial categorization. The four spatial relations in the left panel all fall in the same category in English (in), and also all fall in a single category in Dutch and in Yélî-Dnye. The four spatial relations in the right panel all fall in the same category in English (on; long dashed outline), but they are categorized differently in Dutch (solid outlines) and Yélî-Dnye (short dashed outlines). Based on the spatial dataset we treat in this article.

We relied on an existing dataset of cross-language spatial naming data, based on 71 pictures portraying simple spatial relations. These stimuli were originally designed by Bowerman & Pederson (1992, 1993); the scenes in Figure 4 are adapted from scenes in this set. Levinson et al. (2003) analyzed the spatial terms applied to these pictured spatial relations by speakers of nine unrelated languages: Basque, Dutch, Ewe, Lao, Lavukaleve, Tiriyó, Trumai, Yélî-Dnye, and Yukatek. They describe the spatial naming data elicitation technique as follows (Levinson et al.
2003: 487):

“Each picture has a designated figure (or theme or trajector) colored yellow, and a ground object (or relatum or landmark), and the researcher uses the pictures to set up a verbal scenario as close as culturally possible to that depicted, and asks the consultant to answer a question of the form: ‘Where is the [Figure]?’ (given the sketched scenario).”

Levinson and Meira kindly shared with us the spatial naming data they had available, resulting from elicitation sessions with speakers of the above nine languages, against the 71 scenes described above. We took these data as our dataset. Our treatment of the data followed theirs as closely as possible. They describe their data treatment as follows (Levinson et al. 2003: 503):

“[E]ach language was treated on its own. An average of the consultants’ responses was calculated: for the languages with many consultants [...] a picture was ascribed to a certain adposition when more than 50% of the consultants used it; for languages with four or five consultants, a picture was ascribed to a certain adposition if at least two of them used it; for the languages with three or fewer consultants, a picture was ascribed to a certain adposition if any of the consultants used it.”

We followed this procedure for those 7 of the 9 languages for which data from individual speakers were currently available. However, for the remaining two languages, Ewe and Yukatek, data from individual speakers were not available, and thus we could not follow the above procedure. For these two languages only, we instead used summary data provided by Sérgio Meira.

A natural means of assessing a semantic map is to first construct the map based on data from one set of languages (which may be considered the training set), and to then see whether the resulting map also accommodates data from other languages (the test set).
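The ascription rule quoted above from Levinson et al. (2003: 503) can be written as a small function. This is a sketch under one explicit assumption: the quotation elides which languages count as having "many consultants", so we take "many" to mean six or more (the `many=6` default), simply because the other two tiers cover five or fewer. The function name `ascribed_terms` and the sample responses are hypothetical.

```python
def ascribed_terms(responses, many=6):
    """Terms ascribed to one picture in one language.

    `responses` holds one collection of terms per consultant.
    `many` is an ASSUMED cutoff for "many consultants"; the source does not state it.
    """
    n = len(responses)
    counts = {}
    for terms in responses:
        for term in set(terms):  # count each consultant at most once per term
            counts[term] = counts.get(term, 0) + 1
    if n >= many:
        # Many consultants: more than 50% must have used the term.
        return {t for t, c in counts.items() if c > n / 2}
    if n >= 4:
        # Four or five consultants: at least two must have used the term.
        return {t for t, c in counts.items() if c >= 2}
    # Three or fewer consultants: any use suffices.
    return set(counts)


# Hypothetical responses for a single picture.
print(ascribed_terms([{"on"}, {"on"}, {"over"}, {"on", "over"}]))  # four consultants
print(ascribed_terms([{"in"}, {"on"}]))                            # two consultants
```

With six hypothetical consultants, a term used by exactly three of them would not be ascribed, since the first tier requires strictly more than half.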
By the definition of the semantic map inference problem, the categories in the training set are guaranteed to pick out connected subgraphs of the resulting map; what is not known is whether the categories in the test set will as well. They should, to the extent that the inferred structure accurately reflects universal constraints on semantic variation. We took the data from the nine languages in the dataset to be our training set, and we took the spatial terms of English, applied to the same stimuli, to be our test set. The three authors, all native speakers of English, each independently named each of the scenes in English. A scene was assigned to an English spatial term when at least two of the three authors used the term to name that scene.

We obtained a semantic map from the training set via Angluin et al.’s (2010) network inference algorithm. The results are shown in Figure 5. Edges tend to connect closely conceptually related scenes; thus it appears that the inferred structure is intuitively sensible, at least on informal inspection. We then asked whether the spatial naming system of English was compatible with this map – that is, whether the spatial categories of English pick out connected subgraphs of the overall graph. Figure 5 shows that they do for all categories but one. There are four classes of English spatial category in this figure, distinguished by their outlines. The categories shown in light dotted outline (against, behind, in front of, through) contain only one scene each and are thus uninformative about connectivity. The categories shown in dashed outline (next to, under) correspond to connected subgraphs that were also present in the training set (associated with other, non-English, forms) and that are therefore necessarily connected in this map.
The categories shown in solid outline (around, in, on) are informative: these categories are not present in the training data, and are nonetheless connected – thus they confirm a prediction implicitly made by the structure of the semantic map concerning what categories one may expect to find beyond the training data. Finally, the one category shown in dotted-and-dashed outline (over) is not present in the training data, and is not connected in this map – it thus violates the prediction that novel categories should conform to the induced structure.

How are we to evaluate these results? Unlike the case of indefinite pronouns discussed above, in the case of the spatial dataset there is no human-generated graph-based semantic map to which we may compare our results; they must be evaluated in their own terms. Strictly speaking, the model has failed to accommodate all the English data. At the same time, it has succeeded in accommodating almost all the data. What we need, rather than a categorical designation of success or failure, is a quantitative measure of degree of fit. There is no standard measure of degree of fit for graph-based semantic maps (Cysouw 2007: 228), so we propose our own. Specifically, we use the objective function C described in Section 3 above. That function reaches its maximum value of 0 when each semantic category in the data corresponds to a connected region of the network. The extent to which C is less than 0 measures how far a given semantic map is from fitting the data perfectly.

Figure 5. A semantic map of spatial meanings, obtained from Levinson et al.’s (2003) spatial language data. Spatial semantic categories of English are shown as outlined regions of this map.
Dotted outline = singleton category; dashed outline = category present in the training data; solid outline = novel connected category; dotted-and-dashed outline = novel unconnected category. A higher-resolution version of this figure is available at http://linguistics.berkeley.edu/~regier/semantic-maps/.

In the case of the semantic map shown in Figure 5, tested against the English data shown in that figure, C = −1. This is close to the ideal of 0, but that fact leaves important questions unresolved: Is it in any sense surprising or informative that the map fits the English data to the degree that it does? Would any system of categories of complexity comparable to English fit the semantic map as well as this? Or does English fit the structure of the semantic map significantly better than other systems of comparable complexity would?

We sought to answer these questions through a permutation test, as follows. We began with the semantic map shown in Figure 5, in which each scene is labeled with an English spatial term. We then considered hypothetical variants that retained the same network structure, the same number of English categories, and the same number of scenes per category – but randomly reassigned which English labels were assigned to which scenes.³ Thus we consider a space of possible naming systems that are of complexity comparable to English, but that differ in the ways they partition the spatial semantic map. We sampled 10^5 (100,000) such hypothetical systems, without replacement, and measured C for each. We found that the value of C obtained from the actual English data shown in Figure 5 was higher than for any of these hypothetical systems (min = −46, max = −11). We conclude that the English system fits the structure of the network better than do hypothetical systems of comparable complexity.
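A permutation test of this kind can be sketched as below. The code assumes that the categories partition the scenes, uses a hypothetical six-vertex chain as the map, and draws random systems independently rather than strictly without replacement, so it is an approximation of the procedure described above and not the authors' code. Shuffling the scenes and slicing them into blocks of the original sizes is intended to produce the same kind of size-preserving random relabeling. All names are illustrative.

```python
import random


def n_components(vertices, edges):
    """Number of connected components of the subgraph induced by `vertices`."""
    vertices = set(vertices)
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        if a in vertices and b in vertices:
            parent[find(a)] = find(b)
    return len({find(v) for v in vertices})


def fit(categories, edges):
    """The objective C: 0 when every category is connected, negative otherwise."""
    return sum(1 - n_components(c, edges) for c in categories)


def shuffled_system(scenes, sizes, rng):
    """Random system with the same number of categories and the same category sizes."""
    scenes = list(scenes)
    rng.shuffle(scenes)
    system, start = [], 0
    for k in sizes:
        system.append(set(scenes[start:start + k]))
        start += k
    return system


def permutation_test(edges, categories, n_samples, seed=0):
    """Return observed C and the share of random systems fitting at least as well."""
    rng = random.Random(seed)
    scenes = [s for c in categories for s in c]  # assumes categories are disjoint
    sizes = [len(c) for c in categories]
    observed = fit(categories, edges)
    as_good = sum(fit(shuffled_system(scenes, sizes, rng), edges) >= observed
                  for _ in range(n_samples))
    return observed, as_good / n_samples


# Hypothetical toy map (a chain of six scenes) and two toy categories.
chain = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
toy_system = [{0, 1, 2}, {3, 4, 5}]
observed_c, share = permutation_test(chain, toy_system, n_samples=1000)
print(observed_c, share)
```

A small share indicates that few size-matched random systems fit the map as well as the observed one; on a densely connected map the share would be expected to approach 1.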
Importantly, if the semantic map had been very densely connected, rearrangements of the labels should not have affected the degree of fit very much because most possible categories in such a map would be connected. Thus the present outcome suggests that the inferred semantic map of Figure 5 provides a description of the data that is sparse (parsimonious) enough, and thus constrained enough, that it accommodates attested data better than it does comparable hypothetical data.

The semantic map of Figure 5 is based on a small set of languages, against a larger set of stimuli than is common. The map’s approximation to universal structure is presumably correspondingly loose – as is suggested by its imperfect fit to a novel language, English. A more complete test of these ideas will require a larger set of crosslinguistic data. Nonetheless, these results do show that the network inference algorithm can produce interpretable and intuitively reasonable semantic maps with a large number of vertices, that at least some of the predictions the resulting map makes about categories from new languages are supported, and that the resulting map is relatively parsimonious. These findings support the proposal (e.g., Croft & Poole 2008, Levinson et al. 2003) that a universal representation may underlie the substantial cross-language variation in spatial semantic systems.

These results also raise a more general theoretical question, concerning the adequacy of connectedness as a constraint on semantic categories. The map in Figure 5 supports the categories in the training set, and most of those in the test set, as connected regions – but it also supports many other connected regions that seem implausible as semantic categories.
For example, one may trace an elongated connected region that starts at one corner of the figure and extends in a chain to the opposite corner, picking out a series of connected scenes that each seem conceptually related to their immediate neighbors in the chain, but that do not hang together as a whole, and that exclude other conceptually related scenes. Categories do often pick out short chains of related meanings (e.g., Lakoff 1987). For example, Bowerman & Pederson (1993) have identified an apparently universal sequencing or chain of spatial meanings (a subset of those meanings explored here), ranging from in to on, such that spatial terms from different languages pick out different subchains of the overall chain; this is effectively a semantic map in the form of a chain, with spatial terms picking out subchains. But these were relatively short chains of meaning, and it seems counterintuitive that a category would have the extremely high degree of elongation shown by the chain imagined above, in the context of Figure 5, without any coherent and reasonably compact core or central region. Thus, these results underscore the previously noted fact that connectedness appears to be too loose a constraint on category shape (Croft 2003: 138, Cysouw 2001: 609), and that categories may tend to be more compact and coherent than is suggested by this constraint alone. This question mirrors a debate in the literature on color naming, over whether color terms pick out merely connected regions of perceptual color space that might exhibit high degrees of chaining or elongation (Roberson et al. 2000, Roberson 2005), or regions that are both connected and compact (Jameson & D’Andrade 1997). Although the question is implicit in the use of connectedness as a constraint in semantic maps generally, it becomes especially prominent given large maps such as that in Figure 5 – and the creation of such maps is facilitated by the availability of an algorithm for inferring such maps from data.

[3] The procedure we used to create each such hypothetical variant is as follows. Randomly select one of the English spatial terms – call the number of scenes associated with this term k. Then select k random scenes and group them into a category. Continue by selecting another English term and creating the next category from the set of as-yet-uncategorized scenes. Repeat this until all scenes are categorized and all English terms have been selected.

6. General discussion

We have seen that the problem of inferring presumptively universal structure from cross-language semantic data is formally identical to the problem of inferring a social network from disease outbreaks in a population. From this identity it follows that semantic map inference is computationally intractable, confirming an earlier conjecture to this effect. However, it also follows that an existing approximation algorithm for social network inference may be applied to linguistic data, and we have seen that this algorithm yields sensible results when applied to two cross-language datasets of semantic categories.

Several questions are left open by these findings. It is unclear how well this algorithm, or any approximation algorithm that may be proposed to replace it, will perform on other datasets.
It is also unclear which semantic domains, and which questions within these domains, are best approached using graph-based semantic maps, rather than another means of inferring the universal bases of semantic variation – for example, continuous representations such as those produced by multi-dimensional scaling and similar procedures (e.g., Cysouw 2001, Croft & Poole 2008, Levinson et al. 2003, Majid et al. 2008). Finally, the present results highlight the possibility that connectedness may be too loose a constraint on category shape, but they do not determine how best to address this shortcoming: whether it is preferable to supplement connectedness by further constraints (e.g., Croft 2003: 138), to use weighted edges that reflect the frequency with which pairs of semantic functions co-occur (Cysouw 2007: 233), or to pursue a different account altogether, such as the view that semantic systems across languages reflect the need for informative communication (e.g., Jameson & D’Andrade 1997, Kemp & Regier 2012, Regier et al. 2007).

Settling these open questions will require further investigation. Nonetheless, two broad conclusions can be drawn. First, high-quality (that is, relatively parsimonious) semantic maps can be efficiently inferred from cross-language data, and the question of computational tractability should therefore not be viewed as an obstacle to using them. Second, and more generally, these results suggest that the formalization of problems in semantic typology can lead to insight from structurally similar problems in unrelated domains.
Received: 16 July 2011
Revised: 25 September 2012

University of California, Berkeley
University of Chicago
Max Planck Institute for Psycholinguistics

Correspondence addresses: (Regier) Department of Linguistics, 1203 Dwinelle Hall, University of California Berkeley, Berkeley, CA 94720, U.S.A.; e-mail: terry.regier@berkeley.edu; (Khetarpal) Department of Psychology, The University of Chicago, 5848 South University Avenue, Chicago, IL, 60637, U.S.A.; e-mail: khetarpal@uchicago.edu; (Majid) Max Planck Institute for Psycholinguistics, PO Box 310, 6500 AH Nijmegen, The Netherlands; e-mail: asifa.majid@mpi.nl.

Acknowledgments: We thank Martin Haspelmath for sharing his data, and Stephen Levinson and Sérgio Meira for sharing theirs. We also thank four anonymous reviewers for their very helpful comments and suggestions. This work was supported by NSF under grants SBE-0541957 and SBE-1041707, the Spatial Intelligence and Learning Center (SILC).

Supplementary Online Material
Appendix A. Python code for the network inference algorithm
Appendix B. Sample input file (Excel) for the network inference algorithm
http://www.degruyter.com/view/j/lity.2013.17.issue-00001/lity-2013-0001/lity2013-0003.xml?format=INT

References

Angluin, Dana, James Aspnes & Lev Reyzin. 2010. Inferring social networks from outbreaks. In Marcus Hutter, Frank Stephan, Vladimir Vovk & Thomas Zeugmann (eds.), Algorithmic learning theory 2010 (Lecture Notes in Computer Science 6331), 104–118. Berlin: Springer.
Bowerman, Melissa. 1996. Learning how to structure space for language: A cross-linguistic perspective. In Paul Bloom, Mary A. Peterson, Lynn Nadel & Merrill F. Garrett (eds.), Language and space, 385–436. Cambridge, MA: MIT Press.
Bowerman, Melissa & Eric Pederson. 1992. Topological relations picture series. In Stephen C. Levinson (ed.), Space stimuli kit 1.2, 51. Nijmegen: Max Planck Institute for Psycholinguistics.
Bowerman, Melissa & Eric Pederson. 1993. Cross-linguistic studies of spatial semantic organization. In Annual Report of the Max Planck Institute for Psycholinguistics 1992, 53–56.
Bybee, Joan, Revere Perkins & William Pagliuca. 1994. The evolution of grammar: Tense, aspect, and modality in the languages of the world. Chicago: University of Chicago Press.
Cristofaro, Sonia. 2010. Semantic maps and mental representation. Linguistic Discovery 8. 35–52.
Croft, William. 2003. Typology and universals. 2nd edn. Cambridge: Cambridge University Press.
Croft, William. 2010. What do semantic maps tell us? Linguistic Discovery 8. 53–60.
Croft, William & Keith T. Poole. 2008. Inferring universals from grammatical variation: Multidimensional scaling for typological analysis. Theoretical Linguistics 34. 1–37.
Cysouw, Michael. 2001. Review of Haspelmath (1997). Journal of Linguistics 37. 607–612.
Cysouw, Michael. 2007. Building semantic maps: The case of person marking. In Matti Miestamo & Bernhard Wälchli (eds.), New challenges in typology, 225–247. Berlin: Mouton de Gruyter.
Cysouw, Michael, Martin Haspelmath & Andrej L. Malchukov. 2010. Introduction to the special issue “Semantic maps: Methods and applications”. Linguistic Discovery 8. 1–3.
Dahl, Östen. 2001. Principles of areal typology. In Martin Haspelmath, Ekkehard König, Wulf Oesterreicher & Wolfgang Raible (eds.), Language typology and language universals, Vol. 2, 1456–1470. Berlin: De Gruyter.
Garey, Michael & David Johnson. 1979. Computers and intractability: A guide to the theory of NP-completeness. New York: Freeman.
Haspelmath, Martin. 1997. Indefinite pronouns. Oxford: Oxford University Press.
Haspelmath, Martin. 2003. The geometry of grammatical meaning: Semantic maps and crosslinguistic comparison. In Michael Tomasello (ed.), The new psychology of language, Vol. 2, 211–242. Mahwah, NJ: Erlbaum.
Jameson, Kimberly & Roy D’Andrade. 1997. It’s not really red, green, yellow and blue: An inquiry into perceptual color space. In C. L. Hardin & Luisa Maffi (eds.), Color categories in thought and language, 295–319. Cambridge: Cambridge University Press.
Kemp, Charles & Terry Regier. 2012. Kinship categories across languages reflect general communicative principles. Science 336. 1049–1054.
Lakoff, George. 1987. Women, fire, and dangerous things: What categories reveal about the mind. Chicago: University of Chicago Press.
Levinson, Stephen, Sérgio Meira & the Language and Cognition Group. 2003. ‘Natural concepts’ in the spatial topological domain – adpositional meanings in crosslinguistic perspective: An exercise in semantic typology. Language 79. 485–516.
Majid, Asifa, James Boster & Melissa Bowerman. 2008. The cross-linguistic categorization of everyday events: A study of cutting and breaking. Cognition 109. 235–250.
Nerlove, Sara & A. Kimball Romney. 1967. Sibling terminology and cross-sex behavior. American Anthropologist 69. 179–187.
Regier, Terry, Paul Kay & Naveen Khetarpal. 2007. Color naming reflects optimal partitions of color space. Proceedings of the National Academy of Sciences 104. 1436–1441.
Roberson, Debi. 2005. Color categories are culturally diverse in cognition as well as in language. Cross-Cultural Research 39. 56–71.
Roberson, Debi, Ian Davies & Jules Davidoff. 2000. Color categories are not universal: Replications and new evidence from a Stone-age culture. Journal of Experimental Psychology: General 129. 369–398.
Talmy, Leonard. 2000. How language structures space. In Leonard Talmy, Toward a cognitive semantics, Vol. 1: Concept structuring systems, 177–254. Cambridge, MA: MIT Press.
Trevisan, Luca. 2004. Inapproximability of combinatorial optimization problems. Technical report TR04-065, Electronic Colloquium on Computational Complexity.
van der Auwera, Johan & Vladimir Plungian. 1998. Modality’s semantic map. Linguistic Typology 2. 79–124.
Vazirani, Vijay. 2001. Approximation algorithms. New York: Springer.
Wälchli, Bernhard. 2010. Similarity semantics and building probabilistic semantic maps from parallel texts. Linguistic Discovery 8. 331–371.

Review Articles

Language typology and syntactic description

MARIA KOPTJEVSKAJA-TAMM and HENRIK LILJEGREN

Timothy Shopen (ed.), Language typology and syntactic description. Second edition. 3 volumes. Cambridge: Cambridge University Press, 2007. Vol. I: xx + 477 pages, ISBN 978-0-521-58156-1 (hardback), ISBN 978-0-521-58857-7 (paperback), ISBN 978-0-511-61942-7 (online, January 2010); Vol. II: xxii + 465 pages, ISBN 978-0-521-58157-8 (hardback), ISBN 978-0-521-58856-0 (paperback), ISBN 978-0-511-61943-4 (online, December 2009); Vol. III: xxii + 426 pages, ISBN 978-0-521-58158-5 (hardback), ISBN 978-0-521-58855-3 (paperback), ISBN 978-0-511-61843-7 (online, December 2009); GBP 186 (hardback, 3-volume set), 28.99 (paperback, each volume).

1. Introduction

In the first edition of Language typology and syntactic description, appearing in 1985, the editor, Timothy Shopen, had stated his and his contributors’ purpose in the introduction to each of the three volumes:

Our purpose has been to do a cross-linguistic survey of syntactic and morphological structure that can serve as a manual for field workers, and for anyone interested in relating observations about particular languages to a general theory of language.
When Language typology and syntactic description saw a second edition, 22 years later, it was not quite as explicit, but the following passage from the acknowledgments, as well as providing historical context, gave an idea of what was to be expected:

I [Timothy Shopen] came up with the idea used to organize the first edition at a conference on field work questionnaires held at the Center for Applied Linguistics, Washington, DC. I said the best way to prepare for field work is to gain a good idea of what to look for. People thought this was right so I was asked to do the organizing. There have been surveys in the past but I believe none with this scope.

Linguistic Typology 17 (2013), 107–156 DOI 10.1515/lingty-2013-0004 1430–0532/2013/017-0107 ©Walter de Gruyter
