Freek Van de Velde (°1979) got his MA in Germanic Language Studies from the University of Leuven in 2001. He defended his PhD on the history of the noun phrase in Dutch in 2007 and subsequently worked on several postdoctoral projects funded by Lessius Research Fund, Leuven BOF Research Fund and FWO. He has taught a broad range of courses, mainly in the field of language variation and change, at the KU Leuven, University of Gent, Universität zu Köln, and the Westfälische Wilhelms-Universität Münster. Since 2014 he has been employed as junior research professor for Dutch linguistics and historical linguistics at the KU Leuven. Full CV: https://tinyurl.com/yc95287s
Journal of Research Design and Statistics in Linguistics and Communication Science, 2019
Within usage-based theory, notably in construction grammar though also elsewhere, the role of the lexicon and of lexically-specific patterns in morphosyntax is well recognized. The methodology, however, is not always sufficiently suited to get at the details, as lexical effects are difficult to study under what are currently the standard methods for investigating grammar empirically. In this short article, we propose a method from machine learning: regularized regression (Lasso) with k-fold cross-validation, and compare its performance with a Distinctive Collexeme Analysis.
Construction grammar organizes its basic elements of description, its constructions, into networks that range from concrete, lexically-filled constructions to fully schematic ones, with several levels of partially schematic constructions in between. However, few corpus studies with a constructionist background take this multi-level nature fully into account. In this paper, we argue that understanding language variation can be advanced considerably by systematically formulating and testing hypotheses at various levels in the constructional network. To illustrate the approach, we present a corpus study of the Dutch naar-alternation. It is found that this alternation primarily functions at an intermediate level in the constructional network.
Verslagen en Mededelingen van de KANTL 130(1): 5-23, 2020
In this opinion article we look at what the field of linguistics has achieved over the past decades. We see four major breakthroughs, but it is striking that those breakthroughs either do not bring genuinely new ideas, but rather refer back to old views, or concern purely methodological innovations. And if there is real innovation, it turns out to be brought by outsiders: psychologists, anthropologists, neuroscientists, computer scientists and physicists, who have become interested in language. Linguistics in the classical sense is in crisis. This crisis does not come out of the blue, but is the result of the end of the Renaissance, a cultural period in which language took centre stage, and had already been predicted by Jürgen Trabant (2003). All this does not entail that current-day research into language is uninteresting. Turning away from a Renaissance perspective and joining forces with other scientific fields brings advantages: less inbreeding and fewer theory-internal stories, as well as an open-minded view on old issues.
Historical linguistics has witnessed an upsurge in quantitative corpus studies. The bulk of these studies involve the use of regression modelling. We point out a number of potential problems with this approach, and offer an alternative. For a multi-state language change, we propose a Markov model in continuous time. The major advantage of this technique, which has been used in medical contexts, is that it is especially geared towards dealing with time as a variable of interest, while it still allows one to look at the effect of several covariates. In this proof-of-concept article, we look at morphological shifts in preterites in Dutch, from 800 AD to 2000 AD (n = 14,314). This is a well-researched field, allowing us to investigate the performance of the multi-state Markov model.
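As a rough illustration of what a continuous-time Markov model computes, the sketch below gives the closed-form transition probabilities for the simplest possible case, a chain with only two states. It is not the model fitted in the article, which is multi-state, includes covariates, and is estimated from corpus data. The function name, the state labels "strong" and "weak", and the rate values are hypothetical and chosen only for the example.

```python
import math


def two_state_transition_probs(rate_01, rate_10, t):
    """Transition probability matrix P(t) of a two-state continuous-time Markov chain.

    rate_01: instantaneous rate of moving from state 0 to state 1
    rate_10: instantaneous rate of moving from state 1 to state 0
    t:       elapsed time, in the same unit as the rates (e.g. years)
    """
    total = rate_01 + rate_10
    if rate_01 < 0 or rate_10 < 0 or total <= 0:
        raise ValueError("rates must be non-negative and not both zero")
    if t < 0:
        raise ValueError("t must be non-negative")
    # The off-diagonal probabilities approach the stationary shares
    # rate_01 / total and rate_10 / total as t grows.
    decay = math.exp(-total * t)
    p01 = rate_01 / total * (1.0 - decay)
    p10 = rate_10 / total * (1.0 - decay)
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]


if __name__ == "__main__":
    # Hypothetical rates: state 0 = "strong" preterite, state 1 = "weak" preterite.
    for years in (0, 100, 500, 1200):
        row_strong, row_weak = two_state_transition_probs(0.002, 0.0005, years)
        print(years, row_strong, row_weak)
```

Because elapsed time enters the formula directly, the same pair of rates yields transition probabilities for any interval between observations, which is the sense in which such models treat time as a variable of interest.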
Grammaticalization has proven to be an insightful approach to semantic-morphosyntactic change within and across languages. Many studies, however, rely on assessing the large, obvious differences before and after the change. When investigating burgeoning or ongoing grammaticalization processes, it is notably harder to objectively measure the degree of grammaticalization. One approach is to gauge changes in the well-known 'parameters' of Lehmann, Hopper, and Himmelmann, but this approach is often qualitatively oriented. Quantitative studies mainly rely either on the token frequency of a construction, assuming that grammaticalization is accompanied by a frequency increase, or on tracing the development of two competing constructions, looking at the proportion of their respective token frequencies. In this article, we argue for a wider range of quantitative measures, beyond token frequency, as dependent variables. We will show that these measures can jointly point to subtle ongoing grammaticalization. As a case study, we will focus on Dutch binominals with soort 'sort', a core member of the much-discussed sort-kind-type (SKT) construction in the languages of Europe. Based on a large dataset of over 14,000 instances from the period between 1850 and 1999, we investigate quantitatively measurable changes in the construction's surface behavior (i.e., the gradual loss of the relator van 'of' and increasing restrictions on premodification and pluralization, pointing to a process of 'decategorialization'). In addition, we will also use Gries's deviation of proportions (DP) to gauge the dispersion of soort, a valuable but under-used metric in quantitative studies of grammaticalization.
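The deviation of proportions mentioned above can be illustrated with a minimal sketch. It assumes the basic, unnormalized form of DP: half the sum of the absolute differences between the share of a word's occurrences found in each corpus part and that part's share of the whole corpus. The function name and the toy counts are hypothetical and do not come from the article's dataset.

```python
def deviation_of_proportions(word_counts, part_sizes):
    """Unnormalized DP for one word across corpus parts.

    word_counts: occurrences of the word in each corpus part
    part_sizes:  size of each corpus part (e.g. in tokens), in the same order
    Returns a value from 0 (spread exactly in proportion to part sizes)
    towards 1 (concentrated in a small share of the corpus).
    """
    if len(word_counts) != len(part_sizes):
        raise ValueError("word_counts and part_sizes must have equal length")
    total_word = sum(word_counts)
    total_size = sum(part_sizes)
    if total_word == 0 or total_size == 0:
        raise ValueError("totals must be non-zero")
    return 0.5 * sum(
        abs(count / total_word - size / total_size)
        for count, size in zip(word_counts, part_sizes)
    )


if __name__ == "__main__":
    # Toy data: three equally sized corpus parts.
    print(deviation_of_proportions([10, 10, 10], [100, 100, 100]))  # evenly spread
    print(deviation_of_proportions([30, 0, 0], [100, 100, 100]))    # one part only
```

Lower values indicate a word that is spread evenly over the corpus, so a DP that falls over successive periods is compatible with a form becoming more generally usable.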
In everyday language use, two or more structurally unrelated constructions may occasionally give rise to strings that look very similar on the surface. As a result of this superficial resemblance, a subset of instances of one of these constructions may deviate in the probabilistic preference for either of several possible formal variants. This effect is called ‘constructional contamination’, and was introduced in Pijpops and Van de Velde (2016). Constructional contamination bears testimony to the hypothesis that language users do not always execute a full parse of the utterances they interpret, but instead often rely on ‘shallow parsing’ and the storage of large, unanalyzed chunks of language in memory, as proposed in Ferreira, Bailey and Ferraro (2002), Ferreira and Patson (2007) and Dąbrowska (2014). Pijpops and Van de Velde (2016) investigated a single case study in depth, namely the Dutch partitive genitive. This case study is reviewed, and three new case studies are added, namely the competition between long and bare infinitives, word order variation in verbal clusters, and preterite formation. We find evidence of constructional contamination in all case studies, albeit in varying degrees. This indicates that constructional contamination is not a particularity of the Dutch partitive genitive, but appears to be more widespread, affecting both morphology and syntax. Furthermore, we distinguish between two forms of constructional contamination, viz. first degree and second degree contamination, with first degree contamination producing greater effects than second degree contamination.
Grammaticalization research has led to important insights into the driving processes of innovation and propagation. Yet what has generally been lacking is a principled way of analyzing their interaction. Research into innovation focuses on the role of individual language users and tends to take a more qualitative approach, while propagation is typically studied in terms of the community grammar and tends to be more statistically driven. We propose an approach that bridges the two. Drawing on a much larger historical data set than is commonly done, our study shows how a high-resolution analysis of semantic and morphosyntactic behavior can be married to statistics, resulting in a method that measures the degree of grammaticalization at the level of single attestations. We apply this method to the early grammaticalization of be going to inf, showing how a communal increase breaks down into different rates of change in the run-up to, the middle of, and right after conventionalization. Additionally, we trace lifespan change of individual authors longitudinally. While not robustly in evidence, there are hints of postadolescence reanalysis in the run-up generation, and of increased realization of innovative features in the middle generation.
Structuralism and formal grammar have, in the course of the 20th century, rightfully taken issue with more vague and unfalsifiable just-so stories of some of their predecessors. For all its merits, though, the structuralist-formal strand of linguistics has its drawbacks as well. The classical Saussurean distinction between synchrony and diachrony can be harmful: a purely synchronic description is often inferior to the insight gained from diachrony, not only because grammar is laden with heirlooms and débris of prior structures and because languages draw on a wide variety of pathways to generate new grammar, but also because variation can often only be understood fully in the light of its history. Many cases of synchronic variation are the result of competition between an innovative mutant encroaching on an obsolescent construction. In such cases, the synchronic skew in the proportion of one variant to the other is not arbitrary, but is a reflection of how far the change has progressed. To the extent that one wants to incorporate variation in grammatical description – and there are sound theoretical reasons to do so – the historical perspective is indispensable. In this article four case studies from different corners of Dutch grammar are discussed (on cardinal numerals, on the Big Mess Construction, on bare infinitive complements of auxiliaries, and on the hortative). The case studies together form a plea for the historisation of the science of linguistics, just like biology has been historised, and indeed, as is shown in this article, there are numerous parallels between linguistics and biology.
This paper inquires into the external possessor in West Germanic and Romance. Against other accounts in the literature, it argues that the distribution of the dative external possessor can be explained neither by reference to Standard Average European nor by direct substrate influence. Instead, it argues that its diachronic decline is better explained as the result of increased configurationality or a tighter structure of the noun phrase. Although the emergence of a tight NP structure may itself be traced back to language contact factors, substrate influence on the diachrony of the external possessor is shown to be more indirect than what is suggested in the literature. The increase in configurationality can be considered a case of constructional grammaticalization (i.e. constructionalization), as the slots for determination and modification become progressively more fixed. One of the main claims here is that this grammaticalization process proceeds at different rates in cognate languages.
In this article, we introduce the effect of "constructional contamination". In constructional contamination, a subset of the instances of a target construction deviate in their realization, due to a superficial resemblance they share with instances of a contaminating construction. We claim that this contaminating effect bears testimony to the hypothesis that language users do not always execute a full parse while interpreting and producing sentences. Instead, they may rely on what has been called "shallow parsing", i.e., chunking the utterances into large, unanalyzed exemplars that may extend across constituent borders. We propose several measures to quantify constructional contamination in corpus data. To evaluate these measures, the Dutch partitive genitive is taken under scrutiny as a target construction of constructional contamination. In this case study, it is shown that neighboring constructions play a crucial role in determining the presence or absence of the -s suffix among instances of the partitive genitive. The different measures themselves, however, are not construction-specific, and can readily be used to track constructional contamination in other case studies as well.
This study applies the methodology described by Gries & Deshors (2014) within the framework of the Contrastive Interlanguage Analysis (Granger, 1996) to the partitive genitive inflection in post-quantifier adjectives in the Moroccan Dutch ethnolect. This implies fitting a logistic regression model on data from the complementary ConDiv and Moroccorp corpora to investigate the differences between the L1 variety and the (early L2/2L1) ethnolect variety. It was found that the Moroccan Dutch language users do not differ from 'ordinary' Dutch language users in the realisation of the partitive genitive suffix, neither through an outspoken preference for one of the inflectional variants, nor in the factors determining the alternation. This is considered a rather surprising result, as such differences do exist for a number of other grammatical phenomena (Cornips and Rooij, 2003; Van de Velde and Weerman, 2014). This finding can tell us something about the inflectional status of the partitive genitive. It appears that it is less non-transparent than other quirks in adjectival inflection.
The verbal weak inflection, one of the defining innovations of Proto-Germanic, currently holds a dominant position in the verbal inventory of most remaining Germanic languages. This has not always been the case, though. This paper investigates how the weak inflection could have grown to overthrow its competitor, the strong inflection, even if (i) the strong inflection was still more regular and (ii) the weak inflection had to start from a position vastly inferior in frequency to any strong ablaut class. As opposed to earlier work, which focused on language acquisition in models of iterated learning, our focus lies on language usage, which is why we have composed an agent-based model. This enabled us to test a number of minimal assumptions needed to explain an ascent of the weak inflection, of which several have been proposed in the literature. It was found that the weak inflection's functional advantage of general applicability is sufficient by itself already. That is, the weak dental suffix is in principle applicable to all verbs, while each separate strong ablaut class is not. This is shown to put the strong inflection at a crucial disadvantage, even if (i) the strong system as a whole is applicable to all verbs, and (ii) each separate ablaut class starts out as dominant in both type and token frequency over the weak dental suffix. There is no need to assume that the strong system has irregularized for the weak inflection to get airborne; this irregularization may rather be the result and subsequent catalyst of the rise of the weak inflection.
Corpus study of patterns of (semi-)autonomous dat ‘that’ subordination in Dutch. Discovery of a number of new patterns in Dutch grammar. Construction types treated separately in the few extant accounts are linked together. The types share a semantic–pragmatic value of interpersonal meaning. We propose a diachronic explanation for this shared value in terms of hypoanalysis. This article presents an analysis of autonomous and semi-autonomous subordination patterns in Dutch, some of which have so far gone unnoticed. It proposes a four-way classification of such constructions with the general subordinator dat (‘that’), drawing on Internet Relay Chat corpus data of Flemish varieties. Generalizing over the four types and their various subtypes distinguished here, we find that they all share the semantic property of expressing interpersonal meaning, and most of them also have exclamative illocutionary force. We propose a diachronic explanation for this shared semantic–pragmatic value in terms of the concept of hypoanalysis, and assess to what extent our proposal meshes with extant ellipsis accounts of the patterns studied.
Language is a complex adaptive system. One of the properties of such systems is that they rely on what in biology is called ‘degeneracy’, a technical term for the phenomenon that structurally different elements can fulfill the same function. In this article, it is suggested that degenerate strategies help languages sustain instability in times of syntactic changes. Taking a Construction Grammar approach, it is shown that so-called horizontal relations in constructional networks - in which related constructions in a functional domain are mutually defined by differential values they take on a set of grammatical parameters - can be transmitted through time, even if the specific grammatical parameters on which they are defined are under threat. Evidence is drawn from two different domains: argument realisation in experience processes and adverbial subordination.
Journal of Research Design and Statistics in Linguistics and Communication Science, 2019
Within usage-based theory, notably in construction grammar though also elsewhere, the role of the... more Within usage-based theory, notably in construction grammar though also elsewhere, the role of the lexicon and of lexically-specific patterns in morphosyntax is well recognized. The methodology, however, is not always sufficiently suited to get at the details, as lexical effects are difficult to study under what are currently the standard methods for investigating grammar empirically. In this short article, we propose a method from machine learning: regularized regression (Lasso) with k-fold cross-validation, and compare its performance with a Distinctive Collexeme Analysis.
Construction grammar organizes its basic elements of description, its constructions, into network... more Construction grammar organizes its basic elements of description, its constructions, into networks that range from concrete, lexically-filled constructions to fully schematic ones, with several levels of partially schematic constructions in between. However, only few corpus studies with a constructionist background take this multi-level nature fully into account. In this paper, we argue that understanding language variation can be advanced considerably by systematically formulating and testing hypotheses at various levels in the constructional network. To illustrate the approach, we present a corpus study of the Dutch naar-alternation. It is found that this alternation primarily functions at an intermediate level in the constructional network.
Verslagen en Mededelingen van de KANTL 130(1): 5-23, 2020
In this opinion article we look at what the field of linguistics has achieved over the past decad... more In this opinion article we look at what the field of linguistics has achieved over the past decades. We see four major breakthroughs, but it is striking that those breakthroughs either do not bring real new ideas, but rather refer back to old views, or concern purely methodological innovations. And if there is real innovation, it turns out to be brought by outsiders: psychologists, anthropologists, neuroscientists, computer scientists and physicists, who have become interested in language. Linguistics in the classical sense is in crisis. This crisis does not come out of the blue, but is the result of the end of the Renaissance, a culture period in which language took central stage, and had already been predicted by Jürgen Trabant (2003). All this does not entail that current-day research into language is uninteresting. Turning away from a Renaissance perspective and joining forces with other scientific fields brings advantages: less inbreeding and theory-internal stories, as well as an open-minded view on old issues.
Historical linguistics has witnessed an upsurge in quantitative corpus studies. The bulk of these... more Historical linguistics has witnessed an upsurge in quantitative corpus studies. The bulk of these studies involve the use of regression modelling. We point out a number of potential problems with this approach, and offer an alternative. For a multi-state language change, we propose a Markov model in continuous time. The major advantage of this technique, which has been used in medical contexts, is that it is especially geared towards dealing with time as a variable of interest, while it still allows one to look at the effect of several covariates. In this proof-of-concept article, we look at morphological shifts in preterites in Dutch, from 800AD to 2000AD (n = 14,314). This is a well-researched field, allowing us to investigate the performance of the multi-state Markov model.
Grammaticalization has proven to be an insightful approach to semantic-morphosyntactic change wit... more Grammaticalization has proven to be an insightful approach to semantic-morphosyntactic change within and across languages. Many studies, however, rely on assessing the large, obvious differences before and after the change. When investigating burgeoning or ongoing grammaticalization processes, it is notably harder to objectively measure the degree of grammaticalization. One approach is to gauge changes in the well-known 'parameters' of Lehmann, Hopper, and Himmelmann, but this approach is often qualitatively oriented. Quantitative studies mainly rely either on token frequency of a construction, assuming that grammaticalization is accompanied by a frequency increase, or by tracing the development of two competing constructions, looking at the proportion of their respective token frequencies. In this article, we argue for a wider range of quantitative measures, beyond token frequency, as dependent variables. We will show that these measures can jointly point to subtle ongoing grammaticalization. As a case study, we will focus on Dutch binominals with soort 'sort', a core member of the much-discussed sort-kind-type (SKT) construction in the languages of Europe. Based on a large dataset of over 14,000 instances from the period between 1850 and 1999, we investigate quantitatively measurable changes in the construction's surface behavior (i.e., the gradual loss of the relator van 'of' and increasing restrictions on premodification and pluralization, pointing to a process of 'decategorialization'). In addition, we will also use Gries's deviation of proportions (DP) to gauge the dispersion of soort, a valuable but under-used metric in quantitative studies of grammaticalization.
In every-day language use, two or more structurally unrelated constructions may occasionally give... more In every-day language use, two or more structurally unrelated constructions may occasionally give rise to strings that look very similar on the surface. As a result of this superficial resemblance, a subset of instances of one of these constructions may deviate in the probabilistic preference for either of several possible formal variants. This effect is called ‘constructional contamination’, and was introduced in Pijpops and Van de Velde (2016). Constructional contamination bears testimony to the hypothesis that language users do not always execute a full parse of the utterances they interpret, but instead often rely on ‘shallow parsing’ and the storage of large, unanalyzed chunks of language in memory, as proposed in Ferreira, Bailey and Ferraro (2002), Ferreira and Patson (2007) and Dąbrowska (2014). Pijpops and Van de Velde (2016) investigated a single case study in depth, namely the Dutch partitive genitive. This case study is reviewed, and three new case studies are added, namely the competition between long and bare infinitives, word order variation in verbal clusters, and preterite formation. We find evidence of constructional contamination in all case studies, albeit in varying degrees. This indicates that constructional contamination is not a particularity of the Dutch partitive genitive, but appears to be more wide-spread, affecting both morphology and syntax. Furthermore, we distinguish between two forms of constructional contamination, viz. first degree and second degree contamination, with first degree contamination producing greater effects than second degree contamination.
Grammaticalization research has led to important insights into the driving processes of innovatio... more Grammaticalization research has led to important insights into the driving processes of innovation and propagation. Yet what has generally been lacking is a principled way of analyzing their interaction. Research into innovation focuses on the role of individual language users and tends to take a more qualitative approach, while propagation is typically studied in terms of the community grammar and tends to be more statistically driven. We propose an approach that bridges the two. Drawing on a much larger historical data set than is commonly done, our study shows how a high-resolution analysis of semantic and morphosyntactic behavior can be married to statistics, resulting in a method that measures the degree of grammaticalization at the level of single attestations. We apply this method to the early grammaticalization of be going to inf, showing how a communal increase breaks down into different rates of change in the run-up to, the middle of, and right after conventionalization. Additionally, we trace lifespan change of individual authors longitudinally. While not robustly in evidence, there are hints of postadolescence reanalysis in the run-up generation, and of increased realization of innovative features in the middle generation.
Structuralism and formal grammar have, in the course of the 20 th century, rightfully taken issue... more Structuralism and formal grammar have, in the course of the 20 th century, rightfully taken issue with more vague and unfalsifiable just-so stories of some of their predecessors. For all its merits, though, the structuralist-formal strand of linguistics has its drawbacks as well. The classical Saussurean distinction between synchrony and diachrony can be harmful: a purely synchronic description is often inferior to the insight gained from diachrony, not only because grammar is laden with heirlooms and débris of prior structures and because languages draw on a wide variety of pathways to generate new grammar, but also because variation can often only be understood fully in the light of its history. Many cases of synchronic variation are the result of competition between an innovative mutant encroaching on an obsolescent construction. In such cases, the synchronic skew in the proportion of one variant to the other is not arbitrary, but is a reflection of how far the change has progressed. To the extent that one wants to incorporate variation in grammatical description – and there are sound theoretical reasons to do so – the historical perspective is indispensable. In this article four case studies from different corners of Dutch grammar are discussed (on cardinal numerals, on the Big Mess Construction, on bare infinitive complements of auxiliaries, and on the hortative). The case studies together form a plea for the historisation of the science of linguistics, just like biology has been historised, and indeed, as is shown in this article, there are numerous parallels between linguistics and biology.
This paper inquires into the external possessor in West Germanic and Romance. Against other accou... more This paper inquires into the external possessor in West Germanic and Romance. Against other accounts in the literature, it argues that the distribution of the dative external possessor can be explained neither by reference to Standard Average European nor by direct substrate influence. Instead, it argues that its diachronic decline is better explained as the result of increased configurationality or a tighter structure of the noun phrase. Although the emergence of a tight NP structure may itself be traced back to language contact factors, substrate influence on the diachrony of the external possessor is shown to be more indirect than what is suggested in the literature. The increase in configurationality can be considered a case of constructional grammaticalization (i.e. constructionalization), as the slots for determination and modification become progressively more fixed. One of the main claims here is that this grammaticalization process proceeds at different rates in cognate languages.
In this article, we introduce the effect of " constructional contamination ". In constructional c... more In this article, we introduce the effect of " constructional contamination ". In constructional contamination, a subset of the instances of a target construction deviate in their realization, due to a superficial resemblance they share with instances of a contaminating construction. We claim that this contaminating effect bears testimony to the hypothesis that language users do not always execute a full parse while interpreting and producing sentences. Instead, they may rely on what has been called " shallow parsing " , i. e., chunking the utterances into large, unanalyzed exemplars that may extend across constituent borders. We propose several measures to quantify constructional contamination in corpus data. To evaluate these measures, the Dutch partitive genitive is taken under scrutiny as a target construction of constructional contamination. In this case study, it is shown that neighboring constructions play a crucial role in determining the presence or absence of the-s suffix among instances of the partitive genitive. The different measures themselves, however, are not construction specific , and can readily be used to track constructional contamination in other case studies as well.
This study applies the methodology described by Gries & Deshors (2014) within the framework of th... more This study applies the methodology described by Gries & Deshors (2014) within the framework of the Contrastive Interlanguage Analysis (Granger, 1996) to the partitive genitive inflection in post-quantifier adjectives in the Moroccan Dutch ethnolect. This implies fitting a logistic regression model on data from the complementary ConDiv and Moroccorp corpora to investigate the differences between the L1 variety and the (early L2/2L1) ethnolect variety. It was found that the Moroccan Dutch language users do not differ from 'ordinary' Dutch language users in the realisation of the partitive genitives suffix, neither through an outspoken preference for one of the inflectional variants, nor in the factors determining the alternation. This is considered a rather surprising result, as such differences do exist for a number of other grammatical phenomena (Cornips and Rooij, 2003; Van de Velde and Weer-man, 2014). This finding can tell us something about the inflectional status of the partitive genitive. It appears that it is less non-transparent than other quirks in adjectival inflection.
The verbal weak inflection, one of the defining innovations of Proto-Germanic, currently holds a dominant position in the verbal inventory of most remaining Germanic languages. This has not always been the case, though. This paper investigates how the weak inflection could have grown to overthrow its competitor, the strong inflection, even if (i) the strong inflection was still more regular and (ii) the weak inflection had to start from a position vastly inferior in frequency to any strong ablaut class. As opposed to earlier work, which focused on language acquisition in models of iterated learning, our focus lies on language usage, which is why we have composed an agent-based model. This enabled us to test a number of minimal assumptions needed to explain an ascent of the weak inflection, of which several have been proposed in the literature. It was found that the weak inflection's functional advantage of general applicability is already sufficient by itself. That is, the weak dental suffix is in principle applicable to all verbs, while each separate strong ablaut class is not. This is shown to put the strong inflection at a crucial disadvantage, even if (i) the strong system as a whole is applicable to all verbs, and (ii) each separate ablaut class starts out as dominant in both type and token frequency over the weak dental suffix. There is no need to assume that the strong system has irregularized for the weak inflection to get airborne; this irregularization may rather be the result and subsequent catalyst of the rise of the weak inflection.
Corpus study of patterns of (semi-)autonomous dat ‘that’ subordination in Dutch. Discovery of a number of new patterns in Dutch grammar. Construction types treated separately in the few extant accounts are linked together. The types share a semantic–pragmatic value of interpersonal meaning. We propose a diachronic explanation for this shared value in terms of hypoanalysis. This article presents an analysis of autonomous and semi-autonomous subordination patterns in Dutch, some of which have so far gone unnoticed. It proposes a four-way classification of such constructions with the general subordinator dat (‘that’), drawing on Internet Relay Chat corpus data of Flemish varieties. Generalizing over the four types and their various subtypes distinguished here, we find that they all share the semantic property of expressing interpersonal meaning, and most of them also have exclamative illocutionary force. We propose a diachronic explanation for this shared semantic–pragmatic value in terms of the concept of hypoanalysis, and assess to what extent our proposal meshes with extant ellipsis accounts of the patterns studied.
Language is a complex adaptive system. One of the properties of such systems is that they rely on what in biology is called ‘degeneracy’, a technical term for the phenomenon that structurally different elements can fulfill the same function. In this article, it is suggested that degenerate strategies help languages sustain instability in times of syntactic changes. Taking a Construction Grammar approach, it is shown that so-called horizontal relations in constructional networks - in which related constructions in a functional domain are mutually defined by differential values they take on a set of grammatical parameters - can be transmitted through time, even if the specific grammatical parameters on which they are defined are under threat. Evidence is drawn from two different domains: argument realisation in experience processes and adverbial subordination.
In recent years, theoretical work in construction grammar has often focused on links between constructions and the design of the constructional network or constructicon (Wellens 2011; Van de Velde 2014; Diessel 2015). Regarding these networks, one of the issues on which we have managed to reach consensus is the need for a vertical dimension, ranging from fully abstract to lexically specified constructions (Croft 2001: 25–29; Goldberg 2003; Fried and Östman 2004: 15–18). Still, corpus research only rarely takes this dimension into account explicitly and often restricts itself to one particular horizontal level in the network (e.g. Pijpops & Speelman 2017; for exceptions, see among others Boas 2010; Wible & Tsao 2017). While such an approach is certainly justifiable, we will argue that neglecting the multi-level nature of the constructicon has led to three problems of constructional semantics. At least two of these, which will be called the Problem of Prediction and the Problem of Proliferation, have already been noted in earlier studies. The first pertains to the formulation of specific predictions regarding low-level constructions based on only high-level, abstract semantic notions such as affectedness, involvement or agency (see Lenci 2012: 13–15, and also Broccias 2001; Perek 2015: 90–144). For example, when discussing the influence of affectedness on the argument variation of the Italian verb rimproverare ‘reproach’, Lenci (2012: 14) notes that “this interpretation would require us to stretch the meaning of affectedness well beyond its standard (fairly high) vagueness and polysemy, thereby impairing its reliability as a truly explanatory notion”. The second problem relates to positing ever more concrete constructions, which may draw the critique of non-parsimony (Culicover and Jackendoff 2005; Traugott and Trousdale 2013: 5–11).
We will attempt to demonstrate that these problems are caused by a third, more fundamental problem, named the Problem of Precedence. This problem asks at which level in the constructional network speakers primarily employ a construction to communicate meaning, optimize information structure or express lectal distinctions. Next, we will argue that this concern does not constitute a theoretical issue, but rather an empirical question. Finally, we introduce a methodological approach to deal with this question. To illustrate the approach, we employ as a case study the alternation between the Dutch transitive and prepositional argument constructions, as in (1)-(2). We identify a seemingly motley collection of 102 verbs exhibiting the alternation and map out the relevant region of the constructional network. Fully abstract argument constructions are first put under scrutiny, after which we continue on to more lexically specific constructions. The goal of this procedure is to identify the precedence level at which the alternation is predominantly active, thus solving the Problem of Precedence. It will be demonstrated that doing so will also enable us to tackle both the Problems of Prediction and Proliferation.
(1) Minister Vandenbroucke zoekt (naar) een oplossing. ‘Secretary Vandenbroucke is searching (for) a solution.’
(2) (Met) hete koffie gemorst. ‘Spilled hot coffee.’
Lectal contamination is the language-external counterpart of what has been described as constructional contamination (Pijpops & Van de Velde 2016). In constructional contamination, various superficially similar constructions within one and the same language variety exert an influence on each other, causing lexically-specific preferences for either of two morphological or syntactic variants, depending on which lexemes the superficially related constructions share. In lectal contamination, by contrast, lexically-specific preferences may arise due to language contact with another variety that shares the same construction. In particular, lexemes that occur more often in one variety will come to prefer the morphosyntactic variant that is preferentially used in that particular variety, even in the speech of language users of a different variety. As a result, what is essentially a language-external factor conditioning a particular form of linguistic variation may become internalized. As a case study, we zoom in on the Dutch partitive genitive construction. This construction exhibits variation between a form with and without -s ending, as in (1) and (2). The form with the -s ending is predominant in the Netherlandic regiolect, while the form without -s constitutes a marker of the Belgian regiolect (Pijpops & Van de Velde 2014). Because of this distinction between the Netherlands and Belgium, i.e. a language-external factor, partitive genitive types that feature typically Netherlandic lexemes, such as (1), more often appear in the variant with -s, whereas those that contain typically Belgian lexemes, such as (2), will more often appear without the -s. Our hypothesis was that these lexical preferences got entrenched, so that Belgian speakers using Netherlandic lexemes would import the Netherlandic morphological variant and vice versa. 
In other words: while the formal realisation is straightforwardly regionally stratified, we expect these lexical preferences to hold even within the Netherlandic and Belgian regiolects.
(1) Iets bijzonder(s) ‘Something remarkable’
(2) Iets speciaal(s) ‘Something special’
We tested this prediction on 3018 manually checked observations from the ConDiv corpus of written Dutch (Grondelaers et al. 2000) and found it to be confirmed, even when controlling for all other variables known to influence -s omission. Furthermore, we drew geographically-tagged data from Twitter, totaling 1299 manually checked instances, to replicate this finding and to investigate the geographical spread of both lectal contamination and the partitive genitive variation. The effect of lectal contamination can only be explained if we have a sufficiently precise account of how individual speakers operate in language contact situations (Weinreich, Labov & Herzog 1968). If language contact can, in this way, cause lectal variation to produce lect-internal effects, then a variationist description of a particular regio-, dia-, socio- or ethnolect crucially depends on an understanding of language contact.
1. Introduction Lieberman et al. (2007) aimed to quantify the evolutionary dynamics of language by investigating the rise of the English regular past tense inflection, which they equated with the weak -ed suffix. Yet, their bold conclusion that “the half-life of an irregular verb scales as the square root of its usage frequency: a verb that is 100 times less frequent regularizes 10 times as fast” (Lieberman et al., 2007, p.713) has successively attracted criticism from scholars in the fields of historical and evolutionary linguistics. First, Carroll, Svare, & Salmons (2012) showed that this constant regularization rate does not hold true for the closely-related German language. Second, Cuskley et al. (2014) found that the rise of the English weak -ed suffix is not driven by forces endogenous to language, such as analogy, but rather by external forces, such as new verbs entering the language through language contact. We will reassess the constant-rate controversy by (i) extending the methodological scope with agent-based modeling, and (ii) extending the range of languages beyond the German-English distinction by adding Dutch. Our results show that the constant rate does not hold. If language change is co-determined by external forces resulting in languages adapting to their niche (Lupyan & Dale 2016), this is exactly what one would expect, since English, Dutch and German have endured external pressures to a different degree. We will focus on the influence of demographic change. In particular, we investigate the growth of cities and the resulting koineization due to migration in the three language areas since the Middle Ages. The three different degrees of urbanization have led to different degrees of dialect contact, which could in turn, as we will argue, lead to different regularization rates. 
To support this claim, we will present both empirical evidence from linguistic and demographic databases, as well as the results of a computational simulation.
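The square-root claim quoted from Lieberman et al. (2007) can be checked with a line of arithmetic. The sketch below assumes only the stated proportionality (half-life proportional to the square root of usage frequency); the function name `regularization_speedup` is hypothetical and is not taken from any of the cited studies.

```python
import math


def regularization_speedup(times_less_frequent):
    """Factor by which a verb regularizes faster than a reference verb
    that is `times_less_frequent` times more frequent.

    Assumption: half-life is proportional to sqrt(usage frequency).
    """
    # half_life_rare / half_life_ref = sqrt(1 / k), so the rate ratio is sqrt(k)
    return math.sqrt(times_less_frequent)


print(regularization_speedup(100))  # 10.0
```

If the regularization rate is not constant across languages, as argued here, this ratio would not carry over unchanged from English to German or Dutch.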
2. Empirical data 2.1. Linguistic data To obtain a clear picture of the linguistic situation, we included the data on English from Lieberman et al. (2007) and the data on German from Carroll et al. (2012), and complemented these with our own Dutch data. This enables us to track the development of the past tense system of these three languages over a 1000 year period (800-1800).
2.2. Demographic data For the demographic data, we make use of the databases of Bairoch et al. (1988), De Vries (1984), and Mitchell (1998). In particular, we compare the population growth of the largest cities in the English, Dutch and German language areas in each particular time period from 800-1800. Historical research has shown that the exponential growth of urban population cannot be reduced to natural growth, but is driven by immigration as well, both of foreigners and through a rural exodus from the larger agglomeration, leading to dialect contact. We then observed correlations between the success of the weak inflection and the amount of demographic upheaval.
3. Simulation A correlation between a demographic and a linguistic trend does not automatically entail a causation between the former and the latter, however. To further substantiate our claim, we therefore turn to an agent-based computer simulation. In this simulation, agents store exemplars or tokens of what they hear (cf. Pijpops et al., 2015), rather than type states (cf. Colaiori et al., 2015), and use these to produce novel forms. We find that (i) the weak inflection does not require special status as the single regular inflection in order to explain the tendencies observed in reality; (ii) replacement of verbs can indeed cause a continued rise of the weak inflection, even after a stable equilibrium between weak and strong verbs has emerged, confirming Cuskley et al. (2014); and most importantly (iii) if our current understanding of language, as implemented in the simulation, is correct, demography does indeed affect the rise of the weak inflection.
Present-day Dutch has a vestigial partitive genitive morpheme. Adjectives take the genitive -s morpheme when they are used as a dependent of a quantifier (Haeseryn et al. 1997: 863; Broekhuis 2013: 420-426). This is illustrated in (1). The construction comes in two variants: either with an overt -s suffix, or without the suffix.
(1) iets bijzonder(-s) something special-GEN ‘something special’
While the two variants do not show any observable semantic difference, Pijpops & Van de Velde (2014) applied mixed-model logistic regression and found that the expression of the -s is probabilistically determined by a number of factors. While overall, the [+s] variant is more frequent, the [-s] variant is also fairly common, and is more likely to occur (i) in informal registers, (ii) in low-frequency phrases, and (iii) in the south of the language area (Belgium). There also is a strong main effect for the [-s] variant for adjectives that occurred in superficially similar non-partitive constructions. This is illustrated in (2) and (3): though similar in surface form, the context makes clear that (2) is not a partitive construction. The absence of the -s morpheme then spills over to genuine partitives like (3) (see Pijpops & Van de Velde, forthc. for extensive explanation on what they call ‘constructional contamination’).
(2) iets verkeerd geïnterpreteerd [something]NP [[wrongly]AdvP interpreted]
(3) iets verkeerd gegeten [something wrong]NP eaten
This suggests that, in line with exemplar-based theories of language, prior use of constructions leaves a (context-rich) trail in the mind of the language users. In this talk, we want to see whether the same effect also occurs with regard to the regional variable. 
Can the regional provenance of the lexemes inserted in a construction exert an influence on the morphological realisation of the target construction, even if the construction is used by language users with a different regiolectal background? In our study southern speakers have a stronger tendency to drop the genitive -s, but less so when they are using ‘northern’ lexemes, and vice versa. This effect holds even if the regional provenance of the lexemes is subtle, and unlikely to be a shibboleth of a regionally recognisable type of speech. Furthermore, we see that while the analogical pull of lexemes with a regional profile is felt everywhere in the language area, the effect is more blurry in cities near the border of the two regions and more clear in the core areas. This finding shows that not only the language-internal context of prior instances is stored in memory, but the ‘language-external’, lectal context as well.
From the earliest attested stages on, Germanic languages have at their disposal two competing strategies for building preterites. One strategy, exemplified by sing-song, is called the strong inflection. It relies on root apophony (ablaut), and is a reanalysis and extension of an earlier Indo-European aspectual system (Prokosch 1939; Lass 1990). The other strategy, exemplified by work-worked, is called the weak inflection. It does not use apophony, but suffixation, and finds its origin in the morphologisation of an Indo-European stem *dheh1/*dhoh1 (‘do’) added to the verb, eventually turning into a dental suffix (Ball 1968; Tops 1974; Bailey 1997; Hill 2010), though other sources have contributed as well (Heath 1998; Ringe 2007; Hill 2010). Setting the emergence of a third strategy later in Germanic, namely the analytic perfect (exemplified in Afrikaans werk – het gewerk, lit. ‘has worked’) aside, it has often been observed that despite occasional shifts in the opposite direction, Germanic displays a long-term drift in which the weak inflection takes the upper hand at the expense of the strong inflection, although the strong inflection remains remarkably resilient, and still has not fully succumbed to the overall weakening trend (Van Haeringen 1940). Recent years have seen publications in which this ‘weakening’ drift is cast in quantitative terms. Lieberman et al. (2007) notice that in English, the weakening of the verbs follows a constant rate through time, is only dependent on the frequency of the verb, and neatly scales proportionally to the square root of the frequency of verbs. However, Carroll et al. (2012) replicated the study for German and found no such constant rate, hence casting doubt on the universality of the mathematical regularity that seemed to govern the weakening. In our talk, we replicate the Lieberman et al. 
and the Carroll study for Dutch, allowing a comparison between the three languages in the Van Haeringen (1956) tradition. Our results confirm Carroll et al. (2012)’s critique on the constant rate. Carroll et al. suggested that underlying the differences between English and German are demographic factors, but they left it to future research to actually dig deeper into the demographic history. In our talk, we pick up this thread and couple the weakening with historical demography. Our results indicate that the differences between these three big West-Germanic languages indeed seem related to population effects. Evidence is drawn from grammars and historical demographic databases. We further support our claims with agent-based computer simulation, extending earlier work by Pijpops et al. (2015).
Processing shapes grammatical organisation, including asymmetric coding with a marked vs. unmarked alternance (Hawkins 2004), but it is unclear whether the processing considerations at issue are those of speakers or of addressees. Hawkins’s model is framed as benefiting the addressee, though he remarks that it equally benefits the speaker (2004: 24-25). Glossing over parsing and production is legitimate as long as speakers’ and addressees’ motivations are aligned, but this is not always the case. The idea that language has to seek an optimal balance between the often opposite demands of both speech act participants is old, harking back at least to Georg von der Gabelentz in the 19th century. So eventually, we will have to decide which of the two speech act participants has the upper hand in the processing-driven organisation of grammar. On the one hand, there is evidence for an addressee-oriented view: Hawkins’s ‘Minimize Domains’ principle, stating that the syntactic structure should be recognisable in as short a span as possible, benefits the addressee, as the speaker is never unsure about the syntactic structure. Likewise, Rohdenburg’s (1996) Complexity Principle stating that in complex structures more explicit encoding is used is only beneficial to the addressee. If the structure is already complex, adding extra grammatical encoding arguably burdens the speaker’s performance even more. On the other hand, it is not self-evident that speakers should be concerned with their addressees’ needs forfeiting their own. Speaker’s altruism is evolutionarily implausible (Kirby 1999). Levinson (2000) also stresses the speaker’s needs in his neo-Gricean approach. 
As Levinson points out, the bottleneck in human communication is at the production side: decoding is much faster and more effortless than encoding (Levinson 2000: 28), so that taking inferential short-cuts to add layers of meaning on top of what is truth-conditionally encoded is especially helpful for the speaker. Adding extra material in the overtly coded variant in an alternance (e.g. zero- vs. that-complementation in English) goes against the rationale to prioritize production efficiency over parsing speed. Hawkins’s principle ‘Minimize Forms’ also seems first and foremost to serve the speaker’s comfort. True, reducing forms also adds to the parsing effort, as the form-function pair of the extra encoding has to be stored in the hearer’s brain, but given the ease with which inferencing is accomplished (Levinson 2000), and given the vast storage capacities of the human mind (Dąbrowska 2014: 626), the extra speaker’s efforts outweigh the extra addressees’ efforts. In our paper, we will adduce quantitative data from a close-up case study that can shed light on the debate over speaker vs. addressee processing. The case study deals with the direct object vs. prepositional object alternance in Dutch verbs, like zoeken (naar) ‘search (for)’. A corpus study reveals that the prepositional variant is used more often when the object is syntactically complex. This can be explained in two ways: first, the preposition can function as a signpost to help the addressee decode the message. This would be in line with Rohdenburg’s Complexity Principle, and would point to a hearer-driven processing account. Second, the use of a preposition allows the object to be extraposed (or ‘exbraciated’). This would be beneficial to the speaker, who can postpone the expression of the complex object to the end of the clause, when all other issues have been resolved, avoiding centre-embedding. On the basis of corpus investigation, we will tease apart both explanations. 
Of special interest are cases such as (1), where the head noun of the object is not extraposed (to the right of gezocht ‘search-PST.PTCP’), but the submodifying complement clause is. If the use of the prepositional variant is especially favoured in this context, this would be an argument for the first explanation. Here, the processing difficulty of the discontinuous object may be alleviated for the hearer by adding the extra signpost.
The Germanic languages boast two morphological strategies for past tense formation. The strong inflection is based on an ablaut in the verb’s stem (e.g. sing ~ sang, drive ~ drove) and is the oldest, largely descendant from the Indo-European mother tongue (Harbert 2007). The weak inflection, by contrast, adds a dental suffix to the stem (e.g. laugh ~ laughed), and constitutes a Proto-Germanic innovation. In the history of the Germanic languages, this dental suffix has had considerable success in taking over past tense formation, to the detriment of the strong inflection (Harbert 2007; Lieberman et al. 2007; Cuskley et al. 2014). To account for this success, three explanations are given in the literature (Ball 1968: 164; Bailey 1997: 7–8). First, while each separate strong ablaut class is only applicable to a subset of verbs, the weak suffix can, in principle, be attached to any verb indiscriminately. Second, some verbs escaped ablaut formation altogether, for instance because they had a vowel that fitted in none of the ablauting patterns. Such verbs would then create a safe nest for the nascent weak inflection, free of competing strong forms. Third, the strong inflection was ravaged by the effects of several sound laws, which severely undermined its transparency. This would have rendered it vulnerable to competition from the seemingly more transparent weak inflection. We will claim that the first explanation is already sufficient to account for the rise of the weak inflection. Moreover, it may explain why the weak inflection first took over the low frequency verbs and low frequency ablaut classes (Carroll, Svare and Salmons 2012). Since we then no longer need the irregularization of the strong inflection to explain these effects, this irregularization may be the result of the rise of the weak inflection, rather than its cause. 
To support these claims, we have built an agent-based simulation. In this simulation, computational agents communicate with each other by referring to past events, thereby employing either the strong or weak inflection. The agents preferably use the forms that they hear most often from their fellow agents. The simulation was composed in Babel2, a framework for building agent-based models of language evolution (Steels 2012). In the simulation, the only difference between the strong and weak inflection lies in the first explanation given above. Any other possible advantages for the weak inflection were excluded from the model. Under such conditions, it can be observed that a rise of the weak inflection will come to pass in both type and token frequency, accompanied by a Conserving Effect of both the verbs and the ablaut classes (Bybee 2006; Carroll, Svare and Salmons 2012). This rise even takes place if the weak dental suffix starts out as inferior in both type and token frequency to any individual strong ablaut class.
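To make the setup concrete, here is a toy sketch in Python of a simulation of this general kind. It is not the Babel2 model described above. Everything in it is an assumption made for illustration: the names `Agent` and `simulate`, all parameter values, the proportional sampling rule, the 1/rank verb frequencies, and the replacement of agents after a fixed number of heard tokens. The one ingredient taken from the text is that strong evidence generalizes only within an ablaut class, while weak evidence generalizes across all verbs.

```python
import random
from collections import Counter


class Agent:
    """Stores heard tokens (exemplars) rather than one fixed form per verb."""

    def __init__(self):
        self.tokens = Counter()        # (verb, form) -> tokens heard
        self.class_strong = Counter()  # ablaut class -> strong tokens heard
        self.weak_total = 0            # weak tokens heard, over all verbs
        self.age = 0

    def hear(self, verb, verb_class, form):
        self.tokens[(verb, form)] += 1
        if form == "strong":
            self.class_strong[verb_class] += 1
        else:
            self.weak_total += 1

    def produce(self, verb, verb_class, rng):
        strong = self.tokens[(verb, "strong")]
        weak = self.tokens[(verb, "weak")]
        if strong + weak == 0:
            # Unknown verb: generalize. Strong evidence counts only within
            # the verb's own class; weak evidence counts across all verbs.
            strong = self.class_strong[verb_class]
            weak = self.weak_total
        if strong + weak == 0:
            return rng.choice(["strong", "weak"])  # no evidence at all
        # Assumed rule: sample in proportion to what has been heard.
        return "weak" if rng.random() < weak / (strong + weak) else "strong"


def simulate(n_agents=20, n_verbs=30, n_classes=5, epochs=10,
             talks_per_epoch=2000, lifespan=400, seed=1):
    """Return the share of weak productions per epoch (toy model)."""
    rng = random.Random(seed)
    verbs = list(range(n_verbs))
    weights = [1.0 / (rank + 1) for rank in verbs]  # assumed 1/rank frequencies
    verb_class = [v % n_classes for v in verbs]

    def founder():
        # Founders know every verb as strong, plus one rare weak token.
        agent = Agent()
        for v in verbs:
            for _ in range(max(1, round(20 * weights[v]))):
                agent.hear(v, verb_class[v], "strong")
        agent.hear(verbs[-1], verb_class[-1], "weak")
        return agent

    agents = [founder() for _ in range(n_agents)]
    weak_share = []
    for _ in range(epochs):
        weak_count = 0
        for _ in range(talks_per_epoch):
            s, h = rng.sample(range(n_agents), 2)  # speaker, hearer
            v = rng.choices(verbs, weights=weights)[0]
            form = agents[s].produce(v, verb_class[v], rng)
            agents[h].hear(v, verb_class[v], form)
            weak_count += form == "weak"
            agents[h].age += 1
            if agents[h].age > lifespan:
                agents[h] = Agent()  # assumed turnover: a blank newcomer
        weak_share.append(weak_count / talks_per_epoch)
    return weak_share
```

Calling `simulate()` returns one weak-share value per epoch. Whether and how fast that share rises in this toy version depends on the assumed parameter values, so it should not be read as reproducing the results reported above.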
1. A hostile environment In most present-day Germanic languages, the weak inflection (work-worked) offers a well-established and regular strategy for past tense formation. In contrast, the strong inflection (sing-sang) currently seems no more than a diminishing rubble of sub-rules and irregularities (Harbert, 2007, p. 277). Still, things were once different. Language reconstruction shows that around the time of the birth of the weak inflection, the strong inflection is likely to have been both clearly regular and dominant in frequency (Bailey, 1997). To explain the conundrum of how a nascent weak dental suffix could have possibly gained the upper hand in such a hostile environment, researchers usually refer to sound changes undermining the regularity of the strong system (Bailey, 1997, p. 17; Ball, 1968, p. 164). We will claim that this assumption is not needed. Instead, the rise of the weak inflection may be initially caused by nothing more than its general applicability, i.e. its ability to be applied, in principle, to any verb. In addition, this general applicability proves capable of explaining that the rise of the weak inflection (i) first affects low frequency verbs, and only later high frequency verbs, and (ii) more heavily affects particular ablaut classes than others. In concert, these effects may create the conditions in which a perfectly functioning strong ablaut system can be surrendered to the disruptive influence of sound changes without causing a problem to the language users. 2. Model design and behavior We ran an agent-based model (Gilbert, 2008), containing the following features:
• There are no irregular verbs, nor ways for verbs or ablaut classes to become irregular. • The weak dental suffix starts out inferior in both type and token frequency to each individual strong ablaut class. • All verbs in the model can be conjugated both strongly and weakly. • The only difference between the strong ablaut classes and weak dental suffix lies in the dental suffix’s general applicability. • The agents do not show any (socially attributed) preference for one of the variants, neither in acquisition nor use. Instead, the simply prefer the variant that they more often hear. • Agents age and are gradually replaced. • The verbs show a realistic, Zipfian frequency distribution (Zipf, 1932).
Under these conditions, it is shown that a gradual rise of the weak dental suffix will take place, first attacking the low-frequency verbs and the low-frequency ablaut classes. Highly frequent ablaut classes prove capable of protecting their low-frequent members against weakening. These effects emerge independently of specific parameter settings. Acknowledgements We would like to thank Remi van Trijp for useful comments about the model.
In an inconspicuous corner of Dutch grammar, one may find adjectives receiving -s inflection (1). However, this -s, a remnant of the partitive genitive, may also disappear (2).
(1) wat zinnig-s
‘something sensible’
(2) iets wit
‘something white’
Earlier research has revealed the precise intra- and extra-linguistic contexts in which this -s omission is taking place (Pijpops & Van de Velde 2014). What remains unclear, however, is how second language speakers of Dutch handle this peculiar inflection. Do they generalize one variant, as is often the case with prenominal adjectival inflection (Weerman 2003, Blom et al. 2008, Ruette & Van de Velde 2013: 468-471, Van de Velde & Weerman 2014: 117-119)? Or are they capable of picking up exactly when to place the -s? To answer these questions, we apply the regression-based methodology of Gries & Deshors (2014) to first and second language chatters of Dutch. We believe the results not only provide information on second language acquisition of this postnominal inflection, but also shed light on its current and future linguistic status.
This volume is the first collection of papers that is exclusively dedicated to the concept of exaptation, a notion from evolutionary biology that was famously introduced into linguistics by Roger Lass in 1990. The past quarter-century has seen a heated debate on the properties of linguistic exaptation, its demarcation from other processes of linguistic change, and indeed the question of whether it is a useful concept in historical linguistics at all. The contributions in the present volume reflect these diverging points of view. Along with a comprehensive introduction, covering the history of the notion of exaptation from its conception in the field of biology to its adoption in linguistics, the book offers extensive discussion of the concept from various theoretical perspectives, detailed case studies as well as critical reviews of some stock examples. The book will be of interest to scholars working in the fields of evolutionary linguistics, historical linguistics, and the history of linguistics.
The classical Saussurean distinction between synchrony and diachrony is harmful: a purely synchronic description is inferior to the insight gained from diachrony, not only because grammar is laden with heirlooms and débris of prior structures and because languages draw on a wide variety of pathways to generate new grammar, but also because variation can often only be understood fully in the light of its history. Many cases of synchronic variation are the result of competition between an innovative mutant encroaching on an obsolescent construction. In such cases, the synchronic skew in the proportion of one variant to the other is not arbitrary, but is a reflection of how far the change has progressed. In this article four case studies from different corners of Dutch grammar are discussed (on cardinal numerals, on the Big Mess Construction, on bare infinitive complements of auxiliaries, and on the hortative). In each of these a diachronic perspective is indispensable to come to grips with the grammatical structure. The case studies together form a plea for the historisation of the science of linguistics, just like evolutionary biology has been historised.
Papers by Freek Van de Velde
Pijpops and Van de Velde (2016) investigated a single case study in depth, namely the Dutch partitive genitive. This case study is reviewed, and three new case studies are added, namely the competition between long and bare infinitives, word order variation in verbal clusters, and preterite formation. We find evidence of constructional contamination in all case studies, albeit in varying degrees. This indicates that constructional contamination is not a peculiarity of the Dutch partitive genitive, but appears to be more widespread, affecting both morphology and syntax. Furthermore, we distinguish between two forms of constructional contamination, viz. first degree and second degree contamination, with first degree contamination producing greater effects than second degree contamination.
At least two of these, which will be called the Problem of Prediction and the Problem of Proliferation, have already been noted in earlier studies. The first pertains to the formulation of specific predictions regarding low-level constructions based on only high-level, abstract semantic notions such as affectedness, involvement or agency (see Lenci 2012: 13–15, and also Broccias 2001; Perek 2015: 90–144). For example, when discussing the influence of affectedness on the argument variation of the Italian verb rimproverare ‘reproach’, Lenci (2012: 14) notes that “this interpretation would require us to stretch the meaning of affectedness well beyond its standard (fairly high) vagueness and polysemy, thereby impairing its reliability as a truly explanatory notion”. The second problem relates to positing ever more concrete constructions, which may draw the critique of non-parsimony (Culicover and Jackendoff 2005; Traugott and Trousdale 2013: 5–11). We will attempt to demonstrate that these problems are caused by a third, more fundamental problem, named the Problem of Precedence. This problem asks at which level in the constructional network speakers primarily employ a construction to communicate meaning, optimize information structure or express lectal distinctions. Next, we will argue that this concern does not constitute a theoretical issue, but rather an empirical question.
Finally, we introduce a methodological approach to deal with this question. To illustrate the approach, we employ as a case study the alternation between the Dutch transitive and prepositional argument constructions, as in (1)-(2). We identify a seemingly motley collection of 102 verbs exhibiting the alternation and map out the relevant region of the constructional network. Fully abstract argument constructions are first put under scrutiny, after which we continue on to more lexically specific constructions. The goal of this procedure is to identify the precedence level at which the alternation is predominantly active, thus solving the Problem of Precedence. It will be demonstrated that doing so will also enable us to tackle both the Problems of Prediction and Proliferation.
(1) Minister Vandenbroucke zoekt (naar) een oplossing.
‘Secretary Vandenbroucke is searching (for) a solution.’
(2) (Met) hete koffie gemorst.
‘Spilled hot coffee.’
As a case study, we zoom in on the Dutch partitive genitive construction. This construction exhibits variation between a form with and without -s ending, as in (1) and (2). The form with the -s ending is predominant in the Netherlandic regiolect, while the form without -s constitutes a marker of the Belgian regiolect (Pijpops & Van de Velde 2014). Because of this distinction between the Netherlands and Belgium, i.e. a language-external factor, partitive genitive types that feature typically Netherlandic lexemes, such as (1), more often appear in the variant with -s, whereas those that contain typically Belgian lexemes, such as (2), will more often appear without the -s. Our hypothesis was that these lexical preferences got entrenched, so that Belgian speakers using Netherlandic lexemes would import the Netherlandic morphological variant and vice versa. In other words: while the formal realisation is straightforwardly regionally stratified, we expect these lexical preferences to hold even within the Netherlandic and Belgian regiolects.
(1) Iets bijzonder(s)
‘Something remarkable’
(2) Iets speciaal(s)
‘Something special’
We tested this prediction on 3018 manually checked observations from the ConDiv corpus of written Dutch (Grondelaers et al. 2000) and found it to be confirmed, even when controlling for all other variables known to influence -s omission. Furthermore, we drew geographically-tagged data from Twitter, totaling 1299 manually checked instances, to replicate this finding and to investigate the geographical spread of both lectal contamination and the partitive genitive variation.
The effect of lectal contamination can only be explained if we have a sufficiently precise account of how individual speakers operate in language contact situations (Weinreich, Labov & Herzog 1968). If language contact can, in this way, cause lectal variation to produce lect-internal effects, then a variationist description of a particular regio-, dia-, socio- or ethnolect crucially depends on an understanding of language contact.
Lieberman et al. (2007) aimed to quantify the evolutionary dynamics of language by investigating the rise of the English regular past tense inflection, which they equated with the weak -ed suffix. Yet, their bold conclusion that “the half-life of an irregular verb scales as the square root of its usage frequency: a verb that is 100 times less frequent regularizes 10 times as fast” (Lieberman et al., 2007, p. 713) has since attracted criticism from scholars in the fields of historical and evolutionary linguistics. First, Carroll, Svare, & Salmons (2012) showed that this constant regularization rate does not hold true for the closely related German language. Second, Cuskley et al. (2014) found that the rise of the English weak -ed suffix is not driven by forces endogenous to language, such as analogy, but rather by external forces, such as new verbs entering the language through language contact.
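The arithmetic behind the quoted scaling claim can be made explicit. The following is a minimal sketch in Python, assuming only that the half-life is proportional to the square root of usage frequency, with the constant left unspecified. The function names are hypothetical, and the exponential-decay helper reflects the usual reading of "half-life", not a formula taken from the cited study.

```python
import math


def half_life_ratio(frequency_ratio: float) -> float:
    """Ratio of two half-lives if half-life is proportional to sqrt(frequency)."""
    return math.sqrt(frequency_ratio)


def fraction_still_irregular(elapsed: float, half_life: float) -> float:
    """Expected share of verbs still irregular after `elapsed` time units,
    assuming plain exponential decay (the usual reading of 'half-life')."""
    return 0.5 ** (elapsed / half_life)


# A verb that is 100 times less frequent has a half-life that is
# sqrt(1/100) = 1/10 as long, i.e. it regularizes 10 times as fast.
speedup = 1 / half_life_ratio(1 / 100)
```

Under this reading, the criticism summarized above amounts to the observation that a single proportionality constant does not fit German, or English once external forces are considered.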
We will reassess the constant-rate controversy by (i) extending the methodological scope with agent-based modeling, and (ii) extending the set of languages beyond the German-English comparison by adding Dutch.
Our results show that the constant rate does not hold. If language change is co-determined by external forces resulting in languages adapting to their niche (Lupyan & Dale 2016), this is exactly what one would expect, since English, Dutch and German have endured external pressures to a different degree. We will focus on the influence of demographic change. In particular, we investigate the growth of cities and the resulting koineization due to migration in the three language areas since the Middle Ages. The three different degrees of urbanization have led to different degrees of dialect contact, which could in turn, as we will argue, lead to different regularization rates. To support this claim, we will present both empirical evidence from linguistic and demographic databases, as well as the results of a computational simulation.
2. Empirical data
2.1. Linguistic data
To obtain a clear picture of the linguistic situation, we included the data on English from Lieberman et al. (2007) and the data on German from Carroll et al. (2012), and complemented these with our own Dutch data. This enables us to track the development of the past tense system of these three languages over a 1000-year period (800-1800).
2.2. Demographic data
For the demographic data, we make use of the databases of Bairoch et al. (1988), De Vries (1984), and Mitchell (1998). In particular, we compare the population growth of the largest cities in the English, Dutch and German language areas in each particular time period from 800-1800. Historical research has shown that the exponential growth of urban population cannot be reduced to natural growth, but is driven by immigration as well, both of foreigners and through a rural exodus from the larger agglomeration, leading to dialect contact. We then observed correlations between the success of the weak inflection and the amount of demographic upheaval.
3. Simulation
A correlation between a demographic and a linguistic trend does not automatically entail a causation between the former and the latter, however. To further substantiate our claim, we therefore turn to an agent-based computer simulation. In this simulation, agents store exemplars or tokens of what they hear (cf. Pijpops et al., 2015), rather than type states (cf. Colaiori et al., 2015), and use these to produce novel forms. We find that (i) the weak inflection does not require special status as the single regular inflection in order to explain the tendencies observed in reality; (ii) replacement of verbs can indeed cause a continued rise of the weak inflection, even after a stable equilibrium between weak and strong verbs has emerged, confirming Cuskley et al. (2014); and most importantly (iii) if our current understanding of language, as implemented in the simulation, is correct, demography does indeed affect the rise of the weak inflection.
(1) iets bijzonder(-s)
something special-GEN
‘something special’
While the two variants do not show any observable semantic difference, Pijpops & Van de Velde (2014) applied mixed-model logistic regression and found that the expression of the -s is probabilistically determined by a number of factors. While overall, the [+s] variant is more frequent, the [-s] variant is also fairly common, and is more likely to occur (i) in informal registers, (ii) in low-frequency phrases, and (iii) in the south of the language area (Belgium). There is also a strong main effect for the [-s] variant for adjectives that occurred in superficially similar non-partitive constructions. This is illustrated in (2) and (3): though similar in surface form, the context makes clear that (2) is not a partitive construction. The absence of the -s morpheme then spills over to genuine partitives like (3) (see Pijpops & Van de Velde, forthc. for extensive explanation on what they call ‘constructional contamination’).
(2) iets verkeerd geïnterpreteerd
[something]NP [[wrongly]AdvP interpreted]
(3) iets verkeerd gegeten
[something wrong]NP eaten
This suggests that, in line with exemplar-based theories of language, prior use of constructions leaves a (context-rich) trail in the mind of the language users.
In this talk, we want to see whether the same effect also occurs with regard to the regional variable. Can the regional provenance of the lexemes inserted in a construction exert an influence on the morphological realisation of the target construction, even if the construction is used by language users with a different regiolectal background? In our study southern speakers have a stronger tendency to drop the genitive -s, but less so when they are using ‘northern’ lexemes, and vice versa. This effect holds even if the regional provenance of the lexemes is subtle, and unlikely to be a shibboleth of a regionally recognisable type of speech. Furthermore, we see that while the analogical pull of lexemes with a regional profile is felt everywhere in the language area, the effect is more blurry in cities near the border of the two regions and more clear in the core areas. This finding shows that not only the language-internal context of prior instances is stored in memory, but the ‘language-external’, lectal context as well.
Setting the emergence of a third strategy later in Germanic, namely the analytic perfect (exemplified in Afrikaans werk – het gewerk, lit. ‘has worked’) aside, it has often been observed that despite occasional shifts in the opposite direction, Germanic displays a long-term drift in which the weak inflection takes the upper hand at the expense of the strong inflection, although the strong inflection remains remarkably resilient, and still has not fully succumbed to the overall weakening trend (Van Haeringen 1940). Recent years have seen publications in which this ‘weakening’ drift is cast in quantitative terms. Lieberman et al. (2007) notice that in English, the weakening of the verbs follows a constant rate through time, is only dependent on the frequency of the verb, and neatly scales proportionally to the square root of the frequency of verbs. However, Carroll et al. (2012) replicated the study for German and found no such constant rate, hence casting doubt on the universality of the mathematical regularity that seemed to govern the weakening.
In our talk, we replicate the Lieberman et al. (2007) and Carroll et al. (2012) studies for Dutch, allowing a comparison between the three languages in the Van Haeringen (1956) tradition. Our results confirm Carroll et al.’s (2012) critique of the constant rate.
Carroll et al. suggested that underlying the differences between English and German are demographic factors, but they left it to future research to actually dig deeper into the demographic history. In our talk, we pick up this thread and couple the weakening with historical demography. Our results indicate that the differences between these three big West-Germanic languages indeed seem related to population effects. Evidence is drawn from grammars and historical demographic databases. We further support our claims with agent-based computer simulation, extending earlier work by Pijpops et al. (2015).
On the one hand, there is evidence for an addressee-oriented view: Hawkins’s ‘Minimize Domains’ principle, stating that the syntactic structure should be recognisable in as short a span as possible, benefits the addressee, as the speaker is never unsure about the syntactic structure. Likewise, Rohdenburg’s (1996) Complexity Principle stating that in complex structures more explicit encoding is used is only beneficial to the addressee. If the structure is already complex, adding extra grammatical encoding arguably burdens the speaker’s performance even more. On the other hand, it is not self-evident that speakers should be concerned with their addressees’ needs forfeiting their own. Speaker’s altruism is evolutionarily implausible (Kirby 1999). Levinson (2000) also stresses the speaker’s needs in his neo-Gricean approach. As Levinson points out, the bottleneck in human communication is at the production side: decoding is much faster and more effortless than encoding (Levinson 2000: 28), so that taking inferential short-cuts to add layers of meaning on top of what is truth-conditionally encoded is especially helpful for the speaker. Adding extra material in the overtly coded variant in an alternance (e.g. zero- vs. that-complementation in English) goes against the rationale to prioritize production efficiency over parsing speed. Hawkins’s principle ‘Minimize Forms’ also seems first and foremost to serve the speaker’s comfort. True, reducing forms also adds to the parsing effort, as the form-function pair of the extra encoding has to be stored in the hearer’s brain, but given the ease with which inferencing is accomplished (Levinson 2000), and given the vast storage capacities of the human mind (Dąbrowska 2014: 626), the extra speaker’s efforts outweigh the extra addressees’ efforts.
In our paper, we will adduce quantitative data from a close-up case study that can shed light on the debate over speaker vs. addressee processing. The case study deals with the direct object vs. prepositional object alternance in Dutch verbs, like zoeken (naar) ‘search (for)’. A corpus study reveals that the prepositional variant is used more often when the object is syntactically complex. This can be explained in two ways: first, the preposition can function as a signpost to help the addressee decode the message. This would be in line with Rohdenburg’s Complexity Principle, and would point to a hearer-driven processing account. Second, the use of a preposition allows the object to be extraposed (or ‘exbraciated’). This would be beneficial to the speaker, who can postpone the expression of the complex object to the end of the clause, when all other issues have been resolved, avoiding centre-embedding. On the basis of corpus investigation, we will tease apart both explanations. Of special interest are cases such as (1), where the head noun of the object is not extraposed (to the right of gezocht ‘search-PST.PTCP’), but the submodifying complement clause is. If the use of the prepositional variant is especially favoured in this context, this would be an argument for the first explanation. Here, the processing difficulty of the discontinuous object may be alleviated for the hearer by adding the extra signpost.
To account for this success, three explanations are given in the literature (Ball 1968: 164; Bailey 1997: 7–8). First, while each separate strong ablaut class is only applicable to a subset of verbs, the weak suffix can, in principle, be attached to any verb indiscriminately. Second, some verbs escaped ablaut formation altogether, for instance because they had a vowel that fitted in none of the ablauting patterns. Such verbs would then create a safe nest for the nascent weak inflection, free of competing strong forms. Third, the strong inflection was ravaged by the effects of several sound laws, which severely undermined its transparency. This would have rendered it vulnerable to competition from the seemingly more transparent weak inflection.
We will claim that the first explanation is already sufficient to account for the rise of the weak inflection. Moreover, it may explain why the weak inflection first took over the low frequency verbs and low frequency ablaut classes (Carroll, Svare and Salmons 2012). Since we then no longer need the irregularization of the strong inflection to explain these effects, this irregularization may be the result of the rise of the weak inflection, rather than its cause.
To support these claims, we have built an agent-based simulation. In this simulation, computational agents communicate with each other by referring to past events, thereby employing either the strong or weak inflection. The agents preferably use the forms that they hear most often from their fellow agents. The simulation was composed in Babel2, a framework for building agent-based models of language evolution (Steels 2012).
In the simulation, the only difference between the strong and weak inflection lies in the first explanation given above. Any other possible advantages for the weak inflection were excluded from the model. Under such conditions, it can be observed that a rise of the weak inflection will come to pass in both type and token frequency, accompanied by a Conserving Effect of both the verbs and the ablaut classes (Bybee 2006; Carroll, Svare and Salmons 2012). This rise even takes place if the weak dental suffix starts out as inferior in both type and token frequency to any individual strong ablaut class.
1. A hostile environment
In most present-day Germanic languages, the weak inflection (work-worked) offers a well-established and regular strategy for past tense formation. In contrast, the strong inflection (sing-sang) currently seems no more than a diminishing rubble of sub-rules and irregularities (Harbert, 2007, p. 277).
Still, things were once different. Language reconstruction shows that around the time of the birth of the weak-inflection, the strong inflection is likely to have been both clearly regular and dominant in frequency (Bailey, 1997). To explain the conundrum of how a nascent weak dental suffix could have possibly gained the upper hand in such a hostile environment, researchers usually refer to sound changes undermining the regularity of the strong system (Bailey, 1997, p. 17; Ball, 1968, p. 164). We will claim that this assumption is not needed. Instead, the rise of the weak inflection may be initially caused by nothing more than its general applicability, i.e. its ability to be – in principle – applied to any verb. In addition, this general applicability proves capable of explaining that the rise of the weak inflection (i) first affects low frequency verbs, and only later high frequency verbs, and (ii) more heavily affects particular ablaut classes than others. In concert, these effects may create the conditions in which a perfectly functioning strong ablaut system can be surrendered to the disruptive influence of sound changes without causing a problem to the language users.
2. Model design and behavior
We ran an agent-based model (Gilbert, 2008), containing the following features:
• There are no irregular verbs, nor ways for verbs or ablaut classes to become irregular.
• The weak dental suffix starts out inferior in both type and token frequency to each individual strong ablaut class.
• All verbs in the model can be conjugated both strongly and weakly.
• The only difference between the strong ablaut classes and weak dental suffix lies in the dental suffix’s general applicability.
• The agents do not show any (socially attributed) preference for one of the variants, neither in acquisition nor use. Instead, they simply prefer the variant that they more often hear.
• Agents age and are gradually replaced.
• The verbs show a realistic, Zipfian frequency distribution (Zipf, 1932).
Under these conditions, it is shown that a gradual rise of the weak dental suffix will take place, first attacking the low-frequency verbs and the low-frequency ablaut classes. Highly frequent ablaut classes prove capable of protecting their low-frequent members against weakening. These effects emerge independently of specific parameter settings.
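The model features listed above can be illustrated with a small toy simulation. What follows is a minimal sketch in plain Python and not the authors' implementation. The population sizes, the number of ablaut classes, the probability-matching choice rule, and the fallback used for verbs an agent has never heard are all assumptions made for illustration. In the fallback, strong evidence is pooled only within the verb's own ablaut class, whereas weak evidence is pooled over all verbs; this is one possible reading of "general applicability". Whether and how quickly the weak share rises in this toy version depends on these assumed settings.

```python
import random

N_VERBS, N_CLASSES, N_AGENTS = 60, 3, 10  # hypothetical sizes
STRONG, WEAK = "strong", "weak"


class Agent:
    def __init__(self):
        self.by_verb = {}                      # verb -> {form: tokens heard}
        self.class_strong = [0] * N_CLASSES    # strong tokens per ablaut class
        self.weak_total = 0                    # weak tokens, pooled over all verbs

    def hear(self, verb, verb_class, form):
        counts = self.by_verb.setdefault(verb, {STRONG: 0, WEAK: 0})
        counts[form] += 1
        if form == WEAK:
            self.weak_total += 1
        else:
            self.class_strong[verb_class] += 1

    def produce(self, verb, verb_class, rng):
        counts = self.by_verb.get(verb)
        if counts is not None:
            strong, weak = counts[STRONG], counts[WEAK]
        else:
            # Unknown verb: the ablaut class pools evidence from its own
            # members only, the weak suffix from every verb (assumed reading
            # of "general applicability").
            strong, weak = self.class_strong[verb_class], self.weak_total
        if strong + weak == 0:
            return None                        # no evidence yet: stay silent
        # Probability matching: no built-in preference for either variant.
        return WEAK if rng.random() < weak / (strong + weak) else STRONG


def simulate(steps=20000, replace_every=500, seed=1):
    """Return the share of weak tokens per block of `replace_every` steps."""
    rng = random.Random(seed)
    weights = [1.0 / rank for rank in range(1, N_VERBS + 1)]  # Zipfian
    verb_class = [v % N_CLASSES for v in range(N_VERBS)]
    # The 5 rarest verbs start out weak: fewer types and tokens than any class.
    initially_weak = set(range(N_VERBS - 5, N_VERBS))

    def founder():
        agent = Agent()
        for verb in range(N_VERBS):
            form = WEAK if verb in initially_weak else STRONG
            for _ in range(max(1, round(20 * weights[verb]))):
                agent.hear(verb, verb_class[verb], form)
        return agent

    agents = [founder() for _ in range(N_AGENTS)]
    history, said, weak_said = [], 0, 0
    for step in range(1, steps + 1):
        speaker, hearer = rng.sample(agents, 2)
        verb = rng.choices(range(N_VERBS), weights=weights)[0]
        form = speaker.produce(verb, verb_class[verb], rng)
        if form is not None:
            hearer.hear(verb, verb_class[verb], form)
            said += 1
            weak_said += form == WEAK
        if step % replace_every == 0:
            history.append(weak_said / max(1, said))
            said, weak_said = 0, 0
            agents.pop(0)              # the oldest agent dies ...
            agents.append(Agent())     # ... and a blank learner is born
    return history


if __name__ == "__main__":
    shares = simulate()
    print(f"weak share, first block: {shares[0]:.3f}, last block: {shares[-1]:.3f}")
```

Comparing the first and last entries of the returned list shows how the weak share develops under one parameter setting. Varying `seed`, `replace_every`, or the constants at the top is a simple way to probe how sensitive the toy model is to those choices.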
Acknowledgements
We would like to thank Remi van Trijp for useful comments about the model.