The variation of the strong genitive marker of the singular noun has been treated in diverse accounts. Still, there is a consensus that it is to a large extent systematic but can be approached appropriately only if many heterogeneous factors are taken into account. Over thirty variables influencing this variation have been proposed. However, it is unclear how effective they are and, above all, how they interact. In this paper, the potential influencing variables are evaluated statistically in a machine learning approach and modelled in decision trees in order to predict the genitive marking variants. Working with decision trees based exclusively on statistically significant data enables us to determine which combination of factors is decisive in the choice of a marking variant for a given noun. Consequently, the variation factors can be assessed with respect to their explanatory power for corpus data and arranged in a hierarchy.
This exemplary study shows how a machine learning method can be used to uncover rules for the choice of linking elements (Fugenelemente) in nominal compounds. The C4.5 algorithm is applied to a training corpus of over 400,000 compounds to generate a decision tree that predicts the linking elements with a high hit rate. An attempt was made to interpret this decision tree linguistically in order to test existing hypotheses about the choice of linking elements.
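C4.5 chooses the attribute to split on by its gain ratio, that is, information gain normalised by the entropy of the split itself. The following is a minimal sketch of that criterion only, not the study's implementation: the rows, the feature names `suffix` and `gender`, and the label `link` are invented toy data for illustration.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, feature, target):
    """Gain ratio of splitting `rows` on `feature` with respect to `target`."""
    labels = [row[target] for row in rows]
    # Group the target labels by the value of the candidate feature.
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    total = len(rows)
    # Weighted entropy that remains after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalises features with many distinct values.
    split_info = entropy([row[feature] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy rows: one per compound, describing its first constituent.
rows = [
    {"suffix": "ung", "gender": "f", "link": "s"},
    {"suffix": "ung", "gender": "f", "link": "s"},
    {"suffix": "heit", "gender": "f", "link": "s"},
    {"suffix": "none", "gender": "m", "link": "0"},
    {"suffix": "none", "gender": "f", "link": "0"},
    {"suffix": "none", "gender": "n", "link": "0"},
]

# The feature with the highest gain ratio would become the root of the tree.
best = max(["suffix", "gender"], key=lambda f: gain_ratio(rows, f, "link"))
```

On these toy rows the suffix separates the two linking variants completely, so it scores higher than gender and would be chosen for the first split; a full C4.5 tree repeats this choice recursively on each resulting subset.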
In this paper, we present the concept and content of, and our experience with, an actively running Massive Open Online Course (MOOC) on Natural Language Processing for Digital Humanities. This video-based course is held in German, does not require any programming skills, and serves as an introduction to automatic text analysis. The target audience is anyone who is interested in applying basic language technology to text corpora. The course has a strong empirical focus on digital representations, tools and corpus linguistics. Its main goal is for participants to grasp the fundamental terminology and concepts of computational linguistics, to understand the main problems and solutions, and to know about the performance and limitations of current methods. Furthermore, manual annotation and data visualization are introduced in this course.
In this feasibility study we aim to contribute to the practical use of domain ontologies for hypertext classification by introducing an algorithm that generates potential keywords. The algorithm uses structural markup information and lemmatized word lists as well as a domain ontology on linguistics. We present the calculation and ranking of keyword candidates based on ontology relationships, word position, frequency information, and statistical significance as evidenced by log-likelihood tests. Finally, the results of our machine-driven classification are validated empirically against manually assigned keywords.
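The log-likelihood component of such a ranking can be sketched as follows. This is a sketch under stated assumptions: it uses the two-term corpus-comparison form of the log-likelihood statistic that is common in keyword analysis, which may differ from the exact test used in the study, and it leaves out the ontology and word-position criteria. All frequencies and the function name `log_likelihood` are invented for illustration.

```python
from math import log


def log_likelihood(freq_doc, size_doc, freq_ref, size_ref):
    """Log-likelihood (G2) of a word's frequency in a document vs. a reference corpus."""
    total_freq = freq_doc + freq_ref
    total_size = size_doc + size_ref
    # Expected frequencies if the word were equally likely in both texts.
    expected_doc = size_doc * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_doc, expected_doc), (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * log(observed / expected)
    return 2 * g2


# Hypothetical counts: (frequency in the hypertext, frequency in the reference corpus).
SIZE_DOC, SIZE_REF = 1_000, 100_000
candidates = {"morpheme": (12, 30), "the": (60, 6_000)}

scores = {
    word: log_likelihood(f_doc, SIZE_DOC, f_ref, SIZE_REF)
    for word, (f_doc, f_ref) in candidates.items()
}

# 3.84 is the chi-square critical value for p < 0.05 with one degree of freedom.
ranked = sorted(
    (word for word, score in scores.items() if score > 3.84),
    key=scores.get,
    reverse=True,
)
```

Because the statistic is symmetric, a word that is strongly underrepresented in the document also receives a high score; a keyword extractor would therefore additionally check that the relative frequency in the document exceeds that in the reference corpus.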
In this chapter, the authors discuss common approaches to using data visualizations within the field of digital humanities. They argue that by assigning equal importance to the development and the usage of a visualization framework, researchers can question dogmatic ‘best-practice’ norms for data visualizations, which may prevent them from developing visualizations that can be used to find emergent phenomena within the data. They then focus on the question of how visualizations reconstitute language by using diagrammatic operations. When working with digital visualizations, the technological background is of great importance for the interpretation and the development of new tools. As an example, they present their visualization framework ‘geocollocations’, which can be used as a tool to detect words that typically collocate with toponyms in text corpora.
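The underlying idea of collecting words that co-occur with toponyms can be illustrated with a small sketch. It is not the ‘geocollocations’ framework itself: the function name `toponym_collocates`, the window size and the sample sentence are assumptions made for illustration, and the sketch only counts raw co-occurrences, whereas a collocation analysis would normally add an association measure on top.

```python
from collections import Counter


def toponym_collocates(tokens, toponyms, window=2):
    """Count the words occurring within `window` tokens of each toponym."""
    counts = {}
    for i, token in enumerate(tokens):
        if token not in toponyms:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        context = counts.setdefault(token, Counter())
        for j in range(start, end):
            # Skip the toponym itself and any other toponym in the window.
            if j != i and tokens[j] not in toponyms:
                context[tokens[j]] += 1
    return counts


# Hypothetical sample text and toponym list.
tokens = "snow in Zurich today and snow in Bern tomorrow".split()
counts = toponym_collocates(tokens, {"Zurich", "Bern"}, window=2)
```

The result maps each toponym to a `Counter` of its neighbouring words, which could then be plotted at the toponym's coordinates on a map.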
The two central demands of the iconic turn are 1) to acknowledge the pictorial dimension of cultural action, and 2) to use images as an instrument for analysing culture. The aim is a rehabilitation of the visual in its broadest sense, combined with the recognition that images decisively shape cultural action. As a corpus linguist, one is repeatedly confronted with the accusation of grossly neglecting precisely this visuality by analysing text corpora. If, for example, one investigates discourses in social media such as Twitter, Facebook, Instagram and others, these are without doubt permeated by a rich practice of image use, image citation, etc. This poses problems for corpus-linguistic and computational-linguistic studies of social media data, such as so-called sentiment analyses, which examine large numbers of tweets for their tonality. For, to give only a minimal example, an image accompanying a text can [...] the tonality of a tweet [...]
books by Noah Bubenhofer
miscs by Noah Bubenhofer
incollections by Noah Bubenhofer