Noah Bubenhofer

University of Zurich, Switzerland, Deutsches Seminar, Faculty Member

University of Zurich, Switzerland, Computational Linguistics, Post-Doc

Technische Universität Dresden, Institut für Germanistik, Alumnus


See http://www.bubenhofer.com for further information and an up-to-date publication list with downloadable PDFs.


Uploads

The variation of the strong genitive marker of the singular noun has been treated by diverse accounts. Still there is a consensus that it is to a large extent systematic but can be approached appropriately only if many heterogeneous factors are taken into account. Over thirty variables influencing this variation have been proposed. However, it is actually unclear how effective they can be, and above all, how they interact. In this paper, the potential influencing variables are evaluated statistically in a machine learning approach and modelled in decision trees in order to predict the genitive marking variants. Working with decision trees based exclusively on statistically significant data enables us to determine what combination of factors is decisive in the choice of a marking variant of a given noun. Consequently the variation factors can be assessed with respect to their explanatory power for corpus data and put in a hierarchized order.

Diskurse berechnen? Wege zu einer korpuslinguistischen Diskursanalyse

"Es liegt in der Natur der Sache…". Korpuslinguistische Untersuchungen zu Kollokationen in Argumentationsfiguren

Datengeleitete Korpuspragmatik: Korpusvergleich als Methode der Stilanalyse

Challenges in building a multilingual alpine heritage corpus

by Noah Bubenhofer and Lenz Furrer

Linguistic Learning: A New Conceptual Focus in Knowledge Visualization

by Noah Bubenhofer and Stefan Bertschi

A Comparable Wikipedia Corpus: From Wiki Syntax to POS Tagged XML

9. Diskurslinguistik und Korpora

Handbuch Diskurs

Einführung in die Korpuslinguistik: Praktische Grundlagen und Werkzeuge

ABSTRACT http://www.bubenhofer.com/korpuslinguistik/

Vorhersage von Fugenelementen in nominalen Komposita

This exemplary study shows how a machine learning method can be used to uncover rules for the choice of linking elements in nominal compounds. The C4.5 algorithm is applied to a training corpus of over 400,000 compounds to generate a decision tree that predicts the linking elements with a high hit rate. An attempt was made to interpret this decision tree linguistically in order to test existing hypotheses about the choice of linking elements.


GenitivDB 2.0 – Datenbank zur Genitivmarkierung (Release vom 01.09.2015)

Lessons from a Massive Open Online Course (MOOC) on Natural Language Processing for Digital Humanities

In this paper, we present the concept, content and experience with an actively running Massive Open Online Course (MOOC) on Natural Language Processing for Digital Humanities. This video-based course is held in German, does not require any programming skills, and serves as an introduction to automatic text analysis. The target audience is anyone who is interested in applying basic language technology to text corpora. It has a strong empirical focus on digital representations, tools and corpus linguistics. The main goal thereby is to grasp the fundamental terminology and concepts of computational linguistics, to understand the main problems and solutions, as well as to know about the performance and limitations of current methods. Furthermore, manual annotation and data visualization are introduced in this course.


Visual Linguistics : Plädoyer für ein neues Forschungsfeld

"Sagen kann man's schon, nur schreiben tut man's selten" – Die tun-Periphrase

The word war: 'Yes, He Did' - How Obama won the (rhetorical) battle for the White House

Sprechtakel. Linguistische Notizen

Visualisierungen in der Korpuslinguistik

Algorithmische Visualisierungen: Ausdruck von Routinen und Denkstilen in den Digital Humanities

Multilinguality in historical documents - challenges and solutions for digital humanities

Visualisierung sprachlicher Daten : Visual Linguistics – Praxis – Tools

Diskurslinguistik und Korpora

Alinguistisch : Wissenschaft ohne Geist?

Wissenschaft ohne Geist: Herausforderungen der Digital Humanities am Beispiel der Korpuslinguistik

Using a domain ontology for the semantic-statistical classification of specialist hypertexts

In this feasibility study we aim to contribute to the practical use of domain ontologies for hypertext classification by introducing an algorithm that generates potential keywords. The algorithm uses structural markup information and lemmatized word lists as well as a domain ontology on linguistics. We present the calculation and ranking of keyword candidates based on ontology relationships, word position, frequency information, and statistical significance as evidenced by log-likelihood tests. Finally, the results of our machine-driven classification are validated empirically against manually assigned keywords.


Die Wort-Wahl: Obama und McCains rhetorischer Kampf ums Weisse Haus

The Linguistic Construction of World: An Example of Visual Analysis and Methodological Challenges

In this chapter, the authors discuss common approaches using data visualizations within the field of digital humanities. They argue that by assigning equal importance to the development as well as the usage of a visualization framework, researchers can question dogmatic ‘best-practice’ norms for data visualizations which may prevent them from developing visualizations that can be used to find emergent phenomena within the data. They then focus on the question of how visualizations reconstitute language by using diagrammatic operations. When working with digital visualizations, the technological background is of great importance for the interpretation and the development of new tools. As an example, they present their visualization framework ‘geocollocations’, which can be used as a tool to detect words that typically collocate with toponyms in text corpora.

Social Media und der Iconic Turn: Diagrammatische Ordnungen im Web 2.0

The two central demands of the Iconic Turn are (1) to acknowledge the pictorial nature of cultural action, and also (2) to use images as an instrument for analysing culture. The aim is a rehabilitation of the visual in its broadest sense, combined with the recognition that images decisively shape cultural action. As a corpus linguist, one is repeatedly confronted with the accusation that, by analysing text corpora, one criminally neglects precisely this visuality. If, for example, one studies discourses in social media such as Twitter, Facebook, Instagram and others, these are without doubt permeated by a rich practice of image use, image citation, etc. This poses problems for corpus-linguistic and computational-linguistic studies of social media data, such as so-called sentiment analyses, which examine large quantities of tweets for their tonality. For, to give just a minimal example, the tonality of a tweet can, through an image accompanying the text, ...
