James K Tauber | Lancaster University - Academia.edu

James K Tauber

Lancaster University, Linguistics and English Language, Graduate Student

The University of Western Australia, Linguistics, Alumnus

University of Wales Trinity Saint David, Classics, Alumnus

University of Illinois at Chicago, Educational Psychology, Alumnus

Signum University, Language and Literature, Alumnus

My scholarship is at the intersection of linguistics, comparative philology, Ancient Greek, computing, and learning science.

Uploads

Papers

Tolkien’s use of invented languages in The Lord of the Rings

Reading Fictional Languages, 2023

Challenges for the Representation of Morphology in Ontology Lexicons

Recent years have experienced a growing trend in the publication of language resources as Linguistic Linked Data (LLD) to enhance their discovery, reuse and the interoperability of tools that consume language data. To this aim, the OntoLex-lemon model has emerged as a de facto standard to represent lexical data on the Web. However, traditional dictionaries contain a considerable amount of morphological information which is not straightforwardly representable as LLD within the current model. In order to fill this gap a new Morphology Module of OntoLex-lemon is currently being developed. This paper presents the results of this model as on-going work as well as the underlying challenges that emerged during the module development. Based on the MMoOn Core ontology, it aims to account for a wide range of morphological information, ranging from endings to derive whole paradigms to the decomposition and generation of lexical entries which is in compliance to other OntoLex-lemon modules and facilitates the encoding of complex morphological data in ontology lexicons.

The evolution of XML schemas

... The evolution of XML schemas. Author: James Tauber, Published in: · Journal. E-business Advis... more

Digital Toolbox: XML After 1.0: You Ain't Seen Nothin' Yet

IEEE Internet Computing, 1999

The prevalence of XML in discussions of the Web and its application to such diverse application domains in the past year (1998-99) has almost eclipsed the fact that the 1.0 specification, approved as a W3C Recommendation in February 1998, is only the first part of the "structured documents" originally envisioned. The specifications for two more parts (hypertext link types and the stylesheet language) are nearing completion, and the W3C chartered new working groups last year (1998) to generate new members of the family.

pyuca: a Python implementation of the Unicode Collation Algorithm

Journal of open source software, May 18, 2016

XML after 1.0: you ain't seen nothin' yet

IEEE Internet Computing, 1999

The CITE Architecture: Q and A Regarding CTS and CITE

Character Encoding of Classical Languages

De Gruyter eBooks, Aug 5, 2019

Beyond Translation: language hacking and new pathways into language

From File Interoperability to Service Interoperability : The Distributed Text Services

Beyond translation: engaging with foreign languages in a digital library

International Journal on Digital Libraries

Digital libraries can enable their patrons to go beyond modern language translations and to engage directly with sources in more languages than any individual could study, much less master. Translations should be viewed not so much as an end but as an entry point into the sources that they represent. In the case of highly studied sources, one or more experts can curate the network of annotations that support such reading. A digital library should, however, automatically create a serviceable first version of such a multilingual edition. Such a service is possible but benefits (if it does not require) a new generation of increasingly well-designed machine-readable translations, lexica, grammars, and encyclopedias. This paper reports on exploratory work that uses the Homeric epics to explore this wider topic and on the more general application of the results.

Correction: Beyond translation: engaging with foreign languages in a digital library

International Journal on Digital Libraries

pyuca: a Python implementation of the Unicode Collation Algorithm

The Journal of Open Source Software, 2016

Challenges for the Representation of Morphology in Ontology Lexicons

Recent years have experienced a growing trend in the publication of language resources as Linguistic Linked Data (LLD) to enhance their discovery, reuse and the interoperability of tools that consume language data. To this aim, the OntoLex-lemon model has emerged as a de facto standard to represent lexical data on the Web. However, traditional dictionaries contain a considerable amount of morphological information which is not straightforwardly representable as LLD within the current model. In order to fill this gap a new Morphology Module of OntoLex-lemon is currently being developed. This paper presents the results of this model as on-going work as well as the underlying challenges that emerged during the module development. Based on the MMoOn Core ontology, it aims to account for a wide range of morphological information, ranging from endings to derive whole paradigms to the decomposition and generation of lexical entries which is in compliance to other OntoLex-lemon modules and ...

Character Encoding of Classical Languages

Digital Classical Philology, Aug 5, 2019

Collaborative environment for producing software products

Method and system for dynamically modeling resources

Web-based Image Preference

Some experimenters have begun to carry out image preference experiments over the web, with observers completing the task in their own time and using their own display devices. This reduces the administrative overhead, and opens the possibility to huge numbers of potential observers. However, we have to surrender some control over viewing conditions. In previous work, we evaluated an existing web-based paired comparison experiment against a lab-based counterpart and found that, generally, the two variants did not correlate to a significantly high degree. In this work we extend that study with the development of our own web-based research platform with greater control over viewing conditions and much larger quantities of observers (over 1,000, with more than 26,000 individual observations). With this, we show much more positive correlation between the web- and lab-based variants. We also show the similarity or otherwise between the two variants as a function of time, which reveals how many web-based observations are required to achieve stable results.

Challenges for the Representations for Morphology in Ontology Lexicons

Recent years have experienced a growing trend in the publication of language resources as Linguistic Linked Data (LLD) to enhance their discovery, reuse and the interoperability of tools that consume language data. To this aim, the OntoLex-lemon model has emerged as a de facto standard to represent lexical data on the Web. However, traditional dictionaries contain a considerable amount of morphological information which is not straightforwardly representable as LLD within the current model. In order to fill this gap a new Morphology Module of OntoLex-lemon is currently being developed. This paper presents the results of this model as on-going work as well as the underlying challenges that emerged during the module development. Based on the MMoOn Core ontology, it aims to account for a wide range of morphological information, ranging from endings to derive whole paradigms to the decomposition and generation of lexical entries which is in compliance to other OntoLex-lemon modules and facil...

Linking Lexical Resources for Biblical Greek

(presented at SBL 2017 in Boston)

As more resources for Biblical Greek, both old and new, become openly available, the opportunities for integrating them become greater. At the level of the word, it might seem a trivial task to match based on lemma. But no two texts are lemmatised the same way and no two lexicons will make the same choices of headwords. Numerical solutions such as Strong's and Goodrick-Kohlenberger solve some problems but introduce new ones. After surveying the various issues and challenges, this talk will provide both a framework for moving forward and a report on practical ways that a variety of texts, lexicons, and other resources such as principal-part lists are being linked in the service of open, biblical digital humanities.

The Route to Adaptive Learning of Greek

(presented at SBL International 2017 in Berlin)

One of the promises of machine-actionable linguistic data linked to biblical texts is the enablement of new types of language learning tools. At their simplest, such tools might involve adding the necessary scaffolding to enable students to read more text than they otherwise might by providing glosses for rarer words or help on idioms, irregular morphology, and unusual syntactic constructions. Such tools, however, are hardly novel and have long been manually produced in printed form. Equivalent electronic versions don’t really take advantage of what’s possible. In this paper I discuss an online reading environment for Ancient Greek, and the Greek New Testament in particular, that takes advantage of the availability of open, machine-actionable resources such as treebanks and morphological analyses for more automated and consistent generation of scaffolding but which goes a step further by being adaptive to an individual student’s knowledge at a given point. Such knowledge need not be explicitly provided (although it can be: to align with a particular textbook, for example). It can also be built up implicitly from what the reader is requesting more information or help on: What words are they having trouble remembering the meaning of? What forms are they having trouble parsing? The model of student knowledge is then integrated with learning tools such as spaced-repetition flash cards and parsing drills with the results of these tools then feeding back into better adapting scaffolding for reading. The online reading environment will be open source and potentially applicable to a wide range of other languages and texts provided the necessary linguistic data is available.

An Online Adaptive Reading Environment for the Greek New Testament

(presented at SBL 2016)

One of the promises of machine-actionable linguistic data linked to biblical texts is the enablement of new types of language learning tools. At their simplest, such tools might involve adding the necessary scaffolding to enable students to read more text than they otherwise might by providing glosses for rarer words or help on idioms, irregular morphology, and unusual syntactic constructions. Such tools, however, are hardly novel and have long been manually produced in printed form. Equivalent electronic versions don't really take advantage of what's possible. In this paper I discuss an online reading environment for Ancient Greek, and the Greek New Testament in particular, that takes advantage of the availability of open, machine-actionable resources such as treebanks and morphological analyses for more automated and consistent generation of scaffolding but which goes a step further by being adaptive to an individual student's knowledge at a given point. Such knowledge need not be explicitly provided (although it can be: to align with a particular textbook, for example). It can also be built up implicitly from what the reader is requesting more information or help on: What words are they having trouble remembering the meaning of? What forms are they having trouble parsing? The model of student knowledge is then integrated with learning tools such as spaced-repetition flash cards and parsing drills with the results of these tools then feeding back into better adapting scaffolding for reading. The online reading environment will be open source and potentially applicable to a wide range of other languages and texts provided the necessary linguistic data is available.

A Morphological Lexicon of New Testament Greek

(presented at SBL 2015)

Morphological analyses such as analytical lexicons have typically involved indicating lemma, part-of-speech, morphosyntactic and morphosemantic information (such as case, number, person, gender, tense, voice, mood and degree). Much progress has been made in recent years making analyses of this sort freely available in digital formats, but the kind of information they contain has not advanced significantly for decades. This paper will provide an overview of the work of the MorphGNT project to develop an electronic Morphological Lexicon of New Testament Greek that adds inflectional classes, roots and stems, stem formation and morphophonological processes, principal parts, and derivational morphology. Beyond serving as a database of linguistic information, the goal of the morphological lexicon is to provide an “executable grammar” so particular grammar points discussed in beginner grammars, intermediate grammars or advanced reference grammars can be tested against a corpus in a way that makes completely transparent where the “rules” are followed and where they fall down. This data is also useful for pedagogical tools such as intelligent tutoring systems that typically require better modeling of latent traits in order to determine what a student actually knows and what items best test that knowledge. All data for the Morphological Lexicon of New Testament Greek is available under a Creative Commons license, and all code used for both the generation and verification of the morphological lexicon is open source.

Better Greek Learning through Better Greek Databases

In an update on the ongoing work he has spoken about in previous Bible Tech conferences, James ta... more

A New Kind of Graded Reader

Morphological Tagging of the Greek New Testament

Slides from a talk given to the Surrey Morphology Group in 2006 on preliminary issues relating to... more

Quantitative Approaches to Versification

by Petr Plecháč, Helena Bermúdez Sabel, Robert Kolár, Anastasia Belousova, James K Tauber, Mirella De Sisto, Kristina V Litvintseva, Andrew Cooper, Vera Polilova (Вера Полилова), Ksenia Tveryanovich, Александр Костюк, and Igor Pilshchikov

This volume presents a wide range of quantitative approaches to versification. It comprises various methodological perspectives ranging from simple descriptive statistics to advanced machine learning methods (such as support vector machines, random forests or neural networks) as well as material covering a large span of time and languages: from very ancient versifications (Sumerian, Akkadian, Hittite; Ancient Greek), through medieval (Old English, Old Icelandic, Old Saxon) and Renaissance verse to modern experiments (free verse, concrete poetry); from English and Russian through Spanish and German to Portuguese and Catalan. Not only written, but also spoken poetry has been analyzed.

Index to the Greek New Testament

Basic Greek Accentuation

Visualisations of the basic rules of Ancient Greek accentuation (law of limitation and σωτῆρα rul... more

Beyond Translation: engaging with foreign languages in a digital library

by Gregory Crane, James K Tauber, and Jake Wegner

International Journal on Digital Libraries

Digital libraries can enable their patrons to go beyond modern language translations and to engage directly with sources in more languages than any individual could study, much less master. Translations should be viewed not so much as an end but as an entry point into the sources that they represent. In the case of highly studied sources, one or more experts can curate the network of annotations that support such reading. A digital library should, however, automatically create a serviceable first version of such a multilingual edition. Such a service is possible but benefits (if it does not require) a new generation of increasingly well-designed machine-readable translations, lexica, grammars, and encyclopedias. This paper reports on exploratory work that uses the Homeric epics to explore this wider topic and on the more general application of the results.

Linguistic Variation in Tolkien

This study looked at linguistic variation in the fiction writing of J. R. R. Tolkien using a number of different techniques including principal component analysis of function word relative frequencies and multidimensional register analysis. The main prose text of The Hobbit, The Lord of the Rings, and The Silmarillion were prepared and marked up, distinguishing direct speech from narrative. Speaker identification information was then used to generate different ‘lenses’ on the works: subcorpora that focused just on narrative or on speech overall or on the speech of particular characters. Early drafts of the Silmarillion material were also included (although without speaker identification). The texts were annotated for part-of-speech and dependency relations using spaCy and Biber dimension scores calculated using MAT. Numerous visualizations were then made using a custom software pipeline developed in Python and R. This enabled the exploration of differences between the works, between parts within a work, between narrative and direct speech, and between the speech of different characters. The study confirmed previous findings regarding register differences between fictional speech and narration and raised new methodological considerations in that kind of register analysis. It also confirmed that measures such as function word frequency track closely with richer register analyses. Finally, various observations were made about the stylistic shifts in the works themselves that raise interesting literary questions.