Recent years have seen a growing trend in the publication of language resources as Linguistic Linked Data (LLD) to enhance their discovery and reuse and the interoperability of tools that consume language data. To this end, the OntoLex-lemon model has emerged as a de facto standard for representing lexical data on the Web. However, traditional dictionaries contain a considerable amount of morphological information that is not straightforwardly representable as LLD within the current model. To fill this gap, a new Morphology Module of OntoLex-lemon is currently being developed. This paper presents the results of this ongoing work as well as the challenges that emerged during the module's development. Based on the MMoOn Core ontology, the module aims to account for a wide range of morphological information, ranging from endings used to derive whole paradigms to the decomposition and generation of lexical entries, in a way that complies with other OntoLex-lemon modules and facilitates the encoding of complex morphological data in ontology lexicons.
The evolution of XML schemas. Author: James Tauber. Published in: E-business Advisor, Volume 18, Issue 5, May 2000. Advisor Media, Inc., San Diego, CA, USA.
The prevalence of XML in discussions of the Web and its application to such diverse domains in the past year (1998-99) have almost eclipsed the fact that the 1.0 specification, approved as a W3C Recommendation in February 1998, is only the first part of the "structured documents" originally envisioned. The specifications for two more parts (hypertext link types and the stylesheet language) are nearing completion, and the W3C chartered new working groups last year (1998) to generate new members of the family.
Digital libraries can enable their patrons to go beyond modern language translations and to engage directly with sources in more languages than any individual could study, much less master. Translations should be viewed not so much as an end but as an entry point into the sources that they represent. In the case of highly studied sources, one or more experts can curate the network of annotations that support such reading. A digital library should, however, automatically create a serviceable first version of such a multilingual edition. Such a service is possible but benefits from (if it does not require) a new generation of increasingly well-designed machine-readable translations, lexica, grammars, and encyclopedias. This paper reports on exploratory work that uses the Homeric epics to explore this wider topic and on the more general application of the results.
Some experimenters have begun to carry out image preference experiments over the web, with observers completing the task in their own time and using their own display devices. This reduces the administrative overhead and opens up the possibility of huge numbers of potential observers. However, we have to surrender some control over viewing conditions. In previous work, we evaluated an existing web-based paired comparison experiment against a lab-based counterpart and found that, generally, the two variants did not correlate to a significantly high degree. In this work we extend that study with the development of our own web-based research platform, with greater control over viewing conditions and a much larger number of observers (over 1,000, with more than 26,000 individual observations). With this, we show a much more positive correlation between the web- and lab-based variants. We also show the similarity or otherwise between the two variants as a function of time, which reveals how many web-based observations are required to achieve stable results.
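As a rough illustration of the kind of comparison described above, the following sketch turns paired-comparison observations into a per-stimulus preference score and correlates the web-based scores with the lab-based ones. The abstract does not say which scoring method or correlation statistic was used, so the win-fraction score, the Pearson coefficient, the function names, and the toy observations are all assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt


def preference_scores(observations, stimuli):
    """Fraction of its comparisons that each stimulus won.

    `observations` is a list of (winner, loser) pairs (hypothetical format).
    """
    wins = Counter(winner for winner, _ in observations)
    appearances = Counter()
    for winner, loser in observations:
        appearances[winner] += 1
        appearances[loser] += 1
    return [wins[s] / appearances[s] if appearances[s] else 0.0 for s in stimuli]


def pearson(xs, ys):
    """Plain Pearson correlation; assumes both lists have non-zero variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Toy data, invented for this sketch: three images, a few judgements each.
stimuli = ["A", "B", "C"]
lab = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B")]
web = [("A", "B"), ("C", "A"), ("B", "C"), ("A", "C"), ("B", "C")]

lab_scores = preference_scores(lab, stimuli)
web_scores = preference_scores(web, stimuli)
r = pearson(lab_scores, web_scores)
print(f"lab/web correlation: {r:.3f}")
```

Recomputing `r` on growing prefixes of the web observations would give a simple view of agreement as a function of time, provided each prefix contains enough variation for the correlation to be defined.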
(presented at SBL 2017 in Boston)
As more resources for Biblical Greek, both old and new, become openly available, the opportunities for integrating them become greater. At the level of the word, it might seem a trivial task to match based on lemma. But no two texts are lemmatised the same way and no two lexicons will make the same choices of headwords. Numerical solutions such as Strong's and Goodrick-Kohlenberger solve some problems but introduce new ones. After surveying the various issues and challenges, this talk will provide both a framework for moving forward and a report on practical ways that a variety of texts, lexicons, and other resources such as principal-part lists are being linked in the service of open, biblical digital humanities.
(presented at SBL International 2017 in Berlin)
One of the promises of machine-actionable linguistic data linked to biblical texts is the enablement of new types of language learning tools. At their simplest, such tools might involve adding the necessary scaffolding to enable students to read more text than they otherwise might by providing glosses for rarer words or help on idioms, irregular morphology, and unusual syntactic constructions. Such tools, however, are hardly novel and have long been manually produced in printed form. Equivalent electronic versions don’t really take advantage of what’s possible. In this paper I discuss an online reading environment for Ancient Greek, and the Greek New Testament in particular, that takes advantage of the availability of open, machine-actionable resources such as treebanks and morphological analyses for more automated and consistent generation of scaffolding, but which goes a step further by being adaptive to an individual student’s knowledge at a given point. Such knowledge need not be explicitly provided (although it can be: to align with a particular textbook, for example). It can also be built up implicitly from what the reader is requesting more information or help on: What words are they having trouble remembering the meaning of? What forms are they having trouble parsing? The model of student knowledge is then integrated with learning tools such as spaced-repetition flash cards and parsing drills, with the results of these tools then feeding back into better adapting scaffolding for reading. The online reading environment will be open source and potentially applicable to a wide range of other languages and texts, provided the necessary linguistic data is available.
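The idea of scaffolding that adapts to a reader can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the reading environment's actual design: the function names, the frequency threshold, and the tiny lemma lists are hypothetical, and the abstract does not describe how the student model is represented.

```python
from collections import Counter


def lemmas_to_gloss(passage_lemmas, corpus_lemmas, known, rare_threshold=2):
    """Return the lemmas in a passage that should receive a gloss.

    A lemma is glossed when it is rare in the corpus (frequency at or below
    `rare_threshold`, an assumed cut-off) and the student model does not
    already mark it as known.
    """
    frequency = Counter(corpus_lemmas)
    glossed, seen = [], set()
    for lemma in passage_lemmas:
        if lemma in seen or lemma in known:
            continue
        seen.add(lemma)
        if frequency[lemma] <= rare_threshold:
            glossed.append(lemma)
    return glossed


def record_lookup(known, lemma):
    """Implicit evidence: a reader who asks for help on a lemma does not know it yet."""
    known.discard(lemma)


# Toy data, invented for this sketch.
corpus = ["καί"] * 5 + ["λόγος"] * 3 + ["θεός"] * 3 + ["σκηνόω", "ἐξηγέομαι"]
passage = ["καί", "λόγος", "σκηνόω", "ἐξηγέομαι"]
known = {"σκηνόω"}  # e.g. seeded from a particular textbook's vocabulary list

before = lemmas_to_gloss(passage, corpus, known)
record_lookup(known, "σκηνόω")  # the reader requested help on this word
after = lemmas_to_gloss(passage, corpus, known)
print(before, after)
```

A fuller model would presumably track forms as well as lemmas and take input from flash-card and parsing-drill results, but the same loop of "render scaffolding, observe requests, update the model" applies.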
(presented at SBL 2016)
One of the promises of machine-actionable linguistic data linked to biblical texts is the enablement of new types of language learning tools. At their simplest, such tools might involve adding the necessary scaffolding to enable students to read more text than they otherwise might by providing glosses for rarer words or help on idioms, irregular morphology, and unusual syntactic constructions. Such tools, however, are hardly novel and have long been manually produced in printed form. Equivalent electronic versions don't really take advantage of what's possible. In this paper I discuss an online reading environment for Ancient Greek, and the Greek New Testament in particular, that takes advantage of the availability of open, machine-actionable resources such as treebanks and morphological analyses for more automated and consistent generation of scaffolding, but which goes a step further by being adaptive to an individual student's knowledge at a given point. Such knowledge need not be explicitly provided (although it can be: to align with a particular textbook, for example). It can also be built up implicitly from what the reader is requesting more information or help on: What words are they having trouble remembering the meaning of? What forms are they having trouble parsing? The model of student knowledge is then integrated with learning tools such as spaced-repetition flash cards and parsing drills, with the results of these tools then feeding back into better adapting scaffolding for reading. The online reading environment will be open source and potentially applicable to a wide range of other languages and texts, provided the necessary linguistic data is available.
(presented at SBL 2015)
Morphological analyses such as analytical lexicons have typically involved indicating lemma, part-of-speech, morphosyntactic and morphosemantic information (such as case, number, person, gender, tense, voice, mood and degree). Much progress has been made in recent years in making analyses of this sort freely available in digital formats, but the kind of information they contain has not advanced significantly for decades. This paper will provide an overview of the work of the MorphGNT project to develop an electronic Morphological Lexicon of New Testament Greek that adds inflectional classes, roots and stems, stem formation and morphophonological processes, principal parts, and derivational morphology. Beyond serving as a database of linguistic information, the goal of the morphological lexicon is to provide an “executable grammar” so that particular grammar points discussed in beginner grammars, intermediate grammars or advanced reference grammars can be tested against a corpus in a way that makes completely transparent where the “rules” are followed and where they fall down. This data is also useful for pedagogical tools such as intelligent tutoring systems, which typically require better modeling of latent traits in order to determine what a student actually knows and what items best test that knowledge. All data for the Morphological Lexicon of New Testament Greek is available under a Creative Commons license, and all code used for both the generation and verification of the morphological lexicon is open source.
In an update on the ongoing work he has spoken about at previous Bible Tech conferences, James talks about recent developments in open source learning software and the MorphGNT linguistic database, and how the two work together to provide tools for improving the learning of New Testament Greek.
Slides from a talk given to the Surrey Morphology Group in 2006 on preliminary issues relating to choice of lexeme division, parts of speech, and inflectional morphology tagging in corpora (particularly the Greek New Testament), and the use of a lattice structure to model competing analyses.
This volume presents a wide range of quantitative approaches to versification. It comprises various methodological perspectives ranging from simple descriptive statistics to advanced machine learning methods (such as support vector machines, random forests or neural networks) as well as material covering a large span of time and languages: from very ancient versifications (Sumerian, Akkadian, Hittite, Ancient Greek), through medieval (Old English, Old Icelandic, Old Saxon) and Renaissance verse to modern experiments (free verse, concrete poetry); from English and Russian through Spanish and German to Portuguese and Catalan. Not only written but also spoken poetry has been analyzed.
Visualisations of the basic rules of Ancient Greek accentuation (law of limitation and σωτῆρα rule) from a mora-based view.
This study looked at linguistic variation in the fiction writing of J. R. R. Tolkien using a number of different techniques, including principal component analysis of function word relative frequencies and multidimensional register analysis. The main prose texts of The Hobbit, The Lord of the Rings, and The Silmarillion were prepared and marked up, distinguishing direct speech from narrative. Speaker identification information was then used to generate different ‘lenses’ on the works: subcorpora that focused just on narrative, on speech overall, or on the speech of particular characters. Early drafts of the Silmarillion material were also included (although without speaker identification). The texts were annotated for part-of-speech and dependency relations using spaCy, and Biber dimension scores were calculated using MAT. Numerous visualizations were then made using a custom software pipeline developed in Python and R. This enabled the exploration of differences between the works, between parts within a work, between narrative and direct speech, and between the speech of different characters. The study confirmed previous findings regarding register differences between fictional speech and narration and raised new methodological considerations in that kind of register analysis. It also confirmed that measures such as function word frequency track closely with richer register analyses. Finally, various observations were made about the stylistic shifts in the works themselves that raise interesting literary questions.
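To make "function word relative frequencies" concrete, here is a minimal sketch of the feature extraction that would precede a principal component analysis. It is not the study's pipeline: the five-word list, the tokenisation rule, and the two sample sentences are assumptions invented for illustration, and the PCA step itself (normally done with a numerical library) is left out.

```python
import re
from collections import Counter

# Assumed, deliberately tiny function-word list; a real study would use many more.
FUNCTION_WORDS = ["the", "and", "of", "to", "in"]


def relative_frequencies(text, function_words=FUNCTION_WORDS):
    """Share of all tokens in `text` taken up by each function word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[word] / total if total else 0.0 for word in function_words]


# Invented sample sentences standing in for a narrative and a speech subcorpus.
subcorpora = {
    "narrative": "The road went on and on to the end of the world.",
    "speech": "I am going to the mountain, and you are coming in with me.",
}

# One row per subcorpus, one column per function word: the matrix PCA would take.
matrix = {name: relative_frequencies(text) for name, text in subcorpora.items()}
for name, row in matrix.items():
    print(name, [round(value, 3) for value in row])
```

Each ‘lens’ described above (narrative only, all speech, one character's speech) would contribute one or more rows to such a matrix, so that the principal components summarise how the subcorpora differ in their function word usage.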