The CLARIAH CMDI Forms application provides a new CMDI editing environment for records based on arbitrary profiles, with extensive possibilities, based on CMDI 1.2 features and some extensions, to tweak the profile information for a maximally user-friendly editing experience.
Language resources are valuable assets, both for institutions and researchers. To safeguard these resources, requirements for repository systems and data management have been specified by various branch organizations, e.g., CLARIN and the Data Seal of Approval. This paper describes these requirements, along with some additional ones posed by the authors’ home institutions, and shows how FLAT meets them to provide a new home for language resources. FLAT is built on the Fedora Commons repository system. This repository system meets many of the requirements out of the box, but additional configuration and some development work are needed to meet the remaining ones, e.g., to add support for Handles and Component Metadata. This paper describes the design decisions taken in the construction of FLAT’s system architecture via a mix-and-match strategy, with a preference for the reuse of existing solutions. FLAT is developed and used by the a Institute and The Language Archive, but is al...
The ISOcat Data Category Registry (www.isocat.org) has been developed by ISO TC 37 and CLARIN to share and make explicit the semantics of data categories used within the linguistic community. Semantics in this large and diverse community are constantly evolving and sometimes conflicting. The ISOcat open registry allows community members to collaborate in defining the semantics of linguistic data categories. The aim is to create a core of well-specified and widely accepted linguistic data categories, possibly officially standardized. This demonstration will show ISOcat’s features to support direct and indirect collaboration, its efforts to create a set of core data categories for various communities, and possible solutions for current bottlenecks.
This paper describes the development of a CLARIN-compatible repository solution that fulfils both the long-term preservation requirements and the present-day discoverability and usability needs of an online data repository of language resources. The widely used Fedora Commons open source repository framework, combined with the Islandora discovery layer, forms the basis of the solution. On top of this existing solution, additional modules and tools are developed to make it suitable for the types of data and metadata that are used by the participating partners.
Metadata records created and provided via the Component Metadata Infrastructure (CMDI) can be of high quality, because a metadata profile can be tailored to a specific resource type. However, this flexibility comes at a cost: it is harder to create a metadata editor that copes well with this diversity. The Dutch CLARIAH project aims to create a user-friendly CMDI editor that can deal with arbitrary profiles and can be embedded in the environments of the various partners. A few CMDI editors have already been created, e.g., Arbil [Withers 2012], CMDI-Maker [CLASS 2018] and COMEDI [Lyse et al 2015]. Of these, Arbil is no longer supported and CMDI-Maker only supports a limited number of profiles. COMEDI can handle arbitrary CMDI profiles, but it comes with its own dedicated environment and stays very close to the profile, so that certain technical limitations of CMDI still leak into the end user’s experience. An example is the lack of mul...
The ISOcat Data Category Registry (www.isocat.org) of the technical committee ISO/TC 37 (Terminology and other language and content resources) describes field names and values for language resources. Recommended field names and reliable definitions are intended to help language data be reused independently of applications, platforms and Communities of Practice (CoP). Data Category Selections can be viewed, printed, exported and, after free registration, also newly created.
The Lexical Markup Framework (ISO 24613:2008) provides a core class diagram and various extensions as the basis for constructing lexical resources. Unfortunately, the informative Document Type Definition provided by the standard and other available LMF serializations lack support for many of the powerful features of the model. This paper describes RELISH LMF, which unlocks the full power of the LMF model by providing a set of extensible modern schema modules. As use cases, RELISH LL LMF and support by LEXUS, an online lexicon tool, are described.
When managing data sets in research data workflows, almost all research disciplines face the challenge of how to deal with versioning or, more broadly, tracking provenance. At this stall we propose an extension to the CMD Infrastructure to specify (provenance) relationships among language resources. Although we are particularly interested in use cases for describing relations between corpora (update, enrichment etc.), we would also like to discuss provenance tracking and provenance use cases in general. Contributions to our work are very welcome.
In the CLARIN infrastructure, various national projects have started initiatives to allow users of the infrastructure to create chains or workflows of web services. The Component Metadata (CMD) core model for web services described in this paper tries to align the metadata descriptions of these various initiatives. This should allow chaining/workflow engines to find and invoke matching services. The paper describes the landscape of web service architectures and the state of the national initiatives. Based on this, a CMD core model for CLARIN is proposed, which, within some limits, can be adapted to the specific needs of an initiative by the standard facilities of CMD. The paper closes with the current state and usage of the model and a look into the future.
The ISOcat Data Category Registry contains an essentially flat and easily extensible list of data category specifications. To foster reuse and standardization, only very shallow relationships among data categories are stored in the registry. However, to assist crosswalks, possibly based on personal views, between various (application) domains, and to overcome possible proliferation of data categories, more types of ontological relationships need to be specified. RELcat is a first prototype of a Relation Registry, which allows storing arbitrary relationships. These relationships can reflect the personal view of one linguist or of a larger community. The basis of the registry is a relation type taxonomy that can easily be extended. On the one hand, this allows loading existing sets of relations specified in, for example, an OWL (2) ontology or SKOS taxonomy. On the other hand, it allows algorithms that query the registry to traverse the stored semantic network to remain ignorant of the original so...
Talks by M. Windhouwer
Papers by M. Windhouwer