More than three years have passed since the release of the second edition of VocBench, an open source collaborative web platform for the development of thesauri complying with Semantic Web standards. In these years, a vibrant user community has gathered around the system, consisting of public organizations, companies and independent users looking for open source solutions for maintaining their thesauri, code lists and authority resources. The focus on collaboration, the differentiation of user roles and the workflow management for content validation and publication have been the strengths of the platform, especially for those organizations requiring a centralized and controlled publication environment. Now the time has come to widen the scope of the platform: funded by the ISA2 programme of the European Commission, VocBench 3 will offer a general-purpose collaborative environment for the development of any kind of RDF dataset, improving the editing capabilities of its predecessor, while ...
Ontologies are commonly used resources: we are witnessing the constant growth, in number and heterogeneity, of communities working with large volumes of data. Researchers, practitioners, developers and end users deal with huge amounts of data from different perspectives, topics, cultures and languages. For people involved in governing both data and processes, this remains a difficult task. End users and practitioners are usually interested in merging data generated from connected objects in customizable ways. Even if many algorithms and tools have been created to achieve this goal in a fully automatic manner, human contribution is nonetheless still important. To reach the aforementioned results, researchers have concentrated their efforts on making both the representation and the visualization of data easier, thus simplifying user interaction with the system. The main aspect to deal with is how to represent an ontology alignment providing a good overview of the alignment and whatever ...
In this paper we present PEARL, a language for projecting annotations based on the Unstructured Information Management Architecture (UIMA) over RDF triples. The language's offering is twofold: first, a query mechanism, built upon (and extending) the basic FeaturePath notation of UIMA, allows for efficient access to the standard UIMA annotation format based on feature structures. PEARL then provides a syntax for projecting the retrieved information onto an RDF dataset, using a combination of a SPARQL-like notation for matching pre-existing elements of the dataset and meta-graph patterns for storing new information into it. We present the basics of the language and how a PEARL document is structured, discuss a simple use case and introduce a wider project on the automatic acquisition of knowledge, in which PEARL plays a pivotal role.
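PEARL's own syntax is not reproduced in this abstract; as a rough sketch of the projection idea it describes, the following Python fragment (using rdflib; the annotation structure and the ex: vocabulary are invented for illustration, and a plain dict stands in for a real UIMA CAS) turns a feature-structure-like annotation into RDF triples:

```python
from rdflib import Graph, Namespace, RDF

# Hypothetical target vocabulary; PEARL's real input is a UIMA CAS,
# which the plain dict below only imitates.
EX = Namespace("http://example.org/onto#")

# A feature structure as a UIMA annotator might emit it for the
# sentence "Rome is the capital of Italy" (invented annotation type).
annotation = {"type": "CapitalRelation", "city": "Rome", "country": "Italy"}

g = Graph()
g.bind("ex", EX)

# Projection step: match the annotation type and emit a graph pattern,
# very loosely mirroring what a PEARL rule would declare.
if annotation["type"] == "CapitalRelation":
    city, country = EX[annotation["city"]], EX[annotation["country"]]
    g.add((city, RDF.type, EX.City))
    g.add((country, RDF.type, EX.Country))
    g.add((city, EX.capitalOf, country))

print(g.serialize(format="turtle"))
```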
The Semantic Web dream of a real world-wide graph of interconnected resources is – slowly but steadily – becoming a concrete reality. Still, the whole range of models and technologies which will forever change the way we interact with the web seems to be missing from the everyday technologies available on our personal computers. Ontologies, annotation facilities and semantic querying could (and should) bring new life to Personal Information Management, supporting users in countering the ever-growing information overload they face, overwhelmed by a plethora of communication channels and media. In this paper we present our attempt to bring the Semantic Web Knowledge Management paradigm to diverse personal desktop tools (web browser, mail client, agenda, etc.), by evolving the Semantic Web browser extension Semantic Turkey into an extensible framework providing RDF data access at different levels: Java access through OSGi extensions, HTTP access or ...
AGROVOC is the multilingual thesaurus managed and published by the Food and Agriculture Organization of the United Nations (FAO). Its content is available in more than 40 languages and covers all of the FAO's areas of interest. Its structural basis is the Resource Description Framework (RDF) and the Simple Knowledge Organization System (SKOS). More than 39,000 concepts, identified by uniform resource identifiers (URIs), and 800,000 terms are related through a hierarchical system and aligned to other knowledge organization systems. This paper aims to illustrate the recent developments in the context of AGROVOC and to present use cases where it has contributed to enhancing the interoperability of data shared by different information systems.
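To make the RDF/SKOS data model concrete, here is a minimal sketch in the spirit of AGROVOC's structure; the concept URIs and labels are invented, not real AGROVOC entries:

```python
from rdflib import Graph

# A small SKOS fragment: two concepts with multilingual preferred
# labels, linked by a broader/narrower hierarchy.
data = """
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/agrovoc-like/> .

ex:c_0001 a skos:Concept ;
    skos:prefLabel "maize"@en, "ma\u00efs"@fr, "ma\u00edz"@es ;
    skos:broader ex:c_0002 .

ex:c_0002 a skos:Concept ;
    skos:prefLabel "cereals"@en .
"""

g = Graph()
g.parse(data=data, format="turtle")

# Retrieve every language-tagged preferred label per concept.
q = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {
    ?concept a skos:Concept ;
             skos:prefLabel ?label .
}
"""
for concept, label in g.query(q):
    print(concept, f"{label} ({label.language})")
```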
In recent years, artificial intelligence technologies have been introduced in medicine as a support to the diagnostic process. Among others, multi-systemic and multi-factorial disorders, which involve different aspects of human anatomy and can be triggered by many factors, both genetic and environmental, have become a challenging application field for AI. The analysis of those factors, in fact, requires the integration of heterogeneous information coming from different fields (genome screening, neuroimaging, environmental risk factors, etc.). The support of artificial intelligence technologies becomes crucial for a better understanding of this kind of disease, not only for the identification of the factors but also to explain how these factors can impact the disease. In this paper we present EMMI (Enquiry Model for Medical Investigation): an ontology-driven automatic medical inquirer that (in a first application context), leveraging an Interrogative Model of Inquiry, is able to selectively ...
The Semantic Web is facing the important challenge of maintaining its promise of a real world-wide graph of interconnected resources. Unfortunately, while URIs almost guarantee a direct reference to entities, the relation between the two is not bijective. Many different URI references to the same concepts and entities can arise when - in such a heterogeneous setting as the WWW - people independently build new ontologies, or populate shared ones with new, arbitrarily identified individuals. The proliferation of URIs is an unwanted, though natural, effect strictly bound to the same principles which characterize the Semantic Web; reducing this phenomenon would improve the recall of semantic search engines, which could rely on explicit links between heterogeneous information sources. To address this problem, in this paper we present an integrated environment combining the semantic annotation and ontology building features available in the Semantic Turkey web browser extension with globally unique ...
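The standard remedy for URI proliferation of this kind is to assert explicit identity links between coreferent URIs. A minimal sketch with invented URIs, using the standard owl:sameAs property:

```python
from rdflib import Graph, Namespace
from rdflib.namespace import OWL

# Two independently minted URIs for the same real-world entity
# (both invented for illustration).
A = Namespace("http://example.org/ontoA/")
B = Namespace("http://example.org/ontoB/")

g = Graph()
g.add((A.Rome, OWL.sameAs, B.CityOfRome))

# A plain RDF store does not merge the two resources by itself; the
# explicit link lets search engines and reasoners treat them as one.
print((A.Rome, OWL.sameAs, B.CityOfRome) in g)  # True
```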
This paper introduces HORUS (Human-readable Ontology Reasoner Unit System), a configurable reasoner which provides the user with the motivation for every piece of inferred knowledge in the context of a reasoning process. We describe the reasoner, how to write an inference rule, and how to check which explicit knowledge was used to infer new knowledge. Real-case examples are provided to show the capabilities of our reasoner and the associated language developed to express inference rules. We show how HORUS allows the user to understand the logical process through which each new RDF triple has been generated.
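HORUS's actual rule language is not shown in the abstract; as a minimal sketch of the underlying idea, the Python fragment below applies one hand-written transitivity rule by forward chaining and records, for each inferred triple, the explicit triples that justified it (all names invented):

```python
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/ex#")

g = Graph()
g.add((EX.a, EX.partOf, EX.b))
g.add((EX.b, EX.partOf, EX.c))

# One forward-chaining rule: partOf is transitive.
# 'explanations' maps each inferred triple to the explicit premises used.
explanations = {}
changed = True
while changed:
    changed = False
    inferred_now = []
    for x, _, y in g.triples((None, EX.partOf, None)):
        for _, _, z in g.triples((y, EX.partOf, None)):
            conclusion = (x, EX.partOf, z)
            if conclusion not in g:
                premises = [(x, EX.partOf, y), (y, EX.partOf, z)]
                inferred_now.append((conclusion, premises))
    for conclusion, premises in inferred_now:
        if conclusion not in g:
            g.add(conclusion)
            explanations[conclusion] = premises
            changed = True

# Each inferred triple comes with its human-readable motivation.
for conclusion, premises in explanations.items():
    print(conclusion, "<=", premises)
```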
The EU is a complex, multilingual, multicultural and yet united environment, requiring solid integration policies and actions targeted at simplifying cross-language and cross-cultural knowledge access. The legal domain is a typical case in which the linguistic and the conceptual aspects mutually interweave into a knowledge barrier that is hard to break. In the context of the ISA2-funded project "Public Multilingual Knowledge Infrastructure" (PMKI), we are addressing semantic interoperability at both the conceptual and the lexical level, by developing a set of coordinated instruments for the advanced lexicalization of RDF resources (be they ontologies, thesauri or datasets in general) and for the alignment of their content. In this paper, we describe the objectives of the project and the concrete actions, specifically in the legal domain, that will create a platform for multilingual, cross-jurisdiction accessibility to legal content in the EU.
The dynamic and distributed nature of the Semantic Web demands methodologies and systems fostering collective participation in the evolution of datasets. In collaborative and iterative processes for dataset development, it is important to keep track of individual changes for provenance. Different scenarios may require mechanisms to foster consensus, resolve conflicts between competing changes, or reverse or ignore changes. In this paper, we perform a landscape analysis of version control for RDF datasets, emphasizing the importance of change reversion to support validation. Firstly, we discuss different representations of changes in RDF datasets and introduce higher-level perspectives on change. Secondly, we analyze diverse approaches to version control. We conclude by focusing on validation, characterizing it as a need separate from the mere preservation of different versions of a dataset.
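As a minimal sketch of the triple-level change representation the paper surveys (not the model of any specific versioning system), a changeset can be recorded as a pair of added and removed triple sets, which makes reversion a matter of swapping the two:

```python
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/data#")

def apply_change(graph, added, removed):
    """Apply a changeset expressed as two sets of triples."""
    for t in removed:
        graph.remove(t)
    for t in added:
        graph.add(t)

def revert_change(graph, added, removed):
    """Reversion swaps the roles of the two sets."""
    apply_change(graph, added=removed, removed=added)

g = Graph()
g.add((EX.item, EX.status, EX.Draft))

# An invented change: the item moves from Draft to Published.
change = {
    "added": {(EX.item, EX.status, EX.Published)},
    "removed": {(EX.item, EX.status, EX.Draft)},
}

apply_change(g, change["added"], change["removed"])
revert_change(g, change["added"], change["removed"])  # back to Draft
print((EX.item, EX.status, EX.Draft) in g)  # True
```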
Companion Proceedings of The Web Conference 2018 (WWW '18), 2018
Creating ontologies is an essential yet challenging task, whether performed by a human or by a system: on the one hand it is excessively burdensome for a human operator, on the other it is also very complex for a machine, due to the non-negligible amount of "uncertainty" that it must be able to manage. In the last years, some attempts have been made to automate this process, but at present, due to the large number of aspects to be covered in the automatic creation of an ontology (such as domain terminology extraction, concept discovery, concept hierarchy derivation, ...), satisfactory solutions have not yet been reached. In order to produce efficient tools for both the creation and the enrichment of ontologies, the participation of a human in the process still seems necessary. Our approach, which foresees a broader framework for ontology learning, is based first on the automatic extraction of triples from heterogeneous sources, then on the presentation of the most reliable triples to the human operator for validation. The system provides the user with a series of graphical representations offering an overview of the level of uncertainty of the automatically generated ontology. It then offers the possibility to perform SPARQL what-if queries (i.e., assuming as true the triples filtered according to their level of confidence, their source and their structure). Through a dedicated interface, the human can accept or reject triples according to a personal analysis. Such an intervention is fundamental for completing the ontology creation task in a reduced amount of time.
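The paper's actual interface is graphical; as a rough illustration of the what-if idea with invented data and an invented threshold, one can keep candidate triples with a confidence score, materialize only those above the threshold, and query the resulting graph as if they were true:

```python
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/learned#")

# Candidate triples from an imaginary extraction step, each with
# a confidence score attached out-of-band.
candidates = [
    ((EX.Dog, EX.subClassOf, EX.Animal), 0.95),
    ((EX.Dog, EX.subClassOf, EX.Plant), 0.12),
]

def what_if_graph(candidates, threshold):
    """Build a graph assuming as true all triples above the threshold."""
    g = Graph()
    for triple, confidence in candidates:
        if confidence >= threshold:
            g.add(triple)
    return g

g = what_if_graph(candidates, threshold=0.5)
print(len(g))  # 1: only the high-confidence triple survives
```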
OntoLex-Lemon is a collection of RDF vocabularies for specifying the verbalization of ontologies in natural language. Beyond its original scope, OntoLex-Lemon, as well as its predecessor Monnet lemon, found application in the Linguistic Linked Open Data cloud to represent and interlink language resources on the Semantic Web. Unfortunately, generic ontology and RDF editors were considered inconvenient to use with OntoLex-Lemon because of its complex design patterns and other peculiarities, including indirection, reification and subtle integrity constraints. This perception led to the development of dedicated editors, trading the flexibility of RDF in combining different models (and the features already available in existing RDF editors) for a more direct and streamlined editing of OntoLex-Lemon patterns. In this paper, we investigate the benefits gained by extending an existing RDF editor, VocBench 3, with capabilities closely tailored to OntoLex-Lemon, and the challenges ...
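The indirection mentioned above is visible in even the smallest OntoLex-Lemon fragment: a lexical entry reaches its ontology reference only through a sense, and its written form only through a Form node. A minimal sketch with invented lexicon entities (the ontolex: terms themselves are from the standard vocabulary):

```python
from rdflib import Graph

# Entry -> canonicalForm -> writtenRep, and entry -> sense -> reference:
# two levels of indirection that generic editors handle poorly.
data = """
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix ex:      <http://example.org/lexicon/> .

ex:cat_entry a ontolex:LexicalEntry ;
    ontolex:canonicalForm [ a ontolex:Form ;
                            ontolex:writtenRep "cat"@en ] ;
    ontolex:sense [ a ontolex:LexicalSense ;
                    ontolex:reference ex:Cat ] .
"""

g = Graph()
g.parse(data=data, format="turtle")

# Follow the indirection from written representation to ontology entity.
q = """
PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>
SELECT ?rep ?ref WHERE {
    ?entry ontolex:canonicalForm/ontolex:writtenRep ?rep ;
           ontolex:sense/ontolex:reference ?ref .
}
"""
for rep, ref in g.query(q):
    print(rep, "->", ref)
```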
Data production and exchange on the Web grow at a frenetic speed. Such uncontrolled and exponential growth pushes for new research in the area of information extraction, as it is of great interest and can be obtained by processing data gathered from several heterogeneous sources. While some extracted facts can be correct at the origin, it is not possible to verify that correlations among them are always true (e.g., they can relate to different points in time). We need systems smart enough to separate signal from noise and hence extract real value from this abundance of content accessible on the Web. In order to extract information from heterogeneous sources, we are involved in the entire process of identifying specific facts/events of interest. We propose a gluing architecture driving the whole knowledge acquisition process, from data acquisition from external heterogeneous resources to their exploitation for RDF triplification in support of reasoning tasks. Once the extraction process is completed, a dedicated reasoner can infer new knowledge as a result of the reasoning process defined by the end user by means of specific inference rules over both the extracted information and the background knowledge. The end user is supported in this context by an intelligent interface allowing them to visualize either specific data/concepts, or all the information inferred by applying deductive reasoning over a collection of data.
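As a schematic illustration of the triplification step named above (not the paper's actual pipeline; all names and the extraction output format are invented), extracted facts can be mapped to RDF before being handed to a reasoner:

```python
from rdflib import Graph, Namespace, RDF

EX = Namespace("http://example.org/facts#")

# Facts as an imaginary extractor might deliver them.
extracted = [
    {"subject": "Acme", "relation": "headquarteredIn", "object": "Rome"},
]

g = Graph()
g.bind("ex", EX)

# Triplification: one extracted record becomes one or more RDF triples.
for fact in extracted:
    s = EX[fact["subject"]]
    p = EX[fact["relation"]]
    o = EX[fact["object"]]
    g.add((s, p, o))
    g.add((s, RDF.type, EX.ExtractedEntity))

print(g.serialize(format="turtle"))
```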
In this paper we introduce CODA (Computer-aided Ontology Development Architecture), an architecture and a framework for the semi-automatic development of ontologies through the analysis of heterogeneous information sources. We were motivated in its design by the observation that several fields of research have provided interesting contributions towards the objective of augmenting/enriching ontology content, but that they lack a common perspective and a systematic approach. While in the context of Natural Language Processing specific architectures and frameworks have been defined, the time is not yet completely ripe for systems able to reuse the extracted information for ontology enrichment purposes: several examples do exist, though they do not comply with any leading model or architecture. The objective of CODA is to acknowledge and improve existing frameworks to cover the above gaps by providing: a conceptual systematization of the data extracted from unstructured information to enrich ontology content, an architecture defining the components taking part in such a scenario, and a framework supporting all of the above through standard implementations. This paper provides a first overview of the whole picture and introduces UIMAST, an extension for the Knowledge Management and Acquisition platform Semantic Turkey, which implements CODA principles by allowing the reuse of components developed inside the UIMA framework to drive the semi-automatic acquisition of knowledge from Web content.