In the age of advanced information systems powering fast-paced knowledge economies that face global societal challenges, it is no longer adequate to express scholarly information, an essential resource for modern economies, primarily as article narratives in document form. Despite being a well-established tradition in scholarly communication, PDF-based text publishing is hindering scientific progress as it buries scholarly information in non-machine-readable formats. The key objective of SKG4EOSC is to improve science productivity through the development and implementation of services for text and data conversion, and for the production, curation, and re-use of FAIR scholarly information. This will be achieved by (1) establishing the Open Research Knowledge Graph (ORKG, orkg.org), a service operated by the SKG4EOSC coordinator, as a hub for access to FAIR scholarly information in the EOSC; (2) lifting numerous and heterogeneous domain-specific research infrastructures to the EOSC through the...
Background: Eating disorders affect an increasing number of people. Social networks provide information that can help. Objective: We aimed to find machine learning models capable of efficiently categorizing tweets in the eating disorders domain. Methods: We collected tweets related to eating disorders for 3 consecutive months. After preprocessing, a subset of 2000 tweets was labeled: (1) messages written by people suffering from eating disorders or not, (2) messages promoting suffering from eating disorders or not, (3) informative messages or not, and (4) scientific or nonscientific messages. Traditional machine learning and deep learning models were used to classify the tweets. We evaluated accuracy, F1 score, and computational time for each model. Results: A total of 1,058,957 tweets related to eating disorders were collected. Across the 4 categorizations, the bidirectional encoder representations from transformers (BERT)-based models had the best scores among the machine learning a...
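As a rough illustration of the traditional machine learning baseline mentioned in this abstract, the sketch below trains a TF-IDF plus logistic regression classifier for one of the binary categorizations; the file name and the "text"/"label" column names are assumptions for illustration, not the authors' actual pipeline.

```python
# Minimal sketch: TF-IDF features plus a linear classifier for one binary labeling task.
# "labeled_tweets.csv" and its "text"/"label" columns are hypothetical.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score
from sklearn.pipeline import make_pipeline

df = pd.read_csv("labeled_tweets.csv")
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42, stratify=df["label"])

model = make_pipeline(
    TfidfVectorizer(lowercase=True, ngram_range=(1, 2), min_df=2),
    LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

pred = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred))
print("macro F1:", f1_score(y_test, pred, average="macro"))
```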
Developments in the context of Open, Big, and Linked Data have led to an enormous growth of structured data on the Web. To keep up with the efficient consumption and management of data produced at this rate, many data management solutions have been developed for specific tasks and applications. We present LITMUS, a framework for benchmarking data management solutions. LITMUS goes beyond classical storage benchmarking frameworks by allowing for analysing the performance of frameworks across query languages. In this position paper we present the conceptual architecture of LITMUS as well as the considerations that led to this architecture.
This report documents the program and the outcomes of Dagstuhl Seminar 17262 "Federated Semantic Data Management" (FSDM). The purpose of the seminar was to gather experts from the Semantic Web and Database communities, together with experts from application areas, to discuss in depth the open issues that have impeded FSDM approaches from being used at large scale. The discussions were centered around the following four themes, each of which was the focus of a separate working group: i) graph data models, ii) federated query processing, iii) access control and privacy, and iv) use cases and applications. The main outcome of the seminar is a deeper understanding of the state of the art and of the open challenges of FSDM.
AI-based systems are widely employed nowadays to make decisions that have far-reaching impacts on individuals and society. Their decisions might affect everyone, everywhere, and at any time, entailing concerns about potential human rights issues. Therefore, it is necessary to move beyond traditional AI algorithms optimized for predictive performance and to embed ethical and legal principles in their design, training, and deployment to ensure social good while still benefiting from the huge potential of AI technology. The goal of this survey is to provide a broad multi-disciplinary overview of the area of bias in AI systems, focusing on technical challenges and solutions, and to suggest new research directions towards approaches well-grounded in a legal frame. In this survey, we focus on data-driven AI, as a large part of AI is powered nowadays by (big) data and powerful Machine Learning (ML) algorithms. If not otherwise specified, we use the general term bias to describe problems r...
ABSTRACT: This work describes a platform that automates the process of semantic annotation of medical images, independently of the ontology used. Automatic annotations are produced by means of: (a) a conversion process (RDF-ization) of DICOM medical images into the RDF format; (b) the integration of different biomedical ontologies, through the mapping of distinct biomedical ontologies to the DICOM data, making the tool ontology-independent; (c) the segmentation and visualization of the annotated data, which is also used to generate new annotations according to the expert's knowledge, thus allowing the annotations to be validated. Moreover, by applying content-based image retrieval techniques, the platform makes it possible to retrieve medical images by similarity of features inherent to the images. This platform is being built on a distributed architecture, which optimizes the way in which...
Tailoring personalized treatments demands the analysis of a patient's characteristics, which may be scattered over a wide variety of sources. These features include family history, life habits, comorbidities, and potential treatment side effects. Moreover, the analysis of the services visited most frequently by a patient before a new diagnosis, as well as of the type of requested tests, may uncover patterns that contribute to earlier disease detection and treatment effectiveness. Building on knowledge-driven ecosystems, we devise DE4LungCancer, a health data ecosystem of data sources for lung cancer. In this data ecosystem, knowledge extracted from heterogeneous sources, e.g., clinical records, scientific publications, and pharmacological data, is integrated into knowledge graphs. Ontologies describe the meaning of the combined data, and mapping rules enable the declarative definition of the transformation and integration processes. DE4LungCancer is assessed regarding the methods followed for da...
Social networks have become information dissemination channels, where announcements are posted frequently; they also serve as frameworks for debates in various areas (e.g., scientific, political, and social). In particular, in the health area, social networks represent a channel to communicate and disseminate the success of novel treatments; they also allow ordinary people to express their concerns about a disease or disorder. The Artificial Intelligence (AI) community has developed analytical methods to uncover and predict patterns from posts, enabling it to explain news about a particular topic, e.g., mental disorders expressed as eating disorders or depression. Albeit potentially rich when expressing an idea or concern, posts are presented as short texts, thus preventing AI models from accurately encoding these posts' contextual knowledge. We propose a hybrid approach where knowledge encoded in community-maintained knowledge graphs (e.g., Wikidata) is combined with deep learning t...
Background: Dementia develops as cognitive abilities deteriorate, and early detection is critical for effective preventive interventions. However, mainstream diagnostic tests and screening tools, such as CAMCOG and MMSE, often fail to detect dementia accurately. Various graph-based or feature-dependent prediction and progression models have been proposed. Whenever these models exploit information in the patients' Electronic Medical Records, they represent promising options to identify the presence and severity of dementia more precisely. Methods: The methods presented in this paper aim to address two problems related to dementia: (a) basic diagnosis: identifying the presence of dementia in individuals, and (b) severity diagnosis: predicting the presence of dementia, as well as the severity of the disease. We formulate these two tasks as classification problems and address them using machine learning models based on random forests and decision trees, analysing structured clinical data f...
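A minimal sketch of the classification setup outlined in this abstract, assuming a structured EMR extract with hypothetical column names; it is not the authors' exact feature set or pipeline.

```python
# Minimal sketch: a random forest over assumed structured clinical features for the
# binary dementia-diagnosis task. File name, feature columns, and target are illustrative.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("clinical_records.csv")                        # hypothetical EMR extract
features = ["age", "gender", "mmse_score", "num_comorbidities"] # assumed columns
X = pd.get_dummies(df[features])                                # one-hot encode categoricals
y = df["dementia"]                                              # assumed binary target

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=5, scoring="f1")
print("5-fold F1:", scores.mean())
```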
This booklet aims to tackle this problem by providing a practical introduction to the practice of peer reviewing. Although it mainly focuses on paper reviewing for scientific events in the domain of computer science and (business) informatics, many of the principles, tips, tricks, and examples are generalizable to journal reviewing and other scientific domains. Some of the principles and tips can also be applied when reviewing proposals for research projects or grants. In addition, many aspects of this booklet will also benefit authors of scientific papers (even outside computer science), as they will gain more insight into how papers are reviewed and hence into what they have to pay attention to when writing their papers.
Path-based systems to guide scientists in the maze of biological data sources (Sarah Cohen-Boulakia).
This paper presents ARTEMIS, a control system for autonomous robots or software agents. ARTEMIS can create human-like artificial emotions during interactions with the environment. We describe the underlying mechanisms for this. The control system also captures its past artificial emotions. A specific interpretation of a knowledge graph, called an Agent Knowledge Graph, stores these artificial emotions. ARTEMIS then utilizes current and stored emotions to adapt its decision making and planning processes. As proof of concept, we realize a concrete software agent based on the ARTEMIS control system. This software agent acts as a user assistant and executes the user's orders and instructions. The environment of this user assistant consists of several other autonomous agents that offer their services. The execution of a user's orders requires interactions of the user assistant with these autonomous service agents. These interactions lead to the creation of artificial emotions within the user as...
In Computer Science, properties of formal theories that model real-world phenomena can be formally demonstrated using logical formal systems, e.g., a proof of the best-case complexity of a problem, or a demonstration of the soundness and completeness of a solution. Additionally, as in other natural sciences, characteristics of a theory can be empirically evaluated following the scientific method, which provides procedures to systematically conduct experiments and to test hypotheses about these characteristics. Formally proven properties or empirically confirmed hypotheses can be accepted as accounts of known facts, while falsifiable statements that cannot be validated correspond to negative and inconclusive results. In this talk, we first discuss the different types of negative results that can be obtained during the formal and empirical validation of Computer Science approaches, e.g., counter-examples to theorems, intractability and undecidability of a problem, or statistically...
The Mini-Mental State Examination (MMSE) is used as a diagnostic test for dementia to screen a patient's cognitive state and disease severity. However, these examinations are often inaccurate and unreliable, either due to human error or due to a patient's physical inability to correctly interpret the questions, as well as motor deficits. Erroneous data may lead to a wrong assessment of a specific patient. Therefore, other clinical factors (e.g., gender and comorbidities) recorded in electronic health records can also play a significant role when reporting a patient's examination results. This work considers various clinical attributes of dementia patients to accurately determine their cognitive status in terms of the Mini-Mental State Examination (MMSE) score. We employ machine learning models to calibrate the MMSE score and classify the correctness of diagnosis among patients, in order to assist clinicians in better understanding the progression of cognitive impairment and subsequent trea...
Important questions about the scientific community, e.g., which authors are the experts in a certain field, or which are actively engaged in international collaborations, can be answered using publicly available datasets. However, the data required to answer such questions is often scattered over multiple isolated datasets. Recently, the Knowledge Graph (KG) concept has been identified as a means for interweaving heterogeneous datasets and enhancing answer completeness and soundness. We present a pipeline for creating high-quality knowledge graphs that comprise data collected from multiple isolated structured datasets. As proof of concept, we illustrate the different steps in the construction of a knowledge graph in the domain of scholarly communication metadata (SCM-KG). In particular, we demonstrate the benefits of exploiting semantic web technology to reconcile data about authors, papers, and conferences. We conducted an experimental study on an SCM-KG that merges scientific research metadata from the DBLP bibliographic source and the Microsoft Academic Graph. The observed results provide evidence that queries are processed more effectively on top of the SCM-KG than over the isolated datasets, while execution time is not negatively affected.
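The following sketch illustrates, with rdflib and made-up Turtle snippets, the core benefit claimed above: once metadata from two sources is merged into one graph, a query spanning both can be answered. Real SCM-KG construction additionally reconciles equivalent IRIs before merging; the property names used here are illustrative only.

```python
# Minimal sketch: merge scholarly metadata from two hypothetical sources and
# answer a query that neither source answers alone.
from rdflib import Graph

dblp_like = """
@prefix ex: <http://example.org/> .
ex:paper1 ex:title "Federated Query Processing" ; ex:author ex:alice .
"""
mag_like = """
@prefix ex: <http://example.org/> .
ex:paper1 ex:citedBy ex:paper2 .
ex:alice  ex:affiliation "Example University" .
"""

kg = Graph()
kg.parse(data=dblp_like, format="turtle")
kg.parse(data=mag_like, format="turtle")   # naive merge; real pipelines reconcile IRIs first

q = """
PREFIX ex: <http://example.org/>
SELECT ?title ?affiliation WHERE {
  ?p ex:title ?title ; ex:author ?a .
  ?a ex:affiliation ?affiliation .
}"""
for row in kg.query(q):
    print(row.title, "-", row.affiliation)
```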
Industry 4.0 (I4.0) standards and standardization frameworks provide a unified way to describe smart factories. Standards specify the main components, systems, and processes inside a smart factory and the interactions among all of them. Furthermore, standardization frameworks classify standards according to their functions into layers and dimensions. Albeit informative, frameworks can categorize similar standards differently. As a result, interoperability conflicts are generated whenever smart factories are described with misclassified standards. Approaches like ontologies and knowledge graphs enable the integration of standards and frameworks in a structured way. They also encode the meaning of the standards, the known relations among them, as well as their classification according to existing frameworks. This structured modeling of the I4.0 landscape using a graph data model provides the basis for graph-based analytical methods to uncover alignments among standards. This paper contri...
In this article, we look at the challenges that arise in the use and management of education credentials, and from the switch from analogue, paper-based education credentials to digital education credentials. We propose a general methodology to capture qualitative descriptions and measurable quantitative results that make it possible to estimate the effectiveness of a digital credential management system in solving these challenges. This methodology is applied to the EU H2020 project QualiChain use case, where five pilots have been selected to study a broad field of digital credential workflows and credential management. Keywords: Credentials; Education credentials; Digitisation; Challenges in digitisation.
The goal of this chapter is to shed light on the different types of big data applications needed in various industries, including healthcare, transportation, energy, banking and insurance, digital media and e-commerce, environment, safety and security, telecommunications, and manufacturing. In response to the problems of analyzing large-scale data, different tools, techniques, and technologies have been developed and are available for experimentation. In our analysis, we focused on literature (review articles) accessible via the Elsevier ScienceDirect service and the Springer Link service from recent years, mainly from the last two decades. For the selected industries, this chapter also discusses challenges that can be addressed and overcome using the semantic processing and knowledge reasoning approaches discussed in this book.
Nowadays, in the era of Big Data and the Internet of Things, large volumes of data in motion are produced in heterogeneous formats, frequencies, densities, and quantities. In general, data is continuously produced by diverse devices and most of it must be processed in real time. Indeed, this change of paradigm in the way in which data are produced forces us to rethink the way in which they should be processed, even in the presence of parallel approaches. To process continuous data, data-driven frameworks are demanded; they are required to dynamically adapt execution schedulers, reconfigure computational structures, and adjust the use of resources according to the characteristics of the input data stream. In previous work, we introduced the Dynamic Pipeline as one of these computational structures, and we experimentally showed its efficiency when used to solve the problem of counting triangles in a graph. In this work, our aim is to define the main components of the Dynamic Pipeline ...
In this work, we present two experimental metrics named dief@t and dief@k which are able to capture and quantify the behavior of any system that produces results incrementally. We demonstrate the effectiveness of dief@t and dief@k on a generic SPARQL query engine able to produce results incrementally. Attendees will observe how both metrics are able to capture complementary information about the continuous behavior of the studied query engine. Moreover, valuable insights about the engine configurations that allow for continuously producing more answers over time will be observed. The demo is available at http://km.aifb.kit.edu/services/dief-app/.
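The intuition behind dief@t (the area under the curve of cumulative answers produced over time, up to a time threshold t) can be sketched as follows; the answer trace is made up and this is not the authors' reference implementation.

```python
# Minimal sketch of dief@t: area under the curve of cumulative answers over time,
# evaluated up to a time threshold t. The timestamps (seconds) are hypothetical.
import numpy as np

answer_timestamps = [0.2, 0.5, 0.6, 1.1, 1.5, 2.4, 3.0]   # hypothetical answer trace

def dief_at_t(timestamps, t):
    ts = np.array([x for x in timestamps if x <= t])
    answers = np.arange(1, len(ts) + 1)        # cumulative number of answers
    ts = np.append(ts, t)                      # extend the curve to the threshold t
    answers = np.append(answers, len(answers))
    return np.trapz(answers, ts)               # area under the answer-trace curve

print("dief@t (t=2s):", dief_at_t(answer_timestamps, 2.0))
```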
The development of domain-specific ontologies requires joint efforts among different groups of stakeholders, such as knowledge engineers and domain experts. During the development process, ontology changes need to be tracked and propagated across developers. Version Control Systems (VCSs) collect metadata describing changes and allow for the synchronization of different versions of the same ontology. Commonly, VCSs follow optimistic approaches to enable the concurrent modification of ontology artifacts, as well as conflict detection and resolution. For conflict detection, VCSs usually apply techniques where files are compared line by line. However, ontology changes can be serialized in different ways during the development process. As a consequence, existing VCSs may detect a large number of false-positive conflicts, i.e., conflicts that do not result from ontology changes but from the fact that two ontology versions are serialized differently. We developed SerVCS to enhance VCSs to cope with different serializations of the same ontology, following the principle that prevention is better than cure. SerVCS relies on unique ontology serializations and minimizes the number of false-positive conflicts. It is implemented on top of Git, utilizing tools such as Rapper and RDF-toolkit for syntax validation and unique serialization, respectively. We conducted an empirical evaluation to determine the conflict detection accuracy of SerVCS whenever simultaneous changes to an ontology are performed using different ontology editors. Experimental results suggest that SerVCS allows VCSs to conduct more effective synchronization processes by preventing false-positive conflicts.
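A minimal sketch of the normalization idea behind SerVCS, approximated here with rdflib by sorting N-Triples lines before committing; SerVCS itself relies on Rapper and RDF-toolkit. File names are illustrative, rdflib 6+ is assumed, and blank-node labels are not canonicalized in this simplification.

```python
# Minimal sketch: normalize an ontology file to a single, order-independent
# serialization so that textual diffs reflect real ontology changes.
from rdflib import Graph

def normalize(path_in, path_out):
    g = Graph()
    g.parse(path_in)                               # format guessed from the file extension
    nt = g.serialize(format="nt")                  # one triple per line (str in rdflib >= 6)
    canonical = "\n".join(sorted(l for l in nt.splitlines() if l.strip()))
    with open(path_out, "w", encoding="utf-8") as f:
        f.write(canonical + "\n")

# e.g., invoked from a Git pre-commit hook:
# normalize("ontology.ttl", "ontology.nt")
```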
We received 6 submissions from all around the world covering a broad range of topics. We evaluated them regarding relevance, quality, and novelty, selecting 5 papers. We invited 2 extended abstracts from the recent Special Issue of the Journal of Web Semantics on Managing the Evolution and Preservation of the Data Web (January 2019). In addition, Laure Berti-Équille will give an inspiring talk, and we will close the workshop with a plenary discussion. Thus, MEPDaW includes the following talks:
The increasing number of RDF data sources that allow for querying Linked Data via Web services forms the basis for federated SPARQL query processing. Federated SPARQL query engines provide a unified view of a federation of RDF data sources, and rely on source descriptions for selecting the data sources over which unified queries will be executed. Albeit efficient, existing federated SPARQL query engines usually ignore the meaning of the data accessible from a data source, and describe sources only in terms of the vocabularies utilized in the data source. This lack of source description may lead to the erroneous selection of data sources for a query, thus affecting the performance of query processing over the federation. We tackle the problem of federated SPARQL query processing and devise MULDER, a query engine for federations of RDF data sources. MULDER describes data sources in terms of RDF molecule templates, i.e., abstract descriptions of entities belonging to the same RDF class. Moreover, MULDER utilizes RDF molecule templates for source selection, as well as for query decomposition and optimization. We empirically study the performance of MULDER on existing benchmarks, and compare MULDER with state-of-the-art federated SPARQL query engines. Experimental results suggest that RDF molecule templates empower MULDER federated query processing, and allow for the selection of RDF data sources that not only reduce execution time, but also increase answer completeness.
Nowadays, there is a rapid increase in the amount of sensor data produced by a wide variety of devices and sensors. Collections of sensor data can be semantically described using ontologies, e.g., the Semantic Sensor Network (SSN) ontology. Albeit semantically enriched, the volume of semantic sensor data is considerably larger than that of the raw sensor data. Moreover, some measurement values can be observed several times, and a large number of repeated facts can be generated. We devise a compact or factorized representation of semantic sensor data, where repeated values are represented only once. To scale up to large datasets, a tabular representation is utilized to store and manage factorized semantic sensor data using Big Data technologies. We empirically study the effectiveness of the proposed factorized representation of semantic sensor data, and the impact of factorizing semantic sensor data on query processing. Furthermore, we evaluate the effects of storing RDF factorized data on state-of-the-art RDF engines and in the proposed tabular-based representation. Results suggest that factorization techniques empower the storage and query processing of sensor data, and that execution time can be reduced by up to two orders of magnitude.
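A minimal sketch of the factorization idea, assuming made-up observation tuples: repeated measurement values are stored once in a value table and referenced by the observations instead of being repeated per observation.

```python
# Minimal sketch: factorize repeated observation values into a value table
# referenced by the observations. The observation tuples are hypothetical.
observations = [
    ("sensor1", "2024-01-01T00:00", 21.5),
    ("sensor1", "2024-01-01T00:05", 21.5),
    ("sensor2", "2024-01-01T00:00", 21.5),
    ("sensor2", "2024-01-01T00:05", 19.0),
]

value_ids = {}       # value -> id
value_table = []     # id -> value (each distinct value stored once)
fact_table = []      # (sensor, time, value_id)
for sensor, time, value in observations:
    if value not in value_ids:
        value_ids[value] = len(value_table)
        value_table.append(value)
    fact_table.append((sensor, time, value_ids[value]))

print("values stored once:", value_table)
print("observations:", fact_table)
```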
Big biomedical data has grown exponentially during the last decades, and a similar growth rate is expected in the coming years. Likewise, semantic web technologies have also advanced during the last years, and a great variety of tools, e.g., ontologies and query languages, have been developed by different scientific communities and practitioners. Although a rich variety of tools and big data collections are available, many challenges need to be addressed in order to discover insights from which decisions can be taken. For instance, different interoperability conflicts can exist among data collections, data may be incomplete, and entities may be dispersed across different datasets. These issues hinder knowledge exploration and discovery; thus, data integration is required to unveil meaningful outcomes. In this chapter, we address these challenges and devise a knowledge-driven framework that relies on semantic web technologies to enable knowledge exploration and discovery. The framework receives big data sources and integrates them into a knowledge graph. Semantic data integration methods are utilized for identifying equivalent entities, i.e., entities that correspond to the same real-world elements. Fusion policies enable the merging of equivalent entities inside the knowledge graph, as well as with entities in other knowledge graphs, e.g., DBpedia and Bio2RDF. Knowledge discovery allows for the exploration of knowledge graphs in order to uncover novel patterns and relations. As proof of concept, we report on the results of applying the knowledge-driven framework in the EU funded project iASiS (http://project-iasis.eu/) in order to transform big data into actionable knowledge, paving thus the way for personalised medicine.
Transforming natural language questions into formal queries is an integral task in Question Answering (QA) systems. QA systems built on knowledge graphs like DBpedia require a step, after natural language processing, for linking words, specifically named entities and relations, to their corresponding entities in a knowledge graph. To achieve this task, several approaches rely on background knowledge bases containing semantically-typed relations, e.g., PATTY, for an extra disambiguation step. Two major factors may affect the performance of relation linking approaches whenever background knowledge bases are accessed: a) the limited availability of such semantic knowledge sources, and b) the lack of a systematic approach on how to maximize the benefits of the collected knowledge. We tackle this problem and devise SIBKB, a semantic-based index able to capture knowledge encoded in background knowledge bases like PATTY. SIBKB represents a background knowledge base as a bipartite graph and provides a dynamic index over the relation patterns included in the knowledge base. Moreover, we develop a relation linking component able to exploit SIBKB features. The benefits of SIBKB are empirically studied on existing QA benchmarks, and the observed results suggest that SIBKB is able to enhance the accuracy of relation linking by up to three times.
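A minimal sketch of the kind of index SIBKB builds over relation patterns such as those in PATTY; the pattern entries, the token-level indexing scheme, and the lookup function are illustrative assumptions rather than the paper's exact data structure.

```python
# Minimal sketch: an inverted index from textual relation patterns to candidate
# knowledge-graph relations, used to link a phrase from a question to relations.
from collections import defaultdict

pattern_to_relations = defaultdict(set)
patty_like_entries = [                       # hypothetical pattern/relation pairs
    ("was born in", "dbo:birthPlace"),
    ("birthplace of", "dbo:birthPlace"),
    ("wrote", "dbo:author"),
    ("is the author of", "dbo:author"),
]
for pattern, relation in patty_like_entries:
    for token in pattern.split():
        pattern_to_relations[token].add(relation)    # token-level inverted index

def candidate_relations(phrase):
    tokens = phrase.lower().split()
    hits = [pattern_to_relations[t] for t in tokens if t in pattern_to_relations]
    return set.union(*hits) if hits else set()

print(candidate_relations("born in"))   # -> {'dbo:birthPlace'}
```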
Interoperability among actors, sensors, and heterogeneous systems is a crucial factor for realizing the Industry 4.0 vision, i.e., the creation of Smart Factories by enabling intelligent human-to-machine and machine-to-machine cooperation. In order to empower interoperability in Smart Factories, standards and reference architectures have been proposed. Standards allow for the description of components, systems, and processes, as well as interactions among them. Reference architectures classify, align, and integrate industrial standards according to their purposes and features. Industrial communities in Europe, the United States, and Asia have proposed various reference architectures. However, interoperability among analogous standards in these reference architectures is hampered due to different granularity representation of similar processes or production parts. In this paper, we survey the landscape of Industry 4.0 standards from a semantic perspective. To tackle the problem of interoperability between standards, we developed STO, an ontology for describing standards and their relations. Characteristics of I4.0 standards are described using STO, and these descriptions are exploited for classifying standards from different perspectives according to the reference architectures. Moreover, the semantics encoded in STO allows for the discovery of relations between I4.0 standards, and for mappings across reference architectures proposed by different industrial communities.
Public administrations are continuously publishing open data, increasing the amount of government open data over time. The published data includes budgets and spending as part of fiscal data; publishing these data is an important part of transparent and accountable governance. However, open fiscal data should also meet open data publication guidelines. When the requirements in data guidelines are not met, data analysis over the published datasets cannot be performed effectively. In this article, we present Open Fiscal Data Publication (OFDP), a framework to assess the quality of open fiscal datasets. We also present an extensive open fiscal data assessment and the common data quality issues found; additionally, open fiscal data publishing guidelines are presented. We studied and surveyed the main quality factors for open fiscal datasets. Moreover, the collected quality factors have been scored according to the results of a questionnaire to score quality factors within the OFDP assessment ...
The evolution of the Web of documents into a Web of services and data has resulted in an increased availability of data from almost any domain. For example, general domain knowledge bases such as DBpedia or Wikidata, or domain specific Web sources like the Oxford Art archive, allow for accessing knowledge about a wide variety of entities including people, organizations, or art paintings. However, these data sources publish data in different ways, and they may be equipped with different search capabilities, e.g., SPARQL endpoints or REST services, thus requiring data integration techniques that provide a unified view of the published data. We devise a semantic data integration approach named FuhSen that exploits keyword and structured search capabilities of Web data sources and generates on-demand knowledge graphs merging data collected from available Web sources. Resulting knowledge graphs model semantics or meaning of merged data in terms of entities that satisfy keyword queries, and relationships among those entities. FuhSen relies on both RDF to semantically describe the collected entities, and on semantic similarity measures to decide on relatedness among entities that should be merged. We empirically evaluate the results of FuhSen data integration techniques on data from the DBpedia knowledge base. The experimental results suggest that FuhSen data integration techniques accurately integrate similar entities semantically into knowledge graphs.
We consider the problem of source selection and query decomposition in federations of SPARQL endpoints, where query decompositions of a SPARQL query should reduce execution time and maximize answer completeness. This problem is in general intractable, and performance and answer completeness of SPARQL queries can be considerably affected when the number of SPARQL endpoints in a federation increases. We devise a formalization of this problem as the Vertex Coloring Problem and propose an approximate algorithm named Fed-DSATUR. We rely on existing results from graph theory to characterize the family of SPARQL queries for which Fed-DSATUR can produce optimal decompositions in polynomial time on the size of the query, i.e., on the number of SPARQL triple patterns in the query. Fed-DSATUR scales up much better to SPARQL queries with a large number of triple patterns, and may exhibit significant improvements in performance while answer completeness remains close to 100%. More importantly, we put our results in perspective, and provide evidence of SPARQL queries that are hard to decompose and constitute new challenges for data management.
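A minimal sketch of a DSATUR-style greedy coloring, the heuristic Fed-DSATUR adapts: vertices would stand for SPARQL triple patterns and a color would group patterns into one subquery. The conflict graph below is made up; how edges are derived from source descriptions is not shown.

```python
# Minimal sketch: DSATUR-style greedy coloring. Neighboring vertices (here,
# hypothetical triple patterns that cannot share a subquery) get different colors.
def dsatur_coloring(adjacency):
    colors = {}
    uncolored = set(adjacency)
    while uncolored:
        # pick the vertex with the most distinct neighbor colors (saturation),
        # breaking ties by degree
        v = max(uncolored,
                key=lambda u: (len({colors[n] for n in adjacency[u] if n in colors}),
                               len(adjacency[u])))
        neighbor_colors = {colors[n] for n in adjacency[v] if n in colors}
        c = 0
        while c in neighbor_colors:
            c += 1
        colors[v] = c
        uncolored.remove(v)
    return colors

conflict_graph = {               # hypothetical conflicts between triple patterns
    "tp1": {"tp2"}, "tp2": {"tp1", "tp3"}, "tp3": {"tp2"}, "tp4": set(),
}
print(dsatur_coloring(conflict_graph))
```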
Due to the semi-structured nature of RDF data, missing values affect the answer completeness of queries posed against RDF datasets. To overcome this limitation, we present HARE, a novel hybrid query processing engine that brings together machine and human computation to execute SPARQL queries. We propose a model that exploits the characteristics of RDF in order to estimate the completeness of portions of a data set. The completeness model, complemented by crowd knowledge, is used by the HARE query engine to decide on the fly which parts of a query should be executed against the data set and which via crowd computing. To evaluate HARE, we created and executed a collection of 50 SPARQL queries against the DBpedia data set. Experimental results clearly show that our solution accurately enhances answer completeness.
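A minimal sketch of the flavor of completeness estimate such a model relies on, assuming a tiny made-up dataset and an arbitrary threshold: predicates with low estimated completeness would be candidates for crowd computing rather than direct evaluation. HARE's actual model is more refined than this per-predicate ratio.

```python
# Minimal sketch: for a class, estimate per-predicate completeness as the fraction
# of instances that have a value, and route low-completeness predicates to the crowd.
from rdflib import Graph, RDF, Namespace

EX = Namespace("http://example.org/")
g = Graph()
g.parse(data="""
@prefix ex: <http://example.org/> .
ex:p1 a ex:Person ; ex:birthPlace ex:Berlin ; ex:name "P1" .
ex:p2 a ex:Person ; ex:name "P2" .
ex:p3 a ex:Person ; ex:name "P3" .
""", format="turtle")

instances = set(g.subjects(RDF.type, EX.Person))
for predicate in (EX.name, EX.birthPlace):
    having = {s for s in instances if (s, predicate, None) in g}
    completeness = len(having) / len(instances)
    route = "dataset" if completeness >= 0.8 else "crowd"   # assumed threshold
    print(predicate, round(completeness, 2), "->", route)
```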
Infrastructures that support reasoning on services and their composition into workflows are critical to evaluate and improve performance. In this paper we present ProtocolDB, a system that supports process design and analysis. The approach allows the description of a workflow with alternate implementations, thus allowing reasoning on workflow performance with respect to service selection during the rewriting phase. We illustrate
Ontologies play an important role in the Semantic Web. Currently, several Web ontology languages have been proposed and there has been a substantial effort towards the formalization of different knowledge domains. However, it is difficult to identify the ontologies for a specific domain and their annotated Web sources. XWebSOGO is a Web ontology language modeled after RDF and RDF Schema that allows users to specify a domain in terms of its relevant entities' properties, the services that can be implemented on these entities, their query capabilities, and the links that can connect them. In addition, XWebSOGO allows users to describe Web sources in terms of certain domain ontologies. Finally, XWebSOGO can be used to discover ontologies or Web sources that meet certain properties. XWebSOGOQL is a language used to query ontologies and annotated sources. An XWebSOGOQL query can be seen as a conjunctive query, where predicates are defined in terms of XWebSOGO basic concepts. In this p...
Based on the limitations raised by existing approaches in the context of the Semantic Web, we propose a formalism, Web Sources Global Ontology (WebSOGO), a data meta-model for the description of Web sources in terms of content, query-processing capabilities, service and navigation information. WebSOGO is formalized in the Object-Oriented paradigm using the Unified Modeling Language (UML). We also describe WebSOGO-S, a multi-layered system that will allow agents to discover, execute and compose Web sources through queries against a WebSOGO catalog of source ontologies. The UML specification of WebSOGO has been mapped to the Object-Relational (OR) model in Oracle; we define an Oracle cartridge with a list of types and methods that allow the specification of SQL queries against the WebSOGO catalog. By using the JDBC and ODBC drivers provided by Oracle, any agent or application will be able to contact WebSOGO-S.
Existing RDF engines have developed caching techniques able to store intermediate results and reuse them in further steps of the query execution process; thus, execution time is sped up by avoiding repeated computation of the same results. Although these ...
We address the challenges of supporting scientific workflow design for a bioinformatics pipeline (BIP) to detect alternative splicing of an organism's genes. Scientific workflows depend significantly on the resources (data sources and applications) selected for their ...
We propose a two-fold approach that is able to both consume and exploit semantics encoded in the Linked Open Data (LOD) cloud, and create news items that document events reported in micro-blogging posts, i.e., documentary tweets. A documentary tweet is similar to a newspaper headline and reports an incident or event. Knowledge extracted from documentary tweets is used to develop a story line, which is then augmented with RDF facts consumed from the LOD cloud. The resulting news content is represented in RDF using the rNews Ontology, facilitating news generation and retrieval. We study the effectiveness of our approach with respect to a gold standard of manually tagged tweets. Initial experimental results suggest that our techniques are able to generate content that reflects up to 76.38% of the manually tagged terms.
We present a deductive approach that supports resource discovery. The BiOnMap Web service is designed to support the selection of resources suitable to implement specific tasks. The BiOnMap service comprises a metadata catalog and a reasoning engine. The metadata catalog uses domain ontologies to annotate resources semantically and to express domain rules that capture path equivalences at the level
Efficient Techniques to Explore and Rank Paths in Life Science Data Sources (Zoé Lacroix, Louiqa Raschid, and Maria-Esther Vidal).
In this paper, we describe an approach aiming at enriching the Semantic Web with active information. We propose ACTION, an ACTIve ONtology formalism to express reactive behavior. In ACTION, events are categorized as concepts of an ontology and, in conjunction with classes, properties, and instances, are considered during the query answering and reasoning tasks. We hypothesize that ACTION provides a more expressive solution to the problem of representing and querying active knowledge than existing ECA-based approaches. However, this expressive power can negatively impact the complexity of the query processing and reasoning tasks, because the amount of derived data depends on the number and relationships of the events. The main source of complexity is that the number of derived facts is polynomial with respect to the size of the events, and the same evaluations may be fired by different events. To overcome this problem, we propose optimization strategies to identify Magic Set rewritings where the number of duplicate evaluations is minimized. We present a query rewriting technique called Intersection of Magic Rewritings (IMR), which is based on Magic Sets rewritings that annotate the minimal set of rules that need to be evaluated to process reactive behavior on an ontology. We have conducted an experimental study and have observed that the proposed strategies are able to speed up the tasks of reasoning and query evaluation by two orders of magnitude for small ontologies, and by four orders of magnitude for medium and large ontologies, with respect to the bottom-up strategy.
Fueled by novel technologies capable of producing massive amounts of data for a single experiment, scientists are faced with an explosion of information which must be rapidly analyzed and combined with other data to form hypotheses and create knowledge. Today, numerous biological questions can be answered without entering a wet lab. Scientific protocols designed to answer these questions can be run entirely on a computer. Biological resources are often complementary, focused on different objects and reflecting various experts' points of view. Exploiting the richness and diversity of these resources is crucial for scientists. However, with the increase of resources, scientists have to face the problem of selecting sources and tools when interpreting their data. In this paper, we analyze the way in which biologists express and implement scientific protocols, and we identify the requirements for a system which can guide scientists in constructing protocols to answer new biological ...
Nowadays, climate change is impacting life on Earth; ecological effects such as the warming of sea-surface temperatures, catastrophic events such as storms or mudslides, and the increase of infectious diseases are affecting life and development. Unfortunately, experts predict that global temperatures will increase even more during the coming years; thus, to decide how to assist possibly affected people, experts require tools that help them discover potentially risky regions based on their weather conditions. We address this problem and propose a tool able to support experts in the discovery of these risky areas. We present CAREY, a federated tool built on top of a weather database that implements a semi-supervised data mining approach to discover regions with similar weather observations which may characterize micro-climate zones. Additionally, Top-k Skyline techniques have been developed to rank micro-climate areas according to how close they are to a given weather condition of risk. We conducted an initial experimental study as a proof of concept, and the preliminary results suggest that CAREY may provide effective support for the visualization of potentially risky areas.
Top-k Skyline: A Unified Approach (Marlene Goncalves and María-Esther Vidal). In: R. Meersman et al. (Eds.): OTM Workshops 2005, LNCS 3762, pp. 790–799, Springer-Verlag Berlin Heidelberg, 2005.
Criteria that induce a Skyline naturally represent users' preference conditions, which are useful to discard irrelevant data in large datasets. However, in the presence of high-dimensional Skyline spaces, the size of the Skyline can still be very large, making it unfeasible for users to process this set of points. To identify the best points among the Skyline, the Top-k Skyline approach has been proposed. Top-k Skyline uses discriminatory criteria to induce a total order on the points that comprise the Skyline, and recognizes the best or top-k points based on these criteria. In this article the authors model queries as multi-dimensional points that represent bounds of VPT (Vertically Partitioned Table) property values, and datasets as sets of multi-dimensional points; the problem is to locate the k best tuples in the dataset whose distance to the query is minimized. A tuple is among the k best tuples whenever there is no other tuple that is better in all dimensions and closer to t...
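A minimal sketch of the two-step idea: compute the Skyline (here minimizing every dimension) and then rank its points by a discriminatory criterion, in this case distance to a query point. The points, the query, and k are made-up examples.

```python
# Minimal sketch: Skyline (Pareto-optimal points under minimization) followed by
# ranking the Skyline points by distance to a query point.
import math

def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

def top_k_skyline(points, query, k):
    sky = skyline(points)
    return sorted(sky, key=lambda p: math.dist(p, query))[:k]

points = [(1, 5), (2, 2), (4, 1), (3, 3), (5, 5)]
print(top_k_skyline(points, query=(0, 0), k=2))
```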
The Resource Description Framework (RDF) is a proposal of the WWW Consortium (W3C) to express metadata about resources on the Web. The RDF data model has been formalized using different graph-based representations, each one with its own limitations with respect to expressive power and support for the tasks of query answering and semantic reasoning. In this paper, we show the advantages of the directed hypergraph-based (DH) representation for RDF documents, and analyze the space complexity required to store an RDF document as well as the impact of this representation on the time complexity of the query answering task. In addition, we empirically compare the DH approach with the labeled directed graph and bipartite graph representations. Experimental results show that the time/space tradeoff of the DH-based representation outperforms the other two approaches.
This book constitutes the thoroughly refereed conference proceedings of the 5th International Workshop on Resource Discovery, RED 2010, co-located with the 9th Extended Semantic Web Conference, held in Heraklion, Greece, in May 2012. The 7 revised full papers presented were carefully reviewed and selected from 9 submissions. They deal with various issues related to resource discovery
Navigating through the Biological Maze (Zoé Lacroix and Kaushal Parekh, Arizona State University).
The World Wide Web (WWW) has become the preferred medium for the dissemination of information in virtually every domain of activity. Standards and formats for structured data interchange are being developed. However, access to data is still hindered by the challenge of locating data relevant to a particular problem. Further, after a set of relevant sources has been identified, one
The widespread explosion of Web-accessible resources has led to a new challenge of locating all relevant resources and identifying the best ones to answer a query. This challenge has to address the difficult task of ranking the resources based on user needs, as well as ...
Histopathology requires automation, quality control, and global collaborative tools. Usually the PIMS (Pathology Information Management System) automates samples, images, and reports, and progressively incorporates PI (pathology informatics), D-PATH (digital pathology), e-PATH (electronic pathology), PPH (patho-pharmacology), virtual autopsy (VA), and all types of translational research into the PIMS. Not being subject to a specific standard, quality control follows ISO-13485:2003 on services and medical devices, ISO 17025:2005 on technical aspects, and ISO-15198:2003 for automated and quantifiable procedures that will be affected by the new European Directive on medical devices. For the non-standardized pathology procedures, consumers' requirements are what define test and calibration procedures. The paper analyses the non-standardized procedures: VS (Virtual Slides), GRID networking, and Literature-Based Discovery as tools for knowledge discovery of relevant relationships on im...
Criteria that induce a Skyline naturally represent users' preference conditions, useful to discard irrelevant data in large datasets. However, in the presence of high-dimensional Skyline spaces, the size of the Skyline can still be very large, making it infeasible for users to process this set of points. To identify the best points among the Skyline, the Top-k Skyline approach has been
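A minimal sketch of the Top-k Skyline idea, assuming lower values are better in every dimension and ranking skyline points by how many points they dominate; the data and the ranking criterion are illustrative assumptions, not the chapter's actual algorithm.

```python
# Toy Top-k Skyline sketch: compute the skyline (Pareto-optimal points), then keep the k
# skyline points that dominate the most other points. Illustrative only.

def dominates(a, b):
    """a dominates b if a is no worse in every dimension and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    return [p for p in points if not any(dominates(q, p) for q in points)]

def top_k_skyline(points, k):
    sky = skyline(points)
    ranked = sorted(sky, key=lambda p: sum(dominates(p, q) for q in points), reverse=True)
    return ranked[:k]

if __name__ == "__main__":
    data = [(1, 9), (2, 4), (3, 3), (4, 2), (9, 1), (5, 5), (6, 7)]
    print(top_k_skyline(data, k=3))   # three skyline points with the highest dominance count
```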
The World Wide Web (WWW) is a great repository of data and it may reference thousands of data sources for almost any knowledge domain. Users frequently access sources to query information and may be interested only in the top k answers that meet their preferences. Although many declarative languages have been defined to express WWW queries, the problem of specifying
During the last years, RDF datasets from almost any knowledge domain have been published in the Linking Open Data (LOD) cloud. The Linked Open Data guidelines establish the conditions to be satisfied by resources in order to be included as part of the LOD cloud, as well as connected to previously published data. The process of publication and linkage of resources in the LOD cloud relies on: i) data cleaning and transformation into existing RDF formats, ii) storage of the data into RDF storage systems, and iii) data interlinking. Because of data source heterogeneity, generated RDF data may be ambiguous and links may be incomplete with respect to this data. Users of the Web of Data require linked data to meet high quality standards in order to develop applications that can produce trustworthy results, but data in the LOD cloud has not been curated; thus, tools are necessary to detect data quality problems. For example, researchers who study Life Sciences datasets to explain phenomena or identify anomalies demand that their findings correspond to current discoveries, and not to the effect of low data quality standards of completeness or redundancy. In this paper we propose LiQuate, a system that uses Bayesian networks to study the incompleteness of links, and ambiguities between labels and between links in the LOD cloud, and can be applied to any domain. Additionally, a probabilistic rule-based system is used to infer new links that associate equivalent resources, and helps resolve the ambiguities and incompleteness identified during the exploration of the Bayesian network. As a proof of concept, we applied LiQuate to existing Life Sciences linked datasets, and detected ambiguities in the data that may compromise the confidence of the results of applications such as link prediction or pattern discovery. We illustrate a variety of identified problems and propose a set of enriched intra- and inter-links that may improve the quality of data items and links of specific datasets of the LOD cloud.
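The toy sketch below illustrates the flavour of this kind of link-quality analysis with invented drug/disease data: conditional probabilities estimated from observed triples flag labels whose links look incomplete, and labels naming several resources become candidates for equivalence links. It is a simplified stand-in, not LiQuate itself.

```python
# Toy link-quality sketch: estimate P(link | label) from observations and flag possible
# incompleteness; labels mapped to several resource ids suggest candidate equivalence links.

from collections import defaultdict

# hypothetical (label, resource_id, linked_disease) observations; None marks a missing link
observations = [
    ("aspirin", "db:0001", "dbpedia:Fever"),
    ("aspirin", "db:0001", "dbpedia:Pain"),
    ("aspirin", "drug:Aspirin", "dbpedia:Fever"),   # same drug under a second resource id
    ("ibuprofen", "db:0002", "dbpedia:Pain"),
    ("ibuprofen", "db:0002", None),
]

by_label, link_counts, label_counts = defaultdict(set), defaultdict(int), defaultdict(int)
for label, resource, disease in observations:
    by_label[label].add(resource)
    label_counts[label] += 1
    if disease is not None:
        link_counts[label] += 1

# Ambiguity: one label naming several resources suggests candidate equivalence links.
for label, resources in by_label.items():
    if len(resources) > 1:
        print(f"ambiguous label {label!r}: candidate equivalence among {sorted(resources)}")

# Incompleteness: a low conditional probability of having a disease link flags the label.
for label in label_counts:
    p = link_counts[label] / label_counts[label]
    if p < 0.8:
        print(f"label {label!r}: P(disease link) = {p:.2f}, links may be incomplete")
```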
ABSTRACT This paper presents a real test bench for a Static Synchronous Series Compensator (SSSC). The test bench allows verifying the real control devices of a substation and the PLC closed-loop control of the SSSC. The proposed SSSC test bench is a scaled FACTS device of a 50 MVA real SSSC that will be placed in the Spanish transmission grid. In this paper the design of the test bench is presented, including the main magnitudes of both grid and SSSC, along with the substation SSSC control scenario and the SSSC closed-loop control strategy. Besides, the SSSC test bench dynamics and response have been validated using a model of the SSSC and its controls, developed in Matlab, and a model of the test bench, developed using the SimPowerSystems toolbox. The paper shows the results of the simulation and compares them with the magnitudes measured in the test bench.
ABSTRACT Flexible AC Transmission Systems (FACTS), developed during the last decades of the past century, have become one of the most remarkable solutions for the optimization of the electricity grid. Different types of FACTS have been proposed to solve similar transmission system operation problems. This paper reviews the FACTS devices used for solving these problems and the techniques used to optimize their location. The objective of the paper is to serve as a guide for selecting the right power system analysis and optimization techniques for a given transmission system problem, and the most used FACTS for this purpose. The main operation issues that have been considered are: voltage control, assets optimization, line overloads and grid congestion, voltage stability problems, angle stability problems, contingencies and economic issues. The study classifies the optimization methods in four groups, analyses their applications and extracts several conclusions from the analyzed bibliography and the current state of the FACTS market.
An abundance of life sciences data sources contain data about scientific entities such as genes and sequences. Scientists are interested in exploring relationships between scientific objects, e.g., between genes and bibliographic citations. A scientist may choose the OMIM source, which contains information related to human genetic diseases, as a starting point for her exploration, and wish to eventually retrieve all related citations from the PUBMED source. Starting with a keyword search on a certain disease, she can explore all possible relationships between genes in OMIM and citations in PUBMED. This corresponds to the following query: "Return all citations of PUBMED that are linked to an OMIM entry that is related to some disease or condition."
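A small, self-contained sketch of how such a navigational query could be evaluated: start from OMIM entries whose label matches a disease keyword and follow their links to PubMed, returning every reachable citation. The entries, identifiers, and links below are toy stand-ins, not real OMIM or PUBMED data.

```python
# Toy evaluation of: "Return all citations of PUBMED that are linked to an OMIM entry
# that is related to some disease or condition."

omim_entries = {          # toy OMIM entries: id -> disease label
    "OMIM:000001": "Huntington disease",
    "OMIM:000002": "Alzheimer disease",
}
omim_to_pubmed = {        # toy OMIM -> PUBMED links
    "OMIM:000001": ["PMID:11111", "PMID:22222"],
    "OMIM:000002": ["PMID:33333"],
}

def citations_for(keyword: str) -> set[str]:
    """Return all PUBMED citations linked to an OMIM entry matching the keyword."""
    matching = {omim_id for omim_id, label in omim_entries.items()
                if keyword.lower() in label.lower()}
    return {pmid for omim_id in matching for pmid in omim_to_pubmed.get(omim_id, [])}

print(citations_for("Huntington"))   # {'PMID:11111', 'PMID:22222'}
```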
This chapter presents an aggregated metric to estimate the quality of service compositions, and two algorithms to select the best compositions based on this metric. Both algorithms follow different strategies to prune the space of possibilities while minimizing the evaluation cost. The first algorithm, DP-BF, combines a best first strategy with a dynamic-programming technique. The second one, PT-SAM, adapts a Petri-net unfolding algorithm and tries to find a desired marking from an initial state. An experimental study was conducted in order to evaluate the behavior of DP-BF and PT-SAM compared to SAM and to the exhaustive solution. The experiments show that the quality of the compositions identified by the presented algorithms is close to the optimal solution produced by the exhaustive algorithm, while the optimization time is close to the time required by SAM to identify a solution.
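As a rough illustration of ranking compositions by an aggregated QoS score with a best-first strategy, the sketch below scores each partial composition by a weighted aggregate of response time and availability and expands the cheapest partial first. The services, QoS values, and weights are invented; this is a generic sketch, not DP-BF or PT-SAM.

```python
# Best-first search over service compositions ranked by an aggregated QoS score (toy data).
import heapq

# candidate services per abstract task: (name, response_time_s, availability)
candidates = {
    "geocode": [("G1", 0.20, 0.99), ("G2", 0.05, 0.90)],
    "weather": [("W1", 0.30, 0.97), ("W2", 0.10, 0.85)],
    "notify":  [("N1", 0.15, 0.99)],
}
tasks = list(candidates)

def score(services):
    """Aggregated QoS, lower is better: total time penalised by joint unavailability."""
    total_time = sum(t for _, t, _ in services)
    availability = 1.0
    for _, _, a in services:
        availability *= a
    return total_time + (1.0 - availability)

def best_composition():
    frontier = [(0.0, 0, ())]                       # (partial score, next task index, chosen)
    while frontier:
        s, i, chosen = heapq.heappop(frontier)
        if i == len(tasks):
            return chosen, s                        # first complete composition popped is best
        for svc in candidates[tasks[i]]:
            nxt = chosen + (svc,)
            heapq.heappush(frontier, (score(nxt), i + 1, nxt))

composition, s = best_composition()
print([name for name, _, _ in composition], round(s, 3))
```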
E. Blanco et al. In: Z. Lacroix and M.-E. Vidal (Eds.): RED 2010, LNCS 6799, pp. 23–42. Springer-Verlag Berlin Heidelberg, 2012.
ABSTRACT QoS parameters are used to describe services in terms of their behavior and can be used to rank services according to non-functional criteria. To provide an accurate characterization of the quality of a service, we propose a sampling-based technique. The ...
As Web Services proliferate, it becomes more difficult to find a service that can perform a given task, and a coordination of several services may be required. We present two algorithms to identify orderings of Web Service compositions. These algorithms follow different ...
E-Democracy and E-Participation are sub-areas of E-Government that utilize Information and Communication Technologies (ICT) to empower democracy and allow the participation of ordinary people in the definition of policies that affect their lives. In particular, general elections as well as the selection of presidential candidates are types of electoral events where ICT can facilitate constituency participation, providing a resource to influence the implementation of such events. The authors propose data mining and ranking techniques to analyze historical voting data and identify regions where electoral campaigns need to be intensified. Based on citizens' participation patterns in previous elections, they illustrate the quality of their approach on Venezuelan electoral data and compare it with the results produced by a baseline independent study. Experimental results suggest that the authors' techniques are able to predict the classification given for the baseline s...
We focus on ranking and data mining techniques to empower e-Democracy and allow the opinion of ordinary people to be considered in the design of electoral campaigns. We illustrate the quality of our approach on Venezuelan historical electoral data; ranking results are compared to ground truths produced by an independent study. Our evaluation suggests that the proposed techniques are able to identify up to 85% of the golden results by just analyzing 35% of the whole data.
BioNavigation: Selecting Optimum Paths Through Biological Resources to Evaluate Ontological Navigational Queries, by Zoé Lacroix, Kaushal Parekh, Maria-Esther Vidal, Marelis Cardenas, and Natalia Marquez.
Resource Discovery: Second International Workshop, RED 2009, Lyon, France, August 28, 2009, Revised Papers.
ABSTRACT The success of Web service technology has brought a lot of interest from a large number of research communities such as Software Engineering, Artificial Intelligence, Semantic Web, Semantic Grid, etc. Despite several efforts towards automating service discovery and composition, users still search for services via online repositories and compose them manually. In our opinion, this is due to the lack of semantic annotations (metadata) to describe service semantics and support an effective and efficient discovery of services. Semantic annotation is commonly recognized as one of the cornerstones of the Semantic Web and also an expensive, time-consuming and error-prone process. Thus, approaches that automatically derive annotations describing rapidly changing Web service repositories are urgently needed. In this paper, we propose a semantic framework for bioinformatics Web service annotation, matching and classification. We propose a semi-automatic approach for extracting lightweight semantic annotations from textual descriptions of Web services. We investigate the use of NLP techniques to derive service properties given a corpus of textual descriptions of bioinformatics services. We evaluate the performance of the annotation extraction method and the importance of lightweight annotations to match, reason over and classify bioinformatics Web services in order to bootstrap the service discovery process. Based on the extracted annotations, we propose two complementary approaches, inference and block clustering. The former relies on semantic annotations and explicit background knowledge to match a discovery query and a set of Web services. The latter aims to deduce implicit associations between services and annotations that are highly correlated, by applying an accelerated version of the Croki2 algorithm.
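The following toy sketch shows only the lightweight-annotation idea: keyword-like annotations extracted from a service's textual description are matched against a discovery query by simple overlap. The stop-word list, service names, and descriptions are invented; the actual NLP extraction, the inference layer, and the Croki2-based clustering in the paper are far richer.

```python
# Toy annotation extraction and discovery by annotation overlap (illustrative only).
import re

STOP = {"a", "an", "the", "of", "for", "and", "to", "this", "service", "returns"}

def annotations(description: str) -> set[str]:
    """Very light 'annotation' extraction: lowercase word tokens minus stop words."""
    tokens = re.findall(r"[a-z]+", description.lower())
    return {t for t in tokens if t not in STOP and len(t) > 3}

services = {
    "BlastP": "This service aligns a protein sequence against a protein database.",
    "ClustalW": "Multiple sequence alignment of DNA or protein sequences.",
    "Fetch": "Returns the citation record for a given PubMed identifier.",
}

def discover(query: str) -> list[str]:
    q = annotations(query)
    scored = [(len(q & annotations(desc)), name) for name, desc in services.items()]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]

print(discover("align protein sequences"))   # ['ClustalW', 'BlastP']
```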
Maria-Esther Vidal, Edna Ruckhaus, Maribel Acosta, Cosmin Basca, and Gabriela Montoya. Universidad Simón Bolívar, Venezuela ({mvidal, ruckhaus, macosta, gmontoya}@ldc.usb.ve); Institute AIFB, Karlsruhe Institute of Technology, Germany.
... We thank David Lipman and Alex Lash of NIH/NCBI for their expertise on NCBI data sources and for providing statistics, Barbara Eckman of IBM Life Sciences for discussions on life sciences exploration, and Damayanti Gupta and Hyma Murthy ...
... We assume a schema with the following types: Temperature(time, city, value), Rainfall(time, city, value), etc. ... Thus, we need to obtain source content and quality-of-data metadata for sources, to assist the user in selecting and ranking the sources best suited for some query. ...
The rapid growth of the Internet and Intranets has dramatically increased the number of WebSources. These sources are typically characterized by a limited query capability compared to a relational data source. They may also be incomplete, and there may be redundancy ...
Page 1. Querying "Quality of Data" Metadata George A. Mihaila Department of Computer Science University of Toronto 10 King's College Road Toronto, Ontario, Canada, M5S 1A4 phone: (416) 978-1683 fax: (416) 978-4765... more
Page 1. Querying "Quality of Data" Metadata George A. Mihaila Department of Computer Science University of Toronto 10 King's College Road Toronto, Ontario, Canada, M5S 1A4 phone: (416) 978-1683 fax: (416) 978-4765 georgem@cs.toronto.edu ...
This paper studies opportunities for optimization that are typical to WebSources and that will allow the mediator to choose a possibly optimal execution plan. These opportunities exist at the wrapper level, at the mediator level, and at the interface of the wrapper and mediator. ...
... by Laura Bright, Jean-Robert Gruser, Louiqa Raschid, and María Esther Vidal.
One of the most crucial tasks for today's knowledge workers is to get and retain a thorough overview of the latest state of the art. Especially in dynamic and evolving domains, the amount of relevant sources is constantly increasing, updating and overruling previous methods and approaches. For instance, the digital transformation of manufacturing systems, called Industry 4.0, currently faces an overwhelming amount of standardization efforts and reference initiatives, resulting in a sophisticated information environment. We propose a structured dataset in the form of a semantically annotated knowledge graph for Industry 4.0 related standards, norms and reference frameworks. The graph provides a Linked Data-conform collection of annotated, classified reference guidelines supporting newcomers and experts alike in understanding how to implement Industry 4.0 systems. We illustrate the suitability of the graph for various use cases, its already existing applications, present the maint...
Modern question answering (QA) systems need to flexibly integrate a number of components specialised to fulfil specific tasks in a QA pipeline. Key QA tasks include Named Entity Recognition and Disambiguation, Relation Extraction, and Query Building. Since a number of different software components exist that implement different strategies for each of these tasks, it is a major challenge to select and combine the most suitable components into a QA system, given the characteristics of a question. We study this optimisation problem and train classifiers, which take features of a question as input and have the goal of optimising the selection of QA components based on those features. We then devise a greedy algorithm to identify the pipelines that include the suitable components and can effectively answer the given question. We implement this model within Frankenstein, a QA framework able to select QA components and compose QA pipelines. We evaluate the effectiveness of the pipelines ge...
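A hedged sketch of the component-selection idea: assume some trained classifier has produced a per-component score for the incoming question, and a greedy step simply picks the highest-scoring component per task to form a pipeline. The task names, component names, and scoring rule are made up for illustration; this is not Frankenstein's actual code.

```python
# Greedy QA-pipeline composition from per-component scores (toy stand-in for trained classifiers).

def predict_scores(question: str) -> dict[str, dict[str, float]]:
    """Stand-in for per-component classifiers that score components given question features."""
    has_relation_words = any(w in question.lower() for w in ("married", "born", "capital"))
    return {
        "NED": {"EntityLinkerA": 0.8, "EntityLinkerB": 0.6},
        "RE":  {"RelationExtractorA": 0.9 if has_relation_words else 0.4,
                "RelationExtractorB": 0.5},
        "QB":  {"QueryBuilderA": 0.7},
    }

def greedy_pipeline(question: str) -> list[str]:
    scores = predict_scores(question)
    # pick the best-scoring component for each task, task by task
    return [max(components, key=components.get) for components in scores.values()]

print(greedy_pipeline("What is the capital of France?"))
```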
We tackle the problem of entity and relation linking and present FALCON, a rule-based tool able to accurately map entities and relations in short texts to resources in a knowledge graph. FALCON resorts to fundamental principles of English morphology (e.g., compounding and headword identification) and performs joint entity and relation linking against a short text. We demonstrate the benefits of the rule-based approach implemented in FALCON on short texts composed of various types of entities. Attendees will observe the behavior of FALCON on the known limitations of Entity Linking (EL) and Relation Linking (RL) tools. The demo is available at https://labs.tib.eu/falcon/.
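An illustrative sketch of rule-based linking in the spirit described above: generate n-grams from the short text, prefer longer matches (compounding), and look them up in small label catalogues for entities and relations. The catalogues and the input text are toy data, not FALCON's resources or rules.

```python
# Toy joint entity/relation linking by longest-first n-gram lookup against label catalogues.

entity_labels = {"barack obama": "dbr:Barack_Obama", "obama": "dbr:Barack_Obama",
                 "united states": "dbr:United_States"}
relation_labels = {"president of": "dbo:president", "born in": "dbo:birthPlace"}

def ngrams(tokens, max_n=3):
    for n in range(max_n, 0, -1):            # longest first: compounding preference
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])

def link(text):
    tokens = text.lower().replace("?", "").split()
    entities, relations, used = [], [], set()
    for gram in ngrams(tokens):
        if any(word in used for word in gram.split()):
            continue                          # each word contributes to at most one match
        if gram in entity_labels:
            entities.append(entity_labels[gram]); used.update(gram.split())
        elif gram in relation_labels:
            relations.append(relation_labels[gram]); used.update(gram.split())
    return entities, relations

print(link("Who is the president of the United States?"))
```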
In the Big Data era, where variety is the most dominant dimension, the RDF data model enables the creation and integration of actionable knowledge from heterogeneous data sources. However, the RDF data model allows for describing entities under various contexts; e.g., people can be described from their demographic context as well as from their professional contexts. Context-aware description poses challenges during entity matching of RDF datasets: the match might not be valid in every context. To perform contextually relevant entity matching, the specific context under which a data-driven task, e.g., data integration, is performed must be taken into account. However, existing approaches only consider inter-schema and property mappings of different data sources and prevent users from selecting contexts and conditions during a data integration process. We devise COMET, an entity matching technique that relies on both the knowledge stated in RDF vocabularies and a context-based simil...
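A minimal sketch of context-aware entity matching as motivated above: two descriptions of a person are compared only on the properties that belong to the chosen context, so the same pair can match in one context and not in another. The property names, data, and threshold are illustrative assumptions, not COMET's actual configuration.

```python
# Context-dependent similarity between two entity descriptions (toy data).

contexts = {
    "demographic":  {"birthDate", "birthPlace", "nationality"},
    "professional": {"employer", "jobTitle", "fieldOfWork"},
}

def context_similarity(a: dict, b: dict, context: str) -> float:
    """Fraction of the context's shared properties on which both descriptions agree."""
    props = contexts[context]
    shared = [p for p in props if p in a and p in b]
    if not shared:
        return 0.0
    agreeing = sum(a[p] == b[p] for p in shared)
    return agreeing / len(shared)

alice_hr   = {"birthDate": "1980-05-01", "nationality": "DE", "employer": "ACME"}
alice_pubs = {"birthDate": "1980-05-01", "nationality": "DE", "employer": "TIB"}

for ctx in contexts:
    sim = context_similarity(alice_hr, alice_pubs, ctx)
    print(ctx, sim, "match" if sim >= 0.8 else "no match")
```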
This paper analyzes the challenges and requirements of establishing energy data ecosystems (EDEs) as data-driven infrastructures that overcome the limitations of currently fragmented energy applications. It proposes a new data- and knowledge-driven approach for management and processing. This approach aims to extend the analytics services portfolio of various energy stakeholders and achieve two-way flows of electricity and information for optimized generation, distribution, and electricity consumption. The approach is based on semantic technologies to create knowledge-based systems that will aid machines in integrating and processing resources contextually and intelligently. Thus, a paradigm shift in the energy data value chain is proposed towards transparency and the responsible management of data and knowledge exchanged by the various stakeholders of an energy data space. The approach can contribute to innovative energy management and the adoption of new business models in future ...
Abstract. Data replication and deployment of local SPARQL endpoints improve scalability and availability of public SPARQL endpoints, making the consumption of Linked Data a reality. This solution requires synchronization and specific query processing strategies to take advantage of replication. However, existing replication-aware techniques in federations of SPARQL endpoints do not consider data dynamicity. We propose Fedra, an approach for querying federations of endpoints that benefits from replication. Participants in Fedra federations can copy fragments of data from several datasets, and describe them using provenance and views. These descriptions enable Fedra to reduce the number of selected endpoints while satisfying user divergence requirements. Experiments on real-world datasets suggest savings of up to three orders of magnitude.
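The sketch below illustrates the source-selection intuition only: each endpoint advertises the data fragments it replicates, and a greedy set-cover pass picks as few endpoints as possible that still cover every fragment a query needs. Endpoint and fragment names are toy values, and real Fedra selection also accounts for divergence, which this sketch ignores.

```python
# Greedy endpoint selection over replicated fragments (toy stand-in for replication-aware selection).

endpoints = {
    "replicaA": {"f1", "f2"},
    "replicaB": {"f3", "f4"},
    "replicaC": {"f2", "f3"},
    "public":   {"f1", "f2", "f3", "f4"},
}

def select_endpoints(needed: set[str]) -> list[str]:
    remaining, chosen = set(needed), []
    while remaining:
        # pick the endpoint covering the most still-needed fragments (ties: dictionary order)
        best = max(endpoints, key=lambda e: len(endpoints[e] & remaining))
        if not endpoints[best] & remaining:
            raise ValueError(f"fragments {remaining} are not available anywhere")
        chosen.append(best)
        remaining -= endpoints[best]
    return chosen

print(select_endpoints({"f2", "f3"}))   # a single local replica suffices instead of the public endpoint
```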
In the last years, a large number of RDF data sets have become available on the Web. However, due to the semi-structured nature of RDF data, missing values affect the answer completeness of queries that are posed against this data. To overcome this limitation, we propose RDF-Hunter, a novel hybrid query processing approach that brings together machine and human computation to execute queries against RDF data. We develop a novel quality model and query engine in order to enable RDF-Hunter to decide on the fly which parts of a query should be executed through conventional technology or crowd computing. To evaluate RDF-Hunter, we created a collection of 50 SPARQL queries against the DBpedia data set, executed them using our hybrid query engine, and analyzed the accuracy of the outcomes obtained from the crowd. The experiments clearly show that the overall approach is feasible and produces query results that reliably and significantly enhance the completeness of automatic query processing respon...
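A toy sketch of the hybrid routing decision: per query fragment, an estimated completeness value decides whether it is answered by the RDF engine or handed to the crowd. The predicates, completeness values, and threshold are invented for illustration; the actual quality model in RDF-Hunter is richer.

```python
# Route each triple pattern to the SPARQL engine or to the crowd based on estimated completeness.

CROWD_THRESHOLD = 0.7

# hypothetical completeness estimates for the predicates used in a query
completeness = {
    "dbo:birthPlace": 0.95,
    "dbo:spouse": 0.40,       # sparsely populated -> ask the crowd
    "dbo:occupation": 0.85,
}

def route(triple_patterns):
    plan = {}
    for subj, pred, obj in triple_patterns:
        engine = "sparql-engine" if completeness.get(pred, 0.0) >= CROWD_THRESHOLD else "crowd"
        plan[(subj, pred, obj)] = engine
    return plan

query = [("?p", "dbo:birthPlace", "?city"),
         ("?p", "dbo:spouse", "?s"),
         ("?p", "dbo:occupation", "?o")]
for tp, engine in route(query).items():
    print(tp, "->", engine)
```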
Abstract. The widespread explosion of Web accessible resources has led to new problems on the traditional tasks of query evaluation and efficient data access. With this in mind, we have developed the ontology-based OneQL system which provides optimization and query ...
In recent years, the amount of data has increased exponentially, and knowledge graphs have gained attention as data structures to integrate data and knowledge harvested from myriad data sources. However, data complexity issues like large volume, high duplication rates, and heterogeneity usually characterize these data sources, requiring data management tools able to address the negative impact of these issues on the knowledge graph creation process. In this paper, we propose the SDM-RDFizer, an interpreter of the RDF Mapping Language (RML), to transform raw data in various formats into an RDF knowledge graph. SDM-RDFizer implements novel algorithms to execute the logical operators between mappings in RML, allowing it to scale up to complex scenarios where data is not only large but also highly duplicated. We empirically evaluate the SDM-RDFizer performance against diverse testbeds with diverse configurations of data volume, duplicates, and heterogeneity. The observed result...
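To make the mapping idea concrete, the hand-rolled sketch below shows what an RML-style mapping does conceptually: rows from a raw source are turned into RDF triples via a subject template and predicate/column pairs, and a set keeps duplicated rows from producing duplicate triples. It is a toy stand-in, neither SDM-RDFizer nor the RML language itself.

```python
# Toy "RML-style" mapping: CSV rows -> deduplicated RDF triples.
import csv
import io

raw = io.StringIO("""id,name,disease
1,Alice,Asthma
2,Bob,Diabetes
1,Alice,Asthma
""")                                   # note the duplicated row

mapping = {
    "subject_template": "http://example.org/patient/{id}",
    "predicate_object": [("http://xmlns.com/foaf/0.1/name", "name"),
                         ("http://example.org/hasDisease", "disease")],
}

triples = set()                        # a set cheaply removes triples from duplicated rows
for row in csv.DictReader(raw):
    subject = mapping["subject_template"].format(**row)
    for predicate, column in mapping["predicate_object"]:
        triples.add((subject, predicate, row[column]))

for triple in sorted(triples):
    print(triple)
```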
The optimization of query execution plans is known to be crucial for reducing the query execution time. In particular, query optimization has been studied thoroughly for relational databases over the past decades. Recently, the Resource Description Framework (RDF) became popular for publishing data on the Web. As a consequence, federations composed of different data models like RDF and relational databases evolved. One type of these federations are Semantic Data Lakes where every data source is kept in its original data model and semantically annotated with ontologies or controlled vocabularies. However, state-of-the-art query engines for federated query processing over Semantic Data Lakes often rely on optimization techniques tailored for RDF. In this paper, we present query optimization techniques guided by heuristics that take the physical design of a Data Lake into account. The heuristics are implemented on top of Ontario, a SPARQL query engine for Semantic Data Lakes. Using sou...
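As a rough, hedged illustration of physical-design-aware planning over heterogeneous sources, the sketch below pushes each star-shaped subquery down to the source that covers its predicates and schedules indexed sources first. The source descriptions and both heuristics are invented for illustration; they are not Ontario's actual rules.

```python
# Toy heuristic planner for a federation of heterogeneous sources (illustrative only).

sources = {
    "patients_csv": {"model": "CSV", "has_index": False, "predicates": {"hasAge", "hasGender"}},
    "mutations_db": {"model": "RDB", "has_index": True,  "predicates": {"hasMutation", "inGene"}},
    "drugs_rdf":    {"model": "RDF", "has_index": True,  "predicates": {"treats", "interactsWith"}},
}

def plan(star_subqueries):
    """star_subqueries: list of predicate sets, one star-shaped group per subject variable."""
    placed = []
    for star in star_subqueries:
        # heuristic 1: push a whole star to a source that covers all of its predicates
        target = next((name for name, s in sources.items() if star <= s["predicates"]), None)
        placed.append((target, star))
    # heuristic 2: evaluate indexed sources first, so later joins probe smaller intermediate results
    placed.sort(key=lambda p: not sources.get(p[0], {}).get("has_index", False))
    return placed

query = [{"hasMutation", "inGene"}, {"hasAge"}, {"treats"}]
for target, star in plan(query):
    print(sorted(star), "->", target)
```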
Ontology-Based Data Access (OBDA) has traditionally focused on providing a unified view of heterogeneous datasets (e.g., relational database, CSV, JSON), either by materializing integrated data into RDF or by performing on-the-fly integration via SPARQL-to-SQL query translation. In the specific case of tabular datasets comprised of several CSV or Excel files, query translation approaches have been applied taking as input a lightweight schema with table and column names, and considering each source as a single table that can be loaded into a relational database system (RDB). This naive approach does not consider implicit constraints in this type of data, e.g., referential integrity among data sources, datatypes, or data integrity. We propose Morph-CSV, a framework that enforces constraints and can be used together with any SPARQL-to-SQL OBDA engine. Morph-CSV resorts to both a Constraints component and a set of operators that apply each type of constraint to the input with the aim of...
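The sketch below shows the kind of constraint enforcement described above in miniature: datatypes are cast, rows violating a referential-integrity assumption are dropped, and the cleaned tables are loaded into SQLite with an index so a later (SPARQL-translated) SQL query runs over consistent data. Table and column names are illustrative, not Morph-CSV's configuration.

```python
# Enforce datatype and referential-integrity constraints on CSV data before SQL querying.
import csv
import io
import sqlite3

sensors = io.StringIO("id,city\n1,Berlin\n2,Hannover\n")
readings = io.StringIO("sensor_id,value\n1,21.5\n2,19.0\n9,bad\n")  # id 9 unknown, 'bad' not a float

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensor (id INTEGER PRIMARY KEY, city TEXT)")
conn.execute("CREATE TABLE reading (sensor_id INTEGER REFERENCES sensor(id), value REAL)")

sensor_ids = set()
for row in csv.DictReader(sensors):
    sensor_ids.add(int(row["id"]))
    conn.execute("INSERT INTO sensor VALUES (?, ?)", (int(row["id"]), row["city"]))

for row in csv.DictReader(readings):
    try:
        sid, val = int(row["sensor_id"]), float(row["value"])   # datatype constraint
    except ValueError:
        continue
    if sid not in sensor_ids:                                   # referential integrity
        continue
    conn.execute("INSERT INTO reading VALUES (?, ?)", (sid, val))

conn.execute("CREATE INDEX idx_reading_sensor ON reading(sensor_id)")
for city, value in conn.execute(
        "SELECT s.city, r.value FROM reading r JOIN sensor s ON r.sensor_id = s.id"):
    print(city, value)
```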
A data ecosystem (DE) offers a keystone-player or alliance-driven infrastructure that enables the interaction of different stakeholders and the resolution of interoperability issues among shared data. However, despite years of research in data governance and management, trustability is still affected by the absence of transparent and traceable data-driven pipelines. In this work, we focus on requirements and challenges that DEs face when ensuring data transparency. Requirements are derived from the data and organizational management, as well as from broader legal and ethical considerations. We propose a novel knowledge-driven DE architecture, providing the pillars for satisfying the analyzed requirements. We illustrate the potential of our proposal in a real-world scenario. Last, we discuss and rate the potential of the proposed architecture in the fulfillment of these requirements.
Big biomedical data has grown exponentially during the last decades, as have the applications that demand the understanding and discovery of the knowledge encoded in available big data. In order to address these requirements while scaling up to the dominant dimensions of big biomedical data – volume, variety, and veracity – novel data integration techniques need to be defined. In this paper, we devise a knowledge-driven approach that relies on Semantic Web technologies, such as ontologies, mapping languages, and linked data, to generate a knowledge graph that integrates big data. Furthermore, query processing and knowledge discovery methods are implemented on top of the knowledge graph for enabling exploration and pattern uncovering. We report on the results of applying the proposed knowledge-driven approach in the EU funded project iASiS (http://project-iasis.eu/) in order to transform big data into actionable knowledge, thus paving the way for precision medicine and health policy mak...
Ontology-Based Data Access (OBDA) has traditionally focused on providing a unified view of heterogeneous datasets (e.g., relational databases, CSV and JSON files), either by materializing integrated data into RDF or by performing on-the-fly querying via SPARQL query translation. In the specific case of tabular datasets represented as several CSV or Excel files, query translation approaches have been applied by considering each source as a single table that can be loaded into a relational database management system (RDBMS). Nevertheless, constraints over these tables are not represented (e.g., referential integrity among sources, datatypes, or data integrity); thus, neither consistency among attributes nor indexes over tables are enforced. As a consequence, efficiency of the SPARQL-to-SQL translation process may be affected, as well as the completeness of the answers produced during the evaluation of the generated SQL query. Our work is focused on applying implicit constraints on the...
ABSTRACT This paper proposes a novel strategy for semantifying DICOM medical images (RDF-ization) automatically. We define an architecture that involves processes for extracting, anonymizing, and serializing the metadata comprised in DICOM medical images into RDF/XML. These processes allow for semantically enriching and sharing the metadata of DICOM medical files through the Linked Health Data cloud, thereby providing enhanced query capabilities with respect to the ones offered by current PACS environments, while exploiting all the advantages of the Linking Open Data (LOD) cloud and Semantic Web technologies.
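A hedged sketch of the extract, anonymize, and serialize steps: the header fields below stand in for values that a DICOM reader such as pydicom would extract from a real file, identifying fields are blanked, and rdflib (assumed available) serializes the result as RDF/XML. Field names, the namespace, and the identifier are illustrative, not the paper's actual architecture.

```python
# Anonymize DICOM-style header metadata and serialize it as RDF/XML with rdflib (toy data).
from rdflib import Graph, Literal, Namespace, URIRef

header = {                      # as if extracted from a DICOM file header
    "SOPInstanceUID": "1.2.3.4.5.6",     # made-up identifier
    "PatientName": "DOE^JANE",
    "PatientID": "12345",
    "Modality": "CT",
    "StudyDate": "20240131",
}

IDENTIFYING = {"PatientName", "PatientID"}
anonymized = {k: ("ANONYMIZED" if k in IDENTIFYING else v) for k, v in header.items()}

EX = Namespace("http://example.org/dicom#")
g = Graph()
g.bind("ex", EX)
image = URIRef(f"http://example.org/image/{anonymized['SOPInstanceUID']}")
for field, value in anonymized.items():
    g.add((image, EX[field], Literal(value)))

print(g.serialize(format="xml"))   # RDF/XML ready for publication as linked data
```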
Image-driven medical applications can aid medical experts to visualize tissues and organs, and thus facilitate the task of identifying anomalies and tumors. However, to ensure reliable results, regions of the image that enclose the organs or tissues of interest have to be precisely visualized. Volume rendering is a technique for visualizing volumetric data by computing a 2D projection of the image. Traditionally, volume rendering generates a semi-transparent image, enhancing the description of the area of interest to be visualized. Particularly during the visualization of medical images, identification of areas of interest depends on existing characterizations of the tissues, their corresponding intensities, and the medical image acquisition modality, e.g., Computed Tomography (CT) or Magnetic Resonance Imaging (MRI). However, a precise classification of a tissue requires specialized segmentation processes to distinguish neighboring tissues that share overlapping intensities. Semantic annotations from ontologies such as RadLex and the Foundational Model of Anatomy (FMA) conceptually allow the annotation of areas that enclose particular tissues. This may impact the segmentation process or the volume rendering quality. We survey state-of-the-art approaches that support medical image discovery and visualization based on semantic annotations, and show the benefits of semantically encoding medical images for volume rendering. As a proof of concept, we present ANISE (an ANatomIc SEmantic annotator), a framework for the semantic annotation of medical images. Finally, we describe the improvements achieved by ANISE during the rendering of a benchmark of medical images, enhancing the segmented parts of the organs and tissues that comprise the studied images.
We address the problem of answering Web ontology queries efficiently. An ontology is formalized as a deductive ontology base (DOB), a deductive database that comprises the ontology's inference axioms and facts. A cost-based query optimization technique for DOBs is presented. A hybrid cost model is proposed to estimate the cost and cardinality of basic and inferred facts. Cardinality and cost of inferred facts are estimated using an adaptive sampling technique, while techniques of traditional relational cost models are used for estimating the cost of basic facts and conjunctive ontology queries. Finally, we implement a dynamic-programming optimization algorithm to identify query evaluation plans that minimize the number of intermediate inferred facts. We modeled a subset of the Web Ontology Language (OWL Lite) as a DOB and performed an experimental study to analyze the predictive capacity of our cost model and the benefits of the query optimization technique. Our study has been conducted ...
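An illustrative sketch of adaptive-sampling cardinality estimation for inferred facts: random candidate derivations are drawn and checked against a toy rule, and sampling stops once the estimate is stable enough. The rule, data, and stopping condition are simple stand-ins; the paper's cost model is more elaborate.

```python
# Estimate the cardinality of an inferred predicate by adaptive random sampling (toy example).
import math
import random

random.seed(7)

# toy extensional data: edges of a subclass hierarchy; the inferred relation is a two-step path
edges = {(i, i + 1) for i in range(200)} | {(i, i + 50) for i in range(0, 150, 3)}
nodes = range(201)

def derives(x, z):
    """One derivation of the rule inferred(x, z) <- edge(x, y), edge(y, z)."""
    return any((x, y) in edges and (y, z) in edges for y in nodes)

def estimate_cardinality(rel_err=0.1, batch=200, max_samples=20_000):
    population = len(nodes) ** 2              # all candidate (x, z) pairs
    hits, n = 0, 0
    while n < max_samples:
        for _ in range(batch):
            x, z = random.choice(nodes), random.choice(nodes)
            hits += derives(x, z)
        n += batch
        p = hits / n
        if p > 0:
            stderr = math.sqrt(p * (1 - p) / n)
            if stderr / p <= rel_err:         # adaptive stop: relative error is small enough
                break
    return round(p * population), n

cardinality, samples = estimate_cardinality()
print(f"estimated inferred facts: {cardinality} (from {samples} samples)")
```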
... Queries. María Esther Vidal, Universidad Simón Bolívar, Caracas, Venezuela; Louiqa Raschid and Vladimir Zadorozhny, University of Maryland, College Park, MD 20742.