The increasing need of a variety of applications to store and process XML data has led to the development of systems and techniques for XML storage and querying. XML updating has not received a corresponding amount of attention. We discuss XPURS, a system for processing XPath queries and updates on XML Schema-compliant XML data. XPURS updates respect XML ordering and XML Schema typing constraints, especially type inheritance and polymorphism. XPURS employs an innovative shredding scheme for ...
We present sql4ml, a system for expressing supervised machine learning (ML) models in SQL and automatically training them in TensorFlow. The primary motivation for this work stems from the observation that in many data science tasks there is a back-and-forth between a relational database that stores the data and a machine learning framework. Data preprocessing and feature engineering typically happen in a database, whereas learning is usually executed in separate ML libraries. This fragmented workflow requires users to juggle different programming paradigms and software systems. With sql4ml the user can express both feature engineering and ML algorithms in SQL, while the system translates this code to an appropriate representation for training inside a machine learning framework. We describe our translation method, present experimental results from applying it to three well-known ML algorithms and discuss the usability benefits from concentrating the entire workflow...
Abstract—Adaptive join algorithms have recently attracted a lot of attention in emerging applications where data is provided by autonomous data sources through heterogeneous network environments. Their main advantage over traditional join techniques is that they can start producing join results as soon as the first input tuples are available, thus improving pipelining by smoothing join result production and by masking source or network delays. In this paper we first propose Double Index NEsted-loops Reactive join (DINER), a new adaptive two-way join algorithm for result rate maximization. DINER combines two key elements: an intuitive flushing policy that aims to increase the productivity of in-memory tuples in producing results during the online phase of the join, and a novel re-entrant join technique that allows the algorithm to rapidly switch between processing in-memory and disk-resident tuples, thus better exploiting temporary delays when new data is not available. We then exten...
QURSED enables the development of web-based query forms and reports (QFRs) that query and report semistructured XML data, i.e., data that are characterized by nesting, irregularities and structural variance. The query aspects of a QFR are captured by its query set specification, which formally encodes multiple parameterized, possibly interdependent condition fragments and can describe large numbers of queries. The run-time component of QURSED produces XQuery-compliant queries by synthesizing fragments from the query set specification that have been activated during the interaction of the end-user with the QFR. The design-time component of QURSED, called QURSED Editor, semiautomates the development of the query set specification and its association with the visual components of the QFR and guides the development of meaningful dependencies between condition fragments by translating the visual actions into appropriate query set specifications. We describe QURSED and illustrate how it ...
DEFINITION Database applications provide an XML view of their data so that the data is available to other applications, especially web applications. Database systems provide support for the client applications to use (query and/or manipulate) the data. The operations specified by the client applications are composed with the view definitions by the database system, thus performing these actions. The internal data model used by the database application, as well as how the operations are performed are transparent to the client applications; they see only an XML view of the entire system. XML views help the database systems to maintain their legacy data, as well as utilize the optimization features present in legacy systems (especially SQL engines), and at the same time make the data accessible to a wide range of web applications.
This paper briefly reviews the DBS and presents a new method that uses XML to address the difficulty of expressing tree-shaped data structures in a database. The method exploits the advantages of both XML and relational databases. Finally, a fragment of an actual example is given.
We study the problem of querying XML data sources that accept only a limited set of queries, such as sources accessible by Web services which can implement very large (potentially infinite) families of XPath queries. To compactly specify such families of queries we adopt the Query Set Specifications, a formalism close to context-free grammars. We say that query Q is expressible by the specification P if it is equivalent to some expansion of P. Q is supported by P if it has an equivalent rewriting using some finite set of P's expansions. We study the complexity of expressibility and support and identify large classes of XPath queries for which there are efficient (PTIME) algorithms. Our study considers both the case in which the XML nodes in the results of the queries lose their original identity and the one in which the source exposes persistent node ids.
Many autonomous and heterogeneous information sources are becoming increasingly available to the user through the Internet -- especially through the World Wide Web. The integration of Internet sources poses several challenges which have not been sufficiently addressed. In particular, knowledge of redundancy can be used to reduce the number of source accesses that have to be performed to retrieve the answer to the user query. Moreover, probabilistic information about source overlap can help derive efficient query plans for delivering partial answers to queries.
Shared online databases, such as Google Fusion Tables or Quickbase, allow non-programmer community members to collaboratively maintain and browse data. While community members may believe in conflicting facts (due to conflicting sources, measurements or opinions), current online databases do not yet offer support for the management of data conflicts. Ricolla is a novel online database that treats data conflicts as first-class citizens. Unlike prior work in uncertain databases, which was made to provide a database back-end to application logic, Ricolla is tuned to the requirements of the online database paradigm, allowing intuitive visualization of conflicts and collaborative data editing/conflict resolution. The proposed end-to-end system makes the following contributions: a) an online database paradigm that captures conflicts, allowing data query and update, while enabling personalized, “as-you-go” conflict resolution, b) a data model and corresponding generic user interface for ex...
Semistructured data is not strictly typed like relational or object-oriented data and may be irregular or incomplete. It often arises in practice, e.g., when heterogeneous data sources are integrated or data is taken from the World Wide Web. Views over semistructured data can be used to filter the data and to restructure (or provide structure to) it. To achieve fast query response time, these views are often materialized. This paper proposes an incremental maintenance algorithm for materialized views over semistructured data. We use the graph-based data model OEM and the query language Lorel, developed at Stanford, as the framework for our work. Our algorithm produces a set of queries that compute the updates to the view based upon an update of the source. We develop an analytic cost model and compare the cost of executing our incremental maintenance algorithm to that of recomputing the view. We show that for nearly all types of database updates, it is more efficient to apply our in...
Inconsistency-tolerant semantics, like the IAR semantics, have been proposed as means to compute meaningful query answers over inconsistent Description Logic (DL) ontologies. So far query answering under the IAR semantics (IAR-answering) is known to be tractable only for arguably weak DLs like DL-Lite and the quite restricted EL⊥nr fragment of EL⊥. Towards providing a systematic study of IAR-answering, in the current paper we first present a general framework/algorithm for IAR-answering which applies to arbitrary DLs but need not terminate. Nevertheless, this framework allows us to develop a sufficient condition for tractability of IAR-answering and hence of termination of our algorithm. We then show that this condition is always satisfied by the arguably expressive DL DL-Litebool, providing the first positive result for IAR-answering over a non-Horn-DL. In addition, recent results show that this condition usually holds for real-world ontologies and techniques and algorithms for check...
This vision paper presents new challenges and opportunities in the area of distributed data analytics, at the core of which are data mining and machine learning. At first, we provide an overview of the current state of the art in the area and then analyse two aspects of data analytics systems, semantics and optimization. We argue that these aspects will emerge as important issues for the data management community in the next years and propose promising research directions for solving them.
View-based rewriting of XML queries. Alin Deutsch (University of California in San Diego, abdeutsch@cs.ucsd.edu), Ioana Manolescu (INRIA Saclay–Île-de-France and LRI, U. Paris Sud-XI, ioana.manolescu@inria.fr), Vasilis Vassalos ...
We develop a new model of the interaction of rational peers in a Peer-to-Peer (P2P) network that has at its heart altruism, an intrinsic parameter reflecting peers' inherent willingness to contribute. Two different approaches for modelling altruistic behavior and its attendant ...
We address the problem of query rewriting for TSL, a language for querying semistructured data. We develop and present an algorithm that, given a semistructured query $q$ and a set of semistructured views ${\cal V}$, finds rewriting queries, i.e., queries that access the views and produce the same result as $q$. Our algorithm is based on appropriately generalizing containment mappings, the chase, and query composition -- techniques that were developed for structured, relational data. We also develop an algorithm for equivalence checking of TSL queries. We show that the algorithm is sound and complete for TSL, i.e., it always finds every non-trivial TSL rewriting query of $q$, and we discuss its complexity. We extend the rewriting algorithm to use some forms of structural constraints (such as DTDs) and find more opportunities for query rewriting
Wireless local area networks have gained significant momentum and, together with the widespread use of advanced mobile devices, create more opportunities for collaborative work. We present the architecture, implementation and evaluation of a secure middleware scheme for ...
XQForms is the first generator of Web-based query forms and reports for XML data. XQForms takes as input (i) an XML Schema that models the source data to be queried and presented, (ii) a declarative specification, called XQForm annotation, of the query forms and reports ...
Our main motive for this paper has been the observation that, while artificial intelligence is a prominent branch of computer science that has even found its way into popular culture, it appears that its “fruits” are an inconsequential part of the computer industry. The paper tries to assess the extent to which this observation is accurate, by determining and analyzing some important factors that affect the fortunes of AI technologies in the marketplace. It identifies the stages in the admission process of Information Technology and the particular ...
Abstract—Active data warehouses have emerged as a new business intelligence paradigm where data in the integrated repository is refreshed in near real-time. This shift of practices achieves higher consistency between the stored information and the latest updates, which in turn ...
Abstract We describe blogTrust, an innovative modular and extensible prototype application for monitoring changes in the interests of blogosphere participants. We also propose a new approach for the analysis of weblog contents that is supported by blogTrust and can yield new insights on the analysis of the blogosphere by monitoring the convergence or dispersion of blogosphere interests.
