The increasing need of many applications to store and process XML data has led to the development of systems and techniques for XML storage and querying. XML updating, however, has not received a corresponding amount of attention. We discuss XPURS, a system for processing XPath queries and updates on XML Schema-compliant XML data. XPURS updates respect XML ordering and XML Schema typing constraints, especially type inheritance and polymorphism. XPURS employs an innovative shredding scheme for ...
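The details of the XPURS shredding scheme are elided above. As a generic illustration only (not the XPURS scheme), one common way to shred XML into a relational table while preserving document order is to store each element with its parent id and its position among its siblings. The table name `node` and its columns are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

def shred(xml_text):
    """Shred an XML document into one edge table (hypothetical schema).

    Each element becomes a row (id, parent_id, ordinal, tag, text). The
    ordinal is the element's position among its siblings, so document
    order can be recovered with ORDER BY.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER,"
        " ordinal INTEGER, tag TEXT, text TEXT)"
    )
    next_id = 0

    def visit(elem, parent_id, ordinal):
        nonlocal next_id
        node_id = next_id
        next_id += 1
        text = (elem.text or "").strip() or None
        conn.execute(
            "INSERT INTO node VALUES (?, ?, ?, ?, ?)",
            (node_id, parent_id, ordinal, elem.tag, text),
        )
        for i, child in enumerate(elem):
            visit(child, node_id, i)

    visit(ET.fromstring(xml_text), None, 0)  # the root gets id 0
    return conn

def children(conn, parent_id):
    # Children of a node in document order, rebuilt from the ordinal column.
    rows = conn.execute(
        "SELECT tag, text FROM node WHERE parent_id = ? ORDER BY ordinal",
        (parent_id,),
    )
    return rows.fetchall()
```

In this naive scheme, inserting a new sibling in the middle of a list requires renumbering the ordinals of later siblings. This is one reason why order-preserving updates over shredded XML are not trivial.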
We present sql4ml, a system for expressing supervised machine learning (ML) models in SQL and automatically training them in TensorFlow. The primary motivation for this work stems from the observation that many data science tasks involve a back-and-forth between a relational database that stores the data and a machine learning framework. Data preprocessing and feature engineering typically happen in a database, whereas learning is usually executed in separate ML libraries. This fragmented workflow requires users to juggle different programming paradigms and software systems. With sql4ml, the user can express both feature engineering and ML algorithms in SQL, while the system translates this code into an appropriate representation for training inside a machine learning framework. We describe our translation method, present experimental results from applying it to three well-known ML algorithms, and discuss the usability benefits of concentrating the entire workflow...
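The actual sql4ml syntax and its TensorFlow translation are not shown here. The following minimal sketch shows only what "expressing an ML model in SQL" can look like. It defines a linear regression model and its mean squared error objective as plain SQL views, using Python's `sqlite3`. The tables `features`, `targets` and `weights` are hypothetical.

```python
import sqlite3

SCHEMA = """
-- Hypothetical schema: one row per (example, feature) pair.
CREATE TABLE features (ex INTEGER, feat TEXT, val REAL);
CREATE TABLE targets  (ex INTEGER, y REAL);
CREATE TABLE weights  (feat TEXT, w REAL);

-- Linear model: prediction = sum over features of value * weight.
CREATE VIEW predictions AS
  SELECT f.ex, SUM(f.val * w.w) AS yhat
  FROM features f JOIN weights w ON f.feat = w.feat
  GROUP BY f.ex;

-- Training objective: mean squared error over all examples.
CREATE VIEW mse AS
  SELECT AVG((p.yhat - t.y) * (p.yhat - t.y)) AS loss
  FROM predictions p JOIN targets t ON p.ex = t.ex;
"""

def build(rows, targets, weights):
    """Create an in-memory database holding data, labels and weights."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO features VALUES (?, ?, ?)", rows)
    conn.executemany("INSERT INTO targets VALUES (?, ?)", targets)
    conn.executemany("INSERT INTO weights VALUES (?, ?)", weights)
    return conn

def loss(conn):
    # Evaluate the objective for the current weights.
    return conn.execute("SELECT loss FROM mse").fetchone()[0]
```

Consider two examples, with `x = 1, y = 3` and `x = 2, y = 5`, plus a bias feature, and weights of 1.0 for both features. The predictions are 2 and 3, so the loss is (1 + 4) / 2 = 2.5. A system in the spirit of sql4ml would hand definitions like these to an ML framework, which minimizes the loss by adjusting `weights`.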
Adaptive join algorithms have recently attracted considerable attention in emerging applications where data is provided by autonomous data sources through heterogeneous network environments. Their main advantage over traditional join techniques is that they can start producing join results as soon as the first input tuples are available. This improves pipelining by smoothing join result production and by masking source or network delays. In this paper we first propose Double Index NEsted-loops Reactive join (DINER), a new adaptive two-way join algorithm for result rate maximization. DINER combines two key elements. The first is an intuitive flushing policy that aims to increase the productivity of in-memory tuples in producing results during the online phase of the join. The second is a novel re-entrant join technique that allows the algorithm to rapidly switch between processing in-memory and disk-resident tuples, thus better exploiting temporary delays when new data is not available. We then exten...
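DINER's flushing policy and re-entrant technique are not detailed here. To illustrate the property the abstract relies on, producing results as soon as matching tuples arrive, the sketch below implements the classic in-memory symmetric hash join. This is a standard non-blocking join and not DINER itself. It never flushes to disk.

```python
from collections import defaultdict

def symmetric_hash_join(arrivals):
    """Classic in-memory symmetric hash join (not DINER).

    `arrivals` yields (side, key, payload) tuples in arrival order, where
    side is "L" or "R". Each tuple is inserted into its own side's hash
    table and immediately probed against the other side's table. A result
    is therefore produced as soon as both matching tuples have arrived.
    """
    tables = {"L": defaultdict(list), "R": defaultdict(list)}
    for side, key, payload in arrivals:
        other = "R" if side == "L" else "L"
        tables[side][key].append(payload)
        for match in tables[other][key]:
            # Always emit pairs in (left, right) order.
            yield (payload, match) if side == "L" else (match, payload)
```

Because both hash tables grow without bound, a practical adaptive join must move tuples to disk when memory fills. It must also later join disk-resident tuples without losing or duplicating results. This is the problem that DINER's flushing policy and re-entrant join technique address.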
QURSED enables the development of web-based query forms and reports (QFRs) that query and report semistructured XML data, i.e., data characterized by nesting, irregularities and structural variance. The query aspects of a QFR are captured by its query set specification. This specification formally encodes multiple parameterized, possibly interdependent condition fragments and can describe large numbers of queries. The run-time component of QURSED produces XQuery-compliant queries by synthesizing those fragments of the query set specification that have been activated during the end user's interaction with the QFR. The design-time component of QURSED, called QURSED Editor, semiautomates the development of the query set specification and its association with the visual components of the QFR. It also guides the development of meaningful dependencies between condition fragments by translating visual actions into appropriate query set specifications. We describe QURSED and illustrate how it ...
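QURSED's query set specification formalism is not reproduced here. The following is a hypothetical, much-simplified sketch of the run-time idea. Each condition fragment is a parameterized XQuery predicate, some fragments depend on others, and only the fragments the user activated are combined into the generated query. The `Fragment` class, the `//product` path and the query shape are all illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Fragment:
    """A hypothetical parameterized condition fragment."""
    name: str
    template: str  # XQuery predicate with {param} placeholders
    requires: list = field(default_factory=list)  # fragments this one depends on

def synthesize(fragments, values):
    """Build an XQuery string from the fragments the user activated.

    `values` maps each activated fragment's name to its parameter dict.
    A fragment whose required fragments are not also activated is rejected.
    """
    by_name = {f.name: f for f in fragments}
    conds = []
    for name, params in values.items():
        frag = by_name[name]
        missing = [r for r in frag.requires if r not in values]
        if missing:
            raise ValueError(f"{name} requires {missing}")
        conds.append(frag.template.format(**params))
    where = f" where {' and '.join(conds)}" if conds else ""
    return f"for $p in //product{where} return $p"
```

Suppose a user fills in only the brand and price fields. Then only those two fragments are instantiated, and an untouched form yields the unconstrained query. The dependency check mimics, in a crude way, constraints such as "a CPU-speed condition is meaningful only once a category is chosen".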
DEFINITION Database applications provide an XML view of their data so that the data is available to other applications, especially web applications. Database systems support client applications in using (querying and/or manipulating) the data. The database system composes the operations specified by the client applications with the view definitions and then performs them. The internal data model used by the database application, as well as the way the operations are performed, is transparent to the client applications; they see only an XML view of the entire system. XML views help database systems maintain their legacy data and exploit the optimization features of legacy systems (especially SQL engines). At the same time, they make the data accessible to a wide range of web applications.
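The following minimal sketch illustrates what composing a client operation with a view definition buys. It assumes a hypothetical legacy table `book` published as an XML view. The client asks for books cheaper than some price. The naive approach materializes the whole XML view and filters it. The composed approach merges the predicate into the view's SQL, so that the SQL engine evaluates it. The composition here is written by hand for this single query shape. Real systems do this rewriting automatically for a class of queries.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (title TEXT, price REAL)")
conn.executemany("INSERT INTO book VALUES (?, ?)",
                 [("XML Basics", 15.0), ("SQL Deep Dive", 40.0)])

# View definition over the (hypothetical) legacy relational table.
VIEW_SQL = "SELECT title, price FROM book"

def to_xml(rows):
    # Publish rows as <books><book><title/><price/></book>...</books>.
    root = ET.Element("books")
    for title, price in rows:
        b = ET.SubElement(root, "book")
        ET.SubElement(b, "title").text = title
        ET.SubElement(b, "price").text = str(price)
    return root

def cheap_books_naive(max_price):
    # Materialize the entire XML view, then filter on the XML side.
    root = to_xml(conn.execute(VIEW_SQL))
    return [b.findtext("title") for b in root.findall("book")
            if float(b.findtext("price")) < max_price]

def cheap_books_composed(max_price):
    # Compose the client predicate (book[price < X]) with the view
    # definition, so the SQL engine applies the filter.
    rows = conn.execute(VIEW_SQL + " WHERE price < ?", (max_price,))
    return [b.findtext("title") for b in to_xml(rows).findall("book")]
```

Both functions return the same answer, but the composed version lets the relational engine use its own optimizations, such as indexes. This illustrates the benefit the definition attributes to XML views over legacy SQL systems.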
This paper briefly reviews the DBS and presents a new method that uses XML to address the difficulty relational databases have in expressing tree-shaped data structures. The method takes full advantage of both XML and relational databases. Finally, an excerpt from a practical example is given.
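The paper's own method is not described in detail. The following generic sketch shows the underlying idea of pairing a relational store with XML for tree-shaped data. A hierarchy is stored as flat rows with parent pointers, which is the usual relational encoding, and XML is used to express it as a nested tree. The `category` table is hypothetical, and a single root is assumed.

```python
import sqlite3
import xml.etree.ElementTree as ET

def tree_to_xml(conn):
    """Rebuild a tree stored as category(id, parent_id, name) rows as XML.

    Assumes exactly one root row, whose parent_id is NULL. Children are
    attached in id order.
    """
    rows = conn.execute(
        "SELECT id, parent_id, name FROM category ORDER BY id"
    ).fetchall()
    elems = {node_id: ET.Element("category", name=name)
             for node_id, _, name in rows}
    root = None
    for node_id, parent_id, _ in rows:
        if parent_id is None:
            root = elems[node_id]
        else:
            elems[parent_id].append(elems[node_id])
    return root
```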
We study the problem of querying XML data sources that accept only a limited set of queries. Examples are sources accessible by Web services, which can implement very large (potentially infinite) families of XPath queries. To compactly specify such families of queries, we adopt Query Set Specifications, a formalism close to context-free grammars. We say that a query Q is expressible by the specification P if it is equivalent to some expansion of P. Q is supported by P if it has an equivalent rewriting using some finite set of P's expansions. We study the complexity of expressibility and support and identify large classes of XPath queries for which there are efficient (PTIME) algorithms. Our study considers both the case in which the XML nodes in the query results lose their original identity and the case in which the source exposes persistent node ids.
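The toy sketch below illustrates the grammar-like nature of such specifications. A hypothetical, non-recursive specification derives a finite family of XPath strings, and a query is checked against its expansions. The check is a deliberate simplification. It tests exact string membership, whereas expressibility as defined above means equivalence to some expansion. In addition, recursive specifications can have infinitely many expansions, so enumeration does not work in general.

```python
from itertools import product

# Hypothetical toy specification: each nonterminal maps to alternatives,
# and each alternative is a sequence of terminals and nonterminals.
SPEC = {
    "Q":    [["/catalog/product", "Cond"]],
    "Cond": [[""], ["[", "Pred", "]"]],
    "Pred": [["@brand=$b"], ["@price<$p"]],
}

def expansions(symbol, spec):
    """All terminal strings derivable from `symbol` (non-recursive spec only)."""
    if symbol not in spec:
        return {symbol}  # a terminal expands to itself
    results = set()
    for alternative in spec[symbol]:
        parts = [expansions(s, spec) for s in alternative]
        for combo in product(*parts):
            results.add("".join(combo))
    return results

def syntactically_expressible(query, spec, start="Q"):
    # Simplification: exact string match instead of query equivalence.
    return query in expansions(start, spec)
```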
Many autonomous and heterogeneous information sources are becoming increasingly available to users through the Internet, especially through the World Wide Web. The integration of Internet sources poses several challenges that have not been sufficiently addressed. In particular, knowledge of redundancy can be used to reduce the number of source accesses needed to retrieve the answer to a user query. Moreover, probabilistic information about source overlap can help derive efficient query plans for delivering partial answers to queries.
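The following sketch illustrates how knowledge of redundancy can cut source accesses. Assume (hypothetically) that it is known which answer items each source contains. A greedy set-cover heuristic then repeatedly queries the source that adds the most still-missing items and skips sources that are fully redundant. This is a standard approximation technique, shown for illustration, and not necessarily the approach taken in this work. It also uses exact overlap knowledge rather than probabilistic estimates.

```python
def choose_sources(coverage, wanted):
    """Greedy set-cover heuristic for picking which sources to access.

    `coverage` maps a source name to the set of answer items it is
    (hypothetically) known to contain. Returns the chosen sources in
    access order and any items that no source can supply.
    """
    remaining = set(wanted)
    chosen = []
    while remaining:
        # Pick the source contributing the most still-missing items
        # (ties go to the source listed first in `coverage`).
        best = max(coverage, key=lambda s: len(coverage[s] & remaining))
        gain = coverage[best] & remaining
        if not gain:
            break  # no source covers what is left
        chosen.append(best)
        remaining -= gain
    return chosen, remaining
```

With sources `A = {1, 2, 3}`, `B = {1, 2}` and `C = {4}`, the heuristic accesses only `A` and `C`. Source `B` is never queried, because its contents overlap completely with `A`.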
Shared online databases, such as Google Fusion Tables or Quickbase, allow non-programmer community members to collaboratively maintain and browse data. Community members may believe in conflicting facts (due to conflicting sources, measurements or opinions), yet current online databases do not offer support for managing data conflicts. Ricolla is a novel online database that treats data conflicts as first-class citizens. Prior work in uncertain databases was designed to provide a database back end to application logic. Ricolla, by contrast, is tuned to the requirements of the online database paradigm, allowing intuitive visualization of conflicts and collaborative data editing and conflict resolution. The proposed end-to-end system makes the following contributions: a) an online database paradigm that captures conflicts, allowing data query and update, while enabling personalized, “as-you-go” conflict resolution, b) a data model and corresponding generic user interface for ex...
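Ricolla's data model is not described here in enough detail to reproduce. The following hypothetical sketch illustrates only the general idea of treating conflicts as first-class data. A cell keeps every asserted value together with the users who assert it, so conflicts stay visible. Each user's personalized view resolves the cell "as you go". The class name and the resolution rule are assumptions: a user's own belief wins, otherwise the most-supported value is shown.

```python
from collections import defaultdict

class ConflictCell:
    """A hypothetical cell that keeps every asserted value, not just one."""

    def __init__(self):
        self.support = defaultdict(set)  # value -> users asserting it

    def assert_value(self, user, value):
        # Each user holds one belief per cell; drop any previous assertion.
        for users in self.support.values():
            users.discard(user)
        self.support[value].add(user)

    def candidates(self):
        # Values currently asserted by at least one user.
        return {v for v, users in self.support.items() if users}

    def in_conflict(self):
        return len(self.candidates()) > 1

    def view_for(self, user):
        # Personalized resolution: the user's own belief wins; otherwise
        # show the most-supported value (ties go to the larger value).
        for value, users in self.support.items():
            if user in users:
                return value
        live = [(len(users), v) for v, users in self.support.items() if users]
        return max(live)[1] if live else None
```

Because all candidate values are retained, a resolution can be revisited when a user changes their mind. This is the kind of collaborative editing that collapsing each cell to a single value would rule out.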
Papers by Vasilis Vassalos