PVLDB: Vol 14, No 10

Volume 14, Issue 10, June 2021

Editors: Xin Luna Dong (Amazon), Felix Naumann (HPI, University of Potsdam)

Publisher: VLDB Endowment

ISSN: 2150-8097

Issue Downloads

Front matter (Cover, Contents, Organization, Editorial)

research-article

How to design robust algorithms using noisy comparison Oracle

Pages 1703–1716, https://doi.org/10.14778/3467861.3467862

Metric based comparison operations such as finding maximum, nearest and farthest neighbor are fundamental to studying various clustering techniques such as k-center clustering and agglomerative hierarchical clustering. These techniques crucially rely on ...

research-article

SAND: streaming subsequence anomaly detection

Pages 1717–1729, https://doi.org/10.14778/3467861.3467863

With the increasing demand for real-time analytics and decision making, anomaly detection methods need to operate over streams of values and handle drifts in data distribution. Unfortunately, existing approaches have severe limitations: they either ...

research-article

Optimizing fitness-for-use of differentially private linear queries

Pages 1730–1742, https://doi.org/10.14778/3467861.3467864

In practice, differentially private data releases are designed to support a variety of applications. A data release is fit for use if it meets target accuracy requirements for each application. In this paper, we consider the problem of answering linear ...

research-article

Cryptanalysis of an encrypted database in SIGMOD '14

Pages 1743–1755, https://doi.org/10.14778/3467861.3467865

Encrypted database is an innovative technology proposed to solve the data confidentiality issue in cloud-based DB systems. It allows a data owner to encrypt its database before uploading it to the service provider; and it allows the service provider to ...

research-article

Unconstrained submodular maximization with modular costs: tight approximation and application to profit maximization

Pages 1756–1768, https://doi.org/10.14778/3467861.3467866

Given a set V, the problem of unconstrained submodular maximization with modular costs (USM-MC) asks for a subset S ⊆ V that maximizes f(S) - c(S), where f is a non-negative, monotone, and submodular function that gauges the utility of S, and c is a non-...

research-article

Distributed deep learning on data systems: a comparative analysis of approaches

Pages 1769–1782, https://doi.org/10.14778/3467861.3467867

Deep learning (DL) is growing in popularity for many data analytics applications, including among enterprises. Large business-critical datasets in such settings typically reside in RDBMSs or other data systems. The DB community has long aimed to bring ...

research-article

PR-sketch: monitoring per-key aggregation of streaming data with nearly full accuracy

Pages 1783–1796, https://doi.org/10.14778/3467861.3467868

Computing per-key aggregation is indispensable in streaming data analysis formulated as two phases, an update phase and a recovery phase. As the size and speed of data streams rise, accurate per-key information is useful in many applications like ...

research-article

Tensors: an abstraction for general data processing

Pages 1797–1804, https://doi.org/10.14778/3467861.3467869

Deep Learning (DL) has created a growing demand for simpler ways to develop complex models and efficient ways to execute them. Thus, a significant effort has gone into frameworks like PyTorch or TensorFlow to support a variety of DL models and run ...

research-article

Budget sharing for multi-analyst differential privacy

Pages 1805–1817, https://doi.org/10.14778/3467861.3467870

Large organizations that collect data about populations (like the US Census Bureau) release summary statistics that are used by multiple stakeholders for resource allocation and policy making problems. These organizations are also legally required to ...

research-article

In the land of data streams where synopses are missing, one framework to bring them all

Pages 1818–1831, https://doi.org/10.14778/3467861.3467871

In pursuit of real-time data analysis, approximate summarization structures, i.e., synopses, have gained importance over the years. However, existing stream processing systems, such as Flink, Spark, and Storm, do not support synopses as first class ...

research-article

Data acquisition for improving machine learning models

Pages 1832–1844, https://doi.org/10.14778/3467861.3467872

The vast advances in Machine Learning (ML) over the last ten years have been powered by the availability of suitably prepared data for training purposes. The future of ML-enabled enterprise hinges on data. As such, there is already a vibrant market ...

research-article

Efficiently answering reachability and path queries on temporal bipartite graphs

Pages 1845–1858, https://doi.org/10.14778/3467861.3467873

Bipartite graphs are naturally used to model relationships between two different types of entities, such as people-location, author-paper, and customer-product. When modeling real-world applications like disease outbreaks, edges are often enriched with ...

research-article

Preference queries over taxonomic domains

Pages 1859–1871, https://doi.org/10.14778/3467861.3467874

When composing multiple preferences characterizing the most suitable results for a user, several issues may arise. Indeed, preferences can be partially contradictory, suffer from a mismatch with the level of detail of the actual data, and even lack ...

research-article

Revisiting the design of LSM-tree Based OLTP storage engine with persistent memory

Pages 1872–1885, https://doi.org/10.14778/3467861.3467875

The recent byte-addressable and large-capacity commercialized persistent memory (PM) is promising to drive database as a service (DBaaS) into uncharted territories. This paper investigates how to leverage PMs to revisit the conventional LSM-tree based ...

research-article

Kamino: constraint-aware differentially private data synthesis

Pages 1886–1899, https://doi.org/10.14778/3467861.3467876

Organizations are increasingly relying on data to support decisions. When data contains private and sensitive information, the data owner often desires to publish a synthetic database instance that is similarly useful as the true data, while ensuring ...

research-article

Towards cost-effective and elastic cloud database deployment via memory disaggregation

Pages 1900–1912, https://doi.org/10.14778/3467861.3467877

It is challenging for cloud-native relational databases to meet the ever-increasing needs of scaling compute and memory resources independently and elastically. The recent emergence of memory disaggregation architecture, relying on high-speed RDMA ...

research-article

Dual-objective fine-tuning of BERT for entity matching

Pages 1913–1921, https://doi.org/10.14778/3467861.3467878

An increasing number of data providers have adopted shared numbering schemes such as GTIN, ISBN, DUNS, or ORCID numbers for identifying entities in the respective domain. This means for data integration that shared identifiers are often available for a ...

Proceedings of the VLDB Endowment
