PVLDB: Vol 15, No 13

Volume 15, Issue 13

September 2022

Editors:

Fatma Özcan, Google; Juliana Freire, New York University; Xuemin Lin, University of New South Wales

Publisher:

VLDB Endowment

ISSN: 2150-8097


Front matter (PDF): Cover, Contents, Organization, Letter from the editors in chief


research-article

High-Dimensional Data Cubes

Pages 3828–3840, https://doi.org/10.14778/3565838.3565839

This paper introduces an approach to supporting high-dimensional data cubes at interactive query speeds and moderate storage cost. The approach is based on binary(-domain) data cubes that are judiciously partially materialized; the missing information ...

research-article

Fast and Scalable Mining of Time Series Motifs with Probabilistic Guarantees

Pages 3841–3853, https://doi.org/10.14778/3565838.3565840

Mining time series motifs is a fundamental yet expensive task in exploratory data analytics. In this paper, we therefore propose a fast method to find the top-k motifs with probabilistic guarantees. Our probabilistic approach is based on Locality ...

research-article

FEDEX: An Explainability Framework for Data Exploration Steps

Pages 3854–3868, https://doi.org/10.14778/3565838.3565841

When exploring a new dataset, data scientists often apply analysis queries, look for insights in the resulting dataframe, and repeat the process with further queries. In this paper, we propose a novel solution that assists data scientists in this laborious ...

research-article

Enabling Transparent Acceleration of Big Data Frameworks Using Heterogeneous Hardware

Pages 3869–3882, https://doi.org/10.14778/3565838.3565842

The ever-increasing demand for high-performance Big Data analytics and data processing has paved the way for heterogeneous hardware accelerators, such as Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs), to be integrated into ...

research-article

Discovering Polarization Niches via Dense Subgraphs with Attractors and Repulsers

Pages 3883–3896, https://doi.org/10.14778/3565838.3565843

Detecting niches of polarization in social media is a first step towards deploying mitigation strategies and avoiding radicalization. In this paper, we model polarization niches as close-knit dense communities of users, which are under the influence of ...

research-article

Sage: A System for Uncertain Network Analysis

Pages 3897–3910, https://doi.org/10.14778/3565838.3565844

We propose Sage, a system for uncertain network analysis. Algorithms for uncertain network analysis require large amounts of memory and computing resources as they sample a large number of network instances and run analysis on them. Sage makes uncertain ...

research-article

Mining Bursting Core in Large Temporal Graphs

Pages 3911–3923, https://doi.org/10.14778/3565838.3565845

Temporal graphs are ubiquitous. Mining communities that burst within a period of time is essential for detecting real emergency events in temporal graphs. Unfortunately, most previous studies on community mining in temporal networks ignore the bursting ...

research-article

Cost-Based or Learning-Based?: A Hybrid Query Optimizer for Query Plan Selection

Pages 3924–3936, https://doi.org/10.14778/3565838.3565846

Traditional cost-based optimizers are efficient and stable in generating optimal plans for simple SQL queries, but they may not generate high-quality plans for complicated queries. Thus, learning-based optimizers have recently been proposed that can learn ...

research-article

ONe Index for All Kernels (ONIAK): A Zero Re-Indexing LSH Solution to ANNS-ALT (After Linear Transformation)

Pages 3937–3949, https://doi.org/10.14778/3565838.3565847

In this work, we formulate and solve a new type of approximate nearest neighbor search (ANNS) problem called ANNS after linear transformation (ALT). In ANNS-ALT, we search for the vector (in a dataset) that, after being linearly transformed by a user-...

research-article

Learned Index Benefits: Machine Learning Based Index Performance Estimation

Pages 3950–3962, https://doi.org/10.14778/3565838.3565848

Index selection remains one of the most challenging problems in relational database management systems. To find an optimum index configuration for a workload, accurately and efficiently quantifying the benefits of each candidate index configuration is ...

research-article

Online Ridesharing with Meeting Points

Pages 3963–3975, https://doi.org/10.14778/3565838.3565849

Nowadays, ridesharing has become a popular commuting mode. Dynamically arriving riders post their origins and destinations, and the platform then assigns drivers to serve them. In ridesharing, different groups of riders can be served by one driver if their ...

research-article

Exploiting the Power of Equality-Generating Dependencies in Ontological Reasoning

Pages 3976–3988, https://doi.org/10.14778/3565838.3565850

Equality-generating dependencies (EGDs) make it possible to fully exploit the power of existential quantification in ontological reasoning settings modeled via Tuple-Generating Dependencies (TGDs), by enabling value-assignment or forcing the equivalence of fresh ...

research-article

No Repetition: Fast and Reliable Sampling with Highly Concentrated Hashing

Pages 3989–4001, https://doi.org/10.14778/3565838.3565851

Stochastic sample-based estimators are among the most fundamental and universally applied tools in statistics. Such estimators are particularly important when processing huge amounts of data, where we need to be able to answer a wide range of ...

research-article

Witness Generation for JSON Schema

Pages 4002–4014, https://doi.org/10.14778/3565838.3565852

JSON Schema is a schema language for JSON documents, based on a complex combination of structural operators, Boolean operators (negation included), and recursive variables. The static analysis of JSON Schema documents comprises practically relevant ...

research-article

Towards Observability for Production Machine Learning Pipelines

Pages 4015–4022, https://doi.org/10.14778/3565838.3565853

Software organizations are increasingly incorporating machine learning (ML) into their product offerings, driving a need for new data management tools. Many of these tools facilitate the initial development of ML applications, but sustaining these ...

research-article

DINOMO: An Elastic, Scalable, High-Performance Key-Value Store for Disaggregated Persistent Memory

Pages 4023–4037, https://doi.org/10.14778/3565838.3565854

We present Dinomo, a novel key-value store for disaggregated persistent memory (DPM). Dinomo is the first key-value store for DPM that simultaneously achieves high common-case performance, scalability, and lightweight online reconfiguration. We observe ...

research-article

Bolt-on, Compact, and Rapid Program Slicing for Notebooks

Pages 4038–4047, https://doi.org/10.14778/3565838.3565855

Computational notebooks are commonly used for iterative workflows, such as in exploratory data analysis. This process lends itself to the accumulation of old code and hidden state, making it hard for users to reason about the lineage of, e.g., plots ...

research-article

Fairness Matters: A Tit-for-Tat Strategy Against Selfish Mining

Pages 4048–4061, https://doi.org/10.14778/3565838.3565856

Proof-of-work (PoW) based blockchains are more secure nowadays because profit-oriented miners contribute more computing power in exchange for fair revenues. This virtuous circle only works under an incentive-compatible consensus, which is found to be ...

research-article

SageDB: An Instance-Optimized Data Analytics System

Pages 4062–4078, https://doi.org/10.14778/3565838.3565857

Modern data systems are typically both complex and general-purpose. They are complex because of the numerous internal knobs and parameters that users need to manually tune in order to achieve good performance; they are general-purpose because they are ...

research-article

Budget-Conscious Fine-Grained Configuration Optimization for Spatio-Temporal Applications

Pages 4079–4092, https://doi.org/10.14778/3565838.3565858

Because of the performance requirements of modern spatio-temporal data mining applications, in-memory database systems are often used to store and process the data. To efficiently utilize the scarce DRAM capacities, modern database systems support various ...

research-article

Nemo: Guiding and Contextualizing Weak Supervision for Interactive Data Programming

Pages 4093–4105, https://doi.org/10.14778/3565838.3565859

Weak Supervision (WS) techniques allow users to efficiently create large training datasets by programmatically labeling data with heuristic sources of supervision. While the success of WS relies heavily on the provided labeling heuristics, the process ...


Proceedings of the VLDB Endowment
