Volume 15, Issue 13, September 2022
Publisher: VLDB Endowment
ISSN: 2150-8097
High-Dimensional Data Cubes

This paper introduces an approach to supporting high-dimensional data cubes at interactive query speeds and moderate storage cost. The approach is based on binary(-domain) data cubes that are judiciously partially materialized; the missing information ...

Fast and Scalable Mining of Time Series Motifs with Probabilistic Guarantees

Mining time series motifs is a fundamental, yet expensive task in exploratory data analytics. In this paper, we therefore propose a fast method to find the top-k motifs with probabilistic guarantees. Our probabilistic approach is based on Locality ...

FEDEX: An Explainability Framework for Data Exploration Steps

When exploring a new dataset, data scientists often apply analysis queries, look for insights in the resulting dataframe, and then repeat the process with further queries. In this paper, we propose a novel solution that assists data scientists in this laborious ...

Enabling Transparent Acceleration of Big Data Frameworks Using Heterogeneous Hardware

The ever-increasing demand for high-performance Big Data analytics and data processing has paved the way for heterogeneous hardware accelerators, such as Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs), to be integrated into ...

Discovering Polarization Niches via Dense Subgraphs with Attractors and Repulsers

Detecting niches of polarization in social media is a first step towards deploying mitigation strategies and avoiding radicalization. In this paper, we model polarization niches as close-knit dense communities of users, which are under the influence of ...

Sage: A System for Uncertain Network Analysis

We propose Sage, a system for uncertain network analysis. Algorithms for uncertain network analysis require large amounts of memory and computing resources as they sample a large number of network instances and run analysis on them. Sage makes uncertain ...

Mining Bursting Core in Large Temporal Graphs

Temporal graphs are ubiquitous. Mining communities that are bursting in a period of time is essential for seeking real emergency events in temporal graphs. Unfortunately, most previous studies on community mining in temporal networks ignore the bursting ...

Cost-Based or Learning-Based?: A Hybrid Query Optimizer for Query Plan Selection

Traditional cost-based optimizers generate optimal plans for simple SQL queries efficiently and stably, but they may not generate high-quality plans for complicated queries. Thus, learning-based optimizers have recently been proposed that can learn ...

ONe Index for All Kernels (ONIAK): A Zero Re-Indexing LSH Solution to ANNS-ALT (After Linear Transformation)

In this work, we formulate and solve a new type of approximate nearest neighbor search (ANNS) problem called ANNS after linear transformation (ALT). In ANNS-ALT, we search for the vector (in a dataset) that, after being linearly transformed by a user-...

Learned Index Benefits: Machine Learning Based Index Performance Estimation

Index selection remains one of the most challenging problems in relational database management systems. To find an optimum index configuration for a workload, accurately and efficiently quantifying the benefits of each candidate index configuration is ...

Online Ridesharing with Meeting Points

Ridesharing has become a popular commuting mode. Dynamically arriving riders post their origins and destinations, and the platform then assigns drivers to serve them. In ridesharing, different groups of riders can be served by one driver if their ...

Exploiting the Power of Equality-Generating Dependencies in Ontological Reasoning

Equality-generating dependencies (EGDs) make it possible to fully exploit the power of existential quantification in ontological reasoning settings modeled via Tuple-Generating Dependencies (TGDs), by enabling value-assignment or forcing the equivalence of fresh ...

No Repetition: Fast and Reliable Sampling with Highly Concentrated Hashing

Stochastic sample-based estimators are among the most fundamental and universally applied tools in statistics. Such estimators are particularly important when processing huge amounts of data, where we need to be able to answer a wide range of ...

Witness Generation for JSON Schema

JSON Schema is a schema language for JSON documents, based on a complex combination of structural operators, Boolean operators (negation included), and recursive variables. The static analysis of JSON Schema documents comprises practically relevant ...

Towards Observability for Production Machine Learning Pipelines

Software organizations are increasingly incorporating machine learning (ML) into their product offerings, driving a need for new data management tools. Many of these tools facilitate the initial development of ML applications, but sustaining these ...

DINOMO: An Elastic, Scalable, High-Performance Key-Value Store for Disaggregated Persistent Memory

We present Dinomo, a novel key-value store for disaggregated persistent memory (DPM). Dinomo is the first key-value store for DPM that simultaneously achieves high common-case performance, scalability, and lightweight online reconfiguration. We observe ...

Bolt-on, Compact, and Rapid Program Slicing for Notebooks

Computational notebooks are commonly used for iterative workflows, such as in exploratory data analysis. This process lends itself to the accumulation of old code and hidden state, making it hard for users to reason about the lineage of, e.g., plots ...

Fairness Matters: A Tit-for-Tat Strategy Against Selfish Mining

Proof-of-work (PoW) based blockchains are more secure nowadays because profit-oriented miners contribute more computing power in exchange for fair revenues. This virtuous circle only works under an incentive-compatible consensus, which is found to be ...

SageDB: An Instance-Optimized Data Analytics System

Modern data systems are typically both complex and general-purpose. They are complex because of the numerous internal knobs and parameters that users need to manually tune in order to achieve good performance; they are general-purpose because they are ...

Budget-Conscious Fine-Grained Configuration Optimization for Spatio-Temporal Applications

Based on the performance requirements of modern spatio-temporal data mining applications, in-memory database systems are often used to store and process the data. To efficiently utilize the scarce DRAM capacities, modern database systems support various ...

Nemo: Guiding and Contextualizing Weak Supervision for Interactive Data Programming

Weak Supervision (WS) techniques allow users to efficiently create large training datasets by programmatically labeling data with heuristic sources of supervision. While the success of WS relies heavily on the provided labeling heuristics, the process ...
