PACMMOD: Vol 2, No 6

Volume 2, Issue 6, December 2024, SIGMOD

Editor:

Divyakant Agrawal
University of California, Santa Barbara, United States

Publisher:

Association for Computing Machinery
New York
NY
United States

EISSN: 2836-6573

editorial

Free

PACMMOD V2, N6 (SIGMOD), December 2024: Editorial

Article No.: 223, Pages 1–2, https://doi.org/10.1145/3698798

The Proceedings of the ACM on Management of Data (PACMMOD) is concerned with the principles, algorithms, techniques, systems, and applications of database management systems, data management technology, and science and engineering of data. It includes ...

research-article

A Universal Sketch for Estimating Heavy Hitters and Per-Element Frequency Moments in Data Streams with Bounded Deletions

Article No.: 224, Pages 1–28, https://doi.org/10.1145/3698799

In the field of data stream processing, there are two prevalent models, i.e., insertion-only, and turnstile models. Most previous works were proposed for the insertion-only model, which assumes new elements arrive continuously as a stream, and neglects ...

research-article

An Efficient and Exact Algorithm for Locally h-Clique Densest Subgraph Discovery

Article No.: 225, Pages 1–26, https://doi.org/10.1145/3698800

Detecting locally, non-overlapping, near-clique densest subgraphs is a crucial problem for community search in social networks. As a vertex may be involved in multiple overlapped local cliques, detecting locally densest sub-structures considering h-...

research-article

Open Access

Buffered Persistence in B+ Trees

Article No.: 226, Pages 1–24, https://doi.org/10.1145/3698801

Non-volatile Memory (NVM) offers the opportunity to build large, durable B+ trees with markedly higher performance and faster post-crash recovery than is possible with traditional disk- or flash-based persistence. Unfortunately, cache flush and fence ...

research-article

Camel: Efficient Compression of Floating-Point Time Series

Article No.: 227, Pages 1–26, https://doi.org/10.1145/3698802

Time series compression encodes the information in a time-ordered sequence of data points into fewer bits, thereby reducing storage costs and possibly other costs. Compression methods are either general or XOR-based. General compression methods are time-...

research-article

Common Neighborhood Estimation over Bipartite Graphs under Local Differential Privacy

Article No.: 228, Pages 1–26, https://doi.org/10.1145/3698803

Bipartite graphs, formed by two vertex layers, arise as a natural fit for modeling the relationships between two groups of entities. In bipartite graphs, common neighborhood computation between two vertices on the same vertex layer is a basic operator, ...

research-article

Connectivity-Oriented Property Graph Partitioning for Distributed Graph Pattern Query Processing

Article No.: 229, Pages 1–26, https://doi.org/10.1145/3698804

Graph pattern query is a powerful tool for extracting crucial information from property graphs. With the exponential growth of sizes, property graphs are typically divided into multiple subgraphs (referred to as partitions) and stored across various ...

research-article

Constant-time Connectivity Querying in Dynamic Graphs

Article No.: 230, Pages 1–23, https://doi.org/10.1145/3698805

Connectivity query processing is a fundamental problem in graph processing. Given an undirected graph and two query vertices, the problem aims to identify whether they are connected via a path. Given frequent edge updates in real graph applications, in ...

research-article

CtxPipe: Context-aware Data Preparation Pipeline Construction for Machine Learning

Article No.: 231, Pages 1–27, https://doi.org/10.1145/3698831

Machine learning models are only as good as their training data. Simple models trained on well-chosen features extracted from the raw data often outperform complex models trained directly on the raw data. Data preparation pipelines, which clean and ...

research-article

Open Access

Directional Queries: Making Top-k Queries More Effective in Discovering Relevant Results

Article No.: 232, Pages 1–26, https://doi.org/10.1145/3698807

Top-k queries, in particular those based on a linear scoring function, are a common way to extract relevant results from large datasets. Their major advantage over alternative approaches, such as skyline queries (which return all the undominated objects ...

research-article

Open Access

Disclosure-Compliant Query Answering

Article No.: 233, Pages 1–28, https://doi.org/10.1145/3698808

In today's data-driven world, organizations face increasing pressure to comply with data disclosure policies, which require data masking measures and robust access control mechanisms. This paper presents Mascara, a middleware for specifying and enforcing ...

research-article

Open Access

DPconv: Super-Polynomially Faster Join Ordering

Article No.: 234, Pages 1–26, https://doi.org/10.1145/3698809

We revisit the join ordering problem in query optimization. The standard exact algorithm, DPccp, has a worst-case running time of O(3ⁿ). This is prohibitively expensive for large queries, which are not that uncommon anymore. We develop a new algorithmic ...

research-article

Open Access

Finding Logic Bugs in Spatial Database Engines via Affine Equivalent Inputs

Article No.: 235, Pages 1–26, https://doi.org/10.1145/3698810

Spatial Database Management Systems (SDBMSs) aim to store, manipulate, and retrieve spatial data. SDBMSs are employed in various modern applications, such as geographic information systems, computer-aided design tools, and location-based services. ...

research-article

GIDCL: A Graph-Enhanced Interpretable Data Cleaning Framework with Large Language Models

Article No.: 236, Pages 1–29, https://doi.org/10.1145/3698811

Data quality is critical across many applications. The utility of data is undermined by various errors, making rigorous data cleaning a necessity. Traditional data cleaning systems depend heavily on predefined rules and constraints, which necessitate ...

research-article

GOLAP: A GPU-in-Data-Path Architecture for High-Speed OLAP

Article No.: 237, Pages 1–26, https://doi.org/10.1145/3698812

In this paper, we suggest a novel GPU-in-data-path architecture that leverages a GPU to accelerate the I/O path and thus can achieve almost in-memory bandwidth using SSDs. In this architecture, the main idea is to stream data in heavy-weight compressed ...

research-article

Open Access

High-Performance Query Processing with NVMe Arrays: Spilling without Killing Performance

Article No.: 238, Pages 1–27, https://doi.org/10.1145/3698813

This paper aims to bridge the gap between fast in-memory query engines and slow but robust engines that can utilize external storage. We find that current systems have to choose between fast in-memory operators and slower out-of-memory operators. We ...

research-article

Open Access

iRangeGraph: Improvising Range-dedicated Graphs for Range-filtering Nearest Neighbor Search

Article No.: 239, Pages 1–26, https://doi.org/10.1145/3698814

Range-filtering approximate nearest neighbor (RFANN) search is attracting increasing attention in academia and industry. Given a set of data objects, each being a pair of a high-dimensional vector and a numeric value, an RFANN query with a vector and a ...

research-article

Live Patching for Distributed In-Memory Key-Value Stores

Article No.: 241, Pages 1–26, https://doi.org/10.1145/3698816

Providers of high-availability data stores need to roll out software updates without causing noticeable downtimes. For distributed data stores like Redis Cluster, the state-of-the-art is a rolling update, where the nodes are restarted in sequence. This ...

research-article

Open Access

Transforming RDF Graphs to Property Graphs using Standardized Schemas

Article No.: 242, Pages 1–25, https://doi.org/10.1145/3698817

Knowledge Graphs can be encoded using different data models. They are especially abundant using RDF and recently also as property graphs. While knowledge graphs in RDF adhere to the subject-predicate-object structure, property graphs utilize multi-...

research-article

LSMGraph: A High-Performance Dynamic Graph Storage System with Multi-Level CSR

Article No.: 243, Pages 1–28, https://doi.org/10.1145/3698818

The growing volume of graph data may exhaust the main memory. It is crucial to design a disk-based graph storage system to ingest updates and analyze graphs efficiently. However, existing dynamic graph storage systems suffer from read or write ...

research-article

Memento Filter: A Fast, Dynamic, and Robust Range Filter

Article No.: 244, Pages 1–27, https://doi.org/10.1145/3698820

Range filters are probabilistic data structures that answer approximate range emptiness queries. They aid in avoiding processing empty range queries and have use cases in many application domains such as key-value stores and social web analytics. However,...

research-article

Multivariate Time Series Cleaning under Speed Constraints

Article No.: 245, Pages 1–26, https://doi.org/10.1145/3698821

Errors are common in time series due to unreliable sensor measurements. Existing methods focus on univariate data but do not utilize the correlation between dimensions. Cleaning each dimension separately may lead to a less accurate result, as some errors ...

research-article

Navigating Labels and Vectors: A Unified Approach to Filtered Approximate Nearest Neighbor Search

Article No.: 246, Pages 1–27, https://doi.org/10.1145/3698822

Given a query vector, approximate nearest neighbor search (ANNS) aims to retrieve similar vectors from a set of high-dimensional base vectors. However, many real-world applications jointly query both vector data and structured data, imposing label ...

research-article

Online Detection of Anomalies in Temporal Knowledge Graphs with Interpretability

Article No.: 247, Pages 1–26, https://doi.org/10.1145/3698823

Temporal knowledge graphs (TKGs) are valuable resources for capturing evolving relationships among entities, yet they are often plagued by noise, necessitating robust anomaly detection mechanisms. Existing dynamic graph anomaly detection approaches ...

research-article

Open Access

Pasta: A Cost-Based Optimizer for Generating Pipelining Schedules for Dataflow DAGs

Article No.: 248, Pages 1–26, https://doi.org/10.1145/3698832

Data analytics tasks are often formulated as data workflows represented as directed acyclic graphs (DAGs) of operators. The recent trend of adopting machine learning (ML) techniques in workflows results in increasingly complicated DAGs with many ...

research-article

Open Access

Personalized Truncation for Personalized Privacy

Article No.: 249, Pages 1–25, https://doi.org/10.1145/3698825

In the standard model of differential privacy (DP), every user's privacy is treated equally, which is captured by a single privacy parameter ε. However, in many real-world situations, users may have diverse privacy concerns and requirements, ...

research-article

Provenance-Enabled Explainable AI

Article No.: 250, Pages 1–27, https://doi.org/10.1145/3698826

Machine learning (ML) algorithms have advanced significantly in recent years, progressively evolving into artificial intelligence (AI) agents capable of solving complex, human-like intellectual challenges. Despite the advancements, the interpretability ...

research-article

Open Access

SPID-Join: A Skew-resistant Processing-in-DIMM Join Algorithm Exploiting the Bank- and Rank-level Parallelisms of DIMMs

Article No.: 251, Pages 1–27, https://doi.org/10.1145/3698827

Recent advances in Dual In-line Memory Modules (DIMMs) allow DIMMs to support Processing-In-DIMM (PID) by placing In-DIMM Processors (IDPs) near their memory banks. Prior studies have shown that in-memory joins can benefit from PID by offloading their ...

research-article

Towards a Converged Relational-Graph Optimization Framework

Article No.: 252, Pages 1–27, https://doi.org/10.1145/3698828

The recent ISO SQL:2023 standard adopts SQL/PGQ (Property Graph Queries), facilitating graph-like querying within relational databases. This advancement, however, underscores a significant gap in how to effectively optimize SQL/PGQ queries within ...

research-article

Open Access

Understanding and Reusing Test Suites Across Database Systems

Article No.: 253, Pages 1–26, https://doi.org/10.1145/3698829

Database Management System (DBMS) developers have implemented extensive test suites to test their DBMSs. For example, the SQLite test suites contain over 92 million lines of code. Despite these extensive efforts, test suites are not systematically reused ...
