PACMMOD: Vol 1, No 4

Volume 1, Issue 4

December 2023

Editor:

Divyakant Agrawal
UC Santa Barbara, United States

Publisher:

Association for Computing Machinery
New York
NY
United States

EISSN: 2836-6573

editorial

Free

PACMMOD Volume 1 Issue 4: Editorial

Article No.: 222, Pages 1–2, https://doi.org/10.1145/3626709

Welcome to this issue of the Proceedings of the ACM on Management of Data (Volume 1, Issue 4 (SIGMOD)). While this issue has papers from the SIGMOD track, PACMMOD will soon also have issues with papers from the newly created PODS track. Out of 189 ...

research-article

GEqO: ML-Accelerated Semantic Equivalence Detection

Article No.: 223, Pages 1–25, https://doi.org/10.1145/3626710

Large scale analytics engines have become a core dependency for modern data-driven enterprises to derive business insights and drive actions. These engines support a large number of analytic jobs processing huge volumes of data on a daily basis, and ...

research-article

The Battleship Approach to the Low Resource Entity Matching Problem

Article No.: 224, Pages 1–25, https://doi.org/10.1145/3626711

Entity matching, a core data integration problem, is the task of deciding whether two data tuples refer to the same real-world entity. Recent advances in deep learning methods, using pre-trained language models, were proposed for resolving entity ...

research-article

Open Access

Udon: Efficient Debugging of User-Defined Functions in Big Data Systems with Line-by-Line Control

Article No.: 225, Pages 1–26, https://doi.org/10.1145/3626712

Many big data systems are written in languages such as C, C++, Java, and Scala to process large amounts of data efficiently, while data analysts often use Python to conduct data wrangling, statistical analysis, and machine learning. User-defined ...

research-article

ChainKV: A Semantics-Aware Key-Value Store for Ethereum System

Article No.: 226, Pages 1–23, https://doi.org/10.1145/3626713

The Log-Structured Merge tree (LSM-tree) based key-value (KV) store has been widely adopted as the storage engine for blockchain systems, such as Ethereum, in which blockchain data are uniformly transformed into randomly distributed KV items for ...

research-article

Proving Query Equivalence Using Linear Integer Arithmetic

Article No.: 227, Pages 1–26, https://doi.org/10.1145/3626768

Proving the equivalence between SQL queries is a fundamental problem in database research. Existing solvers model queries using algebraic representations and convert such representations into first-order logic formulas so that query equivalence can be ...

research-article

Open Access

A Unified Approach for Resilience and Causal Responsibility with Integer Linear Programming (ILP) and LP Relaxations

Article No.: 228, Pages 1–27, https://doi.org/10.1145/3626715

What is a minimal set of tuples to delete from a database in order to eliminate all query answers? This problem is called "the resilience of a query" and is one of the key algorithmic problems underlying various forms of reverse data management, such as ...

research-article

ADGNN: Towards Scalable GNN Training with Aggregation-Difference Aware Sampling

Article No.: 229, Pages 1–26, https://doi.org/10.1145/3626716

Distributed computing is promising to enable large-scale graph neural network (GNN) model training. However, care is needed to avoid excessive computational and communication overheads. Sampling is promising in terms of enabling scalability, and sampling ...

research-article

Open Access

ALP: Adaptive Lossless floating-Point Compression

Article No.: 230, Pages 1–26, https://doi.org/10.1145/3626717

IEEE 754 doubles do not exactly represent most real values, introducing rounding errors in computations and [de]serialization to text. These rounding errors inhibit the use of existing lightweight compression schemes such as Delta and Frame Of Reference (...

research-article

Anchor: A Library for Building Secure Persistent Memory Systems

Article No.: 231, Pages 1–31, https://doi.org/10.1145/3626718

Cloud infrastructure is experiencing a shift towards disaggregated setups, especially with the introduction of the Compute Express Link (CXL) technology, where byte-addressable persistent memory (PM) is becoming prominent. To fully utilize the potential ...

research-article

AS-Parser: Log Parsing Based on Adaptive Segmentation

Article No.: 232, Pages 1–26, https://doi.org/10.1145/3626719

System logs have long been recognized as valuable data for analyzing and diagnosing system failures. One fundamental task of log processing is to convert unstructured logs into structured logs through log parsing. All previous log parsing approaches ...

research-article

Open Access

Cackle: Analytical Workload Cost and Performance Stability With Elastic Pools

Article No.: 233, Pages 1–25, https://doi.org/10.1145/3626720

Analytical query workloads are prone to rapid fluctuations in resource demands. These rapid, hard-to-predict changes in resource demand make provisioning a challenge. Users must either over-provision at excessive cost or suffer poor query latency when ...

research-article

ChainedFilter: Combining Membership Filters by Chain Rule

Article No.: 234, Pages 1–27, https://doi.org/10.1145/3626721

Membership (membership query/membership testing) is a fundamental problem across databases, networks and security. However, previous research has primarily focused on either approximate solutions, such as Bloom Filters, or exact methods, like perfect ...

research-article

Open Access

Correlation Joins over Time Series Data Streams Utilizing Complementary Dimension Reduction and Transformation

Article No.: 235, Pages 1–26, https://doi.org/10.1145/3626722

A common analysis task over a stream of time series is to find all pairs of windows whose correlation is above a given threshold. For a large number of streams, doing so naively, i.e., checking the Cartesian product, is too expensive. In essence, finding ...

research-article

Demystifying the QoS and QoE of Edge-hosted Video Streaming Applications in the Wild with SNESet

Article No.: 236, Pages 1–29, https://doi.org/10.1145/3626723

Video streaming applications (VSAs) are increasingly being deployed on large-scale edge platforms, which have the potential to significantly improve the quality of service (QoS) and end-user experience (QoE), ultimately maximizing business outcomes. ...

research-article

DGC: Training Dynamic Graphs with Spatio-Temporal Non-Uniformity using Graph Partitioning by Chunks

Article No.: 237, Pages 1–25, https://doi.org/10.1145/3626724

Dynamic Graph Neural Networks (DGNNs) have shown a strong capability of learning dynamic graphs by exploiting both spatial and temporal features. Although DGNNs have recently received considerable attention from the AI community and various DGNN models have been ...

research-article

DP-starJ: A Differential Private Scheme towards Analytical Star-Join Queries

Article No.: 238, Pages 1–24, https://doi.org/10.1145/3626725

The star-join query is a fundamental task in data warehouses and has wide applications in online analytical processing (OLAP) scenarios. Due to the large number of foreign key constraints and the asymmetric effect in the neighboring instance between the ...

research-article

Open Access

Efficient Approximation Framework for Attribute Recommendation

Article No.: 239, Pages 1–26, https://doi.org/10.1145/3626726

Trend analysis is a fundamental type of analytical query in online analytical processing (OLAP) systems. In trend analysis, a key step is to identify k valuable attributes whose distributions in two subsets under different predicates significantly differ ...

research-article

Open Access

Equitable Top-k Results for Long Tail Data

Article No.: 240, Pages 1–24, https://doi.org/10.1145/3626727

For datasets exhibiting the long-tail phenomenon, we identify a fairness concern in existing top-k algorithms, which return a "fixed" set of k results for a given query. This causes a handful of popular records (products, items, etc.) to be overexposed and ...

research-article

F3KM: Federated, Fair, and Fast k-means

Article No.: 241, Pages 1–25, https://doi.org/10.1145/3626728

This paper proposes a federated, fair, and fast k-means algorithm (F3KM) to solve the fair clustering problem efficiently in scenarios where data cannot be shared among different parties. The proposed algorithm decomposes the fair k-means problem into ...

research-article

FACET: Robust Counterfactual Explanation Analytics

Article No.: 242, Pages 1–27, https://doi.org/10.1145/3626729

Machine learning systems are deployed in domains such as hiring and healthcare, where undesired classifications can have serious ramifications for the user. Thus, there is a rising demand for explainable AI systems which provide actionable steps for lay ...

research-article

Generation of Training Examples for Tabular Natural Language Inference

Article No.: 243, Pages 1–27, https://doi.org/10.1145/3626730

Tabular data is becoming increasingly important in Natural Language Processing (NLP) tasks, such as Tabular Natural Language Inference (TNLI). Given a table and a hypothesis expressed in NL text, the goal is to assess if the former structured data ...

research-article

Open Access

Hierarchical Cut Labelling - Scaling Up Distance Queries on Road Networks

Article No.: 244, Pages 1–25, https://doi.org/10.1145/3626731

Answering shortest-path distance queries between two arbitrary locations is a fundamental problem in road networks. Labelling-based solutions are the current state of the art for fast response times, and can generally be categorised into hub-based ...

research-article

Public Access

High-Ratio Compression for Machine-Generated Data

Article No.: 245, Pages 1–27, https://doi.org/10.1145/3626732

Machine-generated data is rapidly growing and poses challenges for data-intensive systems, especially as the growth of data outpaces the growth of storage space. To cope with the storage issue, compression plays a critical role in storage engines, ...

research-article

Open Access

HongTu: Scalable Full-Graph GNN Training on Multiple GPUs

Article No.: 246, Pages 1–27, https://doi.org/10.1145/3626733

Full-graph training on graph neural networks (GNN) has emerged as a promising training method for its effectiveness. Full-graph training requires extensive memory and computation resources. To accelerate this training process, researchers have proposed ...

research-article

Open Access

Lemo: A Cache-Enhanced Learned Optimizer for Concurrent Queries

Article No.: 247, Pages 1–26, https://doi.org/10.1145/3626734

With the expansion of modern database services, multi-user access has become a crucial feature in various practical application scenarios, including enterprise applications and e-commerce platforms. However, if multiple users submit queries within a ...

research-article

Lightweight Materialization for Fast Dashboards Over Joins

Article No.: 248, Pages 1–27, https://doi.org/10.1145/3626735

Dashboards are vital in modern business intelligence tools, providing non-technical users with an interface to access comprehensive business data. With the rise of cloud technology, there is an increased number of data sources to provide enriched ...

research-article

Open Access

MirrorKV: An Efficient Key-Value Store on Hybrid Cloud Storage with Balanced Performance of Compaction and Querying

Article No.: 249, Pages 1–27, https://doi.org/10.1145/3626736

LSM-based key-value stores have been leveraged in many state-of-the-art data-intensive applications as storage engines. As data volume scales up, a cost-efficient approach is to deploy these applications on hybrid cloud storage with hot/cold separation, ...

research-article

Open Access

MOST: Model-Based Compression with Outlier Storage for Time Series Data

Article No.: 250, Pages 1–29, https://doi.org/10.1145/3626737

Time series data are used in a wide variety of applications. The explosive growth of the amount of time series data poses a significant challenge in efficient data storage and query processing. Unfortunately, existing compression techniques either show ...

research-article

Neural Attributed Community Search at Billion Scale

Article No.: 251, Pages 1–25, https://doi.org/10.1145/3626738

Community search has been extensively studied in the past decades. In recent years, there has been growing interest in attributed community search, which aims to identify a community based on both the query nodes and query attributes. A set of techniques have ...
