- research-article, February 2025
Capsule: An Out-of-Core Training Mechanism for Colossal GNNs
Proceedings of the ACM on Management of Data (PACMMOD), Volume 3, Issue 1, Article No.: 19, Pages 1–30
https://doi.org/10.1145/3709669
Cutting-edge platforms of graph neural networks (GNNs), such as DGL and PyG, harness the parallel processing power of GPUs to extract structural information from graph data, achieving state-of-the-art (SOTA) performance in fields such as recommendation ...
- research-article, February 2025
Online Marketplace: A Benchmark for Data Management in Microservices
Proceedings of the ACM on Management of Data (PACMMOD), Volume 3, Issue 1, Article No.: 3, Pages 1–26
https://doi.org/10.1145/3709653
Microservice architectures have become a popular approach for designing scalable distributed applications. Despite their extensive use in industrial settings for over a decade, there is limited understanding of the data management challenges that arise ...
- research-article, December 2024
Live Patching for Distributed In-Memory Key-Value Stores
Proceedings of the ACM on Management of Data (PACMMOD), Volume 2, Issue 6, Article No.: 241, Pages 1–26
https://doi.org/10.1145/3698816
Providers of high-availability data stores need to roll out software updates without causing noticeable downtimes. For distributed data stores like Redis Cluster, the state-of-the-art is a rolling update, where the nodes are restarted in sequence. This ...
- research-article, September 2024
Tao: Improving Resource Utilization while Guaranteeing SLO in Multi-tenant Relational Database-as-a-Service
Proceedings of the ACM on Management of Data (PACMMOD), Volume 2, Issue 4, Article No.: 205, Pages 1–26
https://doi.org/10.1145/3677141
It is an open challenge for cloud database service providers to guarantee tenants' service-level objectives (SLOs) and enjoy high resource utilization simultaneously. In this work, we propose a novel system Tao to overcome it. Tao consists of three key ...
- research-article, May 2024
Chase Termination Beyond Polynomial Time
Proceedings of the ACM on Management of Data (PACMMOD), Volume 2, Issue 2, Article No.: 93, Pages 1–17
https://doi.org/10.1145/3651594
The chase is a widely implemented approach to reason with tuple-generating dependencies (tgds), used in data exchange, data integration, and ontology-based query answering. However, it is merely a semi-decision procedure, which may fail to terminate. ...
- research-article, March 2024
SkyPIE: A Fast & Accurate Oracle for Object Placement
Proceedings of the ACM on Management of Data (PACMMOD), Volume 2, Issue 1, Article No.: 55, Pages 1–27
https://doi.org/10.1145/3639310
Cloud object stores offer vastly different price points for object storage as a function of workload and geography. Poor object placement can thus lead to significant cost overheads. Prior cost-saving techniques attempt to optimize placement policies on ...
- research-article, March 2024
LPLM: A Neural Language Model for Cardinality Estimation of LIKE-Queries
Proceedings of the ACM on Management of Data (PACMMOD), Volume 2, Issue 1, Article No.: 54, Pages 1–25
https://doi.org/10.1145/3639309
Cardinality estimation is an important step in cost-based database query optimization. The accuracy of the estimates directly affects the ability of an optimizer to identify the most efficient query execution plan correctly. In this paper, we study ...
- research-article, December 2023
Rethink Query Optimization in HTAP Databases
Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 4, Article No.: 256, Pages 1–27
https://doi.org/10.1145/3626750
The advent of data-intensive applications has fueled the evolution of hybrid transactional and analytical processing (HTAP). To support mixed workloads, distributed HTAP databases typically maintain two data copies that are specially tailored for data ...
- research-article, December 2023
Demystifying the QoS and QoE of Edge-hosted Video Streaming Applications in the Wild with SNESet
- Yanan Li,
- Guangqing Deng,
- Changming Bai,
- Jingyu Yang,
- Gang Wang,
- Hao Zhang,
- Jin Bai,
- Haitao Yuan,
- Mengwei Xu,
- Shangguang Wang
Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 4, Article No.: 236, Pages 1–29
https://doi.org/10.1145/3626723
Video streaming applications (VSAs) are increasingly being deployed on large-scale edge platforms, which have the potential to significantly improve the quality of service (QoS) and end-user experience (QoE), ultimately maximizing business outcomes. ...
- research-article, December 2023
Cackle: Analytical Workload Cost and Performance Stability With Elastic Pools
Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 4, Article No.: 233, Pages 1–25
https://doi.org/10.1145/3626720
Analytical query workloads are prone to rapid fluctuations in resource demands. These rapid, hard to predict resource demand changes make provisioning a challenge. Users must either over provision at excessive cost or suffer poor query latency when ...
- research-article, November 2023
BladeDISC: Optimizing Dynamic Shape Machine Learning Workloads via Compiler Approach
- Zhen Zheng,
- Zaifeng Pan,
- Dalin Wang,
- Kai Zhu,
- Wenyi Zhao,
- Tianyou Guo,
- Xiafei Qiu,
- Minmin Sun,
- Junjie Bai,
- Feng Zhang,
- Xiaoyong Du,
- Jidong Zhai,
- Wei Lin
Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 3, Article No.: 206, Pages 1–29
https://doi.org/10.1145/3617327
Compiler optimization plays an increasingly important role to boost the performance of machine learning models for data processing and management. With increasingly complex data, the dynamic tensor shape phenomenon emerges for ML models. However, ...
- research-article, June 2023
Vineyard: Optimizing Data Sharing in Data-Intensive Analytics
Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 2, Article No.: 200, Pages 1–27
https://doi.org/10.1145/3589780
Modern data analytics and AI jobs become increasingly complex and involve multiple tasks performed on specialized systems. Sharing of intermediate data between different systems is often a significant bottleneck in such jobs. When the intermediate data ...
- research-article, June 2023
GoldMiner: Elastic Scaling of Training Data Pre-Processing Pipelines for Deep Learning
- Hanyu Zhao,
- Zhi Yang,
- Yu Cheng,
- Chao Tian,
- Shiru Ren,
- Wencong Xiao,
- Man Yuan,
- Langshi Chen,
- Kaibo Liu,
- Yang Zhang,
- Yong Li,
- Wei Lin
Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 2, Article No.: 193, Pages 1–25
https://doi.org/10.1145/3589773
Training data pre-processing pipelines are essential to deep learning (DL). As the performance of model training keeps increasing with both hardware advancements (e.g., faster GPUs) and various software optimizations, the data pre-processing on CPUs is ...
- research-article, June 2023
Ghost: A General Framework for High-Performance Online Similarity Queries over Distributed Trajectory Streams
Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 2, Article No.: 173, Pages 1–25
https://doi.org/10.1145/3589318
Trajectory similarity queries, including similarity search and similarity join, offer a foundation for many geo-spatial applications. With the rapid increase of streaming trajectory data volumes, e.g., data from mobile phones, vessel monitoring, or ...
- research-article, June 2023
Using Cloud Functions as Accelerator for Elastic Data Analytics
Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 2, Article No.: 161, Pages 1–27
https://doi.org/10.1145/3589306
Cloud function (CF) services, such as AWS Lambda, have been applied as the new computing infrastructure in implementing analytical query engines. For bursty and sparse workloads, CF-based query engine is more elastic than the traditional query engines ...
Generalizing Bulk-Synchronous Parallel Processing for Data Science: From Data to Threads and Agent-Based Simulations
Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 2, Article No.: 151, Pages 1–28
https://doi.org/10.1145/3589296
We generalize the bulk-synchronous parallel (BSP) processing model to make it better support agent-based simulations. Such simulations frequently exhibit hierarchical structure in their communication patterns which can be exploited to improve ...
- research-article, June 2023
DARQ Matter Binds Everything: Performant and Composable Cloud Programming via Resilient Steps
Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 2, Article No.: 117, Pages 1–27
https://doi.org/10.1145/3589262
Providing strong fault-tolerant guarantees for the modern cloud is difficult, as application developers must coordinate between independent stateful services and ephemeral compute and handle various failure-induced anomalies. We propose Composable ...
LightRW: FPGA Accelerated Graph Dynamic Random Walks
Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 1, Article No.: 90, Pages 1–27
https://doi.org/10.1145/3588944
Graph dynamic random walks (GDRWs) have recently emerged as a powerful paradigm for graph analytics and learning applications, including graph embedding and graph neural networks. Despite the fact that many existing studies optimize the performance of ...
- research-article, May 2023
Runtime Variation in Big Data Analytics
Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 1, Article No.: 67, Pages 1–20
https://doi.org/10.1145/3588921
The dynamic nature of resource allocation and runtime conditions on Cloud can result in high variability in a job's runtime across multiple iterations, leading to a poor experience. Identifying the sources of such variation and being able to predict and ...