- panel, June 2022
The DB Community vis-à-vis Environmental, Health, and Societal Grand Challenges: Innovation Engine, Plumber, or Bystander?
- Anastasia Ailamaki,
- Leilani Battle,
- Johannes Gehrke,
- Masaru Kitsuregawa,
- David Maier,
- Christopher Re,
- Meihui Zhang,
- Magdalena Balazinska
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data, pages 2498–2500. https://doi.org/10.1145/3514221.3528414
This panel considers the role of the database research community in addressing humanity's greatest challenges. Are we an innovation engine, tool providers, or are we standing on the side while other research communities take the lead?
- panel, June 2022
Publication Culture and Review Processes in the Data Management Community: An Open Discussion
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data, pages 2501–2502. https://doi.org/10.1145/3514221.3528413
The Data Management community has explored many options in recent years to improve our publication culture and review processes, ranging from innovative journal-conference hybrids that decouple publication from presentation, incorporating journal-style ...
- research-article, June 2022
X-SSD: A Storage System with Native Support for Database Logging and Replication
- Sangjin Lee,
- Alberto Lerner,
- André Ryser,
- Kibin Park,
- Chanyoung Jeon,
- Jinsub Park,
- Yong Ho Song,
- Philippe Cudré-Mauroux
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data, pages 988–1002. https://doi.org/10.1145/3514221.3526188
Transaction logging and log shipping are standard techniques to provide recoverability and high availability in data management systems. They entail an update to a local log file at every transaction and sending such an update to a remote site in a ...
- research-article, June 2022
ScaleStore: A Fast and Cost-Efficient Storage Engine using DRAM, NVMe, and RDMA
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data, pages 685–699. https://doi.org/10.1145/3514221.3526187
In this paper, we propose ScaleStore, a novel distributed storage engine that exploits DRAM caching, NVMe storage, and RDMA networking to achieve high performance, cost-efficiency, and scalability at the same time. Using low latency RDMA messages, ...
- research-article, June 2022
Materialization and Reuse Optimizations for Production Data Science Pipelines
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data, pages 1962–1976. https://doi.org/10.1145/3514221.3526186
Many companies and businesses train and deploy machine learning (ML) pipelines to answer prediction queries. In many applications, new training data continuously becomes available. A typical approach to ensure that ML models are up-to-date is to retrain ...
- research-article, June 2022
Efficient Evaluation of Arbitrarily-Framed Holistic SQL Aggregates and Window Functions
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data, pages 1243–1256. https://doi.org/10.1145/3514221.3526184
Window functions became part of the SQL standard in SQL:2003 and are widely used for data analytics: Percentiles, rankings, moving averages, running sums and local maxima are all expressed as window functions in SQL. Yet, the features offered by SQL's ...
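The framed aggregates listed in this abstract can be illustrated with a naive baseline: for each row, recompute the aggregate over the rows inside that row's frame, as in SQL's `ROWS BETWEEN n PRECEDING AND m FOLLOWING`. This is a minimal sketch under stated assumptions, not the paper's algorithm; the helper `framed_aggregate` and its parameters are hypothetical. A distributive aggregate such as `SUM` could be maintained incrementally, whereas a holistic one such as the median cannot be computed from a fixed-size summary, which is why rebuilding every frame becomes expensive for large frames.

```python
from statistics import median


def framed_aggregate(values, preceding, following, agg):
    """Hypothetical helper: naively evaluate `agg` over the frame
    ROWS BETWEEN `preceding` PRECEDING AND `following` FOLLOWING
    for every row. Each frame is rebuilt from scratch, so the cost
    grows with the frame size (not the paper's method)."""
    n = len(values)
    out = []
    for i in range(n):
        lo = max(0, i - preceding)          # clip the frame at the first row
        hi = min(n, i + following + 1)      # clip the frame at the last row
        out.append(agg(values[lo:hi]))
    return out


values = [3, 1, 4, 1, 5, 9, 2, 6]  # rows already sorted by the ORDER BY key

# Running sum: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
running_sum = framed_aggregate(values, len(values), 0, sum)
# Moving average over the current row and the two rows before it
moving_avg = framed_aggregate(values, 2, 0, lambda w: sum(w) / len(w))
# Holistic aggregate: median over one row on either side
moving_median = framed_aggregate(values, 1, 1, median)

print(running_sum)    # [3, 4, 8, 9, 14, 23, 25, 31]
print(moving_median)  # [2.0, 3, 1, 4, 5, 5, 6, 4.0]
```

Passing `len(values)` as `preceding` emulates `UNBOUNDED PRECEDING`, since the frame is clipped at the first row anyway.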
- research-article, June 2022
dCAM: Dimension-wise Class Activation Map for Explaining Multivariate Data Series Classification
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data, pages 1175–1189. https://doi.org/10.1145/3514221.3526183
Data series classification is an important and challenging problem in data science. Explaining the classification decisions by finding the discriminant parts of the input that led the algorithm to some decision is a real need in many applications. ...
- research-article, June 2022
T-LevelIndex: Towards Efficient Query Processing in Continuous Preference Space
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data, pages 2149–2162. https://doi.org/10.1145/3514221.3526182
Top-k related queries in continuous preference space (e.g., k-shortlist preference query kSPR, uncertain top-k query UTK, output-size specified utility-based query ORU) have numerous applications but are expensive to process. Existing algorithms process ...
- research-article, June 2022
Zeus: Efficiently Localizing Actions in Videos using Reinforcement Learning
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data, pages 545–558. https://doi.org/10.1145/3514221.3526181
Detection and localization of actions in videos is an important problem in practice. State-of-the-art video analytics systems are unable to efficiently and effectively answer such action queries because actions often involve a complex interaction ...
- research-article, June 2022
Warper: Efficiently Adapting Learned Cardinality Estimators to Data and Workload Drifts
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data, pages 1920–1933. https://doi.org/10.1145/3514221.3526179
Recent learned cardinality estimation (CE) models are vulnerable when query predicates or the underlying datasets drift from what the models were trained upon. We propose a system Warper that accelerates model adaptation to drifts; Warper generates ...
- research-article, June 2022
TxtAlign: Efficient Near-Duplicate Text Alignment Search via Bottom-k Sketches for Plagiarism Detection
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data, pages 1146–1159. https://doi.org/10.1145/3514221.3526178
In this paper, we study the near-duplicate text alignment search problem, which, given a collection of source (data) documents and a suspicious (query) document, finds all the near-duplicate passage pairs between the suspicious document and every source ...
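The title names bottom-k sketches. As a general illustration only (not TxtAlign's passage-alignment algorithm), a bottom-k sketch keeps the k smallest hash values of a document's set of word shingles, and the Jaccard similarity of two documents can be estimated from their sketches alone: take the k smallest hashes of the union of the two sketches and count the fraction that appear in both. The shingle width `w=3`, the BLAKE2b-based 64-bit hash, and all function names below are assumptions made for this sketch.

```python
import hashlib


def shingles(text, w=3):
    """Set of w-word shingles (w=3 is an illustrative choice)."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def h64(s):
    """Deterministic 64-bit hash of a string via BLAKE2b."""
    return int.from_bytes(hashlib.blake2b(s.encode(), digest_size=8).digest(), "big")


def bottom_k(items, k):
    """Bottom-k sketch: the k smallest distinct hash values of the set."""
    return sorted({h64(x) for x in items})[:k]


def estimate_jaccard(sketch_a, sketch_b, k):
    """Estimate |A ∩ B| / |A ∪ B| from two bottom-k sketches."""
    a, b = set(sketch_a), set(sketch_b)
    # Each of the k smallest hashes of A ∪ B is among the k smallest of
    # A or of B, so the union's bottom-k can be read off the two sketches.
    union_k = sorted(a | b)[:k]
    if not union_k:
        return 0.0  # convention for two empty sets
    in_both = sum(1 for v in union_k if v in a and v in b)
    return in_both / len(union_k)


doc_a = shingles("the quick brown fox jumps over the lazy dog")
doc_b = shingles("the quick brown fox leaps over the lazy dog")
est = estimate_jaccard(bottom_k(doc_a, 64), bottom_k(doc_b, 64), 64)
print(est)  # 0.4 (exact here, because k exceeds the 10 distinct shingles)
```

With a k smaller than the number of distinct shingles, the result becomes an estimate whose accuracy improves as k grows, while each sketch stays at most k values regardless of document length.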
- research-article, June 2022
TSUBASA: Climate Network Construction on Historical and Real-Time Data
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data, pages 286–295. https://doi.org/10.1145/3514221.3526177
A climate network represents the global climate system by the interactions of a set of anomaly time-series. Network science has been applied to climate data to study the dynamics of a climate network. The core task to enable network dynamics analysis on ...
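A common way to build such a network treats each location's anomaly time series as a node and adds an edge between two nodes when the absolute Pearson correlation of their series meets a threshold. The sketch below is a generic batch version under that assumption; it does not reproduce TSUBASA's handling of historical and real-time data, and the function names, the `threshold` parameter, and the toy series are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length series with non-zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def climate_network(series, threshold):
    """Hypothetical helper: nodes are locations, and an edge (i, j) is added
    when |corr(series[i], series[j])| >= threshold."""
    edges = set()
    for (i, x), (j, y) in combinations(series.items(), 2):
        if abs(pearson(x, y)) >= threshold:
            edges.add((i, j))
    return edges


# Toy anomaly series (made-up values, for illustration only).
series = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 10],
    "C": [2, 5, 1, 4, 3],
}
edges = climate_network(series, threshold=0.5)
print(edges)  # {('A', 'B')}
```

The all-pairs loop computes N(N-1)/2 correlations for N locations, so its cost grows quadratically with the number of nodes and must be repeated whenever the time window changes.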
- research-article, June 2022
Towards Dynamic and Safe Configuration Tuning for Cloud Databases
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data, pages 631–645. https://doi.org/10.1145/3514221.3526176
Configuration knobs of database systems are essential to achieve high throughput and low latency. Recently, automatic tuning systems using machine learning (ML) methods have been shown to find better configurations compared to experienced database ...
- research-article, June 2022
TimeUnion: An Efficient Architecture with Unified Data Model for Timeseries Management Systems on Hybrid Cloud Storage
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data, pages 1418–1432. https://doi.org/10.1145/3514221.3526175
Timeseries management systems have attracted considerable attention during the last decade with the rise of IoT and performance monitoring. With the rapidly increasing data scale in the production environment, deploying timeseries management systems on ...
- research-article, June 2022
Statistical Schema Learning with Occam's Razor
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data, pages 176–189. https://doi.org/10.1145/3514221.3526174
A judiciously normalized database schema can increase data interpretability, reduce data size, and improve data integrity. However, real world data sets are often stored or shared in a denormalized state. We examine the problem of automatically creating ...
- research-article, June 2022
Sommelier: Curating DNN Models for the Masses
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data, pages 1876–1890. https://doi.org/10.1145/3514221.3526173
Deep learning model repositories are indispensable in machine learning ecosystems today to facilitate model reuse. However, existing model repositories provide a bare-bones interface for model retrieval. The onus is on the user to profile and select from ...
- research-article, June 2022
Hybrid Deterministic and Nondeterministic Execution of Transactions in Actor Systems
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data, pages 65–78. https://doi.org/10.1145/3514221.3526172
The actor model has been widely adopted in building stateful middle-tiers for large-scale interactive applications, where ACID transactions are useful to ensure application correctness. In this paper, we present Snapper, a new transaction library on top ...
- research-article, June 2022
Skeena: Efficient and Consistent Cross-Engine Transactions
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data, pages 34–48. https://doi.org/10.1145/3514221.3526171
Database systems are becoming increasingly multi-engine. In particular, a main-memory database engine may coexist with a traditional storage-centric engine in a system to support various applications. It is desirable to allow applications to access data ...
SAM: Database Generation from Query Workloads with Supervised Autoregressive Models
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data, pages 1542–1555. https://doi.org/10.1145/3514221.3526168
With the prevalence of cloud databases, database users are increasingly reliant on the cloud database providers to manage their data. It becomes a challenge for cloud providers to benchmark different DBMS for a specific database instance without having ...
- research-article, June 2022
Proteus: A Self-Designing Range Filter
- Eric R. Knorr,
- Baptiste Lemaire,
- Andrew Lim,
- Siqiang Luo,
- Huanchen Zhang,
- Stratos Idreos,
- Michael Mitzenmacher
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data, pages 1670–1684. https://doi.org/10.1145/3514221.3526167
We introduce Proteus, a novel self-designing approximate range filter, which configures itself based on sampled data in order to optimize its false positive rate (FPR) for a given space requirement. Proteus unifies the probabilistic and deterministic ...