Technical Perspective on 'Better Differentially Private Approximate Histograms and Heavy Hitters using the Misra-Gries Sketch'
The topics of private data analysis and streaming data management have both been separately the focus of much study within the data management community for many years. However, more recently there have been studies which bring these two previously ...
Better Differentially Private Approximate Histograms and Heavy Hitters using the Misra-Gries Sketch
We consider the problem of computing differentially private approximate histograms and heavy hitters in a stream of elements. In the non-private setting, this is often done using the sketch of Misra and Gries [Science of Computer Programming, 1982]. Chan,...
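The Misra-Gries sketch mentioned in the abstract is a classic frequency-estimation algorithm. A minimal sketch of the non-private version (the function name and structure here are illustrative, not taken from the paper):

```python
def misra_gries(stream, k):
    """Misra-Gries sketch: track at most k candidate heavy hitters.

    Each returned count underestimates the true frequency of its item
    by at most len(stream) / (k + 1).
    """
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1          # item already tracked
        elif len(counters) < k:
            counters[x] = 1           # free slot: start tracking it
        else:
            # No free slot: decrement every counter and evict zeros.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

For example, on a stream of length 10 with `k = 2`, every item whose true frequency exceeds 10/3 is guaranteed to survive in the sketch. The paper's contribution concerns how to release such a sketch with differential privacy.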
Technical Perspective: Allocating Isolation Levels to Transactions in a Multiversion Setting
Among the ways a database management system adds value is the transaction abstraction, where the application coder can group together multiple data accesses that collectively perform one meaningful real-world activity. The platform will provide the "...
Allocating Isolation Levels to Transactions in a Multiversion Setting
A serializable concurrency control mechanism ensures consistency for OLTP systems at the expense of a reduced transaction throughput. A DBMS therefore usually offers the possibility to allocate lower isolation levels for some transactions when it is safe ...
Technical Perspective: From Binary Join to Free Join
Most queries access data from more than one relation, which makes joins between relations an extremely common operation. In many cases the execution time of a query is dominated by the processing of the involved joins. This observation has led to a wide ...
From Binary Join to Free Join
Over the last decade, worst-case optimal join (WCOJ) algorithms have emerged as a new paradigm for one of the most fundamental challenges in query processing: computing joins efficiently. Such an algorithm can be asymptotically faster than traditional ...
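The canonical example where a worst-case optimal join beats any binary join plan is the triangle query. A minimal generic-join sketch for listing directed triangles (names and structure are illustrative, not the paper's Free Join algorithm):

```python
from collections import defaultdict

def triangles(edges):
    """List triangles (x, y, z) with edges x->y, y->z, z->x.

    Generic (worst-case optimal) join: bind one variable at a time,
    intersecting candidate sets from every relation that mentions it,
    instead of materializing a pairwise intermediate result.
    """
    succ = defaultdict(set)   # succ[u] = {v : (u, v) in edges}
    pred = defaultdict(set)   # pred[v] = {u : (u, v) in edges}
    for u, v in edges:
        succ[u].add(v)
        pred[v].add(u)
    out = []
    for x in list(succ):                  # bind x
        for y in succ[x]:                 # bind y, extending (x, y)
            for z in succ[y] & pred[x]:   # bind z by set intersection
                out.append((x, y, z))
    return out
```

On skewed inputs, the intersection step avoids the quadratic intermediate result that a binary join plan would produce; this is the asymptotic advantage the abstract refers to.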
Technical Perspective: Efficient and Reusable Lazy Sampling
When interactively working with data, query latency is very important. In particular, when ad-hoc queries are written in an exploratory manner, it is essential to get feedback quickly in order to refine and correct the query based upon result values. This ...
Efficient and Reusable Lazy Sampling
Modern analytical engines rely on Approximate Query Processing (AQP) to provide faster response times than the hardware allows for exact query answering. However, existing AQP methods impose steep performance penalties as workload unpredictability ...
Technical Perspective: Unicorn: A Unified Multi-Tasking Matching Model
Data integration has been a long-standing challenge for data management. It has recently received significant attention due to at least three main reasons. First, many data science projects require integrating data from disparate sources before analysis ...
Unicorn: A Unified Multi-Tasking Matching Model
Data matching, which decides whether two data elements (e.g., string, tuple, column, or knowledge graph entity) are the "same" (a.k.a. a match), is a key concept in data integration. The widely used practice is to build task-specific or even dataset-...
Technical Perspective: Graph Theory for Data Privacy: A New Approach for Complex Data Flows
Nearly all of the world's population now uses online services that request personal information, covering almost every aspect of our lives. The abundance of personal data in digital form has brought incredible benefits to end users, enabling them to ...
Graph Theory for Consent Management: A New Approach for Complex Data Flows
Through legislation and technical advances users gain more control over how their data is processed, and they expect online services to respect their privacy choices and preferences. However, data may be processed for many different purposes by several ...
Technical Perspective: Synthetic Data Needs a Reproducibility Benchmark
Synthetic data is a vital substitute for real sensitive personal data in supporting social science research and policy studies. Extensive prior research has delved into various models for generating synthetic data, from traditional statistical approaches ...
Epistemic Parity: Reproducibility as an Evaluation Metric for Differential Privacy
- Lucas Rosenblatt,
- Bernease Herman,
- Anastasia Holovenko,
- Wonkwon Lee,
- Joshua Loftus,
- Elizabeth McKinnie,
- Taras Rumezhak,
- Andrii Stadnik,
- Bill Howe,
- Julia Stoyanovich
Differential privacy (DP) data synthesizers are increasingly proposed to afford public release of sensitive information, offering theoretical guarantees for privacy (and, in some cases, utility), but limited empirical evidence of utility in practical ...
Learning to Restructure Tables Automatically
By now, it is widely accepted folk wisdom that "half of the time in any data analysis project is spent wrangling the data". Analytic algorithms and tools, built on the mathematical foundations of matrices and relations, require their data to be lined up in ...
Auto-Tables: Relationalize Tables without Using Examples
Relational tables, where each row corresponds to an entity and each column corresponds to an attribute, have been the standard for tables in relational databases. However, such a standard cannot be taken for granted when dealing with tables "in the wild"...
Technical Perspective: A Fresh Look at Stream Computation through DSP Glasses
DBSP (Data Base Stream Processing) is a simple yet expressive language for stream computation that draws inspiration from DSP (Digital Signal Processing). In DBSP, stream computation is expressed using circuits of stream operators whose input and output ...
DBSP: Incremental Computation on Streams and Its Applications to Databases
We describe DBSP, a framework for incremental computation. Incremental computations repeatedly evaluate a function on some input values that are "changing". The goal of an efficient implementation is to "reuse" previously computed results. Ideally, when ...
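The core idea of incremental computation described here can be illustrated with a toy aggregate (the function below is an illustrative sketch of the reuse principle, not DBSP's actual operator algebra):

```python
def make_incremental_sum():
    """Toy incremental operator: instead of recomputing an aggregate
    over the whole input on every change, fold each delta (a batch of
    signed insertions and deletions) into the previous result."""
    state = {"total": 0}

    def apply_delta(delta):
        # delta maps value -> signed multiplicity:
        # +1 inserts a copy, -1 deletes one.
        for value, mult in delta.items():
            state["total"] += value * mult
        return state["total"]

    return apply_delta
```

Here the work per update is proportional to the size of the change, not the size of the accumulated input, which is the efficiency goal the abstract states.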