DOI: 10.1145/3662165.3662762
Research Article · Open Access

Performance Truthfulness of Differential Privacy for DB Testing

Published: 09 June 2024

Abstract

Performance benchmarking is a crucial tool to evaluate whether performance degradation occurs when a system is modified, allowing cloud providers to plan the provisioning of and changes to their resources. This is most commonly realized through the execution of standardized industry benchmarks. However, these benchmarks oftentimes do not capture all facets of actual customer workloads. To mitigate this issue, one can use an anonymized version of the customer's workload through algorithms such as hashing, k-anonymity, or differential privacy, allowing for targeted and customized performance benchmarking. In this paper, we focus on one of these techniques, namely differential privacy, and examine its impact on benchmarking performance, i.e., whether differential privacy can maintain the same performance characteristics as the original workload. We discuss several challenges specific to differential privacy algorithms such as the handling of unique column values, queries that span multiple tables, and continuous values, as well as how effectively it can be deployed. Our examination shows that differential privacy is a promising technique in this space but has practical limitations such as scaling problems and, for some queries, a distortion of the performance characteristics.
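
As a rough illustration of the anonymization step the abstract describes (not the authors' actual algorithm), the sketch below applies the classic Laplace mechanism to a single numeric column: it releases a noisy histogram of the original data and resamples a synthetic column of the same size from it, which could then be loaded into a test database for benchmarking. The column contents, bin count, and privacy budget epsilon are illustrative assumptions.

# Minimal sketch of DP data release for benchmarking: noisy-histogram resampling
# of one numeric column. Not the paper's implementation; parameters are illustrative.
import numpy as np

def dp_synthetic_column(values, epsilon=1.0, bins=50, rng=None):
    """Return a synthetic column of the same length as `values`.

    Builds a histogram of the original data, perturbs each bin count with
    Laplace noise of scale 1/epsilon (each individual changes one count by at
    most 1, so sensitivity is 1), clips negative counts, and resamples values
    uniformly within the sampled bins.
    """
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values, dtype=float)

    counts, edges = np.histogram(values, bins=bins)
    noisy = counts + rng.laplace(loc=0.0, scale=1.0 / epsilon, size=counts.shape)
    noisy = np.clip(noisy, 0.0, None)

    if noisy.sum() == 0:
        # Degenerate case: fall back to a uniform distribution over bins.
        probs = np.full(len(noisy), 1.0 / len(noisy))
    else:
        probs = noisy / noisy.sum()

    # Sample bins according to the noisy histogram, then a point inside each bin.
    bin_idx = rng.choice(len(counts), size=len(values), p=probs)
    return rng.uniform(edges[bin_idx], edges[bin_idx + 1])

# Example: a skewed "order amount" column whose distribution (and hence the
# performance characteristics of queries over it) we hope to preserve roughly.
original = np.random.default_rng(0).exponential(scale=100.0, size=10_000)
synthetic = dp_synthetic_column(original, epsilon=0.5)
print(original.mean(), synthetic.mean())

The paper's question is whether queries executed against data anonymized in this spirit retain the original workload's performance characteristics; a histogram release like the above covers only a single continuous column, and the abstract notes that unique column values and queries spanning multiple tables are precisely where this becomes difficult.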

Published In

DBTest '24: Proceedings of the Tenth International Workshop on Testing Database Systems
June 2024
45 pages
ISBN:9798400706691
DOI:10.1145/3662165
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. DB testing
  2. differential privacy
  3. performance benchmarking

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SIGMOD/PODS '24

Acceptance Rates

Overall Acceptance Rate: 31 of 56 submissions, 55%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 40
  • Downloads (Last 12 months): 40
  • Downloads (Last 6 weeks): 28

Reflects downloads up to 30 Aug 2024
