DOI: 10.1145/3605098.3636070, SAC Conference Proceedings
research-article

DiApprox: Differential Privacy-based Online Range Queries Approximation for Multidimensional Data

Published: 21 May 2024
    Abstract

    With the increasing use of software services in daily life, the data collected by service providers is massive and sensitive. Although current big data analytics frameworks provide enormous data processing capacity, quickly obtaining appropriate, private responses to large-scale queries without revealing sensitive information remains a challenging problem. Approximate Query Processing (AQP) achieves faster execution at the cost of a reasonable loss of accuracy, and Differential Privacy (DP) is a popular way to enforce privacy by adding noise to query answers. In this paper, we address the problem of combining AQP and DP for range queries over multidimensional data. We present DiApprox, a private approximation system that uses online sampling to accelerate the execution of range queries and minimizes the noise that must be injected into the samples and query results to preserve data privacy. Through empirical evaluation, we show that DiApprox approximates aggregations on large datasets more than 21× faster than exact execution, with high accuracy.
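    The abstract combines two ingredients: sampling-based approximation (AQP) and noise injection (DP). The Python sketch below shows one simple way they can be composed for a COUNT range query over multidimensional points. It is not DiApprox's algorithm, which the abstract does not specify. It assumes simple random sampling without replacement, a public table size N, neighbouring datasets that differ by replacing one record (so the sample count has sensitivity 1), the standard Laplace mechanism applied to the sample count, and a scale-up by N/n. The names `in_range`, `laplace`, and `dp_approx_range_count` are hypothetical.

```python
import random
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]


def in_range(p: Point, lo: Sequence[float], hi: Sequence[float]) -> bool:
    """True if p lies inside the closed box [lo, hi] on every dimension."""
    return all(l <= x <= h for x, l, h in zip(p, lo, hi))


def laplace(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) noise: the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_approx_range_count(
    data: Sequence[Point],
    lo: Sequence[float],
    hi: Sequence[float],
    sample_size: int,
    epsilon: float,
    seed: Optional[int] = None,
) -> float:
    """Noisy estimate of COUNT(*) over the box [lo, hi] from a uniform sample.

    Assumption (not from the paper): len(data) is public and neighbouring
    datasets differ by replacing one record, so the sample count has sensitivity 1.
    """
    if epsilon <= 0 or not 0 < sample_size <= len(data):
        raise ValueError("need epsilon > 0 and 0 < sample_size <= len(data)")
    rng = random.Random(seed)
    # AQP step: simple random sample without replacement.
    sample = rng.sample(data, sample_size)
    hits = sum(1 for p in sample if in_range(p, lo, hi))
    # DP step: Laplace mechanism on the sample count (scale = sensitivity / epsilon).
    noisy_hits = hits + laplace(1.0 / epsilon, rng)
    # Scale the noisy sample count up to the full table by N / n.
    return noisy_hits * len(data) / sample_size


if __name__ == "__main__":
    # 10,000 points on a 100 x 100 grid; the box below holds exactly 2,500 of them.
    grid = [(float(x), float(y)) for x in range(100) for y in range(100)]
    est = dp_approx_range_count(grid, lo=(0, 0), hi=(49, 49),
                                sample_size=1000, epsilon=1.0, seed=7)
    print(f"noisy approximate count: {est:.1f} (exact: 2500)")
```

    Because the noise is added to the sample count before scaling, it is multiplied by N/n along with the sampling error, so in this sketch a smaller sample is faster to scan but inflates both error sources.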

    Information

    Published In

    ACM Conferences
    SAC '24: Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing
    April 2024
    1898 pages
    ISBN:9798400702433
    DOI:10.1145/3605098
    Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. approximate queries
    2. differential privacy
    3. sampling
    4. OLAP

    Conference

    SAC '24

    Acceptance Rates

    Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

    Bibliometrics

    • Total citations: 0
    • Total downloads: 12
    • Downloads (last 12 months): 12
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 09 Aug 2024