Differentially private histogram publication

  • Regular Paper
  • The VLDB Journal

Abstract

Differential privacy (DP) is a promising scheme for releasing the results of statistical queries on sensitive data, with strong privacy guarantees against adversaries with arbitrary background knowledge. Existing studies on differential privacy mostly focus on simple aggregations such as counts. This paper investigates the publication of DP-compliant histograms, an important analytical tool for showing the distribution of a random variable, e.g., hospital bill size for certain patients. Compared to simple aggregations whose results are purely numerical, a histogram query is inherently more complex, since it must also determine its structure, i.e., the ranges of the bins. As we demonstrate in the paper, a DP-compliant histogram with finer bins may actually lead to significantly lower accuracy than a coarser one, since the former requires stronger perturbations in order to satisfy DP. Moreover, the histogram structure itself may reveal sensitive information, which further complicates the problem. Motivated by this, we propose two novel mechanisms, namely NoiseFirst and StructureFirst, for computing DP-compliant histograms. Their main difference lies in the relative order of the noise injection and the histogram structure computation steps. NoiseFirst has the additional benefit that it can improve the accuracy of an already published DP-compliant histogram computed using a naive method. For each of the proposed mechanisms, we design algorithms for computing the optimal histogram structure with two different objectives: minimizing the mean square error and minimizing the mean absolute error. Going one step further, we extend both mechanisms to answer arbitrary range queries. Extensive experiments, using several real datasets, confirm that our two proposals output highly accurate query answers and consistently outperform existing competitors.
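The trade-off between fine and coarse bins described above can be illustrated with a minimal Python sketch. It is not the paper's NoiseFirst or StructureFirst algorithm: it adds Laplace noise to unit-bin counts (assuming sensitivity 1) and then averages fixed equal-width groups of noisy counts, whereas the paper computes an optimal structure. The data, the privacy budget `epsilon`, and all function names are hypothetical choices made for illustration.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_counts(counts, epsilon, rng, sensitivity=1.0):
    # Laplace mechanism: perturb every bin with noise of scale sensitivity/epsilon.
    scale = sensitivity / epsilon
    return [c + laplace_noise(scale, rng) for c in counts]

def merge_equal_width(noisy, width):
    # Post-process the noisy counts only: replace each run of `width`
    # adjacent bins by the mean of their noisy counts.
    merged = []
    for start in range(0, len(noisy), width):
        group = noisy[start:start + width]
        mean = sum(group) / len(group)
        merged.extend([mean] * len(group))
    return merged

def mse(estimate, truth):
    return sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth)

rng = random.Random(0)
true_counts = [50] * 8 + [10] * 8   # hypothetical data: two flat regions
epsilon = 0.1                        # hypothetical privacy budget
trials = 2000

fine_total = coarse_total = 0.0
for _ in range(trials):
    noisy = noisy_counts(true_counts, epsilon, rng)
    fine_total += mse(noisy, true_counts)
    coarse_total += mse(merge_equal_width(noisy, 8), true_counts)

fine_avg = fine_total / trials      # error of the fine-grained noisy histogram
coarse_avg = coarse_total / trials  # error after merging into two coarse bins
print(f"fine bins MSE:   {fine_avg:.1f}")
print(f"coarse bins MSE: {coarse_avg:.1f}")
```

Because the merge step reads only the noisy counts, it is post-processing and consumes no additional privacy budget. On data whose true counts vary within a group, averaging would also introduce approximation error, which is why the choice of structure matters.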

Notes

  1. http://www.ncbi.nlm.nih.gov/gap.

  2. http://www.moh.gov.sg/content/moh_web/home/statistics.html.

  3. An alternative definition of sensitivity [9] concerns the maximum change in the query results after modifying a record in the database. In our example, this leads to \(\Delta =2\), since in the worst case, changing a person’s age can affect the values in two different bins by 1 each.
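The arithmetic behind Note 3 can be checked with a small Python sketch. The ages and bin edges are hypothetical, since the paper's own example is not reproduced here. The sketch compares the L1 change of a histogram when one record is removed with the change when one record is modified. The contrast with an add-or-remove definition is an assumption about what the note treats as the default.

```python
def histogram(ages, edges):
    # Count how many ages fall into each half-open bin [edges[i], edges[i+1]).
    counts = [0] * (len(edges) - 1)
    for age in ages:
        for i in range(len(counts)):
            if edges[i] <= age < edges[i + 1]:
                counts[i] += 1
                break
    return counts

def l1_distance(a, b):
    # Total absolute change across all bins.
    return sum(abs(x - y) for x, y in zip(a, b))

edges = [0, 20, 40, 60, 80]          # hypothetical bin boundaries
ages = [15, 25, 33, 47, 62]          # hypothetical records

base = histogram(ages, edges)
removed = histogram(ages[:-1], edges)            # drop the record with age 62
modified = histogram(ages[:-1] + [18], edges)    # change age 62 to age 18

delta_remove = l1_distance(base, removed)    # one bin changes by 1
delta_modify = l1_distance(base, modified)   # two bins change by 1 each
print(delta_remove, delta_modify)
```

Modifying a record moves it out of one bin and into another, so both bins change by 1, which matches the \(\Delta =2\) stated in the note.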

References

  1. Barak, B., Chaudhuri, K., Dwork, C., Kale, S., McSherry, F., Talwar, K.: Privacy, accuracy, and consistency too: a holistic solution to contingency table release. In: PODS, pp. 273–282 (2007)

  2. Bhaskar, R., Laxman, S., Smith, A., Thakurta, A.: Discovering frequent patterns in sensitive data. In: KDD, pp. 503–512 (2010)

  3. Blum, A., Ligett, K., Roth, A.: A learning theory approach to non-interactive database privacy. In: STOC, pp. 609–618 (2008)

  4. Cormen, T.H., Leiserson, C.E., Rivest, R.L.: Introduction to Algorithms, 2nd edn., pp. 185–192. MIT Press and McGraw-Hill, New York (2001)

  5. Cormode, G., Procopiuc, C.M., Srivastava, D., Tran, T.T.L.: Differentially private publication of sparse data. In: ICDT (2012)

  6. Cormode, G., Procopiuc, M., Shen, E., Srivastava, D., Yu, T.: Differentially private spatial decompositions. In: ICDE (2012)

  7. Ding, B., Winslett, M., Han, J., Li, Z.: Differentially private data cubes: optimizing noise sources and consistency. In: SIGMOD, pp. 217–228 (2011)

  8. Dwork, C.: Differential privacy: a survey of results. In: TAMC, pp. 1–19 (2008)

  9. Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: TCC, pp. 265–284 (2006)

  10. Dwork, C., McSherry, F., Talwar, K.: The price of privacy and the limits of LP decoding. In: STOC, pp. 85–94 (2007)

  11. Dwork, C., Rothblum, G.N., Vadhan, S.P.: Boosting and differential privacy. In: FOCS, pp. 51–60 (2010)

  12. Friedman, A., Schuster, A.: Data mining with differential privacy. In: KDD, pp. 493–502 (2010)

  13. Götz, M., Machanavajjhala, A., Wang, G., Xiao, X., Gehrke, J.: Publishing search logs—a comparative study of privacy guarantees. IEEE TKDE 24(3), 520–532 (2012)

  14. Guha, S., Koudas, N., Shim, K.: Approximation and streaming algorithms for histogram construction problems. ACM TODS 31(1), 396–438 (2006)

  15. Hay, M., Rastogi, V., Miklau, G., Suciu, D.: Boosting the accuracy of differentially private histograms through consistency. PVLDB 3(1), 1021–1032 (2010)

  16. Homer, N., Szelinger, S., Redman, M., Duggan, D., Tembe, W., Muehling, J., Pearson, J.V., Stephan, D.A., Nelson, S.F., Craig, D.W.: Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet. 4(8), e1000167 (2008)

  17. Jagadish, H.V., Koudas, N., Muthukrishnan, S., Poosala, V., Sevcik, K.C., Suel, T.: Optimal histograms with quality guarantees. In: VLDB, pp. 275–286 (1998)

  18. Jagadish, H.V., Koudas, N., Muthukrishnan, S., Poosala, V., Sevcik, K.C., Suel, T.: Optimal histograms with quality guarantees. In: VLDB, pp. 275–286 (1998)

  19. Korolova, A., Kenthapadi, K., Mishra, N., Ntoulas, A.: Releasing search queries and clicks privately. In: WWW, pp. 171–180 (2009)

  20. Kotz, S., Kozubowski, T., Podgórski, K.: The Laplace Distribution and Generalizations: A Revisit with Applications to Communications, Economics, Engineering, and Finance. Birkhäuser Publication, Boston (2001)

  21. Li, C., Hay, M., Rastogi, V., Miklau, G., McGregor, A.: Optimizing linear counting queries under differential privacy. In: PODS, pp. 123–134 (2010)

  22. Li, C., Miklau, G.: An adaptive mechanism for accurate query answering under differential privacy. PVLDB 5(6), 514–525 (2012)

  23. McSherry, F., Mahajan, R.: Differentially-private network trace analysis. In: SIGCOMM, pp. 123–134 (2010)

  24. Mohan, P., Thakurta, A., Shi, E., Song, D., Culler, D.E.: GUPT: privacy preserving data analysis made easy. In: SIGMOD, pp. 349–360 (2012)

  25. Rastogi, V., Nath, S.: Differentially private aggregation of distributed time-series with transformation and encryption. In: SIGMOD, pp. 735–746 (2010)

  26. Wang, R., Li, Y., Wang, X., Tang, H., Zhou, X.: Learning your identity and disease from research papers: Information leaks in genome wide association study. In: ACM CCS (2009)

  27. Xiao, X., Bender, G., Hay, M., Gehrke, J.: iReduct: differential privacy with reduced relative errors. In: SIGMOD, pp. 229–240 (2011)

  28. Xiao, X., Wang, G., Gehrke, J.: Differential privacy via wavelet transforms. In: ICDE, pp. 225–236 (2010)

  29. Xiao, Y., Xiong, L., Yuan, C.: Differentially private data release through multidimensional partitioning. In: Secure Data Management, pp. 150–168 (2010)

  30. Yuan, G., Zhang, Z., Winslett, M., Xiao, X., Yang, Y., Hao, Z.: Low-rank mechanism: optimizing batch queries under differential privacy. PVLDB 5(11), 1352–1363 (2012)

  31. Zhang, J., Zhang, Z., Xiao, X., Yang, Y., Winslett, M.: Functional mechanism: regression analysis under differential privacy. PVLDB 5(11), 1364–1375 (2012)

Acknowledgments

Jia Xu and Ge Yu are supported by the National Basic Research Program of China (973) under Grant 2012CB316201, the National Natural Science Foundation of China (with Nos. 61033007 and 61003058), and the Fundamental Research Funds for the Central Universities (with No. N100704001). Zhenjie Zhang and Yin Yang are supported by SERC Grant No. 102 158 0074 from Singapore’s A*STAR. Xiaokui Xiao is supported by Nanyang Technological University under SUG Grant M58020016 and AcRF Tier 1 Grant RG 35/09, and by the Agency for Science, Technology and Research (Singapore) under SERC Grant 1021580074.

Author information

Corresponding author

Correspondence to Ge Yu.

About this article

Cite this article

Xu, J., Zhang, Z., Xiao, X. et al. Differentially private histogram publication. The VLDB Journal 22, 797–822 (2013). https://doi.org/10.1007/s00778-013-0309-y

  • DOI: https://doi.org/10.1007/s00778-013-0309-y

Keywords