Abstract
Open data is increasingly demanded by data analysts, companies, and the general public. Yet, when databases to be publicly released contain information on individual respondents (e.g., responses to polls, census information, or healthcare records), they must be released in a way that preserves the privacy of these respondents: it should be de facto impossible to relate the published data to specific individuals. To achieve this goal, the Statistical Disclosure Control (SDC) discipline has proposed a plethora of privacy protection methods, known under a variety of names such as SDC methods, anonymization methods, or sanitization methods. This chapter provides an overview of the issues in database privacy, a survey of the best-known SDC methods, a discussion of the related data privacy/utility trade-offs, and a description of the privacy models proposed by the computer science community in recent years. Some relevant freeware packages are also identified.
Acknowledgments and Disclaimer
This work was partly supported by the Government of Catalonia under grant 2014 SGR 537, by the Spanish Government through projects TIN2011-27076-C03-01 “CO-PRIVACY” and TIN2014-57364-C2-R “SmartGlacis”, and by the European Commission under H2020 project “CLARUS”. J. Domingo-Ferrer is partially supported as an ICREA Acadèmia researcher by the Government of Catalonia. The authors are with the UNESCO Chair in Data Privacy, but they are solely responsible for the views expressed in this chapter, which do not necessarily reflect the position of UNESCO nor commit that organization.
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Domingo-Ferrer, J., Sánchez, D., Hajian, S. (2015). Database Privacy. In: Zeadally, S., Badra, M. (eds) Privacy in a Digital, Networked World. Computer Communications and Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-08470-1_2
DOI: https://doi.org/10.1007/978-3-319-08470-1_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08469-5
Online ISBN: 978-3-319-08470-1
eBook Packages: Computer Science; Computer Science (R0)