On the Discrepancy Measures for the Optimal Equal Probability Partitioning in Bayesian Multivariate Micro-Aggregation

Journal of Classification

Abstract

Data holders, such as statistical institutions and financial organizations, face a serious and demanding task when producing data for official and public use: they must control the risk of identity disclosure and protect sensitive information when they share data-sets among themselves, with governmental agencies and with the public. One technique applied for this purpose is micro-aggregation. In a Bayesian setting, micro-aggregation can be viewed as the optimal partitioning of the original data-set based on the minimization of an appropriate measure of discrepancy, or distance, between two posterior distributions, one conditional on the original data-set and the other conditional on the aggregated data-set. Assuming d-variate normal data-sets and using several measures of discrepancy, it is shown that the asymptotically optimal equal probability m-partition of \( \mathbb{R}^{d} \), with \( m^{1/d} \in \mathbb{N} \), is the convex partition formed by hypercubes whose sides are hyperplanes perpendicular to the canonical axes, regardless of which discrepancy measure is used. Building on this result, a method that produces a sub-optimal partition at very small computational cost is presented.
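The hypercube partition described in the abstract can be illustrated with a short sketch. It is not the authors' sub-optimal method, and all function names are hypothetical. It assumes (a simplification, not taken from the paper) that the coordinates are independent normals, so each one is standardized with its own sample mean and standard deviation; correlated data would first need a whitening transform. With \( k = m^{1/d} \) equal-probability intervals per axis, cut at \( \Phi^{-1}(j/k) \), each hypercube cell has probability \( (1/k)^d = 1/m \) under the fitted normal. Each record is then replaced by the mean of the records in its cell, giving a micro-aggregated data-set.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import NormalDist, fmean, pstdev


def cells_per_axis(m: int, d: int) -> int:
    """Return k = m**(1/d), requiring it to be a natural number (k**d == m)."""
    k = round(m ** (1.0 / d))
    if k < 1 or k ** d != m:
        raise ValueError(f"m={m} is not a perfect {d}-th power")
    return k


def axis_cutpoints(k: int) -> list[float]:
    """Interior cut points Phi^{-1}(j/k), j = 1..k-1, which split N(0, 1)
    into k intervals of probability 1/k each."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf(j / k) for j in range(1, k)]


def hypercube_labels(data: list[tuple[float, ...]], m: int) -> list[tuple[int, ...]]:
    """Assign each d-variate record to one of m axis-aligned hypercube cells.

    ASSUMPTION: coordinates are treated as independent normals, so each one is
    standardized with its own sample mean and standard deviation (correlated
    data would need a full whitening transform first).
    """
    d = len(data[0])
    cuts = axis_cutpoints(cells_per_axis(m, d))
    columns = list(zip(*data))
    means = [fmean(col) for col in columns]
    sds = [pstdev(col) for col in columns]
    if any(s == 0 for s in sds):
        raise ValueError("a coordinate has zero variance")
    # A cell label is the tuple of per-axis interval indices of the standardized record.
    return [
        tuple(bisect_right(cuts, (x - mu) / s) for x, mu, s in zip(row, means, sds))
        for row in data
    ]


def micro_aggregate(data, labels):
    """Replace every record by the mean of the records sharing its cell."""
    groups = defaultdict(list)
    for row, label in zip(data, labels):
        groups[label].append(row)
    centroids = {
        label: tuple(fmean(col) for col in zip(*rows))
        for label, rows in groups.items()
    }
    return [centroids[label] for label in labels]


if __name__ == "__main__":
    data = [(-2.0, -2.0), (-1.0, -1.0), (1.0, 1.0), (2.0, 2.0)]
    labels = hypercube_labels(data, m=4)  # d = 2, so k = 2 cells per axis
    print(labels)                         # [(0, 0), (0, 0), (1, 1), (1, 1)]
    print(micro_aggregate(data, labels))  # [(-1.5, -1.5), (-1.5, -1.5), (1.5, 1.5), (1.5, 1.5)]
```

Because the cells are products of one-dimensional quantile intervals, labelling a record costs only \( d \) binary searches over \( k-1 \) cut points. This keeps the sketch cheap; it does not reproduce the paper's discrepancy-based optimization.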

Author information

Correspondence to George Kokolakis.

Additional information

Published online xx, xx, xxxx.

About this article

Cite this article

Kokolakis, G., Fouskakis, D. On the Discrepancy Measures for the Optimal Equal Probability Partitioning in Bayesian Multivariate Micro-Aggregation. J Classif 25, 209–224 (2008). https://doi.org/10.1007/s00357-008-9014-8

  • DOI: https://doi.org/10.1007/s00357-008-9014-8

Keywords