Abstract
Data holders, such as statistical institutions and financial organizations, face a serious and demanding task when producing data for official and public use: they must control the risk of identity disclosure and protect sensitive information when they exchange data-sets among themselves, with governmental agencies, and with the public. One of the techniques applied is micro-aggregation. In a Bayesian setting, micro-aggregation can be viewed as the optimal partitioning of the original data-set based on the minimization of an appropriate measure of discrepancy, or distance, between two posterior distributions, one of which is conditional on the original data-set and the other conditional on the aggregated data-set. Assuming d-variate normal data-sets and using several measures of discrepancy, it is shown that the asymptotically optimal equal probability m-partition of \( \mathbb{R}^{d} \), with \( m^{1/d} \in \mathbb{N} \), is the convex one provided by hypercubes whose sides are formed by hyperplanes perpendicular to the canonical axes, regardless of which discrepancy measure is used. On the basis of this result, a method that produces a sub-optimal partition at very small computational cost is presented.
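To make the geometry of such a partition concrete, the following is a minimal sketch and not the authors' method. It assumes the data have already been standardized so that the coordinates are uncorrelated standard normal variables, in which case cutting each axis at the standard normal quantiles j/k, with k = m^{1/d}, gives m axis-aligned cells of equal probability 1/m. The function names (`axis_cut_points`, `cell_index`, `micro_aggregate`) are hypothetical, and replacing each record by its cell mean is an assumed aggregation convention that the abstract does not specify.

```python
from bisect import bisect_right
from statistics import NormalDist


def axis_cut_points(k):
    """Return the k-1 interior cut points that split N(0, 1) into
    k intervals of equal probability 1/k (assumed standardized data)."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf(j / k) for j in range(1, k)]


def cell_index(point, cuts):
    """Map a d-dimensional point to a tuple of per-axis interval indices.
    Each cell is bounded by hyperplanes perpendicular to the axes."""
    return tuple(bisect_right(cuts, x) for x in point)


def micro_aggregate(points, k):
    """Group points by grid cell and replace each point by the mean of
    its cell (an assumed aggregation rule, used here for illustration)."""
    cuts = axis_cut_points(k)
    cells = {}
    for p in points:
        cells.setdefault(cell_index(p, cuts), []).append(p)
    means = {
        idx: tuple(sum(coord) / len(members) for coord in zip(*members))
        for idx, members in cells.items()
    }
    return [means[cell_index(p, cuts)] for p in points]
```

With d = 2 and k = 2, for example, the plane is split into m = 4 quadrants, and `micro_aggregate([(-1.0, -1.0), (-3.0, -1.0), (1.0, 2.0)], 2)` places the first two records in the same cell and replaces both by their mean. Because only d sets of k - 1 quantiles are computed, the cost of building the grid does not grow with m, which is the kind of saving an axis-aligned partition makes possible.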
Cite this article
Kokolakis, G., Fouskakis, D. On the Discrepancy Measures for the Optimal Equal Probability Partitioning in Bayesian Multivariate Micro-Aggregation. J Classif 25, 209–224 (2008). https://doi.org/10.1007/s00357-008-9014-8
DOI: https://doi.org/10.1007/s00357-008-9014-8