DOI: 10.1007/978-3-031-13945-1_1
Article

An Optimization-Based Decomposition Heuristic for the Microaggregation Problem

Published: 21 September 2022

Abstract

Given a set of points, the microaggregation problem aims to find a clustering with a minimum sum of squared errors (SSE), where the cardinality of each cluster is greater than or equal to k. Points in each cluster are replaced by the cluster centroid, thus satisfying k-anonymity. Microaggregation is considered one of the most effective techniques for numerical microdata protection. Traditionally, non-optimal solutions to the microaggregation problem are obtained by heuristic approaches. Recently, the authors of this paper presented a mixed integer linear optimization (MILO) approach based on column generation for computing tight solutions and lower bounds to the microaggregation problem. However, MILO can be computationally expensive for large datasets. In this work we present a new heuristic that combines three blocks: (1) a decomposition of the dataset into subsets, (2) the MILO column generation algorithm applied to each subset in order to obtain a valid microaggregation, and (3) a local search improvement algorithm to get the final clustering. Preliminary computational results show that this approach was able to provide (and even improve upon) some of the best solutions (i.e., of smallest SSE) reported in the literature for the Tarragona and Census datasets, and k ∈ {3, 5, 10}.
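To make the objective and the cardinality constraint concrete, the following is a minimal sketch in Python. It is not the authors' heuristic: it only evaluates a given clustering by checking that every cluster has at least k points, replacing each point with its cluster centroid, and computing the resulting SSE. The function name `microaggregate` and the toy data are assumptions made for illustration.

```python
from collections import Counter


def microaggregate(points, labels, k):
    """Replace each point by its cluster centroid and return (masked, sse).

    `labels[i]` is the cluster of `points[i]`. Raises ValueError when some
    cluster has fewer than k points, i.e. the clustering is not a valid
    microaggregation for this k.
    """
    sizes = Counter(labels)
    if min(sizes.values()) < k:
        raise ValueError("every cluster needs at least k points")

    dim = len(points[0])
    # Accumulate coordinate sums per cluster, then divide by the cluster size.
    sums = {c: [0.0] * dim for c in sizes}
    for p, c in zip(points, labels):
        for j in range(dim):
            sums[c][j] += p[j]
    centroids = {c: [s / sizes[c] for s in sums[c]] for c in sizes}

    # The masked record of every point is the centroid of its cluster.
    masked = [centroids[c] for c in labels]
    # SSE: squared Euclidean distance from each point to its centroid, summed.
    sse = sum(
        (p[j] - m[j]) ** 2
        for p, m in zip(points, masked)
        for j in range(dim)
    )
    return masked, sse


# Hypothetical toy data: two groups of three points each, with k = 3.
points = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0),
          (10.0, 10.0), (12.0, 10.0), (11.0, 13.0)]
labels = [0, 0, 0, 1, 1, 1]
masked, sse = microaggregate(points, labels, k=3)
print(sse)  # 16.0
```

In this toy case the centroids are (1, 1) and (11, 11), and each cluster contributes 8 to the SSE. A heuristic such as the one described in the abstract searches over the assignment `labels` to make this value as small as possible while keeping every cluster size at least k.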

Published In

Privacy in Statistical Databases: International Conference, PSD 2022, Paris, France, September 21–23, 2022, Proceedings
Sep 2022
374 pages
ISBN:978-3-031-13944-4
DOI:10.1007/978-3-031-13945-1

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. Statistical disclosure control
  2. Microdata
  3. Microaggregation problem
  4. Mixed integer linear optimization
  5. Column generation
  6. Local search
  7. Heuristics
