An algorithm for the microaggregation problem using column generation

Authors:

Claudio Gentile,

Enric Spagnolo-Arrizabalaga

Volume 144, Issue C

https://doi.org/10.1016/j.cor.2022.105817

Published: 01 August 2022

Abstract

The field of statistical disclosure control aims to reduce the risk of re-identifying an individual from disseminated data, a major concern among national statistical agencies. Operations Research (OR) techniques have been widely used in the past for protecting tabular data, but much less so for microdata (i.e., files of individuals and attributes). Few papers apply OR techniques to microaggregation, which is considered one of the best methods for microdata protection; the underlying optimization problem is known to be NP-hard.

The heuristic approach proposed here is based on a column generation scheme and, unlike previous (primal) heuristics for microaggregation, it also provides a lower bound on the optimal microaggregation objective value. Using real datasets commonly used in the literature, our computational results show, first, that solutions with small optimality gaps are often achieved and, second, that dramatic improvements are obtained relative to the most popular heuristics in the literature.
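To make the notions of feasible solution, lower bound, and gap concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes the usual formulation of microaggregation, in which records are partitioned into groups of at least k records and the objective is the within-group sum of squared errors (SSE); the abstract itself does not spell out the objective. The function names, the toy data, and the lower-bound value are hypothetical, and the gap is taken relative to the upper bound, which is only one of several common conventions.

```python
import numpy as np

def sse(X, groups):
    """Within-group sum of squared errors: squared distance of every
    record to the centroid of the group it belongs to."""
    total = 0.0
    for idx in groups:
        block = X[idx]
        total += float(((block - block.mean(axis=0)) ** 2).sum())
    return total

def is_feasible(n, groups, k):
    """A grouping is feasible if it partitions records 0..n-1 and
    every group holds at least k records."""
    flat = sorted(i for g in groups for i in g)
    return flat == list(range(n)) and all(len(g) >= k for g in groups)

def relative_gap(upper, lower):
    """Gap between a heuristic value (upper bound) and a lower bound,
    relative to the upper bound (assumed convention)."""
    return (upper - lower) / upper

# Toy data: four records with two attributes each (hypothetical).
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
groups = [[0, 1], [2, 3]]          # a candidate 2-anonymous grouping
upper = sse(X, groups)             # value of the feasible solution
lower = 3.0                        # hypothetical lower bound, e.g. from a relaxation
print(is_feasible(len(X), groups, k=2), upper, relative_gap(upper, lower))
```

A primal heuristic such as MDAV or V-MDAV only produces the grouping and hence the upper value; a method that also supplies a lower bound lets the gap certify how far the grouping can be from optimal.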

Highlights

• Microaggregation is a Statistical Disclosure Control technique to protect microdata.

• An algorithm for computing a feasible solution for microaggregation is proposed.

• The proposed algorithm improves solution quality relative to the MDAV and V-MDAV algorithms.

Cited By

Castro J., Gentile C., Spagnolo-Arrizabalaga E. (2022), An Optimization-Based Decomposition Heuristic for the Microaggregation Problem, Privacy in Statistical Databases, pp. 3–14, doi:10.1007/978-3-031-13945-1_1. Online publication date: 21 Sep 2022.
https://dl.acm.org/doi/10.1007/978-3-031-13945-1_1

Published In

Computers and Operations Research, Volume 144, Issue C

Aug 2022

498 pages

ISSN: 0305-0548

Elsevier Ltd.

Publisher

Elsevier Science Ltd.

United Kingdom
