Abstract
Privacy-preserving data splitting is a technique that aims to protect data privacy by storing different fragments of the data in different locations. In this work we give a new combinatorial formulation of the data splitting problem. We view data splitting as a purely combinatorial problem, in which data attributes must be split into fragments in a way that satisfies certain combinatorial properties derived from processing and privacy constraints. Using this formulation, we develop new combinatorial and algebraic techniques to obtain solutions to the data splitting problem. We present an algebraic method that builds an optimal data splitting solution by using Gröbner bases. Since this method is not efficient in general, we also develop a greedy algorithm that finds solutions that are not necessarily of minimal size.
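To make the combinatorial view concrete, the following is a minimal first-fit sketch in Python. It is not the greedy algorithm of the paper, whose details the abstract does not give. It rests on an assumed reading of the constraints: a privacy constraint is a set of attributes that must never appear together in a single fragment, a processing constraint is a set of attributes that must appear together in at least one fragment, and fragments may overlap. The function name `greedy_split` and the example attributes are hypothetical.

```python
from typing import Iterable, List, Optional, Set


def violates(fragment: Set[str], privacy: List[frozenset]) -> bool:
    """Return True if the fragment fully contains some privacy constraint."""
    return any(p <= fragment for p in privacy)


def greedy_split(
    attributes: Iterable[str],
    privacy: Iterable[Iterable[str]],
    processing: Iterable[Iterable[str]],
) -> Optional[List[Set[str]]]:
    """First-fit heuristic (illustrative assumption, not the paper's method).

    Returns a list of fragments, or None if some required block already
    contains a privacy constraint and so cannot be stored anywhere.
    """
    privacy_sets = [frozenset(p) for p in privacy]
    # Blocks that must be covered: every processing set, then every attribute.
    required = [frozenset(b) for b in processing]
    required += [frozenset([a]) for a in attributes]
    # Place larger blocks first; sorted() is stable, so ties keep input order.
    required = sorted(required, key=len, reverse=True)

    fragments: List[Set[str]] = []
    for block in required:
        if any(block <= f for f in fragments):
            continue  # already stored together in some fragment
        if violates(set(block), privacy_sets):
            return None  # the block itself breaks a privacy constraint
        for f in fragments:
            if not violates(f | block, privacy_sets):
                f |= block  # first fragment that can absorb the block
                break
        else:
            fragments.append(set(block))  # no fit: open a new fragment
    return fragments


# Hypothetical example data, chosen only for illustration.
attributes = ["name", "zip", "birth", "diagnosis"]
privacy = [{"name", "diagnosis"}, {"zip", "birth", "diagnosis"}]
processing = [{"zip", "diagnosis"}, {"name", "zip"}]

fragments = greedy_split(attributes, privacy, processing)
print([sorted(f) for f in fragments])
```

As with the greedy approach mentioned in the abstract, a heuristic of this kind trades optimality for speed: the number of fragments it opens depends on the order in which blocks are placed and need not be minimal.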
Acknowledgements
This article is supported by the Ministry of the Interior of the Czech Republic (grant VJ01030002), by the Government of Catalonia (grant 2017 SGR 705), by the European Commission (project H2020-871042 “SoBigData++”), by the Spanish Government (project RTI2018-095094-B-C21, “CONSENT”), and by the DRAC project, which is co-financed by the European Union Regional Development Fund within the framework of the ERDF Operational Program of Catalonia 2014-2020 with a grant covering 50% of the total eligible cost.
Additional information
Communicated by C. J. Colbourn.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Farràs, O., Ribes-González, J. & Ricci, S. Privacy-preserving data splitting: a combinatorial approach. Des. Codes Cryptogr. 89, 1735–1756 (2021). https://doi.org/10.1007/s10623-021-00884-6
DOI: https://doi.org/10.1007/s10623-021-00884-6