Abstract
In this paper, we propose a new privacy-preserving clustering framework. We apply two types of data transformation to clustering solution vectors and to the clustering ensemble consensus matrix in a unified way. The first type is encryption, which includes cryptography-based methods and hashing functions. The second is perturbation, which includes data swapping, spatial transformation, and randomized perturbation. The transformed clustering pairs or consensus matrix pairs are then evaluated using 19 popular pair-counting similarity measures. We compare the original evaluation results with those obtained under each privacy-preserving method, for each pair-counting similarity measure. We found that: 1) for clustering similarity, the information loss under cryptography-based methods and hashing functions is generally large, and these methods are not applicable to the clustering ensemble case, whereas the information loss under data swapping, spatial transformation, and randomized perturbation is small, and these methods are applicable to a wide range of cases; 2) experimentally, different combinations of privacy-preserving method and pair-counting similarity measure show varying adaptability, with five pair-counting similarity measures remaining stable across all types of privacy protection methods; 3) cryptography and hashing functions offer high security at high complexity, while data swapping, spatial transformation, and randomized perturbation have low complexity but low security. This work can serve as an initial investigation in this field and as a reference for future work.
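To make the evaluation step concrete, the following is a minimal sketch of how two clustering solution vectors can be compared with pair-counting similarity measures, and how a simple swapping transformation can be checked for information loss. It is an illustration under stated assumptions, not the authors' implementation: the function names (`pair_counts`, `rand_index`, `jaccard_index`, `swap_positions`), the toy label vectors, and the choice of the Rand and Jaccard indices as two representative measures are all hypothetical, since the abstract does not list the 19 measures or specify the swapping procedure.

```python
from itertools import combinations
import random


def pair_counts(u, v):
    """Classify every object pair by co-membership in clusterings u and v.

    Returns (n11, n10, n01, n00): pairs together in both, together only in u,
    together only in v, and separated in both.
    """
    n11 = n10 = n01 = n00 = 0
    for i, j in combinations(range(len(u)), 2):
        same_u = u[i] == u[j]
        same_v = v[i] == v[j]
        if same_u and same_v:
            n11 += 1
        elif same_u:
            n10 += 1
        elif same_v:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00


def rand_index(u, v):
    """Rand index: fraction of pairs on which the two clusterings agree."""
    n11, n10, n01, n00 = pair_counts(u, v)
    return (n11 + n00) / (n11 + n10 + n01 + n00)


def jaccard_index(u, v):
    """Jaccard index over pairs; defined as 1.0 when no pair is co-clustered."""
    n11, n10, n01, _ = pair_counts(u, v)
    denom = n11 + n10 + n01
    return n11 / denom if denom else 1.0


def swap_positions(labels, n_swaps, seed=0):
    """Hypothetical swapping step: exchange the entries at random positions.

    The positions depend only on the vector length and the seed, so the same
    seed applies the same object permutation to any vector of equal length.
    """
    rng = random.Random(seed)
    out = list(labels)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


# Toy clustering solution vectors over eight objects (assumed data).
u = [0, 0, 0, 1, 1, 1, 2, 2]
v = [0, 0, 1, 1, 1, 2, 2, 2]

original = (rand_index(u, v), jaccard_index(u, v))

# Apply the same swaps (same seed) to both vectors, then re-evaluate.
u_swapped = swap_positions(u, n_swaps=3, seed=1)
v_swapped = swap_positions(v, n_swaps=3, seed=1)
transformed = (rand_index(u_swapped, v_swapped),
               jaccard_index(u_swapped, v_swapped))

# Information loss, taken here as the absolute change in each measure.
loss = tuple(abs(a - b) for a, b in zip(original, transformed))
print(original, transformed, loss)
```

In this sketch the loss is zero because the same object permutation is applied to both vectors, which leaves every pair count unchanged. Swapping one vector only, or using different seeds, would generally change the counts, and a per-measure comparison of that change is the kind of information-loss analysis the abstract describes.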
Acknowledgments
The work described in this paper was partially supported by grants from the Guangdong Natural Science Foundation of China (No. 2018A030313922), the Guangzhou education scientific research project (No. 1201730714), and the National Natural Science Foundation of China (grant No. 61801133).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Yang, P., Zhang, S., Yang, L. (2022). Privacy-Preserving Cluster Validity. In: Wang, G., Choo, KK.R., Ko, R.K.L., Xu, Y., Crispo, B. (eds) Ubiquitous Security. UbiSec 2021. Communications in Computer and Information Science, vol 1557. Springer, Singapore. https://doi.org/10.1007/978-981-19-0468-4_12
DOI: https://doi.org/10.1007/978-981-19-0468-4_12
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-0467-7
Online ISBN: 978-981-19-0468-4
eBook Packages: Computer Science, Computer Science (R0)