Privacy-Preserving Cluster Validity

  • Conference paper
Ubiquitous Security (UbiSec 2021)

Abstract

In this paper, we propose a new privacy-preserving clustering framework. We apply two types of data transformation to clustering solution vectors and to the clustering ensemble consensus matrix in a unified way. The first is encryption, which includes cryptography-based methods and hashing functions. The second is perturbation, which includes data swapping, spatial transformation, and randomized perturbation. The transformed clustering pairs or consensus matrix pairs are evaluated using 19 popular pair-counting similarity measures. The original evaluation results and those obtained after privacy preservation are compared across the different privacy-preserving methods and the different pair-counting similarity measures. We found that 1) for clustering similarity, the information loss of cryptography-based methods and hashing functions is generally large, and these methods are not applicable to the clustering ensemble case, while the information loss of data swapping, spatial transformation, and randomized perturbation is small, and these methods are widely applicable; 2) in experimental validation, different privacy-preserving methods and different pair-counting similarity measures show a certain degree of adaptability to one another, and five pair-counting similarity measures are stable for all types of privacy protection methods; 3) cryptography and hashing functions offer high security at the cost of high complexity, while data swapping, spatial transformation, and randomized perturbation have low complexity but low security. This work can serve as an initial investigation in this field and as a reference for future related work.
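
To make the evaluation idea concrete, the following is a minimal sketch of pair-counting similarity between two clustering solution vectors, together with a simple way to measure the change in similarity after a randomized label perturbation. It is an illustration only, not the authors' implementation: the Rand and Jaccard indices stand in for the 19 measures, and the `perturb` helper, its `rate` parameter, and the toy label vectors are assumptions introduced here.

```python
import random
from itertools import combinations


def pair_counts(u, v):
    """Classify every object pair by co-membership in clusterings u and v.

    Returns (n11, n10, n01, n00): pairs together in both, together in u only,
    together in v only, and separated in both.
    """
    n11 = n10 = n01 = n00 = 0
    for i, j in combinations(range(len(u)), 2):
        same_u = u[i] == u[j]
        same_v = v[i] == v[j]
        if same_u and same_v:
            n11 += 1
        elif same_u:
            n10 += 1
        elif same_v:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00


def rand_index(u, v):
    """Fraction of pairs on which the two clusterings agree."""
    n11, n10, n01, n00 = pair_counts(u, v)
    return (n11 + n00) / (n11 + n10 + n01 + n00)


def jaccard_index(u, v):
    """Pairs together in both, over pairs together in at least one."""
    n11, n10, n01, _ = pair_counts(u, v)
    denom = n11 + n10 + n01
    return n11 / denom if denom else 1.0


def perturb(labels, rate, k, rng):
    """Hypothetical randomized perturbation (an assumption, not the paper's
    scheme): each label is resampled uniformly from k clusters with
    probability `rate`."""
    return [rng.randrange(k) if rng.random() < rate else x for x in labels]


# Toy clustering solution vectors (made up for illustration).
u = [0, 0, 0, 1, 1, 1]
v = [0, 0, 1, 1, 2, 2]

rng = random.Random(0)
original = rand_index(u, v)
transformed = rand_index(perturb(u, 0.2, 2, rng), perturb(v, 0.2, 3, rng))
# One simple notion of information loss: the absolute change in similarity.
information_loss = abs(original - transformed)
```

Because pair counting looks only at whether two objects share a cluster, any one-to-one renaming of cluster labels leaves these measures unchanged; a transformation changes the score only when it alters which objects are grouped together.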

Notes

  1. http://archive.ics.uci.edu/ml/datasets.html.

Acknowledgments

The work described in this paper was partially supported by grants from the Guangdong Natural Science Foundation of China (No. 2018A030313922), the Guangzhou education scientific research project (No. 1201730714), and the National Natural Science Foundation of China (grant No. 61801133).

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Yang, P., Zhang, S., Yang, L. (2022). Privacy-Preserving Cluster Validity. In: Wang, G., Choo, KK.R., Ko, R.K.L., Xu, Y., Crispo, B. (eds) Ubiquitous Security. UbiSec 2021. Communications in Computer and Information Science, vol 1557. Springer, Singapore. https://doi.org/10.1007/978-981-19-0468-4_12

  • DOI: https://doi.org/10.1007/978-981-19-0468-4_12

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-0467-7

  • Online ISBN: 978-981-19-0468-4

  • eBook Packages: Computer Science, Computer Science (R0)
