Abstract
Privacy risk assessment determines the extent to which generalization and obfuscation should be applied to sensitive data. In this paper, we propose PriTxt, a method for evaluating the privacy risk of text data by exploiting semantic correlations between words. Using definitions derived from the General Data Protection Regulation (GDPR), PriTxt first defines the private features that are related to individual privacy. A word-embedding model is then built with the word2vec algorithm to identify quasi-sensitive words, that is, words semantically correlated with the sensitive ones. Finally, the privacy risk of a given text is evaluated by aggregating the weighted risks of the sensitive and quasi-sensitive words it contains.
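The three-stage pipeline described in the abstract can be illustrated with a small sketch. This is not PriTxt's actual implementation. Several parts are hypothetical choices made only for illustration:

- The toy three-dimensional vectors in `EMBEDDINGS` stand in for a trained word2vec model.
- The seed set `SENSITIVE`, the similarity `threshold`, and the weights `w_sensitive` and `w_quasi` are assumed values.
- The plain weighted-sum aggregation is one possible way to combine per-word risks.

In this sketch, a word counts as quasi-sensitive when its cosine similarity to some sensitive word reaches the threshold. With a trained gensim model, `model.wv.similarity(a, b)` could supply these cosine scores instead of the hand-written `cosine` helper.

```python
import math
from typing import Dict, List, Sequence, Set

# Hypothetical toy vectors standing in for a trained word2vec model.
EMBEDDINGS: Dict[str, List[float]] = {
    "diagnosis": [0.9, 0.1, 0.0],
    "cancer": [0.8, 0.2, 0.1],
    "hospital": [0.7, 0.3, 0.0],
    "weather": [0.0, 0.1, 0.9],
    "sunny": [0.1, 0.0, 0.8],
}

# Hypothetical seed set of sensitive words (e.g. GDPR-style health data).
SENSITIVE: Set[str] = {"diagnosis"}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either is a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def find_quasi_sensitive(
    embeddings: Dict[str, List[float]],
    sensitive: Set[str],
    threshold: float = 0.9,  # assumed cut-off, not taken from the paper
) -> Dict[str, float]:
    """Map each non-sensitive word whose best similarity to any sensitive
    word reaches `threshold` to that similarity score."""
    quasi: Dict[str, float] = {}
    for word, vec in embeddings.items():
        if word in sensitive:
            continue
        best = max(
            (cosine(vec, embeddings[s]) for s in sensitive if s in embeddings),
            default=0.0,
        )
        if best >= threshold:
            quasi[word] = best
    return quasi


def text_risk(
    tokens: Sequence[str],
    sensitive: Set[str],
    quasi: Dict[str, float],
    w_sensitive: float = 1.0,  # assumed weight for sensitive words
    w_quasi: float = 0.5,  # assumed weight for quasi-sensitive words
) -> float:
    """Weighted sum of per-word risks: a sensitive word adds w_sensitive,
    a quasi-sensitive word adds w_quasi times its similarity score."""
    risk = 0.0
    for token in tokens:
        if token in sensitive:
            risk += w_sensitive
        elif token in quasi:
            risk += w_quasi * quasi[token]
    return risk


if __name__ == "__main__":
    quasi = find_quasi_sensitive(EMBEDDINGS, SENSITIVE)
    print(sorted(quasi))  # ['cancer', 'hospital']
    text = "the diagnosis at the hospital was cancer".split()
    print(round(text_risk(text, SENSITIVE, quasi), 2))  # 1.97
    print(text_risk("sunny weather today".split(), SENSITIVE, quasi))  # 0.0
```

In this example, "diagnosis" contributes its full weight. "hospital" and "cancer" each contribute half of their similarity score. The sentence about the weather scores zero. Normalizing by text length, or combining per-word risks probabilistically, would be equally plausible aggregation choices under the same assumptions.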
Acknowledgment
This work is supported by the Humanities and Social Sciences Planning Project of the China Ministry of Education under Grant No. 19YJAZH099.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Xiong, P., Liang, L., Zhu, Y., Zhu, T. (2021). Privacy Risk Assessment for Text Data Based on Semantic Correlation Learning. In: Liu, Z., Wu, F., Das, S.K. (eds) Wireless Algorithms, Systems, and Applications. WASA 2021. Lecture Notes in Computer Science, vol 12939. Springer, Cham. https://doi.org/10.1007/978-3-030-86137-7_22
DOI: https://doi.org/10.1007/978-3-030-86137-7_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86136-0
Online ISBN: 978-3-030-86137-7
eBook Packages: Computer Science, Computer Science (R0)