Abstract
Many individuals share their opinions (e.g., on political issues) or sensitive information about themselves (e.g., health status) anonymously on the internet to protect their privacy. However, anonymous data sharing has become more challenging in today’s interconnected digital world, especially for individuals who have both anonymous and identified online activities. The most prominent examples of such data sharing platforms today are online social networks (OSNs). Many individuals have multiple profiles in different OSNs, including anonymous and identified ones (depending on the nature of the OSN). Here, the privacy threat is profile matching: if an attacker links the anonymous profiles of individuals to their real identities, the attacker can obtain privacy-sensitive information, which may have serious consequences such as discrimination or blackmail. Therefore, it is very important to quantify this privacy risk and show its extent to OSN users. Existing attempts to model profile matching in OSNs are inadequate and computationally inefficient for real-time risk quantification. Thus, in this work, we develop algorithms to efficiently model and quantify profile matching attacks in OSNs as a step towards real-time privacy risk quantification. To this end, we model the profile matching problem using a graph and develop a belief propagation (BP)-based algorithm that solves this problem significantly more efficiently and accurately than the state of the art. We evaluate the proposed framework on three real-life datasets (including data from four different social networks) and show how users’ profiles in different OSNs can be matched efficiently and with high probability. We show that the proposed model generation has linear complexity in terms of the number of user pairs, which is significantly more efficient than the state of the art (which has cubic complexity). Furthermore, it provides accuracy, precision, and recall comparable to the state of the art.
Thanks to the algorithms developed in this work, individuals can be more conscious when sharing data on online platforms. We anticipate that this work will also drive the technology so that OSNs can offer new privacy-centered products.
Notes
1. The US social security name database includes the year of birth, gender, and corresponding name for babies born in the United States.
Acknowledgment
We would like to thank the anonymous reviewers and our shepherd Shujun Li for their constructive feedback, which helped us improve this paper.
Appendix
A Scalability of the BP-Based Algorithm
We study the effect of the OSNs’ size on the precision and recall of the proposed algorithm. In Sect. 5.4, we provided the results when the number of users in OSN T is fixed. Here, we provide the results for the other two scenarios. In Fig. 7, for each dataset, we show the precision/recall values of the BP-based algorithm when the number of users in the target OSN (OSN T) increases while the number of auxiliary users (i.e., users in OSN A) is fixed. We set the number of users in OSN A to 1000 and increase the number of users in OSN T from 100 to 1000 in steps of 100.
In Fig. 8, for each dataset, we show the precision/recall values of the BP-based algorithm when the number of users in both OSNs (i.e., OSN A and OSN T) increases from 100 to 1000 in steps of 100. In both scenarios, we observe that the precision/recall values of the proposed algorithm decrease only slightly as the number of users grows, whether in the target OSN alone or in both OSNs, which shows the scalability of our proposed algorithm.
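The precision/recall values reported in these experiments can be illustrated with a minimal sketch. It assumes the standard definitions (precision is the fraction of reported matches that are correct, recall is the fraction of true matching pairs that are recovered); the function name `precision_recall`, the dictionary representation, and the toy user identifiers are hypothetical and are not taken from the paper.

```python
from typing import Dict, Hashable, Tuple


def precision_recall(
    predicted: Dict[Hashable, Hashable],
    truth: Dict[Hashable, Hashable],
) -> Tuple[float, float]:
    """Compute precision and recall of a set of reported profile matches.

    predicted: target-OSN user -> auxiliary-OSN user reported as a match
               (users with no reported match are simply absent).
    truth:     target-OSN user -> true counterpart in the auxiliary OSN.
    Both representations are assumptions made for this illustration.
    """
    # A reported pair is correct only if it equals the ground-truth pair.
    correct = sum(1 for t, a in predicted.items() if truth.get(t) == a)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall


# Toy example: four true pairs, three reported matches, two of them correct.
truth = {"t1": "a1", "t2": "a2", "t3": "a3", "t4": "a4"}
predicted = {"t1": "a1", "t2": "a9", "t3": "a3"}
p, r = precision_recall(predicted, truth)
print(f"precision={p:.3f}, recall={r:.3f}")
```

Under these definitions, adding users to either OSN adds candidate counterparts for each profile, which is one reason a small drop in both metrics is plausible as the networks grow.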
To further examine the effect of the auxiliary OSN’s size on the precision and recall of the BP-based algorithm, we quantify the precision/recall values obtained by the proposed algorithm at larger scales in \(\mathrm {D3}\). We fix the number of users in the target OSN (i.e., OSN T) to 1000 while the number of users in the auxiliary OSN (i.e., OSN A) increases from 1000 to 8000 in steps of 1000 (in Fig. 4, the number of users in OSN T was fixed to 100 while the number of users in OSN A increased from 100 to 1000). We show the results for \(\mathrm {D3}\) in Fig. 9. The precision/recall values decrease slightly as the number of users in OSN A increases, confirming the scalability of the proposed algorithm. Note that, in \(\mathrm {D3}\), we only use the graph connectivity attribute for profile matching. We expect that the decrease in precision/recall values will be smaller when both the graphical structure and other attributes of the users are used to generate the model.
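To make the experimental grid concrete, the following sketch enumerates the OSN sizes used in the scenarios of Figs. 7, 8, and 9 and the corresponding number of cross-network user pairs. It assumes that every user in OSN A is paired with every user in OSN T, so the pair count is the product of the two sizes; that assumption and the helper name `size_grid` are illustrative and are not stated in the text.

```python
from typing import List, Sequence, Tuple


def size_grid(
    a_sizes: Sequence[int], t_sizes: Sequence[int]
) -> List[Tuple[int, int, int]]:
    """Return (users in OSN A, users in OSN T, user pairs) per setting.

    The pair count assumes all A-by-T cross pairs are considered.
    """
    return [(a, t, a * t) for a, t in zip(a_sizes, t_sizes)]


steps = list(range(100, 1001, 100))    # 100, 200, ..., 1000
large = list(range(1000, 8001, 1000))  # 1000, 2000, ..., 8000

fig7 = size_grid([1000] * len(steps), steps)   # OSN A fixed, OSN T grows
fig8 = size_grid(steps, steps)                 # both OSNs grow together
fig9 = size_grid(large, [1000] * len(large))   # OSN T fixed, OSN A grows

for name, grid in (("Fig. 7", fig7), ("Fig. 8", fig8), ("Fig. 9", fig9)):
    a, t, pairs = grid[-1]
    print(f"{name}: largest setting A={a}, T={t}, pairs={pairs}")
```

Under this assumption the largest Fig. 9 setting has eight times as many user pairs as the largest settings of Figs. 7 and 8, which is the regime where model generation that is linear in the number of user pairs matters most.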
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Halimi, A., Ayday, E. (2020). Efficient Quantification of Profile Matching Risk in Social Networks Using Belief Propagation. In: Chen, L., Li, N., Liang, K., Schneider, S. (eds) Computer Security – ESORICS 2020. ESORICS 2020. Lecture Notes in Computer Science, vol 12308. Springer, Cham. https://doi.org/10.1007/978-3-030-58951-6_6
DOI: https://doi.org/10.1007/978-3-030-58951-6_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58950-9
Online ISBN: 978-3-030-58951-6
eBook Packages: Computer Science, Computer Science (R0)