Efficient Quantification of Profile Matching Risk in Social Networks Using Belief Propagation

Halimi, Anisa; Ayday, Erman

doi:10.1007/978-3-030-58951-6_6

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 12308)

Included in the following conference series:

European Symposium on Research in Computer Security

Abstract

Many individuals share their opinions (e.g., on political issues) or sensitive information about themselves (e.g., health status) on the internet anonymously to protect their privacy. However, anonymous data sharing has become more challenging in today’s interconnected digital world, especially for individuals who have both anonymous and identified online activities. The most prominent examples of such data sharing platforms today are online social networks (OSNs). Many individuals have multiple profiles in different OSNs, including anonymous and identified ones (depending on the nature of the OSN). Here, the privacy threat is profile matching: if an attacker links the anonymous profiles of individuals to their real identities, the attacker can obtain privacy-sensitive information, which may have serious consequences such as discrimination or blackmail. Therefore, it is very important to quantify this privacy risk and show its extent to OSN users. Existing attempts to model profile matching in OSNs are inadequate and computationally inefficient for real-time risk quantification. Thus, in this work, we develop algorithms to efficiently model and quantify profile matching attacks in OSNs as a step towards real-time privacy risk quantification. To this end, we model the profile matching problem using a graph and develop a belief propagation (BP)-based algorithm that solves this problem significantly more efficiently and accurately than the state of the art. We evaluate the proposed framework on three real-life datasets (including data from four different social networks) and show how users’ profiles in different OSNs can be matched efficiently and with high probability. We show that the proposed model generation has linear complexity in terms of the number of user pairs, which is significantly more efficient than the state of the art (which has cubic complexity). Furthermore, it provides accuracy, precision, and recall comparable to the state of the art.
Thanks to the algorithms developed in this work, individuals will be more conscious when sharing data on online platforms. We anticipate that this work will also drive the technology so that OSNs can offer new privacy-centered products.

Notes

1. The US social security name database includes the year of birth, gender, and corresponding name for babies born in the United States.

Acknowledgment

We would like to thank the anonymous reviewers and our shepherd Shujun Li for their constructive feedback which has helped us to improve this paper.

Author information

Authors and Affiliations

Case Western Reserve University, Cleveland, OH, USA
Anisa Halimi & Erman Ayday
Bilkent University, Ankara, Turkey
Erman Ayday

Authors

Anisa Halimi
Erman Ayday

Corresponding author

Correspondence to Erman Ayday.

Editor information

Editors and Affiliations

University of Surrey, Guildford, UK
Liqun Chen
Purdue University, West Lafayette, IN, USA
Ninghui Li
Delft University of Technology, Delft, The Netherlands
Kaitai Liang
University of Surrey, Guildford, UK
Steve Schneider

Appendix

A Scalability of the BP-Based Algorithm

We study the effect of the OSNs’ size on the precision and recall of the proposed algorithm. In Sect. 5.4, we provided the results when the number of users in OSN T is fixed. Here, we provide the results of the other two scenarios. In Fig. 7, for each dataset, we show the precision/recall values of the BP-based algorithm when the number of users in the target OSN (OSN T) increases while the number of auxiliary users (i.e., users in OSN A) is fixed. We set the number of users in OSN A to 1000 and increase the number of users in OSN T from 100 to 1000 in steps of 100.

In Fig. 8, for each dataset, we show the precision/recall values of the BP-based algorithm when the number of users in both OSNs (i.e., OSN A and T) increases from 100 to 1000 in steps of 100. In both scenarios, we observe that the precision/recall values of the proposed algorithm decrease only slightly as the number of users in the target OSN, or in both OSNs, increases, which shows the scalability of our proposed algorithm.
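As a reading aid, the following minimal sketch shows how precision and recall could be computed for a set of predicted profile matches. It assumes the standard definitions (correct matches over predicted matches, and correct matches over true matches); the exact definitions used in the paper are not restated in this appendix. The function name `precision_recall` and the toy user identifiers are hypothetical.

```python
def precision_recall(predicted, truth):
    """Compute precision and recall for profile matching.

    predicted, truth: iterables of (user_in_A, user_in_T) pairs.
    Assumes the standard definitions; this is an illustration,
    not the paper's evaluation code.
    """
    predicted = set(predicted)
    truth = set(truth)
    true_positives = len(predicted & truth)  # correctly matched pairs
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy example with hypothetical identifiers: four true pairs,
# three predicted pairs, two of which are correct.
truth = {("a1", "t1"), ("a2", "t2"), ("a3", "t3"), ("a4", "t4")}
predicted = {("a1", "t1"), ("a2", "t2"), ("a3", "t9")}
p, r = precision_recall(predicted, truth)
print(f"precision={p:.3f}, recall={r:.3f}")
```

In this toy case, two of the three predicted pairs are correct and two of the four true pairs are recovered, so precision is 2/3 and recall is 1/2.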

To further check the effect of the auxiliary OSN’s size on the precision and recall of the BP-based algorithm, we quantify the precision/recall values obtained by the proposed algorithm at larger scales in $\mathrm{D3}$. We fix the number of users in the target OSN (i.e., OSN T) to 1000 while the number of users in the auxiliary OSN (i.e., OSN A) increases from 1000 to 8000 in steps of 1000 (in Fig. 4, the number of users in OSN T was fixed to 100 while the number of users in OSN A increased from 100 to 1000). We show the results for $\mathrm{D3}$ in Fig. 9. The precision/recall values slightly decrease as the number of users in OSN A increases, confirming the scalability of the proposed algorithm. Note that, in $\mathrm{D3}$, we only use the graph connectivity attribute for profile matching. We expect that the decrease in precision/recall values will be smaller when both the graphical structure and other attributes of the users are used to generate the model.
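The size settings described in this appendix can be written out as explicit (users in OSN A, users in OSN T) grids. The sketch below is only an illustration of those settings: the variable names (`fig7`, `fig8`, `fig9`) and helper functions are hypothetical, and the user-pair count is simply the product of the two sizes, the quantity in which the abstract states the model generation is linear.

```python
def sweep(start, stop, step):
    """Inclusive range of OSN sizes, e.g. 100, 200, ..., 1000."""
    return list(range(start, stop + 1, step))


# Each configuration is (users in OSN A, users in OSN T).
fig7 = [(1000, n_t) for n_t in sweep(100, 1000, 100)]    # A fixed, T grows
fig8 = [(n, n) for n in sweep(100, 1000, 100)]           # both grow together
fig9 = [(n_a, 1000) for n_a in sweep(1000, 8000, 1000)]  # T fixed, A grows (D3)


def user_pairs(config):
    """Number of candidate (A, T) user pairs for one configuration."""
    n_a, n_t = config
    return n_a * n_t


for name, grid in (("Fig. 7", fig7), ("Fig. 8", fig8), ("Fig. 9", fig9)):
    largest = max(user_pairs(c) for c in grid)
    print(f"{name}: {len(grid)} settings, up to {largest} user pairs")
```

Under these settings, the largest configuration in the Fig. 9 scenario has eight times as many candidate user pairs as the largest configuration in the Fig. 7 and Fig. 8 scenarios.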

Cite this paper

Halimi, A., Ayday, E. (2020). Efficient Quantification of Profile Matching Risk in Social Networks Using Belief Propagation. In: Chen, L., Li, N., Liang, K., Schneider, S. (eds) Computer Security – ESORICS 2020. ESORICS 2020. Lecture Notes in Computer Science, vol 12308. Springer, Cham. https://doi.org/10.1007/978-3-030-58951-6_6

DOI: https://doi.org/10.1007/978-3-030-58951-6_6
Published: 12 September 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58950-9
Online ISBN: 978-3-030-58951-6
eBook Packages: Computer Science (R0)
