Efficient Quantification of Profile Matching Risk in Social Networks Using Belief Propagation

  • Conference paper
Computer Security – ESORICS 2020 (ESORICS 2020)

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 12308)

Abstract

Many individuals share their opinions (e.g., on political issues) or sensitive information about themselves (e.g., health status) on the internet anonymously to protect their privacy. However, anonymous data sharing has become more challenging in today’s interconnected digital world, especially for individuals who have both anonymous and identified online activities. The most prominent examples of such data sharing platforms today are online social networks (OSNs). Many individuals have multiple profiles in different OSNs, including anonymous and identified ones (depending on the nature of the OSN). Here, the privacy threat is profile matching: if an attacker links the anonymous profiles of individuals to their real identities, the attacker can obtain privacy-sensitive information, which may have serious consequences such as discrimination or blackmail. Therefore, it is very important to quantify this privacy risk and show its extent to OSN users. Existing attempts to model profile matching in OSNs are inadequate and computationally inefficient for real-time risk quantification. Thus, in this work, we develop algorithms to efficiently model and quantify profile matching attacks in OSNs as a step towards real-time privacy risk quantification. To this end, we model the profile matching problem using a graph and develop a belief propagation (BP)-based algorithm that solves this problem significantly more efficiently and accurately than the state of the art. We evaluate the proposed framework on three real-life datasets (including data from four different social networks) and show how users’ profiles in different OSNs can be matched efficiently and with high probability. We show that the proposed model generation has linear complexity in the number of user pairs, which is significantly more efficient than the state of the art (which has cubic complexity). Furthermore, it provides accuracy, precision, and recall comparable to the state of the art.
Thanks to the algorithms developed in this work, individuals will be more conscious when sharing data on online platforms. We anticipate that this work will also drive the technology so that OSNs can offer new privacy-centered products.

Notes

  1. The US social security name database includes the year of birth, gender, and corresponding name for babies born in the United States.

Acknowledgment

We would like to thank the anonymous reviewers and our shepherd Shujun Li for their constructive feedback which has helped us to improve this paper.

Author information

Corresponding author

Correspondence to Erman Ayday.

Appendix

A Scalability of the BP-Based Algorithm

We study the effect of the OSNs’ size on the precision and recall of the proposed algorithm. In Sect. 5.4, we provided the results for the scenario in which the number of users in OSN T is fixed. Here, we provide the results for the other two scenarios. In Fig. 7, for each dataset, we show the precision/recall values of the BP-based algorithm when the number of users in the target OSN (OSN T) increases while the number of auxiliary users (i.e., users in OSN A) is fixed. We set the number of users in OSN A to 1000 and increase the number of users in OSN T from 100 to 1000 in steps of 100.

In Fig. 8, for each dataset, we show the precision/recall values of the BP-based algorithm when the number of users in both OSNs (i.e., OSN A and OSN T) increases from 100 to 1000 in steps of 100. In both scenarios, we observe that the precision/recall values of the proposed algorithm decrease only slightly as the number of users in the target OSN, or in both OSNs, increases, which shows the scalability of our proposed algorithm.

Fig. 7. The effect of the target OSN’s (OSN T) size on precision/recall when the size of the auxiliary OSN (OSN A) is 1000, in \(\mathrm{D1}\), \(\mathrm{D2}\), and \(\mathrm{D3}\).

Fig. 8. The effect of the auxiliary and target OSNs’ (OSN A and OSN T) size on precision/recall in \(\mathrm{D1}\), \(\mathrm{D2}\), and \(\mathrm{D3}\).
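The precision/recall values reported in these experiments can be illustrated with a minimal sketch. It assumes the standard definitions (precision: correct matches over reported matches; recall: correct matches over true matching pairs); the exact definitions used in the paper are not restated here, and the function name, the dictionary representation, and the user identifiers are hypothetical.

```python
def precision_recall(predicted, truth):
    """Precision and recall of a predicted profile matching.

    predicted: dict mapping a target-OSN user id to the auxiliary-OSN user id
               it was matched to (unmatched users are simply absent).
    truth:     dict mapping a target-OSN user id to its true auxiliary-OSN id.
    Both representations are assumptions made for this illustration.
    """
    # A reported match is correct only if it agrees with the ground truth.
    correct = sum(1 for t, a in predicted.items() if truth.get(t) == a)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall


# Toy example with four true pairs and three reported matches (one wrong).
truth = {"t1": "a1", "t2": "a2", "t3": "a3", "t4": "a4"}
predicted = {"t1": "a1", "t2": "a9", "t3": "a3"}
p, r = precision_recall(predicted, truth)
print(f"precision={p:.3f} recall={r:.3f}")
```

In this toy case two of the three reported matches are correct, so precision is 2/3, and two of the four true pairs are recovered, so recall is 1/2.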

To further check the effect of the auxiliary OSN’s size on the precision and recall of the BP-based algorithm, we quantify the precision/recall values obtained by the proposed algorithm at larger scales in \(\mathrm{D3}\). We fix the number of users in the target OSN (i.e., OSN T) at 1000 while the number of users in the auxiliary OSN (i.e., OSN A) increases from 1000 to 8000 in steps of 1000 (in Fig. 4, the number of users in OSN T was fixed at 100 while the number of users in OSN A increased from 100 to 1000). We show the results for \(\mathrm{D3}\) in Fig. 9. The precision/recall values decrease slightly as the number of users in OSN A increases, confirming the scalability of the proposed algorithm. Note that, in \(\mathrm{D3}\), we only use the graph connectivity attribute for profile matching. We expect that the decrease in precision/recall values will be smaller when both the graphical structure and other attributes of the users are used to generate the model.

Fig. 9. The effect of the auxiliary OSN’s (OSN A) size on precision/recall when the size of the target OSN (OSN T) is 1000, in \(\mathrm{D3}\).
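Since model generation is reported to be linear in the number of user pairs, the size of the larger-scale experiment can be gauged with a short sketch. It assumes that every auxiliary user is paired with every target user, so that the number of pairs is the product of the two OSN sizes; this is an assumption made for illustration (the actual model may consider fewer pairs), and the names below are hypothetical.

```python
TARGET_SIZE = 1000  # number of users in OSN T, fixed in this experiment


def candidate_pairs(n_aux, n_target):
    """Number of (auxiliary, target) user pairs, assuming all pairs are considered."""
    return n_aux * n_target


# OSN A grows from 1000 to 8000 users in steps of 1000.
sweep = [(n_aux, candidate_pairs(n_aux, TARGET_SIZE))
         for n_aux in range(1000, 8001, 1000)]

for n_aux, pairs in sweep:
    print(f"|A|={n_aux:5d}  |T|={TARGET_SIZE}  pairs={pairs:,}")
```

Under this assumption the pair count grows from one million to eight million across the sweep, so a cost that is linear in the number of pairs grows by a factor of eight.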

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Halimi, A., Ayday, E. (2020). Efficient Quantification of Profile Matching Risk in Social Networks Using Belief Propagation. In: Chen, L., Li, N., Liang, K., Schneider, S. (eds) Computer Security – ESORICS 2020. ESORICS 2020. Lecture Notes in Computer Science, vol 12308. Springer, Cham. https://doi.org/10.1007/978-3-030-58951-6_6

  • DOI: https://doi.org/10.1007/978-3-030-58951-6_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58950-9

  • Online ISBN: 978-3-030-58951-6

  • eBook Packages: Computer Science (R0)
