An efficient fake account identification in social media networks: Facebook and Instagram using NSGA-II algorithm

  • Original Article
  • Neural Computing and Applications

Abstract

The widespread use of online social networks (OSNs) has made them prime targets for cyber attackers, who exploit these platforms for various malicious activities. As a result, an entire black-market industry has emerged around the sale of fake accounts. With the massive growth of OSNs, the number of fraudulent accounts is expanding rapidly. This research therefore focuses on detecting fraudulent profiles on Instagram and Facebook and aims to find an optimal subset of features that can effectively differentiate between real and fake accounts. The problem is formulated as a multiobjective optimization task that maximizes classification accuracy while minimizing the number of selected features. NSGA-II (non-dominated sorting genetic algorithm II) is employed as the optimization algorithm to explore the trade-offs between these conflicting objectives, yielding a novel feature selection approach for detecting fake accounts. The proposed methodology relies on input data comprising features that characterize the profiles under investigation. The selected features are used to train a machine learning model, whose performance is evaluated using various metrics, including precision, recall, F1-score, and the receiver operating characteristic (ROC) curve. The final prediction model achieved accuracy values ranging from 90 to 99.88%. The results indicate that the model, using features selected by the NSGA-II algorithm, delivered high prediction accuracy while using less than 31% of the total feature space. This efficient feature selection allowed precise differentiation between fake and real users, demonstrating the model’s effectiveness with a minimal number of input variables. Furthermore, the experimental results demonstrate that the proposed approach achieves better performance than other existing approaches. The paper also addresses explainability, which refers to the ability to understand and interpret the decisions and outcomes of machine learning models.
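The two-objective formulation described in the abstract can be illustrated with a minimal, self-contained sketch. It is not the authors' implementation. The fitness function `evaluate`, the feature count `N_FEATURES`, and the set `INFORMATIVE` are hypothetical stand-ins for a real wrapper evaluation such as the cross-validated error of a classifier trained on the selected columns. For brevity the sketch uses a simple front-peeling sort instead of the bookkeeping-based fast non-dominated sort, and random parent selection instead of the crowded binary tournament of the original NSGA-II.

```python
import random

N_FEATURES = 10          # hypothetical size of the feature space
INFORMATIVE = {0, 3, 7}  # hypothetical: toy features that carry signal


def evaluate(mask):
    """Return (error, n_selected); both objectives are minimized.

    Toy stand-in for a wrapper evaluation. Not the paper's fitness function.
    """
    selected = {i for i, bit in enumerate(mask) if bit}
    if not selected:
        return (1.0, 0)  # empty subset: worst error, trivially fewest features
    hits = len(selected & INFORMATIVE)
    noise = len(selected - INFORMATIVE)
    return (round(0.5 - 0.15 * hits + 0.01 * noise, 4), len(selected))


def dominates(a, b):
    """a dominates b if it is no worse everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(objs):
    """Peel off Pareto fronts one at a time (front 0 is non-dominated)."""
    fronts, remaining = [], set(range(len(objs)))
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)]
        fronts.append(sorted(front))
        remaining -= set(front)
    return fronts


def crowding_distance(front, objs):
    """Larger distance means a less crowded region of the front."""
    dist = {i: 0.0 for i in front}
    for m in range(len(objs[0])):
        ordered = sorted(front, key=lambda i: objs[i][m])
        lo, hi = objs[ordered[0]][m], objs[ordered[-1]][m]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")  # keep extremes
        if hi == lo:
            continue
        for k in range(1, len(ordered) - 1):
            gap = objs[ordered[k + 1]][m] - objs[ordered[k - 1]][m]
            dist[ordered[k]] += gap / (hi - lo)
    return dist


def nsga2(pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(pop_size)]
    for _ in range(generations):
        # Variation: uniform crossover of two random parents, then bit-flip mutation.
        children = []
        for _ in range(pop_size):
            p1, p2 = rng.sample(pop, 2)
            child = [rng.choice(pair) for pair in zip(p1, p2)]
            child = [1 - b if rng.random() < 1.0 / N_FEATURES else b for b in child]
            children.append(child)
        # Elitist survival: rank parents and children together, fill front by
        # front, and break ties in the last admitted front by crowding distance.
        union = pop + children
        objs = [evaluate(ind) for ind in union]
        survivors = []
        for front in non_dominated_sort(objs):
            if len(survivors) + len(front) > pop_size:
                dist = crowding_distance(front, objs)
                front = sorted(front, key=lambda i: -dist[i])[:pop_size - len(survivors)]
            survivors.extend(front)
            if len(survivors) == pop_size:
                break
        pop = [union[i] for i in survivors]
    objs = [evaluate(ind) for ind in pop]
    first = non_dominated_sort(objs)[0]
    return sorted({(objs[i], tuple(pop[i])) for i in first})


if __name__ == "__main__":
    for (error, n_selected), mask in nsga2():
        print(f"error={error:.2f}  features={n_selected}  mask={mask}")
```

Each returned entry pairs an objective vector with a binary feature mask, so the result is a set of trade-off solutions rather than a single subset. In practice a final subset would be chosen from this front according to how much accuracy one is willing to give up for a smaller feature set.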

Data availability

We declare that all the data associated with the manuscript are mentioned in the manuscript.

Funding

The authors received no specific funding for this study.

Author information

Corresponding author

Correspondence to Anand Nayyar.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest to report regarding the present study.

About this article

Cite this article

Sallah, A., Abdellaoui Alaoui, E.A., Hessane, A. et al. An efficient fake account identification in social media networks: Facebook and Instagram using NSGA-II algorithm. Neural Comput & Applic 36, 21487–21515 (2024). https://doi.org/10.1007/s00521-024-10350-8

  • DOI: https://doi.org/10.1007/s00521-024-10350-8
