Abstract
The widespread use of online social networks (OSNs) has made them prime targets for cyber attackers, who exploit these platforms for various malicious activities. As a result, an entire black-market industry has emerged around the sale of fake accounts, and the number of fraudulent accounts has grown rapidly alongside the rise of OSNs. This research therefore focuses on detecting fraudulent profiles on Instagram and Facebook and aims to find an optimal subset of features that can effectively differentiate between real and fake accounts. The problem is formulated as a multiobjective optimization task that maximizes classification accuracy while minimizing the number of selected features. The non-dominated sorting genetic algorithm II (NSGA-II) is employed to explore the trade-offs between these conflicting objectives, yielding a novel NSGA-II-based feature selection approach for fake account detection. The proposed methodology takes as input features characterizing the profiles under investigation, and the selected features are used to train a machine learning model. The model’s performance is evaluated using several metrics, including precision, recall, F1-score, and the receiver operating characteristic (ROC) curve. The final prediction model achieved accuracy values ranging from 90% to 99.88%. The model built on the NSGA-II-selected features delivered this high prediction accuracy while using less than 31% of the total feature space, so fake and real users could be differentiated precisely with a minimal number of input variables. Furthermore, the experimental results demonstrate that the proposed approach outperforms other existing approaches.
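The trade-off between accuracy and feature count can be illustrated with a small sketch of Pareto dominance, the comparison at the heart of non-dominated sorting. This is not the authors' implementation: the candidate values below are made up for illustration, and the names `dominates` and `pareto_front` are hypothetical helpers. Each candidate stands for one feature subset, summarized by its classification accuracy (to be maximized) and its number of selected features (to be minimized).

```python
from typing import List, Tuple

# A candidate feature subset summarized as (accuracy, n_selected_features).
# All numbers here are invented for illustration, not results from the paper.
Candidate = Tuple[float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if `a` Pareto-dominates `b`.

    Objective 1: maximize accuracy. Objective 2: minimize feature count.
    `a` dominates `b` when it is no worse on both objectives and strictly
    better on at least one.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate dominates.

    This is the first non-dominated front; a full NSGA-II run would also
    rank the remaining fronts, compute crowding distances, and evolve the
    population with selection, crossover, and mutation.
    """
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


candidates = [(0.90, 3), (0.95, 5), (0.94, 6), (0.99, 9), (0.99, 12), (0.88, 4)]
front = sorted(pareto_front(candidates), key=lambda c: c[1])
print(front)  # [(0.9, 3), (0.95, 5), (0.99, 9)]
```

Every subset on the front is a defensible answer: moving along it buys accuracy at the cost of more features, which is why a multiobjective search returns a set of solutions and not a single winner.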
This paper also considers explainability, that is, the ability to understand and interpret the decisions and outcomes of machine learning models.
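One common way to interpret a model's decision is to attribute a single prediction to the individual input features. The text here does not say which explanation technique is used, so the sketch below assumes Shapley-value attribution purely for illustration. The linear scoring function, its weights, and the two profiles are all hypothetical.

```python
from itertools import permutations
from typing import Callable, FrozenSet, List


def shapley_values(value_fn: Callable[[FrozenSet[int]], float], n: int) -> List[float]:
    """Exact Shapley values: average each feature's marginal contribution
    over every ordering in which features can be added.

    Cost grows factorially with n, so this is only suitable for toy cases.
    """
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        present = set()
        for j in order:
            before = value_fn(frozenset(present))
            present.add(j)
            phi[j] += value_fn(frozenset(present)) - before
    return [p / len(orders) for p in phi]


# Hypothetical linear "fakeness score" over three made-up profile features.
weights = [2.0, -1.0, 0.5]
x = [1.0, 3.0, 4.0]         # profile being explained (assumed values)
baseline = [0.0, 1.0, 2.0]  # reference profile (assumed values)


def score(present: FrozenSet[int]) -> float:
    """Model output when only the features in `present` take the profile's
    values and the rest are held at the baseline."""
    return sum(w * (x[i] if i in present else baseline[i])
               for i, w in enumerate(weights))


phi = shapley_values(score, len(weights))
print(phi)
```

The attributions sum to the difference between the score of the explained profile and the score of the baseline profile, which is the property that makes them readable as a per-feature breakdown of one decision.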
Data availability
We declare that all the data associated with the manuscript are mentioned in the manuscript.
Funding
The authors received no specific funding for this study.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest to report regarding the present study.
About this article
Cite this article
Sallah, A., Abdellaoui Alaoui, E.A., Hessane, A. et al. An efficient fake account identification in social media networks: Facebook and Instagram using NSGA-II algorithm. Neural Comput & Applic 36, 21487–21515 (2024). https://doi.org/10.1007/s00521-024-10350-8
DOI: https://doi.org/10.1007/s00521-024-10350-8