1. Introduction
Electronic commerce (E-commerce) is the buying and selling of goods and services over the internet. This evolution of traditional business models has made it possible for consumers to shop from the comfort of their homes [1,2]. E-commerce platforms offer several advantages, such as speeding up procurement processes, reducing costs, improving customer convenience, enabling easy comparison of products and prices, adapting quickly to market changes, and offering various payment options. These features significantly supported economic stability, particularly when government restrictions and stay-at-home orders were enforced during pandemics like COVID-19 [3,4]. eMarketer reported that worldwide E-commerce sales grew by 27.6% in 2020 and were forecast to grow by an additional 14.3% in 2021, approaching $5 trillion (https://www.insiderintelligence.com/content/worldwide-ecommerce-will-approach-5-trillion-this-year (accessed on 16 July 2024)).
The 2008–2009 financial crisis period saw E-commerce revenue growth surge from 15% to 25% annually before the economic downturn drove it down to around 3% in 2009. Post-2009, E-commerce growth rebounded, climbing over 10% each year, a figure significantly exceeding that of overall retail sales. In 2020, the COVID-19 pandemic, despite its negative impact on digital travel sales, sparked a notable rise in E-commerce revenue. This was driven by a shift in consumer behavior towards online shopping, a trend that has persisted beyond the pandemic [5].
The COVID-19 pandemic resulted in a surge in demand for goods, including necessities. This trend also drove increased use of digital payment options, which in turn has increased financial fraud. As E-commerce continues to evolve, businesses of all sizes have started accepting credit card (CC) payments. However, this uptick in CC use, especially for online purchases, has given fraudsters avenues to exploit and steal consumer CC details [6].
Financial fraud significantly impacts the academic, commercial, and regulatory spheres, presenting a major challenge for service providers and their clientele. This critical issue permeates the financial sector, affecting daily economic transactions worldwide. It involves the unauthorized use of assets or funds for personal gain, undermining confidence in financial institutions and leading to higher living costs. Financial fraud encompasses various activities, including falsifying financial statements, telecommunications fraud, deceptive insurance practices, and market misconduct. The widespread consequences of fraudulent actions have caused significant disruptions in the global financial system, hastening the shift towards digital financial services and introducing new challenges [7,8].
Between 2000 and 2015, there was a substantial surge in the amount of money lost to credit and debit card fraud. Carta et al. [9] emphasize that while illegal transactions and counterfeit CCs make up only 10–15% of all frauds, they account for an overwhelming 75–80% of financial fraud damages. This has increased private and public investment in developing more advanced systems to detect fraudulent activities. The large monetary transaction volumes in E-commerce make it a prime target for fraud, with potentially substantial economic losses. A Juniper Research report shows an alarming increase in fraud-related financial loss, soaring from $17.5 billion in 2020 to $20 billion in 2021. This underscores the urgent imperative for financial institutions to bolster their credit card fraud detection (CCFD) measures without delay [10].
Fraud is an unauthorized action marked by deceit. Credit card fraud (CCF) involves illegally obtaining a cardholder’s details through phone calls, text messages, or online hacking to carry out unauthorized transactions. These fraudulent acts typically involve software controlled by the fraudster. The process of detecting CCF starts when a customer makes a purchase using their information; this transaction must be verified to ensure its authenticity [10,11,12].
Statistics reveal that in the fourth quarter of 2020, Visa and Mastercard had distributed 2287 million CCs globally, as shown in Figure 1a,b: Visa had issued 1131 million cards, while Mastercard had issued 1156 million. These figures highlight the growing ease and popularity of card-based transactions among consumers. However, they also indicate a potential risk, as this significant transaction volume attracts fraudsters looking to exploit card users [13].
Research efforts are centered on creating detection systems that utilize ML, data mining (DM), and deep learning (DL) approaches. These systems analyze transactions to distinguish between legitimate and deceptive ones. As fraudulent transactions increasingly mimic legitimate ones, the challenge of detecting CCF grows, necessitating the adoption of more sophisticated fraud detection (FD) technologies by CC companies. An effective and accurate fraud detection system (FDS) for real-time fraud identification can generally be divided into two main types: anomaly (outlier) detection systems and misuse (classification-based) detection systems [11].
Financial entities issuing CCs or overseeing online payments must implement automatic FD mechanisms. This practice not only cuts down on financial losses but also boosts customer confidence. Thanks to the advent of artificial intelligence and big data, there are now innovative opportunities to employ sophisticated ML algorithms for identifying fraudulent activities [14]. Current FD technologies leverage sophisticated DM, ML, and DL techniques to achieve high efficiency. These systems use a binary classification approach, utilizing datasets with transactions labeled as normal or fraudulent. The model created through this process can then determine the legitimacy of new transactions. However, employing classification methods to identify fraudulent activities comes with its own set of challenges [15,16].
Automated fraud-identification mechanisms play a crucial role for entities providing CC services or managing online payments, as they reduce financial losses and enhance consumer trust. The advent of big data and advancements in artificial intelligence have paved the way for using complex ML models to spot fraudulent activities. Bao and colleagues (2022) [14] have demonstrated that the most recent fraud detection systems (FDSs), which utilize sophisticated DM, ML, and DL techniques, are highly efficient. The present study explores CCF, a critical type of banking fraud. CCs, a major form of online payment globally, have made electronic transactions easier and, in turn, have led to increased fraud perpetrated by cybercriminals. The illegal use of CC systems or data, frequently without the card owner’s awareness, is an escalating issue affecting many banks and financial institutions worldwide.
Feature selection (FS) is a crucial preprocessing step aimed at addressing the issue of irrelevant features detrimentally affecting the performance of ML models. This process involves pinpointing and removing unnecessary attributes to decrease the dimensionality of the feature set without compromising the model’s accuracy. Numerous strategies have been devised for classifying datasets, among which meta-heuristic optimization algorithms (MHOAs) have stood out for their proficiency in solving a broad spectrum of optimization challenges [17].
MHOAs are optimization techniques that aim to identify optimal or near-optimal solutions to a wide variety of optimization challenges. These methods are derivative-free, which underlies their ease of use, versatility, and ability to avoid getting stuck at local optima. MHOAs employ a stochastic approach, starting their optimization journey with randomly generated solutions, in contrast to gradient-based search methods that rely on calculating derivatives within the search space. Their straightforwardness, rooted in fundamental concepts and ease of implementation, renders these algorithms adaptable and easy to tailor to specific problems. A distinctive feature of MHOAs is their superior ability to avoid premature convergence: thanks to their stochastic nature, they can effectively operate as a black box, evading local optima and extensively probing the search space [18,19,20,21,22,23].
Employing MHOAs to handle FS problems can significantly mitigate obstacles in data analytics. FS plays a pivotal role in identifying pertinent features within imbalanced datasets, especially before classifying CCF instances in large datasets. The key advantages of FS include easier data understanding, decreased training time, and relief from high-dimensionality concerns. Bio-inspired algorithms, which excel at solving intricate combinatorial challenges, have been applied successfully to CCF detection.
1.1. Motivations
This study addresses the challenge of data imbalance in CCFD by implementing advanced meta-heuristic optimization (MHO) techniques to refine classifier performance. Random forest (RF) and support vector machine (SVM) classifiers are utilized, leveraging their proven effectiveness across various domains [24,25,26]. The core of the experimentation involved 135 variants, obtained by combining the MHO techniques with 9 typical S-shaped and V-shaped transfer functions, each designed to enhance the performance of these MHO techniques.
These variants were assessed using CCF benchmark datasets from the Kaggle repository, with RF and SVM classifiers acting as fitness evaluators. This robust evaluation pinpointed the sailfish optimizer (SFO) as a standout among 15 renowned MHO algorithms, including brown-bear optimization (BBO), African vultures optimization (AVO), and others. SFO was particularly notable for its ability to reduce feature size by up to 90% while achieving classification accuracy as high as 97%.
The results of this investigation are critical for improving the security of CC transactions, thus enhancing e-commerce reliability worldwide. Utilizing the European credit cardholders dataset from Kaggle, the study demonstrated the algorithm’s capability to identify critical features accurately and reliably. Each algorithm was tested through 30 separate executions using the RF and SVM classifiers, ensuring the consistency and reliability of the model in detecting CCF. This research provides a significant foundation for future enhancements in FD technologies.
1.2. Contributions
This study offers significant advancements in addressing the challenges of data imbalance in CCFD, as outlined in the following contributions:
Employing data from the European cardholder dataset on Kaggle, this work evaluates the performance of various techniques to address data imbalance. These techniques are assessed based on mean classification accuracy, the number of selected features, fitness values, and computational time.
Implementation and testing of 15 MHO algorithms, each enhanced by nine different transfer functions, are conducted to identify the most significant features for managing data imbalance within the dataset.
Evaluation of two machine learning techniques, random forest (RF) and support vector machine (SVM), assessing their effectiveness on features selected by the MHO algorithms to validate their robustness in improving model performance under conditions of data imbalance.
Document significant improvements in classification accuracy, notably with the sailfish optimizer combined with the random forest (SFO-RF) approach, achieving up to 97% accuracy. This highlights the effectiveness of the proposed methods in overcoming the challenges posed by imbalanced datasets.
1.3. Structure
The remainder of the paper is organized as follows:
Section 2 reviews current research on FD.
Section 3 presents the proposed techniques.
Section 4 discusses the findings and analysis of experimental results. Finally,
Section 5 concludes the paper, outlining implications derived from the results.
2. Related Work
Many research studies have recently emerged that review existing strategies for FD and prevention. This section highlights various academic studies centered on FD, focusing on those addressing the issue within the framework of class imbalance challenges. A wide range of methods has been utilized to uncover fraudulent financial transactions. To review the most relevant literature effectively, it is helpful to categorize the critical methodologies into several distinct groups, such as ML, strategies for detecting CCF, ensemble methods, feature ranking techniques, and methods for user authentication.
Zojaji et al. [
27] have organized the methods for detecting fraud in CC transactions into two primary categories: supervised and unsupervised. They provided a comprehensive classification of the techniques mentioned in the studies, focusing on the categories and the applied datasets. However, they did not suggest any new methods themselves. On the other hand, Adewumi et al. [
28] reviewed nature-inspired ML methods for detecting fraud in online CC transactions. Their review spanned studies from 1997 to 2016, focusing primarily on the essential techniques and algorithms that emerged between 2010 and 2015 without delving into the methodologies. Similarly, Chilaka et al. [
29] investigated methods for FD in CCs within the e-banking industry, focusing on pertinent studies from 2014 until 2019 to summarize the approaches taken. They concentrated on solutions that emphasized a quick, efficient response. However, their review was not conducted systematically and did not include a classification system.
Khalid et al. [
30] have developed an ensemble approach using various ML algorithms to improve CCFD. This method leverages different algorithms’ strengths to enhance the precision and dependability of FD mechanisms. Through detailed experiments and analysis, the authors illustrate how their method can effectively tackle the difficulties of identifying fraudulent activities in CC transactions.
Abdul et al. [
31] introduced a federated learning (FL) framework tailored explicitly for detecting CCF, including methods for balancing data to enhance effectiveness. FL allows for training models on multiple decentralized data sources without the need to gather the data in a single location, thereby protecting data privacy. The researchers applied methodologies to overcome the challenge of uneven data distribution, a common issue in FD data. Their experiments and analysis show that their method not only maintains privacy but also significantly increases FD’s accuracy by addressing the data imbalance.
Chen et al. [
32] have proposed a method for CCFD that leverages sampling alongside self-supervised learning approaches. Intelligent sampling is used to refine the choice of samples for training, thereby enhancing the training process’s efficiency. Meanwhile, self-supervised learning is applied to extract valuable features from data that have not been labeled, which is crucial for identifying illegal transactions. Through the research, the team has shown that their methodology improves the accuracy of FD, reduces the computational effort required, and lessens the dependency on labeled datasets.
Taha et al. [
33] developed an intelligent technique for identifying fraud in CC transactions using a refined version of the light gradient boosting machine (LightGBM) algorithm. LightGBM is known for its speed and precision in processing vast amounts of data, and the authors further enhance its effectiveness by fine-tuning its settings. Their research and testing reveal that this optimized method significantly improves the accuracy of FD while keeping the computational demands manageable. This strategy presents a viable option for banks and other financial entities aiming to upgrade their fraud-detection systems.
In their work, Rawashdeh et al. [
34] developed an effective technique for identifying CCF, employing a combination of evolutionary algorithms for selecting the best features and random weight networks for classification. This approach aims to enhance FD precision through careful choice of the most significant features and to optimize the network weights while minimizing the effort consumed. The effectiveness of this method in accurately spotting CCF is demonstrated through various tests and assessments, highlighting its value for financial security.
Kennedy et al. [
35] addressed the significant class imbalance in CCFD datasets by introducing a method for synthesizing class labels. This imbalance often results in models that are biased and ineffective at identifying fraudulent transactions. To address this, the team created synthetic examples of the under-represented class, effectively evening out the dataset. They applied ML algorithms to these balanced datasets to train their models. Through experimental analysis, they showed that their method significantly enhances the performance of systems that detect CCF, making a noteworthy contribution to the FD domain.
Aziz et al. [
36] delved into several DM approaches to identify CCF, concentrating on a range of ML tactics, including RF, SVM, hybrid methods, decision tree (DT), and DL. These techniques were employed to detect common patterns in consumer behavior from historical data. Their analysis of ML strategies revealed notable disparities across various studies and pointed out potential future directions for investigation.
Singh et al. [
7] introduced a model for CCFD that integrates a two-stage process involving SVM and an optimization technique inspired by firefly behavior. Initially, the model employs the firefly technique alongside the CfsSubsetEval technique to refine the FS. Subsequently, an SVM classifier is utilized to build the CCFD model in the second stage. This approach achieved a classification accuracy of 85.65% on 591 transactions.
Nguyen et al. [
37] utilized a DL strategy that incorporates long short-term memory (LSTM) and convolutional neural network techniques to successfully identify CCF across various datasets, including the European, small, and tall card datasets. To combat the challenge of imbalanced classes, the research employed sampling methods, which, while reducing efficacy on unseen data, enhanced performance on familiar samples. The study demonstrates the proposed DL methods’ capability to detect CCF in real-world applications, outperforming traditional ML models. Among all tested algorithms, the LSTM model with 50 units was highlighted for its superior performance, achieving an F1 score of 84.85%.
Ahmed et al. [
38] explored using FS to identify intrusions in wireless sensor networks. They utilized particle swarm optimization (PSO) and principal component analysis space for this purpose, and they also compared its effectiveness with that of the genetic algorithm (GA). Rtayli et al. [
39] developed an advanced method for identifying CC risk, employing algorithms such as DF and SVM to detect fraud.
Misra et al. [
40] and Schlör et al. [
41] have balanced datasets for detection models by applying under-sampling methods. A significant drawback of under-sampling is its propensity to exclude valuable instances from the training dataset, potentially diminishing detection accuracy. Conversely, isolation techniques have been utilized to approximate data distribution and construct a model with a diverse mixture of components. Such strategies for detecting outliers have proven effective in identifying fraud, as shown by Buschjäger et al. [
42]. However, comprehensive evaluations of recent ML algorithms that leverage under-sampling to mitigate imbalance are notably absent from the research; hybrid semi-supervised methods, which merge supervised learning with unsupervised outlier detection, remain significantly underutilized, as does the systematic assessment of FDSs.
Hajek et al. (2022) [
43] focused on creating FDSs utilizing XGBoost, which also evaluates the financial implications of such systems. This system underwent extensive testing on a dataset of over 6 million mobile transactions. To determine their model’s effectiveness, they compared the proposed model to other ML strategies designed for managing imbalanced datasets and identifying outliers. Their research showed that a semi-supervised ensemble model, combining unsupervised outlier detection techniques with an XGBoost technique, surpassed the performance of other models regarding standard classification metrics. The most substantial cost reduction was achieved by integrating random under-sampling with the XGBoost approach.
Krim et al. [
44] described an autoencoder as a particular kind of neural network that learns to encode and decode data. This approach includes training autoencoders exclusively on non-anomalous data points and relies on evaluating reconstruction errors to classify instances as either ’fraud’ or ’no fraud’. This suggests that in situations not previously encountered by the system, there is a greater chance of detecting anomalies [
2]. A reconstruction error slightly above the maximum threshold is typically marked as unusual. This method has been utilized in autoencoder-based frameworks for identifying anomalies. Within the realm of ML, a generative adversarial network (GAN) consists of two neural networks competing to enhance each other’s predictive abilities. Mainly unsupervised, GANs are trained through an adversarial zero-sum game.
These investigations utilized various ML techniques, such as SVM, DT, RF, naive Bayes (NB), logistic regression (LR), LightGBM, and multilayer perceptron (MLP). Moreover, firefly optimization, SMO, and hybrid sampling were employed. The findings are predominantly reported in terms of accuracy, showcasing high levels of success in numerous cases. For example, Singh et al. reached an accuracy of 85.65%, whereas Balogun et al. recorded accuracies of 97.50% with SVM and 98.60% with RF. Although various methods are used across these studies, a significant number have proven effective in detecting fraud.
The comprehensive analysis of the existing literature underlines a crucial realization: the landscape of FD, especially within the realm of CC transactions, is rapidly evolving in response to the equally dynamic tactics of fraudsters. The shift from traditional statistical methods to more intricate approaches, such as ML and MHO, marks a significant turning point in the battle against financial fraud. These advancements are not merely incremental; they are pivotal in addressing the persistent challenge of imbalanced data, which significantly undermines the effectiveness of detection systems. As such, the field stands on the brink of a new era in FD, where deploying sophisticated algorithms could redefine security standards in digital financial transactions. The ongoing innovation in this sector is vital, promising not only to enhance accuracy but also to fortify the resilience of economic systems against cyber-fraud threats.
3. Proposed Model
The diagram depicts a comprehensive workflow for FS in the context of FD using MHO algorithms. FS is a critical preprocessing step aimed at reducing the dimensionality of high-dimensional datasets by eliminating irrelevant and redundant features, thereby enhancing the performance of ML models.

To address the class imbalance, we adopted an under-sampling technique. This involved randomly sampling an equal number of instances from the minority class (fraudulent transactions) and the majority class (non-fraudulent transactions). By equalizing the class distribution in the dataset, we aimed to enhance the performance of classification models in detecting fraudulent activities. The dataset was partitioned into subsets based on the class label, segregating fraud and non-fraud transactions. Then, an equivalent number of instances (492 instances) was randomly sampled from the majority class to match the minority class. The sampled subsets from both classes were combined to form a balanced dataset, which was shuffled to introduce randomness and prevent bias in subsequent analyses.

The FS phase involves the application of different well-known MHO algorithms, including brown-bear optimization (BBO), African vultures optimization (AVO), Aquila optimization (AO), sparrow search algorithm (SSA), artificial bee colony (ABC), particle swarm optimization (PSO), bat algorithm (BA), grey wolf optimization (GWO), whale optimization algorithm (WOA), grasshopper optimization algorithm (GOA), sailfish optimizer (SFO), Harris hawks optimization (HHO), bird swarm algorithm (BSA), atom search optimization (ASO), and Henry gas solubility optimization (HGSO). These algorithms are designed to efficiently search for optimal feature subsets while considering the combinatorial nature of the problem and the exponential increase in computational time with problem complexity. The MHO techniques are evaluated against nine common S-shaped and V-shaped transfer functions (TFs) to produce multiple variants.
These variants are assessed using random forest (RF) and support vector machine (SVM) classifiers as fitness evaluators. Finally, the performance of the top-performing algorithm variants for each classifier is compared with 15 MHO algorithms. This comparative analysis provides insights into the effectiveness and robustness of the MHO algorithms for FS in FD applications, as shown in
Figure 2.
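To make the TF-based variants concrete, the following minimal Python sketch shows one common S-shaped and one common V-shaped transfer function used to binarize a continuous MHO position into a feature mask, together with a typical wrapper-style fitness function. The function names, the |tanh| V-shape, and the weighting parameter alpha are illustrative assumptions, not details taken from this study's implementation.

```python
import numpy as np

# Hypothetical sketch (not the paper's code): binarizing a continuous MHO
# position vector with S-shaped and V-shaped transfer functions, then
# scoring the resulting feature subset with a wrapper-style fitness.

def s_shaped(x):
    """S-shaped (sigmoid) transfer function: maps a position to [0, 1]."""
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))

def v_shaped(x):
    """V-shaped transfer function: |tanh(x)|, also mapping to [0, 1]."""
    return np.abs(np.tanh(np.asarray(x, dtype=float)))

def binarize(position, tf, rng):
    """Convert a continuous position into a 0/1 feature mask by comparing
    the transfer-function output against uniform random draws."""
    prob = tf(position)
    return (rng.random(prob.shape) < prob).astype(int)

def fitness(mask, error_rate, n_features, alpha=0.99):
    """Typical wrapper-FS objective (lower is better): weighted
    classification error plus a penalty on the selected-feature ratio."""
    return alpha * error_rate + (1.0 - alpha) * mask.sum() / n_features
```

In a full pipeline, `error_rate` would be the validation error of the RF or SVM classifier trained on the features selected by `mask`, so the objective rewards subsets that are both accurate and small.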
3.1. Data Preprocessing
The preprocessing steps involve addressing class imbalance and preparing the dataset for feature selection. An under-sampling technique is employed to handle the severe class imbalance in the dataset, where an equal number of instances from both fraudulent and non-fraudulent classes are randomly sampled, resulting in a balanced dataset. This ensures that the models are trained on representative data from both classes. Additionally, the dataset is partitioned based on the class label, and the sampled subsets are combined and shuffled to introduce randomness. Following the preprocessing steps, a balanced dataset was obtained, comprising 984 instances (492 frauds and 492 non-frauds), wherein the occurrences of both fraudulent and non-fraudulent transactions are approximately equal, as in
Table 1. This balanced dataset is the foundation for subsequent analyses, which include feature engineering, model training, and evaluation.
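The balancing step described above can be sketched as follows. This is a minimal illustration of random under-sampling in plain Python with hypothetical variable names and a fixed seed for reproducibility; it is not the study's actual code.

```python
import random

# Minimal sketch of the random under-sampling step described above, using
# plain Python lists of (features, label) tuples; names and the fixed seed
# are illustrative, not taken from the study's implementation.

def undersample(dataset, seed=42):
    """Balance a binary-labeled dataset by sampling the majority class
    down to the minority-class size, then shuffling the result."""
    rng = random.Random(seed)
    fraud = [row for row in dataset if row[1] == 1]
    legit = [row for row in dataset if row[1] == 0]
    minority, majority = (fraud, legit) if len(fraud) <= len(legit) else (legit, fraud)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)  # avoid ordering bias in later train/test splits
    return balanced
```

Applied to a dataset with 492 fraudulent transactions, this procedure yields the 984-instance balanced set described above.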
3.2. MHO Algorithms for FS
The advantage of using MHO algorithms lies in their ability to pinpoint the key features within the data. In this study, we employed 15 distinct MHO algorithms to determine the crucial features needed for the accurate prediction of CCF and to identify which features, when eliminated, could improve or maintain the system’s predictive performance at its peak. A brief description of each algorithm is provided below.
Brown-Bear Optimization (BBO): The BBO algorithm is an MHO approach first presented in [
18]. This method is unique because it is inspired by the specific abilities of brown bears to distinguish and sniff out scents, a trait not seen in other bear species. These special abilities have been translated into mathematical models to build the BBO technique, effectively replicating the natural behaviors of bears. Brown bears showcase significant intelligence through their ability to differentiate between various smells, using their sense of smell as a critical form of communication. They demonstrate pedal scent differentiation throughout their territories, each group displaying unique behaviors. These include specific walking patterns, deliberate stepping, and manipulation of their feet on the ground, all contributing to their ability to distinguish scents. Moreover, brown bears exhibit a sniffing pedal differentiation behavior, where group members prefer to engage in sniffing activities. The efficiency of the optimization algorithm is dependent on its ability to exploit and explore. The BBO algorithm’s exploitation aspect is inspired by the behavior of differentiating scents through pedals. In contrast, its exploration aspect is akin to the act of sniffing out differences in pedal scents.
African Vultures Optimization (AVO): AVO is a novel swarm-based optimization technique [
45] inspired by the foraging and hunting behavior of African vultures, which scavenge for weak animals and carcasses. These birds exhibit diverse traits and are classified into three groups based on strength, with the strongest having the highest chances of securing food. Vultures employ rotational flight to cover vast distances and locate food sources, often using aggressive tactics to access prey. AVO mimics these behaviors to optimize search processes in various problem-solving scenarios [
46].
Aquila Optimization (AO): AO is introduced [
47] as a novel MHO algorithm inspired by the hunting strategies and behaviors of the Aquila genus, which includes eagles known for their keen vision, agility, and efficiency in capturing prey. The algorithm adapts these characteristics into a computational framework for solving optimization problems. By mimicking the efficient hunting techniques of eagles, the Aquila optimizer aims to offer a powerful and practical approach to optimization tasks, potentially outperforming existing algorithms in terms of convergence speed, solution quality, and robustness across various problem domains.
Sparrow Search Algorithm (SSA): SSA [
48] is an MHO method inspired by the social behavior and interactions of bird swarms, particularly sparrows. Sparrows, found globally and often living near human habitats, are omnivorous birds known for feeding on weed or grain seeds. They exhibit intelligence and memory, employing anti-predation and foraging behaviors. Captive sparrows are categorized into producers, which actively seek food sources, and scroungers, which obtain food from producers; individual sparrows flexibly switch between these roles while using similar foraging strategies. In SSA, each sparrow monitors the behavior of its neighbors, and competition over high-intake food sources occurs within the flock. Sparrows adopt different foraging strategies to optimize energy use and increase food intake, with weaker sparrows benefiting from these strategies. Sparrows in the search space are vulnerable to predator attacks and must seek safer locations. They exhibit natural curiosity and vigilance, emitting warning chirps to alert the group of danger and prompting it to fly away from potential threats. Based on these observed behaviors, a mathematical model is formulated to construct the SSA algorithm, which leverages these principles to optimize search processes in various problem-solving scenarios [
49].
Artificial Bee Colony (ABC): ABC was proposed by Karaboga in 2005 as a model [
50] of the bee colony’s foraging behavior. Inspired by the intelligent foraging behaviors of honey bees, the ABC algorithm’s search process comprises three primary phases: dispatching forager bees to assess nectar quantity, sharing information with onlooker bees, and deploying scout bees to explore potential new food sources. This algorithm is part of a broader trend of algorithms inspired by insect colonies’ foraging behavior under the “survival of the fittest” rule. This algorithm boasts easy implementation, minimal control parameters, and robust stability [
51].
Particle Swarm Optimization (PSO): PSO is an effective and straightforward optimization technique inspired by the social behavior of animals like birds and fish. It has been widely applied across numerous fields, such as ML, image processing, data mining, robotics, etc. PSO was initially introduced by Eberhart and Kennedy in 1995 [
52], drawing on models that mimic the collective behavior observed in natural species. As a result, PSO has found application across a broad range of industries for tackling various optimization challenges [
53].
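As a concrete illustration of the PSO update described above, the canonical velocity and position rules can be sketched as follows (a minimal example; the parameter values and the toy sphere objective are illustrative, not those used in this paper's experiments):

```python
import numpy as np

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=None):
    # One PSO iteration: the new velocity blends inertia (w), a pull toward
    # each particle's personal best (c1), and a pull toward the global best (c2).
    rng = np.random.default_rng() if rng is None else rng
    r1 = rng.random(x.shape)
    r2 = rng.random(x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v, v

# Toy usage: minimize the sphere function f(x) = sum(x^2).
rng = np.random.default_rng(42)
x = rng.uniform(-5, 5, size=(20, 2))      # 20 particles in 2 dimensions
v = np.zeros_like(x)
pbest = x.copy()
pbest_f = (x ** 2).sum(axis=1)
for _ in range(100):
    gbest = pbest[pbest_f.argmin()]
    x, v = pso_step(x, v, pbest, gbest, rng=rng)
    f = (x ** 2).sum(axis=1)
    improved = f < pbest_f
    pbest[improved] = x[improved]
    pbest_f[improved] = f[improved]
```

With these settings the swarm's best fitness shrinks toward zero, mirroring the collective convergence behavior the algorithm borrows from bird flocks.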
Bat Algorithm (BA): BA, a meta-heuristic approach inspired by the echolocation of bats, was introduced by Yang in 2010 [
54]. This method draws inspiration from the echolocation behavior of microbats, which is characterized by varying pulse emission rates and loudness. Moreover, it incorporates principles of swarm intelligence (SI), influenced by observations of bats. Typically, bats utilize short, intense sound pulses during nocturnal hunts to locate obstacles or prey through the echoes these pulses generate. Additionally, the unique auditory system of bats enables them to ascertain the size and position of objects [
55].
Grey Wolf Optimization (GWO): The GWO algorithm was introduced in 2014 [
56] and has since become one of the most widely used SI-based algorithms. Its inspiration comes from grey wolves' natural hunting behavior, which efficiently tracks and captures prey. The algorithm mimics the social hierarchy within a wolf pack to assign various roles during the optimization process. These roles are categorized into four groups: alpha, beta, delta, and omega, with alpha, beta, and delta representing the best candidate solutions guiding the search [
57].
Whale Optimization Algorithm (WOA): Mirjalili and Lewis developed WOA [
58], designed to tackle numerical optimization challenges. It incorporates three distinct mechanisms inspired by the feeding strategies of humpback whales: prey detection, prey encirclement, and bubble-net hunting. WOA aims to pinpoint the optimal solution for specific optimization problems by deploying a group of search agents. What sets WOA apart from similar algorithms are the unique rules it applies to enhance potential solutions at every step of optimization. Mimicking the predatory tactics of humpback whales, WOA zeroes in on and captures prey using a method referred to as bubble-net feeding [
59].
Grasshopper Optimization Algorithm (GOA): Saremi et al. [
60] introduced GOA, which draws inspiration from the natural foraging and swarming behavior of grasshoppers. This algorithm stands out due to its adaptive mechanism, balancing the exploration and exploitation processes. Due to these features, GOA has the potential to navigate the complexities of multi-objective search spaces more efficiently than other strategies. Moreover, it boasts a lower computational complexity than many current optimization methods [
61].
Sailfish Optimizer (SFO): SFO [
62] is a meta-heuristic algorithm that mimics the hunting behavior of sailfish preying on sardines. This hunting strategy aids predators in conserving energy. The algorithm features two populations: sailfish and sardines. The sailfish represent candidate solutions, with their positions in the search space corresponding to problem variables. SFO aims to randomize the movement of both the sailfish and sardines. Sailfish are dispersed throughout the search area, while the positioning of sardines assists in locating the best solution within the search space [
63].
Harris Hawks Optimization (HHO): HHO is a developed population-based optimization method inspired by the cooperative hunting behavior of Harris hawks in nature. Heidari et al. [
64] introduced HHO to simulate the dynamic teamwork and hunting strategies of these hawks, which include techniques such as tracing, encircling, approaching, and attacking prey. In this model, the hawks' pursuit efforts represent agents navigating the search area, with the prey representing the optimal solution. HHO effectively addresses various real-world optimization challenges and can handle discrete and continuous domains. It can explore uncharted search spaces and achieve high-quality solutions, making it suitable for tasks requiring optimal parameter extraction. Overall, HHO demonstrates promising performance and offers a novel approach to solving optimization problems inspired by nature's cooperative behaviors [
65].
Bird Swarm Algorithm (BSA): Meng et al. [
66] unveiled a novel MHO strategy known as BSA for tackling continuous optimization issues. This approach is inspired by SI, which originates from the collective behaviors and interactions observed in bird swarms. By emulating the search for food, vigilance, and flight patterns of birds, BSA effectively leverages SI drawn from these avian swarms to address various optimization challenges [
67].
Atom Search Optimization (ASO): ASO is presented as an optimization method inspired by molecular dynamics. In this approach, the search space is navigated by the position of atoms, each representing a potential solution, evaluated based on its mass or “heaviness” [
68]. The interaction between atoms is determined by their proximity, leading to either attraction or repulsion. This dynamic causes lighter atoms to move towards the heavier ones. Additionally, heavier atoms have a slower movement, so they are more efficient in thoroughly searching local areas for improved solutions. On the other hand, the rapid movement of lighter atoms enables them to explore new and broader areas of the search space more effectively.
Henry Gas Solubility Optimization (HGSO): Hashim et al. [
69] introduced the HGSO algorithm in 2019. HGSO is an MHO algorithm drawing inspiration from Henry's law to mimic the behavior of gas particles [
70]. It employs gas clustering behavior to effectively balance exploitation and exploration within the search space, thus mitigating the risk of converging to local optima [
71].
3.3. ML Techniques
This section outlines the ML classifiers employed in the research to evaluate the subset of selected features regarding classification accuracy and fitness values.
3.4. Transfer Functions (TFs)
As the final solution acquired through the different MHO techniques comprises continuous values, MHO techniques cannot address an FS problem directly. Thus, it becomes essential to employ a mapping (transfer) function to convert these continuous values into binary 0s or 1s. Transfer functions (TFs) [
79] dictate the rate of change of the decision variable values from 0 to 1 and vice versa. Two common families of transfer functions used for this purpose are S-shaped functions, so named because their graphical representation resembles the letter 'S', and V-shaped transfer functions. S-shaped functions map continuous values to probabilities, which can then be converted into binary values through a thresholding process. The output of an S-shaped TF lies between 0 and 1, representing the probability of including a feature. V-shaped transfer functions, on the other hand, are characterized by their 'V'-shaped graphical representation. These functions also map continuous values to binary decisions but follow a mathematical approach different from that of S-shaped functions: their output determines the likelihood of a feature changing its current state.
When selecting a TF for the conversion of continuous to binary values, several considerations must be taken into account from the MHO techniques perspective, as follows:
The range of values from a TF should be between 0 and 1, representing the probability of a feature-changing state.
If the evaluation metric for the feature indicates suboptimal performance, the TF should show a higher probability of changing the current state in the next iteration.
When a feature is considered optimal, the TF should have a low probability of changing its current state.
The probability generated by the TF should rise as the evaluation metric approaches a threshold value. This enables less optimal features to have a higher likelihood of changing their state, which helps move towards more optimal solutions in subsequent iterations.
The probability derived from a TF must decrease as the evaluation metric moves away from the threshold value.
These concepts demonstrate the high capability of TFs to convert the continuous search process into a binary one for each dimension $x_i^j$, using Equation (1):

$$
x_i^j(t+1) =
\begin{cases}
\begin{cases} 0, & r < T\!\left(x_i^j(t)\right) \\ 1, & \text{otherwise} \end{cases} & \text{S-shaped TF} \\[2ex]
\begin{cases} \neg\, x_i^j(t), & r < T\!\left(x_i^j(t)\right) \\ x_i^j(t), & \text{otherwise} \end{cases} & \text{V-shaped TF}
\end{cases}
\tag{1}
$$

where $x_i^j(t)$ represents the $j$-th dimension of the $i$-th individual at the current iteration $t$, $r$ is a number selected randomly from the range $[0,1]$, and $T(x_i^j(t))$ is the probability value obtained when applying a given TF to the $j$-th component's continuous value of agent $i$. It is clear from Equation (1) that there are two cases: (i) if the TF is S-shaped and $r$ is less than the probability returned by the TF, the $j$-th dimension is set to 0; otherwise, it is set to 1; and (ii) if the TF is V-shaped and $r$ is less than the probability returned by the TF, the $j$-th dimension is negated; otherwise, it remains unchanged. Thus, continuous variables are successfully mapped into binary values using the S-shaped and V-shaped TFs and Equation (1).
Table 2 reports the families of TFs, while
Figure 3 exhibits these two families, divided into S-shaped and V-shaped TFs. Note that the proposed MHO techniques were evaluated based on the nine TFs whose mathematical expressions are shown in
Table 2.
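As an illustrative sketch of how such TFs operate (using the common sigmoid as an S-shaped stand-in and |tanh| as a V-shaped stand-in; these specific functions are assumptions and not necessarily among the nine TFs of Table 2), the binarization rules can be implemented as:

```python
import numpy as np

def s_shaped(x):
    # Sigmoid-style S-shaped TF: continuous value -> probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def v_shaped(x):
    # A common V-shaped TF: |tanh(x)|
    return np.abs(np.tanh(x))

def binarize_s(x_cont, rng):
    # S-shaped rule of Equation (1): dimension set to 0 when r < T(x), else 1
    r = rng.random(x_cont.shape)
    return np.where(r < s_shaped(x_cont), 0, 1)

def binarize_v(x_cont, x_bin, rng):
    # V-shaped rule of Equation (1): flip the current bit when r < T(x), else keep it
    r = rng.random(x_cont.shape)
    return np.where(r < v_shaped(x_cont), 1 - x_bin, x_bin)
```

Note the design difference this makes concrete: the S-shaped rule assigns a bit from the probability alone, while the V-shaped rule decides whether to *change* the existing bit, which is why it needs the current binary state as an extra argument.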
3.5. Sampling Technique
The dataset used in this study contains 284,807 transactions with a severe class imbalance: only 492 instances are fraudulent. Under-sampling was chosen because it creates a balanced dataset by reducing the number of majority-class instances without introducing synthetic data points, which helps accurately capture the inherent characteristics of both fraudulent and non-fraudulent transactions. Given its computational efficiency and simplicity of implementation, under-sampling is suitable for handling large-scale datasets like ours. It allows us to focus on the intrinsic patterns within the data without the added complexity of generating synthetic instances, which may not accurately represent real-world fraud scenarios. Under-sampling is one of several techniques data scientists can use to extract more accurate information from originally unbalanced datasets [
82].
The steps taken for the under-sampling process are as follows:
Random Sampling: Instances from the majority class (non-fraudulent transactions) were randomly sampled to match the number of instances in the minority class (fraudulent transactions), resulting in a balanced dataset.
Data Partitioning: The dataset was partitioned based on the class label, ensuring equal representation from both classes.
Combining and Shuffling: The sampled subsets were then combined and shuffled to introduce randomness and prevent ordering bias.
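The three steps above can be sketched in a few lines of NumPy (a minimal illustration; the label encoding of 1 = fraud, 0 = non-fraud follows the dataset description):

```python
import numpy as np

def undersample(X, y, rng=None):
    # Steps 1-3 above: randomly sample majority-class rows to match the
    # minority-class count, then combine and shuffle the two subsets.
    rng = np.random.default_rng(0) if rng is None else rng
    idx_minority = np.flatnonzero(y == 1)   # fraudulent transactions
    idx_majority = np.flatnonzero(y == 0)   # non-fraudulent transactions
    keep = rng.choice(idx_majority, size=idx_minority.size, replace=False)
    idx = np.concatenate([idx_minority, keep])
    rng.shuffle(idx)                        # prevent ordering bias
    return X[idx], y[idx]
```

Applied to the CCF dataset this would keep all 492 fraud rows and 492 randomly chosen non-fraud rows, yielding a balanced sample.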
While under-sampling reduces the dataset from the original 284,807 transactions to a balanced sample of 984 instances (492 per class), our approach ensured that this reduction was performed in a manner that preserved the integrity and representativeness of the data:
We employed stratified sampling to ensure that the under-sampling process maintained a proportional representation of both fraud and non-fraud within the reduced dataset. This approach helps mitigate the risk of losing critical information that may be present in the minority class.
Post-sampling, rigorous feature engineering and MHO-based feature selection were applied to maximize the relevance and quality of the retained information. By focusing on the most discriminative features, we aimed to capture and leverage the essential characteristics that differentiate fraudulent from non-fraudulent transactions.
4. Experimental Methodology
This section presents the performance evaluation results of 15 MHO techniques combined with two ML classifiers, RF and SVM, based on nine TFs. The experimental dataset was downloaded from the CCF online dataset on the Kaggle ML Repository. The parameters of the utilized MHO methods and ML classifiers are defined in
Section 4.2.
Section 4.3 states the utilized performance metrics. The experimental results are discussed in
Section 4.4 and
Section 4.5. The convergence curves are shown in
Section 4.7.
4.1. Dataset Description
To assess the robustness of the MHO techniques with nine different TFs (five
and four
functions), the Kaggle dataset was considered in this research. It can be downloaded directly from the Kaggle repository (
https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud (accessed on 16 July 2024)). This dataset was used to create a framework for CCFD. Relevant autonomous information functions and target yield markers are extracted and utilized to detect fraud. The dataset contains transactions made by European credit cardholders in September 2013. It includes 284,807 transactions over two days, of which only 492 are recorded as fraud. This creates a severe class imbalance, with fraudulent transactions accounting for only 0.172% of all transactions. The dataset consists of numerical input variables resulting from a principal component analysis (PCA) transformation. Due to confidentiality constraints, the original features and additional background information about the data are unavailable. The features V1 through V28 represent the principal components obtained through PCA, while 'Time' and 'Amount' are the only features not subjected to PCA transformation. 'Time' denotes the elapsed time in seconds between each transaction and the first transaction in the dataset, while 'Amount' denotes the monetary value of each transaction. The target class is divided into two categories: 1 for fraudulent transactions and 0 for non-fraudulent transactions (see
Table 3). We explored various ML techniques using binary or multiclass datasets in the Python programming language.
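For reference, the class imbalance described above can be inspected in a few lines of pandas (a small sketch; the CSV file name is an assumption based on the Kaggle download):

```python
import pandas as pd

def class_balance(df: pd.DataFrame) -> tuple:
    """Return (non-fraud count, fraud count, fraud ratio) from the Class column,
    where 1 marks fraudulent and 0 non-fraudulent transactions."""
    counts = df["Class"].value_counts()
    n_legit = int(counts.get(0, 0))
    n_fraud = int(counts.get(1, 0))
    return n_legit, n_fraud, n_fraud / len(df)

# Usage on the Kaggle CSV (file name assumed):
# df = pd.read_csv("creditcard.csv")
# class_balance(df)  # expected per the dataset description: (284315, 492, ~0.00172)
```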
4.2. Parameter Settings
Several MHO algorithms based on two ML classifiers were evaluated using nine TFs. These MHO algorithms include BBO, AVO, AO, SSA, ABC, PSO, BA, GWO, WOA, GOA, SFO, HHO, BSA, ASO, and HGSO. Each technique underwent thirty experiments on the utilized dataset due to the stochastic character of meta-heuristic techniques. We documented evaluation metrics based on average results to make a fair comparison between the different methods. We assigned a population size of 10 and a maximum of 100 iterations to all techniques. The problem size is given by the number of features in the benchmark, and the search domain is constrained, allowing exploration within a bounded space.
Our framework used ten-fold cross-validation to ensure the reliability of the outcomes. This approach involves randomly dividing the benchmark into training and testing subsets: the training subset comprises 80% of the data and is used to train the machine learning model, while the test subset evaluates the selected attributes. Each method's configuration and parameter values were based on their initial versions and information from their primary publications, as introduced in
Table 4. Our computing environment utilized Python 3.10, an Intel Core i7 processor, 16 GB of RAM, and an NVIDIA GTX 1050i GPU.
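The evaluation protocol described above can be sketched with scikit-learn (an illustrative reconstruction, not the authors' exact code; function and variable names are our own, and the 80/20 split follows the description above):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def evaluate_subset(X, y, mask, seed=0):
    """Score a binary feature mask (as produced by a wrapper MHO method):
    train an RF on 80% of the data restricted to the selected features,
    then report accuracy on the held-out 20%."""
    X_sel = X[:, mask.astype(bool)]
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_sel, y, test_size=0.2, stratify=y, random_state=seed)
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(X_tr, y_tr)
    return accuracy_score(y_te, clf.predict(X_te))
```

A wrapper FS loop would call `evaluate_subset` once per candidate mask per iteration, which is why the computational time comparisons in Section 4.4 matter.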
4.3. Performance Metrics
Each algorithm was run 30 times on the benchmark data to compare the effectiveness of the 15 MHO algorithms with the RF and SVM classifiers. The following evaluation metrics were utilized for the FS methodology.
The average accuracy ($\mathrm{Avg}_{Acc}$) is evaluated by executing the method for 30 runs and calculating the percentage of correctly classified test samples. The accuracy is determined using the following equation:

$$\mathrm{Avg}_{Acc} = \frac{1}{30} \sum_{k=1}^{30} \frac{1}{m} \sum_{r=1}^{m} \mathrm{match}(C_r, L_r) \tag{2}$$

where $m$ represents the number of samples in the test subset, and $C_r$ and $L_r$, respectively, indicate the predicted and reference class labels for sample $r$. The comparison function $\mathrm{match}(C_r, L_r)$ determines the matching between the predicted and the reference label: if they match, it equals 1; otherwise, it equals 0.
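The average accuracy metric above reduces to a per-run mean of label matches averaged over the 30 runs; a minimal sketch (variable names are illustrative):

```python
import numpy as np

def run_accuracy(predicted, reference):
    # match(C_r, L_r) = 1 when the labels agree, else 0; accuracy is the mean
    return float(np.mean(np.asarray(predicted) == np.asarray(reference)))

def avg_accuracy(per_run_predictions, reference):
    # Average the per-run accuracies over the independent runs
    return float(np.mean([run_accuracy(p, reference) for p in per_run_predictions]))
```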
The average fitness ($\mathrm{Avg}_{Fit}$) is evaluated by implementing the approach in 30 individual trials; it reflects the joint objective of minimizing the number of selected attributes while maximizing the accuracy rate. It is crucial to note that the lowest value represents the best result:

$$\mathrm{Avg}_{Fit} = \frac{1}{30} \sum_{k=1}^{30} Fit_k^{*} \tag{3}$$

where $Fit_k^{*}$ is the optimal fitness value obtained in the $k$-th run.
The average number of features chosen ($\mathrm{Avg}_{Feat}$) indicates the average proportion of features selected by implementing the methodology individually 30 times and is represented as

$$\mathrm{Avg}_{Feat} = \frac{1}{30} \sum_{k=1}^{30} \frac{d_k}{D} \tag{4}$$

where $d_k$ is the number of features chosen in the optimal solution for the $k$-th run, and $D$ is the total number of attributes in the utilized dataset.
The average computational time ($\mathrm{Avg}_{Time}$) shows the execution time in seconds for each algorithm validated over 30 different runs and is represented as Equation (5):

$$\mathrm{Avg}_{Time} = \frac{1}{N} \sum_{i=1}^{N} T_i \tag{5}$$

where $N$ is the number of runs, and $T_i$ is the computational time in seconds at run $i$.
Standard Deviation (STDE): the stability of the average results from the thirty runs of the algorithm on the used dataset is assessed as

$$\mathrm{STDE}_Y = \sqrt{\frac{1}{30-1} \sum_{k=1}^{30} \left( Y_k - \bar{Y} \right)^2} \tag{6}$$

where $Y$ is the metric to be assessed, $Y_k$ is the value of metric $Y$ in the $k$-th run, and $\bar{Y}$ is the average of the metric over 30 independent runs.
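The remaining metrics (average fitness, average feature proportion, average time, and STDE) reduce to simple aggregations over the 30 runs; a compact sketch with illustrative names:

```python
import numpy as np

def avg_fitness(best_fitness_per_run):
    # Mean of the per-run optimal fitness values; lower is better
    return float(np.mean(best_fitness_per_run))

def avg_feature_ratio(n_selected_per_run, n_total):
    # Mean proportion of selected features across runs
    return float(np.mean(np.asarray(n_selected_per_run) / n_total))

def avg_time(times_per_run):
    # Mean execution time in seconds across runs
    return float(np.mean(times_per_run))

def stde(values):
    # Sample standard deviation over the independent runs (ddof=1 matches 1/(N-1))
    return float(np.std(values, ddof=1))
```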
In the following sections, we examine the analytical results, highlighting the most promising outcomes in bold.
4.4. Comparisons Based on RF Classifier Using CCF Dataset
In this section, we compare the performance of the fifteen MHO techniques based on the RF classifier. The comparison evaluates the average classification accuracy, the average number of selected features, the average fitness, and the average computational time. The aim is to assess the impact of the MHO algorithms in choosing the most relevant features.
Firstly, the classification accuracy is estimated using the original RF classifier, without MHO techniques, on the full feature set of the CCF dataset. The original RF classifier achieved a classification accuracy of 0.9328. On the other hand,
Table 5 shows the performance analysis of the RF classifier with 15 MHO techniques regarding different TFs based on the average classification accuracy to evaluate the impact of the MHO algorithms on the utilized dataset. Remarkably, SFO-RF ranked first with all TFs except
(
achieved 0.9762,
achieved 0.9754,
achieved 0.9773,
achieved 0.9770,
achieved 0.9779,
achieved 0.9768,
achieved 0.9756, and
achieved 0.9765). BBO-RF ranked first regarding
by obtaining 0.9754 classification accuracy.
Secondly,
Table 6 shows the performance analysis of the RF classifier with 15 MHO techniques regarding different TFs based on the average number of selected features on the utilized dataset. It is remarkable that (
) based on the PSO-RF according to
selected 5.0667 features, WOA-RF regarding
selected 9.8667 features, SFO-RF regarding
selected 6.8667 features; furthermore, SFO-RF regarding
selected 7.1333 features, PSO-RF regarding
selected 4.6000 features, AO-RF regarding
selected 11.3667 features, HHO-RF regarding
selected 11.8000 features, AO-RF and PSO-SSA regarding
selected 11.8333 features, and AVO-RF regarding
selected 11.5667 features. Therefore, PSO-RF regarding
selected the least number of chosen features.
Thirdly,
Table 7 shows the performance analysis of the RF classifier with 15 MHO techniques in terms of different TFs based on the average classification fitness to evaluate the impact of the MHO algorithms on the utilized dataset. Remarkably,
based on the SFO-RF method ranked first with all TFs except
(
achieved 0.0262,
achieved 0.0290,
achieved 0.0248,
achieved 0.0252,
achieved 0.0281,
achieved 0.0262,
achieved 0.0275, and
achieved 0.0282). BBO-RF ranked first regarding
.
Finally,
Table 8 shows the performance analysis of the RF classifier with 15 MHO techniques in terms of different TFs based on the average computational time to evaluate the impact of the MHO algorithms on the utilized dataset. Remarkably, the BA-RF method ranked first with five TFs based on
(
took 19,623 ms,
took 16,105 ms,
took 16,243 ms,
took 20,211 ms, and
took 17,714 ms). The AO-RF method ranked first based on
and
. The AVO-RF method ranked first based on
. Finally, the HHO-RF method ranked first based on
.
In the end, the results show that the SFO-RF method achieves the best average accuracy and fitness for 8 of 9 TFs and the best feature size for 2 of 9 TFs. The PSO-RF, SFO-RF, and AO-RF methods each achieve the best feature size for 2 of 9 TFs. For the average computational time measure, the BA-RF method achieved the best result for 5 of 9 TFs, while the AO-RF method achieved the best result for only 2 of 9 TFs.
4.5. Comparisons Based on SVM Classifier Using CCF Dataset
In this subsection, we compare the performance of the SVM classifier with 15 MHO techniques based on the average classification accuracy, the average number of selected features, the average fitness values, and the average computational time to evaluate the impact of the MHO algorithms in improving classification accuracy and choosing the most appropriate features.
Firstly, the classification accuracy of the original SVM classifier (before FS), using the full feature set of the CCF dataset, is 0.5378. On the other hand,
Table 9 shows the performance analysis of the SVM classifier with fifteen MHO techniques regarding different TFs based on the average classification accuracy to evaluate the impact of the MHO algorithms on the utilized dataset. Remarkably,
based on the SFO-SVM method ranked first for four TFs (
achieved 0.9406,
achieved 0.9412,
achieved 0.9401, and
achieved 0.9412). The AO-SVM method ranked first with
by achieving an accuracy of 0.9328. BBO-SVM ranked first with
by achieving an accuracy of 0.9347. BBO-SVM, AVO-SVM, and AO-SVM ranked first for
by achieving an accuracy of 0.9339. BBO-SVM ranked first for
by achieving an accuracy of 0.9347. Finally, AO-SVM ranked first by achieving an accuracy of 0.9339 for
.
Secondly,
Table 10 shows the performance analysis of the SVM classifier and 15 MHO techniques in terms of different TFs based on the average number of selected features on the utilized dataset. It is remarkable that
based on the AO-SVM ranked first for five TFs (
selected 1.5333 features,
selected 1.3667 features,
selected 1.5 features,
selected 4.6333 features, and 5 features were selected by
). SSA-SVM ranked first for two TFs (
selected 1.9 features, and
selected 1.5333). BBO-SVM ranked first according to
by selecting 5.2 features. Finally, BBO-SVM and AVO-SVM ranked first according to
by selecting five features.
Thirdly,
Table 11 shows the performance analysis of the SVM classifier with 15 MHO techniques in terms of different TFs based on the average classification fitness to evaluate the impact of the MHO algorithms on the utilized dataset. Remarkably, (
) based on the SFO-SVM ranked first with 4 TFs (
achieved a fitness value of 0.0595,
achieved a fitness value of 0.0591,
achieved a fitness value of 0.0605, and
achieved a fitness value of 0.0590). AVO-SVM ranked first with
by achieving a fitness value of 0.0636. BBO-SVM ranked first with
,
, and
by achieving fitness values of 0.0664, 0.0672, and 0.0665, respectively. Finally, AO-SVM ranked first by achieving a fitness value of 0.0672 for
.
Finally,
Table 12 shows the performance analysis of the SVM classifier with 15 MHO techniques regarding different TFs based on the average computational time to evaluate the impact of the MHO algorithms on the utilized dataset.
Remarkably, the HHO-SVM method ranked first for five TFs (taking 9258 ms, 6164 ms, 6059 ms, 6221 ms, and 5757 ms). PSO-SVM ranked first for two TFs by taking 6395 ms and 5365 ms. WOA-SVM ranked first for one TF by taking 7940 ms. Finally, the AVO-SVM method ranked first for one TF by taking 8993 ms.
In the end, the results show that the SFO-SVM method achieves the best average accuracy and fitness for 4 of 9 TFs, respectively. At the same time, the BBO-SVM and AO-SVM methods achieve the best accuracy for 3 of 9 TFs. For average feature size, AO-SVM performs best for 5 of 9 TFs, while BBO-SVM and SSA-SVM perform best for 2 of 9 TFs. For average fitness, BBO-SVM performs best for 3 of 9 TFs. HHO-SVM reaches the best computational time for 5 of 9 TFs, while the PSO-SVM method achieves the best result for only 2 of 9 TFs.
Finally, the SFO method achieves the best average accuracy and fitness results for two classifiers, RF and SVM, and with most TFs.
4.6. Comparing with Other Studies That Utilized the Credit European Cardholders Dataset
In this subsection, a detailed comparison of various research efforts on the same CCF Kaggle dataset is presented.
Table 13 illustrates the impact of different methodologies on classification accuracy. Notably, our research (CCFD) distinguishes itself by applying the MHO (SFO) feature selection technique, achieving accuracy rates of 97.79% with the RF classifier and 94.12% with the SVM classifier, employing under-sampling to mitigate data imbalance. This improvement over earlier studies, such as the 2018 research by Lakshmi and Selvani, which achieved 95.50% accuracy using RF and oversampling, and the 2023 studies by Mniai et al., who utilized varied feature selection methods but attained lower accuracies, highlights the effectiveness of MHO (SFO). Our findings suggest that advanced feature selection techniques like MHO (SFO) can significantly boost machine learning performance, providing a robust approach for managing intricate datasets.
4.7. Convergence Investigation
This section examines the performance of 4 methods with 15 MHO techniques (RF
(with TFs), RF
(with TFs), SVM
(with TFs), and SVM
(with TFs)) for handling the FS strategy using the European credit-cardholders dataset. The aim is to evaluate their convergence capabilities, as shown in figures sequentially labeled
Figure 4,
Figure 5,
Figure 6 and
Figure 7. These graphs indicate that the SFO-RF algorithm demonstrated superior and optimal convergence over the dataset compared to its peers, assessed under the same population size and iteration number conditions.
4.8. Evaluating Model Robustness and Generalization
Detecting fraud in financial transactions is a critical task that demands reliable and adaptable predictive models. In this study, we assessed the robustness and generalization ability of our chosen model for fraud detection. After comprehensive experimentation, we selected the best-performing model from our experiments in terms of accuracy, which was trained on a balanced dataset. The chosen model utilized a random forest classifier along with a transfer function and the sailfish optimizer. To evaluate its effectiveness, we tested the model on an imbalanced dataset comprising 20% of the original 284,807 samples. The model achieved an accuracy of 97.14% on this imbalanced test set, demonstrating its robustness and ability to generalize effectively to real-world data distributions.