[1]\fnmVadlamani \surRavi
[1]\orgdivCentre for Artificial Intelligence and Machine Learning, \orgnameInstitute for Development and Research in Banking Technology, \orgaddress\streetCastle Hills Road #1, Masab Tank, \cityHyderabad, \postcode500076, \stateTelangana, \countryIndia
2]\orgdivResearch Laboratory of Electronics, \orgnameMassachusetts Institute of Technology, \orgaddress \postcodeMA, \stateCambridge, \countryUnited States of America
3]\orgdivDepartment of Computer Science and Engineering, \orgnameNational Institute of Technology Warangal, \orgaddress\cityWarangal, \postcode506004, \stateTelangana, \countryIndia
Improved Differential Evolution based Feature Selection through Quantum, Chaos, and Lasso
Abstract
Modern deep learning continues to achieve outstanding performance on an astounding variety of high-dimensional tasks. In practice, this is obtained by fitting deep neural models to all the input data with minimal feature engineering, thus sacrificing interpretability in many cases. However, in applications such as medicine, where interpretability is crucial, feature subset selection becomes an important problem. Metaheuristics such as Binary Differential Evolution are a popular approach to feature selection, and the research literature continues to introduce novel ideas, drawn from quantum computing and chaos theory, for instance, to improve them. In this paper, we demonstrate that introducing chaos-generated variables, generated from considerations of the Lyapunov time, in place of random variables in quantum-inspired metaheuristics significantly improves their performance on high-dimensional medical classification tasks and outperforms other approaches. We show that this chaos-induced improvement is a general phenomenon by demonstrating it for multiple varieties of underlying quantum-inspired metaheuristics. Performance is further enhanced through Lasso-assisted feature pruning. At the implementation level, we vastly speed up our algorithms through a scalable island-based computing cluster parallelization technique.
keywords:
Feature Subset Selection; Differential Evolution; Quantum inspired algorithms; Chaos; Big Data1 Introduction
The fundamental objective of feature selection is to identify the most important and discriminative features from a given set of features. Its prominence has garnered the attention of researchers and practitioners in domains replete with big, high-dimensional datasets. Feature selection [4, 46] entails a great reduction in computational complexity, improvement in human comprehensibility, and easy deployment of models in production. Wrapper methods based on metaheuristics solve the feature subset selection (FSS) problem by posing it as a combinatorial optimization problem. These methods attempt to identify an efficient feature subset with the least cardinality and associated high accuracy among the possible number of combinations, where is the total number of features in the dataset. Among the metaheuristics employed for this purpose, evolutionary algorithms have been proven to be most efficient for determining optimal feature subsets owing to the inherent parallelism present in the population-based search [27, 45].
Empty set | |
---|---|
Population size | |
Population comprising N number of solutions | |
Number of dimensions / features | |
ith solution of population X and having dimensions | |
Mutated vectors at generation comprising solutions | |
solution of mutation population of dimensions | |
F | Mutation factor |
trial vectors at generation t comprising ps solutions | |
solution of mutation population of dimensions | |
Crossover rate | |
Maximum number of iterations | |
Randomly chosen index | |
Chaotic number at tth time step | |
Random number generated between | |
Logistic map control parameter | |
AUCi | AUC score of an ith solution |
cardinalityi | Cardinality score of an solution |
Chaotic random number at time step | |
RDD of population X | |
Train dataset | |
Test dataset | |
Local population size | |
Maximum number of migrations | |
Maximum number of generations | |
Local population / sub-population pertained to a data island comprising lps solutions |
Among the evolutionary algorithms, Differential Evolution (DE) proved to be robust while solving many combinatorial and continuous optimization problems [9, 8, 32]. Despite its supremacy over other algorithms, DE suffers from a tendency to get stuck prematurely in local optimal solutions, which affects its exploration and exploitation capabilities [9]. To alleviate these issues, researchers proposed several quantum-inspired algorithms (QIEA) to solve a myriad of combinatorial optimization problems such as knapsack problem, truck trail problem, and portfolio optimization. Often, QIEAs accelerate the evolution process owing to their quantum parallelization [38, 19] and entanglement of the quantum state. Further, the diversity and convergence rate are improved well enough to increase the probability of getting better global optimal solutions.
In today’s data-rich environment, the humongous growth of high-dimensional datasets warrants the critical need for developing scalable algorithms [17, 36]. Despite the popularity of Hadoop and its ecosystem as a big data framework, Spark’s unique features, including in-memory computing and seamless integration, have made it a viable and competitive alternative to Hadoop [47]. However, the extant Quantum-Inspired Evolutionary Algorithms (QIEAs) are not scalable to large, high-dimensional datasets.
Our contributions in this paper include:
-
•
proposing chaotic, quantum-inspired evolutionary algorithms for FSS in high-dimensional data and demonstrating their superiority over the extant methods on four problems.
-
•
utilizing the Lyapunov exponent to ensure that we work in a truly chaotic regime.
-
•
introducing LASSO LR in place of LR as a classifier for the FSS wrapper
-
•
proposing parallel versions of the above-mentioned algorithms operating in a single-objective environment under an island-based approach in Apache Spark framework.
The paper is structured as follows: Section 2 reviews the literature, Section 3 presents our new algorithms, Section 4 describes the datasets analyzed in the study, Section 5 analyses the results obtained, and section 6 concludes the paper.
No. | Algorithm | Description |
---|---|---|
1 | LR | Logistic Regression |
2 | LLR | Least Absolute Shrinkage and Selection Operator LR |
3 | BDE | non-quantum counterpart of QBDE |
4 | QBDE-I | a non-gate quantum variant of BDE with threshold trick and random numbers generated from the uniform distribution |
5 | QBDE-II | a gate quantum variant of BDE with threshold trick and random numbers generated from uniform distribution |
6 | CQBDE-I | a chaotic maps guided variant of QBDE-I but without Lyapunov exponent guidance |
7 | CQBDE-II | a chaotic variant of QBDE-II without Lyapunov exponent guidance |
8 | CLQBDE-I | a chaotic variant with Lyapunov exponent of QBDE-I |
9 | CLQBDE-II | a chaotic variant with Lyapunov exponent of QBDE-II |
10 | CQIEA | an algorithm from [35] |
11 | CTQIEA | a variant of CQIEA [35]which integrates threshold trick and chaos without Lyapunov exponent guidance |
12 | CLTQIEA | a variant of CQIEA [35] which incorporates both threshold trick and Lyapunov exponent guidance. |
2 Background and Literature Review
2.1 Differential Evolution
Binary Differential Evolution (BDE), a stochastic population-based global optimization algorithm, starts by initializing the random population, consisting of candidate solution vectors (), where is the population size. This candidate solution vector follows the binary encoding scheme. Each candidate solution vector is subjected to all of the following three heuristics in each iteration (or generation) of the algorithm (see Fig. 1(a)).
At each generation , the candidate solution vector () within dimensional search space, is subjected to the mutation operation yielding the mutant vector (). The mutation operation is applied as presented in Eq. (1).
(1) |
where , and are three randomly chosen distinct vectors from the current generation . , the mutation factor, is a user-defined parameter, and lies in the range . After this, the mutant vector may not be binary anymore. Hence, sigmoid based discretization process (see Eq. (2)) is applied to every ( member of the ) thereby converting continuous vector into a binary vector.
(2) |
Then, the discretized mutant vector is subjected to crossover operation where it is subjected to the mating with the corresponding candidate solution vector to generate the trial vector. The crossover operation is applied to trial vector , as presented in Eq. (3).
(3) |
where is the bit of , is the random number generated in the interval from a uniform distribution. is a randomly chosen index to make sure that the generated trial vector is different from the mutant vector. CR represents the crossover rate, is a user-defined parameter, and lies in the range .
Finally, the fitness score is computed for the trial vectors. Then, the selection operation is applied by comparing the corresponding target vectors and trial vector to produce an offspring. Better solutions survive and forms the parent population for the subsequent iteration. The selection operation follows the rule as presented in Eq. (4):
(4) |
As mentioned earlier, this is continued till the completion of maximum iterations or other convergence criteria, if any, are met.
2.2 Quantum Computing
To exploit the notions of the quantum theory within classical computers, quantum-inspired algorithms are proposed by [19]. It employs quantum mechanics concepts such as quantum measurement, superposition of states, inference, and entanglement. A quantum bit is the basic unit of information in quantum computation defined by the linear combination of the two states as given in Eq. (5).
(5) |
The coefficients of and are two complex numbers that must satisfy the norm relation as given in Eq. (6).
(6) |
where the probability of the observing state is and the probability of the state is . A quantum register is composed of qubits containing possible values simultaneously owing to the superposition of states.
2.3 Overview of Chaos Theory
The theory of chaos originated in the 1800s and was further developed by [7] to tackle challenges in complex non-linear systems [30]. Chaotic systems are dynamic and deterministic, evolving from initial conditions, with trajectories describing the system states in the state space. Chaos heavily relies on the initial conditions and exhibits two properties: ergodicity and intrinsically stochastic nature. Chaotic maps generate the sequence of numbers that exhibit these chaotic properties, aiding EAs in escaping local minima [31, 25, 26]. It is noticed that these chaotic maps are introduced to dynamically adjust the hyperparameters, and enhance adaptability to handle evolutionary dynamics optimally. It also facilitates searching in the regions which are left out by the random sequence. They enable exploration in regions left out by random sequences and have been extensively studied in large-scale continuous optimization problems. Chaotic maps produce a series of numbers from a probability distribution that differs from a uniform (0,1) distribution, exhibiting deterministic randomness. This deterministic nature allows predicting the sequence of numbers generated, as they are governed by differential equations and subject to the initial conditions. Now, a well-known chaotic map logistic map which is used in the current study is discussed below:
Logistic map [28]: The Logistic map exhibits chaotic behaviour in a discrete-time demographic model. This is a polynomial mapping of degree 2. The mathematical representation is defined in Eq. (7).
(7) |
Here, the constant lies in the range of and determines the behaviour of this Logistic map. In the current research study, = 4 is chosen.
2.4 Literature Review
The first quantum-inspired evolutionary algorithm (QEA) was proposed by Han and Kim for the knapsack problem in the pioneering paper, [19]. Their approach employed qubit notation and a rotation gate together with the migration operation, strategically guiding the solution to reach a global optimum. [22] proposed an adaptive quantum differential evolution (AQDE), which dynamically adjusts the mutation and cross-over rates based on their success streaks. It outperformed QEA in solving the knapsack problem. [39] proposed elitism-based quantum differential evolution for FSS on small datasets. [13] proposed Multi-strategy quantum differential evolution, which integrates multiple strategies, including a mutation strategy with difference vector, multi-population mutation, and adaptive rotation angle state. Other quantum evolutionary algorithms studied in the literature include (i) hybrid of QDE (QDE acronym is not defined anywhere) and grey wolf optimizer [43] tailored for the knapsack problem, (ii) multi-objective quantum-inspired hybrid DE [24], which is a hybrid of genetic algorithm quantum variants and DE for multi-objective next-release problems, and (iii) vector hop algorithm [18], a hybrid of differential evolution and particle swarm optimization. [12] proposed an improved quantum differential evolution by incorporating the principles of the divide-and-conquer concept of a cooperative coevolutionary algorithm, which improved both exploration and exploitation capabilities. (iv) Another hybrid DE where the first stage invoked QDE and the resultant population is passed on to BDE in the second stage, which continues the evolution process. All these hybrids improved the search process and obtained better convergence capabilities than their individual constituents in the standalone mode.
We now briefly survey the parallel and distributed versions of DE [20, 33, 37, 48, 40] developed across varied environments like Spark, CUDA, MPI, and OpenMP. [48] proposed two master-slave-based parallel approaches, namely, (i) a data-based MapReduce model and (ii) a population-based MapReduce model. [40] introduced two parallel strategies, master-slave and island approaches, evaluated on the AWS Spark cloud and tested the performance on benchmark functions. [6] and [5] developed master-slave-based parallel DE approaches for large-scale clustering and cluster optimization problems, respectively. [2] introduced a cost-sensitive DE classifier (SCDE) based on Euclidean distance for imbalanced classification datasets. [1] introduced a fine-grained parallel DE under the OpenMP framework for optimal networking, aiming to reduce the computational load on mappers and reducers. [11] introduced Parallel DE (PDE) under the Spark environment, showing promising speedup. [21] presented SgtDE, a grouping topology model for large-scale optimization, achieving significant speedup. [10] developed parallel DE under CUDA, while [44] designed a self-adaptive DE framework in CUDA for benchmark functions. Further, several parallel versions of DE are proposed to solve resource allocation problems [3, 15, 14], hydro scheduling [16], large-scale clustering [23], optimized workflow placement [41], and multi-objective flow scheduling problems [34] as well. Table A1 of the Appendix captures a brief overview of the literature.
3 Proposed Methodology
This section introduces the objective function employed in the current study, followed by the algorithm, and an overview of the proposed parallel mechanism, which applies to all algorithms discussed. All the acronyms employed in the study are presented in Table 1.
3.1 Objective Function
The objective function considered in this study is the Area under the receiver operating characteristic Curve (AUC) Eq.(8). It is defined as the mean of specificity and sensitivity. We specifically considered AUC due to its proven robustness while handling imbalanced datasets.
(8) |
Where sensitivity (refer to Eq. (10)) is the ratio of the positive samples that are correctly predicted to be positive to all the predicted positive samples. This is also called the True Positive Rate (TPR).
(9) |
Where and are the true positive and false negative, respectively. Specificity (refer to Eq. (10)) is the ratio of the negative samples that are correctly predicted to be negative to all the predicted negative samples. This is also called True Negative Rate (TNR).
(10) |
Where and are the true negative, and false positive, respectively.
This study employs LR and LLR as the classifiers for the proposed FSS wrapper because their training converges quickly. Unlike LR, LLR performs regularization and reduces the number of features too.
3.2 Proposed Chaotic Quantum Inspired Algorithm
In this study, we propose two distinct algorithms based on chaotic and quantum principles: (i) non-gate variant, one focusing on information exchange across different qubits without gates (CLQBDE-I), and (ii) gate variant, the second involving information exchange via gates (CLQBDE-II) (see Table 2 and Fig. 1(b), Fig. 1(c)).
Unlike the chaos-based algorithms proposed in the literature [29, 4, 46], the Lyapunov exponent guides our population initialization method, ensuring that the wrapper based FSS operates within a true chaotic regime. Consequently, chaotic numbers are not utilized from the initial time step of the logistic chaotic map, unlike the existing approaches. Instead, they are introduced after a specific number of time steps, which are determined by the Lyapunov exponent. After thorough experimentation, we discovered that the logistic map enters a truly chaotic regime after time steps. This is observed offline and employed within the algorithm. Histogram plots comparing the distributions of numbers generated from the uniform random distribution and logistic chaotic map guided by Lyapunov exponent from 1 to 5000 time steps are presented in Fig. A1 of the Appendix. It is crucial to note that in the current study, Lyapunov exponent-guided logistic map is introduced as the sequence generated by chaotic numbers is observed to be in true chaotic regime after first timesteps. Consequently, both variants, CQBDE-I and CQBDE-II, incorporate chaotic numbers in the initialization for each qubit’s alpha and beta states for solution, satisfying Eq. (11).
(11) |
(12) |
where is the number of dimensions, is the index of the solution in the population of size .
The quantum representation of each population member is given by Eq. (11) where the components satisfy the condition in Eq. (12). It consists of candidate solution vectors (), where is the population size and each solution vector is of size , where is the number of features. Often thus, formed quantum state vectors need to be converted into binary encoded solutions for combinatorial optimization problems. In the literature, the following rule is generally adopted.
(13) |
where and are the solution and feature indices, respectively.
However, we observed that as the number of dimensions () increases, more features are being selected. This observation prompted us to introduce a constraint-based quantum state-to-binary solution conversion. The modified rule is defined as follows (refer to Eq. (14)). This Equation is called as the threshold trick. This is employed for both the QBDE-I and QBDE-II variants.
(14) |
Here, represents a user-defined parameter within the range of , is the number generated from rand(0,1).
We proposed two different variants of chaotic quantum algorithms guided by the Lyapunov exponent and threshold trick. The distinction between them is as follows: (i) CLQBDE-I is a non-gate variant, and (ii) CLQBDE-II is a gate-variant where a rotation gate is employed in place of quantum mutation.
3.2.1 CLQBDE-I
The proposed Chaotic Quantum Binary Differential Evolution-I (CLQBDE-I) begins by generating the quantum matrix as described in Eq. (11). For each candidate solution , the variables are generated from the chaotic series, while the corresponding variables are obtained by using the relation Eq. (12). As mentioned earlier, the first step of the chaotic series is guided by the Lyapunov exponent. Once generated, the binary encoded solution is produced using the threshold trick outlined in Eq. (14). The fitness score is calculated by training the classifier, LR or LLR.
(15) |
Then, the initial population undergoes mutation operation (refer to Eq. 15) effected by each candidate solution’s and states by randomly selecting the three solutions. This operation is performed for a pre-specified population size.
(16) |
where is the number generated from rand(0,1).
Subsequently, the mutated quantum matrix of each of the solutions undergoes crossover operation as described in Eq. 16. Selection is then performed by combining parent and offspring populations and sorting them based on fitness scores. The resulting population becomes the parent population for the next iteration. This iterative process continues for a pre-specified number of iterations, resulting in the convergence of the algorithm, and the resultant population is evaluated on the test data, which remains unchanged throughout the evolution process.
3.2.2 CLQBDE-II
We proposed another variant, gate-based quantum binary differential evolution, and named it chaotic quantum binary differential evolution-II (CQBDE-II). The distinction between them lies in employing the rotation gate in place of the quantum mutation operation to generate the mutated quantum matrix.
The population is initialized The rotation gate is introduced to mimic the behaviour of the mutation gate. These rotation gates are represented as unitary matrices, which are employed to rotate the state of a qubit based on the rotation angle. We followed the look-up table obtained from [19] to decide the rotation angle. The rotation gate is multiplied with each qubit of the corresponding qubit in the candidate solution and thus the mutated vector is generated. Thissolution vector is then subjected to the crossover operation, selection, and threshold trick. These steps are identical to that of CLQBDE-I as discussed in the sub-section 3.2.1. The rotation gate employed in this study is presented in Eq. 17.
(17) |
Key | Value: |
---|---|
, , , | |
, , , | |
… | … |
, , , |
3.3 Proposed Parallel Approach
The proposed parallel algorithm adopts an island approach, which divides the data into several partitions known as data islands.
Table 3 depicts the population schema maintained for all approaches in this work. The population consists of a solution of size N. Each solution has two different fields: (i) Key field having the unique id information to identify the solution uniquely, (ii) The Value field has the following subfields: (a) Binary vector: which is of length number of features, nfeat is a binary vector where the presence of ’0’ says that a particular feature is not selected, and ’1’ says that the particular feature is selected. (b) Quantum matrix: Here, the quantum state information of the corresponding is stored.(c) Trained model coefficients: as we know that in the wrapper methods, a classifier is chosen to evaluate the solution’s performance. Thus trained model coefficients are stored in this sub-field. The main reason for storing them is to use them in the test phase. (d) AUC: The trained model results in an AUC for each solution. That information is stored in the sub-field. This kind of schema makes sure to preserve the solution-related information in a single space.
The proposed parallel approach comprises three main phases: Initialization, Training phase, and Test phase. As illustrated in Fig. 2, there are sub-populations and data islands. The parallel algorithm operates at two levels: driver and worker. The corresponding algorithms are presented in Algorithms 1 and 2 (see Appendix), respectively. All these phases, along with migration rule invocation, happen at the driver node, while the evolution of the parallel EAs happens during the training phase, and evaluation during the test phase is executed at the worker node. However, filtering top solutions post-migration and the aggregation of the final results during the test phase is performed at the driver node.
Phase-I: Initialization In this phase, the quantum matrix for the population size is chaotically initialized using the biased sampling method [42] depending on the underlying metaheuristic (i.e., DE / EA). This population follows the data structure as presented in Table 3. Subsequently, binary-encoded solutions are obtained using a threshold trick. Thus initialized population is treated as a global population and broadcasted along with hyperparameters from the driver to the worker nodes.
Phase-II: Training Phase A miniature EA is evolved in parallel in each data island at each worker node. The block diagram of miniature wrapper is depicted in Fig. 1(d). A sub-population/local population of size () is initially extracted by following random sampling with replacement. The phase begins, where fitness scores are evaluated using the classifier, LR or LLR, followed by the updation of the quantum matrix, binary encoded solutions, classifier coefficients, and AUC in the respective fields (see Table 3). This process is common for all the algorithms. Heuristics specific to each algorithm are applied to the quantum matrix of the population, resulting in the generation of an offspring quantum matrix. operations follow, and the top solutions are retained after selection. This process is repeated for a pre-specified maximum number of generations (), resulting in solutions from each of the data islands. Upon completing these steps, the control returns to the driver, where the migration policy is invoked.
Migration Policy Solutions are sorted based on the fitness scores, and the top solutions are selected. After invoking the migration policy, the worker algorithm is again executed using the population obtained post-migration which is distributed to worker nodes. Subsequently, the sub-population corresponding to a single island is selected by following random sampling with replacement and the evolution process continues. This process is repeated for a pre-specified number of migrations ().
Phase-III: Test Phase In this phase, the converged population obtained in the training phase is evaluated on the test dataset. That means, AUC is computed on the test dataset using the coefficients corresponding to each solution. Subsequently, test fitness scores are computed, and the results are reported accordingly. Notably, while the proposed parallel metaheuristics differ in their respective heuristics, they adhere uniformly to the same train-and-update step, migration policy invocation, and test phase.
4 Results & Discussions
All experiments are conducted on a 5-node cluster configuration, comprising a driver node and 4 slave nodes, each equipped with Intel i7 generation processors and 32GB of RAM. Notably, the driver node serves the dual role as the driver node as well as the worker node. The benchmark datasets analyzed in this study are briefly described in Table 4, while Table 5 presents the optimal hyperparameters identified through meticulous fine-tuning following grid-search. A train-test split ratio of 80%:20% is followed using stratified random sampling to ensure equal proportion of representation of classes in both datasets. All algorithms are executed for 20 runs in order to mitigate the impact of random seed variation on the algorithm, which is the standard practice followed in the evolutionary algorithms (EAs) literature. For all the experiments, the number of migrations is fixed at one. The top solution in each run is identified as the one achieving the highest AUC. Thus, we get 20 top solutions, one each for 20 runs. Mean AUC and mean cardinality of these 20 top solutions are computed and presented in Tables 6-10.
The proposed different quantum algorithms are compared against a non-quantum counterpart, BDE, and the Chaotic Quantum-Inspired Evolution Algorithm (CQIEA)[35]. To assess the effectiveness of the proposed method, we developed the variants listed in Table 2.
The reasons for developing these variants are as follows: The algorithms QBDE-I and QBDE-II are meant to see the effectiveness of introducing quantum operators; CQBDE-I and CQBDE-II are designed to see the effectiveness of chaos; CLQBDE-I and CQBDE-II are designed to see the role of the Lyapunov exponent when coupled with chaotic maps. Further, CTQIEA and CLTQIEA are introduced to see the effectiveness of the threshold trick and threshold trick in combination with the Lyapunov exponent, respectively.
Dataset Name | # Objects | # Features | Size of dataset |
---|---|---|---|
Epsilon | 500,000 | 2,000 | 10.8 GB |
IEEE Malware | 1,500,000 | 1,000 | 3.2 GB |
OVA_Omentum | 1,584 | 10,935 | 108.3 MB |
OVA_Uterus | 1,584 | 10,935 | 108.3 MB |
Dataset Name | () | Total generations per migration | CR | MR | |
---|---|---|---|---|---|
Epsilon | 30 (15) | 20 (10) | 0.90 | 0.80 | 0.10 |
IEEE Malware | 30 (15) | 10 (5) | 0.90 | 0.80 | 0.15 |
OVA_Omentum | 300 (200) | 20 (10) | 0.90 | 0.8 | 0.01 |
OVA_Uterus | 300 (200) | 20 (10) | 0.90 | 0.8 | 0.01 |
Datasets | ||||||||
Algorithm | Epsilon | IEEE Malware | OVA_Omentum | OVA_Uterus | ||||
BDE+LR | 1321.3 | 0.835 | 646.50 | 0.802 | 876.26 | 0.836 | 63.35 | 0.788 |
BDE+LLR | 1323.7 | 0.846 | 505.7 | 0.817 | 112.65 | 0.833 | 108.65 | 0.802 |
Datasets | ||||||||
---|---|---|---|---|---|---|---|---|
Algorithm | Epsilon | IEEE Malware | OVA_Omentum | OVA_Uterus | ||||
QBDE-I+LR | 204.7 | 0.742 | 121.25 | 0.753 | 159.25 | 0.800 | 69.90 | 0.807 |
QBDE-II+LR | 214.75 | 0.745 | 127.30 | 0.758 | 70.2 | 0.846 | 70.65 | 0.808 |
QBDE-I+LLR | 248.95 | 0.753 | 168.5 | 0.826 | 168.5 | 0.826 | 107.75 | 0.796 |
QBDE-II+LLR | 140.85 | 0.783 | 140.6 | 0.793 | 107.0 | 0.944 | 108.4 | 0.799 |
Datasets | ||||||||
---|---|---|---|---|---|---|---|---|
Algorithm | Epsilon | IEEE Malware | OVA_Omentum | OVA_Uterus | ||||
CQBDE-I+LR | 269.6 | 0.522 | 142.05 | 0.80 | 794.7 | 0.5 | 830.05 | 0.5 |
CQBDE-II+LR | 143.85 | 0.780 | 144.5 | 0.791 | 107.5 | 0.905 | 106.45 | 0.842 |
CQBDE-I+LLR | 397.95 | 0.797 | 162.55 | 0.818 | 159.25 | 0.808 | 108.75 | 0.796 |
CQBDE-II+LLR | 138.65 | 0.784 | 145.85 | 0.799 | 110.35 | 0.855 | 101.7 | 0.808 |
Datasets | ||||||||
---|---|---|---|---|---|---|---|---|
Algorithm | Epsilon | IEEE Malware | OVA_Omentum | OVA_Uterus | ||||
CLQBDE-I+LR | 273.85 | 0.523 | 143.8 | 0.806 | 798.7 | 0.5 | 798.7 | 0.5 |
CLQBDE-II+LR | 141.5 | 0.777 | 140.6 | 0.793 | 107 | 0.944 | 107.15 | 0.853 |
CLQBDE-I+LLR | 145.95 | 0.755 | 145.95 | 0.796 | 156.95 | 0.923 | 151.0 | 0.865 |
CLQBDE-II+LLR | 141.70 | 0.784 | 140.9 | 0.801 | 110.25 | 0.954 | 112.2 | 0.812 |
Datasets | ||||||||
---|---|---|---|---|---|---|---|---|
Algorithm | Epsilon | IEEE Malware | OVA_Omentum | OVA_Uterus | ||||
CQIEA+LR | 1968.10 | 0.841 | 885.5 | 0.802 | 5033.0 | 0.788 | 6942.9 | 0.792 |
CTQIEA+LR | 421.0 | 0.790 | 217.65 | 0.779 | 240.6 | 0.947 | 241.1 | 0.884 |
CLTQIEA+LR | 412.7 | 0.790 | 208.3 | 0.812 | 290.0 | 0.819 | 280.95 | 0.794 |
CQIEA+LLR | 888.0 | 0.853 | 677.9 | 0.810 | 8356.25 | 0.837 | 6942.90 | 0.792 |
CTQIEA+LLR | 256.20 | 0.815 | 190.8 | 0.798 | 261.50 | 0.842 | 318.8 | 0.806 |
CLTQIEA+LLR | 258.50 | 0.815 | 192.05 | 0.801 | 258.70 | 0.893 | 332.3 | 0.803 |
Dataset | Model (Top1 vs Top2 w.r.t AUC) | t-statistic | p-value |
---|---|---|---|
Epsilon | CTQIEA + LLR* vs | 0.421 | 0.675 |
CLTQIEA+LLR | |||
IEEE Malware | QBDE-I+LLR* vs | 1.204 | 0.235 |
CQBDE-I + LLR | |||
OVA_Omentum | CLQBDE-II + LLR* vs | 1.14 | 0.260 |
CTQIEA + LR | |||
OVA_Uterus | CTQIEA + LR vs | 3.84 | 0.0004 |
CLQBDE-I + LLR** | |||
*Better method based on cost-benefit analysis. | |||
**Better method based on statistical significance. |
4.1 Comparative Analysis Ablation Study
We conducted our ablation study in a boosted manner. That means in the first instance, parallel BDE is developed as a baseline and then quantum-inspired versions of BDE are developed (boosted wrapper-I), and finally, chaotic versions of these quantum-inspired algorithms are developed (boosted wrapper-II).
Initially, non-quantum variants, i.e., BDE variants, are developed as a baseline, and the corresponding results are presented in Table 6. With an aim to remove irrelevant features, we employed LLR in place of LR as a classifier. LLR demonstrated its ability to weed out unimportant features in the following datasets: (i) In the IEEE Malware dataset, BDE with LLR as the classifier, obtained mean AUC, and there was a reduction of 150 features. (ii) Similarly, in the OVA_Omentum dataset, LLR led to a great reduction (around 600 features) in terms of mean cardinality, with slightly improved mean AUC. However, this observation is not consistently noticed across the remaining two datasets. In the other datasets (Epsilon and OVA_Uterus), the mean AUC is slightly improved. However, it is accompanied by a modest increase in the mean cardinality.
The strength of LLR over LR is that the latter retains unimportant features during the evolution process, leading to increased cardinality of the solutions.
In the boosted wrapper-I, to obtain better optimal results, we developed their quantum-inspired counterparts, namely, QBDE-I and QBDE-II, where the QBDE-I variant has gates, and the QBDE-II variant has no gates. Consequently, we achieved lower mean cardinality accompanied by higher mean AUC across all datasets except the Epsilon dataset (see Table 7).
Then, in the boosted wrapper-II, to further improve the exploration capability, we introduced chaotic initialization in two different ways: i.e., with and without Lyapunov-guided chaotic series (see Table 8, (see Table 9)) resulting in CLQBDE and CQBDE respectively. In the latter case (i.e., without the Lyapunov exponent), we noticed a reduction in mean AUC in all the datasets than their corresponding non-chaotic counterparts (i.e. QBDE variants). This is likely due to the fact that we do not operate in a true chaotic regime.
To balance both the exploration and exploitation capability of an algorithm, we introduced Lyapunov-guided chaotic series (see Table 9)in the initialization phase. It turned out that these variants achieved efficient solutions meaning higher mean AUC with decreased mean cardinality. The chaotic numbers introduced in the first step (of CQBDE variants) are not generated in a truly chaotic regime. However, after introducing the Lyapunov exponent-guided chaotic numbers (i.e. CLQBDE variants), we discarded the first 5000 time steps, ensuring the chaotic series is in the true chaotic regime. This transition helped the algorithm improve exploration and exploitation capability as evidenced by the results in Table 9). This is spectacularly observed in the case of both high-dimensional datasets, i.e., OVA_Omentum and OVA_Uterus. However, the true chaotic regime alone did not suffice in the other two datasets.
Further, to create a level-playing field to the algorithm we would like to compare with (i.e. variants of CQIEA [35] and CLQIEA), we demonstrated the effectiveness of the threshold trick and the influence of Lyapunov exponent-guided chaotic numbers by invoking them in the variants of CQIEA [35] and CLQIEA in Table 10. We noticed that the original CQIEA [35], i.e., without the threshold trick and the Lyapunov exponent-guided chaotic numbers, yielded unacceptably high mean cardinality, impacting its performance adversely. However, after introducing the threshold trick combined with Lyapunov exponent-guided chaotic numbers, we could effectively control the cardinality while improving the mean AUC. This behaviour is particularly evident when LASSO LR (LLR) is employed as the classifier.
4.2 Statistical Testing
A two-tailed t-test at 5% level of significance and 38 (=20+20-2) degrees of freedom is conducted on the mean AUC obtained from 20 runs across the top two algorithms (w.r.t mean AUC) in each dataset (refer to Table 11). The t-test demonstrates that in three out of four datasets (except OVA_Uterus), the top 2 best algorithms solely based on AUC, turned out to be statistically similar. The top 2 best algorithms corresponding to each dataset are as follows: (i) in the Epsilon dataset (CTQIEA+LLR and CLTQIEA+LLR), (ii) in IEEE Malware dataset (QBDE-I+LLR and CQBDE-I+LLR), (iii) in OVA_Omentum dataset (CLQBDE-II+LLR and CTQIEA+LR) and (iv) in OVA_Uterus dataset (CTQIEA+LR and CLQBDE-I+LLR).
In the case of statistical similarity, to break the tie, preference is accorded to the algorithm that selected less mean cardinality. In other words, higher preference is accorded to the algorithm that yielded a great reduction in mean cardinality and an insignificant reduction in mean AUC. After performing this type of cost-benefit analysis, the better-performing algorithm is presented in bold face in Table 11).
It is noticed that in three datasets, the top-performing algorithm yielded a higher mean AUC with almost similar mean cardinality. In these cases, preference is accorded to the one with a higher mean AUC. For example, in the IEEE Malware dataset, QBDE-I+LLR obtained a mean AUC of 0.826 with a mean cardinality of 168.5. However, the next-best algorithm (CQBDE-I+LLR) obtained a mean AUC of 0.818 (which is 0.008 than that obtained by QBDE-I+LLR) with a mean cardinality of 162.55. Here, the reduction in cardinality is very minimal ( 6 features). This makes the QBDE-I+LLR win over the CQBDE-I algorithm. The same cost-benefit analysis is followed for the other datasets as well, resulting in the Epsilon and OVA_Omentum datasets; the QBDE-I + LLR and CLQBDE-II+LLR are the winners, respectively. However, in the OVA_Uterus dataset, CLQBDE-I + LLR turned out to be statistically significant when compared to the CTQIEA-I + LR. Accordingly, CLQBDE-I + LLR is presented in boldface in Table 11
It is important to note that when algorithm A and algorithm B are statistically similar, the winner is chosen based on the cost-benefit analysis. However, if statistical significance is observed, the preference is accorded to the algorithm, which is the statistically significant algorithm.
In summary, the insights derived from the current study are as follows:
-
•
In the Epsilon dataset, CTQIEA + LLR emerged as the winner due to the significant impact of the threshold trick and chaos.
-
•
In the IEEE Malware dataset, QBDE-I+LLR was the best algorithm, primarily due to the effectiveness of the threshold trick.
-
•
In the OVA_Omentum dataset, CLQBDE-II + LLR outperformed others due to the introduction of Lyapunov-based chaos and the threshold trick.
-
•
In the OVA_Uterus dataset, CLQBDE-I + LLR was the winner largely due to the guidance by Lyapunov.
-
•
The CLQBDE variants with LLR as a classifier turned out to be the best wrappers no matter a quantum gate is adopted or not in both high-dimensional datasets.
5 Conclusions
This study proposes CLQBDE-I, where a Lyapunov exponent-guided chaotic map-based initialization method is incorporated into the quantum-inspired BDE algorithm for FSS. Our results show its superiority on high-dimensional datasets and its competitive performance on Epsilon and IEEE Malware datasets compared to the alternative algorithms. CLQBDE-I outperformed not only our baseline, namely, QBDE but also other baselines CLTQIEA-I and CTQIEA-I in all but the IEEE Malware dataset in the latter algorithm. Overall, working in conjunction with QBDE variants, integrating Lyapunov exponent-guided chaotic dynamics into them yielded better solutions than simple, chaotic-based initialization methods.
Future research directions include extending this approach to multi-objective environments, designing hybrid chaotic mapping techniques, and exploring chaotic quantum hybrid EAs. This methodology also holds promise for large-scale clustering and feature selection applications across various domains like finance and economics.
References
- \bibcommenthead
- Adhianto et al. [2020] Adhianto, L., S. Banerjee, M. Fagan, M. Krentel, G. Marin, J. Mellor-Crummey, and N. Tallent. 2020. Hpctoolkit: Tools for performance analysis of optimized parallel programs. Concurr. Comput. Pract. Exp. 22: 685–701 .
- Al-Sawwa and Ludwig [2020] Al-Sawwa, J. and S. Ludwig. 2020. Performance evaluation of a cost-sensitive differential evolution classifier using spark – imbalanced binary classification. J. Comput. Sci. 40: 101065 .
- Cao et al. [2017] Cao, B., J. Zhao, Z. Lv, and X. Liu. 2017. A distributed parallel cooperative coevolutionary multiobjective evolutionary algorithm for large-scale optimization. IEEE Transactions on Industrial Informatics 13: 2030–2038 .
- Chandrashekar and Sahin [2014] Chandrashekar, B. and F. Sahin. 2014. A survey on feature selection methods. Comput. Electr. Eng. 40: 16–28 .
- Chen et al. [2016] Chen, Z., X. Jiang, J. Li, S. Li, and L. Wang. 2016. Pdeco: Parallel differential evolution for clusters optimization. J. Comput. Chem. 34: 1046–1059 .
- Cho et al. [2019] Cho, P., T. Nyunt, and T. Aung 2019. Differential evolution for large-scale clustering. In Proc. 2019 9th Int. Work. Comput. Sci. Eng. (WCSE 2019 SPRING), pp. 58–62.
- Danforth [2013] Danforth, C.M. 2013. Chaos in an atmosphere hanging on a wall. Mathematics of Planet Earth 2013 .
- Das et al. [2016] Das, S., S.S. Mullick, and P.N. Suganthan. 2016. Recent advances in differential evolution–an updated survey. Swarm and Evolutionary Computation 27: 1–30 .
- Das and Suganthan [2011] Das, S. and P. Suganthan. 2011, Feb. Differential evolution: A survey of the state-of-the-art. IEEE Transactions on Evolutionary Computation 15(1): 4–31 .
- de P. Veronese and Krohling [2010] de P. Veronese, L. and R. Krohling 2010. Differential evolution algorithm on the gpu with c-cuda. In IEEE Congress on Evolutionary Computation, pp. 1–7.
- Deng et al. [2015] Deng, C., X. Tan, X. Dong, and Y. Tan. 2015. A parallel version of differential evolution based on resilient distributed datasets model, Commun. Comput. Inf. Sci., Volume 562, 84–93.
- Deng et al. [2021] Deng, W., S. Shang, X. Cai, H. Zhao, Y. Zhou, H. Chen, and W. Deng. 2021. Quantum differential evolution with cooperative coevolution framework and hybrid mutation strategy for large scale optimization. Knowledge-Based Systems 224: 107080. 10.1016/j.knosys.2021.107080 .
- Deng et al. [2022] Deng, W., J. Xu, X.Z. Gao, and H. Zhao. 2022, Mar. An enhanced msiqde algorithm with novel multiple strategies for global optimization problems. IEEE Transactions on Systems, Man, and Cybernetics: Systems 52(3): 1578–1587. 10.1109/TSMC.2020.3030792 .
- Falco et al. [2017] Falco, I.D., U. Scafuri, E. Tarantino, and A.D. Cioppa 2017. A distributed differential evolution approach for mapping in a grid environment. In 15th EUROMICRO International Conference on Parallel, Distributed and Network-Based Processing (PDP’07), pp. 442–449.
- Ge et al. [2018] Ge, Y., W. Yu, Y. Lin, Y. Gong, Z. Zhan, W. Chen, and J. Zhang. 2018. Distributed differential evolution based on adaptive mergence and split for large-scale optimization. IEEE Transactions on Cybernetics 48: 2166–2180 .
- Glotic et al. [2014] Glotic, A., P. Kitak, J. Pihler, and I. Ticar. 2014. Parallel self-adaptive differential evolution algorithm for solving short-term hydro scheduling problem. IEEE Transactions on Power Systems 29: 2347–2358 .
- Gupta et al. [2016] Gupta, P., A. Sharma, and R. Jindal. 2016. Scalable machine-learning algorithms for big data analytics: A comprehensive review. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 6(6): 194–214 .
- Han et al. [2021] Han, D., J. Wang, C. Tang, T. Weng, K. Li, and C. Dobre. 2021. A multi-objective distance vector-hop localization algorithm based on differential evolution quantum particle swarm optimization. International Journal of Communication Systems 34(14): e4924 .
- Han and Kim [2002] Han, K.H. and J.H. Kim. 2002, Dec. Quantum-inspired evolutionary algorithm for a class of combinatorial optimization. IEEE Transactions on Evolutionary Computation 6(6): 580–593. 10.1109/TEVC.2002.804320 .
- Harada et al. [2020] Harada, T., M. Kaidan, and R. Thawonmas. 2020. Comparison of synchronous and asynchronous parallelization of extreme surrogate-assisted multi-objective evolutionary algorithm. Natural Computing .
- He et al. [2021] He, Z., H. Peng, J. Chen, C. Deng, and Z. Wu. 2021. A spark-based differential evolution with grouping topology model for large-scale global optimization. Cluster Comput. 24: 515–535 .
- Hota and Pat [2010] Hota, A.R. and A. Pat 2010. An adaptive quantum-inspired differential evolution algorithm for 0-1 knapsack problem. In 2010 Second World Congress on Nature and Biologically Inspired Computing (NaBIC), Kitakyushu, Japan, pp. 703–708.
- Kromer et al. [2013] Kromer, P., J. Platos, and V. Snasel 2013. Scalable differential evolution for many-core and clusters in unified parallel c. In 2013 IEEE International Conference on Cybernetics (CYBCO), pp. 180–185.
- Kumari et al. [2013] Kumari, A.C., K. Srinivas, and M. Gupta. 2013. Software requirements optimization using multi-objective quantum-inspired hybrid differential evolution, In EVOLVE - A Bridge between Probability, Set Oriented Numerics, and Evolutionary Computation II, ed. et al., O.S., Volume 175 of Advances in Intelligent Systems and Computing. Springer, Berlin, Heidelberg. 10.1007/978-3-642-31519-0_7.
- Liu et al. [2015] Liu, T., X. Gao, and L. Wang. 2015. Multi-objective optimization method using an improved nsga-ii algorithm for oil–gas production process. Journal of the Taiwan Institute of Chemical Engineers 57: 42–53 .
- Lu et al. [2013] Lu, H., R. Niu, J. Liu, and Z. Zhu. 2013. A chaotic non-dominated sorting genetic algorithm for the multi-objective automatic test task scheduling problem. Applied Soft Computing 13(5): 2790–2802 .
- Maier et al. [2019] Maier, H.R., S. Razavi, Z. Kapelan, L.S. Matott, J. Kasprzyk, and B.A. Tolson. 2019. Introductory overview: Optimization using evolutionary algorithms and other metaheuristics. Environmental Modelling & Software 114: 195–213 .
- May [1976] May, R. 1976. Simple mathematical models with very complicated dynamics. Nature 261: 459–467 .
- Olyaei et al. [2017] Olyaei, A., C. Wu, and W. Kinsner. 2017. Detecting unstable periodic orbits in chaotic time series using synchronization. American Physical Society 96 .
- Packard et al. [1980] Packard, N.H., J.P. Crutchfield, J.D. Farmer, and R.S. Shaw. 1980. Geometry from a time series. Phys. Rev. Lett. 45: 712 .
- Pan and Da [2015] Pan, I. and S. Da. 2015. Fractional-order load-frequency control of interconnected power systems using chaotic multi-objective optimization. Applied Soft Computing 29: 328–344 .
- Pant et al. [2020] Pant, M., H. Zaheer, L. Garcia-Hernandez, and A. Abraham. 2020. Differential evolution: A review of more than two decades of research. Engineering Applications of Artificial Intelligence 90: 103479 .
- Peralta et al. [2015] Peralta, D., S.D. Río, S. Ramírez-Gallego, I. Triguero, J. Benitez, and F. Herrera. 2015. Evolutionary feature selection for big data classification: A mapreduce approach. Math. Probl. Eng. .
- Qian et al. [2009] Qian, B., L. Wang, D. Huang, W. Wang, and X. Wang. 2009. An effective hybrid de-based algorithm for multi-objective flow shop scheduling with limited buffers. Computers & Operations Research 36(1): 209–233 .
- Ramos and Vellasco [2020] Ramos, A.C. and M. Vellasco 2020. Chaotic quantum-inspired evolutionary algorithm: enhancing feature selection in bci. In 2020 IEEE Congress on Evolutionary Computation (CEC), Glasgow, UK, pp. 1–8.
- Rastogi and Shim [1999] Rastogi, R. and K. Shim 1999. Scalable algorithms for mining large databases. In Tutorial Notes of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
- Rong et al. [2019] Rong, M., D. Gong, and X. Gao. 2019. Feature selection and its use in big data: Challenges, methods, and trends. IEEE Access 7: 19709–19725 .
- Schliemann et al. [2002] Schliemann, J., A.V. Khaetskii, and D. Loss. 2002. Spin decay and quantum parallelism. Physical Review B 66(24): 245303 .
- Srikrishna et al. [2015] Srikrishna, V., R. Ghosh, V. Ravi, and K. Deb. 2015. Elitist quantum-inspired differential evolution based wrapper for feature subset selection, In Multi-disciplinary Trends in Artificial Intelligence. MIWAI 2015, eds. Bikakis, A. and X. Zheng, Volume 9426 of Lecture Notes in Computer Science. Springer, Cham. 10.1007/978-3-319-26181-2_11.
- Teijeiro et al. [2016] Teijeiro, D., X. Pardo, P. González, J. Banga, and R. Doallo. 2016. Implementing parallel differential evolution on spark, In Applications of Evolutionary Computation. EvoApplications 2016, eds. Squillero, G. and P. Burelli, Volume 9598 of Lecture Notes in Computer Science. Springer, Cham.
- Thomert et al. [2016] Thomert, D., A. Bhattacharya, E. Caron, K. Gadireddy, and L. Lefevre 2016. Parallel differential evolution approach for cloud workflow placements under simultaneous optimization of multiple objectives. In 2016 IEEE Congress on Evolutionary Computation (CEC), pp. 822–829.
- Vivek et al. [2022] Vivek, Y., V. Ravi, and P. RadhaKrishna. 2022. Scalable feature subset selection for big data using parallel hybrid evolutionary algorithm based wrapper under apache spark environment. Cluster Computing. 10.1007/s10586-022-03725-w .
- Wang and Wang [2021] Wang, Y. and W. Wang. 2021. Quantum-inspired differential evolution with grey wolf optimizer for 0-1 knapsack problem. Mathematics 9(1233). 10.3390/math9111233 .
- Wong et al. [2015] Wong, T., A. Qin, S. Wang, and Y. Shi. 2015. cusade: A cuda-based parallel self-adaptive differential evolution algorithm. IEEE Congress on Evolutionary Computation (CEC) 2: 375–388 .
- Wu et al. [2019] Wu, G., R. Mallipeddi, and P.N. Suganthan. 2019. Ensemble strategies for population-based optimization algorithms–a survey. Swarm and Evolutionary Computation 44: 695–711 .
- Xue et al. [2016] Xue, B., M. Zhang, W.N. Browne, and X. Yao. 2016. A survey on evolutionary computation approaches to feature selection. IEEE Trans. Evol. Comput. 20: 606–626 .
- Zaharia et al. [2010] Zaharia, M., M. Chowdhury, M.J. Franklin, S. Shenker, and I. Stoica 2010. Spark: Cluster computing with working sets. In 2nd USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 10).
- Zhou [2010] Zhou, C. 2010. Fast parallelization of differential evolution algorithm using mapreduce. In Proc. 12th Annu. Genet. Evol. Comput. Conf. GECCO ’10, pp. 1113–1114.
Appendix A Details of the proposed approach
Authors | Algorithm | Environment | Solved Problem |
---|---|---|---|
Zhou [17] | DE | Spark | Discussed Pros and cons of various parallel approaches |
Teijeiro et al. [18] | DE | Spark + AWS | Benchmark functions |
Cho et al. [19] | DE | Spark | Clustering |
Chen et al. [20] | Modified DE | SPMD | Cluster Optimization |
Al-Sawwa and Ludwig [21] | DE | Spark | DE based classifier |
Adhianto et al. [22] | DE | OpenMP | Optical Network problem |
Deng et al. [23] | DE | Spark | Benchmark functions |
He et al. [24] | Five variants of DE | Spark + Cloud | Ring topology model applied to benchmark functions |
Veronse & Krohling [25] | DE | CUDA | Large scale optimization |
Wong et al. [26] | Self-Adaptive DE | CUDA | Benchmark functions |
Cao et al. [27] | DPCCMOEA | MPI | Large scale optimization |
Ge et al. [28] | DDE-AMS | MPI | Large scale optimization |
Falco et al. [29] | DE | MPI | Resource allocation |
Glotic et al. [30] | PSADE | MATLAB | Hydro Scheduling algorithm |
Daoudi et al. [31] | DE | Hadoop | Clustering |
Thomert et al. [32] | NSDE-II | OpenMP | Cloud work placement |
Kromer et al. [33] | DE | Unified Parallel C | Large scale optimization |
Qian et al. [34] | MPFPSP | Multithreading | Flow scheduling problem |
Vivek et al. [12] | PB-TADE, PB-DETA,PB-DE | Spark | FSS |
Current study | CLTQBDE-I, CLTQIEA | Spark | FSS |