
Improved Differential Evolution based Feature Selection through Quantum, Chaos, and Lasso

Yelleti Vivek (yvivek@idrbt.ac.in), Sri Krishna Vadlamani (srikv@mit.edu), Vadlamani Ravi [1]* (vravi@idrbt.ac.in), P. Radha Krishna (prkrishna@nitw.ac.in)

[1] Centre for Artificial Intelligence and Machine Learning, Institute for Development and Research in Banking Technology, Castle Hills Road #1, Masab Tank, Hyderabad 500076, Telangana, India

[2] Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA, United States of America

[3] Department of Computer Science and Engineering, National Institute of Technology Warangal, Warangal 506004, Telangana, India

Abstract

Modern deep learning continues to achieve outstanding performance on an astounding variety of high-dimensional tasks. In practice, this is obtained by fitting deep neural models to all the input data with minimal feature engineering, thus sacrificing interpretability in many cases. However, in applications such as medicine, where interpretability is crucial, feature subset selection becomes an important problem. Metaheuristics such as Binary Differential Evolution are a popular approach to feature selection, and the research literature continues to introduce novel ideas, drawn from quantum computing and chaos theory, for instance, to improve them. In this paper, we demonstrate that introducing chaos-generated variables, generated from considerations of the Lyapunov time, in place of random variables in quantum-inspired metaheuristics significantly improves their performance on high-dimensional medical classification tasks and outperforms other approaches. We show that this chaos-induced improvement is a general phenomenon by demonstrating it for multiple varieties of underlying quantum-inspired metaheuristics. Performance is further enhanced through Lasso-assisted feature pruning. At the implementation level, we vastly speed up our algorithms through a scalable island-based computing cluster parallelization technique.

keywords:
Feature Subset Selection; Differential Evolution; Quantum inspired algorithms; Chaos; Big Data

1 Introduction

The fundamental objective of feature selection is to identify the most important and discriminative features from a given set of features. Its prominence has garnered the attention of researchers and practitioners in domains replete with big, high-dimensional datasets. Feature selection [4, 46] entails a great reduction in computational complexity, improvement in human comprehensibility, and easy deployment of models in production. Wrapper methods based on metaheuristics solve the feature subset selection (FSS) problem by posing it as a combinatorial optimization problem. These methods attempt to identify an efficient feature subset with the least cardinality and associated high accuracy among the $2^{n}-1$ possible combinations, where $n$ is the total number of features in the dataset. Among the metaheuristics employed for this purpose, evolutionary algorithms have been proven to be most efficient for determining optimal feature subsets owing to the inherent parallelism present in the population-based search [27, 45].

Table 1: Notation used in the current study

| Symbol | Description |
|---|---|
| $\phi$ | Empty set |
| $N$ | Population size |
| $X$ | Population comprising $N$ solutions |
| $n$ | Number of dimensions / features |
| $X_{i}$ | $i$th solution of population $X$, having $n$ dimensions |
| $M^{t}$ | Mutated vectors at generation $t$, comprising $N$ solutions |
| $M_{i}^{t}$ | $i$th solution of the mutation population, of $n$ dimensions |
| $F$ | Mutation factor |
| $U^{t}$ | Trial vectors at generation $t$, comprising $N$ solutions |
| $U_{i}^{t}$ | $i$th solution of the trial population, of $n$ dimensions |
| $CR$ | Crossover rate |
| $MAXITR$ | Maximum number of iterations |
| $randi$ | Randomly chosen index |
| $d_{t}$ | Chaotic number at the $t$th time step |
| $rand(0,1)$ | Random number generated in $[0,1]$ |
| $\lambda$ | Logistic map control parameter |
| $AUC_{i}$ | AUC score of the $i$th solution |
| $cardinality_{i}$ | Cardinality score of the $i$th solution |
| $cvalue_{t}$ | Chaotic random number at the $t$th time step |
| $P$ | RDD of population $X$ |
| $Xtrain$ | Train dataset |
| $Xtest$ | Test dataset |
| $localN$ | Local population size |
| $mMig$ | Maximum number of migrations |
| $mGen$ | Maximum number of generations |
| $localN$ | Local population / sub-population pertaining to a data island, comprising lps solutions |

Among the evolutionary algorithms, Differential Evolution (DE) has proved to be robust in solving many combinatorial and continuous optimization problems [9, 8, 32]. Despite its supremacy over other algorithms, DE tends to get stuck prematurely in local optimal solutions, which affects its exploration and exploitation capabilities [9]. To alleviate these issues, researchers proposed several quantum-inspired evolutionary algorithms (QIEAs) to solve a myriad of combinatorial optimization problems such as the knapsack problem, the truck trail problem, and portfolio optimization. Often, QIEAs accelerate the evolution process owing to their quantum parallelization [38, 19] and the entanglement of the quantum state. Further, the diversity and convergence rate are improved enough to increase the probability of reaching better global optimal solutions.

In today’s data-rich environment, the humongous growth of high-dimensional datasets warrants the critical need for developing scalable algorithms [17, 36]. Despite the popularity of Hadoop and its ecosystem as a big data framework, Spark’s unique features, including in-memory computing and seamless integration, have made it a viable and competitive alternative to Hadoop [47]. However, the extant Quantum-Inspired Evolutionary Algorithms (QIEAs) are not scalable to large, high-dimensional datasets.

Our contributions in this paper include:

  • proposing chaotic, quantum-inspired evolutionary algorithms for FSS in high-dimensional data and demonstrating their superiority over the extant methods on four problems.

  • utilizing the Lyapunov exponent to ensure that we work in a truly chaotic regime.

  • introducing LASSO LR in place of LR as a classifier for the FSS wrapper.

  • proposing parallel versions of the above-mentioned algorithms operating in a single-objective environment under an island-based approach in Apache Spark framework.

The paper is structured as follows: Section 2 reviews the literature, Section 3 presents our new algorithms, Section 4 describes the datasets analyzed in the study, Section 5 analyses the results obtained, and Section 6 concludes the paper.

Table 2: Full form of the acronyms of the algorithms employed in the current study

| No. | Algorithm | Description |
|---|---|---|
| 1 | LR | Logistic Regression |
| 2 | LLR | Least Absolute Shrinkage and Selection Operator LR |
| 3 | BDE | Non-quantum counterpart of QBDE |
| 4 | QBDE-I | A non-gate quantum variant of BDE with threshold trick and random numbers generated from the uniform distribution |
| 5 | QBDE-II | A gate quantum variant of BDE with threshold trick and random numbers generated from the uniform distribution |
| 6 | CQBDE-I | A chaotic-map-guided variant of QBDE-I without Lyapunov exponent guidance |
| 7 | CQBDE-II | A chaotic variant of QBDE-II without Lyapunov exponent guidance |
| 8 | CLQBDE-I | A chaotic variant of QBDE-I with Lyapunov exponent guidance |
| 9 | CLQBDE-II | A chaotic variant of QBDE-II with Lyapunov exponent guidance |
| 10 | CQIEA | An algorithm from [35] |
| 11 | CTQIEA | A variant of CQIEA [35] which integrates the threshold trick and chaos without Lyapunov exponent guidance |
| 12 | CLTQIEA | A variant of CQIEA [35] which incorporates both the threshold trick and Lyapunov exponent guidance |
Figure 1: Generic framework of the proposed wrappers. (a) Schematic of the BDE. (b) Schematic of the proposed QBDE variants. (c) Schematic of the proposed CQBDE variants. (d) Block diagram of the proposed wrapper.

2 Background and Literature Review

2.1 Differential Evolution

Binary Differential Evolution (BDE), a stochastic population-based global optimization algorithm, starts by initializing a random population consisting of $N$ candidate solution vectors ($X_{i}$), where $N$ is the population size. Each candidate solution vector follows the binary encoding scheme and is subjected to the following three heuristics (mutation, crossover, and selection) in each iteration (or generation) of the algorithm (see Fig. 1(a)).

At each generation $t$, the candidate solution vector ($X_{i}^{t}$) within the $n$-dimensional search space is subjected to the mutation operation, yielding the mutant vector ($M_{i}^{t}$). The mutation operation is applied as presented in Eq. (1).

$$M_{i}^{t}=X_{i1}^{t}+F*(X_{i2}^{t}-X_{i3}^{t}) \tag{1}$$

where $X_{i1}^{t}$, $X_{i2}^{t}$, and $X_{i3}^{t}$ are three randomly chosen distinct vectors from the current generation $t$. $F$, the mutation factor, is a user-defined parameter that lies in the range $[0,1]$. After this step, the mutant vector may no longer be binary. Hence, a sigmoid-based discretization process (see Eq. (2)) is applied to every $m_{ij}^{t}$ (the $j^{th}$ member of $M_{i}^{t}$), thereby converting the continuous vector into a binary vector.

$$m_{ij}^{t}=\begin{cases}1,&\text{if }\text{rand}(0,1)<\text{sigmoid}(m_{ij}^{t})\\ 0,&\text{else}\end{cases} \tag{2}$$
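
The mutation and discretization steps of Eqs. (1) and (2) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the helper names `mutate` and `discretize`, the toy population, and the choice to exclude the target index `i` from the three donors are assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    """Standard logistic function used for discretization."""
    return 1.0 / (1.0 + math.exp(-z))


def mutate(population, i, F, rng):
    """Eq. (1): combine three distinct, randomly chosen vectors."""
    # Assumption: the three donor vectors also differ from the target index i.
    candidates = [k for k in range(len(population)) if k != i]
    i1, i2, i3 = rng.sample(candidates, 3)
    x1, x2, x3 = population[i1], population[i2], population[i3]
    return [a + F * (b - c) for a, b, c in zip(x1, x2, x3)]


def discretize(mutant, rng):
    """Eq. (2): set bit j to 1 with probability sigmoid(m_ij)."""
    return [1 if rng.random() < sigmoid(m) else 0 for m in mutant]


rng = random.Random(0)
# Toy binary population: 6 solutions over 8 features (hypothetical sizes).
population = [[rng.randint(0, 1) for _ in range(8)] for _ in range(6)]
mutant = mutate(population, i=0, F=0.5, rng=rng)
binary_mutant = discretize(mutant, rng)
```

With binary parents and $F=0.5$, each mutant component lies in $[-0.5, 1.5]$, which is why the discretization step is needed before the vector can be read as a feature mask.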

Then, the discretized mutant vector is subjected to the crossover operation, in which it is mated with the corresponding candidate solution vector to generate the trial vector $U_{i}^{t}$, as presented in Eq. (3).

$$u_{ij}^{t}=\begin{cases}m_{ij}^{t},&\text{if }\text{rand}(0,1)<CR\text{ and }j\neq randi\\ x_{ij}^{t},&\text{if }\text{rand}(0,1)\geq CR\text{ and }j\neq randi\end{cases} \tag{3}$$

where $j=1,2,\dots,n$; $u_{ij}^{t}$ is the $j^{th}$ bit of $U_{i}^{t}$; and $rand(0,1)$ is a random number generated in the interval $[0,1]$ from a uniform distribution. $randi$ is a randomly chosen index to make sure that the generated trial vector is different from the mutant vector. $CR$, the crossover rate, is a user-defined parameter that lies in the range $[0,1]$.
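
A short Python sketch of the crossover rule in Eq. (3) follows. Eq. (3) does not state what happens at position $j = randi$; the sketch copies the mutant bit there, as in the common DE convention, and that choice is an assumption rather than something taken from the text. The function name `crossover` is also hypothetical.

```python
import random


def crossover(target, mutant, CR, rng):
    """Eq. (3): build the trial vector bit by bit."""
    n = len(target)
    randi = rng.randrange(n)  # randomly chosen index
    trial = []
    for j in range(n):
        if j == randi:
            # Assumption: Eq. (3) leaves this case open; take the mutant bit.
            trial.append(mutant[j])
        elif rng.random() < CR:
            trial.append(mutant[j])  # rand(0,1) < CR: inherit from the mutant
        else:
            trial.append(target[j])  # rand(0,1) >= CR: keep the target bit
    return trial


rng = random.Random(1)
target = [0] * 10
mutant = [1] * 10
trial_low = crossover(target, mutant, CR=0.0, rng=rng)   # only position randi changes
trial_high = crossover(target, mutant, CR=1.0, rng=rng)  # every bit comes from the mutant
```

The two extreme settings show the role of $CR$: at $CR=0$ the trial vector differs from the target in one position only, while at $CR=1$ it equals the mutant.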

Finally, the fitness score is computed for the trial vectors. Then, the selection operation is applied by comparing each target vector with its corresponding trial vector to produce an offspring. Better solutions survive and form the parent population for the subsequent iteration. The selection operation follows the rule presented in Eq. (4):

$$X_{i}^{(t+1)}=\begin{cases}X_{i}^{(t)},&\text{if }f(X_{i})>f(U_{i})\\ U_{i}^{(t)},&\text{otherwise}\end{cases} \tag{4}$$

This process continues until the maximum number of iterations is reached or another convergence criterion, if any, is met.
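
The selection rule of Eq. (4) is greedy and can be sketched in a few lines of Python. The fitness function below is a hypothetical stand-in (it rewards the first three features and penalizes cardinality); in the paper the fitness comes from training a classifier.

```python
def select(target, trial, fitness):
    """Eq. (4): keep the target only if it is strictly fitter than the trial."""
    return target if fitness(target) > fitness(trial) else trial


def toy_fitness(bits):
    # Hypothetical stand-in for the wrapper's classifier-based fitness.
    return sum(bits[:3]) - 0.1 * sum(bits)


target = [1, 0, 0, 1, 1]
trial = [1, 1, 0, 0, 0]
survivor = select(target, trial, toy_fitness)

# On a tie, Eq. (4) passes the trial vector to the next generation.
tie_survivor = select([1, 0, 0, 0, 0], [0, 1, 0, 0, 0], toy_fitness)
```

Because the comparison is strict, ties favour the trial vector, which lets the population drift across plateaus of equal fitness.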

2.2 Quantum Computing

To exploit the notions of quantum theory on classical computers, quantum-inspired algorithms were proposed by [19]. They employ quantum mechanics concepts such as quantum measurement, superposition of states, interference, and entanglement. A quantum bit (qubit) is the basic unit of information in quantum computation and is defined by a linear combination of the two basis states, as given in Eq. (5).

$$Q=\alpha\ket{0}+\beta\ket{1} \tag{5}$$

The coefficients $\alpha$ and $\beta$ are two complex numbers that must satisfy the norm relation given in Eq. (6).

$$|\alpha|^{2}+|\beta|^{2}=1 \tag{6}$$

where the probability of observing the state $\ket{0}$ is $|\alpha|^{2}$ and the probability of observing the state $\ket{1}$ is $|\beta|^{2}$. A quantum register composed of $n$ qubits can represent $2^{n}$ possible values simultaneously owing to the superposition of states.
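
As a small classical illustration of Eqs. (5) and (6), the following Python sketch normalizes a pair of complex amplitudes and simulates repeated observation of the qubit. The helper names `normalize` and `measure` and the example amplitudes are assumptions chosen for illustration.

```python
import math
import random


def normalize(alpha, beta):
    """Scale a pair of complex amplitudes so that Eq. (6) holds."""
    norm = math.sqrt(abs(alpha) ** 2 + abs(beta) ** 2)
    return alpha / norm, beta / norm


def measure(alpha, beta, rng):
    """Observe |1> with probability |beta|^2, otherwise |0>."""
    return 1 if rng.random() < abs(beta) ** 2 else 0


# Example amplitudes (arbitrary): |1+1j|^2 = 2 and |1-2j|^2 = 5.
alpha, beta = normalize(1 + 1j, 1 - 2j)
p0, p1 = abs(alpha) ** 2, abs(beta) ** 2

rng = random.Random(0)
samples = [measure(alpha, beta, rng) for _ in range(10000)]
frequency_of_one = sum(samples) / len(samples)
```

After normalization the two probabilities are $2/7$ and $5/7$, and the empirical frequency of observing 1 should settle close to $5/7$.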

2.3 Overview of Chaos Theory

The theory of chaos originated in the 1800s and was further developed by [7] to tackle challenges in complex non-linear systems [30]. Chaotic systems are dynamic and deterministic, evolving from initial conditions, with trajectories describing the system states in the state space. Chaotic behaviour depends heavily on the initial conditions and exhibits two properties: ergodicity and an intrinsically stochastic nature. Chaotic maps generate sequences of numbers that exhibit these properties, aiding EAs in escaping local minima [31, 25, 26]. In the literature, chaotic maps are introduced to adjust the hyperparameters dynamically and to enhance adaptability to the evolutionary dynamics. They also enable exploration of regions left out by random sequences and have been extensively studied in large-scale continuous optimization problems. Chaotic maps produce a series of numbers from a probability distribution that differs from the uniform (0,1) distribution, exhibiting deterministic randomness. This deterministic nature means that the generated sequence is fully determined by the governing equations and the initial conditions. The logistic map, a well-known chaotic map and the one used in the current study, is discussed below:

Logistic map [28]: The Logistic map exhibits chaotic behaviour in a discrete-time demographic model. This is a polynomial mapping of degree 2. The mathematical representation is defined in Eq. (7).

$$d_{t+1}=\lambda*(d_{t}*(1-d_{t})) \tag{7}$$

Here, the constant $\lambda$ lies in the range $[0,4]$ and determines the behaviour of the logistic map. In the current study, $\lambda=4$ is chosen.
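
Eq. (7) with $\lambda=4$ can be iterated directly. The Python sketch below generates a short series and compares it with a second series whose starting value differs by $10^{-10}$, to illustrate the sensitivity to initial conditions mentioned above. The function name `logistic_map` and the seed value $0.3$ are assumptions.

```python
def logistic_map(d0, steps, lam=4.0):
    """Iterate Eq. (7) and return the list d_1, ..., d_steps."""
    values = []
    d = d0
    for _ in range(steps):
        d = lam * d * (1.0 - d)
        values.append(d)
    return values


series = logistic_map(d0=0.3, steps=1000)

# Two nearly identical starting points separate after a few dozen steps.
nearby = logistic_map(d0=0.3 + 1e-10, steps=100)
largest_gap = max(abs(a - b) for a, b in zip(series[:100], nearby))
```

Every iterate stays in $[0,1]$, so the series can stand in for $rand(0,1)$ wherever the algorithm needs a number in that interval.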

2.4 Literature Review

The first quantum-inspired evolutionary algorithm (QEA) was proposed by Han and Kim for the knapsack problem in their pioneering paper [19]. Their approach employed qubit notation and a rotation gate together with the migration operation, strategically guiding the solution towards a global optimum. [22] proposed an adaptive quantum differential evolution (AQDE), which dynamically adjusts the mutation and crossover rates based on their success streaks. It outperformed QEA in solving the knapsack problem. [39] proposed elitism-based quantum differential evolution for FSS on small datasets. [13] proposed multi-strategy quantum differential evolution, which integrates multiple strategies, including a mutation strategy with a difference vector, multi-population mutation, and an adaptive rotation angle state. [12] proposed an improved quantum differential evolution that incorporates the divide-and-conquer concept of a cooperative coevolutionary algorithm, which improved both exploration and exploitation capabilities. Other quantum evolutionary algorithms studied in the literature include (i) a hybrid of quantum differential evolution (QDE) and grey wolf optimizer [43] tailored for the knapsack problem, (ii) multi-objective quantum-inspired hybrid DE [24], which is a hybrid of genetic algorithm quantum variants and DE for multi-objective next-release problems, (iii) the vector hop algorithm [18], a hybrid of differential evolution and particle swarm optimization, and (iv) another hybrid DE in which the first stage invokes QDE and the resultant population is passed on to BDE in the second stage, which continues the evolution process. All these hybrids improved the search process and obtained better convergence capabilities than their individual constituents in standalone mode.

We now briefly survey the parallel and distributed versions of DE [20, 33, 37, 48, 40] developed across varied environments such as Spark, CUDA, MPI, and OpenMP. [48] proposed two master-slave-based parallel approaches, namely, (i) a data-based MapReduce model and (ii) a population-based MapReduce model. [40] introduced two parallel strategies, master-slave and island approaches, evaluated on the AWS Spark cloud, and tested their performance on benchmark functions. [6] and [5] developed master-slave-based parallel DE approaches for large-scale clustering and cluster optimization problems, respectively. [2] introduced a cost-sensitive DE classifier (SCDE) based on Euclidean distance for imbalanced classification datasets. [1] introduced a fine-grained parallel DE under the OpenMP framework for optimal networking, aiming to reduce the computational load on mappers and reducers. [11] introduced Parallel DE (PDE) under the Spark environment, showing promising speedup. [21] presented SgtDE, a grouping topology model for large-scale optimization, achieving significant speedup. [10] developed parallel DE under CUDA, while [44] designed a self-adaptive DE framework in CUDA for benchmark functions. Further, several parallel versions of DE have been proposed to solve resource allocation problems [3, 15, 14], hydro scheduling [16], large-scale clustering [23], optimized workflow placement [41], and multi-objective flow scheduling problems [34]. Table A1 of the Appendix gives a brief overview of the literature.

Figure 2: Generic schematic diagram of the island model based wrapper

3 Proposed Methodology

This section introduces the objective function employed in the current study, followed by the proposed algorithms and an overview of the proposed parallel mechanism, which applies to all algorithms discussed. The notation used in the study is presented in Table 1, and the acronyms of the algorithms are presented in Table 2.

3.1 Objective Function

The objective function considered in this study is the Area Under the receiver operating characteristic Curve (AUC), given in Eq. (8). It is defined as the mean of specificity and sensitivity. We specifically considered AUC due to its proven robustness while handling imbalanced datasets.

$$\text{AUC}=\frac{\text{Sensitivity}+\text{Specificity}}{2} \tag{8}$$

where sensitivity (refer to Eq. (9)) is the ratio of the positive samples that are correctly predicted to be positive to all the actual positive samples. This is also called the True Positive Rate (TPR).

$$\text{Sensitivity}=\frac{TP}{TP+FN} \tag{9}$$

where $TP$ and $FN$ are the numbers of true positives and false negatives, respectively. Specificity (refer to Eq. (10)) is the ratio of the negative samples that are correctly predicted to be negative to all the actual negative samples. This is also called the True Negative Rate (TNR).

$$\text{Specificity}=\frac{TN}{TN+FP} \tag{10}$$

where $TN$ and $FP$ are the numbers of true negatives and false positives, respectively.
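
Eqs. (8) to (10) can be checked on a small worked example. The Python sketch below computes the objective from hard class predictions; the function names and the toy labels are assumptions, and the sketch presumes that both classes occur in `y_true` so that neither denominator is zero.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FN, TN, FP) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp, fn, tn, fp


def auc_score(y_true, y_pred):
    """Eq. (8): mean of sensitivity (Eq. (9)) and specificity (Eq. (10))."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Toy labels: 4 positives and 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
auc = auc_score(y_true, y_pred)
```

Here $TP=3$, $FN=1$, $TN=4$, and $FP=2$, so sensitivity is $0.75$, specificity is $2/3$, and the objective is their mean, about $0.708$.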

This study employs LR and LLR as the classifiers for the proposed FSS wrapper because their training converges quickly. Unlike LR, LLR performs regularization and reduces the number of features too.

3.2 Proposed Chaotic Quantum Inspired Algorithm

In this study, we propose two distinct algorithms based on chaotic and quantum principles: (i) a non-gate variant, in which information is exchanged across different qubits without gates (CLQBDE-I), and (ii) a gate variant, in which information is exchanged via gates (CLQBDE-II) (see Table 2, Fig. 1(b), and Fig. 1(c)).

Unlike the chaos-based algorithms proposed in the literature [29, 4, 46], our population initialization method is guided by the Lyapunov exponent, ensuring that the wrapper-based FSS operates within a truly chaotic regime. Consequently, chaotic numbers are not drawn from the initial time steps of the logistic chaotic map, as they are in existing approaches. Instead, they are introduced after a specific number of time steps, which is determined by the Lyapunov exponent. After thorough experimentation, we found that the logistic map enters a truly chaotic regime after $5000$ time steps. This value is determined offline and then employed within the algorithm. Histogram plots comparing the distributions of numbers generated from the uniform random distribution and from the Lyapunov exponent-guided logistic chaotic map, for time steps 1 to 5000, are presented in Fig. A1 of the Appendix. Both variants, CQBDE-I and CQBDE-II, incorporate chaotic numbers in the initialization of each qubit's alpha and beta states for the $i^{th}$ solution, represented as in Eq. (11) and satisfying Eq. (12).

$$Q^{(i)}=\begin{bmatrix}\alpha^{(i)}_{1}&\alpha^{(i)}_{2}&\dots&\alpha^{(i)}_{n}\\ \beta^{(i)}_{1}&\beta^{(i)}_{2}&\dots&\beta^{(i)}_{n}\end{bmatrix} \tag{11}$$

$$\left|\alpha^{(i)}_{j}\right|^{2}+\left|\beta^{(i)}_{j}\right|^{2}=1 \tag{12}$$

where $n$ is the number of dimensions and $i$ is the index of the solution in the population of size $N$.
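
The Lyapunov-guided burn-in described above for the logistic map can be illustrated with a rough Python sketch. It estimates the Lyapunov exponent as the orbit average of $\ln|\lambda(1-2d_{t})|$ after discarding the first 5000 iterates; a positive value indicates chaotic behaviour. The excerpt does not spell out how the authors derive the 5000-step figure from the exponent, so the function name, the seed $0.3$, and the sample count are assumptions made only for illustration.

```python
import math


def lyapunov_exponent(d0, burn_in, samples, lam=4.0):
    """Average ln|f'(d_t)| along the orbit of the logistic map f."""
    d = d0
    for _ in range(burn_in):  # discard the transient
        d = lam * d * (1.0 - d)
    total, count = 0.0, 0
    for _ in range(samples):
        slope = abs(lam * (1.0 - 2.0 * d))  # |f'(d)| for f(d) = lam*d*(1-d)
        if slope > 0.0:  # guard against log(0) at d = 0.5
            total += math.log(slope)
            count += 1
        d = lam * d * (1.0 - d)
    return total / count


estimate = lyapunov_exponent(d0=0.3, burn_in=5000, samples=20000)
```

For $\lambda=4$ the exponent is known analytically to be $\ln 2 \approx 0.693$, so the numerical estimate offers a simple sanity check that the series is being sampled in the chaotic regime.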

The quantum representation of each population member is given by Eq. (11), where the components satisfy the condition in Eq. (12). The population consists of $N$ candidate solution vectors ($X_{i}$), where $N$ is the population size, and each solution vector is of size $2*n$, where $n$ is the number of features. The quantum state vectors thus formed need to be converted into binary-encoded solutions for combinatorial optimization problems. In the literature, the following rule is generally adopted.

$$x^{(i)}_{j}=\begin{cases}1,&\text{if }\text{rand}(0,1)<\left|\beta^{(i)}_{j}\right|^{2}\\ 0,&\text{otherwise}\end{cases} \tag{13}$$

where $i$ and $j$ are the solution and feature indices, respectively.

However, we observed that as the number of dimensions ($n$) increases, more features are selected. This observation prompted us to introduce a constraint-based conversion from quantum state to binary solution. The modified rule, which we call the threshold trick, is defined in Eq. (14). It is employed for both the QBDE-I and QBDE-II variants.

$$x^{(i)}_{j}=\begin{cases}1,&\text{if }\text{rand}(0,1)<\left|\beta^{(i)}_{j}\right|^{2}\text{ and }d<\theta\\ 0,&\text{otherwise}\end{cases} \tag{14}$$

Here, $\theta$ is a user-defined parameter within the range $[0,1]$, and $d$ is a number generated from $rand(0,1)$.
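
The effect of the threshold trick can be seen in a small Python sketch that applies Eq. (14) to a row of $\beta$ values. Setting $\theta=1$ recovers Eq. (13), because $d<1$ always holds. The function name `to_binary`, the independence of the two random draws, and the toy values ($n=1000$, $|\beta_{j}|^{2}=0.5$, $\theta=0.1$) are assumptions for illustration.

```python
import math
import random


def to_binary(beta_row, theta, rng):
    """Eq. (14): select feature j only if both random tests pass."""
    bits = []
    for beta in beta_row:
        measured = rng.random() < abs(beta) ** 2  # test shared with Eq. (13)
        d = rng.random()  # assumption: d is an independent draw from rand(0,1)
        bits.append(1 if measured and d < theta else 0)
    return bits


rng = random.Random(0)
beta_row = [math.sqrt(0.5)] * 1000  # 1000 features, each with |beta|^2 = 0.5

plain_count = sum(to_binary(beta_row, theta=1.0, rng=rng))      # behaves like Eq. (13)
threshold_count = sum(to_binary(beta_row, theta=0.1, rng=rng))  # threshold trick
```

With these toy values, roughly half of the features are selected without the threshold, whereas $\theta=0.1$ brings the expected cardinality down to about 5% of the features.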

We proposed two different variants of chaotic quantum algorithms guided by the Lyapunov exponent and threshold trick. The distinction between them is as follows: (i) CLQBDE-I is a non-gate variant, and (ii) CLQBDE-II is a gate-variant where a rotation gate is employed in place of quantum mutation.

3.2.1 CLQBDE-I

The proposed Chaotic Quantum Binary Differential Evolution-I (CLQBDE-I) begins by generating the quantum matrix as described in Eq. (11). For each candidate solution $i$, the $\alpha^{(i)}_{j}$ variables are generated from the chaotic series, while the corresponding $\beta^{(i)}_{j}$ variables are obtained from the relation in Eq. (12). As mentioned earlier, the first step of the chaotic series is guided by the Lyapunov exponent. Once the quantum matrix is generated, the binary-encoded solution is produced using the threshold trick outlined in Eq. (14). The fitness score is calculated by training the classifier, LR or LLR.
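The initialization step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the logistic map with $r = 4$ stands in for the chaotic map, Eq. (12) is taken to be the usual normalization $|\alpha|^2 + |\beta|^2 = 1$, and the burn-in of 5000 steps follows the value quoted in the ablation discussion. The function names and the seed `x0` are hypothetical.

```python
import math

def logistic_series(length, x0=0.3, r=4.0, burn_in=5000):
    """Logistic-map series with the first burn_in iterates discarded."""
    x = x0
    for _ in range(burn_in):          # skip the transient part of the series
        x = r * x * (1.0 - x)
    series = []
    for _ in range(length):
        x = r * x * (1.0 - x)
        series.append(x)
    return series

def init_quantum_matrix(pop_size, n_features, x0=0.3):
    """Return (alpha, beta) as pop_size x n_features nested lists."""
    flat = logistic_series(pop_size * n_features, x0=x0)
    alpha = [flat[i * n_features:(i + 1) * n_features] for i in range(pop_size)]
    # Assumed form of Eq. (12): alpha^2 + beta^2 = 1 for every qubit.
    beta = [[math.sqrt(max(0.0, 1.0 - a * a)) for a in row] for row in alpha]
    return alpha, beta

alpha, beta = init_quantum_matrix(pop_size=4, n_features=6)
print(alpha[0][:3], beta[0][:3])
```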

$$\begin{bmatrix}\alpha^{(i)}_{j}\\ \beta^{(i)}_{j}\end{bmatrix}=\begin{bmatrix}\alpha^{(x1)}_{j}+F\cdot(\alpha^{(x2)}_{j}-\alpha^{(x3)}_{j})\\ \beta^{(x1)}_{j}+F\cdot(\beta^{(x2)}_{j}-\beta^{(x3)}_{j})\end{bmatrix} \tag{15}$$

Then, the initial population undergoes the mutation operation (refer to Eq. (15)), which is applied to the $\alpha$ and $\beta$ states of each candidate solution using three randomly selected solutions. This operation is repeated for the pre-specified population size.

$$\begin{bmatrix}\alpha^{(u)}_{j}\\ \beta^{(u)}_{j}\end{bmatrix}=\begin{cases}\begin{bmatrix}\alpha^{(m)}_{j}\\ \beta^{(m)}_{j}\end{bmatrix},&\text{if }d<CR\text{ and }j\neq \text{randi}\\[2ex] \begin{bmatrix}\alpha^{(u)}_{j}\\ \beta^{(u)}_{j}\end{bmatrix},&\text{otherwise}\end{cases} \tag{16}$$

where $d$ is a number generated from rand(0,1).

Subsequently, the mutated quantum matrix of each of the solutions undergoes crossover operation as described in Eq. 16. Selection is then performed by combining parent and offspring populations and sorting them based on fitness scores. The resulting population becomes the parent population for the next iteration. This iterative process continues for a pre-specified number of iterations, resulting in the convergence of the algorithm, and the resultant population is evaluated on the test data, which remains unchanged throughout the evolution process.
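The mutation, crossover, and selection steps can be illustrated with a short sketch. It is written under several assumptions: each solution is a list of `(alpha, beta)` pairs, the "otherwise" branch of Eq. (16) keeps the parent's qubit, the crossover condition is coded exactly as printed in Eq. (16) (classic DE instead takes the mutant when `d < CR` or `j` equals the random index), and `toy_fitness` is a hypothetical stand-in for the classifier-based fitness. The value `F=0.8` is illustrative.

```python
import random

def mutate(pop, F, rng):
    """Eq. (15): x1 + F * (x2 - x3), applied to alpha and beta of every qubit."""
    n_feat = len(pop[0])
    mutants = []
    for _ in pop:
        x1, x2, x3 = rng.sample(range(len(pop)), 3)   # three distinct solutions
        mutants.append([
            (pop[x1][j][0] + F * (pop[x2][j][0] - pop[x3][j][0]),
             pop[x1][j][1] + F * (pop[x2][j][1] - pop[x3][j][1]))
            for j in range(n_feat)
        ])
    return mutants

def crossover(parent, mutant, CR, rng):
    """Eq. (16): alpha and beta of a qubit are exchanged together."""
    j_rand = rng.randrange(len(parent))
    trial = []
    for j, (p, m) in enumerate(zip(parent, mutant)):
        d = rng.random()
        take_mutant = d < CR and j != j_rand   # condition as printed in Eq. (16)
        trial.append(m if take_mutant else p)  # assumption: otherwise keep the parent
    return trial

def select(parents, offspring, fitness, n_keep):
    """Merge parents and offspring, sort by fitness, keep the best n_keep."""
    return sorted(parents + offspring, key=fitness, reverse=True)[:n_keep]

rng = random.Random(42)
pop = [[(rng.random(), rng.random()) for _ in range(5)] for _ in range(6)]
mutants = mutate(pop, F=0.8, rng=rng)
trials = [crossover(p, m, CR=0.9, rng=rng) for p, m in zip(pop, mutants)]
toy_fitness = lambda sol: sum(a for a, _ in sol)   # hypothetical fitness
next_pop = select(pop, trials, toy_fitness, n_keep=len(pop))
```

Because selection operates on the merged parent and offspring pool, the best fitness in the population cannot decrease from one generation to the next.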

3.2.2 CLQBDE-II

We propose another variant, a gate-based quantum binary differential evolution, named chaotic quantum binary differential evolution-II (CLQBDE-II). It differs from CLQBDE-I in employing the rotation gate in place of the quantum mutation operation to generate the mutated quantum matrix.

The population is initialized, and the rotation gate is then introduced to mimic the behaviour of the mutation operation. Rotation gates are represented as unitary matrices, which rotate the state of a qubit according to the rotation angle. We followed the look-up table obtained from [19] to decide the rotation angle. The rotation gate is multiplied with each qubit of the candidate solution, and thus the mutated vector is generated. This solution vector is then subjected to the crossover operation, selection, and threshold trick. These steps are identical to those of CLQBDE-I, as discussed in sub-section 3.2.1. The rotation gate employed in this study is presented in Eq. (17).

$$U(\Delta\theta)=\begin{bmatrix}\cos(\Delta\theta)&-\sin(\Delta\theta)\\ \sin(\Delta\theta)&\cos(\Delta\theta)\end{bmatrix} \tag{17}$$
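
Applying the gate of Eq. (17) to a single qubit can be sketched as below. The rotation angles used here are hypothetical; the look-up table of [19] that determines $\Delta\theta$ in the paper is not reproduced.

```python
import math

def rotate(qubit, delta_theta):
    """Multiply U(delta_theta) from Eq. (17) with the column vector (alpha, beta)."""
    a, b = qubit
    c, s = math.cos(delta_theta), math.sin(delta_theta)
    return (c * a - s * b, s * a + c * b)

qubit = (0.6, 0.8)                 # alpha^2 + beta^2 = 1
rotated = rotate(qubit, 0.05)      # hypothetical rotation angle in radians
norm_sq = rotated[0] ** 2 + rotated[1] ** 2
print(rotated, norm_sq)
```

Since $U(\Delta\theta)$ is unitary, the rotated qubit keeps $|\alpha|^2 + |\beta|^2 = 1$, so no renormalization is needed after this step.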

Table 3: Encoding Scheme of the population

| Key | Value |
|---|---|
| $Key_1$ | $\langle BinaryVector_1, QuantumMatrix_1, trainedModelCoef_1, AUC_1 \rangle$ |
| $Key_2$ | $\langle BinaryVector_2, QuantumMatrix_2, trainedModelCoef_2, AUC_2 \rangle$ |
| $Key_N$ | $\langle BinaryVector_N, QuantumMatrix_N, trainedModelCoef_N, AUC_N \rangle$ |

3.3 Proposed Parallel Approach

The proposed parallel algorithm adopts an island approach, which divides the data into several partitions known as data islands.

Table 3 depicts the population schema maintained for all approaches in this work. The population consists of $N$ solutions. Each solution has two fields: (i) the Key field, which holds a unique id that identifies the solution, and (ii) the Value field, which has the following sub-fields: (a) Binary vector: a binary vector whose length equals the number of features, nfeat, where '0' indicates that a particular feature is not selected and '1' indicates that it is selected. (b) Quantum matrix: the quantum state information of the corresponding solution. (c) Trained model coefficients: in wrapper methods, a classifier is chosen to evaluate the solution's performance, so the coefficients of the trained model are stored in this sub-field, mainly so that they can be used in the test phase. (d) AUC: the AUC obtained by the trained model for the solution. This schema keeps all solution-related information in a single place.
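One possible in-memory form of this key-value schema is sketched below. The class name, field names, and types are hypothetical choices made for illustration; only the four sub-fields themselves come from Table 3.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Solution:
    """Value part of one population record; `key` mirrors the Key field."""
    key: int
    binary_vector: List[int]                    # 1 = feature selected, 0 = not selected
    quantum_matrix: List[Tuple[float, float]]   # one (alpha, beta) pair per feature
    model_coef: List[float] = field(default_factory=list)  # filled in after training
    auc: float = 0.0                            # filled in after training

    @property
    def cardinality(self) -> int:
        """Number of selected features."""
        return sum(self.binary_vector)

s = Solution(key=1, binary_vector=[1, 0, 1], quantum_matrix=[(0.6, 0.8)] * 3)
population: Dict[int, Solution] = {s.key: s}
print(population[1].cardinality)
```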

The proposed parallel approach comprises three main phases: Initialization, Training phase, and Test phase. As illustrated in Fig. 2, there are $k$ sub-populations and $k$ data islands. The parallel algorithm operates at two levels: driver and worker. The corresponding algorithms are presented in Algorithms 1 and 2 (see Appendix), respectively. All these phases, along with the invocation of the migration rule, are initiated at the driver node, while the evolution of the parallel EAs during the training phase and the evaluation during the test phase are executed at the worker nodes. However, the filtering of top solutions post-migration and the aggregation of the final results during the test phase are performed at the driver node.

Phase-I: Initialization. In this phase, the quantum matrix for the entire population is chaotically initialized using the biased sampling method [42], depending on the underlying metaheuristic (i.e., DE / EA). This population follows the data structure presented in Table 3. Subsequently, binary-encoded solutions are obtained using the threshold trick. The population thus initialized is treated as the global population and is broadcast, along with the hyperparameters, from the driver to the worker nodes.

Phase-II: Training Phase. A miniature EA is evolved in parallel in each data island at each worker node. The block diagram of the miniature wrapper is depicted in Fig. 1(d). A sub-population (local population) of size $localN$ ($< N$) is initially extracted by random sampling with replacement. The train-and-update phase then begins, where fitness scores are evaluated using the classifier, LR or LLR, followed by the update of the quantum matrix, binary-encoded solutions, classifier coefficients, and AUC in the respective fields (see Table 3). This train-and-update process is common to all the algorithms. Heuristics specific to each algorithm are applied to the quantum matrix of the population, resulting in the generation of an offspring quantum matrix. Train-and-update operations follow, and the top $localN$ solutions are retained after selection. This process is repeated for a pre-specified maximum number of generations ($mGen$), resulting in $k * lps$ solutions from each of the $k$ data islands. Upon completing these steps, control returns to the driver, where the migration policy is invoked.

Migration Policy. Solutions are sorted based on their fitness scores, and the top $N$ solutions are selected. After the migration policy is invoked, the population obtained post-migration is distributed to the worker nodes and the worker algorithm is executed again. Subsequently, the sub-population corresponding to a single island is selected by random sampling with replacement, and the evolution process continues. This process is repeated for a pre-specified number of migrations ($mMig$).
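The interplay between the driver, the islands, and the migration policy can be sketched sequentially as follows. This is a simplified, single-process illustration under stated assumptions: solutions are plain dictionaries with an `auc` entry, the worker-side EA is passed in as a hypothetical `evolve` callable (here an identity stub), and each round ends with one invocation of the migration policy. The actual cluster distribution and the per-island data partitions are not modelled.

```python
import random

def run_driver(population, k, local_n, n_rounds, evolve, rng):
    """Driver loop: sample sub-populations, evolve per island, keep the top N."""
    n = len(population)
    for _ in range(n_rounds):
        pooled = []
        for island in range(k):
            # Sub-population drawn by random sampling with replacement.
            sub_pop = rng.choices(population, k=local_n)
            pooled.extend(evolve(sub_pop, island))   # worker-side miniature EA
        # Migration policy: sort by fitness and keep the top N solutions.
        # If k * local_n < N, fewer than N solutions survive.
        population = sorted(pooled, key=lambda s: s["auc"], reverse=True)[:n]
    return population

def identity_evolve(sub_pop, island):
    """Stub standing in for mGen generations of the island's EA."""
    return list(sub_pop)

rng = random.Random(7)
start = [{"key": i, "auc": rng.random()} for i in range(30)]
final = run_driver(start, k=4, local_n=15, n_rounds=2, evolve=identity_evolve, rng=rng)
print(len(final), final[0]["auc"])
```

Because sampling is done with replacement, the same solution may appear in several islands, which is how good solutions spread across sub-populations after migration.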

Phase-III: Test Phase. In this phase, the converged population obtained in the training phase is evaluated on the test dataset. That is, the AUC is computed on the test dataset using the coefficients corresponding to each solution. Subsequently, test fitness scores are computed, and the results are reported accordingly. Notably, while the proposed parallel metaheuristics differ in their respective heuristics, they adhere uniformly to the same train-and-update step, migration policy invocation, and test phase.

4 Results & Discussions

All experiments are conducted on a 5-node cluster comprising a driver node and 4 worker nodes, each equipped with an Intel i7 9th-generation processor and 32 GB of RAM. Notably, the driver node also serves as a worker node. The benchmark datasets analyzed in this study are briefly described in Table 4, while Table 5 presents the optimal hyperparameters identified through fine-tuning with grid search. A train-test split ratio of 80%:20% is followed using stratified random sampling so that the classes are represented in equal proportion in both sets. All algorithms are executed for 20 runs in order to mitigate the impact of random seed variation, which is the standard practice in the evolutionary algorithms (EAs) literature. For all the experiments, the number of migrations is fixed at one. The top solution in each run is identified as the one achieving the highest AUC. Thus, we get 20 top solutions, one for each of the 20 runs. The mean AUC and mean cardinality of these 20 top solutions are computed and presented in Tables 6-10.

The proposed quantum algorithms are compared against a non-quantum counterpart, BDE, and the Chaotic Quantum-Inspired Evolution Algorithm (CQIEA) [35]. To assess the effectiveness of the proposed method, we developed the variants listed in Table 2.

The reasons for developing these variants are as follows: QBDE-I and QBDE-II are meant to show the effectiveness of introducing quantum operators; CQBDE-I and CQBDE-II are designed to show the effectiveness of chaos; CLQBDE-I and CLQBDE-II are designed to show the role of the Lyapunov exponent when coupled with chaotic maps. Further, CTQIEA and CLTQIEA are introduced to show the effectiveness of the threshold trick alone and in combination with the Lyapunov exponent, respectively.

Table 4: Dataset Information

| Dataset Name | # Objects | # Features | Size of dataset |
|---|---|---|---|
| Epsilon | 500,000 | 2,000 | 10.8 GB |
| IEEE Malware | 1,500,000 | 1,000 | 3.2 GB |
| OVA_Omentum | 1,584 | 10,935 | 108.3 MB |
| OVA_Uterus | 1,584 | 10,935 | 108.3 MB |

Table 5: Hyperparameters employed in the current study

| Dataset Name | $N$ ($localN$) | Total generations ($mGen$ per migration) | CR | MR | $\theta$ |
|---|---|---|---|---|---|
| Epsilon | 30 (15) | 20 (10) | 0.90 | 0.80 | 0.10 |
| IEEE Malware | 30 (15) | 10 (5) | 0.90 | 0.80 | 0.15 |
| OVA_Omentum | 300 (200) | 20 (10) | 0.90 | 0.8 | 0.01 |
| OVA_Uterus | 300 (200) | 20 (10) | 0.90 | 0.8 | 0.01 |

Table 6: Results of BDE variants

| Algorithm | Epsilon $f_1$ | Epsilon $f_2$ | IEEE Malware $f_1$ | IEEE Malware $f_2$ | OVA_Omentum $f_1$ | OVA_Omentum $f_2$ | OVA_Uterus $f_1$ | OVA_Uterus $f_2$ |
|---|---|---|---|---|---|---|---|---|
| BDE+LR | 1321.3 | 0.835 | 646.50 | 0.802 | 876.26 | 0.836 | 63.35 | 0.788 |
| BDE+LLR | 1323.7 | 0.846 | 505.7 | 0.817 | 112.65 | 0.833 | 108.65 | 0.802 |

Table 7: Results of QBDE variants

| Algorithm | Epsilon $f_1$ | Epsilon $f_2$ | IEEE Malware $f_1$ | IEEE Malware $f_2$ | OVA_Omentum $f_1$ | OVA_Omentum $f_2$ | OVA_Uterus $f_1$ | OVA_Uterus $f_2$ |
|---|---|---|---|---|---|---|---|---|
| QBDE-I+LR | 204.7 | 0.742 | 121.25 | 0.753 | 159.25 | 0.800 | 69.90 | 0.807 |
| QBDE-II+LR | 214.75 | 0.745 | 127.30 | 0.758 | 70.2 | 0.846 | 70.65 | 0.808 |
| QBDE-I+LLR | 248.95 | 0.753 | 168.5 | 0.826 | 168.5 | 0.826 | 107.75 | 0.796 |
| QBDE-II+LLR | 140.85 | 0.783 | 140.6 | 0.793 | 107.0 | 0.944 | 108.4 | 0.799 |

Table 8: Results of CQBDE variants

| Algorithm | Epsilon $f_1$ | Epsilon $f_2$ | IEEE Malware $f_1$ | IEEE Malware $f_2$ | OVA_Omentum $f_1$ | OVA_Omentum $f_2$ | OVA_Uterus $f_1$ | OVA_Uterus $f_2$ |
|---|---|---|---|---|---|---|---|---|
| CQBDE-I+LR | 269.6 | 0.522 | 142.05 | 0.80 | 794.7 | 0.5 | 830.05 | 0.5 |
| CQBDE-II+LR | 143.85 | 0.780 | 144.5 | 0.791 | 107.5 | 0.905 | 106.45 | 0.842 |
| CQBDE-I+LLR | 397.95 | 0.797 | 162.55 | 0.818 | 159.25 | 0.808 | 108.75 | 0.796 |
| CQBDE-II+LLR | 138.65 | 0.784 | 145.85 | 0.799 | 110.35 | 0.855 | 101.7 | 0.808 |

Table 9: Results of CLQBDE variants

| Algorithm | Epsilon $f_1$ | Epsilon $f_2$ | IEEE Malware $f_1$ | IEEE Malware $f_2$ | OVA_Omentum $f_1$ | OVA_Omentum $f_2$ | OVA_Uterus $f_1$ | OVA_Uterus $f_2$ |
|---|---|---|---|---|---|---|---|---|
| CLQBDE-I+LR | 273.85 | 0.523 | 143.8 | 0.806 | 798.7 | 0.5 | 798.7 | 0.5 |
| CLQBDE-II+LR | 141.5 | 0.777 | 140.6 | 0.793 | 107 | 0.944 | 107.15 | 0.853 |
| CLQBDE-I+LLR | 145.95 | 0.755 | 145.95 | 0.796 | 156.95 | 0.923 | 151.0 | 0.865 |
| CLQBDE-II+LLR | 141.70 | 0.784 | 140.9 | 0.801 | 110.25 | 0.954 | 112.2 | 0.812 |

Table 10: Results of CQIEA variants

| Algorithm | Epsilon $f_1$ | Epsilon $f_2$ | IEEE Malware $f_1$ | IEEE Malware $f_2$ | OVA_Omentum $f_1$ | OVA_Omentum $f_2$ | OVA_Uterus $f_1$ | OVA_Uterus $f_2$ |
|---|---|---|---|---|---|---|---|---|
| CQIEA+LR | 1968.10 | 0.841 | 885.5 | 0.802 | 5033.0 | 0.788 | 6942.9 | 0.792 |
| CTQIEA+LR | 421.0 | 0.790 | 217.65 | 0.779 | 240.6 | 0.947 | 241.1 | 0.884 |
| CLTQIEA+LR | 412.7 | 0.790 | 208.3 | 0.812 | 290.0 | 0.819 | 280.95 | 0.794 |
| CQIEA+LLR | 888.0 | 0.853 | 677.9 | 0.810 | 8356.25 | 0.837 | 6942.90 | 0.792 |
| CTQIEA+LLR | 256.20 | 0.815 | 190.8 | 0.798 | 261.50 | 0.842 | 318.8 | 0.806 |
| CLTQIEA+LLR | 258.50 | 0.815 | 192.05 | 0.801 | 258.70 | 0.893 | 332.3 | 0.803 |

Table 11: Results of the paired t-test

| Dataset | Model (Top1 vs Top2 w.r.t. AUC) | t-statistic | p-value |
|---|---|---|---|
| Epsilon | CTQIEA+LLR* vs CLTQIEA+LLR | 0.421 | 0.675 |
| IEEE Malware | QBDE-I+LLR* vs CQBDE-I+LLR | 1.204 | 0.235 |
| OVA_Omentum | CLQBDE-II+LLR* vs CTQIEA+LR | 1.14 | 0.260 |
| OVA_Uterus | CTQIEA+LR vs CLQBDE-I+LLR** | 3.84 | 0.0004 |

*Better method based on cost-benefit analysis.
**Better method based on statistical significance.

4.1 Comparative Analysis Ablation Study

We conducted our ablation study in a boosted manner. That means in the first instance, parallel BDE is developed as a baseline and then quantum-inspired versions of BDE are developed (boosted wrapper-I), and finally, chaotic versions of these quantum-inspired algorithms are developed (boosted wrapper-II).

Initially, non-quantum variants, i.e., BDE variants, are developed as a baseline, and the corresponding results are presented in Table 6. To remove irrelevant features, we employed LLR in place of LR as the classifier. LLR demonstrated its ability to weed out unimportant features in the following datasets: (i) In the IEEE Malware dataset, BDE with LLR as the classifier obtained a higher mean AUC (0.817 vs. 0.802) together with a reduction of roughly 140 features in mean cardinality (646.50 to 505.7). (ii) Similarly, in the OVA_Omentum dataset, LLR led to a large reduction in mean cardinality (876.26 to 112.65) with a nearly unchanged mean AUC (0.836 vs. 0.833). However, this behaviour is not consistent across the remaining two datasets: in Epsilon and OVA_Uterus, the mean AUC is slightly improved, but this is accompanied by an increase in the mean cardinality.

The strength of LLR over LR is that the latter retains unimportant features during the evolution process, leading to increased cardinality of the solutions.

In the boosted wrapper-I, to obtain better results, we developed the quantum-inspired counterparts of BDE, namely QBDE-I and QBDE-II, where QBDE-I is the non-gate variant and QBDE-II is the gate-based variant. Consequently, we achieved lower mean cardinality accompanied by higher mean AUC across all datasets except the Epsilon dataset (see Table 7).

Then, in the boosted wrapper-II, to further improve the exploration capability, we introduced chaotic initialization in two different ways, i.e., with and without the Lyapunov-guided chaotic series, resulting in CLQBDE (see Table 9) and CQBDE (see Table 8), respectively. In the latter case (i.e., without the Lyapunov exponent), we noticed a reduction in mean AUC on all the datasets relative to the corresponding non-chaotic counterparts (i.e., the QBDE variants). This is likely because the chaotic series does not operate in a true chaotic regime.

To balance the exploration and exploitation capabilities of the algorithm, we introduced the Lyapunov-guided chaotic series (see Table 9) in the initialization phase. These variants achieved efficient solutions, i.e., higher mean AUC with decreased mean cardinality. The chaotic numbers introduced in the first step of the CQBDE variants are not generated in a truly chaotic regime. In the CLQBDE variants, which use Lyapunov exponent-guided chaotic numbers, we discarded the first 5000 time steps, ensuring that the chaotic series is in the true chaotic regime. This transition helped the algorithm improve its exploration and exploitation capability, as evidenced by the results in Table 9. The improvement is most pronounced on both high-dimensional datasets, i.e., OVA_Omentum and OVA_Uterus. However, the true chaotic regime alone did not suffice on the other two datasets.
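What a positive Lyapunov exponent means for a chaotic series can be illustrated numerically. The sketch below assumes the logistic map $x_{t+1} = r\,x_t(1 - x_t)$ as an example map (the map is not named in this section) and estimates the exponent as the average of $\ln|r(1 - 2x_t)|$ after discarding the first 5000 steps. The function name and default values are hypothetical.

```python
import math

def lyapunov_logistic(r=4.0, x0=0.3, burn_in=5000, n_steps=20000):
    """Estimate the Lyapunov exponent of the logistic map by time averaging."""
    x = x0
    for _ in range(burn_in):              # discard the transient
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(n_steps):
        slope = abs(r * (1.0 - 2.0 * x))  # |f'(x)| for f(x) = r x (1 - x)
        total += math.log(max(slope, 1e-300))  # guard against log(0)
        x = r * x * (1.0 - x)
    return total / n_steps

lam_chaotic = lyapunov_logistic(r=4.0)    # chaotic regime, theory gives ln 2
lam_periodic = lyapunov_logistic(r=3.2)   # stable period-2 regime, negative
print(lam_chaotic, lam_periodic)
```

A positive estimate indicates sensitive dependence on initial conditions, and its reciprocal gives the Lyapunov time, i.e., the number of steps over which nearby trajectories diverge by a factor of $e$.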

Further, to create a level playing field for the algorithm we compare against, we demonstrated the effectiveness of the threshold trick and the influence of Lyapunov exponent-guided chaotic numbers by invoking them in the variants of CQIEA [35], namely CTQIEA and CLTQIEA (see Table 10). We noticed that the original CQIEA [35], i.e., without the threshold trick and the Lyapunov exponent-guided chaotic numbers, yielded unacceptably high mean cardinality, impacting its performance adversely. However, after introducing the threshold trick combined with Lyapunov exponent-guided chaotic numbers, we could effectively control the cardinality while improving the mean AUC. This behaviour is particularly evident when LASSO LR (LLR) is employed as the classifier.

4.2 Statistical Testing

A two-tailed t-test at the 5% level of significance with 38 (= 20 + 20 - 2) degrees of freedom is conducted on the AUC values obtained from 20 runs of the top two algorithms (w.r.t. mean AUC) on each dataset (refer to Table 11). The t-test demonstrates that on three out of four datasets (all except OVA_Uterus), the top two algorithms based solely on AUC are statistically similar. The top two algorithms for each dataset are as follows: (i) on the Epsilon dataset, CTQIEA+LLR and CLTQIEA+LLR; (ii) on the IEEE Malware dataset, QBDE-I+LLR and CQBDE-I+LLR; (iii) on the OVA_Omentum dataset, CLQBDE-II+LLR and CTQIEA+LR; and (iv) on the OVA_Uterus dataset, CTQIEA+LR and CLQBDE-I+LLR.
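The test statistic can be computed as in the sketch below. It assumes the pooled-variance two-sample form, which is the form that yields 20 + 20 - 2 = 38 degrees of freedom; the AUC values are synthetic placeholders, not results from this study, and the critical value 2.024 is the two-tailed 5% point of the t-distribution with 38 degrees of freedom.

```python
import math
import random
from statistics import mean, variance

def pooled_t(a, b):
    """Two-sample t-statistic with pooled variance and its degrees of freedom."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df  # pooled variance
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
    return t, df

T_CRIT_38 = 2.024   # two-tailed critical value at the 5% level, 38 d.o.f.

rng = random.Random(3)
runs_a = [0.80 + 0.02 * rng.random() for _ in range(20)]   # synthetic AUCs
runs_b = [0.80 + 0.02 * rng.random() for _ in range(20)]   # synthetic AUCs
t_stat, dof = pooled_t(runs_a, runs_b)
similar = abs(t_stat) < T_CRIT_38   # True means no significant difference
print(t_stat, dof, similar)
```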

In the case of statistical similarity, to break the tie, preference is accorded to the algorithm with the lower mean cardinality. In other words, higher preference is accorded to the algorithm that yields a large reduction in mean cardinality at the cost of an insignificant reduction in mean AUC. After this cost-benefit analysis, the better-performing algorithm is presented in boldface in Table 11.

It is noticed that in three datasets, the top-performing algorithm yielded a higher mean AUC with almost similar mean cardinality. In these cases, preference is accorded to the one with the higher mean AUC. For example, in the IEEE Malware dataset, QBDE-I+LLR obtained a mean AUC of 0.826 with a mean cardinality of 168.5, whereas the next-best algorithm (CQBDE-I+LLR) obtained a mean AUC of 0.818 (0.008 lower than that of QBDE-I+LLR) with a mean cardinality of 162.55. Here, the reduction in cardinality is minimal (< 6 features), which makes QBDE-I+LLR the winner over CQBDE-I+LLR. The same cost-benefit analysis is followed for the other datasets as well: in the Epsilon and OVA_Omentum datasets, CTQIEA+LLR and CLQBDE-II+LLR are the winners, respectively. However, in the OVA_Uterus dataset, CLQBDE-I+LLR turned out to be statistically significant when compared with CTQIEA+LR. Accordingly, CLQBDE-I+LLR is presented in boldface in Table 11.

It is important to note that when algorithm A and algorithm B are statistically similar, the winner is chosen based on the cost-benefit analysis. However, if a statistically significant difference is observed, preference is accorded to the statistically better algorithm.

In summary, the insights derived from the current study are as follows:

  • In the Epsilon dataset, CTQIEA + LLR emerged as the winner due to the significant impact of the threshold trick and chaos.

  • In the IEEE Malware dataset, QBDE-I+LLR was the best algorithm, primarily due to the effectiveness of the threshold trick.

  • In the OVA_Omentum dataset, CLQBDE-II + LLR outperformed others due to the introduction of Lyapunov-based chaos and the threshold trick.

  • In the OVA_Uterus dataset, CLQBDE-I + LLR was the winner largely due to the guidance by Lyapunov.

  • The CLQBDE variants with LLR as the classifier turned out to be the best wrappers on both high-dimensional datasets, whether or not a quantum gate is adopted.

5 Conclusions

This study proposes CLQBDE-I, in which a Lyapunov exponent-guided, chaotic map-based initialization method is incorporated into the quantum-inspired BDE algorithm for FSS. Our results show its superiority on high-dimensional datasets and its competitive performance on the Epsilon and IEEE Malware datasets compared to the alternative algorithms. CLQBDE-I outperformed not only our baseline, QBDE, but also the other baselines, CLTQIEA and CTQIEA, the latter on all but the IEEE Malware dataset. Overall, integrating Lyapunov exponent-guided chaotic dynamics into the QBDE variants yielded better solutions than simple chaos-based initialization methods.

Future research directions include extending this approach to multi-objective environments, designing hybrid chaotic mapping techniques, and exploring chaotic quantum hybrid EAs. This methodology also holds promise for large-scale clustering and feature selection applications across various domains like finance and economics.

References

  • Adhianto et al. [2020] Adhianto, L., S. Banerjee, M. Fagan, M. Krentel, G. Marin, J. Mellor-Crummey, and N. Tallent. 2020. Hpctoolkit: Tools for performance analysis of optimized parallel programs. Concurr. Comput. Pract. Exp. 22: 685–701.
  • Al-Sawwa and Ludwig [2020] Al-Sawwa, J. and S. Ludwig. 2020. Performance evaluation of a cost-sensitive differential evolution classifier using spark – imbalanced binary classification. J. Comput. Sci. 40: 101065.
  • Cao et al. [2017] Cao, B., J. Zhao, Z. Lv, and X. Liu. 2017. A distributed parallel cooperative coevolutionary multiobjective evolutionary algorithm for large-scale optimization. IEEE Transactions on Industrial Informatics 13: 2030–2038.
  • Chandrashekar and Sahin [2014] Chandrashekar, B. and F. Sahin. 2014. A survey on feature selection methods. Comput. Electr. Eng. 40: 16–28.
  • Chen et al. [2016] Chen, Z., X. Jiang, J. Li, S. Li, and L. Wang. 2016. Pdeco: Parallel differential evolution for clusters optimization. J. Comput. Chem. 34: 1046–1059.
  • Cho et al. [2019] Cho, P., T. Nyunt, and T. Aung 2019. Differential evolution for large-scale clustering. In Proc. 2019 9th Int. Work. Comput. Sci. Eng. (WCSE 2019 SPRING), pp. 58–62.
  • Danforth [2013] Danforth, C.M. 2013. Chaos in an atmosphere hanging on a wall. Mathematics of Planet Earth 2013.
  • Das et al. [2016] Das, S., S.S. Mullick, and P.N. Suganthan. 2016. Recent advances in differential evolution–an updated survey. Swarm and Evolutionary Computation 27: 1–30.
  • Das and Suganthan [2011] Das, S. and P. Suganthan. 2011, Feb. Differential evolution: A survey of the state-of-the-art. IEEE Transactions on Evolutionary Computation 15(1): 4–31.
  • de P. Veronese and Krohling [2010] de P. Veronese, L. and R. Krohling 2010. Differential evolution algorithm on the gpu with c-cuda. In IEEE Congress on Evolutionary Computation, pp. 1–7.
  • Deng et al. [2015] Deng, C., X. Tan, X. Dong, and Y. Tan. 2015. A parallel version of differential evolution based on resilient distributed datasets model, Commun. Comput. Inf. Sci., Volume 562, 84–93.
  • Deng et al. [2021] Deng, W., S. Shang, X. Cai, H. Zhao, Y. Zhou, H. Chen, and W. Deng. 2021. Quantum differential evolution with cooperative coevolution framework and hybrid mutation strategy for large scale optimization. Knowledge-Based Systems 224: 107080. 10.1016/j.knosys.2021.107080.
  • Deng et al. [2022] Deng, W., J. Xu, X.Z. Gao, and H. Zhao. 2022, Mar. An enhanced msiqde algorithm with novel multiple strategies for global optimization problems. IEEE Transactions on Systems, Man, and Cybernetics: Systems 52(3): 1578–1587. 10.1109/TSMC.2020.3030792.
  • Falco et al. [2017] Falco, I.D., U. Scafuri, E. Tarantino, and A.D. Cioppa 2017. A distributed differential evolution approach for mapping in a grid environment. In 15th EUROMICRO International Conference on Parallel, Distributed and Network-Based Processing (PDP’07), pp. 442–449.
  • Ge et al. [2018] Ge, Y., W. Yu, Y. Lin, Y. Gong, Z. Zhan, W. Chen, and J. Zhang. 2018. Distributed differential evolution based on adaptive mergence and split for large-scale optimization. IEEE Transactions on Cybernetics 48: 2166–2180.
  • Glotic et al. [2014] Glotic, A., P. Kitak, J. Pihler, and I. Ticar. 2014. Parallel self-adaptive differential evolution algorithm for solving short-term hydro scheduling problem. IEEE Transactions on Power Systems 29: 2347–2358.
  • Gupta et al. [2016] Gupta, P., A. Sharma, and R. Jindal. 2016. Scalable machine-learning algorithms for big data analytics: A comprehensive review. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 6(6): 194–214 .
  • Han et al. [2021] Han, D., J. Wang, C. Tang, T. Weng, K. Li, and C. Dobre. 2021. A multi-objective distance vector-hop localization algorithm based on differential evolution quantum particle swarm optimization. International Journal of Communication Systems 34(14): e4924 .
  • Han and Kim [2002] Han, K.H. and J.H. Kim. 2002, Dec. Quantum-inspired evolutionary algorithm for a class of combinatorial optimization. IEEE Transactions on Evolutionary Computation 6(6): 580–593. 10.1109/TEVC.2002.804320 .
  • Harada et al. [2020] Harada, T., M. Kaidan, and R. Thawonmas. 2020. Comparison of synchronous and asynchronous parallelization of extreme surrogate-assisted multi-objective evolutionary algorithm. Natural Computing .
  • He et al. [2021] He, Z., H. Peng, J. Chen, C. Deng, and Z. Wu. 2021. A spark-based differential evolution with grouping topology model for large-scale global optimization. Cluster Comput. 24: 515–535 .
  • Hota and Pat [2010] Hota, A.R. and A. Pat 2010. An adaptive quantum-inspired differential evolution algorithm for 0-1 knapsack problem. In 2010 Second World Congress on Nature and Biologically Inspired Computing (NaBIC), Kitakyushu, Japan, pp.  703–708.
  • Kromer et al. [2013] Kromer, P., J. Platos, and V. Snasel 2013. Scalable differential evolution for many-core and clusters in unified parallel c. In 2013 IEEE International Conference on Cybernetics (CYBCO), pp.  180–185.
  • Kumari et al. [2013] Kumari, A.C., K. Srinivas, and M. Gupta. 2013. Software requirements optimization using multi-objective quantum-inspired hybrid differential evolution, In EVOLVE - A Bridge between Probability, Set Oriented Numerics, and Evolutionary Computation II, ed. et al., O.S., Volume 175 of Advances in Intelligent Systems and Computing. Springer, Berlin, Heidelberg. 10.1007/978-3-642-31519-0_7.
  • Liu et al. [2015] Liu, T., X. Gao, and L. Wang. 2015. Multi-objective optimization method using an improved nsga-ii algorithm for oil–gas production process. Journal of the Taiwan Institute of Chemical Engineers 57: 42–53 .
  • Lu et al. [2013] Lu, H., R. Niu, J. Liu, and Z. Zhu. 2013. A chaotic non-dominated sorting genetic algorithm for the multi-objective automatic test task scheduling problem. Applied Soft Computing 13(5): 2790–2802 .
  • Maier et al. [2019] Maier, H.R., S. Razavi, Z. Kapelan, L.S. Matott, J. Kasprzyk, and B.A. Tolson. 2019. Introductory overview: Optimization using evolutionary algorithms and other metaheuristics. Environmental Modelling & Software 114: 195–213 .
  • May [1976] May, R. 1976. Simple mathematical models with very complicated dynamics. Nature 261: 459–467 .
  • Olyaei et al. [2017] Olyaei, A., C. Wu, and W. Kinsner. 2017. Detecting unstable periodic orbits in chaotic time series using synchronization. American Physical Society 96 .
  • Packard et al. [1980] Packard, N.H., J.P. Crutchfield, J.D. Farmer, and R.S. Shaw. 1980. Geometry from a time series. Phys. Rev. Lett. 45: 712 .
  • Pan and Da [2015] Pan, I. and S. Da. 2015. Fractional-order load-frequency control of interconnected power systems using chaotic multi-objective optimization. Applied Soft Computing 29: 328–344 .
  • Pant et al. [2020] Pant, M., H. Zaheer, L. Garcia-Hernandez, and A. Abraham. 2020. Differential evolution: A review of more than two decades of research. Engineering Applications of Artificial Intelligence 90: 103479 .
  • Peralta et al. [2015] Peralta, D., S.D. Río, S. Ramírez-Gallego, I. Triguero, J. Benitez, and F. Herrera. 2015. Evolutionary feature selection for big data classification: A mapreduce approach. Math. Probl. Eng. .
  • Qian et al. [2009] Qian, B., L. Wang, D. Huang, W. Wang, and X. Wang. 2009. An effective hybrid de-based algorithm for multi-objective flow shop scheduling with limited buffers. Computers & Operations Research 36(1): 209–233 .
  • Ramos and Vellasco [2020] Ramos, A.C. and M. Vellasco 2020. Chaotic quantum-inspired evolutionary algorithm: enhancing feature selection in bci. In 2020 IEEE Congress on Evolutionary Computation (CEC), Glasgow, UK, pp.  1–8.
  • Rastogi and Shim [1999] Rastogi, R. and K. Shim 1999. Scalable algorithms for mining large databases. In Tutorial Notes of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
  • Rong et al. [2019] Rong, M., D. Gong, and X. Gao. 2019. Feature selection and its use in big data: Challenges, methods, and trends. IEEE Access 7: 19709–19725 .
  • Schliemann et al. [2002] Schliemann, J., A.V. Khaetskii, and D. Loss. 2002. Spin decay and quantum parallelism. Physical Review B 66(24): 245303 .
  • Srikrishna et al. [2015] Srikrishna, V., R. Ghosh, V. Ravi, and K. Deb. 2015. Elitist quantum-inspired differential evolution based wrapper for feature subset selection, In Multi-disciplinary Trends in Artificial Intelligence. MIWAI 2015, eds. Bikakis, A. and X. Zheng, Volume 9426 of Lecture Notes in Computer Science. Springer, Cham. 10.1007/978-3-319-26181-2_11.
  • Teijeiro et al. [2016] Teijeiro, D., X. Pardo, P. González, J. Banga, and R. Doallo. 2016. Implementing parallel differential evolution on spark, In Applications of Evolutionary Computation. EvoApplications 2016, eds. Squillero, G. and P. Burelli, Volume 9598 of Lecture Notes in Computer Science. Springer, Cham.
  • Thomert et al. [2016] Thomert, D., A. Bhattacharya, E. Caron, K. Gadireddy, and L. Lefevre 2016. Parallel differential evolution approach for cloud workflow placements under simultaneous optimization of multiple objectives. In 2016 IEEE Congress on Evolutionary Computation (CEC), pp.  822–829.
  • Vivek et al. [2022] Vivek, Y., V. Ravi, and P. RadhaKrishna. 2022. Scalable feature subset selection for big data using parallel hybrid evolutionary algorithm based wrapper under apache spark environment. Cluster Computing. 10.1007/s10586-022-03725-w .
  • Wang and Wang [2021] Wang, Y. and W. Wang. 2021. Quantum-inspired differential evolution with grey wolf optimizer for 0-1 knapsack problem. Mathematics 9(1233). 10.3390/math9111233 .
  • Wong et al. [2015] Wong, T., A. Qin, S. Wang, and Y. Shi. 2015. cusade: A cuda-based parallel self-adaptive differential evolution algorithm. IEEE Congress on Evolutionary Computation (CEC) 2: 375–388 .
  • Wu et al. [2019] Wu, G., R. Mallipeddi, and P.N. Suganthan. 2019. Ensemble strategies for population-based optimization algorithms–a survey. Swarm and Evolutionary Computation 44: 695–711 .
  • Xue et al. [2016] Xue, B., M. Zhang, W.N. Browne, and X. Yao. 2016. A survey on evolutionary computation approaches to feature selection. IEEE Trans. Evol. Comput. 20: 606–626 .
  • Zaharia et al. [2010] Zaharia, M., M. Chowdhury, M.J. Franklin, S. Shenker, and I. Stoica 2010. Spark: Cluster computing with working sets. In 2nd USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 10).
  • Zhou [2010] Zhou, C. 2010. Fast parallelization of differential evolution algorithm using mapreduce. In Proc. 12th Annu. Genet. Evol. Comput. Conf. GECCO ’10, pp.  1113–1114.

Appendix A Details of the proposed approach

Figure 3: Histograms
Table 12: Summary of parallel evolutionary algorithms in different environments

Authors | Algorithm | Environment | Solved Problem
--- | --- | --- | ---
Zhou [17] | DE | Spark | Discussed pros and cons of various parallel approaches
Teijeiro et al. [18] | DE | Spark + AWS | Benchmark functions
Cho et al. [19] | DE | Spark | Clustering
Chen et al. [20] | Modified DE | SPMD | Cluster optimization
Al-Sawwa and Ludwig [21] | DE | Spark | DE-based classifier
Adhianto et al. [22] | DE | OpenMP | Optical network problem
Deng et al. [23] | DE | Spark | Benchmark functions
He et al. [24] | Five variants of DE | Spark + Cloud | Ring topology model applied to benchmark functions
Veronese & Krohling [25] | DE | CUDA | Large-scale optimization
Wong et al. [26] | Self-Adaptive DE | CUDA | Benchmark functions
Cao et al. [27] | DPCCMOEA | MPI | Large-scale optimization
Ge et al. [28] | DDE-AMS | MPI | Large-scale optimization
Falco et al. [29] | DE | MPI | Resource allocation
Glotic et al. [30] | PSADE | MATLAB | Hydro scheduling algorithm
Daoudi et al. [31] | DE | Hadoop | Clustering
Thomert et al. [32] | NSDE-II | OpenMP | Cloud work placement
Kromer et al. [33] | DE | Unified Parallel C | Large-scale optimization
Qian et al. [34] | MPFPSP | Multithreading | Flow scheduling problem
Vivek et al. [12] | PB-TADE, PB-DETA, PB-DE | Spark | FSS
Current study | CLTQBDE-I, CLTQIEA | Spark | FSS
Algorithm 1 Driver Algorithm
1: Input: $ps$, $lps$, $X$, $Xtrain$, $Xtest$, $mMig$, $mGen$
2: Output: $P$: population evolved after $mMig$ migrations
3: $i \leftarrow 0$
4: $Qt \leftarrow$ Chaotically initialize the quantum matrix
5: $X \leftarrow$ getBinarySolution($Qt$)
6: $P \leftarrow [\phi, \phi, \ldots, \phi]_{ps \times 4}$
7: for $i = 0, 1, \ldots, ps$ do
8:     $sol \leftarrow [i, X[i], \phi, 0.0]$ $\triangleright$ Encoding Scheme (see Table 3)
9:     $P \leftarrow P \cup sol$
10: end for
11: $(Xtrain_1, Xtrain_2, \ldots, Xtrain_k) \leftarrow Xtrain$ $\triangleright$ Divide data into $k$ islands
12: while $i = 0, 1, \ldots, mMig$ do
13:     Call IslandMapper($P$) $\triangleright$ Migration Rule
14:     $R \leftarrow$ Collect $\{p_{r_1}, p_{r_2}, \ldots, p_{r_k}\}$ from $k$ islands
15:     $R \leftarrow$ SortBasedOnFitness($R$)
16:     $P \leftarrow \{R : R \text{ until } |R| < ps\}$
17: end while
18: for each sol in $P_r$ do
19:     $model \leftarrow$ sol[2] $\triangleright$ Collect coefficients for each solution
20:     $score \leftarrow$ testModel($model$, $Xtest$) $\triangleright$ Evaluate AUC on test dataset
21:     $cardinality \leftarrow$ sum(sol[1])
22:     $testFitness \leftarrow (score \times (1 - (cardinality / m)))$
23: end for
24: Return $P$, $testFitness$
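The initialization and scoring steps of Algorithm 1 (lines 4, 5, and 22) can be illustrated with a minimal Python sketch. Everything beyond the pseudocode is an assumption made for illustration: the logistic map with r = 4 stands in for the chaotic map, each qubit is encoded as an angle in [0, π/2] and observed as 1 with probability sin²(θ), and the function names `logistic_sequence`, `chaotic_quantum_matrix`, `get_binary_solution`, and `penalized_fitness` are hypothetical. The Lyapunov-guided choice of iterates described in the paper is not reproduced here; a fixed burn-in is used instead.

```python
import math
import random

def logistic_sequence(n, x0=0.7, r=4.0, burn_in=100):
    """Return n iterates of the logistic map x <- r*x*(1-x) after discarding burn_in steps."""
    x = x0
    for _ in range(burn_in):
        x = r * x * (1.0 - x)
    seq = []
    for _ in range(n):
        x = r * x * (1.0 - x)
        seq.append(x)
    return seq

def chaotic_quantum_matrix(ps, m, x0=0.7):
    """Build a ps x m matrix of qubit angles in [0, pi/2] from the chaotic sequence (assumed encoding)."""
    seq = logistic_sequence(ps * m, x0=x0)
    return [[0.5 * math.pi * seq[i * m + j] for j in range(m)] for i in range(ps)]

def get_binary_solution(qt, rng):
    """Observe every qubit: the bit is 1 with probability sin(theta)^2 (assumed observation rule)."""
    return [[1 if rng.random() < math.sin(t) ** 2 else 0 for t in row] for row in qt]

def penalized_fitness(score, bits):
    """Line 22 of Algorithm 1: score * (1 - cardinality / m), with m the number of features."""
    return score * (1.0 - sum(bits) / len(bits))

rng = random.Random(0)
Qt = chaotic_quantum_matrix(ps=4, m=6)   # line 4: chaotic initialization
X = get_binary_solution(Qt, rng)         # line 5: binary feature masks
```

The fitness in line 22 is multiplicative: a mask that selects every feature scores zero regardless of its AUC, so the search is pushed toward small subsets that still classify well.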
Algorithm 2 Worker algorithm
1: Input: $lps$, $mGen$, $F$, $CR$
2: Output: $localP$: population evolved after $mGen$ maximum iterations
3: $k \leftarrow 0$
4: localP $\leftarrow$ Randomly pick lps number of solutions from P $\triangleright$ local population
5: for $i = 0, 1, \ldots, \text{lps}$ do
6:     bVC $\leftarrow$ localP[i].BinaryEncodedVector
7:     auc $\leftarrow$ LLR(bVC, $islandData$)
8:     localP[i][3] $\leftarrow$ updateAUC
9: end for
10: for $k = 0, 1, \ldots, \text{mGen}$ do
11:     mV $\leftarrow \{\phi\}$
12:     for $i = 0, 1, \ldots, \text{lps}$ do
13:         loPVec $\leftarrow$ localP[i].QuantumMatrix
14:         mVec $\leftarrow$ Mutation(loPVec)
15:         mV $\cup \{mVec\}$
16:     end for
17:     locOfP $\leftarrow \{\phi\}$
18:     for $i = 0, 1, \ldots, \text{lps}$ do
19:         mV $\leftarrow$ mutatedP[i].QuantumMatrix
20:         TV $\leftarrow$ Invoke Crossover on mV
21:         locOfP $\cup \{\text{TV}\}$
22:     end for
23:     Invoke train-and-update phase on locOfP
24:     newP $\leftarrow \{\phi\}$
25:     for $i = 0, 1, \ldots, \text{lps}$ do
26:         nS $\leftarrow$ Invoke Selection (locOfP[i], P[i])
27:         newP $\cup \{\text{nS}\}$
28:     end for
29:     localP $\leftarrow$ newP
30:     $k \leftarrow k + 1$
31: end for
32: return localP
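The generation loop of Algorithm 2 (mutation, crossover, train-and-update, selection) can be sketched on a single island as follows. This is a hedged illustration, not the paper's Spark implementation. The DE/rand/1 mutation, the binomial crossover, and the clipping of angles to [0, π/2] are assumed choices. The function `toy_fitness` is a hypothetical stand-in for the LLR-based AUC evaluation. The names `observe` and `worker`, and the `history` return value, are introduced only for this example.

```python
import math
import random

def observe(theta, rng):
    """Collapse a vector of qubit angles into bits (1 with probability sin(theta)^2)."""
    return [1 if rng.random() < math.sin(t) ** 2 else 0 for t in theta]

def toy_fitness(bits, relevant):
    """Hypothetical stand-in for the AUC-based fitness: reward relevant features, penalize subset size."""
    hits = sum(b for j, b in enumerate(bits) if j in relevant)
    return (hits / len(relevant)) * (1.0 - sum(bits) / len(bits))

def worker(local_p, m_gen, F, CR, fitness, rng):
    """Evolve a local population of qubit-angle vectors for m_gen generations."""
    pop = [list(v) for v in local_p]          # work on a copy of the caller's population
    lps, m = len(pop), len(pop[0])
    bits = [observe(theta, rng) for theta in pop]
    fit = [fitness(b) for b in bits]
    history = [max(fit)]                      # best fitness before any generation
    for _ in range(m_gen):
        trials = []
        for i in range(lps):
            # Mutation (DE/rand/1): combine three distinct members other than i.
            a, b, c = rng.sample([j for j in range(lps) if j != i], 3)
            mutant = [pop[a][j] + F * (pop[b][j] - pop[c][j]) for j in range(m)]
            # Binomial crossover; j_rand guarantees at least one mutant gene.
            j_rand = rng.randrange(m)
            trial = [mutant[j] if (rng.random() < CR or j == j_rand) else pop[i][j]
                     for j in range(m)]
            # Clip so every entry remains a valid angle in [0, pi/2].
            trials.append([min(max(t, 0.0), math.pi / 2) for t in trial])
        for i, trial in enumerate(trials):
            # Train-and-update stand-in, followed by greedy one-to-one selection.
            t_bits = observe(trial, rng)
            t_fit = fitness(t_bits)
            if t_fit >= fit[i]:
                pop[i], bits[i], fit[i] = trial, t_bits, t_fit
        history.append(max(fit))
    return pop, bits, fit, history

rng = random.Random(1)
relevant = {0, 3, 5}                          # hypothetical set of informative features
start = [[rng.uniform(0.0, math.pi / 2) for _ in range(8)] for _ in range(6)]
pop, bits, fit, history = worker(start, m_gen=20, F=0.5, CR=0.9,
                                 fitness=lambda b: toy_fitness(b, relevant), rng=rng)
```

As in the pseudocode, all trial vectors are built from the current population before any replacement takes place. Because selection is greedy, the best fitness recorded in `history` never decreases from one generation to the next.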