1 Introduction

Nowadays, optimization problems have become increasingly complex, and many engineering applications can be formulated as large-scale optimization problems, such as large-scale power systems, scheduling with a large number of resources, and vehicle routing in massive transportation networks (Maisto and Esposito 2012; Wan et al. 2019). However, traditional approaches such as the simplex method and gradient descent no longer meet practical needs.

Nature has always been a rich source of human creativity, and adaptive optimization phenomena in nature continue to inspire researchers (Nadimi-Shahraki et al. 2021b, 2022b). Intelligent algorithms, inspired by the laws of nature and biology, are designed to solve practical problems (Bandyopadhyay et al. 2021; Osuna-Enciso et al. 2022). They assign complex tasks to individuals in a population that cooperate to complete them, and they are conceptually simple and easy to implement (Chaturvedi et al. 2020; Torres-Cerna et al. 2016). Owing to their distributed nature, simplicity, flexibility and robustness, intelligent algorithms have been widely used in computer science, knowledge discovery, communication networks and scheduling, and have become a research hotspot (Maisto and Esposito 2012; Kampouridis and Otero 2017; Nadimi-Shahraki et al. 2021d).

Genetic algorithm (GA), differential evolution (DE) and QUasi-affine transformation evolution (QUATRE) derive from biological evolution and rely on heredity, crossover and mutation operations (Long et al. 2020; Zheng et al. 2019; Nadimi-Shahraki et al. 2020; Hu et al. 2022). Particle swarm optimization (PSO) and ant colony optimization (ACO) simulate the collective search behaviors of birds and ants (Zhao et al. 2021; Menéndez et al. 2016), and grey wolf optimizer (GWO), fish migration optimization (FMO), cat swarm optimization (CSO), phasmatodea population evolution (PPE) and moth flame (MF) imitate the living habits of wolves, graylings, cats, stick insects and moths (Hatta et al. 2019; Chai et al. 2020; Pan et al. 2021d; Song et al. 2021; Nadimi-Shahraki et al. 2021e). Imperialist competitive algorithm (ICA) and teaching-learning-based optimization (TLBO) mimic human behaviors (Armaghani et al. 2021; Mishra et al. 2019). Gravitational search algorithm (GSA) and equilibrium optimizer (EO) are derived from physical laws and phenomena (Gauthama Raman et al. 2020; Shao et al. 2021). Fig. 1 presents the classification of metaheuristic algorithms.

Fig. 1 Metaheuristic algorithms

Although these algorithms are well suited to continuous search spaces, many problems are discrete or binary (Nadimi-Shahraki et al. 2021c). GA, ACO and discrete moth-flame optimization (DFMO) have been proposed for discrete search spaces. Binary optimization handles problems whose variables are restricted to “0” and “1” (Pan et al. 2021c), so various binary metaheuristic algorithms, such as binary DE, binary PSO, binary GWO and binary FMO, have been proposed for applications involving decisions such as on-off, maximum-minimum, selected-nonselected, yes-no and active-inactive. Binary PSO reuses the velocity and momentum concepts of continuous PSO, which limits its performance. Nguyen et al. replaced them with a stickiness property and a rollover probability, and introduced a novel binary PSO (Nguyen et al. 2021). Inspired by plant genetics, Gupta et al. proposed a novel multi-species binary coded algorithm, the Mendelian evolution theory optimization algorithm (Gupta et al. 2020b). First, the DNA of two different species is denatured to produce hybrid offspring. Second, the algorithm describes how dominant and recessive traits emerge in two consecutive generations. Third, organisms resist natural mutation through epimutation.

Constraint satisfaction problems (CSPs) are a hot research topic in artificial intelligence, and many problems such as timetabling, bus scheduling, resource allocation and planning are described as CSPs. Gortazar et al. applied a decentralized search method to general binary problems (Gortazar et al. 2010). They distinguish constraints handled directly by the solver from constraints handled by a penalty function, and conduct experiments on four well-known binary optimization problems to study the effectiveness and efficiency of the method. Fu et al. proposed a new self-adaptive DE (SADE) algorithm (Fu et al. 2011). SADE adaptively adjusts the mutation rate F and the crossover rate CR according to the population distribution, which improves population diversity and convergence. Michalak visualized combinatorial search spaces with a low-dimensional Euclidean embedding and used the resulting visualization to analyze the behavior of the population and the working of genetic operators in binary spaces (Michalak 2018); the studied problems include four-peak, firefighter, knapsack, traveling salesman and quadratic assignment.

The above methods use continuous optimization algorithms for binary applications, whereas Glover et al. introduced a new binarization method for zero-one optimization (Glover et al. 2019). The algorithm assigns values to variables through augmentation and shifting processes, and the diverse solutions obtained by permutation mapping improve the flexibility and generality of the algorithm, which is extended to scheduling and routing applications.

Metaheuristic algorithms have been successfully applied to many real-world problems and have also proven competitive in binary engineering applications. This survey summarizes the applications of binary metaheuristics. Although related research began only recently, many methods have been proposed for complex applications. A major contribution of this survey is that it is the first to systematically review the coding schemes and applications of binary metaheuristic algorithms, providing common basic guidelines for designing these algorithms. In particular, for each application the most popular solutions are described in detail, and the pros and cons of the methods are analyzed.

Although there are survey papers on metaheuristic algorithms, they mainly focus on the research of continuous space and applications. We aim to provide a comprehensive survey of the latest works to further study some improvements and applications of binary metaheuristic algorithms, and we also hope that this survey attracts the attention of researchers to further investigate challenges and opportunities facing them.

The remainder of this paper is organized as follows: Sect. 2 introduces the binary coding schemes of metaheuristic algorithms. Section 3 surveys theoretical research and benchmark functions. Section 4 discusses engineering applications, including packing, feature selection, scheduling, structure optimization, layout and parameter optimization. Section 5 points out future research prospects, and Sect. 6 concludes the paper.

2 Binary coded scheme

For a binary minimization problem, the mathematical model is defined as:

$$\begin{aligned} \min f(x) = f(x_1,\ldots ,x_D), \quad x_i \in \{0,1\},\ i \in \{1, 2, \ldots , D\} \end{aligned}$$
(1)

where D is the problem dimension.
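For very small D, Eq. (1) can be solved exactly by enumerating all \(2^D\) candidate vectors, which also shows why exhaustive search stops being an option as D grows. The sketch below is a minimal illustration only; the objective `hamming_to_target` and the vector `TARGET` are hypothetical examples, not problems taken from the surveyed works.

```python
from itertools import product
from typing import Callable, Sequence, Tuple


def brute_force_binary_min(f: Callable[[Tuple[int, ...]], float], dim: int):
    """Exhaustively minimise f over {0,1}^dim; the cost grows as 2**dim."""
    best_x, best_val = None, float("inf")
    for x in product((0, 1), repeat=dim):  # every bit vector of length dim
        val = f(x)
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val


# Hypothetical objective: number of bits that differ from a fixed target.
TARGET = (1, 0, 1, 1, 0)


def hamming_to_target(x: Sequence[int]) -> float:
    return sum(xi != ti for xi, ti in zip(x, TARGET))


x_star, f_star = brute_force_binary_min(hamming_to_target, len(TARGET))
print(x_star, f_star)  # (1, 0, 1, 1, 0) 0
```

With D = 30 the same loop would already visit about \(10^9\) vectors, which is the practical motivation for the binary metaheuristics discussed below.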

The chromosomes of the traditional GA consist of “0” and “1”, so GA can directly solve binary optimization problems (Contaldi et al. 2019). Other algorithms operate on continuous values that must be discretized, and we divide binary algorithms into the transfer function method, the quantum-inspired method, the XOR method and the threshold method according to the discretization approach. Figure 2 presents the process of mapping a continuous search space to a binary search space.

Fig. 2 The process of mapping a continuous search space to a discrete search space

2.1 Transfer function method

Transfer functions are commonly used in binary algorithms. A transfer function maps a continuous value to [0, 1], and the next position of a particle is then set to “0” or “1” by comparing this value with a random number in [0, 1]. Transfer functions are considered the cheapest and simplest way to construct binary metaheuristic algorithms, and their main advantage is that they preserve the structure of the continuous algorithms. Saremi et al. (2015) studied how transfer functions affect the performance of metaheuristic algorithms, and showed that the transfer function plays an important role in the exploration and exploitation of binary metaheuristic algorithms.

The first binary PSO (BPSO) uses the Sigmoid function as its transfer function. Mirjalili and Lewis focused on transfer functions and proposed S-shaped and V-shaped transfer functions for the first time (Mirjalili and Lewis 2013). Mirhosseini and Nezamabadi-pour proposed a binary ICA using S-shaped and V-shaped transfer functions (Mirhosseini and Nezamabadi-pour 2018), and evaluated its performance on the 0–1 knapsack problem, feature selection and a content-based image retrieval (CBIR) system. To alleviate the tendency of BPSO to fall into local optima, Guo et al. proposed new Z-shaped transfer functions (Guo et al. 2020a). Mirjalili et al. (2020) and Ahmed et al. (2021) used U-shaped transfer functions for binarization, and Mirjalili et al. (2020) found that the algorithm achieves the best experimental results when \(\alpha =1\) and \(\beta =1.5\).
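The S-shaped and V-shaped families differ not only in their curves but in how the transfer value is used. A minimal sketch, assuming the classic sigmoid \(1/(1+e^{-v})\) as the S-shaped function and \(|\tanh(v)|\) as the V-shaped one (both belong to the families of Mirjalili and Lewis 2013), together with the usual update rules: an S-shaped value is the probability that the bit becomes 1, whereas a V-shaped value is the probability of flipping the current bit. The function names are illustrative.

```python
import math
import random


def s_shaped(v: float) -> float:
    """Sigmoid transfer function 1 / (1 + exp(-v))."""
    return 1.0 / (1.0 + math.exp(-v))


def v_shaped(v: float) -> float:
    """V-shaped transfer function |tanh(v)|."""
    return abs(math.tanh(v))


def update_s(velocity, rng):
    # S-shaped rule: the new bit is 1 with probability S(v), regardless of the old bit.
    return [1 if rng.random() < s_shaped(v) else 0 for v in velocity]


def update_v(position, velocity, rng):
    # V-shaped rule: the old bit is flipped with probability V(v), otherwise kept.
    return [1 - x if rng.random() < v_shaped(v) else x
            for x, v in zip(position, velocity)]


rng = random.Random(42)
pos = [0, 1, 1, 0]
vel = [-3.0, 0.0, 2.5, 0.1]  # velocities are normally clamped to [-Vmax, Vmax]
print(update_s(vel, rng), update_v(pos, vel, rng))
```

One consequence of the design is visible at \(v = 0\): the S-shaped rule still sets the bit to 1 with probability 0.5, while the V-shaped rule leaves the bit unchanged.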

Leonard et al. studied angle modulated PSO (AMPSO) (Leonard et al. 2015). Its trigonometric generating function is defined as follows:

$$\begin{aligned} T(x) = {\text{sin}}[2\pi (x-a)b * {\text{cos}}(2\pi (x-a)c)] + d \end{aligned}$$
(2)

where a, b, c and d are four coefficients, set to 0, 0.5, 0.8 and 0 in Leonard et al. (2015). Only the value of the first dimension needs to be known; the values of the other dimensions are obtained from their preceding dimensions with Eq. (2), as shown in Fig. 3. The purpose of this research is to identify and provide evidence of where the algorithm may fail. It turns out that the assumption that good solutions are grouped together does not hold for AMPSO, which is explained by the generating function hindering the exploitation ability of the algorithm.
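The way Eq. (2) turns four real coefficients into a bit string can be sketched as follows. This is a minimal sketch under two assumptions not stated in the text: the sample points are the integers x = 0, 1, ..., n−1 (each dimension is the next sample after its predecessor), and a strictly positive output is mapped to bit 1.

```python
import math


def generating_function(x, a, b, c, d):
    """Eq. (2): sin(2*pi*(x - a) * b * cos(2*pi*(x - a) * c)) + d."""
    return math.sin(2 * math.pi * (x - a) * b * math.cos(2 * math.pi * (x - a) * c)) + d


def ampso_bits(n_bits, a=0.0, b=0.5, c=0.8, d=0.0):
    # Assumption: unit-spaced sample points starting at 0; positive output -> 1.
    return [1 if generating_function(x, a, b, c, d) > 0 else 0 for x in range(n_bits)]


print(ampso_bits(10))
```

Because the swarm only searches the four coefficients, the size of the continuous search space stays fixed regardless of the number of bits, which is the appeal of angle modulation and, according to Leonard et al., also the source of its weak exploitation.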

Fig. 3 An example of the trigonometric function

Mafarja et al. proposed time-varying S-shaped and V-shaped transfer functions with a step vector to balance exploration and exploitation (Mafarja et al. 2018). In the early stage of the algorithm, the probability of changing a position is high, which helps to explore new solutions around the initial population, while this probability becomes low in the final stage.

$$\begin{aligned} T(x,\tau ) = \dfrac{1}{1+e^{\frac{-x}{\tau }}} \end{aligned}$$
(3)
$$\begin{aligned} \tau = \left(1-\frac{t}{MAX\_IT} \right)\tau _{max} + \frac{t}{MAX\_IT}\tau _{min} \end{aligned}$$
(4)

where t and \(MAX\_IT\) denote the current and maximum iteration numbers, and \(\tau _{min}\) and \(\tau _{max}\) are the minimum and maximum values of \(\tau\), set to 0.01 and 4, respectively.
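Eqs. (3) and (4) can be sketched directly. The overflow guard is an implementation detail added here (not part of the cited method): with \(\tau = 0.01\) the exponent \(-x/\tau\) can exceed the range of `math.exp`, and the true value is then indistinguishable from 0.

```python
import math


def tau_schedule(t, max_it, tau_max=4.0, tau_min=0.01):
    """Eq. (4): tau decreases linearly from tau_max (t = 0) to tau_min (t = MAX_IT)."""
    frac = t / max_it
    return (1 - frac) * tau_max + frac * tau_min


def time_varying_sigmoid(x, tau):
    """Eq. (3): T(x, tau) = 1 / (1 + exp(-x / tau))."""
    z = -x / tau
    if z > 700:  # guard: math.exp overflows near 710; the result is ~0 anyway
        return 0.0
    return 1.0 / (1.0 + math.exp(z))


for t in (0, 50, 100):
    print(t, round(time_varying_sigmoid(1.0, tau_schedule(t, 100)), 3))
```

For the same positive input x = 1, the probability rises from about 0.56 at the start toward 1 at the end, so the curve becomes steeper and the search becomes less random over time.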

The performance of binary metaheuristic algorithms depends on both their search ability and their transfer function. Beheshti proposed an adaptive transfer function composed of two linear functions to avoid the defects of existing transfer functions (Beheshti 2021).

$$\begin{aligned} T(x) = max\{min\{-G_1x/(x_{max}-G_2) + G_3,1\},0\} \end{aligned}$$
(5)
$$\begin{aligned} G_m=G_{mf}+(G_{ms}-G_{mf})(T-t)/(T-1), m = 1, 2, 3 \end{aligned}$$
(6)

where t is the current iteration, T in Eq. (6) is the maximum number of iterations, and \(G_{ms}\) and \(G_{mf}\) represent the initial and final values of \(G_m\). Experiments show that the algorithm achieves its best performance when \(G_{1s}=0.5\), \(G_{2s}=0\), \(G_{3s}=0.5\), and \(G_{1f}=1\), \(G_{2f}=0.9*x_{max}\), \(G_{3f}=0\).
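Eqs. (5) and (6) with the reported parameter settings can be sketched as below. Because T denotes both the transfer function and the iteration budget in the equations, the code calls the budget `max_iter`; all function names are illustrative, and `max_iter` must be greater than 1 to avoid division by zero.

```python
def g_schedule(g_start, g_final, t, max_iter):
    """Eq. (6): G_m moves linearly from g_start (t = 1) to g_final (t = max_iter)."""
    factor = (max_iter - t) / (max_iter - 1)  # 1 at t = 1, 0 at t = max_iter
    return g_final + (g_start - g_final) * factor


def adaptive_linear_tf(x, t, max_iter, x_max):
    """Eq. (5) using the G_ms / G_mf values reported in the text."""
    g1 = g_schedule(0.5, 1.0, t, max_iter)
    g2 = g_schedule(0.0, 0.9 * x_max, t, max_iter)
    g3 = g_schedule(0.5, 0.0, t, max_iter)
    value = -g1 * x / (x_max - g2) + g3
    return max(min(value, 1.0), 0.0)  # clip the linear function to [0, 1]


print([adaptive_linear_tf(x, 1, 100, 6.0) for x in (-6.0, 0.0, 6.0)])
print([round(adaptive_linear_tf(x, 100, 100, 6.0), 3) for x in (-0.6, -0.3, 0.0)])
```

At the first iteration the function is a gentle line from 1 (at \(-x_{max}\)) to 0 (at \(x_{max}\)); by the last iteration the slope is ten times steeper, so only a narrow band of inputs yields intermediate probabilities.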

Baykasoğlu and Ozsoydan (2018) introduced the first binary version of the weighted superposition attraction (WSA) algorithm for dynamic optimization problems (DOPs), and proposed two new binary conversion schemes:

$$\begin{aligned} x = floor(abs(x \% 2)) \end{aligned}$$
(7)
$$\begin{aligned} x = round(abs(x \% 2)) \% 2 \end{aligned}$$
(8)

For easy reference, commonly used transfer functions are summarized in Tables 1 and 2.

Table 1 The details of transfer functions
Table 2 The details of transfer functions -2

2.2 Quantum-inspired method

The smallest unit of information in quantum computing is the Q-bit, which may be in the “0” state, the “1” state, or any superposition of the two (Rizk-Allah 2021). The state of a Q-bit is represented as:

$$\begin{aligned} \vert \psi \rangle = \alpha \vert 0 \rangle + \beta \vert 1 \rangle \end{aligned}$$
(9)

where \(\alpha\) and \(\beta\) are the probability amplitudes of the “0” and “1” states, so that \(\alpha ^2\) and \(\beta ^2\) are the probabilities of observing “0” and “1”, and \(\alpha ^2\) + \(\beta ^2\) = 1. An individual is represented as a string of Q-bits:

$$\begin{aligned} X_i = \begin{bmatrix} \alpha _1,\ldots ,\alpha _n\\ \beta _1,\ldots ,\beta _n \end{bmatrix} \end{aligned}$$
(10)

where n is the length of \(X_i\). A quantum gate (Q-gate) changes the state of a Q-bit.
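Observation and rotation of a Q-bit string can be sketched as below. This is a simplified sketch, not Han and Kim's exact QEA: amplitudes are assumed real and non-negative, so each Q-bit is stored as an angle \(\theta\) with \(\alpha = \cos\theta\), \(\beta = \sin\theta\) (which keeps \(\alpha^2+\beta^2=1\) automatically), and the rotation direction, usually taken from a lookup table, is replaced by a hypothetical rule that always rotates toward a guide solution. The class name and step size are illustrative.

```python
import math
import random


class QBitIndividual:
    """Q-bit string of Eq. (10) stored as angles: alpha = cos(theta), beta = sin(theta)."""

    def __init__(self, n):
        self.theta = [math.pi / 4] * n  # equal superposition: P(0) = P(1) = 0.5

    def prob_one(self, i):
        return math.sin(self.theta[i]) ** 2  # beta_i ** 2

    def observe(self, rng):
        # Measurement: bit i collapses to 1 with probability beta_i ** 2.
        return [1 if rng.random() < self.prob_one(i) else 0 for i in range(len(self.theta))]

    def rotate_towards(self, target_bits, delta=0.05 * math.pi):
        # The rotation gate [[cos d, -sin d], [sin d, cos d]] applied to (alpha, beta)
        # adds d to theta; the sign of d pushes each Q-bit toward the guide's bit.
        for i, b in enumerate(target_bits):
            step = delta if b == 1 else -delta
            self.theta[i] = min(max(self.theta[i] + step, 0.0), math.pi / 2)


rng = random.Random(0)
q = QBitIndividual(4)
for _ in range(10):
    q.rotate_towards([1, 0, 1, 1])  # hypothetical guide, e.g. the best solution so far
print(q.observe(rng))  # [1, 0, 1, 1]: every Q-bit has been driven fully to the guide
```

In a real QEA the rotation step is small and the clipping above is usually avoided, because Q-bits that reach 0 or \(\pi/2\) can no longer produce the other value, which is one source of the premature convergence discussed next.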

Based on the concepts and principles of quantum computing, Han and Kim developed the quantum evolutionary algorithm (QEA) using Q-bits and state superposition (Han and Kim 2002). Like other EAs, QEA has an individual representation, an objective function and population evolution. Instead of a binary or numeric representation, QEA uses Q-bits as a probabilistic representation.

Agrawal et al. used Q-bits to represent individuals and quantum rotation as the mutation operator, and introduced a quantum-based crossover to improve global search ability (Agrawal et al. 2020). Srikanth et al. integrated quantum computing with binary GWO to improve its search ability (Srikanth et al. 2018). Q-bits and Q-gates help the proposed algorithm balance exploitation and exploration, and they participate in the position update of the wolves. Quantum-inspired EAs (QIEAs) rely on the superposition of multiple states on Q-bits and on rotation gates, and Wright and Jordanov proposed the half significant bit (HSB) and an evolved rotation gate to solve the problem of binary premature convergence (Wright and Jordanov 2017). In Guo et al. (2020b), the rotation angle is determined by the pbest and gbest of PSO, and a multi-bit quantum rotation gate updates the positions of particles according to the variance of fitness values. Mojrian and Mirroshandel employed a quantum-inspired GA (QIGA) with an improved quantum measurement and an adaptive quantum rotation gate based on the quality and length of abstracts to improve performance (Mojrian and Mirroshandel 2021).

2.3 XOR method

XOR is a bitwise operator, and XOR and its variants are used to implement position updates. To obtain excellent performance in an Industry 4.0 application, Ahmadieh Khanesar et al. introduced an improved binary GSA that uses the XOR operator and a repository to improve global search ability (Ahmadieh Khanesar et al. 2020). The truth table of the XOR operator is given in Table 3.

Table 3 Truth table for XOR operator

Kiran and Gündüz developed an artificial bee colony (ABC) algorithm for binary optimization whose position update uses XOR (Kiran and Gündüz 2013), as shown in Eq. (11) and Table 4. The proposed method is evaluated on the uncapacitated facility location problem.

$$\begin{aligned} X_i^d = X_i^d \oplus [\varphi (X_i^d \oplus X_k^d )] \end{aligned}$$
(11)

where \(X_i^d\) is the dth dimension of \(X_i\), and \(X_k\) is another solution with \(i\ne k\).

Table 4 Position update with XOR operator
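A single application of Eq. (11) can be sketched as follows. Two assumptions are made here: \(\varphi\) is read as a logical NOT applied with probability `p_not` (as for \(\phi\) in Eq. (13)) and the identity otherwise, and, as in standard ABC, only one randomly chosen dimension d is modified per update; the greedy acceptance step of ABC is omitted.

```python
import random


def binabc_update(x_i, x_k, d, rng, p_not=0.5):
    """Eq. (11) on dimension d: X_i^d XOR phi(X_i^d XOR X_k^d)."""
    candidate = list(x_i)
    diff = x_i[d] ^ x_k[d]  # 1 where the two solutions disagree
    if rng.random() < p_not:
        diff ^= 1  # phi acts as a logical NOT (assumed probability p_not)
    candidate[d] = x_i[d] ^ diff
    return candidate


rng = random.Random(7)
population = [[0, 1, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
i = 0
k = rng.choice([j for j in range(len(population)) if j != i])  # neighbour with k != i
d = rng.randrange(len(population[i]))
print(binabc_update(population[i], population[k], d, rng))
```

When \(\varphi\) is the identity, the update simply copies the neighbour's bit; when it negates, a bit shared by both solutions is flipped, which injects diversity.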

Aslan et al. were the first to discretize Jaya with the logical XOR operator (Aslan et al. 2019); the idea is simple but effective.

$$\begin{aligned} X_i^d = X_i^d \oplus (Best^d \oplus Worst^d) \end{aligned}$$
(12)

where Best and Worst represent the best and worst solutions in the population.
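Read literally, Eq. (12) flips exactly those bits of \(X_i\) where the best and worst solutions disagree. A minimal sketch; the greedy acceptance step usually paired with Jaya (keep the new vector only if it improves the objective) is left out.

```python
def jaya_xor_update(x, best, worst):
    """Eq. (12): X_i^d XOR (Best^d XOR Worst^d) for every dimension d."""
    return [xi ^ (b ^ w) for xi, b, w in zip(x, best, worst)]


print(jaya_xor_update([0, 1, 0, 1], [1, 1, 0, 0], [0, 1, 1, 0]))  # [1, 1, 1, 1]
```

Bits on which the best and worst solutions agree are left untouched, so the update concentrates changes on the dimensions that distinguish good from bad solutions.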

To reduce the computation time of threshold image segmentation, Gao et al. proposed a PSO algorithm based on a learning strategy (Gao et al. 2016). First, new jump operators and learning terms improve the search ability of PSO while maintaining its convergence rate. Second, a random crossover operator and an exchange strategy give particles more opportunities to explore the search space in each dimension.

$$\begin{aligned} X_i^d = X_i^d \oplus \phi (X_i^d \oplus X_k^d) \end{aligned}$$
(13)

where \(\phi\) denotes a logical NOT operator applied with 50% probability.

2.4 Threshold method

Threshold methods used in binary EAs are simple. They limit the search space to [0, 1] and compute particle positions with the continuous algorithm. The final positions are then determined by comparison with a predefined threshold T, usually set to 0.5, 0.6 or 0.7 (Faris et al. 2018a; Mlakar et al. 2017; Baraldi et al. 2018): a bit is set to 1 when its value is greater than T, and to 0 otherwise.
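The threshold rule amounts to a clip followed by a strict comparison. A minimal sketch; the function name is illustrative.

```python
def threshold_binarize(values, t=0.5):
    """Clip continuous positions to [0, 1], then set a bit to 1 iff it exceeds t."""
    return [1 if min(max(v, 0.0), 1.0) > t else 0 for v in values]


print(threshold_binarize([0.2, 0.55, 0.9, 1.3, -0.1]))         # [0, 1, 1, 1, 0]
print(threshold_binarize([0.2, 0.55, 0.9, 1.3, -0.1], t=0.6))  # [0, 0, 1, 1, 0]
```

Unlike a transfer function, the mapping is deterministic, so raising T directly biases solutions toward fewer ones.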

3 Binary metaheuristic algorithms in theory and test problems

This section summarizes the latest progress in related theoretical research and the benchmark functions of binary metaheuristic algorithms.

3.1 Theoretical research

Because EAs are stochastic, their theoretical foundations remain incomplete. There is little theoretical research on binary algorithms, and it mainly focuses on convergence, running time and working principles.

Inspired by the learning mechanism of PSO, Chen et al. proposed a binary learning DE (BLDE) algorithm that effectively locates the global optimum by learning from the previous population (Chen et al. 2015). They prove the global convergence of BLDE theoretically and compare it with existing binary coded EAs through numerical experiments.

Biogeography-based optimization (BBO) is a relatively new EA, and Ma et al. analyzed the convergence of BBO on binary problems using a Markov chain model (Ma et al. 2014). The analysis shows that BBO with only migration and mutation never converges to the global optimum. However, elitist BBO, which preserves the best candidate in the population, converges to the global optimum. Although GA and BBO differ in several respects, the paper proves that the convergence behavior of BBO is similar to that of GA.

Sudholt and Witt studied the runtime of binary PSO and developed a theoretical understanding of the algorithm, especially from the perspective of computational complexity (Sudholt and Witt 2010). They start from the original equations of Kennedy and Eberhart, in which the velocity is restricted to a fixed interval [\(V_{min}\), \(V_{max}\)] to prevent divergence. If \(V_{max}\) is kept fixed, the effect is catastrophic as the problem size grows. Instead, they let \(V_{max}\) grow with the problem dimension, and this choice of \(V_{max}\) leads to provably efficient optimization times.

Doerr and Zheng theoretically analyzed the working principles of binary DE (BDE) and found that its behavior is completely different from that of classical EAs or estimation-of-distribution algorithms (Doerr and Zheng 2020). Because the same individuals are reused in the mutation and selection operators, not all expected results can be proven with mathematical rigor, so they propose a more independent variant of BDE inspired by the mean-field method of statistical physics. They show experimentally that the variant behaves similarly to BDE, and rigorously prove several statements for the variant only.

In PSO, the inertia weight is an important parameter that controls search ability. It has been studied in depth for continuous optimization, but there is little research on binary problems. Liu et al. comprehensively studied the influence of the inertia weight on the performance of BPSO from both theoretical and empirical perspectives (Liu et al. 2015). They propose a mathematical model to analyze the behavior of BPSO and derive several lemmas and theorems about the inertia weight. Their results show that a small weight improves exploration, whereas a large weight encourages local search.

3.2 Binary benchmark functions

Classic binary test functions include Max-Ones and Royal-Road, and some researchers also use continuous benchmark functions to verify the performance of binary algorithms. Rodriguez et al. proposed a binary test suite of 27 combinatorial optimization problems (Rodriguez et al. 2012), 13 of which are artificial problems while the others are derived from practical applications. Table 5 lists their names, dimensions (D) and global optima f; all functions are expressed as maximization problems.

Table 5 Tackled test problems
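The two classic functions mentioned above can be sketched in a few lines. The Royal Road version assumes the common R1-style definition, in which the string is split into contiguous blocks of equal length and each block that is entirely ones contributes its length; the block size of 8 is a conventional choice, not a value taken from the text.

```python
def one_max(bits):
    """Max-Ones (OneMax): number of 1-bits; the optimum is the all-ones string."""
    return sum(bits)


def royal_road(bits, block=8):
    """R1-style Royal Road: every all-ones block of length `block` contributes `block`."""
    if len(bits) % block:
        raise ValueError("length must be a multiple of the block size")
    return sum(block for s in range(0, len(bits), block) if all(bits[s:s + block]))


print(one_max([1, 0, 1, 1]), royal_road([1] * 8 + [1] * 7 + [0]))  # 3 8
```

OneMax rewards every correct bit, whereas Royal Road gives no credit for partially completed blocks, so its landscape contains large plateaus that test an algorithm's ability to combine building blocks.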

4 Binary metaheuristic algorithms in applications

Binary metaheuristic algorithms demonstrate excellent performance in engineering applications. Organized by application scenario, we introduce them in detail, covering knapsack, feature selection, scheduling, structure optimization, layout and parameter optimization problems.

4.1 Packing problem/knapsack problem

The knapsack problem (KP) arises in scientific and engineering applications, including finance, flexible manufacturing systems, capital budgeting, project selection and loading, and is a well-known NP-hard optimization problem. Given a set of items with nonnegative weights and values (profits), KP selects a subset of items to place in a knapsack with a specified capacity so as to maximize the total profit. It is mathematically formulated as follows:

$$\begin{aligned} \begin{aligned} max f(x) = \sum _{i=1}^{n}p_ix_i, i = 1, 2,\ldots , n \\ s.t. \sum _{i=1}^{n}c_{ij}x_i \le b_j, j = 1, 2,\ldots , m \\ x_i \in \{0, 1\} \end{aligned} \end{aligned}$$
(14)

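where n and m represent the numbers of items and constraints, \(p_i\) is the profit of the ith item, \(c_{ij}\) is the weight of the ith item in the jth constraint, and \(b_j\) is the jth capacity. KPs are mainly divided into three categories: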

  1. 0–1 KP: each item can be selected at most once.
  2. Multi-dimensional KP (MKP): given n items and m knapsacks whose capacities are not necessarily equal, all knapsacks are filled at the same time.
  3. Dynamic KP: the capacity, weights and profits change dynamically.
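Binary algorithms for Eq. (14) must evaluate candidate bit vectors and usually repair infeasible ones. The sketch below evaluates profit and feasibility, and applies a generic drop-only greedy repair that removes selected items with the lowest profit-to-total-weight ratio until all constraints hold. The repair rule and the small instance are illustrative assumptions; the surveyed papers use their own repair operators.

```python
def mkp_evaluate(x, profits, weights, capacities):
    """Return (profit, feasible) for Eq. (14); weights[j][i] is c_ij of constraint j."""
    profit = sum(p * xi for p, xi in zip(profits, x))
    feasible = all(
        sum(c_ji * xi for c_ji, xi in zip(row, x)) <= b_j
        for row, b_j in zip(weights, capacities)
    )
    return profit, feasible


def greedy_repair(x, profits, weights, capacities):
    """Drop selected items, least efficient first, until every constraint holds."""
    x = list(x)

    def ratio(i):
        total_weight = sum(row[i] for row in weights)
        return profits[i] / (total_weight or 1e-12)  # guard against zero weight

    for i in sorted((i for i, xi in enumerate(x) if xi), key=ratio):
        if mkp_evaluate(x, profits, weights, capacities)[1]:
            break
        x[i] = 0
    return x


# Hypothetical instance with n = 4 items and m = 2 constraints.
profits = [10, 7, 4, 3]
weights = [[5, 4, 3, 2], [2, 6, 1, 3]]
capacities = [9, 7]
repaired = greedy_repair([1, 1, 1, 1], profits, weights, capacities)
print(repaired, mkp_evaluate(repaired, profits, weights, capacities))  # [1, 0, 1, 0] (14, True)
```

A 0–1 KP is the special case m = 1, so the same functions apply with a single row in `weights`.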

Laabadi et al. proposed a binary crow search algorithm (CSA) for two-dimensional (2D) bin packing, which uses the Sigmoid function to map real-valued solutions to binary values (Laabadi et al. 2020). Bansal and Deep used a modified binary PSO (MBPSO) algorithm to solve 0–1 KP and MKP (Bansal and Deep 2012). Compared with basic BPSO, MBPSO introduces a new probability function that maintains population diversity and makes it more effective in solving KPs. In BPSO, the velocity is limited to [0,1]. The probability of position switching is low when its value is small, and reaches its maximum when it approaches \(V_{max}\). The Sigmoid function of BPSO provides low diversity and weak exploration. A linear normalization function replaces the Sigmoid, making the search more exploratory and efficient, as shown in Eq. (15).

$$\begin{aligned} X_i^d = \left\{ \begin{array}{l} 1 \qquad if(U(0,1) <\frac{X_i^d + V_i^d + V_{max}}{1+2V_{max}}) \\ 0 \qquad \qquad else \end{array} \right. \end{aligned}$$
(15)

where \(V_i^d\) represents the dth dimension of \(V_i\), and U(0, 1) is a uniformly distributed random number in [0, 1].
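Eq. (15) can be sketched as follows, assuming velocities are already clamped to \([-V_{max}, V_{max}]\) so that the ratio stays in [0, 1]. The function name is illustrative.

```python
import random


def mbpso_position_update(x, v, v_max, rng):
    """Eq. (15): bit d becomes 1 when U(0,1) < (x_d + v_d + v_max) / (1 + 2 v_max)."""
    new = []
    for xd, vd in zip(x, v):
        prob = (xd + vd + v_max) / (1 + 2 * v_max)  # linear, in [0, 1] for clamped v
        new.append(1 if rng.random() < prob else 0)
    return new


rng = random.Random(0)
print(mbpso_position_update([0, 1, 0, 1], [0.5, -1.0, 2.0, 0.0], 4.0, rng))
```

Unlike the saturating Sigmoid, the probability grows linearly with \(x_d + v_d\); for example, \(x_d = 0\) and \(v_d = 0.5\) give exactly 0.5 for any \(V_{max}\), and only the extreme corners \((0, -V_{max})\) and \((1, V_{max})\) make the outcome deterministic.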

Agrawal et al. proposed a new binary version of the gaining-sharing knowledge-based optimization algorithm (NBGSK) to solve small- and large-dimensional KPs (Agrawal et al. 2022a). The algorithm consists of a binary junior gaining-sharing stage and a binary senior gaining-sharing stage, and can efficiently explore and exploit the binary search space. The population size is gradually reduced by a linear function to improve performance and prevent solutions from falling into local optima.
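The text does not give the exact reduction rule, so the sketch below assumes the linear population size reduction schedule popularized by L-SHADE, \(NP = \mathrm{round}(NP_{max} + (NP_{min} - NP_{max}) \cdot NFE / MAX\_NFE)\), together with the common practice of discarding the worst individuals when the size shrinks. Both are assumptions; NBGSK's own rule may differ, and all names are illustrative.

```python
def linear_population_size(nfe, max_nfe, np_max, np_min):
    """Population size shrinking linearly from np_max (nfe = 0) to np_min (nfe = max_nfe)."""
    return round(np_max + (np_min - np_max) * nfe / max_nfe)


def shrink(population, fitness, new_size):
    """Keep the new_size best individuals (maximisation, as in KP)."""
    order = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)
    keep = order[:new_size]
    return [population[i] for i in keep], [fitness[i] for i in keep]


print([linear_population_size(n, 1000, 100, 12) for n in (0, 250, 500, 1000)])
```

Shrinking the population late in the run spends fewer evaluations per generation once the search has narrowed, which is the usual motivation for this kind of schedule.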

Considering the characteristics of MKP, Li et al. took into account the profit of each item and the overload or remaining capacity of each dimension, and used a logarithmic utility function to adjust the trade-off between the objective and the capacity constraints during repair and local search (Li et al. 2020). Tinós and Yang theoretically analyzed discrete dynamic optimization problems (DOPs) based on the fitness landscape during the optimization process (Tinós and Yang 2014), focusing on the XOR DOP generator and the dynamic KP. Tlili and Krichen studied the double loading problem (DLP), which is used to optimize the number of trash bins in industrial parks (Tlili and Krichen 2015). In DLP, items are packed into boxes, and the boxes are then loaded into a set of compartments while minimizing the number of trash bins used.

Considering practical packing constraints such as load bearing, placement and orientation, Kanna et al. used GA to optimize three-dimensional (3D) packing of boxes of arbitrary sizes in a prismatic container (Kanna et al. 2018). The first nine binary digits represent the dimensions of a 3D box, the next three digits indicate the position of the box in the container, and the last two digits are auxiliary components used to identify a bin.
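The field layout described above (9 + 3 + 2 bits) can be decoded as sketched below. The sketch assumes the chromosome consists of exactly these 14 bits and that each field is read as an unsigned binary integer; how Kanna et al. map these integers to box dimensions, positions and bins is not specified here, and the dictionary keys are hypothetical names.

```python
def decode_packing_chromosome(bits):
    """Split a 14-bit chromosome into the three fields (widths 9, 3 and 2)."""
    if len(bits) != 14:
        raise ValueError("expected 14 bits")

    def to_int(field):
        # Read a list of 0/1 values as an unsigned binary integer (assumption).
        return int("".join(map(str, field)), 2)

    return {
        "box_dimension_code": to_int(bits[0:9]),
        "position_code": to_int(bits[9:12]),
        "bin_auxiliary_code": to_int(bits[12:14]),
    }


print(decode_packing_chromosome([0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1]))
```

With fixed-width fields, standard GA crossover and bit-flip mutation can be applied to the chromosome unchanged, and every offspring decodes to some box code, position code and bin code.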

Zou and Chen extended KP to a transportation power system (Zou and Chen 2019), taking into account uncertainties in disruption, traffic demand, and mitigation and repair costs. The problem is optimized by combining binary PSO with knapsack-based heuristic initialization, and a priority index for the mitigation and repair of faulty components is established from the results. Rahim et al. introduced a general architecture for demand side management (DSM) that integrates residential and smart areas through a wide area network, and solved it as an MKP (Rahim et al. 2016). They adopt a combined model of time-of-use electricity pricing and inclined congestion rate, and experiments show that an energy management controller based on GA outperforms BPSO and ACO in reducing electricity cost, minimizing the peak-to-average ratio and maximizing user comfort. Hardware/software (HW/SW) partitioning is an important topic in hardware/software co-design and is an NP-hard problem. To solve HW/SW partitioning quickly and effectively with EAs, Zhai et al. regarded it as a variant of KP (Zhai et al. 2021). Infeasible solutions are eliminated by the proposed greedy repair optimization algorithm, and binary EAs are used to solve the problem.

Hassan et al. proposed scheduling shuttle ambulance vehicles for COVID-19 patients with a discrete binary gaining-sharing knowledge-based optimization algorithm (DBGSK) (Hassan et al. 2021a), formulating the task as a multi-objective multiple 0–1 knapsack problem. The scheduling aims at the best utilization of a predetermined planning time slot, measured by maximizing the number of evacuated people who might be infected with the virus and are transported to an isolation hospital, and by maximizing the effectiveness of prioritizing patients according to their health status.

In summary, Table 6 lists the optimization algorithms discussed and their applications.

Table 6 The details of the cited references in KPs

4.2 Feature selection problem

Feature selection chooses a small subset of the original feature set to reduce data dimensionality, accelerate learning and improve performance (Paul and Das 2015; Campagner et al. 2021). Figure 4 shows the process of feature selection. When the data dimension D is large, choosing the best subset from the \(2^D\) possibilities becomes difficult (Kanwal et al. 2021; Mandanas and Kotropoulos 2020). The goal is to reduce the number of features while improving classification quality, so feature selection is formulated as a single- or multi-objective optimization problem (Lima et al. 2021; Yan et al. 2016). Binary metaheuristic algorithms perform prediction, recognition and classification tasks through feature selection (Shu et al. 2016; Agrawal et al. 2021b).
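In binary feature selection each candidate is a bit mask of length D, where bit d = 1 keeps feature d. A single-objective wrapper fitness often combines the classification error with the fraction of selected features. The sketch below assumes the weighted form \(\alpha \cdot error + (1-\alpha)\cdot |S|/|D|\) with \(\alpha = 0.99\), a setting common in the binary feature-selection literature but not taken from this text; the classifier that produces the error rate is outside the sketch, and the function names are illustrative.

```python
def select_columns(rows, mask):
    """Keep only the columns whose mask bit is 1."""
    return [[v for v, m in zip(row, mask) if m] for row in rows]


def feature_selection_fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Weighted wrapper fitness to minimise: alpha * error + (1 - alpha) * |S| / |D|."""
    if n_selected == 0:
        return float("inf")  # an empty subset cannot train a classifier
    return alpha * error_rate + (1 - alpha) * n_selected / n_total


mask = [1, 0, 1]
print(select_columns([[1, 2, 3], [4, 5, 6]], mask))  # [[1, 3], [4, 6]]
print(feature_selection_fitness(0.1, sum(mask), len(mask)))
```

The large weight on the error term means the feature count mainly acts as a tie-breaker between subsets with similar accuracy; multi-objective methods discussed below avoid choosing this weight by keeping the two objectives separate.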

Fig. 4 The process of feature selection

Faris et al. (2018b), Dhiman et al. (2021), Mafarja et al. (2019), Nadimi-Shahraki et al. (2022a) and Agrawal et al. (2022b) adopt S-shaped and V-shaped transfer functions to map the search space to a binary space. In Kumar and Bharti (2021), a hybrid of binary PSO and the sine cosine algorithm (SCA) selects informative feature subsets and performs cluster analysis, where a V-shaped transfer function calculates the probabilities of particles changing their positions. The performance of the two transfer function families is studied in Abdel-Basset et al. (2020), where GWO-S performs better than GWO-V on most datasets.

Agrawal et al. presented a binary variant of GSK with a probability estimation operator (Agrawal et al. 2021a), in which chaotic maps improve performance. Xue et al. proposed a self-adaptive parameter and strategy based PSO (SPS-PSO) algorithm for feature selection with multiple classifiers (Xue et al. 2020), which introduces a solution representation and five candidate solution generation strategies whose parameter values adjust automatically during evolution. SPS-PSO has good global and local search abilities when dealing with large-scale feature selection. Song et al. proposed a variable-size cooperative co-evolutionary PSO (VS-CCPSO) algorithm that adopts the idea of “divide and conquer” and uses a space division strategy based on feature importance (Song et al. 2020). Related features are grouped into the same subspaces at low computational cost, and the size of each subswarm is adjusted adaptively to save computation. To preserve particle quality, VS-CCPSO deletes individuals with a fitness-guided binary clustering method and generates new particles using feature importance and crossover.

Multi-objective feature selection usually involves minimizing both the number of selected features and the classification error. Xu et al. (2020) proposed an EA based on duplication analysis for bi-objective feature selection in classification, improving a basic dominance-based EA framework in three respects: first, the reproduction process is modified to improve the quality of offspring; second, a duplication analysis filters redundant solutions; third, retained solutions are further selected according to diversity. Hancer et al. proposed a new multi-objective artificial bee colony (ABC) algorithm that combines nondominated sorting with genetic operators (Hancer et al. 2018), and feature selection is realized by binary-continuous ABC.

The nondominated sorting genetic algorithm II (NSGA-II) has been successfully applied to various feature selection problems, and Karagoz et al. (2021) were the first to perform multi-objective parallel feature selection on the local descriptors of image/video datasets. Their method selects the minimum number of features in multi-label image/video datasets while providing maximum prediction accuracy. Using multi-label classification techniques, a subset of nondominated features is extracted and verified by pruned sets (PS), classifier chains (CC), binary relevance (BR) and random k-labelsets (RAkEL). Hancer proposed a multi-objective DE for clustering and feature selection in which a string coding scheme represents candidate solutions (Hancer 2020). The scheme has two parts: (1) a set of real numbers in the range [−1, 1] indicating possible cluster centroid positions; (2) a set of activation codes in the range [0, 1] representing a possible feature subset. If the string encodes a D-dimensional dataset for L clusters, the length of a candidate solution is \(L*D+D\).
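Decoding a string of length \(L*D+D\) into centroids and a feature subset can be sketched as follows. The split (first \(L*D\) genes as centroid coordinates, last D genes as activation codes) follows the description above; treating an activation code above 0.5 as "feature selected" is an assumption of this sketch, as is the function name.

```python
def decode_cluster_feature_string(vector, n_clusters, n_features, active_threshold=0.5):
    """Split a length L*D + D string into L centroids (L x D) and a D-bit feature mask."""
    L, D = n_clusters, n_features
    if len(vector) != L * D + D:
        raise ValueError("expected a vector of length L*D + D")
    centroids = [vector[c * D:(c + 1) * D] for c in range(L)]  # genes in [-1, 1]
    mask = [1 if a > active_threshold else 0 for a in vector[L * D:]]  # genes in [0, 1]
    return centroids, mask


vec = [0.1, -0.2, 0.3, 0.5, 0.6, -0.7, 0.9, 0.2, 0.7]  # L = 2 clusters, D = 3 features
print(decode_cluster_feature_string(vec, 2, 3))
```

Because one string carries both parts, DE can evolve centroids and the feature subset jointly, and inactive features can simply be ignored when distances to the centroids are computed.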

4.2.1 Prediction field

Prediction is an important data analysis method that mines valuable information from databases. Prediction technology is now widely used in energy, finance, environment, meteorology and other fields with excellent results, supporting data analysis, project planning, scientific research and policy formulation. However, prediction models differ greatly in the features they require, which makes it difficult for a single model to suit all fields. Moreover, real data often contain uncertain factors such as noise, random disturbances, distortion and missing values, which greatly affect the performance of prediction models (Zheng et al. 2017; Niu et al. 2018).

Wang et al. (2019a) proposed a novel wind power prediction system composed of feature selection, prediction model, system optimization and system evaluation modules, which performs deterministic and probabilistic predictions simultaneously. In the feature selection module, a hybrid strategy determines the optimal input of the system; in the prediction module, a recurrent neural network (RNN) based on dynamic reservoir theory is developed; in the system optimization module, an enhanced multi-objective algorithm optimizes system parameters to improve accuracy and stability; and the evaluation module validates the effectiveness and feasibility of the system. Zhang et al. (2017) developed a new hybrid model for short-term wind speed prediction and demonstrated its effectiveness. A real-valued backtracking search algorithm (BSA) optimizes the weights and biases of an extreme learning machine (ELM), and binary BSA serves as a feature selection approach that reconstructs the input matrix from candidate inputs predefined by partial autocorrelation function values. Because wind speed signals are random and fluctuating, optimized variational mode decomposition (OVMD) filters out redundant noise. The parameters of OVMD are determined from the center frequencies of the decomposed modes, and the final result is obtained from the residual evaluation index (REI) and the prediction results of the models.

Liquefaction sensitivity prediction is a key issue because the available datasets are highly imbalanced. Das et al. (2020) applied multi-objective feature selection to liquefaction sensitivity, combining NSGA-II and the multi-objective symbiotic organisms search (SOS) algorithm with two learning algorithms, a neural network (NN) and multivariate adaptive regression splines, to find optimal parameters and minimize error. Sales forecasting uses historical data to predict the short-term or long-term future performance of enterprises. Jiménez et al. (2017) combined regression, model evaluation, random forest and decision-making to build an accurate multi-objective feature selection model for online sales prediction. Mohanty et al. (2017) proposed an effective prediction model for the pull-out capacity of small ground anchors, in which an NN is the learning algorithm and NSGA-II performs feature subset selection.
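Multi-objective wrappers such as the NSGA-II-based methods above compare feature subsets by Pareto dominance rather than by a single score. The following minimal sketch assumes two minimized objectives, the number of selected features and a validation error, and uses made-up candidate values; it keeps only the nondominated subsets, i.e. the first front that NSGA-II's nondominated sorting would return. It is an illustration, not the implementation of any cited work.

```python
from typing import List, Tuple

# Each candidate: (binary feature mask, validation error). Values are hypothetical.
Candidate = Tuple[Tuple[int, ...], float]


def objectives(c: Candidate) -> Tuple[int, float]:
    """Both objectives are minimized: number of selected features and error."""
    mask, error = c
    return sum(mask), error


def dominates(a: Tuple[int, float], b: Tuple[int, float]) -> bool:
    """a dominates b if it is no worse in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    objs = [objectives(c) for c in cands]
    return [c for i, c in enumerate(cands)
            if not any(dominates(objs[j], objs[i]) for j in range(len(cands)) if j != i)]


candidates = [
    ((1, 0, 0, 1, 0), 0.12),
    ((1, 1, 0, 1, 0), 0.10),
    ((1, 1, 1, 1, 1), 0.10),  # same error as the previous subset, more features: dominated
    ((0, 0, 0, 1, 0), 0.20),
    ((1, 0, 1, 1, 0), 0.15),  # dominated by the first subset
]
for mask, err in pareto_front(candidates):
    print(mask, sum(mask), err)
```

The surviving subsets trade feature count against error; a final choice among them is left to the decision maker or to a secondary criterion.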

In summary, feature selection with binary metaheuristic algorithms supports both single- and multi-objective prediction on balanced and imbalanced datasets and improves prediction accuracy.

4.2.2 Biology & text recognition fields

In recent years, emotion recognition has become an active research field because of its many real-life applications. Mistry et al. (2016), Hassan and Mohammed (2020), Mlakar et al. (2017) and Karthiga and Mangai (2019) proposed expression recognition systems that combine EAs with feature selection. Mistry et al. (2016) alleviated the premature convergence of PSO with a non-replaceable memory, a new velocity update strategy, a small population, a deep local search of facial features based on subdimensions, and cooperation between local and global search. In Hassan and Mohammed (2020), the nodes and edges of a graph represent facial regions, and the frequent substructures of each emotion are mined with a graph mining algorithm. To reduce the number of generated subgraphs and improve classification accuracy, a binary cat swarm algorithm (CSA) selects the final subgraphs. Based on the histogram of oriented gradients (HOG) descriptor and a differential feature vector, Mlakar et al. (2017) recognized seven typical emotions with an improved multi-objective DE algorithm that uses two selection strategies, “emotion-specific features” and “more distinctive features for all emotions”.

Text classification assigns text documents to one or more predefined categories, but it is a challenging machine learning task, mainly because documents contain a very large number of potentially discriminative terms. Chantar et al. (2020) proposed an improved GWO method for Arabic text classification under different learning models, including naive Bayes, decision tree, K-nearest neighbor (KNN) and support vector machine (SVM). Thiyagarajan and Shanthi (2019) proposed a multi-objective text classification algorithm that uses a crossover operator. The optimal number of selected features differs between datasets, and selecting features solely by the scores of a metric is not the best approach, because it can reduce many documents to zero length (no selected term remains in them). In Huang et al. (2019), multi-objective feature selection is modeled to determine the optimal number of selected features automatically, and a parallel algorithm based on dynamic programming is designed to reduce running time.
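The zero-length effect described above can be reproduced on a toy corpus. This sketch is illustrative only: the corpus is hypothetical, and document frequency stands in for the scoring metric (it is not the metric used by Huang et al. 2019). Keeping only the top-k terms leaves some documents with no selected term at all.

```python
from collections import Counter
from typing import List, Set

# Hypothetical tokenized corpus.
docs: List[List[str]] = [
    ["market", "stock", "price"],
    ["stock", "price", "trade"],
    ["goal", "match", "team"],
    ["market", "trade", "price"],
    ["vaccine", "virus"],
]


def top_k_by_document_frequency(corpus: List[List[str]], k: int) -> Set[str]:
    """Score each term by the number of documents containing it and keep the top k."""
    df = Counter(term for doc in corpus for term in set(doc))
    # Descending frequency, then alphabetical order for deterministic ties.
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return {term for term, _ in ranked[:k]}


def empty_documents(corpus: List[List[str]], selected: Set[str]) -> List[int]:
    """Indices of documents that contain none of the selected terms."""
    return [i for i, doc in enumerate(corpus) if not selected.intersection(doc)]


selected = top_k_by_document_frequency(docs, k=3)
print(sorted(selected))
print(empty_documents(docs, selected))
```

Here the sports and medical documents become empty vectors, which is why a score-only cut-off can be harmful regardless of how good the metric is.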

4.2.3 Hyperspectral imagery field

A hyperspectral image consists of data acquired by remote sensors over a large number of spectral bands, recording hundreds of channels of the same scene. To improve transmission efficiency, hyperspectral images need to be compressed or segmented (Ghamisi et al. 2014).

Inspired by the clonal selection theory of the artificial immune system (AIS), Zhang et al. (2007) proposed clonal selection feature selection (CSFS) and clonal selection feature weighting (CSFW) for dimensionality reduction of hyperspectral remote sensing images. In CSFS, each solution evolves in binary space, and each bit is “1” or “0”, meaning that the corresponding feature is selected or removed, respectively. In CSFW, each antibody is a string of integers together with their corresponding weights. Shukla and Nanda (2018) formulated optimal band selection as an unsupervised problem. First, images are compressed with the discrete wavelet transform; then the image details in the transform domain are extracted using the entropy of spatial information and the first-order spectral derivative, which captures redundancy. Finally, a new binary social spider optimization (SSO) algorithm, which balances exploration and exploitation and avoids premature convergence, optimizes these details to obtain the final bands.
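The clonal selection principle behind CSFS can be illustrated with a simplified CLONALG-style loop over binary feature masks: better antibodies receive more clones and lower hypermutation rates, and the best individuals survive. This is a sketch under stated assumptions, not Zhang et al.'s algorithm; the affinity function (which rewards a fixed, hypothetical set of informative bands and penalizes subset size), the clone counts and the mutation rates are arbitrary choices.

```python
import random
from typing import List

random.seed(0)
N_FEATURES = 12
USEFUL = {1, 4, 7}  # hypothetical informative bands


def affinity(mask: List[int]) -> float:
    """Hypothetical fitness: reward useful bands, lightly penalize subset size."""
    return sum(mask[i] for i in USEFUL) - 0.1 * sum(mask)


def hypermutate(mask: List[int], rate: float) -> List[int]:
    """Flip each bit independently with probability `rate`."""
    return [1 - b if random.random() < rate else b for b in mask]


def clonal_selection(pop_size: int = 10, generations: int = 40, n_best: int = 3) -> List[int]:
    pop = [[random.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=affinity, reverse=True)
        clones = []
        for rank, antibody in enumerate(pop[:n_best]):
            n_clones = n_best - rank + 1  # better antibodies get more clones...
            rate = 0.05 * (rank + 1)      # ...and a lower mutation rate
            clones += [hypermutate(antibody, rate) for _ in range(n_clones)]
        # Elitist survival among parents and clones, plus two random newcomers for diversity.
        survivors = sorted(pop + clones, key=affinity, reverse=True)[: pop_size - 2]
        newcomers = [[random.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(2)]
        pop = survivors + newcomers
    return max(pop, key=affinity)


best = clonal_selection()
print(best, affinity(best))
```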

A key task of a Mars rover is to automatically segment the images taken by its front panoramic camera. Image analysis facilitates the navigation and positioning of the rover, but transmitting images from Mars to Earth is very costly. Rashno et al. (2017) introduced a new feature selection method for Mars images based on wavelet and color features. The selected features are fed to an ELM classifier, yielding high-precision pixel-level results.

4.2.4 Gene field

Gene expression (microarray) data often contain thousands of dimensions, and feature selection generates a compact subset of genes that improves accuracy (Alshamlan 2018). Choosing an appropriate gene subset has strong practical value for cancer diagnosis and treatment (Pashaei et al. 2019).

To improve prediction accuracy and obtain more interpretable genes, Han et al. (2017) combined binary PSO with gene-to-class sensitivity (GCS) information to overcome the shortcomings of traditional gene selection methods. First, GCS information, which reveals whether a gene is sensitive to sample categories, is extracted from the gene data. Second, the proposed binary PSO chooses candidate gene subsets from the initial genes. Finally, GCS information is encoded into the operations of particle initialization, particle update, velocity modification and adaptive mutation. To acquire the most predictive gene features, Xiong et al. (2019) provided a gene selection method in which an initial gene pool is divided into multiple clusters according to its structure, and the least absolute shrinkage and selection operator (LASSO) is then used to select highly predictive genes and calculate a contribution value representing the sensitivity of genes to the sample classes.
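Many of the gene selection methods above build on binary PSO. As a reference point, the sketch below implements the classic sigmoid-based binary PSO of Kennedy and Eberhart on feature masks; it does not include Han et al.'s GCS-guided operators. The fitness function, swarm size, coefficients and velocity bound are illustrative assumptions; in a real wrapper the fitness would be a classifier's validation accuracy on the selected genes.

```python
import math
import random
from typing import Callable, List

random.seed(1)


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def binary_pso(fitness: Callable[[List[int]], float], dim: int, swarm: int = 20,
               iters: int = 60, w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
               v_max: float = 4.0) -> List[int]:
    """Maximize `fitness` over {0,1}^dim with sigmoid-based binary PSO."""
    X = [[random.randint(0, 1) for _ in range(dim)] for _ in range(swarm)]
    V = [[0.0] * dim for _ in range(swarm)]
    pbest = [x[:] for x in X]
    pbest_fit = [fitness(x) for x in X]
    g = max(range(swarm), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iters):
        for i in range(swarm):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                v = (w * V[i][d] + c1 * r1 * (pbest[i][d] - X[i][d])
                     + c2 * r2 * (gbest[d] - X[i][d]))
                V[i][d] = max(-v_max, min(v_max, v))  # clamp so the sigmoid does not saturate
                # sigmoid(velocity) is the probability that the bit becomes 1.
                X[i][d] = 1 if random.random() < sigmoid(V[i][d]) else 0
            f = fitness(X[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = X[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = X[i][:], f
    return gbest


# Hypothetical fitness: genes 0, 3 and 5 are informative; each selected gene costs a little.
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask: List[int]) -> float:
    return sum(mask[j] for j in INFORMATIVE) - 0.05 * sum(mask)


best = binary_pso(toy_fitness, dim=10)
print(best, toy_fitness(best))
```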

Most existing feature selection methods adopt fixed-length solutions. When applied to high-dimensional data, they not only consume a lot of memory but also require long computation times. Tran et al. (2018) proposed a variable-length PSO algorithm in which particles have different lengths. With features ordered by relevance, particles with shorter lengths obtain better classification performance, while particles with longer lengths improve global search ability.

Wang et al. (2020b) introduced a multi-objective PSO that optimizes four objectives: the number of features, accuracy, and entropy-based correlation and redundancy. First, it employs a new binary coding strategy to select informative genes. Second, a mutation operator improves its search ability. Finally, the “best/1” mutation operator of DE promotes local search. Experiments on 41 cancer datasets demonstrate the effectiveness of the algorithm.
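The DE “best/1” operator perturbs the current best solution with a scaled difference of two other randomly chosen individuals, \(v = x_{best} + F(x_{r1} - x_{r2})\). A minimal sketch on real-valued vectors follows; \(F = 0.5\) and the sample population are assumptions, and how the mutant is mapped back to a binary gene mask in Wang et al. (2020b) is not shown.

```python
import random
from typing import List, Optional, Sequence


def de_best_1(population: Sequence[Sequence[float]], best_index: int, target_index: int,
              F: float = 0.5, rng: Optional[random.Random] = None) -> List[float]:
    """DE/best/1 mutation: v = x_best + F * (x_r1 - x_r2).

    r1 and r2 are distinct indices, different from both the target and the best individual.
    """
    rng = rng or random.Random()
    candidates = [k for k in range(len(population)) if k not in (target_index, best_index)]
    if len(candidates) < 2:
        raise ValueError("population too small for DE/best/1")
    r1, r2 = rng.sample(candidates, 2)
    best, a, b = population[best_index], population[r1], population[r2]
    return [xb + F * (xa - xc) for xb, xa, xc in zip(best, a, b)]


pop = [[0.1, 0.9, 0.4], [0.8, 0.2, 0.6], [0.5, 0.5, 0.5], [0.3, 0.7, 0.9]]
mutant = de_best_1(pop, best_index=2, target_index=0, rng=random.Random(0))
print(mutant)
```

Because the base vector is always the best individual, this operator exploits the neighborhood of the current optimum, which is why it is used here for local search.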

Operons contain useful knowledge for protein function determination and drug design; however, current methods for operon detection are usually difficult to implement. Chuang et al. (2013) proposed a chaotic binary PSO to predict operons in bacterial genomes, whose fitness function is designed from participation in the same metabolic pathway, intergenic distance, and the cluster of orthologous groups attributes of a genome.
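Chaotic BPSO variants typically replace a fixed or linearly decreasing control parameter with a chaotic sequence. A common choice, assumed here for illustration rather than taken from Chuang et al. (2013), is to generate the inertia weight with the logistic map \(w_{t+1} = 4 w_t (1 - w_t)\) and use \(w_t\) in the usual velocity update \(v \leftarrow w_t v + c_1 r_1 (pbest - x) + c_2 r_2 (gbest - x)\).

```python
from typing import List


def logistic_inertia_weights(w0: float, steps: int, mu: float = 4.0) -> List[float]:
    """Inertia weights from the logistic map w_{t+1} = mu * w_t * (1 - w_t).

    With mu = 4 and w0 in (0, 1), the sequence stays in [0, 1]; starting values such as
    0.25, 0.5 or 0.75 should be avoided because they fall onto fixed points.
    """
    if not 0.0 < w0 < 1.0:
        raise ValueError("w0 must lie strictly between 0 and 1")
    weights = [w0]
    for _ in range(steps - 1):
        weights.append(mu * weights[-1] * (1.0 - weights[-1]))
    return weights


print([round(w, 4) for w in logistic_inertia_weights(0.7, 6)])
```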

4.2.5 Fault diagnosis field

Binary metaheuristic algorithms perform state monitoring and fault diagnosis on complex systems. The aim is to judge various abnormal states or faults promptly and correctly, and thereby improve equipment reliability and safety. Owing to the collaboration among particles, intelligent algorithms address the difficulty of extracting feature information and the long computation time in fault diagnosis, and generate an optimal feature set (Baraldi et al. 2016).

Because projected data still overlap significantly in subspaces, a single global transformation cannot offer high classification accuracy in multi-class tasks. Van and Kang (2015) developed a new wavelet kernel local Fisher discriminant analysis (WKLFDA) in which PSO chooses the parameters of WKLFDA, and a one-against-one strategy converts the task into all possible binary classification tasks to improve bearing defect classification. PSO-WKLFDA extracts the effective features of each binary class pair, and a decision fusion method combines the results of the individual SVM classifiers to identify the final bearing state.
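The one-against-one decomposition turns a K-class problem into \(K(K-1)/2\) binary tasks. The sketch below shows only that decomposition and a majority-vote fusion; the per-pair classifiers are hypothetical stand-in functions rather than PSO-WKLFDA features with SVMs, and majority voting is an assumed fusion rule, not necessarily the one used by Van and Kang (2015).

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, Hashable, List, Tuple

Label = Hashable
PairClassifier = Callable[[List[float]], Label]


def one_vs_one_tasks(classes: List[Label]) -> List[Tuple[Label, Label]]:
    """All unordered class pairs: K classes give K*(K-1)/2 binary tasks."""
    return list(combinations(classes, 2))


def fuse_by_vote(x: List[float], pair_models: Dict[Tuple[Label, Label], PairClassifier]) -> Label:
    """Each binary model votes for one class of its pair; the most voted class wins."""
    votes = Counter(model(x) for model in pair_models.values())
    return votes.most_common(1)[0][0]


classes = ["normal", "inner", "outer", "ball"]  # hypothetical bearing states
tasks = one_vs_one_tasks(classes)
print(len(tasks))

# Hypothetical pair models: choose the class whose made-up prototype is closer to x[0].
prototype = {"normal": 0.0, "inner": 1.0, "outer": 2.0, "ball": 3.0}


def make_model(a: Label, b: Label) -> PairClassifier:
    return lambda x: a if abs(x[0] - prototype[a]) <= abs(x[0] - prototype[b]) else b


models = {pair: make_model(*pair) for pair in tasks}
print(fuse_by_vote([2.1], models))
```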

Baraldi et al. (2018) measured component degradation with health indicators built from a set of signals collected during operation, and employed feature extraction and multi-objective binary DE to select the optimal feature subset for defining the health indicators. Peimankar et al. (2017) introduced a two-step algorithm for power transformer fault diagnosis based on the multi-objective particle swarm optimization (MOPSO) algorithm. First, the algorithm selects, within a multi-objective framework, a small number of the most effective features as input. Second, an ensemble classifier is built from the most accurate and diverse classifiers. Finally, Dempster-Shafer theory combines the outputs of the ensemble classifier to determine the actual faults of power transformers.
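Dempster's rule of combination, used above to fuse the ensemble outputs, can be written in a few lines: products of masses are assigned to the intersections of focal sets, and the mass falling on empty intersections (the conflict K) is removed by normalizing with \(1 - K\). In this sketch each classifier's output is assumed to be already expressed as a mass function over sets of fault types; how Peimankar et al. derive those masses is not shown, and the fault labels and mass values are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]


def dempster_combine(m1: Mass, m2: Mass) -> Mass:
    """m(A) = sum over B & C == A of m1(B) * m2(C), divided by 1 - K (K = conflicting mass)."""
    combined: Mass = {}
    conflict = 0.0
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            combined[inter] = combined.get(inter, 0.0) + mb * mc
        else:
            conflict += mb * mc
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    return {a: v / (1.0 - conflict) for a, v in combined.items()}


THETA = frozenset({"arc", "thermal", "pd"})  # whole frame, i.e. "don't know"
m1 = {frozenset({"arc"}): 0.6, frozenset({"thermal"}): 0.3, THETA: 0.1}
m2 = {frozenset({"arc"}): 0.5, frozenset({"pd"}): 0.3, THETA: 0.2}
for focal, mass in dempster_combine(m1, m2).items():
    print(sorted(focal), round(mass, 4))
```

In this toy case the conflict is 0.42, and the two sources jointly reinforce the "arc" hypothesis.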

4.2.6 Security field

The emergence and wide application of networks have brought convenience to people’s lives and work, but they have also created many security problems, and various types of vulnerabilities and attacks cause huge losses. Feature selection helps separate collected abnormal network data from normal data, so that intrusions can be distinguished from normal network behavior. These problems are converted into optimal classification problems and solved by binary metaheuristic algorithms.

Intrusion detection system (IDS) technology protects sensitive assets in network security, and implementing an effective IDS as a defense mechanism lessens cybersecurity risks (Raman et al. 2017). Wazirali (2021) proposed an intrusion detection model based on an adaptive fuzzy K-nearest neighbor (FKNN) algorithm. PSO optimizes the neighborhood size (K) and fuzzy strength parameter (m) of FKNN, and binary PSO selects a subset of conditional features for detection. To balance the local and global search abilities of PSO, the system uses two control parameters, a time-varying inertia weight and time-varying acceleration coefficients, and the proposed method achieves a large number of correct detections with the fewest false positives and false negatives. Maza and Touahria (2019) proposed a new hybrid multi-objective feature selection algorithm (MOEDAFS). By calculating the probability of each feature, four probability models are integrated into MOEDAFS, and each feature subset it selects achieves better classification performance with fewer features. IDS data often suffer from class imbalance, which clearly limits detection efficiency. To address this problem, Zhu et al. (2017) improved the NSGA-III algorithm with probability-based biased selection to handle the imbalance and with fitting selection to eliminate redundant features.
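The two FKNN parameters tuned by PSO control how class memberships are computed. The sketch below follows the classic fuzzy KNN formulation with crisp neighbour labels, which is an assumption; the adaptive variant in Wazirali (2021) may differ. K sets how many neighbours contribute, and the fuzzifier \(m > 1\) sets how strongly closer neighbours dominate. The training points are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]


def fuzzy_knn_memberships(x: Sequence[float], train: List[Sample], k: int, m: float,
                          eps: float = 1e-12) -> Dict[str, float]:
    """Class memberships of x from its k nearest neighbours (crisp neighbour labels).

    u_c(x) = sum_j [y_j == c] * d_j^(-2/(m-1)) / sum_j d_j^(-2/(m-1)), with m > 1.
    """
    if m <= 1:
        raise ValueError("fuzzifier m must be greater than 1")
    neighbours = sorted(train, key=lambda s: math.dist(x, s[0]))[:k]
    # eps guards against division by zero when x coincides with a training point.
    weights = [(max(math.dist(x, feats), eps) ** (-2.0 / (m - 1.0)), label)
               for feats, label in neighbours]
    total = sum(w for w, _ in weights)
    memberships: Dict[str, float] = {}
    for w, label in weights:
        memberships[label] = memberships.get(label, 0.0) + w / total
    return memberships


train = [((0.0, 0.0), "normal"), ((1.0, 0.0), "normal"), ((0.0, 1.0), "normal"),
         ((3.0, 3.0), "attack"), ((4.0, 3.0), "attack")]
u = fuzzy_knn_memberships((2.5, 2.0), train, k=3, m=2.0)
print(u)
```

Larger m flattens the distance weighting, while smaller m (close to 1) makes the nearest neighbour dominate; this is the trade-off PSO searches over together with K.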

Irrelevant and redundant data in network traffic waste considerable computation and storage resources and reduce the accuracy of network anomaly detection. Zhang and Xie (2020) proposed a multi-objective feature selection algorithm to solve this problem. The algorithm first establishes an evolutionary strategy pool and a dominance strategy pool, and then uses random probability selection to improve convergence and diversity. The model takes the number of selected features, accuracy, precision, detection rate and false positive rate as objective functions, characterizing performance from different aspects. Malicious web domains pose a huge threat to user privacy and security. Since a large amount of free data on web domains is available on the Internet, Hu et al. (2016) investigated the performance of machine learning algorithms on such data and adopted a BPSO-based feature selection method to identify malicious web domains. Zhang et al. (2014) introduced a new spam detection algorithm that reduces false positives, i.e., nonspam messages incorrectly labeled as spam. First, binary PSO with a mutation operator performs feature selection. Second, a decision tree classifier is trained on the crucial features with K-fold cross validation. Finally, a cost matrix assigns different weights to false positive and false negative errors.
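A cost matrix changes which classifier counts as better. The sketch below assumes binary labels (1 = spam) and an illustrative 5:1 cost ratio for false positives versus false negatives, which is not the weighting of Zhang et al. (2014), and scores predictions by average misclassification cost instead of plain error.

```python
from typing import Sequence


def misclassification_cost(y_true: Sequence[int], y_pred: Sequence[int],
                           cost_fp: float = 5.0, cost_fn: float = 1.0) -> float:
    """Average cost per message; label 1 = spam, so a false positive flags a nonspam as spam."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return (cost_fp * fp + cost_fn * fn) / len(y_true)


y_true = [0, 0, 1, 1, 0, 1]
pred_a = [1, 0, 1, 1, 0, 1]  # one false positive
pred_b = [0, 0, 0, 1, 0, 0]  # two false negatives
print(misclassification_cost(y_true, pred_a), misclassification_cost(y_true, pred_b))
```

Prediction `pred_a` has the lower plain error (1/6 versus 2/6) but the higher cost (5/6 versus 2/6), so a cost-sensitive objective prefers the classifier that rarely blocks legitimate mail.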

Usability, described by hierarchically structured software usability models, has become one of the most important criteria for measuring software quality. Gupta et al. (2020a) proposed an improved CSA that identifies usability features from the hierarchical model and provides optimal solutions by searching for useful features. Jain et al. (2021) developed an improved GWO for selecting the most important features in a hierarchical software model. An Android application declares a set of permissions that must be granted during installation, and these permissions can serve as features. Bhattacharya et al. (2019) proposed a feature selection algorithm based on rough sets and PSO for Android malware detection, adopting a new random key encoding method to binarize PSO and mitigate premature convergence. The quality of software features determines the performance of software defect prediction (SDP) models, and redundant and irrelevant features degrade the performance of the built model. Ni et al. (2019) proposed a multi-objective feature selection method that minimizes the number of selected features and maximizes the performance of the constructed SDP model.

Risk-sensitive learning to rank (L2R) attempts to learn models with good overall performance while reducing the risk of underperforming on a few but important queries. Sousa et al. (2019) introduced a new method to explore the search space and used the multi-objective evolutionary algorithm SPEA2 to evaluate both the risk sensitivity and the effectiveness of L2R models. Kozodoi et al. (2019) introduced a profit-centered feature selection framework that uses the expected maximum profit (EMP) metric as an objective function, extending the use of EMP to feature selection. NSGA-II then implements multi-objective feature selection, adopting scorecard profitability and savings as fitness functions.

Souza et al. (2021) studied the overfitting problem of binary PSO in wrapper feature selection and proposed a method based on a global validation strategy and an external archive for handwritten signature verification. Cruz et al. (2017) developed a dynamic ensemble selection algorithm based on meta-learning that embeds multiple criteria. These criteria are encoded as different meta-features, and the system's performance is improved with a meta-feature selection method based on an overfitting-cautious BPSO.

4.2.7 Medical field

Mining medical data provides doctors with decision support when diagnosing and treating patients. By integrating and processing large amounts of medical data, feature selection effectively reduces the complexity of analysis and computation and improves analysis results. Binary metaheuristic algorithms have been successfully applied to COVID-19 diagnosis, EEG analysis, lesion malignancy classification, warfarin dose prediction, schizophrenia (SZ) diagnosis and virtual screening (VS).

Over the past 3 years, COVID-19 has threatened lives worldwide, and early detection is an important topic in its treatment. Shaban et al. (2021) introduced an accurate detection strategy for diagnosing COVID-19 patients with distance biased naive Bayes (DBNB), in which advanced PSO selects the most informative and important features of patients. Feature selection is implemented through filter and wrapper approaches, and DBNB is employed as the classifier to detect infected patients accurately with minimal time penalty. Canayaz (2021) proposed a deep learning based method for the early diagnosis of COVID-19. First, a dataset of normal, pneumonia and COVID-19 lung X-ray images is created, and an image contrast enhancement algorithm preprocesses the original dataset. Second, features are extracted from the new dataset with deep learning models such as VGG19, ResNet, GoogleNet and AlexNet. Third, binary PSO and binary GSO perform feature selection, and SVM classifies the selected features.

One of the challenging tasks in brain-computer interfaces (BCI) is handling high-dimensional data from electroencephalogram (EEG) signals. Pereira et al. (2019) systematically evaluated the performance of different variants of the binary magnetic optimization algorithm (MOA), using the optimum-path forest as the classifier. They adopt the wavelet transform to extract features from epileptic EEG signals, and the proposed algorithms implement feature selection. Baig et al. (2017) proposed a new hybrid feature selection method that employs DE to search the feature space and generate an optimal subset, and also conducted a comprehensive study of the significance of EAs in feature selection for EEG signals.

For feature selection in medical diagnosis, Anter and Ali (2020) designed a hybrid CSA combining chaos theory and the fuzzy c-means algorithm, in which CSA performs the global search and fuzzy c-means serves as the objective function. Nadimi-Shahraki et al. (2021a) proposed the binary moth-flame optimization algorithm (B-MFO) to select effective features from medical datasets. Three versions of B-MFO are developed using S-shaped, V-shaped and U-shaped transfer functions to convert MFO from continuous to binary. SZ is a common brain disorder, and Manohar and Ganesan (2018) studied the relationship between the image textures of SZ and normal images. With mutual information entropy as the objective function, a BPSO-based fuzzy SVM classifier distinguishes SZ individuals from healthy people.
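The three transfer-function families can be contrasted in a few lines. In this sketch, which does not reproduce B-MFO itself, `v` is whatever continuous value the update step produces (a velocity in PSO, an updated position in MFO). The S-shaped sigmoid gives the probability that a bit is set to 1, whereas the V-shaped \(|\tanh(v)|\) and the U-shaped \(\alpha|v|^{\beta}\) give the probability that the current bit is flipped. The U-shaped parameters \(\alpha = 1\), \(\beta = 2\) and the clipping to 1 are assumptions.

```python
import math
import random
from typing import List, Optional


def s_shaped(v: float) -> float:
    """S-shaped (sigmoid): probability that the bit is set to 1."""
    return 1.0 / (1.0 + math.exp(-v))


def v_shaped(v: float) -> float:
    """V-shaped |tanh(v)|: probability that the current bit is flipped."""
    return abs(math.tanh(v))


def u_shaped(v: float, alpha: float = 1.0, beta: float = 2.0) -> float:
    """U-shaped alpha * |v|**beta, clipped to 1: probability that the current bit is flipped."""
    return min(1.0, alpha * abs(v) ** beta)


def binarize(step: List[float], bits: List[int], kind: str,
             rng: Optional[random.Random] = None) -> List[int]:
    """Turn continuous values into a new bit vector with the chosen transfer function."""
    rng = rng or random.Random()
    out = []
    for v, b in zip(step, bits):
        r = rng.random()
        if kind == "S":
            out.append(1 if r < s_shaped(v) else 0)
        elif kind == "V":
            out.append(1 - b if r < v_shaped(v) else b)
        elif kind == "U":
            out.append(1 - b if r < u_shaped(v) else b)
        else:
            raise ValueError(f"unknown transfer function {kind!r}")
    return out


print(binarize([-3.0, 0.1, 2.5], [0, 1, 1], "V", random.Random(0)))
```

A practical difference: with V- and U-shaped functions a value near zero keeps the current bit, whereas the S-shaped function sets the bit to 1 with probability 0.5, which makes the search more random around zero.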

Since the acquisition costs of features differ in medical data, Chen et al. (2019) suggested a feature selection method based on confidence and cost-effectiveness, in which BPSO improves the classification performance on healthcare data. Feature confidence, derived from the correlation between features and classes, improves search efficiency, and the algorithm considers the fine-grained impact of each dimension of a particle on classification performance.

VS has been shown to improve the success rate of drug discovery campaigns, but its predictive ability needs to be enhanced because of the limitations of describing drug binding phenomena with basic physical principles. Jiménez et al. (2019) proposed a multi-objective feature selection method for VS in drug discovery and made a detailed comparison between ENORA and NSGA-II in terms of accuracy, sensitivity and ROC. As decision support systems are widely used in various fields, drug dose prediction has become an important research field. Sohrabi and Tajik (2017) developed a multi-objective feature selection method to support warfarin dose prediction, in which NSGA-II and MOPSO select clinical and genetic features and an NN is used as the classifier.

4.2.8 Electricity field

An electric power system consists of power generation, transmission, transformation, distribution and consumption, together with the information systems at every stage that measure, control, regulate, protect, communicate with and dispatch the production of electric energy (Wang et al. 2020a; Hossain et al. 2019).

Wang et al. (2021) introduced a feature selection algorithm that combines the advantages of grey relational analysis (GRA) and BPSO, in which an initialization modified with GRA coefficients speeds up optimization. The algorithm solves large-scale feature selection in power systems and finds the features most strongly related to the target power system scenario. Ramos et al. (2016) addressed commercial losses in Brazil as the task of characterizing irregular consumers, using the black hole algorithm (BHA).
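GRA scores each candidate feature series by its closeness to a reference series through the grey relational coefficient \(\xi_i(k) = (\Delta_{\min} + \rho\Delta_{\max}) / (\Delta_i(k) + \rho\Delta_{\max})\), averaged over k into a grade. The sketch below assumes already normalized series, the customary distinguishing coefficient \(\rho = 0.5\), and hypothetical data; how Wang et al. (2021) turn the grades into BPSO initialization probabilities is not shown.

```python
from typing import List, Sequence


def grey_relational_grades(reference: Sequence[float], features: List[Sequence[float]],
                           rho: float = 0.5) -> List[float]:
    """Grey relational grade of each (normalized) feature series w.r.t. the reference."""
    deltas = [[abs(r - x) for r, x in zip(reference, series)] for series in features]
    d_min = min(min(d) for d in deltas)
    d_max = max(max(d) for d in deltas)
    if d_max == 0:
        return [1.0] * len(features)  # every series coincides with the reference
    grades = []
    for d in deltas:
        coeffs = [(d_min + rho * d_max) / (dk + rho * d_max) for dk in d]
        grades.append(sum(coeffs) / len(coeffs))
    return grades


target = [0.2, 0.5, 0.9]                        # hypothetical normalized target
candidates = [[0.2, 0.5, 0.9], [0.9, 0.1, 0.3]]
print(grey_relational_grades(target, candidates))
```

A feature that tracks the target perfectly receives grade 1, and less related features receive lower grades; a natural use in a hybrid is to initialize highly related features as selected more often.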

Eseye and Lehtonen (2020) predicted the short-term heat demand of buildings in a district heating system by integrating empirical mode decomposition (EMD), ICA and SVM, and combined binary GA with Gaussian process regression to obtain the most important and nonredundant features. Gu et al. (2015) proposed a feature selection method for transient stability assessment of power systems based on the kernelized fuzzy rough set (KFRS) and a memetic algorithm (MA), where the original feature set is derived from the operating parameters of the power system. KFRS-based generalized classification is adopted as the separability criterion, and an MA based on tabu search (TS) and binary DE obtains the optimal feature subset with the highest solution quality. The algorithm avoids the information loss caused by the feature discretization required by rough set methods and improves classification ability.

4.2.9 Parameter optimization field

Classifiers judge the quality of selected features during feature selection, but some classifiers, such as SVM, have core parameters that must be set in advance and strongly affect classification. Several algorithms therefore optimize these parameters while performing feature selection, so as to maximize prediction performance.

Based on feature selection and parameter optimization, Niu et al. (2021) developed a short-term load prediction framework in which the original loads are decomposed into a series of relatively simple subcomponents with empirical mode decomposition. For each subcomponent, the real-valued cooperation search algorithm (CSA) finds the best hyperparameters, and binary CSA serves as a feature selection tool to determine the candidate input variables. Finally, the outputs of all sub-modules constitute the final prediction result. Li and Yin (2013) proposed a multi-objective BBO to select a subset of informative genes. The algorithm first uses the Fisher-Markov selector to choose the top 60 genes from the expression data, and the framework then integrates nondominated sorting and crowding distance. Finally, the proposed method is applied to gene selection and validated with leave-one-out cross-validation (LOOCV) and SVM, as shown in Fig. 5.
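Crowding distance, which accompanies nondominated sorting in NSGA-II-style frameworks such as the one above, estimates how isolated each solution is within its front so that selection can favor sparse regions and preserve diversity. A minimal sketch of the standard computation follows; the objective vectors (number of genes, error) are hypothetical.

```python
from typing import List, Sequence


def crowding_distance(front: List[Sequence[float]]) -> List[float]:
    """NSGA-II crowding distance for the objective vectors of one nondominated front."""
    n = len(front)
    if n == 0:
        return []
    dist = [0.0] * n
    for m in range(len(front[0])):
        order = sorted(range(n), key=lambda i: front[i][m])
        lo, hi = front[order[0]][m], front[order[-1]][m]
        dist[order[0]] = dist[order[-1]] = float("inf")  # boundary points are always kept
        if hi == lo:
            continue
        for pos in range(1, n - 1):
            i = order[pos]
            # Normalized side length of the cuboid formed by the two neighbours.
            dist[i] += (front[order[pos + 1]][m] - front[order[pos - 1]][m]) / (hi - lo)
    return dist


front = [(1, 0.20), (2, 0.12), (3, 0.10), (5, 0.08)]
print(crowding_distance(front))
```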

Fig. 5 The encoding scheme of individuals for SVM optimization and feature selection

Faris et al. (2018a) proposed a robust method based on the multi-verse optimizer (MVO) that selects the optimal feature subset and optimizes the parameters of SVM. Individuals are encoded as real-valued vectors whose number of elements equals the number of features plus two, the extra two representing the SVM parameters cost (C) and gamma (\(\gamma\)). Each element is a random number in [0, 1]. Elements representing features are rounded to 0 or 1, whereas the elements for C and \(\gamma\) are mapped to different scales because their search spaces differ; for example, the element corresponding to C is mapped to [0, 3500], and the element corresponding to \(\gamma\) is mapped to [0, 32]. In Li and Yin (2013), the values of C and \(\gamma\) are obtained by Eq. (16), which linearly maps a value A from \([min_A, max_A]\) onto \([min_B, max_B]\).

$$\begin{aligned} B = \dfrac{A - min_A}{max_A - min_A}(max_B-min_B) + min_B \end{aligned}$$
(16)
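Decoding such an individual is a direct application of Eq. (16). In this sketch the layout (C and \(\gamma\) first, then one element per feature) is an assumption, since Fig. 5 may order them differently, and features are thresholded at 0.5 to avoid Python's round-half-to-even behavior.

```python
from typing import List, Sequence, Tuple


def linear_map(a: float, min_a: float, max_a: float, min_b: float, max_b: float) -> float:
    """Eq. (16): map a from [min_a, max_a] linearly onto [min_b, max_b]."""
    return (a - min_a) / (max_a - min_a) * (max_b - min_b) + min_b


def decode_individual(vec: Sequence[float],
                      c_range: Tuple[float, float] = (0.0, 3500.0),
                      gamma_range: Tuple[float, float] = (0.0, 32.0)
                      ) -> Tuple[float, float, List[int]]:
    """Decode [c, gamma, f_1, ..., f_n] with entries in [0, 1] (assumed layout)."""
    c = linear_map(vec[0], 0.0, 1.0, *c_range)
    gamma = linear_map(vec[1], 0.0, 1.0, *gamma_range)
    # Round feature elements to a 0/1 mask with a 0.5 threshold.
    mask = [1 if v >= 0.5 else 0 for v in vec[2:]]
    return c, gamma, mask


print(decode_individual([0.5, 0.25, 0.7, 0.2, 0.5, 0.9]))
```

Because SVM requires C > 0 (and a positive \(\gamma\) for the RBF kernel), a small positive lower bound would be used instead of 0 in practice.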

To sum up, we list the optimization algorithms discussed and their applications, as shown in Tables 7 and 8.

Table 7 The details of the cited references in feature selection
Table 8 The details of the cited references in feature selection -2

4.3 Scheduling problem & resource management problem

Scheduling problems allocate jobs to resources over time and play an important role in wireless sensor networks (WSNs), energy, transportation and other systems. An effective scheduling method greatly improves resource utilization, and its core issues are modeling and the optimization algorithm. Binary metaheuristic algorithms solve scheduling problems effectively.

4.3.1 WSN field

Resource allocation is an important part of wireless networks; however, its optimization is a nonconvex NP-hard problem. Binary metaheuristic algorithms handle task allocation among sensor nodes and base stations, broadcast scheduling, resource allocation and routing problems in WSNs.

A challenging issue of task assignment in WSNs is to allocate sensor nodes reasonably so that total power consumption is reduced and tasks are completed on time. Guo et al. (2014) introduced a real-time fault-tolerant task allocation algorithm that uses primary/backup technology and overlapping of passive backup copies to support fault tolerance. A binary-valued matrix encodes the allocation in the optimization process, and the objective function is composed of task execution time, node energy cost and network load. Li et al. (2018) proposed a cluster-based optimized sleep strategy to reduce power consumption and interference in dense heterogeneous networks. Small base stations with strong interference are grouped into clusters, and binary PSO determines their sleep strategy. The algorithm improves the sleep and activation efficiency of clusters and reduces the outage probability of the system.

In resource allocation, Pham et al. (2020) studied the applicability of the whale optimization algorithm (WOA) and introduced a penalty method to handle optimization constraints. They tested three examples in wireless networks, namely power allocation for the energy and spectral efficiency trade-off, power allocation for maximizing secure throughput, and mobile edge computing offloading, and also used WOA for resource allocation in 5G wireless networks. For heterogeneous vehicular communication environments, Liu et al. (2017) suggested an SDN-based service architecture that supports the centralized scheduling of vehicles, and introduced a new coding-assisted broadcast scheduling (CBS) problem that combines vehicle caching and network coding in the scheduler. Because CBS is NP-hard, an MA is used to solve data dissemination in vehicular networks.

QoS multicast routing is a major research field in wireless mesh network transmission. Meraihi et al. (2019) improved the binary bat algorithm (BA) by introducing an inertia weight into the velocity update equation, whose value is selected through chaotic map, uniform distribution and Gaussian distribution strategies. The algorithm satisfies multiple QoS constraints such as bandwidth, delay and packet loss rate, and obtains a low-cost multicast tree. Shen et al. (2014) proposed a bi-velocity discrete PSO and applied it to multicast routing. The bi-velocity strategy represents the probabilities that a dimension is “1” or “0”, where “1” means that the node is selected to construct the multicast tree and “0” means that it is not.

4.3.2 Unit commitment (UC) field

The task of UC is to arrange the on/off status and load distribution of units reasonably, so as to satisfy the load demand during the dispatching period and minimize the cost of power generation under unit and system constraints. UC is a binary scheduling optimization problem, since the status of each unit is either on or off (Pan et al. 2021a).

Panwar et al. (2018) proposed three binary GWO models in which sigmoid and tanh transfer functions realize binarization. Transmission line rating affects the economy and security of power systems, while geographic information systems (GIS) are becoming increasingly important in power system generation and planning. Hemparuva et al. (2018) introduced a dynamic line rating based on GIS and weather, and studied its effectiveness in the security-constrained unit commitment problem (SCUCP).

Yang et al. (2017, 2019) and Wang et al. (2019b) solved UC problems involving plug-in electric vehicles. Yang et al. (2017) developed a hybrid UC model with renewable energy and charging/discharging management of plug-in electric vehicles, combining a hybrid topology binary PSO, adaptive DE and the lambda iteration method. The algorithm intelligently determines the on/off state of each thermal unit, the generation power of the online units and the demand side management (DSM) of plug-in electric vehicles. For the charging and discharging of a large number of plug-in electric vehicles, Yang et al. (2019) suggested a new hybrid-coded heuristic algorithm using binary PSO with a V-shaped symmetric transfer function, and studied the influence of the transfer function on UC in a 10-unit power system with 50,000 plug-in electric vehicles. Wang et al. (2019b) proposed a parallel framework that simultaneously solves the coordinated UC and DSM of plug-in electric vehicles. Real-valued competitive swarm optimization (CSO) improves the flexible access of demand side loads and adjusts the disordered charging load of the power system. Because unit states are binary switching variables of large scale, binary CSO is improved to optimize them. In the parallel optimization process, a weighting factor is integrated into the framework to analyze the influence of demand side loads on the system.
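In such hybrids, the binary search fixes which units are on, and a continuous method dispatches the committed units. The sketch below shows the textbook lambda iteration for that inner step, assuming quadratic costs \(a_i + b_i P_i + c_i P_i^2\), a lossless system and no ramping or electric-vehicle terms; it is not the exact procedure of Yang et al. (2017). Each unit runs where its incremental cost \(b_i + 2c_iP_i\) equals \(\lambda\) (clamped to its limits), and \(\lambda\) is found by bisection.

```python
from typing import List, Sequence, Tuple

# (b, c, p_min, p_max) for a committed unit with cost a + b*P + c*P^2; a does not affect dispatch.
Unit = Tuple[float, float, float, float]


def dispatch_at(lam: float, units: Sequence[Unit]) -> List[float]:
    """Output of each unit at incremental cost lam, clamped to its limits."""
    return [min(p_max, max(p_min, (lam - b) / (2.0 * c))) for b, c, p_min, p_max in units]


def lambda_iteration(units: Sequence[Unit], demand: float, tol: float = 1e-6,
                     max_iter: int = 200) -> Tuple[float, List[float]]:
    """Bisection on lam until the committed units' total output matches demand."""
    if not sum(u[2] for u in units) <= demand <= sum(u[3] for u in units):
        raise ValueError("demand is outside the committed units' capacity range")
    lo = min(b + 2 * c * p_min for b, c, p_min, _ in units)  # all units at p_min
    hi = max(b + 2 * c * p_max for b, c, _, p_max in units)  # all units at p_max
    for _ in range(max_iter):
        lam = 0.5 * (lo + hi)
        mismatch = sum(dispatch_at(lam, units)) - demand
        if abs(mismatch) < tol:
            break
        if mismatch > 0:
            hi = lam
        else:
            lo = lam
    return lam, dispatch_at(lam, units)


units = [(10.0, 0.05, 10.0, 100.0), (12.0, 0.025, 10.0, 150.0)]  # hypothetical units
lam, outputs = lambda_iteration(units, demand=150.0)
print(round(lam, 4), [round(p, 2) for p in outputs])
```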

4.3.3 Energy management field

The power system is a large and complex system. With its increasing scale and higher requirements on multiple quality indicators, various compound optimization problems must be solved to ensure reliable power system operation (Abdolrasol et al. 2021; Wu et al. 2000; Nojavan et al. 2015). Their complexity lies mainly in multiple objectives, diverse constraints, multiple extrema, high dimensionality and uncertain factors, which pose major challenges to modeling and algorithm innovation. Binary metaheuristic algorithms have been applied to power optimization because of their simple implementation and parallel search (Pedrasa et al. 2009; Mellal and Williams 2020; Li et al. 2016).

In power dispatching, Fang et al. (2015) adopted a cascading failure model and a multi-objective nondominated sorting binary differential evolution algorithm (NSBDE) to optimize the allocation of links between generators and distribution nodes. A topology model with low computational cost simulates and quantifies the network’s resilience to cascading failures caused by targeted attacks, and the algorithm is applied to the 400 kV French transmission network with the goals of minimizing investment costs and maximizing the network’s ability to recover from cascading failures.

To simulate wind power generation scenarios, Tan et al. (2014) constructed an initial scenario set with an interval method and used the Kantorovich distance to design a scenario reduction strategy. Energy storage systems and demand response are applied on the generation side and the demand side, respectively, and a two-stage scheduling model is built for the wind power storage system. A chaotic binary PSO algorithm then solves the proposed model and overcomes the tendency of BPSO to fall into local optima. Nojavan et al. (2018) developed a new method for preventing voltage instability, using improved binary PSO to optimize transmission line switching as a new means of reducing the cost of the voltage stability margin. Distribution network loss is one of the most significant problems in distribution network operation, and reconfiguration is an important optimization process for loss reduction (Elsied et al. 2016). Milani and Haghifam (2013) discussed a basic reconfiguration model and gradually developed it into an evolutionary method with an optimal time interval. GA solves the proposed model, which is tested on the IEEE 33-bus network.
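Scenario reduction with the Kantorovich distance keeps a small, representative subset of scenarios. The sketch below implements a simple greedy backward reduction often used with that distance: repeatedly delete the scenario k minimizing \(p_k \min_{j \ne k} d(k, j)\) and move its probability to the nearest remaining scenario. Euclidean distance and the hypothetical hourly wind profiles are assumptions, and Tan et al.'s exact strategy may differ.

```python
from typing import List, Sequence, Tuple


def backward_reduce(scenarios: List[Sequence[float]], probs: List[float],
                    keep: int) -> Tuple[List[Sequence[float]], List[float]]:
    """Greedy backward scenario reduction under a Kantorovich-type criterion."""
    if not 1 <= keep <= len(scenarios):
        raise ValueError("keep must be between 1 and the number of scenarios")

    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    alive = list(range(len(scenarios)))
    p = list(probs)
    while len(alive) > keep:
        best_k, best_j, best_cost = -1, -1, float("inf")
        for k in alive:
            j = min((i for i in alive if i != k), key=lambda i: dist(scenarios[k], scenarios[i]))
            cost = p[k] * dist(scenarios[k], scenarios[j])  # probability-weighted loss
            if cost < best_cost:
                best_k, best_j, best_cost = k, j, cost
        p[best_j] += p[best_k]  # redistribute to the nearest kept scenario
        alive.remove(best_k)
    return [scenarios[i] for i in alive], [p[i] for i in alive]


sc = [[10.0, 12.0, 11.0], [10.5, 12.0, 11.0], [20.0, 22.0, 25.0], [5.0, 4.0, 6.0]]
kept, kept_probs = backward_reduce(sc, [0.3, 0.2, 0.3, 0.2], keep=3)
print(kept, kept_probs)
```

The two nearly identical profiles are merged first, while the extreme high- and low-wind scenarios survive because deleting them would cost far more.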

In energy management, Khan et al. proposed a priority-induced demand side management (DSM) strategy based on load transfer that considers the energy cycles of appliances (Khan et al. 2019). The proposed day-ahead load transfer technique is mathematically formulated as an MKP, and three metaheuristic optimization techniques, GA, enhanced DE and binary PSO, are embedded in the proposed autonomous energy management controller. Ayub et al. (2020) designed a robust energy management technique that monitors and controls residential loads in smart homes according to user-defined preferences. They adopt an improved binary GWO that provides customers with maximum satisfaction under a predefined user budget. To reduce the consumption and cost of the power grid, Faisal et al. studied a PSO-tuned fuzzy controller for the charging, discharging and scheduling of a battery energy storage system (ESS) in a microgrid (Faisal et al. 2020). A fuzzy logic controller (FLC) controls the charging and discharging of the ESS, and PSO optimizes the FLC parameters based on load demand, available power, battery temperature and state of charge.
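To make a knapsack-type formulation concrete, the sketch below evaluates a 0/1 decision vector against several capacity constraints (for example, one power limit per time slot) and subtracts a penalty for any violation, which is one common way for GA, DE or binary PSO to handle constraints. The penalty form and the toy data are assumptions for illustration, not Khan et al.’s model.

```python
from typing import Sequence


def knapsack_fitness(x: Sequence[int], values: Sequence[float],
                     weights: Sequence[Sequence[float]], capacities: Sequence[float],
                     penalty: float = 1e3) -> float:
    """Fitness (to maximize) of a 0/1 vector under several capacity constraints.

    weights[k][j] is the load that item j places on constraint k
    (e.g. hypothetical time slot k); capacities[k] is that slot's limit.
    """
    profit = sum(v * xj for v, xj in zip(values, x))
    violation = 0.0
    for row, cap in zip(weights, capacities):
        used = sum(w * xj for w, xj in zip(row, x))
        violation += max(0.0, used - cap)   # only overload is penalized
    return profit - penalty * violation


# Hypothetical data: 3 items, 2 constraints
values = [4, 3, 2]
weights = [[2, 2, 1], [1, 3, 2]]
capacities = [3, 4]
```

With this data, selecting items 0 and 2 is feasible and scores 6, whereas items 0 and 1 exceed the first constraint by one unit and receive a large negative fitness.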

Scholars have not only applied binary metaheuristic algorithms but also improved them to better solve energy optimization problems. Li et al. (2016), Tan et al. (2014) and He et al. (2019) adopt new binary coding approaches to implement the space transformation, as shown in Eqs. (17) and (18).

$$\begin{aligned} T(x) = \left\{ \begin{array}{ll} \dfrac{2}{1+e^{-x}} - 1, & \text{if } rand > 0 \\ 1 - \dfrac{2}{1+e^{-x}}, & \text{otherwise} \end{array} \right. \end{aligned}$$
(17)
$$\begin{aligned} X_i^d = X_i^d + F_i \otimes (X_{r1}^d \oplus X_{r2}^d) \end{aligned}$$
(18)

where “+”, “\(\otimes\)” and “\(\oplus\)” represent “OR”, “AND” and “XOR” operators, respectively.
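Because every operator in Eq. (18) is logical, the update can be sketched with Python’s bitwise operators on 0/1 lists. The sketch assumes that \(F_i\) is a random binary mask (each bit is 1 with probability `p_f`) and that r1 and r2 are two distinct individuals other than i; both are illustrative assumptions rather than details confirmed by the cited works.

```python
import random


def logical_de_update(pop, i, p_f=0.5, rng=random):
    """Return X_i OR (F_i AND (X_r1 XOR X_r2)) for individual i, as in Eq. (18).

    pop is a list of equal-length 0/1 lists; F_i is an assumed random bit mask.
    """
    r1, r2 = rng.sample([k for k in range(len(pop)) if k != i], 2)
    x, a, b = pop[i], pop[r1], pop[r2]
    mask = [1 if rng.random() < p_f else 0 for _ in x]  # assumed binary F_i
    return [xd | (fd & (ad ^ bd)) for xd, fd, ad, bd in zip(x, mask, a, b)]
```

Since “+” is OR, this update can only switch bits from 0 to 1; bits are switched off only by other operators, such as crossover or selection, if the algorithm includes them.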

4.3.4 Other fields

Scheduling problems optimized by binary metaheuristic algorithms also include ship scheduling (Karbassi Yazdi et al. 2020), timetabling (Ji et al. 2018; Yang et al. 2014), diet planning (Wulandhari et al. 2019), medication scheduling (Neri et al. 2007), infectious disease control (Zhao et al. 2019), computation offloading (Du et al. 2018) and test configuration (Han et al. 2020).

Karbassi Yazdi et al. proposed a binary PSO to optimize ship routing and scheduling for liquefied natural gas (LNG) transportation (Karbassi Yazdi et al. 2020). Ji et al. studied the joint scheduling of cascaded multi-chamber ship locks in water transportation (Ji et al. 2018). Their algorithm divides the problem into three interrelated layers, each with a simple structure and high flexibility. The outer and inner layers concern the number of ship locks and the ship arrangements, respectively. As a bridge between them, the intermediate layer is a lock direction combination and timetabling problem, which is a high-dimensional mixed-integer optimization. A hybrid of quantum-inspired GSA and an improved moth-flame optimization (MFO) algorithm solves the problem, and two different scheduling rules are used to test it. A train timetable determines the arrival and departure times of trains. Yang et al. proposed a timetable optimization model for subway systems that improves the utilization of renewable energy while reducing passengers’ waiting time (Yang et al. 2014). A bi-objective optimization model is built with headway and dwell time control, and a binary-coded GA solves it.

Wulandhari et al. proposed a binary PSO algorithm that searches for the optimal combination of food portions and food selection according to individual daily eating habits (Wulandhari et al. 2019). To find the best multi-drug treatment plan for HIV, Neri et al. proposed an adaptive multimeme algorithm that combines an evolutionary framework with three local searchers activated by adaptive rules (Neri et al. 2007). The first local searcher is highly explorative because it performs random search on all variables of a candidate solution. The second is highly exploitative because it perturbs only one variable. The third has intermediate exploration/exploitation characteristics and uses simulated annealing (SA) to escape suboptimal regions and search new promising directions. Zhao et al. developed a networked epidemic control system with binary PSO (Zhao et al. 2019). An improved susceptible-exposed-infected-vigilant (SEIV) model simulates the spread of infectious diseases, and a dedicated resource description model imitates real-world goods/services and their assignment. Based on these two models, a stochastic optimization strategy is introduced in which each particle determines its solution from its best neighbors and the historical search experience of the whole swarm, without additional problem-specific information.

Computation offloading is critical for computation-intensive applications on mobile user equipment (UE). Du et al. studied it in a hybrid fog and cloud computing system consisting of a fog node, a powerful cloud center and UEs (Du et al. 2018). Offloading decisions are obtained by a binary customized fireworks algorithm (FA), and the maximum delay among UEs is minimized by jointly optimizing the offloading decisions and computational resource allocation. Han et al. developed a test configuration method based on GA and binary PSO (Han et al. 2020). The method combines the global search ability of GA with the convergence speed of binary PSO, and uses a Bayesian network model to obtain accurate testability indicators in limited time.

In summary, we list the optimization algorithms discussed above and their applications in Table 9.

Table 9 The details of the cited references in scheduling

4.4 Structure optimization problem/model optimization problem

The optimization of materials and engineering structures involves binary decisions, for example, whether each element of an array antenna is switched on, whether a material surface is coated with metal, or whether a model adopts a certain structure/layout.

4.4.1 Antenna design

A smart antenna, also known as an adaptive antenna array, achieves directional reception through programmable antenna elements, thereby improving the directional characteristics of the link between the mobile terminal and the base station. By steering its beam, a smart antenna increases gain and saves signal transmission power. Its performance is evaluated by indicators such as array gain, sidelobe level, interference cancellation ratio and direction-of-arrival estimation error (Aldhafeeri and Rahmat-Samii 2019; Modiri and Kiasaleh 2011).

Choo et al. reported the application of GA to the design optimization of small wire antennas considering bandwidth, antenna size and efficiency (Choo et al. 2005). A multi-segment wire structure is adopted, and the performance of each wire structure is predicted by the Numerical Electromagnetics Code (NEC). The algorithm uses a binary chromosome to represent the wire shape and includes a two-point crossover scheme with three chromosomes and a geometric filter. Dong et al. proposed an improved binary PSO algorithm for designing compact, multifunctional, high-dimensional fragment-type antennas (Dong et al. 2017). First, random initialization is replaced with an orthogonal array that samples the search space uniformly to obtain better population diversity. Then, a time-varying transfer function addresses the tendency of BPSO to fall into local optima.
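A time-varying transfer function typically changes the slope of the sigmoid over the iterations: a flat curve early on keeps bit probabilities near 0.5 (more exploration), and a steeper curve later lets the swarm settle. The linear schedule for the control parameter `tau` below is an assumption for illustration; Dong et al. (2017) may use a different form.

```python
import math


def time_varying_sigmoid(v, t, t_max, tau_max=4.0, tau_min=1.0):
    """Probability that a bit is set to 1 at iteration t (0 <= t <= t_max).

    tau shrinks linearly from tau_max to tau_min (assumed schedule), so the
    curve becomes steeper as the search proceeds.
    """
    tau = tau_max - (tau_max - tau_min) * t / t_max
    return 1.0 / (1.0 + math.exp(-v / tau))
```

For the same positive velocity, the probability of setting the bit rises as t approaches t_max, while a zero velocity always gives 0.5.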

In engineering electromagnetics, Jin and Rahmat-Samii applied a hybrid real-binary PSO (Jin and Rahmat-Samii 2010). Each candidate consists of real and binary variables, which are updated by the velocity and position equations of PSO and BPSO, respectively, and single- and multi-objective versions are verified on functional test benches. Choo et al. used GA to design the optimal shape of a corrugated coating under grazing incidence (Choo et al. 2003), with each coating shape coded as a binary chromosome. To improve optimization ability, a two-point crossover scheme containing a geometric filter is proposed, and the results reveal the relationship between absorption performance and the coating’s height and weight.
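A hybrid real-binary particle can be sketched as two vectors that share the PSO velocity formula but use different position rules: the real part moves by x + v, while each binary bit is resampled from a sigmoid of its velocity (the classic BPSO rule). Parameter values, the shared velocity limit and all names are illustrative assumptions, not Jin and Rahmat-Samii’s implementation.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def update_velocity(x, v, pbest, gbest, w, c1, c2, v_max, rng):
    """Standard PSO velocity update with clamping to [-v_max, v_max]."""
    new_v = []
    for xd, vd, p, g in zip(x, v, pbest, gbest):
        vd = w * vd + c1 * rng.random() * (p - xd) + c2 * rng.random() * (g - xd)
        new_v.append(max(-v_max, min(v_max, vd)))
    return new_v


def hybrid_step(real_x, real_v, bin_x, bin_v, pbest, gbest,
                w=0.7, c1=1.5, c2=1.5, v_max=4.0, rng=random):
    """One hybrid real-binary PSO move.

    pbest and gbest are (real_part, binary_part) tuples.
    """
    real_v = update_velocity(real_x, real_v, pbest[0], gbest[0], w, c1, c2, v_max, rng)
    real_x = [xd + vd for xd, vd in zip(real_x, real_v)]              # PSO position rule
    bin_v = update_velocity(bin_x, bin_v, pbest[1], gbest[1], w, c1, c2, v_max, rng)
    bin_x = [1 if rng.random() < sigmoid(vd) else 0 for vd in bin_v]  # BPSO rule
    return real_x, real_v, bin_x, bin_v
```

Keeping a single velocity formula for both parts is what makes the hybrid simple: only the last step, turning velocity into a position, differs between the real and binary variables.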

Frequency selective surfaces (FSSs) are widely used in commercial and military fields. Chakravarty et al. proposed a method for designing broadband microwave absorbers using multiple FSS screens embedded in dielectric material (Chakravarty et al. 2002), and a binary-coded micro genetic algorithm (MGA) optimizes the thickness, relative permittivity and placement parameters.

4.4.2 Chip field

With the growing scale of integrated circuits and shrinking chip sizes, the layout of a chip strongly affects its performance.

Reed-Müller (RM) logic circuits are a common target of binary algorithms. He et al. used a new binary DE algorithm to search for the best-polarity RM logic circuits under performance constraints (He et al. 2017). Zhang et al. introduced a niche GA to improve the efficiency of area optimization of fixed-polarity RM (FPRM) circuits and to find their best polarities (Zhang et al. 2011). Because the number of optimal polarities cannot be determined in advance, the niche radius is adjusted according to real-time information from the algorithm, avoiding the drawback of presetting this parameter.
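Niching is commonly implemented with fitness sharing: each individual’s fitness is divided by a niche count that grows with the number of nearby individuals, here measured by the Hamming distance between polarity bit strings. The sketch below uses a fixed radius `sigma` and a triangular sharing function for clarity; both are assumptions, and the adaptive radius of Zhang et al. (2011) is not reproduced.

```python
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def shared_fitness(pop, raw_fitness, sigma, alpha=1.0):
    """Divide each raw fitness by its niche count (fitness sharing).

    Individuals closer than sigma in Hamming distance share fitness.
    Assumes raw fitness is non-negative and is to be maximized.
    """
    shared = []
    for a, f in zip(pop, raw_fitness):
        niche = 0.0
        for b in pop:
            d = hamming(a, b)
            if d < sigma:
                niche += 1.0 - (d / sigma) ** alpha
        shared.append(f / niche)  # niche >= 1 because hamming(a, a) == 0
    return shared
```

An area-minimization objective such as FPRM area would first be mapped to a non-negative score to maximize, for example 1 / (1 + area); crowded polarities are then penalized so that several distinct optima can survive.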

Mak et al. designed a compact power splitter for a standard foundry silicon photonics platform using BPSO on a 2D mesh (Mak et al. 2016). They demonstrate a design with a 4.8 \(\upmu\)m \(\times\) 4.8 \(\upmu\)m footprint composed of 100 nm \(\times\) 100 nm and 200 nm \(\times\) 200 nm cells. The design has low insertion loss and wide bandwidth, and behaves consistently across the whole wafer. Mirjalili et al. designed the first binary version of BA and applied it to optical buffer design (Mirjalili et al. 2014). A 193-nm immersion system is ill-suited to high-quality lithography because of the costly complexity of mask patterns. Kuo et al. used source optimization to improve lithography resolution, in which a binary ACO algorithm constructs freeform sources to improve contrast and the process window (Kuo et al. 2016).

4.4.3 Engineering structure optimization

Structural design optimizes weight and cost while meeting various specifications or requirements; that is, the optimal scheme is selected from all feasible solutions according to a given criterion. The purpose is to make the designed structure use material more economically and distribute loads more reasonably, and it covers structural size, shape and topology optimization (Zhou et al. 2017; Di Cesare et al. 2016).

Truss layout optimization optimizes a truss structure with respect to shape, size and topological variables (Richardson et al. 2012). Chen et al. proposed an improved two-level approximate GA that minimizes truss weight using shape variables and shape sensitivities (Chen et al. 2017). A real-coded GA handles the shape variables and a binary-coded GA handles the 0/1 topological variables. Luh et al. adopted the concept of genotype-phenotype representation to improve binary PSO and applied it to structural topology optimization to minimize weight and compliance (Luh et al. 2011).

A lanthanum-modified lead zirconate titanate (LLZT) actuator converts light energy into mechanical motion and has potential applications in smart structures. Zheng et al. introduced a novel GA-based algorithm that uses LLZT actuators to control the multimodal vibration of beam structures, with the goal of maximizing the modal force index (Shijie et al. 2014). Zio et al. discussed the optimization of critical infrastructure protection strategies in complex network systems (Zio et al. 2012). Three protection strategies, targeting predetermined areas and the whole system at two scales of protection intervention, minimize the effects of cascading failures, and an improved multi-objective binary DE implements the strategies and reduces the computational burden of such optimization.

4.4.4 Other fields

Binary metaheuristic algorithms not only solve set covering problems but also optimize engineering structures in physics, chemistry and acoustics.

Traditional minimal cut set identification methods are not suitable for finding prime implicants (PIs) in the noncoherent structure functions of dynamic systems. Di Maio et al. proposed a computational method to find PIs and used Eq. (19) as its transfer function (Di Maio et al. 2017). An improved binary DE then finds the PIs that cover all minimal terms.

$$\begin{aligned} T(X_i) = \dfrac{1}{1 + e^{-\frac{2b(X_i + F(X_l - X_m) - 0.5)}{1 + 2F}}} \end{aligned}$$
(19)

where F is a weighting factor and b is a coefficient set to 6; l and m index two randomly chosen individuals.
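Eq. (19) maps the continuous DE mutant \(X_i + F(X_l - X_m)\) of 0/1 bits to a probability. That mutant lies in \([-F, 1+F]\), so subtracting 0.5 and dividing by \(1+2F\) rescales it to \([-0.5, 0.5]\), and the sigmoid argument stays within \([-b, b]\). The sketch applies the equation bit by bit and samples the trial bit from the resulting probability; the default F = 0.5 and the sampling rule are assumptions.

```python
import math
import random


def di_maio_transfer(x_i, x_l, x_m, F=0.5, b=6.0):
    """Probability that the trial bit is 1, following Eq. (19)."""
    z = 2.0 * b * (x_i + F * (x_l - x_m) - 0.5) / (1.0 + 2.0 * F)
    return 1.0 / (1.0 + math.exp(-z))


def binary_de_trial(x_i, x_l, x_m, F=0.5, b=6.0, rng=random):
    """Sample a binary trial vector bit by bit from Eq. (19)."""
    return [1 if rng.random() < di_maio_transfer(a, c, d, F, b) else 0
            for a, c, d in zip(x_i, x_l, x_m)]
```

With b = 6 the probability never drops below about 0.0025 or exceeds about 0.9975, so every bit retains a small chance of changing.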

Huang et al. used GA to search for the global structures of B-Al binary clusters and optimized them with density functional theory (Huang et al. 2016). The results show that as the number of B atoms increases, the cluster structure gradually changes from icosahedral to quasi-planar, and that the structure is determined by the ratio of B to Al atoms. Sieber and Werner introduced a novel method for synthesizing metasurfaces with Bézier surfaces, handled by a real-valued optimization technique, the covariance matrix adaptation evolution strategy (Sieber and Werner 2014). They compare the computational efficiency of optimizing the Bézier quarter-wave plate metasurface with that of optimizing a pixelized surface by binary GA. Through the acoustic Rayleigh-Sommerfeld diffraction integral and BPSO, Zhao et al. achieved free manipulation of acoustic focusing characteristics for the first time (Zhao et al. 2014).

Xia et al. proposed an optimized transposition method for power transformer windings, integrating co-Kriging into binary PSO to minimize circulating current loss (Xia et al. 2015). Co-Kriging obtains most samples from low-accuracy, low-cost data and only a few from high-accuracy, high-cost data, thereby improving fitting accuracy while reducing computational cost.

In brief, we list the optimization algorithms discussed and their applications, as shown in Table 10.

Table 10 The details of the cited references in structure optimization

4.5 Layout problem/location problem

In engineering applications, there are many optimization tasks related to location/placement, such as phasor measurement unit (PMU) allocation, the deployment of nodes and hubs in WSNs, the deployment of cloud computing nodes and virtual machines (VMs) in the Internet of Things (IoT), the deployment of power facility nodes, warehouse location selection and distribution center deployment. First, a mathematical model is constructed in which 0/1 variables indicate whether a node is deployed at a given location. Then, binary metaheuristic algorithms search for the optimal location/deployment that satisfies the application constraints.

4.5.1 PMU allocation

PMUs play an important role in the wide-area monitoring and protection of modern power systems, but their deployment is limited by the high cost of the equipment. Binary metaheuristic algorithms have been successfully applied to optimal PMU placement (Elroby et al. 2019).

Abd Rahman and Zobaa proposed an improved binary PSO that uses a mutation strategy together with V-shaped and S-shaped transfer functions to maximize measurement redundancy under the constraints of zero-injection buses, single PMU loss and PMU channel limits (Abd Rahman and Zobaa 2017). The mutation strategy and the V-shaped transfer function improve population diversity and reduce the chance of falling into local optima. To provide multiple solutions, Maji and Acharjee developed an exponential binary PSO algorithm (Maji and Acharjee 2017). It uses a nonlinear inertia weight to improve search ability and a new equation to update particle positions, as shown in Eq. (20).

$$\begin{aligned} T(x,V_i^d) = \dfrac{V_{max} + V_i^d + x}{2V_{max} + 1} \end{aligned}$$
(20)
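Eq. (20) yields a valid probability only when its inputs are bounded. Assuming that x is the particle’s current bit and that the velocity is clamped to \([-V_{max}, V_{max}]\), the numerator ranges from 0 to \(2V_{max} + 1\), so T lies in [0, 1]. The sketch uses T as the probability of setting the bit to 1; this reading of x and of the sampling rule is an assumption, since the text does not define them.

```python
import random


def eq20_probability(x_bit, v, v_max):
    """T(x, V) from Eq. (20); x_bit is assumed to be the current bit (0 or 1)."""
    v = max(-v_max, min(v_max, v))  # clamp so that T stays in [0, 1]
    return (v_max + v + x_bit) / (2.0 * v_max + 1.0)


def update_bit(x_bit, v, v_max, rng=random):
    """Set the bit to 1 with probability T (assumed sampling rule)."""
    return 1 if rng.random() < eq20_probability(x_bit, v, v_max) else 0
```

Under this reading the current bit acts as a mild inertia term: at zero velocity a 1 stays 1 with probability \((V_{max}+1)/(2V_{max}+1) > 0.5\), while a 0 becomes 1 with probability below 0.5.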

The traditional goal of optimal PMU placement is to find the smallest number of devices that maximizes observability under different constraints. Because advances in relay technology allow digital relays to serve as both relays and PMUs, a large share of the deployment cost comes from substation installation. Therefore, Mishra et al. adopted binary PSO to minimize the number of substations under practical constraints (Mishra et al. 2016a). Based on the measurement reliability of sensitive buses, Maji and Acharjee combined a stage-wise PMU allocation method with binary CSA (Maji and Acharjee 2018). The validity of multiple solutions is analyzed, and sensitive-bus observability is mathematically modeled for generator-connected, weak and transformer-connected buses to determine the best result.
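A common simplified PMU placement model, assumed here, treats a bus as observable if it hosts a PMU or is adjacent to one; zero-injection buses, PMU loss and channel limits are ignored. A binary particle is then scored by its number of PMUs plus a penalty per unobservable bus. The four-bus line network and the penalty value are hypothetical.

```python
def observable_buses(placement, adjacency):
    """placement[i] = 1 if bus i has a PMU; adjacency maps bus -> neighbours."""
    seen = set()
    for bus, has_pmu in enumerate(placement):
        if has_pmu:
            seen.add(bus)
            seen.update(adjacency[bus])
    return seen


def pmu_cost(placement, adjacency, penalty=10):
    """Number of PMUs plus a penalty per unobservable bus (to minimize)."""
    unobserved = len(adjacency) - len(observable_buses(placement, adjacency))
    return sum(placement) + penalty * unobserved


# Hypothetical 4-bus line network: 0 - 1 - 2 - 3
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
```

On this network two PMUs suffice for full observability, whereas a single PMU at bus 1 leaves bus 3 unobserved and is heavily penalized.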

4.5.2 Network field

WSNs integrate micro-electromechanical, sensor, embedded, wireless communication and distributed information processing technologies, and are a research hotspot. Sensor nodes are constrained by their size, weight, power consumption and cost, and they face harsh deployment environments and numerous external interferences; they are therefore prone to failure, energy exhaustion and environmental damage, which affect network stability and reliability (Singh et al. 2021; Swayamsiddha et al. 2019).

To address the hub location problem in competitive environments such as passenger transportation networks, Niknamfar et al. proposed a multi-objective BBO whose model minimizes a company’s total transportation cost and maximizes the total captured flow (Niknamfar et al. 2017). Ma et al. proposed a joint optimization of anchor deployment and power allocation to achieve high-precision positioning in wireless networks (Ma et al. 2020). The goal is to minimize the positioning error for given sensors and a fixed power budget, and the optimization is solved by binary PSO using the squared error as the metric. To maximize the performance of heterogeneous WSNs and save network cost, Yu et al. proposed a clustering routing algorithm based on the wolf pack algorithm (WPA) (Xiu-Wu et al. 2020). First, a mixed-integer model represents the optimal deployment of heterogeneous nodes. Second, WPA is improved with a logistic function and Lévy flight and solves the routing problem. Finally, nodes in the heterogeneous network are dynamically clustered with an improved distributed energy-efficient clustering algorithm to reduce transmission time.

Because of the high peak-to-average power ratio (PAPR), additive white Gaussian noise strongly affects visible light communication (VLC) systems that use white light-emitting diodes. To address this problem in 5G networks, Sindhuja and Shankar employed selective mapping (SLM) to reduce PAPR, with a binary CSA optimizing the subblock partitioning in SLM to improve its convergence rate in the VLC system (Sindhuja and Shankar 2020). Industrial wireless sensor networks (IWSNs) are a new technology that greatly reduces control cost and improves production efficiency. Wang et al. studied the optimal placement of nodes in IWSNs to ensure network reliability and reduce costs (Wang et al. 2011). A node layout model of IWSN is established that considers installation and maintenance costs, reliability and scalability, and an advanced binary PSO with adaptive mutation probability searches for the optimal placement. Moreno et al. combined a ray tracing tool with BPSO to design indoor wireless local area networks (WLANs) (Moreno et al. 2015). The ray tracing tool obtains the power levels of candidate access points (APs) at potential receiver locations, and BPSO optimizes them. Several restrictions are imposed so that a small number of APs is selected and their channel assignments keep transmission power levels low. In practice, coverage areas with different priorities are predefined to improve QoS.

4.5.3 Cloud computing and fog computing

With the development and popularization of cloud and fog computing, resource management, operational efficiency and investment have become key issues. In practice, resource management and task scheduling must consider a variety of heterogeneous resources and complex, changeable application requirements, including the overall energy consumption of the data center, resource utilization, economic benefits and user quality of service. These objectives are usually interrelated and cannot be handled by simple weight assignment. Cloud computing scheduling therefore combines the characteristics of discrete and multi-objective optimization, which makes it well suited to metaheuristic algorithms.

Zhu et al. proposed a model for delay and task allocation in vehicular fog computing that supports vehicle mobility (Zhu et al. 2018). Task allocation between fixed and mobile fog nodes is formulated as a joint optimization under limits on fog capacity, quality loss and service delay, and is solved by binary PSO. Reducing energy consumption is a current research hotspot in fog computing, and consumption depends mainly on how services are allocated to a group of VMs. Mishra et al. used PSO, binary PSO and BA to achieve sustainable service allocation (Mishra et al. 2018).

The IoT comprises large-scale physical infrastructure and software layers on which applications are created intuitively and transparently. This highly distributed, energy-intensive model must ensure the quality of deployed services while accommodating heterogeneous protocols. Djemai et al. studied IoT service placement (Djemai et al. 2019) and proposed a model of the infrastructure and IoT applications in which discrete PSO minimizes energy consumption and application delay. To reduce energy consumption in data centers, Abdessamia et al. argued that VM placement plays an important role in cloud consolidation and energy saving (Abdessamia et al. 2020), and applied binary GSA to this problem in heterogeneous data centers.

4.5.4 Power & location fields

Power optimization plays an important role in power system operation and is directly related to the security and economy of the system. With the development of optimization technology, seeking algorithms with good convergence, strong robustness and intelligent characteristics has become an important research direction in power system optimization, with significant theoretical and engineering value (Li et al. 2013).

Harris hawks optimization (HHO) is an algorithm proposed in 2019, and Beşkirli and Dağ developed eight binary versions of HHO for wind turbine micrositing (Beşkirli and Dağ 2020), whose transfer functions are listed in Table 11.

Table 11 The transfer functions of binary HHO
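The contents of Table 11 are not reproduced here. As an assumption, the sketch below implements four S-shaped and four V-shaped transfer functions that are widely used in the binary metaheuristic literature, together with the two usual position rules: an S-shaped value is the probability that the bit becomes 1, while a V-shaped value is the probability of flipping the current bit. Whether these are exactly the eight functions used by Beşkirli and Dağ (2020) is not confirmed.

```python
import math
import random

# Assumed S-shaped family: probability that the new bit is 1
S_SHAPED = {
    "S1": lambda x: 1 / (1 + math.exp(-2 * x)),
    "S2": lambda x: 1 / (1 + math.exp(-x)),
    "S3": lambda x: 1 / (1 + math.exp(-x / 2)),
    "S4": lambda x: 1 / (1 + math.exp(-x / 3)),
}
# Assumed V-shaped family: probability of flipping the current bit
V_SHAPED = {
    "V1": lambda x: abs(math.erf(math.sqrt(math.pi) / 2 * x)),
    "V2": lambda x: abs(math.tanh(x)),
    "V3": lambda x: abs(x / math.sqrt(1 + x * x)),
    "V4": lambda x: abs(2 / math.pi * math.atan(math.pi / 2 * x)),
}


def binarize(x_cont, bits, name, rng=random):
    """Map continuous values to bits with the named transfer function."""
    new_bits = []
    for value, bit in zip(x_cont, bits):
        if name in S_SHAPED:
            new_bits.append(1 if rng.random() < S_SHAPED[name](value) else 0)
        else:
            flip = rng.random() < V_SHAPED[name](value)
            new_bits.append(1 - bit if flip else bit)
    return new_bits
```

The two rules behave differently near zero: an S-shaped function gives a coin flip (0.5), whereas a V-shaped function gives zero flip probability, so a particle with small steps keeps its current bits.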

Wave energy is a renewable energy source with rapid development and broad prospects. Neshat et al. maximized the total power output of a wave farm composed of three fully submerged wave energy converters (WECs) (Neshat et al. 2020). Because the hydrodynamic interactions among WECs entail a large computational cost and a high-dimensional search space, they adopt a hybrid multi-strategy evolutionary framework that combines swarm initialization, a binary evolutionary algorithm, discrete local search and continuous global optimization. Hassan et al. used binary PSO and the shuffled frog leaping algorithm (SFLA) to determine the optimal location and size of distributed generators (Hassan et al. 2020a). The algorithm significantly reduces power loss and improves voltage stability. Ledezma and Alcaraz proposed a hybrid algorithm based on PSO and quadratic programming (QP) to solve multi-stage transmission expansion planning (Ledezma and Alcaraz 2020). The model considers transmission losses and security criteria; PSO handles investment decisions and QP computes operating costs. Jalali et al. (2016) studied the optimal planning of radial distribution systems, including conductor sizes, the location of distributed generation, and the locations and sizes of shunt capacitors, and proposed a binary PSO method for distribution network planning. The algorithm handles both continuous and binary variables and minimizes the system cost considering the inflation rate, power and energy.

For battery swapping station location, Zhang et al. proposed a location-routing algorithm under stochastic demand that combines PSO and variable neighborhood search to minimize the number of battery swapping stations and the route length between users and stations (Zhang et al. 2019b). Mouhrim et al. studied the location of wireless charging infrastructure in a transportation network composed of multiple routes (Mouhrim et al. 2019). A nonlinear integer programming model is formulated and solved by multi-objective binary PSO. The algorithm maintains vehicle route quality while balancing battery costs against the cost of installing power transmitters. Phuangpornpitak and Tia proposed a new BPSO for optimizing the placement of photovoltaic (PV) units in radial power distribution systems (Phuangpornpitak and Tia 2016). The algorithm converges quickly and explores the solution space in different directions, avoiding local traps during optimization. Sahoo et al. developed a multi-objective planning algorithm for electrical distribution systems based on PSO that optimizes the numbers and locations of feeders, sectionalizing switches and tie lines (Sahoo et al. 2012). The objectives are minimizing total installation and operating costs and maximizing network reliability, and they are handled by SPEA2.

Mohammadi et al. proposed a multi-objective PSO algorithm for earthquake response planning that covers pre- and post-disaster decisions (Mohammadi et al. 2016). The three objectives are maximizing total expected demand coverage, minimizing total expected cost and minimizing the satisfaction difference among nodes. A genotype-phenotype PSO algorithm handles binary and continuous variables, and a new adaptive inertia weight and two mutation operators maintain diversity and balance exploration and exploitation.

Location is a very important optimization topic in modern logistics; with the rapid development of the mobile Internet and e-commerce, there is strong demand for optimizing distribution center locations. Distribution center location is the process of choosing a specified number of sites in an economic area containing demand points, and it is a complex decision-making problem. A logistics network includes warehouses, suppliers and receivers. Jacyna-Gołda and Izdebski used GA to solve the multi-criteria optimization of warehouse location in a logistics network under the constraints of suppliers’ production capacity and receivers’ storage capacity (Jacyna-Gołda and Izdebski 2017). Supply chain network design (SCND) is a complex constrained problem that plays a significant role in enterprise management. Zhang et al. extended it to large-scale SCND under uncertainty, for which traditional methods struggle to obtain feasible solutions in limited time (Zhang et al. 2019b). A cooperative co-evolutionary bare-bones PSO with function-independent decomposition is proposed, and two repair strategies fix the infeasible solutions that often arise in the model. Mogale et al. discussed the three-stage food grain distribution of India’s Public Distribution System (PDS), which involves procurement centers, farmers, base silos and field silos (Mogale et al. 2018). A hybrid of PSO and chemical reaction optimization (CRO) plans grain transportation and storage with the goals of minimizing transportation, inventory holding and operating costs, subject to seasonal procurement, demand satisfaction, silo capacity and vehicle capacity constraints.

4.5.5 Planning field

Binary metaheuristic algorithms solve layout problems in engineering planning, such as decoupling capacitor placement (Lee et al. 2021), gate matrix layout (de Oliveira and Lorena 2002), web service location-allocation (Tan et al. 2017), water distribution system design (Zheng et al. 2014), oil well drilling (Guria et al. 2014), 3D rock slope excavation (Wang et al. 2018) and field hospital location (Ali et al. 2021).

Lee et al. developed a new model to reduce conducted emissions in an automatic emergency braking system, in which X/Y decoupling capacitors suppress switching noise and binary PSO determines the most effective capacitor positions so as to minimize the number of X/Y capacitors in the system (Lee et al. 2021). Compared with a traditional GA, the constructive genetic algorithm (CGA) has features such as a dynamic population and the use of other heuristics during optimization. de Oliveira and Lorena described the application of CGA to the gate matrix layout problem (GMLP), which commonly arises in large-scale integrated circuit design (de Oliveira and Lorena 2002). The problem is converted into finding the best ordering of a set of circuit nodes, and the layout area is reduced by minimizing the number of tracks required to cover the gate interconnections.

Network latency is one of the main issues for web services; therefore, service placement should consider the physical locations of both services and users. Tan et al. proposed an improved binary PSO algorithm that uses an adaptive inertia weight to allocate web service locations (Tan et al. 2017). Figure 6 presents its binary encoding, where \(A_s^j\) indicates whether service s is allocated at location j.

Fig. 6 Encoding scheme example
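The encoding \(A_s^j\) can be sketched as a services-by-locations 0/1 matrix flattened into a single bit string, which is what a binary PSO particle would carry. The row-major layout and the helper names below are illustrative assumptions, not Tan et al.’s implementation.

```python
def decode_allocation(bits, n_services, n_locations):
    """Reshape a flat bit string into A[s][j] (1 = service s placed at location j)."""
    if len(bits) != n_services * n_locations:
        raise ValueError("bit string length must equal n_services * n_locations")
    return [bits[s * n_locations:(s + 1) * n_locations] for s in range(n_services)]


def locations_of(A):
    """Map each service index to the list of locations where it is deployed."""
    return {s: [j for j, a in enumerate(row) if a] for s, row in enumerate(A)}
```

A fitness function would then use `locations_of` to compute, for example, each user’s latency to the nearest copy of a service, and penalize services that end up with no location.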

Zheng et al. adopted a coupled binary linear programming (BLP) and DE algorithm to optimize water distribution system (WDS) design (Zheng et al. 2014). First, a graph algorithm decomposes the WDS into several trees and a core, and BLP optimizes the trees. Then, DE optimizes the core given the optimal trees to generate a near-optimal solution for the whole WDS. The algorithm exploits the strengths of both methods: BLP efficiently provides globally optimal solutions for the trees, and DE yields high-quality solutions for the core. Guria et al. used a binary-coded elitist GA to optimize oil well drilling, with the main objectives of maximizing drilling depth and minimizing drilling time and cost (Guria et al. 2014). For wedge failure during rock slope excavation, Wang et al. proposed a binary PSO to optimize the shear strength of 3D rock (Wang et al. 2018). Monte Carlo simulation calculates the failure probabilities of rock slopes under different excavation surfaces, and the algorithm’s reliability is validated on a project in Chongqing.

Hassan et al. applied a suitable location model to field hospitals during the COVID-19 pandemic (Ali et al. 2021). The chosen model is the most appropriate of three common location models used in healthcare (the set covering model, the maximal covering model and the P-median model), and the DBGSK algorithm is used to find the optimal locations of the hospital facilities.
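Of the three models named above, the P-median model is the least obvious to encode in binary form. A sketch of its fitness for a 0/1 site vector is given below: open exactly p sites and minimize the demand-weighted distance from each demand point to its nearest open site. The distances and demands are made up, and the sketch only evaluates candidates; it does not reproduce the DBGSK search.

```python
def p_median_cost(open_sites, dist, demand, p):
    """Total demand-weighted distance to the nearest open site.

    dist[i][j] is the distance from demand point i to candidate site j.
    Returns infinity when the number of open sites differs from p.
    """
    chosen = [j for j, is_open in enumerate(open_sites) if is_open]
    if len(chosen) != p:
        return float("inf")
    return sum(w * min(row[j] for j in chosen) for w, row in zip(demand, dist))


# Hypothetical data: 3 demand points that are also the 3 candidate sites
dist = [[0, 4, 6], [4, 0, 3], [6, 3, 0]]
demand = [1, 2, 1]
```

For p = 1 the three candidates cost 14, 7 and 12, so the middle site is best; a binary metaheuristic would explore the same space without enumerating it.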

In sum, we list the optimization algorithms discussed and their applications, as shown in Table 12.

Table 12 The details of the cited references in layout optimization

4.6 Parameter optimization problem

In an NN, knowledge is represented by the connection weights among neurons, and NNs feature parallel processing, self-learning and the ability to approximate any nonlinear function to arbitrary precision.

However, NN applications still face several problems, such as local minima during training, slow convergence, complex network structure design and weak generalization, and metaheuristic algorithms optimize NN parameters to overcome them. Jimenez and Jurado-Pina discussed a GA-based parameter calibration method for random discontinuity networks (Jimenez and Jurado-Pina 2012). They verify the method on examples generated from Poisson discontinuity networks whose parameters are known in advance, and the inference ability of the model is evaluated with back-calculated parameters and various objective functions under different crossover and mutation probabilities. Input selection is a key step in successful data-driven prediction. Taormina and Chau proposed a new input variable selection method (Taormina and Chau 2015), in which the selected input set and the extreme learning machine (ELM) specifications are encoded as binary particles evolved by single- and multi-objective optimization. Charging and discharging control strategies are of practical significance for electric vehicles participating in peak shaving. Considering household microgrid prices and electric vehicles, Zhang et al. established a two-stage optimization model and used an improved binary PSO to optimize its parameters (Zhang et al. 2018). Vuolio et al. built a multivariable parameterized prediction model for molten iron desulfurization (Vuolio et al. 2019). The model is identified by leave-one-out cross-validation (LOOCV) and a binary-coded GA, and is characterized by automatic identification, reliability and economy.
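An input-selection particle of this kind can be sketched as a bit string whose first bits select candidate inputs and whose remaining bits encode a model setting in binary, here the number of ELM hidden neurons. The split, the 4-bit width, the offset and the input names are hypothetical choices for illustration, not Taormina and Chau’s exact encoding.

```python
def decode_particle(bits, input_names, n_hidden_bits=4, min_hidden=1):
    """Split a binary particle into (selected inputs, hidden-neuron count).

    The first len(input_names) bits are an input mask; the last n_hidden_bits
    bits are an unsigned integer added to min_hidden (assumed layout).
    """
    n_inputs = len(input_names)
    if len(bits) != n_inputs + n_hidden_bits:
        raise ValueError("particle length does not match the encoding")
    selected = [name for name, b in zip(input_names, bits[:n_inputs]) if b]
    hidden = min_hidden + int("".join(map(str, bits[n_inputs:])), 2)
    return selected, hidden
```

Each decoded particle would then be scored by training a model on the selected inputs with the decoded size and measuring its validation error.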

The hyperparameters of an NN, such as the numbers of neurons and hidden layers and the convolution and pooling settings, must be defined in advance, and they determine the performance of the network. Users typically resort to trial and error to obtain satisfactory results; recently, scholars have begun to use metaheuristic algorithms to optimize NNs so that the model evolves automatically (Liu et al. 2020; Taormina et al. 2015; Yu et al. 2018). Joshi et al. adopted disruption-based SSA and CSO to optimize the structure of a convolutional neural network (CNN) and used it to identify and classify peripheral blood cell images (Joshi et al. 2020). The method addresses the difficulty of setting CNN hyperparameters and also helps the model process small datasets. A binary coding technique transforms hyperparameter tuning into an optimization problem, and the algorithm expands the search space while achieving high classification accuracy. NNs have achieved success in image classification, face recognition, audio recognition and pattern detection, but their design is costly and requires expert knowledge in the related fields. Ahmad et al. proposed a swarm intelligence algorithm that searches for new architectures without manual intervention (Ahmad et al. 2020). First, CSA is combined with a binary network representation, and a Hamming distance-based similarity replaces the original distance metric to make it compatible with the neural structure. Second, the number of CSA parameters to be tuned is reduced through a dynamic flight length distribution. Third, random selection is replaced with tournament selection. The resulting automated system performs classification on various datasets. Zhang et al. proposed a two-stage approach to optimize the structure and weights of an NN (Zhang et al. 2019a). First, a collaborative binary-real DE optimizes the network structure and connection weights. Second, the parameters are fine-tuned by the Levenberg-Marquardt (LM) backpropagation algorithm. The method quickly generates compact NNs with strong generalization ability at low computational cost (Fig. 7).

Fig. 7 The mixed-coding individual representing the ANN
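A collaborative binary-real encoding like the one in Fig. 7 can be sketched as a single individual that holds two gene vectors: binary structure genes that switch connections on or off, and real-valued weight genes. The sketch below only shows how such an individual could be decoded into a working network; the class name `MixedIndividual`, the single hidden layer, the tanh activation and the absence of biases are all assumptions, and the DE operators and LM fine-tuning used by Zhang et al. are not reproduced.

```python
import numpy as np

class MixedIndividual:
    """Hypothetical mixed-coded individual for a one-hidden-layer network (no biases)."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        self.shapes = [(n_in, n_hidden), (n_hidden, n_out)]
        n_genes = sum(a * b for a, b in self.shapes)
        self.bits = rng.random(n_genes) < 0.5          # structure genes: True = connection present
        self.weights = rng.normal(0.0, 1.0, n_genes)    # real-valued weight genes

    def decode(self):
        """Per-layer weight matrices with switched-off connections set to zero."""
        effective = self.bits * self.weights
        mats, start = [], 0
        for a, b in self.shapes:
            mats.append(effective[start:start + a * b].reshape(a, b))
            start += a * b
        return mats

    def forward(self, X):
        W1, W2 = self.decode()
        return np.tanh(X @ W1) @ W2

    def n_connections(self):
        """Number of active connections, a natural compactness objective."""
        return int(self.bits.sum())

rng = np.random.default_rng(0)
ind = MixedIndividual(n_in=3, n_hidden=4, n_out=2, rng=rng)
out = ind.forward(rng.normal(size=(5, 3)))
print(out.shape, ind.n_connections(), "of", ind.bits.size, "connections active")
```

In a binary-real DE, the binary part would be evolved by a binary mutation/crossover scheme and the real part by standard DE, while both are evaluated together through `forward`.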

CNNs have proven effective in complex image classification, but designing their architectures is challenging. Li et al. used a binary quantum-behaved PSO (BQPSO) to evolve CNN architectures and overcome the disadvantages of traditional PSO (Li et al. 2019). As shown in Fig. 8, each particle is a fixed-length binary string that represents the CNN configuration, including the layer ID and its parameters, as listed in Table 13. Because all bits share the same mutation probability, the ID can be placed in the first m bytes without loss of generality, and all parameters of the layer are placed in the last n bytes, so the length of each binary string is m+n.

Fig. 8 A fixed-length binary string of a BQPSO individual

Table 13 The parameters of different types of CNN layers
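The fixed-length layout described for BQPSO, an ID field followed by a parameter field for a total length of m + n, can be sketched as an encoder/decoder pair. This sketch uses the bit as its unit, and the field widths (m = 2, n = 8 split into two 4-bit parameters) and the layer-type table are hypothetical stand-ins for the actual contents of Table 13.

```python
M_BITS = 2          # hypothetical width of the layer-type ID field
N_BITS = 8          # hypothetical width of the parameter field (two 4-bit sub-fields)
LAYER_TYPES = {0: "conv", 1: "pool", 2: "fc", 3: "disabled"}  # hypothetical IDs

def to_bits(value, width):
    if not 0 <= value < 2 ** width:
        raise ValueError(f"{value} does not fit in {width} bits")
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]

def from_bits(bits):
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value

def encode_layer(type_id, p1, p2):
    """ID in the first M_BITS bits, parameters in the last N_BITS bits."""
    half = N_BITS // 2
    return to_bits(type_id, M_BITS) + to_bits(p1, half) + to_bits(p2, half)

def decode_layer(bits):
    if len(bits) != M_BITS + N_BITS:
        raise ValueError("unexpected segment length")
    half = N_BITS // 2
    type_id = from_bits(bits[:M_BITS])
    p1 = from_bits(bits[M_BITS:M_BITS + half])
    p2 = from_bits(bits[M_BITS + half:])
    return LAYER_TYPES[type_id], p1, p2

def encode_particle(layers):
    """Concatenate fixed-length layer segments into one particle (a flat bit list)."""
    return [bit for layer in layers for bit in encode_layer(*layer)]

def decode_particle(bits):
    seg = M_BITS + N_BITS
    return [decode_layer(bits[i:i + seg]) for i in range(0, len(bits), seg)]

particle = encode_particle([(0, 3, 15), (1, 2, 0), (2, 10, 0)])
print(len(particle), decode_particle(particle))
```

Because every segment has the same length, any bit string of the right total length decodes to some architecture, which is what lets a binary PSO move freely in this space.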

Sadiq et al. introduced an efficient hybrid of binary PSO and BBO to reduce the high computational cost of soft set reduction (Sadiq et al. 2020). The algorithm yields precise decisions for both optimal and suboptimal choices.

In short, we list the optimization algorithms discussed and their applications, as shown in Table 14.

Table 14 The details of the cited references in parameter optimization

4.7 Other problems

Aiming at the maximum independent set, maximum clique, set cover, set partitioning and set packing problems, Guturu and Dantu proposed a unified EA (Guturu and Dantu 2008). They map these problems to the maximum clique problem (MCP) and then solve it with an evolutionary strategy, improving the ability of EAs to find the largest clique. The set covering problem (SCP) seeks a minimum-cost subset of decision variables (columns) that covers all constraints (rows), and Crawford et al. proposed an improved binary monkey search algorithm (MSA) to handle it (Crawford et al. 2020). The algorithm uses a new climbing process to improve exploration and a novel cooperative evolution to reduce the number of infeasible solutions. Jaszkiewicz compared the computational efficiency of three state-of-the-art multi-objective metaheuristic algorithms on the SCP (Jaszkiewicz 2003), comparing the computational effort needed to reach the same solution quality, measured by the average of scalarizing functions over representative samples. Kılıç and Yüzgeç proposed a tournament selection-based antlion optimization (ALO) algorithm for the quadratic assignment problem (Kılıç and Yüzgeç 2019). In the random walk of ALO, tournament selection replaces roulette wheel selection, and several equations of ALO are modified.
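Binary SCP solvers usually pair the objective (total cost of selected columns) with a repair step that turns infeasible bit vectors into covers. The sketch below shows one common greedy repair: add, for each uncovered row, the column with the lowest cost per newly covered row, then drop redundant columns, most expensive first. It is a generic heuristic, not the cooperative evolution of Crawford et al., and it assumes every row is covered by at least one column.

```python
import numpy as np

def scp_cost(x, cost):
    """SCP objective: total cost of the selected columns (x is a boolean vector)."""
    return float(cost[x].sum())

def repair_scp(x, A, cost):
    """Greedy repair of a binary SCP solution.

    A is the rows x columns 0/1 incidence matrix and cost holds positive column costs.
    """
    x = np.asarray(x).astype(bool)   # astype copies, so the caller's vector is untouched
    covered = A[:, x].sum(axis=1)    # how many selected columns cover each row
    for r in np.flatnonzero(covered == 0):
        if covered[r] > 0:
            continue                 # covered by a column added for an earlier row
        candidates = np.flatnonzero(A[r])
        # number of currently uncovered rows each candidate would cover (always >= 1)
        gains = (A[:, candidates] * (covered == 0)[:, None]).sum(axis=0)
        j = candidates[np.argmin(cost[candidates] / gains)]
        x[j] = True
        covered += A[:, j]
    for j in sorted(np.flatnonzero(x), key=lambda c: -cost[c]):
        if np.all(covered[A[:, j] == 1] >= 2):   # all rows of j are also covered elsewhere
            x[j] = False
            covered -= A[:, j]
    return x

A = np.array([[1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 1]])
cost = np.array([1.0, 2.0, 2.0, 1.0])
x = repair_scp([0, 0, 0, 0], A, cost)
print(x.astype(int), scp_cost(x, cost))  # [1 0 0 1] 2.0
```

Inside a binary metaheuristic, each new position would be passed through `repair_scp` before `scp_cost` is evaluated, so the search only ever compares feasible covers.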

The minimum labeling spanning tree (MLST) problem is NP-hard and is commonly applied in communication networks and data compression. To solve it, Lin et al. introduced a binary FA that repairs infeasible solutions and eliminates redundant labels (Lin et al. 2020), making the algorithm more suitable for discrete optimization. Vehicular ad hoc networks (VANETs) need robust paths connecting all nodes to achieve reliable and efficient information transmission, but classic graph theory yields only a single minimum spanning tree (MST). Zhang and Zhang proposed a binary-coded ABC algorithm to construct spanning trees and applied it to roadside-to-vehicle communication (Zhang and Zhang 2017). Da Silva et al. proposed an improved maximum vertex cover algorithm that meets the strict time complexity constraints of a mixed integer linear programming formulation, and a multi-start local branching approach handles the problem by combining the proposed algorithm with local search (Da Silva et al. 2019).
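The two operations attributed to the binary FA, repairing infeasible solutions and eliminating redundant labels, can be made concrete for a bit vector over labels: a solution is feasible when the edges carrying selected labels connect every node. The sketch below uses union-find to count components, adds labels greedily until the subgraph is connected, and then removes labels that are not needed. The greedy rules are assumptions for illustration, not Lin et al.'s exact operators, and the graph using all labels is assumed to be connected.

```python
def count_components(n_nodes, edges, labels_on):
    """Connected components of the subgraph that keeps only edges whose label is switched on."""
    parent = list(range(n_nodes))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    components = n_nodes
    for u, v, label in edges:
        if labels_on[label]:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                components -= 1
    return components

def repair_mlst(x, n_nodes, edges, n_labels):
    """Repair a binary label vector so its edges connect the graph, then drop redundant labels."""
    x = list(x)
    while count_components(n_nodes, edges, x) > 1:
        # add the unused label that leaves the fewest components
        best = min((lab for lab in range(n_labels) if not x[lab]),
                   key=lambda lab: count_components(n_nodes, edges, x[:lab] + [1] + x[lab + 1:]))
        x[best] = 1
    for lab in range(n_labels):
        if x[lab]:
            x[lab] = 0
            if count_components(n_nodes, edges, x) > 1:
                x[lab] = 1                  # label is needed for connectivity, restore it
    return x

# Edges are (u, v, label) on a 4-node graph with labels 0, 1 and 2
edges = [(0, 1, 0), (1, 2, 0), (2, 3, 1), (0, 3, 2), (1, 3, 2)]
print(repair_mlst([0, 0, 0], 4, edges, 3))  # [1, 1, 0]
print(repair_mlst([1, 1, 1], 4, edges, 3))  # [0, 1, 1]
```

The fitness of a repaired vector is simply its number of ones, and any spanning tree of the selected subgraph is a valid MLST candidate.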

Comments in various fields are increasingly published on the Internet, and analyzing this content helps stakeholders make decisions. Text summarization technology generates concise summaries of the expressed sentiments that are useful for analysis. Priya and Umamaheswari proposed two PSO-based multi-objective optimization techniques (Priya and Umamaheswari 2019). Mojrian and Mirroshandel proposed a new multi-document text summarization method that extracts salient sentences from the source documents to generate summaries (Mojrian and Mirroshandel 2021). Their general summarizer formulates text summarization as a binary optimization problem, and solutions are obtained by QIGA. With the rapid growth of the Internet and e-government, automatic multi-document summarization has attracted increasing attention. Alguliev et al. modeled multi-document summarization as a binary optimization problem whose objective function is a weighted combination of content coverage and redundancy (Alguliev et al. 2012). Debnath et al. used an improved multi-objective CSO approach for automatic text summarization (Debnath et al. 2021). In their work, the population represents feasible solutions, and the constraint is the summary length. The objective functions are “coverage rate and informativeness” and “anti-redundancy”, and the final solution is selected from the nondominated solutions according to the ROUGE score.
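The binary formulation shared by these summarizers (one bit per sentence, a coverage term, a redundancy term and a length constraint) can be sketched as a fitness function. The specific choices below are assumptions, not any cited author's formula: term-frequency vectors, cosine similarity to the document centroid as coverage, pairwise cosine similarity as redundancy, a weight `lam`, and a word-count limit.

```python
import numpy as np

def tf_matrix(sentences):
    """Term-frequency vectors (rows = sentences) over a whitespace-tokenized vocabulary."""
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    index = {w: k for k, w in enumerate(vocab)}
    M = np.zeros((len(sentences), len(vocab)))
    for r, s in enumerate(sentences):
        for w in s.lower().split():
            M[r, index[w]] += 1
    return M

def cosine(a, b):
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return float(a @ b / (na * nb)) if na > 0 and nb > 0 else 0.0

def summary_score(x, M, lengths, max_len, lam=0.5):
    """Hypothetical fitness of a binary sentence-selection vector x (higher is better).

    coverage   = similarity of each selected sentence to the document centroid
    redundancy = pairwise similarity among the selected sentences
    Empty selections and selections longer than max_len words are infeasible.
    """
    sel = np.flatnonzero(x)
    if sel.size == 0 or lengths[sel].sum() > max_len:
        return -np.inf
    centroid = M.mean(axis=0)
    coverage = sum(cosine(M[i], centroid) for i in sel)
    redundancy = sum(cosine(M[i], M[j]) for k, i in enumerate(sel) for j in sel[k + 1:])
    return lam * coverage - (1 - lam) * redundancy

sentences = ["the cat sat", "the cat sat", "dogs bark loudly"]
M = tf_matrix(sentences)
lengths = np.array([len(s.split()) for s in sentences])
for x in ([1, 1, 0], [1, 0, 1], [1, 1, 1]):
    print(x, round(summary_score(np.array(x), M, lengths, max_len=6), 3))
```

The duplicated sentence pair scores lower than the diverse pair because the redundancy term penalizes it, and the three-sentence selection is rejected by the length constraint.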

Software testing is key to ensuring high-quality products, but test suites that fully meet software testing requirements tend to be large. They also usually contain redundancy, i.e., two or more test cases (TCs) cover the same requirement or code segment. De Souza et al. modeled TC selection as a constrained search-based optimization task whose objective is to maximize requirement coverage under an execution effort constraint (De Souza et al. 2013). A modified binary PSO solves the problem, and two hybrid algorithms are integrated into PSO to improve its local search ability. Resources and time are commonly neglected during software development, but they become the main constraints in software testing. Since TC selection is critical for reducing time cost, Geetha and Jeya Mala proposed a new multi-objective BA (Geetha and Jeya Mala 2021). It has two objectives, code coverage and object orientation, and converges quickly.
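A basic binary PSO for the TC selection model above (maximize requirement coverage subject to an execution effort budget) can be sketched as follows. It is not De Souza et al.'s modified PSO and omits their hybrid local search. The assumptions are: a penalty equal to the budget excess for infeasible selections, an inertia weight (a common variant of the original binary PSO), velocity clamping at `v_max`, and a made-up coverage matrix.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def fitness(x, cover, effort, budget):
    """Requirements covered by the selected TCs; over-budget selections get a negative score."""
    total_effort = effort @ x
    if total_effort > budget:
        return -float(total_effort - budget)
    return float(cover[x.astype(bool)].any(axis=0).sum())

def binary_pso(cover, effort, budget, n_particles=20, iters=50,
               w=0.8, c1=2.0, c2=2.0, v_max=4.0, seed=0):
    """Binary PSO with a sigmoid transfer function (maximization)."""
    rng = np.random.default_rng(seed)
    d = len(effort)
    X = (rng.random((n_particles, d)) < 0.5).astype(int)
    V = rng.uniform(-1.0, 1.0, (n_particles, d))
    f = np.array([fitness(x, cover, effort, budget) for x in X])
    pbest, pbest_f = X.copy(), f.copy()
    g = np.argmax(pbest_f)
    gbest, gbest_f = pbest[g].copy(), pbest_f[g]
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, d))
        V = w * V + c1 * r1 * (pbest - X) + c2 * r2 * (gbest - X)
        V = np.clip(V, -v_max, v_max)
        # each bit becomes 1 with probability sigmoid(velocity)
        X = (rng.random((n_particles, d)) < sigmoid(V)).astype(int)
        f = np.array([fitness(x, cover, effort, budget) for x in X])
        improved = f > pbest_f
        pbest[improved], pbest_f[improved] = X[improved], f[improved]
        g = np.argmax(pbest_f)
        if pbest_f[g] > gbest_f:
            gbest, gbest_f = pbest[g].copy(), pbest_f[g]
    return gbest, gbest_f

# cover[i, r] = 1 if test case i covers requirement r (illustrative data)
cover = np.array([[1, 1, 0, 0, 0], [0, 0, 1, 0, 0], [0, 0, 0, 1, 1],
                  [1, 1, 1, 1, 0], [0, 0, 0, 0, 1], [0, 1, 1, 0, 0]])
effort = np.array([2, 1, 2, 5, 1, 2])
best, score = binary_pso(cover, effort, budget=5)
print("selected TCs:", np.flatnonzero(best), "covered requirements:", score)
```

Clamping the velocity keeps the sigmoid away from 0 and 1, so every bit retains some probability of changing and the swarm does not freeze prematurely.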

Edge detection is widely used in computer vision applications, and it is a challenging and time-consuming task. In (Dagar and Dahiya 2020), Dagar and Dahiya suggested an edge detection technique based on multi-objective binary PSO. The method is tested on 500 BSD images and compared with edge detection techniques such as Prewitt, Canny, ACO and GA. Pellot et al. introduced a method for the 3D reconstruction of the vessel lumen from two angiographic images (Pellot et al. 1994). The cross-section of each vessel is represented by a binary string, and SA optimizes it. The method succeeds on both single and branching vessels, although reconstruction is inherently ambiguous when vessels are viewed at oblique angles. For single vessels and bifurcations, the authors present results for the independent reconstruction of 2D slices and for the 3D reconstruction of spatially continuous 2D slices. Irrelevant tasks and redundant channels in brain-computer interfaces (BCIs) may lead to low classification accuracy, high computational complexity and inconvenient application. Shi et al. proposed a new binary harmony search (HS) method to select the best channels and improve BCI performance (Shi et al. 2021). EEG signals are collected by sensors placed at different positions on the head, and they have great potential in biometric systems owing to their natural robustness and uniqueness. Rodrigues et al. reduced the number of sensors while maintaining comparable performance (Rodrigues et al. 2016). They evaluated a binary flower pollination algorithm (FPA) with different transfer functions, together with the optimum-path forest classifier, to maximize the accuracy obtained with the selected channels.

From household appliances to satellites, embedded systems have become an indispensable part of daily life. A JPEG encoder, a typical embedded system, is used to obtain high-quality output from digital images. Nath and Datta proposed a multi-objective binary-coded GA that optimizes the software and hardware components of an embedded design, using a JPEG encoder as the application (Nath and Datta 2014). High-utility itemset mining (HUIM) reveals profitable products by considering both quantity and profit. Lin et al. proposed an efficient PSO-based HUIM algorithm (Lin et al. 2017). The algorithm relies on the transaction-weighted utility (TWU) model to greatly reduce the combinatorial search space during optimization, and an OR/NOR-tree structure is introduced to avoid generating invalid combinations of high-utility itemsets (HUIs).
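The TWU model mentioned above is what keeps the binary encoding small: an item's TWU is the total utility of all transactions that contain it, which bounds the utility of every itemset containing that item. Items with TWU below the minimum utility can therefore be pruned before each particle is encoded as one bit per remaining item. The sketch below computes TWU and decodes a particle; the tiny database, the profits and the function names are illustrative assumptions, and the OR/NOR-tree of Lin et al. is not reproduced.

```python
def transaction_utilities(db, profit):
    """Utility of each transaction: sum of quantity x unit profit over its items."""
    return [sum(q * profit[i] for i, q in t.items()) for t in db]

def twu(db, profit):
    """Transaction-weighted utility of each item."""
    result = {}
    for t, tu in zip(db, transaction_utilities(db, profit)):
        for item in t:
            result[item] = result.get(item, 0) + tu
    return result

def promising_items(db, profit, min_util):
    """Items whose TWU reaches min_util; all other items can be pruned."""
    return sorted(i for i, w in twu(db, profit).items() if w >= min_util)

def decode(bits, items):
    """Map a binary particle (one bit per promising item) to an itemset."""
    return [i for b, i in zip(bits, items) if b]

def itemset_utility(itemset, db, profit):
    """Exact utility, summed over the transactions that contain every item of the itemset."""
    return sum(sum(t[i] * profit[i] for i in itemset)
               for t in db if all(i in t for i in itemset))

db = [{"a": 1, "b": 2}, {"a": 2, "c": 1}, {"b": 1, "c": 3}]   # item -> purchased quantity
profit = {"a": 5, "b": 2, "c": 1}                              # unit profit per item
items = promising_items(db, profit, min_util=15)
itemset = decode([1, 1], items)
print(items, itemset, itemset_utility(itemset, db, profit))  # ['a', 'c'] ['a', 'c'] 11
```

Here item `b` (TWU 14) is pruned, so particles only need two bits instead of three.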

Lot sizing determines the optimal ordering plan that minimizes the holding and ordering costs of a product portfolio, and it becomes NP-hard when multiple items at multiple levels with capacity constraints are considered. Mishra et al. developed an improved binary PSO to solve large capacitated multi-level multi-item lot sizing problems (Mishra et al. 2016b). Binary PSO handles the problem in reasonable time, and an improved local search mechanism refines the solutions it obtains.
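The binary view of lot sizing can be illustrated with its simplest case: one bit per period says whether an order (setup) is placed, and each order covers demand until the next setup. The sketch below evaluates such a setup vector. It deliberately simplifies the capacitated multi-level multi-item problem of Mishra et al. to a single item, a single level and no capacity, and it assumes no initial stock (so period 0 is repaired to a setup) and holding cost charged on end-of-period stock.

```python
def lot_sizing_cost(setup, demand, setup_cost, hold_cost):
    """Total cost of a binary setup plan for a single-item, uncapacitated, single-level problem.

    setup[t] = 1 means an order is placed in period t that covers demand up to the next setup.
    """
    setup = list(setup)
    setup[0] = 1                          # repair: no initial stock, so period 0 needs an order
    T = len(demand)
    total, stock = 0.0, 0
    for t in range(T):
        if setup[t]:
            nxt = next((k for k in range(t + 1, T) if setup[k]), T)
            stock += sum(demand[t:nxt])   # lot size = demand until the next setup
            total += setup_cost
        stock -= demand[t]
        total += hold_cost * stock        # holding cost on end-of-period stock
    return total

demand = [10, 20, 30]
for plan in ([1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1]):
    print(plan, lot_sizing_cost(plan, demand, setup_cost=50, hold_cost=1))
```

In a binary PSO, this function would serve as the fitness of each particle; capacity limits and multi-level product structures would add constraints that the real problem must check.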

Hassan et al. proposed the stochastic travelling advisor problem (STAP) in network optimization (Hassan et al. 2020b). It is defined for an advisory group that chooses a subset of candidate workplaces forming the most profitable route within the time limit of a working day. STAP is a typical binary optimization problem with stochastic travelling and advising times, and a binary GSK solves it. A travelling disinfection-man problem (TDP) is presented in Hassan et al. (2021b) to optimize the scheduling of the COVID-19 disinfection process. TDP is similar to the travelling salesman problem (TSP), but it selects a route that reaches a subset of predetermined places to be disinfected while making the most of the available working hours in a day. DBGSK is used to solve TDP, improving the scheduling of the coronavirus disinfection process for five contaminated faculties at Ain Shams University in Cairo.
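The stochastic nature of STAP mostly shows up in how a binary selection vector is scored. The sketch below estimates the expected profit of a selection by Monte Carlo sampling; it is not the evaluation used with binary GSK. The assumptions are: sites are visited in index order rather than in an optimized order, every travel and advising time is scaled by a uniform random factor, and a sampled day earns its profit only if it fits within the working hours.

```python
import random

def stap_value(x, profit, travel, service, hours, n_samples=1000, noise=0.2, seed=0):
    """Monte Carlo estimate of the expected profit of a binary site-selection vector x.

    x[i] = 1 selects site i + 1; node 0 is the office where the day starts and ends.
    Each travel and advising time is scaled by a factor drawn uniformly from
    [1 - noise, 1 + noise].
    """
    rng = random.Random(seed)
    route = [0] + [i + 1 for i, b in enumerate(x) if b] + [0]
    reward = sum(p for p, b in zip(profit, x) if b)
    feasible = 0
    for _ in range(n_samples):
        t = 0.0
        for u, v in zip(route, route[1:]):
            t += travel[u][v] * rng.uniform(1 - noise, 1 + noise)
            if v != 0:
                t += service[v - 1] * rng.uniform(1 - noise, 1 + noise)
        feasible += t <= hours
    return reward * feasible / n_samples

travel = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]   # travel times between office 0 and sites 1, 2
service = [2, 2]                             # advising time at sites 1 and 2
profit = [10, 15]
print(stap_value([1, 1], profit, travel, service, hours=8),
      stap_value([0, 1], profit, travel, service, hours=8))
```

Visiting both sites takes exactly the available eight hours on average, so under noise it succeeds only part of the time, while the single-site plan almost always fits.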

In general, we list the optimization algorithms discussed and their applications, as shown in Table 15.

Table 15 The details of the cited references

4.8 Taxonomy and survey

This survey shows that binary metaheuristic algorithms are widely used in engineering optimization. Transfer functions are the main binarization strategy, and the sigmoid function is the most commonly adopted. Binary PSO and NSGA-II are two popular algorithms for the single- and multi-objective problems of feature selection, scheduling, layout and engineering structure optimization.
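The two transfer-function families used to binarize continuous algorithms differ in how they use the velocity. An S-shaped function such as the sigmoid gives the probability that a bit is set to 1, independently of its current value, whereas a V-shaped function such as |tanh(v)| gives the probability that the current bit is flipped. The sketch below shows both update rules; the function names and the example velocities are illustrative.

```python
import numpy as np

def s_shaped(v):
    """S-shaped (sigmoid) transfer: probability that the bit is set to 1."""
    return 1.0 / (1.0 + np.exp(-v))

def v_shaped(v):
    """V-shaped transfer |tanh(v)|: probability that the current bit is flipped."""
    return np.abs(np.tanh(v))

def update_s(bits, v, rng):
    # the new bit depends only on the velocity, not on the current bit
    return (rng.random(bits.shape) < s_shaped(v)).astype(int)

def update_v(bits, v, rng):
    # the current bit is complemented with probability V(v), otherwise kept
    flip = rng.random(bits.shape) < v_shaped(v)
    return np.where(flip, 1 - bits, bits)

rng = np.random.default_rng(0)
bits = np.array([0, 1, 0, 1, 1])
velocity = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(s_shaped(velocity).round(3), v_shaped(velocity).round(3))
print(update_s(bits, velocity, rng), update_v(bits, velocity, rng))
```

With a V-shaped function, a velocity near zero leaves the bit unchanged, so converged agents stay put; with the sigmoid, a zero velocity still makes each bit a coin flip.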

For convenience, we summarize the datasets commonly used in applications of binary metaheuristic algorithms in Table 16. In feature selection, the UCI repository is the most frequently used data source, and NSL-KDD is used to compare algorithm performance in intrusion detection. IEEE unit test systems and IEEE bus systems are frequently adopted in electric power and energy, while DUC datasets mostly appear in text summarization.

Table 16 Commonly utilized datasets

5 Future research prospects

Based on the development and current research status of binary metaheuristic algorithms, the following research directions are proposed.

1. Binary algorithm

To apply continuous metaheuristic algorithms to binary problems, various methods are needed to map agent positions to the {0, 1} space. Since these algorithms were not originally designed for binary problems, the performance of the converted binary versions is limited. To improve solution ability, algorithms specifically designed for binary optimization are needed; although several ideas have been put forward, they have not yet received widespread attention. In addition, special improvement strategies should be designed for particular binary applications. For example, when PSO solves feature selection, the algorithm is restricted to binary positions, and its velocity has a different meaning from that in continuous PSO. Improved methods tailored to feature selection may be proposed according to the characteristics of binary PSO and the data structure.

Although there has been some theoretical research on binary algorithms, progress is slow. More theoretical analysis is needed to provide a basis for solving problems and designing improvements. For example, the latest theoretical results on continuous algorithms, Markov chains and quadratic programming could be adopted to prove the convergence of binary algorithms.

2. Transfer function

Transfer functions play an important role in converting continuous metaheuristic algorithms into binary versions, and the transfer functions currently in use are mainly S-shaped and V-shaped. Further study of candidate mathematical functions may yield new transfer functions that improve the solution ability of algorithms. Since transfer functions perform differently across binary algorithms, a suitable transfer function should be carefully designed for each specific problem and metaheuristic algorithm.

3. Benchmark functions

Scholars have proposed many benchmark functions for continuous algorithms, such as the CEC test suites. However, binary metaheuristic algorithms perform poorly on these functions because their positions are restricted to binary space. At present, there is little research on benchmark functions for binary algorithms, and more binary benchmark functions should be proposed in future work to facilitate algorithm comparison.

4. Large-scale binary problems

With the development of information technology and the advancement of data collection techniques, people must deal with increasingly complex problems. Although metaheuristic algorithms are more efficient than traditional algorithms, they still consume much time. Surrogate-assisted techniques are considered promising for such problems, and various surrogate models have been proposed, such as Kriging, radial basis function neural networks (RBFNNs) and polynomial regression (Shinde et al. 2020; Winter et al. 2021; Pan et al. 2021b; Ren et al. 2021). However, these surrogate models are built on real-valued spaces, and proposing surrogate models suitable for binary problems remains challenging.

5. Applications

Metaheuristic algorithms have made progress in transportation, WSNs and energy, but existing work lacks integration. In engineering applications, binary and continuous problems often occur together, so hybrid coding or ensemble approaches are needed in future research to solve complex combinatorial optimization problems. In intelligent transportation, algorithms mostly optimize isolated subproblems. A future research direction is to integrate traffic signal optimization, vehicle path planning, traffic flow prediction, shared carpooling, vehicle control and other issues through intelligent algorithms to reduce traffic congestion and improve driving safety and pedestrian comfort. Metaheuristic algorithms can globally optimize network coverage, cluster routing, node deployment, hub layout, task allocation, base station deployment and resource allocation in WSNs, thereby reducing resource and energy consumption and improving system utilization. Intelligent algorithms can also optimize household energy management, power generation and consumption forecasting, power dispatching, distribution network selection, charging and discharging, and power distribution systems to improve energy utilization.
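Relating to the benchmark-function direction in item 3, a few classical pseudo-Boolean test functions already exist and could serve as a starting point for a binary benchmark suite: OneMax (count of ones), LeadingOnes (length of the all-ones prefix) and the concatenated deceptive trap, whose blocks reward moving toward all zeros except at the all-ones optimum. The sketch below implements these standard definitions; the block size k = 5 is only a typical default.

```python
def onemax(x):
    """Number of ones; the optimum is the all-ones string."""
    return sum(x)

def leading_ones(x):
    """Length of the longest prefix consisting of ones."""
    count = 0
    for bit in x:
        if bit != 1:
            break
        count += 1
    return count

def deceptive_trap(x, k=5):
    """Concatenated trap-k: a block scores k if all ones, else k - 1 - (number of ones),
    so the local slope inside each block points toward all zeros, away from the optimum."""
    if len(x) % k:
        raise ValueError("length must be a multiple of k")
    total = 0
    for i in range(0, len(x), k):
        u = sum(x[i:i + k])
        total += k if u == k else k - 1 - u
    return total

print(onemax([1, 0, 1, 1]), leading_ones([1, 1, 0, 1]), deceptive_trap([1] * 5 + [0] * 5))
```

OneMax rewards bitwise hill climbing, LeadingOnes forces bits to be fixed in sequence, and the trap function penalizes algorithms that only follow local improvements, so together they probe different search behaviors of binary algorithms.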

6 Conclusion

Metaheuristic algorithms quickly find acceptable solutions under space and time constraints, and they have gained much attention. Engineering applications contain binary optimization problems that must be solved by binary versions of these algorithms. This article comprehensively reviews binary metaheuristic algorithms in engineering fields and covers the commonly used algorithms and applications. It focuses on binarization schemes, theoretical research, benchmark functions and engineering problems, and also discusses open challenges. Algorithms are presented by application, including knapsack, feature selection, scheduling, structure optimization, layout and parameter optimization. We organize the related work by optimization algorithm, application, convergence, complexity and scalability, and provide detailed explanations of the reviewed works for easy understanding. In addition, we summarize frequently used transfer functions, public datasets, applications and characteristics to facilitate further research.

Although binary metaheuristic algorithms have achieved success, their potential has not yet been fully exploited. Designing a new binary algorithm and proving its properties theoretically is a challenging task. Surrogate-assisted large-scale global optimization has inspired further research on binary problems, but developing promising surrogate models for binary spaces remains future work. To improve solution ability, specialized algorithms and binary coding strategies should be researched according to practical applications. Many optimization problems are interrelated, such as energy dispatch, power generation layout, power forecasting and home appliance control, and how to integrate them is a topic for future investigation.