Scientific Programming
Volume 2022, Article ID 4872230, 19 pages
https://doi.org/10.1155/2022/4872230
Research Article
An ISVM Algorithm Based on High-Dimensional Distance and
Forgetting Characteristics
Received 21 March 2022; Revised 14 August 2022; Accepted 26 October 2022; Published 11 November 2022
Copyright © 2022 Wenhao Xie et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
In the face of batch, dynamically accessed data, or the flow of data that continuously changes over time, the traditional support vector machine algorithm cannot dynamically adjust the previous classification model. To overcome this shortcoming, the incremental support vector machine (ISVM) algorithm is proposed. However, many incremental support vector algorithms still have shortcomings such as low efficiency, memory limitation, and poor generalization. This paper puts forward a new ISVM algorithm, the HDFC-ISVM∗ algorithm, based on the high-dimensional distance and forgetting characteristics. This paper firstly proposes the original HDFC-ISVM algorithm, which first learns the distribution characteristics of the samples according to the distance between the samples and the normative hyperplane. Then, it introduces the forgetting factor. In the incremental learning process, the classifier gradually accumulates the spatial distribution knowledge of the samples, eliminates the samples that make no contribution to the classifier, and selectively forgets some useless samples according to the forgetting factor, which overcomes shortcomings, such as low efficiency and poor accuracy, of some algorithms. However, the original HDFC-ISVM algorithm is sensitive to parameters, and different settings of the parameters have a great impact on the final classification accuracy of the algorithm. Therefore, on the basis of the original algorithm, an improved algorithm, HDFC-ISVM∗, based on adjustments to the initialization strategy and updating rules of the forgetting factor is proposed. The initialization strategy and updating rules of the forgetting factor are adjusted in this improved algorithm to adapt to datasets with different distributions. The rationality of the improved strategy for the forgetting factor is discussed theoretically. At the same time, it is verified by experiments that the proposed algorithm has better classification accuracy, classification efficiency, and generalization ability than other algorithms.
At this time, the necessary and sufficient condition for the optimal solution of this optimization problem is that the corresponding Karush–Kuhn–Tucker (KKT) conditions hold [5].

SVM was originally designed to solve the problem of binary classification of balanced data obtained in batch. With the development of machine learning research, SVM has also expanded from the initial binary classification and regression problems to other machine learning topics, such as feature selection, semisupervision, top order learning, ordered regression, outlier detection, and multi-perspective learning [6]. At the same time, the extended algorithms based on SVM are also applied to more complex data classification or regression. For example, effectively reducing the impact of noise in the dataset, solving the noise sensitivity and instability of resampling, realizing high-precision classification of imbalanced data, and efficiently classifying dynamically obtained data or stream data through such extended algorithms has always been one of the research directions in the field of machine learning. These studies are of great significance for support vector machines and their variants [7–10]. In these new topics, the models evolved from SVM inherit most of the original characteristics, such as interval theory, kernel techniques, and structural risk minimization, and they also inherit the defects of the original SVM model.

In the face of batch, dynamically accessed data or the flow of data that continuously changes over time, the traditional support vector machine algorithm cannot dynamically adjust the previous classification model. To overcome this shortcoming, the incremental support vector machine (ISVM) algorithm is proposed. In recent years, incremental learning based on SVM, that is, the ISVM algorithm, has attracted a lot of researchers' attention [11–14]. In every incremental learning process of the ISVM, how to effectively retain the historical information, selectively discard and forget the useless training data, and save storage space while maintaining the classification accuracy is the key of the ISVM classification algorithm.

Scholars at home and abroad have carried out a lot of research work on ISVM algorithms based on the sample preselection strategy. For example, Xiao et al. proposed a new incremental learning algorithm, α-SVM [15]. Wang analyzed that the samples near the classification boundary easily become support vectors, then selected the nonsupport vectors near the classification boundary into the incremental update, and proposed a redundant ISVM learning algorithm [16]. Yao et al. proposed a fast ISVM learning method based on locality sensitive hashing in order to improve the classification accuracy of large-scale high-dimensional data. This method firstly used the locality sensitive hash to quickly find similar data, then selected the samples that may become SVs in the increment on the basis of the SVM algorithm, and then used these samples together with the existing SVs as the basis for subsequent training [17]. Tang combined the strict incremental process of the classical ISVM algorithm with the idea of passive-aggressive online learning to effectively solve the problem of how to better select the new SVs in the online process of the classical ISVM algorithm [11]. Zhang et al. introduced RCMDE as the feature extraction method and proposed an improved ISVM fault classifier based on the whale optimization algorithm (WOA) to diagnose and predict bearing faults [18]. The above incremental models are all based on sample preselection strategies. In addition, many scholars and experts have proposed ISVM learning algorithms based on KKT conditions and the Lagrange multiplier methods [19–21].

As can be seen from the abovementioned research, in the incremental learning process, with the addition of new samples, how to select new support vector sets so that useful information will not be discarded while the original training results are retained has become an important issue in the construction of the ISVM learning model.

2. Description of the Original HDFC-ISVM Algorithm

Previously, many classical ISVM learning algorithms have been proposed, including Simple_ISVM [22], KKT_ISVM [23, 24], CSV_ISVM [14], GGKKT_ISVM [25], CD_ISVM [26], and the other ISVM algorithms mentioned above. These algorithms provide different selection methods of incremental learning training samples from different perspectives. However, the ability of the classifier to gradually accumulate the spatial distribution knowledge of the samples is still not fully developed, so the accuracy and efficiency can be further improved. In order to further learn the distribution characteristics of the samples, a new ISVM algorithm called the HDFC-ISVM algorithm, based on the high-dimensional distance and forgetting characteristics, is proposed in this paper. It can fully train the ability of the classifier to accumulate the knowledge of the spatial distribution of the samples. The flow of this algorithm is shown in Figure 1.

Figure 1: The flow of the HDFC-ISVM algorithm (train an initial SVM, normalize the distances d_i to p_i, check the forgetting factor of each sample in Tc ∪ T+, train a new classifier f2 on the samples whose forgetting factor is greater than 0, set Tc = Ts and f1 = f2, and end the algorithm by outputting the final classifier f1 when no new increment remains).

2.1. The Distance from Every Sample to the Optimal Hyperplane in High-Dimensional Euclidean Space. The training of the SVM classification hyperplane is only related to the support vectors, and the support vectors are the ones that fall on the normative hyperplane wx + b = ±1. In n-dimensional Euclidean space, let the mapping function be φ(x). If x′ is the projection point of point x onto the optimal hyperplane Π: w · φ(x) + b = 0, then it satisfies the following formula:

w \cdot \varphi(x') + b = 0,    (2)

where w = (w_1, w_2, \ldots, w_n), b = (b_1, b_2, \ldots, b_n), x = (x_1, x_2, \ldots, x_n)^T, x' = (x_1', x_2', \ldots, x_n')^T, and SVs is the set of support vectors.

Figure 2: The projection x′ (i.e., φ(x′)) of a sample point x onto the optimal hyperplane wφ(x′) + b = 0, at distance d, in the high-dimensional feature space.

As can be seen from Figure 2, the vector \overrightarrow{x'x} is parallel to the normal vector w of the hyperplane Π, which satisfies the following formula:

\|w \cdot \overrightarrow{x'x}\| = \|w\| \cdot \|\overrightarrow{x'x}\| = \|w\| \cdot d,    (3)

where d is the distance from point x to the hyperplane Π. In addition, the following formula holds:
w \cdot \overrightarrow{x'x} = (w_1, w_2, \ldots, w_n) \cdot (x_1 - x_1', x_2 - x_2', \ldots, x_n - x_n')
= w_1(x_1 - x_1') + w_2(x_2 - x_2') + \cdots + w_n(x_n - x_n')
= (w_1x_1 + w_2x_2 + \cdots + w_nx_n) - (w_1x_1' + w_2x_2' + \cdots + w_nx_n')
= w \cdot x - (-b)
= w \cdot x + b.    (4)

Formula (5) is given by formulas (3) and (4):

\|w\| \cdot d = |w \cdot \varphi(x) + b|.    (5)

Therefore, the distance from any point in n-dimensional Euclidean space to the optimal hyperplane is obtained as follows:

d = \frac{|w \cdot \varphi(x) + b|}{\|w\|}.    (6)

2.2. The Distance between Every Sample and the Optimal Hyperplane Is Calculated under the Action of the Kernel Function. For nonlinear separability problems, it is necessary to introduce the mapping function φ(x) to map the samples to the high-dimensional space, and then realize the linear separability or approximate linear separability of the samples. The literature [27] theoretically proves that, under the action of the kernel function, the higher the dimension of the samples is, the higher the probability of linear separability is after they are mapped to a higher-dimensional space, and a better classification effect can be obtained. For this reason, this paper first calculates the distance between the sample and the hyperplane in the high-dimensional space under the action of the kernel function.

Let the mapping function be φ(x) and the kernel function be K(x_i, x_j). At this time, w = \sum_i \alpha_i y_i \varphi(x_i), and substituting it into formula (6) gives the following formulas:

\|w \cdot x_k\| = \Big\| \sum_i \alpha_i y_i \varphi(x_i) \cdot \varphi(x_k) \Big\| = \Big\| \sum_i \alpha_i y_i K(x_i, x_k) \Big\|,    (7)

\|w\| = (w \cdot w^T)^{1/2} = \Big( \sum_i \alpha_i y_i \varphi(x_i) \cdot \sum_j \alpha_j y_j \varphi(x_j) \Big)^{1/2} = \Big( \sum_{i,j} \alpha_i \alpha_j y_i y_j \varphi(x_i)\varphi(x_j) \Big)^{1/2} = \Big( \sum_{i,j} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \Big)^{1/2}.    (8)

Substituting formulas (7) and (8) into formula (6), the distance between the sample and the hyperplane in the high-dimensional space is obtained as the following formula:

d = \frac{\sum_i \alpha_i y_i K(x_i, x_k) + b}{\Big( \sum_{i,j} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \Big)^{1/2}}.    (9)
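For illustration only (this is not code from the paper), the distance in formula (9) can be evaluated directly from the dual solution of a trained kernel SVM. The sketch below assumes scikit-learn's SVC, whose dual_coef_ already stores α_i·y_i for the support vectors; the RBF kernel is used purely as an example.

```python
# Minimal sketch: distance from samples to the optimal hyperplane in the
# kernel-induced feature space (formula (9)), using a fitted sklearn SVC.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

def kernel_distance(clf: SVC, X: np.ndarray, gamma: float) -> np.ndarray:
    """Return |f(x)| / ||w|| for every row of X, where
    f(x) = sum_i alpha_i * y_i * K(x_i, x) + b, as in formula (9)."""
    sv = clf.support_vectors_          # support vectors x_i
    coef = clf.dual_coef_[0]           # alpha_i * y_i
    b = clf.intercept_[0]
    # numerator: sum_i alpha_i y_i K(x_i, x) + b
    f = rbf_kernel(X, sv, gamma=gamma) @ coef + b
    # denominator: ||w|| = sqrt(sum_ij alpha_i alpha_j y_i y_j K(x_i, x_j))
    w_norm = np.sqrt(coef @ rbf_kernel(sv, sv, gamma=gamma) @ coef)
    return np.abs(f) / w_norm

# usage (illustrative): clf = SVC(kernel="rbf", gamma=0.5).fit(X_train, y_train)
#                       d = kernel_distance(clf, X_new, gamma=0.5)
```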
2.3. Mapping and Normalization. The HDFC-ISVM algorithm firstly trains an optimal classification hyperplane from the initial data, and then obtains the normative hyperplane wx + b = ±1 where the support vectors are located. In each increment process, formula (9) is used to calculate the distance between the newly added positive samples, negative samples, and the corresponding normative hyperplane wx + b = ±1, respectively. The distance between the positive sample x_i^+ and the hyperplane wx + b = +1 is denoted as d_i^+, and the distance between the negative sample x_i^- and the hyperplane wx + b = −1 is denoted as d_i^-, as shown in Figure 3.

Thus, it can be seen that this algorithm uses the distances between the samples in the high-dimensional space and the hyperplane (see formula (9)) to describe the geometric distribution of the samples. In order to better reflect the distribution information of the samples, these distances are then mapped into the corresponding probability values.

Definition 1. Let D^+ and D^- represent the distance sets of the positive and negative samples from their respective normative hyperplanes, that is, D^+ = \{d_i^+ \mid i \in X^+\} and D^- = \{d_i^- \mid i \in X^-\}. Let d_{\max}^+, d_{\min}^+, d_{\max}^-, d_{\min}^- be the maximum and minimum values of the samples in the sets D^+ and D^-, respectively; then the following definitions can be obtained:

p_i^+ = \frac{d_{\max}^+ - d_i^+}{d_{\max}^+ - d_{\min}^+}, \qquad p_j^- = \frac{d_{\max}^- - d_j^-}{d_{\max}^- - d_{\min}^-}, \qquad i \in D^+,\ j \in D^-.    (10)

In essence, Definition 1 normalizes the distance from each sample to the corresponding normative hyperplane and maps the distance in Euclidean space to the interval (0, 1).
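The per-class normalization of Definition 1 (formula (10)) is a plain min–max mapping; the short sketch below is only an illustration and is not taken from the paper.

```python
# Illustrative sketch of formula (10): map the distances of the positive and
# negative samples into (0, 1) with a per-class min-max normalization.
import numpy as np

def normalize_distances(d: np.ndarray) -> np.ndarray:
    """p_i = (d_max - d_i) / (d_max - d_min); a larger p means the sample
    lies closer to its normative hyperplane."""
    d_max, d_min = d.max(), d.min()
    return (d_max - d) / (d_max - d_min)

# p_pos = normalize_distances(d_pos)   # distances d_i^+ to wx + b = +1
# p_neg = normalize_distances(d_neg)   # distances d_j^- to wx + b = -1
```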
2.4. Forgetting Factor and Initialization

2.4.1. Forgetting Factor. Before introducing the HDFC-ISVM algorithm proposed in this paper, we first give Definition 2.

Definition 2 (see [15]). The definitions for the following types of samples are given as follows:
① The samples that have never been selected into the SV set in any round of training are called the in-class samples, which usually account for a large proportion of the dataset;
② The samples that always appear in each round of the SV set are called the boundary samples;
③ The samples that appear in the SV set with jitter are defined as the quasi-boundary samples.
Premise: suppose T = {(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)}, x_i ∈ R^n, y_i ∈ {−1, 1}, i = 1, 2, \ldots, n, where T is the initial sample set and T+ is the new sample set;
Objective: find the SVM classifier based on T ∪ T+.
Step 1. The initial dataset T is trained to obtain the initial classifier f1 and the initial SV set Tc, and the forgetting factor corresponding to each sample in the set Tc is calculated.
Step 2. Check whether the incremental set T+ exists. If not, the algorithm ends and f1 is the final classifier; otherwise, go to Step 3.
Step 3. For the incremental set T+, the forgetting factor of every sample of T+ is calculated according to the classifier f1.
Step 4. Set T_total = T+ ∪ Tc, and select the samples whose forgetting factor satisfies α_i > 0 to construct the set Ts.
Step 5. A new round of SVM training is carried out on the dataset Ts to obtain a new classifier f2.
Step 6. For the classifier f2, the abovementioned threshold adjustment rule is used to update the forgetting factor of each sample in the dataset Ts; set Tc = Ts, f1 = f2, and then turn to Step 2.
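To make the six steps above concrete, here is a minimal, hypothetical Python sketch of the incremental loop. The helpers forgetting_factors and update_forgetting_factors stand in for the rules of Sections 2.3 and 2.4; they are assumptions for illustration, not the authors' code.

```python
# Hypothetical sketch of the HDFC-ISVM incremental loop (Steps 1-6 above).
import numpy as np
from sklearn.svm import SVC

def hdfc_isvm(T, increments, forgetting_factors, update_forgetting_factors):
    X, y = T                                              # initial sample set T
    clf = SVC(kernel="rbf").fit(X, y)                     # Step 1: classifier f1
    Tc_X, Tc_y = clf.support_vectors_, y[clf.support_]    # initial SV set Tc
    alpha = forgetting_factors(clf, Tc_X, Tc_y)           # factors for Tc
    for X_new, y_new in increments:                       # Step 2: next set T+
        a_new = forgetting_factors(clf, X_new, y_new)     # Step 3
        X_tot = np.vstack([Tc_X, X_new])                  # Step 4: T_total
        y_tot = np.concatenate([Tc_y, y_new])
        a_tot = np.concatenate([alpha, a_new])
        keep = a_tot > 0                                  # keep alpha_i > 0
        Ts_X, Ts_y = X_tot[keep], y_tot[keep]
        clf = SVC(kernel="rbf").fit(Ts_X, Ts_y)           # Step 5: classifier f2
        alpha = update_forgetting_factors(clf, Ts_X, a_tot[keep])  # Step 6
        Tc_X, Tc_y = Ts_X, Ts_y                           # Tc = Ts, f1 = f2
    return clf                                            # final classifier
```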
FN is the number of false negatives (the number of positive samples predicted to be negative samples by the model). ACC is the accuracy rate, which represents the proportion of the number of samples correctly predicted by the model to the total number of samples. TPR is the sensitivity, which denotes the proportion of the samples correctly predicted to be positive samples among all positive samples, and TNR is the specificity, which denotes the proportion of the samples correctly predicted to be negative samples among all negative samples. The F1-score takes into account both the precision and the recall of a classification model; it can be regarded as the harmonic mean of the model's precision and recall.

Tables 2–7 respectively show the predicted results of the above four incremental learning algorithms for the 6 datasets listed in Table 1, where "Iteration count" refers to the number of incremental learning rounds, "ACC" refers to the accuracy rate of the classifiers, and "Time" refers to the training time of the corresponding incremental learning round. In addition, the TPR and TNR values in Table 8 represent the sensitivity and specificity indexes of the classifiers. Finally, for the training sets and the testing sets, we compare the classification accuracy after each round of incremental learning by taking their average values.
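For reference only, the four indexes described above can be computed from the confusion matrix as in the short sketch below; labels are assumed to be +1/−1 as elsewhere in the paper, and the formula numbers (11)–(14) refer to the paper's definitions.

```python
# Illustrative computation of ACC, TPR (sensitivity), TNR (specificity), and
# F1-score from true and predicted labels (+1 / -1).
import numpy as np

def classification_indexes(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == -1) & (y_pred == -1))
    fp = np.sum((y_true == -1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == -1))
    acc = (tp + tn) / (tp + tn + fp + fn)
    tpr = tp / (tp + fn)                      # sensitivity
    tnr = tn / (tn + fp)                      # specificity
    precision = tp / (tp + fp)
    f1 = 2 * precision * tpr / (precision + tpr)
    return acc, tpr, tnr, f1
```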
According to the abovementioned experimental results, we obtained the comparison graphs of the accuracy and cumulative time of the four algorithms for all the datasets, as shown in Figures 4 and 5. The comparison graph of the TPR and TNR values of all the algorithms for the 6 datasets is shown in Figure 6. Figure 7 shows the average classification accuracy for the training sets and testing sets according to Table 9. In Table 9, "Train_Acc" represents the average precision on the training datasets, and "Test_Acc" represents the average precision on the testing datasets.

As can be seen from Figure 4, during the incremental learning process, the classification accuracy of the Simple-ISVM and KKT-ISVM algorithms fluctuates greatly and the robustness of the classifiers is poor, because these two algorithms ignore the set of quasi-boundary vectors that may become SVs. However, the classification accuracy of the CSV-ISVM algorithm and the HDFC-ISVM algorithm fluctuates less during the incremental process and shows an overall growth trend. The classification accuracy of the HDFC-ISVM algorithm is slightly lower than that of the other algorithms in the initial state, but it can continuously learn the spatial distribution knowledge of the samples and adjust the training set through the forgetting factors during the incremental learning process, so that it finally obtains slightly higher accuracy than the Simple-ISVM and KKT-ISVM algorithms.

From Figure 5, it can be seen that the difference in training time among the four algorithms is not big in the initial stages of training, but the total run times of the four algorithms differ greatly after several rounds of incremental learning. The average running time of the Simple-ISVM and KKT-ISVM algorithms is 50% longer than that of the CSV-ISVM and HDFC-ISVM algorithms. Comparatively, the average cumulative running time of the HDFC-ISVM algorithm is less than that of the other algorithms, so this algorithm has a great advantage in training efficiency.

It can be intuitively seen from Figure 6 that, compared with the other algorithms, the HDFC-ISVM algorithm shows little difference in sensitivity to positive and negative samples for all the datasets, and the maximum accuracy difference is only within 5%, which makes this algorithm obtain better classification accuracy for both positive and negative samples.
Table 6: Continued.
Iteration count | Simple-ISVM ACC | Simple-ISVM Time (s) | KKT-ISVM ACC | KKT-ISVM Time (s) | CSV-ISVM ACC | CSV-ISVM Time (s) | HDFC-ISVM ACC | HDFC-ISVM Time (s)
5 1 0.5385 0.998 0.1862 0.997 0.1997 1 0.2
6 0.996 0.5215 0.982 0.1805 0.999 0.1751 0.997 0.175
7 0.994 0.6904 0.98 0.1546 0.999 0.197 0.999 0.197
8 0.994 0.6168 0.918 0.1906 0.996 0.1462 0.999 0.146
9 0.998 0.7751 1 0.1299 0.997 0.1971 1 0.197
10 0.9986 0.7634 0.994 0.1934 0.999 0.1681 1 0.168
11 0.9917 1.0033 0.998 0.2045 0.993 0.2079 1 0.208
Table 8: The comparison of the TPR and TNR values of the four algorithms for all the datasets.
Data_set | Algorithm | TPR (%) | TNR (%)
Breast_cancer | Simple-ISVM | 96.67 | 90.51
Breast_cancer | KKT-ISVM | 97.88 | 88.80
Breast_cancer | CSV-ISVM | 99.56 | 93.53
Breast_cancer | HDFC-ISVM | 97.54 | 92.97
German | Simple-ISVM | 95.25 | 92.66
German | KKT-ISVM | 94.01 | 91.18
German | CSV-ISVM | 99.71 | 96.78
German | HDFC-ISVM | 98.16 | 96.91
Heart | Simple-ISVM | 99.95 | 99.49
Heart | KKT-ISVM | 99.09 | 98.10
Heart | CSV-ISVM | 99.61 | 99.47
Heart | HDFC-ISVM | 99.65 | 99.60
Image | Simple-ISVM | 98.56 | 98.32
Image | KKT-ISVM | 96.80 | 98.77
Image | CSV-ISVM | 98.66 | 95.66
Image | HDFC-ISVM | 99.78 | 99.79
Mushroom | Simple-ISVM | 99.80 | 97.45
Mushroom | KKT-ISVM | 97.96 | 99.90
Mushroom | CSV-ISVM | 98.75 | 98.56
Mushroom | HDFC-ISVM | 99.74 | 99.72
Thyroid | Simple-ISVM | 98.27 | 94.62
Thyroid | KKT-ISVM | 98.22 | 95.00
Thyroid | CSV-ISVM | 98.92 | 95.23
Thyroid | HDFC-ISVM | 99.85 | 99.01
Figure 7 shows the average classification accuracy of the classifiers for the training sets and the testing sets after all rounds of incremental learning. The HDFC-ISVM algorithm has certain advantages in the average classification accuracy for the training set and the testing set. In addition, the difference between the average classification accuracy of the HDFC-ISVM algorithm on the training set and the testing set is only 3.59%, which is lower than the 5.61% of the Simple-ISVM algorithm and the 3.89% of the CSV-ISVM algorithm. Therefore, the HDFC-ISVM algorithm has strong generalization performance.

2.7. Parameter Sensitivity Analysis of the Original HDFC-ISVM Algorithm. In this part, the parameter sensitivity of the original algorithm is studied, and a series of hyperparameters involved in this algorithm are tested to fully explore the sensitivity of the algorithm to the introduced hyperparameters.
Figure 4: The comparison of the classification accuracy (Acc vs. increment count) of the four algorithms during incremental learning for the 6 datasets.
2.7.1. Sensitivity Analysis of Parameters ps, pq, and pm. The parameters ps, pq, and pm are introduced as the thresholds for initializing the forgetting factor. In order to explore the sensitivity of this algorithm to the initial threshold of the forgetting factor, four groups of different values are set for the hyperparameters ps, pq, and pm, and tests are carried out on the abovementioned 6 datasets. The experimental results are shown in Table 10. The first column in Table 10 gives the values of the hyperparameters ps, pq, and pm used in the previous experiment.

It can be seen from Table 10 that the different groups of values have a considerable impact on the experimental results. Appropriately increasing the values of the hyperparameters can improve the classification accuracy to a certain extent. However, if the values of the hyperparameters are too large, the classification accuracy will decrease. Meanwhile, it can be seen from Table 10 that different hyperparameters have different influences on the classification accuracy for different datasets. The experimental results show that setting different hyperparameters makes the accuracy fluctuate to a certain extent, so the HDFC-ISVM algorithm is sensitive to the parameters ps, pq, and pm.

2.7.2. Sensitivity Analysis of Parameter α. For the HDFC-ISVM algorithm proposed above, the assignment strategy of the forgetting factor α has a great impact on the algorithm performance, and different assignments of α will lead to different tendencies in selecting the candidate support vectors. Therefore, this paper still adopts the abovementioned 6 datasets and selects different assignment strategies to explore the influence of α on the experimental results. The specific results are shown in Table 11. All the results in Table 11 are the accuracy when ps, pq, and pm are set to 0.6, 0.75, and 0.9 and α is set to different values. Table 11 lists the combinations of 4 different thresholds of parameter α corresponding to the four different situations, and the first column shows the values of α used in the previous experiment.

It can be seen from Table 11 that the different assignment strategies of α have great influences on the final results of this algorithm. If the value of α is too small, the forgetting factor cannot play its due effect, resulting in some data samples being forgotten prematurely. When the value of the forgetting factor α increases, the testing accuracy will improve for some datasets, while it will decrease for others.
Figure 5: The comparison of the cumulative training time (Time (s) vs. increment count) of the four algorithms during incremental learning for the 6 datasets.
At the same time, increasing the assignment of α leads to more data to be learned incrementally, which increases the training time. Therefore, it can be seen that the algorithm is also sensitive to parameter α.

2.7.3. The Conclusion about the Parameter Sensitivity. The experimental results show that the original HDFC-ISVM algorithm is sensitive to both ps, pq, and pm and α, and different parameters need to be adjusted for different datasets to achieve the best classification effect. When the values of ps, pq, and pm are 0.6, 0.75, and 0.9, respectively, the performance of this algorithm is relatively stable for the abovementioned 6 datasets. When the parameters' values are increased or decreased, the accuracy of the classifier fluctuates for different datasets. Similarly, when the values of α are 0.1, 0.15, and 0.3, the testing accuracy of the classifier for all 6 datasets is high, while increasing the values of the parameters leads to a rapid decline of the testing accuracy for some datasets.

2.8. The Improvement Strategy for the Original HDFC-ISVM Algorithm. From the abovementioned sensitivity experiments, it can be seen that different settings of the parameters have a great impact on the final classification accuracy of the original HDFC-ISVM algorithm. Because of too many parameters, the algorithm cannot adapt to datasets with different distributions. Datasets with different distributions often need different groups of hyperparameters to achieve ideal classification results. So, the initialization strategy and updating rule of the forgetting factor α need to be adjusted to some extent. The following part does this work, and the algorithm obtained after adjusting the initialization rule and updating strategy of the forgetting factor is called HDFC-ISVM∗. The new rules are as follows.
Figure 6: The comparison of TPR and TNR values of all the algorithms for the 6 datasets.
2.8.1. Initialization Process. For the new round of incremental learning dataset X_add, we first use formula (10) to perform the probability calculation and obtain the set of p values of all samples, and then the forgetting factor is initialized by assigning it to the samples in dataset X_add according to the following formula:

\alpha_i = \begin{cases} \operatorname{ceil}\{(p_i - p_{\min})^2\}, & p_i > \theta, \\ 0, & \text{otherwise}, \end{cases}    (15)

where p_i is calculated by formula (10), p_min represents the minimum of the set of p values for this round, ceil{·} represents the result rounded up to one decimal place, and θ ∈ (0, 1) represents the regulating parameter.

All the samples in the incremental dataset X_add are marked with the corresponding forgetting factors by the abovementioned method; then the dataset of the previous round is combined with them, and the samples with forgetting factor α_i > 0 are screened out. These samples constitute the dataset X_ch, and a new round of SVM training is conducted on the dataset X_ch.

2.8.2. Updating Rule for the Forgetting Factor. In order to make the forgetting factor update self-adaptively, reduce the number of parameters that must be set, and improve the generalization performance of the model, this paper proposes a new forgetting factor update strategy; that is, before a new round of incremental training, the forgetting factor of the original data is updated as follows:

\alpha_{i+1} = \alpha_i + \frac{1}{1 + \alpha_i}\,\mathrm{is.Contains}(X_{SV}, x_i) + \frac{1}{\theta}\,\max_{j \in SV}\left\{ \min\!\left( \frac{x_i \cdot x_j^T}{\|x_i\| \cdot \|x_j\|} - 1,\; 0 \right) \right\},    (16)
where is.Contains(·) indicates whether the set X_SV contains x_i: if it does, it returns 1; otherwise, it returns 0. x_j represents a member of the set X_SV.

The interpretation of formula (16) is as follows. When x_i is a support vector after the last round of training, the function is.Contains(·) returns 1, the term max_{j∈SV}{min((x_i · x_j^T / (‖x_i‖ · ‖x_j‖)) − 1, 0)} returns 0, and 1/(1 + α_i) acts as a weight to adjust the increment of the forgetting factor. When the forgetting factor is large, the increment decreases in each round to ensure the sensitivity of the forgetting factor to the candidate support vectors. On the contrary, if x_i is not a support vector after the last round of training, the function is.Contains(·) returns 0. At this point, the forgetting factor is reduced by the term max_{j∈SV}{min((x_i · x_j^T / (‖x_i‖ · ‖x_j‖)) − 1, 0)}, because the inner function min((x_i · x_j^T / (‖x_i‖ · ‖x_j‖)) − 1, 0) discriminates the distance between x_i and the support vector x_j by comparing the cosine similarity between them. We think that the closer x_i is to a support vector, the more likely it is to become a support vector. Therefore, the distance mapping between x_i and the nearest support vector is obtained through the calculation of max_{j∈SV}{min((x_i · x_j^T / (‖x_i‖ · ‖x_j‖)) − 1, 0)}. When updating α_i, this algorithm adjusts the attenuation size by the threshold 1/θ: the closer x_i is to the current support vectors, the smaller the attenuation of α_i is, and the further away it is, the greater the attenuation of α_i is. In this way, the forgetting factor is initialized and updated, the number of parameters is reduced, the algorithm can adjust the updating rules adaptively according to the data distribution of different datasets, and the generalization performance of this algorithm is improved.
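The adjusted rules can be written compactly. The following Python sketch is one interpretation of formulas (15) and (16) under the assumptions stated in the comments (in particular, ceil{·} is read as rounding up to one decimal place, and X_sv is assumed to hold the support vectors of the previous round); it is not the authors' code.

```python
# Sketch of the HDFC-ISVM* forgetting-factor rules (formulas (15) and (16)).
import math
import numpy as np

def init_forgetting_factors(p, theta):
    """Formula (15): alpha_i = ceil{(p_i - p_min)^2} if p_i > theta, else 0."""
    p = np.asarray(p, dtype=float)
    p_min = p.min()
    # ceil{.} interpreted here as rounding up to one decimal place
    alpha = np.array([math.ceil(((pi - p_min) ** 2) * 10) / 10 for pi in p])
    alpha[p <= theta] = 0.0
    return alpha

def update_forgetting_factor(alpha_i, x_i, X_sv, theta):
    """Formula (16): reward samples that are current SVs, and decay the
    others according to the cosine distance to the nearest SV, scaled by 1/theta."""
    cos = X_sv @ x_i / (np.linalg.norm(X_sv, axis=1) * np.linalg.norm(x_i))
    nearest = np.max(np.minimum(cos - 1.0, 0.0))          # max_j min(cos_j - 1, 0)
    contains = float(any(np.allclose(x_i, sv) for sv in X_sv))
    return alpha_i + contains / (1.0 + alpha_i) + nearest / theta
```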
Table 10: The accuracy (%) of the original HDFC-ISVM algorithm under different settings of parameters ps, pq, and pm.
Data_set | (0.6, 0.75, and 0.9) | (0.1, 0.4, and 0.7) | (0.3, 0.6, and 0.9) | (0.4, 0.6, and 0.8)
Breast_cancer 91.16 88.47 92.37 93.63
German 94.99 90.23 84.40 95.14
Heart 99.12 96.51 99.04 100.00
Image 97.50 94.79 98.97 98.81
Mushroom 90.53 86.67 93.74 88.66
Thyroid 98.77 95.15 98.40 98.67
Table 11: The accuracy (%) of the original HDFC-ISVM algorithm under different settings of parameter α.
Data_set | (0, 0.1, 0.15, and 0.3) | (0, 0.05, 0.1, and 0.15) | (0, 0.05, 0.1, and 0.3) | (0, 0.15, 0.2, and 0.3)
Breast_cancer 91.16 90.67 92.71 93.19
German 94.99 91.67 91.97 93.93
Heart 99.12 99.04 99.04 99.87
Image 97.50 95.83 95.30 98.18
Mushroom 90.53 90.14 90.16 91.14
Thyroid 98.77 94.26 94.91 97.66
2.9. Analysis of Experiments and Results of the HDFC-ISVM∗ Algorithm

2.9.1. Experimental Datasets for the Improved Algorithm. In order to better test the algorithm performance after the adjustment, the experiment added 6 datasets from the UCI library on the basis of the original 6 datasets, giving 12 experimental datasets in total; the latest dataset information is shown in Table 12.

2.9.2. The Experimental Results. Based on the abovementioned experiments, this round of experiments compares the training results of Simple-ISVM [22], KKT-ISVM [23, 24], CSV-ISVM [14], GGKKT-ISVM [25], CD-ISVM [26], HDFC-ISVM, and HDFC-ISVM∗ (the HDFC-ISVM∗ algorithm is the improved algorithm based on the original HDFC-ISVM algorithm) for the abovementioned 12 datasets in Table 12. In this experiment, for all the algorithms mentioned above, the initial training datasets contain 500 samples, 500 samples are added each time for incremental learning until all the training samples are trained, and the value of θ is 0.3.
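The evaluation protocol just described (500 initial samples, 500 more per round, θ = 0.3) can be driven by a short loop such as the illustrative sketch below; the model is assumed to expose a fit/partial_fit-style interface, and HDFCISVMStar is a hypothetical wrapper for the improved algorithm, not an existing library class.

```python
# Hypothetical driver for the incremental evaluation protocol: train on the
# first 500 samples, then add 500 samples per round and record ACC / F1.
from sklearn.metrics import accuracy_score, f1_score

def run_incremental_experiment(model, X_train, y_train, X_test, y_test, batch=500):
    history = []
    model.fit(X_train[:batch], y_train[:batch])            # initial training set
    for start in range(batch, len(X_train), batch):        # one increment per round
        model.partial_fit(X_train[start:start + batch],    # assumed interface
                          y_train[start:start + batch])
        y_pred = model.predict(X_test)
        history.append((accuracy_score(y_test, y_pred),
                        f1_score(y_test, y_pred, pos_label=1)))
    return history

# usage (hypothetical): history = run_incremental_experiment(
#     HDFCISVMStar(theta=0.3), X_train, y_train, X_test, y_test)
```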
Table 13: The F1-score values comparison of all the algorithms for different datasets.
Datasets | Simple-ISVM | KKT-ISVM | CSV-ISVM | GGKKT-ISVM | CD-ISVM | HDFC-ISVM | HDFC-ISVM∗
Breast_cancer 0.962 0.960 0.983 0.970 0.980 0.971 0.983
German 0.955 0.941 0.991 0.965 0.953 0.991 0.991
Heart 0.977 0.985 0.986 0.985 0.985 0.987 0.996
Image 0.978 0.947 0.977 0.967 0.978 0.985 0.987
Mushroom 0.966 0.982 0.982 0.985 0.983 0.991 0.991
Thyroid 0.975 0.968 0.981 0.982 0.990 0.986 0.990
Titanic 0.837 0.865 0.834 0.860 0.860 0.842 0.875
Splice 0.905 0.908 0.911 0.910 0.918 0.925 0.924
Diabetes 0.931 0.913 0.931 0.928 0.925 0.933 0.974
Credit 0.918 0.918 0.911 0.911 0.910 0.914 0.919
Spambase 0.836 0.844 0.836 0.843 0.840 0.842 0.862
Waveform 0.718 0.697 0.713 0.713 0.715 0.713 0.740
Mean 0.913 0.910 0.919 0.918 0.920 0.923 0.936
The ACC index and the F1-score index are introduced simultaneously to evaluate the performance of the classifiers (see formulas (11) and (14)). The specific experimental results are as follows.

It can be seen from Table 13 and Figure 8 that the F1-score values of the HDFC-ISVM algorithm before and after the improvement are significantly improved for the abovementioned 12 datasets. The mean F1-score of HDFC-ISVM∗ over all datasets is 0.936, 1.3 percentage points higher than that of the HDFC-ISVM algorithm before the improvement and 2.6 percentage points higher than that of the KKT-ISVM algorithm. The mean F1-score of the HDFC-ISVM∗ algorithm for the different datasets is higher than that of the other algorithms, which proves that the improved algorithm has advantages in precision and recall compared with the other algorithms. At the same time, it can be obtained from Tables 14 and 15 that the average training accuracy of the HDFC-ISVM∗ algorithm is 92.29% over all datasets, and the average testing accuracy is 90.80% over all datasets. This algorithm is not only obviously better than the other algorithms but also performs better than the original HDFC-ISVM algorithm. It can be seen from Figure 9 that the testing accuracy of the improved HDFC-ISVM∗ algorithm on almost all datasets is no lower than that of the HDFC-ISVM algorithm; in particular, for the "mushroom" dataset the testing accuracy of the improved algorithm is improved by 8.61%, and for the "breast_cancer" dataset it is improved by 5.46% compared to the HDFC-ISVM algorithm. The experimental results show that, by adjusting the initialization and update strategies of the forgetting factor, the new algorithm can better adjust the data of each training round and adapt the update strategy of the forgetting factor adaptively, so as to train a classifier with a better effect.

2.9.3. Sensitivity Analysis of Parameter θ. In order to further explore the influence of parameter θ on the experimental results, the first 6 datasets in the abovementioned experiments are taken to test the accuracy of the HDFC-ISVM∗ algorithm. The experimental results represent the accuracy (%) of the algorithm on the different testing sets, with the values of θ being 0.1, 0.2, 0.3, and 0.4, respectively. The experimental results are shown in Table 16.

It can be seen from the results in Table 16 that different values of θ have a certain degree of influence on the experimental results. When the value of θ increases from 0.1 to 0.4, the classification accuracy for each dataset also fluctuates. In general, when the value of θ is 0.3, the algorithm performance is optimal. Further increasing the value of θ does not increase the classification accuracy of the algorithm; instead, it affects the algorithm's perception of the overall distribution of the datasets and reduces the classification accuracy of the algorithm because too many samples are deleted.
Figure 8: The comparison of F1-score values of the different algorithms for all datasets. (a) The radar graph of the F1-score values and (b) the comparison diagram of the F1-score means.
Table 14: The Train_ACC (%) values comparison of all the algorithms for different datasets.
Datasets | Simple-ISVM | KKT-ISVM | CSV-ISVM | GGKKT-ISVM | CD-ISVM | HDFC-ISVM | HDFC-ISVM∗
Breast_cancer 94.78 95.0 97.69 96.50 97.85 96.12 98.86
German 94.63 93.23 98.85 95.60 97.70 97.81 99.50
Heart 99.75 98.63 99.60 98.89 98.90 99.57 100
Image 98.42 96.10 99.79 97.56 98.30 98.72 99.47
Mushroom 99.5 98.75 98.86 98.76 98.72 99.71 99.96
Thyroid 97.52 97.78 97.65 97.80 99.40 98.79 99.44
Titanic 76.94 81.98 77.03 81.85 81.83 80.75 82.33
Splice 90.84 92.26 90.19 92.20 92.19 92.26 92.26
Diabetes 94.33 94.73 94.95 94.10 94.20 94.98 97.81
Credit 86.52 85.12 85.60 85.60 85.80 86.01 86.09
Spambase 78.81 79.73 79.05 79.80 79.81 79.72 80.56
Waveform 68.34 65.67 68.97 67.20 68.53 68.97 71.28
Mean 90.03 89.92 90.68 90.49 91.10 91.11 92.29
Table 15: The Test_ACC (%) values comparison of all the algorithms for different datasets.
Datasets | Simple-ISVM | KKT-ISVM | CSV-ISVM | GGKKT-ISVM | CD-ISVM | HDFC-ISVM | HDFC-ISVM∗
Breast_cancer 89.91 93.70 94.50 93.95 94.20 91.16 96.62
German 89.22 91.97 93.10 92.00 93.20 94.99 99.13
Heart 94.32 98.61 98.34 98.50 98.65 99.12 99.98
Image 93.70 95.34 93.83 95.40 95.56 97.50 99.24
Mushroom 90.61 90.15 90.18 90.30 90.25 90.53 99.14
Thyroid 93.22 93.84 99.16 95.20 99.20 98.77 99.26
Titanic 74.85 80.65 75.45 76.23 76.30 76.38 79.21
Splice 87.29 90.34 90.16 90.18 90.20 90.33 90.31
Diabetes 91.89 91.49 91.81 91.50 91.75 91.90 93.66
Credit 85.17 83.15 83.14 83.20 84.40 85.16 85.17
Spambase 76.9 77.24 76.99 77.35 77.87 77.13 78.53
Waveform 66.76 64.84 66.77 66.78 66.85 66.78 69.40
Mean 86.15 87.61 87.78 87.55 88.20 88.31 90.80
Figure 9: Precision comparison diagram of the testing sets before and after the HDFC-ISVM algorithm improvement.
Table 16: Classification accuracy of the HDFC-ISVM∗ algorithm with different values of parameter θ.
Datasets | θ = 0.1 | θ = 0.2 | θ = 0.3 | θ = 0.4
Breast_cancer 95.22 96.23 96.62 96.23
German 97.31 99.01 99.13 96.67
Heart 99.45 100.00 99.98 100.00
Image 97.26 99.25 99.24 99.08
Mushroom 96.73 98.99 99.14 98.77
Thyroid 98.51 99.26 99.26 98.66
Figure 10: The incremental learning training precision graph of the HDFC-ISVM∗ algorithm.
2.9.4. Application of HDFC-ISVM∗ in Image Detection. In order to explore the actual effect of the HDFC-ISVM∗ algorithm in image classification, this paper adopts the "catsvsdogs" dataset provided by Kaggle as the training dataset, and 5000 images are selected for classification to explore the effect of the proposed algorithm in image classification. In this experimental training set, 2500 pictures of cats and 2500 pictures of dogs are selected for training. In this experiment, 20% of the training pictures are extracted by the method of 5-fold cross validation, the AlexNet convolutional neural network [28] is used to extract the image features, and 4096-dimensional features are finally extracted as the input data.
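As a sketch of the preprocessing step described above (not the authors' code), the 4096-dimensional AlexNet features can be obtained with torchvision by cutting off the final fully connected layer; this assumes a recent torchvision release with the weights API.

```python
# Sketch: extract 4096-dimensional AlexNet features for the "catsvsdogs"
# images, to be used as input vectors for the HDFC-ISVM* classifier.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

alexnet = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
# drop the last Linear layer (4096 -> 1000) so the output is the 4096-d feature
alexnet.classifier = torch.nn.Sequential(*list(alexnet.classifier.children())[:-1])
alexnet.eval()

preprocess = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor(),
                        T.Normalize(mean=[0.485, 0.456, 0.406],
                                    std=[0.229, 0.224, 0.225])])

def extract_features(image_path: str) -> torch.Tensor:
    img = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return alexnet(img).squeeze(0)    # 4096-dimensional feature vector
```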
Figure 11: The partial classification results display of the HDFC-ISVM∗ algorithm for the dataset "catsvsdogs."
Table 17: The classification comparison results of AlexNet and HDFC-ISVM∗ for the dataset "catsvsdogs."
Results | AlexNet | HDFC-ISVM∗
Total time (s) | 2593 | 514
Test_ACC (%) | 95.23 | 97.93

The cat is marked as −1 and the dog as +1. The dataset is divided into 25 incremental learning units, each batch has 200 image samples, and the experiment is carried out using the HDFC-ISVM∗ algorithm to obtain data such as the running time and the test accuracy. The specific experimental results are as follows.

Figure 10 shows the incremental learning training precision of the HDFC-ISVM∗ algorithm for the dataset "catsvsdogs." The first row in Figure 11 shows some of the correctly classified images, and the second row shows some of the incorrectly classified images. It can be seen from the experimental results that the HDFC-ISVM∗ algorithm achieves a good classification effect. In this experiment, a convolutional neural network algorithm, the AlexNet algorithm, is also compared with the HDFC-ISVM∗ algorithm proposed in this paper. The comparison results are shown in Table 17. It can be seen from Table 17 that the HDFC-ISVM∗ algorithm has higher classification accuracy and better classification efficiency for the image dataset "catsvsdogs."

3. Conclusion

In this paper, an improved incremental learning algorithm, HDFC-ISVM, is proposed, which achieves a good classification effect. On this basis, aiming at the sensitivity of the parameters, the initialization strategy and update rule of the forgetting factor are adjusted to some extent, and an improved algorithm, the HDFC-ISVM∗ algorithm, is proposed at last.

The algorithm has the following innovations:
(1) It uses the distance formula in the high-dimensional space to better express the spatial distribution law of the samples;
(2) A forgetting factor screening method is proposed and relevant screening strategies are formulated to retain, as far as possible, the part of the dataset that may become support vectors, so as to improve the classification accuracy. On this basis, the initialization strategy and update rule of the forgetting factor are further adjusted. The experimental results show that the HDFC-ISVM∗ algorithm has a good classification effect on most datasets and has the same sensitivity to positive and negative samples. The experiments verify that the HDFC-ISVM∗ algorithm has higher average classification accuracy for the training sets and testing sets compared with the other algorithms.

Finally, the experimental results show that the HDFC-ISVM∗ algorithm has better generalization performance and classification effects than the other ISVM algorithms and can be correctly applied to image classification. In the image detection classification experiment, HDFC-ISVM∗ is compared with the relatively new convolutional neural network algorithm, the AlexNet algorithm, for the image dataset "catsvsdogs." The results prove that the HDFC-ISVM∗ algorithm has higher classification accuracy and a better classification effect than the AlexNet algorithm.

The HDFC-ISVM∗ incremental learning algorithm proposed in this paper has good classification accuracy and a good classification effect. However, it can be seen from Tables 14 and 15
that for the datasets "waveform," "spambase," and "credit," the classification accuracy of the HDFC-ISVM∗ algorithm is better than that of the other algorithms, but the overall classification accuracy is not very high. These datasets have features such as uneven positive and negative sample sizes to varying degrees and high-dimensional data, which may be the reason why most ISVM algorithms have low classification accuracy for these imbalanced datasets, especially for highly imbalanced datasets: in each round of incremental learning, the distribution of the training samples is very different from the distribution of the overall samples due to the extreme imbalance of these datasets, so the accuracy of the classifier trained by the incremental learning algorithm is reduced. Therefore, for the incremental learning of imbalanced datasets, especially those with large differences in the number of positive and negative samples, further research work can be carried out in the future. For example, we can consider assigning different weights to the forgetting factors of the training samples in each round of incremental learning, especially considering the huge difference in the number of positive and negative samples. The optimization of an appropriate forgetting factor updating strategy will make the training samples in each round of incremental learning have, as far as possible, the same distribution characteristics as the samples in the original total dataset, so as to improve the incremental learning effect on such imbalanced datasets. In addition, future research work can continue to explore the application of the ISVM algorithm in image classification, outlier detection, and other fields.

Data Availability

The datasets used to support the findings of this study have been deposited in the UCI repository and are openly available (the URL of the UCI repository is https://archive.ics.uci.edu/ml/index.php). In addition, the dataset "catsvsdogs" was provided by Kaggle (the URL of the dataset "catsvsdogs" is https://www.kaggle.com/datasets).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Natural Science Foundation of Shaanxi Province (Project No. 2022JM-409) and the Key R&D Program in Shaanxi Province (Project No. 2021GY-084).

References

[1] C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, no. 3, pp. 273–297, 1995.
[2] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121–167, 1998.
[3] S. Abe, Support Vector Machines for Pattern Classification, Springer, Berlin, Germany, 2005.
[4] S. S. Keerthi, S. K. Shevade, C. Bhattacharyya, and K. R. K. Murthy, "Improvements to Platt's SMO algorithm for SVM classifier design," Neural Computation, vol. 13, no. 3, pp. 637–649, 2001.
[5] Q. Wu, Research on Extended Support Vector Machine Algorithm, Science Press, Beijing, China, 2015.
[6] F. Zhu, Research on Some Issues in Support Vector Machines, Nanjing University of Science & Technology, Nanjing, China, 2019.
[7] B. B. Hazarika, D. Gupta, and P. Borah, "An intuitionistic fuzzy kernel ridge regression classifier for binary classification," Applied Soft Computing, vol. 112, no. 4, Article ID 107816, 2021.
[8] D. Gupta and U. Gupta, "On robust asymmetric Lagrangian ν-twin support vector regression using pinball loss function," Applied Soft Computing, vol. 102, no. 3, Article ID 107099, 2021.
[9] B. B. Hazarika and D. Gupta, "Density weighted twin support vector machines for binary class imbalance learning," Neural Processing Letters, vol. 54, no. 2, pp. 1091–1130, 2022.
[10] A. Soula, K. Tbarki, R. Ksantini, S. B. Said, and Z. Lachiri, "A novel incremental Kernel Nonparametric SVM model (iKN-SVM) for data classification: an application to face detection," Engineering Applications of Artificial Intelligence, vol. 89, Article ID 103468, 2020.
[11] T. L. Tang, Research on Support Vector Machine Incremental Learning, Zhejiang University of Technology, Hangzhou, China, 2018.
[12] C. Hou and Z. H. Zhou, "One-pass learning with incremental and decremental features," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 11, pp. 2776–2792, 2018.
[13] R. Mello, M. R. Stemmer, and A. L. Koerich, "Incremental and decremental fuzzy bounded twin support vector machine," Information Sciences, vol. 526, pp. 20–38, 2020.
[14] R. Chitrakar and C. Huang, "Selection of candidate support vectors in incremental SVM for network intrusion detection," Computers & Security, vol. 45, pp. 231–241, 2014.
[15] R. Xiao, J. C. Wang, and Z. X. Sun, "An approach to incremental SVM learning algorithm," Journal of Nanjing University (Natural Science Edition), vol. 38, no. 2, pp. 152–157, 2002.
[16] W. J. Wang, "A redundant incremental learning algorithm for SVM," in Proceedings of the International Conference on Machine Learning & Cybernetics, pp. 734–738, Helsinki, Finland, June 2008.
[17] M. H. Yao, X. M. Lin, and X. B. Wang, "Fast incremental learning algorithm of SVM with locality sensitive hashing," Computer Science, vol. B11, pp. 88–91, 2017.
[18] X. Zhang, J. M. Zhao, H. Z. Teng, and G. Liu, "A novel faults detection method for rolling bearing based on RCMDE and ISVM," Journal of Vibroengineering, vol. 21, no. 8, pp. 2148–2158, 2019.
[19] G. J. Chen, Research on the Key Issues and Applications of Anomaly Detection Based on Support Vector Machines, Taiyuan University of Technology, Taiyuan, China, 2016.
[20] Y. F. Li, B. Su, and G. S. Liu, "An incremental learning algorithm for SVM based on combined reserved set," Journal of Shanghai Jiaotong University, vol. 50, no. 7, pp. 1054–1059, 2016.
[21] L. Y. Li, Research on Incremental Learning of Support Vector Machine Based on Robustness, Wuhan Textile University, Wuhan, China, 2018.