4.3.1. Results on Designed Models
In this part, we present the experimental results of our designed models. The pruning results on Conv1, Conv2, Conv3 and Conv4 are shown in Table 3, Table 4, Table 5 and Table 6, respectively. Moreover, we plot the Pareto fronts of EMFP on the four designed models in Figure 3.
First of all, Figure 3a shows that EMFP yields a series of trade-off pruned models: the remaining filter ratio lies in the range [0.14, 0.29], and most of the models have an error below 0.1. The comparison with other methods is presented in Table 3, where EMOFP obtains better pruned models, especially before fine-tuning. In Table 3, the pruned models of all methods keep 18, 13 and 9 filters, while the original model has 64. Although the pruned models have the same filter counts, the pruning schemes differ, and so do the resulting errors; EMOFP always achieves much lower errors than the other methods. After fine-tuning the pruned models with the same training strategy, all methods obtain acceptable final models whose errors are similar to, or slightly larger than, that of the original model. As the compression ratio increases, the error grows for both the pruned and the fine-tuned models. For EMOFP, the fine-tuned models outperform the original model except when the CR is 7.11; even when the pruned model keeps only nine filters, the error after fine-tuning is 0.0136, only slightly larger than the original model's 0.0122. Moreover, the FLOPs of the pruned models are significantly lower than those of the original model, and the FLOPs of the smallest pruned model are only 14% of the original. Thus, for the designed Conv1, EMOFP generally performs well in obtaining a lightweight model.
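For reference, the filter compression ratios in these tables follow directly from the filter counts; the short sketch below reproduces the arithmetic for Conv1 (the function name and layout are ours, not from the paper):

```python
def filter_compression_ratio(original_filters, remaining_filters):
    """Ratio of total original filters to total remaining filters."""
    return sum(original_filters) / sum(remaining_filters)

# Conv1 has a single convolutional layer with 64 filters; the pruned models
# in Table 3 keep 18, 13 and 9 filters, respectively.
for kept in (18, 13, 9):
    cr = filter_compression_ratio([64], [kept])
    print(f"{kept:2d} filters kept -> CR = {cr:.2f}")
# 18 filters kept -> CR = 3.56
# 13 filters kept -> CR = 4.92
#  9 filters kept -> CR = 7.11
```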
Secondly, Figure 3b shows the Pareto front of EMFP on Conv2. From the figure, EMFP obtains a series of uniformly distributed trade-off solutions with acceptable errors. The remaining filter ratios of these solutions lie in the range [0.15, 0.5], which satisfies the corresponding parameter settings, and their maximum error is approximately 0.25, which is acceptable and can be greatly reduced by fine-tuning. More detailed results on Conv2 are presented in Table 4. The difference from the Conv1 experiments is that the comparison methods consist of the -global, -layer, -global and -layer variants. From the table, the configuration and error of the original model are (32, 64) and 0.0083, respectively, and all methods prune the model at three filter compression ratios: 2.04, 3.31 and 5.65. Although the CRs are the same, the detailed pruning schemes differ, especially between the different pruning ratio assignment strategies. In terms of the error of the pruned models, EMOFP obtains better results than all comparison methods except at the filter compression ratio of 2.04, and its advantage over the other methods grows as the pruning ratio increases. In terms of the error of the fine-tuned models, the results of all methods are similar and close to the error of the original model. Moreover, the fine-tuned error of EMOFP is lower than the original error except for the third pruning scheme, and EMOFP performs better than the comparison methods in most cases.
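To make the distinction between global and layer-wise pruning ratio assignment concrete, the sketch below contrasts the two strategies on Conv2's (32, 64) configuration; the random importance scores are purely illustrative stand-ins for whatever criterion a comparison method actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative importance scores for Conv2's two layers (32 and 64 filters).
scores = [rng.random(32), rng.random(64)]
keep_ratio = 0.5  # target fraction of filters to keep overall

# Layer-wise assignment: keep the same fraction of filters in every layer.
layer_wise = [int(round(keep_ratio * len(s))) for s in scores]

# Global assignment: rank all filters together and keep the top fraction,
# so layers with weaker filters can lose disproportionately many of them.
threshold = np.quantile(np.concatenate(scores), 1.0 - keep_ratio)
global_wise = [int((s >= threshold).sum()) for s in scores]

print("layer-wise filters kept per layer:", layer_wise)   # [16, 32]
print("global filters kept per layer:    ", global_wise)  # depends on the scores
```

Either strategy can reach the same overall compression ratio, but the per-layer configurations, and hence the pruned-model errors in Table 4, differ.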
Thirdly, the comparison results and the Pareto front are shown in Table 5 and Figure 3c. The Pareto front of EMFP on Conv3 is not very smooth but is uniformly distributed. From the figure, the remaining filter ratio lies in the range [0.24, 0.42] and the error lies in the range [0.1, 0.6]. Generally, the results of EMFP are reasonable; however, the range of the pruning ratio is somewhat narrow, especially for high pruning ratio solutions. Table 5 shows that EMOFP is always better than the four comparison methods regardless of the pruning scheme. The configuration of the original model is (16, 32, 64), i.e., 112 filters in total, and the pruned models keep approximately 45, 34 and 27 filters, respectively; the pruned configurations become 46, 34 and 28 filters when the pruning ratio assignment is layer-wise. The error of the original model is 0.0071, and the errors of the final models of all methods are worse than that of the original model, although our method performs better than the comparison methods. Nevertheless, the errors of the final models are perfectly acceptable: even at the maximum filter compression ratio of 4.15, the maximum error of our fine-tuned model is 0.0104. When comparing the errors of the pruned models before fine-tuning, the error of EMOFP is clearly smaller than that of the comparison methods, which shows that EMOFP outperforms the comparison methods on Conv3. In terms of FLOPs, EMOFP obtains a lightweight model that requires only approximately 20% of the FLOPs of the original model.
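The FLOPs reduction is larger than the filter reduction because a convolutional layer's cost scales with both its input and output channel counts, so pruning consecutive layers compounds. A rough sketch of this scaling, assuming 3×3 kernels and a fixed feature-map size (pooling and the exact layer shapes of Conv3 are ignored, so the numbers are illustrative only):

```python
def conv_stack_flops(filters_per_layer, in_channels=1, kernel=3, feature_map=32):
    """Rough multiply-accumulate count for a stack of convolutional layers."""
    total, c_in = 0, in_channels
    for c_out in filters_per_layer:
        total += kernel * kernel * c_in * c_out * feature_map * feature_map
        c_in = c_out
    return total

original = conv_stack_flops([16, 32, 64])  # Conv3's original configuration
halved = conv_stack_flops([8, 16, 32])     # hypothetical: keep half the filters per layer
print(f"remaining FLOPs: {halved / original:.0%}")  # about 25%, not 50%
```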
Finally, we present the comparison results and the Pareto front on Conv4 in Table 6 and Figure 3d, respectively. In Figure 3d, the Pareto front of EMFP on Conv4 is not very good because the front is not smooth enough and the range of the remaining filter ratio is not wide enough. The smallest pruned model keeps over 30% of the filters, and its error is approximately 0.8 before fine-tuning; the largest pruned model keeps approximately 55% of the filters, with an error of approximately 0.1. Detailed comparison results are given in Table 6. The original model has 176 filters in total with the configuration (16, 32, 64, 64), and its error is 0.0065. Although the detailed configurations of the pruned models differ across methods, the number of filters of each model under the same pruning ratio is similar, approximately 97, 80 and 60, respectively, and the corresponding filter compression ratios of our method are 1.81, 2.2 and 2.93. Compared with the original model, all fine-tuned models perform worse, although their errors are still acceptable, especially for the models of EMOFP, whose largest fine-tuned error is only 0.0093. In terms of the error of the pruned models, EMOFP performs much better than the comparison methods; for the second pruning ratio, the error of EMOFP is 0.1867 while the best result among the comparison methods is 0.4856. Clearly, EMOFP can prune over 70% of the filters of Conv4 with little performance loss and generally performs better than the comparison methods. Moreover, the average FLOPs of the pruned models are approximately 20% of those of the original model, which shows that EMOFP obtains a lightweight model with acceptable performance.
From the results on the four designed models, we can confirm that EMOFP provides a series of efficient trade-off solutions and performs better than the comparison methods. We can also observe that the pruning performance decreases as the depth of the model increases: among the four Pareto fronts, those of Conv1 and Conv2 are better than those of Conv3 and Conv4. The dimension of the filter pruning problem grows as the model becomes deeper, so the pruning problem becomes harder; for example, the problem dimensions of the four designed models are 64, 96, 112 and 176. Furthermore, the number of filters in each layer is an additional constraint on the optimization, and these constraints become more complex as the number of layers increases. It is therefore increasingly difficult to find solutions with a large filter pruning ratio, and the results show that the largest filter compression ratio decreases as the model becomes more complex. Moreover, the FLOPs of our pruned models are obviously lower than those of the original model, although the FLOPs of EMOFP are not the most competitive; this is because EMOFP optimizes only the number of filters and does not take FLOPs into account. Overall, EMOFP reliably obtains a lightweight model with acceptable performance.
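The problem dimensions quoted above are simply the total filter counts of the models. As an illustration only (the paper's exact encoding and constraint handling may differ), a per-filter binary mask gives one decision variable per filter, subject to the constraint that every layer keeps at least one filter:

```python
import numpy as np

designed_models = {"Conv1": [64], "Conv2": [32, 64],
                   "Conv3": [16, 32, 64], "Conv4": [16, 32, 64, 64]}

def is_feasible(mask, layer_sizes):
    """A candidate is feasible only if every layer keeps at least one filter."""
    start = 0
    for n in layer_sizes:
        if mask[start:start + n].sum() == 0:
            return False
        start += n
    return True

for name, sizes in designed_models.items():
    print(name, "decision variables:", sum(sizes))
# Conv1: 64, Conv2: 96, Conv3: 112, Conv4: 176

mask = np.zeros(96, dtype=int)
mask[:5] = 1  # keeps 5 filters, all in the first layer of Conv2
print(is_feasible(mask, designed_models["Conv2"]))  # False: the second layer is empty
```

Under such an encoding, the search space grows exponentially with the total filter count while the per-layer feasibility constraints multiply, which is consistent with the weaker fronts observed on the deeper Conv3 and Conv4.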
4.3.2. Results on LeNet
In this part, we show the experimental results on LeNet, one of the most familiar convolutional neural networks. Firstly, we plot the Pareto front of EMFP and the fitness of the fine-tuned models corresponding to the Pareto solutions in Figure 4. In Figure 4a, the blue circle dots denote the solutions of EMFP and the red square dots denote the solutions after fine-tuning. EMFP obtains a very good Pareto front: the ranges of the remaining filter ratio and the error both lie within [0, 0.7], and the front is smooth and uniformly distributed. Moreover, since the errors of the models after fine-tuning are all below 0.1, it is difficult to observe the variation of these solutions in Figure 4a; to show them more precisely, we plot a separate scatter plot in Figure 4b. From Figure 4b, the distribution of the fine-tuned solutions approximates a Pareto front. The maximum error of the fine-tuned models is approximately 0.054 at a remaining filter ratio of 0.08, while the minimum error is approximately 0.0085 at a remaining filter ratio of approximately 0.667.
The comparison results on LeNet are presented in Table 7. The configuration and error of the LeNet used here are (8, 16) and 0.0095, respectively. For EMOFP and the comparison filter pruning methods, the minimum filter compression ratio is 1.5 when 16 filters remain, and the maximum filter compression ratio is 12 when only two filters remain. For the methods with global pruning ratio assignment, a normal convolutional neural network cannot be generated because no filter remains in the second convolutional layer. In terms of the error of the pruned models before fine-tuning, EMOFP is much better than all comparison methods, especially as the filter pruning ratio increases. Moreover, in terms of the error of the fine-tuned models, EMOFP is also better than all comparison methods, with the errors of its three pruning schemes being 0.0085, 0.0106 and 0.0541, respectively. Overall, EMOFP obtains a series of valuable trade-off pruning solutions whose FLOPs are far lower than those of the original model.
4.3.3. Results on AlexNet
AlexNet was the deepest convolutional neural network used to examine the performance of EMOFP in our experimental studies; the detailed experimental results are shown in Figure 5 and Table 8. Firstly, we plot the Pareto front of EMFP and a scatter plot of the fine-tuned models in Figure 5. In Figure 5a, the blue circle dots denote the solutions of EMFP and the red square dots denote the solutions after fine-tuning. The Pareto front of EMFP is close to a straight line, and the errors of the Pareto solutions are not small: all of them are greater than 0.5. Moreover, the range of the remaining filter ratio is [0.1, 0.55], which is somewhat narrow. From the Pareto front, we can see that EMFP still provides a series of trade-off pruned models, but it suffers from the difficulties of higher-dimensional optimization. To analyze the final performance of these models, we also plot the fine-tuned models in Figure 5a and show them separately in Figure 5b. The distribution of the fine-tuned solutions approximates a Pareto front, and the errors of the fine-tuned models lie in the range 0.15–0.21; all of them are worse than the original AlexNet. Overall, Figure 5 shows that EMOFP does not perform as well on AlexNet as on the previous models.
We present a detailed comparison of the results on AlexNet in Table 8. The filter configuration of the original AlexNet is (24, 64, 96, 96, 64), with 344 filters in total, and its error is 0.0996. The comparison methods consist of norm-based filter pruning [13], average percentage of zeros (APoZ) [15], soft filter pruning (SFP) [16] and ThiNet [17], where APoZ and SFP are implemented in the one-shot pruning framework and ThiNet belongs to iterative pruning. From Table 8, under the condition of a similar pruned model (pruning approximately 60% of the filters), the performance of EMOFP is reasonable, worse only than SFP and ThiNet. Specifically, the configuration of the pruned model obtained by EMOFP is (9, 20, 39, 43, 24), while that of most comparison methods is (10, 25, 38, 38, 25), because these methods use the same layer pruning ratio. The norm-based filter pruning methods are clearly worse than the others due to their rough estimation of filter importance. SFP prunes filters while training the model, so it can update the weights in time, and ThiNet applies an iterative pruning framework, which usually works better. For a one-shot pruning method, EMOFP achieves an error of 0.1794 on AlexNet, which is acceptable. Moreover, the FLOPs of our pruned model are only about half those of the original model, because a large share of the FLOPs lies in the fully connected layers, which are not pruned.
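The last point can be made concrete with a toy split of the network's cost into a pruned convolutional part and an untouched fully connected part; the absolute FLOP budgets below are placeholders, not measurements of the AlexNet variant used here.

```python
# Placeholder FLOP budgets for the two parts of the network (illustrative only).
conv_flops = 8e8  # convolutional layers, subject to filter pruning
fc_flops = 6e8    # fully connected layers, left untouched

# Keeping roughly 40% of the filters scales conv FLOPs roughly quadratically.
kept_fraction = 0.4 ** 2

pruned_total = conv_flops * kept_fraction + fc_flops
original_total = conv_flops + fc_flops
print(f"remaining FLOPs: {pruned_total / original_total:.0%}")  # about 52%
```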