In this section, we present the simulation results of the proposed DU-OS-ELM applied to the GUR prediction model and compare its performance with that of ELM, OS-ELM and FOS-ELM. All the data in the experiments come from the actual production of a steel works located in China. In addition, in order to verify the validity of the USS, we employ a DU-OS-ELM without the USS, named DOS-ELM, as a comparison. The sigmoidal additive activation function, i.e., G(a, b, x) = 1/(1 + exp(−(a · x + b))), is adopted in the following experiments, where the input weights and the biases are randomly chosen from the range [−1, 1]. All the experiments are carried out in the MATLAB 7.11.0 environment on a desktop computer equipped with an AMD Athlon(tm) II X2 250 processor operating at 3.00 GHz and 2.00 GB of RAM.
5.1. Data Preprocessing
The data are collected from a medium-size BF with an inner volume of about 2500 m3. After removing outlier values, 1500 data pairs are collected in total. The first 1200 input-output instances are used for training, while the remaining instances are used for testing to evaluate the performance of the model.
Next, we calculate some important statistical properties of the selected data, namely the maximum, minimum, mean and standard deviation (SD), and analyze them to gain a deeper understanding of the variables. The results are shown in Table 2 and Table 3, respectively.
Table 2 lists the statistical properties of GUR for the training and testing sets. According to Table 2, the two sets have different statistical properties, which makes the evaluation of the predicted results more convincing. Table 3 details the statistical properties of the input variables; they show that the selected data fluctuate violently.
Figure 5 shows the series of GUR and blast volume measured from the BF. According to Table 3 and Figure 5, the magnitudes of the variables clearly differ greatly. In fact, variables with a large magnitude have a larger effect on the modeling than those with a small magnitude, so it is not appropriate to build the model on the raw data directly [47]. Therefore, before the experiments, all the data are normalized into (0, 1) to eliminate the influence of dimension among the variables; the method is given by Equation (A3) in Appendix C.
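Such a min-max normalization into (0, 1) can be sketched as follows; this is a generic illustration, not a reproduction of Equation (A3), and the small margin eps is our own choice to keep values strictly inside the open interval:

```python
import numpy as np

def minmax_normalize(x, eps=1e-3):
    """Scale each column of x into the open interval (0, 1).

    eps keeps the values strictly inside (0, 1), which suits
    sigmoid-based models; eps is an illustrative assumption.
    """
    x = np.asarray(x, dtype=float)
    x_min = x.min(axis=0)
    x_max = x.max(axis=0)
    return eps + (1 - 2 * eps) * (x - x_min) / (x_max - x_min)

# Example: two variables with very different magnitudes,
# e.g. blast volume vs. GUR
data = np.array([[1200.0, 0.45],
                 [1500.0, 0.50],
                 [1350.0, 0.48]])
scaled = minmax_normalize(data)
```

After scaling, both columns lie on the same magnitude, so neither dominates the model fitting.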
5.2. Performance Evaluation
To construct the GUR prediction model using the proposed DU-OS-ELM, the related parameters must be determined first. DU-OS-ELM has three important parameters: the regularization factor, the number of hidden nodes L, and the step size controlling the forgetting rate. The regularization factor is fixed after some preliminary experiments. The number of hidden nodes with the minimum training error is taken as the optimal L [14]. The selection of the optimal L for DU-OS-ELM is shown in Figure 6, where the green and blue lines (left axis) stand for the training and testing errors (averaged over 50 trials) and the red line (right axis) stands for the training time. In this simulation, L is increased from 1 to 50: in small steps from 1 to 15, and in steps of 5 from 15 to 50. As shown in Figure 6, as L increases, the root mean squared error (RMSE) of the model gradually decreases while the training time increases. The lowest testing error is achieved when L lies within the range (15, 50), where the RMSE curves are smooth, so the optimal L can be selected from this range. Considering both computational complexity and testing accuracy, we choose L = 20 as a good trade-off.
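This sweep over the number of hidden nodes can be sketched with a minimal batch ELM (random input weights and biases in [−1, 1], sigmoid hidden layer, least-squares output weights). The toy data stand in for the GUR set, and the candidate grid mirrors the steps described above; all names here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_fit(X, T, L):
    """Batch ELM: random hidden layer, least-squares output weights."""
    W = rng.uniform(-1, 1, (X.shape[1], L))   # input weights in [-1, 1]
    b = rng.uniform(-1, 1, L)                 # biases in [-1, 1]
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))    # sigmoid hidden output
    beta = np.linalg.pinv(H) @ T              # Moore-Penrose solution
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

def rmse(y, t):
    return float(np.sqrt(np.mean((y - t) ** 2)))

# Toy regression task standing in for the GUR data
X = rng.uniform(0, 1, (300, 3))
T = np.sin(2 * np.pi * X[:, :1]) + 0.1 * X[:, 1:2]
Xtr, Ttr, Xte, Tte = X[:240], T[:240], X[240:], T[240:]

# Sweep L in steps of 1 up to 15, then in steps of 5 up to 50
candidates = list(range(1, 16)) + list(range(20, 51, 5))
errors = {}
for L in candidates:
    W, b, beta = elm_fit(Xtr, Ttr, L)
    errors[L] = rmse(elm_predict(Xte, W, b, beta), Tte)
best_L = min(errors, key=errors.get)   # L with the lowest testing RMSE
```

In the paper the final choice additionally weighs training time against accuracy, which is why L = 20 is preferred over larger values in the flat region of the RMSE curve.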
One of the contributions of the proposed DU-OS-ELM is the utilization of a novel DFF. In Equation (19), the step size is an important user-specified parameter that also needs to be determined. Figure 7 presents the testing accuracy (in terms of RMSE) for step sizes in the range [1, 10], increased in steps of 1. As observed from Figure 7, the minimum testing RMSE is obtained when the step size equals 5, which is therefore used in the following experiments.
The convergence of DU-OS-ELM is analyzed theoretically in Section 3.3, and a simulation is also performed to compare the convergence speed of DU-OS-ELM, DOS-ELM and OS-ELM. In this simulation, the number of initial training data is 100 and the size of the block of data learned in each step is 20.
Figure 8 details the changing trends of DU-OS-ELM, DOS-ELM and OS-ELM. As the incremental learning proceeds, the testing error tends to decrease, and DU-OS-ELM obtains the smallest testing error with the fastest convergence speed, which illustrates that the proposed DU-OS-ELM has better dynamic tracking ability thanks to the DFF and the USS. DOS-ELM (black line) becomes stable when the number of increments approximately equals 35, whereas OS-ELM (blue line) becomes stable when the number of increments approximately equals 50; thus DOS-ELM converges faster than OS-ELM. In addition, the red line is smoother than the others, which implies that DU-OS-ELM is more suitable for time-varying environments. In summary, the proposed DU-OS-ELM converges faster and is more effective in time-varying environments.
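The sequential learning underlying these approaches can be sketched as follows. The recursion below is the standard OS-ELM update extended with a scalar forgetting factor lam; the paper's dynamical forgetting factor of Equation (19) and the USS are not reproduced here, and the class and parameter names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

class ForgettingOSELM:
    """OS-ELM with a forgetting factor (a sketch, not the paper's DU-OS-ELM)."""

    def __init__(self, n_in, L):
        self.W = rng.uniform(-1, 1, (n_in, L))  # fixed random input weights
        self.b = rng.uniform(-1, 1, L)          # fixed random biases
        self.beta = None                        # output weights
        self.P = None                           # inverse covariance matrix

    def _hidden(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def init_batch(self, X0, T0):
        """Initial phase on the first N0 samples (N0 >= L)."""
        H0 = self._hidden(X0)
        self.P = np.linalg.inv(H0.T @ H0)
        self.beta = self.P @ H0.T @ T0

    def update(self, X, T, lam=1.0):
        """Recursive update on a new chunk; lam < 1 discounts old data."""
        H = self._hidden(X)
        K = np.linalg.inv(lam * np.eye(len(X)) + H @ self.P @ H.T)
        self.P = (self.P - self.P @ H.T @ K @ H @ self.P) / lam
        self.beta = self.beta + self.P @ H.T @ (T - H @ self.beta)

    def predict(self, X):
        return self._hidden(X) @ self.beta

# Toy run mirroring the simulation setup: 100 initial samples,
# then chunks of 20 learned one by one
model = ForgettingOSELM(n_in=2, L=10)
X = rng.uniform(0, 1, (200, 2))
T = (X[:, :1] + X[:, 1:2]) ** 2
model.init_batch(X[:100], T[:100])
for k in range(100, 200, 20):
    model.update(X[k:k + 20], T[k:k + 20], lam=0.98)
```

With lam = 1 this reduces to plain OS-ELM; a value below 1 weights recent chunks more heavily, which is what gives the forgetting-based variants their tracking ability.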
The changes of the DFF in Equation (19) for DU-OS-ELM (red line) and DOS-ELM (black line) are depicted in Figure 9. The DFF of DOS-ELM is updated iteratively depending on the prediction error. In contrast, owing to the USS, the DFF of DU-OS-ELM is not updated in every iteration; for example, there are constant segments when the number of increments equals 23, 34 and 50. This demonstrates the effectiveness of the USS and indicates that there is no need to update the model at every time step. Moreover, the changes of the DFF reflect the changing status of the production process.
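The idea of skipping unnecessary updates can be sketched generically as follows. The actual selection rule of the USS is defined in the paper; the tolerance tol, the RMSE-based error measure, and the toy running-mean "model" below are all illustrative assumptions:

```python
import numpy as np

def selective_update(predict, update, X_chunk, T_chunk, tol=0.05):
    """Hypothetical update-selection step: perform the recursive
    update only when the chunk's prediction RMSE exceeds tol."""
    err = float(np.sqrt(np.mean((predict(X_chunk) - T_chunk) ** 2)))
    if err > tol:
        update(X_chunk, T_chunk)   # model (and DFF) updated this step
        return True
    return False                   # skipped: a constant DFF segment

# Toy demonstration with a running-mean "model"
state = {"mean": 0.0}
predict = lambda X: np.full(len(X), state["mean"])
update = lambda X, T: state.update(mean=float(np.mean(T)))

chunk = np.ones(10)
did = selective_update(predict, update, chunk[:, None], chunk, tol=0.05)
```

On the first chunk the error exceeds the tolerance, so the model updates; a second call on the same chunk is skipped, which corresponds to the constant segments of the DFF curve in Figure 9.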
Figure 10 shows the comparison results of DU-OS-ELM, DOS-ELM, FOS-ELM and OS-ELM in a single trial. The parameter of FOS-ELM is set as s = 4. The number of initial data is 100 and the size of the block of data learned in each step is 10 for DU-OS-ELM, DOS-ELM and OS-ELM. The black line is the actual output. As observed from Figure 10, the predicted results of the four approaches are basically consistent with the actual trend of GUR, but they differ in the degree of agreement. The predicted result of OS-ELM shows the lowest agreement with the actual values. The predicted results of DOS-ELM, FOS-ELM and OS-ELM differ little from one another, but DU-OS-ELM is better than all three. To show the results more clearly, the intervals between samples 60 and 100 and between 220 and 250 are magnified. As can be seen from the magnified views in Figure 10, DU-OS-ELM is closer to the actual output than the other three approaches, which shows that the proposed DU-OS-ELM provides more accurate predictions.
Figure 11 presents the correlation between the predicted values of the different approaches and the desired values. As observed from Figure 11, most of the predicted values of the proposed DU-OS-ELM lie close to the line y = x, while the predicted values of the other approaches are relatively far away from it. In addition, the correlation coefficients of DU-OS-ELM, DOS-ELM, FOS-ELM and OS-ELM equal 0.8813, 0.7301, 0.7275 and 0.6613, respectively, which indicates that the proposed DU-OS-ELM has better prediction performance than the others.
To evaluate the performance of the proposed DU-OS-ELM quantitatively, the following frequently used criteria are computed for the experimental results: training time, testing time, training RMSE, testing RMSE, mean absolute percentage error (MAPE) [48] and SD [6,7,15]. RMSE is an excellent general-purpose error metric for prediction models. MAPE is a relative value that usually expresses accuracy as a percentage; the smaller the RMSE and MAPE, the better the prediction accuracy. SD measures the dispersion of the results over repeated trials; the smaller the SD, the more stable the algorithm. The mathematical definitions of the three statistics are given in Equations (A4)–(A6) in Appendix C.
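Assuming the standard definitions (Equations (A4)–(A6) themselves appear in Appendix C), these metrics can be sketched as:

```python
import numpy as np

def rmse(pred, actual):
    """Root mean squared error (cf. Equation (A4))."""
    return float(np.sqrt(np.mean((pred - actual) ** 2)))

def mape(pred, actual):
    """Mean absolute percentage error in percent (cf. Equation (A5));
    assumes the actual values are nonzero."""
    return float(np.mean(np.abs((pred - actual) / actual)) * 100)

def sd(results):
    """Sample standard deviation over repeated trials (cf. Equation (A6))."""
    return float(np.std(results, ddof=1))

# Illustrative GUR-like values (not data from the paper)
actual = np.array([0.48, 0.50, 0.47, 0.49])
pred = np.array([0.47, 0.51, 0.47, 0.50])
```

Smaller RMSE and MAPE mean better accuracy, and a smaller SD over the 50 trials means a more stable algorithm.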
The performance of the proposed DU-OS-ELM is evaluated in the one-by-one learning mode and in the chunk-by-chunk learning mode with chunk sizes of 5 and 10. Fifty trials are carried out in each case; the averaged results are summarized in Table 4. According to Table 4, the training time decreases as the chunk size increases, while the testing accuracy changes only slightly. For the same learning mode, the training (learning) time of DU-OS-ELM is somewhat longer than that of DOS-ELM due to the USS, but since the output weights are not always updated, the increase is not obvious. Overall, the prediction accuracy of DU-OS-ELM is better than that of DOS-ELM. Both approaches have small SDs, but the SD of DU-OS-ELM is smaller, showing that it is more stable. This analysis again proves the effectiveness of the USS in DU-OS-ELM for time-varying environments.
The comparison results of the different approaches are given in Table 5. Since batch ELM cannot learn sequentially, it is trained on the entire training set, while the other approaches are updated online with the new data. A fixed chunk size of 10 is selected for DU-OS-ELM, DOS-ELM and OS-ELM in the chunk-by-chunk learning mode. It can be found from Table 5 that DU-OS-ELM achieves the best generalization performance, with the smallest RMSE and MAPE among the compared approaches. In terms of training time, batch ELM takes the least time. After adding the DFF and the USS, the training time of DU-OS-ELM is slightly longer than that of FOS-ELM but shorter than that of OS-ELM, which shows that DU-OS-ELM does not add much computational complexity and is easy to implement. Moreover, the time consumption of DU-OS-ELM is acceptable in actual production. Therefore, DU-OS-ELM achieves accurate GUR prediction and satisfies the production requirements.