Abstract
Ensemble pruning can effectively overcome several shortcomings of the classical ensemble learning paradigm, such as its relatively high time and space complexity. However, each predictor has its own unique ability: a predictor may perform poorly on some samples yet very well on others, so blindly underestimating the power of specific predictors is unreasonable. Choosing the best predictor set for each query sample is exactly what dynamic ensemble pruning techniques address. This paper proposes a hybrid Time Series Prediction (TSP) algorithm for the one-step-ahead prediction task, integrating Dynamic Ensemble Pruning (DEP), Incremental Learning (IL), and Kernel Density Estimation (KDE), abbreviated as the EnsPKDE&IncLKDE algorithm. It dynamically selects a proper predictor set based on the kernel density distribution of all base learners’ prediction values. Because samples in TSP problems arrive in chronological order, the idea of IL is naturally introduced into EnsPKDE&IncLKDE, while DEP is a common technique for addressing the concept drift inherent in IL. The algorithm is divided into three subprocesses: 1) overproduction, which generates the original ensemble learning system; 2) Dynamic Ensemble Pruning (DEP), achieved by a subalgorithm called EnsPKDE; and 3) Incremental Learning (IL), realized by a subalgorithm termed IncLKDE. Benefiting from the integration of the DEP scheme, the IL paradigm, and KDE, EnsPKDE&IncLKDE demonstrates prediction performance superior to that of several other state-of-the-art algorithms on time series forecasting tasks in the experiments.
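To make the pruning idea concrete, the following is a minimal sketch (not the paper’s exact criterion) of selecting a predictor subset from the kernel density of the base learners’ predictions for a single query sample; the function name select_by_kde, the Gaussian KDE from SciPy, and the keep_ratio threshold are illustrative assumptions.

```python
import numpy as np
from scipy.stats import gaussian_kde

def select_by_kde(base_predictions, keep_ratio=0.5):
    """Keep the base predictors whose outputs lie in the densest region
    of the prediction distribution for one query sample (illustrative).

    base_predictions : 1-D array, one prediction per base learner.
    keep_ratio       : fraction of learners to retain (assumed threshold).
    """
    preds = np.asarray(base_predictions, dtype=float)
    kde = gaussian_kde(preds)              # density over the learners' outputs
    density = kde(preds)                   # density at each learner's prediction
    k = max(1, int(np.ceil(keep_ratio * len(preds))))
    keep = np.argsort(density)[-k:]        # indices of the k most "agreeing" learners
    return keep, preds[keep].mean()        # pruned subset and its combined forecast

# Example: 7 base learners forecasting the next value of a series
preds = [0.92, 0.95, 0.91, 1.40, 0.94, 0.30, 0.93]
kept, forecast = select_by_kde(preds)
print(kept, forecast)
```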
References
Neto MCA, Cavalcanti GDC, Tsang IR (2009) Financial time series prediction using exogenous series and combined neural networks. In: International Joint Conference on Neural Networks, pp 2578–2585
Bodyanskiy YV, Tyshchenko OK (2020) A hybrid Cascade neural network with ensembles of extended neo-fuzzy neurons and its deep learning. Inf Technol Syst Res Comput Phys 945:164–174
Lim JS, Lee S, Pang HS (2013) Low complexity adaptive forgetting factor for online sequential extreme learning machine (OS-ELM) for application to nonstationary system estimations. Neural Comput Applic 22:569–576
Ye Y, Squartini S, Piazza F (2013) Online sequential extreme learning machine in nonstationary environments. Neurocomputing 116:94–101
Yee P, Haykin S (1999) A dynamic regularized radial basis function network for nonlinear, nonstationary time series prediction. IEEE Trans Signal Process 47:2503–2521
Crone SF, Hibon M, Nikolopoulos K (2011) Advances in forecasting with neural networks? Empirical evidence from the NN3 competition on time series prediction. Int J Forecast 27:635–660
Villarreal J, Baffes P (1993) Time series prediction using neural networks
Castillo O, Melin P (2001) Simulation and forecasting complex economic time series using neural networks and fuzzy logic. IEEE Int Conf Syst 3:1805–1810
Chandra R (2015) Competition and collaboration in cooperative coevolution of Elman recurrent neural networks for time-series prediction. IEEE Trans Neural N Learning Syst 26:3123–3136
Dieleman S, Willett KW, Dambre J (2015) Rotation-invariant convolutional neural networks for galaxy morphology prediction. Mon Not R Astron Soc 450:1441–1459
Gaxiola F, Melin P, Valdez F, Castillo O (2015) Generalized type-2 fuzzy weight adjustment for backpropagation neural networks in time series prediction. Inf Sci 325:159–174
Wang L, Zeng Y, Chen T (2015) Back propagation neural network with adaptive differential evolution algorithm for time series forecasting. Expert Syst Appl 42:855–863
Sotiropoulos D, Kostopoulos A, Grapsa T (2002) A spectral version of Perry’s conjugate gradient method for neural network training. In: Proceedings of the 4th GRACM Congress on Computational Mechanics, pp 291–298
Hinton GE, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov RR (2012) Improving neural networks by preventing co-adaptation of feature detectors. Comput Sci 4:212–223
Huang GB, Zhu QY, Siew CK (2005) Extreme learning machine: a new learning scheme of feedforward neural networks. IEEE Int Joint Confer Neural Networks 2:985–990
Huang GB, Zhu QY, Siew CK (2006) Extreme learning machine: theory and applications. Neurocomputing 70:489–501
Huang GB, Zhou H, Ding X, Zhang R (2012) Extreme learning machine for regression and multiclass classification. IEEE Trans Syst Man Cybernetics Part B 42:513–529
Wang X, Han M (2014) Online sequential extreme learning machine with kernels for nonstationary time series prediction. Neurocomputing 145:90–97
Ye R, Dai Q (2018) A novel transfer learning framework for time series forecasting. Knowl-Based Syst 156:74–99
Hong W, Lei L, Wei F (2016) Time series prediction based on ensemble fuzzy extreme learning machine. In: IEEE International Conference on Information & Automation, pp 2001–2005
Hong W, Wei F, Sun F, Qian X (2015) An adaptive ensemble model of extreme learning machine for time series prediction. In: International Computer Conference on Wavelet Active Media Technology & Information Processing, pp 80–85
Lin L, Fang W, Xie X, Zhong S (2017) Random forests-based extreme learning machine ensemble for multi-regime time series prediction. Expert Syst Appl 83:164–176
Qiu X, Zhang L, Ren Y, Suganthan PN, Amaratunga G (2014) Ensemble deep learning for regression and time series forecasting. In: Computational Intelligence in Ensemble Learning, pp 1–6
Li J, Dai Q, Ye R (2019) A novel double incremental learning algorithm for time series prediction. Neural Comput & Applic 31:6055–6077
Parzen E (1962) On estimation of a probability density function and mode. Ann Math Stat 33:1065–1076
Ley E, Steel MF (1993) Bayesian econometrics: conjugate analysis and rejection sampling. In: Economic and Financial Modeling with Mathematica®. Springer, pp 344–367
Elman JL (1990) Finding structure in time. Cogn Sci 14:179–211
Jordan MI (1997) Serial order: a parallel distributed processing approach. In: Advances in Psychology, vol 121. Elsevier, pp 471–495
Schuster M, Paliwal KK (1997) Bidirectional recurrent neural networks. IEEE Trans Signal Process 45:2673–2681
Hochreiter S, Schmidhuber JR (1997) Long short-term memory. Neural Computation 9:1735–1780
Liang N-Y, Huang G-B, Saratchandran P, Sundararajan N (2006) A fast and accurate online sequential learning algorithm for feedforward networks. IEEE Trans Neural Netw 17:1411–1423
Polikar R, Upda L, Upda SS, Honavar V (2001) Learn++: An incremental learning algorithm for supervised neural networks. IEEE Trans Syst Man Cybernetics, Part C (Applications Rev) 31:497–508
Muhlbaier MD, Topalis A, Polikar R (2008) Learn++.NC: combining Ensemble of Classifiers with Dynamically Weighted Consult-and-Vote for efficient incremental learning of new classes. IEEE Trans Neural Netw 20:152–168
Zhang W, Xu A, Ping D, Gao M (2019) An improved kernel-based incremental extreme learning machine with fixed budget for nonstationary time series prediction. Neural Comput & Applic 31:637–652
Yang Y, Che J, Li Y, Zhao Y, Zhu S (2016) An incremental electric load forecasting model based on support vector regression. Energy 113:796–808
Woloszynski T, Kurzynski M (2011) A probabilistic model of classifier competence for dynamic ensemble selection. Pattern Recogn 44:2656–2668
Zhai JH, Xu HY, Wang XZ (2012) Dynamic ensemble extreme learning machine based on sample entropy. Soft Comput 16:1493–1502
Cruz RM, Sabourin R, Cavalcanti GD, Ren TI (2015) META-DES: a dynamic ensemble selection framework using META-learning. Pattern Recogn 48:1925–1935
Yao H, Wu F, Ke J, Tang X, Jia Y, Lu S et al (2018) Deep multi-view spatial-temporal network for taxi demand prediction. In: Thirty-Second AAAI Conference on Artificial Intelligence
Senanayake R, O’Callaghan S, Ramos F (2016) Predicting spatio-temporal propagation of seasonal influenza using variational Gaussian process regression. In: Thirtieth AAAI Conference on Artificial Intelligence, pp 3901–3907
Venkatraman A, Hebert M, Bagnell JA (2015) Improving multi-step prediction of learned time series models. In: Twenty-Ninth AAAI Conference on Artificial Intelligence, pp 3024–3030
Dasgupta S, Osogami T (2017) Nonlinear dynamic Boltzmann machines for time-series prediction. In: Thirty-First AAAI Conference on Artificial Intelligence
Liu Z, Hauskrecht M (2016) Learning adaptive forecasting models from irregularly sampled multivariate clinical data. In: Thirtieth AAAI Conference on Artificial Intelligence, pp 1273–1279
Zhou ZH, Wu J, Jiang Y (2001) Genetic algorithm based selective neural network ensemble. Int Joint Conf Artif Intell:797–802
Zhou ZH, Wu J, Tang W (2002) Ensembling neural networks: many could be better than all. Artif Intell 137:239–263
He H, Chen S, Li K, Xu X (2011) Incremental learning from stream data. IEEE Trans Neural Netw 22:1901–1914
Gama JO, Sebastião R, Rodrigues PP (2009) Issues in evaluation of stream learning algorithms. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 329–338
Bifet A, Holmes G, Kirkby R, Pfahringer B (2010) Moa: massive online analysis. J Mach Learn Res 11:1601–1604
Bonferroni CE (1935) Il calcolo delle assicurazioni su gruppi di teste. In: Studi in Onore del Professore Salvatore Ortu Carboni. Rome, Italy, pp 13–60
Zhou T, Gao S, Wang J, Chu C, Todo Y, Tang Z (2016) Financial time series prediction using a dendritic neuron model. Knowl-Based Syst 105:214–224
Soares E, Costa P, Costa B, Leite D (2018) Ensemble of evolving data clouds and fuzzy models for weather time series prediction. Appl Soft Comput 64:445–453
Svarer C, Hansen LK, Larsen J (1993) On Design And Evaluation Of Tapped-Delay Neural-Network Architectures. IEEE Int Conf Neural Netw 1–3:46–51
Bezerra CG, Costa BSJ, Guedes LA, Angelov PP (2016) An evolving approach to unsupervised and real-time fault detection in industrial processes. Expert Syst Appl 63:134–144
Kangin D, Angelov P (2015) Evolving clustering, classification and regression with TEDA. In: 2015 International Joint Conference on Neural Networks (IJCNN), pp 1–8
Yao C, Dai Q, Song G (2019) Several novel dynamic ensemble selection algorithms for time series prediction. Neural Process Lett 50:1789–1829
Acknowledgements
This work is supported by the National Key R&D Program of China (Grant Nos. 2018YFC2001600, 2018YFC2001602), and the National Natural Science Foundation of China under Grant no. 61473150.
Ethics declarations
Conflict of interests
The authors declare that they have no conflict of interest.
APPENDIX Online Extreme Learning Machine with Kernels
1.1 Extreme Learning Machine (ELM)
As shown in Fig. 11, the architecture of the Extreme Learning Machine (ELM) [15] is the same as that of a single-hidden-layer feedforward network (SLFN). The weights between its input and hidden layers are initialized to random values, while its output weight matrix is computed analytically. Supposing that there are N arbitrary distinct samples (xi, ti), where xi ∈ Rn and ti ∈ R, ELM can be formulated as:
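Reconstructed here under the standard ELM formulation of [15] (the symbols are defined immediately below), Eq. (13) reads:

\( \sum_{i=1}^{L} w_i\, g(\theta_i \cdot \mathrm{x}_j + b_i) = o_j, \quad j = 1, 2, \dots, N \)  (13)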
where θi ∈ Rn represents the weight vector connecting the input layer to the i-th hidden neuron, bi ∈ R is the bias of the i-th hidden neuron, g(·) is the activation function, wi is the weight vector connecting the i-th hidden neuron to the output neurons, L denotes the number of hidden neurons, and oj denotes the output for the j-th sample. Eq. (13) can be formulated more concisely as:
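In the usual matrix notation (an assumption consistent with the definitions that follow):

\( H\, w = o \)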
where
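(reconstructed under the standard ELM convention) H denotes the hidden-layer output matrix

\( H = \left[ \begin{array}{ccc} g(\theta_1 \cdot \mathrm{x}_1 + b_1) & \cdots & g(\theta_L \cdot \mathrm{x}_1 + b_L) \\ \vdots & \ddots & \vdots \\ g(\theta_1 \cdot \mathrm{x}_N + b_1) & \cdots & g(\theta_L \cdot \mathrm{x}_N + b_L) \end{array} \right]_{N \times L}, \)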
o = [o1, o2, …, oN]T represents the output of ELM, and w = [w1, w2, …, wL]T is the weight matrix connecting the hidden and output layers. The value of w can be calculated by:
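In its standard least-squares form (a reconstruction), this is:

\( w = H^{\dagger}\, t, \)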
where t = [t1, t2, …, tN]T denotes the vector of target values, and H† represents the Moore–Penrose generalized inverse of H.
1.2 Extreme Learning Machine with Kernels (ELMK)
ELMK [16] is an extension of ELM. Define h(x) ∈ Rd (d ≫ n) as a mapping from the low-dimensional input space to a high-dimensional feature space. Let H = [h(x1), h(x2), …, h(xN)]T; the loss function of ELMK is:
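Assuming the standard kernel-ELM objective of [16] (symbols as defined above, with C an assumed regularization parameter and ξi the training errors), Eq. (17) is:

\( \min_{w,\ \xi}\ \frac{1}{2}\lVert w \rVert^2 + \frac{C}{2}\sum_{i=1}^{N}\xi_i^2 \quad \text{s.t.}\quad h(\mathrm{x}_i)^{T} w = t_i - \xi_i,\quad i = 1, \dots, N \)  (17)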
Solving Eq. (17) with the Lagrange multiplier method yields Eq. (18):
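Following the standard derivation in [16], Eq. (18) takes the form:

\( f(\mathrm{x}) = h(\mathrm{x})^{T} H^{T} \left( \frac{I}{C} + H H^{T} \right)^{-1} t \)  (18)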
where x represents the test instance, and f(x) denotes the predicted value. Let K(xi, xj) = h(xi)T ⋅ h(xj), where K is a kernel function. Eq. (18) can then be rewritten as:
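In kernel form (again the standard expression, with Ω an assumed symbol for the N × N kernel matrix, Ωij = K(xi, xj)):

\( f(\mathrm{x}) = \left[ K(\mathrm{x}, \mathrm{x}_1), \dots, K(\mathrm{x}, \mathrm{x}_N) \right] \left( \frac{I}{C} + \Omega \right)^{-1} t \)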
1.2.1 Online Sequential ELMK (OS-ELMK)
ELM and ELMK are essentially offline learning models. In TSP problems, however, samples tend to arrive one by one or batch by batch over time, so online learning is especially important for TSP. The online sequential ELMK (OS-ELMK) [17] was developed to meet this need.
The corresponding Lagrangian dual problem of Eq. (17) could be formulated as:
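Reconstructed from the standard kernel-ELM derivation (with θi denoting the Lagrange multipliers), Eq. (20) is:

\( L(w, \xi, \theta) = \frac{1}{2}\lVert w \rVert^2 + \frac{C}{2}\sum_{i=1}^{N}\xi_i^2 - \sum_{i=1}^{N}\theta_i\left( h(\mathrm{x}_i)^{T} w - t_i + \xi_i \right) \)  (20)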
The KKT optimality conditions of Eq. (20) are listed as below:
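In their standard form (a reconstruction consistent with Eq. (20)), these are:

\( \frac{\partial L}{\partial w} = 0 \Rightarrow w = \sum_{i=1}^{N} \theta_i\, h(\mathrm{x}_i) = H^{T}\theta \)  (21)

\( \frac{\partial L}{\partial \xi_i} = 0 \Rightarrow \theta_i = C\,\xi_i,\quad i = 1, \dots, N \)  (22)

\( \frac{\partial L}{\partial \theta_i} = 0 \Rightarrow h(\mathrm{x}_i)^{T} w - t_i + \xi_i = 0,\quad i = 1, \dots, N \)  (23)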
Substituting Eqs. (21) and (22) into Eq. (23) gives:
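which, with K(xi, xj) = h(xi)T ⋅ h(xj), is the linear system (Eq. (24)):

\( \sum_{j=1}^{N} K(\mathrm{x}_i, \mathrm{x}_j)\,\theta_j + \frac{\theta_i}{C} = t_i,\quad i = 1, \dots, N \)  (24)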
When m new samples \( (\mathrm{x}_k^{new}, t_k^{new}),\ k = 1, \dots, m \) arrive, Eq. (24) can be updated as:
where \( \theta_i^{\ast} \) denotes the updated θi and \( \theta_k^{new} \) denotes the coefficient of the k-th new sample. Define \( \theta^{new} = [\theta_1^{new}, \dots, \theta_m^{new}]^{T} \) and \( \Delta\theta = [\Delta\theta_1, \Delta\theta_2, \dots, \Delta\theta_N]^{T} \), where \( \theta_j^{\ast} = \theta_j + \Delta\theta_j \), j = 1, 2, …, N. Combining Eq. (24) with Eq. (25), it can be obtained that:
where Kj,i = K(xj, xi) and \( K^{\prime}_{k,i} = K(\mathrm{x}_k^{new}, \mathrm{x}_i) \). Equation (27) can be reformulated in matrix form as:
To make the expression more concise, the following definitions are introduced:
Equation (28) can be rewritten compactly as:
Substituting \( \theta_j^{\ast} = \theta_j + \Delta\theta_j \), j = 1, 2, …, N, into Eq. (26), we get:
The matrix form of the above formula is:
where \( K^{\prime\prime}_{i,j} = K(\mathrm{x}_i^{new}, \mathrm{x}_j^{new}) \). Let:
where f(·) is the regression function before being retrained on the newly added samples.
The compact matrix-vector expression of Eq. (34) is:
We get:
After θnew has been obtained, ∆θ can be computed according to Eq. (32), and finally the updated θupdate = [(θ + Δθ)T, (θnew)T]T can be acquired. When the new samples are added, R in Eq. (29) also needs to be updated. The kernel matrix in the updated R* is:
The updated R* is:
After a series of matrix calculations, we can obtain the updated R*:
Finally, the online sequential learning algorithm of ELMK (OS-ELMK) is presented in detail in Algorithm 4.
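As a minimal illustration of the online-sequential update (a sketch in the spirit of OS-ELMK rather than a line-by-line implementation of Eqs. (20)–(39)), the code below maintains R = (Ω + I/C)⁻¹ and the dual coefficients θ = R t, and enlarges R with the standard block (Schur-complement) inverse when new samples arrive; the class name OSKernelRegressor and the RBF kernel choice are assumptions.

```python
import numpy as np

def rbf(X, Z, gamma=1.0):
    """Gaussian (RBF) kernel matrix between row-sample matrices X and Z."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

class OSKernelRegressor:
    """Online-sequential kernel regressor in the spirit of OS-ELMK.

    Maintains R = (Omega + I/C)^{-1} and theta = R t, and updates R with a
    block (Schur-complement) inverse when new samples arrive, instead of
    refactorizing the full kernel system from scratch.
    """
    def __init__(self, C=100.0, gamma=1.0):
        self.C, self.gamma = C, gamma
        self.X = self.t = self.R = self.theta = None

    def init_fit(self, X0, t0):
        Omega = rbf(X0, X0, self.gamma)
        self.R = np.linalg.inv(Omega + np.eye(len(X0)) / self.C)
        self.X, self.t = X0, t0
        self.theta = self.R @ t0                                # dual coefficients

    def update(self, Xnew, tnew):
        B = rbf(self.X, Xnew, self.gamma)                       # old-vs-new kernel block
        D = rbf(Xnew, Xnew, self.gamma) + np.eye(len(Xnew)) / self.C
        S_inv = np.linalg.inv(D - B.T @ self.R @ B)             # Schur complement inverse
        RB = self.R @ B
        top = np.hstack([self.R + RB @ S_inv @ RB.T, -RB @ S_inv])
        bot = np.hstack([-S_inv @ RB.T, S_inv])
        self.R = np.vstack([top, bot])                          # updated (Omega* + I/C)^{-1}
        self.X = np.vstack([self.X, Xnew])
        self.t = np.concatenate([self.t, tnew])
        self.theta = self.R @ self.t                            # updated dual coefficients

    def predict(self, Xq):
        return rbf(Xq, self.X, self.gamma) @ self.theta
```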