
Incremental model selection and ensemble prediction under virtual concept drifting environments

  • Original Paper
  • Published in: Evolving Systems

Abstract

Model selection for machine learning systems is one of the most important issues to be addressed for obtaining greater generalization capability. This paper proposes a strategy for achieving model selection incrementally under virtual concept drifting environments, where the distribution of learning samples varies over time. To carry out incremental model selection, a system generally uses all the learning samples observed so far. Under virtual concept drifting environments, however, the distribution of the currently observed samples differs considerably from the distribution of the cumulative dataset, so model selection is usually unsuccessful. To overcome this problem, the author earlier proposed a weighted objective function and a model-selection criterion based on the predictive input density of the learning samples. Although that earlier method shows good performance on some datasets, it occasionally fails to yield appropriate learning results because of failures in predicting the actual input density. To reduce this adverse effect, the method proposed in this paper improves on the earlier one by yielding the desired outputs using an ensemble of the constructed radial basis function neural networks (RBFNNs).
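
The following sketch is a minimal illustration, not the implementation described in this paper: it shows the general idea of combining several trained RBFNNs into one weighted prediction. The function names, network parameters, and example networks are hypothetical; the only element taken from the paper is the inverse-error ensemble weighting derived in the appendix (Eq. 27).

    import numpy as np

    def rbfnn_predict(x, centers, widths, weights):
        # Output of a single RBF network: weighted sum of Gaussian basis functions.
        phi = np.exp(-np.sum((x - centers) ** 2, axis=1) / (2.0 * widths ** 2))
        return float(weights @ phi)

    def ensemble_predict(x, networks, estimated_errors):
        # Ensemble output: each network's weight pi_n is proportional to the
        # inverse of its estimated error (cf. Eq. 27 in the appendix).
        inv = 1.0 / np.asarray(estimated_errors, dtype=float)
        pi = inv / inv.sum()                     # weights sum to one
        outputs = np.array([rbfnn_predict(x, *net) for net in networks])
        return float(pi @ outputs)

    # Hypothetical usage: two small RBFNNs over a 2-dimensional input.
    net_a = (np.array([[0.0, 0.0], [1.0, 1.0]]), np.array([0.5, 0.5]), np.array([1.0, -1.0]))
    net_b = (np.array([[0.5, 0.5]]), np.array([1.0]), np.array([0.7]))
    y = ensemble_predict(np.array([0.2, 0.3]), [net_a, net_b], estimated_errors=[0.1, 0.3])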



Notes

  1. We assume that the time interval for state transition is considerably longer than that for presenting each sample.

  2. Note that org-RBFNN is an optimal learning result under the assumption that the observed samples are i.i.d. samples from the original input distribution.


Author information

Corresponding author

Correspondence to Koichiro Yamauchi.

Appendices

Appendix: Derivation of Eq. 18

Under the condition ∑_n π_n = 1, the expected loss of the ensemble network can be rewritten as follows.

$$ \begin{aligned} E &= \int \left\{ F({\boldsymbol{x}}) - \sum_{n \in E_{x}} \pi_{n} f_{{\boldsymbol{\theta}}^{WLS}_{n}}({\boldsymbol{x}}) \right\}^{2} W_{1}({\boldsymbol{x}}) P({\boldsymbol{x}}) \, d{\boldsymbol{x}} \\ &= \int \left\{ \sum_{n \in E_{x}} \pi_{n} \left( F({\boldsymbol{x}}) - f_{{\boldsymbol{\theta}}^{WLS}_{n}}({\boldsymbol{x}}) \right) \right\}^{2} W_{1}({\boldsymbol{x}}) P({\boldsymbol{x}}) \, d{\boldsymbol{x}} \end{aligned} $$
(21)

Now, if we assume that there is no correlation between the errors of any two RBFNNs, the cross terms in Eq. 21 vanish and the above equation can be approximately rewritten as follows.

$$ \begin{aligned} E &\simeq \int \sum_{n \in E_{x}} \pi_{n}^{2} \left\{ F({\boldsymbol{x}}) - f_{{\boldsymbol{\theta}}^{WLS}_{n}}({\boldsymbol{x}}) \right\}^{2} W_{1}({\boldsymbol{x}}) P({\boldsymbol{x}}) \, d{\boldsymbol{x}} \\ &= \sum_{n \in E_{x}} \pi_{n}^{2} \int \left\{ F({\boldsymbol{x}}) - f_{{\boldsymbol{\theta}}^{WLS}_{n}}({\boldsymbol{x}}) \right\}^{2} W_{1}({\boldsymbol{x}}) P({\boldsymbol{x}}) \, d{\boldsymbol{x}} \end{aligned} $$
(22)

According to the definition of information criteria, the value of an information criterion represents the averaged expected error of the statistical model. Therefore, Eq. 22 can be approximated by

$$ E\simeq\sum_{n \in E_{x}}\pi_{n}^2 \hat{E}(\lambda_{n}). $$
(23)

From this discussion, the objective function for π_n is

$$ U \equiv \sum_{n \in E_{x}} \pi_{n}^2 \hat{E}(\lambda_{n}) + \lambda \left( 1-\sum_{n \in E_{x}} \pi_{n} \right), $$
(24)

where λ denotes a Lagrange multiplier. Setting the partial derivatives of Eq. 24 to zero yields

$$ \frac{{\partial U}}{\partial \pi_{n}}=0 \Longleftrightarrow \ \pi_{n}=\frac{{\lambda}}{2\hat{E}(\lambda_{n})} $$
(25)
$$ \frac{{\partial U}}{\partial \lambda}=0\ \Longleftrightarrow \sum_{n \in E_{x}}\pi_{n}=1. $$
(26)

From Eqs. 25 and 26, we obtain

$$ \pi_{n}^{*}=\frac{{1/\hat{E}(\lambda_{n})}}{\sum_{j \in E_{x}}1/ \hat{E}(\lambda_{j})}. $$
(27)
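
As a purely illustrative numerical check of Eq. 27 (the values of Ê are hypothetical), suppose an ensemble contains three networks with estimated errors Ê(λ_1) = 0.1, Ê(λ_2) = 0.2 and Ê(λ_3) = 0.4. Then

$$ \pi_{1}^{*} = \frac{10}{10 + 5 + 2.5} \approx 0.571, \quad \pi_{2}^{*} = \frac{5}{17.5} \approx 0.286, \quad \pi_{3}^{*} = \frac{2.5}{17.5} \approx 0.143, $$

so the network with the smallest estimated error receives the largest ensemble weight, and the weights sum to one.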

Derivation of Eq. 19

According to the definition of information criteria, IC_w predicts the mean log likelihood:

$$ N_{total}\log\hat{E}(\lambda_{n}) \simeq IC_{w}(\lambda_{n}) $$
(28)

Therefore, we obtain:

$$ \hat{E}(\lambda_{n})\simeq \exp \left(\frac{{1}}{N_{total}} IC_{w}(\lambda_{n}) \right). $$
(29)
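
Substituting Eq. 29 into Eq. 27 (a one-step consequence of the two equations above, not stated separately in the text) shows how the ensemble weights follow directly from the weighted information criterion of each network:

$$ \pi_{n}^{*} \simeq \frac{\exp\left(-IC_{w}(\lambda_{n})/N_{total}\right)}{\sum_{j \in E_{x}} \exp\left(-IC_{w}(\lambda_{j})/N_{total}\right)}, $$

so networks with smaller IC_w values, i.e. smaller estimated errors, receive larger weights.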


About this article

Cite this article

Yamauchi, K. Incremental model selection and ensemble prediction under virtual concept drifting environments. Evolving Systems 2, 249–260 (2011). https://doi.org/10.1007/s12530-011-9038-x
