Abstract
Model selection is one of the most important issues in machine learning systems for achieving greater generalization capability. This paper proposes a strategy for carrying out model selection incrementally under virtual concept drifting environments, where the distribution of learning samples varies over time. To carry out incremental model selection, a system generally uses all the learning samples observed so far. Under virtual concept drifting environments, however, the distribution of the observed samples differs considerably from that of the cumulative dataset, so model selection usually fails. To overcome this problem, the author previously proposed a weighted objective function and a model-selection criterion based on the predictive input density of the learning samples. Although that method performs well on some datasets, it occasionally fails to yield appropriate learning results because it mispredicts the actual input density. To reduce this adverse effect, the method proposed in this paper improves on the earlier one by producing the desired outputs through an ensemble of the constructed radial basis function neural networks (RBFNNs).
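As a minimal illustration of the ensemble idea, the sketch below combines the predictions of several stored RBFNNs with weights derived from per-network information criteria. The function names, toy parameters, and the simple weighting pi_n proportional to 1/IC_n are assumptions made for this sketch only; the actual weights are defined by Eqs. 18-19 in the main text.

import numpy as np

def rbf_predict(centers, widths, weights, x):
    # Output of one RBFNN: weighted sum of Gaussian basis functions.
    d2 = np.sum((centers - x) ** 2, axis=1)        # squared distances to centers
    phi = np.exp(-d2 / (2.0 * widths ** 2))        # Gaussian activations
    return weights @ phi

def ensemble_predict(networks, ics, x):
    # Combine stored RBFNNs with weights pi_n based on their information
    # criteria (smaller IC -> larger weight); pi_n sums to one.
    pi = 1.0 / np.asarray(ics)
    pi /= pi.sum()
    outputs = np.array([rbf_predict(c, s, w, x) for (c, s, w) in networks])
    return float(pi @ outputs)

# Toy usage with two small RBFNNs and assumed IC values.
net_a = (np.array([[0.0], [1.0]]), np.array([0.5, 0.5]), np.array([1.0, -1.0]))
net_b = (np.array([[0.5]]), np.array([1.0]), np.array([0.3]))
print(ensemble_predict([net_a, net_b], ics=[2.1, 3.4], x=np.array([0.2])))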
Notes
We assume that the time interval for state transition is considerably longer than that for presenting each sample.
Note that org-RBFNN is an optimal learning result under the assumption that the observed samples are i.i.d. samples from the original input distribution.
Appendices
Appendix: Derivation of Eq. 18
The expected loss of the ensemble network can be rewritten under the condition $\sum_n \pi_n = 1$ as follows.
Now, if we assume that there is no correlation between the validation errors of any two RBFNNs, the above equation can approximately be rewritten as follows.
According to the definition of information criteria, the value of an information criterion represents the averaged expected error of the statistical model. Therefore, Eq. 22 can be approximated by
From this discussion, the objective function for $\pi_n$ is
where $\lambda$ denotes a Lagrange multiplier. The solution of Eq. 24 is
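Because the display equations are not reproduced here, the following is a minimal LaTeX sketch of the standard argument under the stated assumptions (uncorrelated network errors, each network's expected error approximated by its information criterion $\mathrm{IC}_n$); the exact expressions of Eqs. 22-24 and Eq. 18 are those given in the main text.

% Sketch only: expected loss under \sum_n \pi_n = 1 with uncorrelated networks,
% each network's expected error approximated by its information criterion IC_n.
\begin{align}
  L(\{\pi_n\}) &\approx \sum_n \pi_n^2\,\mathrm{IC}_n,\\
  J(\{\pi_n\}) &= \sum_n \pi_n^2\,\mathrm{IC}_n
                  + \lambda\Bigl(\sum_n \pi_n - 1\Bigr),\\
  \frac{\partial J}{\partial \pi_n} = 0
    \;\Longrightarrow\;
  \pi_n &= \frac{1/\mathrm{IC}_n}{\sum_m 1/\mathrm{IC}_m}.
\end{align}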
Derivation of Eq. 19
According to the definition of information criteria, $\mathrm{IC}_w$ predicts the mean log-likelihood:
Therefore, we obtain:
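As a hedged illustration only (the normalization constants and the exact form of Eq. 19 are defined in the main text), an AIC-type criterion estimates minus twice the mean expected log-likelihood, so exponentiating $-\mathrm{IC}_w/2$ yields an estimated (geometric-mean) likelihood that can serve as a network weight:

% Generic AIC-type relation (sketch, up to normalization constants).
\begin{align}
  \mathrm{IC}_w &\approx -2\,\mathbb{E}\bigl[\log p(y \mid x, \hat{\theta})\bigr],\\
  \exp\!\Bigl(-\tfrac{1}{2}\,\mathrm{IC}_w\Bigr)
    &\approx \exp\Bigl(\mathbb{E}\bigl[\log p(y \mid x, \hat{\theta})\bigr]\Bigr).
\end{align}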
Cite this article
Yamauchi, K. Incremental model selection and ensemble prediction under virtual concept drifting environments. Evolving Systems 2, 249–260 (2011). https://doi.org/10.1007/s12530-011-9038-x