Abstract
Mutual information (MI) estimated with the averaged shifted histogram (ASH) probability density estimator is regarded as a good indicator of the relevance between the input variables and the output variable. However, it cannot handle redundancy among the input variables. Therefore, a method that integrates principal component analysis (PCA) with MI is proposed to improve the prediction performance of the radial basis function network (RBFN). First, PCA is used to extract the principal components (PCs) from the original variables; the PCs are mutually uncorrelated. Second, ASH-based MI is applied to select the PCs most strongly related to the output variable as the new input variables. Finally, PCA-ASH-RBFN is used to develop a housing price model on the Boston housing data set. The results show that PCA-ASH-RBFN achieves better prediction performance and robustness than PCA-RBFN and than RBFN combined with robust feature selection of the input variables.
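The first two steps of the procedure (PCA, then ranking the PCs by ASH-based MI with the output) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the grid-based plug-in MI estimate, the bin settings (`bins`, `m`), the number of retained PCs (`k`) and the synthetic data are all hypothetical choices made for the example. The RBFN fitting step is omitted.

```python
import numpy as np

def pca_scores(X):
    """Project standardized inputs onto their principal axes.

    The returned score columns are mutually uncorrelated.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt.T

def ash_mutual_information(x, y, bins=10, m=5):
    """Plug-in MI estimate (in nats) from a 2-D averaged shifted histogram.

    bins: number of coarse bins per axis; m: number of shifted histograms.
    Assumption: MI is summed over the smoothed grid rather than averaged
    over the sample points.
    """
    def edges(v):
        h = (v.max() - v.min()) / bins              # coarse bin width
        # pad by one coarse bin per side so smoothing keeps all the mass
        return np.linspace(v.min() - h, v.max() + h, (bins + 2) * m + 1)

    counts, _, _ = np.histogram2d(x, y, bins=[edges(x), edges(y)])
    # Triangular weights: averaging m shifted histograms is equivalent to
    # smoothing the fine-grid counts with this kernel along each axis.
    w = 1.0 - np.abs(np.arange(1 - m, m)) / m
    smooth = np.apply_along_axis(np.convolve, 0, counts, w, mode="same")
    smooth = np.apply_along_axis(np.convolve, 1, smooth, w, mode="same")

    p_xy = smooth / smooth.sum()                    # joint cell probabilities
    p_x = p_xy.sum(axis=1, keepdims=True)           # marginal of x
    p_y = p_xy.sum(axis=0, keepdims=True)           # marginal of y
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask])))

def select_components(X, y, k):
    """Keep the k PCs with the largest ASH-based MI with the output y."""
    scores = pca_scores(X)
    mi = np.array([ash_mutual_information(scores[:, j], y)
                   for j in range(scores.shape[1])])
    keep = np.argsort(mi)[::-1][:k]
    return scores[:, keep], keep, mi

# Synthetic data (hypothetical): the fourth input nearly duplicates the first.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=500)
y = np.sin(2 * X[:, 1]) + 0.1 * rng.normal(size=500)

new_inputs, kept, mi = select_components(X, y, k=2)
print("MI per PC:", np.round(mi, 3), "kept PCs:", kept)
```

In this sketch `new_inputs` plays the role of the transformed input variables that would be passed to the RBFN. Computing the marginals from the smoothed joint grid keeps the estimate non-negative, which is a convenience of this variant rather than a claim about the original estimator.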

Acknowledgments
The authors gratefully acknowledge support from the following: the National Natural Science Foundation of China (20776042), the Doctoral Fund of the Ministry of Education of China (20090074110005), the Program for New Century Excellent Talents in University (NCET-09-0346), the “Shu Guang” project (09SG29) and the Fundamental Research Funds for the Central Universities.
Cite this article
Chen, C., Yan, X. Transforming input variables for RBFN based on PCA-ASH multivariate correlation analysis and its application. Neural Comput & Applic 22 (Suppl 1), 101–111 (2013). https://doi.org/10.1007/s00521-012-0968-4
DOI: https://doi.org/10.1007/s00521-012-0968-4