Abstract
Functional transfer matrices consist of real functions with trainable parameters. In this work, functional transfer matrices are used to model functional connections in neural networks. Unlike the linear connections in conventional weight matrices, functional connections can represent nonlinear relations between two neighbouring layers. Neural networks with functional connections, called functional transfer neural networks, can be trained via back-propagation. On the two spirals problem, functional transfer neural networks achieve considerably better performance than conventional multi-layer perceptrons. On the MNIST handwritten digit recognition task, their performance is comparable to that of the conventional model. This study demonstrates that functional transfer matrices can outperform conventional weight matrices in specific cases, and can therefore serve as alternatives to them.
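To make the idea concrete, the following is a minimal sketch of a functional transfer layer. The specific functional form used here, f(x) = a · sin(b · x) with trainable a and b per connection, is a hypothetical example chosen for illustration; the paper defines its own set of functional matrices (listed in its Table 1), which may differ. The layer replaces each scalar weight with a parameterized function and back-propagates gradients through both parameters.

```python
import numpy as np

class FunctionalTransferLayer:
    """A layer whose connections are trainable functions, not scalar weights.

    Each connection here is the illustrative form f_ij(x) = a_ij * sin(b_ij * x);
    the paper's actual functional matrices are given in its Table 1.
    """

    def __init__(self, n_in, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.a = 0.1 * rng.standard_normal((n_in, n_out))  # trainable amplitudes
        self.b = 0.1 * rng.standard_normal((n_in, n_out))  # trainable frequencies

    def forward(self, x):
        # x: (batch, n_in) -> output: (batch, n_out)
        self.x = x
        # Apply f_ij to each input, then sum contributions over the input units.
        z = self.a[None] * np.sin(self.b[None] * x[:, :, None])  # (batch, n_in, n_out)
        return z.sum(axis=1)

    def backward(self, grad_y, lr=0.01):
        # grad_y: (batch, n_out). Back-propagate through the functional connections.
        s = np.sin(self.b[None] * self.x[:, :, None])
        c = np.cos(self.b[None] * self.x[:, :, None])
        grad_a = (grad_y[:, None, :] * s).sum(axis=0)
        grad_b = (grad_y[:, None, :] * self.a[None] * c * self.x[:, :, None]).sum(axis=0)
        grad_x = (grad_y[:, None, :] * self.a[None] * self.b[None] * c).sum(axis=2)
        self.a -= lr * grad_a  # gradient-descent update of the function parameters
        self.b -= lr * grad_b
        return grad_x  # gradient to pass to the previous layer
```

Because every connection carries its own nonlinear function, a single layer can already express relations that a linear weight matrix cannot; the cost is twice the parameter count per connection in this particular form.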
Notes
For details about layer-wise supervised training, please also refer to Algorithm 7 in the appendix of the referenced paper.
The implementation of the functional transfer matrices in Table 1 can be downloaded from https://github.com/cchrewrite/Functional-Transfer-Neural-Networks/tree/master/Matrices.
For more information about the CSTR machine learning platform, please refer to https://github.com/CSTR-Edinburgh/mlpractical.
The “best” activation is the one that yields the highest accuracy within a group of models sharing a given type of functional connection and a given depth, but trained with different values of γ.
Acknowledgements
This work is supported by the Fundamental Research Funds for the Central Universities (No. 2016JX06) and the National Natural Science Foundation of China (No. 61472369).
Cite this article
Cai, CH., Xu, Y., Ke, D. et al. Trainable back-propagated functional transfer matrices. Appl Intell 49, 376–395 (2019). https://doi.org/10.1007/s10489-018-1266-3