Abstract
Functional transfer matrices consist of real functions with trainable parameters. In this work, functional transfer matrices are used to model functional connections in neural networks. Unlike the linear connections in conventional weight matrices, functional connections can represent nonlinear relations between two neighbouring layers. Neural networks with functional connections, called functional transfer neural networks, can be trained via back-propagation. On the two spirals problem, functional transfer neural networks achieve considerably better performance than conventional multi-layer perceptrons. On the MNIST handwritten digit recognition task, their performance is comparable to that of the conventional model. This study demonstrates that functional transfer matrices can outperform conventional weight matrices in specific cases, and can therefore serve as alternatives to them.
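To make the idea concrete, the following is a minimal sketch of a functional transfer layer. The specific functional form used here, f(x) = a · sin(b · x) with trainable a and b per connection, is a hypothetical example chosen for illustration; the paper defines its own set of functional matrices (listed in its Table 1), which may differ. The layer replaces each scalar weight with a parameterized function and back-propagates gradients through both parameters.

```python
import numpy as np

class FunctionalTransferLayer:
    """A layer whose connections are trainable functions, not scalar weights.

    Each connection here is the illustrative form f_ij(x) = a_ij * sin(b_ij * x);
    the paper's actual functional matrices are given in its Table 1.
    """

    def __init__(self, n_in, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.a = 0.1 * rng.standard_normal((n_in, n_out))  # trainable amplitudes
        self.b = 0.1 * rng.standard_normal((n_in, n_out))  # trainable frequencies

    def forward(self, x):
        # x: (batch, n_in) -> output: (batch, n_out)
        self.x = x
        # Apply f_ij to each input, then sum contributions over the input units.
        z = self.a[None] * np.sin(self.b[None] * x[:, :, None])  # (batch, n_in, n_out)
        return z.sum(axis=1)

    def backward(self, grad_y, lr=0.01):
        # grad_y: (batch, n_out). Back-propagate through the functional connections.
        s = np.sin(self.b[None] * self.x[:, :, None])
        c = np.cos(self.b[None] * self.x[:, :, None])
        grad_a = (grad_y[:, None, :] * s).sum(axis=0)
        grad_b = (grad_y[:, None, :] * self.a[None] * c * self.x[:, :, None]).sum(axis=0)
        grad_x = (grad_y[:, None, :] * self.a[None] * self.b[None] * c).sum(axis=2)
        self.a -= lr * grad_a  # gradient-descent update of the function parameters
        self.b -= lr * grad_b
        return grad_x  # gradient to pass to the previous layer
```

Because every connection carries its own nonlinear function, a single layer can already express relations that a linear weight matrix cannot; the cost is twice the parameter count per connection in this particular form.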
Notes
For details about layer-wise supervised training, please also refer to Algorithm 7 in the appendix of the referenced paper.
The implementation of the functional transfer matrices in Table 1 can be downloaded from https://github.com/cchrewrite/Functional-Transfer-Neural-Networks/tree/master/Matrices.
For more information about the CSTR machine learning platform, please refer to https://github.com/CSTR-Edinburgh/mlpractical.
The “best” activation is the one that yields the highest accuracy within a group of models sharing a given type of functional connection and a given depth, but trained with different values of γ.
Acknowledgements
This work is supported by the Fundamental Research Funds for the Central Universities (No. 2016JX06) and the National Natural Science Foundation of China (No. 61472369).
Cite this article
Cai, CH., Xu, Y., Ke, D. et al. Trainable back-propagated functional transfer matrices. Appl Intell 49, 376–395 (2019). https://doi.org/10.1007/s10489-018-1266-3