Analysis of function of rectified linear unit used in deep learning

K Hara, D Saito, H Shouno - 2015 International Joint Conference on Neural Networks (IJCNN), 2015 - ieeexplore.ieee.org
Deep learning is attracting much attention in object recognition and speech processing. One benefit of deep learning is that it provides automatic pre-training; several proposed methods, including the auto-encoder, are being used successfully in various applications. Moreover, deep learning uses a multilayer network consisting of many layers, a huge number of units, and a huge amount of data. Executing deep learning therefore requires heavy computation, so it is usually run in parallel on many cores or many machines. Deep learning employs the gradient algorithm; however, gradient-based learning can become trapped at saddle points or in local minima. To avoid this difficulty, the rectified linear unit (ReLU) has been proposed, which speeds up the convergence of learning. However, the reasons for this speedup are not well understood. In this paper, we analyze the ReLU using a simpler network called the soft-committee machine, trained in an on-line manner, and clarify the reason for the speedup. The soft-committee machine provides a good test bed for analyzing deep learning, and the results suggest some reasons for the faster convergence of deep learning.
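To make the setting concrete, the following is a minimal sketch (an assumption for illustration, not the authors' code) of the kind of model the abstract describes: a soft-committee machine, i.e. a student network y(x) = sum_k relu(w_k . x) whose hidden-to-output weights are fixed at one, learning a matching teacher from a fresh example at every step (on-line learning). The sizes N, K, the learning rate eta, and the step count are illustrative choices, not values taken from the paper.

    import numpy as np

    # Minimal sketch of a soft-committee machine with ReLU units:
    # K hidden units whose outputs are summed with fixed unit weights,
    # y(x) = sum_k relu(w_k . x).

    def relu(a):
        return np.maximum(0.0, a)            # rectified linear unit, max(0, a)

    def d_relu(a):
        return (a > 0.0).astype(float)       # (sub)gradient of ReLU: 0 or 1

    N, K, eta, steps = 100, 3, 0.1, 10_000   # illustrative sizes, not from the paper
    rng = np.random.default_rng(0)

    B = rng.normal(size=(K, N)) / np.sqrt(N)  # fixed "teacher" weights
    W = rng.normal(size=(K, N)) / np.sqrt(N)  # "student" weights to be learned

    for _ in range(steps):
        x = rng.normal(size=N)               # fresh example each step: on-line learning
        t = relu(B @ x).sum()                # teacher output
        h = W @ x                            # student pre-activations
        y = relu(h).sum()                    # student output
        # one stochastic-gradient step on the squared error (y - t)^2 / 2
        W -= (eta / N) * (y - t) * np.outer(d_relu(h), x)

    x_test = rng.normal(size=N)
    print("error on a fresh sample:",
          relu(W @ x_test).sum() - relu(B @ x_test).sum())

Note that d_relu(h) is either 0 or 1, so the error signal does not shrink the way saturated sigmoid gradients do; this non-saturating behavior is the commonly cited intuition for the faster convergence that the paper analyzes in this soft-committee setting.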