Layer normalization

JL Ba, JR Kiros, GE Hinton - arXiv preprint arXiv:1607.06450, 2016 - arxiv.org
Training state-of-the-art deep neural networks is computationally expensive. One way to
reduce the training time is to normalize the activities of the neurons. A recently introduced
technique called batch normalization uses the distribution of the summed input to a neuron
over a mini-batch of training cases to compute a mean and variance which are then used to
normalize the summed input to that neuron on each training case. This significantly reduces
the training time in feed-forward neural networks. However, the effect of batch normalization …
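The abstract's distinction is in which axis the mean and variance are computed over: batch normalization averages each neuron's summed input across the mini-batch, whereas layer normalization (the subject of the paper) averages across the neurons of a layer within a single training case. A minimal NumPy sketch of that difference is below; it is illustrative only, assumes a 2-D array of summed inputs (batch × neurons), and omits the learned gain and bias parameters used in the paper.

```python
# Illustrative sketch (not the authors' code): contrasting the statistics used by
# batch normalization, as described in the abstract, with those of layer normalization.
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize each neuron's summed input using a mean and variance
    computed over the mini-batch dimension (axis 0)."""
    mean = x.mean(axis=0, keepdims=True)   # per-neuron mean over the batch
    var = x.var(axis=0, keepdims=True)     # per-neuron variance over the batch
    return (x - mean) / np.sqrt(var + eps)

def layer_norm(x, eps=1e-5):
    """Normalize the summed inputs within each training case using a mean and
    variance computed over the layer's neurons (axis 1), independent of batch size."""
    mean = x.mean(axis=1, keepdims=True)   # per-case mean over the layer
    var = x.var(axis=1, keepdims=True)     # per-case variance over the layer
    return (x - mean) / np.sqrt(var + eps)

# A mini-batch of 4 training cases, each with 8 summed inputs.
x = np.random.randn(4, 8)
print(batch_norm(x).mean(axis=0))  # approximately 0 for each neuron
print(layer_norm(x).mean(axis=1))  # approximately 0 for each training case
```

Because layer normalization's statistics depend only on a single case, it applies unchanged at any batch size, including batch size one.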