Comparative Performance of Several Recent Supervised Learning Algorithms
Rana Ismail
Michigan State University
Abstract— A wide variety of optimization algorithms have been developed; however, their performance across optimization landscapes is still unclear. The manuscript presented herein discusses methods for modeling and training neural networks on a small dataset. The algorithms include conventional gradient descent, Levenberg-Marquardt, Momentum, Nesterov Momentum, ADAgrad, and RMSprop learning methodologies. The work aims to compare the performance, efficiency, and accuracy of the different algorithms utilizing the fertility dataset available through the UC Irvine machine learning repository.

Keywords: Neural Networks, Back Propagation, Levenberg-Marquardt, Momentum, Nesterov, ADAgrad, RMSprop, Hyperparameters, Newton Method, Supervised Learning, Gradient Descent, Delta Rule, Python, Fertility

I. INTRODUCTION

Background

An artificial neural network is a supervised machine learning algorithm used by computers to model complex, high-dimensional datasets and make predictions without being explicitly programmed. Neural networks are biologically inspired, mimicking the operation of neurons in a human brain. The first neural network was modeled in 1957 by Frank Rosenblatt and was referred to as the perceptron. Lately, neural networks have been gaining popularity through deep learning, such as recurrent and convolutional networks for speech recognition and autonomous applications. Several back-propagation learning algorithms were developed to improve the convergence speed and performance of the networks. Algorithms such as the momentum back-propagation algorithm have been widely used in neurocomputing [1][2], and several adaptive methodologies were then produced to dynamically vary the momentum parameter, as described in [3][4][5][6][7]. Nesterov's momentum, developed by Yuri Nesterov in 1983, is one of the adaptive algorithms that has gained popularity [8]. Levenberg's algorithm, published in 1944, is another back-propagation method that combines the Gauss-Newton algorithm (GNA) with conventional gradient descent. Levenberg's contribution modifies the GNA to include a hyperparameter µ that allows the learning to increase in the initial phases of training and to decrease as the solution tends towards convergence, depending on the rate of change of the cost function s(w) [9]. On the other hand, Marquardt's extension in 1963 suggests scaling each component of the gradient per the curvature, so that there is larger movement where the rate of change of the cost function is smaller [10]. In 2012, an adaptive gradient method, or ADAgrad, that incorporates knowledge of the geometry of the data being observed was proposed by Duchi et al. [11]. Root mean square propagation, or RMSprop, is a modified version of ADAgrad that attempts to soften its aggressive, monotonically decreasing learning rate. The latter method was presented by Geoff Hinton in his Coursera class. While RMSprop is an unpublished training algorithm, it has been adopted by the neurocomputing community, hence its addition to the scope of work in this manuscript. Numerous training algorithms have been presented, discussed, and utilized in artificial neural networks with one hidden layer on one front and in deep learning neural networks such as recurrent and convolutional networks on the other front. Comparisons of neural network back-propagation algorithms have been carried out in the literature addressing specific case studies such as stream flow forecasting, determination of lateral stress in cohesionless soils, electricity load forecasting, radio network planning tools, power quality monitoring, software defect prediction, sleep disorders, and electro-static precipitation for air quality control [12][13][14][15][16][17][18]. However, none of them compare and study the effectiveness of the algorithms covered in our work, especially when modeling complexities in high-dimensional small datasets. To bridge the gap, we employ the fertility database from the UC Irvine machine learning repository to compare the performance and the effectiveness of conventional gradient descent, Levenberg-Marquardt, Momentum, Nesterov Momentum, ADAgrad, and RMSprop learning methods through studying the mean square error propagation versus the number of iterations and visually comparing actual versus predicted observations for our labeled dataset.
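For concreteness, the following is a minimal NumPy sketch of the update rules discussed above, assuming their conventional formulations; the variable names, the smoothing term eps, and the RMSprop decay constant rho are illustrative assumptions rather than details taken from this paper's implementation.

```python
import numpy as np

# Minimal sketch of the per-iteration weight updates discussed above,
# using conventional formulations; names and constants are illustrative.

def momentum_step(w, v, grad, eta, mu):
    # Classical momentum: accumulate a velocity and step along it.
    v = mu * v - eta * grad
    return w + v, v

def nesterov_step(w, v, grad_at_lookahead, eta, mu):
    # Nesterov momentum: the gradient is evaluated at the look-ahead
    # point (w + mu * v) before the velocity is updated.
    v = mu * v - eta * grad_at_lookahead
    return w + v, v

def adagrad_step(w, G, grad, eta, eps=1e-8):
    # ADAgrad: accumulate squared gradients so that frequently updated
    # weights receive progressively smaller steps (geometry-aware scaling).
    G = G + grad ** 2
    return w - eta * grad / (np.sqrt(G) + eps), G

def rmsprop_step(w, G, grad, eta, rho=0.9, eps=1e-8):
    # RMSprop: replace ADAgrad's monotone accumulation with an exponentially
    # decaying average, softening its aggressive learning-rate decay.
    G = rho * G + (1.0 - rho) * grad ** 2
    return w - eta * grad / (np.sqrt(G) + eps), G
```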
The fertility dataset involves analyzing semen samples from 100 volunteers per the 2010 World Health Organization criteria, along with socio-demographic data, environmental factors, health status, and life habits. The neural network and the different training algorithms presented in our scope of work are programmed in Python version 2.7 utilizing the Spyder 2.3.8 development environment and employing the "numpy" fundamental package for scientific computing. It is important to mention that all the algorithms and the neural network were developed and programmed without using generic Python scripts or APIs such as SciPy and Keras.

Training proceeds by iteratively updating and tuning the parameters to minimize the cost function. The gradient descent rule is described mathematically per the following formula:

$$w_{new} = w_{old} - \eta \left.\frac{\partial J}{\partial w}\right|_{t} \qquad (1)$$

where
$\eta$ = Learning rate
$w$ = Weights
$J$ = Cost function
$t$ = Number of iterations
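As a concrete illustration of update rule (1), the short sketch below applies the delta rule to a weight matrix with NumPy; the gradient argument is assumed to come from standard back-propagation, and the default learning rate simply mirrors the value reported later in the results.

```python
import numpy as np

def gradient_descent_step(w, grad_J, eta=0.4):
    # Update rule (1): w_new = w_old - eta * dJ/dw.
    # grad_J is the back-propagated gradient of the cost J with respect to w.
    return w - eta * grad_J

# Illustrative usage with randomly generated weights and gradients.
w = np.random.randn(9, 9)
grad_J = np.random.randn(9, 9)
w = gradient_descent_step(w, grad_J)
```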
The cost function minimized during training is the sum of squared errors:

$$J = J(w) = s(w) = e(w)^{T} e(w) = (y - A)^{T} (y - A)$$

where $e(w)$ is the error vector, $y$ is the target output, and $A$ is the network output. Perturbing $s(w)$ and applying partial differentiation with respect to $w$ yields:

$$\frac{\partial J}{\partial w} = 2 \left(\frac{\partial e(w)}{\partial w}\right)^{T} e(w)$$
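A minimal sketch of one Levenberg-Marquardt step consistent with the cost function above is given below; the Jacobian is assumed to be that of the network outputs with respect to the weights, and the damping schedule in the comments simply mirrors the µ factors reported in the results rather than the paper's actual code.

```python
import numpy as np

def levenberg_marquardt_step(w, e, J, mu):
    # One Levenberg-Marquardt update for the cost s(w) = e(w)^T e(w),
    # where e = y - A is the residual and J = dA/dw is the Jacobian of
    # the network outputs with respect to the (flattened) weights w.
    H = np.dot(J.T, J) + mu * np.eye(J.shape[1])  # damped Gauss-Newton Hessian
    delta = np.linalg.solve(H, np.dot(J.T, e))    # solve rather than invert explicitly
    return w + delta

# Typical damping schedule (factors mirror those reported in the results):
# if the step reduces s(w), accept it and set mu *= 0.1;
# otherwise reject it and set mu *= 10.
```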
III. RESULTS
The neural network parameters used in our work include nine input, nine hidden, and one output node. The learning rate is set to 0.4 for all cases, while the momentum parameter is set to 0.4 for both the Nesterov and regular momentum methods. For the Levenberg-Marquardt algorithm, the parameter µ is initialized to 0.001 with a decrease factor of 0.1 and an increase factor of 10. The weight matrices are initialized identically for all cases, and training stops when the mean squared error reaches 0.0001. The results are presented visually in Figures 2 through 7, while Table 1 provides a summary comparing the models in terms of performance and accuracy. Each graphical representation includes two subplots: the first shows the mean square error propagation versus the number of iterations, while the second shows the neural network predicted outputs together with the dataset's actual outputs.
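To make the experimental setup easier to reproduce, the snippet below gathers the hyper-parameters reported above into a single configuration; the dictionary layout and key names are our own illustration and do not reflect the structure of the original code.

```python
# Hyper-parameters reported above, collected into one illustrative configuration.
CONFIG = {
    "n_input": 9,            # input nodes
    "n_hidden": 9,           # hidden nodes
    "n_output": 1,           # output node
    "learning_rate": 0.4,    # used by all algorithms
    "momentum": 0.4,         # classical and Nesterov momentum
    "lm_mu_init": 0.001,     # Levenberg-Marquardt initial damping
    "lm_mu_decrease": 0.1,   # factor applied when a step is accepted
    "lm_mu_increase": 10,    # factor applied when a step is rejected
    "mse_target": 1e-4,      # training stops at this mean squared error
}
```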
IV. CONCLUSION
The work performed in this manuscript suggests that all learning models attained 100% accuracy. The RMSprop algorithm was the fastest, reaching convergence in 11 iterations compared with 165 iterations for the second-ranked Levenberg-Marquardt algorithm. The RMSprop algorithm is also more efficient, costing less computing power than Levenberg-Marquardt, since the latter must perform a matrix inversion at each update. In addition, the RMSprop algorithm requires fewer hyper-parameters than the LM methodology. Although our dataset is considered small, it provides a good benchmarking tool when considering large-scale datasets or deep learning such as convolutional neural networks. The convergence of the RMSprop, ADAgrad, and LM algorithms was more stable as they progressed towards the global minimum, compared with the gradient descent, conventional momentum, and Nesterov's momentum methodologies.

Figure 6: ADAgrad algorithm implementation