COMPARISON OF SGD, RMSPROP, AND ADAM OPTIMIZATION IN ANIMAL CLASSIFICATION USING CNNs
Desi Irfan1, Teddy Surya Gunawan2, Wanayumini3
Universitas Potensi Utama, Medan, Indonesia
1desiirfan@gmail.com, 2tsgunawan@iium.edu.my, 3wanayumini@gmail.com
Abstract. Many measures have been taken to protect endangered species, including "camera trap" technology, which is now widespread in technology-based nature-protection field research. In this study, a machine learning-based approach is presented to identify endangered wildlife images using a dataset of 5000 images taken from Kaggle and other sources. The Gradient Descent optimization method is often used for Artificial Neural Network (ANN) training; it plays a role in finding the weight values that give the best output. Three optimization methods, namely Stochastic Gradient Descent (SGD), RMSProp, and Adam, were implemented in an Artificial Neural Network system for animal data classification. The studies reviewed report mixed results for SGD and Adam: in some, SGD is superior, while in others Adam is superior, given an appropriate learning rate. The results of this study show that the CNN method with the Adam optimization function produces the highest accuracy compared to the SGD and RMSProp optimization methods. The model trained using the Adam optimization function achieved an accuracy of 89.81% on the test set, showing the feasibility of the approach.
Keywords—Optimization Function, SGD, Adam, RMSProp
I. INTRODUCTION
As many as one million species of animals and plants on land, in the sea, and in the air are threatened with extinction due to human actions, according to a 1,800-page UN report [1]. One way to make it easier for researchers to determine the number of endangered animals is to implement an automated system using digital image processing. Because of this, many measures have been taken to protect endangered species [2], and "camera trap" technology [3] is widespread in technology-based nature-protection field research. Rich and Knight (1991) stated that Artificial Intelligence (AI) is the study of how to make computers do things that can currently be done better by humans [4]. Computer recognition starts from the process of classifying objects and images. This is a fairly easy task for humans, but for computers it is a very complex one, so it would be very useful if the whole process could be automated using Computer Vision. Deep Learning is the technology used for this image classification task in this study.
II. THEORETICAL STUDY
a. Animal
The official Indonesian dictionary (KBBI) states that satwa is a synonym for animal [5]. According to the World Wildlife Fund, many animals are endangered, such as Sumatran elephants, Asian elephants, African elephants, blue whales, hawksbill turtles, orangutans, Javan rhinos, dugongs, hippos, turtles, polar bears, penguins, and many others [6].
Fig.1 Animal Image
b. Convolutional Neural Network
A convolutional neural network (CNN) is a special type of multilayer neural network or deep learning architecture inspired by the visual system of living things [7]. It is designed for processing data that has a grid-like topology [8]. An example of such data is an image, which can be considered a 2-dimensional grid of pixels. The use of pixel optimization is useful for object detection, and the segmentation of pixel values is considered a significant factor [9]. The name "convolutional neural network" indicates that the network uses a mathematical operation called convolution. A CNN has three main types of layers, namely the convolutional layer, the pooling layer, and the fully-connected (FC) layer [10]. The basic unit of computation in a neural network is the neuron, often also called a node or unit. A node receives input from several other nodes or from external sources, processes the input, and produces an output. Each input has an associated weight (w), and the node applies a function f to the weighted sum of its inputs [11].
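As a minimal illustration, the computation of a single node can be sketched in Python. The example weights, bias, and tanh activation below are arbitrary choices for demonstration, not values from this study:

```python
import numpy as np

def neuron(x, w, b, f=np.tanh):
    """Single node: apply activation f to the weighted sum of inputs."""
    return f(np.dot(w, x) + b)

# Example: three inputs from other nodes, each with an associated weight.
x = np.array([0.5, -1.0, 2.0])   # inputs
w = np.array([0.4, 0.6, -0.1])   # associated weights
print(neuron(x, w, b=0.2))       # activation of the weighted sum
```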
Fig.2 Example of CNN Architecture
c. Weight Optimization
Fig.3 shows the back-propagation process used to update the weights in a neural network [13].
Fig.3 Weight and Bias Optimization Process [12]
The gradient of the model parameters is computed iteratively, and the network weights are moved against it, to find new weights that minimize the classification error [14].
d. Optimizer
The first proposed optimizer is SGD. SGD [15] follows the gradient of a randomly selected minibatch downhill. To train a neural network using SGD, the gradient is first estimated using a loss function; then the update at iteration k is applied to the parameters θ. For a minibatch of m instances of the training set {x(1), …, x(m)} with corresponding targets y(i), the gradient estimate and update are:

$$\hat{g} \leftarrow \frac{1}{m} \nabla_{\theta} \sum_{i=1}^{m} L\left(f(x^{(i)};\theta),\, y^{(i)}\right) \tag{1}$$

$$\theta \leftarrow \theta - \epsilon_k\, \hat{g} \tag{2}$$
Here, the learning rate ϵk is a very important hyperparameter. The magnitude of the update depends on the learning rate: if it is too large, the update depends too much on the most recent instance; if it is too small, many updates may be required for convergence [16]. This hyperparameter can be chosen by trial and error. One way is to choose, among several candidates, the learning rate that produces the smallest loss value; this is called line search. Another way is to monitor the first few epochs and use a learning rate that is higher than the best-performing one. In Equation 2, the learning rate is written as ϵk at iteration k because, in practice, it is necessary to decrease the learning rate gradually [17].
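As an illustration, the update of Equations (1) and (2) can be sketched in a few lines of NumPy. The per-example gradient function grad_fn is a hypothetical placeholder for whatever model and loss are being trained:

```python
import numpy as np

def sgd_step(theta, grad_fn, minibatch, lr):
    """One SGD update: estimate the gradient on a minibatch (Eq. 1),
    then step against it, scaled by the learning rate (Eq. 2)."""
    xs, ys = minibatch
    # Average the per-example gradients over the m minibatch instances.
    g_hat = np.mean([grad_fn(theta, x, y) for x, y in zip(xs, ys)], axis=0)
    return theta - lr * g_hat
```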
The second proposed optimizer, RMSProp [18], is an optimization algorithm that adapts the learning rate using an exponential average of the squared gradient. To implement RMSProp, the squared gradient is accumulated after the gradient is computed:

$$r \leftarrow \rho\, r + (1-\rho)\, \hat{g} \odot \hat{g}$$

where ρ is the decay rate. Then the parameter update is calculated and applied as follows:

$$\theta \leftarrow \theta - \frac{\epsilon}{\sqrt{\delta + r}} \odot \hat{g} \tag{3}$$
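A minimal NumPy sketch of this update, using the same notation (decay rate ρ, with δ a small constant for numerical stability, a common convention not stated in the text):

```python
import numpy as np

def rmsprop_step(theta, g, r, lr=0.001, rho=0.9, delta=1e-6):
    """One RMSProp update: accumulate the squared gradient with decay rho,
    then scale the step by the root of the accumulator (Eq. 3)."""
    r = rho * r + (1 - rho) * g * g
    theta = theta - lr * g / np.sqrt(delta + r)
    return theta, r
```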
The last proposed optimizer is Adam. Adam [19] is one of the most popular and efficient optimizer algorithms, computing an individual learning rate for each parameter. The algorithm updates exponential moving averages of the gradient, mt, and of the squared gradient, ut, where the hyperparameters ρ1 and ρ2 control the decay rates of these moving averages. The moving averages themselves are estimates of the first moment (the mean) and the second raw moment (the uncentered variance) of the gradient. Adam therefore keeps first- and second-moment variables m and u. After the gradient is computed, the biased estimates of the first and second moments are updated at every time step t:

$$m \leftarrow \rho_1 m + (1-\rho_1)\,\hat{g}$$
$$u \leftarrow \rho_2 u + (1-\rho_2)\,\hat{g} \odot \hat{g}$$

Next, the bias in the first and second moments is corrected:

$$\hat{m} = \frac{m}{1-\rho_1^{\,t}}, \qquad \hat{u} = \frac{u}{1-\rho_2^{\,t}}$$

Using the corrected moments, the parameter update is calculated and applied:

$$\theta \leftarrow \theta - \epsilon\, \frac{\hat{m}}{\sqrt{\hat{u}} + \delta}$$

Adam has many advantages. First of all, it requires little tuning of the learning rate. It is a diagonal gradient-scaling method that is easy to implement and invariant to rescaling of the gradient. It is computationally efficient and has small memory requirements. Moreover, Adam is suitable for non-stationary objectives and for problems with very noisy or sparse gradients [19].
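Putting the moment updates, bias correction, and parameter step together, a minimal NumPy sketch of one Adam step might look as follows. The default hyperparameter values are the commonly used ones, which the text does not specify:

```python
import numpy as np

def adam_step(theta, g, m, u, t, lr=0.001, rho1=0.9, rho2=0.999, delta=1e-8):
    """One Adam update; t is the time step, starting from 1."""
    m = rho1 * m + (1 - rho1) * g            # first moment estimate (mean)
    u = rho2 * u + (1 - rho2) * g * g        # second raw moment estimate
    m_hat = m / (1 - rho1 ** t)              # bias-corrected first moment
    u_hat = u / (1 - rho2 ** t)              # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(u_hat) + delta)
    return theta, m, u
```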
III. METHODS
In this study, a classification system of animal species in the
wild was designed to determine accuracy using digital image
processing methods. Fig.4 shows the system block diagram
designed in this study.
Fig.4 General System Block Diagram (Image Collection → Pre-Processing → Training → Testing)
In general, the system block diagram shown in Figure 4 consists of animal image collection, preprocessing (with resizing and data augmentation stages), training, and testing.
A. Image Collection
The data used in this research is secondary data sourced from Kaggle. The author takes data from Kaggle because of the reliability of its datasets, which have already been tested. Note that using high-resolution training images can also help achieve better accuracy [21].
B. Preliminary Observation
To have a baseline against which to compare how good our model is, we use a pre-trained VGG-19 with the following structure:
TABLE I
CNN MODEL HYPERPARAMETERS

No | Layer | Output Shape
1 | Batch Size | 128
2 | Crop Size | 64
3 | Input Layer | 3 x 64 x 64
4 | nn.model | 64 x 4 x 4
5 | Global Average Pooling 2D Layer | 3, 8, 3, 1, 1
6 | Dropout | 10%
In all the experiments we have done, the only change we make is to swap the optimization function among Adam, RMSProp, and SGD, each with its appropriate learning rate.
C. Preprocessing
After the image collection process, pre-processing is done to optimize the quality of the images and to make it easier for the system to identify objects. Pre-processing is done in two stages: resizing and data augmentation.
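For illustration, the resize-and-augment stage could be expressed with torchvision transforms. The specific augmentations below (horizontal flip, small rotation) are assumptions, since the text only states that resizing and augmentation are applied:

```python
from torchvision import transforms

# Resize every image to 64x64; augment only the training set with simple
# label-preserving variations.
train_transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ToTensor(),
])
test_transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
])
```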
D. Training
At the training stage, the learning process is carried out on the images, and the output is a model that is stored for use in the testing process. Model building is the process of training on image data so that the model learns to identify objects and categorize them according to their class.
Fig.5 Flowchart of Training System Stages referring to LeNet-5 (Input 3x64x64 → Conv(3, 8) → AF → MaxPool → 8x32x32 → Conv(8, 16) → AF → MaxPool → 16x16x16 → Conv(16, 32) → AF → MaxPool → 32x8x8 → Conv(32, 64) → AF → MaxPool → 64x4x4 → Flatten (64x4x4 = 1024) → Linear(1024, 256) → Optimization + AF → Dropout 10% → Linear(256, 5) → LogSoftmax → DATABASE)
In this study, the method refers to the LeNet-5 architecture, which is very popular and well tested, as shown in Figure 5. The input size is 64 x 64 x 3. The first convolution uses 10 kernels with a 3x3 matrix and padding = valid, with ReLU activation as the non-linearity. The pooling process, specifically max pooling, uses a 2x2 size, and the second convolution stage uses 20 kernels with a 5x5 matrix, again with ReLU and padding = valid. Flatten then converts the output of the convolution process from a matrix into a vector, which is forwarded to the classification process using an MLP (Multi-Layer Perceptron) with a predetermined number of neurons in the hidden layer. SGD, RMSProp, and Adam optimization are applied to the nodes for weight and bias optimization with the default learning rate. The image class is then determined from the neuron values in the output layer using the softmax activation function, with one neuron per class; in this study there are 5 classes.
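A sketch of the Fig.5 pipeline in PyTorch might look as follows. Note one assumption: padding=1 ("same"-style) is used so that the feature map sizes match those shown in Fig.5 (64 → 32 → 16 → 8 → 4), even though the text describes valid padding:

```python
import torch.nn as nn

# Four Conv+ReLU+MaxPool blocks, then the classifier head from Fig.5.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),    # 8 x 32 x 32
    nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 16 x 16 x 16
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 32 x 8 x 8
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 64 x 4 x 4
    nn.Flatten(),                       # 64 * 4 * 4 = 1024
    nn.Linear(1024, 256), nn.ReLU(),
    nn.Dropout(0.1),                    # 10% dropout
    nn.Linear(256, 5),                  # 5 animal classes
    nn.LogSoftmax(dim=1),
)
```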
E. Testing
Fig.6 shows the flowchart of the system testing stage. The testing stage is the process of classifying animal species by running test image data through the system and comparing it with the trained model stored in the database. 1000 original images were used, plus 3000 images produced by augmentation. Each image is processed by the CNN algorithm until the system outputs the animal class information.
Fig.6 Flowchart of System Testing Stage (Start → Test Image Input → Pre-Processing → Results, using the training DATABASE)
IV. RESULTS AND DISCUSSION
The designed system was tested using the CNN method, with an architecture referring to LeNet-5, to determine the type of animal from a dataset divided into five classes, namely Bear, Elephant, Orangutan, Tiger, and Zebra. The tests vary the hyperparameters on the data before and after augmentation. The hyperparameters varied are the optimizer type (Adam, SGD, and RMSprop), the batch size (16, 32, 64, and 128), the learning rate (0.1, 0.01, and 0.001), and the number of training iterations (epochs), here controlled by early stopping.
A. Data Testing and Analysis
The first data to be tested is the original data, totaling 5000 images. The training process used 80% of the data (4000 images), while the testing process used the remaining 20% (1000 images). This test uses the three optimizers, Adam, SGD, and RMSprop, with a batch size of 16. For the epochs, early stopping is used: when the model can no longer improve its accuracy and loss, the process stops automatically. To find the learning rate suited to each optimizer, several learning rate values are tried, namely 0.1, 0.01, and 0.001.
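For illustration, the experimental grid over optimizers and learning rates could be set up as follows. The linear model here is a stand-in for the CNN of Section III, and the momentum of 0.9 for SGD follows Section IV.B:

```python
import torch
import torch.nn as nn

model = nn.Linear(3 * 64 * 64, 5)  # placeholder; in practice, the CNN above

def make_optimizer(name, params, lr):
    """Return one of the three optimizers under comparison."""
    if name == "SGD":
        return torch.optim.SGD(params, lr=lr, momentum=0.9)
    if name == "RMSprop":
        return torch.optim.RMSprop(params, lr=lr)
    return torch.optim.Adam(params, lr=lr)

criterion = nn.NLLLoss()  # pairs with the model's LogSoftmax output
for name in ("SGD", "RMSprop", "Adam"):
    for lr in (0.1, 0.01, 0.001):
        optimizer = make_optimizer(name, list(model.parameters()), lr)
        # ... training loop with early stopping; record test score and cost ...
```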
1) Learning Rate 0.1
Fig.7 Plot of the cost of Adam, SGD, and RMSProp
Fig.8 Plot of the score of Adam, SGD, and RMSProp
To find the right learning rate for the model, three learning rate values were tried. As can be seen in Fig.7 and Fig.8, of the three optimization algorithms, SGD outperforms the Adam algorithm in both cost and score. The Adam and RMSProp optimizers perform very badly at a learning rate of 0.1, so the learning rate value will be changed to look for a better score and cost.
TABLE II
COMPARISON OF OPTIMIZERS WITH LEARNING RATE 0.1

Optimizer | Test Score | Test Cost | Best Epoch
SGD | 0.7480 | 0.7255 | 25
RMSProp | 0.2000 | 1.6140 | 3
Adam | 0.2000 | 1.6122 | 2
From Table II above, SGD is superior to Adam and RMSProp, with a Test Score of 74.80% and a Test Cost of 0.7255. It can also be seen that the Test Scores of Adam and RMSProp cannot exceed 20% and their costs remain very high, which means that at a learning rate of 0.1 there is a problem with the RMSProp and Adam optimizers. Therefore, the researchers changed the learning rate to 0.01 in the second experiment.
2) Learning Rate 0.01
Fig.9 Plot of the cost of Adam, SGD, and RMSProp
Fig.10 Plot of the score of Adam, SGD, and RMSProp
In the first experiment, with a learning rate of 0.1, Adam and RMSProp produced costs and scores that were still bad. As can be seen in Fig.9 and Fig.10, of the three optimization algorithms, RMSProp now outperforms SGD and Adam in both cost and score. But the results obtained from all three models are still very poor, considering that the dataset used is good and the model uses an activation function that performs very well. This means there is still a problem with the learning rate value used by the optimizer algorithms in this model.

TABLE III
COMPARISON OF OPTIMIZERS WITH LEARNING RATE 0.01

Optimizer | Test Score | Test Cost | Best Epoch
SGD | 0.2440 | 1.6090 | 12
RMSProp | 0.4430 | 1.3005 | 8
Adam | 0.2000 | 1.6096 | 1

From Table III, the highest score reaches only 0.4430, with the lowest cost of 1.3005, obtained by RMSProp at epoch 8. This is very far from the expected model performance. Therefore, the researchers changed to the next learning rate value, 0.001, in the third experiment.
3) Learning Rate 0.001
Fig.11 Plot of the cost of Adam, SGD, and RMSProp
Fig.12 Plot of the score of Adam, SGD, and RMSProp
The SGD plots in Fig.11 and Fig.12 are still very bad: the cost and score curves are very flat during testing, while during training they move very unstably. This indicates that SGD is not well suited to this learning rate value. On the contrary, for Adam and RMSProp the cost and score results are very good, although RMSProp is not quite as good as the Adam optimizer. From these experiments with different learning rate values, it can be concluded that the SGD, RMSProp, and Adam optimizers each have their own ideal learning rate.
TABLE IV
COMPARISON OF OPTIMIZERS WITH LEARNING RATE 0.001

Optimizer | Test Score | Test Cost | Best Epoch | Precision | Recall | F1 Score
SGD | 0.2660 | 1.6090 | 2 | 0.2 | 0.08 | 0.04
RMSProp | 0.7920 | 0.5925 | 18 | 0.85 | 0.82 | 0.84
Adam | 0.8280 | 0.4926 | 23 | 0.82 | 0.78 | 0.80

Table IV shows that SGD at a learning rate of 0.001 is not optimal, but we have already found its right learning rate at 0.1. It can be seen that the RMSProp and Adam optimizers produce quite good score, cost, precision, recall, and F1-score values. But to see which optimizer is actually superior, we run several further experiments to check the consistency of each optimizer's performance at the suitable learning rate described above.
B. Optimizer Performance Consistency
In this research, the data drawing is done randomly, which leads to different accuracy results from the built model. This is why consistent performance results are needed from the model, especially for the optimizer algorithms SGD, Adam, and RMSProp, in order to determine each optimizer's reliability and conclude which performs best. From the above experiments, the best performance of SGD uses a learning rate of 0.1 with a momentum of 0.9, while RMSProp and Adam use a learning rate of 0.001. The performance of the three optimizers is tested in 6 trials, each with the best learning rate of the respective optimizer algorithm.
TABLE V
COMPARISON OF OPTIMIZERS AT THEIR BEST LEARNING RATES (SGD 0.1; RMSPROP AND ADAM 0.001)

Test | SGD (Lr=0.1) Score | SGD Cost | RMSProp (Lr=0.001) Score | RMSProp Cost | Adam (Lr=0.001) Score | Adam Cost
1 | 0.7480 | 0.7255 | 0.7920 | 0.5925 | 0.8280 | 0.4926
2 | 0.5470 | 1.1258 | 0.7810 | 0.5898 | 0.8410 | 0.4661
3 | 0.6310 | 0.9648 | 0.6980 | 0.7726 | 0.8500 | 0.4278
4 | 0.5840 | 1.0965 | 0.7940 | 0.5369 | 0.8320 | 0.4594
5 | 0.6410 | 0.9316 | 0.7010 | 0.8466 | 0.8220 | 0.5095
6 | 0.5900 | 1.0634 | 0.8060 | 0.5602 | 0.8470 | 0.4614

TABLE VI
COMPARISON OF PRECISION, RECALL, AND F1-SCORE

Optimizer | Metric | Test 1 | Test 2 | Test 3 | Test 4 | Test 5 | Test 6
SGD | Precision | 0.73 | 0.46 | 0.47 | 0.38 | 0.58 | 0.50
SGD | Recall | 0.82 | 0.30 | 0.50 | 0.19 | 0.47 | 0.34
SGD | F1-Score | 0.76 | 0.36 | 0.48 | 0.26 | 0.49 | 0.40
RMSProp | Precision | 0.85 | 0.72 | 0.54 | 0.57 | 0.58 | 0.71
RMSProp | Recall | 0.82 | 0.74 | 0.46 | 0.63 | 0.64 | 0.75
RMSProp | F1-Score | 0.84 | 0.73 | 0.47 | 0.58 | 0.57 | 0.76
Adam | Precision | 0.82 | 0.74 | 0.87 | 0.75 | 0.70 | 0.78
Adam | Recall | 0.78 | 0.74 | 0.82 | 0.76 | 0.78 | 0.84
Adam | F1-Score | 0.80 | 0.74 | 0.84 | 0.75 | 0.74 | 0.80

After the 6 experiments, the results in Table V show that Adam's score is the highest, with the lowest cost. Likewise, in Table VI the precision, recall, and F1-score of the Adam optimizer are very stable, which shows that Adam is the best model that can be applied in this study, followed by the RMSProp and SGD optimizers.
C. Confusion Matrix Comparison
Fig.13 shows the confusion matrices produced by the training process with the three optimizer algorithms. On the left side of each plot are the true labels of the 5 animal classes, i.e., the actual animal classes, and along the bottom are the predicted labels produced by the training process.
Fig.13 Confusion Matrices of SGD (Lr=0.1), RMSProp (Lr=0.001), and Adam (Lr=0.001)
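A sketch of how such a confusion matrix plot can be produced, with random placeholder labels standing in for the real test labels and model predictions:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

classes = ["Bear", "Elephant", "Orangutan", "Tiger", "Zebra"]
y_true = np.random.randint(0, 5, size=128)   # placeholder actual labels
y_pred = np.random.randint(0, 5, size=128)   # placeholder predictions

cm = confusion_matrix(y_true, y_pred)        # rows: true, columns: predicted
plt.imshow(cm)
plt.xticks(range(5), classes, rotation=45)
plt.yticks(range(5), classes)
plt.xlabel("Predicted label")
plt.ylabel("True label")
plt.colorbar()                               # color scale = image counts
plt.show()
```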
D. Manual Calculation of the Confusion Matrix
Because the Adam optimizer is the best algorithm in this study, the researchers calculate the system performance only for this optimizer. Fig.14 is the result of retesting and generating a confusion matrix using Adam with 5 classes, namely bear, elephant, orangutan, tiger, and zebra.
Fig.14 Confusion Matrix using the Adam Optimizer
On the right of the plot is a color scale indicating the number of images drawn into each cell, from which the true positives, false negatives, false positives, and true negatives can be read.
At this stage, the researchers calculate the class metrics manually, taking the bear class as an example representative of the other classes.
True Positive = Actual Bear (22) predicted Bear.
False Negative = Actual Bear (6) predicted Elephant + Actual Bear (2) predicted Orangutan.
False Positive = Actual Elephant (1) predicted Bear + Actual Orangutan (1) predicted Bear + Actual Tiger (1) predicted Bear.
True Negative = Actual Elephant (18) predicted Elephant + Actual Elephant (1) predicted Orangutan + Actual Elephant (3) predicted Tiger + Actual Elephant (2) predicted Zebra + Actual Orangutan (25) predicted Orangutan + Actual Orangutan (2) predicted Elephant + Actual Tiger (22) predicted Tiger + Actual Tiger (1) predicted Elephant + Actual Tiger (1) predicted Zebra + Actual Zebra (19) predicted Zebra + Actual Zebra (1) predicted Elephant.
TP = 22, FN = 8, FP = 3, TN = 95.

$$\text{Precision} = \frac{TP}{TP+FP} = \frac{22}{25} = 0.880$$
$$\text{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN} = \frac{117}{128} = 0.914$$
$$\text{Loss} = \frac{FP+FN}{TP+TN+FP+FN} = \frac{11}{128} = 0.085$$
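These three calculations can be reproduced directly from the counts:

```python
# Per-class metrics for the Bear class, using the counts derived above.
TP, FN, FP, TN = 22, 8, 3, 95
total = TP + TN + FP + FN                # 128 images in the test batch
precision = TP / (TP + FP)               # 22 / 25  = 0.880
accuracy = (TP + TN) / total             # 117 / 128 = 0.914
loss = (FP + FN) / total                 # 11 / 128  = 0.085
print(precision, accuracy, loss)
```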
After the precision, accuracy, and loss of each class have been computed, the averages are calculated to determine the precision, accuracy, and loss of the model with the Adam optimizer.
TABLE VII
MANUAL CALCULATION OF ACCURACY, PRECISION, AND LOSS

Class | Bear | Elephant | Orangutan | Tiger | Zebra | Mean
TP | 22 | 18 | 25 | 22 | 19 | -
FN | 8 | 7 | 3 | 3 | 1 | -
FP | 3 | 10 | 3 | 3 | 3 | -
TN | 95 | 93 | 97 | 100 | 105 | -
Batch Size | 128 | 128 | 128 | 128 | 128 | -
Precision | 0.880 | 0.642 | 0.892 | 0.953 | 0.968 | 0.867
Accuracy | 0.914 | 0.867 | 0.953 | 0.892 | 0.863 | 0.898
Loss | 0.085 | 0.132 | 0.046 | 0.046 | 0.03 | 0.068
From Table VII above, we can conclude that the best accuracy is obtained by the Orangutan class, followed by Bear, Tiger, Elephant, and Zebra. Meanwhile, the lowest loss is obtained by the Zebra class, followed by Orangutan and Tiger, then Bear and Elephant. Over all classes, the model achieved a precision of 86.75%, an accuracy of 89.81%, and a loss of about 6%. This is a good result considering that the data processed are images spread over a number of varied classes.

E. Data Visualization
To see the prediction results of the built model, they are visualized as images. As explained earlier, 128 images are drawn; to fit the display, a grid of 5 rows and 8 columns is used, showing only 40 images.
Fig.15 Visualization of prediction results using CNN with the Adam optimizer
From Fig.15 it can be seen that each actual label is compared with the prediction label: if they show the same class, the title is displayed in green, in other words a correct prediction; if they show different classes, it is displayed in red, a misclassification. Prediction errors can occur because of several image factors, such as unclear test images, background influences such as natural scenery, and similarities in color and shape. For example, in the fourth column of the first row there is an image of a tiger with the head cut off, which the model predicted as an elephant. This can be addressed by adding tiger images without the head part to the training data.
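A sketch of how such a visualization grid can be produced, with random placeholder images and labels standing in for the test set and model outputs:

```python
import torch
import matplotlib.pyplot as plt

classes = ["Bear", "Elephant", "Orangutan", "Tiger", "Zebra"]
images = torch.rand(40, 3, 64, 64)        # placeholder test images (CHW)
actual = torch.randint(0, 5, (40,))       # placeholder actual labels
predicted = torch.randint(0, 5, (40,))    # placeholder predicted labels

# 5 rows x 8 columns = 40 images; green title = correct, red = misclassified.
fig, axes = plt.subplots(5, 8, figsize=(16, 10))
for ax, img, a, p in zip(axes.flat, images, actual, predicted):
    ax.imshow(img.permute(1, 2, 0))       # CHW tensor -> HWC for display
    ax.set_title(classes[int(p)], fontsize=8,
                 color="green" if int(p) == int(a) else "red")
    ax.axis("off")
plt.show()
```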
V. CONCLUSION
In this study, we provide a solution to help scientists identify and monitor protected animal species more accurately, with 89.81% accuracy for our best model. The solution offered can help monitor animal species, especially protected ones, more cheaply, quickly, and reliably. This research also shows that, with the appropriate learning rate for each optimization function, Adam is superior, followed by RMSProp and SGD. A suggestion for further research is to compare the effect of different activation functions with the same dataset and optimizers as in this study, to further improve on the performance of the present model.
REFERENCES
[1] "UN: One million species of animals and plants are threatened with extinction due to human activity," BBC News Indonesia, 2019. Available: https://www.bbc.com/indonesia/magazine-48189137 (accessed Dec. 15, 2022).
[2] J. A. Veech, "A comparison of landscapes occupied by increasing and decreasing populations of grassland birds," Conserv. Biol., vol. 20, no. 5, pp. 1422–1432, 2006, doi: 10.1111/j.1523-1739.2006.00487.x.
[3] P. D. Meek, G. A. Ballard, P. J. S. Fleming, M. Schaefer, W. Williams, and G. Falzon, "Camera traps can be heard and seen by animals," PLoS One, vol. 9, no. 10, 2014, doi: 10.1371/journal.pone.0110832.
[4] S. P. Singh, "Fully connected layer: The brute force layer of a machine learning model," OpenGenus IQ: Computing Expertise & Legacy, 2019. Available: https://iq.opengenus.org/fully-connected-layer/ (accessed Dec. 15, 2022).
[5] E. Setiawan, "Meaning of the word satwa," Indonesian Dictionary (KBBI) Online. Available: https://kbbi.web.id/satwa (accessed Dec. 15, 2022).
[6] R. E. A. Sartika, "As a result of human life, one million species are threatened with extinction from the earth," KOMPAS.com, 2019. Available: https://sains.kompas.com/read/2019/05/09/163500923/akibat-kehidupan-manusia-satu-juta-spesies-terancam-punah-dari-bumi?page=all (accessed Dec. 15, 2022).
[7] D. Saravanan, D. Joseph, and S. Vaithyasubramanian, "Effective utilization of image information using data mining technique," vol. 172, 2019, doi: 10.1007/978-3-030-32644-9_22.
[8] P. Hidayatullah, X. Wang, T. Yamasaki, T. L. E. R. Mengko, R. Munir, A. Barlian, E. Sukmawati, and S. Supraptono, "DeepSperm: A robust and real-time bull sperm-cell detection in densely populated semen videos," Computer Methods and Programs in Biomedicine, vol. 209, 106302, 2021, doi: 10.1016/j.cmpb.2021.106302.
[9] Wanayumini, O. S. Sitompul, S. Suwilo, and M. Zarlis, "Supervised image classification of chaos phenomenon in cumulonimbus cloud using spectral angle mapper," International Journal on Advanced Science, Engineering and Information Technology, vol. 10, no. 3, pp. 987–992, 2020, doi: 10.18517/ijaseit.10.3.11493.
[10] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016. Available: http://deeplearning.net/
[11] Ujjwalkarn, "A quick introduction to Neural Networks," 2016. Available: https://ujjwalkarn.me/2016/08/09/quick-intro-neural-networks (accessed Dec. 15, 2022).
[12] D. P. Kingma and J. L. Ba, "Adam: A method for stochastic optimization," in Proc. 3rd International Conference on Learning Representations (ICLR), 2015, pp. 1–15.
[13] A. Madduri, S. S. Adusumalli, H. S. Katragadda, M. K. R. Dontireddy, and P. S. Suhasini, "Classification of breast cancer histopathological images using convolutional neural networks," in Proc. 8th International Conference on Signal Processing and Integrated Networks (SPIN), 2021, pp. 755–759, doi: 10.1109/SPIN52536.2021.9566015.
[14] J. Li, J. H. Cheng, J. Y. Shi, and F. Huang, "Brief introduction of back propagation (BP) neural network algorithm and its improvement," Advances in Intelligent and Soft Computing, vol. 169, pp. 553–558, 2012, doi: 10.1007/978-3-642-30223-7_87.
[15] H. Robbins and S. Monro, "A stochastic approximation method," The Annals of Mathematical Statistics, vol. 22, no. 3, pp. 400–407, 1951, doi: 10.1214/aoms/1177729586.
[16] E. Alpaydın, Introduction to Machine Learning, 3rd ed. MIT Press, 2019.
[17] J. Heaton, "Ian Goodfellow, Yoshua Bengio, and Aaron Courville: Deep learning," Genetic Programming and Evolvable Machines, vol. 19, no. 1–2, pp. 305–307, 2018, doi: 10.1007/s10710-017-9314-z.
[18] M. C. Mukkamala and M. Hein, "Variants of RMSProp and Adagrad with logarithmic regret bounds," in Proc. 34th International Conference on Machine Learning (ICML), vol. 5, 2017, pp. 3917–3932.
[19] Z. Zhang, "Improved Adam optimizer for deep neural networks," in Proc. 2018 IEEE/ACM 26th International Symposium on Quality of Service (IWQoS), 2018, pp. 1–2, doi: 10.1109/IWQoS.2018.8624183.
[20] K. Lan, L. Liu, T. Li, Y. Chen, S. Fong, J. A. L. Marques, R. K. Wong, and R. Tang, "Multi-view convolutional neural network with leader and long-tail particle swarm optimizer for enhancing heart disease and breast cancer detection," Neural Computing and Applications, vol. 32, no. 19, pp. 15469–15488, 2020, doi: 10.1007/s00521-020-04769-y.
[21] T. S. Gunawan, M. H. H. Gani, F. D. A. Rahman, and M. Kartiwi, "Development of face recognition on Raspberry Pi for security enhancement of smart home system," Indonesian Journal of Electrical Engineering and Informatics, vol. 5, no. 4, pp. 317–325, 2017, doi: 10.11591/ijeei.v5i4.361.