Deep learning exp 2.3 MU
Aim: Apply a learning algorithm (e.g., SGD, Adagrad, Adadelta, RMSprop, or Adam) to learn the
parameters of a supervised single-layer feed-forward neural network.
Theory:
What is an Optimizer?
An optimizer is a function or algorithm that modifies the neural network’s attributes, such as
weights and learning rate. As a result, it helps to reduce overall loss and improve accuracy.
Choosing the right weights for the model is difficult because a deep-learning model typically
has millions of parameters, which makes selecting an appropriate optimization algorithm for
your application all the more important.
Understanding optimization algorithms is crucial for diving deep into deep learning.
Before you continue, there are a few terms you should be familiar with.
Epoch: The number of times the algorithm runs through the entire training dataset.
Batch: The number of training samples used for a single update of the model parameters (the
batch size).
Learning Rate: A parameter that controls how much the model weights are adjusted at each
update.
Cost Function/Loss Function: Used to calculate the cost, i.e. the difference between the
predicted and the actual value.
Weights/Bias: A model's learnable parameters, which control the signal between two neurons.
Momentum: A very popular technique used along with SGD. Instead of relying solely on the
gradient of the current step to guide the search, momentum also takes the gradients of
previous steps into account to determine the direction and size of the update.
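As a minimal, hypothetical Keras sketch (the layer sizes and hyperparameter values here are
illustrative only and are not taken from the experiment code below), these terms map directly
onto familiar arguments:
import keras
from keras import layers

# A single-layer feed-forward network: 784 inputs -> 10-way softmax output
model = keras.Sequential([keras.Input(shape=(784,)),
                          layers.Dense(10, activation='softmax')])
model.compile(
    optimizer=keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),  # learning rate, momentum
    loss='categorical_crossentropy',                                   # cost/loss function
    metrics=['accuracy'])
# model.fit(x, y, batch_size=64, epochs=10)  # batch size and number of epochs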
Adam derives its name from adaptive moment estimation. This optimization algorithm is an
extension of stochastic gradient descent that updates network weights during training. It is a
hybrid of the "gradient descent with momentum" and "RMSprop" algorithms.
It is an adaptive learning-rate method that computes individual learning rates for different
parameters.
Adam can be used instead of the classical stochastic gradient descent procedure to update
network weights iteratively based on the training data.
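As a minimal sketch (the hyperparameter values shown are the common Keras defaults, listed
here only for illustration), the optimizer can be created explicitly and later passed to
model.compile:
from keras.optimizers import Adam

# Adam with its standard hyperparameters:
#   learning_rate  -> step size (alpha)
#   beta_1, beta_2 -> decay rates of the two moving averages (momentum / RMSprop parts)
#   epsilon        -> small constant that avoids division by zero
adam = Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-7)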
The Adam optimizer combines two gradient descent methods:
Momentum: This method speeds up gradient descent by using an "exponentially weighted
average" of the gradients; averaging the gradients helps the algorithm converge towards the
minima faster.
RMSprop: Root mean square propagation is an adaptive learning-rate method that improves on
AdaGrad. It uses an "exponential moving average" of squared gradients rather than the
cumulative sum of squared gradients that AdaGrad uses. The notation used for the update rules
sketched below is:
• Wt = weights at time t
• Wt+1 = weights at time t+1
• αt = learning rate at time t
• ∂L/∂Wt = gradient of the loss function with respect to the weights at time t
• Vt = sum of the squares of past gradients, i.e. sum((∂L/∂Wt−1)²) (initially Vt = 0)
• β = moving-average parameter (constant, typically 0.9)
• ϵ = a small positive constant (e.g. 10⁻⁸) that prevents division by zero
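The standard textbook forms of the two update rules, with β1 and β2 as the respective
moving-average decay rates, are:
\[ m_t = \beta_1 m_{t-1} + (1-\beta_1)\,\frac{\partial L}{\partial W_t}, \qquad W_{t+1} = W_t - \alpha_t\, m_t \quad \text{(momentum)} \]
\[ v_t = \beta_2 v_{t-1} + (1-\beta_2)\left(\frac{\partial L}{\partial W_t}\right)^{2}, \qquad W_{t+1} = W_t - \frac{\alpha_t}{\sqrt{v_t + \epsilon}}\,\frac{\partial L}{\partial W_t} \quad \text{(RMSprop)} \]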
Adam Optimizer takes the strengths or positive characteristics of the previous two methods and
builds on them to provide a more optimized gradient descent.
Here, the rate of gradient descent is controlled so that there is minimal oscillation when the
search approaches the global minimum, while the steps remain large enough to get past
local-minima hurdles along the way. Combining the features of the two methods above therefore
lets the algorithm reach the global minimum efficiently.
Combining the formulas used in the two methods above, we get the following update rule:
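A standard statement of the Adam update, combining the momentum and RMSprop estimates with the
usual bias corrections, is:
\[ m_t = \beta_1 m_{t-1} + (1-\beta_1)\,\frac{\partial L}{\partial W_t}, \qquad v_t = \beta_2 v_{t-1} + (1-\beta_2)\left(\frac{\partial L}{\partial W_t}\right)^{2} \]
\[ \hat{m}_t = \frac{m_t}{1-\beta_1^{\,t}}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^{\,t}}, \qquad W_{t+1} = W_t - \alpha_t\,\frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} \]
with β1 ≈ 0.9, β2 ≈ 0.999 and ϵ ≈ 10⁻⁸ as typical values.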
Code:
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D

# Load the MNIST handwritten-digit dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape, y_train.shape)

# Reshape to (samples, 28, 28, 1) so the convolutional layer receives single-channel images
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)
input_shape = (28, 28, 1)

batch_size = 64
num_classes = 10
epochs = 10

# One-hot encode the labels
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

# Scale pixel values to [0, 1]
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255

def build_model(optimizer):
    model = Sequential()
    model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(256, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation='softmax'))
    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer=optimizer,
                  metrics=['accuracy'])
    return model

# Train the same architecture with each optimizer and keep its training history
optimizers = ['Adadelta', 'Adagrad', 'Adam', 'RMSprop', 'SGD']
histories = {}
for name in optimizers:
    model = build_model(name)
    histories[name] = model.fit(x_train, y_train, batch_size=batch_size,
                                epochs=epochs, verbose=1,
                                validation_data=(x_test, y_test))
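As an optional follow-up (a sketch only; it assumes the histories dictionary filled in the loop
above and that the installed Keras version logs validation accuracy under 'val_accuracy' or
'val_acc'), the optimizers can be compared on the final validation accuracy they reach:
# Compare the final validation accuracy reached with each optimizer
for name, hist in histories.items():
    val_acc = hist.history.get('val_accuracy', hist.history.get('val_acc'))
    print(f"{name}: final validation accuracy = {val_acc[-1]:.4f}")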
Output:
Conclusion: Hence, we were able to apply the above learning algorithms (Adadelta, Adagrad,
Adam, RMSprop, and SGD) to learn the parameters of a supervised single-layer feed-forward
neural network.