Explore the Implementation of CNNs in Python
Jessica Cervi
Activity Overview
The term deep neural networks (DNNs) broadly refers to any kind of neural network with many layers,
assembled in order to achieve some larger task.
In this activity, we'll first explore how to assemble a simplified version of a DNN using the familiar MNIST
digits dataset. We will train our network and measure its accuracy to see how often it makes a correct prediction.
This activity is designed to help you apply the machine learning algorithms you have learned using
Python packages. Python concepts, instructions, and starter code are embedded within this Jupyter
Notebook to help guide you as you progress through the activity. Remember to run the code of each code
cell prior to submitting the assignment. Upon completing the activity, we encourage you to compare your
work against the solution file to perform a self-assessment.
Week 3: Deep Neural Networks
Computers see images using pixels. Pixels in images are usually related, particularly to other nearby pixels.
For example, a certain group of pixels may signify an edge in an image, a particular texture, or some other
pattern. Convolutional layers exploit this local structure by sliding small filters over the image to detect
such patterns, which helps to identify and classify images.
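To make the idea of detecting a local pattern concrete, here is a minimal NumPy sketch of a single filter sliding over a tiny image. The toy image, the filter values, and the helper name conv2d_valid are illustrative assumptions and do not come from the activity. As in most deep learning libraries, the operation shown is technically a cross-correlation (the filter is not flipped).

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 5x5 image: dark (0) on the left, bright (1) on the right.
image = np.array([[0, 0, 0, 1, 1]] * 5, dtype=float)

# Hypothetical 3x3 filter that responds to dark-to-bright vertical edges.
kernel = np.array([[-1, 0, 1]] * 3, dtype=float)

response = conv2d_valid(image, kernel)
print(response)
```

The response is zero where the window covers only dark pixels and positive where the window overlaps the edge, which is the kind of feature a convolution layer learns to pick up.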
To achieve image classification using DNNs, we may use a sequence of convolution, ReLU, and
pooling layers, whose purpose is to learn and extract relevant features from the image.
That might then be combined with a flatten operation and a sequence of fully connected layers, culminating
in some output, for example, using a softmax activation function at the end to indicate the “best” choice
among several possible classes for the image.
In this activity, we will use a simplified DNN consisting of only three layers: a convolution layer, a flattening layer,
and a dense output layer with a softmax activation. Details about each of these layers will be given in the next sections of this activity.
Convolutional layers in Python with Keras
Since we are going to work with images, it's a good idea to use a DNN with convolutional layers. As we did in the
coding activity about autoencoders, we will use the Python library Keras (https://keras.io) to set up and build
our model.
Run the code cell below to import some of the libraries we will use in this exercise.
(60000,) (10000,)
Now let’s take a look at one of the images in our dataset to see what we're working with. We will plot the first
image in our dataset using matplotlib .
In [3]: import matplotlib.pyplot as plt
#plot the first image in the dataset
plt.imshow(X_train[0])
By default, the shape of every image in the MNIST dataset is 28x28, so we will not need to check the shape
of all the images. When using real-world datasets, you may not be so lucky. 28x28 is also a fairly small size,
so the DNN will be able to run over each image pretty quickly.
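# add a trailing channel dimension of size 1 (greyscale) to both sets
X_train = X_train.reshape(60000,28,28,1)
X_test = X_test.reshape(10000,28,28,1)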
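The reshape above adds a channel axis, because a Keras Conv2D layer expects input of shape (batch, height, width, channels) with the default channels-last layout. A minimal sketch on a stand-in array (the array name and the batch of 5 blank images are assumptions for illustration) shows that only the shape changes, not the pixel values:

```python
import numpy as np

# Stand-in for a small batch of greyscale images (5 images of 28x28 pixels).
images = np.zeros((5, 28, 28))

# Add a trailing channel axis of size 1; the pixel values are untouched.
images_4d = images.reshape(images.shape[0], 28, 28, 1)
print(images_4d.shape)

# Dropping the channel axis again recovers the original array.
same = np.array_equal(images_4d[..., 0], images)
```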
As a final step of our data preparation, we need to one-hot encode our target variable. To achieve this, we
will be using the function to_categorical
(https://www.tensorflow.org/api_docs/python/tf/keras/utils/to_categorical) from the Keras module utils.
This means that a separate binary column is created for each output category. For example, we saw that the
first image in the dataset is a 5. This means that the sixth number in our array (index 5, counting digit
types from 0) will be 1 and the rest of the array will be filled with 0s.
y_train[0]: 5
y_train_onehot[0]: [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
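For intuition, the same encoding can be sketched in plain NumPy. The helper one_hot and the three sample labels are illustrative assumptions; the activity itself uses to_categorical.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    labels = np.asarray(labels)
    # Row k of the identity matrix is the one-hot vector for class k.
    return np.eye(num_classes, dtype="float32")[labels]

labels = np.array([5, 0, 4])  # hypothetical labels
encoded = one_hot(labels, num_classes=10)
print(encoded[0])
```

The first row has its single 1 at index 5, matching the y_train_onehot[0] output shown above.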
The model type that we will be using is Sequential. The Keras class Sequential
(https://keras.io/api/models/sequential/) is the easiest way to build a model in Keras. It allows you to build a
model layer by layer.
In the code below, fill in the ellipsis so that num_filters, filter_size, and pool_size are equal to
8, 3, and 2, respectively.
In [8]: from keras.models import Sequential
Our first layer is a Conv2D layer. This is a convolution layer that will deal with our input images, which are
seen as 2-dimensional matrices. In the code cell below, 8 is the number of filters (num_filters) in the
convolution layer. This number can be adjusted to be higher or lower, depending on the size of the dataset.
Our first layer also takes in the shape of an input. The shape of each input image is (28, 28, 1), as seen
earlier, with the 1 signifying that the images are greyscale.
In between the Conv2D layer and the dense layer, there is a Flatten
(https://www.tensorflow.org/api_docs/python/tf/keras/layers/Flatten) layer. Flatten serves as a connection
between the convolution and dense layers, converting higher-dimensional data into a single 1-dimensional
vector as needed by a dense layer.
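The shapes involved can be worked out by hand. The sketch below assumes the Conv2D defaults (stride 1 and 'valid' padding, i.e. no padding) together with the values 8 and 3 given earlier for num_filters and filter_size; the zero-filled array is only a stand-in for a real feature map.

```python
import numpy as np

input_size = 28   # image height and width
filter_size = 3
num_filters = 8

# With no padding and stride 1, each spatial side shrinks by filter_size - 1.
conv_size = input_size - filter_size + 1
conv_output = np.zeros((conv_size, conv_size, num_filters))

# Flatten turns the 3-dimensional feature map into one long vector.
flat = conv_output.reshape(-1)
print(conv_output.shape, flat.shape)
```

Under these assumptions, each image leaves the convolution as a 26x26x8 feature map and reaches the dense layer as a vector of 5408 values.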
Finally, as we have seen for autoencoders, Dense is the layer type that we will use for our output layer.
Dense is a standard layer type that is used in many cases for neural networks.
We will have 10 nodes in our output layer, one for each possible outcome (0–9). The activation is "softmax".
Softmax makes the output sum up to 1 so the output can be interpreted as probabilities. The model will then
make its prediction based on whichever option or class has the highest probability.
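As a rough illustration of what softmax does, the sketch below converts ten raw scores into probabilities. The scores are made up for the example, and the helper softmax is written by hand rather than taken from Keras.

```python
import numpy as np

def softmax(scores):
    """Turn raw scores into positive values that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# Hypothetical raw scores for the ten digit classes 0-9.
scores = np.array([1.0, 2.0, 0.5, 4.0, 0.0, 0.0, 1.5, 0.2, 0.1, 3.0])
probs = softmax(scores)

predicted = int(np.argmax(probs))  # class with the highest probability
print(probs.sum(), predicted)
```

The largest score (index 3 here) receives the largest probability, so that class would be the prediction.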
The optimizer determines how the weights are updated during training. We will be using adam as our optimizer.
Adam is generally a good optimizer to use for many cases. The adam optimizer adapts the effective step size
throughout training. The learning rate determines how large each weight update is. A smaller learning rate
may lead to more accurate weights (up to a certain point), but as we saw, the time it takes to compute the
weights will be longer.
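The trade-off between learning rate and training time can be seen on a toy problem. Note that this sketch uses plain gradient descent on a one-dimensional quadratic, not Adam, and the function, starting point, and tolerance are all assumptions chosen for illustration.

```python
def steps_to_converge(learning_rate, start=0.0, target=3.0, tol=1e-3, max_steps=10_000):
    """Minimize f(w) = (w - target)^2 with plain gradient descent; return steps used."""
    w = start
    for step in range(1, max_steps + 1):
        grad = 2.0 * (w - target)   # derivative of (w - target)^2
        w -= learning_rate * grad   # the learning rate scales the update
        if abs(w - target) < tol:
            return step
    return max_steps

fast = steps_to_converge(learning_rate=0.1)
slow = steps_to_converge(learning_rate=0.01)
print(fast, slow)
```

Both runs reach the same minimum, but the smaller learning rate needs many more updates to get there.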
We will use 'categorical_crossentropy' for our loss function. This is a common choice for multi-class
classification with one-hot encoded targets. A lower loss indicates that the model is performing better. To make
things even easier to interpret, we will use the 'accuracy' metric to see the accuracy score on the validation
set when we train the model.
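To see why a lower loss means a better model, here is a hand-written NumPy sketch of categorical cross-entropy for a single image. The two probability vectors are invented for the example, and the clipping constant eps is an assumption added to avoid taking log(0).

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # guard against log(0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

# One-hot target for the digit 5.
y_true = np.array([0, 0, 0, 0, 0, 1, 0, 0, 0, 0], dtype=float)

# Hypothetical predictions: one confident and correct, one undecided.
confident = np.full(10, 0.01)
confident[5] = 0.91
unsure = np.full(10, 0.1)

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

Only the probability assigned to the true class enters the sum, so the confident correct prediction gets a much smaller loss than the undecided one.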
In the code cell, fill in the ellipsis to set the argument loss equal to 'categorical_crossentropy'.
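One way the three layers and the compile step described above might fit together is sketched below. This is an assumption-laden sketch, not the activity's solution file: the convolution layer is left with its default (linear) activation because the text does not name one, and pool_size is unused because the simplified model has no pooling layer.

```python
from keras.models import Sequential
from keras.layers import Conv2D, Flatten, Dense

num_filters = 8
filter_size = 3

model = Sequential([
    # 8 filters of size 3x3 applied to 28x28 greyscale images
    Conv2D(num_filters, filter_size, input_shape=(28, 28, 1)),
    # 26x26x8 feature map -> vector of 5408 values
    Flatten(),
    # one output node per digit, converted to probabilities by softmax
    Dense(10, activation="softmax"),
])

model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

model.summary()
```

Calling model.summary() is a quick way to check that the output shapes of each layer match the description in the text.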
For our validation data, we will use the test set provided to us in our dataset, which we have processed into
X_test and y_test_onehot. The number of epochs is the number of times the model will cycle
through the training data. The more epochs we run, the more the model will improve, up to a certain point.
After that point, additional epochs bring little or no improvement.
For efficiency, in our model below we will set the number of epochs to 3.
Epoch 1/3
1875/1875 [==============================] - 6s 3ms/step - loss: 2.2710 - accuracy: 0.8854 - val_loss: 0.5618 - val_accuracy: 0.9310
Epoch 2/3
1875/1875 [==============================] - 7s 4ms/step - loss: 0.3337 - accuracy: 0.9428 - val_loss: 0.2534 - val_accuracy: 0.9479
Epoch 3/3
1875/1875 [==============================] - 7s 4ms/step - loss: 0.2018 - accuracy: 0.9536 - val_loss: 0.2188 - val_accuracy: 0.9533
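The figure 1875 in the log is the number of batches processed per epoch. The training cell is not shown here, so the batch size is an assumption: the arithmetic below uses 32, the Keras default when fit is called without a batch_size argument, which is consistent with the log.

```python
import math

num_train_images = 60000
batch_size = 32  # assumed: Keras default for fit() when batch_size is not given

# Each step processes one batch; the last batch may be smaller, hence ceil.
steps_per_epoch = math.ceil(num_train_images / batch_size)
print(steps_per_epoch)  # 1875
```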
Question
What is the validation accuracy of our model after 3 epochs? Round your answer to two decimal places.
0.95, or 95%
Making Predictions
To see the actual predictions that our model has made for the test data, we can use the
predict (https://www.tensorflow.org/api_docs/python/tf/keras/Model) function.
For each input image, the predict function returns an array of 10 numbers. These numbers are the probabilities
that the input image represents each respective digit (0–9). The array index with the highest probability is
the model prediction. Each array sums to 1 (since its entries are probabilities over the ten classes).
In [12]: #predict all images in the test set, and look at first 4
pred_probs = model.predict(X_test)
pred_probs[:4]
Question
What is the predicted digit type of the last image shown above?
digit 0 (zero)
Finally, let’s compare this with the actual results. It's a little easier if we first convert our prediction
probabilities into the corresponding class label. We can use argmax for that, to find the label with the
maximum probability.
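A minimal sketch of that conversion, using a small made-up probability array instead of the real model output (the names toy_probs and true_labels, and their values, are hypothetical), could look like this:

```python
import numpy as np

# Hypothetical predicted probabilities for 4 images over 3 classes.
toy_probs = np.array([
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.2, 0.2, 0.6],
    [0.3, 0.4, 0.3],
])
true_labels = np.array([1, 0, 2, 0])  # hypothetical ground truth

# axis=1 picks the most probable class separately for each image (row).
pred_labels = np.argmax(toy_probs, axis=1)

# Fraction of images whose predicted label matches the true label.
accuracy = np.mean(pred_labels == true_labels)
print(pred_labels, accuracy)
```

The same call with axis=1 applies to the (10000, 10) array returned by predict, giving one predicted digit per test image.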