CHAPTER 2
A Closer Look
at TensorFlow
In the previous chapter, you saw the capabilities of the TensorFlow
platform. Having had a glimpse of TensorFlow’s power, it is now time
to learn how to harness it in your own real-world applications.
We will start with a trivial application that teaches you the basics
of developing a simple ML application.
A Trivial Machine Learning Application
To get you started on TensorFlow coding, we will begin with a trivial
Hello World kind of application. In this application, you will develop
a machine learning model that makes predictions using statistical
regression techniques.
We will use a fixed set of data points generated within the program
code itself. Our data will consist of (x, y) coordinate values.
We compute a value called z that has a linear relationship with x and
y. For example, the value of z for given x and y values may be computed
using the following mathematical equation:
z = 7 * x + 6 * y + 5
© Poornachandra Sarang 2021
P. Sarang, Artificial Neural Networks with TensorFlow 2,
https://doi.org/10.1007/978-1-4842-6150-7_2
Our task is to make the machine learn on its own to find the best
fit for the preceding relationship, given a sufficiently large number
of x and y values and the corresponding target z values. Once the
model is trained, we will use it to predict z for any unseen x
and y values. For example, given x equals 2 and y equals 3, the model
should predict an output z equal to 37, as 7 * 2 + 6 * 3 + 5 = 37. If it
predicts z correctly for every previously unseen x and y, we say that
the model is trained with 100% accuracy. In practice, a model almost
never predicts with 100% accuracy. So we optimize the model
performance to get as close to this ideal as possible.
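As a quick check of the arithmetic above, the target relationship can be written as a small Python function. The function name target_z is only an illustrative choice and is not part of the application code that follows.

```python
def target_z(x, y):
    """Return z for the linear relationship z = 7*x + 6*y + 5."""
    return 7 * x + 6 * y + 5

# The example from the text: x = 2 and y = 3
print(target_z(2, 3))
```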
As you can see from the preceding discussion, the problem that we
are trying to solve is a classical linear regression case study. To keep things
simple, we will create a single-layer network consisting of only one neuron,
which is trained to solve a linear regression problem. In practice, your
network will usually consist of multiple layers with multiple nodes. In this
trivial application, I will avoid the use of such deep networks, as defining
them requires a deeper understanding of the Keras API. You will be exposed
to those Keras APIs later in this chapter. For this trivial application and
all subsequent applications in this book, you will use Google Colab for
development.
Creating Colab Notebook
In this first application, I will guide you through the entire process of
creating, training, and testing an ML model in Colab and using it for
inference. The explanation is somewhat detailed for the benefit of
readers who are new to ML development.
Start Google Colab in your browser by typing the following URL:
http://colab.research.google.com
You will see the screen shown in Figure 2-1.
Figure 2-1. Creating a new Colab notebook
Select the NEW PYTHON3 NOTEBOOK option to open a new Python 3
notebook. Assuming that you are logged in to your Google account, you
will see a screen as shown in Figure 2-2.
Figure 2-2. New Colab notebook
The default name for the notebook starts with Untitledxxx.ipynb.
Change the name to Hello World or whatever you prefer. Next, you will
write code to import TensorFlow libraries in your Python code.
Imports
Our trivial program will require three imports – TensorFlow 2.x, numpy
library for handling our data, and matplotlib to do some charting.
Importing TensorFlow 2.x
To import TensorFlow in your Python notebook, you would use the
following program statement:
import tensorflow as tf
This imports the default version, which is currently 1.x (at the time of
this writing). The output of executing the preceding command is shown in
Figure 2-3.
Figure 2-3. Default TensorFlow library import
As this book is based on TensorFlow 2.x, we need to import it explicitly.
To do so, you must run a tensorflow_version magic. Magic is a feature of
Colab and is run using the following statement:
%tensorflow_version 2.x
Figure 2-4. Loading TensorFlow 2.x
When you run the code, TensorFlow 2.x will be selected. The output is
shown in Figure 2-4.
After the TensorFlow 2.x is selected, you would import TensorFlow
libraries using the traditional import statement as follows:
import tensorflow as tf
Note The magic is no longer required in the current version of
Colab, which loads TensorFlow 2.x by default.
Keras library is now part of TensorFlow. To use Keras in our
application, we need to import it from TensorFlow. This is done using the
following import statement:
from tensorflow import keras
To use Keras modules, you now use tf.keras syntax. Next, you will
import other required libraries.
Importing numpy
NumPy is a library for supporting large, multidimensional arrays in
Python. It has a collection of high-level mathematical functions to operate
on these arrays. Any machine learning model development relies heavily
on the use of arrays. You will be using numpy arrays to store the input data
required by our network.
To import numpy, you use the following import statement:
import numpy as np
Importing matplotlib
Matplotlib is a Python library for creating quality 2D plots. You will
use this library in our project for plotting the loss and error metrics.
To import matplotlib, you use the following statement:
import matplotlib.pyplot as plt
This completes our imports for the application. Next, you will be
creating data for our application.
Setting Up Data
We will create a set of 100 data points consisting of x and y coordinates.
The count for data points is declared in the Python variable using the
statement:
number_of_datapoints = 100
To generate x and y coordinates, you use the random module in
numpy. To generate x values, you use the following program statement:
# generate random x values in the range -5 to +5
x = np.random.uniform(low = -5 , high = 5 ,
size = (number_of_datapoints, 1))
The low and high parameters in the uniform function define the lower
and upper bounds for the random number generator. The size parameter
specifies the dimensions of the array, that is, how many values are to be
generated. The return value of the preceding program statement is an
array consisting of 100 rows and 1 column. You can print the first five
values of the generated array using the statement:
x[:5,:].round(2)
In the output, each value is rounded to two decimal digits by calling
the round function. A sample output of the execution of this statement is
shown here:
array([[ 4.57],
[-0.68],
[ 2.64],
[-3.17],
[-4.86]])
Note that the output varies on every run. Likewise, the y values are
generated using a similar statement as shown here:
y = np.random.uniform(-5 , 5 , size = (number_of_datapoints , 1))
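Because the values are drawn at random, every run produces different data. If you prefer repeatable results while experimenting, one option is to draw the values from a seeded NumPy generator. The sketch below is an alternative to the np.random.uniform calls shown above and is not part of the original program; the seed value 42 and the name rng are arbitrary choices.

```python
import numpy as np

number_of_datapoints = 100

# A generator created with a fixed seed yields the same values on every run.
rng = np.random.default_rng(42)
x = rng.uniform(low=-5, high=5, size=(number_of_datapoints, 1))
y = rng.uniform(low=-5, high=5, size=(number_of_datapoints, 1))

print(x.shape, y.shape)
```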
We define the target z in terms of x and y using a linear equation:
z = 7 * x + 6 * y + 5
In machine learning terms, x and y are the features and z is the
label. Once our model is trained, we will ask it to predict z for
a given x and y. As mentioned earlier, we will train our network to
discover the relationship between these features and the label. To make
the task more realistic, we introduce some noise in every value of z
that we compute using the preceding equation. You generate the noise
using the uniform function as earlier, with values ranging from -1 to +1,
using the following statement:
noise = np.random.uniform(low =-1 , high =1,
size = (number_of_datapoints, 1))
Now, you create the z array using the linear equation and adding noise
to it as shown in this statement:
z = 7 * x + 6 * y + 5 + noise
The input to our neural network is a two-dimensional array of 100
rows, where each row holds the x and y values of one data point as its
two columns. To create the required input data format, you use the
column_stack function as follows:
input = np.column_stack((x,y))
Printing the first five values of the input array produces the following
output:
array([[-1.9 , 2.91],
[-2.14, -0.81],
[ 4.18, 1.79],
[-0.93, -4.41],
[-1.8 , -1.31]])
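To confirm the layout of the stacked array, you can inspect its shape. The following sketch uses two tiny hand-written columns (illustrative values only) instead of the random data. The name features is used here simply to avoid hiding Python's built-in input function.

```python
import numpy as np

# Two column vectors with three rows each (illustrative values)
x = np.array([[1.0], [2.0], [3.0]])
y = np.array([[10.0], [20.0], [30.0]])

# column_stack places the columns side by side: one row per data point
features = np.column_stack((x, y))
print(features.shape)
print(features)
```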
At this point, you are ready with the data for training the network. Our
next task is to create the network itself.
Defining Neural Network
As mentioned earlier, our neural network will consist of a single neuron
that accepts a one-dimensional vector holding the two input values and
outputs a single value. The network is depicted in Figure 2-5.
Figure 2-5. A single layer/single node network
To define network models, Keras provides a Sequential API. Using
this API, you can construct sophisticated multilayer network
architectures. For our current requirement, we need an ANN
architecture consisting of a single layer with a single neuron. You define
the model using the following statement:
model = tf.keras.Sequential([keras.layers.Dense(units=1,
                                                input_shape=[2])])
The units parameter defines the dimensionality of the output space.
Here, by specifying a value of 1, you define a single-layer network with a
single neuron outputting a single value. The input_shape parameter
specifies that each input sample consists of two values, x and y. The
Dense layer takes several parameters that allow you to create complex
ANN architectures. You will create many complex architectures
throughout this book using the keras.Sequential API.
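Conceptually, a Dense layer with a single unit and no activation function computes a weighted sum of its inputs plus a bias. The following plain-Python sketch illustrates that computation. It is not how Keras implements the layer, and the weights shown are the ideal values from our equation rather than values learned by the model.

```python
def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias (no activation function)."""
    return sum(w * v for w, v in zip(weights, inputs)) + bias

# With the ideal weights (7, 6) and bias 5, the neuron reproduces the equation.
print(neuron_output(inputs=(2, 3), weights=(7, 6), bias=5))
```

Training, described next, is the process of finding weights and a bias close to these ideal values from the data alone.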
After the model is defined, we need to compile it and make it ready for
training on our dataset.
Compiling Model
To train a model, we first need to define a learning process. The model
compilation is a way of setting up its learning process. The learning
process itself consists of a few components:
• Objective loss function
• Optimizer
• Metrics
Firstly, it uses some loss function to determine how far the inference
is from the target value. The model tries to minimize the loss during its
training. Keras provides several predefined loss functions, to name a few,
categorical_crossentropy, mean_squared_error, huber_loss, and poisson.
Secondly, to help in reducing the losses, we use an optimizer. An optimizer
is an algorithm or a method used to change the attributes of a neural
network. The attributes are the weights and the learning rate. By changing
these attributes, we try to minimize the losses. Keras provides several
predefined optimizers, again to name a few, SGD (stochastic gradient
descent), RMSprop, Adagrad, and Adam. Lastly, we define a metric
function that is used to judge the performance of the model – to name a
few predefined metric functions, MSE (mean squared error), RMSE (root
mean squared error), MAE (mean absolute error), and MAPE (mean
absolute percentage error).
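To make the loss and metric definitions concrete, the following sketch computes the mean squared error and the mean absolute error by hand for a handful of made-up values. The function names and the sample numbers are illustrative; Keras provides its own implementations of these functions.

```python
def mean_squared_error(targets, predictions):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)

def mean_absolute_error(targets, predictions):
    """Average of the absolute differences between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(targets, predictions)) / len(targets)

targets = [37.0, 5.0, 22.0]
predictions = [36.0, 5.0, 24.0]
print(mean_squared_error(targets, predictions))
print(mean_absolute_error(targets, predictions))
```

Squaring penalizes the larger error (2) more heavily than the smaller one (1), which is why the two measures differ.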
So we set up the learning process by calling the compile function
on the model. The compile function takes the preceding three components
as its arguments. The following code segment illustrates the use of compile:
model.compile(optimizer = 'sgd' ,
loss = 'mean_squared_error' ,
metrics = ['mse'] )
Here we specify the stochastic gradient descent as the optimizer, the
mean squared error as the loss function, and again the mean squared error
as the metrics.
Now, as the model knows the learning process, it is time to feed some
training data to it.
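To see how an optimizer and a loss function work together, consider the following plain-Python sketch of gradient descent minimizing the mean squared error for our one-neuron model. It is a simplified illustration under stated assumptions and not the Keras implementation: it updates the weights once per pass over all the data, starts from zero weights, uses noise-free targets, and the learning rate of 0.05 and the 200 epochs are arbitrary choices. Keras, in contrast, updates the weights on mini-batches of the data.

```python
import random

def train_linear_neuron(samples, targets, epochs=200, learning_rate=0.05):
    """Gradient descent on the mean squared error for z = w1*x + w2*y + b."""
    w1, w2, b = 0.0, 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w1 = grad_w2 = grad_b = 0.0
        for (x, y), z in zip(samples, targets):
            error = (w1 * x + w2 * y + b) - z
            # Partial derivatives of the mean squared error
            grad_w1 += 2 * error * x / n
            grad_w2 += 2 * error * y / n
            grad_b += 2 * error / n
        # Move each parameter a small step against its gradient
        w1 -= learning_rate * grad_w1
        w2 -= learning_rate * grad_w2
        b -= learning_rate * grad_b
    return w1, w2, b

rng = random.Random(0)
samples = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(100)]
targets = [7 * x + 6 * y + 5 for x, y in samples]

w1, w2, b = train_linear_neuron(samples, targets)
print(round(w1, 2), round(w2, 2), round(b, 2))
```

The learned values approach the coefficients 7, 6, and 5 of the equation used to generate the data.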
Training Network
The model is trained over several iterations. Initially, the weights of
the various nodes in the network are set to some starting values. After
the first training iteration, we look at the loss and then adjust these
weights so that the loss is lower in the second iteration. The process
continues through several iterations; each full pass over the training
data is called an epoch. At the end of each epoch, we record and monitor
the loss to ensure that we are optimizing the network in the right
direction. To record these values, we create a History object
in Keras, which is done using the following code segment:
from tensorflow.keras.callbacks import History
history = History()
We will pass the preceding history object to the model training method
as a parameter. The training itself is done by calling the model’s fit method
as shown here:
model.fit(input, z , epochs = 15 , verbose = 1,
validation_split = 0.2, callbacks = [history])
The first parameter specifies the stacked input that we have created
earlier. The second parameter specifies the target values. The epochs
parameter defines the number of iterations. The verbose parameter
specifies if you want to observe the training progress. The validation_split
parameter specifies that 20% of the given data would be used for validating
the trained model. Lastly, the callbacks parameter specifies where the
intermediate monitoring data would be stored. It specifies the callback
function which is called at the end of each epoch. The partial output
during training is shown in Figure 2-6.
Figure 2-6. Output during model training
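With 100 data points and a validation_split of 0.2, the model trains on 80 samples and validates on the remaining 20. Keras takes the validation samples from the end of the arrays passed to fit, before any shuffling. The sketch below mimics that slicing on a plain list; the helper name split_for_validation is hypothetical.

```python
def split_for_validation(samples, validation_split):
    """Hold out the last fraction of the samples for validation."""
    n_validation = round(len(samples) * validation_split)
    split_at = len(samples) - n_validation
    return samples[:split_at], samples[split_at:]

samples = list(range(100))
train, validation = split_for_validation(samples, 0.2)
print(len(train), len(validation))
```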
Once all epochs are completed, the model is supposedly trained. We
need to verify that the model is indeed trained to meet our needs. To do so,
we will examine the training output by observing the metrics that we have
asked the model to create during its learning.
Examining Training Output
We have asked the model to save the status at each epoch in a history
variable. We can examine what is recorded in history with the help of the
following statement:
print(history.history.keys())
The output of this print statement would be
dict_keys(['loss', 'mse', 'val_loss', 'val_mse'])
We see that at each epoch the model has saved the loss and the mse
(mean squared error) on both the training and the validation data. The
validation loss and mse are indicated with the prefix val_. We will now
create a plot for both loss and mse. To plot the loss on both the training
and validation data, use the following code segment:
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper right')
plt.show()
Here, we plot the loss and val_loss key values from the recorded
history. The output is shown in Figure 2-7.
Figure 2-7. The loss vs. epoch
We observe from Figure 2-7 that the loss is minimized quite early, by the
end of the third epoch, and the model is fully trained by the end of the 15
epochs that we have specified in the fit method. We will also plot the mean
squared error to further verify this claim. Because our loss function is
also the mean squared error, this plot mirrors the loss plot. To plot the
mse, use the following code segment:
plt.plot(history.history['mse'])
plt.plot(history.history['val_mse'])
plt.title('mean squared error')
plt.ylabel('mse')
plt.xlabel('epochs')
plt.legend(['train' , 'validation'] , loc = 'upper right')
plt.show()
The output of the execution of the preceding code is shown in Figure 2-8.
Figure 2-8. mse vs. epoch
Lastly, we will also plot the predicted output vs. the real output using
the following code segment:
plt.plot(np.squeeze(model.predict_on_batch(input)),
np.squeeze(z))
plt.xlabel('predicted output')
plt.ylabel('real output')
plt.show()
The program output is shown in Figure 2-9.
Figure 2-9. Predicted vs. real output values
As you observe in Figure 2-9, the output predicted by the model is very
close to the expected output. So, we can reasonably assume that the model
is well trained. To confirm this, we need to test the model’s predictions on
unseen data, which is done next.
Predicting
To make a prediction on unseen x and y values, you use the predict
function on the trained model. This is shown in the following program
statement:
print("Predicted z for x=2, y=3 ---> ",
model.predict([[2,3]]).round(2))
Here, we specify x equals 2 and y equals 3. The result is rounded to
two decimal digits. When you execute the preceding print statement, you
would see the following output:
Predicted z for x=2, y=3 ---> [[36.99]]
Now, let us check whether the prediction is close enough to the
expected output. To see the expected output, run the following code:
# Checking from equation
# z = 7*x + 6*y + 5
print("Expected output: ", 7*2 + 6*3 + 5)
The execution prints 37 on the screen. Our model’s prediction is 36.99,
which is close enough to the expected value. Note that if you run the
code, the predicted output will vary on each run because the training data
and the initial weights are generated randomly. You can test the model’s
predictions with a few more x and y values to satisfy yourself on the
model’s training.
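When you try further inputs, it helps to compute the expected values from the equation first. The sketch below does this for three sample points; the points other than (2, 3) are arbitrary choices. You can then pass the same points to the predict function and compare the results.

```python
# Sample (x, y) points; only (2, 3) comes from the text above
test_points = [(2, 3), (0, 0), (-1, 4)]

# Expected z from the noise-free equation z = 7*x + 6*y + 5
expected = [7 * x + 6 * y + 5 for x, y in test_points]
for point, z in zip(test_points, expected):
    print(point, "->", z)
```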
Full Source Code
The full source code for our trivial Hello World application described
earlier is given in Listing 2-1 for your quick reference.
Listing 2-1. A trivial linear regression application source
# Load TensorFlow 2.x in a Colab project.
%tensorflow_version 2.x
# Import required libraries
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
# Set up data
number_of_datapoints = 100
# generate random x values in the range -5 to +5
x = np.random.uniform(low = -5 , high = 5 ,
                      size = (number_of_datapoints, 1))
# generate random y values in the range -5 to +5
y = np.random.uniform(-5 , 5 , size = (number_of_datapoints , 1))
# generate some random error in the range -1 to +1
noise = np.random.uniform(low =-1 , high =1,
                          size = (number_of_datapoints, 1))
z = 7 * x + 6 * y + 5 + noise
# Print x, y and z sample values for manual verification
x[:5,:].round(2)
y[:2,:].round(2)
z[:2,:].round(2)
# Stack x and y arrays for inputting to neural network
input = np.column_stack((x,y))
# Print few values of input array for demonstration purpose.
input[:2,:].round(2)
# Create a Keras sequential model consisting of a single layer
# with a single neuron.
model = tf.keras.Sequential([tf.keras.layers.Dense(units=1)])
# Compile the model with the specified optimizer, loss function
# and error metrics.
model.compile(optimizer = 'sgd' , loss = 'mean_squared_error' ,
metrics = ['mse'] )
# Import the History callback to record the loss and metrics on each
# epoch during training
from tensorflow.keras.callbacks import History
history = History()
model.fit(input, z , epochs = 15 , verbose = 1,
          validation_split = 0.2, callbacks = [history])
# Print keys in the history just to know their names. These
# will be used for plotting the metrics.
print(history.history.keys())
# Plot the loss metric on both training and validation
# datasets.
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper right')
plt.show()
# Plot the mean squared error on both training and validation
# datasets.
plt.plot(history.history['mse'])
plt.plot(history.history['val_mse'])
plt.title('mean squared error')
plt.ylabel('mse')
plt.xlabel('epochs')
plt.legend(['train' , 'validation'] , loc = 'upper right')
plt.show()
plt.plot(np.squeeze(model.predict_on_batch(input)),
np.squeeze(z))
plt.xlabel('predicted output')
plt.ylabel('real output')
plt.show()
print("Predicted z for x=2, y=3 ---> ",
      model.predict([[2,3]]).round(2))
# Checking from equation
# z = 7*x + 6*y + 5
print("Expected output: ", 7*2 + 6*3 + 5)
If you have run this trivial project and are getting the preceding output,
congratulations! Your setup for deep learning with TensorFlow 2.x is
now complete. You will now dive deep into real-world machine learning
development. In the next section, you will learn a real-world machine
learning development life cycle. You will use a real dataset, learn how
to preprocess it and make it ready for feeding into a neural network, define
a multilayer deep neural network, train it, test the model, and plot the
accuracy metrics to refine the model. You will also learn how to
use TensorBoard in the Colab environment to visualize the metrics for
analyzing the model training.
So let us start on this real-world project based on TensorFlow.
Binary Classification in TensorFlow
In the previous example, we used a synthetic dataset generated within
the program for training our model. Now, we will use a realistic dataset to
do some real-world machine learning. You will be solving a classification
problem. We will use a dataset from the popular Kaggle site used in one of
their challenges. The dataset contains the customer data of a bank.
Consider now that the bank has approached you to develop an ML
prediction model which provides them some insights on the likelihood of
a customer leaving the bank. In business terms, this is called churning.
Knowing that a certain customer may leave the bank in the near future,
the bank can take some preventive measures to retain the customer.
The problem that you are trying to solve is to develop a binary
classification model. You will use TensorFlow’s deep learning library and
will make use of Keras high-level API for implementing the model. More
specifically, you will learn the following in this example:
• How to load CSV data from a local or a remote server
• How to preprocess it and make it ready for the ML algorithm
• How to define a multilayer ANN using TensorFlow’s high-level Keras API
• How to train the model
• How to evaluate the model’s performance on test data
• How to visualize the results on TensorBoard
• How to do performance analysis
• How to infer on unseen data
So let us start with the project development.
Setting Up Project
Create a new Colab project and name it Binary Classification. The project
uses the bank customer data (Churn_Modelling.csv) which is available
in the book’s source download. Copy the downloaded file to your Google
Drive in a folder of your choice. Note that you will later need to set up
the file path appropriately in your program code. If you do not wish to
download the data file, you will still be able to run the project by taking
the data from the GitHub repository for this book. You are now ready to
code your project.
Imports
As in the earlier example, include the following imports in the code window:
%tensorflow_version 2.x
import tensorflow as tf
from tensorflow import keras
You will use pandas dataframes for loading the external data, the
sklearn library for preprocessing the data and creating the
training/validation datasets, and matplotlib for some charting.
To use these libraries, include the following imports in your project code:
#loading data
import pandas as pd
#scaling feature values
from sklearn.preprocessing import StandardScaler
#encoding target values
from sklearn.preprocessing import LabelEncoder
#shuffling data
from sklearn.utils import shuffle
#splitting the dataset into training and validation
from sklearn.model_selection import train_test_split
#plotting curves
import matplotlib.pyplot as plt
Next, you need to mount the drive in your program so that the program
can access the documents stored in Google Drive.
Mounting Google Drive
To mount Google Drive, type the following code in a new code cell:
from google.colab import drive
drive.mount('/content/drive')
When you run this code, it will ask you to enter the authorization code
to access your drive. You will see the screen shown in Figure 2-10.
Figure 2-10. Authorization code for Google Drive
Click the provided link. You will be asked to sign in to your Google
account. You will see the authorization code similar to that shown in
Figure 2-11.
Figure 2-11. Google sign-in authorization
Click the icon next to the authorization code to copy it to the clipboard.
Paste the code in the earlier shown authorization window. After a
successful authorization, you will see this message on your screen:
Mounted at /content/drive
Now, you are ready to access the contents of your drive through your
program code.
Loading Data
To load the data, enter the following program code in a new code cell and
execute it:
data = pd.read_csv('/content/drive/<path to downloaded CSV>/Churn_Modelling.csv')
Note you will need to set up the appropriate path to your csv file.
If you have decided to use the data from the book’s GitHub, use the
following code instead of the preceding code segment:
data_url = 'https://raw.githubusercontent.com/Apress/artificial-neural-networks-with-tensorflow-2/main/ch02/Churn_Modelling.csv'
data=pd.read_csv(data_url)
The read_csv function loads the data from the specified file and copies
it in a pandas dataframe.
Shuffling Data
Data collected in the field may be in a specific order chosen for the
convenience of the data collector. For better machine learning, you
should randomize the data so that the learning does not pick up
undesired patterns arising from this order. So, we shuffle the data using
the following statement:
data=shuffle(data)
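If you want the same shuffled order on every run, you can fix the random seed; sklearn’s shuffle function accepts a random_state argument for this purpose. The idea is illustrated below with Python’s standard random module on a small list of made-up row numbers, not with the bank data. The helper name shuffled_copy is hypothetical.

```python
import random

rows = list(range(1, 11))  # stand-in for ten row numbers

def shuffled_copy(items, seed):
    """Return a shuffled copy of items; the same seed gives the same order."""
    result = list(items)
    random.Random(seed).shuffle(result)
    return result

first = shuffled_copy(rows, seed=0)
second = shuffled_copy(rows, seed=0)
print(first == second)        # compares two runs with the same seed
print(sorted(first) == rows)  # checks that no row was lost or duplicated
```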
Examining Data
You can verify that the data is correctly loaded by printing the contents of
the data dataframe. Instead of printing the top rows by calling data.head(),
I have printed the full dataset so that you will know the number of records
and columns present in it. This is shown in Figure 2-12.
Figure 2-12. Dataset
There are 10,000 rows and 14 columns in the dataset. A brief
explanation of the various fields is given here:
• RowNumber – Numbers from 1 to 10,000.
• CustomerId – A unique identification for the customer.
• Surname – Customer’s last name.
• CreditScore – Customer’s credit score.
• Geography – Customer’s country.
• Gender – Male or female.
• Age – Customer’s age.
• Tenure – How long the customer has been banking with the bank.
• Balance – Customer’s bank balance.
• NumOfProducts – The count of bank products the customer is currently using.
• HasCrCard – Does the customer hold a credit card?
• IsActiveMember – Is the customer currently active?
• EstimatedSalary – Customer’s current estimated salary.
• Exited – A value of 1 indicates that the customer has left the bank.
Now, as you have loaded the data in memory, your next task is to
cleanse it before feeding it to our network. We call it data preprocessing,
which is discussed next.
Data Preprocessing
Raw real-world data may not always meet the training requirements of an
artificial neural network (ANN). Specifically, you will check the data and
process it for the following issues:
• Data may contain null values.
• All fields in the database may not be useful for learning.
• The numerical fields may exhibit large variance in their values and thus must be scaled to the same level.
• Certain fields may contain categorical values, for example, male and female; these need to be encoded to 0s and 1s.
• Finally, you will need to decide which fields are to be used as features and what the labels are.
So, let us start processing the data.
Checking Nulls
If the data contains null values, it will severely affect the network training.
The easiest way to check for a null value is to call the isnull function. This is
done using the following program statement:
data.isnull().sum()
The output of the preceding statement gives the following result:
RowNumber          0
CustomerId         0
Surname            0
CreditScore        0
Geography          0
Gender             0
Age                0
Tenure             0
Balance            0
NumOfProducts      0
HasCrCard          0
IsActiveMember     0
EstimatedSalary    0
Exited             0
dtype: int64
The isnull function marks every null entry, and sum counts these entries
for each column. As every count is 0, our dataset clearly does not contain
any null values, so there is no need to remove any rows from the dataset.
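The per-column count reported by isnull().sum() can be pictured with a small pure-Python sketch. The records below are made up and, unlike our dataset, deliberately contain missing values (None). The helper name count_missing is hypothetical.

```python
records = [
    {"CreditScore": 600, "Age": 40, "Balance": 1000.0},
    {"CreditScore": None, "Age": 35, "Balance": 2500.0},
    {"CreditScore": 520, "Age": None, "Balance": None},
    {"CreditScore": None, "Age": 50, "Balance": 0.0},
]

def count_missing(rows):
    """Count the missing (None) entries in each column."""
    counts = {column: 0 for column in rows[0]}
    for row in rows:
        for column, value in row.items():
            if value is None:
                counts[column] += 1
    return counts

print(count_missing(records))
```

Had our dataset produced nonzero counts like these, the affected rows would need to be removed or filled in before training.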
Now comes the major task of selecting fields for machine learning.
Selecting Features and Labels
Not all the fields in our dataset are useful for training the
algorithm. For example, fields such as CustomerId and Surname carry
no information that helps in predicting churn. So, we need to drop these
columns. This is done using the following statement:
X = data.drop(labels=['CustomerId', 'Surname',
'RowNumber', 'Exited'], axis = 1)
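Note that we dropped four fields from the dataset. RowNumber,
CustomerId, and Surname are identifiers of no use for learning, while
Exited is the label and therefore must not be a part of the features. X is a
new dataframe containing the fields which are relevant to our model building.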
Whether the customer exited the bank marks the output of our model.
Thus, the field Exited becomes the label for our model building. This is
extracted into the variable y using the following statement:
y = data['Exited']
At this point, you are ready with the features (X) and the labels (y).
Encoding Categorical Columns
You must check if any of the selected columns have categorical values.
For this, we will check the data types of all the selected columns using the
following statement:
X.dtypes
The output of the preceding statement is shown as follows:
CreditScore          int64
Geography           object
Gender              object
Age                  int64
Tenure               int64
Balance            float64
NumOfProducts        int64
HasCrCard            int64
IsActiveMember       int64
EstimatedSalary    float64
dtype: object
Note that Geography and Gender are object types. You can check the
values they contain by printing the first five rows of the features tensor as
shown in Figure 2-13.
Figure 2-13. Top five rows of the features vector
Gender takes two categorical values, Male and Female, while
Geography has three categorical values: Germany, Spain, and France.
You will need to convert these into numerical values before feeding them to
the network. The encoding is done by using the LabelEncoder in sklearn’s
preprocessing module. This is shown in the following code snippet:
from sklearn.preprocessing import LabelEncoder
label = LabelEncoder()
X['Geography'] = label.fit_transform(X['Geography'])
X['Gender'] = label.fit_transform(X['Gender'])
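The label encoder replaces each category with an integer. For Gender,
which has only two values, this is sufficient. For Geography, however,
the integers 0, 1, and 2 would suggest an ordering among the countries
that does not exist. We therefore apply one-hot encoding to this column.
One-hot encoding creates dummy variables for a categorical
column. As the Geography column has three distinct values, it creates
three variables, one for each country. Thus, we would have three features
pertaining to the country in our training dataset. Having many features
increases the training time. To reduce the number of features, you can
exclude one of the dummy variables pertaining to the country field and yet
retain the same information, as the excluded country is implied when the
other two variables are both 0. We drop the first variable by passing
drop_first=True to the get_dummies function: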
X = pd.get_dummies(X, drop_first=True, columns=['Geography'])
If you examine the data at this point by printing the top five rows, you
will notice that there are only two columns for Geography – Geography_1
and Geography_2.
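The effect of the two encoding steps on the Geography column can be traced by hand. LabelEncoder numbers the categories in sorted order, so France becomes 0, Germany 1, and Spain 2. With drop_first=True, the dummy variable for 0 is dropped and each country is described by the remaining two columns. The following plain-Python sketch mimics this on four made-up entries; it does not call sklearn or pandas.

```python
countries = ["France", "Spain", "Germany", "France"]

# Label encoding: categories are numbered in sorted order
classes = sorted(set(countries))
codes = [classes.index(c) for c in countries]

# One-hot encoding with the first category dropped:
# one column per remaining code (Geography_1, Geography_2)
kept_codes = range(1, len(classes))
dummies = [[int(code == k) for k in kept_codes] for code in codes]

for country, code, row in zip(countries, codes, dummies):
    print(country, code, row)
```

France is represented by zeros in both remaining columns, which is why dropping its dummy variable loses no information.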
One more thing that we need to do before feeding data to the network
is to bring all the numerical values to the same scale.
Scaling Numerical Values
As the features in the real data can have a wide range of data values,
machine learning would work better if we standardized all these data
points to the same scale. Ideally, the mean for each column should be
0, and the standard deviation should be 1 for better results on machine
learning. So we transform all our data points using the equation:
z = (x - mu) / s
where mu is the mean and s is the standard deviation of the column. This
standardization is performed by using the StandardScaler class of sklearn
as shown in this code snippet:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X = scaler.fit_transform(X)
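The following sketch applies the same formula to a single made-up column using Python’s statistics module. StandardScaler uses the population standard deviation, which corresponds to statistics.pstdev here. After the transformation, the column has a mean of 0 and a standard deviation of 1.

```python
import statistics

ages = [25, 35, 45, 55, 65]  # made-up values for one column

mu = statistics.mean(ages)
s = statistics.pstdev(ages)  # population standard deviation

# Apply z = (x - mu) / s to every value in the column
scaled = [(x - mu) / s for x in ages]
print([round(v, 2) for v in scaled])
```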
At this point, our preprocessing of data is completed. We are now
ready to define and train the model. When we train the model, we also
need to validate our training. If the results of the training do not meet our
expectations, we will need to further preprocess the data – like adjusting
the number of features and so on. For testing, we will reserve a part of the
data. Thus, we split our entire preprocessed data – a larger portion used for
training and the smaller portion for testing the trained model.
Creating Training and Testing Datasets
To split the data into two portions, we use the train_test_split function
of sklearn as in the following program statement:
# Split dataset into training and testing
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size = 0.3)
The test_size parameter determines what fraction of the data is reserved
for testing; a value of 0.3 reserves 30%. The function returns four
datasets: the features and the labels for training and for testing.
The terms validation dataset and test dataset are often used
interchangeably. To avoid confusion, here are the widely accepted
definitions of these terms:
• Training dataset – The part of the data that is used for model fitting.
• Validation dataset – The part of the data used for tuning
hyperparameters during training.
• Test dataset – The part of the data used for evaluating a model’s
performance after its training.
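The chapter itself makes a two-way split and reuses the test portion for validation. If you want three separate datasets as defined above, one possible approach is to call train_test_split twice. The sketch below assumes hypothetical random data and illustrative split fractions; the variable names are not from the original program.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 1000 samples with 11 features each
X_all = np.random.rand(1000, 11)
y_all = np.random.randint(0, 2, size=1000)

# First carve out the test set (25% of all data)
X_rest, X_te, y_rest, y_te = train_test_split(
    X_all, y_all, test_size=0.25, random_state=0)

# Then split the remainder into training and validation sets
X_tr, X_val, y_tr, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0)

print(len(X_tr), len(X_val), len(X_te))
```

With this arrangement, the validation set guides tuning during training, and the test set is touched only once for the final evaluation.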
Now, it is time to define our network.
Defining ANN
After preprocessing, we have 11 features in our dataset. The number of
features is determined by computing the shape of the training dataset with
the following statement:
X_train.shape[1]
The expected output of the network is a binary value indicating the
likelihood of the customer leaving the bank. The target values are specified
in the y_train vector.
You will create a four-layer deep learning network model. In the first
layer, you will use 128 nodes, the second one will have 64, the third one will
have 32, and the fourth will be a single output node. To create the network,
you use tf.keras API, which is a new standard in TensorFlow. You use
Sequential API to create a linear stack of layers. You instantiate the model
using the following statement:
model = keras.models.Sequential()
You add the first layer to the stack consisting of 128 nodes using the
following statement:
model.add(keras.layers.Dense(128, activation = 'relu',
input_dim = X_train.shape[1]))
The input dimension to this layer is set in the parameter input_dim,
which is the number of features given by the shape of the X_train array.
We use ReLU (rectified linear unit) as the activation function. The
activation function determines the output of a node from the weighted sum
of its inputs. ReLU is the most widely used activation function; it
outputs 0 for negative inputs and passes positive inputs through unchanged.
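For reference, ReLU is defined as max(0, x). A minimal NumPy sketch follows; the input values are hypothetical weighted sums chosen only to show both branches.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

weighted_sums = np.array([-2.0, -0.5, 0.0, 0.5, 3.0])
print(relu(weighted_sums))
```

Negative sums are clipped to 0, while positive sums keep their magnitude, so the node's output still carries information about how strongly it was activated.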
Likewise, you add the second layer to the network using the
following statement:
model.add(keras.layers.Dense(64, activation = 'relu'))
The input to this layer comes from the previous layer so there is no
need to specify the dimensions of the input vector. The third layer is added
using the statement:
model.add(keras.layers.Dense(32, activation = 'relu'))
Finally, the last layer in the network is added using the statement:
model.add(keras.layers.Dense(1, activation = 'sigmoid'))
We use sigmoid as the activation function here as this layer has to make
a binary decision. A sigmoid function is a type of activation function
that is also known as a squashing function. A squashing function limits
the output to a range between 0 and 1, making it suitable for predicting
probabilities. Here the output is the estimated probability that the
customer leaves the bank.
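The squashing behavior is easy to verify numerically. This sketch assumes a few hypothetical pre-activation values and also shows one common way to turn the resulting probability into a class label, namely thresholding at 0.5.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

logits = np.array([-4.0, 0.0, 4.0])     # hypothetical pre-activation values
probs = sigmoid(logits)
labels = (probs > 0.5).astype("int32")  # threshold at 0.5 for a class label

print(probs)
print(labels)
```

An input of 0 maps to exactly 0.5, large negative inputs approach 0, and large positive inputs approach 1.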
You print the network summary by calling the summary function
as follows:
model.summary()
The summary as printed on the screen is shown here:
Model: "sequential"
_______________________________________________________________
Layer (type)               Output Shape             Param #
===============================================================
dense (Dense)              (None, 128)              1536
_______________________________________________________________
dense_1 (Dense)            (None, 64)               8256
_______________________________________________________________
dense_2 (Dense)            (None, 32)               2080
_______________________________________________________________
dense_3 (Dense)            (None, 1)                33
===============================================================
Total params: 11,905
Trainable params: 11,905
Non-trainable params: 0
_______________________________________________________________
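The parameter counts in the summary can be verified by hand: a dense layer has one weight per input for every node, plus one bias per node. The helper below is a hypothetical illustration, not part of Keras.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return (n_inputs + 1) * n_units

layer_sizes = [11, 128, 64, 32, 1]   # 11 input features, then the four layers
counts = [dense_params(i, o) for i, o in zip(layer_sizes, layer_sizes[1:])]

print(counts)
print(sum(counts))
```

The four counts match the Param # column, and their sum matches the total of 11,905 trainable parameters.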
Compiling Model
After the model architecture is defined, it needs to be compiled. To
compile the model, you call the model’s compile method:
model.compile(loss = 'binary_crossentropy',
optimizer='adam', metrics=['accuracy'])
As the model that we are developing is a binary classifier, we use
binary_crossentropy as our loss function. We use the Adam optimizer for
training, as it adapts the learning rate for each weight and usually
works well without much tuning. Later on after the training, if you are
not satisfied with the model’s performance, you may experiment with other
optimizers. The metrics parameter tells Keras to record the accuracy
during training for later analysis.
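To make the loss function less abstract, here is a minimal NumPy sketch of binary cross-entropy. It is an illustration of the formula only, not the Keras implementation; the clipping constant eps and the sample probabilities are assumptions.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip to avoid log(0); eps is an illustrative choice
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    losses = -(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))
    return float(np.mean(losses))

good = binary_crossentropy([1, 0, 1], [0.9, 0.1, 0.8])
bad = binary_crossentropy([1, 0, 1], [0.2, 0.9, 0.3])
print(good, bad)
```

Confident predictions that agree with the labels give a small loss, while confident predictions that disagree are penalized heavily, which is what drives the weights toward useful probabilities.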
I will also show you how to use TensorBoard for analyzing the network
performance. For this, we need to define a callback function which will be
called at each epoch during training. We will be collecting the logs in the
log folder. To clear the earlier log, we use the following action:
!rm -rf ./log/
You define the callback function using the following code snippet:
#tensorboard visualization
import datetime, os
logdir = os.path.join("log",
datetime.datetime.now().
strftime("%Y%m%d-%H%M%S"))
tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir,
histogram_freq = 1)
With this setup for training analysis and the compilation of the model,
we are now ready to start the training.
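The timestamp in the log path gives every training run its own subfolder, so TensorBoard can show the runs side by side. The following sketch only builds and prints such a path; the names stamp and run_dir are illustrative.

```python
import datetime
import os

# Same pattern as the logdir above: log/<YYYYMMDD-HHMMSS>
stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
run_dir = os.path.join("log", stamp)
print(run_dir)
```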
Model Training
To train the model, you use the fit method on the model instance:
r = model.fit(X_train, y_train, batch_size = 32, epochs = 50,
validation_data = (X_test, y_test),
callbacks = [tensorboard_callback])
The first parameter to the fit function defines the features vector and
the second defines the labels. The batch_size parameter, as the name
suggests, defines the number of samples processed before each weight
update. The epochs parameter determines how many complete passes over the
training data are made. The test data that we set aside during data
preprocessing is used for model validation and is passed to the fit
function in the validation_data parameter. Lastly, the callbacks
parameter lists the callbacks that are invoked during training; the
TensorBoard callback writes its logs at the end of each epoch. The partial
output during training is shown in Figure 2-14.
Figure 2-14. Program output during training
Once the training is over, you can use the collected metrics to evaluate
if the model is trained to your desired accuracy.
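The relationship between batch size, epochs, and the number of weight updates can be worked out with simple arithmetic. The sketch below assumes 7,000 training rows, that is, 70% of a 10,000-row dataset, a size inferred from the 3,000 test predictions reported later in this chapter.

```python
import math

n_train = 7000      # assumed: 70% of a 10,000-row dataset
batch_size = 32
epochs = 50

steps_per_epoch = math.ceil(n_train / batch_size)  # last batch may be smaller
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, total_updates)
```

Under these assumptions, each epoch consists of 219 batches, and the full training run performs 10,950 weight updates.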
Performance Evaluation
To evaluate the performance, we will launch the TensorBoard in our Colab
environment using the %tensorboard magic. Before this, we need to load
the tensorboard using %load_ext magic.
%load_ext tensorboard
# Launch TensorBoard in Colab
%tensorboard --logdir log
Running this magic runs the TensorBoard, and you will see the
accuracy and loss metrics plotted on the screen. The accuracy and loss
metrics plot is shown in Figure 2-15.
Figure 2-15. Accuracy and loss metrics in TensorBoard
The two curves shown here are plotted on the training and the
validation data. Examining the accuracy and loss metrics helps you
determine whether the model is performing well. The plots show the
model’s accuracy and loss at every epoch. If the accuracy keeps improving
from epoch to epoch, your training is heading in the right direction.
Similarly, the loss should keep decreasing. These plots also make it easy
to spot issues such as overfitting, which shows up as a training loss
that keeps falling while the validation loss starts to rise. If you are
not satisfied with the performance, you may adjust the model’s
hyperparameters and retrain it. You may also try a different optimizer
and/or introduce regularization to improve the model accuracy.
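The same check can be made programmatically on the metrics that fit returns. The sketch below uses hypothetical loss values in place of the real r.history dictionary and applies a simple rule of thumb: the validation loss has passed its minimum while the training loss is still falling.

```python
# Hypothetical per-epoch losses; real values come from r.history
history = {
    "loss":     [0.60, 0.45, 0.38, 0.33, 0.29, 0.26],
    "val_loss": [0.58, 0.47, 0.42, 0.41, 0.43, 0.46],
}

val = history["val_loss"]
best_epoch = min(range(len(val)), key=val.__getitem__)  # 0-based index

# Validation loss rising after its minimum while training loss keeps
# falling is the usual sign of overfitting
overfitting = (
    best_epoch < len(val) - 1
    and val[-1] > val[best_epoch]
    and history["loss"][-1] < history["loss"][best_epoch]
)
print(best_epoch + 1, overfitting)
```

In this made-up history, the validation loss bottoms out at the fourth epoch, which suggests that training beyond that point only fits the training data more closely.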
You can also evaluate the model’s performance on the test data by
calling the evaluate method and passing the test features and labels
vectors as parameters. The program statement for evaluating the model
and its output is shown here:
test_scores = model.evaluate(X_test, y_test)
print('Test Loss: ', test_scores[0])
print('Test accuracy: ', test_scores[1] * 100)
Test Loss: 0.6143370634714762
Test accuracy: 83.96666646003723
The accuracy on the test data, which is about 84%, indicates that the
model correctly classified about 84% of the test data points.
I will also show you how to plot the performance charts on the
validation data using matplotlib, a more traditional way of performance
evaluation. Use the following code snippet to do so:
%matplotlib inline
import matplotlib.pyplot as plt #for plotting curves
plt.plot(r.history['val_accuracy'], label='val_acc')
plt.plot(r.history['val_loss'], label='val_loss')
plt.legend()
plt.show()
The plot generated by matplotlib is shown in Figure 2-16.
Figure 2-16. Accuracy/loss metrics on the validation data
Besides the accuracy and loss metrics, we also use the confusion
matrix quite often to evaluate the performance of a network, which is
discussed next.
Predicting on Test Data
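The confusion matrix requires both the predictions and the true labels.
Thus, we first need to generate predictions on our test data. Older
TensorFlow 2.x releases offered a predict_classes method for this, but it
was removed in TensorFlow 2.6. The equivalent is to call predict, which
returns the sigmoid probabilities, and threshold them at 0.5:
y_pred = (model.predict(X_test) > 0.5).astype("int32")
The predict method takes the features as its argument and returns an
array of probabilities; the comparison converts them into class labels.
You may print the predictions on the console. The result is shown as
follows: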
y_pred
array([[1],
[0],
[0],
...,
[1],
[0],
[0]], dtype=int32)
Here the value of 1 at any index indicates that the customer is predicted
to leave the bank, and the value of 0 indicates that the customer is
predicted to stay.
You use these prediction results to create and plot a confusion matrix
that provides a better visualization of the model’s performance.
Confusion Matrix
I will first show you how to generate a confusion matrix and then show you
how to interpret the matrix plot. To generate the confusion matrix, you use
the sklearn’s built-in function as shown here:
from sklearn.metrics import confusion_matrix
cf = confusion_matrix(y_test, y_pred)
cf
This gives the following output:
array([[2175,  209],
       [ 283,  333]])
You plot this matrix for a visual overview using the following code; the
cmap argument takes any matplotlib colormap, such as plt.cm.Blues:
from mlxtend.plotting import plot_confusion_matrix
plot_confusion_matrix(conf_mat = cf, cmap = plt.cm.Blues)
The plot is shown in Figure 2-17.
Figure 2-17. Confusion matrix
The x-axis represents the predicted labels, and the y-axis represents
the true labels. Since the label 1 stands for a customer who leaves the
bank, leaving is the positive class. As the chart shows, there are 2175
true negatives and 333 true positives. A true negative is a customer who
stays with the bank and was correctly classified as staying by our model.
Likewise, a true positive is a customer who leaves the bank and was
correctly classified as leaving. The remaining cells hold the errors: 209
false positives (customers who stay but were predicted to leave) and 283
false negatives (customers who leave but were predicted to stay). The true
positives and true negatives together determine the accuracy of our model.
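The four cells of the matrix can also be turned into numbers directly. The sketch below uses the matrix values printed above and follows sklearn's convention that rows are true labels and columns are predicted labels. Precision and recall are not discussed in this chapter; they are added here as an assumption about what you might want to inspect next.

```python
import numpy as np

# Confusion matrix from the text: rows are true labels, columns predictions
cf_demo = np.array([[2175, 209],
                    [283, 333]])

tn, fp, fn, tp = cf_demo.ravel()

accuracy = (tp + tn) / cf_demo.sum()
precision = tp / (tp + fp)   # of predicted leavers, how many really leave
recall = tp / (tp + fn)      # of actual leavers, how many were caught

print(round(accuracy, 4), round(precision, 4), round(recall, 4))
```

For this particular matrix, the accuracy works out to 2508/3000, while the recall shows that the model finds only a little over half of the customers who actually leave.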
The sklearn library provides the accuracy_score function to compute the
accuracy score, which is calculated by adding the number of true positives
and true negatives and then dividing the sum by the total number of
predictions. The following program statement computes the accuracy score
for our model:
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)
0.8396666666666667
The execution of this statement gives an accuracy score of about 83.97%,
the same value that the evaluate method reported earlier. Whether such an
accuracy is acceptable depends on the application.
As the model is trained to our satisfaction, it is now time to use it on
unseen data.
Predicting on Unseen Data
To create unseen data for our use case, we need to know the data types
of all the features so that we can assign them some dummy values.
The head dump of our features shows us the column names and the
kind of values each column holds. A partial dump of our features is shown
in Figure 2-18.
Figure 2-18. Screenshot of features data
So, for our unseen test data, we will use the following values:
CreditScore = 615
Gender = Male
Age = 22
Tenure = 5
Balance = 20000
NumOfProducts = 1
HasCrCard = 1
IsActiveMember = 1
EstimatedSalary = 60000
Geography = Spain
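You will input this data to our trained model by calling its predict method
with the preceding values at the appropriate indexes. Gender is
label-encoded (Male becomes 1), and Spain corresponds to Geography_1 = 0
and Geography_2 = 1. As the model was trained on standardized features,
the raw values must first pass through the same scaler. The predict method
returns the probability of leaving, which we compare with 0.5:
sample = scaler.transform([[615, 1, 22, 5, 20000, 1, 1, 1,
    60000, 0, 1]])
customer = model.predict(sample)
customer
if customer[0][0] > 0.5:
    print ("Customer is likely to leave")
else:
    print ("Customer will stay")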
The execution of the preceding code segment gives this output:
Customer will stay
This output indicates that the model expects this customer to stay with
the bank. Note that the model is correct for only about 84% of the test
data, as computed earlier, so an individual prediction may still be wrong.
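Turning a raw customer record into the 11-element feature vector by hand is error-prone, so it can help to wrap the encoding in a small function. The helper below is hypothetical; it assumes the alphabetical ordering that LabelEncoder applies (Female=0, Male=1; France=0, Germany=1, Spain=2) and the column order produced by the preprocessing steps in this chapter.

```python
# Hypothetical helper; the encodings assume LabelEncoder's alphabetical
# ordering (Female=0, Male=1; France=0, Germany=1, Spain=2)
GENDER_CODES = {"Female": 0, "Male": 1}
GEOGRAPHY_CODES = {"France": 0, "Germany": 1, "Spain": 2}

def encode_customer(rec):
    """Return the 11 features in the column order used for training."""
    geo = GEOGRAPHY_CODES[rec["Geography"]]
    return [
        rec["CreditScore"], GENDER_CODES[rec["Gender"]], rec["Age"],
        rec["Tenure"], rec["Balance"], rec["NumOfProducts"],
        rec["HasCrCard"], rec["IsActiveMember"], rec["EstimatedSalary"],
        int(geo == 1),   # Geography_1
        int(geo == 2),   # Geography_2
    ]

customer_record = {
    "CreditScore": 615, "Gender": "Male", "Age": 22, "Tenure": 5,
    "Balance": 20000, "NumOfProducts": 1, "HasCrCard": 1,
    "IsActiveMember": 1, "EstimatedSalary": 60000, "Geography": "Spain",
}
print(encode_customer(customer_record))
```

The resulting list still holds raw values; it would need to go through the fitted scaler before being passed to the model.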
After the model is fully trained to your satisfaction, you may save
it to disk and deploy it on your production server for real-life use. How
this is done is explained in the next chapter when I discuss the tf.keras
implementation in depth.
Full Source Code
The complete source code for the project is given in Listing 2-2 for your
quick reference.
Listing 2-2. Binary classification full source
%tensorflow_version 2.x
import tensorflow as tf
from tensorflow import keras
import pandas as pd
# Load data from Github
data_url = 'https://raw.githubusercontent.com/Apress/artificial-neural-networks-with-tensorflow-2/main/ch02/Churn_Modelling.csv'
data=pd.read_csv(data_url)
# Shuffle data for taking care of patterns in data collection
from sklearn.utils import shuffle
data=shuffle(data) #shuffling the data
# Examine loaded data
data
# Check for null values
data.isnull().sum()
# Drop irrelevant columns to set up features vector
X = data.drop(labels=['CustomerId', 'Surname', 'RowNumber',
'Exited'], axis = 1)
# Set up labels vector
y = data['Exited']
# Check data types for finding categorical columns
X.dtypes
# Examine few records for finding values in categorical columns
X.head()
# Encode categorical columns
from sklearn.preprocessing import LabelEncoder
label = LabelEncoder()
X['Geography'] = label.fit_transform(X['Geography'])
X['Gender'] = label.fit_transform(X['Gender'])
# Drop the first dummy column of Geography to reduce the number of
# features
X = pd.get_dummies(X, drop_first=True, columns=['Geography'])
X.head()
# Standardize all features to zero mean and unit variance
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X = scaler.fit_transform(X)
# Split dataset into training and testing
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y,
    test_size = 0.3)
# Determine number of features
X_train.shape[1]
# Create a stacked layers sequential network
model = keras.models.Sequential() # Create linear stack of layers
# Dense fully connected layer
model.add(keras.layers.Dense(128, activation = 'relu',
    input_dim = X_train.shape[1]))
model.add(keras.layers.Dense(64, activation = 'relu'))
model.add(keras.layers.Dense(32, activation = 'relu'))
# Sigmoid activation for a single output
model.add(keras.layers.Dense(1, activation = 'sigmoid'))
# Print model summary
model.summary()
# Compile model with desired loss function, optimizer and
# evaluation metrics
model.compile(loss = 'binary_crossentropy', optimizer='adam',
    metrics=['accuracy'])
# Clear any earlier logs so that graphs won't overlap with
# previously saved logs in TensorBoard
!rm -rf ./log/
#tensorboard visualization
import datetime, os
logdir = os.path.join("log", datetime.datetime.now().
strftime("%Y%m%d-%H%M%S"))
tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir,
histogram_freq = 1)
# Perform training
r = model.fit(X_train, y_train, batch_size = 32, epochs = 50,
    validation_data = (X_test, y_test),
    callbacks = [tensorboard_callback])
# Load and launch TensorBoard in Colab
%load_ext tensorboard
%tensorboard --logdir log
# evaluate model performance on test data
test_scores = model.evaluate(X_test, y_test)
print('Test Loss: ', test_scores[0])
print('Test accuracy: ', test_scores[1] * 100)
# Plot metrics in matplotlib
%matplotlib inline
import matplotlib.pyplot as plt #for plotting curves
plt.plot(r.history['val_accuracy'], label='val_acc')
plt.plot(r.history['val_loss'], label='val_loss')
plt.legend()
plt.show()
# Predict on test data; predict returns probabilities, so
# threshold them at 0.5 to get class labels
y_pred = (model.predict(X_test) > 0.5).astype("int32")
y_pred
# Create confusion matrix
from sklearn.metrics import confusion_matrix
cf = confusion_matrix(y_test, y_pred)
cf
# Plot confusion matrix
from mlxtend.plotting import plot_confusion_matrix
plot_confusion_matrix(conf_mat = cf, cmap = plt.cm.Blues)
# Compute accuracy score
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)
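# Predict on unseen customer data; scale the raw values with the
# fitted scaler and threshold the predicted probability at 0.5
sample = scaler.transform([[615, 1, 22, 5, 20000, 1, 1, 1,
    60000, 0, 1]])
customer = model.predict(sample)
customer
if customer[0][0] > 0.5:
    print ("Customer is likely to leave")
else:
    print ("Customer will stay")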
Summary
In this chapter, you set up your environment for deep learning using
TensorFlow 2.x. You used Colab for developing your Python notebooks.
A trivial application helped you learn the development environment.
This was followed by a more detailed, realistic example. In this realistic
example, you loaded an external dataset and learned how to preprocess
the data to make it suitable for machine learning, how to define a
deep neural network, how to compile it, how to train it, how to evaluate
the model’s performance using TensorBoard and plots from matplotlib,
and finally how to use the trained model to make predictions on unseen
data. Though the model’s performance is evaluated, I never talked about
how to improve it. In the next chapter, you will learn a few techniques
for improving a model’s performance. The next chapter covers the
TensorFlow Keras integration in more depth and discusses the image
classification problem.