CHAPTER 2
A Closer Look at TensorFlow

© Poornachandra Sarang 2021. P. Sarang, Artificial Neural Networks with TensorFlow 2, https://doi.org/10.1007/978-1-4842-6150-7_2

In the previous chapter, you saw the capabilities of the TensorFlow platform. Having had a glimpse of TensorFlow's powers, it is now time to learn how to harness this power in your own real-world applications. We will start with a trivial application that will teach you the intricacies of simple ML application development.

A Trivial Machine Learning Application

To get you started with TensorFlow coding, we will begin with a trivial Hello World kind of application. In this application, you will develop a machine learning model that makes predictions using statistical regression techniques. We will use a fixed set of data points declared within the program code itself. Our data will consist of (x, y) coordinate values. We compute a value called z that has some linear relationship with x and y. For example, the value of z for given x and y values may be computed using the following mathematical equation:

z = 7 * x + 6 * y + 5

Our task is to make the machine learn, on its own, the best fit for the preceding relationship, given a sufficiently large number of x and y values and the corresponding target z values. Once the model is trained, we will use it to predict z for unseen x and y values. For example, given x equals 2 and y equals 3, the model should predict an output z equal to 37. If it predicts z with 100% accuracy for any previously unseen x and y, we say that the model is fully trained with 100% accuracy. In practice, it is never possible to develop a model that predicts with 100% accuracy, so we try to optimize the model's performance toward this idealistic accuracy level.

As you can see from the preceding discussion, the problem we are trying to solve is a classical linear regression case study. To keep things simple, we will create a single-layer network consisting of only one neuron, trained to solve a linear regression problem. In practice, your networks will almost always consist of multiple layers with multiple nodes. In this trivial application, I will avoid the use of such deep networks, as defining them requires a deeper understanding of the Keras API. You will be exposed to those Keras APIs later in this chapter. For this trivial application and all subsequent applications in this book, you will use Google Colab for development.

Creating Colab Notebook

In this first application, I will guide you through the entire process of creating, testing, and inferring from an ML model in Colab. The development process is explained in some detail for the benefit of readers who are new to ML development. Start Google Colab in your browser by typing the following URL:

http://colab.research.google.com

You will see the screen shown in Figure 2-1.

Figure 2-1. Creating a new Colab notebook

Select the NEW PYTHON3 NOTEBOOK option to open a new Python 3 notebook. Assuming that you are logged in to your Google account, you will see a screen as shown in Figure 2-2.

Figure 2-2. New Colab notebook

The default name for the notebook starts with Untitledxxx.ipynb. Change the name to Hello World or whatever you prefer.
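Before writing any application code, you may also want to check which TensorFlow version your Colab runtime provides. This quick check is not part of the book's listing, and the printed version depends on the runtime you are given:

import tensorflow as tf
print(tf.__version__)   # recent Colab runtimes report a 2.x version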
Next, you will write the code to import the TensorFlow libraries into your Python code.

Imports

Our trivial program requires three imports: TensorFlow 2.x, the numpy library for handling our data, and matplotlib for some charting.

Importing TensorFlow 2.x

To import TensorFlow in your Python notebook, you would use the following program statement:

import tensorflow as tf

This imports the default version, which was 1.x at the time of this writing. The output of executing the preceding command is shown in Figure 2-3.

Figure 2-3. Default TensorFlow library import

As this book is based on TensorFlow 2.x, we need to select it explicitly. To do so, you must run the tensorflow_version magic. Magics are a feature of Colab; this one is run using the following statement:

%tensorflow_version 2.x

When you run this code, TensorFlow 2.x will be selected. The output is shown in Figure 2-4.

Figure 2-4. Loading TensorFlow 2.x

After TensorFlow 2.x is selected, you import the TensorFlow libraries using the traditional import statement:

import tensorflow as tf

Note that the use of this magic is no longer required in current versions of Colab.

The Keras library is now part of TensorFlow. To use Keras in our application, we need to import it from TensorFlow. This is done using the following import statement:

from tensorflow import keras

To use Keras modules, you now use the tf.keras syntax. Next, you will import the other required libraries.

Importing numpy

NumPy is a library for supporting large, multidimensional arrays in Python. It has a collection of high-level mathematical functions to operate on these arrays. Any machine learning model development relies heavily on the use of arrays. You will be using numpy arrays to store the input data required by our network. To import numpy, you use the following import statement:

import numpy as np

matplotlib is a Python library for creating quality 2D plots. You will use this library in our project for plotting accuracy and error metrics. To import matplotlib, you use the following statement:

import matplotlib.pyplot as plt

This completes the imports for the application. Next, you will create the data for the application.

Setting Up Data

We will create a set of 100 data points consisting of x and y coordinates. The count of data points is declared in a Python variable using the statement:

number_of_datapoints = 100

To generate the x and y coordinates, you use the random module in numpy. To generate the x values, you use the following program statement:

# generate random x values in the range -5 to +5
x = np.random.uniform(low = -5, high = 5, size = (number_of_datapoints, 1))

The low and high parameters of the uniform function define the lower and upper bounds for the random number generator. The size parameter specifies the dimensions of the array, that is, how many values are to be generated. The return value of the preceding program statement is an array consisting of 100 rows and 1 column. You can print the first five values of the generated array using the statement:

x[:5,:].round(2)

In the output, each value is rounded to two decimal digits by calling the round function. A sample output of this statement is shown here:

array([[ 4.57],
       [-0.68],
       [ 2.64],
       [-3.17],
       [-4.86]])

Note that the output varies on every run.
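If you want reproducible numbers while experimenting, you can seed NumPy's random generator once before generating the data. This is optional and not part of the original listing; the seed value 42 is arbitrary:

# optional: fix the random seed so that x, y, and noise come out the same on every run
np.random.seed(42)
x = np.random.uniform(low = -5, high = 5, size = (number_of_datapoints, 1))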
Likewise, the y values are generated using a similar statement:

y = np.random.uniform(-5, 5, size = (number_of_datapoints, 1))

We set up the relation between x, y, and z using a linear equation:

z = 7 * x + 6 * y + 5

In machine learning terms, x and y are the features and z is the label. Once our model is trained, we will ask the model to predict z for given x and y. As mentioned earlier, we will be training our network to discover the relationship between x, y, and z. For this, we introduce some noise into every value of z that we compute using the preceding equation. You generate the noise using the random function as before, with values ranging from -1 to +1, using the following statement:

noise = np.random.uniform(low = -1, high = 1, size = (number_of_datapoints, 1))

Now, you create the z array using the linear equation and adding the noise to it, as shown in this statement:

z = 7 * x + 6 * y + 5 + noise

The input to our neural network is an array of 100 rows, each row being a one-dimensional array with the x and y values as its columns. To create the required input data format, you use the column_stack function as follows:

input = np.column_stack((x,y))

Printing the first five values of the input array produces the following output:

array([[-1.9 ,  2.91],
       [-2.14, -0.81],
       [ 4.18,  1.79],
       [-0.93, -4.41],
       [-1.8 , -1.31]])

At this point, you are ready with the data for training the network. Our next task is to create the network itself.

Defining Neural Network

As mentioned earlier, our neural network will consist of a single neuron that accepts a one-dimensional vector containing the x and y values and outputs a single value. The network is depicted in Figure 2-5.

Figure 2-5. A single layer/single node network

To define network models, Keras provides the Sequential API. Using this API, you will be able to construct multilevel, sophisticated network architectures. For our current requirement, we need to create an ANN architecture consisting of a single layer with a single neuron. You define the model using the following statement:

model = tf.keras.Sequential([keras.layers.Dense(units=1, input_shape=[2])])

The units parameter defines the dimensionality of the output space. Here, by specifying a value of 1, you define a single-layer network with a single neuron outputting a single value. The input_shape parameter tells Keras that each input sample contains two values, the x and y coordinates. The Dense function takes several other parameters that allow you to create complex ANN architectures. You will create many complex architectures throughout this book using the keras.Sequential API. After the model is defined, we need to compile it and make it ready for training on our dataset.

Compiling Model

To train a model, we first need to define a learning process. Model compilation is the way of setting up this learning process. The learning process itself consists of a few components:

• Objective loss function
• Optimizer
• Metrics

Firstly, the model uses a loss function to determine how far an inference is from the target value. The model tries to minimize this loss during training. Keras provides several predefined loss functions, to name a few: categorical_crossentropy, mean_squared_error, huber_loss, and poisson.

Secondly, to help in reducing the loss, we use an optimizer. An optimizer is an algorithm or method used to change the attributes of a neural network. The attributes are the weights and the learning rate.
By changing these attributes, we try to minimize the loss. Keras provides several predefined optimizers, again to name a few: SGD (stochastic gradient descent), RMSprop, Adagrad, and Adam.

Lastly, we define a metric function that is used to judge the performance of the model. A few predefined metric functions are MSE (mean squared error), RMSE (root mean squared error), MAE (mean absolute error), and MAPE (mean absolute percentage error).

So we set up the learning process by calling the compile function on the model. The compile method takes the three components described above as its arguments. The following code segment illustrates its use:

model.compile(optimizer = 'sgd', loss = 'mean_squared_error', metrics = ['mse'])

Here we specify stochastic gradient descent as the optimizer, the mean squared error as the loss function, and again the mean squared error as the metric. Now that the model knows its learning process, it is time to feed some training data to it.

Training Network

The model is trained over several iterations. Initially, some preset weights are assigned to the various nodes in the network. After the first training iteration, we look at the loss and then adjust these weights for the second iteration, where we try to minimize the loss further. The process continues through several iterations, which we call epochs. At the end of each epoch, we save and monitor the loss to ensure that we are optimizing the network in the right direction. To save this state, we need to create a History object in Keras, which is done using the following code segment:

from tensorflow.keras.callbacks import History
history = History()

We will pass this history object to the model's training method as a parameter. The training itself is done by calling the model's fit method as shown here:

model.fit(input, z, epochs = 15, verbose = 1, validation_split = 0.2, callbacks = [history])

The first parameter specifies the stacked input that we created earlier. The second parameter specifies the target values. The epochs parameter defines the number of iterations. The verbose parameter specifies whether you want to observe the training progress. The validation_split parameter specifies that 20% of the given data will be used for validating the trained model. Lastly, the callbacks parameter specifies where the intermediate monitoring data will be stored; the callback is invoked at the end of each epoch. Partial output during training is shown in Figure 2-6.

Figure 2-6. Output during model training

Once all epochs are completed, the model is supposedly trained. We need to verify that the model is indeed trained to meet our needs. To do so, we will examine the training output by observing the metrics that we asked the model to record during its learning.

Examining Training Output

We asked the model to save its status at each epoch in the history variable. We can examine what is recorded in history with the help of the following statement:

print(history.history.keys())

The output of this print statement is

dict_keys(['loss', 'mse', 'val_loss', 'val_mse'])

We see that at each epoch the model has saved the loss and the mse (mean squared error) on both the training and the validation data. The validation loss and mse are indicated by the prefix val_.
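Since the underlying relationship is the linear equation z = 7x + 6y + 5, it is also instructive to peek at the parameters the single neuron has learned: the kernel should end up close to the coefficients 7 and 6, and the bias close to 5. A minimal sketch, assuming the trained model from above; the exact numbers will vary from run to run because of the noise:

# inspect the learned parameters of the single Dense neuron
weights, bias = model.layers[0].get_weights()
print("kernel:", weights.round(2))   # roughly [[7.], [6.]] for this data
print("bias:  ", bias.round(2))      # roughly [5.]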
We will now create plots for both the loss and the mse. To plot the loss on both the training and validation data, use the following code segment:

plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper right')
plt.show()

Here, we plot the loss and val_loss values from the recorded history. The output is shown in Figure 2-7.

Figure 2-7. The loss vs. epoch

We observe from Figure 2-7 that the loss is minimized quite early, by the end of the third epoch, and the model is fully trained by the end of the 15 epochs that we specified in the fit method. We will also plot the mean squared error to further verify this claim. To plot the mse, use the following code segment:

plt.plot(history.history['mse'])
plt.plot(history.history['val_mse'])
plt.title('mean squared error')
plt.ylabel('mse')
plt.xlabel('epochs')
plt.legend(['train', 'validation'], loc = 'upper right')
plt.show()

The output of the preceding code is shown in Figure 2-8.

Figure 2-8. mse vs. epoch

Lastly, we will also plot the predicted output vs. the real output using the following code segment:

plt.plot(np.squeeze(model.predict_on_batch(input)), np.squeeze(z))
plt.xlabel('predicted output')
plt.ylabel('real output')
plt.show()

The program output is shown in Figure 2-9.

Figure 2-9. Predicted vs. real output values

As you can observe in Figure 2-9, the output predicted by the model is very close to the expected output, so we can reasonably assume that the model is well trained. To be sure, we need to test the model's predictions on unseen data, which is done next.

Predicting

To make a prediction for unseen x and y values, you use the predict function on the trained model. This is shown in the following program statement:

print("Predicted z for x=2, y=3 ---> ", model.predict([[2,3]]).round(2))

Here, we specify x equals 2 and y equals 3. The result is rounded to two decimal digits. When you execute the preceding print statement, you will see the following output:

Predicted z for x=2, y=3 --->  [[36.99]]

Now, let us check whether the prediction is close enough to the expected output. To see the expected output, run the following code:

# Checking from equation
# z = 7*x + 6*y + 5
print("Expected output: ", 7*2 + 6*3 + 5)

The execution prints 37 on the screen. Our model's prediction is 36.99, which is close enough to the expected value. Note that the predicted output will vary slightly on each run because the model's accuracy varies each time. You can test the model's predictions with a few more x and y values to satisfy yourself about the model's training.

Full Source Code

The full source code for the trivial Hello World application described earlier is given in Listing 2-1 for your quick reference.

Listing 2-1. A trivial linear regression application source

# Load TensorFlow 2.x in a Colab project.
%tensorflow_version 2.x

# Import required libraries
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

# Set up data
number_of_datapoints = 1000

# generate random x values in the range -5 to +5
x = np.random.uniform(low = -5, high = 5, size = (number_of_datapoints, 1))

# generate random y values in the range -5 to +5
y = np.random.uniform(-5, 5, size = (number_of_datapoints, 1))

# generate some random error in the range -1 to +1
noise = np.random.uniform(low = -1, high = 1, size = (number_of_datapoints, 1))

z = 7 * x + 6 * y + 5 + noise

# Print x, y and z sample values for manual verification
x[:5,:].round(2)
y[:2,:].round(2)
z[:2,:].round(2)

# Stack x and y arrays for inputting to neural network
input = np.column_stack((x,y))

# Print a few values of the input array for demonstration purposes.
input[:2,:].round(2)

# Create a Keras sequential model consisting of a single layer with a single neuron.
model = tf.keras.Sequential([tf.keras.layers.Dense(units=1)])

# Compile the model with the specified optimizer, loss function and error metrics.
model.compile(optimizer = 'sgd', loss = 'mean_squared_error', metrics = ['mse'])

# Import History module to record loss and accuracy on each epoch during training
from tensorflow.keras.callbacks import History
history = History()

model.fit(input, z, epochs = 15, verbose = 1, validation_split = 0.2, callbacks = [history])

# Print keys in the history just to know their names. These will be used for plotting the metrics.
print(history.history.keys())

# Plot the loss metric on both training and validation datasets.
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper right')
plt.show()

# Plot the mean squared error on both training and validation datasets.
plt.plot(history.history['mse'])
plt.plot(history.history['val_mse'])
plt.title('mean squared error')
plt.ylabel('mse')
plt.xlabel('epochs')
plt.legend(['train', 'validation'], loc = 'upper right')
plt.show()

plt.plot(np.squeeze(model.predict_on_batch(input)), np.squeeze(z))
plt.xlabel('predicted output')
plt.ylabel('real output')
plt.show()

print("Predicted z for x=2, y=3 ---> ", model.predict([[2,3]]).round(2))

# Checking from equation
# z = 7*x + 6*y + 5
print("Expected output: ", 7*2 + 6*3 + 5)

If you have run this trivial project and are getting the preceding output, congratulations! Your setup for deep learning with TensorFlow 2.x is now complete. You will now dive deep into real-world machine learning development.

In the next section, you will learn a real-world machine learning development life cycle. You will use a real dataset, learn how to preprocess it and make it ready for feeding into a neural network, define a multilevel deep neural network, train it, test the model, and plot the accuracy metrics to refine the model. You will also learn how to use TensorBoard in the Colab environment to visualize the metrics for analyzing the model training. So let us start on this real-world project based on TensorFlow.

Binary Classification in TensorFlow

In the previous example, we generated the training data within the program itself. Now, we will use a realistic dataset to do some real-world machine learning. You will be solving a classification problem.
We will use a dataset from the popular Kaggle site, used in one of their challenges. The dataset contains the customer data of a bank. Imagine that the bank has approached you to develop an ML prediction model that gives them some insight into the likelihood of a customer leaving the bank. In financial terms, this is called churning. Knowing that a certain customer may leave the bank in the near future, the bank can take preventive measures to retain the customer.

The problem that you are trying to solve is to develop a binary classification model. You will use TensorFlow's deep learning library and its high-level Keras API for implementing the model. More specifically, you will learn the following in this example:

• How to load CSV data from a local or a remote server
• How to preprocess the data and make it ready for an ML algorithm
• How to define a multilayer ANN using TensorFlow's high-level Keras API
• How to train the model
• How to evaluate the model's performance on test data
• How to visualize the results on TensorBoard
• How to do performance analysis
• How to infer on unseen data

So let us start with the project development.

Setting Up Project

Create a new Colab project and name it Binary Classification. The project uses the bank customer data (Churn_Modelling.csv), which is available in the book's source download. Copy the downloaded file to a folder of your choice on your Google Drive. Note that you will later need to set the file path appropriately in your program code. If you do not wish to download the data file, you will still be able to run the project by taking the data from the GitHub setup for this book. You are now ready to code your project.

Imports

As in the earlier example, include the following imports in the code window:

%tensorflow_version 2.x
import tensorflow as tf
from tensorflow import keras

You will be using pandas dataframes for loading the external database. You will use the sklearn library for preprocessing the data and creating the training/validation datasets. You will use matplotlib for some charting. To use these libraries, include the following imports in your project code:

#loading data
import pandas as pd
#scaling feature values
from sklearn.preprocessing import StandardScaler
#encoding target values
from sklearn.preprocessing import LabelEncoder
#shuffling data
from sklearn.utils import shuffle
#splitting the dataset into training and validation
from sklearn.model_selection import train_test_split
#plotting curves
import matplotlib.pyplot as plt

Next, you need to mount the drive in your program so that the program can access the documents stored in Google Drive.

Mounting Google Drive

To mount Google Drive, type the following code in a new code cell:

from google.colab import drive
drive.mount('/content/drive')

When you run this code, it will ask you to enter an authorization code to access your drive. You will see the screen shown in Figure 2-10.

Figure 2-10. Authorization code for Google Drive

Click the provided link. You will be asked to sign in to your Google account. You will see an authorization code similar to that shown in Figure 2-11.

Figure 2-11. Google sign-in authorization

Click the icon next to the authorization code to copy it to the clipboard. Paste the code in the authorization window shown earlier.
After a successful authorization, you will see this message on your screen:

Mounted at /content/drive

Now, you are ready to access the contents of your drive through your program code.

Loading Data

To load the data, enter the following program code in a new code cell and execute it:

data = pd.read_csv('/content/drive/<path to downloaded CSV>/Churn_Modelling.csv')

Note that you will need to set the appropriate path to your CSV file. If you have decided to use the data from the book's GitHub, use the following code instead of the preceding code segment:

data_url = 'https://raw.githubusercontent.com/Apress/artificial-neural-networks-with-tensorflow-2/main/ch02/Churn_Modelling.csv'
data = pd.read_csv(data_url)

The read_csv function loads the data from the specified file and copies it into a pandas dataframe.

Shuffling Data

Data collected in the field may be in a specific order as per the convenience of the data collector. For better machine learning, you should randomize the data so that the learning does not follow undesired patterns in the data. So, we shuffle the data using the following statement:

data = shuffle(data)

Examining Data

You can verify that the data is correctly loaded by printing the contents of the data dataframe. Instead of printing only the top rows by calling data.head(), I have printed the full dataset so that you can see the number of records and columns present in it. This is shown in Figure 2-12.

Figure 2-12. Dataset

There are 10,000 rows and 14 columns in the database. A brief explanation of the various fields is given here:

• RowNumber – Numbers from 1 to 10,000.
• CustomerId – A unique identification for the customer.
• Surname – Customer's last name.
• CreditScore – Customer's credit score.
• Geography – Customer's country.
• Gender – Male or female.
• Age – Customer's age.
• Tenure – How long the customer has been banking with them.
• Balance – Customer's bank balance.
• NumOfProducts – The count of bank products the customer is currently using.
• HasCrCard – Does the customer hold a credit card?
• IsActiveMember – Is the customer currently active?
• EstimatedSalary – Customer's current estimated salary.
• Exited – A value of 1 indicates that the customer has left the bank.

Now that you have loaded the data in memory, your next task is to cleanse it before feeding it to our network. We call this data preprocessing, which is discussed next.

Data Preprocessing

Raw, real-world data may not always meet the training requirements of a neural network. Specifically, you will be checking the data and processing it for the following items:

• Data may contain null values.
• Not all fields in the database may be useful for learning.
• The numerical fields may exhibit large variance in their values and thus must be scaled to the same level.
• Certain fields may contain categorical values, for example, male and female; these need to be encoded as numbers.
• Finally, you will need to decide which fields are to be used as features and which are the labels.

So, let us start processing the data.

Checking Nulls

If the data contains null values, it will severely affect the network training. The easiest way to check for null values is to call the isnull function.
This is done using the following program statement:

data.isnull().sum()

The output of the preceding statement gives the following result:

RowNumber          0
CustomerId         0
Surname            0
CreditScore        0
Geography          0
Gender             0
Age                0
Tenure             0
Balance            0
NumOfProducts      0
HasCrCard          0
IsActiveMember     0
EstimatedSalary    0
Exited             0
dtype: int64

The isnull call marks each null entry, and sum counts the null entries in each column. Very clearly, our dataset does not contain any null values, so there is no question of removing rows (the ones containing null fields) from the dataset. Now comes the major task of selecting fields for machine learning.

Selecting Features and Labels

Not all the fields in our database are useful for training the algorithm. For example, fields such as CustomerId and Surname do not make any sense for our machine learning. So, we need to drop these columns. This is done using the following statement:

X = data.drop(labels=['CustomerId', 'Surname', 'RowNumber', 'Exited'], axis = 1)

Note that we dropped four fields from the dataset. X is a new dataframe containing the fields which are relevant to our model building. Whether the customer exited the bank is the output of our model. Thus, the field Exited becomes the label for our model building. This is extracted into the variable y using the following statement:

y = data['Exited']

At this point, you are ready with the features (X) and labels (y) tensors.

Encoding Categorical Columns

You must check whether any of the selected columns have categorical values. For this, we will check the data types of all the selected columns using the following statement:

X.dtypes

The output of the preceding statement is shown as follows:

CreditScore          int64
Geography           object
Gender              object
Age                  int64
Tenure               int64
Balance            float64
NumOfProducts        int64
HasCrCard            int64
IsActiveMember       int64
EstimatedSalary    float64
dtype: object

Note that Geography and Gender are object types. You can check the values they contain by printing the first five rows of the features tensor, as shown in Figure 2-13.

Figure 2-13. Top five rows of the features vector

The Gender column takes two categorical values, Male and Female, while Geography has three categorical values: Germany, Spain, and France. You will need to convert these into numerical values before feeding them to the network. The encoding is done by using the LabelEncoder in sklearn's preprocessing module. This is shown in the following code snippet:

from sklearn.preprocessing import LabelEncoder
label = LabelEncoder()
X['Geography'] = label.fit_transform(X['Geography'])
X['Gender'] = label.fit_transform(X['Gender'])

One-hot encoding creates dummy variables for categorical columns. As the Geography column has three distinct values, it will create three variables, one for each country. Thus, we would have three features pertaining to the country in our training dataset. Having many features increases the training time. To reduce the number of features, you can exclude one of the dummy variables pertaining to the country field and still achieve the same results. We drop the first variable by passing drop_first=True to the get_dummies method:

X = pd.get_dummies(X, drop_first=True, columns=['Geography'])

If you examine the data at this point by printing the top five rows, you will notice that there are only two columns for Geography: Geography_1 and Geography_2.
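If you want to see what the label encoding and the dummy encoding do in isolation, here is a small standalone sketch on made-up values; the sample dataframe is hypothetical and not part of the bank dataset:

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# hypothetical sample with the two categorical columns
sample = pd.DataFrame({'Geography': ['France', 'Spain', 'Germany', 'France'],
                       'Gender':    ['Female', 'Male', 'Male', 'Female']})
label = LabelEncoder()
sample['Geography'] = label.fit_transform(sample['Geography'])  # France=0, Germany=1, Spain=2
sample['Gender'] = label.fit_transform(sample['Gender'])        # Female=0, Male=1
# drop_first removes the Geography_0 dummy, leaving Geography_1 and Geography_2
sample = pd.get_dummies(sample, drop_first=True, columns=['Geography'])
print(sample)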
One more thing that we need to do before feeding the data to the network is to bring all the numerical values onto a common scale.

Scaling Numerical Values

As the features in real data can have a wide range of values, machine learning works better if we standardize all these data points to the same scale. Ideally, the mean for each column should be 0 and the standard deviation should be 1 for better results in machine learning. So we transform all our data points using the equation:

z = (x - mu) / s

where mu is the mean and s is the standard deviation of the column. This standardization is performed by using the StandardScaler of sklearn, as shown in this code snippet:

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X = scaler.fit_transform(X)

At this point, our preprocessing of the data is complete. We are now ready to define and train the model. When we train the model, we also need to validate the training. If the results of the training do not meet our expectations, we will need to further preprocess the data, for example, by adjusting the number of features. For testing, we will reserve a part of the data. Thus, we split our preprocessed data into two portions: a larger portion used for training and a smaller portion for testing the trained model.

Creating Training and Testing Datasets

To split the data into two portions, we use the train_test_split method of sklearn, as in the following program statement:

# Split dataset into training and testing
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3)

The test_size parameter determines what fraction of the data (here 30%) should be reserved for testing. The function returns a set of vectors for training and testing. Many times, the terms validation and test datasets are used interchangeably. To avoid confusion, here are widely accepted, unambiguous definitions of these terms:

• Training dataset – The part of the data that is used for model fitting.
• Validation dataset – The part of the data used for tuning hyperparameters during training.
• Test dataset – The part of the data used for evaluating the model's performance after training.

Now, it is time to define our network.

Defining ANN

After preprocessing, we have 11 features in our dataset. The number of features is determined by computing the shape of the training dataset with the following statement:

X_train.shape[1]

The expected output of the network is a binary value indicating the likelihood of the customer leaving the bank. The target values are specified in the y_train vector. You will create a four-layer deep learning network model. The first layer will have 128 nodes, the second 64, the third 32, and the fourth will be a single output node. To create the network, you use the tf.keras API, which is the new standard in TensorFlow. You use the Sequential API to create a linear stack of layers. You instantiate the model using the following statement:

model = keras.models.Sequential()

You add the first layer, consisting of 128 nodes, to the stack using the following statement:

model.add(keras.layers.Dense(128, activation = 'relu', input_dim = X_train.shape[1]))

The input dimension to this layer is set by the input_dim parameter, which is the number of features defined by the shape of the X_train vector. We use ReLU (rectified linear unit) as the activation function. An activation function decides whether a node is activated, depending on its weighted sum of inputs.
ReLU is one of the most widely used activation functions; it outputs 0 for negative inputs and passes positive inputs through unchanged. Likewise, you add the second layer to the network using the following statement:

model.add(keras.layers.Dense(64, activation = 'relu'))

The input to this layer comes from the previous layer, so there is no need to specify the dimensions of the input vector. The third layer is added using the statement:

model.add(keras.layers.Dense(32, activation = 'relu'))

Finally, the last layer in the network is added using the statement:

model.add(keras.layers.Dense(1, activation = 'sigmoid'))

We use sigmoid as the activation function here, as this layer outputs a binary decision. A sigmoid function is a type of activation function also known as a squashing function. A squashing function limits the output to a range between 0 and 1, making it suitable for predicting probabilities. You print the network summary by calling the summary function as follows:

model.summary()

The summary printed on the screen is shown here:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense (Dense)                (None, 128)               1536
_________________________________________________________________
dense_1 (Dense)              (None, 64)                8256
_________________________________________________________________
dense_2 (Dense)              (None, 32)                2080
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 33
=================================================================
Total params: 11,905
Trainable params: 11,905
Non-trainable params: 0
_________________________________________________________________

Compiling Model

After the model architecture is defined, it needs to be compiled. To compile the model, you call the model's compile method:

model.compile(loss = 'binary_crossentropy', optimizer='adam', metrics=['accuracy'])

As the model that we are developing is a binary classifier, we use binary_crossentropy as our loss function. We use the Adam optimizer while training the model, as it is well suited to such situations. Later on, after the training, if you are not satisfied with the model's performance, you may experiment with other optimizers. The accuracy metric is collected for analysis by specifying it in the metrics parameter.

I will also show you how to use TensorBoard for analyzing the network performance. For this, we need to define a callback function which will be called at each epoch during training. We will collect the logs in the log folder. To clear any earlier logs, we use the following command:

!rm -rf ./log/

You define the callback using the following code snippet:

#tensorboard visualization
import datetime, os
logdir = os.path.join("log", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq = 1)

With this setup for training analysis and the compilation of the model, we are now ready to start the training.

Model Training

To train the model, you use the fit method on the model instance:

r = model.fit(X_train, y_train, batch_size = 32, epochs = 50, validation_data = (X_test, y_test), callbacks = [tensorboard_callback])

The first parameter to the fit function defines the features vector and the second defines the labels. The batch_size parameter, as the name suggests, defines the batch size for the training.
The epochs parameter determines how many iterations will be performed during training. The test data that we generated during data preprocessing is used for model validation and is passed to the fit function in the validation_data parameter. Lastly, the callbacks parameter specifies which callback function will be called at the end of each iteration. Partial output during training is shown in Figure 2-14.

Figure 2-14. Program output during training

Once the training is over, you can use the collected metrics to evaluate whether the model is trained to your desired accuracy.

Performance Evaluation

To evaluate the performance, we will launch TensorBoard in our Colab environment using the %tensorboard magic. Before this, we need to load the TensorBoard extension using the %load_ext magic:

%load_ext tensorboard
%tensorboard --logdir log #command to launch tensorboard on colab

Running these magics launches TensorBoard, and you will see the accuracy and loss metrics plotted on the screen. The accuracy and loss plots are shown in Figure 2-15.

Figure 2-15. Accuracy and loss metrics in TensorBoard

The two curves in each chart are plotted on the training and the validation data. Examining the accuracy and loss metrics helps you determine whether the model is performing well. The plots show the model's accuracy and loss at every epoch. If the accuracy improves on every epoch, your training is going in the right direction. Similarly, the loss should keep reducing on every epoch. These plots can easily reveal issues such as overfitting. If you are not satisfied with the performance, you may adjust the model parameters and retrain it to improve the accuracy. You may try a different optimizer and/or introduce regularization to improve the model accuracy.

You can also evaluate the model's performance on the test data by calling the evaluate method and passing the test features and labels vectors as parameters. The program statement for evaluating the model and its output is shown here:

test_scores = model.evaluate(X_test, y_test)
print('Test Loss: ', test_scores[0])
print('Test accuracy: ', test_scores[1] * 100)

Test Loss:  0.6143370634714762
Test accuracy:  83.96666646003723

The accuracy on the test data, which is about 84%, indicates that the model will correctly classify about 84% of the given data points.

I will also show you how to plot the performance charts on the validation data using matplotlib, a traditional way of performance evaluation. Use the following code snippet to do so:

%matplotlib inline
import matplotlib.pyplot as plt #for plotting curves
plt.plot(r.history['val_accuracy'], label='val_acc')
plt.plot(r.history['val_loss'], label='val_loss')
plt.legend()
plt.show()

The plot generated by matplotlib is shown in Figure 2-16.

Figure 2-16. Accuracy/loss metrics on the validation data

Besides the accuracy and loss metrics, we also often use a confusion matrix to evaluate the performance of our network, which is discussed next.

Predicting on Test Data

The confusion matrix requires both the predictions and the true labels. Thus, we first need to generate predictions on our test data. For predicting, use the predict_classes method as shown here:

y_pred = model.predict_classes(X_test)

The method takes the features vector as its argument and returns a tensor of predictions.
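Note that predict_classes has been removed from the Sequential model in newer TensorFlow releases. If your runtime raises an error on the statement above, an equivalent sketch is to threshold the sigmoid outputs yourself, assuming the trained model and X_test from above:

import numpy as np
# 1 = predicted to leave, 0 = predicted to stay
y_pred = (model.predict(X_test) > 0.5).astype(np.int32)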
You may print the predictions on the console. The result is shown as follows:

y_pred

array([[1],
       [0],
       [0],
       ...,
       [1],
       [0],
       [0]], dtype=int32)

Here a value of 1 at any index indicates that the customer is going to leave the bank, and a value of 0 indicates that the bank has retained the customer. You use these prediction results to create and plot a confusion matrix, which provides a better visualization of the model's performance.

Confusion Matrix

I will first show you how to generate a confusion matrix and then how to interpret the matrix plot. To generate the confusion matrix, you use sklearn's built-in function as shown here:

from sklearn.metrics import confusion_matrix
cf = confusion_matrix(y_test, y_pred)
cf

This gives the following output:

array([[2175,  209],
       [ 283,  333]])

You will plot this matrix to get a visual representation using the following code:

from mlxtend.plotting import plot_confusion_matrix
plot_confusion_matrix(conf_mat = cf, cmap = plt.cm.Blues)

The plot is shown in Figure 2-17.

Figure 2-17. Confusion matrix

The x-axis represents the predicted labels, and the y-axis represents the true labels. A true positive is a customer who left the bank and was correctly classified by our model as leaving, while a true negative is a customer who did not leave the bank and was correctly classified as staying. As the chart shows, there are 333 true positives and 2175 true negatives. The true positives and true negatives help us in determining the accuracy of our model. sklearn provides the accuracy_score function to compute the accuracy score, which is calculated by adding the number of true positives and true negatives and then dividing the sum by the total number of predictions. The following program statement computes the accuracy score for our model:

from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)

0.8396666666666667

The execution of this statement gives an accuracy score of about 84%, which is a generally acceptable accuracy in machine learning. As the model is trained to our satisfaction, it is now time to use it on unseen data.

Predicting on Unseen Data

To create unseen data for our use case, we need to know the data types of all the features, to which we will assign some dummy values. The head dump of our features shows us the column names and the ranges of values they hold. A partial dump of our features vector is shown in Figure 2-18.

Figure 2-18. Screenshot of features data

So, for our unseen test data, we will use the following values:

CreditScore = 615
Gender = Male
Age = 22
Tenure = 5
Balance = 20000
NumOfProducts = 1
HasCrCard = 1
IsActiveMember = 1
EstimatedSalary = 60000
Geography = Spain

You input this data to our trained model by calling its predict method with the preceding values at the appropriate indexes in the parameter list:

customer = model.predict([[615, 1, 22, 5, 20000, 5, 1, 1, 60000, 0, 0]])
customer
if customer[0] > 0.5:
    print ("Customer is likely to leave")
else:
    print ("Customer will stay")

The execution of the preceding code segment gives this output:

Customer will stay

A predicted value below 0.5 indicates that this customer is unlikely to leave the bank. Note that the accuracy of this prediction is still only about 84%, as computed earlier.
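One caveat: the network was trained on features standardized with StandardScaler, while the customer values above were passed in raw. For a prediction consistent with the training pipeline, you would transform the raw values with the scaler fitted earlier before calling predict. A minimal sketch, using the same hypothetical customer values and column order as above:

# standardize the raw customer values with the scaler fitted on the training features
raw_customer = [[615, 1, 22, 5, 20000, 5, 1, 1, 60000, 0, 0]]
scaled_customer = scaler.transform(raw_customer)
probability = model.predict(scaled_customer)
print("Probability of leaving:", round(float(probability[0][0]), 2))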
After the model is fully trained to your satisfaction, you may save it to disk and deploy it on your production server for real-life use. How this is done is explained in the next chapter, where I discuss the tf.keras implementation in depth.

Full Source Code

The complete source code for the project is given in Listing 2-2 for your quick reference.

Listing 2-2. Binary classification full source

%tensorflow_version 2.x
import tensorflow as tf
from tensorflow import keras
import pandas as pd

# Load data from GitHub
data_url = 'https://raw.githubusercontent.com/Apress/artificial-neural-networks-with-tensorflow-2/main/ch02/Churn_Modelling.csv'
data = pd.read_csv(data_url)

# Shuffle data to take care of patterns in data collection
from sklearn.utils import shuffle
data = shuffle(data) #shuffling the data

# Examine loaded data
data

# Check for null values
data.isnull().sum()

# Drop irrelevant columns to set up features vector
X = data.drop(labels=['CustomerId', 'Surname', 'RowNumber', 'Exited'], axis = 1)

# Set up labels vector
y = data['Exited']

# Check data types for finding categorical columns
X.dtypes

# Examine few records for finding values in categorical columns
X.head()

# Encode categorical columns
from sklearn.preprocessing import LabelEncoder
label = LabelEncoder()
X['Geography'] = label.fit_transform(X['Geography'])
X['Gender'] = label.fit_transform(X['Gender'])

# Drop the first dummy column of Geography to reduce the number of features
X = pd.get_dummies(X, drop_first=True, columns=['Geography'])
X.head()

# Scale all data points (standardize to zero mean, unit variance)
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X = scaler.fit_transform(X)

# Split dataset into training and validation
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3)

# Determine number of features
X_train.shape[1]

# Create a stacked layers sequential network
model = keras.models.Sequential() # Create linear stack of layers
model.add(keras.layers.Dense(128, activation = 'relu', input_dim = X_train.shape[1])) # Dense fully connected layer
model.add(keras.layers.Dense(64, activation = 'relu'))
model.add(keras.layers.Dense(32, activation = 'relu'))
model.add(keras.layers.Dense(1, activation = 'sigmoid')) # activation sigmoid for a single output

# Print model summary
model.summary()

# Compile model with desired loss function, optimizer and evaluation metrics
model.compile(loss = 'binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Clear any earlier logs so that graphs won't overlap with previously saved logs in TensorBoard
!rm -rf ./log/

# TensorBoard visualization
import datetime, os
logdir = os.path.join("log", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq = 1)

# Perform training
r = model.fit(X_train, y_train, batch_size = 32, epochs = 50, validation_data = (X_test, y_test), callbacks = [tensorboard_callback])

# Load TensorBoard in Colab
%load_ext tensorboard
%tensorboard --logdir log #command to launch tensorboard on colab

# Evaluate model performance on test data
test_scores = model.evaluate(X_test, y_test)
print('Test Loss: ', test_scores[0])
print('Test accuracy: ', test_scores[1] * 100)

# Plot metrics in matplotlib
%matplotlib inline
import matplotlib.pyplot as plt #for plotting curves
plt.plot(r.history['val_accuracy'], label='val_acc')
plt.plot(r.history['val_loss'], label='val_loss')
plt.legend()
plt.show()

# Predict on test data
y_pred = model.predict_classes(X_test)
y_pred

# Create confusion matrix
from sklearn.metrics import confusion_matrix
cf = confusion_matrix(y_test, y_pred)
cf

# Plot confusion matrix
from mlxtend.plotting import plot_confusion_matrix
plot_confusion_matrix(conf_mat = cf, cmap = plt.cm.Blues)

# Compute accuracy score
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)

# Predict on unseen customer data
customer = model.predict([[615, 1, 22, 5, 20000, 5, 1, 1, 60000, 0, 0]])
customer
if customer[0] > 0.5:
    print ("Customer is likely to leave")
else:
    print ("Customer will stay")

Summary

In this chapter, you set up your environment for deep learning with TensorFlow 2.x. You used Colab for developing your Python notebooks. A trivial application helped you learn the development environment. This was followed by a more detailed, realistic example. In this realistic example, you loaded an external database and learned how to preprocess the data to make it suitable for machine learning, how to define a deep neural network, how to compile it, how to train it, how to evaluate the model's performance using TensorBoard and plots from matplotlib, and finally how to use the trained model to make predictions on unseen data. Though the model's performance was evaluated, I did not discuss how to improve it. In the next chapter, you will learn a few techniques for improving a model's performance. The next chapter covers the TensorFlow Keras integration in more depth and discusses the image classification problem.