Python Machine Learning Projects
DigitalOcean, New York City, New York, USA
Foreword
As machine learning is increasingly leveraged to find patterns,
conduct analysis, and make decisions without final input from
humans, it is of equal importance to not only provide resources to
advance algorithms and methodologies, but to also invest in bringing
more stakeholders into the fold. This book of Python projects in
machine learning tries to do just that: to equip the developers of
today and tomorrow with tools they can use to better understand,
evaluate, and shape machine learning to help ensure that it is serving
us all.
This book will set you up with a Python programming environment
if you don’t have one already, then provide you with a conceptual
understanding of machine learning in the chapter “An Introduction
to Machine Learning.” What follows next are three Python machine
learning projects. They will help you create a machine learning
classifier, build a neural network to recognize handwritten digits,
and give you a background in deep reinforcement learning through
building a bot for Atari.
These chapters originally appeared as articles on DigitalOcean
Community, written by members of the international software
developer community. If you are interested in contributing to this
knowledge base, consider proposing a tutorial to the Write for
DOnations program at do.co/w4do. DigitalOcean offers payment to
authors and provides a matching donation to tech-focused
nonprofits.
Setting Up a Python
Programming Environment
Written by Lisa Tagliaferri
Python is a flexible and versatile programming language suitable for
many use cases, with strengths in scripting, automation, data
analysis, machine learning, and back-end development. First
published in 1991, Python was inspired by the British comedy group
Monty Python: the development team wanted a programming language
that was fun to use. Python 3 is the most current version of
the language and is considered to be the future of Python.
This tutorial will help get your remote server or local computer set
up with a Python 3 programming environment. If you already have
Python 3 installed, along with pip and venv, feel free to move on to
the next chapter!
Prerequisites
This tutorial is based on working in a Linux or Unix-like (*nix)
system, using a command line or terminal environment. Both macOS
and the PowerShell program on Windows should allow you to achieve
similar results.
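The command that produces the version check below is omitted from this excerpt; assuming a standard installation, you would check your version of Python 3 with:
python3 -V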
You’ll receive output in the terminal window that will let you know
the version number. While this number may vary, the output will be
similar to this:
Output
Python 3.7.2
Once you are in the directory where you would like the
environments to live, you can create an environment. You should use
the version of Python that is installed on your machine as the first
part of the command (the output you received when typing
python -V). If that version was Python 3.6.3, you can type the
following:
python3.6 -m venv my_env
If, instead, your computer has Python 3.7.3 installed, use the
following command:
python3.7 -m venv my_env
Windows machines may allow you to remove the version number
entirely:
python -m venv my_env
Once you run the appropriate command, you can verify that the
environment is set up by continuing.
Essentially, pyvenv sets up a new directory that contains a few items,
which we can view with the ls command:
ls my_env
Output
bin include lib lib64 pyvenv.cfg share
Together, these files work to make sure that your projects are
isolated from the broader context of your local machine, so that
system files and project files don’t mix. This is good practice for
version control and to ensure that each of your projects has access to
the particular packages that it needs. Python Wheels, a built-package
format for Python that can speed up your software production by
reducing the number of times you need to compile, will be in the
Ubuntu 18.04 share directory.
To use this environment, you need to activate it, which you can
achieve by typing the following command that calls the activate
script:
source my_env/bin/activate
Your command prompt will now be prefixed with the name of your
environment, in this case it is called my_env. Depending on what
version of Debian Linux you are running, your prefix may appear
somewhat differently, but the name of your environment in
parentheses should be the first thing you see on your line:
(my_env) sammy@sammy:~/environments$
Conclusion
At this point you have a Python 3 programming environment set up
on your machine and you can now begin a coding project!
If you would like to learn more about Python, you can download our
free How To Code in Python 3 eBook via do.co/python-book.
An Introduction to Machine
Learning
Written by Lisa Tagliaferri
Machine learning is a subfield of artificial intelligence (AI). The
goal of machine learning generally is to understand the structure of
data and fit that data into models that can be understood and utilized
by people.
Although machine learning is a field within computer science, it
differs from traditional computational approaches. In traditional
computing, algorithms are sets of explicitly programmed
instructions used by computers to calculate or problem solve.
Machine learning algorithms instead allow for computers to train on
data inputs and use statistical analysis in order to output values that
fall within a specific range. Because of this, machine learning
facilitates computers in building models from sample data in order
to automate decision-making processes based on data inputs.
Any technology user today has benefitted from machine learning.
Facial recognition technology allows social media platforms to help
users tag and share photos of friends. Optical character recognition
(OCR) technology converts images of text into machine-readable text.
Recommendation engines, powered by machine learning, suggest
what movies or television shows to watch next based on user
preferences. Self-driving cars that rely on machine learning to
navigate may soon be available to consumers.
Machine learning is a continuously developing field. Because of
this, there are some considerations to keep in mind as you work with
machine learning methodologies, or analyze the impact of machine
learning processes.
In this tutorial, we’ll look into the common machine learning
methods of supervised and unsupervised learning, and common
algorithmic approaches in machine learning, including the k-nearest
neighbor algorithm, decision tree learning, and deep learning. We’ll
explore which programming languages are most used in machine
learning, providing you with some of the positive and negative
attributes of each. Additionally, we’ll discuss biases that are
perpetuated by machine learning algorithms, and consider what can
be kept in mind to prevent these biases when building algorithms.
Approaches
As a field, machine learning is closely related to computational
statistics, so having a background knowledge in statistics is useful
for understanding and leveraging machine learning algorithms.
For those who may not have studied statistics, it can be helpful to
first define correlation and regression, as they are commonly used
techniques for investigating the relationship among quantitative
variables. Correlation is a measure of association between two
variables that are not designated as either dependent or independent.
Regression at a basic level is used to examine the relationship
between one dependent and one independent variable. Because
regression statistics can be used to anticipate the dependent variable
when the independent variable is known, regression enables
prediction capabilities.
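As a brief illustration (not part of the original tutorial), fitting a simple linear regression with NumPy shows how a known independent variable can be used to predict the dependent one; the data here is made up:
import numpy as np

# Hypothetical data: hours studied (independent) vs. exam score (dependent)
hours = np.array([1, 2, 3, 4, 5], dtype=float)
scores = np.array([52, 58, 65, 70, 78], dtype=float)

# Fit a line (degree-1 polynomial): score is approximately slope * hours + intercept
slope, intercept = np.polyfit(hours, scores, 1)

# Predict the dependent variable for a new value of the independent variable
predicted_score = slope * 6 + intercept
print('Predicted score after 6 hours: %.1f' % predicted_score)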
Approaches to machine learning are continuously being developed.
For our purposes, we’ll go through a few of the popular approaches
that are being used in machine learning at the time of writing.
k-nearest neighbor
The k-nearest neighbor algorithm is a pattern recognition model that
can be used for classification as well as regression. Often
abbreviated as kNN, the k in k-nearest neighbor is a positive integer,
which is typically small. In either classification or regression, the
input will consist of the k closest training examples within a space.
We will focus on k-NN classification. In this method, the output is
class membership. This will assign a new object to the class most
common among its k nearest neighbors. In the case of k = 1, the
object is assigned to the class of the single nearest neighbor.
Let’s look at an example of k-nearest neighbor. In the diagram
below, there are blue diamond objects and orange star objects. These
belong to two separate classes: the diamond class and the star class.
[Figure: k-nearest neighbor initial data set]
When a new object is added to the space — in this case a green heart
— we will want the machine learning algorithm to classify the heart
to a certain class.
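To make the classification step concrete, here is a minimal sketch (not drawn from the original chapter) that uses scikit-learn’s KNeighborsClassifier on made-up 2D points standing in for the diamond and star classes:
from sklearn.neighbors import KNeighborsClassifier

# Made-up 2D coordinates: class 0 = "diamond", class 1 = "star"
X_train = [[1, 1], [1, 2], [2, 1], [6, 6], [6, 7], [7, 6]]
y_train = [0, 0, 0, 1, 1, 1]

# k = 3: classify by majority vote among the 3 nearest neighbors
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# The "green heart" is a new object; predict its class membership
new_point = [[2, 2]]
print(knn.predict(new_point))  # [0] -> assigned to the diamond class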
Human Biases
Although data and computational analysis may make us think that
we are receiving objective information, this is not the case; being
based on data does not mean that machine learning outputs are
neutral. Human bias plays a role in how data is collected, organized,
and ultimately in the algorithms that determine how machine
learning will interact with that data.
If, for example, people are providing images for “fish” as data to
train an algorithm, and these people overwhelmingly select images
of goldfish, a computer may not learn to classify a shark as a fish.
This would create a bias against sharks in the resulting model, and
sharks would not be counted as fish.
When using historical photographs of scientists as training data, a
computer may not properly classify scientists who are also people of
color or women. In fact, recent peer-reviewed research has indicated
that AI and machine learning programs exhibit human-like biases
that include race and gender prejudices. See, for example
“Semantics derived automatically from language corpora contain
human-like biases” and “Men Also Like Shopping: Reducing
Gender Bias Amplification using Corpus-level Constraints” [PDF].
As machine learning is increasingly leveraged in business, uncaught
biases can perpetuate systemic issues that may prevent people from
qualifying for loans, from being shown ads for high-paying job
opportunities, or from receiving same-day delivery options.
Because human bias can negatively impact others, it is extremely
important to be aware of it, and to also work towards eliminating it
as much as possible. One way to work towards achieving this is by
ensuring that there are diverse people working on a project and that
diverse people are testing and reviewing it. Others have called for
regulatory third parties to monitor and audit algorithms, building
alternative systems that can detect biases, and ethics reviews as part
of data science project planning. Raising awareness about biases,
being mindful of our own unconscious biases, and structuring equity
in our machine learning projects and pipelines can work to combat
bias in this field.
Conclusion
This tutorial reviewed some of the use cases of machine learning,
common methods and popular approaches used in the field, suitable
machine learning programming languages, and also covered some
things to keep in mind in terms of unconscious biases being
replicated in algorithms.
Because machine learning is a field that is continuously being
innovated, it is important to keep in mind that algorithms, methods,
and approaches will continue to change.
Currently, Python is one of the most popular programming
languages to use with machine learning applications in professional
fields. Other languages you may wish to investigate include Java, R,
and C++.
Prerequisites
To complete this tutorial, we’ll use Jupyter Notebooks, which are a
useful and interactive way to run machine learning experiments.
With Jupyter Notebooks, you can run short blocks of code and see
the results quickly, making it easy to test and debug your code.
To get up and running quickly, you can open up a web browser and
navigate to the Try Jupyter website: jupyter.org/try. From there,
click on Try Jupyter with Python, and you will be taken to an
interactive Jupyter Notebook where you can start to write Python
code.
If you would like to learn more about Jupyter Notebooks and how to
set up your own Python programming environment to use with
Jupyter, you can read our tutorial on How To Set Up Jupyter
Notebook for Python 3.
[Figure: Jupyter Notebook with three Python cells, which prints the first instance in our dataset]
As the image shows, our class names are malignant and benign,
which are then mapped to binary values of 0 and 1, where 0
represents malignant tumors and 1 represents benign tumors.
Therefore, our first data instance is a malignant tumor whose mean
radius is 1.79900000e+01.
Now that we have our data loaded, we can work with our data to
build our machine learning classifier.
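The notebook cells that load the data, split it, and train the model are only shown as images in this excerpt. A condensed sketch of those steps, using scikit-learn’s built-in breast cancer dataset and a Gaussian Naive Bayes classifier (the variable names gnb, train, and test are assumptions chosen to match the code shown below), might look like:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Load the dataset and organize the features and labels
data = load_breast_cancer()
features = data['data']
labels = data['target']

# Split the data into training and test sets
train, test, train_labels, test_labels = train_test_split(
    features, labels, test_size=0.33, random_state=42)

# Initialize and train the Gaussian Naive Bayes classifier
gnb = GaussianNB()
gnb.fit(train, train_labels)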
After we train the model, we can then use the trained model to make
predictions on our test set, which we do using the predict()
function. The predict() function returns an array of predictions for
each data instance in the test set. We can then print our predictions
to get a sense of what the model determined.
Use the predict() function with the test set and print the results:
ML Tutorial
…
# Make predictions
preds = gnb.predict(test)
print(preds)
[Figure: Jupyter Notebook with Python cell that prints the predicted values of the Naive Bayes classifier on our test data]
As you see in the Jupyter Notebook output, the predict() function
returned an array of 0s and 1s, which represent our predicted values
for the tumor class (malignant vs. benign).
Now that we have our predictions, let’s evaluate how well our
classifier is performing.
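The evaluation cell appears only as an image below; a minimal sketch of the accuracy check, assuming the test_labels and preds variables from the steps above, would be:
from sklearn.metrics import accuracy_score

# Compare the predicted labels against the true labels of the test set
print(accuracy_score(test_labels, preds))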
[Figure: Jupyter Notebook with Python cell that prints the accuracy of our NB classifier]
Now you can continue to work with your code to see if you can
make your classifier perform even better. You could experiment with
different subsets of features or even try completely different
algorithms. Check out Scikit-learn’s website at scikit-learn.org/stable
for more machine learning ideas.
Conclusion
In this tutorial, you learned how to build a machine learning
classifier in Python. Now you can load data, organize data, train,
predict, and evaluate machine learning classifiers in Python using
Scikit-learn. The steps in this tutorial should help you facilitate the
process of working with your own data in Python.
Prerequisites
To complete this tutorial, you’ll need a local or remote Python 3
development environment that includes pip for installing Python
packages, and venv for creating virtual environments.
Next, install the libraries you’ll use in this tutorial. We’ll use specific
versions of these libraries by creating a requirements.txt file in
the project directory which specifies the requirement and the version
we need. Create the requirements.txt file:
(tensorflow-demo) $ touch requirements.txt
Open the file in your text editor and add the following lines to
specify the Image, NumPy, and TensorFlow libraries and their
versions:
requirements.txt
image==1.5.20
numpy==1.14.3
tensorflow==1.4.0
Save the file and exit the editor. Then install these libraries with the
following command:
(tensorflow-demo) $ pip install -r requirements.txt
With the dependencies installed, we can start working on our project.
Let’s create a Python program to work with this dataset. We will use
one file for all of our work in this tutorial. Create a new file called
main.py:
(tensorflow-demo) $ touch main.py
Now open this file in your text editor of choice and add this line of
code to the file to import the TensorFlow library:
main.py
import tensorflow as tf
Add the following lines of code to your file to import the MNIST
dataset and store the image data in the variable mnist:
main.py
…
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)  # y labels are one-hot encoded
Now that we have our data imported, it’s time to think about the
neural network.
The learning rate represents how much the parameters will adjust at
each step of the learning process. These adjustments are a key
component of training: after each pass through the network we tune
the weights slightly to try and reduce the loss. Larger learning rates
can converge faster, but also have the potential to overshoot the
optimal values as they are updated. The number of iterations refers
to how many times we go through the training step, and the batch
size refers to how many training examples we are using at each step.
The dropout variable represents a threshold at which we eliminate
some units at random. We will be using dropout in our final
hidden layer to give each unit a 50% chance of being eliminated at
every training step. This helps prevent overfitting.
We have now defined the architecture of our neural network, and the
hyperparameters that impact the learning process. The next step is to
build the network as a TensorFlow graph.
For the bias, we use a small constant value to ensure that the tensors
activate in the initial stages and therefore contribute to the
propagation. The weights and bias tensors are stored in dictionary
objects for ease of access. Add this code to your file to define the
biases:
main.py
…
biases = {
    'b1': tf.Variable(tf.constant(0.1, shape=[n_hidden1])),
    'b2': tf.Variable(tf.constant(0.1, shape=[n_hidden2])),
    'b3': tf.Variable(tf.constant(0.1, shape=[n_hidden3])),
    'out': tf.Variable(tf.constant(0.1, shape=[n_output]))
}
Next, set up the layers of the network by defining the operations that
will manipulate the tensors. Add these lines to your file:
main.py
…
layer_1 = tf.add(tf.matmul(X, weights['w1']), biases['b1'])
layer_2 = tf.add(tf.matmul(layer_1, weights['w2']), biases['b2'])
layer_3 = tf.add(tf.matmul(layer_2, weights['w3']), biases['b3'])
layer_drop = tf.nn.dropout(layer_3, keep_prob)
output_layer = tf.matmul(layer_drop, weights['out']) + biases['out']
It’s now time to run our program and see how accurately our neural
network can recognize these handwritten digits. Save
the main.py file and execute the following command in the terminal
to run the script:
(tensorflow-demo) $ python main.py
You’ll see an output similar to the following, although individual
loss and accuracy results may vary slightly:
Output
Iteration 0 | Loss = 3.67079 | Accuracy = 0.140625
Iteration 100 | Loss = 0.492122 | Accuracy = 0.84375
Iteration 200 | Loss = 0.421595 | Accuracy = 0.882812
Iteration 300 | Loss = 0.307726 | Accuracy = 0.921875
Iteration 400 | Loss = 0.392948 | Accuracy = 0.882812
Iteration 500 | Loss = 0.371461 | Accuracy = 0.90625
Iteration 600 | Loss = 0.378425 | Accuracy = 0.882812
Iteration 700 | Loss = 0.338605 | Accuracy = 0.914062
Iteration 800 | Loss = 0.379697 | Accuracy = 0.875
Iteration 900 | Loss = 0.444303 | Accuracy = 0.90625
Accuracy on test set: 0.9206
Open the main.py file in your editor and add the following lines of
code to the top of the file to import two libraries necessary for image
manipulation.
main.py
import numpy as np
from PIL import Image
…
Then at the end of the file, add the following line of code to load the
test image of the handwritten digit:
main.py
…
img = np.invert(Image.open("test_img.png").convert('L')).ravel()
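The line that feeds this image through the trained network is not included in this excerpt. Assuming the session (sess), the input placeholder X, the keep_prob placeholder, and output_layer defined earlier in the chapter, the prediction step would look roughly like:
# Run the flattened image through the network and take the most likely digit;
# keep_prob is set to 1.0 so no units are dropped at prediction time
prediction = sess.run(tf.argmax(output_layer, 1),
                      feed_dict={X: [img], keep_prob: 1.0})
print("Prediction for test image:", np.squeeze(prediction))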
You can try testing the network with more complex images (digits
that look like other digits, for example, or digits that have been
drawn poorly or incorrectly) to see how well it fares.
Conclusion
In this tutorial you successfully trained a neural network to classify
the MNIST dataset with around 92% accuracy and tested it on an
image of your own. Current state-of-the-art research achieves around
99% on this same problem, using more complex network
architectures involving convolutional layers. These use the 2D
structure of the image to better represent the contents, unlike our
method which flattened all the pixels into one vector of 784 units.
You can read more about this topic on the TensorFlow website, and
see the research papers detailing the most accurate results on the
MNIST website.
Now that you know how to build and train a neural network, you can
try and use this implementation on your own data, or test it on other
popular datasets such as the Google StreetView House Numbers, or
the CIFAR-10 dataset for more general image recognition.
Prerequisites
To complete this tutorial, you will need:
A server running Ubuntu 18.04, with at least 1GB of RAM. This
server should have a non-root user with sudo privileges configured,
as well as a firewall set up with UFW. You can set this up by
following this Initial Server Setup Guide for Ubuntu 18.04. A
Python 3 virtual environment which you can achieve by reading our
guide “How To Install Python 3 and Set Up a Programming
Environment on an Ubuntu 18.04 Server.”
Alternatively, if you are using a local machine, you can install
Python 3 and set up a local programming environment by reading
the appropriate tutorial for your operating system via our Python
Installation and Setup Series.
Then create a new virtual environment for the project. You can name
this virtual environment anything you’d like; here, we will name it
ataribot:
python3 -m venv ataribot
Activate your environment:
source ataribot/bin/activate
Note: Throughout this guide, the bots’ names are aligned with the
Step number in which they appear, rather than the order in which
they appear. Hence, this bot is named bot_2_random.py
rather than bot_1_random.py.
Start this script by adding the following highlighted lines. These
lines include a comment block that explains what this script will do
and two import statements that will import the packages this script
will ultimately need in order to function:
/AtariBot/bot_2_random.py
"""
Bot 2 — Make a random, baseline agent for the SpaceInvaders game.
"""

import gym
import random
Save the file and close your editor, then run the script by typing the
following in your terminal:
python bot_2_random.py
This will output the following reward, exactly:
Output
Making new env: SpaceInvaders-v0
Reward: 555.0
This is your very first bot, although it’s rather unintelligent since it
doesn’t account for the surrounding environment when it makes
decisions. For a more reliable estimate of your bot’s performance,
you could have the agent run for multiple episodes at a time,
reporting rewards averaged across multiple episodes. To configure
this, first reopen the file:
nano bot_2_random.py
After random.seed(0), add the following highlighted line which
tells the agent to play the game for 10 episodes:
/AtariBot/bot_2_random.py
. . .
random.seed(0)
num_episodes = 10
. . .
Right before break, currently the last line of the main game loop,
add the current episode’s reward to the list of all rewards:
/AtariBot/bot_2_random.py
. . .
            if done:
                print('Reward: %s' % episode_reward)
                rewards.append(episode_reward)
                break
. . .
Your file will now align with the following. Please note that the
following code block includes a few comments to clarify key parts
of the script:
/AtariBot/bot_2_random.py
"""
Bot 2 — Make a random, baseline agent for the SpaceInvaders game.
"""

import gym
import random

random.seed(0)  # make results reproducible
num_episodes = 10


def main():
    env = gym.make('SpaceInvaders-v0')  # create the game
    env.seed(0)  # make results reproducible
    rewards = []
    for _ in range(num_episodes):
        env.reset()
        episode_reward = 0
        while True:
            action = env.action_space.sample()
            _, reward, done, _ = env.step(action)  # random action
            episode_reward += reward
            if done:
                print('Reward: %d' % episode_reward)
                rewards.append(episode_reward)
                break
    print('Average reward: %.2f' % (sum(rewards) / len(rewards)))


if __name__ == '__main__':
    main()
Save the file, exit the editor, and run the script:
python bot_2_random.py
This will print the following average reward, exactly:
Output
Making new env: SpaceInvaders-v0
. . .
Average reward: 163.50
state     action   Q-value
state0    shoot    10
state0    right    3
state0    left     3
However, most games have too many states to list in a table. In such
cases, the Q-learning agent learns a Q-function instead of a Q-table.
We use this Q-function similarly to how we used the Q-table
previously. Rewriting the table entries as functions gives us the
following:
Q(state0, shoot) = 10
Q(state0, right) = 3
Q(state0, left) = 3
The player starts at the top left, denoted by S, and works its way to
the goal at the bottom right, denoted by G. The available actions are
right, left, up, and down, and reaching the goal results in a score of
1. There are a number of holes, denoted by H, and falling into one
immediately results in a score of 0.
In this section, you will implement a simple Q-learning agent. Using
what you’ve learned previously, you will create an agent that trades
off between exploration and exploitation. In this context, exploration
means the agent acts randomly, and exploitation means it uses its Q-
values to choose what it believes to be the optimal action. You will
also create a table to hold the Q-values, updating it incrementally as
the agent acts and learns.
Make a copy of your script from Step 2:
cp bot_2_random.py bot_3_q_table.py
Then open up this new file for editing:
nano bot_3_q_table.py
Begin by updating the comment at the top of the file that describes
the script’s purpose. Because this is only a comment, this change
isn’t necessary for the script to function properly, but it can be
helpful for keeping track of what the script does:
/AtariBot/bot_3_q_table.py
"""
Bot 3 — Build simple q-learning agent for FrozenLake
"""
. . .
After these modifications your game loop will match the following:
. . .
for _ in range(num_episodes):
    state = env.reset()
    episode_reward = 0
    while True:
        action = env.action_space.sample()
        state2, reward, done, _ = env.step(action)
        episode_reward += reward
        state = state2
        if done:
            rewards.append(episode_reward)
            break
. . .
Next, add the ability for the agent to trade off between exploration
and exploitation. Right before your main game loop (which starts
with for…), create the Q-value table:
/AtariBot/bot_3_q_table.py
. . .
Q = np.zeros((env.observation_space.n, env.action_space.n))
for _ in range(num_episodes):
    . . .
Then, rewrite the for loop to expose the episode number:
/AtariBot/bot_3_q_table.py
. . .
Q = np.zeros((env.observation_space.n, env.action_space.n))
for episode in range(1, num_episodes + 1):
    . . .
Your main game loop will then match the following:
/AtariBot/bot_3_q_table.py
. . .
Q = np.zeros((env.observation_space.n, env.action_space.n))
for episode in range(1, num_episodes + 1):
    state = env.reset()
    episode_reward = 0
    while True:
        noise = np.random.random((1, env.action_space.n)) / (episode**2.)
        action = np.argmax(Q[state, :] + noise)
        state2, reward, done, _ = env.step(action)
        episode_reward += reward
        state = state2
        if done:
            rewards.append(episode_reward)
            break
. . .
Next, you will update your Q-value table using the Bellman update
equation, an equation widely used in machine learning to find the
optimal policy within a given environment. The Bellman equation
incorporates two ideas that are highly relevant to this project. First,
taking a particular action from a particular state
many times will result in a good estimate for the Q-value associated
with that state and action. To this end, you will increase the number
of episodes this bot must play through in order to return a stronger
Q-value estimate. Second, rewards must propagate through time, so
that the original action is assigned a non-zero reward. This idea is
clearest in games with delayed rewards; for example, in Space
Invaders, the player is rewarded when the alien is blown up and not
when the player shoots. However, the player shooting is the true
impetus for a reward. Likewise, the Q-function must assign
(state0,shoot) a positive reward.
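Concretely, the Bellman update you will add in this step blends the old estimate with a new target built from the observed reward and the best Q-value available in the next state:
# Bellman update used below: learning_rate weights the new target
Qtarget = reward + discount_factor * np.max(Q[state2, :])
Q[state, action] = (1 - learning_rate) * Q[state, action] + learning_rate * Qtarget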
First, update num_episodes to equal 4000:
/AtariBot/bot_3_q_table.py
. . .
np.random.seed(0)
num_episodes = 4000
. . .
Then, add the necessary hyperparameters to the top of the file in the
form of two more variables:
/AtariBot/bot_3_q_table.py
. . .
num_episodes = 4000
discount_factor = 0.8
learning_rate = 0.9
. . .
Compute the new target Q-value, right after the line containing
env.step(…):
/AtariBot/bot_3_q_table.py
. . .
state2, reward, done, _ = env.step(action)
Qtarget = reward + discount_factor * np.max(Q[state2, :])
episode_reward += reward
. . .
Check that your main game loop now matches the following:
Q = np.zeros((env.observation_space.n, env.action_space.n))
for episode in range(1, num_episodes + 1):
    state = env.reset()
    episode_reward = 0
    while True:
        noise = np.random.random((1, env.action_space.n)) / (episode**2.)
        action = np.argmax(Q[state, :] + noise)
        state2, reward, done, _ = env.step(action)
        Qtarget = reward + discount_factor * np.max(Q[state2, :])
        Q[state, action] = (1 - learning_rate) * Q[state, action] + learning_rate * Qtarget
        episode_reward += reward
        state = state2
        if done:
            rewards.append(episode_reward)
            break
. . .
Our logic for training the agent is now complete. All that’s left is to
add reporting mechanisms.
Even though Python does not enforce strict type checking, add types
to your function declarations for cleanliness. At the top of the file,
before the first line reading import gym, import the List type:
from typing import List
import gym
. . .
Save the file, exit your editor, and run the script:
python bot_3_q_table.py
Your output will match the following:
Output
100-ep Average: 0.11 . Best 100-ep Average: 0.12 . Average: 0.03 (Episode 500)
100-ep Average: 0.25 . Best 100-ep Average: 0.24 . Average: 0.09 (Episode 1000)
100-ep Average: 0.39 . Best 100-ep Average: 0.48 . Average: 0.19 (Episode 1500)
100-ep Average: 0.43 . Best 100-ep Average: 0.55 . Average: 0.25 (Episode 2000)
100-ep Average: 0.44 . Best 100-ep Average: 0.55 . Average: 0.29 (Episode 2500)
100-ep Average: 0.64 . Best 100-ep Average: 0.68 . Average: 0.32 (Episode 3000)
100-ep Average: 0.63 . Best 100-ep Average: 0.71 . Average: 0.36 (Episode 3500)
100-ep Average: 0.56 . Best 100-ep Average: 0.78 . Average: 0.40 (Episode 4000)
100-ep Average: 0.56 . Best 100-ep Average: 0.78 . Average: 0.40 (Episode -1)
You now have your first non-trivial bot for games, but let’s put this
average reward of 0.78 into perspective. According to the Gym
FrozenLake page, “solving” the game means attaining a 100-episode
average of 0.78. Informally, “solving” means “plays the game very
well”. While not in record time, the Q-table agent is able to solve
FrozenLake in 4000 episodes.
However, other games may be more complex. Here, you used a table to
store all of the 144 possible states, but consider tic tac toe in which
there are 19,683 possible states. Likewise, consider Space Invaders
where there are too many possible states to count. A Q-table is not
sustainable as games grow increasingly complex. For this reason,
you need some way to approximate the Q-table. As you continue
experimenting in the next step, you will design a function that can
accept states and actions as inputs and output a Q-value.
To reiterate, the goal is to reimplement all of the logic from the bots
we’ve already built using Tensorflow’s abstractions. This will make
your operations more efficient, as Tensorflow can then perform all
computation on the GPU.
Begin by duplicating your Q-table script from Step 3:
cp bot_3_q_table.py bot_4_q_network.py
Then open the new file withnanoor your preferred text editor:
nano bot_4_q_network.py
First, update the comment at the top of the file:
/AtariBot/bot_4_q_network.py
"""
Bot 4 — Use Q-learning network to train bot
"""
Again directly beneath the last line you added, insert the following
highlighted code. The first two lines are equivalent to the line added
in Step 3 that computes Qtarget, where Qtarget = reward +
discount_factor * np.max(Q[state2, :]). The next
two lines set up your loss, while the last line computes the action
that maximizes your Q-value:
/AtariBot/bot_4_q_network.py
. . .
q_current = tf.matmul(obs_t_ph, W)
q_target = tf.matmul(obs_tp1_ph, W)

q_target_max = tf.reduce_max(q_target_ph, axis=1)
q_target_sa = rew_ph + discount_factor * q_target_max
q_current_sa = q_current[0, act_ph]
error = tf.reduce_sum(tf.square(q_target_sa - q_current_sa))
pred_act_ph = tf.argmax(q_current, 1)

Q = np.zeros((env.observation_space.n, env.action_space.n))
for episode in range(1, num_episodes + 1):
    . . .
After setting up your algorithm and the loss function, define your
optimizer:
/AtariBot/bot_4_q_network.py
. . .
error = tf.reduce_sum(tf.square(q_target_sa - q_current_sa))
pred_act_ph = tf.argmax(q_current, 1)

# 3. Setup optimization
trainer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
update_model = trainer.minimize(error)

Q = np.zeros((env.observation_space.n, env.action_space.n))
for episode in range(1, num_episodes + 1):
    . . .
Next, set up the body of the game loop. To do this, pass data to the
Tensorflow placeholders and Tensorflow’s abstractions will handle
the computation on the GPU, returning the result of the algorithm.
Start by deleting the old Q-table and logic. Specifically, delete the
lines that define Q (right before the for loop), noise (in
the while loop), action, Qtarget, and Q[state, action].
Rename state to obs_t and state2 to obs_tp1 to align with the
Tensorflow placeholders you set previously. When finished,
your for loop will match the following:
/AtariBot/bot_4_q_network.py
. . .
# 3. Setup optimization
trainer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
update_model = trainer.minimize(error)

for episode in range(1, num_episodes + 1):
    obs_t = env.reset()
    episode_reward = 0
    while True:

        obs_tp1, reward, done, _ = env.step(action)

        episode_reward += reward
        obs_t = obs_tp1
        if done:
            …
Save the file, exit your editor, and run the script:
python bot_4_q_network.py
Your output will end with the following, exactly:
Output
100-ep Average: 0.11 . Best 100-ep Average: 0.11 . Average: 0.05 (Episode 500)
100-ep Average: 0.41 . Best 100-ep Average: 0.54 . Average: 0.19 (Episode 1000)
100-ep Average: 0.56 . Best 100-ep Average: 0.73 . Average: 0.31 (Episode 1500)
100-ep Average: 0.57 . Best 100-ep Average: 0.73 . Average: 0.36 (Episode 2000)
100-ep Average: 0.65 . Best 100-ep Average: 0.73 . Average: 0.41 (Episode 2500)
100-ep Average: 0.65 . Best 100-ep Average: 0.73 . Average: 0.43 (Episode 3000)
100-ep Average: 0.69 . Best 100-ep Average: 0.73 . Average: 0.46 (Episode 3500)
100-ep Average: 0.77 . Best 100-ep Average: 0.79 . Average: 0.48 (Episode 4000)
100-ep Average: 0.77 . Best 100-ep Average: 0.79 . Average: 0.48 (Episode -1)
You’ve now trained your very first deep Q-learning agent. For a
game as simple as FrozenLake, your deep Q-learning agent required
4000 episodes to train. Imagine if the game were far more complex.
How many training samples would that require to train? As it turns
out, the agent could require millions of samples. The number of
samples required is referred to as sample complexity, a concept
explored further in the next section.
Before the block of imports near the top of your file, add two more
imports for type checking:
/AtariBot/bot_5_ls.py
. . .
from typing import Tuple
from typing import Callable
from typing import List
import gym
. . .
Following this, you will need to modify the training logic. In the
previous script you wrote, the Q-table was updated every iteration.
This script, however, will collect samples and labels every time step
and train a new model every 10 steps. Additionally, instead of
holding a Q-table or a neural network, it will use a least squares
model to predict Q-values.
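The initialize and train helpers referenced below are not reproduced in this excerpt. As a rough sketch of the idea (the function name and the regularization strength are assumptions, not the tutorial’s exact code), fitting a regularized least squares model that maps one-hot encoded states to vectors of Q-values could look like:
import numpy as np

def train_least_squares(states, labels, ridge=1e-4):
    """Fit W so that one-hot states multiplied by W approximate the Q-value labels.

    states: (n_samples, n_obs) one-hot encoded states
    labels: (n_samples, n_actions) target Q-values, one per action
    """
    X = np.array(states)
    y = np.array(labels)
    I = np.eye(X.shape[1])
    # Closed-form ridge regression solution
    W = np.linalg.inv(X.T.dot(X) + ridge * I).dot(X.T.dot(y))
    return W

# Q-values for a one-hot state are then predicted with: state.dot(W)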
Go to the main function and replace the definition of the Q-table (Q
= np.zeros(…)) with the following:
/AtariBot/bot_5_ls.py
. . .
def main():
    …
    rewards = []

    n_obs, n_actions = env.observation_space.n, env.action_space.n
    W, Q = initialize((n_obs, n_actions))
    states, labels = [], []
    for episode in range(1, num_episodes + 1):
        . . .
Scroll down before the for loop. Directly below this, add the
following lines, which reset the states and labels lists if there is
too much information stored:
/AtariBot/bot_5_ls.py
. . .
def main():
    …
    for episode in range(1, num_episodes + 1):
        if len(states) >= 10000:
            states, labels = [], []
        . . .
Modify the line directly after this one, which defines state =
env.reset(), so that it becomes the following. This will one-hot
encode the state immediately, as all of its usages will require a one-
hot vector:
/AtariBot/bot_5_ls.py
. . .
    for episode in range(1, num_episodes + 1):
        if len(states) >= 10000:
            states, labels = [], []
        state = one_hot(env.reset(), n_obs)
        . . .
Before the first line in your while main game loop, amend the list of
states:
/AtariBot/bot_5_ls.py
. . .
    for episode in range(1, num_episodes + 1):
        …
        episode_reward = 0
        while True:
            states.append(state)
            noise = np.random.random((1, env.action_space.n)) / (episode**2.)
            . . .
Then, save the file, exit the editor, and run the script:
python bot_5_ls.py
This will output the following:
Output
100-ep Average: 0.17 . Best 100-ep Average: 0.17 . Average: 0.09 (Episode 500)
100-ep Average: 0.11 . Best 100-ep Average: 0.24 . Average: 0.10 (Episode 1000)
100-ep Average: 0.08 . Best 100-ep Average: 0.24 . Average: 0.10 (Episode 1500)
100-ep Average: 0.24 . Best 100-ep Average: 0.25 . Average: 0.11 (Episode 2000)
100-ep Average: 0.32 . Best 100-ep Average: 0.31 . Average: 0.14 (Episode 2500)
100-ep Average: 0.35 . Best 100-ep Average: 0.38 . Average: 0.16 (Episode 3000)
100-ep Average: 0.59 . Best 100-ep Average: 0.62 . Average: 0.22 (Episode 3500)
100-ep Average: 0.66 . Best 100-ep Average: 0.66 . Average: 0.26 (Episode 4000)
100-ep Average: 0.60 . Best 100-ep Average: 0.72 . Average: 0.30 (Episode 4500)
100-ep Average: 0.75 . Best 100-ep Average: 0.82 . Average: 0.34 (Episode 5000)
100-ep Average: 0.75 . Best 100-ep Average: 0.82 . Average: 0.34 (Episode -1)
You will now run this pretrained Space Invaders agent to see how it
performs. Unlike the past few bots we’ve used, you will write this
script from scratch.
Create a new script file:
nano bot_6_dqn.py
Begin this script by adding a header comment, importing the
necessary utilities, and beginning the main game loop:
/AtariBot/bot_6_dqn.py
"""
Bot 6 - Fully featured deep q-learning network.
"""

import cv2
import gym
import numpy as np
import random
import tensorflow as tf
from bot_6_a3c import a3c_model


def main():
    pass  # filled in over the following steps


if __name__ == '__main__':
    main()
Directly after your imports, set random seeds to make your results
reproducible. Also, define a hyperparameter num_episodes which
will tell the script how many episodes to run the agent for:
/AtariBot/bot_6_dqn.py
. . .
import tensorflow as tf
from bot_6_a3c import a3c_model
random.seed(0) # make results reproducible
tf.set_random_seed(0)
num_episodes = 10
def main():
    . . .
Next, add the following lines which check whether the episode is
done and, if it is, print the episode’s total reward, amend the list
of all results, and break the while loop early:
/AtariBot/bot_6_dqn.py
. . .
        while True:
            …
            episode_reward += reward
            if done:
                print('Reward: %d' % episode_reward)
                rewards.append(episode_reward)
                break
. . .
Compare this to the result from the first script, where you ran a
random agent for Space Invaders. The average reward in that case
was only about 150, meaning this result is over twenty times better.
However, you only ran your code for three episodes, as it’s fairly
slow, and the average of three episodes is not a reliable metric.
Running this over 10 episodes, the average is 2756; over 100
episodes, the average is around 2500. Only with these averages can
you comfortably conclude that your agent is indeed performing an
order of magnitude better, and that you now have an agent that plays
Space Invaders reasonably well.
However, recall the issue that was raised in the previous section
regarding sample complexity. As it turns out, this Space Invaders
agent takes millions of samples to train. In fact, this agent required
24 hours on four Titan X GPUs to train up to this current level; in
other words, it took a significant amount of compute to train it
adequately. Can you train a similarly high-performing agent with far
fewer samples? The previous steps should arm you with enough
knowledge to begin exploring this question. Using far simpler
models and leveraging the bias-variance tradeoff, it may be possible.
Conclusion
In this tutorial, you built several bots for games and explored a
fundamental concept in machine learning called bias-variance. A
natural next question is: Can you build bots for more complex
games, such as StarCraft 2? As it turns out, this is a pending research
question, supplemented with open-source tools from collaborators
across Google, DeepMind, and Blizzard. If these are problems that
interest you, see the open calls for research at OpenAI for current
problems.
The main takeaway from this tutorial is the bias-variance tradeoff. It
is up to the machine learning practitioner to consider the effects of
model complexity. Whereas it is possible to leverage highly complex
models and layer on excessive amounts of compute, samples, and
time, reduced model complexity could significantly reduce the
resources required.