Demonstrating the OpenAI Gym and Deep Reinforcement Learning When Applied to Atari 2600 Games
DAVID MEYER
CSCI 4830: MACHINE LEARNING
Introduction

In 2015, Elon Musk and Sam Altman founded a new artificial intelligence and machine learning initiative known as OpenAI. The aim of the initiative is to promote free collaboration between institutions by making its patents and research open to the public.[1] Over $1 billion has been pledged to the company.[2] The key founders, Elon Musk and Sam Altman, have stated that they formed OpenAI due to concerns about existential risk from artificial general intelligence.[3]
In April of 2016, OpenAI released its first product free to the public. This product, known as OpenAI Gym, functions as a platform for reinforcement learning research.[4] It is easy to set up and provides a large selection of test environments in which different methods for reinforcement learning can be applied.
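As a brief illustration of how an agent interacts with a Gym test environment, the following minimal sketch runs one episode of Breakout with random actions. It assumes the classic Gym API of this period, in which step() returns a four-tuple:

import gym

# Create one of Gym's Atari test environments.
env = gym.make("Breakout-v0")
observation = env.reset()
total_reward = 0

# One episode with random actions, showing the agent/environment loop
# that any reinforcement learning method plugs into.
while True:
    action = env.action_space.sample()                  # random action
    observation, reward, done, info = env.step(action)  # advance one frame
    total_reward += reward
    if done:                                            # episode over
        break

print("Episode reward:", total_reward)
env.close()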
In this paper, OpenAI Gym and its application to Atari games will be demonstrated.[5] Originally, a demo of reinforcement learning applied to the popular PC game Doom was planned. However, toolkits for plugging the PyDoom library into deep reinforcement learning systems, such as TensorFlow, were sparsely available.

Gym Atari Learning Environment

The experiments are built on the following components:

OpenAI Gym
Skimage python image processing library
Keras python neural networking library

Keras is primarily used to define the deep Q network, while TensorFlow is used for optimization and execution of the environments.[5]
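To make that division of labor concrete, here is a minimal sketch of how such a deep Q network could be defined in Keras. The layer sizes follow the commonly used DQN architecture for stacked 84x84 Atari frames; they are illustrative assumptions, not necessarily the exact network in the async-rl code:

from keras.models import Sequential
from keras.layers import Conv2D, Flatten, Dense

def build_q_network(num_actions, history_length=4, frame_size=84):
    """Map a stack of preprocessed game frames to one Q value per action."""
    model = Sequential([
        # Input: the last `history_length` grayscale 84x84 frames.
        Conv2D(16, (8, 8), strides=(4, 4), activation="relu",
               input_shape=(frame_size, frame_size, history_length)),
        Conv2D(32, (4, 4), strides=(2, 2), activation="relu"),
        Flatten(),
        Dense(256, activation="relu"),
        # Linear output layer: estimated Q value for each possible action.
        Dense(num_actions, activation="linear"),
    ])
    return model

In this arrangement, TensorFlow supplies the loss, the optimizer, and the session in which the updates run.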
Related Work

The Montreal Institute for Learning Algorithms (MILA) proposed a paper defining further asynchronous methods for deep reinforcement learning.[6] Using GPUs vs. CPUs makes a substantial difference in recorded results; MILA explains this with regard to Atari games.

Experiments

Breakout

The first experiment involves asynchronously training a network to play the game Breakout. Eight active-learner threads are used to aid with training.

Figure 1 (Window of Breakout)

$ python async_dqn.py --experiment breakout --game "Breakout-v0" --num_concurrent 8
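The --num_concurrent flag controls how many actor-learner threads run in parallel. The following is a heavily simplified, self-contained sketch of that asynchronous pattern (one environment per thread, plus a shared global step counter), assuming the classic Gym API; the real async_dqn.py also shares and updates a single Q network, which is omitted here:

import threading
import gym

NUM_CONCURRENT = 8       # mirrors --num_concurrent 8
T_MAX = 50000            # global step budget, as in the experiments
T = 0                    # global step counter shared by all threads
counter_lock = threading.Lock()

def actor_learner(thread_id):
    """Each thread steps its own copy of the game while all threads share
    one global step counter. A real learner would also act epsilon-greedily
    from a shared Q network and apply asynchronous gradient updates."""
    global T
    env = gym.make("Breakout-v0")
    env.reset()
    done = False
    while True:
        with counter_lock:
            if T >= T_MAX:
                break
            T += 1
        if done:
            env.reset()
        # A random action stands in for the epsilon-greedy policy.
        _, _, done, _ = env.step(env.action_space.sample())
    env.close()

threads = [threading.Thread(target=actor_learner, args=(i,))
           for i in range(NUM_CONCURRENT)]
for t in threads:
    t.start()
for t in threads:
    t.join()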
TensorBoard is used to display the graphs from the training, evaluation, and checkpoint files.

Pacman

The second experiment involves four active-learner threads to aid with training on the classic game Pacman.
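The report does not reproduce the launch command for this run; by analogy with the Breakout invocation it would look like the following, where "MsPacman-v0" is an assumption for the environment id (Gym's Atari collection ships Ms. Pac-Man rather than the original Pac-Man):

$ python async_dqn.py --experiment pacman --game "MsPacman-v0" --num_concurrent 4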
Figure 3 (Window of Asteroids)

Figure 4 (Sidebar of TensorBoard)

Figure 5 (Episode Reward for Breakout)
Roughly 8,000 steps before this high point there is a local minimum, and then the system appears to have an epiphany in which the Episode Reward begins to increase exponentially. If the simulation were to run longer than 50,000 steps, a substantial score might be achieved.
Figure 6 (Epsilon of Breakout)

Figure 7 (Max Q Value of Breakout)

Figure 8 (Episode Reward of Pacman)

Figure 9 (Epsilon of Pacman)
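Figures 6 and 9 trace epsilon, the exploration rate of the epsilon-greedy policy: it starts near 1 (mostly random actions) and is annealed toward a small final value as training progresses. A minimal sketch of such a linear annealing schedule follows; the start, end, and horizon values are illustrative assumptions, not values read from the experiments:

def annealed_epsilon(step, start=1.0, end=0.1, anneal_steps=50000):
    """Linearly decay the exploration rate from `start` to `end` over
    `anneal_steps` steps, then hold it at `end`."""
    fraction = min(step / anneal_steps, 1.0)
    return start + fraction * (end - start)

# Epsilon at a few points during training:
for step in (0, 10000, 25000, 50000):
    print(step, round(annealed_epsilon(step), 3))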
Asteroids

The game Asteroids behaves similarly to Breakout: if the simulation runs for too long, there is no noticeable benefit from deep reinforcement learning.
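Again by analogy with the Breakout invocation, an Asteroids run would be launched roughly as follows; "Asteroids-v0" follows Gym's Atari naming, and the thread count shown is an assumption since the report does not record it:

$ python async_dqn.py --experiment asteroids --game "Asteroids-v0" --num_concurrent 8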
Figure 11 (Episode Reward of Asteroids)

A maximum Episode Reward of 2200 is achieved roughly 10,000 steps into the simulation. The Episode Reward hovers around 1000 thereafter.

Figure 13 (Max Q Value of Asteroids)

Future Work

Since the OpenAI Gym is expandable,[7] a plethora of other Atari games could be tested with deep reinforcement learning. Longer tests on more powerful systems could also be performed. Given more time, the recorded results for the weakest performers (Breakout and Asteroids) might turn out completely different. It is plausible that longer simulations (100,000 steps or more) could demonstrate better results.

Conclusion

After using OpenAI Gym paired with Atari games, one can conclude that the system is generally useful for machine learning researchers. This is because the toolkit provides a relatively simple and universal protocol for simulating deep reinforcement learning environments. Although the Doom environment is not quite ready, the Atari environment proves itself quite adaptable when TensorFlow and Keras are applied.
References

[1] Gershgorn, Dave (December 11, 2015). "New 'OpenAI' Artificial Intelligence Group Formed By Elon Musk, Peter Thiel, And More". Popular Science.
[2] "Tech giants pledge $1bn for 'altruistic AI' venture, OpenAI". BBC News.
[3] Cade Metz (27 April 2016). "Inside OpenAI, Elon Musk's Wild Plan to Set Artificial Intelligence Free". Wired magazine.
[4] Greg Brockman; John Schulman (27 April 2016). "OpenAI Gym Beta". OpenAI Blog. OpenAI.
[5] Corey Lynch (25 June 2016). Asynchronous RL in Tensorflow + Keras + OpenAI's Gym. GitHub. github.com/coreylynch/async-rl
[6] Montreal Institute for Learning Algorithms (4 February 2016). Asynchronous Methods for Deep Reinforcement Learning. University of Montreal.
[7] OpenAI (5 December 2016). Universe. OpenAI Blog. OpenAI.