AC Project
Intelligent Systems
Laboratory activity 2018-2019
Contents
1 Introduction
1.1 Overview
1.2 Main functionalities
1.3 Installing the tool and running it
3 Project Details
3.1 What will the system do
3.2 Narrative description
3.3 Facts
3.4 Specifications
3.5 Top level design of the scenario
3.6 Knowledge acquisition
3.7 Related work
Chapter 1
Introduction
1.1 Overview
1. PyTorch is an open-source machine learning library for Python, based on Torch, used for applications such as natural language processing. It is primarily developed by Facebook's artificial-intelligence research group, and Uber's "Pyro" probabilistic programming language is built on it.
2. Using Anaconda, run the commands conda install pytorch -c soumith, pip install gym[all], and pip install numpy.
3. Our example outputs, at the end of each episode, the average reward for that episode; additionally, every episode is animated for a better understanding of the algorithm.
Chapter 2
(a) First the actor picks an action based on its policy and gets feedback from the critic; the chosen action influences the environment and thus the reward for that action. An episode consists of 10000 actions, and at the end of each action the policies of the actor and the critic are updated, unlike standard reinforcement learning methods, where the update occurs at the end of the episode.
(b) The longer the agent keeps the pendulum up, the more it is rewarded, so, as we can see from the episodes above, at the beginning the actor does not perform well, as expected, but after 500 episodes it manages to keep the pendulum up for quite some time.
(c) Pseudo code:
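The pseudo code itself did not survive extraction, so the per-action update described in (a) is sketched below. This is a minimal tabular actor-critic on a toy two-state problem, not the project's actual PyTorch implementation; the environment, learning rates, and state/action layout are all assumptions for illustration only.

```python
import random
import math

# Toy stand-in for the real environment (the project uses a gym pendulum).
# Two states, two actions; action 1 taken in state 0 yields reward 1.
def step(state, action):
    reward = 1.0 if (state == 0 and action == 1) else 0.0
    next_state = (state + action) % 2
    return next_state, reward

N_STATES, N_ACTIONS = 2, 2
ALPHA_ACTOR, ALPHA_CRITIC, GAMMA = 0.1, 0.1, 0.9   # assumed hyperparameters

prefs = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # actor: action preferences
values = [0.0] * N_STATES                             # critic: state values

def policy(state):
    # Softmax over the actor's preferences for this state.
    exps = [math.exp(p) for p in prefs[state]]
    total = sum(exps)
    return [e / total for e in exps]

def pick_action(state):
    r, acc = random.random(), 0.0
    probs = policy(state)
    for a, p in enumerate(probs):
        acc += p
        if r <= acc:
            return a
    return len(probs) - 1

random.seed(0)
state = 0
for _ in range(5000):
    action = pick_action(state)
    next_state, reward = step(state, action)
    # TD error from the critic: how much better/worse than expected.
    td_error = reward + GAMMA * values[next_state] - values[state]
    # Update after EVERY action, as described in (a), not at episode end.
    values[state] += ALPHA_CRITIC * td_error
    prefs[state][action] += ALPHA_ACTOR * td_error
    state = next_state
```

After the loop, the actor's policy should clearly prefer the rewarded action 1 in state 0, mirroring on a small scale what the pendulum agent learns over 500 episodes.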
2. As for structure, the program is split into five files (modules): main, buffer, train, models, and utils.
2.3 Example analysis
1. As stated above, the program's output is the reward for each episode and an animation of the decisions taken by the actor. For the analysis we will concentrate only on the reward and not on the animation.
2. Episodes up to 10:
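To make the reward-only analysis concrete, the per-episode averages can be computed as sketched below; the reward values here are made up for illustration and are not the program's actual output.

```python
def average_reward(rewards):
    # Mean reward over an episode's individual action rewards.
    return sum(rewards) / len(rewards)

# Hypothetical per-action rewards for three short episodes.
episodes = [
    [-1.0, -0.8, -0.9],   # early episode: pendulum mostly down
    [-0.5, -0.2, -0.4],
    [-0.1, 0.0, -0.1],    # later episode: pendulum mostly up
]

# The per-episode averages should increase as the agent learns.
averages = [average_reward(ep) for ep in episodes]
```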
Chapter 3
Project Details
3.3 Facts
The facts of the scenario are that we need to implement new policies for the actor and critic
methods and make the connection with the enviornment as in the current state, the program
crashes on start when we give it the racetrack.
3.4 Specifications
When starting the program, the user will first see the animated window in which the car will try
to move along the track, for the first few episodes the agent won’t be able to perform very well,
but as in the pendulum enviornment, given time, it will learn to control the car and eventually,
move really fast along the track.
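The intended agent-environment interaction can be sketched as below. Since the actual connection to the racetrack environment is exactly the part that still crashes, a dummy class with the usual gym-style reset/step interface stands in for it, and every name and the reward scheme here are assumptions.

```python
class DummyTrackEnv:
    """Placeholder with a gym-style interface; NOT the real racetrack env."""

    def __init__(self, track_length=10):
        self.track_length = track_length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position          # observation: distance along the track

    def step(self, action):
        # action: 1 = accelerate forward, 0 = do nothing (assumed encoding).
        self.position += action
        reward = 1.0 if action == 1 else -0.1   # reward forward progress
        done = self.position >= self.track_length
        return self.position, reward, done

def run_episode(env, pick_action):
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done = env.step(pick_action(obs))
        total += reward
    return total

# With a trivial always-accelerate policy, the episode reward equals
# the track length (one unit of reward per forward step).
total = run_episode(DummyTrackEnv(), lambda obs: 1)
```

Once the real environment no longer crashes, the dummy class would be replaced by it, and pick_action by the trained actor's policy.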
3.6 Knowledge acquisition
The project I chose requires me to take inspiration from other applications based on pytorch,
and to take into account on how those programs shape their policies. It also requires me to
understand how to connect with the tool and to understand each type of action it uses so the
reward is increased.