Abstract
In this paper we present a new method for improving reinforcement learning training times under the following two assumptions: (1) we know the conditions under which the environment gives reward; and (2) we can control the initial state of the environment at the beginning of a training episode. Our method, called intra-task curriculum learning, presents the different episode starting states to an agent in order of increasing distance to immediate reward.
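The ordering described in the abstract can be sketched as follows, under the paper's two assumptions: the reward (goal) states are known, and the starting state of each episode can be chosen. The sketch searches backwards from the goal states to label every reachable state with its distance to reward, then sorts the candidate starting states by that distance. The function names and the `neighbours` callback are ours, for illustration only; they are not the paper's API. Searching backwards assumes the transition graph can be traversed in reverse (true for the deterministic, symmetric gridworld moves used later in the paper).

```python
from collections import deque

def bfs_distances(start_states, goal_states, neighbours):
    """Label states with their minimum number of transitions to a goal
    state, via breadth-first search backwards from the goal states.
    `neighbours(s)` must return the states one transition away from s."""
    dist = {g: 0 for g in goal_states}
    frontier = deque(goal_states)
    while frontier:
        s = frontier.popleft()
        for n in neighbours(s):
            if n not in dist:
                dist[n] = dist[s] + 1
                frontier.append(n)
    # Keep only the candidate starting states that can reach a goal.
    return {s: dist[s] for s in start_states if s in dist}

def curriculum_order(start_states, goal_states, neighbours):
    """Order episode starting states by increasing distance to reward."""
    d = bfs_distances(start_states, goal_states, neighbours)
    return sorted(d, key=d.get)
```

Training would then begin episodes from the easiest (closest) states first and move outward as the agent improves.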
Notes
- 1.
Eligibility traces are recorded for each value in value-function-based reinforcement learning. We use the term “state” here, but in practice the values can also be stored for state-action pairs (e.g. in Q-learning).
- 2.
For the purpose of this paper, we define the distance from state \(s_a\) to state \(s_b\) as the minimum number of transitions required to get from \(s_a\) to \(s_b\). This definition is sufficient because the environments we use have deterministic transitions; a different definition would be required for stochastic environments.
- 3.
We have a \(5 \times 5\) grid with 7 walls, leaving 18 free spaces.
- 4.
We number the states in the 2D environment from left to right, bottom to top.
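As an illustration of notes 2–4, the sketch below computes the transition distance of note 2 on a \(5 \times 5\) gridworld and numbers states left to right, bottom to top. The wall layout here is invented for the example (any 7 walls leave \(25 - 7 = 18\) free spaces); it is not the paper's actual map.

```python
from collections import deque

WIDTH, HEIGHT = 5, 5
# Hypothetical wall layout: 7 walls, chosen only for illustration.
WALLS = {(1, 1), (1, 2), (1, 3), (2, 3), (3, 1), (3, 2), (3, 3)}

def state_index(x, y):
    """Number states left to right, bottom to top (note 4):
    (0, 0) is state 0 and (4, 4) is state 24."""
    return y * WIDTH + x

def min_transitions(src, dst):
    """Minimum number of moves between two free cells (note 2),
    via breadth-first search over the four grid directions."""
    if src == dst:
        return 0
    dist = {src: 0}
    frontier = deque([src])
    while frontier:
        x, y = frontier.popleft()
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if (0 <= nx < WIDTH and 0 <= ny < HEIGHT
                    and nxt not in WALLS and nxt not in dist):
                dist[nxt] = dist[(x, y)] + 1
                if nxt == dst:
                    return dist[nxt]
                frontier.append(nxt)
    return None  # dst unreachable from src
```

Because the gridworld's transitions are deterministic and symmetric, this shortest-path count matches the distance definition in note 2 exactly.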
Copyright information
© 2018 Springer Nature Switzerland AG
Cite this paper
du Preez-Wilkinson, N., Gallagher, M., Hu, X. (2018). Intra-task Curriculum Learning for Faster Reinforcement Learning in Video Games. In: Mitrovic, T., Xue, B., Li, X. (eds) AI 2018: Advances in Artificial Intelligence. AI 2018. Lecture Notes in Computer Science(), vol 11320. Springer, Cham. https://doi.org/10.1007/978-3-030-03991-2_6
Print ISBN: 978-3-030-03990-5
Online ISBN: 978-3-030-03991-2