Abstract
This paper compares four common reinforcement learning algorithms: deep Q-network (DQN), double deep Q-network (DDQN), DQN with prioritized experience replay (DQN + PER), and double DQN with prioritized experience replay (double DQN + PER). It discusses the methodology together with the limitations and advantages of each algorithm. To provide these insights, OpenAI environments that demonstrate the working of these algorithms were used. The Mountain Car environment was used to generalize our results and to demonstrate the consistency of our insights. Insights were derived by evaluating basic metrics: episode length, minimum reward, maximum reward, and average reward. The study also discusses strategies for applying reinforcement learning to supply chain management, specifically to inventory management.
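The evaluation metrics named above can be illustrated with a short sketch. This is not the authors' evaluation code: the function name `summarize_episodes`, the input layout (one list of per-step rewards per episode), and the sample data are all assumptions made for illustration. It also assumes that the "reward" of an episode means the sum of its per-step rewards.

```python
from statistics import mean


def summarize_episodes(episode_rewards):
    """Summarize training episodes.

    episode_rewards: list of lists; each inner list holds the per-step
    rewards of one episode (hypothetical layout, not from the paper).
    """
    lengths = [len(ep) for ep in episode_rewards]   # steps per episode
    returns = [sum(ep) for ep in episode_rewards]   # total reward per episode
    return {
        "mean_episode_length": mean(lengths),
        "min_reward": min(returns),
        "max_reward": max(returns),
        "avg_reward": mean(returns),
    }


# Made-up data: three episodes with a reward of -1 per step,
# lasting 200, 150 and 120 steps respectively.
episodes = [[-1.0] * 200, [-1.0] * 150, [-1.0] * 120]
summary = summarize_episodes(episodes)
print(summary)
```

With a constant per-step penalty, as in this made-up data, a shorter episode yields a higher total reward, so episode length and reward move together; the same summary could be computed per algorithm to compare the four variants.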
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Singi, S., Gopal, S., Auti, S., Chaurasia, R. (2020). Reinforcement Learning for Inventory Management. In: Vasudevan, H., Kottur, V., Raina, A. (eds) Proceedings of International Conference on Intelligent Manufacturing and Automation. Lecture Notes in Mechanical Engineering. Springer, Singapore. https://doi.org/10.1007/978-981-15-4485-9_33
DOI: https://doi.org/10.1007/978-981-15-4485-9_33
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-4484-2
Online ISBN: 978-981-15-4485-9
eBook Packages: Engineering, Engineering (R0)