Abstract
Action-reward learning is a reinforcement learning method in which an agent interacts with a non-deterministic control domain. The agent selects actions at decision epochs, and the control domain returns rewards with which the performance measures of the actions are updated. The agent's objective is to select the best future actions based on the updated performance measures. In this paper, we develop an asynchronous action-reward learning model that updates the performance measures of actions faster than conventional action-reward learning does. This learning model is well suited to nonstationary control domains in which the rewards for actions vary over time. Based on asynchronous action-reward learning, two situation-reactive inventory control models (a centralized model and a decentralized model) are proposed for a two-stage serial supply chain with nonstationary customer demand. A simulation-based experiment was performed to evaluate the performance of the two proposed models.
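The abstract does not spell out the update rule, so the following is only a minimal sketch of the conventional action-reward loop that the paper's asynchronous variant improves on: an epsilon-greedy agent tracking one performance measure per action, using a constant step size so that recent rewards dominate (a common choice for nonstationary rewards). All class, parameter, and variable names here are hypothetical illustrations, not the authors' implementation; the asynchronous model would additionally update performance measures beyond the single action just executed.

```python
import random

class ActionRewardAgent:
    """Sketch of conventional action-reward learning (hypothetical names)."""

    def __init__(self, actions, step_size=0.2, epsilon=0.1):
        self.actions = list(actions)
        self.step_size = step_size   # constant step size weights recent rewards
                                     # more heavily, suiting nonstationary domains
        self.epsilon = epsilon       # probability of exploring a random action
        self.value = {a: 0.0 for a in self.actions}  # performance measure per action

    def select_action(self):
        # Epsilon-greedy: usually exploit the best-known action, sometimes explore.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.value[a])

    def update(self, action, reward):
        # Exponential recency-weighted average: Q <- Q + alpha * (r - Q).
        self.value[action] += self.step_size * (reward - self.value[action])


if __name__ == "__main__":
    # Toy nonstationary domain: the rewarded action shifts halfway through the run.
    agent = ActionRewardAgent(actions=[0, 1, 2])
    for t in range(1000):
        a = agent.select_action()
        best = 0 if t < 500 else 2            # reward structure changes over time
        agent.update(a, 1.0 if a == best else 0.0)
    print(agent.value)                        # action 2 should now score highest
```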
Author information
Chang Ouk Kim received his Ph.D. in industrial engineering from Purdue University in 1996 and his B.S. and M.S. degrees from Korea University, Republic of Korea, in 1988 and 1990, respectively. From 1998 to 2001, he was an assistant professor in the Department of Industrial Systems Engineering at Myongji University, Republic of Korea. In 2002, he joined the Department of Information and Industrial Engineering at Yonsei University, Republic of Korea, where he is now an associate professor. He has published more than 30 articles in international journals. He is currently working on applications of artificial intelligence and adaptive control theory in supply chain management, RFID-based logistics information system design, and advanced process control in semiconductor manufacturing.
Ick-Hyun Kwon is a postdoctoral researcher in the Department of Civil and Environmental Engineering at the University of Illinois at Urbana-Champaign. Prior to this position, Dr. Kwon was a research assistant professor in the Research Institute for Information and Communication Technology at Korea University, Seoul, Republic of Korea. He received his B.S., M.S., and Ph.D. degrees in Industrial Engineering from Korea University in 1998, 2000, and 2006, respectively. His current research interests are supply chain management, inventory control, and production planning and scheduling.
Jun-Geol Baek is an assistant professor in the Department of Business Administration at Kwangwoon University, Seoul, Korea. He received his B.S., M.S., and Ph.D. degrees in Industrial Engineering from Korea University, Seoul, Korea, in 1993, 1995, and 2001, respectively. From March 2002 to February 2007, he was an assistant professor in the Department of Industrial Systems Engineering at Induk Institute of Technology, Seoul, Korea. His research interests include machine learning, data mining, intelligent machine diagnosis, and ubiquitous logistics information systems.
An erratum to this article can be found at http://dx.doi.org/10.1007/s10489-007-0087-6
About this article
Cite this article
Kim, C.O., Kwon, IH. & Baek, JG. Asynchronous action-reward learning for nonstationary serial supply chain inventory control. Appl Intell 28, 1–16 (2008). https://doi.org/10.1007/s10489-007-0038-2