On reliability of reinforcement learning based production scheduling systems: a comparative survey

Published: 01 April 2022

Abstract

The deep reinforcement learning (DRL) community has published remarkable results on complex strategic planning problems, most famously in virtual scenarios for board and video games. However, the application to real-world scenarios such as production scheduling (PS) problems remains a challenge for current research. This is because real-world application fields typically show specific requirement profiles that are often not considered by state-of-the-art DRL research. This survey addresses questions raised in the domain of industrial engineering regarding the reliability of production schedules obtained through DRL-based scheduling approaches. We review definitions and evaluation measures of reliability, both in the classical numerical optimization domain, with a focus on PS problems, and more broadly in the DRL domain. Furthermore, we define common ground and terminology and present a collection of quantifiable reliability definitions for use in this interdisciplinary domain. Finally, we identify promising directions of current DRL research as a basis for tackling different aspects of reliability in PS applications in the future.
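
To make the kind of quantifiable reliability measure discussed above concrete, the sketch below (our own illustration, not code from the article) evaluates a fixed permutation flow-shop schedule under sampled processing-time disturbances and reports the expected makespan, its conditional value-at-risk (CVaR), and the mean degradation relative to the nominal makespan. The instance, the uniform noise model, and all function names are assumptions chosen purely for illustration.

```python
import numpy as np

def flow_shop_makespan(order, proc_times):
    """Makespan of a permutation flow shop.

    order: job sequence (indices into the rows of proc_times)
    proc_times: array of shape (n_jobs, n_machines) with processing durations
    """
    n_machines = proc_times.shape[1]
    completion = np.zeros(n_machines)  # completion time of the latest job on each machine
    for job in order:
        for m in range(n_machines):
            ready = completion[m - 1] if m > 0 else 0.0  # current job done on previous machine
            completion[m] = max(completion[m], ready) + proc_times[job, m]
    return completion[-1]

def cvar(samples, alpha=0.95):
    """Mean of the worst (1 - alpha) fraction of outcomes (conditional value-at-risk)."""
    samples = np.sort(np.asarray(samples))
    k = int(np.ceil(alpha * len(samples)))
    return samples[k:].mean() if k < len(samples) else float(samples[-1])

rng = np.random.default_rng(seed=0)
nominal = rng.uniform(1.0, 9.0, size=(6, 3))    # hypothetical instance: 6 jobs, 3 machines
order = np.argsort(nominal.sum(axis=1))          # shortest-total-processing-time rule as a
                                                 # stand-in for a learned or heuristic schedule

# Monte-Carlo evaluation of the *fixed* schedule under processing-time uncertainty.
samples = [
    flow_shop_makespan(order, nominal * rng.uniform(0.8, 1.2, size=nominal.shape))
    for _ in range(5_000)
]

nominal_cmax = flow_shop_makespan(order, nominal)
print(f"nominal makespan:              {nominal_cmax:.2f}")
print(f"expected makespan:             {np.mean(samples):.2f}")
print(f"CVaR(0.95) of makespan:        {cvar(samples):.2f}")
print(f"mean degradation (robustness): {np.mean(samples) - nominal_cmax:.2f}")
```

Replacing the dispatching-rule stand-in with the output of a trained DRL scheduling policy leaves the evaluation procedure unchanged, which is what makes measures of this kind useful for comparing learned and classical schedulers under uncertainty.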

Cited By

  • (2023) Q-learning based Dynamic Scheduling for No-wait Flow Shop with Maintenance Window and Minimization of Total Tardiness. Proceedings of the 2023 3rd Guangdong-Hong Kong-Macao Greater Bay Area Artificial Intelligence and Big Data Forum, pp. 21–24. https://doi.org/10.1145/3660395.3660400. Online publication date: 22-Sep-2023.
  • (2023) Reward Shaping for Job Shop Scheduling. Machine Learning, Optimization, and Data Science, pp. 197–211. https://doi.org/10.1007/978-3-031-53969-5_16. Online publication date: 22-Sep-2023.
  • (2023) Application of Multi-agent Reinforcement Learning to the Dynamic Scheduling Problem in Manufacturing Systems. Machine Learning, Optimization, and Data Science, pp. 237–254. https://doi.org/10.1007/978-3-031-53966-4_18. Online publication date: 22-Sep-2023.

          Published In

          Journal of Intelligent Manufacturing, Volume 33, Issue 4
          April 2022, 272 pages

          Publisher

          Springer-Verlag

          Berlin, Heidelberg

          Publication History

          Published: 01 April 2022
          Accepted: 20 January 2022
          Received: 07 April 2021

          Author Tags

          1. Reinforcement learning
          2. Production scheduling
          3. Reliability
          4. Robustness
          5. Machine learning

          Qualifiers

          • Review-article
