Abstract
Safe learning and optimization deals with learning and optimization problems that avoid, as much as possible, the evaluation of non-safe input points, which are solutions, policies, or strategies that cause an irrecoverable loss (e.g., breakage of a machine or equipment, or life threat). Although a comprehensive survey of safe reinforcement learning algorithms was published in 2015, a number of new algorithms have been proposed thereafter, and related works in active learning and in optimization were not considered. This paper reviews those algorithms from a number of domains including reinforcement learning, Gaussian process regression and classification, evolutionary computing, and active learning. We provide the fundamental concepts on which the reviewed algorithms are based and a characterization of the individual algorithms. We conclude by explaining how the algorithms are connected and suggestions for future research.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Allmendinger, R.: Tuning evolutionary search for closed-loop optimization. Ph.D. thesis, The University of Manchester, UK (2012)
Allmendinger, R., Knowles, J.D.: Evolutionary search in lethal environments. In: International Conference on Evolutionary Computation Theory and Applications, pp. 63–72. SciTePress (2011)
Auer, P.: Using confidence bounds for exploitation-exploration trade-offs. J. Mach. Learn. Res. 3, 397–422 (2002)
Bachoc, F., Helbert, C., Picheny, V.: Gaussian process optimization with failures: classification and convergence proof. J. Glob. Optim. 78, 483–506 (2020). https://doi.org/10.1007/s10898-020-00920-0
Bäck, T.: Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, Genetic Algorithms. Oxford University Press, Oxford (1996)
Berkenkamp, F., Krause, A., Schoellig, A.P.: Bayesian optimization with safety constraints: safe and automatic parameter tuning in robotics. arXiv preprint arXiv:1602.04450 (2016)
Berkenkamp, F., Schoellig, A.P., Krause, A.: Safe controller optimization for quadrotors with Gaussian processes. In: 2016 IEEE International Conference on Robotics and Automation (ICRA), pp. 491–496. IEEE (2016)
Bıyık, E., Margoliash, J., Alimo, S.R., Sadigh, D.: Efficient and safe exploration in deterministic Markov decision processes with unknown transition models. In: 2019 American Control Conference (ACC), pp. 1792–1799. IEEE (2019)
Brochu, E., Cora, V., de Freitas, N.: A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv preprint arXiv:1012.2599, December 2010
Duivenvoorden, R.R.P.R., Berkenkamp, F., Carion, N., Krause, A., Schoellig, A.P.: Constrained Bayesian optimization with particle swarms for safe adaptive controller tuning. IFAC-PapersOnLine 50(1), 11800–11807 (2017)
Ferrer, J., López-Ibáñez, M., Alba, E.: Reliable simulation-optimization of traffic lights in a real-world city. Appl. Soft Comput. 78, 697–711 (2019)
Forrester, A.I.J., Keane, A.J.: Recent advances in surrogate-based optimization. Prog. Aerosp. Sci. 45(1–3), 50–79 (2009)
García, J., Fernández, F.: A comprehensive survey on safe reinforcement learning. J. Mach. Learn. Res. 16(1), 1437–1480 (2015)
Geibel, P.: Reinforcement learning for MDPs with constraints. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 646–653. Springer, Heidelberg (2006). https://doi.org/10.1007/11871842_63
Gosavi, A.: Reinforcement learning: a tutorial survey and recent advances. INFORMS J. Comput. 21(2), 178–192 (2009)
Hansen, N., Ostermeier, A.: Completely derandomized self-adaptation in evolution strategies. Evol. Comput. 9(2), 159–195 (2001)
Huang, D., Allen, T.T., Notz, W.I., Zeng, N.: Global optimization of stochastic black-box systems via sequential kriging meta-models. J. Global Optim. 34(3), 441–466 (2006). https://doi.org/10.1007/s10898-005-2454-3
Kaji, H., Ikeda, K., Kita, H.: Avoidance of constraint violation for experiment-based evolutionary multi-objective optimization. In: Proceedings of the 2009 Congress on Evolutionary Computation (CEC 2009), pp. 2756–2763. IEEE Press, Piscataway (2009)
Knowles, J.D.: Closed-loop evolutionary multiobjective optimization. IEEE Comput. Intell. Mag. 4, 77–91 (2009)
Likar, B., Kocijan, J.: Predictive control of a gas-liquid separation plant based on a Gaussian process model. Comput. Chem. Eng. 31(3), 142–152 (2007)
Moldovan, T.M., Abbeel, P.: Safe exploration in Markov decision processes. In: Langford, J., Pineau, J. (eds.) Proceedings of the 29th International Conference on Machine Learning, ICML 2012, pp. 1451–1458. Omnipress (2012)
Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. MIT Press, Cambridge (2006)
Sacher, M., et al.: A classification approach to efficient global optimization in presence of non-computable domains. Struct. Multidiscip. Optim. 58(4), 1537–1557 (2018). https://doi.org/10.1007/s00158-018-1981-8
Schillinger, M., Hartmann, B., Skalecki, P., Meister, M., Nguyen-Tuong, D., Nelles, O.: Safe active learning and safe Bayesian optimization for tuning a PI-controller. IFAC-PapersOnLine 50(1), 5967–5972 (2017)
Schillinger, M., et al.: Safe active learning of a high pressure fuel supply system. In: Proceedings of the 9th EUROSIM Congress on Modelling and Simulation, EUROSIM 2016 and the 57th SIMS Conference on Simulation and Modelling SIMS 2016, pp. 286–292, Linköping University Electronic Press (2018)
Schreiter, J., Nguyen-Tuong, D., Eberts, M., Bischoff, B., Markert, H., Toussaint, M.: Safe exploration for active learning with Gaussian processes. In: Bifet, A., et al. (eds.) ECML PKDD 2015. LNCS (LNAI), vol. 9286, pp. 133–149. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-23461-8_9
Schulz, E., Speekenbrink, M., Krause, A.: A tutorial on Gaussian process regression: modelling, exploring, and exploiting functions. J. Math. Psychol. 85, 1–16 (2018)
Small, B.G., et al.: Efficient discovery of anti-inflammatory small-molecule combinations using evolutionary computing. Nat. Chem. Biol. 7(12), 902–908 (2011)
Snoek, J., Larochelle, H., Adams, R.P.: Practical Bayesian optimization of machine learning algorithms. In: Bartlett, P.L., Pereira, F.C.N., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems (NIPS 25), pp. 2960–2968. Curran Associates, Red Hook (2012)
Sui, Y., Gotovos, A., Burdick, J.W., Krause, A.: Safe exploration for optimization with Gaussian processes. In: Bach, F., Blei, D. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, vol. 37, pp. 997–1005 (2015)
Sui, Y., Zhuang, V., Burdick, J.W., Yue, Y.: Stagewise safe Bayesian optimization with Gaussian processes. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 4788–4796. PMLR (2018)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, 2nd edn. MIT Press, Cambridge (2018)
Turchetta, M., Berkenkamp, F., Krause, A.: Safe exploration in finite Markov decision processes with Gaussian processes. In: Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems (NIPS 29), pp. 4312–4320 (2016)
Turchetta, M., Berkenkamp, F., Krause, A.: Safe exploration for interactive machine learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems (NIPS 32), pp. 2887–2897 (2019)
Wachi, A., Sui, Y., Yue, Y., Ono, M.: Safe exploration and optimization of constrained MDPs using Gaussian processes. In: McIlraith, S.A., Weinberger, K.Q. (eds.) AAAI Conference on Artificial Intelligence, pp. 6548–6556, AAAI Press, February 2018
Acknowledgements
M. López-Ibáñez is a “Beatriz Galindo” Senior Distinguished Researcher (BEAGAL 18/00053) funded by the Spanish Ministry of Science and Innovation.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Kim, Y., Allmendinger, R., López-Ibáñez, M. (2021). Safe Learning and Optimization Techniques: Towards a Survey of the State of the Art. In: Heintz, F., Milano, M., O'Sullivan, B. (eds) Trustworthy AI - Integrating Learning, Optimization and Reasoning. TAILOR 2020. Lecture Notes in Computer Science(), vol 12641. Springer, Cham. https://doi.org/10.1007/978-3-030-73959-1_12
Download citation
DOI: https://doi.org/10.1007/978-3-030-73959-1_12
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-73958-4
Online ISBN: 978-3-030-73959-1
eBook Packages: Computer ScienceComputer Science (R0)