Abstract
It is known that applying adaptive operator selection (AOS) techniques can improve the search process of a multi-objective evolutionary algorithm in the space of candidate solutions. AOS consists of two main tasks: the first assigns weights or credits to each available operator, and the second selects the best operator. In this chapter, for the first time, a reinforcement learning technique (Q-learning) is used to perform adaptive operator selection in the MOEA/D algorithm; we call the resulting algorithm MOEA/D-QL. The objective of Q-learning is to learn a set of rules that tell an agent which action to take under which circumstances; that is, the agent seeks to execute the actions that yield the greatest accumulated reward. Here, an action corresponds to a variation operator, and four variants of the Differential Evolution operator are used. Two states are defined: \(S_{0}\) corresponds to a child solution that enters front 0 (the first non-dominated front), and \(S_{1}\) corresponds to solutions that do not enter front 0. The MOEA/D-QL algorithm has been validated against two state-of-the-art multi-objective algorithms: MOEA/D and MOEA/D-DYTS, a version of MOEA/D that uses an AOS based on dynamic Thompson sampling. Fifteen multi-objective benchmark problems with 2 and 3 objectives were used as instances, and three metrics were applied: hypervolume, generalized spread, and inverted generational distance. The non-parametric Wilcoxon signed-rank and Friedman tests were applied at a 5% significance level; the results show that MOEA/D-QL is superior on the hypervolume and inverted generational distance metrics.
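To make the state-action formulation concrete, the sketch below shows one possible Q-learning AOS loop under the two-state scheme described above. It applies the standard one-step update \(Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]\). The hyperparameters, operator names, binary reward, and \(\varepsilon\)-greedy selection are illustrative assumptions, not the chapter's exact settings.

```python
import random

# Minimal sketch of Q-learning-based adaptive operator selection (AOS).
# Operator names, reward scheme, and hyperparameters are illustrative
# assumptions, not the settings reported in the chapter.

OPERATORS = ["DE/rand/1", "DE/rand/2",
             "DE/current-to-rand/1", "DE/current-to-rand/2"]  # assumed DE variants
S0, S1 = 0, 1            # S0: child enters front 0; S1: it does not
ALPHA, GAMMA = 0.1, 0.9  # learning rate and discount factor (assumed)
EPSILON = 0.1            # epsilon-greedy exploration rate (assumed)

# Q-table indexed as Q[state][operator]
Q = [[0.0] * len(OPERATORS) for _ in (S0, S1)]

def select_operator(state: int) -> int:
    """Pick an operator: explore with probability EPSILON, else exploit."""
    if random.random() < EPSILON:
        return random.randrange(len(OPERATORS))
    return max(range(len(OPERATORS)), key=lambda a: Q[state][a])

def update(state: int, action: int, reward: float, next_state: int) -> None:
    """One-step Q-learning update of the selected operator's value."""
    target = reward + GAMMA * max(Q[next_state])
    Q[state][action] += ALPHA * (target - Q[state][action])

# Toy generation loop: a random stand-in replaces the real non-dominated
# sorting that decides whether the child enters front 0.
state = S0
for _ in range(1000):
    op = select_operator(state)
    entered_front0 = random.random() < 0.3   # placeholder for the real test
    reward = 1.0 if entered_front0 else 0.0  # assumed binary credit
    next_state = S0 if entered_front0 else S1
    update(state, op, reward, next_state)
    state = next_state
```

In this sketch the Q-table replaces the credit-assignment bookkeeping of bandit-style AOS methods: the accumulated reward per state-operator pair directly drives selection, so no separate weighting step is needed.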
References
Sun, L., Li, K.: Adaptive operator selection based on dynamic Thompson sampling for MOEA/D. arXiv preprint (2020)
Li, K., Fialho, A., Kwong, S., Zhang, Q.: Adaptive operator selection with bandits for a multiobjective evolutionary algorithm based on decomposition. IEEE Trans. Evol. Comput. 18(1), 114–130 (2014)
Fialho, Á., da Costa, L., Schoenauer, M., Sebag, M.: Analyzing bandit-based adaptive operator selection mechanisms. Ann. Math. Artif. Intell. 60(1), 25–64 (2010)
Goldberg, D.E.: Probability matching, the magnitude of reinforcement, and classifier system bidding. Mach. Learn. 5(4), 407–425 (1990)
Thierens, D.: An adaptive pursuit strategy for allocating operator probabilities. In: Proceedings of the 2005 Conference on Genetic and Evolutionary Computation (GECCO '05). ACM Press (2005)
Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Mach. Learn. 47(2/3), 235–256 (2002)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, 2nd edn. MIT Press (2018)
Watkins, C.J.C.H.: Learning from Delayed Rewards. PhD Thesis, University of Cambridge, England (1989)
Zhang, Q., Li, H.: MOEA/D: a multiobjective evolutionary algorithm based on decomposition. IEEE Trans. Evol. Comput. 11(6), 712–731 (2007)
Deb, K., Thiele, L., Laumanns, M., Zitzler, E.: Scalable test problems for evolutionary multi-objective optimization. ETH Zurich (2001). https://doi.org/10.3929/ETHZ-A-004284199
Zhang, Q., Zhou, A., Zhao, S., Suganthan, P.N., Liu, W., Tiwari, S.: Multiobjective optimization test instances for the CEC 2009 special session and competition. Technical report, University of Essex, Colchester, UK; Nanyang Technological University, Singapore, pp. 1–30 (2008)
Zitzler, E., Deb, K., Thiele, L.: Comparison of multiobjective evolutionary algorithms: empirical results. Evol. Comput. 8(2), 173–195 (2000). https://doi.org/10.1162/106365600568202
Brambila-Hernández, J.A., García-Morales, M.Á., Fraire-Huacuja, H.J., del Angel, A.B., Villegas-Huerta, E., Carbajal-López, R.: Experimental evaluation of adaptive operators selection methods for the dynamic multiobjective evolutionary algorithm based on decomposition (DMOEA/D). In: Castillo, O., Melin, P. (eds.) Hybrid Intelligent Systems Based on Extensions of Fuzzy Logic, Neural Networks and Metaheuristics. Studies in Computational Intelligence, vol. 1096. Springer, Cham (2023)
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Brambila-Hernández, J.A., García-Morales, M.Á., Fraire-Huacuja, H.J., Cruz-Reyes, L., Frausto-Solís, J. (2024). Novel Decomposition-Based Multi-objective Evolutionary Algorithm Using Reinforcement Learning Adaptive Operator Selection (MOEA/D-QL). In: Castillo, O., Melin, P. (eds) New Horizons for Fuzzy Logic, Neural Networks and Metaheuristics. Studies in Computational Intelligence, vol 1149. Springer, Cham. https://doi.org/10.1007/978-3-031-55684-5_11
DOI: https://doi.org/10.1007/978-3-031-55684-5_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-55683-8
Online ISBN: 978-3-031-55684-5
eBook Packages: Intelligent Technologies and Robotics