DOI: 10.1109/IROS40897.2019.8968092

Meta-Learning for Multi-objective Reinforcement Learning

Published: 01 November 2019
Abstract

Multi-objective reinforcement learning (MORL) is the generalization of standard reinforcement learning (RL) approaches to sequential decision making problems that involve several, possibly conflicting, objectives. Generally, in such formulations there is no single optimal policy that optimizes all the objectives simultaneously; instead, a set of policies has to be found, each optimizing a different preference over the objectives. In this paper, we introduce a novel MORL approach by training a meta-policy, a policy simultaneously trained on multiple tasks sampled from a task distribution, for a number of randomly sampled Markov decision processes (MDPs). In other words, MORL is framed as a meta-learning problem, with the task distribution given by a distribution over the preferences. We demonstrate that this formulation yields a better approximation of the Pareto optimal solutions in terms of both optimality and computational efficiency. We evaluate our method by obtaining Pareto optimal policies on a number of continuous control problems with high degrees of freedom.
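
The abstract's central recipe, treating each preference over the objectives as a task and meta-training a single policy across a distribution of such tasks, can be illustrated with a small sketch. The code below is not the authors' implementation (the paper targets high-dimensional continuous control); it is a minimal toy under stated assumptions: a two-objective bandit, linear scalarization of the vector reward by a sampled preference vector, REINFORCE inner-loop adaptation, and a Reptile-style meta-update. The environment and all names (sample_preference, adapt, meta_theta) are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only (not the authors' code): MORL framed as meta-learning
# over a distribution of preference vectors, as described in the abstract.
import numpy as np

rng = np.random.default_rng(0)

# Toy environment: each arm yields a vector reward (objective 1, objective 2);
# the first two arms conflict, the third is a compromise.
ARM_REWARDS = np.array([[1.0, 0.0],
                        [0.0, 1.0],
                        [0.6, 0.6]])
N_ARMS = ARM_REWARDS.shape[0]

def sample_preference():
    """Sample a task = preference vector w over the objectives (w >= 0, sum(w) = 1)."""
    return rng.dirichlet(alpha=np.ones(ARM_REWARDS.shape[1]))

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def scalarized_reward(arm, w):
    """Linear scalarization of the vector reward under preference w, plus noise."""
    return float(w @ ARM_REWARDS[arm]) + 0.05 * rng.standard_normal()

def adapt(theta, w, steps=20, lr=0.5, batch=32):
    """Inner loop: adapt the policy to one preference via REINFORCE on the scalarized reward."""
    theta = theta.copy()
    for _ in range(steps):
        probs = softmax(theta)
        arms = rng.choice(N_ARMS, size=batch, p=probs)
        rewards = np.array([scalarized_reward(a, w) for a in arms])
        baseline = rewards.mean()
        grad = np.zeros_like(theta)
        for a, r in zip(arms, rewards):
            grad_logp = -probs           # d/dtheta log softmax(theta)[a] = e_a - probs
            grad_logp[a] += 1.0
            grad += (r - baseline) * grad_logp
        theta += lr * grad / batch
    return theta

# Outer loop: meta-train a single policy over tasks sampled from the preference distribution.
meta_theta = np.zeros(N_ARMS)
meta_lr = 0.3
for it in range(200):
    w = sample_preference()                          # sample a task (a preference)
    adapted = adapt(meta_theta, w)                   # adapt to that preference
    meta_theta += meta_lr * (adapted - meta_theta)   # Reptile-style meta-update

# After meta-training, a few inner steps specialize the meta-policy to any preference.
for w in ([1.0, 0.0], [0.5, 0.5], [0.0, 1.0]):
    specialized = adapt(meta_theta, np.array(w), steps=10)
    print(w, softmax(specialized).round(2))
```

Sweeping the preference vector and specializing the meta-policy to each one traces out an approximation of the Pareto front; the claim that this approximation is better and cheaper to obtain than per-preference training from scratch is the paper's, evaluated on continuous control benchmarks, and is not demonstrated by this toy.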




    Published In

    2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
    6597 pages

    Publisher

    IEEE Press


    Qualifiers

    • Research-article



    Cited By

• (2024) Utility-Based Reinforcement Learning: Unifying Single-objective and Multi-objective Reinforcement Learning. Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, pp. 2717–2721. DOI: 10.5555/3635637.3663264. Online publication date: 6 May 2024.
• (2024) Meta-learning Approaches for Few-Shot Learning: A Survey of Recent Advances. ACM Computing Surveys, vol. 56, no. 12, pp. 1–41. DOI: 10.1145/3659943. Online publication date: 3 May 2024.
• (2023) Eliciting user preferences for personalized multi-objective decision making through comparative feedback. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 12192–12221. DOI: 10.5555/3666122.3666657. Online publication date: 10 December 2023.
• (2023) Dynamic Weights and Prior Reward in Policy Fusion for Compound Agent Learning. ACM Transactions on Intelligent Systems and Technology, vol. 14, no. 6, pp. 1–28. DOI: 10.1145/3623405. Online publication date: 14 November 2023.
• (2023) DTRL: Decision Tree-based Multi-Objective Reinforcement Learning for Runtime Task Scheduling in Domain-Specific System-on-Chips. ACM Transactions on Embedded Computing Systems, vol. 22, no. 5s, pp. 1–22. DOI: 10.1145/3609108. Online publication date: 31 October 2023.
