DOI: 10.1109/IROS40897.2019.8968092

Meta-Learning for Multi-objective Reinforcement Learning

Published: 01 November 2019
Abstract

Multi-objective reinforcement learning (MORL) is the generalization of standard reinforcement learning (RL) approaches to sequential decision making problems that involve several, possibly conflicting, objectives. Generally, in such formulations there is no single optimal policy that optimizes all the objectives simultaneously; instead, a set of policies has to be found, each optimizing a different preference over the objectives. In this paper, we introduce a novel MORL approach by training a meta-policy, a policy simultaneously trained on multiple tasks sampled from a task distribution, for a number of randomly sampled Markov decision processes (MDPs). In other words, MORL is framed as a meta-learning problem, with the task distribution given by a distribution over the preferences. We demonstrate that this formulation yields a better approximation of the Pareto optimal solutions in terms of both optimality and computational efficiency. We evaluate our method by obtaining Pareto optimal policies on a number of continuous control problems with high degrees of freedom.
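
The abstract's central recipe, treating each preference over the objectives as a task and meta-training a single policy across a distribution of such tasks, can be illustrated with a small sketch. The code below is not the authors' implementation (the paper targets high-dimensional continuous control); it is a minimal toy under stated assumptions: a two-objective bandit, linear scalarization of the vector reward by a sampled preference vector, REINFORCE inner-loop adaptation, and a Reptile-style meta-update. The environment and all names (sample_preference, adapt, meta_theta) are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only (not the authors' code): MORL framed as meta-learning
# over a distribution of preference vectors, as described in the abstract.
import numpy as np

rng = np.random.default_rng(0)

# Toy environment: each arm yields a vector reward (objective 1, objective 2);
# the first two arms conflict, the third is a compromise.
ARM_REWARDS = np.array([[1.0, 0.0],
                        [0.0, 1.0],
                        [0.6, 0.6]])
N_ARMS = ARM_REWARDS.shape[0]

def sample_preference():
    """Sample a task = preference vector w over the objectives (w >= 0, sum(w) = 1)."""
    return rng.dirichlet(alpha=np.ones(ARM_REWARDS.shape[1]))

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def scalarized_reward(arm, w):
    """Linear scalarization of the vector reward under preference w, plus noise."""
    return float(w @ ARM_REWARDS[arm]) + 0.05 * rng.standard_normal()

def adapt(theta, w, steps=20, lr=0.5, batch=32):
    """Inner loop: adapt the policy to one preference via REINFORCE on the scalarized reward."""
    theta = theta.copy()
    for _ in range(steps):
        probs = softmax(theta)
        arms = rng.choice(N_ARMS, size=batch, p=probs)
        rewards = np.array([scalarized_reward(a, w) for a in arms])
        baseline = rewards.mean()
        grad = np.zeros_like(theta)
        for a, r in zip(arms, rewards):
            grad_logp = -probs           # d/dtheta log softmax(theta)[a] = e_a - probs
            grad_logp[a] += 1.0
            grad += (r - baseline) * grad_logp
        theta += lr * grad / batch
    return theta

# Outer loop: meta-train a single policy over tasks sampled from the preference distribution.
meta_theta = np.zeros(N_ARMS)
meta_lr = 0.3
for it in range(200):
    w = sample_preference()                          # sample a task (a preference)
    adapted = adapt(meta_theta, w)                   # adapt to that preference
    meta_theta += meta_lr * (adapted - meta_theta)   # Reptile-style meta-update

# After meta-training, a few inner steps specialize the meta-policy to any preference.
for w in ([1.0, 0.0], [0.5, 0.5], [0.0, 1.0]):
    specialized = adapt(meta_theta, np.array(w), steps=10)
    print(w, softmax(specialized).round(2))
```

Sweeping the preference vector and specializing the meta-policy to each one traces out an approximation of the Pareto front; the claim that this approximation is better and cheaper to obtain than per-preference training from scratch is the paper's, evaluated on continuous control benchmarks, and is not demonstrated by this toy.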




    Published In

    2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
    6597 pages

    Publisher

    IEEE Press


    Qualifiers

    • Research-article



    Cited By

• (2024) Utility-Based Reinforcement Learning: Unifying Single-objective and Multi-objective Reinforcement Learning. Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, pp. 2717–2721. DOI: 10.5555/3635637.3663264. Online publication date: 6 May 2024.
• (2024) Meta-learning Approaches for Few-Shot Learning: A Survey of Recent Advances. ACM Computing Surveys, vol. 56, no. 12, pp. 1–41. DOI: 10.1145/3659943. Online publication date: 3 May 2024.
• (2023) Eliciting user preferences for personalized multi-objective decision making through comparative feedback. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 12192–12221. DOI: 10.5555/3666122.3666657. Online publication date: 10 December 2023.
• (2023) Dynamic Weights and Prior Reward in Policy Fusion for Compound Agent Learning. ACM Transactions on Intelligent Systems and Technology, vol. 14, no. 6, pp. 1–28. DOI: 10.1145/3623405. Online publication date: 14 November 2023.
• (2023) DTRL: Decision Tree-based Multi-Objective Reinforcement Learning for Runtime Task Scheduling in Domain-Specific System-on-Chips. ACM Transactions on Embedded Computing Systems, vol. 22, no. 5s, pp. 1–22. DOI: 10.1145/3609108. Online publication date: 31 October 2023.
