DOI: 10.5555/3635637.3663111
extended-abstract

Minimizing Negative Side Effects in Cooperative Multi-Agent Systems using Distributed Coordination

Published: 06 May 2024
    Abstract

    Autonomous agents working collaboratively alongside other agents in real-world environments may encounter undesirable outcomes, or negative side effects (NSEs). We frame the challenge of minimizing NSEs in a multi-agent setting as a lexicographic decentralized Markov decision process in which rewards and transitions are assumed to be independent with respect to the primary assigned tasks, while negative side effects are allowed to create a form of dependence among the agents. We present a lexicographic Q-learning approach that mitigates NSEs using human feedback models while maintaining near-optimality with respect to the assigned tasks, up to a given slack. Our empirical evaluation across two domains shows that this collaborative approach effectively mitigates NSEs and outperforms non-collaborative methods.
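    The two ingredients named in the abstract, near-optimality on the assigned task up to a slack and NSEs that couple otherwise independent agents, can be illustrated with a small sketch. It is not the authors' algorithm. It assumes per-state Q-value estimates are already learned, treats the slack as an absolute tolerance on one state's task action values, treats the NSE value as a penalty to minimize, and uses hypothetical helper names (admissible_actions, lexicographic_action, coordinated_joint_action). The exhaustive joint-action search is only a centralized stand-in for the distributed coordination suggested by the paper's "distributed constraint optimization problems" tag.

```python
from itertools import product
from typing import Callable, Sequence


def admissible_actions(q_task: Sequence[float], slack: float) -> list[int]:
    """Actions whose primary-task value is within `slack` of the best one.

    Assumption: slack is an absolute tolerance on Q-values for a single state.
    """
    if slack < 0:
        raise ValueError("slack must be non-negative")
    best = max(q_task)
    return [a for a, q in enumerate(q_task) if q >= best - slack]


def lexicographic_action(q_task: Sequence[float],
                         q_nse: Sequence[float],
                         slack: float) -> int:
    """Single agent: first keep near-optimal task actions, then pick the
    one with the lowest estimated NSE penalty (lower is better)."""
    return min(admissible_actions(q_task, slack), key=lambda a: q_nse[a])


def coordinated_joint_action(
        q_tasks: Sequence[Sequence[float]],
        joint_nse: Callable[[tuple[int, ...]], float],
        slack: float) -> tuple[int, ...]:
    """Several agents: each keeps its own slack-constrained action set
    (task rewards are independent), but the NSE penalty depends on the
    joint action, so the agents choose from the product of their sets.
    Exhaustive search here; a hypothetical stand-in for a distributed solver."""
    per_agent = [admissible_actions(q, slack) for q in q_tasks]
    return min(product(*per_agent), key=joint_nse)


if __name__ == "__main__":
    # Action 1 gives up 0.5 task value (within slack 1.0) to cut the NSE penalty.
    print(lexicographic_action([10.0, 9.5, 7.0], [3.0, 0.5, 0.0], slack=1.0))

    # Two agents incur a joint NSE of 2.0 when they pick the same action.
    congestion = lambda ja: 2.0 if ja[0] == ja[1] else 0.0
    print(coordinated_joint_action([[5.0, 4.8, 1.0], [5.0, 4.8, 1.0]],
                                   congestion, slack=0.5))
```

    With slack 1.0 the single agent picks action 1 rather than the task-greedy action 0; with slack 0 it falls back to action 0. In the two-agent case, each agent's admissible set is {0, 1}, and the joint search returns (0, 1), so the agents avoid the shared side effect without leaving their slack bounds. Exhaustive search grows exponentially with the number of agents, which is why a distributed coordination method would be needed at scale.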



    Information

    Published In

    AAMAS '24: Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems
    May 2024
    2898 pages
    ISBN:9798400704864

    Publisher

    International Foundation for Autonomous Agents and Multiagent Systems

    Richland, SC

    Publication History

    Published: 06 May 2024

    Author Tags

    1. ai safety
    2. cooperative multi-agent systems
    3. distributed constraint optimization problems
    4. negative side effects

    Qualifiers

    • Extended-abstract

    Funding Sources

    • ONR
    • NSF

    Conference

    AAMAS '24
    Acceptance Rates

    Overall Acceptance Rate 1,155 of 5,036 submissions, 23%
