Coarticulation: an approach for generating concurrent plans in Markov decision processes

Published: 07 August 2005
DOI: 10.1145/1102351.1102442

Abstract
We study an approach for performing concurrent activities in Markov decision processes (MDPs) based on the coarticulation framework. We assume that the agent has multiple degrees of freedom (DOF) in the action space, which enable it to perform several activities simultaneously. We demonstrate that one natural way of generating concurrency in the system is to coarticulate among the set of learned activities available to the agent. In general, due to the multiple DOF in the system, there often exists a redundant set of admissible sub-optimal policies associated with each learned activity. This flexibility enables the agent, given a new task defined in terms of a set of prioritized subgoals, to commit to several subgoals concurrently according to their priority levels. We present efficient approximate algorithms for computing such policies and for generating concurrent plans, and we evaluate our approach in a simulated domain.
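The priority-based idea in the abstract can be illustrated in code. The following is a minimal sketch, not the paper's actual algorithm: it assumes each learned activity exposes a Q-function, treats the set of actions within epsilon of optimal for a subgoal as that subgoal's "redundant admissible set", and intersects these sets in priority order so the agent serves lower-priority subgoals only when doing so does not violate higher-priority ones. All names here (`coarticulate`, `q_values`, `epsilon`) are hypothetical.

```python
def admissible_actions(q_sa, epsilon):
    """Actions whose value is within epsilon of the best action
    for one subgoal: the 'redundant' set of near-optimal choices."""
    best = max(q_sa.values())
    return {a for a, v in q_sa.items() if v >= best - epsilon}

def coarticulate(q_values, state, priorities, epsilon=0.1):
    """Pick a single action by intersecting admissible action sets
    in priority order (highest-priority subgoal first).

    q_values:   dict subgoal -> dict state -> dict action -> float
    priorities: list of subgoal names, highest priority first
    """
    top = priorities[0]
    candidates = admissible_actions(q_values[top][state], epsilon)
    for subgoal in priorities[1:]:
        refined = candidates & admissible_actions(q_values[subgoal][state], epsilon)
        if refined:
            # Commit to the lower-priority subgoal only if some action
            # serves it while remaining admissible for all higher ones.
            candidates = refined
    # Break remaining ties in favour of the highest-priority subgoal.
    return max(candidates, key=lambda a: q_values[top][state][a])
```

For example, if `right` is slightly sub-optimal for a high-priority reaching subgoal but also near-optimal for a lower-priority obstacle-avoidance subgoal, the sketch selects `right`, exploiting the redundancy instead of executing the subgoals sequentially.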


Published In
    ICML '05: Proceedings of the 22nd international conference on Machine learning
    August 2005
    1113 pages
    ISBN:1595931805
    DOI:10.1145/1102351

Publisher

Association for Computing Machinery, New York, NY, United States

    Acceptance Rates

    Overall Acceptance Rate 140 of 548 submissions, 26%

    Cited By

    View all
    • (2018)Modeling sensory-motor decisions in natural behaviorPLOS Computational Biology10.1371/journal.pcbi.100651814:10(e1006518)Online publication date: 25-Oct-2018
    • (2018)Learning Generalizable Control ProgramsIEEE Transactions on Autonomous Mental Development10.1109/TAMD.2010.21033113:3(216-231)Online publication date: 12-Dec-2018
    • (2018)Efficient behavior learning in human---robot collaborationAutonomous Robots10.1007/s10514-017-9674-542:5(1103-1115)Online publication date: 28-Dec-2018
    • (2016)Relational activity processes for modeling concurrent cooperation2016 IEEE International Conference on Robotics and Automation (ICRA)10.1109/ICRA.2016.7487765(5505-5511)Online publication date: May-2016
