
Learning agents for uncertain environments (extended abstract)

Published: 24 July 1998
DOI: 10.1145/279943.279964

    References

    [1]
    Astrom, K. J. (1965). Optimal control of Markov decision processes with incomplete state estimation. J. Math. Anal. Applic., 10, 174-205.
    [2]
Bertsekas, D. P., & Tsitsiklis, J. N. (1996). Neuro-Dynamic Programming. Athena Scientific, Belmont, Massachusetts.
    [3]
Binder, J., Koller, D., Russell, S., & Kanazawa, K. (1997a). Adaptive probabilistic networks with hidden variables. Machine Learning, 29, 213-244.
    [4]
Binder, J., Murphy, K., & Russell, S. (1997b). Space-efficient inference in dynamic probabilistic networks. In Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence (IJCAI-97), Nagoya, Japan. Morgan Kaufmann.
    [5]
Boyen, X., & Koller, D. (1998). Tractable inference for complex stochastic processes. In Proc. 14th Annual Conference on Uncertainty in AI (UAI), to appear.
    [6]
Dean, T., & Kanazawa, K. (1989). A model for reasoning about persistence and causation. Computational Intelligence, 5(3), 142-150.
    [7]
Doya, K., & Sejnowski, T. (1995). A novel reinforcement model of birdsong vocalization learning. In Tesauro, G., Touretzky, D., & Leen, T. (Eds.), Advances in Neural Information Processing Systems, Vol. 8, pp. 101-108, Denver, CO. MIT Press.
    [8]
Farley, C. T., & Taylor, C. R. (1991). A mechanical trigger for the trot-gallop transition in horses. Science, 253(5017), 306-308.
    [9]
Friedman, N., Murphy, K., & Russell, S. (1998). Learning the structure of dynamic probabilistic networks. In Uncertainty in Artificial Intelligence: Proceedings of the Fourteenth Conference, Madison, Wisconsin. Morgan Kaufmann.
    [10]
Hoyt, D., & Taylor, C. (1981). Gait and the energetics of locomotion in horses. Nature, 292, 239-240.
    [11]
Kaelbling, L. P., Littman, M. L., & Moore, A. W. (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4, 237-285.
    [12]
Kanazawa, K., Koller, D., & Russell, S. (1995). Stochastic simulation algorithms for dynamic probabilistic networks. In Uncertainty in Artificial Intelligence: Proceedings of the Eleventh Conference, pp. 346-351, Montreal, Canada. Morgan Kaufmann.
    [13]
    Keeney, R. L., & Raiffa, H. (1976). Decisions with Multiple Objectives: Preferences and Value Tradeoffs. Wiley, New York.
    [14]
McCallum, A. R. (1993). Overcoming incomplete perception with utile distinction memory. In Proceedings of the Tenth International Conference on Machine Learning, pp. 190-196, Amherst, Massachusetts. Morgan Kaufmann.
    [15]
Montague, P. R., Dayan, P., Person, C., & Sejnowski, T. J. (1995). Bee foraging in uncertain environments using predictive Hebbian learning. Nature, 377, 725-728.
    [16]
Parr, R., & Russell, S. (1995). Approximating optimal policies for partially observable stochastic domains. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI-95), Montreal, Canada. Morgan Kaufmann.
    [17]
Parr, R., & Russell, S. (1998). Reinforcement learning with hierarchies of machines. In Kearns, M. (Ed.), Advances in Neural Information Processing Systems 10. MIT Press, Cambridge, Massachusetts.
    [18]
Russell, S. J., & Norvig, P. (1995). Artificial Intelligence: A Modern Approach. Prentice-Hall, Englewood Cliffs, New Jersey.
    [19]
Rust, J. (1994). Do people behave according to Bellman's principle of optimality? Submitted to Journal of Economic Perspectives.
    [20]
    Sargent, T. J. (1978). Estimation of dynamic labor demand schedules under rational expectations. Journal of Political Economy, 86(6), 1009-1044.
    [21]
    Schmajuk, N. A., & Zanutto, B. S. (1997). Escape, avoidance, and imitation: a neural network approach. Adaptive Behavior, 6(1), 63-129.
    [22]
    Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9-44.
    [23]
Touretzky, D. S., & Saksida, L. M. (1997). Operant conditioning in Skinnerbots. Adaptive Behavior, 5(3-4), 219-247.




    Published In

    COLT' 98: Proceedings of the eleventh annual conference on Computational learning theory
    July 1998
    304 pages
ISBN: 1581130570
DOI: 10.1145/279943
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • Article

    Conference

COLT98

    Acceptance Rates

Overall acceptance rate: 35 of 71 submissions, 49%



    Article Metrics

• Downloads (last 12 months): 450
• Downloads (last 6 weeks): 38


    Cited By

• (2024) Bridging Requirements, Planning, and Evaluation: A Review of Social Robot Navigation. Sensors, 24(9), 2794. DOI: 10.3390/s24092794. Online publication date: 27-Apr-2024.
• (2024) Inference of Utilities and Time Preference in Sequential Decision-Making. SSRN Electronic Journal. DOI: 10.2139/ssrn.4840776. Online publication date: 2024.
• (2024) Learning Agents in Robot Navigation: Trends and Next Challenges. Journal of Robotics and Mechatronics, 36(3), 508-516. DOI: 10.20965/jrm.2024.p0508. Online publication date: 20-Jun-2024.
• (2024) Estimation of Different Reward Functions Latent in Trajectory Data. Journal of Advanced Computational Intelligence and Intelligent Informatics, 28(2), 403-412. DOI: 10.20965/jaciii.2024.p0403. Online publication date: 20-Mar-2024.
• (2024) Imitation Learning: Progress, Taxonomies and Challenges. IEEE Transactions on Neural Networks and Learning Systems, 35(5), 6322-6337. DOI: 10.1109/TNNLS.2022.3213246. Online publication date: May-2024.
• (2024) Learning to Evaluate Potential Safety Risk of Ego-Vehicle via Inverse Reinforcement Learning. 2024 IEEE 7th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), 1520-1526. DOI: 10.1109/IAEAC59436.2024.10503957. Online publication date: 15-Mar-2024.
• (2024) Off-Dynamics Inverse Reinforcement Learning. IEEE Access, 12, 65117-65127. DOI: 10.1109/ACCESS.2024.3394242. Online publication date: 2024.
• (2024) Combined Constraint on Behavior Cloning and Discriminator in Offline Reinforcement Learning. IEEE Access, 12, 19942-19951. DOI: 10.1109/ACCESS.2024.3361030. Online publication date: 2024.
• (2024) Machine learning meets advanced robotic manipulation. Information Fusion, 105, 102221. DOI: 10.1016/j.inffus.2023.102221. Online publication date: May-2024.
• (2024) A two-stage framework for parking search behavior prediction through adversarial inverse reinforcement learning and transformer. Expert Systems with Applications, 255, 124548. DOI: 10.1016/j.eswa.2024.124548. Online publication date: Dec-2024.
