DOI: 10.5555/1036843.1036863

Metrics for finite Markov decision processes

Published: 07 July 2004

Abstract

We present metrics for measuring the similarity of states in a finite Markov decision process (MDP). The formulation of our metrics is based on the notion of bisimulation for MDPs, with an aim towards solving discounted infinite horizon reinforcement learning tasks. Such metrics can be used to aggregate states, as well as to better structure other value function approximators (e.g., memory-based or nearest-neighbor approximators). We provide bounds that relate our metric distances to the optimal values of states in the given MDP.
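To make the idea concrete, here is a minimal sketch of the kind of fixed-point iteration such bisimulation metrics admit. It is not the paper's full algorithm: it assumes a *deterministic* toy MDP, where the Kantorovich term over next-state distributions collapses to the distance between the unique successor states, and it weights the reward and transition terms by (1 - gamma) and gamma respectively. All names and the example MDP are hypothetical.

```python
# Toy sketch (not the paper's full algorithm): bisimulation-metric
# fixed-point iteration on a deterministic finite MDP. With deterministic
# transitions, the Kantorovich distance between next-state distributions
# reduces to the metric distance between the unique successor states.
import itertools

gamma = 0.9                      # discount factor
states = [0, 1, 2]
actions = ["a"]

# Hypothetical toy MDP: states 0 and 1 behave identically (bisimilar);
# state 2 earns no reward. All transitions are deterministic self-loops.
reward = {(0, "a"): 1.0, (1, "a"): 1.0, (2, "a"): 0.0}
succ = {(0, "a"): 0, (1, "a"): 1, (2, "a"): 2}   # unique successor state

# Iterate d <- max_a [(1-gamma)*|r(s,a)-r(t,a)| + gamma*d(succ_s, succ_t)]
d = {(s, t): 0.0 for s, t in itertools.product(states, states)}
for _ in range(300):
    d = {
        (s, t): max(
            (1 - gamma) * abs(reward[s, a] - reward[t, a])
            + gamma * d[succ[s, a], succ[t, a]]
            for a in actions
        )
        for s, t in itertools.product(states, states)
    }

# Bisimilar states end at distance 0; distinguishable states do not.
print(round(d[0, 1], 6), round(d[0, 2], 6))   # prints "0.0 1.0"
```

For general (stochastic) transitions, each update instead requires solving a small transportation problem per state pair to evaluate the Kantorovich distance, which is where the bulk of the computational cost lies.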


Published In

UAI '04: Proceedings of the 20th conference on Uncertainty in artificial intelligence
July 2004
657 pages
ISBN: 0974903906

Sponsors

  • Alberta Ingenuity Centre for Machine Learning
  • Sun Microsystems of Canada
  • Hewlett-Packard Laboratories
  • Information Extraction and Transportation
  • Informatics Circle of Research Excellence
  • Yahoo! Research Labs
  • IBM Research
  • Intel
  • Microsoft Research
  • Pacific Institute of Mathematical Sciences
  • Boeing
  • University of Alberta
  • Northrop Grumman Corporation

Publisher

AUAI Press

Arlington, Virginia, United States


Cited By

  • (2024) Provable representation with efficient planning for partially observable reinforcement learning. Proceedings of the 41st International Conference on Machine Learning, pp. 59759-59782. DOI: 10.5555/3692070.3694540
  • (2023) State-action similarity-based representations for off-policy evaluation. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 42298-42329. DOI: 10.5555/3666122.3667956
  • (2023) Energy-based predictive representations for partially observed reinforcement learning. Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, pp. 2477-2487. DOI: 10.5555/3625834.3626065
  • (2022) Learning representations via a robust behavioral metric for deep reinforcement learning. Proceedings of the 36th International Conference on Neural Information Processing Systems, pp. 36654-36666. DOI: 10.5555/3600270.3602926
  • (2022) Deciding what to model. Proceedings of the 36th International Conference on Neural Information Processing Systems, pp. 9024-9044. DOI: 10.5555/3600270.3600926
  • (2021) A Sufficient Statistic for Influence in Structured Multiagent Environments. Journal of Artificial Intelligence Research, 70, pp. 789-870. DOI: 10.1613/jair.1.12136
  • (2021) Jointly-Learned State-Action Embedding for Efficient Reinforcement Learning. Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pp. 1447-1456. DOI: 10.1145/3459637.3482357
  • (2021) Smart choices and the selection monad. Proceedings of the 36th Annual ACM/IEEE Symposium on Logic in Computer Science, pp. 1-14. DOI: 10.1109/LICS52264.2021.9470641
  • (2020) Domain adaptive imitation learning. Proceedings of the 37th International Conference on Machine Learning, pp. 5286-5295. DOI: 10.5555/3524938.3525428
  • (2020) On efficiency in hierarchical reinforcement learning. Proceedings of the 34th International Conference on Neural Information Processing Systems, pp. 6708-6718. DOI: 10.5555/3495724.3496287
