DOI: 10.5555/1036843.1036863

Metrics for finite Markov decision processes

Published: 07 July 2004

Abstract

We present metrics for measuring the similarity of states in a finite Markov decision process (MDP). The formulation of our metrics is based on the notion of bisimulation for MDPs, with an aim towards solving discounted infinite horizon reinforcement learning tasks. Such metrics can be used to aggregate states, as well as to better structure other value function approximators (e.g., memory-based or nearest-neighbor approximators). We provide bounds that relate our metric distances to the optimal values of states in the given MDP.
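To make the idea concrete, here is a minimal sketch of the kind of fixed-point iteration such bisimulation metrics admit. It is not the paper's full algorithm: it assumes a *deterministic* toy MDP, where the Kantorovich term over next-state distributions collapses to the distance between the unique successor states, and it weights the reward and transition terms by (1 - gamma) and gamma respectively. All names and the example MDP are hypothetical.

```python
# Toy sketch (not the paper's full algorithm): bisimulation-metric
# fixed-point iteration on a deterministic finite MDP. With deterministic
# transitions, the Kantorovich distance between next-state distributions
# reduces to the metric distance between the unique successor states.
import itertools

gamma = 0.9                      # discount factor
states = [0, 1, 2]
actions = ["a"]

# Hypothetical toy MDP: states 0 and 1 behave identically (bisimilar);
# state 2 earns no reward. All transitions are deterministic self-loops.
reward = {(0, "a"): 1.0, (1, "a"): 1.0, (2, "a"): 0.0}
succ = {(0, "a"): 0, (1, "a"): 1, (2, "a"): 2}   # unique successor state

# Iterate d <- max_a [(1-gamma)*|r(s,a)-r(t,a)| + gamma*d(succ_s, succ_t)]
d = {(s, t): 0.0 for s, t in itertools.product(states, states)}
for _ in range(300):
    d = {
        (s, t): max(
            (1 - gamma) * abs(reward[s, a] - reward[t, a])
            + gamma * d[succ[s, a], succ[t, a]]
            for a in actions
        )
        for s, t in itertools.product(states, states)
    }

# Bisimilar states end at distance 0; distinguishable states do not.
print(round(d[0, 1], 6), round(d[0, 2], 6))   # prints "0.0 1.0"
```

For general (stochastic) transitions, each update instead requires solving a small transportation problem per state pair to evaluate the Kantorovich distance, which is where the bulk of the computational cost lies.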


Published In

UAI '04: Proceedings of the 20th conference on Uncertainty in artificial intelligence
July 2004
657 pages
ISBN: 0974903906

Sponsors

  • Alberta Ingenuity Centre for Machine Learning
  • Sun Microsystems of Canada
  • Hewlett-Packard Laboratories
  • Information Extraction and Transportation
  • Informatics Circle of Research Excellence
  • Yahoo! Research Labs
  • IBM Research
  • Intel
  • Microsoft Research
  • Pacific Institute of Mathematical Sciences
  • Boeing
  • University of Alberta
  • Northrop Grumman Corporation

Publisher

AUAI Press

Arlington, Virginia, United States


Cited By

  • (2024) Provable representation with efficient planning for partially observable reinforcement learning. Proceedings of the 41st International Conference on Machine Learning, pp. 59759-59782. DOI: 10.5555/3692070.3694540
  • (2023) State-action similarity-based representations for off-policy evaluation. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 42298-42329. DOI: 10.5555/3666122.3667956
  • (2023) Energy-based predictive representations for partially observed reinforcement learning. Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, pp. 2477-2487. DOI: 10.5555/3625834.3626065
  • (2022) Learning representations via a robust behavioral metric for deep reinforcement learning. Proceedings of the 36th International Conference on Neural Information Processing Systems, pp. 36654-36666. DOI: 10.5555/3600270.3602926
  • (2022) Deciding what to model. Proceedings of the 36th International Conference on Neural Information Processing Systems, pp. 9024-9044. DOI: 10.5555/3600270.3600926
  • (2021) A Sufficient Statistic for Influence in Structured Multiagent Environments. Journal of Artificial Intelligence Research, 70, pp. 789-870. DOI: 10.1613/jair.1.12136
  • (2021) Jointly-Learned State-Action Embedding for Efficient Reinforcement Learning. Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pp. 1447-1456. DOI: 10.1145/3459637.3482357
  • (2021) Smart choices and the selection monad. Proceedings of the 36th Annual ACM/IEEE Symposium on Logic in Computer Science, pp. 1-14. DOI: 10.1109/LICS52264.2021.9470641
  • (2020) Domain adaptive imitation learning. Proceedings of the 37th International Conference on Machine Learning, pp. 5286-5295. DOI: 10.5555/3524938.3525428
  • (2020) On efficiency in hierarchical reinforcement learning. Proceedings of the 34th International Conference on Neural Information Processing Systems, pp. 6708-6718. DOI: 10.5555/3495724.3496287
