
Bisimulation metrics are optimal value functions

Published: 23 July 2014

Abstract

Bisimulation is a notion of behavioural equivalence on the states of a transition system. Its definition has been extended to Markov decision processes, where it can be used to aggregate states. A bisimulation metric is a quantitative analog of bisimulation that measures how similar states are from the perspective of long-term behavior. Bisimulation metrics have been used to establish approximation bounds for state aggregation and other forms of value function approximation. In this paper, we prove that a bisimulation metric defined on the state space of a Markov decision process is the optimal value function of an optimal coupling of two copies of the original model. We prove the result in the general case of continuous state spaces. This result has important implications for understanding the complexity of computing such metrics, and opens up the possibility of more efficient computational methods.
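
To make the object concrete, the sketch below computes the discounted bisimulation metric for a small finite MDP by iterating its fixed-point equation, d(s,t) = max_a [ |R(s,a) - R(t,a)| + γ · K_d(P(·|s,a), P(·|t,a)) ], where K_d is the Kantorovich (optimal-coupling) distance with ground metric d. This is an illustrative reconstruction of the standard finite-state formulation, not code from the paper; the function names, the toy MDP, and the discount γ = 0.9 are assumptions made for the example.

```python
# Minimal sketch (assumed formulation, not the authors' code): discounted
# bisimulation metric for a finite MDP via fixed-point iteration, with the
# Kantorovich (optimal-coupling) distance solved as a transportation LP.
import numpy as np
from scipy.optimize import linprog


def kantorovich(p, q, d):
    """Optimal-coupling distance between distributions p and q under ground metric d."""
    n = len(p)
    cost = d.reshape(-1)                     # cost of the coupling pi, flattened row-major
    A_eq = np.zeros((2 * n, n * n))
    for i in range(n):
        A_eq[i, i * n:(i + 1) * n] = 1.0     # row marginals:    sum_j pi[i, j] = p[i]
        A_eq[n + i, i::n] = 1.0              # column marginals: sum_k pi[k, i] = q[i]
    b_eq = np.concatenate([p, q])
    res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.fun


def bisimulation_metric(R, P, gamma=0.9, iters=500, tol=1e-8):
    """Iterate d(s,t) <- max_a [ |R[s,a] - R[t,a]| + gamma * K_d(P[s,a], P[t,a]) ].

    R has shape (S, A); P has shape (S, A, S), with P[s, a] a next-state distribution."""
    S, A = R.shape
    d = np.zeros((S, S))
    for _ in range(iters):
        d_new = np.zeros_like(d)
        for s in range(S):
            for t in range(s + 1, S):        # the metric is symmetric with zero diagonal
                gap = max(abs(R[s, a] - R[t, a])
                          + gamma * kantorovich(P[s, a], P[t, a], d)
                          for a in range(A))
                d_new[s, t] = d_new[t, s] = gap
        if np.max(np.abs(d_new - d)) < tol:
            return d_new
        d = d_new
    return d


# Toy 3-state, 2-action MDP (illustrative numbers only).
R = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])
P = np.array([[[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]],
              [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]],
              [[0.0, 0.1, 0.9], [0.9, 0.1, 0.0]]])
print(bisimulation_metric(R, P))
```

The inner linear program produces an optimal coupling of the two next-state distributions; the paper's result is that, viewed on pairs of states, iterating this update amounts to value iteration for a coupled copy of the model, so the metric itself is an optimal value function.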


Cited By

  • (2023) State-action similarity-based representations for off-policy evaluation. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 42298-42329. DOI: 10.5555/3666122.3667956. Online publication date: 10 December 2023.
  • (2021) A Sufficient Statistic for Influence in Structured Multiagent Environments. Journal of Artificial Intelligence Research, 70, 789-870. DOI: 10.1613/jair.1.12136. Online publication date: 24 February 2021.

Published In

UAI'14: Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence
July 2014
926 pages
ISBN: 9780974903910
Editors: Nevin Zhang, Jin Tian

Sponsors

  • Google Inc.
  • Artificial Intelligence Journal
  • IBM Research
  • Microsoft Research
  • Facebook

Publisher

AUAI Press

Arlington, Virginia, United States

