
Learning good state and action representations for Markov decision process via tensor decomposition

Published: 06 March 2024

Abstract

The transition kernel of a continuous-state-action Markov decision process (MDP) admits a natural tensor structure. This paper proposes a tensor-inspired unsupervised learning method to identify meaningful low-dimensional state and action representations from empirical trajectories. The method exploits the MDP's tensor structure via kernelization, importance sampling, and low-Tucker-rank approximation. It can further be used to cluster states and actions, respectively, and to find the best discrete MDP abstraction. We provide sharp statistical error bounds for tensor concentration and for the preservation of diffusion distance after embedding. We further prove that, when latent block structures exist, the learned state/action abstractions approximate them accurately, enabling function approximation in downstream tasks such as policy evaluation.
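
To make the pipeline in the abstract concrete, here is a minimal sketch, not the paper's implementation: it assumes a finite (tabular) state and action space, uses simple count-based estimation in place of the paper's kernelization and importance-sampling steps, computes a truncated HOSVD as the low-Tucker-rank approximation, and clusters the resulting state/action embeddings to obtain a discrete abstraction. All function names, ranks, and sizes below are illustrative.

```python
# Illustrative sketch of the tensor-based representation-learning pipeline
# (tabular stand-in; not the paper's kernelized/importance-sampled estimator).
import numpy as np
from sklearn.cluster import KMeans

def empirical_transition_tensor(trajectory, n_states, n_actions):
    """Count-based estimate of P(s' | s, a), arranged as an S x A x S tensor."""
    T = np.zeros((n_states, n_actions, n_states))
    for s, a, s_next in trajectory:
        T[s, a, s_next] += 1.0
    totals = T.sum(axis=2, keepdims=True)
    return np.divide(T, totals, out=np.zeros_like(T), where=totals > 0)

def unfold(tensor, mode):
    """Mode-n unfolding: move `mode` to the front and flatten the other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def hosvd_embeddings(T, ranks):
    """Truncated HOSVD: leading left singular vectors of each mode unfolding."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])
    # Rows of factors[0] / factors[1] serve as state / action embeddings.
    return factors

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_states, n_actions = 12, 4
    # Synthetic trajectory of (state, action, next_state) triples.
    trajectory = [(rng.integers(n_states), rng.integers(n_actions),
                   rng.integers(n_states)) for _ in range(5000)]
    T = empirical_transition_tensor(trajectory, n_states, n_actions)
    state_U, action_U, _ = hosvd_embeddings(T, ranks=(3, 2, 3))
    # Cluster the low-dimensional embeddings to form a discrete MDP abstraction.
    state_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(state_U)
    action_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(action_U)
    print("state clusters:", state_labels)
    print("action clusters:", action_labels)
```

In the continuous-state-action setting of the paper, the counting step would be replaced by kernel features and importance weights, but the downstream low-Tucker-rank factorization and clustering play the same role as in this tabular sketch.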



Published In

The Journal of Machine Learning Research, Volume 24, Issue 1, January 2023 (18881 pages)
ISSN: 1532-4435
EISSN: 1533-7928
CC-BY 4.0

Publisher

JMLR.org

Publication History

Published: 06 March 2024
Accepted: 01 February 2023
Revised: 01 February 2023
Received: 01 August 2022
Published in JMLR Volume 24, Issue 1

