K-spin Hamiltonian for quantum-resolvable Markov decision processes

  • Research Article
  • Quantum Machine Intelligence

Abstract

The Markov decision process is the mathematical formalization underlying the modern field of reinforcement learning, which addresses the case where the transition and reward functions are unknown. We derive a pseudo-Boolean cost function that is equivalent to a K-spin Hamiltonian representation of the discrete, finite, discounted Markov decision process with infinite horizon. This K-spin Hamiltonian furnishes a starting point from which to solve for an optimal policy using heuristic quantum algorithms such as adiabatic quantum annealing and the quantum approximate optimization algorithm on near-term quantum hardware. In arguing that the variational minimization of our Hamiltonian is approximately equivalent to the Bellman optimality condition for a prevalent class of environments, we establish an analogy with classical field theory. In addition to proof-of-concept calculations that corroborate our formulation by comparing simulated and quantum annealing against classical Q-learning, we analyze the scaling of the physical resources required to solve our Hamiltonian on quantum hardware.
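
The abstract refers to the Bellman optimality condition and to solving for an optimal policy as a discrete optimization problem. For a discounted MDP with transition probabilities P(s'|s,a), rewards R(s,a) and discount factor gamma < 1, the optimal value function satisfies V*(s) = max_a [R(s,a) + gamma * sum_{s'} P(s'|s,a) V*(s')], and an optimal deterministic policy selects one action per state. The Python sketch below is a minimal illustration under stated assumptions: the two-state, two-action MDP (the numbers in P, R and gamma) is invented for illustration, and value_iteration and policy_value are hypothetical helpers introduced here. It solves the Bellman condition classically by value iteration and then, as a crude stand-in for searching over a binary encoding of policies, enumerates every deterministic policy. It is not the paper's K-spin Hamiltonian, its pseudo-Boolean cost function, or its annealing workflow.

```python
import itertools

import numpy as np

# Hypothetical toy MDP (not from the paper): 2 states, 2 actions.
# P[a, s, s'] is the transition probability, R[s, a] the expected reward.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],  # action 0
    [[0.1, 0.9], [0.7, 0.3]],  # action 1
])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9
n_states, n_actions = R.shape


def value_iteration(P, R, gamma, tol=1e-10):
    """Iterate V <- max_a [R(s,a) + gamma * sum_s' P(s'|s,a) V(s')] to its fixed point."""
    V = np.zeros(R.shape[0])
    while True:
        Q = R + gamma * np.einsum("ast,t->sa", P, V)  # Q[s, a]
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)  # values and greedy policy
        V = V_new


def policy_value(policy, P, R, gamma):
    """Exactly solve the linear system V = R_pi + gamma * P_pi V for a deterministic policy."""
    idx = np.arange(len(policy))
    P_pi = P[policy, idx, :]  # row s is P[policy[s], s, :]
    R_pi = R[idx, policy]     # entry s is R[s, policy[s]]
    return np.linalg.solve(np.eye(len(policy)) - gamma * P_pi, R_pi)


# Brute force over every deterministic policy (one action index per state,
# equivalently a one-hot binary string of length n_states * n_actions).
best_policy, best_cost = None, np.inf
for candidate in itertools.product(range(n_actions), repeat=n_states):
    candidate = np.array(candidate)
    cost = -policy_value(candidate, P, R, gamma).sum()  # minimize negative total value
    if cost < best_cost:
        best_policy, best_cost = candidate, cost

V_star, greedy = value_iteration(P, R, gamma)
print("value iteration policy:", greedy, "V*:", V_star)
print("brute-force policy:    ", best_policy)
```

Because the value of any policy is bounded above by V* in every state, a policy that maximizes the summed value is also optimal state by state, so both searches should agree on the values. The enumeration grows as n_actions ** n_states, which is the kind of combinatorial search the abstract proposes handing to annealing hardware.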

Figs. 1–6

Acknowledgments

This work was authored in part by the National Renewable Energy Laboratory (NREL), operated by Alliance for Sustainable Energy, LLC, for the U.S. Department of Energy (DOE) under Contract No. DE-AC36-08GO28308. This research used Ising, Los Alamos National Laboratory’s D-Wave quantum annealer. Ising is supported by NNSA’s Advanced Simulation and Computing program. The authors would like to thank Scott Pakin and Denny Dahl.

Funding

This work was supported by the Laboratory Directed Research and Development (LDRD) Program at NREL. This material is based in part upon work supported by the National Science Foundation under Grant No. PHY-1653820.

Author information

Corresponding author

Correspondence to Eric B. Jones.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Disclaimer

The views expressed in the article do not necessarily represent the views of the DOE or the U.S. Government. The U.S. Government retains and the publisher, by accepting the article for publication, acknowledges that the U.S. Government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this work, or allow others to do so, for U.S. Government purposes.

Code availability

The code that supports the findings of this study is available from the corresponding author upon reasonable request.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Electronic supplementary material

One supplementary file (PDF, 250 KB) accompanies this article.

About this article

Cite this article

Jones, E.B., Graf, P., Kapit, E. et al. K-spin Hamiltonian for quantum-resolvable Markov decision processes. Quantum Mach. Intell. 2, 12 (2020). https://doi.org/10.1007/s42484-020-00026-6

  • DOI: https://doi.org/10.1007/s42484-020-00026-6