
Grammar-aware sentence classification on quantum computers

Research Article · Published in Quantum Machine Intelligence (2023)

Abstract

Natural language processing (NLP) is at the forefront of great advances in contemporary AI, and it is arguably one of the most challenging areas of the field. At the same time, in the area of quantum computing (QC), with the steady growth of quantum hardware and notable improvements towards implementations of quantum algorithms, we are approaching an era when quantum computers perform tasks that cannot be done on classical computers with a reasonable amount of resources. This provides a new range of opportunities for AI, and for NLP specifically. Here, we work with the Categorical Distributional Compositional (DisCoCat) model of natural language meaning, whose mathematical underpinnings make it amenable to quantum instantiations. Earlier work on fault-tolerant quantum algorithms has already demonstrated potential quantum advantage for NLP, notably employing DisCoCat. In this work, we focus on the capabilities of noisy intermediate-scale quantum (NISQ) hardware and perform the first implementation of an NLP task on a NISQ processor, using the DisCoCat framework. Sentences are instantiated as parameterised quantum circuits: word meanings are embedded in quantum states using parameterised quantum circuits, and the sentence’s grammatical structure faithfully manifests as a pattern of entangling operations which compose the word-circuits into a sentence-circuit. The circuits’ parameters are trained using a classical optimiser in a supervised NLP task of binary classification. Our novel QNLP model shows concrete promise for scalability as the quality of the quantum hardware improves in the near future, and it solidifies a novel branch of experimental research at the intersection of QC and AI.

Data availability

The toy datasets of generated sentences used in this work can be found in Appendix D. Further information about the datasets generated and analysed during the current study is available from the corresponding author on reasonable request.

References

  • Aaronson S (2015) Read the fine print. Nat Phys 11(4):291–293

  • Abramsky S, Coecke B (2004) A categorical semantics of quantum protocols. In: Proceedings of the 19th Annual IEEE Symposium on Logic in Computer Science, 2004., pp. 415–425. https://doi.org/10.1109/LICS.2004.1319636

  • Aharonov D, Jones V, Landau Z (2008) A polynomial quantum algorithm for approximating the Jones polynomial. Algorithmica 55(3):395–421. https://doi.org/10.1007/s00453-008-9168-0

  • Arute F, Arya K, Babbush R, Bacon D, Bardin JC, Barends R, Biswas R, Boixo S, Brandao FGSL, Buell DA, Burkett B, Chen Y, Chen Z, Chiaro B, Collins R, Courtney W, Dunsworth A, Farhi E, Foxen B, Fowler A, Gidney C, Giustina M, Graff R, Guerin K, Habegger S, Harrigan MP, Hartmann MJ, Ho A, Hoffmann M, Huang T, Humble TS, Isakov SV, Jeffrey E, Jiang Z, Kafri D, Kechedzhi K, Kelly J, Klimov PV, Knysh S, Korotkov A, Kostritsa F, Landhuis D, Lindmark M, Lucero E, Lyakh D, Mandrà S, McClean JR, McEwen M, Megrant A, Mi X, Michielsen K, Mohseni M, Mutus J, Naaman O, Neeley M, Neill C, Niu MY, Ostby E, Petukhov A, Platt JC, Quintana C, Rieffel EG, Roushan P, Rubin NC, Sank D, Satzinger KJ, Smelyanskiy V, Sung KJ, Trevithick MD, Vainsencher A, Villalonga B, White T, Yao ZJ, Yeh P, Zalcman A, Neven H, Martinis JM (2019) Quantum supremacy using a programmable superconducting processor. Nature 574(7779):505–510. https://doi.org/10.1038/s41586-019-1666-5

  • Baez JC, Stay M (2009) Physics, topology, logic and computation: a Rosetta Stone

  • Bankova D, Coecke B, Lewis M, Marsden D (2016) Graded entailment for compositional distributional semantics

  • Bausch J, Subramanian S, Piddock S (2020) A quantum search decoder for natural language processing

  • Beer K, Bondarenko D, Farrelly T, Osborne TJ, Salzmann R, Scheiermann D, Wolf R (2020) Training deep quantum neural networks. Nature Communications 11(1). https://doi.org/10.1038/s41467-020-14454-2

  • Benedetti M, Fiorentini M, Lubasch M (2020) Hardware-efficient variational quantum algorithms for time evolution

  • Benedetti M, Lloyd E, Sack S, Fiorentini M (2019) Parameterized quantum circuits as machine learning models. Quantum Science and Technology 4(4):043001. https://doi.org/10.1088/2058-9565/ab4eb5

  • Bharti K, Cervera-Lierta A, Kyaw TH, Haug T, Alperin-Lea S, Anand A, Degroote M, Heimonen H, Kottmann JS, Menke T, Mok W-K, Sim S, Kwek L-C, Aspuru-Guzik A (2021) Noisy intermediate-scale quantum (NISQ) algorithms

  • Blackburn P, Bos J (2005) Representation and inference for natural language: a first course in computational semantics. Center for the Study of Language and Information, Stanford, CA

  • Bonet-Monroig X, Wang H, Vermetten D, Senjean B, Moussa C, Bäck T, Dunjko V, O’Brien TE (2021) Performance comparison of optimization methods on variational quantum algorithms

  • Bradbury J, Frostig R, Hawkins P, Johnson MJ, Leary C, Maclaurin D, Wanderman-Milne S (2018) JAX: Composable Transformations of Python+NumPy programs. http://github.com/google/jax

  • Bradley T-D, Stoudenmire EM, Terilla J (2019) Modeling sequences with quantum states: a look under the hood

  • Brown TB, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A, Agarwal S, Herbert-Voss A, Krueger G, Henighan T, Child R, Ramesh A, Ziegler DM, Wu J, Winter C, Hesse C, Chen M, Sigler E, Litwin M, Gray S, Chess B, Clark J, Berner C, McCandlish S, Radford A, Sutskever I, Amodei D (2020) Language models are Few-Shot learners

  • Buhrmester V, Münch D, Arens M (2019) Analysis of explainers of black box deep neural networks for computer vision: a survey

  • Buszkowski W, Moroz K (2007) Pregroup Grammars and Context-free Grammars

  • Chen JC (2002) Quantum computation and natural language processing

  • Chen Y, Pan Y, Dong D (2020) Quantum language model with entanglement embedding for question answering

  • Chia N-H, Gilyén A, Li T, Lin H-H, Tang E, Wang C (2020) Sampling-based sublinear low-rank matrix arithmetic framework for dequantizing quantum machine learning. Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing. https://doi.org/10.1145/3357713.3384314

  • Chomsky N (1957) Syntactic structures. Mouton

  • Coecke B (2020) The mathematics of text structure

  • Coecke B, Kissinger A (2017) Picturing quantum processes. a first course in quantum theory and diagrammatic reasoning. Cambridge University Press, Cambridge. https://doi.org/10.1017/9781316219317

  • Coecke B, Sadrzadeh M, Clark S (2010) Mathematical foundations for a compositional distributional model of meaning

  • Coecke B, de Felice G, Meichanetzidis K, Toumi A (2020) Foundations for near-term quantum natural language processing

  • Cowtan A, Dilkes S, Duncan R, Krajenbrink A, Simmons W, Sivarajah S (2019) On the qubit routing problem. In: van Dam W, Mancinska L (eds) 14th Conference on the Theory of Quantum Computation, Communication and Cryptography (TQC 2019). Leibniz International Proceedings in Informatics (LIPIcs), vol 135. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, pp 5:1–5:32. https://doi.org/10.4230/LIPIcs.TQC.2019.5. http://drops.dagstuhl.de/opus/volltexte/2019/10397

  • de Felice G, Meichanetzidis K, Toumi A (2020) Functorial question answering. Electronic Proceedings in Theoretical Computer Science 323:84–94. https://doi.org/10.4204/eptcs.323.6

  • de Felice G, Toumi A, Coecke B (2020) Discopy: monoidal categories in Python

  • Dunjko V, Taylor JM, Briegel HJ (2016) Quantum-enhanced machine learning. Physical Review Letters 117(13). https://doi.org/10.1103/physrevlett.117.130501

  • Efthymiou S, Hidary J, Leichenauer S (2019) Tensornetwork for machine learning

  • Eisert J (2013) Entanglement and tensor network states

  • Gallego AJ, Orus R (2019) Language design as information renormalization

  • Gao F, Han L (2010) Implementing the Nelder-Mead simplex algorithm with adaptive parameters. Comput Optim Appl 51(1):259–277. https://doi.org/10.1007/s10589-010-9329-3

  • Grefenstette E, Sadrzadeh M (2011) Experimental support for a categorical compositional distributional model of meaning. In: The 2014 conference on empirical methods on natural language processing. arXiv:1106.4058, pp 1394–1404

  • Harrow AW, Hassidim A, Lloyd S (2009) Quantum algorithm for linear systems of equations. Physical Review Letters 103(15). https://doi.org/10.1103/physrevlett.103.150502

  • Havlíček V, Córcoles AD, Temme K, Harrow AW, Kandala A, Chow JM, Gambetta JM (2019) Supervised learning with quantum-enhanced feature spaces. Nature 567(7747):209–212. https://doi.org/10.1038/s41586-019-0980-2

  • Jurafsky D, Martin JH (2000) Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition, vol 1. Prentice Hall PTR, USA

  • Kartsaklis D, Fan I, Yeung R, Pearson A, Lorenz R, Toumi A, de Felice G, Meichanetzidis K, Clark S, Coecke B (2021) Lambeq: An Efficient High-Level Python Library for Quantum NLP

  • Kartsaklis D, Sadrzadeh M (2013) Prior disambiguation of word tensors for constructing sentence vectors. In: The 2013 conference on empirical methods on natural language processing. ACL, pp 1590–1601

  • Kerenidis I, Landman J, Luongo A, Prakash A (2019) Q-means: a quantum algorithm for unsupervised machine learning. In: Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R (eds) Advances in neural information processing systems. https://proceedings.neurips.cc/paper/2019/file/16026d60ff9b54410b3435b403afd226-Paper.pdf , vol 32. Curran Associates Inc, pp 4134–4144

  • Lambek J (1958) The mathematics of sentence structure. American Mathematical Monthly 65:154–170

  • Lambek J (2008) From word to sentence

  • Lewis M (2020) Towards logical negation for compositional distributional semantics

  • Li Z, Liu X, Xu N, Du J (2015) Experimental realization of a quantum support vector machine. Physical Review Letters 114(14). https://doi.org/10.1103/physrevlett.114.140504

  • Lloyd S, Schuld M, Ijaz A, Izaac J, Killoran N (2020) Quantum embeddings for machine learning

  • Meichanetzidis K, Gogioso S, Felice GD, Chiappori N, Toumi A, Coecke B (2020) Quantum natural language processing on near-term quantum computers

  • Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space

  • Mitarai K, Fujii K (2019) Methodology for replacing indirect measurements with direct measurements. Physical Review Research 1(1). https://doi.org/10.1103/physrevresearch.1.013006

  • Montague R (2008) Universal grammar. Theoria 36(3):373–398. https://doi.org/10.1111/j.1755-2567.1970.tb00434.x

  • O’Riordan LJ, Doyle M, Baruffa F, Kannan V (2020) A hybrid classical-quantum workflow for natural language processing. Machine Learning: Science and Technology. https://doi.org/10.1088/2632-2153/abbd2e

  • Olson B, Hashmi I, Molloy K, Shehu A (2012) Basin hopping as a general and versatile optimization framework for the characterization of biological macromolecules. Advances in Artificial Intelligence 2012:1–19. https://doi.org/10.1155/2012/674832

  • Orús R (2019) Tensor networks for complex quantum systems. Nature Reviews Physics 1 (9):538–550. https://doi.org/10.1038/s42254-019-0086-7

  • Pentus M (1993) Lambek grammars are context free. In: 1993 Proceedings Eighth Annual IEEE Symposium on Logic in Computer Science, pp 429–433. https://doi.org/10.1109/LICS.1993.287565

  • Pestun V, Vlassopoulos Y (2017) Tensor network language model

  • Piedeleu R, Kartsaklis D, Coecke B, Sadrzadeh M (2015) Open system categorical quantum semantics in natural language processing

  • Preller A (2007) Linear processing with pregroups. Studia Logica: An International Journal for Symbolic Logic 87(2/3):171–197

  • Sadrzadeh M, Clark S, Coecke B (2013) The Frobenius anatomy of word meanings i: subject and object relative pronouns. J Log Comput 23(6):1293–1317. https://doi.org/10.1093/logcom/ext044

  • Sadrzadeh M, Clark S, Coecke B (2014) The Frobenius anatomy of word meanings ii: possessive relative pronouns. J Log Comput 26(2):785–815. https://doi.org/10.1093/logcom/exu027

  • Schuld M, Bocharov A, Svore KM, Wiebe N (2020) Circuit-centric quantum classifiers. Physical Review A 101(3). https://doi.org/10.1103/physreva.101.032308

  • Schuld M, Killoran N (2019) Quantum machine learning in feature Hilbert spaces. Physical Review Letters 122(4). https://doi.org/10.1103/physrevlett.122.040504

  • Searls DB (2002) The language of genes. Nature 420(6912):211–217. https://doi.org/10.1038/nature01255

  • Selinger P (2010) A survey of graphical languages for monoidal categories. Lecture Notes in Physics, 289–355. https://doi.org/10.1007/978-3-642-12821-9_4

  • Sivarajah S, Dilkes S, Cowtan A, Simmons W, Edgington A, Duncan R (2020) Tket: a retargetable compiler for NISQ devices. Quantum Science and Technology. https://doi.org/10.1088/2058-9565/ab8e92

  • Socher R, Perelygin A, Wu J, Chuang J, Manning CD, Ng A, Potts C (2013) Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 conference on empirical methods in natural language processing. https://www.aclweb.org/anthology/D13-1170. Association for Computational Linguistics, pp 1631–1642

  • Spall JC (1998) Implementation of the simultaneous perturbation algorithm for stochastic optimization. IEEE Trans Aerosp Electron Syst 34(3):817–823. https://doi.org/10.1109/7.705889

  • Turing AM (1950) I.—computing machinery and intelligence. Mind LIX(236):433–460. https://academic.oup.com/mind/article-pdf/LIX/236/433/30123314/lix-236-433.pdf. https://doi.org/10.1093/mind/LIX.236.433

  • Wiebe N, Bocharov A, Smolensky P, Troyer M, Svore KM (2019) Quantum Language Processing

  • Wootton JR (2020) Procedural generation using quantum computation. International Conference on the Foundations of Digital Games. https://doi.org/10.1145/3402942.3409600

  • Yeung R, Kartsaklis D (2021) A CCG-based Version of the DisCoCat Framework

  • Zeng W, Coecke B (2016) Quantum algorithms for compositional natural language processing. Electronic Proceedings in Theoretical Computer Science 221:67–75. https://doi.org/10.4204/eptcs.221.8

  • Zeng Z, Shi H, Wu Y, Hong Z (2015) Survey of natural language processing techniques in bioinformatics. Comput Math Methods Med 2015:674296. https://doi.org/10.1155/2015/674296

  • Zhao Q, Hou C, Liu C, Zhang P, Xu R (2020) A quantum expectation value based language model with application to question answering. Entropy 22(5):533

Acknowledgements

KM thanks Vojtech Havlicek and Christopher Self for discussions on QML, Robin Lorenz and Marcello Benedetti for comments on the manuscript, and the TKET team at Quantinuum for technical advice on quantum compilation on IBMQ machines. We acknowledge the use of IBM Quantum services for this work.

Funding

KM is grateful to the Royal Commission for the Exhibition of 1851 for financial support under a Postdoctoral Research Fellowship. AT thanks Simon Harrison for financial support through the Wolfson Harrison UK Research Council Quantum Foundation Scholarship.

Author information

Authors and Affiliations

Authors

Contributions

All authors contributed to the theory, the design of the model, and the high-level definition of the classification task. AT and GDF wrote the DisCoPy library (de Felice et al. 2020) and tested early versions of the experiment on ibmq_singapore. KM generated and annotated the data, implemented the simulations and experiments, and wrote the manuscript. BC led the project.

Corresponding author

Correspondence to Konstantinos Meichanetzidis.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Disclaimer

The views expressed are those of the authors, and do not reflect the official policy or position of IBM or the IBM Quantum team.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

In this supplementary material, we begin by briefly reviewing pregroup grammar. We then provide the necessary background on the graphical language of process theories and describe our procedure for generating random sentence diagrams using a context-free grammar. For completeness, we include the three labelled corpora of sentences used in this work. Furthermore, we show details of our mapping from sentence diagrams to quantum circuits. Finally, we give details on the optimisation methods used for our supervised quantum machine learning task and the specific compilation pass we used from CQC’s compiler, TKET.

Appendix A: Pregroup grammar

Pregroup grammars were introduced by Lambek as an algebraic model for grammar (Lambek 2008).

A pregroup grammar G is freely generated by the basic types in a finite set B. Basic types are decorated by an integer \(k\in \mathbb {Z}\), which signifies their adjoint order. Negative integers − k, with \(k\in \mathbb {N}\), are called left adjoints of order k, and positive integers \(k\in \mathbb {N}\) are called right adjoints. We shall refer to a basic type raised to some adjoint order (including the zeroth order) simply as a ‘type’. The zeroth order k = 0 signifies no adjoint action on the basic type, and so we often omit it in notation, \(b^{0} = b\).

The pregroup algebra is such that the two kinds of adjoint (left and right) act as left and right inverses under multiplication of basic types:

$$ b^{k} b^{k+1} \to \epsilon \to b^{k+1} b^{k} ,$$

where 𝜖 is the trivial or unit type. The left-hand side of this reduction is called a contraction and the right-hand side an expansion. Pregroup grammar also accommodates induced steps \(a \to b\) for \(a, b \in B\). The symbol ‘→’ is to be read as ‘type-reduction’, and the pregroup grammar sets the rules for which reductions are valid.

Now, to go from word to sentence, we consider a finite set of words called the vocabulary V. We call the dictionary (or lexicon) the finite set of entries \(D \subseteq V \times (B\times \mathbb {Z})^{*}\). The star symbol in \(A^{*}\) denotes the set of finite strings that can be generated by the elements of the set A. Each dictionary entry assigns a product (or string) of types to a word, \(t_{w} = {\prod }_{i} b_{i}^{k_{i}}\), with \(k_{i} \in \mathbb {Z}\).

Finally, a pregroup grammar G generates a language \(L_{G} \subseteq V^{*}\) as follows. A sentence is a sequence (or list) of words \(\sigma \in V^{*}\). The type of a sentence is the product of the types of its words, \(t_{\sigma } = {\prod }_{i} t_{w_{i}}\), where \(w_{i} \in V\) and i ≤|σ|. A sentence is grammatical, i.e. it belongs to the language generated by the grammar, \(\sigma \in L_{G}\), if and only if there exists a sequence of reductions so that the type of the sentence reduces to the special sentence type \(s \in B\), as \(t_{\sigma } \to {\dots } \to s\). Note that it is in fact possible to type-reduce grammatical sentences using only contractions.
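To make the contraction-only reduction concrete, the following minimal Python sketch (ours, not from the paper) checks grammaticality by greedy cup-matching. A word’s type is modelled as a list of (basic type, adjoint order) pairs, and adjacent pairs \(b^{k} b^{k+1}\) annihilate; greedy stack-based matching suffices for the simple word types used in this work.

def is_grammatical(word_types):
    # A type is a list of (basic type, adjoint order) pairs.
    stack = []
    for b, k in [t for w in word_types for t in w]:
        if stack and stack[-1] == (b, k - 1):
            stack.pop()                  # contraction b^k b^{k+1} -> epsilon
        else:
            stack.append((b, k))
    return stack == [("s", 0)]           # only the sentence type s remains

noun = [("n", 0)]                        # n^0
loves = [("n", 1), ("s", 0), ("n", -1)]  # n^1 s^0 n^-1 (transitive verb)
print(is_grammatical([noun, loves, noun]))  # True: 'Romeo loves Juliet'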

Appendix B: String diagrams

String diagrams describing process theories are generated by states, effects, and processes. In Fig. 6, we comprehensively show these generators along with constraining equations on them. String diagrams for process theories formally describe process networks where only connectivity matters, i.e. which outputs are connected to which inputs. In other words, the length of the wires carries no meaning and the wires are freely deformable as long as the topology of the network is respected.

Fig. 6
figure 6

Diagrams are read from top to bottom. States have only outputs, effects have only inputs, and processes (boxes) have both input and output wires. All wires carry types. Placing boxes side by side is allowed by the monoidal structure and signifies parallel processes. Sequential composition is represented by composing outputs of one box with inputs of another. A process transforms a state into a new state. There are special kinds of states called caps and effects called cups, which satisfy the snake equation relating them to the identity wire (trivial process). Process networks freely generated by these generators need not be planar, and so there exists a special process that swaps wires and acts trivially on caps and cups

It is beyond the scope of this work to provide a comprehensive exposition of diagrammatic languages; we provide only the elements used in the implementation of our QNLP experiments.

Appendix C: Random sentence generation with CFG

A context-free grammar generates a language from a set of production (or rewrite) rules applied to symbols. Symbols belong to a finite set Σ, and there is a special type S ∈Σ called initial. Production rules belong to a finite set R and are of the form \(T \to {\prod }_{i} T_{i}\), where T,Ti ∈Σ. The application of a production rule results in substituting a symbol with a product (or string) of symbols. Randomly generating a sentence amounts to starting from S and repeatedly applying production rules uniformly sampled from the set R. The production ends when all types produced are terminal types, which are none other than words in the finite vocabulary V.
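The following Python sketch illustrates this sampling procedure; the rule set and vocabulary are illustrative (they mirror corpus K6 of Appendix D) and are not the exact rules used in the paper.

import random

RULES = {
    "S": [["N", "IV"], ["N", "TV", "N"]],
    "N": [["noun"], ["noun", "RPRON", "IV"], ["noun", "RPRON", "TV", "N"]],
}
WORDS = {"noun": ["Romeo", "Juliet"], "TV": ["loves"], "IV": ["dies"], "RPRON": ["who"]}

def generate(symbol="S"):
    # Expand by uniformly sampled production rules until only terminals (words) remain.
    if symbol in WORDS:
        return [random.choice(WORDS[symbol])]
    rhs = random.choice(RULES[symbol])
    return [word for s in rhs for word in generate(s)]

print(" ".join(generate()))  # e.g. 'Juliet who dies loves Romeo'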

From a process-theory point of view, we represent symbols as types carried by wires. Production rules are represented as boxes with input and output wires labelled by the appropriate types. The process network (or string diagram) describing the production of a sentence ends with a production rule whose output is the S-type. We then randomly pick boxes and compose them backwards, always respecting type-matching when inputs of production rules are fed into outputs of other production rules. The generation terminates when production rules with no inputs (i.e. states) are applied; these correspond to the words in the finite vocabulary.

In Fig. 7 (on the left-hand side of the arrows), we show the string-diagram generators we use to randomly produce sentences from a vocabulary of words composed of nouns, transitive verbs, intransitive verbs, and relative pronouns. The corresponding types of these parts of speech are N, TV, IV, and RPRON. The vocabulary is the union of the words of each type, \(V = V_{N} \cup V_{TV} \cup V_{IV} \cup V_{RPRON}\).

Fig. 7
figure 7

CFG generation rules used to produce the corpora K30, K6, and K16 used in this work, represented as string-diagram generators, where \(w_{N} \in V_{N}\), \(w_{TV} \in V_{TV}\), \(w_{IV} \in V_{IV}\), \(w_{RPRON} \in V_{RPRON}\). They are mapped to pregroup reductions by mapping CFG symbols to pregroup types, so that CFG-states are mapped to DisCoCat word-states and production boxes are mapped to products of cups and identities. Note that the pregroup unit 𝜖 is the empty wire and so it is never drawn. Pregroup type contractions correspond to cups and expansions to caps. Since grammatical reductions are achievable using only contractions, only cups are required for the construction of sentence diagrams

Having randomly generated a sentence from the CFG, its string diagram can be translated into a pregroup sentence diagram. To do so, we use the translation rules shown in Fig. 7. Note that a cup labelled by the basic type b is used to represent a contraction \(b^{k} b^{k+1} \to \epsilon \). Pregroup grammars are weakly equivalent to context-free grammars, in the sense that they generate the same language (Buszkowski and Moroz 2007; Pentus 1993).

Appendix D: Corpora

Here, we present the sentences and their labels used in the experiments presented in the main text.

The types assigned to the words of these corpora are as follows. Nouns are typed \(t_{w\in V_{N}}=n^{0}\), transitive verbs are given type \(t_{w\in V_{TV}}=n^{1} s^{0} n^{-1}\), intransitive verbs are typed \(t_{w\in V_{IV}}=n^{1} s^{0}\), and the relative pronoun is typed \(t_{\text {who}}=n^{1} n^{0} s^{-1} n^{0}\).

Corpus K30 of 30 labeled sentences from the vocabulary

VN = {’Dude’, ’Walter’}, VTV = {’loves’, ’annoys’}, VIV = {’abides’,’bowls’}, VRPRON = {’who’}: [(’Dude who loves Walter bowls’, 1), (’Dude bowls’, 1), (’Dude annoys Walter’, 0), (’Walter who abides bowls’, 0), (’Walter loves Walter’, 1), (’Walter annoys Dude’, 1), (’Walter bowls’, 1), (’Walter abides’, 0), (’Dude loves Walter’, 1), (’Dude who bowls abides’, 1), (’Walter who bowls annoys Dude’, 1), (’Dude who bowls bowls’, 1), (’Dude who abides abides’, 1), (’Dude annoys Dude who bowls’, 0), (’Walter annoys Walter’, 0), (’Dude who abides bowls’, 1), (’Walter who abides loves Walter’, 0), (’Walter who bowls bowls’, 1), (’Walter loves Walter who abides’, 0), (’Walter annoys Walter who bowls’, 0), (’Dude abides’, 1), (’Dude loves Walter who bowls’, 1), (’Walter who loves Dude bowls’, 1), (’Dude loves Dude who abides’, 1), (’Walter who abides loves Dude’, 0), (’Dude annoys Dude’, 0), (’Walter who annoys Dude bowls’, 1), (’Walter who annoys Dude abides’, 0), (’Walter loves Dude’, 1), (’Dude who bowls loves Walter’, 1)]

Corpus K6 of 6 labeled sentences from the vocabulary VN = {’Romeo’, ’Juliet’}, VTV = {’loves’}, VIV = {’dies’}, VRPRON = {’who’}: [(’Romeo dies’, 1.0), (’Romeo loves Juliet’, 0.0), (’Juliet who dies dies’, 1.0), (’Romeo loves Romeo’, 0.0), (’Juliet loves Romeo’, 0.0), (’Juliet dies’, 1.0)]

Corpus K16 of 16 labeled sentences from the vocabulary VN = {’Romeo’, ’Juliet’}, VTV = {’loves’, ’kills’}, VIV = {’dies’}, VRPRON = {’who’}: [(’Juliet kills Romeo who dies’, 0), (’Juliet dies’, 1), (’Romeo who loves Juliet dies’, 1), (’Romeo dies’, 1), (’Juliet who dies dies’, 1), (’Romeo loves Juliet’, 1), (’Juliet who dies loves Juliet’, 0), (’Romeo kills Juliet who dies’, 0), (’Romeo who kills Romeo dies’, 1), (’Romeo who dies dies’, 1), (’Romeo who loves Romeo dies’, 0), (’Romeo kills Juliet’, 0), (’Romeo who dies kills Romeo’, 1), (’Juliet who dies kills Romeo’, 0), (’Romeo loves Romeo’, 0), (’Romeo who dies kills Juliet’, 0)]

Appendix E: Sentence to circuit mapping

Quantum theory has formally been shown to be a process theory, and it therefore enjoys a diagrammatic language in terms of string diagrams. Specifically, the quantum circuits we construct in our experiments live in pure quantum theory, in which processes are unitary operations, or quantum gates in the context of circuits. The monoidal structure allowing for parallel processes is instantiated by the tensor product, and sequential composition is instantiated by sequential composition of quantum gates.

In Fig. 8, we show the generic construction of the mapping from sentence diagrams to parameterised quantum circuits for the hyperparameters and parameterised word-circuits we use in this work.

Fig. 8
figure 8

Mapping from sentence diagrams to parameterised quantum circuits. Here, we show how the generators of sentence diagrams are mapped to generators of circuits, for the hyperparameters we consider in this work

A wire carrying basic pregroup type b is assigned \(q_{b}\) qubits. A word-state with only one output wire becomes a one-qubit state prepared from |0〉. For the preparation of such unary states we choose the sequence of gates defining an Euler decomposition of one-qubit unitaries, \(R_{z}(\theta _{1}) \circ R_{x}(\theta _{2}) \circ R_{z}(\theta _{3})\). Word-states with more than one output wire become multiqubit states on k > 1 qubits prepared by an IQP-style circuit acting on \(\vert 0\rangle ^{\otimes k}\). Such a word-circuit is composed of d layers. Each layer consists of a layer of Hadamard gates followed by a layer in which every neighbouring pair of qubit wires is connected by a CRz(𝜃) gate, \(\left (\otimes _{i=1}^{k}H\right ) \circ \left (\otimes _{i=1}^{k-1} {CR_{z}}(\theta _{i})_{i,i+1}\right )\). Since all CRz gates commute with each other, it is justified to consider them a single layer, at least abstractly. The Kronecker tensor with n output wires of type b is mapped to a GHZ state on \(n q_{b}\) qubits; specifically, GHZ is a circuit that prepares the state \({\sum }_{x=0}^{2^{q_{b}}-1} \bigotimes _{i=1}^{n} \vert \text {bin}(x)\rangle \), where bin(x) is the binary expression of the integer x. The cup of pregroup type b is mapped to \(q_{b}\)-many nested Bell effects, each of which is implemented as a CNOT followed by a Hadamard gate on the control qubit and postselection on 〈00|.
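As an illustration of this ansatz, here is a short pytket sketch of a word-circuit; this is our reconstruction under the stated hyperparameters, not the authors’ code (the paper builds circuits through DisCoPy). Note that pytket expresses rotation angles in half-turns.

from pytket import Circuit

def word_circuit(k, d, params):
    # k qubits, d IQP-style layers; params supplies the trainable angles.
    it = iter(params)
    circ = Circuit(k)
    if k == 1:
        # Euler decomposition Rz(t1) Rx(t2) Rz(t3) of a one-qubit word-state
        return circ.Rz(next(it), 0).Rx(next(it), 0).Rz(next(it), 0)
    for _ in range(d):
        for q in range(k):
            circ.H(q)                     # layer of Hadamards
        for q in range(k - 1):
            circ.CRz(next(it), q, q + 1)  # nearest-neighbour CRz layer
    return circ

print(word_circuit(2, 2, [0.1, 0.2]).get_commands())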

Appendix F: Optimisation method

The gradient-free optimisation method we use, Simultaneous Perturbation Stochastic Approximation (SPSA), works as follows. Start from a random point in parameter space. At every iteration, pick a random direction and estimate the derivative along it by finite differences with a step size depending on c. This requires only two cost function evaluations per iteration, which is a significant saving in evaluations of L(𝜃). Then take a step of size depending on a along (opposite to) that direction if the derivative is negative (positive). In our experiments we use minimizeSPSA from the Python package noisyopt (https://github.com/andim/noisyopt), and we set a = 0.1 and c = 0.1, except for the experiment on ibmq for d = 3, for which we set a = 0.05 and c = 0.05.
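For concreteness, such a run might look as follows with noisyopt; the noisy toy objective below merely stands in for the shot-estimated cost L(𝜃).

import numpy as np
from noisyopt import minimizeSPSA

def cost(theta):
    # Toy noisy objective standing in for the shot-based estimate of L(theta).
    return float(np.sum((np.sin(theta) - 0.5) ** 2) + 0.01 * np.random.randn())

theta0 = np.random.uniform(0, 2 * np.pi, size=12)
result = minimizeSPSA(cost, x0=theta0, a=0.1, c=0.1, niter=300, paired=False)
print(result.x, result.fun)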

Note that for classical simulations, we use just-in-time compilation of the cost function by invoking jit from jax (Bradbury et al. 2018). In addition, the choice of the squares-of-differences cost defined in Eq. 2 is not unique. One can equally use the binary cross-entropy

$$L^{\text{BCE}}(\theta) = -\frac{1}{\vert{\Delta} \vert}{\sum}_{\sigma\in{\Delta}} \left[ l_{\sigma} \log l_{\sigma}^{\text{pr}}(\theta_{\sigma}) + (1-l_{\sigma}) \log\left(1-l_{\sigma}^{\text{pr}}(\theta_{\sigma})\right) \right]$$

which can be minimised just as well, as shown in Fig. 9.
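A jax version of this cost, written so that it can be jit-compiled as described, might look as follows; this is a sketch of ours, where predicted stands for the labels \(l^{\text {pr}}_{\sigma }(\theta _{\sigma })\) estimated from the sentence-circuits.

import jax.numpy as jnp
from jax import jit

@jit
def bce_cost(labels, predicted, eps=1e-7):
    p = jnp.clip(predicted, eps, 1.0 - eps)  # guard the logarithms
    return -jnp.mean(labels * jnp.log(p) + (1.0 - labels) * jnp.log(1.0 - p))

print(bce_cost(jnp.array([1.0, 0.0, 1.0]), jnp.array([0.9, 0.2, 0.7])))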

Fig. 9
figure 9

Minimisation of binary cross entropy cost function LBCE with SPSA for the question answering task for corpus K30

In our classical simulation of the experiment, we also used basinhopping (Olson et al. 2012) in combination with Nelder-Mead (Gao and Han 2010) from the Python package SciPy (https://pypi.org/project/scipy). Nelder-Mead is a gradient-free local optimisation method. basinhopping hops (or jumps) between basins (or local minima) and returns the minimum over the local minima of the cost function, where each local minimum is found by Nelder-Mead. The hop direction is random, and a hop is accepted according to a Metropolis criterion depending on the cost function to be minimised and a temperature. We used the default temperature value (1) and the default number of basin hops (100).
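In SciPy this combination reads as follows; a sketch with the defaults quoted above, where the rugged toy objective stands in for the actual cost function.

import numpy as np
from scipy.optimize import basinhopping

def cost(theta):
    # Stand-in objective with many local minima.
    return float(np.sum((theta - 0.3) ** 2) + np.sum(np.sin(5.0 * theta) ** 2))

result = basinhopping(cost, x0=np.zeros(4), niter=100, T=1.0,
                      minimizer_kwargs={"method": "Nelder-Mead"})
print(result.x, result.fun)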

F.1: Error decay

In Fig. 10, we show the decay of the mean training and test errors for the question answering task for corpus K30 simulated classically, which is shown as an inset in Fig. 4. Plotting in log-log scale reveals, at least initially, an algebraic decay of the errors with the depth of the word-circuits.

Fig. 10
figure 10

Algebraic decay of mean training and testing error for the data displayed in Fig. 4 (bottom), obtained by basinhopping. Increasing the depth of the word-circuits results in algebraic decay of the mean training and testing errors. The slopes are \(\log e_{\text {tr}}\sim -1.2 \log d\) and \(\log e_{\text {te}}\sim -0.3 \log d\). We attribute the plateau in \(e_{\text {tr}}\) at large depths to the small scale of our experiment and the small values of the hyperparameters determining the size of the quantum-enhanced feature space

F.2: On the influence of noise on the cost function landscape

Regarding optimisation on a quantum computer, we comment on the effect of noise on the optimal parameters. Consider a successful optimisation of L(𝜃) performed on a NISQ device, returning \(\theta ^{*}_{\text {NISQ}}\). If we instantiate the circuits \(C_{\sigma }(\theta ^{*}_{\text {NISQ}})\) and evaluate them on a classical computer to obtain the predicted labels \(l^{\text {CC}}_{\text {pr}}(\theta ^{*}_{\text {NISQ}})\), we observe that these can in general differ from the labels \(l^{\text {NISQ}}_{\text {pr}}(\theta ^{*}_{\text {NISQ}})\) predicted by evaluating the circuits on the quantum computer. On a fault-tolerant quantum computer this would not be the case; however, since our circuits undergo a non-trivial coherent-noise channel, it is expected that the optimiser’s results are affected in this way.

Appendix G: Quantum compilation

In order to perform quantum compilation, we use pytket (Sivarajah et al. 2020), a Python module for interfacing with CQC’s TKET, a toolset for quantum programming. From this toolbox, we make use of compilation passes.

At a high level, quantum compilation can be described as follows. Given a circuit and a device, quantum operations are decomposed in terms of the device’s native gate set, and the circuit is reshaped to make it compatible with the device’s topology (Cowtan et al. 2019). Specifically, the compilation pass that we use is default_compilation_pass(2); the integer option is set to 2 for maximum optimisation under compilation (https://github.com/CQCL/pytket).

Circuits written in pytket can be run on other devices by simply changing the backend being called, regardless of whether the hardware is fundamentally different in terms of the physical systems used as qubits; this makes TKET platform-agnostic. We stress that on IBMQ machines specifically, the native gates are arbitrary single-qubit unitaries (‘U3’ gates) and entangling controlled-not gates (‘CNOT’ or ‘CX’). Importantly, CNOT gates show error rates one or even two orders of magnitude larger than those of U3 gates. Therefore, we measure the depth of our circuits in terms of the CNOT-depth. Using pytket, this can be obtained by invoking the command depth_by_type(OpType.CX).
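For example (a toy circuit of ours; the actual sentence-circuits come from the mapping of Appendix E):

from pytket import Circuit, OpType

circ = Circuit(2)
circ.Rz(0.3, 0).Rx(0.2, 0).Rz(0.1, 0)  # Euler-decomposed one-qubit word-state
circ.CX(0, 1).H(0)
print(circ.depth_by_type(OpType.CX))   # CNOT-depth: 1

# Compiling for a device (requires IBMQ credentials; the backend name is an example):
# from pytket.extensions.qiskit import IBMQBackend
# backend = IBMQBackend("ibmq_montreal")
# backend.default_compilation_pass(2).apply(circ)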

For both backends used in this work, ibmq_montreal and ibmq_toronto, the reported quantum volume is 32 and the maximum allowed number of shots is \(2^{13}\).

Appendix H: Swap test and Hadamard test

In our binary classification NLP task, the predicted label is the norm squared of the zero-to-zero transition amplitude \(\langle 0{\dots } 0\vert U \vert 0{\dots } 0\rangle \), where the unitary U comprises the word-circuits and the circuits that implement the Bell effects as dictated by the grammatical structure. Estimating \(\vert \langle 0{\dots } 0\vert U \vert 0{\dots } 0\rangle \vert ^{2}\), or the amplitude \(\langle 0{\dots } 0\vert U \vert 0{\dots } 0\rangle \) itself in case one wants to define a cost function in which it appears instead of its norm, can be done by postselecting on \(\langle 0{\dots } 0\vert \). However, postselection costs exponential time in the number of postselected qubits; in our case, one needs to discard all bitstrings sampled from the quantum computer that have non-zero Hamming weight. This is the procedure we follow in this proof-of-concept experiment, as the small circuit sizes make it affordable.
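Concretely, the postselected estimate is just the fraction of sampled bitstrings equal to the all-zeros string, as in this sketch (ours, for illustration):

from collections import Counter

def zero_transition_prob(shots):
    # Fraction of shots with Hamming weight zero estimates |<0...0|U|0...0>|^2.
    counts = Counter(shots)
    n_qubits = len(shots[0])
    return counts["0" * n_qubits] / len(shots)

print(zero_transition_prob(["00", "00", "01", "00"]))  # 0.75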

In such a setting, postselection can be avoided by using the swap test to estimate the norm squared of the amplitude, or the Hadamard test for the amplitude itself (Aharonov et al. 2008). See Fig. 11 for the corresponding circuits of these routines. In Fig. 12, we show how the swap test or the Hadamard test can be used to estimate the amplitude represented by the postselected sentence-circuit of Fig. 3. Furthermore, at least for tasks defined such that every sentence corresponds to a circuit, we argue that sentences do not grow arbitrarily long, and so the cost of evaluating the cost function is upper bounded in practical applications.
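The Hadamard test of Fig. 11 (left) can be sketched in a few lines; we use qiskit here purely for illustration (the paper’s circuits are built with DisCoPy and compiled with TKET), and u is any circuit preparing U on the |ψ〉 register.

from qiskit import QuantumCircuit

def hadamard_test(u, imaginary=False):
    # Control qubit 0; qubits 1..n carry |psi> = |0...0> as in the main text.
    n = u.num_qubits
    qc = QuantumCircuit(n + 1, 1)
    qc.h(0)
    if imaginary:
        qc.sdg(0)  # b = 1: the control's <Z> then estimates Im<psi|U|psi>
    qc.append(u.to_gate().control(1), range(n + 1))
    qc.h(0)
    qc.measure(0, 0)  # <Z> of the control estimates Re (or Im) of <psi|U|psi>
    return qc

The swap test of Fig. 11 (right) is analogous, with a controlled-SWAP between the two state registers in place of the controlled-U.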

Fig. 11
figure 11

(Left) Circuit for the Hadamard test. Measuring the control qubit in the computational basis allows one to estimate 〈Z〉 = Re(〈ψ|U|ψ〉) if b = 0, and 〈Z〉 = Im(〈ψ|U|ψ〉) if b = 1. The state ψ can be a multiqubit state, and in this work we are interested in the case \(\psi = \vert 0{\dots } 0\rangle \). (Right) Circuit for the swap test. Sampling from the control qubit allows one to estimate 〈Z〉 = |〈ψ|ϕ〉|2

Fig. 12
figure 12

Use of swap test (top) and Hadamard test (bottom) to estimate the norm-squared of the amplitude or the amplitude itself respectively, which is represented by the postselected circuit of Fig. 3

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Meichanetzidis, K., Toumi, A., de Felice, G. et al. Grammar-aware sentence classification on quantum computers. Quantum Mach. Intell. 5, 10 (2023). https://doi.org/10.1007/s42484-023-00097-1
