A Quantum Expectation Value Based Language Model with Application to Question Answering
Abstract
1. Introduction
- We propose QEV-LM, which represents words and sentences as different observables in a quantum system and utilizes a shared density matrix to measure joint question-answer observables. Under this scheme, the matching score of a question-answer pair is naturally explained as the quantum expectation value of the corresponding joint question-answer observable.
- We develop a computationally efficient approach to constructing the shared density matrix via a quantum-like kernel trick.
- We apply QEV-LM to a typical answer selection Question Answering task on TREC-QA and WIKIQA datasets. Our model outperforms other quantum models with low time consumption and also surpasses strong CNN and LSTM baselines.
- We conduct a detailed discussion. In particular, we show that the off-diagonal elements of the density matrix, which correspond to sememes’ superpositions, play an important role in improving the model’s performance.
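The core quantity in this scheme is the quantum expectation value of an observable O under a density matrix ρ, namely ⟨O⟩ = Tr(ρO). A minimal NumPy sketch of this computation (the matrices here are random stand-ins, not the model's learned parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # toy Hilbert-space dimension

# Build a valid density matrix rho: Hermitian, positive semi-definite, trace 1.
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = A @ A.conj().T          # A A^dagger is Hermitian and positive semi-definite
rho /= np.trace(rho).real     # normalize so that Tr(rho) = 1

# Build a Hermitian observable O (stand-in for a joint question-answer observable).
B = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
O = (B + B.conj().T) / 2

# Quantum expectation value <O> = Tr(rho O); real for Hermitian rho and O,
# which is why it can serve directly as a matching score.
expectation = np.trace(rho @ O)
print(np.isclose(expectation.imag, 0.0))  # True: the score is real-valued
```

The reality of Tr(ρO) for Hermitian ρ and O is what lets the expectation value be interpreted as a scalar matching score.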
2. Related Work
3. Background
3.1. Basic Concepts
3.2. Quantum Expectation Value of an Observable
4. Quantum Expectation Value Based Language Model
4.1. Complex-Valued Word Embedding
4.2. Sentence Level Observable Construction
4.3. Joint Question-Answer Observable
4.4. Quantum Expectation Value of the Joint Question-Answer Observable
- Be symmetric. Each component of the constructed density matrix is an outer product of a state vector with itself, which is symmetric, and a weighted sum of symmetric matrices remains symmetric. Therefore the constructed density matrix is symmetric.
- Be semi-definite. A matrix M (rank n) is said to be positive semi-definite if z^T M z is positive or zero for every non-zero column vector z of n numbers [27]. We can show that the constructed density matrix satisfies this condition.
- Be of trace 1. After the density matrix is constructed, one can always multiply it by a scalar to make its trace equal to 1 without violating the symmetric and semi-definite properties. Since this scalar can be absorbed into the later pipeline as a rescaling of our parameters, we do not directly constrain the trace of the density matrix in our model.
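The three properties above can be checked numerically. A sketch, assuming only that the shared density matrix is built as a non-negatively weighted sum of outer products (the exact kernel-trick construction in the paper may differ):

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 6, 3  # toy dimension and number of component states

# Non-negative weights and (unnormalized) component vectors.
weights = rng.uniform(size=k)
vectors = rng.normal(size=(k, d)) + 1j * rng.normal(size=(k, d))

# rho = sum_i w_i v_i v_i^dagger : each term is Hermitian and PSD, so the sum is too.
rho = sum(w * np.outer(v, v.conj()) for w, v in zip(weights, vectors))

# 1. Symmetric (Hermitian): rho equals its conjugate transpose.
assert np.allclose(rho, rho.conj().T)

# 2. Positive semi-definite: all eigenvalues >= 0 (up to floating-point error).
assert np.all(np.linalg.eigvalsh(rho) >= -1e-10)

# 3. The trace can be rescaled to 1 without breaking properties 1 and 2.
rho /= np.trace(rho).real
assert np.isclose(np.trace(rho).real, 1.0)
```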
4.5. Learning to Rank
5. Experiments
5.1. Experimental Setup
- TREC-QA [28] is a standard QA dataset in the Text REtrieval Conference (TREC).
- WIKIQA [29] is an open domain QA dataset released by Microsoft Research.
5.2. Baselines for Comparison
- NNQLM-II [5]. An end-to-end quantum-like language model. Embedding vectors encode question and answer sentences, and the matching score is computed over the joint representation of the question and answer density matrices.
- CNM [10]. A complex-valued matching network. Sentences are modeled with local mixture density matrices, projectors select features from the question and answer density matrices, and cosine similarity is used to calculate the matching score.
5.3. Implementation Details
5.4. Experimental Results
6. Discussion
6.1. Ablation Study
6.1.1. Ablation Study on Hilbert Space
6.1.2. Ablation Study on Observables
6.1.3. Ablation Study on Density Matrix
6.2. Parameter Scale and Time Consumption
6.3. Comparison on Physical Interpretability
7. Conclusions and Future Work
Author Contributions
Funding
Conflicts of Interest
References
- Basile, I.; Tamburini, F. Towards quantum language models. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 7–11 September 2017; pp. 1840–1849. [Google Scholar]
- Blacoe, W.; Kashefi, E.; Lapata, M. A quantum-theoretic approach to distributional semantics. In Proceedings of the HLT-NAACL, Atlanta, GA, USA, 9–14 June 2013; pp. 847–857. [Google Scholar]
- Van Rijsbergen, C.J. The Geometry of Information Retrieval; Cambridge University Press: Cambridge, UK, 2004. [Google Scholar]
- Sordoni, A.; Nie, J.-Y.; Bengio, Y. Modeling term dependencies with quantum language models for IR. In Proceedings of the SIGIR, Dublin, Ireland, 28 July–1 August 2013; pp. 653–662. [CrossRef]
- Zhang, P.; Niu, J.; Su, Z.; Wang, B.; Ma, L.; Song, D. End-to-end quantum-like language models with application to question answering. In Proceedings of the AAAI, New Orleans, LA, USA, 2–7 February 2018; pp. 5666–5673. [Google Scholar]
- Yu, L.; Hermann, K.M.; Blunsom, P.; Pulman, S. Deep learning for answer sentence selection. arXiv 2014, arXiv:1412.1632.
- Dos Santos, C.; Tan, M.; Xiang, B.; Zhou, B. Attentive pooling networks. arXiv 2016, arXiv:1602.03609. [Google Scholar]
- Sordoni, A.; Nie, J.-Y.; Blunsom, P. Looking at vector space and language models for IR using density matrices. In Proceedings of the QI, Leicester, UK, 25–27 July 2013; pp. 147–159.
- Melucci, M.; van Rijsbergen, K. Quantum Mechanics and Information Retrieval; Springer: Berlin, Germany, 2011; pp. 125–155. [Google Scholar]
- Li, Q.; Wang, B.; Melucci, M. CNM: An interpretable complex-valued network for matching. In Proceedings of the NAACL, Minneapolis, MN, USA, 2–7 June 2019; pp. 4139–4148.
- Zhang, P.; Song, D.; Hou, Y.; Wang, J. Automata modeling for cognitive interference in users relevance judgment. In Proceedings of the QI, Arlington, VA, USA, 11–13 November 2010; pp. 125–133.
- Lloyd, S.; Mohseni, M.; Rebentrost, P. Quantum algorithms for supervised and unsupervised machine learning. arXiv 2013, arXiv:1307.0411. [Google Scholar]
- Galea, D.; Bruza, P. Modelling cued-target recall using quantum inspired models of target activation. In Proceedings of the QI, Filzbach, Switzerland, 15–17 July 2015; pp. 258–271. [Google Scholar]
- Zuccon, G.; Azzopardi, L.A.; van Rijsbergen, K. The quantum probability ranking principle for information retrieval. In Proceedings of the ICTIR, Cambridge, UK, 10–12 September 2009; pp. 232–240. [Google Scholar]
- Piwowarski, B.; Frommholz, I.; Lalmas, M. What can quantum theory bring to information retrieval? In Proceedings of the CIKM, Toronto, ON, Canada, 26–30 October 2010; pp. 59–68. [Google Scholar] [CrossRef]
- Xie, M.; Hou, Y.; Zhang, P.; Li, J.; Li, W.; Song, D. Modeling quantum entanglements in quantum language models. In Proceedings of the IJCAI, Buenos Aires, Argentina, 25–31 July 2015; pp. 1362–1368. [Google Scholar]
- Von Neumann, J. Mathematical Foundations of Quantum Mechanics; Princeton University Press: Princeton, NJ, USA, 1955. [Google Scholar]
- Nielsen, M.A.; Chuang, I.L. Quantum Computation and Quantum Information; Cambridge University Press: Cambridge, UK, 2010.
- Gleason, A.M. Measures on the closed subspaces of a hilbert space. J. Math. Mech. 1957, 6, 885–893. [Google Scholar] [CrossRef]
- Hughes, R.I.G. The Structure and Interpretation of Quantum Mechanics; Harvard University Press: Cambridge, MA, USA, 1992. [Google Scholar]
- Sakurai, J.J. Modern Quantum Mechanics; Addison Wesley Longman: Boston, MA, USA, 1994. [Google Scholar]
- Goddard, C.; Wierzbicka, A. Semantic and Lexical Universals: Theory and Empirical Findings; John Benjamins Publishing: Amsterdam, The Netherlands, 1994. [Google Scholar]
- Bourbaki, N. Topological vector spaces. In Elements of Mathematics; Springer: Berlin, Germany, 1987. [Google Scholar]
- Hu, B.; Lu, Z.; Li, H.; Chen, Q. Convolutional neural network architectures for matching natural language sentences. In Proceedings of the NIPS, Montreal, QC, Canada, 8–13 December 2014; pp. 2042–2050. [Google Scholar]
- Wang, B.; Zhang, P.; Li, J.; Song, D.; Hou, Y.; Shang, Z. Exploration of quantum interference in document relevance judgement discrepancy. Entropy 2016, 18, 144. [Google Scholar] [CrossRef]
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]
- Bhatia, R. Positive Definite Matrices; Princeton University Press: Princeton, NJ, USA, 2007. [Google Scholar]
- Voorhees, E.M.; Tice, D.M. Building a question answering test collection. In Proceedings of the SIGIR, Athens, Greece, 24–28 July 2000; pp. 200–207. [Google Scholar] [CrossRef]
- Yang, Y.; Yih, W.-T.; Meek, C. Wikiqa: A challenge dataset for open-domain question answering. In Proceedings of the EMNLP, Lisbon, Portugal, 17–21 September 2015; pp. 2013–2018. [Google Scholar]
- Severyn, A.; Moschitti, A. Learning to rank short text pairs with convolutional deep neural networks. In Proceedings of the SIGIR, Santiago, Chile, 9–13 August 2015; pp. 373–382. [Google Scholar]
- Severyn, A.; Moschitti, A. Modeling relational information in question-answer pairs with convolutional neural networks. arXiv 2016, arXiv:1604.01178. [Google Scholar]
- He, H.; Gimpel, K.; Lin, J. Multiperspective sentence similarity modeling with convolutional neural networks. In Proceedings of the EMNLP, Lisbon, Portugal, 17–21 September 2015; pp. 1576–1586.
- Miao, Y.; Yu, L.; Blunsom, P. Neural variational inference for text processing. arXiv 2015, arXiv:1511.06038. [Google Scholar]
- Wang, D.; Nyberg, E. A long short-term memory model for answer sentence selection in question answering. In Proceedings of the ACL, Beijing, China, 26–31 July 2015; pp. 707–712. [Google Scholar]
- Steiner, B.; DeVito, Z.; Chintala, S.; Gross, S.; Paszke, A.; Massa, F.; Lerer, A.; Chanan, G.; Lin, Z.; Yang, Z.; et al. PyTorch: An imperative style, high-performance deep learning library. In Proceedings of the NIPS, Vancouver, BC, Canada, 8–14 December 2019.
- Zhang, P.; Su, Z.; Zhang, L.; Wang, B.; Song, D. A quantum many body wave function inspired language modeling approach. In Proceedings of the CIKM, Turin, Italy, 22–26 October 2018; pp. 1303–1312. [CrossRef]
| Dataset | | Train | Dev | Test |
|---|---|---|---|---|
| TREC-QA | Questions | 1229 | 65 | 68 |
| | Pairs | 53,417 | 117 | 1442 |
| WIKIQA | Questions | 837 | 126 | 633 |
| | Pairs | 8627 | 1130 | 2351 |
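The result tables below report MAP and MRR on these datasets. For reference, a minimal sketch of how these two ranking metrics are computed (standard definitions, not code from the paper):

```python
def average_precision(labels_ranked):
    """AP for one question: labels_ranked are 0/1 relevance labels in score order."""
    hits, precisions = 0, []
    for i, rel in enumerate(labels_ranked, start=1):
        if rel:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / hits if hits else 0.0

def reciprocal_rank(labels_ranked):
    """RR for one question: 1 / rank of the first relevant answer."""
    for i, rel in enumerate(labels_ranked, start=1):
        if rel:
            return 1.0 / i
    return 0.0

# MAP and MRR are the means of AP and RR over all questions (toy example).
questions = [[0, 1, 1, 0], [1, 0, 0, 0], [0, 0, 1, 0]]
map_score = sum(average_precision(q) for q in questions) / len(questions)
mrr_score = sum(reciprocal_rank(q) for q in questions) / len(questions)
```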
| Model | TREC-QA MAP | TREC-QA MRR | WIKIQA MAP | WIKIQA MRR |
|---|---|---|---|---|
| Ngram-CNN | 0.6709 | 0.7280 | 0.6661 | 0.6851 |
| MP-CNN | 0.7770 | 0.8360 | / | / |
| Three-Layer BLSTM + BM25 | 0.7134 | 0.7913 | / | / |
| LSTM-attn | / | / | 0.6639 | 0.6828 |
| QLM | 0.6784 | 0.7265 | 0.5109 | 0.5148 |
| NNQLM-II | 0.7589 | 0.8254 | 0.6496 | 0.6594 |
| CNM | 0.7701 | 0.8591 | 0.6748 | 0.6864 |
| QEV-LM | 0.8106 | 0.8934 | 0.6748 | 0.6948 |
| Over CNM | 5.26% | 3.99% | 0% | 1.22% |
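The "Over CNM" row reports the relative improvement of QEV-LM over CNM, i.e., (QEV-LM − CNM) / CNM. A quick check of the table's arithmetic:

```python
# Scores taken from the results table above.
cnm = {"trec_map": 0.7701, "trec_mrr": 0.8591, "wiki_map": 0.6748, "wiki_mrr": 0.6864}
qev = {"trec_map": 0.8106, "trec_mrr": 0.8934, "wiki_map": 0.6748, "wiki_mrr": 0.6948}

# Relative improvement of QEV-LM over CNM for each metric.
improvement = {k: (qev[k] - cnm[k]) / cnm[k] for k in cnm}
for k, v in improvement.items():
    print(f"{k}: {v:+.2%}")
# trec_map: +5.26%, trec_mrr: +3.99%, wiki_map: +0.00%, wiki_mrr: +1.22%
```

The computed percentages match the table's reported row.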
| Model | TREC-QA MAP | TREC-QA MRR | WIKIQA MAP | WIKIQA MRR |
|---|---|---|---|---|
| QEV-LM-real | 0.8066 | 0.8669 | 0.6510 | 0.6641 |
| QEV-LM-no-weight | 0.8049 | 0.8895 | 0.6729 | 0.6934 |
| QEV-LM-sum | 0.6629 | 0.7413 | 0.6226 | 0.6406 |
| QEV-LM-class1 | 0.7391 | 0.8053 | 0.6007 | 0.6182 |
| QEV-LM-class2 | 0.7402 | 0.8023 | 0.6049 | 0.6195 |
| QEV-LM | 0.8106 | 0.8934 | 0.6748 | 0.6948 |
| Model | Params | Time Consumption |
|---|---|---|
| NNQLM-II | 3.03 M | 53.2 ms |
| QEV-LM-real | 2.97 M | 45.0 ms |
| CNM | 5.90 M | 1.50 s |
| QEV-LM | 5.89 M | 87.5 ms |
| Component | Model | Physical Explanation |
|---|---|---|
| Word encoder | QLM | one-hot embedding |
| | NNQLM-II | real-valued physical state |
| | CNM | complex-valued physical state |
| | QEV-LM | complex-valued physical state |
| Sentence representation | QLM | sentence density matrix |
| | NNQLM-II | sentence density matrix |
| | CNM | sentence density matrix via local density matrices scheme |
| | QEV-LM | sentence observable via word projection operators |
| Question-answer pair | QLM | question density matrix and answer density matrix |
| | NNQLM-II | joint question-answer density matrix |
| | CNM | question density matrix and answer density matrix |
| | QEV-LM | joint question-answer observable |
| Matching score | QLM | VN-divergence |
| | NNQLM-II | softmax value of CNN-based density features |
| | CNM | cosine distance of density features |
| | QEV-LM | quantum expectation value of joint observable |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhao, Q.; Hou, C.; Liu, C.; Zhang, P.; Xu, R. A Quantum Expectation Value Based Language Model with Application to Question Answering. Entropy 2020, 22, 533. https://doi.org/10.3390/e22050533