Abstract
Multi-hop Question Answering (QA) requires a machine to answer complex questions by finding scattered clues in, and reasoning over, multiple documents. Graph Network (GN) and Question Decomposition (QD) are the two most common approaches at present. The former uses a "black-box" reasoning process to capture the potential relationships between entities and sentences, thus achieving good performance, while the latter provides a clear reasoning route by decomposing multi-hop questions into simple single-hop sub-questions. In this paper, we propose a novel method that approaches multi-hop QA from the perspective of Question Generation (QG). Specifically, we carefully design an end-to-end QG module on top of a classical QA module, which helps the model understand the context by asking inherently logical sub-questions, thus inheriting interpretability from QD-based methods while showing superior performance. Experiments on the HotpotQA dataset demonstrate the effectiveness of our proposed QG module, human evaluation further quantifies its interpretability, and thorough analysis shows that the QG module generates better sub-questions than QD methods in terms of fluency, consistency, and diversity.
Notes
- 1.
QA module is not the main focus of this work, and DFGN is one of the representative off-the-shelf QA models. In fact, any QA model could be adopted to replace it.
- 2.
In our experiments, we set \(\lambda _{1}=\lambda _{2}=\lambda _{3}=1\), \(\lambda _{4}=5\).
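For illustration, the weighting above corresponds to a standard weighted sum of task losses. A minimal sketch follows; the individual loss terms (answer, supporting-fact, type, and QG losses) are hypothetical names chosen for illustration, and only the weight values come from the paper:

```python
def combine_losses(l_ans, l_sup, l_type, l_qg, lam=(1.0, 1.0, 1.0, 5.0)):
    """Weighted multi-task objective: L = λ1·l_ans + λ2·l_sup + λ3·l_type + λ4·l_qg.

    The loss-term names are illustrative; the weights (1, 1, 1, 5) are the
    values reported in Note 2.
    """
    return (lam[0] * l_ans + lam[1] * l_sup
            + lam[2] * l_type + lam[3] * l_qg)
```

With unit losses, the QG term dominates: `combine_losses(1, 1, 1, 1)` yields 8.0, of which 5.0 comes from the QG loss.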
- 3.
- 4.
Because Bridge-type questions always have deterministic linear reasoning chains.
References
Alberti, C., Andor, D., Pitler, E., Devlin, J., Collins, M.: Synthetic qa corpora generation with roundtrip consistency. arXiv preprint arXiv:1906.05416 (2019)
Belinkov, Y., Glass, J.: Analysis methods in neural language processing: a survey. Trans. Assoc. Comput. Linguist. 7, 49–72 (2019)
Clark, C., Gardner, M.: Simple and effective multi-paragraph reading comprehension. In: Proceedings of the 56th ACL, vol. 1: Long Papers, pp. 845–855 (2018)
De Cao, N., Aziz, W., Titov, I.: Question answering by reasoning across documents with graph convolutional networks. In: ACL, pp. 2306–2317 (2019)
Dhingra, B., Jin, Q., Yang, Z., Cohen, W., Salakhutdinov, R.: Neural models for reasoning over multiple mentions using coreference. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol. 2 (Short Papers), pp. 42–48 (2018)
Dhole, K.D., Manning, C.D.: Syn-qg: syntactic and shallow semantic rules for question generation (2021)
Ding, M., Zhou, C., Chen, Q., Yang, H., Tang, J.: Cognitive graph for multi-hop reading comprehension at scale. In: ACL, pp. 2694–2703 (2019)
Durrani, N., Sajjad, H., Dalvi, F., Belinkov, Y.: Analyzing individual neurons in pre-trained language models. arXiv preprint arXiv:2010.02695 (2020)
Elazar, Y., Ravfogel, S., Jacovi, A., Goldberg, Y.: Amnesic probing: behavioral explanation with amnesic counterfactuals. Trans. Assoc. Comput. Linguist. 9, 160–175 (2021)
Fabbri, A.R., Ng, P., Wang, Z., Nallapati, R., Xiang, B.: Template-based question generation from retrieved sentences for improved unsupervised question answering (2020)
Fang, Y., Sun, S., Gan, Z., Pillai, R., Wang, S., Liu, J.: Hierarchical graph network for multi-hop question answering (2020)
Fu, R., Wang, H., Zhang, X., Zhou, J., Yan, Y.: Decomposing complex questions makes multi-hop qa easier and more interpretable. In: EMNLP, pp. 169–180 (2021)
Gardner, M., et al.: Evaluating models’ local decision boundaries via contrast sets. arXiv preprint arXiv:2004.02709 (2020)
Hao, Y., Dong, L., Wei, F., Xu, K.: Self-attention attribution: interpreting information interactions inside transformer. arXiv preprint arXiv:2004.11207 (2020)
Janizek, J.D., Sturmfels, P., Lee, S.I.: Explaining explanations: axiomatic feature interactions for deep networks. J. Mach. Learn. Res. 22(104), 1–54 (2021)
Jiang, Y., Bansal, M.: Avoiding reasoning shortcuts: adversarial evaluation, training, and model development for multi-hop qa. In: ACL (2019)
Jiang, Y., Bansal, M.: Self-assembling modular networks for interpretable multi-hop reasoning. In: EMNLP-IJCNLP, pp. 4474–4484 (2019)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization (2017)
Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.: RACE: large-scale ReAding comprehension dataset from examinations. In: ACL, pp. 785–794 (2017)
Lee, D.B., Lee, S., Jeong, W.T., Kim, D., Hwang, S.J.: Generating diverse and consistent qa pairs from contexts with information-maximizing hierarchical conditional vaes. arXiv preprint arXiv:2005.13837 (2020)
Min, S., Zhong, V., Zettlemoyer, L., Hajishirzi, H.: Multi-hop reading comprehension through question decomposition and rescoring. In: ACL (2019)
Nishida, K., et al.: Multi-task learning for multi-hop qa with evidence extraction (2019)
Pyatkin, V., Roit, P., Michael, J., Goldberg, Y., Tsarfaty, R., Dagan, I.: Asking it all: generating contextualized questions for any semantic role. In: Proceedings of the 2021 Conference on EMNLP, pp. 1429–1441 (2021)
Qiu, X., Sun, T., Xu, Y., Shao, Y., Dai, N., Huang, X.: Pre-trained models for natural language processing: a survey. Sci. China Technol. Sci. 63, 1872–1897 (2020)
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I., et al.: Language models are unsupervised multitask learners. OpenAI blog 1(8), 9 (2019)
Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100,000+ questions for machine comprehension of text. In: EMNLP, pp. 2383–2392 (2016)
Ravichander, A., Dalmia, S., Ryskina, M., Metze, F., Hovy, E., Black, A.W.: Noiseqa: challenge set evaluation for user-centric question answering. arXiv preprint arXiv:2102.08345 (2021)
Seo, M., Kembhavi, A., Farhadi, A., Hajishirzi, H.: Bidirectional attention flow for machine comprehension (2018)
Sultan, M.A., Chandel, S., Fernandez Astudillo, R., Castelli, V.: On the importance of diversity in question generation for QA. In: ACL (2020)
Trischler, A., et al.: NewsQA: a machine comprehension dataset. In: Proceedings of the 2nd Workshop on Representation Learning for NLP, pp. 191–200 (2017)
Tu, M., Huang, K., Wang, G., Huang, J., He, X., Zhou, B.: Interpretable multi-hop reading comprehension over multiple documents (2020)
Tu, M., Wang, G., Huang, J., Tang, Y., He, X., Zhou, B.: Multi-hop reading comprehension across multiple documents by reasoning over heterogeneous graphs. In: ACL, pp. 2704–2713 (2019)
Veličković, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., Bengio, Y.: Graph attention networks. arXiv preprint arXiv:1710.10903 (2017)
Welbl, J., Stenetorp, P., Riedel, S.: Constructing datasets for multi-hop reading comprehension across documents. TACL 6, 287–302 (2018)
Wolf, T., et al.: Huggingface’s transformers: state-of-the-art natural language processing (2020)
Wu, Z., Peng, H., Smith, N.A.: Infusing finetuning with semantic dependencies. Trans. Assoc. Comput. Linguist. 9, 226–242 (2021)
Xiao, Y., et al.: Dynamically fused graph network for multi-hop reasoning (2019)
Yang, A., et al.: Enhancing pre-trained language representations with rich knowledge for machine reading comprehension. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 2346–2357. Association for Computational Linguistics, Florence (2019). https://doi.org/10.18653/v1/P19-1226. https://aclanthology.org/P19-1226
Yang, Z., et al.: HotpotQA: a dataset for diverse, explainable multi-hop question answering. In: EMNLP, pp. 2369–2380 (2018)
Zhang, S., Bansal, M.: Addressing semantic drift in question generation for semi-supervised question answering. arXiv preprint arXiv:1909.06356 (2019)
Zhong, V., Xiong, C., Keskar, N.S., Socher, R.: Coarse-grain fine-grain coattention network for multi-evidence question answering. arXiv preprint arXiv:1901.00603 (2019)
A Appendix: Human Evaluation Instruction
Specifically, we design the human evaluation with the following steps:
- 1.
We assemble 16 well-educated volunteers and randomly divide them into two groups, A and B. Each group contains 8 volunteers with a balanced gender ratio.
- 2.
We randomly sample 8 Bridge-type (see Footnote 4) questions from the dev set, and manually write out the correct two-hop reasoning chain for solving each question.
- 3.
We replace an entity appearing in each correct reasoning chain with other confusing entities selected from the context to generate three additional wrong reasoning chains (i.e., each question has 4 reasoning chains), then shuffle them and combine them with the original question to form a four-way multiple-choice QA item.
- 4.
For group A, in addition to the original question, the final answer, and the four reasoning chains, we also provide the supporting facts. Volunteers are then asked to find the correct reasoning chain.
- 5.
For group B, in addition to the original question, the final answer, and the four reasoning chains, we also provide the sub-questions generated by our QG module. Volunteers are then asked to find the correct reasoning chain.
- 6.
We count the accuracy and the time spent solving the problems.
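The per-group aggregation in the final step can be sketched as follows; the data layout (tuples of question id, chosen chain, and elapsed seconds) and the function name are illustrative assumptions, not part of the paper's protocol description:

```python
def score_group(responses, answer_key):
    """Aggregate accuracy and average answering time for one volunteer group.

    responses:  list of (question_id, chosen_chain, seconds) tuples.
    answer_key: dict mapping question_id to the correct reasoning chain.
    """
    correct = sum(1 for qid, choice, _ in responses if answer_key[qid] == choice)
    accuracy = correct / len(responses)
    avg_seconds = sum(t for _, _, t in responses) / len(responses)
    return accuracy, avg_seconds
```

For example, a volunteer who picks the right chain on one of two questions, spending 30 and 60 seconds, scores an accuracy of 0.5 with an average time of 45 seconds.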
Beyond that, some details are worth noting:
- The volunteers who participated in the human evaluation are all well-educated graduate students proficient in English.
- We use an online questionnaire platform to design the electronic questionnaire.
- The questionnaire system automatically scores answers against the pre-set reference answers, and records the time spent answering the questions.
- The timer starts when a volunteer clicks the "accept" button on the questionnaire, and ends when the volunteer clicks the "submit" button.
- Volunteers are required to answer the questionnaire without any interruption, ensuring that all measured time is spent answering the questions.
- Before filling in the questionnaire, volunteers are given a worked example as an instruction on how to find the answer.
The human evaluation interface for each group can be found in Fig. 5 and Fig. 6.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Li, J., Ren, M., Gao, Y., Yang, Y. (2023). Ask to Understand: Question Generation for Multi-hop Question Answering. In: Sun, M., et al. Chinese Computational Linguistics. CCL 2023. Lecture Notes in Computer Science(), vol 14232. Springer, Singapore. https://doi.org/10.1007/978-981-99-6207-5_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-6206-8
Online ISBN: 978-981-99-6207-5