Abstract
Multi-hop Question Answering (QA) requires a machine to answer complex questions by finding scattered clues in, and reasoning over, multiple documents. Graph Network (GN) and Question Decomposition (QD) are the two most common approaches at present. The former uses a "black-box" reasoning process to capture the potential relationships between entities and sentences, thus achieving good performance, while the latter provides a clear reasoning route by decomposing multi-hop questions into simple single-hop sub-questions. In this paper, we propose a novel method that approaches multi-hop QA from the perspective of Question Generation (QG). Specifically, we carefully design an end-to-end QG module on top of a classical QA module, which helps the model understand the context by asking inherently logical sub-questions, thus inheriting interpretability from QD-based methods while showing superior performance. Experiments on the HotpotQA dataset demonstrate the effectiveness of our proposed QG module, human evaluation further quantifies its interpretability, and thorough analysis shows that the QG module generates better sub-questions than QD methods in terms of fluency, consistency, and diversity.
Notes
- 1.
QA module is not the main focus of this work, and DFGN is one of the representative off-the-shelf QA models. In fact, any QA model could be adopted to replace it.
- 2.
In our experiments, we set \(\lambda _{1}=\lambda _{2}=\lambda _{3}=1\), \(\lambda _{4}=5\).
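For illustration, the weighting above corresponds to a standard weighted sum of task losses. A minimal sketch follows; the individual loss terms (answer, supporting-fact, type, and QG losses) are hypothetical names chosen for illustration, and only the weight values come from the paper:

```python
def combine_losses(l_ans, l_sup, l_type, l_qg, lam=(1.0, 1.0, 1.0, 5.0)):
    """Weighted multi-task objective: L = λ1·l_ans + λ2·l_sup + λ3·l_type + λ4·l_qg.

    The loss-term names are illustrative; the weights (1, 1, 1, 5) are the
    values reported in Note 2.
    """
    return (lam[0] * l_ans + lam[1] * l_sup
            + lam[2] * l_type + lam[3] * l_qg)
```

With unit losses, the QG term dominates: `combine_losses(1, 1, 1, 1)` yields 8.0, of which 5.0 comes from the QG loss.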
- 3.
- 4.
Because Bridge-type questions always have deterministic linear reasoning chains.
References
Alberti, C., Andor, D., Pitler, E., Devlin, J., Collins, M.: Synthetic qa corpora generation with roundtrip consistency. arXiv preprint arXiv:1906.05416 (2019)
Belinkov, Y., Glass, J.: Analysis methods in neural language processing: a survey. Trans. Assoc. Comput. Linguist. 7, 49–72 (2019)
Clark, C., Gardner, M.: Simple and effective multi-paragraph reading comprehension. In: Proceedings of the 56th ACL, vol. 1: Long Papers, pp. 845–855 (2018)
De Cao, N., Aziz, W., Titov, I.: Question answering by reasoning across documents with graph convolutional networks. In: ACL, pp. 2306–2317 (2019)
Dhingra, B., Jin, Q., Yang, Z., Cohen, W., Salakhutdinov, R.: Neural models for reasoning over multiple mentions using coreference. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol. 2 (Short Papers), pp. 42–48 (2018)
Dhole, K.D., Manning, C.D.: Syn-qg: syntactic and shallow semantic rules for question generation (2021)
Ding, M., Zhou, C., Chen, Q., Yang, H., Tang, J.: Cognitive graph for multi-hop reading comprehension at scale. In: ACL, pp. 2694–2703 (2019)
Durrani, N., Sajjad, H., Dalvi, F., Belinkov, Y.: Analyzing individual neurons in pre-trained language models. arXiv preprint arXiv:2010.02695 (2020)
Elazar, Y., Ravfogel, S., Jacovi, A., Goldberg, Y.: Amnesic probing: behavioral explanation with amnesic counterfactuals. Trans. Assoc. Comput. Linguist. 9, 160–175 (2021)
Fabbri, A.R., Ng, P., Wang, Z., Nallapati, R., Xiang, B.: Template-based question generation from retrieved sentences for improved unsupervised question answering (2020)
Fang, Y., Sun, S., Gan, Z., Pillai, R., Wang, S., Liu, J.: Hierarchical graph network for multi-hop question answering (2020)
Fu, R., Wang, H., Zhang, X., Zhou, J., Yan, Y.: Decomposing complex questions makes multi-hop qa easier and more interpretable. In: EMNLP, pp. 169–180 (2021)
Gardner, M., et al.: Evaluating models’ local decision boundaries via contrast sets. arXiv preprint arXiv:2004.02709 (2020)
Hao, Y., Dong, L., Wei, F., Xu, K.: Self-attention attribution: interpreting information interactions inside transformer. arXiv preprint arXiv:2004.11207 (2020)
Janizek, J.D., Sturmfels, P., Lee, S.I.: Explaining explanations: axiomatic feature interactions for deep networks. J. Mach. Learn. Res. 22(104), 1–54 (2021)
Jiang, Y., Bansal, M.: Avoiding reasoning shortcuts: adversarial evaluation, training, and model development for multi-hop qa. In: ACL (2019)
Jiang, Y., Bansal, M.: Self-assembling modular networks for interpretable multi-hop reasoning. In: EMNLP-IJCNLP, pp. 4474–4484 (2019)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization (2017)
Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.: RACE: large-scale ReAding comprehension dataset from examinations. In: ACL, pp. 785–794 (2017)
Lee, D.B., Lee, S., Jeong, W.T., Kim, D., Hwang, S.J.: Generating diverse and consistent qa pairs from contexts with information-maximizing hierarchical conditional vaes. arXiv preprint arXiv:2005.13837 (2020)
Min, S., Zhong, V., Zettlemoyer, L., Hajishirzi, H.: Multi-hop reading comprehension through question decomposition and rescoring. In: ACL (2019)
Nishida, K., et al.: Multi-task learning for multi-hop qa with evidence extraction (2019)
Pyatkin, V., Roit, P., Michael, J., Goldberg, Y., Tsarfaty, R., Dagan, I.: Asking it all: generating contextualized questions for any semantic role. In: Proceedings of the 2021 Conference on EMNLP, pp. 1429–1441 (2021)
Qiu, X., Sun, T., Xu, Y., Shao, Y., Dai, N., Huang, X.: Pre-trained models for natural language processing: a survey. Sci. China Technol. Sci. 63, 1872–1897 (2020)
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I., et al.: Language models are unsupervised multitask learners. OpenAI blog 1(8), 9 (2019)
Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100,000+ questions for machine comprehension of text. In: EMNLP, pp. 2383–2392 (2016)
Ravichander, A., Dalmia, S., Ryskina, M., Metze, F., Hovy, E., Black, A.W.: Noiseqa: challenge set evaluation for user-centric question answering. arXiv preprint arXiv:2102.08345 (2021)
Seo, M., Kembhavi, A., Farhadi, A., Hajishirzi, H.: Bidirectional attention flow for machine comprehension (2018)
Sultan, M.A., Chandel, S., Fernandez Astudillo, R., Castelli, V.: On the importance of diversity in question generation for QA. In: ACL (2020)
Trischler, A., et al.: NewsQA: a machine comprehension dataset. In: Proceedings of the 2nd Workshop on Representation Learning for NLP, pp. 191–200 (2017)
Tu, M., Huang, K., Wang, G., Huang, J., He, X., Zhou, B.: Interpretable multi-hop reading comprehension over multiple documents (2020)
Tu, M., Wang, G., Huang, J., Tang, Y., He, X., Zhou, B.: Multi-hop reading comprehension across multiple documents by reasoning over heterogeneous graphs. In: ACL, pp. 2704–2713 (2019)
Veličković, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., Bengio, Y.: Graph attention networks. arXiv preprint arXiv:1710.10903 (2017)
Welbl, J., Stenetorp, P., Riedel, S.: Constructing datasets for multi-hop reading comprehension across documents. TACL 6, 287–302 (2018)
Wolf, T., et al.: Huggingface’s transformers: state-of-the-art natural language processing (2020)
Wu, Z., Peng, H., Smith, N.A.: Infusing finetuning with semantic dependencies. Trans. Assoc. Comput. Linguist. 9, 226–242 (2021)
Xiao, Y., et al.: Dynamically fused graph network for multi-hop reasoning (2019)
Yang, A., et al.: Enhancing pre-trained language representations with rich knowledge for machine reading comprehension. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 2346–2357. Association for Computational Linguistics, Florence (2019). https://doi.org/10.18653/v1/P19-1226. https://aclanthology.org/P19-1226
Yang, Z., et al.: HotpotQA: a dataset for diverse, explainable multi-hop question answering. In: EMNLP, pp. 2369–2380 (2018)
Zhang, S., Bansal, M.: Addressing semantic drift in question generation for semi-supervised question answering. arXiv preprint arXiv:1909.06356 (2019)
Zhong, V., Xiong, C., Keskar, N.S., Socher, R.: Coarse-grain fine-grain coattention network for multi-evidence question answering. arXiv preprint arXiv:1901.00603 (2019)
A Appendix: Human Evaluation Instruction
Specifically, we design the human evaluation with the following steps:
- 1.
We assemble 16 well-educated volunteers and randomly divide them into two groups, A and B. Each group contains 8 volunteers with a balanced gender ratio.
- 2.
We randomly sample 8 Bridge-type (see Footnote 4) questions from the dev set, and manually write out the correct two-hop reasoning chain for solving each question.
- 3.
We replace an entity appearing in each correct reasoning chain with other confusing entities selected from the context to generate three additional wrong reasoning chains (i.e., each question has 4 reasoning chains), then shuffle them and combine them with the original question to form a four-way multiple-choice QA item.
- 4.
For group A, in addition to the original question, the final answer, and the four reasoning chains, we also provide the supporting facts. Volunteers are then asked to find the correct reasoning chain.
- 5.
For group B, in addition to the original question, the final answer, and the four reasoning chains, we also provide the sub-questions generated by our QG module. Volunteers are then asked to find the correct reasoning chain.
- 6.
We count the accuracy and the time spent solving the problems.
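The per-group aggregation in the final step can be sketched as follows; the data layout (tuples of question id, chosen chain, and elapsed seconds) and the function name are illustrative assumptions, not part of the paper's protocol description:

```python
def score_group(responses, answer_key):
    """Aggregate accuracy and average answering time for one volunteer group.

    responses:  list of (question_id, chosen_chain, seconds) tuples.
    answer_key: dict mapping question_id to the correct reasoning chain.
    """
    correct = sum(1 for qid, choice, _ in responses if answer_key[qid] == choice)
    accuracy = correct / len(responses)
    avg_seconds = sum(t for _, _, t in responses) / len(responses)
    return accuracy, avg_seconds
```

For example, a volunteer who picks the right chain on one of two questions, spending 30 and 60 seconds, scores an accuracy of 0.5 with an average time of 45 seconds.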
Beyond that, some details are worth noting:
- The volunteers who participated in the human evaluation are all well-educated graduate students proficient in English.
- We use an online questionnaire platform to design the electronic questionnaire.
- The questionnaire system automatically scores answers against the pre-set reference answers, and records the time spent answering the questions.
- The timer starts when a volunteer clicks the "accept" button on the questionnaire, and ends when the volunteer clicks the "submit" button.
- Volunteers are required to answer the questionnaire without any interruption, ensuring that all measured time is spent answering the questions.
- Before filling in the questionnaire, volunteers are given a worked example as an instruction on how to find the answer.
The human evaluation interface for each group can be found in Fig. 5 and Fig. 6.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Li, J., Ren, M., Gao, Y., Yang, Y. (2023). Ask to Understand: Question Generation for Multi-hop Question Answering. In: Sun, M., et al. Chinese Computational Linguistics. CCL 2023. Lecture Notes in Computer Science(), vol 14232. Springer, Singapore. https://doi.org/10.1007/978-981-99-6207-5_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-6206-8
Online ISBN: 978-981-99-6207-5