
Ask to Understand: Question Generation for Multi-hop Question Answering

  • Conference paper

Chinese Computational Linguistics (CCL 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14232)

Abstract

Multi-hop Question Answering (QA) requires a machine to answer complex questions by finding scattered clues and reasoning over multiple documents. Graph Networks (GN) and Question Decomposition (QD) are two common approaches at present. The former uses a “black-box” reasoning process to capture the potential relationships between entities and sentences, thus achieving good performance, while the latter provides a clear, logical reasoning route by decomposing multi-hop questions into simple single-hop sub-questions. In this paper, we propose a novel method to complete multi-hop QA from the perspective of Question Generation (QG). Specifically, we carefully design an end-to-end QG module on the basis of a classical QA module, which helps the model understand the context by asking inherently logical sub-questions, thus inheriting interpretability from QD-based methods while showing superior performance. Experiments on the HotpotQA dataset demonstrate the effectiveness of our proposed QG module, human evaluation further quantifies its interpretability, and thorough analysis shows that the QG module generates better sub-questions than QD methods in terms of fluency, consistency, and diversity.


Notes

  1. The QA module is not the main focus of this work, and DFGN is one representative off-the-shelf QA model; in fact, any QA model could be adopted in its place.

  2. In our experiments, we set \(\lambda _{1}=\lambda _{2}=\lambda _{3}=1\) and \(\lambda _{4}=5\) (see the objective sketch following these notes).

  3. https://github.com/simonepri/lm-scorer (a usage sketch follows these notes).

  4. Because Bridge-type questions always have deterministic linear reasoning chains.
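For concreteness, the coefficients in Note 2 suggest a weighted multi-task training objective of the following form. This is a minimal sketch in which the four individual loss terms \(\mathcal{L}_{1},\dots ,\mathcal{L}_{4}\) are placeholders for the losses defined in the paper body, which is not reproduced on this page:

\(\mathcal{L}_{\text {total}} = \lambda _{1}\mathcal{L}_{1} + \lambda _{2}\mathcal{L}_{2} + \lambda _{3}\mathcal{L}_{3} + \lambda _{4}\mathcal{L}_{4}\)

Under the reported setting, the fourth term is up-weighted by a factor of 5 relative to the other three.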
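The lm-scorer library from Note 3 scores sentence fluency with a pretrained causal language model. A minimal usage sketch, assuming the API shown in the library's public README (the sub-question strings are illustrative, not drawn from the paper):

    import torch
    from lm_scorer.models.auto import AutoLMScorer as LMScorer

    # Wrap a pretrained GPT-2 as a sentence scorer.
    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    scorer = LMScorer.from_pretrained("gpt2", device=device, batch_size=1)

    # reduce="mean" returns the average token probability;
    # a higher score indicates a more fluent sentence.
    for q in ["Who directed the film?", "Film the who directed?"]:
        print(q, scorer.sentence_score(q, reduce="mean"))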

References

  1. Alberti, C., Andor, D., Pitler, E., Devlin, J., Collins, M.: Synthetic QA corpora generation with roundtrip consistency. arXiv preprint arXiv:1906.05416 (2019)

  2. Belinkov, Y., Glass, J.: Analysis methods in neural language processing: a survey. Trans. Assoc. Comput. Linguist. 7, 49–72 (2019)

  3. Clark, C., Gardner, M.: Simple and effective multi-paragraph reading comprehension. In: Proceedings of the 56th ACL, vol. 1: Long Papers, pp. 845–855 (2018)

  4. De Cao, N., Aziz, W., Titov, I.: Question answering by reasoning across documents with graph convolutional networks. In: ACL, pp. 2306–2317 (2019)

  5. Dhingra, B., Jin, Q., Yang, Z., Cohen, W., Salakhutdinov, R.: Neural models for reasoning over multiple mentions using coreference. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol. 2 (Short Papers), pp. 42–48 (2018)

  6. Dhole, K.D., Manning, C.D.: Syn-QG: syntactic and shallow semantic rules for question generation (2021)

  7. Ding, M., Zhou, C., Chen, Q., Yang, H., Tang, J.: Cognitive graph for multi-hop reading comprehension at scale. In: ACL, pp. 2694–2703 (2019)

  8. Durrani, N., Sajjad, H., Dalvi, F., Belinkov, Y.: Analyzing individual neurons in pre-trained language models. arXiv preprint arXiv:2010.02695 (2020)

  9. Elazar, Y., Ravfogel, S., Jacovi, A., Goldberg, Y.: Amnesic probing: behavioral explanation with amnesic counterfactuals. Trans. Assoc. Comput. Linguist. 9, 160–175 (2021)

  10. Fabbri, A.R., Ng, P., Wang, Z., Nallapati, R., Xiang, B.: Template-based question generation from retrieved sentences for improved unsupervised question answering (2020)

  11. Fang, Y., Sun, S., Gan, Z., Pillai, R., Wang, S., Liu, J.: Hierarchical graph network for multi-hop question answering (2020)

  12. Fu, R., Wang, H., Zhang, X., Zhou, J., Yan, Y.: Decomposing complex questions makes multi-hop QA easier and more interpretable. In: EMNLP, pp. 169–180 (2021)

  13. Gardner, M., et al.: Evaluating models’ local decision boundaries via contrast sets. arXiv preprint arXiv:2004.02709 (2020)

  14. Hao, Y., Dong, L., Wei, F., Xu, K.: Self-attention attribution: interpreting information interactions inside transformer. arXiv preprint arXiv:2004.11207 (2020)

  15. Janizek, J.D., Sturmfels, P., Lee, S.I.: Explaining explanations: axiomatic feature interactions for deep networks. J. Mach. Learn. Res. 22(104), 1–54 (2021)

  16. Jiang, Y., Bansal, M.: Avoiding reasoning shortcuts: adversarial evaluation, training, and model development for multi-hop QA. In: ACL (2019)

  17. Jiang, Y., Bansal, M.: Self-assembling modular networks for interpretable multi-hop reasoning. In: EMNLP-IJCNLP, pp. 4474–4484 (2019)

  18. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization (2017)

  19. Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.: RACE: large-scale ReAding comprehension dataset from examinations. In: ACL, pp. 785–794 (2017)

  20. Lee, D.B., Lee, S., Jeong, W.T., Kim, D., Hwang, S.J.: Generating diverse and consistent QA pairs from contexts with information-maximizing hierarchical conditional VAEs. arXiv preprint arXiv:2005.13837 (2020)

  21. Min, S., Zhong, V., Zettlemoyer, L., Hajishirzi, H.: Multi-hop reading comprehension through question decomposition and rescoring. In: ACL (2019)

  22. Nishida, K., et al.: Multi-task learning for multi-hop QA with evidence extraction (2019)

  23. Pyatkin, V., Roit, P., Michael, J., Goldberg, Y., Tsarfaty, R., Dagan, I.: Asking it all: generating contextualized questions for any semantic role. In: Proceedings of the 2021 Conference on EMNLP, pp. 1429–1441 (2021)

  24. Qiu, X., Sun, T., Xu, Y., Shao, Y., Dai, N., Huang, X.: Pre-trained models for natural language processing: a survey. Sci. China Technol. Sci. 63, 1872–1897 (2020)

  25. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I., et al.: Language models are unsupervised multitask learners. OpenAI blog 1(8), 9 (2019)

  26. Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100,000+ questions for machine comprehension of text. In: EMNLP, pp. 2383–2392 (2016)

  27. Ravichander, A., Dalmia, S., Ryskina, M., Metze, F., Hovy, E., Black, A.W.: NoiseQA: challenge set evaluation for user-centric question answering. arXiv preprint arXiv:2102.08345 (2021)

  28. Seo, M., Kembhavi, A., Farhadi, A., Hajishirzi, H.: Bidirectional attention flow for machine comprehension (2018)

  29. Sultan, M.A., Chandel, S., Fernandez Astudillo, R., Castelli, V.: On the importance of diversity in question generation for QA. In: ACL (2020)

  30. Trischler, A., et al.: NewsQA: a machine comprehension dataset. In: Proceedings of the 2nd Workshop on Representation Learning for NLP, pp. 191–200 (2017)

  31. Tu, M., Huang, K., Wang, G., Huang, J., He, X., Zhou, B.: Interpretable multi-hop reading comprehension over multiple documents (2020)

  32. Tu, M., Wang, G., Huang, J., Tang, Y., He, X., Zhou, B.: Multi-hop reading comprehension across multiple documents by reasoning over heterogeneous graphs. In: ACL, pp. 2704–2713 (2019)

  33. Veličković, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., Bengio, Y.: Graph attention networks. arXiv preprint arXiv:1710.10903 (2017)

  34. Welbl, J., Stenetorp, P., Riedel, S.: Constructing datasets for multi-hop reading comprehension across documents. TACL 6, 287–302 (2018)

  35. Wolf, T., et al.: HuggingFace’s Transformers: state-of-the-art natural language processing (2020)

  36. Wu, Z., Peng, H., Smith, N.A.: Infusing finetuning with semantic dependencies. Trans. Assoc. Comput. Linguist. 9, 226–242 (2021)

  37. Xiao, Y., et al.: Dynamically fused graph network for multi-hop reasoning (2019)

  38. Yang, A., et al.: Enhancing pre-trained language representations with rich knowledge for machine reading comprehension. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 2346–2357. Association for Computational Linguistics, Florence (2019). https://doi.org/10.18653/v1/P19-1226. https://aclanthology.org/P19-1226

  39. Yang, Z., et al.: HotpotQA: a dataset for diverse, explainable multi-hop question answering. In: EMNLP, pp. 2369–2380 (2018)

  40. Zhang, S., Bansal, M.: Addressing semantic drift in question generation for semi-supervised question answering. arXiv preprint arXiv:1909.06356 (2019)

  41. Zhong, V., Xiong, C., Keskar, N.S., Socher, R.: Coarse-grain fine-grain coattention network for multi-evidence question answering. arXiv preprint arXiv:1901.00603 (2019)


Author information

Correspondence to Yang Gao.

A Appendix: Human Evaluation Instructions

Specifically, we design the human evaluation with the following steps:

  1. We assemble 16 well-educated volunteers and randomly divide them into two groups, A and B. Each group contains 8 volunteers with an even gender split.

  2. We randomly sample 8 Bridge-type questions (Footnote 4) from the dev set and manually write out the correct two-hop reasoning chain for solving each question.

  3. We replace the entity that appears in each correct reasoning chain with other confusing entities selected from the context to generate three more wrong reasoning chains (i.e., each question has 4 reasoning chains), then shuffle them and combine them with the original question to form a four-way multiple-choice QA item.

  4. For group A, in addition to the original question, the final answer, and the four reasoning chains, we also provide the supporting facts. Volunteers are then asked to find the correct reasoning chain.

  5. For group B, in addition to the original question, the final answer, and the four reasoning chains, we also provide the sub-questions generated by our QG module. Volunteers are then asked to find the correct reasoning chain.

  6. We count the accuracy and the time taken to solve each problem (a scoring sketch follows this list).
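As a minimal sketch of the bookkeeping in step 6 (the response records below are illustrative, and in practice the questionnaire platform computes these statistics automatically):

    # Each record: (group, chosen_chain, correct_chain, seconds_elapsed)
    responses = [
        ("A", 2, 2, 95.0),
        ("A", 1, 2, 120.0),
        ("B", 3, 3, 60.0),
        ("B", 0, 0, 72.0),
    ]

    # Per-group accuracy and mean answering time.
    for group in ("A", "B"):
        rows = [r for r in responses if r[0] == group]
        accuracy = sum(r[1] == r[2] for r in rows) / len(rows)
        mean_time = sum(r[3] for r in rows) / len(rows)
        print(f"Group {group}: accuracy={accuracy:.0%}, mean time={mean_time:.1f}s")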

Beyond that, some details are worth noting:

  • The volunteers who participated in the human evaluation are all well-educated graduate students proficient in English.

  • We use an online questionnaire platform to design the electronic questionnaire.

  • The questionnaire system automatically scores responses against the pre-set reference answers and records the time spent answering the questions.

  • The timer starts when the volunteer clicks the “accept” button on the questionnaire and ends when the volunteer clicks the “submit” button.

  • Volunteers are required to answer the questionnaire without any interruption, ensuring that all time spent is for answering questions.

  • Before volunteers begin filling out the questionnaire, we provide a worked example as instruction on how to find the answer.

The interface of the human evaluation for each group can be found in Fig. 5 and Fig. 6.

Fig. 5. Interface for human evaluation: choosing the reasoning chain based on supporting facts.

Fig. 6. Interface for human evaluation: choosing the reasoning chain based on sub-questions.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Li, J., Ren, M., Gao, Y., Yang, Y. (2023). Ask to Understand: Question Generation for Multi-hop Question Answering. In: Sun, M., et al. (eds.) Chinese Computational Linguistics. CCL 2023. Lecture Notes in Computer Science, vol. 14232. Springer, Singapore. https://doi.org/10.1007/978-981-99-6207-5_2

Download citation

  • DOI: https://doi.org/10.1007/978-981-99-6207-5_2


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-6206-8

  • Online ISBN: 978-981-99-6207-5

  • eBook Packages: Computer Science, Computer Science (R0)
