
Deep QA: An Open-Domain Dataset of Deep Questions and Comprehensive Answers

  • Conference paper
  • Computer and Communication Engineering (CCCE 2023)

Abstract

Currently available question answering (QA) datasets fall short in two respects: they do not provide comprehensive answers spanning several sentences, and their questions are not deep or analytical in nature. Although each issue has been addressed individually, no existing dataset addresses both. To fill this gap, we introduce Deep QA (DQA), a dataset of 12,816 questions broadly classified into four question types. The generated dataset has been analyzed and compared with a standard QA dataset to show that it demands higher cognitive skills. To reinforce this point, state-of-the-art models trained on remembering-type factoid QA datasets have been fine-tuned on the proposed dataset and are shown to perform poorly on the generated question types. Finally, a preliminary investigation using a graph neural model probes the possibility of an alternative answer-generation technique for such a dataset of deeper questions.
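The dataset itself is archived on Zenodo (see the note below). As a rough illustration of how such a corpus might be inspected, the sketch below assumes a simple JSON record layout with `question`, `answer`, and `type` fields; these field names and the type labels are assumptions for illustration, not the paper's actual schema.

```python
import json
from collections import Counter

# Hypothetical DQA-style records. Field names and type labels are
# illustrative only -- the actual schema lives in the Zenodo deposit.
sample = json.loads("""
[
  {"question": "Why do multi-hop questions challenge extractive QA models?",
   "answer": "A comprehensive answer spanning several sentences ...",
   "type": "analytical"},
  {"question": "Who wrote Hamlet?",
   "answer": "William Shakespeare.",
   "type": "remembering"}
]
""")

def type_distribution(records):
    """Count how many questions fall under each question type."""
    return Counter(r["type"] for r in records)

def mean_answer_length(records):
    """Average answer length in words; comprehensive answers score high."""
    lengths = [len(r["answer"].split()) for r in records]
    return sum(lengths) / len(lengths)

if __name__ == "__main__":
    print(type_distribution(sample))
    print(mean_answer_length(sample))
```

Simple statistics like these (type distribution, answer length) are one way the paper's claim of "higher cognitive demand" could be compared against a factoid dataset such as SQuAD.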



Notes

  1. https://doi.org/10.5281/zenodo.7538113.


Author information

Correspondence to Harshavardhan Veeranna Navalli.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Anbarasu, H.S., Navalli, H.V., Vidapanakal, H., Gowd, K.M., Das, B. (2023). Deep QA: An Open-Domain Dataset of Deep Questions and Comprehensive Answers. In: Neri, F., Du, KL., Varadarajan, V., San-Blas, AA., Jiang, Z. (eds) Computer and Communication Engineering. CCCE 2023. Communications in Computer and Information Science, vol 1823. Springer, Cham. https://doi.org/10.1007/978-3-031-35299-7_16

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-35299-7_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-35298-0

  • Online ISBN: 978-3-031-35299-7

  • eBook Packages: Computer Science, Computer Science (R0)
