Abstract
Currently available question answering (QA) datasets fall short in two respects: providing comprehensive answers that span several sentences, and posing questions that are deep or analytical in nature. Though each issue has been addressed individually, no existing dataset addresses both. To fill this gap, we introduce Deep QA (DQA), a dataset of 12,816 questions broadly classified into four question types. The generated dataset has been analyzed and compared with a standard QA dataset to show that it demands higher cognitive skills. To support this point further, state-of-the-art models pre-trained on remembering-type factoid QA datasets have been evaluated on the proposed dataset and are shown to perform poorly on the generated question types. Finally, a preliminary investigation using a graph neural network model probes the possibility of an alternative answer-generation technique for such a dataset of deeper questions.
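The paper itself ships no code, but as a rough, hypothetical illustration of the probe the abstract describes (querying a model fine-tuned on a remembering-type factoid dataset such as SQuAD with an analytical question whose reference answer spans several sentences), the Python sketch below uses a public SQuAD checkpoint from Hugging Face. The context, question, and reference answer are invented for illustration and are not drawn from DQA.

```python
# Minimal sketch (not from the paper): ask an extractive model fine-tuned
# on SQuAD, a factoid dataset, a deep "why" question whose reference answer
# spans several sentences. The checkpoint is a public SQuAD model; the
# context, question, and reference answer are hypothetical examples.
from transformers import pipeline

qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")

context = (
    "Photosynthesis converts light energy into chemical energy. "
    "The light-dependent reactions produce ATP and NADPH, which the "
    "Calvin cycle then uses to fix carbon dioxide into sugars. Without "
    "the light-dependent stage, the Calvin cycle would stall for lack "
    "of reducing power."
)
question = "Why does the Calvin cycle depend on the light-dependent reactions?"
reference = (
    "The Calvin cycle consumes ATP and NADPH, both of which are produced "
    "only by the light-dependent reactions; without them it has neither "
    "the energy nor the reducing power needed to fix carbon dioxide."
)

# An extractive model returns one short contiguous span from the context.
prediction = qa(question=question, context=context)
print("Predicted span:", prediction["answer"])

# Crude overlap check: a span answer covers only a fraction of the tokens
# that a comprehensive, multi-sentence reference answer contains.
pred_tokens = set(prediction["answer"].lower().split())
ref_tokens = set(reference.lower().split())
recall = len(pred_tokens & ref_tokens) / len(ref_tokens)
print(f"Token recall against the reference answer: {recall:.2f}")
```

Because an extractive model can only return a contiguous span, its token recall against a multi-sentence reference answer stays low almost by construction, which is the mismatch between factoid-trained models and deep questions that the abstract highlights.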
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Anbarasu, H.S., Navalli, H.V., Vidapanakal, H., Gowd, K.M., Das, B. (2023). Deep QA: An Open-Domain Dataset of Deep Questions and Comprehensive Answers. In: Neri, F., Du, K.L., Varadarajan, V., San-Blas, A.A., Jiang, Z. (eds.) Computer and Communication Engineering. CCCE 2023. Communications in Computer and Information Science, vol. 1823. Springer, Cham. https://doi.org/10.1007/978-3-031-35299-7_16
Print ISBN: 978-3-031-35298-0
Online ISBN: 978-3-031-35299-7