Abstract
In this paper, we describe the development of TALAA-AFAQ, a Corpus of Arabic Factoid Question-Answers built for use in the training modules of an Arabic Question Answering System (AQAS). The corpus is built in five steps, during which we extract syntactic and semantic features as well as other information. In addition, we extract a set of answer patterns for each question from the web. The corpus contains 2002 question-answer pairs, 618 of which are accompanied by answer patterns. The pairs are organized into four main classes and 34 finer categories. All answer patterns and features have been validated by experts in Arabic. To the best of our knowledge, this is the first corpus of Arabic factoid question-answers built specifically to support the development of AQASs.
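To make the corpus description concrete, here is a minimal sketch of how one entry could be represented in Python. The field names (`question`, `answer`, `main_class`, `fine_category`, `features`, `answer_patterns`), the example labels, and the helper `coverage` are all hypothetical. The abstract does not specify the actual file format or class labels of TALAA-AFAQ.

```python
from dataclasses import dataclass, field
from typing import Dict, List


# Hypothetical schema: names and types are illustrative assumptions,
# not the published TALAA-AFAQ format.
@dataclass
class QAEntry:
    question: str
    answer: str
    main_class: str       # one of the four main classes (labels assumed)
    fine_category: str    # one of the 34 finer categories (labels assumed)
    # Syntactic/semantic features extracted during corpus construction.
    features: Dict[str, str] = field(default_factory=dict)
    # Answer patterns extracted from the web (empty if none were found).
    answer_patterns: List[str] = field(default_factory=list)


def coverage(entries: List[QAEntry]) -> float:
    """Return the fraction of entries that have at least one answer pattern."""
    if not entries:
        return 0.0
    with_patterns = sum(1 for e in entries if e.answer_patterns)
    return with_patterns / len(entries)


# Illustrative entry; the class and category labels are assumptions.
example = QAEntry(
    question="ما هي عاصمة الجزائر؟",
    answer="الجزائر",
    main_class="ENTITY",
    fine_category="city",
    features={"question_word": "ما"},
)
```

With the counts reported in the abstract (618 of 2002 pairs carrying answer patterns), `coverage` would return about 0.309, i.e. roughly 31% of the pairs.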
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Aouichat, A., Guessoum, A. (2017). Building TALAA-AFAQ, a Corpus of Arabic FActoid Question-Answers for a Question Answering System. In: Frasincar, F., Ittoo, A., Nguyen, L., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2017. Lecture Notes in Computer Science, vol 10260. Springer, Cham. https://doi.org/10.1007/978-3-319-59569-6_46
DOI: https://doi.org/10.1007/978-3-319-59569-6_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59568-9
Online ISBN: 978-3-319-59569-6
eBook Packages: Computer Science, Computer Science (R0)