A Systematic Classification of Knowledge, Reasoning, and Context within the ARC Dataset

Boratko, Michael; Padigela, Harshit; Mikkilineni, Divyendra; Yuvraj, Pritish; Das, Rajarshi; McCallum, Andrew; Chang, Maria; Fokoue-Nkoutche, Achille; Kapanipathi, Pavan; Mattei, Nicholas; Musa, Ryan; Talamadupula, Kartik; Witbrock, Michael

arXiv:1806.00358 (cs)

[Submitted on 1 Jun 2018 (v1), last revised 4 Feb 2019 (this version, v2)]

Abstract: The recent work of Clark et al. introduces the AI2 Reasoning Challenge (ARC) and the associated ARC dataset, which partitions open-domain, complex science questions into an Easy Set and a Challenge Set. That paper includes an analysis of 100 questions with respect to the types of knowledge and reasoning required to answer them; however, it does not include clear definitions of these types, nor does it offer information about the quality of the labels. We propose a comprehensive set of definitions of the knowledge and reasoning types necessary for answering the questions in the ARC dataset. Using ten annotators and a sophisticated annotation interface, we analyze the distribution of labels across the Challenge Set, along with related statistics. Additionally, we demonstrate that although naive information retrieval methods return sentences that are irrelevant to answering the query, sufficient supporting text is often present in the ARC corpus. Evaluating with human-selected relevant sentences improves the performance of a neural machine comprehension model by 42 points.

Comments:	Presented at the Machine Reading for Question Answering (MRQA 2018) Workshop at the 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018). 11 pages, 5 tables, 4 figures. Added missing citations in the latest draft.
Subjects:	Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)
Cite as:	arXiv:1806.00358 [cs.AI]
	(or arXiv:1806.00358v2 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.1806.00358

Submission history

From: Rajarshi Das
[v1] Fri, 1 Jun 2018 14:06:45 UTC (270 KB)
[v2] Mon, 4 Feb 2019 20:59:32 UTC (572 KB)