Data for "What Makes Reading Comprehension Questions Easier?" (Sugawara et al., EMNLP 2018)
This repository provides easy/hard question ids and annotation data for the following datasets:
- SQuAD (v1.1) [Rajpurkar et al., 2016]
- AddSent [Jia and Liang, 2017]
- NewsQA [Trischler et al., 2017]
- TriviaQA (Wikipedia set) [Joshi et al., 2017]
- QAngaroo (WikiHop) [Welbl et al., 2018]
- MS MARCO (v2) [Nguyen et al., 2016]
- NarrativeQA (summary) [Kočiský et al., 2018]
- MCTest (160 + 500) [Richardson et al., 2013]
- RACE (middle + high) [Lai et al., 2017]
- MCScript [Ostermann et al., 2018]
- ARC Easy [Clark et al., 2018]
- ARC Challenge [Clark et al., 2018]
Each JSON file is structured as follows:

    {
      question_id: {
        "f1"/"exact_match"/"rouge_l": [score],
        "predictions": [prediction]
      },
      question_id: {
        ...
      }
    }
The scores and predictions were produced by the baseline systems (BiDAF or the Gated-Attention Reader, GAR).
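For example, a score file can be inspected as follows. This is a minimal Python sketch: the file name `scores.json` is a placeholder, and the metric key is "f1", "exact_match", or "rouge_l" depending on the dataset.

```python
import json

# Placeholder file name; use the actual score file shipped in this repository.
with open("scores.json") as f:
    scores = json.load(f)

# Each entry maps a question id to the baseline's score(s) and prediction(s).
for question_id, entry in list(scores.items())[:3]:
    print(question_id, entry.get("f1"), entry.get("predictions"))
```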
The annotation data provides the following binary labels:
- valid/invalid = 1/0
- multiple/single candidate = 1/0
- unambiguous/ambiguous = 1/0
Multi-labeling: if multiple labels were selected, we used the bottom-most label in the list below (i.e., the largest code) to compute statistics; see the sketch after this list.
- 0 = word matching
- 1 = paraphrasing
- 2 = knowledge reasoning
- 3 = meta/whole reasoning
- 4 = math/logical reasoning
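For illustration, the tie-breaking rule above amounts to taking the largest code, assuming a question's multiple labels are stored as the integer codes listed above:

```python
# A question annotated with both word matching (0) and knowledge reasoning (2).
labels = [0, 2]
# "Bottom-most" label = the one lowest in the list above, i.e., the largest code.
label_for_statistics = max(labels)  # 2 -> knowledge reasoning
```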
Relations (single-labeling):
- 0 = coreference
- 1 = causal relation
- 2 = spatial/temporal
- 3 = none
See the paper for the details.
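As an end-to-end sketch, the easy/hard question ids can be combined with a score file to compare the baseline's performance on the two subsets. The file names and the one-id-per-line format below are assumptions made for illustration; adapt them to the actual files in this repository.

```python
import json

def read_ids(path):
    # Assumes one question id per line (hypothetical format).
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}

def mean_score(scores, ids, metric="f1"):
    # Average the chosen metric over the ids that appear in the score file.
    values = [scores[qid][metric][0] for qid in ids if qid in scores]
    return sum(values) / len(values) if values else float("nan")

with open("scores.json") as f:          # placeholder file name
    scores = json.load(f)

easy_ids = read_ids("easy_ids.txt")     # placeholder file name
hard_ids = read_ids("hard_ids.txt")     # placeholder file name

print("easy mean F1:", mean_score(scores, easy_ids))
print("hard mean F1:", mean_score(scores, hard_ids))
```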