Data for "What Makes Reading Comprehension Questions Easier?" (Sugawara et al., EMNLP 2018)
This repository provides easy/hard question ids and annotation data for the following datasets:
- SQuAD (v1.1) [Rajpurkar et al., 2016]
- AddSent [Jia and Liang, 2017]
- NewsQA [Trischler et al., 2017]
- TriviaQA (Wikipedia set) [Joshi et al., 2017]
- QAngaroo (WikiHop) [Welbl et al., 2018]
- MS MARCO (v2) [Nguyen et al., 2016]
- NarrativeQA (summary) [Kočiský et al., 2018]
- MCTest (160 + 500) [Richardson et al., 2013]
- RACE (middle + high) [Lai et al., 2017]
- MCScript [Ostermann et al., 2018]
- ARC Easy [Clark et al., 2018]
- ARC Challenge [Clark et al., 2018]
Each JSON file is structured as follows:

    {
      question_id: {
        "f1"/"exact_match"/"rouge_l": [score],
        "predictions": [prediction]
      },
      question_id: {
        ...
      }
    }
The scores and predictions were produced by the baseline systems (BiDAF or the Gated-Attention Reader, GAR).
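For example, a score file can be inspected as follows. This is a minimal Python sketch: the file name `scores.json` is a placeholder, and the metric key is "f1", "exact_match", or "rouge_l" depending on the dataset.

```python
import json

# Placeholder file name; use the actual score file shipped in this repository.
with open("scores.json") as f:
    scores = json.load(f)

# Each entry maps a question id to the baseline's score(s) and prediction(s).
for question_id, entry in list(scores.items())[:3]:
    print(question_id, entry.get("f1"), entry.get("predictions"))
```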
The annotation data provides the following binary labels:
- valid/invalid = 1/0
- multiple/single candidate = 1/0
- unambiguous/ambiguous = 1/0
Multi-labeling: if multiple labels were selected, we used the bottom-most label in the list below (i.e., the largest code) to compute statistics; see the sketch after this list.
- 0 = word matching
- 1 = paraphrasing
- 2 = knowledge reasoning
- 3 = meta/whole reasoning
- 4 = math/logical reasoning
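For illustration, the tie-breaking rule above amounts to taking the largest code, assuming a question's multiple labels are stored as the integer codes listed above:

```python
# A question annotated with both word matching (0) and knowledge reasoning (2).
labels = [0, 2]
# "Bottom-most" label = the one lowest in the list above, i.e., the largest code.
label_for_statistics = max(labels)  # 2 -> knowledge reasoning
```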
Relations (single-labeling):
- 0 = coreference
- 1 = causal relation
- 2 = spatial/temporal
- 3 = none
See the paper for the details.
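As an end-to-end sketch, the easy/hard question ids can be combined with a score file to compare the baseline's performance on the two subsets. The file names and the one-id-per-line format below are assumptions made for illustration; adapt them to the actual files in this repository.

```python
import json

def read_ids(path):
    # Assumes one question id per line (hypothetical format).
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}

def mean_score(scores, ids, metric="f1"):
    # Average the chosen metric over the ids that appear in the score file.
    values = [scores[qid][metric][0] for qid in ids if qid in scores]
    return sum(values) / len(values) if values else float("nan")

with open("scores.json") as f:          # placeholder file name
    scores = json.load(f)

easy_ids = read_ids("easy_ids.txt")     # placeholder file name
hard_ids = read_ids("hard_ids.txt")     # placeholder file name

print("easy mean F1:", mean_score(scores, easy_ids))
print("hard mean F1:", mean_score(scores, hard_ids))
```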