Qasper

AllenNLP, Semantic Scholar • 2021
A dataset containing 1,585 papers with 5,049 information-seeking questions asked by regular readers of NLP papers and answered by a separate set of NLP practitioners.
License: CC BY

Current version: 0.3

Clicking Download provides a link to the training and development sets of the latest version of the dataset in JSON format. The files contain only text. Images of the tables and figures in the papers can be downloaded from the link below.

Images of tables and figures in the training and development sets (about 450 MB).
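
Because the training and development sets are plain JSON, they can be inspected with the Python standard library. The following is a minimal sketch, not an official loader. It assumes that the top-level JSON object maps paper IDs to paper records and that each record carries a "qas" list with one entry per question; these field names are assumptions and are not specified on this page. The helper names are hypothetical.

```python
import json
from pathlib import Path


def load_split(path):
    """Read one Qasper JSON file into a Python dict."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def count_papers_and_questions(data):
    """Return (number of papers, number of questions).

    Assumption: `data` maps paper IDs to records, and each record has a
    "qas" list holding one entry per question. Records without that key
    are counted as papers with zero questions.
    """
    n_papers = len(data)
    n_questions = sum(len(paper.get("qas", [])) for paper in data.values())
    return n_papers, n_questions
```

To use it, pass the path of a downloaded file to load_split and hand the result to count_papers_and_questions. The counts for a single split will be smaller than the dataset totals quoted above, since those cover all splits.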

Test set and official evaluator

Once you are ready to evaluate your finalized model on the test set, use the following links to download the test set.

What’s new in v0.3

Due to an issue in the annotation interface, a small number of annotations (about 0.6%) in the older versions had multiple answer types (e.g., unanswerable and boolean). These were manually fixed to create v0.3. The fixes affected the training, development, and test sets. Thanks to Xanh Ho for pointing out the issue.
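
If you work with an older version, a check along the following lines can flag annotations that carry more than one answer type. This is a sketch under stated assumptions, not the procedure used to produce v0.3 (those fixes were manual). It assumes each question has an "answers" list whose items hold an "answer" object with the fields "unanswerable", "yes_no", "extractive_spans", and "free_form_answer", and that each question has a "question_id". None of these field names are specified on this page, and the function names are hypothetical.

```python
def answer_types(answer):
    """List the answer types that are set on one answer object.

    Assumed fields: "unanswerable" (bool), "yes_no" (bool or None),
    "extractive_spans" (list of str), "free_form_answer" (str).
    """
    types = []
    if answer.get("unanswerable"):
        types.append("unanswerable")
    # A boolean answer may legitimately be False, so test against None.
    if answer.get("yes_no") is not None:
        types.append("boolean")
    if answer.get("extractive_spans"):
        types.append("extractive")
    if answer.get("free_form_answer"):
        types.append("free_form")
    return types


def find_multi_type_annotations(data):
    """Return (paper_id, question_id, types) for annotations with >1 type."""
    flagged = []
    for paper_id, paper in data.items():
        for qa in paper.get("qas", []):
            for annotation in qa.get("answers", []):
                types = answer_types(annotation.get("answer", {}))
                if len(types) > 1:
                    flagged.append((paper_id, qa.get("question_id"), types))
    return flagged
```

Run on v0.3, a check like this should flag nothing if the assumed schema matches the files. Run on an older version, it should surface the small fraction of affected annotations.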

Older versions

If you need an older version, you can access it here:

Version 0.2

Version 0.1 (does not contain the images of figures and tables)

Authors

Pradeep Dasigi, Kyle Lo, Iz Beltagy, Arman Cohan, Noah A. Smith, Matt Gardner