Datasets

Find us on Hugging Face
AI2's latest open-source models and datasets can be found on our Hugging Face page.


ATOMIC 2020
An atlas of everyday commonsense reasoning, organized through 1.33M textual descriptions of inferential knowledge.
Mosaic • 2021
We present ATOMIC 2020, a commonsense knowledge graph with 1.33M everyday inferential knowledge tuples about entities and events. ATOMIC 2020 represents a large-scale common sense repository of textual descriptions that encode both the social and the physical…
Rainbow: A Commonsense Reasoning Benchmark
A commonsense reasoning benchmark spanning social and physical common sense
Mosaic • 2021
Rainbow is a universal commonsense reasoning benchmark spanning both social and physical common sense. Rainbow brings together 6 existing commonsense reasoning tasks: aNLI, Cosmos QA, HellaSWAG, Physical IQa, Social IQa, and WinoGrande. Modelers are…
Scruples: Subreddit Corpus Requiring Understanding Principles in Life-like Ethical Situations
A corpus and benchmark for predicting communities' ethical judgments on real-life anecdotes
Mosaic • 2021
Scruples is a corpus and benchmark for studying descriptive machine ethics, or machines' ability to understand people's ethical judgments. Scruples offers two datasets: the Anecdotes and the Dilemmas. The Anecdotes collect real-life experiences with ethical…
StrategyQA
2,780 implicit multi-hop reasoning questions
AI2 Israel, Question Understanding, Aristo • 2021
StrategyQA is a question-answering benchmark focusing on open-domain questions where the required reasoning steps are implicit in the question and should be inferred using a strategy. StrategyQA includes 2,780 examples, each consisting of a strategy question…
ProofWriter
Updated RuleTaker datasets with 500k questions, answers and proofs over rulebases.
Aristo • 2020
These datasets accompany the paper "ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language". They contain updated RuleTaker-style datasets with 500k questions, answers and proofs over natural-language rulebases, used to…
RuleTaker: Transformers as Soft Reasoners over Language
Datasets used to teach transformers to reason
Aristo • 2020
Can transformers be trained to reason (or emulate reasoning) over rules expressed in language? In the associated paper and demo we provide evidence that they can. Our models, which we call RuleTakers, are trained on datasets of synthetic rule bases plus…
A Dataset of Incomplete Information Reading Comprehension Questions
13K reading comprehension questions on Wikipedia paragraphs that require following links in those paragraphs to other Wikipedia pages
AllenNLP • 2020
IIRC is a crowdsourced dataset consisting of information-seeking questions requiring models to identify and then retrieve necessary information that is missing from the original context. Each original context is a paragraph from English Wikipedia and it comes…
ZEST: ZEroShot learning from Task descriptions
ZEST is a benchmark for zero-shot generalization to unseen NLP tasks, with 25K labeled instances across 1,251 different tasks.
Mosaic, AllenNLP • 2020
ZEST tests whether NLP systems can perform unseen tasks in a zero-shot way, given a natural language description of the task. It is an instantiation of our proposed framework "learning from task descriptions". The tasks include classification, typed entity…
Open PI
33K state changes over 4,050 sentences from 810 procedural, real-world paragraphs
Aristo, Mosaic • 2020
Open PI is the first dataset for tracking state changes in procedural text from arbitrary domains by using an unrestricted (open) vocabulary. Our solution is a new task formulation in which just the text is provided, from which a set of state changes (entity…
MOCHA
A benchmark for training and evaluating generative reading comprehension metrics.
AllenNLP • 2020
Posing reading comprehension as a generation problem provides a great deal of flexibility, allowing for open-ended questions with few restrictions on possible answers. However, progress is impeded by existing generation metrics, which rely on token overlap…
