Papers
Viewing 21-30 of 216 papers
What Makes it Ok to Set a Fire? Iterative Self-distillation of Contexts and Rationales for Disambiguating Defeasible Social and Moral Situations
Kavel Rao, Liwei Jiang, Valentina Pyatkin, Yuling Gu, Niket Tandon, Nouha Dziri, Faeze Brahman, Yejin Choi • Conference on Empirical Methods in Natural Language Processing • Findings • 2023
Moral or ethical judgments rely heavily on the specific contexts in which they occur. Understanding varying shades of defeasible contextualizations (i.e., additional information that strengthens or attenuates the moral acceptability of an action) is critical…

Measuring and Improving Attentiveness to Partial Inputs with Counterfactuals
Yanai Elazar, Bhargavi Paranjape, Hao Peng, Sarah Wiegreffe, Khyathi Raghavi, Vivek Srikumar, Sameer Singh, Noah A. Smith • arXiv • 2023
The inevitable appearance of spurious correlations in training datasets hurts the generalization of NLP models on unseen data. Previous work has found that datasets with paired inputs are prone to correlations between a specific part of the input (e.g., the…

CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization
Bodhisattwa Prasad Majumder, Bhavana Dalvi Mishra, Peter Jansen, Oyvind Tafjord, Niket Tandon, Li Zhang, Chris Callison-Burch, Peter Clark • arXiv • 2023
Language agents have shown some ability to interact with an external environment, e.g., a virtual world such as ScienceWorld, to perform complex tasks, e.g., growing a plant, without the startup costs of reinforcement learning. However, despite their zero…

Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction Arena
Jiangjie Chen, Siyu Yuan, Rong Ye, Bodhisattwa Prasad Majumder, Kyle Richardson • arXiv • 2023
Can Large Language Models (LLMs) simulate human behavior in complex environments? LLMs have recently been shown to exhibit advanced reasoning skills, but much of NLP evaluation still relies on static benchmarks. Answering this requires evaluation environments…

Exploiting Generalization in Offline Reinforcement Learning via Unseen State Augmentations
Nirbhay Modhe, Qiaozi Gao, A. Kalyan, Dhruv Batra, G. Thattai, G. Sukhatme • arXiv • 2023
Offline reinforcement learning (RL) methods strike a balance between exploration and exploitation by conservative value estimation -- penalizing values of unseen states and actions. Model-free methods penalize values at all unseen actions, while model-based…

DISCO: Distilling Phrasal Counterfactuals with Large Language Models
Zeming Chen, Qiyue Gao, Kyle Richardson, Antoine Bosselut, Ashish Sabharwal • ACL • 2023
Recent methods demonstrate that data augmentation using counterfactual knowledge can teach models the causal structure of a task, leading to robust and generalizable models. However, such counterfactual data often has a limited scale and diversity if…

Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions
Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, Ashish Sabharwal • ACL • 2023
Prompting-based large language models (LLMs) are surprisingly powerful at generating natural language reasoning steps or Chains-of-Thoughts (CoT) for multi-step question answering (QA). They struggle, however, when the necessary knowledge is either…

Do language models have coherent mental models of everyday things?
Yuling Gu, Bhavana Dalvi Mishra, Peter Clark • ACL • 2023
When people think of everyday things like an “egg,” they typically have a mental image associated with it. This commonsense knowledge helps us understand how these everyday things work and how to interact with them. For example, when someone tries to make a…

RL4F: Generating Natural Language Feedback with Reinforcement Learning for Repairing Model Outputs
Afra Feyza Akyurek, Ekin Akyürek, Aman Madaan, A. Kalyan, Peter Clark, D. Wijaya, Niket Tandon • Annual Meeting of the Association for Computational Linguistics • 2023
Despite their unprecedented success, even the largest language models make mistakes. Similar to how humans learn and improve using feedback, previous work proposed providing language models with natural language feedback to guide them in repairing their…

Let Me Teach You: Pedagogical Foundations of Feedback for Language Models
Beatriz Borges, Niket Tandon, Tanja Kaser, Antoine Bosselut • arXiv • 2023
Natural Language Feedback (NLF) is an increasingly popular avenue to align Large Language Models (LLMs) to human preferences. Despite the richness and diversity of the information it can convey, NLF is often hand-designed and arbitrary. In a different world…