Papers

  • Neural Network Parameterization of Subgrid‐Scale Physics From a Realistic Geography Global Storm‐Resolving Simulation

    Oliver Watt‐Meyer, Noah D. Brenowitz, S. K. Clark, Brian Henn, Anna Kwa, Jeremy McGibbon, W. Perkins, Lucas Harris, Christopher S. Bretherton. Journal of Advances in Modeling Earth Systems, 2024. Parameterization of subgrid‐scale processes is a major source of uncertainty in global atmospheric model simulations. Global storm‐resolving simulations use a finer grid (less than 5 km) to reduce this uncertainty by explicitly resolving deep convection and…
  • The Unreasonable Effectiveness of Easy Training Data for Hard Tasks

    Peter Hase, Mohit Bansal, Peter Clark, Sarah Wiegreffe. arXiv.org, 2024. How can we train models to perform well on hard test data when hard training data is by definition difficult to label correctly? This question has been termed the scalable oversight problem and has drawn increasing attention as language models have…
  • Tropical Cirrus Are Highly Sensitive to Ice Microphysics Within a Nudged Global Storm‐Resolving Model

    R. Atlas, C. Bretherton, A. Sokol, P. Blossey, M. F. Khairoutdinov. Geophysical Research Letters, 2024. Cirrus dominate the longwave radiative budget of the tropics. For the first time, the variability in cirrus properties and longwave cloud radiative effects (CREs) that arises from using different microphysical schemes within nudged global storm‐resolving…
  • Paloma: A Benchmark for Evaluating Language Model Fit

    Ian Magnusson, Akshita Bhagia, Valentin Hofmann, Luca Soldaini, A. Jha, Oyvind Tafjord, Dustin Schwenk, Evan Pete Walsh, Yanai Elazar, Kyle Lo, Dirk Groeneveld, Iz Beltagy, Hanna Hajishirzi, Noah A. Smith, Kyle Richardson, Jesse Dodge. arXiv, 2023. Language models (LMs) commonly report perplexity on monolithic data held out from training. Implicitly or explicitly, this data is composed of domains – varying distributions of language. Rather than assuming perplexity on one distribution…
  • Catwalk: A Unified Language Model Evaluation Framework for Many Datasets

    Dirk Groeneveld, Anas Awadalla, Iz Beltagy, Akshita Bhagia, Ian Magnusson, Hao Peng, Oyvind Tafjord, Pete Walsh, Kyle Richardson, Jesse Dodge. arXiv.org, 2023. The success of large language models has shifted the evaluation paradigms in natural language processing (NLP). The community's interest has drifted towards comparing NLP models across many tasks, domains, and datasets, often at an extreme scale. This imposes…
  • Kilometer-scale global warming simulations and active sensors reveal changes in tropical deep convection

    Maximilien Bolot, Lucas M. Harris, Kai-Yuan Cheng, Timothy M. Merlis, Peter N. Blossey, Christopher S. Bretherton, Spencer K. Clark, Alex Kaltenbaugh, Linjiong Zhou, Stephan Fueglistaler. npj Climate and Atmospheric Science, 2023. Changes in tropical deep convection with global warming are a leading source of uncertainty for future climate projections. A comparison of the responses of active sensor measurements of cloud ice to interannual variability and next-generation global storm…
  • ACE: A fast, skillful learned global atmospheric model for climate prediction

    Oliver Watt‐Meyer, Gideon Dresdner, J. McGibbon, Spencer K. Clark, Brian Henn, James Duncan, Noah Brenowitz, K. Kashinath, Michael S. Pritchard, B. Bonev, Matthew E. Peters, Christopher S. Bretherton. NeurIPS • Tackling Climate Change with Machine Learning, 2023. Existing ML-based atmospheric models are not suitable for climate prediction, which requires long-term stability and physical consistency. We present ACE (AI2 Climate Emulator), a 200M-parameter, autoregressive machine learning emulator of an existing…
  • IfQA: A Dataset for Open-domain Question Answering under Counterfactual Presuppositions

    Wenhao Yu, Meng Jiang, Peter Clark, Ashish Sabharwal. EMNLP, 2023. Although counterfactual reasoning is a fundamental aspect of intelligence, the lack of large-scale counterfactual open-domain question-answering (QA) benchmarks makes it difficult to evaluate and improve models on this ability. To address this void, we…
  • Probabilistic Precipitation Downscaling with Optical Flow-Guided Diffusion

    Prakhar Srivastava, Ruihan Yang, Gavin Kerrigan, Gideon Dresdner, Jeremy McGibbon, Christopher Bretherton, S. Mandt. arXiv, 2023. In climate science and meteorology, local precipitation predictions are limited by the immense computational costs induced by the high spatial resolution that simulation methods require. A common workaround is statistical downscaling (aka superresolution…
  • Self-Refine: Iterative Refinement with Self-Feedback

    Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, Shashank Gupta, Bodhisattwa Prasad Majumder, K. Hermann, S. Welleck, A. Yazdanbakhsh, Peter Clark. NeurIPS, 2023. Like humans, large language models (LLMs) do not always generate the best output on their first try. Motivated by how humans refine their written text, we introduce Self-Refine, an approach for improving initial outputs from LLMs through iterative feedback…