Showing 1–50 of 64 results for author: Dodge, J

  1. arXiv:2409.00069  [pdf, other]

    cs.HC cs.AI

    How to Measure Human-AI Prediction Accuracy in Explainable AI Systems

    Authors: Sujay Koujalgi, Andrew Anderson, Iyadunni Adenuga, Shikha Soneji, Rupika Dikkala, Teresita Guzman Nader, Leo Soccio, Sourav Panda, Rupak Kumar Das, Margaret Burnett, Jonathan Dodge

    Abstract: Assessing an AI system's behavior – particularly in Explainable AI Systems – is sometimes done empirically, by measuring people's abilities to predict the agent's next move – but how to perform such measurements? In empirical studies with humans, an obvious approach is to frame the task as binary (i.e., prediction is either right or wrong), but this does not scale. As output spaces increase, so do floor…

    Submitted 23 August, 2024; originally announced September 2024.

    ACM Class: D.2.8

  2. arXiv:2406.08446  [pdf, other]

    cs.CL cs.AI

    OLMES: A Standard for Language Model Evaluations

    Authors: Yuling Gu, Oyvind Tafjord, Bailey Kuehl, Dany Haddad, Jesse Dodge, Hannaneh Hajishirzi

    Abstract: Progress in AI is often demonstrated by new models claiming improved performance on tasks measuring model capabilities. Evaluating language models in particular is challenging, as small changes to how a model is evaluated on a task can lead to large changes in measured performance. There is no common standard setup, so different models are evaluated on the same tasks in different ways, leading to…

    Submitted 12 June, 2024; originally announced June 2024.

  3. arXiv:2404.13087  [pdf, other]

    cs.CL cs.LG

    Demystifying Legalese: An Automated Approach for Summarizing and Analyzing Overlaps in Privacy Policies and Terms of Service

    Authors: Shikha Soneji, Mitchell Hoesing, Sujay Koujalgi, Jonathan Dodge

    Abstract: The complexities of legalese in terms and policy documents can bind individuals to contracts they do not fully comprehend, potentially leading to uninformed data sharing. Our work seeks to alleviate this issue by developing language models that provide automated, accessible summaries and scores for such documents, aiming to enhance user understanding and facilitate informed decisions. We compared…

    Submitted 17 April, 2024; originally announced April 2024.

  4. arXiv:2402.10290  [pdf, other]

    cs.AI

    Experiments with Encoding Structured Data for Neural Networks

    Authors: Sujay Nagesh Koujalgi, Jonathan Dodge

    Abstract: The project's aim is to create an AI agent capable of selecting good actions in a game-playing domain called Battlespace. Sequential domains like Battlespace are important testbeds for planning problems; as such, the Department of Defense uses such domains for wargaming exercises. The agents we developed combine Monte Carlo Tree Search (MCTS) and Deep Q-Network (DQN) techniques in an effort to nav…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: 18 pages, 8 figures, 2 tables

    ACM Class: I.2.4

  5. arXiv:2402.00838  [pdf, other]

    cs.CL

    OLMo: Accelerating the Science of Language Models

    Authors: Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, et al. (18 additional authors not shown)

    Abstract: Language models (LMs) have become ubiquitous in both NLP research and in commercial product offerings. As their commercial importance has surged, the most powerful models have become closed off, gated behind proprietary interfaces, with important details of their training data, architectures, and development undisclosed. Given the importance of these details in scientifically studying these models…

    Submitted 7 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  6. arXiv:2402.00159  [pdf, other]

    cs.CL

    Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

    Authors: Luca Soldaini, Rodney Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, Russell Authur, Ben Bogin, Khyathi Chandu, Jennifer Dumas, Yanai Elazar, Valentin Hofmann, Ananya Harsh Jha, Sachin Kumar, Li Lucy, Xinxi Lyu, Nathan Lambert, Ian Magnusson, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Abhilasha Ravichander, Kyle Richardson, Zejiang Shen, et al. (11 additional authors not shown)

    Abstract: Information about pretraining corpora used to train the current best-performing language models is seldom discussed: commercial models rarely detail their data, and even open models are often released without accompanying training data or recipes to reproduce them. As a result, it is challenging to conduct and advance scientific research on language modeling, such as understanding how training dat…

    Submitted 6 June, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

    Comments: Accepted at ACL 2024; Dataset: https://hf.co/datasets/allenai/dolma; Code: https://github.com/allenai/dolma

  7. arXiv:2401.06408  [pdf, other]

    cs.CL

    AboutMe: Using Self-Descriptions in Webpages to Document the Effects of English Pretraining Data Filters

    Authors: Li Lucy, Suchin Gururangan, Luca Soldaini, Emma Strubell, David Bamman, Lauren F. Klein, Jesse Dodge

    Abstract: Large language models' (LLMs) abilities are drawn from their pretraining data, and model development begins with data curation. However, decisions around what data is retained or removed during this initial stage are under-scrutinized. In our work, we ground web text, which is a popular pretraining data source, to its social and geographic contexts. We create a new dataset of 10.3 million self-des…

    Submitted 20 June, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

    Comments: 28 pages, 13 figures. Association for Computational Linguistics (ACL) 2024

  8. arXiv:2312.10523  [pdf, other]

    cs.CL cs.AI cs.LG

    Paloma: A Benchmark for Evaluating Language Model Fit

    Authors: Ian Magnusson, Akshita Bhagia, Valentin Hofmann, Luca Soldaini, Ananya Harsh Jha, Oyvind Tafjord, Dustin Schwenk, Evan Pete Walsh, Yanai Elazar, Kyle Lo, Dirk Groeneveld, Iz Beltagy, Hannaneh Hajishirzi, Noah A. Smith, Kyle Richardson, Jesse Dodge

    Abstract: Language models (LMs) commonly report perplexity on monolithic data held out from training. Implicitly or explicitly, this data is composed of domains – varying distributions of language. Rather than assuming perplexity on one distribution extrapolates to others, Perplexity Analysis for Language Model Assessment (Paloma), measures LM fit to 585 text domains, ranging from nytimes.com…

    Submitted 16 December, 2023; originally announced December 2023.

    Comments: Project Page: https://paloma.allen.ai/

  9. arXiv:2312.10253  [pdf, other]

    cs.CL

    Catwalk: A Unified Language Model Evaluation Framework for Many Datasets

    Authors: Dirk Groeneveld, Anas Awadalla, Iz Beltagy, Akshita Bhagia, Ian Magnusson, Hao Peng, Oyvind Tafjord, Pete Walsh, Kyle Richardson, Jesse Dodge

    Abstract: The success of large language models has shifted the evaluation paradigms in natural language processing (NLP). The community's interest has drifted towards comparing NLP models across many tasks, domains, and datasets, often at an extreme scale. This imposes new engineering challenges: efforts in constructing datasets and models have been fragmented, and their formats and interfaces are incompati…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: technical report, work in progress

  10. arXiv:2312.03193  [pdf, other]

    cs.HC cs.CY

    Conceptualizing the Relationship between AI Explanations and User Agency

    Authors: Iyadunni Adenuga, Jonathan Dodge

    Abstract: We grapple with the question: How, for whom and why should explainable artificial intelligence (XAI) aim to support the user goal of agency? In particular, we analyze the relationship between agency and explanations through a user-centric lens through case studies and thought experiments. We find that explanation serves as one of several possible first steps for agency by allowing the user convert…

    Submitted 5 December, 2023; originally announced December 2023.

    Comments: CHI 2023 Workshop: Human-Centered Explainable AI (HCXAI)

  11. arXiv:2310.20707  [pdf, other]

    cs.CL cs.LG

    What's In My Big Data?

    Authors: Yanai Elazar, Akshita Bhagia, Ian Magnusson, Abhilasha Ravichander, Dustin Schwenk, Alane Suhr, Pete Walsh, Dirk Groeneveld, Luca Soldaini, Sameer Singh, Hanna Hajishirzi, Noah A. Smith, Jesse Dodge

    Abstract: Large text corpora are the backbone of language models. However, we have a limited understanding of the content of these corpora, including general statistics, quality, social factors, and inclusion of evaluation data (contamination). In this work, we propose What's In My Big Data? (WIMBD), a platform and a set of sixteen analyses that allow us to reveal and compare the contents of large text corp…

    Submitted 5 March, 2024; v1 submitted 31 October, 2023; originally announced October 2023.

    Comments: Published at ICLR 2024 spotlight

  12. arXiv:2310.14564  [pdf, other]

    cs.CL

    Language Models Hallucinate, but May Excel at Fact Verification

    Authors: Jian Guan, Jesse Dodge, David Wadden, Minlie Huang, Hao Peng

    Abstract: Recent progress in natural language processing (NLP) owes much to remarkable advances in large language models (LLMs). Nevertheless, LLMs frequently "hallucinate," resulting in non-factual outputs. Our carefully-designed human evaluation substantiates the serious hallucination issue, revealing that even GPT-3.5 produces factual outputs less than 25% of the time. This underscores the importance of…

    Submitted 20 March, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: Accepted in NAACL 2024

  13. arXiv:2310.03193  [pdf]

    cs.DL cs.CL cs.CY physics.hist-ph physics.soc-ph

    The Rise of Open Science: Tracking the Evolution and Perceived Value of Data and Methods Link-Sharing Practices

    Authors: Hancheng Cao, Jesse Dodge, Kyle Lo, Daniel A. McFarland, Lucy Lu Wang

    Abstract: In recent years, funding agencies and journals increasingly advocate for open science practices (e.g. data and method sharing) to improve the transparency, access, and reproducibility of science. However, quantifying these practices at scale has proven difficult. In this work, we leverage a large-scale dataset of 1.1M papers from arXiv that are representative of the fields of physics, math, and co…

    Submitted 4 October, 2023; originally announced October 2023.

  14. arXiv:2307.09701  [pdf, other]

    cs.CL

    Efficiency Pentathlon: A Standardized Arena for Efficiency Evaluation

    Authors: Hao Peng, Qingqing Cao, Jesse Dodge, Matthew E. Peters, Jared Fernandez, Tom Sherborne, Kyle Lo, Sam Skjonsberg, Emma Strubell, Darrell Plessas, Iz Beltagy, Evan Pete Walsh, Noah A. Smith, Hannaneh Hajishirzi

    Abstract: Rising computational demands of modern natural language processing (NLP) systems have increased the barrier to entry for cutting-edge research while posing serious environmental concerns. Yet, progress on model efficiency has been impeded by practical challenges in model evaluation and comparison. For example, hardware is challenging to control due to disparate levels of accessibility across diffe…

    Submitted 18 July, 2023; originally announced July 2023.

  15. arXiv:2307.00204  [pdf]

    cond-mat.supr-con

    Status of the Spurious Evidence for Photoinduced Superconductivity

    Authors: J. Steven Dodge, Leya Lopez, Derek G. Sahota

    Abstract: After more than a decade of research on photoinduced superconductivity, the experimental evidence for its existence remains controversial. Recently, we identified a fundamental flaw in the analysis of several influential results on K$_3$C$_{60}$ and showed that similar measurements on other compounds suffer from the same problem. We described how to account for this systematic error, and reanalyze…

    Submitted 30 June, 2023; originally announced July 2023.

    Comments: Abstract accepted for presentation at the IRMMW-THz 2023 Conference in Montreal, Canada. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  16. arXiv:2306.16900  [pdf, other]

    cs.CL

    Surveying (Dis)Parities and Concerns of Compute Hungry NLP Research

    Authors: Ji-Ung Lee, Haritz Puerto, Betty van Aken, Yuki Arase, Jessica Zosa Forde, Leon Derczynski, Andreas Rücklé, Iryna Gurevych, Roy Schwartz, Emma Strubell, Jesse Dodge

    Abstract: Many recent improvements in NLP stem from the development and use of large pre-trained language models (PLMs) with billions of parameters. Large model sizes make computational cost one of the main limiting factors for training and evaluating such models, and have raised severe concerns about the sustainability, reproducibility, and inclusiveness for researching PLMs. These concerns are often based…

    Submitted 9 November, 2023; v1 submitted 29 June, 2023; originally announced June 2023.

  17. arXiv:2306.09562  [pdf, other]

    cs.CL

    Reproducibility in NLP: What Have We Learned from the Checklist?

    Authors: Ian Magnusson, Noah A. Smith, Jesse Dodge

    Abstract: Scientific progress in NLP rests on the reproducibility of researchers' claims. The *CL conferences created the NLP Reproducibility Checklist in 2020 to be completed by authors at submission to remind them of key information to include. We provide the first analysis of the Checklist by examining 10,405 anonymous responses to it. First, we find evidence of an increase in reporting of information on…

    Submitted 15 June, 2023; originally announced June 2023.

    Comments: To be published in ACL 2023 Findings

  18. arXiv:2306.05949  [pdf, other]

    cs.CY cs.AI

    Evaluating the Social Impact of Generative AI Systems in Systems and Society

    Authors: Irene Solaiman, Zeerak Talat, William Agnew, Lama Ahmad, Dylan Baker, Su Lin Blodgett, Canyu Chen, Hal Daumé III, Jesse Dodge, Isabella Duan, Ellie Evans, Felix Friedrich, Avijit Ghosh, Usman Gohar, Sara Hooker, Yacine Jernite, Ria Kalluri, Alberto Lusoli, Alina Leidinger, Michelle Lin, Xiuzhu Lin, Sasha Luccioni, Jennifer Mickel, Margaret Mitchell, Jessica Newman, et al. (6 additional authors not shown)

    Abstract: Generative AI systems across modalities, ranging from text (including code), image, audio, and video, have broad social impacts, but there is no official standard for means of evaluating those impacts or for which impacts should be evaluated. In this paper, we present a guide that moves toward a standard approach in evaluating a base generative AI system for any modality in two overarching categor…

    Submitted 28 June, 2024; v1 submitted 9 June, 2023; originally announced June 2023.

    Comments: Forthcoming in Hacker, Engel, Hammer, Mittelstadt (eds), Oxford Handbook on the Foundations and Regulation of Generative AI. Oxford University Press

  19. arXiv:2306.02190  [pdf, other]

    cs.CL

    Stubborn Lexical Bias in Data and Models

    Authors: Sofia Serrano, Jesse Dodge, Noah A. Smith

    Abstract: In NLP, recent work has seen increased focus on spurious correlations between various features and labels in training data, and how these influence model behavior. However, the presence and effect of such correlations are typically examined feature by feature. We investigate the cumulative impact on a model of many such intersecting features. Using a new statistical method, we examine whether such…

    Submitted 3 June, 2023; originally announced June 2023.

    Comments: ACL Findings 2023

  20. arXiv:2304.06939  [pdf, other]

    cs.CV cs.CL

    Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text

    Authors: Wanrong Zhu, Jack Hessel, Anas Awadalla, Samir Yitzhak Gadre, Jesse Dodge, Alex Fang, Youngjae Yu, Ludwig Schmidt, William Yang Wang, Yejin Choi

    Abstract: In-context vision and language models like Flamingo support arbitrarily interleaved sequences of images and text as input. This format not only enables few-shot learning via interleaving independent supervised (image, text) examples, but also, more complex prompts involving interaction between images, e.g., "What do image A and image B have in common?" To support this interface, pretraining occurs…

    Submitted 28 October, 2023; v1 submitted 14 April, 2023; originally announced April 2023.

    Comments: NeurIPS D&B 2023. Project homepage: https://github.com/allenai/mmc4

  21. arXiv:2302.07027  [pdf, other]

    cs.CL

    AdapterSoup: Weight Averaging to Improve Generalization of Pretrained Language Models

    Authors: Alexandra Chronopoulou, Matthew E. Peters, Alexander Fraser, Jesse Dodge

    Abstract: Pretrained language models (PLMs) are trained on massive corpora, but often need to specialize to specific domains. A parameter-efficient adaptation method suggests training an adapter for each domain on the task of language modeling. This leads to good in-domain scores but can be impractical for domain- or resource-restricted settings. A solution is to use a related-domain adapter for the novel d…

    Submitted 28 March, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

    Comments: Accepted at EACL 2023; camera-ready version; fixed typo in related work

  22. arXiv:2212.09676  [pdf, other]

    cs.CL cs.DL

    Words as Gatekeepers: Measuring Discipline-specific Terms and Meanings in Scholarly Publications

    Authors: Li Lucy, Jesse Dodge, David Bamman, Katherine A. Keith

    Abstract: Scholarly text is often laden with jargon, or specialized language that can facilitate efficient in-group communication within fields but hinder understanding for out-groups. In this work, we develop and validate an interpretable approach for measuring scholarly jargon from text. Expanding the scope of prior work which focuses on word types, we use word sense induction to also identify words that…

    Submitted 22 May, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

    Comments: 17 pages, 11 figures, to appear in Findings of the Association for Computational Linguistics 2023

  23. arXiv:2211.05100  [pdf, other]

    cs.CL

    BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

    Authors: BigScience Workshop: Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, et al. (369 additional authors not shown)

    Abstract: Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access…

    Submitted 27 June, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

  24. arXiv:2210.01114  [pdf, other]

    cond-mat.supr-con cond-mat.str-el

    Optical Saturation Produces Spurious Evidence for Photoinduced Superconductivity in K$_{3}$C$_{60}$

    Authors: J. Steven Dodge, Leya Lopez, Derek G. Sahota

    Abstract: We discuss a systematic uncertainty in time-resolved optical conductivity measurements that becomes important at high pump intensities. We show that common optical nonlinearities can distort the photoconductivity depth profile, and by extension distort the photoconductivity spectrum. We show evidence that this distortion is present in existing measurements on K$_{3}$C$_{60}$, and describe how it m…

    Submitted 6 March, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

    Comments: 5 pages, 5 figures, supplemental material. v2: Some acknowledgments removed upon request. v3: Corrected an error in Eq. (5) and revised the affected figures and discussion accordingly. Added sections to supplemental material. v4: Updated title; other minor changes suggested by journal editors

    Journal ref: Phys. Rev. Lett. 130, 146002 (2023)

  25. arXiv:2209.00099  [pdf, other]

    cs.CL

    Efficient Methods for Natural Language Processing: A Survey

    Authors: Marcos Treviso, Ji-Ung Lee, Tianchu Ji, Betty van Aken, Qingqing Cao, Manuel R. Ciosici, Michael Hassid, Kenneth Heafield, Sara Hooker, Colin Raffel, Pedro H. Martins, André F. T. Martins, Jessica Zosa Forde, Peter Milder, Edwin Simpson, Noam Slonim, Jesse Dodge, Emma Strubell, Niranjan Balasubramanian, Leon Derczynski, Iryna Gurevych, Roy Schwartz

    Abstract: Recent work in natural language processing (NLP) has yielded appealing results from scaling model parameters and training data; however, using only scale to improve performance means that resource consumption also grows. Such resources include data, time, storage, or energy, all of which are naturally limited and unevenly distributed. This motivates research into efficient methods that require few…

    Submitted 24 March, 2023; v1 submitted 31 August, 2022; originally announced September 2022.

    Comments: Accepted at TACL, pre publication version

  26. arXiv:2206.05985  [pdf, other]

    cs.LG stat.ME

    Modeling the Machine Learning Multiverse

    Authors: Samuel J. Bell, Onno P. Kampman, Jesse Dodge, Neil D. Lawrence

    Abstract: Amid mounting concern about the reliability and credibility of machine learning research, we present a principled framework for making robust and generalizable claims: the multiverse analysis. Our framework builds upon the multiverse analysis (Steegen et al., 2016) introduced in response to psychology's own reproducibility crisis. To efficiently explore high-dimensional and often continuous ML sea…

    Submitted 12 October, 2022; v1 submitted 13 June, 2022; originally announced June 2022.

    Comments: To appear in Advances in Neural Information Processing Systems (NeurIPS) 2022

  27. arXiv:2206.05229  [pdf, other]

    cs.LG

    Measuring the Carbon Intensity of AI in Cloud Instances

    Authors: Jesse Dodge, Taylor Prewitt, Remi Tachet Des Combes, Erika Odmark, Roy Schwartz, Emma Strubell, Alexandra Sasha Luccioni, Noah A. Smith, Nicole DeCario, Will Buchanan

    Abstract: By providing unprecedented access to computational resources, cloud computing has enabled rapid growth in technologies such as machine learning, the computational demands of which incur a high energy cost and a commensurate carbon footprint. As a result, recent scholarship has called for better estimates of the greenhouse gas impact of AI: data scientists today do not have easy or reliable access…

    Submitted 10 June, 2022; originally announced June 2022.

    Comments: In ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT) 2022

  28. arXiv:2206.03216  [pdf, other]

    cs.CY cs.AI cs.CL

    Data Governance in the Age of Large-Scale Data-Driven Language Technology

    Authors: Yacine Jernite, Huu Nguyen, Stella Biderman, Anna Rogers, Maraim Masoud, Valentin Danchev, Samson Tan, Alexandra Sasha Luccioni, Nishant Subramani, Gérard Dupont, Jesse Dodge, Kyle Lo, Zeerak Talat, Isaac Johnson, Dragomir Radev, Somaieh Nikpoor, Jörg Frohberg, Aaron Gokaslan, Peter Henderson, Rishi Bommasani, Margaret Mitchell

    Abstract: The recent emergence and adoption of Machine Learning technology, and specifically of Large Language Models, has drawn attention to the need for systematic and transparent management of language data. This work proposes an approach to global language data governance that attempts to organize data management amongst stakeholders, values, and rights. Our proposal is informed by prior work on distrib…

    Submitted 2 November, 2022; v1 submitted 3 May, 2022; originally announced June 2022.

    Comments: 32 pages: Full paper and Appendices; Association for Computing Machinery, New York, NY, USA, 2206-2222

    Journal ref: Proceedings of 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT '22)

  29. arXiv:2203.06211  [pdf, other]

    cs.CL

    Staged Training for Transformer Language Models

    Authors: Sheng Shen, Pete Walsh, Kurt Keutzer, Jesse Dodge, Matthew Peters, Iz Beltagy

    Abstract: The current standard approach to scaling transformer language models trains each model size from a different random initialization. As an alternative, we consider a staged training setup that begins with a small model and incrementally increases the amount of compute used for training by applying a "growth operator" to increase the model depth and width. By initializing each stage with the output…

    Submitted 11 March, 2022; originally announced March 2022.

  30. arXiv:2112.08786  [pdf, other]

    cs.CL

    Efficient Hierarchical Domain Adaptation for Pretrained Language Models

    Authors: Alexandra Chronopoulou, Matthew E. Peters, Jesse Dodge

    Abstract: The remarkable success of large language models has been driven by dense models trained on massive unlabeled, unstructured corpora. These corpora typically contain text from diverse, heterogeneous sources, but information about the source of the text is rarely used during training. Transferring their knowledge to a target domain is typically done by continuing training in-domain. In this paper, we…

    Submitted 3 May, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: NAACL 2022 accepted paper camera ready version

  31. arXiv:2110.07574  [pdf, other]

    cs.CL

    Can Machines Learn Morality? The Delphi Experiment

    Authors: Liwei Jiang, Jena D. Hwang, Chandra Bhagavatula, Ronan Le Bras, Jenny Liang, Jesse Dodge, Keisuke Sakaguchi, Maxwell Forbes, Jon Borchardt, Saadia Gabriel, Yulia Tsvetkov, Oren Etzioni, Maarten Sap, Regina Rini, Yejin Choi

    Abstract: As AI systems become increasingly powerful and pervasive, there are growing concerns about machines' morality or a lack thereof. Yet, teaching morality to machines is a formidable task, as morality remains among the most intensely debated questions in humanity, let alone for AI. Existing AI systems deployed to millions of users, however, are already making decisions loaded with moral implications,…

    Submitted 12 July, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

  32. arXiv:2110.00613  [pdf, other]

    cs.CL

    Expected Validation Performance and Estimation of a Random Variable's Maximum

    Authors: Jesse Dodge, Suchin Gururangan, Dallas Card, Roy Schwartz, Noah A. Smith

    Abstract: Research in NLP is often supported by experimental results, and improved reporting of such results can lead to better understanding and more reproducible science. In this paper we analyze three statistical estimators for expected validation performance, a tool used for reporting performance (e.g., accuracy) as a function of computational budget (e.g., number of hyperparameter tuning experiments).…

    Submitted 1 October, 2021; originally announced October 2021.

  33. arXiv:2109.13978  [pdf, other]

    cs.AI

    Identifying Reasoning Flaws in Planning-Based RL Using Tree Explanations

    Authors: Kin-Ho Lam, Zhengxian Lin, Jed Irvine, Jonathan Dodge, Zeyad T Shureih, Roli Khanna, Minsuk Kahng, Alan Fern

    Abstract: Enabling humans to identify potential flaws in an agent's decision making is an important Explainable AI application. We consider identifying such flaws in a planning-based deep reinforcement learning (RL) agent for a complex real-time strategy game. In particular, the agent makes decisions via tree search using a learned model and evaluation function over interpretable states and actions. This gi…

    Submitted 28 September, 2021; originally announced September 2021.

  34. arXiv:2104.08758  [pdf, other]

    cs.CL cs.AI

    Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus

    Authors: Jesse Dodge, Maarten Sap, Ana Marasović, William Agnew, Gabriel Ilharco, Dirk Groeneveld, Margaret Mitchell, Matt Gardner

    Abstract: Large language models have led to remarkable progress on many NLP tasks, and researchers are turning to ever-larger text corpora to train them. Some of the largest corpora available are made by scraping significant portions of the internet, and are frequently introduced with only minimal documentation. In this work we provide some of the first documentation for the Colossal Clean Crawled Corpus (C…

    Submitted 30 September, 2021; v1 submitted 18 April, 2021; originally announced April 2021.

    Comments: EMNLP 2021 accepted paper camera ready version

  35. arXiv:2104.08646  [pdf, other]

    cs.CL

    Competency Problems: On Finding and Removing Artifacts in Language Data

    Authors: Matt Gardner, William Merrill, Jesse Dodge, Matthew E. Peters, Alexis Ross, Sameer Singh, Noah A. Smith

    Abstract: Much recent work in NLP has documented dataset artifacts, bias, and spurious correlations between input features and output labels. However, how to tell which features have "spurious" instead of legitimate correlations is typically left unspecified. In this work we argue that for complex language understanding tasks, all simple feature correlations are spurious, and we formalize this notion into a…

    Submitted 28 December, 2021; v1 submitted 17 April, 2021; originally announced April 2021.

    Comments: EMNLP 2021. This version fixes an error in Proposition 1 and adds discussion (the EMNLP camera ready version is unfixed) (and v3 adds the acknowledgements that we forgot to put into v2)

  36. arXiv:2012.08509  [pdf, ps, other]

    physics.data-an physics.optics

    Maximum-likelihood parameter estimation in terahertz time-domain spectroscopy

    Authors: Laleh Mohtashemi, Paul Westlund, Derek G. Sahota, Graham B. Lea, Ian Bushfield, Payam Mousavi, J. Steven Dodge

    Abstract: We present a maximum-likelihood method for parameter estimation in terahertz time-domain spectroscopy. We derive the likelihood function for a parameterized frequency response function, given a pair of time-domain waveforms with known time-dependent noise amplitudes. The method provides parameter estimates that are superior to other commonly-used methods, and provides a reliable measure of the goo…

    Submitted 22 December, 2020; v1 submitted 15 December, 2020; originally announced December 2020.

    Comments: 15 pages, 6 figures; minor formatting updates

    Journal ref: Optics Express 29, 4912 (2021)

  37. arXiv:2004.07453  [pdf, other]

    cs.CL cs.LG

    The Right Tool for the Job: Matching Model and Instance Complexities

    Authors: Roy Schwartz, Gabriel Stanovsky, Swabha Swayamdipta, Jesse Dodge, Noah A. Smith

    Abstract: As NLP models become larger, executing a trained model requires significant computational resources incurring monetary and environmental costs. To better respect a given inference budget, we propose a modification to contextual representation fine-tuning which, during inference, allows for an early (and fast) "exit" from neural network calculations for simple instances, and late (and accurate) exi…

    Submitted 8 May, 2020; v1 submitted 16 April, 2020; originally announced April 2020.

    Comments: ACL 2020; 12 pages; code available in https://github.com/allenai/sledgehammer

  38. arXiv:2002.06305  [pdf, other]

    cs.CL cs.LG

    Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping

    Authors: Jesse Dodge, Gabriel Ilharco, Roy Schwartz, Ali Farhadi, Hannaneh Hajishirzi, Noah Smith

    Abstract: Fine-tuning pretrained contextual word embedding models to supervised downstream tasks has become commonplace in natural language processing. This process, however, is often brittle: even with the same hyperparameter values, distinct random seeds can lead to substantially different results. To better understand this phenomenon, we experiment with four datasets from the GLUE benchmark, fine-tuning…

    Submitted 14 February, 2020; originally announced February 2020.

  39. arXiv:1912.01604  [pdf, ps, other]

    cond-mat.str-el cond-mat.supr-con

    Many-body recombination in insulating cuprates

    Authors: Derek G. Sahota, Ruixing Liang, M. Dion, Patrick Fournier, Hanna A. Dąbkowska, Graeme M. Luke, J. Steven Dodge

    Abstract: We study the pump-probe response of three insulating cuprates and develop a model for its recombination kinetics. The dependence on time, fluence, and both pump and probe photon energies imply many-body recombination on femtosecond timescales, characterized by anomalously large trapping and Auger coefficients. The fluence dependence follows a universal form that includes a characteristic volume sc…

    Submitted 3 December, 2019; originally announced December 2019.

    Journal ref: Phys. Rev. Research 1, 033214 (2019)

  40. arXiv:1909.03011  [pdf, other]

    cs.CL cs.LG stat.ML

    RNN Architecture Learning with Sparse Regularization

    Authors: Jesse Dodge, Roy Schwartz, Hao Peng, Noah A. Smith

    Abstract: Neural models for NLP typically use large numbers of parameters to reach state-of-the-art performance, which can lead to excessive memory usage and increased runtime. We present a structure learning method for learning sparse, parameter-efficient NLP models. Our method applies group lasso to rational RNNs (Peng et al., 2018), a family of models that is closely connected to weighted finite-state au…

    Submitted 6 September, 2019; originally announced September 2019.

  41. arXiv:1909.03004  [pdf, other]

    cs.LG cs.CL stat.ME stat.ML

    Show Your Work: Improved Reporting of Experimental Results

    Authors: Jesse Dodge, Suchin Gururangan, Dallas Card, Roy Schwartz, Noah A. Smith

    Abstract: Research in natural language processing proceeds, in part, by demonstrating that new models achieve superior performance (e.g., accuracy) on held-out test data, compared to previous results. In this paper, we demonstrate that test-set performance scores alone are insufficient for drawing accurate conclusions about which model performs best. We argue for reporting additional details, especially per…

    Submitted 6 September, 2019; originally announced September 2019.

  42. arXiv:1907.10597  [pdf, other]

    cs.CY cs.CL cs.CV cs.LG stat.ME

    Green AI

    Authors: Roy Schwartz, Jesse Dodge, Noah A. Smith, Oren Etzioni

    Abstract: The computations required for deep learning research have been doubling every few months, resulting in an estimated 300,000x increase from 2012 to 2018 [2]. These computations have a surprisingly large carbon footprint [38]. Ironically, deep learning was inspired by the human brain, which is remarkably energy efficient. Moreover, the financial cost of the computations can make it difficult for aca…

    Submitted 13 August, 2019; v1 submitted 22 July, 2019; originally announced July 2019.

    Comments: 12 pages

  43. arXiv:1903.09708  [pdf, other]

    cs.HC cs.AI

    Explaining Reinforcement Learning to Mere Mortals: An Empirical Study

    Authors: Andrew Anderson, Jonathan Dodge, Amrita Sadarangani, Zoe Juozapaitis, Evan Newman, Jed Irvine, Souti Chattopadhyay, Alan Fern, Margaret Burnett

    Abstract: We present a user study to investigate the impact of explanations on non-experts' understanding of reinforcement learning (RL) agents. We investigate both a common RL visualization, saliency maps (the focus of attention), and a more recent explanation type, reward-decomposition bars (predictions of future types of rewards). We designed a 124 participant, four-treatment experiment to compare partic…

    Submitted 18 June, 2019; v1 submitted 22 March, 2019; originally announced March 2019.

    Comments: 7 pages

  44. arXiv:1903.09265  [pdf, other]

    nucl-ex physics.ins-det

    Measurement of Moller Scattering at 2.5 MeV

    Authors: C. S. Epstein, R. Johnston, S. Lee, J. C. Bernauer, R. Corliss, K. Dow, P. Fisher, I. Friscic, D. Hasell, R. G. Milner, P. Moran, S. G. Steadman, Y. Wang, J. Dodge, E. Ihloff, J. Kelsey, C. Vidal, C. M. Cooke

    Abstract: Moller scattering is one of the most fundamental processes in QED. Understanding it to high precision is necessary for a variety of modern nuclear and particle physics experiments. In a recent calculation, existing soft-photon radiative corrections were combined with new hard-photon bremsstrahlung calculations to take into account the effect of photon emission at any photon energy, where the elect…

    Submitted 13 April, 2019; v1 submitted 21 March, 2019; originally announced March 2019.

    Journal ref: Phys. Rev. D 102, 012006 (2020)

  45. Explaining Models: An Empirical Study of How Explanations Impact Fairness Judgment

    Authors: Jonathan Dodge, Q. Vera Liao, Yunfeng Zhang, Rachel K. E. Bellamy, Casey Dugan

    Abstract: Ensuring fairness of machine learning systems is a human-in-the-loop process. It relies on developers, users, and the general public to identify fairness problems and make improvements. To facilitate the process we need effective, unbiased, and user-friendly explanations that people can confidently rely on. Towards that end, we conducted an empirical study with four types of programmatically gener…

    Submitted 22 January, 2019; originally announced January 2019.

  46. arXiv:1711.08019  [pdf, other]

    cs.HC

    Toward Foraging for Understanding of StarCraft Agents: An Empirical Study

    Authors: Sean Penney, Jonathan Dodge, Claudia Hilderbrand, Andrew Anderson, Logan Simpson, Margaret Burnett

    Abstract: Assessing and understanding intelligent agents is a difficult task for users that lack an AI background. A relatively new area, called "Explainable AI," is emerging to help address this problem, but little is known about how users would forage through information an explanation system might offer. To inform the development of Explainable AI systems, we conducted a formative study, using the lens o…

    Submitted 26 December, 2017; v1 submitted 21 November, 2017; originally announced November 2017.

    Comments: 13 pages, 10 figures, to appear in ACM IUI 2018

  47. arXiv:1711.06953  [pdf, other]

    cs.HC

    How the Experts Do It: Assessing and Explaining Agent Behaviors in Real-Time Strategy Games

    Authors: Jonathan Dodge, Sean Penney, Claudia Hilderbrand, Andrew Anderson, Margaret Burnett

    Abstract: How should an AI-based explanation system explain an agent's complex behavior to ordinary end users who have no background in AI? Answering this question is an active research area, for if an AI-based explanation system could effectively explain intelligent agents' behavior, it could enable the end users to understand, assess, and appropriately trust (or distrust) the agents attempting to help the…

    Submitted 18 November, 2017; originally announced November 2017.

    Comments: 12 pages, 11 figures, submitted to CHI 2017

  48. arXiv:1711.00138  [pdf, other]

    cs.AI

    Visualizing and Understanding Atari Agents

    Authors: Sam Greydanus, Anurag Koul, Jonathan Dodge, Alan Fern

    Abstract: While deep reinforcement learning (deep RL) agents are effective at maximizing rewards, it is often unclear what strategies they use to do so. In this paper, we take a step toward explaining deep RL agents through a case study using Atari 2600 environments. In particular, we focus on using saliency maps to understand how an agent learns and executes a policy. We introduce a method for generating u…

    Submitted 10 September, 2018; v1 submitted 31 October, 2017; originally announced November 2017.

    Comments: ICML 2018 conference paper. Code: https://github.com/greydanus/visualize_atari Blog: https://greydanus.github.io/2017/11/01/visualize-atari/

  49. arXiv:1706.01566  [pdf, other]

    stat.ML cs.LG

    Open Loop Hyperparameter Optimization and Determinantal Point Processes

    Authors: Jesse Dodge, Kevin Jamieson, Noah A. Smith

    Abstract: Driven by the need for parallelizable hyperparameter optimization methods, this paper studies \emph{open loop} search methods: sequences that are predetermined and can be generated before a single configuration is evaluated. Examples include grid search, uniform random search, low discrepancy sequences, and other sampling distributions. In particular, we propose the use of $k$-determinantal point…

    Submitted 8 May, 2019; v1 submitted 5 June, 2017; originally announced June 2017.

  50. arXiv:1704.04803  [pdf, other]

    cond-mat.supr-con cond-mat.str-el

    Disorder and superfluid density in overdoped cuprate superconductors

    Authors: N. R. Lee-Hone, J. S. Dodge, D. M. Broun

    Abstract: We calculate superfluid density for a dirty d-wave superconductor. The effects of impurity scattering are treated within the self-consistent t-matrix approximation, in weak-coupling BCS theory. Working from a realistic tight-binding parameterization of the Fermi surface, we find a superfluid density that is both correlated with T_c and linear in temperature, in good correspondence with recent expe…

    Submitted 15 June, 2018; v1 submitted 16 April, 2017; originally announced April 2017.

    Comments: 10 pages, 6 figures, including appended erratum

    Journal ref: Phys. Rev. B 96, 024501 (2017); Erratum-ibid. 97, 219903 (2018)