-
Introducing v0.5 of the AI Safety Benchmark from MLCommons
Authors:
Bertie Vidgen,
Adarsh Agrawal,
Ahmed M. Ahmed,
Victor Akinwande,
Namir Al-Nuaimi,
Najla Alfaraj,
Elie Alhajjar,
Lora Aroyo,
Trupti Bavalatti,
Max Bartolo,
Borhane Blili-Hamelin,
Kurt Bollacker,
Rishi Bomassani,
Marisa Ferrara Boston,
Siméon Campos,
Kal Chakra,
Canyu Chen,
Cody Coleman,
Zacharie Delpierre Coudert,
Leon Derczynski,
Debojyoti Dutta,
Ian Eisenberg,
James Ezick,
Heather Frase,
Brian Fuller
, et al. (75 additional authors not shown)
Abstract:
This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to assess the safety risks of AI systems that use chat-tuned language models. We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-pu…
▽ More
This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to assess the safety risks of AI systems that use chat-tuned language models. We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-purpose assistant in English), and a limited set of personas (i.e., typical users, malicious users, and vulnerable users). We created a new taxonomy of 13 hazard categories, of which 7 have tests in the v0.5 benchmark. We plan to release version 1.0 of the AI Safety Benchmark by the end of 2024. The v1.0 benchmark will provide meaningful insights into the safety of AI systems. However, the v0.5 benchmark should not be used to assess the safety of AI systems. We have sought to fully document the limitations, flaws, and challenges of v0.5. This release of v0.5 of the AI Safety Benchmark includes (1) a principled approach to specifying and constructing the benchmark, which comprises use cases, types of systems under test (SUTs), language and context, personas, tests, and test items; (2) a taxonomy of 13 hazard categories with definitions and subcategories; (3) tests for seven of the hazard categories, each comprising a unique set of test items, i.e., prompts. There are 43,090 test items in total, which we created with templates; (4) a grading system for AI systems against the benchmark; (5) an openly available platform, and downloadable tool, called ModelBench that can be used to evaluate the safety of AI systems on the benchmark; (6) an example evaluation report which benchmarks the performance of over a dozen openly available chat-tuned language models; (7) a test specification for the benchmark.
△ Less
Submitted 13 May, 2024; v1 submitted 18 April, 2024;
originally announced April 2024.
-
AI Risk Profiles: A Standards Proposal for Pre-Deployment AI Risk Disclosures
Authors:
Eli Sherman,
Ian W. Eisenberg
Abstract:
As AI systems' sophistication and proliferation have increased, awareness of the risks has grown proportionally (Sorkin et al. 2023). In response, calls have grown for stronger emphasis on disclosure and transparency in the AI industry (NTIA 2023; OpenAI 2023b), with proposals ranging from standardizing use of technical disclosures, like model cards (Mitchell et al. 2019), to yet-unspecified licen…
▽ More
As AI systems' sophistication and proliferation have increased, awareness of the risks has grown proportionally (Sorkin et al. 2023). In response, calls have grown for stronger emphasis on disclosure and transparency in the AI industry (NTIA 2023; OpenAI 2023b), with proposals ranging from standardizing use of technical disclosures, like model cards (Mitchell et al. 2019), to yet-unspecified licensing regimes (Sindhu 2023). Since the AI value chain is complicated, with actors representing various expertise, perspectives, and values, it is crucial that consumers of a transparency disclosure be able to understand the risks of the AI system the disclosure concerns. In this paper we propose a risk profiling standard which can guide downstream decision-making, including triaging further risk assessment, informing procurement and deployment, and directing regulatory frameworks. The standard is built on our proposed taxonomy of AI risks, which reflects a high-level categorization of the wide variety of risks proposed in the literature. We outline the myriad data sources needed to construct informative Risk Profiles and propose a template-based methodology for collating risk information into a standard, yet flexible, structure. We apply this methodology to a number of prominent AI systems using publicly available information. To conclude, we discuss design decisions for the profiles and future work.
△ Less
Submitted 22 September, 2023;
originally announced September 2023.
-
Democratizing Ethical Assessment of Natural Language Generation Models
Authors:
Amin Rasekh,
Ian Eisenberg
Abstract:
Natural language generation models are computer systems that generate coherent language when prompted with a sequence of words as context. Despite their ubiquity and many beneficial applications, language generation models also have the potential to inflict social harms by generating discriminatory language, hateful speech, profane content, and other harmful material. Ethical assessment of these m…
▽ More
Natural language generation models are computer systems that generate coherent language when prompted with a sequence of words as context. Despite their ubiquity and many beneficial applications, language generation models also have the potential to inflict social harms by generating discriminatory language, hateful speech, profane content, and other harmful material. Ethical assessment of these models is therefore critical. But it is also a challenging task, requiring an expertise in several specialized domains, such as computational linguistics and social justice. While significant strides have been made by the research community in this domain, accessibility of such ethical assessments to the wider population is limited due to the high entry barriers. This article introduces a new tool to democratize and standardize ethical assessment of natural language generation models: Tool for Ethical Assessment of Language generation models (TEAL), a component of Credo AI Lens, an open-source assessment framework.
△ Less
Submitted 22 July, 2022; v1 submitted 30 June, 2022;
originally announced July 2022.