-
Queer In AI: A Case Study in Community-Led Participatory AI
Authors:
Organizers Of QueerInAI,
Anaelia Ovalle,
Arjun Subramonian,
Ashwin Singh,
Claas Voelcker,
Danica J. Sutherland,
Davide Locatelli,
Eva Breznik,
Filip Klubička,
Hang Yuan,
Hetvi J,
Huan Zhang,
Jaidev Shriram,
Kruno Lehman,
Luca Soldaini,
Maarten Sap,
Marc Peter Deisenroth,
Maria Leonor Pacheco,
Maria Ryskina,
Martin Mundt,
Milind Agarwal,
Nyx McLean,
Pan Xu,
A Pranav,
et al. (26 additional authors not shown)
Abstract:
We present Queer in AI as a case study for community-led participatory design in AI. We examine how participatory design and intersectional tenets started and shaped this community's programs over the years. We discuss different challenges that emerged in the process, look at ways this organization has fallen short of operationalizing participatory and intersectional principles, and then assess the organization's impact. Queer in AI provides important lessons and insights for practitioners and theorists of participatory methods broadly through its rejection of hierarchy in favor of decentralization, success at building aid and programs by and for the queer community, and effort to change actors and institutions outside of the queer community. Finally, we theorize how communities like Queer in AI contribute to participatory design in AI more broadly by fostering cultures of participation in AI, welcoming and empowering marginalized participants, critiquing poor or exploitative participatory practices, and bringing participation to institutions outside of individual research projects. Queer in AI's work serves as a case study of grassroots activism and participatory methods within AI, demonstrating the potential of community-led participatory methods and intersectional praxis, while also providing challenges, case studies, and nuanced insights to researchers developing and using participatory methods.
Submitted 8 June, 2023; v1 submitted 29 March, 2023;
originally announced March 2023.
-
Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding
Authors:
Mathew Monfort,
Bowen Pan,
Kandan Ramakrishnan,
Alex Andonian,
Barry A McNamara,
Alex Lascelles,
Quanfu Fan,
Dan Gutfreund,
Rogerio Feris,
Aude Oliva
Abstract:
Videos capture events that typically contain multiple sequential and simultaneous actions, even in the span of only a few seconds. However, most large-scale datasets built to train models for action recognition in video provide only a single label per video. Consequently, models can be incorrectly penalized for classifying actions that exist in the videos but are not explicitly labeled, and they do not learn the full spectrum of information present in each video during training. To address this, we present the Multi-Moments in Time dataset (M-MiT), which includes over two million action labels for over one million three-second videos. This multi-label dataset introduces novel challenges on how to train and analyze models for multi-action detection. Here, we present baseline results for multi-action recognition using loss functions adapted for long-tail multi-label learning, provide improved methods for visualizing and interpreting models trained for multi-label action detection, and show the strength of transferring models trained on M-MiT to smaller datasets.
Submitted 27 September, 2021; v1 submitted 1 November, 2019;
originally announced November 2019.
-
Improving Neural Question Generation using World Knowledge
Authors:
Deepak Gupta,
Kaheer Suleman,
Mahmoud Adada,
Andrew McNamara,
Justin Harris
Abstract:
In this paper, we propose a method for incorporating world knowledge (linked entities and fine-grained entity types) into a neural question generation model. This world knowledge helps to encode additional information related to the entities present in the passage required to generate human-like questions. We evaluate our models on both SQuAD and MS MARCO to demonstrate the usefulness of the world knowledge features. The proposed world-knowledge-enriched question generation model outperforms the vanilla neural question generation model by 1.37 and 1.59 absolute BLEU-4 points on the SQuAD and MS MARCO test sets, respectively.
Submitted 10 September, 2019; v1 submitted 9 September, 2019;
originally announced September 2019.
-
MS MARCO: A Human Generated MAchine Reading COmprehension Dataset
Authors:
Payal Bajaj,
Daniel Campos,
Nick Craswell,
Li Deng,
Jianfeng Gao,
Xiaodong Liu,
Rangan Majumder,
Andrew McNamara,
Bhaskar Mitra,
Tri Nguyen,
Mir Rosenberg,
Xia Song,
Alina Stoica,
Saurabh Tiwary,
Tong Wang
Abstract:
We introduce a large-scale MAchine Reading COmprehension dataset, which we name MS MARCO. The dataset comprises 1,010,916 anonymized questions, sampled from Bing's search query logs, each with a human-generated answer, and 182,669 completely human-rewritten generated answers. In addition, the dataset contains 8,841,823 passages, extracted from 3,563,535 web documents retrieved by Bing, that provide the information necessary for curating the natural language answers. A question in the MS MARCO dataset may have multiple answers or no answers at all. Using this dataset, we propose three different tasks with varying levels of difficulty: (i) predict if a question is answerable given a set of context passages, and extract and synthesize the answer as a human would; (ii) generate a well-formed answer (if possible) based on the context passages that can be understood with the question and passage context; and finally (iii) rank a set of retrieved passages given a question. The size of the dataset and the fact that the questions are derived from real user search queries distinguish MS MARCO from other well-known publicly available datasets for machine reading comprehension and question-answering. We believe that the scale and the real-world nature of this dataset make it attractive for benchmarking machine reading comprehension and question-answering models.
Submitted 31 October, 2018; v1 submitted 28 November, 2016;
originally announced November 2016.
-
Key attributes of a modern statistical computing tool
Authors:
Amelia McNamara
Abstract:
In the 1990s, statisticians began thinking in a principled way about how computation could better support the learning and doing of statistics. Since then, the pace of software development has accelerated, advancements in computing and data science have moved the goalposts, and it is time to reassess. Software continues to be developed to help do and learn statistics, but there is little critical evaluation of the resulting tools, and no accepted framework with which to critique them. This paper presents a set of attributes necessary for a modern statistical computing tool. The framework was designed to be broadly applicable to both novice and expert users, with a particular focus on making more supportive statistical computing environments. A modern statistical computing tool should be accessible, provide easy entry, privilege data as a first-order object, support exploratory and confirmatory analysis, allow for flexible plot creation, support randomization, be interactive, include inherent documentation, support narrative, publishing, and reproducibility, and be flexible to extensions. Ideally, all these attributes could be incorporated into one tool, supporting users at all levels, but a more reasonable goal is for tools designed for novices and professionals to 'reach across the gap,' taking inspiration from each other's strengths.
Submitted 1 June, 2018; v1 submitted 30 September, 2016;
originally announced October 2016.
-
On the State of Computing in Statistics Education: Tools for Learning and for Doing
Authors:
Amelia McNamara
Abstract:
This paper lays out the current landscape of tools used in statistics education. In particular, it considers graphing calculators, spreadsheets, applets and microworlds, standalone educational software, statistical programming tools, tools for reproducible research, and bespoke tools. The strengths and weaknesses of the tools are considered, particularly in the context of the list of attributes for a statistical computing tool given in McNamara (2016). Best practices for computing in introductory statistics are suggested.
Submitted 30 September, 2016;
originally announced October 2016.