-
Let's Get to the Point: LLM-Supported Planning, Drafting, and Revising of Research-Paper Blog Posts
Authors:
Marissa Radensky,
Daniel S. Weld,
Joseph Chee Chang,
Pao Siangliulue,
Jonathan Bragg
Abstract:
Research-paper blog posts help scientists disseminate their work to a larger audience, but translating papers into this format requires substantial additional effort. Blog post creation is not simply transforming a long-form article into a short output, as studied in most prior work on human-AI summarization. In contrast, blog posts are typically full-length articles that require a combination of…
▽ More
Research-paper blog posts help scientists disseminate their work to a larger audience, but translating papers into this format requires substantial additional effort. Blog post creation is not simply transforming a long-form article into a short output, as studied in most prior work on human-AI summarization. In contrast, blog posts are typically full-length articles that require a combination of strategic planning grounded in the source document, well-organized drafting, and thoughtful revisions. Can tools powered by large language models (LLMs) assist scientists in writing research-paper blog posts? To investigate this question, we conducted a formative study (N=6) to understand the main challenges of writing such blog posts with an LLM: high interaction costs for 1) reviewing and utilizing the paper content and 2) recurrent sub-tasks of generating and modifying the long-form output. To address these challenges, we developed Papers-to-Posts, an LLM-powered tool that implements a new Plan-Draft-Revise workflow, which 1) leverages an LLM to generate bullet points from the full paper to help users find and select content to include (Plan) and 2) provides default yet customizable LLM instructions for generating and modifying text (Draft, Revise). Through a within-subjects lab study (N=20) and between-subjects deployment study (N=37 blog posts, 26 participants) in which participants wrote blog posts about their papers, we compared Papers-to-Posts to a strong baseline tool that provides an LLM-generated draft and access to free-form LLM prompting. Results show that Papers-to-Posts helped researchers to 1) write significantly more satisfying blog posts and make significantly more changes to their blog posts in a fixed amount of time without a significant change in cognitive load (lab) and 2) make more changes to their blog posts for a fixed number of writing actions (deployment).
△ Less
Submitted 14 June, 2024;
originally announced June 2024.
-
PaperWeaver: Enriching Topical Paper Alerts by Contextualizing Recommended Papers with User-collected Papers
Authors:
Yoonjoo Lee,
Hyeonsu B. Kang,
Matt Latzke,
Juho Kim,
Jonathan Bragg,
Joseph Chee Chang,
Pao Siangliulue
Abstract:
With the rapid growth of scholarly archives, researchers subscribe to "paper alert" systems that periodically provide them with recommendations of recently published papers that are similar to previously collected papers. However, researchers sometimes struggle to make sense of nuanced connections between recommended papers and their own research context, as existing systems only present paper tit…
▽ More
With the rapid growth of scholarly archives, researchers subscribe to "paper alert" systems that periodically provide them with recommendations of recently published papers that are similar to previously collected papers. However, researchers sometimes struggle to make sense of nuanced connections between recommended papers and their own research context, as existing systems only present paper titles and abstracts. To help researchers spot these connections, we present PaperWeaver, an enriched paper alerts system that provides contextualized text descriptions of recommended papers based on user-collected papers. PaperWeaver employs a computational method based on Large Language Models (LLMs) to infer users' research interests from their collected papers, extract context-specific aspects of papers, and compare recommended and collected papers on these aspects. Our user study (N=15) showed that participants using PaperWeaver were able to better understand the relevance of recommended papers and triage them more confidently when compared to a baseline that presented the related work sections from recommended papers.
△ Less
Submitted 9 May, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.
-
Mitigating Barriers to Public Social Interaction with Meronymous Communication
Authors:
Nouran Soliman,
Hyeonsu B Kang,
Matthew Latzke,
Jonathan Bragg,
Joseph Chee Chang,
Amy X. Zhang,
David R Karger
Abstract:
In communities with social hierarchies, fear of judgment can discourage communication. While anonymity may alleviate some social pressure, fully anonymous spaces enable toxic behavior and hide the social context that motivates people to participate and helps them tailor their communication. We explore a design space of meronymous communication, where people can reveal carefully chosen aspects of t…
▽ More
In communities with social hierarchies, fear of judgment can discourage communication. While anonymity may alleviate some social pressure, fully anonymous spaces enable toxic behavior and hide the social context that motivates people to participate and helps them tailor their communication. We explore a design space of meronymous communication, where people can reveal carefully chosen aspects of their identity and also leverage trusted endorsers to gain credibility. We implemented these ideas in a system for scholars to meronymously seek and receive paper recommendations on Twitter and Mastodon. A formative study with 20 scholars confirmed that scholars see benefits to participating but are deterred due to social anxiety. From a month-long public deployment, we found that with meronymity, junior scholars could comfortably ask ``newbie'' questions and get responses from senior scholars who they normally found intimidating. Responses were also tailored to the aspects about themselves that junior scholars chose to reveal.
△ Less
Submitted 27 February, 2024;
originally announced February 2024.
-
Papeos: Augmenting Research Papers with Talk Videos
Authors:
Tae Soo Kim,
Matt Latzke,
Jonathan Bragg,
Amy X. Zhang,
Joseph Chee Chang
Abstract:
Research consumption has been traditionally limited to the reading of academic papers-a static, dense, and formally written format. Alternatively, pre-recorded conference presentation videos, which are more dynamic, concise, and colloquial, have recently become more widely available but potentially under-utilized. In this work, we explore the design space and benefits for combining academic papers…
▽ More
Research consumption has been traditionally limited to the reading of academic papers-a static, dense, and formally written format. Alternatively, pre-recorded conference presentation videos, which are more dynamic, concise, and colloquial, have recently become more widely available but potentially under-utilized. In this work, we explore the design space and benefits for combining academic papers and talk videos to leverage their complementary nature to provide a rich and fluid research consumption experience. Based on formative and co-design studies, we present Papeos, a novel reading and authoring interface that allow authors to augment their papers by segmenting and localizing talk videos alongside relevant paper passages with automatically generated suggestions. With Papeos, readers can visually skim a paper through clip thumbnails, and fluidly switch between consuming dense text in the paper or visual summaries in the video. In a comparative lab study (n=16), Papeos reduced mental load, scaffolded navigation, and facilitated more comprehensive reading of papers.
△ Less
Submitted 29 August, 2023;
originally announced August 2023.
-
RCT Rejection Sampling for Causal Estimation Evaluation
Authors:
Katherine A. Keith,
Sergey Feldman,
David Jurgens,
Jonathan Bragg,
Rohit Bhattacharya
Abstract:
Confounding is a significant obstacle to unbiased estimation of causal effects from observational data. For settings with high-dimensional covariates -- such as text data, genomics, or the behavioral social sciences -- researchers have proposed methods to adjust for confounding by adapting machine learning methods to the goal of causal estimation. However, empirical evaluation of these adjustment…
▽ More
Confounding is a significant obstacle to unbiased estimation of causal effects from observational data. For settings with high-dimensional covariates -- such as text data, genomics, or the behavioral social sciences -- researchers have proposed methods to adjust for confounding by adapting machine learning methods to the goal of causal estimation. However, empirical evaluation of these adjustment methods has been challenging and limited. In this work, we build on a promising empirical evaluation strategy that simplifies evaluation design and uses real data: subsampling randomized controlled trials (RCTs) to create confounded observational datasets while using the average causal effects from the RCTs as ground-truth. We contribute a new sampling algorithm, which we call RCT rejection sampling, and provide theoretical guarantees that causal identification holds in the observational data to allow for valid comparisons to the ground-truth RCT. Using synthetic data, we show our algorithm indeed results in low bias when oracle estimators are evaluated on the confounded samples, which is not always the case for a previously proposed algorithm. In addition to this identification result, we highlight several finite data considerations for evaluation designers who plan to use RCT rejection sampling on their own datasets. As a proof of concept, we implement an example evaluation pipeline and walk through these finite data considerations with a novel, real-world RCT -- which we release publicly -- consisting of approximately 70k observations and text data as high-dimensional covariates. Together, these contributions build towards a broader agenda of improved empirical evaluation for causal estimation.
△ Less
Submitted 31 January, 2024; v1 submitted 27 July, 2023;
originally announced July 2023.
-
ARIES: A Corpus of Scientific Paper Edits Made in Response to Peer Reviews
Authors:
Mike D'Arcy,
Alexis Ross,
Erin Bransom,
Bailey Kuehl,
Jonathan Bragg,
Tom Hope,
Doug Downey
Abstract:
Revising scientific papers based on peer feedback is a challenging task that requires not only deep scientific knowledge and reasoning, but also the ability to recognize the implicit requests in high-level feedback and to choose the best of many possible ways to update the manuscript in response. We introduce this task for large language models and release ARIES, a dataset of review comments and t…
▽ More
Revising scientific papers based on peer feedback is a challenging task that requires not only deep scientific knowledge and reasoning, but also the ability to recognize the implicit requests in high-level feedback and to choose the best of many possible ways to update the manuscript in response. We introduce this task for large language models and release ARIES, a dataset of review comments and their corresponding paper edits, to enable training and evaluating models. We study two versions of the task: comment-edit alignment and edit generation, and evaluate several baselines, including GPT-4. We find that models struggle even to identify the edits that correspond to a comment, especially in cases where the comment is phrased in an indirect way or where the edit addresses the spirit of a comment but not the precise request. When tasked with generating edits, GPT-4 often succeeds in addressing comments on a surface level, but it rigidly follows the wording of the feedback rather than the underlying intent, and includes fewer technical details than human-written edits. We hope that our formalization, dataset, and analysis will form a foundation for future work in this area.
△ Less
Submitted 21 June, 2023;
originally announced June 2023.
-
Beyond Summarization: Designing AI Support for Real-World Expository Writing Tasks
Authors:
Zejiang Shen,
Tal August,
Pao Siangliulue,
Kyle Lo,
Jonathan Bragg,
Jeff Hammerbacher,
Doug Downey,
Joseph Chee Chang,
David Sontag
Abstract:
Large language models have introduced exciting new opportunities and challenges in designing and developing new AI-assisted writing support tools. Recent work has shown that leveraging this new technology can transform writing in many scenarios such as ideation during creative writing, editing support, and summarization. However, AI-supported expository writing--including real-world tasks like sch…
▽ More
Large language models have introduced exciting new opportunities and challenges in designing and developing new AI-assisted writing support tools. Recent work has shown that leveraging this new technology can transform writing in many scenarios such as ideation during creative writing, editing support, and summarization. However, AI-supported expository writing--including real-world tasks like scholars writing literature reviews or doctors writing progress notes--is relatively understudied. In this position paper, we argue that developing AI supports for expository writing has unique and exciting research challenges and can lead to high real-world impacts. We characterize expository writing as evidence-based and knowledge-generating: it contains summaries of external documents as well as new information or knowledge. It can be seen as the product of authors' sensemaking process over a set of source documents, and the interplay between reading, reflection, and writing opens up new opportunities for designing AI support. We sketch three components for AI support design and discuss considerations for future research.
△ Less
Submitted 5 April, 2023;
originally announced April 2023.
-
The Semantic Reader Project: Augmenting Scholarly Documents through AI-Powered Interactive Reading Interfaces
Authors:
Kyle Lo,
Joseph Chee Chang,
Andrew Head,
Jonathan Bragg,
Amy X. Zhang,
Cassidy Trier,
Chloe Anastasiades,
Tal August,
Russell Authur,
Danielle Bragg,
Erin Bransom,
Isabel Cachola,
Stefan Candra,
Yoganand Chandrasekhar,
Yen-Sung Chen,
Evie Yu-Yen Cheng,
Yvonne Chou,
Doug Downey,
Rob Evans,
Raymond Fok,
Fangzhou Hu,
Regan Huff,
Dongyeop Kang,
Tae Soo Kim,
Rodney Kinney
, et al. (30 additional authors not shown)
Abstract:
Scholarly publications are key to the transfer of knowledge from scholars to others. However, research papers are information-dense, and as the volume of the scientific literature grows, the need for new technology to support the reading process grows. In contrast to the process of finding papers, which has been transformed by Internet technology, the experience of reading research papers has chan…
▽ More
Scholarly publications are key to the transfer of knowledge from scholars to others. However, research papers are information-dense, and as the volume of the scientific literature grows, the need for new technology to support the reading process grows. In contrast to the process of finding papers, which has been transformed by Internet technology, the experience of reading research papers has changed little in decades. The PDF format for sharing research papers is widely used due to its portability, but it has significant downsides including: static content, poor accessibility for low-vision readers, and difficulty reading on mobile devices. This paper explores the question "Can recent advances in AI and HCI power intelligent, interactive, and accessible reading interfaces -- even for legacy PDFs?" We describe the Semantic Reader Project, a collaborative effort across multiple institutions to explore automatic creation of dynamic reading interfaces for research papers. Through this project, we've developed ten research prototype interfaces and conducted usability studies with more than 300 participants and real-world users showing improved reading experiences for scholars. We've also released a production reading interface for research papers that will incorporate the best features as they mature. We structure this paper around challenges scholars and the public face when reading research papers -- Discovery, Efficiency, Comprehension, Synthesis, and Accessibility -- and present an overview of our progress and remaining open challenges.
△ Less
Submitted 23 April, 2023; v1 submitted 24 March, 2023;
originally announced March 2023.
-
CiteSee: Augmenting Citations in Scientific Papers with Persistent and Personalized Historical Context
Authors:
Joseph Chee Chang,
Amy X. Zhang,
Jonathan Bragg,
Andrew Head,
Kyle Lo,
Doug Downey,
Daniel S. Weld
Abstract:
When reading a scholarly article, inline citations help researchers contextualize the current article and discover relevant prior work. However, it can be challenging to prioritize and make sense of the hundreds of citations encountered during literature reviews. This paper introduces CiteSee, a paper reading tool that leverages a user's publishing, reading, and saving activities to provide person…
▽ More
When reading a scholarly article, inline citations help researchers contextualize the current article and discover relevant prior work. However, it can be challenging to prioritize and make sense of the hundreds of citations encountered during literature reviews. This paper introduces CiteSee, a paper reading tool that leverages a user's publishing, reading, and saving activities to provide personalized visual augmentations and context around citations. First, CiteSee connects the current paper to familiar contexts by surfacing known citations a user had cited or opened. Second, CiteSee helps users prioritize their exploration by highlighting relevant but unknown citations based on saving and reading history. We conducted a lab study that suggests CiteSee is significantly more effective for paper discovery than three baselines. A field deployment study shows CiteSee helps participants keep track of their explorations and leads to better situational awareness and increased paper discovery via inline citation when conducting real-world literature reviews.
△ Less
Submitted 14 February, 2023;
originally announced February 2023.
-
ComLittee: Literature Discovery with Personal Elected Author Committees
Authors:
Hyeonsu B. Kang,
Nouran Soliman,
Matt Latzke,
Joseph Chee Chang,
Jonathan Bragg
Abstract:
In order to help scholars understand and follow a research topic, significant research has been devoted to creating systems that help scholars discover relevant papers and authors. Recent approaches have shown the usefulness of highlighting relevant authors while scholars engage in paper discovery. However, these systems do not capture and utilize users' evolving knowledge of authors. We reflect o…
▽ More
In order to help scholars understand and follow a research topic, significant research has been devoted to creating systems that help scholars discover relevant papers and authors. Recent approaches have shown the usefulness of highlighting relevant authors while scholars engage in paper discovery. However, these systems do not capture and utilize users' evolving knowledge of authors. We reflect on the design space and introduce ComLittee, a literature discovery system that supports author-centric exploration. In contrast to paper-centric interaction in prior systems, ComLittee's author-centric interaction supports curation of research threads from individual authors, finding new authors and papers with combined signals from a paper recommender and the curated authors' authorship graphs, and understanding them in the context of those signals. In a within-subjects experiment that compares to an author-highlighting approach, we demonstrate how ComLittee leads to a higher efficiency, quality, and novelty in author discovery that also improves paper discovery.
△ Less
Submitted 13 February, 2023;
originally announced February 2023.
-
Relatedly: Scaffolding Literature Reviews with Existing Related Work Sections
Authors:
Srishti Palani,
Aakanksha Naik,
Doug Downey,
Amy X. Zhang,
Jonathan Bragg,
Joseph Chee Chang
Abstract:
Scholars who want to research a scientific topic must take time to read, extract meaning, and identify connections across many papers. As scientific literature grows, this becomes increasingly challenging. Meanwhile, authors summarize prior research in papers' related work sections, though this is scoped to support a single paper. A formative study found that while reading multiple related work pa…
▽ More
Scholars who want to research a scientific topic must take time to read, extract meaning, and identify connections across many papers. As scientific literature grows, this becomes increasingly challenging. Meanwhile, authors summarize prior research in papers' related work sections, though this is scoped to support a single paper. A formative study found that while reading multiple related work paragraphs helps overview a topic, it is hard to navigate overlapping and diverging references and research foci. In this work, we design a system, Relatedly, that scaffolds exploring and reading multiple related work paragraphs on a topic, with features including dynamic re-ranking and highlighting to spotlight unexplored dissimilar information, auto-generated descriptive paragraph headings, and low-lighting of redundant information. From a within-subjects user study (n=15), we found that scholars generate more coherent, insightful, and comprehensive topic outlines using Relatedly compared to a baseline paper list.
△ Less
Submitted 13 February, 2023;
originally announced February 2023.
-
The Semantic Scholar Open Data Platform
Authors:
Rodney Kinney,
Chloe Anastasiades,
Russell Authur,
Iz Beltagy,
Jonathan Bragg,
Alexandra Buraczynski,
Isabel Cachola,
Stefan Candra,
Yoganand Chandrasekhar,
Arman Cohan,
Miles Crawford,
Doug Downey,
Jason Dunkelberger,
Oren Etzioni,
Rob Evans,
Sergey Feldman,
Joseph Gorney,
David Graham,
Fangzhou Hu,
Regan Huff,
Daniel King,
Sebastian Kohlmeier,
Bailey Kuehl,
Michael Langan,
Daniel Lin
, et al. (23 additional authors not shown)
Abstract:
The volume of scientific output is creating an urgent need for automated tools to help scientists keep up with developments in their field. Semantic Scholar (S2) is an open data platform and website aimed at accelerating science by helping scholars discover and understand scientific literature. We combine public and proprietary data sources using state-of-the-art techniques for scholarly PDF conte…
▽ More
The volume of scientific output is creating an urgent need for automated tools to help scientists keep up with developments in their field. Semantic Scholar (S2) is an open data platform and website aimed at accelerating science by helping scholars discover and understand scientific literature. We combine public and proprietary data sources using state-of-the-art techniques for scholarly PDF content extraction and automatic knowledge graph construction to build the Semantic Scholar Academic Graph, the largest open scientific literature graph to-date, with 200M+ papers, 80M+ authors, 550M+ paper-authorship edges, and 2.4B+ citation edges. The graph includes advanced semantic features such as structurally parsed text, natural language summaries, and vector embeddings. In this paper, we describe the components of the S2 data processing pipeline and the associated APIs offered by the platform. We will update this living document to reflect changes as we add new data offerings and improve existing services.
△ Less
Submitted 24 January, 2023;
originally announced January 2023.
-
A Dataset of Alt Texts from HCI Publications: Analyses and Uses Towards Producing More Descriptive Alt Texts of Data Visualizations in Scientific Papers
Authors:
Sanjana Chintalapati,
Jonathan Bragg,
Lucy Lu Wang
Abstract:
Figures in scientific publications contain important information and results, and alt text is needed for blind and low vision readers to engage with their content. We conduct a study to characterize the semantic content of alt text in HCI publications based on a framework introduced by Lundgard and Satyanarayan. Our study focuses on alt text for graphs, charts, and plots extracted from HCI and acc…
▽ More
Figures in scientific publications contain important information and results, and alt text is needed for blind and low vision readers to engage with their content. We conduct a study to characterize the semantic content of alt text in HCI publications based on a framework introduced by Lundgard and Satyanarayan. Our study focuses on alt text for graphs, charts, and plots extracted from HCI and accessibility publications; we focus on these communities due to the lack of alt text in papers published outside of these disciplines. We find that the capacity of author-written alt text to fulfill blind and low vision user needs is mixed; for example, only 50% of alt texts in our sample contain information about extrema or outliers, and only 31% contain information about major trends or comparisons conveyed by the graph. We release our collected dataset of author-written alt text, and outline possible ways that it can be used to develop tools and models to assist future authors in writing better alt text. Based on our findings, we also discuss recommendations that can be acted upon by publishers and authors to encourage inclusion of more types of semantic content in alt text.
△ Less
Submitted 27 September, 2022;
originally announced September 2022.
-
FeedLens: Polymorphic Lenses for Personalizing Exploratory Search over Knowledge Graphs
Authors:
Harmanpreet Kaur,
Doug Downey,
Amanpreet Singh,
Evie Yu-Yen Cheng,
Daniel S. Weld,
Jonathan Bragg
Abstract:
The vast scale and open-ended nature of knowledge graphs (KGs) make exploratory search over them cognitively demanding for users. We introduce a new technique, polymorphic lenses, that improves exploratory search over a KG by obtaining new leverage from the existing preference models that KG-based systems maintain for recommending content. The approach is based on a simple but powerful observation…
▽ More
The vast scale and open-ended nature of knowledge graphs (KGs) make exploratory search over them cognitively demanding for users. We introduce a new technique, polymorphic lenses, that improves exploratory search over a KG by obtaining new leverage from the existing preference models that KG-based systems maintain for recommending content. The approach is based on a simple but powerful observation: in a KG, preference models can be re-targeted to recommend not only entities of a single base entity type (e.g., papers in the scientific literature KG, products in an e-commerce KG), but also all other types (e.g., authors, conferences, institutions; sellers, buyers). We implement our technique in a novel system, FeedLens, which is built over Semantic Scholar, a production system for navigating the scientific literature KG. FeedLens reuses the existing preference models on Semantic Scholar -- people's curated research feeds -- as lenses for exploratory search. Semantic Scholar users can curate multiple feeds/lenses for different topics of interest, e.g., one for human-centered AI and another for document embeddings. Although these lenses are defined in terms of papers, FeedLens re-purposes them to also guide search over authors, institutions, venues, etc. Our system design is based on feedback from intended users via two pilot surveys (n=17 and n=13, respectively). We compare FeedLens and Semantic Scholar via a third (within-subjects) user study (n=15) and find that FeedLens increases user engagement while reducing the cognitive effort required to complete a short literature review task. Our qualitative results also highlight people's preference for this more effective exploratory search experience enabled by FeedLens.
△ Less
Submitted 16 August, 2022;
originally announced August 2022.
-
Scim: Intelligent Skimming Support for Scientific Papers
Authors:
Raymond Fok,
Hita Kambhamettu,
Luca Soldaini,
Jonathan Bragg,
Kyle Lo,
Andrew Head,
Marti A. Hearst,
Daniel S. Weld
Abstract:
Researchers need to keep up with immense literatures, though it is time-consuming and difficult to do so. In this paper, we investigate the role that intelligent interfaces can play in helping researchers skim papers, that is, rapidly reviewing a paper to attain a cursory understanding of its contents. After conducting formative interviews and a design probe, we suggest that skimming aids should a…
▽ More
Researchers need to keep up with immense literatures, though it is time-consuming and difficult to do so. In this paper, we investigate the role that intelligent interfaces can play in helping researchers skim papers, that is, rapidly reviewing a paper to attain a cursory understanding of its contents. After conducting formative interviews and a design probe, we suggest that skimming aids should aim to thread the needle of highlighting content that is simultaneously diverse, evenly-distributed, and important. We introduce Scim, a novel intelligent skimming interface that reifies this aim, designed to support the skimming process by highlighting salient paper contents to direct a skimmer's focus. Key to the design is that the highlights are faceted by content type, evenly-distributed across a paper, with a density configurable by readers at both the global and local level. We evaluate Scim with an in-lab usability study and deployment study, revealing how skimming aids can support readers throughout the skimming experience and yielding design considerations and tensions for the design of future intelligent skimming tools.
△ Less
Submitted 25 September, 2023; v1 submitted 9 May, 2022;
originally announced May 2022.
-
From Who You Know to What You Read: Augmenting Scientific Recommendations with Implicit Social Networks
Authors:
Hyeonsu B. Kang,
Rafal Kocielnik,
Andrew Head,
Jiangjiang Yang,
Matt Latzke,
Aniket Kittur,
Daniel S. Weld,
Doug Downey,
Jonathan Bragg
Abstract:
The ever-increasing pace of scientific publication necessitates methods for quickly identifying relevant papers. While neural recommenders trained on user interests can help, they still result in long, monotonous lists of suggested papers. To improve the discovery experience we introduce multiple new methods for \em augmenting recommendations with textual relevance messages that highlight knowledg…
▽ More
The ever-increasing pace of scientific publication necessitates methods for quickly identifying relevant papers. While neural recommenders trained on user interests can help, they still result in long, monotonous lists of suggested papers. To improve the discovery experience we introduce multiple new methods for \em augmenting recommendations with textual relevance messages that highlight knowledge-graph connections between recommended papers and a user's publication and interaction history. We explore associations mediated by author entities and those using citations alone. In a large-scale, real-world study, we show how our approach significantly increases engagement -- and future engagement when mediated by authors -- without introducing bias towards highly-cited authors. To expand message coverage for users with less publication or interaction history, we develop a novel method that highlights connections with proxy authors of interest to users and evaluate it in a controlled lab study. Finally, we synthesize design implications for future graph-based messages.
△ Less
Submitted 21 April, 2022;
originally announced April 2022.
-
Paper Plain: Making Medical Research Papers Approachable to Healthcare Consumers with Natural Language Processing
Authors:
Tal August,
Lucy Lu Wang,
Jonathan Bragg,
Marti A. Hearst,
Andrew Head,
Kyle Lo
Abstract:
When seeking information not covered in patient-friendly documents, like medical pamphlets, healthcare consumers may turn to the research literature. Reading medical papers, however, can be a challenging experience. To improve access to medical papers, we introduce a novel interactive interface-Paper Plain-with four features powered by natural language processing: definitions of unfamiliar terms,…
▽ More
When seeking information not covered in patient-friendly documents, like medical pamphlets, healthcare consumers may turn to the research literature. Reading medical papers, however, can be a challenging experience. To improve access to medical papers, we introduce a novel interactive interface-Paper Plain-with four features powered by natural language processing: definitions of unfamiliar terms, in-situ plain language section summaries, a collection of key questions that guide readers to answering passages, and plain language summaries of the answering passages. We evaluate Paper Plain, finding that participants who use Paper Plain have an easier time reading and understanding research papers without a loss in paper comprehension compared to those who use a typical PDF reader. Altogether, the study results suggest that guiding readers to relevant passages and providing plain language summaries, or "gists," alongside the original paper content can make reading medical papers easier and give readers more confidence to approach these papers.
△ Less
Submitted 28 February, 2022;
originally announced March 2022.
-
FLEX: Unifying Evaluation for Few-Shot NLP
Authors:
Jonathan Bragg,
Arman Cohan,
Kyle Lo,
Iz Beltagy
Abstract:
Few-shot NLP research is highly active, yet conducted in disjoint research threads with evaluation suites that lack challenging-yet-realistic testing setups and fail to employ careful experimental design. Consequently, the community does not know which techniques perform best or even if they outperform simple baselines. In response, we formulate the FLEX Principles, a set of requirements and best…
▽ More
Few-shot NLP research is highly active, yet conducted in disjoint research threads with evaluation suites that lack challenging-yet-realistic testing setups and fail to employ careful experimental design. Consequently, the community does not know which techniques perform best or even if they outperform simple baselines. In response, we formulate the FLEX Principles, a set of requirements and best practices for unified, rigorous, valid, and cost-sensitive few-shot NLP evaluation. These principles include Sample Size Design, a novel approach to benchmark design that optimizes statistical accuracy and precision while keeping evaluation costs manageable. Following the principles, we release the FLEX benchmark, which includes four few-shot transfer settings, zero-shot evaluation, and a public leaderboard that covers diverse NLP tasks. In addition, we present UniFew, a prompt-based model for few-shot learning that unifies pretraining and finetuning prompt formats, eschewing complex machinery of recent prompt-based approaches in adapting downstream task formats to language model pretraining objectives. We demonstrate that despite simplicity, UniFew achieves results competitive with both popular meta-learning and prompt-based approaches.
△ Less
Submitted 8 November, 2021; v1 submitted 15 July, 2021;
originally announced July 2021.
-
Improving the Accessibility of Scientific Documents: Current State, User Needs, and a System Solution to Enhance Scientific PDF Accessibility for Blind and Low Vision Users
Authors:
Lucy Lu Wang,
Isabel Cachola,
Jonathan Bragg,
Evie Yu-Yen Cheng,
Chelsea Haupt,
Matt Latzke,
Bailey Kuehl,
Madeleine van Zuylen,
Linda Wagner,
Daniel S. Weld
Abstract:
The majority of scientific papers are distributed in PDF, which pose challenges for accessibility, especially for blind and low vision (BLV) readers. We characterize the scope of this problem by assessing the accessibility of 11,397 PDFs published 2010--2019 sampled across various fields of study, finding that only 2.4% of these PDFs satisfy all of our defined accessibility criteria. We introduce…
▽ More
The majority of scientific papers are distributed in PDF, which pose challenges for accessibility, especially for blind and low vision (BLV) readers. We characterize the scope of this problem by assessing the accessibility of 11,397 PDFs published 2010--2019 sampled across various fields of study, finding that only 2.4% of these PDFs satisfy all of our defined accessibility criteria. We introduce the SciA11y system to offset some of the issues around inaccessibility. SciA11y incorporates several machine learning models to extract the content of scientific PDFs and render this content as accessible HTML, with added novel navigational features to support screen reader users. An intrinsic evaluation of extraction quality indicates that the majority of HTML renders (87%) produced by our system have no or only some readability issues. We perform a qualitative user study to understand the needs of BLV researchers when reading papers, and to assess whether the SciA11y system could address these needs. We summarize our user study findings into a set of five design recommendations for accessible scientific reader systems. User response to SciA11y was positive, with all users saying they would be likely to use the system in the future, and some stating that the system, if available, would become their primary workflow. We successfully produce HTML renders for over 12M papers, of which an open access subset of 1.5M are available for browsing at https://scia11y.org/
△ Less
Submitted 30 April, 2021;
originally announced May 2021.
-
GENIE: Toward Reproducible and Standardized Human Evaluation for Text Generation
Authors:
Daniel Khashabi,
Gabriel Stanovsky,
Jonathan Bragg,
Nicholas Lourie,
Jungo Kasai,
Yejin Choi,
Noah A. Smith,
Daniel S. Weld
Abstract:
While often assumed a gold standard, effective human evaluation of text generation remains an important, open area for research. We revisit this problem with a focus on producing consistent evaluations that are reproducible -- over time and across different populations. We study this goal in different stages of the human evaluation pipeline. In particular, we consider design choices for the annota…
▽ More
While often assumed a gold standard, effective human evaluation of text generation remains an important, open area for research. We revisit this problem with a focus on producing consistent evaluations that are reproducible -- over time and across different populations. We study this goal in different stages of the human evaluation pipeline. In particular, we consider design choices for the annotation interface used to elicit human judgments and their impact on reproducibility. Furthermore, we develop an automated mechanism for maintaining annotator quality via a probabilistic model that detects and excludes noisy annotators. Putting these lessons together, we introduce GENIE: a system for running standardized human evaluations across different generation tasks. We instantiate GENIE with datasets representing four core challenges in text generation: machine translation, summarization, commonsense reasoning, and machine comprehension. For each task, GENIE offers a leaderboard that automatically crowdsources annotations for submissions, evaluating them along axes such as correctness, conciseness, and fluency. We have made the GENIE leaderboards publicly available, and have already ranked 50 submissions from 10 different research groups. We hope GENIE encourages further progress toward effective, standardized evaluations for text generation.
△ Less
Submitted 31 October, 2022; v1 submitted 16 January, 2021;
originally announced January 2021.
-
Cicero: Multi-Turn, Contextual Argumentation for Accurate Crowdsourcing
Authors:
Quanze Chen,
Jonathan Bragg,
Lydia B. Chilton,
Daniel S. Weld
Abstract:
Traditional approaches for ensuring high quality crowdwork have failed to achieve high-accuracy on difficult problems. Aggregating redundant answers often fails on the hardest problems when the majority is confused. Argumentation has been shown to be effective in mitigating these drawbacks. However, existing argumentation systems only support limited interactions and show workers general justifica…
▽ More
Traditional approaches for ensuring high quality crowdwork have failed to achieve high-accuracy on difficult problems. Aggregating redundant answers often fails on the hardest problems when the majority is confused. Argumentation has been shown to be effective in mitigating these drawbacks. However, existing argumentation systems only support limited interactions and show workers general justifications, not context-specific arguments targeted to their reasoning.
This paper presents Cicero, a new workflow that improves crowd accuracy on difficult tasks by engaging workers in multi-turn, contextual discussions through real-time, synchronous argumentation. Our experiments show that compared to previous argumentation systems which only improve the average individual worker accuracy by 6.8 percentage points on the Relation Extraction domain, our workflow achieves 16.7 percentage point improvement. Furthermore, previous argumentation approaches don't apply to tasks with many possible answers; in contrast, Cicero works well in these cases, raising accuracy from 66.7% to 98.8% on the Codenames domain.
△ Less
Submitted 25 October, 2018;
originally announced October 2018.