-
GraphEval2000: Benchmarking and Improving Large Language Models on Graph Datasets
Authors:
Qiming Wu,
Zichen Chen,
Will Corcoran,
Misha Sra,
Ambuj K. Singh
Abstract:
Large language models (LLMs) have achieved remarkable success in natural language processing (NLP), demonstrating significant capabilities in processing and understanding text data. However, recent studies have identified limitations in LLMs' ability to reason about graph-structured data. To address this gap, we introduce GraphEval2000, the first comprehensive graph dataset, comprising 40 graph data structure problems along with 2000 test cases. Additionally, we introduce an evaluation framework based on GraphEval2000, designed to assess the graph reasoning abilities of LLMs through coding challenges. Our dataset categorizes test cases into four primary and four sub-categories, ensuring a comprehensive evaluation. We evaluate eight popular LLMs on GraphEval2000, revealing that LLMs exhibit a better understanding of directed graphs compared to undirected ones. While private LLMs consistently outperform open-source models, the performance gap is narrowing. Furthermore, to improve the usability of our evaluation framework, we propose Structured Symbolic Decomposition (SSD), an instruction-based method designed to enhance LLM performance on GraphEval2000. Results show that SSD improves the performance of GPT-3.5, GPT-4, and GPT-4o on complex graph problems, with an increase of 11.11%, 33.37%, and 33.37%, respectively.
Submitted 23 June, 2024;
originally announced June 2024.
-
EntangleVR++: Evaluating the Potential of using Entanglement in an Interactive VR Scene Creation System
Authors:
Mengyu Chen,
Marko Peljhan,
Misha Sra
Abstract:
Interactive digital stories provide a sense of flexibility and freedom to players by allowing them to make choices at key junctions. These choices advance the narrative and determine, to some degree, how the story evolves for that player. As shown in prior work, the ability to control or participate in the construction of the narrative can give the player a high level of agency that results in a stronger sense of immersion in the narrative experience. To support the design of this type of interactive storytelling, our system, EntangleVR++, borrows the idea of entanglement from quantum computing. Our use of entanglement allows creators and storytellers control over which sequences of story events take place in correlation with each other, initiated by the choices a player makes. In this work, we evaluated how well our idea of entanglement enables creators to easily and quickly design interactive VR narratives. We asked 16 participants to use our system and, based on user interviews, analyses of screen recordings, and questionnaire feedback, we extracted four themes. From these themes and the study overall, we derived four authoring strategies for tool designers interested in the design of future visual interfaces for interactively creating virtual scenes that include relational objects and multiple outcomes driven by player interactions.
Submitted 22 June, 2024;
originally announced June 2024.
-
ConnectVR: A Trigger-Action Interface for Creating Agent-based Interactive VR Stories
Authors:
Mengyu Chen,
Marko Peljhan,
Misha Sra
Abstract:
The demand for interactive narratives is growing with the increasing popularity of VR and video gaming. This presents an opportunity to create interactive storytelling experiences that allow players to engage with a narrative from a first-person perspective, both immersively in VR and in 3D on a computer. However, for artists and storytellers without programming experience, authoring such experiences is a particularly complex task as it involves coding a series of story events (character animation, movements, time control, dialogues, etc.) to be connected and triggered by a variety of player behaviors. In this work, we present ConnectVR, a trigger-action interface to enable non-technical creators to design agent-based narrative experiences. Our no-code authoring method specifically focuses on the design of narratives driven by a series of cause-effect relationships triggered by the player's actions. We asked 15 participants to use ConnectVR in a preliminary workshop study as well as two artists to extensively use our system to create VR narrative projects in a three-week in-depth study. Our findings shed light on the creative opportunities facilitated by ConnectVR's trigger-action approach, particularly its capability to establish chained behavioral effects between virtual characters and objects. The results of both studies underscore the positive feedback from participants regarding our system's capacity to not only support creativity but also to simplify the creation of interactive narrative experiences. Results indicate compatibility with non-technical narrative creators' workflows, showcasing its potential to enhance the overall creative process in the realm of VR narrative design.
Submitted 22 June, 2024;
originally announced June 2024.
-
Artificial Leviathan: Exploring Social Evolution of LLM Agents Through the Lens of Hobbesian Social Contract Theory
Authors:
Gordon Dai,
Weijia Zhang,
Jinhan Li,
Siqi Yang,
Chidera Onochie lbe,
Srihas Rao,
Arthur Caetano,
Misha Sra
Abstract:
The emergence of Large Language Models (LLMs) and advancements in Artificial Intelligence (AI) offer an opportunity for computational social science research at scale. Building upon prior explorations of LLM agent design, our work introduces a simulated agent society where complex social relationships dynamically form and evolve over time. Agents are imbued with psychological drives and placed in a sandbox survival environment. We conduct an evaluation of the agent society through the lens of Thomas Hobbes's seminal Social Contract Theory (SCT). We analyze whether, as the theory postulates, agents seek to escape a brutish "state of nature" by surrendering rights to an absolute sovereign in exchange for order and security. Our experiments unveil an alignment: Initially, agents engage in unrestrained conflict, mirroring Hobbes's depiction of the state of nature. However, as the simulation progresses, social contracts emerge, leading to the authorization of an absolute sovereign and the establishment of a peaceful commonwealth founded on mutual cooperation. This congruence between our LLM agent society's evolutionary trajectory and Hobbes's theoretical account indicates LLMs' capability to model intricate social dynamics and potentially replicate forces that shape human societies. By enabling such insights into group behavior and emergent societal phenomena, LLM-driven multi-agent simulations, while unable to simulate all the nuances of human behavior, may hold potential for advancing our understanding of social structures, group dynamics, and complex human systems.
Submitted 20 June, 2024;
originally announced June 2024.
-
Nemotron-4 340B Technical Report
Authors:
Nvidia,
Bo Adler,
Niket Agarwal,
Ashwath Aithal,
Dong H. Anh,
Pallab Bhattacharya,
Annika Brundyn,
Jared Casper,
Bryan Catanzaro,
Sharon Clay,
Jonathan Cohen,
Sirshak Das,
Ayush Dattagupta,
Olivier Delalleau,
Leon Derczynski,
Yi Dong,
Daniel Egert,
Ellie Evans,
Aleksander Ficek,
Denys Fridman,
Shaona Ghosh,
Boris Ginsburg,
Igor Gitman,
Tomasz Grzegorzek
, et al. (58 additional authors not shown)
Abstract:
We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that allows distribution, modification, and use of the models and their outputs. These models perform competitively with open access models on a wide range of evaluation benchmarks, and were sized to fit on a single DGX H100 with 8 GPUs when deployed in FP8 precision. We believe that the community can benefit from these models in various research studies and commercial applications, especially for generating synthetic data to train smaller language models. Notably, over 98% of data used in our model alignment process is synthetically generated, showcasing the effectiveness of these models in generating synthetic data. To further support open research and facilitate model development, we are also open-sourcing the synthetic data generation pipeline used in our model alignment process.
Submitted 17 June, 2024;
originally announced June 2024.
-
Where there's a will there's a way: ChatGPT is used more for science in countries where it is prohibited
Authors:
Honglin Bao,
Mengyi Sun,
Misha Teplitskiy
Abstract:
Regulating AI is a key societal challenge, but which regulation methods are effective is unclear. This study measures the effectiveness of restricting AI services geographically, focusing on ChatGPT. OpenAI restricts ChatGPT access in several countries, including China and Russia. If restrictions are effective, ChatGPT use should be minimal in these countries. We measured use with a classifier based on distinctive word usage found in early versions of ChatGPT, e.g. "delve." We trained the classifier on pre- and post-ChatGPT "polished" abstracts and found it outperformed GPTZero and ZeroGPT on validation sets, including papers with self-reported AI use. Applying the classifier to preprints from Arxiv, BioRxiv, and MedRxiv showed ChatGPT was used in about 12.6% of preprints by August 2023, with 7.7% higher usage in restricted countries. The gap appeared before China's first major legal LLM became widely available. To test the possibility that, due to high demand, use in restricted countries would have been even higher without restrictions, we compared Asian countries with high expected demand (where English is not an official language) and found that use was higher in those with restrictions. ChatGPT use was correlated with higher views and downloads, but not citations or journal placement. Overall, restricting ChatGPT geographically has proven ineffective in science and possibly other domains, likely due to widespread workarounds.
Submitted 27 June, 2024; v1 submitted 17 June, 2024;
originally announced June 2024.
-
DanceGen: Supporting Choreography Ideation and Prototyping with Generative AI
Authors:
Yimeng Liu,
Misha Sra
Abstract:
Choreography creation requires high proficiency in artistic and technical skills. Choreographers typically go through four stages to create a dance piece: preparation, studio, performance, and reflection. This process is often individualized, complicated, and challenging due to multiple constraints at each stage. To assist choreographers, most prior work has focused on designing digital tools to support the last three stages of the choreography process, with the preparation stage being the least explored. To address this research gap, we introduce an AI-based approach to assist the preparation stage by supporting ideation, creating choreographic prototypes, and documenting creative attempts and outcomes. We address the limitations of existing AI-based motion generation methods for ideation by allowing generated sequences to be edited and modified in an interactive web interface. This capability is motivated by insights from a formative study we conducted with seven choreographers. We evaluated our system's functionality, benefits, and limitations with six expert choreographers. Results highlight the usability of our system, with users reporting increased efficiency, expanded creative possibilities, and an enhanced iterative process. We also identified areas for improvement, such as the relationship between user intent and AI outcome, intuitive and flexible user interaction design, and integration with existing physical choreography prototyping workflows. By reflecting on the evaluation results, we present three insights that aim to inform the development of future AI systems that can empower choreographers.
Submitted 28 May, 2024;
originally announced May 2024.
-
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Authors:
Marah Abdin,
Sam Ade Jacobs,
Ammar Ahmad Awan,
Jyoti Aneja,
Ahmed Awadallah,
Hany Awadalla,
Nguyen Bach,
Amit Bahree,
Arash Bakhtiari,
Jianmin Bao,
Harkirat Behl,
Alon Benhaim,
Misha Bilenko,
Johan Bjorck,
Sébastien Bubeck,
Qin Cai,
Martin Cai,
Caio César Teodoro Mendes,
Weizhu Chen,
Vishrav Chaudhary,
Dong Chen,
Dongdong Chen,
Yen-Chun Chen,
Yi-Ling Chen,
Parul Chopra
, et al. (90 additional authors not shown)
Abstract:
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset for training, a scaled-up version of the one used for phi-2, composed of heavily filtered publicly available web data and synthetic data. The model is also further aligned for robustness, safety, and chat format. We also provide some initial parameter-scaling results with 7B and 14B models trained for 4.8T tokens, called phi-3-small and phi-3-medium, both significantly more capable than phi-3-mini (e.g., respectively 75% and 78% on MMLU, and 8.7 and 8.9 on MT-bench). Moreover, we also introduce phi-3-vision, a 4.2 billion parameter model based on phi-3-mini with strong reasoning capabilities for image and text prompts.
Submitted 23 May, 2024; v1 submitted 22 April, 2024;
originally announced April 2024.
-
TiNO-Edit: Timestep and Noise Optimization for Robust Diffusion-Based Image Editing
Authors:
Sherry X. Chen,
Yaron Vaxman,
Elad Ben Baruch,
David Asulin,
Aviad Moreshet,
Kuo-Chin Lien,
Misha Sra,
Pradeep Sen
Abstract:
Despite many attempts to leverage pre-trained text-to-image models (T2I) like Stable Diffusion (SD) for controllable image editing, producing good predictable results remains a challenge. Previous approaches have focused on either fine-tuning pre-trained T2I models on specific datasets to generate certain kinds of images (e.g., with a specific object or person), or on optimizing the weights, text prompts, and/or learning features for each input image in an attempt to coax the image generator to produce the desired result. However, these approaches all have shortcomings and fail to produce good results in a predictable and controllable manner. To address this problem, we present TiNO-Edit, an SD-based method that focuses on optimizing the noise patterns and diffusion timesteps during editing, something previously unexplored in the literature. With this simple change, we are able to generate results that both better align with the original images and reflect the desired result. Furthermore, we propose a set of new loss functions that operate in the latent domain of SD, greatly speeding up the optimization when compared to prior approaches, which operate in the pixel domain. Our method can be easily applied to variations of SD including Textual Inversion and DreamBooth that encode new concepts and incorporate them into the edited results. We present a host of image-editing capabilities enabled by our approach. Our code is publicly available at https://github.com/SherryXTChen/TiNO-Edit.
Submitted 17 April, 2024;
originally announced April 2024.
-
Learning Action-based Representations Using Invariance
Authors:
Max Rudolph,
Caleb Chuck,
Kevin Black,
Misha Lvovsky,
Scott Niekum,
Amy Zhang
Abstract:
Robust reinforcement learning agents using high-dimensional observations must be able to identify relevant state features amidst many exogenous distractors. A representation that captures controllability identifies these state elements by determining what affects agent control. While methods such as inverse dynamics and mutual information capture controllability for a limited number of timesteps, capturing long-horizon elements remains a challenging problem. Myopic controllability can capture the moment right before an agent crashes into a wall, but not the control-relevance of the wall while the agent is still some distance away. To address this, we introduce action-bisimulation encoding, a method inspired by the bisimulation invariance pseudometric, that extends single-step controllability with a recursive invariance constraint. By doing this, action-bisimulation learns a multi-step controllability metric that smoothly discounts distant state features that are relevant for control. We demonstrate that action-bisimulation pretraining on reward-free, uniformly random data improves sample efficiency in several environments, including a photorealistic 3D simulation domain, Habitat. Additionally, we provide theoretical analysis and qualitative results demonstrating the information captured by action-bisimulation.
Submitted 24 June, 2024; v1 submitted 24 March, 2024;
originally announced March 2024.
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love
, et al. (1092 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
Submitted 14 June, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Can an LLM-Powered Socially Assistive Robot Effectively and Safely Deliver Cognitive Behavioral Therapy? A Study With University Students
Authors:
Mina J. Kian,
Mingyu Zong,
Katrin Fischer,
Abhyuday Singh,
Anna-Maria Velentza,
Pau Sang,
Shriya Upadhyay,
Anika Gupta,
Misha A. Faruki,
Wallace Browning,
Sebastien M. R. Arnold,
Bhaskar Krishnamachari,
Maja J. Mataric
Abstract:
Cognitive behavioral therapy (CBT) is a widely used therapeutic method for guiding individuals toward restructuring their thinking patterns as a means of addressing anxiety, depression, and other challenges. We developed a large language model (LLM)-powered prompt-engineered socially assistive robot (SAR) that guides participants through interactive CBT at-home exercises. We evaluated the performance of the SAR through a 15-day study with 38 university students randomly assigned to interact daily with the robot or a chatbot (using the same LLM), or complete traditional CBT worksheets throughout the duration of the study. We measured weekly therapeutic outcomes, changes in pre-/post-session anxiety measures, and adherence to completing CBT exercises. We found that self-reported measures of general psychological distress significantly decreased over the study period in the robot and worksheet conditions but not the chatbot condition. Furthermore, the SAR enabled significant single-session improvements for more sessions than the other two conditions combined. Our findings suggest that SAR-guided LLM-powered CBT may be as effective as traditional worksheet methods in supporting therapeutic progress from the beginning to the end of the study and superior in decreasing user anxiety immediately after completing the CBT exercise.
Submitted 27 February, 2024;
originally announced February 2024.
-
Exploring AI-assisted Ideation and Prototyping for Choreography
Authors:
Yimeng Liu,
Misha Sra
Abstract:
Choreography creation is a multimodal endeavor, demanding cognitive abilities to develop creative ideas and technical expertise to convert choreographic ideas into physical dance movements. Previous endeavors have sought to reduce the complexities in the choreography creation process in both dimensions. Among them, non-AI-based systems have focused on reinforcing cognitive activities by helping analyze and understand dance movements and augmenting physical capabilities by enhancing body expressivity. On the other hand, AI-based methods have helped the creation of novel choreographic materials with generative AI algorithms. The choreography creation process is constrained by time and requires a rich set of resources to stimulate novel ideas, but the need for iterative prototyping and reduced physical dependence have not been adequately addressed by prior research. Recognizing these challenges and the research gap, we present an innovative AI-based choreography-support system. Our goal is to facilitate rapid ideation by utilizing a generative AI model that can produce diverse and novel dance sequences. The system is designed to support iterative digital dance prototyping through an interactive web-based user interface that enables the editing and modification of generated motion. We evaluated our system by inviting six choreographers to analyze its limitations and benefits and present the evaluation results along with potential directions for future work.
Submitted 20 February, 2024;
originally announced February 2024.
-
Direct Language Model Alignment from Online AI Feedback
Authors:
Shangmin Guo,
Biao Zhang,
Tianlin Liu,
Tianqi Liu,
Misha Khalman,
Felipe Llinares,
Alexandre Rame,
Thomas Mesnard,
Yao Zhao,
Bilal Piot,
Johan Ferret,
Mathieu Blondel
Abstract:
Direct alignment from preferences (DAP) methods, such as DPO, have recently emerged as efficient alternatives to reinforcement learning from human feedback (RLHF) that do not require a separate reward model. However, the preference datasets used in DAP methods are usually collected ahead of training and never updated, so the feedback is purely offline. Moreover, responses in these datasets are often sampled from a language model distinct from the one being aligned, and since the model evolves over training, the alignment phase is inevitably off-policy. In this study, we posit that online feedback is key and improves DAP methods. Our method, online AI feedback (OAIF), uses an LLM as annotator: on each training iteration, we sample two responses from the current model and prompt the LLM annotator to choose which one is preferred, thus providing online feedback. Despite its simplicity, we demonstrate via human evaluation in several tasks that OAIF outperforms both offline DAP and RLHF methods. We further show that the feedback leveraged in OAIF is easily controllable, via instruction prompts to the LLM annotator.
Submitted 29 February, 2024; v1 submitted 7 February, 2024;
originally announced February 2024.
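A minimal sketch of the online-feedback step described in the abstract above, not the authors' implementation. The names `oaif_step`, `sample`, and `annotate` are hypothetical placeholders for the current policy's sampler and the LLM annotator; `dpo_loss` is the standard DPO objective for a single preference pair, shown here as one possible DAP loss to apply to the freshly labeled pair.

```python
import math
from typing import Callable, Tuple


def oaif_step(
    prompt: str,
    sample: Callable[[str], str],              # hypothetical: draws a response from the current policy
    annotate: Callable[[str, str, str], int],  # hypothetical: LLM annotator, returns 0 or 1 for the preferred response
) -> Tuple[str, str]:
    """Sample two responses online and return them as (chosen, rejected)."""
    first, second = sample(prompt), sample(prompt)
    if annotate(prompt, first, second) == 0:
        return first, second
    return second, first


def dpo_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # assumed value, not taken from the paper
) -> float:
    """DPO loss for one pair: -log sigmoid(beta * margin)."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # -log(sigmoid(x)) equals log(1 + exp(-x))
    return math.log1p(math.exp(-beta * margin))
```

In a training loop, the pair returned by `oaif_step` would be scored under the policy and a frozen reference model, and the resulting log-probabilities passed to `dpo_loss` before a gradient update. Because the pair is re-sampled from the current policy at every iteration, the feedback stays on-policy.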
-
Ten Hard Problems in Artificial Intelligence We Must Get Right
Authors:
Gavin Leech,
Simson Garfinkel,
Misha Yagudin,
Alexander Briand,
Aleksandr Zhuravlev
Abstract:
We explore the AI2050 "hard problems" that block the promise of AI and cause AI risks: (1) developing general capabilities of the systems; (2) assuring the performance of AI systems and their training processes; (3) aligning system goals with human goals; (4) enabling great applications of AI in real life; (5) addressing economic disruptions; (6) ensuring the participation of all; (7) at the same time ensuring socially responsible deployment; (8) addressing any geopolitical disruptions that AI causes; (9) promoting sound governance of the technology; and (10) managing the philosophical disruptions for humans living in the age of AI. For each problem, we outline the area, identify significant recent work, and suggest ways forward. [Note: this paper reviews literature through January 2023.]
Submitted 19 April, 2024; v1 submitted 6 February, 2024;
originally announced February 2024.
-
LiPO: Listwise Preference Optimization through Learning-to-Rank
Authors:
Tianqi Liu,
Zhen Qin,
Junru Wu,
Jiaming Shen,
Misha Khalman,
Rishabh Joshi,
Yao Zhao,
Mohammad Saleh,
Simon Baumgartner,
Jialu Liu,
Peter J. Liu,
Xuanhui Wang
Abstract:
Aligning language models (LMs) with curated human feedback is critical to controlling their behavior in real-world applications. Several recent policy optimization methods, such as DPO and SLiC, serve as promising alternatives to the traditional Reinforcement Learning from Human Feedback (RLHF) approach. In practice, human feedback often comes in the form of a ranked list over multiple responses, which amortizes the cost of reading the prompt. Multiple responses can also be ranked by reward models or AI feedback. However, a thorough study of directly fitting to a list of responses is lacking. In this work, we formulate LM alignment as a listwise ranking problem and describe the LiPO framework, where the policy can potentially learn more effectively from a ranked list of plausible responses given the prompt. This view draws an explicit connection to Learning-to-Rank (LTR), where most existing preference optimization work can be mapped to existing ranking objectives. Following this connection, we examine ranking objectives that are not well studied for LM alignment, with DPO and SLiC as special cases when the list size is two. In particular, we highlight a specific method, LiPO-$λ$, which leverages a state-of-the-art listwise ranking objective and weights each preference pair in a more advanced manner. We show that LiPO-$λ$ can outperform DPO variants and SLiC by a clear margin on several preference alignment tasks with both curated and real rankwise preference data.
Submitted 22 May, 2024; v1 submitted 2 February, 2024;
originally announced February 2024.
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee,
et al. (1325 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
Submitted 17 June, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
Semidefinite programs simulate approximate message passing robustly
Authors:
Misha Ivkov,
Tselil Schramm
Abstract:
Approximate message passing (AMP) is a family of iterative algorithms that generalize matrix power iteration. AMP algorithms are known to optimally solve many average-case optimization problems. In this paper, we show that a large class of AMP algorithms can be simulated in polynomial time by local statistics hierarchy semidefinite programs (SDPs), even when an unknown principal minor of measure $1/\mathrm{polylog}(\mathrm{dimension})$ is adversarially corrupted. Ours are the first robust guarantees for many of these problems. Further, our results offer an interesting counterpoint to strong lower bounds against less constrained SDP relaxations for average-case max-cut-gain (a.k.a. "optimizing the Sherrington-Kirkpatrick Hamiltonian") and other problems.
Submitted 15 November, 2023;
originally announced November 2023.
-
XplainLLM: A QA Explanation Dataset for Understanding LLM Decision-Making
Authors:
Zichen Chen,
Jianda Chen,
Mitali Gaidhani,
Ambuj Singh,
Misha Sra
Abstract:
Large Language Models (LLMs) have recently made impressive strides in natural language understanding tasks. Despite their remarkable performance, understanding their decision-making process remains a big challenge. In this paper, we look into bringing some transparency to this process by introducing a new explanation dataset for question answering (QA) tasks that integrates knowledge graphs (KGs) in a novel way. Our dataset includes 12,102 question-answer-explanation (QAE) triples. Each explanation in the dataset links the LLM's reasoning to entities and relations in the KGs. The explanation component includes a why-choose explanation, a why-not-choose explanation, and a set of reason-elements that underlie the LLM's decision. We leverage KGs and graph attention networks (GAT) to find the reason-elements and transform them into why-choose and why-not-choose explanations that are comprehensible to humans. Through quantitative and qualitative evaluations, we demonstrate the potential of our dataset to improve the in-context learning of LLMs and to enhance their interpretability and explainability. Our work contributes to the field of explainable AI by enabling a deeper understanding of LLMs' decision-making process, making them more transparent and thereby potentially more reliable to researchers and practitioners alike. Our dataset is available at: https://github.com/chen-zichen/XplainLLM_dataset.git
Submitted 14 November, 2023;
originally announced November 2023.
-
Using large language models to study human memory for meaningful narratives
Authors:
Antonios Georgiou,
Tankut Can,
Mikhail Katkov,
Misha Tsodyks
Abstract:
One of the most impressive achievements of the AI revolution is the development of large language models that can generate meaningful text and respond to instructions in plain English with no additional training necessary. Here we show that language models can be used as a scientific instrument for studying human memory for meaningful material. We developed a pipeline for designing large scale memory experiments and analyzing the obtained results. We performed online memory experiments with a large number of participants and collected recognition and recall data for narratives of different lengths. We found that both recall and recognition performance scale linearly with narrative length. Furthermore, in order to investigate the role of narrative comprehension in memory, we repeated these experiments using scrambled versions of the presented stories. We found that even though recall performance declined significantly, recognition remained largely unaffected. Interestingly, recalls in this condition seem to follow the original narrative order rather than the scrambled presentation, pointing to a contextual reconstruction of the story in memory.
Submitted 28 November, 2023; v1 submitted 8 November, 2023;
originally announced November 2023.
-
Calibrating Likelihoods towards Consistency in Summarization Models
Authors:
Polina Zablotskaia,
Misha Khalman,
Rishabh Joshi,
Livio Baldini Soares,
Shoshana Jakobovits,
Joshua Maynez,
Shashi Narayan
Abstract:
Despite the recent advances in abstractive text summarization, current summarization models still suffer from generating factually inconsistent summaries, reducing their utility for real-world applications. We argue that the main reason for such behavior is that summarization models trained with a maximum likelihood objective assign high probability to plausible sequences given the context, but they often do not accurately rank sequences by their consistency. In this work, we solve this problem by calibrating the likelihood of model-generated sequences to better align with a consistency metric measured by natural language inference (NLI) models. The human evaluation study and automatic metrics show that the calibrated models generate more consistent and higher-quality summaries. We also show that the models trained using our method return probabilities that are better aligned with the NLI scores, which significantly increases the reliability of summarization models.
Submitted 12 October, 2023;
originally announced October 2023.
-
Segmentation-based Assessment of Tumor-Vessel Involvement for Surgical Resectability Prediction of Pancreatic Ductal Adenocarcinoma
Authors:
Christiaan Viviers,
Mark Ramaekers,
Amaan Valiuddin,
Terese Hellström,
Nick Tasios,
John van der Ven,
Igor Jacobs,
Lotte Ewals,
Joost Nederend,
Peter de With,
Misha Luyer,
Fons van der Sommen
Abstract:
Pancreatic ductal adenocarcinoma (PDAC) is a highly aggressive cancer with limited treatment options. This research proposes a workflow and deep learning-based segmentation models to automatically assess tumor-vessel involvement, a key factor in determining tumor resectability. Correct assessment of resectability is vital to determine treatment options. The proposed workflow involves processing CT scans to segment the tumor and vascular structures, analyzing spatial relationships and the extent of vascular involvement, which follows a similar way of working as expert radiologists in PDAC assessment. Three segmentation architectures (nnU-Net, 3D U-Net, and Probabilistic 3D U-Net) achieve a high accuracy in segmenting veins, arteries, and the tumor. The segmentations enable automated detection of tumor involvement with high accuracy (0.88 sensitivity and 0.86 specificity) and automated computation of the degree of tumor-vessel contact. Additionally, due to significant inter-observer variability in these important structures, we present the uncertainty captured by each of the models to further increase insights into the predicted involvement. This result provides clinicians with a clear indication of tumor-vessel involvement and may be used to facilitate more informed decision-making for surgical interventions. The proposed method offers a valuable tool for improving patient outcomes, personalized treatment strategies and survival rates in pancreatic cancer.
Submitted 1 October, 2023;
originally announced October 2023.
-
Statistical Rejection Sampling Improves Preference Optimization
Authors:
Tianqi Liu,
Yao Zhao,
Rishabh Joshi,
Misha Khalman,
Mohammad Saleh,
Peter J. Liu,
Jialu Liu
Abstract:
Improving the alignment of language models with human preferences remains an active research challenge. Previous approaches have primarily utilized Reinforcement Learning from Human Feedback (RLHF) via online RL methods such as Proximal Policy Optimization (PPO). Recently, offline methods such as Sequence Likelihood Calibration (SLiC) and Direct Preference Optimization (DPO) have emerged as attractive alternatives, offering improvements in stability and scalability while maintaining competitive performance. SLiC refines its loss function using sequence pairs sampled from a supervised fine-tuned (SFT) policy, while DPO directly optimizes language models based on preference data, foregoing the need for a separate reward model. However, the maximum likelihood estimator (MLE) of the target optimal policy requires labeled preference pairs sampled from that policy. DPO's lack of a reward model constrains its ability to sample preference pairs from the optimal policy, and SLiC is restricted to sampling preference pairs only from the SFT policy. To address these limitations, we introduce a novel approach called Statistical Rejection Sampling Optimization (RSO) that aims to source preference data from the target optimal policy using rejection sampling, enabling a more accurate estimation of the optimal policy. We also propose a unified framework that enhances the loss functions used in both SLiC and DPO from a preference modeling standpoint. Through extensive experiments across three diverse tasks, we demonstrate that RSO consistently outperforms both SLiC and DPO on evaluations from both Large Language Model (LLM) and human raters.
Submitted 23 January, 2024; v1 submitted 12 September, 2023;
originally announced September 2023.
-
RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation
Authors:
Konstantinos Bousmalis,
Giulia Vezzani,
Dushyant Rao,
Coline Devin,
Alex X. Lee,
Maria Bauza,
Todor Davchev,
Yuxiang Zhou,
Agrim Gupta,
Akhil Raju,
Antoine Laurens,
Claudio Fantacci,
Valentin Dalibard,
Martina Zambelli,
Murilo Martins,
Rugile Pevceviciute,
Michiel Blokzijl,
Misha Denil,
Nathan Batchelor,
Thomas Lampe,
Emilio Parisotto,
Konrad Żołna,
Scott Reed,
Sergio Gómez Colmenarejo,
Jon Scholz,
et al. (14 additional authors not shown)
Abstract:
The ability to leverage heterogeneous robotic experience from different robots and tasks to quickly master novel skills and embodiments has the potential to transform robot learning. Inspired by recent advances in foundation models for vision and language, we propose a multi-embodiment, multi-task generalist agent for robotic manipulation. This agent, named RoboCat, is a visual goal-conditioned decision transformer capable of consuming action-labelled visual experience. This data spans a large repertoire of motor control skills from simulated and real robotic arms with varying sets of observations and actions. With RoboCat, we demonstrate the ability to generalise to new tasks and robots, both zero-shot as well as through adaptation using only 100-1000 examples for the target task. We also show how a trained model itself can be used to generate data for subsequent training iterations, thus providing a basic building block for an autonomous improvement loop. We investigate the agent's capabilities, with large-scale evaluations both in simulation and on three different real robot embodiments. We find that as we grow and diversify its training data, RoboCat not only shows signs of cross-task transfer, but also becomes more efficient at adapting to new tasks.
Submitted 22 December, 2023; v1 submitted 20 June, 2023;
originally announced June 2023.
-
$\pi2\text{vec}$: Policy Representations with Successor Features
Authors:
Gianluca Scarpellini,
Ksenia Konyushkova,
Claudio Fantacci,
Tom Le Paine,
Yutian Chen,
Misha Denil
Abstract:
This paper describes $\pi2\text{vec}$, a method for representing behaviors of black box policies as feature vectors. The policy representations capture how the statistics of foundation model features change in response to the policy behavior in a task agnostic way, and can be trained from offline data, allowing them to be used in offline policy selection. This work provides a key piece of a recipe for fusing together three modern lines of research: Offline policy evaluation as a counterpart to offline RL, foundation models as generic and powerful state representations, and efficient policy selection in resource constrained environments.
Submitted 24 January, 2024; v1 submitted 16 June, 2023;
originally announced June 2023.
-
Vacant Holes for Unsupervised Detection of the Outliers in Compact Latent Representation
Authors:
Misha Glazunov,
Apostolis Zarras
Abstract:
Detection of outliers is pivotal for any machine learning model deployed and operated in the real world. It is essential for deep neural networks, which have been shown to be overconfident on such inputs. Moreover, even deep generative models that allow estimation of the probability density of the input fail at this task. In this work, we concentrate on a specific type of these models: Variational Autoencoders (VAEs). First, we unveil a significant theoretical flaw in the assumption of the classical VAE model. Second, we enforce an accommodating topological property, compactness, on the image of the deep neural mapping to the latent space, in order to alleviate the flaw and obtain the means to provably bound the image within the determined limits by squeezing both inliers and outliers together. We enforce compactness using two approaches: (i) the Alexandroff extension and (ii) a fixed Lipschitz continuity constant on the mapping of the encoder of the VAEs. Finally and most importantly, we discover that anomalous inputs predominantly tend to land on the vacant latent holes within the compact space, enabling their successful identification. For that reason, we introduce a specifically devised score for hole detection and evaluate the solution against several baseline benchmarks, achieving promising results.
Submitted 16 June, 2023;
originally announced June 2023.
-
SLiC-HF: Sequence Likelihood Calibration with Human Feedback
Authors:
Yao Zhao,
Rishabh Joshi,
Tianqi Liu,
Misha Khalman,
Mohammad Saleh,
Peter J. Liu
Abstract:
Learning from human feedback has been shown to be effective at aligning language models with human preferences. Past work has often relied on Reinforcement Learning from Human Feedback (RLHF), which optimizes the language model using reward scores assigned from a reward model trained on human preference data. In this work we show how the recently introduced Sequence Likelihood Calibration (SLiC), can also be used to effectively learn from human preferences (SLiC-HF). Furthermore, we demonstrate this can be done with human feedback data collected for a different model, similar to off-policy, offline RL data. Automatic and human evaluation experiments on the TL;DR summarization task show that SLiC-HF significantly improves supervised fine-tuning baselines. Furthermore, SLiC-HF presents a competitive alternative to the PPO RLHF implementation used in past work while being much simpler to implement, easier to tune and more computationally efficient in practice.
Submitted 17 May, 2023;
originally announced May 2023.
-
GAANet: Ghost Auto Anchor Network for Detecting Varying Size Drones in Dark
Authors:
Misha Urooj Khan,
Maham Misbah,
Zeeshan Kaleem,
Yansha Deng,
Abbas Jamalipour
Abstract:
The usage of drones has tremendously increased in different sectors spanning from military to industrial applications. Despite all the benefits they offer, their misuse can lead to mishaps, and tackling them becomes more challenging particularly at night due to their small size and low visibility conditions. To overcome those limitations and improve the detection accuracy at night, we propose an object detector called Ghost Auto Anchor Network (GAANet) for infrared (IR) images. The detector uses a YOLOv5 core to address challenges in object detection for IR images, such as poor accuracy and a high false alarm rate caused by extended altitudes, poor lighting, and low image resolution. To improve performance, we implemented auto anchor calculation, modified the conventional convolution block to ghost-convolution, adjusted the input channel size, and used the AdamW optimizer. To enhance the precision of multiscale tiny object recognition, we also introduced an additional extra-small object feature extractor and detector. Experimental results on a custom IR dataset with multiple classes (birds, drones, planes, and helicopters) demonstrate that GAANet shows improvement compared to state-of-the-art detectors. In comparison to GhostNet-YOLOv5, GAANet has higher overall mean average precision (mAP@50), recall, and precision, by around 2.5%, 2.3%, and 1.4%, respectively. The dataset and code for this paper are available as open source at https://github.com/ZeeshanKaleem/GhostAutoAnchorNet.
Submitted 5 May, 2023;
originally announced May 2023.
-
Do "bad" citations have "good" effects?
Authors:
Honglin Bao,
Misha Teplitskiy
Abstract:
The scientific community discourages authors of research papers from citing papers that did not influence them. Such "rhetorical" citations are assumed to degrade the literature and incentives for good work. While a world where authors cite only substantively appears attractive, we argue that mandating substantive citing may have underappreciated consequences on the allocation of attention and dynamism in scientific literatures. We develop a novel agent-based model in which agents cite substantively and rhetorically. Agents first select papers to read based on their expected quality, read them and observe their actual quality, become influenced by those that are sufficiently good, and substantively cite them. Next, agents fill any remaining slots in the reference lists by (rhetorically) citing papers that support their narrative, regardless of whether they were actually influential. By turning rhetorical citing on-and-off, we find that rhetorical citing increases the correlation between quality and citations, increases citation churn, and reduces citation inequality. This occurs because rhetorical citing redistributes some citations from a stable set of elite-quality papers to a more dynamic set with high-to-moderate quality and high rhetorical value. Increasing the size of reference lists, often seen as an undesirable trend, amplifies the effects. In sum, rhetorical citing helps deconcentrate attention and makes it easier to displace incumbent ideas, so whether it is indeed undesirable depends on the metrics used to judge desirability.
Submitted 16 April, 2023; v1 submitted 12 April, 2023;
originally announced April 2023.
-
LMExplainer: a Knowledge-Enhanced Explainer for Language Models
Authors:
Zichen Chen,
Ambuj K Singh,
Misha Sra
Abstract:
Large language models (LLMs) such as GPT-4 are very powerful and can process different kinds of natural language processing (NLP) tasks. However, it can be difficult to interpret the results due to the multi-layer nonlinear model structure and millions of parameters. A lack of clarity and understanding of how the language models (LMs) work can make them unreliable, difficult to trust, and potentially dangerous for use in real-world scenarios. Most recent works exploit attention weights to provide explanations for LM predictions. However, pure attention-based explanations are unable to support the growing complexity of LMs, and cannot reason about their decision-making processes. We propose LMExplainer, a knowledge-enhanced explainer for LMs that can provide human-understandable explanations. We use a knowledge graph (KG) and a graph attention neural network to extract the key decision signals of the LM. We further explore whether interpretation can also help the AI understand the task better. Our experimental results show that LMExplainer outperforms existing LM+KG methods on CommonsenseQA and OpenBookQA. We compare the explanation results with generated explanation methods and human-annotated results. The comparison shows our method can provide more comprehensive and clearer explanations. LMExplainer demonstrates the potential to enhance model performance and furnish explanations for the LM reasoning process in natural language.
Submitted 3 August, 2023; v1 submitted 29 March, 2023;
originally announced March 2023.
-
Vision-Language Models as Success Detectors
Authors:
Yuqing Du,
Ksenia Konyushkova,
Misha Denil,
Akhil Raju,
Jessica Landon,
Felix Hill,
Nando de Freitas,
Serkan Cabi
Abstract:
Detecting successful behaviour is crucial for training intelligent agents. As such, generalisable reward models are a prerequisite for agents that can learn to generalise their behaviour. In this work we focus on developing robust success detectors that leverage large, pretrained vision-language models (Flamingo, Alayrac et al. (2022)) and human reward annotations. Concretely, we treat success detection as a visual question answering (VQA) problem, denoted SuccessVQA. We study success detection across three vastly different domains: (i) interactive language-conditioned agents in a simulated household, (ii) real world robotic manipulation, and (iii) "in-the-wild" human egocentric videos. We investigate the generalisation properties of a Flamingo-based success detection model across unseen language and visual changes in the first two domains, and find that the proposed method is able to outperform bespoke reward models in out-of-distribution test scenarios with either variation. In the last domain of "in-the-wild" human videos, we show that success detection on unseen real videos presents an even more challenging generalisation task warranting future work. We hope our initial results encourage further work in real world success detection and reward modelling.
Submitted 13 March, 2023;
originally announced March 2023.
-
Toward A Dynamic Comfort Model for Human-Building Interaction in Grid-Interactive Efficient Buildings: Supported by Field Data
Authors:
SungKu Kang,
Kunind Sharma,
Maharshi Pathak,
Emily Casavant,
Katherine Bassett,
Misha Pavel,
David Fannon,
Michael Kane
Abstract:
Controlling building electric loads could alleviate the increasing grid strain caused by the adoption of renewables and electrification. However, current approaches that automatically set back thermostats on the hottest day compromise their efficacy by neglecting human-building interaction (HBI). This study aims to define challenges and opportunities for developing engineering models of HBI to be used in the design of controls for grid-interactive efficient buildings (GEBs). Building system data and measured and just-in-time surveyed psychophysiological data were collected from 41 participants in 20 homes from April to September. ASHRAE Standard 55 thermal comfort models for building design were evaluated with these data. Increased error bias was observed with increasing spatiotemporal temperature variations. This is unsurprising, since these models neglect such variance, but it calls into question their suitability for GEBs controlling thermostat setpoints, particularly given the observed 4°F intra-home spatial temperature variation. The results highlight opportunities for reducing these biases in GEBs through a paradigm shift to modeling discomfort instead of comfort, increasing use of low-cost sensors, and models that account for the observed dynamic occupant behavior: of the thermostat setpoint overrides made within 140 minutes of a previous setpoint change, 95% of small changes ( 2°F) were made within 120 minutes, while 95% of larger changes ( 10°F) were made within only 70 minutes.
△ Less
Submitted 10 March, 2023;
originally announced March 2023.
-
SPOTR: Spatio-temporal Pose Transformers for Human Motion Prediction
Authors:
Avinash Ajit Nargund,
Misha Sra
Abstract:
3D human motion prediction is a research area of high significance and a challenge in computer vision. It is useful for the design of many applications including robotics and autonomous driving. Traditionally, autoregressive models have been used to predict human motion. However, these models have high computation needs and error accumulation that make it difficult to use them for real-time applications. In this paper, we present a non-autoregressive model for human motion prediction. We focus on learning spatio-temporal representations non-autoregressively for generation of plausible future motions. We propose a novel architecture that leverages the recently proposed Transformers. Human motion involves complex spatio-temporal dynamics with joints affecting the position and rotation of each other even though they are not connected directly. The proposed model extracts these dynamics using both convolutions and the self-attention mechanism. Using specialized spatial and temporal self-attention to augment the features extracted through convolution allows our model to generate spatio-temporally coherent predictions in parallel independent of the activity. Our contributions are threefold: (i) we frame human motion prediction as a sequence-to-sequence problem and propose a non-autoregressive Transformer to forecast a sequence of poses in parallel; (ii) our method is activity agnostic; (iii) we show that despite its simplicity, our approach is able to make accurate predictions, achieving better or comparable results compared to the state-of-the-art on two public datasets, with far fewer parameters and much faster inference.
Submitted 10 March, 2023;
originally announced March 2023.
-
Do Bayesian Variational Autoencoders Know What They Don't Know?
Authors:
Misha Glazunov,
Apostolis Zarras
Abstract:
The problem of detecting the Out-of-Distribution (OoD) inputs is of paramount importance for Deep Neural Networks. It has been previously shown that even Deep Generative Models that allow estimating the density of the inputs may not be reliable and often tend to make over-confident predictions for OoDs, assigning to them a higher density than to the in-distribution data. This over-confidence in a single model can be potentially mitigated with Bayesian inference over the model parameters that take into account epistemic uncertainty. This paper investigates three approaches to Bayesian inference: stochastic gradient Markov chain Monte Carlo, Bayes by Backpropagation, and Stochastic Weight Averaging-Gaussian. The inference is implemented over the weights of the deep neural networks that parameterize the likelihood of the Variational Autoencoder. We empirically evaluate the approaches against several benchmarks that are often used for OoD detection: estimation of the marginal likelihood utilizing sampled model ensemble, typicality test, disagreement score, and Watanabe-Akaike Information Criterion. Finally, we introduce two simple scores that demonstrate the state-of-the-art performance.
Submitted 29 December, 2022;
originally announced December 2022.
-
SafeSpace MFNet: Precise and Efficient MultiFeature Drone Detection Network
Authors:
Misha Urooj Khan,
Mahnoor Dil,
Muhammad Zeshan Alam,
Farooq Alam Orakazi,
Abdullah M. Almasoud,
Zeeshan Kaleem,
Chau Yuen
Abstract:
The increasing prevalence of unmanned aerial vehicles (UAVs), commonly known as drones, has generated a demand for reliable detection systems. The inappropriate use of drones presents potential security and privacy hazards, particularly concerning sensitive facilities. To overcome those obstacles, we propose the concept of MultiFeatureNet (MFNet), a solution that enhances feature representation by capturing the most concentrated feature maps. Additionally, we present MultiFeatureNet-Feature Attention (MFNet-FA), a technique that adaptively weights different channels of the input feature maps. To meet the requirements of multi-scale detection, we present three versions of MFNet and MFNet-FA, namely small (S), medium (M), and large (L). The outcomes reveal notable performance enhancements. For optimal bird detection, MFNet-M (Ablation study 2) achieves an impressive precision of 99.8%, while for UAV detection, MFNet-L (Ablation study 2) achieves a precision score of 97.2%. Among the options, MFNet-FA-S (Ablation study 3) emerges as the most resource-efficient alternative, considering its small feature map size, computational demands (GFLOPs), and operational efficiency (in frames per second). This makes it particularly suitable for deployment on hardware with limited capabilities. Additionally, MFNet-FA-S (Ablation study 3) stands out for its swift real-time inference and multiple-object detection due to the incorporation of the FA module. The proposed MFNet-L with the focus module (Ablation study 2) demonstrates the most remarkable classification outcomes, boasting an average precision of 98.4%, average recall of 96.6%, average mean average precision (mAP) of 98.3%, and average intersection over union (IoU) of 72.8%. To encourage reproducible research, the dataset and code for MFNet are freely available as an open-source project: github.com/ZeeshanKaleem/MultiFeatureNet.
Submitted 6 October, 2023; v1 submitted 30 November, 2022;
originally announced November 2022.
-
TF-Net: Deep Learning Empowered Tiny Feature Network for Night-time UAV Detection
Authors:
Maham Misbah,
Misha Urooj Khan,
Zhaohui Yang,
Zeeshan Kaleem
Abstract:
Technological advancements have normalized the usage of unmanned aerial vehicles (UAVs) in every sector, spanning from military to commercial, but they also pose serious security concerns due to their enhanced functionalities and easy access to private and highly secured areas. Several instances related to UAVs have raised security concerns, leading to UAV detection research studies. Visual techniques are widely adopted for UAV detection, but they perform poorly at night, in complex backgrounds, and in adverse weather conditions. Therefore, a robust night vision-based drone detection system is required that could efficiently tackle this problem. Infrared cameras are increasingly used for nighttime surveillance due to their wide applications in night vision equipment. This paper uses a deep learning-based TinyFeatureNet (TF-Net), which is an improved version of YOLOv5s, to accurately detect UAVs during the night using infrared (IR) images. In the proposed TF-Net, we introduce architectural changes in the neck and backbone of the YOLOv5s. We also simulated four different YOLOv5 models (s,m,n,l) and the proposed TF-Net for a fair comparison. The results showed better performance for the proposed TF-Net in terms of precision, IoU, GFLOPS, model size, and FPS compared to the YOLOv5s. TF-Net yielded the best results with 95.7% precision, 84% mAP, and 44.8% IoU.
Submitted 29 November, 2022;
originally announced November 2022.
-
Elementary Bitcoin economics: from production and transaction demand to values
Authors:
Misha Perepelitsa
Abstract:
In this paper we give an elementary analysis of the economics of Bitcoin that combines the transaction demand by the consumers and the supply of hashrate by miners. We argue that the decreasing block reward will have no significant effect on the exchange rate (price) of Bitcoin and thus the network will be transitioning to a regime where transaction fees will play a bigger part of miners' revenue. We consider a simple model where consumers demand bitcoins for transactions, but not for hoarding bitcoins, and we analyze market equilibrium where the demand is matched with the hashrate supplied by miners. Our main conclusion is that the exchange rate of Bitcoin cannot be determined from the market equilibrium and so our arguments support the hypothesis that Bitcoin price has no economic fundamentals and is free to fluctuate according to the present demand for hoarding and speculation. We point out that increasing fees bear the risk of Bitcoin being outcompeted by its main rival Ethereum, and that decreasing revenues to miners depreciate the perception of Bitcoin as a store of value (hoarding demand), which will affect its exchange rate.
Submitted 13 November, 2022;
originally announced November 2022.
-
CardsVR: A Two-Person VR Experience with Passive Haptic Feedback from a Deck of Playing Cards
Authors:
Andrew Huard,
Mengyu Chen,
Misha Sra
Abstract:
Presence in virtual reality (VR) is meaningful for remotely connecting with others and facilitating social interactions despite great distance while providing a sense of "being there." This work presents CardsVR, a two-person VR experience that allows remote participants to play a game of cards together. An entire deck of tracked cards is used to recreate the sense of playing cards in person. Prior work in VR commonly provides passive haptic feedback either through a single object or through static objects in the environment. CardsVR is novel in providing passive haptic feedback through multiple cards that are individually tracked and represented in the virtual environment. Participants interact with the physical cards by picking them up, holding them, playing them, or moving them on the physical table. Our participant study (N=23) shows that passive haptic feedback provides significant improvement in three standard measures of presence: Possibility to Act, Realism, and Haptics.
Submitted 30 October, 2022;
originally announced October 2022.
-
Integration of Riemannian Motion Policy with Whole-Body Control for Collision-Free Legged Locomotion
Authors:
Daniel Marew,
Misha Lvovsky,
Shangqun Yu,
Shotaro Sessions,
Donghyun Kim
Abstract:
In this paper, we present a Riemannian Motion Policy (RMP)flow-based whole-body control framework for improved dynamic legged locomotion. RMPflow is a differential geometry-inspired algorithm for fusing multiple task-space policies (RMPs) into a configuration space policy in a geometrically consistent manner. RMP-based approaches are especially suited for designing simultaneous tracking and collision avoidance behaviors and have been successfully deployed on serial manipulators. However, one caveat of RMPflow is that it is designed with fully actuated systems in mind. In this work, we, for the first time, extend it to the domain of dynamic-legged systems, which have unforgiving under-actuation and limited control input. Thorough push recovery experiments are conducted in simulation to validate the overall framework. We show that expanding the valid stepping region with an RMP-based collision-avoidance swing leg controller improves balance robustness against external disturbances by up to 53% compared to a baseline approach using a restricted stepping region. Furthermore, a point-foot biped robot is purpose-built for experimental studies of dynamic biped locomotion. A preliminary unassisted in-place stepping experiment is conducted to show the viability of the control framework and hardware.
Submitted 6 November, 2023; v1 submitted 7 October, 2022;
originally announced October 2022.
-
Calibrating Sequence likelihood Improves Conditional Language Generation
Authors:
Yao Zhao,
Misha Khalman,
Rishabh Joshi,
Shashi Narayan,
Mohammad Saleh,
Peter J. Liu
Abstract:
Conditional language models are predominantly trained with maximum likelihood estimation (MLE), giving probability mass to sparsely observed target sequences. While MLE-trained models assign high probability to plausible sequences given the context, the model probabilities often do not accurately rank-order generated sequences by quality. This has been empirically observed in beam search decoding as output quality degrading with large beam sizes, and decoding strategies benefiting from heuristics such as length normalization and repetition-blocking. In this work, we introduce sequence likelihood calibration (SLiC) where the likelihood of model-generated sequences is calibrated to better align with reference sequences in the model's latent space. With SLiC, decoding heuristics become unnecessary and decoding candidates' quality significantly improves regardless of the decoding method. Furthermore, SLiC shows no sign of diminishing returns with model scale, and presents alternative ways to improve quality with limited training and inference budgets. With SLiC, we exceed or match SOTA results on a wide range of generation tasks spanning abstractive summarization, question generation, abstractive question answering and data-to-text generation, even with modest-sized models.
Submitted 30 September, 2022;
originally announced October 2022.
-
BayesLDM: A Domain-Specific Language for Probabilistic Modeling of Longitudinal Data
Authors:
Karine Tung,
Steven De La Torre,
Mohamed El Mistiri,
Rebecca Braga De Braganca,
Eric Hekler,
Misha Pavel,
Daniel Rivera,
Pedja Klasnja,
Donna Spruijt-Metz,
Benjamin M. Marlin
Abstract:
In this paper we present BayesLDM, a system for Bayesian longitudinal data modeling consisting of a high-level modeling language with specific features for modeling complex multivariate time series data coupled with a compiler that can produce optimized probabilistic program code for performing inference in the specified model. BayesLDM supports modeling of Bayesian network models with a specific focus on the efficient, declarative specification of dynamic Bayesian Networks (DBNs). The BayesLDM compiler combines a model specification with inspection of available data and outputs code for performing Bayesian inference for unknown model parameters while simultaneously handling missing data. These capabilities have the potential to significantly accelerate iterative modeling workflows in domains that involve the analysis of complex longitudinal data by abstracting away the process of producing computationally efficient probabilistic inference code. We describe the BayesLDM system components, evaluate the efficiency of representation and inference optimizations and provide an illustrative example of the application of the system to analyzing heterogeneous and partially observed mobile health data.
Submitted 12 September, 2022;
originally announced September 2022.
-
Intentional and serendipitous diffusion of ideas: Evidence from academic conferences
Authors:
Misha Teplitskiy,
Soya Park,
Neil Thompson,
David Karger
Abstract:
This paper investigates the effects of seeing ideas presented in-person when they are easily accessible online. Presentations may increase the diffusion of ideas intentionally (when one attends the presentation of an idea of interest) and serendipitously (when one sees other ideas presented in the same session). We measure these effects in the context of 25 computer science conferences using data from the scheduling application Confer, which lets users browse papers, Like those of interest, and receive schedules of their presentations. We address endogeneity concerns in presentation attendance by exploiting scheduling conflicts: when a user Likes multiple papers that are presented at the same time, she cannot see them both, potentially affecting their diffusion. Estimates show that being able to see presentations increases citing of Liked papers within two years by 1.5 percentage points (62.5% boost over the baseline citation rate). Attention to Liked papers also spills over to non-Liked papers in the same session, increasing their citing by 0.5 percentage points (125% boost), and this serendipitous diffusion represents 30.5% of the total effect. Both diffusion types were concentrated among papers semantically close to an attendee's prior work, suggesting that there are inefficiencies in finding related research that conferences help overcome. Overall, even when ideas are easily accessible online, in-person presentations substantially increase diffusion, much of it serendipitous.
Submitted 19 January, 2024; v1 submitted 2 September, 2022;
originally announced September 2022.
-
Improved Pancreatic Tumor Detection by Utilizing Clinically-Relevant Secondary Features
Authors:
Christiaan G. A. Viviers,
Mark Ramaekers,
Peter H. N. de With,
Dimitrios Mavroeidis,
Joost Nederend,
Misha Luyer,
Fons van der Sommen
Abstract:
Pancreatic cancer is one of the global leading causes of cancer-related deaths. Despite the success of Deep Learning in computer-aided diagnosis and detection (CAD) methods, little attention has been paid to the detection of Pancreatic Cancer. We propose a method for detecting pancreatic tumors that utilizes clinically-relevant features in the surrounding anatomical structures, thereby aiming to better exploit the radiologist's knowledge compared to other, conventional deep learning approaches. To this end, we collect a new dataset consisting of 99 cases with pancreatic ductal adenocarcinoma (PDAC) and 97 control cases without any pancreatic tumor. Due to the growth pattern of pancreatic cancer, the tumor may not always be visible as a hypodense lesion; experts therefore refer to the visibility of secondary external features that may indicate the presence of the tumor. We propose a method based on a U-Net-like Deep CNN that exploits the following external secondary features: the pancreatic duct, common bile duct and the pancreas, along with a processed CT scan. Using these features, the model segments the pancreatic tumor if it is present. This segmentation for classification and localization approach achieves a performance of 99% sensitivity (one case missed) and 99% specificity, which realizes a 5% increase in sensitivity over the previous state-of-the-art method. The model additionally provides location information with reasonable accuracy and a shorter inference time compared to previous PDAC detection methods. These results offer a significant performance improvement and highlight the importance of incorporating the knowledge of the clinical expert when developing novel CAD methods.
Submitted 6 August, 2022;
originally announced August 2022.
-
Adaptive Virtual Neuroarchitecture
Authors:
Abhinandan Jain,
Pattie Maes,
Misha Sra
Abstract:
Our surrounding environment impacts our cognitive-emotional processes on a daily basis and shapes our physical, psychological and social wellbeing. Although the effects of the built environment on our psycho-physiological processes are well studied, virtual environment design with a potentially similar impact on the user, has received limited attention. Based on the influence of space design on a user and combining that with the dynamic affordances of virtual spaces, we present the idea of adaptive virtual neuroarchitecture (AVN), where virtual environments respond to the user and the user's real world context while simultaneously influencing them both in realtime. To show how AVN has been explored in current research, we present a sampling of recent work that demonstrates reciprocal relationships using physical affordances (space, objects), the user's state (physiological, cognitive, emotional), and the virtual world used in the design of novel virtual reality experiences. We believe AVN has the potential to help us learn how to design spaces and environments that can enhance the wellbeing of their inhabitants.
Submitted 10 July, 2022;
originally announced July 2022.
-
List-Decodable Covariance Estimation
Authors:
Misha Ivkov,
Pravesh K. Kothari
Abstract:
We give the first polynomial time algorithm for \emph{list-decodable covariance estimation}. For any $α> 0$, our algorithm takes as input a sample $Y \subseteq \mathbb{R}^d$ of size $n\geq d^{\mathsf{poly}(1/α)}$ obtained by adversarially corrupting $(1-α)n$ points in an i.i.d. sample $X$ of size $n$ from the Gaussian distribution with unknown mean $μ_*$ and covariance $Σ_*$. In $n^{\mathsf{poly}(1/α)}$ time, it outputs a constant-size list of $k = k(α)= (1/α)^{\mathsf{poly}(1/α)}$ candidate parameters that, with high probability, contains a $(\hatμ,\hatΣ)$ such that the total variation distance $TV(\mathcal{N}(μ_*,Σ_*),\mathcal{N}(\hatμ,\hatΣ))<1-O_α(1)$. This is the statistically strongest notion of distance and implies multiplicative spectral and relative Frobenius distance approximation for parameters with dimension independent error. Our algorithm works more generally for $(1-α)$-corruptions of any distribution $D$ that possesses low-degree sum-of-squares certificates of two natural analytic properties: 1) anti-concentration of one-dimensional marginals and 2) hypercontractivity of degree 2 polynomials.
Prior to our work, the only known results for estimating covariance in the list-decodable setting were for the special cases of list-decodable linear regression and subspace recovery due to Karmarkar, Klivans, and Kothari (2019), Raghavendra and Yau (2019 and 2020) and Bakshi and Kothari (2020). These results need superpolynomial time for obtaining any subconstant error in the underlying dimension. Our result implies the first polynomial-time \emph{exact} algorithm for list-decodable linear regression and subspace recovery that allows, in particular, to obtain $2^{-\mathsf{poly}(d)}$ error in polynomial-time. Our result also implies an improved algorithm for clustering non-spherical mixtures.
Submitted 22 June, 2022;
originally announced June 2022.
-
The Gender Gap in Scholarly Self-Promotion on Social Media
Authors:
Hao Peng,
Misha Teplitskiy,
Daniel M. Romero,
Emőke-Ágnes Horvát
Abstract:
Self-promotion in science is ubiquitous but may not be exercised equally by men and women. Research on self-promotion in other domains suggests that, due to bias in self-assessment and adverse reactions to non-gender-conforming behaviors ("pushback"), women tend to self-promote less often than men. We test whether this pattern extends to scholars by examining self-promotion over six years using 23M Tweets about 2.8M research papers by 3.5M authors. Overall, women are about 28% less likely than men to self-promote their papers even after accounting for important confounds, and this gap has grown over time. Moreover, differential adoption of Twitter does not explain the gender gap, which is large even in relatively gender-balanced broad research areas, where bias in self-assessment and pushback are expected to be smaller. Further, the gap increases with higher performance and status, being most pronounced for productive women from top-ranked institutions who publish in high-impact journals. Critically, we find differential returns with respect to gender: while self-promotion is associated with increased tweets of papers, the increase is smaller for women than for men. Our findings suggest that self-promotion varies meaningfully by gender and help explain gender differences in the visibility of scientific ideas.
Submitted 10 October, 2023; v1 submitted 10 June, 2022;
originally announced June 2022.
-
The CLRS Algorithmic Reasoning Benchmark
Authors:
Petar Veličković,
Adrià Puigdomènech Badia,
David Budden,
Razvan Pascanu,
Andrea Banino,
Misha Dashevskiy,
Raia Hadsell,
Charles Blundell
Abstract:
Learning representations of algorithms is an emerging area of machine learning, seeking to bridge concepts from neural networks with classical algorithms. Several important works have investigated whether neural networks can effectively reason like algorithms, typically by learning to execute them. The common trend in the area, however, is to generate targeted kinds of algorithmic data to evaluate specific hypotheses, making results hard to transfer across publications, and increasing the barrier of entry. To consolidate progress and work towards unified evaluation, we propose the CLRS Algorithmic Reasoning Benchmark, covering classical algorithms from the Introduction to Algorithms textbook. Our benchmark spans a variety of algorithmic reasoning procedures, including sorting, searching, dynamic programming, graph algorithms, string algorithms and geometric algorithms. We perform extensive experiments to demonstrate how several popular algorithmic reasoning baselines perform on these tasks, and consequently, highlight links to several open challenges. Our library is readily available at https://github.com/deepmind/clrs.
Submitted 4 June, 2022; v1 submitted 31 May, 2022;
originally announced May 2022.
-
Parametric Level-sets Enhanced To Improve Reconstruction (PaLEnTIR)
Authors:
Ege Ozsar,
Misha Kilmer,
Eric Miller,
Eric de Sturler,
Arvind Saibaba
Abstract:
We introduce PaLEnTIR, a significantly enhanced parametric level-set (PaLS) method addressing the restoration and reconstruction of piecewise constant objects. Our key contribution involves a unique PaLS formulation utilizing a single level-set function to restore scenes containing multi-contrast piecewise-constant objects without requiring knowledge of the number of objects or their contrasts. Unlike standard PaLS methods employing radial basis functions (RBFs), our model integrates anisotropic basis functions (ABFs), thereby expanding its capacity to represent a wider class of shapes. Furthermore, PaLEnTIR improves the conditioning of the Jacobian matrix, required as part of the parameter identification process, and consequently accelerates optimization methods. We validate PaLEnTIR's efficacy through diverse experiments encompassing sparse and limited angle of view X-ray computed tomography (2D and 3D), nonlinear diffuse optical tomography (DOT), denoising, and deconvolution tasks using both real and simulated data sets.
Submitted 13 February, 2024; v1 submitted 20 April, 2022;
originally announced April 2022.
-
MASSIVE: A 1M-Example Multilingual Natural Language Understanding Dataset with 51 Typologically-Diverse Languages
Authors:
Jack FitzGerald,
Christopher Hench,
Charith Peris,
Scott Mackie,
Kay Rottmann,
Ana Sanchez,
Aaron Nash,
Liam Urbach,
Vishesh Kakarala,
Richa Singh,
Swetha Ranganath,
Laurie Crist,
Misha Britan,
Wouter Leeuwis,
Gokhan Tur,
Prem Natarajan
Abstract:
We present the MASSIVE dataset--Multilingual Amazon SLU resource package (SLURP) for Slot-filling, Intent classification, and Virtual assistant Evaluation. MASSIVE contains 1M realistic, parallel, labeled virtual assistant utterances spanning 51 languages, 18 domains, 60 intents, and 55 slots. MASSIVE was created by tasking professional translators to localize the English-only SLURP dataset into 50 typologically diverse languages from 29 genera. We also present modeling results on XLM-R and mT5, including exact match accuracy, intent classification accuracy, and slot-filling F1 score. We have released our dataset, modeling code, and models publicly.
Submitted 17 June, 2022; v1 submitted 18 April, 2022;
originally announced April 2022.
-
Best Practices and Scoring System on Reviewing A.I. based Medical Imaging Papers: Part 1 Classification
Authors:
Timothy L. Kline,
Felipe Kitamura,
Ian Pan,
Amine M. Korchi,
Neil Tenenholtz,
Linda Moy,
Judy Wawira Gichoya,
Igor Santos,
Steven Blumer,
Misha Ysabel Hwang,
Kim-Ann Git,
Abishek Shroff,
Elad Walach,
George Shih,
Steve Langer
Abstract:
With the recent advances in A.I. methodologies and their application to medical imaging, there has been an explosion of related research programs utilizing these techniques to produce state-of-the-art classification performance. Ultimately, these research programs culminate in submission of their work for consideration in peer-reviewed journals. To date, the criteria for acceptance vs. rejection are often subjective; however, reproducible science requires reproducible review. The Machine Learning Education Sub-Committee of SIIM has identified a knowledge gap and a serious need to establish guidelines for reviewing these studies. Although there have been several recent papers with this goal, this present work is written from the machine learning practitioner's standpoint. In this series, the committee will address the best practices to be followed in an A.I.-based study and present the required sections in terms of examples and discussion of what should be included to make the studies cohesive, reproducible, accurate, and self-contained. This first entry in the series focuses on the task of image classification. Elements such as dataset curation, data pre-processing steps, defining an appropriate reference standard, data partitioning, model architecture and training are discussed. The sections are presented as they would be detailed in a typical manuscript, with content describing the necessary information that should be included to make sure the study is of sufficient quality to be considered for publication. The goal of this series is to provide resources to not only help improve the review process for A.I.-based medical imaging papers, but to facilitate a standard for the information that is presented within all components of the research study. We hope to provide quantitative metrics in what otherwise may be a qualitative review process.
Submitted 3 February, 2022;
originally announced February 2022.