-
AI-Assisted Generation of Difficult Math Questions
Authors:
Vedant Shah,
Dingli Yu,
Kaifeng Lyu,
Simon Park,
Nan Rosemary Ke,
Michael Mozer,
Yoshua Bengio,
Sanjeev Arora,
Anirudh Goyal
Abstract:
Current LLM training positions mathematical reasoning as a core capability. With publicly available sources fully tapped, there is unmet demand for diverse and challenging math questions. Relying solely on human experts is both time-consuming and costly, while LLM-generated questions often lack the requisite diversity and difficulty. We present a design framework that combines the strengths of LLMs with a human-in-the-loop approach to generate a diverse array of challenging math questions. We leverage LLM metacognition skills [Didolkar et al., 2024] of a strong LLM to extract core "skills" from existing math datasets. These skills serve as the basis for generating novel and difficult questions by prompting the LLM with random pairs of core skills. The use of two different skills within each question makes finding such questions an "out of distribution" task for both LLMs and humans. Our pipeline employs LLMs to iteratively generate and refine questions and solutions through multiturn prompting. Human annotators then verify and further refine the questions, with their efficiency enhanced via further LLM interactions. Applying this pipeline on skills extracted from the MATH dataset [Hendrycks et al., 2021] resulted in MATH$^2$, a dataset of higher-quality math questions, as evidenced by: (a) lower performance of all models on MATH$^2$ than on MATH, and (b) higher performance on MATH when using MATH$^2$ questions as in-context examples. Although focused on mathematics, our methodology seems applicable to other domains requiring structured reasoning, and potentially as a component of scalable oversight. Also of interest is a striking relationship observed between models' performance on the new dataset: the success rate on MATH$^2$ is the square of the success rate on MATH, suggesting that successfully solving a question in MATH$^2$ requires a nontrivial combination of two distinct math skills.
Submitted 30 July, 2024;
originally announced July 2024.
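The skill-pair idea and the squared relationship described in the abstract above can be illustrated with a small sketch. This is not the authors' pipeline: the helper names `sample_skill_pairs` and `predicted_math2_success`, the skill labels, and the prompt wording are all hypothetical, and the squaring simply restates the relationship reported in the abstract.

```python
import itertools
import random


def sample_skill_pairs(skills, num_pairs, seed=0):
    """Draw random pairs of two distinct skills (hypothetical helper)."""
    rng = random.Random(seed)
    all_pairs = list(itertools.combinations(sorted(set(skills)), 2))
    return rng.sample(all_pairs, k=min(num_pairs, len(all_pairs)))


def predicted_math2_success(math_success_rate):
    """Relationship reported in the abstract: MATH^2 rate ~ (MATH rate) squared."""
    return math_success_rate ** 2


# Illustrative skill labels only; the real labels come from the MATH dataset.
skills = ["modular_arithmetic", "combinatorics", "geometry", "probability"]
pairs = sample_skill_pairs(skills, num_pairs=3)
for first, second in pairs:
    print(f"Write a hard question that needs both {first} and {second}.")

# A model at 80% on MATH would be expected to land near 64% on MATH^2.
print(round(predicted_math2_success(0.8), 2))
```

If the two skills in a question were solved independently, each with the model's MATH success rate, squaring is the result one would expect, which matches the interpretation the abstract suggests.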
-
Controllability problems of a neutral integro-differential equation with memory
Authors:
Sumit Arora,
Akambadath Nandakumaran
Abstract:
The current study addresses the control problems posed by a semilinear neutral integro-differential equation with memory. The primary objectives of this study are to investigate the existence of a mild solution and approximate controllability of both linear and semilinear control systems in Banach spaces. To accomplish this, we begin by introducing the concept of a resolvent family associated with the homogeneous neutral integro-differential equation without memory. In the process, we establish some important properties of the resolvent family. Subsequently, we develop approximate controllability results for a linear control problem by constructing a linear-quadratic regulator problem. This involves establishing the existence of an optimal pair and determining the expression of the optimal control that produces the approximate controllability of the linear system. Furthermore, we deduce sufficient conditions for the existence of a mild solution and approximate controllability of a semilinear system in a reflexive Banach space with a uniformly convex dual. Additionally, we delve into the discussion of approximate controllability for a semilinear problem in general Banach space, assuming a Lipschitz type condition on the nonlinear term. Finally, we implement our findings to examine the approximate controllability of certain partial differential equations, demonstrating their practical relevance.
Submitted 11 July, 2024; v1 submitted 10 July, 2024;
originally announced July 2024.
-
Continuous variable quantum teleportation using photon subtracted and photon added two mode squeezed coherent state
Authors:
Shikhar Arora,
Chandan Kumar,
Arvind
Abstract:
We consider non-Gaussian states generated by photon subtraction (PS) and photon addition (PA) on two-mode squeezed coherent (TMSC) states, as resource states for continuous variable (CV) quantum teleportation (QT). To this end, we derive the Wigner characteristic function for the family of photon subtracted and photon added TMSC states, which is then utilized to calculate the fidelity of teleporting a single mode coherent state and a squeezed vacuum state. The analysis shows that while symmetric PS enhances the fidelity of QT in an extensive range of squeezing, asymmetric PS enhances the performance marginally and only in the low squeezing regime. The addition operations, on the other hand, are less useful: symmetric three-PA leads to a marginal improvement, while the other addition operations are useless. We have considered the actual experimental setup for PS and PA operations and computed their success probabilities, which should be kept in mind while advocating the use of these operations. We could compute the fidelity of QT for a broad range of states because we analytically derived the Wigner characteristic function for this family of states, which we think will be useful for various other applications of these states.
Submitted 8 July, 2024;
originally announced July 2024.
-
Just read twice: closing the recall gap for recurrent language models
Authors:
Simran Arora,
Aman Timalsina,
Aaryan Singhal,
Benjamin Spector,
Sabri Eyuboglu,
Xinyi Zhao,
Ashish Rao,
Atri Rudra,
Christopher Ré
Abstract:
Recurrent large language models that compete with Transformers in language modeling perplexity are emerging at a rapid rate (e.g., Mamba, RWKV). Excitingly, these architectures use a constant amount of memory during inference. However, due to the limited memory, recurrent LMs cannot recall and use all the information in long contexts, leading to brittle in-context learning (ICL) quality. A key challenge for efficient LMs is selecting what information to store versus discard. In this work, we observe that the order in which information is shown to the LM impacts the selection difficulty. To formalize this, we show that the hardness of information recall reduces to the hardness of a problem called set disjointness (SD), a quintessential problem in communication complexity that requires a streaming algorithm (e.g., recurrent model) to decide whether the input sets are disjoint. We empirically and theoretically show that the recurrent memory required to solve SD changes with set order, i.e., whether the smaller set appears first in-context. Our analysis suggests that, to mitigate the reliance on data order, we can put information in the right order in-context or process prompts non-causally. Towards that end, we propose: (1) JRT-Prompt, where context gets repeated multiple times in the prompt, effectively showing the model all data orders. This gives $11.0 \pm 1.3$ points of improvement, averaged across $16$ recurrent LMs and the $6$ ICL tasks, with $11.9\times$ higher throughput than FlashAttention-2 for generation prefill (length $32$k, batch size $16$, NVIDIA H100). We then propose (2) JRT-RNN, which uses non-causal prefix-linear-attention to process prompts and provides $99\%$ of Transformer quality at $360$M parameters and $30$B tokens, and $96\%$ at $1.3$B parameters and $50$B tokens, on average across the tasks, with $19.2\times$ higher throughput for prefill than FA2.
Submitted 7 July, 2024;
originally announced July 2024.
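Two ideas from the abstract above lend themselves to a short sketch: why set order changes the memory a streaming algorithm needs for set disjointness, and what repeating the context in a prompt looks like. This is a toy illustration, not the paper's implementation. The function names `streaming_disjoint` and `jrt_prompt`, the item count used as a proxy for memory, and the blank-line prompt layout are all assumptions.

```python
def streaming_disjoint(first, second):
    """Decide set disjointness in one pass: store `first`, then scan `second`.

    Returns (is_disjoint, stored_items). `stored_items` is a crude stand-in
    for the memory a one-pass (recurrent) model would need.
    """
    memory = set(first)
    is_disjoint = all(item not in memory for item in second)
    return is_disjoint, len(memory)


def jrt_prompt(context, question, repeats=2):
    """Repeat the context before the question (assumed layout, blank-line separated)."""
    return "\n\n".join([context] * repeats + [question])


small, large = [1, 2], [3, 4, 5, 6]
# Seeing the smaller set first means fewer items have to be remembered.
print(streaming_disjoint(small, large))
print(streaming_disjoint(large, small))

print(jrt_prompt("Doc: the launch date is 2024-07-07.", "When is the launch?"))
```

Repeating the context means the second copy is read after the model has already seen everything once, which is the intuition behind "showing the model all data orders".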
-
LeDNet: Localization-enabled Deep Neural Network for Multi-Label Radiography Image Classification
Authors:
Lalit Pant,
Shubham Arora
Abstract:
Multi-label radiography image classification has long been a topic of interest in neural networks research. In this paper, we intend to classify such images using convolutional neural networks with novel localization techniques. We use chest X-ray images to detect thoracic diseases for this purpose. For accurate diagnosis, it is crucial to train the network with good quality images. But many chest X-ray images contain irrelevant external objects, such as distractions created by faulty scans, electronic devices scanned next to the lung region, and scans inadvertently capturing bodily air. To address these, we propose a combination of localization and deep learning algorithms called LeDNet to predict thoracic diseases with higher accuracy. We identify and extract the lung region masks from chest X-ray images through localization. These masks are superimposed on the original X-ray images to create the mask overlay images. DenseNet-121 classification models are then used for feature selection to retrieve features of the entire chest X-ray images and the localized mask overlay images. These features are then used to predict disease classification. Our experiments involve comparing classification results obtained with original CheXpert images and mask overlay images. The comparison is demonstrated through accuracy and loss curve analyses.
Submitted 4 July, 2024;
originally announced July 2024.
-
$f(Q,L_m)$ gravity, and its cosmological implications
Authors:
Ayush Hazarika,
Simran Arora,
P. K. Sahoo,
Tiberiu Harko
Abstract:
In the present work, we extend the $f(Q)$ symmetric teleparallel gravity by introducing an arbitrary coupling between the non-metricity $Q$ and matter Lagrangian $L_m$ in the Lagrangian density $f$ of the theory, which thus leads to the $f\left(Q,L_m\right)$ theory. This generalisation encompasses Coincident General Relativity (CGR) and the Symmetric Teleparallel Equivalent to GR (STEGR). Using the metric formalism, we derive the field equation of the theory, which generalizes the field equations of $f(Q)$ gravity. From the study of the covariant divergence of the field equations, it follows that the presence of the geometry-matter coupling leads to the non-conservation of the matter energy-momentum tensor. The cosmological implications of the theory are investigated in the case of a flat, homogeneous, and isotropic Friedmann-Lemaitre-Robertson-Walker geometry. As a first step in this direction, we obtain the modified Friedmann equations for the $f(Q,L_m)$ gravity in a general form. Specific cosmological models are investigated for several choices of $f(Q,L_m)$, including $f(Q,L_m)=-αQ + 2L_m + β$ and $f(Q,L_m)=-αQ + (2L_m)^2 + β$. Comparative analyses with the standard $Λ$CDM paradigm are carried out, and the observational implications of the models are investigated in detail.
Submitted 1 July, 2024;
originally announced July 2024.
-
CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs
Authors:
Zirui Wang,
Mengzhou Xia,
Luxi He,
Howard Chen,
Yitao Liu,
Richard Zhu,
Kaiqu Liang,
Xindi Wu,
Haotian Liu,
Sadhika Malladi,
Alexis Chevalier,
Sanjeev Arora,
Danqi Chen
Abstract:
Chart understanding plays a pivotal role when applying Multimodal Large Language Models (MLLMs) to real-world tasks such as analyzing scientific papers or financial reports. However, existing datasets often focus on oversimplified and homogeneous charts with template-based questions, leading to an over-optimistic measure of progress. We demonstrate that although open-source models can appear to outperform strong proprietary models on these benchmarks, a simple stress test with slightly different charts or questions can deteriorate performance by up to 34.5%. In this work, we propose CharXiv, a comprehensive evaluation suite involving 2,323 natural, challenging, and diverse charts from arXiv papers. CharXiv includes two types of questions: 1) descriptive questions about examining basic chart elements and 2) reasoning questions that require synthesizing information across complex visual elements in the chart. To ensure quality, all charts and questions are handpicked, curated, and verified by human experts. Our results reveal a substantial, previously underestimated gap between the reasoning skills of the strongest proprietary model (i.e., GPT-4o), which achieves 47.1% accuracy, and the strongest open-source model (i.e., InternVL Chat V1.5), which achieves 29.2%. All models lag far behind human performance of 80.5%, underscoring weaknesses in the chart understanding capabilities of existing MLLMs. We hope CharXiv facilitates future research on MLLM chart understanding by providing a more realistic and faithful measure of progress. Project page and leaderboard: https://charxiv.github.io/
Submitted 26 June, 2024;
originally announced June 2024.
-
CaLMQA: Exploring culturally specific long-form question answering across 23 languages
Authors:
Shane Arora,
Marzena Karpinska,
Hung-Ting Chen,
Ipsita Bhattacharjee,
Mohit Iyyer,
Eunsol Choi
Abstract:
Large language models (LLMs) are used for long-form question answering (LFQA), which requires them to generate paragraph-length answers to complex questions. While LFQA has been well-studied in English, this research has not been extended to other languages. To bridge this gap, we introduce CaLMQA, a collection of 1.5K complex culturally specific questions spanning 23 languages and 51 culturally agnostic questions translated from English into 22 other languages. We define culturally specific questions as those uniquely or more likely to be asked by people from cultures associated with the question's language. We collect naturally-occurring questions from community web forums and hire native speakers to write questions to cover under-resourced, rarely-studied languages such as Fijian and Kirundi. Our dataset contains diverse, complex questions that reflect cultural topics (e.g. traditions, laws, news) and the language usage of native speakers. We automatically evaluate a suite of open- and closed-source models on CaLMQA by detecting incorrect language and token repetitions in answers, and observe that the quality of LLM-generated answers degrades significantly for some low-resource languages. Lastly, we perform human evaluation on a subset of models and languages. Manual evaluation reveals that model performance is significantly worse for culturally specific questions than for culturally agnostic questions. Our findings highlight the need for further research in non-English LFQA and provide an evaluation framework.
Submitted 3 July, 2024; v1 submitted 25 June, 2024;
originally announced June 2024.
-
Decoder-only Architecture for Streaming End-to-end Speech Recognition
Authors:
Emiru Tsunoo,
Hayato Futami,
Yosuke Kashiwagi,
Siddhant Arora,
Shinji Watanabe
Abstract:
Decoder-only language models (LMs) have been successfully adopted for speech-processing tasks including automatic speech recognition (ASR). The LMs have ample expressiveness and perform efficiently. This efficiency is a suitable characteristic for streaming applications of ASR. In this work, we propose to use a decoder-only architecture for blockwise streaming ASR. In our approach, speech features are compressed using CTC output and context embedding using blockwise speech subnetwork, and are sequentially provided as prompts to the decoder. The decoder estimates the output tokens promptly at each block. To this end, we also propose a novel training scheme using random-length prefix prompts to make the model robust to the truncated prompts caused by blockwise processing. An experimental comparison shows that our proposed decoder-only streaming ASR achieves 8% relative word error rate reduction in the LibriSpeech test-other set while being twice as fast as the baseline model.
Submitted 23 June, 2024;
originally announced June 2024.
-
Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting
Authors:
Yosuke Kashiwagi,
Hayato Futami,
Emiru Tsunoo,
Siddhant Arora,
Shinji Watanabe
Abstract:
End-to-end multilingual speech recognition models handle multiple languages through a single model, often incorporating language identification to automatically detect the language of incoming speech. Since the common scenario is where the language is already known, these models can perform as language-specific by using language information as prompts, which is particularly beneficial for attention-based encoder-decoder architectures. However, the Connectionist Temporal Classification (CTC) approach, which enhances recognition via joint decoding and multi-task training, does not normally incorporate language prompts due to its conditionally independent output tokens. To overcome this, we introduce an encoder prompting technique within the self-conditioned CTC framework, enabling language-specific adaptation of the CTC model in a zero-shot manner. Our method has been shown to significantly reduce errors, by 28% on average and by 41% on low-resource languages.
Submitted 18 June, 2024;
originally announced June 2024.
-
Finding Task-specific Subnetworks in Multi-task Spoken Language Understanding Model
Authors:
Hayato Futami,
Siddhant Arora,
Yosuke Kashiwagi,
Emiru Tsunoo,
Shinji Watanabe
Abstract:
Recently, multi-task spoken language understanding (SLU) models have emerged, designed to address various speech processing tasks. However, these models often rely on a large number of parameters. Also, they often encounter difficulties in adapting to new data for a specific task without experiencing catastrophic forgetting of previously trained tasks. In this study, we propose finding task-specific subnetworks within a multi-task SLU model via neural network pruning. In addition to model compression, we expect that the forgetting of previously trained tasks can be mitigated by updating only a task-specific subnetwork. We conduct experiments on top of the state-of-the-art multi-task SLU model "UniverSLU", trained for several tasks such as emotion recognition (ER), intent classification (IC), and automatic speech recognition (ASR). We show that pruned models were successful in adapting to additional ASR or IC data with minimal performance degradation on previously trained tasks.
Submitted 18 June, 2024;
originally announced June 2024.
-
On the Evaluation of Speech Foundation Models for Spoken Language Understanding
Authors:
Siddhant Arora,
Ankita Pasad,
Chung-Ming Chien,
Jionghao Han,
Roshan Sharma,
Jee-weon Jung,
Hira Dhamyal,
William Chen,
Suwon Shon,
Hung-yi Lee,
Karen Livescu,
Shinji Watanabe
Abstract:
The Spoken Language Understanding Evaluation (SLUE) suite of benchmark tasks was recently introduced to address the need for open resources and benchmarking of complex spoken language understanding (SLU) tasks, including both classification and sequence generation tasks, on natural speech. The benchmark has demonstrated preliminary success in using pre-trained speech foundation models (SFM) for these SLU tasks. However, the community still lacks a fine-grained understanding of the comparative utility of different SFMs. Inspired by this, we ask: which SFMs offer the most benefits for these complex SLU tasks, and what is the most effective approach for incorporating these SFMs? To answer this, we perform an extensive evaluation of multiple supervised and self-supervised SFMs using several evaluation protocols: (i) frozen SFMs with a lightweight prediction head, (ii) frozen SFMs with a complex prediction head, and (iii) fine-tuned SFMs with a lightweight prediction head. Although the supervised SFMs are pre-trained on much more speech recognition data (with labels), they do not always outperform self-supervised SFMs; the latter tend to perform at least as well as, and sometimes better than, supervised SFMs, especially on the sequence generation tasks in SLUE. While there is no universally optimal way of incorporating SFMs, the complex prediction head gives the best performance for most tasks, although it increases the inference time. We also introduce an open-source toolkit and performance leaderboard, SLUE-PERB, for these tasks and modeling strategies.
Submitted 14 June, 2024;
originally announced June 2024.
-
To what extent can ASV systems naturally defend against spoofing attacks?
Authors:
Jee-weon Jung,
Xin Wang,
Nicholas Evans,
Shinji Watanabe,
Hye-jin Shim,
Hemlata Tak,
Sidhhant Arora,
Junichi Yamagishi,
Joon Son Chung
Abstract:
The current automatic speaker verification (ASV) task involves making binary decisions on two types of trials: target and non-target. However, emerging advancements in speech generation technology pose significant threats to the reliability of ASV systems. This study investigates whether ASV effortlessly acquires robustness against spoofing attacks (i.e., zero-shot capability) by systematically exploring diverse ASV systems and spoofing attacks, ranging from traditional to cutting-edge techniques. Through extensive analyses conducted on eight distinct ASV systems and 29 spoofing attack systems, we demonstrate that the evolution of ASV inherently incorporates defense mechanisms against spoofing attacks. Nevertheless, our findings also underscore that the advancement of spoofing attacks far outpaces that of ASV systems, hence necessitating further research on spoofing-robust ASV methodologies.
Submitted 14 June, 2024; v1 submitted 7 June, 2024;
originally announced June 2024.
-
Cosmological dynamics of interacting dark energy and dark matter in $f(Q)$ gravity
Authors:
Gaurav N. Gadbail,
Simran Arora,
Phongpichit Channuie,
P. K. Sahoo
Abstract:
In this work, we explore the behavior of interacting dark energy and dark matter within a model of $f(Q)$ gravity, employing a standard framework of dynamical system analysis. We consider the power-law $f(Q)$ model with two different forms of interaction between dark energy and dark matter: $3αHρ_m$ and $\frac{α}{3H}ρ_m ρ_{DE}$. The evolution of $Ω_m, Ω_r, Ω_{DE}, q$, and $ω$ for different values of the model parameter $n$ and the interaction parameter $α$ has been examined. Our results show that the universe was dominated by matter in the early stages and will be dominated by dark energy in later stages. Using the observational data, the fixed points are found to be stable and can represent the de Sitter and quintessence acceleration solutions. We discover that the dynamical profiles of the universe in $f(Q)$ dark energy models are influenced by both the interaction term and the relevant model parameters.
Submitted 4 June, 2024;
originally announced June 2024.
-
Forbidden subgraphs of conjugacy class graphs of groups
Authors:
Papi Ray,
Sonakshee Arora,
Pallabi Manna
Abstract:
Let G be a finite group. The nilpotent/commuting/solvable conjugacy class graph is a simple graph with the non-central conjugacy classes of G as its vertex set, in which two vertices are adjacent if and only if a member of one conjugacy class together with a member of the other conjugacy class generates a nilpotent/abelian/solvable (sub)group. In this paper, we discuss the forbidden subgraphs of the nilpotent, commuting, and solvable conjugacy class graphs of groups.
Submitted 3 June, 2024;
originally announced June 2024.
-
Using 3-D LiDAR Data for Safe Physical Human-Robot Interaction
Authors:
Sarthak Arora,
Karthik Subramanian,
Odysseus Adamides,
Ferat Sahin
Abstract:
This paper explores the use of 3-D lidar in a physical Human-Robot Interaction (pHRI) scenario. To this end, experiments were conducted to mimic a modern shop-floor environment. Data was collected from a pool of seventeen participants while they performed pre-determined tasks in a shared workspace with the robot. To demonstrate an end-to-end case, a perception pipeline was developed that leverages reflectivity, signal, near-infrared, and point-cloud data from a 3-D lidar. This data is then used to perform safety-based control while satisfying the speed and separation monitoring (SSM) criteria. To support the perception pipeline, a state-of-the-art object detection network was leveraged and fine-tuned by transfer learning. An analysis is provided along with results of the perception pipeline and the safety-based controller. Additionally, this system is compared with previous work.
Submitted 2 June, 2024;
originally announced June 2024.
-
Eventually positive semigroups: spectral and asymptotic analysis
Authors:
Sahiba Arora
Abstract:
The spectral theory of semigroup generators is a crucial tool for analysing the asymptotic properties of operator semigroups. Typically, Tauberian theorems, such as the ABLV theorem, demand extensive information about the spectrum to derive convergence results. However, the scenario is significantly simplified for positive semigroups on Banach lattices. This observation extends to the broader class of eventually positive semigroups -- a phenomenon observed in various concrete differential equations. In this paper, we investigate the spectral and asymptotic properties of eventually positive semigroups, focusing particularly on the persistently irreducible case. Our findings expand upon the existing theory of eventual positivity, offering new insights into the cyclicity of the peripheral spectrum and asymptotic trends. Notably, several arguments for positive operators and semigroups do not apply in our context, necessitating the use of ultrapower arguments to circumvent these challenges.
Submitted 25 May, 2024;
originally announced May 2024.
-
Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving
Authors:
Aniket Didolkar,
Anirudh Goyal,
Nan Rosemary Ke,
Siyuan Guo,
Michal Valko,
Timothy Lillicrap,
Danilo Rezende,
Yoshua Bengio,
Michael Mozer,
Sanjeev Arora
Abstract:
Metacognitive knowledge refers to humans' intuitive knowledge of their own thinking and reasoning processes. Today's best LLMs clearly possess some reasoning processes. The paper gives evidence that they also have metacognitive knowledge, including the ability to name skills and procedures to apply given a task. We explore this primarily in the context of math reasoning, developing a prompt-guided interaction procedure to get a powerful LLM to assign sensible skill labels to math questions, followed by having it perform semantic clustering to obtain coarser families of skill labels. These coarse skill labels look interpretable to humans.
To validate that these skill labels are meaningful and relevant to the LLM's reasoning processes, we perform the following experiments. (a) We ask GPT-4 to assign skill labels to training questions in the math datasets GSM8K and MATH. (b) When using an LLM to solve the test questions, we present it with the full list of skill labels and ask it to identify the skill needed. Then it is presented with randomly selected exemplar solved questions associated with that skill label. This improves accuracy on GSM8K and MATH for several strong LLMs, including code-assisted models. The methodology presented is domain-agnostic, even though this article applies it to math problems.
Submitted 20 May, 2024;
originally announced May 2024.
-
A computational test of quantum contextuality, and even simpler proofs of quantumness
Authors:
Atul Singh Arora,
Kishor Bharti,
Alexandru Cojocaru,
Andrea Coladangelo
Abstract:
Bell non-locality is a fundamental feature of quantum mechanics whereby measurements performed on "spatially separated" quantum systems can exhibit correlations that cannot be understood as revealing predetermined values. This is a special case of the more general phenomenon of "quantum contextuality", which says that such correlations can occur even when the measurements are not necessarily on separate quantum systems, but are merely "compatible" (i.e. commuting). Crucially, while any non-local game yields an experiment that demonstrates quantum advantage by leveraging the "spatial separation" of two or more devices (and in fact several such demonstrations have been conducted successfully in recent years), the same is not true for quantum contextuality: finding the contextuality analogue of such an experiment is arguably one of the central open questions in the foundations of quantum mechanics.
In this work, we show that an arbitrary contextuality game can be compiled into an operational "test of contextuality" involving a single quantum device, by only making the assumption that the device is computationally bounded. Our work is inspired by the recent work of Kalai et al. (STOC '23) that converts any non-local game into a classical test of quantum advantage with a single device. The central idea in their work is to use cryptography to enforce spatial separation within subsystems of a single quantum device. Our work can be seen as using cryptography to enforce "temporal separation", i.e. to restrict communication between sequential measurements.
Beyond contextuality, we employ our ideas to design a "proof of quantumness" that, to the best of our knowledge, is arguably even simpler than the ones proposed in the literature so far.
Submitted 10 May, 2024;
originally announced May 2024.
-
SPAFIT: Stratified Progressive Adaptation Fine-tuning for Pre-trained Large Language Models
Authors:
Samir Arora,
Liangliang Wang
Abstract:
Full fine-tuning is a popular approach to adapt Transformer-based pre-trained large language models to a specific downstream task. However, the substantial requirements for computational power and storage have discouraged its widespread use. Moreover, increasing evidence of catastrophic forgetting and overparameterization in the Transformer architecture has motivated researchers to seek parameter-efficient fine-tuning (PEFT) methods. Commonly known PEFT methods like LoRA and BitFit are typically applied across all layers of the model. We propose a PEFT method, called Stratified Progressive Adaptation Fine-tuning (SPAFIT), based on the localization of different types of linguistic knowledge to specific layers of the model. Our experiments, conducted on nine tasks from the GLUE benchmark, show that our proposed SPAFIT method outperforms other PEFT methods while fine-tuning only a fraction of the parameters adjusted by other methods.
Submitted 30 April, 2024;
originally announced May 2024.
-
Improving device-independent weak coin flipping protocols
Authors:
Atul Singh Arora,
Jamie Sikora,
Thomas Van Himbeeck
Abstract:
Weak coin flipping is the cryptographic task where Alice and Bob remotely flip a coin but want opposite outcomes. This work studies this task in the device-independent regime where Alice and Bob neither trust each other, nor their quantum devices. The best protocol was devised over a decade ago by Silman, Chailloux, Aharon, Kerenidis, Pironio, and Massar with bias $\varepsilon \approx 0.33664$, where the bias is a commonly adopted security measure for coin flipping protocols. This work presents two techniques to lower the bias of such protocols, namely self-testing and abort-phobic compositions. We apply these techniques to the SCAKPM '11 protocol above and, assuming a continuity conjecture, lower the bias to $\varepsilon \approx 0.29104$. We believe that these techniques could be useful in the design of device-independent protocols for a variety of other tasks.
Independently of weak coin flipping, en route to our results, we show how one can test $n-1$ out of $n$ devices, and estimate the performance of the remaining device, for later use in the protocol. The proof uses linear programming and, due to its generality, may find applications elsewhere.
Submitted 25 April, 2024;
originally announced April 2024.
-
The Third Monocular Depth Estimation Challenge
Authors:
Jaime Spencer,
Fabio Tosi,
Matteo Poggi,
Ripudaman Singh Arora,
Chris Russell,
Simon Hadfield,
Richard Bowden,
GuangYuan Zhou,
ZhengXin Li,
Qiang Rao,
YiPing Bao,
Xiao Liu,
Dohyeong Kim,
Jinseong Kim,
Myunghyun Kim,
Mykola Lavreniuk,
Rui Li,
Qing Mao,
Jiang Wu,
Yu Zhu,
Jinqiu Sun,
Yanning Zhang,
Suraj Patni,
Aradhye Agarwal,
Chetan Arora
, et al. (16 additional authors not shown)
Abstract:
This paper discusses the results of the third edition of the Monocular Depth Estimation Challenge (MDEC). The challenge focuses on zero-shot generalization to the challenging SYNS-Patches dataset, featuring complex scenes in natural and indoor settings. As with the previous edition, methods can use any form of supervision, i.e. supervised or self-supervised. The challenge received a total of 19 submissions outperforming the baseline on the test set: 10 among them submitted a report describing their approach, highlighting a diffused use of foundational models such as Depth Anything at the core of their method. The challenge winners drastically improved 3D F-Score performance, from 17.51% to 23.72%.
Submitted 27 April, 2024; v1 submitted 25 April, 2024;
originally announced April 2024.
-
PDS 70 unveiled by star-hopping: total intensity, polarimetry and mm-imaging modeled in concert
Authors:
Z. Wahhaj,
M. Benisty,
C. Ginski,
C. Swastik,
S. Arora,
R. G. van Holstein,
R. J. De Rosa,
B. Yang,
J. Bae,
B. Ren
Abstract:
Context. Most ground-based planet search direct imaging campaigns use angular differential imaging, which distorts the signal from extended sources like protoplanetary disks. In the case of PDS 70, a young system with two planets found within the cavity of a protoplanetary disk, obtaining a reliable image of both planets and disk is essential to understanding planet-disk interactions. Aims. Our goals are to reveal the true intensity of the planets and disk without self-subtraction effects for the first time, to search for new giant planets beyond separations of 0.1", and to study the morphology of the disk shaped by two massive planets. Methods. We present YJHK-band imaging, polarimetry, and spatially resolved spectroscopy of PDS 70 using near-simultaneous reference star differential imaging, also known as star-hopping. We created a radiative transfer model of the system to match the near-infrared imaging and polarimetric data, along with sub-millimeter imaging from ALMA. Furthermore, we extracted the spectra of the planets and the disk and compared them. Results. We find that the disk is quite flared with a scale height of ~15% at the outer edge of the disk at ~90 au, similar to some disks in the literature. The gap inside of ~50 au is estimated to have ~1% of the dust density of the outer disk. The Northeast outer disk arc seen in previous observations is likely the outer lip of the flared disk. Abundance ratios of grains estimated by the modeling indicate a shallow grain-size index > -2.7, instead of the canonical -3.5. There is both vertical and radial segregation of grains. Planet c is well separated from the disk and has a spectrum similar to planet b, clearly redder than the disk spectra. Planet c is possibly associated with the sudden flaring of the disk starting at ~50 au. No new planets > 5 Mj were found.
Submitted 17 April, 2024;
originally announced April 2024.
-
The lattice structure of negative Sobolev and extrapolation spaces
Authors:
Sahiba Arora,
Jochen Glück,
Felix L. Schwenninger
Abstract:
It is well-known that the Sobolev spaces $W^{k,p}(\mathbb R^d)$ are vector lattices with respect to the pointwise almost everywhere order if $k \in \{0,1\}$, but not if $k \ge 2$. In this note, we consider negative $k$ and show that the span of the positive cone in $W^{k,p}(\mathbb R^d)$ is a vector lattice in this case.
We also prove a related abstract result: if $(T(t))_{t \in [0,\infty)}$ is a positive $C_0$-semigroup on a Banach lattice $X$ with order continuous norm, then the span of the cone $X_{-1,+}$ in the extrapolation space $X_{-1}$ is a vector lattice. This complements results obtained by Bátkai, Jacob, Wintermayr, and Voigt in the context of perturbation theory and provides additional context for the theory of infinite-dimensional positive systems.
Submitted 2 April, 2024;
originally announced April 2024.
-
Limit-case admissibility for positive infinite-dimensional systems
Authors:
Sahiba Arora,
Jochen Glück,
Lassi Paunonen,
Felix L. Schwenninger
Abstract:
In the context of positive infinite-dimensional linear systems, we systematically study $L^p$-admissible control and observation operators with respect to the limit-cases $p=\infty$ and $p=1$, respectively. This requires an in-depth understanding of the order structure on the extrapolation space $X_{-1}$, which we provide. These properties of $X_{-1}$ also enable us to discuss when zero-class admissibility is automatic. While those limit-cases are the weakest form of admissibility on the $L^p$-scale, it is remarkable that they sometimes follow from order theoretic and geometric assumptions. Our assumptions on the geometries of the involved spaces are minimal.
Submitted 4 May, 2024; v1 submitted 1 April, 2024;
originally announced April 2024.
-
Optimistic Verifiable Training by Controlling Hardware Nondeterminism
Authors:
Megha Srivastava,
Simran Arora,
Dan Boneh
Abstract:
The increasing compute demands of AI systems have led to the emergence of services that train models on behalf of clients lacking necessary resources. However, ensuring correctness of training and guarding against potential training-time attacks, such as data poisoning, poses challenges. Existing works on verifiable training largely fall into two classes: proof-based systems, which struggle to scale due to requiring cryptographic techniques, and "optimistic" methods that consider a trusted third-party auditor who replicates the training process. A key challenge with the latter is that hardware nondeterminism between GPU types during training prevents an auditor from replicating the training process exactly, and such schemes are therefore non-robust. We propose a method that combines training in a higher precision than the target model, rounding after intermediate computation steps, and storing rounding decisions based on an adaptive thresholding procedure, to successfully control for nondeterminism. Across three different NVIDIA GPUs (A40, Titan XP, RTX 2080 Ti), we achieve exact training replication at FP32 precision for both full-training and fine-tuning of ResNet-50 (23M) and GPT-2 (117M) models. Our verifiable training scheme significantly decreases the storage and time costs compared to proof-based systems.
Submitted 16 March, 2024; v1 submitted 14 March, 2024;
originally announced March 2024.
-
SEGAA: A Unified Approach to Predicting Age, Gender, and Emotion in Speech
Authors:
Aron R,
Indra Sigicharla,
Chirag Periwal,
Mohanaprasad K,
Nithya Darisini P S,
Sourabh Tiwari,
Shivani Arora
Abstract:
The interpretation of human voices holds importance across various applications. This study ventures into predicting age, gender, and emotion from vocal cues, a field with vast applications. Advancements in voice analysis technology span domains, from improving customer interactions to enhancing healthcare and retail experiences. Discerning emotions aids mental health, while age and gender detection are vital in various contexts. Exploring deep learning models for these predictions involves comparing the single, multi-output, and sequential models highlighted in this paper. Sourcing suitable data posed challenges, resulting in the amalgamation of the CREMA-D and EMO-DB datasets. Prior work showed promise in individual predictions, but limited research considered all three variables simultaneously. This paper identifies flaws in an individual model approach and advocates for our novel multi-output learning architecture, the Speech-based Emotion Gender and Age Analysis (SEGAA) model. The experiments suggest that multi-output models perform comparably to individual models, efficiently capturing the intricate relationships between variables and speech inputs, all while achieving improved runtime.
Submitted 1 March, 2024;
originally announced March 2024.
-
Simple linear attention language models balance the recall-throughput tradeoff
Authors:
Simran Arora,
Sabri Eyuboglu,
Michael Zhang,
Aman Timalsina,
Silas Alberti,
Dylan Zinsley,
James Zou,
Atri Rudra,
Christopher Ré
Abstract:
Recent work has shown that attention-based language models excel at recall, the ability to ground generations in tokens previously seen in context. However, the efficiency of attention-based models is bottlenecked during inference by the KV-cache's aggressive memory consumption. In this work, we explore whether we can improve language model efficiency (e.g. by reducing memory consumption) without compromising on recall. By applying experiments and theory to a broad set of architectures, we identify a key tradeoff between a model's state size and recall ability. We show that efficient alternatives to attention (e.g. H3, Mamba, RWKV) maintain a fixed-size recurrent state, but struggle at recall. We propose BASED, a simple architecture combining linear and sliding window attention. By varying the BASED window size and linear attention feature dimension, we can dial the state size and traverse the Pareto frontier of the recall-memory tradeoff curve, recovering the full quality of attention on one end and the small state size of attention-alternatives on the other. We train language models up to 1.3b parameters and show that BASED matches the strongest sub-quadratic models (e.g. Mamba) in perplexity and outperforms them on real-world recall-intensive tasks by 6.22 accuracy points. Implementations of linear attention are often less efficient than optimized standard attention implementations. To make BASED competitive, we develop IO-aware algorithms that enable 24x higher throughput on language generation than FlashAttention-2, when generating 1024 tokens using 1.3b parameter models. Code for this work is provided at: https://github.com/HazyResearch/based.
Submitted 28 February, 2024;
originally announced February 2024.
-
Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates
Authors:
Kaifeng Lyu,
Haoyu Zhao,
Xinran Gu,
Dingli Yu,
Anirudh Goyal,
Sanjeev Arora
Abstract:
Public LLMs such as Llama 2-Chat have driven huge activity in LLM research. These models underwent alignment training and were considered safe. Recently, Qi et al. (2023) reported that even benign fine-tuning (e.g., on seemingly safe datasets) can give rise to unsafe behaviors in the models. The current paper is about methods and best practices to mitigate such loss of alignment. Through extensive experiments on several chat models (Meta's Llama 2-Chat, Mistral AI's Mistral 7B Instruct v0.2, and OpenAI's GPT-3.5 Turbo), this paper uncovers that the prompt templates used during fine-tuning and inference play a crucial role in preserving safety alignment, and proposes the "Pure Tuning, Safe Testing" (PTST) principle -- fine-tune models without a safety prompt, but include it at test time. Fine-tuning experiments on GSM8K, ChatDoctor, and OpenOrca show that PTST significantly reduces the rise of unsafe behaviors, and even almost eliminates them in some cases.
Submitted 28 February, 2024;
originally announced February 2024.
-
Energy conditions in the $f(R,L,T)$ theory of gravity
Authors:
Simran Arora,
P. H. R. S. Moraes,
P. K. Sahoo
Abstract:
We construct the energy conditions for the recently proposed $f(R,L,T)$ gravity theory, for which $f$ is a generic function of the Ricci scalar $R$, matter Lagrangian density $L$ and trace of the energy-momentum tensor $T$. We analyse two different forms for the $f(R,L,T)$ function within the framework of the Friedmann-Lemaître-Robertson-Walker universe. We constrain the model parameters from the energy conditions. This approach allows us to assess the feasibility of specific forms of the $f(R,L,T)$ gravity.
Submitted 5 June, 2024; v1 submitted 28 February, 2024;
originally announced February 2024.
-
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages
Authors:
Minsu Kim,
Jee-weon Jung,
Hyeongseop Rha,
Soumi Maiti,
Siddhant Arora,
Xuankai Chang,
Shinji Watanabe,
Yong Man Ro
Abstract:
The capability to jointly process multi-modal information is becoming essential. However, the limited amount of paired multi-modal data and the large computational requirements in multi-modal learning hinder development. We propose a novel Tri-Modal Translation (TMT) model that translates between arbitrary modalities spanning speech, image, and text. We introduce a novel viewpoint, where we interpret different modalities as different languages, and treat multi-modal translation as a well-established machine translation problem. To this end, we tokenize speech and image data into discrete tokens, which provide a unified interface across modalities and significantly decrease the computational cost. In the proposed TMT, a multi-modal encoder-decoder conducts the core translation, whereas modality-specific processing is conducted only within the tokenization and detokenization stages. We evaluate the proposed TMT on all six modality translation tasks. TMT outperforms single model counterparts consistently, demonstrating that unifying tasks is beneficial not only for practicality but also for performance.
Submitted 25 February, 2024;
originally announced February 2024.
-
Protocols for Quantum Weak Coin Flipping
Authors:
Atul Singh Arora,
Jérémie Roland,
Chrysoula Vlachou,
Stephan Weis
Abstract:
Weak coin flipping is an important cryptographic primitive -- it is the strongest known secure two-party computation primitive that classically becomes secure only under certain assumptions (e.g. computational hardness), while quantumly there exist protocols that achieve arbitrarily close to perfect security. This breakthrough result was established by Mochon in 2007 [arXiv:0711.4114]. However, his proof relied on the existence of certain unitary operators which was established by a non-constructive argument. Consequently, explicit protocols have remained elusive. In this work, we give exact constructions of related unitary operators. These, together with a new formalism, yield a family of protocols approaching perfect security thereby also simplifying Mochon's proof of existence. We illustrate the construction of explicit weak coin flipping protocols by considering concrete examples (from the aforementioned family of protocols) that are more secure than all previously known protocols.
Submitted 24 February, 2024;
originally announced February 2024.
-
Language Models as Science Tutors
Authors:
Alexis Chevalier,
Jiayi Geng,
Alexander Wettig,
Howard Chen,
Sebastian Mizera,
Toni Annala,
Max Jameson Aragon,
Arturo Rodríguez Fanlo,
Simon Frieder,
Simon Machado,
Akshara Prabhakar,
Ellie Thieu,
Jiachen T. Wang,
Zirui Wang,
Xindi Wu,
Mengzhou Xia,
Wenhan Xia,
Jiatong Yu,
Jun-Jie Zhu,
Zhiyong Jason Ren,
Sanjeev Arora,
Danqi Chen
Abstract:
NLP has recently made exciting progress toward training language models (LMs) with strong scientific problem-solving skills. However, model development has not focused on real-life use-cases of LMs for science, including applications in education that require processing long scientific documents. To address this, we introduce TutorEval and TutorChat. TutorEval is a diverse question-answering benchmark consisting of questions about long chapters from STEM textbooks, written by experts. TutorEval helps measure real-life usability of LMs as scientific assistants, and it is the first benchmark combining long contexts, free-form generation, and multi-disciplinary scientific knowledge. Moreover, we show that fine-tuning base models with existing dialogue datasets leads to poor performance on TutorEval. Therefore, we create TutorChat, a dataset of 80,000 long synthetic dialogues about textbooks. We use TutorChat to fine-tune Llemma models with 7B and 34B parameters. These LM tutors specialized in math have a 32K-token context window, and they excel at TutorEval while performing strongly on GSM8K and MATH. Our datasets build on open-source materials, and we release our models, data, and evaluations.
Submitted 21 July, 2024; v1 submitted 16 February, 2024;
originally announced February 2024.
-
Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT
Authors:
Jon Saad-Falcon,
Daniel Y. Fu,
Simran Arora,
Neel Guha,
Christopher Ré
Abstract:
Retrieval pipelines, an integral component of many machine learning systems, perform poorly in domains where documents are long (e.g., 10K tokens or more) and where identifying the relevant document requires synthesizing information across the entire text. Developing long-context retrieval encoders suitable for these domains raises three challenges: (1) how to evaluate long-context retrieval performance, (2) how to pretrain a base language model to represent both short contexts (corresponding to queries) and long contexts (corresponding to documents), and (3) how to fine-tune this model for retrieval under the batch size limitations imposed by GPU memory constraints. To address these challenges, we first introduce LoCoV1, a novel 12-task benchmark constructed to measure long-context retrieval where chunking is not possible or not effective. We next present the M2-BERT retrieval encoder, an 80M-parameter state-space encoder model built from the Monarch Mixer architecture, capable of scaling to documents up to 32K tokens long. We describe a pretraining data mixture which allows this encoder to process both short and long context sequences, and a fine-tuning approach that adapts this base model to retrieval with only single-sample batches. Finally, we validate the M2-BERT retrieval encoder on LoCoV1, finding that it outperforms competitive Transformer-based models by at least 23.3 points, despite containing upwards of 90x fewer parameters.
Submitted 13 February, 2024; v1 submitted 12 February, 2024;
originally announced February 2024.
-
Reconstruction of the singularity-free $f(\mathcal{R})$ gravity via Raychaudhuri equations
Authors:
Gaurav N. Gadbail,
Simran Arora,
P. K. Sahoo,
Kazuharu Bamba
Abstract:
We study the bounce cosmology to construct a singularity-free $f(\mathcal{R})$ model using the reconstruction technique. The formulation of the $f(\mathcal{R})$ model is based on the Raychaudhuri equation, a key element employed in reconstructed models to eliminate singularities. We explore the feasibility of obtaining stable gravitational Lagrangians, adhering to the conditions $f_{\mathcal{R}}>0$ and $f_{\mathcal{R}\mathcal{R}}>0$. Consequently, both models demonstrate stability, effectively avoiding the Dolgov-Kawasaki instability. Our assessment extends to testing the reconstructed model using energy conditions and the effective equation-of-state (EoS). Our findings indicate that the reconstructed super-bounce model facilitates the examination of a singularity-free accelerating universe for both phantom and non-phantom phases. However, in the case of the reconstructed oscillatory bounce model, two scenarios are considered with $ω=-1/3$ and $ω=-2/3$. While the model proves suitable for studying a singularity-free accelerating universe in the $ω=-1/3$ case, it fails to demonstrate such behavior under energy conditions for the $ω=-2/3$ scenario. The reconstructed models accommodate early-time bouncing behavior and late-
Submitted 27 July, 2024; v1 submitted 7 February, 2024;
originally announced February 2024.
-
LESS: Selecting Influential Data for Targeted Instruction Tuning
Authors:
Mengzhou Xia,
Sadhika Malladi,
Suchin Gururangan,
Sanjeev Arora,
Danqi Chen
Abstract:
Instruction tuning has unlocked powerful capabilities in large language models (LLMs), effectively using combined datasets to develop general-purpose chatbots. However, real-world applications often require a specialized suite of skills (e.g., reasoning). The challenge lies in identifying the most relevant data from these extensive datasets to effectively develop specific capabilities, a setting we frame as targeted instruction tuning. We propose LESS, an optimizer-aware and practically efficient algorithm to effectively estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection. Crucially, LESS adapts existing influence formulations to work with the Adam optimizer and variable-length instruction data. LESS first constructs a highly reusable and transferable gradient datastore with low-dimensional gradient features and then selects examples based on their similarity to few-shot examples embodying a specific capability. Experiments show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks. Furthermore, the selected data is highly transferable: smaller models can be leveraged to select useful data for larger models and models from different families. Our qualitative analysis shows that our method goes beyond surface form cues to identify data that exemplifies the necessary reasoning skills for the intended downstream application.
Submitted 12 June, 2024; v1 submitted 6 February, 2024;
originally announced February 2024.
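The selection step described in the LESS abstract (rank training examples by the similarity of their low-dimensional gradient features to those of a few target examples, then keep a small fraction) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of cosine similarity, and the choice to score each training example by its maximum similarity over the target examples are all assumptions made for illustration, and the feature vectors are taken as given.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def select_top_fraction(train_feats, target_feats, fraction=0.05):
    """Return indices of the training examples most similar to the target examples.

    Assumption: each training example is scored by its best cosine similarity
    to any target feature vector; the top `fraction` of examples is kept.
    """
    scores = [max(cosine(g, t) for t in target_feats) for g in train_feats]
    keep = max(1, math.ceil(fraction * len(train_feats)))
    ranked = sorted(range(len(train_feats)), key=lambda i: scores[i], reverse=True)
    return ranked[:keep]

# Hypothetical 2-D "gradient features" for four training examples and one target.
train = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
target = [[1.0, 0.0]]
print(select_top_fraction(train, target, fraction=0.5))
```

The 5% default mirrors the fraction quoted in the abstract; everything else about the scoring rule here is a simplification.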
-
OLMo: Accelerating the Science of Language Models
Authors:
Dirk Groeneveld,
Iz Beltagy,
Pete Walsh,
Akshita Bhagia,
Rodney Kinney,
Oyvind Tafjord,
Ananya Harsh Jha,
Hamish Ivison,
Ian Magnusson,
Yizhong Wang,
Shane Arora,
David Atkinson,
Russell Authur,
Khyathi Raghavi Chandu,
Arman Cohan,
Jennifer Dumas,
Yanai Elazar,
Yuling Gu,
Jack Hessel,
Tushar Khot,
William Merrill,
Jacob Morrison,
Niklas Muennighoff,
Aakanksha Naik,
Crystal Nam
, et al. (18 additional authors not shown)
Abstract:
Language models (LMs) have become ubiquitous in both NLP research and in commercial product offerings. As their commercial importance has surged, the most powerful models have become closed off, gated behind proprietary interfaces, with important details of their training data, architectures, and development undisclosed. Given the importance of these details in scientifically studying these models, including their biases and potential risks, we believe it is essential for the research community to have access to powerful, truly open LMs. To this end, we have built OLMo, a competitive, truly Open Language Model, to enable the scientific study of language models. Unlike most prior efforts that have only released model weights and inference code, we release OLMo alongside open training data and training and evaluation code. We hope this release will empower the open research community and inspire a new wave of innovation.
Submitted 7 June, 2024; v1 submitted 1 February, 2024;
originally announced February 2024.
-
OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer
Authors:
Yifan Peng,
Jinchuan Tian,
William Chen,
Siddhant Arora,
Brian Yan,
Yui Sudo,
Muhammad Shakeel,
Kwanghee Choi,
Jiatong Shi,
Xuankai Chang,
Jee-weon Jung,
Shinji Watanabe
Abstract:
Recent studies have highlighted the importance of fully open foundation models. The Open Whisper-style Speech Model (OWSM) is an initial step towards reproducing OpenAI Whisper using public data and open-source toolkits. However, previous versions of OWSM (v1 to v3) are still based on standard Transformer, which might lead to inferior performance compared to state-of-the-art speech encoder architectures. This work aims to improve the performance and efficiency of OWSM without additional data. We present a series of E-Branchformer-based models named OWSM v3.1, ranging from 100M to 1B parameters. OWSM v3.1 outperforms its predecessor, OWSM v3, in most evaluation benchmarks, while showing an improved inference speed of up to 25%. We further reveal the emergent ability of OWSM v3.1 in zero-shot contextual biasing speech recognition. We also provide a model trained on a subset of data with low license restrictions. We will publicly release the code, pre-trained models, and training logs.
Submitted 16 June, 2024; v1 submitted 29 January, 2024;
originally announced January 2024.
-
SecPLF: Secure Protocols for Loanable Funds against Oracle Manipulation Attacks
Authors:
Sanidhay Arora,
Yingjiu Li,
Yebo Feng,
Jiahua Xu
Abstract:
The evolving landscape of Decentralized Finance (DeFi) has raised critical security concerns, especially pertaining to Protocols for Loanable Funds (PLFs) and their dependency on price oracles, which are susceptible to manipulation. The emergence of flash loans has further amplified these risks, enabling increasingly complex oracle manipulation attacks that can lead to significant financial losses. Responding to this threat, we first dissect the attack mechanism by formalizing the standard operational and adversary models for PLFs. Based on our analysis, we propose SecPLF, a robust and practical solution designed to counteract oracle manipulation attacks efficiently. SecPLF operates by tracking a price state for each crypto-asset, including the recent price and the timestamp of its last update. By imposing price constraints on the price oracle usage, SecPLF ensures a PLF only engages a price oracle if the last recorded price falls within a defined threshold, thereby negating the profitability of potential attacks. Our evaluation based on historical market data confirms SecPLF's efficacy in providing high-confidence prevention against arbitrage attacks that arise due to minor price differences. SecPLF delivers proactive protection against oracle manipulation attacks, offering ease of implementation, oracle-agnostic property, and resource and cost efficiency.
Submitted 16 January, 2024;
originally announced January 2024.
-
EXPLORE -- Explainable Song Recommendation
Authors:
Abhinav Arun,
Mehul Soni,
Palash Choudhary,
Saksham Arora
Abstract:
This study explores the development of an explainable music recommendation system with enhanced user control. Leveraging a hybrid of collaborative filtering and content-based filtering, we address the challenges of opaque recommendation logic and lack of user influence on results. We present a novel approach combining advanced algorithms and an interactive user interface. Our methodology integrates Spotify data with user preference analytics to tailor music suggestions. Evaluation through RMSE and user studies underscores the efficacy and user satisfaction with our system. The paper concludes with potential directions for future enhancements in group recommendations and dynamic feedback integration.
Submitted 30 December, 2023;
originally announced January 2024.
-
Late Time Acceleration with Observational Constraints in Modified Theories of Gravity
Authors:
Simran Arora
Abstract:
The late time acceleration of the Universe has challenged contemporary cosmology since its discovery. General Relativity explains this phenomenon by introducing the cosmological constant, named the standard cosmological model ($Λ$CDM). However, the cosmological constant solution has several drawbacks that have led cosmologists to explore and propose alternative models to explain the late time acceleration of the Universe. These alternatives span from models of a dynamical dark fluid, known as dark energy, to models of large-scale modifications of the gravitational interaction, known as modified gravity. The current dissertation intends to show several ways to investigate late-time cosmology or to look at probable places for future investigations in order to shed more light on the dark sector of the Universe...
Submitted 29 December, 2023;
originally announced January 2024.
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee
, et al. (1325 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
Submitted 17 June, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
Phoneme-aware Encoding for Prefix-tree-based Contextual ASR
Authors:
Hayato Futami,
Emiru Tsunoo,
Yosuke Kashiwagi,
Hiroaki Ogawa,
Siddhant Arora,
Shinji Watanabe
Abstract:
In speech recognition applications, it is important to recognize context-specific rare words, such as proper nouns. Tree-constrained Pointer Generator (TCPGen), which efficiently biases such words with a prefix tree, has shown promise for this purpose. While the original TCPGen relies on grapheme-based encoding, we propose extending it with phoneme-aware encoding to better recognize words with unusual pronunciations. As TCPGen handles biasing words as subword units, we propose obtaining subword-level phoneme-aware encoding by using alignment between phonemes and subwords. Furthermore, we propose injecting phoneme-level predictions from CTC into queries of TCPGen so that the model better interprets the phoneme-aware encodings. We conducted ASR experiments with TCPGen for RNN transducer. We observed that the proposed phoneme-aware encoding outperformed ordinary grapheme-based encoding on both the English LibriSpeech and Japanese CSJ datasets, demonstrating the robustness of our approach across linguistically diverse languages.
Submitted 15 December, 2023;
originally announced December 2023.
-
Revisiting kink-like parametrization and constraints using OHD/Pantheon+/BAO samples
Authors:
Simran Arora,
P. K. Sahoo
Abstract:
We reexamine the kink-like parametrization of the deceleration parameter to derive constraints on the transition redshift from cosmic deceleration to acceleration. This is achieved using observational Hubble data, Type Ia Supernovae Pantheon+ samples and baryon acoustic oscillations. In this parametrization, the value of the initial $q$ parameter is $q_{i}$, the final value is $q_f$, the present value is denoted by $q_{0}$, and the transition duration is given by $α$. We perform our calculations using the Markov chain Monte Carlo method, utilizing the emcee package. Under the assumption of a flat geometry, we constrain the range of possible values for three scenarios: when $q_{f}$ is unrestricted, when $q_{f}$ is equal to $-1$, and when $α$ is $1/3$. This is done assuming that $q_{i}=1/2$. We find that the SN data constrain the free parameters tightly, as in flat $Λ$CDM, for unrestricted $q_{f}$. In addition, if we fix $q_{f}=-1$, the model behaves like $Λ$CDM for the combined dataset. We also obtain the current value of the deceleration parameter, which is consistent with the latest results that assume the $Λ$CDM model. Furthermore, we observe a deviation from the standard $Λ$CDM model in the current model based on the evolution of $j(z)$, and it is evident that the universe transitions from deceleration to acceleration and will eventually reach the $Λ$CDM model in the near future.
Submitted 29 April, 2024; v1 submitted 11 December, 2023;
originally announced December 2023.
-
Zoology: Measuring and Improving Recall in Efficient Language Models
Authors:
Simran Arora,
Sabri Eyuboglu,
Aman Timalsina,
Isys Johnson,
Michael Poli,
James Zou,
Atri Rudra,
Christopher Ré
Abstract:
Attention-free language models that combine gating and convolutions are growing in popularity due to their efficiency and increasingly competitive performance. To better understand these architectures, we pretrain a suite of 17 attention and "gated-convolution" language models, finding that SoTA gated-convolution architectures still underperform attention by up to 2.1 perplexity points on the Pile. In fine-grained analysis, we find 82% of the gap is explained by each model's ability to recall information that is previously mentioned in-context, e.g. "Hakuna Matata means no worries Hakuna Matata it means no" $\rightarrow$ "??". On this task, termed "associative recall", we find that attention outperforms gated-convolutions by a large margin: a 70M parameter attention model outperforms a 1.4 billion parameter gated-convolution model on associative recall. This is surprising because prior work shows gated convolutions can perfectly solve synthetic tests for AR capability. To close the gap between synthetics and real language, we develop a new formalization of the task called multi-query associative recall (MQAR) that better reflects actual language. We perform an empirical and theoretical study of MQAR that elucidates differences in the parameter-efficiency of attention and gated-convolution recall. Informed by our analysis, we evaluate simple convolution-attention hybrids and show that hybrids with input-dependent sparse attention patterns can close 97.4% of the gap to attention, while maintaining sub-quadratic scaling. Our code is accessible at: https://github.com/HazyResearch/zoology.
Submitted 8 December, 2023;
originally announced December 2023.
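The "Hakuna Matata" example in the Zoology abstract can be made concrete with a toy lookup. The sketch below is only an illustration of what associative recall asks for (reproduce the token that followed an earlier occurrence of the current token); it is not the paper's MQAR formalization, and the function name `recall_next` is hypothetical.

```python
def recall_next(tokens):
    """Return the token that followed the most recent earlier occurrence
    of the final token, or None if the final token has not appeared before."""
    if not tokens:
        return None
    query = tokens[-1]
    # Scan backwards over earlier positions, skipping the query position itself.
    for j in range(len(tokens) - 2, -1, -1):
        if tokens[j] == query:
            return tokens[j + 1]
    return None

prompt = "Hakuna Matata means no worries Hakuna Matata it means no".split()
print(recall_next(prompt))  # prints "worries"
```

A model with good recall should assign high probability to this looked-up token; the abstract attributes most of the perplexity gap between attention and gated-convolution models to this ability.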
-
RELIC: Investigating Large Language Model Responses using Self-Consistency
Authors:
Furui Cheng,
Vilém Zouhar,
Simran Arora,
Mrinmaya Sachan,
Hendrik Strobelt,
Mennatallah El-Assady
Abstract:
Large Language Models (LLMs) are notorious for blending fact with fiction and generating non-factual content, known as hallucinations. To address this challenge, we propose an interactive system that helps users gain insight into the reliability of the generated text. Our approach is based on the idea that the self-consistency of multiple samples generated by the same LLM relates to its confidence in individual claims in the generated texts. Using this idea, we design RELIC, an interactive system that enables users to investigate and verify semantic-level variations in multiple long-form responses. This allows users to recognize potentially inaccurate information in the generated text and make necessary corrections. From a user study with ten participants, we demonstrate that our approach helps users better verify the reliability of the generated text. We further summarize the design implications and lessons learned from this research for future studies of reliable human-LLM interactions.
Submitted 4 April, 2024; v1 submitted 28 November, 2023;
originally announced November 2023.
-
Unlearning via Sparse Representations
Authors:
Vedant Shah,
Frederik Träuble,
Ashish Malik,
Hugo Larochelle,
Michael Mozer,
Sanjeev Arora,
Yoshua Bengio,
Anirudh Goyal
Abstract:
Machine unlearning, which involves erasing knowledge about a forget set from a trained model, can be costly or infeasible with existing techniques. We propose a nearly compute-free zero-shot unlearning technique based on a discrete representational bottleneck. We show that the proposed technique efficiently unlearns the forget set and incurs negligible damage to the model's performance on the rest of the data set. We evaluate the proposed technique on the problem of class unlearning using three datasets: CIFAR-10, CIFAR-100, and LACUNA-100. We compare the proposed technique to SCRUB, a state-of-the-art approach which uses knowledge distillation for unlearning. Across all three datasets, the proposed technique performs as well as, if not better than, SCRUB while incurring almost no computational cost.
Submitted 26 November, 2023;
originally announced November 2023.
-
Skill-Mix: a Flexible and Expandable Family of Evaluations for AI models
Authors:
Dingli Yu,
Simran Kaur,
Arushi Gupta,
Jonah Brown-Cohen,
Anirudh Goyal,
Sanjeev Arora
Abstract:
With LLMs shifting their role from statistical modeling of language to serving as general-purpose AI agents, how should LLM evaluations change? Arguably, a key ability of an AI agent is to flexibly combine, as needed, the basic skills it has learned. The capability to combine skills plays an important role in (human) pedagogy and also in a paper on emergence phenomena (Arora & Goyal, 2023).
This work introduces Skill-Mix, a new evaluation to measure the ability to combine skills. Using a list of $N$ skills, the evaluator repeatedly picks random subsets of $k$ skills and asks the LLM to produce text combining that subset of skills. Since the number of subsets grows like $N^k$, for even modest $k$ this evaluation will, with high probability, require the LLM to produce text significantly different from any text in the training set. The paper develops a methodology for (a) designing and administering such an evaluation, and (b) automatic grading (plus spot-checking by humans) of the results using GPT-4 as well as the open LLaMA-2 70B model.
Administering a version of Skill-Mix to popular chatbots gave results that, while generally in line with prior expectations, contained surprises. Sizeable differences exist among model capabilities that are not captured by their ranking on popular LLM leaderboards ("cramming for the leaderboard"). Furthermore, simple probability calculations indicate that GPT-4's reasonable performance on $k=5$ is suggestive of going beyond "stochastic parrot" behavior (Bender et al., 2021), i.e., it combines skills in ways that it had not seen during training.
We sketch how the methodology can lead to a Skill-Mix based eco-system of open evaluations for AI capabilities of future models.
Submitted 26 October, 2023;
originally announced October 2023.
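The counting argument in the Skill-Mix abstract can be checked with a short sketch. The number of $k$-element subsets of $N$ skills is the binomial coefficient $\binom{N}{k}$, which for fixed $k$ grows like $N^k/k!$. The skill names and the list size of 100 below are hypothetical placeholders, not the paper's skill list, and the sampling function only mirrors the "pick a random subset of $k$ skills" step.

```python
import math
import random

def count_skill_subsets(n_skills, k):
    """Number of distinct k-element subsets of n_skills skills: C(n_skills, k)."""
    return math.comb(n_skills, k)

def sample_skill_subset(skills, k, rng):
    """Pick k distinct skills uniformly at random."""
    return rng.sample(skills, k)

# Hypothetical skill list; the names and the size are assumptions for illustration.
skills = [f"skill_{i}" for i in range(100)]
rng = random.Random(0)

for k in (2, 3, 5):
    print(k, count_skill_subsets(len(skills), k))
print(sample_skill_subset(skills, 3, rng))
```

Even at modest $k$, the count is large enough that a randomly drawn combination is unlikely to have appeared together in training text, which is the property the evaluation relies on.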
-
Invariant Physics-Informed Neural Networks for Ordinary Differential Equations
Authors:
Shivam Arora,
Alex Bihlo,
Francis Valiquette
Abstract:
Physics-informed neural networks have emerged as a prominent new method for solving differential equations. While conceptually straightforward, they often suffer from training difficulties that lead to relatively large discretization errors or the failure to obtain correct solutions. In this paper we introduce invariant physics-informed neural networks for ordinary differential equations that admit a finite-dimensional group of Lie point symmetries. Using the method of equivariant moving frames, a differential equation is invariantized to obtain a generally simpler equation in the space of differential invariants. A solution to the invariantized equation is then mapped back to a solution of the original differential equation by solving the reconstruction equations for the left moving frame. The invariantized differential equation together with the reconstruction equations are solved using a physics-informed neural network, and form what we call an invariant physics-informed neural network. We illustrate the method with several examples, all of which considerably outperform standard non-invariant physics-informed neural networks.
Submitted 11 March, 2024; v1 submitted 25 October, 2023;
originally announced October 2023.
-
A Quadratic Synchronization Rule for Distributed Deep Learning
Authors:
Xinran Gu,
Kaifeng Lyu,
Sanjeev Arora,
Jingzhao Zhang,
Longbo Huang
Abstract:
In distributed deep learning with data parallelism, synchronizing gradients at each training step can cause a huge communication overhead, especially when many nodes work together to train large models. Local gradient methods, such as Local SGD, address this issue by allowing workers to compute locally for $H$ steps without synchronizing with others, hence reducing communication frequency. While $H$ has been viewed as a hyperparameter to trade optimization efficiency for communication cost, recent research indicates that setting a proper $H$ value can lead to generalization improvement. Yet, selecting a proper $H$ is elusive. This work proposes a theory-grounded method for determining $H$, named the Quadratic Synchronization Rule (QSR), which recommends dynamically setting $H$ in proportion to $\frac{1}{η^2}$ as the learning rate $η$ decays over time. Extensive ImageNet experiments on ResNet and ViT show that local gradient methods with QSR consistently improve the test accuracy over other synchronization strategies. Compared with the standard data parallel training, QSR enables Local AdamW on ViT-B to cut the training time on 16 or 64 GPUs down from 26.7 to 20.2 hours or from 8.6 to 5.5 hours and, at the same time, achieves $1.16\%$ or $0.84\%$ higher top-1 validation accuracy.
Submitted 12 April, 2024; v1 submitted 22 October, 2023;
originally announced October 2023.
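The rule stated in the QSR abstract, setting the number of local steps $H$ in proportion to $\frac{1}{η^2}$ as the learning rate $η$ decays, can be sketched as a small schedule function. The proportionality constant `alpha`, the floor `h_base`, and the rounding are assumptions introduced for illustration; the abstract only specifies the proportionality.

```python
import math

def qsr_sync_period(lr, alpha, h_base):
    """Local steps between synchronizations, proportional to 1 / lr**2.

    `alpha` (scale) and `h_base` (minimum period) are hypothetical knobs;
    the abstract only states that H is proportional to 1 / lr**2.
    """
    return max(h_base, math.floor((alpha / lr) ** 2))

# As the learning rate halves, the synchronization period quadruples.
for lr in (0.25, 0.125, 0.0625):
    print(lr, qsr_sync_period(lr, alpha=0.25, h_base=2))
```

Under this sketch, workers synchronize often while the learning rate is large and progressively less often as it decays, which is how the rule reduces communication late in training.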